8181 Commits

Author SHA1 Message Date
Willy Tarreau
7c6f8a2b0d MINOR: queue: implement pendconn queue locking functions
The new pendconn_queue_lock() and pendconn_queue_unlock() functions are
made to make it more convenient to lock or unlock the pendconn queue
either at the proxy or the server depending on pendconn->srv. This way
it is possible to remove the open-coding of these locks at various places.
These ones have been used in pendconn_unlink() and pendconn_add(), thus
significantly simplifying the logic there.
2018-07-26 17:32:51 +02:00
Willy Tarreau
88930dd364 MINOR: queue: use a distinct variable for the assigned server and the queue
The pendconn struct uses ->px and ->srv to designate where the element is
queued. There is something confusing regarding threads though, because we
have to lock the appropriate queue before inserting/removing elements, and
this queue may only be determined by looking at ->srv (if it's not NULL
it's the server, otherwise use the proxy). But pendconn_grab_from_px() and
pendconn_process_next_strm() both assign this ->srv field, making it
complicated to know what queue to lock before manipulating the element,
which is exactly why we have the pendconn_lock in the first place.

This commit introduces pendconn->target which is the target server that
the two aforementioned functions will set when assigning the server.
Thanks to this, the server pointer may always be relied on to determine
what queue to use.
2018-07-26 17:32:51 +02:00
Willy Tarreau
c1a60d6218 MINOR: queue: make sure pendconn->strm->pend_pos is always valid
pendconn_add() used to assign strm->pend_pos very late, after unlocking
the queue, so that a watching thread could see a random value in
pendconn->strm->pend_pos even while holding the lock on the element and
the queue itself. While there's currently nothing wrong with this, it
costs nothing to arrange it and will simplify code analysis later.
2018-07-26 17:32:51 +02:00
Willy Tarreau
6bdd05c0ef DOC: queue: document the expected locking model for the server's queue
The locking model is not trivial and is worth documenting to avoid seeing
apparent bugs everywhere while they are not.
2018-07-26 17:32:51 +02:00
Willy Tarreau
d0ad4a87f0 MEDIUM: queue: make pendconn_free() work on the stream instead
Now pendconn_free() takes a stream, checks that pend_pos is set, clears
it, and uses pendconn_unlink() to complete the job. It's cleaner and
centralizes all the bookkeeping work in pendconn_unlink() only and
ensures that there's a single place where the stream's position in the
queue is manipulated.
2018-07-26 17:32:51 +02:00
Willy Tarreau
9624faec86 MINOR: queue: centralize dequeuing code a bit better
For now the pendconns may be dequeued at two places :
  - pendconn_unlink(), which operates on a locked queue
  - pendconn_free(), which operates on an unlocked queue and frees
    everything.

Some changes are coming to the queue and we'll need to be able to be a
bit stricter regarding the places where we dequeue to keep the accounting
accurate. This first step renames the locked function __pendconn_unlink()
as it's for use by those aware of it, and introduces a new general purpose
pendconn_unlink() function which automatically grabs the necessary locks
before calling the former, and pendconn_cond_unlink() which additionally
checks the pointer and the presence in the queue.
2018-07-26 17:32:48 +02:00
Olivier Houchard
77551ee8a7 BUG/MEDIUM: tasks: make __task_unlink_rq responsible for the rqueue size.
As __task_wakeup() is responsible for increasing
rqueue_local[tid]/global_rqueue_size, make __task_unlink_rq responsible for
decreasing it, as process_runnable_tasks() isn't the only one that removes
tasks from runqueues.
2018-07-26 16:33:29 +02:00
Olivier Houchard
76e45181b2 MINOR: tasks: Add a flag that tells if we're in the global runqueue.
How that we have bits available in task->state, add a flag that tells if we're
in the global runqueue or not.
2018-07-26 16:33:10 +02:00
Willy Tarreau
f0cea1ee3f MINOR: tasks: extend the state bits from 8 to 16 and remove the reason
By removing the reason code for the wakeup we can gain 8 extra bits to
encode the task's state. The reason code was never used at all and is
wrong by design since subsequent calls will OR this value anyway. Let's
say it goodbye and leave the room for more precious bits. The woken bits
were moved to the higher byte so that the most important bits can stay
grouped together.
2018-07-26 16:13:00 +02:00
Willy Tarreau
ad8bd2467c MINOR: signal: don't pass the signal number anymore as the wakeup reason
This is never used and would even be wrong since the reasons are ORed
so two signals would be turned into a third value, just like if any
other reason was used at the same time.
2018-07-26 16:12:48 +02:00
Olivier Houchard
c4aac9effe BUG/MEDIUM: tasks: Make sure there's no task left before considering inactive.
We may remove the thread's bit in active_tasks_mask despite tasks for that
thread still being present in the global runqueue. To fix that, introduce
global_tasks_mask, and set the correspnding bits when we add a task to the
runqueue.
2018-07-26 15:40:22 +02:00
Willy Tarreau
189ea856a7 BUG/MEDIUM: tasks: use atomic ops for active_tasks_mask
We don't have the lock anymore so we need to protect it.
2018-07-26 15:16:43 +02:00
Olivier Houchard
e85ee7b663 BUG/MEDIUM: tasks: Decrement rqueue_size at the right time.
We need to decrement requeue_size when we remove a task form rqueue_local,
not when we remove if from the task list, or we'd also decrement it for any
tasklet, that was never in the rqueue in the first place.
2018-07-26 15:00:58 +02:00
Willy Tarreau
9a77186cb0 BUG/MEDIUM: tasks: make sure we pick all tasks in the run queue
Commit 09eeb76 ("BUG/MEDIUM: tasks: Don't forget to increase/decrease
tasks_run_queue.") addressed a count issue in the run queue and uncovered
another issue with the way the tasks are dequeued from the global run
queue. The number of tasks to pick is computed using an integral divide,
which results in up to nbthread-1 tasks never being run. The fix simply
consists in getting rid of the divide and checking the task count in the
loop.

No backport is needed, this is 1.9-specific.
2018-07-26 14:24:46 +02:00
Olivier Houchard
306e653331 BUG/MINOR: servers: Don't make "server" in a frontend fatal.
When parsing the configuration, if "server", "default-server" or
"server-template" are found in a frontend, we first warn that it will be
ignored, only to be considered a fatal error later. Be true to our word, and
just ignore it.

This should be backported to 1.8 and 1.7.
2018-07-24 17:13:54 +02:00
Willy Tarreau
055ba4f505 BUG/MEDIUM: stats: don't ask for more data as long as we're responding
The stats applet is still a bit hackish. It uses the HTTP txn to parse
the POST contents. Due to this it pretends not having parsed the request
from the buffer so that the HTTP parser continues to work fine on these
data. This comes with a side effect : the request lies pending in the
channel's buffer, and because of this, stream_int_update_applet() always
wakes the applet up. It's very visible when retrieving a large stats page
over a slow link as haproxy eats 100% of the CPU waiting for the data to
leave.

While the proper long term solution definitely is to consume these data
and parse the body from the applet, changing this is not suitable for a
fix.

What this patch does instead is to disable request polling as long as there
are pending data in the response buffer. Given that for almost all cases,
the applet remains busy sending data, this is at least enough to ensure
that we don't wake up for the pending request data while we're waiting for
the client to receive these data. Now a 5k backend stats page is dumped at
1% CPU over a 10 Mbps link instead of 100%, using 1500 epoll_wait() calls
instead of 80000.

Note that the previous fix (BUG/MEDIUM: stream-int: don't immediately
enable reading when the buffer was reportedly full) is necessary for the
effects of the fix to be noticed since both bugs have the exact same
effect.

This fix must be backported at least as far as 1.5.
2018-07-24 17:13:32 +02:00
Willy Tarreau
171d5f203a BUG/MEDIUM: stream-int: don't immediately enable reading when the buffer was reportedly full
There is a long-time issue which affects some applets, at least the stats
applet. If a large stats page is read over a slow link, regularly the
channel's buffer contains too many response data to allow another round
of ci_putblk() to copy a new message. In this case the applet calls
si_applet_cant_put() to mention that it failed to emit data into the
channel's buffer, and wants to be called only once some room is made.

The problem is that stream_int_update(), which is called from
process_stream(), will clear this flag whenever it sees there's some
spare room in the channel's buffer. It causes the applet to be woken
again immediately. This is very visible when reading a large stats
page over a slow link, because in this case haproxy will run at 100%
CPU and strace shows mostly epoll_wait(0). It is very likely that some
other applets like CLI, Lua, peers or SPOE have also been affected but
that the effect were less noticeable because it was mixed with traffic.

Ideally stream_int_update() should not touch these flags, but changing
this would require a very careful auditing of all users. Instead here
what we do is that we respect the flag if the channel still has output
data. This way the flag will automatically disappear once the buffer is
empty, and the applet function will be called only when input data
remains, if at all.

This patch alone is not enough to observe the behaviour change on the
stats page because another bug takes over, addressed by next patch
(BUG/MEDIUM: stats: don't ask for more data as long as we're responding).
When both are applied, dumping stats for 5k backends over a 10 Mbps link
take 1% CPU instead of 100%, with 1.5k epoll_wait() calls instead of 80k.

This fix should be backported at least as far as 1.5.
2018-07-24 17:12:38 +02:00
Willy Tarreau
616ac81dec MINOR: h2: add the error code and the max/last stream IDs to "show fd"
This is intented to help debugging H2 in field.
2018-07-24 14:12:42 +02:00
Willy Tarreau
7cc040cc74 DOC: add more design feedback on the new layering model
Introduce the distinction between structured messages and raw data,
and how to make them coexist in a buffer. This is still a design draft.
2018-07-23 17:29:37 +02:00
Willy Tarreau
842ed9b1cb MEDIUM: h2: use the default conn_stream's receive function
This removes h2_rcv_buf() now that the generic code can handle it fine.
2018-07-20 19:37:12 +02:00
Willy Tarreau
39d68508c3 MINOR: h2: make use of CS_FL_REOS to indicate that end of stream was seen
This allows h2_rcv_buf() not to depend anymore on h2s at all and to become
generic.
2018-07-20 19:35:14 +02:00
Willy Tarreau
2df65e7194 MEDIUM: h2: don't call data_cb->recv() anymore
Now we simply call data_cb->wake() which will automatically perform the
recv() call if required.
2018-07-20 19:31:36 +02:00
Willy Tarreau
2a761dcf0d MEDIUM: h2: perform a single call to the data layer in demux()
Instead of calling the data layer from each individual frame processing
function, we now call it from demux. This requires to know the h2s that
was created inside h2c_frt_handle_headers(), which is why the pointer is
now returned. This results in a small performance boost from 58k to 60k
POST requests/s compared to -master, thanks to half the number of
si_cs_recv_cb() calls and 66% calls to si_cs_wake_cb().

It's interesting to note that all calls to data_cb->recv() are now always
immediately followed by a call to data_cb->wake(). The next step should
be to let the ->wake handler perform the recv() call itself. For this it
will be useful to have some info on the CS to indicate whether or not it
is ready to be read (ie: contains a non-empty input buffer).
2018-07-20 19:30:03 +02:00
Willy Tarreau
7999bfbfd3 MEDIUM: buffers: make b_xfer() automatically swap buffers when possible
Whenever it's possible to avoid a copy, b_xfer() will simply swap the
buffer's heads without touching the data. This has brought the performance
back from 140 kH/s to 202 kH/s on the test case.
2018-07-20 19:21:43 +02:00
Willy Tarreau
a56a6def91 MEDIUM: h2: move headers and data frame decoding to their respective parsers
Now we entirely process the input frame before transfering it above, so
that h2_rcv_buf() doesn't have to "speak" h2 anymore.
2018-07-20 19:21:43 +02:00
Willy Tarreau
454b57b347 MEDIUM: h2: centralize transfer of decoded frames in h2_rcv_buf()
We still call the parser but it should soon not be needed anymore. The
decode functions don't need the buffer nor the max size anymore. They
must also not touch the CS_FL_EOS or CS_FL_RCV_MORE flags either, so
this is done within h2_rcv_buf() after transmission.

The "flags" argument to h2_frt_decode_headers() and h2_frt_transfer_data()
has been removed since it's not used anymore.
2018-07-20 19:21:43 +02:00
Willy Tarreau
d755ea6c7d MEDIUM: h2: make h2_frt_transfer_data() copy via an intermediary buffer
The purpose here is also to ensure we can split the lower from the top
layers. The way the CS_FL_MSG_MORE flag is set was updated so that it's
set or cleared upon exit depending on the buffer's remaining contents.
2018-07-20 19:21:43 +02:00
Willy Tarreau
937f760e1e MEDIUM: h2: make h2_frt_decode_headers() use an intermediary buffer
The purpose is to decode to a temporary buffer and then to copy this buffer
to the caller. This double-copy definitely has an impact on performance, the
test code goes down from 220k to 140k req/s, but this memcpy() will disappear
soon.

The test on CO_RFL_BUF_WET has become irrelevant now since we only use
the cs' rxbuf, so we cannot be blocked by "output" data that has to be
forwarded first. Thus instead we don't start until the rxbuf is empty
(it will be drained from any input data once the stream processes it).
2018-07-20 19:21:43 +02:00
Willy Tarreau
0b559071dd MINOR: h2: make each H2 stream support an intermediary input buffer
The purpose is to decode to a temporary buffer and then to copy this buffer
to the caller upon request to avoid having to process frames on the fly
when called from the higher level. For now the buffer is only initialized
on stream creation via cs_new() and allocated if the buffer_wait's callback
is called.
2018-07-20 19:21:43 +02:00
Willy Tarreau
67b1e78f68 MEDIUM: stream-int: automatically call si_cs_recv_cb() if the cs has data on wake()
If the cs has data pending or shutdown and the input channel is still
waiting for reads, let's simply call the recv() function from the wake()
callback. This will allow the lower layers to simply wake the upper one
up without having to consider the recv() nor anything else.
2018-07-20 19:21:43 +02:00
Willy Tarreau
11c9aa424e MEDIUM: conn_stream: add cs_recv() as a default rcv_buf() function
This function is generic and is able to automatically transfer data
from a conn_stream's rx buffer to the destination buffer. It does this
automatically if the mux doesn't define another rcv_buf() function.
2018-07-20 19:21:43 +02:00
Willy Tarreau
5e1cc5ea83 MINOR: conn_stream: add an rx buffer to the conn_stream
In order to reorganize the connection layers, recv() operations will
need to be retryable and to support partial transfers. This requires
an intermediary buffer to hold the data coming from the mux. After a
few attempts, it turns out that this buffer is best placed inside the
conn_stream itself. For now it's only set to buf_empty and it will be
up to the caller to allocate it if required.
2018-07-20 19:21:43 +02:00
Willy Tarreau
a3f7efe009 MINOR: conn_stream: add a new CS_FL_REOS flag
This flag indicates that the mux layer has already detected an end of
stream which will become CS_FL_EOS during a recv() once the rx buffer
is empty.
2018-07-20 19:21:43 +02:00
Willy Tarreau
9382cdd8e1 DOC: add some design notes about the new layering model
This explains how streams and connection should interact.
2018-07-20 19:21:43 +02:00
Willy Tarreau
f148888d19 MINOR: buffers: add b_xfer() to transfer data between buffers
Instead of open-coding buffer-to-buffer transfers using blocks, let's
have a dedicated function for this. It also adjusts the buffer counts.
2018-07-20 19:21:43 +02:00
Willy Tarreau
f7d0447376 MINOR: buffers: split b_putblk() into __b_putblk()
The latter function is more suited to operations that don't require any
check because the check has already been performed. It will be used by
other b_* functions.
2018-07-20 19:21:43 +02:00
Willy Tarreau
ab322d4fd4 MINOR: buffers: simplify b_contig_space()
This function is used a lot in block copies and is needlessly
complicated since it still uses pointer arithmetic. Let's fall
back to regular offsets and simplify it. This removed around
23 bytes from b_putblk() and it removed any conditional jump.
2018-07-20 19:21:43 +02:00
Olivier Houchard
f495fc460e BUG/MEDIUM: mux_h2: Call h2_send() before updating polling.
In h2_wake(), make sure we call h2_send() before we try to update the
polling flags, and detect connection errors, or errors will never be
detected.
2018-07-20 19:07:49 +02:00
Christopher Faulet
ddb6c16576 BUG/MEDIUM: threads: Fix the exit condition of the thread barrier
In thread_sync_barrier, we exit when all threads have set their own bit in the
barrier mask. It is done by comparing it to all_threads_mask. But we must not
use a simple equality to do so, becaue all_threads_mask may change. Since commit
ba86c6c25 ("MINOR: threads: Be sure to remove threads from all_threads_mask on
exit"), when a thread exit, its bit is removed from all_threads_mask. Instead,
we must use a bitwise AND to test is all bits of all_threads_mask are set.

This also requires that all_threads_mask is set to volatile if we want to
catch changes.

This patch must be backported in 1.8.
2018-07-20 14:24:41 +02:00
Christopher Faulet
20761453fb MINOR: ist: Add the function isteqi
This new function does the same as isteq, but ignoring the case.
2018-07-20 13:39:30 +02:00
Christopher Faulet
5f8ef13d5d MINOR: debug: Add checks for conn_stream flags
This may be carefully backported to 1.8 (a few flags don't exist there).
2018-07-20 13:39:30 +02:00
Christopher Faulet
aff9328739 MINOR: debug: Add check for CO_FL_WILL_UPDATE
This could be backported to 1.8.
2018-07-20 13:39:30 +02:00
Tim Duesterhus
3ce3811a9c BUILD: Generate sha256 checksums in publish-release
Currently only md5 signatures are generated. While md5
still is not broken with regard to preimage attacks, sha256
clearly is the current secure solution.

This patch should be backported to all supported branches.
2018-07-20 10:50:20 +02:00
Christopher Faulet
4507351a2f BUG/MINOR: build: Fix compilation with debug mode enabled
It remained some fragments of the old buffers API in debug messages, here and
there.

This was caused by the recent buffer API changes, no backport is needed.
2018-07-20 10:45:20 +02:00
Christopher Faulet
005e79e5dd BUG/MINOR: http: Set brackets for the unlikely macro at the right place
When test on the header "Early-Data" is made, the unlikely macro must encompass
the condition.

This patch must be backported in 1.8.
2018-07-20 10:42:53 +02:00
Willy Tarreau
8318885487 MINOR: connection: simplify subscription by adding a registration function
This new function wl_set_waitcb() prepopulates a wait_list with a tasklet
and a context and returns it so that it can be passed to ->subscribe() to
be added to a connection or conn_stream's wait_list. The caller doesn't
need to know all the insiders details anymore this way.
2018-07-19 18:31:07 +02:00
Olivier Houchard
910b2bc829 MEDIUM: connections/mux: Revamp the send direction.
Totally nuke the "send" method, instead, the upper layer decides when it's
time to send data, and if it's not possible, uses the new subscribe() method
to be called when it can send data again.
2018-07-19 18:31:07 +02:00
Olivier Houchard
6ff2039d13 MINOR: connections/mux: Add a new "subscribe" method.
Add a new "subscribe" method for connection, conn_stream and mux, so that
upper layer can subscribe to them, to be called when the event happens.
Right now, the only event implemented is "SUB_CAN_SEND", where the upper
layer can register to be called back when it is possible to send data.

The connection and conn_stream got a new "send_wait_list" entry, which
required to move a few struct members around to maintain an efficient
cache alignment (and actually this slightly improved performance).
2018-07-19 16:23:43 +02:00
Olivier Houchard
e17c2d3e57 MINOR: tasklets: Don't attempt to add a tasklet in the list twice.
Don't try to add a tasklet to the run queue if it's already in there, or we
might get an infinite loop.
2018-07-19 16:23:43 +02:00
Willy Tarreau
23d465c48c DOC: buffers: remove obsolete docs about buffers
A number of outdated docs dating 2012 about buffers implementation
and management were totally irrelevant to the current code (and even
to most 1.8 code as well). These docs have all been removed so that
only the up to date documentation remains.
2018-07-19 16:23:43 +02:00