IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Each fd_{may|cant|stop|want}_{recv|send} function sets or resets a
single bit at once, then recomputes the need for updates, and then
the new cache state. Later, pollers will compute the new polling
state based on the resulting operations here. In fact the conditions
are so simple that they can be performed by a single "if", or sometimes
even optimized away.
This means that in practice a simple compare-and-swap operation if often
enough to set the new value inluding the new polling state, and that only
the cache and fdupdt have to be performed under the lock. Better, for the
most common operations (fd_may_{recv,send}, used by the pollers), a simple
atomic OR is needed.
This patch does this for the fd_* functions above and it doesn't yet
remove the now useless fd_compute_new_polling_status() because it's still
used by other pollers. A pure connection rate test shows a 1% performance
increase.
An fd cache entry might be removed and added at the end of the list, while
another thread is parsing it, if that happens, we may miss fd cache entries,
to avoid that, add a new field in the struct fdtab, "added_mask", which
contains a mask for potentially affected threads, if it is set, the
corresponding thread will set its bit in fd_cache_mask, to avoid waiting in
poll while it may have more work to do.
Create a local, per-thread, fdcache, for file descriptors that only belongs
to one thread, and make the global fd cache mostly lockless, as we can get
a lot of contention on the fd cache lock.
Instead of looking for CO_FL_EARLY_DATA to know if we have to try to wake
up a stream, because it is waiting for a SSL handshake, instead add a new
conn_stream flag, CS_FL_WAIT_FOR_HS. This way we don't have to rely on
CO_FL_EARLY_DATA, and we will only wake streams that are actually waiting.
Instead of using a list of applets with idle ones in front, we now use an
ebtree. Aapplets in the tree are idle by definition. And the key is the applet's
weight. When a new frame is queued, the first idle applet (with the lowest
weight) is woken up and its weight is increased by one. And when an applet sends
a frame to a SPOA, its weight is decremented by one.
This is empirical, but it should avoid to overuse a very few number of applets
and increase the balancing between idle applets.
So it is easier to respect the max_fpa value. This is no more the maximum frames
processed by an applet at each loop but the maximum frames waiting for an ack
for a specific applet.
The function spoe_handle_processing_appctx has been rewritten accordingly.
sending_rate was a counter used to evaluate the SPOE capacity to process
frames. Because it was not really accurrate, it has been replaced by a frequency
counter representing the number of frames handled by the SPOE per second. We
just check this counter is higher than the number of streams waiting for a
reply. If not, a new applet is created.
The calculation of a minimal number of active applets was really empirical and
finally useless. On heavy load, there are always many active applets (most of
time, more than the minimal required) and when the load is low, there is no
reason to keep unused applets opened.
Because of this change, the flag SPOE_APPCTX_FL_PERSIST is now unused. So it has
been removed.
Recent changes to the enum were not synchronized with the lock debugging
code. Now we use a switch/case instead of an array so that the compiler
throws a warning if there is any inconsistency.
To be backported to 1.8 (at least to add the START entry).
fd_insert() is currently called just after setting the owner and iocb,
but proceeding like this prevents the operation from being atomic and
requires a lock to protect the maxfd computation in another thread from
meeting an incompletely initialized FD and computing a wrong maxfd.
Fortunately for now all fdtab[].owner are set before calling fd_insert(),
and the first lock in fd_insert() enforces a memory barrier so the code
is safe.
This patch moves the initialization of the owner and iocb to fd_insert()
so that the function will be able to properly arrange its operations and
remain safe even when modified to become lockless. There's no other change
beyond the internal API.
These functions were created for poll() in 1.5-dev18 (commit 80da05a4) to
replace the previous FD_{CLR,SET,ISSET} that were shared with select()
because some libcs enforce a limit on FD_SET. But FD_SET doesn't seem
to be universally MT-safe, requiring locks in the select() code that
are not needed in the poll code. So let's move back to the initial
situation where we used to only use bit fields, since that has been in
use since day one without a problem, and let's use these hap_fd_*
functions instead of FD_*.
This patch only moves the functions to fd.h and revives hap_fd_isset()
that was recently removed to kill an "unused" warning.
Since only select() and poll() still make use of maxfd, let's move
its computation right there in the pollers themselves, and only
during each fd update pass. The computation doesn't need a lock
anymore, only a few atomic ops. It will be accurate, be done much
less often and will not be required anymore in the FD's fast patch.
This provides a small performance increase of about 1% in connection
rate when using epoll since we get rid of this computation which was
performed under a lock.
The incorrect comment was introduced in commit:
2ac5718dbd4ec722ece228e9f613d2be74eee9da
v1.5-dev9 is the first tag containing this comment, the fix
should be backported to haproxy 1.5 and newer.
Marc Fournier reported an interesting case when using threads with the
master-worker mode : sometimes, a listener would have its FD closed
during startup. Sometimes it could even be health checks seeing this.
What happens is that after the threads are created, and the pollers
enabled on each threads, the master-worker pipe is registered, and at
the same time a close() is performed on the write side of this pipe
since the children must not use it.
But since this is replicated in every thread, what happens is that the
first thread closes the pipe, thus releases the FD, and the next thread
starting a listener in parallel gets this FD reassigned. Then another
thread closes the FD again, which this time corresponds to the listener.
It can also happen with the health check sockets if they're started
early enough.
This patch splits the mworker_pipe_register() function in two, so that
the close() of the write side of the FD is performed very early after the
fork() and long before threads are created (we don't need to delay it
anyway). Only the pipe registration is done in the threaded code since
it is important that the pollers are properly allocated for this.
The mworker_pipe_register() function now takes care of registering the
pipe only once, and this is guaranteed by a new surrounding lock.
The call to protocol_enable_all() looks fragile in theory since it
scans the list of proxies and their listeners, though in practice
all threads scan the same list and take the same locks for each
listener so it's not possible that any of them escapes the process
and finishes before all listeners are started. And the operation is
idempotent.
This fix must be backported to 1.8. Thanks to Marc for providing very
detailed traces clearly showing the problem.
Some pollers like epoll() need to know if the fd is already known or
not in order to compute the operation to perform (add, mod, del). For
now this is performed based on the difference between the previous FD
state and the new state but this will not be usable anymore once threads
become responsible for their own polling.
Here we come with a different approach : a bitmask is stored with the
fd to indicate which pollers already know it, and the pollers will be
able to simply perform the add/mod/del operations based on this bit
combined with the new state.
This patch only adds the bitmask declaration and initialization, it
is it not yet used. It will be needed by the next two fixes and will
need to be backported to 1.8.
Since the fd update tables are per-thread, we need to have a bit per
thread to indicate whether an update exists, otherwise this can lead
to lost update events every time multiple threads want to update the
same FD. In practice *for now*, it only happens at start time when
listeners are enabled and ask for polling after facing their first
EAGAIN. But since the pollers are still shared, a lost event is still
recovered by a neighbor thread. This will not reliably work anymore
with per-thread pollers, where it has been observed a few times on
startup that a single-threaded listener would not always accept
incoming connections upon startup.
It's worth noting that during this code review it appeared that the
"new" flag in the fdtab isn't used anymore.
This fix should be backported to 1.8.
A bitfield has been added to know if there are some FDs processable by a
specific thread in the FD cache. When a FD is inserted in the FD cache, the bits
corresponding to its thread_mask are set. On each thread, the bitfield is
updated when the FD cache is processed. If there is no FD processed, the thread
is removed from the bitfield by unsetting its tid_bit.
Note that this bitfield is updated but not checked in
fd_process_cached_events. So, when this function is called, the FDs cache is
always processed.
[wt: should be backported to 1.8 as it will help fix a design limitation]
A number of counters have been added at special places helping better
understanding certain bug reports. These counters are maintained per
thread and are shown using "show activity" on the CLI. The "clear
counters" commands also reset these counters. The output is sent as a
single write(), which currently produces up to about 7 kB of data for
64 threads. If more counters are added, it may be necessary to write
into multiple buffers, or to reset the counters.
To backport to 1.8 to help collect more detailed bug reports.
This one allows not to inflate some structures when threads are
disabled. Now struct global is 1.4 kB instead of 33 kB.
Should be backported to 1.8 for ease of backporting of upcoming
patches.
The "thread" part is 32kB long, better move it at the end of the
structure since it's only used during initialization, to keep the
rest grouped together.
Should be backported to 1.8 to ease backporting of upcoming patches,
no functional impact.
In addition to "option force-set-var", recently added, this directive can be
used to selectivelly register unknown variable names, without totally relaxing
their registration during the runtime, like "option force-set-var" does.
So there is no way for a malicious agent to exhaust memory by defining a too
high number of variable names. In other hand, you need to enumerate all
variable names. This could be painfull in some circumstances.
Remember, this directive is only usefull when the variable names are not
referenced anywhere in the HAProxy configuration or the SPOE one.
Thanks to Etienne Carrire for his help on this part.
Commit 1a69af6d3892fe1946bb8babb3044d2d26afd46e introduced code
for atomic prior to 4.7. Unfortunately clang uses as well those
constants which is misleading.
Till now the use of __atomic_* gcc builtins required gcc >= 4.7. Since
some supported and quite common operating systems like CentOS 6 still
come with older versions (4.4) and the mapping to the older builtins
is reasonably simple, let's implement it.
This code is only used for gcc < 4.7. It has been quickly tested on a
machine using gcc 4.4.4 and provided expected results.
This patch should be backported to 1.8.
A SRV record weight can range from 0 to 65535, while haproxy weight goes
from 0 to 256, so we have to divide it by 256 before handing it to haproxy.
Also, a SRV record with a weight of 0 doesn't mean the server shouldn't be
used, so use a minimum weight of 1.
This should probably be backported to 1.8.
Since commit f9ce57e ("MEDIUM: connection: make conn_sock_shutw() aware
of lingering"), we refrain from performing the shutw() on the socket if
there is no lingering risk. But there is a problem with this in tunnel
and in TCP modes where a client is explicitly allowed to send a shutw
to the server, eventhough it it risky.
Not doing it creates this situation reported by Ricardo Fraile and
diagnosed by Christopher : a typical HTTP client (eg: curl) connecting
via the config below to an HTTP server would receive its response,
immediately close while the server remains in keep-alive mode. The
shutr() received by haproxy from the client is "propagated" to the
server side but not acted upon because fdtab[fd].linger_risk is set,
so we expect that the next close will immediately complete this
operation.
listen proxy-tcp
bind 127.0.0.1:8888
mode tcp
timeout connect 5s
timeout server 10s
timeout client 10s
server server1 127.0.0.1:8000
But since the whole stream will not end until the server closes in
turn, the server doesn't close and haproxy expires on server timeout.
This problem has already struck by waking up an older bug and was
partially fixed with commit 8059351 ("BUG/MEDIUM: http: don't disable
lingering on requests with tunnelled responses") though it was not
enough.
The problem is that linger_risk is not suited here. In fact we need to
know whether or not it is desired to close normally or silently, and
whether or not a shutr() has already been received on this connection.
This is the approach this patch takes, and it solves the problem for
the various difficult modes (tcp, http-server-close, pretend-keepalive).
This fix needs to be backported to 1.8. Many thanks to Ricardo for
providing very detailed traces and configurations.
The new function check_request_for_cacheability() is used to check if
a request may be served from the cache, and/or allows the response to
be stored into the cache. For this it checks the cache-control and
pragma header fields, and adjusts the existing TX_CACHEABLE and a new
TX_CACHE_IGNORE flags.
For now, just like its response side counterpart, it only checks the
first value of the header field. These functions should be reworked to
improve their parsers and validate all elements.
By copying the info in the stream interface that the mux cleanly reports
aborts, we'll have the ability to check this flag wherever needed regardless
of the presence of a mux or not.
This new field will be used to describe certain properties of some
muxes. For now we only add MX_FL_CLEAN_ABRT to indicate that a mux
is able to unambiguously report aborts using CS_FL_ERROR contrary
to others who may only report it via a read0. This will be used to
improve handling of the abortonclose option with H2. Other flags
may come later to report multiplexing capabilities or not, support
of client/server sides etc.
For security reasons, the spoe filter was only able to change values of
existing variables. In specific cases (ex : with LUA code), the name of
variables are unknown at the configuration parsing phase.
The force-set-var option can be enabled to register all variables.
Due to the nature of multiplexed protocols, it will often happen that
some operations are only performed on full frames, preventing any partial
operation from being performed. HTTP/2 is one such example. The current
MUX API causes a problem here because the rcv_buf() function has no way
to let the stream layer know that some data could not be read due to a
lack of room in the buffer, but that data are definitely present. The
problem with this is that the stream layer might not know it needs to
call the function again after it has made some room. And if the frame
in the buffer is not followed by any other, nothing will move anymore.
This patch introduces a new conn_stream flag CS_FL_RCV_MORE whose purpose
is to indicate on the stream that more data than what was received are
already available for reading as soon as more room will be available in
the buffer.
This patch doesn't make use of this flag yet, it only declares it. It is
expected that other similar flags may come in the future, such as reports
of pending end of stream, errors or any such event that might save the
caller from having to poll, or simply let it know that it can take some
actions after having processed data.
The thread patches adds refcount for notifications. The notifications are
used with the Lua cosocket. These refcount free the notifications when
the session is cleared. In the Lua task case, it not have sessions, so
the nofications are never cleraed.
This patch adds a garbage collector for signals. The garbage collector
just clean the notifications for which the end point is disconnected.
This patch should be backported in 1.8
The number of async fd is computed considering the maxconn, the number
of sides using ssl and the number of engines using async mode.
This patch should be backported on haproxy 1.8
In hpack_dht_make_room(), we try to fulfill this rule form RFC7541#4.4 :
"It is not an error to attempt to add an entry that is larger than the
maximum size; an attempt to add an entry larger than the maximum size
causes the table to be emptied of all existing entries and results in
an empty table."
Unfortunately it is not consistent with the way it's used in
hpack_dht_insert() as this last one will consider a success as a
confirmation it can copy the header into the table, and a failure as
an indexing error. This results in the two following issues :
- if a client sends too large a header into an empty table, this
header may overflow the table. Fortunately, most clients send
small headers like :authority first, and never mark headers that
don't fit into the table as indexable since it is counter-productive ;
- if a client sends too large a header into a populated table, the
operation fails after the table is totally flushed and the request
is not processed.
This patch fixes the two issues at once :
- a header not fitting into an empty table is always a sign that it
will never fit ;
- not fitting into the table is not an error
Thanks to Yves Lafon for reporting detailed traces demonstrating this
issue. This fix must be backported to 1.8.
If the hpack decoder sees an invalid header index, it emits value
"### ERR ###" that was used during debugging instead of rejecting the
block. This is harmless, and was detected by h2spec.
To backport to 1.8.