IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Till now pseudo headers were passed as const strings, but having them as
ISTs will be more convenient for traces. This doesn't change anything for
strings which are derived from them (and being constants they're still
zero-terminated).
(cherry picked from commit 271c440392242b5129ffc68787b781b1c39b4618)
[wt: will be used for h2 header traces]
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit 666d816bbb9ea25dc006a2a8de4335db242f8022)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
This patch follows this one which was not sufficient:
"BUG/MINOR: quic: Missing STREAM frame length updates"
Indeed, it is not sufficient to update the ->len and ->offset member
of a STREAM frame to move it forward. The data pointer must also be updated.
This is not done by the STREAM frame builder.
Must be backported to 2.6 and 2.7.
(cherry picked from commit ca07979b978ed683b96d0ec99073c7cb972fd022)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit dcc827589a3a351af315d1f4cf0126dd2dc8a746)
[cf: ctx adjustment]
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
It's cheaper and cleaner than using br_count()==1 given that it just compares
two indexes, and that a ring having a single buffer is in a special case where
it is between empty and used up-to-1. In other words it's not congested.
(cherry picked from commit 9824f8c8908d67f6cedf2434e12a23f18a27eaf0)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 6fd2e323a41d972fd0f6630ee02085366fd591bb)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
The commit 5e1b0e7bf ("BUG/MEDIUM: connection: Clear flags when a conn is
removed from an idle list") introduced a regression. CO_FL_SAFE_LIST and
CO_FL_IDLE_LIST flags are used when the connection is released to properly
decrement used/idle connection counters. if a connection is idle, these
flags must be preserved till the connection is really released. It may be
removed from the list but not immediately released. If these flags are lost
when it is finally released, the current number of used connections is
erroneously decremented. If means this counter may become negative and the
counters tracking the number of idle connecitons is not decremented,
suggesting a leak.
So, the above commit is reverted and instead we improve a bit the way to
detect an idle connection. The function conn_get_idle_flag() must now be
used to know if a connection is in an idle list. It returns the connection
flag corresponding to the idle list if the connection is idle
(CO_FL_SAFE_LIST or CO_FL_IDLE_LIST) or 0 otherwise. But if the connection
is scheduled to be removed, 0 is also returned, regardless the connection
flags.
This new function is used when the connection is temporarily removed from
the list to be used, mainly in muxes.
This patch should fix#2078 and #2057. It must be backported as far as 2.2.
(cherry picked from commit 3a7b539b124bccaa57478e0a5a6d66338594615a)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit a81a1e2aea0793aa624565a14cb7579b907f116a)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
In environments where SYSTEM_MAXCONN is defined when compiling, the
master will use this value instead of the original minimal value which
was set to 100. When this happens, the master process could allocate
RAM excessively since it does not need to have an high maxconn. (For
example if SYSTEM_MAXCONN was set to 100000 or more)
This patch fixes the issue by using the new define MASTER_MAXCONN which
define a default maxconn of 100 for the master process.
Must be backported as far as 2.5.
(cherry picked from commit 2078d4b1f77f70ac110022c2b5ac4644e4c01640)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit 585a6cb8ae836094df5ab4944607349866028ca1)
Signed-off-by: Willy Tarreau <w@1wt.eu>
This code was there because the timer task was not running on the same thread
as the one which parse the QUIC packets. Now that this is no more the case,
we can wake up this task directly.
Must be backported to 2.7.
(cherry picked from commit 75c8ad549002b01de0b0d139a26b356973af57a9)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 367c06096b2338c46bb9bec273c3bcb80d983ee7)
[ad: taken on 2.6 to facilitate next cherry-pick]
Signed-off-by: Amaury Denoyelle <adenoyelle@haproxy.com>
When a STREAM frame is re-emitted, it will point to the same stream
buffer as the original one. If an ACK is received for either one of
these frame, the underlying buffer may be freed. Thus, if the second
frame is declared as lost and schedule for retransmission, we must
ensure that the underlying buffer is still allocated or interrupt the
retransmission.
Stream buffer is stored as an eb_tree indexed by the stream ID. To avoid
to lookup over a tree each time a STREAM frame is re-emitted, a lost
STREAM frame is flagged as QUIC_FL_TX_FRAME_LOST.
In most cases, this code is functional. However, there is several
potential issues which may cause a segfault :
- when explicitely probing with a STREAM frame, the frame won't be
flagged as lost
- when splitting a STREAM frame during retransmission, the flag is not
copied
To fix both these cases, QUIC_FL_TX_FRAME_LOST flag has been converted
to a <dup> field in quic_stream structure. This field is now properly
copied when splitting a STREAM frame. Also, as this is now an inner
quic_frame field, it will be copied automatically on qc_frm_dup()
invocation thus ensuring that it will be set on probing.
This issue was encounted randomly with the following backtrace :
#0 __memmove_avx512_unaligned_erms ()
#1 0x000055f4d5a48c01 in memcpy (__len=18446698486215405173, __src=<optimized out>,
#2 quic_build_stream_frame (buf=0x7f6ac3fcb400, end=<optimized out>, frm=0x7f6a00556620,
#3 0x000055f4d5a4a147 in qc_build_frm (buf=buf@entry=0x7f6ac3fcb5d8,
#4 0x000055f4d5a23300 in qc_do_build_pkt (pos=<optimized out>, end=<optimized out>,
#5 0x000055f4d5a25976 in qc_build_pkt (pos=0x7f6ac3fcba10,
#6 0x000055f4d5a30c7e in qc_prep_app_pkts (frms=0x7f6a0032bc50, buf=0x7f6a0032bf30,
#7 qc_send_app_pkts (qc=0x7f6a0032b310, frms=0x7f6a0032bc50) at src/quic_conn.c:4184
#8 0x000055f4d5a35f42 in quic_conn_app_io_cb (t=0x7f6a0009c660, context=0x7f6a0032b310,
This should fix github issue #2051.
This should be backported up to 2.6.
(cherry picked from commit c8a0efbda86a14af38084ce85933bb691563935c)
Signed-off-by: William Lallemand <wlallemand@haproxy.org>
(cherry picked from commit 85ab1edd1549c4eb4680543d7f86c3065fbaf30e)
[ad: remove block which rejects frame on too many retransmission]
Signed-off-by: Amaury Denoyelle <adenoyelle@haproxy.com>
Properly handle a STREAM frame with no data but the FIN bit set at the
application layer. H3 and hq-interop decode_qcs() callback have been
adjusted to not return early in this case.
If the FIN bit is accepted, a HTX EOM must be inserted for the upper
stream layer. If the FIN is rejected because the stream cannot be
closed, a proper CONNECTION_CLOSE error will be triggered.
A new utility function qcs_http_handle_standalone_fin() has been
implemented in the qmux_http module. This allows to simply add the HTX
EOM on qcs HTX buffer. If the HTX buffer is empty, a EOT is first added
to ensure it will be transmitted above.
This commit will allow to properly handle FIN notify through an empty
STREAM frame. However, it is not sufficient as currently qcc_recv() skip
the decode_qcs() invocation when the offset is already received. This
will be fixed in the next commit.
This should be backported up to 2.6 along with the next patch.
(cherry picked from commit 381d8137e31d941c9143a1dc8b5760d29f388fef)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit c07a1c32d98812326dfff05cca02f8af44123b7e)
[ad: adjusted context : no h3 stream error level on 2.6]
Signed-off-by: Amaury Denoyelle <adenoyelle@haproxy.com>
As mentioned in commit 237e6a0d6 ("BUG/MAJOR: fd/thread: fix race between
updates and closing FD"), a race was found during stress tests involving
heavy backend connection reuse with many competing closes.
Here the problem is complex. The analysis in commit f69fea64e ("MAJOR:
fd: get rid of the DWCAS when setting the running_mask") that removed
the DWCAS in 2.5 overlooked a few races.
First, a takeover from thread1 could happen just after fd_update_events()
in thread2 validates it holds the tmask bit in the CAS loop. Since thread1
releases running_mask after the operation, thread2 will succeed the CAS
and both will believe the FD is theirs. This does explain the occasional
crashes seen with h1_io_cb() being called on a bad context, or
sock_conn_iocb() seeing conn->subs vanish after checking it. This issue
can be addressed using a DWCAS in both fd_takeover() and fd_update_events()
as it was before the patch above but this is not portable to all archs and
is not easy to adapt for those lacking it, due to some operations still
happening only on individual masks after the thread groups were added.
Second, the checks after fd_clr_running() for the current thread being
the last one is not sufficient: at the exact moment the operation
completes, another thread may also set and drop the running bit and see
itself as alone, and both can call _fd_close_orphan() in parallel. In
order to prevent this from happening, we cannot rely on the absence of
others, we need an explicit flag indicating that the FD must be closed.
One approach that was attempted consisted in playing with the thread_mask
but that was not reliable since it could still match between the late
deletion and the early insertion that follows. Instead, a new FD flag
was added, FD_MUST_CLOSE, that exactly indicates that the call to
_fd_delete_orphan() must be done. It is set by fd_delete(), and
atomically cleared by the first one which checks it, and which is the
only one to call _fd_delete_orphan().
With both points addressed, there's no more visible race left:
- takeover() only happens under the connection list's lock and cannot
compete with fd_delete() since fd_delete() must first remove the
connection from the list before deleting the FD. That's also why it
doesn't need to call _fd_delete_orphan() when dropping its running
bit.
- takeover() sets its running bit then atomically replaces the thread
mask, so that until that's done, it doesn't validate the condition
to end the synchonization loop in fd_update_events(). Once it's OK,
the previous thread's bit is lost, and this is checked for in
fd_update_events()
- fd_update_events() can compete with fd_delete() at various places
which are explained above. Since fd_delete() clears the thread mask
as after setting its running bit and after setting the FD_MUST_CLOSE
bit, the synchronization loop guarantees that the thread mask is seen
before going further, and that once it's seen, the FD_MUST_CLOSE flag
is already present.
- fd_delete() may start while fd_update_events() has already started,
but fd_delete() must hold a bit in thread_mask before starting, and
that is checked by the first test in fd_update_events() before setting
the running_mask.
- the poller's _update_fd() will not compete against _fd_delete_orphan()
nor fd_insert() thanks to the fd_grab_tgid() that's always done before
updating the polled_mask, and guarantees that we never pretend that a
polled_mask has a bit before the FD is added.
The issue is very hard to reproduce and is extremely time-sensitive.
Some tests were required with a 1-ms timeout with request rates
closely matching 1 kHz per server, though certain tests sometimes
benefitted from saturation. It was found that adding the following
slowdown at a few key places helped a lot and managed to trigger the
bug in 0.5 to 5 seconds instead of tens of minutes on a 20-thread
setup:
{ volatile int i = 10000; while (i--); }
Particularly, placing it at key places where only one of running_mask
or thread_mask is set and not the other one yet (e.g. after the
synchronization loop in fd_update_events or after dropping the
running bit) did yield great results.
Many thanks to Olivier Houchard for this expert help analysing these
races and reviewing candidate fixes.
The patch must be backported to 2.5. Note that 2.6 does not have tgid
in FDs, and that it requires a change of output on fd_clr_running() as
we need the previous bit. This is provided by carefully backporting
commit d6e1987612 ("MINOR: fd: make fd_clr_running() return the previous
value instead"). Tests have shown that the lack of tgid is a showstopper
for 2.6 and that unless a better workaround is found, it could still be
preferable to backport the minimum pieces required for fd_grab_tgid()
to 2.6 so that it stays stable long.
(cherry picked from commit cd8914bc525668bc4a41278deee3430c69d29500)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit 877a13db06dbbbd2b85e3f50489af0560d29eb61)
Signed-off-by: Willy Tarreau <w@1wt.eu>
When a new fd is inserted in the fdtab array, its state is initialized. The
"newstate" variable is used to compute the right state (0 by default, but
FD_ET_POSSIBLE flag is set if edge-triggered is supported for the fd).
However, this variable is never used and the fd state is always set to 0.
Now, the fd state is initialized with "newstate" variable.
This bug was introduced by commit ddedc1662 ("MEDIUM: fd: make
fd_insert/fd_delete atomically update fd.tgid"). No backport needed.
(cherry picked from commit 7e94b40a22fab080b072c4757d487a40d2c6f828)
[wt: this is necessary to fix a design bug introduced in 2.5]
Signed-off-by: Willy Tarreau <w@1wt.eu>
These functions need to set/reset the FD's tgid but when they're called
there may still be wakeups on other threads that discover late updates
and have to touch the tgid at the same time. As such, it is not possible
to just read/write the tgid there. It must only be done using operations
that are compatible with what other threads may be doing.
As we're using inc/dec on the refcount, it's safe to AND the area to zero
the lower part when resetting the value. However, in order to set the
value, there's no other choice but fd_claim_tgid() which will assign it
only if possible (via a CAS). This is convenient in the end because it
protects the FD's masks from being modified by late threads, so while
we hold this refcount we can safely reset the thread_mask and a few other
elements. A debug test for non-null masks was added to fd_insert() as it
must not be possible to face this situation thanks to the protection
offered by the tgid.
(cherry picked from commit ddedc1662487600d5124cb6d4972396438b9953c)
[wt: this is necessary to fix a design bug introduced in 2.5; ctx
adjustment for s/tgid/1 and the fact that we don't enforce the mask
in fd_insert() in 2.6 and older. Note that this patch had a bug with
newstate not being used for fd.state, and needs commit 7e94b40a]
Signed-off-by: Willy Tarreau <w@1wt.eu>
It's an AND so it destroys information and due to this there's a call
place where we have to perform two reads to know the previous value
then to change it. With a fetch-and-and instead, in a single operation
we can know if the bit was previously present, which is more efficient.
(cherry picked from commit d6e198761281e755e5b96272ed2a751df096efdf)
[wt: this is necessary to fix a design bug introduced in 2.5; ctx
adjustment for tid_bit instead of ti->ltid_bit]
Signed-off-by: Willy Tarreau <w@1wt.eu>
The running mask is only valid if the tgid is the expected one. This
function takes a reference on the tgid before reading the running mask,
so that both are checked at once. It returns either the mask or zero if
the tgid differs, thus providing a simple way for a caller to check if
it still holds the FD.
(cherry picked from commit ceffd17f52b6b1aa481365fc6f9b88e8efc436e8)
[wt: this is necessary to fix a design bug introduced in 2.5]
Signed-off-by: Willy Tarreau <w@1wt.eu>
The FD's tgid is refcounted and must be atomically manipulated. Function
fd_grab_tgid() will increase the refcount but only if the tgid matches the
one in argument (likely the current one). fd_claim_tgid() will be used to
self-assign the tgid after waiting for its refcount to reach zero.
fd_drop_tgid() will be used to drop a temporarily held tgid. All of these
are needed to prevent an FD from being reassigned to another group, either
when inspecting/modifying the running_mask, or when checking for updates,
in order to be certain that the mask being seen corresponds to the desired
group. Note that once at least one bit is set in the running mask of an
active FD, it cannot be closed, thus not migrated, thus the reference does
not need to be held long.
(cherry picked from commit 080373ea3896781963f647fd1dcea30ab46fa50f)
[wt: this is necessary to fix a design bug introduced in 2.5]
Signed-off-by: Willy Tarreau <w@1wt.eu>
The file descriptors will need to know the thread group ID in addition
to the mask. This extends fd_insert() to take the tgid, and will store
it into the FD.
In the FD, the tgid is stored as a combination of tgid on the lower 16
bits and a refcount on the higher 16 bits. This allows to know when it's
really possible to trust the tgid and the running mask. If a refcount is
higher than 1 it indeed indicates another thread else might be in the
process of updating these values.
Since a closed FD must necessarily have a zero refcount, a test was
added to fd_insert() to make sure that it is the case.
(cherry picked from commit 9464bb1f05b5e0046716b4573a567d3450ac7604)
[wt: this is necessary to fix a design bug introduced in 2.5. The patch
was simplified as 2.6 does not yet support thread groups, only group 1
is added and the fd_insert() API does not change]
Signed-off-by: Willy Tarreau <w@1wt.eu>
The ssl_bind_kw structure is exclusively used for crt-list keyword, it
must be named otherwise to remove the confusion.
The structure was renamed ssl_crtlist_kws.
(cherry picked from commit af678066518ea5569005b5e43c140a8facb2ee61)
Signed-off-by: William Lallemand <wlallemand@haproxy.org>
(cherry picked from commit 30b1d8b63f8ed09ab2b2b7bcb4d6b4ba5f4f45e8)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Since commit cc9bf2e5f "MEDIUM: cache: Change caching conditions"
responses that do not have an explicit expiration time are not cached
anymore. But this mechanism wrongly used the TX_CACHE_IGNORE flag
instead of the TX_CACHEABLE one. The effect this had is that a cacheable
response that corresponded to a request having a "Cache-Control:
no-cache" for instance would not be cached.
Contrary to what was said in the other commit message, the "checkcache"
option should not be impacted by the use of the TX_CACHEABLE flag
instead of the TX_CACHE_IGNORE one. The response is indeed considered as
not cacheable if it has no expiration time, regardless of the presence
of a cookie in the response.
This should fix GitHub issue #2048.
This patch can be backported up to branch 2.4.
(cherry picked from commit 879debeecb93202c983f25f5f1d765e74d77faa5)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit c6dbdbc94329c02402f2560299ca97f0c5a82e49)
Signed-off-by: Willy Tarreau <w@1wt.eu>
In bb581423b ("BUG/MEDIUM: httpclient/lua: crash when the lua task timeout
before the httpclient"), a new logic was implemented to make sure that
when a lua ctx destroyed, related httpclients are correctly destroyed too
to prevent a such httpclients from being resuscitated on a destroyed lua ctx.
This was implemented by adding a list of httpclients within the lua ctx,
and a new function, hlua_httpclient_destroy_all(), that is called under
hlua_ctx_destroy() and runs through the httpclients list in the lua context
to properly terminate them.
This was done with the assumption that no concurrent Lua garbage collection
cycles could occur on the same ressources, which seems OK since the "lua"
context is about to be freed and is not explicitly being used by other threads.
But when 'lua-load' is used, the main lua stack is shared between multiple
OS threads, which means that all lua ctx in the process are linked to the
same parent stack.
Yet it seems that lua GC, which can be triggered automatically under
lua_resume() or manually through lua_gc(), does not limit itself to the
"coroutine" stack (the stack referenced in lua->T) when performing the cleanup,
but is able to perform some cleanup on the main stack plus coroutines stacks
that were created under the same main stack (via lua_newthread()) as well.
This can be explained by the fact that lua_newthread() coroutines are not meant
to be thread-safe by design.
Source: http://lua-users.org/lists/lua-l/2011-07/msg00072.html (lua co-author)
It did not cause other issues so far because most of the time when using
'lua-load', the global lua lock is taken when performing critical operations
that are known to interfere with the main stack.
But here in hlua_httpclient_destroy_all(), we don't run under the global lock.
Now that we properly understand the issue, the fix is pretty trivial:
We could simply guard the hlua_httpclient_destroy_all() under the global
lua lock, this would work but it could increase the contention over the
global lock.
Instead, we switched 'lua->hc_list' which was introduced with bb581423b
from simple list to mt_list so that concurrent accesses between
hlua_httpclient_destroy_all and hlua_httpclient_gc() are properly handled.
The issue was reported by @Mark11122 on Github #2037.
This must be backported with bb581423b ("BUG/MEDIUM: httpclient/lua: crash
when the lua task timeout before the httpclient") as far as 2.5.
(cherry picked from commit 3ffbf3896d6e6d6bef99763957fb67fb9023cce3)
Signed-off-by: William Lallemand <wlallemand@haproxy.org>
(cherry picked from commit 5a9a9efc52e8fd61e63ed54b29f50a7f78ca1131)
Signed-off-by: William Lallemand <wlallemand@haproxy.org>
In check_config_validity() function, we performed some consistency checks to
adjust minconn/maxconn attributes for each declared server.
We move this logic into a dedicated function named srv_minmax_conn_apply()
to be able to perform those checks later in the process life when needed
(ie: dynamic servers)
(cherry picked from commit 3e7a0bb70b2f1a81d17163f27132f2f44b71521e)
[wt: needed for next fix]
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit 5bda7da2467095177468e0af211d36673281de72)
Signed-off-by: Willy Tarreau <w@1wt.eu>
The SCID (source connection ID) used by a peer (client or server) is sent into the
long header of a QUIC packet in clear. But it is also sent into the transport
parameters (initial_source_connection_id). As these latter are encrypted into the
packet, one must check that these two pieces of information do not differ
due to a packet header corruption. Furthermore as such a connection is unusuable
it must be killed and must stop as soon as possible processing RX/TX packets.
Implement qc_kill_con() to flag a connection as unusable and to kille it asap
waking up the idle timer task to release the connection.
Add a check to quic_transport_params_store() to detect that the SCIDs do not
match and make it call qc_kill_con().
Add several tests about connection to be killed at several critial locations,
especially in the TLS stack callback to receive CRYPTO data from or derive secrets,
and before preparing packet after having received others.
Must be backported to 2.6 and 2.7.
(cherry picked from commit 0aa79953c9779a2da092b3d0f6908726fe7013ec)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit b8f3e25677d9433869846e47cefb71b9841e8f12)
[wt: adj ctx: no quic_dgram_parse() in 2.6]
Signed-off-by: Willy Tarreau <w@1wt.eu>
This is a bad idea to make the TLS ClientHello callback call qc_conn_finalize().
If this latter fails, this would generate a TLS alert and make the connection
send packet whereas it is not functional. But qc_conn_finalize() job was to
install the transport parameters sent by the QUIC listener. This installation
cannot be done at any time. This must be done after having possibly negotiated
the QUIC version and before sending the first Handshake packets. It seems
the better moment to do that in when the Handshake TX secrets are derived. This
has been found inspecting the ngtcp2 code. Calling SSL_set_quic_transport_params()
too late would make the ServerHello to be sent without the transport parameters.
The code for the connection update which was done from qc_conn_finalize() has
been moved to quic_transport_params_store(). So, this update is done as soon as
possible.
Add QUIC_FL_CONN_TX_TP_RECEIVED to flag the connection as having received the
peer transport parameters. Indeed this is required when the ClientHello message
is splitted between packets.
Add QUIC_FL_CONN_FINALIZED to protect the connection from calling qc_conn_finalize()
more than one time. This latter is called only when the connection has received
the transport parameters and after returning from SSL_do_hanshake() which is the
function which trigger the TLS ClientHello callback call.
Remove the calls to qc_conn_finalize() from from the TLS ClientHello callbacks.
Must be backported to 2.6. and 2.7.
(cherry picked from commit af25a69c8b0faa57a27dd5e9a99cbbe350ee6b07)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit 933a302947f615d4407bb327f8ded2dbc915f031)
Signed-off-by: Willy Tarreau <w@1wt.eu>
In ("BUG/MEDIUM: stats: Rely on a local trash buffer to dump the stats"),
we forgot to apply the patch in resolvers.c which provides the
stats_dump_resolvers() function that is involved when dumping with "resolvers"
domain.
As a consequence, resolvers dump was broken because stats_dump_one_line(),
which is used in stats_dump_resolv_to_buffer(), implicitely uses trash_chunk
from stats.c to prepare the dump, and stats_putchk() is then called with
global trash (currently empty) as output data.
Given that trash_dump variable is static and thus only available within stats.c
we change stats_putchk() function prototype so that the function does not take
the output buffer as an argument. Instead, stats_putchk() will implicitly use
the local trash_dump variable declared in stats.c.
It will also prevent further mixups between stats_dump_* functions and
stats_putchk().
This needs to be backported with ("BUG/MEDIUM: stats: Rely on a local trash
buffer to dump the stats")
(cherry picked from commit e5958d0292222c6cc122b1b19b79c959ec27370b)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit 1a8db96980c972b10805dfc8f8287a86c46d0d81)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Define a new qcc_app_ops callback named close(). This will be used to
notify app-layer about the closure of a stream by the remote peer. Its
main usage is to ensure that the closure is allowed by the application
protocol specification.
For the moment, close is not implemented by H3 layer. However, this
function will be mandatory to properly reject a STOP_SENDING on the
control stream and preventing a later crash. As such, this commit must
be backported with the next one on 2.6.
This is related to github issue #2006.
(cherry picked from commit 1e340ba6bc0f747bf94e14c91f0351a9a0d7cf03)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit b403127cdb6fbac47dbf16c0587166337d2531b3)
Signed-off-by: Willy Tarreau <w@1wt.eu>
This function was introduced in OpenSSL 1.1.0. Prior to that, the
ECDSA_SIG structure was public.
This function was used in commit 5a8f02ae "BUG/MEDIUM: jwt: Properly
process ecdsa signatures (concatenated R and S params)".
This patch needs to be backported up to branch 2.5 alongside commit
5a8f02ae.
(cherry picked from commit bb35e1f5aa69f55f588e060fee6098db37abe94c)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit a1958aad1de4e8df9504aefeae0f1b4631768ac9)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
There is a bug in b_slow_realign() function when wrapping output data are
copied in the swap buffer. block1 and block2 sizes are inverted. Thus blocks
with a wrong size are copied. It leads to data mixin if the first block is
in reality larger than the second one or to a copy of data outside the
buffer is the first block is smaller than the second one.
The bug was introduced when the buffer API was refactored in 1.9. It was
found by a code review and seems never to have been triggered in almost 5
years. However, we cannot exclude it is responsible of some unresolved bugs.
This patch should fix issue #1978. It must be backported as far as 2.0.
(cherry picked from commit 61aded057dafc419f62b9534d03e6c99a3405f7a)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit 4a048c13f5ec3bcd060c8af955fe51694400b69d)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
The same change was already performed for the cli. The stats applet and the
prometheus exporter are also concerned. Both use the stats API and rely on
pool functions to get total pool usage in bytes. pool_total_allocated() and
pool_total_used() must return 64 bits unsigned integer to avoid any wrapping
around 4G.
This may be backported to all versions.
(cherry picked from commit c960a3b60f5d05b82cdac2a33ab22ca465787e60)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit b174d82dff11d7fb67e9a7f53c20a658f23dd9e7)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
qcs instances for bidirectional streams are inserted in
<qcc.opening_list>. It is removed from the list once a full HTTP request
has been parsed. This is required to implement http-request timeout.
In case a stream is deleted before receiving full HTTP request, it also
must be removed from <qcc.opening_list>. This was not the case on first
implementation but has been fixed by the following patch :
641a65ff3cccd394eed49378c6ccdb8ba0a101a7
BUG/MINOR: mux-quic: remove qcs from opening-list on free
This means that now a stream can be deleted from the list in two
different functions. Sadly, as LIST_DELETE was used in both cases,
nothing prevented a double-deletion from the list, even though
LIST_INLIST was used. Both calls are replaced with LIST_DEL_INIT which
is idempotent.
This bug causes memory corruption which results in most cases in a
segfault, most of times outside of mux-quic code itself. It has been
found first by gabrieltz who reported it on the github issue #1903. Big
thanks to him for his testing.
This bug also causes failures on several 'M' transfer testcase of QUIC
interop-runner. The s2n-quic client is particularly useful in this case
as segfaults triggers were most of the times on the LIST_DELETE
operation itself. This is probably due to its encapsulating of HEADERS
frame with fin bit delayed in a following empty STREAM frame.
This must be backported wherever the above patch is, up to 2.6.
(cherry picked from commit 15337fd8085288ac10de66bb048c8d655fbb0f25)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit 151737fa818ffb37c8eb1706ef16722b6dd68f8b)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Performance profiling on a 48-thread machine showed a lot of time spent
in pool_free(), precisely at the point where pool->limit was retrieved.
And the reason is simple. Some parts of the pool_head are heavily updated
only when facing a cache miss ("allocated", "used", "needed_avg"), while
others are always accessed (limit, flags, size). The fact that both
entries were stored into the same cache line makes it very difficult for
each thread to access these precious info even when working with its own
cache.
By just splitting the fields apart, a test on QUIC (which stresses pools
a lot) more than doubled performance from 42 Gbps to 96 Gbps!
Given that the patch only reorders fields and addresses such a significant
contention, it should be backported to 2.7 and 2.6.
(cherry picked from commit 4dd33d9c322d3be167c3d672aebf6108f4f7889b)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit 7d1b6977199fb663f39c928f3f159fd078d1b30d)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Add a new value in stats ctx: field.
Implement field support in line dumping parent functions
stats_print_proxy_field_json() and stats_dump_proxy_to_buffer().
This will allow child dumping functions to support partial line dumping
when needed. ie: when dumping buffer is exhausted: do a partial send and
wait for a new buffer to finish the dump. Thanks to field ctx, the function
can start dumping where it left off on previous (unterminated) invokation.
(cherry picked from commit 559418419048426faaa216f8bb9ad254f0052d4f)
Signed-off-by: William Lallemand <wlallemand@haproxy.org>
(cherry picked from commit 84f6ea521b4f92779b15d5cd4de6539462dba54a)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
qc_new_conn() is used to allocate a quic_conn instance and its various
internal members. If one allocation fails, quic_conn_release() is used
to cleanup things.
For the moment, pool_zalloc() is used which ensures that all content is
null. However, some members must be initialized to a special values
to be able to use quic_conn_release() safely. This is the case for
quic_conn lists and its tasklet.
Also, some quic_conn internal allocation functions were doing their own
cleanup on failure without reset to NULL. This caused an issue with
quic_conn_release() which also frees this members. To fix this, these
functions now only return an error without cleanup. It is the caller
responsibility to free the allocated content, which is done via
quic_conn_release().
Without this patch, allocation failure in qc_new_conn() would often
result in segfault. This was reproduced easily using fail-alloc at 10%.
This should be backported up to 2.6.
(cherry picked from commit dbf6ad470b3206f64254141e7cf80a980261be29)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit d35d46916d8ff53b13c08862297f49b5d881d738)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Extract function h2_parse_cont_len_header() in the generic HTTP module.
This allows to reuse it for all HTTP/x parsers. The function is now
available as http_parse_cont_len_header().
Most notably, this will be reused in the next bugfix for the H3 parser.
This is necessary to check that content-length header match the length
of DATA frames.
Thus, it must be backported to 2.6.
(cherry picked from commit 15f3cc4b389d1e92f7d537a2321ad027cf3b5a15)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 76d3becee5c10aacabb5cb26b6776c00ca5b9ae6)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
This patch introduces haproxy_backend_agg_check_status metric
as we wanted in 42d7c402d but with the right data source.
This patch could be backported as far as 2.4.
(cherry picked from commit e06e31ea3b62ef8ccb911ac3969ae70f7bbb7574)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit f0319e0f56581873f906f79dc218bf6f10b8f6c2)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
haproxy_backend_agg_server_check_status currently aggregates
haproxy_server_status instead of haproxy_server_check_status.
We deprecate this and create a new one,
haproxy_backend_agg_server_status to clarify what it really
does.
This patch could be backported as far as 2.4.
(cherry picked from commit 7d6644e689f15b329789a355ea2812ea0223fe4f)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 2c0d7982e7612b2e7157170aa7109f20b780bb64)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Coverity raised a potential overflow issue in these new functions that
work on unsigned long long objects. They were added in commit 9b25982
"BUG/MEDIUM: ssl: Verify error codes can exceed 63".
This patch needs to be backported alongside 9b25982.
(cherry picked from commit e239e4938d89956e7820be4a0f26e782a86bcf6d)
Signed-off-by: William Lallemand <wlallemand@haproxy.org>
The CRT and CA verify error codes were stored in 6 bits each in the
xprt_st field of the ssl_sock_ctx meaning that only error code up to 63
could be stored. Likewise, the ca-ignore-err and crt-ignore-err options
relied on two unsigned long longs that were used as bitfields for all
the ignored error codes. On the latest OpenSSL1.1.1 and with OpenSSLv3
and newer, verify errors have exceeded this value so these two storages
must be increased. The error codes will now be stored on 7 bits each and
the ignore-err bitfields are replaced by a big enough array and
dedicated bit get and set functions.
It can be backported on all stable branches.
[wla: let it be tested a little while before backport]
Signed-off-by: William Lallemand <wlallemand@haproxy.org>
(cherry picked from commit 9b25982716f0416c28f8fc894c58eb40885cf9e5)
Signed-off-by: William Lallemand <wlallemand@haproxy.org>
peers-t.h uses "struct stktable" as well as STKTABLE_DATA_TYPES which
are defined in stick-table-t.h. It works by accident because
stick-table-t.h was always included before. But could provoke build
issue with EXTRA code.
To be backported as far as 2.2.
(cherry picked from commit 46bea1c6163731a45749e4429fbd1294441a7c68)
Signed-off-by: William Lallemand <wlallemand@haproxy.org>
(cherry picked from commit 5c89a0c0484b706cfa10398be8539f39c7b311e9)
Signed-off-by: William Lallemand <wlallemand@haproxy.org>
ncbuf API relies on lot of small functions. Mark these functions as
inline to reduce call invocations and facilitate compiler optimizations
to reduce code size.
This should be backported up to 2.6.
(cherry picked from commit d64a26f0238f386065e26654e6a8a925f96c8baa)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
With an OpenSSL library which use the wrong OPENSSLDIR, HAProxy tries to
load the OPENSSLDIR/certs/ into @system-ca, but emits a warning when it
can't.
This patch fixes the issue by allowing to shut the error when the SSL
configuration for the httpclient is not explicit.
Must be backported in 2.6.
(cherry picked from commit 0a2d63236c4ada9a33f7e9495aa332fdcd9f5f82)
[wla: context changed in httpclient_precheck()]
Signed-off-by: William Lallemand <wlallemand@haproxy.org>
In 2.2, some idle conns usage metrics were added by commit cf612a045
("MINOR: servers: Add a counter for the number of currently used
connections."), which mentioned that the operation doesn't need to be
atomic since we're not seeking exact values. This is true but at least
we should use atomic stores to make sure not to cause invalid values
to appear on archs that wouldn't guarantee atomicity when writing an
int, such as writing two 16-bit words. This is pretty unlikely on our
targets but better keep the code safe against this.
This may be backported as far as 2.2.
(cherry picked from commit 9dc231a6b23fc7d5cf3c233b46e00b9e251325b4)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
This previous patch was not sufficient to prevent haproxy from
crashing when some Handshake packets had to be inspected before being possibly
retransmitted:
"BUG/MAJOR: quic: Crash upon retransmission of dgrams with several packets"
This patch introduced another issue: access to packets which have been
released because still attached to others (in the same datagram). This was
the case for instance when discarding the Initial packet number space before
inspecting an Handshake packet in the same datagram through its ->prev or
member in our case.
This patch implements quic_tx_packet_dgram_detach() which detaches a packet
from the adjacent ones in the same datagram to be called when ackwowledging
a packet (as done in the previous commit) and when releasing its memory. This
was, we are sure the released packets will not be accessed during retransmissions.
Thank you to @gabrieltz for having reported this issue in GH #1903.
Must be backported to 2.6.
(cherry picked from commit 74b5f7b31b6c68e220e68cb4a0b302137a9a7362)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
As revealed by some traces provided by @gabrieltz in GH #1903 issue,
there are clients (chrome I guess) which acknowledge only one packet among others
in the same datagram. This is the case for the first datagram sent by a QUIC haproxy
listener made an Initial packet followed by an Handshake one. In this identified
case, this is the Handshake packet only which is acknowledged. But if the
client is able to respond with an Handshake packet (ACK frame) this is because
it has successfully parsed the Initial packet. So, why not also acknowledging it?
AFAIK, this is mandatory. On our side, when restransmitting this datagram, the
Handshake packet was accessed from the Initial packet after having being released.
Anyway. There is an issue on our side. Obviously, we must not expect an
implementation to respect the RFC especially when it want to build an attack ;)
With this simple patch for each TX packet we send, we also set the previous one
in addition to the next one. When a packet is acknowledged, we detach the next one
and the next one in the same datagram from this packet, so that it cannot be
resent when resending these packets (the previous one, in our case).
Thank you to @gabrieltz for having reported this issue.
Must be backported to 2.6.
(cherry picked from commit 814645f42fae3e6dea994f88aa3b67cf43958dcf)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Subscribing was not properly designed between quic-conn and quic MUX
layers. Align this as with in other haproxy components : <subs> field is
moved from the MUX to the quic-conn structure. All mention of qcc MUX is
cleaned up in quic_conn_subscribe()/quic_conn_unsubscribe().
Thanks to this change, ACK reception notification has been simplified.
It's now unnecessary to check for the MUX existence before waking it.
Instead, if <subs> quic-conn field is set, just wake-up the upper layer
tasklet without mentionning MUX. This should probably be extended to
other part in quic-conn code.
This should be backported up to 2.6.
(cherry picked from commit bbb1c68508ceebb98ac4234c906a65a42596e6ea)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
This patch complete the previous incomplete commit. The new counter
sendto_err_unknown is now displayed on stats page/CLI show stats.
This is related to github issue #1903.
This should be backported up to 2.6.
(cherry picked from commit 7941ead3aa00c9e83fadf70a1d6d515d20421ad0)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Remove ABORT_NOW() statement on unhandled sendto error. Instead use a
dedicated counter sendto_err_unknown to report these cases.
If we detect increment of this counter, strace can be used to detect
errno value :
$ strace -p $(pidof haproxy) -f -e trace=sendto -Z
This should be backported up to 2.6.
This should help to debug github issue #1903.
(cherry picked from commit 1d9f170eddd8703ba550e91322298e88e8280075)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Max stream data was not enforced and respect for local/remote uni
streams. Previously, qcs instances incorrectly reused the limit defined
from bidirectional ones.
This is now fixed. Two fields are added in qcc structure connection :
* value for local flow control to enforce on remote uni streams
* value for remote flow control to respect on local uni streams
These two values can be reused to properly initialized msd field of a
qcs instance in qcs_new(). The rest of the code is similar.
This must be backported up to 2.6.
(cherry picked from commit 176174f7e4734ca8d7a27a622be44ec386d36f4c)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
When the lua task finished before the httpclient that are associated to
it, there is a risk that the httpclient try to task_wakeup() the lua
task which does not exist anymore.
To fix this issue the httpclient used in a lua task are stored in a
list, and the httpclient are destroyed at the end of the lua task.
Must be backported in 2.5 and 2.6.
(cherry picked from commit bb581423b3ba48dfafb53b70205483f766242a6b)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Received packets treatment has some difference regarding if this is the
first one or not of the encapsulating datagram. Previously, this was set
via a function argument. Simplify this by defining a new Rx packet flag
named QUIC_FL_RX_PACKET_DGRAM_FIRST.
This change does not have functional impact. It will simplify API when
qc_lstnr_pkt_rcv() is broken into several functions : their number of
arguments will be reduced thanks to this patch.
This should be backported up to 2.6.
(cherry picked from commit deb7c87f5525d5645dd7c94fb187603edbb8d27a)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
pn_offset field was only set if header protection cannot be removed.
Extend the usage of this field : it is now set everytime on packet
parsing in qc_lstnr_pkt_rcv().
This change helps to clean up API of Rx functions by removing
unnecessary variables and function argument.
This change has no functional impact. It is a part of a refactoring
series on qc_lstnr_pkt_rcv(). The objective is facilitate integration of
FD-owned socket patches.
This should be backported up to 2.6.
(cherry picked from commit 845169da584655dedce3286e7e0011fab3f10507)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Add a new field version on quic_rx_packet structure. This is set on
header parsing in qc_lstnr_pkt_rcv() function.
This change has no functional impact. It is a part of a refactoring
series on qc_lstnr_pkt_rcv(). The objective is facilitate integration of
FD-owned socket patches.
This should be backported up to 2.6.
(cherry picked from commit 0eae57273b3a2b585ddc19d6f97cdc05f2203f0b)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>