haproxy

Author	SHA1	Message	Date
Frederic Lecaille	92b811d520	MINOR: quic: Token for future connections implementation. There exist two sorts of tokens used by QUIC. They are both used to validate the peer address (path validation). Retry tokens are used for the current connection the client wants to open. This patch implements the other sort of tokens which, after having been received from a connection, may be provided for the next connection from the same IP address to validate it (or validate the network path between the client and the server). The token generation is implemented by quic_generate_token(), and the token validation by quic_token_chek(). The same method is used as for Retry tokens to build such tokens to be reused for future connections. The format is very simple: one byte for the format identifier to distinguish these new tokens from the Retry tokens, followed by a 32-bit timestamp. As this part is ciphered with AEAD as cryptographic algorithm, 16 bytes are needed for the AEAD tag. 16 more random bytes are added to this token as a salt to derive the AEAD secret used to cipher the token. In addition to this salt, the client IP address is also used as AAD to derive the AEAD secret. So, the length of the token is fixed: 37 bytes. (cherry picked from commit f5b09dc452f582eb876527fd28103bc29c51afad) [fl: very minor Makefile modif to correctly add quic_token.o object to be compiled] Signed-off-by: Frederic Lecaille <flecaille@haproxy.com>	2024-09-05 16:17:27 +02:00
William Lallemand	0c7265509c	MEDIUM: ssl/quic: implement quic crypto with EVP_AEAD The QUIC crypto is using the EVP_CIPHER API in order to achieve authenticated encryption, this was the API which was used with OpenSSL. With libraries inspired by BoringSSL (libreSSL and AWS-LC), the AEAD algorithms are implemented using the EVP_AEAD API. This patch converts the call to the EVP_CIPHER API when called in the context of AEAD cryptography for QUIC. The patch defines some QUIC_AEAD macros that can be either EVP_CIPHER or EVP_AEAD depending on the library. This was mainly done for AWS-LC but this could be useful for other libraries. This should finally allow to use CHACHA20_POLY1305 with AWS-LC. This patch allows to use the following ciphers with the EVP_AEAD API: - TLS1_3_CK_AES_128_GCM_SHA256 - TLS1_3_CK_AES_256_GCM_SHA384 AWS-LC does not implement TLS1_3_CK_AES_128_CCM_SHA256 and TLS1_3_CK_CHACHA20_POLY1305_SHA256 requires some hack for headers protection which will come in another patch. (cherry picked from commit 31c831e29b432f0a9958be63948e8f4cb278e9f8) [fl: required to support NEW_TOKEN which depends on QUIC_AEAD_* definitions] Signed-off-by: Frederic Lecaille <flecaille@haproxy.com>	2024-09-05 16:15:15 +02:00
Frederic Lecaille	d074491bd4	MINOR: quic: Implement quic_tls_derive_token_secret(). This function is similar to quic_tls_derive_retry_token_secret(). Its aim is to derive the secret used to cipher the token to be used for future connections. This patch renames quic_tls_derive_retry_token_secret() and reuses its code to produce a more generic one: quic_do_tls_derive_token_secret(). Two arguments are added to the latter so that both quic_tls_derive_retry_token_secret() and the new quic_tls_derive_token_secret() function can be implemented as calls to quic_do_tls_derive_token_secret(). (cherry picked from commit 74caa0eece1cc3a8b35f1d34674ea5f357819314) Signed-off-by: Frederic Lecaille <flecaille@haproxy.com>	2024-09-05 16:13:00 +02:00
Frederic Lecaille	ee7ad6615d	MINOR: tools: Implement ipaddrcpy(). Implement ipaddrcpy() new function to copy only the IP address from a sockaddr_storage struct object into a buffer. (cherry picked from commit fb7a0922038932a6b82f1827a0214c5d2e8da32e) Signed-off-by: Frederic Lecaille <flecaille@haproxy.com>	2024-09-05 16:12:41 +02:00
Willy Tarreau	61d73137f1	BUG/MEDIUM: clock: also update the date offset on time jumps In GH issue #2704, @swimlessbird and @xanoxes reported problems handling time jumps. Indeed, since 2.7 with commit `4eaf85f5d9` ("MINOR: clock: do not update the global date too often") we refrain from updating the global offset in case it didn't change. But there's a catch: in case of a large time jump, if the poller was interrupted, the local time remains the same and we return immediately from there without updating the offset. It then becomes incorrect regarding the "date" value, and upon subsequent call to the poller, there's no way to detect a jump anymore so we apply the old, incorrect offset and the date becomes wrong. Worse, going back to the original time (then in the past), global_now_ns remains higher than the local time and neither get updated anymore. What is missing in practice is to immediately update the offset when detecting a time jump. In an ideal world, the offset would be updated upon every call, that's what was being done prior to commit above but it's extremely CPU intensive on large systems. However we can perfectly afford to update the offset every time we detect a time jump, as it's not as common. This needs to be backported as far as 2.8. Thanks to both participants above for providing very helpful details. (cherry picked from commit e8b1ad4c2b3985eb9e826fd279e419719a2c03ce) Signed-off-by: Willy Tarreau <w@1wt.eu>	2024-09-04 17:13:00 +02:00
Frederic Lecaille	5ad7493933	BUILD: quic: 32bits build broken by wrong integer conversions for printf() Since these commits the 32bits build is broken due to several errors as follows: CC src/quic_cli.o src/quic_cli.c: In function ‘dump_quic_full’: src/quic_cli.c:285:94: error: format ‘%ld’ expects argument of type ‘long int’, but argument 5 has type ‘uint64_t’ {aka ‘long long unsigned int’} [-Werror=format=] 285 \| chunk_appendf(&trash, " [initl] rx.ackrng=%-6zu tx.inflight=%-6zu(%ld%%)\n", \| ~~^ \| \| \| long int \| %lld 286 \| pktns->rx.arngs.sz, pktns->tx.in_flight, 287 \| pktns->tx.in_flight * 100 / qc->path->cwnd); \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ \| \| \| uint64_t {aka long long unsigned int} Replace several %ld by %llu with ull as printf conversion in quic_cli.c and a %ld by %lld with (long long) as printf conversion in quic_cc_cubic.c. Thank you to Ilya (@chipitsine) for having reported this issue in GH #2689. Must be backported to 3.0. (cherry picked from commit 414e3aa6bc80d66a448dc25d8e50f4e457dc8711) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-09-03 18:49:07 +02:00
Valentine Krasnobaeva	fa72729511	BUG/MINOR: cfgparse-global: remove tune.fast-forward from common_kw_list Remove tune.fast-forward from common_kw_list. It was replaced by 'tune.disable-fast-forward' and it's no longer present in "if..else if.." parser from cfg_parse_global(). Otherwise, it may be shown as the best-match keyword for some tune options, which is now wrong. Should be backported in versions 2.9 and 3.0. (cherry picked from commit 2e6e159ac47468b6b65e9678c9e4d1fe746165e8) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-09-03 18:49:07 +02:00
Nathan Wehrman	692c298a88	DOC: config: correct the table for option tcplog option tcplog was reported as functional in the backend section in error. This can be back ported as needed but it simply corrects that. (cherry picked from commit 9788ae1d19ea159f2a87a8ef0a02ff57a480b703) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-09-03 18:49:07 +02:00
Valentine Krasnobaeva	b4af0cff48	BUG/MINOR: pattern: pat_ref_set: return 0 if err was found pat_ref_set_elt() returns 0 if we run out of memory or can't parse a new map value. Any error message emitted by pat_ref_set_elt() is saved in the err buffer, if it's provided by the caller. These error messages are accumulated during the loop. pat_ref_set() is used to update the values in a map that refer to the same given key. If pat_ref_set_elt() fails during the update, let's return 0 to the caller immediately. We have the same non-unique key and the same new value in each loop. So it seems quite odd to accumulate the same error messages and print them in the CLI: > add map @1 mytest.map << + 1.0.1.11 TestA + 1.0.1.11 TESTA + 1.0.1.11 test_a + > set map mytest.map 1.0.1.11 15 unable to parse '15' unable to parse '15' unable to parse '15'. cli_parse_set_map(), which calls pat_ref_set() to update the map, will return only one error message with this patch: > set map mytest.map 1.0.1.11 15 unable to parse '15'. hlua_set_map() and http_action_set_map() don't provide an error buffer and will just exit on the first error. This should be backported in all stable versions. (cherry picked from commit 911f4d93d436a9067fb45168f70f79b5916489b0) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-09-03 18:49:07 +02:00
Valentine Krasnobaeva	bf82f46ef0	BUG/MINOR: pattern: pat_ref_set: fix UAF reported by coverity memprintf() performs a realloc and then updates the pointer to the output buffer where it has written the data. So free() is called on the previous buffer address, if it was provided. pat_ref_set_elt() uses memprintf() to write its error message, as does pat_ref_set(). So, when we re-enter the while loop the second time and pat_ref_set_elt() has returned, the err ptr (previous value of merr) is already freed by memprintf() from pat_ref_set_elt(). The 'if (!found)' condition is false at this point, because we've found a node at the first loop. So, the second memprintf(), in order to write error messages, does again free(*err). This should be backported in all stable versions. (cherry picked from commit 4f2493f3551f8c344485cccf075c3b15f9825181) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-09-03 18:49:07 +02:00
Amaury Denoyelle	b8523ffaf8	BUG/MINOR: h3: properly reject too long header responses When encoding HTX to HTTP/3 headers on the response path, a bunch of ABORT_NOW() were used when buffer room was not enough. In most cases this is safe as the output buffer has just been allocated and so is empty at the start of the function. However, with a header list longer than a whole buffer, this would cause an unexpected crash. Fix this by replacing the ABORT_NOW() statements with a proper error return path. For the moment, this would cause the whole connection to be closed rather than the stream only. This may be further improved in the future. Also remove ABORT_NOW() when encoding frame length at the end of headers or trailers encoding. Buffer room is sufficient as it was already checked prior in the same function. This should be backported up to 2.6. Special care should be taken however as this code path has changed frequently : * for 2.9 and older, the extra following statement must be inserted prior to each newly added goto statement : h3c->err = H3_INTERNAL_ERROR; * for 2.6, trailers support is not implemented. As such, related chunks should just be ignored when backporting. (cherry picked from commit 48514c118ce881e2ae20aeb1edda929663a8397d) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-09-03 18:49:07 +02:00
Valentine Krasnobaeva	317e918193	BUG/MINOR: proto_uxst: delete fd from fdtab if listen() fails This patch is done mostly as a safeguard in order not to trigger the BUG_ON(fdtab[fd].owner != NULL) check if listen() fails on a UNIX domain socket. In uxst_bind_listener(), pretty much the same logic of closing the socket on the error path was kept as it was in tcp_bind_listener() before. The use of fd_delete() was not generalized when the support of the UNIX sock_stream protocol was implemented. So, let's remove the fd from fdtab on failure, instead of closing it. Otherwise, uxst_bind_listener(), which could be called in a loop for each receiver, will obtain the same fd via socket() for the next receiver. Then, it will bind it again and it will try to re-insert it in fdtab. This can be backported to all stable versions. (cherry picked from commit eb8235869027fe6f472160febb6edb169f38d1ee) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-09-03 18:49:07 +02:00
Amaury Denoyelle	cf920ac69b	BUG/MINOR: mux-quic: do not send too big MAX_STREAMS ID QUIC stream IDs are expressed as QUIC variable-length integers, which cover the range from 0 to 2^62 - 1. As such, it is forbidden to send an ID in a MAX_STREAMS flow-control frame which would allow this value to be exceeded. This patch fixes MAX_STREAMS emission to ensure the sent value is valid. This also ensures that the peer cannot open a stream with an invalid ID as this would cause a flow-control violation instead. This must be backported up to 2.6. (cherry picked from commit f3c75a52df29247e5d502344127d42efb2c12b82) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-09-03 18:49:07 +02:00
William Lallemand	3ce79401b3	REGTESTS: mcli: test the pipelined commands on master CLI A recent fix broke the pipelined commands on the master CLI; this reg-test implements a simple test that allows checking their correct behavior. This could be backported as far as 2.6. (cherry picked from commit fe5ddcc4901e3b43e58f5cf903c500a69d091b57) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-09-03 18:49:07 +02:00
William Lallemand	a948603e99	BUG/MEDIUM: mworker/cli: fix pipelined modes on master CLI Since commit `3d93ecc` ("BUG/MAJOR: cli: Restore non-interactive mode behavior with pipelined commands") and commit `598c7f16` ("BUG/MEDIUM: cli: Warn if pipelined commands are delimited by a \n"), the pipelined commands on the master CLI are either broken or emit warnings depending on the version. The reason is that the modes applied on the master CLI are saved in the current CLI session, and then reinserted for each pipelined command; however, these commands were inserted as new lines. For example: "@1; expert-mode on; debug dev log foo; debug dev log bar" Would be sent as: "expert mode on\ndebug dev log foo" "expert mode on\ndebug dev log bar" This patch fixes the issue by using the new ci_insert() function which inserts a string instead of a newline, and the commands are now suffixed by ';' upon insertion, allowing a correct pipelined command chain. This must be backported with the previous commit introducing ci_insert() in every stable version. This is broken since the 3.0 version, but it emits a warning in every version below, because `598c7f164` was backported. (cherry picked from commit b75edf2f11486bc2f2cc92c2b5273219a5e728c6) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-09-03 18:49:07 +02:00
William Lallemand	dcfe6118b2	MINOR: channel: implement ci_insert() function ci_insert() is a function which allows to insert a string <str> of size <len> at <pos> of the input buffer. This is the equivalent of ci_insert_line2() but without inserting '\r\n' (cherry picked from commit b2a8e8731da82b8bbd9dfff6d5a0d71f25a5ee49) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-09-03 18:49:07 +02:00
Valentine Krasnobaeva	1038e1517a	BUG/MINOR: proto_tcp: keep error msg if listen() fails If listen() fails, we need to keep the message about it, which is copied then in errmsg buffer on the error path. This buffer is properly provided by the caller (protocol_bind_all()) and reallocated if needed in memprintf(), but it was deleted without being returned. This can be backported to all stable versions. (cherry picked from commit 81f48395b325b9875d215ec2743e75f7a56e1e5f) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-09-03 18:38:52 +02:00
Valentine Krasnobaeva	de397e2e97	BUG/MINOR: proto_tcp: delete fd from fdtab if listen() fails If listen() fails, fd should be deleted from fdtab, not just closed. Otherwise, sock_inet_bind_receiver(), which is called in loop for each receiver, will obtain the same fd via socket() for the next receiver, registered in the receivers list. Then, it will bind it again and it will try to re-insert it in fdtab, and fd_insert() will trigger the BUG_ON(fdtab[fd].owner != NULL) check. When tcp_bind_listener() code was implemented, the use of fd_delete() was not generalized and this one remained overlooked. This can be backported to all stable versions. (cherry picked from commit 308c6881c03b6302afd5cc48781d73a11ef994d4) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-09-03 18:38:48 +02:00
Willy Tarreau	be0d95414c	BUG/MINOR: quic/trace: make quic_conn_enc_level_init() emit NEW not CLOSE The event emitted by this trace was of type CLOSE instead of NEW, which would sometimes temporarily pause a started trace. This can be backported to 3.0, probably 2.6. (cherry picked from commit 6bf50dfccca992d7f05febb5819e57b601ef94c0) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-09-03 18:38:38 +02:00
Willy Tarreau	1c44929ed1	BUG/MINOR: trace/quic: make "qconn" selectable as a lockon criterion The test was performed but there's no way to set the option! Let's just add "qconn" to select the quic conn when the source supports it. This can be backported at least to 3.0, probably 2.6. (cherry picked from commit 7a22fbd453d7ef732b72c1d45d0e2c4f89b43fcb) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-09-03 18:38:32 +02:00
Willy Tarreau	830fcd3082	BUG/MINOR: trace: automatically start in waiting mode with "start <evt>" The doc clearly says that "start <evt>" should leave the trace in pause mode until the indicated event appears. However it's not what's happening, the state is not changed until one command uses "now", so it's typically needed to configure the events with "start <evt>" then enable the waiting mode using "pause now". This is counter-intuitive and does not match the doc, so let's fix it so that "start <evt>" switches from stopped to waiting as long as at least one event is enabled. This can be backported to all versions. (cherry picked from commit 0406efe9ad129e91f8a6c93b780064b3c27ccaa0) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-09-03 18:38:27 +02:00
Willy Tarreau	fe71ad89da	BUG/MEDIUM: trace: fix null deref in lockon mechanism since TRACE_ENABLED() When calling TRACE_ENABLED(), which is called by TRACE_PRINTF(), we pass a NULL plockptr to __trace_enabled(). This argument is used when lockon is active, and may update the pointer. This is an oversight which also broke the lockon mechanism because now for calls from __trace(), it dereferences a pointer pointing to NULL, and never updates it due to the broken condition, so that trace() never sets up src->lockon_ptr. The bug was introduced in 2.8 by commit `8f9a9704bb` ("MINOR: trace: add a TRACE_ENABLED() macro to determine if a trace is active"), so the fix must be backported there. (cherry picked from commit b5df6b5a31b86b4403f00b7e0230c97883eca0f3) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-09-03 18:38:22 +02:00
Willy Tarreau	ab1a247177	BUG/MINOR: trace/quic: permit to lock on frontend/connect/session etc These ones were not proposed in the list of trackable elements. Note that this depends on previous commit: BUG/MINOR: trace/quic: enable conn/session pointer recovery from quic_conn This should be backported to at least 3.0, maybe even 2.6. (cherry picked from commit 88a752ca789e6a2f863e16a35ddaa4fade5ccecd) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-09-03 18:38:17 +02:00
Willy Tarreau	ab921e1935	BUG/MINOR: trace/quic: enable conn/session pointer recovery from quic_conn In __trace_enabled(), a quic_conn was detected, but it was not possible to derive the connection nor the session from it, which was quite limiting in terms of ability to track a same instance. This should be backported to at least 3.0, maybe even 2.6. (cherry picked from commit aa1915a9f559724cb3fc2be39a2d928d99556e5a) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-09-03 18:38:12 +02:00
Willy Tarreau	762f3df4e4	DOC: configuration: fix alphabetical ordering of {bs,fs}.aborted These must be before {bs,fs}.id, not after. Should be backported wherever `068ce2d5d2` ("MINOR: stconn: Add samples to retrieve about stream aborts") is (normally 3.0). (cherry picked from commit b681a9e48813742850299fb5207766ac6f15007d) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-09-03 18:38:07 +02:00
Ilia Shipitsin	111e8590c1	BUG/MINOR: fcgi-app: handle a possible strdup() failure This defect was found by the coccinelle script "unchecked-strdup.cocci". It can be backported to 2.2. (cherry picked from commit aaaacaaf4b558f4e0206b98c787cdcef773b1d76) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-09-03 18:37:59 +02:00
Christopher Faulet	136fe7e3c1	BUG/MEDIUM: peer: Notify the applet won't consume data when it waits for sync When the peer applet is waiting for a synchronisation with the global sync task, we must notify that it won't consume data. Otherwise, if some data are already waiting in the input buffer, the applet will be woken up in a loop and this will trigger the watchdog. Once synchronized, the applet is woken up. In that case, the peer applet must indicate it is going to consume data again. This patch should fix the issue #2656. It must be backported to 3.0. (cherry picked from commit 78b8b6003082b54a24fa7b13be954f73248cc9d4) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-09-03 18:33:45 +02:00
Christopher Faulet	fef9d21bae	BUG/MEDIUM: mux-h2: Propagate term flags to SE on error in h2s_wake_one_stream When a stream is explicitly woken up by the H2 connection, if an error condition is detected, the corresponding error flag is set on the SE. So SE_FL_ERROR or SE_FL_ERR_PENDING, depending on whether the end of stream was reported or not. However, there is no attempt to propagate other termination flags. We must be sure to properly set SE_FL_EOI and SE_FL_EOS when appropriate to be able to switch a pending error to a fatal error. Because of this bug, the SE remains with a pending error and no end of stream, preventing the applicative stream from truly aborting it. It means that in some abort scenarios, it is possible to block a stream infinitely. This patch must be backported at least as far as 2.8. No bug was observed on older versions while the same code is in use. (cherry picked from commit 184f16ded7a0274bffe99a4795d0a27f8be7c006) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-09-03 18:33:40 +02:00
Christopher Faulet	cac1e39e86	BUG/MEDIUM: h2: Only report early HTX EOM for tunneled streams For regular H2 messages, the HTX EOM flag is synonymous with the end of input. So the SE_FL_EOI flag must also be set on the stream-endpoint descriptor. However, there is an exception. For tunneled streams, the end of message is reported on the HTX message just after the headers. But in that case, no end of input is reported on the SE. But here, there is a bug. The "early" EOM is also reported on the HTX messages when there is no payload (for instance a content-length set to 0). If there is no ES flag on the H2 HEADERS frame, it is an unexpected case. Because for the applicative stream and most probably for the opposite endpoint, the message is considered as finished. It is switched to its DONE state (or the equivalent on the endpoint). But, if an extra H2 frame with the ES flag is received, a TRAILERS frame or an empty DATA frame, an extra EOT HTX block is pushed to carry the HTX EOM flag. So an extra HTX block is emitted for a regular HTX message. It is totally invalid, it must never happen. Because it is an undefined behavior, it is difficult to predict the result. But it definitely prevents the applicative stream from properly handling aborts and errors because data remain blocked in the channel buffer. Indeed, the end of the message was seen, so no more data are forwarded. It seems to be an issue for 2.8 and upper. Harder to evaluate for older versions. This patch must be backported as far as 2.4. (cherry picked from commit 6743e128f34fba297f2cac836a4f11b84acd503a) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-09-03 18:33:34 +02:00
Christopher Faulet	aa43ed1719	BUG/MEDIUM: http-ana: Report error on write error waiting for the response When we are waiting for the server response, if an error is pending on the frontend side (a write error on client), it is handled as an abort and all regular response analyzers are removed, except the one responsible for releasing the filters, if any. However, while it is handled as an abort, the error is not reported, as usual, via the http_reply_and_close() function. It is an issue because in that case, the channel buffers are not reset. Because of this bug, it is possible to block a stream infinitely. The request side is waiting for the response side and the response side is blocked because filters must be released and this cannot be done because data remain blocked in the channel buffers. So, in that case, calling http_reply_and_close() with no message is enough to unblock the stream. This patch must be backported as far as 2.8. (cherry picked from commit 0ba6202796fe24099aeff89a5a4b83af99fc027b) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-09-03 18:33:26 +02:00
Amaury Denoyelle	0a06f5f5ae	BUG/MEDIUM: quic: prevent conn freeze on 0RTT undeciphered content Received QUIC packets are stored in quic_conn Rx buffer after header protection removal in qc_rx_pkt_handle(). These packets are then removed after quic_conn IO handler via qc_treat_rx_pkts(). If HP cannot be removed, packets are still copied into quic_conn Rx buffer. This can happen if encryption level TLS keys are not yet available. The packet remains in the buffer until HP can be removed and its content processed. An issue occurs if the client emits a 0-RTT packet but haproxy does not have the shared secret, for example after a haproxy process restart. In this case, the packet is copied into quic_conn Rx buffer but its HP won't ever be removed. This prevents the buffer from being purged. After some time, if the client has emitted enough packets, Rx buffer won't have any space left and received packets are dropped. This will cause the connection to freeze. To fix this, remove any 0-RTT buffered packets on handshake completion. At this stage, 0-RTT packets are no longer necessary. The client is expected to reemit its content in 1-RTT packets which are properly deciphered. This can easily be reproduced with HTTP/3 POST requests or by retrieving a big enough object, which will fill the Rx buffer with ACK frames. Here is a picoquic command to provoke the issue on haproxy startup : $ picoquicdemo -Q -v 00000001 -a h3 <hostname> 20443 "/?s=1g" Note that allow-0rtt must be present on the bind line to trigger the issue. Else haproxy will reject any 0-RTT packets. This must be backported up to 2.6. This could be one of the reasons for github issue #2549 but it's unsure for now. (cherry picked from commit bba6baff306820a01ea66fddde5acbad11c601b6) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-09-03 18:33:14 +02:00
William Lallemand	5072a968c3	BUG/MEDIUM: ssl: 0-RTT initialized at the wrong place for AWS-LC Revert patch `fcc8255` "MINOR: ssl_sock: Early data disabled during SSL_CTX switching (aws-lc)". The patch was done in the wrong callback which is never built for AWS-LC, and applies options on the SSL_CTX instead of the SSL, which should never be done elsewhere than in the configuration parsing. This was probably triggered by successfully linking haproxy against AWS-LC without using USE_OPENSSL_AWSLC. The patch also reintroduced SSL_CTX_set_early_data_enabled() in the ssl_quic_initial_ctx() and ssl_sock_initial_ctx(). So the initial_ctx does have the right setting, but it still needs to be applied to the selected SSL_CTX in the clienthello, because we need it on the selected SSL_CTX. Must be backported to 3.0. (ssl_clienthello.c part was in ssl_sock.c) (cherry picked from commit 1889b86561ee67696760111c6df5759c628430dc) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-09-03 18:33:00 +02:00
William Lallemand	5bf426baa4	BUG/MEDIUM: ssl: reactivate 0-RTT for AWS-LC Then reactivate HAVE_SSL_0RTT and HAVE_SSL_0RTT_QUIC for AWS-LC, which were wrongly deactivated in `f5353f2c` ("MINOR: ssl: add HAVE_SSL_0RTT constant"). Must be backported to 3.0. (cherry picked from commit 56eefd6827b42afcefed7cc41d2cc38f5c1a2172) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-09-03 18:31:27 +02:00
Willy Tarreau	01aeb7495f	BUG/MINOR: stconn: bs.id and fs.id had their dependencies incorrect The backend depends on the response and the frontend on the request, not the other way around. In addition, they used to depend on L6 (hence contents in the channel buffers) while they should only depend on L5 (permanent info known in the mux). This came in 2.9 with commit `24059615a7` ("MINOR: Add sample fetches to get the frontend and backend stream ID") so this can be backported there. (cherry picked from commit 61dd0156c82ea051779e6524cad403871c31fc5a) Signed-off-by: Willy Tarreau <w@1wt.eu> (cherry picked from commit 376b147ffffef0a1f898d72d1d70f10f07d2e5a4) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-09-03 18:31:22 +02:00
Christopher Faulet	40142d2b95	BUILD: mux-pt: Use the right name for the sedesc variable A typo was introduced in 760d26a86 ("BUG/MEDIUM: mux-pt/mux-h1: Release the pipe on connection error on sending path"). The sedesc variable is 'sd', not 'se'. This patch must be backported with the commit above. (cherry picked from commit d9f41b1d6e811022372ce541e67b047bd18630a9) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-09-03 18:31:16 +02:00
Christopher Faulet	7749062aba	BUG/MEDIUM: mux-pt/mux-h1: Release the pipe on connection error on sending path When data are sent using the kernel splicing, if a connection error occurred, the pipe must be released. Indeed, in that case, no more data can be sent and there is no reason to not release the pipe. But it is in fact an issue for the stream because the channel will appear as not empty. This may prevent the stream from being released. This happens on 2.8 when a filter is also attached to it. On 2.9 and upper, it seems there is no issue. But it is hard to be sure and the current patch remains valid in all cases. On 2.6 and lower, the code is not the same and, AFAIK, there is no issue. This patch must be backported to 2.8. However, on 2.8, there is no zero-copy data forwarding. The patch must be adapted. There are no done_ff/resume_ff callback functions for muxes. The pipe must be released in sc_conn_send() when an error flag is set on the SE, after the call to the snd_pipe callback function. (cherry picked from commit 760d26a8625f3af2b6939037a40f19b5f8063be1) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-09-03 18:31:09 +02:00
Christopher Faulet	e2a93b6492	BUG/MEDIUM: stconn: Report error on SC on send if a previous SE error was set When a send on a connection is performed, if a SE error (or a pending error) was already reported earlier, we leave immediately. No send is performed. However, we must be sure to report the error at the SC level if necessary. Indeed, the SE error may have been reported during the zero-copy data forwarding. So during receive on the opposite side. In that case, we may have missed the opportunity to report it at the SC level. The patch must be backported as far as 2.8. (cherry picked from commit 5dc45445ff18207dbacebf1f777e1f1abcd5065d) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-09-03 18:31:01 +02:00
Aurelien DARRAGON	b2dabc930c	BUG/MEDIUM: server/addr: fix tune.events.max-events-at-once event miss and leak An issue has been introduced with `cd99440` ("BUG/MAJOR: server/addr: fix a race during server addr:svc_port updates"). Indeed, in the above commit we implemented the atomic_sync task which is responsible for consuming pending server events to apply the changes atomically. For now only server's addr updates are concerned. To prevent the task from causing contention, a budget was assigned to it. It can be controlled with the global tunable 'tune.events.max-events-at-once': the task may not process more than this number of events at once. However, a bug was introduced with this budget logic: each time the task has to be interrupted because it runs out of budget, we reschedule the task to finish where it left off, but the current event which was already removed from the queue wasn't processed yet. This means that this pending event (each tune.events.max-events-at-once) is effectively lost. When the atomic_sync task deals with large number of concurrent events, this bug has 2 known consequences: first a server's addr/port update will be lost every 'tune.events.max-events-at-once'. This can of course cause reliability issues because if the event is not republished periodically, the server could stay in a stale state for indefinite amount of time. This is the case when the DNS server flaps for instance: some servers may not come back UP after the incident as described in GH #2666. Another issue is that the lost event was not cleaned up, resulting in a small memory leak. So in the end, it means that the bug is likely to cause more and more degradation over time until haproxy is restarted. As a workaround, 'tune.events.max-events-at-once' may be set to the maximum number of events expected per batch. 
Note however that this value cannot exceed 10 000, otherwise it could cause the watchdog to trigger due to the task being busy for too long and preventing other threads from making any progress. Setting higher values may not be optimal for common workloads so it should only be used to mitigate the bug while waiting for this fix. Since tune.events.max-events-at-once defaults to 100, this bug only affects configs that involve more than 100 servers whose addr:port properties are likely to be updated at the same time (batched updates from cli, lua, dns..) To fix the bug, we move the budget check after the current event is fully handled. For that we went from a basic 'while' to 'do..while' loop as we assume from the config that 'tune.events.max-events-at-once' cannot be 0. While at it, we reschedule the task once thread isolation ends (it was not required to perform the reschedule while under isolation) to give the hand back faster to waiting threads. This patch should be backported up to 2.9 with `cd99440`. It should fix GH #2666. (cherry picked from commit 8f1fd96d17588fb571959901bd20d4239b1a96af) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-09-03 16:39:52 +02:00
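The budget bug described in the commit above can be illustrated with a minimal C sketch. This is not HAProxy's code: `event_queue`, `pop_event()`, `run_buggy()` and `run_fixed()` are hypothetical names, and the queue is reduced to an array of integers. It only shows why checking the budget after popping an event but before handling it drops one event per interruption, and how a `do..while` loop that checks the budget after handling avoids that.

```c
#include <stddef.h>

/* Hypothetical, simplified event queue: not HAProxy's actual structures. */
struct event_queue {
    const int *items;   /* pending events */
    size_t len;         /* total number of events */
    size_t head;        /* index of the next event to pop */
};

/* Pop the next event; returns 0 when the queue is empty. */
static int pop_event(struct event_queue *q, int *ev)
{
    if (q->head >= q->len)
        return 0;
    *ev = q->items[q->head++];
    return 1;
}

/* Buggy shape: the budget is checked after the event has been popped,
 * so the event in hand is dropped when the budget is exhausted. */
static size_t run_buggy(struct event_queue *q, unsigned budget, size_t *lost)
{
    size_t handled = 0;
    int ev;

    while (pop_event(q, &ev)) {
        if (budget == 0) {
            (*lost)++;      /* popped but never handled */
            break;
        }
        budget--;
        handled++;          /* stands in for the real event handling */
    }
    return handled;
}

/* Fixed shape: the budget is checked only once the current event is
 * fully handled. The budget is assumed to be at least 1. */
static size_t run_fixed(struct event_queue *q, unsigned budget)
{
    size_t handled = 0;
    int ev;

    do {
        if (!pop_event(q, &ev))
            break;
        handled++;          /* stands in for the real event handling */
    } while (--budget);
    return handled;
}
```

With five pending events and a budget of 2, one pass of `run_buggy()` handles two events and consumes three, while one pass of `run_fixed()` handles two and consumes two, leaving the rest for the next run.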
Willy Tarreau	7a59afa93b	[RELEASE] Released version 3.0.4 Released version 3.0.4 with the following main changes : - MINOR: proto: extend connection thread rebind API - BUILD: listener: silence a build warning about unused value without threads - BUG/MEDIUM: quic: prevent crash on accept queue full - CLEANUP: proto: rename TID affinity callbacks - CLEANUP: quic: rename TID affinity elements - BUG/MINOR: session: Eval L4/L5 rules defined in the default section - BUG/MEDIUM: debug/cli: fix "show threads" crashing with low thread counts - DOC: install: don't reference removed CPU arg - BUG/MEDIUM: ssl_sock: fix deadlock in ssl_sock_load_ocsp() on error path - BUG/MAJOR: mux-h2: force a hard error upon short read with pending error - DOC: configuration: issuers-chain-path not compatible with OCSP - DOC: config: improve the http-keep-alive section - BUG/MINOR: stick-table: fix crash for src_inc_gpc() without stkcounter - BUG/MINOR: server: Don't warn fallback IP is used during init-addr resolution - BUG/MINOR: cli: Atomically inc the global request counter between CLI commands - BUG/MINOR: quic: Non optimal first datagram. - MEDIUM: sink: don't set NOLINGER flag on the outgoing stream interface - BUG/MINOR: quic: Lack of precision when computing K (cubic only cc) - BUG/MEDIUM: jwt: Clear SSL error queue on error when checking the signature - MINOR: quic: Dump TX in flight bytes vs window values ratio. - MINOR: quic: Add information to "show quic" for CUBIC cc. 
- MEDIUM: h1: allow to preserve keep-alive on T-E + C-L - MINOR: queue: add a function to check for TOCTOU after queueing - BUG/MEDIUM: queue: deal with a rare TOCTOU in assign_server_and_queue() - MEDIUM: init: set default for fd_hard_limit via DEFAULT_MAXFD (take #2) - BUG/MEDIUM: init: fix fd_hard_limit default in compute_ideal_maxconn - Revert "MEDIUM: sink: don't set NOLINGER flag on the outgoing stream interface" - MEDIUM: log: relax some checks and emit diag warnings instead in lf_expr_postcheck() - DOC: quic: fix default minimal value for max window size - MINOR: proxy: Add support of 429-Too-Many-Requests in retry-on status - BUG/MEDIUM: mux-h2: Set ES flag when necessary on 0-copy data forwarding - BUG/MEDIUM: stream: Prevent mux upgrades if client connection is no longer ready - BUG/MINIR: proxy: Match on 429 status when trying to perform a L7 retry - BUG/MEDIUM: mux-pt: Never fully close the connection on shutdown - BUG/MEDIUM: cli: Always release back endpoint between two commands on the mcli - BUG/MINOR: quic: unexploited retransmission cases for Initial pktns. - BUG/MEDIUM: mux-h1: Properly handle empty message when an error is triggered - MINOR: mux-h2: try to clear DEM_MROOM and MUX_MFULL at more places - BUG/MAJOR: mux-h2: always clear MUX_MFULL and DEM_MROOM when clearing the mbuf - BUG/MINOR: quic: Too shord datagram during O-RTT handshakes (aws-lc only) - BUG/MINOR: Crash on O-RTT RX packet after dropping Initial pktns - BUG/MEDIUM: mux-pt: Fix condition to perform a shutdown for writes in mux_pt_shut()	2024-09-03 15:37:09 +02:00
Christopher Faulet	710f2389d4	BUG/MEDIUM: mux-pt: Fix condition to perform a shutdown for writes in mux_pt_shut() A regression was introduced in the commit 76fa71f7a ("BUG/MEDIUM: mux-pt: Never fully close the connection on shutdown") because of a typo on the connection flags. CO_FL_SOCK_WR_SH flag must be tested to prevent a call to conn_sock_shutw() and not CO_FL_SOCK_RD_SH. Concretely, most of the time, it is harmless because shutdown for writes is always performed before any shutdown for reads. Except in the case described by the commit above. But it is not clear if it has an impact or not. This patch must be backported with the commit above, so as far as 2.9. (cherry picked from commit e1cae428791abf4e4fdf3969761eaaafd45df636) Signed-off-by: Willy Tarreau <w@1wt.eu>	2024-09-03 15:31:58 +02:00
Frederic Lecaille	2f7ae07342	BUG/MINOR: Crash on O-RTT RX packet after dropping Initial pktns This bug arrived with this naive commit: BUG/MINOR: quic: Too shord datagram during O-RTT handshakes (aws-lc only) which failed to consider the case where the Initial packet number space could be discarded before receiving 0-RTT packets. To fix this, append/insert the 0-RTT (early-data) packet number space into the encryption level list depending on the presence or not of the Initial packet number space. This issue was revealed when using aws-lc as TLS stack in GH #2701 issue. Thank you to @Tristan971 for having reported this issue. Must be backported where the commit mentioned above is supposed to be backported: as far as 2.9. (cherry picked from commit 7e19432fd41e9c0146f0227b43d0dd3dc740e20b) Signed-off-by: Willy Tarreau <w@1wt.eu>	2024-09-03 15:29:32 +02:00
Frederic Lecaille	19d9009a91	BUG/MINOR: quic: Too shord datagram during O-RTT handshakes (aws-lc only) By "aws-lc only", one means that this bug was first revealed by aws-lc stack. This does not mean it will not appear for new versions of other TLS stacks which have never revealed this bug. This bug was reported by Ilya (@chipitsine) in GH #2657 where some QUIC interop tests (resumption, zerortt) could lead to crash with haproxy compiled against aws-lc TLS stack. These crashes were triggered by this BUG_ON() which detects that too short datagrams with at least one ack-eliciting Initial packet inside could be built. <0>2024-07-31T15:13:42.562717+02:00 [01\|quic\|5\|quic_tx.c:739] qc_prep_pkts(): next encryption level : qc@0x61d000041080 idle_timer_task@0x60d000006b80 flags=0x6000058 FATAL: bug condition "first_pkt->type == QUIC_PACKET_TYPE_INITIAL && (first_pkt->flags & (1UL << 0)) && length < 1200" matched at src/quic_tx.c:163 call trace(12): \| 0x563ea447bc02 [ba d9 00 00 00 48 8d 35]: main-0x1958ce \| 0x563ea4482703 [e9 73 fe ff ff ba 03 00]: qc_send+0x17e4/0x1b5d \| 0x563ea4488ab4 [85 c0 0f 85 00 f6 ff ff]: quic_conn_io_cb+0xab1/0xf1c \| 0x563ea468e6f9 [48 c7 c0 f8 55 ff ff 64]: run_tasks_from_lists+0x173/0x9c2 \| 0x563ea468f24a [8b 7d a0 29 c7 85 ff 0f]: process_runnable_tasks+0x302/0x6e6 \| 0x563ea4610893 [83 3d aa 65 44 00 01 0f]: run_poll_loop+0x6e/0x57b \| 0x563ea4611043 [48 8b 1d 46 c7 1d 00 48]: main-0x48d \| 0x7f64d05fb609 [64 48 89 04 25 30 06 00]: libpthread:+0x8609 \| 0x7f64d0520353 [48 89 c7 b8 3c 00 00 00]: libc:clone+0x43/0x5e That said everything was correctly done by qc_prep_pkts() to prevent such a case. But this relied on the hypothesis that the list of encryption levels it used was always built in the same order as follows for 0-RTT sessions: initial, early-data, handshake, application But this order is determined by the order the TLS stack derives the secrets for these encryption levels. 
For aws-lc, this order is not the same but as follows: initial, handshake, application, early-data During 0-RTT sessions, the server may have to build three ack-eliciting packets (with CRYPTO data inside) to reply to the first client packet: initial, handshake, application. qc_prep_pkts() adds a PADDING frame to the last built packet for the last encryption level in the list. But after application level encryption, there is early-data encryption level. This prevented qc_prep_pkts() from building a padded application level last packet to send a 1200-bytes datagram. To fix this, always insert early-data encryption level after the initial encryption level into the encryption levels list when initializing this encryption level from quic_conn_enc_level_init(). Must be backported as far as 2.9. (cherry picked from commit e12620a8a909805755ae1e8b8552c187202a9f3f) Signed-off-by: Willy Tarreau <w@1wt.eu>	2024-09-03 15:29:25 +02:00
Willy Tarreau	c725db17e8	BUG/MAJOR: mux-h2: always clear MUX_MFULL and DEM_MROOM when clearing the mbuf There exists an extremely tricky code path that was revealed in 3.0 by the glitches feature, though it might theoretically have existed before. TL;DR: a mux mbuf may be full after successfully sending GOAWAY, and discard its remaining contents without clearing H2_CF_MUX_MFULL and H2_CF_DEM_MROOM, then endlessly loop in h2_send(), until the watchdog takes care of it. What can happen is the following: Some data are received, h2_io_cb() is called. h2_recv() is called to receive the incoming data. Then h2_process() is called and in turn calls h2_process_demux() to process input data. At some point, a glitch limit is reached and h2c_error() is called to close the connection. The input frame was incomplete, so some data are left in the demux buffer. Then h2_send() is called, which in turn calls h2_process_mux(), which manages to queue the GOAWAY frame, turning the state to H2_CS_ERROR2. The frame is sent, and h2_process() calls h2_send() a last time (doing nothing) and leaves. The streams are all woken up to notify about the error. Multiple backend streams were waiting to be scheduled and are woken up in turn, before their parents being notified, and communicate with the h2 mux in zero-copy-forward mode, request a buffer via h2_nego_ff(), fill it, and commit it with h2_done_ff(). At some point the mux's output buffer is full, and gets flags H2_CF_MUX_MFULL. The io_cb is called again to process more incoming data. h2_send() isn't called (polled) or does nothing (e.g. TCP socket buffers full). h2_recv() may or may not do anything (doesn't matter). h2_process() is called since some data remain in the demux buf. It goes till the end, where it finds st0 == H2_CS_ERROR2 and clears the mbuf. We're now in a situation where the mbuf is empty and MFULL is still present. 
Then it calls h2_send(), which doesn't call h2_process_mux() due to MFULL, doesn't enter the for() loop since all buffers are empty, then keeps sent=0, which doesn't allow to clear the MFULL flag, and since "done" was not reset, it loops forever there. Note that the glitches make the issue more reproducible but theoretically it could happen with any other GOAWAY (e.g. PROTOCOL_ERROR). What makes it not happen with the data produced on the parsing side is that we process a single buffer of input at once, and there's no way to amplify this to 30 buffers of responses (RST_STREAM, GOAWAY, SETTINGS ACK, WINDOW_UPDATE, PING ACK etc are all quite small), and since the mbuf is cleared upon every exit from h2_process() once the error was sent, it is not possible to accumulate response data across multiple calls. And the regular h2_snd_buf() path checks for st0 >= H2_CS_ERROR so it will not produce any data there either. Probably that h2_nego_ff() should check for H2_CS_ERROR before accepting to deliver a buffer, but this needs to be carefully studied. In the mean time the real problem is that the MFULL flag was kept when clearing the buffer, making the two inconsistent. Since it doesn't seem possible to trigger this sequence without the zero-copy-forward mechanism, this fix needs to be backported as far as 2.9, along with previous commit "MINOR: mux-h2: try to clear DEM_MROOM and MUX_MFULL at more places" which will strengthen the consistency between these checks. Many thanks to Annika Wickert for her detailed report that allowed to diagnose this problem. CVE-2024-45506 was assigned to this problem. (cherry picked from commit 830e50561c6636be4ada175d03e8df992abbbdcd) Signed-off-by: Willy Tarreau <w@1wt.eu>	2024-09-03 14:59:09 +02:00
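The inconsistency at the heart of the commit above (an empty output buffer that is still flagged as full) can be reduced to a small C sketch. All names here (`sk_conn`, `SK_MUX_MFULL`, and the three helpers) are hypothetical and only model the idea; they are not the H2 mux's actual structures or flag values.

```c
#include <stddef.h>

/* Hypothetical flag and connection type, for illustration only. */
#define SK_MUX_MFULL 0x1u

struct sk_conn {
    unsigned flags;     /* blocking flags */
    size_t mbuf_len;    /* bytes pending in the mux output buffer */
};

/* Inconsistent variant: the buffer is emptied, the "full" flag stays set. */
static void sk_clear_mbuf_only(struct sk_conn *c)
{
    c->mbuf_len = 0;
}

/* Consistent variant: the flag is dropped together with the data. */
static void sk_clear_mbuf_and_flags(struct sk_conn *c)
{
    c->mbuf_len = 0;
    c->flags &= ~SK_MUX_MFULL;
}

/* Invariant restored by the fix: "full" implies there is data to send. */
static int sk_is_consistent(const struct sk_conn *c)
{
    return !(c->flags & SK_MUX_MFULL) || c->mbuf_len > 0;
}
```

A sender that refuses to produce data while the "full" flag is set, and only clears the flag after sending something, can never make progress from the inconsistent state, which matches the endless loop described in the commit message.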
Willy Tarreau	d636e51545	MINOR: mux-h2: try to clear DEM_MROOM and MUX_MFULL at more places The code leading to H2_CF_MUX_MFULL and H2_CF_DEM_MROOM being cleared is quite complex and assumptions about its state are extremely difficult when reading the code. There are indeed long sequences where the mux might possibly be empty, still having the flag set until it reaches h2_send() which will clear it after the last send. Even then it's not obvious whether it's always guaranteed to release the flag when invoked in multiple passes. Let's just simplify the condition so that h2_send() does not depend on "sent" anymore and that h2_timeout_task() doesn't leave the flags set on the buffer on emptiness. While it doesn't seem to fix anything, it will make the code more robust against future changes. (cherry picked from commit e9cdedb39b2020a7eb1ae5d8462b391d4301fb93) Signed-off-by: Willy Tarreau <w@1wt.eu>	2024-09-03 14:59:09 +02:00
Christopher Faulet	dfefb9953e	BUG/MEDIUM: mux-h1: Properly handle empty message when an error is triggered When a 400/408/500/501 error is returned by the H1 multiplexer, we first try to get the error message of the proxy before using the default one. This may be configured to be mapped on /dev/null or on an empty file. In that case, no message is emitted, as expected. But everything is handled as if the error was successfully sent. However, there is a bug here. In h1_send_error() function, this case is not properly handled. The flag H1C_F_ABRTED is not set on the H1 connection as it should be and h1_close() function is not called, leaving the H1 connection in an undefined state. It is especially an issue when an "empty" 408-Request-Time-out error is emitted while there are data blocked in the output buffer. In that case, the connection remains open until the client closes and a "cR--"/408 is logged repeatedly, every time the client timeout is reached. This patch must be backported as far as 2.8. (cherry picked from commit 0d4271cdae18780de79e1ce997d562f91eeee316) Signed-off-by: Willy Tarreau <w@1wt.eu>	2024-09-03 14:39:30 +02:00
Frederic Lecaille	657e745c16	BUG/MINOR: quic: unexploited retransmission cases for Initial pktns. qc_prep_hdshk_fast_retrans() job is to pick some packets to be retransmitted from Initial and Handshake packet number spaces. A packet may be coalesced to a first one into the same datagram. When a coalesced packet is inspected for retransmission, it is skipped if its length would make the total length of the datagram it is attached to exceed the anti-amplification limit. But in this case, the first packet must be kept for the current retransmission. This is tracked by this trace statement: TRACE_PROTO("will probe Initial packet number space", QUIC_EV_CONN_SPPKTS, qc); This was not the case because of the wrong "goto end" statement. This latter must be run only if the Initial packet number space must not be probed with the first packet found as coalesced to another one which must be skipped. This bug was revealed by AWS-LC interop runner with handshakeloss and handshakecorruption which always fail because this stack leads the server to send more Initial packets. Thank you to Ilya (@chipitsine) for this issue report in GH #2663. Must be backported as far as 2.6. (cherry picked from commit 15a737eb5fc54bbc8aa5cadad054a69badde5b8e) Signed-off-by: Willy Tarreau <w@1wt.eu>	2024-09-03 13:56:55 +02:00
Christopher Faulet	008f445a4f	BUG/MEDIUM: cli: Always release back endpoint between two commands on the mcli When several commands are chained on the master CLI, the same client connection is used. Because it is a TCP connection, the mux PT is used. It means there is no stream at the mux level. It is not possible to release the applicative stream between commands as for the HTTP. So, to work around this limitation, between two commands, the master CLI is resetting the stream. It does exactly what was performed on HTTP to manage keep-alive connections on old HAProxy versions. But this part was copied from a code dealing with connection only while the back endpoint can be an applet or a mux for the master cli. The previous fix on the mux PT ("BUG/MEDIUM: mux-pt: Never fully close the connection on shutdown") revealed a bug. Between two commands, the back endpoint was only released if the connection's XPRT was closed. This works if the back endpoint is an applet because there is no connection. But for commands sent to a worker, a connection is used. At this stage, this only works if the connection's XPRT is closed. Otherwise, the old endpoint is never detached leading to undefined behavior on the next command execution (most probably a crash). Without the commit above, the connection's XPRT is always closed on shutdown. It is no longer true. At this stage, we must unconditionally release the back endpoint by resetting the corresponding sedesc to fix the bug. This patch must be backported with the commit above in all stable versions. On 2.4 and lower, it will need to be adapted. (cherry picked from commit d4781bd5e7f0e0c0491fb7dfccb4da5c15055e3f) Signed-off-by: Willy Tarreau <w@1wt.eu>	2024-09-03 07:51:16 +02:00
Christopher Faulet	8c94c485b0	BUG/MEDIUM: mux-pt: Never fully close the connection on shutdown When a shutdown is reported to the mux (shutdown for reads or writes), the connection is immediately fully closed if the mux detects the connection is closed in both directions. Only the passthrough multiplexer is able to perform this action at this stage because there is no stream and no internal data. Other muxes perform a full connection close during the mux's release stage. It was working quite well until recently. But, in theory, the bug is quite old. In fact, it seems possible for the lower layer to report an error on the connection at the same time a shutdown is performed on the mux. Depending on how events are scheduled, the following may happen: 1. A connection error is detected at the fd layer and a wakeup is scheduled on the mux to handle the event. 2. A shutdown for writes is performed on the mux. Here the mux decides to fully close the connection. If the xprt is not used to log info, it is released. 3. The mux is finally woken up. It tries to retrieve data from the xprt because it is not aware there was an error. This leads to a crash because of a NULL-deref. By reading the code, it is not obvious. But it seems possible with SSL connection when the handshake is rearmed. It happens when a SSL_ERROR_WANT_WRITE is reported on a SSL_read() attempt or a SSL_ERROR_WANT_READ on a SSL_write() attempt. This bug is only visible if the XPRT is not used to log info. So it is not so common. This patch should fix the 2nd crash reported in the issue #2656. It must first be backported as far as 2.9 and then slowly to all stable versions. (cherry picked from commit 76fa71f7a8d27006ea1b06b417963501d3a5fcab) Signed-off-by: Willy Tarreau <w@1wt.eu>	2024-09-03 07:51:08 +02:00
Christopher Faulet	46f72f4379	BUG/MINIR: proxy: Match on 429 status when trying to perform a L7 retry Support for 429 was recently added to L7 retries (0d142e075 "MINOR: proxy: Add support of 429-Too-Many-Requests in retry-on status"). But the l7_status_match() function was not properly updated. The switch statement must match the 429 status to be able to perform a L7 retry. This patch must be backported if the commit above is backported. It is related to #2687. (cherry picked from commit 62c9d51ca4d4f870723522b30d368d984f536e7e) Signed-off-by: Willy Tarreau <w@1wt.eu>	2024-09-02 20:09:33 +02:00
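The kind of omission fixed by the commit above can be sketched in C as follows. The flag names and the `sk_l7_status_match()` helper are hypothetical stand-ins, not HAProxy's actual definitions: the point is only that a status enabled in the retry configuration is never retried unless the `switch` statement also has a matching `case`.

```c
/* Hypothetical retry-on flag bits, not HAProxy's actual definitions. */
enum sk_retry_flags {
    SK_RETRY_429 = 1u << 0,
    SK_RETRY_503 = 1u << 1,
};

/* Returns 1 if a response with this status should trigger an L7 retry. */
static int sk_l7_status_match(unsigned retry_flags, int status)
{
    switch (status) {
    case 429:   /* without this case, an enabled 429 would never match */
        return (retry_flags & SK_RETRY_429) != 0;
    case 503:
        return (retry_flags & SK_RETRY_503) != 0;
    default:
        return 0;
    }
}
```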
Christopher Faulet	13437097c3	BUG/MEDIUM: stream: Prevent mux upgrades if client connection is no longer ready If an early error occurred on the client connection, we must prevent any multiplexer upgrades. Indeed, it is unexpected for a mux to be initialized with no xprt. On a normal workflow it is impossible. So it is not an issue. But if a mux upgrade is performed at the stream level, an early error on the connection may have already been handled by the previous mux and the connection may be already fully closed. If the mux upgrade is still performed, a crash can be experienced. It is possible to have a crash with an implicit TCP>HTTP upgrade if there is no data in the input buffer. But it is also possible to get a crash with an explicit "switch-mode http" rule. It must be backported to all stable versions. In 2.2, the patch must be applied directly in stream_set_backend() function. (cherry picked from commit e4812404c541018ba521abf6573be92553ba7c53) Signed-off-by: Willy Tarreau <w@1wt.eu>	2024-09-02 20:09:33 +02:00

22744 Commits