haproxy

Author	SHA1	Message	Date
Christopher Faulet	10827a18ec	MINOR: connection: No longer include stconn type header in connection-t.h It is a small change, but it is cleaner to no include stconn-t.h header in connection-t.h, mainly to avoid circular definitions. The related issue is #2502. (cherry picked from commit 4b8098bf4831c0dfca4a058bd3170a5ed7ae8bbf) Signed-off-by: William Lallemand <wlallemand@haproxy.com>	2024-09-30 15:40:44 +02:00
Amaury Denoyelle	0f7ae826ed	BUG/MEDIUM: quic: handle retransmit for standalone FIN STREAM STREAM frames have dedicated handling on retransmission. A special check is done to remove data already acked in case of duplicated frames, thus only unacked data are retransmitted. This handling is faulty in case of an empty STREAM frame with FIN set. On retransmission, this frame does not cover any unacked range as it is empty and is thus discarded. This may cause the transfer to freeze with the client waiting indefinitely for the FIN notification. To handle retransmission of empty FIN STREAM frame, qc_stream_desc layer have been extended. A new flag QC_SD_FL_WAIT_FOR_FIN is set by MUX QUIC when FIN has been transmitted. If set, it prevents qc_stream_desc to be freed until FIN is acknowledged. On retransmission side, qc_stream_frm_is_acked() has been updated. It now reports false if FIN bit is set on the frame and qc_stream_desc has QC_SD_FL_WAIT_FOR_FIN set. This must be backported up to 2.6. However, this modifies heavily critical section for ACK handling and retransmission. As such, it must be backported only after a period of observation. This issue can be reproduced by using the following socat command as server to add delay between the response and connection closure : $ socat TCP-LISTEN:<port>,fork,reuseaddr,crlf SYSTEM:'echo "HTTP/1.1 200 OK"; echo ""; sleep 1;' On the client side, ngtcp2 can be used to simulate packet drop. Without this patch, connection will be interrupted on QUIC idle timeout or haproxy client timeout with ERR_DRAINING on ngtcp2 : $ ngtcp2-client --exit-on-all-streams-close -r 0.3 <host> <port> "http://<host>:<port>/?s=32o" Alternatively to ngtcp2 random loss, an extra haproxy patch can also be used to force skipping the emission of the empty STREAM frame : diff --git a/include/haproxy/quic_tx-t.h b/include/haproxy/quic_tx-t.h index efbdfe687..1ff899acd 100644 --- a/include/haproxy/quic_tx-t.h +++ b/include/haproxy/quic_tx-t.h @@ -26,6 +26,8 @@ extern struct pool_head pool_head_quic_cc_buf; / Flag a sent packet as being probing with old data / #define QUIC_FL_TX_PACKET_PROBE_WITH_OLD_DATA (1UL << 5) +#define QUIC_FL_TX_PACKET_SKIP_SENDTO (1UL << 6) + / Structure to store enough information about TX QUIC packets. / struct quic_tx_packet { / List entry point. / diff --git a/src/quic_tx.c b/src/quic_tx.c index 2f199ac3c..2702fc9b9 100644 --- a/src/quic_tx.c +++ b/src/quic_tx.c @@ -318,7 +318,7 @@ static int qc_send_ppkts(struct buffer buf, struct ssl_sock_ctx ctx) tmpbuf.size = tmpbuf.data = dglen; TRACE_PROTO("TX dgram", QUIC_EV_CONN_SPPKTS, qc); - if (!skip_sendto) { + if (!skip_sendto && !(first_pkt->flags & QUIC_FL_TX_PACKET_SKIP_SENDTO)) { int ret = qc_snd_buf(qc, &tmpbuf, tmpbuf.data, 0, gso); if (ret < 0) { if (gso && ret == -EIO) { @@ -354,6 +354,7 @@ static int qc_send_ppkts(struct buffer buf, struct ssl_sock_ctx ctx) qc->cntrs.sent_bytes_gso += ret; } } + first_pkt->flags &= ~QUIC_FL_TX_PACKET_SKIP_SENDTO; b_del(buf, dglen + QUIC_DGRAM_HEADLEN); qc->bytes.tx += tmpbuf.data; @@ -2066,6 +2067,17 @@ static int qc_do_build_pkt(unsigned char pos, const unsigned char *end, continue; } + switch (cf->type) { + case QUIC_FT_STREAM_8 ... QUIC_FT_STREAM_F: + if (!cf->stream.len && (qc->flags & QUIC_FL_CONN_TX_MUX_CONTEXT)) { + TRACE_USER("artificially drop packet with empty STREAM frame", QUIC_EV_CONN_TXPKT, qc); + pkt->flags \|= QUIC_FL_TX_PACKET_SKIP_SENDTO; + } + break; + default: + break; + } + quic_tx_packet_refinc(pkt); cf->pkt = pkt; } (cherry picked from commit e177cf341cf3665e340855312714fe002688b2db) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-09-19 10:54:02 +02:00
Amaury Denoyelle	af8d1c24e5	MINOR: quic: implement function to check if STREAM is fully acked When a STREAM frame is retransmitted, a check is performed to remove range of data already acked from it. This is useful when STREAM frames are duplicated and splitted to cover different data ranges. The newly retransmitted frame contains only unacked data. This process is performed similarly in qc_dup_pkt_frms() and qc_build_frms(). Refactor the code into a new function named qc_stream_frm_is_acked(). It returns true if frame data are already fully acked and retransmission can be avoided. If only a partial range of data is acknowledged, frame content is updated to only cover the unacked data. This patch does not have any functional change. However, it simplifies retransmission for STREAM frames. Also, it will be reused to fix retransmission for empty STREAM frames with FIN set from the following patch : BUG/MEDIUM: quic: handle retransmit for standalone FIN STREAM As such, it must be backported prior to it. (cherry picked from commit 714009b7bcf921836c2df7fc0d26b2ad257c8307) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-09-19 10:54:02 +02:00
Amaury Denoyelle	cbdf1ecb19	MINOR: quic: convert qc_stream_desc release field to flags qc_stream_desc had a field <release> used as a boolean. Convert it with a new <flags> field and QC_SD_FL_RELEASE value as equivalent. The purpose of this patch is to be able to extend qc_stream_desc by adding newer flags values. This patch is required for the following patch BUG/MEDIUM: quic: handle retransmit for standalone FIN STREAM As such, it must be backported prior to it. (cherry picked from commit bb9ac256a1e5468535b8242dc762e7bb0d9a8bf3) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-09-19 10:54:02 +02:00
Christopher Faulet	3b75ca846c	BUG/MEDIUM: sc_strm/applet: Wake applet after a successfull synchronous send On a synchronous send from the stream to an applet, if some data were sent, we must take care to wake the applet up. It is important because if everything was sent at this stage, there is no other chance to wake the applet up, mainly because SE_FL_WAIT_DATA flag is set on the applet's sedesc in sc_update_tx() at the end of process_stream(). This flag prevent any wakeup of the applet for a send event. It is not necessary for a mux because the mux stream is called when a syncrhonous send from the stream is performed. So it is reponsible to wake the mux connection if necessary. This patch must be backport to 3.0. (cherry picked from commit 5fc12b0afd97ac68ef77335b9b5b9a0288f439f5) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-09-17 09:29:16 +02:00
Willy Tarreau	36ea5570d2	BUG/MEDIUM: queue: implement a flag to check for the dequeuing As unveiled in GH issue #2711, commit 5541d4995d ("BUG/MEDIUM: queue: deal with a rare TOCTOU in assign_server_and_queue()") does have some side effects in that it can occasionally cause an endless loop. As Christopher analysed it, the problem is that process_srv_queue(), which uses a trylock in order to leave only one thread in charge of the dequeueing process, can lose the lock race against pendconn_add(). If this happens on the last served request, then there's no more thread to deal with the dequeuing, and assign_server_and_queue() will loop forever on a condition that was initially exepected to be extremely rare (and still is, except that now it can become sticky). Previously what was happening is that such queued requests would just time out and since that was very rare, nobody would notice. The root of the problem really is that trylock. It was added so that only one thread dequeues at a time but it doesn't offer only that guarantee since it also prevents a thread from dequeuing if another one is in the process of queuing. We need a different criterion. What we're doing now is to set a flag "dequeuing" in the server, which indicates that one thread is currently in the process of dequeuing requests. This one is atomically tested, and only if no thread is in this process, then the thread grabs the queue's lock and dequeues. This way it will be serialized with pendconn_add() and no request addition will be missed. It is not certain whether the original race covered by the fix above can still happen with this change, so better keep that fix for now. Thanks to @Yenya (Jan Kasprzak) for the precise and complete report allowing to spot the problem. This patch should be backported wherever the patch above was backported. (cherry picked from commit b11495652e724d71f1f4247332f060fe48577664) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-09-17 09:27:58 +02:00
Aurelien DARRAGON	fc0af6f2bb	BUG/MEDIUM: pattern: prevent UAF on reused pattern expr Since c5959fd ("MEDIUM: pattern: merge same pattern"), UAF (leading to crash) can be experienced if the same pattern file (and match method) is used in two default sections and the first one is not referenced later in the config. In this case, the first default section will be cleaned up. However, due to an unhandled case in the above optimization, the original expr which the second default section relies on is mistakenly freed. This issue was discovered while trying to reproduce GH #2708. The issue was particularly tricky to reproduce given the config and sequence required to make the UAF happen. Hopefully, Github user @asmnek not only provided useful informations, but since he was able to consistently trigger the crash in his environment he was able to nail down the crash to the use of pattern file involved with 2 named default sections. Big thanks to him. To fix the issue, let's push the logic from c5959fd a bit further. Instead of relying on "do_free" variable to know if the expression should be freed or not (which proved to be insufficient in our case), let's switch to a simple refcounting logic. This way, no matter who owns the expression, the last one attempting to free it will be responsible for freeing it. Refcount is implemented using a 32bit value which fills a previous 4 bytes structure gap: int mflags; /* 80 4 / / XXX 4 bytes hole, try to pack / long unsigned int lock; / 88 8 */ (output from pahole) Even though it was not reproduced in 2.6 or below by @asmnek (the bug was revealed thanks to another bugfix), this issue theorically affects all stable versions (up to c5959fd), thus it should be backported to all stable versions. (cherry picked from commit 68cfb222b559b909415c9aa92114ca42c9a93459) Signed-off-by: Willy Tarreau <w@1wt.eu>	2024-09-09 19:44:33 +02:00
Frederic Lecaille	e875aa59a9	BUG/MEDIUM: quic: always validate sender address on 0-RTT It has been reported by Wedl Michael, a student at the University of Applied Sciences St. Poelten, a potential vulnerability into haproxy as described below. An attacker could have obtained a TLS session ticket after having established a connection to an haproxy QUIC listener, using its real IP address. The attacker has not even to send a application level request (HTTP3). Then the attacker could open a 0-RTT session with a spoofed IP address trusted by the QUIC listen to bypass IP allow/block list and send HTTP3 requests. To mitigate this vulnerability, one decided to use a token which can be provided to the client each time it successfully managed to connect to haproxy. These tokens may be reused for future connections to validate the address/path of the remote peer as this is done with the Retry token which is used for the current connection, not the next one. Such tokens are transported by NEW_TOKEN frames which was not used at this time by haproxy. So, each time a client connect to an haproxy QUIC listener with 0-RTT enabled, it is provided with such a token which can be reused for the next 0-RTT session. If no such a token is presented by the client, haproxy checks if the session is a 0-RTT one, so with early-data presented by the client. Contrary to the Retry token, the decision to refuse the connection is made only when the TLS stack has been provided with enough early-data from the Initial ClientHello TLS message and when these data have been accepted. Hopefully, this event arrives fast enough to allow haproxy to kill the connection if some early-data have been accepted without token presented by the client. quic_build_post_handshake_frames() has been modified to build a NEW_TOKEN frame with this newly implemented token to be transported inside. quic_tls_derive_retry_token_secret() was renamed to quic_do_tls_derive_token_secre() and modified to be reused and derive the secret for the new token implementation. quic_token_validate() has been implemented to validate both the Retry and the new token implemented by this patch. When this is a non-retry token which could not be validated, the datagram received is marked as requiring a Retry packet to be sent, and no connection is created. When the Initial packet does not embed any non-retry token and if 0-RTT is enabled the connection is marked with this new flag: QUIC_FL_CONN_NO_TOKEN_RCVD. As soon as the TLS stack detects that some early-data have been provided and accepted by the client, the connection is marked to be killed (QUIC_FL_CONN_TO_KILL) from ha_quic_add_handshake_data(). This is done calling qc_ssl_eary_data_accepted() new function. The secret TLS handshake is interrupted as soon as possible returnin 0 from ha_quic_add_handshake_data(). The connection is also marked as requiring a Retry packet to be sent (QUIC_FL_CONN_SEND_RETRY) from ha_quic_add_handshake_data(). The the handshake I/O handler (quic_conn_io_cb()) knows how to behave: kill the connection after having sent a Retry packet. About TLS stack compatibility, this patch is supported by aws-lc. It is disabled for wolfssl which does not support 0-RTT at this time thanks to HAVE_SSL_0RTT_QUIC. This patch depends on these commits: MINOR: quic: Add trace for QUIC_EV_CONN_IO_CB event. MINOR: quic: Implement qc_ssl_eary_data_accepted(). MINOR: quic: Modify NEW_TOKEN frame structure (qf_new_token struct) BUG/MINOR: quic: Missing incrementation in NEW_TOKEN frame builder MINOR: quic: Token for future connections implementation. MINOR: quic: Implement quic_tls_derive_token_secret(). MINOR: tools: Implement ipaddrcpy(). Must be backported as far as 2.6. (cherry picked from commit f627b9272bd8ffca6f2f898bfafc6bf0b84b7d46) [fl: Add ->flags to quic_dgram struct (would arrive with quic_initial feature). Add QUIC_DGRAM_FL_ quic_dgram flags (would arrive with quic_initial feature). Modify quic_rx_pkt_retrieve_conn() to fix a compilation issue and correctly handle the "if (pkt->token_len) {}" else block to do so with quic_initial feature] Signed-off-by: Frederic Lecaille <flecaille@haproxy.com>	2024-09-05 16:42:48 +02:00
Frederic Lecaille	7cae9b607f	MINOR: quic: Implement qc_ssl_eary_data_accepted(). This function is a wrapper around SSL_get_early_data_status() for OpenSSL derived stack and SSL_early_data_accepted() boringSSL derived stacks like AWS-LC. It returns true for a TLS server if it has accepted the early data received from a client. Also implement quic_ssl_early_data_status_str() which is dedicated to be used for debugging purposes (traces). This function converts the enum returned by the two function mentionned above to a human readable string. (cherry picked from commit 609b1245610c8f3e219a1eff12d13559cae109cd) Signed-off-by: Frederic Lecaille <flecaille@haproxy.com>	2024-09-05 16:25:03 +02:00
Frederic Lecaille	ceb7dbf420	MINOR: quic: Modify NEW_TOKEN frame structure (qf_new_token struct) Modify qf_new_token structure to use a static buffer with QUIC_TOKEN_LEN as size as defined by the token for future connections (quic_token.c). Modify consequently the NEW_TOKEN frame parser (see quic_parse_new_token_frame()). Also add comments to denote that the NEW_TOKEN parser function is used only by clients and that its builder is used only by servers. (cherry picked from commit e926378375bcf579daadea071c600651eb7dce0d) [fl: remove useless openssl/chacha.h header inclusion when moving openssl-compat.h at the start of the header inclusions as expected by this patch] Signed-off-by: Frederic Lecaille <flecaille@haproxy.com>	2024-09-05 16:23:20 +02:00
Frederic Lecaille	92b811d520	MINOR: quic: Token for future connections implementation. There exist two sorts of token used by QUIC. They are both used to validate the peer address (path validation). Retry are used for the current connection the client want to open. This patch implement the other sort of tokens which after having been received from a connection, may be provided for the next connection from the same IP address to validate it (or validate the network path between the client and the server). The token generation is implemented by quic_generate_token(), and the token validation by quic_token_chek(). The same method is used as for Retry tokens to build such tokens to be reused for future connections. The format is very simple: one byte for the format identifier to distinguish these new tokens for the Retry token, followed by a 32bits timestamps. As this part is ciphered with AEAD as cryptographic algorithm, 16 bytes are needed for the AEAD tag. 16 more random bytes are added to this token and a salt to derive the AEAD secret used to cipher the token. In addition to this salt, this is the client IP address which is used also as AAD to derive the AEAD secret. So, the length of the token is fixed: 37 bytes. (cherry picked from commit f5b09dc452f582eb876527fd28103bc29c51afad) [fl: very minor Makefile modif to correctly add quic_token.o object to be compiled] Signed-off-by: Frederic Lecaille <flecaille@haproxy.com>	2024-09-05 16:17:27 +02:00
William Lallemand	0c7265509c	MEDIUM: ssl/quic: implement quic crypto with EVP_AEAD The QUIC crypto is using the EVP_CIPHER API in order to achieve authenticated encryption, this was the API which was used with OpenSSL. With libraries that inspires from BoringSSL (libreSSL and AWS-LC), the AEAD algorithms are implemented using the EVP_AEAD API. This patch converts the call to the EVP_CIPHER API when called in the contex of AEAD cryptography for QUIC. The patch defines some QUIC_AEAD macros that can be either EVP_CIPHER or EVP_AEAD depending on the library. This was mainly done for AWS-LC but this could be useful for other libraries. This should finally allow to use CHACHA20_POLY1305 with AWS-LC. This patch allows to use the following ciphers with the EVP_AEAD API: - TLS1_3_CK_AES_128_GCM_SHA256 - TLS1_3_CK_AES_256_GCM_SHA384 AWS-LC does not implement TLS1_3_CK_AES_128_CCM_SHA256 and TLS1_3_CK_CHACHA20_POLY1305_SHA256 requires some hack for headers protection which will come in another patch. (cherry picked from commit 31c831e29b432f0a9958be63948e8f4cb278e9f8) [fl: required to support NEW_TOKEN which depends on QUIC_AEAD_* definitions] Signed-off-by: Frederic Lecaille <flecaille@haproxy.com>	2024-09-05 16:15:15 +02:00
Frederic Lecaille	d074491bd4	MINOR: quic: Implement quic_tls_derive_token_secret(). This is function is similar to quic_tls_derive_retry_token_secret(). Its aim is to derive the secret used to cipher the token to be used for future connections. This patch renames quic_tls_derive_retry_token_secret() to a more and reuses its code to produce a more generic one: quic_do_tls_derive_token_secret(). Two arguments are added to this latter to produce both quic_tls_derive_retry_token_secret() and quic_tls_derive_token_secret() new function which calls quic_do_tls_derive_token_secret(). (cherry picked from commit 74caa0eece1cc3a8b35f1d34674ea5f357819314) Signed-off-by: Frederic Lecaille <flecaille@haproxy.com>	2024-09-05 16:13:00 +02:00
Frederic Lecaille	ee7ad6615d	MINOR: tools: Implement ipaddrcpy(). Implement ipaddrcpy() new function to copy only the IP address from a sockaddr_storage struct object into a buffer. (cherry picked from commit fb7a0922038932a6b82f1827a0214c5d2e8da32e) Signed-off-by: Frederic Lecaille <flecaille@haproxy.com>	2024-09-05 16:12:41 +02:00
William Lallemand	dcfe6118b2	MINOR: channel: implement ci_insert() function ci_insert() is a function which allows to insert a string <str> of size <len> at <pos> of the input buffer. This is the equivalent of ci_insert_line2() but without inserting '\r\n' (cherry picked from commit b2a8e8731da82b8bbd9dfff6d5a0d71f25a5ee49) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-09-03 18:49:07 +02:00
William Lallemand	5bf426baa4	BUG/MEDIUM: ssl: reactivate 0-RTT for AWS-LC Then reactivate HAVE_SSL_0RTT and HAVE_SSL_0RTT_QUIC for AWS-LC, which were wrongly deactivated in f5353f2c ("MINOR: ssl: add HAVE_SSL_0RTT constant"). Must be backported to 3.0. (cherry picked from commit 56eefd6827b42afcefed7cc41d2cc38f5c1a2172) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-09-03 18:31:27 +02:00
Christopher Faulet	46f72f4379	BUG/MINIR: proxy: Match on 429 status when trying to perform a L7 retry Support for 429 was recently added to L7 retries (0d142e075 "MINOR: proxy: Add support of 429-Too-Many-Requests in retry-on status"). But the l7_status_match() function was not properly updated. The switch statement must match the 429 status to be able to perform a L7 retry. This patch must be backported if the commit above is backported. It is related to #2687. (cherry picked from commit 62c9d51ca4d4f870723522b30d368d984f536e7e) Signed-off-by: Willy Tarreau <w@1wt.eu>	2024-09-02 20:09:33 +02:00
Christopher Faulet	ac2dc762f8	MINOR: proxy: Add support of 429-Too-Many-Requests in retry-on status The "429" status can now be specified on retry-on directives. PR_RE_* flags were updated to remains sorted. This patch should fix the issue #2687. It is quite simple so it may safely be backported to 3.0 if necessary. (cherry picked from commit 0d142e0756986b56819ecb2d131a0c4b30ae899f) Signed-off-by: Willy Tarreau <w@1wt.eu>	2024-09-02 20:09:33 +02:00
Valentine Krasnobaeva	d6c8f7d7ae	MEDIUM: init: set default for fd_hard_limit via DEFAULT_MAXFD (take #2 ) Let's provide a default value for fd_hard_limit, if it's not set in the configuration. With this patch we could set some specific default via compile-time variable DEFAULT_MAXFD as well. Hope, this will be helpfull for haproxy package maintainers. make -j 8 TARGET=linux-glibc DEBUG=-DDEFAULT_MAXFD=50000 If haproxy is comipled without DEFAULT_MAXFD defined, the default will be set to 1048576. This is done to avoid killing the process by its watchdog, while it started without any limitations in its configuration or in the command line and the hard RLIMIT_NOFILE is extremely huge (~1000000000). We use in this case compute_ideal_maxconn() to calculate maxconn and maxsock, maxsock defines the size of internal fdtab, which becames very-very large as well. When the process starts to simply loop over this fdtab (0(n)), this takes a lot of time, so watchdog does it job. To avoid this, maxconn now is always reduced to some reasonable value either by explicit global.fd-hard-limit from configuration, or by its default. The default may be changed at build-time and overwritten then by global.fd-hard-limit at runtime. Explicit global.fd-hard-limit from the configuration has always precedence over DEFAULT_MAXFD, if set. Must be backported in all stable versions until v2.6.0, including v2.6.0. (cherry picked from commit 41275a691839df5f8dc7cb9faa4e259fbb755d34) [wt: the discussion around this patch came to an agreement on the list: https://www.mail-archive.com/haproxy@formilux.org/msg45098.html ] Signed-off-by: Willy Tarreau <w@1wt.eu>	2024-07-29 14:19:27 +02:00
Willy Tarreau	94f85bd646	MINOR: queue: add a function to check for TOCTOU after queueing There's a rare TOCTOU case that happens from time to time with maxconn 1 and multiple threads. Between the moment we see the queue full and the moment we queue a request, it's possible that the last request on the server or proxy ended and that no other one is left to offer it its place. Given that all this code path is performance-critical and we cannot afford to increase the lock duration, better recheck for the condition after queueing. For this we need to be able to check for the condition and cleanly dequeue a request. That's what this patch provides via the new function pendconn_must_try_again(). It will catch more requests than absolutely needed though it will catch them all. It may find that around 1/1000 of requests are at risk, though testing shows that in practice, it's around 1 per million that really gets stuck (other ones benefit from timing and finishing late requests). Maybe in the future some conditions might be refined but it's harmless. What happens to such requests is that they're dequeued and their pendconn freed, so that the caller can decide to try to LB or queue them again. For now the function is not used, it's just added separately for easier tracking. (cherry picked from commit 1a8f3a368f1d212f5c2869d400fb07c78b2e7f45) Signed-off-by: Willy Tarreau <w@1wt.eu>	2024-07-29 12:11:48 +02:00
Willy Tarreau	822602b3ff	MEDIUM: h1: allow to preserve keep-alive on T-E + C-L In 2.5-dev9, commit 631c7e866 ("MEDIUM: h1: Force close mode for invalid uses of T-E header") enforced a recently arrived new security rule in the HTTP specification aiming at preventing a class of content-smuggling attacks involving HTTP/1.0 agents. It consists in handling the very rare T-E + C-L requests or responses in close mode. It happens it does have an impact of a rare few and very old clients (probably running insecure TLS stacks by the way) that continue to send both with their POST requests. The impact is that for each and every request they'll have to reconnect, possibly negotiating a full TLS handshake that becomes harmful to the machine in terms of CPU computation. This commit adds a new option "h1-do-not-close-on-insecure-transfer-encoding" that does exactly what it says, it just asks not to close on such messages, even though the message continues to be sanitized and C-L dropped. It means that the risk is only between the sender and haproxy, which is limited, and might be the only acceptable solution for such environments having to deal with broken implementations. The cases are so rare that it should not need to be backported, or in the worst case, to the latest LTS if there is any demand. (cherry picked from commit 2dab1ba84b11fe43baa91642ffcddb90e9ec09d2) Signed-off-by: Willy Tarreau <w@1wt.eu>	2024-07-29 12:11:39 +02:00
Frederic Lecaille	70f6d9986e	MINOR: quic: Add information to "show quic" for CUBIC cc. Add ->state_cli() new callback to quic_cc_algo struct to define a function called by the "show quic (cc\|full)" commands to dump some information about the congestion algorithm internal state currently in use by the QUIC connections. Implement this callback for CUBIC algorithm to dump its internal variables: - K: (the time to reach the cubic curve inflexion point), - last_w_max: the last maximum window value reached before intering the last recovery period. This is also the window value at the inflexion point of the cubic curve, - wdiff: the difference between the current window value and last_w_max. So negative before the inflexion point, and positive after. (cherry picked from commit 76ff8afa2d9eb0206bc72f4e2f8ad230720dfb94) [wt: adjusted ctx in quic_cli since no GSO] Signed-off-by: Willy Tarreau <w@1wt.eu>	2024-07-29 11:56:13 +02:00
Amaury Denoyelle	aca17100a0	CLEANUP: quic: rename TID affinity elements This commit is the renaming counterpart of the previous one, this time for quic_conn module. Several elements related to TID affinity update from quic_conn has been renamed : public functions, but also flag renamed to QUIC_FL_CONN_TID_REBIND and trace event to QUIC_EV_CONN_BIND_TID. This should be backported with the same instruction as the previous commit. (cherry picked from commit 3be58fc720c406ce4f4dfc70b87662cef4838886) [wt: dropped the BUG_ON() from quic_conn since bfdf145859d not backported] Signed-off-by: Willy Tarreau <w@1wt.eu>	2024-07-29 11:56:13 +02:00
Amaury Denoyelle	8669ecdf8a	CLEANUP: proto: rename TID affinity callbacks Since the following patch, protocol API to update a connection TID affinity has been extended. commit 1a43b9f32c71267e3cb514aa70a13c75adb20742 MINOR: proto: extend connection thread rebind API The single callback set_affinity has been splitted in 3 different functions which are called at different stages during listener_accept(), depending on accept queue push success or not. However, the naming was rendered confusing by the usage of function prefix 1 and 2. Rename proto callback related to TID affinity update and use the following names : * bind_tid_prep * bind_tid_commit * bind_tid_reset This commit should probably be backported at least up to 3.0 with the above patch. This is because the fix was recently backported and it would allow to keep changes minimal between the two versions. It could even be backported up to 2.8 if there is no major conflict. (cherry picked from commit 9fbe8b03346a98cc8cc7b47eaa68935b1d4b3916) Signed-off-by: Willy Tarreau <w@1wt.eu>	2024-07-29 11:56:13 +02:00
Amaury Denoyelle	6772e175ad	BUG/MEDIUM: quic: prevent crash on accept queue full Handshake for quic_conn instances runs on a single non-chosen thread. On completion, listener_accept() is performed to select the less loaded thread before initializing connection instance. As such, quic_conn instance is migrated to the thread with its upper connection. In case accept queue is full, listener_accept() fallback to local accept mode, which cause the connection to be assigned to the current thread. However, this is not supported by QUIC as quic_conn instance is left on the previously selected thread. In most cases, this will cause a BUG_ON() due to a task manipulation from an outside thread. To fix this, handle quic_conn thread rebind in multiple steps using the new extended protocol API. Several operations have been moved from qc_set_tid_affinity1() to newly defined qc_set_tid_affinity2(), in particular CID TID update. This ensures that quic_conn instance is not prematurely accessed on the new thread until accept queue push is guaranteed to succeed. qc_reset_tid_affinity() is also newly defined to reassign the newly created tasks and tasklets to the current thread. This is necessary to prevent the BUG_ON() crash described above. This must be backported up to 2.8 after a period of observation. Note that it depends on previous patch : MINOR: proto: extend connection thread rebind API (cherry picked from commit 95f624540b87e06e7a3c36b8c1ed4d76f0add2dc) Signed-off-by: Willy Tarreau <w@1wt.eu>	2024-07-29 11:56:13 +02:00
Amaury Denoyelle	b9d8312e34	MINOR: proto: extend connection thread rebind API MINOR: listener: define callback for accept queue push Extend API for connection thread rebind API by replacing single callback set_affinity by three different ones. Each one of them is used at a different stage of the operation : * set_affinity1 is used similarly to previous set_affinity * set_affinity2 is called directly from accept_queue_push_mp() when an entry has been found in accept ring. This operation cannot fail. * reset_affinity is called after set_affinity1 in case of failure from accept_queue_push_mp() due to no space left in accept ring. This is necessary for protocols which must reconfigure resources before fallback on the current tid. This patch does not have any functional changes. However, it will be required to fix crashes for QUIC connections when accept queue ring is full. As such, it must be backported with it. (cherry picked from commit 1a43b9f32c71267e3cb514aa70a13c75adb20742) [wt: backported for ease of maintenance, as suggested in commit 9fbe8b0334 ("CLEANUP: proto: rename TID affinity callbacks") also marked for backporting] Signed-off-by: Willy Tarreau <w@1wt.eu>	2024-07-29 11:55:21 +02:00
Amaury Denoyelle	49b64e6539	DEV: flags/quic: decode quic_conn flags Decode quic_conn flags via qc_show_flags() function. To support this, quic flags definition have been put outside of USE_QUIC directive. (cherry picked from commit 19b8c1b7cdda84775eb1afb452eee044d4920d4a) Signed-off-by: Willy Tarreau <w@1wt.eu>	2024-07-11 15:44:58 +02:00
Willy Tarreau	c742566b5b	Revert "MEDIUM: init: set default for fd_hard_limit via DEFAULT_MAXFD" This reverts the following commit: e3aefc50d8 ("MEDIUM: init: set default for fd_hard_limit via DEFAULT_MAXFD") Lukas expressed some concerns about possible consequences of this change so let's wait for a consensus to be found in mainline before we backport anything (if at all), as we certainly don't want to change the behavior after it's backported. No version was released with this patch, it's the right moment to revert it. For reference, the discussion is here: https://www.mail-archive.com/haproxy@formilux.org/msg45098.html Please note that if it were to be re-introduced later, it should be applied along with a small fix that already references it.	2024-07-11 15:44:52 +02:00
Valentine Krasnobaeva	e3aefc50d8	MEDIUM: init: set default for fd_hard_limit via DEFAULT_MAXFD Let's provide a default value for fd_hard_limit, if it's not set in the configuration. With this patch we could set some specific default via compile-time variable DEFAULT_MAXFD as well. Hope, this will be helpfull for haproxy package maintainers. make -j 8 TARGET=linux-glibc DEBUG=-DDEFAULT_MAXFD=50000 If haproxy is comipled without DEFAULT_MAXFD defined, the default will be set to 1048576. This is done to avoid killing the process by its watchdog, while it started without any limitations in its configuration or in the command line and the hard RLIMIT_NOFILE is extremely huge (~1000000000). We use in this case compute_ideal_maxconn() to calculate maxconn and maxsock, maxsock defines the size of internal fdtab, which becames very-very large as well. When the process starts to simply loop over this fdtab (0(n)), this takes a lot of time, so watchdog does it job. To avoid this, maxconn now is always reduced to some reasonable value either by explicit global.fd-hard-limit from configuration, or by its default. The default may be changed at build-time and overwritten then by global.fd-hard-limit at runtime. Explicit global.fd-hard-limit from the configuration has always precedence over DEFAULT_MAXFD, if set. Must be backported in all stable versions until v2.6.0, including v2.6.0. (cherry picked from commit 41275a691839df5f8dc7cb9faa4e259fbb755d34) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-07-05 12:20:58 +02:00
Amaury Denoyelle	47d13c68cf	BUG/MEDIUM: h3: ensure the ":method" pseudo header is totally valid Ensure pseudo-header method is only constitued of valid characters according to RFC 9110. If an invalid value is found, the request is rejected and stream is resetted. Previously only characters forbidden in headers were rejected (NUL/CR/LF), but this is insufficient for :method, where some other forbidden chars might be used to trick a non-compliant backend server into seeing a different path from the one seen by haproxy. Note that header injection is not possible though. This must be backported up to 2.6. Many thanks to Yuki Mogi of FFRI Security Inc for the detailed report that allowed to quicky spot, confirm and fix the problem. (cherry picked from commit 789d4abd7328f0a745d67698e89bbb888d4d9b2c) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-07-03 08:44:57 +02:00
Willy Tarreau	dd42e4233d	MINOR: activity: make the memory profiling hash size configurable at build time The MEMPROF_HASH_BITS variable was set to 10 without a possibility to change it (beyond patching the code). After seeing a few reports already with "other" being listed and a list with close to 1024 entries, it looks like it's about time to either increase the hash size, or at least make it configurable for special cases. As a reminder, in order to remain fast, the algorithm searches no more than 16 places after the hash, so when a table is almost full, searches are long and new places are rare. The present patch just makes it possible to redefine it by passing "-DMEMPROF_HASH_BITS=11" or "-DMEMPROF_HASH_BITS=12" in CFLAGS, and moves the definition to defaults.h to make it easier to find. Such values should be way sufficient for the vast majority of use cases. Maybe in the future we'd change the default. At least this version should be backported to ease rebuilds, say, till 2.8 or so. (cherry picked from commit 290659ffd3a2eead918adc387e8842c59fbff2e7) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-07-03 08:43:49 +02:00
Christopher Faulet	933f35fe26	BUG/MEDIUM: stick-table: Decrement the ref count inside lock to kill a session When we try to kill a session, the shard must be locked before decrementing the ref count on the session. Otherwise, the ref count can fall to 0 and a purge task (stktable_trash_oldest or process_table_expire) may release the session before we have the opportunity to acquire the lock on the shard to effectively kill the session. This could lead to a double free. Here is the scenario: Thread 1 Thread 2 sktsess_kill(ts) if (ATOMIC_DEC(&ts->ref_cnt) != 0) return /* here the ref count is 0 / stktable_trash_oldest() LOCK(&sh_lock) if (!ATOMIC_LOAD(&ts->ref_cnf)) __stksess_free(ts) UNLOCK(&sh_lock) / here the session was released */ LOCK(&sh_lock) __stksess_free(ts) <--- double free UNLOCK(&sh_lock) The bug was introduced in 2.9 by the commit 7968fe3889 ("MEDIUM: stick-table: change the ref_cnt atomically"). The ref count must be decremented inside the lock for stksess_kill() and sktsess_kill_if_expired() function. This patch should fix the issue #2611. It must be backported as far as 2.9. On the 2.9, there is no sharding. All the table is locked. The patch will have to be adapted. (cherry picked from commit 9357873641c5de29b169848fc1c808747818a1eb) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-06-26 15:16:25 +02:00
Aurelien DARRAGON	4a744814ed	DEBUG: hlua: distinguish burst timeout errors from exec timeout errors hlua burst timeout was introduced in 58e36e5b1 ("MEDIUM: hlua: introduce tune.lua.burst-timeout"). It is a safety measure that allows to detect when too much time is spent on a single lua execution (between 2 interruptions/yields), meaning that the current thread is not able to perform other tasks. Such scenario should be avoided because it will cause thread contention which may have negative performance impact and could cause the watchdog to trigger. When the burst timeout is exceeded, the current Lua execution is aborted and a timeout error is reported to the user. Unfortunately, the same error is currently being reported for cumulative (AKA execution) timeout and for burst timeout, which may be confusing to the user. Indeed, "execution timeout" error historically results from the current hlua context exceeding the total (cumulative) time it's allowed to run. It is set per lua context using the dedicated tunables: - tune.lua.session-timeout - tune.lua.task-timeout - tune.lua.service-timeout We've already faced an user report where the user was able to trigger the burst timeout and got "Lua task: execution timeout." error while the user didn't set cumulative timeout. Thus the error was actually confusing because it was indeed the burst timeout which was causing it due to the use of cpu-intensive call from within the task without sufficient manual "yield" keypoints around the cpu-intensive call to ensure it runs on a dedicated scheduler cycle. In this patch we make it so burst timeout related errors are reported as "burst timeout" errors instead of "execution timeout" errors (which in fact became the generic timeout errors catchall with 58e36e5b1). To do this, hlua_timer_check() now returns a different value depending if the exeeded timeout is the burst one or the cumulative one, which allows us to return either HLUA_E_ETMOUT or HLUA_E_BTMOUT in hlua_ctx_resume(). It should improve the situation described in GH #2356 and may possibly be backported with 58e36e5b1 to improve error reporting if it applies without resistance. (cherry picked from commit 983513d901bb7511ea6b1e8c3bb00d58a9d432f2) [cf: No reason to backport further] Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-06-26 15:16:24 +02:00
Christopher Faulet	96aef291cb	BUG/MEDIUM: mux-quic: Don't unblock zero-copy fwding if blocked during nego The previous fix (792a645ec2 ["BUG/MEDIUM: mux-quic: Unblock zero-copy forwarding if the txbuf can be released"]) introduced a regression. The zero-copy data forwarding must only be unblocked if it was blocked by the producer, after a successful negotiation. It is important because during a negotiation, the consumer may be blocked for another reason. Because of the flow control for instance. In that case, there is not necessarily a TX buffer. And it unexpected to try to release an unallocated TX buf. In addition, the same may happen while a TX buf is still in-use. In that case, it must also not be released. So testing the TX buffer is not the right solution. To fix the issue, a new IOBUF flag was added (IOBUF_FL_FF_WANT_ROOM). It must be set by the producer if it is blocked after a sucessful negotiation because it needs more room. In that case, we know a buffer was provided by the consummer. In done_fastfwd() callback function, it is then possible to safely unblock the zero-copy data forwarding if this flag is set. This patch must be backported to 3.0 with the commit above. (cherry picked from commit 9748df29ff655a9626d6e75ea9db79bb9afa2a50) Signed-off-by: Amaury Denoyelle <adenoyelle@haproxy.com>	2024-06-06 14:15:51 +02:00
Willy Tarreau	2e42a19cde	MINOR: version: mention that it's 3.0 LTS now. The version will be maintained up to around Q2 2029. Let's also update the INSTALL file to mention this.	2024-05-29 14:40:26 +02:00
Willy Tarreau	decb7c90df	CLEANUP: ssl_sock: move dirty openssl-1.0.2 wrapper to openssl-compat Valentine noticed this ugly SSL_CTX_get_tlsext_status_cb() macro definition inside ssl_sock.c that is dedicated to openssl-1.0.2 only. It would be better placed in openssl-compat.h, which is what this patch does. It also addresses a missing pair of parenthesis and removes an invalid extra semicolon.	2024-05-28 19:17:57 +02:00
Aurelien DARRAGON	435a9da267	MINOR: log: rename 'log-format tag' to 'log-format alias' In 2.9 we started to introduce an ambiguity in the documentation by referring to historical log-format variables ('%var') as log-format tags in 739c4e5b1e ("MINOR: sample: accept_date / request_date return %Ts / %tr timestamp values") and 454c372b60 ("DOC: configuration: add sample fetches for timing events"). In fact, we've had this confusion between log-format tag and log-format var for more than 10 years now, but in 2.9 it was the first time the confusion was exposed in the documentation. Indeed, both 'log-format variable' and 'log-format tag' actually refer to the same feature (that is: '%B' and friends that can be used for direct access to some log-oriented predefined fetches instead of using %[expr] with generic sample expressions). This feature was first implemented in 723b73ad75 ("MINOR: config: Parse the string of the log-format config keyword") and later documented in 4894040fa ("DOC: log-format documentation"). At that time, it was clear that we used to name it 'log-format variable'. But later the same year, 'log-format tag' naming started to appear in some commit messages (while still referring to the same feature), for instance with ffc3fcd6d ("MEDIUM: log: report SSL ciphers and version in logs using logformat %sslc/%sslv"). Unfortunately in 2.9 when we added (and documented) new log-format variables we officially started drifting to the misleading 'log-format tag' naming (perhaps because it was the most recent naming found for this feature in git log history, or because the confusion has always been there) Even worse, in 3.0 this confusion led us to rename all 'var' occurrences to 'tag' in log-format related code to unify the code with the doc. Hopefully William quickly noticed that we made a mistake there, but instead of reverting to historical naming (log-format variable), it was decided that we must use a different name that is less confusing than 'tags' or 'variables' (tags and variables are keywords that are already used to designate other features in the code and that are not very explicit under log-format context today). Now we refer to '%B' and friends as a logformat alias, which is essentially a handy way to print some log oriented information in the log string instead of leveraging '%[expr]' with generic sample expressions made of fetches and converters. Of course, there are some subtelties, such as a few log-format aliases that still don't have sample fetch equivalent for historical reasons, and some aliases that may be a little faster than their generic sample expression equivalents because most aliases are pretty much hardcoded in the log building function. But in general logformat aliases should be simply considered as an alternative to using expressions (with '%[expr']') Also, under log-format context, when we want to refer to either an alias ('%alias') or an expression ('%[expr]'), we should use the generic term 'logformat item', which in fact designates a single item within the logformat string provided by the user. Indeed, a logformat item (whether is is an alias or an expression) always starts with '%' and may accept optional flags / arguments Both the code and the documentation were updated in that sense, hopefully this will clarify things and prevent future confusions.	2024-05-27 17:03:48 +02:00
Amaury Denoyelle	47168e217a	MEDIUM: connection: use pool-conn-name instead of sni on reuse Implement pool-conn-name support for idle connection reuse. It replaces SNI as arbitrary identifier for connections in the idle pool. Thus, every SNI reference in this context have been replaced. Main change occurs in connect_server() where pool-conn-name sample fetch is now prehash to generate idle connection identifier. SNI is now solely used in the context of SSL for ssl_sock_set_servername().	2024-05-24 14:47:21 +02:00
Amaury Denoyelle	be4f89f2b2	MINOR: server: define pool-conn-name keyword Define a new server keyword pool-conn-name. The purpose of this keyword will be to identify connections inside the idle connections pool, replacing SNI in case SSL is not wanted. This keyword uses a sample expression argument. It thus can reuse existing function parse_srv_expr() for parsing. In the future, it may be necessary to define a keyword variant which uses a logformat for extensability. This patch only implement parsing. Argument is stored inside new server field <pool_conn_name> and expression is generated in _srv_parse_finalize() into <pool_conn_name_expr>. If pool-conn-name is not set but SNI is, the latter is reused automatically as pool-conn-name via _srv_parse_finalize(). This ensures current reuse behavior remains compatible and idle connection reuse will not mix connections with different SNIs by mistake. Main usage will be for rhttp when SSL is not wanted between the two haproxy instances. Previously, it was possible to use "sni" keyword even without SSL on a server line which have a similar effect. However, having a dedicated "pool-conn-name" keyword is deemed clearer. Besides, it would allow for more complex configuration where pool-conn-name and SNI are use in parallel with different values.	2024-05-24 14:36:31 +02:00
Amaury Denoyelle	91001422b4	MINOR: server: generalize sni expr parsing Two functions exists for server sni sample expression parsing. This is confusing so this commit aims at clarifying this. Functions are renamed with the following identifiers. First function is named parse_srv_expr() and can be used during parsing. Besides expression parsing, it has ensure sample fetch validity in the context of a server line. Second function is renamed _parse_srv_expr() and is used internally by parse_srv_expr(). It only implements sample parsing without extra checks. It is already use for server instantiation derived from server-template as checks were already performed. Also, it is now used in http-client code as SNI is a fixed string. Finally, both functions are generalized to remove any reference to SNI. This will allow to reuse it to parse other server keywords which use an expression. This will be the case for the future keyword pool-conn-name.	2024-05-24 14:36:31 +02:00
Amaury Denoyelle	5764bc50b5	BUG/MINOR: quic: adjust restriction for stateless reset emission Review RFC 9000 and ensure restriction on Stateless reset are properly enforced. After careful examination, several changes are introduced. First, redefine minimal Stateless Reset emitted packet length to 21 bytes (5 random bytes + a token). This is the new default length used in every case, unless received packet which triggered it is 43 bytes or smaller. Ensure every Stateless Reset packets emitted are at 1 byte shorter than the received packet which triggered it. No Stateless reset will be emitted if this falls under the above limit of 21 bytes. Thus this should prevent looping issues. This should be backported up to 2.6.	2024-05-24 14:36:31 +02:00
Amaury Denoyelle	45f40bac4c	MEDIUM: config: prevent communication with privileged ports This commit introduces a new global setting named harden.reject_privileged_ports.{tcp\|quic}. When active, communications with clients which use privileged source ports are forbidden. Such behavior is considered suspicious as it can be used as spoofing or DNS/NTP amplication attack. Value is configured per transport protocol. For each TCP and QUIC distinct code locations are impacted by this setting. The first one is in sock_accept_conn() which acts as a filter for all TCP based communications just after accept() returns a new connection. The second one is dedicated for QUIC communication in quic_recv(). In both cases, if a privileged source port is used and setting is disabled, received message is silently dropped. By default, protection are disabled for both protocols. This is to be able to backport it without breaking changes on stable release. This should be backported as it is an interesting security feature yet relatively simple to implement.	2024-05-24 14:36:31 +02:00
Aurelien DARRAGON	9d37c4b989	DEBUG: tools: add vma_set_name_id() helper Just like vma_set_name() from 51a8f134e ("DEBUG: tools: add vma_set_name() helper"), but also takes <id> as parameter to append "-$id" suffix after the name in order to differentiate 2 areas that were named using the same <type> and <name> combination. example, using mmap + MAP_SHARED\|MAP_ANONYMOUS: 7364c4fff000-736508000000 rw-s 00000000 00:01 3540 [anon_shmem:type:name-id] Another example, using mmap + MAP_PRIVATE\|MAP_ANONYMOUS or using glibc/malloc() above MMAP_THRESHOLD: 7364c4fff000-736508000000 rw-s 00000000 00:01 3540 [anon:type:name-id]	2024-05-24 12:07:13 +02:00
Willy Tarreau	381ed2a4dd	MINOR: config: add thread-hard-limit to set an upper bound to nbthread On todays large systems, it's not always desired to run on all threads for light loads, and usually users enforce nbthread to a lower value (e.g. 8). The problem is that this is a fixed value, and moving such configs to smaller machines continues to enforce the value and this becomes extremely unproductive due to having more threads than CPUs. This also happens quite a bit in VMs, containers, or cloud instances of various sizes. This commit introduces the thread-hard-limit setting that allows to only set an upper bound to the number of threads without raising a lower value. This means that using "thread-hard-limit 8" will make sure that no more than 8 threads will be used when available, but it will remain two when run on a dual-core machine.	2024-05-24 09:46:49 +02:00
Willy Tarreau	c7335d55f8	BUG/MEDIUM: quic_tls: prevent LibreSSL < 4.0 from negotiating CHACHA20_POLY1305 As diagnosed in GH issue #2569, there's currently an issue in LibreSSL's CHACHA20 in-place implementation that makes haproxy discard incoming QUIC packets encrypted with it. It's not very easy to observe the issue because: - QUIC recommends that CHACHA20 is used in priority - on x86 with AES-NI, LibreSSL prefers AES-GCM for performance reasons, so the problem is only observed there if a client explicitly forces TLS_CHACHA20_POLY1305_SHA256 only. - discarded packets cause retransmits showing some apparent activity, and the handshake succeeds so it's not easy to analyze from the client which thinks that the server is slow to respond. Thus in practice, on non-x86 machines running LibreSSL, requests made over QUIC freeze for a long time, unless the client explicitly forces algos excluding TLS_CHACHA20_POLY1305_SHA256. That's typically the case by default on modern OpenBSD systems, and was reported in the issue above for an arm64 machine running OpenBSD -current, and was also observed on a mips64 one running OpenBSD 7.5. There is no simple solution to this problem due to some of the protocol's constraints without digging too low into the stack (and risking to break more). Here we're taking a pragmatic approach consisting in making the connection fail hard when TLS_CHACHA20_POLY1305_SHA256 is selected, regardless of the availability of other ciphers. This means that every time a connection would have hung, instead it will fail fast, allowing the client to retry over TLS/TCP. Theo Buehler recommends that we limit this protection to all LibreSSL versions before 4.0 since it's where the fix will be implemented. Older stable versions will just see TLS_CHACHA20_POLY1305_SHA256 disabled, which should be sufficient to make QUIC work there again as well. The following config is sufficient to reproduce the issue (on a non-x86 machine, both arm64 & mips64 were confirmed to reproduce it): global limited-quic frontend stats mode http #bind :8181 #bind :8443 ssl crt rsa+dh2048.pem bind quic4@:8443 ssl crt rsa+dh2048.pem alpn h3 timeout client 5s stats uri / And the following commands will trigger the problem on affected LibreSSL versions: curl --tls13-ciphers TLS_CHACHA20_POLY1305_SHA256 -v --http3 -k https://127.0.0.1:8443/ curl -v --http3 -k https://127.0.0.1:8443/ while these ones must work: curl --tls13-ciphers TLS_AES_128_GCM_SHA256 -v --http3 -k https://127.0.0.1:8443/ curl --tls13-ciphers TLS_AES_256_GCM_SHA384 -v --http3 -k https://127.0.0.1:8443/ Normally all of them will work with LibreSSL 4, and only the first one should fail with stable LibreSSL versions higher than 3.9.2. An haproxy version without this workaround will show an unresponsive command after the GET is sent, while a version with the workaround will close the connection on error. On a version with this workaround, if TCP listeners are uncommented, curl will automatically fall back to TCP and attempt the reqeust again over HTTP/2. Finally, on OpenSSL 1.1.1 in compat mode (hence the limited-quic option above) all of them must work. Many thanks to github user @lgv5 for the detailed report, tests, and for spotting the issue, and to @botovq (Theo Buehler) for the quick analysis, patch and help on this workaround. This needs to be backported to versions 2.6 and above.	2024-05-22 16:22:22 +02:00
Amaury Denoyelle	60496e884e	MINOR: connection: support PROXY v2 TLV emission without stream Update API for PROXY protocol header encoding. Previously, it requires stream parameter to be set. Change make_proxy_line() and associated functions to add an extra session parameter. This is useful in context where no stream is instantiated. For example, this is the case for rhttp preconnect. This change allows to extend PROXY v2 TLV encoding. Replace build_logline() which requires a stream instance and call directly sess_build_logline(). Note that stream parameter is kept as it is necessary for unique ID encoding. This change has no functional change for standard connections. However, it is necessary to support TLV encoding on rhttp preconnect.	2024-05-22 10:01:57 +02:00
Amaury Denoyelle	12c40c25a9	MEDIUM: rhttp: create session for active preconnect Modify rhttp preconnect by instantiating a new session for each connection attempt. Connection is thus linked to a session directly on its instantiation contrary to previously where no session existed until listener_accept(). This patch will allow to extend rhttp usage. Most notably, it will be useful to use various sample fetches on the server line and extend logging capabilities. Changes are minimal, yet consequences are considered not trivial as for the first time a FE connection session is instantiated before listener_accept(). This requires an extra explicit check in session_accept_fd() to not overwrite an existing session. Also, flag SESS_FL_RELEASE_LI is not set immediately as listener counters must note be decremented if connection and its session are freed before reversal is completed, or else listener counters will be invalid. conn_session_free() is used as connection destroy callback to ensure the session will be freed automatically on connection release.	2024-05-22 10:01:57 +02:00
Amaury Denoyelle	45b80aed70	MINOR: session: define flag to explicitely release listener on free When a session is allocated for a FE connection, session_free() is responsible to call listener_release() to decrement listener connection counters and resume listening. Until now, <listener> member of session was tested inside session_free() before invocating listener_release(). To highlight more explicitely the relation between sessions and listeners, introduce a new flag SESS_FL_RELEASE_LI. Only session with such flag set will invoke listener_release() on their cleanup. Flag is set inside session_accept_fd() on success. This patch has no functional change. However, it will be useful to implement session creation for rHTTP preconnect.	2024-05-22 10:01:57 +02:00
Amaury Denoyelle	2770ef352e	BUG/MINOR: rhttp: prevent listener suspend Ensure "disable frontend" on a reverse HTTP listener is forbidden by returing -1 on suspend callback. Suspending such a listener has unknown effect and so is not properly implemented for now. This should be backported up to 2.9.	2024-05-22 10:01:57 +02:00
Valentine Krasnobaeva	5f713c03be	BUG/MEDIUM: proto: fix fd leak in <proto>_connect_server This fixes the fd leak, introduced in the commit d3fc982cd788 ("MEDIUM: proto: make common fd checks in sock_create_server_socket"). Initially sock_create_server_socket() was designed to return only created socket FD or -1. Its callers from upper protocol layers were required to test the returned errno and were required then to apply different configuration related checks to obtained positive sock_fd. A lot of this code was duplicated among protocols implementations. The new refactored version of sock_create_server_socket() gathers in one place all duplicated checks, but in order to be complient with upper protocol layers, it needs the 3rd parameter: 'stream_err', in which it sets the Stream Error Flag for upper levels, if the obtained sock_fd has passed all additional checks. No backport needed since this was introduced in 3.0-dev10.	2024-05-21 20:14:05 +02:00

1 2 3 4 5 ...

7735 Commits