Commit Graph

6368 Commits

Author SHA1 Message Date
William Lallemand
b1351c1a05 BUG/MINOR: ssl: shut the ca-file errors emitted during httpclient init
With an OpenSSL library which use the wrong OPENSSLDIR, HAProxy tries to
load the OPENSSLDIR/certs/ into @system-ca, but emits a warning when it
can't.

This patch fixes the issue by allowing to shut the error when the SSL
configuration for the httpclient is not explicit.

Must be backported in 2.6.

(cherry picked from commit 0a2d63236c)
[wla: context changed in httpclient_precheck()]
Signed-off-by: William Lallemand <wlallemand@haproxy.org>
2022-11-25 09:58:29 +01:00
Willy Tarreau
51743eea8b BUG/MINOR: server/idle: at least use atomic stores when updating max_used_conns
In 2.2, some idle conns usage metrics were added by commit cf612a045
("MINOR: servers: Add a counter for the number of currently used
connections."), which mentioned that the operation doesn't need to be
atomic since we're not seeking exact values. This is true but at least
we should use atomic stores to make sure not to cause invalid values
to appear on archs that wouldn't guarantee atomicity when writing an
int, such as writing two 16-bit words. This is pretty unlikely on our
targets but better keep the code safe against this.

This may be backported as far as 2.2.

(cherry picked from commit 9dc231a6b2)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:25:57 +01:00
Frédéric Lécaille
38c47fb838 BUG/MAJOR: quic: Crash after discarding packet number spaces
This previous patch was not sufficient to prevent haproxy from
crashing when some Handshake packets had to be inspected before being possibly
retransmitted:

     "BUG/MAJOR: quic: Crash upon retransmission of dgrams with several packets"

This patch introduced another issue: access to packets which have been
released because still attached to others (in the same datagram). This was
the case for instance when discarding the Initial packet number space before
inspecting an Handshake packet in the same datagram through its ->prev or
member in our case.

This patch implements quic_tx_packet_dgram_detach() which detaches a packet
from the adjacent ones in the same datagram to be called when ackwowledging
a packet (as done in the previous commit) and when releasing its memory. This
was, we are sure the released packets will not be accessed during retransmissions.

Thank you to @gabrieltz for having reported this issue in GH #1903.

Must be backported to 2.6.

(cherry picked from commit 74b5f7b31b)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:25:23 +01:00
Frédéric Lécaille
05e719b7bd BUG/MAJOR: quic: Crash upon retransmission of dgrams with several packets
As revealed by some traces provided by @gabrieltz in GH #1903 issue,
there are clients (chrome I guess) which acknowledge only one packet among others
in the same datagram. This is the case for the first datagram sent by a QUIC haproxy
listener made an Initial packet followed by an Handshake one. In this identified
case, this is the Handshake packet only which is acknowledged. But if the
client is able to respond with an Handshake packet (ACK frame) this is because
it has successfully parsed the Initial packet. So, why not also acknowledging it?
AFAIK, this is mandatory. On our side, when restransmitting this datagram, the
Handshake packet was accessed from the Initial packet after having being released.

Anyway. There is an issue on our side. Obviously, we must not expect an
implementation to respect the RFC especially when it want to build an attack ;)

With this simple patch for each TX packet we send, we also set the previous one
in addition to the next one. When a packet is acknowledged, we detach the next one
and the next one in the same datagram from this packet, so that it cannot be
resent when resending these packets (the previous one, in our case).

Thank you to @gabrieltz for having reported this issue.

Must be backported to 2.6.

(cherry picked from commit 814645f42f)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:25:23 +01:00
Amaury Denoyelle
d4c880d649 BUG/MINOR: quic: fix subscribe operation
Subscribing was not properly designed between quic-conn and quic MUX
layers. Align this as with in other haproxy components : <subs> field is
moved from the MUX to the quic-conn structure. All mention of qcc MUX is
cleaned up in quic_conn_subscribe()/quic_conn_unsubscribe().

Thanks to this change, ACK reception notification has been simplified.
It's now unnecessary to check for the MUX existence before waking it.
Instead, if <subs> quic-conn field is set, just wake-up the upper layer
tasklet without mentionning MUX. This should probably be extended to
other part in quic-conn code.

This should be backported up to 2.6.

(cherry picked from commit bbb1c68508)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-17 16:34:22 +01:00
Amaury Denoyelle
9c15bd5d37 MINOR: quic: display unknown error sendto counter on stat page
This patch complete the previous incomplete commit. The new counter
sendto_err_unknown is now displayed on stats page/CLI show stats.

This is related to github issue #1903.

This should be backported up to 2.6.

(cherry picked from commit 7941ead3aa)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-10-25 11:52:33 +02:00
Amaury Denoyelle
934659e0ce MINOR: quic: do not crash on unhandled sendto error
Remove ABORT_NOW() statement on unhandled sendto error. Instead use a
dedicated counter sendto_err_unknown to report these cases.

If we detect increment of this counter, strace can be used to detect
errno value :
  $ strace -p $(pidof haproxy) -f -e trace=sendto -Z

This should be backported up to 2.6.

This should help to debug github issue #1903.

(cherry picked from commit 1d9f170edd)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-10-25 11:52:22 +02:00
Amaury Denoyelle
9287b4047b BUG/MINOR: mux-quic: complete flow-control for uni streams
Max stream data was not enforced and respect for local/remote uni
streams. Previously, qcs instances incorrectly reused the limit defined
from bidirectional ones.

This is now fixed. Two fields are added in qcc structure connection :
* value for local flow control to enforce on remote uni streams
* value for remote flow control to respect on local uni streams

These two values can be reused to properly initialized msd field of a
qcs instance in qcs_new(). The rest of the code is similar.

This must be backported up to 2.6.

(cherry picked from commit 176174f7e4)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-10-25 11:50:54 +02:00
William Lallemand
3fd456abc7 BUG/MEDIUM: httpclient/lua: crash when the lua task timeout before the httpclient
When the lua task finished  before the httpclient that are associated to
it, there is a risk that the httpclient try to task_wakeup() the lua
task which does not exist anymore.

To fix this issue the httpclient used in a lua task are stored in a
list, and the httpclient are destroyed at the end of the lua task.

Must be backported in 2.5 and 2.6.

(cherry picked from commit bb581423b3)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-10-25 11:49:59 +02:00
Amaury Denoyelle
2b697ca18f MINOR: quic: define first packet flag
Received packets treatment has some difference regarding if this is the
first one or not of the encapsulating datagram. Previously, this was set
via a function argument. Simplify this by defining a new Rx packet flag
named QUIC_FL_RX_PACKET_DGRAM_FIRST.

This change does not have functional impact. It will simplify API when
qc_lstnr_pkt_rcv() is broken into several functions : their number of
arguments will be reduced thanks to this patch.

This should be backported up to 2.6.

(cherry picked from commit deb7c87f55)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-10-25 11:49:29 +02:00
Amaury Denoyelle
2f0d2c3197 MINOR: quic: extend pn_offset field from quic_rx_packet
pn_offset field was only set if header protection cannot be removed.
Extend the usage of this field : it is now set everytime on packet
parsing in qc_lstnr_pkt_rcv().

This change helps to clean up API of Rx functions by removing
unnecessary variables and function argument.

This change has no functional impact. It is a part of a refactoring
series on qc_lstnr_pkt_rcv(). The objective is facilitate integration of
FD-owned socket patches.

This should be backported up to 2.6.

(cherry picked from commit 845169da58)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-10-25 11:49:25 +02:00
Amaury Denoyelle
60abfa542e MINOR: quic: add version field on quic_rx_packet
Add a new field version on quic_rx_packet structure. This is set on
header parsing in qc_lstnr_pkt_rcv() function.

This change has no functional impact. It is a part of a refactoring
series on qc_lstnr_pkt_rcv(). The objective is facilitate integration of
FD-owned socket patches.

This should be backported up to 2.6.

(cherry picked from commit 0eae57273b)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-10-25 11:49:22 +02:00
Amaury Denoyelle
6f2acccd90 CLEANUP: quic: improve naming for rxbuf/datagrams handling
QUIC datagrams are read from a random thread. They are then redispatch
to the connection thread according to the first packet DCID. These
operations are implemented through a special buffer designed to avoid
locking.

Refactor this code with the following changes :
* <rxbuf> type is renamed <quic_receiver_buf>. Its list element is also
  renamed to highligh its attach point to a receiver.
* <quic_dgram> and <quic_receiver_buf> definition are moved to
  quic_sock-t.h. This helps to reduce the size of quic_conn-t.h.
* <quic_dgram> list elements are renamed to highlight their attach point
  into a <quic_receiver_buf> and a <quic_dghdlr>.

This should be backported up to 2.6.

(cherry picked from commit 1cba8d60f3)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-10-25 11:46:11 +02:00
Amaury Denoyelle
94cf0e576d CLEANUP: quic: remove unused rxbufs member in receiver
rxbuf is the structure used to store QUIC datagrams and redispatch them
to the connection thread.

Each receiver manages a list of rxbuf. This was stored both as an array
and a mt_list. Currently, only mt_list is needed so removed <rxbufs>
member from receiver structure.

This should be backported up to 2.6.

(cherry picked from commit 8c4d062d25)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-10-25 11:46:08 +02:00
Frédéric Lécaille
118e94d0c0 MINOR: quic: Split the secrets key allocation in two parts
Implement quic_tls_secrets_keys_alloc()/quic_tls_secrets_keys_free() to allocate
the memory for only one direction (RX or TX).
Modify ha_quic_set_encryption_secrets() to call these functions for one of this
direction (or both). So, for now on we can rely on the value of the secret keys
to know if it was derived.
Remove QUIC_FL_TLS_SECRETS_SET flag which is no more useful.
Consequently, the secrets are dumped by the traces only if derived.

Must be backported to 2.6.

(cherry picked from commit e1a49cfd4d)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-10-25 11:46:04 +02:00
Frédéric Lécaille
e0212bcb47 BUG/MINOR: quic: Stalled 0RTT connections with big ClientHello TLS message
This issue was reproduced with -Q picoquic client option to split a big ClientHello
message into two Initial packets and haproxy as server without any knowledged of
any previous ORTT session (restarted after a firt 0RTT session). The ORTT received
packets were removed from their queue when the second Initial packet was parsed,
and the QUIC handshake state never progressed and remained at Initial state.

To avoid such situations, after having treated some Initial packets we always
check if there are ORTT packets to parse and we never remove them from their
queue. This will be done after the hanshake is completed or upon idle timeout
expiration.

Also add more traces to be able to analize the handshake progression.

Tested with ngtcp2 and picoquic

Must be backported to 2.6.

(cherry picked from commit 4aa7d8197a)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-10-25 11:46:01 +02:00
Frédéric Lécaille
c094639d65 MINOR: quic: Use a non-contiguous buffer for RX CRYPTO data
Implement quic_get_ncbuf() to dynamically allocate a new ncbuf to be attached to
any quic_cstream struct which needs such a buffer. Note that there is no quic_cstream
for 0RTT encryption level. quic_free_ncbuf() is added to release the memory
allocated for a non-contiguous buffer.

Modify qc_handle_crypto_frm() to call this function and allocate an ncbuf for
crypto data which are not received in order. The crypto data which are received in
order are not buffered but provide to the TLS stack (calling qc_provide_cdata()).

Modify qc_treat_rx_crypto_frms() which is called after having provided the
in order received crypto data to the TLS stack to provide again the remaining
crypto data which has been buffered, if possible (if they are in order). Each time
buffered CRYPTO data were consumed, we try to release the memory allocated for
the non-contiguous buffer (ncbuf).
Also move rx.crypto.offset quic_enc_level struct member to rx.offset quic_cstream
struct member.

Must be backported to 2.6.

(cherry picked from commit 9f9263ed13)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-10-25 11:45:58 +02:00
Frédéric Lécaille
6b84c17d99 MINOR: quic: New quic_cstream object implementation
Add new quic_cstream struct definition to implement the CRYPTO data stream.
This is a simplication of the qcs object (QUIC streams) for the CRYPTO data
without any information about the flow control. They are not attached to any
tree, but to a QUIC encryption level, one by encryption level except for
the early data encryption level (for 0RTT). A stream descriptor is also allocated
for each CRYPTO data stream.

Must be backported to 2.6

(cherry picked from commit 7e3f7c47e9)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-10-25 11:45:49 +02:00
Willy Tarreau
b088056f3d CLEANUP: quic/receiver: remove the now unused tx_qring list
The tx_qrings[] and tx_qring_list in the receiver are not used
anymore since commit f2476053f ("MINOR: quic: replace custom buf on Tx
by default struct buffer"), the only place where they're referenced
was in quic_alloc_tx_rings_listener(), which by the way implies that
these were not even freed on exit.

Let's just remove them. This should be backported to 2.6 since the
commit above also was.

(cherry picked from commit cab054bbf9)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-10-25 11:42:38 +02:00
Amaury Denoyelle
3701ab7823 MEDIUM: quic: retrieve frontend destination address
Retrieve the frontend destination address for a QUIC connection. This
address is retrieve from the first received datagram and then stored in
the associated quic-conn.

This feature relies on IP_PKTINFO or affiliated flags support on the
socket. This flag is set for each QUIC listeners in
sock_inet_bind_receiver(). To retrieve the destination address,
recvfrom() has been replaced by recvmsg() syscall. This operation and
parsing of msghdr structure has been extracted in a wrapper quic_recv().

This change is useful to finalize the implementation of 'dst' sample
fetch. As such, quic_sock_get_dst() has been edited to return local
address from the quic-conn. As a best effort, if local address is not
available due to kernel non-support of IP_PKTINFO, address of the
listener is returned instead.

This should be backported up to 2.6.

(cherry picked from commit 97ecc7a8ea)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-10-25 11:42:27 +02:00
Amaury Denoyelle
68f165013c MINOR: quic: limit usage of ssl_sock_ctx in favor of quic_conn
Continue on the cleanup of QUIC stack and components.

quic_conn uses internally a ssl_sock_ctx to handle mandatory TLS QUIC
integration. However, this is merely as a convenience, and it is not
equivalent to stackable ssl xprt layer in the context of HTTP1 or 2.

To better emphasize this, ssl_sock_ctx usage in quic_conn has been
removed wherever it is not necessary : namely in functions not related
to TLS. quic_conn struct now contains its own wait_event for tasklet
quic_conn_io_cb().

This should be backported up to 2.6.

(cherry picked from commit 2ed840015f)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-10-10 08:43:48 +02:00
Willy Tarreau
d22ff05c3e MINOR: fd: add a new function to only raise RLIMIT_NOFILE
In issue #1866 an issue was reported under docker, by which a user cannot
lower the number of FD needed. It looks like a restriction imposed in this
environment, but it results in an error while it ought not have to in the
case of shrinking.

This patch adds a new function raise_rlim_nofile() that takes the desired
new setting, compares it to the current one, and only calls setrlimit() if
one of the values in the new setting is larger than the older one. As such
it will continue to emit warnings and errors in case of failure to raise
the limit but will never shrink it.

This patch is only preliminary to another one, but will have to be
backported where relevant (likely only 2.6).

(cherry picked from commit 922a907926)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-10-10 08:43:48 +02:00
Amaury Denoyelle
4a6be622b2 CLEANUP: quic: create a dedicated quic_conn module
xprt_quic module was too large and did not reflect the true architecture
by contrast to the other protocols in haproxy.

Extract code related to XPRT layer and keep it under xprt_quic module.
This code should only contains a simple API to communicate between QUIC
lower layer and connection/MUX.

The vast majority of the code has been moved into a new module named
quic_conn. This module is responsible to the implementation of QUIC
lower layer. Conceptually, it overlaps with TCP kernel implementation
when comparing QUIC and HTTP1/2 stacks of haproxy.

This should be backported up to 2.6.

(cherry picked from commit 92fa63f735)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-10-10 08:43:48 +02:00
Amaury Denoyelle
228883ca32 CLEANUP: quic: remove duplicated varint code from xprt_quic.h
There was some identical code between xprt_quic and quic_enc modules.
This concerns helper on QUIC varint type. Keep only the version in
quic_enc file : this should help to reduce dependency on xprt_quic
module.

Note that quic_max_int_by_size() has been removed and is replaced by the
identical quic_max_int().

This should be backported up to 2.6.

(cherry picked from commit a2639383ec)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-10-10 08:43:48 +02:00
Amaury Denoyelle
69baa24752 CLEANUP: quic: fix headers
Clean up quic sources by adjusting headers list included depending
on the actual dependency of each source file.

On some occasion, xprt_quic.h was removed from included list. This is
useful to help reducing the dependency on this single file and cleaning
up QUIC haproxy architecture.

This should be backported up to 2.6.

(cherry picked from commit 5c25dc5bfd)
[cf: Include <haproxy/global.h> from cfgparse-quic.c instead of only
     <haproxy/global-t.h">. On 2.7, it is shipped with "tools.h" (tools.h >
     cli.h > global.h). But not on the 2.6]
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-10-10 08:41:20 +02:00
Amaury Denoyelle
90a008239e BUG/MINOR: quic: adjust quic_tls prototypes
Two prototypes in quic_tls module were not identical to the actual
function definition.

* quic_tls_decrypt2() : the second argument const attribute is not
  present, to be able to use it with EVP_CIPHER_CTX_ctlr(). As a
  consequence of this change, token field of quic_rx_packet is now
  declared as non-const.

* quic_tls_generate_retry_integrity_tag() : the second argument type
  differ between the two. Adjust this by fixing it to as unsigned char
  to match EVP_EncryptUpdate() SSL function.

This situation did not seem to have any visible effect. However, this is
clearly an undefined behavior and should be treated as a bug.

This should be backported up to 2.6.

(cherry picked from commit f3c40f83fb)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-10-10 07:43:42 +02:00
Amaury Denoyelle
adf910e519 CLEANUP: quic: remove global var definition in quic_tls header
Some variables related to QUIC TLS were defined in a header file : their
definitions are now moved properly in the implementation file, with only
declarations in the header.

This should be backported up to 2.6.

(cherry picked from commit a19bb6f0b2)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-10-10 07:43:39 +02:00
Willy Tarreau
22beb2ad21 BUG/MINOR: backend: only enforce turn-around state when not redispatching
In github issue #1878, Bart Butler reported observing turn-around states
(1 second pause) after connection retries going to different servers,
while this ought not happen.

In fact it does happen because back_handle_st_cer() enforces the TAR
state for any algo that's not round-robin. This means that even leastconn
has it, as well as hashes after the number of servers changed.

Prior to doing that, the call to stream_choose_redispatch() has already
had a chance to perform the correct choice and to check the algo and
the number of retries left. So instead we should just let that function
deal with the algo when needed (and focus on deterministic ones), and
let the former just obey. Bart confirmed that the fixed version works
as expected (no more delays during retries).

This may be backported to older releases, though it doesn't seem very
important. At least Bart would like to have it in 2.4 so let's go there
for now after it has cooked a few weeks in 2.6.

(cherry picked from commit 406efb96d1)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-10-10 07:43:13 +02:00
Willy Tarreau
c564abd107 BUG/MAJOR: conn-idle: fix hash indexing issues on idle conns
Idle connections do not work on 32-bit machines due to an alignment issue
causing the connection nodes to be indexed with their lower 32-bits set to
zero and the higher 32 ones containing the 32 lower bitss of the hash. The
cause is the use of ebmb_node with an aligned data, as on this platform
ebmb_node is only 32-bit aligned, leaving a hole before the following hash
which is a uint64_t:

  $ pahole -C conn_hash_node ./haproxy
  struct conn_hash_node {
        struct ebmb_node           node;                 /*     0    20 */

        /* XXX 4 bytes hole, try to pack */

        int64_t                    hash;                 /*    24     8 */
        struct connection *        conn;                 /*    32     4 */

        /* size: 40, cachelines: 1, members: 3 */
        /* sum members: 32, holes: 1, sum holes: 4 */
        /* padding: 4 */
        /* last cacheline: 40 bytes */
  };

Instead, eb64 nodes should be used when it comes to simply storing a
64-bit key, and that is what this patch does.

For backports, a variant consisting in simply marking the "hash" member
with a "packed" attribute on the struct also does the job (tested), and
might be preferable if the fix is difficult to adapt. Only 2.6 and 2.5
are affected by this.

(cherry picked from commit 8522348482)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-10-10 07:40:32 +02:00
Aurelien DARRAGON
9779c1b5d7 BUG/MINOR: log: improper behavior when escaping log data
Patrick Hemmer reported an improper log behavior when using
log-format to escape log data (+E option):
Some bytes were truncated from the output:

- escape_string() function now takes an extra parameter that
  allow the caller to specify input string stop pointer in
  case the input string is not guaranteed to be zero-terminated.
- Minors checks were added into lf_text_len() to make sure dst
  string will not overflow.
- lf_text_len() now makes proper use of escape_string() function.

This should be backported as far as 1.8.

(cherry picked from commit c5bff8e550)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-20 16:31:39 +02:00
Amaury Denoyelle
28e18246be BUG/MEDIUM: mux-quic: properly trim HTX buffer on snd_buf reset
MUX QUIC snd_buf operation whill return early if a qcs instance is
resetted. In this case, HTX is left untouched and the callback returns
the whole bufer size. This lead to an undefined behavior as the stream
layer is notified about a transfer but does not see its HTX buffer
emptied. In the end, the transfer may stall which will lead to a leak on
session.

To fix this, HTX buffer is now resetted when snd_buf is short-circuited.
This should fix the issue as now the stream layer can continue the
transfer until its completion.

This patch has already been tested by Tristan and is reported to solve
the github issue #1801.

This should be backported up to 2.6.

(cherry picked from commit 0ed617ac2f)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-20 15:58:22 +02:00
Amaury Denoyelle
53e4116b6b MINOR: mux-quic: refactor snd_buf
Factorize common code between h3 and hq-interop snd_buf operation. This
is inserted in MUX QUIC snd_buf own callback.

The h3/hq-interop API has been adjusted to directly receive a HTX
message instead of a plain buf. This led to extracting part of MUX QUIC
snd_buf in qmux_http module.

This should be backported up to 2.6.

(cherry picked from commit 9534e59bb9)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-20 15:58:18 +02:00
Amaury Denoyelle
0859fbf203 REORG: mux-quic: export HTTP related function in a dedicated file
Extract function dealing with HTX outside of MUX QUIC. For the moment,
only rcv_buf stream operation is concerned.

The main objective is to be able to support both TCP and HTTP proxy mode
with a common base and add specialized modules on top of it.

This should be backported up to 2.6.

(cherry picked from commit d80fbcaca2)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-20 15:58:14 +02:00
Amaury Denoyelle
f3cab28f96 REORG: mux-quic: extract traces in a dedicated source file
QUIC MUX implements several APIs to interface with stream, quic-conn and
app-ops layers. It is planified to better separate this roles, possibly
by using several files.

The first step is to extract QUIC MUX traces in a dedicated source
files. This will allow to reuse traces in multiple files.

The main objective is to be
able to support both TCP and HTTP proxy mode with a common base and add
specialized modules on top of it.

This should be backported up to 2.6.

(cherry picked from commit 36d50bff22)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-20 15:58:06 +02:00
Amaury Denoyelle
4075ed06e6 BUG/MEDIUM: mux-quic: fix nb_hreq decrement
nb_hreq is a counter on qcc for active HTTP requests. It is incremented
for each qcs where a full HTTP request was received. It is decremented
when the stream is closed locally :
- on HTTP response fully transmitted
- on stream reset

A bug will occur if a stream is resetted without having processed a full
HTTP request. nb_hreq will be decremented whereas it was not
incremented. This will lead to a crash when building with
DEBUG_STRICT=2. If BUG_ON_HOT are not active, nb_hreq counter will wrap
which may break the timeout logic for the connection.

This bug was triggered on haproxy.org. It can be reproduced by
simulating the reception of a STOP_SENDING frame instead of a STREAM one
by patching qc_handle_strm_frm() :

+       if (quic_stream_is_bidi(strm_frm->id))
+               qcc_recv_stop_sending(qc->qcc, strm_frm->id, 0);
+       //ret = qcc_recv(qc->qcc, strm_frm->id, strm_frm->len,
+       //               strm_frm->offset.key, strm_frm->fin,
+       //               (char *)strm_frm->data);

To fix this bug, a qcs is now flagged with a new QC_SF_HREQ_RECV. This
is set when the full HTTP request is received. When the stream is closed
locally, nb_hreq will be decremented only if this flag was set.

This must be backported up to 2.6.

(cherry picked from commit afb7b9d8e5)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-20 15:52:18 +02:00
Amaury Denoyelle
01a5be8c38 CLEANUP: mux-quic: remove stconn usage in h3/hq
Small cleanup on snd_buf for application protocol layer.
* do not export h3_snd_buf
* replace stconn by a qcs argument. This is better as h3/hq-interop only
  uses the qcs instance.

This should be backported up to 2.6.

(cherry picked from commit 8d4ac48d3d)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-19 11:41:38 +02:00
Amaury Denoyelle
57b3c47e70 BUG/MEDIUM: mux-quic: fix crash on early app-ops release
H3 SETTINGS emission has recently been delayed. The idea is to send it
with the first STREAM to reduce sendto syscall invocation. This was
implemented in the following patch :
  3dd79d378c
  MINOR: h3: Send the h3 settings with others streams (requests)

This patch works fine under nominal conditions. However, it will cause a
crash if a HTTP/3 connection is released before having sent any data,
for example when receiving an invalid first request. In this case,
qc_release will first free qcc.app_ops HTTP/3 application protocol layer
via release callback. Then qc_send is called to emit any closing frames
built by app_ops release invocation. However, in qc_send, as no data has
been sent, it will try to complete application layer protocol
intialization, with a SETTINGS emission for HTTP/3. Thus, qcc.app_ops is
reused, which is invalid as it has been just freed. This will cause a
crash with h3_finalize in the call stack.

This bug can be reproduced artificially by generating incomplete HTTP/3
requests. This will in time trigger http-request timeout without any
data send. This is done by editing qc_handle_strm_frm function.

-       ret = qcc_recv(qc->qcc, strm_frm->id, strm_frm->len,
+       ret = qcc_recv(qc->qcc, strm_frm->id, strm_frm->len - 1,
                       strm_frm->offset.key, strm_frm->fin,
                       (char *)strm_frm->data);

To fix this, application layer closing API has been adjusted to be done
in two-steps. A new shutdown callback is implemented : it is used by the
HTTP/3 layer to generate GOAWAY frame in qc_release prologue.
Application layer context qcc.app_ops is then freed later in qc_release
via the release operation which is now only used to liberate app layer
ressources. This fixes the problem as the intermediary qc_send
invocation will be able to reuse app_ops before it is freed.

This patch fixes the crash, but it would be better to adjust H3 SETTINGS
emission in case of early connection closing : in this case, there is no
need to send it. This should be implemented in a future patch.

This should fix the crash recently experienced by Tristan in github
issue #1801.

This must be backported up to 2.6.

(cherry picked from commit f8aaf8bdfa)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-19 11:41:25 +02:00
William Lallemand
86120e4953 MEDIUM: quic: separate path for rx and tx with set_encryption_secrets
With quicTLS the set_encruption_secrets callback is always called with
the read_secret and the write_secret.

However this is not the case with libreSSL, which uses the
set_read_secret()/set_write_secret() mecanism. It still provides the
set_encryption_secrets() callback, which is called with a NULL
parameter for the write_secret during the read, and for the read_secret
during the write.

The exchange key was not designed in haproxy to be called separately for
read and write, so this patch allow calls with read or write key to
NULL.

(cherry picked from commit 95fc737fc6)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-19 11:41:19 +02:00
Emeric Brun
2cc1ed89d4 BUG/MEDIUM: sink: bad init sequence on tcp sink from a ring.
The init of tcp sink, particularly for SSL, was done
too early in the code, during parsing, and this can cause
a crash specially if nbthread was not configured.

This was detected by William using ASAN on a new regtest
on log forward.

This patch adds the 'struct proxy' created for a sink
to a list and this list is now submitted to the same init
code than the main proxies list or the log_forward's proxies
list. Doing this, we are assured to use the right init sequence.
It also removes the ini code for ssl from post section parsing.

This patch should be backported as far as v2.2

Note: this fix uses 'goto' labels created by commit
'BUG/MAJOR: log-forward: Fix log-forward proxies not fully initialized'
but this code didn't exist before v2.3 so this patch needs to be
adapted for v2.2.

(cherry picked from commit d6e581de4b)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-19 11:36:59 +02:00
Aurelien DARRAGON
1f9d9f53f0 MINOR: proxy/listener: support for additional PAUSED state
This patch is a prerequisite for #1626.
Adding PAUSED state to the list of available proxy states.
The flag is set when the proxy is paused at runtime (pause_listener()).
It is cleared when the proxy is resumed (resume_listener()).

It should be backported to 2.6, 2.5 and 2.4

(cherry picked from commit d46f437de6)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-12 17:54:22 +02:00
Aurelien DARRAGON
614f99ee0a MINOR: listener: small API change
A minor API change was performed in listener(.c/.h) to restore consistency
between stop_listener() and (resume/pause)_listener() functions.

LISTENER_LOCK was never locked prior to calling stop_listener():
lli variable hint is thus not useful anymore.

Added PROXY_LOCK locking in (resume/pause)_listener() functions
with related lpx variable hint (prerequisite for #1626).

It should be backported to 2.6, 2.5 and 2.4

(cherry picked from commit 001328873c)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-12 17:54:22 +02:00
Frédéric Lécaille
ba627a66ce MINOR: h3: Send the h3 settings with others streams (requests)
This is the ->finalize application callback which prepares the unidirectional STREAM
frames for h3 settings and wakeup the mux I/O handler to send them. As haproxy is
at the same time always waiting for the client request, this makes haproxy
call sendto() to send only about 20 bytes of stream data. Furthermore in case
of heavy loss, this give less chances to short h3 requests to succeed.

Drawback: as at this time the mux sends its streams by their IDs ascending order
the stream 0 is always embedded before the unidirectional stream 3 for h3 settings.
Nevertheless, as these settings may be lost and received after other h3 request
streams, this is permitted by the RFC.

Perhaps there is a better way to do. This will have to be checked with Amaury.

Must be backported to 2.6.

(cherry picked from commit 3dd79d378c)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-12 17:54:22 +02:00
Frédéric Lécaille
68436e33d1 BUG/MINOR: quic: Speed up the handshake completion only one time
It is possible to speed up the handshake completion but only one time
by connection as mentionned in RFC 9002 "6.2.3. Speeding up Handshake Completion".
Add a flag to prevent this process to be run several times
(see https://www.rfc-editor.org/rfc/rfc9002#name-speeding-up-handshake-compl).

Must be backported to 2.6.

(cherry picked from commit bb995eafc7)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-12 17:54:22 +02:00
Willy Tarreau
397fcc008d MINOR: sched: store the current profile entry in the thread context
The profile entry that corresponds to the current task/tasklet being
profiled is now stored into the thread's context. This will allow it
to be accessed from the tasks themselves. This is needed for an upcoming
fix.

(cherry picked from commit 1efddfa6bf)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-12 17:54:22 +02:00
Willy Tarreau
1cb273c718 BUG/MINOR: sched: properly account for the CPU time of dying tasks
When task profiling is enabled, the scheduler can measure and report
the cumulated time spent in each task and their respective latencies. But
this was wrong for tasks with few wakeups as well as for self-waking ones,
because the call date needed to measure how long it takes to process the
task is retrieved in the task itself (->wake_date was turned to the call
date), and we could face two conditions:
  - a new wakeup while the task is executing would reset the ->wake_date
    field before returning and make abnormally low values being reported;
    that was likely the case for taskèrun_applet for self-waking applets;

  - when the task dies, NULL is returned and the call date couldn't be
    retrieved, so that CPU time was not being accounted for. This was
    particularly visible with process_stream() which is usually called
    only twice per request, and whose time was systematically halved.

The cleanest solution here is to keep in mind that the scheduler already
uses quite a bit of local context in th_ctx, and place the intermediary
values there so that they cannot vanish. The wake_date has to be reset
immediately once read, and only its copy is used along the function. Note
that this must be done both for tasks and tasklet, and that until recently
tasklets were also able to report wrong values due to their sole dependency
on TH_FL_TASK_PROFILING between tests.

One nice benefit for future improvements is that such information will now
be available from the task without having to be stored into the task itself
anymore.

Since the tasklet part was computed on wrapping 32-bit arithmetics and
the task one was on 64-bit, the values were now consistently moved to
32-bit as it's already largely sufficient (4s spent in a task is more
than twice what the watchdog would tolerate). Some further cleanups might
be necessary, but the patch aimed at staying minimal.

Task profiling output after 1 million HTTP request previously looked like
this:

  Tasks activity:
    function                      calls   cpu_tot   cpu_avg   lat_tot   lat_avg
    h1_io_cb                    2012338   4.850s    2.410us   12.91s    6.417us
    process_stream              2000136   9.594s    4.796us   34.26s    17.13us
    sc_conn_io_cb               2000135   1.973s    986.0ns   30.24s    15.12us
    h1_timeout_task                 137      -         -      2.649ms   19.34us
    accept_queue_process             49   152.3us   3.107us   321.7yr   6.564yr
    main+0x146430                     7   5.250us   750.0ns   25.92us   3.702us
    srv_cleanup_idle_conns            1   559.0ns   559.0ns   918.0ns   918.0ns
    task_run_applet                   1      -         -      2.162us   2.162us

  Now it looks like this:
  Tasks activity:
    function                      calls   cpu_tot   cpu_avg   lat_tot   lat_avg
    h1_io_cb                    2014194   4.794s    2.380us   13.75s    6.826us
    process_stream              2000151   20.01s    10.00us   36.04s    18.02us
    sc_conn_io_cb               2000148   2.167s    1.083us   32.27s    16.13us
    h1_timeout_task                 198   54.24us   273.0ns   3.487ms   17.61us
    accept_queue_process             52   158.3us   3.044us   409.9us   7.882us
    main+0x1466e0                    18   16.77us   931.0ns   63.98us   3.554us
    srv_cleanup_toremove_conns        8   282.1us   35.26us   546.8us   68.35us
    srv_cleanup_idle_conns            3   149.2us   49.73us   8.131us   2.710us
    task_run_applet                   3   268.1us   89.38us   11.61us   3.871us

Note the two-fold difference on process_stream().

This feature is essentially used for debugging so it has extremely limited
impact. However it's used quite a bit more in bug reports and it would be
desirable that at least 2.6 gets this fix backported. It depends on at least
these two previous patches which will then also have to be backported:

     MINOR: task: permanently enable latency measurement on tasklets
     CLEANUP: task: rename ->call_date to ->wake_date

(cherry picked from commit 62b5b96bcc)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-12 17:54:22 +02:00
Christopher Faulet
43bd98151e BUG/MINOR: task: Fix detection of tasks profiling in tasklet_wakeup_after()
The regression was introduced when ad548b54a7 ["MINOR: task: Add
tasklet_wakeup_after()"] was backported to 2.6 (21e0c31695).
TH_FL_TASK_PROFILING flag does not exist. To detect if tasks profiling is
enabled, "task_profiling_mask" variable must be used.

It is a 2.6-specific issue. Thus there is no upstream commit ID. This patch
must be backported if the commit above is also backported. For now, no
backport is needed.
2022-09-12 17:54:22 +02:00
Willy Tarreau
41f645a05a CLEANUP: task: rename ->call_date to ->wake_date
This field is misnamed because its real and important content is the
date the task was woken up, not the date it was called. It temporarily
holds the call date during execution but this remains confusing. In
fact before the latency measurements were possible it was indeed a call
date. Thus is will now be called wake_date.

This change is necessary because a subsequent fix will require the
introduction of the real call date in the thread ctx.

(cherry picked from commit 04e50b3d32)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-12 17:54:22 +02:00
Willy Tarreau
c432738cb0 MINOR: task: permanently enable latency measurement on tasklets
When tasklet latency measurement was enabled in 2.4 with commit b2285de04
("MINOR: tasks: also compute the tasklet latency when DEBUG_TASK is set"),
the feature was conditionned on DEBUG_TASK because the field would add 8
bytes to the struct tasklet.

This approach was not a very good idea because the struct ends on an int
anyway thus it does finish with a 32-bit hole regardless of the presence
of this field. What is true however is that adding it turned a 64-byte
struct to 72-byte when caller debugging is enabled.

This patch revisits this with a minor change. Now only the lowest 32
bits of the call date are stored, so they always fit in the remaining
hole, and this allows to remove the dependency on DEBUG_TASK. With
debugging off, we're now seeing a 48-byte struct, and with debugging
on it's exactly 64 bytes, thus still exactly one cache line. 32 bits
allow a latency of 4 seconds on a tasklet, which already indicates a
completely dead process, so there's no point storing the upper bits at
all. And even in the event it would happen once in a while, the lost
upper bits do not really add any value to the debug reports. Also, now
one tasklet wakeup every 4 billion will not be sampled due to the test
on the value itself. Similarly we just don't care, it's statistics and
the measurements are not 9-digit accurate anyway.

(cherry picked from commit 768c2c5678)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-12 17:54:22 +02:00
Willy Tarreau
87f8732e1b BUG/MINOR: task: make task_instant_wakeup() work on a task not a tasklet
There's a subtle (harmless) bug in task_instant_wakeup(). As it uses
some tasklet code instead of some task code, the debug part also acts
on the tasklet equivalent, and the call_date is only set when DEBUG_TASK
is set instead of inconditionally like with tasks. As such, without this
debugging macro, call dates are not updated for tasks woken this way.

There isn't any impact yet because this function was introduced in 2.6 to
solve certain classes of issues and is not used yet, and in the worst case
it would only affect the reported latency time.

This may be backported to 2.6 in case a future fix would depend on it but
currently will not fix existing code.

(cherry picked from commit 0fae3a0360)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-12 17:54:22 +02:00
Willy Tarreau
c94b84c6ba BUG/MINOR: task: always reset a new tasklet's call date
The tasklet's call date was not reset, so if profiling was enabled while
some tasklets were in the run queue, their initial random value could be
used to preload a bogus initial latency value into the task profiling bin.
Let's just zero the initial value.

This should be backported to 2.4 as it was brought with initial commit
b2285de04 ("MINOR: tasks: also compute the tasklet latency when DEBUG_TASK
is set"). The impact is very low though.

(cherry picked from commit f27acd961e)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-12 17:54:22 +02:00