18233 Commits

Author SHA1 Message Date
Amaury Denoyelle
98afc4ec7d BUG/MINOR: mux-quic: do not keep detached qcs with empty Tx buffers
A qcs instance free may be postponed in stream detach operation if the
stream is not locally closed. This condition is there to achieve
transfering data still present in Tx buffer. Once all data have been
emitted to quic-conn layer, qcs instance can be released.

However, the stream is only closed locally if HTX EOM has been seen or
it has been resetted. In case the transfer finished without EOM, a
detached qcs won't be freed even if there is no more activity on it.

This bug was not reproduced but was found on code analysis. Its precise
impact is unknown but it should not cause any leak as all qcs instances
are freed with its parent qcc connection : this should eventually happen
on MUX timeout or QUIC idle timeout.

To adjust this, condition to mark a stream as locally closed has been
extended. On qcc_streams_sent_done() notification, if its Tx buffer has
been fully transmitted, it will be closed if either FIN STREAM was set
or the stream is detached.

This must be backported up to 2.6.

(cherry picked from commit 3dc4e5a5b947aa55b65a6bde17bbce331586894b)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-20 15:52:40 +02:00
Amaury Denoyelle
4075ed06e6 BUG/MEDIUM: mux-quic: fix nb_hreq decrement
nb_hreq is a counter on qcc for active HTTP requests. It is incremented
for each qcs where a full HTTP request was received. It is decremented
when the stream is closed locally :
- on HTTP response fully transmitted
- on stream reset

A bug will occur if a stream is resetted without having processed a full
HTTP request. nb_hreq will be decremented whereas it was not
incremented. This will lead to a crash when building with
DEBUG_STRICT=2. If BUG_ON_HOT are not active, nb_hreq counter will wrap
which may break the timeout logic for the connection.

This bug was triggered on haproxy.org. It can be reproduced by
simulating the reception of a STOP_SENDING frame instead of a STREAM one
by patching qc_handle_strm_frm() :

+       if (quic_stream_is_bidi(strm_frm->id))
+               qcc_recv_stop_sending(qc->qcc, strm_frm->id, 0);
+       //ret = qcc_recv(qc->qcc, strm_frm->id, strm_frm->len,
+       //               strm_frm->offset.key, strm_frm->fin,
+       //               (char *)strm_frm->data);

To fix this bug, a qcs is now flagged with a new QC_SF_HREQ_RECV. This
is set when the full HTTP request is received. When the stream is closed
locally, nb_hreq will be decremented only if this flag was set.

This must be backported up to 2.6.

(cherry picked from commit afb7b9d8e5a70a741bbb890945fa9ff51dad027d)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-20 15:52:18 +02:00
Willy Tarreau
59b23ebdf4 SCRIPTS: announce-release: update some URLs to https
Some components like Discourse were already redirecting to https. Other
ones like docs and git are covered by the certificate, and finally
switching the advertised scheme for www should increase the ratio of
H2 and H3 in the stats (resp 8.9 and 1.9%) and possibly help spot new
issues.

(cherry picked from commit 68b3e135e36ddb17a6b2643c7af938226705f713)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-19 11:44:04 +02:00
Willy Tarreau
3822ac3e5e BUILD: fd: fix a build warning on the DWCAS
Ilya reported in issue #1816 a build warning on armhf (promoted to error
here since -Werror):

  src/fd.c: In function fd_rm_from_fd_list:
  src/fd.c:209:87: error: passing argument 3 of __ha_cas_dw discards volatile qualifier from pointer target type [-Werror=discarded-array-qualifiers]
    209 |    unlikely(!_HA_ATOMIC_DWCAS(((long *)&fdtab[fd].update), (uint32_t *)&cur_list.u32, &next_list.u32))
        |                                                                                       ^~~~~~~~~~~~~~

This happens only on such an architecture because the DWCAS requires the
pointer not the value, and gcc seems to be needlessly picky about reading
a const from a volatile! This may safely be backported to older versions.

(cherry picked from commit 85af76070412d87433fbcaa0ac95833a8470159d)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-19 11:43:28 +02:00
Willy Tarreau
61b4de015c BUG/MEDIUM: captures: free() an error capture out of the proxy lock
Ed Hein reported in github issue #1856 some occasional watchdog panics
in 2.4.18 showing extreme contention on the proxy's lock while the libc
was in malloc()/free(). One cause of this problem is that we call free()
under the proxy's lock in proxy_capture_error(), which makes no sense
since if we can free the object under the lock after it's been detached,
we can also free it after releasing the lock (since it's not referenced
anymore).

This should be backported to all relevant versions, likely all
supported ones.

(cherry picked from commit da9f25875958757fd1f16b74bd887977e78c8b09)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-19 11:42:32 +02:00
cui fliter
e8b0b14eff CLEANUP: quic,ssl: fix tiny typos in C comments
This fixes 4 tiny and harmless typos in mux_quic.c, quic_tls.c and
ssl_sock.c. Originally sent via GitHub PR #1843.

Signed-off-by: cui fliter <imcusg@gmail.com>
[Tim: Rephrased the commit message]
[wt: further complete the commit message]
(cherry picked from commit a94bedc0de218e784683e52ba669912b6cc71741)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-19 11:42:20 +02:00
Aurelien DARRAGON
d87047115f BUG/MEDIUM: server: segv when adding server with hostname from CLI
When calling 'add server' with a hostname from the cli (runtime),
str2sa_range() does not resolve hostname because it is purposely
called without PA_O_RESOLVE flag.

This leads to 'srv->addr_node.key' being NULL. According to Willy it
is fine behavior, as long as we handle it properly, and is already
handled like this in srv_set_addr_desc().

This patch fixes GH #1865 by adding an extra check before inserting
'srv->addr_node' into 'be->used_server_addr'. Insertion and removal
will be skipped if 'addr_node.key' is NULL.

It must be backported to 2.6 and 2.5 only.

(cherry picked from commit 8d0ff284064e7a47ae46897e0ce9b08abe539315)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-19 11:42:09 +02:00
Amaury Denoyelle
2ff9519605 BUG/MINOR: mux-quic: do not remotely close stream too early
A stream is considered as remotely closed once we have received all the
data with the FIN bit set.

The condition to close the stream was wrong. In particular, if we
receive an empty STREAM frame with FIN bit set, this would have close
the stream even if we do not have yet received all the data. The
condition is now adjusted to ensure that Rx buffer contains all the data
up to the stream final size.

In most cases, this bug is harmless. However, if compiled with
DEBUG_STRICT=2, a BUG_ON_HOT crash would have been triggered if close is
done too early. This was most notably the case sometimes on interop test
suite with quinn or kwik clients. This can also be artificially
reproduced by simulating reception of an empty STREAM frame with FIN bit
set in qc_handle_strm_frm() :

+       if (strm_frm->fin) {
+               qcc_recv(qc->qcc, strm_frm->id, 0,
+                        strm_frm->len, strm_frm->fin,
+                        (char *)strm_frm->data);
+       }
        ret = qcc_recv(qc->qcc, strm_frm->id, strm_frm->len,
                       strm_frm->offset.key, strm_frm->fin,
                       (char *)strm_frm->data);

This must be backported up to 2.6.

(cherry picked from commit d1310f8d327b7102558e8c549ce09e4925b1824b)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-19 11:41:42 +02:00
Amaury Denoyelle
01a5be8c38 CLEANUP: mux-quic: remove stconn usage in h3/hq
Small cleanup on snd_buf for application protocol layer.
* do not export h3_snd_buf
* replace stconn by a qcs argument. This is better as h3/hq-interop only
  uses the qcs instance.

This should be backported up to 2.6.

(cherry picked from commit 8d4ac48d3def189190c29b6f1f5d697b180f7e30)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-19 11:41:38 +02:00
Amaury Denoyelle
57b3c47e70 BUG/MEDIUM: mux-quic: fix crash on early app-ops release
H3 SETTINGS emission has recently been delayed. The idea is to send it
with the first STREAM to reduce sendto syscall invocation. This was
implemented in the following patch :
  3dd79d378c86b3ebf60e029f518add5f1ed54815
  MINOR: h3: Send the h3 settings with others streams (requests)

This patch works fine under nominal conditions. However, it will cause a
crash if a HTTP/3 connection is released before having sent any data,
for example when receiving an invalid first request. In this case,
qc_release will first free qcc.app_ops HTTP/3 application protocol layer
via release callback. Then qc_send is called to emit any closing frames
built by app_ops release invocation. However, in qc_send, as no data has
been sent, it will try to complete application layer protocol
intialization, with a SETTINGS emission for HTTP/3. Thus, qcc.app_ops is
reused, which is invalid as it has been just freed. This will cause a
crash with h3_finalize in the call stack.

This bug can be reproduced artificially by generating incomplete HTTP/3
requests. This will in time trigger http-request timeout without any
data send. This is done by editing qc_handle_strm_frm function.

-       ret = qcc_recv(qc->qcc, strm_frm->id, strm_frm->len,
+       ret = qcc_recv(qc->qcc, strm_frm->id, strm_frm->len - 1,
                       strm_frm->offset.key, strm_frm->fin,
                       (char *)strm_frm->data);

To fix this, application layer closing API has been adjusted to be done
in two-steps. A new shutdown callback is implemented : it is used by the
HTTP/3 layer to generate GOAWAY frame in qc_release prologue.
Application layer context qcc.app_ops is then freed later in qc_release
via the release operation which is now only used to liberate app layer
ressources. This fixes the problem as the intermediary qc_send
invocation will be able to reuse app_ops before it is freed.

This patch fixes the crash, but it would be better to adjust H3 SETTINGS
emission in case of early connection closing : in this case, there is no
need to send it. This should be implemented in a future patch.

This should fix the crash recently experienced by Tristan in github
issue #1801.

This must be backported up to 2.6.

(cherry picked from commit f8aaf8bdfa40e21b1a2f600c3ed6455bf9b6a763)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-19 11:41:25 +02:00
William Lallemand
86120e4953 MEDIUM: quic: separate path for rx and tx with set_encryption_secrets
With quicTLS the set_encruption_secrets callback is always called with
the read_secret and the write_secret.

However this is not the case with libreSSL, which uses the
set_read_secret()/set_write_secret() mecanism. It still provides the
set_encryption_secrets() callback, which is called with a NULL
parameter for the write_secret during the read, and for the read_secret
during the write.

The exchange key was not designed in haproxy to be called separately for
read and write, so this patch allow calls with read or write key to
NULL.

(cherry picked from commit 95fc737fc6edfa2575ce982b739184e99475c215)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-19 11:41:19 +02:00
Mathias Weiersmueller
af88b39e2d DOC: fix TOC in starter guide for subsection 3.3.8. Statistics
This subsection has been moved from 3.4.9 to 3.3.8 somewhere along
2.4, but the TOC has not been updated - resulting in a invalid
anchor in the HTML version.

Needs to be backported to 2.4+

(cherry picked from commit 243c2d18221b36b087ad9c177293306119f3fafd)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-19 11:40:43 +02:00
William Lallemand
26dd079ed9 REGTESTS: ssl/log: test the log-forward with SSL
Test the log-forward section with an SSL server and an SSL bind.

Must be backported as far as 2.3.

(cherry picked from commit 23bc0b20bd82c983bccb289825c6024730aaf405)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-19 11:37:03 +02:00
Emeric Brun
2cc1ed89d4 BUG/MEDIUM: sink: bad init sequence on tcp sink from a ring.
The init of tcp sink, particularly for SSL, was done
too early in the code, during parsing, and this can cause
a crash specially if nbthread was not configured.

This was detected by William using ASAN on a new regtest
on log forward.

This patch adds the 'struct proxy' created for a sink
to a list and this list is now submitted to the same init
code than the main proxies list or the log_forward's proxies
list. Doing this, we are assured to use the right init sequence.
It also removes the ini code for ssl from post section parsing.

This patch should be backported as far as v2.2

Note: this fix uses 'goto' labels created by commit
'BUG/MAJOR: log-forward: Fix log-forward proxies not fully initialized'
but this code didn't exist before v2.3 so this patch needs to be
adapted for v2.2.

(cherry picked from commit d6e581de4be1d3564d771056303242c9ae930c40)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-19 11:36:59 +02:00
William Lallemand
4b10e9bea8 REGTESTS: log: test the log-forward feature
This reg-test test the log-forward feature by chaining a UDP and a TCP
log-forwarder.

It could be backported as far as 2.3.

(cherry picked from commit ebf600a8384040a023b5278c1005ee1a2c04d712)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-19 11:36:54 +02:00
Aurelien DARRAGON
e96752b16d BUG/MINOR: listener: null pointer dereference suspected by coverity
Please refer to GH #1859 for more info.
Coverity suspected improper proxy pointer handling.
Without the fix it is considered safe for the moment, but it might not
be the case in the future as we want to keep the ability to have
isolated listeners.

Making sure stop_listener(), pause_listener(), resume_listener()
and listener_release() functions make proper use
of px pointer in that context.

No need for backport except if multi-connection protocols (ie:FTP)
were to be backported as well.

(cherry picked from commit a57786e87d0746baec43ea888bf6cd30c490d2fb)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-19 11:36:37 +02:00
Aurelien DARRAGON
c0f8d3477d CLEANUP: listener: function comment typo in stop_listener()
A minor typo related to stop_listener() function comment
was introduced in 0013288.

This makes stop_listener() function comment easier to read.

(cherry picked from commit 187396e34ed1ab28e73ebcd678fbe7acc32eaad4)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-12 17:54:22 +02:00
Christopher Faulet
852411765e REGTESTS: healthcheckmail: Relax matching on the healthcheck log message
Depending on the timing, the conneciton on lisrv listener may be fully
accepted before any reject. Thus, instead of getting a socket error, an
invalid L7 response is reported. There is no reason to be strick on the
error type. Any failure is good here, because we just want to test the
email-alert feature.

This patch should fix issue #1857. It may be backported as far as 2.2.

(cherry picked from commit 28bc152aa4a42ba91818aaf2af33ccb76f75a426)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-12 17:54:22 +02:00
Christopher Faulet
a5f7291252 BUG/MINOR: mux-h1: Increment open_streams counter when H1 stream is created
Since this counter was added, it was incremented at the wrong place for
client streams. It was incremented when the stream-connector (formely the
conn-stream) was created while it should be done when the H1 stream is
created. Thus, on parsing error, on H1>H2 upgrades or TCP>H1 upgrades, the
counter is not incremented. However, it is always decremented when the H1
stream is destroyed.

On bakcned side, there is no issue.

This patch must be backported to 2.6.

(cherry picked from commit af5336fd238ad2dcac8c09c45720152e36be7bb9)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-12 17:54:22 +02:00
Willy Tarreau
a0c0f446af CLEANUP: pollers: remove dead code in the polling loop
As reported by Ilya and Coverity in issue #1858, since recent commit
eea152ee6 ("BUG/MINOR: signals/poller: ensure wakeup from signals")
which removed the test for the global signal flag from the pollers'
loop, the remaining "wake" flag doesn't need to be tested since it
already participates to zeroing the wait_time and will be caught
on the previous line.

Let's just remove that test now.

(cherry picked from commit af985e0151c7d12d9dac4fc364b5c50d3db1e1db)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-12 17:54:22 +02:00
Aurelien DARRAGON
8cbc474b39 BUG/MINOR: stats: fixing stat shows disabled frontend status as 'OPEN'
This patch adresses the issue #1626.

Adding support for PR_FL_PAUSED flag in the function stats_fill_fe_stats().
The command 'show stat' now properly reports a disabled frontend
using "PAUSED" state label.

This patch depends on the following commits:
  - 7d00077fd5 "BUG/MEDIUM: proxy: ensure pause_proxy()
  and resume_proxy() own PROXY_LOCK".
  - 001328873c "MINOR: listener: small API change"
  - d46f437de6 "MINOR: proxy/listener: support for additional PAUSED state"

It should be backported to 2.6, 2.5 and 2.4

(cherry picked from commit cddec0aef526f2dc64bad5a83ad788d60c12639c)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-12 17:54:22 +02:00
Aurelien DARRAGON
1f9d9f53f0 MINOR: proxy/listener: support for additional PAUSED state
This patch is a prerequisite for #1626.
Adding PAUSED state to the list of available proxy states.
The flag is set when the proxy is paused at runtime (pause_listener()).
It is cleared when the proxy is resumed (resume_listener()).

It should be backported to 2.6, 2.5 and 2.4

(cherry picked from commit d46f437de69d5d4d84a207531a3ba6f8d3d697dc)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-12 17:54:22 +02:00
Aurelien DARRAGON
614f99ee0a MINOR: listener: small API change
A minor API change was performed in listener(.c/.h) to restore consistency
between stop_listener() and (resume/pause)_listener() functions.

LISTENER_LOCK was never locked prior to calling stop_listener():
lli variable hint is thus not useful anymore.

Added PROXY_LOCK locking in (resume/pause)_listener() functions
with related lpx variable hint (prerequisite for #1626).

It should be backported to 2.6, 2.5 and 2.4

(cherry picked from commit 001328873c352e5e4b1df0dcc8facaf2fc1408aa)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-12 17:54:22 +02:00
Aurelien DARRAGON
3c263c09fa BUG/MEDIUM: proxy: ensure pause_proxy() and resume_proxy() own PROXY_LOCK
There was a race involving hlua_proxy_* functions
and some proxy management functions.

pause_proxy() and resume_proxy() can be used directly from lua code,
but that could lead to some race as lua code didn't make sure PROXY_LOCK
was owned before calling the proxy functions.

This patch makes sure it won't happen again elsewhere in the code
by locking PROXY_LOCK directly in resume and pause proxy functions
so that it's not the caller's responsibility anymore.
(based on stop_proxy() behavior that was already safe prior to the patch)

This should be backported to stable series.
Note that the API will likely differ < 2.4

(cherry picked from commit 7d00077fd5bd21e13aa976e6f3221cd44ae05eea)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-12 17:54:22 +02:00
Willy Tarreau
23cd90d9d2 DEV: flags: add missing CO_FL_FDLESS connection flag
This was added in 2.6 by commit c78a9698e ("MINOR: connection: add a new
flag CO_FL_FDLESS on fd-less connections") but forgotten in flags.c.
This must be backported to 2.6.

(cherry picked from commit 20273ceec0a6ec2645ee26c44dcc20c57ed0c92e)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-12 17:54:22 +02:00
Willy Tarreau
f67f8eaafa DEV: flags: fix usage message to reflect available options
The proposed decoding options were not updated after the changes in 2.6,
let's fix that by taking the names from the existing declaration. This
should be backported to 2.6.

(cherry picked from commit c7ac17412b3478eed1725d73e8f51b8cc1118d93)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-12 17:54:22 +02:00
Ilya Shipitsin
2a87e4d4c4 CI: cirrus-ci: bump FreeBSD image to 13-1
we use FreeBSD binary packages that we rebuilt for FreeBSD-13.1

Newer FreeBSD version for package zstd:
To ignore this error set IGNORE_OSVERSION=yes
- package: 1301000
- running kernel: 1300139

(cherry picked from commit b0ab121da142b68d2a53311eff5f9b66dc62531f)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-12 17:54:22 +02:00
Matthias Wirth
4a1f703e9a BUG/MINOR: signals/poller: ensure wakeup from signals
Add self-wake in signal_handler() to fix a race condition with a signal
coming in between checking signal_queue_len and entering polling sleep.

The changes in commit 43c891dda ("BUG/MINOR: signals/poller: set the
poller timeout to 0 when there are signals") were insufficient.

Move the signal_queue_len check from the poll implementations to
run_poll_loop() to keep that logic in one place.

The poll loops are terminated either by the parameter wake being set or
wake up due to a write to their poller_wr_pipe by wake_thread() in
signal_handler().

This fixes issue #1841.

Must be backported in every stable version.

(cherry picked from commit eea152ee68e82eae49ae188cd1b1fbbf63dc6913)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-12 17:54:22 +02:00
Frédéric Lécaille
ba627a66ce MINOR: h3: Send the h3 settings with others streams (requests)
This is the ->finalize application callback which prepares the unidirectional STREAM
frames for h3 settings and wakeup the mux I/O handler to send them. As haproxy is
at the same time always waiting for the client request, this makes haproxy
call sendto() to send only about 20 bytes of stream data. Furthermore in case
of heavy loss, this give less chances to short h3 requests to succeed.

Drawback: as at this time the mux sends its streams by their IDs ascending order
the stream 0 is always embedded before the unidirectional stream 3 for h3 settings.
Nevertheless, as these settings may be lost and received after other h3 request
streams, this is permitted by the RFC.

Perhaps there is a better way to do. This will have to be checked with Amaury.

Must be backported to 2.6.

(cherry picked from commit 3dd79d378c86b3ebf60e029f518add5f1ed54815)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-12 17:54:22 +02:00
Frédéric Lécaille
5b723503b9 MINOR: h3: Missing connection argument for a TRACE_LEAVE() argument
This should help in debbuging issues to be able to associate this trace to a
QUIC connection.

Must be backported to 2.6.

(cherry picked from commit befcf7031d79298ab68c0d19ba77fa991aa9f024)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-12 17:54:22 +02:00
Frédéric Lécaille
5bf1c9b808 MINOR: h3: Add the quic_conn object to h3 traces
This is very useful to associate h3 traces to a QUIC connection when debugging.

Must be backported to 2.6.

(cherry picked from commit 2eb5faa2ad734c2c65186da2533732163ead5d43)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-12 17:54:22 +02:00
Frédéric Lécaille
7ab54248c2 BUG/MINOR: h3: Crash when h3 trace verbosity is "minimal"
This was due to a missing check in h3_trace() about the first argument
presence (connection) and h3_parse_settings_frm() which calls TRACE_LEAVE()
without any argument. Then this argument was dereferenced.

Must be backported to 2.6

(cherry picked from commit 1c725aa9cd0e3799d7751381aabc9862bed10aff)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-12 17:54:22 +02:00
Frédéric Lécaille
7652bec1cf BUG/MINOR: quic: Trace fix about packet number space information.
<qc> variable was confused with <qel>. The consequence was that it was
always the same packet number space which was displayed: the first one (or
the Initial packet number space).

Must be backported to 2.6.

(cherry picked from commit 3c1b81fdd7c0e9822bbb8a14c4b665b7df53ecf5)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-12 17:54:22 +02:00
Frédéric Lécaille
68436e33d1 BUG/MINOR: quic: Speed up the handshake completion only one time
It is possible to speed up the handshake completion but only one time
by connection as mentionned in RFC 9002 "6.2.3. Speeding up Handshake Completion".
Add a flag to prevent this process to be run several times
(see https://www.rfc-editor.org/rfc/rfc9002#name-speeding-up-handshake-compl).

Must be backported to 2.6.

(cherry picked from commit bb995eafc7e8e7d0457e1c3af17a98ef94d8b40b)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-12 17:54:22 +02:00
William Lallemand
96aaeb5fc3 BUG/MINOR: signals/poller: set the poller timeout to 0 when there are signals
When receiving a signal before entering the poller, and without any
activity in the process, the poller will be entered with a timeout
calculated without checking the signals.

Since commit 4f59d3 ("MINOR: time: increase the minimum wakeup interval
to 60s") the issue is much more visible because it could be stuck for
60s.

When in mworker mode, if a worker quits and the SIGCHLD signal deliver
at the right time to the master, this one could be stuck for the time of
the timeout.

This should fix issue #1841

Must be backported in every stable version.

(cherry picked from commit 43c891dda0c7c1c9f12dab5b77ac20b158a68adc)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-12 17:54:22 +02:00
Willy Tarreau
dc5f81140f BUG/MINOR: stream/sched: take into account CPU profiling for the last call
When task profiling is enabled, the reported CPU time for short requests
and responses (e.g. redirect) is always zero in the logs, because
process_stream() is only called once and the CPU time is measured after
it returns. This is particuarly annoying when dealing with denies and in
general anything that deals with parasitic traffic because it can be
difficult to figure where the CPU is spent.

The solution taken in this patch consists in having process_stream()
update the cpu time itself before logging and quitting. It's very simple.
It will not take into account the time taken to produce the log nor
freeing the stream, but that's marginal compared to always logging zero.
The task's wake_date is also reset so that the scheduler doesn't have to
perform these operations again. This is dependent on the following patch:

   MINOR: sched: store the current profile entry in the thread context

It should be backported to 2.6 as it does help for troubleshooting.

(cherry picked from commit beee600491c15861a923113ee322c9f57aba07e5)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-12 17:54:22 +02:00
Willy Tarreau
397fcc008d MINOR: sched: store the current profile entry in the thread context
The profile entry that corresponds to the current task/tasklet being
profiled is now stored into the thread's context. This will allow it
to be accessed from the tasks themselves. This is needed for an upcoming
fix.

(cherry picked from commit 1efddfa6bfdcaf57198866db67e49b40442d278f)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-12 17:54:22 +02:00
Willy Tarreau
1cb273c718 BUG/MINOR: sched: properly account for the CPU time of dying tasks
When task profiling is enabled, the scheduler can measure and report
the cumulated time spent in each task and their respective latencies. But
this was wrong for tasks with few wakeups as well as for self-waking ones,
because the call date needed to measure how long it takes to process the
task is retrieved in the task itself (->wake_date was turned to the call
date), and we could face two conditions:
  - a new wakeup while the task is executing would reset the ->wake_date
    field before returning and make abnormally low values being reported;
    that was likely the case for taskèrun_applet for self-waking applets;

  - when the task dies, NULL is returned and the call date couldn't be
    retrieved, so that CPU time was not being accounted for. This was
    particularly visible with process_stream() which is usually called
    only twice per request, and whose time was systematically halved.

The cleanest solution here is to keep in mind that the scheduler already
uses quite a bit of local context in th_ctx, and place the intermediary
values there so that they cannot vanish. The wake_date has to be reset
immediately once read, and only its copy is used along the function. Note
that this must be done both for tasks and tasklet, and that until recently
tasklets were also able to report wrong values due to their sole dependency
on TH_FL_TASK_PROFILING between tests.

One nice benefit for future improvements is that such information will now
be available from the task without having to be stored into the task itself
anymore.

Since the tasklet part was computed on wrapping 32-bit arithmetics and
the task one was on 64-bit, the values were now consistently moved to
32-bit as it's already largely sufficient (4s spent in a task is more
than twice what the watchdog would tolerate). Some further cleanups might
be necessary, but the patch aimed at staying minimal.

Task profiling output after 1 million HTTP request previously looked like
this:

  Tasks activity:
    function                      calls   cpu_tot   cpu_avg   lat_tot   lat_avg
    h1_io_cb                    2012338   4.850s    2.410us   12.91s    6.417us
    process_stream              2000136   9.594s    4.796us   34.26s    17.13us
    sc_conn_io_cb               2000135   1.973s    986.0ns   30.24s    15.12us
    h1_timeout_task                 137      -         -      2.649ms   19.34us
    accept_queue_process             49   152.3us   3.107us   321.7yr   6.564yr
    main+0x146430                     7   5.250us   750.0ns   25.92us   3.702us
    srv_cleanup_idle_conns            1   559.0ns   559.0ns   918.0ns   918.0ns
    task_run_applet                   1      -         -      2.162us   2.162us

  Now it looks like this:
  Tasks activity:
    function                      calls   cpu_tot   cpu_avg   lat_tot   lat_avg
    h1_io_cb                    2014194   4.794s    2.380us   13.75s    6.826us
    process_stream              2000151   20.01s    10.00us   36.04s    18.02us
    sc_conn_io_cb               2000148   2.167s    1.083us   32.27s    16.13us
    h1_timeout_task                 198   54.24us   273.0ns   3.487ms   17.61us
    accept_queue_process             52   158.3us   3.044us   409.9us   7.882us
    main+0x1466e0                    18   16.77us   931.0ns   63.98us   3.554us
    srv_cleanup_toremove_conns        8   282.1us   35.26us   546.8us   68.35us
    srv_cleanup_idle_conns            3   149.2us   49.73us   8.131us   2.710us
    task_run_applet                   3   268.1us   89.38us   11.61us   3.871us

Note the two-fold difference on process_stream().

This feature is essentially used for debugging so it has extremely limited
impact. However it's used quite a bit more in bug reports and it would be
desirable that at least 2.6 gets this fix backported. It depends on at least
these two previous patches which will then also have to be backported:

     MINOR: task: permanently enable latency measurement on tasklets
     CLEANUP: task: rename ->call_date to ->wake_date

(cherry picked from commit 62b5b96bcc91985cb6bf6a30264ef3c54315c7c7)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-12 17:54:22 +02:00
Christopher Faulet
43bd98151e BUG/MINOR: task: Fix detection of tasks profiling in tasklet_wakeup_after()
The regression was introduced when ad548b54a7 ["MINOR: task: Add
tasklet_wakeup_after()"] was backported to 2.6 (21e0c31695).
TH_FL_TASK_PROFILING flag does not exist. To detect if tasks profiling is
enabled, "task_profiling_mask" variable must be used.

It is a 2.6-specific issue. Thus there is no upstream commit ID. This patch
must be backported if the commit above is also backported. For now, no
backport is needed.
2022-09-12 17:54:22 +02:00
Willy Tarreau
41f645a05a CLEANUP: task: rename ->call_date to ->wake_date
This field is misnamed because its real and important content is the
date the task was woken up, not the date it was called. It temporarily
holds the call date during execution but this remains confusing. In
fact before the latency measurements were possible it was indeed a call
date. Thus is will now be called wake_date.

This change is necessary because a subsequent fix will require the
introduction of the real call date in the thread ctx.

(cherry picked from commit 04e50b3d325fa35ce9557701513773a8a84e9230)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-12 17:54:22 +02:00
Willy Tarreau
c432738cb0 MINOR: task: permanently enable latency measurement on tasklets
When tasklet latency measurement was enabled in 2.4 with commit b2285de04
("MINOR: tasks: also compute the tasklet latency when DEBUG_TASK is set"),
the feature was conditionned on DEBUG_TASK because the field would add 8
bytes to the struct tasklet.

This approach was not a very good idea because the struct ends on an int
anyway thus it does finish with a 32-bit hole regardless of the presence
of this field. What is true however is that adding it turned a 64-byte
struct to 72-byte when caller debugging is enabled.

This patch revisits this with a minor change. Now only the lowest 32
bits of the call date are stored, so they always fit in the remaining
hole, and this allows to remove the dependency on DEBUG_TASK. With
debugging off, we're now seeing a 48-byte struct, and with debugging
on it's exactly 64 bytes, thus still exactly one cache line. 32 bits
allow a latency of 4 seconds on a tasklet, which already indicates a
completely dead process, so there's no point storing the upper bits at
all. And even in the event it would happen once in a while, the lost
upper bits do not really add any value to the debug reports. Also, now
one tasklet wakeup every 4 billion will not be sampled due to the test
on the value itself. Similarly we just don't care, it's statistics and
the measurements are not 9-digit accurate anyway.

(cherry picked from commit 768c2c5678d462a3622492a1230946978292571e)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-12 17:54:22 +02:00
Willy Tarreau
87f8732e1b BUG/MINOR: task: make task_instant_wakeup() work on a task not a tasklet
There's a subtle (harmless) bug in task_instant_wakeup(). As it uses
some tasklet code instead of some task code, the debug part also acts
on the tasklet equivalent, and the call_date is only set when DEBUG_TASK
is set instead of inconditionally like with tasks. As such, without this
debugging macro, call dates are not updated for tasks woken this way.

There isn't any impact yet because this function was introduced in 2.6 to
solve certain classes of issues and is not used yet, and in the worst case
it would only affect the reported latency time.

This may be backported to 2.6 in case a future fix would depend on it but
currently will not fix existing code.

(cherry picked from commit 0fae3a0360314285a17153cac76413184143ee74)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-12 17:54:22 +02:00
Willy Tarreau
c94b84c6ba BUG/MINOR: task: always reset a new tasklet's call date
The tasklet's call date was not reset, so if profiling was enabled while
some tasklets were in the run queue, their initial random value could be
used to preload a bogus initial latency value into the task profiling bin.
Let's just zero the initial value.

This should be backported to 2.4 as it was brought with initial commit
b2285de04 ("MINOR: tasks: also compute the tasklet latency when DEBUG_TASK
is set"). The impact is very low though.

(cherry picked from commit f27acd961e9b4291f80bc54100e57969ec4372ec)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-12 17:54:22 +02:00
Frédéric Lécaille
d404122c2f BUG/MINOR: quic: Wrong connection ID to thread ID association
To work, quic_pin_cid_to_tid() must set cid[0] to a value with <target_id>
as <global.nbthread> modulo. For each integer n, (n - (n % m)) + d has always
d as modulo m (with d < m).

So, this statement seemed correct:

     cid[0] = cid[0] - (cid[0] % global.nbthread) + target_tid;

except when n wraps or when another modulo is applied to the addition result.
Here, for 8bit modulo arithmetic, if m does not divides 256, this cannot
works for values which wraps when we increment them by d.
For instance n=255 m=3 and d=1 the formula result is 0 (should be d).

To fix this, we first limit c[0] to 255 - <target_id> to prevent c[0] from wrapping.

Thank you to @esb for having reported this issue in GH #1855.

Must be backported to 2.6

(cherry picked from commit 3122c75fd1f9a73a13ec533a4f313be0af1c5348)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-12 17:54:22 +02:00
Frédéric Lécaille
089ed1081c MINOR: quic: No TRACE_LEAVE() in retrieve_qc_conn_from_cid()
This macro was confused with TRACE_ENTER().

Should be backported to 2.6.

(cherry picked from commit 614742b79c63626cf477b4a85779db41223adbf9)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-12 17:54:22 +02:00
Frédéric Lécaille
c29b565e08 MINOR: quic: Add traces about sent or resent TX frames
Very useful to help in debugging issues, especially during retransmissions.

Should be backported to 2.6

(cherry picked from commit 449804e27dba70949b8495f46ee8de5664a5ddd1)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-12 17:54:22 +02:00
William Lallemand
1c2991ec14 MINOR: quic: add QUIC support when no client_hello_cb
Add QUIC support to the ssl_sock_switchctx_cbk() variant used only when
no client_hello_cb is available.

This could be used with libreSSL implementation of QUIC for example.
It also works with quictls when HAVE_SSL_CLIENT_HELLO_CB is removed from
openss-compat.h

(cherry picked from commit 70a6e637b47d8e0ccf49dff8e2f3f4bb1a9c0b29)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-12 17:54:22 +02:00
William Lallemand
069ad6acc3 BUILD: quic: fix the #ifdef in ssl_quic_initial_ctx()
As done on with ssl_sock_initial_ctx(), cleanup the ifdef for the
client_hello_cb and the no anti replay.

(cherry picked from commit 373ce73695541b9bdb9826a63a6a092cb2dbe779)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-12 17:54:22 +02:00
William Lallemand
825468a159 BUILD: ssl: fix the ifdef mess in ssl_sock_initial_ctx
ssl_sock_initial_ctx uses the wrong #ifdef to check the availability of
the client_hello_cb.

Cleanup the #ifdef, add comments and indentation.

(cherry picked from commit 4b7938d1604ce5cd782693add21b461b634a8005)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-12 17:54:22 +02:00
William Lallemand
927e42b419 BUILD: quic: enable early data only with >= openssl 1.1.1
Disable the early data in the QUIC code when not built with openssl >=
1.1.1.

LibreSSL 3.6.0 is impacted.

(cherry picked from commit e6ec626ac5b21041b997de350f29e385c479155d)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-09-12 17:54:22 +02:00