18396 Commits

Author SHA1 Message Date
Frédéric Lécaille
3da16eabd4 BUG/MINOR: quic: Endless loop during retransmissions
qc_dgrams_retransmit() could reuse the same local list and could splice it two
times to the packet number space list of frame to be send/resend. This creates a
loop in this list and makes qc_build_frms() possibly endlessly loop when trying
to build frames from the packet number space list of frames. Then haproxy aborts.

This issue could be easily reproduced patching qc_build_frms() function to set <dlen>
variable value to 0 after having built at least 10 CRYPTO frames and using ngtcp2
as client with 30% packet loss in both direction.

Thank you to @gabrieltz for having reported this issue in GH #1903.

Must be backported to 2.6.

(cherry picked from commit 7b5d9b1f03cef92bda6cd2a3be93b9bbbfd61734)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-12-02 15:31:34 +01:00
Amaury Denoyelle
6ce0ffdb3d CLEANUP: ncbuf: use standard BUG_ON with DEBUG_STRICT
ncbuf can be compiled for haproxy or standalone to run unit test suite.
For the latest mode, BUG_ON() macro has been re-implemented in a simple
version.

The inclusion of the default or the redefined macro relied on DEBUG_DEV.
Change this to now rely on DEBUG_STRICT as this is activated for the
default build.

This change is safe as only BUG_ON_HOT() macro is used in ncbuf code,
which is activated only with the default value DEBUG_STRICT=2.

This should be backported up to 2.6.

(cherry picked from commit 2526a6aca5c6f453571f2ccdcea7e2ab66aeea67)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-12-02 15:31:31 +01:00
Amaury Denoyelle
8083a52d73 CLEANUP: ncbuf: inline small functions
ncbuf API relies on lot of small functions. Mark these functions as
inline to reduce call invocations and facilitate compiler optimizations
to reduce code size.

This should be backported up to 2.6.

(cherry picked from commit d64a26f0238f386065e26654e6a8a925f96c8baa)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-12-02 15:31:28 +01:00
Amaury Denoyelle
213d39921a CLEANUP: ncbuf: remove ncb_blk args by value
ncb_blk structure is used to represent a block of data or a gap in a
non-contiguous buffer. This is used in several functions for ncbuf
implementation. Before this patch, ncb_blk was passed by value, which is
sub-optimal. Replace this by const pointer arguments.

This has the side-effect of suppressing a compiler warning reported in
older GCC version :
  CC      src/http_conv.o
  src/ncbuf.c: In function 'ncb_blk_next':
  src/ncbuf.c:170: warning: 'blk.end' may be used uninitialized in this function

This should be backported up to 2.6.

(cherry picked from commit 17e20e8cefbdd6518f362325cd528aee2bcdb277)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-12-02 15:31:24 +01:00
Willy Tarreau
d4223c17e7 SCRIPTS: announce-release: add a link to the data plane API
Since Marko announced at HAProxyConf 2022 that the data plane API is
mostly complete and will now follow the same release cycle as haproxy
starting with 2.7, it's probably the right moment to encourage users
to start trying it so that we can hope to migrate all the painful
discovery stuff there in a not too distant future.

Let's just point to the latest release for now. We'll see in the future
if we need to adapt the link depending on the branch.

(cherry picked from commit e3a02d5e08e4ee4f70bb4553694bc32adb7931fc)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-12-02 15:31:03 +01:00
Willy Tarreau
541ef40850 DOC: config: clarify the -m dir and -m dom pattern matching methods
There's regularly some confusion about them (do they match at the
beginning, end ? do they support multiple components etc). Tim
suggested to improve the doc in issue #61, it's never too late, so
let's do it now wih a few examples.

(cherry picked from commit f386a2de92702386ff73aa9da662ebed2766bfbb)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-12-02 15:29:48 +01:00
Willy Tarreau
e29a4d1547 DOC: config: clarify the fact that "retries" is not just for connections
In issue #412 it was rightfully reported that the wording in "retries"
still exclusively speaks about connection attempts, while since L7
retries with "retry-on" it's no longer a limitation. Let's update the
text.

(cherry picked from commit 0b4a622b49437e9f41f764a3909f43b7cbd04f61)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-12-02 15:29:38 +01:00
Willy Tarreau
73f69e72a7 DOC: config: explain how default matching method for ACL works
In issue #698, it's made apparent that the default matching method for
ACL keywords can be confusing when a converter is applied, because
depending on the converters used, users may think that the default
matching method from the sample fetch name might apply to the whole
expression. It's easier to understand that this doesn't make sense
when thinking about converters turning to completely different types
(e.g. hdr_beg(host),do_resolve() returns an IP, thus it's obvious
that _beg makes no sense at all).  This patch states this in the
doc to avoid future confusion.

(cherry picked from commit 4f4fea417b2c80b2db738a8cf2ea0a3c67ccfc7b)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-12-02 15:29:28 +01:00
Willy Tarreau
104cf6ed1a DOC: config: mention that a single monitor-uri rule is supported
It was reported in issue #1059 that when multiple monitor-uri rules are
specified, only the last one is used. While this was done on purpose
since a single URI is used, it was not clearly mentioned in the doc,
possibly leading to confusion or wasted time trying to establish a
working setup. Let's clarify this point.

(cherry picked from commit 7fe0c62516ac0cb54290f8b6febe351038792e98)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-12-02 15:29:18 +01:00
Willy Tarreau
0b41d6be9b DOC: config: clarify the fact that SNI should not be used in HTTP scenarios
As reported by Tim in issue #1373 some warnings are deserved to explain
why using the frontend SNI for routing or connecting to a server is
usually not correct, especially since it can be tempting and used to
make sense in pure TCP scenarios.

(cherry picked from commit d26fb57e817da6cb587c0b61baf60a011e756c49)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-12-02 15:29:11 +01:00
Willy Tarreau
0b80fa9093 DOC: config: refer to section about quoting in the "add_item" converter
As requested by Nick in issue #1719, let's add a reference to the section
about quoting there, since add_item() will often be used with commas and
it's easy to mess up.

(cherry picked from commit b143d110bf6b5fa65ba864e153f7cc0ee1c5ebac)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-12-02 15:28:29 +01:00
Willy Tarreau
e3bafd5980 DOC: config: provide some configuration hints for "http-reuse"
This adds some configuration hints regarding various workloads that do
not manage to achieve high reuse rates due to too low a global maxconn
or thread groups.

This fixes github issue #1472.

(cherry picked from commit 44fce8bd73cc743afc7e7f3f70a5a177a9557e62)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-12-02 15:28:15 +01:00
Christopher Faulet
f7acbd8407 Revert "BUG/MINOR: http-htx: Don't consider an URI as normalized after a set-uri action"
This reverts commit 7533b98b8a4023f815eb7374fa3118bed67a3fc5.

This fix is reverted for now because it may introduce issues with some
config. So we want to announce the possible breakage in the 2.6.7 and
include the fix into the 2.6.8. This way, users will have some time to
modify their configuration.
2022-12-02 15:22:28 +01:00
William Lallemand
b1351c1a05 BUG/MINOR: ssl: shut the ca-file errors emitted during httpclient init
With an OpenSSL library which use the wrong OPENSSLDIR, HAProxy tries to
load the OPENSSLDIR/certs/ into @system-ca, but emits a warning when it
can't.

This patch fixes the issue by allowing to shut the error when the SSL
configuration for the httpclient is not explicit.

Must be backported in 2.6.

(cherry picked from commit 0a2d63236c4ada9a33f7e9495aa332fdcd9f5f82)
[wla: context changed in httpclient_precheck()]
Signed-off-by: William Lallemand <wlallemand@haproxy.org>
2022-11-25 09:58:29 +01:00
William Lallemand
d371520001 MINOR: ssl: forgotten newline in error messages on ca-file
Add forgotten newlines in ssl_store_load_ca_from_buf() error messages.

(cherry picked from commit 3992f55ff390b3db423ffbc7fd322e84a7256ab4)
Signed-off-by: William Lallemand <wlallemand@haproxy.org>
2022-11-25 09:52:15 +01:00
William Lallemand
854e30d7d6 MINOR: ssl: enhance ca-file error emitting
Enhance the errors and warnings when trying to load a ca-file with
ssl_store_load_locations_file().

Add errors from ERR_get_error() and strerror to give more information to
the user.

(cherry picked from commit 0f17ab2fdd83ff4f58151a44a8fe1bc22e5b9c18)
Signed-off-by: William Lallemand <wlallemand@haproxy.org>
2022-11-25 09:51:34 +01:00
Amaury Denoyelle
2c4ad1e89d BUG/MEDIUM: quic: fix datagram dropping on queueing failed
After reading a datagram, it is enqueud for the thread attached to the
DCID. This is done via quic_lstnr_dgram_dispatch() function. If this
step fails, we remove the datagram from the buffer of quic_receiver_buf.

This step is faulty because we use b_del() instead of b_sub(). If
quic_receiver_buf was not empty, we will remove content from another
datagram while leaving the content of the last read datagram. This
probably produces valid datagram dropping and may even result in crash.

As stated, this bug can only happen if qc_lstnr_dgram_dispatch() fails
which happen on two occaions :
* on quic_dgram allocation failure, which should be pretty rare
* on datagram labelled as invalid for QUIC protocol. This may happen
  more frequently depending on the network conditions. Thus, this bug
  has been labelled as a medium one.

This should be backported up to 2.6.

(cherry picked from commit 9875f024baa450f412e7dfacdd7172bb57a39b8e)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:30:35 +01:00
Aurelien DARRAGON
d8a7a1af58 DOC: configuration.txt: fix typo in table_idle signature
An extra ',' was mistakenly added in table_idle converter signature
with commit ed36968 ("DOC: configuration.txt: add default_value for
table_idle signature").

(cherry picked from commit fd766ddfaf5b08402c4b32e607c1aa1572b063fe)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:29:18 +01:00
Christopher Faulet
df6e94633e BUG/MINOR: mux-h1: Fix handling of 408-Request-Time-Out
When a timeout is detected waiting for the request, a 408-Request-Time-Out
response is sent. However, an error was introduced by commit 6858d76cd3
("BUG/MINOR: mux-h1: Obey dontlognull option for empty requests"). Instead
of inhibiting the log message, the option was stopping the error sending.

Of course, we must do the opposite.

This patch must be backported as far as 2.4.

(cherry picked from commit 227424450c874cd17fbf81af487c2f2775668877)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:29:04 +01:00
Christopher Faulet
9be48aa215 BUILD: ssl-sock: Silent error about NULL deref in ssl_sock_bind_verifycbk()
In ssl_sock_bind_verifycbk(), when compiled without QUIC support, the
compiler may report an error during compilation about a possible NULL
dereference:

src/ssl_sock.c: In function ‘ssl_sock_bind_verifycbk’:
src/ssl_sock.c:1738:12: error: potential null pointer dereference [-Werror=null-dereference]
 1738 |         ctx->xprt_st |= SSL_SOCK_ST_FL_VERIFY_DONE;
      |         ~~~^~~~~~~~~

A BUG_ON() was addeded because it must never happen. But when compiled
without DEBUG_STRICT, there is nothing to help the compiler. Thus
ALREADY_CHECKED() macro is used. The ssl-sock context and the bind config
are concerned.

This patch must be backported to 2.6.

(cherry picked from commit 881cce9f139ddc4f02994ad443f9fc250e34da7c)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:28:57 +01:00
Aurelien DARRAGON
2c11d36960 DOC: configuration.txt: add default_value for table_idle signature
table_idle converter takes optional default_value argument.
The documentation correctly describes this usage but <default_value> was
missing in the converter signature.

It should be backported with bbeec37b3 ("MINOR: stick-table:
Add table_expire() and table_idle() new converters")

(cherry picked from commit ed36968f16d15336e34045319d0f220e04d4e733)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:28:45 +01:00
Christopher Faulet
9d7b6975f7 BUILD: http-htx: Silent build error about a possible NULL start-line
In http_replace_req_uri(), if the URI was successfully replaced, it means
the start-line exists. However, the compiler reports an error about a
possible NULL pointer dereference:

src/http_htx.c: In function ‘http_replace_req_uri’:
src/http_htx.c:392:19: error: potential null pointer dereference [-Werror=null-dereference]
  392 |         sl->flags &= ~HTX_SL_F_NORMALIZED_URI;

So, ALREADY_CHECKED() macro is used to silent the build error. This patch
must be backported with 84cdbe478a ("BUG/MINOR: http-htx: Don't consider an
URI as normalized after a set-uri action").

(cherry picked from commit 92c2de1a06a5d2be4f498503fa09372256339c8b)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:28:29 +01:00
Christopher Faulet
7533b98b8a BUG/MINOR: http-htx: Don't consider an URI as normalized after a set-uri action
An abosulte URI is marked as normalized if it comes from an H2 client. This
way, we know we can send a relative URI to an H1 server. But, after a
set-uri action, the URI must no longer be considered as normalized.
Otherwise there is no way to send an absolute URI on the server side.

If it is important to update a normalized absolute URI without altering this
property, the host, path and/or query-string must be set separatly.

This patch should fix the issue #1938. It should be backported as far as
2.4.

(cherry picked from commit 84cdbe478a82afdcaf4f049e8ed431ca349c6ba2)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:27:47 +01:00
Aurelien DARRAGON
0a069e2482 BUG/MINOR: log: fix parse_log_message rfc5424 size check
In parse_log_message(), if log is rfc5424 compliant, p pointer
is incremented and size is not. However size is still used in further
checks as if p pointer was not incremented.

This could lead to logic error or buffer overflow if input buf is not
null-terminated.

Fixing this by making sure size is up to date where it is needed.

It could be backported up to 2.4.

(cherry picked from commit ab9efc25f07b9870c827e2da05fe23a084a8b8f4)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:26:23 +01:00
Aurelien DARRAGON
c33990458c BUG/MINOR: cfgparse-listen: fix ebpt_next_dup pointer dereference on proxy "from" inheritance
ebpt_next_dup() was used 2 times in a row but only the first call was
checked against NULL, probably assuming that the 2 calls always yield the
same result here.

gcc is not OK with that, and it should be safer to store the result of
the first call in a temporary var to dereference it once checked against NULL.

This should fix GH #1869.
Thanks to Ilya for reporting this issue.

It may be backported up to 2.4.

(cherry picked from commit 9dce88ba2cb73b6c3665b4eeaa74186ebce2c7a5)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:26:18 +01:00
Amaury Denoyelle
bc0dd61e23 DOC: quic: add note on performance issue with listener contention
Complete quic4/quic6 bind lines by a note on performance issues due to
receiver socket contention. Suggest to use sharding to improve the
situation.

This should be backported up to 2.6.

(cherry picked from commit 7078fb1f3a11b1af3170e47cbc250387b3eb6ee5)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:26:15 +01:00
Willy Tarreau
9840786232 BUILD: listener: fix build warning on global_listener_rwlock without threads
The global_listener_rwlock was introduced by recent commit 13e86d947
("BUG/MEDIUM: listener: Fix race condition when updating the global mngmt
task"), but it's declared static and is not used when threads are disabled,
thus causing a warning to be emitted in this case. Let's just condition it
to thread usage to shut the warning.

This will need to be backported where the patch above is backported.

(cherry picked from commit 469fa479501f4807d9983ca46618aba3c4ec8cb7)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:26:06 +01:00
Willy Tarreau
51743eea8b BUG/MINOR: server/idle: at least use atomic stores when updating max_used_conns
In 2.2, some idle conns usage metrics were added by commit cf612a045
("MINOR: servers: Add a counter for the number of currently used
connections."), which mentioned that the operation doesn't need to be
atomic since we're not seeking exact values. This is true but at least
we should use atomic stores to make sure not to cause invalid values
to appear on archs that wouldn't guarantee atomicity when writing an
int, such as writing two 16-bit words. This is pretty unlikely on our
targets but better keep the code safe against this.

This may be backported as far as 2.2.

(cherry picked from commit 9dc231a6b23fc7d5cf3c233b46e00b9e251325b4)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:25:57 +01:00
Amaury Denoyelle
21301a0d7c MINOR: global: generate random cluster.secret if not defined
If no cluster-secret is defined by the user, a random one is silently
generated.

This ensures that at least QUIC Retry tokens are generated if abnormal
conditions are detected. However, it is advisable to specify it in the
configuration for tokens to be valid even after a reload or across LBs
instances in the same cluster.

This should be backported up to 2.6.

(cherry picked from commit 28ea31c7cb219bf6f85533b3ef57edfecba821d8)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:25:56 +01:00
Amaury Denoyelle
815649ca7b MINOR: quic: report error if force-retry without cluster-secret
QUIC Retry generation relies on global cluster-secret to produce token
valid even after a process restart and across several LBs instances.

Before this patch, Retry is automatically deactivated if no
cluster-secret is provided. This is the case even if a user has
configured a QUIC listener with quic-force-retry. Change this behavior
by now returning an error during configuration parsing. The user must
provide a cluster-secret if quic-force-retry is used.

This shoud be backported up to 2.6.

(cherry picked from commit 996ca7d0fa6a01ef6f4e5c6a9fc511cdcf06afe2)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:25:23 +01:00
Amaury Denoyelle
5d67b316c3 DOC: configuration: fix quic prefix typo
Replace quicv4/quicv6 -> quic4/quic6 as prefix for bind lines of QUIC
listeners.

This should be backported up to 2.6.

(cherry picked from commit 936c135e05ccee7efdbcf6a5adae37b8c1770b9a)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:25:23 +01:00
Frédéric Lécaille
38c47fb838 BUG/MAJOR: quic: Crash after discarding packet number spaces
This previous patch was not sufficient to prevent haproxy from
crashing when some Handshake packets had to be inspected before being possibly
retransmitted:

     "BUG/MAJOR: quic: Crash upon retransmission of dgrams with several packets"

This patch introduced another issue: access to packets which have been
released because still attached to others (in the same datagram). This was
the case for instance when discarding the Initial packet number space before
inspecting an Handshake packet in the same datagram through its ->prev or
member in our case.

This patch implements quic_tx_packet_dgram_detach() which detaches a packet
from the adjacent ones in the same datagram to be called when ackwowledging
a packet (as done in the previous commit) and when releasing its memory. This
was, we are sure the released packets will not be accessed during retransmissions.

Thank you to @gabrieltz for having reported this issue in GH #1903.

Must be backported to 2.6.

(cherry picked from commit 74b5f7b31b6c68e220e68cb4a0b302137a9a7362)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:25:23 +01:00
Frédéric Lécaille
05e719b7bd BUG/MAJOR: quic: Crash upon retransmission of dgrams with several packets
As revealed by some traces provided by @gabrieltz in GH #1903 issue,
there are clients (chrome I guess) which acknowledge only one packet among others
in the same datagram. This is the case for the first datagram sent by a QUIC haproxy
listener made an Initial packet followed by an Handshake one. In this identified
case, this is the Handshake packet only which is acknowledged. But if the
client is able to respond with an Handshake packet (ACK frame) this is because
it has successfully parsed the Initial packet. So, why not also acknowledging it?
AFAIK, this is mandatory. On our side, when restransmitting this datagram, the
Handshake packet was accessed from the Initial packet after having being released.

Anyway. There is an issue on our side. Obviously, we must not expect an
implementation to respect the RFC especially when it want to build an attack ;)

With this simple patch for each TX packet we send, we also set the previous one
in addition to the next one. When a packet is acknowledged, we detach the next one
and the next one in the same datagram from this packet, so that it cannot be
resent when resending these packets (the previous one, in our case).

Thank you to @gabrieltz for having reported this issue.

Must be backported to 2.6.

(cherry picked from commit 814645f42fae3e6dea994f88aa3b67cf43958dcf)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:25:23 +01:00
Amaury Denoyelle
dfe7efea2d MINOR: quic: complete traces/debug for handshake
Add more traces to follow CRYPTO data buffering in ncbuf. Offset for
quic_enc_level is now reported for event QUIC_EV_CONN_PRHPKTS. Also
ncb_advance() must never fail so a BUG_ON() statement is here to
guarantee it.

This was useful to track handshake failure reported too often. This is
related to github issue #1903.

This should be backported up to 2.6.

(cherry picked from commit 2f668f0e601a6ae5fa13a6e15b10dce5a42b4167)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:25:22 +01:00
Amaury Denoyelle
9f64b3f69e BUG/MEDIUM: quic: fix memleak for out-of-order crypto data
Liberate quic_enc_level ncbuf in quic_stream_free(). In most cases, this
will already be done when handshake is completed via
qc_treat_rx_crypto_frms(). However, if a connection is released before
handshake completion, a leak was present without this patch.

Under normal situation, this leak should have been limited due to the
majority of QUIC connection success on handshake. However, another bug
caused handshakes to fail too frequently, especially with chrome client.
This had the side-effect to dramatically increase this memory leak.

This should fix in part github issue #1903.

(cherry picked from commit bc174b2101f5487c9ff79ddc044a2466b7da1036)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:25:22 +01:00
Amaury Denoyelle
4f6e4c293b BUG/MEDIUM: quic: fix unsuccessful handshakes on ncb_advance error
QUIC handshakes were frequently in error due to haproxy misuse of
ncbuf. This resulted in one of the following scenario :
- handshake rejected with CONNECTION_CLOSE due to overlapping data
  rejected
- CRYPTO data fully received by haproxy but handshake completion signal
  not reported causing the client to emit PING repeatedly before timeout

This was produced because ncb_advance() result was not checked after
providing crypto data to the SSL stack in qc_provide_cdata(). However,
this operation can fail if a too small gap is formed. In the meantime,
quic_enc_level offset was always incremented. In some cases, this caused
next ncb_add() to report rejected overlapping data. In other cases, no
error was reported but SSL stack never received the end of CRYPTO data.

Change slightly the handling of new CRYPTO frames to avoid this bug :
when receiving a CRYPTO frame for the current offset, handle them
directly as previously done only if quic_enc_level ncbuf is empty. In
the other case, copy them to the buffer before treating contiguous data
via qc_treat_rx_crypto_frms().

This change ensures that ncb_advance() operation is now conducted only
in a data block : thus this is guaranteed to never fail.

This bug was easily reproduced with chromium as it fragments CRYPTO
frames randomly in several frames out of order.

This commit has two drawbacks :
- it is slightly less worst on performance because as sometimes even
  data at the current offset will be memcpy
- if a client uses too many fragmented CRYPTO frames, this can cause
  repeated ncb_add() error on gap size. This can be reproduced with
  chrome, albeit with a slighly less frequent rate than the fixed issue.

This change should fix in part github issue #1903.

This must be backported up to 2.6.

(cherry picked from commit ff95f2d44751422c7f05c06d344a83c6dbf40a49)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:25:22 +01:00
Amaury Denoyelle
bd54596933 MINOR: ncbuf: complete doc for ncb_advance()
ncb_advance() operation may reject the operation if a too small gap is
formed in buffer front. This must be documented to avoid an issue with
it.

This should be backported up to 2.6.

(cherry picked from commit 7f0295f08ae273f3d732436234cddb88b68f511f)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:25:22 +01:00
Christopher Faulet
71443462c4 BUILD: peers: Remove unused variables
Since 0909f62266 ("BUG/MEDIUM: peers: messages about unkown tables not
correctly ignored"), the 'sc' variable is no longer used in
peer_treat_updatemsg() and peer_treat_definemsg() functions. So, we must
remove them to avoid compilation warning.

This patch must be backported with the commit above.

(cherry picked from commit 4cfdcbbd19071cbf74339a7d26839ca98952b2c7)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:25:22 +01:00
Emeric Brun
88f8801bfb BUG/MEDIUM: peers: messages about unkown tables not correctly ignored
Table defintion's messages and update messages are not correctly
ignored if the table is not configured on the local peer.

It is a bug because, receiving those messages, the parser
returns an error and the upper layer considers that the state of the
peer's connection is modified (as it is done in the case of protocol
error) and switch immediatly the automate to process the new state.
But, even if message is silently ignored because the connection's
state doesn't change and we continue to process the next message, some
processing remains not performed: for instance the ALIVE flag is not set
on the peer's connection as it should be done after receiving any valid
messages. This results in a shutdown of the connection when timeout
is elapsed as if no message has been received during this delay.

This patch fix the behavior, those messages are now silently ignored
and the upper layer continue the processing as it is done for any valid
messages.

This bug appears with the code re-work of the peers on 2.0 so
it should be backported until this version.

(cherry picked from commit 0909f62266df1f023c2c373f0984a68acacba2a3)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:25:22 +01:00
William Lallemand
a054cb61ba BUG/MINOR: ssl: don't initialize the keylog callback when not required
The registering of the keylog callback seems to provoke a loss of
performance. Disable the registration as well as the fetches if
tune.ssl.keylog is off.

Must be backported as far as 2.2.

(cherry picked from commit b60a77b6d0a50c3a006b541908f69d6bd91b3e8c)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:25:22 +01:00
Christopher Faulet
369c66d040 BUG/MEDIUM: raw-sock: Don't report connection error if something was received
In raw_sock_to_buf(), if a low-level error is reported, we no longer
immediately set an error on the connexion if something was received. This
may happen when a RST is received with data. This way, we let a chance to
the mux to process received data first instead of immediately aborting.

This patch should fix some spurious health-check failures. It is pretty hard
to observe, but with a server immediately returning the response followed by
a RST, without waiting the request, it is possible to have some health-check
errors. For instance, with the following tcploop server:

  tcploop 8000 L Q W N1 A S:"HTTP/1.0 200 OK\r\n\r\n" F K
  ( Accept -> send response -> FIN -> Close)

we can have such strace output:

15:11:21.433005 socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 38
15:11:21.433141 fcntl(38, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
15:11:21.433233 setsockopt(38, SOL_TCP, TCP_NODELAY, [1], 4) = 0
15:11:21.433359 setsockopt(38, SOL_TCP, TCP_QUICKACK, [0], 4) = 0
15:11:21.433457 connect(38, {sa_family=AF_INET, sin_port=htons(8000), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
15:11:21.434215 epoll_ctl(4, EPOLL_CTL_ADD, 38, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP, data={u32=38, u64=38}}) = 0
15:11:21.434468 epoll_wait(4, [{events=EPOLLOUT, data={u32=38, u64=38}}], 200, 21) = 1
15:11:21.434810 recvfrom(38, 0x7f32a83e5020, 16320, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
15:11:21.435405 sendto(38, "OPTIONS / HTTP/1.0\r\ncontent-leng"..., 41, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 41
15:11:21.435833 epoll_ctl(4, EPOLL_CTL_MOD, 38, {events=EPOLLIN|EPOLLRDHUP, data={u32=38, u64=38}}) = 0
15:11:21.435907 epoll_wait(4, [{events=EPOLLIN|EPOLLERR|EPOLLHUP|EPOLLRDHUP, data={u32=38, u64=38}}], 200, 17) = 1
15:11:21.436024 recvfrom(38, "HTTP/1.0 200 OK\r\n\r\n", 16320, 0, NULL, NULL) = 19
15:11:21.436189 close(38)  = 0
15:11:21.436402 write(2, "[WARNING]  (163564) : Server bac"..., 184[WARNING]  (163564) : Server back-http/www is DOWN, reason: Socket error, check duration: 5ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.

The response was received, but it is ignored because an error was reported
too. The error handling must be refactored. But it a titanic stain. Thus,
for now, a good fix is to delay the error report when something was
received. The error will be reported on the next receive, if any.

This patch should fix the issue #1863, but it must be confirmed. At least it
fixes the above example. It must be backported to 2.6. For older versions,
it must be evaluated first.

(cherry picked from commit dfefebcd7a01faab9988d0aa0811d43bdfc7c665)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:25:22 +01:00
Aurelien DARRAGON
61882cc68c BUG/MINOR: http_ana/txn: don't re-initialize txn and req var lists
In http_create_txn(): vars_init_head() was performed on both s->vars_txn
and s->var_reqres lists.

But this is wrong, these two lists are already initialized upon stream
creation in stream_new().
Moreover, between stream_new() and http_create_txn(), some variable may
be defined (e.g.: by the frontend), resulting in lists not being empty.

Because of this "extra" list initialization, already defined variables
can be lost.
This causes txn dependant code not being able to access previously defined
variables as well as memory leak because http_destroy_txn() relies on these
lists to perform the purge.

This proved to be the case when a frontend sets variables and lua sample
fetch is used in backend section as described in GH #1935.
Many thanks to Darragh O'Toole for his detailed report.

Removing extra var_init_head (x2) in http_create_txn() to fix the issue.
Adding somme comments in the code in an attempt to prevent future misuses
of s->var_reqres, and s->var_txn lists.

It should be backported in every stable version.
(This is an old bug that seems to exist since 1.6-dev6)

[cf: On 2.0 and 1.8, for the legacy HTTP code, vars_init() are used during
     the TXN cleanup, when the stream is reused. So, these calls must be
     moved from http_init_txn() to http_reset_txn() and not removed.]

(cherry picked from commit 5ad2b642625b89cdf4f5fd26a598fc480abdc806)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:25:22 +01:00
Christopher Faulet
9cfdbf2dd7 BUG/MEDIUM: listener: Fix race condition when updating the global mngmt task
It is pretty similar to fbb934da90 ("BUG/MEDIUM: stick-table: fix a race
condition when updating the expiration task"). When the global management
task is running, at the end of its process function, it resets the expire
date by setting it to TICK_ETERNITY. In same time, a listener may reach a
global limit and decides to schedule the task. Thus it is possible to queue
the task and trigger the BUG_ON() on the expire date because its value was
set to TICK_ETERNITY in the means time:

  FATAL: bug condition "task->expire == 0" matched at src/task.c:310
    call trace(12):
    |       0x662de8 [b8 01 00 00 00 c6 00 00]: __task_queue+0xc7/0x11e
    |       0x63b03f [48 b8 04 00 00 00 05 00]: main+0x2535e
    |       0x63ed1a [e9 d2 fd ff ff 48 8b 45]: listener_accept+0xf72/0xfda
    |       0x6a36d3 [eb 01 90 c9 c3 55 48 89]: sock_accept_iocb+0x82/0x87
    |       0x6af22f [48 8b 05 ca f9 13 00 8b]: fd_update_events+0x35a/0x497
    |       0x42a7a8 [89 45 d8 83 7d d8 02 75]: main-0x1eb539
    |       0x6158fb [48 8b 05 e6 06 1c 00 64]: run_poll_loop+0x2e7/0x319
    |       0x615b6c [48 8b 05 ed 65 1d 00 48]: main-0x175
    | 0x7ffff775bded [e9 69 fe ff ff 48 8b 4c]: libc:+0x8cded
    | 0x7ffff77e1370 [48 89 c7 b8 3c 00 00 00]: libc:+0x112370

To fix the bug, a RW lock is introduced. It is used to fix the race
condition. A read lock is taken when the task is scheduled, in
listener_accpet() and a write lock is used at the end of process function to
set the expire date to TICK_ETERNITY. This lock should not be used very
often and most of time by "readers". So, the impact should be really
limited.

This patch should fix the issue #1930. It must be backported as far as 1.8
with some cautions because the code has evolved a lot since then.

(cherry picked from commit 13e86d947d1536dbe6d8f4057388727528546ae6)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:25:22 +01:00
Willy Tarreau
c65d18df9e BUG/MINOR: pool/cli: use ullong to report total pool usage in bytes
As noticed by Gabriel Tzagkarakis in issue #1903, the total pool size
in bytes is historically still in 32 bits, but at least we should report
the product of the number of objects and their size in 64 bits so that
the value doesn't wrap around 4G.

This may be backported to all versions.

(cherry picked from commit 0c5e9896c79cbec3507894ed33a8553127bc170e)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:25:22 +01:00
Amaury Denoyelle
566e8d57dc BUILD: quic: fix dubious 0-byte overflow on qc_release_lost_pkts
With GCC 12.2.0 and O2 optimization activated, compiler reports the
following warning for qc_release_lost_pkts().

In function ‘quic_tx_packet_refdec’,
    inlined from ‘qc_release_lost_pkts.constprop’ at src/quic_conn.c:2056:3:
include/haproxy/atomic.h:320:41: error: ‘__atomic_sub_fetch_4’ writing 4 bytes into a region of size 0 overflows the destination [-Werror=stringop-overflow=]
  320 | #define HA_ATOMIC_SUB_FETCH(val, i)     __atomic_sub_fetch(val, i, __ATOMIC_SEQ_CST)
      |                                         ^~~~~~~~~~~~~~~~~~
include/haproxy/quic_conn.h:499:14: note: in expansion of macro ‘HA_ATOMIC_SUB_FETCH’
  499 |         if (!HA_ATOMIC_SUB_FETCH(&pkt->refcnt, 1)) {
      |              ^~~~~~~~~~~~~~~~~~~

GCC thinks that quic_tx_packet_refdec() can be called with a NULL
argument from qc_release_lost_pkts() with <oldest_lost> as arg.

This warning is a false positive as <oldest_lost> cannot be NULL in
qc_release_lost_pkts() at this stage. This is due to the previous check
to ensure that <pkts> list is not empty.

This warning is silenced by using ALREADY_CHECKED() macro.

This should be backported up to 2.6.

This should fix github issue #1852.

(cherry picked from commit 3a72ba2aede63e49e53fe22b80e07c9d3c8f72e3)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:25:22 +01:00
Willy Tarreau
857095c121 BUG/MEDIUM: ring: fix creation of server in uninitialized ring
If a "ring" section initialization fails (e.g. due to a duplicate name,
invalid chars, or missing memory), any subsequent "server" statement that
appears in the same section will crash the config parser by dereferencing
the currently NULL cfg_sink. E.g:

    ring x
    ring x                 # fails on "already exists"
       server srv 1.1.1.1  # crashes on cfg_sink==NULL

All other statements have a test for this but "server" was missing it,
so this patch adds it.

Thanks to Joel Hutchinson for reporting this issue.

This must be backported as far as 2.2.

(cherry picked from commit 1b662aabbfa32fb6ddeff4ff5f0e3031f12dafd3)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:25:22 +01:00
Willy Tarreau
abab4bd4d0 DOC: config: fix alphabetical ordering of global section
the global section keywords were seriously misordered, and it's visible
that some mistakes have induced other ones over time, so it was about
time to fix this. Roughly 20% of the keywords were misplaced.

This commit only reordered the keywords index and their description,
nothing else was changed. It might be backported because it's a real
pain to find certain options there.

(cherry picked from commit 8e6ad2548ce933ef52113b20f2766d66d16f3e39)
[cf: Context adjustment]
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:25:20 +01:00
Christopher Faulet
5d05aa64de REG-TESTS: cache: Remove T-E header for 304-Not-Modified responses
VTEST does not properly handle 304-Not-Modified responses. If a
Transfer-Encoding header (and probably a Content-Lenght header too), it
waits for a body. Waiting for a fix, the Transfer-Encoding encoding of
cached responses in 2 VTEST scripts are removed.

Note it is now an issue because of a fix in the H1 multiplexer :

  * 226082d13a "BUG/MINOR: mux-h1: Do not send a last null chunk on body-less answers"

This patch must be backported with the above commit.

(cherry picked from commit a0e1a87948cc1858e82c656d18344051838a4af0)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-18 11:33:37 +01:00
Mickael Torres
611bfc4356 BUG/MINOR: mux-h1: Do not send a last null chunk on body-less answers
HEAD answers should not contain any body data. Currently when a
"transfer-encoding: chunked" header is returned, a last null-chunk is added to
the answer. Some clients choke on it and fail when trying to reuse the
connection. Check that the response should not be body-less before sending the
null-chunk.

This patch should fix #1932. It must be backported as far as 2.4.

(cherry picked from commit 226082d13a0d0b83114e933b3a63916b18f9824b)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-18 11:33:37 +01:00
Christopher Faulet
7dbd951a95 BUG/MEDIUM: mux-fcgi: Avoid value length overflow when it doesn't fit at once
When the request data are copied in a mbuf, if the free space is too small
to copy all data at once, the data length is shortened. When this is
performed, we reserve the size of the STDIN recod header and eventually the
same for the empty STDIN record if it is the last HTX block of the request.

However, there is no test to be sure the free space is large enough. Thus,
on this special case, when the mbuf is almost full, it is possible to
overflow the value length. Because of this bug, it is possible to experience
crashes from time to time.

This patch should fix the issue #1923. It must be backported as far as 2.4.

(cherry picked from commit 52fd8a1b7b8a5a328cb5f4fabd42d2ca7af78760)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-18 11:33:37 +01:00