Commit Graph

18386 Commits

Author SHA1 Message Date
Willy Tarreau
0b80fa9093 DOC: config: refer to section about quoting in the "add_item" converter
As requested by Nick in issue #1719, let's add a reference to the section
about quoting there, since add_item() will often be used with commas and
it's easy to mess up.

(cherry picked from commit b143d110bf)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-12-02 15:28:29 +01:00
Willy Tarreau
e3bafd5980 DOC: config: provide some configuration hints for "http-reuse"
This adds some configuration hints regarding various workloads that do
not manage to achieve high reuse rates due to too low a global maxconn
or thread groups.

This fixes github issue #1472.

(cherry picked from commit 44fce8bd73)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-12-02 15:28:15 +01:00
Christopher Faulet
f7acbd8407 Revert "BUG/MINOR: http-htx: Don't consider an URI as normalized after a set-uri action"
This reverts commit 7533b98b8a.

This fix is reverted for now because it may introduce issues with some
config. So we want to announce the possible breakage in the 2.6.7 and
include the fix into the 2.6.8. This way, users will have some time to
modify their configuration.
2022-12-02 15:22:28 +01:00
William Lallemand
b1351c1a05 BUG/MINOR: ssl: shut the ca-file errors emitted during httpclient init
With an OpenSSL library which use the wrong OPENSSLDIR, HAProxy tries to
load the OPENSSLDIR/certs/ into @system-ca, but emits a warning when it
can't.

This patch fixes the issue by allowing to shut the error when the SSL
configuration for the httpclient is not explicit.

Must be backported in 2.6.

(cherry picked from commit 0a2d63236c)
[wla: context changed in httpclient_precheck()]
Signed-off-by: William Lallemand <wlallemand@haproxy.org>
2022-11-25 09:58:29 +01:00
William Lallemand
d371520001 MINOR: ssl: forgotten newline in error messages on ca-file
Add forgotten newlines in ssl_store_load_ca_from_buf() error messages.

(cherry picked from commit 3992f55ff3)
Signed-off-by: William Lallemand <wlallemand@haproxy.org>
2022-11-25 09:52:15 +01:00
William Lallemand
854e30d7d6 MINOR: ssl: enhance ca-file error emitting
Enhance the errors and warnings when trying to load a ca-file with
ssl_store_load_locations_file().

Add errors from ERR_get_error() and strerror to give more information to
the user.

(cherry picked from commit 0f17ab2fdd)
Signed-off-by: William Lallemand <wlallemand@haproxy.org>
2022-11-25 09:51:34 +01:00
Amaury Denoyelle
2c4ad1e89d BUG/MEDIUM: quic: fix datagram dropping on queueing failed
After reading a datagram, it is enqueud for the thread attached to the
DCID. This is done via quic_lstnr_dgram_dispatch() function. If this
step fails, we remove the datagram from the buffer of quic_receiver_buf.

This step is faulty because we use b_del() instead of b_sub(). If
quic_receiver_buf was not empty, we will remove content from another
datagram while leaving the content of the last read datagram. This
probably produces valid datagram dropping and may even result in crash.

As stated, this bug can only happen if qc_lstnr_dgram_dispatch() fails
which happen on two occaions :
* on quic_dgram allocation failure, which should be pretty rare
* on datagram labelled as invalid for QUIC protocol. This may happen
  more frequently depending on the network conditions. Thus, this bug
  has been labelled as a medium one.

This should be backported up to 2.6.

(cherry picked from commit 9875f024ba)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:30:35 +01:00
Aurelien DARRAGON
d8a7a1af58 DOC: configuration.txt: fix typo in table_idle signature
An extra ',' was mistakenly added in table_idle converter signature
with commit ed36968 ("DOC: configuration.txt: add default_value for
table_idle signature").

(cherry picked from commit fd766ddfaf)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:29:18 +01:00
Christopher Faulet
df6e94633e BUG/MINOR: mux-h1: Fix handling of 408-Request-Time-Out
When a timeout is detected waiting for the request, a 408-Request-Time-Out
response is sent. However, an error was introduced by commit 6858d76cd3
("BUG/MINOR: mux-h1: Obey dontlognull option for empty requests"). Instead
of inhibiting the log message, the option was stopping the error sending.

Of course, we must do the opposite.

This patch must be backported as far as 2.4.

(cherry picked from commit 227424450c)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:29:04 +01:00
Christopher Faulet
9be48aa215 BUILD: ssl-sock: Silent error about NULL deref in ssl_sock_bind_verifycbk()
In ssl_sock_bind_verifycbk(), when compiled without QUIC support, the
compiler may report an error during compilation about a possible NULL
dereference:

src/ssl_sock.c: In function ‘ssl_sock_bind_verifycbk’:
src/ssl_sock.c:1738:12: error: potential null pointer dereference [-Werror=null-dereference]
 1738 |         ctx->xprt_st |= SSL_SOCK_ST_FL_VERIFY_DONE;
      |         ~~~^~~~~~~~~

A BUG_ON() was addeded because it must never happen. But when compiled
without DEBUG_STRICT, there is nothing to help the compiler. Thus
ALREADY_CHECKED() macro is used. The ssl-sock context and the bind config
are concerned.

This patch must be backported to 2.6.

(cherry picked from commit 881cce9f13)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:28:57 +01:00
Aurelien DARRAGON
2c11d36960 DOC: configuration.txt: add default_value for table_idle signature
table_idle converter takes optional default_value argument.
The documentation correctly describes this usage but <default_value> was
missing in the converter signature.

It should be backported with bbeec37b3 ("MINOR: stick-table:
Add table_expire() and table_idle() new converters")

(cherry picked from commit ed36968f16)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:28:45 +01:00
Christopher Faulet
9d7b6975f7 BUILD: http-htx: Silent build error about a possible NULL start-line
In http_replace_req_uri(), if the URI was successfully replaced, it means
the start-line exists. However, the compiler reports an error about a
possible NULL pointer dereference:

src/http_htx.c: In function ‘http_replace_req_uri’:
src/http_htx.c:392:19: error: potential null pointer dereference [-Werror=null-dereference]
  392 |         sl->flags &= ~HTX_SL_F_NORMALIZED_URI;

So, ALREADY_CHECKED() macro is used to silent the build error. This patch
must be backported with 84cdbe478a ("BUG/MINOR: http-htx: Don't consider an
URI as normalized after a set-uri action").

(cherry picked from commit 92c2de1a06)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:28:29 +01:00
Christopher Faulet
7533b98b8a BUG/MINOR: http-htx: Don't consider an URI as normalized after a set-uri action
An abosulte URI is marked as normalized if it comes from an H2 client. This
way, we know we can send a relative URI to an H1 server. But, after a
set-uri action, the URI must no longer be considered as normalized.
Otherwise there is no way to send an absolute URI on the server side.

If it is important to update a normalized absolute URI without altering this
property, the host, path and/or query-string must be set separatly.

This patch should fix the issue #1938. It should be backported as far as
2.4.

(cherry picked from commit 84cdbe478a)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:27:47 +01:00
Aurelien DARRAGON
0a069e2482 BUG/MINOR: log: fix parse_log_message rfc5424 size check
In parse_log_message(), if log is rfc5424 compliant, p pointer
is incremented and size is not. However size is still used in further
checks as if p pointer was not incremented.

This could lead to logic error or buffer overflow if input buf is not
null-terminated.

Fixing this by making sure size is up to date where it is needed.

It could be backported up to 2.4.

(cherry picked from commit ab9efc25f0)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:26:23 +01:00
Aurelien DARRAGON
c33990458c BUG/MINOR: cfgparse-listen: fix ebpt_next_dup pointer dereference on proxy "from" inheritance
ebpt_next_dup() was used 2 times in a row but only the first call was
checked against NULL, probably assuming that the 2 calls always yield the
same result here.

gcc is not OK with that, and it should be safer to store the result of
the first call in a temporary var to dereference it once checked against NULL.

This should fix GH #1869.
Thanks to Ilya for reporting this issue.

It may be backported up to 2.4.

(cherry picked from commit 9dce88ba2c)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:26:18 +01:00
Amaury Denoyelle
bc0dd61e23 DOC: quic: add note on performance issue with listener contention
Complete quic4/quic6 bind lines by a note on performance issues due to
receiver socket contention. Suggest to use sharding to improve the
situation.

This should be backported up to 2.6.

(cherry picked from commit 7078fb1f3a)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:26:15 +01:00
Willy Tarreau
9840786232 BUILD: listener: fix build warning on global_listener_rwlock without threads
The global_listener_rwlock was introduced by recent commit 13e86d947
("BUG/MEDIUM: listener: Fix race condition when updating the global mngmt
task"), but it's declared static and is not used when threads are disabled,
thus causing a warning to be emitted in this case. Let's just condition it
to thread usage to shut the warning.

This will need to be backported where the patch above is backported.

(cherry picked from commit 469fa47950)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:26:06 +01:00
Willy Tarreau
51743eea8b BUG/MINOR: server/idle: at least use atomic stores when updating max_used_conns
In 2.2, some idle conns usage metrics were added by commit cf612a045
("MINOR: servers: Add a counter for the number of currently used
connections."), which mentioned that the operation doesn't need to be
atomic since we're not seeking exact values. This is true but at least
we should use atomic stores to make sure not to cause invalid values
to appear on archs that wouldn't guarantee atomicity when writing an
int, such as writing two 16-bit words. This is pretty unlikely on our
targets but better keep the code safe against this.

This may be backported as far as 2.2.

(cherry picked from commit 9dc231a6b2)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:25:57 +01:00
Amaury Denoyelle
21301a0d7c MINOR: global: generate random cluster.secret if not defined
If no cluster-secret is defined by the user, a random one is silently
generated.

This ensures that at least QUIC Retry tokens are generated if abnormal
conditions are detected. However, it is advisable to specify it in the
configuration for tokens to be valid even after a reload or across LBs
instances in the same cluster.

This should be backported up to 2.6.

(cherry picked from commit 28ea31c7cb)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:25:56 +01:00
Amaury Denoyelle
815649ca7b MINOR: quic: report error if force-retry without cluster-secret
QUIC Retry generation relies on global cluster-secret to produce token
valid even after a process restart and across several LBs instances.

Before this patch, Retry is automatically deactivated if no
cluster-secret is provided. This is the case even if a user has
configured a QUIC listener with quic-force-retry. Change this behavior
by now returning an error during configuration parsing. The user must
provide a cluster-secret if quic-force-retry is used.

This shoud be backported up to 2.6.

(cherry picked from commit 996ca7d0fa)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:25:23 +01:00
Amaury Denoyelle
5d67b316c3 DOC: configuration: fix quic prefix typo
Replace quicv4/quicv6 -> quic4/quic6 as prefix for bind lines of QUIC
listeners.

This should be backported up to 2.6.

(cherry picked from commit 936c135e05)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:25:23 +01:00
Frédéric Lécaille
38c47fb838 BUG/MAJOR: quic: Crash after discarding packet number spaces
This previous patch was not sufficient to prevent haproxy from
crashing when some Handshake packets had to be inspected before being possibly
retransmitted:

     "BUG/MAJOR: quic: Crash upon retransmission of dgrams with several packets"

This patch introduced another issue: access to packets which have been
released because still attached to others (in the same datagram). This was
the case for instance when discarding the Initial packet number space before
inspecting an Handshake packet in the same datagram through its ->prev or
member in our case.

This patch implements quic_tx_packet_dgram_detach() which detaches a packet
from the adjacent ones in the same datagram to be called when ackwowledging
a packet (as done in the previous commit) and when releasing its memory. This
was, we are sure the released packets will not be accessed during retransmissions.

Thank you to @gabrieltz for having reported this issue in GH #1903.

Must be backported to 2.6.

(cherry picked from commit 74b5f7b31b)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:25:23 +01:00
Frédéric Lécaille
05e719b7bd BUG/MAJOR: quic: Crash upon retransmission of dgrams with several packets
As revealed by some traces provided by @gabrieltz in GH #1903 issue,
there are clients (chrome I guess) which acknowledge only one packet among others
in the same datagram. This is the case for the first datagram sent by a QUIC haproxy
listener made an Initial packet followed by an Handshake one. In this identified
case, this is the Handshake packet only which is acknowledged. But if the
client is able to respond with an Handshake packet (ACK frame) this is because
it has successfully parsed the Initial packet. So, why not also acknowledging it?
AFAIK, this is mandatory. On our side, when restransmitting this datagram, the
Handshake packet was accessed from the Initial packet after having being released.

Anyway. There is an issue on our side. Obviously, we must not expect an
implementation to respect the RFC especially when it want to build an attack ;)

With this simple patch for each TX packet we send, we also set the previous one
in addition to the next one. When a packet is acknowledged, we detach the next one
and the next one in the same datagram from this packet, so that it cannot be
resent when resending these packets (the previous one, in our case).

Thank you to @gabrieltz for having reported this issue.

Must be backported to 2.6.

(cherry picked from commit 814645f42f)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:25:23 +01:00
Amaury Denoyelle
dfe7efea2d MINOR: quic: complete traces/debug for handshake
Add more traces to follow CRYPTO data buffering in ncbuf. Offset for
quic_enc_level is now reported for event QUIC_EV_CONN_PRHPKTS. Also
ncb_advance() must never fail so a BUG_ON() statement is here to
guarantee it.

This was useful to track handshake failure reported too often. This is
related to github issue #1903.

This should be backported up to 2.6.

(cherry picked from commit 2f668f0e60)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:25:22 +01:00
Amaury Denoyelle
9f64b3f69e BUG/MEDIUM: quic: fix memleak for out-of-order crypto data
Liberate quic_enc_level ncbuf in quic_stream_free(). In most cases, this
will already be done when handshake is completed via
qc_treat_rx_crypto_frms(). However, if a connection is released before
handshake completion, a leak was present without this patch.

Under normal situation, this leak should have been limited due to the
majority of QUIC connection success on handshake. However, another bug
caused handshakes to fail too frequently, especially with chrome client.
This had the side-effect to dramatically increase this memory leak.

This should fix in part github issue #1903.

(cherry picked from commit bc174b2101)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:25:22 +01:00
Amaury Denoyelle
4f6e4c293b BUG/MEDIUM: quic: fix unsuccessful handshakes on ncb_advance error
QUIC handshakes were frequently in error due to haproxy misuse of
ncbuf. This resulted in one of the following scenario :
- handshake rejected with CONNECTION_CLOSE due to overlapping data
  rejected
- CRYPTO data fully received by haproxy but handshake completion signal
  not reported causing the client to emit PING repeatedly before timeout

This was produced because ncb_advance() result was not checked after
providing crypto data to the SSL stack in qc_provide_cdata(). However,
this operation can fail if a too small gap is formed. In the meantime,
quic_enc_level offset was always incremented. In some cases, this caused
next ncb_add() to report rejected overlapping data. In other cases, no
error was reported but SSL stack never received the end of CRYPTO data.

Change slightly the handling of new CRYPTO frames to avoid this bug :
when receiving a CRYPTO frame for the current offset, handle them
directly as previously done only if quic_enc_level ncbuf is empty. In
the other case, copy them to the buffer before treating contiguous data
via qc_treat_rx_crypto_frms().

This change ensures that ncb_advance() operation is now conducted only
in a data block : thus this is guaranteed to never fail.

This bug was easily reproduced with chromium as it fragments CRYPTO
frames randomly in several frames out of order.

This commit has two drawbacks :
- it is slightly less worst on performance because as sometimes even
  data at the current offset will be memcpy
- if a client uses too many fragmented CRYPTO frames, this can cause
  repeated ncb_add() error on gap size. This can be reproduced with
  chrome, albeit with a slighly less frequent rate than the fixed issue.

This change should fix in part github issue #1903.

This must be backported up to 2.6.

(cherry picked from commit ff95f2d447)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:25:22 +01:00
Amaury Denoyelle
bd54596933 MINOR: ncbuf: complete doc for ncb_advance()
ncb_advance() operation may reject the operation if a too small gap is
formed in buffer front. This must be documented to avoid an issue with
it.

This should be backported up to 2.6.

(cherry picked from commit 7f0295f08a)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:25:22 +01:00
Christopher Faulet
71443462c4 BUILD: peers: Remove unused variables
Since 0909f62266 ("BUG/MEDIUM: peers: messages about unkown tables not
correctly ignored"), the 'sc' variable is no longer used in
peer_treat_updatemsg() and peer_treat_definemsg() functions. So, we must
remove them to avoid compilation warning.

This patch must be backported with the commit above.

(cherry picked from commit 4cfdcbbd19)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:25:22 +01:00
Emeric Brun
88f8801bfb BUG/MEDIUM: peers: messages about unkown tables not correctly ignored
Table defintion's messages and update messages are not correctly
ignored if the table is not configured on the local peer.

It is a bug because, receiving those messages, the parser
returns an error and the upper layer considers that the state of the
peer's connection is modified (as it is done in the case of protocol
error) and switch immediatly the automate to process the new state.
But, even if message is silently ignored because the connection's
state doesn't change and we continue to process the next message, some
processing remains not performed: for instance the ALIVE flag is not set
on the peer's connection as it should be done after receiving any valid
messages. This results in a shutdown of the connection when timeout
is elapsed as if no message has been received during this delay.

This patch fix the behavior, those messages are now silently ignored
and the upper layer continue the processing as it is done for any valid
messages.

This bug appears with the code re-work of the peers on 2.0 so
it should be backported until this version.

(cherry picked from commit 0909f62266)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:25:22 +01:00
William Lallemand
a054cb61ba BUG/MINOR: ssl: don't initialize the keylog callback when not required
The registering of the keylog callback seems to provoke a loss of
performance. Disable the registration as well as the fetches if
tune.ssl.keylog is off.

Must be backported as far as 2.2.

(cherry picked from commit b60a77b6d0)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:25:22 +01:00
Christopher Faulet
369c66d040 BUG/MEDIUM: raw-sock: Don't report connection error if something was received
In raw_sock_to_buf(), if a low-level error is reported, we no longer
immediately set an error on the connexion if something was received. This
may happen when a RST is received with data. This way, we let a chance to
the mux to process received data first instead of immediately aborting.

This patch should fix some spurious health-check failures. It is pretty hard
to observe, but with a server immediately returning the response followed by
a RST, without waiting the request, it is possible to have some health-check
errors. For instance, with the following tcploop server:

  tcploop 8000 L Q W N1 A S:"HTTP/1.0 200 OK\r\n\r\n" F K
  ( Accept -> send response -> FIN -> Close)

we can have such strace output:

15:11:21.433005 socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 38
15:11:21.433141 fcntl(38, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
15:11:21.433233 setsockopt(38, SOL_TCP, TCP_NODELAY, [1], 4) = 0
15:11:21.433359 setsockopt(38, SOL_TCP, TCP_QUICKACK, [0], 4) = 0
15:11:21.433457 connect(38, {sa_family=AF_INET, sin_port=htons(8000), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
15:11:21.434215 epoll_ctl(4, EPOLL_CTL_ADD, 38, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP, data={u32=38, u64=38}}) = 0
15:11:21.434468 epoll_wait(4, [{events=EPOLLOUT, data={u32=38, u64=38}}], 200, 21) = 1
15:11:21.434810 recvfrom(38, 0x7f32a83e5020, 16320, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
15:11:21.435405 sendto(38, "OPTIONS / HTTP/1.0\r\ncontent-leng"..., 41, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 41
15:11:21.435833 epoll_ctl(4, EPOLL_CTL_MOD, 38, {events=EPOLLIN|EPOLLRDHUP, data={u32=38, u64=38}}) = 0
15:11:21.435907 epoll_wait(4, [{events=EPOLLIN|EPOLLERR|EPOLLHUP|EPOLLRDHUP, data={u32=38, u64=38}}], 200, 17) = 1
15:11:21.436024 recvfrom(38, "HTTP/1.0 200 OK\r\n\r\n", 16320, 0, NULL, NULL) = 19
15:11:21.436189 close(38)  = 0
15:11:21.436402 write(2, "[WARNING]  (163564) : Server bac"..., 184[WARNING]  (163564) : Server back-http/www is DOWN, reason: Socket error, check duration: 5ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.

The response was received, but it is ignored because an error was reported
too. The error handling must be refactored. But it a titanic stain. Thus,
for now, a good fix is to delay the error report when something was
received. The error will be reported on the next receive, if any.

This patch should fix the issue #1863, but it must be confirmed. At least it
fixes the above example. It must be backported to 2.6. For older versions,
it must be evaluated first.

(cherry picked from commit dfefebcd7a)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:25:22 +01:00
Aurelien DARRAGON
61882cc68c BUG/MINOR: http_ana/txn: don't re-initialize txn and req var lists
In http_create_txn(): vars_init_head() was performed on both s->vars_txn
and s->var_reqres lists.

But this is wrong, these two lists are already initialized upon stream
creation in stream_new().
Moreover, between stream_new() and http_create_txn(), some variable may
be defined (e.g.: by the frontend), resulting in lists not being empty.

Because of this "extra" list initialization, already defined variables
can be lost.
This causes txn dependant code not being able to access previously defined
variables as well as memory leak because http_destroy_txn() relies on these
lists to perform the purge.

This proved to be the case when a frontend sets variables and lua sample
fetch is used in backend section as described in GH #1935.
Many thanks to Darragh O'Toole for his detailed report.

Removing extra var_init_head (x2) in http_create_txn() to fix the issue.
Adding somme comments in the code in an attempt to prevent future misuses
of s->var_reqres, and s->var_txn lists.

It should be backported in every stable version.
(This is an old bug that seems to exist since 1.6-dev6)

[cf: On 2.0 and 1.8, for the legacy HTTP code, vars_init() are used during
     the TXN cleanup, when the stream is reused. So, these calls must be
     moved from http_init_txn() to http_reset_txn() and not removed.]

(cherry picked from commit 5ad2b64262)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:25:22 +01:00
Christopher Faulet
9cfdbf2dd7 BUG/MEDIUM: listener: Fix race condition when updating the global mngmt task
It is pretty similar to fbb934da90 ("BUG/MEDIUM: stick-table: fix a race
condition when updating the expiration task"). When the global management
task is running, at the end of its process function, it resets the expire
date by setting it to TICK_ETERNITY. In same time, a listener may reach a
global limit and decides to schedule the task. Thus it is possible to queue
the task and trigger the BUG_ON() on the expire date because its value was
set to TICK_ETERNITY in the means time:

  FATAL: bug condition "task->expire == 0" matched at src/task.c:310
    call trace(12):
    |       0x662de8 [b8 01 00 00 00 c6 00 00]: __task_queue+0xc7/0x11e
    |       0x63b03f [48 b8 04 00 00 00 05 00]: main+0x2535e
    |       0x63ed1a [e9 d2 fd ff ff 48 8b 45]: listener_accept+0xf72/0xfda
    |       0x6a36d3 [eb 01 90 c9 c3 55 48 89]: sock_accept_iocb+0x82/0x87
    |       0x6af22f [48 8b 05 ca f9 13 00 8b]: fd_update_events+0x35a/0x497
    |       0x42a7a8 [89 45 d8 83 7d d8 02 75]: main-0x1eb539
    |       0x6158fb [48 8b 05 e6 06 1c 00 64]: run_poll_loop+0x2e7/0x319
    |       0x615b6c [48 8b 05 ed 65 1d 00 48]: main-0x175
    | 0x7ffff775bded [e9 69 fe ff ff 48 8b 4c]: libc:+0x8cded
    | 0x7ffff77e1370 [48 89 c7 b8 3c 00 00 00]: libc:+0x112370

To fix the bug, a RW lock is introduced. It is used to fix the race
condition. A read lock is taken when the task is scheduled, in
listener_accpet() and a write lock is used at the end of process function to
set the expire date to TICK_ETERNITY. This lock should not be used very
often and most of time by "readers". So, the impact should be really
limited.

This patch should fix the issue #1930. It must be backported as far as 1.8
with some cautions because the code has evolved a lot since then.

(cherry picked from commit 13e86d947d)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:25:22 +01:00
Willy Tarreau
c65d18df9e BUG/MINOR: pool/cli: use ullong to report total pool usage in bytes
As noticed by Gabriel Tzagkarakis in issue #1903, the total pool size
in bytes is historically still in 32 bits, but at least we should report
the product of the number of objects and their size in 64 bits so that
the value doesn't wrap around 4G.

This may be backported to all versions.

(cherry picked from commit 0c5e9896c7)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:25:22 +01:00
Amaury Denoyelle
566e8d57dc BUILD: quic: fix dubious 0-byte overflow on qc_release_lost_pkts
With GCC 12.2.0 and O2 optimization activated, compiler reports the
following warning for qc_release_lost_pkts().

In function ‘quic_tx_packet_refdec’,
    inlined from ‘qc_release_lost_pkts.constprop’ at src/quic_conn.c:2056:3:
include/haproxy/atomic.h:320:41: error: ‘__atomic_sub_fetch_4’ writing 4 bytes into a region of size 0 overflows the destination [-Werror=stringop-overflow=]
  320 | #define HA_ATOMIC_SUB_FETCH(val, i)     __atomic_sub_fetch(val, i, __ATOMIC_SEQ_CST)
      |                                         ^~~~~~~~~~~~~~~~~~
include/haproxy/quic_conn.h:499:14: note: in expansion of macro ‘HA_ATOMIC_SUB_FETCH’
  499 |         if (!HA_ATOMIC_SUB_FETCH(&pkt->refcnt, 1)) {
      |              ^~~~~~~~~~~~~~~~~~~

GCC thinks that quic_tx_packet_refdec() can be called with a NULL
argument from qc_release_lost_pkts() with <oldest_lost> as arg.

This warning is a false positive as <oldest_lost> cannot be NULL in
qc_release_lost_pkts() at this stage. This is due to the previous check
to ensure that <pkts> list is not empty.

This warning is silenced by using ALREADY_CHECKED() macro.

This should be backported up to 2.6.

This should fix github issue #1852.

(cherry picked from commit 3a72ba2aed)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:25:22 +01:00
Willy Tarreau
857095c121 BUG/MEDIUM: ring: fix creation of server in uninitialized ring
If a "ring" section initialization fails (e.g. due to a duplicate name,
invalid chars, or missing memory), any subsequent "server" statement that
appears in the same section will crash the config parser by dereferencing
the currently NULL cfg_sink. E.g:

    ring x
    ring x                 # fails on "already exists"
       server srv 1.1.1.1  # crashes on cfg_sink==NULL

All other statements have a test for this but "server" was missing it,
so this patch adds it.

Thanks to Joel Hutchinson for reporting this issue.

This must be backported as far as 2.2.

(cherry picked from commit 1b662aabbf)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:25:22 +01:00
Willy Tarreau
abab4bd4d0 DOC: config: fix alphabetical ordering of global section
the global section keywords were seriously misordered, and it's visible
that some mistakes have induced other ones over time, so it was about
time to fix this. Roughly 20% of the keywords were misplaced.

This commit only reordered the keywords index and their description,
nothing else was changed. It might be backported because it's a real
pain to find certain options there.

(cherry picked from commit 8e6ad2548c)
[cf: Context adjustment]
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-25 09:25:20 +01:00
Christopher Faulet
5d05aa64de REG-TESTS: cache: Remove T-E header for 304-Not-Modified responses
VTEST does not properly handle 304-Not-Modified responses. If a
Transfer-Encoding header (and probably a Content-Lenght header too), it
waits for a body. Waiting for a fix, the Transfer-Encoding encoding of
cached responses in 2 VTEST scripts are removed.

Note it is now an issue because of a fix in the H1 multiplexer :

  * 226082d13a "BUG/MINOR: mux-h1: Do not send a last null chunk on body-less answers"

This patch must be backported with the above commit.

(cherry picked from commit a0e1a87948)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-18 11:33:37 +01:00
Mickael Torres
611bfc4356 BUG/MINOR: mux-h1: Do not send a last null chunk on body-less answers
HEAD answers should not contain any body data. Currently when a
"transfer-encoding: chunked" header is returned, a last null-chunk is added to
the answer. Some clients choke on it and fail when trying to reuse the
connection. Check that the response should not be body-less before sending the
null-chunk.

This patch should fix #1932. It must be backported as far as 2.4.

(cherry picked from commit 226082d13a)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-18 11:33:37 +01:00
Christopher Faulet
7dbd951a95 BUG/MEDIUM: mux-fcgi: Avoid value length overflow when it doesn't fit at once
When the request data are copied in a mbuf, if the free space is too small
to copy all data at once, the data length is shortened. When this is
performed, we reserve the size of the STDIN recod header and eventually the
same for the empty STDIN record if it is the last HTX block of the request.

However, there is no test to be sure the free space is large enough. Thus,
on this special case, when the mbuf is almost full, it is possible to
overflow the value length. Because of this bug, it is possible to experience
crashes from time to time.

This patch should fix the issue #1923. It must be backported as far as 2.4.

(cherry picked from commit 52fd8a1b7b)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-18 11:33:37 +01:00
Christopher Faulet
df5fcf4146 BUG/MINOR: mux-fcgi: Be sure to send empty STDING record in case of zero-copy
When the last HTX DATA block was copied in zero-copy, the empty STDIN
record, marking the end of the request data was never sent. Thanks to this
patch, it is now sent.

This patch must be backported as far as 2.4.

(cherry picked from commit e8c7fb3588)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-18 11:33:37 +01:00
Christopher Faulet
24e3037b52 BUG/MINOR: resolvers: Set port before IP address when processing SRV records
For a server subject to SRV resolution, when the server's address is set,
its dynamic cookie, if any, and its server key are computed. Both are based
on the ip/port pair. However, this happens before the server's port is
set. Thus the port is equal to 0 at this stage. It is a problem if several
servers share the same IP but with different ports because they will share
the same dynamic cookie and the same server key, disturbing this way the
connection persistency and the session stickiness.

This patch must be backported as far as 2.2.

(cherry picked from commit 2364b39984)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-18 11:33:37 +01:00
Christopher Faulet
98ccf10af1 BUG/MINOR: resolvers: Don't wait periodic resolution on healthcheck failure
DNS resoltions may be triggered via a "do-resolve" action or when a connection
failure is experienced during a healthcheck. Cached valid responses are used, if
possible. But if the entry is expired or if there is no valid response, a new
reolution should be performed. However, an resolution is only performed if the
"resolve" timeout is expired. Thus, when this comes from a healthcheck, it means
no extra resolution is performed at all.

Now, when the resolution is performed for a server (SRV or SRVEQ) and no valid
response is found, the resolution timer is reset (last_resolution is set to
TICK_ETERNITY). Of course, it is only performed if no resolution is already
running.

Note that this feature was broken 5 years ago when the resolvers code was
refactored (67957bd59e).

This patch should fix the issue #1906. It affects all stable versions. However,
it is probably a good idea to not backport it too far (2.6, maybe 2.4) and with
some delay.

(cherry picked from commit 68a61b6321)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-18 11:33:37 +01:00
Christopher Faulet
de3a7ba373 BUG/MINOR: http-htx: Fix error handling during parsing http replies
When an error is triggered during arguments parsing of an http reply (for
instance, from a "return" rule), while a log-format body was expected but
not evaluated yet, HAproxy crashes when the body log-format string is
released because it was not properly initialized.

The list used for the log-format string must be initialized earlier.

This patch should fix the issue #1925. It must be backported as far as 2.2.

(cherry picked from commit 5a3d9a77e2)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-18 11:33:37 +01:00
Aurelien DARRAGON
a7d82157ef BUG/MEDIUM: wdt/clock: properly handle early task hangs
In ae053b30 - BUG/MEDIUM: wdt: don't trigger the watchdog when p is unitialized:
	wdt is not triggering until prev_cpu_time
	is initialized to prevent unexpected process
	termination.

Unfortunately this is not enough, some tasks could start
immediately after process startup, and in such cases
prev_cpu_time could be uninitialized, because
prev_cpu_time is set after the polling loop while
process_runnable_tasks() is executed before the polling loop.

It happens to be the case with lua tasks registered using
register_task function from lua script.

Those tasks are registered in early init stage of haproxy and
they are scheduled to run before the first polling loop,
leading to prev_cpu_time being uninitialized (equals 0)
on the thread when the task is first executed.

Because of this, if such tasks get stuck right away
(e.g: blocking IO) the watchdog won't behave as expected
and the thread will remain stuck indefinitely.
(polling loop for the thread won't run at all as
the thread is already stuck)

To solve this, we're now making sure that prev_cpu_time is first
set before any tasks are processed on the thread.
This is done by setting initial prev_cpu_time value directly
in clock_init_thread_date()

Thanks to Abhijeet Rastogi for reporting this unexpected behavior.

It could be backported in every stable versions.
(everywhere ae053b30 is, because both are related)

(cherry picked from commit 16d6c0cb09)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-18 11:33:37 +01:00
Willy Tarreau
f41015781b CI: emit the compiler's version in the build reports
Some occasional builds fail only on a specific platform and being able
to figure the exact compiler version used there is crucial. It's not
easy to guess from the rest of the output, so let's add it before the
platform-specific defines, which suit the same needs.

(cherry picked from commit a051816c03)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-18 11:33:37 +01:00
Ilya Shipitsin
4b7c0f54eb CI: enable QUIC for LibreSSL builds
since LibreSSL-3.6.x supports QUIC, let us enable it

(cherry picked from commit 6397c7c55f)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-17 16:42:10 +01:00
Ilya Shipitsin
a7c11202a8 CI: switch to the "latest" LibreSSL
LibreSSL-3.6.0 had some regression, it was fixed in 3.6.1, let us
switch back to the latest LibreSSL available

(cherry picked from commit 70b2c72687)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-17 16:42:05 +01:00
Remi Tricot-Le Breton
09ae90d983 BUG/MINOR: ssl: ocsp structure not freed properly in case of error
In case of error, the ocsp item might already be in the ocsp certificate
tree but simply freed instead of destroyed through ssl_sock_free_ocsp.

This patch can be backported to all stable versions.

(cherry picked from commit aa529f776d)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-17 16:42:00 +01:00
Remi Tricot-Le Breton
d842a5a47d BUG/MINOR: ssl: Memory leak of AUTHORITY_KEYID struct when loading issuer
When calling ssl_get0_issuer_chain, if akid is not NULL but its keyid
is, then the AUTHORITY_KEYID is not freed.

This patch can be backported to all stable branches.

(cherry picked from commit 1621dc1cc5)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
2022-11-17 16:41:55 +01:00