haproxy

Author	SHA1	Message	Date
Willy Tarreau	c984817bb8	MINOR: debug: do not limit backtraces to stuck threads Historically, for size limitation reasons, we would only dump the backtrace of stuck threads. The problem is that when triggering a panic or for other reasons, we have no backtrace, which effectively limits backtraces to the watchdog timer. It's also visible in "show threads", which used to report backtraces for all threads in 2.4 and displays none nowadays, making its use much more limited. A first approach could be to just dump the thread that triggers the panic (in addition to stuck threads). But that remains quite limited since "show threads" would still display nothing. This patch takes a better approach consisting of dumping all non-idle threads. This way the output is less polluted than with the older approach (no need to dump all those waiting in the poller), and all active threads are visible, in panics as well as in "show threads". As such, the CLI command "debug dev panic" now dumps backtraces again. This is already a benefit which will ease testing of various locations against the ability to resolve useful symbols. (cherry picked from commit 4adb2d864d7e3ca9df1e39beabf7b2ffa5aee35c) Signed-off-by: Willy Tarreau <w@1wt.eu>	2024-11-06 19:04:38 +01:00
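A minimal sketch of the selection rule, not haproxy's code: assuming a 64-bit mask where bit N set means thread N is idle in the poller (like the threads_idle field of the thread group context), only threads whose bit is clear get a backtrace. A thread stuck in processing is by definition not waiting in the poller, so it stays covered. The helper sk_threads_to_dump() and its signature are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: store in out[] the IDs of threads worth dumping,
 * given <idle_mask> where bit N set means "thread N is idle in the poller".
 * Returns the number of IDs written (at most 64). */
int sk_threads_to_dump(uint64_t idle_mask, int nbthread, int *out)
{
	int n = 0;

	for (int t = 0; t < nbthread && t < 64; t++) {
		if (!(idle_mask & (UINT64_C(1) << t)))
			out[n++] = t;   /* active (or stuck) thread: dump it */
	}
	return n;
}
```

With 8 threads and an idle mask of 254 (0b11111110), only thread 0 would be dumped.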
Willy Tarreau	96847724af	MINOR: debug: print gdb hints when crashing To make bug reporting easier for users, when crashing, let's suggest what to do. Typically when a BUG_ON() matches, only the current thread is useful the vast majority of the time, while when the watchdog triggers, all threads are interesting. The messages are printed at the end after the dump. We may adjust these with wiki links in the future if more detailed instructions are relevant. (cherry picked from commit 8f204fa8aeadef3faea4471ba9cfd93d9d168960) Signed-off-by: Willy Tarreau <w@1wt.eu>	2024-11-06 19:04:38 +01:00
Willy Tarreau	2913ab11dc	MINOR: connection: add new sample fetch functions fc_err_name and bc_err_name These functions return a symbolic error code such as ECONNRESET to keep logs compact while making them human-readable. It's a good alternative to the numeric code in that it's more expressive, and a good one to the full message since it's shorter and more precise (some codes even match errno names). The doc was updated so that the symbolic names appear in the table. It could be useful to backport this feature to help with troubleshooting some issues, though backporting the doc might possibly be more annoying in case users have local patches already, so maybe the table update does not need to be backported in this case. (cherry picked from commit 601b34fe7bd50c733a437f26817580bbd56c8d56) Signed-off-by: Willy Tarreau <w@1wt.eu>	2024-11-06 19:04:38 +01:00
Willy Tarreau	3b36ac5726	MINOR: rawsock: set connection error codes when returning from recv/send/splice For a long time the errno values returned by recv/send/splice() were not translated to connection error codes. There are not that many eligible ones, and having them would help a lot when debugging some complex issues where logs disagree with network traces. Let's add them now. (cherry picked from commit 822d82caf4165f0f6da681737c7e3db17d01f599) Signed-off-by: Willy Tarreau <w@1wt.eu>	2024-11-06 19:04:38 +01:00
Willy Tarreau	6200536920	MINOR: connection: add more connection error codes to cover common errno While we get reports of connection setup errors in fc_err/bc_err, we don't have the equivalent for the recv/send/splice syscalls. Let's add provisions for new codes that cover the common errno values that recv/send/splice can return, i.e. ECONNREFUSED, ENOMEM, EBADF, EFAULT, EINVAL, ENOTCONN, ENOTSOCK, ENOBUFS, EPIPE. We also add a special case for when the poller reported the error itself. It's worth noting that EBADF/EFAULT/EINVAL will generally indicate serious bugs in the code and should not be reported. The only thing is that it's quite hard to forcefully (and reliably) trigger these errors in automated tests as the timing is critical. Using iptables to manually reset established connections in the middle of large transfers at least makes it possible to see some ECONNRESET and/or EPIPE, but the other ones are harder to trigger. (cherry picked from commit 00c383ff65c6378327382d2c055f66efb098498d) Signed-off-by: Willy Tarreau <w@1wt.eu>	2024-11-06 19:04:38 +01:00
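A sketch of the errno-to-code translation together with compact symbolic names in the spirit of fc_err_name/bc_err_name. The enum sk_co_err, its values and both helpers are hypothetical illustrations; haproxy's real CO_ER_* entries, names and numbering differ.

```c
#include <errno.h>

/* Hypothetical error codes; NOT haproxy's actual CO_ER_* entries. */
enum sk_co_err {
	SK_ER_NONE = 0,
	SK_ER_ECONNRESET,
	SK_ER_ECONNREFUSED,
	SK_ER_ENOMEM,
	SK_ER_EBADF,
	SK_ER_EFAULT,
	SK_ER_EINVAL,
	SK_ER_ENOTCONN,
	SK_ER_ENOTSOCK,
	SK_ER_ENOBUFS,
	SK_ER_EPIPE,
	SK_ER_POLLER,   /* set by the caller when the poller reported the error */
	SK_ER_OTHER,    /* any errno without a dedicated code */
	SK_ER_ENTRIES
};

/* Short symbolic names indexed by code, for compact readable logs. */
static const char *const sk_co_err_names[SK_ER_ENTRIES] = {
	[SK_ER_NONE]         = "NONE",
	[SK_ER_ECONNRESET]   = "ECONNRESET",
	[SK_ER_ECONNREFUSED] = "ECONNREFUSED",
	[SK_ER_ENOMEM]       = "ENOMEM",
	[SK_ER_EBADF]        = "EBADF",
	[SK_ER_EFAULT]       = "EFAULT",
	[SK_ER_EINVAL]       = "EINVAL",
	[SK_ER_ENOTCONN]     = "ENOTCONN",
	[SK_ER_ENOTSOCK]     = "ENOTSOCK",
	[SK_ER_ENOBUFS]      = "ENOBUFS",
	[SK_ER_EPIPE]        = "EPIPE",
	[SK_ER_POLLER]       = "POLLER_ERR",
	[SK_ER_OTHER]        = "OTHER",
};

/* Translate the errno left by recv()/send()/splice() into a code. */
enum sk_co_err sk_errno_to_co_err(int err)
{
	switch (err) {
	case 0:            return SK_ER_NONE;
	case ECONNRESET:   return SK_ER_ECONNRESET;
	case ECONNREFUSED: return SK_ER_ECONNREFUSED;
	case ENOMEM:       return SK_ER_ENOMEM;
	case EBADF:        return SK_ER_EBADF;   /* likely a bug in the code */
	case EFAULT:       return SK_ER_EFAULT;  /* likely a bug in the code */
	case EINVAL:       return SK_ER_EINVAL;  /* likely a bug in the code */
	case ENOTCONN:     return SK_ER_ENOTCONN;
	case ENOTSOCK:     return SK_ER_ENOTSOCK;
	case ENOBUFS:      return SK_ER_ENOBUFS;
	case EPIPE:        return SK_ER_EPIPE;
	default:           return SK_ER_OTHER;
	}
}

/* Return the symbolic name of <code>, or "?" if out of range. */
const char *sk_co_err_name(enum sk_co_err code)
{
	if ((unsigned int)code >= SK_ER_ENTRIES)
		return "?";
	return sk_co_err_names[code];
}
```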
Christopher Faulet	aa35557e76	BUG/MINOR: stats: Fix the name for the total number of streams created Because of a copy/paste error, CurrStreams was reused by mistake. It should be "CumStreams". No backports needed. (cherry picked from commit 131b877565db423930909f0c26f25e000cbd6e3b) Signed-off-by: Willy Tarreau <w@1wt.eu>	2024-11-06 18:59:58 +01:00
Christopher Faulet	acc009f882	MINOR: stream/stats: Expose the total number of streams ever created in stats A shared counter is added in the thread context to track the total number of streams created on the thread. This number is then reported in stats. It will be useful information to diagnose some bugs. (cherry picked from commit 273d322b6fa8117423bbdc9b818002563d4fd3a3) [wt: ctx adj in tinfo-t] Signed-off-by: Willy Tarreau <w@1wt.eu>	2024-11-06 18:59:58 +01:00
Christopher Faulet	fb9c53581b	MINOR: stream/stats: Expose the current number of streams in stats A shared counter is added in the thread context to track the current number of streams. This number is then reported in stats. It will be useful information to diagnose some bugs. (cherry picked from commit 18ee22ff766bd7399947af3be2b512ac5827b3c8) [wt: adj ctx in tinfo-t] Signed-off-by: Willy Tarreau <w@1wt.eu>	2024-11-06 18:57:42 +01:00
Valentine Krasnobaeva	5ca7eb5e84	MINOR: cli/debug: show dev: add cmdline and version The 'show dev' command is very convenient to obtain haproxy debugging information while the process runs in a container. Let's extend its output with version and cmdline. cmdline is useful as it shows the absolute binary path and its arguments, because sometimes the person debugging a failing container is not the same one who created and deployed it. argc and argv are stored in the exported global structure, because feed_post_mortem() is added as a post check function callback in the post_check_list. So we can't simply change the signature of feed_post_mortem() without breaking other post check callback APIs. Parsers are not supposed to modify argv, so we can safely pass its pointer to debug_parse_cli_show_dev() without copying all argument strings somewhere in the heap or on the stack. (cherry picked from commit 0d79c9bedfa564e3c032c1e910c29949f5133d91) Signed-off-by: Willy Tarreau <w@1wt.eu>	2024-11-06 18:57:42 +01:00
Frederic Lecaille	4655bd1e64	BUG/MINOR: quic: fix malformed probing packet building This bug arrived with this commit: cdfceb10a MINOR: quic: refactor qc_prep_pkts() loop which prevents haproxy from sending PING only packets/datagrams (some packets/datagrams with only PING frame as ack-eliciting frames inside). Such packets/datagrams are useful in rare cases during retransmissions when one wants to probe the peer without exceeding the anti-amplification limit. Modify the condition passed to qc_build_pkt() to add padding to the current datagram. One does not want to do that when probing the peer without ack-eliciting frames passed as <frms> parameter. Indeed, qc_build_pkt() calls qc_do_build_pkt() which supports this case: if <probe> is true (probing required), qc_do_build_pkt() handles the case where some padding must be added to a PING only packet/datagram. This is the case when probing with an empty <frms> frame list of ack-eliciting frames without exceeding the anti-amplification limit from qc_dgrams_retransmit(). Add some comments to qc_build_pkt() and qc_do_build_pkt() to clarify this as this code is easy to break! Thanks to @Tristan971 for having reported this issue in GH #2709. Must be backported to 3.0. (cherry picked from commit 217e467e89d15f3c22e11fe144458afbf718c8a8) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-11-06 15:55:11 +01:00
Willy Tarreau	c91c678b12	CLEANUP: connection: properly name the CO_ER_SSL_FATAL enum entry It was the only one prefixed with "CO_ERR_", making it harder to batch process and to look up. It was added in 2.5 by commit `61944f7a73` ("MINOR: ssl: Set connection error code in case of SSL read or write fatal failure") so it can be backported as far as 2.6 if needed to help integrate other patches. (cherry picked from commit 393957908bf492ff6660fba239106f0da7988fe8) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-11-06 15:53:56 +01:00
Willy Tarreau	79abc12539	DOC: config: document connection error 44 (reverse connect failure) It was missing from commit `ac1164de7c` ("MINOR: connection: define error for reverse connect"), and can be backported to 3.0 and 2.9. (cherry picked from commit abed9e0426c2f24522e0053452435082870e3afc) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-11-06 15:52:41 +01:00
Christopher Faulet	61ecd35113	BUG/MEDIUM: promex: Fix dump of extra counters When extra counters are dumped for an entity (frontend, backend, server or listener), there is a filter on capabilities. Some extra counters are not available for all entities and must be ignored. However, when this was performed, the field number, used as an index to dump the metric value, was still incremented while it should not have been, leading to an overflow or a stats mix-up. This patch must be backported to 3.0. (cherry picked from commit d1adfd9fe41b0f9f67944eec07348213a7debbf3) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-11-06 15:51:56 +01:00
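A hypothetical reduction of the bug pattern in this commit: when counters are filtered by capability, the output index must only advance for counters that are actually emitted. struct sk_counter and sk_dump_extra() are illustrative names, not the promex API.

```c
#include <stddef.h>

/* Hypothetical extra counter: <caps> says which entity types expose it. */
struct sk_counter {
	unsigned int caps;
	long value;
};

/* Write the values of counters applicable to entity type <cap> into out[]
 * (at most <max> entries) and return how many were written. The output
 * index <n> only advances on emission; advancing it for filtered-out
 * counters is what overflows out[] or shifts values to the wrong slot. */
size_t sk_dump_extra(const struct sk_counter *cnt, size_t nb,
                     unsigned int cap, long *out, size_t max)
{
	size_t n = 0;

	for (size_t i = 0; i < nb && n < max; i++) {
		if (!(cnt[i].caps & cap))
			continue;          /* not available for this entity: skip */
		out[n++] = cnt[i].value;
	}
	return n;
}
```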
Christopher Faulet	533b6f37ce	MINOR: stream: Save last evaluated rule on invalid yield When an action yields while it is not allowed, an internal error is reported. This interrupts the processing. So info about the last evaluated rule must be filled. This patch may be backported if needed. If so, the commit ("MINOR: stream: Save last evaluated rule on invalid yield") must be backported first. (cherry picked from commit 0b7605491e4ccb66a0468c219306adf354355e0d) [cf: Of course the mentioned commit to be backported with this one is wrong. It must be "BUG/MINOR: http-ana: Report internal error if an action yields on a final eval"]. Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-11-06 15:50:09 +01:00
Christopher Faulet	14abd1881d	BUG/MINOR: http-ana: Report internal error if an action yields on a final eval This was already performed for tcp actions at content level, but not for HTTP actions. It is always a bug, so it must be reported accordingly. This patch may be backported to all stable versions. (cherry picked from commit 65ea29dcf85c6553e6dd0613a9c6c506fe22b9ac) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-11-06 15:48:34 +01:00
Christopher Faulet	127462ca1c	BUG/MEDIUM: mux-h1: Fix how timeouts are applied on H1 connections There were several flaws in the way the different timeouts were applied on H1 connections. First, the H1C task handling timeouts was not created if no client/server timeout was specified. But there are other timeouts to consider: the client-fin/server-fin timeouts and, for frontend connections, the http-keep-alive and http-request timeouts. Also, on soft-stop, the close-spread-time value must be considered too. So in the end, it is probably easier to always create a task to manage H1C timeouts, especially since the client/server timeouts are most often set. Then, the expiration date of the H1C's task must only be updated if the considered timeout is set. So tick_add_ifset() must be used instead of tick_add(). Otherwise, if a timeout is undefined, the task may expire immediately while it should in fact never expire. Finally, the idle expiration date must only be considered for idle connections. This patch should be backported to all stable versions, at least as far as 2.6. On 2.4, it will have to be slightly adapted for the idle_exp part. On 2.2 and 2.0, the patch will have to be rewritten because h1_refresh_timeout() is quite different. (cherry picked from commit 3c09b34325a073e2c110e046f9705b2fddfa91c5) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-11-06 15:48:25 +01:00
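A simplified model of why tick_add_ifset() matters, assuming (as in haproxy's ticks.h) that a tick value of 0 means "eternity", i.e. never expires. The sk_* helpers below are illustrative re-implementations, not the real ones: with plain addition, an unset (zero) timeout yields "now", which is an immediate expiry.

```c
/* Assumed convention: tick 0 means "eternity" (never expires). */
#define SK_TICK_ETERNITY 0u

static inline int sk_tick_isset(unsigned int t)
{
	return t != 0;
}

/* Always adds: a zero timeout yields "now", i.e. immediate expiry. */
static inline unsigned int sk_tick_add(unsigned int now, unsigned int timeout)
{
	unsigned int t = now + timeout;   /* unsigned: wraps without UB */

	return t ? t : 1;                 /* never produce the eternity value */
}

/* Only adds when the timeout is set; an unset timeout stays eternal. */
static inline unsigned int sk_tick_add_ifset(unsigned int now, unsigned int timeout)
{
	if (!timeout)
		return SK_TICK_ETERNITY;
	return sk_tick_add(now, timeout);
}

/* Earliest of two dates, ignoring unset ones (wrap-safe comparison). */
static inline unsigned int sk_tick_first(unsigned int a, unsigned int b)
{
	if (!sk_tick_isset(a))
		return b;
	if (!sk_tick_isset(b))
		return a;
	return (int)(a - b) <= 0 ? a : b;
}
```

Combining several optional timeouts then amounts to folding sk_tick_first() over sk_tick_add_ifset() results, so unset timeouts never pull the expiration date forward.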
Aurelien DARRAGON	af94845eb5	DOC: config: add missing glitch_{cnt,rate} sample definitions Following the previous commit: when glitch_cnt and glitch_rate data types were implemented in `c9c6b683f` ("MEDIUM: stick-tables: add a new stored type for glitch_cnt and glitch_rate"), newly exposed samples such as table_glitch_cnt(), table_glitch_rate(), src_glitch_cnt() and src_glitch_rate() were documented but their definitions were missing from the supported keywords list. It should be backported to 3.0 with `c9c6b683f`. (cherry picked from commit 0686fd8cfccd7ff12211b8253bf2446d62c90a18) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-11-06 15:47:58 +01:00
Aurelien DARRAGON	69f3b4099e	DOC: config: add missing glitch_{cnt,rate} data types When glitch_cnt and glitch_rate data types were implemented in `c9c6b683f` ("MEDIUM: stick-tables: add a new stored type for glitch_cnt and glitch_rate"), the data types list for "stick-table" keyword documentation was overlooked. This was reported by Nick Ramirez. It should be backported to 3.0 with `c9c6b683f`. (cherry picked from commit 9a6fc2d474511ead2fe8c39524d23b156d640ef8) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-11-06 15:47:53 +01:00
William Lallemand	0309e93dbd	BUG/MINOR: ssl/cli: 'set ssl cert' does not check the transaction name correctly Since commit `089c13850f` ("MEDIUM: ssl: ssl-load-extra-del-ext work only with .crt"), the 'set ssl cert' CLI command does not correctly check whether the transaction you are trying to update is the right one. The consequence is that you could accidentally commit a transaction on the wrong certificate. The fix introduces the check again in case you are not using ssl-load-extra-del-ext. This must be backported to all stable versions. (cherry picked from commit 984d2cfb61744bed29ce92cdc5360155cbd8ca44) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-11-06 15:47:13 +01:00
William Lallemand	6bec3fbe6f	BUG/MINOR: trace: stop rewriting argv with -dt When using trace with -dt, the trace_parse_cmd() function is doing a strtok which writes \0 into the argv string. When using the mworker mode and reloading, argv was modified and the trace won't work anymore because the first : is replaced by a '\0'. This patch fixes the issue by allocating a temporary string so we don't modify the source string directly. It also replaces strtok by its reentrant version strtok_r. Must be backported as far as 2.9. (cherry picked from commit 596db3ef86844617565a0b4b4ce8358fe6537d87) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-11-06 15:46:27 +01:00
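A sketch of the fix's principle: strtok_r() writes '\0' into the buffer it scans, so it is run on a heap copy and the caller's string (for example an argv entry re-read after a reload) stays intact. The field-counting helper and the sample spec string are hypothetical; only strdup() and strtok_r() are standard POSIX calls.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for strdup() and strtok_r() */
#endif
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: count the ':'-separated fields of <spec> without
 * modifying it. Returns the number of fields, or -1 on allocation failure. */
int sk_count_trace_fields(const char *spec)
{
	char *copy, *saveptr = NULL, *tok;
	int n = 0;

	copy = strdup(spec);          /* tokenize a private copy only */
	if (!copy)
		return -1;
	for (tok = strtok_r(copy, ":", &saveptr); tok;
	     tok = strtok_r(NULL, ":", &saveptr))
		n++;
	free(copy);
	return n;
}
```

Parsing the same string twice gives the same result, which is exactly what a reloaded master needs when it re-reads its arguments.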
William Lallemand	cdb7dac982	MINOR: cli: remove non-printable characters from 'debug dev fd' When using 'debug dev fd', the output of laddr and raddr can contain some garbage. This patch replaces any control or non-printable character by a '.'. (cherry picked from commit 944a224358ab2865a3a1c0bf700aba38550b19cc) Signed-off-by: Willy Tarreau <w@1wt.eu>	2024-10-24 16:58:20 +02:00
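A minimal sketch of the sanitizing step, assuming the address is held in a length-delimited buffer so that embedded NUL bytes (e.g. in abstract UNIX socket names) are replaced as well. The helper name is hypothetical; isprint() is evaluated in the default "C" locale.

```c
#include <ctype.h>
#include <stddef.h>

/* Replace every control or non-printable byte of buf[0..len) with '.'. */
void sk_sanitize_printable(char *buf, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		unsigned char c = (unsigned char)buf[i];   /* avoid negative chars */

		if (!isprint(c))
			buf[i] = '.';
	}
}
```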
Willy Tarreau	f215ad3221	MINOR: debug: store important pointers in post_mortem Dealing with a core and a stripped executable is a pain when it comes to finding pools, proxies or thread contexts. Let's put a pointer to these heads and arrays in the post_mortem struct so they are easier to locate. Other critical lists like this could possibly benefit from being added later. Here we now have: - tgroup_info - thread_info - tgroup_ctx - thread_ctx - pools - proxies Example: $ objdump -h haproxy|grep post 34 _post_mortem 000014b0 0000000000cfd400 0000000000cfd400 008fc400 2*8 (gdb) set $pm=(struct post_mortem *)0x0000000000cfd400 (gdb) p $pm->tgroup_ctx[0] $8 = { threads_harmless = 254, threads_idle = 254, stopping_threads = 0, timers = { b = {0x0, 0x0} }, niced_tasks = 0, __pad = 0xf5662c <ha_tgroup_ctx+44> "", __end = 0xf56640 <ha_tgroup_ctx+64> "" } (gdb) info thr Id Target Id Frame * 1 Thread 0x7f9e7706a440 (LWP 21169) 0x00007f9e76a9c868 in raise () from /lib64/libc.so.6 2 Thread 0x7f9e76a60640 (LWP 21175) 0x00007f9e76b343c7 in wait4 () from /lib64/libc.so.6 3 Thread 0x7f9e7613d640 (LWP 21176) 0x00007f9e76b343c7 in wait4 () from /lib64/libc.so.6 4 Thread 0x7f9e7493a640 (LWP 21179) 0x00007f9e76b343c7 in wait4 () from /lib64/libc.so.6 5 Thread 0x7f9e7593c640 (LWP 21177) 0x00007f9e76b343c7 in wait4 () from /lib64/libc.so.6 6 Thread 0x7f9e7513b640 (LWP 21178) 0x00007f9e76b343c7 in wait4 () from /lib64/libc.so.6 7 Thread 0x7f9e6ffff640 (LWP 21180) 0x00007f9e76b343c7 in wait4 () from /lib64/libc.so.6 8 Thread 0x7f9e6f7fe640 (LWP 21181) 0x00007f9e76b343c7 in wait4 () from /lib64/libc.so.6 (gdb) p/x $pm->thread_info[0].pth_id $12 = 0x7f9e7706a440 (gdb) p/x $pm->thread_info[1].pth_id $13 = 0x7f9e76a60640 (gdb) set $px = *$pm->proxies while ($px != 0) printf "%#lx %s served=%u\n", $px, $px->id, $px->served set $px = ($px)->next end 0x125eda0 GLOBAL served=0 0x12645b0 stats served=0 0x1266940 comp served=0 0x1268e10 comp_bck served=0 0x1260cf0 <OCSP-UPDATE> served=0 0x12714c0 <HTTPCLIENT> served=0 (cherry picked from commit e5fccfe0b6397ec2b14ebc3a0d09646442b2018d) Signed-off-by: Willy Tarreau <w@1wt.eu>	2024-10-24 16:55:20 +02:00
Willy Tarreau	9e0fc2a8c5	MINOR: debug: place the post_mortem struct in its own section. Placing it in its own section will make it easier to find, particularly in gdb which is too dumb to find anything in memory. Now it will be sufficient to issue this: $ gdb -ex "info files" -ex "quit" ./haproxy core 2>/dev/null |grep _post_mortem 0x0000000000cfd300 - 0x0000000000cfe780 is _post_mortem or this: $ objdump -h haproxy|grep post 34 _post_mortem 00001480 0000000000cfd300 0000000000cfd300 008fc300 2*8 to spot the symbol's address. Then it can be read this way: (gdb) p (struct post_mortem *)0x0000000000cfd300 (cherry picked from commit 93c3f2a0b4da77c0317496b8585192fb64ef400f) Signed-off-by: Willy Tarreau <w@1wt.eu>	2024-10-24 16:55:20 +02:00
Willy Tarreau	c4b1c0b276	MINOR: debug: place a magic pattern at the beginning of post_mortem To make the post_mortem struct easier to find in core dumps, let's make it start with a recognizable pattern of exactly 32 chars (to preserve alignment): "POST-MORTEM STARTS HERE+7654321\0" It can then be found like this from gdb: (gdb) find 0x000000012345678, 0x0000000100000000, 'P','O','S','T','-','M','O','R','T','E','M' 0xcfd300 <post_mortem> 1 pattern found. Or, more easily, with any more practical tool (who has ever used "find" in gdb, given that it cannot iterate over maps and is 100% useless?). (cherry picked from commit 989b02e1930d7ecd1a728c3d18ccfba095cdd636) Signed-off-by: Willy Tarreau <w@1wt.eu>	2024-10-24 16:55:20 +02:00
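The 32-byte size of the pattern can be checked at compile time, and the pattern can be located in a raw memory image with a plain memcmp() scan rather than gdb's "find". This is a sketch: SK_PM_MAGIC, struct sk_post_mortem and sk_find_post_mortem() are hypothetical, and the two pointer fields are placeholders for the real struct's contents.

```c
#include <stddef.h>
#include <string.h>

/* 31 visible characters plus the terminating NUL: exactly 32 bytes. */
#define SK_PM_MAGIC "POST-MORTEM STARTS HERE+7654321"
_Static_assert(sizeof(SK_PM_MAGIC) == 32, "magic must be 32 bytes with its NUL");

/* Hypothetical layout: magic first, so the following pointers keep
 * their natural alignment. */
struct sk_post_mortem {
	char magic[32];
	void *tgroup_info;   /* placeholder */
	void *thread_info;   /* placeholder */
};
_Static_assert(offsetof(struct sk_post_mortem, tgroup_info) % _Alignof(void *) == 0,
               "fields after the magic stay aligned");

/* Scan a raw memory image (e.g. a chunk read from a core file) for the
 * magic, NUL included. Returns its address or NULL if absent. */
const void *sk_find_post_mortem(const void *mem, size_t len)
{
	const unsigned char *p = mem;

	for (size_t off = 0; off + sizeof(SK_PM_MAGIC) <= len; off++) {
		if (memcmp(p + off, SK_PM_MAGIC, sizeof(SK_PM_MAGIC)) == 0)
			return p + off;
	}
	return NULL;
}
```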
Willy Tarreau	3f31155977	MINOR: pools: export the pools variable We want it to be accessible from debuggers for inspection and it's currently unavailable. Let's start by exporting it as a first step. (cherry picked from commit fba48e1c40287f1abb4066935f2436bd0b8cd7a4) Signed-off-by: Willy Tarreau <w@1wt.eu>	2024-10-24 16:55:20 +02:00
Willy Tarreau	9ec0260698	BUILD: debug: silence a build warning with threads disabled Commit 091de0f9b2 ("MINOR: debug: slightly change the thread_dump_pointer signification") caused the following warning to be emitted when threads are disabled: src/debug.c: In function 'ha_thread_dump_one': src/debug.c:359:9: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing] Let's just disguise the pointer to silence it. It should be backported where the patch above was backported, since it was part of a series aiming at making thread dumps more exploitable from core dumps. (cherry picked from commit f163cbfb7f893a06d158880a753cad01908143d8) [wt: s/MT_LIST_FOR_EACH_ENTRY_LOCKED/mt_list_for_each_entry_safe/ with two backup elements in 3.0] Signed-off-by: Willy Tarreau <w@1wt.eu>	2024-10-24 16:54:28 +02:00
Amaury Denoyelle	38c874bad6	BUG/MEDIUM: server: fix race on servers_list during server deletion Each server is inserted in a global list named servers_list on new_server(). This list is then only used to finalize servers' initialization after parsing. On dynamic server creation, there is no issue as new_server() is under thread isolation. However, when a server is deleted after its refcount reached zero, srv_drop() removes it from servers_list without lock protection. In the long term, this can cause list corruption and crashes, especially if multiple adjacent servers are removed in parallel. To fix this, convert servers_list to a mt_list. This should not impact performance as servers_list is not used during runtime outside of server creation/deletion. This should fix github issue #2733. Thanks to Chris Staite who first found the issue here. This must be backported up to 2.6. (cherry picked from commit 7a02fcaf20dbc19db36052bbc7001bcea3912ab5) Signed-off-by: Willy Tarreau <w@1wt.eu>	2024-10-24 16:40:35 +02:00
Christopher Faulet	adceb4a595	BUG/MINOR: stconn: Don't disable 0-copy FF if EOS was reported on consumer side There is no reason to disable the 0-copy data forwarding if an end-of-stream was reported on the consumer side. Indeed, the consumer will send data in this case. So there is no reason to check the read side here. This patch may be backported as far as 2.9. (cherry picked from commit 362de90f3e4ddd0c15331c6b9cb48b671a6e2385) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-10-24 12:17:19 +02:00
Christopher Faulet	04b5a16bc2	BUG/MINOR: http-ana: Fix wrong client abort reports during responses forwarding When the response forwarding is aborted, we must not report a client abort if an EOS was seen on the client side. Only an abort performed by the stream must be considered. This bug was introduced when SHUTR was split into 2 flags. This patch must be backported as far as 2.8. (cherry picked from commit 5970c6abec3e0ee4ac44364e999cae2cc852f4c8) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-10-24 12:17:19 +02:00
Christopher Faulet	f55508ab67	BUG/MEDIUM: stconn: Report blocked send if sends are blocked by an error When some data must be sent to the endpoint but an error was previously reported, nothing is performed and we leave. But, in this case, the SC is not notified that sends are blocked. It is indeed an issue if the endpoint reports an error after consuming all data from the SC. In the endpoint the outgoing data are trashed because of the error, but on the SC, everything was sent, even if an error was also reported. Because of this bug, it is possible to have outgoing data blocked at the SC level but without any write timeout armed. In some cases, this may lead to blocking conditions where the stream is never closed. So now, when outgoing data cannot be sent because a previous error was triggered, a blocked send is reported. This way, it is possible to report a write timeout. This patch should fix the issue #2754. It must be backported as far as 2.8. (cherry picked from commit fbc3de6e9e59679d2e9ece3984ce31b6a7dd418f) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-10-24 12:17:19 +02:00
Amaury Denoyelle	55fada172c	BUG/MINOR: server: fix dynamic server leak with check on failed init If a dynamic server is added with check or agent-check, its refcount is incremented after server keyword parsing. However, if add server fails at a later stage, refcount is only decremented once, which prevented the server from being fully released. This causes a leak with a server which is detached from most of the lists but still exists in the system. This bug is considered minor as only a few conditions may cause a failure in add server after check/agent-check initialization. This is the case if there is a naming collision or the dynamic ID cannot be generated. To fix this, simply decrement server refcount on add server error path if either check and/or agent-check are flagged as activated. This bug is related to github issue #2733. Thanks to Chris Staite who first found the leak. This must be backported up to 2.6. (cherry picked from commit 116178563c2fb57e28a76838cf85c4858b185b76) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-10-24 12:10:01 +02:00
Willy Tarreau	fcaae8d6d2	MINOR: activity/memprofile: show per-DSO stats On systems where many libs are loaded, it's hard to track suspected leaks. Having a per-DSO summary makes it more convenient. That's what we're doing here by summarizing all calls per DSO before showing the total. (cherry picked from commit 401fb0e87a2cea7171e4d37da6094755eb10a972) Signed-off-by: Willy Tarreau <w@1wt.eu>	2024-10-24 10:52:59 +02:00
Willy Tarreau	27ade1e5fe	MINOR: activity/memprofile: always return "other" bin on NULL return address It was found in a large "show profiling memory" output that a few entries have a NULL return address, which causes confusion because this address will be reused by the next new allocation caller, possibly resulting in inconsistencies such as "free() ... pool=trash" which makes no sense. The cause is in fact that the first caller had an entry->info pointing to the trash pool from a p_alloc/p_free with a NULL return address, and the second had a different type and reused that entry. Let's make sure undecodable stacks causing an apparent NULL return address all lead to the "other" bin. While this is not exactly a bug, it would make sense to backport it to the recent branches where the feature is used (probably at least as far as 2.8). (cherry picked from commit 5091f90479ab4d963b55cb725cee8201d93521d9) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-10-23 17:25:12 +02:00
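A hypothetical sketch of the binning rule, assuming an open-addressing table where a NULL caller marks a free slot: a NULL return address cannot live in a regular slot without being claimed, counters included, by the next new caller, so it is routed to a dedicated catch-all "other" bin. The names and table size are ours, not haproxy's memprofile code.

```c
#include <stddef.h>
#include <stdint.h>

#define SK_BINS 16                 /* hypothetical table size */
#define SK_OTHER_BIN SK_BINS       /* extra catch-all bin at the end */

struct sk_bin {
	const void *caller;            /* NULL means "slot still free" */
	unsigned long calls;
};

static struct sk_bin sk_bins[SK_BINS + 1];

/* Find or claim the bin for return address <ra>. */
struct sk_bin *sk_get_bin(const void *ra)
{
	size_t h;

	if (!ra)                       /* undecodable stack: never a real slot */
		return &sk_bins[SK_OTHER_BIN];

	h = (size_t)(((uintptr_t)ra >> 4) % SK_BINS);
	for (size_t i = 0; i < SK_BINS; i++) {
		struct sk_bin *b = &sk_bins[(h + i) % SK_BINS];

		if (b->caller == ra)
			return b;
		if (!b->caller) {
			b->caller = ra;        /* claim a free slot */
			return b;
		}
	}
	return &sk_bins[SK_OTHER_BIN]; /* table full */
}
```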
Aurelien DARRAGON	a6ecd879b1	BUG/MEDIUM: connection/http-reuse: fix address collision on unhandled address families As described in GH #2765, there were situations where http connections would be re-used for requests to different endpoints, which is obviously unexpected. In GH #2765, this occurred with httpclient and UNIX socket combination, but later code analysis revealed that while disabling http reuse on httpclient proxy helped, it didn't fix the underlying issue since it was found that conn_calculate_hash_sockaddr() didn't take into account families such as AF_UNIX or AF_CUST_SOCKPAIR, and because of that the sock_addr part of the connection wasn't hashed. To properly fix the issue, let's explicitly handle UNIX (both regular and ABNS) and AF_CUST_SOCKPAIR families, so that the destination address is properly hashed. To prevent this bug from re-appearing: when the family isn't known, instead of doing nothing like before, let's fall back to a generic (suboptimal) hashing which hashes the whole sockaddr_storage struct. As a workaround, http-reuse may be disabled on impacted proxies. (Unfortunately, this doesn't help for httpclient since reuse policy defaults to safe and cannot be modified from the config.) It should be backported to all stable versions. Shout out to @christopherhibbert for having reported the issue and provided a trivial reproducer. [ada: prior to 3.0, ctx adjt is required because conn_hash_update()'s prototype is slightly different] (cherry picked from commit b5b40a9843e505ed84153327ab897ca0e8d9a571) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-10-23 17:24:09 +02:00
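A sketch of per-family destination hashing with the generic fallback, assuming the sockaddr_storage was zeroed before being filled so that unused bytes are deterministic. FNV-1a stands in for haproxy's real hash and sk_hash_sockaddr() is hypothetical; AF_CUST_SOCKPAIR is haproxy-internal and is simply covered by the default branch here.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <netinet/in.h>

/* FNV-1a, a stand-in hash chosen for brevity. */
static uint64_t sk_fnv1a(uint64_t h, const void *data, size_t len)
{
	const unsigned char *p = data;

	while (len--) {
		h ^= *p++;
		h *= 1099511628211ULL;
	}
	return h;
}

uint64_t sk_hash_sockaddr(const struct sockaddr_storage *ss)
{
	uint64_t h = 14695981039346656037ULL;

	h = sk_fnv1a(h, &ss->ss_family, sizeof(ss->ss_family));
	switch (ss->ss_family) {
	case AF_INET: {
		const struct sockaddr_in *in = (const void *)ss;
		h = sk_fnv1a(h, &in->sin_addr, sizeof(in->sin_addr));
		h = sk_fnv1a(h, &in->sin_port, sizeof(in->sin_port));
		break;
	}
	case AF_INET6: {
		const struct sockaddr_in6 *in6 = (const void *)ss;
		h = sk_fnv1a(h, &in6->sin6_addr, sizeof(in6->sin6_addr));
		h = sk_fnv1a(h, &in6->sin6_port, sizeof(in6->sin6_port));
		break;
	}
	case AF_UNIX: {
		/* whole sun_path: also covers abstract names (leading NUL) */
		const struct sockaddr_un *un = (const void *)ss;
		h = sk_fnv1a(h, un->sun_path, sizeof(un->sun_path));
		break;
	}
	default:
		/* unknown family: hash everything rather than nothing */
		h = sk_fnv1a(h, ss, sizeof(*ss));
		break;
	}
	return h;
}
```

Hashing nothing for an unknown family makes all such destinations collide on the same key, which is the reuse mix-up described in this commit; hashing the whole struct trades speed for correctness.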
Christopher Faulet	a910a25232	BUG/MEDIUM: mux-h2: Remove H2S from send list if data are sent via 0-copy FF When data are sent via the zero-copy data forwarding, in h2_done_ff, we must be sure to remove the H2 stream from the send list if something is sent. It was only performed if no blocking condition was encountered. But we must also do it if something is sent. Otherwise the transfer may be blocked till timeout. This patch must be backported as far as 2.9. (cherry picked from commit ded28f6e5c210b49ede7edb25cd4b39163759366) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-10-23 17:24:03 +02:00
Christopher Faulet	dda720da66	BUG/MEDIUM: stats-html: Never dump more data than expected during 0-copy FF During the zero-copy data forwarding, the caller specifies the maximum amount of data the producer may push. However, the HTML stats applet does not use it and can fill all the free space in the buffer. It is especially an issue when the consumer is limited by a flow control, like the H2, because we may emit too large DATA frames in this case. It is especially visible with big buffers (for instance 32k). In the early days of zero-copy data forwarding, the caller was responsible for passing a properly resized buffer. During the different refactoring steps, this has changed but the HTML stats applet was not updated accordingly. To fix the bug, the buffer used to dump the HTML page is resized to be sure not too much data are dumped. This patch should solve the issue #2757. It must be backported to 3.0. (cherry picked from commit 529e4f36a353bca292196e1344a79b8cd4ba143c) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-10-23 17:23:57 +02:00
Amaury Denoyelle	260fa5648e	BUG/MINOR: mux-quic: do not close STREAM with empty FIN if no data sent A stream may be shut without any HTX EOM having been reported to signal a proper closure. This is the case for QCS instances flagged with QC_SF_UNKNOWN_PL_LENGTH. Shut is then performed with an empty FIN emission instead of a RESET_STREAM. This has been implemented since the following patch: `24962dd178` BUG/MEDIUM: mux-quic: do not emit RESET_STREAM for unknown length However, in case of HTTP/3, an empty FIN should only be done after a full message is emitted, which requires at least a HEADERS frame. If an empty FIN is emitted without it, the client may interpret this as invalid and close the connection. To prevent this, fall back to a RESET_STREAM emission if no data were emitted on the stream. This was reproduced using ngtcp2-client with 10% loss (-r 0.1) on a remote host, with httpterm request "/?s=100k&C=1&b=0&P=400". An error ERR_H3_FRAME_UNEXPECTED is returned by ngtcp2-client when the bug occurs. Note that this change is incomplete. The message validity depends solely on the application protocol in use. As such, a new app_ops callback should be implemented to ensure the stream is closed accordingly. However, this first patch ensures that at least the HTTP/3 case is valid while keeping a minimal backport process. This should be backported up to 2.8. (cherry picked from commit 68c8c910238f0b759d75b4da2128370abf184cd1) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-10-23 17:23:49 +02:00
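The shutdown decision in this commit reduces to a small predicate. This is a hypothetical sketch: struct sk_qcs and its fields stand in for the QC_SF_UNKNOWN_PL_LENGTH flag and the stream's emitted-data state, and "some bytes were emitted" approximates "a HEADERS frame was sent".

```c
#include <stdbool.h>
#include <stdint.h>

enum sk_shut_action { SK_SHUT_FIN, SK_SHUT_RESET };

/* Hypothetical per-stream state; field names are ours. */
struct sk_qcs {
	bool unknown_pl_length;   /* stands in for QC_SF_UNKNOWN_PL_LENGTH */
	uint64_t tx_bytes;        /* bytes already emitted on the stream */
};

/* Choose how to close a stream shut without an HTX EOM. */
enum sk_shut_action sk_choose_shut(const struct sk_qcs *qcs)
{
	/* An empty FIN is only a valid HTTP/3 ending once a message has
	 * started (at least HEADERS); otherwise fall back to RESET_STREAM. */
	if (qcs->unknown_pl_length && qcs->tx_bytes > 0)
		return SK_SHUT_FIN;
	return SK_SHUT_RESET;
}
```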
Valentine Krasnobaeva	7e69007d4a	BUG/MINOR: mworker: fix mworker-max-reloads parser Before this patch, when a wrong argument was provided in the configuration for the mworker-max-reloads keyword, the parser showed these errors on stderr: [WARNING] (1820317) : config : parsing [haproxy.cfg:154] : (null)parsing [haproxy.cfg:154] : 'mworker-max-reloads' expects an integer argument. In the case where two arguments were provided by mistake instead of one, this also triggered a buggy error message: [ALERT] (1820668) : config : parsing [haproxy.cfg:154] : 'mworker-max-reloads' cannot handle unexpected argument '45'. [WARNING] (1820668) : config : parsing [haproxy.cfg:154] : (null) So, as 'mworker-max-reloads' is parsed in discovery mode by the master process, let's now align its parser with all the others which could be called in this mode. This way, when there are too many args or the argument isn't a valid integer, we return proper error codes to the global section parser and messages are formatted properly. This fix should be backported to all stable versions. (cherry picked from commit af1d170122369094a1f3869791fb34fb7286e31e) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-10-23 17:23:37 +02:00
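A sketch of a strict single-integer keyword parser that rejects both extra arguments and trailing garbage and always produces a formatted message. The function name, the args/nargs convention and the 0/-1 return codes are assumptions, not haproxy's configuration parser API; rejecting negative values is also our choice.

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse "<keyword> <int>" from args[0..nargs). On success store the value
 * in *out and return 0; on error write a message into err and return -1. */
int sk_parse_max_reloads(char **args, int nargs, int *out,
                         char *err, size_t errlen)
{
	char *end;
	long v;

	if (nargs != 2) {
		snprintf(err, errlen, "'%s' expects exactly one argument.", args[0]);
		return -1;
	}
	errno = 0;
	v = strtol(args[1], &end, 10);
	if (errno || end == args[1] || *end || v < 0 || v > INT_MAX) {
		snprintf(err, errlen, "'%s' expects a non-negative integer argument.",
		         args[0]);
		return -1;
	}
	*out = (int)v;
	return 0;
}
```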
Aurelien DARRAGON	fceb0f42e7	DOC: config: fix rfc7239 forwarded typo in desc Replace "specicy" with "specify" in the rfc7239 forwarded option description. Multiple occurrences were found. May be backported to 2.8. (cherry picked from commit 45cbbdc84551e51cdaf0046e1371e8495d053fb5) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-10-23 17:22:26 +02:00
Frederic Lecaille	cf5344e507	BUG/MEDIUM: quic: avoid freezing 0RTT connections This issue came with this commit: f627b92 BUG/MEDIUM: quic: always validate sender address on 0-RTT and could be easily reproduced with the picoquic QUIC client using the -Q option, which splits a big ClientHello TLS message into two Initial datagrams. A second condition must be fulfilled to reproduce this issue: picoquic must not send the token provided by haproxy (NEW_TOKEN). To do that, haproxy must be patched to prevent it from sending such tokens. Under these conditions, if haproxy has enough time to reply to the first Initial datagram, it sends a Retry packet when it receives the second Initial datagram. The client then ignores the Retry packet as mentioned by RFC 9000: 17.2.5.2. Handling a Retry Packet A client MUST accept and process at most one Retry packet for each connection attempt. After the client has received and processed an Initial or Retry packet from the server, it MUST discard any subsequent Retry packets that it receives. On its side, haproxy has closed the connection. When it receives the second Initial datagram, it opens a new connection but with Initial packets it cannot decrypt (wrong ODCID), leaving the client without a response. To fix this, as the aim of the token (NEW_TOKEN) sent by haproxy is to validate the peer address, instead of closing the connection when no token was received for a 0RTT connection, this validation is left to the handshake process. Indeed, the peer address is validated during the handshake when a valid handshake packet is received by the listener. But as we do not want haproxy to process 0RTT data when no token was received, the connection is not accepted before the successful handshake completion. In addition to this, when no token was received, the 0RTT packets are not released after successful handshake completion, to leave haproxy a chance to process these 0RTT data in such case (see quic_conn_io_cb()). Must be backported as far as 2.9. 
(cherry picked from commit b1af5dabf0c4af1eda3a520a90332df1f4c12dcf) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-10-23 17:22:18 +02:00
Frederic Lecaille	b47a34ce8d	BUG/MINOR: quic: avoid leaking post handshake frames This bug came with this commit: f627b92 BUG/MEDIUM: quic: always validate sender address on 0-RTT If an error happens in quic_build_post_handshake_frames() during the code executed for the NEW_TOKEN frame allocation, some frames could leak because of the wrong label used to interrupt this function asap. Replace the "goto leave" by "goto err" to deallocate such frames and fix this issue. Must be backported as far as 2.9. (cherry picked from commit 19aa320f640f701544c3441787da1577a2479590) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-10-23 17:22:13 +02:00
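The leak comes from the usual C two-label cleanup pattern, where `err` frees partial work and `leave` only returns. The sketch below shows that pattern in isolation; `struct frame`, `build_frames()` and the `fail_at` parameter used to simulate an allocation failure are hypothetical, not the real quic_build_post_handshake_frames() code.

```c
#include <stdlib.h>

/* Hypothetical singly linked frame list; haproxy's quic_frame differs. */
struct frame { struct frame *next; };

static void free_frames(struct frame *head)
{
    while (head) {
        struct frame *next = head->next;
        free(head);
        head = next;
    }
}

/* Builds `count` frames; `fail_at` simulates an allocation failure at
 * that index (negative means never fail). Returns NULL on error. */
struct frame *build_frames(int count, int fail_at)
{
    struct frame *head = NULL;

    for (int i = 0; i < count; i++) {
        struct frame *f = (i == fail_at) ? NULL : malloc(sizeof(*f));
        if (!f)
            goto err;   /* jumping to "leave" here would leak `head` */
        f->next = head;
        head = f;
    }
    goto leave;

 err:
    free_frames(head);  /* release every frame built so far */
    head = NULL;
 leave:
    return head;
}
```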
Christopher Faulet	11ba718b69	REGTESTS: Never reuse server connection in http-messaging/truncated.vtc A "Connection: close" header is added to responses to avoid any connection reuse. This should avoid errors on the client side. (cherry picked from commit e7be13da87f8ec00470ef60bb43b85f0480fd85d) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-10-23 17:22:02 +02:00
Christopher Faulet	7d3fed6bf8	BUG/MAJOR: filters/htx: Add a flag to state the payload is altered by a filter When a filter is registered on the data, it means it may change the payload length by rewriting data. It means consumers of the message cannot trust the expected length of payload as announced by the producer. The commit `8bd835b2d2` ("MEDIUM: filters/htx: Don't rely on HTX extra field if payload is filtered") was pushed to solve this issue. When the HTTP payload of a message is filtered, the extra field is set to 0 to be sure it will never be used by mistake by any consumer. However, it is not enough. Indeed, the filters must be called before forwarding some data. They cannot be bypassed. But if a consumer is unable to flush the HTX message, some outgoing data can remain blocked in the channel's buffer. If some new data are then pushed because there is some room in the channel's buffer, the producer will set the HTX extra field. At this stage, if the consumer is unblocked and can send data again, it is possible to call it to forward outgoing data blocked in the channel's buffer before waking the stream up to filter new input data. It is the purpose of the data fast-forwarding. In this case, the HTX extra field will be seen by the consumer. It is unexpected and leads to undefined behavior. One consequence of this bug is to perform a wrong chunking on compressed messages, leading to processing errors at the end of the message, reported as "ID--" in logs. To fix the bug, an HTX flag is added to state the payload of the current HTX message is altered. When this flag is set (HTX_FL_ALTERED_PAYLOAD), the HTX extra field must not be trusted. And to keep things simple, when this flag is set, the HTX extra field is automatically set to 0 when the HTX message is loaded, in htxbuf() function. It is probably the least intrusive way to fix the bug for now. But this part must be reviewed to save meta-info of the HTX message outside of the message itself. 
This commit should solve the issue #2741. It must be backported as far as 2.9. (cherry picked from commit 52a3d807fc332b57b62f5e30aa6f697636a22695) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-10-23 17:21:56 +02:00
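The "reset on load" rule for HTX_FL_ALTERED_PAYLOAD can be modelled in a few lines. This is only a sketch: `struct mini_htx` and `load_htx()` are hypothetical stand-ins for struct htx and htxbuf(), and the flag value used here is arbitrary, not haproxy's.

```c
#include <stdint.h>

/* Flag name from the commit message; the numeric value is arbitrary here. */
#define HTX_FL_ALTERED_PAYLOAD 0x00000001u

/* Hypothetical mini-HTX; the real struct htx layout is different. */
struct mini_htx {
    uint32_t flags;
    uint64_t extra; /* payload bytes announced by the producer */
};

/* Called every time a consumer loads the message: when a filter may
 * rewrite the payload, the announced length cannot be trusted. */
struct mini_htx *load_htx(struct mini_htx *htx)
{
    if (htx->flags & HTX_FL_ALTERED_PAYLOAD)
        htx->extra = 0;
    return htx;
}
```

Because the reset happens on every load rather than once when the filter registers, a producer that sets `extra` again later (the fast-forwarding case described above) is still neutralised before the consumer reads it.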
Christopher Faulet	dc103ef13e	BUG/MEDIUM: stconn: Check FF data of SC to perform a shutdown in sc_notify() In sc_notify() function, the consumer side of the SC is tested to verify if we must perform a shutdown on the endpoint. To do so, no output data must be present in the buffer and in the iobuf. However, there is a bug here: the iobuf of the opposite SC is tested instead of the one of the current SC. So a shutdown can be performed on the endpoint while there are still output data in the iobuf that must be sent. Concretely, it can only be data blocked in a pipe. Because of this bug, data blocked in the pipe will never be sent. I've not tested it, but I guess this may block the stream until the client or server timeout. This patch must be backported as far as 2.9. (cherry picked from commit 0fcfed9e231f2bc3963fe6085598970db2174af1) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-10-23 17:21:46 +02:00
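The corrected condition reduces to "nothing left to send on this SC, in either the buffer or the iobuf". The sketch below is a hypothetical model: `struct sc_side` and `may_shut_endpoint()` are illustrative names, not the real stconn structures or the sc_notify() code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one stream-connector side. */
struct sc_side {
    size_t buf_out;   /* output bytes still in the channel buffer */
    size_t iobuf_out; /* output bytes held in the endpoint iobuf (e.g. a pipe) */
};

/* The endpoint may be shut only when nothing is left to send on *this*
 * SC. The bug was equivalent to reading iobuf_out from the opposite SC. */
bool may_shut_endpoint(const struct sc_side *sc)
{
    return sc->buf_out == 0 && sc->iobuf_out == 0;
}
```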
Christopher Faulet	bd3df74651	BUG/MINOR: http-ana: Don't report a server abort if response payload is invalid If a parsing error is reported by the mux on the response payload, a proxy error (PRXCOND) must be reported instead of a server abort (SRVCL). Because of this bug, invalid responses may be reported as "SD--" or "SL--" in logs instead of "PD--" or "PL--". This patch must be backported to all stable versions. (cherry picked from commit 6790067e79566b2ca5943e72200361c40001bde2) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-10-23 17:21:40 +02:00
Christopher Faulet	38b1197a78	BUG/MEDIUM: stconn: Wait iobuf is empty to shut SE down during a check send When a send attempt is performed on the opposite side from sc_notify() and all outgoing data are sent while a shut was scheduled, the SE is shut down because we consider all data were sent and no more are expected. However, here we must also be careful to have sent all pending data in the iobuf. Indeed, some spliced data may be blocked. In this case, if the SE is shut down, these data may be lost. This patch should fix the original bug reported in #2749. It must be backported as far as 2.9. (cherry picked from commit 48f1e2b6fe8457bb5b9d8db9447157c244d871b7) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-10-23 17:21:33 +02:00
William Lallemand	d356446984	BUG/MINOR: httpclient: return NULL when no proxy available during httpclient_new() The latest patches of the mworker rework accidentally skipped the httpclient_proxy creation. This is not supposed to happen because haproxy is supposed to stop when the proxy creation fails, but it shows a flaw in the API. When the httpclient_proxy or the proxy passed as a parameter to httpclient_new_from_proxy() is NULL, it will be dereferenced and cause a crash. This patch simply makes httpclient_new() return NULL if the proxy is not available. Must be backported as far as 2.7. (cherry picked from commit e7b7072943d658702eba3651d66c6093f1a79fa8) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-10-23 17:21:27 +02:00
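The fix is a classic guard clause: check the proxy pointer before dereferencing it and let the caller handle a NULL result. In the sketch below, `struct proxy`, `struct client` and `client_new_from_proxy()` are hypothetical stand-ins, not haproxy's real httpclient API.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for haproxy's proxy and httpclient structures. */
struct proxy { int id; };
struct client { struct proxy *px; int proxy_id; };

struct client *client_new_from_proxy(struct proxy *px)
{
    struct client *c;

    if (px == NULL)     /* no proxy available: fail cleanly, don't crash */
        return NULL;
    c = calloc(1, sizeof(*c));
    if (c) {
        c->px = px;
        c->proxy_id = px->id; /* px is only dereferenced once known non-NULL */
    }
    return c;
}
```

Callers then need to treat a NULL return as "no client", which is exactly the API contract the commit tightens.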
Willy Tarreau	bd773295ae	BUG/MEDIUM: queue: make sure never to queue when there's no more served conns Since commit 53f52e67a0 ("BUG/MEDIUM: queue: always dequeue the backend when redistributing the last server"), we've again received two reports showing the theoretically impossible condition in pendconn_add(), including a single-threaded one. Thanks to the traces, the issue could be tracked down to the redispatch part. In fact, in non-deterministic LB algorithms (RR, LC, FAS), we don't perform the LB if there are pending connections in the backend, since it indicates that previous attempts already failed, so we directly return SRV_STATUS_FULL. And contrary to a previous belief, it is possible to meet this condition with be->served==0 when redispatching (and likely with maxconn not greater than the number of threads). The problem is that in this case, the entry is queued and then the pendconn_must_try_again() function checks if any connections are currently being served to detect if we missed a race, and tries again, but that situation is not caused by a concurrent thread and will never fix itself, resulting in the loop. All that part around pendconn_must_try_again() is still quite brittle, and a safer approach would involve a sequence counter to detect new arrivals and dequeues during the pendconn_add() call. But it's more sensitive work, probably for a later fix. This fix must be backported wherever the fix above was backported. Thanks to Patrick Hemmer, as well as Damien Claisse and Basha Mougamadou from Criteo for their help on tracking this one! (cherry picked from commit ca275d99ce02e72d707fc87da133d739cdda5146) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-10-23 17:21:13 +02:00
Amaury Denoyelle	1bd64e5d46	BUG/MEDIUM: mux-quic: ensure timeout server is active for short requests If a small request is received on QUIC MUX frontend, it can be transmitted directly with the FIN on attach operation. rcv_buf is skipped by the stream layer. Thus, it is necessary to ensure that the behavior is the same whether FIN is reported on attach or on rcv_buf. One difference was that se_expect_data() was called only for rcv_buf but not on attach. The most obvious effect is that the stream timeout was deactivated for this request: the client timeout was disabled on EOI but the server one was not armed due to the previous se_expect_no_data(). This prevents the early closure of too long requests. To fix this, add an invocation of se_expect_data() on attach operation. This bug can simply be detected using httpterm with a delayed request (for example /?t=10000) and using smaller client/server timeouts. The bug is present if the request is not aborted on timeout but instead continues until its proper HTTP 200 termination. This has been introduced by the following commit : `85eabfbf67` MEDIUM: mux-quic: Don't expect data from server as long as request is unfinished This must be backported up to 2.8. (cherry picked from commit 232083c3e5ca3f23a44fa64def6a88dd257c3b23) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-10-23 16:41:52 +02:00
Aurelien DARRAGON	d5d2dd25df	BUG/MEDIUM: hlua: properly handle sample func errors in hlua_run_sample_{fetch,conv}() To execute sample fetches and converters from Lua, the hlua API leverages the sample API. Prior to executing the sample func, the arg checker is called from hlua_run_sample_{fetch,conv}() to detect potential errors. However, hlua_run_sample_{fetch,conv}() both pass NULL as <err> argument, which is wrong for two reasons. First, we miss an opportunity to report precise error messages to help the user know what went wrong during the check, and more importantly, some val check functions consider that the <err> pointer is never NULL. This is the case for example with check_crypto_hmac(). Because of this, when such val check functions encounter an error, they will crash the process because they will try to dereference NULL. This bug was discovered and reported by GH user @JB0925 on #2745. Perhaps val check functions should make sure that the provided <err> pointer is != NULL prior to dereferencing it. But since there are multiple occurrences found in the code and the API isn't clear about that, it is easier to fix the hlua part (caller) for now. To fix the issue, let's always provide a valid <err> pointer when leveraging val_arg() check function pointer, and make use of it in case of error to report a relevant message to the user before freeing it. It should be backported to all stable versions. (cherry picked from commit f88f162868df9053ca71e3be0628221c36153d9a) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>	2024-10-23 16:41:46 +02:00
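The caller-side fix can be sketched as "always hand the checker a real `char **err`, report the message, then free it". Everything here is hypothetical: `check_arg()` only mimics a val check function that, like check_crypto_hmac() per the commit, assumes `err` is never NULL, and `run_sample_check()` stands in for the hlua caller.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static char *dup_str(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p)
        memcpy(p, s, n);
    return p;
}

/* Hypothetical checker: writes through err on failure without testing it,
 * so passing err == NULL here would crash the process. */
static int check_arg(const char *arg, char **err)
{
    if (arg == NULL || *arg == '\0') {
        *err = dup_str("missing key argument");
        return 0;
    }
    return 1;
}

/* Caller side: always pass a valid err pointer, report it, then free it. */
int run_sample_check(const char *arg, char *msg, size_t msglen)
{
    char *err = NULL;

    if (!check_arg(arg, &err)) {
        snprintf(msg, msglen, "invalid args: %s", err ? err : "unknown error");
        free(err);
        return 0;
    }
    msg[0] = '\0';
    return 1;
}
```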


22705 Commits