haproxy

Author	SHA1	Message	Date
Christopher Faulet	8e55d29109	MINOR: mux-h1: Add a flag to ignore the request payload There was a flag to skip the response payload on output, if any, by stating it is bodyless. It is used for responses to HEAD requests or for 204/304 responses. This allows rewrites during analysis. For instance a HEAD request can be rewritten to a GET request for any reason (i.e., a server not supporting HEAD requests). In this case, the server will send a response with a payload. On the frontend side, the payload will be skipped and a valid response (without payload) will be sent to the client. With this patch we introduce the corresponding flag for the request. It will be used to skip the request payload. In addition, when the payload must be skipped for a request or a response, the zero-copy data forwarding is now disabled.	2024-05-17 16:33:53 +02:00
Christopher Faulet	45a45c917a	BUG/MINOR: stats: Don't state the 303 redirect response is chunked Start-line flags for 303-See-Other response returned by the stats applet are not properly set. Indeed, the response has a "content-length" header but both HTX_SL_F_CHNK and HTX_SL_F_CLEN flags are set. Because of this bug, the response is considered as chunked. So, let's remove HTX_SL_F_CHNK flag. And also add HTX_SL_F_BODYLESS flag because there is no payload ("content-length" header is always set to 0). This patch must be backported to all stable versions. On the 2.8 and lower versions, the commit `d0b04920d1` ("BUG/MINOR: htpp-ana/stats: Specify that HTX redirect messages have a C-L header") must be backported first.	2024-05-17 16:33:53 +02:00
Willy Tarreau	e362b076b1	Revert: "MEDIUM: evports: permit to report multiple events at once" Tests have shown that switching nevlist to global.tune.maxpollevents is totally unreliable when using evports, and that events seem to be missed. A good reproducer seems to be QUIC. There are not enough users of Solaris to warrant spending more time trying to get down to this, and even the few that remain are by definition not interested in performance, so let's just revert the commit that tried to lift the value: `e6662bf706` ("MEDIUM: evports: permit to report multiple events at once"). No backport is needed.	2024-05-17 15:57:18 +02:00
Aurelien DARRAGON	b9915a745e	BUG/MEDIUM: fd: prevent memory waste in fdtab array In `97ea9c49f1` ("BUG/MEDIUM: fd: always align fdtab[] to 64 bytes"), the patch doesn't do what the message says. The intent was only to align the base fdtab addr on 64 bytes so that all fdtab entries are aligned and thus don't share the same cache line. For that, fdtab pointer is adjusted from fdtab_addr (unaligned) address after it is allocated. Thus, all we need is an extra 64 bytes in the fdtab_addr array for the alignment. Because we use calloc() to perform the allocation, a dumb mistake was made: the '+64' was added on <size> calloc argument, which means EACH fdtab entry is allocated with 64 extra bytes. Given that a single fdtab entry is 64 bytes, since `97ea9c49f1` each fdtab entry now takes 128 bytes! We doubled fdtab memory consumption. To give you an idea, on my laptop, when looking at memory consumption using 'ps -p `pidof haproxy` -o size' right after starting haproxy process with default settings (no maxsock enforced): before `97ea9c49f1`: -> 118440 (KB, ~= 118MB) after `97ea9c49f1`: -> 183976 (KB, ~= 184MB) To fix this, use calloc with 1 <nmemb> and manually provide the size with <size> as we would do if we used malloc(). With this patch, we're back to pre-97ea9c49f1 for fdtab memory consumption (with 64 extra bytes for the whole array, which is insignificant). It should be backported to all stable versions.	2024-05-17 15:25:03 +02:00
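The allocation arithmetic described in b9915a745e can be sketched as follows. This is only an illustration, not the actual haproxy code: `struct entry`, `buggy_alloc_bytes()`, `fixed_alloc_bytes()` and `alloc_aligned()` are hypothetical names, and the 64-byte entry size is taken from the commit message.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for an fdtab entry: exactly 64 bytes. */
struct entry {
    char pad[64];
};

/* Bytes requested by the buggy form: calloc(n, sizeof(struct entry) + 64).
 * The extra 64 bytes are paid once per entry.
 */
static size_t buggy_alloc_bytes(size_t n)
{
    return n * (sizeof(struct entry) + 64);
}

/* Bytes requested by the fixed form: calloc(1, n * sizeof(struct entry) + 64).
 * The extra 64 bytes are paid once for the whole array.
 * (Overflow checks on n are omitted in this sketch.)
 */
static size_t fixed_alloc_bytes(size_t n)
{
    return n * sizeof(struct entry) + 64;
}

/* Allocate n zeroed entries whose base address is aligned on 64 bytes.
 * *raw receives the unaligned pointer that must later be passed to free().
 */
static struct entry *alloc_aligned(size_t n, void **raw)
{
    *raw = calloc(1, fixed_alloc_bytes(n));
    if (!*raw)
        return NULL;
    /* round the base address up to the next multiple of 64 */
    return (struct entry *)(((uintptr_t)*raw + 63) & ~(uintptr_t)63);
}
```

With 1000 entries, the buggy form requests 128000 bytes while the fixed form requests 64064, which matches the doubling reported in the commit message.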
Aurelien DARRAGON	e84c8dee1a	BUILD: log: get rid of non-portable strnlen() func In `c614fd3b9` ("MINOR: log: add +cbor encoding option"), I wrongly used strnlen() without noticing that the function is not portable (requires _POSIX_C_SOURCE >= 2008) and that it was the first occurrence in the entire project. In fact it is not a hard requirement since it's a pretty simple function. Thus to restore build compatibility with minimal/older build systems, let's actually get rid of it and use an equivalent portable code where needed (we cannot simply rely on strlen() because the string might not be NULL terminated, we must take upstream len into account). No backport needed (unless `c614fd3b9` gets backported)	2024-05-17 15:24:53 +02:00
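A portable equivalent of strnlen() can be built on memchr(), which is part of standard C. The sketch below is an assumption about what such "equivalent portable code" might look like, not the code from e84c8dee1a; `bounded_strlen()` is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Return the length of <s>, looking at no more than <maxlen> bytes.
 * Unlike strlen(), this is safe on buffers that are not NUL-terminated
 * as long as <maxlen> does not exceed the buffer size.
 */
static size_t bounded_strlen(const char *s, size_t maxlen)
{
    const char *end = memchr(s, '\0', maxlen);

    return end ? (size_t)(end - s) : maxlen;
}
```

For example, `bounded_strlen("hello", 3)` yields 3, and a 4-byte array without a terminator measured with `maxlen` 4 yields 4 without reading past the array.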
William Lallemand	f18ed8d07e	MEDIUM: ssl: add ocsp-update.mindelay and ocsp-update.maxdelay This patch deprecates tune.ssl.ocsp-update.* in favor of "ocsp-update.*", since ocsp-update is not really a tunable of the SSL connections.	2024-05-17 15:00:11 +02:00
Amaury Denoyelle	fbc3d46b9f	BUILD: stats: remove non portable getline() usage getline() was used to read stats-file. However, this function is not portable and may cause build issues on some systems. Replace it by standard fgets(). No need to backport.	2024-05-17 14:53:19 +02:00
William Lallemand	ee58fac1b4	MINOR: ssl: rename tune.ssl.ocsp-update.mode in ocsp-update.mode Since the ocsp-update is not strictly a tuning of the SSL stack, but a feature of its own, let's rename the option. The option was also missing from the index.	2024-05-17 14:50:00 +02:00
Amaury Denoyelle	0d35f8d918	MINOR: h3: report glitch on RFC violation Increment glitch connection counter on every HTTP/3 or QPACK error which is a violation of the specification. This could be useful to get rid early of bogus clients.	2024-05-16 10:58:54 +02:00
Amaury Denoyelle	216f70f989	MINOR: mux-quic: support glitches Implement basic support for glitches on QUIC multiplexer. This is mostly identical to glitches for HTTP/2. A new configuration option named tune.quic.frontend.glitches-threshold is defined to limit the number of glitches on a connection before closing it. Glitches counter is incremented via qcc_report_glitch(). A new qcc_app_ops callback <report_susp> is defined. On threshold reaching, it allows to set an application error code to close the connection. For HTTP/3, value H3_EXCESSIVE_LOAD is returned. If not defined, default code INTERNAL_ERROR is used. For the moment, no glitches are reported for QUIC or HTTP/3 usage. This will be added in future patches as needed.	2024-05-16 10:58:20 +02:00
Amaury Denoyelle	a6993a669b	MINOR: h3: adjust error reporting on receive This commit is the second step to simplify HTTP/3 error management. This time it deals with the receive side in h3_rcv_buf(). Various internal HTTP/3 to HTX conversion functions do not set H3_INTERNAL_ERROR on h3c err anymore. Only standard error codes are set. For every error, both internal and protocol ones, a negative value is returned. This ensures that h3_rcv_buf() looping is interrupted. This function will then set H3_INTERNAL_ERROR only if no standard error is registered via h3c or h3s. Along with the previous commit, this should better distinguish internal errors from protocol ones caused by a faulty client.	2024-05-16 10:31:17 +02:00
Amaury Denoyelle	079d13f73f	MINOR: h3: adjust error reporting on sending It's currently difficult to differentiate HTTP/3 standard protocol violation from internal issues which use solely H3_INTERNAL_ERROR code. This patch is the first step to simplify this. The objective is to reduce H3_INTERNAL_ERROR. <err> field of h3c should be reserved exclusively to other values. Simplify error management in sending via h3_snd_buf(). Sending side is straightforward as only internal errors can be encountered. Do not manually set h3c.err to H3_INTERNAL_ERROR in the various HTX to HTTP/3 conversion functions. Instead, just return a negative value which is enough to break h3_snd_buf() loop. H3_INTERNAL_ERROR is thus positioned on a single location in this function for all sending operations.	2024-05-16 10:31:17 +02:00
Amaury Denoyelle	e094412337	MINOR: h3/qpack: adjust naming for errors Rename enum values used for HTTP/3 and QPACK RFC defined codes. First uses a prefix H3_ERR_* which serves as identifier between them. Also separate QPACK values in a new dedicated enum qpack_err. This is deemed cleaner.	2024-05-16 10:31:17 +02:00
Amaury Denoyelle	2dabcf30be	MINOR: qpack: prepare error renaming There are two distinct enums both related to QPACK error management. The first one is dedicated to RFC defined code. The other one is a set of internal values returned by qpack_decode_fs(). There have been issues discovered recently due to the confusion between them. Rename internal values with the prefix QPACK_RET_. The older name QPACK_ERR_ will be used in a future commit for the first enum.	2024-05-16 10:31:17 +02:00
Christopher Faulet	25bcdb1d95	BUG/MAJOR: h1: Be stricter on request target validation during message parsing As stated in issue #2565, checks on the request target during H1 message parsing are not good enough. Invalid paths, not starting with a slash, are in fact parsed as authorities. The same error is repeated at the sample fetch level. This last point is annoying because routing rules may be fooled. It is also an issue when the URI or the Host header are updated. Because the error is repeated at different places, it must be fixed. We cannot be lax by arguing it is the server's job to accept or reject invalid request targets. With this patch, we strengthen the checks performed on the request target during H1 parsing. The idea is to reject invalid requests at this step to be sure it is safe to manipulate the path or the authority at other places. So now, the asterisk-form is only allowed for OPTIONS and OTHER methods. This last point was added to not reject the H2 preface. In addition, we take care to have only one asterisk and nothing more. For the CONNECT method, we take care to have a valid authority-form. All other forms are rejected. The authority-form is now only supported for CONNECT method. No specific check is performed on the origin-form (except for the CONNECT method). For the absolute-form, we take care to have a scheme and a valid authority. These checks are not perfect but should be good enough to properly identify each part of the request target for a relatively small cost. But, it is a breaking change. Some requests are now rejected while they were not on older versions. However, nowadays, it is most probably not an issue. If it turns out it's really an issue for legitimate use-cases, an option would be to support these kinds of requests when the "accept-invalid-http-request" option is set, with the consequence of seeing some sample fetches having an unexpected behavior. This patch should fix the issue #2565. It MUST NOT be backported. 
First because it is a breaking change. And then because by avoiding backporting it, it remains possible to relax the parsing with the "accept-invalid-http-request" option.	2024-05-15 21:20:37 +02:00
Christopher Faulet	d3d9d83f03	BUG/MEDIUM: h1: Reject CONNECT request if the target has a scheme The target of a CONNECT request must not have a scheme. However, this was not checked during the message parsing. It is now rejected. This patch may be backported as far as 2.4.	2024-05-15 21:20:37 +02:00
Christopher Faulet	d724b0d147	BUG/MINOR: h1: Check authority for non-CONNECT methods only if a scheme is found When a non-CONNECT H1 request is parsed, the authority is compared to the host header value, to validate that they are the same. However there is an issue here when a relative path is used (not beginning with a '/'). In this case, the path is considered as the authority and will be erroneously compared to the host header value. It is observable with this kind of request: GET admin HTTP/1.1 Host: www.mysite.com In this case "admin" is parsed as an authority while it is in fact a path. At this step, it is not a big deal because it just happens on the very first checks on the message during the parsing. However, the same happens when the authority is updated. This will be fixed in another commit. Note this kind of request is invalid because the path does not start with a '/'. But, till now, HAProxy does not reject it. This patch is related to issue #2565. It must be backported as far as 2.4.	2024-05-15 21:20:37 +02:00
Willy Tarreau	821a04377d	BUG/MEDIUM: muxes: enforce buf_wait check in takeover() The ->takeover() is quite tricky. It didn't take care of the possibility that the original thread's connection handler had been woken up to handle an event (e.g. read0), failed to get a buffer, registered against its own thread's buffer_wait queue and left the connection in an idle state. A new thread could then come by, perform a takeover(), and when a buffer was available, the new thread's tasklet would be woken up by the old one via _buf_available(), causing all sorts of problems. These problems are easy to reproduce, by running with shared backend connections and few buffers (tune.buffers.limit=20, 8 threads, 500 connections, transfer 64kB objects and wait 2-5s for a crash to appear). A first estimated solution consisted in removing the connection from the idle list but it turns out that it would be worse for the delete stuff (the connection no longer appearing as idle, making it impossible to find it in order to close it). Also, idle counts wouldn't match anymore the list's state, and the special case of private connections could be difficult to handle as the connection could be forcefully re-added to the idle list after allocation despite being private. After multiple attempts to address the problem in various ways, it appears that the only reliable solution for now (without starting to turn many lists to mt_lists) is to have the takeover() function handle the buf_wait detection or unregistration itself: - when doing a regular takeover aiming at finding an idle connection for a new request, connections that are blocked in a buffer_wait queue are quite rare and not interesting at all (since not immediately usable), so skipping them is sufficient. For this we detect that the desired connection belongs to a buffer_wait list by checking its buf_wait.list element. Note that this check is not thread-safe! 
The LIST_DEL_INIT() is performed by __offer_buffers() after the callback was called. But this is sufficient as it is now because the only way for the element to be seen as not in a list is after the element was last touched by __offer_buffers(), so the situation for this connection will not change in a different way later. - when doing a server delete, we're running under thread isolation. The connection might get taken over to be killed. The only trick is that private connections not belonging to any idle list may also experience this, and in this case even the idle_conns lock will not offer any protection against anything. But since we're running under thread isolation, we're certain not to compete with the other thread, so it's safe to directly unregister the connection from its owner thread. Normally this is already handled by conn_release() in cli_parse_delete_server(), which calls mux->destroy(), but this would actually update the current thread's queue instead of the origin thread's, thus we do need to perform an explicit dequeue before completing the takeover. With this, the problem now looks solved for HTTP/1, HTTP/2 and FCGI, though extensive tests were essentially run on HTTP/1 and HTTP/2. While the problem has been there for a very long time, there should be no reason to backport it since buffer_wait didn't practically work before 3.0-dev and the process used to freeze hard very quickly before we'd even have a chance to meet that race.	2024-05-15 19:37:12 +02:00
Willy Tarreau	edb99e296d	BUG/MINOR: ssl_sock: fix xprt_set_used() to properly clear the TASK_F_USR1 bit In 2.4-dev8 with commit `5c7086f6b0` ("MEDIUM: connection: protect idle conn lists with locks"), the idle conns list started to be protected using the lock for takeover, and the SSL layer used to always take that lock. Later in 2.4-dev11, with commit `4149168255` ("MEDIUM: ssl: implement xprt_set_used and xprt_set_idle to relax context checks"), we decided to relax this lock using TASK_F_USR1 just as is done in muxes. However the xprt_set_used() call, that's supposed to clear the flag, visibly suffered from a copy-paste and kept the OR operation instead of the AND, resulting in the flag never being released, so that SSL on the backend continues to take the lock on each and every I/O access even when the connection is not idle. The effect is only a reduced performance. This could be backported, but given the non-zero risk of triggering another bug somewhere, it would be prudent to wait for this fix to be sufficiently tested in new versions first.	2024-05-15 19:37:12 +02:00
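The copy-paste mistake described in edb99e296d comes down to using OR where AND-NOT was needed. The sketch below illustrates it on a plain integer. `FLAG_USR1`, `set_idle()`, `set_used_buggy()` and `set_used_fixed()` are hypothetical names, the flag value is arbitrary, and the real code operates on a tasklet's state (presumably with atomic operations), which this sketch does not reproduce.

```c
#include <stdint.h>

/* Hypothetical flag value standing in for TASK_F_USR1. */
#define FLAG_USR1 0x00010000u

/* Mark the connection as idle: the flag must be set. */
static void set_idle(uint32_t *state)
{
    *state |= FLAG_USR1;
}

/* Buggy "used" variant: copy-pasted OR, so the flag is never released. */
static void set_used_buggy(uint32_t *state)
{
    *state |= FLAG_USR1;
}

/* Fixed "used" variant: AND with the complement clears only that bit. */
static void set_used_fixed(uint32_t *state)
{
    *state &= ~FLAG_USR1;
}
```

After `set_idle()` followed by `set_used_buggy()`, the flag is still set, so a caller testing it would keep taking the lock; `set_used_fixed()` clears it while leaving the other bits untouched.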
Amaury Denoyelle	86aafd0236	BUG/MINOR: qpack: fix error code reported on QPACK decoding failure qpack_decode_fs() is used to decode QPACK field section on HTTP/3 headers parsing. Its return value is incoherent as it returns either QPACK_DECOMPRESSION_FAILED defined in RFC 9204 or any other internal values defined in qpack-dec.h. On failure, such return code is reused by HTTP/3 layer to be reported via a CONNECTION_CLOSE frame. This is incorrect if an internal error value was reported as it is not defined by any specification. Fix return values of qpack_decode_fs() in two ways. Firstly, fix invalid usages of QPACK_DECOMPRESSION_FAILED when decoded content is too large for the correct internal error QPACK_ERR_TOO_LARGE. Secondly, adjust qpack_decode_fs() API to only return internal code values. A new internal enum QPACK_ERR_DECOMP is defined to replace QPACK_DECOMPRESSION_FAILED. Caller is responsible to convert it to a suitable error value. For other internal values, H3_INTERNAL_ERROR is used. This is done through a set of convert functions. This should be backported up to 2.6. Note that trailers are not supported in 2.6 so chunk related to h3_trailers_to_htx() can be safely skipped.	2024-05-15 16:07:15 +02:00
Amaury Denoyelle	4295dd21bd	BUG/MINOR: mux-quic: fix error code on shutdown for non HTTP/3 qcc_shutdown() is called whenever the connection must be closed. If application protocol defined its own shutdown callback, it is invoked to use the correct error code. Else transport error code NO_ERROR is used. A bug occurs in the latter case as NO_ERROR is used with quic_err_app() which is reserved for application error codes. This will trigger the emission of a CONNECTION_CLOSE of type 0x1d (Application) instead of 0x1c (Transport). This bug is considered minor as it does not impact QUIC with HTTP/3. It may only be visible when using experimental HTTP/0.9 protocol. This should be backported up to 2.6. For 2.6, patch must be completely rewritten due to code differences. Here is the change to apply : diff --git a/src/mux_quic.c b/src/mux_quic.c index 26fb70ddf..c48f82e27 100644 --- a/src/mux_quic.c +++ b/src/mux_quic.c @@ -1918,7 +1918,9 @@ static void qc_release(struct qcc *qcc) qc_send(qcc); } else { - qcc_emit_cc_app(qcc, QC_ERR_NO_ERROR, 0); + /* Duplicate from qcc_emit_cc_app() for Transport error code. */ + if (!(qcc->conn->handle.qc->flags & QUIC_FL_CONN_IMMEDIATE_CLOSE)) + qcc->conn->handle.qc->err = quic_err_transport(QC_ERR_NO_ERROR); } }	2024-05-15 16:03:01 +02:00
Amaury Denoyelle	412f1eeb89	BUG/MEDIUM: server: clear purgeable conns before server deletion Since the following commit, idle connections are cleared before a server is deleted. This is better than blocking server deletion due to inactive connections : `6e0afb2e27` MEDIUM: server: close idle conn on server deletion A BUG_ON() has been added to ensure that server idle conn counter is null after these connections are removed. However, Willy managed to trigger it easily by repeatedly and randomly deleting servers across a single-thread haproxy using a server-template with 1000 instances. In parallel, a h1load client is executed to generate traffic. This BUG_ON() reflected that some connections referencing the server targeted for deletion remained, even though idle server list is empty. In fact, this is caused by connections scheduled for purging. These connections are moved from idle server list to a global toremove_list while still being accounted by the server. A first approach could be to decrement server idle counter while moving connection to the purge list. However, this is functionally incorrect as these purgeable connections still reference the server and it could cause a crash if cleared after it. The correct fix for this issue is simply to remove every purgeable connection before a server is deleted. This is implemented by this patch by extending cli_parse_delete_server(). It could be enough to only remove connections targeting the deleted server, but as these connections will be purged anyway it is justified to clear the whole list. This must not be backported, unless the above mentioned patch is.	2024-05-15 15:01:55 +02:00
Aurelien DARRAGON	231d3d32be	MEDIUM: hlua: take nbthread into account in hlua_get_nb_instruction() Based on Willy's idea (from 3.0-dev6 announcement message): in this patch we try to reduce the max latency that can be caused by running lua scripts with default settings. Indeed, by default, hlua engine is allowed to process up to 10k instructions per batch. While this value was found to be the optimal one for a single thread, it turns out that keeping a thread busy for 10k lua instructions could increase thread contention. This is especially true when the script is loaded with 'lua-load', because in that case the current thread owns the main lua lock and prevents other threads from making any progress if they're also waiting on the main lock. Thanks to Thierry Fournier's work, we know that performance-wise we can reach optimal performance by sticking between 500 and 10k instructions per batch. Given that, when the script is loaded using 'lua-load', if no "tune.lua.forced-yield" was set by the user, we automatically divide the default value (10K) by the number of threads haproxy can use to reduce thread contention (given that all threads could compete for the main lua lock), however we make sure not to return a value below 500, because Thierry's work showed that this would come with a significant performance loss. The historical behavior may still be enforced by setting "tune.lua.forced-yield" to 10000 in the global config section.	2024-05-15 11:59:44 +02:00
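The batch-size rule described in 231d3d32be can be sketched as a small pure function. This is an illustration under stated assumptions, not the body of hlua_get_nb_instruction(): the function name `nb_instruction()`, its parameters, and the convention that 0 means "tune.lua.forced-yield not set" are all hypothetical. The constants 10000 and 500 come from the commit message.

```c
#define DEFAULT_BATCH 10000u  /* default instructions per batch */
#define MIN_BATCH       500u  /* floor below which performance drops */

/* Compute the number of Lua instructions to run before yielding.
 *   forced_yield: user setting, 0 meaning "not set" (assumed convention)
 *   nbthread:     number of threads the process can use
 *   shared_lock:  non-zero when the script was loaded with 'lua-load',
 *                 i.e. all threads compete for the main Lua lock
 */
static unsigned int nb_instruction(unsigned int forced_yield,
                                   unsigned int nbthread,
                                   int shared_lock)
{
    unsigned int n;

    if (forced_yield)
        return forced_yield;      /* an explicit setting always wins */
    if (!shared_lock || nbthread <= 1)
        return DEFAULT_BATCH;     /* no contention to reduce */

    n = DEFAULT_BATCH / nbthread; /* share the budget between threads */
    return n < MIN_BATCH ? MIN_BATCH : n;
}
```

Under these assumptions, 4 threads give 2500 instructions per batch, 64 threads hit the floor of 500, and an explicit setting of 10000 restores the historical behavior regardless of the thread count.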
Aurelien DARRAGON	e60d9dddf8	MINOR: hlua: add hlua_nb_instruction getter No functional behavior change, but this will ease the work of dynamically computing hlua_nb_instruction value depending on various inputs.	2024-05-15 11:59:37 +02:00
Tim Duesterhus	6610f656ea	DOC: Update UUID references to RFC 9562 When support for UUIDv7 was added in commit `aab6477b67` the specification still was a draft. It has since been published as RFC 9562. This patch updates all UUID references from the obsoleted RFC 4122 and the draft for RFC 9562 to the published RFC 9562.	2024-05-15 11:40:08 +02:00
William Manley	366b722f7e	MINOR: rhttp: Don't require SSL when attach-srv name parsing An attach-srv config line usually looks like this: tcp-request session attach-srv be/srv name ssl_c_s_dn(CN) while a rhttp server line usually looks like this: server srv rhttp@ sni req.hdr(host) The server sni argument is used as a key for looking up connection in the connection pool. The attach-srv name argument is used as a key for inserting connections into the pool. For it to work correctly they must match. There was a check that either both the attach-srv and server provide that key or neither does. It also checked that SSL and SNI were activated on the server. However, thanks to current connect_server() implementation, it appears that SNI is usable even without SSL to identify a connection in the pool. Thus, it can be diverted from its original intent in reverse HTTP case to serve even without SSL activated. For example, this could be useful to use `fc_pp_unique_id` as a name expression (DISCLAIMER: note that for now PROXY protocol is not compatible with rhttp). Error is still reported if either SNI or name is used without the other. This patch adjusts the message to a more helpful one. Arguably it would be easier to understand if instead of using `name` and `sni` for `attach-srv` and `server` rules it used the same term in both places - like "conn-pool-key" or something. That would make it clear that the two must match.	2024-05-14 16:39:07 +02:00
Aurelien DARRAGON	32f0cd3242	BUG/MINOR: log: smp_rgs array issues with inherited global log directives When a log directive is defined in the global section, each time we use "log global" in a proxy section, the global log directives are duplicated for the current proxy. This works by creating a new proxy logger struct and duplicating every member for each global one. However, smp_rgs logger member is a special pointer member that is allocated when "range" is used on a log directive. Currently, we simply copy the array pointer (from the global one), instead of creating our own copy. Because of that, range log sampling may not work properly in some situations prior to `3f1284560` ("MINOR: log: remove the unused curr_idx in struct smp_log_range") when used in global log directives, for instance: global log 127.0.0.1:5114 format raw sample 1-2,3:4 local0 info # should receive 75% of all proxy logs log 127.0.0.1:5115 format raw sample 4:4 local0 info # should receive 25% of all proxy logs listen proxy1 log global listen proxy2 log global May not work as expected, because curr_idx was stored within smp_rgs array member prior to `3f1284560`, and due to this bug, it happens to be shared between every log directive inherited from a "global" one. The result is that curr_idx counter will not behave properly because the index will be increased globally instead of per-log directive, and it could even suffer from concurrent thread accesses under load since we don't own the global log directive's lock when manipulating it. Another issue that was revealed because of this bug is that the smp_rgs array allocated during config parsing is never freed in free_logger(), resulting in a small memory leak during clean exit. To fix these issues all at once, let's properly duplicate smp_rgs logger struct member in dup_logger() like we already do for other special members so that every log directive has its own smp_rgs copy, and then systematically free it in free_logger(). 
While this bug affects all stable versions (including 2.4), it's probably best to not backport this beyond 2.6 because of `211ea252d` ("BUG/MINOR: logs: fix logsrv leaks on clean exit") prerequisite that first appears in 2.6. [ada: for versions prior to 2.9, `969e212` ("MINOR: log: add dup_logsrv() helper function") and `76acde91` ("BUG/MINOR: log: keep the ref in dup_logger()") must be backported first. Note: Some ctx adjustments should be performed because 'logger' struct used to be named 'logsrv' in the past and 2.9 introduced logger target struct member. Thus it's probably easier to manually apply `76acde91` and the current bugfix by hand directly on top of `969e212`. ]	2024-05-14 12:00:23 +02:00
Aurelien DARRAGON	9d4a44e713	BUG/MINOR: log: fix leak in add_sample_to_logformat_list() error path If add_sample_to_logformat_list() fails to allocate new logformat_node, then we directly jump to error_free label to cleanup the node using free_logformat_node() before returning an error. However if the node failed to allocate, then the sample expression that was allocated just before (not yet assigned) isn't released (free_logformat_node() is a no-op when NULL is provided). Thus if expr wasn't assigned to the node during early failure, then it must be manually released. This bug was introduced by `2462e5bcc` ("BUG/MINOR: log: fix potential lf->name memory leak") which wasn't marked for backports. It only affects 3.0.	2024-05-13 16:44:27 +02:00
Willy Tarreau	0ce51dc93b	MEDIUM: dynbuf: implement emergency buffers The buffer reserve set by tune.buffers.reserve has long been unused, and in order to deal gracefully with failed memory allocations we'll need to resort to a few emergency buffers that are pre-allocated per thread. These buffers are only for emergency use, so every time their count is below the configured number a b_free() will refill them. For this reason their count can remain pretty low. We changed the default number from 2 to 4 per thread, and the minimum value is now zero (e.g. for low-memory systems). The tune.buffers.limit setting has always been a problem when trying to deal with the reserve but now we could simplify it by simply pushing the limit (if set) to match the reserve. That was already done in the past with a static value, but now with threads it was a bit trickier, which is why the per-thread allocators increment the limit on the fly before allocating their own buffers. This also means that the configured limit is saner and now corresponds to the regular buffers that can be allocated on top of emergency buffers. At the moment these emergency buffers are not used upon allocation failure. The only reason is to ease bisecting later if needed, since this commit only has to deal with resource management.	2024-05-10 17:18:13 +02:00
Willy Tarreau	47665be083	MEDIUM: mux-h1: allocate without queuing when retrying Now when trying to allocate a buffer, we can check if we've been notified of availability via the callback, in which case we should not consult the queue, or if we're doing a first allocation and check the queue. At this point it still doesn't change much since the stream still doesn't make use of it but some progress is expected.	2024-05-10 17:18:13 +02:00
Willy Tarreau	b5714b45e8	MEDIUM: stream: allocate without queuing when retrying Now when trying to allocate the work buffer, we can check if we've been notified of availability via the buf_wait callback, in which case we should not consult the queue, or if we're doing a first allocation and check the queue.	2024-05-10 17:18:13 +02:00
Willy Tarreau	f552f79ba5	MINOR: mux-h1: report that a buffer allocation succeeded When the buffer allocation callback is notified of a buffer availability, it will now set a MAYALLOC flag in addition to clearing the ALLOC one, for each of the 3 levels where we may fail an allocation. The flag will be cleared upon a successful allocation. This will soon be used to decide to re-allocate without waiting again in the queue. For now it has no effect. There's just a trick, we need to clear the various *_ALLOC flags before testing h1_recv_allowed() otherwise it will return false!	2024-05-10 17:18:13 +02:00
Willy Tarreau	cb2d758043	MINOR: applet: report about buffer allocation success When appctx_buf_available() is called, it now sets APPCTX_FL_IN_MAYALLOC or APPCTX_FL_OUT_MAYALLOC depending on the reportedly permitted buffer allocation, and these flags are cleared when the said buffers are allocated. For now they're not used for anything else.	2024-05-10 17:18:13 +02:00
Willy Tarreau	17d8916bb1	MINOR: stream: report that a buffer allocation succeeded When the buffer allocation callback is notified of a buffer availability, it will now set a MAYALLOC flag on the stream so that the stream knows it is allowed to bypass the queue checks. For now this is not used.	2024-05-10 17:18:13 +02:00
Willy Tarreau	a160b3c50c	MEDIUM: dynbuf/mux-h1: do not allocate the buffers in the callback One of the problematic designs with the buffer_wait mechanism is that the callbacks pre-allocate the buffers and stay in the run queue for a while, resulting in all of the few buffers being assigned to waiting tasks instead of being all available to one task that needs them all at once. Here we simply stop doing this, the callback clears the waiting flags and wakes the task up so that it has a chance of still finding some buffers.	2024-05-10 17:18:13 +02:00
Willy Tarreau	c510e81a3f	MINOR: dynbuf/mux-h1: use different criticalities for buffer allocations While it could certainly still be improved, this first approach consists in assigning buffers like this in the H1 mux: - h1c->obuf : DB_MUX_TX - h1c->ibuf : DB_MUX_RX - h1s->rxbuf: DB_SE_RX That's done via 3 distinct functions for better code clarity, and it also allowed to move the missing buffer flags assignment there. Among possible improvements would be to take into consideration the state of the parser (i.e. no data yet vs data, or headers vs payload) so that even server beginning of response or pure payload can be lowered in priority.	2024-05-10 17:18:13 +02:00
Willy Tarreau	4ffb3b5ebe	MINOR: applet: set the blocking flag in the buffer allocation function Instead of having each caller of appctx_get_buf() think about setting the blocking flag, better have the function do it, since it's already handling the queue anyway. This way we're sure that both are consistent.	2024-05-10 17:18:13 +02:00
Willy Tarreau	ee0d56ac85	MEDIUM: applet: make appctx_buf_available() only wake the applet up, not allocate Now we don't want bufwait handlers to preallocate the resources they were expecting since it contributes to the shortage. Let's just wake the applet up and that's all.	2024-05-10 17:18:13 +02:00
Willy Tarreau	9a27d7aa6f	MEDIUM: dynbuf/stream: do not allocate the buffers in the callback One of the problematic designs with the buffer_wait mechanism is that the callbacks pre-allocate the buffers and stay in the run queue for a while, resulting in all of the few buffers being assigned to waiting tasks instead of being all available to one task that needs them all at once. Here we simply stop doing this, the callback clears the waiting flags and wakes the task up so that it has a chance of still finding some buffers.	2024-05-10 17:18:13 +02:00
Willy Tarreau	db21062881	MEDIUM: dynbuf/stream: re-enable queueing upon failed buffer allocation The errors were not working fine anyway since we know that upon low memory condition everything freezes. However we have a chance to do better now, so let's start by re-enabling queueing when allocations fail.	2024-05-10 17:18:13 +02:00
Willy Tarreau	f5566afec6	MEDIUM: dynbuf: generalize the use of b_dequeue() to detach buffer_wait Now thanks to this the bufq_map field is expected to remain accurate.	2024-05-10 17:18:13 +02:00
Willy Tarreau	a5d6a79986	MEDIUM: dynbuf: make the buffer_wq an array of list heads Let's turn the buffer_wq into an array of 4 list heads. These are chosen by criticality. The DB_CRIT_TO_QUEUE() macro maps each criticality level into one of these 4 queues. The goal here clearly is to make it possible to wake up the most critical queues in priority in order to let some tasks finish their job and release buffers that others can use. In order to avoid having to look up all queues, a bit map indicates which queues are in use, which also avoids looping in the most common case where queues are empty.	2024-05-10 17:18:13 +02:00
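The queue-per-criticality idea with an occupancy bit map can be sketched as follows. This is a minimal illustration, not HAProxy's code: `struct waiter`, `struct wait_queues`, `wq_push()` and `wq_pop()` are hypothetical names, and treating the highest queue index as the most critical one is an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define NB_QUEUES 4  /* the commit describes 4 list heads */

/* Hypothetical waiter node; HAProxy's real structures differ. */
struct waiter {
    struct waiter *next;
    int id;
};

struct wait_queues {
    struct waiter *head[NB_QUEUES];
    struct waiter *tail[NB_QUEUES];
    unsigned int map;  /* bit q is set while queue q is non-empty */
};

/* Append waiter <w> to queue <q> and flag that queue as in use. */
static void wq_push(struct wait_queues *wq, unsigned int q, struct waiter *w)
{
    assert(q < NB_QUEUES);
    w->next = NULL;
    if (wq->tail[q])
        wq->tail[q]->next = w;
    else
        wq->head[q] = w;
    wq->tail[q] = w;
    wq->map |= 1u << q;
}

/* Pop from the most critical non-empty queue (assumed here to be the
 * highest index), or return NULL when nothing is waiting.
 */
static struct waiter *wq_pop(struct wait_queues *wq)
{
    if (!wq->map)
        return NULL;  /* common case: all queues empty, no loop at all */

    for (int q = NB_QUEUES - 1; q >= 0; q--) {
        if (!(wq->map & (1u << q)))
            continue;  /* bit map says this queue is empty */

        struct waiter *w = wq->head[q];
        wq->head[q] = w->next;
        if (!wq->head[q]) {
            wq->tail[q] = NULL;
            wq->map &= ~(1u << q);  /* queue drained, clear its bit */
        }
        return w;
    }
    return NULL;
}
```

With waiters pushed to queues 0, 3 and 0 in that order, `wq_pop()` returns the queue 3 waiter first, then the two queue 0 waiters in FIFO order.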
Willy Tarreau	a214197ce7	MINOR: dynbuf: use the b_queue()/b_requeue() functions everywhere The code places that were used to manipulate the buffer_wq manually now just call b_queue() or b_requeue(). This will simplify the multiple list management later.	2024-05-10 17:18:13 +02:00
Willy Tarreau	72d0dcda8e	MINOR: dynbuf: pass a criticality argument to b_alloc() The goal is to indicate how critical the allocation is, between the least one (growing an existing buffer ring) and the topmost one (boot time allocation for the life of the process). The 3 tcp-based muxes (h1, h2, fcgi) use a common allocation function to try to allocate otherwise subscribe. There's currently no distinction of direction nor part that tries to allocate, and this should be revisited to improve this situation, particularly when we consider that mux-h2 can reduce its Tx allocations if needed. For now, 4 main levels are planned, to translate how the data travels inside haproxy from a producer to a consumer: - MUX_RX: buffer used to receive data from the OS - SE_RX: buffer used to place a transformation of the RX data for a mux, or to produce a response for an applet - CHANNEL: the channel buffer for sync recv - MUX_TX: buffer used to transfer data from the channel to the outside, generally a mux but there can be a few specificities (e.g. http client's response buffer passed to the application, which also gets a transformation of the channel data). The other levels are a bit different in that they don't strictly need to allocate for the first two ones, or they're permanent for the last one (used by compression).	2024-05-10 17:18:13 +02:00
Amaury Denoyelle	cc9827bb09	BUG/MEDIUM: mux-quic: fix crash on STOP_SENDING received without SD Abort reason code received on STOP_SENDING is notified to upper layer since the following commit : `367ce1ebf3` MINOR: mux-quic: Set tha SE abort reason when a STOP_SENDING frame is received However, this causes a crash when a STOP_SENDING is received on a QCS instance without any stream instantiated. Fix this by checking first if qcs->sd is not NULL before setting abort code. This bug can easily be reproduced by emitting a STOP_SENDING as first frame of a stream. This should fix github issue #2563. This does not need to be backported.	2024-05-10 11:01:05 +02:00
Aurelien DARRAGON	fbbc2925d4	BUG/MEDIUM: log/ring: broken syslog octet counting As reported by Tristan in GH #2561, syslog messages sent over rings are malformed since commit `01aa0a05` ("MEDIUM: ring: change the ring reader to use the new vector-based API now"). Indeed, take a look at the following log message produced prior to `01aa0a05`: 181 <134>1 2024-05-07T09:45:21.543263+02:00 - haproxy 113700 - - 127.0.0.1:56136 [07/May/2024:09:45:21.491] front front/s1 0/0/21/30/51 404 369 - - ---- 1/1/0/0/0 0/0 "GET / HTTP/1.1" Starting with `01aa0a05`, here's the equivalent log message: <134>1 2024-05-07T09:45:21.543263+02:00 - haproxy 112729 - - 127.0.0.1:56136 [07/May/2024:09:45:21.491] front front/s1 0/0/66/39/105 404 369 - - ---- 1/1/0/0/0 0/0 "GET / HTTP/1.1"-fwr -> Message is missing octet counting header, and garbage bytes are found at the end of the payload. This bug is caused by a small mistake in syslog_applet_append_event(): when the function was refactored to use vector API instead of buffer API, we used 'trash.area' as starting pointer to write the event instead of 'trash.area + trash.data', causing existing octet counting prefix (already written in trash) to be overwritten and trash.data to be wrongly incremented. No backport needed (`01aa0a05` was introduced during 3.0 development)	2024-05-07 19:23:01 +02:00
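The class of mistake described above (writing the event at the start of the scratch area instead of after the bytes already in it) can be illustrated with a small sketch of octet-counted framing as defined by RFC 6587. `struct scratch` and `frame_octet_counted()` are hypothetical stand-ins, not the actual `trash` buffer or syslog_applet_append_event().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical scratch buffer: the first <data> bytes of <area> are in use. */
struct scratch {
    char area[256];
    size_t data;
};

/* Frame <msg> as "<len> <msg>" (octet counting).
 * Returns the total number of bytes written, or 0 if it does not fit.
 */
static size_t frame_octet_counted(struct scratch *out, const char *msg)
{
    size_t len;
    int hdr;

    assert(out != NULL && msg != NULL);
    len = strlen(msg);

    /* 1) write the octet-counting prefix at the start of the area */
    hdr = snprintf(out->area, sizeof(out->area), "%zu ", len);
    if (hdr < 0 || (size_t)hdr + len > sizeof(out->area))
        return 0;
    out->data = (size_t)hdr;

    /* 2) append the event AFTER the prefix. Copying to out->area instead
     * of out->area + out->data would overwrite the prefix, which is the
     * kind of bug the commit message describes.
     */
    memcpy(out->area + out->data, msg, len);
    out->data += len;
    return out->data;
}
```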
Christopher Faulet	bd47e344b8	MINOR: connection: Add samples to retrieve info on streams for a connection Thanks to the previous fix, it is now possible to get the number of opened streams for a connection and the negotiated limit. Here, corresponding sample fetches are added, in fc_ and bc_ scopes. On frontend side, the limit of streams is imposed by HAProxy. But on the backend side, the limit is defined by the server. It may be useful for debugging purposes because it may explain slow-downs on some processing.	2024-05-06 22:00:01 +02:00
Christopher Faulet	eca9831ec8	MINOR: muxes: Add ctl commands to get info on streams for a connection There are 2 new ctl commands that may be used to retrieve the current number of streams opened for a connection and its limit (the maximum number of streams a mux connection supports). For the PT and H1 muxes, the limit is always 1 and the current number of streams is 0 for idle connections, otherwise 1 is returned. For the H2 and the FCGI muxes, the info is already available in the mux connection. For the QUIC mux, the limit is also directly available. It is the maximum initial sub-ID of bidirectional stream allowed for the connection. For the current number of streams, it is the number of SC attached on the connection and the number of not already attached streams present in the "opening_list" list.	2024-05-06 22:00:00 +02:00
Christopher Faulet	12fb6d73cd	MINOR: mux-quic: Add .ctl callback function to get info about a mux connection Other muxes implement this callback function. It was not implemented for the QUIC mux because it was useless. It will be used to retrieve the current/max number of streams for a quic connection. So let's add it, adding the default support for MUX_CTL_EXIT_STATUS command.	2024-05-06 22:00:00 +02:00
Christopher Faulet	068ce2d5d2	MINOR: stconn: Add samples to retrieve about stream aborts It is now possible to retrieve some info about the abort received for a server or a client stream, if any. * fs.aborted and bs.aborted can be used to know if an abort was received on frontend or backend side. A boolean is returned. * fs.rst_code and bs.rst_code return the code of the received RESET_STREAM frame for a H2 stream or the code of the received STOP_SENDING frame for a QUIC stream. In both cases, the error code attached to the frame is returned. The sample fetch fails if no such frame was received or if the stream is not an H2/QUIC stream.	2024-05-06 22:00:00 +02:00
Christopher Faulet	367ce1ebf3	MINOR: mux-quic: Set tha SE abort reason when a STOP_SENDING frame is received When STOP_SENDING frame is received for a quic stream, the error code is now saved in the SE abort reason. To do so, we use the QUIC source (SE_ABRT_SRC_MUX_QUIC). For now, this code is only set but not used on the opposite side.	2024-05-06 22:00:00 +02:00
Christopher Faulet	20b156ee15	MEDIUM: mux-h2: Forward h2 client cancellations to h2 servers When a H2 client sends a RST_STREAM(CANCEL) frame to abort a request, the abort reason is now used on server side, in the H2 mux, to set the RST_STREAM code. The main use case is to forward client cancellations to gRPC applications. This patch should fix the issue #172.	2024-05-06 22:00:00 +02:00
Christopher Faulet	dea79f3fe1	MINOR: mux-h2: Set the SE abort reason when a RST_STREAM frame is received When RST_STREAM frame is received, the error code is now saved in the SE abort reason. To do so, we use the H2 source (SE_ABRT_SRC_MUX_H2). For now, this code is only set but not used on the opposite side.	2024-05-06 22:00:00 +02:00
Christopher Faulet	96f8b7ad08	MEDIUM: stconn/muxes: Add an abort reason for SE shutdowns on muxes A reason is now passed as parameter to muxes shutdowns to pass additional info about the abort, if any. No info means no abort or only a generic one. For now, the reason is composed of two 32-bit integers. The first one represents the abort code and the other one represents the info about the code (for instance the source). The code should be interpreted according to the associated info. One info is the source, encoded on 5 bits. Other bits are reserved for now. For now, the muxes are the only supported source. But we can imagine extending it to applets, streams, health-checks... The current design is quite simple and will most probably evolve. But the idea is to let the opposite side forward some errors and let a mux know why its stream was aborted. At first glance, an abort reason must only be evaluated if SE_SHW_SILENT flag is set. The main goal in the short term is to forward some H2 RST_STREAM codes because it is mandatory for gRPC applications, mainly to forward gRPC cancellation from an H2 client to an H2 server. But we can imagine altering this reason at the applicative level to enrich it. It would also be used to report more accurate errors in logs.	2024-05-06 22:00:00 +02:00
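The "code plus info" pair described above could look roughly like the sketch below. Every name here (`struct abort_reason`, `ABRT_SRC_*`, `make_abort()`, `abort_source()`) is hypothetical, and placing the 5-bit source in the lowest bits of the info word is an assumption; the commit message only states the field widths.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: the source occupies the 5 lowest bits of <info>,
 * all other bits are reserved and kept at zero.
 */
#define ABRT_SRC_BITS 5
#define ABRT_SRC_MASK ((1u << ABRT_SRC_BITS) - 1u)

/* Hypothetical source identifiers, for illustration only. */
enum abrt_src {
    ABRT_SRC_NONE  = 0,
    ABRT_SRC_MUX_A = 1,
    ABRT_SRC_MUX_B = 2,
};

struct abort_reason {
    uint32_t info;  /* tells how to interpret <code>, e.g. its source */
    uint32_t code;  /* abort code, meaningful only for that source */
};

/* Build a reason from a source and a protocol-specific code. */
static struct abort_reason make_abort(enum abrt_src src, uint32_t code)
{
    struct abort_reason r;

    assert((uint32_t)src <= ABRT_SRC_MASK);
    r.info = (uint32_t)src & ABRT_SRC_MASK;
    r.code = code;
    return r;
}

/* Extract the source so the receiver knows how to read <code>. */
static enum abrt_src abort_source(struct abort_reason r)
{
    return (enum abrt_src)(r.info & ABRT_SRC_MASK);
}
```

A receiver would first call `abort_source()` and only then interpret `code`, since the same numeric value can mean different things for different sources.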
Patrick Hemmer	28489021b3	BUG/MINOR: cfgparse: use curproxy global var from config post validation Previously check_config_validity() had its own curproxy variable. This resulted in the acl() sample fetch being unable to determine which proxy was in use when used from within log-format statements. This change addresses the issue by having the check_config_validity() function use the global variable instead.	2024-05-06 18:45:47 +02:00
Patrick Hemmer	93d4e99714	BUG/MINOR: acl: support built-in ACLs with acl() sample Built-in ACLs were not being searched by the acl() sample fetch. This fixes that so they are searched if no other match is found.	2024-05-06 18:42:54 +02:00
Valentine Krasnobaeva	4a9e3e102e	BUG/MINOR: haproxy: only tid 0 must not sleep if got signal This patch fixes the commit `eea152ee68` ("BUG/MINOR: signals/poller: ensure wakeup from signals"). There is some probability that run_poll_loop() becomes infinite, if TH_FL_SLEEPING is withdrawn from all threads in the second signal_queue_len check, when a signal has been received just after the first one. In such a particular case, the 'wake' variable, which is used to terminate the thread's poll loop, is never reset to 0. So, we never enter the "stopping" part of run_poll_loop() and threads, except the one with id 0 (tid 0 handles signals), will continue to call _do_poll() eternally and will never sleep, as their TH_FL_SLEEPING flag was unset. This flag needs to be removed only for tid 0, as it was done in the first signal_queue_len check. This fixes issue #2537 "infinite loop when shutting down". This fix must be backported to every stable version.	2024-05-06 18:39:08 +02:00
Aurelien DARRAGON	03ca16f38b	OPTIM: log: resolve logformat options during postparsing In lf_buildctx_prepare(), we perform costly bitwise operations for every node to resolve node options and check for incompatibilities with global options. In fact, all this logic may safely be performed during postparsing. This is what we're doing in this commit. Doing so saves us from unnecessary runtime checks and could help speed up sess_build_logline(). Since checks are not as costly as before (due to them being performed during postparsing and not on log building path anymore), a complementary check for OPT_HTTP vs OPT_ENCODE incompatibility was added: encoding is ignored if HTTP option is set, unless HTTP option wasn't set globally and encoding was set globally, which means encoding takes the precedence. Thanks to this patch, lf_buildctx_prepare() now only takes care of assigning proper typecast and options settings depending if it's used from global or per-node context, and prepares CBOR-specific structure members when CBOR encode option is set.	2024-05-06 11:13:46 +02:00
Ilia Shipitsin	a7cf2454dd	BUILD: clock: improve check for pthread_getcpuclockid() if _POSIX_THREAD_CPUTIME is greater than 0, pthread_getcpuclockid() is implemented. This should fix the build on Solaris 11. Reference: https://docs.oracle.com/cd/E88353_01/html/E37842/unistd-3head.html ML: https://www.mail-archive.com/haproxy@formilux.org/msg44915.html	2024-05-06 08:25:17 +02:00
Aurelien DARRAGON	d26a160133	OPTIM: log: speedup date printing in sess_build_logline() when no encoding is used In sess_build_logline(), we have multiple fields such as '%t' that build a fixed-length string out of a date struct and then print it using lf_rawtext(). In fact, printing it using lf_rawtext() is only mandatory to deal with encoding options, but when no encoding is used we can output the result to tmplog directly. Since most dates generate between 25 and 30 chars, doing so spares us from writing them twice and could help make sess_build_logline() a bit faster when no encoding is used. (to match with pre-encoding patch series performance).	2024-05-04 10:13:05 +02:00
Aurelien DARRAGON	bf3b4001ce	OPTIM: log: use lf_buildctx's buffer instead of temporary stack buffers Now that lf_buildctx isn't pushed on the stack anymore, let's take this opportunity to store a small buffer of 256 bytes within it, and then use this buffer as general purpose buffer to build fixed-length strings that are then printed using lf_{raw}text() function. By doing so we stop relying on temporary stack buffers.	2024-05-04 10:13:05 +02:00
Aurelien DARRAGON	ccc4341258	OPTIM: log: use thread local lf_buildctx to stop pushing it on the stack Following previous commit's logic, let's move lf_buildctx ctx away from sess_build_logline() to stop abusing the stack by pushing a large structure each time sess_build_logline() is called. Also, don't memset the structure for each invocation, but only reset members explicitly when required. For that we now declare one static lf_buildctx per thread (using THREAD_LOCAL) and make sess_build_logline() refer to it using a pointer.	2024-05-04 10:13:05 +02:00
Aurelien DARRAGON	728b5aa835	OPTIM: log: declare empty buffer as global variable The 'empty' buffer is used in sess_build_logline() inside a loop, and since it is only read from and not modified, until recently it ended up being cached most of the time and didn't cause overhead due to systematic push on the stack. However, due to recent encoding work and newly added variables on the stack, we're starting to reach a stack limit and declaring 'empty' buffer within the loop seems to cause non-negligible CPU overhead. Since the variable isn't modified during log generation, let's declare 'empty' buffer as a global variable outside from sess_build_logline() to prevent pushing it on the stack for each node evaluation.	2024-05-04 10:13:05 +02:00
Aurelien DARRAGON	cc2e94a948	BUG/MINOR: log: prevent double spaces emission in sess_build_logline() Christian reported in GH #2556 that since 3.0-dev double spaces may be found in log messages in some cases where it was not the case before. As we were able to easily reproduce, a quick bisect led us to `c6a7138` ("MINOR: log: simplify last_isspace in sess_build_logline()"). While it is true that all switch cases set the last_isspace variable to 0, there was a subtlety for some fields such as '%hr', '%hrl', '%hs' or '%hsl' and I overlooked it. Indeed, for '%hr', last_isspace was only set to 0 if data was emitted, else the assignment didn't occur. But with `c6a7138`, last_isspace is always set to 0 as long as the current node type is not a separator. Because of that, if no data is emitted for the current node value, and a space was already emitted prior to the current node, then an extra space could be emitted after the node, resulting in two spaces being emitted. Note that while `c6a7138` introduces a slight behavior regression regarding last_isspace logic with the specific fields mentioned above, this behavior could already be triggered with a failing or empty logformat node sample expression. Consider this logformat expression: log-format "%{-M}o \| %[str()] \|" str() will not print anything, and since we disabled mandatory option with '-M', nothing gets printed for the node sample expression. As a result, we have the following output: "\| \|" Instead of (when mandatory option is enabled): "\| - \|" Thus in order to stick to the historical behavior, systematically set last_isspace to 0 for EXPR nodes, and only set last_isspace to 0 when data was written for TAG nodes. This way, '%hr', '%hrl', '%hs' or '%hsl' should behave as before. No backport needed.	2024-05-03 16:48:21 +02:00
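The last_isspace rule restored by this fix can be illustrated with a toy line builder. This is a simplified sketch, not sess_build_logline(): the node types, `struct node` and `build_line()` are hypothetical, and starting with last_isspace set to 1 (so that no leading separator is emitted) is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum node_type { NODE_SEP, NODE_TAG, NODE_EXPR };  /* hypothetical names */

struct node {
    enum node_type type;
    const char *value;  /* text produced by the node, possibly "" */
};

/* Build a NUL-terminated line into <dst> (of <size> bytes) from <n> nodes.
 * Returns the length of the line.
 */
static size_t build_line(char *dst, size_t size,
                         const struct node *nodes, size_t n)
{
    size_t len = 0;
    int last_isspace = 1;  /* assumption: suppress a leading separator */

    assert(size > 0);
    for (size_t i = 0; i < n; i++) {
        const struct node *nd = &nodes[i];

        if (nd->type == NODE_SEP) {
            /* emit a single space unless one was just emitted */
            if (!last_isspace && len + 1 < size) {
                dst[len++] = ' ';
                last_isspace = 1;
            }
            continue;
        }

        size_t vlen = strlen(nd->value);
        if (vlen > size - 1 - len)
            vlen = size - 1 - len;  /* truncate to the remaining room */
        memcpy(dst + len, nd->value, vlen);
        len += vlen;

        /* EXPR nodes always reset the flag; TAG nodes only when they
         * actually wrote something, so an empty TAG cannot cause a
         * second consecutive space.
         */
        if (nd->type == NODE_EXPR || vlen > 0)
            last_isspace = 0;
    }
    dst[len] = '\0';
    return len;
}
```

With the sequence TAG "a", SEP, TAG "", SEP, TAG "b" the result is `a b` (one space), while replacing the empty TAG with an empty EXPR yields two spaces, matching the historical behavior the commit message describes.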
Aurelien DARRAGON	48e0efb00b	MEDIUM: log: optimizing tmp->type handling in sess_build_logline() Instead of chaining 2 switchcases and performing encoding checks for all nodes let's actually split the logic in 2: first handle simple node types (text/separator), and then handle dynamic node types (tag, expr). Encoding options are only evaluated for dynamic node types. Also, last_isspace is always set to 0 after next_fmt label, since next_fmt label is only used for dynamic nodes, thus != LOG_FMT_SEPARATOR. Since LF_NODE_WITH_OPT() macro (which was introduced recently) is now unused, let's get rid of it. No functional change should be expected. (Use diff -w to check patch changes since reindentation makes the patch look heavy, but in fact it remains fairly small)	2024-05-03 16:48:21 +02:00
Ilia Shipitsin	a65c6d3574	CLEANUP: assorted typo fixes in the code and comments This is 42nd iteration of typo fixes	2024-05-03 09:01:36 +02:00
Amaury Denoyelle	53782b9ea5	MINOR: stats: extract proxy clear-counter in a dedicated function Split code related to proxies list looping in cli_parse_clear_counters() to a new dedicated function. This function is placed in the new module stats-proxy.	2024-05-02 16:43:26 +02:00
Amaury Denoyelle	f0644d1bd7	REORG: stats: define stats-proxy source module Create a new module stats-proxy. Move stats functions related to proxies list looping in it. This allows to reduce stats source file dividing its size by half.	2024-05-02 16:42:36 +02:00
William Lallemand	271def959c	MINOR: ssl: rename ocsp_update.http_proxy into ocsp-update.httpproxy Rename the option to have a more consistent name.	2024-05-02 16:32:06 +02:00
William Lallemand	964f093504	CLEANUP: ssl: rename new_ckch_store_load_files_path() to ckch_store_new_load_files_path() Rename the new_ckch_store_load_files_path() function to ckch_store_new_load_files_path(), in order to be more consistent.	2024-05-02 16:03:20 +02:00
Amaury Denoyelle	10ab56831e	MINOR: stats: convert age as generic column for proxy stat Convert FN_AGE in stat_cols_px[] as generic columns. These values will be automatically used for dump/preload of a stats-file. Remove srv_lastsession() / be_lastsession() function which are now useless as last_sess is calculated via me_generate_field().	2024-05-02 10:55:25 +02:00
Amaury Denoyelle	e92ae8f0ba	MINOR: stats: support age in stats-file Extend generic stat column support to be able to fully support age stats type. Several changes were required. On output, me_generate_field() has been updated to report the difference between the current tick and the stored value for FN_AGE type. Also, if an age stat is hidden in show stats, -1 is returned instead of an empty metric, which is the value to mark an age as unset. On counters preload, load_ctr() was updated to handle FN_AGE. A similar subtraction is performed with the current tick value.	2024-05-02 10:55:25 +02:00
Amaury Denoyelle	634cc2a5d8	MINOR: counters: move last_change into counters struct last_change was a member present in both proxy and server struct. It is used as an age statistics to report the last update of the object. Move last_change into fe_counters/be_counters. This is necessary to be able to manipulate it through generic stat column and report it into stats-file. Note that there is a change for proxy structure with now 2 different last_change values, on frontend and backend side. Special care was taken to ensure that the value is initialized only on the proxy side. The other value is set to 0 unless a listen proxy is instantiated. For the moment, only backend counter is reported in stats. However, with now two distinct values, stats could be extended to report it on both side.	2024-05-02 10:55:25 +02:00
Amaury Denoyelle	9b35e1f30c	MINOR: stats: convert rate as generic column for proxy stats Convert every FN_RATE in stat_cols_px[] to generic column. Thanks to prior patch, this allows to automatically dump their value into stats-file and preload corresponding freq-ctr on process startup.	2024-05-02 10:55:25 +02:00
Amaury Denoyelle	fec2ae9b76	MINOR: stats: support rate in stats-file Implement support for FN_RATE stat column into stat-file. For the output part, only minimal change is required. Reuse the function read_freq_ctr() to print the same value in both stats output and stats-file dump. For counter preloading, define a new utility function preload_freq_ctr(). This can be used to initialize a freq-ctr type by preloading previous period value. Reuse this function in load_ctr() during stats-file parsing. At the moment, no rate column is defined as generic. Thus, this commit does not have functional change. This will be changed as soon as FN_RATE are converted to generic columns.	2024-05-02 10:55:25 +02:00
Amaury Denoyelle	639e73f8f2	MINOR: counters: move freq-ctr from proxy/server into counters struct Move freq-ctr defined in proxy or server structures into their dedicated fe_counters/be_counters struct. Functionally no change here. This commit will allow to convert rate stats column to generic one, which is mandatory to manipulate them in the stats-file.	2024-05-02 10:55:25 +02:00
Amaury Denoyelle	4e9e841878	MINOR: stats: prepare stats-file support for values other than FN_COUNTER Currently, only FN_COUNTER are dumped and preloaded via a stats-file. Thus in several places we relied on the assumption that only FN_COUNTER are valid in stats-file context. New stats types will soon be implemented as they are also eligible for statistics reloading on process startup. Thus, prepare stats-file functions to remove any FN_COUNTER restriction. As one of these changes, generate_stat_tree() now uses stcol_is_generic() for stats name tree indexing before stats-file parsing. Also related to stats-file parsing, the individual counter preloading step has been extracted from line parsing into a dedicated new function load_ctr(). This will allow to extend it to support multiple mechanisms of counter preloading depending on the stats type.	2024-05-02 10:55:25 +02:00
Amaury Denoyelle	933b4ae27d	MINOR: stats: convert req_tot as generic column req_tot counter is a special case as it is not managed identically between frontend and backend side. For the backend side, this metric is available directly into be_counters, which allows to use a generic stat column definition. On the frontend side however, the metric value is an aggregate of multiple fe_counters values. This is the case since the splitting between HTTP version introduced in the following patch : `9969adbcdc` MINOR: stats: add by HTTP version cumulated number of sessions and requests This difference cannot be handled automatically by me_generate_field(). Add a special case in the function to produce it on frontend side reusing the aggregated value. This is not done however for stats-file as there is no counter to preload.	2024-05-02 10:55:25 +02:00
Amaury Denoyelle	56e6c57aa1	MINOR: stats: fix visual alignment for stat_cols_px definition Simply adjust visual alignment in definition of proxy stats columns definition for ST_I_PX_HANAFAIL column.	2024-05-02 10:55:25 +02:00
William Lallemand	3a19698b81	CLEANUP: ssl: move the global ocsp-update options parsing to ssl_ocsp.c Move the global tune.ssl.ocsp-update option parsing to ssl_ocsp.c.	2024-05-02 10:48:05 +02:00
William Lallemand	622c635815	CLEANUP: ssl: clean the includes in ssl_ocsp.c Clean the includes in ssl_ocsp.c which were copied from ssl_sock.c and are not relevant anymore. Also move the include in the right order.	2024-05-02 10:35:27 +02:00
Valentine Krasnobaeva	5cbb278fae	MINOR: capabilities: add cap_sys_admin support If 'namespace' keyword is used in the backend server settings or/and in the bind string, it means that haproxy process will call setns() to change its default namespace to the configured one and then, it will create a socket in this new namespace. setns() syscall requires CAP_SYS_ADMIN capability in the process Effective set (see man 2 setns). Otherwise, the process must be run as root. To avoid running haproxy as root, let's add cap_sys_admin capability in the same way as we already added the support for some other network capabilities. As CAP_SYS_ADMIN belongs to CAP_SYS_* capabilities type, let's add a separate flag LSTCHK_SYSADM for it. This flag is set, if the 'namespace' keyword was found during configuration parsing. The flag may be unset only in prepare_caps_for_setuid() or in prepare_caps_from_permitted_set(), which inspect process EUID/RUID and Effective and Permitted capabilities sets. If system doesn't support Linux capabilities or 'cap_sys_admin' was not set in 'setcap', but 'namespace' keyword is present in the configuration, we keep the previous strict behaviour. A process that has changed uid to the non-privileged user will terminate with an alert. This alert invites the user to recheck its configuration. In the case when haproxy starts and runs under a non-root user and 'cap_sys_admin' is not set, but 'namespace' keyword is present, this patch does not change previous behaviour as well. We'll still let the user try its configuration, but we inform via a warning that unexpected things, like socket creation errors, may occur.	2024-04-30 21:40:17 +02:00
Valentine Krasnobaeva	13ef552488	MINOR: sock: add EPERM case in sock_handle_system_err setns() may return EPERM if thread, that tries to move into different namespace, do not have CAP_SYS_ADMIN capability in its Effective set. So, extending sock_handle_system_err() with this error allows to send appropriate log message and set SF_ERR_PRXCOND (SC termination flag in log) as stream termination error code. This error code can be simply checked with SF_ERR_MASK at protocol layer.	2024-04-30 21:39:32 +02:00
Valentine Krasnobaeva	d3fc982cd7	MEDIUM: proto: make common fd checks in sock_create_server_socket quic_connect_server(), tcp_connect_server(), uxst_connect_server() duplicate same code to check different ERRNOs, that socket() and setns() may return. They also duplicate some runtime condition checks, applied to the obtained server socket fd. So, in order to remove these duplications and to improve code readability, let's encapsulate socket() and setns() ERRNOs handling in sock_handle_system_err(). It must be called just before fd's runtime condition checks, which we also move in sock_create_server_socket by the same reason.	2024-04-30 21:39:24 +02:00
Valentine Krasnobaeva	772d070ab5	MINOR: sock_set_mark: take sock family in account SO_MARK, SO_USER_COOKIE, SO_RTABLE socket options (used to set the special mark/ID on socket, in order to perform mark-based routing) are only supported by AF_INET sockets. So, let's check socket address family, when we enter into this function.	2024-04-30 21:38:29 +02:00
Valentine Krasnobaeva	d602d568e0	MEDIUM: unix sock: use my_socketat to create bind socket As UNIX Domain sockets could be attached to Linux namespaces (see more details about it from the Linux kernel patch set below: https://lore.kernel.org/netdev/m1hbl7hxo3.fsf@fess.ebiederm.org), it is better to use my_socketat() in order to create UNIX listener's socket. my_socketat() takes into account a network namespace, that may be configured for a frontend in the bind line: frontend fe_foo ... bind uxst@frontend.sock user haproxy group haproxy mode 660 namespace frontend Like this, namespace aware applications such as netstat for example, will see this listening socket in its 'frontend' namespace and not in the root namespace as it was before. It is important to mention, that fixes in Linux kernel referenced above allow to connect to this listener's socket from the root and from any other namespace. UNIX Domain socket is protected by its permission set, which must be set with caution on its inode.	2024-04-30 21:38:24 +02:00
Valentine Krasnobaeva	84babc93ce	MEDIUM: proto_uxst: take in account server namespace As UNIX Domain sockets could be attached to Linux namespaces (see more details about it from the Linux kernel patch set below: https://lore.kernel.org/netdev/m1hbl7hxo3.fsf@fess.ebiederm.org), it is better to use sock_create_server_socket() in UNIX stream protocol implementation, as this function calls my_socketat() and the latter takes into account server network namespace, which could be configured as in example below: backend be_bar ... server rpicam0 /run/ustreamer.sock namespace foonet So, for UNIX Domain socket, used as an address of some backend server, this patch makes it possible to perform connect() to this backend server from the same network namespace, where the server is running, or where its listening socket was created. Using sock_create_server_socket() in UNIX stream protocol implementation also makes the code of uxst_connect_server() more uniform with tcp_connect_server() and quic_connect_server().	2024-04-30 21:38:18 +02:00
Valentine Krasnobaeva	a0b5324cff	MINOR: sock: rename sock to sock_fd in sock_create_server_socket Renaming sock to sock_fd makes it more clear, that sock_create_server_socket returns the fd of newly created server socket and then we check this fd. As we heavily use "fd" variable name in all protocol implementations, let's prefix this one with the name of its object file: sock.o.	2024-04-30 21:38:12 +02:00
Willy Tarreau	072686dafd	BUG/MINOR: stconn: don't wake up an applet waiting on buffer allocation Since the extension of the buffers API to applets in 3.0-dev, an applet may find itself unable to allocate a buffer, and will block respectively on APPCTX_FL_OUTBLK_ALLOC or APPCTX_FL_INBLK_ALLOC depending on the direction. However the code in sc_applet_process() doesn't consider this situation when deciding to wake up an applet, so when the condition arises, the applet keeps ringing and is killed by the loop detector. The fix is trivial and simply consists in checking for the flags above. No backport is needed since this is new in 3.0.	2024-04-30 21:36:47 +02:00
Aurelien DARRAGON	12d08cf912	BUG/MEDIUM: log: don't ignore disabled node's options In `3f2e8d0ed` ("MEDIUM: log: lf_* build helpers now take a ctx argument") I made a mistake, because starting with this commit it is no longer possible from a node to disable global logformat options. The result is that when an option is set globally, it cannot be disabled anymore. For instance, it is not possible to do this anymore: log-format "%{+X}o %{-X}Ts" The original intent was to prevent encoding options from being disabled once enabled globally, because when encoding is enabled globally we start the object enumeration right away (ie: in CBOR and JSON we announce dynamic map, and for each node we announce the key..), thus it doesn't make sense to mix encoding types there, unless encoding is only used per-node, in which case only the value gets encoded, thus it remains possible to print a value in JSON/CBOR-compatible format while the next one shouldn't be printed as-is. Thus, to restore the original behavior, slightly change the logic in lf_buildctx_prepare() so that only global encoding options take the precedence over node's options (instead of all options). No backport needed.	2024-04-30 18:45:07 +02:00
Aurelien DARRAGON	41d7e82e0f	MINOR: log/cbor: _lf_cbor_encode_byte() explicitly requires non-NULL ctx (again) The BUG_ON() statement that was added in `9bdea51` ("MINOR: log/cbor: _lf_cbor_encode_byte() explicitly requires non-NULL ctx") isn't sufficient as Coverity still thinks the lf_buildctx itself may be NULL as shown in GH #2554. In fact the original report complains about the lf_buildctx itself and I didn't understand it properly, so let's add another check in the BUG_ON() to ensure both cbor_ctx and cbor_ctx->ctx are not NULL since it is not expected if used properly.	2024-04-30 10:10:35 +02:00
Aurelien DARRAGON	9931a62c3f	BUG/MINOR: log: fix global lf_expr node options behavior (2nd try) In `98b44e8` ("BUG/MINOR: log: fix global lf_expr node options behavior"), I properly restored global node options behavior for when encoding is not used, however the fix is not optimal when encoding is involved: Indeed, encoding logic in sess_build_logline() relies on global node options to know if encoding must be handled expression-wide or individually. However, because of the above fix, if an expression is made of 1 or multiple nodes that all set an encoding option manually (without '%o'), we consider that the option was set globally, but that's probably not what the user intended. Instead we should only evaluate global options from '%o', so that it remains possible to skip global encoding when needed. No backport needed.	2024-04-30 10:10:35 +02:00
Aurelien DARRAGON	97240d01b3	BUG/MINOR: log/encode: fix potential NULL-dereference in LOGCHAR() When CBOR encoding was added in `c614fd3b9` ("MINOR: log: add +cbor encoding option"), in LOGCHAR(), we forgot to check that we don't assign the NULL value to tmplog (as we assume that tmplog cannot be NULL at the end of sess_build_logline()) No backport needed.	2024-04-30 10:10:35 +02:00
Aurelien DARRAGON	949ac95aa6	BUG/MINOR: log/encode: consider global options for key encoding In sess_build_logline(), contrary to what's stated in the comment "only consider global ctx for key encoding", we check for LOG_OPT_ENCODE flag on the current ctx options instead of global ones. Because of this, we could end up doing the wrong thing if the previous node had encoding enabled but it isn't set globally for instance. To fix the issue, let's simply check the presence of the flag on g_options before entering the "key encoding" block. This bug was introduced with `3f7c8387` ("MINOR: log: add +json encoding option"), no backport needed.	2024-04-30 10:10:35 +02:00
William Lallemand	6b634c4779	MINOR: ssl: introduce ocsp_update.http_proxy for ocsp-update keyword The ocsp_update.http_proxy global option allows to set an HTTP proxy address which will be used to send the OCSP update request with an absolute form URI.	2024-04-29 17:23:02 +02:00
William Lallemand	95949e6868	MINOR: httpclient: allow to use absolute URI with new flag HC_F_HTTPPROXY The new HC_F_HTTPPROXY flag allows using an absolute URI within a request that won't be modified in order to use an http proxy.	2024-04-29 17:10:47 +02:00
Aurelien DARRAGON	9bdce67585	CLEANUP: log: add a macro to know if a lf_node is configurable LF_NODE_WITH_OPT(node) returns true if the node's option may be set and thus should be considered. Logic is based on logformat node's type: for now only TAG and FMT nodes can be configured.	2024-04-29 14:47:37 +02:00
Aurelien DARRAGON	98b44e8edb	BUG/MINOR: log: fix global lf_expr node options behavior In `507223d5` ("MINOR: log: global lf_expr node options"), a mistake was made because it was assumed that only the last occurrence of %o (LOG_FMT_GLOBAL) should be kept as global node options. However, although not documented, it is possible to have multiple %o within a single logformat expression to change the global settings on the fly. For instance, consider this example: log-format "%{+X}o test1=%ms %{-X}o test2=%ms %{+X}o test3=%ms" Prior to `3f2e8d0ed` ("MEDIUM: log: lf_* build helpers now take a ctx argument"), this would output something like this: test1=18B test2=395 test3=18B This is because global options are properly updated as the lf_expr string is parsed. But now due to `507223d5` and `3f2e8d0ed`, only the last %o occurrence is considered. With the above example, this gives: test1=18B test2=18B test3=18B To restore historical behavior, let's partially revert `507223d5`: to compute global node options, we now start with all options enabled and then for each configurable node in lf_expr_postcheck(), we keep options common to the current node and previous nodes using AND masking, this way we really end up with options common to all nodes. No backport needed.	2024-04-29 14:47:37 +02:00
Aurelien DARRAGON	9bdea51d7e	MINOR: log/cbor: _lf_cbor_encode_byte() explicitly requires non-NULL ctx As shown in GH #2550, Coverity is tempted to think that NULL-dereference can occur in _lf_cbor_encode_byte() due to user-ctx being dereferenced from cbor_ctx, while Coverity thinks that cbor_ctx may be NULL. In practice this cannot happen, because _lf_cbor_encode_byte() is only leveraged through a function pointer that is set in conjunction with the function pointer ctx (which ain't NULL). All this logic is done inside lf_buildctx_prepare() when LOG_OPT_ENCODE_CBOR is set. Since Coverity doesn't seem to understand the logic properly, then it might as well confuse humans, so let's make it clear in _lf_cbor_encode_byte() that we expect non-NULL ctx by adding a BUG_ON()	2024-04-29 14:47:37 +02:00
Aurelien DARRAGON	0e2aea8224	CLEANUP: tools/cbor: rename cbor_encode_ctx struct members Rename e_byte_fct to e_fct_byte and e_fct_byte_ctx to e_fct_ctx, and adjust some comments to make it clear that e_fct_ctx is here to provide additional user-ctx to the custom cbor encode function pointers. For now, only e_fct_byte function may be provided, but we could imagine having e_fct_int{16,32,64}() one day to speed up the encoding when we know we can encode multiple bytes at a time, but for now it's not worth the hassle.	2024-04-29 14:47:37 +02:00
Amaury Denoyelle	20bc42e697	BUG/MINOR: stats: replace objt_* by __objt_* macros Update parse_stat_line() used during stats-file parsing. For each line, GUID is extracted first to access the object instance. obj_type() is then invoked to retrieve the correct object type. Replace objt_* by __objt_* macros to mark its result as safe and non NULL. This should fix the Coverity report from github issue #2550. No need to backport.	2024-04-29 14:21:10 +02:00
Remi Tricot-Le Breton	0610f52bcd	BUG/MEDIUM: cache: Vary not working properly on anything other than accept-encoding If a response varies on anything other than accept-encoding (origin or referer) but still contains an 'Encoding' header, the cached responses were never sent back. This is because of the 'set_secondary_key_encoding' call that always filled the accept-encoding part of the secondary signature with the response's actual encoding, regardless of whether the response varies on this or not. This meant that the accept-encoding part of the signature could be non-null in the cached entry which made the 'get_secondary_entry' calls in 'http_action_req_cache_use' always fail because in those cases the request's secondary signature always had a null accept-encoding part. This patch can be backported up to branch 2.4.	2024-04-29 10:41:46 +02:00
Willy Tarreau	b957e741b0	MINOR: cli/wait: rename the condition "srv-unused" to "srv-removable" As previously discussed, "srv-unused" is sufficiently ambiguous to cause some trouble over the long term. Better use "srv-removable" to indicate that the server is removable, and if the conditions to delete a server change over time, the wait condition will be adjusted without renaming it.	2024-04-27 09:36:36 +02:00
Willy Tarreau	bc236ad133	CLEANUP: dynbuf: move the reserve and limit parsers to dynbuf.c I just added a new setting to set the number of reserved buffers, only to discover we already had one... Let's move the parsing of this keyword (tune.buffers.reserve) and tune.buffers.limit to dynbuf.c where they should be.	2024-04-27 09:36:36 +02:00
Aurelien DARRAGON	c33b857df9	MINOR: log: support true cbor binary encoding CBOR in hex format as implemented in previous commit is convenient because the produced output is portable and can easily be embedded in regular syslog payloads. However, one of the goals of the CBOR implementation is to be able to produce "Concise Binary" object representation. Here is an excerpt from cbor.io website: "Some applications also benefit from CBOR itself being encoded in binary. This saves bulk and allows faster processing." Currently we don't offer that with '+cbor', quite the opposite actually since a text string encoded with '+cbor' option will be larger than a text string encoded with '+json' or without encoding at all, because for each CBOR binary byte, 2 characters will be emitted. Fortunately, the sink/log API allows for binary data to be passed as parameter, this is because all relevant functions in the chain don't rely on the terminating NULL byte and take a string pointer + string length as parameter. We can actually rely on this property to support the '+bin' option when combined with '+cbor' to produce RAW binary CBOR output. Be careful though, as this is only intended for use with set-var-fmt or to send binary data to capable UDP/ring endpoints. Example: log-format "%{+cbor,+bin}o %(test)[bin(00AABB)]" Will produce: bf64746573745f4300aabbffff (output was piped to `hexdump -ve '1/1 "%.2x"'` to dump raw bytes as HEX characters) With cbor.me pretty printer, it gives us: BF # map() 64 # text(4) 74657374 # "test" 5F # bytes() 43 # bytes(3) 00AABB # "\u0000\xAA\xBB" FF # primitive() FF # primitive()	2024-04-26 18:39:32 +02:00
Aurelien DARRAGON	c614fd3b9f	MINOR: log: add +cbor encoding option In this patch, we make use of the CBOR (RFC8949) encode helper functions from the previous commit to implement '+cbor' encoding option for log- formats. The logic behind it is pretty similar to '+json' encoding option, except that the produced output is a CBOR payload written in HEX format so that it remains compatible to use this with regular syslog endpoints. Example: log-format "%{+cbor}o %[int(4)] test %(named_field)[str(ok)]" Will produce: BF6B6E616D65645F6669656C64626F6BFF Detailed view (from cbor.me): BF # map() 6B # text(11) 6E616D65645F6669656C64 # "named_field" 62 # text(2) 6F6B # "ok" FF # primitive() If the option isn't set globally, but on a specific node instead, then only the value will be encoded according to CBOR specification. Example: log-format "test cbor bool: %{+cbor}[bool(true)]" Will produce: test cbor bool: F5	2024-04-26 18:39:32 +02:00
Aurelien DARRAGON	810303e3e6	MINOR: tools: add cbor encode helpers Add cbor helpers to encode strings (bytes/text) and integers according to RFC8949, also add cbor_encode_ctx struct to pass encoding options such as how to encode a single byte.	2024-04-26 18:39:32 +02:00
Aurelien DARRAGON	3f7c8387c0	MINOR: log: add +json encoding option In this patch, we add the "+json" log format option that can be set globally or per log format node. It sets the LOG_OPT_ENCODE_JSON flag for the current context, which is provided to all lf_* log building functions. This way, all lf_* are now aware of this option and try to comply with JSON specification when the option is set. If the option is set globally, then sess_build_logline() will produce a map-like object with key=val pairs for named logformat nodes. (logformat nodes that don't have a name are simply ignored). Example: log-format "%{+json}o %[int(4)] test %(named_field)[str(ok)]" Will produce: {"named_field": "ok"} If the option isn't set globally, but on a specific node instead, then only the value will be encoded according to JSON specification. Example: log-format "{ \"manual_key\": %(named_field){+json}[bool(true)] }" Will produce: {"manual_key": true} When the option is set, +E option will be ignored, and partial numerical values (ie: because of logasap) will be encoded as-is.	2024-04-26 18:39:32 +02:00
Aurelien DARRAGON	b7c3d8c87c	MINOR: log: add +bin logformat node option Support '+bin' option argument on logformat nodes to try to preserve binary output type with binary sample expressions. For this, we rely on the log/sink API which is capable of conveying binary data since all related functions don't search for a terminating NULL byte in provided log payload as they take a string pointer and a string length as argument. Example: log-format "%{+bin}o %[bin(00AABB)]" Will produce: 00aabb (output was piped to `hexdump -ve '1/1 "%.2x"'` to dump raw bytes as HEX characters) This should be used carefully, because many syslog endpoints don't expect binary data (especially NULL bytes). This is mainly intended for use with set-var-fmt actions or with ring/udp log endpoints that know how to deal with such binary payloads. Also, this option is only supported globally (for use with '%o'), it will not have any effect when set on an individual node. (it makes no sense to have binary data in the middle of log payload that was started without binary data option)	2024-04-26 18:39:31 +02:00
Aurelien DARRAGON	162e311a0e	MINOR: log: add no_escape_map to bypass escape with _lf_encode_bytes() Providing no_escape_map as <map> argument to _lf_encode_bytes() function will make the function skip escaping since the map is empty. This is for convenience, as it might be useful to call lf_encode_chunk() to encode binary data without escaping it.	2024-04-26 18:39:31 +02:00
Aurelien DARRAGON	fb8b47fed8	MINOR: log: postpone conversion for sample expressions in sess_build_logline() In sess_build_logline(), for sample expression nodes, instead of directly calling sample_fetch_as_type(... SMP_T_STR), let's first process the sample using sample_process(), and then proceed with the conversion to str if required. Doing so will allow us to implement type casting and preserving logic.	2024-04-26 18:39:31 +02:00
Aurelien DARRAGON	84963fb743	MINOR: log: expose node typecast in lf_buildctx struct Store node->typecast setting inside lf_buildctx struct so that encoding functions may benefit from it.	2024-04-26 18:39:31 +02:00
Aurelien DARRAGON	3f2e8d0ed2	MEDIUM: log: lf_* build helpers now take a ctx argument Add internal lf_buildctx struct that is only used inside sess_build_logline() scope and is passed to lf_* log building helpers to expose current building context. For now, node options and the in_text counter are stored in the ctx struct. Thanks to this change, lf_* building functions don't depend on a logformat_node struct pointer, and may be used in a standalone manner as long as a build context is provided. Also, global options are now handled explicitly in sess_build_logline() to make sure that global options are always considered even if they were not duplicated on every node. No functional change should be expected.	2024-04-26 18:39:31 +02:00
Aurelien DARRAGON	f7cb384f1a	MINOR: log: merge lf_encode_string() and lf_encode_chunk() logic lf_encode_string() and lf_encode_chunk() functions are pretty similar. The only difference is the stopping behavior, encode_chunk stops at a given position while encode_string stops when encountering '\0'. Moreover, both functions leverage tools.c encode helpers, but because of the LOG_OPT_ESC option, they reimplement those helpers with added logic. Instead of having to deal with code duplication which makes both functions harder to maintain, let's define a _lf_encode_bytes() helper function which satisfies lf_encode_string() and lf_encode_chunk() needs while keeping the function as simple as possible. _lf_encode_bytes() itself is made of multiple static inline helper functions, in an attempt to keep checks outside of the core loop for better performance.	2024-04-26 18:39:31 +02:00
Aurelien DARRAGON	a1583ec7c7	MINOR: log: make all lf_* sess build helper static There is no need to expose such functions since they are only involved in the log building process that occurs inside sess_build_logline(). Making functions static and removing their public prototype to ease code maintenance.	2024-04-26 18:39:31 +02:00
Aurelien DARRAGON	3b9096bd36	MINOR: log: use LOG_VARTEXT_{START,END} to enclose text strings Rename LOGQUOTE_{START,END} macros to more generic LOG_VARTEXT_{START,END} in order to prepare for new encoding types that rely on specific treatment for variable-length texts. No functional change should be expected.	2024-04-26 18:39:31 +02:00
Aurelien DARRAGON	278d6c3379	MINOR: log: explicitly handle %ts and %tsc as text strings Build fixed-length strings for %ts and %tsc to be able to print them using lf_rawtext_len(), this way it will be easier to encode them when new encoding options will be added. No functional change should be expected.	2024-04-26 18:39:31 +02:00
Aurelien DARRAGON	2e4cc517bf	MEDIUM: log: use lf_rawtext for lf_ip() and lf_port() hex strings Same as the previous commit, but for ip and port oriented values when +X option is provided. No functional change should be expected. Because of this patch, we add a little overhead because we first generate the text into a temporary variable and then use lf_rawtext() to print it. Thus we have a double-copy, and this could have some performance implications that were not yet evaluated. Due to the small number of bytes that can end up being copied twice, we could be lucky and have no visible performance impact, but if we happen to see a significant impact, it could be useful to add a passthrough mechanism (to keep historical behavior) when no encoding is involved.	2024-04-26 18:39:31 +02:00
Aurelien DARRAGON	3a3bdf1c76	MEDIUM: log: write raw strings using lf_rawtext() Make use of the previous commit to print strings that should not be modified. For instance, when +X option is provided, we have to print numerical values in ASCII HEX form. For that, we used snprintf() to output the result to the log output buffer directly, but now we build the string in a temporary buffer of fixed-size and then print it using lf_rawtext() which will take care of encoding options. Because of this patch, we add a little overhead because we first generate the text into a temporary variable and then use lf_rawtext() to print it. Thus we have a double-copy, and this could have some performance implications that were not yet evaluated. Due to the small number of bytes that can end up being copied twice, we could be lucky and have no visible performance impact, but if we happen to see a significant impact, it could be useful to add a passthrough mechanism (to keep historical behavior) when no encoding is involved.	2024-04-26 18:39:31 +02:00
Aurelien DARRAGON	0d1e99c086	MEDIUM: log: pass date strings to lf_rawtext() Don't directly call functions that take date as argument and output the string representation to the log output buffer under sess_build_logline(), and instead build the strings in temporary buffers of fixed size (fortunately, functions such as date2str_log() and gmt2str_log() produce strings of known size), and then print the result using lf_rawtext() helper function. This way, we will be able to encode them automatically as regular string/text when new encoding methods are added. Because of this patch, we add a little overhead because we first generate the text into a temporary variable and then use lf_rawtext() to print it. Thus we have a double-copy, and this could have some performance implications that were not yet evaluated. Due to the small number of bytes that can end up being copied twice (< 30), we could be lucky and have no visible performance impact, but if we happen to see a significant impact, it could be useful to add a passthrough mechanism (to keep historical behavior) when no encoding is involved.	2024-04-26 18:39:31 +02:00
Aurelien DARRAGON	fcb7e4beaa	MINOR: log: add lf_rawtext{_len}() functions similar to lf_text_{len}, except that quoting and mandatory options are ignored. Use this to print the input string without any modification ( except for encoding logic).	2024-04-26 18:39:31 +02:00
Aurelien DARRAGON	1fa2da18cd	MINOR: log: add lf_int() wrapper to print integers Wrap ltoa(), lltoa(), ultoa() and utoa_pad() functions that are used by sess_build_logline() to print numerical values by implementing a dedicated helper named lf_int() that takes <dft_hld> as argument to know how to write the integer by default (when no encoding is specified). LF_INT_UTOA_PAD_4 is used to emulate utoa_pad(x, 4) since it's found only once under sess_build_logline(), thus there is no need to pass an extra parameter to lf_int() function.	2024-04-26 18:39:31 +02:00
Aurelien DARRAGON	d3c92a3a83	MINOR: log: skip custom logformat_node name if empty Reminder: Since 3.0-dev4, we can optionally give a name to logformat nodes: log-format "%(custom_name1)B %(custom_name2)[str(value)]" But we may also optionally set the expected node type by appending ':type' after the name, type being either sint, str or bool, like this: log-format "%(string_as_int:sint)[str(14)]" However, it is currently not possible to provide a type without providing a name that is at least 1 char long. But it could be useful to provide a type without setting a name, like this, for typecasting purposes only: log-format "%(:sint)[bool(true)]" Thus in order to allow this usage, don't set node->name if node name is not at least 1 character long. By doing so, node->name will remain NULL and will not be considered, but the typecast setting will.	2024-04-26 18:39:31 +02:00
Aurelien DARRAGON	c584600083	CLEANUP: log: simplify complex values usages in sess_build_logline() make sess_build_logline() switch case more readable by performing some simplifications: complex values are first extracted in a temporary variable so that it's easier to refer to them and at a single place.	2024-04-26 18:39:31 +02:00
Aurelien DARRAGON	507223d527	MINOR: log: global lf_expr node options Add options to lf_expr->nodes to store global options (those that are common to all node) for easier access. No functional change should be expected.	2024-04-26 18:39:31 +02:00
Aurelien DARRAGON	7ff4f09e23	MINOR: log: store lf_expr nodes inside substruct Add another struct level inside lf_expr struct to allow new information to be stored alongside lf_expr nodes.	2024-04-26 18:39:31 +02:00
Aurelien DARRAGON	f8e1357a05	CLEANUP: log: remove unused checks for encode_{chunk,string} Thanks to `8226e92eb` ("BUG/MINOR: tools/log: invalid encode_{chunk,string} usage"), we only need to check for NULL return value from encode_{chunk,string}() and escape_string() to know if the call failed.	2024-04-26 18:39:31 +02:00
William Lallemand	2ab42dddc4	BUG/MINOR: mworker: reintroduce way to disable seamless reload with -x /dev/null Since the introduction of the automatic seamless reload using the internal socketpair, there is no way of disabling the seamless reload. Previously we just needed to remove -x from the startup command line, and remove any "expose-fd" keyword on stats socket lines. This was introduced in `2be557f7c` ("MEDIUM: mworker: seamless reload use the internal sockpairs"). The patch copies /dev/null again and passes it to the next exec so we never try to get socket from the -x. Must be backported as far as 2.6.	2024-04-26 15:25:49 +02:00
Amaury Denoyelle	e4a29447ce	MEDIUM: stats: define stats-file keyword This commit is the final one to implement preloading of haproxy internal counters via stats-file parsing. Define a global keyword "stats-file". It allows to specify the path to the stats-file which will be parsed on process startup.	2024-04-26 14:18:15 +02:00
Amaury Denoyelle	782be288ca	MINOR: stats: parse values from stats-file This patch implements parsing of counter value lines from stats-file. It reuses the domain context previously set by the last header line. Each value is separated by a ',' character, relative to the list of column names described by the header line. This is implemented via static function parse_stat_line(). It first extracts a GUID and retrieves the object instance. Then each numerical value is parsed and object counters updated. For the moment, only U64 counter metrics are supported. parse_stat_line() is called on each line until a new header line is found.	2024-04-26 11:34:02 +02:00
Amaury Denoyelle	374dc08611	MINOR: stats: parse header lines from stats-file This patch implements parsing of header lines from stats-file. A header line is defined as starting with a '#' character. It is directly followed by a domain name. For the moment, either 'fe' or 'be' is allowed. The following lines will contain counter values relative to the domain context until the next header line. This is implemented via static function parse_header_line(). It first sets the domain context used during apply_stats_file(). A stats column array is generated to contain the order in which columns are stored. This will be reused to parse the values of the following lines. If an invalid line is found and no header was parsed, consider the stats-file as ill-formatted and stop parsing. This allows to immediately interrupt parsing if a garbage file was used without emitting a ton of warnings to the user.	2024-04-26 11:34:02 +02:00
Amaury Denoyelle	34ae7755b3	MINOR: stats: apply stats-file on process startup This commit is the first one of a series to implement preloading of haproxy counters via stats-file parsing. This patch defines a basic apply_stats_file() function. It implements reading line by line of a stats-file without any parsing for the moment. It is called automatically on process startup via init().	2024-04-26 11:29:25 +02:00
Amaury Denoyelle	83731c8048	MINOR: guid: define guid_is_valid_fmt() Extract GUID format validation in a dedicated function named guid_is_valid_fmt(). For the moment, it is only used on guid_insert(). This will be reused when parsing stats-file, to ensure GUID has a valid format before tree lookup.	2024-04-26 11:29:25 +02:00
Amaury Denoyelle	e74148fb7c	MEDIUM: stats: implement dump stats-file CLI Define a new CLI command "dump stats-file" with its handler cli_parse_dump_stat_file(). It will loop twice on proxies_list to dump first frontend and then backend side. It reuses the common function stats_dump_stat_to_buffer(), using STAT_F_BOUND to restrict on the correct side. A new module stats-file.c is added to regroup function specifics to stats-file. It defines two main functions : * stats_dump_file_header() to generate the list of column list prefixed by the line context, either "#fe" or "#be" * stats_dump_fields_file() to generate each stat lines. Object without GUID are skipped. Each stat entry is separated by a comma. For the moment, stats-file does not support statistics modules. As such, stats_dump_*_line() functions are updated to prevent looping over stats module on stats-file output.	2024-04-26 10:20:57 +02:00
Amaury Denoyelle	83281303f6	MINOR: stats: define stats-file output format support Prepare stats function to handle a new format labelled "stats-file". Its purpose is to generate a statistics dump with a format closed from the CSV output. Such output will be then used to preload haproxy internal counters on process startup. stats-file output differs from a standard CSV on several points. First, only an excerpt of all statistics is outputted. All values that does not make sense to preload are excluded. For the moment, stats-file only list stats fully defined via "struct stat_col" method. Contrary to a CSV, sll columns of a stats-file will be filled. As such, empty field value is used to mark stats which should not be outputted. Some adaptation specifics to stats-file are necessary into me_generate_field(). First, stats-file will output separatedly values from frontend and backend sides with their own respective set of columns. As such, an empty field value is returned if stat is not defined for either frontend/listener, or backend/server when outputting the other side. Also, as stats-file does not support empty column, stcol_hide() is not used for it. A minor adjustement was necessary for stats_fill_fe_line() to pass context flags. This is necessary to detect stat output format. All other listener/server/backend corresponding functions already have it.	2024-04-26 10:20:57 +02:00
Amaury Denoyelle	6615252656	MEDIUM: stats: convert counters to new column definition Convert most of proxy counters statistics to new "struct stat_col" definition. Remove their corresponding switch..case entries in stats_fill_*_line() functions. Their value are automatically calculate via me_generate_field() invocation. Along with this, also complete stcol_hide() when some stats should be hidden. Only a few counters where not converted. This is because they rely on values stored outside of fe/be_counters structure, which me_generate_field() cannot use for now.	2024-04-26 10:20:57 +02:00
Amaury Denoyelle	168301411d	MINOR: stats: hide some columns in output Metric style stats can be automatically calculate since the introduction of metric_generate() when using "struct stat_col" as input. This would allow to centralize statistics generation. However, some stats are not outputted under specific condition. For example, health check failures on a server are only reported if checks are active. To support this, define a new function metric_hide(). It is called by metric_generate(). If true, it will skip metric calcuation and return an empty field value instead. This allows to define "stat_col" metrics and calculate them with metric_generate() but hiding them under certain circumstances.	2024-04-26 10:20:57 +02:00
Amaury Denoyelle	a7810b7be6	MINOR: stats: implement automatic metric generation from stat_col This commit is a direct follow-up of the previous one which define a new type "struct stat_col" to fully define a statistic entry. Define a new function metric_generate(). For metrics statistics, it is able to automatically calculate a stat value field for "offsets" from "struct stat_col". Use it in stats_fill_*_stats() functions. Maintain a fallback to previously used switch-case for old-style statistics. This commit does not introduce functional change as currently no statistic is defined as "struct stat_col". This will be the subject of a future commit.	2024-04-26 10:20:57 +02:00
Amaury Denoyelle	65624876f2	MINOR: stats: introduce a more expressive stat definition method Previously, statistics were simply defined as a list of name_desc, as for example "stat_cols_px" for proxy stats. No notion of type was fixed for each stat definition. This correspondance was done individually inside stats_fill_*_line() functions. This renders the process to define new statistics tedious. Implement a more expressive stat definition method via a new API. A new type "struct stat_col" for stat column to replace name_desc usage is defined. It contains a field to store the stat nature and format. A <cap> field is also defined to be able to define a proxy stat only for certain type of objects. This new type is also further extended to include counter offsets. This allows to define a method to automatically generate a stat value field from a "struct stat_col". This will be the subject of a future commit. New type "struct stat_col" is fully compatible full name_desc. This allows to gradually convert stats definition. The focus will be first for proxies counters to implement statistics preservation on reload.	2024-04-26 10:20:57 +02:00
Amaury Denoyelle	861370a6d4	MINOR: stats: update ambiguous "metrics" naming to "stat_cols" The name "metrics" was chosen to represent the various list of haproxy exposed statistics. However, it is deemed as ambiguous as some stats are indeed metric in the true sense, but some are not, as highlighted by various "enum field_origin" values. Replace it by the new name "stat_cols" for statistic columns. Along with the already existing notion of stat lines it should better reflect its purpose.	2024-04-26 10:20:57 +02:00
Christopher Faulet	4b1a7ea66c	BUG/MINOR: peers: Don't wait for a remote resync if there no remote peer When a resync is needed, a local resync is first tried and if it does not work, a remote resync is tried. It happens when the worker is started for instance. There is a timeout to wait for the local resync, except for the first start. And if the local resync fails or times out, the same timeout is applied to the remote resync. This one is always applied, even if there is no remote peer. On the other hand, on reload, if the old worker has never performed its resync, it does not try to resync the new worker. And here there is an issue. On the first reload, when there is no remote peer, we must wait for the resync timeout expiration to have a chance to resync the new worker. If the reload happens too early, there is no resync at all. Concretely, after a fresh start, if a reload happens in the first 5 seconds, there is no resync with the new worker. The issue only concerns the first reload and affects the second worker. To fix the issue, we must only skip the remote resync if there is no remote peer. This way, on a fresh start, the worker is immediately considered as resynced. The local resync is skipped because it is the first worker and the remote resync is skipped because there is no remote peer. This patch must be backported to all stable versions.	2024-04-25 21:47:02 +02:00
Christopher Faulet	0243691de1	REORG: peers: Rename all occurrences to 'ps' variable In loops on the peer list in the code, the 'ps' variable was used as a shortcut for the peer session. However, it may be confused with the peers section too. So, all occurrences of the 'ps' variable were renamed to 'peer'.	2024-04-25 18:29:58 +02:00
Christopher Faulet	fff5f63e10	BUG/MEDIUM: peers: Use atomic operations on peers flags when necessary Peers flags are mainly used from the sync task. At least, they are only updated by the sync task. However, there is one place where a peer may read these flags, when the message marking the end of a synchro is sent. So to be sure the value retrieved at this place is consistent, we must use an atomic operation to read it. And of course, from the sync task, atomic operations must be used to update peers flags. However, from the sync task, there is no reason to use atomic operations to read flags because they cannot be updated from anywhere else.	2024-04-25 18:29:58 +02:00
Christopher Faulet	608e23c495	MINOR: peers: Use a static variable to wait a resync on reload When a process is reloaded, the old process must perform a synchronisation with the new process. To do so, the sync task notifies the local peer to proceed and waits. Internally, the sync task used PEERS_F_DONOTSTOP flag to know it should wait. However, this flag was only set/unset in a single function. There is no real reason to set a flag to do so. A static variable set to 1 when the resync starts and to 0 when it is finished is enough.	2024-04-25 18:29:58 +02:00
Christopher Faulet	bdcfacdb78	MINOR: peers: Add comment on processing functions of the sync task Just add a comment on __process_running_peer_sync() and __process_stopping_peer_sync() functions.	2024-04-25 18:29:58 +02:00
Christopher Faulet	697bd69efc	REORG: peers: Move peer and peers flags in the corresponding header file PEER_F_* and PEERS_F_ * flags were moved to <peer-t.h> header file. It is mandatory to decode them from "flags" dev tool.	2024-04-25 18:29:58 +02:00
Christopher Faulet	31f544209d	MINOR: peers: Reorder and rename PEERS flags Peers flags were renamed and reordered, mainly to move flags used for debugging purpose at the end. PEERS_F_RESYNC_LOCAL and PEERS_F_RESYNC_REMOTE were also renamed to PEERS_F_RESYNC_LOCAL_FINISHED and PEERS_F_RESYNC_REMOTE_FINISHED to be clear on the fact the operation is finished when the flag is set.	2024-04-25 18:29:58 +02:00
Christopher Faulet	17c4030aaa	MINOR: peers: Reorder and slightly rename PEER flags There are too many holes in peer flags. So let's reorder them. In addition, PEER_F_RESYNC_REQUESTED flag was renamed to PEER_F_DBG_RESYNC_REQUESTED to clearly state it is a flag set for debugging purpose. Finally, PEER_TEACH_RESET was replaced by PEER_TEACH_FLAGS and the bitwise complement operator is now used on lines updating the peer flags. It is a far more common way to do (in HAProxy code at least) and less surprising.	2024-04-25 18:29:58 +02:00
Christopher Faulet	9934eebc19	MINOR: peers: Rename PEERS_F_TEACH_COMPLETE to PEERS_F_LOCAL_TEACH_COMPLETE PEERS_F_TEACH_COMPLETE flag is only used for the old local peer to let the sync task know it can stop waiting during a soft-stop. So it is less confusing to rename this flag to clearly state it concerns local peer only.	2024-04-25 18:29:57 +02:00
Christopher Faulet	45f4698725	MINOR: peers: Start learning for local peer before receiving messages A local peer assigned for learning can immediately start to learn, without sending any request. So we can do that first, before receiving messages. This way, only PEER_LR_ST_PROCESSING state is evaluated when received messages are processed. In addition, when the resync request is sent, we are sure it is for a remote peer.	2024-04-25 18:29:57 +02:00
Christopher Faulet	c904f7b440	MEDIUM: peers: Use true states for the learn state of a peer Some flags were used to define the learn state of a peer. It was a bit confusing, especially because the learn state of a peer is manipulated from the peer applet but also from the sync task. It is harder to understand the transitions if it is based on flags than if it is based a dedicated state based on an enum. It is the purpose of this patch. Now, we can define the following rules regarding this learn state: * A peer is assigned to learn by the sync task * The learn state is then changed by the peer itself to notify the learning is in progress and when it is finished. * Finally, when the peer finished to learn, the sync task must acknowledge it by unassigning the peer.	2024-04-25 18:29:57 +02:00
Christopher Faulet	ea9bd6d075	MEDIUM: peers: Use true states for the peer applets as seen from outside This patch is a cleanup of the recent change about the relation between a peer and the applet used to deal with I/O. Three flags were introduced to reflect the peer applet state as seen from outside (from the sync task in fact). Using flags instead of true states was in fact a bad idea. This works but it is confusing, especially because it was mixed with LEARN and TEACH peer flags. So, to make it clearer, we are now using a dedicated state for this purpose. From the outside, the peer may be in one of the following states with respect to its applet: * the peer has no applet, it is stopped (PEER_APP_ST_STOPPED). * the peer applet was created with a validated connection from the protocol perspective. But the sync task must synchronize it with the peers section. It is in starting state (PEER_APP_ST_STARTING). * The starting state was acknowledged by the sync task, the peer applet can start to process messages. It is in running state (PEER_APP_ST_RUNNING). * The last peer applet was released and the associated connection closed. But the sync task must synchronize it with the peers section. It is in stopping state (PEER_APP_ST_STOPPING). Functionally speaking, there is no true change here. But it should be easier to understand now. In addition to these changes, __process_peer_state() function was renamed sync_peer_app_state().	2024-04-25 18:29:57 +02:00
Christopher Faulet	229755d8f5	MEDIUM: peers: Simplify the peer flags dealing with the connection state Recently, some peer flags were added to deal with the connection state (PEER_F_ST_). 3 states were added: * RELEASED: Set when we forced to shutdown the peer session and no new session was created yet. * CONNECTED: Set when the peer has established connection and validated it from the peer protocol point of view * ACCEPTED: Set when the peer has accepted a connection and validated it from the peer protocol point of view However, management of these pseudo states is a bit confusing. And it appears there is no reason to have 2 flags to express there is a validated peer session. CONNECTED state was used for a peer session on the frontend side while ACCEPTED state was used for a peer session on the backend side. So, there is now only one "connected" state and we test if the applet was created on the frontend or the backend side to decide what to do, in addition to the fact the peer is local or remote. It is a transitional patch. True states will be created to deal with all this stuff and corresponding flags will be removed. This patch depends on the commit "MINOR: applet: Add a function to know the sidde where an applet was created".	2024-04-25 18:29:57 +02:00
Christopher Faulet	0c1ea46fe0	MINOR: peers: Remove unused PEERS_F_RESYNC_PROCESS flag This flag is now set or unset but never tested. So we can safely remove it.	2024-04-25 18:29:57 +02:00
Christopher Faulet	e35293b2d3	BUG/MEDIUM: peers: Wait for sync task ack when a resynchro is finished When a learning process is finished, partially or not, the event must be processed by the sync task. It is important for the peer applet to wait in this case, especially if the same peer is teaching to another peer, to be sure to send the right resync finished message (full or partial). Thanks to the previous patch, we can set PEER_F_WAIT_SYNCTASK_ACK flag on the peer when a PEER_MSG_CTRL_RESYNCPARTIAL or PEER_MSG_CTRL_RESYNCFINISHED message is received to be sure to stop the processing. Of course, we must also take care to wake the peer up after having acknowledged the learn status from the sync task. This patch depends on the commit "BUG/MEDIUM: peers: Wait for sync task ack when a resynchro is finished". Both must be backported if commit `9425aeaffb` ("BUG/MAJOR: peers: Update peers section state from a thread-safe manner") is backported.	2024-04-25 18:29:57 +02:00
Christopher Faulet	12014587fa	MINOR: peers: Use a peer flag to block the applet waiting ack of the sync task Since recent fixes on peers, some changes on a peer must be acknowledged by the sync task before letting the peer applet process messages. Blocking conditions were based on a combination of flags. It was error-prone. So, this patch introduces PEER_F_WAIT_SYNCTASK_ACK peer flag for this purpose. This flag is set by the peer when it must wait for an ack from the sync task. This sync task, on its side, must remove it and wake the peer up.	2024-04-25 18:29:57 +02:00
Christopher Faulet	f80f1635ec	MINOR: peers: Don't set TEACH flags on a peer from the sync task The TEACH flags only concern the peer applet. There is no reason to set them from the sync task. It is confusing. And in the end, after some refactoring/fixes, setting these flags directly from the peer applet will allow us to immediately perform the corresponding teach processing, while for now we must wait for the sync task to acknowledge the changes.	2024-04-25 18:29:57 +02:00
Christopher Faulet	6380fd5eb9	MINOR: peers: Remove unused PEERS_F_RESYNC_REQUESTED flag This flag was used for debugging purpose to know a resync was requested at least once in the process life. Since the last bunch of fixes about the peers locking mechanism, this info is now set per-peer. There is no reason to still have it on peers too. So, just remove it.	2024-04-25 18:29:57 +02:00
Christopher Faulet	2a902e3188	BUG/MEDIUM: peers: Reprocess peer state after all session shutdowns When a session is shut down, the peer is switched in released state (PEER_F_ST_RELEASED) and the sync task must process it to eventually perform some clean up, in case the peer was assigned to learn. However, this was only true when the session was shut down from the peer applet itself. This was not performed when it was shut down from the sync task. It is now fixed.	2024-04-25 18:29:57 +02:00
Christopher Faulet	3541c54481	BUG/MEDIUM: peers: Automatically start to learn on local peer The previous fix (`c0b2015aae` "BUG/MEDIUM: peers: Don't set PEERS_F_RESYNC_PROCESS flag on a peer") was made due to lack of knowledge on the peers. A local peer, when assigned to learn, must start to learn immediately without sending any request. This happens on reload. Thus, in this case, the PEER_F_LEARN_PROCESS flag must be set with PEER_F_LEARN_ASSIGN flag from the sync task. This patch must only be backported if the above commit is backported.	2024-04-25 18:29:57 +02:00
Willy Tarreau	e158b7efb7	CLEANUP: h1: make use of the multi-byte matching functions Instead of leaving the hard-coded non-trivial operations in the H1 parsing code, let's just rely on the new intops functions that do the same and that are less prone to being accidentally touched. It was verified that the resulting code is exactly the same.	2024-04-24 16:05:38 +02:00
Willy Tarreau	b9bf16b382	BUG/MINOR: h1: fix detection of upper bytes in the URI In 1.7 with commit `5f10ea30f4` ("OPTIM: http: improve parsing performance of long URIs") we improved the URI parser's performance on platforms supporting unaligned accesses by reading 4 chars at a time in a 32-bit word. However, as reported in GH issue #2545, there's a bug in the way the top bytes are checked, as the parser will stop when all 4 of them are above 7e instead of when one of them is, so certain patterns can be accepted through if the last ones are all valid. The fix requires to negate the value but on the other hand it allows to parallelize some of the tests and fuse the masks, which could even end up slightly faster. This needs to be backported to all stable versions, but be careful, this code moved a lot over time, from proto_http.c to h1.c, to http_msg.c, to h1.c again. Better just grep for "24242424" or "21212121" in each version to find it. Big kudos to Martijn van Oosterhout (@kleptog) for spotting this problem while analyzing that piece of code, and reporting it.	2024-04-24 11:50:36 +02:00
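The "any byte" versus "all bytes" distinction described in the commit above can be illustrated with a minimal C sketch. This is not the code from h1.c nor the intops helpers: the names `any_byte_above_7e` and `any_byte_above_7e_ref` are hypothetical, and the word-wide test is a generic bit trick used here for illustration only. Adding 1 to each byte sets its top bit when the byte is 0x7f, and OR-ing with the original word covers bytes that are already 0x80 or higher, so a single mask reports whether at least one byte is above 0x7e.

```c
#include <assert.h>
#include <stdint.h>

/* Reference version (hypothetical name): inspect the four bytes one at a time. */
static int any_byte_above_7e_ref(uint32_t w)
{
	int i;

	for (i = 0; i < 4; i++) {
		if (((w >> (8 * i)) & 0xff) > 0x7e)
			return 1;
	}
	return 0;
}

/* Word-wide version (hypothetical name): returns 1 as soon as ONE byte of
 * <w> is above 0x7e, not only when all four are.
 */
static int any_byte_above_7e(uint32_t w)
{
	int found = (((w + 0x01010101u) | w) & 0x80808080u) != 0;

	/* cross-check the bit trick against the byte-wise reference */
	assert(found == any_byte_above_7e_ref(w));
	return found;
}
```

A carry out of a 0xff byte can pollute the next byte of the sum, but such a word already contains a byte above 0x7e, so the boolean result stays correct.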
David Carlier	98d22f212a	MEDIUM: shctx: Naming shared memory context From Linux 5.17, anonymous regions can be named via prctl/PR_SET_VMA so caches can be identified when looking at HAProxy process memory mapping. The most likely error is a lack of kernel support, so it is ignored: if the naming fails, the mapping of the memory context ought to still occur.	2024-04-24 10:25:38 +02:00
Tim Duesterhus	3ef60012ae	MINOR: Add support for UUIDv7 to the `uuid` sample fetch This adds support for UUIDv7 to the existing `uuid` sample fetch that was added in `8a694b859c`.	2024-04-24 08:23:56 +02:00
Tim Duesterhus	aab6477b67	MINOR: Add `ha_generate_uuid_v7` This function generates a version 7 UUID as per draft-ietf-uuidrev-rfc4122bis-14.	2024-04-24 08:23:56 +02:00
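As a rough illustration of what a version 7 UUID contains, here is a minimal C sketch of the bit layout as described in the UUID revision draft: a 48-bit Unix timestamp in milliseconds, the 4-bit version (7), 12 random bits, the 2-bit variant (binary 10), then 62 random bits. It is not the `ha_generate_uuid_v7` implementation; `struct uuid128` and `uuid_v7_pack` are hypothetical names, and the caller is assumed to supply the timestamp and the random bits.

```c
#include <stdint.h>

/* Hypothetical container: the 128-bit UUID as two big-endian halves. */
struct uuid128 {
	uint64_t hi; /* unix_ts_ms (48) | version (4) | rand_a (12) */
	uint64_t lo; /* variant (2) | rand_b (62) */
};

/* Hypothetical helper: pack the UUIDv7 fields. Extra high bits of
 * <unix_ms>, <rand_a> and <rand_b> are masked off.
 */
static struct uuid128 uuid_v7_pack(uint64_t unix_ms, uint16_t rand_a, uint64_t rand_b)
{
	struct uuid128 u;

	u.hi = ((unix_ms & 0xffffffffffffULL) << 16) | 0x7000ULL | (rand_a & 0x0fffu);
	u.lo = 0x8000000000000000ULL | (rand_b & 0x3fffffffffffffffULL);
	return u;
}
```

Because the timestamp occupies the most significant bits, UUIDs packed this way sort by creation time at millisecond granularity.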
Tim Duesterhus	c6cea750a9	MINOR: tools: Rename `ha_generate_uuid` to `ha_generate_uuid_v4` This is in preparation of adding support for other UUID versions.	2024-04-24 08:23:56 +02:00
Willy Tarreau	19f8762a98	BUILD: stick-tables: silence build warnings when threads are disabled Since 3.0-dev7 with commit `1a088da7c2` ("MAJOR: stktable: split the keys across multiple shards to reduce contention"), building without threads yields a warning about the shard not being used. This is because the locks API does nothing with its arguments, which is the only place where the shard is being used. We cannot modify the lock API to pretend to consume its argument because quite often it's not even instantiated. Let's just pretend we consume shard using an explicit ALREADY_CHECKED() statement instead. While we're at it, let's make sure that XXH32() is not called when there is a single bucket! No backport is needed.	2024-04-24 08:23:56 +02:00
Christopher Faulet	589fb12904	BUG/MEDIUM: applet: Let's applets decide if they have more data to deliver Unlike the muxes, the applets have the responsibility to notify the SC if they have more data to deliver to the stream. The same is done to notify the SC that applets must be woken up ASAP to continue some processing. When an applet is woken up, we pretend it has no more data to deliver by setting SE_FL_HAVE_NO_DATA flag. If the applet removes this flag, we must take care to not set it again just after. Otherwise, the applet may remain blocked if there is no other condition to wake it up. It is an issue for the applets using their own buffers because SE_FL_HAVE_NO_DATA is erroneously set in sc_applet_recv() function, after the applet execution. For instance, it happens for the cli applet when a huge map is cleared. No data are delivered to the stream but we pretend it is the case to clear the map in batches. This patch should fix the issue #2543. No backport needed.	2024-04-23 07:33:10 +02:00
Amaury Denoyelle	341bf913d4	MINOR: stats: use STAT_F_* prefix for flags Some flags are defined during statistics generation and output. They use the prefix STAT_* which is also used for other purposes. Rename them with the new prefix STAT_F_* to differentiate them from the other usages.	2024-04-22 16:25:18 +02:00
Amaury Denoyelle	e97375dcab	MINOR: stats: use stricter naming stats/field/line Several unique names were used for different purposes under statistics implementation. This caused the code to be difficult to understand. * stat/stats name is removed when a more specific name could be used * restrict field usage to purely refer to <struct field> which represents a raw stat value. * use "line" naming to represent an array of <struct field>	2024-04-22 16:25:18 +02:00
Amaury Denoyelle	8dbb74542f	MINOR: stats: rename info stats Info are used to expose haproxy global metrics. It is similar to proxy statistics and any other module. As such, rename info indexes using SI_I_INF_* prefix. Also info variable is renamed stat_line_info. Thanks to this, naming is now consistent between info and other statistics. It will help to integrate it as a "global" statistics module.	2024-04-22 16:25:18 +02:00
Amaury Denoyelle	02e0dd6d30	MINOR: stats: rename ambiguous stat_l and stat_count Statistics were extended with the introduction of stats module. This mechanism allows to expose various metrics for several haproxy components. As a consequence of this, some static variables were transformed to dynamic ones to be able to regroup all statistics definition. Rename these variables with more explicit naming : * stat_lines can be used to generate one line of statistics for any module using struct field as value * metrics and metrics_len are used to stored description of metrics indexed by module Note that info is not integrated in the statistics module mechanism. However, it could be done in the future to better reflect its purpose.	2024-04-22 16:25:18 +02:00
Amaury Denoyelle	8fc0b18087	MINOR: stats: rename proxy stats This commit is the first one of a series which adjusts the naming convention for the stats module. The objective is to remove ambiguity and better reflect how stats are implemented, especially since the introduction of the stats module. This patch renames elements related to proxies statistics. One of the main changes is to rename ST_F_* statistics indexes prefix with the new name ST_I_PX_*. This removes the reference to field which represents another concept in the stats module. In the same vein, global stat_fields variable is renamed metrics_px.	2024-04-22 16:25:18 +02:00
Amaury Denoyelle	282a8e9f52	BUG/MINOR: stats: fix stot metric for listeners This commit is part of a series to align counters usage between frontends/listeners on one side and backends/servers on the other. On frontend side, "stot" is the total count of sessions for both proxies and listeners. For proxies, fe_counters <cum_sess> is correctly used. The bug is on listeners where <cum_conn> value is returned, which instead indicates a number of connections. This commit fixes this by returning <cum_sess> counter value for "stot" metric. Along with this fix, use the opportunity to report "conn_tot" for listeners using <cum_conn> value, as for frontend proxies. This commit fixes a bug but must not be backported as stats output is changed.	2024-04-22 10:35:18 +02:00
Amaury Denoyelle	c02ec9a9db	BUG/MINOR: backend: use cum_sess counters instead of cum_conn This commit is part of a series to align counters usage between frontends/listeners on one side and backends/servers on the other. "stot" metric refers to the total number of sessions. On backend side, it is interpreted as a number of streams. Previously, this was accounted using <cum_sess> be_counters field for servers, but <cum_conn> instead for backend proxies. Adjust this by using <cum_sess> for both proxies and servers. As such, <cum_conn> field can be removed from be_counters. Note that several diagnostic messages which report total frontend and backend connections were adjusted to use <cum_sess>. However, this is outdated and misleading information as it does report streams count on backend side. These messages should be fixed in a separate commit. This should be backported to all stable releases.	2024-04-22 10:35:18 +02:00
Amaury Denoyelle	93066be32d	MINOR: backend: use be_counters for health down accounting This commit is the first one of a series which aims to align counters usage between frontends/listeners on one side and backends/servers on the other. Remove <down_trans> field from proxy structure. Use instead the same name field from be_counters structure, which is already used for servers.	2024-04-22 10:35:18 +02:00
William Lallemand	7556e5b3a4	BUILD: ssl: use %zd for sizeof() in ssl_ckch.c 32bits build was broken because of wrong printf length modifier. src/ssl_ckch.c:4144:66: error: format specifies type 'long' but the argument has type 'unsigned int' [-Werror,-Wformat] 4143 \| memprintf(err, "parsing [%s:%d] : cannot parse '%s' value '%s', too long, max len is %ld.\n", \| ~~~ \| %u 4144 \| file, linenum, args[cur_arg], args[cur_arg + 1], sizeof(alias_name)); \| ^~~~~~~~~~~~~~~~~~ src/ssl_ckch.c:4217:64: error: format specifies type 'long' but the argument has type 'unsigned int' [-Werror,-Wformat] 4216 \| memprintf(err, "parsing [%s:%d] : cannot parse '%s' value '%s', too long, max len is %ld.\n", \| ~~~ \| %u 4217 \| file, linenum, args[cur_arg], args[cur_arg + 1], sizeof(alias_name)); \| ^~~~~~~~~~~~~~~~~~ 2 errors generated. make: * [Makefile:1034: src/ssl_ckch.o] Error 1 make: * Waiting for unfinished jobs.... Replace %ld by %zd. Should fix issue #2542.	2024-04-20 14:25:42 +02:00
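The build failure quoted above comes from printing a `sizeof()` result, which has type `size_t`, with a `long` conversion. A minimal C sketch of a portable way to print such a value follows; `format_max_len` is a hypothetical helper, not a function from ssl_ckch.c. Note that the commit uses `%zd`, whereas this sketch uses `%zu`: in standard C the `z` length modifier combined with `u` matches `size_t` itself, while `z` with `d` matches the corresponding signed type.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format a size_t limit without depending on whether
 * size_t is 'unsigned int' (typical 32-bit builds) or 'unsigned long'.
 * Returns the number of characters the full message needs, as snprintf() does.
 */
static int format_max_len(char *out, size_t outsz, size_t max_len)
{
	return snprintf(out, outsz, "too long, max len is %zu.", max_len);
}
```

Calling it with a NULL buffer and a zero size only computes the required length, which is a standard way to size a buffer before formatting.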
Valentine Krasnobaeva	865db6307f	MINOR: init: use RLIMIT_DATA instead of RLIMIT_AS Limiting the total allocatable process memory (VSZ) by setting the RLIMIT_AS limit is no longer effective for restricting memory consumption at run time. We can see from the process memory map below that there are many holes within the process VA space, which bump its VSZ to 1.5G. These holes are there for many reasons and can be explained first by the full randomization of the system VA space, which is now usually enabled in Linux kernels by default. There are always gaps around the process stack area to trap overflows. Holes before and after shared libraries could be explained by the fact that on many architectures libraries have a 'preferred' address to be loaded at; putting them elsewhere requires relocation work, and probably some unshared pages. The repeated holes of 65380K most probably correspond to the header that malloc has to allocate before a requested memory block. This header is used by malloc to link allocated chunks together and for its internal book keeping. 
$ sudo pmap -x -p `pidof haproxy` 127136: ./haproxy -f /home/haproxy/haproxy/haproxy_h2.cfg Address Kbytes RSS Dirty Mode Mapping 0000555555554000 388 64 0 r---- /home/haproxy/haproxy/haproxy 00005555555b5000 2608 1216 0 r-x-- /home/haproxy/haproxy/haproxy 0000555555841000 916 64 0 r---- /home/haproxy/haproxy/haproxy 0000555555926000 60 60 60 r---- /home/haproxy/haproxy/haproxy 0000555555935000 116 116 116 rw--- /home/haproxy/haproxy/haproxy 0000555555952000 7872 5236 5236 rw--- [ anon ] 00007fff98000000 156 36 36 rw--- [ anon ] 00007fff98027000 65380 0 0 ----- [ anon ] 00007fffa0000000 156 36 36 rw--- [ anon ] 00007fffa0027000 65380 0 0 ----- [ anon ] 00007fffa4000000 156 36 36 rw--- [ anon ] 00007fffa4027000 65380 0 0 ----- [ anon ] 00007fffa8000000 156 36 36 rw--- [ anon ] 00007fffa8027000 65380 0 0 ----- [ anon ] 00007fffac000000 156 36 36 rw--- [ anon ] 00007fffac027000 65380 0 0 ----- [ anon ] 00007fffb0000000 156 36 36 rw--- [ anon ] 00007fffb0027000 65380 0 0 ----- [ anon ] ... 00007ffff7fce000 4 4 0 r-x-- [ anon ] 00007ffff7fcf000 4 4 0 r---- /usr/lib/x86_64-linux-gnu/ld-2.31.so 00007ffff7fd0000 140 140 0 r-x-- /usr/lib/x86_64-linux-gnu/ld-2.31.so ... 00007ffff7ffe000 4 4 4 rw--- [ anon ] 00007ffffffde000 132 20 20 rw--- [ stack ] ffffffffff600000 4 0 0 --x-- [ anon ] ---------------- ------- ------- ------- total kB 1499288 75504 72760 This exceeded VSZ makes impossible to start an haproxy process with 200M memory limit, set at its initialization stage as RLIMIT_AS. We usually have in this case such cryptic output at stderr: $ haproxy -m 200 -f haproxy_quic.cfg (null)(null)(null)(null)(null)(null) At the same time the process RSS (a memory really used) is only 75,5M. So to make process memory accounting more realistic let's base the memory limit, set by -m option, on RSS measurement and let's use RLIMIT_DATA instead of RLIMIT_AS. 
RLIMIT_AS was used before, because earlier versions of haproxy always allocate memory buffers for new connections, but data were not written there immediately. So these buffers were not instantly counted in RSS, but were always counted in VSZ. Now we allocate new buffers only in the case, when we will write there some data immediately, so using RLIMIT_DATA becomes more appropriate.	2024-04-19 17:36:40 +02:00
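To make the RLIMIT_DATA approach described above concrete, here is a minimal C sketch of applying a megabyte budget with `getrlimit()`/`setrlimit()`. It is not HAProxy's init code: `clamp_to_hard_limit` and `limit_data_segment` are hypothetical names, and the choice to only lower the soft limit (leaving the hard limit untouched) is an assumption made for the example.

```c
#include <sys/resource.h>

/* Hypothetical helper: never ask for more than the hard limit allows. */
static rlim_t clamp_to_hard_limit(rlim_t wanted, rlim_t hard)
{
	if (hard != RLIM_INFINITY && wanted > hard)
		return hard;
	return wanted;
}

/* Hypothetical helper: set the soft RLIMIT_DATA limit to <megabytes> MB.
 * Returns 0 on success, -1 on error (errno set by the failing call).
 */
int limit_data_segment(unsigned long megabytes)
{
	struct rlimit lim;

	if (getrlimit(RLIMIT_DATA, &lim) != 0)
		return -1;

	lim.rlim_cur = clamp_to_hard_limit((rlim_t)megabytes << 20, lim.rlim_max);
	return setrlimit(RLIMIT_DATA, &lim);
}
```

With such a helper, something like `limit_data_segment(200)` would correspond to the `-m 200` case discussed in the commit message, under the assumptions stated above.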
Christopher Faulet	d43f0e7f5a	BUG/MEDIUM: peers: Fix state transitions of a peer The commit `9425aeaffb` ("BUG/MAJOR: peers: Update peers section state from a thread-safe manner") introduced regressions about state transitions of a peer. A peer may be in a connected, accepted or released state. Before, changes for these states were performed synchronously. Since the commit above, changes are mainly performed in the sync process task. The first regression was about the released then accepted state transition, called the renewed state. In reality the state was always crushed by the accepted state. After some review, the state was just removed to always perform the cleanup in the sync process task before acknowledging the connected or accepted states. Then, a wakeup of the peer applet was missing from the sync process task after the ack of connected or accepted states, blocking the applet. Finally, when a peer is in released, connected or accepted state, we must take care to wait the sync process task wakeup before trying to receive or send messages. This patch must only be backported if the above commit is backported.	2024-04-19 17:08:22 +02:00
Christopher Faulet	c0b2015aae	BUG/MEDIUM: peers: Don't set PEERS_F_RESYNC_PROCESS flag on a peer The bug was introduced by commit `9425aeaffb` ("BUG/MAJOR: peers: Update peers section state from a thread-safe manner"). A peers flags was set on a peer by error. Just remove it. This patch must only be backported if the above commit is backported.	2024-04-19 17:08:22 +02:00
Willy Tarreau	64d20fc9e0	BUG/MINOR: fd: my_closefrom() on Linux could skip contiguous series of sockets We got a detailed report analysis showing that our optimization consisting in using poll() to detect already closed FDs within a 1024 range has an issue with the case where 1024 consecutive FDs are open (hence do not show POLLNVAL) and none of them has any activity report. In this case poll() returns zero update and we would just skip the loop that inspects all the FDs to close the valid ones. One visible effect is that the called programs might occasionally see some FDs being exposed in the low range of their fd space, possibly making the process run out of FDs when trying to open a file for example. Note that this is actually a fix for commit `b8e602cb1b` ("BUG/MINOR: fd: make sure my_closefrom() doesn't miss some FDs") that already faced a more common form of this problem (incomplete but non-empty FDs reported). This can be backported up to 2.0.	2024-04-19 17:06:21 +02:00
Willy Tarreau	b4734c2bd7	BUG/MINOR: sock: handle a weird condition with connect() As reported on github issue #2491, there's a very strange situation where epoll_wait() appears to be reported EPOLLERR only (and not IN/OUT/HUP etc as normally happens with EPOLLERR), and when connect() is called again to check the state of the ongoing connection, it returns EALREADY, basically saying "no news, please wait". This obviously triggers a wakeup loop. For now it has remained impossible to reproduce this issue outside of the reporter's environment, but that's definitely something that is impossible to get out from. The workaround here is to address the lowest level cause we can act on, which is to avoid returning to wait if EPOLLERR was returned. Indeed, in this case we know it will loop, so we must definitely take this one into account. We only do that after connect() asks us to wait, so that a properly established connection with a queued error at the end of an exchange will not be diverted and will be handled as usual. This should be backported to approximately all versions, at least as far as 2.4 according to the reporter who observed it there. Thanks to @donnyxray for their useful captures isolating the problem.	2024-04-19 17:04:25 +02:00
Christopher Faulet	fbc0850d36	MEDIUM: muxes: Use one callback function to shut a mux stream mux-ops .shutr and .shutw callback functions are merged into a single function, called .shut. The shutdown mode is still passed as argument, muxes are responsible to test it. Concretely, .shut() function of each mux is now the content of the old .shutw() followed by the content of the old .shutr().	2024-04-19 16:33:40 +02:00
Christopher Faulet	1e38ac72ce	MEDIUM: stconn: Use one function to shut connection and applet endpoints se_shutdown() function is now used to perform a shutdown on a connection endpoint and an applet endpoint. The same function is used for both. sc_conn_shut() function was removed and appctx_shut() function was updated to only deal with the applet stuff.	2024-04-19 16:33:35 +02:00
Christopher Faulet	4b80442832	MEDIUM: stconn: Explicitly pass shut modes to shut applet endpoints It is the same than the previous patch but for applets. Here there is already only one function. But with this patch, appctx_shut() function was modified to explicitly get shutdown mode as parameter. In addition appctx_shutw() was removed.	2024-04-19 16:25:06 +02:00
Christopher Faulet	c96a873ba3	MEDIUM: stconn: Use only one SC function to shut connection endpoints The SC API to perform shutdowns on connection endpoints was unified to have only one function, sc_conn_shut(), with read/write shut modes passed explicitly. It means sc_conn_shutr() and sc_conn_shutw() were removed. The next step is to do the same at the mux level.	2024-04-19 16:25:06 +02:00
Christopher Faulet	61fbbbe42f	MINOR: stconn: Rewrite shutdown functions to simplify the switch statements To ease shutdown API refactoring, shutdown callback functions were simplified. The fallthrough were removed from the switch statements.	2024-04-19 16:25:06 +02:00
Christopher Faulet	d2c3f8dde7	MINOR: stconn/connection: Move shut modes at the SE descriptor level CO_SHR_* and CO_SHW_* modes are in fact used by the stream-connectors to instruct the muxes how streams must be shut down. It is then the mux responsibility to decide if it must be propagated to the connection layer or not. And in this case, the modes above are only tested to pass a boolean (clean or not). So, it is not consistent to still use connection related modes for information set at an upper layer and never used by the connection layer itself. These modes are thus moved at the sedesc level and merged into a single enum. Idea is to add more modes, not necessarily mutually exclusive, to pass more info to the muxes. For now, it is a one-for-one renaming.	2024-04-19 16:24:46 +02:00
Christopher Faulet	293b8f7530	MINOR: mux-pt: Test conn flags instead of sedesc ones to perform a full close In .shutr and .shutw callback functions, we must rely on the connection flags (CO_FL_SOCK_RD_SH/WR_SH) to decide to fully close the connection instead of using sedesc flags. At the end, for the PT multiplexer, it is equivalent. But it is more logicial and consistent this way.	2024-04-19 15:34:27 +02:00
William Lallemand	219d95281a	MINOR: ssl: implement keylog fetches for backend connections This patch implements the backend side of the keylog fetches. The code was ready but needed the SSL message callbacks. This could be used like this: log-format "CLIENT_EARLY_TRAFFIC_SECRET %[ssl_bc_client_random,hex] %[ssl_bc_client_early_traffic_secret]\n CLIENT_HANDSHAKE_TRAFFIC_SECRET %[ssl_bc_client_random,hex] %[ssl_bc_client_handshake_traffic_secret]\n SERVER_HANDSHAKE_TRAFFIC_SECRET %[ssl_bc_client_random,hex] %[ssl_bc_server_handshake_traffic_secret]\n CLIENT_TRAFFIC_SECRET_0 %[ssl_bc_client_random,hex] %[ssl_bc_client_traffic_secret_0]\n SERVER_TRAFFIC_SECRET_0 %[ssl_bc_client_random,hex] %[ssl_bc_server_traffic_secret_0]\n EXPORTER_SECRET %[ssl_bc_client_random,hex] %[ssl_bc_exporter_secret]\n EARLY_EXPORTER_SECRET %[ssl_bc_client_random,hex] %[ssl_bc_early_exporter_secret]"	2024-04-19 14:48:44 +02:00
William Lallemand	1494cd7137	MAJOR: ssl: use the msg callback mechanism for backend connections Backend SSL connections never used the ssl_sock_msg_callbacks() which prevents the use of keylog on the server side. The impact should be minimal, though it adds a major callback system for protocol analysis, which is the same used on frontend connections. https://www.openssl.org/docs/man1.1.1/man3/SSL_CTX_set_msg_callback.html The patch adds a call to SSL_CTX_set_msg_callback() in ssl_sock_prepare_srv_ssl_ctx() the same way it's done for bind lines in ssl_sock_prepare_ctx().	2024-04-19 14:48:44 +02:00
William Lallemand	64201ad2c3	MEDIUM: ssl: crt-base and key-base local keywords for crt-store Add support for crt-base and key-base local keywords for the crt-store. current_crtbase and current_keybase are filled with a copy of the global keyword argument when a crt-store is declared, and updated with a new path when the keywords are in the crt-store section. The ckch_conf_kws[] array was updated with &current_crtbase and &current_keybase instead of the global_ssl ones so the parser can use them. The keyword must be used before any "load" line in a crt-store section. Example: crt-store web crt-base /etc/ssl/certs/ key-base /etc/ssl/private/ load crt "site3.crt" alias "site3" load crt "site4.crt" key "site4.key" frontend in2 bind *:443 ssl crt "@web/site3" crt "@web/site4.crt"	2024-04-18 17:47:24 +02:00
Amaury Denoyelle	0109c0658d	REORG: stats: extract JSON related functions This commit is similar to the previous one. This time it deals with functions related to stats JSON output.	2024-04-18 17:04:08 +02:00
Amaury Denoyelle	b8c1fdf24e	REORG: stats: extract HTML related functions Extract functions related to HTML stats webpage from stats.c into a new module named stats-html. This allows to reduce stats.c to roughly half of its original size.	2024-04-18 17:04:08 +02:00
Amaury Denoyelle	b3d5708adc	MINOR: stats: remove implicit static trash_chunk usage A static variable trash_chunk was used as an implicit buffer in most of the stats output functions. It was a one-line buffer used as temporary storage before emitting to the final applet or CLI buffer. Replace it with a buffer defined in the show_stat_ctx structure. This allows to retrieve it in most of the stats output functions. An additional parameter was added for the functions where the context was not already used. This renders the code cleaner and will allow to split stats.c in several source files. As a result of a new member in show_stat_ctx, the per-command context max size has increased. This forces to increase APPLET_MAX_SVCCTX to ensure the pool size is big enough. Increase it to 128 bytes which includes some extra room for the future.	2024-04-18 17:04:08 +02:00
William Lallemand	ffea2e1a13	MEDIUM: ssl: support a named crt-store section This patch introduces named crt-store section. A named crt-store allows to add a scope to the crt name. For example, a crt named "foo.crt" in a crt-store named "web" will result in a certificate called "@web/foo.crt".	2024-04-18 16:10:09 +02:00
Aurelien DARRAGON	81a8a2cae1	MINOR: peers: stop relying on srv->addr to find peer port Now that peers entirely rely on peer->srv for connection settings, and that it was confirmed that it works properly thanks to previous commit, let's finish what we started in `f6ae258` ("MINOR: peers: rely on srv->addr and remove peer->addr") and stop using srv->addr to find out peers port and instead rely on srv->svc_port as it's already done for other proxy types.	2024-04-18 11:18:26 +02:00
Aurelien DARRAGON	f51f438875	BUG/MEDIUM: peers: fix localpeer regression with 'bind+server' config style A dumb mistake was made in `f6ae25858` ("MINOR: peers: rely on srv->addr and remove peer->addr"). I completely overlooked the part where the bind address settings are used as implicit server's address settings when the peers are declared using the new bind+server config style (which is the new recommended method to declare peers as it follows the same logic as the one used in other proxy sections). As such, the peers synchro fails to work between previous and new process (localpeer mechanism) upon reload when declaring peers this way: global localpeer local peers mypeers bind 127.0.0.1:10001 server local And one has to use the 'old' config style to make it work: global localpeer local peers mypeers peer local 127.0.0.1:10001 -- To fix the issue, let's explicitly set the server's addr:port according to the bind's address settings (only the first listener is considered) when local peer was declared using the 'bind+server' method. No backport needed.	2024-04-18 11:18:13 +02:00
Christopher Faulet	494bc03ff7	BUG/MEDIUM: peers: Fix exit condition when max-updates-at-once is reached When a peer applet is pushing updates, we limit the number of updates sent at once via a global parameter to not spend too much time in the applet. On interrupt, we requested more room to be woken up quickly. However, this statement is only true if something was pushed in the buffer. Otherwise, with an empty buffer, if the stream itself is not woken up, the applet also remains blocked because there is no send activity on the other side to unblock it. In this case, instead of requesting more room, it is sufficient to state the applet has more data to send. This patch must be backported as far as 2.6.	2024-04-18 09:17:03 +02:00
Christopher Faulet	4fd656e311	BUG/MEDIUM: spoe: Always retry when an applet fails to send a frame This bug is related to the previous one ("BUG/MEDIUM: peers: Fix exit condition when max-updates-at-once is reached"). applet_putblk() function returns -1 on error and it should always be interpreted as a lack of room in the buffer. However, on the spoe, this was processed as an I/O error. This patch must be backported as far as 2.8.	2024-04-18 09:17:03 +02:00
William Lallemand	10224d72fd	BUG/MINOR: ssl: fix crt-store load parsing The crt-store load line parser relies on offsets of members of the ckch_conf struct. However the new "alias" keyword has an offset of -1, because it does not need to be used. Plan was to handle it that way in the parser, but it wasn't supported yet. So -1 was still used in an offset computation which was not used, but ASAN could see the problem. This patch fixes the issue by using a signed type for the offset value, so any negative value would be skipped. It also introduced a PARSE_TYPE_NONE for the parser. No backport needed.	2024-04-17 21:00:34 +02:00
William Lallemand	ff4a0f6562	BUG/MINOR: ssl: check on forbidden character on wrong value The check on the forbidden '/' for the crt-store load keyword was done on the keyword instead of the value itself. No backport needed.	2024-04-17 21:00:25 +02:00
William Lallemand	bdee8ace81	MEDIUM: ssl: support aliases in crt-store The crt-store load line now allows to put an alias. This alias is used as the key in the ckch_tree instead of the certificate. This way an alias can be referenced in the configuration with the '@/' prefix. This can only be defined with a crt-store.	2024-04-17 17:24:49 +02:00
Willy Tarreau	e6662bf706	MEDIUM: evports: permit to report multiple events at once Since the beginning in 2.0 the nevlist parameter was set to 1 before calling port_getn(), which means that a single FD event will be reported per polling loop. This is extremely inefficient, and all the code was designed to use global.tune.maxpollevents. It looks like it's a leftover of a temporary debugging change. No apparent issues were found by setting it to a higher value, so better do that. That code is not much used nowadays with Solaris disappearing from the landscape, so even if this definitely was a bug, it's preferable not to backport that fix as it could uncover other subtle bugs that were never raised yet.	2024-04-17 16:37:04 +02:00
Willy Tarreau	36d92dcd9b	BUG/MEDIUM: evports: do not clear returned events list on signal Since 2.0 with commit `0ba4f483d2` ("MAJOR: polling: add event ports support (Solaris)"), the polling system on Solaris suffers from a signal handling problem. It turns out that this API is very bizarre, as reported events are automatically unregistered and their counter is updated in the same variable that was used to pass the count on input, making it difficult to handle certain error codes (how should one handle ENOSYS for example?). And to complete everything, the API is able to return both EINTR and an event if a signal is reported. The code tries to deal with certain such cases (e.g. ETIME for timeout can also report an event), otherwise it defaults to clearing the event counter upon error. This has the effect that EINTR clears the list of events, which are also automatically cleared from the set by the system. This is visible when using external checks where the SIGCHLD of the leaving child causes a wakeup that ruins the event counter and causes endless loops, apparently due to the queued inter-thread byte in the pipe used to wake threads up that never gets removed in this case. Note that extcheck would also deserve deeper investigation because it can immediately re-trigger a check in such a case, which is not normal. Removing the wiping of the nevlist variable fixes the problem. This can be backported to all versions since it affects 2.0.	2024-04-17 16:25:20 +02:00
Ilya Shipitsin	ab7f05daba	CLEANUP: assorted typo fixes in the code and comments This is 41st iteration of typo fixes	2024-04-17 11:14:44 +02:00
Willy Tarreau	1c944eab08	BUILD: cache: fix a build warning with gcc < 7 Gcc before 7 does really not like direct operations on cast pointers such as "((struct a)b)->c += d;". It turns out that we have exactly that construct in 3.0 since commit `5baa9ea168` ("MEDIUM: cache: Save body size of cached objects and track it on delivery"). It's generally sufficient to use an intermediary variable such as : "({ (struct a) _ = b; _; })->c +=d;" but that's ugly. Fortunately DISGUISE() implicitly does something very similar and works fine, so let's use that. No backport is needed.	2024-04-17 09:43:32 +02:00
Christopher Faulet	50d8c18742	BUG/MEDIUM: stconn: Don't forward channel data if input data must be filtered Once data are received and placed in a channel buffer, if it is possible, outgoing data are immediately forwarded. But we must take care to not do so if there is also pending input data and a filter registered on the channel. It is especially important for HTX streams because the HTX may be altered, especially the extra field. And it is indeed an issue with the HTTP compression filter and the H1 multiplexer. The wrong chunk size may be announced leading to an internal error. This patch should fix the issue #2530. It must be backported to all stable versions.	2024-04-16 11:36:54 +02:00
Christopher Faulet	ffe0874cfb	MINOR: peer: Restore previous peer flags value to ease debugging The last fixes on the peers to improve the locking mechanism introduced new peer flags and the value of some old flags was changed. This was done in the commit `9b78e33837` ("MINOR: peers: Add 2 peer flags about the peer learn status"). But, to ease the debugging of the peers team, old values are restored. This patch must be backported with the commit above.	2024-04-16 11:35:47 +02:00
Christopher Faulet	9075a7e32f	MEDIUM: peers: Only lock one peer at a time in the sync process function Thanks to all previous changes, it is now possible to stop locking all peers at once in the resync process function. Peers are locked one after the other. When a peer is locked, another one may be locked when all peers sharing the same shard must be updated. Otherwise, at anytime, at most one peer is locked. This should significantly improve the situation. This patch depends on the following patches: * BUG/MAJOR: peers: Update peers section state from a thread-safe manner * BUG/MINOR: peers: Report a resync was explicitly requested from a thread-safe manner * MINOR: peers: Add functions to commit peer changes from the resync task * MINOR: peers: sligthly adapt part processing the stopping signal * MINOR: peers: Add flags to report the peer state to the resync task * MINOR: peers: Add 2 peer flags about the peer learn status * MINOR: peers: Split resync process function to separate running/stopping states It may be good to backport it to 2.9. The whole series should fix the issue #2470.	2024-04-16 10:29:21 +02:00
Christopher Faulet	9425aeaffb	BUG/MAJOR: peers: Update peers section state from a thread-safe manner It is the main part of this series. In the peer applet, only the peer flags are updated. It is now the responsibility of the resync process function to check changes on each peer to update the peers section state accordingly. Concretely, changes on the connection state (accepted, connected, released or renewed) are first reported at the peer level and then handled in __process_peer_state() function. In the same manner, when the learn status of a peer changes, the peers section state is no longer updated immediately. The resync task is woken up to deal with these changes. Thanks to these changes, the peers should now be really thread-safe. This patch relies on the following ones: * BUG/MINOR: peers: Report a resync was explicitly requested from a thread-safe manner * MINOR: peers: Add functions to commit peer changes from the resync task * MINOR: peers: sligthly adapt part processing the stopping signal * MINOR: peers: Add flags to report the peer state to the resync task * MINOR: peers: Add 2 peer flags about the peer learn status * MINOR: peers: Split resync process function to separate running/stopping states No bug was reported about the thread-safety of peers. Only a performance issue was encountered with a huge number of peers (> 50). So there is no reason to backport all these patches further than 2.9.	2024-04-16 10:29:21 +02:00
Christopher Faulet	ef066fa186	BUG/MINOR: peers: Report a resync was explicitly requested from a thread-safe manner Flags on the peers section state must be updated in a thread-safe manner. It is not true today. With this patch we take care PEERS_F_RESYNC_REQUESTED flag is only set by the resync task. To do so, a peer flag is used. This flag is only set once and never removed. It is just used for debugging purposes. So it is enough to set it on a peer and be sure to report it on the peers section when the sync task is executed. This patch relies on previous ones: * MINOR: peers: Add functions to commit peer changes from the resync task * MINOR: peers: sligthly adapt part processing the stopping signal * MINOR: peers: Add flags to report the peer state to the resync task * MINOR: peers: Add 2 peer flags about the peer learn status * MINOR: peers: Split resync process function to separate running/stopping states	2024-04-16 10:29:21 +02:00
Christopher Faulet	bdf1634883	MINOR: peers: Add functions to commit peer changes from the resync task For now, nothing is done in these functions. It is only a patch to prepare the huge part of the refactoring about the locking mechanism of the peers. These functions will be responsible to check peers state and their learn status to update the peers section flags accordingly.	2024-04-16 10:29:21 +02:00
Christopher Faulet	4a16560315	MINOR: peers: sligthly adapt part processing the stopping signal The signal and the PEERS_F_DONOTSTOP flag are now handled in the loop on peers to force sessions shutdown. We will need to loop on all peers to update their state. It is easier this way.	2024-04-16 10:29:21 +02:00
Christopher Faulet	4ca8a00955	MINOR: peers: Add flags to report the peer state to the resync task Like the previous patch, this patch is also part of the refactoring of the peer locking mechanism. Here we add flags to represent a transitional state for a peer. It will be the resync task responsibility to update the peers state accordingly. A peer may be in 4 transitional states: * accepted : a connection was accepted from a peer * connected: a connection to a peer was established * released : a peer session was released * renewed : a peer session was released because it was replaced by a new one. Concretely, this will be equivalent to released+accepted If none of these flags is set, it means the transition, if any, was processed by the resync task, or no transition happened.	2024-04-16 10:29:21 +02:00
Christopher Faulet	9b78e33837	MINOR: peers: Add 2 peer flags about the peer learn status PEER_F_LEARN_PROCESS and PEER_F_LEARN_FINISHED flags are added to help to fix locking issue about peers. Indeed, a peer is able to update the peers "section" state under its own lock. Because the resync task locks all peers at once, there is no conflict at this level. But there is nothing to prevent 2 peers from updating the peers state at the same time. So it seems there is no real issue here, but there is a theoretical thread-safety issue. And it means the locking mechanism of the peers must be reviewed. In this context, the 2 flags above will help to move all updates of the peers state in the scope of resync task. Each peer will be able to update its own state and the resync task will be responsible to update the peers state accordingly.	2024-04-16 10:29:21 +02:00
Christopher Faulet	4078893049	MINOR: peers: Split resync process function to separate running/stopping states The function responsible to deal with resynchro between all peers is now split in two subfunctions. The first one is used when HAProxy is running while the other one is used in soft-stop case. This patch is required to be able to refactor locking mechanism of the peers.	2024-04-16 10:29:21 +02:00
Frederic Lecaille	98583c4256	BUG/MEDIUM: grpc: Fix several unaligned 32/64 bits accesses There were several places in grpc and its dependency protobuf where unaligned accesses were done. Read accesses to 32 (resp. 64) bits values should be performed by read_u32() (resp. read_u64()). Replace these unaligned read accesses by correct calls to these functions. Same fixes for doubles and floats. Such unaligned read accesses could lead to crashes with bus errors on CPU architectures which do not fix them at run time. This patch depends on this previous commit: 861199fa71 MINOR: net_helper: Add support for floats/doubles. Must be backported as far as 2.6.	2024-04-16 07:37:28 +02:00
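The kind of helper this commit relies on can be sketched as follows. This is a minimal illustration, not the HAProxy code: the names `sketch_read_u32()` and `sketch_read_dbl()` are hypothetical stand-ins for read_u32() and the float/double helpers, and the sketch assumes that copying through memcpy() is an acceptable way to perform the access.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not the HAProxy one): read a 32-bit value from an
 * address that may not be 4-byte aligned. Going through memcpy() lets the
 * compiler emit whatever access is safe on the target CPU instead of a
 * direct load that could raise a bus error.
 */
static inline uint32_t sketch_read_u32(const void *p)
{
	uint32_t v;

	assert(p != NULL);
	memcpy(&v, p, sizeof(v));
	return v;
}

/* Same idea for a double stored at an arbitrary offset in a buffer. */
static inline double sketch_read_dbl(const void *p)
{
	double d;

	assert(p != NULL);
	memcpy(&d, p, sizeof(d));
	return d;
}
```

A caller decoding a protobuf-like buffer would pass `buf + offset` to these helpers rather than casting it to `uint32_t *` or `double *` and dereferencing.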
William Lallemand	fa5c4cc6ce	MINOR: ssl: 'key-base' allows to load a 'key' from a specific path The global 'key-base' keyword allows to read the 'key' parameter of a crt-store load line using a path prefix. This is the equivalent of the 'crt-base' keyword but for 'key'. It only applies on crt-store.	2024-04-15 15:27:10 +02:00
William Lallemand	6567d09af5	MINOR: ssl: supports crt-base in crt-store Add crt-base support for "crt-store". It will be used by 'crt', 'ocsp', 'issuer', 'sctl' load line parameter. In order to keep compatibility with previous configurations and scripts for the CLI, a crt-store load line will save its ckch_store using the absolute crt path with the crt-base as the ckch tree key. This way, a `show ssl cert` on the CLI will always have the completed path.	2024-04-15 15:25:36 +02:00
William Lallemand	785d5ef3f0	CLEANUP: ssl: remove dead code in cfg_parse_crtstore() Remove dead code reported in #2531.	2024-04-15 09:05:27 +02:00
Willy Tarreau	3ef7daa731	BUG/MAJOR: ring: use the correct size to reallocate startup_logs In 3.0-dev, with commit `7c9ce715c9` ("MINOR: ring: make callers use ring_data() and ring_size(), not ring->buf"), we made startup_logs_dup() use ring_size() to get the old ring size and pass it to ring_new() to create a new ring. But due to the ambiguity of the allocate vs usable size, this resulted in slightly shrinking the buffer compared to the previous one, occasionally causing crashes if the first one was already full of warnings, as seen in GH issue #2529. We need to use the allocated size instead, thanks to the function brought by previous commit. No backport is needed, this only affects 3.0-dev. Thanks to @felipewd for the detailed report that allowed to spot the problem.	2024-04-15 08:26:41 +02:00
Willy Tarreau	b662c5d2b8	MINOR: ring: clarify the usage of ring_size() and add ring_allocated_size() There's currently an ambiguity around ring_size(), it's said to return the allocated size but returns the usable size. We can't change it as it's used everywhere in the code like this. Let's fix the comment and add ring_allocated_size() instead for anything related to allocation.	2024-04-15 08:25:03 +02:00
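The difference between the usable and the allocated size, and why duplicating a ring with the wrong one shrinks it, can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `sketch_ring` layout (a header followed by the data area) and all `sketch_*` functions are hypothetical and do not reproduce the real ring API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical layout: a small header followed by the usable data area. */
struct sketch_ring {
	size_t size;   /* usable bytes in area[] */
	char area[];   /* storage */
};

/* Usable size: what callers may fill with data. */
static inline size_t sketch_ring_size(const struct sketch_ring *r)
{
	return r->size;
}

/* Allocated size: header plus data, i.e. what must be passed to the allocator. */
static inline size_t sketch_ring_allocated_size(const struct sketch_ring *r)
{
	return sizeof(*r) + r->size;
}

/* Create a ring from a total allocation size (header included). */
static inline struct sketch_ring *sketch_ring_new(size_t alloc_size)
{
	struct sketch_ring *r;

	if (alloc_size <= sizeof(*r))
		return NULL;
	r = malloc(alloc_size);
	if (!r)
		return NULL;
	r->size = alloc_size - sizeof(*r);
	return r;
}

/* Duplicate the geometry of <old>: the allocated size must be used here,
 * otherwise the copy loses sizeof(header) bytes of usable room.
 */
static inline struct sketch_ring *sketch_ring_dup(const struct sketch_ring *old)
{
	assert(old != NULL);
	return sketch_ring_new(sketch_ring_allocated_size(old));
}
```

Passing `sketch_ring_size(old)` to `sketch_ring_new()` instead would yield a slightly smaller ring, which is the class of problem the following fix on startup_logs addresses.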
Willy Tarreau	da6bb13790	BUG/MINOR: lru: fix the standalone test case for invalid revision In 2.6, a build issue for LRU in standalone test mode was addressed by commit `bf9c07fd9` ("BUILD/DEBUG: lru: update the standalone code to support the revision"), but using revision 1 while looking up rev 0 results in 100% misses. Let's fix this and commit with revision 0 as well. No backport is needed, this only happens when hacking on the code.	2024-04-13 08:43:12 +02:00
Valentine Krasnobaeva	985d458571	MINOR: proto_quic: add proto name in alert In quic_alloc_dghdlrs() add proto name in the last alert. This helps to identify potential problem immediately and makes log messages more uniform.	2024-04-12 18:51:50 +02:00
Valentine Krasnobaeva	7041c078d6	MINOR: listener/protocol: add proto name in alerts Frontend and listen sections allow an unlimited number of bind statements, and there is often one bind statement per supported protocol, like below: listen test mode http bind quic4@0.0.0.0:443 name quic ssl crt ... bind 0.0.0.0:443 name https ssl alpn http/1.1,h2 crt ... bind 0.0.0.0:8080 ... ... It seems useful to show the corresponding protocol name in alerts and warnings when a problem occurs with port binding, connection resuming or sharding. This helps to figure out immediately which bind statement has a wrong setting or which protocol module is the root cause of the issue.	2024-04-12 18:51:40 +02:00
Willy Tarreau	c0ee2d78d7	DEBUG: pools: report the data around the offending area in case of mismatch When the integrity check fails, it's useful to get a dump of the area around the first faulty byte. That's what this patch does. For example it now shows this before reporting info about the tag itself: Contents around first corrupted address relative to pool item:. Contents around address 0xe4febc0792c0+40=0xe4febc0792e8: 0xe4febc0792c8 [80 75 56 d8 fe e4 00 00] [.uV.....] 0xe4febc0792d0 [a0 f7 23 a4 fe e4 00 00] [..#.....] 0xe4febc0792d8 [90 75 56 d8 fe e4 00 00] [.uV.....] 0xe4febc0792e0 [d9 93 fb ff fd ff ff ff] [........] 0xe4febc0792e8 [d9 93 fb ff ff ff ff ff] [........] 0xe4febc0792f0 [d9 93 fb ff ff ff ff ff] [........] 0xe4febc0792f8 [d9 93 fb ff ff ff ff ff] [........] 0xe4febc079300 [d9 93 fb ff ff ff ff ff] [........] This may be backported to 2.9 and maybe even 2.8 as it does help spot the cause of the memory corruption.	2024-04-12 18:01:55 +02:00
Willy Tarreau	16e3655fbd	REORG: pool: move the area dump with symbol resolution to tools.c This function is particularly useful to dump unknown areas watching for opportunistic symbols, so let's move it to tools.c so that we can reuse it a little bit more.	2024-04-12 18:01:20 +02:00
Willy Tarreau	b21aaef4e5	DEBUG: pool: improve decoding of corrupted pools When a corruption was detected in an object, it's often said that the tag doesn't match the pool, but it should also check if it matches the location of an earlier pool_free() call, which happens when -dMcaller is used. That's what we're doing now.	2024-04-12 18:01:05 +02:00
Willy Tarreau	21447b1dd4	BUG/MAJOR: stick-tables: fix race with peers in entry expiration In 2.9 with commit `7968fe3889` ("MEDIUM: stick-table: change the ref_cnt atomically") we significantly relaxed the stick-tables locking when dealing with peers by adjusting the ref_cnt atomically and moving it out of the lock. However it opened a tiny window that became problematic in 3.0-dev7 when the table's contention was lowered by commit `1a088da7c2` ("MAJOR: stktable: split the keys across multiple shards to reduce contention"). What happens is that some peers may access the entry for reading at the moment it's about to expire, and while the read accesses to push the data remain unnoticed (possibly that from time to time we push crap), but the releasing of the refcount causes a new write that may damage anything else. The scenario is the following: process_table_expire() peer_send_teachmsgs() RDLOCK(&updt_lock); tick_is_expired() != 0 ebmb_delete(ts->key); if (ts->upd.node.leaf_p) { HA_ATOMIC_INC(&ts->ref_cnt); RDUNLOCK(&updt_lock); WRLOCK(&updt_lock); eb32_delete(&ts->upd); } __stksess_free(t, ts); peer_send_updatemsg(ts); RDLOCK(&updt_lock); HA_ATOMIC_DEC(&ts->ref_cnt); Here it's clear that the bottom part of peer_send_teachmsgs() believes to be protected but may act on freed data. This is more visible when enabling -dMtag,no-merge,integrity because the ATOMIC_DEC(&ref_cnt) decrements one byte in the area, that makes the eviction check fail while the tag has the address of the left __stksess_free(), proving a completed pool_free() before the decrement, and the anomaly there is pretty visible in the crash dump. Changing INC()/DEC() with ADD(2)/DEC(2) shows that the byte is now off by two, confirming that the operation happened there. 
The solution is not very hard, it consists in checking for the ref_cnt on the left after grabbing the lock, and doing both before deleting the element, so that we have the guarantee that either the peer will not take it or that it has already started taking it. This was proven to be sufficient, as instead of crashing after 3s of injection with 4 peers, 16 threads and 130k RPS, it survived for 15mn. In order to stress the setup, a config involving 4+ peers, tracking HTTP request with randoms and applying a bwlim-out filter with a random key, with a client made of 160 h2 conns downloading 10 streams of 4MB objects in parallel managed to trigger it within a few seconds: frontend ft http-request track-sc0 rand(100000) table tbl filter bwlim-out lim-out limit 2047m key rand(100000000),ipmask(32) min-size 1 table tbl http-request set-bandwidth-limit lim-out use_backend bk backend bk server s1 198.18.0.30:8000 server s2 198.18.0.34:8000 backend tbl stick-table type ip size 1000k expire 1s store http_req_cnt,bytes_in_rate(1s),bytes_out_rate(1s) peers peers This seems to be very dependent on the timing and setup though. This will need to be backported to 2.9. This part of the code was reindented with shards but the block should remain mostly unchanged. The logic to apply is the same.	2024-04-12 18:00:13 +02:00
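The ordering described in this fix (check the reference count while holding the lock, before unlinking the entry) can be sketched as follows. This is only an illustration under stated assumptions: the `sketch_*` names are hypothetical, a plain spinlock built on `atomic_flag` stands in for the real read/write lock, and a boolean stands in for membership in the update tree.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a stick-table entry. */
struct sketch_entry {
	atomic_uint ref_cnt;   /* references held by peers */
	bool in_update_list;   /* still reachable by peers */
};

/* Assumption: one exclusive spinlock replaces the real rwlock. */
static atomic_flag sketch_updt_lock = ATOMIC_FLAG_INIT;

static inline void sketch_lock(void)
{
	while (atomic_flag_test_and_set(&sketch_updt_lock))
		; /* spin */
}

static inline void sketch_unlock(void)
{
	atomic_flag_clear(&sketch_updt_lock);
}

/* Peer side: take a reference under the lock, only if still listed. */
static inline bool sketch_peer_take(struct sketch_entry *e)
{
	bool taken = false;

	assert(e != NULL);
	sketch_lock();
	if (e->in_update_list) {
		atomic_fetch_add(&e->ref_cnt, 1);
		taken = true;
	}
	sketch_unlock();
	return taken;
}

/* Peer side: drop the reference once the update was sent. */
static inline void sketch_peer_release(struct sketch_entry *e)
{
	atomic_fetch_sub(&e->ref_cnt, 1);
}

/* Expiry side: unlink only if nobody holds a reference, with the check and
 * the unlink done under the same lock. Returns true if the entry may be freed.
 */
static inline bool sketch_try_expire(struct sketch_entry *e)
{
	bool removed = false;

	assert(e != NULL);
	sketch_lock();
	if (atomic_load(&e->ref_cnt) == 0) {
		e->in_update_list = false;
		removed = true;
	}
	sketch_unlock();
	return removed;
}
```

With this ordering, either the peer never sees the entry, or the expiry path sees a non-zero reference count and leaves the entry alone for a later pass.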
Willy Tarreau	d8c2f5c586	BUG/MEDIUM: peers/trace: fix crash when listing event types Sending "trace peers event" on the CLI crashes because the event list in the peers is not finished. This was introduced in 2.4 by commit `d865935f32` ("MINOR: peers: Add traces to peer_treat_updatemsg().") so this must be backported to 2.4.	2024-04-12 17:59:55 +02:00
Willy Tarreau	90efe8a877	CLEANUP: stick-tables: always respect the to_batch limit when trashing When adding the shards support to tables with commit `1a088da7c` ("MAJOR: stktable: split the keys across multiple shards to reduce contention"), the condition to stop eliminating entries based on the batch size being reached is based on a pre-decrement of the max_search counter, but now it goes back into the outer loop which doesn't check it, so next time it does it when entering the next shard, it will become even more negative and will properly stop, but at first glance it looks like an int overflow (which it is not). Let's make sure the outer loop stops on this condition so that we don't continue searching when the limit is reached.	2024-04-12 17:58:54 +02:00
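The loop structure this cleanup aims for can be sketched as below: both the inner and the outer loop test the remaining budget, so the counter never goes negative. The function name, the fixed shard count and the per-shard entry counts are hypothetical and only serve the illustration.

```c
#include <assert.h>

#define SKETCH_SHARDS 4

/* Hypothetical sketch: visit at most <to_batch> entries spread over several
 * shards and return how many were visited. The outer loop also checks the
 * remaining budget so that no further shard is entered once it is exhausted.
 */
static inline int sketch_trash_batch(const int entries_per_shard[SKETCH_SHARDS], int to_batch)
{
	int max_search = to_batch;
	int visited = 0;

	assert(to_batch >= 0);
	for (int shard = 0; shard < SKETCH_SHARDS && max_search > 0; shard++) {
		for (int i = 0; i < entries_per_shard[shard] && max_search > 0; i++) {
			max_search--;
			visited++;
		}
	}
	return visited;
}
```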
Willy Tarreau	44a8f9e7fc	BUG/MEDIUM: stick-tables: fix the task's next expiration date While changing the stick-table indexing that led to commit `1a088da7c` ("MAJOR: stktable: split the keys across multiple shards to reduce contention"), I met a problem with the task's expiration date being incorrectly updated, I fixed it and apparently I committed the wrong version :-/ The effect is that the task's date is only correctly reset if the table is empty, otherwise the task wakes up again and is queued at the previous date, eating 100% CPU. The tick_isfirst() must not be used when storing the last result. No backport is needed as this was only merged in 3.0-dev7.	2024-04-12 17:58:54 +02:00
William Lallemand	d308c9a9f6	MINOR: ssl/crtlist: alloc ssl_conf only when a valid keyword is found crt-list will be enhanced with ckch_conf keywords, however these keywords do not fill the 'ssl_conf' structure. So we don't need to allocate the ssl_conf for every option between [ ] but only when we find a relevant one.	2024-04-12 15:38:54 +02:00
William Lallemand	81e54ef197	MINOR: ssl: rename ckchs_load_cert_file to new_ckch_store_load_files_path Remove the ambiguous "ckchs" name and reflect the fact that it's loaded from a path.	2024-04-12 15:38:54 +02:00
William Lallemand	00eb44864b	MINOR: ssl: add the section parser for 'crt-store' 'crt-store' is a new section useful to define the struct ckch_store. The "load" keyword in the "crt-store" section allows to define which files you want to load for a specific certificate definition. Ex: crt-store load crt "site1.crt" key "site1.key" load crt "site2.crt" key "site2.key" frontend in bind *:443 ssl crt "site1.crt" crt "site2.crt" This is part of the certificate loading which was discussed in #785.	2024-04-12 15:38:54 +02:00
Christopher Faulet	aaa72e06e5	BUG/MEDIUM: cache/stats: Handle inbuf allocation failure in the I/O handler When cache and stats applets were changed to use their own buffers, a change was also performed to no longer access the stream from the I/O handler. Among other things, the HTTP start-line of the request is now retrieved to get the method. But, when these changes were brought, the inbuf buffer allocation failures were not handled. It is of course not so common. But if this happens, a crash may be experienced. To fix the issue, we now check for inbuf allocation failures before accessing it. No backport needed.	2024-04-12 15:00:04 +02:00
Damien Claisse	0797e05d9f	BUG/MINOR: server: fix slowstart behavior We observed that a dynamic server whose health check is down for longer than slowstart delay at startup doesn't trigger the warmup phase, it receives full traffic immediately. This has been confirmed by checking haproxy UI, weight is immediately the full one (e.g. 75/75), without any throttle applied. Further tests showed that it was similar if it was in maintenance, and even when entering a down or maintenance state after being up. Another issue is that if the server is down for less time than slowstart, when it comes back up, it briefly has a much higher weight than expected for a slowstart. An easy way to reproduce is to do the following: - Add a server with e.g. a 20s slowstart and a weight of 10 in config file - Put it in maintenance using CLI (set server be1/srv1 state maint) - Wait more than 20s, enable it again (set server be1/srv1 state ready) - Observe UI, weight will show 10/10 immediately. If server was down for less than 20s, you'd briefly see a weight and throttle value that is inconsistent, e.g. 50% throttle value and a weight of 5 if server comes back up after 10s before going back to 6% after a second or two. Code analysis shows that the logic in server_recalc_eweight stops the warmup task by setting server's next state to SRV_ST_RUNNING if it didn't change state for longer than the slowstart duration, regardless of its current state. As a consequence, a server being down or disabled for longer than the slowstart duration will never enter the warmup phase when it is up again. Regarding the weight when server comes back up, issue is that even if the server is down, we still compute its next weight as if it was up, hence when it comes back up, it can briefly have a much higher weight than expected during slowstart, until the warmup task is called again after last_change is updated. This patch aims to fix both issues.	2024-04-11 19:24:01 +02:00
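The intended behavior can be sketched with a simplified warm-up computation. This is not the formula used by server_recalc_eweight: the linear ramp, the `sketch_*` names and the parameters are assumptions chosen to show the two points of the fix, namely that no weight is computed while the server is down and that only the time elapsed since the server came back up counts toward the ramp.

```c
#include <assert.h>

/* Hypothetical server states for the sketch. */
enum sketch_state { SKETCH_DOWN, SKETCH_UP };

/* Hypothetical sketch of a slowstart ramp:
 *  - while the server is down, no effective weight is computed (0);
 *  - once up, the weight grows linearly with the time spent up, with a
 *    floor of 1, until the slowstart duration has elapsed.
 */
static inline int sketch_eweight(enum sketch_state st, int full_weight,
                                 int slowstart_s, int secs_since_up)
{
	int w;

	assert(full_weight >= 0 && secs_since_up >= 0);
	if (st != SKETCH_UP)
		return 0;
	if (slowstart_s <= 0 || secs_since_up >= slowstart_s)
		return full_weight;
	w = full_weight * secs_since_up / slowstart_s;
	return w > 0 ? w : 1;
}
```

With a weight of 10 and a 20s slowstart, this sketch gives 5 after 10s spent up and 10 once 20s have elapsed, however long the server was down beforehand.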
Willy Tarreau	018443b8a1	BUILD: makefile: get rid of the CPU variable The CPU variable, when used, is almost always exclusively used with "generic" to disable any CPU-specific optimizations, or "native" to enable "-march=native". Other options are not used and are just making CPU_CFLAGS more confusing. This commit just drops all pre-configured variants and replaces them with documentation about examples of supported options. CPU_CFLAGS is preserved as it appears that it's mostly used as a proxy to inject the distro's CFLAGS, and it's just empty by default. The CPU variable is checked, and if set to anything but "generic", it emits a warning about its deprecation and invites the user to read INSTALL. Users who would just set CPU_CFLAGS will be able to continue to do so, those who were using CPU=native will have to pass CPU_CFLAGS=-march=native and those who were passing one of the other options will find it in the doc as well. Note that this also removes the "CPU=" line from haproxy -vv, that most users got used to seeing set to "generic" or occasionally "native" anyway, thus that didn't provide any useful information.	2024-04-11 17:33:28 +02:00
Willy Tarreau	772f9a5874	BUILD: pools: make DEBUG_MEMORY_POOLS=1 the default option This option has been set by default for a very long time and also complicates the manipulation of the DEBUG variable. Let's make it the official default and permit to unset it by setting it to zero. The other pool-related DEBUG options were adjusted to also explicitly check for the zero value for consistency.	2024-04-11 17:25:45 +02:00
Willy Tarreau	b22b968a48	BUILD: cache: fix non-inline vs inline declaration mismatch to silence a warning Some compilers report this on the cache: src/cache.c:235: warning: 'release_entry_locked' declared inline after being called src/cache.c:235: warning: previous declaration of 'release_entry_locked' was here And indeed, the function is first declared non-inline and later inline. Let's just set the inline status from the beginning. It's not really needed to backport this.	2024-04-11 17:25:25 +02:00
Amaury Denoyelle	32b9e97f92	BUG/MINOR: guid: fix crash on invalid guid name Using an invalid GUID for guid_insert() causes a crash. This is easily reproducible using for example an invalid character with "guid" keyword. Here is the provided backtrace : Thread 1 "haproxy" received signal SIGSEGV, Segmentation fault. 0x00005555561fda95 in guid_insert (objt=0x520000002080, uid=0x519000002dac "@foo2", errmsg=0x7ffff4c0a7a0) at src/guid.c:83 83 ha_free(&guid->node.key); This error is present in guid_insert() cleanup parts. GUID node is not allocated in case of an early error so it's impossible to dereference it to free guid.node.key. Fix this simply by using an intermediary pointer key. This does not need to be backported.	2024-04-11 15:09:53 +02:00
Christopher Faulet	3fc38593d5	BUG/MEDIUM: http-ana: Deliver 502 on keep-alive for fresh server connection In HTTP keep-alive, if we face a connection error to the server while sending the request, the error should not be reported, and the client-side connection should simply be closed, so that the client knows it can retry. If the error happens during the connection stage, there are two cases. We have a connection timeout or an allocation error. In this case, the 503 response must be skipped if it is not the first request on the client-side connection. Or we have a connection error. In this case, the 503 response must be skipped if it is a reused server connection. Otherwise, during the connection stage, the 503-Service-unavailable response is delivered to the client. This part works properly. If the error happens after this stage, the delivery of the 502-Bad-gateway response should only be based on the server-side connection status. For a reused server connection, the client-side connection must be closed with no response. However, for a fresh server-side connection, a 502-Bad-gateway response must be delivered to the client. Unfortunately, this part is buggy. Only the client-side connection state is considered and the response is skipped if it is not the first request for the same client connection. The bug is not so visible in HTTP/1.1 but in H2 and H3 it is pretty annoying because, for a connection, requests are multiplexed in parallel. It means there is no first request. So, because of this bug, for H2 and H3, 502-Bad-gateway responses because of a connection error before receiving the response are always skipped. To fix the issue, in http_wait_for_response() analyser, we must only rely on SF_SRV_REUSED stream flag to skip the 502 response or not. This flag is set if the server connection was reused. The bug has been there for a while. SF_SRV_REUSED flag was added in the version 1.5 especially to fix this kind of bug. But only the 503 case was fixed. 
This patch should fix the issue #2285. It must be backported to all stable versions.	2024-04-10 15:42:22 +02:00
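The decision described in the commit above can be sketched in a few lines of C. This is only an illustration under stated assumptions: `SKETCH_SF_SRV_REUSED`, `enum sketch_action` and `sketch_on_response_error()` are hypothetical names, and the flag value is made up; the real SF_SRV_REUSED flag lives in HAProxy's own headers.

```c
/* Hypothetical flag value for illustration only; HAProxy's real
 * SF_SRV_REUSED is defined in its own headers and may differ.
 */
#define SKETCH_SF_SRV_REUSED 0x1u

enum sketch_action {
	SKETCH_CLOSE_SILENTLY, /* close the client side so it can retry */
	SKETCH_SEND_502        /* deliver a 502-Bad-gateway response */
};

/* Decides what to do when a server error occurs after the connection
 * stage and before a response was received. Only the server-side
 * connection status is consulted, as the commit message requires;
 * whether this is the first request on the client connection is ignored.
 */
static enum sketch_action sketch_on_response_error(unsigned int stream_flags)
{
	if (stream_flags & SKETCH_SF_SRV_REUSED)
		return SKETCH_CLOSE_SILENTLY;
	return SKETCH_SEND_502;
}
```

With this shape, multiplexed H2/H3 requests on a fresh server connection get a 502, since no "first request" test is involved.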
Amaury Denoyelle	c6e3d60fc1	OPTIM: quic: do not call qc_prep_pkts() if everything sent qc_send() is implemented as a loop to repeatedly invoke qc_prep_pkts()/qc_send_ppkts(). This ensures that all data are emitted even if bigger than a single Tx buffer instance. This is useful if congestion window is empty but big enough for application data. Looping is interrupted if qc_prep_pkts() returns a negative error code, for example due to no space left in congestion window. It can also return 0 if there is no input data to send, which also interrupts the loop. To limit this last case, remove quic_enc_level from send_list each time everything was already sent via qc_prep_pkts(). Loop can then be interrupted as soon as send_list is empty, avoiding an extra superfluous call to qc_prep_pkts().	2024-04-10 11:18:01 +02:00
Amaury Denoyelle	34b31d85cb	OPTIM: quic: do not call qc_send() if nothing to emit qc_send() was systematically called by quic_conn IO handlers with all instantiated quic_enc_level. Change this to only register quic_enc_level for send if needed. Do not call qc_send() at all if no qel is registered. A new function qel_need_sending() is defined to detect if sending is required. First, it checks if quic_enc_level has prepared frames or probing is set. It can also return true if an ACK is required either on quic_enc_level itself or because the quic_conn ack timer fired. Finally, a CONNECTION_CLOSE emission for quic_conn is also a valid case. This should reduce the number of invocations of qc_send(). This could slightly improve performance, as well as simplify traces debugging.	2024-04-10 11:17:21 +02:00
Amaury Denoyelle	7fc1ce5bc8	MEDIUM: quic: remove duplicate hdshk/app send functions A series of previous patches have cleaned up sending functions for the handshake case. Their new exposed API is now flexible enough to convert the app case to use the same functions. As such, qc_send_hdshk_pkts() is renamed qc_send() and becomes the single entry point for QUIC emission. It is used during application packets emission in quic_conn_app_io_cb(), qc_send_mux(). Also the internal function qc_prep_hpkts() is renamed qc_prep_pkts(). Remove the now unneeded qc_send_app_pkts() and qc_prep_app_pkts(). Also removed qc_send_app_probing(). It was a simple wrapper over other application send functions. Now, default qc_send() can be reused for such cases with <old_data> argument set to true. An adjustment was needed when converting qc_send_hdshk_pkts() to the general qc_send() version. Previously, only a single packets encoding/emission cycle was performed. This was enough as handshake packets are always smaller than Tx buffer. However, it may be possible to emit more application data. As such, a loop is necessary to perform multiple encoding/emission cycles, as this was already the case in qc_send_app_pkts(). No functional difference should happen with this commit. However, as these are critical functions with a lot of changes, this patch is labelled as medium.	2024-04-10 11:07:35 +02:00
Amaury Denoyelle	4e4127a66d	MINOR: quic: use qc_send_hdshk_pkts() in handshake IO cb quic_conn_io_cb() manually implements emission by using lower level functions qc_prep_pkts() and qc_send_ppkts(). Replace this by using the higher level function qc_send_hdshk_pkts() which notably handles buffer allocation and purging. This allows to clean up send API by flagging qc_prep_pkts() and qc_send_ppkts() as static. They are now used in a single location inside qc_send_hdshk_pkts().	2024-04-10 11:07:19 +02:00
Amaury Denoyelle	3a8f4761e7	MINOR: quic: improve sending API on retransmit qc_send_hdshk_pkts() is a wrapper for qc_prep_hpkts() used on retransmission. It was restricted to use two quic_enc_level pointers as distinct arguments. Adapt it to directly use the same list of quic_enc_level which is then passed to qc_prep_hpkts(). Now for retransmission quic_enc_level send list is built directly into qc_dgrams_retransmit() which calls qc_send_hdshk_pkts(). Along this change, a new utility function qel_register_send() is defined. It is a helper to build the quic_enc_level send list. It enforces that each quic_enc_level instance is only registered in a single list to prevent memory issues. It is used both in qc_dgrams_retransmit() and quic_conn_io_cb().	2024-04-10 11:06:55 +02:00
Amaury Denoyelle	93f5b4c8ae	MINOR: quic: uniformize sending methods for handshake Emission of packets during handshakes was implemented via an API which uses two alternative ways to specify the list of frames. The first one uses a NULL list of quic_enc_level as argument for qc_prep_hpkts(). This was an implicit method to iterate on all qels stored in quic_conn instance, with frames already inserted in their corresponding quic_pktns. The second method was used for retransmission. It uses a custom local quic_enc_level list specified by the caller as input to qc_prep_hpkts(). Frames were accessible through <retransmit> list pointers of each quic_enc_level used in an implicit mechanism. This commit clarifies the API by using a single common method. Now quic_enc_level list must always be specified by the caller. As for frames list, each qel must set its new field <send_frms> pointer to the list of frames to send. Callers of qc_prep_hpkts() are responsible to always clear qels send list. This prevents a single instance of quic_enc_level from being inserted while being attached to another list. This allows notably to clean up some unnecessary code. First, <retransmit> list of quic_enc_level is removed as it is replaced by new <send_frms>. Also, it's now possible to use proper list_for_each_entry() inside qc_prep_hpkts() to loop over each qel. Internal functions for quic_enc_level selection are now removed.	2024-04-10 11:06:41 +02:00
Amaury Denoyelle	44eec848e8	MINOR: quic: simplify qc_send_hdshk_pkts() return Clean up trailer of qc_send_hdshk_pkts() by removing label "leave". Only "out" label is now used. This operation is safe as LIST_DEL_INIT() is idempotent. Caller of qc_send_hdshk_pkts() also ensures input frame lists are freed, so it's better to always reset quic_enc_level <retrans_frms> member. Also take the opportunity to reset QUIC_FL_CONN_RETRANS_OLD_DATA only if already set. This is considered more robust and will also remove unneeded trace occurrences. No functional change. The main objective of this commit is to clean up code in preparation of a refactoring on send functions.	2024-04-10 10:14:36 +02:00
Aurelien DARRAGON	9420cfc0db	CLEANUP: log: lf_text_len() returns a pointer not an integer In `c83684519` ("MEDIUM: log: add the ability to include samples in logs") we checked the return value of lf_text_len() as an integer instead of comparing the pointer with NULL explicitly. Since this may be confusing, let's test the return value against NULL. [ada: for backports, the patch needs to be applied manually because of `c6a713842` ("MINOR: log: simplify last_isspace in sess_build_logline()")]	2024-04-09 17:35:53 +02:00
Aurelien DARRAGON	28548f812f	BUG/MINOR: log: invalid snprintf() usage in sess_build_logline() According to snprintf() man page: The functions snprintf() and vsnprintf() do not write more than size bytes (including the terminating null byte ('\0')). If the output was truncated due to this limit, then the return value is the number of characters (excluding the terminating null byte) which would have been written to the final string if enough space had been available. Thus, a return value of size or more means that the output was truncated. However, in sess_build_logline(), each time we need to check the return value of snprintf(), here is how we proceed: iret = snprintf(tmplog, max, ...); if (iret < 0 || iret > max) // error // success tmplog += iret; Here is the issue: if snprintf() lacks 1 byte space to write the terminating NULL byte, it will return max. Which means in this case that we fail to know that snprintf() truncated the output in reality, and we still add iret to tmplog pointer. Considering sess_build_logline() should NOT write more than <maxsize> bytes (including the terminating NULL byte) as per the function description, in this case the function would write <maxsize>+1 byte (to write the terminating NULL byte upon return), which may lead to invalid write if <dst> was meant to hold <maxsize> bytes at maximum. Fortunately, this bug wasn't triggered so far because sess_build_logline() is called with logline as <dst> argument and <global.max_syslog_len> as <maxsize> argument, logline being initialized with 1 extra byte upon startup. But we better fix this to comply with the function description and prevent any side-effect since some sess_build_logline() helpers may assume that 'tmplog-dst < maxsize' is always true. Also sess_build_logline() users probably don't expect NULL-byte to be accounted for in the produced logline length. This should be backported to all stable versions. 
[ada: for backports, the patch needs to be applied manually because of `c6a713842` ("MINOR: log: simplify last_isspace in sess_build_logline()")]	2024-04-09 17:35:53 +02:00
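The snprintf() pitfall described in the commit above comes down to one comparison operator. The sketch below is not HAProxy code: `sketch_append_int()` is a hypothetical helper written only to show the `>=` check that the man page wording implies.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper (not part of HAProxy): formats <value> at <pos>
 * without writing at or past <end>. Assumes pos <= end. Returns the new
 * write position, or NULL if the output, including the terminating NUL
 * byte, did not fit.
 */
static char *sketch_append_int(char *pos, const char *end, int value)
{
	size_t max = (size_t)(end - pos);
	int iret = snprintf(pos, max, "%d", value);

	/* A return value equal to max already means truncation: the NUL
	 * byte took the place of the last character. Hence ">=" and not ">".
	 */
	if (iret < 0 || (size_t)iret >= max)
		return NULL;
	return pos + iret;
}
```

For a 4-byte buffer, the value 123 fits (3 characters plus the NUL byte) while 1234 is reported as a failure, which is exactly the case a `>` comparison would miss.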
Aurelien DARRAGON	8226e92eb0	BUG/MINOR: tools/log: invalid encode_{chunk,string} usage encode_{chunk,string}() is often found to be used this way: ret = encode_{chunk,string}(start, stop...) if (ret == NULL || *ret != '\0') { //error } //success Indeed, encode_{chunk,string} will always try to add terminating NULL byte to the output string, unless no space is available for even 1 byte. However, it means that for the caller to be able to spot an error, then it must provide a buffer (here: start) which is already initialized. But this is wrong: not only this is very tricky to use, but since those functions don't return NULL on failure, then if the output buffer was not properly initialized prior to calling the function, the caller will perform invalid reads when checking for failure this way. Moreover, even if the buffer is initialized, we cannot reliably tell if the function actually failed this way because if the buffer was previously initialized with NULL byte, then the caller might think that the call actually succeeded (since the function didn't return NULL and didn't update the buffer). Also, sess_build_logline() relies on lf_encode_{chunk,string}() functions which are in fact wrappers for encode_{chunk,string}() functions and thus exhibit the same error handling mechanism. It turns out that sess_build_logline() makes unsafe use of those functions because it uses the error-checking logic mentioned above while buffer (tmplog) is not guaranteed to be initialized when entering the function. This may ultimately cause malfunctions or invalid reads if the output buffer is lacking space. To fix the issue once and for all and prevent similar bugs from being introduced, we make it so encode_{string, chunk} and escape_string() (based on encode_string()) now explicitly return NULL on failure (when the function failed to write at least the ending NULL byte). lf_encode_{string,chunk}() helpers had to be patched as well due to code duplication. 
This should be backported to all stable versions. [ada: for 2.4 and 2.6 the patch won't apply as-is, it might be helpful to backport `ae1e14d65` ("CLEANUP: tools: removing escape_chunk() function") first, considering it's not very relevant to maintain a dead function]	2024-04-09 17:35:45 +02:00
Aurelien DARRAGON	b15f6dfae8	BUG/MINOR: log: fix lf_text_len() truncate inconsistency In `c5bff8e550` ("BUG/MINOR: log: improper behavior when escaping log data") we fixed lf_text_len() behavior with +E (escape) option. However we introduced an inconsistency if output buffer is too small to hold the whole output and truncation occurs: indeed without +E option up to <size> bytes (including NULL byte) will be used whereas with +E option only <size-1> bytes will be used. Fixing the function and related comment so that the function behaves the same in regards to truncation whether +E option is used or not. This should be backported to all stable versions.	2024-04-09 17:30:13 +02:00
Willy Tarreau	0db8b6034d	BUG/MINOR: listener: always assign distinct IDs to shards When sharded listeners were introduced in 2.5 with commit `6dfbef4145` ("MEDIUM: listener: add the "shards" bind keyword"), a point was overlooked regarding how IDs are assigned to listeners: they are just duplicated! This means that if a "option socket-stats" is set and a shard is configured, or multiple thread groups are enabled, then a stats dump will produce several lines with exactly the same socket name and ID. This patch tries to address this by trying to assign consecutive numbers to these sockets. The usual algo is maintained, but with a preference for the next number in a shard. This will help users reserve ranges for each socket, for example by using multiples of 100 or 1000 on each bind line, leaving enough room for all shards to be assigned. The mechanism however is quite tricky, because the configured listener currently ends up being the last one of the shard. This helps insert them before the current position without having to revisit them. But here it causes a difficulty which is that we'd like to restart from the current ID and assign new ones on top of it. What is done is that the number is passed between shards and the current one is cleared (and removed from the tree) so that we instead insert the new one. It's tricky because of the situation which depends whether it's the listener that was already assigned on the bind line or not. But overall, always removing the entry, always adding the new one when the ID is not zero, and passing them from the reference to the next one does the trick. This may be backported to all versions till 2.6.	2024-04-09 08:57:02 +02:00
Christopher Faulet	70251a2aeb	BUG/MINOR: cli: Don't warn about a too big command for incomplete commands When a command is too big to fit in a buffer, an error is returned before closing. However, the error is also returned if the command is small enough but incomplete. It happens on abort. In this case, the error must not be reported. The regression was introduced when a dedicated sn_buf callback function was added. To fix the issue, both cases are now handled separately. No backport needed.	2024-04-08 11:49:13 +02:00
Willy Tarreau	c499cd15c7	BUG/MEDIUM: quic: don't blindly rely on unaligned accesses There are several places where the QUIC low-level code performs unaligned accesses by casting unaligned char* pointers to uint32_t, but this is totally forbidden as it only works on machines that support unaligned accesses, and either crashes on other ones (SPARC, MIPS), can result in reading garbage (ARMv5) or be very slow due to the access being emulated (RISC-V). We do have functions for this, such as read_u32() and write_u32() that rely on the compiler's knowledge of the machine's capabilities to either perform an unaligned access or do it one byte at a time. This must be backported at least as far as 2.6. Some of the code moved a few times since, so in order to figure the points that need to be fixed, one may look for a forced pointer cast without having verified that either the machine is compatible or that the pointer is aligned using this: $ git grep 'uint[36][24]_t \*)' Or build and run the code on a MIPS or SPARC and perform requests using curl to see if they work or crash with a bus error. All the places fixed in this commit were found thanks to an immediate crash on the first request. This was tagged medium because the affected archs are not the most common ones where QUIC will be found these days.	2024-04-06 00:07:49 +02:00
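One common portable way to avoid the forbidden pointer cast described above is to go through memcpy(), which lets the compiler pick either a single unaligned load or byte-wise accesses depending on the target. The sketch below only illustrates that technique; `sketch_read_u32()` and `sketch_write_u32()` are hypothetical stand-ins and are not claimed to match how HAProxy's own read_u32() and write_u32() are implemented.

```c
#include <stdint.h>
#include <string.h>

/* Reads a 32-bit value in host byte order from a possibly unaligned
 * address. Unlike "*(uint32_t *)p", this is well-defined on every
 * architecture.
 */
static inline uint32_t sketch_read_u32(const void *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/* Writes a 32-bit value in host byte order to a possibly unaligned
 * address.
 */
static inline void sketch_write_u32(void *p, uint32_t v)
{
	memcpy(p, &v, sizeof(v));
}
```

A caller parsing a packet would then use `sketch_read_u32(buf + 1)` instead of casting `buf + 1` to `uint32_t *`.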
Valentine Krasnobaeva	f0b6436f57	MEDIUM: capabilities: check process capabilities sets Since the Linux capabilities support add-on (see the commit `bd84387beb` ("MEDIUM: capabilities: enable support for Linux capabilities")), we can also check haproxy process effective and permitted capabilities sets, when it starts and runs as non-root. Like this, if needed network capabilities are present only in the process permitted set, we can get this information with capget and put them in the process effective set via capset. To do this properly, let's introduce prepare_caps_from_permitted_set(). First, it checks if binary effective set has CAP_NET_ADMIN or CAP_NET_RAW. If there is a match, LSTCHK_NETADM is removed from global.last_checks list to avoid warning, because in the initialization sequence some last configuration checks are based on LSTCHK_NETADM flag and haproxy process euid may stay unprivileged. If there are no CAP_NET_ADMIN and CAP_NET_RAW in the effective set, permitted set will be checked and only capabilities given in 'setcap' keyword will be promoted in the process effective set. LSTCHK_NETADM will be also removed in this case for the same reason. In order to be transparent, we promote from permitted set only capabilities given by user in 'setcap' keyword. So, if caplist doesn't include CAP_NET_ADMIN or CAP_NET_RAW, LSTCHK_NETADM would not be unset and warning about missing privileges will be emitted at initialization. Need to call it before protocol_bind_all() to allow binding to privileged ports under non-root and 'setcap cap_net_bind_service' must be set in the global section in this case.	2024-04-05 18:01:54 +02:00
Valentine Krasnobaeva	e4306fb822	BUG/MINOR: init: relax LSTCHK_NETADM checks for non root Linux capabilities support and ability to preserve it for running process after switching to a global.uid was added recently by the commit `bd84387beb` ("MEDIUM: capabilities: enable support for Linux capabilities"). This new feature hasn't yet been taken into account by last config checks, which are performed at initialization stage. So, to update it, let's perform it after set_identity() call. Like this, current EUID is already changed to a global.uid and prepare_caps_for_setuid() would unset LSTCHK_NETADM flag, only if capabilities given in the 'setcap' keyword in the configuration file were preserved. Otherwise, if system doesn't support Linux capabilities or they were not set via 'setcap', we keep the previous strict behaviour: process will terminate with an alert, in order to insist that the user either changes the run UID (worst case: start and run as root), or sets/rechecks the capabilities listed as 'setcap' arguments. In the case when haproxy starts and runs under a non-root user, this patch doesn't change the previous behaviour: we'll still let the user try the configuration, but we inform via warning that unexpected things may occur. Needs to be backported as far as v2.9, inclusive.	2024-04-05 18:01:54 +02:00
Amaury Denoyelle	0489d85263	MINOR: listener: implement GUID support This commit is similar to the two previous ones. Its purpose is to add GUID support on listeners. Due to bind_conf and listeners configuration, some specificities were required. It's possible to define several listeners on a single bind line, for example by specifying multiple addresses. As such, it's impossible to support a "guid" keyword on a bind line. The problem is exacerbated by the cloning of listeners when sharding is used. To resolve this, a new keyword "guid-prefix" is defined for bind lines. It allows to specify a string which will be used as a prefix for automatically generated GUID for each listener attached to a bind_conf. Automatic GUID listeners generation is implemented via a new function bind_generate_guid(). It is called on post-parsing, after bind_complete_thread_setup(). For each listener on a bind_conf, a new GUID is generated with bind_conf prefix and the index of the listener relative to other listeners in the bind_conf. This last value is stored in a new bind_conf field named <guid_idx>. If a GUID cannot be inserted, for example due to a non-unique value, an error is returned, startup is interrupted with configuration rejected.	2024-04-05 15:40:42 +02:00
Amaury Denoyelle	8259456981	MINOR: server: implement GUID support This commit is similar to previous one, except that it implements GUID support for server instances. A guid_node field is inserted into server structure. A new "guid" server keyword is defined.	2024-04-05 15:40:42 +02:00
Amaury Denoyelle	da754b4533	MINOR: proxy: implement GUID support Implement proxy identification through GUID. As such, a guid_node member is inserted into proxy structure. A proxy keyword "guid" is defined to allow user to fix its value.	2024-04-05 15:40:42 +02:00
Amaury Denoyelle	1009ca4160	MINOR: guid: restrict guid format GUID format is unspecified to allow users to choose the naming scheme. Some restrictions however are added by this patch, mainly to ensure coherence and memory usage. The first restriction is on the length of GUID. No more than 127 characters can be used to prevent memory over consumption. The second restriction is on the character set allowed in GUID. Utility function invalid_char() is used for this : it allows alphanumeric values and '-', '_', '.' and ':'.	2024-04-05 15:40:42 +02:00
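The two restrictions stated above (at most 127 characters, and only alphanumerics plus '-', '_', '.' and ':') can be sketched as a small standalone validator. The names `SKETCH_GUID_MAX_LEN` and `sketch_guid_is_valid()` are hypothetical; the actual patch relies on HAProxy's invalid_char() utility, which is not reproduced here.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Maximum GUID length taken from the commit message above */
#define SKETCH_GUID_MAX_LEN 127

/* Hypothetical validator: returns 1 if <uid> respects the length and
 * character-set restrictions described in the commit message, else 0.
 */
static int sketch_guid_is_valid(const char *uid)
{
	size_t len = strlen(uid);
	size_t i;

	if (len > SKETCH_GUID_MAX_LEN)
		return 0;

	for (i = 0; i < len; i++) {
		unsigned char c = (unsigned char)uid[i];

		/* alphanumeric values and '-', '_', '.', ':' are allowed */
		if (!isalnum(c) && c != '-' && c != '_' && c != '.' && c != ':')
			return 0;
	}
	return 1;
}
```

Under these rules a value such as "@foo2", the one shown in the guid_insert() crash report earlier in this log, is rejected because of the '@' character.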
Amaury Denoyelle	84fa6b344a	MINOR: guid: introduce global UID module Define a new module guid. Its purpose is to be able to attach a global identifier for various objects such as proxies, servers and listeners. A new type guid_node is defined. It will be stored in the objects which can be referenced by such GUID. Several functions are implemented to properly initialize, insert, remove and lookup GUID in a global tree. Modification operations should only be conducted under thread isolation.	2024-04-05 15:40:42 +02:00
Aurelien DARRAGON	e751eebfc6	MEDIUM: proxy/log: leverage lf_expr API for logformat preparsing Currently, the way proxy-oriented logformat directives are handled is way too complicated. Indeed, "log-format", "log-format-error", "log-format-sd" and "unique-id-format" all rely on preparsing hints stored inside proxy->conf member struct. Those preparsing hints include the original string that should be compiled once the proxy parameters are known plus the config file and line number where the string was found to generate precise error messages in case of failure during the compiling process that happens within check_config_validity(). Now that lf_expr API permits to compile a lf_expr struct that was previously prepared (with original string and config hints), let's leverage lf_expr_compile() from check_config_validity() and instead of relying on individual proxy->conf hints for each logformat expression, store string and config hints in the lf_expr struct directly and use lf_expr helpers funcs to handle them when relevant (ie: original logformat string freeing is now done at a central place inside lf_expr_deinit(), which allows for some simplifications) Doing so allows us to greatly simplify the preparsing logic for those 4 proxy directives, and to finally save some space in the proxy struct. Also, since httpclient proxy has its "logformat" automatically compiled in check_config_validity(), we now use the file hint from the logformat expression struct to set an explicit name that will be reported in case of error ("parsing [httpclient:0] : ...") and remove the extraneous check in httpclient_precheck() (logformat was parsed twice previously..)	2024-04-04 19:10:01 +02:00
Aurelien DARRAGON	2b79457bc0	MEDIUM: log: add compiling logic to logformat expressions Split parse_logformat_string() into two functions: parse_logformat_string() sticks to the same behavior, but now becomes a helper for lf_expr_compile() which uses explicit arguments so that it becomes possible to use lf_expr_compile() without a proxy, but also compile an expression which was previously prepared for compiling (set string and config hints within the logformat expression to avoid manually storing string and config context if the compiling step happens later). lf_expr_dup() may be used to duplicate an expression before it is compiled, lf_expr_xfer() now makes sure that the input logformat is already compiled. This is prerequisite work for log-profiles implementation, no functional change should be expected.	2024-04-04 19:10:01 +02:00
Aurelien DARRAGON	7a21c3a4ef	MAJOR: log: implement proper postparsing for logformat expressions This patch tries to address a design flaw with how logformat expressions are parsed from config. Indeed, some parse_logformat_string() calls are performed during config parsing when the proxy mode is not yet known. Here's a config example that illustrates the issue: defaults mode tcp listen test bind :8888 http-response set-header custom-hdr "%trl" # needs http mode http The above config should work, because the effective proxy mode is http, yet haproxy fails with this error: [ALERT] (99051) : config : parsing [repro.conf:6] : error detected in proxy 'test' while parsing 'http-response set-header' rule : format tag 'trl' is reserved for HTTP mode. To fix the issue once and for all, let's implement smart postparsing for logformat expressions encountered during config parsing: - split parse_logformat_string() (and subfunctions) in order to create a new lf_expr_postcheck() function that must be called to finish preparing and checking the logformat expression once the proxy type is known. - save some config hints info during parse_logformat_string() to generate more precise error messages during lf_expr_postcheck(), if needed, we rely on curpx->conf.args.{file,line} hints for that because parse_logformat_string() doesn't know about current file and line number. - lf_expr_postcheck() uses PR_FL_CHECKED proxy flag to know if the function may try to make the proxy compatible with the expression, or if it should simply fail as soon as an incompatibility is detected. - if parse_logformat_string() is called from an unchecked proxy, then schedule the expression for postparsing, else (ie: during runtime), run the postcheck right away. This change will also allow for some logformat expression error handling simplifications in the future.	2024-04-04 19:10:01 +02:00
Aurelien DARRAGON	56d8074798	MINOR: proxy: add PR_FL_CHECKED flag PR_FL_CHECKED is set on proxy once the proxy configuration was fully checked (including postparsing checks). This information may be useful to functions that need to know if some config-related proxy properties are likely to change or not due to parsing or postparsing/check logics. Also, during runtime, except for some rare cases config-related proxy properties are not supposed to be changed.	2024-04-04 19:10:01 +02:00
Aurelien DARRAGON	6810c41f8e	MEDIUM: tree-wide: add logformat expressions wrapper Log format expressions are broadly used within the code: once they are parsed from input string, they are converted to a linked list of logformat nodes. We're starting to face some limitations because we're simply storing the converted expression as a generic logformat_node list. The first issue we're facing is that storing logformat expressions that way doesn't allow us to add metadata alongside the list, which is part of the prerequisites for implementing log-profiles. Another issue with storing logformat expressions as generic lists of logformat_node elements is that it's starting to become really hard to tell when we rely on logformat expressions or not in the code given that there isn't always a comment near the list declaration or manipulation to indicate that it's relying on logformat expressions under the hood, so this adds some complexity for code maintenance. This patch looks quite impressive due to changes in a lot of header and source files (since logformat expressions are broadly used), but it does a simple thing: it defines the lf_expr structure which itself holds a generic list of logformat nodes, and then declares some helpers to manipulate lf_expr elements and fixes the code so that we now exclusively manipulate logformat_node lists as lf_expr elements outside of log.c. For now, lf_expr struct only contains the list of logformat nodes (no additional metadata), but now that we have dedicated type and helpers, doing so in the future won't be problematic at all and won't require extensive code changes.	2024-04-04 19:10:01 +02:00
Aurelien DARRAGON	7d8f45b647	MEDIUM: log: carry tag context in logformat node This is a pretty simple patch despite requiring to make some visible changes in the code: When parsing a logformat string, log tags (ie: '%tag', AKA log tags) are turned into logformat nodes with their type set to the type of the corresponding logformat_tag element which was matched by name. Thus, when "compiling" a logformat tag, we only keep a reference to the tag type from the original logformat_tag. For example, for "%B" log tag, we have the following logformat_tag element: { .name = "B", .type = LOG_FMT_BYTES, .mode = PR_MODE_TCP, .lw = LW_BYTES, .config_callback = NULL } When parsing "%B" string, we search for a matching logformat tag inside logformat_tags[] array using the provided name, once we find a matching element, we craft a logformat node whose type will be LOG_FMT_BYTES, but from the node itself, we no longer have access to other informations that are set in the logformat_tag struct element. Thus from a logformat_node resulting from a log tag, with current implementation, we cannot easily get back to matching logformat_tag struct element as it would require us to scan the whole logformat_tags array at runtime using node->type to find the matching element. Let's take a simpler path and consider all tag-specific LOG_FMT_* subtypes as being part of the same logformat node type: LOG_FMT_TAG. Thanks to that, we're now able to distinguish logformat nodes made from logformat tag from other logformat nodes, and link them to their corresponding logformat_tag element from logformat_tags[] array. All it costs is a simple indirection and an extra pointer in logformat_node struct. While at it, all LOG_FMT_* types related to logformat tags were moved inside log.c as they have no use outside of it since they are simply lookup indexes for sess_build_logline() and could even be replaced by function pointers some day...	2024-04-04 19:10:01 +02:00
Aurelien DARRAGON	8cf5c3d7f0	MINOR: log: expose logformat_tag struct Rename logformat_type internal struct to logformat_tag to make it less confusing, then expose logformat_tag struct through header file so that it can be referenced in other structs. Also rename logformat_keywords[] to logformat_tags[] for better consistency.	2024-04-04 19:10:01 +02:00
Aurelien DARRAGON	c85cbc1061	MEDIUM: log: rename logformat var to logformat tag What we used to call logformat variable in the code is referred to as log-format tag in the documentation. Having both 'var' and 'tag' labels referring to the same thing is really confusing. Let's make the code comply with the documentation by replacing all logformat var/variable/VAR occurrences with either tag or TAG. No functional change should be expected, the only visible side-effect from user point of view is that "variable" was replaced by "tag" in some error messages.	2024-04-04 19:10:01 +02:00
Aurelien DARRAGON	64b5ab87ef	BUG/MINOR: proxy: fix logformat expression leak in use_backend rules When support for dynamic names was added for use_backend rules in `702d44f2f` ("MEDIUM: proxy: support use_backend with dynamic names"), the sample expression resulting from parse_logformat_string() was only freed for non dynamic rules (when the expression resolved to a simple string node). But for complex expressions (ie: multiple nodes), rule->dynamic was set but the expression was never released, resulting in a small memory leak when freeing the parent proxy. To fix the issue, in free_proxy(), we free the switching rule expression if the switching rule is dynamic. This should be backported to all stable versions. [ada: prior to 2.9, free_logformat_list() helper did not exist: we may use the same manual sample expr freeing logic as in server_rules pruning right above it]	2024-04-04 19:10:01 +02:00
Tim Duesterhus	ad54273cf9	MINOR: systemd: Include MONOTONIC_USEC field in RELOADING=1 message As per the `sd_notify` manual: > A field carrying the monotonic timestamp (as per CLOCK_MONOTONIC) formatted > in decimal in μs, when the notification message was generated by the client. > This is typically used in combination with "RELOADING=1", to allow the > service manager to properly synchronize reload cycles. See systemd.service(5) > for details, specifically "Type=notify-reload". Thus this change allows users with a recent systemd to switch to `Type=notify-reload`, should they desire to do so. Correct behavior was verified with a Fedora 39 VM. see systemd/systemd#25916 [wla: the service file should be updated this way:] diff --git a/admin/systemd/haproxy.service.in b/admin/systemd/haproxy.service.in index 22a53d8aab..8c6dadb5e5 100644 --- a/admin/systemd/haproxy.service.in +++ b/admin/systemd/haproxy.service.in @@ -8,12 +8,11 @@ EnvironmentFile=-/etc/default/haproxy EnvironmentFile=-/etc/sysconfig/haproxy Environment="CONFIG=/etc/haproxy/haproxy.cfg" "PIDFILE=/run/haproxy.pid" "EXTRAOPTS=-S /run/haproxy-master.sock" ExecStart=@SBINDIR@/haproxy -Ws -f $CONFIG -p $PIDFILE $EXTRAOPTS -ExecReload=@SBINDIR@/haproxy -Ws -f $CONFIG -c $EXTRAOPTS -ExecReload=/bin/kill -USR2 $MAINPID KillMode=mixed Restart=always SuccessExitStatus=143 -Type=notify +Type=notify-reload +ReloadSignal=SIGUSR2 # The following lines leverage SystemD's sandboxing options to provide # defense in depth protection at the expense of restricting some flexibility Signed-off-by: William Lallemand <wlallemand@haproxy.com>	2024-04-04 15:58:29 +02:00
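The notification described in commit ad54273cf9 can be sketched as follows. This is only an illustration of the message layout quoted from the `sd_notify` manual, not HAProxy's actual code: the helper names `timespec_to_usec()` and `build_reloading_msg()` are hypothetical, and sending the resulting string to the service manager is left out.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Convert a timespec to whole microseconds. */
static uint64_t timespec_to_usec(const struct timespec *ts)
{
	return (uint64_t)ts->tv_sec * 1000000ULL + (uint64_t)ts->tv_nsec / 1000ULL;
}

/* Hypothetical helper: format "RELOADING=1" followed by the current
 * CLOCK_MONOTONIC timestamp in microseconds, one assignment per line.
 * Returns the message length, or -1 on clock failure or truncation.
 */
static int build_reloading_msg(char *buf, size_t size)
{
	struct timespec ts;
	int len;

	assert(buf != NULL);
	if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
		return -1;

	len = snprintf(buf, size, "RELOADING=1\nMONOTONIC_USEC=%llu",
	               (unsigned long long)timespec_to_usec(&ts));
	if (len < 0 || (size_t)len >= size)
		return -1;
	return len;
}
```

The timestamp is taken as close as possible to the moment the message is built, since the manual describes it as the time at which the notification was generated.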
Frederic Lecaille	fcb096f7cd	BUG/MINOR: stick-tables: Missing stick-table key nullity check This bug arrived with this commit: MAJOR: stktable: split the keys across multiple shards to reduce contention At this time, there are no callers which call stktable_get_entry() without checking the nullity of <key> passed as parameter. But the documentation of this function says it supports this case where the <key> passed as parameter could be null. Move the nullity test on <key> at first statement of this function. Thanks to @chipitsine for having reported this issue in GH #2518.	2024-04-04 11:08:56 +02:00
Willy Tarreau	1a088da7c2	MAJOR: stktable: split the keys across multiple shards to reduce contention In order to reduce the contention on the table when keys expire quickly, we're spreading the load over multiple trees. That counts for keys and expiration dates. The shard number is calculated from the key value itself, both when looking up and when setting it. The "show table" dump on the CLI iterates over all shards so that the output is not fully sorted, it's only sorted within each shard. The Lua table dump just does the same. It was verified with a Lua program to count stick-table entries that it works as intended (the test case is reproduced here as it's clearly not easy to automate as a vtc): function dump_stk() local dmp = core.proxies['tbl'].stktable:dump({}); local count = 0 for _, __ in pairs(dmp) do count = count + 1 end core.Info('Total entries: ' .. count) end core.register_action("dump_stk", {'tcp-req', 'http-req'}, dump_stk, 0); ## global tune.lua.log.stderr on lua-load-per-thread lua-cnttbl.lua listen front bind :8001 http-request lua.dump_stk if { path_beg /stk } http-request track-sc1 rand(),upper,hex table tbl http-request redirect location / backend tbl stick-table size 100k type string len 12 store http_req_cnt ## $ h2load -c 16 -n 10000 0:8001/ $ curl 0:8001/stk ## A count close to 100k appears on haproxy's stderr ## On the CLI, "show table tbl" \| wc will show the same. Some large parts were reindented only to add a top-level loop to iterate over shards (e.g. process_table_expire()). Better check the diff using git show -b. The number of shards is decided just like for the pools, at build time based on the max number of threads, so that we can keep a constant. Maybe this should be done differently. For now CONFIG_HAP_TBL_BUCKETS is used, and defaults to CONFIG_HAP_POOL_BUCKETS to keep the benefits of all the measurements made for the pools. 
It turns out that this value seems to be the most reasonable one without inflating the struct stktable too much. By default for 1024 threads the value is 32 and delivers 980k RPS in a test involving 80 threads, while adding 1kB to the struct stktable (roughly doubling it). The same test at 64 gives 1008 kRPS and at 128 it gives 1040 kRPS for 8 times the initial size. 16 would be too low however, with 675k RPS. The stksess already have a shard number, it's the one used to decide which peer connection to send the entry. Maybe we should also store the one associated with the entry itself instead of recalculating it, though it does not happen that often. The operation is done by hashing the key using XXH32(). The peers also take and release the table's lock but the way it's used it not very clear yet, so at this point it's sure this will not work. At this point, this allowed to completely unlock the performance on a 80-thread setup: before: 5.4 Gbps, 150k RPS, 80 cores 52.71% haproxy [.] stktable_lookup_key 36.90% haproxy [.] stktable_get_entry.part.0 0.86% haproxy [.] ebmb_lookup 0.18% haproxy [.] process_stream 0.12% haproxy [.] process_table_expire 0.11% haproxy [.] fwrr_get_next_server 0.10% haproxy [.] eb32_insert 0.10% haproxy [.] run_tasks_from_lists after: 36 Gbps, 980k RPS, 80 cores 44.92% haproxy [.] stktable_get_entry 5.47% haproxy [.] ebmb_lookup 2.50% haproxy [.] fwrr_get_next_server 0.97% haproxy [.] eb32_insert 0.92% haproxy [.] process_stream 0.52% haproxy [.] run_tasks_from_lists 0.45% haproxy [.] conn_backend_get 0.44% haproxy [.] __pool_alloc 0.35% haproxy [.] process_table_expire 0.35% haproxy [.] connect_server 0.35% haproxy [.] h1_headers_to_hdr_list 0.34% haproxy [.] eb_delete 0.31% haproxy [.] srv_add_to_idle_list 0.30% haproxy [.] h1_snd_buf WIP: uint64_t -> long WIP: ulong -> uint code is much smaller	2024-04-03 17:34:47 +02:00
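The shard selection described in commit 1a088da7c2 (hash the key, then pick one of a fixed number of trees) can be sketched as below. This is a minimal illustration under stated assumptions: the commit hashes with XXH32(), whereas this sketch substitutes a plain FNV-1a hash to stay self-contained, and `TBL_BUCKETS`, `fnv1a32()` and `key_to_shard()` are hypothetical names standing in for CONFIG_HAP_TBL_BUCKETS and the real lookup code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for CONFIG_HAP_TBL_BUCKETS (32 for 1024 threads
 * according to the commit message).
 */
#define TBL_BUCKETS 32

/* 32-bit FNV-1a, used here only as a substitute for XXH32(). */
static uint32_t fnv1a32(const void *data, size_t len)
{
	const unsigned char *p = data;
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= 16777619u;
	}
	return h;
}

/* Map a key to its shard. The same function must be used for lookups
 * and insertions so that a given key always lands in the same tree.
 */
static unsigned int key_to_shard(const void *key, size_t len)
{
	assert(key != NULL || len == 0);
	return fnv1a32(key, len) % TBL_BUCKETS;
}
```

Because the shard depends only on the key, a dump that walks shard after shard sees every entry exactly once but in an order that is only sorted within each shard, which matches the "show table" behaviour the commit describes.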
Willy Tarreau	864ac31174	OPTIM: stick-tables: check the stksess without taking the read lock Thanks to the previous commit, we can now simply perform an atomic read on stksess->seen and take the write lock to recreate the entry only if at least one peer has seen it, otherwise leave it untouched. On a test on 40 cores, the performance used to drop from 2.10 to 1.14M RPS when one peer was connected, now it drops to 2.05, thus there's basically no impact of connecting a peer vs ~45% previously, all spent in the read lock. This can be particularly important when often updating the same entries (user-agent, source address during an attack etc).	2024-04-03 17:34:47 +02:00
Willy Tarreau	4c1480f13b	MINOR: stick-tables: mark the seen stksess with a flag "seen" Right now we're taking the stick-tables update lock for reads just for the sake of checking if the update index is past it or not. That's costly because even taking the read lock is sufficient to provoke a cache line write, while when under load or attack it's frequent that the update has not yet been propagated and wouldn't require anything. This commit brings a new field to the stksess, "seen", which is zeroed when the entry is updated, and set to one as soon as at least one peer starts to consult it. This way it will reflect that the entry must be updated again so that this peer can see it. Otherwise no update will be necessary. For now the flag is only set/reset but not exploited. A great care is taken to avoid writes whenever possible.	2024-04-03 17:34:47 +02:00
Willy Tarreau	15522fc243	BUG/MINOR: bwlim/config: fix missing '\n' after error messages Some bwlim error messages at parsing time were missing the trailing '\n' in commit `2b6777021d` ("MEDIUM: bwlim: Add support of bandwith limitation at the stream level"). This commit can be backported wherever the commit above is (likely as far as 2.7).	2024-04-03 17:34:36 +02:00
Willy Tarreau	f821a3983e	BUILD: systemd: fix build error on non-systemd systems with USE_SYSTEMD=1 Thanks to previous commit, we can now build with USE_SYSTEMD=1 on any system without requiring any parts from systemd. It just turns out that there was one remaining include in haproxy.c that needed to be replaced with haproxy/systemd.h to build correctly. That's what this commit does.	2024-04-03 17:34:36 +02:00
William Lallemand	aa3632962f	MEDIUM: mworker: get rid of libsystemd Given the xz drama which allowed liblzma to be linked to openssh, lets remove libsystemd to get rid of useless dependencies. The sd_notify API seems to be stable and is now documented. This patch replaces the sd_notify() and sd_notifyf() function by a reimplementation inspired by the systemd documentation. This should not change anything functionnally. The function will be built when haproxy is built using USE_SYSTEMD=1. References: https://github.com/systemd/systemd/issues/32028 https://www.freedesktop.org/software/systemd/man/devel/sd_notify.html#Notes Before: wla@kikyo:~% ldd /usr/sbin/haproxy linux-vdso.so.1 (0x00007ffcfaf65000) libcrypt.so.1 => /lib/x86_64-linux-gnu/libcrypt.so.1 (0x000074637fef4000) libssl.so.3 => /lib/x86_64-linux-gnu/libssl.so.3 (0x000074637fe4f000) libcrypto.so.3 => /lib/x86_64-linux-gnu/libcrypto.so.3 (0x000074637f400000) liblua5.4.so.0 => /lib/x86_64-linux-gnu/liblua5.4.so.0 (0x000074637fe0d000) libsystemd.so.0 => /lib/x86_64-linux-gnu/libsystemd.so.0 (0x000074637f92a000) libpcre2-8.so.0 => /lib/x86_64-linux-gnu/libpcre2-8.so.0 (0x000074637f365000) libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x000074637f000000) libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x000074637f27a000) libcap.so.2 => /lib/x86_64-linux-gnu/libcap.so.2 (0x000074637fdff000) libgcrypt.so.20 => /lib/x86_64-linux-gnu/libgcrypt.so.20 (0x000074637eeb8000) liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x000074637fdcd000) libzstd.so.1 => /lib/x86_64-linux-gnu/libzstd.so.1 (0x000074637ee01000) liblz4.so.1 => /lib/x86_64-linux-gnu/liblz4.so.1 (0x000074637fda8000) /lib64/ld-linux-x86-64.so.2 (0x000074637ff5d000) libgpg-error.so.0 => /lib/x86_64-linux-gnu/libgpg-error.so.0 (0x000074637f904000) After: wla@kikyo:~% ldd /usr/sbin/haproxy linux-vdso.so.1 (0x00007ffd51901000) libcrypt.so.1 => /lib/x86_64-linux-gnu/libcrypt.so.1 (0x00007f758d6c0000) libssl.so.3 => /lib/x86_64-linux-gnu/libssl.so.3 
(0x00007f758d61b000) libcrypto.so.3 => /lib/x86_64-linux-gnu/libcrypto.so.3 (0x00007f758ca00000) liblua5.4.so.0 => /lib/x86_64-linux-gnu/liblua5.4.so.0 (0x00007f758d5d9000) libpcre2-8.so.0 => /lib/x86_64-linux-gnu/libpcre2-8.so.0 (0x00007f758d365000) libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f758d5ba000) libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f758c600000) libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f758c915000) /lib64/ld-linux-x86-64.so.2 (0x00007f758d729000) A backport to all stable versions could be considered at some point.	2024-04-03 15:53:18 +02:00
Aurelien DARRAGON	837a26ab05	BUG/MEDIUM: server/lbprm: fix crash in _srv_set_inetaddr_port() Since `faa8c3e` ("MEDIUM: lb-chash: Deterministic node hashes based on server address") the following configuration will cause haproxy to crash: backend test1 mode http balance hash int(1) server s1 haproxy.org:80 This is because lbprm.update_server_eweight() method is now systematically called in _srv_set_inetaddr_port() upon srv addr/port change (and with the above config it happens during startup after initial dns resolution). However, depending on the chosen lbprm algo, update_server_eweight function may not be set (it is not a mandatory method, some lb implementations don't define it). Thus, using 'balance hash' with map-based hashing or 'balance sticky' will cause a crash due to a NULL de-reference in _srv_set_inetaddr_port(). To fix the issue, we first check that the update_server_eweight() method is set before using it. No backport needed unless `faa8c3e` ("MEDIUM: lb-chash: Deterministic node hashes based on server address") gets backported.	2024-04-03 11:58:03 +02:00
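The guard described in commit 837a26ab05 (check that an optional method is set before calling it) can be sketched as follows. The structures are deliberately reduced and hypothetical: `struct lb_ops`, `eweight_updates`, `count_eweight_update()` and `on_addr_port_change()` are illustrative names, not HAProxy's real definitions.

```c
#include <assert.h>
#include <stddef.h>

struct server;

/* Hypothetical, trimmed-down method table: only the optional hook is shown. */
struct lb_ops {
	void (*update_server_eweight)(struct server *srv); /* may be NULL */
};

struct server {
	const struct lb_ops *ops;
	int eweight_updates; /* counts hook invocations, for illustration only */
};

/* Example hook for an algorithm that does implement the method. */
static void count_eweight_update(struct server *srv)
{
	srv->eweight_updates++;
}

/* Called after an address or port change. Returns 1 if the hook ran,
 * 0 if the load-balancing algorithm does not provide one.
 */
static int on_addr_port_change(struct server *srv)
{
	assert(srv != NULL && srv->ops != NULL);
	if (!srv->ops->update_server_eweight)
		return 0; /* optional method not set: nothing to do */
	srv->ops->update_server_eweight(srv);
	return 1;
}
```

Without the NULL test, an algorithm that leaves the method unset would make the call dereference a NULL function pointer, which is the crash the commit reports.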
Frederic Lecaille	0e14bac7bd	BUILD: quic: 32 bits compilation issue (QUIC_MIN() usage) This issue arrived with this commit: "MINOR: quic: HyStart++ implementation (RFC 9406)" Thanks to @chipitsine for having reported this issue in GH #2513. Should be backported where the previous commit will be backported.	2024-04-03 11:14:50 +02:00
Willy Tarreau	6a2f09de1c	OPTIM: peers: avoid the locking dance around peer_send_teach_process_msgs() In peer_send_msg(), we take a lock before calling peer_send_teach_process_msgs because of the check on the flags and update indexes, and the function then drops it then takes it again just to resume in the same situation, so that on return we can drop it again! Not only this is absurd because it doubles the costs of taking the lock, it's also totally inefficient because it takes a write lock while the only usage that is done with it is to read the indexes! Let's drop the lock from peer_send_teach_process_msgs() and move it explicitly in its only caller around the condition, and turn it into a read lock only.	2024-04-03 09:34:08 +02:00
Willy Tarreau	1ea18fa8a3	BUG/MAJOR: applet: fix a MIN vs MAX usage in appctx_raw_rcv_buf() The MAX() macro was used to limit the count of bytes to be transferred in appctx_raw_rcv_buf() by commit `ee53d8421f` ("MEDIUM: applet: Simplify a bit API to exchange data with applets") instead of MIN(). It didn't seem to have any consequences until commit `f37ddbeb4b` ("MAJOR: cli: Update the CLI applet to handle its own buffers") that triggers a BUG_ON() in __b_putblk() when the other side is slow to read, because we're trying to append a full buffer on top of a non-empty one. A way to reproduce it is to dump a heavy stick table on the CLI with a screen scrolling. No backport is needed since this was introduced in 3.0-dev3 and revealed after dev5 only.	2024-04-03 09:34:08 +02:00
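The MIN/MAX confusion fixed by commit 1ea18fa8a3 can be illustrated with a reduced example. This is a sketch of the general pattern only, not the body of appctx_raw_rcv_buf(): `clamp_count()` and `clamp_count_buggy()` are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

#ifndef MIN
#define MIN(a, b) (((a) < (b)) ? (a) : (b))
#endif
#ifndef MAX
#define MAX(a, b) (((a) > (b)) ? (a) : (b))
#endif

/* Correct: never transfer more than the room left on the receiving side. */
static size_t clamp_count(size_t requested, size_t room)
{
	return MIN(requested, room);
}

/* Buggy variant: MAX() returns the larger value, so the result can exceed
 * the available room and overflow the destination.
 */
static size_t clamp_count_buggy(size_t requested, size_t room)
{
	return MAX(requested, room);
}
```

With 100 bytes requested and 40 bytes of room, the correct helper yields 40 while the buggy one yields 100, which is the kind of oversized append that ends up tripping a consistency check such as the BUG_ON() mentioned in the commit.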
Willy Tarreau	ed45d13321	BUG/MEDIUM: stick-table: use the update lock when reading tables from peers In 2.9, the stick-tables' locking was split between the lock used to manipulate the contents (->lock) and the lock used to manipulate the list of updates and the update indexes (->updt_lock). This was done with commit `87e072eea5` ("MEDIUM: stick-table: use a distinct lock for the updates tree"). However a part was overlooked in the peers code, the parts that consult (and update) the indexes use the table's lock instead of the update lock. It's surprising that it hasn't caused more trouble. It's likely due to the fact that the tree nodes are not often immediately freed and that their memory area remains connected to valid nodes in the tree during peer_stksess_lookup(), while other parts only check or update indexes, thus are not that critical. This needs to be backported wherever the commit above is, thus logically 2.9.	2024-04-03 09:33:10 +02:00
Christopher Faulet	3abf6934a4	BUG/MEDIUM: stconn: Don't forward shutdown to SE if iobuf is not empty It is only an issue when the kernel splicing is used. The zero-copy forwarding via the buffers is not affected. When a shutdown is received on the producer side and some data are blocked in the pipe for a while, the shutdown may be forwarded to the other side. Usually, in this case, the shutdown must be scheduled, waiting until all output data (from the channel and the consumer's iobuf) are sent. But only the channel was considered. The bug was introduced by commit `20c463955d` ("MEDIUM: channel: don't look at iobuf to report an empty channel"). To fix the issue, we must also check data blocked in the consumer iobuf. This patch should solve the issue #2505. It must be backported to 2.9.	2024-04-03 08:46:37 +02:00
Frederic Lecaille	a305bb92b9	MINOR: quic: HyStart++ implementation (RFC 9406) This is a simple algorithm to replace the classic slow start phase of the congestion control algorithms. It should reduce the high packet loss during this step. Implemented only for Cubic.	2024-04-02 18:47:19 +02:00
Willy Tarreau	e9b774f4b3	BUG/MINOR: backend: properly handle redispatch 0 According to the documentation, "option redispatch 0" is expected to disable redispatch just like "no option redispatch", but due to the fact that it keeps PR_O_REDISP set, it doesn't actually work. Let's make sure value 0 is properly handled and drops PR_O_REDISP. This can be backported to all versions since it seems it has been broken since its introduction in 1.6 with commit `726ab7145c` ("MEDIUM: backend: Allow redispatch on retry intervals"). As a workaround, "no option redispatch" does work though.	2024-04-02 15:19:18 +02:00
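The behaviour described in commit e9b774f4b3 (value 0 must clear the redispatch flag instead of leaving it set) can be sketched as follows. Everything here is an assumption made for illustration: the flag value, `struct proxy_opts` and `set_redispatch()` are hypothetical and do not reproduce the real configuration parser.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit value; only the flag name comes from the commit. */
#define PR_O_REDISP 0x00000001u

/* Hypothetical, reduced view of the proxy options. */
struct proxy_opts {
	unsigned int options;
	int redispatch_after;
};

/* Apply "option redispatch <interval>". An interval of 0 disables
 * redispatch, exactly like "no option redispatch" would.
 */
static void set_redispatch(struct proxy_opts *px, int interval)
{
	assert(px != NULL);
	px->redispatch_after = interval;
	if (interval == 0)
		px->options &= ~PR_O_REDISP; /* 0 means disabled: drop the flag */
	else
		px->options |= PR_O_REDISP;
}
```

The bug amounted to always taking the second branch, so the flag stayed set even when the configured interval was 0.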
Tim Duesterhus	ec38e1b39b	CLEANUP: Reapply ha_free.cocci This reapplies ha_free.cocci across the whole src/ tree.	2024-04-02 07:27:33 +02:00
Tim Duesterhus	7c317f4619	CLEANUP: Reapply xalloc_cast.cocci This reapplies xalloc_cast.cocci across the whole src/ tree.	2024-04-02 07:27:33 +02:00
Tim Duesterhus	f88ea5949c	CLEANUP: Reapply strcmp.cocci (2) This reapplies strcmp.cocci across the whole src/ tree.	2024-04-02 07:27:33 +02:00
Tim Duesterhus	cd5d62249f	CLEANUP: Reapply ist.cocci (3) This reapplies ist.cocci across the whole src/ tree.	2024-04-02 07:27:33 +02:00
Willy Tarreau	5fc1afb341	BUG/MEDIUM: stick-tables: fix a small remaining race in expiration task In 2.7 we addressed a race condition in the stick tables expiration task with commit `fbb934d` ("BUG/MEDIUM: stick-table: fix a race condition when updating the expiration task"). The issue was that the task could be running on another thread which would destroy its expiration timer while one had just recalculated it and prepares to queue it, causing a bug due to the attempt to queue an expired task. The fix consisted in enclosing the change into the stick-table's lock, which had a very low cost since it's done only after having checked that the date changed, i.e. no more than once every millisecond. But as reported by Ricardo and Felipe from Taghos in github issue #2508, a tiny race remained after the fix: the unlock() was done before the call to task_queue(), leaving a tiny window for another thread to run between unlock() and task_queue() and erase the timer. As confirmed, it's sufficient to also protect the task_queue() call. But overall this raises a point regarding the task_queue() API on tasks that may run anywhere. A while ago an attempt was made at removing the timer for woken up tasks, but something like this would be deserved with more atomicity on the timer manipulation (e.g. atomically use task_schedule() instead maybe). This should be backported to all stable branches.	2024-04-02 07:07:57 +02:00
Anthony Deschamps	faa8c3e024	MEDIUM: lb-chash: Deterministic node hashes based on server address Motivation: When services are discovered through DNS resolution, the order in which DNS records get resolved and assigned to servers is arbitrary. Therefore, even though two HAProxy instances using chash balancing might agree that a particular request should go to server3, it is likely the case that they have assigned different IPs and ports to the server in that slot. This patch adds a server option, "hash-key <key>" which can be set to "id" (the existing behaviour, default), "addr", or "addr-port". By deriving the keys for the chash tree nodes from a server's address and port we ensure that independent HAProxy instances will agree on routing decisions. If an address is not known then the key is derived from the server's puid as it was previously. When adjusting a server's weight, we now check whether the server's hash has changed. If it has, we have to remove all its nodes first, since the node keys will also have to change.	2024-04-02 07:00:10 +02:00
Amaury Denoyelle	da03396bb3	BUG/BUILD: debug: fix unused variable error A compilation error occurs when using DEBUG_MEM_STATS due to a variable now being unused in debug_iohandler_memstats() : src/debug.c: In function ‘debug_iohandler_memstats’: src/debug.c:1862:24: error: unused variable ‘sc’ [-Werror=unused-variable] 1862 \| struct stconn *sc = appctx_sc(appctx); \| ^~ This is caused since the following commit : `94b8ed446f` MEDIUM: cli/applet: Stop to test opposite SC in I/O handler of CLI commands This must not be backported.	2024-03-29 17:21:04 +01:00
Aurelien DARRAGON	3c6dfa618a	MEDIUM: log/balance: leverage lbprm api for log load-balancing log load-balancing implementation was not seamlessly integrated within lbprm API. The consequence is that it could become harder to maintain over time since it added some specific cases just for the log backend. Moreover, it resulted in some code duplication since balance algorithms that are common to logs and regular (tcp, http) backends were specifically rewritten for log backends. Thanks to the previous commit, we now have all the prerequisites to make log load-balancing fully leverage lbprm logic. Thus in this patch we make __do_send_log_backend() use existing lbprm algorithms, and we no longer require log-specific lbprm initialization in cfgparse.c and in postcheck_log_backend(). As a bonus, for log backends this allows weighted algorithms to properly support weights (ie: roundrobin, random and log-hash) since we now leverage the same lb algorithms that we use for tcp/http backends (doc was updated).	2024-03-29 17:08:37 +01:00
Aurelien DARRAGON	9aea6df81f	MINOR: lbprm: implement true "sticky" balance algo As previously mentioned in `cd352c0db` ("MINOR: log/balance: rename "log-sticky" to "sticky""), let's define a sticky algorithm that may be used from any protocol. Sticky algorithm sticks on the same server as long as it remains available. The documentation was updated accordingly.	2024-03-29 17:08:37 +01:00
Aurelien DARRAGON	d0692d7019	BUG/MINOR: log/balance: detect if user tries to use unsupported algo `b61147fd` ("MEDIUM: log/balance: merge tcp/http algo with log ones") introduced some ambiguities, because while it shares some algos with the ones from mode {tcp,http}, we forgot to report an error when the user tries to use an algorithm that is not available in this mode (as per the doc). Because of that, haproxy would silently drop log messages during runtime. To fix that, we ensure that algo is one of the supported ones during log backend postparsing. If the algo is not supported, we raise an error. This should be backported in 2.9 with `b61147fd`	2024-03-29 17:08:36 +01:00
Christopher Faulet	87426e82ec	MAJOR: cli: Use a custom .snd_buf function to only copy the current command The CLI applet is now using its own snd_buf callback function. Instead of copying as much output data as possible, only one command is copied at a time. To do so, a new state CLI_ST_PARSEREQ is added for the CLI applet. In this state, the CLI I/O handler knows a full command was copied into its input buffer and it must parse this command to evaluate it.	2024-03-28 17:32:55 +01:00
Christopher Faulet	838fb54de6	MINOR: stconn: Add a connection flag to notify sending data are the last ones This flag can be used by endpoints to know that the data to send, via the .snd_buf callback function, are the last ones. It is useful to know a shutdown is pending but cannot be delivered while the data being sent are not consumed.	2024-03-28 17:32:55 +01:00
Christopher Faulet	a933569b52	MINOR: applet: Let's applets .snd_buf function deal with full input buffers It is now the responsibility of the applet's .snd_buf callback function to notify that the input buffer is full. This will allow applets to not consume all data while waiting for more data. Of course, it is only useful for applets using a custom .snd_buf callback function.	2024-03-28 17:32:55 +01:00
Christopher Faulet	f37ddbeb4b	MAJOR: cli: Update the CLI applet to handle its own buffers It is the third applet to be refactored to use its own buffers. In addition to the CLI applet, some I/O handlers of CLI commands were also updated, especially the stats ones. Some command I/O handlers were updated to use applet's buffers instead of channels ones.	2024-03-28 17:32:51 +01:00
Christopher Faulet	b8ca114031	BUG/MEDIUM: applet: State appctx have more data if its EOI/EOS/ERROR flag is set It is a harmless bug for now because only stats and cache applets are using their own buffers and it is not possible to trigger this bug with these applets. However, it remains important to try a receive if EOI, EOS or ERROR is reached by the applet while no data was produced. Otherwise, it is not possible to ack these events at the SE level. No backport needed.	2024-03-28 17:28:21 +01:00
Christopher Faulet	d2403a412c	MINOR: applet: Always use applet API to set appctx flags Some appctx flags were still set manually while there is a dedicated function to do so. Be sure to always use applet_fl_set() to set appctx flags.	2024-03-28 17:28:20 +01:00
Christopher Faulet	94b8ed446f	MEDIUM: cli/applet: Stop to test opposite SC in I/O handler of CLI commands The main CLI I/O handler is responsible for interrupting the processing on shutdown/abort. It is not the responsibility of the I/O handler of CLI commands to take care of it.	2024-03-28 17:28:20 +01:00
Christopher Faulet	2e6733eb45	MEDIUM: stream: Use generic version to perform sync receives and sends Instead of using connection versions, we now use generic versions. It means we will also perform sync receives and sync sends on applets too, but only for applets using their own buffers. Old applets are not concerned.	2024-03-28 17:28:20 +01:00
Christopher Faulet	5056cbdb86	MINOR: sc_strm: Add generic version to perform sync receives and sends sc_sync_recv() and sc_sync_send() were added to use connection or applet versions, depending on the endpoint type. For now these functions are not used. But this will be used by process_stream() to replace the connection version.	2024-03-28 17:28:20 +01:00
Christopher Faulet	498520fdf5	BUG/MINOR: cli: Report an error to user if command or payload is too big A command that was too big, i.e. larger than a buffer, was silently rejected by the CLI applet. It was handled as an error and the connection was closed, but no error message was reported to notify the user. Now an error is reported before closing. It is only displayed if the chunk buffer used by the CLI applet is full and no delimiter (\n or ;) is found to mark the end of the command. It works for a simple command but also for a command with a huge payload. This patch could be backported to all stable versions.	2024-03-28 17:28:20 +01:00
Amaury Denoyelle	6333e6ec8e	MINOR: server: allow cookie for dynamic servers This commit allows "cookie" keyword for dynamic servers. After code review, nothing was found which could prevent a dynamic server from using it. An extra warning is added under cli_parse_add_server() if cookie value is ignored due to a non HTTP backend. This patch is not considered a bugfix. However, it may be backported if needed as its impact seems minimal.	2024-03-28 11:54:21 +01:00
Damien Claisse	9a0e0d3a19	BUG/MINOR: server: fix persistence cookie for dynamic servers When adding a server dynamically, we observe that when a backend has a dynamic persistence cookie, the new server has no cookie as we receive the following HTTP header: set-cookie: test-cookie=; Expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/ Whereas we were expecting to receive something like the following, which is what we receive for a server added in the config file: set-cookie: test-cookie=abcdef1234567890; path=/ After investigating code path, srv_set_dyncookie() is never called when adding a server through CLI, it is only called when parsing config file or using "set server bkd1/srv1 addr". To fix this, call srv_set_dyncookie() inside cli_parse_add_server(). This patch must be backported up to 2.4.	2024-03-28 11:54:21 +01:00
Amaury Denoyelle	250c19032f	BUG/MINOR: server: reject enabled for dynamic server Since their first implementation, dynamic servers are created into maintenance state. This has been done purposely to avoid immediate activation of a newly inserted server. However, this principle is incompatible if "enabled" keyword is used on "add server". The newly created instance will be unreachable as proxy load-balancing algorithm is not informed of its presence via srv_lb_propagate(). The new server could be unblocked by toggling its state with "disable server" / "enable server" commands, which will trigger srv_lb_propagate() invocation. To avoid this unexpected state, simply forbid "enabled" keyword for dynamic servers. In the long-term, it could be possible to re-authorize it but at least this requires to call srv_lb_propagate() on dynamic server creation. This should fix github issue #2497. This patch should not be backported as-is, to avoid breaking dynamic servers API on stable versions. "enabled" should instead be ignored for them. This will be implemented in a dedicated patch on top of 2.9.	2024-03-28 11:51:05 +01:00
Remi Tricot-Le Breton	7359c0c7f4	MEDIUM: ssl: Add 'tune.ssl.ocsp-update.mode' global option This option can be used to set a default ocsp-update mode for all certificates of a given conf file. It allows to activate ocsp-update on certificates without the need to create separate crt-lists. It can still be superseded by the crt-list 'ocsp-update' option. It takes either "on" or "off" as value and defaults to "off". Since setting this new parameter to "on" would mean that we try to enable ocsp-update on any certificate, and also certificates that don't have an OCSP URI, the checks performed in ssl_sock_load_ocsp were softened. We don't systematically raise an error when trying to enable ocsp-update on a certificate that does not have an OCSP URI, be it via the global option or the crt-list one. We will still raise an error when a user tries to load a certificate that does have an OCSP URI but a missing issuer certificate (if ocsp-update is enabled).	2024-03-27 11:38:28 +01:00
Remi Tricot-Le Breton	b1d623949c	BUG/MINOR: ssl: Detect more 'ocsp-update' incompatibilities The inconsistencies in 'ocsp-update' parameter were only checked when parsing a crt-list line so if a certificate was used on a bind line after being used in a crt-list with 'ocsp-update' set to 'on', then no error would be raised. This patch helps detect such inconsistencies. This patch can be backported up to branch 2.8.	2024-03-27 11:38:28 +01:00
Remi Tricot-Le Breton	97c2734f44	BUG/MINOR: ssl: Wrong ocsp-update "incompatibility" error message In a crt-list such as the following: foo.pem [ocsp-update off] foo.com foo.pem bar.com we would get a wrong "Incompatibilities found in OCSP update mode ..." error message during init when the two lines are actually saying the same thing since the default for 'ocsp-update' option is 'off'. This patch can be backported up to branch 2.8.	2024-03-27 11:38:28 +01:00
Willy Tarreau	40d1c84bf0	BUG/MAJOR: ring: free the ring storage not the ring itself when using maps A recent issue was uncovered by the CI which started to randomly report segfaults on a few tests, and more systematically on FreeBSD. It turns out that it was introduced by recent commit `03816ccfa9` ("MAJOR: ring: insert an intermediary ring_storage level"), which overlooked the munmap() path of the sink and startup logs: once the ring and its storage were split, it was no longer correct to munmap() the ring, only its storage area needs to be unmapped, and the ring must always be freed separately. Thanks to Christopher and William for their help at trying to reproduce it and figure the circumstances that trigger it. No backport is needed.	2024-03-26 15:15:59 +01:00
Aurelien DARRAGON	bd98db5078	BUG/MINOR: server: 'source' interface ignored from 'default-server' directive Sebastien Gross reported that 'interface' keyword ('source' subargument) is silently ignored when used from 'default-server' directive despite the documentation implicitly stating that the keyword should be supported there. When support for 'source' keyword was added to 'default-server' directive in `dba97077` ("MINOR: server: Make 'default-server' support 'source' keyword."), we properly duplicated the conn iface_name from the default-server but we forgot to copy the conn iface_len which must be set as well since it is used as setsockopt()'s 'optlen' argument in tcp_connect_server(). It should be backported to all stable versions.	2024-03-26 11:09:02 +01:00
Willy Tarreau	2431b20640	BUILD: ssl: fix build error on older compilers with openssl-3.2 OpenSSL 3.2 triggers the code part added by commit `25da217` ("MINOR: ssl: Update ssl_fc_curve/ssl_bc_curve to use SSL_get0_group_name") which contains a variable declaration in the for() statement and breaks on older compilers, as reported in GH issue #2501. Let's just declare it normally to fix the problem. This must be backported wherever the commit above is (at least 2.9).	2024-03-25 21:21:47 +01:00
Willy Tarreau	4bc81ec985	CLEANUP: ring: use only curr_cell and not next_cell in the main write loop It turns out that we can reduce by one variable in the loop and this clobbers one less register, making it slightly faster on Cortex A72.	2024-03-25 17:34:19 +00:00
Willy Tarreau	0a0a64ef02	OPTIM: ring: use relaxed stores to release the threads We don't care in what order the threads are released, so we can write their sent value using relaxed atomic stores. This brings a 3-5% perf boost on ARM with 80 cores, reaching 7.25M/s, and doesn't change anything on x86 since it keeps using strict ordering.	2024-03-25 17:34:19 +00:00
Willy Tarreau	cabe945876	MINOR: ring: avoid writes to cells during copy It has been found that performing a first pass consisting in copying all messages, and a second one to notify about releases is more efficient on AMD than updating all of them on the fly using a CAS, despite making writers wait longer to be released. Maybe it's related to the ability for the CPU to prefetch the contents during a simple load while it wouldn't do it for an XCHG, it's unsure at this point. This will also later permit to use relaxed stores to release threads. On ARM the performance increased to 7.0M/s. If this patch is applied before the dropping of the intermediary step, instead it drops to 3.9M/s. This shows the dependency between such changes that strive to limit the number of writes on the fast path. On x86_64, the EPYC at 3C6T saw a small drop from 4.57M to 4.45M, but the 24C48T setup saw a nice 33% boost from 3.33M to 4.44M, i.e. we get stable perf at 3 and 24 cores, despite having 8 CCX involved and fighting with each other. Other possibilities are: - use of HA_ATOMIC_XCHG() instead of FETCH_OR() => slightly faster (4.62/7.37 vs 4.58/7.34). Pb: requires to modify the readers to wait much longer since the tail value won't be valid in this case during updates, and it will have to wait by looping over it. - use other conditions to release a cell => to be tested	2024-03-25 17:34:19 +00:00
Willy Tarreau	39df8c903d	MINOR: ring: it's not x86 but all non-ARMv8.1 which needs the read before OR Archs relying on CAS benefit from a read prior to FETCH_OR, so it's not just x86 that benefits from this. Let's just change the condition to only exclude __ARM_FEATURE_ATOMICS which is the only one faster without.	2024-03-25 17:34:19 +00:00
Willy Tarreau	e6fc167aec	CLEANUP: ring: further simplify the write loop The loop was cleaned up a little bit so that the inner loops are more readable and that the ifdef'd parts are whole blocks and not just an "if" condition. A few conditions were adjusted to benefit from "break" and "continue".	2024-03-25 17:34:19 +00:00
Willy Tarreau	4b984c5baa	MINOR: ring: simplify the write loop a little bit This is mostly a cleanup in that it turns the two-level loop into a single one, but it also simplifies the code a little bit and brings some performance savings again, which are mostly noticeable on ARM, but don't change anything for x86.	2024-03-25 17:34:19 +00:00
Willy Tarreau	573bbbe127	MEDIUM: ring: improve speed in the queue waiting loop on x86_64 x86_64 doesn't have a native atomic FETCH_OR(), it's implemented using a CAS, which will always cause a write cycle. Here we know we can just wait as long as the lock bit is held so better loop on a load, and only attempt the CAS on success. This requires a tiny ifdef and brings nice benefits. This brings the performance back from 3.33M to 3.75M at 24C48T while doing no change at 3C6T.	2024-03-25 17:34:19 +00:00
Willy Tarreau	30a659c355	MEDIUM: ring: significant boost in the loop by checking the ring queue ptr first By doing that and placing the cpu_relax at the right places, the ARM reaches 6.0M/s on 80 threads. On x86_64, at 3C6T the EPYC sees a small increase from 4.45M to 4.57M but at 24C48T it sees a drop from 3.82M to 3.33M due to the write contention hidden behind the CAS that implements the FETCH_OR(), that we'll address next.	2024-03-25 17:34:19 +00:00
Willy Tarreau	1e2311edbc	MAJOR: ring: implement a waiting queue in front of the ring The queue-based approach consists in forcing threads to wait away from the work area so as not to disturb the current writer, and to prepare the work by grouping them in a queue. The last arrived takes the head of the queue by placing its preinitialized ring cell there, becomes the queue's leader, informs itself about the amount of previously accumulated bytes so that when its turn comes, it immediately knows how much room is needed to be released. It can then take the whole queue with it, leaving an empty one for new threads to come while it's releasing the room needed to copy everything. By doing so we're cascading contention areas so that multiple parts can work in parallel. Note that we must never leave a write counter set to 0xFF at tail, and this happens when a message cannot fit and we give up, because in this case we're writing back tail_ofs, and only later we restore the counter. The solution here is to make a special case when we're going to drop the messages, and to write the readers count before restoring tail. This already shows a tremendous performance gain on ARM (385k -> 4.8M), thanks to the fact that now all waiting threads wait on the queue's head instead of polluting the tail lock. On x86_64, the EPYC sees a big boost at 24C48T (1.88M -> 3.82M) and a slowdown at 3C6T (6.0->4.45) though this one is much less of a concern as so few threads need less bandwidth than bigger counts.	2024-03-25 17:34:19 +00:00
Willy Tarreau	6c1b29d06f	MINOR: ring: make the number of queues configurable Now the rings have one wait queue per group. This should limit the contention on systems such as EPYC CPUs where the performance drops dramatically when using more than one CCX. Tests were run with different numbers and it was shown that value 6 outperforms all other ones at 12, 24, 48, 64 and 80 threads on an EPYC, a Xeon and an Ampere CPU. Value 7 sometimes comes close and anything around these values degrades quickly. The value has been left tunable in the global section. This commit only introduces everything needed to set up the queue count so that it's easier to adjust it in the forthcoming patches, but it was initially added after the series, making it harder to compare. It was also shown that trying to group the threads in queues by their thread groups is counter-productive and that it was more efficient to do that by applying a modulo on the thread number. As surprising as it seems, it does have the benefit of well balancing any number of threads.	2024-03-25 17:34:19 +00:00
Willy Tarreau	e3f101a19a	MINOR: ring: add the definition of a ring waiting cell This is what will be used to describe one waiting thread, its message in the queues, and the aggregation of pending messages after it.	2024-03-25 17:34:19 +00:00
Willy Tarreau	447189f286	MINOR: ring: keep a few frequently used pointers in the local stack Code disassembly shows that ring->storage->tail and ring->queue are accessed a lot and reloaded a lot due to aliasing. Let's just have variables for them in the local stack. It makes the code smaller and slightly faster.	2024-03-25 17:34:19 +00:00
Willy Tarreau	c7bd7a68e4	OPTIM: ring: have only one thread at a time wake up all readers It's inefficient and counter-productive that each ring writer iterates over all readers to wake them up. Let's just have one in charge of this, it strongly limits contention. The only thing is that since the thread is iterating over a list, we want to be sure that if the first readers have already completed their job, they will be woken up again. For this we keep a counter of messages delivered after the wakeup started, and the waking thread will check it before going back to sleep. In order to avoid looping forever, it will also drop its waking flag soon enough to possibly let another one take it. There used to be a few cases of watchdogs before this on a 24-core AMD EPYC platform on the list iteration; those never appeared anymore. The perf has dropped a bit on 3C6T on the EPYC, from 6.61 to 6.0M but remains unchanged at 24C48T.	2024-03-25 17:34:19 +00:00
Willy Tarreau	1f8b14b7be	OPTIM: ring: don't even try to update offset when failed to read If there's nothing to read, it's pointless for a reader to try to update the offset pointer, that's two atomic ops to replace a value by itself twice. Let's just stop this.	2024-03-25 17:34:19 +00:00
Willy Tarreau	9e99cfbeb6	MAJOR: ring: drop the now unneeded lock It was only used to protect the list which is now an mt_list so it doesn't provide any required protection anymore. It obviously also used to provide strict ordering between the writer and the reader when the writer started to update the messages, but that's now covered by the ordered tail updates and updates to the readers count to protect the area. The message rate on small thread counts (up to 12) saw a boost of roughly 5% while for large counts it lost about 2% due to some contention now becoming visible elsewhere. Typical measures are 6.13M -> 6.61M at 3C6T, and 1.88 -> 1.92M at 24C48T on the EPYC.	2024-03-25 17:34:19 +00:00
Willy Tarreau	cb482f92c4	MINOR: ring: make sure ring_dispatch waits when facing a changing message The writer is using tags 0xFF instead of readers count at the front of messages that are undergoing an update, while the tail has already been updated. The reader needs to take care of this because it can face these messages and mistakenly parse data that's still being written, leading to corruption (especially if this happens while the size is changing). Let's just stop reading when facing reserved codes, since they indicate that the end of usable messages was reached.	2024-03-25 17:34:19 +00:00
Willy Tarreau	31b93b40b0	MEDIUM: ring: protect the initialization of the initial reader offset Since we're going to remove the lock, there's no more way to prevent the ring from being fed while we're attaching a client to it. We need to freeze the buffer while looking at its head so that we can attach there and have a trustable one. We could do it by setting the lock bit on the tail offset but quite frankly we don't need to bother with that, attaching a client is rare enough to permit a thread_isolate().	2024-03-25 17:34:19 +00:00
Willy Tarreau	a2d2dbf210	MEDIUM: ring/applet: turn the wait_entry list to an mt_list instead Rings are keeping a lock only for the list, which apparently doesn't need anything more than an mt_list, so let's first turn it into that before dropping the lock. There should be no visible effect.	2024-03-25 17:34:19 +00:00
Willy Tarreau	04f1e3f3d9	MINOR: ring: don't take the readers lock if there are no readers There's no point looking for freshly attached readers if there are none, taking this lock requires an atomic write to a shared area, something we clearly want to avoid. A general test with 213-byte messages on different thread counts shows how the performance degrades across CCX and how this patch improves the situation (before -> after): 3C6T/1CCX: 6.39 -> 6.35 Mmsg/s; 6C12T/2CCX: 2.90 -> 3.16 Mmsg/s; 12C24T/4CCX: 2.14 -> 2.33 Mmsg/s; 24C48T/8CCX: 1.75 -> 1.92 Mmsg/s. This tends to confirm that the queues will really be needed and that they'll have to be per-ccx hence per thread-group. They will amortize the number of updates on head & tail (one per multiple messages).	2024-03-25 17:34:19 +00:00
Willy Tarreau	41d3ea521b	MEDIUM: ring: unlock the ring's tail earlier We know we can continue to protect the message area so we can unlock the tail as soon as we know its new value. Now we're seeing ~6.4M msg/s vs 5.4M previously on 3C6T of a 3rd gen EPYC, and 1.88M vs 1.54M for 24C48T threads, which is a significant gain! This requires to carefully write the new head counter before releasing the writers, and to change the calculation of the work area from tail..head to tail...new_tail while writing the message.	2024-03-25 17:34:19 +00:00
Willy Tarreau	3cdd3d27a8	MEDIUM: move the ring's lock to only protect the readers list Now the lock is only taken around the readers list. With careful ordering of writes to head/tail, the ring remains protected. The perf is a bit better, though (1.54M msg/s vs 1.4M at 48T on a 3rd gen EPYC, and 5.4M vs 5.3M for a 3C6T setup).	2024-03-25 17:34:19 +00:00
Willy Tarreau	eb3d5f464d	MEDIUM: ring: use the topmost bit of the tail as a lock We're now locking the tail while looking for some room in the ring. In fact it's still while writing to it, but the goal definitely is to get rid of the lock ASAP. For this we reserve the topmost bit of the tail as a lock, which may have as a possible visible effect that buffers will be limited to 2GB instead of 4GB on 32-bit machines (though in practice, good luck for allocating more than 2GB contiguous on 32-bit), but in practice since the size is read with atol() and some operating systems limit it to LONG_MAX unless passing negative numbers, the limit is already there. For now the impact on x86_64 is significant (drop from 2.35 to 1.4M/s on 48 threads on EPYC 24 cores) but this situation is only temporary so that changes can be reviewable and bisectable. Other approaches were attempted, such as using XCHG instead, which is slightly faster on x86 with low thread counts (but causes more write contention), and forces readers to stall under heavy traffic because they can't access a valid value for the queue anymore. A CAS requires preloading the value and is less good on ARMv8.1. XADD could also be considered with 12-13 upper bits of the offset dedicated to locking, but that looks overkill.	2024-03-25 17:34:19 +00:00
Willy Tarreau	2192983ffd	MEDIUM: ring: protect the reader's positions against writers The reader now needs to protect the positions it's reading. This is already done via the readers counter at the beginning of messages, but as long as the lock is present, this counter is decremented before starting to parse messages, and incremented at the end. We must now do that in reverse, first protect the end of the messages, and only then remove ourselves from the already processed messages, so that at no point could a writer pass over and possibly overwrite data we're currently watching.	2024-03-25 17:34:19 +00:00
Willy Tarreau	73b2436fe6	MEDIUM: ring: lock the tail's readers counters before proceeding with the changes The goal here is to start to protect the writing area inside the area itself so that we'll later be able to release the ring's lock. We're not there yet, but at least the tail is marked as protected for as long as the message is not fully written.	2024-03-25 17:34:19 +00:00
Willy Tarreau	d336d71cbb	MINOR: ring: make the reader check the readers count before inc/dec We'll want to reserve some special values for the readers count to temporarily lock the following message, but for this it will be mandatory that readers check for them before incrementing/decrementing the counter. Let's do that using a CAS. The readers performance is not as critical as the writer's anyway so the slight overhead is not a problem.	2024-03-25 17:34:19 +00:00
Willy Tarreau	bf3dead20c	MEDIUM: ring: remove the struct buffer from the ring The purpose is to store a head and a tail that are independent so that we can further improve the API to update them independently from each other. The struct was arranged like the original one so that as long as a ring has its head set to zero (i.e. no recycling) it will continue to work. The new format is already detectable thanks to the "rsvd" field which indicates the number of reserved bytes at the beginning. It's located where the buffer's area pointer previously was, so that older versions of haring can continue to open the ring in repair mode, and newer ones can use the fact that the upper bits of that variable are zero to guess that it's working with the new format instead of the old one. Also let's keep in mind that the layout will further change to place some alignment constraints. The haring tool was thus updated based on this and it detects that the rsvd field is smaller than a page and that the sum of it with the size equals the mapped size, in which case it uses the new dump_v2() function instead of dump_v1(). The new function also creates a buffer from the ring's area, size, head and tail and calls the generic one so that no other code had to be adapted.	2024-03-25 17:34:19 +00:00
Willy Tarreau	01aa0a057c	MEDIUM: ring: change the ring reader to use the new vector-based API now The code now looks cleaner and more easily shows what still needs to be addressed. There are not that many changes in practice, these are mostly mechanical, essentially hiding the buffer from the callers.	2024-03-25 17:34:19 +00:00
Willy Tarreau	4e6fadb8a1	MEDIUM: ring: replace the buffer API in ring_write() with the vec<->ring API This is the start of the replacement of the buffer API calls. Only the ring_write() function was touched. Instead of manipulating a buffer all along, we now extract the ring buffer's head and tail upon entry, store them locally and use them using the vec<->ring API until the last moment where we can update the buffer with the new values. One subtle point is that we must never fill the buffer past the last byte otherwise the vec-to-ring conversion gets lost and there's no more possibility to know where's the beginning nor the end (just like when dealing with head+tail in fact), because it then becomes impossible to distinguish between an empty and a full buffer.	2024-03-25 17:34:19 +00:00
Willy Tarreau	4e6de42b27	MINOR: ring: allow to reduce a ring size In ring_resize() we used to check if the new ring was at least as large as the previous one before resizing it, but what counts is that it's as large as the previous one's contents. Initially it was thought this would not really matter, but given that rings are initially created as BUFSIZE, it's currently not possible to shrink them for debugging purposes. Now with this change it is.	2024-03-25 17:34:19 +00:00
Willy Tarreau	0fa05ce171	MINOR: ring: resize only under thread isolation The ring resizing was already quite tricky, but when facing atomic writes it will no longer be possible and we definitely do not want to have to deal with a lock there. Since it's only done at boot time, and possibly later from the CLI, let's just do it under thread isolation.	2024-03-25 17:34:19 +00:00
Willy Tarreau	03816ccfa9	MAJOR: ring: insert an intermediary ring_storage level We'll need to add more complex structures in the ring, such as wait queues. That's far too much to be stored into the area in case of file-backed contents, so let's split the ring definition and its storage once for all. This patch introduces a struct ring_storage which is assigned to ring->storage, which contains minimal information to represent the storage layout, i.e. for now only the buffer, and all the rest remains in the ring itself. The storage is appended immediately after it and the buffer's pointer always points to that area. It has the benefit of remaining 100% compatible with the existing file-backed layout. In memory, the allocation loses the size of a struct buffer. It's not even certain it's worth placing the size there, given that it's constant and that a dump of a ring wouldn't really need it (the file size is sufficient). But for now everything comes with the struct buffer, and later this will change once split into head and tail. Also this area may be completed with more information in the future (e.g. storage version, format, endianness, word size etc).	2024-03-25 17:34:19 +00:00
Willy Tarreau	01abdcb307	MINOR: ring: add a flag to indicate a mapped file Till now we used to rely on a heuristic pointer comparison to check if a ring was mapped or allocated. Better assign a flag to clarify this because it's going to become difficult otherwise.	2024-03-25 17:34:19 +00:00
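
The tail-locking idea described in `eb3d5f464d` (reserve the topmost bit of the tail offset as a lock) and the refinement in `573bbbe127` (spin on a plain load and only attempt the atomic OR once the lock bit looks free) can be sketched roughly as follows. This is a minimal illustration using C11 atomics, not HAProxy's actual code: the names `demo_ring`, `demo_tail_lock()`, `demo_tail_unlock()`, `demo_tail_is_locked()` and `TAIL_LOCK_BIT` are all hypothetical, and the real ring additionally deals with readers counts, wait queues and wrapping, which are left out here.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical names, for illustration only (not HAProxy's definitions). */

/* Topmost bit of a size_t, reserved as the lock flag of the tail offset. */
#define TAIL_LOCK_BIT ((size_t)1 << (sizeof(size_t) * CHAR_BIT - 1))

struct demo_ring {
    _Atomic size_t tail;   /* offset of the next write; topmost bit = lock */
};

/* Lock the tail and return the offset it held (lock bit not set). */
size_t demo_tail_lock(struct demo_ring *r)
{
    size_t old;

    for (;;) {
        /* Wait on a plain load first: where fetch_or is built on a CAS,
         * each attempt is a write cycle, so only try it once the lock
         * bit is seen clear.
         */
        old = atomic_load_explicit(&r->tail, memory_order_relaxed);
        if (old & TAIL_LOCK_BIT)
            continue;

        old = atomic_fetch_or_explicit(&r->tail, TAIL_LOCK_BIT,
                                       memory_order_acquire);
        if (!(old & TAIL_LOCK_BIT))
            return old;    /* the bit was clear before: we own the lock */
    }
}

/* Publish the new tail offset; storing it also clears the lock bit. */
void demo_tail_unlock(struct demo_ring *r, size_t new_tail)
{
    assert(!(new_tail & TAIL_LOCK_BIT));
    atomic_store_explicit(&r->tail, new_tail, memory_order_release);
}

/* Report whether the lock bit is currently set. */
int demo_tail_is_locked(struct demo_ring *r)
{
    return (atomic_load_explicit(&r->tail, memory_order_relaxed)
            & TAIL_LOCK_BIT) != 0;
}
```

A writer in this sketch would call `demo_tail_lock()`, reserve room starting at the returned offset, then call `demo_tail_unlock()` with the advanced offset. Reserving the top bit halves the usable offset range, which matches the 2GB versus 4GB remark for 32-bit machines in the commit message.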


17828 Commits