A new protocol named "reverse_connect" is created. It will be used to
instantiate connections that are opened by a reverse bind.
For the moment, only a minimal set of callbacks is defined, with no real
work. This will be extended in the next patches.
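As an illustration, the registration could look like this minimal
sketch (the callback names here are assumptions, not the final API):

    struct protocol proto_reverse_connect = {
        .name    = "reverse_connect",
        /* minimal set of callbacks, performing no real work yet */
        .listen  = rev_bind_listener,   /* hypothetical stub */
        .bind    = rev_bind_receiver,   /* hypothetical stub */
        .connect = NULL,                /* filled by later patches */
    };

    INITCALL1(STG_REGISTER, protocol_register, &proto_reverse_connect);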
This regtest is similar to the previous one, except that the optional
name argument is specified.
An extra haproxy instance is used as a gateway for clear/TLS, as vtest
does not support TLS natively.
A first request is made with a name which does not match the idle
connection's SNI; this must result in an HTTP 503. Then the correct
name is used, which must result in a 200.
When a connection is passively reversed from the frontend to the backend
side, its hash node is calculated so that it can be selected from the
server's idle pool. If the attach-srv rule defined an associated name,
it is reused as the value for the SNI prehash.
This change allows a client to select a reverse connection by its name,
simply by configuring the server line with a matching SNI.
Add an optional 'name' argument to the attach-srv rule. It contains an
expression which will be used as an identifier inside the server idle
pool after reversal. To match such a connection for a future transfer
through the server, the server's SNI parameter must match this name. If
no name is defined, a match will only occur with an empty SNI value.
For the moment, only the parsing step is implemented. An extra check is
added to ensure that the reverse server uses SSL with an SNI. Indeed, if
a name is defined but the server does not use an SNI, connections would
never be selected for reuse after reversal due to a hash mismatch.
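For illustration, a hedged configuration sketch using the syntax from
this series (server/backend names, expressions and certificate paths
are made up):

    backend be-reverse
        # reverse server: no own address, only reuses reversed
        # connections. The sni expression must evaluate to the name
        # set at attach time.
        server dev @reverse ssl sni req.hdr(x-name) verify none

    frontend fe-edge
        bind :443 ssl crt site.pem alpn h2
        # use the client certificate CN as the name after reversal
        tcp-request session attach-srv be-reverse/dev name ssl_c_s_dn(CN)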
Test support for the reverse server. This can be tested without the
opposite haproxy reversal support, through a combination of VTC clients
used to emit HTTP/2 responses after connection.
This test first ensures that we get a 503 when connecting to a reverse
server with no idle connection. Then a dummy VTC client is connected to
act as a server. The same request is then expected to succeed with a
200.
Create a new tcp-request session rule 'attach-srv'.
The parsing handler is used to extract the server targeted with the
notation 'backend/server'. The server instance is stored in the act_rule
instance under the new union variant 'attach_srv'.
Extra checks are implemented during parsing to ensure attach-srv is only
used for proxies in HTTP mode, and with listeners/servers that have no
explicit protocol reference or are HTTP/2 only.
The action handler itself is really simple. It assigns the stored server
instance to the 'reverse' member of the connection instance. It will be
used in a future patch to implement passive reverse-connect.
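A minimal sketch of what such a handler can look like (the exact
member names, notably 'reverse', are assumptions):

    static enum act_return tcp_action_attach_srv(struct act_rule *rule,
                                                 struct proxy *px,
                                                 struct session *sess,
                                                 struct stream *s, int flags)
    {
        struct connection *conn = objt_conn(sess->origin);

        if (conn)
            conn->reverse = rule->arg.attach_srv.srv; /* assumed member */
        return ACT_RET_CONT;
    }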
A reverse server relies solely on its pool of idle connections to
transfer requests; the pool will be populated through a new tcp-request
rule, 'attach-srv'.
Several changes are required in connect_server() to implement this.
First, reuse mode is forced to "always" for this type of server. Then,
if no idle connection is found, the request is aborted. This results in
an HTTP 503 error code, as when no server is available.
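A hedged sketch of the connect_server() logic described above (the
flag name is an assumption):

    /* inside connect_server() */
    if (srv && (srv->flags & SRV_F_REVERSE)) {   /* assumed flag */
        reuse_mode = PR_O_REUSE_ALWS;            /* force "always" */
        if (!srv_conn) {
            /* no idle connection and none can be instantiated:
             * abort; the client will receive an HTTP 503
             */
            return SF_ERR_RESOURCE;
        }
    }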
Implement the reverse-connect server. This server type cannot
instantiate its own connections on transfer. Instead, it can only reuse
connections from its idle pool. These connections will be populated
using the future 'tcp-request session attach-srv' rule.
A reverse-connect server has no address. Instead, it uses a new custom
server notation with the '@' character prefix. For the moment, only
'@reverse' is defined. An extra check is implemented to ensure the
server is used in an HTTP proxy.
Reverse the connection after SETTINGS reception if it was set as
reversible. This operation is done in a new function, h2_conn_reverse().
It regroups common changes which are needed for both reversal
directions: H2_CF_IS_BACK is set or unset and the timeouts are inverted.
For the moment, only passive reverse is fully implemented. Once done,
the connection instance is directly inserted into its targeted server
pool. It can then be used immediately for future transfers using this
server.
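A simplified sketch of the common part (timeout handling is heavily
condensed and field names are assumptions):

    static void h2_conn_reverse(struct h2c *h2c)
    {
        /* flip the connection side */
        h2c->flags ^= H2_CF_IS_BACK;

        /* invert timeouts: the former client timeout becomes the
         * server one and vice-versa (simplified here)
         */
        if (h2c->flags & H2_CF_IS_BACK)
            h2c->timeout = h2c->proxy->timeout.server;
        else
            h2c->timeout = h2c->proxy->timeout.client;
    }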
Define a new method conn_reverse(). This method is used to reverse a
connection from frontend to backend or vice-versa depending on its
initial status.
For the moment, only passive reverse is implemented. This covers the
transition from the frontend to the backend side: the connection is
detached from its owner session, which can then be freed, and the
connection is then linked to the server instance. Note that freeing the
session requires first detaching the connection from it.
A connection contains extra elements which are only used for the backend
side. Regroup their allocation and deallocation in two new functions
named conn_backend_init() and conn_backend_deinit().
No functional change is introduced with this commit. The new functions
are reused in place of manual alloc/dealloc in conn_new() / conn_free().
This patch will be useful for reverse connect support with connection
conversion from backend to frontend side and vice-versa.
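A hedged sketch of the regrouping (the exact set of backend-side
members moved into these functions may differ):

    static int conn_backend_init(struct connection *conn)
    {
        if (!sockaddr_alloc(&conn->dst, NULL, 0))
            return 0;
        conn->hash_node = conn_alloc_hash_node(conn);
        if (!conn->hash_node)
            return 0;
        return 1;
    }

    static void conn_backend_deinit(struct connection *conn)
    {
        pool_free(pool_head_conn_hash_node, conn->hash_node);
        sockaddr_free(&conn->dst);
    }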
Several CLI handlers use a server argument specified with the format
'<backend>/<server>'. The parsing of this argument is done in two steps:
first splitting the string on the '/' delimiter, then using
get_backend_server() to retrieve the server instance.
Refactor these code sections with the following changes:
* splitting is reimplemented using the ist API
* get_backend_server() is removed; instead, the already existing
  proxy_be_by_name() and server_find_by_name() are used, whose code the
  removed function duplicated
No functional change occurs with this commit. However, it will be useful
for adding new configuration options reusing the same
'<backend>/<server>' format for reverse connect.
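A hedged sketch of the new parsing (null-termination handling is
simplified here; real code must terminate both parts before the
lookups):

    static struct server *find_backend_server(const char *arg)
    {
        struct ist sv_name = ist(arg);
        struct ist be_name = istsplit(&sv_name, '/');
        struct proxy *be;

        if (!istlen(be_name) || !istlen(sv_name))
            return NULL;

        be = proxy_be_by_name(be_name.ptr);
        if (!be)
            return NULL;
        return server_find_by_name(be, sv_name.ptr);
    }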
Peter Varkoly reported a build issue on ppc64le in xxhash.h. Our version
(0.8.1) was the latest one 9 months ago, and since then this specific
issue was addressed in 0.8.2, so let's apply the maintenance update.
This should be backported to 2.8 and 2.7.
If a pattern list is empty, there's no way we can find its elements in
the pattern cache, so let's avoid this expensive lookup. This can happen
for ACLs or maps loaded from files that may optionally be empty, for
example. Doing so improves the request rate by roughly 10% for a single
such match with only 8 threads. That's expected, because on a miss the
LRU cache pre-creates an entry to be committed in case the list lookup
later succeeds, so we bypass all this.
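The short-circuit can be pictured like this (a sketch; the surrounding
loop and exact field set are from memory and may differ):

    /* in the expression-matching loop: an empty expression cannot
     * match, so don't allocate/look up an LRU cache entry for it
     */
    if (LIST_ISEMPTY(&expr->patterns) &&
        eb_is_empty(&expr->pattern_tree) &&
        eb_is_empty(&expr->pattern_tree_2))
        continue;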
ssl_quic_initial_ctx() is supposed to use the error count and not the
error code.
Bug was introduced by 557706b3 ("MINOR: quic: Initialize TLS contexts
for QUIC openssl wrapper").
No backport needed.
When built with USE_QUIC_OPENSSL_COMPAT, a warning is emitted when using
allow-0rtt. However, this warning is emitted for every allow-0rtt
keyword on the bind line, which is confusing; it must only be emitted
when the bind is a QUIC one. Also, this does not handle the case where
the allow-0rtt keyword is in the crt-list.
This patch moves the warning to ssl_quic_initial_ctx() in order to emit
it in all relevant cases.
QUIC 0-RTT is not supported when haproxy is linked against a TLS stack
with limited QUIC support (OpenSSL).
Modify the "allow-0rtt" option callback to make it emit a warning if set on
a QUIC listener "bind" line.
Add a check for limited-quic in check_config_validity() when compiled
with USE_QUIC_OPENSSL_COMPAT, so that we prevent a config from
accidentally starting with limited QUIC support. If a QUIC listener is
found when using the compatibility mode and limited-quic is not set, an
error message is reported explaining that the SSL library is not
compatible and suggesting that the user enable limited-quic if that's
what they want, and the startup fails.
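A hedged sketch of the check (the option flag and iteration details
are assumptions):

    #ifdef USE_QUIC_OPENSSL_COMPAT
        if (!(global.tune.options & GTUNE_LIMITED_QUIC)) { /* assumed */
            struct bind_conf *bind_conf;

            list_for_each_entry(bind_conf, &curproxy->conf.bind, by_fe) {
                if (bind_conf->xprt == xprt_get(XPRT_QUIC)) {
                    ha_alert("QUIC listener found, but the SSL library "
                             "only offers limited QUIC support; set "
                             "'limited-quic' to enable it.\n");
                    cfgerr++;
                }
            }
        }
    #endif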
This partially reverts commit 7c730803d ("MINOR: quic: Warning for
OpenSSL wrapper QUIC bindings without "limited-quic"") since a warning
was not sufficient.
Compilation is broken due to a missing __pl_wait_unlock_long()
definition when building with PLOCK_DISABLE_EBO=1. This was introduced
by the following commit, which activates the inlined version of
pl_wait_unlock_long():
commit 071d689a514dac522ac3654f53bc22214b5716d0
MINOR: threads: inline the wait function for pthread_rwlock emulation
Add an extra check on PLOCK_DISABLE_EBO before choosing the inline or
default version of pl_wait_unlock_long() to fix this.
pool_flush() could become a source of contention on the pool's free list
if there are many competing threads using that pool. Let's make sure we
use EBO and not just a simple CPU relaxation there, to avoid disturbing
them.
There were a few places left where we forgot to call __ha_cpu_relax()
after a failed CAS, in the HA_ATOMIC_UPDATE_{MIN,MAX} macros, and in
a few sync_* API macros (the same as above plus HA_ATOMIC_CAS and
HA_ATOMIC_XCHG). Let's add them now.
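The resulting pattern looks like this (a sketch of the
HA_ATOMIC_UPDATE_MAX case, with the macro written as a function for
readability):

    static inline unsigned int update_max(unsigned int *counter,
                                          unsigned int candidate)
    {
        unsigned int old = HA_ATOMIC_LOAD(counter);

        while (old < candidate && !HA_ATOMIC_CAS(counter, &old, candidate))
            __ha_cpu_relax();   /* now done after each failed CAS */
        return old;
    }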
This could have been a cause of contention, particularly with
process_stream() calling stream_update_time_stats() which uses 8 of them
in a call (4 for the server, 4 for the proxy). This may be a possible
explanation for the high CPU consumption reported in GH issue #2251.
This should be backported at least to 2.6 as it's harmless.
When using pthread_rwlock emulation, contention is reported on
pl_wait_unlock_long(), which makes it inconvenient to analyse what is
happening.
happening. Now plock supports inlining the wait call for just the lorw
functions by enabling PLOCK_LORW_INLINE_WAIT. Let's do this so that now
the wait time will be precisely reported as either pthread_rwlock_rdlock()
or pthread_rwlock_wrlock() depending on the contended function, but no
more on pl_wait_unlock_long(), which will still be reported for all
other locks.
Now when PLOCK_LORW_INLINE_WAIT is defined, the pl_wait_unlock_long()
calls in pl_lorw_rdlock() and pl_lorw_wrlock() will be inlined so that
all the CPU time is accounted for in the calling function.
This is plock upstream commit c993f81d581732a6eb8fe3033f21970420d21e5e.
Doing so will allow to expose the time spent in certain highly
contended functions, which can be desirable for more accurate CPU
profiling. For example this could be done in locking functions that
are already not inlined so that they are the ones being reported as
those consuming the CPU instead of just pl_wait_unlock_long().
This is plock upstream commit 7505c2e2c8c4aa0ab8f52a2288e1334ae6412be4.
Commit 9db830b ("plock: support inlining exponential backoff code")
added an option to support inlining of the wait code for longs but
forgot to do it for ints. Let's do it now.
This is plock upstream commit b1f9f0d252fa40577d11cfb2bc0a809d6960a297.
make "range" which was introduced with 06d34d4 ("DEV: makefile: add a
new "range" target to iteratively build all commits") does not work with
POSIX shells (namely: bourne shell), and will fail with this kind of
errors:
|/bin/sh: 6: Syntax error: "(" unexpected (expecting ")")
|make: *** [Makefile:1226: range] Error 2
This is because the arrays and arithmetic expressions used for the
"range" target are not supported by sh (unlike bash and other "modern"
interpreters).
However, the make "all" target already complies with POSIX, so in this
commit we make the "range" target POSIX compliant as well, to ensure
that the makefile works as expected on systems where make uses /bin/sh
as the default interpreter and where /bin/sh points to a POSIX shell.
Introduced in:
424981cde REGTEST: add ifnone-forwardfor test
b015b3eb1 REGTEST: add RFC7239 forwarded header tests
see also:
fbbbc33df REGTESTS: Do not use REQUIRE_VERSION for HAProxy 2.5+
When running with -m (check for missing backports), we often have to
fill in lots of information that can be determined automatically the
vast majority of the time:
- restart point (last cherry-picked ID from one of the last commits)
- current branch (HEAD)
- reference branch (the one that contains most of the last commits)
These elements are not that hard to determine, so let's make sure we
can fall back to them when running in missing mode.
The reference branch is guessed by looking at the upstream branch that
most frequently contains some of the last 10 commits. It can be inaccurate
if multiple branches exist with these commits, or when upstream changes
due to a non-LTS branch disappearing in the middle of the series, in which
case passing "-r" will help. But most of the time it works OK. It also gives
precedence to local branches over remote ones for such choices. A test in
2.4 at commit 793a4b520 correctly shows 2.6/master as the upstream despite
2.5 having been used for the early ones of the tag.
For the restart point, we assume that the most recent commit that was
backported serves as a reference (and not the most recently backported
commit). This means that the usual case where an old commit was found
to be missing will not fool the analysis. Commits are inspected from
2 commits before the last tag, and reordered from the parent's tree
to see which one is the last one.
With this, it's sufficient to issue "git-show-backports -q -m" to get
the list of backports from the upstream branch, restarting from the
last backported one.
Only sc-inc-gpc and sc-set-gpt do exist. The sc-inc-gpt mix-up crept in
with 71d189219 (DOC: config: Rework and uniformize how TCP/HTTP rules
are documented, 2021-10-14) and got copied in a92480462 (MINOR:
http-rules: Add missing actions in http-after-response ruleset,
2023-01-05).
Following the previous commit's logic, we enable the use of sc-add-gpc
from tcp-request connection, since it was probably forgotten in the
first place for sc-set-gpt0, and since sc-add-gpc was inspired by it,
it lacks it as well.
As sc-add-gpc was implemented in 5a72d03a58 ("MINOR: stick-table:
implement the sc-add-gpc() action"), this should only be backported to
2.8.
Both the documentation and the original developer's intent seem to
suggest that the sc-set-gpt/sc-set-gpt0 actions should be available from
tcp-request connection.
Yet, because it was probably forgotten when expr support was added to
sc-set-gpt0 in 0d7712dff0 ("MINOR: stick-table: allow sc-set-gpt0 to
set value from an expression"), it doesn't work and will report this
kind of error:
"internal error, unexpected rule->from=0, please report this bug!"
Fix the code to comply with the documentation and the expected
behavior.
This must be backported to all stable versions.
[for < 2.5, as only sc-set-gpt0 existed back then, the patch must be
manually applied to skip irrelevant parts]
fd.state is reported without the "0x" prefix in "show sess"; let's
support this during decoding.
This may be backported to all versions supporting this utility.
Released version 2.9-dev3 with the following main changes :
- BUG/MINOR: ssl: OCSP callback only registered for first SSL_CTX
- BUG/MEDIUM: h3: Properly report a C-L header was found to the HTX start-line
- MINOR: sample: add pid sample
- MINOR: sample: implement act_conn sample fetch
- MINOR: sample: accept_date / request_date return %Ts / %tr timestamp values
- MEDIUM: sample: implement us and ms variant of utime and ltime
- BUG/MINOR: sample: check alloc_trash_chunk() in conv_time_common()
- DOC: configuration: describe Td in Timing events
- MINOR: sample: implement the T* timer tags from the log-format as fetches
- DOC: configuration: add sample fetches for timing events
- BUG/MINOR: quic: Possible crash when acknowledging Initial v2 packets
- MINOR: quic: Export QUIC traces code from quic_conn.c
- MINOR: quic: Export QUIC CLI code from quic_conn.c
- MINOR: quic: Move TLS related code to quic_tls.c
- MINOR: quic: Add new "QUIC over SSL" C module.
- MINOR: quic: Add a new quic_ack.c C module for QUIC acknowledgements
- CLEANUP: quic: Defined but no more used function (quic_get_tls_enc_levels())
- MINOR: quic: Split QUIC connection code into three parts
- CLEANUP: quic: quic_conn struct cleanup
- MINOR: quic: Move the QUIC frame pool to its proper location
- BUG/MINOR: chunk: fix chunk_appendf() to not write a zero if buffer is full
- BUG/MEDIUM: h3: Be sure to handle fin bit on the last DATA frame
- DOC: configuration: rework the custom log format table
- BUG/MINOR: quic+openssl_compat: Non initialized TLS encryption levels
- CLEANUP: acl: remove cache_idx from acl struct
- REORG: cfgparse: extract curproxy as a global variable
- MINOR: acl: add acl() sample fetch
- BUILD: cfgparse: keep a single "curproxy"
- BUG/MEDIUM: bwlim: Reset analyse expiration date when the channel analyse ends
- MEDIUM: stream: Reset response analyse expiration date if there is no analyzer
- BUG/MINOR: htx/mux-h1: Properly handle bodyless responses when splicing is used
- BUG/MEDIUM: quic: consume contig space on requeue datagram
- BUG/MINOR: http-client: Don't forget to commit changes on HTX message
- CLEANUP: stconn: Move comment about sedesc fields on the field line
- REGTESTS: http: Create a dedicated script to test spliced bodyless responses
- REGTESTS: Test SPLICE feature is enabled to execute script about splicing
- BUG/MINOR: quic: reappend rxbuf buffer on fake dgram alloc error
- BUILD: quic: fix wrong potential NULL dereference
- MINOR: h3: abort request if not completed before full response
- BUG/MAJOR: http-ana: Get a fresh trash buffer for each header value replacement
- CLEANUP: quic: Remove quic_path_room().
- MINOR: quic: Amplification limit handling sanitization.
- MINOR: quic: Move some counters from [rt]x quic_conn anonymous struct
- MEDIUM: quic: Send CONNECTION_CLOSE packets from a dedicated buffer.
- MINOR: quic: Use a pool for the connection ID tree.
- MEDIUM: quic: Allow the quic_conn memory to be asap released.
- MINOR: quic: Release asap quic_conn memory (application level)
- MINOR: quic: Release asap quic_conn memory from ->close() xprt callback.
- MINOR: quic: Warning for OpenSSL wrapper QUIC bindings without "limited-quic"
- REORG: http: move has_forbidden_char() from h2.c to http.h
- BUG/MAJOR: h3: reject header values containing invalid chars
- MINOR: mux-h2/traces: also suggest invalid header upon parsing error
- MINOR: ist: add new function ist_find_range() to find a character range
- MINOR: http: add new function http_path_has_forbidden_char()
- MINOR: h2: pass accept-invalid-http-request down the request parser
- REGTESTS: http-rules: add accept-invalid-http-request for normalize-uri tests
- BUG/MINOR: h1: do not accept '#' as part of the URI component
- BUG/MINOR: h2: reject more chars from the :path pseudo header
- BUG/MINOR: h3: reject more chars from the :path pseudo header
- REGTESTS: http-rules: verify that we block '#' by default for normalize-uri
- DOC: clarify the handling of URL fragments in requests
- BUG/MAJOR: http: reject any empty content-length header value
- BUG/MINOR: http: skip leading zeroes in content-length values
- BUG/MEDIUM: mux-h1: fix incorrect state checking in h1_process_mux()
- BUG/MEDIUM: mux-h1: do not forget EOH even when no header is sent
- BUILD: mux-h1: shut a build warning on clang from previous commit
- DEV: makefile: add a new "range" target to iteratively build all commits
- CI: do not use "groupinstall" for Fedora Rawhide builds
- CI: get rid of travis-ci wrapper for Coverity scan
- BUG/MINOR: quic: mux started when releasing quic_conn
- BUG/MINOR: quic: Possible crash in quic_cc_conn_io_cb() traces.
- MINOR: quic: Add a trace for QUIC conn fd ready for receive
- BUG/MINOR: quic: Possible crash when issuing "show fd/sess" CLI commands
- BUG/MINOR: quic: Missing tasklet (quic_cc_conn_io_cb) memory release (leak)
- BUG/MEDIUM: quic: fix tasklet_wakeup loop on connection closing
- BUG/MINOR: hlua: fix invalid use of lua_pop on error paths
- MINOR: hlua: add hlua_stream_ctx_prepare helper function
- BUG/MEDIUM: hlua: streams don't support mixing lua-load with lua-load-per-thread
- MAJOR: threads/plock: update the embedded library again
- MINOR: stick-table: move the task_queue() call outside of the lock
- MINOR: stick-table: move the task_wakeup() call outside of the lock
- MEDIUM: stick-table: change the ref_cnt atomically
- MINOR: stick-table: better organize the struct stktable
- MEDIUM: peers: update ->commitupdate out of the lock using a CAS
- MEDIUM: peers: drop then re-acquire the wrlock in peer_send_teachmsgs()
- MEDIUM: peers: only read-lock peer_send_teachmsgs()
- MEDIUM: stick-table: use a distinct lock for the updates tree
- MEDIUM: stick-table: touch updates under an upgradable read lock
- MEDIUM: peers: drop the stick-table lock before entering peer_send_teachmsgs()
- MINOR: stick-table: move the update lock into its own cache line
- CLEANUP: stick-table: slightly reorder the stktable struct
- BUILD: defaults: use __WORDSIZE not LONGBITS for MAX_THREADS_PER_GROUP
- MINOR: tools: make ptr_hash() support 0-bit outputs
- MINOR: tools: improve ptr hash distribution on 64 bits
- OPTIM: tools: improve hash distribution using a better prime seed
- OPTIM: pools: use exponential back-off on shared pool allocation/release
- OPTIM: pools: make pool_get_from_os() / pool_put_to_os() not update ->allocated
- MINOR: pools: introduce the use of multiple buckets
- MEDIUM: pools: spread the allocated counter over a few buckets
- MEDIUM: pools: move the used counter over a few buckets
- MEDIUM: pools: move the needed_avg counter over a few buckets
- MINOR: pools: move the failed allocation counter over a few buckets
- MAJOR: pools: move the shared pool's free_list over multiple buckets
- MINOR: pools: make pool_evict_last_items() use pool_put_to_os_no_dec()
- BUILD: pools: fix build error on clang with inline vs forceinline
clang is more picky than gcc regarding duplicate "inline". The functions
declared with "forceinline" don't need to have "inline" since it's already
in the macro.
This aims at further reducing the contention on the free_list when using
global pools. The free_list pointer now appears for each bucket, and
both the alloc and the release code skip to the next bucket when ending
on a contended entry. The default entry used for allocations and
releases depends on the thread ID, so that locality is preserved as much
as possible under low contention.
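The allocation side can be sketched like this (bucket layout, names
and error handling are assumptions):

    static void *pool_get_shared(struct pool_head *pool)
    {
        /* pick a default bucket from the thread ID for locality */
        unsigned int bucket = tid % POOL_BUCKETS;
        unsigned int i;
        void *item = NULL;

        for (i = 0; i < POOL_BUCKETS; i++) {
            item = HA_ATOMIC_XCHG(&pool->free_list[bucket], POOL_BUSY);
            if (item != POOL_BUSY)
                break;  /* grabbed this bucket's list head */
            /* contended entry: skip to the next bucket */
            bucket = (bucket + 1) % POOL_BUCKETS;
        }
        return (item == POOL_BUSY) ? NULL : item;
    }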
It would be nice to improve the situation so that releases to the shared
pools don't consider the first entry's pointer but only an argument that
would be passed and that would correspond to the bucket in the thread's
cache. This would reduce computations and make sure that the shared
cache only contains items whose pointers match the same bucket. This was
not yet done. One possibility could be to keep the same splitting in the
local cache.
With this change, an h2load test with 5 * 160 conns & 40 streams on 80
threads that was limited to 368k RPS with the shared cache jumped to
3.5M RPS for 8 buckets, 4M RPS for 16 buckets, 4.7M RPS for 32 buckets
and 5.5M RPS for 64 buckets.
The failed allocation counter cannot depend on a pointer, but since it's
a perpetually increasing counter and not a gauge, we don't care where
it's incremented. Thus instead we're hashing on the TID. There's no
contention there anyway, but it's better not to waste the room in
the pool's heads and to move that with the other counters.
That's the same principle as for ->allocated and ->used. Here we return
the sum of the raw values, so the result still needs to be fed to
swrate_avg(). It also means that we now use the local ->used instead
of the global one for the calculations and do not need to call
pool_used() anymore on fast paths. The number of samples should likely
be divided by the number of buckets, but that's not done yet (better to
observe first).
A function pool_needed_avg() was added to report aggregated values for
the "show pools" command.
With this change, an h2load made of 5 * 160 conn * 40 streams on 80
threads raised from 1.5M RPS to 6.7M RPS.
That's the same principle as for ->allocated. The small difference here
is that it's no longer possible to decrement ->used in batches when
releasing clusters from the cache to the shared cache, so the counter
has to be decremented for each of them. But as it provides less
contention and it's done only during forced eviction, it shouldn't be
a problem.
A function "pool_used()" was added to return the sum of the entries.
It's used by pool_alloc_nocache() and pool_free_nocache() which need
to count the number of used entries. It's not a problem since such
operations are done when picking/releasing objects to/from the OS,
but it is a reminder that the number of buckets should remain small.
With this change, an h2load test made of 5 * 160 conn * 40 streams on
80 threads raised from 812k RPS to 1.5M RPS.
The ->used counter is one of the most stressed, and it heavily
depends on the ->allocated one, so let's first move ->allocated
to a few buckets.
A function "pool_allocated()" was added to return the sum of the entries.
It's important not to abuse it as it does iterate, so everywhere it's
possible to avoid it by keeping a local counter, it's better. Currently
it's used for limited pools which need to make sure they do not allocate
too many objects. That's an acceptable tradeoff to save CPU on large
machines at the expense of spending a little bit more on small ones which
normally are not under load.
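A sketch of the accessor (the per-bucket layout is an assumption):

    static inline unsigned int pool_allocated(const struct pool_head *pool)
    {
        unsigned int bucket, sum = 0;

        /* iterates over all buckets: fine for rare consultations,
         * but fast paths should keep a local counter instead
         */
        for (bucket = 0; bucket < POOL_BUCKETS; bucket++)
            sum += HA_ATOMIC_LOAD(&pool->buckets[bucket].allocated);
        return sum;
    }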
On many threads and without the shared cache, there can be extreme
contention on the ->allocated counter, the ->free_list pointer, and
the ->used counter. It's possible to limit this contention by spreading
the counters a little bit over multiple entries, which are summed up
when a consultation is needed. The criterion used to spread the values
cannot be related to the thread ID due to migrations, since we need to
keep consistent stats (allocated vs used).
Instead we'll just hash the pointer: it provides an index that does the
job and that is consistent for the object. With just a few entries (16
here, as it showed almost identical performance between global and
non-global pools), even iterations should be short enough during
measurements not to be a problem.
A pair of functions designed to ease pointer hash bucket calculation were
added, with one of them doing it for thread IDs because allocation failures
will be associated with a thread and not a pointer.
For now this patch only brings in the relevant parts of the infrastructure,
the CONFIG_HAP_POOL_BUCKETS_BITS macro that defaults to 6 bits when 512
threads or more are supported, 5 bits when 128 or more are supported, 4
bits when 16 or more are supported, otherwise 3 bits for small setups.
The array in the pool_head and the two utility functions are already
added. It should have no measurable impact beyond inflating the pool_head
structure.
The pool's allocation counter doesn't strictly need to be updated from
these functions; it may more efficiently be done in the caller (even out
of a loop for pool_flush() and pool_gc()), and doing so will also help
us spread the counters over an array later. The functions were renamed
_noinc and _nodec to make sure we catch any possible user in an external
patch. If needed, the original functions may easily be reimplemented as
inline functions.
Running a stick-table stress with -dMglobal under 56 threads shows
extreme contention on the pool's free_list, because it has to be
processed in two phases while the retry path only used a simple
cpu_relax().
Let's at least implement exponential back-off here to limit the neighbor's
noise and reduce the time needed to successfully acquire the pointer. Just
doing so shows there's still contention but almost doubled the performance,
from 1.1 to 2.1M req/s.
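The principle is the classic exponential back-off loop; a sketch (the
real code relies on the plock primitives, and the cap value here is an
assumption):

    unsigned int backoff = 1;

    while (!HA_ATOMIC_CAS(&pool->free_list, &old, new)) {
        unsigned int i;

        for (i = 0; i < backoff; i++)
            __ha_cpu_relax();
        if (backoff < 256)
            backoff <<= 1;   /* wait twice as long on each failure */
    }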
During tests it was noticed that the current hash is not that good
on 4- and 5- bit hashes. About 7.5% of all the 32-bit primes were tested
as candidates for the hash function, by submitting them 128 arrangements
of N pointers among 40k extracted from haproxy's pools, and the average
fill rates for 1- to 12- bit hashes were measured and compared. It was
clear that some values do not provide great hashes and other ones are
way more resistant.
The current value is not bad at all but delivers 42.6% unique 2-bit
outputs, 41.6% 3-bit, 38.0% 4-bit, 38.2% 5-bit and 37.1% 10-bit. Some
values did perform significantly better, among which 0xacd1be85 which
does 43.2% 2-bit, 42.5% 3-bit, 42.2% 4-bit, 39.2% 5-bit and 37.3% 10-bit.
The reverse value used in ptr2_hash() was really underperforming and
was replaced with 0x9d28e4e9, which does 49.6%, 40.4%, 42.6%, 39.1%,
and 37.2% respectively.
This should slightly improve the accuracy of the task and memory
profiling, and will be useful for pools.
When testing the pointer hash on 64-bit real pointers (map entries),
it appeared that the shift by 33 bits that hoped to compensate for the
3 null LSBs degrades the hash, and the centering is more optimal at
31-(bits+1)/2. This makes sense, since the topmost bit of the
multiplicator is 31, so for an input of 1 bit and 1 bit of output we
would always get zero. With the formula adjusted this way, we can get
up to ~15% more unique entries at 10 bits and ~24% more at 11 bits.
When dealing with macro-based size definitions, it is useful to be able
to hash pointers on zero bits so that the macro automatically returns a
constant 0. Until now only 1-32 was supported. Let's just add this
special case. It's automatically optimized out by the compiler since
the function is inlined.
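Putting the three observations together, the hash can be sketched as
follows (an illustration consistent with the descriptions above, not
the exact source):

    static inline unsigned int ptr_hash(const void *p,
                                        const unsigned int bits)
    {
        unsigned long long x = (unsigned long)p;

        if (!bits)
            return 0;   /* constant, optimized out when inlined */

        x *= 0xacd1be85U;   /* the prime retained above */
        /* center the output window on 31-(bits+1)/2 */
        return (unsigned int)(x >> (31 - (bits + 1) / 2)) &
               (~0U >> (32 - bits));
    }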