haproxy

Author	SHA1	Message	Date
Willy Tarreau	53461e4d94	CLEANUP: tinfo: better align fields in thread_ctx The introduction of buffer_wq[] in thread_ctx pushed a few fields around and the cache line alignment is less satisfying. And more importantly, even before this, all the lists in the local parts were 8-aligned, with the first one split across two cache lines. We can do better: - sched_profile_entry is not atomic at all, the data it points to is atomic so it doesn't need to be in the atomic-only region, and it can fill the 8-hole before the lists - the align(2*void) that was only before tasklets[] moves before all lists (and it's a nop for now) This now makes the lists and buffer_wq[] start on a cache line boundary, leaves 48 bytes after the lists before the atomic-only cache line, and leaves a full cache line at the end for 128-alignment. This way we still have plenty of room in both parts with better aligned fields.	2024-05-10 17:18:13 +02:00
Willy Tarreau	a5d6a79986	MEDIUM: dynbuf: make the buffer_wq an array of list heads Let's turn the buffer_wq into an array of 4 list heads. These are chosen by criticality. The DB_CRIT_TO_QUEUE() macro maps each criticality level into one of these 4 queues. The goal here clearly is to make it possible to wake up the most critical queues first in order to let some tasks finish their job and release buffers that others can use. In order to avoid having to look up all queues, a bit map indicates which queues are in use, which also avoids looping in the most common case where queues are empty.	2024-05-10 17:18:13 +02:00
Willy Tarreau	a214197ce7	MINOR: dynbuf: use the b_queue()/b_requeue() functions everywhere The code places that were used to manipulate the buffer_wq manually now just call b_queue() or b_requeue(). This will simplify the multiple list management later.	2024-05-10 17:18:13 +02:00
Willy Tarreau	d1c2f325a2	MINOR: dynbuf: add functions to help queue/requeue buffer_wait fields When failing an allocation we always do the same dance, add the buffer_wait struct to a list if it's not, and return. Let's just add dedicated functions to centralize this, this will be useful to implement a bit more complex logic. For now they're not used.	2024-05-10 17:18:13 +02:00
Willy Tarreau	72d0dcda8e	MINOR: dynbuf: pass a criticality argument to b_alloc() The goal is to indicate how critical the allocation is, between the least one (growing an existing buffer ring) and the topmost one (boot time allocation for the life of the process). The 3 tcp-based muxes (h1, h2, fcgi) use a common allocation function to try to allocate otherwise subscribe. There's currently no distinction of direction nor part that tries to allocate, and this should be revisited to improve this situation, particularly when we consider that mux-h2 can reduce its Tx allocations if needed. For now, 4 main levels are planned, to translate how the data travels inside haproxy from a producer to a consumer: - MUX_RX: buffer used to receive data from the OS - SE_RX: buffer used to place a transformation of the RX data for a mux, or to produce a response for an applet - CHANNEL: the channel buffer for sync recv - MUX_TX: buffer used to transfer data from the channel to the outside, generally a mux but there can be a few specificities (e.g. http client's response buffer passed to the application, which also gets a transformation of the channel data). The other levels are a bit different in that they don't strictly need to allocate for the first two ones, or they're permanent for the last one (used by compression).	2024-05-10 17:18:13 +02:00
Christopher Faulet	eca9831ec8	MINOR: muxes: Add ctl commands to get info on streams for a connection There are 2 new ctl commands that may be used to retrieve the current number of streams opened for a connection and its limit (the maximum number of streams a mux connection supports). For the PT and H1 muxes, the limit is always 1 and the current number of streams is 0 for idle connections, otherwise 1 is returned. For the H2 and the FCGI muxes, this information is already available in the mux connection. For the QUIC mux, the limit is also directly available. It is the maximum initial sub-ID of bidirectional stream allowed for the connection. For the current number of streams, it is the number of SC attached on the connection and the number of not already attached streams present in the "opening_list" list.	2024-05-06 22:00:00 +02:00
Christopher Faulet	96f8b7ad08	MEDIUM: stconn/muxes: Add an abort reason for SE shutdowns on muxes A reason is now passed as a parameter to mux shutdowns to convey additional info about the abort, if any. No info means no abort or only a generic one. For now, the reason is composed of two 32-bit integers. The first one represents the abort code and the other one represents the info about the code (for instance the source). The code should be interpreted according to the associated info. One info is the source, encoded on 5 bits. Other bits are reserved for now. For now, the muxes are the only supported source. But we can imagine extending it to applets, streams, health-checks... The current design is quite simple and will most probably evolve. But the idea is to let the opposite side forward some errors and let a mux know why its stream was aborted. At first glance, an abort reason must only be evaluated if the SE_SHW_SILENT flag is set. The main goal in the short term is to forward some H2 RST_STREAM codes because it is mandatory for gRPC applications, mainly to forward gRPC cancellation from an H2 client to an H2 server. But we can imagine altering this reason at the application level to enrich it. It could also be used to report more accurate errors in logs.	2024-05-06 22:00:00 +02:00
Aurelien DARRAGON	48e0efb00b	MEDIUM: log: optimizing tmp->type handling in sess_build_logline() Instead of chaining 2 switchcases and performing encoding checks for all nodes let's actually split the logic in 2: first handle simple node types (text/separator), and then handle dynamic node types (tag, expr). Encoding options are only evaluated for dynamic node types. Also, last_isspace is always set to 0 after next_fmt label, since next_fmt label is only used for dynamic nodes, thus != LOG_FMT_SEPARATOR. Since LF_NODE_WITH_OPT() macro (which was introduced recently) is now unused, let's get rid of it. No functional change should be expected. (Use diff -w to check patch changes since reindentation makes the patch look heavy, but in fact it remains fairly small)	2024-05-03 16:48:21 +02:00
Ilia Shipitsin	a65c6d3574	CLEANUP: assorted typo fixes in the code and comments This is 42nd iteration of typo fixes	2024-05-03 09:01:36 +02:00
Amaury Denoyelle	53782b9ea5	MINOR: stats: extract proxy clear-counter in a dedicated function Split code related to proxies list looping in cli_parse_clear_counters() to a new dedicated function. This function is placed in the new module stats-proxy.	2024-05-02 16:43:26 +02:00
Amaury Denoyelle	f0644d1bd7	REORG: stats: define stats-proxy source module Create a new module stats-proxy. Move stats functions related to proxies list looping in it. This roughly halves the size of the stats source file.	2024-05-02 16:42:36 +02:00
William Lallemand	964f093504	CLEANUP: ssl: rename new_ckch_store_load_files_path() to ckch_store_new_load_files_path() Rename the new_ckch_store_load_files_path() function to ckch_store_new_load_files_path(), in order to be more consistent.	2024-05-02 16:03:20 +02:00
Amaury Denoyelle	10ab56831e	MINOR: stats: convert age as generic column for proxy stat Convert FN_AGE in stat_cols_px[] as generic columns. These values will be automatically used for dump/preload of a stats-file. Remove the srv_lastsession() / be_lastsession() functions, which are now useless as last_sess is calculated via me_generate_field().	2024-05-02 10:55:25 +02:00
Amaury Denoyelle	634cc2a5d8	MINOR: counters: move last_change into counters struct last_change was a member present in both proxy and server struct. It is used as an age statistics to report the last update of the object. Move last_change into fe_counters/be_counters. This is necessary to be able to manipulate it through generic stat column and report it into stats-file. Note that there is a change for proxy structure with now 2 different last_change values, on frontend and backend side. Special care was taken to ensure that the value is initialized only on the proxy side. The other value is set to 0 unless a listen proxy is instantiated. For the moment, only backend counter is reported in stats. However, with now two distinct values, stats could be extended to report it on both side.	2024-05-02 10:55:25 +02:00
Amaury Denoyelle	fec2ae9b76	MINOR: stats: support rate in stats-file Implement support for FN_RATE stat column into stat-file. For the output part, only minimal change is required. Reuse the function read_freq_ctr() to print the same value in both stats output and stats-file dump. For counter preloading, define a new utility function preload_freq_ctr(). This can be used to initialize a freq-ctr type by preloading previous period value. Reuse this function in load_ctr() during stats-file parsing. At the moment, no rate column is defined as generic. Thus, this commit does not have functional change. This will be changed as soon as FN_RATE are converted to generic columns.	2024-05-02 10:55:25 +02:00
Amaury Denoyelle	639e73f8f2	MINOR: counters: move freq-ctr from proxy/server into counters struct Move freq-ctr defined in proxy or server structures into their dedicated fe_counters/be_counters struct. Functionally no change here. This commit will allow to convert rate stats columns to generic ones, which is mandatory to manipulate them in the stats-file.	2024-05-02 10:55:25 +02:00
Amaury Denoyelle	4e9e841878	MINOR: stats: prepare stats-file support for values other than FN_COUNTER Currently, only FN_COUNTER are dumped and preloaded via a stats-file. Thus in several places we relied on the assumption that only FN_COUNTER are valid in stats-file context. New stats types will soon be implemented as they are also eligible for statistics reloading on process startup. Thus, prepare stats-file functions to remove any FN_COUNTER restriction. As part of this change, generate_stat_tree() now uses stcol_is_generic() for stats name tree indexing before stats-file parsing. Also related to stats-file parsing, the individual counter preloading step has been extracted from line parsing into a dedicated new function load_ctr(). This will allow to extend it to support multiple mechanisms of counter preloading depending on the stats type.	2024-05-02 10:55:25 +02:00
Valentine Krasnobaeva	5cbb278fae	MINOR: capabilities: add cap_sys_admin support If the 'namespace' keyword is used in the backend server settings and/or in the bind string, it means that the haproxy process will call setns() to change its default namespace to the configured one and then, it will create a socket in this new namespace. The setns() syscall requires the CAP_SYS_ADMIN capability in the process Effective set (see man 2 setns). Otherwise, the process must be run as root. To avoid running haproxy as root, let's add the cap_sys_admin capability in the same way as we already added the support for some other network capabilities. As CAP_SYS_ADMIN belongs to the CAP_SYS_* capabilities type, let's add a separate flag LSTCHK_SYSADM for it. This flag is set if the 'namespace' keyword was found during configuration parsing. The flag may be unset only in prepare_caps_for_setuid() or in prepare_caps_from_permitted_set(), which inspect process EUID/RUID and Effective and Permitted capabilities sets. If the system doesn't support Linux capabilities or 'cap_sys_admin' was not set in 'setcap', but the 'namespace' keyword is present in the configuration, we keep the previous strict behaviour. A process that has changed uid to a non-privileged user will terminate with an alert. This alert invites the user to recheck their configuration. In the case where haproxy starts and runs under a non-root user and 'cap_sys_admin' is not set, but the 'namespace' keyword is present, this patch does not change the previous behaviour either. We'll still let the user try their configuration, but we inform via a warning that unexpected things, like socket creation errors, may occur.	2024-04-30 21:40:17 +02:00
Valentine Krasnobaeva	d3fc982cd7	MEDIUM: proto: make common fd checks in sock_create_server_socket quic_connect_server(), tcp_connect_server(), uxst_connect_server() duplicate the same code to check the different ERRNOs that socket() and setns() may return. They also duplicate some runtime condition checks applied to the obtained server socket fd. So, in order to remove these duplications and to improve code readability, let's encapsulate socket() and setns() ERRNOs handling in sock_handle_system_err(). It must be called just before the fd's runtime condition checks, which we also move into sock_create_server_socket for the same reason.	2024-04-30 21:39:24 +02:00
Valentine Krasnobaeva	772d070ab5	MINOR: sock_set_mark: take sock family in account SO_MARK, SO_USER_COOKIE, SO_RTABLE socket options (used to set the special mark/ID on socket, in order to perform mark-based routing) are only supported by AF_INET sockets. So, let's check socket address family, when we enter into this function.	2024-04-30 21:38:29 +02:00
Aurelien DARRAGON	9931a62c3f	BUG/MINOR: log: fix global lf_expr node options behavior (2nd try) In `98b44e8` ("BUG/MINOR: log: fix global lf_expr node options behavior"), I properly restored global node options behavior for when encoding is not used, however the fix is not optimal when encoding is involved: Indeed, encoding logic in sess_build_logline() relies on global node options to know if encoding must be handled expression-wide or individually. However, because of the above fix, if an expression is made of 1 or multiple nodes that all set an encoding option manually (without '%o'), we consider that the option was set globally, but that's probably not what the user intended. Instead we should only evaluate global options from '%o', so that it remains possible to skip global encoding when needed. No backport needed.	2024-04-30 10:10:35 +02:00
William Lallemand	95949e6868	MINOR: httpclient: allow to use absolute URI with new flag HC_F_HTTPROXY The new HC_F_HTTPPROXY flag allows to use an absolute URI within a request that won't be modified in order to use an http proxy.	2024-04-29 17:10:47 +02:00
Aurelien DARRAGON	9bdce67585	CLEANUP: log: add a macro to know if a lf_node is configurable LF_NODE_WITH_OPT(node) returns true if the node's option may be set and thus should be considered. Logic is based on logformat node's type: for now only TAG and FMT nodes can be configured.	2024-04-29 14:47:37 +02:00
Aurelien DARRAGON	0e2aea8224	CLEANUP: tools/cbor: rename cbor_encode_ctx struct members Rename e_byte_fct to e_fct_byte and e_fct_byte_ctx to e_fct_ctx, and adjust some comments to make it clear that e_fct_ctx is here to provide additional user-ctx to the custom cbor encode function pointers. For now, only e_fct_byte function may be provided, but we could imagine having e_fct_int{16,32,64}() one day to speed up the encoding when we know we can encode multiple bytes at a time, but for now it's not worth the hassle.	2024-04-29 14:47:37 +02:00
Willy Tarreau	1db3a390bb	MINOR: list: add a macro to detect that a list contains at most one element The new LIST_ATMOST1() test verifies that the designated element is either alone or points on both sides to the same element. This is used to detect that a list has at most a single element, or that an element about to be deleted was the last one of a list.	2024-04-27 09:36:36 +02:00
Aurelien DARRAGON	c614fd3b9f	MINOR: log: add +cbor encoding option In this patch, we make use of the CBOR (RFC8949) encode helper functions from the previous commit to implement '+cbor' encoding option for log- formats. The logic behind it is pretty similar to '+json' encoding option, except that the produced output is a CBOR payload written in HEX format so that it remains compatible to use this with regular syslog endpoints. Example: log-format "%{+cbor}o %[int(4)] test %(named_field)[str(ok)]" Will produce: BF6B6E616D65645F6669656C64626F6BFF Detailed view (from cbor.me): BF # map() 6B # text(11) 6E616D65645F6669656C64 # "named_field" 62 # text(2) 6F6B # "ok" FF # primitive() If the option isn't set globally, but on a specific node instead, then only the value will be encoded according to CBOR specification. Example: log-format "test cbor bool: %{+cbor}[bool(true)]" Will produce: test cbor bool: F5	2024-04-26 18:39:32 +02:00
Aurelien DARRAGON	810303e3e6	MINOR: tools: add cbor encode helpers Add cbor helpers to encode strings (bytes/text) and integers according to RFC8949, also add cbor_encode_ctx struct to pass encoding options such as how to encode a single byte.	2024-04-26 18:39:32 +02:00
Aurelien DARRAGON	3f7c8387c0	MINOR: log: add +json encoding option In this patch, we add the "+json" log format option that can be set globally or per log format node. What it does is set the LOG_OPT_ENCODE_JSON flag for the current context, which is provided to all lf_* log building functions. This way, all lf_* are now aware of this option and try to comply with JSON specification when the option is set. If the option is set globally, then sess_build_logline() will produce a map-like object with key=val pairs for named logformat nodes. (logformat nodes that don't have a name are simply ignored). Example: log-format "%{+json}o %[int(4)] test %(named_field)[str(ok)]" Will produce: {"named_field": "ok"} If the option isn't set globally, but on a specific node instead, then only the value will be encoded according to JSON specification. Example: log-format "{ \"manual_key\": %(named_field){+json}[bool(true)] }" Will produce: {"manual_key": true} When the option is set, +E option will be ignored, and partial numerical values (ie: because of logasap) will be encoded as-is.	2024-04-26 18:39:32 +02:00
Aurelien DARRAGON	b7c3d8c87c	MINOR: log: add +bin logformat node option Support '+bin' option argument on logformat nodes to try to preserve binary output type with binary sample expressions. For this, we rely on the log/sink API which is capable of conveying binary data since all related functions don't search for a terminating NULL byte in provided log payload as they take a string pointer and a string length as argument. Example: log-format "%{+bin}o %[bin(00AABB)]" Will produce: 00aabb (output was piped to `hexdump -ve '1/1 "%.2x"'` to dump raw bytes as HEX characters) This should be used carefully, because many syslog endpoints don't expect binary data (especially NULL bytes). This is mainly intended for use with set-var-fmt actions or with ring/udp log endpoints that know how to deal with such binary payloads. Also, this option is only supported globally (for use with '%o'), it will not have any effect when set on an individual node. (it makes no sense to have binary data in the middle of log payload that was started without binary data option)	2024-04-26 18:39:31 +02:00
Aurelien DARRAGON	2caa921abf	MINOR: log: add LOG_OPT_NONE flag Add LOG_OPT_NONE flag for default value. Flag is not explicitly used yet but this way we make it official that 0 value means NONE.	2024-04-26 18:39:31 +02:00
Aurelien DARRAGON	a1583ec7c7	MINOR: log: make all lf_* sess build helper static There is no need to expose such functions since they are only involved in the log building process that occurs inside sess_build_logline(). Making functions static and removing their public prototype to ease code maintenance.	2024-04-26 18:39:31 +02:00
Aurelien DARRAGON	507223d527	MINOR: log: global lf_expr node options Add options to lf_expr->nodes to store global options (those that are common to all node) for easier access. No functional change should be expected.	2024-04-26 18:39:31 +02:00
Aurelien DARRAGON	7ff4f09e23	MINOR: log: store lf_expr nodes inside substruct Add another struct level inside lf_expr struct to allow new information to be stored alongside lf_expr nodes.	2024-04-26 18:39:31 +02:00
Amaury Denoyelle	374dc08611	MINOR: stats: parse header lines from stats-file This patch implements parsing of header lines from stats-file. A header line is defined as starting with '#' character. It is directly followed by a domain name. For the moment, either 'fe' or 'be' is allowed. The following lines will contain counter values relative to the domain context until the next header line. This is implemented via static function parse_header_line(). It first sets the domain context used during apply_stats_file(). A stats column array is generated to contain the order in which columns are stored. This will be reused to parse the values of the following lines. If an invalid line is found and no header was parsed, consider the stats-file as ill-formatted and stop parsing. This allows to immediately interrupt parsing if a garbage file was used without emitting a ton of warnings to the user.	2024-04-26 11:34:02 +02:00
Amaury Denoyelle	34ae7755b3	MINOR: stats: apply stats-file on process startup This commit is the first one of a series to implement preloading of haproxy counters via stats-file parsing. This patch defines a basic apply_stats_file() function. It implements reading line by line of a stats-file without any parsing for the moment. It is called automatically on process startup via init().	2024-04-26 11:29:25 +02:00
Amaury Denoyelle	83731c8048	MINOR: guid: define guid_is_valid_fmt() Extract GUID format validation in a dedicated function named guid_is_valid_fmt(). For the moment, it is only used on guid_insert(). This will be reused when parsing stats-file, to ensure GUID has a valid format before tree lookup.	2024-04-26 11:29:25 +02:00
Amaury Denoyelle	bc3c117dc0	MINOR: ist: define iststrip() new function Implement iststrip(). This function removes any trailing newline sequence if present from an ist.	2024-04-26 11:29:25 +02:00
Amaury Denoyelle	e74148fb7c	MEDIUM: stats: implement dump stats-file CLI Define a new CLI command "dump stats-file" with its handler cli_parse_dump_stat_file(). It will loop twice on proxies_list to dump first frontend and then backend side. It reuses the common function stats_dump_stat_to_buffer(), using STAT_F_BOUND to restrict on the correct side. A new module stats-file.c is added to regroup function specifics to stats-file. It defines two main functions : * stats_dump_file_header() to generate the list of column list prefixed by the line context, either "#fe" or "#be" * stats_dump_fields_file() to generate each stat lines. Object without GUID are skipped. Each stat entry is separated by a comma. For the moment, stats-file does not support statistics modules. As such, stats_dump_*_line() functions are updated to prevent looping over stats module on stats-file output.	2024-04-26 10:20:57 +02:00
Amaury Denoyelle	83281303f6	MINOR: stats: define stats-file output format support Prepare stats function to handle a new format labelled "stats-file". Its purpose is to generate a statistics dump with a format close to the CSV output. Such output will be then used to preload haproxy internal counters on process startup. stats-file output differs from a standard CSV on several points. First, only an excerpt of all statistics is outputted. All values that do not make sense to preload are excluded. For the moment, stats-file only lists stats fully defined via "struct stat_col" method. Contrary to a CSV, all columns of a stats-file will be filled. As such, empty field value is used to mark stats which should not be outputted. Some adaptations specific to stats-file are necessary in me_generate_field(). First, stats-file will output values from frontend and backend sides separately, with their own respective set of columns. As such, an empty field value is returned if stat is not defined for either frontend/listener, or backend/server when outputting the other side. Also, as stats-file does not support empty columns, stcol_hide() is not used for it. A minor adjustment was necessary for stats_fill_fe_line() to pass context flags. This is necessary to detect stat output format. All other listener/server/backend corresponding functions already have it.	2024-04-26 10:20:57 +02:00
Amaury Denoyelle	6615252656	MEDIUM: stats: convert counters to new column definition Convert most of proxy counters statistics to new "struct stat_col" definition. Remove their corresponding switch..case entries in stats_fill_*_line() functions. Their values are automatically calculated via me_generate_field() invocation. Along with this, also complete stcol_hide() when some stats should be hidden. Only a few counters were not converted. This is because they rely on values stored outside of fe/be_counters structure, which me_generate_field() cannot use for now.	2024-04-26 10:20:57 +02:00
Amaury Denoyelle	a7810b7be6	MINOR: stats: implement automatic metric generation from stat_col This commit is a direct follow-up of the previous one which defines a new type "struct stat_col" to fully define a statistic entry. Define a new function metric_generate(). For metrics statistics, it is able to automatically calculate a stat value field for "offsets" from "struct stat_col". Use it in stats_fill_*_stats() functions. Maintain a fallback to previously used switch-case for old-style statistics. This commit does not introduce functional change as currently no statistic is defined as "struct stat_col". This will be the subject of a future commit.	2024-04-26 10:20:57 +02:00
Amaury Denoyelle	65624876f2	MINOR: stats: introduce a more expressive stat definition method Previously, statistics were simply defined as a list of name_desc, as for example "stat_cols_px" for proxy stats. No notion of type was fixed for each stat definition. This correspondence was done individually inside stats_fill_*_line() functions. This makes the process of defining new statistics tedious. Implement a more expressive stat definition method via a new API. A new type "struct stat_col" for stat column to replace name_desc usage is defined. It contains a field to store the stat nature and format. A <cap> field is also defined to be able to define a proxy stat only for certain types of objects. This new type is also further extended to include counter offsets. This allows to define a method to automatically generate a stat value field from a "struct stat_col". This will be the subject of a future commit. New type "struct stat_col" is fully compatible with name_desc. This allows to gradually convert stats definition. The focus will be first for proxies counters to implement statistics preservation on reload.	2024-04-26 10:20:57 +02:00
Amaury Denoyelle	861370a6d4	MINOR: stats: update ambiguous "metrics" naming to "stat_cols" The name "metrics" was chosen to represent the various list of haproxy exposed statistics. However, it is deemed as ambiguous as some stats are indeed metric in the true sense, but some are not, as highlighted by various "enum field_origin" values. Replace it by the new name "stat_cols" for statistic columns. Along with the already existing notion of stat lines it should better reflect its purpose.	2024-04-26 10:20:57 +02:00
Christopher Faulet	608e23c495	MINOR: peers: Use a static variable to wait a resync on reload When a process is reloaded, the old process must perform a synchronisation with the new process. To do so, the sync task notifies the local peer to proceed and waits. Internally, the sync task used PEERS_F_DONOTSTOP flag to know it should wait. However, this flag was only set/unset in a single function. There is no real reason to set a flag to do so. A static variable set to 1 when the resync starts and to 0 when it is finished is enough.	2024-04-25 18:29:58 +02:00
Christopher Faulet	5df54f4796	DEV: flags/peers: Decode PEER and PEERS flags Decode peer and peers flags via peer_show_flags() and peers_show_flags() functions.	2024-04-25 18:29:58 +02:00
Christopher Faulet	697bd69efc	REORG: peers: Move peer and peers flags in the corresponding header file PEER_F_* and PEERS_F_ * flags were moved to <peer-t.h> header file. It is mandatory to decode them from "flags" dev tool.	2024-04-25 18:29:58 +02:00
Christopher Faulet	c904f7b440	MEDIUM: peers: Use true states for the learn state of a peer Some flags were used to define the learn state of a peer. It was a bit confusing, especially because the learn state of a peer is manipulated from the peer applet but also from the sync task. It is harder to understand the transitions if it is based on flags than if it is based on a dedicated state defined by an enum. It is the purpose of this patch. Now, we can define the following rules regarding this learn state: * A peer is assigned to learn by the sync task * The learn state is then changed by the peer itself to notify the learning is in progress and when it is finished. * Finally, when the peer finished to learn, the sync task must acknowledge it by unassigning the peer.	2024-04-25 18:29:57 +02:00
Christopher Faulet	ea9bd6d075	MEDIUM: peers: Use true states for the peer applets as seen from outside This patch is a cleanup of the recent change about the relation between a peer and the applet used to deal with I/O. Three flags were introduced to reflect the peer applet state as seen from outside (from the sync task in fact). Using flags instead of true states was in fact a bad idea. This works but it is confusing, especially because it was mixed with LEARN and TEACH peer flags. So, to make it clearer, we are now using a dedicated state for this purpose. From the outside, the peer may be in one of the following states with respect to its applet: * the peer has no applet, it is stopped (PEER_APP_ST_STOPPED). * the peer applet was created with a validated connection from the protocol perspective. But the sync task must synchronize it with the peers section. It is in starting state (PEER_APP_ST_STARTING). * The starting state was acknowledged by the sync task, the peer applet can start to process messages. It is in running state (PEER_APP_ST_RUNNING). * The last peer applet was released and the associated connection closed. But the sync task must synchronize it with the peers section. It is in stopping state (PEER_APP_ST_STOPPING). Functionally speaking, there is no true change here. But it should be easier to understand now. In addition to these changes, __process_peer_state() function was renamed sync_peer_app_state().	2024-04-25 18:29:57 +02:00
Christopher Faulet	bea541b70a	MINOR: applet: Add a function to know the side where an applet was created appctx_is_back() function may be used to know if an applet was created on frontend side or on backend side. It may be handy for some applets that may exist on both sides, like peer applets.	2024-04-25 18:29:57 +02:00
Willy Tarreau	13515d9fbe	MINOR: intops: add a pair of functions to check multi-byte ranges These new functions is_char4_outside() and is_char8_outside() are meant to be used to verify if any of the 4 or 8 chars represented respectively by a uint32_t or a uint64_t is outside of the min,max byte range passed in argument. This is the simplified, fast version of the function so it is restricted to less than 0x80 distance between min and max (sufficient to validate chars). Extra functions are also provided to check for min or max alone as well, with the same restriction. The use case typically is to check that the output of read_u32() or read_u64() contains exclusively certain bytes.	2024-04-24 15:54:55 +02:00
David Carlier	98d22f212a	MEDIUM: shctx: Naming shared memory context From Linux 5.17, anonymous regions can be named via prctl/PR_SET_VMA so caches can be identified when looking at HAProxy process memory mapping. The most likely error is lack of kernel support; as a result we ignore it: if the naming fails, the mapping of the memory context ought to still occur.	2024-04-24 10:25:38 +02:00
Tim Duesterhus	aab6477b67	MINOR: Add `ha_generate_uuid_v7` This function generates a version 7 UUID as per draft-ietf-uuidrev-rfc4122bis-14.	2024-04-24 08:23:56 +02:00
Tim Duesterhus	c6cea750a9	MINOR: tools: Rename `ha_generate_uuid` to `ha_generate_uuid_v4` This is in preparation of adding support for other UUID versions.	2024-04-24 08:23:56 +02:00
Willy Tarreau	19f8762a98	BUILD: stick-tables: silence build warnings when threads are disabled Since 3.0-dev7 with commit `1a088da7c2` ("MAJOR: stktable: split the keys across multiple shards to reduce contention"), building without threads yields a warning about the shard not being used. This is because the locks API does nothing of its arguments, which is the only place where the shard is being used. We cannot modify the lock API to pretend to consume its argument because quite often it's not even instantiated. Let's just pretend we consume shard using an explicit ALREADY_CHECKED() statement instead. While we're at it, let's make sure that XXH32() is not called when there is a single bucket! No backport is needed.	2024-04-24 08:23:56 +02:00
Amaury Denoyelle	341bf913d4	MINOR: stats: use STAT_F_* prefix for flags Some flags are defined during statistics generation and output. They use the prefix STAT_* which is also used for other purposes. Rename them with the new prefix STAT_F_* to differentiate them from the other usages.	2024-04-22 16:25:18 +02:00
Amaury Denoyelle	e97375dcab	MINOR: stats: use stricter naming stats/field/line Several unique names were used for different purposes under statistics implementation. This caused the code to be difficult to understand. * stat/stats name is removed when a more specific name could be used * restrict field usage to purely refer to <struct field> which represents a raw stat value. * use "line" naming to represent an array of <struct field>	2024-04-22 16:25:18 +02:00
Amaury Denoyelle	8dbb74542f	MINOR: stats: rename info stats Info are used to expose haproxy global metrics. It is similar to proxy statistics and any other module. As such, rename info indexes using SI_I_INF_* prefix. Also info variable is renamed stat_line_info. Thanks to this, naming is now consistent between info and other statistics. It will help to integrate it as a "global" statistics module.	2024-04-22 16:25:18 +02:00
Amaury Denoyelle	02e0dd6d30	MINOR: stats: rename ambiguous stat_l and stat_count Statistics were extended with the introduction of stats module. This mechanism allows to expose various metrics for several haproxy components. As a consequence of this, some static variables were transformed to dynamic ones to be able to regroup all statistics definition. Rename these variables with more explicit naming : * stat_lines can be used to generate one line of statistics for any module using struct field as value * metrics and metrics_len are used to store description of metrics indexed by module Note that info is not integrated in the statistics module mechanism. However, it could be done in the future to better reflect its purpose.	2024-04-22 16:25:18 +02:00
Amaury Denoyelle	8fc0b18087	MINOR: stats: rename proxy stats This commit is the first one of a series which adjusts naming convention for stats module. The objective is to remove ambiguity and better reflect how stats are implemented, especially since the introduction of stats module. This patch renames elements related to proxies statistics. One of the main changes is to rename ST_F_* statistics indexes prefix with the new name ST_I_PX_*. This removes the reference to field which represents another concept in the stats module. In the same vein, global stat_fields variable is renamed metrics_px.	2024-04-22 16:25:18 +02:00
Amaury Denoyelle	c02ec9a9db	BUG/MINOR: backend: use cum_sess counters instead of cum_conn This commit is part of a series to align counters usage between frontends/listeners on one side and backends/servers on the other. "stot" metric refers to the total number of sessions. On backend side, it is interpreted as a number of streams. Previously, this was accounted using <cum_sess> be_counters field for servers, but <cum_conn> instead for backend proxies. Adjust this by using <cum_sess> for both proxies and servers. As such, <cum_conn> field can be removed from be_counters. Note that several diagnostic messages which report total frontend and backend connections were adjusted to use <cum_sess>. However, this is outdated and misleading information as it does report the stream count on backend side. These messages should be fixed in a separate commit. This should be backported to all stable releases.	2024-04-22 10:35:18 +02:00
Amaury Denoyelle	93066be32d	MINOR: backend: use be_counters for health down accounting This commit is the first one of a series which aims to align counters usage between frontends/listeners on one side and backends/servers on the other. Remove <down_trans> field from proxy structure. Use instead the same name field from be_counters structure, which is already used for servers.	2024-04-22 10:35:18 +02:00
Christopher Faulet	fbc0850d36	MEDIUM: muxes: Use one callback function to shut a mux stream mux-ops .shutr and .shutw callback functions are merged into a single function, called .shut. The shutdown mode is still passed as argument, muxes are responsible to test it. Concretely, .shut() function of each mux is now the content of the old .shutw() followed by the content of the old .shutr().	2024-04-19 16:33:40 +02:00
Christopher Faulet	1e38ac72ce	MEDIUM: stconn: Use one function to shut connection and applet endpoints se_shutdown() function is now used to perform a shutdown on a connection endpoint and an applet endpoint. The same function is used for both. sc_conn_shut() function was removed and appctx_shut() function was updated to only deal with the applet stuff.	2024-04-19 16:33:35 +02:00
Christopher Faulet	4b80442832	MEDIUM: stconn: Explicitly pass shut modes to shut applet endpoints It is the same as the previous patch but for applets. Here there is already only one function. But with this patch, appctx_shut() function was modified to explicitly get shutdown mode as parameter. In addition appctx_shutw() was removed.	2024-04-19 16:25:06 +02:00
Christopher Faulet	c96a873ba3	MEDIUM: stconn: Use only one SC function to shut connection endpoints The SC API to perform shutdowns on connection endpoints was unified to have only one function, sc_conn_shut(), with read/write shut modes passed explicitly. It means sc_conn_shutr() and sc_conn_shutw() were removed. The next step is to do the same at the mux level.	2024-04-19 16:25:06 +02:00
Christopher Faulet	d2c3f8dde7	MINOR: stconn/connection: Move shut modes at the SE descriptor level CO_SHR_* and CO_SHW_* modes are in fact used by the stream-connectors to instruct the muxes how streams must be shut down. It is then the mux responsibility to decide if it must be propagated to the connection layer or not. And in this case, the modes above are only tested to pass a boolean (clean or not). So, it is not consistent to still use connection related modes for information set at an upper layer and never used by the connection layer itself. These modes are thus moved at the sedesc level and merged into a single enum. Idea is to add more modes, not necessarily mutually exclusive, to pass more info to the muxes. For now, it is a one-for-one renaming.	2024-04-19 16:24:46 +02:00
Christopher Faulet	f58883002c	BUG/MINOR: stconn: Fix sc_mux_strm() return value Since the beginning, this function returns a pointer on an appctx while it should be a void pointer. It is the caller responsibility to cast it to the right type, the corresponding mux stream in this case. However, it is not a big deal because this function is unused for now. Only the unsafe one is used. This patch must be backported as far as 2.6.	2024-04-19 15:31:06 +02:00
Olivier Houchard	a7caa14a64	MINOR: stats: Get the right prototype for stats_dump_html_end(). When the stat code was reorganized, and the prototype to stats_dump_html_end() was moved to its own header, it missed the function arguments. Fix that. This should fix issue 2540.	2024-04-19 01:54:00 +02:00
Amaury Denoyelle	0109c0658d	REORG: stats: extract JSON related functions This commit is similar to the previous one. This time it deals with functions related to stats JSON output.	2024-04-18 17:04:08 +02:00
Amaury Denoyelle	b8c1fdf24e	REORG: stats: extract HTML related functions Extract functions related to HTML stats webpage from stats.c into a new module named stats-html. This allows to reduce stats.c to roughly half of its original size.	2024-04-18 17:04:08 +02:00
Amaury Denoyelle	b3d5708adc	MINOR: stats: remove implicit static trash_chunk usage A static variable trash_chunk was used as implicit buffer in most of the stats output functions. It was a one-line buffer used as temporary storage before emitting to the final applet or CLI buffer. Replace it by a buffer defined in show_stat_ctx structure. This allows to retrieve it in most of the stats output functions. An additional parameter was added for the functions where context was not already used. This renders the code cleaner and will allow to split stats.c in several source files. As a result of a new member into show_stat_ctx, per-command context max size has increased. This forces to increase APPLET_MAX_SVCCTX to ensure pool size is big enough. Increase it to 128 bytes which includes some extra room for the future.	2024-04-18 17:04:08 +02:00
Christopher Faulet	9b3a27f70c	BUILD: linuxcap: Properly declare prepare_caps_from_permitted_set() Expected arguments were not specified in the prepare_caps_from_permitted_set() function declaration. It is an issue for some compilers, for instance clang. But at the end, it is unexpected and deprecated. No backport needed, except if `f0b6436f57` ("MEDIUM: capabilities: check process capabilities sets") is backported.	2024-04-18 10:17:38 +02:00
Christopher Faulet	40aa87a28f	BUG/MEDIUM: applet: Fix applet API to put input data in a buffer applet_putblk and co were added to simplify applets. In 2.8, a fix was pushed to deal with all errors as a room error because the vast majority of applets didn't expect other kind of errors. The API was changed with the commit `389b7d1f7b` ("BUG/MEDIUM: applet: Fix API for function to push new data in channels buffer"). Unfortunately and for an unknown reason, the fix totally failed. Checks on channel functions were just wrong and not consistent. applet_putblk() function is especially affected because the error is returned but no flag is set on the SC to request more room. Because of this bug, applets relying on it may be blocked, waiting for more room, and never woken up. It is an issue for the peer and spoe applets. This patch must be backported as far as 2.8.	2024-04-18 09:17:03 +02:00
William Lallemand	10224d72fd	BUG/MINOR: ssl: fix crt-store load parsing The crt-store load line parser relies on offsets of members of the ckch_conf struct. However the new "alias" keyword has an offset of -1, because it does not need to be used. Plan was to handle it that way in the parser, but it wasn't supported yet. So -1 was still used in an offset computation which was not used, but ASAN could see the problem. This patch fixes the issue by using a signed type for the offset value, so any negative value would be skipped. It also introduced a PARSE_TYPE_NONE for the parser. No backport needed.	2024-04-17 21:00:34 +02:00
Ilya Shipitsin	ab7f05daba	CLEANUP: assorted typo fixes in the code and comments This is 41st iteration of typo fixes	2024-04-17 11:14:44 +02:00
Willy Tarreau	99c918ed8a	BUILD: xxhash: silence a build warning on Solaris + gcc-5.5 Testing an undefined macro emits warnings due to -Wundef, and we have exactly one such case in xxhash: include/import/xxhash.h:3390:42: warning: "__cplusplus" is not defined [-Wundef] #if ((defined(sun) || defined(__sun)) && __cplusplus) /* Solaris includes __STDC_VERSION__ with C++. Tested with GCC 5.5 */ Let's just prepend "defined(__cplusplus) &&" before __cplusplus to resolve the problem. Upstream is still affected apparently.	2024-04-17 09:43:32 +02:00
Frederic Lecaille	98583c4256	BUG/MEDIUM: grpc: Fix several unaligned 32/64 bits accesses There were several places in grpc and its dependency protobuf where unaligned accesses were done. Read accesses to 32 (resp. 64) bits values should be performed by read_u32() (resp. read_u64()). Replace these unaligned read accesses by correct calls to these functions. Same fixes for doubles and floats. Such unaligned read accesses could lead to crashes with bus errors on CPU architectures which do not fix them at run time. This patch depends on this previous commit: 861199fa71 MINOR: net_helper: Add support for floats/doubles. Must be backported as far as 2.6.	2024-04-16 07:37:28 +02:00
Frederic Lecaille	153fac4804	MINOR: net_helper: Add support for floats/doubles. Implement (read|write)_flt() (resp. (read|write)_dbl()) to read/write floats (resp. read/write doubles) from/to an unaligned buffer.	2024-04-16 07:37:28 +02:00
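As a rough illustration of the idea behind such helpers, the sketch below reads and writes a float at an arbitrary byte offset by going through memcpy() instead of dereferencing a cast pointer. The names sketch_read_flt() and sketch_write_flt() are hypothetical stand-ins, not the functions added by this commit, whose actual implementation may differ.

```c
#include <string.h>

/* Hypothetical stand-in: read a float from a possibly unaligned address.
 * memcpy() copies byte by byte, so no alignment is required from <p>.
 */
static inline float sketch_read_flt(const void *p)
{
	float f;

	memcpy(&f, p, sizeof(f));
	return f;
}

/* Hypothetical stand-in: write a float to a possibly unaligned address. */
static inline void sketch_write_flt(void *p, float f)
{
	memcpy(p, &f, sizeof(f));
}
```

A caller would pass any byte pointer, for example `sketch_read_flt(buf + 1)`, without worrying about bus errors on strict-alignment CPUs.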
William Lallemand	fa5c4cc6ce	MINOR: ssl: 'key-base' allows to load a 'key' from a specific path The global 'key-base' keyword allows to read the 'key' parameter of a crt-store load line using a path prefix. This is the equivalent of the 'crt-base' keyword but for 'key'. It only applies on crt-store.	2024-04-15 15:27:10 +02:00
William Lallemand	6567d09af5	MINOR: ssl: supports crt-base in crt-store Add crt-base support for "crt-store". It will be used by 'crt', 'ocsp', 'issuer', 'sctl' load line parameter. In order to keep compatibility with previous configurations and scripts for the CLI, a crt-store load line will save its ckch_store using the absolute crt path with the crt-base as the ckch tree key. This way, a `show ssl cert` on the CLI will always have the completed path.	2024-04-15 15:25:36 +02:00
Willy Tarreau	4615cb510c	MINOR: ring: always check that the old ring fits in the new one in ring_dup() Let's add a BUG_ON() to make sure we don't accidentally shrink a buffer.	2024-04-15 08:31:01 +02:00
Willy Tarreau	b662c5d2b8	MINOR: ring: clarify the usage of ring_size() and add ring_allocated_size() There's currently an ambiguity around ring_size(), it's said to return the allocated size but returns the usable size. We can't change it as it's used everywhere in the code like this. Let's fix the comment and add ring_allocated_size() instead for anything related to allocation.	2024-04-15 08:25:03 +02:00
Willy Tarreau	c0ee2d78d7	DEBUG: pools: report the data around the offending area in case of mismatch When the integrity check fails, it's useful to get a dump of the area around the first faulty byte. That's what this patch does. For example it now shows this before reporting info about the tag itself: Contents around first corrupted address relative to pool item:. Contents around address 0xe4febc0792c0+40=0xe4febc0792e8: 0xe4febc0792c8 [80 75 56 d8 fe e4 00 00] [.uV.....] 0xe4febc0792d0 [a0 f7 23 a4 fe e4 00 00] [..#.....] 0xe4febc0792d8 [90 75 56 d8 fe e4 00 00] [.uV.....] 0xe4febc0792e0 [d9 93 fb ff fd ff ff ff] [........] 0xe4febc0792e8 [d9 93 fb ff ff ff ff ff] [........] 0xe4febc0792f0 [d9 93 fb ff ff ff ff ff] [........] 0xe4febc0792f8 [d9 93 fb ff ff ff ff ff] [........] 0xe4febc079300 [d9 93 fb ff ff ff ff ff] [........] This may be backported to 2.9 and maybe even 2.8 as it does help spot the cause of the memory corruption.	2024-04-12 18:01:55 +02:00
Willy Tarreau	16e3655fbd	REORG: pool: move the area dump with symbol resolution to tools.c This function is particularly useful to dump unknown areas watching for opportunistic symbols, so let's move it to tools.c so that we can reuse it a little bit more.	2024-04-12 18:01:20 +02:00
William Lallemand	81e54ef197	MINOR: ssl: rename ckchs_load_cert_file to new_ckch_store_load_files_path Remove the ambiguous "ckchs" name and reflect the fact that it's loaded from a path.	2024-04-12 15:38:54 +02:00
William Lallemand	00eb44864b	MINOR: ssl: add the section parser for 'crt-store' 'crt-store' is a new section useful to define the struct ckch_store. The "load" keyword in the "crt-store" section allows to define which files you want to load for a specific certificate definition. Ex: crt-store load crt "site1.crt" key "site1.key" load crt "site2.crt" key "site2.key" frontend in bind *:443 ssl crt "site1.crt" crt "site2.crt" This is part of the certificate loading which was discussed in #785.	2024-04-12 15:38:54 +02:00
Willy Tarreau	772f9a5874	BUILD: pools: make DEBUG_MEMORY_POOLS=1 the default option This option has been set by default for a very long time and also complicates the manipulation of the DEBUG variable. Let's make it the official default and permit to unset it by setting it to zero. The other pool-related DEBUG options were adjusted to also explicitly check for the zero value for consistency.	2024-04-11 17:25:45 +02:00
Willy Tarreau	b70981532a	BUILD: debug: make DEBUG_STRICT=1 the default We continue to carry it in the makefile, which adds to the difficulty of passing new options. Let's make DEBUG_STRICT=1 the default so that one has to explicitly pass DEBUG_STRICT=0 to disable it. This allows us to remove the option from the default DEBUG variable in the makefile.	2024-04-11 17:25:45 +02:00
Willy Tarreau	e791b243f0	BUG/MINOR: debug: make sure DEBUG_STRICT=0 does work as documented Setting DEBUG_STRICT=0 only validates the defined(DEBUG_STRICT) test regarding DEBUG_STRICT_ACTION, which is equivalent to DEBUG_STRICT>=0. Let's make sure the test checks for >0 so that DEBUG_STRICT=0 properly disables DEBUG_STRICT.	2024-04-11 16:41:08 +02:00
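The difference between the two preprocessor tests described above can be sketched as follows. The SKETCH_* macro names are hypothetical and only mirror the pattern; they are not taken from the haproxy headers.

```c
/* Hypothetical macro, set to 0 as a user passing -DSKETCH_STRICT=0 would. */
#define SKETCH_STRICT 0

/* A defined() test is true for any value, including 0. */
#if defined(SKETCH_STRICT)
# define SKETCH_ON_BY_DEFINED 1
#else
# define SKETCH_ON_BY_DEFINED 0
#endif

/* A value test treats 0 as "disabled". */
#if SKETCH_STRICT > 0
# define SKETCH_ON_BY_VALUE 1
#else
# define SKETCH_ON_BY_VALUE 0
#endif
```

With the macro set to 0, only the value-based test reports the feature as disabled, which is the behavior the commit message asks for.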
Willy Tarreau	2a9ccf5b25	BUILD: atomic: fix peers build regression on gcc < 4.7 after recent changes Recent commit `4c1480f13b` ("MINOR: stick-tables: mark the seen stksess with a flag "seen"") introduced a build regression on older versions of gcc before 4.7. This is in the old __sync_ API, the HA_ATOMIC_LOAD() implementation uses an intermediary return value called "ret" that is of the same name as the variable passed in argument to the macro in the aforementioned commit. As such, the compiler complains with a cryptic error: src/peers.c: In function 'peer_teach_process_stksess_lookup': src/peers.c:1502: error: invalid type argument of '->' (have 'int') The solution is to avoid referencing the argument in the expression and using an intermediary variable for the pointer as done elsewhere in the code. It seems there's no other place affected with this. It probably does not need to be backported since this code is antique and very rarely used nowadays.	2024-04-11 16:41:08 +02:00
Willy Tarreau	d78c346670	BUILD: makefile: support USE_xxx=0 as well William rightfully reported that not supporting =0 to disable a USE_xxx option is sometimes painful (e.g. a script might do USE_xxx=$(command)). It's not that difficult to handle actually, we just need to consider the value 0 as empty at the few places that test for an empty string in options.mk, and in each "ifneq" test in the main Makefile, so let's do that. We even take care of preserving the original value in the build options string so that building with USE_OPENSSL=0 will be reported as-is in haproxy -vv, and with "-OPENSSL" in the feature list.	2024-04-11 11:06:19 +02:00
Willy Tarreau	aa32ab13f0	BUILD: makefile: warn about unknown USE_* variables William suggested that it would be nice to warn about unknown USE_* variables to more easily catch misspelled ones. The valid ones are present in use_opts, so by appending "=%" to each of them, we can build a series of patterns to exclude from MAKEOVERRIDES and emit a warning for the ones that stand out. Example: $ make TARGET=linux-glibc USE_QUIC_COMPAT_OPENSSL=1 Makefile:338: Warning: ignoring unknown build option: USE_QUIC_COMPAT_OPENSSL=1 CC src/slz.o	2024-04-11 11:06:19 +02:00
Christopher Faulet	1fa6eb2eb9	BUG/MINOR: http-ana: Fix TX_L7_RETRY and TX_D_L7_RETRY values These values are obviously wrong. There is an extra zero at the end for both defines. By chance, it is harmless. But it is better to fix it. This patch should be backported as far as 2.6.	2024-04-10 15:50:00 +02:00
Amaury Denoyelle	34b31d85cb	OPTIM: quic: do not call qc_send() if nothing to emit qc_send() was systematically called by quic_conn IO handlers with all instantiated quic_enc_level. Change this to only register quic_enc_level for send if needed. Do not call at all qc_send() if no qel registered. A new function qel_need_sending() is defined to detect if sending is required. First, it checks if quic_enc_level has prepared frames or probing is set. It can also return true if ACK required either on quic_enc_level itself or because of quic_conn ack timer fired. Finally, a CONNECTION_CLOSE emission for quic_conn is also a valid case. This should reduce the number of invocations of qc_send(). This could slightly improve performance, as well as simplify traces debugging.	2024-04-10 11:17:21 +02:00
Amaury Denoyelle	7fc1ce5bc8	MEDIUM: quic: remove duplicate hdshk/app send functions A series of previous patches have cleaned up sending functions for handshake case. Their new exposed API is now flexible enough to convert app case to use the same functions. As such, qc_send_hdshk_pkts() is renamed qc_send() and becomes the single entry point for QUIC emission. It is used during application packets emission in quic_conn_app_io_cb(), qc_send_mux(). Also the internal function qc_prep_hpkts() is renamed qc_prep_pkts(). Remove the now unneeded qc_send_app_pkts() and qc_prep_app_pkts(). Also removed qc_send_app_probing(). It was a simple wrapper over other application send functions. Now, default qc_send() can be reused for such cases with <old_data> argument set to true. An adjustment was needed when converting qc_send_hdshk_pkts() to the general qc_send() version. Previously, only a single packet encoding/emission cycle was performed. This was enough as handshake packets are always smaller than Tx buffer. However, it may be possible to emit more application data. As such, a loop is necessary to perform multiple encoding/emission cycles, as this was already the case in qc_send_app_pkts(). No functional difference should happen with this commit. However, as these are critical functions with a lot of changes, this patch is labelled as medium.	2024-04-10 11:07:35 +02:00
Amaury Denoyelle	4e4127a66d	MINOR: quic: use qc_send_hdshk_pkts() in handshake IO cb quic_conn_io_cb() manually implements emission by using lower level functions qc_prep_pkts() and qc_send_ppkts(). Replace this by using the higher level function qc_send_hdshk_pkts() which notably handles buffer allocation and purging. This allows to clean up send API by flagging qc_prep_pkts() and qc_send_ppkts() as static. They are now used in a single location inside qc_send_hdshk_pkts().	2024-04-10 11:07:19 +02:00
Amaury Denoyelle	3a8f4761e7	MINOR: quic: improve sending API on retransmit qc_send_hdshk_pkts() is a wrapper for qc_prep_hpkts() used on retransmission. It was restricted to use two quic_enc_level pointers as distinct arguments. Adapt it to directly use the same list of quic_enc_level which is passed then to qc_prep_hpkts(). Now for retransmission quic_enc_level send list is built directly into qc_dgrams_retransmit() which calls qc_send_hdshk_pkts(). Along this change, a new utility function qel_register_send() is defined. It is a helper to build the quic_enc_level send list. It enforces that each quic_enc_level instance is only registered in a single list to prevent memory issues. It is both used in qc_dgrams_retransmit() and quic_conn_io_cb().	2024-04-10 11:06:55 +02:00
Amaury Denoyelle	93f5b4c8ae	MINOR: quic: uniformize sending methods for handshake Emission of packets during handshakes was implemented via an API which uses two alternative ways to specify the list of frames. The first one uses a NULL list of quic_enc_level as argument for qc_prep_hpkts(). This was an implicit method to iterate on all qels stored in quic_conn instance, with frames already inserted in their corresponding quic_pktns. The second method was used for retransmission. It uses a custom local quic_enc_level list specified by the caller as input to qc_prep_hpkts(). Frames were accessible through <retransmit> list pointers of each quic_enc_level used in an implicit mechanism. This commit clarifies the API by using a single common method. Now quic_enc_level list must always be specified by the caller. As for frames list, each qel must set its new field <send_frms> pointer to the list of frames to send. Callers of qc_prep_hpkts() are responsible to always clear qels send list. This prevents a single instance of quic_enc_level from being inserted while being attached to another list. This allows notably to clean up some unnecessary code. First, <retransmit> list of quic_enc_level is removed as it is replaced by new <send_frms>. Also, it's now possible to use proper list_for_each_entry() inside qc_prep_hpkts() to loop over each qel. Internal functions for quic_enc_level selection are now removed.	2024-04-10 11:06:41 +02:00
Aurelien DARRAGON	8226e92eb0	BUG/MINOR: tools/log: invalid encode_{chunk,string} usage encode_{chunk,string}() is often found to be used this way: ret = encode_{chunk,string}(start, stop...) if (ret == NULL || *ret != '\0') { //error } //success Indeed, encode_{chunk,string} will always try to add terminating NULL byte to the output string, unless no space is available for even 1 byte. However, it means that for the caller to be able to spot an error, then it must provide a buffer (here: start) which is already initialized. But this is wrong: not only this is very tricky to use, but since those functions don't return NULL on failure, then if the output buffer was not properly initialized prior to calling the function, the caller will perform invalid reads when checking for failure this way. Moreover, even if the buffer is initialized, we cannot reliably tell if the function actually failed this way because if the buffer was previously initialized with NULL byte, then the caller might think that the call actually succeeded (since the function didn't return NULL and didn't update the buffer). Also, sess_build_logline() relies on lf_encode_{chunk,string}() functions which are in fact wrappers for encode_{chunk,string}() functions and thus exhibit the same error handling mechanism. It turns out that sess_build_logline() makes unsafe use of those functions because it uses the error-checking logic mentioned above while buffer (tmplog) is not guaranteed to be initialized when entering the function. This may ultimately cause malfunctions or invalid reads if the output buffer is lacking space. To fix the issue once and for all and prevent similar bugs from being introduced, we make it so encode_{string, chunk} and escape_string() (based on encode_string()) now explicitly return NULL on failure (when the function failed to write at least the ending NULL byte) lf_encode_{string,chunk}() helpers had to be patched as well due to code duplication.
This should be backported to all stable versions. [ada: for 2.4 and 2.6 the patch won't apply as-is, it might be helpful to backport `ae1e14d65` ("CLEANUP: tools: removing escape_chunk() function") first, considering it's not very relevant to maintain a dead function]	2024-04-09 17:35:45 +02:00
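The error-reporting contract described in this commit can be sketched with a toy encoder. sketch_encode() is a hypothetical function that only copies bytes; it is not haproxy's encode_string() and performs no escaping. The only point illustrated is that failure is signaled by an explicit NULL return rather than by inspecting buffer contents.

```c
#include <stddef.h>

/* Hypothetical encoder: copies <src> into [start, stop) and terminates it.
 * Returns a pointer to the terminating NUL byte, or NULL when there is not
 * even room for that byte. Truncation is not reported in this sketch.
 */
static char *sketch_encode(char *start, char *stop, const char *src)
{
	if (start >= stop)
		return NULL;    /* explicit failure, output buffer left untouched */

	while (*src && start < stop - 1)
		*start++ = *src++;
	*start = '\0';
	return start;
}
```

With such a contract the caller only needs `if (ret == NULL)` to detect failure and never has to read from a buffer that may be uninitialized.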
Valentine Krasnobaeva	eef14e9574	CLEANUP: global: remove LSTCHK_CAP_BIND Remove LSTCHK_CAP_BIND as it is never set and never checked.	2024-04-05 18:01:54 +02:00
Valentine Krasnobaeva	f0b6436f57	MEDIUM: capabilities: check process capabilities sets Since the Linux capabilities support add-on (see the commit `bd84387beb` ("MEDIUM: capabilities: enable support for Linux capabilities")), we can also check haproxy process effective and permitted capabilities sets, when it starts and runs as non-root. Like this, if needed network capabilities are present only in the process permitted set, we can get this information with capget and put them in the process effective set via capset. To do this properly, let's introduce prepare_caps_from_permitted_set(). First, it checks if binary effective set has CAP_NET_ADMIN or CAP_NET_RAW. If there is a match, LSTCHK_NETADM is removed from global.last_checks list to avoid warning, because in the initialization sequence some last configuration checks are based on LSTCHK_NETADM flag and haproxy process euid may stay unprivileged. If there are no CAP_NET_ADMIN and CAP_NET_RAW in the effective set, permitted set will be checked and only capabilities given in 'setcap' keyword will be promoted in the process effective set. LSTCHK_NETADM will be also removed in this case for the same reason. In order to be transparent, we promote from permitted set only capabilities given by user in 'setcap' keyword. So, if caplist doesn't include CAP_NET_ADMIN or CAP_NET_RAW, LSTCHK_NETADM would not be unset and warning about missing privileges will be emitted at initialization. Need to call it before protocol_bind_all() to allow binding to privileged ports under non-root and 'setcap cap_net_bind_service' must be set in the global section in this case.	2024-04-05 18:01:54 +02:00
Amaury Denoyelle	0489d85263	MINOR: listener: implement GUID support This commit is similar to the two previous ones. Its purpose is to add GUID support on listeners. Due to bind_conf and listeners configuration, some specificities were required. It's possible to define several listeners on a single bind line, for example by specifying multiple addresses. As such, it's impossible to support a "guid" keyword on a bind line. The problem is exacerbated by the cloning of listeners when sharding is used. To resolve this, a new keyword "guid-prefix" is defined for bind lines. It allows to specify a string which will be used as a prefix for automatically generated GUID for each listener attached to a bind_conf. Automatic GUID listeners generation is implemented via a new function bind_generate_guid(). It is called on post-parsing, after bind_complete_thread_setup(). For each listener on a bind_conf, a new GUID is generated with bind_conf prefix and the index of the listener relative to other listeners in the bind_conf. This last value is stored in a new bind_conf field named <guid_idx>. If a GUID cannot be inserted, for example due to a non-unique value, an error is returned, startup is interrupted with configuration rejected.	2024-04-05 15:40:42 +02:00
Amaury Denoyelle	8259456981	MINOR: server: implement GUID support This commit is similar to previous one, except that it implements GUID support for server instances. A guid_node field is inserted into server structure. A new "guid" server keyword is defined.	2024-04-05 15:40:42 +02:00
Amaury Denoyelle	da754b4533	MINOR: proxy: implement GUID support Implement proxy identification through GUID. As such, a guid_node member is inserted into proxy structure. A proxy keyword "guid" is defined to allow user to fix its value.	2024-04-05 15:40:42 +02:00
Amaury Denoyelle	1009ca4160	MINOR: guid: restrict guid format GUID format is unspecified to allow users to choose the naming scheme. Some restrictions however are added by this patch, mainly to ensure coherence and memory usage. The first restriction is on the length of GUID. No more than 127 characters can be used to prevent memory over consumption. The second restriction is on the character set allowed in GUID. Utility function invalid_char() is used for this : it allows alphanumeric values and '-', '_', '.' and ':'.	2024-04-05 15:40:42 +02:00
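The two restrictions listed above (at most 127 characters, and only alphanumerics plus '-', '_', '.' and ':') can be sketched as a standalone check. sketch_guid_is_valid() and SKETCH_GUID_MAX_LEN are hypothetical names; the real code relies on invalid_char(), whose exact behavior is not reproduced here. How an empty string should be treated is not stated in the commit message, so accepting it below is an assumption.

```c
#include <ctype.h>
#include <string.h>

#define SKETCH_GUID_MAX_LEN 127 /* limit quoted in the commit message */

/* Hypothetical check: returns 1 if <s> satisfies both restrictions,
 * 0 otherwise. An empty string is accepted (assumption).
 */
static int sketch_guid_is_valid(const char *s)
{
	size_t len = strlen(s);
	size_t i;

	if (len > SKETCH_GUID_MAX_LEN)
		return 0;

	for (i = 0; i < len; i++) {
		unsigned char c = (unsigned char)s[i];

		/* isalnum() is locale-dependent; the default "C" locale is assumed */
		if (!isalnum(c) && !strchr("-_.:", c))
			return 0;
	}
	return 1;
}
```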
Amaury Denoyelle	84fa6b344a	MINOR: guid: introduce global UID module Define a new module guid. Its purpose is to be able to attach a global identifier for various objects such as proxies, servers and listeners. A new type guid_node is defined. It will be stored in the objects which can be referenced by such GUID. Several functions are implemented to properly initialize, insert, remove and lookup GUID in a global tree. Modification operations should only be conducted under thread isolation.	2024-04-05 15:40:42 +02:00
Aurelien DARRAGON	e751eebfc6	MEDIUM: proxy/log: leverage lf_expr API for logformat preparsing Currently, the way proxy-oriented logformat directives are handled is way too complicated. Indeed, "log-format", "log-format-error", "log-format-sd" and "unique-id-format" all rely on preparsing hints stored inside proxy->conf member struct. Those preparsing hints include the original string that should be compiled once the proxy parameters are known plus the config file and line number where the string was found to generate precise error messages in case of failure during the compiling process that happens within check_config_validity(). Now that lf_expr API permits to compile a lf_expr struct that was previously prepared (with original string and config hints), let's leverage lf_expr_compile() from check_config_validity() and instead of relying on individual proxy->conf hints for each logformat expression, store string and config hints in the lf_expr struct directly and use lf_expr helpers funcs to handle them when relevant (ie: original logformat string freeing is now done at a central place inside lf_expr_deinit(), which allows for some simplifications) Doing so allows us to greatly simplify the preparsing logic for those 4 proxy directives, and to finally save some space in the proxy struct. Also, since httpclient proxy has its "logformat" automatically compiled in check_config_validity(), we now use the file hint from the logformat expression struct to set an explicit name that will be reported in case of error ("parsing [httpclient:0] : ...") and remove the extraneous check in httpclient_precheck() (logformat was parsed twice previously..)	2024-04-04 19:10:01 +02:00
Aurelien DARRAGON	2b79457bc0	MEDIUM: log: add compiling logic to logformat expressions Split parse_logformat_string() into two functions: parse_logformat_string() sticks to the same behavior, but now becomes a helper for lf_expr_compile() which uses explicit arguments so that it becomes possible to use lf_expr_compile() without a proxy, but also to compile an expression which was previously prepared for compiling (set string and config hints within the logformat expression to avoid manually storing string and config context if the compiling step happens later). lf_expr_dup() may be used to duplicate an expression before it is compiled, and lf_expr_xfer() now makes sure that the input logformat is already compiled. This is prerequisite work for the log-profiles implementation, no functional change should be expected.	2024-04-04 19:10:01 +02:00
Aurelien DARRAGON	7a21c3a4ef	MAJOR: log: implement proper postparsing for logformat expressions This patch tries to address a design flaw with how logformat expressions are parsed from config. Indeed, some parse_logformat_string() calls are performed during config parsing when the proxy mode is not yet known. Here's a config example that illustrates the issue: defaults mode tcp listen test bind :8888 http-response set-header custom-hdr "%trl" # needs http mode http The above config should work, because the effective proxy mode is http, yet haproxy fails with this error: [ALERT] (99051) : config : parsing [repro.conf:6] : error detected in proxy 'test' while parsing 'http-response set-header' rule : format tag 'trl' is reserved for HTTP mode. To fix the issue once and for all, let's implement smart postparsing for logformat expressions encountered during config parsing: - split parse_logformat_string() (and subfunctions) in order to create a new lf_expr_postcheck() function that must be called to finish preparing and checking the logformat expression once the proxy type is known. - save some config hints info during parse_logformat_string() to generate more precise error messages during lf_expr_postcheck(), if needed, we rely on curpx->conf.args.{file,line} hints for that because parse_logformat_string() doesn't know about current file and line number. - lf_expr_postcheck() uses PR_FL_CHECKED proxy flag to know if the function may try to make the proxy compatible with the expression, or if it should simply fail as soon as an incompatibility is detected. - if parse_logformat_string() is called from an unchecked proxy, then schedule the expression for postparsing, else (ie: during runtime), run the postcheck right away. This change will also allow for some logformat expression error handling simplifications in the future.	2024-04-04 19:10:01 +02:00
Aurelien DARRAGON	56d8074798	MINOR: proxy: add PR_FL_CHECKED flag PR_FL_CHECKED is set on proxy once the proxy configuration was fully checked (including postparsing checks). This information may be useful to functions that need to know if some config-related proxy properties are likely to change or not due to parsing or postparsing/check logics. Also, during runtime, except for some rare cases config-related proxy properties are not supposed to be changed.	2024-04-04 19:10:01 +02:00
Aurelien DARRAGON	6810c41f8e	MEDIUM: tree-wide: add logformat expressions wrapper Log format expressions are broadly used within the code: once they are parsed from input string, they are converted to a linked list of logformat nodes. We're starting to face some limitations because we're simply storing the converted expression as a generic logformat_node list. The first issue we're facing is that storing logformat expressions that way doesn't allow us to add metadata alongside the list, which is part of the prerequisites for implementing log-profiles. Another issue with storing logformat expressions as generic lists of logformat_node elements is that it's starting to become really hard to tell when we rely on logformat expressions or not in the code given that there isn't always a comment near the list declaration or manipulation to indicate that it's relying on logformat expressions under the hood, so this adds some complexity for code maintenance. This patch looks quite impressive due to changes in a lot of header and source files (since logformat expressions are broadly used), but it does a simple thing: it defines the lf_expr structure which itself holds a generic list of logformat nodes, and then declares some helpers to manipulate lf_expr elements and fixes the code so that we now exclusively manipulate logformat_node lists as lf_expr elements outside of log.c. For now, lf_expr struct only contains the list of logformat nodes (no additional metadata), but now that we have dedicated type and helpers, doing so in the future won't be problematic at all and won't require extensive code changes.	2024-04-04 19:10:01 +02:00
Aurelien DARRAGON	7d8f45b647	MEDIUM: log: carry tag context in logformat node This is a pretty simple patch despite requiring to make some visible changes in the code: When parsing a logformat string, log tags (ie: '%tag') are turned into logformat nodes with their type set to the type of the corresponding logformat_tag element which was matched by name. Thus, when "compiling" a logformat tag, we only keep a reference to the tag type from the original logformat_tag. For example, for "%B" log tag, we have the following logformat_tag element: { .name = "B", .type = LOG_FMT_BYTES, .mode = PR_MODE_TCP, .lw = LW_BYTES, .config_callback = NULL } When parsing "%B" string, we search for a matching logformat tag inside logformat_tags[] array using the provided name, once we find a matching element, we craft a logformat node whose type will be LOG_FMT_BYTES, but from the node itself, we no longer have access to other information that is set in the logformat_tag struct element. Thus from a logformat_node resulting from a log tag, with current implementation, we cannot easily get back to matching logformat_tag struct element as it would require us to scan the whole logformat_tags array at runtime using node->type to find the matching element. Let's take a simpler path and consider all tag-specific LOG_FMT_* subtypes as being part of the same logformat node type: LOG_FMT_TAG. Thanks to that, we're now able to distinguish logformat nodes made from logformat tag from other logformat nodes, and link them to their corresponding logformat_tag element from logformat_tags[] array. All it costs is a simple indirection and an extra pointer in logformat_node struct. While at it, all LOG_FMT_* types related to logformat tags were moved inside log.c as they have no use outside of it since they are simply lookup indexes for sess_build_logline() and could even be replaced by function pointers some day...	2024-04-04 19:10:01 +02:00
Aurelien DARRAGON	8cf5c3d7f0	MINOR: log: expose logformat_tag struct Rename logformat_type internal struct to logformat_tag to make it less confusing, then expose logformat_tag struct through header file so that it can be referenced in other structs. Also rename logformat_keywords[] to logformat_tags[] for better consistency.	2024-04-04 19:10:01 +02:00
Aurelien DARRAGON	c85cbc1061	MEDIUM: log: rename logformat var to logformat tag What we used to call logformat variable in the code is referred to as log-format tag in the documentation. Having both 'var' and 'tag' labels referring to the same thing is really confusing. Let's make the code comply with the documentation by replacing all logformat var/variable/VAR occurrences with either tag or TAG. No functional change should be expected, the only visible side-effect from user point of view is that "variable" was replaced by "tag" in some error messages.	2024-04-04 19:10:01 +02:00
Willy Tarreau	1a088da7c2	MAJOR: stktable: split the keys across multiple shards to reduce contention In order to reduce the contention on the table when keys expire quickly, we're spreading the load over multiple trees. That counts for keys and expiration dates. The shard number is calculated from the key value itself, both when looking up and when setting it. The "show table" dump on the CLI iterates over all shards so that the output is not fully sorted, it's only sorted within each shard. The Lua table dump just does the same. It was verified with a Lua program to count stick-table entries that it works as intended (the test case is reproduced here as it's clearly not easy to automate as a vtc): function dump_stk() local dmp = core.proxies['tbl'].stktable:dump({}); local count = 0 for _, __ in pairs(dmp) do count = count + 1 end core.Info('Total entries: ' .. count) end core.register_action("dump_stk", {'tcp-req', 'http-req'}, dump_stk, 0); ## global tune.lua.log.stderr on lua-load-per-thread lua-cnttbl.lua listen front bind :8001 http-request lua.dump_stk if { path_beg /stk } http-request track-sc1 rand(),upper,hex table tbl http-request redirect location / backend tbl stick-table size 100k type string len 12 store http_req_cnt ## $ h2load -c 16 -n 10000 0:8001/ $ curl 0:8001/stk ## A count close to 100k appears on haproxy's stderr ## On the CLI, "show table tbl" \| wc will show the same. Some large parts were reindented only to add a top-level loop to iterate over shards (e.g. process_table_expire()). Better check the diff using git show -b. The number of shards is decided just like for the pools, at build time based on the max number of threads, so that we can keep a constant. Maybe this should be done differently. For now CONFIG_HAP_TBL_BUCKETS is used, and defaults to CONFIG_HAP_POOL_BUCKETS to keep the benefits of all the measurements made for the pools. 
It turns out that this value seems to be the most reasonable one without inflating the struct stktable too much. By default for 1024 threads the value is 32 and delivers 980k RPS in a test involving 80 threads, while adding 1kB to the struct stktable (roughly doubling it). The same test at 64 gives 1008 kRPS and at 128 it gives 1040 kRPS for 8 times the initial size. 16 would be too low however, with 675k RPS. The stksess already has a shard number, it's the one used to decide which peer connection to send the entry to. Maybe we should also store the one associated with the entry itself instead of recalculating it, though it does not happen that often. The operation is done by hashing the key using XXH32(). The peers also take and release the table's lock but the way it's used is not very clear yet, so at this point it's sure this will not work. At this point, this allowed to completely unlock the performance on a 80-thread setup: before: 5.4 Gbps, 150k RPS, 80 cores 52.71% haproxy [.] stktable_lookup_key 36.90% haproxy [.] stktable_get_entry.part.0 0.86% haproxy [.] ebmb_lookup 0.18% haproxy [.] process_stream 0.12% haproxy [.] process_table_expire 0.11% haproxy [.] fwrr_get_next_server 0.10% haproxy [.] eb32_insert 0.10% haproxy [.] run_tasks_from_lists after: 36 Gbps, 980k RPS, 80 cores 44.92% haproxy [.] stktable_get_entry 5.47% haproxy [.] ebmb_lookup 2.50% haproxy [.] fwrr_get_next_server 0.97% haproxy [.] eb32_insert 0.92% haproxy [.] process_stream 0.52% haproxy [.] run_tasks_from_lists 0.45% haproxy [.] conn_backend_get 0.44% haproxy [.] __pool_alloc 0.35% haproxy [.] process_table_expire 0.35% haproxy [.] connect_server 0.35% haproxy [.] h1_headers_to_hdr_list 0.34% haproxy [.] eb_delete 0.31% haproxy [.] srv_add_to_idle_list 0.30% haproxy [.] h1_snd_buf WIP: uint64_t -> long WIP: ulong -> uint code is much smaller	2024-04-03 17:34:47 +02:00
Willy Tarreau	4c1480f13b	MINOR: stick-tables: mark the seen stksess with a flag "seen" Right now we're taking the stick-tables update lock for reads just for the sake of checking if the update index is past it or not. That's costly because even taking the read lock is sufficient to provoke a cache line write, while when under load or attack it's frequent that the update has not yet been propagated and wouldn't require anything. This commit brings a new field to the stksess, "seen", which is zeroed when the entry is updated, and set to one as soon as at least one peer starts to consult it. This way it will reflect that the entry must be updated again so that this peer can see it. Otherwise no update will be necessary. For now the flag is only set/reset but not exploited. Great care is taken to avoid writes whenever possible.	2024-04-03 17:34:47 +02:00
William Lallemand	aa3632962f	MEDIUM: mworker: get rid of libsystemd Given the xz drama which allowed liblzma to be linked to openssh, let's remove libsystemd to get rid of useless dependencies. The sd_notify API seems to be stable and is now documented. This patch replaces the sd_notify() and sd_notifyf() functions by a reimplementation inspired by the systemd documentation. This should not change anything functionally. The function will be built when haproxy is built using USE_SYSTEMD=1. References: https://github.com/systemd/systemd/issues/32028 https://www.freedesktop.org/software/systemd/man/devel/sd_notify.html#Notes Before: wla@kikyo:~% ldd /usr/sbin/haproxy linux-vdso.so.1 (0x00007ffcfaf65000) libcrypt.so.1 => /lib/x86_64-linux-gnu/libcrypt.so.1 (0x000074637fef4000) libssl.so.3 => /lib/x86_64-linux-gnu/libssl.so.3 (0x000074637fe4f000) libcrypto.so.3 => /lib/x86_64-linux-gnu/libcrypto.so.3 (0x000074637f400000) liblua5.4.so.0 => /lib/x86_64-linux-gnu/liblua5.4.so.0 (0x000074637fe0d000) libsystemd.so.0 => /lib/x86_64-linux-gnu/libsystemd.so.0 (0x000074637f92a000) libpcre2-8.so.0 => /lib/x86_64-linux-gnu/libpcre2-8.so.0 (0x000074637f365000) libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x000074637f000000) libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x000074637f27a000) libcap.so.2 => /lib/x86_64-linux-gnu/libcap.so.2 (0x000074637fdff000) libgcrypt.so.20 => /lib/x86_64-linux-gnu/libgcrypt.so.20 (0x000074637eeb8000) liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x000074637fdcd000) libzstd.so.1 => /lib/x86_64-linux-gnu/libzstd.so.1 (0x000074637ee01000) liblz4.so.1 => /lib/x86_64-linux-gnu/liblz4.so.1 (0x000074637fda8000) /lib64/ld-linux-x86-64.so.2 (0x000074637ff5d000) libgpg-error.so.0 => /lib/x86_64-linux-gnu/libgpg-error.so.0 (0x000074637f904000) After: wla@kikyo:~% ldd /usr/sbin/haproxy linux-vdso.so.1 (0x00007ffd51901000) libcrypt.so.1 => /lib/x86_64-linux-gnu/libcrypt.so.1 (0x00007f758d6c0000) libssl.so.3 => /lib/x86_64-linux-gnu/libssl.so.3 
(0x00007f758d61b000) libcrypto.so.3 => /lib/x86_64-linux-gnu/libcrypto.so.3 (0x00007f758ca00000) liblua5.4.so.0 => /lib/x86_64-linux-gnu/liblua5.4.so.0 (0x00007f758d5d9000) libpcre2-8.so.0 => /lib/x86_64-linux-gnu/libpcre2-8.so.0 (0x00007f758d365000) libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f758d5ba000) libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f758c600000) libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f758c915000) /lib64/ld-linux-x86-64.so.2 (0x00007f758d729000) A backport to all stable versions could be considered at some point.	2024-04-03 15:53:18 +02:00
Frederic Lecaille	a305bb92b9	MINOR: quic: HyStart++ implementation (RFC 9406) This is a simple algorithm to replace the classic slow start phase of the congestion control algorithms. It should reduce the high packet loss during this step. Implemented only for Cubic.	2024-04-02 18:47:19 +02:00
Anthony Deschamps	faa8c3e024	MEDIUM: lb-chash: Deterministic node hashes based on server address Motivation: When services are discovered through DNS resolution, the order in which DNS records get resolved and assigned to servers is arbitrary. Therefore, even though two HAProxy instances using chash balancing might agree that a particular request should go to server3, it is likely the case that they have assigned different IPs and ports to the server in that slot. This patch adds a server option, "hash-key <key>" which can be set to "id" (the existing behaviour, default), "addr", or "addr-port". By deriving the keys for the chash tree nodes from a server's address and port we ensure that independent HAProxy instances will agree on routing decisions. If an address is not known then the key is derived from the server's puid as it was previously. When adjusting a server's weight, we now check whether the server's hash has changed. If it has, we have to remove all its nodes first, since the node keys will also have to change.	2024-04-02 07:00:10 +02:00
Aurelien DARRAGON	3c6dfa618a	MEDIUM: log/balance: leverage lbprm api for log load-balancing The log load-balancing implementation was not seamlessly integrated within the lbprm API. The consequence is that it could become harder to maintain over time since it added some specific cases just for the log backend. Moreover, it resulted in some code duplication since balance algorithms that are common to logs and regular (tcp, http) backends were specifically rewritten for log backends. Thanks to the previous commit, we now have all the prerequisites to make log load-balancing fully leverage lbprm logic. Thus in this patch we make __do_send_log_backend() use existing lbprm algorithms, and we no longer require log-specific lbprm initialization in cfgparse.c and in postcheck_log_backend(). As a bonus, for log backends this allows weighted algorithms to properly support weights (ie: roundrobin, random and log-hash) since we now leverage the same lb algorithms that we use for tcp/http backends (doc was updated).	2024-03-29 17:08:37 +01:00
Aurelien DARRAGON	9aea6df81f	MINOR: lbprm: implement true "sticky" balance algo As previously mentioned in `cd352c0db` ("MINOR: log/balance: rename "log-sticky" to "sticky""), let's define a sticky algorithm that may be used from any protocol. Sticky algorithm sticks on the same server as long as it remains available. The documentation was updated accordingly.	2024-03-29 17:08:37 +01:00
Christopher Faulet	87426e82ec	MAJOR: cli: Use a custom .snd_buf function to only copy the current command The CLI applet is now using its own snd_buf callback function. Instead of copying as much output data as possible, only one command is copied at a time. To do so, a new state CLI_ST_PARSEREQ is added for the CLI applet. In this state, the CLI I/O handler knows a full command was copied into its input buffer and it must parse this command to evaluate it.	2024-03-28 17:32:55 +01:00
Christopher Faulet	838fb54de6	MINOR: stconn: Add a connection flag to notify sending data are the last ones This flag can be used by endpoints to know that the data to send via the .snd_buf callback function are the last ones. It is useful to know that a shutdown is pending but cannot be delivered while the data being sent are not consumed.	2024-03-28 17:32:55 +01:00
Christopher Faulet	2c6321842b	MEDIUM: applet: Handle applets with their own buffers in put functions applet_putchk() and other similar functions are now testing the applet's type to use the applet's outbuf instead of the channel's buffer. This will ease applets conversion because most of them rely on these functions.	2024-03-28 17:28:20 +01:00
Christopher Faulet	1380b97285	MEDIUM: buf: Add b_getline() and b_getdelim() functions These functions are very similar to co_getline() and co_getdelim(). b_getdelim() retrieves the longest part of the buffer that is composed exclusively of characters not in a delimiter set. b_getline() stops on LF only and thus returns a line.	2024-03-28 17:28:20 +01:00
Christopher Faulet	5056cbdb86	MINOR: sc_strm: Add generic version to perform sync receives and sends sc_sync_recv() and sc_sync_send() were added to use connection or applet versions, depending on the endpoint type. For now these functions are not used. But this will be used by process_stream() to replace the connection version.	2024-03-28 17:28:20 +01:00
Remi Tricot-Le Breton	7359c0c7f4	MEDIUM: ssl: Add 'tune.ssl.ocsp-update.mode' global option This option can be used to set a default ocsp-update mode for all certificates of a given conf file. It allows to activate ocsp-update on certificates without the need to create separate crt-lists. It can still be superseded by the crt-list 'ocsp-update' option. It takes either "on" or "off" as value and defaults to "off". Since setting this new parameter to "on" would mean that we try to enable ocsp-update on any certificate, and also certificates that don't have an OCSP URI, the checks performed in ssl_sock_load_ocsp were softened. We don't systematically raise an error when trying to enable ocsp-update on a certificate that does not have an OCSP URI, be it via the global option or the crt-list one. We will still raise an error when a user tries to load a certificate that does have an OCSP URI but a missing issuer certificate (if ocsp-update is enabled).	2024-03-27 11:38:28 +01:00
Willy Tarreau	6c1b29d06f	MINOR: ring: make the number of queues configurable Now the rings have one wait queue per group. This should limit the contention on systems such as EPYC CPUs where the performance drops dramatically when using more than one CCX. Tests were run with different numbers and it was shown that value 6 outperforms all other ones at 12, 24, 48, 64 and 80 threads on an EPYC, a Xeon and an Ampere CPU. Value 7 sometimes comes close and anything around these values degrades quickly. The value has been left tunable in the global section. This commit only introduces everything needed to set up the queue count so that it's easier to adjust it in the forthcoming patches, but it was initially added after the series, making it harder to compare. It was also shown that trying to group the threads in queues by their thread groups is counter-productive and that it was more efficient to do that by applying a modulo on the thread number. As surprising as it seems, it does have the benefit of well balancing any number of threads.	2024-03-25 17:34:19 +00:00
Willy Tarreau	e3f101a19a	MINOR: ring: add the definition of a ring waiting cell This is what will be used to describe one waiting thread, its message in the queues, and the aggregation of pending messages after it.	2024-03-25 17:34:19 +00:00
Willy Tarreau	c7bd7a68e4	OPTIM: ring: have only one thread at a time wake up all readers It's inefficient and counter-productive that each ring writer iterates over all readers to wake them up. Let's just have one in charge of this, it strongly limits contention. The only thing is that since the thread is iterating over a list, we want to be sure that if the first readers have already completed their job, they will be woken up again. For this we keep a counter of messages delivered after the wakeup started, and the waking thread will check it before going back to sleep. In order to avoid looping forever, it will also drop its waking flag soon enough to possibly let another one take it. There used to be a few cases of watchdogs before this on a 24-core AMD EPYC platform on the list iteration; those never appeared anymore. The perf has dropped a bit on 3C6T on the EPYC, from 6.61 to 6.0M but remains unchanged at 24C48T.	2024-03-25 17:34:19 +00:00
Willy Tarreau	9e99cfbeb6	MAJOR: ring: drop the now unneeded lock It was only used to protect the list which is now an mt_list so it doesn't provide any required protection anymore. It obviously also used to provide strict ordering between the writer and the reader when the writer started to update the messages, but that's now covered by the ordered tail updates and updates to the readers count to protect the area. The message rate on small thread counts (up to 12) saw a boost of roughly 5%, while for large counts it lost about 2% due to some contention now becoming visible elsewhere. Typical measures are 6.13M -> 6.61M at 3C6T, and 1.88 -> 1.92M at 24C48T on the EPYC.	2024-03-25 17:34:19 +00:00
Willy Tarreau	a2d2dbf210	MEDIUM: ring/applet: turn the wait_entry list to an mt_list instead Rings are keeping a lock only for the list, which apparently doesn't need anything more than an mt_list, so let's first turn it into that before dropping the lock. There should be no visible effect.	2024-03-25 17:34:19 +00:00
Willy Tarreau	eb3d5f464d	MEDIUM: ring: use the topmost bit of the tail as a lock We're now locking the tail while looking for some room in the ring. In fact it's still while writing to it, but the goal definitely is to get rid of the lock ASAP. For this we reserve the topmost bit of the tail as a lock, which may have as a possible visible effect that buffers will be limited to 2GB instead of 4GB on 32-bit machines (though in practice, good luck for allocating more than 2GB contiguous on 32-bit), but in practice since the size is read with atol() and some operating systems limit it to LONG_MAX unless passing negative numbers, the limit is already there. For now the impact on x86_64 is significant (drop from 2.35 to 1.4M/s on 48 threads on EPYC 24 cores) but this situation is only temporary so that changes can be reviewable and bisectable. Other approaches were attempted, such as using XCHG instead, which is slightly faster on x86 with low thread counts (but causes more write contention), and forces readers to stall under heavy traffic because they can't access a valid value for the queue anymore. A CAS requires preloading the value and is less good on ARMv8.1. XADD could also be considered with 12-13 upper bits of the offset dedicated to locking, but that looks overkill.	2024-03-25 17:34:19 +00:00
Willy Tarreau	dd8ea5d928	MEDIUM: ring: align the head and tail fields in the ring_storage structure We really want to let the readers and writers act on different areas, so we want to have the tail and the head on separate cache lines, themselves separate from the rest of the ring. Doing so improves the performance from 2.15 to 2.35M msg/s at 48 threads on a 24-core EPYC. This increases the header space from 32 to 192 bytes when threads are enabled. But since we already have the header size available in the file, haring remains able to detect the aligned vs unaligned formats and call dump_v2a() when aligned is detected.	2024-03-25 17:34:19 +00:00
Willy Tarreau	bf3dead20c	MEDIUM: ring: remove the struct buffer from the ring The purpose is to store a head and a tail that are independent so that we can further improve the API to update them independently from each other. The struct was arranged like the original one so that as long as a ring has its head set to zero (i.e. no recycling) it will continue to work. The new format is already detectable thanks to the "rsvd" field which indicates the number of reserved bytes at the beginning. It's located where the buffer's area pointer previously was, so that older versions of haring can continue to open the ring in repair mode, and newer ones can use the fact that the upper bits of that variable are zero to guess that it's working with the new format instead of the old one. Also let's keep in mind that the layout will further change to place some alignment constraints. The haring tool is thus updated based on this and it detects that the rsvd field is smaller than a page and that the sum of it with the size equals the mapped size, in which case it uses the new dump_v2() function instead of dump_v1(). The new function also creates a buffer from the ring's area, size, head and tail and calls the generic one so that no other code had to be adapted.	2024-03-25 17:34:19 +00:00
Willy Tarreau	01aa0a057c	MEDIUM: ring: change the ring reader to use the new vector-based API now The code now looks cleaner and more easily shows what still needs to be addressed. There are not that many changes in practice, these are mostly mechanical, essentially hiding the buffer from the callers.	2024-03-25 17:34:19 +00:00
Willy Tarreau	03816ccfa9	MAJOR: ring: insert an intermediary ring_storage level We'll need to add more complex structures in the ring, such as wait queues. That's far too much to be stored into the area in case of file-backed contents, so let's split the ring definition and its storage once and for all. This patch introduces a struct ring_storage which is assigned to ring->storage, which contains minimal information to represent the storage layout, i.e. for now only the buffer, and all the rest remains in the ring itself. The storage is appended immediately after it and the buffer's pointer always points to that area. It has the benefit of remaining 100% compatible with the existing file-backed layout. In memory, the allocation loses the size of a struct buffer. It's not even certain it's worth placing the size there, given that it's constant and that a dump of a ring wouldn't really need it (the file size is sufficient). But for now everything comes with the struct buffer, and later this will change once split into head and tail. Also this area may be completed with more information in the future (e.g. storage version, format, endianness, word size etc).	2024-03-25 17:34:19 +00:00
Willy Tarreau	01abdcb307	MINOR: ring: add a flag to indicate a mapped file Till now we used to rely on a heuristic pointer comparison to check if a ring was mapped or allocated. Better assign a flag to clarify this because it's going to become difficult otherwise.	2024-03-25 17:34:19 +00:00
Willy Tarreau	b30fd8cc2d	MINOR: ring: also add ring_area(), ring_head(), ring_tail() These will essentially be used to simplify the conversion to a new API.	2024-03-25 17:34:19 +00:00
Willy Tarreau	dc4836c15c	MINOR: ring: add ring_dup() to copy a ring into another one This will mostly be used during reallocation and boot-time duplicates, the purpose is simply to save the caller from having to know the details of the internal representation.	2024-03-25 17:34:19 +00:00
Willy Tarreau	a185d3d90d	MINOR: ring: add ring_size() to return the ring's size This is just to ease conversion so that callers stop accessing the ring's buffer.	2024-03-25 17:34:19 +00:00
Willy Tarreau	4c41fcd0da	MINOR: ring: add ring_data() to report the amount of data in a ring This will be used as an accessor for the few functions that need this outside of ring.c.	2024-03-25 17:34:19 +00:00
Willy Tarreau	c222cb8389	MINOR: vecpair: add necessary functions to use vecpairs from/to ring APIs Many ring-based APIs need a tail and a head, with some extra assumption that the user takes care of not filling the ring so that tail==head is unambiguous. Vectors are particularly suited to this usage so here we create 4 functions to create vectors representing free room or data from a ring, as well as updating rings based on a pair of vectors that represents either free space or data.	2024-03-25 17:34:19 +00:00
Willy Tarreau	63261aae39	MINOR: vecpair: add new vector pair based data manipulation mechanisms The buffers API defines both a storage layout and how to handle the data. The storage is shared with the chunks API which only deals with non-wrapping messages while buffers support wrapping both of the data and of the free space. As such, most of the buffers code already makes special cases of two parts in a buffer, the first one before wrapping and the optional second one after the wrapping occurred. The thing is, there are plenty of other places (e.g. rings) where the code dealing with wrapping is desirable but with a different storage layout. Let's export the existing buffer handling code related to reading/writing wrapping data and make it work with arbitrary vector pairs instead. This will handle wrapping and holes in messages if desired, and it will be up to the caller to decide how its messages are arranged and to pass the relevant ptr,len elements. The code is limited to two vectors because this is sufficient to deal with wrapping without making the code needlessly complex. I.e. this will not reassemble an iovec. For vectors, since we already had the ist type, there's no point inventing a new type, and it's even possible that over time some callers will find benefits in using this unified API (i.e. no NOP translation layer). It also allows to pass inputs as direct arguments and outputs as pointers. Not only this is more efficient code-wise, but it also avoids the accidental use of a wrong function. It was indeed found that naming functions is even harder than with the buffer as the notion of from/to is even fuzzier here. The API will likely continue to evolve and some functions might get renamed to more explicit ones over time to limit confusion. For now the code provides anything needed to reset/create/fill/erase/read/peek or measure vector pairs and to manipulate chars/blocks/varints to/from there.	2024-03-25 17:34:19 +00:00
Willy Tarreau	0b1c17a2dd	MINOR: ring: reserve one special value for the readers count In order to support concurrent writers we'll need to lock areas in the buffer. For this we'll use one special value of the single-byte readers count. Let's reserve it now and use the macro instead of the hardcoded 255.	2024-03-25 17:34:19 +00:00
Willy Tarreau	63242a59c4	MINOR: buf: add b_getblk_ofs() that works relative to area and not head For some concurrently accessed buffers we can't rely on head/data etc, but sometimes the access patterns guarantee that the buffer contents are there. Let's implement a function to read contents from a fixed offset, which never checks head nor data, only the area and its size. It's the caller's job to get this offset.	2024-03-25 17:34:19 +00:00
Willy Tarreau	2f28981546	MINOR: buf: add b_putblk_ofs() to copy a block at a specific position This new function b_putblk_ofs() puts one full block of data of length <len> from <blk> into the buffer, starting from absolute offset <offset> after the buffer's area. As a convenience to avoid complex checks in callers, the offset is allowed to exceed a valid one by no more than one buffer size, and will automatically be wrapped. The caller is responsible for ensuring that <len> doesn't exceed the known length of the available room at this position, otherwise data may be overwritten. The buffer's length is not updated, so generally the caller will have updated it before calling this function. This is meant to be used on concurrently accessed buffers, so that a writer can append data while a reader is blocked by other means from reaching the current area. The function guarantees never to use ->head nor ->data.	2024-03-25 17:34:19 +00:00
Willy Tarreau	c5004ccb36	MINOR: buf: add b_rel_ofs() to turn an absolute offset into a relative one It basically does the opposite of b_peek_ofs(). If x=b_peek_ofs(y), then y=b_rel_ofs(x).	2024-03-25 17:34:19 +00:00
Willy Tarreau	15e47b6a59	MINOR: buf: add b_add_ofs() to add a count to an absolute position This function is used to compute a new absolute buffer offset by adding a length to an existing valid offset. It will check for wrapping.	2024-03-25 17:34:19 +00:00
Willy Tarreau	c62a2d540d	MEDIUM: ring: move the ring reader code to ring_dispatch_messages() This new function is made around the loop that scans a ring for new messages and dispatches them to a message handler. It also takes ring flags (WAIT, NEW, etc) and offset pointers that the caller will use to initialize/reuse/update the current processing offset. The caller is still responsible for presetting it to ~0 before the first call if it wants the function to automatically adjust it (or set it to the correct value). The function may also return the last_ofs that was known before releasing the lock so that the caller knows what to compare against and if it needs to restart processing or not. The context remains a void* so that should not necessarily depend on an appctx. The current "show ring" code was ported to this and it continues to work as expected.	2024-03-25 17:34:19 +00:00
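
The offset-based buffer helpers described in the commits above (adding a count to an absolute offset, and copying a block to or from an absolute offset while using only the area and its size) can be illustrated with a minimal sketch. This is not HAProxy's code: the demo_ring type and the demo_add_ofs(), demo_putblk_ofs() and demo_getblk_ofs() names are hypothetical, and the preconditions are assumptions derived from the commit messages (an offset may exceed a valid one by at most one buffer size; the caller guarantees the room or contents are there).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical storage: only an area and its size, no head/data fields. */
struct demo_ring {
    char  *area;
    size_t size;
};

/* Add <count> to the absolute offset <ofs> and wrap at the end of the area.
 * Assumes ofs < size and count <= size, so a single subtraction suffices.
 */
static size_t demo_add_ofs(const struct demo_ring *r, size_t ofs, size_t count)
{
    assert(ofs < r->size && count <= r->size);
    ofs += count;
    if (ofs >= r->size)
        ofs -= r->size;
    return ofs;
}

/* Copy <len> bytes from <blk> into the area starting at absolute offset
 * <ofs>, in at most two parts when the block wraps. <ofs> may exceed a valid
 * offset by at most one area size. No length is updated here: the caller is
 * assumed to have reserved the room beforehand.
 */
static void demo_putblk_ofs(struct demo_ring *r, const char *blk, size_t len, size_t ofs)
{
    size_t first;

    assert(len <= r->size && ofs < 2 * r->size);
    if (ofs >= r->size)
        ofs -= r->size;

    first = r->size - ofs;          /* contiguous room before wrapping */
    if (first > len)
        first = len;

    memcpy(r->area + ofs, blk, first);
    if (len > first)
        memcpy(r->area, blk + first, len - first);
}

/* Copy <len> bytes from absolute offset <ofs> of the area into <blk>.
 * Only the area and its size are consulted; the caller guarantees that the
 * contents are present. Returns the number of bytes copied.
 */
static size_t demo_getblk_ofs(const struct demo_ring *r, char *blk, size_t len, size_t ofs)
{
    size_t first;

    assert(len <= r->size && ofs < 2 * r->size);
    if (ofs >= r->size)
        ofs -= r->size;

    first = r->size - ofs;          /* contiguous data before wrapping */
    if (first > len)
        first = len;

    memcpy(blk, r->area + ofs, first);
    if (len > first)
        memcpy(blk + first, r->area, len - first);
    return len;
}
```

With an 8-byte area, writing the 5-byte block "abcde" at offset 6 places "ab" at the end of the area and "cde" at its start; reading 5 bytes back from offset 14 (one area size past 6) returns the same block.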


7748 Commits