15211 Commits

Author SHA1 Message Date
Christopher Faulet
47bfd7b9b7 BUG/MINOR: tcpcheck: Properly detect pending HTTP data in output buffer
In tcpcheck_eval_send(), the condition to detect there are still pending
data in the output buffer is buggy. Presence of raw data must be tested for
TCP connection only. But a condition on the connection was missing to be
sure it is not an HTX connection.

This patch must be backported as far as 2.2.
2021-08-12 07:49:23 +02:00
William Lallemand
8c29fa7454 MINOR: channel: remove an htx block from a channel
co_htx_remove_blk() implements a way to remove an htx block from a
channel buffer and update the channel output.
2021-08-12 00:51:59 +02:00
William Lallemand
7e7765a451 BUG/MINOR: buffer: fix buffer_dump() formatting
The formatting of the buffer_dump() output must be calculated using the
relative counter, not the absolute one, or everything will be broken if
the <from> variable is not a multiple of 16.

Could be backported in all maintained versions.
2021-08-12 00:51:45 +02:00
Amaury Denoyelle
3eb42f91d9 BUG/MEDIUM: server: support both check/agent-check on a dynamic instance
A static server is able to support simultaneously both health chech and
agent-check. Adjust the dynamic server CLI handlers to also support this
configuration.

This should not be backported, unless dynamic server checks are
backported.
2021-08-11 14:41:47 +02:00
Amaury Denoyelle
26cb8342ad BUG/MEDIUM: check: fix leak on agent-check purge
There is currently a leak on agent-check for dynamic servers. When
deleted, the check rules and vars are not liberated. This leak grows
each time a dynamic server with agent-check is deleted.

Replace the manual purge code by a free_check invocation which
centralizes all the details on check cleaning.

There is no leak for health check because in this case the proxy is the
owner of the check vars and rules.

This should not be backported, unless dynamic server checks are
backported.
2021-08-11 14:40:21 +02:00
Amaury Denoyelle
6d7fc446b4 BUG/MINOR: check: fix leak on add dynamic server with agent-check error
If an error occured during a dynamic server creation, free_check is used
to liberate a possible agent-check. However, this does not free
associated vars and rules associated as this is done on another function
named deinit_srv_agent_check.

To simplify the check free and avoid a leak, move free vars/rules in
free_check. This is valid because deinit_srv_agent_check also uses
free_check.

This operation is done only for an agent-check because for a health
check, the proxy instance is the owner of check vars/rules.

This should not be backported, unless dynamic server checks are
backported.
2021-08-11 14:37:42 +02:00
Amaury Denoyelle
25fe1033cb BUG/MINOR: check: do not reset check flags on purge
Do not reset check flags when setting CHK_ST_PURGE.

Currently, this change has no impact. However, it is semantically wrong
to clear important flags such as CHK_ST_AGENT on purge.

Furthermore, this change will become mandatoy for a future fix to
properly free agent checks on dynamic servers removal. For this, it will
be needed to differentiate health/agent-check on purge via CHK_ST_AGENT
to properly free agent checks.

This must not be backported unless dynamic servers checks are
backported.
2021-08-11 14:33:34 +02:00
Willy Tarreau
6807c7f6e1 ADMIN: dyncookie: implement a simple dynamic cookie calculator
This utility can be useful to figure what cookie value a server will
have based on the secret, its IP and its port.
2021-08-11 14:07:45 +02:00
Amaury Denoyelle
13f2e2ceeb BUG/MINOR: server: do not use refcount in free_server in stopping mode
Currently there is a leak at process shutdown with dynamic servers with
check/agent-check activated. Check purges are not executed on process
stopping, so the server is not liberated due to its refcount.

The solution is simply to ignore the refcount on process stopping mode
and free the server on the first free_server invocation.

This should not be backported, unless dynamic server checks are
backported. In this case, the following commit must be backported first.
  7afa5c1843521ec3be7549592d2b38ccc9d68b73
  MINOR: global: define MODE_STOPPING
2021-08-09 17:53:30 +02:00
Amaury Denoyelle
7afa5c1843 MINOR: global: define MODE_STOPPING
Define a new mode MODE_STOPPING. It is used to indicate that the process
is in the stopping stage and no event loop runs anymore.
2021-08-09 17:51:55 +02:00
Amaury Denoyelle
9ba34ae710 BUG/MINOR: check: test if server is not null in purge
Test if server is not null before using free_server in the check purge
operation. Currently, the null server scenario should not occured as
purge is used with refcounted dynamic servers. However, this might not
be always the case if purge is use in the future in other cases; thus
the test is useful for extensibility.

No need to backport, unless dynamic server checks are backported.

This has been reported through a coverity report in github issue #1343.
2021-08-09 17:48:34 +02:00
Ilya Shipitsin
811ce5598e CI: travis-ci: temporarily disable arm64 builds
few recent builds failed with "gcc: fatal error: Killed signal
terminated program cc1", let us disable arm64 builds until investigation
2021-08-07 07:28:15 +02:00
Amaury Denoyelle
a026783bd7 REGTESTS: server: fix dynamic server with checks test
Add a missing 'rxreq' statement in first server. Without it the test is
unstable. The issue is frequent when running with one thread only.

This should fix github issue #1342.
2021-08-06 15:34:04 +02:00
Amaury Denoyelle
414a612bb3 MINOR: doc: specify ulimit-n usage for dynamic servers
Complete the documentation of the dynamic servers to warn about a
possible fd resource exhaustion when using a large number of them.
2021-08-06 11:22:01 +02:00
Amaury Denoyelle
3e7d468e80 REGTESTS: server: add dynamic check server test
Write a regtest to validate check support by dynamic servers. Three
differents servers are added on various configuration :
- server OK
- server DOWN
- agent-check
2021-08-06 11:22:01 +02:00
Amaury Denoyelle
b65f4cab6a MEDIUM: server: implement agent check for dynamic servers
This commit is the counterpart for agent check of
"MEDIUM: server: implement check for dynamic servers".

The "agent-check" keyword is enabled for dynamic servers. The agent
check must manually be activated via "enable agent" CLI. This can
enable the dynamic server if the agent response is "ready" without an
explicit "enable server" CLI.
2021-08-06 11:09:48 +02:00
Amaury Denoyelle
2fc4d39577 MEDIUM: server: implement check for dynamic servers
Implement check support for dynamic servers. The "check" keyword is now
enabled for dynamic servers. If used, the server check is initialized
and the check task started in the "add server" CLI handler. The check is
explicitely disabled and must be manually activated via "enable health"
CLI handler.

The dynamic server refcount is incremented if a check is configured. On
"delete server" handler, the check is purged, which decrements the
refcount.
2021-08-06 11:09:48 +02:00
Amaury Denoyelle
9ecee0fa36 MINOR: check: enable safe keywords for dynamic servers
Implement a collection of keywords deemed safe and useful to dynamic
servers. The list of the supported keywords is :
- addr
- check-proto
- check-send-proxy
- check-via-socks4
- rise
- fall
- fastinter
- downinter
- port
- agent-addr
- agent-inter
- agent-port
- agent-send
2021-08-06 11:09:48 +02:00
Amaury Denoyelle
b33a0abc0b MEDIUM: check: implement check deletion for dynamic servers
Implement a mechanism to free a started check on runtime for dynamic
servers. A new function check_purge is created for this. The check task
will be marked for deletion and scheduled to properly close connection
elements and free the task/tasklet/buf_wait elements.

This function will be useful to delete a dynamic server wich checks.
2021-08-06 11:09:48 +02:00
Amaury Denoyelle
d6b7080cec MINOR: server: implement a refcount for dynamic servers
It is necessary to have a refcount mechanism on dynamic servers to be
able to enable check support. Indeed, when deleting a dynamic server
with check activated, the check will be asynchronously removed. This is
mandatory to properly free the check resources in a thread-safe manner.
The server instance must be kept alive for this.
2021-08-06 11:09:48 +02:00
Amaury Denoyelle
403dce8e5a MINOR: check: do not increment global maxsock at runtime
global maxsock is used to estimate a number of fd to reserve for
internal use, such as checks. It is incremented at startup with the info
from the config file.

Disable this incrementation in checks functions at runtime. First, it
currently serves no purpose to increment it after startup. Worse, it may
lead to out-of-bound accesse on the fdtab.

This will be useful to initiate checks for dynamic servers.
2021-08-06 11:08:24 +02:00
Amaury Denoyelle
3c2ab1a0d4 MINOR: check: export check init functions
Remove static qualifier on init_srv_check, init_srv_agent_check and
start_check_task. These functions will be called in server.c for dynamic
servers with checks.
2021-08-06 11:08:04 +02:00
Amaury Denoyelle
f2c27a5c67 MINOR: check: allocate default check ruleset for every backends
Allocate default tcp ruleset for every backend without explicit rules
defined, even if no server in the backend use check. This change is
required to implement checks for dynamic servers.

This allocation is done on check_config_validity. It must absolutely be
called before check_proxy_tcpcheck (called via post proxy check) which
allocate the implicit tcp connect rule.
2021-08-06 11:08:04 +02:00
Amaury Denoyelle
fca18172d9 MINOR: server: initialize fields for dynamic server check
Set default inter/rise/fall values for dynamic servers check/agent. This
is required because dynamic servers do not inherit from a
default-server.
2021-08-06 11:08:04 +02:00
Amaury Denoyelle
7b368339af MEDIUM: task: implement tasklet kill
Implement an equivalent of task_kill for tasklets. This function can be
used to request a tasklet deletion in a thread-safe way.

Currently this function is unused.
2021-08-06 11:07:48 +02:00
Amaury Denoyelle
c755efd5c6 MINOR: server: unmark deprecated on enable health/agent cli
Remove the "DEPRECATED" marker on "enable/disable health/agent"
commands. Their purpose is to toggle the check/agent on a server.

These commands are still useful because their purpose is not covered by
the "set server" command. Most there was confusion with the commands
'set server health/agent', which in fact serves another goal.

Note that the indication "use 'set server' instead" has been added since
2016 on the commit
  2c04eda8b58636ad2ae44e42b1f50f3b5a24a642
  REORG: cli: move "{enable|disable} health" to server.c

and
  58d9cb7d22c1b0d8239543443131e3e3658375d0
  REORG: cli: move "{enable|disable} agent" to server.c

Besides, these commands will become required to enable check/agent on
dynamic servers which will be created with check disabled.

This should be backported up to 2.4.
2021-08-06 10:09:50 +02:00
Christopher Faulet
d7da3dd928 BUG/MEDIUM: spoe: Fix policy to close applets when SPOE connections are queued
It is the second part of the fix that should solve fairness issues with the
connections management inside the SPOE filter. Indeed, in multithreaded
mode, when the SPOE detects there are some connections in queue on a server,
it closes existing connections by releasing SPOE applets. It is mandatory
when a maxconn is set because few connections on a thread may prenvent new
connections establishment.

The first attempt to fix this bug (9e647e5af "BUG/MEDIUM: spoe: Kill applets
if there are pending connections and nbthread > 1") introduced a bug. In
pipelining mode, SPOE applets might be closed while some frames are pending
for the ACK reply. To fix the bug, in the processing stage, if there are
some connections in queue, only truly idle applets may process pending
requests. In this case, only one request at a time is processed. And at the
end of the processing stage, only truly idle applets may be released. It is
an empirical workaround, but it should be good enough to solve contention
issues when a low maxconn is set.

This patch should partely fix the issue #1340. It must be backported as far
as 2.0.
2021-08-05 10:07:43 +02:00
Christopher Faulet
6f1296b5c7 BUG/MEDIUM: spoe: Create a SPOE applet if necessary when the last one is released
On a thread, when the last SPOE applet is released, if there are still
pending streams, a new one is created. Of course, HAproxy must not be
stopping. It is important to start a new applet in this case to not abort
in-progress jobs, especially when a maxconn is set. Because applets may be
closed to be fair with connections waiting for a free slot.

This patch should partely fix the issue #1340. It depends on the commit
"MINOR: spoe: Create a SPOE applet if necessary when the last one on a
thread is closed". Both must be backported as far as 2.0.
2021-08-05 10:07:43 +02:00
Christopher Faulet
434b8525ee MINOR: spoe: Add a pointer on the filter config in the spoe_agent structure
There was no way to access the SPOE filter configuration from the agent
object. However it could be handy to have it. And in fact, this will be
required to fix a bug.
2021-08-05 10:07:43 +02:00
Willy Tarreau
d332f1396b BUG/MINOR: server: update last_change on maint->ready transitions too
Nenad noticed that when leaving maintenance, the servers' last_change
field was not updated. This is visible in the Status column of the stats
page in front of the state, as the cumuled time spent in the current state
is wrong, it starts from the last transition (typically ready->maint). In
addition, the backend's state was not updated either, because the down
transition is performed by set_backend_down() which also emits a log, and
it is this function which was extended to update the backend's last_change,
but it's not called for down->up transitions so that was not done.

The most visible (and unpleasant) effect of this bug is that it affects
slowstart so such a server could immediately restart with a significant
load ratio.

This should likely be backported to all stable releases.
2021-08-04 19:41:01 +02:00
Willy Tarreau
7b2ac29a92 CLEANUP: fd: remove the now unneeded fd_mig_lock
This is not needed anymore since we don't use it when setting the running
mask anymore.
2021-08-04 16:03:36 +02:00
Willy Tarreau
b201b1dab1 CLEANUP: fd: remove the now unused fd_set_running()
It was inlined inside fd_update_events() since it relies on a loop that
may return immediate failure codes.
2021-08-04 16:03:36 +02:00
Willy Tarreau
f69fea64e0 MAJOR: fd: get rid of the DWCAS when setting the running_mask
Right now we're using a DWCAS to atomically set the running_mask while
being constrained by the thread_mask. This DWCAS is annoying because we
may seriously need it later when adding support for thread groups, for
checking that the running_mask applies to the correct group.

It turns out that the DWCAS is not strictly necessary because we never
need it to set the thread_mask based on the running_mask, only the other
way around. And in fact, the running_mask is always cleared alone, and
the thread_mask is changed alone as well. The running_mask is only
relevant to indicate a takeover when the thread_mask matches it. Any
bit set in running and not present in thread_mask indicates a transition
in progress.

As such, it is possible to re-arrange this by using a regular CAS around a
consistency check between running_mask and thread_mask in fd_update_events
and by making a CAS on running_mask then an atomic store on the thread_mask
in fd_takeover(). The only other case is fd_delete() but that one already
sets the running_mask before clearing the thread_mask, which is compatible
with the consistency check above.

This change has happily survived 10 billion takeovers on a 16-thread
machine at 800k requests/s.

The fd-migration doc was updated to reflect this change.
2021-08-04 16:03:36 +02:00
Willy Tarreau
b1f29bc625 MINOR: activity/fd: remove the dead_fd counter
This one is set whenever an FD is reported by a poller with a null owner,
regardless of the thread_mask. It has become totally meaningless because
it only indicates a migrated FD that was not yet reassigned to a thread,
but as soon as a thread uses it, the status will change to skip_fd. Thus
there is no reason to distinguish between the two, it adds more confusion
than it helps. Let's simply drop it.
2021-08-04 16:03:36 +02:00
Amaury Denoyelle
bd8dd841e5 BUG/MINOR: server: remove srv from px list on CLI 'add server' error
If an error occured during the CLI 'add server' handler, the newly
created server must be removed from the proxy list if already inserted.
Currently, this can happen on the extremely rare error during server id
generation if there is no id left.

The removal operation is not thread-safe, it must be conducted before
releasing the thread isolation.

This can be backported up to 2.4. Please note that dynamic server track
is not implemented in 2.4, so the release_server_track invocation must
be removed for the backport to prevent a compilation error.
2021-08-04 14:57:06 +02:00
Willy Tarreau
ba3ab7907a MEDIUM: servers: make the server deletion code run under full thread isolation
In 2.4, runtime server deletion was brought by commit e558043e1 ("MINOR:
server: implement delete server cli command"). A comment remained in the
code about a theoretical race between the thread_isolate() call and another
thread being in the process of allocating memory before accessing the
server via a reference that was grabbed before the memory allocation,
since the thread_harmless_now()/thread_harmless_end() pair around mmap()
may have the effect of allowing cli_parse_delete_server() to proceed.

Now that the full thread isolation is available, let's update the code
to rely on this. Now it is guaranteed that competing threads will either
be in the poller or queued in front of thread_isolate_full().

This may be backported to 2.4 if any report of breakage suggests the bug
really exists, in which case the two following patches will also be
needed:
  MINOR: threads: make thread_release() not wait for other ones to complete
  MEDIUM: threads: add a stronger thread_isolate_full() call
2021-08-04 14:49:36 +02:00
Willy Tarreau
88d1c5d3fb MEDIUM: threads: add a stronger thread_isolate_full() call
The current principle of running under isolation was made to access
sensitive data while being certain that no other thread was using them
in parallel, without necessarily having to place locks everywhere. The
main use case are "show sess" and "show fd" which run over long chains
of pointers.

The thread_isolate() call relies on the "harmless" bit that indicates
for a given thread that it's not currently doing such sensitive things,
which is advertised using thread_harmless_now() and which ends usings
thread_harmless_end(), which also waits for possibly concurrent threads
to complete their work if they took this opportunity for starting
something tricky.

As some system calls were notoriously slow (e.g. mmap()), a bunch of
thread_harmless_now() / thread_harmless_end() were placed around them
to let waiting threads do their work while such other threads were not
able to modify memory contents.

But this is not sufficient for performing memory modifications. One such
example is the server deletion code. By modifying memory, it not only
requires that other threads are not playing with it, but are not either
in the process of touching it. The fact that a pool_alloc() or pool_free()
on some structure may call thread_harmless_now() and let another thread
start to release the same object's memory is not acceptable.

This patch introduces the concept of "idle threads". Threads entering
the polling loop are idle, as well as those that are waiting for all
others to become idle via the new function thread_isolate_full(). Once
thread_isolate_full() is granted, the thread is not idle anymore, and
it is released using thread_release() just like regular isolation. Its
users have to keep in mind that across this call nothing is granted as
another thread might have performed shared memory modifications. But
such users are extremely rare and are actually expecting this from their
peers as well.

Note that that in case of backport, this patch depends on previous patch:
  MINOR: threads: make thread_release() not wait for other ones to complete
2021-08-04 14:49:36 +02:00
Willy Tarreau
f519cfaa63 MINOR: threads: make thread_release() not wait for other ones to complete
The original intent of making thread_release() wait for other requesters to
proceed was more of a fairness trade, guaranteeing that a thread that was
granted an access to the CPU would be in turn giving back once its job is
done. But this is counter-productive as it forces such threads to spin
instead of going back to the poller, and it prevents us from implementing
multiple levels of guarantees, as a thread_release() call could spin
waiting for another requester to pass while that requester expects
stronger guarantees than the current thread may be able to offer.

Let's just remove that wait period and let the thread go back to the
poller, a-la "race to idle".

While in theory it could possibly slightly increase the perceived
latency of concurrent slow operations like "show fd" or "show sess",
it is not the case at all in tests, probably because the time needed
to reach the poller remains extremely low anyway.
2021-08-04 14:49:36 +02:00
Willy Tarreau
286363be08 CLEANUP: thread: fix fantaisist indentation of thread_harmless_till_end()
Probably due to a copy-paste, there were two indent levels in this function
since its introduction in 1.9 by commit 60b639ccb ("MEDIUM: hathreads:
implement a more flexible rendez-vous point"). Let's fix this.
2021-08-04 14:49:36 +02:00
Amaury Denoyelle
08be72b827 BUG/MINOR: server: fix race on error path of 'add server' CLI if track
If an error occurs during a dynamic server creation with tracking, it
must be removed from the tracked list. This operation is not thread-safe
and thus must be conducted under the thread isolation.

Track support for dynamic servers has been introduced in this release.
This does not need to be backported.
2021-08-04 09:18:12 +02:00
William Lallemand
85a16b2ba2 MINOR: stats: shows proxy in a stopped state
Previous patch b5c0d65 ("MINOR: proxy: disabled takes a stopping and a
disabled state") allows us to set 2 states for a stopped or a disabled
proxy. With this patch we are now able to show the stats of all proxies
when the process is in a stopping states, not only when there is some
activity on a proxy.

This patch should fix issue #1307.
2021-08-03 14:17:45 +02:00
William Lallemand
8e765b86fd MINOR: proxy: disabled takes a stopping and a disabled state
This patch splits the disabled state of a proxy into a PR_DISABLED and a
PR_STOPPED state.

The first one is set when the proxy is disabled in the configuration
file, and the second one is set upon a stop_proxy().
2021-08-03 14:17:45 +02:00
William Lallemand
fdc3faf654 MINOR: doc: rename conn_status in option httsplog
Rename the conn_status field by the real name of the sample fetch in the
`option httpslog` documentation.
2021-08-02 10:57:49 +02:00
William Lallemand
56f1f75715 MINOR: log: rename 'dontloglegacyconnerr' to 'log-error-via-logformat'
Rename the 'dontloglegacyconnerr' option to 'log-error-via-logformat'
which is much more self-explanatory and readable.

Note: only legacy keywords don't use hyphens, it is recommended to
separate words with them in new keywords.
2021-08-02 10:42:42 +02:00
Willy Tarreau
8441deb1e2 [RELEASE] Released version 2.5-dev3
Released version 2.5-dev3 with the following main changes :
    - BUG/MINOR: arg: free all args on make_arg_list()'s error path
    - BUG/MINOR: cfgcond: revisit the condition freeing mechanism to avoid a leak
    - MEDIUM: proxy: remove long-broken 'option http_proxy'
    - CLEANUP: http_ana: Remove now unused label from http_process_request()
    - MINOR: deinit: always deinit the init_mutex on failed initialization
    - BUG/MEDIUM: cfgcond: limit recursion level in the condition expression parser
    - BUG/MEDIUM: mworker: do not register an exit handler if exit is expected
    - BUG/MINOR: mworker: do not export HAPROXY_MWORKER_REEXEC across programs
    - BUILD/MINOR: memprof fix macOs build.
    - BUG/MEDIUM: ssl_sample: fix segfault for srv samples on invalid request
    - BUG/MINOR: stats: Add missing agent stats on servers
    - BUG/MINOR: check: fix the condition to validate a port-less server
    - BUILD: threads: fix pthread_mutex_unlock when !USE_THREAD
    - BUG/MINOR: resolvers: Use a null-terminated string to lookup in servers tree
    - MINOR: ssl: use __objt_* variant when retrieving counters
    - BUG/MINOR: systemd: must check the configuration using -Ws
    - BUG/MINOR: mux-h1: Obey dontlognull option for empty requests
    - BUG/MINOR: mux-h2: Obey dontlognull option during the preface
    - BUG/MINOR: mux-h1: Be sure to swap H1C to splice mode when rcv_pipe() is called
    - BUG/MEDIUM: mux-h2: Handle remaining read0 cases on partial frames
    - MINOR: proxy: rename PR_CAP_LUA to PR_CAP_INT
    - MINOR: mworker: the mworker CLI proxy is internal
    - MINOR: stats: don't output internal proxies (PR_CAP_INT)
    - CLEANUP: mworker: use the proxy helper functions in mworker_cli_proxy_create()
    - CLEANUP: mworker: PR_CAP already initialized with alloc_new_proxy()
    - BUG/MINOR: connection: Add missing error labels to conn_err_code_str
    - MINOR: connection: Add a connection error code sample fetch
    - MINOR: ssl: Enable error fetches in case of handshake error
    - MINOR: ssl: Add new ssl_fc_hsk_err sample fetch
    - MINOR: ssl: Define a default https log format
    - MEDIUM: connection: Add option to disable legacy error log
    - REGTESTS: ssl: Add tests for the connection and SSL error fetches
    - REGTESTS: ssl: ssl_errors.vtc does not work with old openssl version
    - BUG/MEDIUM: connection: close a rare race between idle conn close and takeover
    - BUG/MEDIUM: pollers: clear the sleeping bit after waking up, not before
    - BUG/MINOR: select: fix excess number of dead/skip reported
    - BUG/MINOR: poll: fix abnormally high skip_fd counter
    - BUG/MINOR: pollers: always program an update for migrated FDs
    - BUG/MINOR: fd: protect fd state harder against a concurrent takeover
    - DOC: internals: document the FD takeover process
    - MINOR: fd: update flags only once in fd_update_events()
    - MINOR: poll/epoll: move detection of RDHUP support earlier
    - REORG: fd: uninline fd_update_events()
    - MEDIUM: fd: rely more on fd_update_events() to detect changes
    - BUG/MINOR: freq_ctr: use stricter barriers between updates and readings
    - MEDIUM: atomic: simplify the atomic load/store/exchange operations
    - MEDIUM: atomic: relax the load/store barriers on x86_64
    - BUILD: opentracing: fixed build when using pkg-config utility
2021-08-01 18:19:51 +02:00
Miroslav Zagorac
3571111de3 BUILD: opentracing: fixed build when using pkg-config utility
In case the OpenTracing C Wrapper library was installed as part of the
system (ie in a directory that pkg-config considers part of the system),
HAProxy building was not possible because in that case pkg-config did
not set the value of the OT_CFLAGS variable in the addon Makefile.

This resolves GitHub issue #1323.
2021-08-01 18:18:29 +02:00
Willy Tarreau
99198546f6 MEDIUM: atomic: relax the load/store barriers on x86_64
The x86-tso model makes the load and store barriers unneeded for our
usage as long as they perform at least a compiler barrier: the CPU
will respect store ordering and store vs load ordering. It's thus
safe to remove the lfence and sfence which are normally needed only
to communicate with external devices. Let's keep the mfence though,
to make sure that reads of same memory location after writes report
the value from memory and not the one snooped from the write buffer
for too long.

An in-depth review of all use cases tends to indicate that this is
okay in the rest of the code. Some parts could be cleaned up to
use atomic stores and atomic loads instead of explicit barriers
though.

Doing this reliably increases the overall performance by about 2-2.5%
on a 8c-16t Xeon thanks to less frequent flushes (it's likely that the
biggest gain is in the MT lists which use them a lot, and that this
results in less cache line flushes).
2021-08-01 17:34:06 +02:00
Willy Tarreau
cb0451146f MEDIUM: atomic: simplify the atomic load/store/exchange operations
The atomic_load/atomic_store/atomic_xchg operations were all forced to
__ATOMIC_SEQ_CST, which results in explicit store or even full barriers
even on x86-tso while we do not need them: we're not communicating with
external devices for example and are only interested in respecting the
proper ordering of loads and stores between each other.

These ones being rarely used, the emitted code on x86 remains almost the
same (barring a handful of locations). However they will allow to place
correct barriers at other places where atomics are accessed a bit lightly.

The patch is marked medium because we can never rule out the risk of some
bugs on more relaxed platforms due to the rest of the code.
2021-08-01 17:34:06 +02:00
Willy Tarreau
55a0975b1e BUG/MINOR: freq_ctr: use stricter barriers between updates and readings
update_freq_ctr_period() was using relaxed atomics without using barriers,
which usually works fine on x86 but not everywhere else. In addition, some
values were read without being enclosed by barriers, allowing the compiler
to possibly prefetch them a bit earlier. Finally, freq_ctr_total() was also
reading these without enough barriers. Let's make explicit use of atomic
loads and atomic stores to get rid of this situation. This required to
slightly rearrange the freq_ctr_total() loop, which could possibly slightly
improve performance under extreme contention by avoiding to reread all
fields.

A backport may be done to 2.4 if a problem is encountered, but last tests
on arm64 with LSE didn't show any issue so this can possibly stay as-is.
2021-08-01 17:34:06 +02:00
Willy Tarreau
200bd50b73 MEDIUM: fd: rely more on fd_update_events() to detect changes
This function already performs a number of checks prior to calling the
IOCB, and detects the change of thread (FD migration). Half of the
controls are still in each poller, and these pollers also maintain
activity counters for various cases.

Note that the unreliable test on thread_mask was removed so that only
the one performed by fd_set_running() is now used, since this one is
reliable.

Let's centralize all that fd-specific logic into the function and make
it return a status among:

  FD_UPDT_DONE,        // update done, nothing else to be done
  FD_UPDT_DEAD,        // FD was already dead, ignore it
  FD_UPDT_CLOSED,      // FD was closed
  FD_UPDT_MIGRATED,    // FD was migrated, ignore it now

Some pollers already used to call it last and have nothing to do after
it, regardless of the result. epoll has to delete the FD in case a
migration is detected. Overall this removes more code than it adds.
2021-07-30 17:45:18 +02:00