haproxy

Author	SHA1	Message	Date
Olivier Houchard	a74bb7e26e	BUG/MEDIUM: connections: Let the xprt layer know a takeover happened. When we takeover a connection, let the xprt layer know. If it has its own tasklet, and it is already scheduled, then it has to be destroyed, otherwise it may run the new mux tasklet on the old thread. Note that we only do this for the ssl xprt for now, because the only other one that might wake the mux up is the handshake one, which is supposed to disappear before idle connections exist. No backport is needed, this is for 2.2.	2020-07-03 17:49:33 +02:00
Olivier Houchard	1662cdb0c6	BUG/MEDIUM: connections: Set the tid for the old tasklet on takeover. In the various takeover() methods, make sure we schedule the old tasklet on the old thread, as we don't want it to run on our own thread! This was causing a very rare crash when building with DEBUG_STRICT, seeing that either an FD's thread mask didn't match the thread ID in h1_io_cb(), or that stream_int_notify() would try to queue a task with the wrong tid_bit. In order to reproduce this, it is necessary to maintain many connections (typically 30k) at a high request rate flowing over H1+SSL between two proxies, the second of which would randomly reject ~1% of the incoming connections and randomly kill some idle ones using a very short client timeout. The request rate must be adjusted so that the CPUs are nearly saturated, but never reach 100%. It's easier to reproduce this by skipping local connections and always picking from other threads. The issue should happen in less than 20s; otherwise it's necessary to restart to reset the idle connections lists. No backport is needed, takeover() is 2.2 only.	2020-07-03 17:49:23 +02:00
Willy Tarreau	43079e0731	MINOR: sched: split tasklet_wakeup() into tasklet_wakeup_on() tasklet_wakeup() only checks tl->tid to know whether the task is programmed to run on the current thread or on a specific thread. We'll have to ease this selection in a subsequent patch, preferably without modifying tl->tid, so let's have a new tasklet_wakeup_on() function to specify the thread number to run on. Note that the logic has not changed at all.	2020-07-03 17:19:47 +02:00
Willy Tarreau	18ed789ae2	BUG/MEDIUM: server: don't kill all idle conns when there are not enough In srv_cleanup_idle_connections(), we compute how many idle connections are in excess compared to the average need. But we may actually be missing some, for example if a certain number were recently closed and the average of used connections didn't change much since previous period. In this case exceed_conn can become negative. There was no special case for this in the code, and calculating the per-thread share of connections to kill based on this value resulted in special value -1 to be passed to srv_migrate_conns_to_remove(), which for this function means "kill all of them", as used in srv_cleanup_connections() for example. This causes large variations of idle connections counts on servers and CPU spikes at the moment the cleanup task passes. These were quite more visible with SSL as it costs CPU to close and re-establish these connections, and it also takes time, reducing the reuse ratio, hence increasing the amount of connections during reconnection. In this patch we simply skip the killing loop when this condition is met. No backport is needed, this is purely 2.2.	2020-07-02 19:05:30 +02:00
Emeric Brun	b39a3754d9	BUG/MINOR: log: missing timezone on iso dates. The function timeofday_as_iso_us now adds the trailing local timezone offset. Doing this, the function could be used directly to generate rfc5424 logs. It affects content of a ring if the ring's format is set to 'iso' and 'timed'. Note: the default ring 'buf0' is of type 'timed'. It is preferable NOT to backport this to stable releases unless bugs are reported, because while the previous format is not correct and the new one is correct, there is a small risk to cause inconsistencies in log format to some users who would not expect such a change in a stable cycle.	2020-07-02 17:56:11 +02:00
Emeric Brun	9f9b22c4f1	MINOR: log: add time second fraction field to rfc5424 log timestamp. This patch adds the time second fraction in microseconds as supported by the rfc.	2020-07-02 17:56:06 +02:00
Willy Tarreau	4f58926352	BUG/MAJOR: sched: make it work also when not building with DEBUG_STRICT Sadly, the fix from commit 54d31170a ("BUG/MAJOR: sched: make sure task_kill() always queues the task") broke the builds without DEBUG_STRICT as, in order to be careful, it placed a BUG_ON() around the previously failing condition to check for any new possible failure, but this BUG_ON strips the condition when DEBUG_STRICT is not set. We don't want BUG_ON to evaluate any condition either as some debugging code calls possibly expensive ones (e.g. in htx_get_stline). Let's just drop the useless BUG_ON(). No backport is needed, this is 2.2-dev.	2020-07-02 17:17:42 +02:00
Willy Tarreau	ab8b6a45be	BUILD: haproxy: fix build error when RLIMIT_AS is not set As reported in issue #724, openbsd fails to build in haproxy.c due to a faulty comma in the middle of a warning message. This code is only compiled when RLIMIT_AS is not defined, which seems to be rare these days. This may be backported to older versions as the problem was likely introduced when strict limits were added.	2020-07-02 15:38:35 +02:00
Willy Tarreau	42abe68f11	BUG/MEDIUM: cli/proxy: don't try to dump idle connection state if there's none Commit 69f591e3b ("MINOR: cli/proxy: add a new "show servers conn" command") added the ability to dump the idle connections state for a server, but we must not do this if idle connections were not allocated, which happens if the server is configured with pool-max-conn 0. This is 2.2, no backport is needed.	2020-07-02 15:19:57 +02:00
Olivier Houchard	48ce6a3ab1	BUG/MEDIUM: muxes: Make sure nobody stole the connection before using it. In the various timeout functions, make sure nobody stole the connection from us before attempting to do anything with it, there's a very small race condition between the time we access the task context, and the time we actually check it again with the lock, where it could have been freed.	2020-07-02 14:17:25 +02:00
Willy Tarreau	54d31170a9	BUG/MAJOR: sched: make sure task_kill() always queues the task task_kill() may fail to queue a task if this task has never ever run, because its equivalent (tasklet->list) member has never been "emptied" since it didn't pass through the LIST_DEL_INIT() that's performed by run_tasks_from_lists(). This results in these tasks to never be freed. It happens during the mux takeover since the target task usually is the timeout task which, by definition, has never run yet. This fixes commit eb8c2c69f ("MEDIUM: sched: implement task_kill() to kill a task") which was introduced after 2.2-dev11 and doesn't need to be backported.	2020-07-02 14:14:00 +02:00
Willy Tarreau	dab586c3a8	BUILD: debug: avoid build warnings with DEBUG_MEM_STATS Some libcs define strdup() as a macro and cause redefine warnings to be emitted, so let's first undefine all functions we redefine.	2020-07-02 10:25:01 +02:00
Dragan Dosen	1e3b16f74f	MINOR: log-format: allow to preserve spacing in log format strings Now it's possible to preserve spacing everywhere except in "log-format", "log-format-sd" and "unique-id-format" directives, where spaces are delimiters and are merged. That may be useful when the response payload is specified as a log format string by "lf-file" or "lf-string", or even for headers or anything else. In order to merge spaces, a new option LOG_OPT_MERGE_SPACES is applied exclusively on options passed to function parse_logformat_string(). This patch fixes an issue #701 ("http-request return log-format file evaluation altering spacing of ASCII output/art").	2020-07-02 10:11:44 +02:00
Willy Tarreau	a6026a0c92	MINOR: debug: add a new "debug dev memstats" command Now when building with -DDEBUG_MEM_STATS, some malloc/calloc/strdup/realloc stats are kept per file+line number and may be displayed and even reset on the CLI using "debug dev memstats". This allows to easily track potential leakers or abnormal usages.	2020-07-02 09:14:48 +02:00
Dragan Dosen	d1ba552e41	MINOR: 51d: silence a warning about null pointer dereference This is due to issue #713, that reports null pointer dereference suspected by coverity.	2020-07-01 23:27:06 +02:00
Willy Tarreau	76cc699017	MINOR: config: add a new tune.idle-pool.shared global setting. Enables ('on') or disables ('off') sharing of idle connection pools between threads for a same server. The default is to share them between threads in order to minimize the number of persistent connections to a server, and to optimize the connection reuse rate. But to help with debugging or when suspecting a bug in HAProxy around connection reuse, it can be convenient to forcefully disable this idle pool sharing between multiple threads, and force this option to "off". The default is on. This could have been nice to have during the idle connections debugging, but it's not too late to add it!	2020-07-01 19:07:37 +02:00
Willy Tarreau	83ca305ddc	DOC: configuration: fix alphabetical ordering for tune.pool-{high,low}-fd-ratio In addition they were in the wrong alphabetical order in the doc. They were added in 2.0 by commit 88698d966 ("MEDIUM: connections: Add a way to control the number of idling connections.") so this must be backported to 2.0.	2020-07-01 18:30:16 +02:00
Willy Tarreau	a8e2d97905	DOC: configuration: add missing index entries for tune.pool-{low,high}-fd-ratio These two keywords didn't have an entry in the index. They were added in 2.0 by commit 88698d966 ("MEDIUM: connections: Add a way to control the number of idling connections.") so this must be backported to 2.0.	2020-07-01 18:29:44 +02:00
Olivier Houchard	ff1d0929b8	MEDIUM: connections: Don't use a lock when moving connections to remove. Make it so we don't have to take a lock while moving a connection from the idle list to the toremove_list by taking advantage of the MT_LIST.	2020-07-01 17:09:19 +02:00
Olivier Houchard	f8f4c2ef60	CLEANUP: connections: rename the toremove_lock to takeover_lock This lock was misnamed and a bit confusing. It's only used for takeover so let's call it takeover_lock.	2020-07-01 17:09:10 +02:00
Olivier Houchard	bbee1f7e78	MINOR: list: Add MT_LIST_DEL_SAFE_NOINIT() and MT_LIST_ADDQ_NOCHECK() Add two new macros, MT_LIST_DEL_SAFE_NOINIT makes sure we remove the element from the list, without reinitializing its next and prev, and MT_LIST_ADDQ_NOCHECK is similar to MT_LIST_ADDQ(), except it doesn't check if the element is already in a list. The goal is to be able to move an element from a list we're currently parsing to another, keeping it locked in the meanwhile.	2020-07-01 17:04:00 +02:00
Willy Tarreau	88d18f81ae	MEDIUM: mux-fcgi: use task_kill() during fcgi_takeover() instead of task_wakeup() task_wakeup() passes the task through the global run queue under the global RQ lock, which is expensive when dealing with large amounts of fcgi_takeover() calls. Let's use the new task_kill() instead to kill the task.	2020-07-01 16:47:12 +02:00
Willy Tarreau	617e80ff76	MEDIUM: mux-h2: use task_kill() during h2_takeover() instead of task_wakeup() task_wakeup() passes the task through the global run queue under the global RQ lock, which is expensive when dealing with large amounts of h2_takeover() calls. Let's use the new task_kill() instead to kill the task.	2020-07-01 16:47:12 +02:00
Willy Tarreau	09e0d9ecbc	MEDIUM: mux-h1: use task_kill() during h1_takeover() instead of task_wakeup() task_wakeup() passes the task through the global run queue under the global RQ lock, which is expensive when dealing with large amounts of h1_takeover() calls. Let's use the new task_kill() instead to kill the task. By doing so, a scenario involving approximately 130k takeover/s running on 16 threads gained almost 3% performance from 319k req/s to 328k.	2020-07-01 16:42:05 +02:00
Willy Tarreau	eb8c2c69fa	MEDIUM: sched: implement task_kill() to kill a task task_kill() may be used by any thread to kill any task with less overhead than a regular wakeup. In order to achieve this, it bypasses the priority tree and inserts the task directly into the shared tasklets list, cast as a tasklet. The task_list_size is updated to make sure it is properly decremented after execution of this task. The task will thus be picked by process_runnable_tasks() after checking the tree and sent to the TL_URGENT list, where it will be processed and killed. If the task is bound to more than one thread, its first thread will be the one notified. If the task was already queued or running, nothing is done, only the flag is added so that it gets killed before or after execution. Of course it's the caller's responsibility to make sure any resources allocated by this task were already cleaned up or taken over.	2020-07-01 16:35:53 +02:00
Willy Tarreau	8a6049c268	MEDIUM: sched: create a new TASK_KILLED task flag This flag, when set, will be used to indicate that the task must die. At the moment this may only be placed by the task itself or by the scheduler when placing it into the TL_NORMAL queue.	2020-07-01 16:35:49 +02:00
Willy Tarreau	d99177f86d	MINOR: sched: make sched->task_list_size atomic We'll need to update it from foreign threads in order to throw killed tasks and maintain correct accounting, so let's make it atomic.	2020-07-01 16:35:41 +02:00
Willy Tarreau	364f25a688	MINOR: backend: don't always takeover from the same threads The next thread walking algorithm in commit 566df309c ("MEDIUM: connections: Attempt to get idle connections from other threads.") proved to be sufficient for most cases, but it still has some rough edges when threads are unevenly loaded. If one thread wakes up with 10 streams to process in a burst, it will mainly take over connections from the next one until it doesn't have anymore. This patch implements a rotating index that is stored into the server list and that any thread taking over a connection is responsible for updating. This way it starts mostly random and avoids always picking from the same place. This results in a smoother distribution overall and a slightly lower takeover rate.	2020-07-01 16:07:43 +02:00
Willy Tarreau	0d587116c2	BUG/MEDIUM: backend: always search in the safe list after failing on the idle one There's a tricky behavior that was lost when the idle connections were made sharable between thread in commit 566df309c ("MEDIUM: connections: Attempt to get idle connections from other threads."), it is the ability to retry from the safe list when looking for any type of idle connection and not finding one in the idle list. It is already important when dealing with long-lived connections since they ultimately all become safe, but that case is already covered by the fact that safe conns not being used end up closing and are not looked up anymore since connect_server() sees there are none. But it's even more important when using server-side connections which periodically close, because the new connections may spend half of their time in safe state and the other half in the idle state, and failing to grab one such connection from the right list results in establishing a new connection. This patch makes sure that a failure to find an idle connection results in a new attempt at finding one from the safe list if available. In order to avoid locking twice, connections are attempted alternatively from the idle then safe list when picking from siblings. Tests have shown a ~2% performance increase by avoiding to lock twice. A typical test with 10000 connections over 16 threads with 210 servers having a 1 millisecond response time and closing every 5 requests shows a degrading performance starting at 120k req/s down to 60-90k and an average reuse rate of 44%. After the fix, the reuse rate raises to 79% and the performance becomes stable at 254k req/s. Similarly the previous test with full keep-alive has now increased from 96% reuse rate to 99% and from 352k to 375k req/s. No backport is needed as this is 2.2-only.	2020-07-01 15:49:21 +02:00
Willy Tarreau	2f3f4d3441	MEDIUM: server: add a new pool-low-conn server setting The problem with the way idle connections currently work is that it's easy for a thread to steal all of its siblings' connections, then release them, then it's done by another one, etc. This happens even more easily due to scheduling latencies, or merged events inside the same pool loop, which, when dealing with a fast server responding in sub-millisecond delays, can really result in one thread being fully at work at a time. In such a case, we perform a huge amount of takeover() which consumes CPU and requires quite some locking, sometimes resulting in lower performance than expected. In order to fight against this problem, this patch introduces a new server setting "pool-low-conn", whose purpose is to dictate when it is allowed to steal connections from a sibling. As long as the number of idle connections remains at least as high as this value, it is permitted to take over another connection. When the idle connection count becomes lower, a thread may only use its own connections or create a new one. By proceeding like this even with a low number (typically 2*nbthreads), we quickly end up in a situation where all active threads have a few connections. It then becomes possible to connect to a server without bothering other threads the vast majority of the time, while still being able to use these connections when the number of available FDs becomes low. We also use this threshold instead of global.nbthread in the connection release logic, allowing to keep more extra connections if needed. A test performed with 10000 concurrent HTTP/1 connections, 16 threads and 210 servers with 1 millisecond of server response time showed the following numbers: haproxy 2.1.7: 185000 requests per second; haproxy 2.2: 314000 requests per second; haproxy 2.2 lowconn 32: 352000 requests per second. The takeover rate goes down from 300k/s to 13k/s. The difference is further amplified as the response time shrinks.	2020-07-01 15:23:15 +02:00
Willy Tarreau	35e30c9670	BUG/MINOR: server: fix the connection release logic regarding nearly full conditions There was a logic bug in commit ddfe0743d ("MEDIUM: server: use the two thresholds for the connection release algorithm"): instead of keeping only our first idle connection when FDs become scarce, the condition was inverted resulting in enforcing this constraint unless FDs are scarce. This results in less idle connections than permitted to be kept under normal condition. No backport needed.	2020-07-01 14:14:29 +02:00
Willy Tarreau	151c253a1e	MINOR: server: skip servers with no idle conns earlier In conn_backend_get() we can avoid locking other servers when trying to steal their connections when we know for sure they will not have one, so let's do it to lower the contention on the lock.	2020-07-01 10:33:39 +02:00
Willy Tarreau	69f591e3b0	MINOR: cli/proxy: add a new "show servers conn" command This command reuses the existing "show servers state" to also dump the state of active and idle connections. The main use is to serve as a debugging tool to troubleshoot connection reuse issues.	2020-07-01 10:32:54 +02:00
Willy Tarreau	df2a0305f2	BUG/MINOR: proxy: always initialize the trash in show servers state Actually the cleanup in commit 6ff8143f7 ("BUG/MINOR: proxy: fix dump_server_state()'s misuse of the trash") allowed to spot that the trash is never reset when dumping a servers state. I couldn't manage to make it dump garbage even with large setups but didn't find either where it's cleared between successive calls while other handlers do explicitly invoke chunk_reset(), so it seems to happen a bit by luck. Let's use chunk_printf() here for each turn, it makes things clearer. This could be backported along with previous patch, especially if any user reports occasional garbage appearing in the show servers output.	2020-07-01 07:11:14 +02:00
Willy Tarreau	6ff8143f7c	BUG/MINOR: proxy: fix dump_server_state()'s misuse of the trash dump_server_state() claims to dump into a buffer but instead it writes into a buffer then dumps the trash into the channel, so it only supports being called with buf=&trash and doesn't need this buffer. There doesn't seem to be any current impact of this mistake since the function is called from one location only. A backport may be performed if it helps fixing other bugs but it will not fix an existing bug by itself.	2020-07-01 07:02:42 +02:00
Dragan Dosen	2866acfb23	BUG/MEDIUM: log-format: fix possible endless loop in parse_logformat_string() This patch adds a missing break to end the loop in case when '%[' is not properly closed with ']'. The issue has been introduced with commit cd0d2ed ("MEDIUM: log-format: make the LF parser aware of sample expressions' end").	2020-07-01 06:30:50 +02:00
Christopher Faulet	b4cf7ab9bc	BUG/MEDIUM: pattern: Add a trailing \0 to match strings only if possible In pat_match_str() and pat_match_beg() functions, a trailing zero is systematically added at the end of the string, even if the buffer is not large enough to accommodate it. It is a possible buffer overflow. For instance, when the alpn is matched against a list of strings, the sample fetch is filled with a non-null terminated string returned by the SSL library. No trailing zero must be added at the end of this string, because it is outside the buffer. So, to fix the bug, a trailing zero is added only if the buffer is large enough to accommodate it. Otherwise, the sample fetch is duplicated. smp_dup() function adds a trailing zero to the duplicated string, truncating it if it is too long. This patch should fix the issue #718. It must be backported to all supported versions.	2020-06-30 19:16:47 +02:00
William Lallemand	5d03639ba6	DOC: ssl: add "allow-0rtt" and "ciphersuites" in crt-list Support for "allow-0rtt" and "ciphersuites" exists for crt-list. Fix issue #721. Should be backported as far as 1.8.	2020-06-30 16:15:44 +02:00
Willy Tarreau	daf8aa62a8	MINOR: pools: increase MAX_BASE_POOLS to 64 When not sharing pools (i.e. when building with -DDEBUG_DONT_SHARE_POOLS) we have about 47 pools right now, while MAX_BASE_POOLS is only 32, meaning that only the first 32 ones will benefit from a per-thread cache entry. This totally kills performance when pools are not shared (roughly -20%). Let's double the limit to gain some margin, and make it possible to set it as a build option. It might be useful to backport this to stable versions as they're likely to be affected as well.	2020-06-30 14:29:02 +02:00
Willy Tarreau	60814ffe81	MINOR: mux-fcgi: avoid taking the toremove_lock on dying tasks If the owning task is already dying (context was destroyed by fcgi_takeover) there's no point taking the lock then removing it later since all the code in between is conditioned by a non-null context. Let's simplify this.	2020-06-30 14:06:19 +02:00
Willy Tarreau	bd42e9257d	MINOR: mux-h2: avoid taking the toremove_lock on dying tasks If the owning task is already dying (context was destroyed by h2_takeover) there's no point taking the lock then removing it later since all the code in between is conditioned by a non-null context. Let's simplify this.	2020-06-30 14:06:19 +02:00
Willy Tarreau	68d4ee9e26	MINOR: mux-h1: avoid taking the toremove_lock on dying tasks If the owning task is already dying (context was destroyed by h1_takeover) there's no point taking the lock then removing it later since all the code in between is conditioned by a non-null context. Let's simplify this.	2020-06-30 14:06:19 +02:00
Willy Tarreau	1553b6657d	BUG/MINOR: sched: properly cover for a rare MT_LIST_ADDQ() race In commit 3ef7a190b ("MEDIUM: tasks: apply a fair CPU distribution between tasklet classes") we compute a total weight to be used to split the CPU time between queues. There is a mention that the total cannot be null, which is based on the fact that we only get there if thread_has_task() returns non-zero. But there is a very small race which can break this assumption: if two threads conflict on MT_LIST_ADDQ() on an empty shared list and both roll back before trying again, there is the possibility that a first call to MT_LIST_ISEMPTY() sees the first thread install itself, then the second call will see the list empty when both roll back. Thus we could proceed with the queue while it's temporarily empty and compute max lengths using a divide by zero. This case is very hard to trigger, it seldom happens on 16 threads at 400k req/s. Let's simply test for max_total and leave the loop when we've not found any work. No backport is needed, that's 2.2-only.	2020-06-30 14:06:19 +02:00
Christopher Faulet	9467f18d32	BUG/MINOR: http-rules: Fix ACLs parsing for http deny rules The parsing of http deny rules with no argument or only the deny_status argument is buggy if followed by an ACLs expression (starting with "if" or "unless" keyword). Instead of using the proxy errorfiles, a dummy error is used. To fix the bug, the parsing function must also check for "if" or "unless" keyword in such cases. This patch should fix the issue #720. No backport is needed.	2020-06-30 09:32:03 +02:00
Willy Tarreau	ddfe0743d8	MEDIUM: server: use the two thresholds for the connection release algorithm The algorithm improvement in bdb86bd ("MEDIUM: server: improve estimate of the need for idle connections") is still not enough because there's a hard limit between below and above the FD count, so it continues to end up with many killed connections. Here we're proceeding differently. Given that there are two configured limits, a low and a high one, what we do is that we drop connections when the high limit is reached (what's already done by the killing task anyway), when we're between the low and the high threshold, we only keep the connection if our idle entries are empty (with a preference for safe ones), and below the low threshold, we keep any connection so as to give them a chance of being reused or taken over by another thread. Proceeding like this results in much less dropped connections, we typically see a 99.3% reuse rate (76k conns for 10M requests over 200 servers and 4 threads, with 335k takeovers or 3%), and much less CPU usage variations because there are no more bursts to try to kill extra connections. It should be possible to further improve this by counting the number of threads exploiting a server and trying to optimize the amount of per-thread idle connections so that it is approximately balanced among the threads.	2020-06-29 21:54:38 +02:00
Willy Tarreau	e69282a03f	BUG/MINOR: server: always count one idle slot for current thread The idle server connection estimates brought in commit bdb86bd ("MEDIUM: server: improve estimate of the need for idle connections") were committed without the minimum of 1 idle conn needed for the current thread. The net effect is that there are bursts of dropped connections when the load varies because there's no provision for the last connection. No backport needed, this is 2.2-dev.	2020-06-29 21:54:38 +02:00
Willy Tarreau	369a2efc27	BUG/MINOR: haproxy: don't wake already stopping threads on exit Commit d645574 ("MINOR: soft-stop: let the first stopper only signal other threads") introduced a minor mistake which is that when a stopping thread signals all other threads, it also signals itself. When single-threaded, the process constantly wakes up while waiting for last connections to exit. Let's reintroduce the lost mask to avoid this. No backport is needed, this is 2.2-dev only.	2020-06-29 21:54:38 +02:00
Willy Tarreau	d59946e673	Revert "BUG/MEDIUM: lists: Lock the element while we check if it is in a list." This reverts previous commit 347bbf79d20e1cff57075a8a378355dfac2475e2. The original code was correct. This patch resulted from a mistaken analysis and breaks the scheduler: ########################## Starting vtest ########################## Testing with haproxy version: 2.2-dev11-90b7d9-23 # top TEST reg-tests/lua/close_wait_lf.vtc TIMED OUT (kill -9) # top TEST reg-tests/lua/close_wait_lf.vtc FAILED (10.008) signal=9 1 tests failed, 0 tests skipped, 88 tests passed Program terminated with signal SIGABRT, Aborted. [Current thread is 1 (Thread 0x7fb0dac2c700 (LWP 11292))] (gdb) bt #0 0x00007fb0e7c143f8 in raise () from /lib64/libc.so.6 #1 0x00007fb0e7c15ffa in abort () from /lib64/libc.so.6 #2 0x000000000053f5d6 in ha_panic () at src/debug.c:269 #3 0x00000000005a6248 in wdt_handler (sig=14, si=<optimized out>, arg=<optimized out>) at src/wdt.c:119 #4 <signal handler called> #5 0x00000000004fbccd in tasklet_wakeup (tl=0x1b5abc0) at include/haproxy/task.h:351 #6 listener_accept (fd=<optimized out>) at src/listener.c:999 #7 0x00000000004262df in fd_update_events (evts=<optimized out>, fd=6) at include/haproxy/fd.h:418 #8 _do_poll (p=<optimized out>, exp=<optimized out>, wake=<optimized out>) at src/ev_epoll.c:251 #9 0x0000000000548d0f in run_poll_loop () at src/haproxy.c:2949 #10 0x000000000054908b in run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:3067 #11 0x00007fb0e902b684 in start_thread () from /lib64/libpthread.so.0 #12 0x00007fb0e7ce5eed in clone () from /lib64/libc.so.6 (gdb) up #5 0x00000000004fbccd in tasklet_wakeup (tl=0x1b5abc0) at include/haproxy/task.h:351 351 if (MT_LIST_ADDQ(&task_per_thread[tl->tid].shared_tasklet_list, (struct mt_list *)&tl->list) == 1) { If the commit above is ever backported, this one must be as well!	2020-06-29 21:54:37 +02:00
Olivier Houchard	347bbf79d2	BUG/MEDIUM: lists: Lock the element while we check if it is in a list. In MT_LIST_ADDQ() and MT_LIST_ADD() we can't just check if the element is already in a list, because there's a small race condition, it could be added between the time we checked, and the time we actually set its next and prev. So we have to lock it first. This should be backported to 2.1.	2020-06-29 19:59:06 +02:00
Olivier Houchard	f21695bd8b	BUG/MINOR: threads: Don't forget to init each thread toremove_lock. Don't forget to use HA_SPIN_INIT() on each toremove_lock, or DEBUG_THREAD may not work reliably with it. This should be backported to 2.1 and 2.0.	2020-06-29 17:50:29 +02:00


12605 Commits