BUG/MEDIUM: wdt: fix wrong thread being checked for sleeping
In 2.7, the method used to check for a sleeping thread changed with commit e7475c8e7 ("MEDIUM: tasks/fd: replace sleeping_thread_mask with a TH_FL_SLEEPING flag"). Previously there was a global sleeping mask and now there is a flag per thread. The commit above partially broke the watchdog by looking at the current thread's flags via th_ctx instead of the reported thread's flags, and using an AND condition instead of an OR to update and leave.

This can cause a wrong thread to be killed when the load is uneven. For example, when enabling busy polling and sending traffic over a single connection, all threads have their run time grow, and if the one receiving the signal is also processing some traffic, it will not match the sleeping/harmless condition and will set the stuck flag, then die upon next invocation. While it's reproducible in tests, it's unlikely to be met in the field.

This fix should be backported to 2.7.
parent 91fe0bc77a
commit 5405c9cdf3
@@ -83,7 +83,7 @@ void wdt_handler(int sig, siginfo_t *si, void *arg)
 		if (!p || n - p < 1000000000UL)
 			goto update_and_leave;
 
-		if ((_HA_ATOMIC_LOAD(&th_ctx->flags) & TH_FL_SLEEPING) &&
+		if ((_HA_ATOMIC_LOAD(&ha_thread_ctx[thr].flags) & TH_FL_SLEEPING) ||
 		    (_HA_ATOMIC_LOAD(&ha_tgroup_ctx[tgrp-1].threads_harmless) & thr_bit)) {
 			/* This thread is currently doing exactly nothing
 			 * waiting in the poll loop (unlikely but possible),
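The effect of the one-line change above can be illustrated with a small standalone model. This is only a sketch under stated assumptions: every `model_*` and `MODEL_*` name is hypothetical, the structures are simplified stand-ins rather than HAProxy's real thread context or thread group definitions, and atomics are omitted for brevity. It contrasts a check that reads the handling thread's flags and ANDs the two conditions with one that reads the reported thread's flags and ORs them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration only; these are NOT HAProxy's
 * real thread context, flag, or thread group definitions.
 */
#define MODEL_MAX_THREADS 4
#define MODEL_FL_SLEEPING 0x1u

struct model_thread_ctx {
    unsigned int flags;             /* per-thread flags */
};

static struct model_thread_ctx model_ctx[MODEL_MAX_THREADS];
static uint32_t model_harmless_mask; /* one bit per harmless thread */

/* Pre-fix shape: reads the flags of the thread running the handler
 * ("cur") and requires BOTH conditions before skipping the report.
 */
static bool model_skip_before_fix(int cur, int thr)
{
    assert(cur >= 0 && cur < MODEL_MAX_THREADS);
    assert(thr >= 0 && thr < MODEL_MAX_THREADS);
    return (model_ctx[cur].flags & MODEL_FL_SLEEPING) &&
           (model_harmless_mask & (1u << thr));
}

/* Post-fix shape: reads the flags of the reported thread ("thr") and
 * accepts EITHER condition as evidence that it is not stuck.
 */
static bool model_skip_after_fix(int thr)
{
    assert(thr >= 0 && thr < MODEL_MAX_THREADS);
    return (model_ctx[thr].flags & MODEL_FL_SLEEPING) ||
           (model_harmless_mask & (1u << thr));
}

/* Uneven load: thread 0 handles the signal while busy, thread 1 is the
 * reported thread and is sleeping in the poller; nobody is harmless.
 */
static void model_setup_uneven_load(void)
{
    model_ctx[0].flags = 0;
    model_ctx[1].flags = MODEL_FL_SLEEPING;
    model_harmless_mask = 0;
}
```

In this model, after `model_setup_uneven_load()`, `model_skip_before_fix(0, 1)` returns false, so the sleeping thread 1 would go on to be flagged as stuck, while `model_skip_after_fix(1)` returns true and the handler would update and leave.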