mirror of https://github.com/systemd/systemd.git synced 2025-01-04 09:18:12 +03:00

shared/watchdog: ratelimit the number of attempts to open watchdog

We need to retry opening the watchdog, because the device becomes available
asynchronously.

The watchdog is opened in two places:

- in pid1, in the main loop. The loop has a ratelimit, but during boot we
  iterate through it fairly quickly. On my test VM with 'iTCO_wdt', version 2:

    $ journalctl -b --grep 'Failed to open any watchdog'|wc -l
    3398

  After the device has been processed by udev, it is initialized successfully.

- in shutdown. In that case, we most likely don't need to try more than once,
  because we mostly care about the case where the watchdog device was present
  and configured previously. But in principle it is possible that we might
  attempt shutdown while the machine was initializing, so we don't want to
  disable retries. Nevertheless, watchdog_ping() is called from a loop that
  might be fairly tight, so we could end up trying to reopen the device fairly
  often. This probably doesn't matter *too* much, but it's still ugly to try to
  open the device without any ratelimit.

Usually the watchdog timeout would be set to something like 30 s or a few
minutes. OTOH, on my VM, the device becomes available at 4.35 s after boot. So
let's use 5 s, a value that is much smaller than any normal watchdog timeout,
but large enough that we can expect no more than 1-2 retries during a normal
boot.
Zbigniew Jędrzejewski-Szmek 2024-12-20 19:14:51 +01:00
parent f34a3a9b84
commit 0ace9335e1


@@ -311,12 +311,16 @@ static int watchdog_update_timeout(void) {
}

static int watchdog_open(void) {
        static RateLimit watchdog_open_ratelimit = { 5 * USEC_PER_SEC, 1 };
        struct watchdog_info ident;
        char **try_order;
        int r;

        assert(watchdog_fd < 0);

        if (!ratelimit_below(&watchdog_open_ratelimit))
                return -EWOULDBLOCK;

        /* Let's prefer new-style /dev/watchdog0 (i.e. kernel 3.5+) over classic /dev/watchdog. The former
         * has the benefit that we can easily find the matching directory in sysfs from it, as the relevant
         * sysfs attributes can only be found via /sys/dev/char/<major>:<minor> if the new-style device