MEDIUM: clock: force internal time to wrap early after boot

GH issue #2034 clearly indicates yet another case of time roll-over
that went badly. Issues that happen only once every 50 days are hard
to detect and debug, and tend to be reported by multiple sources at
roughly the same time. This patch finally does what had long been
planned but never done, which is to force the time to wrap early after
boot so that any remaining issue of this kind can be spotted sooner.
The margin delay here is 20s (it may be changed by setting
BOOT_TIME_WRAP_SEC to another value). This seems sufficient to let
failed health checks succeed, and to let traffic come in and possibly
start updating some timestamps (accept dates in logs, freq counters,
stick-table expiration dates, etc.).

Having this in 2.7 could theoretically be helpful as well, but as the
two patches below show, we've already had incorrect uses of the
internal monotonic time where the wall-clock time was needed, and we
can expect to detect other ones in the future. Note that this will
*not* induce bugs; it will only make existing ones show up much faster
(i.e. no need to wait 50 days to see them). If it were eventually
backported, these two previous patches would also have to be
backported:

    BUG/MINOR: clock: use distinct wall-clock and monotonic start dates
    BUG/MEDIUM: cache: use the correct time reference when comparing dates
Willy Tarreau 2023-02-07 14:44:44 +01:00
parent 9b5d57dfd5
commit 28360dc53f
2 changed files with 14 additions and 0 deletions

@@ -202,6 +202,10 @@
 #define TV_ETERNITY_MS (-1)
 #endif
 
+/* delay between boot and first time wrap, in seconds */
+#ifndef BOOT_TIME_WRAP_SEC
+#define BOOT_TIME_WRAP_SEC 20
+#endif
 /* we want to be able to detect time jumps. Fix the maximum wait time to a low
  * value so that we know the time has changed if we wait longer.
  */

@@ -267,6 +267,16 @@ void clock_init_process_date(void)
 	now = after_poll = before_poll = date;
 	global_now = ((ullong)date.tv_sec << 32) + (uint)date.tv_usec;
 	global_now_ms = now.tv_sec * 1000 + now.tv_usec / 1000;
+
+	/* force time to wrap 20s after boot: we first compute the time offset
+	 * that once applied to the wall-clock date will make the local time
+	 * wrap in BOOT_TIME_WRAP_SEC seconds. This offset is applied to the
+	 * process-wide time, and will be used to recompute the local time,
+	 * both of which will match and continue from this shifted date.
+	 */
+	now_offset = (uint64_t)(-(global_now_ms / 1000U) - BOOT_TIME_WRAP_SEC) << 32;
+	global_now += now_offset;
+
 	th_ctx->idle_pct = 100;
 	clock_update_date(0, 1);
 }