MINOR: clock: do not update the global date too often

Tests with forced wakeups on a 24c/48t machine showed that we're capping
at 7.3M loops/s, which means 6.6 microseconds of loop delay without
having anything to do.

This is caused by two factors:
  - the load and update of the now_offset variable
  - the update of the global_now variable

What is happening is that threads are not running within the one-
microsecond time precision provided by gettimeofday(), so each thread
waking up sees a slightly different date and causes undesired updates
to global_now. But worse, these undesired updates mean that we then
have to adjust now_offset to match, and this adds significant noise
to this variable, which then needs to be updated upon each call.
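As a rough illustration of the mechanism described above, here is a
minimal sketch using C11 atomics rather than the project's own
HA_ATOMIC_* macros. The names shared_date and publish_date() are
hypothetical and do not exist in the code base. It only shows that when
every thread publishes a date that differs by as little as one
microsecond, each of them ends up writing to the shared variable:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared date, standing in for a global date that only
 * moves forward. Zero-initialized since it has static storage.
 */
static _Atomic uint64_t shared_date;

/* Tries to advance the shared date to <local>. Returns 1 if this call
 * wrote to the shared variable, 0 if the shared date was already equal
 * or newer. Every call that returns 1 dirties the cache line for all
 * other threads, which is where the contention comes from.
 */
static int publish_date(uint64_t local)
{
	uint64_t old = atomic_load(&shared_date);

	while (local > old) {
		/* on failure, <old> is reloaded with the current value */
		if (atomic_compare_exchange_weak(&shared_date, &old, local))
			return 1;
	}
	return 0;
}
```

With this sketch, two threads observing dates 100 and 101 both perform
a write, while a thread that is late (or sees the same date) does not.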

By only allowing slightly less precision we can completely eliminate
that contention. Here we're ignoring the 5 lowest bits of the usec
part, meaning that the global_now variable may be off by up to 31 us
(16 on avg). The variable is only used to correct the time drift some
threads might be observing in environments where CPU clocks are not
synchronized, and it's used by freq counters. In both cases we don't
need that level of precision and even one millisecond would be pretty
fine. We're still about 30 times more precise than that at almost no
cost, since the global_now and now_offset variables now only need to be
updated about 30000 times a second in the worst case, which is
unnoticeable.
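
The comparison that ignores the 5 lowest bits can be tried in isolation
with the sketch below. The helper names pack_date() and same_bucket()
are made up for the example; only the packing layout and the mask come
from the patch itself:

```c
#include <stdint.h>

/* Packs a date with seconds in the upper 32 bits and microseconds in
 * the lower 32 bits, like new_now in the patch.
 */
static inline uint64_t pack_date(uint32_t sec, uint32_t usec)
{
	return ((uint64_t)sec << 32) + usec;
}

/* Returns 1 when <a> and <b> are identical once the 5 lowest bits of
 * the usec part are ignored, i.e. when both fall into the same aligned
 * 32-microsecond bucket, and 0 otherwise.
 */
static inline int same_bucket(uint64_t a, uint64_t b)
{
	return !((a ^ b) & ~0x1FULL);
}
```

Note that the buckets are aligned on multiples of 32 us and are not a
sliding window: usec values 0 and 31 compare as equal, while 31 and 32
do not even though they are only one microsecond apart. With 1000000/32
buckets per second, this gives at most 31250 distinct values per
second, which matches the "30000 times a second" order of magnitude.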

After this change, the wakeup rate jumped from 7.3M/s to 66M/s, meaning
that the loop delay went from 6.6us to 0.73us, that's a 9x improvement
when under load! With real tasks we're seeing a boost from 28M to 52M
wakeups/s. The clock_update_global_date() function now only takes
1.6%, which is good enough that we don't need to go further.
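
The loop delay figures above are consistent with dividing the number of
threads by the aggregate loop rate. This is an assumption about how
they were derived, not something measured here, and loop_delay_us() is
a hypothetical helper written only to check the arithmetic:

```c
/* Average delay between two loops of a single thread, in microseconds,
 * assuming <loops_per_sec> is the aggregate rate over all <threads>.
 */
static inline double loop_delay_us(unsigned int threads, double loops_per_sec)
{
	return threads * 1e6 / loops_per_sec;
}
```

For 48 threads this gives about 6.6 us at 7.3M loops/s and about
0.73 us at 66M loops/s, hence the 9x ratio.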
Willy Tarreau 2022-09-21 08:21:45 +02:00
parent 58b73f9fa8
commit 4eaf85f5d9

@@ -199,11 +199,9 @@ void clock_update_global_date()
 	uint old_now_ms;
 	ullong old_now;
 	ullong new_now;
-	ullong ofs, ofs_new;
+	ullong ofs_new;
 	uint sec_ofs, usec_ofs;
-	ofs = HA_ATOMIC_LOAD(&now_offset);
 	/* now that we have bounded the local time, let's check if it's
 	 * realistic regarding the global date, which only moves forward,
 	 * otherwise catch up.
@@ -219,15 +217,28 @@ void clock_update_global_date()
 			now = tmp_now;
 		/* now <now> is expected to be the most accurate date,
-		 * equal to <global_now> or newer.
+		 * equal to <global_now> or newer. Updating the global
+		 * date too often causes extreme contention and is not
+		 * needed: it's only used to help threads run at the
+		 * same date in case of local drift, and the global date,
+		 * which changes, is only used by freq counters (a choice
+		 * which is debatable by the way since it changes under us).
+		 * Tests have seen that the contention can be reduced from
+		 * 37% in this function to almost 0% when keeping clocks
+		 * synchronized no better than 32 microseconds, so that's
+		 * what we're doing here.
 		 */
 		new_now = ((ullong)now.tv_sec << 32) + (uint)now.tv_usec;
 		now_ms = __tv_to_ms(&now);
+		if (!((new_now ^ old_now) & ~0x1FULL))
+			return;
 		/* let's try to update the global <now> (both in timeval
 		 * and ms forms) or loop again.
 		 */
-	} while (((new_now != old_now && !_HA_ATOMIC_CAS(&global_now, &old_now, new_now)) ||
+	} while ((!_HA_ATOMIC_CAS(&global_now, &old_now, new_now) ||
 		  (now_ms != old_now_ms && !_HA_ATOMIC_CAS(&global_now_ms, &old_now_ms, now_ms))) &&
 		 __ha_cpu_relax());
@@ -244,8 +255,7 @@ void clock_update_global_date()
 		sec_ofs -= 1;
 	}
 	ofs_new = ((ullong)sec_ofs << 32) + usec_ofs;
-	if (ofs_new != ofs)
-		HA_ATOMIC_STORE(&now_offset, ofs_new);
+	HA_ATOMIC_STORE(&now_offset, ofs_new);
 }
 /* must be called once at boot to initialize some global variables */