MINOR: clock: do not update the global date too often
Tests with forced wakeups on a 24c/48t machine showed that we're capping at 7.3M loops/s, which means 6.6 microseconds of loop delay without having anything to do. This is caused by two factors:

- the load and update of the now_offset variable
- the update of the global_now variable

What is happening is that threads are not running within the one-microsecond time precision provided by gettimeofday(), so each thread waking up sees a slightly different date and causes undesired updates to global_now. Worse, these undesired updates mean that we then have to adjust now_offset to match, which adds significant noise to this variable, which in turn needs to be updated upon each call.

By accepting slightly less precision we can completely eliminate that contention. Here we're ignoring the 5 lowest bits of the usec part, meaning that the global_now variable may be off by up to 31 us (16 on average). The variable is only used to correct the time drift some threads might be observing in environments where CPU clocks are not synchronized, and it's used by freq counters. In both cases we don't need that level of precision, and even one millisecond would be perfectly fine. We're just 30 times better at almost no cost, since the global_now and now_offset variables now only need to be updated 30000 times a second in the worst case, which is unnoticeable.

After this change, the wakeup rate jumped from 7.3M/s to 66M/s, meaning that the loop delay went from 6.6us to 0.73us, a 9x improvement when under load! With real tasks we're seeing a boost from 28M to 52M wakeups/s. The clock_update_global_date() function now only takes 1.6%, which is good enough that we don't need to go further.
This commit is contained in:
parent 58b73f9fa8
commit 4eaf85f5d9
src/clock.c (24 changed lines)
@@ -199,11 +199,9 @@ void clock_update_global_date()
 	uint old_now_ms;
 	ullong old_now;
 	ullong new_now;
-	ullong ofs, ofs_new;
+	ullong ofs_new;
 	uint sec_ofs, usec_ofs;
 
-	ofs = HA_ATOMIC_LOAD(&now_offset);
-
 	/* now that we have bounded the local time, let's check if it's
 	 * realistic regarding the global date, which only moves forward,
 	 * otherwise catch up.
@@ -219,15 +217,28 @@ void clock_update_global_date()
 			now = tmp_now;
 
 		/* now <now> is expected to be the most accurate date,
-		 * equal to <global_now> or newer.
+		 * equal to <global_now> or newer. Updating the global
+		 * date too often causes extreme contention and is not
+		 * needed: it's only used to help threads run at the
+		 * same date in case of local drift, and the global date,
+		 * which changes, is only used by freq counters (a choice
+		 * which is debatable by the way since it changes under us).
+		 * Tests have seen that the contention can be reduced from
+		 * 37% in this function to almost 0% when keeping clocks
+		 * synchronized no better than 32 microseconds, so that's
+		 * what we're doing here.
 		 */
+
 		new_now = ((ullong)now.tv_sec << 32) + (uint)now.tv_usec;
 		now_ms = __tv_to_ms(&now);
 
+		if (!((new_now ^ old_now) & ~0x1FULL))
+			return;
+
 		/* let's try to update the global <now> (both in timeval
 		 * and ms forms) or loop again.
 		 */
-	} while (((new_now != old_now && !_HA_ATOMIC_CAS(&global_now, &old_now, new_now)) ||
+	} while ((!_HA_ATOMIC_CAS(&global_now, &old_now, new_now) ||
 		  (now_ms != old_now_ms && !_HA_ATOMIC_CAS(&global_now_ms, &old_now_ms, now_ms))) &&
 		 __ha_cpu_relax());
 
@@ -244,8 +255,7 @@ void clock_update_global_date()
 		sec_ofs -= 1;
 	}
 	ofs_new = ((ullong)sec_ofs << 32) + usec_ofs;
-	if (ofs_new != ofs)
-		HA_ATOMIC_STORE(&now_offset, ofs_new);
+	HA_ATOMIC_STORE(&now_offset, ofs_new);
 }
 
 /* must be called once at boot to initialize some global variables */
|
Loading…
Reference in New Issue
Block a user