cd4699c5fd
The tasklist_lock popped up as a scalability bottleneck on some testing workloads. The readlocks in do_prlimit and set/getpriority are not necessary in all cases. Based on a cycles profile, it looked like ~87% of the time was spent in the kernel, ~42% of which was just trying to get *some* spinlock (queued_spin_lock_slowpath, not necessarily the tasklist_lock). The big offenders (with rough percentages in cycles of the overall trace): - do_wait 11% - setpriority 8% (this patchset) - kill 8% - do_exit 5% - clone 3% - prlimit64 2% (this patchset) - getrlimit 1% (this patchset) I can't easily test this patchset on the original workload for various reasons. Instead, I used the microbenchmark below to at least verify there was some improvement. This patchset had a 28% speedup (12% from baseline to set/getprio, then another 14% for prlimit). One interesting thing is that my libc's getrlimit() was calling prlimit64, so hoisting the read_lock(tasklist_lock) into sys_prlimit64 had no effect - it essentially optimized the older syscalls only. I didn't do that in this patchset, but figured I'd mention it since it was an option from the previous patch's discussion. v3: https://lkml.kernel.org/r/20220106172041.522167-1-brho@google.com v2: https://lore.kernel.org/lkml/20220105212828.197013-1-brho@google.com/ - update_rlimit_cpu on the group_leader instead of for_each_thread. - update_rlimit_cpu still returns 0 or -ESRCH, even though we don't care about the error here. it felt safer that way in case someone uses that function again. v1: https://lore.kernel.org/lkml/20211213220401.1039578-1-brho@google.com/ int main(int argc, char **argv) { pid_t child; struct rlimit rlim[1]; fork(); fork(); fork(); fork(); fork(); fork(); for (int i = 0; i < 5000; i++) { child = fork(); if (child < 0) exit(1); if (child > 0) { usleep(1000); kill(child, SIGTERM); waitpid(child, NULL, 0); } else { for (;;) { setpriority(PRIO_PROCESS, 0, getpriority(PRIO_PROCESS, 0)); getrlimit(RLIMIT_CPU, rlim); } } } return 0; } Barret Rhoden (3): setpriority: only grab the tasklist_lock for PRIO_PGRP prlimit: make do_prlimit() static prlimit: do not grab the tasklist_lock include/linux/posix-timers.h | 2 +- include/linux/resource.h | 2 - kernel/sys.c | 127 +++++++++++++++++---------------- kernel/time/posix-cpu-timers.c | 12 +++- 4 files changed, 76 insertions(+), 67 deletions(-) I have dropped the first change in this series as an almost identical change was merged as commit |
||
---|---|---|
.. | ||
alarmtimer.c | ||
clockevents.c | ||
clocksource-wdtest.c | ||
clocksource.c | ||
hrtimer.c | ||
itimer.c | ||
jiffies.c | ||
Kconfig | ||
Makefile | ||
namespace.c | ||
ntp_internal.h | ||
ntp.c | ||
posix-clock.c | ||
posix-cpu-timers.c | ||
posix-stubs.c | ||
posix-timers.c | ||
posix-timers.h | ||
sched_clock.c | ||
test_udelay.c | ||
tick-broadcast-hrtimer.c | ||
tick-broadcast.c | ||
tick-common.c | ||
tick-internal.h | ||
tick-legacy.c | ||
tick-oneshot.c | ||
tick-sched.c | ||
tick-sched.h | ||
time_test.c | ||
time.c | ||
timeconst.bc | ||
timeconv.c | ||
timecounter.c | ||
timekeeping_debug.c | ||
timekeeping_internal.h | ||
timekeeping.c | ||
timekeeping.h | ||
timer_list.c | ||
timer.c | ||
vsyscall.c |