linux

iv/linux


Patrick Bellasi c469933e77 sched/fair: Fix cpu_util_wake() for 'execl' type workloads

A ~10% regression has been reported for UnixBench's execl throughput test by Aaron Lu and Ye Xiaolong:

    https://lkml.org/lkml/2018/10/30/765

That test is pretty simple, it does a "recursive" execve() syscall on the same binary. Starting from the syscall, this sequence is possible:

    do_execve()
      do_execveat_common()
        __do_execve_file()
          sched_exec()
            select_task_rq_fair()          <==| Task already enqueued
              find_idlest_cpu()
                find_idlest_group()
                  capacity_spare_wake()    <==| Functions not called from
                    cpu_util_wake()           | the wakeup path

which means we can end up calling cpu_util_wake() not only from the "wakeup path", as its name would suggest. Indeed, the task doing an execve() syscall is already enqueued on the CPU we want to get the cpu_util_wake() for.

The estimated utilization for a CPU computed in cpu_util_wake() was written under the assumption that function can be called only from the wakeup path. If instead the task is already enqueued, we end up with a utilization which does not remove the current task's contribution from the estimated utilization of the CPU. This will wrongly assume a reduced spare capacity on the current CPU and increase the chances to migrate the task on execve.

The regression is tracked down to:

    commit `d519329f72` ("sched/fair: Update util_est only on util_avg updates")

because in that patch we turn on by default the UTIL_EST sched feature. However, the real issue is introduced by:

    commit `f9be3e5961` ("sched/fair: Use util_est in LB and WU paths")

Let's fix this by ensuring to always discount the task estimated utilization from the CPU's estimated utilization when the task is also the current one.

The same benchmark of the bug report, executed on a dual socket 40 CPUs Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz machine, reports these "Execl Throughput" figures (higher the better):

    mainline     : 48136.5 lps
    mainline+fix : 55376.5 lps

which correspond to a 15% speedup.

Moreover, since {cpu_util,capacity_spare}_wake() are not really only used from the wakeup path, let's remove this ambiguity by using a better matching name: {cpu_util,capacity_spare}_without().

Since we are at that, let's also improve the existing documentation.

Reported-by: Aaron Lu <aaron.lu@intel.com>
Reported-by: Ye Xiaolong <xiaolong.ye@intel.com>
Tested-by: Aaron Lu <aaron.lu@intel.com>
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Perret <quentin.perret@arm.com>
Cc: Steve Muckle <smuckle@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Todd Kjos <tkjos@google.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Fixes: `f9be3e5961` (sched/fair: Use util_est in LB and WU paths)
Link: https://lore.kernel.org/lkml/20181025093100.GB13236@e110439-lin/
Signed-off-by: Ingo Molnar <mingo@kernel.org>

2018-11-12 05:00:46 +01:00
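The effect described in the commit message can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the code in fair.c: the `toy_cpu` and `toy_task` structs, the `enqueued_here` flag, and all function names are hypothetical. Taking the larger of the average-based and estimate-based values is also a simplifying assumption about how the two signals are combined.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for per-CPU and per-task signals. */
struct toy_cpu {
	unsigned long util_avg;          /* running average utilization */
	unsigned long util_est_enqueued; /* summed estimates of enqueued tasks */
	unsigned long capacity_orig;     /* maximum capacity of the CPU */
};

struct toy_task {
	unsigned long util_avg;
	unsigned long util_est;
	bool enqueued_here; /* task is queued or running on this CPU (execve case) */
};

/* Subtract without wrapping below zero. */
unsigned long sub_clamped(unsigned long a, unsigned long b)
{
	return a > b ? a - b : 0;
}

/* CPU utilization with the contribution of task p removed (toy version). */
unsigned long toy_cpu_util_without(const struct toy_cpu *cpu,
				   const struct toy_task *p)
{
	unsigned long util = sub_clamped(cpu->util_avg, p->util_avg);
	unsigned long est = cpu->util_est_enqueued;

	/*
	 * The point of the fix: when the task is already enqueued on this
	 * CPU, its estimate is part of the CPU's estimate and has to be
	 * discounted as well.
	 */
	if (p->enqueued_here)
		est = sub_clamped(est, p->util_est);

	/* Assumed combination rule: use the larger of the two signals. */
	if (est > util)
		util = est;

	/* Never report more than the CPU can provide. */
	if (util > cpu->capacity_orig)
		util = cpu->capacity_orig;

	return util;
}

/* Spare capacity left on the CPU once task p is discounted (toy version). */
unsigned long toy_capacity_spare_without(const struct toy_cpu *cpu,
					 const struct toy_task *p)
{
	return sub_clamped(cpu->capacity_orig, toy_cpu_util_without(cpu, p));
}
```

With made-up numbers (CPU average and estimate of 400, capacity 1024, task contribution 300), skipping the estimate discount leaves the utilization at 400. Applying it yields 100, so the current CPU shows more spare capacity and migrating the task on execve looks less attractive.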
autogroup.c	sched/autogroup: Fix possible Spectre-v1 indexing for sched_prio_to_weight[]	2018-05-05 08:34:42 +02:00
autogroup.h	sched/headers: Simplify and clean up header usage in the scheduler	2018-03-04 12:39:29 +01:00
clock.c	sched/clock: Disable interrupts when calling generic_sched_clock_init()	2018-07-30 19:33:35 +02:00
completion.c	sched/Documentation: Update wake_up() & co. memory-barrier guarantees	2018-07-17 09:30:34 +02:00
core.c	sched/core: Take the hotplug lock in sched_init_smp()	2018-11-04 00:57:44 +01:00
cpuacct.c	sched/headers: Simplify and clean up header usage in the scheduler	2018-03-04 12:39:29 +01:00
cpudeadline.c	sched/headers: Simplify and clean up header usage in the scheduler	2018-03-04 12:39:29 +01:00
cpudeadline.h	sched/headers: Simplify and clean up header usage in the scheduler	2018-03-04 12:39:29 +01:00
cpufreq_schedutil.c	sched/fair: Remove #ifdefs from scale_rt_capacity()	2018-07-25 11:41:05 +02:00
cpufreq.c	sched/headers: Simplify and clean up header usage in the scheduler	2018-03-04 12:39:29 +01:00
cpupri.c	sched/headers: Simplify and clean up header usage in the scheduler	2018-03-04 12:39:29 +01:00
cpupri.h	sched/headers: Simplify and clean up header usage in the scheduler	2018-03-04 12:39:29 +01:00
cputime.c	sched/headers: Simplify and clean up header usage in the scheduler	2018-03-04 12:39:29 +01:00
deadline.c	sched/numa: Pass destination CPU as a parameter to migrate_task_rq	2018-10-02 09:42:21 +02:00
debug.c	sched/debug: Fix potential deadlock when writing to sched_features	2018-09-10 10:13:45 +02:00
fair.c	sched/fair: Fix cpu_util_wake() for 'execl' type workloads	2018-11-12 05:00:46 +01:00
features.h	sched/fair: Disable LB_BIAS by default	2018-10-02 09:45:01 +02:00
idle.c	x86/stackprotector: Remove the call to boot_init_stack_canary() from cpu_startup_entry()	2018-10-22 04:07:24 +02:00
isolation.c	sched/headers: Simplify and clean up header usage in the scheduler	2018-03-04 12:39:29 +01:00
loadavg.c	sched: loadavg: make calc_load_n() public	2018-10-26 16:26:32 -07:00
Makefile	psi: pressure stall information for CPU, memory, and IO	2018-10-26 16:26:32 -07:00
membarrier.c	sched/headers: Simplify and clean up header usage in the scheduler	2018-03-04 12:39:29 +01:00
pelt.c	sched/fair: Remove setting task's se->runnable_weight during PELT update	2018-10-02 09:45:03 +02:00
pelt.h	sched/pelt: Fix warning and clean up IRQ PELT config	2018-10-02 09:45:00 +02:00
psi.c	psi: cgroup support	2018-10-26 16:26:32 -07:00
rt.c	sched/rt: Update comment in pick_next_task_rt()	2018-10-29 07:18:04 +01:00
sched-pelt.h	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
sched.h	psi: pressure stall information for CPU, memory, and IO	2018-10-26 16:26:32 -07:00
stats.c	proc: introduce proc_create_seq{,_data}	2018-05-16 07:23:35 +02:00
stats.h	psi: pressure stall information for CPU, memory, and IO	2018-10-26 16:26:32 -07:00
stop_task.c	sched: Clean up and harmonize the coding style of the scheduler code base	2018-03-03 15:50:21 +01:00
swait.c	sched/swait: Rename to exclusive	2018-06-20 11:35:56 +02:00
topology.c	sched/topology: Fix off by one bug	2018-11-04 00:40:03 +01:00
wait_bit.c	sched/wait: Improve __var_waitqueue() code generation	2018-03-20 08:23:25 +01:00
wait.c	sched/wait: assert the wait_queue_head lock is held in __wake_up_common	2018-08-22 10:52:47 -07:00