linux

iv/linux


Peter Zijlstra dbfb089d36 sched: Fix loadavg accounting race

The recent commit:

  `c6e7bd7afa` ("sched/core: Optimize ttwu() spinning on p->on_cpu")

moved these lines in ttwu():

  p->sched_contributes_to_load = !!task_contributes_to_load(p);
  p->state = TASK_WAKING;

up before:

  smp_cond_load_acquire(&p->on_cpu, !VAL);

into the 'p->on_rq == 0' block, with the thinking that once we hit schedule() the current task cannot change its ->state anymore. And while this is true, the change is both incorrect and flawed.

It is incorrect in that we need at least an ACQUIRE on 'p->on_rq == 0' to keep weakly ordered hardware from reordering things for us. This can fairly easily be achieved by relying on the control dependency already in place.

The second problem, which is the flaw in the original argument, is that while schedule() will not change prev->state, it will read it a number of times (arguably too many times, since it's marked volatile). The previous condition 'p->on_cpu == 0' was sufficient because it indicates that schedule() has completed and will no longer read prev->state. So now the trick is to make the same true for the (much) earlier 'prev->on_rq == 0' case.

Furthermore, in order to make the ordering stick, the 'prev->on_rq = 0' assignment needs to be a RELEASE, but adding additional ordering to schedule() is an unwelcome proposition at the best of times, doubly so for mere accounting.

Luckily we can push the prev->state load up before rq->lock, with the only caveat that we then have to re-read the state after. However, we know that if it changed, we no longer have to worry about the blocking path. This gives us the required ordering: if we block, we did the prev->state load before an (effective) smp_mb(), and the p->on_rq store need not change.

With this we end up with the effective ordering:

  LOAD p->state           LOAD-ACQUIRE p->on_rq == 0
  MB
  STORE p->on_rq, 0       STORE p->state, TASK_WAKING

which ensures the TASK_WAKING store happens after the prev->state load, and all is well again.
Fixes: `c6e7bd7afa` ("sched/core: Optimize ttwu() spinning on p->on_cpu")
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Dave Jones <davej@codemonkey.org.uk>
Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Link: https://lkml.kernel.org/r/20200707102957.GN117543@hirez.programming.kicks-ass.net

2020-07-08 11:38:49 +02:00
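The ordering argument in the commit message can be exercised with a small userspace model. This is a sketch under stated assumptions, not kernel code: it uses C11 atomics and POSIX threads, and `struct fake_task`, `sleeper()`, `waker()`, `count_bad_observations()` and the numeric state values are hypothetical. `atomic_thread_fence(memory_order_seq_cst)` stands in for the effective smp_mb() that schedule() gets around rq->lock. C11 has no portable way to express the control-dependency ordering the kernel relies on, so the waker uses an explicit acquire load instead. The sleeper loads `state` before the fence and then clears `on_rq`. The waker spins on an acquire load of `on_rq` and only then stores `TASK_WAKING`. Under the C11 model the fence synchronizes with the acquire load that reads 0, so the sleeper's load happens-before the waker's store and can never observe `TASK_WAKING`.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical state values; they do not match the kernel's. */
enum { TASK_INTERRUPTIBLE = 1, TASK_WAKING = 2 };

/* Hypothetical stand-in for the two task_struct fields discussed. */
struct fake_task {
    atomic_int state;  /* models p->state */
    atomic_int on_rq;  /* models p->on_rq */
    int seen_state;    /* value the "schedule()" side loaded from state */
};

/* Blocking side of schedule(): LOAD state, full barrier, STORE on_rq = 0. */
static void *sleeper(void *arg)
{
    struct fake_task *p = arg;

    p->seen_state = atomic_load_explicit(&p->state, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst); /* stands in for smp_mb() */
    atomic_store_explicit(&p->on_rq, 0, memory_order_relaxed);
    return NULL;
}

/* ttwu() side: LOAD-ACQUIRE on_rq == 0, then STORE state = TASK_WAKING. */
static void *waker(void *arg)
{
    struct fake_task *p = arg;

    while (atomic_load_explicit(&p->on_rq, memory_order_acquire) != 0)
        ; /* spin until the sleeper has cleared on_rq */
    atomic_store_explicit(&p->state, TASK_WAKING, memory_order_relaxed);
    return NULL;
}

/* Runs the race `trials` times; returns how often the sleeper saw TASK_WAKING. */
int count_bad_observations(int trials)
{
    int bad = 0;

    for (int i = 0; i < trials; i++) {
        struct fake_task t;
        pthread_t s, w;
        int rc;

        atomic_init(&t.state, TASK_INTERRUPTIBLE);
        atomic_init(&t.on_rq, 1);
        t.seen_state = -1;

        rc = pthread_create(&w, NULL, waker, &t);
        assert(rc == 0);
        rc = pthread_create(&s, NULL, sleeper, &t);
        assert(rc == 0);
        (void)rc;

        /* pthread_join() orders the threads' writes before the read below. */
        pthread_join(s, NULL);
        pthread_join(w, NULL);

        if (t.seen_state == TASK_WAKING)
            bad++;
    }
    return bad;
}
```

Calling `count_bad_observations(1000)` should return 0 on any conforming implementation (build with `-pthread`). If both the fence and the acquire are weakened to relaxed, the C11 model no longer rules out the sleeper reading `TASK_WAKING`, although on strongly ordered hardware such as x86 that outcome may never show up in practice.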
autogroup.c	sched/autogroup: Make autogroup_path() always available	2019-06-24 19:23:40 +02:00
autogroup.h
clock.c	sched/clock: Use static_branch_likely() with sched_clock_running	2019-11-29 08:10:54 +01:00
completion.c	completion: Use lockdep_assert_RT_in_threaded_ctx() in complete_all()	2020-03-23 18:40:25 +01:00
core.c	sched: Fix loadavg accounting race	2020-07-08 11:38:49 +02:00
cpuacct.c	sched/cpuacct: Fix charge cpuacct.usage_sys	2020-05-19 20:34:14 +02:00
cpudeadline.c	Linux 5.2-rc5	2019-06-17 12:12:27 +02:00
cpudeadline.h
cpufreq_schedutil.c	sched/uclamp: Rename uclamp_util_with() into uclamp_rq_util_with()	2019-12-25 10:42:08 +01:00
cpufreq.c	cpufreq: Avoid leaving stale IRQ work items during CPU offline	2019-12-12 17:59:43 +01:00
cpupri.c	sched/rt: cpupri_find: Trigger a full search as fallback	2020-03-20 13:06:20 +01:00
cpupri.h	sched/rt: Optimize cpupri_find() on non-heterogenous systems	2020-03-06 12:57:27 +01:00
cputime.c	sched/vtime: Work around an unitialized variable warning	2020-04-15 11:06:50 +02:00
deadline.c	sched/deadline: Initialize ->dl_boosted	2020-06-28 17:01:20 +02:00
debug.c	sched: Add rq::ttwu_pending	2020-05-28 10:54:16 +02:00
fair.c	sched/cfs: change initial value of runnable_avg	2020-06-28 17:01:20 +02:00
features.h	sched/fair/util_est: Implement faster ramp-up EWMA on utilization increases	2019-10-29 10:01:07 +01:00
idle.c	cpuidle: Rearrange s2idle-specific idle state entry code	2020-06-25 13:52:53 +02:00
isolation.c	sched/isolation: Allow "isolcpus=" to skip unknown sub-parameters	2020-04-15 10:38:26 +02:00
loadavg.c	timers/nohz: Update NOHZ load in remote tick	2020-01-28 21:36:44 +01:00
Makefile	kcsan: Improve various small stylistic details	2019-11-20 10:47:23 +01:00
membarrier.c	membarrier: Fix RCU locking bug caused by faulty merge	2019-10-01 21:27:50 +02:00
pelt.c	sched/pelt: Sync util/runnable_sum with PELT window when propagating	2020-05-19 20:34:14 +02:00
pelt.h	sched/pelt: Add support to track thermal pressure	2020-03-06 12:57:17 +01:00
psi.c	psi: Move PF_MEMSTALL out of task->flags	2020-03-20 13:06:19 +01:00
rt.c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next	2020-06-03 16:27:18 -07:00
sched-pelt.h	sched/fair: Fix "runnable_avg_yN_inv" not used warnings	2019-06-17 12:15:58 +02:00
sched.h	sched/core: s/WF_ON_RQ/WQ_ON_CPU/	2020-06-28 17:01:20 +02:00
smp.h	sched/headers: Split out open-coded prototypes into kernel/sched/smp.h	2020-05-28 11:03:20 +02:00
stats.c
stats.h	psi: Move PF_MEMSTALL out of task->flags	2020-03-20 13:06:19 +01:00
stop_task.c	sched/core: Further clarify sched_class::set_next_task()	2019-11-11 08:35:21 +01:00
swait.c	sched/swait: Prepare usage in completions	2020-03-21 16:00:23 +01:00
topology.c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next	2020-06-03 16:27:18 -07:00
wait_bit.c	sched/wait: fix ___wait_var_event(exclusive)	2019-12-17 13:32:50 +01:00
wait.c	Add wake_up_interruptible_sync_poll_locked()	2019-10-31 15:12:23 +00:00