iv/linux

Peter Zijlstra	b6e13e8582	sched/core: Fix ttwu() race

Paul reported rcutorture occasionally hitting a NULL deref:

    sched_ttwu_pending()
      ttwu_do_wakeup()
        check_preempt_curr() := check_preempt_wakeup()
          find_matching_se()
            is_same_group()
              if (se->cfs_rq == pse->cfs_rq) <-- BOOM

Debugging showed that this only appears to happen when we take the new code-path from commit:

    `2ebb177175` ("sched/core: Offload wakee task activation if it the wakee is descheduling")

and only when @cpu == smp_processor_id(). Something which should not be possible, because p->on_cpu can only be true for remote tasks. Similarly, without the new code-path from commit:

    `c6e7bd7afa` ("sched/core: Optimize ttwu() spinning on p->on_cpu")

this would've unconditionally hit:

    smp_cond_load_acquire(&p->on_cpu, !VAL);

and if: 'cpu == smp_processor_id() && p->on_cpu' is possible, this would result in an instant live-lock (with IRQs disabled), something that hasn't been reported.

The NULL deref can be explained however if the task_cpu(p) load at the beginning of try_to_wake_up() returns an old value, and this old value happens to be smp_processor_id(). Further assume that the p->on_cpu load accurately returns 1, it really is still running, just not here.

Then, when we enqueue the task locally, we can crash in exactly the observed manner because p->se.cfs_rq != rq->cfs_rq, because p's cfs_rq is from the wrong CPU, therefore we'll iterate into the non-existant parents and NULL deref.

The closest semi-plausible scenario I've managed to contrive is somewhat elaborate (then again, actual reproduction takes many CPU hours of rcutorture, so it can't be anything obvious):

                                    X->cpu = 1
                                    rq(1)->curr = X

    CPU0                            CPU1                            CPU2

                                    // switch away from X
                                    LOCK rq(1)->lock
                                    smp_mb__after_spinlock
                                    dequeue_task(X)
                                      X->on_rq = 9
                                    switch_to(Z)
                                      X->on_cpu = 0
                                    UNLOCK rq(1)->lock

                                                                    // migrate X to cpu 0
                                                                    LOCK rq(1)->lock
                                                                    dequeue_task(X)
                                                                    set_task_cpu(X, 0)
                                                                      X->cpu = 0
                                                                    UNLOCK rq(1)->lock

                                                                    LOCK rq(0)->lock
                                                                    enqueue_task(X)
                                                                      X->on_rq = 1
                                                                    UNLOCK rq(0)->lock

    // switch to X
    LOCK rq(0)->lock
    smp_mb__after_spinlock
    switch_to(X)
      X->on_cpu = 1
    UNLOCK rq(0)->lock

    // X goes sleep
    X->state = TASK_UNINTERRUPTIBLE
    smp_mb();                       // wake X
                                    ttwu()
                                      LOCK X->pi_lock
                                      smp_mb__after_spinlock

                                      if (p->state)

                                      cpu = X->cpu; // =? 1

                                      smp_rmb()

    // X calls schedule()
    LOCK rq(0)->lock
    smp_mb__after_spinlock
    dequeue_task(X)
      X->on_rq = 0

                                      if (p->on_rq)

                                      smp_rmb();

                                      if (p->on_cpu && ttwu_queue_wakelist(..)) [*]

                                      smp_cond_load_acquire(&p->on_cpu, !VAL)

                                      cpu = select_task_rq(X, X->wake_cpu, ...)
                                      if (X->cpu != cpu)
    switch_to(Y)
      X->on_cpu = 0
    UNLOCK rq(0)->lock

However I'm having trouble convincing myself that's actually possible on x86_64 -- after all, every LOCK implies an smp_mb() there, so if ttwu observes ->state != RUNNING, it must also observe ->cpu != 1.

(Most of the previous ttwu() races were found on very large PowerPC)

Nevertheless, this fully explains the observed failure case.

Fix it by ordering the task_cpu(p) load after the p->on_cpu load, which is easy since nothing actually uses @cpu before this.

Fixes: `c6e7bd7afa` ("sched/core: Optimize ttwu() spinning on p->on_cpu")
Reported-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20200622125649.GC576871@hirez.programming.kicks-ass.net

2020-06-28 17:01:20 +02:00
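
The ordering that the fix relies on can be illustrated outside the kernel. The following is a minimal userspace sketch using C11 atomics, not the kernel patch itself: `struct task`, `set_cpu()`, `finish_switch()`, `waker_pick_cpu()` and `demo_single_threaded()` are hypothetical names introduced only for illustration, and the acquire loop merely stands in for `smp_cond_load_acquire(&p->on_cpu, !VAL)`. The point it tries to show is that the CPU number is read only after `on_cpu` has been observed as 0, so the read cannot return a value older than the one published before the release store.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the two task fields discussed in the commit. */
struct task {
    atomic_int on_cpu;  /* 1 while some CPU is still running the task */
    atomic_int cpu;     /* CPU the task is currently assigned to */
};

/* Migration side (assumption: simplified, no runqueue locks modelled). */
static void set_cpu(struct task *p, int cpu)
{
    atomic_store_explicit(&p->cpu, cpu, memory_order_relaxed);
}

/* Scheduler side: clear on_cpu with release semantics once the task is off-CPU. */
static void finish_switch(struct task *p)
{
    atomic_store_explicit(&p->on_cpu, 0, memory_order_release);
}

/*
 * Waker side: wait until on_cpu reads 0 with acquire semantics, and only
 * then load cpu. Loading cpu before this loop is the ordering the commit
 * message identifies as able to return a stale value.
 */
static int waker_pick_cpu(struct task *p)
{
    while (atomic_load_explicit(&p->on_cpu, memory_order_acquire))
        ; /* spin until the task is no longer running anywhere */
    return atomic_load_explicit(&p->cpu, memory_order_relaxed);
}

/* Single-threaded walk-through: task starts on CPU 1, moves to CPU 0. */
static int demo_single_threaded(void)
{
    struct task t;

    atomic_init(&t.on_cpu, 1);
    atomic_init(&t.cpu, 1);

    set_cpu(&t, 0);       /* task is migrated to CPU 0 */
    finish_switch(&t);    /* task is fully descheduled */

    int cpu = waker_pick_cpu(&t);
    assert(cpu == 0);     /* the waker sees the post-migration CPU */
    return cpu;
}
```

In a multi-threaded setting the same pairing applies: the release store in `finish_switch()` and the acquire load in `waker_pick_cpu()` ensure that any `cpu` update made before the release is visible to the waker once it leaves the loop.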
..
autogroup.c	sched/autogroup: Make autogroup_path() always available	2019-06-24 19:23:40 +02:00
autogroup.h
clock.c	sched/clock: Use static_branch_likely() with sched_clock_running	2019-11-29 08:10:54 +01:00
completion.c	completion: Use lockdep_assert_RT_in_threaded_ctx() in complete_all()	2020-03-23 18:40:25 +01:00
core.c	sched/core: Fix ttwu() race	2020-06-28 17:01:20 +02:00
cpuacct.c	sched/cpuacct: Fix charge cpuacct.usage_sys	2020-05-19 20:34:14 +02:00
cpudeadline.c	Linux 5.2-rc5	2019-06-17 12:12:27 +02:00
cpudeadline.h
cpufreq_schedutil.c	sched/uclamp: Rename uclamp_util_with() into uclamp_rq_util_with()	2019-12-25 10:42:08 +01:00
cpufreq.c	cpufreq: Avoid leaving stale IRQ work items during CPU offline	2019-12-12 17:59:43 +01:00
cpupri.c	sched/rt: cpupri_find: Trigger a full search as fallback	2020-03-20 13:06:20 +01:00
cpupri.h	sched/rt: Optimize cpupri_find() on non-heterogenous systems	2020-03-06 12:57:27 +01:00
cputime.c	sched/vtime: Work around an unitialized variable warning	2020-04-15 11:06:50 +02:00
deadline.c	sched/deadline: Initialize ->dl_boosted	2020-06-28 17:01:20 +02:00
debug.c	sched: Add rq::ttwu_pending	2020-05-28 10:54:16 +02:00
fair.c	mmap locking API: use coccinelle to convert mmap_sem rwsem call sites	2020-06-09 09:39:14 -07:00
features.h	sched/fair/util_est: Implement faster ramp-up EWMA on utilization increases	2019-10-29 10:01:07 +01:00
idle.c	sched: Replace rq::wake_list	2020-05-28 10:54:16 +02:00
isolation.c	sched/isolation: Allow "isolcpus=" to skip unknown sub-parameters	2020-04-15 10:38:26 +02:00
loadavg.c	timers/nohz: Update NOHZ load in remote tick	2020-01-28 21:36:44 +01:00
Makefile	kcsan: Improve various small stylistic details	2019-11-20 10:47:23 +01:00
membarrier.c	membarrier: Fix RCU locking bug caused by faulty merge	2019-10-01 21:27:50 +02:00
pelt.c	sched/pelt: Sync util/runnable_sum with PELT window when propagating	2020-05-19 20:34:14 +02:00
pelt.h	sched/pelt: Add support to track thermal pressure	2020-03-06 12:57:17 +01:00
psi.c	psi: Move PF_MEMSTALL out of task->flags	2020-03-20 13:06:19 +01:00
rt.c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next	2020-06-03 16:27:18 -07:00
sched-pelt.h	sched/fair: Fix "runnable_avg_yN_inv" not used warnings	2019-06-17 12:15:58 +02:00
sched.h	sched: Replace rq::wake_list	2020-05-28 10:54:16 +02:00
smp.h	sched/headers: Split out open-coded prototypes into kernel/sched/smp.h	2020-05-28 11:03:20 +02:00
stats.c
stats.h	psi: Move PF_MEMSTALL out of task->flags	2020-03-20 13:06:19 +01:00
stop_task.c	sched/core: Further clarify sched_class::set_next_task()	2019-11-11 08:35:21 +01:00
swait.c	sched/swait: Prepare usage in completions	2020-03-21 16:00:23 +01:00
topology.c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next	2020-06-03 16:27:18 -07:00
wait_bit.c	sched/wait: fix ___wait_var_event(exclusive)	2019-12-17 13:32:50 +01:00
wait.c	Add wake_up_interruptible_sync_poll_locked()	2019-10-31 15:12:23 +00:00