linux/kernel
Dmitry Adamushko 3e84050c81 cpusets, hotplug, scheduler: fix scheduler domain breakage
Commit f18f982ab ("sched: CPU hotplug events must not destroy scheduler
domains created by the cpusets") introduced a hotplug-related problem as
described below:

Upon CPU_DOWN_PREPARE,

  update_sched_domains() -> detach_destroy_domains(&cpu_online_map)

does the following:

/*
 * Force a reinitialization of the sched domains hierarchy. The domains
 * and groups cannot be updated in place without racing with the balancing
 * code, so we temporarily attach all running cpus to the NULL domain
 * which will prevent rebalancing while the sched domains are recalculated.
 */

The sched-domains should be rebuilt when a CPU_DOWN ops. has been
completed, effectively either upon CPU_DEAD{_FROZEN} (upon success) or
CPU_DOWN_FAILED{_FROZEN} (upon failure -- restore the things to their
initial state). That's what update_sched_domains() also does but only
for !CPUSETS case.

With f18f982ab, sched-domains' reinitialization is delegated to
CPUSETS code:

cpuset_handle_cpuhp() -> common_cpu_mem_hotplug_unplug() ->
rebuild_sched_domains()

Being called for CPU_UP_PREPARE and if its callback is called after
update_sched_domains()), it just negates all the work done by
update_sched_domains() -- i.e. a soon-to-be-offline cpu is included in
the sched-domains and that makes it visible for the load-balancer
while the CPU_DOWN ops. is in progress.

__migrate_live_tasks() moves the tasks off a 'dead' cpu (it's already
"offline" when this function is called).

try_to_wake_up() is called for one of these tasks from another CPU ->
the load-balancer (wake_idle()) picks up a "dead" CPU and places the
task on it. Then e.g. BUG_ON(rq->nr_running) detects this a bit later
-> oops.

Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Tested-by: Vegard Nossum <vegard.nossum@gmail.com>
Cc: Paul Menage <menage@google.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: miaox@cn.fujitsu.com
Cc: rostedt@goodmis.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-13 11:37:02 +02:00
..
irq genirq: reenable a nobody cared disabled irq when a new driver arrives 2008-05-02 13:40:34 +02:00
power Merge branches 'release', 'acpica', 'bugzilla-10224', 'bugzilla-9772', 'bugzilla-9916', 'ec', 'eeepc', 'idle', 'misc', 'pm-legacy', 'sysfs-links-2.6.26', 'thermal', 'thinkpad' and 'video' into release 2008-04-30 13:58:00 -04:00
time clocksource: allow read access to available/current_clocksource 2008-05-03 18:11:48 +02:00
.gitignore Update kernel/.gitignore with new auto-generated files 2008-02-09 23:27:01 -08:00
acct.c bsd_acct: using task_struct->tgid is not right in pid-namespaces 2008-03-24 19:22:20 -07:00
audit_tree.c [PATCH] list_for_each_rcu must die: audit 2008-05-17 03:30:23 -04:00
audit.c [patch 1/1] audit_send_reply(): fix error-path memory leak 2008-05-17 03:30:22 -04:00
audit.h [PATCH 1/2] audit: move extern declarations to audit.h 2008-04-28 06:28:04 -04:00
auditfilter.c Merge branch 'audit.b50' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current 2008-04-29 11:41:22 -07:00
auditsc.c [PATCH] new predicate - AUDIT_FILETYPE 2008-04-28 06:28:37 -04:00
backtracetest.c x86: add a simple backtrace test module 2008-01-30 13:33:08 +01:00
bounds.c Add kbuild.h that contains common definitions for kbuild users 2008-04-29 08:06:29 -07:00
capability.c capabilities: remain source compatible with 32-bit raw legacy capability support. 2008-05-31 16:36:16 -07:00
cgroup_debug.c CGroup API files: move "releasable" to cgroup_debug subsystem 2008-04-29 08:06:09 -07:00
cgroup.c cgroups: remove node_ prefix_from ns subsystem 2008-05-24 09:56:14 -07:00
compat.c ntp: support for TAI 2008-05-01 08:03:59 -07:00
configs.c kernel: use non-racy method for proc entries creation 2008-04-29 08:06:22 -07:00
cpu.c kernel: replace remaining __FUNCTION__ occurrences 2008-04-30 08:29:54 -07:00
cpuset.c cpusets, hotplug, scheduler: fix scheduler domain breakage 2008-07-13 11:37:02 +02:00
delayacct.c
dma.c kernel: use non-racy method for proc entries creation 2008-04-29 08:06:22 -07:00
exec_domain.c
exit.c signals: fix sigqueue_free() vs __exit_signal() race 2008-05-24 09:56:10 -07:00
extable.c
fork.c [PATCH] dup_fd() fixes, part 1 2008-05-16 17:22:26 -04:00
futex_compat.c futex_compat __user annotation 2008-03-30 14:18:41 -07:00
futex.c futexes: fix fault handling in futex_lock_pi 2008-06-23 13:31:15 +02:00
hrtimer.c hrtimer: remove duplicate helper function 2008-05-03 18:11:48 +02:00
itimer.c ITIMER_REAL: convert to use struct pid 2008-02-08 09:22:29 -08:00
kallsyms.c kernel: use non-racy method for proc entries creation 2008-04-29 08:06:22 -07:00
Kconfig.hz
Kconfig.preempt rcu: move PREEMPT_RCU config option back under PREEMPT 2008-03-10 18:01:20 -07:00
kexec.c kexec: make extended crashkernel= syntax less confusing 2008-05-01 08:04:00 -07:00
kfifo.c
kgdb.c kgdb: sparse fix 2008-06-24 10:52:55 -05:00
kmod.c [PATCH] split linux/file.h 2008-05-01 13:08:16 -04:00
kprobes.c kprobes: fix error checking of batch registration 2008-06-12 18:05:40 -07:00
ksysfs.c
kthread.c Deprecate find_task_by_pid() 2008-04-30 08:29:48 -07:00
latencytop.c kernel: use non-racy method for proc entries creation 2008-04-29 08:06:22 -07:00
lockdep_internals.h
lockdep_proc.c kernel: use non-racy method for proc entries creation 2008-04-29 08:06:22 -07:00
lockdep.c Subject: lockdep: include all lock classes in all_lock_classes 2008-02-25 23:03:02 +01:00
Makefile sched: add optional support for CONFIG_HAVE_UNSTABLE_SCHED_CLOCK 2008-05-05 23:56:18 +02:00
marker.c make marker_debug static 2008-04-30 08:29:49 -07:00
module.c modules: proper cleanup of kobject without CONFIG_SYSFS 2008-05-23 13:09:33 +10:00
mutex-debug.c kernel: remove fastcall in kernel/* 2008-02-08 09:22:31 -08:00
mutex-debug.h
mutex.c kernel: remove fastcall in kernel/* 2008-02-08 09:22:31 -08:00
mutex.h
notifier.c ipc: re-enable msgmni automatic recomputing msgmni if set to negative 2008-04-29 08:06:13 -07:00
ns_cgroup.c cgroups: kernel/ns_cgroup.c should #include <linux/nsproxy.h> 2008-04-29 08:06:07 -07:00
nsproxy.c ipc: sysvsem: refuse clone(CLONE_SYSVSEM|CLONE_NEWIPC) 2008-04-29 08:06:14 -07:00
panic.c Taint kernel after WARN_ON(condition) 2008-04-29 08:05:59 -07:00
params.c Add new string functions strict_strto* and convert kernel params to use them 2008-02-08 09:22:41 -08:00
pid_namespace.c pidns: make pid->level and pid_ns->level unsigned 2008-04-30 08:29:49 -07:00
pid.c pids: introduce change_pid() helper 2008-04-30 08:29:48 -07:00
pm_qos_params.c pm qos infrastructure and interface 2008-02-05 09:44:22 -08:00
posix-cpu-timers.c remove div_long_long_rem 2008-05-01 08:03:58 -07:00
posix-timers.c signals: join send_sigqueue() with send_group_sigqueue() 2008-04-30 08:29:36 -07:00
printk.c printk: don't read beyond string arguments' terminating zero 2008-04-30 08:29:52 -07:00
profile.c kernel: use non-racy method for proc entries creation 2008-04-29 08:06:22 -07:00
ptrace.c make generic sys_ptrace unconditional 2008-05-01 10:21:54 -07:00
rcuclassic.c
rcupdate.c rcupdate: fix comment 2008-02-13 16:21:18 -08:00
rcupreempt_trace.c
rcupreempt.c rcupreempt: remove export of rcu_batches_completed_bh 2008-06-19 09:45:37 +02:00
rcutorture.c kernel: explicitly include required header files under kernel/ 2008-04-29 08:06:04 -07:00
relay.c splice: fix sendfile() issue with relay 2008-05-28 14:49:27 +02:00
res_counter.c memcgroup: add the max_usage member on the res_counter 2008-04-29 08:06:10 -07:00
resource.c kernel: use non-racy method for proc entries creation 2008-04-29 08:06:22 -07:00
rtmutex_common.h Don't operate with pid_t in rtmutex tester 2008-02-08 09:22:41 -08:00
rtmutex-debug.c Don't operate with pid_t in rtmutex tester 2008-02-08 09:22:41 -08:00
rtmutex-debug.h
rtmutex-tester.c
rtmutex.c hrtimer: more hrtimer_init_sleeper() fallout. 2008-02-13 15:45:36 +01:00
rtmutex.h
rwsem.c
sched_clock.c sched: fix sched_clock_cpu() 2008-05-29 11:29:19 +02:00
sched_debug.c revert ("sched: fair-group: SMP-nice for group scheduling") 2008-05-29 11:28:57 +02:00
sched_fair.c sched: stop wake_affine from causing serious imbalance 2008-05-29 11:29:20 +02:00
sched_features.h sched: /debug/sched_features 2008-04-19 19:45:00 +02:00
sched_idletask.c sched: make rt_sched_class, idle_sched_class static 2008-05-05 23:56:17 +02:00
sched_rt.c sched: rt: dont stop the period timer when there are tasks wanting to run 2008-06-20 11:00:19 +02:00
sched_stats.h sched, delay accounting: fix incorrect delay time when constantly waiting on runqueue 2008-06-19 14:15:28 +02:00
sched.c sched: fix cpu hotplug, cleanup 2008-07-10 20:39:58 +02:00
seccomp.c
semaphore.c Revert "semaphore: fix" 2008-05-10 20:43:22 -07:00
signal.c posix timers: discard SI_TIMER signals on exec 2008-05-26 10:37:07 -07:00
softirq.c Fix cpu hotplug problem in softirq code 2008-05-01 08:03:58 -07:00
softlockup.c softlockup: fix NMI hangs due to lock race - 2.6.26-rc regression 2008-06-19 09:45:38 +02:00
spinlock.c spinlock: lockbreak cleanup 2008-01-30 13:31:20 +01:00
srcu.c make srcu_readers_active() static 2008-02-06 10:41:02 -08:00
stacktrace.c
stop_machine.c stop_machine: make stop_machine_run more virtualization friendly 2008-05-23 13:09:34 +10:00
sys_ni.c timerfd: new timerfd API 2008-02-05 09:44:07 -08:00
sys.c sys_prctl(): fix return of uninitialized value 2008-05-24 09:56:13 -07:00
sysctl_check.c constify tables in kernel/sysctl_check.c 2008-02-08 09:22:31 -08:00
sysctl.c [PATCH] avoid multiplication overflows and signedness issues for max_fds 2008-05-16 17:22:52 -04:00
taskstats.c Use find_task_by_vpid in taskstats 2008-04-30 08:29:48 -07:00
test_kprobes.c kprobes: kretprobe user entry-handler 2008-02-06 10:41:11 -08:00
time.c Make constants in kernel/timeconst.h fixed 64 bits 2008-05-02 16:18:42 -07:00
timeconst.pl Make constants in kernel/timeconst.h fixed 64 bits 2008-05-02 16:18:42 -07:00
timer.c debugobjects: add timer specific object debugging code 2008-04-30 08:29:53 -07:00
tsacct.c
uid16.c asmlinkage_protect replaces prevent_tail_call 2008-04-10 17:28:26 -07:00
user_namespace.c eCryptfs: make key module subsystem respect namespaces 2008-04-29 08:06:07 -07:00
user.c alloc_uid: cleanup 2008-04-30 08:29:53 -07:00
utsname_sysctl.c
utsname.c kernel: explicitly include required header files under kernel/ 2008-04-29 08:06:04 -07:00
wait.c kernel: remove fastcall in kernel/* 2008-02-08 09:22:31 -08:00
workqueue.c workqueue: remove redundant function invocation 2008-05-01 08:04:02 -07:00