linux/kernel
Cliff Wickman 470fd64644 hotplug cpu: migrate a task within its cpuset
When a cpu is disabled, move_task_off_dead_cpu() is called for tasks that have
been running on that cpu.

Currently, such a task is migrated:
 1) to any cpu on the same node as the disabled cpu, which is both online
    and among that task's cpus_allowed
 2) to any cpu which is both online and among that task's cpus_allowed

It is typical of a multithreaded application running on a large NUMA system to
have its tasks confined to a cpuset so as to cluster them near the memory that
they share.  Furthermore, it is typical to explicitly place such a task on a
specific cpu in that cpuset.  And in that case the task's cpus_allowed
includes only a single cpu.

This patch would insert a preference to migrate such a task to some cpu within
its cpuset (and set its cpus_allowed to its entire cpuset).

With this patch, migrate the task to:
 1) to any cpu on the same node as the disabled cpu, which is both online
    and among that task's cpus_allowed
 2) to any online cpu within the task's cpuset
 3) to any cpu which is both online and among that task's cpus_allowed

In order to do this, move_task_off_dead_cpu() must make a call to
cpuset_cpus_allowed_locked(), a new subset of cpuset_cpus_allowed(), that will
not block.  (name change - per Oleg's suggestion)

Calls are made to cpuset_lock() and cpuset_unlock() in migration_call() to set
the cpuset mutex during the whole migrate_live_tasks() and
migrate_dead_tasks() procedure.

[akpm@linux-foundation.org: build fix]
[pj@sgi.com: Fix indentation and spacing]
Signed-off-by: Cliff Wickman <cpw@sgi.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19 11:53:44 -07:00
..
irq Compile handle_percpu_irq even for uniprocessor kernels 2007-10-17 08:43:00 -07:00
power Hibernation: Enter platform hibernation state in a consistent way 2007-10-18 14:37:20 -07:00
time kernel/time/clocksource.c: Use list_for_each_entry instead of list_for_each 2007-10-19 11:53:38 -07:00
.gitignore
acct.c whitespace fixes: process accounting 2007-10-18 14:37:24 -07:00
audit.c whitespace fixes: system auditing 2007-10-18 14:37:25 -07:00
audit.h Audit: add TTY input auditing 2007-07-16 09:05:47 -07:00
auditfilter.c whitespace fixes: audit filtering 2007-10-18 14:37:24 -07:00
auditsc.c whitespace fixes: syscall auditing 2007-10-18 14:37:25 -07:00
capability.c Uninline find_pid etc set of functions 2007-10-19 11:53:41 -07:00
cgroup_debug.c Task Control Groups: simple task cgroup debug info subsystem 2007-10-19 11:53:36 -07:00
cgroup.c Control groups: Replace "cont" with "cgrp" and other misc renaming 2007-10-19 11:53:43 -07:00
compat.c Merge ssh://master.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-hrt 2007-10-18 15:12:41 -07:00
configs.c use simple_read_from_buffer in kernel/ 2007-05-09 12:30:49 -07:00
cpu_acct.c Task Control Groups: example CPU accounting subsystem 2007-10-19 11:53:36 -07:00
cpu.c Use helpers to obtain task pid in printks 2007-10-19 11:53:43 -07:00
cpuset.c hotplug cpu: migrate a task within its cpuset 2007-10-19 11:53:44 -07:00
delayacct.c Add scaled time to taskstats based process accounting 2007-10-18 14:37:28 -07:00
dma.c whitespace fixes: DMA channel allocator 2007-10-18 14:37:24 -07:00
exec_domain.c whitespace fixes: execution domains 2007-10-18 14:37:26 -07:00
exit.c Use helpers to obtain task pid in printks 2007-10-19 11:53:43 -07:00
extable.c
fork.c Isolate the explicit usage of signal->pgrp 2007-10-19 11:53:43 -07:00
futex_compat.c Uninline find_task_by_xxx set of functions 2007-10-19 11:53:40 -07:00
futex.c Uninline find_task_by_xxx set of functions 2007-10-19 11:53:40 -07:00
hrtimer.c hrtimer: Rework hrtimer_nanosleep to make sys_compat_nanosleep easier 2007-10-18 22:54:18 +02:00
itimer.c whitespace fixes: interval timers 2007-10-18 14:37:26 -07:00
kallsyms.c kallsyms: make KSYM_NAME_LEN include space for trailing '\0' 2007-07-17 10:23:03 -07:00
Kconfig.hz [PATCH] HZ: 300Hz support 2006-12-07 08:39:36 -08:00
Kconfig.preempt Move PREEMPT_NOTIFIERS into an always-included Kconfig 2007-10-17 08:42:55 -07:00
kexec.c pid namespaces: define is_global_init() and is_container_init() 2007-10-19 11:53:37 -07:00
kfifo.c is_power_of_2: kernel/kfifo.c 2007-07-16 09:05:50 -07:00
kmod.c Restore call_usermodehelper_pipe() behaviour 2007-09-11 17:21:20 -07:00
kprobes.c kprobes: support kretprobe blacklist 2007-10-16 09:43:10 -07:00
ksysfs.c add-vmcore: cleanup the coding style according to Andrew's comments 2007-10-17 08:42:54 -07:00
kthread.c kthread: silence bogus section mismatch warning 2007-07-31 15:39:42 -07:00
latency.c [PATCH] severing module.h->sched.h 2006-12-04 02:00:22 -05:00
lockdep_internals.h [PATCH] lockdep: more chains 2006-12-07 08:39:43 -08:00
lockdep_proc.c lockdep: Avoid /proc/lockdep & lock_stat infinite output 2007-10-11 22:11:11 +02:00
lockdep.c Use helpers to obtain task pid in printks 2007-10-19 11:53:43 -07:00
Makefile cgroups: implement namespace tracking subsystem 2007-10-19 11:53:37 -07:00
module.c whitespace fixes: module loading 2007-10-18 14:37:25 -07:00
mutex-debug.c [PATCH] remove many unneeded #includes of sched.h 2007-02-14 08:09:54 -08:00
mutex-debug.h
mutex.c lockdep: fixup mutex annotations 2007-10-11 22:11:12 +02:00
mutex.h
notifier.c Add kernel/notifier.c 2007-10-19 11:53:34 -07:00
ns_cgroup.c cgroups: implement namespace tracking subsystem 2007-10-19 11:53:37 -07:00
nsproxy.c pid namespaces: allow cloning of new namespace 2007-10-19 11:53:39 -07:00
panic.c whitespace fixes: panic handling 2007-10-18 14:37:25 -07:00
params.c param_sysfs_builtin memchr argument fix 2007-10-18 14:37:21 -07:00
pid.c Uninline the task_xid_nr_ns() calls 2007-10-19 11:53:41 -07:00
posix-cpu-timers.c Isolate some explicit usage of task->tgid 2007-10-19 11:53:40 -07:00
posix-timers.c Isolate some explicit usage of task->tgid 2007-10-19 11:53:40 -07:00
printk.c serial: turn serial console suspend a boot rather than compile time option 2007-10-18 14:37:19 -07:00
profile.c make kernel/profile.c:time_hook static 2007-10-17 08:42:55 -07:00
ptrace.c Isolate some explicit usage of task->tgid 2007-10-19 11:53:40 -07:00
rcupdate.c Clean up duplicate includes in kernel/ 2007-10-17 08:42:48 -07:00
rcutorture.c Make rcutorture RNG use temporal entropy 2007-10-17 08:42:53 -07:00
relay.c whitespace fixes: relayfs 2007-10-18 14:37:24 -07:00
resource.c memory unplug: memory hotplug cleanup 2007-10-16 09:43:01 -07:00
rtmutex_common.h FUTEX: Tidy up the code 2007-07-16 09:05:49 -07:00
rtmutex-debug.c Use helpers to obtain task pid in printks 2007-10-19 11:53:43 -07:00
rtmutex-debug.h
rtmutex-tester.c Freezer: make kernel threads nonfreezable by default 2007-07-17 10:23:02 -07:00
rtmutex.c Use helpers to obtain task pid in printks 2007-10-19 11:53:43 -07:00
rtmutex.h
rwsem.c lockstat: hook into spinlock_t, rwlock_t, rwsem and mutex 2007-07-19 10:04:49 -07:00
sched_debug.c sched: reduce schedstat variable overhead a bit 2007-10-18 21:32:56 +02:00
sched_fair.c sched: fix new task startup crash 2007-10-17 16:55:11 +02:00
sched_idletask.c sched: mark scheduling classes as const 2007-10-15 17:00:12 +02:00
sched_rt.c sched: tidy up SCHED_RR 2007-10-15 17:00:13 +02:00
sched_stats.h sched: reduce schedstat variable overhead a bit 2007-10-18 21:32:56 +02:00
sched.c hotplug cpu: migrate a task within its cpuset 2007-10-19 11:53:44 -07:00
seccomp.c make seccomp zerocost in schedule 2007-07-16 09:05:50 -07:00
signal.c Use helpers to obtain task pid in printks 2007-10-19 11:53:43 -07:00
softirq.c [KERNEL]: Unexport raise_softirq_irqoff 2007-10-10 16:49:18 -07:00
softlockup.c Use helpers to obtain task pid in printks 2007-10-19 11:53:43 -07:00
spinlock.c lockstat: hook into spinlock_t, rwlock_t, rwsem and mutex 2007-07-19 10:04:49 -07:00
srcu.c [PATCH] SRCU: report out-of-memory errors 2006-10-04 07:55:30 -07:00
stacktrace.c
stop_machine.c Fix stop_machine_run problem with naughty real time process 2007-07-16 09:05:41 -07:00
sys_ni.c kernel/sys_ni.c: add dummy sys_ni_syscall() prototype 2007-10-17 08:42:55 -07:00
sys.c Isolate the explicit usage of signal->pgrp 2007-10-19 11:53:43 -07:00
sysctl_check.c V3 file capabilities: alter behavior of cap_setpcap 2007-10-18 14:37:24 -07:00
sysctl.c pid namespaces: changes to show virtual ids to user 2007-10-19 11:53:40 -07:00
taskstats.c Add cgroupstats 2007-10-19 11:53:36 -07:00
time.c whitespace fixes: time syscalls 2007-10-18 14:37:24 -07:00
timer.c pid namespaces: changes to show virtual ids to user 2007-10-19 11:53:40 -07:00
tsacct.c Add scaled time to taskstats based process accounting 2007-10-18 14:37:28 -07:00
uid16.c header cleaning: don't include smp_lock.h when not used 2007-05-08 11:15:07 -07:00
user_namespace.c Fix user namespace exiting OOPs 2007-09-19 11:24:18 -07:00
user.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched 2007-10-17 09:11:18 -07:00
utsname_sysctl.c remove CONFIG_UTS_NS and CONFIG_IPC_NS 2007-07-16 09:05:47 -07:00
utsname.c Fix UTS corruption during clone(CLONE_NEWUTS) 2007-09-19 11:24:17 -07:00
wait.c Fix occurrences of "the the " 2007-05-09 08:57:56 +02:00
workqueue.c Use helpers to obtain task pid in printks 2007-10-19 11:53:43 -07:00