2005-04-17 02:20:36 +04:00
/* CPU control.
* ( C ) 2001 , 2002 , 2003 , 2004 Rusty Russell
*
* This code is licenced under the GPL .
*/
2020-04-02 00:40:33 +03:00
# include <linux/sched/mm.h>
2005-04-17 02:20:36 +04:00
# include <linux/proc_fs.h>
# include <linux/smp.h>
# include <linux/init.h>
# include <linux/notifier.h>
2017-02-08 20:51:30 +03:00
# include <linux/sched/signal.h>
2017-02-08 20:51:36 +03:00
# include <linux/sched/hotplug.h>
2019-04-11 06:34:46 +03:00
# include <linux/sched/isolation.h>
2017-02-08 20:51:36 +03:00
# include <linux/sched/task.h>
2018-11-25 21:33:39 +03:00
# include <linux/sched/smt.h>
2005-04-17 02:20:36 +04:00
# include <linux/unistd.h>
# include <linux/cpu.h>
cpu: introduce clear_tasks_mm_cpumask() helper
Many architectures clear tasks' mm_cpumask like this:
	read_lock(&tasklist_lock);
	for_each_process(p) {
		if (p->mm)
			cpumask_clear_cpu(cpu, mm_cpumask(p->mm));
	}
	read_unlock(&tasklist_lock);
Depending on the context, the code above may have several problems,
such as:
1. Working with task->mm w/o getting mm or grabbing the task lock is
dangerous as ->mm might disappear (exit_mm() assigns NULL under
task_lock(), so tasklist lock is not enough).
2. Checking for process->mm is not enough because process' main
thread may exit or detach its mm via use_mm(), but other threads
may still have a valid mm.
This patch implements a small helper function that does things
correctly, i.e.:
1. We take the task's lock while we handle its mm (we can't use
get_task_mm()/mmput() pair as mmput() might sleep);
2. To catch exited main thread case, we use find_lock_task_mm(),
which walks up all threads and returns an appropriate task
(with task lock held).
Also, Per Peter Zijlstra's idea, now we don't grab tasklist_lock in
the new helper, instead we take the rcu read lock. We can do this
because the function is called after the cpu is taken down and marked
offline, so no new tasks will get this cpu set in their mm mask.
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
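A rough user-space model of the logic this commit message describes, not the kernel implementation: walk every process, pick any thread that still has an mm (the main thread may already have dropped its own), and clear the CPU's bit in that mm's mask. The types `model_mm`, `model_thread`, `model_process` and both function names are hypothetical stand-ins, and the locking (task lock, RCU read lock) is only indicated in comments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for mm_struct / task_struct; not kernel types. */
struct model_mm {
	uint64_t cpumask;	/* bit n set => CPU n is recorded for this mm */
};

struct model_thread {
	struct model_mm *mm;	/* NULL once the thread exited or detached its mm */
};

struct model_process {
	struct model_thread *threads;	/* threads[0] is the main thread */
	size_t nr_threads;
};

/*
 * Models the role the commit message gives find_lock_task_mm(): return any
 * thread of the process that still has an mm, or NULL if none does. The
 * kernel helper also returns with the task lock held; that is omitted here.
 */
static struct model_thread *model_find_thread_with_mm(struct model_process *p)
{
	for (size_t i = 0; i < p->nr_threads; i++) {
		if (p->threads[i].mm)
			return &p->threads[i];
	}
	return NULL;
}

/* Clear @cpu from the mm mask of every process that still has an mm. */
static void model_clear_tasks_mm_cpumask(struct model_process *procs,
					 size_t nr_procs, unsigned int cpu)
{
	assert(cpu < 64);	/* the model mask is a single 64-bit word */

	/* The kernel helper would hold the RCU read lock around this loop. */
	for (size_t i = 0; i < nr_procs; i++) {
		struct model_thread *t = model_find_thread_with_mm(&procs[i]);

		if (!t)
			continue;	/* no thread of this process has an mm */
		t->mm->cpumask &= ~(UINT64_C(1) << cpu);
		/* The kernel helper would drop the task lock here. */
	}
}
```

Checking only `threads[0].mm` would miss a process whose main thread has exited while a sibling thread still uses the mm, which is the second problem listed in the commit message.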
2012-06-01 03:26:22 +04:00
# include <linux/oom.h>
# include <linux/rcupdate.h>
2011-05-23 22:51:41 +04:00
# include <linux/export.h>
2012-06-01 03:26:26 +04:00
# include <linux/bug.h>
2005-04-17 02:20:36 +04:00
# include <linux/kthread.h>
# include <linux/stop_machine.h>
2006-06-26 11:24:32 +04:00
# include <linux/mutex.h>
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities to include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the following.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and tries to put the new include such that its order conforms
to its surroundings. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
widely available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build tests were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers, which should be easily discoverable on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-24 11:04:11 +03:00
# include <linux/gfp.h>
2011-11-03 03:59:25 +04:00
# include <linux/suspend.h>
2014-03-11 00:34:03 +04:00
# include <linux/lockdep.h>
2015-03-30 12:29:19 +03:00
# include <linux/tick.h>
2015-07-05 20:12:30 +03:00
# include <linux/irq.h>
2017-09-12 22:37:04 +03:00
# include <linux/nmi.h>
2016-02-26 21:43:38 +03:00
# include <linux/smpboot.h>
2016-08-18 15:57:17 +03:00
# include <linux/relay.h>
2016-08-23 15:53:19 +03:00
# include <linux/slab.h>
sched/scs: Reset task stack state in bringup_cpu()
To hot unplug a CPU, the idle task on that CPU calls a few layers of C
code before finally leaving the kernel. When KASAN is in use, poisoned
shadow is left around for each of the active stack frames. When shadow
call stacks (SCS) are in use, the task's saved SCS SP is left pointing
at an arbitrary point within the task's shadow call stack.
When a CPU is offlined then onlined back into the kernel, this stale
state can adversely affect execution. Stale KASAN shadow can alias new
stackframes and result in bogus KASAN warnings. A stale SCS SP is
effectively a memory leak, and prevents a portion of the shadow call
stack being used. Across a number of hotplug cycles the idle task's
entire shadow call stack can become unusable.
We previously fixed the KASAN issue in commit:
e1b77c92981a5222 ("sched/kasan: remove stale KASAN poison after hotplug")
... by removing any stale KASAN stack poison immediately prior to
onlining a CPU.
Subsequently in commit:
f1a0a376ca0c4ef1 ("sched/core: Initialize the idle task with preemption disabled")
... the refactoring left the KASAN and SCS cleanup in one-time idle
thread initialization code rather than something invoked prior to each
CPU being onlined, breaking both as above.
We fixed SCS (but not KASAN) in commit:
63acd42c0d4942f7 ("sched/scs: Reset the shadow stack when idle_task_exit")
... but as this runs in the context of the idle task being offlined it's
potentially fragile.
To fix these consistently and more robustly, reset the SCS SP and KASAN
shadow of a CPU's idle task immediately before we online that CPU in
bringup_cpu(). This ensures the idle task always has a consistent state
when it is running, and removes the need to do so when exiting an idle
task.
Whenever any thread is created, dup_task_struct() will give the task a
stack which is free of KASAN shadow, and initialize the task's SCS SP,
so there's no need to specially initialize either for idle thread within
init_idle(), as this was only necessary to handle hotplug cycles.
I've tested this on arm64 with:
* gcc 11.1.0, defconfig +KASAN_INLINE, KASAN_STACK
* clang 12.0.0, defconfig +KASAN_INLINE, KASAN_STACK, SHADOW_CALL_STACK
... offlining and onlining CPUs with:
| while true; do
|   for C in /sys/devices/system/cpu/cpu*/online; do
|     echo 0 > $C;
|     echo 1 > $C;
|   done
| done
Fixes: f1a0a376ca0c4ef1 ("sched/core: Initialize the idle task with preemption disabled")
Reported-by: Qian Cai <quic_qiancai@quicinc.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Qian Cai <quic_qiancai@quicinc.com>
Link: https://lore.kernel.org/lkml/20211115113310.35693-1-mark.rutland@arm.com/
2021-11-23 14:40:47 +03:00
# include <linux/scs.h>
2017-05-24 11:15:40 +03:00
# include <linux/percpu-rwsem.h>
2021-03-28 00:01:36 +03:00
# include <linux/cpuset.h>
random: clear fast pool, crng, and batches in cpuhp bring up
For the irq randomness fast pool, rather than having to use expensive
atomics, which were visibly the most expensive thing in the entire irq
handler, simply take care of the extreme edge case of resetting count to
zero in the cpuhp online handler, just after workqueues have been
reenabled. This simplifies the code a bit and lets us use vanilla
variables rather than atomics, and performance should be improved.
As well, very early on when the CPU comes up, while interrupts are still
disabled, we clear out the per-cpu crng and its batches, so that it
always starts with fresh randomness.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Sultan Alsawaf <sultan@kerneltoast.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-14 00:48:04 +03:00
# include <linux/random.h>
2022-04-06 02:29:33 +03:00
# include <linux/cc_platform.h>
2016-02-26 21:43:28 +03:00
2014-06-06 16:40:17 +04:00
# include <trace/events/power.h>
2016-02-26 21:43:28 +03:00
# define CREATE_TRACE_POINTS
# include <trace/events/cpuhp.h>
2005-04-17 02:20:36 +04:00
2012-04-20 17:05:44 +04:00
# include "smpboot.h"
2016-02-26 21:43:28 +03:00
/**
2021-08-10 01:38:25 +03:00
* struct cpuhp_cpu_state - Per cpu hotplug state storage
2016-02-26 21:43:28 +03:00
* @ state : The current cpu state
* @ target : The target state
2021-08-10 01:38:25 +03:00
* @ fail : Current CPU hotplug callback state
2016-02-26 21:43:38 +03:00
* @ thread : Pointer to the hotplug thread
* @ should_run : Thread should execute
2016-04-08 15:40:15 +03:00
* @ rollback : Perform a rollback
2016-08-12 20:49:38 +03:00
* @ single : Single callback invocation
* @ bringup : Single callback bringup or teardown selector
2021-08-10 01:38:25 +03:00
* @ cpu : CPU number
* @ node : Remote CPU node ; for multi - instance , do a
* single entry callback for install / remove
* @ last : For multi - instance rollback , remember how far we got
2016-08-12 20:49:38 +03:00
* @ cb_state : The state for a single callback ( install / uninstall )
2016-02-26 21:43:38 +03:00
* @ result : Result of the operation
2017-09-20 20:00:19 +03:00
* @ done_up : Signal completion to the issuer of the task for cpu - up
* @ done_down : Signal completion to the issuer of the task for cpu - down
2016-02-26 21:43:28 +03:00
*/
struct cpuhp_cpu_state {
enum cpuhp_state state ;
enum cpuhp_state target ;
2017-09-20 20:00:21 +03:00
enum cpuhp_state fail ;
2016-02-26 21:43:38 +03:00
# ifdef CONFIG_SMP
struct task_struct * thread ;
bool should_run ;
2016-04-08 15:40:15 +03:00
bool rollback ;
2016-08-12 20:49:38 +03:00
bool single ;
bool bringup ;
2016-08-12 20:49:39 +03:00
struct hlist_node * node ;
2017-09-20 20:00:17 +03:00
struct hlist_node * last ;
2016-02-26 21:43:38 +03:00
enum cpuhp_state cb_state ;
int result ;
2017-09-20 20:00:19 +03:00
struct completion done_up ;
struct completion done_down ;
2016-02-26 21:43:38 +03:00
# endif
2016-02-26 21:43:28 +03:00
} ;
2017-09-20 20:00:21 +03:00
static DEFINE_PER_CPU(struct cpuhp_cpu_state, cpuhp_state) = {
	.fail = CPUHP_INVALID,
};
2016-02-26 21:43:28 +03:00
2019-07-22 21:47:16 +03:00
# ifdef CONFIG_SMP
cpumask_t cpus_booted_once_mask ;
# endif
2017-05-24 11:15:43 +03:00
# if defined(CONFIG_LOCKDEP) && defined(CONFIG_SMP)
2017-09-20 20:00:20 +03:00
static struct lockdep_map cpuhp_state_up_map =
	STATIC_LOCKDEP_MAP_INIT("cpuhp_state-up", &cpuhp_state_up_map);
static struct lockdep_map cpuhp_state_down_map =
	STATIC_LOCKDEP_MAP_INIT("cpuhp_state-down", &cpuhp_state_down_map);
2017-12-26 17:08:53 +03:00
static inline void cpuhp_lock_acquire ( bool bringup )
2017-09-20 20:00:20 +03:00
{
lock_map_acquire ( bringup ? & cpuhp_state_up_map : & cpuhp_state_down_map ) ;
}
2017-12-26 17:08:53 +03:00
static inline void cpuhp_lock_release ( bool bringup )
2017-09-20 20:00:20 +03:00
{
lock_map_release ( bringup ? & cpuhp_state_up_map : & cpuhp_state_down_map ) ;
}
# else
2017-12-26 17:08:53 +03:00
static inline void cpuhp_lock_acquire ( bool bringup ) { }
static inline void cpuhp_lock_release ( bool bringup ) { }
2017-09-20 20:00:20 +03:00
2017-05-24 11:15:43 +03:00
# endif
2016-02-26 21:43:28 +03:00
/**
2021-08-10 01:38:25 +03:00
* struct cpuhp_step - Hotplug state machine step
2016-02-26 21:43:28 +03:00
* @ name : Name of the step
* @ startup : Startup function of the step
* @ teardown : Teardown function of the step
2016-02-26 21:43:32 +03:00
* @ cant_stop : Bringup / teardown can ' t be stopped at this step
2021-08-10 01:38:25 +03:00
* @ multi_instance : State has multiple instances which get added afterwards
2016-02-26 21:43:28 +03:00
*/
struct cpuhp_step {
2016-08-12 20:49:39 +03:00
const char * name ;
union {
2016-09-05 16:28:36 +03:00
int ( * single ) ( unsigned int cpu ) ;
int ( * multi ) ( unsigned int cpu ,
struct hlist_node * node ) ;
} startup ;
2016-08-12 20:49:39 +03:00
union {
2016-09-05 16:28:36 +03:00
int ( * single ) ( unsigned int cpu ) ;
int ( * multi ) ( unsigned int cpu ,
struct hlist_node * node ) ;
} teardown ;
2021-08-10 01:38:25 +03:00
/* private: */
2016-08-12 20:49:39 +03:00
struct hlist_head list ;
2021-08-10 01:38:25 +03:00
/* public: */
2016-08-12 20:49:39 +03:00
bool cant_stop ;
bool multi_instance ;
2016-02-26 21:43:28 +03:00
} ;
2016-02-26 21:43:31 +03:00
static DEFINE_MUTEX ( cpuhp_state_mutex ) ;
2017-12-01 16:50:05 +03:00
static struct cpuhp_step cpuhp_hp_states [ ] ;
2016-02-26 21:43:28 +03:00
2016-08-12 20:49:38 +03:00
static struct cpuhp_step * cpuhp_get_step ( enum cpuhp_state state )
{
2017-12-01 16:50:05 +03:00
return cpuhp_hp_states + state ;
2016-08-12 20:49:38 +03:00
}
2021-02-16 13:35:06 +03:00
static bool cpuhp_step_empty(bool bringup, struct cpuhp_step *step)
{
	return bringup ? !step->startup.single : !step->teardown.single;
}
2016-02-26 21:43:28 +03:00
/**
2021-08-10 01:38:25 +03:00
* cpuhp_invoke_callback - Invoke the callbacks for a given state
2016-02-26 21:43:28 +03:00
* @ cpu : The cpu for which the callback should be invoked
2017-09-20 20:00:16 +03:00
* @ state : The state to do callbacks for
2016-08-12 20:49:38 +03:00
* @ bringup : True if the bringup callback should be invoked
2017-09-20 20:00:16 +03:00
* @ node : For multi - instance , do a single entry callback for install / remove
* @ lastp : For multi - instance rollback , remember how far we got
2016-02-26 21:43:28 +03:00
*
2016-08-12 20:49:39 +03:00
* Called from cpu hotplug and from the state register machinery .
2021-08-10 01:38:25 +03:00
*
* Return : % 0 on success or a negative errno code
2016-02-26 21:43:28 +03:00
*/
2016-08-12 20:49:38 +03:00
static int cpuhp_invoke_callback ( unsigned int cpu , enum cpuhp_state state ,
2017-09-20 20:00:16 +03:00
bool bringup , struct hlist_node * node ,
struct hlist_node * * lastp )
2016-02-26 21:43:28 +03:00
{
struct cpuhp_cpu_state * st = per_cpu_ptr ( & cpuhp_state , cpu ) ;
2016-08-12 20:49:38 +03:00
struct cpuhp_step * step = cpuhp_get_step ( state ) ;
2016-08-12 20:49:39 +03:00
int ( * cbm ) ( unsigned int cpu , struct hlist_node * node ) ;
int ( * cb ) ( unsigned int cpu ) ;
int ret , cnt ;
2017-09-20 20:00:21 +03:00
if ( st - > fail = = state ) {
st - > fail = CPUHP_INVALID ;
return - EAGAIN ;
}
2021-02-16 13:35:06 +03:00
if ( cpuhp_step_empty ( bringup , step ) ) {
WARN_ON_ONCE ( 1 ) ;
return 0 ;
}
2016-08-12 20:49:39 +03:00
if ( ! step - > multi_instance ) {
2017-09-20 20:00:16 +03:00
WARN_ON_ONCE ( lastp & & * lastp ) ;
2016-09-05 16:28:36 +03:00
cb = bringup ? step - > startup . single : step - > teardown . single ;
2021-02-16 13:35:06 +03:00
2016-08-12 20:49:38 +03:00
trace_cpuhp_enter ( cpu , st - > target , state , cb ) ;
2016-02-26 21:43:28 +03:00
ret = cb ( cpu ) ;
2016-08-12 20:49:38 +03:00
trace_cpuhp_exit ( cpu , st - > state , state , ret ) ;
2016-08-12 20:49:39 +03:00
return ret ;
}
2016-09-05 16:28:36 +03:00
cbm = bringup ? step - > startup . multi : step - > teardown . multi ;
2016-08-12 20:49:39 +03:00
/* Single invocation for instance add/remove */
if ( node ) {
2017-09-20 20:00:16 +03:00
WARN_ON_ONCE ( lastp & & * lastp ) ;
2016-08-12 20:49:39 +03:00
trace_cpuhp_multi_enter ( cpu , st - > target , state , cbm , node ) ;
ret = cbm ( cpu , node ) ;
trace_cpuhp_exit ( cpu , st - > state , state , ret ) ;
return ret ;
}
/* State transition. Invoke on all instances */
cnt = 0 ;
hlist_for_each ( node , & step - > list ) {
2017-09-20 20:00:16 +03:00
if ( lastp & & node = = * lastp )
break ;
2016-08-12 20:49:39 +03:00
trace_cpuhp_multi_enter ( cpu , st - > target , state , cbm , node ) ;
ret = cbm ( cpu , node ) ;
trace_cpuhp_exit ( cpu , st - > state , state , ret ) ;
2017-09-20 20:00:16 +03:00
if ( ret ) {
if ( ! lastp )
goto err ;
* lastp = node ;
return ret ;
}
2016-08-12 20:49:39 +03:00
cnt + + ;
}
2017-09-20 20:00:16 +03:00
if ( lastp )
* lastp = NULL ;
2016-08-12 20:49:39 +03:00
return 0 ;
err :
/* Rollback the instances if one failed */
2016-09-05 16:28:36 +03:00
cbm = ! bringup ? step - > startup . multi : step - > teardown . multi ;
2016-08-12 20:49:39 +03:00
if ( ! cbm )
return ret ;
hlist_for_each ( node , & step - > list ) {
if ( ! cnt - - )
break ;
2017-09-20 20:00:18 +03:00
trace_cpuhp_multi_enter ( cpu , st - > target , state , cbm , node ) ;
ret = cbm ( cpu , node ) ;
trace_cpuhp_exit ( cpu , st - > state , state , ret ) ;
/*
* Rollback must not fail ,
*/
WARN_ON_ONCE ( ret ) ;
2016-02-26 21:43:28 +03:00
}
return ret ;
}
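The multi-instance path of cpuhp_invoke_callback() is easier to follow with the tracing and the `lastp` bookkeeping stripped away. The user-space sketch below keeps only the case where no `lastp` is supplied: call the forward callback on each instance in order, count the successes, and on failure call the opposite callback on the instances that already succeeded. All names are hypothetical and an array stands in for the hlist. One deliberate difference from the listing: the sketch returns the original error code, whereas the listing reuses `ret` inside the rollback loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: one call per (cpu, instance) pair. */
typedef int (*instance_cb)(unsigned int cpu, int *instance);

/*
 * Run @forward on every instance. If one fails, run @undo on the instances
 * that had already succeeded and return the error of the failed call.
 */
static int invoke_all_or_rollback(unsigned int cpu, int *instances, size_t n,
				  instance_cb forward, instance_cb undo)
{
	size_t done = 0;
	int ret = 0;

	for (size_t i = 0; i < n; i++) {
		ret = forward(cpu, &instances[i]);
		if (ret)
			break;
		done++;
	}
	if (!ret)
		return 0;

	/* Undo only the first @done instances; the failed one is skipped. */
	if (undo) {
		for (size_t i = 0; i < done; i++) {
			int err = undo(cpu, &instances[i]);

			assert(err == 0);	/* rollback must not fail */
			(void)err;
		}
	}
	return ret;
}

/* Example callbacks: an instance holding a negative value refuses to start. */
static int example_start(unsigned int cpu, int *instance)
{
	(void)cpu;
	if (*instance < 0)
		return -1;
	*instance = 1;	/* mark as started */
	return 0;
}

static int example_stop(unsigned int cpu, int *instance)
{
	(void)cpu;
	*instance = 0;	/* mark as stopped */
	return 0;
}
```

With instances `{0, 0, -1, 0}`, the first two are started, the third fails, the first two are stopped again, and the fourth is never touched.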
2008-12-13 13:49:41 +03:00
# ifdef CONFIG_SMP
2018-03-15 18:38:04 +03:00
static bool cpuhp_is_ap_state(enum cpuhp_state state)
{
	/*
	 * The extra check for CPUHP_TEARDOWN_CPU is only for documentation
	 * purposes as that state is handled explicitly in cpu_down.
	 */
	return state > CPUHP_BRINGUP_CPU && state != CPUHP_TEARDOWN_CPU;
}
2017-09-20 20:00:19 +03:00
static inline void wait_for_ap_thread(struct cpuhp_cpu_state *st, bool bringup)
{
	struct completion *done = bringup ? &st->done_up : &st->done_down;
	wait_for_completion(done);
}

static inline void complete_ap_thread(struct cpuhp_cpu_state *st, bool bringup)
{
	struct completion *done = bringup ? &st->done_up : &st->done_down;
	complete(done);
}

/*
 * The former STARTING/DYING states ran with IRQs disabled and must not fail.
 */
static bool cpuhp_is_atomic_state(enum cpuhp_state state)
{
	return CPUHP_AP_IDLE_DEAD <= state && state < CPUHP_AP_ONLINE;
}
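Both range checks above depend on the numeric ordering of enum cpuhp_state, which is defined outside this file. The sketch below uses a shortened, hypothetical enum whose ordering is an assumption chosen to mirror the names in the two predicates; it is not the kernel's full state list. It shows which model states each predicate accepts.

```c
#include <assert.h>
#include <stdbool.h>

/* Shortened, assumed ordering; the real enum has many more entries. */
enum model_state {
	MODEL_OFFLINE,
	MODEL_BRINGUP_CPU,
	MODEL_AP_IDLE_DEAD,
	MODEL_AP_OFFLINE,
	MODEL_AP_ONLINE,
	MODEL_TEARDOWN_CPU,
	MODEL_AP_ONLINE_IDLE,
	MODEL_ONLINE,
};

/* Same shape as cpuhp_is_ap_state(): everything after BRINGUP_CPU
 * except the explicitly handled TEARDOWN_CPU state. */
static bool model_is_ap_state(enum model_state s)
{
	return s > MODEL_BRINGUP_CPU && s != MODEL_TEARDOWN_CPU;
}

/* Same shape as cpuhp_is_atomic_state(): the half-open range
 * [AP_IDLE_DEAD, AP_ONLINE). */
static bool model_is_atomic_state(enum model_state s)
{
	return MODEL_AP_IDLE_DEAD <= s && s < MODEL_AP_ONLINE;
}
```

Under this assumed ordering, `MODEL_AP_OFFLINE` is both an AP state and an atomic state, while `MODEL_AP_ONLINE` is an AP state but falls just outside the atomic range.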
2008-12-30 01:35:14 +03:00
/* Serializes the updates to cpu_online_mask, cpu_present_mask */
2006-07-23 23:12:16 +04:00
static DEFINE_MUTEX ( cpu_add_remove_lock ) ;
2016-02-26 21:43:23 +03:00
bool cpuhp_tasks_frozen ;
EXPORT_SYMBOL_GPL ( cpuhp_tasks_frozen ) ;
2005-04-17 02:20:36 +04:00
2010-05-27 01:43:36 +04:00
/*
CPU hotplug: Provide lockless versions of callback registration functions
The following method of CPU hotplug callback registration is not safe
due to the possibility of an ABBA deadlock involving the cpu_add_remove_lock
and the cpu_hotplug.lock.
	get_online_cpus();
	for_each_online_cpu(cpu)
		init_cpu(cpu);
	register_cpu_notifier(&foobar_cpu_notifier);
	put_online_cpus();
The deadlock is shown below:
	CPU 0                              CPU 1
	-----                              -----

	Acquire cpu_hotplug.lock
	[via get_online_cpus()]

	                                   CPU online/offline operation
	                                   takes cpu_add_remove_lock
	                                   [via cpu_maps_update_begin()]

	Try to acquire
	cpu_add_remove_lock
	[via register_cpu_notifier()]

	                                   CPU online/offline operation
	                                   tries to acquire cpu_hotplug.lock
	                                   [via cpu_hotplug_begin()]

	                   *** DEADLOCK! ***
The problem here is that callback registration takes the locks in one order
whereas the CPU hotplug operations take the same locks in the opposite order.
To avoid this issue and to provide a race-free method to register CPU hotplug
callbacks (along with initialization of already online CPUs), introduce new
variants of the callback registration APIs that simply register the callbacks
without holding the cpu_add_remove_lock during the registration. That way,
we can avoid the ABBA scenario. However, we will need to hold the
cpu_add_remove_lock throughout the entire critical section, to protect updates
to the callback/notifier chain.
This can be achieved by writing the callback registration code as follows:
	cpu_maps_update_begin(); [ or cpu_notifier_register_begin(); see below ]
	for_each_online_cpu(cpu)
		init_cpu(cpu);
	/* This doesn't take the cpu_add_remove_lock */
	__register_cpu_notifier(&foobar_cpu_notifier);
	cpu_maps_update_done(); [ or cpu_notifier_register_done(); see below ]
Note that we can't use get_online_cpus() here instead of cpu_maps_update_begin()
because the cpu_hotplug.lock is dropped during the invocation of CPU_POST_DEAD
notifiers, and hence get_online_cpus() cannot provide the necessary
synchronization to protect the callback/notifier chains against concurrent
reads and writes. On the other hand, since the cpu_add_remove_lock protects
the entire hotplug operation (including CPU_POST_DEAD), we can use
cpu_maps_update_begin/done() to guarantee proper synchronization.
Also, since cpu_maps_update_begin/done() is like a super-set of
get/put_online_cpus(), the former naturally protects the critical sections
from concurrent hotplug operations.
Since the names cpu_maps_update_begin/done() don't make much sense in CPU
hotplug callback registration scenarios, we'll introduce new APIs named
cpu_notifier_register_begin/done() and map them to cpu_maps_update_begin/done().
In summary, introduce the lockless variants of un/register_cpu_notifier() and
also export the cpu_notifier_register_begin/done() APIs for use by modules.
This way, we provide a race-free way to register hotplug callbacks as well as
perform initialization for the CPUs that are already online.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Toshi Kani <toshi.kani@hp.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-03-11 00:34:14 +04:00
 * The following two APIs (cpu_maps_update_begin/done) must be used when
 * attempting to serialize the updates to cpu_online_mask & cpu_present_mask.
2010-05-27 01:43:36 +04:00
*/
void cpu_maps_update_begin(void)
{
	mutex_lock(&cpu_add_remove_lock);
}

void cpu_maps_update_done(void)
{
	mutex_unlock(&cpu_add_remove_lock);
}
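The ABBA problem that motivated these two helpers comes down to two code paths taking the same pair of locks in opposite orders. As a toy illustration only, the single-threaded sketch below records which lock was taken while another was held and reports when a later acquisition inverts a previously seen order. The tracker type, its functions and the two enum names are hypothetical; this is not how lockdep is implemented.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock identifiers standing in for the two locks discussed. */
enum { HOTPLUG_LOCK, ADD_REMOVE_LOCK, NR_LOCKS };

struct order_tracker {
	bool held[NR_LOCKS];
	/* before[x][y] is true once y has been taken while x was held. */
	bool before[NR_LOCKS][NR_LOCKS];
};

/*
 * Record that @lock is being acquired. Returns true if some currently held
 * lock was previously seen being taken *after* @lock, i.e. the two orders
 * together could deadlock.
 */
static bool tracker_acquire(struct order_tracker *t, int lock)
{
	bool inversion = false;

	for (int h = 0; h < NR_LOCKS; h++) {
		if (!t->held[h] || h == lock)
			continue;
		if (t->before[lock][h])
			inversion = true;
		t->before[h][lock] = true;
	}
	t->held[lock] = true;
	return inversion;
}

static void tracker_release(struct order_tracker *t, int lock)
{
	t->held[lock] = false;
}
```

Replaying the registration path (hotplug lock, then add/remove lock) followed by the hotplug path (add/remove lock, then hotplug lock) makes the final `tracker_acquire()` return true. Registering under the add/remove lock alone, as the commit message proposes, never creates the second ordering.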
2005-04-17 02:20:36 +04:00
2017-05-24 11:15:40 +03:00
/*
* If set , cpu_up and cpu_down will return - EBUSY and do nothing .
2006-09-26 10:32:48 +04:00
* Should always be manipulated under cpu_add_remove_lock
*/
static int cpu_hotplug_disabled ;
2010-05-27 01:43:36 +04:00
# ifdef CONFIG_HOTPLUG_CPU
2017-05-24 11:15:40 +03:00
DEFINE_STATIC_PERCPU_RWSEM ( cpu_hotplug_lock ) ;
2014-03-11 00:34:03 +04:00
2017-05-24 11:15:12 +03:00
void cpus_read_lock ( void )
2005-11-29 00:43:46 +03:00
{
2017-05-24 11:15:40 +03:00
percpu_down_read ( & cpu_hotplug_lock ) ;
2005-11-29 00:43:46 +03:00
}
2017-05-24 11:15:12 +03:00
EXPORT_SYMBOL_GPL ( cpus_read_lock ) ;
2005-11-09 08:34:24 +03:00
2018-07-24 21:26:04 +03:00
int cpus_read_trylock ( void )
{
return percpu_down_read_trylock ( & cpu_hotplug_lock ) ;
}
EXPORT_SYMBOL_GPL ( cpus_read_trylock ) ;
2017-05-24 11:15:12 +03:00
void cpus_read_unlock ( void )
2005-11-29 00:43:46 +03:00
{
2017-05-24 11:15:40 +03:00
percpu_up_read ( & cpu_hotplug_lock ) ;
2005-11-29 00:43:46 +03:00
}
2017-05-24 11:15:12 +03:00
EXPORT_SYMBOL_GPL ( cpus_read_unlock ) ;
2005-11-29 00:43:46 +03:00
2017-05-24 11:15:12 +03:00
void cpus_write_lock ( void )
2008-01-25 23:08:01 +03:00
{
2017-05-24 11:15:40 +03:00
percpu_down_write ( & cpu_hotplug_lock ) ;
2008-01-25 23:08:01 +03:00
}
2014-12-12 12:11:44 +03:00
2017-05-24 11:15:12 +03:00
void cpus_write_unlock ( void )
2008-01-25 23:08:01 +03:00
{
2017-05-24 11:15:40 +03:00
percpu_up_write ( & cpu_hotplug_lock ) ;
2008-01-25 23:08:01 +03:00
}
2017-05-24 11:15:40 +03:00
void lockdep_assert_cpus_held ( void )
2008-01-25 23:08:01 +03:00
{
2018-12-19 21:23:15 +03:00
	/*
	 * We can't have hotplug operations before userspace starts running,
	 * and some init codepaths will knowingly not take the hotplug lock.
	 * This is all valid, so mute lockdep until it makes sense to report
	 * unheld locks.
	 */
	if (system_state < SYSTEM_RUNNING)
		return;
2017-05-24 11:15:40 +03:00
percpu_rwsem_assert_held ( & cpu_hotplug_lock ) ;
2008-01-25 23:08:01 +03:00
}
2010-05-27 01:43:36 +04:00
2020-11-12 01:53:13 +03:00
#ifdef CONFIG_LOCKDEP
int lockdep_is_cpus_held(void)
{
	return percpu_rwsem_is_held(&cpu_hotplug_lock);
}
#endif
2018-09-11 12:51:27 +03:00
static void lockdep_acquire_cpus_lock ( void )
{
2019-10-30 22:01:26 +03:00
rwsem_acquire ( & cpu_hotplug_lock . dep_map , 0 , 0 , _THIS_IP_ ) ;
2018-09-11 12:51:27 +03:00
}
static void lockdep_release_cpus_lock ( void )
{
2019-10-30 22:01:26 +03:00
rwsem_release ( & cpu_hotplug_lock . dep_map , _THIS_IP_ ) ;
2018-09-11 12:51:27 +03:00
}
2013-06-13 01:04:36 +04:00
/*
 * Wait for currently running CPU hotplug operations to complete (if any) and
 * disable future CPU hotplug (from sysfs). The 'cpu_add_remove_lock' protects
 * the 'cpu_hotplug_disabled' flag. The same lock is also acquired by the
 * hotplug path before performing hotplug operations. So acquiring that lock
 * guarantees mutual exclusion from any currently running hotplug operations.
 */
void cpu_hotplug_disable(void)
{
	cpu_maps_update_begin();
2015-08-05 10:52:46 +03:00
cpu_hotplug_disabled + + ;
2013-06-13 01:04:36 +04:00
cpu_maps_update_done ( ) ;
}
2015-08-05 10:52:47 +03:00
EXPORT_SYMBOL_GPL ( cpu_hotplug_disable ) ;
2013-06-13 01:04:36 +04:00
2016-06-10 09:43:28 +03:00
static void __cpu_hotplug_enable(void)
{
	if (WARN_ONCE(!cpu_hotplug_disabled, "Unbalanced cpu hotplug enable\n"))
		return;
	cpu_hotplug_disabled--;
}
2013-06-13 01:04:36 +04:00
void cpu_hotplug_enable ( void )
{
cpu_maps_update_begin ( ) ;
2016-06-10 09:43:28 +03:00
__cpu_hotplug_enable ( ) ;
2013-06-13 01:04:36 +04:00
cpu_maps_update_done ( ) ;
}
2015-08-05 10:52:47 +03:00
EXPORT_SYMBOL_GPL ( cpu_hotplug_enable ) ;
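The disable/enable pair above is a nesting counter rather than a boolean: hotplug stays blocked until every disable has been matched by an enable, and an extra enable is reported instead of driving the count negative. A minimal user-space sketch of that behaviour follows. The `model_` names are hypothetical, the mutex that guards the real counter is omitted because the model is single-threaded, and `model_hotplug_request()` only stands in for the -EBUSY check that the comment on cpu_hotplug_disabled describes.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

static int hotplug_disabled;	/* nesting depth of outstanding disables */

static void model_hotplug_disable(void)
{
	hotplug_disabled++;
}

/* Returns 0 on success, -1 if there was no matching disable. */
static int model_hotplug_enable(void)
{
	if (hotplug_disabled == 0) {
		fprintf(stderr, "Unbalanced hotplug enable\n");
		return -1;
	}
	hotplug_disabled--;
	return 0;
}

/* Stand-in for the check an online/offline request would perform. */
static int model_hotplug_request(void)
{
	return hotplug_disabled ? -EBUSY : 0;
}
```

After two disables, a single enable still leaves requests failing with -EBUSY. Only the second enable lets them through again, and a third enable is rejected as unbalanced.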
2018-09-11 12:51:27 +03:00
# else
static void lockdep_acquire_cpus_lock ( void )
{
}
static void lockdep_release_cpus_lock ( void )
{
}
ACPI / processor: Acquire writer lock to update CPU maps
CPU system maps are protected with reader/writer locks. The reader
lock, get_online_cpus(), assures that the maps are not updated while
holding the lock. The writer lock, cpu_hotplug_begin(), is used to
update the cpu maps along with cpu_maps_update_begin().
However, the ACPI processor handler updates the cpu maps without
holding the writer lock.
acpi_map_lsapic() is called from acpi_processor_hotadd_init() to
update cpu_possible_mask and cpu_present_mask. acpi_unmap_lsapic()
is called from acpi_processor_remove() to update cpu_possible_mask.
Currently, they are either unprotected or protected with the reader
lock, which is not correct.
For example, the get_online_cpus() below is supposed to assure that
cpu_possible_mask is not changed while the code is iterating with
for_each_possible_cpu().
	get_online_cpus();
	for_each_possible_cpu(cpu) {
		:
	}
	put_online_cpus();
However, this lock has no protection with CPU hotplug since the ACPI
processor handler does not use the writer lock when it updates
cpu_possible_mask. The reader lock does not serialize within the
readers.
This patch protects them with the writer lock with cpu_hotplug_begin()
along with cpu_maps_update_begin(), which must be held before calling
cpu_hotplug_begin(). It also protects arch_register_cpu() /
arch_unregister_cpu(), which creates / deletes a sysfs cpu device
interface. For this purpose it changes cpu_hotplug_begin() and
cpu_hotplug_done() to global and exports them in cpu.h.
Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-12 19:45:53 +04:00
# endif /* CONFIG_HOTPLUG_CPU */
2010-05-27 01:43:36 +04:00
2018-11-25 21:33:39 +03:00
/*
 * Architectures that need SMT-specific errata handling during SMT hotplug
 * should override this.
 */
void __weak arch_smt_update(void) { }
2018-06-29 17:05:48 +03:00
# ifdef CONFIG_HOTPLUG_SMT
enum cpuhp_smt_control cpu_smt_control __read_mostly = CPU_SMT_ENABLED ;
2018-08-07 09:19:57 +03:00
2018-07-13 17:23:23 +03:00
void __init cpu_smt_disable ( bool force )
2018-06-29 17:05:48 +03:00
{
2019-09-16 19:22:56 +03:00
if ( ! cpu_smt_possible ( ) )
2018-07-13 17:23:23 +03:00
return ;
if ( force ) {
2018-06-29 17:05:48 +03:00
pr_info ( " SMT: Force disabled \n " ) ;
cpu_smt_control = CPU_SMT_FORCE_DISABLED ;
2018-07-13 17:23:23 +03:00
} else {
2018-10-04 20:22:27 +03:00
pr_info ( " SMT: disabled \n " ) ;
2018-07-13 17:23:23 +03:00
cpu_smt_control = CPU_SMT_DISABLED ;
2018-06-29 17:05:48 +03:00
}
2018-07-13 17:23:23 +03:00
}
2018-07-13 17:23:24 +03:00
/*
* The decision whether SMT is supported can only be done after the full
cpu/hotplug: Fix "SMT disabled by BIOS" detection for KVM
With the following commit:
73d5e2b47264 ("cpu/hotplug: detect SMT disabled by BIOS")
... the hotplug code attempted to detect when SMT was disabled by BIOS,
in which case it reported SMT as permanently disabled. However, that
code broke a virt hotplug scenario, where the guest is booted with only
primary CPU threads, and a sibling is brought online later.
The problem is that there doesn't seem to be a way to reliably
distinguish between the HW "SMT disabled by BIOS" case and the virt
"sibling not yet brought online" case. So the above-mentioned commit
was a bit misguided, as it permanently disabled SMT for both cases,
preventing future virt sibling hotplugs.
Going back and reviewing the original problems which were attempted to
be solved by that commit, when SMT was disabled in BIOS:
1) /sys/devices/system/cpu/smt/control showed "on" instead of
"notsupported"; and
2) vmx_vm_init() was incorrectly showing the L1TF_MSG_SMT warning.
I'd propose that we instead consider #1 above to not actually be a
problem. Because, at least in the virt case, it's possible that SMT
wasn't disabled by BIOS and a sibling thread could be brought online
later. So it makes sense to just always default the smt control to "on"
to allow for that possibility (assuming cpuid indicates that the CPU
supports SMT).
The real problem is #2, which has a simple fix: change vmx_vm_init() to
query the actual current SMT state -- i.e., whether any siblings are
currently online -- instead of looking at the SMT "control" sysfs value.
So fix it by:
a) reverting the original "fix" and its followup fix:
73d5e2b47264 ("cpu/hotplug: detect SMT disabled by BIOS")
bc2d8d262cba ("cpu/hotplug: Fix SMT supported evaluation")
and
b) changing vmx_vm_init() to query the actual current SMT state --
instead of the sysfs control value -- to determine whether the L1TF
warning is needed. This also requires the 'sched_smt_present'
variable to be exported, instead of 'cpu_smt_control'.
Fixes: 73d5e2b47264 ("cpu/hotplug: detect SMT disabled by BIOS")
Reported-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Joe Mario <jmario@redhat.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kvm@vger.kernel.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/e3a85d585da28cc333ecbc1e78ee9216e6da9396.1548794349.git.jpoimboe@redhat.com
2019-01-30 16:13:58 +03:00
* CPU identification . Called from architecture code .
2018-08-07 09:19:57 +03:00
*/
void __init cpu_smt_check_topology ( void )
{
2019-01-30 16:13:58 +03:00
if ( ! topology_smt_supported ( ) )
2018-08-07 09:19:57 +03:00
cpu_smt_control = CPU_SMT_NOT_SUPPORTED ;
}
2018-07-13 17:23:23 +03:00
static int __init smt_cmdline_disable ( char * str )
{
cpu_smt_disable ( str & & ! strcmp ( str , " force " ) ) ;
2018-06-29 17:05:48 +03:00
return 0 ;
}
early_param ( " nosmt " , smt_cmdline_disable ) ;
static inline bool cpu_smt_allowed ( unsigned int cpu )
{
2019-01-30 16:13:58 +03:00
if ( cpu_smt_control = = CPU_SMT_ENABLED )
2018-06-29 17:05:48 +03:00
return true ;
2019-01-30 16:13:58 +03:00
if ( topology_is_primary_thread ( cpu ) )
2018-06-29 17:05:48 +03:00
return true ;
/*
* On x86 it ' s required to boot all logical CPUs at least once so
* that the init code can get a chance to set CR4 . MCE on each
2020-04-17 19:40:04 +03:00
* CPU . Otherwise , a broadcasted MCE observing CR4 . MCE = 0 b on any
2018-06-29 17:05:48 +03:00
* core will shutdown the machine .
*/
2019-07-22 21:47:16 +03:00
return ! cpumask_test_cpu ( cpu , & cpus_booted_once_mask ) ;
2018-06-29 17:05:48 +03:00
}
2019-09-16 19:22:56 +03:00
/* Returns true unless SMT is not supported or forcefully (irreversibly) disabled */
bool cpu_smt_possible ( void )
{
return cpu_smt_control ! = CPU_SMT_FORCE_DISABLED & &
cpu_smt_control ! = CPU_SMT_NOT_SUPPORTED ;
}
EXPORT_SYMBOL_GPL ( cpu_smt_possible ) ;
2018-06-29 17:05:48 +03:00
# else
static inline bool cpu_smt_allowed ( unsigned int cpu ) { return true ; }
# endif
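The order of checks in cpu_smt_allowed() and cpu_smt_possible() can be modeled in userspace. The block below is only a sketch, not kernel code: `enum smt_control_model`, `struct cpu_model`, `smt_allowed_model()` and `smt_possible_model()` are hypothetical names, and the two boolean fields stand in for topology_is_primary_thread() and cpus_booted_once_mask.

```c
#include <stdbool.h>

/* Hypothetical mirror of enum cpuhp_smt_control; values are illustrative. */
enum smt_control_model {
	SMT_ENABLED,
	SMT_DISABLED,
	SMT_FORCE_DISABLED,
	SMT_NOT_SUPPORTED,
};

/* Hypothetical per-CPU facts the real code reads from topology and masks. */
struct cpu_model {
	bool primary_thread;	/* stands in for topology_is_primary_thread() */
	bool booted_once;	/* stands in for cpus_booted_once_mask */
};

/* Same order of checks as cpu_smt_allowed(). */
bool smt_allowed_model(enum smt_control_model control,
		       const struct cpu_model *cpu)
{
	if (control == SMT_ENABLED)
		return true;
	if (cpu->primary_thread)
		return true;
	/* A sibling may come up exactly once so that CR4.MCE can be set. */
	return !cpu->booted_once;
}

/* Same test as cpu_smt_possible(). */
bool smt_possible_model(enum smt_control_model control)
{
	return control != SMT_FORCE_DISABLED && control != SMT_NOT_SUPPORTED;
}
```

With SMT disabled, a sibling that has never booted is still allowed up once, while a sibling that has already booted is refused.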
2017-09-20 20:00:17 +03:00
static inline enum cpuhp_state
2022-04-11 18:22:32 +03:00
cpuhp_set_state ( int cpu , struct cpuhp_cpu_state * st , enum cpuhp_state target )
2017-09-20 20:00:17 +03:00
{
enum cpuhp_state prev_state = st - > state ;
2021-04-20 21:04:19 +03:00
bool bringup = st - > state < target ;
2017-09-20 20:00:17 +03:00
st - > rollback = false ;
st - > last = NULL ;
st - > target = target ;
st - > single = false ;
2021-04-20 21:04:19 +03:00
st - > bringup = bringup ;
2022-04-11 18:22:32 +03:00
if ( cpu_dying ( cpu ) ! = ! bringup )
set_cpu_dying ( cpu , ! bringup ) ;
2017-09-20 20:00:17 +03:00
return prev_state ;
}
static inline void
2022-04-11 18:22:32 +03:00
cpuhp_reset_state ( int cpu , struct cpuhp_cpu_state * st ,
enum cpuhp_state prev_state )
2017-09-20 20:00:17 +03:00
{
2021-04-20 21:04:19 +03:00
bool bringup = ! st - > bringup ;
2021-02-16 13:35:06 +03:00
st - > target = prev_state ;
/*
* Already rolling back . No need to invert the bringup value or to change
* the current state .
*/
if ( st - > rollback )
return ;
2017-09-20 20:00:17 +03:00
st - > rollback = true ;
/*
* If we have st - > last we need to undo partial multi_instance of this
* state first . Otherwise start undo at the previous state .
*/
if ( ! st - > last ) {
if ( st - > bringup )
st - > state - - ;
else
st - > state + + ;
}
2021-04-20 21:04:19 +03:00
st - > bringup = bringup ;
2022-04-11 18:22:32 +03:00
if ( cpu_dying ( cpu ) ! = ! bringup )
set_cpu_dying ( cpu , ! bringup ) ;
2017-09-20 20:00:17 +03:00
}
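The interplay of cpuhp_set_state() and cpuhp_reset_state() is easier to follow with the locking and cpu_dying handling stripped away. The following is a minimal userspace sketch under that simplification. `struct hp_state_model`, `hp_set_state()` and `hp_reset_state()` are hypothetical names, states are plain integers, and `has_last` stands in for a non-NULL st->last.

```c
#include <stdbool.h>

/* Hypothetical, trimmed-down stand-in for struct cpuhp_cpu_state. */
struct hp_state_model {
	int state;
	int target;
	bool bringup;
	bool rollback;
	bool has_last;	/* stands in for st->last != NULL */
};

/* Mirrors cpuhp_set_state(): record the new target, return the old state. */
int hp_set_state(struct hp_state_model *st, int target)
{
	int prev = st->state;

	st->rollback = false;
	st->has_last = false;
	st->target = target;
	st->bringup = st->state < target;
	return prev;
}

/* Mirrors cpuhp_reset_state(): aim back at prev and flip the direction. */
void hp_reset_state(struct hp_state_model *st, int prev)
{
	bool bringup = !st->bringup;

	st->target = prev;
	if (st->rollback)
		return;		/* already rolling back, nothing to flip */
	st->rollback = true;
	if (!st->has_last) {
		/* Step past the failed state; it never completed. */
		if (st->bringup)
			st->state--;
		else
			st->state++;
	}
	st->bringup = bringup;
}
```

A second call to hp_reset_state() during a rollback only updates the target, which matches the early return in the code above.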
/* Regular hotplug invocation of the AP hotplug thread */
static void __cpuhp_kick_ap ( struct cpuhp_cpu_state * st )
{
if ( ! st - > single & & st - > state = = st - > target )
return ;
st - > result = 0 ;
/*
* Make sure the above stores are visible before should_run becomes
* true . Paired with the mb ( ) above in cpuhp_thread_fun ( )
*/
smp_mb ( ) ;
st - > should_run = true ;
wake_up_process ( st - > thread ) ;
2017-09-20 20:00:19 +03:00
wait_for_ap_thread ( st , st - > bringup ) ;
2017-09-20 20:00:17 +03:00
}
2022-04-11 18:22:32 +03:00
static int cpuhp_kick_ap ( int cpu , struct cpuhp_cpu_state * st ,
enum cpuhp_state target )
2017-09-20 20:00:17 +03:00
{
enum cpuhp_state prev_state ;
int ret ;
2022-04-11 18:22:32 +03:00
prev_state = cpuhp_set_state ( cpu , st , target ) ;
2017-09-20 20:00:17 +03:00
__cpuhp_kick_ap ( st ) ;
if ( ( ret = st - > result ) ) {
2022-04-11 18:22:32 +03:00
cpuhp_reset_state ( cpu , st , prev_state ) ;
2017-09-20 20:00:17 +03:00
__cpuhp_kick_ap ( st ) ;
}
return ret ;
}
2017-07-04 23:20:23 +03:00
2016-02-26 21:43:41 +03:00
static int bringup_wait_for_ap ( unsigned int cpu )
{
struct cpuhp_cpu_state * st = per_cpu_ptr ( & cpuhp_state , cpu ) ;
2017-07-04 23:20:23 +03:00
/* Wait for the CPU to reach CPUHP_AP_ONLINE_IDLE */
2017-09-20 20:00:19 +03:00
wait_for_ap_thread ( st , true ) ;
2017-07-11 23:06:24 +03:00
if ( WARN_ON_ONCE ( ( ! cpu_online ( cpu ) ) ) )
return - ECANCELED ;
2017-07-04 23:20:23 +03:00
2019-12-10 11:34:54 +03:00
/* Unpark the hotplug thread of the target cpu */
2017-07-04 23:20:23 +03:00
kthread_unpark ( st - > thread ) ;
2018-06-29 17:05:48 +03:00
/*
* SMT soft disabling on X86 requires to bring the CPU out of the
* BIOS ' wait for SIPI ' state in order to set the CR4 . MCE bit . The
2019-05-28 22:31:49 +03:00
* CPU marked itself as booted_once in notify_cpu_starting ( ) so the
2018-06-29 17:05:48 +03:00
* cpu_smt_allowed ( ) check will now return false if this is not the
* primary sibling .
*/
if ( ! cpu_smt_allowed ( cpu ) )
return - ECANCELED ;
2017-09-20 20:00:17 +03:00
if ( st - > target < = CPUHP_AP_ONLINE_IDLE )
return 0 ;
2022-04-11 18:22:32 +03:00
return cpuhp_kick_ap ( cpu , st , st - > target ) ;
2016-02-26 21:43:41 +03:00
}
2016-02-26 21:43:24 +03:00
static int bringup_cpu ( unsigned int cpu )
{
struct task_struct * idle = idle_thread_get ( cpu ) ;
int ret ;
sched/scs: Reset task stack state in bringup_cpu()
To hot unplug a CPU, the idle task on that CPU calls a few layers of C
code before finally leaving the kernel. When KASAN is in use, poisoned
shadow is left around for each of the active stack frames. When shadow
call stacks (SCS) are in use, the task's saved SCS SP is left pointing
at an arbitrary point within the task's shadow call stack.
When a CPU is offlined then onlined back into the kernel, this stale
state can adversely affect execution. Stale KASAN shadow can alias new
stackframes and result in bogus KASAN warnings. A stale SCS SP is
effectively a memory leak, and prevents a portion of the shadow call
stack being used. Across a number of hotplug cycles the idle task's
entire shadow call stack can become unusable.
We previously fixed the KASAN issue in commit:
e1b77c92981a5222 ("sched/kasan: remove stale KASAN poison after hotplug")
... by removing any stale KASAN stack poison immediately prior to
onlining a CPU.
Subsequently in commit:
f1a0a376ca0c4ef1 ("sched/core: Initialize the idle task with preemption disabled")
... the refactoring left the KASAN and SCS cleanup in one-time idle
thread initialization code rather than something invoked prior to each
CPU being onlined, breaking both as above.
We fixed SCS (but not KASAN) in commit:
63acd42c0d4942f7 ("sched/scs: Reset the shadow stack when idle_task_exit")
... but as this runs in the context of the idle task being offlined it's
potentially fragile.
To fix these consistently and more robustly, reset the SCS SP and KASAN
shadow of a CPU's idle task immediately before we online that CPU in
bringup_cpu(). This ensures the idle task always has a consistent state
when it is running, and removes the need to do so when exiting an idle
task.
Whenever any thread is created, dup_task_struct() will give the task a
stack which is free of KASAN shadow, and initialize the task's SCS SP,
so there's no need to specially initialize either for idle thread within
init_idle(), as this was only necessary to handle hotplug cycles.
I've tested this on arm64 with:
* gcc 11.1.0, defconfig +KASAN_INLINE, KASAN_STACK
* clang 12.0.0, defconfig +KASAN_INLINE, KASAN_STACK, SHADOW_CALL_STACK
... offlining and onlining CPUS with:
| while true; do
| for C in /sys/devices/system/cpu/cpu*/online; do
| echo 0 > $C;
| echo 1 > $C;
| done
| done
Fixes: f1a0a376ca0c4ef1 ("sched/core: Initialize the idle task with preemption disabled")
Reported-by: Qian Cai <quic_qiancai@quicinc.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Qian Cai <quic_qiancai@quicinc.com>
Link: https://lore.kernel.org/lkml/20211115113310.35693-1-mark.rutland@arm.com/
2021-11-23 14:40:47 +03:00
/*
* Reset stale stack state from the last time this CPU was online .
*/
scs_task_reset ( idle ) ;
kasan_unpoison_task_stack ( idle ) ;
2016-08-03 20:22:28 +03:00
/*
* Some architectures have to walk the irq descriptors to
* setup the vector space for the cpu which comes online .
* Prevent irq alloc / free across the bringup .
*/
irq_lock_sparse ( ) ;
2016-02-26 21:43:24 +03:00
/* Arch-specific enabling code. */
ret = __cpu_up ( cpu , idle ) ;
2016-08-03 20:22:28 +03:00
irq_unlock_sparse ( ) ;
cpu/hotplug: Remove obsolete cpu hotplug register/unregister functions
hotcpu_notifier(), cpu_notifier(), __hotcpu_notifier(), __cpu_notifier(),
register_hotcpu_notifier(), register_cpu_notifier(),
__register_hotcpu_notifier(), __register_cpu_notifier(),
unregister_hotcpu_notifier(), unregister_cpu_notifier(),
__unregister_hotcpu_notifier(), __unregister_cpu_notifier()
are unused now. Remove them and all related code.
Remove also the now pointless cpu notifier error injection mechanism. The
states can be executed step by step and error rollback is the same as cpu
down, so any state transition can be tested w/o requiring the notifier
error injection.
Some CPU hotplug states are kept as they are (ab)used for hotplug state
tracking.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20161221192112.005642358@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-21 22:19:53 +03:00
if ( ret )
2016-02-26 21:43:24 +03:00
return ret ;
2017-07-04 23:20:23 +03:00
return bringup_wait_for_ap ( cpu ) ;
2016-02-26 21:43:24 +03:00
}
2020-04-02 00:40:33 +03:00
static int finish_cpu ( unsigned int cpu )
{
struct task_struct * idle = idle_thread_get ( cpu ) ;
struct mm_struct * mm = idle - > active_mm ;
/*
* idle_task_exit ( ) will have switched to & init_mm , now
* clean up any remaining active_mm state .
*/
if ( mm ! = & init_mm )
idle - > active_mm = & init_mm ;
mmdrop ( mm ) ;
return 0 ;
}
2016-02-26 21:43:37 +03:00
/*
* Hotplug state machine related functions
*/
2021-02-16 13:35:06 +03:00
/*
* Get the next state to run . Empty ones will be skipped . Returns true if a
* state must be run .
*
* st - > state will be modified ahead of time , to match state_to_run , as if it
* has already run .
*/
static bool cpuhp_next_state ( bool bringup ,
enum cpuhp_state * state_to_run ,
struct cpuhp_cpu_state * st ,
enum cpuhp_state target )
2016-02-26 21:43:37 +03:00
{
2021-02-16 13:35:06 +03:00
do {
if ( bringup ) {
if ( st - > state > = target )
return false ;
* state_to_run = + + st - > state ;
} else {
if ( st - > state < = target )
return false ;
* state_to_run = st - > state - - ;
}
if ( ! cpuhp_step_empty ( bringup , cpuhp_get_step ( * state_to_run ) ) )
break ;
} while ( true ) ;
return true ;
}
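The skipping behaviour of cpuhp_next_state() can be illustrated with a small userspace model. This is a sketch under stated assumptions: `struct step_table_model` and `next_state_model()` are hypothetical, a boolean table replaces cpuhp_step_empty(), and the caller is assumed to keep the current state and the target within the table bounds.

```c
#include <stdbool.h>

#define NR_STATES 8

/* Hypothetical table: true means the state has a callback to run. */
struct step_table_model {
	bool has_callback[NR_STATES];
};

/*
 * Mirrors cpuhp_next_state(): move *cur toward target, skipping states
 * without a callback. Returns true and stores the state to run, or false
 * once the target has been reached. *cur is updated ahead of time, as if
 * the returned state had already run.
 */
bool next_state_model(bool bringup, int *state_to_run, int *cur, int target,
		      const struct step_table_model *tbl)
{
	for (;;) {
		if (bringup) {
			if (*cur >= target)
				return false;
			*state_to_run = ++(*cur);
		} else {
			if (*cur <= target)
				return false;
			*state_to_run = (*cur)--;
		}
		if (tbl->has_callback[*state_to_run])
			return true;
	}
}
```

Note the asymmetry: on bringup the state is incremented before it is run, on teardown the current state is run and then decremented.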
2022-09-27 13:12:59 +03:00
static int __cpuhp_invoke_callback_range ( bool bringup ,
unsigned int cpu ,
struct cpuhp_cpu_state * st ,
enum cpuhp_state target ,
bool nofail )
2021-02-16 13:35:06 +03:00
{
enum cpuhp_state state ;
2022-09-27 13:12:59 +03:00
int ret = 0 ;
2021-02-16 13:35:06 +03:00
while ( cpuhp_next_state ( bringup , & state , st , target ) ) {
2022-09-27 13:12:59 +03:00
int err ;
2021-02-16 13:35:06 +03:00
err = cpuhp_invoke_callback ( cpu , state , bringup , NULL , NULL ) ;
2022-09-27 13:12:59 +03:00
if ( ! err )
continue ;
if ( nofail ) {
pr_warn ( " CPU %u %s state %s (%d) failed (%d) \n " ,
cpu , bringup ? " UP " : " DOWN " ,
cpuhp_get_step ( st - > state ) - > name ,
st - > state , err ) ;
ret = - 1 ;
} else {
ret = err ;
2021-02-16 13:35:06 +03:00
break ;
2022-09-27 13:12:59 +03:00
}
2021-02-16 13:35:06 +03:00
}
2022-09-27 13:12:59 +03:00
return ret ;
}
static inline int cpuhp_invoke_callback_range ( bool bringup ,
unsigned int cpu ,
struct cpuhp_cpu_state * st ,
enum cpuhp_state target )
{
return __cpuhp_invoke_callback_range ( bringup , cpu , st , target , false ) ;
}
static inline void cpuhp_invoke_callback_range_nofail ( bool bringup ,
unsigned int cpu ,
struct cpuhp_cpu_state * st ,
enum cpuhp_state target )
{
__cpuhp_invoke_callback_range ( bringup , cpu , st , target , true ) ;
2016-02-26 21:43:37 +03:00
}
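The difference between the regular and the nofail variant can be shown with a userspace model of the bringup direction. This is a sketch, not the kernel implementation: `step_fn_model`, `invoke_range_model()` and `fail_at_three()` are hypothetical, empty states are not skipped, and the counter `ran` exists only to make the behaviour observable.

```c
#include <stdbool.h>

/* Hypothetical callback type: returns 0 on success, negative on failure. */
typedef int (*step_fn_model)(int state);

/*
 * Mirrors __cpuhp_invoke_callback_range() for the bringup direction:
 * run states first+1..last. With nofail set, keep going after an error
 * and report -1; otherwise stop at the first error and return it.
 * *ran counts the callbacks that were invoked.
 */
int invoke_range_model(step_fn_model fn, int first, int last,
		       bool nofail, int *ran)
{
	int ret = 0;

	*ran = 0;
	for (int state = first + 1; state <= last; state++) {
		int err = fn(state);

		(*ran)++;
		if (!err)
			continue;
		if (!nofail)
			return err;
		ret = -1;
	}
	return ret;
}

/* Example callback for illustration: state 3 always fails with -5. */
int fail_at_three(int state)
{
	return state == 3 ? -5 : 0;
}
```

In nofail mode the individual error code is lost and only -1 is returned, which is why the kernel code above logs the failing state with pr_warn().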
cpu/hotplug: Prevent crash when CPU bringup fails on CONFIG_HOTPLUG_CPU=n
Tianyu reported a crash in a CPU hotplug teardown callback when booting a
kernel which has CONFIG_HOTPLUG_CPU disabled with the 'nosmt' boot
parameter.
It turns out that the SMP=y CONFIG_HOTPLUG_CPU=n case has been broken
forever in case that a bringup callback fails. Unfortunately this issue was
not recognized when the CPU hotplug code was reworked, so the shortcoming
just stayed in place.
When a bringup callback fails, the CPU hotplug code rolls back the
operation and takes the CPU offline.
The 'nosmt' command line argument uses a bringup failure to abort the
bringup of SMT sibling CPUs. This partial bringup is required due to the
MCE misdesign on Intel CPUs.
With CONFIG_HOTPLUG_CPU=y the rollback works perfectly fine, but
CONFIG_HOTPLUG_CPU=n lacks essential mechanisms to exercise the low level
teardown of a CPU including the synchronizations in various facilities like
RCU, NOHZ and others.
As a consequence the teardown callbacks which must be executed on the
outgoing CPU within stop machine with interrupts disabled are executed on
the control CPU in interrupt enabled and preemptible context causing the
kernel to crash and burn. The pre state machine code has a different
failure mode which is more subtle and resulting in a less obvious use after
free crash because the control side frees resources which are still in use
by the undead CPU.
But this is not a x86 only problem. Any architecture which supports the
SMP=y HOTPLUG_CPU=n combination suffers from the same issue. It's just less
likely to be triggered because in 99.99999% of the cases all bringup
callbacks succeed.
The easy solution of making HOTPLUG_CPU mandatory for SMP is not working on
all architectures as the following architectures have either no hotplug
support at all or not all subarchitectures support it:
alpha, arc, hexagon, openrisc, riscv, sparc (32bit), mips (partial).
Crashing the kernel in such a situation is not an acceptable state
either.
Implement a minimal rollback variant by limiting the teardown to the point
where all regular teardown callbacks have been invoked and leave the CPU in
the 'dead' idle state. This has the following consequences:
- the CPU is brought down to the point where the stop_machine takedown
would happen.
- the CPU stays there forever and is idle
- The CPU is cleared in the CPU active mask, but not in the CPU online
mask which is a legit state.
- Interrupts are not forced away from the CPU
- All facilities which only look at online mask would still see it, but
that is the case during normal hotplug/unplug operations as well. It's
just a (way) longer time frame.
This will expose issues, which haven't been exposed before or only seldom,
because now the normally transient state of being non active but online is
a permanent state. In testing this exposed already an issue vs. work queues
where the vmstat code schedules work on the almost dead CPU which ends up
in an unbound workqueue and triggers 'preemptible context' warnings. This is
not a problem of this change, it merely exposes an already existing issue.
Still this is better than crashing fully without a chance to debug it.
This is mainly thought as workaround for those architectures which do not
support HOTPLUG_CPU. All others should enforce HOTPLUG_CPU for SMP.
Fixes: 2e1a3483ce74 ("cpu/hotplug: Split out the state walk into functions")
Reported-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Konrad Wilk <konrad.wilk@oracle.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Mukesh Ojha <mojha@codeaurora.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Rik van Riel <riel@surriel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Micheal Kelley <michael.h.kelley@microsoft.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190326163811.503390616@linutronix.de
2019-03-26 19:36:05 +03:00
static inline bool can_rollback_cpu ( struct cpuhp_cpu_state * st )
{
if ( IS_ENABLED ( CONFIG_HOTPLUG_CPU ) )
return true ;
/*
* When CPU hotplug is disabled , then taking the CPU down is not
* possible because takedown_cpu ( ) and the architecture and
* subsystem specific mechanisms are not available . So the CPU
* which would be completely unplugged again needs to stay around
* in the current state .
*/
return st - > state < = CPUHP_BRINGUP_CPU ;
}
2016-02-26 21:43:37 +03:00
static int cpuhp_up_callbacks ( unsigned int cpu , struct cpuhp_cpu_state * st ,
2016-08-12 20:49:38 +03:00
enum cpuhp_state target )
2016-02-26 21:43:37 +03:00
{
enum cpuhp_state prev_state = st - > state ;
int ret = 0 ;
2021-02-16 13:35:06 +03:00
ret = cpuhp_invoke_callback_range ( true , cpu , st , target ) ;
if ( ret ) {
2021-04-09 08:53:16 +03:00
pr_debug ( " CPU UP failed (%d) CPU %u state %s (%d) \n " ,
ret , cpu , cpuhp_get_step ( st - > state ) - > name ,
st - > state ) ;
2022-04-11 18:22:32 +03:00
cpuhp_reset_state ( cpu , st , prev_state ) ;
2021-02-16 13:35:06 +03:00
if ( can_rollback_cpu ( st ) )
WARN_ON ( cpuhp_invoke_callback_range ( false , cpu , st ,
prev_state ) ) ;
2016-02-26 21:43:37 +03:00
}
return ret ;
}
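The overall shape of cpuhp_up_callbacks(), walking up and undoing completed work on failure, can be sketched in userspace. The model below is deliberately simplified and makes several assumptions: `struct up_model` and `up_with_rollback_model()` are hypothetical, the state only advances after a step succeeds (the kernel advances first and corrects in cpuhp_reset_state()), the can_rollback_cpu() check is left out, and -1 is a placeholder error code.

```c
#include <stdbool.h>

#define MAX_STATE 16

/* Hypothetical bookkeeping: which states are currently set up. */
struct up_model {
	bool active[MAX_STATE + 1];
	int state;	/* highest state that completed */
	int fail_at;	/* state whose startup fails, 0 for none */
};

/*
 * Walk up to target (at most MAX_STATE). On failure, tear down every
 * state completed during this call in reverse order, ending back at the
 * starting state.
 */
int up_with_rollback_model(struct up_model *m, int target)
{
	int prev = m->state;

	while (m->state < target) {
		int next = m->state + 1;

		if (next == m->fail_at) {
			/* Undo completed states, newest first. */
			while (m->state > prev)
				m->active[m->state--] = false;
			return -1;
		}
		m->active[next] = true;
		m->state = next;
	}
	return 0;
}
```

After a failed call the model is back at the state it started from, so a later attempt can begin from a clean position.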
2016-02-26 21:43:38 +03:00
/*
* The cpu hotplug threads manage the bringup and teardown of the cpus
*/
static int cpuhp_should_run ( unsigned int cpu )
{
struct cpuhp_cpu_state * st = this_cpu_ptr ( & cpuhp_state ) ;
return st - > should_run ;
}
/*
* Execute teardown / startup callbacks on the plugged cpu . Also used to invoke
* callbacks when a state gets [ un ] installed at runtime .
2017-09-20 20:00:17 +03:00
*
* Each invocation of this function by the smpboot thread does a single AP
* state callback .
*
* It has 3 modes of operation :
* - single : runs st - > cb_state
* - up : runs + + st - > state , while st - > state < st - > target
* - down : runs st - > state - - , while st - > state > st - > target
*
* When complete or on error , should_run is cleared and the completion is fired .
2016-02-26 21:43:38 +03:00
*/
static void cpuhp_thread_fun ( unsigned int cpu )
{
struct cpuhp_cpu_state * st = this_cpu_ptr ( & cpuhp_state ) ;
2017-09-20 20:00:17 +03:00
bool bringup = st - > bringup ;
enum cpuhp_state state ;
2016-02-26 21:43:38 +03:00
2018-09-05 08:52:07 +03:00
if ( WARN_ON_ONCE ( ! st - > should_run ) )
return ;
2016-02-26 21:43:38 +03:00
/*
2017-09-20 20:00:17 +03:00
* ACQUIRE for the cpuhp_should_run ( ) load of - > should_run . Ensures
* that if we see - > should_run we also see the rest of the state .
2016-02-26 21:43:38 +03:00
*/
smp_mb ( ) ;
2018-09-11 12:51:27 +03:00
/*
* The BP holds the hotplug lock , but we ' re now running on the AP ,
* ensure that anybody asserting the lock is held , will actually find
* it so .
*/
lockdep_acquire_cpus_lock ( ) ;
2017-09-20 20:00:20 +03:00
cpuhp_lock_acquire ( bringup ) ;
2017-09-20 20:00:17 +03:00
2016-08-12 20:49:38 +03:00
if ( st - > single ) {
2017-09-20 20:00:17 +03:00
state = st - > cb_state ;
st - > should_run = false ;
} else {
2021-02-16 13:35:06 +03:00
st - > should_run = cpuhp_next_state ( bringup , & state , st , st - > target ) ;
if ( ! st - > should_run )
goto end ;
2017-09-20 20:00:17 +03:00
}
WARN_ON_ONCE ( ! cpuhp_is_ap_state ( state ) ) ;
if ( cpuhp_is_atomic_state ( state ) ) {
local_irq_disable ( ) ;
st - > result = cpuhp_invoke_callback ( cpu , state , bringup , st - > node , & st - > last ) ;
local_irq_enable ( ) ;
2016-04-08 15:40:15 +03:00
2017-09-20 20:00:17 +03:00
/*
* STARTING / DYING must not fail !
*/
WARN_ON_ONCE ( st - > result ) ;
2016-02-26 21:43:38 +03:00
} else {
2017-09-20 20:00:17 +03:00
st - > result = cpuhp_invoke_callback ( cpu , state , bringup , st - > node , & st - > last ) ;
}
if ( st - > result ) {
/*
* If we fail on a rollback , we ' re up a creek without no
* paddle , no way forward , no way back . We lose , thanks for
* playing .
*/
WARN_ON_ONCE ( st - > rollback ) ;
st - > should_run = false ;
2016-02-26 21:43:38 +03:00
}
2017-09-20 20:00:17 +03:00
2021-02-16 13:35:06 +03:00
end :
2017-09-20 20:00:20 +03:00
cpuhp_lock_release ( bringup ) ;
2018-09-11 12:51:27 +03:00
lockdep_release_cpus_lock ( ) ;
2017-09-20 20:00:17 +03:00
if ( ! st - > should_run )
2017-09-20 20:00:19 +03:00
complete_ap_thread ( st , bringup ) ;
2016-02-26 21:43:38 +03:00
}
/* Invoke a single callback on a remote cpu */
2016-08-12 20:49:38 +03:00
static int
2016-08-12 20:49:39 +03:00
cpuhp_invoke_ap_callback ( int cpu , enum cpuhp_state state , bool bringup ,
struct hlist_node * node )
2016-02-26 21:43:38 +03:00
{
struct cpuhp_cpu_state * st = per_cpu_ptr ( & cpuhp_state , cpu ) ;
2017-09-20 20:00:17 +03:00
int ret ;
2016-02-26 21:43:38 +03:00
if ( ! cpu_online ( cpu ) )
return 0 ;
2017-09-20 20:00:20 +03:00
cpuhp_lock_acquire ( false ) ;
cpuhp_lock_release ( false ) ;
cpuhp_lock_acquire ( true ) ;
cpuhp_lock_release ( true ) ;
2017-05-24 11:15:43 +03:00
2016-07-13 20:16:03 +03:00
/*
* If we are up and running , use the hotplug thread . For early calls
* we invoke the thread function directly .
*/
if ( ! st - > thread )
2017-09-20 20:00:16 +03:00
return cpuhp_invoke_callback ( cpu , state , bringup , node , NULL ) ;
2016-07-13 20:16:03 +03:00
2017-09-20 20:00:17 +03:00
st - > rollback = false ;
st - > last = NULL ;
st - > node = node ;
st - > bringup = bringup ;
2016-02-26 21:43:38 +03:00
st - > cb_state = state ;
2016-08-12 20:49:38 +03:00
st - > single = true ;
2017-09-20 20:00:17 +03:00
__cpuhp_kick_ap ( st ) ;
2016-02-26 21:43:38 +03:00
/*
2017-09-20 20:00:17 +03:00
* If we failed and did a partial , do a rollback .
2016-02-26 21:43:38 +03:00
*/
2017-09-20 20:00:17 +03:00
if ( ( ret = st - > result ) & & st - > last ) {
st - > rollback = true ;
st - > bringup = ! bringup ;
__cpuhp_kick_ap ( st ) ;
}
2017-10-21 17:06:52 +03:00
/*
* Clean up the leftovers so the next hotplug operation won ' t use stale
* data .
*/
st - > node = st - > last = NULL ;
2017-09-20 20:00:17 +03:00
return ret ;
2016-02-26 21:43:39 +03:00
}
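The rollback step in cpuhp_invoke_ap_callback() (flip the direction and kick the thread again when a multi-instance callback failed part way) can be illustrated outside the kernel. The following is only a minimal user-space sketch, not the kernel implementation: `struct instance`, `instance_startup()`, `instance_teardown()` and `start_all_or_rollback()` are hypothetical names introduced for illustration, and the real code tracks the last processed instance in `st->last` rather than an array index.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one registered instance of a hotplug state. */
struct instance {
	int fail_on_startup;	/* non-zero: startup reports this error code */
	int active;		/* 1 after startup, 0 after teardown */
};

static int instance_startup(struct instance *inst)
{
	if (inst->fail_on_startup)
		return inst->fail_on_startup;
	inst->active = 1;
	return 0;
}

static void instance_teardown(struct instance *inst)
{
	inst->active = 0;
}

/*
 * Start every instance in order. If one fails, undo the instances that
 * were already started (the partial work) in reverse order and return
 * the original error code.
 */
int start_all_or_rollback(struct instance *inst, size_t n)
{
	size_t done;
	int ret = 0;

	assert(inst != NULL || n == 0);

	for (done = 0; done < n; done++) {
		ret = instance_startup(&inst[done]);
		if (ret)
			break;
	}

	/* Roll back only when something failed after partial progress. */
	if (ret) {
		while (done > 0)
			instance_teardown(&inst[--done]);
	}
	return ret;
}
```

In this model a failure on the third of three instances leaves all three inactive and returns the error of the failing one, which mirrors how the function above returns `ret` after the rollback kick.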
static int cpuhp_kick_ap_work ( unsigned int cpu )
{
struct cpuhp_cpu_state * st = per_cpu_ptr ( & cpuhp_state , cpu ) ;
2017-09-20 20:00:17 +03:00
enum cpuhp_state prev_state = st - > state ;
int ret ;
2016-02-26 21:43:39 +03:00
2017-09-20 20:00:20 +03:00
cpuhp_lock_acquire ( false ) ;
cpuhp_lock_release ( false ) ;
cpuhp_lock_acquire ( true ) ;
cpuhp_lock_release ( true ) ;
2017-09-20 20:00:17 +03:00
trace_cpuhp_enter ( cpu , st - > target , prev_state , cpuhp_kick_ap_work ) ;
2022-04-11 18:22:32 +03:00
ret = cpuhp_kick_ap ( cpu , st , st - > target ) ;
2017-09-20 20:00:17 +03:00
trace_cpuhp_exit ( cpu , st - > state , prev_state , ret ) ;
return ret ;
2016-02-26 21:43:38 +03:00
}
static struct smp_hotplug_thread cpuhp_threads = {
. store = & cpuhp_state . thread ,
. thread_should_run = cpuhp_should_run ,
. thread_fn = cpuhp_thread_fun ,
. thread_comm = " cpuhp/%u " ,
. selfparking = true ,
} ;
2022-04-11 18:22:33 +03:00
static __init void cpuhp_init_state ( void )
{
struct cpuhp_cpu_state * st ;
int cpu ;
for_each_possible_cpu ( cpu ) {
st = per_cpu_ptr ( & cpuhp_state , cpu ) ;
init_completion ( & st - > done_up ) ;
init_completion ( & st - > done_down ) ;
}
}
2016-02-26 21:43:38 +03:00
void __init cpuhp_threads_init ( void )
{
2022-04-11 18:22:33 +03:00
cpuhp_init_state ( ) ;
2016-02-26 21:43:38 +03:00
BUG_ON ( smpboot_register_percpu_thread ( & cpuhp_threads ) ) ;
kthread_unpark ( this_cpu_read ( cpuhp_state . thread ) ) ;
}
2021-03-28 00:01:36 +03:00
/*
 *
 * Serialize hotplug trainwrecks outside of the cpu_hotplug_lock
 * protected region.
 *
 * The operation is still serialized against concurrent CPU hotplug via
 * cpu_add_remove_lock, i.e. CPU map protection. But it is _not_
 * serialized against other hotplug related activity like adding or
 * removing of state callbacks and state instances, which invoke either the
 * startup or the teardown callback of the affected state.
 *
 * This is required for subsystems which are unfixable vs. CPU hotplug and
 * evade lock inversion problems by scheduling work which has to be
 * completed _before_ cpu_up()/_cpu_down() returns.
 *
 * Don't even think about adding anything to this for any new code or even
 * drivers. Its only purpose is to keep existing lock order trainwrecks
 * working.
 *
 * For cpu_down() there might be valid reasons to finish cleanups which are
 * not required to be done under cpu_hotplug_lock, but that's a different
 * story and would not be invoked via this.
 */
static void cpu_up_down_serialize_trainwrecks ( bool tasks_frozen )
{
/*
 * cpusets delegate hotplug operations to a worker to "solve" the
 * lock order problems. Wait for the worker, but only if tasks are
 * _not_ frozen (suspend, hibernate) as that would wait forever.
 *
 * The wait is required because otherwise the hotplug operation
 * returns with inconsistent state, which could even be observed in
 * user space when a new CPU is brought up. The CPU plug uevent
 * would be delivered and user space reacting to it would fail to
 * move tasks to the newly plugged CPU up to the point where the
 * work has finished because up to that point the newly plugged CPU
 * is not assignable in cpusets/cgroups. On unplug that's not
 * necessarily a visible issue, but it is still inconsistent state,
 * which is the real problem which needs to be "fixed". This can't
 * prevent the transient state between scheduling the work and
 * returning from waiting for it.
 */
if ( ! tasks_frozen )
cpuset_wait_for_hotplug ( ) ;
}
2016-12-07 16:54:38 +03:00
# ifdef CONFIG_HOTPLUG_CPU
2020-11-26 13:25:29 +03:00
# ifndef arch_clear_mm_cpumask_cpu
# define arch_clear_mm_cpumask_cpu(cpu, mm) cpumask_clear_cpu(cpu, mm_cpumask(mm))
# endif
2012-06-01 03:26:26 +04:00
/**
 * clear_tasks_mm_cpumask - Safely clear tasks' mm_cpumask for a CPU
 * @cpu: a CPU id
 *
 * This function walks all processes, finds a valid mm struct for each one and
 * then clears a corresponding bit in mm's cpumask. While this all sounds
 * trivial, there are various non-obvious corner cases, which this function
 * tries to solve in a safe manner.
 *
 * Also note that the function uses a somewhat relaxed locking scheme, so it may
 * be called only for an already offlined CPU.
 */
2012-06-01 03:26:22 +04:00
void clear_tasks_mm_cpumask ( int cpu )
{
struct task_struct * p ;
/*
 * This function is called after the cpu is taken down and marked
 * offline, so it's not like new tasks will ever get this cpu set in
 * their mm mask. -- Peter Zijlstra
 * Thus, we may use rcu_read_lock() here, instead of grabbing
 * full-fledged tasklist_lock.
 */
2012-06-01 03:26:26 +04:00
WARN_ON ( cpu_online ( cpu ) ) ;
2012-06-01 03:26:22 +04:00
rcu_read_lock ( ) ;
for_each_process ( p ) {
struct task_struct * t ;
2012-06-01 03:26:26 +04:00
/*
* Main thread might exit , but other threads may still have
* a valid mm . Find one .
*/
2012-06-01 03:26:22 +04:00
t = find_lock_task_mm ( p ) ;
if ( ! t )
continue ;
2020-11-26 13:25:29 +03:00
arch_clear_mm_cpumask_cpu ( cpu , t - > mm ) ;
2012-06-01 03:26:22 +04:00
task_unlock ( t ) ;
}
rcu_read_unlock ( ) ;
}
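The core idea of clear_tasks_mm_cpumask() (for every process, find any thread that still has a valid mm and clear the CPU's bit in that mm's mask) can be modelled in plain C. This is a minimal sketch under loud assumptions: `struct mm`, `struct thread`, `struct process`, `find_task_mm()` and `clear_cpu_in_all_mms()` are hypothetical user-space stand-ins, the cpumask is reduced to a single `unsigned long`, and none of the locking (rcu_read_lock(), task_lock()) is modelled.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct mm {
	unsigned long cpumask;		/* one bit per CPU */
};

struct thread {
	struct mm *mm;			/* NULL once the thread dropped its mm */
};

struct process {
	struct thread *threads;
	size_t nr_threads;
};

/*
 * The main thread may already have exited while other threads still hold
 * a valid mm, so look at every thread and return the first mm found.
 */
static struct mm *find_task_mm(const struct process *p)
{
	for (size_t i = 0; i < p->nr_threads; i++) {
		if (p->threads[i].mm)
			return p->threads[i].mm;
	}
	return NULL;
}

/* Clear the bit for @cpu in the mm mask of every process that has an mm. */
void clear_cpu_in_all_mms(struct process *procs, size_t n, unsigned int cpu)
{
	assert(cpu < sizeof(unsigned long) * CHAR_BIT);

	for (size_t i = 0; i < n; i++) {
		struct mm *mm = find_task_mm(&procs[i]);

		if (!mm)
			continue;	/* kernel thread or fully exited process */
		mm->cpumask &= ~(1UL << cpu);
	}
}
```

A process whose first thread has no mm but whose second thread does is still handled, which is the corner case the naive `if (p->mm)` check misses.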
2005-04-17 02:20:36 +04:00
/* Take this CPU down. */
2015-07-19 21:06:22 +03:00
static int take_cpu_down ( void * _param )
2005-04-17 02:20:36 +04:00
{
2016-02-26 21:43:29 +03:00
struct cpuhp_cpu_state * st = this_cpu_ptr ( & cpuhp_state ) ;
enum cpuhp_state target = max ( ( int ) st - > target , CPUHP_AP_OFFLINE ) ;
2016-02-26 21:43:23 +03:00
int err , cpu = smp_processor_id ( ) ;
2005-04-17 02:20:36 +04:00
/* Ensure this CPU doesn't handle any more interrupts. */
err = __cpu_disable ( ) ;
if ( err < 0 )
2005-06-26 01:54:50 +04:00
return err ;
2005-04-17 02:20:36 +04:00
2016-08-12 20:49:38 +03:00
/*
2021-02-16 13:35:06 +03:00
* Must be called from CPUHP_TEARDOWN_CPU. Since we are going down,
* this means the current state is CPUHP_TEARDOWN_CPU - 1.
2016-08-12 20:49:38 +03:00
*/
2021-02-16 13:35:06 +03:00
WARN_ON ( st - > state ! = ( CPUHP_TEARDOWN_CPU - 1 ) ) ;
/*
2022-09-27 13:12:59 +03:00
* Invoke the former CPU_DYING callbacks . DYING must not fail !
2021-02-16 13:35:06 +03:00
*/
2022-09-27 13:12:59 +03:00
cpuhp_invoke_callback_range_nofail ( false , cpu , st , target ) ;
2016-02-26 21:43:29 +03:00
2015-04-03 03:37:24 +03:00
/* Give up timekeeping duties */
tick_handover_do_timer ( ) ;
2019-03-21 18:39:20 +03:00
/* Remove CPU from timer broadcasting */
tick_offline_cpu ( cpu ) ;
2013-01-31 16:11:14 +04:00
/* Park the stopper thread */
2016-02-26 21:43:23 +03:00
stop_machine_park ( cpu ) ;
2005-06-26 01:54:50 +04:00
return 0 ;
2005-04-17 02:20:36 +04:00
}
2016-02-26 21:43:25 +03:00
static int takedown_cpu ( unsigned int cpu )
2005-04-17 02:20:36 +04:00
{
2016-02-26 21:43:43 +03:00
struct cpuhp_cpu_state * st = per_cpu_ptr ( & cpuhp_state , cpu ) ;
2016-02-26 21:43:25 +03:00
int err ;
2005-04-17 02:20:36 +04:00
2016-03-10 22:42:08 +03:00
/* Park the smpboot threads */
2021-05-23 16:31:30 +03:00
kthread_park ( st - > thread ) ;
2016-02-26 21:43:39 +03:00
2013-10-11 16:38:20 +04:00
/*
2015-07-05 20:12:30 +03:00
* Prevent irq alloc / free while the dying cpu reorganizes the
* interrupt affinities .
2013-10-11 16:38:20 +04:00
*/
2015-07-05 20:12:30 +03:00
irq_lock_sparse ( ) ;
2013-10-11 16:38:20 +04:00
2015-07-05 20:12:30 +03:00
/*
* So now all preempt/rcu users must observe !cpu_active().
*/
2017-05-24 11:15:28 +03:00
err = stop_machine_cpuslocked ( take_cpu_down , NULL , cpumask_of ( cpu ) ) ;
2008-07-28 21:16:29 +04:00
if ( err ) {
2016-04-08 15:40:15 +03:00
/* CPU refused to die */
2015-07-05 20:12:30 +03:00
irq_unlock_sparse ( ) ;
2016-04-08 15:40:15 +03:00
/* Unpark the hotplug thread so we can rollback there */
2021-05-23 16:31:30 +03:00
kthread_unpark ( st - > thread ) ;
2016-02-26 21:43:25 +03:00
return err ;
2006-10-28 21:38:57 +04:00
}
2008-07-28 21:16:29 +04:00
BUG_ON ( cpu_online ( cpu ) ) ;
2005-04-17 02:20:36 +04:00
2010-11-13 21:32:29 +03:00
/*
2017-12-06 13:59:11 +03:00
* The teardown callback for CPUHP_AP_SCHED_STARTING will have removed
* all runnable tasks from the CPU; only the idle task is left now
2010-11-13 21:32:29 +03:00
* that the migration thread is done doing the stop_machine thing .
2010-11-19 22:37:53 +03:00
*
* Wait for the stop thread to go away .
2010-11-13 21:32:29 +03:00
*/
2017-09-20 20:00:19 +03:00
wait_for_ap_thread ( st , false ) ;
2016-02-26 21:43:43 +03:00
BUG_ON ( st - > state ! = CPUHP_AP_IDLE_DEAD ) ;
2005-04-17 02:20:36 +04:00
2015-07-05 20:12:30 +03:00
/* Interrupts are moved away from the dying cpu, reenable alloc/free */
irq_unlock_sparse ( ) ;
2015-03-30 12:29:19 +03:00
hotplug_cpu__broadcast_tick_pull ( cpu ) ;
2005-04-17 02:20:36 +04:00
/* This actually kills the CPU. */
__cpu_die ( cpu ) ;
2015-04-03 03:38:05 +03:00
tick_cleanup_dead_cpu ( cpu ) ;
rcu: Migrate callbacks earlier in the CPU-offline timeline
RCU callbacks must be migrated away from an outgoing CPU, and this is
done near the end of the CPU-hotplug operation, after the outgoing CPU is
long gone. Unfortunately, this means that other CPU-hotplug callbacks
can execute while the outgoing CPU's callbacks are still immobilized
on the long-gone CPU's callback lists. If any of these CPU-hotplug
callbacks must wait, either directly or indirectly, for the invocation
of any of the immobilized RCU callbacks, the system will hang.
This commit avoids such hangs by migrating the callbacks away from the
outgoing CPU immediately upon its departure, shortly after the return
from __cpu_die() in takedown_cpu(). Thus, RCU is able to advance these
callbacks and invoke them, which allows all the after-the-fact CPU-hotplug
callbacks to wait on these RCU callbacks without risk of a hang.
While in the neighborhood, this commit also moves rcu_send_cbs_to_orphanage()
and rcu_adopt_orphan_cbs() under a pre-existing #ifdef to avoid including
dead code on the one hand and to avoid define-without-use warnings on the
other hand.
Reported-by: Jeffrey Hugo <jhugo@codeaurora.org>
Link: http://lkml.kernel.org/r/db9c91f6-1b17-6136-84f0-03c3c2581ab4@codeaurora.org
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Richard Weinberger <richard@nod.at>
2017-06-20 22:11:34 +03:00
rcutree_migrate_callbacks ( cpu ) ;
2016-02-26 21:43:25 +03:00
return 0 ;
}
2005-04-17 02:20:36 +04:00
2016-03-03 12:52:10 +03:00
static void cpuhp_complete_idle_dead ( void * arg )
{
struct cpuhp_cpu_state * st = arg ;
2017-09-20 20:00:19 +03:00
complete_ap_thread ( st , false ) ;
2016-03-03 12:52:10 +03:00
}
2016-02-26 21:43:43 +03:00
void cpuhp_report_idle_dead ( void )
{
struct cpuhp_cpu_state * st = this_cpu_ptr ( & cpuhp_state ) ;
BUG_ON ( st - > state ! = CPUHP_AP_OFFLINE ) ;
2016-02-26 21:43:44 +03:00
rcu_report_dead ( smp_processor_id ( ) ) ;
2016-03-03 12:52:10 +03:00
st - > state = CPUHP_AP_IDLE_DEAD ;
/*
* We cannot call complete() after rcu_report_dead(), so we delegate it
* to an online cpu.
*/
smp_call_function_single ( cpumask_first ( cpu_online_mask ) ,
cpuhp_complete_idle_dead , st , 0 ) ;
2016-02-26 21:43:43 +03:00
}
2017-09-20 20:00:17 +03:00
static int cpuhp_down_callbacks ( unsigned int cpu , struct cpuhp_cpu_state * st ,
enum cpuhp_state target )
{
enum cpuhp_state prev_state = st - > state ;
int ret = 0 ;
2021-02-16 13:35:06 +03:00
ret = cpuhp_invoke_callback_range ( false , cpu , st , target ) ;
if ( ret ) {
2021-04-09 08:53:16 +03:00
pr_debug ( " CPU DOWN failed (%d) CPU %u state %s (%d) \n " ,
ret , cpu , cpuhp_get_step ( st - > state ) - > name ,
st - > state ) ;
2021-02-16 13:35:06 +03:00
2022-04-11 18:22:32 +03:00
cpuhp_reset_state ( cpu , st , prev_state ) ;
2021-02-16 13:35:06 +03:00
if ( st - > state < prev_state )
WARN_ON ( cpuhp_invoke_callback_range ( true , cpu , st ,
prev_state ) ) ;
2017-09-20 20:00:17 +03:00
}
2021-02-16 13:35:06 +03:00
2017-09-20 20:00:17 +03:00
return ret ;
}
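The pattern in cpuhp_down_callbacks() (walk the states downwards, and on failure walk back up to the state we started from) can be shown with a small model. This is a hedged sketch, not the kernel's state machine: `struct model`, `walk_down()`, `walk_up()` and `down_or_rollback()` are hypothetical names, states are plain integers, and the startup callbacks used for the rollback are assumed never to fail.

```c
#include <assert.h>

#define NR_STATES 6

/* Hypothetical model of a per-CPU hotplug state machine. */
struct model {
	int state;			/* current state, 0..NR_STATES-1 */
	int teardown_err[NR_STATES];	/* error returned when tearing down state i */
	int startup_calls;		/* startup callbacks run during rollback */
};

/* Walk from the current state down to @target; stop at the first failure. */
static int walk_down(struct model *st, int target)
{
	while (st->state > target) {
		int err = st->teardown_err[st->state];

		if (err)
			return err;
		st->state--;
	}
	return 0;
}

/* Walk back up to @target, re-running the (infallible) startup callbacks. */
static void walk_up(struct model *st, int target)
{
	while (st->state < target) {
		st->state++;
		st->startup_calls++;
	}
}

/* Go down to @target; if that fails part way, restore the previous state. */
int down_or_rollback(struct model *st, int target)
{
	int prev_state = st->state;
	int ret;

	assert(target >= 0 && target <= prev_state && prev_state < NR_STATES);

	ret = walk_down(st, target);
	if (ret && st->state < prev_state)
		walk_up(st, prev_state);
	return ret;
}
```

The `st->state < prev_state` check matters: if the very first teardown fails, nothing was undone, so there is nothing to roll back.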
2016-02-26 21:43:28 +03:00
2016-02-26 21:43:25 +03:00
/* Requires cpu_add_remove_lock to be held */
2016-02-26 21:43:30 +03:00
static int __ref _cpu_down ( unsigned int cpu , int tasks_frozen ,
enum cpuhp_state target )
2016-02-26 21:43:25 +03:00
{
2016-02-26 21:43:28 +03:00
struct cpuhp_cpu_state * st = per_cpu_ptr ( & cpuhp_state , cpu ) ;
int prev_state , ret = 0 ;
2016-02-26 21:43:25 +03:00
if ( num_online_cpus ( ) = = 1 )
return - EBUSY ;
2016-02-26 21:43:32 +03:00
if ( ! cpu_present ( cpu ) )
2016-02-26 21:43:25 +03:00
return - EINVAL ;
2017-05-24 11:15:12 +03:00
cpus_write_lock ( ) ;
2016-02-26 21:43:25 +03:00
cpuhp_tasks_frozen = tasks_frozen ;
2022-04-11 18:22:32 +03:00
prev_state = cpuhp_set_state ( cpu , st , target ) ;
2016-02-26 21:43:39 +03:00
/*
* If the current CPU state is in the range of the AP hotplug thread ,
* then we need to kick the thread .
*/
2016-02-26 21:43:41 +03:00
if ( st - > state > CPUHP_TEARDOWN_CPU ) {
2017-09-20 20:00:17 +03:00
st - > target = max ( ( int ) target , CPUHP_TEARDOWN_CPU ) ;
2016-02-26 21:43:39 +03:00
ret = cpuhp_kick_ap_work ( cpu ) ;
/*
* The AP side has done the error rollback already. Just
* return the error code.
*/
if ( ret )
goto out ;
/*
* We might have stopped still in the range of the AP hotplug
* thread . Nothing to do anymore .
*/
2016-02-26 21:43:41 +03:00
if ( st - > state > CPUHP_TEARDOWN_CPU )
2016-02-26 21:43:39 +03:00
goto out ;
2017-09-20 20:00:17 +03:00
st - > target = target ;
2016-02-26 21:43:39 +03:00
}
/*
2016-02-26 21:43:41 +03:00
* The AP brought itself down to CPUHP_TEARDOWN_CPU . So we need
2016-02-26 21:43:39 +03:00
* to do the further cleanups .
*/
2016-08-12 20:49:38 +03:00
ret = cpuhp_down_callbacks ( cpu , st , target ) ;
2021-02-16 13:35:05 +03:00
if ( ret & & st - > state < prev_state ) {
if ( st - > state = = CPUHP_TEARDOWN_CPU ) {
2022-04-11 18:22:32 +03:00
cpuhp_reset_state ( cpu , st , prev_state ) ;
2021-02-16 13:35:05 +03:00
__cpuhp_kick_ap ( st ) ;
} else {
WARN ( 1 , " DEAD callback error for CPU%d " , cpu ) ;
}
2016-04-08 15:40:15 +03:00
}
2016-02-26 21:43:25 +03:00
2016-02-26 21:43:39 +03:00
out :
2017-05-24 11:15:12 +03:00
cpus_write_unlock ( ) ;
2017-09-12 22:37:04 +03:00
/*
* Do post unplug cleanup . This is still protected against
* concurrent CPU hotplug via cpu_add_remove_lock .
*/
lockup_detector_cleanup ( ) ;
2018-11-25 21:33:39 +03:00
arch_smt_update ( ) ;
2021-03-28 00:01:36 +03:00
cpu_up_down_serialize_trainwrecks ( tasks_frozen ) ;
2016-02-26 21:43:28 +03:00
return ret ;
2006-09-26 10:32:48 +04:00
}
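_cpu_down() splits the work in two phases: the hotplug thread on the outgoing CPU handles the states above CPUHP_TEARDOWN_CPU, and the control CPU handles the rest. The sketch below models only that target clamping. It is an illustration under assumptions: the `ST_*` constants, `struct split` and `plan_cpu_down()` are hypothetical, the numeric values are arbitrary, and callbacks are assumed to succeed.

```c
#include <assert.h>

/* Hypothetical state numbers; only their ordering matters here. */
enum { ST_OFFLINE = 0, ST_TEARDOWN_CPU = 3, ST_ONLINE = 7 };

struct split {
	int ap_target;	/* state the outgoing CPU's hotplug thread walks down to */
	int bp_target;	/* final target requested by the caller */
	int bp_runs;	/* 1 if the control CPU has callbacks left to run */
};

static int max_int(int a, int b)
{
	return a > b ? a : b;
}

/* Decide how a walk from @state down to @target is split between CPUs. */
struct split plan_cpu_down(int state, int target)
{
	struct split s = { state, target, 0 };

	assert(target <= state);

	if (state > ST_TEARDOWN_CPU) {
		/* The hotplug thread never goes below ST_TEARDOWN_CPU. */
		s.ap_target = max_int(target, ST_TEARDOWN_CPU);
		/* Stopped inside the thread's range: nothing left to do. */
		if (s.ap_target > ST_TEARDOWN_CPU)
			return s;
	}

	/* The remaining states are handled by the control CPU. */
	s.bp_runs = 1;
	return s;
}
```

For a full offline from `ST_ONLINE` to `ST_OFFLINE` the thread stops at `ST_TEARDOWN_CPU` and the control CPU finishes; for a partial target that lies above `ST_TEARDOWN_CPU` the control CPU has nothing to run.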
2018-05-29 18:49:05 +03:00
static int cpu_down_maps_locked ( unsigned int cpu , enum cpuhp_state target )
{
2022-04-06 02:29:33 +03:00
/*
* If the platform does not support hotplug , report it explicitly to
* differentiate it from a transient offlining failure .
*/
if ( cc_platform_has ( CC_ATTR_HOTPLUG_DISABLED ) )
return - EOPNOTSUPP ;
2018-05-29 18:49:05 +03:00
if ( cpu_hotplug_disabled )
return - EBUSY ;
return _cpu_down ( cpu , 0 , target ) ;
}
2020-03-23 16:51:10 +03:00
static int cpu_down ( unsigned int cpu , enum cpuhp_state target )
2006-09-26 10:32:48 +04:00
{
2008-12-22 14:36:30 +03:00
int err ;
2006-09-26 10:32:48 +04:00
2008-01-25 23:08:01 +03:00
cpu_maps_update_begin ( ) ;
2018-05-29 18:49:05 +03:00
err = cpu_down_maps_locked ( cpu , target ) ;
2008-01-25 23:08:01 +03:00
cpu_maps_update_done ( ) ;
2005-04-17 02:20:36 +04:00
return err ;
}
2017-09-20 20:00:17 +03:00
2020-03-23 16:51:10 +03:00
/**
* cpu_device_down - Bring down a cpu device
* @ dev : Pointer to the cpu device to offline
*
* This function is meant to be used by device core cpu subsystem only .
*
* Other subsystems should use remove_cpu ( ) instead .
2021-08-10 01:38:25 +03:00
*
* Return : % 0 on success or a negative errno code
2020-03-23 16:51:10 +03:00
*/
int cpu_device_down ( struct device * dev )
2016-02-26 21:43:30 +03:00
{
2020-03-23 16:51:10 +03:00
return cpu_down ( dev - > id , CPUHP_OFFLINE ) ;
2016-02-26 21:43:30 +03:00
}
2017-09-20 20:00:17 +03:00
2020-03-23 16:50:54 +03:00
int remove_cpu ( unsigned int cpu )
{
int ret ;
lock_device_hotplug ( ) ;
ret = device_offline ( get_cpu_device ( cpu ) ) ;
unlock_device_hotplug ( ) ;
return ret ;
}
EXPORT_SYMBOL_GPL ( remove_cpu ) ;
2020-03-23 16:50:55 +03:00
void smp_shutdown_nonboot_cpus ( unsigned int primary_cpu )
{
unsigned int cpu ;
int error ;
cpu_maps_update_begin ( ) ;
/*
* Make certain the cpu I'm about to reboot on is online.
*
* This is in line with what migrate_to_reboot_cpu() already does.
*/
if ( ! cpu_online ( primary_cpu ) )
primary_cpu = cpumask_first ( cpu_online_mask ) ;
for_each_online_cpu ( cpu ) {
if ( cpu = = primary_cpu )
continue ;
error = cpu_down_maps_locked ( cpu , CPUHP_OFFLINE ) ;
if ( error ) {
pr_err("Failed to offline CPU%u - error=%d\n",
cpu, error);
break ;
}
}
/*
* Ensure all but the reboot CPU are offline .
*/
BUG_ON ( num_online_cpus ( ) > 1 ) ;
/*
* Make sure the CPUs won't be enabled by someone else after this
* point. Kexec will reboot to a new kernel shortly, resetting
* everything along the way.
*/
cpu_hotplug_disabled + + ;
cpu_maps_update_done ( ) ;
2016-02-26 21:43:30 +03:00
}
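The selection logic in smp_shutdown_nonboot_cpus() (fall back to the first online CPU if the requested primary is offline, then take down everything else) can be sketched with a bitmask. This is a simplified model with assumptions stated up front: `MODEL_NR_CPUS`, `first_online()` and `shutdown_nonboot()` are hypothetical names, the online mask is a plain `unsigned int`, and offlining is assumed never to fail, unlike cpu_down_maps_locked().

```c
#include <assert.h>

#define MODEL_NR_CPUS 8

/* Return the lowest set bit position in @online_mask, or -1 if empty. */
static int first_online(unsigned int online_mask)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if (online_mask & (1u << cpu))
			return cpu;
	}
	return -1;
}

/*
 * Offline every CPU except the primary one and return the resulting mask.
 * If @primary is not online, the first online CPU is used instead.
 */
unsigned int shutdown_nonboot(unsigned int online_mask, int primary)
{
	assert(online_mask != 0);
	assert(primary >= 0 && primary < MODEL_NR_CPUS);

	if (!(online_mask & (1u << primary)))
		primary = first_online(online_mask);

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if (cpu == primary)
			continue;
		online_mask &= ~(1u << cpu);	/* model of a successful offline */
	}
	return online_mask;
}
```

Exactly one bit remains set on return, which corresponds to the `BUG_ON(num_online_cpus() > 1)` check in the function above.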
2017-09-20 20:00:17 +03:00
# else
# define takedown_cpu NULL
2005-04-17 02:20:36 +04:00
# endif /*CONFIG_HOTPLUG_CPU*/
2016-02-26 21:43:29 +03:00
/**
2016-08-18 15:57:16 +03:00
* notify_cpu_starting(cpu) - Invoke the callbacks on the starting CPU
2016-02-26 21:43:29 +03:00
* @ cpu : cpu that just started
*
* It must be called by the arch code on the new cpu , before the new cpu
* enables interrupts and before the " boot " cpu returns from __cpu_up ( ) .
*/
void notify_cpu_starting ( unsigned int cpu )
{
struct cpuhp_cpu_state * st = per_cpu_ptr ( & cpuhp_state , cpu ) ;
enum cpuhp_state target = min ( ( int ) st - > target , CPUHP_AP_ONLINE ) ;
2016-08-17 15:21:04 +03:00
rcu_cpu_starting ( cpu ) ; /* Enables RCU usage on this CPU. */
2019-07-22 21:47:16 +03:00
cpumask_set_cpu ( cpu , & cpus_booted_once_mask ) ;
2021-02-16 13:35:06 +03:00
/*
* STARTING must not fail !
*/
2022-09-27 13:12:59 +03:00
cpuhp_invoke_callback_range_nofail ( true , cpu , st , target ) ;
2016-02-26 21:43:29 +03:00
}
2016-02-26 21:43:35 +03:00
/*
2017-07-04 23:20:23 +03:00
* Called from the idle task . Wake up the controlling task which brings the
2019-12-10 11:34:54 +03:00
* hotplug thread of the upcoming CPU up and then delegates the rest of the
* online bringup to the hotplug thread .
2016-02-26 21:43:35 +03:00
*/
2016-02-26 21:43:41 +03:00
void cpuhp_online_idle ( enum cpuhp_state state )
2016-02-26 21:43:35 +03:00
{
2016-02-26 21:43:41 +03:00
struct cpuhp_cpu_state * st = this_cpu_ptr ( & cpuhp_state ) ;
/* Happens for the boot cpu */
if ( state ! = CPUHP_AP_ONLINE_IDLE )
return ;
2019-12-10 11:34:54 +03:00
/*
* Unpark the stopper thread before we start the idle loop (and start
* scheduling); this ensures the stopper task is always available.
*/
stop_machine_unpark ( smp_processor_id ( ) ) ;
2016-02-26 21:43:41 +03:00
st - > state = CPUHP_AP_ONLINE_IDLE ;
2017-09-20 20:00:19 +03:00
complete_ap_thread ( st , true ) ;
2016-02-26 21:43:35 +03:00
}
2006-09-26 10:32:48 +04:00
/* Requires cpu_add_remove_lock to be held */
2016-02-26 21:43:30 +03:00
static int _cpu_up ( unsigned int cpu , int tasks_frozen , enum cpuhp_state target )
2005-04-17 02:20:36 +04:00
{
2016-02-26 21:43:28 +03:00
struct cpuhp_cpu_state * st = per_cpu_ptr ( & cpuhp_state , cpu ) ;
2012-04-21 04:08:50 +04:00
struct task_struct * idle ;
2016-02-26 21:43:37 +03:00
int ret = 0 ;
2005-04-17 02:20:36 +04:00
2017-05-24 11:15:12 +03:00
cpus_write_lock ( ) ;
2012-04-20 17:05:44 +04:00
2016-02-26 21:43:32 +03:00
if ( ! cpu_present ( cpu ) ) {
2012-10-23 03:30:54 +04:00
ret = - EINVAL ;
goto out ;
}
2016-02-26 21:43:32 +03:00
/*
2020-03-23 16:51:10 +03:00
* The caller of cpu_up ( ) might have raced with another
* caller . Nothing to do .
2016-02-26 21:43:32 +03:00
*/
if ( st - > state > = target )
2012-04-20 17:05:44 +04:00
goto out ;
2016-02-26 21:43:32 +03:00
if ( st - > state = = CPUHP_OFFLINE ) {
/* Let it fail before we try to bring the cpu up */
idle = idle_thread_get ( cpu ) ;
if ( IS_ERR ( idle ) ) {
ret = PTR_ERR ( idle ) ;
goto out ;
}
2012-04-21 04:08:50 +04:00
}
2012-04-20 17:05:44 +04:00
2016-02-26 21:43:24 +03:00
cpuhp_tasks_frozen = tasks_frozen ;
2022-04-11 18:22:32 +03:00
cpuhp_set_state ( cpu , st , target ) ;
2016-02-26 21:43:39 +03:00
/*
* If the current CPU state is in the range of the AP hotplug thread ,
* then we need to kick the thread once more .
*/
2016-02-26 21:43:41 +03:00
if ( st - > state > CPUHP_BRINGUP_CPU ) {
2016-02-26 21:43:39 +03:00
ret = cpuhp_kick_ap_work ( cpu ) ;
/*
* The AP side has done the error rollback already. Just
* return the error code.
*/
if ( ret )
goto out ;
}
/*
* Try to reach the target state . We max out on the BP at
2016-02-26 21:43:41 +03:00
* CPUHP_BRINGUP_CPU . After that the AP hotplug thread is
2016-02-26 21:43:39 +03:00
* responsible for bringing it up to the target state .
*/
2016-02-26 21:43:41 +03:00
target = min ( ( int ) target , CPUHP_BRINGUP_CPU ) ;
2016-08-12 20:49:38 +03:00
ret = cpuhp_up_callbacks ( cpu , st , target ) ;
2012-04-20 17:05:44 +04:00
out :
2017-05-24 11:15:12 +03:00
cpus_write_unlock ( ) ;
2018-11-25 21:33:39 +03:00
arch_smt_update ( ) ;
2021-03-28 00:01:36 +03:00
cpu_up_down_serialize_trainwrecks ( tasks_frozen ) ;
2006-09-26 10:32:48 +04:00
return ret ;
}
2020-03-23 16:51:10 +03:00
static int cpu_up ( unsigned int cpu , enum cpuhp_state target )
2006-09-26 10:32:48 +04:00
{
int err = 0 ;
2010-05-25 01:32:41 +04:00
2009-01-01 02:42:28 +03:00
if ( ! cpu_possible ( cpu ) ) {
2014-06-05 03:11:17 +04:00
pr_err ( " can't online cpu %d because it is not configured as may-hotadd at boot time \n " ,
cpu ) ;
2010-03-06 00:42:38 +03:00
# if defined(CONFIG_IA64)
2014-06-05 03:11:17 +04:00
pr_err ( " please check additional_cpus= boot parameter \n " ) ;
2007-10-19 10:40:47 +04:00
# endif
return - EINVAL ;
}
2006-09-26 10:32:48 +04:00
2013-11-13 03:07:25 +04:00
err = try_online_node ( cpu_to_node ( cpu ) ) ;
if ( err )
return err ;
2010-05-25 01:32:41 +04:00
2008-01-25 23:08:01 +03:00
cpu_maps_update_begin ( ) ;
2008-07-15 15:43:49 +04:00
if ( cpu_hotplug_disabled ) {
2006-09-26 10:32:48 +04:00
err = - EBUSY ;
2008-07-15 15:43:49 +04:00
goto out ;
}
2018-05-29 18:48:27 +03:00
if ( ! cpu_smt_allowed ( cpu ) ) {
err = - EPERM ;
goto out ;
}
2008-07-15 15:43:49 +04:00
2016-02-26 21:43:30 +03:00
err = _cpu_up ( cpu , 0 , target ) ;
2008-07-15 15:43:49 +04:00
out :
2008-01-25 23:08:01 +03:00
cpu_maps_update_done ( ) ;
2006-09-26 10:32:48 +04:00
return err ;
}
2016-02-26 21:43:30 +03:00
2020-03-23 16:51:10 +03:00
/**
* cpu_device_up - Bring up a cpu device
* @ dev : Pointer to the cpu device to online
*
* This function is meant to be used by device core cpu subsystem only .
*
* Other subsystems should use add_cpu ( ) instead .
2021-08-10 01:38:25 +03:00
*
* Return : % 0 on success or a negative errno code
2020-03-23 16:51:10 +03:00
*/
int cpu_device_up ( struct device * dev )
2016-02-26 21:43:30 +03:00
{
2020-03-23 16:51:10 +03:00
return cpu_up ( dev - > id , CPUHP_ONLINE ) ;
2016-02-26 21:43:30 +03:00
}
2006-09-26 10:32:48 +04:00
2020-03-23 16:50:54 +03:00
int add_cpu ( unsigned int cpu )
{
int ret ;
lock_device_hotplug ( ) ;
ret = device_online ( get_cpu_device ( cpu ) ) ;
unlock_device_hotplug ( ) ;
return ret ;
}
EXPORT_SYMBOL_GPL ( add_cpu ) ;
2020-03-23 16:51:01 +03:00
/**
* bringup_hibernate_cpu - Bring up the CPU that we hibernated on
* @ sleep_cpu : The cpu we hibernated on and should be brought up .
*
* On some architectures like arm64, we can hibernate on any CPU, but on
* wake up the CPU we hibernated on might be offline as a side effect of
* using maxcpus= for example.
2021-08-10 01:38:25 +03:00
*
* Return : % 0 on success or a negative errno code
2020-03-23 16:51:01 +03:00
*/
int bringup_hibernate_cpu ( unsigned int sleep_cpu )
2016-02-26 21:43:30 +03:00
{
2020-03-23 16:51:01 +03:00
int ret ;
if ( ! cpu_online ( sleep_cpu ) ) {
pr_info ( " Hibernated on a CPU that is offline! Bringing CPU up. \n " ) ;
2020-03-23 16:51:10 +03:00
ret = cpu_up ( sleep_cpu , CPUHP_ONLINE ) ;
2020-03-23 16:51:01 +03:00
if ( ret ) {
pr_err ( " Failed to bring hibernate-CPU up! \n " ) ;
return ret ;
}
}
return 0 ;
}
2020-03-23 16:51:09 +03:00
void bringup_nonboot_cpus ( unsigned int setup_max_cpus )
{
unsigned int cpu ;
for_each_present_cpu ( cpu ) {
if ( num_online_cpus ( ) > = setup_max_cpus )
break ;
if ( ! cpu_online ( cpu ) )
2020-03-23 16:51:10 +03:00
cpu_up ( cpu , CPUHP_ONLINE ) ;
2020-03-23 16:51:09 +03:00
}
2016-02-26 21:43:30 +03:00
}
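The loop in bringup_nonboot_cpus() brings present CPUs online until the number of online CPUs reaches setup_max_cpus. A minimal bitmask model of that loop follows. It is a sketch only: `MODEL_NR_CPUS`, `count_bits()` and `bringup_until_limit()` are hypothetical names, and bringing a CPU up is assumed to always succeed, whereas the real cpu_up() can fail and its return value is ignored by the loop.

```c
#include <assert.h>

#define MODEL_NR_CPUS 8

/* Count the set bits in @mask (stand-in for num_online_cpus()). */
static unsigned int count_bits(unsigned int mask)
{
	unsigned int n = 0;

	while (mask) {
		mask &= mask - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/*
 * Bring present CPUs online in ascending order until @max_cpus CPUs are
 * online. Returns the resulting online mask.
 */
unsigned int bringup_until_limit(unsigned int present_mask,
				 unsigned int online_mask,
				 unsigned int max_cpus)
{
	/* Every online CPU must also be present. */
	assert((online_mask & ~present_mask) == 0);

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if (!(present_mask & (1u << cpu)))
			continue;
		if (count_bits(online_mask) >= max_cpus)
			break;
		if (!(online_mask & (1u << cpu)))
			online_mask |= 1u << cpu;	/* model of a successful cpu_up() */
	}
	return online_mask;
}
```

Note that the limit is checked before each CPU, so a limit equal to the number of already online CPUs leaves the mask unchanged.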
2006-09-26 10:32:48 +04:00
2007-08-31 10:56:29 +04:00
# ifdef CONFIG_PM_SLEEP_SMP
2009-01-01 02:42:28 +03:00
static cpumask_var_t frozen_cpus ;
2006-09-26 10:32:48 +04:00
2020-04-30 14:40:04 +03:00
int freeze_secondary_cpus ( int primary )
2006-09-26 10:32:48 +04:00
{
2016-08-17 15:50:25 +03:00
int cpu , error = 0 ;
2006-09-26 10:32:48 +04:00
2008-01-25 23:08:01 +03:00
cpu_maps_update_begin ( ) ;
2019-04-11 06:34:46 +03:00
if ( primary = = - 1 ) {
2016-08-17 15:50:25 +03:00
primary = cpumask_first ( cpu_online_mask ) ;
2022-02-07 18:59:06 +03:00
if ( ! housekeeping_cpu ( primary , HK_TYPE_TIMER ) )
primary = housekeeping_any_cpu ( HK_TYPE_TIMER ) ;
2019-04-11 06:34:46 +03:00
} else {
if ( ! cpu_online ( primary ) )
primary = cpumask_first ( cpu_online_mask ) ;
}
2009-12-16 20:04:32 +03:00
/*
* We take down all of the non-boot CPUs in one shot to avoid races
2006-09-26 10:32:48 +04:00
* with userspace trying to use CPU hotplug at the same time.
*/
2009-01-01 02:42:28 +03:00
cpumask_clear ( frozen_cpus ) ;
2009-11-25 15:31:39 +03:00
2014-06-05 03:11:17 +04:00
pr_info ( " Disabling non-boot CPUs ... \n " ) ;
2006-09-26 10:32:48 +04:00
for_each_online_cpu ( cpu ) {
2016-08-17 15:50:25 +03:00
if ( cpu = = primary )
2006-09-26 10:32:48 +04:00
continue ;
2019-06-03 07:31:03 +03:00
2020-04-30 14:40:04 +03:00
if ( pm_wakeup_pending ( ) ) {
2019-06-03 07:31:03 +03:00
pr_info ( " Wakeup pending. Abort CPU freeze \n " ) ;
error = - EBUSY ;
break ;
}
2014-06-06 16:40:17 +04:00
trace_suspend_resume ( TPS ( " CPU_OFF " ) , cpu , true ) ;
2016-02-26 21:43:30 +03:00
error = _cpu_down ( cpu , 1 , CPUHP_OFFLINE ) ;
2014-06-06 16:40:17 +04:00
trace_suspend_resume ( TPS ( " CPU_OFF " ) , cpu , false ) ;
2009-11-18 03:22:13 +03:00
if ( ! error )
2009-01-01 02:42:28 +03:00
cpumask_set_cpu ( cpu , frozen_cpus ) ;
2009-11-18 03:22:13 +03:00
else {
2014-06-05 03:11:17 +04:00
pr_err ( " Error taking CPU%d down: %d \n " , cpu , error ) ;
2006-09-26 10:32:48 +04:00
break ;
}
}
2009-07-01 06:31:07 +04:00
2015-08-05 10:52:46 +03:00
if ( ! error )
2006-09-26 10:32:48 +04:00
BUG_ON ( num_online_cpus ( ) > 1 ) ;
2015-08-05 10:52:46 +03:00
else
2014-06-05 03:11:17 +04:00
pr_err ( " Non-boot CPUs are not disabled \n " ) ;
2015-08-05 10:52:46 +03:00
/*
* Make sure the CPUs won't be enabled by someone else. We need to do
2020-04-30 14:40:03 +03:00
* this even in case of failure as all freeze_secondary_cpus ( ) users are
* supposed to do thaw_secondary_cpus ( ) on the failure path .
2015-08-05 10:52:46 +03:00
*/
cpu_hotplug_disabled + + ;
2008-01-25 23:08:01 +03:00
cpu_maps_update_done ( ) ;
2006-09-26 10:32:48 +04:00
return error ;
}
2020-04-30 14:40:03 +03:00
void __weak arch_thaw_secondary_cpus_begin ( void )
2009-08-20 05:05:36 +04:00
{
}
2020-04-30 14:40:03 +03:00
void __weak arch_thaw_secondary_cpus_end ( void )
2009-08-20 05:05:36 +04:00
{
}
2020-04-30 14:40:03 +03:00
void thaw_secondary_cpus ( void )
2006-09-26 10:32:48 +04:00
{
int cpu , error ;
/* Allow everyone to use the CPU hotplug again */
2008-01-25 23:08:01 +03:00
cpu_maps_update_begin ( ) ;
2016-06-10 09:43:28 +03:00
__cpu_hotplug_enable ( ) ;
2009-01-01 02:42:28 +03:00
if ( cpumask_empty ( frozen_cpus ) )
2007-04-02 10:49:49 +04:00
goto out ;
2006-09-26 10:32:48 +04:00
2014-06-05 03:11:17 +04:00
pr_info ( " Enabling non-boot CPUs ... \n " ) ;
2009-08-20 05:05:36 +04:00
2020-04-30 14:40:03 +03:00
arch_thaw_secondary_cpus_begin ( ) ;
2009-08-20 05:05:36 +04:00
2009-01-01 02:42:28 +03:00
for_each_cpu ( cpu , frozen_cpus ) {
2014-06-06 16:40:17 +04:00
trace_suspend_resume ( TPS ( " CPU_ON " ) , cpu , true ) ;
2016-02-26 21:43:30 +03:00
error = _cpu_up ( cpu , 1 , CPUHP_ONLINE ) ;
2014-06-06 16:40:17 +04:00
trace_suspend_resume ( TPS ( " CPU_ON " ) , cpu , false ) ;
2006-09-26 10:32:48 +04:00
if ( ! error ) {
2014-06-05 03:11:17 +04:00
pr_info ( " CPU%d is up \n " , cpu ) ;
2006-09-26 10:32:48 +04:00
continue ;
}
2014-06-05 03:11:17 +04:00
pr_warn ( " Error taking CPU%d up: %d \n " , cpu , error ) ;
2006-09-26 10:32:48 +04:00
}
2009-08-20 05:05:36 +04:00
2020-04-30 14:40:03 +03:00
arch_thaw_secondary_cpus_end ( ) ;
2009-08-20 05:05:36 +04:00
2009-01-01 02:42:28 +03:00
cpumask_clear ( frozen_cpus ) ;
2007-04-02 10:49:49 +04:00
out :
2008-01-25 23:08:01 +03:00
cpu_maps_update_done ( ) ;
2005-04-17 02:20:36 +04:00
}
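The freeze/thaw pairing above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `struct demo_cpus` and both `demo_*` functions are hypothetical names, a 32-bit mask stands in for the kernel cpumask, and error handling, tracing and locking are left out. It shows why thaw restores exactly the CPUs recorded in the frozen mask and leaves CPUs that were already offline alone.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: bit n set means CPU n is online / was frozen. */
struct demo_cpus {
	uint32_t online;
	uint32_t frozen;
};

/* Take every online CPU except 'primary' offline and remember it. */
static void demo_freeze_secondary(struct demo_cpus *c, unsigned int primary)
{
	/* The primary CPU must exist in the mask and be online. */
	assert(primary < 32 && (c->online & (UINT32_C(1) << primary)));

	c->frozen = c->online & ~(UINT32_C(1) << primary);
	c->online &= ~c->frozen;
}

/* Bring back exactly the CPUs recorded by the freeze, then forget them. */
static void demo_thaw_secondary(struct demo_cpus *c)
{
	c->online |= c->frozen;
	c->frozen = 0;
}
```

A CPU that was offline before the freeze never enters the frozen mask, so the thaw step does not online it.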
2009-01-01 02:42:28 +03:00
2011-11-16 00:59:31 +04:00
static int __init alloc_frozen_cpus ( void )
2009-01-01 02:42:28 +03:00
{
if ( ! alloc_cpumask_var ( & frozen_cpus , GFP_KERNEL | __GFP_ZERO ) )
return - ENOMEM ;
return 0 ;
}
core_initcall ( alloc_frozen_cpus ) ;
2011-11-03 03:59:25 +04:00
/*
* When callbacks for CPU hotplug notifications are being executed, we must
* ensure that the state of the system with respect to the tasks being frozen
* or not, as reported by the notification, remains unchanged *throughout the
* duration* of the execution of the callbacks.
* Hence we need to prevent the freezer from racing with regular CPU hotplug.
*
* This synchronization is implemented by mutually excluding regular CPU
* hotplug and Suspend/Hibernate call paths by hooking onto the Suspend/
* Hibernate notifications.
*/
static int
cpu_hotplug_pm_callback ( struct notifier_block * nb ,
unsigned long action , void * ptr )
{
switch ( action ) {
case PM_SUSPEND_PREPARE :
case PM_HIBERNATION_PREPARE :
2013-06-13 01:04:36 +04:00
cpu_hotplug_disable ( ) ;
2011-11-03 03:59:25 +04:00
break ;
case PM_POST_SUSPEND :
case PM_POST_HIBERNATION :
2013-06-13 01:04:36 +04:00
cpu_hotplug_enable ( ) ;
2011-11-03 03:59:25 +04:00
break ;
default :
return NOTIFY_DONE ;
}
return NOTIFY_OK ;
}
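The mutual exclusion described above can be modelled in user space as a nesting counter. This is a sketch only: it assumes that disable and enable nest via a counter, as the `cpu_hotplug_disabled++` in freeze_secondary_cpus() suggests, and every `demo_*` / `DEMO_*` name is hypothetical. The return values 1 and 0 stand in for "handled" and "not handled".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the PM notification actions. */
enum demo_pm_event { DEMO_SUSPEND_PREPARE, DEMO_POST_SUSPEND, DEMO_OTHER };

/* Hypothetical model of the hotplug-disable nesting counter. */
struct demo_hotplug {
	int disabled;
};

/* Returns 1 if the event was handled, 0 if it was ignored. */
static int demo_pm_callback(struct demo_hotplug *hp, enum demo_pm_event ev)
{
	switch (ev) {
	case DEMO_SUSPEND_PREPARE:
		hp->disabled++;		/* block regular hotplug */
		break;
	case DEMO_POST_SUSPEND:
		assert(hp->disabled > 0);	/* enable must pair with a disable */
		hp->disabled--;
		break;
	default:
		return 0;
	}
	return 1;
}

/* Regular hotplug is allowed only while nobody holds it disabled. */
static bool demo_hotplug_allowed(const struct demo_hotplug *hp)
{
	return hp->disabled == 0;
}
```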
2011-11-16 00:59:31 +04:00
static int __init cpu_hotplug_pm_sync_init ( void )
2011-11-03 03:59:25 +04:00
{
2012-11-13 23:32:43 +04:00
/*
* cpu_hotplug_pm_callback has higher priority than x86
* bsp_pm_callback which depends on cpu_hotplug_pm_callback
* to disable cpu hotplug to avoid cpu hotplug race .
*/
2011-11-03 03:59:25 +04:00
pm_notifier ( cpu_hotplug_pm_callback , 0 ) ;
return 0 ;
}
core_initcall ( cpu_hotplug_pm_sync_init ) ;
2007-08-31 10:56:29 +04:00
# endif /* CONFIG_PM_SLEEP_SMP */
2008-05-29 22:17:02 +04:00
2017-03-20 14:26:55 +03:00
int __boot_cpu_id ;
2008-05-29 22:17:02 +04:00
# endif /* CONFIG_SMP */
2008-07-25 05:21:29 +04:00
2016-02-26 21:43:28 +03:00
/* Boot processor state steps */
2017-12-01 16:50:05 +03:00
static struct cpuhp_step cpuhp_hp_states [ ] = {
2016-02-26 21:43:28 +03:00
[ CPUHP_OFFLINE ] = {
. name = " offline " ,
2016-09-05 16:28:36 +03:00
. startup . single = NULL ,
. teardown . single = NULL ,
2016-02-26 21:43:28 +03:00
} ,
# ifdef CONFIG_SMP
[ CPUHP_CREATE_THREADS ] = {
2016-09-06 17:13:48 +03:00
. name = " threads:prepare " ,
2016-09-05 16:28:36 +03:00
. startup . single = smpboot_create_threads ,
. teardown . single = NULL ,
2016-02-26 21:43:32 +03:00
. cant_stop = true ,
2016-02-26 21:43:28 +03:00
} ,
2016-07-13 20:16:09 +03:00
[ CPUHP_PERF_PREPARE ] = {
2016-09-05 16:28:36 +03:00
. name = " perf:prepare " ,
. startup . single = perf_event_init_cpu ,
. teardown . single = perf_event_exit_cpu ,
2016-07-13 20:16:09 +03:00
} ,
random: clear fast pool, crng, and batches in cpuhp bring up
For the irq randomness fast pool, rather than having to use expensive
atomics, which were visibly the most expensive thing in the entire irq
handler, simply take care of the extreme edge case of resetting count to
zero in the cpuhp online handler, just after workqueues have been
reenabled. This simplifies the code a bit and lets us use vanilla
variables rather than atomics, and performance should be improved.
As well, very early on when the CPU comes up, while interrupts are still
disabled, we clear out the per-cpu crng and its batches, so that it
always starts with fresh randomness.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Sultan Alsawaf <sultan@kerneltoast.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-14 00:48:04 +03:00
[ CPUHP_RANDOM_PREPARE ] = {
. name = " random:prepare " ,
. startup . single = random_prepare_cpu ,
. teardown . single = NULL ,
} ,
2016-07-13 20:16:29 +03:00
[ CPUHP_WORKQUEUE_PREP ] = {
2016-09-05 16:28:36 +03:00
. name = " workqueue:prepare " ,
. startup . single = workqueue_prepare_cpu ,
. teardown . single = NULL ,
2016-07-13 20:16:29 +03:00
} ,
2016-07-15 11:41:04 +03:00
[ CPUHP_HRTIMERS_PREPARE ] = {
2016-09-05 16:28:36 +03:00
. name = " hrtimers:prepare " ,
. startup . single = hrtimers_prepare_cpu ,
. teardown . single = hrtimers_dead_cpu ,
2016-07-15 11:41:04 +03:00
} ,
2016-07-13 20:17:01 +03:00
[ CPUHP_SMPCFD_PREPARE ] = {
2016-09-06 17:13:48 +03:00
. name = " smpcfd:prepare " ,
2016-09-05 16:28:36 +03:00
. startup . single = smpcfd_prepare_cpu ,
. teardown . single = smpcfd_dead_cpu ,
2016-07-13 20:17:01 +03:00
} ,
2016-08-18 15:57:17 +03:00
[ CPUHP_RELAY_PREPARE ] = {
. name = " relay:prepare " ,
. startup . single = relay_prepare_cpu ,
. teardown . single = NULL ,
} ,
2016-08-23 15:53:19 +03:00
[ CPUHP_SLAB_PREPARE ] = {
. name = " slab:prepare " ,
. startup . single = slab_prepare_cpu ,
. teardown . single = slab_dead_cpu ,
2016-07-13 20:17:01 +03:00
} ,
2016-07-13 20:17:03 +03:00
[ CPUHP_RCUTREE_PREP ] = {
2016-09-06 17:13:48 +03:00
. name = " RCU/tree:prepare " ,
2016-09-05 16:28:36 +03:00
. startup . single = rcutree_prepare_cpu ,
. teardown . single = rcutree_dead_cpu ,
2016-07-13 20:17:03 +03:00
} ,
2016-07-27 12:08:18 +03:00
/*
* On the tear-down path, timers_dead_cpu() must be invoked
* before blk_mq_queue_reinit_notify() from notify_dead(),
* otherwise an RCU stall occurs.
*/
2017-12-27 23:37:25 +03:00
[ CPUHP_TIMERS_PREPARE ] = {
2018-07-24 17:47:48 +03:00
. name = " timers:prepare " ,
2017-12-27 23:37:25 +03:00
. startup . single = timers_prepare_cpu ,
2016-09-05 16:28:36 +03:00
. teardown . single = timers_dead_cpu ,
2016-07-27 12:08:18 +03:00
} ,
2016-03-08 12:36:13 +03:00
/* Kicks the plugged cpu into life */
2016-02-26 21:43:28 +03:00
[ CPUHP_BRINGUP_CPU ] = {
. name = " cpu:bringup " ,
2016-09-05 16:28:36 +03:00
. startup . single = bringup_cpu ,
2020-04-02 00:40:33 +03:00
. teardown . single = finish_cpu ,
2016-02-26 21:43:32 +03:00
. cant_stop = true ,
2016-02-26 21:43:29 +03:00
} ,
2016-03-08 12:36:13 +03:00
/* Final state before CPU kills itself */
[ CPUHP_AP_IDLE_DEAD ] = {
. name = " idle:dead " ,
} ,
/*
* Last state before CPU enters the idle loop to die . Transient state
* for synchronization .
*/
[ CPUHP_AP_OFFLINE ] = {
. name = " ap:offline " ,
. cant_stop = true ,
} ,
2016-03-10 14:54:09 +03:00
/* First state is scheduler control. Interrupts are disabled */
[ CPUHP_AP_SCHED_STARTING ] = {
. name = " sched:starting " ,
2016-09-05 16:28:36 +03:00
. startup . single = sched_cpu_starting ,
. teardown . single = sched_cpu_dying ,
2016-03-10 14:54:09 +03:00
} ,
2016-07-13 20:17:03 +03:00
[ CPUHP_AP_RCUTREE_DYING ] = {
2016-09-06 17:13:48 +03:00
. name = " RCU/tree:dying " ,
2016-09-05 16:28:36 +03:00
. startup . single = NULL ,
. teardown . single = rcutree_dying_cpu ,
2016-02-26 21:43:29 +03:00
} ,
2017-11-28 16:19:53 +03:00
[ CPUHP_AP_SMPCFD_DYING ] = {
. name = " smpcfd:dying " ,
. startup . single = NULL ,
. teardown . single = smpcfd_dying_cpu ,
} ,
2016-03-08 12:36:13 +03:00
/* Entry state on starting. Interrupts enabled from here on. Transient
* state for synchronization */
[ CPUHP_AP_ONLINE ] = {
. name = " ap:online " ,
} ,
2017-12-01 16:50:05 +03:00
/*
2020-09-16 10:27:18 +03:00
* Handled on control processor until the plugged processor manages
2017-12-01 16:50:05 +03:00
* this itself .
*/
[ CPUHP_TEARDOWN_CPU ] = {
. name = " cpu:teardown " ,
. startup . single = NULL ,
. teardown . single = takedown_cpu ,
. cant_stop = true ,
} ,
2020-09-16 10:27:18 +03:00
[ CPUHP_AP_SCHED_WAIT_EMPTY ] = {
. name = " sched:waitempty " ,
. startup . single = NULL ,
. teardown . single = sched_cpu_wait_empty ,
} ,
2016-03-08 12:36:13 +03:00
/* Handle smpboot threads park/unpark */
2016-02-26 21:43:39 +03:00
[ CPUHP_AP_SMPBOOT_THREADS ] = {
2016-09-06 17:13:48 +03:00
. name = " smpboot/threads:online " ,
2016-09-05 16:28:36 +03:00
. startup . single = smpboot_unpark_threads ,
2018-05-29 20:05:25 +03:00
. teardown . single = smpboot_park_threads ,
2016-02-26 21:43:39 +03:00
} ,
2017-06-20 02:37:51 +03:00
[ CPUHP_AP_IRQ_AFFINITY_ONLINE ] = {
. name = " irq/affinity:online " ,
. startup . single = irq_affinity_online_cpu ,
. teardown . single = NULL ,
} ,
2016-07-13 20:16:09 +03:00
[ CPUHP_AP_PERF_ONLINE ] = {
2016-09-05 16:28:36 +03:00
. name = " perf:online " ,
. startup . single = perf_event_init_cpu ,
. teardown . single = perf_event_exit_cpu ,
2016-07-13 20:16:09 +03:00
} ,
2018-06-07 11:52:03 +03:00
[ CPUHP_AP_WATCHDOG_ONLINE ] = {
. name = " lockup_detector:online " ,
. startup . single = lockup_detector_online_cpu ,
. teardown . single = lockup_detector_offline_cpu ,
} ,
2016-07-13 20:16:29 +03:00
[ CPUHP_AP_WORKQUEUE_ONLINE ] = {
2016-09-05 16:28:36 +03:00
. name = " workqueue:online " ,
. startup . single = workqueue_online_cpu ,
. teardown . single = workqueue_offline_cpu ,
2016-07-13 20:16:29 +03:00
} ,
2022-02-14 00:48:04 +03:00
[ CPUHP_AP_RANDOM_ONLINE ] = {
. name = " random:online " ,
. startup . single = random_online_cpu ,
. teardown . single = NULL ,
} ,
2016-07-13 20:17:03 +03:00
[ CPUHP_AP_RCUTREE_ONLINE ] = {
2016-09-06 17:13:48 +03:00
. name = " RCU/tree:online " ,
2016-09-05 16:28:36 +03:00
. startup . single = rcutree_online_cpu ,
. teardown . single = rcutree_offline_cpu ,
2016-07-13 20:17:03 +03:00
} ,
2016-02-26 21:43:29 +03:00
# endif
2016-03-08 12:36:13 +03:00
/*
* The dynamically registered state space is here
*/
2016-03-10 14:54:19 +03:00
# ifdef CONFIG_SMP
/* Last state is scheduler control setting the cpu active */
[ CPUHP_AP_ACTIVE ] = {
. name = " sched:active " ,
2016-09-05 16:28:36 +03:00
. startup . single = sched_cpu_activate ,
. teardown . single = sched_cpu_deactivate ,
2016-03-10 14:54:19 +03:00
} ,
# endif
2016-03-08 12:36:13 +03:00
/* CPU is fully up and running. */
2016-02-26 21:43:29 +03:00
[ CPUHP_ONLINE ] = {
. name = " online " ,
2016-09-05 16:28:36 +03:00
. startup . single = NULL ,
. teardown . single = NULL ,
2016-02-26 21:43:29 +03:00
} ,
} ;
2016-02-26 21:43:33 +03:00
/* Sanity check for callbacks */
static int cpuhp_cb_check ( enum cpuhp_state state )
{
if ( state < = CPUHP_OFFLINE | | state > = CPUHP_ONLINE )
return - EINVAL ;
return 0 ;
}
2016-12-21 22:19:49 +03:00
/*
* Returns a free slot for dynamic state assignment in the Online or Prepare
* range. The states are protected by cpuhp_state_mutex and an empty slot is
* identified by having no name assigned.
*/
static int cpuhp_reserve_state ( enum cpuhp_state state )
{
2017-01-10 16:01:05 +03:00
enum cpuhp_state i , end ;
struct cpuhp_step * step ;
2016-12-21 22:19:49 +03:00
2017-01-10 16:01:05 +03:00
switch ( state ) {
case CPUHP_AP_ONLINE_DYN :
2017-12-01 16:50:05 +03:00
step = cpuhp_hp_states + CPUHP_AP_ONLINE_DYN ;
2017-01-10 16:01:05 +03:00
end = CPUHP_AP_ONLINE_DYN_END ;
break ;
case CPUHP_BP_PREPARE_DYN :
2017-12-01 16:50:05 +03:00
step = cpuhp_hp_states + CPUHP_BP_PREPARE_DYN ;
2017-01-10 16:01:05 +03:00
end = CPUHP_BP_PREPARE_DYN_END ;
break ;
default :
return - EINVAL ;
}
for ( i = state ; i < = end ; i + + , step + + ) {
if ( ! step - > name )
2016-12-21 22:19:49 +03:00
return i ;
}
WARN ( 1 , " No more dynamic states available for CPU hotplug \n " ) ;
return - ENOSPC ;
}
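The slot search above reduces to a linear scan in which a missing name marks a free entry. A minimal user-space sketch, assuming a simplified step record; `struct demo_step` and `demo_reserve_slot()` are hypothetical names, not kernel API:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct cpuhp_step. */
struct demo_step {
	const char *name;	/* NULL marks a free slot */
};

/*
 * Scan steps[first..last] (inclusive) and return the index of the first
 * slot without a name, or -ENOSPC if every slot is taken.
 */
static int demo_reserve_slot(const struct demo_step *steps, int first, int last)
{
	int i;

	assert(steps != NULL && first <= last);

	for (i = first; i <= last; i++) {
		if (!steps[i].name)
			return i;
	}
	return -ENOSPC;
}
```

Because the name doubles as the "in use" flag, storing a name claims a slot and clearing it releases the slot, with no separate bookkeeping.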
static int cpuhp_store_callbacks ( enum cpuhp_state state , const char * name ,
int ( * startup ) ( unsigned int cpu ) ,
int ( * teardown ) ( unsigned int cpu ) ,
bool multi_instance )
2016-02-26 21:43:33 +03:00
{
/* (Un)Install the callbacks for further cpu hotplug operations */
struct cpuhp_step * sp ;
2016-12-21 22:19:49 +03:00
int ret = 0 ;
2016-02-26 21:43:33 +03:00
2017-07-20 01:36:00 +03:00
/*
* If name is NULL, then the state gets removed.
*
* CPUHP_AP_ONLINE_DYN and CPUHP_BP_PREPARE_DYN are handed out on
* the first allocation from these dynamic ranges, so the removal
* would trigger a new allocation and clear the wrong (already
* empty) state, leaving the callbacks of the to-be-cleared state
* dangling, which causes wreckage on the next hotplug operation.
*/
if ( name & & ( state = = CPUHP_AP_ONLINE_DYN | |
state = = CPUHP_BP_PREPARE_DYN ) ) {
2016-12-21 22:19:49 +03:00
ret = cpuhp_reserve_state ( state ) ;
if ( ret < 0 )
2017-03-14 18:06:45 +03:00
return ret ;
2016-12-21 22:19:49 +03:00
state = ret ;
}
2016-02-26 21:43:33 +03:00
sp = cpuhp_get_step ( state ) ;
2017-03-14 18:06:45 +03:00
if ( name & & sp - > name )
return - EBUSY ;
2016-09-05 16:28:36 +03:00
sp - > startup . single = startup ;
sp - > teardown . single = teardown ;
2016-02-26 21:43:33 +03:00
sp - > name = name ;
2016-08-12 20:49:39 +03:00
sp - > multi_instance = multi_instance ;
INIT_HLIST_HEAD ( & sp - > list ) ;
2016-12-21 22:19:49 +03:00
return ret ;
2016-02-26 21:43:33 +03:00
}
static void * cpuhp_get_teardown_cb ( enum cpuhp_state state )
{
2016-09-05 16:28:36 +03:00
return cpuhp_get_step ( state ) - > teardown . single ;
2016-02-26 21:43:33 +03:00
}
/*
* Call the startup / teardown function for a step either on the AP or
* on the current CPU .
*/
2016-08-12 20:49:39 +03:00
static int cpuhp_issue_call ( int cpu , enum cpuhp_state state , bool bringup ,
struct hlist_node * node )
2016-02-26 21:43:33 +03:00
{
2016-08-12 20:49:38 +03:00
struct cpuhp_step * sp = cpuhp_get_step ( state ) ;
2016-02-26 21:43:33 +03:00
int ret ;
2017-09-20 20:00:17 +03:00
/*
* If there's nothing to do, we are done.
* Relies on the union for multi_instance .
*/
2021-02-16 13:35:06 +03:00
if ( cpuhp_step_empty ( bringup , sp ) )
2016-02-26 21:43:33 +03:00
return 0 ;
/*
* The non-AP-bound callbacks can fail on bringup. On teardown,
* e.g. module removal, we crash for now.
*/
2016-02-26 21:43:39 +03:00
# ifdef CONFIG_SMP
if ( cpuhp_is_ap_state ( state ) )
2016-08-12 20:49:39 +03:00
ret = cpuhp_invoke_ap_callback ( cpu , state , bringup , node ) ;
2016-02-26 21:43:39 +03:00
else
2017-09-20 20:00:16 +03:00
ret = cpuhp_invoke_callback ( cpu , state , bringup , node , NULL ) ;
2016-02-26 21:43:39 +03:00
# else
2017-09-20 20:00:16 +03:00
ret = cpuhp_invoke_callback ( cpu , state , bringup , node , NULL ) ;
2016-02-26 21:43:39 +03:00
# endif
2016-02-26 21:43:33 +03:00
BUG_ON ( ret & & ! bringup ) ;
return ret ;
}
/*
* Called from __cpuhp_setup_state on a recoverable failure .
*
* Note : The teardown callbacks for rollback are not allowed to fail !
*/
static void cpuhp_rollback_install ( int failedcpu , enum cpuhp_state state ,
2016-08-12 20:49:39 +03:00
struct hlist_node * node )
2016-02-26 21:43:33 +03:00
{
int cpu ;
/* Roll back the already executed steps on the other cpus */
for_each_present_cpu ( cpu ) {
struct cpuhp_cpu_state * st = per_cpu_ptr ( & cpuhp_state , cpu ) ;
int cpustate = st - > state ;
if ( cpu > = failedcpu )
break ;
/* Did we invoke the startup call on that cpu ? */
if ( cpustate > = state )
2016-08-12 20:49:39 +03:00
cpuhp_issue_call ( cpu , state , false , node ) ;
2016-02-26 21:43:33 +03:00
}
}
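The install-then-roll-back pattern used here can be sketched in user space as follows. This is an illustration under assumptions, not the kernel's implementation: the `demo_*` names are hypothetical, CPUs are modelled as a small array, and the startup callback fails on a caller-chosen CPU so that the rollback path can be exercised.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_NR_CPUS 4

/* Hypothetical per-CPU record: hotplug state and install status. */
struct demo_cpu {
	int state;
	bool installed;
};

/* Startup callback model: fails on the CPU given by fail_cpu (-1: never). */
static int demo_startup(struct demo_cpu *c, int cpu, int fail_cpu)
{
	if (cpu == fail_cpu)
		return -1;
	c->installed = true;
	return 0;
}

/* Undo the startup on every CPU below failed_cpu that had reached 'state'. */
static void demo_rollback(struct demo_cpu *cpus, int failed_cpu, int state)
{
	for (int cpu = 0; cpu < failed_cpu; cpu++) {
		if (cpus[cpu].state >= state)
			cpus[cpu].installed = false;
	}
}

/* Run the startup on all CPUs at or above 'state'; roll back on failure. */
static int demo_install(struct demo_cpu *cpus, int state, int fail_cpu)
{
	assert(cpus);

	for (int cpu = 0; cpu < DEMO_NR_CPUS; cpu++) {
		int ret;

		if (cpus[cpu].state < state)
			continue;	/* this CPU has not reached the state yet */

		ret = demo_startup(&cpus[cpu], cpu, fail_cpu);
		if (ret) {
			demo_rollback(cpus, cpu, state);
			return ret;
		}
	}
	return 0;
}
```

The rollback stops at the failed CPU because CPUs are visited in ascending order, so no CPU at or above it has run the startup callback.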
2017-05-24 11:15:15 +03:00
int __cpuhp_state_add_instance_cpuslocked ( enum cpuhp_state state ,
struct hlist_node * node ,
bool invoke )
2016-08-12 20:49:39 +03:00
{
struct cpuhp_step * sp ;
int cpu ;
int ret ;
2017-05-24 11:15:15 +03:00
lockdep_assert_cpus_held ( ) ;
2016-08-12 20:49:39 +03:00
sp = cpuhp_get_step ( state ) ;
if ( sp - > multi_instance = = false )
return - EINVAL ;
2017-03-14 18:06:45 +03:00
mutex_lock ( & cpuhp_state_mutex ) ;
2016-08-12 20:49:39 +03:00
2016-09-05 16:28:36 +03:00
if ( ! invoke | | ! sp - > startup . multi )
2016-08-12 20:49:39 +03:00
goto add_node ;
/*
* Try to call the startup callback for each present cpu
* depending on the hotplug state of the cpu .
*/
for_each_present_cpu ( cpu ) {
struct cpuhp_cpu_state * st = per_cpu_ptr ( & cpuhp_state , cpu ) ;
int cpustate = st - > state ;
if ( cpustate < state )
continue ;
ret = cpuhp_issue_call ( cpu , state , true , node ) ;
if ( ret ) {
2016-09-05 16:28:36 +03:00
if ( sp - > teardown . multi )
2016-08-12 20:49:39 +03:00
cpuhp_rollback_install ( cpu , state , node ) ;
2017-03-14 18:06:45 +03:00
goto unlock ;
2016-08-12 20:49:39 +03:00
}
}
add_node :
ret = 0 ;
hlist_add_head ( node , & sp - > list ) ;
2017-03-14 18:06:45 +03:00
unlock :
2016-08-12 20:49:39 +03:00
mutex_unlock ( & cpuhp_state_mutex ) ;
2017-05-24 11:15:15 +03:00
return ret ;
}
int __cpuhp_state_add_instance ( enum cpuhp_state state , struct hlist_node * node ,
bool invoke )
{
int ret ;
cpus_read_lock ( ) ;
ret = __cpuhp_state_add_instance_cpuslocked ( state , node , invoke ) ;
2017-05-24 11:15:12 +03:00
cpus_read_unlock ( ) ;
2016-08-12 20:49:39 +03:00
return ret ;
}
EXPORT_SYMBOL_GPL ( __cpuhp_state_add_instance ) ;
2016-02-26 21:43:33 +03:00
/**
2017-05-24 11:15:14 +03:00
* __cpuhp_setup_state_cpuslocked - Set up the callbacks for a hotplug machine state
2016-12-21 22:19:49 +03:00
* @state: The state to set up
2021-06-05 09:30:03 +03:00
* @ name : Name of the step
2016-12-21 22:19:49 +03:00
* @ invoke : If true , the startup function is invoked for cpus where
* cpu state > = @ state
* @ startup : startup callback function
* @ teardown : teardown callback function
* @ multi_instance : State is set up for multiple instances which get
* added afterwards .
2016-02-26 21:43:33 +03:00
*
2017-05-24 11:15:14 +03:00
* The caller needs to hold cpus read locked while calling this function .
2021-08-10 01:38:25 +03:00
* Return :
2016-12-15 18:00:57 +03:00
* On success :
2021-08-10 01:38:25 +03:00
* Positive state number if @ state is CPUHP_AP_ONLINE_DYN ;
2016-12-15 18:00:57 +03:00
* 0 for all other states
* On failure : proper ( negative ) error code
2016-02-26 21:43:33 +03:00
*/
2017-05-24 11:15:14 +03:00
int __cpuhp_setup_state_cpuslocked ( enum cpuhp_state state ,
const char * name , bool invoke ,
int ( * startup ) ( unsigned int cpu ) ,
int ( * teardown ) ( unsigned int cpu ) ,
bool multi_instance )
2016-02-26 21:43:33 +03:00
{
int cpu , ret = 0 ;
2016-12-27 00:58:19 +03:00
bool dynstate ;
2016-02-26 21:43:33 +03:00
2017-05-24 11:15:14 +03:00
lockdep_assert_cpus_held ( ) ;
2016-02-26 21:43:33 +03:00
if ( cpuhp_cb_check ( state ) | | ! name )
return - EINVAL ;
2017-03-14 18:06:45 +03:00
mutex_lock ( & cpuhp_state_mutex ) ;
2016-02-26 21:43:33 +03:00
2016-12-21 22:19:49 +03:00
ret = cpuhp_store_callbacks ( state , name , startup , teardown ,
multi_instance ) ;
2016-02-26 21:43:33 +03:00
2016-12-27 00:58:19 +03:00
dynstate = state = = CPUHP_AP_ONLINE_DYN ;
if ( ret > 0 & & dynstate ) {
state = ret ;
ret = 0 ;
}
2016-12-21 22:19:49 +03:00
if ( ret | | ! invoke | | ! startup )
2016-02-26 21:43:33 +03:00
goto out ;
/*
* Try to call the startup callback for each present cpu
* depending on the hotplug state of the cpu .
*/
for_each_present_cpu ( cpu ) {
struct cpuhp_cpu_state * st = per_cpu_ptr ( & cpuhp_state , cpu ) ;
int cpustate = st - > state ;
if ( cpustate < state )
continue ;
2016-08-12 20:49:39 +03:00
ret = cpuhp_issue_call ( cpu , state , true , NULL ) ;
2016-02-26 21:43:33 +03:00
if ( ret ) {
2016-08-12 20:49:38 +03:00
if ( teardown )
2016-08-12 20:49:39 +03:00
cpuhp_rollback_install ( cpu , state , NULL ) ;
cpuhp_store_callbacks ( state , NULL , NULL , NULL , false ) ;
2016-02-26 21:43:33 +03:00
goto out ;
}
}
out :
2017-03-14 18:06:45 +03:00
mutex_unlock ( & cpuhp_state_mutex ) ;
2016-12-21 22:19:49 +03:00
/*
* If the requested state is CPUHP_AP_ONLINE_DYN , return the
* dynamically allocated state in case of success .
*/
2016-12-27 00:58:19 +03:00
if ( ! ret & & dynstate )
2016-02-26 21:43:33 +03:00
return state ;
return ret ;
}
2017-05-24 11:15:14 +03:00
EXPORT_SYMBOL ( __cpuhp_setup_state_cpuslocked ) ;
int __cpuhp_setup_state ( enum cpuhp_state state ,
const char * name , bool invoke ,
int ( * startup ) ( unsigned int cpu ) ,
int ( * teardown ) ( unsigned int cpu ) ,
bool multi_instance )
{
int ret ;
cpus_read_lock ( ) ;
ret = __cpuhp_setup_state_cpuslocked ( state , name , invoke , startup ,
teardown , multi_instance ) ;
cpus_read_unlock ( ) ;
return ret ;
}
2016-02-26 21:43:33 +03:00
EXPORT_SYMBOL ( __cpuhp_setup_state ) ;
2016-08-12 20:49:39 +03:00
int __cpuhp_state_remove_instance ( enum cpuhp_state state ,
struct hlist_node * node , bool invoke )
{
struct cpuhp_step * sp = cpuhp_get_step ( state ) ;
int cpu ;
BUG_ON ( cpuhp_cb_check ( state ) ) ;
if ( ! sp - > multi_instance )
return - EINVAL ;
2017-05-24 11:15:12 +03:00
cpus_read_lock ( ) ;
2017-03-14 18:06:45 +03:00
mutex_lock ( & cpuhp_state_mutex ) ;
2016-08-12 20:49:39 +03:00
if ( ! invoke | | ! cpuhp_get_teardown_cb ( state ) )
goto remove ;
/*
* Call the teardown callback for each present cpu depending
* on the hotplug state of the cpu . This function is not
* allowed to fail currently !
*/
for_each_present_cpu ( cpu ) {
struct cpuhp_cpu_state * st = per_cpu_ptr ( & cpuhp_state , cpu ) ;
int cpustate = st - > state ;
if ( cpustate > = state )
cpuhp_issue_call ( cpu , state , false , node ) ;
}
remove :
hlist_del ( node ) ;
mutex_unlock ( & cpuhp_state_mutex ) ;
2017-05-24 11:15:12 +03:00
cpus_read_unlock ( ) ;
2016-08-12 20:49:39 +03:00
return 0 ;
}
EXPORT_SYMBOL_GPL ( __cpuhp_state_remove_instance ) ;
2017-03-14 18:06:45 +03:00
2016-02-26 21:43:33 +03:00
/**
2017-05-24 11:15:14 +03:00
* __cpuhp_remove_state_cpuslocked - Remove the callbacks for a hotplug machine state
2016-02-26 21:43:33 +03:00
* @ state : The state to remove
* @ invoke : If true , the teardown function is invoked for cpus where
* cpu state > = @ state
*
2017-05-24 11:15:14 +03:00
* The caller needs to hold cpus read locked while calling this function .
2016-02-26 21:43:33 +03:00
* The teardown callback is currently not allowed to fail . Think
* about module removal !
*/
2017-05-24 11:15:14 +03:00
void __cpuhp_remove_state_cpuslocked ( enum cpuhp_state state , bool invoke )
2016-02-26 21:43:33 +03:00
{
2016-08-12 20:49:39 +03:00
struct cpuhp_step * sp = cpuhp_get_step ( state ) ;
2016-02-26 21:43:33 +03:00
int cpu ;
BUG_ON ( cpuhp_cb_check ( state ) ) ;
2017-05-24 11:15:14 +03:00
lockdep_assert_cpus_held ( ) ;
2016-02-26 21:43:33 +03:00
2017-03-14 18:06:45 +03:00
mutex_lock ( & cpuhp_state_mutex ) ;
2016-08-12 20:49:39 +03:00
if ( sp - > multi_instance ) {
WARN ( ! hlist_empty ( & sp - > list ) ,
" Error: Removing state %d which has instances left. \n " ,
state ) ;
goto remove ;
}
2016-08-12 20:49:38 +03:00
if ( ! invoke | | ! cpuhp_get_teardown_cb ( state ) )
2016-02-26 21:43:33 +03:00
goto remove ;
/*
* Call the teardown callback for each present cpu depending
* on the hotplug state of the cpu . This function is not
* allowed to fail currently !
*/
for_each_present_cpu ( cpu ) {
struct cpuhp_cpu_state * st = per_cpu_ptr ( & cpuhp_state , cpu ) ;
int cpustate = st - > state ;
if ( cpustate > = state )
2016-08-12 20:49:39 +03:00
cpuhp_issue_call ( cpu , state , false , NULL ) ;
2016-02-26 21:43:33 +03:00
}
remove :
2016-08-12 20:49:39 +03:00
cpuhp_store_callbacks ( state , NULL , NULL , NULL , false ) ;
2017-03-14 18:06:45 +03:00
mutex_unlock ( & cpuhp_state_mutex ) ;
2017-05-24 11:15:14 +03:00
}
EXPORT_SYMBOL ( __cpuhp_remove_state_cpuslocked ) ;
void __cpuhp_remove_state ( enum cpuhp_state state , bool invoke )
{
cpus_read_lock ( ) ;
__cpuhp_remove_state_cpuslocked ( state , invoke ) ;
2017-05-24 11:15:12 +03:00
cpus_read_unlock ( ) ;
2016-02-26 21:43:33 +03:00
}
EXPORT_SYMBOL ( __cpuhp_remove_state ) ;
2019-12-10 22:56:04 +03:00
# ifdef CONFIG_HOTPLUG_SMT
static void cpuhp_offline_cpu_device ( unsigned int cpu )
{
struct device * dev = get_cpu_device ( cpu ) ;
dev - > offline = true ;
/* Tell user space about the state change */
kobject_uevent ( & dev - > kobj , KOBJ_OFFLINE ) ;
}
static void cpuhp_online_cpu_device ( unsigned int cpu )
{
struct device * dev = get_cpu_device ( cpu ) ;
dev - > offline = false ;
/* Tell user space about the state change */
kobject_uevent ( & dev - > kobj , KOBJ_ONLINE ) ;
}
int cpuhp_smt_disable ( enum cpuhp_smt_control ctrlval )
{
int cpu , ret = 0 ;
cpu_maps_update_begin ( ) ;
for_each_online_cpu ( cpu ) {
if ( topology_is_primary_thread ( cpu ) )
continue ;
ret = cpu_down_maps_locked ( cpu , CPUHP_OFFLINE ) ;
if ( ret )
break ;
/*
* As this needs to hold the cpu maps lock, it's impossible
* to call device_offline() because that ends up calling
* cpu_down(), which takes the cpu maps lock. The cpu maps lock
* needs to be held as this might race against in-kernel
* abusers of the hotplug machinery (thermal management).
*
* So nothing would update the device:offline state. That would
* leave the sysfs entry stale and prevent onlining after
* smt control has been changed to 'off' again. This is
* called under the sysfs hotplug lock, so it is properly
* serialized against the regular offline usage.
*/
cpuhp_offline_cpu_device ( cpu ) ;
}
if ( ! ret )
cpu_smt_control = ctrlval ;
cpu_maps_update_done ( ) ;
return ret ;
}
int cpuhp_smt_enable ( void )
{
int cpu , ret = 0 ;
cpu_maps_update_begin ( ) ;
cpu_smt_control = CPU_SMT_ENABLED ;
for_each_present_cpu ( cpu ) {
/* Skip online CPUs and CPUs on offline nodes */
if ( cpu_online ( cpu ) | | ! node_online ( cpu_to_node ( cpu ) ) )
continue ;
ret = _cpu_up ( cpu , 0 , CPUHP_ONLINE ) ;
if ( ret )
break ;
/* See comment in cpuhp_smt_disable() */
cpuhp_online_cpu_device ( cpu ) ;
}
cpu_maps_update_done ( ) ;
return ret ;
}
# endif
2016-02-26 21:43:31 +03:00
# if defined(CONFIG_SYSFS) && defined(CONFIG_HOTPLUG_CPU)
2021-05-27 17:11:05 +03:00
static ssize_t state_show ( struct device * dev ,
struct device_attribute * attr , char * buf )
2016-02-26 21:43:31 +03:00
{
struct cpuhp_cpu_state * st = per_cpu_ptr ( & cpuhp_state , dev - > id ) ;
return sprintf ( buf , " %d \n " , st - > state ) ;
}
2021-05-27 17:11:05 +03:00
static DEVICE_ATTR_RO ( state ) ;
2016-02-26 21:43:31 +03:00
2021-05-27 17:11:05 +03:00
static ssize_t target_store ( struct device * dev , struct device_attribute * attr ,
const char * buf , size_t count )
2016-02-26 21:43:32 +03:00
{
struct cpuhp_cpu_state * st = per_cpu_ptr ( & cpuhp_state , dev - > id ) ;
struct cpuhp_step * sp ;
int target , ret ;
ret = kstrtoint ( buf , 10 , & target ) ;
if ( ret )
return ret ;
# ifdef CONFIG_CPU_HOTPLUG_STATE_CONTROL
if ( target < CPUHP_OFFLINE | | target > CPUHP_ONLINE )
return - EINVAL ;
# else
if ( target ! = CPUHP_OFFLINE & & target ! = CPUHP_ONLINE )
return - EINVAL ;
# endif
ret = lock_device_hotplug_sysfs ( ) ;
if ( ret )
return ret ;
mutex_lock ( & cpuhp_state_mutex ) ;
sp = cpuhp_get_step ( target ) ;
ret = ! sp - > name | | sp - > cant_stop ? - EINVAL : 0 ;
mutex_unlock ( & cpuhp_state_mutex ) ;
if ( ret )
2017-06-02 17:27:14 +03:00
goto out ;
2016-02-26 21:43:32 +03:00
if ( st - > state < target )
2020-03-23 16:51:10 +03:00
ret = cpu_up ( dev - > id , target ) ;
2022-11-17 19:23:28 +03:00
else if ( st - > state > target )
2020-03-23 16:51:10 +03:00
ret = cpu_down ( dev - > id , target ) ;
2022-11-17 19:23:28 +03:00
else if ( WARN_ON ( st - > target ! = target ) )
st - > target = target ;
2017-06-02 17:27:14 +03:00
out :
2016-02-26 21:43:32 +03:00
unlock_device_hotplug ( ) ;
return ret ? ret : count ;
}
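The input handling in target_store() follows a parse, range-check, then pick-a-direction shape. The sketch below models that shape in user space. It is an assumption-laden illustration: strtol() stands in for kstrtoint() and collapses its distinct error codes into -EINVAL, `DEMO_OFFLINE` / `DEMO_ONLINE` are made-up bounds, and the per-step checks (missing name, cant_stop) as well as locking are omitted.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

enum demo_action { DEMO_NONE, DEMO_UP, DEMO_DOWN };

/* Hypothetical bounds standing in for CPUHP_OFFLINE and CPUHP_ONLINE. */
#define DEMO_OFFLINE 0
#define DEMO_ONLINE  200

/*
 * Parse a decimal target from buf (one trailing newline is tolerated),
 * validate it against the state range and report which direction the
 * CPU would have to move from cur_state. Returns 0 or -EINVAL.
 */
static int demo_parse_target(const char *buf, int cur_state,
			     enum demo_action *act)
{
	char *end;
	long val;

	assert(buf != NULL && act != NULL);

	errno = 0;
	val = strtol(buf, &end, 10);
	if (errno || end == buf || val < INT_MIN || val > INT_MAX)
		return -EINVAL;
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;

	if (val < DEMO_OFFLINE || val > DEMO_ONLINE)
		return -EINVAL;

	if (cur_state < val)
		*act = DEMO_UP;
	else if (cur_state > val)
		*act = DEMO_DOWN;
	else
		*act = DEMO_NONE;
	return 0;
}
```

Validating before taking any lock keeps malformed writes cheap, which matches the order of checks in the function above.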
2021-05-27 17:11:05 +03:00
static ssize_t target_show(struct device *dev,
			   struct device_attribute *attr, char *buf)
{
	struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, dev->id);

	return sprintf(buf, "%d\n", st->target);
}
2021-05-27 17:11:05 +03:00
static DEVICE_ATTR_RW ( target ) ;
2017-09-20 20:00:21 +03:00
2021-05-27 17:11:05 +03:00
static ssize_t fail_store(struct device *dev, struct device_attribute *attr,
			  const char *buf, size_t count)
{
	struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, dev->id);
	struct cpuhp_step *sp;
	int fail, ret;

	ret = kstrtoint(buf, 10, &fail);
	if (ret)
		return ret;

	if (fail == CPUHP_INVALID) {
		st->fail = fail;
		return count;
	}

	if (fail < CPUHP_OFFLINE || fail > CPUHP_ONLINE)
		return -EINVAL;

	/*
	 * Cannot fail STARTING/DYING callbacks.
	 */
	if (cpuhp_is_atomic_state(fail))
		return -EINVAL;

	/*
	 * DEAD callbacks cannot fail...
	 * ... neither can CPUHP_BRINGUP_CPU during hotunplug. The latter
	 * triggering STARTING callbacks, a failure in this state would
	 * hinder rollback.
	 */
	if (fail <= CPUHP_BRINGUP_CPU && st->state > CPUHP_BRINGUP_CPU)
		return -EINVAL;

	/*
	 * Cannot fail anything that doesn't have callbacks.
	 */
	mutex_lock(&cpuhp_state_mutex);
	sp = cpuhp_get_step(fail);
	if (!sp->startup.single && !sp->teardown.single)
		ret = -EINVAL;
	mutex_unlock(&cpuhp_state_mutex);
	if (ret)
		return ret;

	st->fail = fail;

	return count;
}
2021-05-27 17:11:05 +03:00
static ssize_t fail_show(struct device *dev,
			 struct device_attribute *attr, char *buf)
{
	struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, dev->id);

	return sprintf(buf, "%d\n", st->fail);
}
2021-05-27 17:11:05 +03:00
static DEVICE_ATTR_RW ( fail ) ;
2017-09-20 20:00:21 +03:00
2016-02-26 21:43:31 +03:00
static struct attribute *cpuhp_cpu_attrs[] = {
	&dev_attr_state.attr,
	&dev_attr_target.attr,
	&dev_attr_fail.attr,
	NULL
};
2017-06-29 15:10:47 +03:00
static const struct attribute_group cpuhp_cpu_attr_group = {
	.attrs = cpuhp_cpu_attrs,
	.name = "hotplug",
	NULL
};
2021-05-27 17:11:05 +03:00
static ssize_t states_show(struct device *dev,
			   struct device_attribute *attr, char *buf)
{
	ssize_t cur, res = 0;
	int i;

	mutex_lock(&cpuhp_state_mutex);
	for (i = CPUHP_OFFLINE; i <= CPUHP_ONLINE; i++) {
		struct cpuhp_step *sp = cpuhp_get_step(i);

		if (sp->name) {
			cur = sprintf(buf, "%3d: %s\n", i, sp->name);
			buf += cur;
			res += cur;
		}
	}
	mutex_unlock(&cpuhp_state_mutex);
	return res;
}
2021-05-27 17:11:05 +03:00
static DEVICE_ATTR_RO ( states ) ;
2016-02-26 21:43:31 +03:00
static struct attribute * cpuhp_cpu_root_attrs [ ] = {
& dev_attr_states . attr ,
NULL
} ;
2017-06-29 15:10:47 +03:00
static const struct attribute_group cpuhp_cpu_root_attr_group = {
	.attrs = cpuhp_cpu_root_attrs,
	.name = "hotplug",
	NULL
};
2018-05-29 18:48:27 +03:00
# ifdef CONFIG_HOTPLUG_SMT
static ssize_t
__store_smt_control(struct device *dev, struct device_attribute *attr,
		    const char *buf, size_t count)
{
	int ctrlval, ret;

	if (sysfs_streq(buf, "on"))
		ctrlval = CPU_SMT_ENABLED;
	else if (sysfs_streq(buf, "off"))
		ctrlval = CPU_SMT_DISABLED;
	else if (sysfs_streq(buf, "forceoff"))
		ctrlval = CPU_SMT_FORCE_DISABLED;
	else
		return -EINVAL;

	if (cpu_smt_control == CPU_SMT_FORCE_DISABLED)
		return -EPERM;

	if (cpu_smt_control == CPU_SMT_NOT_SUPPORTED)
		return -ENODEV;

	ret = lock_device_hotplug_sysfs();
	if (ret)
		return ret;

	if (ctrlval != cpu_smt_control) {
		switch (ctrlval) {
		case CPU_SMT_ENABLED:
			ret = cpuhp_smt_enable();
			break;
		case CPU_SMT_DISABLED:
		case CPU_SMT_FORCE_DISABLED:
			ret = cpuhp_smt_disable(ctrlval);
			break;
		}
	}

	unlock_device_hotplug();
	return ret ? ret : count;
}
2019-03-27 15:00:29 +03:00
# else /* !CONFIG_HOTPLUG_SMT */
static ssize_t
__store_smt_control ( struct device * dev , struct device_attribute * attr ,
const char * buf , size_t count )
{
return - ENODEV ;
}
# endif /* CONFIG_HOTPLUG_SMT */
static const char *smt_states[] = {
	[CPU_SMT_ENABLED]		= "on",
	[CPU_SMT_DISABLED]		= "off",
	[CPU_SMT_FORCE_DISABLED]	= "forceoff",
	[CPU_SMT_NOT_SUPPORTED]		= "notsupported",
	[CPU_SMT_NOT_IMPLEMENTED]	= "notimplemented",
};
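The string handling around the SMT control file can be tried out in userspace. The following is a minimal sketch, not the kernel implementation: `enum smt_control`, `smt_names`, `streq_nl()`, `parse_smt_control()` and `smt_control_name()` are hypothetical stand-ins, and `streq_nl()` assumes the behaviour sysfs_streq() is understood to have, namely that a single trailing newline on either string is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-in for the kernel's SMT control values. */
enum smt_control {
	SMT_INVALID = -1,
	SMT_ENABLED,
	SMT_DISABLED,
	SMT_FORCE_DISABLED,
	SMT_NOT_SUPPORTED,
	SMT_NOT_IMPLEMENTED,
};

/* Designated initializers keep the table in step with the enum. */
static const char *const smt_names[] = {
	[SMT_ENABLED]		= "on",
	[SMT_DISABLED]		= "off",
	[SMT_FORCE_DISABLED]	= "forceoff",
	[SMT_NOT_SUPPORTED]	= "notsupported",
	[SMT_NOT_IMPLEMENTED]	= "notimplemented",
};

/* Equal if identical, or if they differ only by one trailing newline. */
static bool streq_nl(const char *s1, const char *s2)
{
	while (*s1 && *s1 == *s2) {
		s1++;
		s2++;
	}
	if (*s1 == *s2)
		return true;
	if (!*s1 && *s2 == '\n' && !s2[1])
		return true;
	if (*s1 == '\n' && !s1[1] && !*s2)
		return true;
	return false;
}

/* Only "on", "off" and "forceoff" are accepted as input. */
static enum smt_control parse_smt_control(const char *buf)
{
	if (streq_nl(buf, "on"))
		return SMT_ENABLED;
	if (streq_nl(buf, "off"))
		return SMT_DISABLED;
	if (streq_nl(buf, "forceoff"))
		return SMT_FORCE_DISABLED;
	return SMT_INVALID;
}

/* Reverse mapping, as a show routine would use it. */
static const char *smt_control_name(enum smt_control c)
{
	return smt_names[c];
}
```

Note the asymmetry: the read-only states "notsupported" and "notimplemented" can be reported but are rejected as input, which matches the three-way string match in the store path.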
2021-05-27 17:11:05 +03:00
static ssize_t control_show(struct device *dev,
			    struct device_attribute *attr, char *buf)
{
	const char *state = smt_states[cpu_smt_control];

	return snprintf(buf, PAGE_SIZE - 2, "%s\n", state);
}
2021-05-27 17:11:05 +03:00
static ssize_t control_store ( struct device * dev , struct device_attribute * attr ,
const char * buf , size_t count )
2019-03-27 15:00:29 +03:00
{
return __store_smt_control ( dev , attr , buf , count ) ;
}
2021-05-27 17:11:05 +03:00
static DEVICE_ATTR_RW ( control ) ;
2018-05-29 18:48:27 +03:00
2021-05-27 17:11:05 +03:00
static ssize_t active_show(struct device *dev,
			   struct device_attribute *attr, char *buf)
{
	return snprintf(buf, PAGE_SIZE - 2, "%d\n", sched_smt_active());
}
2021-05-27 17:11:05 +03:00
static DEVICE_ATTR_RO ( active ) ;
2018-05-29 18:48:27 +03:00
static struct attribute * cpuhp_smt_attrs [ ] = {
& dev_attr_control . attr ,
& dev_attr_active . attr ,
NULL
} ;
static const struct attribute_group cpuhp_smt_attr_group = {
	.attrs = cpuhp_smt_attrs,
	.name = "smt",
	NULL
};
2019-03-27 15:00:29 +03:00
static int __init cpu_smt_sysfs_init(void)
{
	return sysfs_create_group(&cpu_subsys.dev_root->kobj,
				  &cpuhp_smt_attr_group);
}
2016-02-26 21:43:31 +03:00
static int __init cpuhp_sysfs_init(void)
{
	int cpu, ret;

	ret = cpu_smt_sysfs_init();
	if (ret)
		return ret;

	ret = sysfs_create_group(&cpu_subsys.dev_root->kobj,
				 &cpuhp_cpu_root_attr_group);
	if (ret)
		return ret;

	for_each_possible_cpu(cpu) {
		struct device *dev = get_cpu_device(cpu);

		if (!dev)
			continue;
		ret = sysfs_create_group(&dev->kobj, &cpuhp_cpu_attr_group);
		if (ret)
			return ret;
	}
	return 0;
}
device_initcall ( cpuhp_sysfs_init ) ;
2019-03-27 15:00:29 +03:00
# endif /* CONFIG_SYSFS && CONFIG_HOTPLUG_CPU */
2016-02-26 21:43:31 +03:00
cpu masks: optimize and clean up cpumask_of_cpu()
Clean up and optimize cpumask_of_cpu(), by sharing all the zero words.
Instead of stupidly generating all possible i=0...NR_CPUS 2^i patterns
creating a huge array of constant bitmasks, realize that the zero words
can be shared.
In other words, on a 64-bit architecture, we only ever need 64 of these
arrays - with a different bit set in one single word (with enough zero
words around it so that we can create any bitmask by just offsetting in
that big array). And then we just put enough zeroes around it that we
can point every single cpumask to be one of those things.
So when we have 4k CPU's, instead of having 4k arrays (of 4k bits each,
with one bit set in each array - 2MB memory total), we have exactly 64
arrays instead, each 8k bits in size (64kB total).
And then we just point cpumask(n) to the right position (which we can
calculate dynamically). Once we have the right arrays, getting
"cpumask(n)" ends up being:
static inline const cpumask_t *get_cpu_mask(unsigned int cpu)
{
const unsigned long *p = cpu_bit_bitmap[1 + cpu % BITS_PER_LONG];
p -= cpu / BITS_PER_LONG;
return (const cpumask_t *)p;
}
This brings other advantages and simplifications as well:
- we are not wasting memory that is just filled with a single bit in
various different places
- we don't need all those games to re-create the arrays in some dense
format, because they're already going to be dense enough.
if we compile a kernel for up to 4k CPU's, "wasting" that 64kB of memory
is a non-issue (especially since by doing this "overlapping" trick we
probably get better cache behaviour anyway).
[ mingo@elte.hu:
Converted Linus's mails into a commit. See:
http://lkml.org/lkml/2008/7/27/156
http://lkml.org/lkml/2008/7/28/320
Also applied a family filter - which also has the side-effect of leaving
out the bits where Linus calls me an idio... Oh, never mind ;-)
]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-28 22:32:33 +04:00
/*
 * cpu_bit_bitmap[] is a special, "compressed" data structure that
 * represents all NR_CPUS bits binary values of 1<<nr.
 *
 * It is used by cpumask_of() to get a constant address to a CPU
 * mask value that has a single bit set only.
 */
2008-07-25 05:21:29 +04:00
2008-07-28 22:32:33 +04:00
/* cpu_bit_bitmap[0] is empty - so we can back into it */
2011-03-23 02:34:07 +03:00
# define MASK_DECLARE_1(x) [x+1][0] = (1UL << (x))
2008-07-28 22:32:33 +04:00
# define MASK_DECLARE_2(x) MASK_DECLARE_1(x), MASK_DECLARE_1(x+1)
# define MASK_DECLARE_4(x) MASK_DECLARE_2(x), MASK_DECLARE_2(x+2)
# define MASK_DECLARE_8(x) MASK_DECLARE_4(x), MASK_DECLARE_4(x+4)
2008-07-25 05:21:29 +04:00
2008-07-28 22:32:33 +04:00
const unsigned long cpu_bit_bitmap [ BITS_PER_LONG + 1 ] [ BITS_TO_LONGS ( NR_CPUS ) ] = {
MASK_DECLARE_8 ( 0 ) , MASK_DECLARE_8 ( 8 ) ,
MASK_DECLARE_8 ( 16 ) , MASK_DECLARE_8 ( 24 ) ,
# if BITS_PER_LONG > 32
MASK_DECLARE_8 ( 32 ) , MASK_DECLARE_8 ( 40 ) ,
MASK_DECLARE_8 ( 48 ) , MASK_DECLARE_8 ( 56 ) ,
2008-07-25 05:21:29 +04:00
# endif
} ;
2008-07-28 22:32:33 +04:00
EXPORT_SYMBOL_GPL ( cpu_bit_bitmap ) ;
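The overlapping-rows trick can be checked in userspace. The sketch below is an illustration under stated assumptions rather than the kernel code: `NR_CPUS` is fixed at an arbitrary 256, `BITS_PER_LONG` is derived from `ULONG_MAX`, and the table is a flat array named `bit_bitmap` (instead of a two-dimensional one) so that stepping back from one row into the previous one stays inside a single C object. `mask_is_single_bit()` is a hypothetical checker added for the demonstration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#if ULONG_MAX > 0xffffffffUL
#define BITS_PER_LONG 64
#else
#define BITS_PER_LONG 32
#endif

#define NR_CPUS 256	/* assumption: illustrative value only */
#define WORDS ((NR_CPUS + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Row x+1 has bit x set in its first word; row 0 stays all-zero. */
#define MASK_DECLARE_1(x) [((x) + 1) * WORDS] = (1UL << (x))
#define MASK_DECLARE_2(x) MASK_DECLARE_1(x), MASK_DECLARE_1((x) + 1)
#define MASK_DECLARE_4(x) MASK_DECLARE_2(x), MASK_DECLARE_2((x) + 2)
#define MASK_DECLARE_8(x) MASK_DECLARE_4(x), MASK_DECLARE_4((x) + 4)

/* Flat layout: (BITS_PER_LONG + 1) rows of WORDS words each. */
static const unsigned long bit_bitmap[(BITS_PER_LONG + 1) * WORDS] = {
	MASK_DECLARE_8(0), MASK_DECLARE_8(8),
	MASK_DECLARE_8(16), MASK_DECLARE_8(24),
#if BITS_PER_LONG > 32
	MASK_DECLARE_8(32), MASK_DECLARE_8(40),
	MASK_DECLARE_8(48), MASK_DECLARE_8(56),
#endif
};

/*
 * Pick the row whose first word holds the wanted bit, then step back by
 * the word index so that this word lands at the right offset in the mask.
 */
static const unsigned long *get_cpu_mask(unsigned int cpu)
{
	const unsigned long *p = &bit_bitmap[(1 + cpu % BITS_PER_LONG) * WORDS];

	p -= cpu / BITS_PER_LONG;
	return p;
}

/* Returns 1 if the WORDS-long mask has exactly bit 'cpu' set. */
static int mask_is_single_bit(const unsigned long *mask, unsigned int cpu)
{
	for (size_t w = 0; w < WORDS; w++) {
		unsigned long expect = 0;

		if (w == cpu / BITS_PER_LONG)
			expect = 1UL << (cpu % BITS_PER_LONG);
		if (mask[w] != expect)
			return 0;
	}
	return 1;
}
```

The lookup works because the only non-zero words sit at the start of rows 1 and up, so any window of WORDS consecutive words that begins less than one row before such a word contains exactly one set bit. Row 0 exists purely so that the window for the lowest row can back into zeroes.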
2008-11-05 05:39:10 +03:00
const DECLARE_BITMAP ( cpu_all_bits , NR_CPUS ) = CPU_BITS_ALL ;
EXPORT_SYMBOL ( cpu_all_bits ) ;
2008-12-30 01:35:14 +03:00
# ifdef CONFIG_INIT_ALL_POSSIBLE
2016-01-21 02:00:19 +03:00
struct cpumask __cpu_possible_mask __read_mostly
2016-01-21 02:00:16 +03:00
= { CPU_BITS_ALL } ;
2008-12-30 01:35:14 +03:00
# else
2016-01-21 02:00:19 +03:00
struct cpumask __cpu_possible_mask __read_mostly ;
2008-12-30 01:35:14 +03:00
# endif
2016-01-21 02:00:19 +03:00
EXPORT_SYMBOL ( __cpu_possible_mask ) ;
2008-12-30 01:35:14 +03:00
2016-01-21 02:00:19 +03:00
struct cpumask __cpu_online_mask __read_mostly ;
EXPORT_SYMBOL ( __cpu_online_mask ) ;
2008-12-30 01:35:14 +03:00
2016-01-21 02:00:19 +03:00
struct cpumask __cpu_present_mask __read_mostly ;
EXPORT_SYMBOL ( __cpu_present_mask ) ;
2008-12-30 01:35:14 +03:00
2016-01-21 02:00:19 +03:00
struct cpumask __cpu_active_mask __read_mostly ;
EXPORT_SYMBOL ( __cpu_active_mask ) ;
2008-12-30 01:35:16 +03:00
2021-01-19 20:43:45 +03:00
struct cpumask __cpu_dying_mask __read_mostly ;
EXPORT_SYMBOL ( __cpu_dying_mask ) ;
2019-07-09 17:23:40 +03:00
atomic_t __num_online_cpus __read_mostly ;
EXPORT_SYMBOL ( __num_online_cpus ) ;
2008-12-30 01:35:16 +03:00
void init_cpu_present ( const struct cpumask * src )
{
2016-01-21 02:00:16 +03:00
cpumask_copy ( & __cpu_present_mask , src ) ;
2008-12-30 01:35:16 +03:00
}
void init_cpu_possible ( const struct cpumask * src )
{
2016-01-21 02:00:16 +03:00
cpumask_copy ( & __cpu_possible_mask , src ) ;
2008-12-30 01:35:16 +03:00
}
void init_cpu_online ( const struct cpumask * src )
{
2016-01-21 02:00:16 +03:00
cpumask_copy ( & __cpu_online_mask , src ) ;
2008-12-30 01:35:16 +03:00
}
2016-02-26 21:43:28 +03:00
2019-07-09 17:23:40 +03:00
void set_cpu_online(unsigned int cpu, bool online)
{
	/*
	 * atomic_inc/dec() is required to handle the horrid abuse of this
	 * function by the reboot and kexec code which invoke it from
	 * IPI/NMI broadcasts when shutting down CPUs. Invocation from
	 * regular CPU hotplug is properly serialized.
	 *
	 * Note, that the fact that __num_online_cpus is of type atomic_t
	 * does not protect readers which are not serialized against
	 * concurrent hotplug operations.
	 */
	if (online) {
		if (!cpumask_test_and_set_cpu(cpu, &__cpu_online_mask))
			atomic_inc(&__num_online_cpus);
	} else {
		if (cpumask_test_and_clear_cpu(cpu, &__cpu_online_mask))
			atomic_dec(&__num_online_cpus);
	}
}
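The pattern above, adjusting a counter only when a bit actually changes state, can be sketched in userspace with C11 atomics. This is an illustration under assumptions, not the kernel implementation: `online_mask`, `num_online` and `set_online()` are hypothetical names, and the mask is a single word, so `cpu` must be smaller than the number of bits in an `unsigned long`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical single-word mask and counter; zero-initialized statics. */
static atomic_ulong online_mask;
static atomic_int num_online;

static void set_online(unsigned int cpu, bool online)
{
	unsigned long bit = 1UL << cpu;	/* assumes cpu < bits in a long */

	if (online) {
		/* fetch_or returns the old value: count only a 0 -> 1 change. */
		if (!(atomic_fetch_or(&online_mask, bit) & bit))
			atomic_fetch_add(&num_online, 1);
	} else {
		/* fetch_and returns the old value: count only a 1 -> 0 change. */
		if (atomic_fetch_and(&online_mask, ~bit) & bit)
			atomic_fetch_sub(&num_online, 1);
	}
}
```

Because the counter moves only on a real transition, repeated calls with the same argument are idempotent. As the comment in the function notes, this keeps the count consistent with the mask but does not by itself serialize readers against concurrent updates.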
2016-02-26 21:43:28 +03:00
/*
* Activate the first processor .
*/
void __init boot_cpu_init ( void )
{
int cpu = smp_processor_id ( ) ;
/* Mark the boot cpu "present", "online" etc for SMP and UP case */
set_cpu_online ( cpu , true ) ;
set_cpu_active ( cpu , true ) ;
set_cpu_present ( cpu , true ) ;
set_cpu_possible ( cpu , true ) ;
2017-03-20 14:26:55 +03:00
# ifdef CONFIG_SMP
__boot_cpu_id = cpu ;
# endif
2016-02-26 21:43:28 +03:00
}
/*
* Must be called _AFTER_ setting up the per_cpu areas
*/
init: rename and re-order boot_cpu_state_init()
This is purely a preparatory patch for upcoming changes during the 4.19
merge window.
We have a function called "boot_cpu_state_init()" that isn't really
about the bootup cpu state: that is done much earlier by the similarly
named "boot_cpu_init()" (note lack of "state" in name).
This function initializes some hotplug CPU state, and needs to run after
the percpu data has been properly initialized. It even has a comment to
that effect.
Except it _doesn't_ actually run after the percpu data has been properly
initialized. On x86 it happens to do that, but on at least arm and
arm64, the percpu base pointers are initialized by the arch-specific
'smp_prepare_boot_cpu()' hook, which ran _after_ boot_cpu_state_init().
This had some unexpected results, and in particular we have a patch
pending for the merge window that did the obvious cleanup of using
'this_cpu_write()' in the cpu hotplug init code:
- per_cpu_ptr(&cpuhp_state, smp_processor_id())->state = CPUHP_ONLINE;
+ this_cpu_write(cpuhp_state.state, CPUHP_ONLINE);
which is obviously the right thing to do. Except because of the
ordering issue, it actually failed miserably and unexpectedly on arm64.
So this just fixes the ordering, and changes the name of the function to
be 'boot_cpu_hotplug_init()' to make it obvious that it's about cpu
hotplug state, because the core CPU state was supposed to have already
been done earlier.
Marked for stable, since the (not yet merged) patch that will show this
problem is marked for stable.
Reported-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Mian Yousaf Kaukab <yousaf.kaukab@suse.com>
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-12 22:19:42 +03:00
void __init boot_cpu_hotplug_init ( void )
2016-02-26 21:43:28 +03:00
{
2018-08-15 00:26:00 +03:00
# ifdef CONFIG_SMP
2019-07-22 21:47:16 +03:00
cpumask_set_cpu ( smp_processor_id ( ) , & cpus_booted_once_mask ) ;
2018-08-15 00:26:00 +03:00
# endif
2018-06-29 17:05:48 +03:00
this_cpu_write ( cpuhp_state . state , CPUHP_ONLINE ) ;
2022-11-17 19:23:29 +03:00
this_cpu_write ( cpuhp_state . target , CPUHP_ONLINE ) ;
2016-02-26 21:43:28 +03:00
}
2019-04-12 23:39:28 +03:00
2019-11-04 14:22:02 +03:00
/*
 * These are used for a global "mitigations=" cmdline option for toggling
 * optional CPU mitigations.
 */
enum cpu_mitigations {
CPU_MITIGATIONS_OFF ,
CPU_MITIGATIONS_AUTO ,
CPU_MITIGATIONS_AUTO_NOSMT ,
} ;
static enum cpu_mitigations cpu_mitigations __ro_after_init =
CPU_MITIGATIONS_AUTO ;
2019-04-12 23:39:28 +03:00
static int __init mitigations_parse_cmdline(char *arg)
{
	if (!strcmp(arg, "off"))
		cpu_mitigations = CPU_MITIGATIONS_OFF;
	else if (!strcmp(arg, "auto"))
		cpu_mitigations = CPU_MITIGATIONS_AUTO;
	else if (!strcmp(arg, "auto,nosmt"))
		cpu_mitigations = CPU_MITIGATIONS_AUTO_NOSMT;
	else
		pr_crit("Unsupported mitigations=%s, system may still be vulnerable\n",
			arg);

	return 0;
}
early_param("mitigations", mitigations_parse_cmdline);
2019-11-04 14:22:02 +03:00
/* mitigations=off */
bool cpu_mitigations_off(void)
{
	return cpu_mitigations == CPU_MITIGATIONS_OFF;
}
EXPORT_SYMBOL_GPL(cpu_mitigations_off);

/* mitigations=auto,nosmt */
bool cpu_mitigations_auto_nosmt(void)
{
	return cpu_mitigations == CPU_MITIGATIONS_AUTO_NOSMT;
}
EXPORT_SYMBOL_GPL(cpu_mitigations_auto_nosmt);