linux/kernel
Ben Zhang 62572e29bc kernel/watchdog.c: touch_nmi_watchdog should only touch local cpu not every one
I ran into a scenario where while one cpu was stuck and should have
panic'd because of the NMI watchdog, it didn't.  The reason was another
cpu was spewing stack dumps on to the console.  Upon investigation, I
noticed that when writing to the console and also when dumping the
stack, the watchdog is touched.

This causes all the cpus to reset their NMI watchdog flags and the
'stuck' cpu just spins forever.

This change causes the semantics of touch_nmi_watchdog to be changed
slightly.  Previously, I accidentally changed the semantics and we
noticed there was a codepath in which touch_nmi_watchdog could be
touched from a preemtible area.  That caused a BUG() to happen when
CONFIG_DEBUG_PREEMPT was enabled.  I believe it was the acpi code.

My attempt here re-introduces the change to have the
touch_nmi_watchdog() code only touch the local cpu instead of all of the
cpus.  But instead of using __get_cpu_var(), I use the
__raw_get_cpu_var() version.

This avoids the preemption problem.  However my reasoning wasn't because
I was trying to be lazy.  Instead I rationalized it as, well if
preemption is enabled then interrupts should be enabled to and the NMI
watchdog will have no reason to trigger.  So it won't matter if the
wrong cpu is touched because the percpu interrupt counters the NMI
watchdog uses should still be incrementing.

Don said:

: I'm ok with this patch, though it does alter the behaviour of how
: touch_nmi_watchdog works.  For the most part I don't think most callers
: need to touch all of the watchdogs (on each cpu).  Perhaps a corner case
: will pop up (the scheduler??  to mimic touch_all_softlockup_watchdogs() ).
:
: But this does address an issue where if a system is locked up and one cpu
: is spewing out useful debug messages (or error messages), the hard lockup
: will fail to go off.  We have seen this on RHEL also.

Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Ben Zhang <benzh@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:20:58 -07:00
..
debug KGDB: make kgdb_breakpoint() as noinline 2014-02-26 11:16:25 +00:00
events perf: Optimize group_sched_in() 2014-02-27 12:43:26 +01:00
gcov gcov: reuse kbasename helper 2013-11-13 12:09:34 +09:00
irq genirq: Export symbol no_action() 2014-03-22 11:33:09 +01:00
locking Merge branch 'x86-asmlinkage-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-03-31 14:13:25 -07:00
power Merge branches 'pm-runtime' and 'pm-sleep' 2014-03-20 13:25:54 +01:00
printk printk: fix syslog() overflowing user buffer 2014-02-17 12:24:45 -08:00
rcu Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-03-31 11:21:19 -07:00
sched Merge branch 'sched-idle-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-04-02 16:22:27 -07:00
time Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-04-01 11:00:07 -07:00
trace Merge branch 'for-3.15/core' of git://git.kernel.dk/linux-block 2014-04-01 19:19:15 -07:00
.gitignore Ignore generated file kernel/x509_certificate_list 2013-12-10 18:21:34 +00:00
acct.c
async.c
audit_tree.c inotify: Fix reporting of cookies for inotify events 2014-02-18 11:17:17 +01:00
audit_watch.c inotify: Fix reporting of cookies for inotify events 2014-02-18 11:17:17 +01:00
audit.c AUDIT: Allow login in non-init namespaces 2014-03-30 17:02:53 -07:00
audit.h audit: Use struct net not pid_t to remember the network namespce to reply in 2014-02-28 04:04:33 -08:00
auditfilter.c audit: Update kdoc for audit_send_reply and audit_list_rules_send 2014-03-08 15:31:54 -08:00
auditsc.c execve: use 'struct filename *' for executable name passing 2014-02-05 12:54:53 -08:00
backtracetest.c
bounds.c mm: do not allocate page->ptl dynamically, if spinlock_t fits to long 2013-12-20 12:25:45 -08:00
capability.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2014-04-03 09:26:18 -07:00
cgroup_freezer.c cgroup: replace cftype->read_seq_string() with cftype->seq_show() 2013-12-05 12:28:04 -05:00
cgroup.c cgroup: fix a failure path in create_css() 2014-03-18 17:15:36 -04:00
compat.c Merge branch 'x86-x32-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-04-02 12:51:41 -07:00
configs.c
context_tracking.c context_tracking: Wrap static key check into more intuitive function name 2013-12-02 20:43:14 +01:00
cpu_pm.c
cpu.c Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-11-14 16:55:11 +09:00
cpuset.c cpuset: fix a race condition in __cpuset_node_allowed_softwall() 2014-02-27 09:39:54 -05:00
crash_dump.c
cred.c
delayacct.c kernel/delayacct.c: remove redundant checking in __delayacct_add_tsk() 2013-11-13 12:09:12 +09:00
dma.c
elfcore.c switch elf_core_write_extra_phdrs() to dump_emit() 2013-11-09 00:16:23 -05:00
exec_domain.c
exit.c introduce for_each_thread() to replace the buggy while_each_thread() 2014-01-21 16:19:46 -08:00
extable.c asmlinkage: Make main_extable_sort_needed visible 2014-02-13 18:13:22 -08:00
fork.c sched/numa: Move task_numa_free() to __put_task_struct() 2014-03-11 12:05:43 +01:00
freezer.c libata, freezer: avoid block device removal while system is frozen 2013-12-19 13:50:32 -05:00
futex_compat.c compat: Get rid of (get|put)_compat_time(val|spec) 2014-02-02 14:09:12 -08:00
futex.c Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-03-31 10:59:39 -07:00
groups.c userns: Kill nsown_capable it makes the wrong thing easy 2013-08-30 23:44:11 -07:00
hrtimer.c timer: Remove code redundancy while calling get_nohz_timer_target() 2014-03-20 12:35:46 +01:00
hung_task.c hung_task: Display every hung task warning 2014-01-25 12:13:33 +01:00
irq_work.c perf/x86: Warn to early_printk() in case irq_work is too slow 2014-02-21 21:49:07 +01:00
itimer.c
jump_label.c static_key: WARN on usage before jump_label_init was called 2013-10-19 19:45:35 -04:00
kallsyms.c
kcmp.c
Kconfig.freezer
Kconfig.hz kernel: remove CONFIG_USE_GENERIC_SMP_HELPERS 2013-11-15 09:32:22 +09:00
Kconfig.locks
Kconfig.preempt
kexec.c kexec/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types 2014-03-06 16:30:46 +01:00
kmod.c execve: use 'struct filename *' for executable name passing 2014-02-05 12:54:53 -08:00
kprobes.c kprobes: use KSYM_NAME_LEN to size identifier buffers 2013-11-13 12:09:26 +09:00
ksysfs.c rcu: Fix sparse warning for rcu_expedited from kernel/ksysfs.c 2014-02-26 06:35:16 -08:00
kthread.c kthread: ensure locality of task_struct allocations 2014-04-03 16:20:49 -07:00
latencytop.c
Makefile Merge branch 'x86-asmlinkage-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-03-31 14:13:25 -07:00
module_signing.c keys: change asymmetric keys to use common hash definitions 2013-10-25 17:15:18 -04:00
module-internal.h KEYS: Separate the kernel signature checking keyring from module signing 2013-09-25 17:17:01 +01:00
module.c Merge branch 'x86-asmlinkage-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-03-31 14:13:25 -07:00
notifier.c notifier: Substitute rcu_access_pointer() for rcu_dereference_raw() 2014-02-26 06:35:13 -08:00
nsproxy.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2013-09-07 14:35:32 -07:00
padata.c padata: Fix wrong usage of rcu_dereference() 2013-12-05 21:28:42 +08:00
panic.c Merge branch 'x86-asmlinkage-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-03-31 14:13:25 -07:00
params.c params: improve standard definitions 2013-12-04 14:09:46 +10:30
pid_namespace.c pid_namespace: pidns_get() should check task_active_pid_ns() != NULL 2014-04-02 16:20:21 -07:00
pid.c pidns: fix free_pid() to handle the first fork failure 2013-09-30 14:31:03 -07:00
posix-cpu-timers.c posix-timers: Convert abuses of BUG_ON to WARN_ON 2013-12-09 16:56:29 +01:00
posix-timers.c
profile.c mm: fix GFP_THISNODE callers and clarify 2014-03-10 17:26:19 -07:00
ptrace.c kernel/compat: convert to COMPAT_SYSCALL_DEFINE 2014-03-06 15:35:10 +01:00
range.c
reboot.c kexec: migrate to reboot cpu 2013-12-18 19:04:50 -08:00
relay.c treewide: Fix typo in Documentation/DocBook 2014-02-19 14:58:17 +01:00
res_counter.c memcg: reduce function dereference 2013-09-12 15:38:02 -07:00
resource.c resources: Set type in __request_region() 2014-03-19 15:00:16 -06:00
seccomp.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2014-04-03 09:26:18 -07:00
signal.c Merge branch 'master' into for-next 2014-02-20 14:54:28 +01:00
smp.c smp: Rename __smp_call_function_single() to smp_call_function_single_async() 2014-02-24 14:47:15 -08:00
smpboot.c
smpboot.h
softirq.c softirq: Add linux/irq.h to make it compile again 2014-03-19 11:28:14 +01:00
stacktrace.c
stop_machine.c stop_machine: Fix^2 race between stop_two_cpus() and stop_cpus() 2014-03-11 11:33:47 +01:00
sys_ni.c
sys.c sys: Replace hardcoding of -20 and 19 with MIN_NICE and MAX_NICE 2014-02-22 18:16:19 +01:00
sysctl_binary.c kernel/sysctl_binary.c: use scnprintf() instead of snprintf() 2013-11-13 12:09:33 +09:00
sysctl.c Merge branch 'for-3.15/core' of git://git.kernel.dk/linux-block 2014-04-01 19:19:15 -07:00
system_certificates.S KEYS: correct alignment of system_certificate_list content in assembly file 2013-12-10 18:25:28 +00:00
system_keyring.c KEYS: correct alignment of system_certificate_list content in assembly file 2013-12-10 18:25:28 +00:00
task_work.c task_work: documentation 2013-09-11 15:58:27 -07:00
taskstats.c genetlink: only pass array to genl_register_family_with_ops() 2013-11-19 16:39:05 -05:00
test_kprobes.c
time.c
timeconst.bc
timer.c Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-04-01 11:00:07 -07:00
torture.c rcutorture: Gracefully handle NULL cleanup hooks 2014-02-23 09:04:39 -08:00
tracepoint.c tracing: Do not add event files for modules that fail tracepoints 2014-03-03 21:11:05 -05:00
tsacct.c
uid16.c userns: Kill nsown_capable it makes the wrong thing easy 2013-08-30 23:44:11 -07:00
up.c smp: Rename __smp_call_function_single() to smp_call_function_single_async() 2014-02-24 14:47:15 -08:00
user_namespace.c user_namespace.c: Remove duplicated word in comment 2014-02-20 11:58:35 -08:00
user-return-notifier.c
user.c KEYS: fix uninitialized persistent_keyring_register_sem 2013-12-13 15:59:11 +00:00
utsname_sysctl.c
utsname.c userns: Kill nsown_capable it makes the wrong thing easy 2013-08-30 23:44:11 -07:00
watchdog.c kernel/watchdog.c: touch_nmi_watchdog should only touch local cpu not every one 2014-04-03 16:20:58 -07:00
workqueue_internal.h
workqueue.c Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-04-01 11:00:07 -07:00