linux/kernel/time
Frederic Weisbecker 76a33061b9 irq_work: Force raised irq work to run on irq work interrupt
The nohz full kick, which restarts the tick when any resource depend
on it, can't be executed anywhere given the operation it does on timers.
If it is called from the scheduler or timers code, chances are that
we run into a deadlock.

This is why we run the nohz full kick from an irq work. That way we make
sure that the kick runs on a virgin context.

However if that's the case when irq work runs in its own dedicated
self-ipi, things are different for the big bunch of archs that don't
support the self triggered way. In order to support them, irq works are
also handled by the timer interrupt as fallback.

Now when irq works run on the timer interrupt, the context isn't blank.
More precisely, they can run in the context of the hrtimer that runs the
tick. But the nohz kick cancels and restarts this hrtimer and cancelling
an hrtimer from itself isn't allowed. This is why we run in an endless
loop:

	Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 2
	CPU: 2 PID: 7538 Comm: kworker/u8:8 Not tainted 3.16.0+ #34
	Workqueue: btrfs-endio-write normal_work_helper [btrfs]
	 ffff880244c06c88 000000001b486fe1 ffff880244c06bf0 ffffffff8a7f1e37
	 ffffffff8ac52a18 ffff880244c06c78 ffffffff8a7ef928 0000000000000010
	 ffff880244c06c88 ffff880244c06c20 000000001b486fe1 0000000000000000
	Call Trace:
	 <NMI[<ffffffff8a7f1e37>] dump_stack+0x4e/0x7a
	 [<ffffffff8a7ef928>] panic+0xd4/0x207
	 [<ffffffff8a1450e8>] watchdog_overflow_callback+0x118/0x120
	 [<ffffffff8a186b0e>] __perf_event_overflow+0xae/0x350
	 [<ffffffff8a184f80>] ? perf_event_task_disable+0xa0/0xa0
	 [<ffffffff8a01a4cf>] ? x86_perf_event_set_period+0xbf/0x150
	 [<ffffffff8a187934>] perf_event_overflow+0x14/0x20
	 [<ffffffff8a020386>] intel_pmu_handle_irq+0x206/0x410
	 [<ffffffff8a01937b>] perf_event_nmi_handler+0x2b/0x50
	 [<ffffffff8a007b72>] nmi_handle+0xd2/0x390
	 [<ffffffff8a007aa5>] ? nmi_handle+0x5/0x390
	 [<ffffffff8a0cb7f8>] ? match_held_lock+0x8/0x1b0
	 [<ffffffff8a008062>] default_do_nmi+0x72/0x1c0
	 [<ffffffff8a008268>] do_nmi+0xb8/0x100
	 [<ffffffff8a7ff66a>] end_repeat_nmi+0x1e/0x2e
	 [<ffffffff8a0cb7f8>] ? match_held_lock+0x8/0x1b0
	 [<ffffffff8a0cb7f8>] ? match_held_lock+0x8/0x1b0
	 [<ffffffff8a0cb7f8>] ? match_held_lock+0x8/0x1b0
	 <<EOE><IRQ[<ffffffff8a0ccd2f>] lock_acquired+0xaf/0x450
	 [<ffffffff8a0f74c5>] ? lock_hrtimer_base.isra.20+0x25/0x50
	 [<ffffffff8a7fc678>] _raw_spin_lock_irqsave+0x78/0x90
	 [<ffffffff8a0f74c5>] ? lock_hrtimer_base.isra.20+0x25/0x50
	 [<ffffffff8a0f74c5>] lock_hrtimer_base.isra.20+0x25/0x50
	 [<ffffffff8a0f7723>] hrtimer_try_to_cancel+0x33/0x1e0
	 [<ffffffff8a0f78ea>] hrtimer_cancel+0x1a/0x30
	 [<ffffffff8a109237>] tick_nohz_restart+0x17/0x90
	 [<ffffffff8a10a213>] __tick_nohz_full_check+0xc3/0x100
	 [<ffffffff8a10a25e>] nohz_full_kick_work_func+0xe/0x10
	 [<ffffffff8a17c884>] irq_work_run_list+0x44/0x70
	 [<ffffffff8a17c8da>] irq_work_run+0x2a/0x50
	 [<ffffffff8a0f700b>] update_process_times+0x5b/0x70
	 [<ffffffff8a109005>] tick_sched_handle.isra.21+0x25/0x60
	 [<ffffffff8a109b81>] tick_sched_timer+0x41/0x60
	 [<ffffffff8a0f7aa2>] __run_hrtimer+0x72/0x470
	 [<ffffffff8a109b40>] ? tick_sched_do_timer+0xb0/0xb0
	 [<ffffffff8a0f8707>] hrtimer_interrupt+0x117/0x270
	 [<ffffffff8a034357>] local_apic_timer_interrupt+0x37/0x60
	 [<ffffffff8a80010f>] smp_apic_timer_interrupt+0x3f/0x50
	 [<ffffffff8a7fe52f>] apic_timer_interrupt+0x6f/0x80

To fix this we force non-lazy irq works to run on irq work self-IPIs
when available. That ability of the arch to trigger irq work self IPIs
is available with arch_irq_work_has_interrupt().

Reported-by: Catalin Iacob <iacobcatalin@gmail.com>
Reported-by: Dave Jones <davej@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2014-09-13 18:38:15 +02:00
..
alarmtimer.c Staging driver patches for 3.17-rc1 2014-08-04 18:36:12 -07:00
clockevents.c timer: Fix lock inversion between hrtimer_bases.lock and scheduler locks 2014-08-01 12:54:41 +02:00
clocksource.c clocksource: Make delta calculation a function 2014-07-23 15:01:51 -07:00
hrtimer.c time: Consolidate the time accessor prototypes 2014-07-23 10:17:54 -07:00
itimer.c time/timers: Move all time(r) related files into kernel/time 2014-06-23 11:22:35 +02:00
jiffies.c time: Fix overflow when HZ is smaller than 60 2014-02-06 16:01:40 +01:00
Kconfig clocksource: Move cycle_last validation to core code 2014-07-23 15:01:51 -07:00
Makefile kernel: time: Add udelay_test module to validate udelay 2014-07-23 10:16:35 -07:00
ntp_internal.h timekeeping: Convert timekeeping core to use timespec64s 2014-07-23 10:17:54 -07:00
ntp.c timekeeping: Provide timespec64 based interfaces 2014-07-23 10:17:55 -07:00
posix-clock.c kernel: Fix files explicitly needing EXPORT_SYMBOL infrastructure 2011-10-31 19:30:05 -04:00
posix-cpu-timers.c time/timers: Move all time(r) related files into kernel/time 2014-06-23 11:22:35 +02:00
posix-timers.c time: Consolidate the time accessor prototypes 2014-07-23 10:17:54 -07:00
sched_clock.c sched_clock: Avoid corrupting hrtimer tree during suspend 2014-07-24 12:02:49 +02:00
tick-broadcast-hrtimer.c tick: Fixup more fallout from hrtimer broadcast mode 2014-02-09 15:11:47 +01:00
tick-broadcast.c Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-04-01 11:00:07 -07:00
tick-common.c nohz: Move nohz full init call to tick init 2014-09-13 18:34:44 +02:00
tick-internal.h nohz: Move nohz full init call to tick init 2014-09-13 18:34:44 +02:00
tick-oneshot.c clockevents: Make minimum delay adjustments configurable 2011-09-08 11:10:56 +02:00
tick-sched.c nohz: Restore NMI safe local irq work for local nohz kick 2014-09-04 22:35:59 +02:00
time.c time: Export nsecs_to_jiffies() 2014-07-23 10:18:04 -07:00
timeconst.bc time/timers: Move all time(r) related files into kernel/time 2014-06-23 11:22:35 +02:00
timeconv.c
timekeeping_debug.c timekeeping: Convert timekeeping core to use timespec64s 2014-07-23 10:17:54 -07:00
timekeeping_internal.h clocksource: Move cycle_last validation to core code 2014-07-23 15:01:51 -07:00
timekeeping.c timekeeping: Update timekeeper before updating vsyscall and pvclock 2014-09-06 12:58:18 +02:00
timekeeping.h time: Consolidate the time accessor prototypes 2014-07-23 10:17:54 -07:00
timer_list.c timer_list: correct the iterator for timer_list 2013-08-28 19:26:38 -07:00
timer_stats.c timer stats: Add a 'Collection: active/inactive' line to timer usage statistics 2013-10-10 09:59:25 +02:00
timer.c irq_work: Force raised irq work to run on irq work interrupt 2014-09-13 18:38:15 +02:00
udelay_test.c kernel: time: Add udelay_test module to validate udelay 2014-07-23 10:16:35 -07:00