linux

Feng Tang b7082cdfc4 clocksource: Suspend the watchdog temporarily when high read latency detected

Bugs have been reported on 8-socket x86 machines in which the TSC was wrongly disabled when the system is under heavy workload:

    [ 818.380354] clocksource: timekeeping watchdog on CPU336: hpet wd-wd read-back delay of 1203520ns
    [ 818.436160] clocksource: wd-tsc-wd read-back delay of 181880ns, clock-skew test skipped!
    [ 819.402962] clocksource: timekeeping watchdog on CPU338: hpet wd-wd read-back delay of 324000ns
    [ 819.448036] clocksource: wd-tsc-wd read-back delay of 337240ns, clock-skew test skipped!
    [ 819.880863] clocksource: timekeeping watchdog on CPU339: hpet read-back delay of 150280ns, attempt 3, marking unstable
    [ 819.936243] tsc: Marking TSC unstable due to clocksource watchdog
    [ 820.068173] TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
    [ 820.092382] sched_clock: Marking unstable (818769414384, 1195404998)
    [ 820.643627] clocksource: Checking clocksource tsc synchronization from CPU 267 to CPUs 0,4,25,70,126,430,557,564.
    [ 821.067990] clocksource: Switched to clocksource hpet

This can be reproduced by running memory-intensive 'stream' tests, or some of the stress-ng subcases such as 'ioport'.

The reason for these issues is that when the system is under heavy load, the read latency of the clocksources can be very high. Even lightweight TSC reads can show high latencies, and latencies are much worse for external clocksources such as HPET or the APIC PM timer. These latencies can result in false-positive clocksource-unstable determinations.

These issues were initially reported by a customer running on a production system, and the problem was reproduced on several generations of Xeon servers, especially when running the stress-ng test. These Xeon servers were not production systems, but they did have the latest steppings and firmware.

Given that the clocksource watchdog is a continual diagnostic check that runs twice a second, there is no need to rush it when the system is under heavy load. Therefore, when high clocksource read latencies are detected, suspend the watchdog timer for 5 minutes.

Signed-off-by: Feng Tang <feng.tang@intel.com>
Acked-by: Waiman Long <longman@redhat.com>
Cc: John Stultz <jstultz@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Feng Tang <feng.tang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>		2023-01-24 15:12:48 -08:00
bpf	Including fixes from bpf, netfilter and can.	2022-12-21 08:41:32 -08:00
cgroup	MM patches for 6.2-rc1.	2022-12-13 19:29:45 -08:00
configs	mm, slob: rename CONFIG_SLOB to CONFIG_SLOB_DEPRECATED	2022-12-01 00:09:20 +01:00
debug	kdb: use srcu console list iterator	2022-12-02 11:25:00 +01:00
dma	dma-mapping: reject GFP_COMP for noncoherent allocations	2022-12-21 08:45:38 +01:00
entry	entry: kmsan: introduce kmsan_unpoison_entry_regs()	2022-10-03 14:03:25 -07:00
events	New Feature:	2022-12-17 14:06:53 -06:00
futex	futex: Resend potentially swallowed owner death notification	2022-12-02 12:20:24 +01:00
gcov	gcov: add support for checksum field	2022-12-21 14:31:52 -08:00
irq	genirq/msi: Return MSI_XA_DOMAIN_SIZE as the maximum MSI index when no domain is present	2022-12-16 14:04:04 +00:00
kcsan	hardening updates for v6.2-rc1	2022-12-14 12:20:00 -08:00
livepatch	modules changes for v6.2-rc1	2022-12-13 14:05:39 -08:00
locking	MM patches for 6.2-rc1.	2022-12-13 19:29:45 -08:00
module	powerpc updates for 6.2	2022-12-19 07:13:33 -06:00
power	PM: sleep: Refine error message in try_to_freeze_tasks()	2022-12-06 12:04:34 +01:00
printk	Merge branch 'rework/console-list-lock' into for-linus	2022-12-08 11:46:56 +01:00
rcu	Urgent RCU pull request for v6.2	2022-12-21 07:59:57 -08:00
sched	hardening updates for v6.2-rc1	2022-12-14 12:20:00 -08:00
time	clocksource: Suspend the watchdog temporarily when high read latency detected	2023-01-24 15:12:48 -08:00
trace	Tracing fix for 6.2:	2022-12-21 19:03:42 -08:00
.gitignore
acct.c	acct: fix potential integer overflow in encode_comp_t()	2022-11-30 16:13:18 -08:00
async.c	Revert "module, async: async_synchronize_full() on module init iff async is used"	2022-02-03 11:20:34 -08:00
audit_fsnotify.c	audit: fix potential double free on error path from fsnotify_add_inode_mark	2022-08-22 18:50:06 -04:00
audit_tree.c	audit: use fsnotify group lock helpers	2022-04-25 14:37:28 +02:00
audit_watch.c	audit_init_parent(): constify path	2022-09-01 17:39:30 -04:00
audit.c	audit: use time_after to compare time	2022-08-29 19:47:03 -04:00
audit.h	audit: remove selinux_audit_rule_update() declaration	2022-09-07 11:30:15 -04:00
auditfilter.c	audit/stable-5.17 PR 20220110	2022-01-11 13:08:21 -08:00
auditsc.c	audit: unify audit_filter_{uring(), inode_name(), syscall()}	2022-10-17 14:24:42 -04:00
backtracetest.c
bounds.c	mm: multi-gen LRU: minimal implementation	2022-09-26 19:46:09 -07:00
capability.c	caps: use type safe idmapping helpers	2022-10-26 10:02:39 +02:00
cfi.c	cfi: Switch to -fsanitize=kcfi	2022-09-26 10:13:13 -07:00
compat.c
configs.c
context_tracking.c	MAINTAINERS: Add Paul as context tracking maintainer	2022-07-05 13:33:00 -07:00
cpu_pm.c	context_tracking: Take IRQ eqs entrypoints over RCU	2022-07-05 13:32:59 -07:00
cpu.c	cpu/hotplug: Do not bail-out in DYING/STARTING sections	2022-12-02 12:43:02 +01:00
crash_core.c	vmcoreinfo: warn if we exceed vmcoreinfo data size	2022-11-30 16:13:17 -08:00
crash_dump.c
cred.c	cred: Do not default to init_cred in prepare_kernel_cred()	2022-11-01 10:04:52 -07:00
delayacct.c	delayacct: support re-entrance detection of thrashing accounting	2022-09-26 19:46:07 -07:00
dma.c
exec_domain.c
exit.c	exit: Use READ_ONCE() for all oops/warn limit reads	2022-12-16 12:26:57 -08:00
extable.c	context_tracking: Take NMI eqs entrypoints over RCU	2022-07-05 13:32:59 -07:00
fail_function.c	fail_function: fix wrong use of fei_attr_remove()	2022-09-11 21:55:11 -07:00
fork.c	New Feature:	2022-12-17 14:06:53 -06:00
freezer.c	freezer,sched: Rewrite core freezer logic	2022-09-07 21:53:50 +02:00
gen_kheaders.sh	kbuild: build init/built-in.a just once	2022-09-29 04:40:15 +09:00
groups.c	security: Add LSM hook to setgroups() syscall	2022-07-15 18:21:49 +00:00
hung_task.c	sched: Fix more TASK_state comparisons	2022-09-30 16:50:39 +02:00
iomem.c
irq_work.c	irq_work: use kasan_record_aux_stack_noalloc() record callstack	2022-04-15 14:49:55 -07:00
jump_label.c	jump_label: Prevent key->enabled int overflow	2022-12-01 15:53:05 -08:00
kallsyms_internal.h	kallsyms: Reduce the memory occupied by kallsyms_seqs_of_names[]	2022-11-12 18:47:36 -08:00
kallsyms_selftest.c	kallsyms: Remove unneeded semicolon	2022-11-18 12:56:40 -08:00
kallsyms_selftest.h	kallsyms: Add self-test facility	2022-11-15 00:42:02 -08:00
kallsyms.c	kallsyms: Add self-test facility	2022-11-15 00:42:02 -08:00
kcmp.c
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt	Revert "signal, x86: Delay calling signals in atomic on RT enabled kernels"	2022-03-31 10:36:55 +02:00
kcov.c	kcov: kmsan: unpoison area->list in kcov_remote_area_put()	2022-10-03 14:03:23 -07:00
kexec_core.c	kexec: remove the unneeded result variable	2022-11-18 13:55:07 -08:00
kexec_elf.c
kexec_file.c	kexec: replace crash_mem_range with range	2022-11-18 13:55:07 -08:00
kexec_internal.h	panic, kexec: make __crash_kexec() NMI safe	2022-09-11 21:55:06 -07:00
kexec.c	panic, kexec: make __crash_kexec() NMI safe	2022-09-11 21:55:06 -07:00
kheaders.c
kmod.c
kprobes.c	kprobes: kretprobe events missing on 2-core KVM guest	2022-12-15 08:48:40 +09:00
ksysfs.c	kernel/ksysfs.c: export kernel cpu byteorder	2022-11-10 19:07:31 +01:00
kthread.c	signal: break out of wait loops on kthread_stop()	2022-10-09 16:01:59 -07:00
latencytop.c	latencytop: use the last element of latency_record of system	2022-09-11 21:55:12 -07:00
Makefile	kernel hardening fixes for v6.2-rc1	2022-12-23 12:00:24 -08:00
module_signature.c
notifier.c	notifier: repair slips in kernel-doc comments	2022-11-30 19:32:30 +01:00
nsproxy.c	fs/exec: switch timens when a task gets a new mm	2022-10-25 15:15:52 -07:00
padata.c	Kbuild updates for v6.2	2022-12-19 12:33:32 -06:00
panic.c	kernel hardening fixes for v6.2-rc1	2022-12-23 12:00:24 -08:00
params.c	Driver Core changes for 6.2-rc1	2022-12-16 03:54:54 -08:00
pid_namespace.c	kernel: pid_namespace: use NULL instead of using plain integer as pointer	2022-04-29 14:38:00 -07:00
pid.c	gfs2: Add glockfd debugfs file	2022-06-29 13:07:16 +02:00
profile.c	kernel/profile.c: simplify duplicated code in profile_setup()	2022-09-11 21:55:12 -07:00
ptrace.c	freezer,sched: Rewrite core freezer logic	2022-09-07 21:53:50 +02:00
range.c
reboot.c	kernel/reboot: Add SYS_OFF_MODE_RESTART_PREPARE mode	2022-10-04 15:59:36 +02:00
regset.c
relay.c	relay: fix type mismatch when allocating memory in relay_create_buf()	2022-12-11 19:30:19 -08:00
resource_kunit.c
resource.c	Driver Core changes for 6.2-rc1	2022-12-16 03:54:54 -08:00
rseq.c	rseq: Use pr_warn_once() when deprecated/unknown ABI flags are encountered	2022-11-14 09:58:32 +01:00
scftorture.c	scftorture: Fix distribution of short handler delays	2022-04-11 17:07:29 -07:00
scs.c	scs: add support for dynamic shadow call stacks	2022-11-09 18:06:35 +00:00
seccomp.c	seccomp: Add wait_killable semantic to seccomp user notifier	2022-05-03 14:11:58 -07:00
signal.c	hardening updates for v6.2-rc1	2022-12-14 12:20:00 -08:00
smp.c	bitmap patches for v6.1-rc1	2022-10-10 12:49:34 -07:00
smpboot.c	smpboot: use atomic_try_cmpxchg in cpu_wait_death and cpu_report_death	2022-09-11 21:55:10 -07:00
smpboot.h
softirq.c	context_tracking: Take IRQ eqs entrypoints over RCU	2022-07-05 13:32:59 -07:00
stackleak.c	stackleak: add on/off stack variants	2022-05-08 01:33:09 -07:00
stacktrace.c	uaccess: remove CONFIG_SET_FS	2022-02-25 09:36:06 +01:00
static_call_inline.c	static_call: Add call depth tracking support	2022-10-17 16:41:16 +02:00
static_call.c	static_call: Don't make __static_call_return0 static	2022-04-05 09:59:38 +02:00
stop_machine.c	Scheduler changes in this cycle were:	2022-05-24 11:11:13 -07:00
sys_ni.c	kernel/sys_ni: add compat entry for fadvise64_64	2022-08-20 15:17:45 -07:00
sys.c	Random number generator updates for Linux 6.1-rc1.	2022-10-10 10:41:21 -07:00
sysctl-test.c	kernel/sysctl-test: use SYSCTL_{ZERO/ONE_HUNDRED} instead of i_{zero/one_hundred}	2022-09-08 16:56:45 -07:00
sysctl.c	MM patches for 6.2-rc1.	2022-12-13 19:29:45 -08:00
task_work.c	task_work: use try_cmpxchg in task_work_add, task_work_cancel_match and task_work_run	2022-09-11 21:55:10 -07:00
taskstats.c	genetlink: start to validate reserved header bytes	2022-08-29 12:47:15 +01:00
torture.c	torture: Wake up kthreads after storing task_struct pointer	2022-02-01 17:24:39 -08:00
tracepoint.c	tracepoint: Optimize the critical region of mutex_lock in tracepoint_module_coming()	2022-09-26 13:01:18 -04:00
tsacct.c	taskstats: version 12 with thread group and exe info	2022-04-29 14:38:03 -07:00
ucount.c	ucounts: Split rlimit and ucount values and max values	2022-05-18 18:24:57 -05:00
uid16.c
uid16.h
umh.c	freezer,sched: Rewrite core freezer logic	2022-09-07 21:53:50 +02:00
up.c
user_namespace.c	ucounts: Split rlimit and ucount values and max values	2022-10-09 16:24:05 -07:00
user-return-notifier.c
user.c	kernel/user: Allow user_struct::locked_vm to be usable for iommufd	2022-11-30 20:16:49 -04:00
usermode_driver.c	blob_to_mnt(): kern_unmount() is needed to undo kern_mount()	2022-05-19 23:25:47 -04:00
utsname_sysctl.c	kernel/utsname_sysctl.c: Fix hostname polling	2022-10-23 12:01:01 -07:00
utsname.c
watch_queue.c	This was a moderately busy cycle for documentation, but nothing all that	2022-08-02 19:24:24 -07:00
watchdog_hld.c	Revert "printk: add functions to prefer direct printing"	2022-06-23 18:41:40 +02:00
watchdog.c	powerpc updates for 6.0	2022-08-06 16:38:17 -07:00
workqueue_internal.h
workqueue.c	workqueue: Make queue_rcu_work() use call_rcu_hurry()	2022-11-30 13:17:05 -08:00