linux

Waiman Long 810507fe6f locking/lockdep: Reuse freed chain_hlocks entries

Once a lock class is zapped, all the lock chains that include the zapped class are essentially useless. The lock_chain structure itself can be reused, but not the corresponding chain_hlocks[] entries. Over time, we will run out of chain_hlocks entries while there are still plenty of other lockdep array entries available. To fix this imbalance, we have to make chain_hlocks entries reusable just like the others.

As the freed chain_hlocks entries are in blocks of various lengths, a simple bitmap like the one used in the other reusable lockdep arrays isn't applicable. Instead, the chain_hlocks entries are put into bucketed lists (MAX_CHAIN_BUCKETS) of chain blocks. Bucket 0 is the variable-size bucket, which houses chain blocks of size larger than MAX_CHAIN_BUCKETS sorted in decreasing size order. Initially, the whole array is in one chain block (the primordial chain block) in bucket 0.

The minimum size of a chain block is 2 chain_hlocks entries, and that is also the minimum allocation size. In other words, an allocation request for one chain_hlocks entry returns a 2-entry block, so 1 entry is wasted.

Allocation requests for the chain_hlocks are fulfilled first by looking for a chain block of matching size. If none is found, the first chain block from bucket[0] (the largest one) is split. That can fragment the hlock entries and reduce allocation efficiency if a chain block of size > MAX_CHAIN_BUCKETS is ever zapped and put back into bucket 0 after the primordial chain block. So MAX_CHAIN_BUCKETS must be large enough that this seldom happens.

By reusing the chain_hlocks entries, we are able to handle workloads that add and zap a lot of lock classes without the risk of running out of chain_hlocks entries, as long as the total number of outstanding lock classes at any time remains within a reasonable limit.
Two new tracking counters, nr_free_chain_hlocks and nr_large_chain_blocks, are added to track the total number of chain_hlocks entries in the free bucketed lists and the number of large chain blocks in buckets[0], respectively. nr_free_chain_hlocks replaces nr_chain_hlocks. The nr_large_chain_blocks counter lets us see whether we should increase the number of buckets (MAX_CHAIN_BUCKETS) to avoid the fragmentation problem in bucket[0].

An internal nfsd test that ran for more than an hour and kept on loading and unloading kernel modules could cause the following message to be displayed:

  [ 4318.443670] BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low!

The patched kernel was able to complete the test with a lot of free chain_hlocks entries to spare:

  # cat /proc/lockdep_stats
     :
   dependency chains: 18867 [max: 65536]
   dependency chain hlocks: 74926 [max: 327680]
   dependency chain hlocks lost: 0
     :
   zapped classes: 1541
   zapped lock chains: 56765
   large chain blocks: 1

By changing MAX_CHAIN_BUCKETS to 3 and adding a counter for the size of the largest chain block, the system still worked, and we got the following lockdep_stats data:

   dependency chains: 18601 [max: 65536]
   dependency chain hlocks used: 73133 [max: 327680]
   dependency chain hlocks lost: 0
     :
   zapped classes: 1541
   zapped lock chains: 56702
   large chain blocks: 45165
   large chain block size: 20165

By running the test again, I was indeed able to cause chain_hlocks entries to get lost:

   dependency chain hlocks used: 74806 [max: 327680]
   dependency chain hlocks lost: 575
     :
   large chain blocks: 48737
   large chain block size: 7

Due to the fragmentation, the "MAX_LOCKDEP_CHAIN_HLOCKS too low!" error can happen even if a lot of chain_hlocks entries appear to be free. Fortunately, a MAX_CHAIN_BUCKETS value of 16 should be big enough that few variable-sized chain blocks, other than the initial one, should ever be present in bucket 0.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20200206152408.24165-7-longman@redhat.com
2020-02-11 13:10:52 +01:00
bpf	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	2020-02-08 17:15:08 -08:00
cgroup	Merge branch 'for-5.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup	2020-02-10 17:07:05 -08:00
configs
debug	Revert "kdb: Get rid of confusing diag msg from "rd" if current task has no regs"	2020-02-06 11:40:09 +00:00
dma	lib/genalloc.c: rename addr_in_gen_pool to gen_pool_has_addr	2019-12-04 19:44:13 -08:00
events	A set of fixes and improvements for the perf subsystem:	2020-02-09 12:04:09 -08:00
gcov	Revert "um: Enable CONFIG_CONSTRUCTORS"	2020-01-19 22:42:06 +01:00
irq	A set of fixes for X86:	2020-02-09 12:11:12 -08:00
livepatch	New tracing features:	2019-11-27 11:42:01 -08:00
locking	locking/lockdep: Reuse freed chain_hlocks entries	2020-02-11 13:10:52 +01:00
power	Merge back new material related to system-wide PM for v5.6.	2020-01-23 16:00:56 +01:00
printk	printk: fix exclusive_console replaying	2020-01-02 16:15:04 +01:00
rcu	rcu: Forgive slow expedited grace periods at boot time	2020-01-25 12:00:40 -08:00
sched	proc: convert everything to "struct proc_ops"	2020-02-04 03:05:26 +00:00
time	Two small fixes for the time(r) subsystem:	2020-02-09 12:00:12 -08:00
trace	Tracing updates:	2020-02-06 07:12:11 +00:00
.gitignore
acct.c	acct: stop using get_seconds()	2019-12-18 18:07:31 +01:00
async.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441	2019-06-05 17:37:17 +02:00
audit_fsnotify.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 157	2019-05-30 11:26:37 -07:00
audit_tree.c
audit_watch.c	audit_get_nd(): don't unlock parent too early	2019-11-10 11:56:55 -05:00
audit.c	audit: Add __rcu annotation to RCU pointer	2019-12-09 15:19:03 -05:00
audit.h	audit/stable-5.3 PR 20190702	2019-07-08 18:55:42 -07:00
auditfilter.c	audit/stable-5.3 PR 20190702	2019-07-08 18:55:42 -07:00
auditsc.c	Revert "bpf: Emit audit messages upon successful prog load and unload"	2019-11-23 09:56:02 -08:00
backtracetest.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441	2019-06-05 17:37:17 +02:00
bounds.c
capability.c
compat.c	y2038: itimer: compat handling to itimer.c	2019-11-15 14:38:30 +01:00
configs.c	proc: convert everything to "struct proc_ops"	2020-02-04 03:05:26 +00:00
context_tracking.c	context_tracking: Rename context_tracking_is_enabled() => context_tracking_enabled()	2019-10-29 10:01:12 +01:00
cpu_pm.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 282	2019-06-05 17:36:37 +02:00
cpu.c	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	2020-01-28 10:07:09 -08:00
crash_core.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 230	2019-06-19 17:09:06 +02:00
crash_dump.c
cred.c	Merge branch 'dhowells' (patches from DavidH)	2020-01-14 09:56:31 -08:00
delayacct.c
dma.c
elfcore.c	kernel/elfcore.c: include proper prototypes	2019-09-25 17:51:39 -07:00
exec_domain.c
exit.c	for-linus-2020-01-03	2020-01-03 11:17:14 -08:00
extable.c	bpf: Allow to resolve bpf trampoline and dispatcher in unwind	2020-01-25 07:12:40 -08:00
fail_function.c	fail_function: no need to check return value of debugfs_create functions	2019-06-03 15:49:06 +02:00
fork.c	hmm related patches for 5.6	2020-01-29 19:56:50 -08:00
freezer.c	Revert "libata, freezer: avoid block device removal while system is frozen"	2019-10-06 09:11:37 -06:00
futex.c	futex: Fix kernel-doc notation warning	2020-01-09 13:23:40 +01:00
gen_kheaders.sh	kheaders: explain why include/config/autoconf.h is excluded from md5sum	2019-11-11 20:10:01 +09:00
groups.c
hung_task.c
iomem.c	mm/nvdimm: add is_ioremap_addr and use that to check ioremap address	2019-07-12 11:05:40 -07:00
irq_work.c	irq_work: Fix IRQ_WORK_BUSY bit clearing	2019-11-15 10:48:37 +01:00
jump_label.c	jump_label: Don't warn on __exit jump entries	2019-08-29 15:10:10 +01:00
kallsyms.c	Kbuild updates for v5.6 (2nd)	2020-02-09 16:05:50 -08:00
kcmp.c
Kconfig.freezer
Kconfig.hz
Kconfig.locks	sched/rt, locking: Use CONFIG_PREEMPTION	2019-12-08 14:37:36 +01:00
Kconfig.preempt	sched/Kconfig: Fix spelling mistake in user-visible help text	2019-11-12 11:35:32 +01:00
kcov.c	kcov: remote coverage support	2019-12-04 19:44:14 -08:00
kexec_core.c	kexec: add machine_kexec_post_load()	2020-01-08 16:32:55 +00:00
kexec_elf.c	kexec_elf: support 32 bit ELF files	2019-09-06 23:58:44 +02:00
kexec_file.c	kexec: add machine_kexec_post_load()	2020-01-08 16:32:55 +00:00
kexec_internal.h	kexec: add machine_kexec_post_load()	2020-01-08 16:32:55 +00:00
kexec.c	kexec: add machine_kexec_post_load()	2020-01-08 16:32:55 +00:00
kheaders.c
kmod.c
kprobes.c	kprobes: Fix optimize_kprobe()/unoptimize_kprobe() cancellation logic	2020-01-09 12:40:13 +01:00
ksysfs.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 170	2019-05-30 11:26:39 -07:00
kthread.c	kthread: make __kthread_queue_delayed_work static	2019-10-16 09:20:58 -07:00
latencytop.c	proc: convert everything to "struct proc_ops"	2020-02-04 03:05:26 +00:00
Makefile	kcov: ignore fault-inject and stacktrace	2020-01-31 10:30:41 -08:00
module_signature.c	MODSIGN: Export module signature definitions	2019-08-05 18:39:56 -04:00
module_signing.c	MODSIGN: Export module signature definitions	2019-08-05 18:39:56 -04:00
module-internal.h
module.c	proc: convert everything to "struct proc_ops"	2020-02-04 03:05:26 +00:00
notifier.c	kernel/notifier.c: remove blocking_notifier_chain_cond_register()	2019-12-04 19:44:12 -08:00
nsproxy.c	ns: Introduce Time Namespace	2020-01-14 12:20:48 +01:00
padata.c	padata: update documentation	2019-12-11 16:37:02 +08:00
panic.c	locking/refcount: Remove unused 'refcount_error_report()' function	2019-11-25 09:15:42 +01:00
params.c	lockdown: Lock down module params that specify hardware parameters (eg. ioport)	2019-08-19 21:54:16 -07:00
pid_namespace.c	fork: extend clone3() to support setting a PID	2019-11-15 23:49:22 +01:00
pid.c	pid: Implement pidfd_getfd syscall	2020-01-13 21:49:36 +01:00
profile.c	proc: convert everything to "struct proc_ops"	2020-02-04 03:05:26 +00:00
ptrace.c	ptrace: reintroduce usage of subjective credentials in ptrace_has_cap()	2020-01-18 13:51:39 +01:00
range.c
reboot.c
relay.c
resource.c	mm/memory_hotplug.c: use PFN_UP / PFN_DOWN in walk_system_ram_range()	2019-09-24 15:54:09 -07:00
rseq.c	rseq: Reject unknown flags on rseq unregister	2019-12-25 10:41:20 +01:00
seccomp.c	seccomp: Check that seccomp_notif is zeroed out by the user	2020-01-02 13:03:45 -08:00
signal.c	sched.h: Annotate sighand_struct with __rcu	2020-01-26 10:54:47 +01:00
smp.c	smp: Remove superfluous cond_func check in smp_call_function_many_cond()	2020-01-28 15:43:00 +01:00
smpboot.c
smpboot.h
softirq.c	Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	2019-07-08 11:01:13 -07:00
stackleak.c
stacktrace.c	stacktrace: Get rid of unneeded '!!' pattern	2019-11-11 10:30:59 +01:00
stop_machine.c	stop_machine: Make stop_cpus() static	2020-01-17 10:19:21 +01:00
sys_ni.c	y2038: allow disabling time32 system calls	2019-11-15 14:38:30 +01:00
sys.c	prctl: PR_{G,S}ET_IO_FLUSHER to support controlling memory reclaim	2020-01-28 10:09:51 +01:00
sysctl_binary.c	sysctl: Remove the sysctl system call	2019-11-26 13:03:56 -06:00
sysctl-test.c	kunit: allow kunit tests to be loaded as a module	2020-01-09 16:42:29 -07:00
sysctl.c	rcu: Make PREEMPT_RCU be a modifier to TREE_RCU	2019-12-09 12:37:51 -08:00
task_work.c
taskstats.c	taskstats: fix data-race	2019-12-04 15:18:39 +01:00
test_kprobes.c
torture.c	torture: Remove exporting of internal functions	2019-08-01 14:30:22 -07:00
tracepoint.c	The main changes in this release include:	2019-07-18 11:51:00 -07:00
tsacct.c	tsacct: add 64-bit btime field	2019-12-18 18:07:31 +01:00
ucount.c	proc/sysctl: add shared variables for range check	2019-07-18 17:08:07 -07:00
uid16.c
uid16.h
umh.c
up.c	smp/up: Make smp_call_function_single() match SMP semantics	2020-02-07 15:34:12 +01:00
user_namespace.c	Keyrings namespacing	2019-07-08 19:36:47 -07:00
user-return-notifier.c
user.c	Keyrings namespacing	2019-07-08 19:36:47 -07:00
utsname_sysctl.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441	2019-06-05 17:37:17 +02:00
utsname.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441	2019-06-05 17:37:17 +02:00
watchdog_hld.c
watchdog.c	watchdog/softlockup: Enforce that timestamp is valid on boot	2020-01-17 11:19:22 +01:00
workqueue_internal.h
workqueue.c	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	2020-01-28 10:07:09 -08:00