iv/linux

Davidlohr Bueso a52b89ebb6 futexes: Increase hash table size for better performance	2014-01-13 11:45:18 +01:00

Currently, the futex global hash table suffers from its fixed, smallish (for today's standards) size of 256 entries, as well as its lack of NUMA awareness. Large systems, using many futexes, can be prone to high amounts of collisions; where these futexes hash to the same bucket and lead to extra contention on the same hb->lock. Furthermore, cacheline bouncing is a reality when we have multiple hb->locks residing on the same cacheline and different futexes hash to adjacent buckets.

This patch keeps the current static size of 16 entries for small systems, or otherwise, 256 * ncpus (or larger as we need to round the number to a power of 2). Note that this number of CPUs accounts for all CPUs that can ever be available in the system, taking into consideration things like hotpluging. While we do impose extra overhead at bootup by making the hash table larger, this is a one time thing, and does not shadow the benefits of this patch.

Furthermore, as suggested by tglx, by cache aligning the hash buckets we can avoid access across cacheline boundaries and also avoid massive cache line bouncing if multiple cpus are hammering away at different hash buckets which happen to reside in the same cache line.

Also, similar to other core kernel components (pid, dcache, tcp), by using alloc_large_system_hash() we benefit from its NUMA awareness and thus the table is distributed among the nodes instead of in a single one.

For a custom microbenchmark that pounds on the uaddr hashing -- making the wait path fail at futex_wait_setup() returning -EWOULDBLOCK for large amounts of futexes, we can see the following benefits on a 80-core, 8-socket 1Tb server:

+---------+--------------------+------------------------+-----------------------+-------------------------------+
| threads | baseline (ops/sec) | aligned-only (ops/sec) | large table (ops/sec) | large table+aligned (ops/sec) |
+---------+--------------------+------------------------+-----------------------+-------------------------------+
| 512     | 32426              | 50531 (+55.8%)         | 255274 (+687.2%)      | 292553 (+802.2%)              |
| 256     | 65360              | 99588 (+52.3%)         | 443563 (+578.6%)      | 508088 (+677.3%)              |
| 128     | 125635             | 200075 (+59.2%)        | 742613 (+491.1%)      | 835452 (+564.9%)              |
| 80      | 193559             | 323425 (+67.1%)        | 1028147 (+431.1%)     | 1130304 (+483.9%)             |
| 64      | 247667             | 443740 (+79.1%)        | 997300 (+302.6%)      | 1145494 (+362.5%)             |
| 32      | 628412             | 721401 (+14.7%)        | 965996 (+53.7%)       | 1122115 (+78.5%)              |
+---------+--------------------+------------------------+-----------------------+-------------------------------+

Reviewed-by: Darren Hart <dvhart@linux.intel.com>
Reviewed-by: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Waiman Long <Waiman.Long@hp.com>
Reviewed-and-tested-by: Jason Low <jason.low2@hp.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Scott Norton <scott.norton@hp.com>
Cc: Tom Vaden <tom.vaden@hp.com>
Cc: Aswin Chandramouleeswaran <aswin@hp.com>
Link: http://lkml.kernel.org/r/1389569486-25487-3-git-send-email-davidlohr@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
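
The sizing rule and bucket alignment described in the commit message above can be sketched in userspace C. This is only an illustration, not the code from kernel/futex.c: `struct bucket`, `roundup_pow2()`, `table_size()` and the 64-byte `CACHELINE_BYTES` are hypothetical names and values assumed for the example.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Assumed cache line size for this sketch; real hardware may differ. */
#define CACHELINE_BYTES 64

/*
 * Hypothetical stand-in for a hash bucket. Aligning the first member
 * to a cache line pads the struct so that no two buckets (and hence
 * no two bucket locks) share a line.
 */
struct bucket {
    alignas(CACHELINE_BYTES) int lock;
    void *chain;
};

/* Each bucket must occupy a whole number of cache lines. */
static_assert(sizeof(struct bucket) % CACHELINE_BYTES == 0,
              "bucket is not cache-line sized");

/* Round n up to the next power of two (n >= 1). */
static unsigned long roundup_pow2(unsigned long n)
{
    unsigned long p = 1;

    while (p < n)
        p <<= 1;
    return p;
}

/*
 * Sizing rule as stated in the commit message: 16 entries on small
 * systems, otherwise 256 * (possible CPUs) rounded up to a power of 2.
 */
static unsigned long table_size(int small_system, unsigned long possible_cpus)
{
    if (small_system)
        return 16;
    return roundup_pow2(256 * possible_cpus);
}
```

Under these assumptions, `table_size(0, 80)` gives 32768 buckets for the 80-core machine in the benchmark, since 256 * 80 = 20480 rounds up to 2^15.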
..
cpu	sched: Add NEED_RESCHED to the preempt_count	2013-09-25 14:07:49 +02:00
debug	kdb: Add support for external NMI handler to call KGDB/KDB	2013-10-03 18:47:54 +02:00
events	perf: Disable all pmus on unthrottling and rescheduling	2013-12-17 15:04:00 +01:00
gcov	gcov: reuse kbasename helper	2013-11-13 12:09:34 +09:00
irq	Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	2013-12-02 10:15:39 -08:00
locking	mutexes: Give more informative mutex warning in the !lock->owner case	2013-12-17 15:35:10 +01:00
power	PM / sleep: Fix memory leak in pm_vt_switch_unregister().	2013-12-22 00:56:35 +01:00
printk	printk.c: comments should refer to /proc/vmcore instead of /proc/vmcoreinfo	2013-11-13 12:09:14 +09:00
rcu	Linux 3.13-rc4	2013-12-17 15:27:08 +01:00
sched	Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	2013-12-19 09:11:22 -08:00
time	nohz: Fix another inconsistency between CONFIG_NO_HZ=n and nohz=off	2013-11-29 12:23:03 +01:00
trace	This fixes a long standing bug in the ftrace profiler.	2013-12-20 09:32:30 -08:00
.gitignore	Ignore generated file kernel/x509_certificate_list	2013-12-10 18:21:34 +00:00
acct.c
async.c
audit_tree.c	kernel/audit_tree.c:audit_add_tree_rule(): protect `rule' from kill_rules()	2013-06-12 16:29:46 -07:00
audit_watch.c
audit.c	Merge git://git.infradead.org/users/eparis/audit	2013-11-21 19:18:14 -08:00
audit.h	audit: call audit_bprm() only once to add AUDIT_EXECVE information	2013-11-05 11:15:03 -05:00
auditfilter.c	audit: do not reject all AUDIT_INODE filter types	2013-11-05 11:09:16 -05:00
auditsc.c	audit: fix type of sessionid in audit_set_loginuid()	2013-11-06 11:47:24 -05:00
backtracetest.c
bounds.c	mm: do not allocate page->ptl dynamically, if spinlock_t fits to long	2013-12-20 12:25:45 -08:00
capability.c	xfs: update for v3.12-rc1	2013-09-09 11:19:09 -07:00
cgroup_freezer.c	cgroup: make css_for_each_descendant() and friends include the origin css in the iteration	2013-08-08 20:11:27 -04:00
cgroup.c	cgroup: don't recycle cgroup id until all csses' have been destroyed	2013-12-17 08:11:52 -05:00
compat.c
configs.c
context_tracking.c	Linux 3.12-rc4	2013-10-09 12:36:13 +02:00
cpu_pm.c
cpu.c	Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	2013-11-14 16:55:11 +09:00
cpuset.c	cpuset: Fix memory allocator deadlock	2013-11-27 13:52:47 -05:00
crash_dump.c
cred.c
delayacct.c	kernel/delayacct.c: remove redundant checking in __delayacct_add_tsk()	2013-11-13 12:09:12 +09:00
dma.c
elfcore.c	switch elf_core_write_extra_phdrs() to dump_emit()	2013-11-09 00:16:23 -05:00
exec_domain.c
exit.c	ptrace: revert "Prepare to fix racy accesses on task breakpoints"	2013-07-09 10:33:26 -07:00
extable.c	kernel/extable: fix address-checks for core_kernel and init areas	2013-11-28 09:49:41 -08:00
fork.c	mm: fix TLB flush race between migration, and change_protection_range	2013-12-18 19:04:51 -08:00
freezer.c	libata, freezer: avoid block device removal while system is frozen	2013-12-19 13:50:32 -05:00
futex_compat.c
futex.c	futexes: Increase hash table size for better performance	2014-01-13 11:45:18 +01:00
groups.c	userns: Kill nsown_capable it makes the wrong thing easy	2013-08-30 23:44:11 -07:00
hrtimer.c	kernel: delete __cpuinit usage from all core kernel files	2013-07-14 19:36:59 -04:00
hung_task.c	Here are the 3.13 KVM changes. There was a lot of work on the PPC	2013-11-15 13:51:36 +09:00
irq_work.c
itimer.c
jump_label.c	static_key: WARN on usage before jump_label_init was called	2013-10-19 19:45:35 -04:00
kallsyms.c
kcmp.c
Kconfig.freezer
Kconfig.hz	kernel: remove CONFIG_USE_GENERIC_SMP_HELPERS	2013-11-15 09:32:22 +09:00
Kconfig.locks
Kconfig.preempt
kexec.c	kexec: migrate to reboot cpu	2013-12-18 19:04:50 -08:00
kmod.c	kernel/kmod.c: check for NULL in call_usermodehelper_exec()	2013-09-30 14:31:02 -07:00
kprobes.c	kprobes: use KSYM_NAME_LEN to size identifier buffers	2013-11-13 12:09:26 +09:00
ksysfs.c	kernel: replace strict_strto() with kstrto()	2013-09-12 15:38:03 -07:00
kthread.c	kthread: make kthread_create() killable	2013-11-13 12:08:59 +09:00
latencytop.c
Makefile	KEYS: Remove files generated when SYSTEM_TRUSTED_KEYRING=y	2013-12-13 15:59:11 +00:00
module_signing.c	keys: change asymmetric keys to use common hash definitions	2013-10-25 17:15:18 -04:00
module-internal.h	KEYS: Separate the kernel signature checking keyring from module signing	2013-09-25 17:17:01 +01:00
module.c	Mainly boring here, too. rmmod --wait finally removed, though.	2013-11-15 13:27:50 +09:00
notifier.c
nsproxy.c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace	2013-09-07 14:35:32 -07:00
padata.c	padata: make the sequence counter an atomic_t	2013-10-30 12:02:58 +08:00
panic.c	kernel/panic.c: reduce 1 byte usage for print tainted buffer	2013-11-13 12:09:35 +09:00
params.c	kernel/params: fix handling of signed integer types	2013-09-28 12:35:52 -07:00
pid_namespace.c	pid_namespace: make freeing struct pid_namespace rcu-delayed	2013-10-24 23:43:29 -04:00
pid.c	pidns: fix free_pid() to handle the first fork failure	2013-09-30 14:31:03 -07:00
posix-cpu-timers.c	posix_timers: fix racy timer delta caching on task exit	2013-07-03 16:54:42 +02:00
posix-timers.c
profile.c	kernel: delete __cpuinit usage from all core kernel files	2013-07-14 19:36:59 -04:00
ptrace.c	exec/ptrace: fix get_dumpable() incorrect tests	2013-11-13 12:09:33 +09:00
range.c	range: Do not add new blank slot with add_range_with_merge	2013-06-18 11:32:10 -05:00
reboot.c	kexec: migrate to reboot cpu	2013-12-18 19:04:50 -08:00
relay.c	kernel: delete __cpuinit usage from all core kernel files	2013-07-14 19:36:59 -04:00
res_counter.c	memcg: reduce function dereference	2013-09-12 15:38:02 -07:00
resource.c	kernel/resource.c: remove the unneeded assignment in function __find_resource	2013-07-03 16:08:06 -07:00
seccomp.c
signal.c	constify copy_siginfo_to_user{,32}()	2013-11-09 00:16:29 -05:00
smp.c	kernel: fix generic_exec_single indentation	2013-11-15 09:32:22 +09:00
smpboot.c	kernel: delete __cpuinit usage from all core kernel files	2013-07-14 19:36:59 -04:00
smpboot.h
softirq.c	lockdep: Simplify a bit hardirq <-> softirq transitions	2013-11-27 11:09:40 +01:00
stacktrace.c
stop_machine.c	stop_machine: Fix race between stop_two_cpus() and stop_cpus()	2013-11-11 12:43:38 +01:00
sys_ni.c
sys.c	kernel/sys.c: remove obsolete #include <linux/kexec.h>	2013-11-13 12:09:13 +09:00
sysctl_binary.c	kernel/sysctl_binary.c: use scnprintf() instead of snprintf()	2013-11-13 12:09:33 +09:00
sysctl.c	Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	2013-11-14 16:30:30 +09:00
system_certificates.S	KEYS: correct alignment of system_certificate_list content in assembly file	2013-12-10 18:25:28 +00:00
system_keyring.c	KEYS: correct alignment of system_certificate_list content in assembly file	2013-12-10 18:25:28 +00:00
task_work.c	task_work: documentation	2013-09-11 15:58:27 -07:00
taskstats.c	genetlink: only pass array to genl_register_family_with_ops()	2013-11-19 16:39:05 -05:00
test_kprobes.c
time.c	sched: Rename sched.c as sched/core.c in comments and Documentation	2013-06-19 12:58:42 +02:00
timeconst.bc
timer.c	timer: Convert kmalloc_node(...GFP_ZERO...) to kzalloc_node(...)	2013-11-19 14:59:50 +01:00
tracepoint.c
tsacct.c
uid16.c	userns: Kill nsown_capable it makes the wrong thing easy	2013-08-30 23:44:11 -07:00
up.c	kernel: provide a __smp_call_function_single stub for !CONFIG_SMP	2013-11-15 09:32:22 +09:00
user_namespace.c	KEYS: Add per-user_namespace registers for persistent per-UID kerberos caches	2013-09-24 10:35:19 +01:00
user-return-notifier.c
user.c	KEYS: fix uninitialized persistent_keyring_register_sem	2013-12-13 15:59:11 +00:00
utsname_sysctl.c
utsname.c	userns: Kill nsown_capable it makes the wrong thing easy	2013-08-30 23:44:11 -07:00
watchdog.c	watchdog: update watchdog_thresh properly	2013-09-24 17:00:25 -07:00
workqueue_internal.h	sched: Rename sched.c as sched/core.c in comments and Documentation	2013-06-19 12:58:42 +02:00
workqueue.c	PCI updates for v3.13:	2013-12-15 11:45:27 -08:00