iv/linux

Mathieu Desnoyers af7f588d8f sched: Introduce per-memory-map concurrency ID

This feature allows the scheduler to expose a per-memory-map concurrency ID to user-space. This concurrency ID is within the possible cpus range, and is temporarily (and uniquely) assigned while threads are actively running within a memory map. If a memory map has fewer threads than cores, or is limited to run on few cores concurrently through sched affinity or cgroup cpusets, the concurrency IDs will be values close to 0, thus allowing efficient use of user-space memory for per-cpu data structures.

This feature is meant to be exposed by a new rseq thread area field.

The primary purpose of this feature is to do the heavy lifting needed by memory allocators to allow them to use per-cpu data structures efficiently in the following situations:

- Single-threaded applications,
- Multi-threaded applications on large systems (many cores) with limited cpu affinity mask,
- Multi-threaded applications on large systems (many cores) with restricted cgroup cpuset per container.

One of the key concerns from scheduler maintainers is the overhead associated with additional spin locks or atomic operations in the scheduler fast-path. This is why the following optimization is implemented.

On context switch between threads belonging to the same memory map, transfer the mm_cid from prev to next without any atomic ops. This takes care of use-cases involving frequent context switch between threads belonging to the same memory map.

Additional optimizations can be done if the spin locks added when context switching between threads belonging to different memory maps end up being a performance bottleneck. Those are left out of this patch though. A performance impact would have to be clearly demonstrated to justify the added complexity.

The credit goes to Paul Turner (Google) for the original virtual cpu id idea. This feature is implemented based on the discussions with Paul Turner and Peter Oskolkov (Google), but I took the liberty to implement scheduler fast-path optimizations and my own NUMA-awareness scheme. Rumor has it that Google has been running a rseq vcpu_id extension internally in production for a year. The tcmalloc source code indeed has comments hinting at a vcpu_id prototype extension to the rseq system call [1].

The following benchmarks do not show any significant overhead added to the scheduler context switch by this feature:

* perf bench sched messaging (process)
  Baseline:    86.5±0.3 ms
  With mm_cid: 86.7±2.6 ms

* perf bench sched messaging (threaded)
  Baseline:    84.3±3.0 ms
  With mm_cid: 84.7±2.6 ms

* hackbench (process)
  Baseline:    82.9±2.7 ms
  With mm_cid: 82.9±2.9 ms

* hackbench (threaded)
  Baseline:    85.2±2.6 ms
  With mm_cid: 84.4±2.9 ms

[1] https://github.com/google/tcmalloc/blob/master/tcmalloc/internal/linux_syscall_support.h#L26

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221122203932.231377-8-mathieu.desnoyers@efficios.com

2022-12-27 12:52:11 +01:00
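The allocation and handoff behaviour described in the commit message can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the kernel implementation: `mm_sketch`, `task_sketch`, `cid_get`, `cid_put`, `switch_cid` and `NR_CPUS_SKETCH` are hypothetical names invented here, the model is single-threaded (so it needs no locks or atomics), and it assumes at most 64 possible CPUs so that one 64-bit mask suffices. It shows two ideas from the message: taking the lowest free ID keeps values close to 0, and a switch between threads of the same memory map hands the ID over without touching the shared mask.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_SKETCH 64   /* hypothetical possible-CPU count (fits one mask) */
#define CID_UNSET (-1)      /* hypothetical marker: task holds no ID */

/* Hypothetical stand-in for per-memory-map state. */
struct mm_sketch {
    uint64_t cid_mask;      /* bit i set => concurrency ID i is in use */
};

/* Hypothetical stand-in for a thread. */
struct task_sketch {
    struct mm_sketch *mm;   /* assumed non-NULL in this model */
    int cid;                /* meaningful only while the task is running */
};

/* Take the lowest free ID so that values stay close to 0. */
static int cid_get(struct mm_sketch *mm)
{
    for (int i = 0; i < NR_CPUS_SKETCH; i++) {
        uint64_t bit = UINT64_C(1) << i;
        if (!(mm->cid_mask & bit)) {
            mm->cid_mask |= bit;
            return i;
        }
    }
    return CID_UNSET;       /* all IDs in use */
}

/* Return an ID to the memory map's pool. */
static void cid_put(struct mm_sketch *mm, int cid)
{
    assert(cid >= 0 && cid < NR_CPUS_SKETCH);
    mm->cid_mask &= ~(UINT64_C(1) << cid);
}

/* Model of one CPU switching from prev to next. */
static void switch_cid(struct task_sketch *prev, struct task_sketch *next)
{
    if (prev->mm == next->mm && prev->cid != CID_UNSET) {
        /* Same memory map: transfer the ID, shared mask untouched. */
        next->cid = prev->cid;
        prev->cid = CID_UNSET;
        return;
    }
    /* Different memory maps: release prev's ID, allocate one for next.
     * The commit message says the real code takes spin locks here. */
    if (prev->cid != CID_UNSET) {
        cid_put(prev->mm, prev->cid);
        prev->cid = CID_UNSET;
    }
    next->cid = cid_get(next->mm);
}
```

In this model, two threads of one memory map that alternate on a single CPU keep reusing ID 0, so a per-cpu-style array in user-space indexed by the ID would need only one slot for them, regardless of how many CPUs the machine has.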
..
9p	9p-for-6.2-rc1	2022-12-23 11:39:18 -08:00
adfs	fs: Convert block_read_full_page() to block_read_full_folio()	2022-05-09 16:21:44 -04:00
affs	affs: move from strlcpy with unused retval to strscpy	2022-08-19 13:03:10 +02:00
afs	afs: Stop implementing ->writepage()	2022-12-22 11:40:35 +00:00
autofs	autofs: remove unused ino field inode	2022-07-17 17:31:42 -07:00
befs	befs: Convert befs_symlink_read_folio() to use a folio	2022-08-02 12:34:03 -04:00
bfs	fs: Convert block_read_full_page() to block_read_full_folio()	2022-05-09 16:21:44 -04:00
btrfs	hardening updates for v6.2-rc1	2022-12-14 12:20:00 -08:00
cachefiles	fscache,cachefiles: add prepare_ondemand_read() callback	2022-12-07 10:56:29 +08:00
ceph	A fix to facilitate prompt cap releases on async creates from Xiubo.	2022-12-14 10:35:47 -08:00
cifs	20 cifs/smb3 client fixes, mostly related to reconnect and/or DFS	2022-12-21 10:40:08 -08:00
coda	coda: Convert coda_symlink_filler() to use a folio	2022-08-02 12:34:03 -04:00
configfs	configfs: fix possible memory leak in configfs_create_dir()	2022-12-02 11:11:22 +01:00
cramfs	cramfs: read_mapping_page() is synchronous	2022-08-02 12:34:02 -04:00
crypto	for-6.2/block-2022-12-08	2022-12-13 10:43:59 -08:00
debugfs	debugfs: fix error when writing negative value to atomic_t debugfs file	2022-11-30 16:13:16 -08:00
devpts
dlm	Treewide: Stop corrupting socket's task_frag	2022-12-19 17:28:49 -08:00
ecryptfs	ecryptfs: use stub posix acl handlers	2022-10-20 10:13:31 +02:00
efivarfs	efi: vars: prohibit reading random seed variables	2022-12-01 09:51:21 +01:00
efs	efs: Convert efs symlinks to read_folio	2022-05-09 16:21:45 -04:00
erofs	Changes since the last update:	2022-12-12 20:14:04 -08:00
exfat	Description for this pull request:	2022-12-15 18:14:21 -08:00
exportfs	exportfs: use pr_debug for unreachable debug statements	2022-11-28 12:54:45 -05:00
ext2	\n	2022-12-12 20:32:50 -08:00
ext4	treewide: Convert del_timer() to timer_shutdown()	2022-12-25 13:38:09 -08:00
f2fs	f2fs-for-6.2-rc1	2022-12-14 15:27:57 -08:00
fat	MM patches for 6.2-rc1.	2022-12-13 19:29:45 -08:00
freevxfs	freevxfs: Convert vxfs_immed_read_folio() to use a folio	2022-08-02 12:34:03 -04:00
fscache	iov_iter work; most of that is about getting rid of	2022-12-12 18:29:54 -08:00
fuse	MM patches for 6.2-rc1.	2022-12-13 19:29:45 -08:00
gfs2	gfs2 fixes	2022-12-17 08:18:04 -06:00
hfs	MM patches for 6.2-rc1.	2022-12-13 19:29:45 -08:00
hfsplus	MM patches for 6.2-rc1.	2022-12-13 19:29:45 -08:00
hostfs	hostfs: move from strlcpy with unused retval to strscpy	2022-09-19 22:46:25 +02:00
hpfs	hpfs: remove ->writepage	2022-12-11 18:12:18 -08:00
hugetlbfs	hugetlbfs: inode: remove unnecessary (void*) conversions	2022-11-30 15:58:56 -08:00
iomap	New XFS code for 6.2:	2022-12-14 10:11:51 -08:00
isofs	- hfs and hfsplus kmap API modernization from Fabio Francesco	2022-10-12 11:00:22 -07:00
jbd2	jbd2: switch jbd2_submit_inode_data() to use fs-provided hook for data writeout	2022-12-08 21:49:25 -05:00
jffs2	fs: rename current get acl method	2022-10-20 10:13:27 +02:00
jfs	MM patches for 6.2-rc1.	2022-12-13 19:29:45 -08:00
kernfs	kernfs: fix all kernel-doc warnings and multiple typos	2022-11-23 19:28:26 +01:00
ksmbd	six ksmbd server fixes	2022-12-15 09:29:19 -08:00
lockd	NFSD 6.2 Release Notes	2022-12-12 20:54:39 -08:00
minix	vfs: open inside ->tmpfile()	2022-09-24 07:00:00 +02:00
netfs	use less confusing names for iov_iter direction initializers	2022-11-25 13:01:55 -05:00
nfs	Driver Core changes for 6.2-rc1	2022-12-16 03:54:54 -08:00
nfs_common
nfsd	nfsd-6.2 supplement:	2022-12-19 09:10:33 -06:00
nilfs2	treewide: Convert del_timer() to timer_shutdown()	2022-12-25 13:38:09 -08:00
nls
notify	Merge tag 'fsnotify-for_v6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs	2022-10-07 08:28:50 -07:00
ntfs	- hfs and hfsplus kmap API modernization from Fabio Francesco	2022-10-12 11:00:22 -07:00
ntfs3	ntfs3 for 6.2	2022-12-21 10:18:17 -08:00
ocfs2	Treewide: Stop corrupting socket's task_frag	2022-12-19 17:28:49 -08:00
omfs	omfs: remove ->writepage	2022-12-11 18:12:18 -08:00
openpromfs
orangefs	orangefs: four fixes from Zhang Xiaoxu and two from Colin Ian King	2022-12-14 11:16:33 -08:00
overlayfs	overlayfs update for 6.2	2022-12-12 20:18:26 -08:00
proc	ARM64:	2022-12-15 11:12:21 -08:00
pstore	pstore updates for v6.2-rc1-fixes	2022-12-23 11:55:54 -08:00
qnx4	fs: Convert block_read_full_page() to block_read_full_folio()	2022-05-09 16:21:44 -04:00
qnx6	fs/qnx6: delete unnecessary checks before brelse()	2022-09-11 21:55:07 -07:00
quota	ext4: fix bug_on in __es_tree_search caused by bad quota inode	2022-12-08 21:49:23 -05:00
ramfs	tmpfile API change	2022-10-10 19:45:17 -07:00
reiserfs	lsm/stable-6.2 PR 20221212	2022-12-13 09:47:48 -08:00
romfs	romfs: Convert romfs to read_folio	2022-05-09 16:21:46 -04:00
smbfs_common	smb3: define missing create contexts	2022-10-05 01:55:27 -05:00
squashfs	fs.idmapped.squashfs.v6.2	2022-12-12 20:24:51 -08:00
sysfs	kobject: kobj_type: remove default_attrs	2022-04-05 15:39:19 +02:00
sysv	fs: sysv: Fix sysv_nblocks() returns wrong value	2022-12-10 14:13:37 -05:00
tracefs	tracefs: Only clobber mode/uid/gid on remount if asked	2022-09-08 17:10:54 -04:00
ubifs	treewide: use get_random_u32_below() instead of deprecated function	2022-11-18 02:15:15 +01:00
udf	\n	2022-12-12 20:32:50 -08:00
ufs	ufs: replace ll_rw_block()	2022-09-11 20:26:07 -07:00
unicode
vboxsf	vboxsf: Convert vboxsf to read_folio	2022-05-09 16:21:46 -04:00
verity	fsverity: simplify fsverity_get_digest()	2022-11-29 21:07:41 -08:00
xfs	New XFS code for 6.2:	2022-12-14 10:11:51 -08:00
zonefs	zonefs: Fix active zone accounting	2022-11-25 17:01:22 +09:00
aio.c	use less confusing names for iov_iter direction initializers	2022-11-25 13:01:55 -05:00
anon_inodes.c	dynamic_dname(): drop unused dentry argument	2022-08-20 11:34:04 -04:00
attr.c	attr: use consistent sgid stripping checks	2022-10-18 10:09:47 +02:00
bad_inode.c	fs: rename current get acl method	2022-10-20 10:13:27 +02:00
binfmt_elf_fdpic.c	binfmt: Fix error return code in load_elf_fdpic_binary()	2022-12-01 19:15:52 -08:00
binfmt_elf_test.c
binfmt_elf.c	rseq: Introduce feature size and alignment ELF auxiliary vector entries	2022-12-27 12:52:10 +01:00
binfmt_flat.c	binfmt_flat: Remove shared library support	2022-04-22 10:57:18 -07:00
binfmt_misc.c	binfmt_misc: fix shift-out-of-bounds in check_special_flags	2022-12-02 13:57:04 -08:00
binfmt_script.c
buffer.c	- hfs and hfsplus kmap API modernization from Fabio Francesco	2022-10-12 11:00:22 -07:00
char_dev.c	chardev: fix error handling in cdev_device_add()	2022-12-02 17:48:59 +01:00
compat_binfmt_elf.c
coredump.c	hardening updates for v6.2-rc1	2022-12-14 12:20:00 -08:00
d_path.c	d_path.c: typo fix...	2022-08-20 11:34:33 -04:00
dax.c	fsdax,xfs: port unshare to fsdax	2022-12-11 18:12:17 -08:00
dcache.c	tmpfile API change	2022-10-10 19:45:17 -07:00
direct-io.c	block: remove PSI accounting from the bio layer	2022-09-20 08:24:38 -06:00
drop_caches.c
eventfd.c	eventfd: provide a eventfd_signal_mask() helper	2022-11-22 06:07:55 -07:00
eventpoll.c	eventpoll: add EPOLL_URING_WAKE poll wakeup flag	2022-11-21 07:45:29 -07:00
exec.c	sched: Introduce per-memory-map concurrency ID	2022-12-27 12:52:11 +01:00
fcntl.c	keep iocb_flags() result cached in struct file	2022-06-10 16:10:23 -04:00
fhandle.c	do_sys_name_to_handle(): constify path	2022-09-01 17:36:39 -04:00
file_table.c	locks: fix TOCTOU race when granting write lease	2022-08-16 10:59:54 -04:00
file.c	fs: use acquire ordering in __fget_light()	2022-10-31 15:30:11 -04:00
filesystems.c
fs_context.c
fs_parser.c	ext4: journal_path mount options should follow links	2022-12-01 10:46:54 -05:00
fs_pin.c
fs_struct.c
fs_types.c
fs-writeback.c	for-6.2/writeback-2022-12-12	2022-12-15 18:09:48 -08:00
fsopen.c	uninline may_mount() and don't opencode it in fspick(2)/fsopen(2)	2022-05-19 23:25:10 -04:00
init.c
inode.c	fs.vfsuid.conversion.v6.2	2022-12-12 19:20:05 -08:00
internal.h	fs.ovl.setgid.v6.2	2022-12-12 19:03:10 -08:00
ioctl.c	Fixes for 5.18-rc1:	2022-04-01 19:35:56 -07:00
Kconfig	hugetlb: make hugetlb depends on SYSFS or SYSCTL	2022-09-11 20:26:10 -07:00
Kconfig.binfmt	Xtensa updates for v6.1	2022-10-10 14:21:11 -07:00
kernel_read_file.c	fs/kernel_read_file: allow to read files up-to ssize_t	2022-06-16 19:58:21 -07:00
libfs.c	libfs: add DEFINE_SIMPLE_ATTRIBUTE_SIGNED for signed value	2022-11-30 16:13:16 -08:00
locks.c	Add process name and pid to locks warning	2022-11-30 05:08:10 -05:00
Makefile	a.out: Remove the a.out implementation	2022-09-27 07:11:02 -07:00
mbcache.c	ext4: fix deadlock due to mbcache entry corruption	2022-12-08 21:49:25 -05:00
mount.h	switch try_to_unlazy_next() to __legitimize_mnt()	2022-07-05 16:18:21 -04:00
mpage.c	Folio changes for 6.0	2022-08-03 10:35:43 -07:00
namei.c	Landlock updates for v6.2-rc1	2022-12-13 09:14:50 -08:00
namespace.c	fs.idmapped.mnt_idmap.v6.2	2022-12-12 19:30:18 -08:00
no-block.c
nsfs.c	dynamic_dname(): drop unused dentry argument	2022-08-20 11:34:04 -04:00
open.c	Landlock updates for v6.2-rc1	2022-12-13 09:14:50 -08:00
pipe.c	dynamic_dname(): drop unused dentry argument	2022-08-20 11:34:04 -04:00
pnode.c	pnode: terminate at peers of source	2022-12-21 14:45:25 +01:00
pnode.h
posix_acl.c	fs.idmapped.mnt_idmap.v6.2	2022-12-12 19:30:18 -08:00
proc_namespace.c	vfs: escape hash as well	2022-06-28 13:58:05 -04:00
read_write.c	iov_iter work; most of that is about getting rid of	2022-12-12 18:29:54 -08:00
readdir.c	Change calling conventions for filldir_t	2022-08-17 17:25:04 -04:00
remap_range.c	New VFS code for 6.2:	2022-12-13 10:26:38 -08:00
select.c
seq_file.c	use less confusing names for iov_iter direction initializers	2022-11-25 13:01:55 -05:00
signalfd.c
splice.c	use less confusing names for iov_iter direction initializers	2022-11-25 13:01:55 -05:00
stack.c
stat.c	fs: use type safe idmapping helpers	2022-10-26 10:02:34 +02:00
statfs.c
super.c	misc pile	2022-12-12 18:38:47 -08:00
sync.c	riscv: compat: syscall: Add compat_sys_call_table implementation	2022-04-26 13:36:25 -07:00
sysctls.c
timerfd.c
userfaultfd.c	fs/userfaultfd: Fix maple tree iterator in userfaultfd_unregister()	2022-11-07 12:58:26 -08:00
utimes.c
xattr.c	fs.xattr.simple.rework.rbtree.rwlock.v6.2	2022-12-13 10:08:36 -08:00