iv/linux

Johannes Weiner 4b85afbdac mm: zero-seek shrinkers

The page cache and most shrinkable slab caches hold data that has been read from disk, but there are some caches that only cache CPU work, such as the dentry and inode caches of procfs and sysfs, as well as the subset of radix tree nodes that track non-resident page cache.

Currently, all these are shrunk at the same rate: using DEFAULT_SEEKS for the shrinker's seeks setting tells the reclaim algorithm that for every two page cache pages scanned it should scan one slab object.

This is a bogus setting. A virtual inode that required no IO to create is not twice as valuable as a page cache page; shadow cache entries with eviction distances beyond the size of memory aren't either.

In most cases, the behavior in practice is still fine. Such virtual caches don't tend to grow and assert themselves aggressively, and usually get picked up before they cause problems. But there are scenarios where that's not true.

Our database workloads suffer from two of those. For one, their file workingset is several times bigger than available memory, which has the kernel aggressively create shadow page cache entries for the non-resident parts of it. The workingset code does tell the VM that most of these are expendable, but the VM ends up balancing them 2:1 to cache pages as per the seeks setting. This is a huge waste of memory.

These workloads also deal with tens of thousands of open files and use /proc for introspection, which ends up growing the proc_inode_cache to absurdly large sizes - again at the cost of valuable cache space, which isn't a reasonable trade-off, given that proc inodes can be re-created without involving the disk.

This patch implements a "zero-seek" setting for shrinkers that results in a target ratio of 0:1 between their objects and IO-backed caches. This allows such virtual caches to grow when memory is available (they do cache/avoid CPU work after all), but effectively disables them as soon as IO-backed objects are under pressure.

It then switches the shrinkers for procfs and sysfs metadata, as well as excess page cache shadow nodes, to the new zero-seek setting.

Link: http://lkml.kernel.org/r/20181009184732.762-5-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Domas Mituzas <dmituzas@fb.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Rik van Riel <riel@surriel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2018-10-26 16:26:33 -07:00
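
The effect of the seeks setting described in the commit message can be illustrated with a small userspace sketch. This is not the kernel's code: the function name sketch_scan_target, the macro SKETCH_DEFAULT_SEEKS, and the exact arithmetic are assumptions made for illustration, loosely modeled on a seek-weighted scan target. The only values taken from the text above are the default seeks setting (which produces the 2:1 balance the message mentions) and the idea that a zero-seek cache is reclaimed without regard to IO cost.

```c
#include <stdint.h>

/* Hypothetical stand-in for the default seeks value discussed above. */
#define SKETCH_DEFAULT_SEEKS 2

/*
 * Illustrative only: estimate how many objects of a cache to scan,
 * given `freeable` objects, a reclaim `priority` (lower means more
 * pressure), and the cache's `seeks` cost.
 */
static uint64_t sketch_scan_target(uint64_t freeable, unsigned int priority,
                                   unsigned int seeks)
{
    if (seeks == 0) {
        /*
         * Zero-seek: the objects cost no IO to re-create, so any
         * memory pressure targets a large share of the cache.
         * Half the cache is an assumed figure for this sketch.
         */
        return freeable / 2;
    }

    /* Guard against shifting by the full width of the type. */
    if (priority >= 64)
        return 0;

    /*
     * Seek-weighted: scale by pressure, then divide by the assumed
     * cost of re-creating an object, so costlier caches shrink slower.
     */
    return ((freeable >> priority) * 4) / seeks;
}
```

With 4096 freeable objects at a low-pressure priority of 12, this sketch asks for 2 objects from a default-seeks cache but 2048 from a zero-seek cache, which is the kind of asymmetry the patch is after.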
9p	Pull request for inclusion in 4.19, take two	2018-08-17 17:27:58 -07:00
adfs	adfs: use timespec64 for time conversion	2018-08-22 10:52:51 -07:00
affs
afs	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2018-10-19 11:03:06 -07:00
autofs	Merge branch 'akpm' (patches from Andrew)	2018-08-22 12:34:08 -07:00
befs	fix a series of Documentation/ broken file name references	2018-06-15 18:10:01 -03:00
bfs
btrfs	Merge branch 'work.lookup' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2018-10-25 12:55:31 -07:00
cachefiles	cachefiles: fix the race between cachefiles_bury_object() and rmdir(2)	2018-10-18 11:32:21 +02:00
ceph	ceph: avoid a use-after-free in ceph_destroy_options()	2018-09-06 16:18:04 +02:00
cifs	smb3: fix lease break problem introduced by compounding	2018-10-02 18:54:09 -05:00
coda	vfs: change inode times to use struct timespec64	2018-06-05 16:57:31 -07:00
configfs	configfs: fix registered group removal	2018-07-17 06:14:07 -07:00
cramfs	cramfs: convert to use vmf_insert_mixed	2018-10-26 16:25:19 -07:00
crypto	crypto: speck - remove Speck	2018-09-04 11:35:03 +08:00
debugfs	Revert "debugfs: inode: debugfs_create_dir uses mode permission from parent"	2018-06-12 20:52:16 -07:00
devpts	devpts: Convert to new IDA API	2018-08-21 23:54:17 -04:00
dlm	treewide: Use array_size() in vmalloc()	2018-06-12 16:19:22 -07:00
ecryptfs	ecryptfs_rename(): verify that lower dentries are still OK after lock_rename()	2018-10-09 23:33:17 -04:00
efivarfs	efivars: Call guid_parse() against guid_t type of variable	2018-07-22 14:13:44 +02:00
efs
exofs	exofs: use bio_clone_fast in _write_mirror	2018-07-24 14:43:20 -06:00
exportfs
ext2	ext2, dax: set ext2_dax_aops for dax files	2018-09-19 15:03:04 +02:00
ext4	Further restructure ext4 documentation; fix up ext4's delayed	2018-10-24 17:42:24 +01:00
f2fs	f2fs: fix to keep project quota consistent	2018-10-22 17:54:48 -07:00
fat	fs/fat/fatent.c: add cond_resched() to fat_count_free_clusters()	2018-10-13 09:31:03 +02:00
freevxfs
fscache	fscache: Fix out of bound read in long cookie keys	2018-10-18 11:32:21 +02:00
fuse	fuse update for 4.19	2018-08-21 18:47:36 -07:00
gfs2	We've got 18 patches for this merge window, none of which are very major.	2018-10-24 17:30:39 +01:00
hfs	hfs: prevent crash on exit from failed search	2018-08-23 18:48:42 -07:00
hfsplus	hfsplus: prevent crash on exit from failed search	2018-08-23 18:48:42 -07:00
hostfs	vfs: discard ATTR_ATTR_FLAG	2018-08-17 16:20:28 -07:00
hpfs	hpfs: remove unnecessary checks on the value of r when assigning error code	2018-08-25 12:42:33 -07:00
hugetlbfs	mm: zero out the vma in vma_init()	2018-08-22 10:52:44 -07:00
isofs	isofs: reject hardware sector size > 2048 bytes	2018-08-21 11:37:41 +02:00
jbd2	jbd2: fix use after free in jbd2_log_do_checkpoint()	2018-10-05 18:44:40 -04:00
jffs2	Merge branch 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace	2018-10-24 11:22:39 +01:00
jfs	jfs: remove redundant dquot_initialize() in jfs_evict_inode()	2018-09-20 09:28:49 -05:00
kernfs	mm: zero-seek shrinkers	2018-10-26 16:26:33 -07:00
lockd	nfsd: fix leaked file lock with nfs exported overlayfs	2018-08-09 16:11:21 -04:00
minix
nfs	NFS client bugfixes for Linux 4.19	2018-09-14 19:25:28 -10:00
nfs_common
nfsd	vfs: swap names of {do,vfs}_clone_file_range()	2018-09-24 10:54:01 +02:00
nilfs2	nilfs2: convert to SPDX license tags	2018-09-04 16:45:02 -07:00
nls
notify	fsnotify: fix ignore mask logic in fsnotify()	2018-09-03 14:57:41 +02:00
ntfs	ntfs: mft: remove VLA usage	2018-08-17 16:20:27 -07:00
ocfs2	ocfs2: remove set but not used variable 'rb'	2018-10-26 16:25:18 -07:00
omfs
openpromfs
orangefs	orangefs: no need to check for service_operation returns > 0	2018-10-18 14:05:46 -04:00
overlayfs	ovl: fix format of setxattr debug	2018-10-04 14:49:10 +02:00
proc	mm: zero-seek shrinkers	2018-10-26 16:26:33 -07:00
pstore	pstore improvements:	2018-10-24 14:42:02 +01:00
qnx4
qnx6
quota	fs/quota: Fix spectre gadget in do_quotactl	2018-08-22 18:17:48 +02:00
ramfs
reiserfs	reiserfs: fix broken xattr handling (heap corruption, bad retval)	2018-08-22 10:52:50 -07:00
romfs
squashfs	Squashfs: Compute expected length from inode size rather than block length	2018-08-02 09:34:02 -07:00
sysfs	Driver core patches for 4.19-rc1	2018-08-18 11:44:53 -07:00
sysv	fs/sysv/inode.c: use ktime_get_real_seconds() for superblock stamp	2018-08-22 10:52:51 -07:00
tracefs	tracefs: Annotate tracefs_ops with __ro_after_init	2018-07-31 11:32:44 -04:00
ubifs	ubifs: Fix WARN_ON logic in exit path	2018-10-13 11:05:02 +02:00
udf	udf: Fix mounting of Win7 created UDF filesystems	2018-08-24 11:13:32 +02:00
ufs	fs/ufs: use ktime_get_real_seconds for sb and cg timestamps	2018-08-17 16:20:27 -07:00
xfs	xfs: cancel COW blocks before swapext	2018-10-18 17:21:55 +11:00
aio.c	y2038: globally rename compat_time to old_time32	2018-08-27 14:48:48 +02:00
anon_inodes.c	anon_inode_getfile(): switch to alloc_file_pseudo()	2018-07-12 10:04:27 -04:00
attr.c	fs: Fix attr.c kernel-doc	2018-07-03 16:44:45 -04:00
bad_inode.c	get rid of 'opened' argument of ->atomic_open() - part 3	2018-07-12 10:04:20 -04:00
binfmt_aout.c
binfmt_elf_fdpic.c	treewide: kmalloc() -> kmalloc_array()	2018-06-12 16:19:22 -07:00
binfmt_elf.c	signal: Distinguish between kernel_siginfo and siginfo	2018-10-03 16:47:43 +02:00
binfmt_em86.c
binfmt_flat.c
binfmt_misc.c	turn filp_clone_open() into inline wrapper for dentry_open()	2018-07-10 23:29:03 -04:00
binfmt_script.c
block_dev.c	for-4.19/block-20180812	2018-08-14 10:23:25 -07:00
buffer.c	blkcg: associate writeback bios with a blkg	2018-09-21 20:29:11 -06:00
char_dev.c
compat_binfmt_elf.c	y2038: globally rename compat_time to old_time32	2018-08-27 14:48:48 +02:00
compat_ioctl.c	Merge branch 'work.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2018-10-25 12:48:22 -07:00
compat.c	ncpfs: remove compat functionality	2018-06-05 19:23:26 +02:00
coredump.c	signal: Distinguish between kernel_siginfo and siginfo	2018-10-03 16:47:43 +02:00
d_path.c
dax.c	filesystem-dax: Fix dax_layout_busy_page() livelock	2018-10-08 11:38:44 -07:00
dcache.c	dcache: allocate external names from reclaimable kmalloc caches	2018-10-26 16:26:32 -07:00
dcookies.c
direct-io.c
drop_caches.c
eventfd.c	Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL	2018-06-28 10:40:47 -07:00
eventpoll.c	fs/eventpoll.c: simplify ep_is_linked() callers	2018-08-22 10:52:49 -07:00
exec.c	vfs: require i_size <= SIZE_MAX in kernel_read_file()	2018-10-10 12:56:14 -04:00
fcntl.c	signal: Distinguish between kernel_siginfo and siginfo	2018-10-03 16:47:43 +02:00
fhandle.c
file_table.c	overlayfs update for 4.19	2018-08-21 18:19:09 -07:00
file.c
filesystems.c
fs_pin.c
fs_struct.c
fs-writeback.c
inode.c	overlayfs update for 4.19	2018-08-21 18:19:09 -07:00
internal.h	overlayfs update for 4.19	2018-08-21 18:19:09 -07:00
ioctl.c	vfs: swap names of {do,vfs}_clone_file_range()	2018-09-24 10:54:01 +02:00
iomap.c	fs/iomap.c: change return type to vm_fault_t	2018-10-26 16:25:18 -07:00
Kconfig	autofs: remove left-over autofs4 stubs	2018-06-11 08:22:34 -07:00
Kconfig.binfmt	kconfig: move the "Executable file formats" menu to fs/Kconfig.binfmt	2018-08-02 08:06:55 +09:00
libfs.c
locks.c	overlayfs update for 4.19	2018-08-21 18:19:09 -07:00
Makefile	autofs: remove left-over autofs4 stubs	2018-06-11 08:22:34 -07:00
mbcache.c	treewide: kmalloc() -> kmalloc_array()	2018-06-12 16:19:22 -07:00
mount.h
mpage.c	mpage: mpage_readpages() should submit IO as read-ahead	2018-08-17 16:20:29 -07:00
namei.c	namei: allow restricted O_CREAT of FIFOs and regular files	2018-08-23 18:48:43 -07:00
namespace.c	x86/fault: BUG() when uaccess helpers fault on kernel addresses	2018-09-03 15:12:09 +02:00
no-block.c
nsfs.c
open.c	overlayfs update for 4.19	2018-08-21 18:19:09 -07:00
pipe.c	Merge branch 'work.open3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2018-08-13 19:58:36 -07:00
pnode.c
pnode.h
posix_acl.c
proc_namespace.c
read_write.c	Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	2018-10-25 11:14:36 -07:00
readdir.c
select.c	y2038: globally rename compat_time to old_time32	2018-08-27 14:48:48 +02:00
seq_file.c	fs/seq_file.c: simplify seq_file iteration code and interface	2018-08-17 16:20:28 -07:00
signalfd.c	signal: Distinguish between kernel_siginfo and siginfo	2018-10-03 16:47:43 +02:00
splice.c	Merge branch 'work.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2018-06-16 16:21:50 +09:00
stack.c
stat.c	y2038: Remove newstat family from default syscall set	2018-08-29 15:42:20 +02:00
statfs.c	kernel: add kcompat_sys_{f,}statfs64()	2018-07-12 14:49:48 +01:00
super.c	Merge branch 'ida-4.19' of git://git.infradead.org/users/willy/linux-dax	2018-08-26 11:48:42 -07:00
sync.c
timerfd.c	y2038: globally rename compat_time to old_time32	2018-08-27 14:48:48 +02:00
userfaultfd.c	userfaultfd: disable irqs when taking the waitqueue lock	2018-10-26 16:25:18 -07:00
utimes.c	y2038: utimes: Rework #ifdef guards for compat syscalls	2018-08-29 15:42:23 +02:00
xattr.c	sysfs: Do not return POSIX ACL xattrs via listxattr	2018-09-18 07:30:48 -04:00