iv/linux

Ritesh Harjani 07b5b8e1ac ext4: mballoc: introduce pcpu seqcnt for freeing PA to improve ENOSPC handling	2020-06-03 23:16:53 -04:00

There could be a race in ext4_mb_discard_group_preallocations() where the 1st thread iterates through the group's bb_prealloc_list, removes all the PAs, and adds them to the function's local list head. If a 2nd thread now comes in to discard the group preallocations, it will see that group->bb_prealloc_list is empty and will return 0.

Consider a case where we have few groups (e.g. just group 0): this may even return an -ENOSPC error from ext4_mb_new_blocks() (where we call ext4_mb_discard_group_preallocations()). But that is wrong, since the 2nd thread should have waited for the 1st thread to release all the PAs and then retried the allocation, given that the 1st thread was going to discard the PAs anyway.

The algorithm using this percpu seq counter is as follows:

1. We sample the percpu discard_pa_seq counter before trying block allocation in ext4_mb_new_blocks().
2. We increment this percpu discard_pa_seq counter when we either allocate or free these blocks, i.e. while marking those blocks as used/free in mb_mark_used()/mb_free_blocks().
3. We also increment this percpu seq counter when we successfully identify that the bb_prealloc_list is not empty and hence proceed to discard those PAs inside ext4_mb_discard_group_preallocations().

To make sure that the regular fast path of block allocation is not affected, as a small optimization we only sample the percpu seq counter on the current cpu. Only when the block allocation fails and the number of freed blocks found was 0 do we sample the percpu seq counter for all cpus, using the function ext4_get_discard_pa_seq_sum(). This happens after making sure that all the PAs on grp->bb_prealloc_list got freed, or that the list is empty.

It could be argued that we should simply check grp->bb_free to see whether there are any free blocks to allocate. Two concerns were discussed:

1. If for some reason the blocks available in the group are not appropriate for the allocation logic (e.g. EXT4_MB_HINT_GOAL_ONLY, although this is not yet implemented), then the retry logic may result in infinite looping, since grp->bb_free is non-zero.
2. Before preallocation was combined with block allocation under the same ext4_lock_group(), there were a lot of races where grp->bb_free could not be relied upon.

Because of the above, this patch uses the discard_pa_seq logic to determine whether we should retry block allocation. If there are n threads trying block allocation and none of them could allocate or discard any blocks, then all n threads will fail the block allocation and return an -ENOSPC error (since the seq counter will match for all of them, as no block allocation/discard was done during that time).

Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/7f254686903b87c419d798742fd9a1be34f0657b.1589955723.git.riteshh@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
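
The retry decision described in the commit message above can be illustrated with a small user-space model. This is only a sketch, not the kernel code: the fixed NR_CPUS value, the plain array standing in for the percpu discard_pa_seq counter, and the helper names seq_bump(), seq_sample_local(), seq_sample_sum() and should_retry() are all assumptions made for illustration. Only discard_pa_seq and the role of ext4_get_discard_pa_seq_sum() come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 4 /* hypothetical fixed CPU count for this model */

/* Stand-in for the percpu discard_pa_seq counter named in the commit. */
static uint64_t discard_pa_seq[NR_CPUS];

/*
 * Steps 2 and 3: bump the counter of the CPU that allocated blocks,
 * freed blocks, or started discarding a non-empty PA list.
 */
static void seq_bump(unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	discard_pa_seq[cpu]++;
}

/* Step 1 (fast path): sample only the local CPU's counter. */
static uint64_t seq_sample_local(unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	return discard_pa_seq[cpu];
}

/* Slow path: sum over all CPUs, the role the commit message gives to
 * ext4_get_discard_pa_seq_sum(). */
static uint64_t seq_sample_sum(void)
{
	uint64_t sum = 0;

	for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++)
		sum += discard_pa_seq[cpu];
	return sum;
}

/*
 * Decide whether a failed allocation is worth retrying.
 * freed: blocks this thread released in its own discard attempt.
 * seq:   in: the previous sample; out: the fresh all-CPU sum.
 * Retry if we freed something ourselves, or if any CPU allocated,
 * freed or discarded since the previous sample. Otherwise the caller
 * should give up and report ENOSPC.
 */
static bool should_retry(uint64_t freed, uint64_t *seq)
{
	uint64_t now;
	bool changed;

	if (freed)
		return true;

	now = seq_sample_sum();
	changed = (now != *seq);
	*seq = now;
	return changed;
}
```

In this model a caller would take seq_sample_local() before attempting an allocation and call should_retry() only after the attempt fails. Because the first sample is local and later samples are sums, the first comparison can differ merely because other CPUs already have non-zero counters; in this sketch that costs at most one extra retry, after which sums are compared with sums.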
..
9p	9p: read only once on O_NONBLOCK	2020-03-27 09:29:56 +00:00
adfs	fs/adfs: bigdir: Fix an error code in adfs_fplus_read()	2020-01-25 11:31:59 -05:00
affs	affs: fix a memory leak in affs_remount	2019-11-18 14:26:43 +01:00
afs	afs: Make record checking use TASK_UNINTERRUPTIBLE when appropriate	2020-04-24 16:33:32 +01:00
autofs	LOOKUP_MOUNTPOINT: fold path_mountpointat() into path_lookupat()	2020-03-13 21:08:17 -04:00
befs	fs: Fill in max and min timestamps in superblock	2019-08-30 07:27:17 -07:00
bfs	fs: Fill in max and min timestamps in superblock	2019-08-30 07:27:17 -07:00
btrfs	btrfs: fix gcc-4.8 build warning for struct initializer	2020-04-30 12:17:49 +02:00
cachefiles	cachefiles: drop direct usage of ->bmap method.	2020-02-03 08:05:56 -05:00
ceph	ceph: fix potential bad pointer deref in async dirops cb's	2020-04-13 19:33:47 +02:00
cifs	cifs: fix uninitialised lease_key in open_shroot()	2020-04-22 20:29:11 -05:00
coda	y2038: add inode timestamp clamping	2019-09-19 09:42:37 -07:00
configfs	utimes: Clamp the timestamps in notify_change()	2019-12-08 19:10:50 -05:00
cramfs	cramfs: switch to use of errorfc() et.al.	2020-02-07 14:48:41 -05:00
crypto	fscrypt updates for 5.7	2020-03-31 12:58:36 -07:00
debugfs	debugfs: remove return value of debugfs_create_u32()	2020-04-17 17:08:50 +02:00
devpts	devpts_pty_kill(): don't bother with d_delete()	2019-09-03 09:30:56 -04:00
dlm	dlm: use SO_SNDTIMEO_NEW instead of SO_SNDTIMEO_OLD	2019-12-18 18:07:31 +01:00
ecryptfs	eCryptfs fixes for 5.6-rc3	2020-02-17 21:08:37 -08:00
efivarfs	efi: Use more granular check for availability for variable services	2020-02-23 21:59:42 +01:00
efs	fs: Fill in max and min timestamps in superblock	2019-08-30 07:27:17 -07:00
erofs	erofs: handle corrupted images whose decompressed size less than it'd be	2020-03-03 23:40:52 +08:00
exfat	exfat: truncate atimes to 2s granularity	2020-04-22 20:14:06 +09:00
exportfs	race in exportfs_decode_fh()	2019-11-11 09:21:59 -05:00
ext2	ext2: fix empty body warnings when -Wextra is used	2020-03-23 13:01:37 +01:00
ext4	ext4: mballoc: introduce pcpu seqcnt for freeing PA to improve ENOSPC handling	2020-06-03 23:16:53 -04:00
f2fs	f2fs-for-5.7-rc1	2020-04-07 13:48:26 -07:00
fat	fat: fix uninit-memory access for partial initialized inode	2020-03-06 07:06:09 -06:00
freevxfs	fs: Fill in max and min timestamps in superblock	2019-08-30 07:27:17 -07:00
fscache	proc: convert everything to "struct proc_ops"	2020-02-04 03:05:26 +00:00
fuse	fuse: fix stack use after return	2020-02-13 09:16:07 +01:00
gfs2	We've got a lot of patches (39) for this merge window. Most of these patches	2020-03-31 14:16:03 -07:00
hfs	hfs/hfsplus: use 64-bit inode timestamps	2019-12-18 18:07:32 +01:00
hfsplus	hfsplus: fix crash and filesystem corruption when deleting files	2020-04-10 15:36:20 -07:00
hostfs	hostfs: Use kasprintf() instead of fixed buffer formatting	2020-03-29 23:23:00 +02:00
hpfs	fs: compat_ioctl: move FITRIM emulation into file systems	2019-10-23 17:23:46 +02:00
hugetlbfs	hugetlbfs: Use i_mmap_rwsem to address page fault/truncate race	2020-04-02 09:35:32 -07:00
iomap	fibmap: Warn and return an error in case of block > INT_MAX	2020-04-30 07:57:46 -07:00
isofs	y2038: add inode timestamp clamping	2019-09-19 09:42:37 -07:00
jbd2	jbd2: improve comments about freeing data buffers whose page mapping is NULL	2020-03-05 20:25:05 -05:00
jffs2	fs_parse: fold fs_parameter_desc/fs_parameter_spec	2020-02-07 14:48:37 -05:00
jfs	Trivial cleanup for jfs	2020-02-05 05:28:20 +00:00
kernfs	kernfs: Add option to enable user xattrs	2020-03-16 15:53:47 -04:00
lockd	proc: convert everything to "struct proc_ops"	2020-02-04 03:05:26 +00:00
minix	fs: Fill in max and min timestamps in superblock	2019-08-30 07:27:17 -07:00
nfs	NFS: Fix a race in __nfs_list_for_each_server()	2020-04-30 15:08:26 -04:00
nfs_common	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
nfsd	SUNRPC: Fix backchannel RPC soft lockups	2020-04-17 12:40:31 -04:00
nilfs2	fs: compat_ioctl: move FITRIM emulation into file systems	2019-10-23 17:23:46 +02:00
nls	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
notify	fanotify: Fix the checks in fanotify_fsid_equal	2020-03-30 12:40:53 +02:00
ntfs	fs/buffer: Make BH_Uptodate_Lock bit_spin_lock a regular spinlock_t	2020-03-28 13:21:08 +01:00
ocfs2	dlmfs_file_write(): fix the bogosity in handling non-zero *ppos	2020-04-23 13:45:27 -04:00
omfs	fs: omfs: Initialize filesystem timestamp ranges	2019-08-30 08:11:25 -07:00
openpromfs	Merge branch 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2019-07-19 10:42:02 -07:00
orangefs	orangefs: don't mess with I_DIRTY_TIMES in orangefs_flush	2020-04-08 09:39:11 -04:00
overlayfs	ovl: enable xino automatically in more cases	2020-03-27 16:51:02 +01:00
proc	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace	2020-04-25 12:25:32 -07:00
pstore	pstore/ram: Replace zero-length array with flexible-array member	2020-03-09 14:45:40 -07:00
qnx4	fs: Fill in max and min timestamps in superblock	2019-08-30 07:27:17 -07:00
qnx6	fs: Fill in max and min timestamps in superblock	2019-08-30 07:27:17 -07:00
quota	\n	2020-01-30 15:37:41 -08:00
ramfs	fs_parse: fold fs_parameter_desc/fs_parameter_spec	2020-02-07 14:48:37 -05:00
reiserfs	reiserfs: clean up several indentation issues	2020-04-07 10:43:44 -07:00
romfs	Merge branch 'work.mount2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2019-09-19 10:06:57 -07:00
squashfs	Merge branch 'work.mount2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2019-09-19 10:06:57 -07:00
sysfs	sysfs: remove redundant __compat_only_sysfs_link_entry_to_kobj fn	2020-04-05 11:34:35 -07:00
sysv	fs: sysv: Initialize filesystem timestamp ranges	2019-08-30 07:27:18 -07:00
tracefs	simple_recursive_removal(): kernel-side rm -rf for ramfs-style filesystems	2019-12-10 22:29:58 -05:00
ubifs	This pull request contains fixes for UBI and UBIFS:	2020-04-07 12:40:56 -07:00
udf	change email address for Pali Rohár	2020-04-10 15:36:22 -07:00
ufs	y2038: add inode timestamp clamping	2019-09-19 09:42:37 -07:00
unicode	.gitignore: add SPDX License Identifier	2020-03-25 11:50:48 +01:00
vboxsf	fs: Add VirtualBox guest shared folder (vboxsf) support	2020-02-08 17:34:58 -05:00
verity	fs-verity: use u64_to_user_ptr()	2020-01-14 13:28:28 -08:00
xfs	xfs: move inode flush to the sync workqueue	2020-04-16 09:07:42 -07:00
zonefs	zonefs: Fix handling of read-only zones	2020-03-25 11:28:26 +09:00
aio.c	aio: prevent potential eventfd recursion on poll	2020-02-03 17:27:47 -07:00
anon_inodes.c	Merge branch 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2019-07-19 10:42:02 -07:00
attr.c	utimes: Clamp the timestamps in notify_change()	2019-12-08 19:10:50 -05:00
bad_inode.c
binfmt_aout.c	treewide: Add SPDX license identifier for more missed files	2019-05-21 10:50:45 +02:00
binfmt_elf_fdpic.c	y2038: elfcore: Use __kernel_old_timeval for process times	2019-11-15 14:38:29 +01:00
binfmt_elf.c	fs/binfmt_elf.c: don't free interpreter's ELF pheaders on common path	2020-04-07 10:43:44 -07:00
binfmt_em86.c	treewide: Add SPDX license identifier for more missed files	2019-05-21 10:50:45 +02:00
binfmt_flat.c	fs/binfmt_flat.c: remove set but not used variable 'inode'	2019-07-16 19:23:22 -07:00
binfmt_misc.c	Merge branch 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2019-07-19 10:42:02 -07:00
binfmt_script.c	treewide: Add SPDX license identifier for more missed files	2019-05-21 10:50:45 +02:00
block_dev.c	block: remove unused header	2020-04-21 09:51:10 -06:00
buffer.c	block-5.7-2020-04-24	2020-04-24 12:44:19 -07:00
char_dev.c	chardev: Avoid potential use-after-free in 'chrdev_open()'	2020-01-06 20:10:26 +01:00
compat_binfmt_elf.c	y2038: elfcore: Use __kernel_old_timeval for process times	2019-11-15 14:38:29 +01:00
compat.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500	2019-06-19 17:09:55 +02:00
coredump.c	coredump: fix null pointer dereference on coredump	2020-04-21 11:11:56 -07:00
d_path.c	[PATCH] fix d_absolute_path() interplay with fsmount()	2019-08-30 19:31:09 -04:00
dax.c	dax,iomap: Add helper dax_iomap_zero() to zero a range	2020-04-02 19:15:03 -07:00
dcache.c	Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2019-12-08 11:08:28 -08:00
dcookies.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
direct-io.c	fs/direct-io.c: include fs/internal.h for missing prototype	2020-01-04 13:55:09 -08:00
drop_caches.c	fs: avoid softlockups in s_inodes iterators	2019-12-18 00:03:01 -05:00
eventfd.c	eventfd: track eventfd_signal() recursion depth	2020-02-03 17:27:38 -07:00
eventpoll.c	fs/epoll: make nesting accounting safe for -rt kernel	2020-04-07 10:43:44 -07:00
exec.c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace	2020-04-02 11:22:17 -07:00
fcntl.c	fcntl: Distribute switch variables for initialization	2020-03-03 10:55:06 -05:00
fhandle.c	fs/handle.c - fix up kerneldoc	2019-08-07 21:51:47 -04:00
file_table.c	vfs: Export flush_delayed_fput for use by knfsd.	2019-08-19 11:00:39 -04:00
file.c	io_uring: make sure openat/openat2 honor rlimit nofile	2020-03-20 08:47:27 -06:00
filesystems.c	fs/filesystems.c: downgrade user-reachable WARN_ONCE() to pr_warn_once()	2020-04-10 15:36:22 -07:00
fs_context.c	add prefix to fs_context->log	2020-02-07 14:48:35 -05:00
fs_parser.c	fs_parse: remove pr_notice() about each validation	2020-04-02 09:35:26 -07:00
fs_pin.c	switch the remnants of releasing the mountpoint away from fs_pin	2019-07-16 22:52:37 -04:00
fs_struct.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
fs_types.c
fs-writeback.c	writeback: Export inode_io_list_del()	2020-06-03 23:16:49 -04:00
fsopen.c	add prefix to fs_context->log	2020-02-07 14:48:35 -05:00
inode.c	futex: Fix inode life-time issue	2020-03-06 11:06:15 +01:00
internal.h	writeback: Export inode_io_list_del()	2020-06-03 23:16:49 -04:00
io_uring.c	io_uring: punt splice async because of inode mutex	2020-05-01 08:50:57 -06:00
io-wq.c	io_uring: use io-wq manager as backup task if task is exiting	2020-04-03 11:35:57 -06:00
io-wq.h	io_uring: use io-wq manager as backup task if task is exiting	2020-04-03 11:35:57 -06:00
ioctl.c	fibmap: Warn and return an error in case of block > INT_MAX	2020-04-30 07:57:46 -07:00
Kconfig	exfat: add Kconfig and Makefile	2020-03-05 21:00:40 -05:00
Kconfig.binfmt	binfmt_flat: make support for old format binaries optional	2019-06-24 09:16:47 +10:00
libfs.c	libfs: fix infoleak in simple_attr_read()	2020-03-24 13:27:16 +01:00
locks.c	locks: reinstate locks_delete_block optimization	2020-03-18 13:03:38 -07:00
Makefile	exfat: add Kconfig and Makefile	2020-03-05 21:00:40 -05:00
mbcache.c	treewide: Add SPDX license identifier for more missed files	2019-05-21 10:50:45 +02:00
mount.h	switch the remnants of releasing the mountpoint away from fs_pin	2019-07-16 22:52:37 -04:00
mpage.c	fs: move guard_bio_eod() after bio_set_op_attrs	2020-01-09 08:16:12 -07:00
namei.c	fix a braino in legitimize_path()	2020-04-06 10:38:59 -04:00
namespace.c	LOOKUP_MOUNTPOINT: fold path_mountpointat() into path_lookupat()	2020-03-13 21:08:17 -04:00
no-block.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152	2019-05-30 11:26:32 -07:00
nsfs.c	fs/nsfs.c: Added ns_match	2020-03-12 17:33:11 -07:00
open.c	Merge branch 'work.dotdot1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2020-04-02 12:30:08 -07:00
pipe.c	mm: kmem: rename memcg_kmem_(un)charge() into memcg_kmem_(un)charge_page()	2020-04-02 09:35:28 -07:00
pnode.c	propagate_one(): mnt_set_mountpoint() needs mount_lock	2020-04-27 10:37:14 -04:00
pnode.h	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 209	2019-05-30 11:29:53 -07:00
posix_acl.c	fs/posix_acl.c: fix kernel-doc warnings	2020-01-04 13:55:09 -08:00
proc_namespace.c	vfs: subtype handling moved to fuse	2019-09-06 21:28:49 +02:00
read_write.c	powerpc: Add back __ARCH_WANT_SYS_LLSEEK macro	2020-04-03 00:09:59 +11:00
readdir.c	readdir: make user_access_begin() use the real access range	2020-01-23 10:15:28 -08:00
select.c	y2038: syscalls: change remaining timeval to __kernel_old_timeval	2019-11-15 14:38:29 +01:00
seq_file.c	fs/seq_file.c: seq_read(): add info message about buggy .next functions	2020-04-10 15:36:22 -07:00
signalfd.c
splice.c	splice: make do_splice public	2020-03-02 14:04:31 -07:00
stack.c	sched/rt, fs: Use CONFIG_PREEMPTION	2019-12-08 14:37:36 +01:00
stat.c	fs: make two stat prep helpers available	2020-01-20 17:03:54 -07:00
statfs.c	vfs: Fix EOVERFLOW testing in put_compat_statfs64	2019-10-03 14:21:35 -07:00
super.c	Fix use after free in get_tree_bdev()	2020-04-28 14:37:40 -07:00
sync.c	fs/sync.c: sync_file_range(2) may use WB_SYNC_ALL writeback	2019-05-14 09:47:50 -07:00
timerfd.c	timerfd: Make timerfd_settime() time namespace aware	2020-01-14 12:20:53 +01:00
userfaultfd.c	userfaultfd: wp: declare _UFFDIO_WRITEPROTECT conditionally	2020-04-07 10:43:40 -07:00
utimes.c	utimes: Clamp the timestamps in notify_change()	2019-12-08 19:10:50 -05:00
xattr.c	kernfs: Add removed_size out param for simple_xattr_set	2020-03-16 15:53:47 -04:00