linux

Qu Wenruo 7248e0cebb btrfs: skip update of block group item if used bytes are the same

[BACKGROUND]

When committing a transaction, we will update block group items for all dirty block groups.

But in fact, dirty block groups don't always need to update their block group items. It's pretty common to have a metadata block group which experienced several COW operations, but still has the same amount of used bytes.

In that case, we may unnecessarily COW a tree block while changing nothing.

[ENHANCEMENT]

This patch introduces the btrfs_block_group::commit_used member to remember the last used bytes, and uses that new member to skip unnecessary block group item updates.

This would be more common for large filesystems, where a metadata block group can be as large as 1GiB, containing at most 64K metadata items.

In that case, if COW added and then deleted one metadata item near the end of the block group, then it's completely possible we don't need to touch the block group item at all.

[BENCHMARK]

The change itself can have quite a high chance (20~80%) to skip block group item updates in a lot of workloads.

As a result, it would result in shorter time spent in btrfs_write_dirty_block_groups(), and overall reduce the execution time of the critical section of btrfs_commit_transaction().

Here comes a fio command, which does random writes in 4K block size, causing very heavy metadata updates.

  fio --filename=$mnt/file --size=512M --rw=randwrite --direct=1 --bs=4k \
      --ioengine=libaio --iodepth=64 --runtime=300 --numjobs=4 \
      --name=random_write --fallocate=none --time_based --fsync_on_close=1

The file size (512M) and number of threads (4) mean 2GiB file size in total, but during the full 300s run time, my dedicated SATA SSD is able to write around 20~25GiB, which is over 10 times the file size.

Thus after we fill the initial 2G, we should not cause many block group item updates.

Please note, the fio numbers by themselves don't change much, but if we look deeper, there is some reduced execution time, especially for the critical section of btrfs_commit_transaction().

I added extra trace_printk() to measure the following per-transaction execution time:

- Critical section of btrfs_commit_transaction()
  By re-using the existing update_commit_stats() function, which has already calculated the interval correctly.

- The while() loop for btrfs_write_dirty_block_groups()
  Although this includes the execution time of btrfs_run_delayed_refs(), it should still be representative overall.

Both results involve transid 7~30, the same number of transactions committed.

The result looks like this:

                      |      Before       |     After      |  Diff
----------------------+-------------------+----------------+--------
Transaction interval  |   229247198.5     |  215016933.6   |  -6.2%
Block group interval  |   23133.33333     |  18970.83333   | -18.0%

The change in block group item updates is more obvious, as skipped block group item updates also mean fewer delayed refs.

And the overall execution time for that block group update loop is pretty small, thus we can assume the extent tree is already mostly cached. If we can skip an uncached tree block, it would cause a more obvious change.

Unfortunately the overall reduction in the commit transaction critical section is much smaller, as the block group item update loop is not really the major part, at least not for the above fio script.

But still we have an observable reduction in the critical section.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

2022-12-05 18:00:40 +01:00
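
The idea in the [ENHANCEMENT] section of the commit message can be illustrated with a minimal user-space sketch. This is not the kernel code: struct bg_sketch, bg_sketch_update_item() and the item_writes counter are hypothetical names invented for illustration, and the real patch also has to deal with locking, error handling and the actual tree update. Only the commit_used name comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct btrfs_block_group. */
struct bg_sketch {
	uint64_t length;          /* size of the block group in bytes */
	uint64_t used;            /* bytes currently in use */
	uint64_t commit_used;     /* value of 'used' last written to the item */
	unsigned int item_writes; /* illustration only: counts simulated item updates */
};

/*
 * Called once per dirty block group at transaction commit.
 * Returns true if the block group item was (pretend-)written,
 * false if the update was skipped because 'used' has not changed
 * since the last commit.
 */
bool bg_sketch_update_item(struct bg_sketch *bg)
{
	assert(bg->used <= bg->length);

	/* Same used bytes as last commit: no need to COW a tree block. */
	if (bg->used == bg->commit_used)
		return false;

	/* Stand-in for the real block group item update in the tree. */
	bg->item_writes++;

	/* Remember what is now on disk so the next commit can compare. */
	bg->commit_used = bg->used;
	return true;
}
```

In this sketch, a block group that gains one 16KiB metadata item and loses another before the next commit ends up with used == commit_used, so the call returns false and no item write is counted, which is the case the commit message describes.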
9p	9p: Fix some kernel-doc comments	2022-07-02 18:52:21 +09:00
adfs	fs: Convert block_read_full_page() to block_read_full_folio()	2022-05-09 16:21:44 -04:00
affs	affs: move from strlcpy with unused retval to strscpy	2022-08-19 13:03:10 +02:00
afs	afs: Fix server->active leak in afs_put_server	2022-11-30 10:02:37 -08:00
autofs	autofs: remove unused ino field inode	2022-07-17 17:31:42 -07:00
befs	befs: Convert befs_symlink_read_folio() to use a folio	2022-08-02 12:34:03 -04:00
bfs	fs: Convert block_read_full_page() to block_read_full_folio()	2022-05-09 16:21:44 -04:00
btrfs	btrfs: skip update of block group item if used bytes are the same	2022-12-05 18:00:40 +01:00
cachefiles	cachefiles: use vfs_tmpfile_open() helper	2022-09-24 07:00:00 +02:00
ceph	ceph: fix NULL pointer dereference for req->r_session	2022-11-14 10:29:05 +01:00
cifs	cifs: fix missing unlock in cifs_file_copychunk_range()	2022-11-21 10:27:03 -06:00
coda	coda: Convert coda_symlink_filler() to use a folio	2022-08-02 12:34:03 -04:00
configfs
cramfs	cramfs: read_mapping_page() is synchronous	2022-08-02 12:34:02 -04:00
crypto	fscrypt: fix keyring memory leak on mount failure	2022-10-19 20:54:43 -07:00
debugfs	debugfs: use DEFINE_SHOW_ATTRIBUTE to define debugfs_regset32_fops	2022-09-24 15:00:48 +02:00
devpts
dlm	Networking changes for 6.1.	2022-10-04 13:38:03 -07:00
ecryptfs	whack-a-mole: constifying struct path *	2022-10-06 17:31:02 -07:00
efivarfs	efi: efivars: Fix variable writes without query_variable_store()	2022-10-21 11:09:40 +02:00
efs	efs: Convert efs symlinks to read_folio	2022-05-09 16:21:45 -04:00
erofs	Changes since last update:	2022-11-15 10:30:34 -08:00
exfat	treewide: use get_random_u32() when possible	2022-10-11 17:42:58 -06:00
exportfs	Change calling conventions for filldir_t	2022-08-17 17:25:04 -04:00
ext2	treewide: use prandom_u32_max() when possible, part 2	2022-10-11 17:42:58 -06:00
ext4	ext4: fix use-after-free in ext4_ext_shift_extents	2022-11-07 12:53:43 -05:00
f2fs	Random number generator fixes for Linux 6.1-rc1.	2022-10-16 15:27:07 -07:00
fat	treewide: use get_random_u32() when possible	2022-10-11 17:42:58 -06:00
freevxfs	freevxfs: Convert vxfs_immed_read_folio() to use a folio	2022-08-02 12:34:03 -04:00
fscache	fscache: fix OOB Read in __fscache_acquire_volume	2022-11-23 10:31:13 -08:00
fuse	fuse: lock inode unconditionally in fuse_fallocate()	2022-11-23 09:10:42 +01:00
gfs2	gfs2 debugfs improvements	2022-10-10 20:13:22 -07:00
hfs	hfs: replace kmap() with kmap_local_page() in btree.c	2022-09-11 21:55:09 -07:00
hfsplus	hfsplus: convert kmap() to kmap_local_page() in btree.c	2022-09-11 21:55:05 -07:00
hostfs	hostfs: move from strlcpy with unused retval to strscpy	2022-09-19 22:46:25 +02:00
hpfs	hpfs: Convert symlinks to read_folio	2022-05-09 16:21:45 -04:00
hugetlbfs	hugetlbfs: don't delete error page from pagecache	2022-11-08 15:57:22 -08:00
iomap	iomap: add a tracepoint for mappings returned by map_blocks	2022-10-02 11:42:19 -07:00
isofs	- hfs and hfsplus kmap API modernization from Fabio Francesco	2022-10-12 11:00:22 -07:00
jbd2	- Yu Zhao's Multi-Gen LRU patches are here. They've been under test in	2022-10-10 17:53:04 -07:00
jffs2	mtd: always initialize 'stats' in struct mtd_oob_ops	2022-09-21 10:38:07 +02:00
jfs	Folio changes for 6.0	2022-08-03 10:35:43 -07:00
kernfs	kernfs: Fix spurious lockdep warning in kernfs_find_and_get_node_by_id()	2022-11-10 19:03:42 +01:00
ksmbd	vfs: fix copy_file_range() averts filesystem freeze protection	2022-11-25 00:52:28 -05:00
lockd	SUNRPC: Parametrize how much of argsize should be zeroed	2022-09-26 14:02:42 -04:00
minix	vfs: open inside ->tmpfile()	2022-09-24 07:00:00 +02:00
netfs	netfs: Fix dodgy maths	2022-11-15 16:56:07 +00:00
nfs	nfs4: Fix kmemleak when allocate slot failed	2022-10-27 15:52:11 -04:00
nfs_common
nfsd	Amir's copy_file_range() fix	2022-11-27 12:40:06 -08:00
nilfs2	nilfs2: fix NULL pointer dereference in nilfs_palloc_commit_free_entry()	2022-11-30 14:49:40 -08:00
nls
notify	Merge tag 'fsnotify-for_v6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs	2022-10-07 08:28:50 -07:00
ntfs	- hfs and hfsplus kmap API modernization from Fabio Francesco	2022-10-12 11:00:22 -07:00
ntfs3	treewide: use get_random_u32() when possible	2022-10-11 17:42:58 -06:00
ocfs2	ocfs2: clear dinode links count in case of error	2022-10-20 21:27:22 -07:00
omfs	fs: Convert block_read_full_page() to block_read_full_folio()	2022-05-09 16:21:44 -04:00
openpromfs	fs: allocate inode by using alloc_inode_sb()	2022-03-22 15:57:03 -07:00
orangefs	Orangefs: change iterate to iterate_shared	2022-10-13 09:56:14 -07:00
overlayfs	tmpfile API change	2022-10-10 19:45:17 -07:00
proc	proc/meminfo: fix spacing in SecPageTables	2022-11-22 18:50:44 -08:00
pstore	Revert "pstore: migrate to crypto acomp interface"	2022-09-30 08:16:06 -07:00
qnx4	fs: Convert block_read_full_page() to block_read_full_folio()	2022-05-09 16:21:44 -04:00
qnx6	fs/qnx6: delete unnecessary checks before brelse()	2022-09-11 21:55:07 -07:00
quota	quota: Add more checking after reading from quota file	2022-09-29 15:37:30 +02:00
ramfs	tmpfile API change	2022-10-10 19:45:17 -07:00
reiserfs	- hfs and hfsplus kmap API modernization from Fabio Francesco	2022-10-12 11:00:22 -07:00
romfs	romfs: Convert romfs to read_folio	2022-05-09 16:21:46 -04:00
smbfs_common	smb3: define missing create contexts	2022-10-05 01:55:27 -05:00
squashfs	squashfs: fix buffer release race condition in readahead code	2022-10-28 13:37:21 -07:00
sysfs	kobject: kobj_type: remove default_attrs	2022-04-05 15:39:19 +02:00
sysv	Not a lot of material this cycle. Many singleton patches against various	2022-05-27 11:22:03 -07:00
tracefs	tracefs: Only clobber mode/uid/gid on remount if asked	2022-09-08 17:10:54 -04:00
ubifs	Random number generator fixes for Linux 6.1-rc1.	2022-10-16 15:27:07 -07:00
udf	udf: Fix a slab-out-of-bounds write bug in udf_find_entry()	2022-11-09 12:24:42 +01:00
ufs	ufs: replace ll_rw_block()	2022-09-11 20:26:07 -07:00
unicode
vboxsf	vboxsf: Convert vboxsf to read_folio	2022-05-09 16:21:46 -04:00
verity	for-6.1-tag	2022-10-06 17:36:48 -07:00
xfs	xfs: rename XFS_REFC_COW_START to _COWFLAG	2022-10-31 08:58:22 -07:00
zonefs	zonefs: Fix active zone accounting	2022-11-25 17:01:22 +09:00
aio.c	aio: use atomic_try_cmpxchg in __get_reqs_available	2022-09-11 21:55:08 -07:00
anon_inodes.c	dynamic_dname(): drop unused dentry argument	2022-08-20 11:34:04 -04:00
attr.c	vfs: Check the truncate maximum size in inode_newsize_ok()	2022-08-08 10:39:29 -07:00
bad_inode.c	vfs: open inside ->tmpfile()	2022-09-24 07:00:00 +02:00
binfmt_elf_fdpic.c
binfmt_elf_test.c
binfmt_elf.c	fs/binfmt_elf: Fix memory leak in load_elf_binary()	2022-10-25 15:11:21 -07:00
binfmt_flat.c	binfmt_flat: Remove shared library support	2022-04-22 10:57:18 -07:00
binfmt_misc.c
binfmt_script.c
buffer.c	- hfs and hfsplus kmap API modernization from Fabio Francesco	2022-10-12 11:00:22 -07:00
char_dev.c
compat_binfmt_elf.c
coredump.c	- Yu Zhao's Multi-Gen LRU patches are here. They've been under test in	2022-10-10 17:53:04 -07:00
d_path.c	d_path.c: typo fix...	2022-08-20 11:34:33 -04:00
dax.c	Merge branch 'for-6.0/dax' into libnvdimm-fixes	2022-09-24 18:14:12 -07:00
dcache.c	tmpfile API change	2022-10-10 19:45:17 -07:00
direct-io.c	block: remove PSI accounting from the bio layer	2022-09-20 08:24:38 -06:00
drop_caches.c
eventfd.c	eventfd: guard wake_up in eventfd fs calls as well	2022-09-21 10:30:42 -06:00
eventpoll.c	epoll: use try_cmpxchg in list_add_tail_lockless	2022-09-11 21:55:07 -07:00
exec.c	23 hotfixes.	2022-10-29 17:49:33 -07:00
fcntl.c	keep iocb_flags() result cached in struct file	2022-06-10 16:10:23 -04:00
fhandle.c	do_sys_name_to_handle(): constify path	2022-09-01 17:36:39 -04:00
file_table.c	locks: fix TOCTOU race when granting write lease	2022-08-16 10:59:54 -04:00
file.c	fs: use acquire ordering in __fget_light()	2022-10-31 15:30:11 -04:00
filesystems.c
fs_context.c
fs_parser.c
fs_pin.c
fs_struct.c
fs_types.c
fs-writeback.c	fs: do not update freeing inode i_io_list	2022-11-22 17:00:00 -05:00
fsopen.c	uninline may_mount() and don't opencode it in fspick(2)/fsopen(2)	2022-05-19 23:25:10 -04:00
init.c
inode.c	saner inode_init_always()	2022-10-06 16:49:00 -07:00
internal.h	whack-a-mole: constifying struct path *	2022-10-06 17:31:02 -07:00
ioctl.c	Fixes for 5.18-rc1:	2022-04-01 19:35:56 -07:00
Kconfig	hugetlb: make hugetlb depends on SYSFS or SYSCTL	2022-09-11 20:26:10 -07:00
Kconfig.binfmt	Xtensa updates for v6.1	2022-10-10 14:21:11 -07:00
kernel_read_file.c	fs/kernel_read_file: allow to read files up-to ssize_t	2022-06-16 19:58:21 -07:00
libfs.c	fs: uninline inode_maybe_inc_iversion()	2022-10-03 14:21:43 -07:00
locks.c	locks: Fix dropped call to ->fl_release_private()	2022-08-17 15:08:58 -04:00
Makefile	a.out: Remove the a.out implementation	2022-09-27 07:11:02 -07:00
mbcache.c	mbcache: Avoid nesting of cache->c_list_lock under bit locks	2022-09-30 23:46:52 -04:00
mount.h	switch try_to_unlazy_next() to __legitimize_mnt()	2022-07-05 16:18:21 -04:00
mpage.c	Folio changes for 6.0	2022-08-03 10:35:43 -07:00
namei.c	vfs: vfs_tmpfile: ensure O_EXCL flag is enforced	2022-11-19 02:22:11 -05:00
namespace.c	fs: require CAP_SYS_ADMIN in target namespace for idmapped mounts	2022-08-17 11:27:11 +02:00
no-block.c
nsfs.c	dynamic_dname(): drop unused dentry argument	2022-08-20 11:34:04 -04:00
open.c	struct file-related stuff	2022-10-06 17:13:18 -07:00
pipe.c	dynamic_dname(): drop unused dentry argument	2022-08-20 11:34:04 -04:00
pnode.c
pnode.h
posix_acl.c	- Yu Zhao's Multi-Gen LRU patches are here. They've been under test in	2022-10-10 17:53:04 -07:00
proc_namespace.c	vfs: escape hash as well	2022-06-28 13:58:05 -04:00
read_write.c	vfs: fix copy_file_range() averts filesystem freeze protection	2022-11-25 00:52:28 -05:00
readdir.c	Change calling conventions for filldir_t	2022-08-17 17:25:04 -04:00
remap_range.c	- The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe	2022-08-05 16:32:45 -07:00
select.c
seq_file.c	rxrpc: Fix locking issue	2022-05-22 21:03:01 +01:00
signalfd.c
splice.c	iter_to_pipe(): switch to advancing variant of iov_iter_get_pages()	2022-08-08 22:37:23 -04:00
stack.c
stat.c	vfs: support STATX_DIOALIGN on block devices	2022-09-11 19:47:12 -05:00
statfs.c
super.c	fscrypt: fix keyring memory leak on mount failure	2022-10-19 20:54:43 -07:00
sync.c	riscv: compat: syscall: Add compat_sys_call_table implementation	2022-04-26 13:36:25 -07:00
sysctls.c
timerfd.c
userfaultfd.c	fs/userfaultfd: Fix maple tree iterator in userfaultfd_unregister()	2022-11-07 12:58:26 -08:00
utimes.c
xattr.c	xattr: always us is_posix_acl_xattr() helper	2022-09-21 12:01:29 +02:00