linux/fs/btrfs
Josef Bacik e19eb11f4f btrfs: only let one thread pre-flush delayed refs in commit
I've been running a stress test that runs 20 workers in their own
subvolume, which are running an fsstress instance with 4 threads per
worker, which is 80 total fsstress threads.  In addition to this I'm
running balance in the background as well as creating and deleting
snapshots.  This test takes around 12 hours to run normally, going
slower and slower as the test goes on.

The reason for this is because fsstress is running fsync sometimes, and
because we're messing with block groups we often fall through to
btrfs_commit_transaction, so will often have 20-30 threads all calling
btrfs_commit_transaction at the same time.

These all get stuck contending on the extent tree while they try to run
delayed refs during the initial part of the commit.

This is suboptimal, really because the extent tree is a single point of
failure we only want one thread acting on that tree at once to reduce
lock contention.

Fix this by making the flushing mechanism a bit operation, to make it
easy to use test_and_set_bit() in order to make sure only one task does
this initial flush.

Once we're into the transaction commit we only have one thread doing
delayed ref running, it's just this initial pre-flush that is
problematic.  With this patch my stress test takes around 90 minutes to
run, instead of 12 hours.

The memory barrier is not necessary for the flushing bit as it's
ordered, unlike plain int. The transaction state accessed in
btrfs_should_end_transaction could be affected by that too as it's not
always used under transaction lock. Upon Nikolay's analysis in [1]
it's not necessary:

  In should_end_transaction it's read without holding any locks. (U)

  It's modified in btrfs_cleanup_transaction without holding the
  fs_info->trans_lock (U), but the STATE_ERROR flag is going to be set.

  set in cleanup_transaction under fs_info->trans_lock (L)
  set in btrfs_commit_trans to COMMIT_START under fs_info->trans_lock.(L)
  set in btrfs_commit_trans to COMMIT_DOING under fs_info->trans_lock.(L)
  set in btrfs_commit_trans to COMMIT_UNBLOCK under
  fs_info->trans_lock.(L)

  set in btrfs_commit_trans to COMMIT_COMPLETED without locks but at this
  point the transaction is finished and fs_info->running_trans is NULL (U
  but irrelevant).

  So by the looks of it we can have a concurrent READ race with a WRITE,
  due to reads not taking a lock. In this case what we want to ensure is
  we either see new or old state. I consulted with Will Deacon and he said
  that in such a case we'd want to annotate the accesses to ->state with
  (READ|WRITE)_ONCE so as to avoid a theoretical tear, in this case I
  don't think this could happen but I imagine at some point KCSAN would
  flag such an access as racy (which it is).

[1] https://lore.kernel.org/linux-btrfs/e1fd5cc1-0f28-f670-69f4-e9958b4964e6@suse.com

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
[ add comments regarding memory barrier ]
Signed-off-by: David Sterba <dsterba@suse.com>
2021-02-08 22:58:56 +01:00
..
tests btrfs: tests: initialize test inodes location 2020-12-18 14:59:49 +01:00
acl.c
async-thread.c Btrfs: fix crash during unmount due to race with delayed inode workers 2020-03-23 17:01:51 +01:00
async-thread.h Btrfs: fix crash during unmount due to race with delayed inode workers 2020-03-23 17:01:51 +01:00
backref.c btrfs: do not warn if we can't find the reloc root when looking up backref 2021-02-08 22:58:56 +01:00
backref.h btrfs: add asserts for deleting backref cache nodes 2021-02-08 22:58:56 +01:00
block-group.c btrfs: do not block on deleted bgs mutex in the cleaner 2021-02-08 22:58:56 +01:00
block-group.h btrfs: load free space cache asynchronously 2020-12-08 15:54:03 +01:00
block-rsv.c btrfs: introduce mount option rescue=ignorebadroots 2020-12-08 15:53:41 +01:00
block-rsv.h btrfs: Remove __ prefix from btrfs_block_rsv_release 2020-03-23 17:01:55 +01:00
btrfs_inode.h btrfs: make btrfs_dio_private::bytes u32 2021-02-08 22:58:51 +01:00
check-integrity.c btrfs: drop casts of bio bi_sector 2020-12-09 19:16:05 +01:00
check-integrity.h btrfs: remove btrfsic_submit_bh() 2020-03-23 17:01:39 +01:00
compression.c btrfs: refactor btrfs_lookup_bio_sums to handle out-of-order bvecs 2020-12-09 19:16:11 +01:00
compression.h btrfs: compression: move declarations to header 2020-10-07 12:06:55 +02:00
ctree.c btrfs: abort the transaction if we fail to inc ref in btrfs_copy_root 2021-02-08 22:58:56 +01:00
ctree.h btrfs: make btrfs_start_delalloc_root's nr argument a long 2021-02-08 22:58:51 +01:00
delalloc-space.c btrfs: fix parameter description of btrfs_inode_rsv_release/btrfs_delalloc_release_space 2021-02-08 22:58:54 +01:00
delalloc-space.h btrfs: make btrfs_delalloc_reserve_space take btrfs_inode 2020-07-27 12:55:36 +02:00
delayed-inode.c btrfs: make btrfs_delayed_update_inode take btrfs_inode 2020-12-08 15:54:10 +01:00
delayed-inode.h btrfs: make btrfs_delayed_update_inode take btrfs_inode 2020-12-08 15:54:10 +01:00
delayed-ref.c btrfs: account for new extents being deleted in total_bytes_pinned 2021-02-08 22:58:55 +01:00
delayed-ref.h btrfs: only let one thread pre-flush delayed refs in commit 2021-02-08 22:58:56 +01:00
dev-replace.c btrfs: make btrfs_start_delalloc_root's nr argument a long 2021-02-08 22:58:51 +01:00
dev-replace.h btrfs: add __pure attribute to functions 2019-11-18 12:46:52 +01:00
dir-item.c btrfs: locking: rip out path->leave_spinning 2020-12-08 15:54:02 +01:00
discard.c btrfs: document now parameter of peek_discard_list 2021-02-08 22:58:53 +01:00
discard.h btrfs: cleanup btrfs_discard_update_discardable usage 2020-12-08 15:54:02 +01:00
disk-io.c btrfs: make btrfs_root::free_objectid hold the next available objectid 2021-02-08 22:58:50 +01:00
disk-io.h btrfs: rename btrfs_find_free_objectid to btrfs_get_free_objectid 2021-02-08 22:58:49 +01:00
export.c btrfs: locking: rip out path->leave_spinning 2020-12-08 15:54:02 +01:00
export.h btrfs: export helpers for subvolume name/id resolution 2020-03-23 17:01:42 +01:00
extent_io.c btrfs: remove repeated word in struct member comment 2021-02-08 22:58:55 +01:00
extent_io.h btrfs: update num_extent_pages to support subpage sized extent buffer 2020-12-09 19:16:10 +01:00
extent_map.c btrfs: fix parameter description of btrfs_add_extent_mapping 2021-02-08 22:58:53 +01:00
extent_map.h btrfs: remove extent_map::bdev 2019-11-18 23:43:44 +01:00
extent-io-tree.h btrfs: use fixed width int type for extent_state::state 2020-12-08 15:54:13 +01:00
extent-tree.c btrfs: account for new extents being deleted in total_bytes_pinned 2021-02-08 22:58:55 +01:00
file-item.c btrfs: fix function description formats in file-item.c 2021-02-08 22:58:53 +01:00
file.c btrfs: update comment for btrfs_dirty_pages 2021-02-08 22:58:52 +01:00
free-space-cache.c btrfs: improve parameter description for __btrfs_write_out_cache 2021-02-08 22:58:53 +01:00
free-space-cache.h btrfs: remove free space items when disabling space cache v1 2020-12-09 19:16:09 +01:00
free-space-tree.c btrfs: fix possible free space tree corruption with online conversion 2021-01-25 18:44:37 +01:00
free-space-tree.h btrfs: rename btrfs_block_group_cache 2019-11-18 17:51:51 +01:00
inode-item.c btrfs: locking: rip out path->leave_spinning 2020-12-08 15:54:02 +01:00
inode.c btrfs: fix description format of fs_info of btrfs_wait_on_delayed_iputs 2021-02-08 22:58:54 +01:00
ioctl.c btrfs: make btrfs_start_delalloc_root's nr argument a long 2021-02-08 22:58:51 +01:00
Kconfig btrfs: switch to iomap for direct IO 2020-10-07 12:06:57 +02:00
locking.c btrfs: remove the recurse parameter from __btrfs_tree_read_lock 2020-12-08 15:54:09 +01:00
locking.h btrfs: remove the recurse parameter from __btrfs_tree_read_lock 2020-12-08 15:54:09 +01:00
lzo.c btrfs: compression: inline free_workspace 2019-11-18 12:46:59 +01:00
Makefile btrfs: enable W=1 checks for btrfs 2021-02-08 22:58:55 +01:00
misc.h btrfs: rename tree_entry to rb_simple_node and export it 2020-05-25 11:25:19 +02:00
ordered-data.c btrfs: rework the order of btrfs_ordered_extent::flags 2021-02-08 22:58:52 +01:00
ordered-data.h btrfs: rework the order of btrfs_ordered_extent::flags 2021-02-08 22:58:52 +01:00
orphan.c
print-tree.c btrfs: print the actual offset in btrfs_root_name 2021-01-07 17:25:05 +01:00
print-tree.h btrfs: print the actual offset in btrfs_root_name 2021-01-07 17:25:05 +01:00
props.c btrfs: simplify iget helpers 2020-05-25 11:25:37 +02:00
props.h
qgroup.c btrfs: fix transaction leak and crash after RO remount caused by qgroup rescan 2020-12-18 14:59:57 +01:00
qgroup.h btrfs: qgroup: export qgroups in sysfs 2020-07-27 12:55:37 +02:00
raid56.c btrfs: remove redundant NULL check before kvfree 2021-02-08 22:58:52 +01:00
raid56.h
rcu-string.h btrfs: rcu-string: Replace zero-length array with flexible-array member 2020-03-23 17:01:53 +01:00
reada.c btrfs: pass the owner_root and level to alloc_extent_buffer 2020-12-08 15:54:07 +01:00
ref-verify.c btrfs: ref-verify: make sure owner is set for all refs 2021-02-08 22:58:50 +01:00
ref-verify.h
reflink.c btrfs: fix deadlock when cloning inline extent and low on free metadata space 2020-12-18 14:49:50 +01:00
reflink.h Btrfs: move all reflink implementation code into its own file 2020-03-23 17:01:54 +01:00
relocation.c btrfs: fix reloc root leak with 0 ref reloc roots on recovery 2021-02-08 22:58:55 +01:00
root-tree.c btrfs: qgroup: fix qgroup meta rsv leak for subvolume operations 2020-10-07 12:12:13 +02:00
scrub.c btrfs: scrub: allow scrub to work with subpage sectorsize 2020-12-09 19:16:11 +01:00
send.c btrfs: send: remove stale code when checking for shared extents 2021-02-08 22:58:51 +01:00
send.h btrfs: send: avoid copying file data 2020-10-07 12:13:17 +02:00
space-info.c btrfs: fix parameter description in space-info.c 2021-02-08 22:58:54 +01:00
space-info.h btrfs: handle space_info::total_bytes_pinned inside the delayed ref itself 2021-02-08 22:58:55 +01:00
struct-funcs.c btrfs: handle sectorsize < PAGE_SIZE case for extent buffer accessors 2020-12-09 19:16:10 +01:00
super.c btrfs: run delayed iputs when remounting RO to avoid leaking them 2020-12-18 15:00:08 +01:00
sysfs.c for-5.11/block-2020-12-14 2020-12-16 12:57:51 -08:00
sysfs.h btrfs: split and refactor btrfs_sysfs_remove_devices_dir 2020-10-07 12:12:21 +02:00
transaction.c btrfs: only let one thread pre-flush delayed refs in commit 2021-02-08 22:58:56 +01:00
transaction.h btrfs: return bool from btrfs_should_end_transaction 2020-12-08 15:54:16 +01:00
tree-checker.c btrfs: tree-checker: check if chunk item end overflows 2021-01-07 17:25:05 +01:00
tree-checker.h
tree-defrag.c btrfs: locking: remove all the blocking helpers 2020-12-08 15:54:01 +01:00
tree-log.c btrfs: rename btrfs_find_highest_objectid to btrfs_init_root_free_objectid 2021-02-08 22:58:49 +01:00
tree-log.h btrfs: make fast fsyncs wait only for writeback 2020-10-07 12:06:56 +02:00
ulist.c
ulist.h
uuid-tree.c btrfs: remove unnecessary casts in printk 2020-12-08 15:53:52 +01:00
volumes.c btrfs: consolidate btrfs_previous_item ret val handling in btrfs_shrink_device 2021-02-08 22:58:51 +01:00
volumes.h btrfs: fix lockdep warning due to seqcount_mutex on 32bit arch 2021-01-25 18:44:50 +01:00
xattr.c btrfs: skip unnecessary searches for xattrs when logging an inode 2020-12-08 15:54:12 +01:00
xattr.h
zlib.c btrfs: use larger zlib buffer for s390 hardware compression 2020-01-31 10:30:40 -08:00
zoned.c btrfs: zoned: remove unused variable in btrfs_sb_log_location_bdev 2021-02-08 22:58:54 +01:00
zoned.h btrfs: implement log-structured superblock for ZONED mode 2020-12-09 19:16:04 +01:00
zstd.c btrfs: compression: inline free_workspace 2019-11-18 12:46:59 +01:00