linux/fs/btrfs
Filipe Manana db21370bff btrfs: drop extent map range more efficiently
Currently when dropping extent maps for a file range, through
btrfs_drop_extent_map_range(), we do the following non-optimal things:

1) We lookup for extent maps one by one, always starting the search from
   the root of the extent map tree. This is not efficient if we have
   multiple extent maps in the range;

2) We check on every iteration if we have the 'split' and 'split2' spare
   extent maps in case we need to split an extent map that intersects our
   range but also crosses its boundaries (to the left, to the right or
   both cases). If our target range is for example:

       [2M, 8M)

   And we have 3 extents maps in the range:

       [1M, 3M) [3M, 6M) [6M, 10M[

   The on the first iteration we allocate two extent maps for 'split' and
   'split2', and use the 'split' to split the first extent map, so after
   the split we set 'split' to 'split2' and then set 'split2' to NULL.

   On the second iteration, we don't need to split the second extent map,
   but because 'split2' is now NULL, we allocate a new extent map for
   'split2'.

   On the third iteration we need to split the third extent map, so we
   use the extent map pointed by 'split'.

   So we ended up allocating 3 extent maps for splitting, but all we
   needed was 2 extent maps. We never need to allocate more than 2,
   because extent maps that need to be split are always the first one
   and the last one in the target range.

Improve on this by:

1) Using rb_next() to move on to the next extent map. This results in
   iterating over less nodes of the tree and it does not require comparing
   the ranges of nodes to our start/end offset;

2) Allocate the 2 extent maps for splitting before entering the loop and
   never allocate more than 2. In practice it's very rare to have the
   combination of both extent map allocations fail, since we have a
   dedicated slab for extent maps, and also have the need to split two
   extent maps.

This patch is part of a patchset comprised of the following patches:

   btrfs: fix missed extent on fsync after dropping extent maps
   btrfs: move btrfs_drop_extent_cache() to extent_map.c
   btrfs: use extent_map_end() at btrfs_drop_extent_map_range()
   btrfs: use cond_resched_rwlock_write() during inode eviction
   btrfs: move open coded extent map tree deletion out of inode eviction
   btrfs: add helper to replace extent map range with a new extent map
   btrfs: remove the refcount warning/check at free_extent_map()
   btrfs: remove unnecessary extent map initializations
   btrfs: assert tree is locked when clearing extent map from logging
   btrfs: remove unnecessary NULL pointer checks when searching extent maps
   btrfs: remove unnecessary next extent map search
   btrfs: avoid pointless extent map tree search when flushing delalloc
   btrfs: drop extent map range more efficiently

And the following fio test was done before and after applying the whole
patchset, on a non-debug kernel (Debian's default kernel config) on a 12
cores Intel box with 64G of ram:

   $ cat test.sh
   #!/bin/bash

   DEV=/dev/nvme0n1
   MNT=/mnt/nvme0n1
   MOUNT_OPTIONS="-o ssd"
   MKFS_OPTIONS="-R free-space-tree -O no-holes"

   cat <<EOF > /tmp/fio-job.ini
   [writers]
   rw=randwrite
   fsync=8
   fallocate=none
   group_reporting=1
   direct=0
   bssplit=4k/20:8k/20:16k/20:32k/10:64k/10:128k/5:256k/5:512k/5:1m/5
   ioengine=psync
   filesize=2G
   runtime=300
   time_based
   directory=$MNT
   numjobs=8
   thread
   EOF

   echo performance | \
       tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

   echo
   echo "Using config:"
   echo
   cat /tmp/fio-job.ini
   echo

   umount $MNT &> /dev/null
   mkfs.btrfs -f $MKFS_OPTIONS $DEV
   mount $MOUNT_OPTIONS $DEV $MNT

   fio /tmp/fio-job.ini

   umount $MNT

Result before applying the patchset:

   WRITE: bw=197MiB/s (206MB/s), 197MiB/s-197MiB/s (206MB/s-206MB/s), io=57.7GiB (61.9GB), run=300188-300188msec

Result after applying the patchset:

   WRITE: bw=203MiB/s (213MB/s), 203MiB/s-203MiB/s (213MB/s-213MB/s), io=59.5GiB (63.9GB), run=300019-300019msec

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-09-29 17:08:31 +02:00
..
tests btrfs: move btrfs_drop_extent_cache() to extent_map.c 2022-09-29 17:08:30 +02:00
acl.c btrfs: reserve correct number of items for inode creation 2022-05-16 17:03:08 +02:00
async-thread.c btrfs: simplify WQ_HIGHPRI handling in struct btrfs_workqueue 2022-05-16 17:03:15 +02:00
async-thread.h btrfs: remove unused typedefs get_extent_t and btrfs_work_func_t 2022-07-25 17:45:36 +02:00
backref.c btrfs: skip unnecessary extent buffer sharedness checks during fiemap 2022-09-26 12:28:01 +02:00
backref.h btrfs: skip unnecessary extent buffer sharedness checks during fiemap 2022-09-26 12:28:01 +02:00
block-group.c btrfs: add the ability to use NO_FLUSH for data reservations 2022-09-29 17:08:28 +02:00
block-group.h btrfs: move btrfs_full_stripe_locks_tree into block-group.h 2022-09-26 12:28:06 +02:00
block-rsv.c btrfs: add KCSAN annotations for unlocked access to block_rsv->full 2022-09-26 12:28:02 +02:00
block-rsv.h btrfs: add KCSAN annotations for unlocked access to block_rsv->full 2022-09-26 12:28:02 +02:00
btrfs_inode.h btrfs: use a runtime flag to indicate an inode is a free space inode 2022-09-26 12:28:07 +02:00
check-integrity.c fs/btrfs: Use the enum req_op and blk_opf_t types 2022-07-14 12:14:32 -06:00
check-integrity.h btrfs: check-integrity: split submit_bio from btrfsic checking 2022-05-16 17:03:12 +02:00
compression.c btrfs: unify the lock/unlock extent variants 2022-09-26 12:28:05 +02:00
compression.h for-5.20-tag 2022-08-03 14:54:52 -07:00
ctree.c btrfs: assert nowait mode is not used for some btree search functions 2022-09-29 17:08:29 +02:00
ctree.h btrfs: move btrfs_drop_extent_cache() to extent_map.c 2022-09-29 17:08:30 +02:00
delalloc-space.c btrfs: add the ability to use NO_FLUSH for data reservations 2022-09-29 17:08:28 +02:00
delalloc-space.h btrfs: add the ability to use NO_FLUSH for data reservations 2022-09-29 17:08:28 +02:00
delayed-inode.c btrfs: use delayed items when logging a directory 2022-09-26 12:27:57 +02:00
delayed-inode.h btrfs: use delayed items when logging a directory 2022-09-26 12:27:57 +02:00
delayed-ref.c btrfs: switch btrfs_block_rsv::full to bool 2022-07-25 17:45:40 +02:00
delayed-ref.h btrfs: remove btrfs_delayed_extent_op::is_data 2022-05-16 17:17:31 +02:00
dev-replace.c btrfs: don't take a bio_counter reference for cloned bios 2022-09-26 12:27:58 +02:00
dev-replace.h btrfs: add struct declarations in dev-replace.h 2022-09-26 12:28:07 +02:00
dir-item.c btrfs: use btrfs_for_each_slot in btrfs_search_dir_index_item 2022-05-16 17:03:07 +02:00
discard.c btrfs: fix typos in comments 2021-06-22 14:11:57 +02:00
discard.h
disk-io.c btrfs: relax block-group-tree feature dependency checks 2022-09-26 12:28:07 +02:00
disk-io.h btrfs: relax block-group-tree feature dependency checks 2022-09-26 12:28:07 +02:00
export.c
export.h
extent_io.c btrfs: move end_io_func argument to btrfs_bio_ctrl structure 2022-09-26 12:28:07 +02:00
extent_io.h btrfs: move extent io tree unrelated prototypes to their appropriate header 2022-09-26 12:28:04 +02:00
extent_map.c btrfs: drop extent map range more efficiently 2022-09-29 17:08:31 +02:00
extent_map.h btrfs: add helper to replace extent map range with a new extent map 2022-09-29 17:08:30 +02:00
extent-io-tree.c btrfs: remove is_data_inode() checks in extent-io-tree.c 2022-09-26 12:28:05 +02:00
extent-io-tree.h btrfs: stop tracking failed reads in the I/O tree 2022-09-26 12:28:05 +02:00
extent-tree.c btrfs: make can_nocow_extent nowait compatible 2022-09-29 17:08:26 +02:00
file-item.c btrfs: make can_nocow_extent nowait compatible 2022-09-29 17:08:26 +02:00
file.c btrfs: add helper to replace extent map range with a new extent map 2022-09-29 17:08:30 +02:00
free-space-cache.c btrfs: move btrfs_drop_extent_cache() to extent_map.c 2022-09-29 17:08:30 +02:00
free-space-cache.h btrfs: remove use btrfs_remove_free_space_cache instead of variant 2022-09-26 12:27:58 +02:00
free-space-tree.c btrfs: get rid of block group caching progress logic 2022-09-26 12:27:58 +02:00
free-space-tree.h
inode-item.c btrfs: make should_throttle loop local in btrfs_truncate_inode_items 2022-01-07 14:18:25 +01:00
inode-item.h btrfs: add inode to truncate control 2022-01-07 14:18:24 +01:00
inode.c btrfs: avoid pointless extent map tree search when flushing delalloc 2022-09-29 17:08:31 +02:00
ioctl.c btrfs: replace delete argument with EXTENT_CLEAR_ALL_BITS 2022-09-26 12:28:05 +02:00
Kconfig btrfs: use generic Kconfig option for 256kB page size limit 2022-01-20 08:52:55 +02:00
locking.c btrfs: implement a nowait option for tree searches 2022-09-26 12:46:42 +02:00
locking.h btrfs: implement a nowait option for tree searches 2022-09-26 12:46:42 +02:00
lzo.c btrfs: replace kmap() with kmap_local_page() in lzo.c 2022-07-25 17:45:33 +02:00
Makefile btrfs: move extent state init and alloc functions to their own file 2022-09-26 12:28:03 +02:00
misc.h btrfs: convert the io_failure_tree to a plain rb_tree 2022-09-26 12:28:02 +02:00
ordered-data.c btrfs: add btrfs_try_lock_ordered_range 2022-09-29 17:08:28 +02:00
ordered-data.h btrfs: add btrfs_try_lock_ordered_range 2022-09-29 17:08:28 +02:00
orphan.c
print-tree.c btrfs: unify the error handling pattern for read_tree_block() 2022-03-14 13:13:53 +01:00
print-tree.h
props.c btrfs: remove the unnecessary result variables 2022-09-26 12:28:00 +02:00
props.h btrfs: move common inode creation code into btrfs_create_new_inode() 2022-05-16 17:03:08 +02:00
qgroup.c btrfs: qgroup: fix a typo in a comment 2022-09-26 12:28:02 +02:00
qgroup.h btrfs: introduce BTRFS_QGROUP_RUNTIME_FLAG_NO_ACCOUNTING to skip qgroup accounting 2022-09-26 12:28:01 +02:00
raid56.c btrfs: properly abstract the parity raid bio handling 2022-09-26 12:27:59 +02:00
raid56.h btrfs: properly abstract the parity raid bio handling 2022-09-26 12:27:59 +02:00
rcu-string.h
ref-verify.c btrfs: stop accessing ->extent_root directly 2022-01-03 15:09:49 +01:00
ref-verify.h
reflink.c btrfs: replace delete argument with EXTENT_CLEAR_ALL_BITS 2022-09-26 12:28:05 +02:00
reflink.h
relocation.c btrfs: add helper to replace extent map range with a new extent map 2022-09-29 17:08:30 +02:00
root-tree.c btrfs: simplify error handling at btrfs_del_root_ref() 2022-09-26 12:27:58 +02:00
scrub.c btrfs: make can_nocow_extent nowait compatible 2022-09-29 17:08:26 +02:00
send.c btrfs: send: fix failures when processing inodes with no links 2022-09-26 12:27:57 +02:00
send.h btrfs: send: add support for fs-verity 2022-09-26 12:27:55 +02:00
space-info.c btrfs: add the ability to use NO_FLUSH for data reservations 2022-09-29 17:08:28 +02:00
space-info.h btrfs: move btrfs_init_async_reclaim_work prototype to space-info.h 2022-09-26 12:28:06 +02:00
struct-funcs.c btrfs: remove redundant check in up check_setget_bounds 2022-07-25 17:45:33 +02:00
subpage.c btrfs: remove extent writepage address space operation 2022-07-25 17:45:37 +02:00
subpage.h btrfs: make nodesize >= PAGE_SIZE case to reuse the non-subpage routine 2022-05-16 17:03:11 +02:00
super.c btrfs: relax block-group-tree feature dependency checks 2022-09-26 12:28:07 +02:00
sysfs.c btrfs: skip subtree scan if it's too high to avoid low stall in btrfs_commit_transaction() 2022-09-26 12:28:01 +02:00
sysfs.h
transaction.c btrfs: don't init io tree with private data for non-inodes 2022-09-26 12:28:05 +02:00
transaction.h btrfs: pass btrfs_fs_info for deleting snapshots and cleaner 2022-03-14 13:13:52 +01:00
tree-checker.c btrfs: tree-checker: check for overlapping extent items 2022-08-17 16:20:25 +02:00
tree-checker.h btrfs: tree-checker: check extent buffer owner against owner rootid 2022-05-16 17:03:09 +02:00
tree-defrag.c btrfs: remove unnecessary extent root check in btrfs_defrag_leaves 2022-01-03 15:09:48 +01:00
tree-log.c btrfs: make can_nocow_extent nowait compatible 2022-09-29 17:08:26 +02:00
tree-log.h btrfs: use delayed items when logging a directory 2022-09-26 12:27:57 +02:00
tree-mod-log.c btrfs: fix race when picking most recent mod log operation for an old root 2021-04-20 19:27:17 +02:00
tree-mod-log.h btrfs: add and use helper to get lowest sequence number for the tree mod log 2021-04-19 17:25:17 +02:00
ulist.c
ulist.h
uuid-tree.c btrfs: drop the _nr from the item helpers 2022-01-03 15:09:43 +01:00
verity.c btrfs: send: add support for fs-verity 2022-09-26 12:27:55 +02:00
volumes.c btrfs: check superblock to ensure the fs was not modified at thaw time 2022-09-26 12:27:59 +02:00
volumes.h btrfs: move btrfs_swapfile_pin into volumes.h 2022-09-26 12:28:06 +02:00
xattr.c btrfs: check if root is readonly while setting security xattr 2022-08-22 18:06:30 +02:00
xattr.h
zlib.c btrfs: zlib: replace kmap() with kmap_local_page() in zlib_decompress_bio() 2022-07-25 17:45:41 +02:00
zoned.c btrfs: zoned: refactor device checks in btrfs_check_zoned_mode 2022-09-26 12:28:02 +02:00
zoned.h btrfs: zoned: activate metadata block group on flush_space 2022-07-25 17:45:42 +02:00
zstd.c btrfs: zstd: replace kmap() with kmap_local_page() 2022-07-25 17:45:40 +02:00