linux/fs/btrfs
Filipe Manana 943553ef9b btrfs: fix processing of delayed tree block refs during backref walking
During backref walking, when processing a delayed reference with a type of
BTRFS_TREE_BLOCK_REF_KEY, we have two bugs there:

1) We are accessing the delayed references extent_op, and its key, without
   the protection of the delayed ref head's lock;

2) If there's no extent op for the delayed ref head, we end up with an
   uninitialized key in the stack, variable 'tmp_op_key', and then pass
   it to add_indirect_ref(), which adds the reference to the indirect
   refs rb tree.

   This is wrong, because indirect references should have a NULL key
   when we don't have access to the key, and in that case they should be
   added to the indirect_missing_keys rb tree and not to the indirect rb
   tree.

   This means that if have BTRFS_TREE_BLOCK_REF_KEY delayed ref resulting
   from freeing an extent buffer, therefore with a count of -1, it will
   not cancel out the corresponding reference we have in the extent tree
   (with a count of 1), since both references end up in different rb
   trees.

   When using fiemap, where we often need to check if extents are shared
   through shared subtrees resulting from snapshots, it means we can
   incorrectly report an extent as shared when it's no longer shared.
   However this is temporary because after the transaction is committed
   the extent is no longer reported as shared, as running the delayed
   reference results in deleting the tree block reference from the extent
   tree.

   Outside the fiemap context, the result is unpredictable, as the key was
   not initialized but it's used when navigating the rb trees to insert
   and search for references (prelim_ref_compare()), and we expect all
   references in the indirect rb tree to have valid keys.

The following reproducer triggers the second bug:

   $ cat test.sh
   #!/bin/bash

   DEV=/dev/sdj
   MNT=/mnt/sdj

   mkfs.btrfs -f $DEV
   mount -o compress $DEV $MNT

   # With a compressed 128M file we get a tree height of 2 (level 1 root).
   xfs_io -f -c "pwrite -b 1M 0 128M" $MNT/foo

   btrfs subvolume snapshot $MNT $MNT/snap

   # Fiemap should output 0x2008 in the flags column.
   # 0x2000 means shared extent
   # 0x8 means encoded extent (because it's compressed)
   echo
   echo "fiemap after snapshot, range [120M, 120M + 128K):"
   xfs_io -c "fiemap -v 120M 128K" $MNT/foo
   echo

   # Overwrite one extent and fsync to flush delalloc and COW a new path
   # in the snapshot's tree.
   #
   # After this we have a BTRFS_DROP_DELAYED_REF delayed ref of type
   # BTRFS_TREE_BLOCK_REF_KEY with a count of -1 for every COWed extent
   # buffer in the path.
   #
   # In the extent tree we have inline references of type
   # BTRFS_TREE_BLOCK_REF_KEY, with a count of 1, for the same extent
   # buffers, so they should cancel each other, and the extent buffers in
   # the fs tree should no longer be considered as shared.
   #
   echo "Overwriting file range [120M, 120M + 128K)..."
   xfs_io -c "pwrite -b 128K 120M 128K" $MNT/snap/foo
   xfs_io -c "fsync" $MNT/snap/foo

   # Fiemap should output 0x8 in the flags column. The extent in the range
   # [120M, 120M + 128K) is no longer shared, it's now exclusive to the fs
   # tree.
   echo
   echo "fiemap after overwrite range [120M, 120M + 128K):"
   xfs_io -c "fiemap -v 120M 128K" $MNT/foo
   echo

   umount $MNT

Running it before this patch:

   $ ./test.sh
   (...)
   wrote 134217728/134217728 bytes at offset 0
   128 MiB, 128 ops; 0.1152 sec (1.085 GiB/sec and 1110.5809 ops/sec)
   Create a snapshot of '/mnt/sdj' in '/mnt/sdj/snap'

   fiemap after snapshot, range [120M, 120M + 128K):
   /mnt/sdj/foo:
    EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
      0: [245760..246015]: 34304..34559       256 0x2008

   Overwriting file range [120M, 120M + 128K)...
   wrote 131072/131072 bytes at offset 125829120
   128 KiB, 1 ops; 0.0001 sec (683.060 MiB/sec and 5464.4809 ops/sec)

   fiemap after overwrite range [120M, 120M + 128K):
   /mnt/sdj/foo:
    EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
      0: [245760..246015]: 34304..34559       256 0x2008

The extent in the range [120M, 120M + 128K) is still reported as shared
(0x2000 bit set) after overwriting that range and flushing delalloc, which
is not correct - an entire path was COWed in the snapshot's tree and the
extent is now only referenced by the original fs tree.

Running it after this patch:

   $ ./test.sh
   (...)
   wrote 134217728/134217728 bytes at offset 0
   128 MiB, 128 ops; 0.1198 sec (1.043 GiB/sec and 1068.2067 ops/sec)
   Create a snapshot of '/mnt/sdj' in '/mnt/sdj/snap'

   fiemap after snapshot, range [120M, 120M + 128K):
   /mnt/sdj/foo:
    EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
      0: [245760..246015]: 34304..34559       256 0x2008

   Overwriting file range [120M, 120M + 128K)...
   wrote 131072/131072 bytes at offset 125829120
   128 KiB, 1 ops; 0.0001 sec (694.444 MiB/sec and 5555.5556 ops/sec)

   fiemap after overwrite range [120M, 120M + 128K):
   /mnt/sdj/foo:
    EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
      0: [245760..246015]: 34304..34559       256   0x8

Now the extent is not reported as shared anymore.

So fix this by passing a NULL key pointer to add_indirect_ref() when
processing a delayed reference for a tree block if there's no extent op
for our delayed ref head with a defined key. Also access the extent op
only after locking the delayed ref head's lock.

The reproducer will be converted later to a test case for fstests.

Fixes: 86d5f99442 ("btrfs: convert prelimary reference tracking to use rbtrees")
Fixes: a6dbceafb9 ("btrfs: Remove unused op_key var from add_delayed_refs")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-10-11 14:48:01 +02:00
..
tests btrfs: move btrfs_drop_extent_cache() to extent_map.c 2022-09-29 17:08:30 +02:00
acl.c btrfs: reserve correct number of items for inode creation 2022-05-16 17:03:08 +02:00
async-thread.c btrfs: simplify WQ_HIGHPRI handling in struct btrfs_workqueue 2022-05-16 17:03:15 +02:00
async-thread.h btrfs: remove unused typedefs get_extent_t and btrfs_work_func_t 2022-07-25 17:45:36 +02:00
backref.c btrfs: fix processing of delayed tree block refs during backref walking 2022-10-11 14:48:01 +02:00
backref.h btrfs: skip unnecessary extent buffer sharedness checks during fiemap 2022-09-26 12:28:01 +02:00
block-group.c btrfs: delete stale comments after merge conflict resolution 2022-10-11 14:47:54 +02:00
block-group.h btrfs: move btrfs_full_stripe_locks_tree into block-group.h 2022-09-26 12:28:06 +02:00
block-rsv.c btrfs: add KCSAN annotations for unlocked access to block_rsv->full 2022-09-26 12:28:02 +02:00
block-rsv.h btrfs: add KCSAN annotations for unlocked access to block_rsv->full 2022-09-26 12:28:02 +02:00
btrfs_inode.h btrfs: use a runtime flag to indicate an inode is a free space inode 2022-09-26 12:28:07 +02:00
check-integrity.c fs/btrfs: Use the enum req_op and blk_opf_t types 2022-07-14 12:14:32 -06:00
check-integrity.h btrfs: check-integrity: split submit_bio from btrfsic checking 2022-05-16 17:03:12 +02:00
compression.c btrfs: unify the lock/unlock extent variants 2022-09-26 12:28:05 +02:00
compression.h for-5.20-tag 2022-08-03 14:54:52 -07:00
ctree.c btrfs: assert nowait mode is not used for some btree search functions 2022-09-29 17:08:29 +02:00
ctree.h btrfs: move btrfs_drop_extent_cache() to extent_map.c 2022-09-29 17:08:30 +02:00
delalloc-space.c btrfs: add the ability to use NO_FLUSH for data reservations 2022-09-29 17:08:28 +02:00
delalloc-space.h btrfs: add the ability to use NO_FLUSH for data reservations 2022-09-29 17:08:28 +02:00
delayed-inode.c btrfs: use delayed items when logging a directory 2022-09-26 12:27:57 +02:00
delayed-inode.h btrfs: use delayed items when logging a directory 2022-09-26 12:27:57 +02:00
delayed-ref.c btrfs: switch btrfs_block_rsv::full to bool 2022-07-25 17:45:40 +02:00
delayed-ref.h btrfs: remove btrfs_delayed_extent_op::is_data 2022-05-16 17:17:31 +02:00
dev-replace.c btrfs: don't take a bio_counter reference for cloned bios 2022-09-26 12:27:58 +02:00
dev-replace.h btrfs: add struct declarations in dev-replace.h 2022-09-26 12:28:07 +02:00
dir-item.c btrfs: use btrfs_for_each_slot in btrfs_search_dir_index_item 2022-05-16 17:03:07 +02:00
discard.c
discard.h
disk-io.c btrfs: relax block-group-tree feature dependency checks 2022-09-26 12:28:07 +02:00
disk-io.h btrfs: relax block-group-tree feature dependency checks 2022-09-26 12:28:07 +02:00
export.c
export.h
extent_io.c btrfs: move end_io_func argument to btrfs_bio_ctrl structure 2022-09-26 12:28:07 +02:00
extent_io.h btrfs: move extent io tree unrelated prototypes to their appropriate header 2022-09-26 12:28:04 +02:00
extent_map.c btrfs: drop extent map range more efficiently 2022-09-29 17:08:31 +02:00
extent_map.h btrfs: add helper to replace extent map range with a new extent map 2022-09-29 17:08:30 +02:00
extent-io-tree.c btrfs: unlock locked extent area if we have contention 2022-10-11 14:47:06 +02:00
extent-io-tree.h btrfs: stop tracking failed reads in the I/O tree 2022-09-26 12:28:05 +02:00
extent-tree.c btrfs: set generation before calling btrfs_clean_tree_block in btrfs_init_new_buffer 2022-09-29 17:08:31 +02:00
file-item.c btrfs: make can_nocow_extent nowait compatible 2022-09-29 17:08:26 +02:00
file.c btrfs: add helper to replace extent map range with a new extent map 2022-09-29 17:08:30 +02:00
free-space-cache.c btrfs: move btrfs_drop_extent_cache() to extent_map.c 2022-09-29 17:08:30 +02:00
free-space-cache.h btrfs: remove use btrfs_remove_free_space_cache instead of variant 2022-09-26 12:27:58 +02:00
free-space-tree.c btrfs: get rid of block group caching progress logic 2022-09-26 12:27:58 +02:00
free-space-tree.h
inode-item.c btrfs: make should_throttle loop local in btrfs_truncate_inode_items 2022-01-07 14:18:25 +01:00
inode-item.h btrfs: add inode to truncate control 2022-01-07 14:18:24 +01:00
inode.c btrfs: avoid pointless extent map tree search when flushing delalloc 2022-09-29 17:08:31 +02:00
ioctl.c btrfs: replace delete argument with EXTENT_CLEAR_ALL_BITS 2022-09-26 12:28:05 +02:00
Kconfig btrfs: use generic Kconfig option for 256kB page size limit 2022-01-20 08:52:55 +02:00
locking.c btrfs: implement a nowait option for tree searches 2022-09-26 12:46:42 +02:00
locking.h btrfs: implement a nowait option for tree searches 2022-09-26 12:46:42 +02:00
lzo.c btrfs: replace kmap() with kmap_local_page() in lzo.c 2022-07-25 17:45:33 +02:00
Makefile btrfs: move extent state init and alloc functions to their own file 2022-09-26 12:28:03 +02:00
misc.h btrfs: convert the io_failure_tree to a plain rb_tree 2022-09-26 12:28:02 +02:00
ordered-data.c btrfs: add btrfs_try_lock_ordered_range 2022-09-29 17:08:28 +02:00
ordered-data.h btrfs: add btrfs_try_lock_ordered_range 2022-09-29 17:08:28 +02:00
orphan.c
print-tree.c btrfs: unify the error handling pattern for read_tree_block() 2022-03-14 13:13:53 +01:00
print-tree.h
props.c btrfs: remove the unnecessary result variables 2022-09-26 12:28:00 +02:00
props.h btrfs: move common inode creation code into btrfs_create_new_inode() 2022-05-16 17:03:08 +02:00
qgroup.c btrfs: qgroup: fix a typo in a comment 2022-09-26 12:28:02 +02:00
qgroup.h btrfs: introduce BTRFS_QGROUP_RUNTIME_FLAG_NO_ACCOUNTING to skip qgroup accounting 2022-09-26 12:28:01 +02:00
raid56.c btrfs: properly abstract the parity raid bio handling 2022-09-26 12:27:59 +02:00
raid56.h btrfs: properly abstract the parity raid bio handling 2022-09-26 12:27:59 +02:00
rcu-string.h
ref-verify.c btrfs: stop accessing ->extent_root directly 2022-01-03 15:09:49 +01:00
ref-verify.h
reflink.c btrfs: replace delete argument with EXTENT_CLEAR_ALL_BITS 2022-09-26 12:28:05 +02:00
reflink.h
relocation.c btrfs: add helper to replace extent map range with a new extent map 2022-09-29 17:08:30 +02:00
root-tree.c btrfs: simplify error handling at btrfs_del_root_ref() 2022-09-26 12:27:58 +02:00
scrub.c btrfs: make can_nocow_extent nowait compatible 2022-09-29 17:08:26 +02:00
send.c btrfs: send: update command for protocol version check 2022-10-11 14:47:06 +02:00
send.h btrfs: send: allow protocol version 3 with CONFIG_BTRFS_DEBUG 2022-10-11 14:46:55 +02:00
space-info.c btrfs: add the ability to use NO_FLUSH for data reservations 2022-09-29 17:08:28 +02:00
space-info.h btrfs: move btrfs_init_async_reclaim_work prototype to space-info.h 2022-09-26 12:28:06 +02:00
struct-funcs.c btrfs: remove redundant check in up check_setget_bounds 2022-07-25 17:45:33 +02:00
subpage.c btrfs: remove extent writepage address space operation 2022-07-25 17:45:37 +02:00
subpage.h btrfs: make nodesize >= PAGE_SIZE case to reuse the non-subpage routine 2022-05-16 17:03:11 +02:00
super.c btrfs: relax block-group-tree feature dependency checks 2022-09-26 12:28:07 +02:00
sysfs.c btrfs: skip subtree scan if it's too high to avoid low stall in btrfs_commit_transaction() 2022-09-26 12:28:01 +02:00
sysfs.h
transaction.c btrfs: don't init io tree with private data for non-inodes 2022-09-26 12:28:05 +02:00
transaction.h btrfs: pass btrfs_fs_info for deleting snapshots and cleaner 2022-03-14 13:13:52 +01:00
tree-checker.c btrfs: tree-checker: check for overlapping extent items 2022-08-17 16:20:25 +02:00
tree-checker.h btrfs: tree-checker: check extent buffer owner against owner rootid 2022-05-16 17:03:09 +02:00
tree-defrag.c btrfs: remove unnecessary extent root check in btrfs_defrag_leaves 2022-01-03 15:09:48 +01:00
tree-log.c btrfs: make can_nocow_extent nowait compatible 2022-09-29 17:08:26 +02:00
tree-log.h btrfs: use delayed items when logging a directory 2022-09-26 12:27:57 +02:00
tree-mod-log.c
tree-mod-log.h
ulist.c
ulist.h
uuid-tree.c btrfs: drop the _nr from the item helpers 2022-01-03 15:09:43 +01:00
verity.c btrfs: send: add support for fs-verity 2022-09-26 12:27:55 +02:00
volumes.c btrfs: check superblock to ensure the fs was not modified at thaw time 2022-09-26 12:27:59 +02:00
volumes.h btrfs: move btrfs_swapfile_pin into volumes.h 2022-09-26 12:28:06 +02:00
xattr.c btrfs: check if root is readonly while setting security xattr 2022-08-22 18:06:30 +02:00
xattr.h
zlib.c btrfs: zlib: replace kmap() with kmap_local_page() in zlib_decompress_bio() 2022-07-25 17:45:41 +02:00
zoned.c btrfs: zoned: refactor device checks in btrfs_check_zoned_mode 2022-09-26 12:28:02 +02:00
zoned.h btrfs: zoned: activate metadata block group on flush_space 2022-07-25 17:45:42 +02:00
zstd.c btrfs: zstd: replace kmap() with kmap_local_page() 2022-07-25 17:45:40 +02:00