linux/fs
Filipe Manana e7a79811d0 btrfs: check if a log root exists before locking the log_mutex on unlink
This brings back an optimization that commit e678934cbe ("btrfs:
Remove unnecessary check from join_running_log_trans") removed, but in
a different form. So it's almost equivalent to a revert.

That commit removed an optimization where we avoid locking a root's
log_mutex when there is no log tree created in the current transaction.
The affected code path is triggered through unlink operations.

That commit was based on the assumption that the optimization was not
necessary because we used to have the following checks when the patch
was authored:

  int btrfs_del_dir_entries_in_log(...)
  {
        (...)
        if (dir->logged_trans < trans->transid)
                return 0;

        ret = join_running_log_trans(root);
        (...)
  }

  int btrfs_del_inode_ref_in_log(...)
  {
        (...)
        if (inode->logged_trans < trans->transid)
                return 0;

        ret = join_running_log_trans(root);
        (...)
  }

However, before that patch was merged, another patch was merged first,
which replaced those checks because they were buggy.

That other patch corresponds to commit 803f0f64d1 ("Btrfs: fix fsync
not persisting dentry deletions due to inode evictions"). The assumption
that an inode whose logged_trans field is smaller than the current
transaction's generation (transid) was not logged in the current
transaction was only correct if the inode was not evicted and reloaded
in the current transaction. So that bug fix replaced those checks with
the following helper function:

  static bool inode_logged(struct btrfs_trans_handle *trans,
                           struct btrfs_inode *inode)
  {
        if (inode->logged_trans == trans->transid)
                return true;

        if (inode->last_trans == trans->transid &&
            test_bit(BTRFS_INODE_NEEDS_FULL_SYNC, &inode->runtime_flags) &&
            !test_bit(BTRFS_FS_LOG_RECOVERING, &trans->fs_info->flags))
                return true;

        return false;
  }
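
A small user-space model may help show why the old logged_trans < transid
test was unsafe. This is a sketch, not btrfs code: struct model_inode, its
fields and the model_* helpers are hypothetical, log recovery is ignored,
and it assumes that evicting and reloading an inode loses the in-memory
logged_trans value, while last_trans is restored from disk and a full sync
is requested if the inode was changed in the current transaction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the relevant btrfs_inode fields. */
struct model_inode {
	uint64_t logged_trans;  /* transaction in which the inode was last logged */
	uint64_t last_trans;    /* transaction in which the inode was last changed */
	bool needs_full_sync;   /* models BTRFS_INODE_NEEDS_FULL_SYNC */
};

/* Old check: "not logged" whenever logged_trans is below the current transid. */
static bool old_check_logged(const struct model_inode *inode, uint64_t transid)
{
	return !(inode->logged_trans < transid);
}

/* Same structure as inode_logged() above, minus the log recovery test. */
static bool model_inode_logged(const struct model_inode *inode, uint64_t transid)
{
	if (inode->logged_trans == transid)
		return true;
	if (inode->last_trans == transid && inode->needs_full_sync)
		return true;
	return false;
}

/*
 * Assumed eviction/reload behaviour: logged_trans is lost (reset to 0),
 * last_trans comes back from the on-disk inode item, and a full sync is
 * requested if the inode was changed in the current transaction.
 */
static void model_evict_and_reload(struct model_inode *inode,
				   uint64_t on_disk_trans, uint64_t transid)
{
	assert(on_disk_trans <= transid);
	inode->logged_trans = 0;
	inode->last_trans = on_disk_trans;
	inode->needs_full_sync = (on_disk_trans == transid);
}
```

For an inode logged in transaction 7 and then evicted and reloaded in that
same transaction, old_check_logged() reports "not logged", so a dentry
deletion would be skipped, while model_inode_logged() conservatively
reports "logged".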

So if we have a subvolume without a log tree in the current transaction
(because we had no fsyncs), every time we unlink an inode we can end up
trying to lock the log_mutex of the root through join_running_log_trans()
twice, once for the inode being unlinked (by btrfs_del_inode_ref_in_log())
and once for the parent directory (with btrfs_del_dir_entries_in_log()).
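
The double locking can be sketched with a user-space model that counts how
often the mutex is taken. Everything here is hypothetical (struct
model_root, the model_* functions, and a pthread mutex standing in for the
kernel mutex), and it assumes both the inode and its parent pass the
inode_logged() check, so both deletion helpers reach
join_running_log_trans().

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of a subvolume root; not the real struct btrfs_root. */
struct model_root {
	pthread_mutex_t log_mutex;
	bool has_log_root;         /* a log tree exists in this transaction */
	unsigned long lock_count;  /* times log_mutex was taken, protected by it */
};

/* Models join_running_log_trans() without any fast path: it always locks. */
static int model_join_running_log_trans(struct model_root *root)
{
	int ret = -ENOENT;  /* no log tree to join */

	pthread_mutex_lock(&root->log_mutex);
	root->lock_count++;
	if (root->has_log_root)
		ret = 0;
	pthread_mutex_unlock(&root->log_mutex);
	return ret;
}

/* One unlink: one join for the inode ref, one for the parent's dir entries. */
static void model_unlink(struct model_root *root)
{
	model_join_running_log_trans(root);  /* as in btrfs_del_inode_ref_in_log() */
	model_join_running_log_trans(root);  /* as in btrfs_del_dir_entries_in_log() */
}
```

In this model, ten unlinks in a subvolume with no log tree take the mutex
twenty times, even though every join returns -ENOENT.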

This means that if we have several unlink operations happening in parallel
for inodes in the same subvolume, and those inodes and/or their parent
inode were changed in the current transaction, we end up having a lot of
contention on the log_mutex.

The test robots from Intel reported a 30.7% performance regression for
a REAIM test after commit e678934cbe ("btrfs: Remove unnecessary check
from join_running_log_trans").

So just bring back the optimization to join_running_log_trans() where we
check first if a log root exists before trying to lock the log_mutex. This
is done by checking for a bit that is set on the root when a log tree is
created and removed when a log tree is freed (at transaction commit time).
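
The restored fast path can be sketched in the same style. Again this is a
hypothetical user-space model rather than the kernel patch: an atomic flag
stands in for the root state bit described above (its real name and the
memory-ordering details in btrfs may differ), and the mutex-protected
log_root is still re-checked once the flag is seen set.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a subvolume root; not the real struct btrfs_root. */
struct model_root {
	pthread_mutex_t log_mutex;
	atomic_bool has_log_tree;  /* models the state bit: set on create, cleared on free */
	bool log_root;             /* protected by log_mutex */
	unsigned long lock_count;  /* protected by log_mutex */
};

static void model_create_log_tree(struct model_root *root)
{
	pthread_mutex_lock(&root->log_mutex);
	root->lock_count++;
	root->log_root = true;
	atomic_store(&root->has_log_tree, true);
	pthread_mutex_unlock(&root->log_mutex);
}

/* Called at transaction commit time, when the log tree is freed. */
static void model_free_log_tree(struct model_root *root)
{
	pthread_mutex_lock(&root->log_mutex);
	root->lock_count++;
	root->log_root = false;
	atomic_store(&root->has_log_tree, false);
	pthread_mutex_unlock(&root->log_mutex);
}

/* join_running_log_trans() with the check-before-lock fast path. */
static int model_join_running_log_trans(struct model_root *root)
{
	int ret = -ENOENT;

	/* Cheap check first: no log tree means nothing to join, skip the mutex. */
	if (!atomic_load(&root->has_log_tree))
		return ret;

	/* Slow path: re-check under the mutex, the tree may have just been freed. */
	pthread_mutex_lock(&root->log_mutex);
	root->lock_count++;
	if (root->log_root)
		ret = 0;
	pthread_mutex_unlock(&root->log_mutex);
	return ret;
}
```

With no fsync in the current transaction the flag stays clear, so in this
model unlinks return -ENOENT without touching log_mutex at all; only roots
that actually have a log tree pay for the lock.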

Commit e678934cbe ("btrfs: Remove unnecessary check from
join_running_log_trans") was merged in the 5.4 merge window while commit
803f0f64d1 ("Btrfs: fix fsync not persisting dentry deletions due to
inode evictions") was merged in the 5.3 merge window. But the first
commit was actually authored before the second commit (May 23 2019 vs
June 19 2019).

Reported-by: kernel test robot <rong.a.chen@intel.com>
Link: https://lore.kernel.org/lkml/20200611090233.GL12456@shao2-debian/
Fixes: e678934cbe ("btrfs: Remove unnecessary check from join_running_log_trans")
CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-06-16 19:22:23 +02:00
9p 9p: read only once on O_NONBLOCK 2020-03-27 09:29:56 +00:00
adfs fs/adfs: bigdir: Fix an error code in adfs_fplus_read() 2020-01-25 11:31:59 -05:00
affs affs: fix a memory leak in affs_remount 2019-11-18 14:26:43 +01:00
afs Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-05-23 17:16:18 -07:00
autofs LOOKUP_MOUNTPOINT: fold path_mountpointat() into path_lookupat() 2020-03-13 21:08:17 -04:00
befs fs: Fill in max and min timestamps in superblock 2019-08-30 07:27:17 -07:00
bfs fs: Fill in max and min timestamps in superblock 2019-08-30 07:27:17 -07:00
btrfs btrfs: check if a log root exists before locking the log_mutex on unlink 2020-06-16 19:22:23 +02:00
cachefiles cachefiles: Fix race between read_waiter and read_copier involving op->to_do 2020-05-08 23:01:10 +01:00
ceph block-5.7-2020-05-09 2020-05-10 11:16:07 -07:00
cifs cifs: fix leaked reference on requeued write 2020-05-14 17:47:01 -05:00
coda y2038: add inode timestamp clamping 2019-09-19 09:42:37 -07:00
configfs configfs: fix config_item refcnt leak in configfs_rmdir() 2020-04-27 08:17:10 +02:00
cramfs cramfs: switch to use of errofc() et.al. 2020-02-07 14:48:41 -05:00
crypto fscrypt updates for 5.7 2020-03-31 12:58:36 -07:00
debugfs debugfs: remove return value of debugfs_create_u32() 2020-04-17 17:08:50 +02:00
devpts devpts_pty_kill(): don't bother with d_delete() 2019-09-03 09:30:56 -04:00
dlm dlm: use SO_SNDTIMEO_NEW instead of SO_SNDTIMEO_OLD 2019-12-18 18:07:31 +01:00
ecryptfs eCryptfs fixes for 5.6-rc3 2020-02-17 21:08:37 -08:00
efivarfs efi: Use more granular check for availability for variable services 2020-02-23 21:59:42 +01:00
efs fs: Fill in max and min timestamps in superblock 2019-08-30 07:27:17 -07:00
erofs erofs: handle corrupted images whose decompressed size less than it'd be 2020-03-03 23:40:52 +08:00
exfat exfat: add the dummy mount options to be backward compatible with staging/exfat 2020-05-21 16:40:11 -07:00
exportfs race in exportfs_decode_fh() 2019-11-11 09:21:59 -05:00
ext2 ext2: fix empty body warnings when -Wextra is used 2020-03-23 13:01:37 +01:00
ext4 ext4: fix fiemap size checks for bitmap files 2020-05-19 15:03:37 -04:00
f2fs f2fs-for-5.7-rc1 2020-04-07 13:48:26 -07:00
fat fat: fix uninit-memory access for partial initialized inode 2020-03-06 07:06:09 -06:00
freevxfs fs: Fill in max and min timestamps in superblock 2019-08-30 07:27:17 -07:00
fscache proc: convert everything to "struct proc_ops" 2020-02-04 03:05:26 +00:00
fuse fuse: fix stack use after return 2020-02-13 09:16:07 +01:00
gfs2 Revert "gfs2: Don't demote a glock until its revokes are written" 2020-05-08 15:01:25 -05:00
hfs hfs/hfsplus: use 64-bit inode timestamps 2019-12-18 18:07:32 +01:00
hfsplus hfsplus: fix crash and filesystem corruption when deleting files 2020-04-10 15:36:20 -07:00
hostfs hostfs: Use kasprintf() instead of fixed buffer formatting 2020-03-29 23:23:00 +02:00
hpfs fs: compat_ioctl: move FITRIM emulation into file systems 2019-10-23 17:23:46 +02:00
hugetlbfs hugetlbfs: Use i_mmap_rwsem to address page fault/truncate race 2020-04-02 09:35:32 -07:00
iomap iomap: remove lockdep_assert_held() 2020-05-25 13:12:53 +02:00
isofs y2038: add inode timestamp clamping 2019-09-19 09:42:37 -07:00
jbd2 jbd2: improve comments about freeing data buffers whose page mapping is NULL 2020-03-05 20:25:05 -05:00
jffs2 fs_parse: fold fs_parameter_desc/fs_parameter_spec 2020-02-07 14:48:37 -05:00
jfs Trivial cleanup for jfs 2020-02-05 05:28:20 +00:00
kernfs kernfs: Add option to enable user xattrs 2020-03-16 15:53:47 -04:00
lockd proc: convert everything to "struct proc_ops" 2020-02-04 03:05:26 +00:00
minix fs: Fill in max and min timestamps in superblock 2019-08-30 07:27:17 -07:00
nfs NFSv3: fix rpc receive buffer size for MOUNT call 2020-05-14 18:42:44 -04:00
nfs_common treewide: Add SPDX license identifier - Makefile/Kconfig 2019-05-21 10:50:46 +02:00
nfsd SUNRPC: Fix backchannel RPC soft lockups 2020-04-17 12:40:31 -04:00
nilfs2 fs: compat_ioctl: move FITRIM emulation into file systems 2019-10-23 17:23:46 +02:00
nls treewide: Add SPDX license identifier - Makefile/Kconfig 2019-05-21 10:50:46 +02:00
notify fanotify: Fix the checks in fanotify_fsid_equal 2020-03-30 12:40:53 +02:00
ntfs fs/buffer: Make BH_Uptodate_Lock bit_spin_lock a regular spinlock_t 2020-03-28 13:21:08 +01:00
ocfs2 dlmfs_file_write(): fix the bogosity in handling non-zero *ppos 2020-04-23 13:45:27 -04:00
omfs fs: omfs: Initialize filesystem timestamp ranges 2019-08-30 08:11:25 -07:00
openpromfs Merge branch 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-07-19 10:42:02 -07:00
orangefs orangefs: don't mess with I_DIRTY_TIMES in orangefs_flush 2020-04-08 09:39:11 -04:00
overlayfs ovl: potential crash in ovl_fid_to_fh() 2020-05-13 11:10:57 +02:00
proc Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2020-04-25 12:25:32 -07:00
pstore pstore/ram: Replace zero-length array with flexible-array member 2020-03-09 14:45:40 -07:00
qnx4 fs: Fill in max and min timestamps in superblock 2019-08-30 07:27:17 -07:00
qnx6 fs: Fill in max and min timestamps in superblock 2019-08-30 07:27:17 -07:00
quota 2020-01-30 15:37:41 -08:00
ramfs fs_parse: fold fs_parameter_desc/fs_parameter_spec 2020-02-07 14:48:37 -05:00
reiserfs reiserfs: clean up several indentation issues 2020-04-07 10:43:44 -07:00
romfs Merge branch 'work.mount2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-09-19 10:06:57 -07:00
squashfs Merge branch 'work.mount2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-09-19 10:06:57 -07:00
sysfs sysfs: remove redundant __compat_only_sysfs_link_entry_to_kobj fn 2020-04-05 11:34:35 -07:00
sysv fs: sysv: Initialize filesystem timestamp ranges 2019-08-30 07:27:18 -07:00
tracefs simple_recursive_removal(): kernel-side rm -rf for ramfs-style filesystems 2019-12-10 22:29:58 -05:00
ubifs ubifs: fix wrong use of crypto_shash_descsize() 2020-05-17 23:38:21 +02:00
udf change email address for Pali Rohár 2020-04-10 15:36:22 -07:00
ufs y2038: add inode timestamp clamping 2019-09-19 09:42:37 -07:00
unicode .gitignore: add SPDX License Identifier 2020-03-25 11:50:48 +01:00
vboxsf vboxsf: don't use the source name in the bdi name 2020-05-07 08:45:47 -06:00
verity fs-verity: use u64_to_user_ptr() 2020-01-14 13:28:28 -08:00
xfs xfs: move inode flush to the sync workqueue 2020-04-16 09:07:42 -07:00
zonefs zonfs: Fix handling of read-only zones 2020-03-25 11:28:26 +09:00
aio.c aio: prevent potential eventfd recursion on poll 2020-02-03 17:27:47 -07:00
anon_inodes.c Merge branch 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-07-19 10:42:02 -07:00
attr.c utimes: Clamp the timestamps in notify_change() 2019-12-08 19:10:50 -05:00
bad_inode.c
binfmt_aout.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
binfmt_elf_fdpic.c y2038: elfcore: Use __kernel_old_timeval for process times 2019-11-15 14:38:29 +01:00
binfmt_elf.c fs/binfmt_elf.c: don't free interpreter's ELF pheaders on common path 2020-04-07 10:43:44 -07:00
binfmt_em86.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
binfmt_flat.c fs/binfmt_flat.c: remove set but not used variable 'inode' 2019-07-16 19:23:22 -07:00
binfmt_misc.c Merge branch 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-07-19 10:42:02 -07:00
binfmt_script.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
block_dev.c block: remove unused header 2020-04-21 09:51:10 -06:00
buffer.c block-5.7-2020-04-24 2020-04-24 12:44:19 -07:00
char_dev.c chardev: Avoid potential use-after-free in 'chrdev_open()' 2020-01-06 20:10:26 +01:00
compat_binfmt_elf.c y2038: elfcore: Use __kernel_old_timeval for process times 2019-11-15 14:38:29 +01:00
compat.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
coredump.c coredump: fix crash when umh is disabled 2020-04-28 17:54:13 +02:00
d_path.c [PATCH] fix d_absolute_path() interplay with fsmount() 2019-08-30 19:31:09 -04:00
dax.c dax,iomap: Add helper dax_iomap_zero() to zero a range 2020-04-02 19:15:03 -07:00
dcache.c Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-12-08 11:08:28 -08:00
dcookies.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
direct-io.c Revert "fs: remove dio_end_io()" 2020-06-09 19:23:18 +02:00
drop_caches.c fs: avoid softlockups in s_inodes iterators 2019-12-18 00:03:01 -05:00
eventfd.c eventfd: track eventfd_signal() recursion depth 2020-02-03 17:27:38 -07:00
eventpoll.c epoll: call final ep_events_available() check under the lock 2020-05-14 10:00:35 -07:00
exec.c exec: Move would_dump into flush_old_exec 2020-05-17 10:48:24 -05:00
fcntl.c fcntl: Distribute switch variables for initialization 2020-03-03 10:55:06 -05:00
fhandle.c fs/handle.c - fix up kerneldoc 2019-08-07 21:51:47 -04:00
file_table.c vfs: Export flush_delayed_fput for use by knfsd. 2019-08-19 11:00:39 -04:00
file.c fix multiplication overflow in copy_fdtable() 2020-05-19 18:29:36 -04:00
filesystems.c fs/filesystems.c: downgrade user-reachable WARN_ONCE() to pr_warn_once() 2020-04-10 15:36:22 -07:00
fs_context.c add prefix to fs_context->log 2020-02-07 14:48:35 -05:00
fs_parser.c fs_parse: remove pr_notice() about each validation 2020-04-02 09:35:26 -07:00
fs_pin.c switch the remnants of releasing the mountpoint away from fs_pin 2019-07-16 22:52:37 -04:00
fs_struct.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
fs_types.c
fs-writeback.c memcg: fix a crash in wb_workfn when a device disappears 2020-01-31 10:30:36 -08:00
fsopen.c add prefix to fs_context->log 2020-02-07 14:48:35 -05:00
inode.c futex: Fix inode life-time issue 2020-03-06 11:06:15 +01:00
internal.h Merge branch 'work.dotdot1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-04-02 12:30:08 -07:00
io_uring.c io_uring: reset -EBUSY error when io sq thread is waken up 2020-05-20 07:26:47 -06:00
io-wq.c io_uring: use io-wq manager as backup task if task is exiting 2020-04-03 11:35:57 -06:00
io-wq.h io_uring: use io-wq manager as backup task if task is exiting 2020-04-03 11:35:57 -06:00
ioctl.c fibmap: Warn and return an error in case of block > INT_MAX 2020-04-30 07:57:46 -07:00
Kconfig exfat: add Kconfig and Makefile 2020-03-05 21:00:40 -05:00
Kconfig.binfmt binfmt_flat: make support for old format binaries optional 2019-06-24 09:16:47 +10:00
libfs.c libfs: fix infoleak in simple_attr_read() 2020-03-24 13:27:16 +01:00
locks.c locks: reinstate locks_delete_block optimization 2020-03-18 13:03:38 -07:00
Makefile exfat: add Kconfig and Makefile 2020-03-05 21:00:40 -05:00
mbcache.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
mount.h switch the remnants of releasing the mountpoint away from fs_pin 2019-07-16 22:52:37 -04:00
mpage.c fs: move guard_bio_eod() after bio_set_op_attrs 2020-01-09 08:16:12 -07:00
namei.c fix a braino in legitimize_path() 2020-04-06 10:38:59 -04:00
namespace.c LOOKUP_MOUNTPOINT: fold path_mountpointat() into path_lookupat() 2020-03-13 21:08:17 -04:00
no-block.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
nsfs.c fs/nsfs.c: Added ns_match 2020-03-12 17:33:11 -07:00
open.c Merge branch 'work.dotdot1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-04-02 12:30:08 -07:00
pipe.c mm: kmem: rename memcg_kmem_(un)charge() into memcg_kmem_(un)charge_page() 2020-04-02 09:35:28 -07:00
pnode.c propagate_one(): mnt_set_mountpoint() needs mount_lock 2020-04-27 10:37:14 -04:00
pnode.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 209 2019-05-30 11:29:53 -07:00
posix_acl.c fs/posix_acl.c: fix kernel-doc warnings 2020-01-04 13:55:09 -08:00
proc_namespace.c vfs: subtype handling moved to fuse 2019-09-06 21:28:49 +02:00
read_write.c powerpc: Add back __ARCH_WANT_SYS_LLSEEK macro 2020-04-03 00:09:59 +11:00
readdir.c readdir: make user_access_begin() use the real access range 2020-01-23 10:15:28 -08:00
select.c y2038: syscalls: change remaining timeval to __kernel_old_timeval 2019-11-15 14:38:29 +01:00
seq_file.c fs/seq_file.c: seq_read(): add info message about buggy .next functions 2020-04-10 15:36:22 -07:00
signalfd.c
splice.c pipe: Fix pipe_full() test in opipe_prep(). 2020-05-20 10:54:29 -07:00
stack.c sched/rt, fs: Use CONFIG_PREEMPTION 2019-12-08 14:37:36 +01:00
stat.c fs: make two stat prep helpers available 2020-01-20 17:03:54 -07:00
statfs.c vfs: Fix EOVERFLOW testing in put_compat_statfs64 2019-10-03 14:21:35 -07:00
super.c Fix use after free in get_tree_bdev() 2020-04-28 14:37:40 -07:00
sync.c fs/sync.c: sync_file_range(2) may use WB_SYNC_ALL writeback 2019-05-14 09:47:50 -07:00
timerfd.c timerfd: Make timerfd_settime() time namespace aware 2020-01-14 12:20:53 +01:00
userfaultfd.c userfaultfd: wp: declare _UFFDIO_WRITEPROTECT conditionally 2020-04-07 10:43:40 -07:00
utimes.c utimes: Clamp the timestamps in notify_change() 2019-12-08 19:10:50 -05:00
xattr.c kernfs: Add removed_size out param for simple_xattr_set 2020-03-16 15:53:47 -04:00