Nikolay Borisov bf6dd437c3 btrfs: don't flush from btrfs_delayed_inode_reserve_metadata
commit 4d14c5cde5c268a2bc26addecf09489cb953ef64 upstream

Calling btrfs_qgroup_reserve_meta_prealloc from
btrfs_delayed_inode_reserve_metadata can result in flushing delalloc
while holding a transaction and the delayed node lock. This is deadlock
prone. In the past, multiple commits tried to solve various aspects of
this:

 * ae5e070eaca9 ("btrfs: qgroup: don't try to wait flushing if we're
   already holding a transaction")

 * 6f23277a49e6 ("btrfs: qgroup: don't commit transaction when we already
   hold the handle")

However, this was always a whack-a-mole game. Those two fixes don't
solve a deadlock scenario involving btrfs_delayed_node::mutex. Namely,
one thread can call btrfs_dirty_inode as a result of reading a file and
updating its atime:

  PID: 6963   TASK: ffff8c7f3f94c000  CPU: 2   COMMAND: "test"
    __schedule at ffffffffa529e07d
    schedule at ffffffffa529e4ff
    schedule_timeout at ffffffffa52a1bdd
    wait_for_completion at ffffffffa529eeea             <-- sleeps with delayed node mutex held
    start_delalloc_inodes at ffffffffc0380db5
    btrfs_start_delalloc_snapshot at ffffffffc0393836
    try_flush_qgroup at ffffffffc03f04b2
    __btrfs_qgroup_reserve_meta at ffffffffc03f5bb6     <-- tries to reserve space and starts delalloc inodes.
    btrfs_delayed_update_inode at ffffffffc03e31aa      <-- acquires delayed node mutex
    btrfs_update_inode at ffffffffc0385ba8
    btrfs_dirty_inode at ffffffffc038627b               <-- TRANSACTION OPENED
    touch_atime at ffffffffa4cf0000
    generic_file_read_iter at ffffffffa4c1f123
    new_sync_read at ffffffffa4ccdc8a
    vfs_read at ffffffffa4cd0849
    ksys_read at ffffffffa4cd0bd1
    do_syscall_64 at ffffffffa4a052eb
    entry_SYSCALL_64_after_hwframe at ffffffffa540008c

This queues asynchronous work to flush the delalloc inodes, and that
work can try to acquire the same delayed_node mutex:

  PID: 455    TASK: ffff8c8085fa4000  CPU: 5   COMMAND: "kworker/u16:30"
    __schedule at ffffffffa529e07d
    schedule at ffffffffa529e4ff
    schedule_preempt_disabled at ffffffffa529e80a
    __mutex_lock at ffffffffa529fdcb                    <-- goes to sleep, never wakes up.
    btrfs_delayed_update_inode at ffffffffc03e3143      <-- tries to acquire the mutex
    btrfs_update_inode at ffffffffc0385ba8              <-- this is the same inode that pid 6963 is holding
    cow_file_range_inline.constprop.78 at ffffffffc0386be7
    cow_file_range at ffffffffc03879c1
    btrfs_run_delalloc_range at ffffffffc038894c
    writepage_delalloc at ffffffffc03a3c8f
    __extent_writepage at ffffffffc03a4c01
    extent_write_cache_pages at ffffffffc03a500b
    extent_writepages at ffffffffc03a6de2
    do_writepages at ffffffffa4c277eb
    __filemap_fdatawrite_range at ffffffffa4c1e5bb
    btrfs_run_delalloc_work at ffffffffc0380987         <-- starts running delayed nodes
    normal_work_helper at ffffffffc03b706c
    process_one_work at ffffffffa4aba4e4
    worker_thread at ffffffffa4aba6fd
    kthread at ffffffffa4ac0a3d
    ret_from_fork at ffffffffa54001ff
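
The two traces above reduce to a simple pattern: one thread holds a
mutex and then waits for a worker that needs the same mutex. The
following is a minimal user-space sketch of that pattern using POSIX
threads, not kernel code. The names node_mutex, flush_worker and
flush_while_holding_mutex are hypothetical stand-ins, and the worker
uses pthread_mutex_trylock so that the demonstration reports the
conflict instead of hanging.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for btrfs_delayed_node::mutex. */
static pthread_mutex_t node_mutex = PTHREAD_MUTEX_INITIALIZER;

/*
 * Stand-in for the flush worker. It needs node_mutex to make progress.
 * A blocking lock here would sleep forever, so the sketch only probes
 * the mutex and records the result.
 */
static void *flush_worker(void *arg)
{
	int *result = arg;

	*result = pthread_mutex_trylock(&node_mutex);
	if (*result == 0)
		pthread_mutex_unlock(&node_mutex);
	return NULL;
}

/*
 * Stand-in for the reader thread: take the mutex, start the "flush",
 * and wait for it to finish while still holding the mutex.
 * Returns what the worker saw when it tried to take the mutex,
 * or -1 if the worker could not be started.
 */
static int flush_while_holding_mutex(void)
{
	pthread_t worker;
	int result = -1;
	int rc;

	rc = pthread_mutex_lock(&node_mutex);
	assert(rc == 0);
	(void)rc;

	if (pthread_create(&worker, NULL, flush_worker, &result) != 0) {
		pthread_mutex_unlock(&node_mutex);
		return -1;
	}

	/* Waiting here corresponds to wait_for_completion in the first trace. */
	pthread_join(worker, NULL);
	pthread_mutex_unlock(&node_mutex);
	return result;
}
```

In this sketch the worker sees EBUSY. In the reported scenario the
worker uses a blocking lock, so it sleeps while the holder waits for it
to complete, and neither thread makes progress.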

The complete fix for these cases is to never issue any flushing while
holding the transaction or the delayed node lock. This patch achieves
that by calling qgroup_reserve_meta directly, which either succeeds
without flushing or fails and returns -EDQUOT. In the latter case the
return value is propagated to btrfs_dirty_inode, which falls back to
starting a new transaction. That's fine, as the majority of the time we
expect the inode to have the BTRFS_DELAYED_NODE_INODE_DIRTY flag set,
which results in directly copying the in-memory state.
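
The approach described above can be sketched as a small user-space
model. This is not the kernel implementation: struct quota_pool,
model_flush, reserve_noflush, reserve_flush and dirty_inode_model are
hypothetical names, and the assumption that flushing releases all
reserved space is a simplification made only to keep the example short.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical quota model; not the kernel's qgroup structures. */
struct quota_pool {
	size_t limit;         /* maximum bytes that may be reserved */
	size_t reserved;      /* bytes currently reserved, <= limit */
	unsigned flush_calls; /* how many times the flush path ran */
};

/* Stand-in for flushing delalloc; assumed to free all reservations. */
static void model_flush(struct quota_pool *p)
{
	p->flush_calls++;
	p->reserved = 0;
}

/* Non-flushing reservation: succeeds or returns -EDQUOT, never waits. */
static int reserve_noflush(struct quota_pool *p, size_t bytes)
{
	assert(p != NULL && p->reserved <= p->limit);

	if (bytes > p->limit - p->reserved)
		return -EDQUOT;
	p->reserved += bytes;
	return 0;
}

/* Flushing reservation: only safe when no locks are held. */
static int reserve_flush(struct quota_pool *p, size_t bytes)
{
	int ret = reserve_noflush(p, bytes);

	if (ret != -EDQUOT)
		return ret;
	model_flush(p);
	return reserve_noflush(p, bytes);
}

/*
 * Models the caller: try the non-flushing path first, as if the
 * transaction and delayed node lock were held. On -EDQUOT, fall back
 * to a path that is allowed to flush because the locks were dropped.
 */
static int dirty_inode_model(struct quota_pool *p, size_t bytes)
{
	int ret = reserve_noflush(p, bytes);

	if (ret != -EDQUOT)
		return ret;
	/* Locks would be released here before taking the slow path. */
	return reserve_flush(p, bytes);
}
```

The point of the split is that the function called under the lock has
no code path that can wait, so the decision to flush is left to a
caller that knows it holds no locks.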

Fixes: c53e9653605d ("btrfs: qgroup: try to flush qgroup space when we get -EDQUOT")
CC: stable@vger.kernel.org # 5.10+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[sudip: adjust context]
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-11 14:17:22 +01:00