303 Commits

Boris Burkov
fb9f76b2a2 btrfs: qgroup: correctly model root qgroup rsv in convert
commit 141fb8cd206ace23c02cd2791c6da52c1d77d42a upstream.

We use add_root_meta_rsv and sub_root_meta_rsv to track prealloc and
pertrans reservations for subvolumes when quotas are enabled. The
convert function does not properly increment pertrans after decrementing
prealloc, so the count is not accurate.

Note: we check that the fs is not read-only to mirror the logic in
qgroup_convert_meta, which checks that before adding to the pertrans rsv.
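
A rough sketch of the intended bookkeeping (illustrative only, not the
actual patch; add_root_meta_rsv()/sub_root_meta_rsv() and the rsv type
values are the existing helpers in fs/btrfs/qgroup.c, exact signatures
and context may differ):

  /* in btrfs_qgroup_convert_reserved_meta(), before qgroup_convert_meta() */
  sub_root_meta_rsv(root, num_bytes, BTRFS_QGROUP_RSV_META_PREALLOC);
  if (!sb_rdonly(root->fs_info->sb))
          add_root_meta_rsv(root, num_bytes, BTRFS_QGROUP_RSV_META_PERTRANS);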

Fixes: 8287475a2055 ("btrfs: qgroup: Use root::qgroup_meta_rsv_* to record qgroup meta reserved space")
CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-05-02 16:23:35 +02:00
Boris Burkov
22965e4fee btrfs: forbid deleting live subvol qgroup
commit a8df35619948bd8363d330c20a90c9a7fbff28c0 upstream.

If a subvolume still exists, forbid deleting its qgroup 0/subvolid.
Allowing such a deletion generally leads to incorrect behavior in squotas
and doesn't have a legitimate purpose.
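
A minimal sketch of the check (illustrative; btrfs_qgroup_level(),
btrfs_get_fs_root() and btrfs_put_root() are existing helpers, the exact
placement and error code in the real patch may differ):

  /* in btrfs_remove_qgroup(), before deleting the qgroup items */
  if (btrfs_qgroup_level(qgroupid) == 0) {
          struct btrfs_root *subvol = btrfs_get_fs_root(fs_info, qgroupid, false);

          if (!IS_ERR(subvol)) {
                  /* the subvolume still exists, refuse to delete 0/subvolid */
                  btrfs_put_root(subvol);
                  return -EBUSY;
          }
  }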

Fixes: cecbb533b5fc ("btrfs: record simple quota deltas in delayed refs")
CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-02-23 08:42:23 +01:00
Filipe Manana
52403c3dad btrfs: fix race between quota disable and relocation
[ Upstream commit 8a4a0b2a3eaf75ca8854f856ef29690c12b2f531 ]

If we disable quotas while we have a relocation of a metadata block group
that has extents belonging to the quota root, we can cause the relocation
to fail with -ENOENT. This is because relocation builds backref nodes for
extents of the quota root and later needs to walk the backrefs and access
the quota root - however if in between a task disables quotas, that
results in deleting the quota root from the root tree (with
btrfs_del_root(), called from btrfs_quota_disable()).

This can be sporadically triggered by test case btrfs/255 from fstests:

  $ ./check btrfs/255
  FSTYP         -- btrfs
  PLATFORM      -- Linux/x86_64 debian0 6.4.0-rc6-btrfs-next-134+ #1 SMP PREEMPT_DYNAMIC Thu Jun 15 11:59:28 WEST 2023
  MKFS_OPTIONS  -- /dev/sdc
  MOUNT_OPTIONS -- /dev/sdc /home/fdmanana/btrfs-tests/scratch_1

  btrfs/255 6s ... _check_dmesg: something found in dmesg (see /home/fdmanana/git/hub/xfstests/results//btrfs/255.dmesg)
  - output mismatch (see /home/fdmanana/git/hub/xfstests/results//btrfs/255.out.bad)
#      --- tests/btrfs/255.out	2023-03-02 21:47:53.876609426 +0000
#      +++ /home/fdmanana/git/hub/xfstests/results//btrfs/255.out.bad	2023-06-16 10:20:39.267563212 +0100
#      @@ -1,2 +1,4 @@
#       QA output created by 255
#      +ERROR: error during balancing '/home/fdmanana/btrfs-tests/scratch_1': No such file or directory
#      +There may be more info in syslog - try dmesg | tail
#       Silence is golden
#      ...
      (Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/btrfs/255.out /home/fdmanana/git/hub/xfstests/results//btrfs/255.out.bad'  to see the entire diff)
  Ran: btrfs/255
  Failures: btrfs/255
  Failed 1 of 1 tests

To fix this, make the quota disable operation take the cleaner mutex, as
relocation of a block group also takes this mutex. This is also what we
do when deleting a subvolume/snapshot, we take the cleaner mutex in the
cleaner kthread (at cleaner_kthread()) and then we call btrfs_del_root()
at btrfs_drop_snapshot() while under the protection of the cleaner mutex.
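
A sketch of the serialization (simplified; whether the lock is taken in
the ioctl handler or inside btrfs_quota_disable() itself is a detail of
the actual patch):

  mutex_lock(&fs_info->cleaner_mutex);
  ret = btrfs_quota_disable(fs_info);
  mutex_unlock(&fs_info->cleaner_mutex);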

Fixes: bed92eae26cc ("Btrfs: qgroup implementation and prototypes")
CC: stable@vger.kernel.org # 5.4+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-11 11:57:31 +02:00
Filipe Manana
d2c667cc18 btrfs: fix warning when putting transaction with qgroups enabled after abort
commit aa84ce8a78a1a5c10cdf9c7a5fb0c999fbc2c8d6 upstream.

If we have a transaction abort with qgroups enabled we get a warning
triggered when doing the final put on the transaction, like this:

  [552.6789] ------------[ cut here ]------------
  [552.6815] WARNING: CPU: 4 PID: 81745 at fs/btrfs/transaction.c:144 btrfs_put_transaction+0x123/0x130 [btrfs]
  [552.6817] Modules linked in: btrfs blake2b_generic xor (...)
  [552.6819] CPU: 4 PID: 81745 Comm: btrfs-transacti Tainted: G        W          6.4.0-rc6-btrfs-next-134+ #1
  [552.6819] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
  [552.6819] RIP: 0010:btrfs_put_transaction+0x123/0x130 [btrfs]
  [552.6821] Code: bd a0 01 00 (...)
  [552.6821] RSP: 0018:ffffa168c0527e28 EFLAGS: 00010286
  [552.6821] RAX: ffff936042caed00 RBX: ffff93604a3eb448 RCX: 0000000000000000
  [552.6821] RDX: ffff93606421b028 RSI: ffffffff92ff0878 RDI: ffff93606421b010
  [552.6821] RBP: ffff93606421b000 R08: 0000000000000000 R09: ffffa168c0d07c20
  [552.6821] R10: 0000000000000000 R11: ffff93608dc52950 R12: ffffa168c0527e70
  [552.6821] R13: ffff93606421b000 R14: ffff93604a3eb420 R15: ffff93606421b028
  [552.6821] FS:  0000000000000000(0000) GS:ffff93675fb00000(0000) knlGS:0000000000000000
  [552.6821] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [552.6821] CR2: 0000558ad262b000 CR3: 000000014feda005 CR4: 0000000000370ee0
  [552.6822] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  [552.6822] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  [552.6822] Call Trace:
  [552.6822]  <TASK>
  [552.6822]  ? __warn+0x80/0x130
  [552.6822]  ? btrfs_put_transaction+0x123/0x130 [btrfs]
  [552.6824]  ? report_bug+0x1f4/0x200
  [552.6824]  ? handle_bug+0x42/0x70
  [552.6824]  ? exc_invalid_op+0x14/0x70
  [552.6824]  ? asm_exc_invalid_op+0x16/0x20
  [552.6824]  ? btrfs_put_transaction+0x123/0x130 [btrfs]
  [552.6826]  btrfs_cleanup_transaction+0xe7/0x5e0 [btrfs]
  [552.6828]  ? _raw_spin_unlock_irqrestore+0x23/0x40
  [552.6828]  ? try_to_wake_up+0x94/0x5e0
  [552.6828]  ? __pfx_process_timeout+0x10/0x10
  [552.6828]  transaction_kthread+0x103/0x1d0 [btrfs]
  [552.6830]  ? __pfx_transaction_kthread+0x10/0x10 [btrfs]
  [552.6832]  kthread+0xee/0x120
  [552.6832]  ? __pfx_kthread+0x10/0x10
  [552.6832]  ret_from_fork+0x29/0x50
  [552.6832]  </TASK>
  [552.6832] ---[ end trace 0000000000000000 ]---

This corresponds to this line of code:

  void btrfs_put_transaction(struct btrfs_transaction *transaction)
  {
      (...)
          WARN_ON(!RB_EMPTY_ROOT(
                          &transaction->delayed_refs.dirty_extent_root));
      (...)
  }

The warning happens because in btrfs_qgroup_destroy_extent_records(),
called in the transaction abort path, we free all entries from the rbtree
"dirty_extent_root" with rbtree_postorder_for_each_entry_safe(), but we
don't actually empty the rbtree - it's still pointing to nodes that were
freed.

So set the rbtree's root node to NULL to avoid this warning (assign
RB_ROOT).
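
A sketch of the cleanup with the fix applied (simplified, close to but not
necessarily identical to the real function):

  void btrfs_qgroup_destroy_extent_records(struct btrfs_transaction *trans)
  {
          struct btrfs_qgroup_extent_record *entry;
          struct btrfs_qgroup_extent_record *next;
          struct rb_root *root = &trans->delayed_refs.dirty_extent_root;

          rbtree_postorder_for_each_entry_safe(entry, next, root, node) {
                  ulist_free(entry->old_roots);
                  kfree(entry);
          }
          *root = RB_ROOT;        /* leave the rbtree empty, not dangling */
  }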

Fixes: 81f7eb00ff5b ("btrfs: destroy qgroup extent records on transaction abort")
CC: stable@vger.kernel.org # 5.10+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-27 08:44:34 +02:00
Filipe Manana
6819bb0b85 btrfs: fix race when deleting quota root from the dirty cow roots list
commit b31cb5a6eb7a48b0a7bfdf06832b1fd5088d8c79 upstream.

When disabling quotas we are deleting the quota root from the list
fs_info->dirty_cowonly_roots without taking the lock that protects it,
which is struct btrfs_fs_info::trans_lock. This unsynchronized list
manipulation may cause chaos if there's another concurrent manipulation
of this list, such as when adding a root to it with
ctree.c:add_root_to_dirty_list().

This can result in all sorts of weird failures caused by a race, such as
the following crash:

  [337571.278245] general protection fault, probably for non-canonical address 0xdead000000000108: 0000 [#1] PREEMPT SMP PTI
  [337571.278933] CPU: 1 PID: 115447 Comm: btrfs Tainted: G        W          6.4.0-rc6-btrfs-next-134+ #1
  [337571.279153] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
  [337571.279572] RIP: 0010:commit_cowonly_roots+0x11f/0x250 [btrfs]
  [337571.279928] Code: 85 38 06 00 (...)
  [337571.280363] RSP: 0018:ffff9f63446efba0 EFLAGS: 00010206
  [337571.280582] RAX: ffff942d98ec2638 RBX: ffff9430b82b4c30 RCX: 0000000449e1c000
  [337571.280798] RDX: dead000000000100 RSI: ffff9430021e4900 RDI: 0000000000036070
  [337571.281015] RBP: ffff942d98ec2000 R08: ffff942d98ec2000 R09: 000000000000015b
  [337571.281254] R10: 0000000000000009 R11: 0000000000000001 R12: ffff942fe8fbf600
  [337571.281476] R13: ffff942dabe23040 R14: ffff942dabe20800 R15: ffff942d92cf3b48
  [337571.281723] FS:  00007f478adb7340(0000) GS:ffff94349fa40000(0000) knlGS:0000000000000000
  [337571.281950] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [337571.282184] CR2: 00007f478ab9a3d5 CR3: 000000001e02c001 CR4: 0000000000370ee0
  [337571.282416] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  [337571.282647] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  [337571.282874] Call Trace:
  [337571.283101]  <TASK>
  [337571.283327]  ? __die_body+0x1b/0x60
  [337571.283570]  ? die_addr+0x39/0x60
  [337571.283796]  ? exc_general_protection+0x22e/0x430
  [337571.284022]  ? asm_exc_general_protection+0x22/0x30
  [337571.284251]  ? commit_cowonly_roots+0x11f/0x250 [btrfs]
  [337571.284531]  btrfs_commit_transaction+0x42e/0xf90 [btrfs]
  [337571.284803]  ? _raw_spin_unlock+0x15/0x30
  [337571.285031]  ? release_extent_buffer+0x103/0x130 [btrfs]
  [337571.285305]  reset_balance_state+0x152/0x1b0 [btrfs]
  [337571.285578]  btrfs_balance+0xa50/0x11e0 [btrfs]
  [337571.285864]  ? __kmem_cache_alloc_node+0x14a/0x410
  [337571.286086]  btrfs_ioctl+0x249a/0x3320 [btrfs]
  [337571.286358]  ? mod_objcg_state+0xd2/0x360
  [337571.286577]  ? refill_obj_stock+0xb0/0x160
  [337571.286798]  ? seq_release+0x25/0x30
  [337571.287016]  ? __rseq_handle_notify_resume+0x3ba/0x4b0
  [337571.287235]  ? percpu_counter_add_batch+0x2e/0xa0
  [337571.287455]  ? __x64_sys_ioctl+0x88/0xc0
  [337571.287675]  __x64_sys_ioctl+0x88/0xc0
  [337571.287901]  do_syscall_64+0x38/0x90
  [337571.288126]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
  [337571.288352] RIP: 0033:0x7f478aaffe9b

So fix this by locking struct btrfs_fs_info::trans_lock before deleting
the quota root from that list.
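
A sketch of the fix (simplified from btrfs_quota_disable()):

  spin_lock(&fs_info->trans_lock);
  list_del(&quota_root->dirty_list);
  spin_unlock(&fs_info->trans_lock);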

Fixes: bed92eae26cc ("Btrfs: qgroup implementation and prototypes")
CC: stable@vger.kernel.org # 4.14+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-27 08:44:14 +02:00
Filipe Manana
5f63470343 btrfs: fix race between quota disable and quota assign ioctls
commit 2f1a6be12ab6c8470d5776e68644726c94257c54 upstream.

The quota assign ioctl can currently run in parallel with a quota disable
ioctl call. The assign ioctl uses the quota root, while the disable ioctl
frees that root, and therefore we can have a use-after-free triggered in
the assign ioctl, leading to a trace like the following when KASAN is
enabled:

  [672.723][T736] BUG: KASAN: slab-use-after-free in btrfs_search_slot+0x2962/0x2db0
  [672.723][T736] Read of size 8 at addr ffff888022ec0208 by task btrfs_search_sl/27736
  [672.724][T736]
  [672.725][T736] CPU: 1 PID: 27736 Comm: btrfs_search_sl Not tainted 6.3.0-rc3 #37
  [672.723][T736] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
  [672.727][T736] Call Trace:
  [672.728][T736]  <TASK>
  [672.728][T736]  dump_stack_lvl+0xd9/0x150
  [672.725][T736]  print_report+0xc1/0x5e0
  [672.720][T736]  ? __virt_addr_valid+0x61/0x2e0
  [672.727][T736]  ? __phys_addr+0xc9/0x150
  [672.725][T736]  ? btrfs_search_slot+0x2962/0x2db0
  [672.722][T736]  kasan_report+0xc0/0xf0
  [672.729][T736]  ? btrfs_search_slot+0x2962/0x2db0
  [672.724][T736]  btrfs_search_slot+0x2962/0x2db0
  [672.723][T736]  ? fs_reclaim_acquire+0xba/0x160
  [672.722][T736]  ? split_leaf+0x13d0/0x13d0
  [672.726][T736]  ? rcu_is_watching+0x12/0xb0
  [672.723][T736]  ? kmem_cache_alloc+0x338/0x3c0
  [672.722][T736]  update_qgroup_status_item+0xf7/0x320
  [672.724][T736]  ? add_qgroup_rb+0x3d0/0x3d0
  [672.739][T736]  ? do_raw_spin_lock+0x12d/0x2b0
  [672.730][T736]  ? spin_bug+0x1d0/0x1d0
  [672.737][T736]  btrfs_run_qgroups+0x5de/0x840
  [672.730][T736]  ? btrfs_qgroup_rescan_worker+0xa70/0xa70
  [672.738][T736]  ? __del_qgroup_relation+0x4ba/0xe00
  [672.738][T736]  btrfs_ioctl+0x3d58/0x5d80
  [672.735][T736]  ? tomoyo_path_number_perm+0x16a/0x550
  [672.737][T736]  ? tomoyo_execute_permission+0x4a0/0x4a0
  [672.731][T736]  ? btrfs_ioctl_get_supported_features+0x50/0x50
  [672.737][T736]  ? __sanitizer_cov_trace_switch+0x54/0x90
  [672.734][T736]  ? do_vfs_ioctl+0x132/0x1660
  [672.730][T736]  ? vfs_fileattr_set+0xc40/0xc40
  [672.730][T736]  ? _raw_spin_unlock_irq+0x2e/0x50
  [672.732][T736]  ? sigprocmask+0xf2/0x340
  [672.737][T736]  ? __fget_files+0x26a/0x480
  [672.732][T736]  ? bpf_lsm_file_ioctl+0x9/0x10
  [672.738][T736]  ? btrfs_ioctl_get_supported_features+0x50/0x50
  [672.736][T736]  __x64_sys_ioctl+0x198/0x210
  [672.736][T736]  do_syscall_64+0x39/0xb0
  [672.731][T736]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
  [672.739][T736] RIP: 0033:0x4556ad
  [672.742][T736]  </TASK>
  [672.743][T736]
  [672.748][T736] Allocated by task 27677:
  [672.743][T736]  kasan_save_stack+0x22/0x40
  [672.741][T736]  kasan_set_track+0x25/0x30
  [672.741][T736]  __kasan_kmalloc+0xa4/0xb0
  [672.749][T736]  btrfs_alloc_root+0x48/0x90
  [672.746][T736]  btrfs_create_tree+0x146/0xa20
  [672.744][T736]  btrfs_quota_enable+0x461/0x1d20
  [672.743][T736]  btrfs_ioctl+0x4a1c/0x5d80
  [672.747][T736]  __x64_sys_ioctl+0x198/0x210
  [672.749][T736]  do_syscall_64+0x39/0xb0
  [672.744][T736]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
  [672.756][T736]
  [672.757][T736] Freed by task 27677:
  [672.759][T736]  kasan_save_stack+0x22/0x40
  [672.759][T736]  kasan_set_track+0x25/0x30
  [672.756][T736]  kasan_save_free_info+0x2e/0x50
  [672.751][T736]  ____kasan_slab_free+0x162/0x1c0
  [672.758][T736]  slab_free_freelist_hook+0x89/0x1c0
  [672.752][T736]  __kmem_cache_free+0xaf/0x2e0
  [672.752][T736]  btrfs_put_root+0x1ff/0x2b0
  [672.759][T736]  btrfs_quota_disable+0x80a/0xbc0
  [672.752][T736]  btrfs_ioctl+0x3e5f/0x5d80
  [672.756][T736]  __x64_sys_ioctl+0x198/0x210
  [672.753][T736]  do_syscall_64+0x39/0xb0
  [672.765][T736]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
  [672.769][T736]
  [672.768][T736] The buggy address belongs to the object at ffff888022ec0000
  [672.768][T736]  which belongs to the cache kmalloc-4k of size 4096
  [672.769][T736] The buggy address is located 520 bytes inside of
  [672.769][T736]  freed 4096-byte region [ffff888022ec0000, ffff888022ec1000)
  [672.760][T736]
  [672.764][T736] The buggy address belongs to the physical page:
  [672.761][T736] page:ffffea00008bb000 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x22ec0
  [672.766][T736] head:ffffea00008bb000 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
  [672.779][T736] flags: 0xfff00000010200(slab|head|node=0|zone=1|lastcpupid=0x7ff)
  [672.770][T736] raw: 00fff00000010200 ffff888012842140 ffffea000054ba00 dead000000000002
  [672.770][T736] raw: 0000000000000000 0000000000040004 00000001ffffffff 0000000000000000
  [672.771][T736] page dumped because: kasan: bad access detected
  [672.778][T736] page_owner tracks the page as allocated
  [672.777][T736] page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd2040(__GFP_IO|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 88
  [672.779][T736]  get_page_from_freelist+0x119c/0x2d50
  [672.779][T736]  __alloc_pages+0x1cb/0x4a0
  [672.776][T736]  alloc_pages+0x1aa/0x270
  [672.773][T736]  allocate_slab+0x260/0x390
  [672.771][T736]  ___slab_alloc+0xa9a/0x13e0
  [672.778][T736]  __slab_alloc.constprop.0+0x56/0xb0
  [672.771][T736]  __kmem_cache_alloc_node+0x136/0x320
  [672.789][T736]  __kmalloc+0x4e/0x1a0
  [672.783][T736]  tomoyo_realpath_from_path+0xc3/0x600
  [672.781][T736]  tomoyo_path_perm+0x22f/0x420
  [672.782][T736]  tomoyo_path_unlink+0x92/0xd0
  [672.780][T736]  security_path_unlink+0xdb/0x150
  [672.788][T736]  do_unlinkat+0x377/0x680
  [672.788][T736]  __x64_sys_unlink+0xca/0x110
  [672.789][T736]  do_syscall_64+0x39/0xb0
  [672.783][T736]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
  [672.784][T736] page last free stack trace:
  [672.787][T736]  free_pcp_prepare+0x4e5/0x920
  [672.787][T736]  free_unref_page+0x1d/0x4e0
  [672.784][T736]  __unfreeze_partials+0x17c/0x1a0
  [672.797][T736]  qlist_free_all+0x6a/0x180
  [672.796][T736]  kasan_quarantine_reduce+0x189/0x1d0
  [672.797][T736]  __kasan_slab_alloc+0x64/0x90
  [672.793][T736]  kmem_cache_alloc+0x17c/0x3c0
  [672.799][T736]  getname_flags.part.0+0x50/0x4e0
  [672.799][T736]  getname_flags+0x9e/0xe0
  [672.792][T736]  vfs_fstatat+0x77/0xb0
  [672.791][T736]  __do_sys_newlstat+0x84/0x100
  [672.798][T736]  do_syscall_64+0x39/0xb0
  [672.796][T736]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
  [672.790][T736]
  [672.791][T736] Memory state around the buggy address:
  [672.799][T736]  ffff888022ec0100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  [672.805][T736]  ffff888022ec0180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  [672.802][T736] >ffff888022ec0200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  [672.809][T736]                       ^
  [672.809][T736]  ffff888022ec0280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  [672.809][T736]  ffff888022ec0300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fix this by having the qgroup assign ioctl take the qgroup ioctl mutex
before calling btrfs_run_qgroups(), as the other qgroup ioctls do.
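
A sketch of the change in the assign ioctl handler (simplified; the real
patch lives in fs/btrfs/ioctl.c):

  /* update qgroup status and info */
  mutex_lock(&fs_info->qgroup_ioctl_lock);
  err = btrfs_run_qgroups(trans);
  mutex_unlock(&fs_info->qgroup_ioctl_lock);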

Reported-by: butt3rflyh4ck <butterflyhuangxx@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/CAFcO6XN3VD8ogmHwqRk4kbiwtpUSNySu2VAxN8waEPciCHJvMA@mail.gmail.com/
CC: stable@vger.kernel.org # 5.10+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-05 11:23:49 +02:00
Filipe Manana
3bd4337485 btrfs: fix race between quota rescan and disable leading to NULL pointer deref
commit b7adbf9ada3513d2092362c8eac5cddc5b651f5c upstream.

If we have one task trying to start the quota rescan worker while another
one is trying to disable quotas, we can end up hitting a race that results
in the quota rescan worker doing a NULL pointer dereference. The steps for
this are the following:

1) Quotas are enabled;

2) Task A calls the quota rescan ioctl and enters btrfs_qgroup_rescan().
   It calls qgroup_rescan_init() which returns 0 (success) and then joins a
   transaction and commits it;

3) Task B calls the quota disable ioctl and enters btrfs_quota_disable().
   It clears the bit BTRFS_FS_QUOTA_ENABLED from fs_info->flags and calls
   btrfs_qgroup_wait_for_completion(), which returns immediately since the
   rescan worker is not yet running.
   Then it starts a transaction and locks fs_info->qgroup_ioctl_lock;

4) Task A queues the rescan worker, by calling btrfs_queue_work();

5) The rescan worker starts, and calls rescan_should_stop() at the start
   of its while loop, which results in 0 iterations of the loop, since
   the flag BTRFS_FS_QUOTA_ENABLED was cleared from fs_info->flags by
   task B at step 3);

6) Task B sets fs_info->quota_root to NULL;

7) The rescan worker tries to start a transaction and uses
   fs_info->quota_root as the root argument for btrfs_start_transaction().
   This results in a NULL pointer dereference down the call chain of
   btrfs_start_transaction(). The stack trace is something like the one
   reported in Link tag below:

   general protection fault, probably for non-canonical address 0xdffffc0000000041: 0000 [#1] PREEMPT SMP KASAN
   KASAN: null-ptr-deref in range [0x0000000000000208-0x000000000000020f]
   CPU: 1 PID: 34 Comm: kworker/u4:2 Not tainted 6.1.0-syzkaller-13872-gb6bb9676f216 #0
   Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
   Workqueue: btrfs-qgroup-rescan btrfs_work_helper
   RIP: 0010:start_transaction+0x48/0x10f0 fs/btrfs/transaction.c:564
   Code: 48 89 fb 48 (...)
   RSP: 0018:ffffc90000ab7ab0 EFLAGS: 00010206
   RAX: 0000000000000041 RBX: 0000000000000208 RCX: ffff88801779ba80
   RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000
   RBP: dffffc0000000000 R08: 0000000000000001 R09: fffff52000156f5d
   R10: fffff52000156f5d R11: 1ffff92000156f5c R12: 0000000000000000
   R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000003
   FS:  0000000000000000(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
   CR2: 00007f2bea75b718 CR3: 000000001d0cc000 CR4: 00000000003506e0
   DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
   DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
   Call Trace:
    <TASK>
    btrfs_qgroup_rescan_worker+0x3bb/0x6a0 fs/btrfs/qgroup.c:3402
    btrfs_work_helper+0x312/0x850 fs/btrfs/async-thread.c:280
    process_one_work+0x877/0xdb0 kernel/workqueue.c:2289
    worker_thread+0xb14/0x1330 kernel/workqueue.c:2436
    kthread+0x266/0x300 kernel/kthread.c:376
    ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
    </TASK>
   Modules linked in:

So fix this by having the rescan worker function not attempt to start a
transaction if it didn't do any rescan work.
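
A sketch of the approach (simplified; the flag name is illustrative and
the surrounding code is trimmed):

  bool did_leaf_rescans = false;

  while (!err && !(stopped = rescan_should_stop(fs_info))) {
          (...)
          err = qgroup_rescan_leaf(trans, path);
          did_leaf_rescans = true;
          (...)
  }
  (...)
  /*
   * Only update the status item if we actually did some rescan work,
   * otherwise fs_info->quota_root may already be NULL (quotas disabled).
   */
  if (did_leaf_rescans) {
          trans = btrfs_start_transaction(fs_info->quota_root, 1);
          (...)
  }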

Reported-by: syzbot+96977faa68092ad382c4@syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-btrfs/000000000000e5454b05f065a803@google.com/
Fixes: e804861bd4e6 ("btrfs: fix deadlock between quota disable and qgroup rescan worker")
CC: stable@vger.kernel.org # 5.4+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-01-24 07:19:59 +01:00
ChenXiaoSong
01d7c41eac btrfs: qgroup: fix sleep from invalid context bug in btrfs_qgroup_inherit()
[ Upstream commit f7e942b5bb35d8e3af54053d19a6bf04143a3955 ]

Syzkaller reported BUG as follows:

  BUG: sleeping function called from invalid context at
       include/linux/sched/mm.h:274
  Call Trace:
   <TASK>
   dump_stack_lvl+0xcd/0x134
   __might_resched.cold+0x222/0x26b
   kmem_cache_alloc+0x2e7/0x3c0
   update_qgroup_limit_item+0xe1/0x390
   btrfs_qgroup_inherit+0x147b/0x1ee0
   create_subvol+0x4eb/0x1710
   btrfs_mksubvol+0xfe5/0x13f0
   __btrfs_ioctl_snap_create+0x2b0/0x430
   btrfs_ioctl_snap_create_v2+0x25a/0x520
   btrfs_ioctl+0x2a1c/0x5ce0
   __x64_sys_ioctl+0x193/0x200
   do_syscall_64+0x35/0x80

Fix this by calling qgroup_dirty() on @dstqgroup, and updating the limit
item later in btrfs_run_qgroups(), outside of the spinlock context.
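
A sketch of the change inside btrfs_qgroup_inherit() (simplified, only the
relevant lines shown):

  spin_lock(&fs_info->qgroup_lock);
  (...)
  dstgroup->lim_flags = inherit->lim.flags;
  dstgroup->max_rfer = inherit->lim.max_rfer;
  (...)
  /*
   * Was: update_qgroup_limit_item(trans, dstgroup), which allocates
   * memory and can sleep while we hold the spinlock; just mark the
   * qgroup dirty and let btrfs_run_qgroups() write the item later.
   */
  qgroup_dirty(fs_info, dstgroup);
  (...)
  spin_unlock(&fs_info->qgroup_lock);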

CC: stable@vger.kernel.org # 4.9+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-08 11:23:54 +01:00
Nikolay Borisov
1faf21bdd1 btrfs: move QUOTA_ENABLED check to rescan_should_stop from btrfs_qgroup_rescan_worker
[ Upstream commit db5df254120004471e1c957957ab2f1e612dcbd6 ]

Instead of having 2 places that short circuit the qgroup leaf scan, have
everything in the qgroup_rescan_leaf function. In addition, ensure that
the inconsistent qgroup flag is set when rescan_should_stop returns true.
This retains the old behavior when -EINTR was set in the body of the loop
and at the same time extends it to the cases where scanning is interrupted
due to remount or unmount operations.
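
With the change, the stop conditions live in one helper, roughly like the
following (the exact set of conditions may differ by kernel version):

  static bool rescan_should_stop(struct btrfs_fs_info *fs_info)
  {
          return btrfs_fs_closing(fs_info) ||
                  test_bit(BTRFS_FS_STATE_REMOUNTING, &fs_info->fs_state) ||
                  !test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags);
  }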

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Stable-dep-of: f7e942b5bb35 ("btrfs: qgroup: fix sleep from invalid context bug in btrfs_qgroup_inherit()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-08 11:23:54 +01:00
Filipe Manana
47b5ffe863 btrfs: fix race between quota enable and quota rescan ioctl
commit 331cd9461412e103d07595a10289de90004ac890 upstream.

When enabling quotas, at btrfs_quota_enable(), after committing the
transaction, we change fs_info->quota_root to point to the quota root we
created and set BTRFS_FS_QUOTA_ENABLED at fs_info->flags. Then we try
to start the qgroup rescan worker, first by initializing it with a call
to qgroup_rescan_init() - however if that fails we end up freeing the
quota root while leaving fs_info->quota_root still pointing to it, which
can later result in a use-after-free somewhere else.

We have previously set the flags BTRFS_FS_QUOTA_ENABLED and
BTRFS_QGROUP_STATUS_FLAG_ON, so we can only fail with -EINPROGRESS at
btrfs_quota_enable(), which is possible if someone already called the
quota rescan ioctl, and therefore started the rescan worker.

So fix this by ignoring an -EINPROGRESS and asserting we can't get any
other error.
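
A sketch of the error handling after trying to start the rescan worker in
btrfs_quota_enable() (simplified):

  ret = qgroup_rescan_init(fs_info, 0, 1);
  if (!ret) {
          qgroup_rescan_zero_tracking(fs_info);
          fs_info->qgroup_rescan_running = true;
          btrfs_queue_work(fs_info->qgroup_rescan_workers,
                           &fs_info->qgroup_rescan_work);
  } else {
          /* only a concurrent rescan ioctl can beat us here */
          ASSERT(ret == -EINPROGRESS);
          ret = 0;
  }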

Reported-by: Ye Bin <yebin10@huawei.com>
Link: https://lore.kernel.org/linux-btrfs/20220823015931.421355-1-yebin10@huawei.com/
CC: stable@vger.kernel.org # 4.19+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-10-26 13:25:12 +02:00
Sidong Yang
41712c5fa5 btrfs: qgroup: fix deadlock between rescan worker and remove qgroup
commit d4aef1e122d8bbdc15ce3bd0bc813d6b44a7d63a upstream.

The commit e804861bd4e6 ("btrfs: fix deadlock between quota disable and
qgroup rescan worker") by Kawasaki resolves a deadlock between quota
disable and the qgroup rescan worker. But there is also a similar deadlock
case: it involves enabling or disabling quotas while creating or removing
qgroups. It can be reproduced with the simple script below.

for i in {1..100}
do
    btrfs quota enable /mnt &
    btrfs qgroup create 1/0 /mnt &
    btrfs qgroup destroy 1/0 /mnt &
    btrfs quota disable /mnt &
done

Here's why the deadlock happens:

1) The quota rescan task is running.

2) Task A calls btrfs_quota_disable(), locks the qgroup_ioctl_lock
   mutex, and then calls btrfs_qgroup_wait_for_completion(), to wait for
   the quota rescan task to complete.

3) Task B calls btrfs_remove_qgroup() and it blocks when trying to lock
   the qgroup_ioctl_lock mutex, because it's being held by task A. At that
   point task B is holding a transaction handle for the current transaction.

4) The quota rescan task calls btrfs_commit_transaction(). This results
   in it waiting for all other tasks to release their handles on the
   transaction, but task B is blocked on the qgroup_ioctl_lock mutex
   while holding a handle on the transaction, and that mutex is being held
   by task A, which is waiting for the quota rescan task to complete,
   resulting in a deadlock between these 3 tasks.

To resolve this issue, the thread disabling quota should unlock
qgroup_ioctl_lock before waiting for rescan completion. Move
btrfs_qgroup_wait_for_completion() after the unlock of qgroup_ioctl_lock.
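
A sketch of the reordering in btrfs_quota_disable() (simplified; error
handling and unrelated code omitted):

  clear_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags);
  (...)
  mutex_unlock(&fs_info->qgroup_ioctl_lock);

  /*
   * Wait only after dropping the mutex, so tasks blocked on it (such as
   * btrfs_remove_qgroup()) can release their transaction handles and the
   * rescan worker's transaction commit can make progress.
   */
  btrfs_qgroup_wait_for_completion(fs_info, false);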

Fixes: e804861bd4e6 ("btrfs: fix deadlock between quota disable and qgroup rescan worker")
CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Sidong Yang <realwakka@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-08 19:09:38 +01:00
Shin'ichiro Kawasaki
32747e0143 btrfs: fix deadlock between quota disable and qgroup rescan worker
commit e804861bd4e69cc5fe1053eedcb024982dde8e48 upstream.

The quota disable ioctl starts a transaction before waiting for the qgroup
rescan worker to complete. However, this wait can be infinite and result
in a deadlock because of a circular dependency among the quota disable
ioctl, the qgroup rescan worker and another task holding a transaction,
such as a block group relocation task.

The deadlock happens with the steps following:

1) Task A calls ioctl to disable quota. It starts a transaction and
   waits for the qgroup rescan worker to complete.
2) Task B, such as the block group relocation task, starts a transaction
   and joins the transaction that task A started. Then task B commits the
   transaction. In this commit, task B waits for the commit by task A.
3) Task C, the qgroup rescan worker, starts its job and starts a
   transaction. In this transaction start, task C waits for completion
   of the transaction that task A started and task B committed.

This deadlock was found with fstests test case btrfs/115 and a zoned
null_blk device. The test case enables and disables quotas, and block
group reclaim happened to be triggered during the quota disable.
The deadlock was also observed by running quota enable and disable in
parallel with the 'btrfs balance' command on regular null_blk devices.

An example report of the deadlock:

  [372.469894] INFO: task kworker/u16:6:103 blocked for more than 122 seconds.
  [372.479944]       Not tainted 5.16.0-rc8 #7
  [372.485067] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [372.493898] task:kworker/u16:6   state:D stack:    0 pid:  103 ppid:     2 flags:0x00004000
  [372.503285] Workqueue: btrfs-qgroup-rescan btrfs_work_helper [btrfs]
  [372.510782] Call Trace:
  [372.514092]  <TASK>
  [372.521684]  __schedule+0xb56/0x4850
  [372.530104]  ? io_schedule_timeout+0x190/0x190
  [372.538842]  ? lockdep_hardirqs_on+0x7e/0x100
  [372.547092]  ? _raw_spin_unlock_irqrestore+0x3e/0x60
  [372.555591]  schedule+0xe0/0x270
  [372.561894]  btrfs_commit_transaction+0x18bb/0x2610 [btrfs]
  [372.570506]  ? btrfs_apply_pending_changes+0x50/0x50 [btrfs]
  [372.578875]  ? free_unref_page+0x3f2/0x650
  [372.585484]  ? finish_wait+0x270/0x270
  [372.591594]  ? release_extent_buffer+0x224/0x420 [btrfs]
  [372.599264]  btrfs_qgroup_rescan_worker+0xc13/0x10c0 [btrfs]
  [372.607157]  ? lock_release+0x3a9/0x6d0
  [372.613054]  ? btrfs_qgroup_account_extent+0xda0/0xda0 [btrfs]
  [372.620960]  ? do_raw_spin_lock+0x11e/0x250
  [372.627137]  ? rwlock_bug.part.0+0x90/0x90
  [372.633215]  ? lock_is_held_type+0xe4/0x140
  [372.639404]  btrfs_work_helper+0x1ae/0xa90 [btrfs]
  [372.646268]  process_one_work+0x7e9/0x1320
  [372.652321]  ? lock_release+0x6d0/0x6d0
  [372.658081]  ? pwq_dec_nr_in_flight+0x230/0x230
  [372.664513]  ? rwlock_bug.part.0+0x90/0x90
  [372.670529]  worker_thread+0x59e/0xf90
  [372.676172]  ? process_one_work+0x1320/0x1320
  [372.682440]  kthread+0x3b9/0x490
  [372.687550]  ? _raw_spin_unlock_irq+0x24/0x50
  [372.693811]  ? set_kthread_struct+0x100/0x100
  [372.700052]  ret_from_fork+0x22/0x30
  [372.705517]  </TASK>
  [372.709747] INFO: task btrfs-transacti:2347 blocked for more than 123 seconds.
  [372.729827]       Not tainted 5.16.0-rc8 #7
  [372.745907] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [372.767106] task:btrfs-transacti state:D stack:    0 pid: 2347 ppid:     2 flags:0x00004000
  [372.787776] Call Trace:
  [372.801652]  <TASK>
  [372.812961]  __schedule+0xb56/0x4850
  [372.830011]  ? io_schedule_timeout+0x190/0x190
  [372.852547]  ? lockdep_hardirqs_on+0x7e/0x100
  [372.871761]  ? _raw_spin_unlock_irqrestore+0x3e/0x60
  [372.886792]  schedule+0xe0/0x270
  [372.901685]  wait_current_trans+0x22c/0x310 [btrfs]
  [372.919743]  ? btrfs_put_transaction+0x3d0/0x3d0 [btrfs]
  [372.938923]  ? finish_wait+0x270/0x270
  [372.959085]  ? join_transaction+0xc75/0xe30 [btrfs]
  [372.977706]  start_transaction+0x938/0x10a0 [btrfs]
  [372.997168]  transaction_kthread+0x19d/0x3c0 [btrfs]
  [373.013021]  ? btrfs_cleanup_transaction.isra.0+0xfc0/0xfc0 [btrfs]
  [373.031678]  kthread+0x3b9/0x490
  [373.047420]  ? _raw_spin_unlock_irq+0x24/0x50
  [373.064645]  ? set_kthread_struct+0x100/0x100
  [373.078571]  ret_from_fork+0x22/0x30
  [373.091197]  </TASK>
  [373.105611] INFO: task btrfs:3145 blocked for more than 123 seconds.
  [373.114147]       Not tainted 5.16.0-rc8 #7
  [373.120401] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [373.130393] task:btrfs           state:D stack:    0 pid: 3145 ppid:  3141 flags:0x00004000
  [373.140998] Call Trace:
  [373.145501]  <TASK>
  [373.149654]  __schedule+0xb56/0x4850
  [373.155306]  ? io_schedule_timeout+0x190/0x190
  [373.161965]  ? lockdep_hardirqs_on+0x7e/0x100
  [373.168469]  ? _raw_spin_unlock_irqrestore+0x3e/0x60
  [373.175468]  schedule+0xe0/0x270
  [373.180814]  wait_for_commit+0x104/0x150 [btrfs]
  [373.187643]  ? test_and_set_bit+0x20/0x20 [btrfs]
  [373.194772]  ? kmem_cache_free+0x124/0x550
  [373.201191]  ? btrfs_put_transaction+0x69/0x3d0 [btrfs]
  [373.208738]  ? finish_wait+0x270/0x270
  [373.214704]  ? __btrfs_end_transaction+0x347/0x7b0 [btrfs]
  [373.222342]  btrfs_commit_transaction+0x44d/0x2610 [btrfs]
  [373.230233]  ? join_transaction+0x255/0xe30 [btrfs]
  [373.237334]  ? btrfs_record_root_in_trans+0x4d/0x170 [btrfs]
  [373.245251]  ? btrfs_apply_pending_changes+0x50/0x50 [btrfs]
  [373.253296]  relocate_block_group+0x105/0xc20 [btrfs]
  [373.260533]  ? mutex_lock_io_nested+0x1270/0x1270
  [373.267516]  ? btrfs_wait_nocow_writers+0x85/0x180 [btrfs]
  [373.275155]  ? merge_reloc_roots+0x710/0x710 [btrfs]
  [373.283602]  ? btrfs_wait_ordered_extents+0xd30/0xd30 [btrfs]
  [373.291934]  ? kmem_cache_free+0x124/0x550
  [373.298180]  btrfs_relocate_block_group+0x35c/0x930 [btrfs]
  [373.306047]  btrfs_relocate_chunk+0x85/0x210 [btrfs]
  [373.313229]  btrfs_balance+0x12f4/0x2d20 [btrfs]
  [373.320227]  ? lock_release+0x3a9/0x6d0
  [373.326206]  ? btrfs_relocate_chunk+0x210/0x210 [btrfs]
  [373.333591]  ? lock_is_held_type+0xe4/0x140
  [373.340031]  ? rcu_read_lock_sched_held+0x3f/0x70
  [373.346910]  btrfs_ioctl_balance+0x548/0x700 [btrfs]
  [373.354207]  btrfs_ioctl+0x7f2/0x71b0 [btrfs]
  [373.360774]  ? lockdep_hardirqs_on_prepare+0x410/0x410
  [373.367957]  ? lockdep_hardirqs_on_prepare+0x410/0x410
  [373.375327]  ? btrfs_ioctl_get_supported_features+0x20/0x20 [btrfs]
  [373.383841]  ? find_held_lock+0x2c/0x110
  [373.389993]  ? lock_release+0x3a9/0x6d0
  [373.395828]  ? mntput_no_expire+0xf7/0xad0
  [373.402083]  ? lock_is_held_type+0xe4/0x140
  [373.408249]  ? vfs_fileattr_set+0x9f0/0x9f0
  [373.414486]  ? selinux_file_ioctl+0x349/0x4e0
  [373.420938]  ? trace_raw_output_lock+0xb4/0xe0
  [373.427442]  ? selinux_inode_getsecctx+0x80/0x80
  [373.434224]  ? lockdep_hardirqs_on+0x7e/0x100
  [373.440660]  ? force_qs_rnp+0x2a0/0x6b0
  [373.446534]  ? lock_is_held_type+0x9b/0x140
  [373.452763]  ? __blkcg_punt_bio_submit+0x1b0/0x1b0
  [373.459732]  ? security_file_ioctl+0x50/0x90
  [373.466089]  __x64_sys_ioctl+0x127/0x190
  [373.472022]  do_syscall_64+0x3b/0x90
  [373.477513]  entry_SYSCALL_64_after_hwframe+0x44/0xae
  [373.484823] RIP: 0033:0x7f8f4af7e2bb
  [373.490493] RSP: 002b:00007ffcbf936178 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
  [373.500197] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f8f4af7e2bb
  [373.509451] RDX: 00007ffcbf936220 RSI: 00000000c4009420 RDI: 0000000000000003
  [373.518659] RBP: 00007ffcbf93774a R08: 0000000000000013 R09: 00007f8f4b02d4e0
  [373.527872] R10: 00007f8f4ae87740 R11: 0000000000000246 R12: 0000000000000001
  [373.537222] R13: 00007ffcbf936220 R14: 0000000000000000 R15: 0000000000000002
  [373.546506]  </TASK>
  [373.550878] INFO: task btrfs:3146 blocked for more than 123 seconds.
  [373.559383]       Not tainted 5.16.0-rc8 #7
  [373.565748] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [373.575748] task:btrfs           state:D stack:    0 pid: 3146 ppid:  2168 flags:0x00000000
  [373.586314] Call Trace:
  [373.590846]  <TASK>
  [373.595121]  __schedule+0xb56/0x4850
  [373.600901]  ? __lock_acquire+0x23db/0x5030
  [373.607176]  ? io_schedule_timeout+0x190/0x190
  [373.613954]  schedule+0xe0/0x270
  [373.619157]  schedule_timeout+0x168/0x220
  [373.625170]  ? usleep_range_state+0x150/0x150
  [373.631653]  ? mark_held_locks+0x9e/0xe0
  [373.637767]  ? do_raw_spin_lock+0x11e/0x250
  [373.643993]  ? lockdep_hardirqs_on_prepare+0x17b/0x410
  [373.651267]  ? _raw_spin_unlock_irq+0x24/0x50
  [373.657677]  ? lockdep_hardirqs_on+0x7e/0x100
  [373.664103]  wait_for_completion+0x163/0x250
  [373.670437]  ? bit_wait_timeout+0x160/0x160
  [373.676585]  btrfs_quota_disable+0x176/0x9a0 [btrfs]
  [373.683979]  ? btrfs_quota_enable+0x12f0/0x12f0 [btrfs]
  [373.691340]  ? down_write+0xd0/0x130
  [373.696880]  ? down_write_killable+0x150/0x150
  [373.703352]  btrfs_ioctl+0x3945/0x71b0 [btrfs]
  [373.710061]  ? find_held_lock+0x2c/0x110
  [373.716192]  ? lock_release+0x3a9/0x6d0
  [373.722047]  ? __handle_mm_fault+0x23cd/0x3050
  [373.728486]  ? btrfs_ioctl_get_supported_features+0x20/0x20 [btrfs]
  [373.737032]  ? set_pte+0x6a/0x90
  [373.742271]  ? do_raw_spin_unlock+0x55/0x1f0
  [373.748506]  ? lock_is_held_type+0xe4/0x140
  [373.754792]  ? vfs_fileattr_set+0x9f0/0x9f0
  [373.761083]  ? selinux_file_ioctl+0x349/0x4e0
  [373.767521]  ? selinux_inode_getsecctx+0x80/0x80
  [373.774247]  ? __up_read+0x182/0x6e0
  [373.780026]  ? count_memcg_events.constprop.0+0x46/0x60
  [373.787281]  ? up_write+0x460/0x460
  [373.792932]  ? security_file_ioctl+0x50/0x90
  [373.799232]  __x64_sys_ioctl+0x127/0x190
  [373.805237]  do_syscall_64+0x3b/0x90
  [373.810947]  entry_SYSCALL_64_after_hwframe+0x44/0xae
  [373.818102] RIP: 0033:0x7f1383ea02bb
  [373.823847] RSP: 002b:00007fffeb4d71f8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
  [373.833641] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1383ea02bb
  [373.842961] RDX: 00007fffeb4d7210 RSI: 00000000c0109428 RDI: 0000000000000003
  [373.852179] RBP: 0000000000000003 R08: 0000000000000003 R09: 0000000000000078
  [373.861408] R10: 00007f1383daec78 R11: 0000000000000202 R12: 00007fffeb4d874a
  [373.870647] R13: 0000000000493099 R14: 0000000000000001 R15: 0000000000000000
  [373.879838]  </TASK>
  [373.884018]
               Showing all locks held in the system:
  [373.894250] 3 locks held by kworker/4:1/58:
  [373.900356] 1 lock held by khungtaskd/63:
  [373.906333]  #0: ffffffff8945ff60 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x53/0x260
  [373.917307] 3 locks held by kworker/u16:6/103:
  [373.923938]  #0: ffff888127b4f138 ((wq_completion)btrfs-qgroup-rescan){+.+.}-{0:0}, at: process_one_work+0x712/0x1320
  [373.936555]  #1: ffff88810b817dd8 ((work_completion)(&work->normal_work)){+.+.}-{0:0}, at: process_one_work+0x73f/0x1320
  [373.951109]  #2: ffff888102dd4650 (sb_internal#2){.+.+}-{0:0}, at: btrfs_qgroup_rescan_worker+0x1f6/0x10c0 [btrfs]
  [373.964027] 2 locks held by less/1803:
  [373.969982]  #0: ffff88813ed56098 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x24/0x80
  [373.981295]  #1: ffffc90000b3b2e8 (&ldata->atomic_read_lock){+.+.}-{3:3}, at: n_tty_read+0x9e2/0x1060
  [373.992969] 1 lock held by btrfs-transacti/2347:
  [373.999893]  #0: ffff88813d4887a8 (&fs_info->transaction_kthread_mutex){+.+.}-{3:3}, at: transaction_kthread+0xe3/0x3c0 [btrfs]
  [374.015872] 3 locks held by btrfs/3145:
  [374.022298]  #0: ffff888102dd4460 (sb_writers#18){.+.+}-{0:0}, at: btrfs_ioctl_balance+0xc3/0x700 [btrfs]
  [374.034456]  #1: ffff88813d48a0a0 (&fs_info->reclaim_bgs_lock){+.+.}-{3:3}, at: btrfs_balance+0xfe5/0x2d20 [btrfs]
  [374.047646]  #2: ffff88813d488838 (&fs_info->cleaner_mutex){+.+.}-{3:3}, at: btrfs_relocate_block_group+0x354/0x930 [btrfs]
  [374.063295] 4 locks held by btrfs/3146:
  [374.069647]  #0: ffff888102dd4460 (sb_writers#18){.+.+}-{0:0}, at: btrfs_ioctl+0x38b1/0x71b0 [btrfs]
  [374.081601]  #1: ffff88813d488bb8 (&fs_info->subvol_sem){+.+.}-{3:3}, at: btrfs_ioctl+0x38fd/0x71b0 [btrfs]
  [374.094283]  #2: ffff888102dd4650 (sb_internal#2){.+.+}-{0:0}, at: btrfs_quota_disable+0xc8/0x9a0 [btrfs]
  [374.106885]  #3: ffff88813d489800 (&fs_info->qgroup_ioctl_lock){+.+.}-{3:3}, at: btrfs_quota_disable+0xd5/0x9a0 [btrfs]

  [374.126780] =============================================

To avoid the deadlock, wait for the qgroup rescan worker to complete
before starting the transaction for the quota disable ioctl. Clear the
BTRFS_FS_QUOTA_ENABLED flag before the wait and the transaction to
request the worker to complete. On transaction start failure, set the
BTRFS_FS_QUOTA_ENABLED flag again. These BTRFS_FS_QUOTA_ENABLED flag
changes can be done safely since the function btrfs_quota_disable is not
called concurrently, because of fs_info->subvol_sem.

Also check the BTRFS_FS_QUOTA_ENABLED flag in qgroup_rescan_init to
prevent another qgroup rescan worker from starting after the previous
worker completed.
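
A sketch of the reordered flow in btrfs_quota_disable() (simplified, error
handling trimmed):

  clear_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags);
  btrfs_qgroup_wait_for_completion(fs_info, false);

  /*
   * Only now start the transaction, so the rescan worker is guaranteed
   * to be finished and cannot wait on our not-yet-committed transaction.
   */
  trans = btrfs_start_transaction(fs_info->tree_root, 1);
  if (IS_ERR(trans)) {
          /* re-enable the flag so a later disable attempt can retry */
          set_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags);
          ret = PTR_ERR(trans);
          goto out;
  }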

CC: stable@vger.kernel.org # 5.4+
Suggested-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-08 18:30:35 +01:00
Filipe Manana
09e0ef287e btrfs: fix deadlock between quota enable and other quota operations
commit 232796df8c1437c41d308d161007f0715bac0a54 upstream.

When enabling quotas, we attempt to commit a transaction while holding the
mutex fs_info->qgroup_ioctl_lock. This can result in a deadlock with other
quota operations such as:

- qgroup creation and deletion, ioctl BTRFS_IOC_QGROUP_CREATE;

- adding and removing qgroup relations, ioctl BTRFS_IOC_QGROUP_ASSIGN.

This is because these operations join a transaction and after that they
attempt to lock the mutex fs_info->qgroup_ioctl_lock. Acquiring that mutex
after joining or starting a transaction is a pattern followed everywhere
in qgroups, so the quota enablement operation is the one at fault here,
and should not commit a transaction while holding that mutex.

Fix this by committing the transaction while not holding the mutex.
We are safe from two concurrent tasks trying to enable quotas because
we are serialized by the rw semaphore fs_info->subvol_sem at
btrfs_ioctl_quota_ctl(), which is the only call site for enabling
quotas.
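
A sketch of the change in btrfs_quota_enable() (simplified; the setup code
is only indicated):

  mutex_lock(&fs_info->qgroup_ioctl_lock);
  /* (...) create the quota root and the qgroup items (...) */
  mutex_unlock(&fs_info->qgroup_ioctl_lock);

  /*
   * Commit without holding the mutex; concurrent enables are excluded
   * by the caller holding fs_info->subvol_sem.
   */
  ret = btrfs_commit_transaction(trans);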

When this deadlock happens, it produces a trace like the following:

  INFO: task syz-executor:25604 blocked for more than 143 seconds.
  Not tainted 5.15.0-rc6 #4
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  task:syz-executor state:D stack:24800 pid:25604 ppid: 24873 flags:0x00004004
  Call Trace:
  context_switch kernel/sched/core.c:4940 [inline]
  __schedule+0xcd9/0x2530 kernel/sched/core.c:6287
  schedule+0xd3/0x270 kernel/sched/core.c:6366
  btrfs_commit_transaction+0x994/0x2e90 fs/btrfs/transaction.c:2201
  btrfs_quota_enable+0x95c/0x1790 fs/btrfs/qgroup.c:1120
  btrfs_ioctl_quota_ctl fs/btrfs/ioctl.c:4229 [inline]
  btrfs_ioctl+0x637e/0x7b70 fs/btrfs/ioctl.c:5010
  vfs_ioctl fs/ioctl.c:51 [inline]
  __do_sys_ioctl fs/ioctl.c:874 [inline]
  __se_sys_ioctl fs/ioctl.c:860 [inline]
  __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:860
  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
  do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
  entry_SYSCALL_64_after_hwframe+0x44/0xae
  RIP: 0033:0x7f86920b2c4d
  RSP: 002b:00007f868f61ac58 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
  RAX: ffffffffffffffda RBX: 00007f86921d90a0 RCX: 00007f86920b2c4d
  RDX: 0000000020005e40 RSI: 00000000c0109428 RDI: 0000000000000008
  RBP: 00007f869212bd80 R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000246 R12: 00007f86921d90a0
  R13: 00007fff6d233e4f R14: 00007fff6d233ff0 R15: 00007f868f61adc0
  INFO: task syz-executor:25628 blocked for more than 143 seconds.
  Not tainted 5.15.0-rc6 #4
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  task:syz-executor state:D stack:29080 pid:25628 ppid: 24873 flags:0x00004004
  Call Trace:
  context_switch kernel/sched/core.c:4940 [inline]
  __schedule+0xcd9/0x2530 kernel/sched/core.c:6287
  schedule+0xd3/0x270 kernel/sched/core.c:6366
  schedule_preempt_disabled+0xf/0x20 kernel/sched/core.c:6425
  __mutex_lock_common kernel/locking/mutex.c:669 [inline]
  __mutex_lock+0xc96/0x1680 kernel/locking/mutex.c:729
  btrfs_remove_qgroup+0xb7/0x7d0 fs/btrfs/qgroup.c:1548
  btrfs_ioctl_qgroup_create fs/btrfs/ioctl.c:4333 [inline]
  btrfs_ioctl+0x683c/0x7b70 fs/btrfs/ioctl.c:5014
  vfs_ioctl fs/ioctl.c:51 [inline]
  __do_sys_ioctl fs/ioctl.c:874 [inline]
  __se_sys_ioctl fs/ioctl.c:860 [inline]
  __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:860
  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
  do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
  entry_SYSCALL_64_after_hwframe+0x44/0xae

Reported-by: Hao Sun <sunhao.th@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/CACkBjsZQF19bQ1C6=yetF3BvL10OSORpFUcWXTP6HErshDB4dQ@mail.gmail.com/
Fixes: 340f1aa27f36 ("btrfs: qgroups: Move transaction management inside btrfs_quota_enable/disable")
CC: stable@vger.kernel.org # 4.19
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-27 10:54:27 +01:00
Filipe Manana
3b87d0c583 btrfs: fix sleep while in non-sleep context during qgroup removal
commit 0bb788300990d3eb5582d3301a720f846c78925c upstream.

While removing a qgroup's sysfs entry we end up taking the kernfs_mutex,
through kobject_del(), while holding the fs_info->qgroup_lock spinlock,
producing the following trace:

  [821.843637] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:281
  [821.843641] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 28214, name: podman
  [821.843644] CPU: 3 PID: 28214 Comm: podman Tainted: G        W         5.11.6 #15
  [821.843646] Hardware name: Dell Inc. PowerEdge R330/084XW4, BIOS 2.11.0 12/08/2020
  [821.843647] Call Trace:
  [821.843650]  dump_stack+0xa1/0xfb
  [821.843656]  ___might_sleep+0x144/0x160
  [821.843659]  mutex_lock+0x17/0x40
  [821.843662]  kernfs_remove_by_name_ns+0x1f/0x80
  [821.843666]  sysfs_remove_group+0x7d/0xe0
  [821.843668]  sysfs_remove_groups+0x28/0x40
  [821.843670]  kobject_del+0x2a/0x80
  [821.843672]  btrfs_sysfs_del_one_qgroup+0x2b/0x40 [btrfs]
  [821.843685]  __del_qgroup_rb+0x12/0x150 [btrfs]
  [821.843696]  btrfs_remove_qgroup+0x288/0x2a0 [btrfs]
  [821.843707]  btrfs_ioctl+0x3129/0x36a0 [btrfs]
  [821.843717]  ? __mod_lruvec_page_state+0x5e/0xb0
  [821.843719]  ? page_add_new_anon_rmap+0xbc/0x150
  [821.843723]  ? kfree+0x1b4/0x300
  [821.843725]  ? mntput_no_expire+0x55/0x330
  [821.843728]  __x64_sys_ioctl+0x5a/0xa0
  [821.843731]  do_syscall_64+0x33/0x70
  [821.843733]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [821.843736] RIP: 0033:0x4cd3fb
  [821.843741] RSP: 002b:000000c000906b20 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
  [821.843744] RAX: ffffffffffffffda RBX: 000000c000050000 RCX: 00000000004cd3fb
  [821.843745] RDX: 000000c000906b98 RSI: 000000004010942a RDI: 000000000000000f
  [821.843747] RBP: 000000c000907cd0 R08: 000000c000622901 R09: 0000000000000000
  [821.843748] R10: 000000c000d992c0 R11: 0000000000000206 R12: 000000000000012d
  [821.843749] R13: 000000000000012c R14: 0000000000000200 R15: 0000000000000049

Fix this by removing the qgroup sysfs entry while not holding the spinlock,
since the spinlock is only meant for protection of the qgroup rbtree.
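
A sketch of the split in btrfs_remove_qgroup() (simplified; del_qgroup_rb()
and btrfs_sysfs_del_one_qgroup() are the existing helpers, exact context
may differ):

  spin_lock(&fs_info->qgroup_lock);
  del_qgroup_rb(fs_info, qgroupid);       /* rbtree manipulation only */
  spin_unlock(&fs_info->qgroup_lock);

  /*
   * kobject_del() takes kernfs_mutex and can sleep, so remove the sysfs
   * entry only after the spinlock is released.
   */
  btrfs_sysfs_del_one_qgroup(fs_info, qgroup);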

Reported-by: Stuart Shelton <srcshelton@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/7A5485BB-0628-419D-A4D3-27B1AF47E25A@gmail.com/
Fixes: 49e5fb46211de0 ("btrfs: qgroup: export qgroups in sysfs")
CC: stable@vger.kernel.org # 5.10+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-30 14:31:53 +02:00
Nikolay Borisov
cf9317ceb5 btrfs: export and rename qgroup_reserve_meta
commit 80e9baed722c853056e0c5374f51524593cb1031 upstream

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-11 14:17:22 +01:00
Filipe Manana
29543864c8 btrfs: fix transaction leak and crash after RO remount caused by qgroup rescan
[ Upstream commit cb13eea3b49055bd78e6ddf39defd6340f7379fc ]

If we remount a filesystem in RO mode while the qgroup rescan worker is
running, we can end up having it still running after the remount is done,
and at unmount time we may end up with an open transaction that ends up
never getting committed. If that happens we end up with several memory
leaks and can crash when hardware acceleration is unavailable for crc32c.
Possibly it can lead to other nasty surprises too, due to use-after-free
issues.

The following steps explain how the problem happens.

1) We have a filesystem mounted in RW mode and the qgroup rescan worker is
   running;

2) We remount the filesystem in RO mode, and never stop/pause the rescan
   worker, so after the remount the rescan worker is still running. The
   important detail here is that the rescan task is still running after
   the remount operation committed any ongoing transaction through its
   call to btrfs_commit_super();

3) The rescan is still running, and after the remount completed, the
   rescan worker started a transaction, after it finished iterating all
   leaves of the extent tree, to update the qgroup status item in the
   quotas tree. It does not commit the transaction, it only releases its
   handle on the transaction;

4) A filesystem unmount operation starts shortly after;

5) The unmount task, at close_ctree(), stops the transaction kthread,
   which had not had a chance to commit the open transaction since it was
   sleeping and the commit interval (default of 30 seconds) has not yet
   elapsed since the last time it committed a transaction;

6) So after stopping the transaction kthread we still have the transaction
   used to update the qgroup status item open. At close_ctree(), when the
   filesystem is in RO mode and no transaction abort happened (or the
   filesystem is in error mode), we do not expect to have any transaction
   open, so we do not call btrfs_commit_super();

7) We then proceed to destroy the work queues, free the roots and block
   groups, etc. After that we drop the last reference on the btree inode
   by calling iput() on it. Since there are dirty pages for the btree
   inode, corresponding to the COWed extent buffer for the quotas btree,
   btree_write_cache_pages() is invoked to flush those dirty pages. This
   results in creating a bio and submitting it, which makes us end up at
   btrfs_submit_metadata_bio();

8) At btrfs_submit_metadata_bio() we end up at the if-then-else branch
   that calls btrfs_wq_submit_bio(), because check_async_write() returned
   a value of 1. This value of 1 is because we did not have hardware
   acceleration available for crc32c, so BTRFS_FS_CSUM_IMPL_FAST was not
   set in fs_info->flags;

9) Then at btrfs_wq_submit_bio() we call btrfs_queue_work() against the
   workqueue at fs_info->workers, which was already freed before by the
   call to btrfs_stop_all_workers() at close_ctree(). This results in an
   invalid memory access due to a use-after-free, leading to a crash.

When this happens, before the crash there are several warnings triggered,
since we have reserved metadata space in a block group, the delayed refs
reservation, etc:

  ------------[ cut here ]------------
  WARNING: CPU: 4 PID: 1729896 at fs/btrfs/block-group.c:125 btrfs_put_block_group+0x63/0xa0 [btrfs]
  Modules linked in: btrfs dm_snapshot dm_thin_pool (...)
  CPU: 4 PID: 1729896 Comm: umount Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
  RIP: 0010:btrfs_put_block_group+0x63/0xa0 [btrfs]
  Code: f0 01 00 00 48 39 c2 75 (...)
  RSP: 0018:ffffb270826bbdd8 EFLAGS: 00010206
  RAX: 0000000000000001 RBX: ffff947ed73e4000 RCX: ffff947ebc8b29c8
  RDX: 0000000000000001 RSI: ffffffffc0b150a0 RDI: ffff947ebc8b2800
  RBP: ffff947ebc8b2800 R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000001 R12: ffff947ed73e4110
  R13: ffff947ed73e4160 R14: ffff947ebc8b2988 R15: dead000000000100
  FS:  00007f15edfea840(0000) GS:ffff9481ad600000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f37e2893320 CR3: 0000000138f68001 CR4: 00000000003706e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   btrfs_free_block_groups+0x17f/0x2f0 [btrfs]
   close_ctree+0x2ba/0x2fa [btrfs]
   generic_shutdown_super+0x6c/0x100
   kill_anon_super+0x14/0x30
   btrfs_kill_super+0x12/0x20 [btrfs]
   deactivate_locked_super+0x31/0x70
   cleanup_mnt+0x100/0x160
   task_work_run+0x68/0xb0
   exit_to_user_mode_prepare+0x1bb/0x1c0
   syscall_exit_to_user_mode+0x4b/0x260
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x7f15ee221ee7
  Code: ff 0b 00 f7 d8 64 89 01 48 (...)
  RSP: 002b:00007ffe9470f0f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
  RAX: 0000000000000000 RBX: 00007f15ee347264 RCX: 00007f15ee221ee7
  RDX: ffffffffffffff78 RSI: 0000000000000000 RDI: 000056169701d000
  RBP: 0000561697018a30 R08: 0000000000000000 R09: 00007f15ee2e2be0
  R10: 000056169701efe0 R11: 0000000000000246 R12: 0000000000000000
  R13: 000056169701d000 R14: 0000561697018b40 R15: 0000561697018c60
  irq event stamp: 0
  hardirqs last  enabled at (0): [<0000000000000000>] 0x0
  hardirqs last disabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70
  softirqs last  enabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70
  softirqs last disabled at (0): [<0000000000000000>] 0x0
  ---[ end trace dd74718fef1ed5c6 ]---
  ------------[ cut here ]------------
  WARNING: CPU: 2 PID: 1729896 at fs/btrfs/block-rsv.c:459 btrfs_release_global_block_rsv+0x70/0xc0 [btrfs]
  Modules linked in: btrfs dm_snapshot dm_thin_pool (...)
  CPU: 2 PID: 1729896 Comm: umount Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
  RIP: 0010:btrfs_release_global_block_rsv+0x70/0xc0 [btrfs]
  Code: 48 83 bb b0 03 00 00 00 (...)
  RSP: 0018:ffffb270826bbdd8 EFLAGS: 00010206
  RAX: 000000000033c000 RBX: ffff947ed73e4000 RCX: 0000000000000000
  RDX: 0000000000000001 RSI: ffffffffc0b0d8c1 RDI: 00000000ffffffff
  RBP: ffff947ebc8b7000 R08: 0000000000000001 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000001 R12: ffff947ed73e4110
  R13: ffff947ed73e5278 R14: dead000000000122 R15: dead000000000100
  FS:  00007f15edfea840(0000) GS:ffff9481aca00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000561a79f76e20 CR3: 0000000138f68006 CR4: 00000000003706e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   btrfs_free_block_groups+0x24c/0x2f0 [btrfs]
   close_ctree+0x2ba/0x2fa [btrfs]
   generic_shutdown_super+0x6c/0x100
   kill_anon_super+0x14/0x30
   btrfs_kill_super+0x12/0x20 [btrfs]
   deactivate_locked_super+0x31/0x70
   cleanup_mnt+0x100/0x160
   task_work_run+0x68/0xb0
   exit_to_user_mode_prepare+0x1bb/0x1c0
   syscall_exit_to_user_mode+0x4b/0x260
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x7f15ee221ee7
  Code: ff 0b 00 f7 d8 64 89 01 (...)
  RSP: 002b:00007ffe9470f0f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
  RAX: 0000000000000000 RBX: 00007f15ee347264 RCX: 00007f15ee221ee7
  RDX: ffffffffffffff78 RSI: 0000000000000000 RDI: 000056169701d000
  RBP: 0000561697018a30 R08: 0000000000000000 R09: 00007f15ee2e2be0
  R10: 000056169701efe0 R11: 0000000000000246 R12: 0000000000000000
  R13: 000056169701d000 R14: 0000561697018b40 R15: 0000561697018c60
  irq event stamp: 0
  hardirqs last  enabled at (0): [<0000000000000000>] 0x0
  hardirqs last disabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70
  softirqs last  enabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70
  softirqs last disabled at (0): [<0000000000000000>] 0x0
  ---[ end trace dd74718fef1ed5c7 ]---
  ------------[ cut here ]------------
  WARNING: CPU: 2 PID: 1729896 at fs/btrfs/block-group.c:3377 btrfs_free_block_groups+0x25d/0x2f0 [btrfs]
  Modules linked in: btrfs dm_snapshot dm_thin_pool (...)
  CPU: 5 PID: 1729896 Comm: umount Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
  RIP: 0010:btrfs_free_block_groups+0x25d/0x2f0 [btrfs]
  Code: ad de 49 be 22 01 00 (...)
  RSP: 0018:ffffb270826bbde8 EFLAGS: 00010206
  RAX: ffff947ebeae1d08 RBX: ffff947ed73e4000 RCX: 0000000000000000
  RDX: 0000000000000001 RSI: ffff947e9d823ae8 RDI: 0000000000000246
  RBP: ffff947ebeae1d08 R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000001 R12: ffff947ebeae1c00
  R13: ffff947ed73e5278 R14: dead000000000122 R15: dead000000000100
  FS:  00007f15edfea840(0000) GS:ffff9481ad200000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f1475d98ea8 CR3: 0000000138f68005 CR4: 00000000003706e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   close_ctree+0x2ba/0x2fa [btrfs]
   generic_shutdown_super+0x6c/0x100
   kill_anon_super+0x14/0x30
   btrfs_kill_super+0x12/0x20 [btrfs]
   deactivate_locked_super+0x31/0x70
   cleanup_mnt+0x100/0x160
   task_work_run+0x68/0xb0
   exit_to_user_mode_prepare+0x1bb/0x1c0
   syscall_exit_to_user_mode+0x4b/0x260
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x7f15ee221ee7
  Code: ff 0b 00 f7 d8 64 89 (...)
  RSP: 002b:00007ffe9470f0f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
  RAX: 0000000000000000 RBX: 00007f15ee347264 RCX: 00007f15ee221ee7
  RDX: ffffffffffffff78 RSI: 0000000000000000 RDI: 000056169701d000
  RBP: 0000561697018a30 R08: 0000000000000000 R09: 00007f15ee2e2be0
  R10: 000056169701efe0 R11: 0000000000000246 R12: 0000000000000000
  R13: 000056169701d000 R14: 0000561697018b40 R15: 0000561697018c60
  irq event stamp: 0
  hardirqs last  enabled at (0): [<0000000000000000>] 0x0
  hardirqs last disabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70
  softirqs last  enabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70
  softirqs last disabled at (0): [<0000000000000000>] 0x0
  ---[ end trace dd74718fef1ed5c8 ]---
  BTRFS info (device sdc): space_info 4 has 268238848 free, is not full
  BTRFS info (device sdc): space_info total=268435456, used=114688, pinned=0, reserved=16384, may_use=0, readonly=65536
  BTRFS info (device sdc): global_block_rsv: size 0 reserved 0
  BTRFS info (device sdc): trans_block_rsv: size 0 reserved 0
  BTRFS info (device sdc): chunk_block_rsv: size 0 reserved 0
  BTRFS info (device sdc): delayed_block_rsv: size 0 reserved 0
  BTRFS info (device sdc): delayed_refs_rsv: size 524288 reserved 0

And the crash, which only happens when we do not have crc32c hardware
acceleration, produces the following trace immediately after those
warnings:

  stack segment: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI
  CPU: 2 PID: 1749129 Comm: umount Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
  RIP: 0010:btrfs_queue_work+0x36/0x190 [btrfs]
  Code: 54 55 53 48 89 f3 (...)
  RSP: 0018:ffffb27082443ae8 EFLAGS: 00010282
  RAX: 0000000000000004 RBX: ffff94810ee9ad90 RCX: 0000000000000000
  RDX: 0000000000000001 RSI: ffff94810ee9ad90 RDI: ffff947ed8ee75a0
  RBP: a56b6b6b6b6b6b6b R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000007 R11: 0000000000000001 R12: ffff947fa9b435a8
  R13: ffff94810ee9ad90 R14: 0000000000000000 R15: ffff947e93dc0000
  FS:  00007f3cfe974840(0000) GS:ffff9481ac600000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f1b42995a70 CR3: 0000000127638003 CR4: 00000000003706e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   btrfs_wq_submit_bio+0xb3/0xd0 [btrfs]
   btrfs_submit_metadata_bio+0x44/0xc0 [btrfs]
   submit_one_bio+0x61/0x70 [btrfs]
   btree_write_cache_pages+0x414/0x450 [btrfs]
   ? kobject_put+0x9a/0x1d0
   ? trace_hardirqs_on+0x1b/0xf0
   ? _raw_spin_unlock_irqrestore+0x3c/0x60
   ? free_debug_processing+0x1e1/0x2b0
   do_writepages+0x43/0xe0
   ? lock_acquired+0x199/0x490
   __writeback_single_inode+0x59/0x650
   writeback_single_inode+0xaf/0x120
   write_inode_now+0x94/0xd0
   iput+0x187/0x2b0
   close_ctree+0x2c6/0x2fa [btrfs]
   generic_shutdown_super+0x6c/0x100
   kill_anon_super+0x14/0x30
   btrfs_kill_super+0x12/0x20 [btrfs]
   deactivate_locked_super+0x31/0x70
   cleanup_mnt+0x100/0x160
   task_work_run+0x68/0xb0
   exit_to_user_mode_prepare+0x1bb/0x1c0
   syscall_exit_to_user_mode+0x4b/0x260
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x7f3cfebabee7
  Code: ff 0b 00 f7 d8 64 89 01 (...)
  RSP: 002b:00007ffc9c9a05f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
  RAX: 0000000000000000 RBX: 00007f3cfecd1264 RCX: 00007f3cfebabee7
  RDX: ffffffffffffff78 RSI: 0000000000000000 RDI: 0000562b6b478000
  RBP: 0000562b6b473a30 R08: 0000000000000000 R09: 00007f3cfec6cbe0
  R10: 0000562b6b479fe0 R11: 0000000000000246 R12: 0000000000000000
  R13: 0000562b6b478000 R14: 0000562b6b473b40 R15: 0000562b6b473c60
  Modules linked in: btrfs dm_snapshot dm_thin_pool (...)
  ---[ end trace dd74718fef1ed5cc ]---

Finally, when we remove the btrfs module (rmmod btrfs), there are several
warnings about objects that were allocated from our slabs but were never
freed, a consequence of the transaction that was never committed and got
leaked:

  =============================================================================
  BUG btrfs_delayed_ref_head (Tainted: G    B   W        ): Objects remaining in btrfs_delayed_ref_head on __kmem_cache_shutdown()
  -----------------------------------------------------------------------------

  INFO: Slab 0x0000000094c2ae56 objects=24 used=2 fp=0x000000002bfa2521 flags=0x17fffc000010200
  CPU: 5 PID: 1729921 Comm: rmmod Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
  Call Trace:
   dump_stack+0x8d/0xb5
   slab_err+0xb7/0xdc
   ? lock_acquired+0x199/0x490
   __kmem_cache_shutdown+0x1ac/0x3c0
   ? lock_release+0x20e/0x4c0
   kmem_cache_destroy+0x55/0x120
   btrfs_delayed_ref_exit+0x11/0x35 [btrfs]
   exit_btrfs_fs+0xa/0x59 [btrfs]
   __x64_sys_delete_module+0x194/0x260
   ? fpregs_assert_state_consistent+0x1e/0x40
   ? exit_to_user_mode_prepare+0x55/0x1c0
   ? trace_hardirqs_on+0x1b/0xf0
   do_syscall_64+0x33/0x80
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x7f693e305897
  Code: 73 01 c3 48 8b 0d f9 f5 (...)
  RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
  RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897
  RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8
  RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000
  R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740
  R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760
  INFO: Object 0x0000000050cbdd61 @offset=12104
  INFO: Allocated in btrfs_add_delayed_tree_ref+0xbb/0x480 [btrfs] age=1894 cpu=6 pid=1729873
	__slab_alloc.isra.0+0x109/0x1c0
	kmem_cache_alloc+0x7bb/0x830
	btrfs_add_delayed_tree_ref+0xbb/0x480 [btrfs]
	btrfs_free_tree_block+0x128/0x360 [btrfs]
	__btrfs_cow_block+0x489/0x5f0 [btrfs]
	btrfs_cow_block+0xf7/0x220 [btrfs]
	btrfs_search_slot+0x62a/0xc40 [btrfs]
	btrfs_del_orphan_item+0x65/0xd0 [btrfs]
	btrfs_find_orphan_roots+0x1bf/0x200 [btrfs]
	open_ctree+0x125a/0x18a0 [btrfs]
	btrfs_mount_root.cold+0x13/0xed [btrfs]
	legacy_get_tree+0x30/0x60
	vfs_get_tree+0x28/0xe0
	fc_mount+0xe/0x40
	vfs_kern_mount.part.0+0x71/0x90
	btrfs_mount+0x13b/0x3e0 [btrfs]
  INFO: Freed in __btrfs_run_delayed_refs+0x1117/0x1290 [btrfs] age=4292 cpu=2 pid=1729526
	kmem_cache_free+0x34c/0x3c0
	__btrfs_run_delayed_refs+0x1117/0x1290 [btrfs]
	btrfs_run_delayed_refs+0x81/0x210 [btrfs]
	commit_cowonly_roots+0xfb/0x300 [btrfs]
	btrfs_commit_transaction+0x367/0xc40 [btrfs]
	sync_filesystem+0x74/0x90
	generic_shutdown_super+0x22/0x100
	kill_anon_super+0x14/0x30
	btrfs_kill_super+0x12/0x20 [btrfs]
	deactivate_locked_super+0x31/0x70
	cleanup_mnt+0x100/0x160
	task_work_run+0x68/0xb0
	exit_to_user_mode_prepare+0x1bb/0x1c0
	syscall_exit_to_user_mode+0x4b/0x260
	entry_SYSCALL_64_after_hwframe+0x44/0xa9
  INFO: Object 0x0000000086e9b0ff @offset=12776
  INFO: Allocated in btrfs_add_delayed_tree_ref+0xbb/0x480 [btrfs] age=1900 cpu=6 pid=1729873
	__slab_alloc.isra.0+0x109/0x1c0
	kmem_cache_alloc+0x7bb/0x830
	btrfs_add_delayed_tree_ref+0xbb/0x480 [btrfs]
	btrfs_alloc_tree_block+0x2bf/0x360 [btrfs]
	alloc_tree_block_no_bg_flush+0x4f/0x60 [btrfs]
	__btrfs_cow_block+0x12d/0x5f0 [btrfs]
	btrfs_cow_block+0xf7/0x220 [btrfs]
	btrfs_search_slot+0x62a/0xc40 [btrfs]
	btrfs_del_orphan_item+0x65/0xd0 [btrfs]
	btrfs_find_orphan_roots+0x1bf/0x200 [btrfs]
	open_ctree+0x125a/0x18a0 [btrfs]
	btrfs_mount_root.cold+0x13/0xed [btrfs]
	legacy_get_tree+0x30/0x60
	vfs_get_tree+0x28/0xe0
	fc_mount+0xe/0x40
	vfs_kern_mount.part.0+0x71/0x90
  INFO: Freed in __btrfs_run_delayed_refs+0x1117/0x1290 [btrfs] age=3141 cpu=6 pid=1729803
	kmem_cache_free+0x34c/0x3c0
	__btrfs_run_delayed_refs+0x1117/0x1290 [btrfs]
	btrfs_run_delayed_refs+0x81/0x210 [btrfs]
	btrfs_write_dirty_block_groups+0x17d/0x3d0 [btrfs]
	commit_cowonly_roots+0x248/0x300 [btrfs]
	btrfs_commit_transaction+0x367/0xc40 [btrfs]
	close_ctree+0x113/0x2fa [btrfs]
	generic_shutdown_super+0x6c/0x100
	kill_anon_super+0x14/0x30
	btrfs_kill_super+0x12/0x20 [btrfs]
	deactivate_locked_super+0x31/0x70
	cleanup_mnt+0x100/0x160
	task_work_run+0x68/0xb0
	exit_to_user_mode_prepare+0x1bb/0x1c0
	syscall_exit_to_user_mode+0x4b/0x260
	entry_SYSCALL_64_after_hwframe+0x44/0xa9
  kmem_cache_destroy btrfs_delayed_ref_head: Slab cache still has objects
  CPU: 5 PID: 1729921 Comm: rmmod Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
  Call Trace:
   dump_stack+0x8d/0xb5
   kmem_cache_destroy+0x119/0x120
   btrfs_delayed_ref_exit+0x11/0x35 [btrfs]
   exit_btrfs_fs+0xa/0x59 [btrfs]
   __x64_sys_delete_module+0x194/0x260
   ? fpregs_assert_state_consistent+0x1e/0x40
   ? exit_to_user_mode_prepare+0x55/0x1c0
   ? trace_hardirqs_on+0x1b/0xf0
   do_syscall_64+0x33/0x80
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x7f693e305897
  Code: 73 01 c3 48 8b 0d f9 f5 0b (...)
  RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
  RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897
  RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8
  RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000
  R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740
  R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760
  =============================================================================
  BUG btrfs_delayed_tree_ref (Tainted: G    B   W        ): Objects remaining in btrfs_delayed_tree_ref on __kmem_cache_shutdown()
  -----------------------------------------------------------------------------

  INFO: Slab 0x0000000011f78dc0 objects=37 used=2 fp=0x0000000032d55d91 flags=0x17fffc000010200
  CPU: 3 PID: 1729921 Comm: rmmod Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
  Call Trace:
   dump_stack+0x8d/0xb5
   slab_err+0xb7/0xdc
   ? lock_acquired+0x199/0x490
   __kmem_cache_shutdown+0x1ac/0x3c0
   ? lock_release+0x20e/0x4c0
   kmem_cache_destroy+0x55/0x120
   btrfs_delayed_ref_exit+0x1d/0x35 [btrfs]
   exit_btrfs_fs+0xa/0x59 [btrfs]
   __x64_sys_delete_module+0x194/0x260
   ? fpregs_assert_state_consistent+0x1e/0x40
   ? exit_to_user_mode_prepare+0x55/0x1c0
   ? trace_hardirqs_on+0x1b/0xf0
   do_syscall_64+0x33/0x80
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x7f693e305897
  Code: 73 01 c3 48 8b 0d f9 f5 (...)
  RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
  RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897
  RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8
  RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000
  R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740
  R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760
  INFO: Object 0x000000001a340018 @offset=4408
  INFO: Allocated in btrfs_add_delayed_tree_ref+0x9e/0x480 [btrfs] age=1917 cpu=6 pid=1729873
	__slab_alloc.isra.0+0x109/0x1c0
	kmem_cache_alloc+0x7bb/0x830
	btrfs_add_delayed_tree_ref+0x9e/0x480 [btrfs]
	btrfs_free_tree_block+0x128/0x360 [btrfs]
	__btrfs_cow_block+0x489/0x5f0 [btrfs]
	btrfs_cow_block+0xf7/0x220 [btrfs]
	btrfs_search_slot+0x62a/0xc40 [btrfs]
	btrfs_del_orphan_item+0x65/0xd0 [btrfs]
	btrfs_find_orphan_roots+0x1bf/0x200 [btrfs]
	open_ctree+0x125a/0x18a0 [btrfs]
	btrfs_mount_root.cold+0x13/0xed [btrfs]
	legacy_get_tree+0x30/0x60
	vfs_get_tree+0x28/0xe0
	fc_mount+0xe/0x40
	vfs_kern_mount.part.0+0x71/0x90
	btrfs_mount+0x13b/0x3e0 [btrfs]
  INFO: Freed in __btrfs_run_delayed_refs+0x63d/0x1290 [btrfs] age=4167 cpu=4 pid=1729795
	kmem_cache_free+0x34c/0x3c0
	__btrfs_run_delayed_refs+0x63d/0x1290 [btrfs]
	btrfs_run_delayed_refs+0x81/0x210 [btrfs]
	btrfs_commit_transaction+0x60/0xc40 [btrfs]
	create_subvol+0x56a/0x990 [btrfs]
	btrfs_mksubvol+0x3fb/0x4a0 [btrfs]
	__btrfs_ioctl_snap_create+0x119/0x1a0 [btrfs]
	btrfs_ioctl_snap_create+0x58/0x80 [btrfs]
	btrfs_ioctl+0x1a92/0x36f0 [btrfs]
	__x64_sys_ioctl+0x83/0xb0
	do_syscall_64+0x33/0x80
	entry_SYSCALL_64_after_hwframe+0x44/0xa9
  INFO: Object 0x000000002b46292a @offset=13648
  INFO: Allocated in btrfs_add_delayed_tree_ref+0x9e/0x480 [btrfs] age=1923 cpu=6 pid=1729873
	__slab_alloc.isra.0+0x109/0x1c0
	kmem_cache_alloc+0x7bb/0x830
	btrfs_add_delayed_tree_ref+0x9e/0x480 [btrfs]
	btrfs_alloc_tree_block+0x2bf/0x360 [btrfs]
	alloc_tree_block_no_bg_flush+0x4f/0x60 [btrfs]
	__btrfs_cow_block+0x12d/0x5f0 [btrfs]
	btrfs_cow_block+0xf7/0x220 [btrfs]
	btrfs_search_slot+0x62a/0xc40 [btrfs]
	btrfs_del_orphan_item+0x65/0xd0 [btrfs]
	btrfs_find_orphan_roots+0x1bf/0x200 [btrfs]
	open_ctree+0x125a/0x18a0 [btrfs]
	btrfs_mount_root.cold+0x13/0xed [btrfs]
	legacy_get_tree+0x30/0x60
	vfs_get_tree+0x28/0xe0
	fc_mount+0xe/0x40
	vfs_kern_mount.part.0+0x71/0x90
  INFO: Freed in __btrfs_run_delayed_refs+0x63d/0x1290 [btrfs] age=3164 cpu=6 pid=1729803
	kmem_cache_free+0x34c/0x3c0
	__btrfs_run_delayed_refs+0x63d/0x1290 [btrfs]
	btrfs_run_delayed_refs+0x81/0x210 [btrfs]
	commit_cowonly_roots+0xfb/0x300 [btrfs]
	btrfs_commit_transaction+0x367/0xc40 [btrfs]
	close_ctree+0x113/0x2fa [btrfs]
	generic_shutdown_super+0x6c/0x100
	kill_anon_super+0x14/0x30
	btrfs_kill_super+0x12/0x20 [btrfs]
	deactivate_locked_super+0x31/0x70
	cleanup_mnt+0x100/0x160
	task_work_run+0x68/0xb0
	exit_to_user_mode_prepare+0x1bb/0x1c0
	syscall_exit_to_user_mode+0x4b/0x260
	entry_SYSCALL_64_after_hwframe+0x44/0xa9
  kmem_cache_destroy btrfs_delayed_tree_ref: Slab cache still has objects
  CPU: 5 PID: 1729921 Comm: rmmod Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
  Call Trace:
   dump_stack+0x8d/0xb5
   kmem_cache_destroy+0x119/0x120
   btrfs_delayed_ref_exit+0x1d/0x35 [btrfs]
   exit_btrfs_fs+0xa/0x59 [btrfs]
   __x64_sys_delete_module+0x194/0x260
   ? fpregs_assert_state_consistent+0x1e/0x40
   ? exit_to_user_mode_prepare+0x55/0x1c0
   ? trace_hardirqs_on+0x1b/0xf0
   do_syscall_64+0x33/0x80
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x7f693e305897
  Code: 73 01 c3 48 8b 0d f9 f5 (...)
  RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
  RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897
  RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8
  RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000
  R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740
  R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760
  =============================================================================
  BUG btrfs_delayed_extent_op (Tainted: G    B   W        ): Objects remaining in btrfs_delayed_extent_op on __kmem_cache_shutdown()
  -----------------------------------------------------------------------------

  INFO: Slab 0x00000000f145ce2f objects=22 used=1 fp=0x00000000af0f92cf flags=0x17fffc000010200
  CPU: 5 PID: 1729921 Comm: rmmod Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
  Call Trace:
   dump_stack+0x8d/0xb5
   slab_err+0xb7/0xdc
   ? lock_acquired+0x199/0x490
   __kmem_cache_shutdown+0x1ac/0x3c0
   ? __mutex_unlock_slowpath+0x45/0x2a0
   kmem_cache_destroy+0x55/0x120
   exit_btrfs_fs+0xa/0x59 [btrfs]
   __x64_sys_delete_module+0x194/0x260
   ? fpregs_assert_state_consistent+0x1e/0x40
   ? exit_to_user_mode_prepare+0x55/0x1c0
   ? trace_hardirqs_on+0x1b/0xf0
   do_syscall_64+0x33/0x80
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x7f693e305897
  Code: 73 01 c3 48 8b 0d f9 f5 (...)
  RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
  RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897
  RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8
  RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000
  R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740
  R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760
  INFO: Object 0x000000004cf95ea8 @offset=6264
  INFO: Allocated in btrfs_alloc_tree_block+0x1e0/0x360 [btrfs] age=1931 cpu=6 pid=1729873
	__slab_alloc.isra.0+0x109/0x1c0
	kmem_cache_alloc+0x7bb/0x830
	btrfs_alloc_tree_block+0x1e0/0x360 [btrfs]
	alloc_tree_block_no_bg_flush+0x4f/0x60 [btrfs]
	__btrfs_cow_block+0x12d/0x5f0 [btrfs]
	btrfs_cow_block+0xf7/0x220 [btrfs]
	btrfs_search_slot+0x62a/0xc40 [btrfs]
	btrfs_del_orphan_item+0x65/0xd0 [btrfs]
	btrfs_find_orphan_roots+0x1bf/0x200 [btrfs]
	open_ctree+0x125a/0x18a0 [btrfs]
	btrfs_mount_root.cold+0x13/0xed [btrfs]
	legacy_get_tree+0x30/0x60
	vfs_get_tree+0x28/0xe0
	fc_mount+0xe/0x40
	vfs_kern_mount.part.0+0x71/0x90
	btrfs_mount+0x13b/0x3e0 [btrfs]
  INFO: Freed in __btrfs_run_delayed_refs+0xabd/0x1290 [btrfs] age=3173 cpu=6 pid=1729803
	kmem_cache_free+0x34c/0x3c0
	__btrfs_run_delayed_refs+0xabd/0x1290 [btrfs]
	btrfs_run_delayed_refs+0x81/0x210 [btrfs]
	commit_cowonly_roots+0xfb/0x300 [btrfs]
	btrfs_commit_transaction+0x367/0xc40 [btrfs]
	close_ctree+0x113/0x2fa [btrfs]
	generic_shutdown_super+0x6c/0x100
	kill_anon_super+0x14/0x30
	btrfs_kill_super+0x12/0x20 [btrfs]
	deactivate_locked_super+0x31/0x70
	cleanup_mnt+0x100/0x160
	task_work_run+0x68/0xb0
	exit_to_user_mode_prepare+0x1bb/0x1c0
	syscall_exit_to_user_mode+0x4b/0x260
	entry_SYSCALL_64_after_hwframe+0x44/0xa9
  kmem_cache_destroy btrfs_delayed_extent_op: Slab cache still has objects
  CPU: 3 PID: 1729921 Comm: rmmod Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
  Call Trace:
   dump_stack+0x8d/0xb5
   kmem_cache_destroy+0x119/0x120
   exit_btrfs_fs+0xa/0x59 [btrfs]
   __x64_sys_delete_module+0x194/0x260
   ? fpregs_assert_state_consistent+0x1e/0x40
   ? exit_to_user_mode_prepare+0x55/0x1c0
   ? trace_hardirqs_on+0x1b/0xf0
   do_syscall_64+0x33/0x80
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x7f693e305897
  Code: 73 01 c3 48 8b 0d f9 (...)
  RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
  RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897
  RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8
  RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000
  R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740
  R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760
  BTRFS: state leak: start 30408704 end 30425087 state 1 in tree 1 refs 1

Fix this issue by having the remount path stop the qgroup rescan worker
when we are remounting RO, and by teaching the rescan worker to stop when
a remount is in progress. If a remount in RW mode happens later, we
already resume the qgroup rescan worker through the call to
btrfs_qgroup_rescan_resume(), so we do not need to worry about that.
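
For illustration only, a minimal sketch of that idea (not the actual
patch; the state flag and the exact placement are assumptions based on
the description above):

  /* In the rescan worker loop: bail out if a remount is in progress. */
  if (test_bit(BTRFS_FS_STATE_REMOUNTING, &fs_info->fs_state)) {
          ret = -EINTR;
          break;
  }

  /* In the remount path, before going read-only: wait for the worker. */
  btrfs_qgroup_wait_for_completion(fs_info, false);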

Tested-by: Fabian Vogt <fvogt@suse.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-01-19 18:27:24 +01:00
Qu Wenruo
1888e5df84 btrfs: qgroup: don't try to wait flushing if we're already holding a transaction
commit ae5e070eaca9dbebde3459dd8f4c2756f8c097d0 upstream.

There is a chance of racing for qgroup flushing which may lead to
deadlock:

	Thread A		|	Thread B
   (not holding trans handle)	|  (holding a trans handle)
--------------------------------+--------------------------------
__btrfs_qgroup_reserve_meta()   | __btrfs_qgroup_reserve_meta()
|- try_flush_qgroup()		| |- try_flush_qgroup()
   |- QGROUP_FLUSHING bit set   |    |
   |				|    |- test_and_set_bit()
   |				|    |- wait_event()
   |- btrfs_join_transaction()	|
   |- btrfs_commit_transaction()|

			!!! DEAD LOCK !!!

Thread A wants to commit the transaction, but thread B is holding a
transaction handle, which blocks the commit.
At the same time, thread B is waiting for thread A to finish its commit.

This is just a hot fix, and would lead to more EDQUOT when we're near
the qgroup limit.

The proper fix would be to make all metadata/data reservations happen
without holding a transaction handle.

CC: stable@vger.kernel.org # 5.9+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-12 20:18:24 +01:00
Filipe Manana
a855fbe692 btrfs: fix lockdep splat when enabling and disabling qgroups
When running test case btrfs/017 from fstests, lockdep reported the
following splat:

  [ 1297.067385] ======================================================
  [ 1297.067708] WARNING: possible circular locking dependency detected
  [ 1297.068022] 5.10.0-rc4-btrfs-next-73 #1 Not tainted
  [ 1297.068322] ------------------------------------------------------
  [ 1297.068629] btrfs/189080 is trying to acquire lock:
  [ 1297.068929] ffff9f2725731690 (sb_internal#2){.+.+}-{0:0}, at: btrfs_quota_enable+0xaf/0xa70 [btrfs]
  [ 1297.069274]
		 but task is already holding lock:
  [ 1297.069868] ffff9f2702b61a08 (&fs_info->qgroup_ioctl_lock){+.+.}-{3:3}, at: btrfs_quota_enable+0x3b/0xa70 [btrfs]
  [ 1297.070219]
		 which lock already depends on the new lock.

  [ 1297.071131]
		 the existing dependency chain (in reverse order) is:
  [ 1297.071721]
		 -> #1 (&fs_info->qgroup_ioctl_lock){+.+.}-{3:3}:
  [ 1297.072375]        lock_acquire+0xd8/0x490
  [ 1297.072710]        __mutex_lock+0xa3/0xb30
  [ 1297.073061]        btrfs_qgroup_inherit+0x59/0x6a0 [btrfs]
  [ 1297.073421]        create_subvol+0x194/0x990 [btrfs]
  [ 1297.073780]        btrfs_mksubvol+0x3fb/0x4a0 [btrfs]
  [ 1297.074133]        __btrfs_ioctl_snap_create+0x119/0x1a0 [btrfs]
  [ 1297.074498]        btrfs_ioctl_snap_create+0x58/0x80 [btrfs]
  [ 1297.074872]        btrfs_ioctl+0x1a90/0x36f0 [btrfs]
  [ 1297.075245]        __x64_sys_ioctl+0x83/0xb0
  [ 1297.075617]        do_syscall_64+0x33/0x80
  [ 1297.075993]        entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [ 1297.076380]
		 -> #0 (sb_internal#2){.+.+}-{0:0}:
  [ 1297.077166]        check_prev_add+0x91/0xc60
  [ 1297.077572]        __lock_acquire+0x1740/0x3110
  [ 1297.077984]        lock_acquire+0xd8/0x490
  [ 1297.078411]        start_transaction+0x3c5/0x760 [btrfs]
  [ 1297.078853]        btrfs_quota_enable+0xaf/0xa70 [btrfs]
  [ 1297.079323]        btrfs_ioctl+0x2c60/0x36f0 [btrfs]
  [ 1297.079789]        __x64_sys_ioctl+0x83/0xb0
  [ 1297.080232]        do_syscall_64+0x33/0x80
  [ 1297.080680]        entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [ 1297.081139]
		 other info that might help us debug this:

  [ 1297.082536]  Possible unsafe locking scenario:

  [ 1297.083510]        CPU0                    CPU1
  [ 1297.084005]        ----                    ----
  [ 1297.084500]   lock(&fs_info->qgroup_ioctl_lock);
  [ 1297.084994]                                lock(sb_internal#2);
  [ 1297.085485]                                lock(&fs_info->qgroup_ioctl_lock);
  [ 1297.085974]   lock(sb_internal#2);
  [ 1297.086454]
		  *** DEADLOCK ***
  [ 1297.087880] 3 locks held by btrfs/189080:
  [ 1297.088324]  #0: ffff9f2725731470 (sb_writers#14){.+.+}-{0:0}, at: btrfs_ioctl+0xa73/0x36f0 [btrfs]
  [ 1297.088799]  #1: ffff9f2702b60cc0 (&fs_info->subvol_sem){++++}-{3:3}, at: btrfs_ioctl+0x1f4d/0x36f0 [btrfs]
  [ 1297.089284]  #2: ffff9f2702b61a08 (&fs_info->qgroup_ioctl_lock){+.+.}-{3:3}, at: btrfs_quota_enable+0x3b/0xa70 [btrfs]
  [ 1297.089771]
		 stack backtrace:
  [ 1297.090662] CPU: 5 PID: 189080 Comm: btrfs Not tainted 5.10.0-rc4-btrfs-next-73 #1
  [ 1297.091132] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
  [ 1297.092123] Call Trace:
  [ 1297.092629]  dump_stack+0x8d/0xb5
  [ 1297.093115]  check_noncircular+0xff/0x110
  [ 1297.093596]  check_prev_add+0x91/0xc60
  [ 1297.094076]  ? kvm_clock_read+0x14/0x30
  [ 1297.094553]  ? kvm_sched_clock_read+0x5/0x10
  [ 1297.095029]  __lock_acquire+0x1740/0x3110
  [ 1297.095510]  lock_acquire+0xd8/0x490
  [ 1297.095993]  ? btrfs_quota_enable+0xaf/0xa70 [btrfs]
  [ 1297.096476]  start_transaction+0x3c5/0x760 [btrfs]
  [ 1297.096962]  ? btrfs_quota_enable+0xaf/0xa70 [btrfs]
  [ 1297.097451]  btrfs_quota_enable+0xaf/0xa70 [btrfs]
  [ 1297.097941]  ? btrfs_ioctl+0x1f4d/0x36f0 [btrfs]
  [ 1297.098429]  btrfs_ioctl+0x2c60/0x36f0 [btrfs]
  [ 1297.098904]  ? do_user_addr_fault+0x20c/0x430
  [ 1297.099382]  ? kvm_clock_read+0x14/0x30
  [ 1297.099854]  ? kvm_sched_clock_read+0x5/0x10
  [ 1297.100328]  ? sched_clock+0x5/0x10
  [ 1297.100801]  ? sched_clock_cpu+0x12/0x180
  [ 1297.101272]  ? __x64_sys_ioctl+0x83/0xb0
  [ 1297.101739]  __x64_sys_ioctl+0x83/0xb0
  [ 1297.102207]  do_syscall_64+0x33/0x80
  [ 1297.102673]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [ 1297.103148] RIP: 0033:0x7f773ff65d87

This is because during the quota enable ioctl we first lock the mutex
qgroup_ioctl_lock and then start a transaction, and starting a transaction
acquires a fs freeze semaphore (at the VFS level). However, in every other
code path, except for the quota disable ioctl path, we do the opposite:
we start a transaction and then lock the mutex.

So fix this by making the quota enable and disable paths start the
transaction without having the mutex locked, and then, after starting the
transaction, lock the mutex and check if some other task already enabled
or disabled the quotas, bailing with success if that was the case.
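
A rough sketch of the reordered locking in the quota enable path
(illustrative only, error handling trimmed; the early-bail check is
based on the description above):

  /* Start the transaction first, without qgroup_ioctl_lock held. */
  trans = btrfs_start_transaction(fs_info->tree_root, 2);
  if (IS_ERR(trans))
          return PTR_ERR(trans);

  mutex_lock(&fs_info->qgroup_ioctl_lock);
  if (fs_info->quota_root) {
          /* Some other task already enabled quotas, nothing to do. */
          ret = 0;
          goto out;
  }
  /* ... create the quota root and add the qgroup items ... */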

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-11-23 21:16:43 +01:00
Filipe Manana
7aa6d35984 btrfs: do nofs allocations when adding and removing qgroup relations
When adding or removing a qgroup relation we do a GFP_KERNEL allocation,
which is not safe because we are holding a transaction handle open, and
that can make us deadlock if the allocator needs to recurse into the
filesystem. So just surround those calls with a nofs context.
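
Conceptually the change wraps the allocation in a NOFS scope, roughly
like this (illustrative fragment; the allocation shown is just an
example):

  unsigned int nofs_flag;

  /*
   * We hold a transaction handle, so memory reclaim must not recurse
   * into the filesystem while we allocate.
   */
  nofs_flag = memalloc_nofs_save();
  prealloc = kzalloc(sizeof(*prealloc), GFP_KERNEL);
  memalloc_nofs_restore(nofs_flag);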

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-11-23 21:16:40 +01:00
Filipe Manana
3d05cad3c3 btrfs: fix lockdep splat when reading qgroup config on mount
Lockdep reported the following splat when running test btrfs/190 from
fstests:

  [ 9482.126098] ======================================================
  [ 9482.126184] WARNING: possible circular locking dependency detected
  [ 9482.126281] 5.10.0-rc4-btrfs-next-73 #1 Not tainted
  [ 9482.126365] ------------------------------------------------------
  [ 9482.126456] mount/24187 is trying to acquire lock:
  [ 9482.126534] ffffa0c869a7dac0 (&fs_info->qgroup_rescan_lock){+.+.}-{3:3}, at: qgroup_rescan_init+0x43/0xf0 [btrfs]
  [ 9482.126647]
		 but task is already holding lock:
  [ 9482.126777] ffffa0c892ebd3a0 (btrfs-quota-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x27/0x120 [btrfs]
  [ 9482.126886]
		 which lock already depends on the new lock.

  [ 9482.127078]
		 the existing dependency chain (in reverse order) is:
  [ 9482.127213]
		 -> #1 (btrfs-quota-00){++++}-{3:3}:
  [ 9482.127366]        lock_acquire+0xd8/0x490
  [ 9482.127436]        down_read_nested+0x45/0x220
  [ 9482.127528]        __btrfs_tree_read_lock+0x27/0x120 [btrfs]
  [ 9482.127613]        btrfs_read_lock_root_node+0x41/0x130 [btrfs]
  [ 9482.127702]        btrfs_search_slot+0x514/0xc30 [btrfs]
  [ 9482.127788]        update_qgroup_status_item+0x72/0x140 [btrfs]
  [ 9482.127877]        btrfs_qgroup_rescan_worker+0xde/0x680 [btrfs]
  [ 9482.127964]        btrfs_work_helper+0xf1/0x600 [btrfs]
  [ 9482.128039]        process_one_work+0x24e/0x5e0
  [ 9482.128110]        worker_thread+0x50/0x3b0
  [ 9482.128181]        kthread+0x153/0x170
  [ 9482.128256]        ret_from_fork+0x22/0x30
  [ 9482.128327]
		 -> #0 (&fs_info->qgroup_rescan_lock){+.+.}-{3:3}:
  [ 9482.128464]        check_prev_add+0x91/0xc60
  [ 9482.128551]        __lock_acquire+0x1740/0x3110
  [ 9482.128623]        lock_acquire+0xd8/0x490
  [ 9482.130029]        __mutex_lock+0xa3/0xb30
  [ 9482.130590]        qgroup_rescan_init+0x43/0xf0 [btrfs]
  [ 9482.131577]        btrfs_read_qgroup_config+0x43a/0x550 [btrfs]
  [ 9482.132175]        open_ctree+0x1228/0x18a0 [btrfs]
  [ 9482.132756]        btrfs_mount_root.cold+0x13/0xed [btrfs]
  [ 9482.133325]        legacy_get_tree+0x30/0x60
  [ 9482.133866]        vfs_get_tree+0x28/0xe0
  [ 9482.134392]        fc_mount+0xe/0x40
  [ 9482.134908]        vfs_kern_mount.part.0+0x71/0x90
  [ 9482.135428]        btrfs_mount+0x13b/0x3e0 [btrfs]
  [ 9482.135942]        legacy_get_tree+0x30/0x60
  [ 9482.136444]        vfs_get_tree+0x28/0xe0
  [ 9482.136949]        path_mount+0x2d7/0xa70
  [ 9482.137438]        do_mount+0x75/0x90
  [ 9482.137923]        __x64_sys_mount+0x8e/0xd0
  [ 9482.138400]        do_syscall_64+0x33/0x80
  [ 9482.138873]        entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [ 9482.139346]
		 other info that might help us debug this:

  [ 9482.140735]  Possible unsafe locking scenario:

  [ 9482.141594]        CPU0                    CPU1
  [ 9482.142011]        ----                    ----
  [ 9482.142411]   lock(btrfs-quota-00);
  [ 9482.142806]                                lock(&fs_info->qgroup_rescan_lock);
  [ 9482.143216]                                lock(btrfs-quota-00);
  [ 9482.143629]   lock(&fs_info->qgroup_rescan_lock);
  [ 9482.144056]
		  *** DEADLOCK ***

  [ 9482.145242] 2 locks held by mount/24187:
  [ 9482.145637]  #0: ffffa0c8411c40e8 (&type->s_umount_key#44/1){+.+.}-{3:3}, at: alloc_super+0xb9/0x400
  [ 9482.146061]  #1: ffffa0c892ebd3a0 (btrfs-quota-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x27/0x120 [btrfs]
  [ 9482.146509]
		 stack backtrace:
  [ 9482.147350] CPU: 1 PID: 24187 Comm: mount Not tainted 5.10.0-rc4-btrfs-next-73 #1
  [ 9482.147788] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
  [ 9482.148709] Call Trace:
  [ 9482.149169]  dump_stack+0x8d/0xb5
  [ 9482.149628]  check_noncircular+0xff/0x110
  [ 9482.150090]  check_prev_add+0x91/0xc60
  [ 9482.150561]  ? kvm_clock_read+0x14/0x30
  [ 9482.151017]  ? kvm_sched_clock_read+0x5/0x10
  [ 9482.151470]  __lock_acquire+0x1740/0x3110
  [ 9482.151941]  ? __btrfs_tree_read_lock+0x27/0x120 [btrfs]
  [ 9482.152402]  lock_acquire+0xd8/0x490
  [ 9482.152887]  ? qgroup_rescan_init+0x43/0xf0 [btrfs]
  [ 9482.153354]  __mutex_lock+0xa3/0xb30
  [ 9482.153826]  ? qgroup_rescan_init+0x43/0xf0 [btrfs]
  [ 9482.154301]  ? qgroup_rescan_init+0x43/0xf0 [btrfs]
  [ 9482.154768]  ? qgroup_rescan_init+0x43/0xf0 [btrfs]
  [ 9482.155226]  qgroup_rescan_init+0x43/0xf0 [btrfs]
  [ 9482.155690]  btrfs_read_qgroup_config+0x43a/0x550 [btrfs]
  [ 9482.156160]  open_ctree+0x1228/0x18a0 [btrfs]
  [ 9482.156643]  btrfs_mount_root.cold+0x13/0xed [btrfs]
  [ 9482.157108]  ? rcu_read_lock_sched_held+0x5d/0x90
  [ 9482.157567]  ? kfree+0x31f/0x3e0
  [ 9482.158030]  legacy_get_tree+0x30/0x60
  [ 9482.158489]  vfs_get_tree+0x28/0xe0
  [ 9482.158947]  fc_mount+0xe/0x40
  [ 9482.159403]  vfs_kern_mount.part.0+0x71/0x90
  [ 9482.159875]  btrfs_mount+0x13b/0x3e0 [btrfs]
  [ 9482.160335]  ? rcu_read_lock_sched_held+0x5d/0x90
  [ 9482.160805]  ? kfree+0x31f/0x3e0
  [ 9482.161260]  ? legacy_get_tree+0x30/0x60
  [ 9482.161714]  legacy_get_tree+0x30/0x60
  [ 9482.162166]  vfs_get_tree+0x28/0xe0
  [ 9482.162616]  path_mount+0x2d7/0xa70
  [ 9482.163070]  do_mount+0x75/0x90
  [ 9482.163525]  __x64_sys_mount+0x8e/0xd0
  [ 9482.163986]  do_syscall_64+0x33/0x80
  [ 9482.164437]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [ 9482.164902] RIP: 0033:0x7f51e907caaa

This happens because at btrfs_read_qgroup_config() we can call
qgroup_rescan_init() while holding a read lock on a quota btree leaf,
acquired by the previous call to btrfs_search_slot_for_read(), and
qgroup_rescan_init() acquires the mutex qgroup_rescan_lock.

A qgroup rescan worker does the opposite: it acquires the mutex
qgroup_rescan_lock, at btrfs_qgroup_rescan_worker(), and then tries to
update the qgroup status item in the quota btree through the call to
update_qgroup_status_item(). This inversion of locking order
between the qgroup_rescan_lock mutex and quota btree locks causes the
splat.

Fix this simply by releasing and freeing the path before calling
qgroup_rescan_init() at btrfs_read_qgroup_config().
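
In other words, the fix boils down to something like the following
(illustrative; variable names simplified):

  /*
   * Done with the quota tree.  Drop the leaf locks before taking the
   * qgroup_rescan_lock mutex inside qgroup_rescan_init().
   */
  btrfs_free_path(path);
  path = NULL;

  ret = qgroup_rescan_init(fs_info, rescan_progress, 0);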

CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-11-23 21:16:30 +01:00
Qu Wenruo
6f23277a49 btrfs: qgroup: don't commit transaction when we already hold the handle
[BUG]
When running the following script, btrfs will trigger an ASSERT():

  #/bin/bash
  mkfs.btrfs -f $dev
  mount $dev $mnt
  xfs_io -f -c "pwrite 0 1G" $mnt/file
  sync
  btrfs quota enable $mnt
  btrfs quota rescan -w $mnt

  # Manually set the limit below current usage
  btrfs qgroup limit 512M $mnt $mnt

  # Crash happens
  touch $mnt/file

The dmesg looks like this:

  assertion failed: refcount_read(&trans->use_count) == 1, in fs/btrfs/transaction.c:2022
  ------------[ cut here ]------------
  kernel BUG at fs/btrfs/ctree.h:3230!
  invalid opcode: 0000 [#1] SMP PTI
  RIP: 0010:assertfail.constprop.0+0x18/0x1a [btrfs]
   btrfs_commit_transaction.cold+0x11/0x5d [btrfs]
   try_flush_qgroup+0x67/0x100 [btrfs]
   __btrfs_qgroup_reserve_meta+0x3a/0x60 [btrfs]
   btrfs_delayed_update_inode+0xaa/0x350 [btrfs]
   btrfs_update_inode+0x9d/0x110 [btrfs]
   btrfs_dirty_inode+0x5d/0xd0 [btrfs]
   touch_atime+0xb5/0x100
   iterate_dir+0xf1/0x1b0
   __x64_sys_getdents64+0x78/0x110
   do_syscall_64+0x33/0x80
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x7fb5afe588db

[CAUSE]
In try_flush_qgroup(), we assume we don't hold a transaction handle at
all.  This is true for data reservation and mostly true for metadata:
data space reservation always happens before we start a transaction, and
for most metadata operations we reserve space in start_transaction().

But there is an exception, btrfs_delayed_inode_reserve_metadata().
It holds a transaction handle, while still trying to reserve extra
metadata space.

When we hit EDQUOT inside btrfs_delayed_inode_reserve_metadata(), we
will join the current transaction and commit it, while we still hold a
transaction handle from before we entered the qgroup code.

[FIX]
Let's check current->journal before we join the transaction.

If current->journal is unset or BTRFS_SEND_TRANS_STUB, it means we are
not holding a transaction, and thus are able to join and then commit the
transaction.

If current->journal is a valid transaction handle, we avoid committing
the transaction and just end it.

This is less effective than committing the current transaction, as it
won't free metadata reserved space, but we may still free some data
space before new data writes.
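
A simplified sketch of that check (the field is current->journal_info in
the code; the exact surrounding structure may differ):

  /*
   * If we already hold a real transaction handle (not NULL and not the
   * send stub), committing here would deadlock, so just end our handle.
   */
  if (current->journal_info &&
      current->journal_info != BTRFS_SEND_TRANS_STUB)
          ret = btrfs_end_transaction(trans);
  else
          ret = btrfs_commit_transaction(trans);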

Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1178634
Fixes: c53e9653605d ("btrfs: qgroup: try to flush qgroup space when we get -EDQUOT")
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-11-13 22:17:57 +01:00
Dan Carpenter
f07728d541 btrfs: clean up NULL checks in qgroup_unreserve_range()
Smatch complains that this code dereferences "entry" before checking
whether it's NULL on the next line.  Fortunately, rb_entry() will never
return NULL so it doesn't cause a problem.  We can clean up the NULL
checking a bit to silence the warning and make the code more clear.
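
The general pattern is to test the rb_node pointer before converting it
with rb_entry(), for example (illustrative; struct my_entry is made up):

  struct rb_node *node = rb_first(root);
  struct my_entry *entry;

  while (node) {
          /*
           * rb_entry() is container_of() and never returns NULL, so the
           * NULL check belongs on 'node', not on 'entry'.
           */
          entry = rb_entry(node, struct my_entry, rb_node);
          /* ... use entry ... */
          node = rb_next(node);
  }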

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-11-05 13:02:20 +01:00
Josef Bacik
5223cc60b4 btrfs: drop the path before adding qgroup items when enabling qgroups
When enabling qgroups we walk the tree_root and then add a qgroup item
for every root that we have.  This creates a lock dependency on the
tree_root and qgroup_root, which results in the following lockdep splat
(with tree locks using rwsem), eg. in tests btrfs/017 or btrfs/022:

  ======================================================
  WARNING: possible circular locking dependency detected
  5.9.0-default+ #1299 Not tainted
  ------------------------------------------------------
  btrfs/24552 is trying to acquire lock:
  ffff9142dfc5f630 (btrfs-quota-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]

  but task is already holding lock:
  ffff9142dfc5d0b0 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:

  -> #1 (btrfs-root-00){++++}-{3:3}:
	 __lock_acquire+0x3fb/0x730
	 lock_acquire.part.0+0x6a/0x130
	 down_read_nested+0x46/0x130
	 __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
	 __btrfs_read_lock_root_node+0x3a/0x50 [btrfs]
	 btrfs_search_slot_get_root+0x11d/0x290 [btrfs]
	 btrfs_search_slot+0xc3/0x9f0 [btrfs]
	 btrfs_insert_item+0x6e/0x140 [btrfs]
	 btrfs_create_tree+0x1cb/0x240 [btrfs]
	 btrfs_quota_enable+0xcd/0x790 [btrfs]
	 btrfs_ioctl_quota_ctl+0xc9/0xe0 [btrfs]
	 __x64_sys_ioctl+0x83/0xa0
	 do_syscall_64+0x2d/0x70
	 entry_SYSCALL_64_after_hwframe+0x44/0xa9

  -> #0 (btrfs-quota-00){++++}-{3:3}:
	 check_prev_add+0x91/0xc30
	 validate_chain+0x491/0x750
	 __lock_acquire+0x3fb/0x730
	 lock_acquire.part.0+0x6a/0x130
	 down_read_nested+0x46/0x130
	 __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
	 __btrfs_read_lock_root_node+0x3a/0x50 [btrfs]
	 btrfs_search_slot_get_root+0x11d/0x290 [btrfs]
	 btrfs_search_slot+0xc3/0x9f0 [btrfs]
	 btrfs_insert_empty_items+0x58/0xa0 [btrfs]
	 add_qgroup_item.part.0+0x72/0x210 [btrfs]
	 btrfs_quota_enable+0x3bb/0x790 [btrfs]
	 btrfs_ioctl_quota_ctl+0xc9/0xe0 [btrfs]
	 __x64_sys_ioctl+0x83/0xa0
	 do_syscall_64+0x2d/0x70
	 entry_SYSCALL_64_after_hwframe+0x44/0xa9

  other info that might help us debug this:

   Possible unsafe locking scenario:

	 CPU0                    CPU1
	 ----                    ----
    lock(btrfs-root-00);
				 lock(btrfs-quota-00);
				 lock(btrfs-root-00);
    lock(btrfs-quota-00);

   *** DEADLOCK ***

  5 locks held by btrfs/24552:
   #0: ffff9142df431478 (sb_writers#10){.+.+}-{0:0}, at: mnt_want_write_file+0x22/0xa0
   #1: ffff9142f9b10cc0 (&fs_info->subvol_sem){++++}-{3:3}, at: btrfs_ioctl_quota_ctl+0x7b/0xe0 [btrfs]
   #2: ffff9142f9b11a08 (&fs_info->qgroup_ioctl_lock){+.+.}-{3:3}, at: btrfs_quota_enable+0x3b/0x790 [btrfs]
   #3: ffff9142df431698 (sb_internal#2){.+.+}-{0:0}, at: start_transaction+0x406/0x510 [btrfs]
   #4: ffff9142dfc5d0b0 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]

  stack backtrace:
  CPU: 1 PID: 24552 Comm: btrfs Not tainted 5.9.0-default+ #1299
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
  Call Trace:
   dump_stack+0x77/0x97
   check_noncircular+0xf3/0x110
   check_prev_add+0x91/0xc30
   validate_chain+0x491/0x750
   __lock_acquire+0x3fb/0x730
   lock_acquire.part.0+0x6a/0x130
   ? __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
   ? lock_acquire+0xc4/0x140
   ? __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
   down_read_nested+0x46/0x130
   ? __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
   __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
   ? btrfs_root_node+0xd9/0x200 [btrfs]
   __btrfs_read_lock_root_node+0x3a/0x50 [btrfs]
   btrfs_search_slot_get_root+0x11d/0x290 [btrfs]
   btrfs_search_slot+0xc3/0x9f0 [btrfs]
   btrfs_insert_empty_items+0x58/0xa0 [btrfs]
   add_qgroup_item.part.0+0x72/0x210 [btrfs]
   btrfs_quota_enable+0x3bb/0x790 [btrfs]
   btrfs_ioctl_quota_ctl+0xc9/0xe0 [btrfs]
   __x64_sys_ioctl+0x83/0xa0
   do_syscall_64+0x2d/0x70
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fix this by dropping the path whenever we find a root item, adding the
qgroup item, and then re-looking up the root item we found and continuing
to process roots.
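
Roughly, the loop over the tree root becomes (an illustrative sketch,
simplified from the description above):

  /* Found a root to add a qgroup item for; remember its key. */
  btrfs_item_key_to_cpu(leaf, &found_key, slot);

  /* Drop the tree_root locks before touching the quota root. */
  btrfs_release_path(path);

  ret = add_qgroup_item(trans, quota_root, found_key.offset);
  if (ret)
          goto out;

  /* Re-lookup the root item we found and continue from there. */
  ret = btrfs_search_slot_for_read(tree_root, &found_key, path, 1, 0);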

Reported-by: David Sterba <dsterba@suse.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-26 15:04:57 +01:00
Randy Dunlap
260db43cd2 btrfs: delete duplicated words + other fixes in comments
Delete repeated words in fs/btrfs/.
{to, the, a, and old}
and change "into 2 part" to "into 2 parts".

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07 12:06:50 +02:00
Qu Wenruo
adca4d945c btrfs: qgroup: remove ASYNC_COMMIT mechanism in favor of reserve retry-after-EDQUOT
commit a514d63882c3 ("btrfs: qgroup: Commit transaction in advance to
reduce early EDQUOT") tries to reduce the early EDQUOT problems by
checking the qgroup free space against a threshold and trying to wake up
the commit kthread to free some space.

The problem with that mechanism is that it can only free qgroup per-trans
metadata space; it can't do anything for data, nor for prealloc qgroup
space.

Now that we have the ability to flush qgroup space and have implemented
the retry-after-EDQUOT behavior, that mechanism can be completely
replaced.

So this patch will cleanup such mechanism in favor of
retry-after-EDQUOT.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-07-27 12:55:43 +02:00
Qu Wenruo
c53e965360 btrfs: qgroup: try to flush qgroup space when we get -EDQUOT
[PROBLEM]
There are known problems related to how btrfs handles qgroup reserved
space.  One of the most obvious cases is the test case btrfs/153, which
does fallocate, then writes into the preallocated range.

  btrfs/153 1s ... - output mismatch (see xfstests-dev/results//btrfs/153.out.bad)
      --- tests/btrfs/153.out     2019-10-22 15:18:14.068965341 +0800
      +++ xfstests-dev/results//btrfs/153.out.bad      2020-07-01 20:24:40.730000089 +0800
      @@ -1,2 +1,5 @@
       QA output created by 153
      +pwrite: Disk quota exceeded
      +/mnt/scratch/testfile2: Disk quota exceeded
      +/mnt/scratch/testfile2: Disk quota exceeded
       Silence is golden
      ...
      (Run 'diff -u xfstests-dev/tests/btrfs/153.out xfstests-dev/results//btrfs/153.out.bad'  to see the entire diff)

[CAUSE]
Since commit c6887cd11149 ("Btrfs: don't do nocow check unless we have to"),
we always reserve space no matter if it's COW or not.

Such behavior change is mostly for performance, and reverting it is not
a good idea anyway.

For a preallocated extent, we already reserve qgroup data space for it,
and since we also reserve data space for qgroup at buffered write time,
it needs twice the space for us to write into the preallocated space.

This leads to the -EDQUOT in the buffered write routine.

And we can't follow the same solution: unlike the data/metadata space
check, qgroup reserved space is shared between data and metadata.
The EDQUOT can happen at the metadata reservation, so doing the NODATACOW
check after a qgroup reservation failure is not a solution.

[FIX]
To solve the problem, we don't return -EDQUOT directly, but every time
we get a -EDQUOT, we try to flush qgroup space:

- Flush all inodes of the root
  NODATACOW writes will free the qgroup reserved space at
  run_delalloc_range().  However we don't have the infrastructure to only
  flush NODATACOW inodes, so here we flush all inodes anyway.

- Wait for ordered extents
  This would convert the preallocated metadata space into per-trans
  metadata, which can be freed in later transaction commit.

- Commit transaction
  This will free all per-trans metadata space.

Also we don't want to trigger flush multiple times, so here we introduce
a per-root wait list and a new root status, to ensure only one thread
starts the flushing.
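
Putting the pieces above together, the flushing logic looks roughly like
the sketch below (simplified, error handling trimmed; the bit and wait
queue names follow the description and are assumptions):

  static int try_flush_qgroup(struct btrfs_root *root)
  {
          struct btrfs_trans_handle *trans;
          int ret;

          /* Only one flusher at a time, everyone else waits for it. */
          if (test_and_set_bit(BTRFS_ROOT_QGROUP_FLUSHING, &root->state)) {
                  wait_event(root->qgroup_flush_wait,
                             !test_bit(BTRFS_ROOT_QGROUP_FLUSHING, &root->state));
                  return 0;
          }

          /* Flush all inodes of the root. */
          ret = btrfs_start_delalloc_snapshot(root);
          if (ret < 0)
                  goto out;
          /* Wait for ordered extents. */
          btrfs_wait_ordered_extents(root, U64_MAX, 0, (u64)-1);

          /* Commit the transaction to free all per-trans metadata space. */
          trans = btrfs_join_transaction(root);
          if (IS_ERR(trans)) {
                  ret = PTR_ERR(trans);
                  goto out;
          }
          ret = btrfs_commit_transaction(trans);
  out:
          clear_bit(BTRFS_ROOT_QGROUP_FLUSHING, &root->state);
          wake_up(&root->qgroup_flush_wait);
          return ret;
  }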

Fixes: c6887cd11149 ("Btrfs: don't do nocow check unless we have to")
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-07-27 12:55:42 +02:00
Qu Wenruo
263da812e8 btrfs: qgroup: allow to unreserve range without releasing other ranges
[PROBLEM]
Before this patch, when btrfs_qgroup_reserve_data() fails, we free all
reserved space of the changeset.

For example:
	ret = btrfs_qgroup_reserve_data(inode, changeset, 0, SZ_1M);
	ret = btrfs_qgroup_reserve_data(inode, changeset, SZ_1M, SZ_1M);
	ret = btrfs_qgroup_reserve_data(inode, changeset, SZ_2M, SZ_1M);

If the last btrfs_qgroup_reserve_data() failed, it will release the
entire [0, 3M) range.

This behavior is kind of OK for now, as when we hit -EDQUOT, we normally
go error handling and need to release all reserved ranges anyway.

But this also means the following call is not possible:

	ret = btrfs_qgroup_reserve_data();
	if (ret == -EDQUOT) {
		/* Do something to free some qgroup space */
		ret = btrfs_qgroup_reserve_data();
	}

Because if the first btrfs_qgroup_reserve_data() fails, it will free all
reserved qgroup space.

[CAUSE]
This is because we release all reserved ranges when
btrfs_qgroup_reserve_data() fails.

[FIX]
This patch implements a new function, qgroup_unreserve_range(), which
iterates through the ulist nodes, finds any nodes in the failed range,
removes the EXTENT_QGROUP_RESERVED bits from the io_tree, and decreases
extent_changeset::bytes_changed, so that we can revert to the previous
state.

This allows later patches to retry btrfs_qgroup_reserve_data() if EDQUOT
happens.
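
A simplified sketch of qgroup_unreserve_range() (illustrative only; it
assumes the changeset ulist stores the range start in 'val' and the end
in 'aux', and it drops whole overlapping entries instead of trimming
them):

  static void qgroup_unreserve_range(struct btrfs_inode *inode,
                                     struct extent_changeset *reserved,
                                     u64 start, u64 len)
  {
          struct ulist_node *unode;
          struct ulist_iterator uiter;

          ULIST_ITER_INIT(&uiter);
          while ((unode = ulist_next(&reserved->range_changed, &uiter))) {
                  u64 entry_start = unode->val;
                  u64 entry_end = unode->aux;
                  u64 entry_len = entry_end - entry_start + 1;

                  /* Skip entries entirely outside the failed range. */
                  if (entry_start >= start + len || entry_end < start)
                          continue;

                  clear_extent_bits(&inode->io_tree, entry_start, entry_end,
                                    EXTENT_QGROUP_RESERVED);
                  reserved->bytes_changed -= entry_len;
          }
  }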

Suggested-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-07-27 12:55:42 +02:00
Qu Wenruo
49e5fb4621 btrfs: qgroup: export qgroups in sysfs
This patch will add the following sysfs interface:

  /sys/fs/btrfs/<UUID>/qgroups/<qgroup_id>/referenced
  /sys/fs/btrfs/<UUID>/qgroups/<qgroup_id>/exclusive
  /sys/fs/btrfs/<UUID>/qgroups/<qgroup_id>/max_referenced
  /sys/fs/btrfs/<UUID>/qgroups/<qgroup_id>/max_exclusive
  /sys/fs/btrfs/<UUID>/qgroups/<qgroup_id>/limit_flags

Which is also available in output of "btrfs qgroup show".

  /sys/fs/btrfs/<UUID>/qgroups/<qgroup_id>/rsv_data
  /sys/fs/btrfs/<UUID>/qgroups/<qgroup_id>/rsv_meta_pertrans
  /sys/fs/btrfs/<UUID>/qgroups/<qgroup_id>/rsv_meta_prealloc

The last 3 rsv related members are not visible to users, but can be very
useful to debug qgroup limit related bugs.

Also, to avoid '/' used in <qgroup_id>, the separator between qgroup
level and qgroup id is changed to '_'.

The interface is not hidden behind 'debug' as we want this interface to
be included in production builds and to provide another way to read the
qgroup information besides the ioctls.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-07-27 12:55:37 +02:00
Qu Wenruo
06f67c4707 btrfs: use __u16 for the return value of btrfs_qgroup_level()
The qgroup level is limited to u16, so no need to use u64 for it.
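
For reference, the level is just the top 16 bits of the 64-bit qgroup id
(BTRFS_QGROUP_LEVEL_SHIFT is 48), so a 16-bit return type is enough:

  static inline __u16 btrfs_qgroup_level(u64 qgroupid)
  {
          return (__u16)(qgroupid >> BTRFS_QGROUP_LEVEL_SHIFT);
  }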

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-07-27 12:55:37 +02:00
Nikolay Borisov
cfdd459215 btrfs: make btrfs_qgroup_check_reserved_leak take btrfs_inode
vfs_inode is used only for the inode number; everything else requires
btrfs_inode.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ use btrfs_ino ]
Signed-off-by: David Sterba <dsterba@suse.com>
2020-07-27 12:55:37 +02:00
Nikolay Borisov
7661a3e033 btrfs: make btrfs_qgroup_reserve_data take btrfs_inode
There's only a single use of vfs_inode in a tracepoint so let's take
btrfs_inode directly.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-07-27 12:55:35 +02:00
Nikolay Borisov
72b7d15bf1 btrfs: make btrfs_qgroup_release_data take btrfs_inode
It just forwards its argument to __btrfs_qgroup_release_data.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-07-27 12:55:32 +02:00
Nikolay Borisov
8b8a979f1f btrfs: make btrfs_qgroup_free_data take btrfs_inode
It passes btrfs_inode to its callee so change the interface.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-07-27 12:55:30 +02:00
Nikolay Borisov
8769af96cf btrfs: make __btrfs_qgroup_release_data take btrfs_inode
It uses vfs_inode only for a tracepoint so convert its interface to take
btrfs_inode directly.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-07-27 12:55:30 +02:00
Nikolay Borisov
df2cfd131f btrfs: make qgroup_free_reserved_data take btrfs_inode
It only uses btrfs_inode so can just as easily take it as an argument.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-07-27 12:55:30 +02:00
Qu Wenruo
5958253cf6 btrfs: qgroup: catch reserved space leaks at unmount time
Before this patch, qgroup completely relies on the per-inode extent io
tree to detect reserved data space leaks.

However, a previous bug has already shown how releasing a page before
btrfs_finish_ordered_io() could lead to a leak, and since it's the
QGROUP_RESERVED bit being cleared without triggering a qgroup rsv change,
it can't be detected by the per-inode extent io tree.

So this patch adds another (and hopefully the final) safety net to catch
qgroup data reserved space leaks.  At least the new safety net catches
all the leaks during development, so it should be pretty useful in the
real world.
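
For illustration, the kind of check the safety net boils down to at
unmount time (a sketch only, not the exact code):

  /* Complain if a qgroup still holds any reserved space at unmount. */
  if (qgroup->rsv.values[BTRFS_QGROUP_RSV_DATA] ||
      qgroup->rsv.values[BTRFS_QGROUP_RSV_META_PERTRANS] ||
      qgroup->rsv.values[BTRFS_QGROUP_RSV_META_PREALLOC])
          btrfs_warn(fs_info, "qgroup %llu leaked reserved space",
                     qgroup->qgroupid);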

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-07-27 12:55:24 +02:00
Qu Wenruo
cbab8ade58 btrfs: qgroup: mark qgroup inconsistent if we're inherting snapshot to a new qgroup
[BUG]
For the following operations, qgroup is guaranteed to be screwed up due
to the snapshot being added to a new qgroup:

  # mkfs.btrfs -f $dev
  # mount $dev $mnt
  # btrfs qgroup en $mnt
  # btrfs subv create $mnt/src
  # xfs_io -f -c "pwrite 0 1m" $mnt/src/file
  # sync
  # btrfs qgroup create 1/0 $mnt/src
  # btrfs subv snapshot -i 1/0 $mnt/src $mnt/snapshot
  # btrfs qgroup show -prce $mnt/src
  qgroupid         rfer         excl     max_rfer     max_excl parent  child
  --------         ----         ----     --------     -------- ------  -----
  0/5          16.00KiB     16.00KiB         none         none ---     ---
  0/257         1.02MiB     16.00KiB         none         none ---     ---
  0/258         1.02MiB     16.00KiB         none         none 1/0     ---
  1/0             0.00B        0.00B         none         none ---     0/258
	        ^^^^^^^^^^^^^^^^^^^^

[CAUSE]
The problem is in btrfs_qgroup_inherit(): we don't have a good enough
check to determine if the new relation would break the existing
accounting.

Unlike btrfs_add_qgroup_relation(), which has proper check to determine
if we can do quick update without a rescan, in btrfs_qgroup_inherit() we
can even assign a snapshot to multiple qgroups.

[FIX]
Fix it by manually marking qgroup inconsistent for snapshot inheritance.

For subvolume creation, since all its extents are exclusively owned, we
don't need to rescan.
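
In sketch form (illustrative, based on the description above):

  /* In btrfs_qgroup_inherit(): inheriting from a source snapshot. */
  if (srcid)
          fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;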

In theory, we should do a relation check like quick_update_accounting()
does when doing qgroup inheritance, and inform the user about the qgroup
accounting inconsistency.

But we don't have a good mechanism to relay that back to the user in the
snapshot creation context, thus we can only silently mark the qgroup
inconsistent.

Anyway, users shouldn't use qgroup inheritance during snapshot creation,
and should instead add the qgroup relationship after snapshot creation
with 'btrfs qgroup assign', which has a much better UI to inform the user
about the qgroup inconsistency and kicks in a rescan automatically.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-05-25 11:25:29 +02:00
Josef Bacik
8c38938c7b btrfs: move the root freeing stuff into btrfs_put_root
There are a few different ways to free roots: either you allocated them
yourself and you just do

free_extent_buffer(root->node);
free_extent_buffer(root->commit_node);
btrfs_put_root(root);

This is the pattern for log roots.  Or, for snapshots/subvolumes that
are being dropped, you simply call btrfs_free_fs_root(), which does all
the cleanup for you.

Unify this all into btrfs_put_root(), so that we don't free up things
associated with the root until the last reference is dropped.  This
makes the root freeing code much more significant.
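
A heavily reduced sketch of what that unification means (illustrative
only, the real function does more):

  void btrfs_put_root(struct btrfs_root *root)
  {
          if (!root)
                  return;
          if (!refcount_dec_and_test(&root->refs))
                  return;
          /* Last reference dropped: only now free what hangs off the root. */
          free_extent_buffer(root->node);
          free_extent_buffer(root->commit_root);
          kfree(root);
  }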

The only caveat is at close_ctree() time we have to free the extent
buffers for all of our main roots (extent_root, chunk_root, etc) because
we have to drop the btree_inode and we'll run into issues if we hold
onto those nodes until ->kill_sb() time.  This will be addressed in the
future when we kill the btree_inode.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-03-23 17:01:59 +01:00
Qu Wenruo
daf475c915 btrfs: qgroup: Remove the unnecesaary spin lock for qgroup_rescan_running
After the previous patch, qgroup_rescan_running is protected by
btrfs_fs_info::qgroup_rescan_lock, thus no need for the extra spinlock.

Suggested-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-03-23 17:01:47 +01:00
Qu Wenruo
d61acbbf54 btrfs: qgroup: ensure qgroup_rescan_running is only set when the worker is at least queued
[BUG]
There are some reports about btrfs wait forever to unmount itself, with
the following call trace:

  INFO: task umount:4631 blocked for more than 491 seconds.
        Tainted: G               X  5.3.8-2-default #1
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  umount          D    0  4631   3337 0x00000000
  Call Trace:
  ([<00000000174adf7a>] __schedule+0x342/0x748)
   [<00000000174ae3ca>] schedule+0x4a/0xd8
   [<00000000174b1f08>] schedule_timeout+0x218/0x420
   [<00000000174af10c>] wait_for_common+0x104/0x1d8
   [<000003ff804d6994>] btrfs_qgroup_wait_for_completion+0x84/0xb0 [btrfs]
   [<000003ff8044a616>] close_ctree+0x4e/0x380 [btrfs]
   [<0000000016fa3136>] generic_shutdown_super+0x8e/0x158
   [<0000000016fa34d6>] kill_anon_super+0x26/0x40
   [<000003ff8041ba88>] btrfs_kill_super+0x28/0xc8 [btrfs]
   [<0000000016fa39f8>] deactivate_locked_super+0x68/0x98
   [<0000000016fcb198>] cleanup_mnt+0xc0/0x140
   [<0000000016d6a846>] task_work_run+0xc6/0x110
   [<0000000016d04f76>] do_notify_resume+0xae/0xb8
   [<00000000174b30ae>] system_call+0xe2/0x2c8

[CAUSE]
The problem happens when we have called qgroup_rescan_init() but have
not queued the worker. It is mostly caused by error handling.

	Qgroup ioctl thread		|	Unmount thread
----------------------------------------+-----------------------------------
					|
btrfs_qgroup_rescan()			|
|- qgroup_rescan_init()			|
|  |- qgroup_rescan_running = true;	|
|					|
|- trans = btrfs_join_transaction()	|
|  Some error happened			|
|					|
|- btrfs_qgroup_rescan() returns error	|
   But qgroup_rescan_running == true;	|
					| close_ctree()
					| |- btrfs_qgroup_wait_for_completion()
					|    |- running == true;
					|    |- wait_for_completion();

btrfs_qgroup_rescan_worker is never queued, thus no one is going to wake
up close_ctree() and we get a deadlock.

All involved qgroup_rescan_init() callers are:

- btrfs_qgroup_rescan()
  The example above. It's possible to trigger the deadlock when an error
  happens.

- btrfs_quota_enable()
  Not possible. Just after qgroup_rescan_init() we queue the work.

- btrfs_read_qgroup_config()
  It's possible to trigger the deadlock. It only initializes the work; the
  work queueing happens in btrfs_qgroup_rescan_resume().
  Thus if an error happens in between, a deadlock is possible.

We shouldn't set fs_info->qgroup_rescan_running just in
qgroup_rescan_init(), as at that stage we haven't yet queued the qgroup
rescan worker to run.

[FIX]
Set qgroup_rescan_running right before queueing the work, so that whenever
we wait for completion we know the rescan worker has actually been queued.
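
A sketch of the resulting pattern at the queueing sites (simplified; the
real code also has to handle the read-only and error cases):

  mutex_lock(&fs_info->qgroup_rescan_lock);
  /* Only claim "running" at the moment the worker is really queued. */
  fs_info->qgroup_rescan_running = true;
  mutex_unlock(&fs_info->qgroup_rescan_lock);
  btrfs_queue_work(fs_info->qgroup_rescan_workers,
                   &fs_info->qgroup_rescan_work);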

Fixes: 8d9eddad1946 ("Btrfs: fix qgroup rescan worker initialization")
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
[ Change subject and cause analysis, use a smaller fix ]
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-03-23 17:01:47 +01:00
Josef Bacik
0024652895 btrfs: rename btrfs_put_fs_root and btrfs_grab_fs_root
We are now using these for all roots, so rename them to btrfs_put_root()
and btrfs_grab_root().

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-03-23 17:01:33 +01:00
Josef Bacik
81f096edf0 btrfs: use btrfs_put_fs_root to free roots always
If we are going to track leaked roots we need to free them all the same
way, so don't kfree() roots directly; use btrfs_put_fs_root().

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-03-23 17:01:32 +01:00
Jeff Mahoney
81f7eb00ff btrfs: destroy qgroup extent records on transaction abort
We clean up the delayed references when we abort a transaction but we
leave the pending qgroup extent records behind, leaking memory.

This patch destroys the extent records when we destroy the delayed refs
and makes sure they're gone before releasing the transaction.
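
The cleanup boils down to walking the dirty extent tree of the delayed
refs post-order and freeing every pending record; a sketch of that loop
(field names assumed to match the usual qgroup extent record layout):

  struct btrfs_qgroup_extent_record *record, *next;

  /* Sketch: free pending qgroup extent records on transaction abort. */
  rbtree_postorder_for_each_entry_safe(record, next,
                                       &delayed_refs->dirty_extent_root,
                                       node) {
          ulist_free(record->old_roots);
          kfree(record);
  }
  delayed_refs->dirty_extent_root = RB_ROOT;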

Fixes: 3368d001ba5d ("btrfs: qgroup: Record possible quota-related extent for qgroup.")
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
[ Rebased to latest upstream, remove to_qgroup() helper, use
  rbtree_postorder_for_each_entry_safe() wrapper ]
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-02-19 00:35:54 +01:00
Marcos Paulo de Souza
8a36e408d4 btrfs: qgroup: return ENOTCONN instead of EINVAL when quotas are not enabled
[PROBLEM]
qgroup create/remove code is currently returning EINVAL when the user
tries to create a qgroup on a subvolume without quota enabled. EINVAL is
already being used for too many error scenarios, so it is hard to tell
what the actual problem is.

[FIX]
Currently scrub and balance code return -ENOTCONN when the user tries to
cancel/pause and no scrub or balance is currently running for the
desired subvolume. Do the same here by returning -ENOTCONN when a user
tries to create/delete/assign/list a qgroup on a subvolume without quota
enabled.
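
Conceptually the qgroup entry points then start with a check along these
lines (a simplified sketch of the idea, not the full diff):

  /* Quotas are not enabled, there is nothing to operate on. */
  if (!fs_info->quota_root)
          return -ENOTCONN;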

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-01-20 16:40:50 +01:00
Marcos Paulo de Souza
e3b0edd297 btrfs: qgroup: remove one-time use variables for quota_root checks
Remove some variables that are set only to be checked later, and never
used.
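
For example (an assumed before/after sketch of the kind of cleanup this
is), a temporary that only existed for a NULL check goes away:

  /* before */
  quota_root = fs_info->quota_root;
  if (!quota_root) {
          ret = -ENOTCONN;
          goto out;
  }

  /* after */
  if (!fs_info->quota_root) {
          ret = -ENOTCONN;
          goto out;
  }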

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-01-20 16:40:50 +01:00
Johannes Thumshirn
26ef8493e1 btrfs: fix memory leak in qgroup accounting
When running xfstests on the current btrfs I get the following splat from
kmemleak:

unreferenced object 0xffff88821b2404e0 (size 32):
  comm "kworker/u4:7", pid 26663, jiffies 4295283698 (age 8.776s)
  hex dump (first 32 bytes):
    01 00 00 00 00 00 00 00 10 ff fd 26 82 88 ff ff  ...........&....
    10 ff fd 26 82 88 ff ff 20 ff fd 26 82 88 ff ff  ...&.... ..&....
  backtrace:
    [<00000000f94fd43f>] ulist_alloc+0x25/0x60 [btrfs]
    [<00000000fd023d99>] btrfs_find_all_roots_safe+0x41/0x100 [btrfs]
    [<000000008f17bd32>] btrfs_find_all_roots+0x52/0x70 [btrfs]
    [<00000000b7660afb>] btrfs_qgroup_rescan_worker+0x343/0x680 [btrfs]
    [<0000000058e66778>] btrfs_work_helper+0xac/0x1e0 [btrfs]
    [<00000000f0188930>] process_one_work+0x1cf/0x350
    [<00000000af5f2f8e>] worker_thread+0x28/0x3c0
    [<00000000b55a1add>] kthread+0x109/0x120
    [<00000000f88cbd17>] ret_from_fork+0x35/0x40

This corresponds to:

  (gdb) l *(btrfs_find_all_roots_safe+0x41)
  0x8d7e1 is in btrfs_find_all_roots_safe (fs/btrfs/backref.c:1413).
  1408
  1409            tmp = ulist_alloc(GFP_NOFS);
  1410            if (!tmp)
  1411                    return -ENOMEM;
  1412            *roots = ulist_alloc(GFP_NOFS);
  1413            if (!*roots) {
  1414                    ulist_free(tmp);
  1415                    return -ENOMEM;
  1416            }
  1417

Following the lifetime of the allocated 'roots' ulist, it gets freed
again in btrfs_qgroup_account_extent().

But this does not happen if the function is called with the
'BTRFS_FS_QUOTA_ENABLED' flag cleared; in that case
btrfs_qgroup_account_extent() bails out early and returns directly.

Instead of returning directly we should jump to the 'out_free' label in
order to free all resources as expected.
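
The shape of the fix is the usual single-exit cleanup; a sketch of
btrfs_qgroup_account_extent() with the early return replaced by a jump to
the common label (simplified, names as in the report above):

  if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags) ||
      bytenr == 0 || num_bytes == 0) {
          ret = 0;
          goto out_free;  /* was "return 0;", which leaked the ulists */
  }
  /* ... accounting work ... */
  out_free:
  ulist_free(old_roots);
  ulist_free(new_roots);
  return ret;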

CC: stable@vger.kernel.org # 4.14+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
[ add comment ]
Signed-off-by: David Sterba <dsterba@suse.com>
2020-01-08 17:56:17 +01:00
Nikolay Borisov
37d02592f1 btrfs: Fix error messages in qgroup_rescan_init
The branch of qgroup_rescan_init() which is executed from the mount
path prints wrong error messages. The textual printouts for the cases
where BTRFS_QGROUP_STATUS_FLAG_RESCAN/BTRFS_QGROUP_STATUS_FLAG_ON are not
set are transposed. Fix it by exchanging their places.
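
I.e. the messages now match the flag that is actually missing; a sketch of
the corrected branch (the exact message strings are assumed, not quoted
from the patch):

  if (!(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN)) {
          btrfs_warn(fs_info,
                     "qgroup rescan init failed, rescan is not in progress");
          return -EINVAL;
  } else if (!(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_ON)) {
          btrfs_warn(fs_info,
                     "qgroup rescan init failed, quota is not enabled");
          return -EINVAL;
  }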

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-12-13 13:29:12 +01:00
David Sterba
32da5386d9 btrfs: rename btrfs_block_group_cache
The type name is misleading: a single entry is named 'cache' while this
normally means a collection of objects. Rename it everywhere. Also, the
identifier was quite long, making function prototypes harder to format.

Suggested-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18 17:51:51 +01:00
David Sterba
67439dadb0 btrfs: opencode extent_buffer_get
The helper is trivial and we can understand what the atomic_inc on
something named refs does.
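
That is, call sites now do the increment directly (the helper was a
one-liner; the refs field name is assumed):

  /* before */
  extent_buffer_get(eb);

  /* after */
  atomic_inc(&eb->refs);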

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18 12:46:54 +01:00
Omar Sandoval
a0cac0ec96 btrfs: get rid of unique workqueue helper functions
Commit 9e0af2376434 ("Btrfs: fix task hang under heavy compressed
write") worked around the issue that a recycled work item could get a
false dependency on the original work item due to how the workqueue code
guarantees non-reentrancy. It did so by giving different work functions
to different types of work.

However, the fixes in the previous few patches are more complete, as
they prevent a work item from being recycled at all (except for a tiny
window that the kernel workqueue code handles for us). This obsoletes
the previous fix, so we don't need the unique helpers for correctness.
The only other reason to keep them would be so they show up in stack
traces, but they always seem to be optimized to a tail call, so they
don't show up anyway. So, let's just get rid of the extra indirection.

While we're here, rename normal_work_helper() to the more informative
btrfs_work_helper().

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18 12:46:48 +01:00