linux


Filipe Manana a362bb864b btrfs: fix hang during unmount when stopping a space reclaim worker

Often when running generic/562 from fstests we can hang during unmount,
resulting in a trace like this:

  Sep 07 11:52:00 debian9 unknown: run fstests generic/562 at 2022-09-07 11:52:00
  Sep 07 11:55:32 debian9 kernel: INFO: task umount:49438 blocked for more than 120 seconds.
  Sep 07 11:55:32 debian9 kernel:       Not tainted 6.0.0-rc2-btrfs-next-122 #1
  Sep 07 11:55:32 debian9 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  Sep 07 11:55:32 debian9 kernel: task:umount          state:D stack:    0 pid:49438 ppid: 25683 flags:0x00004000
  Sep 07 11:55:32 debian9 kernel: Call Trace:
  Sep 07 11:55:32 debian9 kernel:  <TASK>
  Sep 07 11:55:32 debian9 kernel:  __schedule+0x3c8/0xec0
  Sep 07 11:55:32 debian9 kernel:  ? rcu_read_lock_sched_held+0x12/0x70
  Sep 07 11:55:32 debian9 kernel:  schedule+0x5d/0xf0
  Sep 07 11:55:32 debian9 kernel:  schedule_timeout+0xf1/0x130
  Sep 07 11:55:32 debian9 kernel:  ? lock_release+0x224/0x4a0
  Sep 07 11:55:32 debian9 kernel:  ? lock_acquired+0x1a0/0x420
  Sep 07 11:55:32 debian9 kernel:  ? trace_hardirqs_on+0x2c/0xd0
  Sep 07 11:55:32 debian9 kernel:  __wait_for_common+0xac/0x200
  Sep 07 11:55:32 debian9 kernel:  ? usleep_range_state+0xb0/0xb0
  Sep 07 11:55:32 debian9 kernel:  __flush_work+0x26d/0x530
  Sep 07 11:55:32 debian9 kernel:  ? flush_workqueue_prep_pwqs+0x140/0x140
  Sep 07 11:55:32 debian9 kernel:  ? trace_clock_local+0xc/0x30
  Sep 07 11:55:32 debian9 kernel:  __cancel_work_timer+0x11f/0x1b0
  Sep 07 11:55:32 debian9 kernel:  ? close_ctree+0x12b/0x5b3 [btrfs]
  Sep 07 11:55:32 debian9 kernel:  ? __trace_bputs+0x10b/0x170
  Sep 07 11:55:32 debian9 kernel:  close_ctree+0x152/0x5b3 [btrfs]
  Sep 07 11:55:32 debian9 kernel:  ? evict_inodes+0x166/0x1c0
  Sep 07 11:55:32 debian9 kernel:  generic_shutdown_super+0x71/0x120
  Sep 07 11:55:32 debian9 kernel:  kill_anon_super+0x14/0x30
  Sep 07 11:55:32 debian9 kernel:  btrfs_kill_super+0x12/0x20 [btrfs]
  Sep 07 11:55:32 debian9 kernel:  deactivate_locked_super+0x2e/0xa0
  Sep 07 11:55:32 debian9 kernel:  cleanup_mnt+0x100/0x160
  Sep 07 11:55:32 debian9 kernel:  task_work_run+0x59/0xa0
  Sep 07 11:55:32 debian9 kernel:  exit_to_user_mode_prepare+0x1a6/0x1b0
  Sep 07 11:55:32 debian9 kernel:  syscall_exit_to_user_mode+0x16/0x40
  Sep 07 11:55:32 debian9 kernel:  do_syscall_64+0x48/0x90
  Sep 07 11:55:32 debian9 kernel:  entry_SYSCALL_64_after_hwframe+0x63/0xcd
  Sep 07 11:55:32 debian9 kernel: RIP: 0033:0x7fcde59a57a7
  Sep 07 11:55:32 debian9 kernel: RSP: 002b:00007ffe914217c8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
  Sep 07 11:55:32 debian9 kernel: RAX: 0000000000000000 RBX: 00007fcde5ae8264 RCX: 00007fcde59a57a7
  Sep 07 11:55:32 debian9 kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000055b57556cdd0
  Sep 07 11:55:32 debian9 kernel: RBP: 000055b57556cba0 R08: 0000000000000000 R09: 00007ffe91420570
  Sep 07 11:55:32 debian9 kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
  Sep 07 11:55:32 debian9 kernel: R13: 000055b57556cdd0 R14: 000055b57556ccb8 R15: 0000000000000000
  Sep 07 11:55:32 debian9 kernel:  </TASK>

What happens is the following:

1) The cleaner kthread tries to start a transaction to delete an unused
   block group, but the metadata reservation cannot be satisfied right
   away, so a reservation ticket is created and it starts the async
   metadata reclaim task (fs_info->async_reclaim_work);

2) Writeback for all the filler inodes with an i_size of 2K starts
   (generic/562 creates a lot of 2K files with the goal of filling
   metadata space). We try to create an inline extent for them, but we
   fail when trying to insert the inline extent with -ENOSPC (at
   cow_file_range_inline()) - since this is not critical, we fall back
   to non-inline mode (back to cow_file_range()), reserve extents, create
   extent maps and create the ordered extents;

3) An unmount starts, enters close_ctree();

4) The async reclaim task is flushing stuff, entering the flush states one
   by one, until it reaches RUN_DELAYED_IPUTS. There it runs all current
   delayed iputs.

   After running the delayed iputs and before calling
   btrfs_wait_on_delayed_iputs(), one or more ordered extents complete,
   and btrfs_add_delayed_iput() is called for each one through
   btrfs_finish_ordered_io() -> btrfs_put_ordered_extent(). This results
   in bumping fs_info->nr_delayed_iputs from 0 to some positive value.

   So the async reclaim task blocks at btrfs_wait_on_delayed_iputs() waiting
   for fs_info->nr_delayed_iputs to become 0;

5) The current transaction is committed by the transaction kthread, we then
   start unpinning extents and end up calling btrfs_try_granting_tickets()
   through unpin_extent_range(), since we released some space.
   This results in satisfying the ticket created by the cleaner kthread at
   step 1, waking up the cleaner kthread;

6) At close_ctree() we ask the cleaner kthread to park;

7) The cleaner kthread starts the transaction, deletes the unused block
   group, and then calls kthread_should_park(), which returns true, so it
   parks. And at this point we have the delayed iputs added by the
   completion of the ordered extents still pending;

8) Then later at close_ctree(), when we call:

       cancel_work_sync(&fs_info->async_reclaim_work);

   We hang forever, since the cleaner was parked and no one else can run
   delayed iputs after that, while the reclaim task is waiting for the
   remaining delayed iputs to be completed.

Fix this by waiting for all ordered extents to complete and running the
delayed iputs before attempting to stop the async reclaim tasks. Note that
we cannot wait for ordered extents with btrfs_wait_ordered_roots() (or
other similar functions) because that waits for the BTRFS_ORDERED_COMPLETE
flag to be set on an ordered extent, but the delayed iput is added after
that, when doing the final btrfs_put_ordered_extent(). So instead wait for
the work queues used for executing ordered extent completion to be empty,
which works because we do the final put on an ordered extent at
btrfs_finish_ordered_io() (while we are in the unmount context).
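
The ordering problem described above can be illustrated with a small
userspace model. This is only a sketch under stated assumptions, not the
kernel change itself: struct fs_model and every function below are
hypothetical names invented for illustration, and the model is
single-threaded, so it only captures which steps happen before the point
where the reclaim work is cancelled. It assumes, as in the steps above,
that each completing ordered extent queues one delayed iput and that a
parked cleaner no longer runs delayed iputs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct fs_model {
	int pending_ordered_extents; /* completions still queued on a work queue */
	int nr_delayed_iputs;        /* plays the role of fs_info->nr_delayed_iputs */
	bool cleaner_parked;         /* a parked cleaner no longer runs delayed iputs */
};

/* Each completed ordered extent does its final put, queuing one delayed iput. */
void complete_ordered_extents(struct fs_model *fs)
{
	assert(fs->pending_ordered_extents >= 0);
	while (fs->pending_ordered_extents > 0) {
		fs->pending_ordered_extents--;
		fs->nr_delayed_iputs++;
	}
}

/* Run all delayed iputs directly from the unmount context. */
void run_delayed_iputs(struct fs_model *fs)
{
	fs->nr_delayed_iputs = 0;
}

/* The cleaner only runs delayed iputs while it is not parked. */
void cleaner_tick(struct fs_model *fs)
{
	if (!fs->cleaner_parked)
		run_delayed_iputs(fs);
}

/*
 * Model the tail of unmount. Returns true if the reclaim worker, which
 * waits for nr_delayed_iputs to reach 0, could finish, so cancelling it
 * would return; returns false if cancelling it would wait forever.
 */
bool unmount_model(struct fs_model *fs, bool apply_fix)
{
	fs->cleaner_parked = true;        /* step 6: the cleaner is parked */
	complete_ordered_extents(fs);     /* completions queue delayed iputs */

	if (apply_fix)
		run_delayed_iputs(fs);    /* fix: run them before cancelling */
	else
		cleaner_tick(fs);         /* no-op: nobody runs them anymore */

	return fs->nr_delayed_iputs == 0;
}
```

In this model, calling unmount_model() with apply_fix set to false on a
state that still has pending ordered extents returns false, which
corresponds to the hang at cancel_work_sync(). With apply_fix set to true
the delayed iput count is drained first and the call returns true.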

Fixes: d6fd0ae25c6495 ("Btrfs: fix missing delayed iputs on unmount")
CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

2022-09-13 14:05:13 +02:00

9p: fix EBADF errors in cached mode

2022-06-17 06:03:30 +09:00

adfs

fs: Convert block_read_full_page() to block_read_full_folio()

2022-05-09 16:21:44 -04:00

affs

affs: Convert affs to read_folio

2022-05-09 16:21:44 -04:00

afs

netfs: do not unlock and put the folio twice

2022-07-14 10:10:12 +02:00

autofs

…

befs

befs: Convert befs to read_folio

2022-05-09 16:21:45 -04:00

bfs

fs: Convert block_read_full_page() to block_read_full_folio()

2022-05-09 16:21:44 -04:00

btrfs

btrfs: fix hang during unmount when stopping a space reclaim worker

2022-09-13 14:05:13 +02:00

cachefiles

cachefiles: narrow the scope of flushed requests when releasing fd

2022-07-05 16:12:21 +01:00

ceph

netfs: do not unlock and put the folio twice

2022-07-14 10:10:12 +02:00

cifs

smb3: workaround negprot bug in some Samba servers

2022-07-13 19:59:47 -05:00

coda

coda: Convert coda to read_folio

2022-05-09 16:21:45 -04:00

configfs

configfs: fix a race in configfs_{,un}register_subsystem()

2022-02-22 18:30:28 +01:00

cramfs

cramfs: Convert cramfs to read_folio

2022-05-09 16:21:45 -04:00

crypto

fscrypt: add new helper functions for test_dummy_encryption

2022-05-09 16:18:54 -07:00

debugfs

debugfs: Document that debugfs_create functions need not be error checked

2022-02-25 11:56:13 +01:00

devpts

fsnotify: fix fsnotify hooks in pseudo filesystems

2022-01-24 14:17:02 +01:00

dlm

dlm: use kref_put_lock in __put_lkb

2022-05-02 11:23:49 -05:00

ecryptfs

ecryptfs: Convert ecryptfs to read_folio

2022-05-09 16:21:45 -04:00

efivarfs

…

efs

efs: Convert efs symlinks to read_folio

2022-05-09 16:21:45 -04:00

erofs

Changes since last update:

2022-06-01 11:54:29 -07:00

exfat

exfat: use updated exfat_chain directly during renaming

2022-06-09 21:26:32 +09:00

exportfs

exportfs: support idmapped mounts

2022-04-28 16:31:10 +02:00

ext2

ext2: fix fs corruption when trying to remove a non-empty directory with IO error

2022-06-16 10:55:45 +02:00

ext4

ext4: fix a doubled word "need" in a comment

2022-06-18 19:36:20 -04:00

f2fs

f2fs: do not count ENOENT for error case

2022-06-21 08:29:56 -07:00

fat

Not a lot of material this cycle. Many singleton patches against various

2022-05-27 11:22:03 -07:00

freevxfs

SPDX changes for 5.19-rc1

2022-06-03 10:34:34 -07:00

fscache

fscache: Fix invalidation/lookup race

2022-07-05 16:12:55 +01:00

fuse

libnvdimm for 5.19

2022-05-27 15:49:30 -07:00

gfs2

Page cache changes for 5.19

2022-05-24 19:55:07 -07:00

hfs

fs: Change try_to_free_buffers() to take a folio

2022-05-09 23:12:34 -04:00

hfsplus

fs: Change try_to_free_buffers() to take a folio

2022-05-09 23:12:34 -04:00

hostfs

hostfs: Convert hostfs to read_folio

2022-05-09 16:21:45 -04:00

hpfs

hpfs: Convert symlinks to read_folio

2022-05-09 16:21:45 -04:00

hugetlbfs

hugetlbfs: zero partial pages during fallocate hole punch

2022-06-16 19:11:32 -07:00

iomap

Page cache changes for 5.19

2022-05-24 19:55:07 -07:00

isofs

isofs: Convert symlinks and zisofs to read_folio

2022-05-09 16:21:45 -04:00

jbd2

fs: fix jbd2_journal_try_to_free_buffers() kernel-doc comment

2022-06-16 10:36:09 -04:00

jffs2

This pull request contains fixes for JFFS2, UBI and UBIFS

2022-06-03 14:42:24 -07:00

jfs

JFS: One bug fix and some code cleanup

2022-05-27 15:59:21 -07:00

kernfs

kernfs: Separate kernfs_pr_cont_buf and rename_lock.

2022-05-19 19:37:06 +02:00

ksmbd

vfs: fix copy_file_range() regression in cross-fs copies

2022-06-30 15:16:38 -07:00

lockd

lockd: fix nlm_close_files

2022-07-11 15:49:56 -04:00

minix

fs: Convert block_read_full_page() to block_read_full_folio()

2022-05-09 16:21:44 -04:00

netfs

netfs: do not unlock and put the folio twice

2022-07-14 10:10:12 +02:00

nfs

NFSv4: Add an fattr allocation to _nfs4_discover_trunking()

2022-06-30 16:13:00 -04:00

nfs_common

…

nfsd

Notable regression fixes:

2022-07-14 12:29:43 -07:00

nilfs2

nilfs2: fix incorrect masking of permission flags for symlinks

2022-07-03 15:42:33 -07:00

nls

…

notify

fanotify: refine the validation checks on non-dir inode mask

2022-06-28 11:18:13 +02:00

ntfs

Not a lot of material this cycle. Many singleton patches against various

2022-05-27 11:22:03 -07:00

ntfs3

Ntfs3 for 5.19

2022-06-03 16:57:16 -07:00

ocfs2

Not a lot of material this cycle. Many singleton patches against various

2022-05-27 11:22:03 -07:00

omfs

fs: Convert block_read_full_page() to block_read_full_folio()

2022-05-09 16:21:44 -04:00

openpromfs

fs: allocate inode by using alloc_inode_sb()

2022-03-22 15:57:03 -07:00

orangefs

orangefs: Convert to free_folio

2022-05-09 23:12:53 -04:00

overlayfs

ovl: turn of SB_POSIXACL with idmapped layers temporarily

2022-07-08 15:48:31 +02:00

proc

Not a lot of material this cycle. Many singleton patches against various

2022-05-27 11:22:03 -07:00

pstore

pstore: Don't use semaphores in always-atomic-context code

2022-03-15 11:08:23 -07:00

qnx4

fs: Convert block_read_full_page() to block_read_full_folio()

2022-05-09 16:21:44 -04:00

qnx6

fs: Convert mpage_readpage to mpage_read_folio

2022-05-09 16:21:44 -04:00

quota

quota: Prevent memory allocation recursion while holding dq_lock

2022-06-06 10:08:10 +02:00

ramfs

Merge branch 'akpm' (patches from Andrew)

2021-11-09 10:11:53 -08:00

reiserfs

fs: Change try_to_free_buffers() to take a folio

2022-05-09 23:12:34 -04:00

romfs

romfs: Convert romfs to read_folio

2022-05-09 16:21:46 -04:00

smbfs_common

Add various fsctl structs

2022-05-23 20:24:12 -05:00

squashfs

Page cache changes for 5.19

2022-05-24 19:55:07 -07:00

sysfs

kobject: kobj_type: remove default_attrs

2022-04-05 15:39:19 +02:00

sysv

Not a lot of material this cycle. Many singleton patches against various

2022-05-27 11:22:03 -07:00

tracefs

tracefs: Fix syntax errors in comments

2022-06-17 19:01:28 -04:00

ubifs

This pull request contains fixes for JFFS2, UBI and UBIFS

2022-06-03 14:42:24 -07:00

udf

Page cache changes for 5.19

2022-05-24 19:55:07 -07:00

ufs

fs: Convert block_read_full_page() to block_read_full_folio()

2022-05-09 16:21:44 -04:00

unicode

kbuild: unify cmd_copy and cmd_shipped

2022-02-14 10:37:32 +09:00

vboxsf

vboxsf: Convert vboxsf to read_folio

2022-05-09 16:21:46 -04:00

verity

Page cache changes for 5.19

2022-05-24 19:55:07 -07:00

xfs

xfs: prevent a UAF when log IO errors race with unmount

2022-07-01 09:09:52 -07:00

zonefs

zonefs: fix zonefs_iomap_begin() for reads

2022-06-08 19:13:55 +09:00

aio.c

Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

2022-04-01 19:57:03 -07:00

anon_inodes.c

…

attr.c

fs: account for group membership

2022-06-14 12:18:47 +02:00

bad_inode.c

…

binfmt_aout.c

…

binfmt_elf_fdpic.c

coredump: Snapshot the vmas in do_coredump

2022-03-08 12:55:29 -06:00

binfmt_elf_test.c

binfmt_elf: Introduce KUnit test

2022-03-03 20:38:56 -08:00

binfmt_elf.c

revert "fs/binfmt_elf: use PT_LOAD p_align values for static PIE"

2022-04-15 14:49:56 -07:00

binfmt_flat.c

binfmt_flat: Remove shared library support

2022-04-22 10:57:18 -07:00

binfmt_misc.c

Fix regression due to "fs: move binfmt_misc sysctl to its own file"

2022-02-09 09:50:02 -08:00

binfmt_script.c

…

buffer.c

fs: Convert drop_buffers() to use a folio

2022-05-09 23:12:34 -04:00

char_dev.c

…

compat_binfmt_elf.c

binfmt_elf: Introduce KUnit test

2022-03-03 20:38:56 -08:00

coredump.c

ptrace: Cleanups for v5.18

2022-03-28 17:29:53 -07:00

d_path.c

d_path: fix Kernel doc validator complaining

2021-11-06 13:30:32 -07:00

dax.c

libnvdimm for 5.19

2022-05-27 15:49:30 -07:00

dcache.c

mm: dcache: use kmem_cache_alloc_lru() to allocate dentry

2022-03-22 15:57:03 -07:00

direct-io.c

direct-io: remove random prefetches

2022-04-17 19:50:02 -06:00

drop_caches.c

…

eventfd.c

…

eventpoll.c

eventpoll: simplify sysctl declaration with register_sysctl()

2022-01-22 08:33:35 +02:00

exec.c

fix race between exit_itimers() and /proc/pid/timers

2022-07-11 09:52:59 -07:00

fcntl.c

VFS: add FMODE_CAN_ODIRECT file flag

2022-05-09 18:20:49 -07:00

fhandle.c

…

file_table.c

Descriptor handling cleanups

2022-06-04 18:52:00 -07:00

file.c

fix the breakage in close_fd_get_file() calling conventions change

2022-06-05 15:03:03 -04:00

filesystems.c

…

fs_context.c

vfs: fs_context: fix up param length parsing in legacy_parse_param

2022-01-18 09:23:19 +02:00

fs_parser.c

fs_parse: allow parameter value to be empty

2021-12-09 14:09:36 -05:00

fs_pin.c

…

fs_struct.c

…

fs_types.c

…

fs-writeback.c

writeback: Fix inode->i_io_list not be protected by inode->i_lock error

2022-06-06 09:54:30 +02:00

fsopen.c

uninline may_mount() and don't opencode it in fspick(2)/fsopen(2)

2022-05-19 23:25:10 -04:00

init.c

…

inode.c

writeback: Fix inode->i_io_list not be protected by inode->i_lock error

2022-06-06 09:54:30 +02:00

internal.h

Cleanups (and one fix) around struct mount handling.

2022-06-04 19:00:05 -07:00

io_uring.c

io_uring: do not recycle buffer in READV

2022-07-21 08:31:31 -06:00

io-wq.c

io-wq: use __set_notify_signal() to wake workers

2022-04-30 08:39:54 -06:00

io-wq.h

io_uring: add support for IORING_ASYNC_CANCEL_ALL

2022-04-24 18:18:18 -06:00

ioctl.c

Fixes for 5.18-rc1:

2022-04-01 19:35:56 -07:00

Kconfig

mm: hugetlb_vmemmap: cleanup CONFIG_HUGETLB_PAGE_FREE_VMEMMAP*

2022-04-28 23:16:15 -07:00

Kconfig.binfmt

m68knommu: changes for linux 5.19

2022-05-30 10:56:18 -07:00

kernel_read_file.c

…

libfs.c

fs: Convert simple_readpage to simple_read_folio

2022-05-09 16:21:44 -04:00

locks.c

fs/lock: add 2 callbacks to lock_manager_operations to resolve conflict

2022-05-19 12:25:39 -04:00

Makefile

Fix from Christoph Hellwig merging the CONFIG_UNICODE_UTF8_DATA into the

2022-02-01 11:13:24 -08:00

mbcache.c

…

mount.h

…

mpage.c

fs: Change try_to_free_buffers() to take a folio

2022-05-09 23:12:34 -04:00

namei.c

Several cleanups in fs/namei.c.

2022-06-04 19:07:15 -07:00

namespace.c

Cleanups (and one fix) around struct mount handling.

2022-06-04 19:00:05 -07:00

no-block.c

…

nsfs.c

…

open.c

RISC-V Patches for the 5.19 Merge Window, Part 1

2022-05-31 14:10:54 -07:00

pipe.c

Not a lot of material this cycle. Many singleton patches against various

2022-05-27 11:22:03 -07:00

pnode.c

…

pnode.h

…

posix_acl.c

fs: fix acl translation

2022-04-19 10:19:02 -07:00

proc_namespace.c

fs: add is_idmapped_mnt() helper

2021-12-03 18:44:06 +01:00

read_write.c

vfs: fix copy_file_range() regression in cross-fs copies

2022-06-30 15:16:38 -07:00

readdir.c

…

remap_range.c

Revert "vf/remap: return the amount of bytes actually deduplicated"

2022-07-14 15:35:24 -07:00

select.c

select: Fix indefinitely sleeping task in poll_schedule_timeout()

2022-01-11 09:03:05 -08:00

seq_file.c

rxrpc: Fix locking issue

2022-05-22 21:03:01 +01:00

signalfd.c

Merge branch 'signal-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace

2022-01-17 05:49:30 +02:00

splice.c

mm: Convert remove_mapping() to take a folio

2022-03-21 12:59:01 -04:00

stack.c

…

stat.c

RISC-V Patches for the 5.19 Merge Window, Part 1

2022-05-31 14:10:54 -07:00

statfs.c

…

super.c

block: add a bdev_stable_writes helper

2022-04-17 19:49:59 -06:00

sync.c

riscv: compat: syscall: Add compat_sys_call_table implementation

2022-04-26 13:36:25 -07:00

sysctls.c

fs: move namespace sysctls and declare fs base directory

2022-01-22 08:33:36 +02:00

timerfd.c

…

userfaultfd.c

mm/uffd: enable write protection for shmem & hugetlbfs

2022-05-13 07:20:11 -07:00

utimes.c

…

xattr.c

fs: split off do_getxattr from getxattr

2022-04-24 18:18:37 -06:00