2254a7396a
syzbot reported this warning from the faux inodegc shrinker that tries
to kick off inodegc work:
------------[ cut here ]------------
WARNING: CPU: 1 PID: 102 at kernel/workqueue.c:1445 __queue_work+0xd44/0x1120 kernel/workqueue.c:1444
RIP: 0010:__queue_work+0xd44/0x1120 kernel/workqueue.c:1444
Call Trace:
__queue_delayed_work+0x1c8/0x270 kernel/workqueue.c:1672
mod_delayed_work_on+0xe1/0x220 kernel/workqueue.c:1746
xfs_inodegc_shrinker_scan fs/xfs/xfs_icache.c:2212 [inline]
xfs_inodegc_shrinker_scan+0x250/0x4f0 fs/xfs/xfs_icache.c:2191
do_shrink_slab+0x428/0xaa0 mm/vmscan.c:853
shrink_slab+0x175/0x660 mm/vmscan.c:1013
shrink_one+0x502/0x810 mm/vmscan.c:5343
shrink_many mm/vmscan.c:5394 [inline]
lru_gen_shrink_node mm/vmscan.c:5511 [inline]
shrink_node+0x2064/0x35f0 mm/vmscan.c:6459
kswapd_shrink_node mm/vmscan.c:7262 [inline]
balance_pgdat+0xa02/0x1ac0 mm/vmscan.c:7452
kswapd+0x677/0xd60 mm/vmscan.c:7712
kthread+0x2e8/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
This warning corresponds to this code in __queue_work:
/*
* For a draining wq, only works from the same workqueue are
* allowed. The __WQ_DESTROYING helps to spot the issue that
* queues a new work item to a wq after destroy_workqueue(wq).
*/
if (unlikely(wq->flags & (__WQ_DESTROYING | __WQ_DRAINING) &&
WARN_ON_ONCE(!is_chained_work(wq))))
return;
For this to trip, we must have a thread draining the inodedgc workqueue
and a second thread trying to queue inodegc work to that workqueue.
This can happen if freezing or a ro remount race with reclaim poking our
faux inodegc shrinker and another thread dropping an unlinked O_RDONLY
file:
Thread 0 Thread 1 Thread 2
xfs_inodegc_stop
xfs_inodegc_shrinker_scan
xfs_is_inodegc_enabled
<yes, will continue>
xfs_clear_inodegc_enabled
xfs_inodegc_queue_all
<list empty, do not queue inodegc worker>
xfs_inodegc_queue
<add to list>
xfs_is_inodegc_enabled
<no, returns>
drain_workqueue
<set WQ_DRAINING>
llist_empty
<no, will queue list>
mod_delayed_work_on(..., 0)
__queue_work
<sees WQ_DRAINING, kaboom>
In other words, everything between the access to inodegc_enabled state
and the decision to poke the inodegc workqueue requires some kind of
coordination to avoid the WQ_DRAINING state. We could perhaps introduce
a lock here, but we could also try to eliminate WQ_DRAINING from the
picture.
We could replace the drain_workqueue call with a loop that flushes the
workqueue and queues workers as long as there is at least one inode
present in the per-cpu inodegc llists. We've disabled inodegc at this
point, so we know that the number of queued inodes will eventually hit
zero as long as xfs_inodegc_start cannot reactivate the workers.
There are four callers of xfs_inodegc_start. Three of them come from the
VFS with s_umount held: filesystem thawing, failed filesystem freezing,
and the rw remount transition. The fourth caller is mounting rw (no
remount or freezing possible).
There are three callers ofs xfs_inodegc_stop. One is unmounting (no
remount or thaw possible). Two of them come from the VFS with s_umount
held: fs freezing and ro remount transition.
Hence, it is correct to replace the drain_workqueue call with a loop
that drains the inodegc llists.
Fixes:
|
||
---|---|---|
.. | ||
libxfs | ||
scrub | ||
Kconfig | ||
kmem.c | ||
kmem.h | ||
Makefile | ||
mrlock.h | ||
xfs_acl.c | ||
xfs_acl.h | ||
xfs_aops.c | ||
xfs_aops.h | ||
xfs_attr_inactive.c | ||
xfs_attr_item.c | ||
xfs_attr_item.h | ||
xfs_attr_list.c | ||
xfs_bio_io.c | ||
xfs_bmap_item.c | ||
xfs_bmap_item.h | ||
xfs_bmap_util.c | ||
xfs_bmap_util.h | ||
xfs_buf_item_recover.c | ||
xfs_buf_item.c | ||
xfs_buf_item.h | ||
xfs_buf.c | ||
xfs_buf.h | ||
xfs_dahash_test.c | ||
xfs_dahash_test.h | ||
xfs_dir2_readdir.c | ||
xfs_discard.c | ||
xfs_discard.h | ||
xfs_dquot_item_recover.c | ||
xfs_dquot_item.c | ||
xfs_dquot_item.h | ||
xfs_dquot.c | ||
xfs_dquot.h | ||
xfs_drain.c | ||
xfs_drain.h | ||
xfs_error.c | ||
xfs_error.h | ||
xfs_export.c | ||
xfs_export.h | ||
xfs_extent_busy.c | ||
xfs_extent_busy.h | ||
xfs_extfree_item.c | ||
xfs_extfree_item.h | ||
xfs_file.c | ||
xfs_filestream.c | ||
xfs_filestream.h | ||
xfs_fsmap.c | ||
xfs_fsmap.h | ||
xfs_fsops.c | ||
xfs_fsops.h | ||
xfs_globals.c | ||
xfs_health.c | ||
xfs_icache.c | ||
xfs_icache.h | ||
xfs_icreate_item.c | ||
xfs_icreate_item.h | ||
xfs_inode_item_recover.c | ||
xfs_inode_item.c | ||
xfs_inode_item.h | ||
xfs_inode.c | ||
xfs_inode.h | ||
xfs_ioctl32.c | ||
xfs_ioctl32.h | ||
xfs_ioctl.c | ||
xfs_ioctl.h | ||
xfs_iomap.c | ||
xfs_iomap.h | ||
xfs_iops.c | ||
xfs_iops.h | ||
xfs_itable.c | ||
xfs_itable.h | ||
xfs_iunlink_item.c | ||
xfs_iunlink_item.h | ||
xfs_iwalk.c | ||
xfs_iwalk.h | ||
xfs_linux.h | ||
xfs_log_cil.c | ||
xfs_log_priv.h | ||
xfs_log_recover.c | ||
xfs_log.c | ||
xfs_log.h | ||
xfs_message.c | ||
xfs_message.h | ||
xfs_mount.c | ||
xfs_mount.h | ||
xfs_mru_cache.c | ||
xfs_mru_cache.h | ||
xfs_notify_failure.c | ||
xfs_ondisk.h | ||
xfs_pnfs.c | ||
xfs_pnfs.h | ||
xfs_pwork.c | ||
xfs_pwork.h | ||
xfs_qm_bhv.c | ||
xfs_qm_syscalls.c | ||
xfs_qm.c | ||
xfs_qm.h | ||
xfs_quota.h | ||
xfs_quotaops.c | ||
xfs_refcount_item.c | ||
xfs_refcount_item.h | ||
xfs_reflink.c | ||
xfs_reflink.h | ||
xfs_rmap_item.c | ||
xfs_rmap_item.h | ||
xfs_rtalloc.c | ||
xfs_rtalloc.h | ||
xfs_stats.c | ||
xfs_stats.h | ||
xfs_super.c | ||
xfs_super.h | ||
xfs_symlink.c | ||
xfs_symlink.h | ||
xfs_sysctl.c | ||
xfs_sysctl.h | ||
xfs_sysfs.c | ||
xfs_sysfs.h | ||
xfs_trace.c | ||
xfs_trace.h | ||
xfs_trans_ail.c | ||
xfs_trans_buf.c | ||
xfs_trans_dquot.c | ||
xfs_trans_priv.h | ||
xfs_trans.c | ||
xfs_trans.h | ||
xfs_xattr.c | ||
xfs_xattr.h | ||
xfs.h |