linux/fs/xfs/libxfs
Dave Chinner c85007e2e3 xfs: don't use BMBT btree split workers for IO completion
When we split a BMBT due to record insertion, we offload it to a
worker thread because we can be deep in the stack when we try to
allocate a new block for the BMBT. Allocation can use several
kilobytes of stack (full memory reclaim, swap and/or IO path can
end up on the stack during allocation) and we can already be several
kilobytes deep in the stack when we need to split the BMBT.

A recent workload demonstrated a deadlock in this BMBT split
offload. It requires several things to happen at once:

1. two inodes need a BMBT split at the same time, one must be
unwritten extent conversion from IO completion, the other must be
from extent allocation.

2. there must be a no available xfs_alloc_wq worker threads
available in the worker pool.

3. There must be sustained severe memory shortages such that new
kworker threads cannot be allocated to the xfs_alloc_wq pool for
both threads that need split work to be run

4. The split work from the unwritten extent conversion must run
first.

5. when the BMBT block allocation runs from the split work, it must
loop over all AGs and not be able to either trylock an AGF
successfully, or each AGF is is able to lock has no space available
for a single block allocation.

6. The BMBT allocation must then attempt to lock the AGF that the
second task queued to the rescuer thread already has locked before
it finds an AGF it can allocate from.

At this point, we have an ABBA deadlock between tasks queued on the
xfs_alloc_wq rescuer thread and a locked AGF. i.e. The queued task
holding the AGF lock can't be run by the rescuer thread until the
task the rescuer thread is runing gets the AGF lock....

This is a highly improbably series of events, but there it is.

There's a couple of ways to fix this, but the easiest way to ensure
that we only punt tasks with a locked AGF that holds enough space
for the BMBT block allocations to the worker thread.

This works for unwritten extent conversion in IO completion (which
doesn't have a locked AGF and space reservations) because we have
tight control over the IO completion stack. It is typically only 6
functions deep when xfs_btree_split() is called because we've
already offloaded the IO completion work to a worker thread and
hence we don't need to worry about stack overruns here.

The other place we can be called for a BMBT split without a
preceeding allocation is __xfs_bunmapi() when punching out the
center of an existing extent. We don't remove extents in the IO
path, so these operations don't tend to be called with a lot of
stack consumed. Hence we don't really need to ship the split off to
a worker thread in these cases, either.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2023-02-05 08:48:24 -08:00
..
xfs_ag_resv.c xfs: pass perag to xfs_alloc_read_agf() 2022-07-07 19:07:40 +10:00
xfs_ag_resv.h xfs: move perag structure and setup to libxfs/xfs_ag.[ch] 2021-06-02 10:48:24 +10:00
xfs_ag.c xfs: double link the unlinked inode list 2022-07-14 11:46:43 +10:00
xfs_ag.h xfs: create a predicate to verify per-AG extents 2022-10-31 08:58:20 -07:00
xfs_alloc_btree.c xfs: pass perag to xfs_alloc_put_freelist 2022-07-07 19:08:08 +10:00
xfs_alloc_btree.h xfs: use separate btree cursor cache for each btree type 2021-10-19 11:45:16 -07:00
xfs_alloc.c xfs: fix confusing xfs_extent_item variable names 2023-02-05 08:48:11 -08:00
xfs_alloc.h xfs: pass perag to xfs_alloc_read_agfl 2022-07-07 19:08:15 +10:00
xfs_attr_leaf.c xfs: don't leak memory when attr fork loading fails 2022-07-20 16:40:39 -07:00
xfs_attr_leaf.h xfs: don't hold xattr leaf buffers across transaction rolls 2022-06-29 08:47:56 -07:00
xfs_attr_remote.c xfs: rework xfs_buf_incore() API 2022-07-07 22:05:18 +10:00
xfs_attr_remote.h xfs: rename struct xfs_attr_item to xfs_attr_intent 2022-05-22 16:00:26 +10:00
xfs_attr_sf.h xfs: Convert xfs_attr_sf macros to inline functions 2020-09-15 20:52:42 -07:00
xfs_attr.c xfs: replace XFS_IFORK_Q with a proper predicate function 2022-07-12 11:17:27 -07:00
xfs_attr.h xfs: replace XFS_IFORK_Q with a proper predicate function 2022-07-12 11:17:27 -07:00
xfs_bit.c
xfs_bit.h xfs: Use the correct style for SPDX License Identifier 2020-05-13 15:32:45 -07:00
xfs_bmap_btree.c xfs: replace inode fork size macros with functions 2022-07-12 11:17:27 -07:00
xfs_bmap_btree.h xfs: use separate btree cursor cache for each btree type 2021-10-19 11:45:16 -07:00
xfs_bmap.c xfs: pass the xfs_bmbt_irec directly through the log intent code 2023-02-05 08:48:11 -08:00
xfs_bmap.h xfs: pass the xfs_bmbt_irec directly through the log intent code 2023-02-05 08:48:11 -08:00
xfs_btree_staging.c xfs: encode the max btree height in the cursor 2021-10-19 11:45:15 -07:00
xfs_btree_staging.h xfs: xfs_btree_staging.h: delete duplicated words 2020-07-28 20:24:14 -07:00
xfs_btree.c xfs: don't use BMBT btree split workers for IO completion 2023-02-05 08:48:24 -08:00
xfs_btree.h xfs: get rid of assert from xfs_btree_islastblock 2022-12-01 09:36:16 -08:00
xfs_cksum.h
xfs_da_btree.c xfs: trim the mapp array accordingly in xfs_da_grow_inode_int 2022-10-04 16:39:42 +11:00
xfs_da_btree.h xfs: fix TOCTOU race involving the new logged xattrs control knob 2022-06-15 23:13:32 -07:00
xfs_da_format.h Merge tag 'large-extent-counters-v9' of https://github.com/chandanr/linux into xfs-5.19-for-next 2022-04-21 16:46:17 +10:00
xfs_defer.c xfs: share xattr name and value buffers when logging xattr updates 2022-05-23 08:43:46 +10:00
xfs_defer.h xfs: Implement attr logging and replay 2022-05-09 19:09:07 +10:00
xfs_dir2_block.c xfs: replace inode fork size macros with functions 2022-07-12 11:17:27 -07:00
xfs_dir2_data.c xfs: convert bp->b_bn references to xfs_buf_daddr() 2021-08-19 10:07:15 -07:00
xfs_dir2_leaf.c xfs: fix exception caused by unexpected illegal bestcount in leaf dir 2022-10-20 09:42:56 -07:00
xfs_dir2_node.c xfs: convert bp->b_bn references to xfs_buf_daddr() 2021-08-19 10:07:15 -07:00
xfs_dir2_priv.h xfs: constify the name argument to various directory functions 2022-03-14 10:23:17 -07:00
xfs_dir2_sf.c xfs: Remove the unneeded result variable 2022-09-19 06:52:14 +10:00
xfs_dir2.c xfs: rearrange the logic and remove the broken comment for xfs_dir2_isxx 2022-10-04 16:39:58 +11:00
xfs_dir2.h xfs: rearrange the logic and remove the broken comment for xfs_dir2_isxx 2022-10-04 16:39:58 +11:00
xfs_dquot_buf.c xfs: remove the xfs_dqblk_t typedef 2021-10-14 09:19:33 -07:00
xfs_errortag.h xfs: add debug knob to slow down write for fun 2022-11-28 17:54:49 -08:00
xfs_format.h xfs: rename XFS_REFC_COW_START to _COWFLAG 2022-10-31 08:58:22 -07:00
xfs_fs.h Merge tag 'large-extent-counters-v9' of https://github.com/chandanr/linux into xfs-5.19-for-next 2022-04-21 16:46:17 +10:00
xfs_health.h xfs: Use the correct style for SPDX License Identifier 2020-05-13 15:32:45 -07:00
xfs_ialloc_btree.c xfs: make is_log_ag() a first class helper 2022-07-07 19:13:21 +10:00
xfs_ialloc_btree.h xfs: use separate btree cursor cache for each btree type 2021-10-19 11:45:16 -07:00
xfs_ialloc.c treewide: use get_random_u32_below() instead of deprecated function 2022-11-18 02:15:15 +01:00
xfs_ialloc.h xfs: pass perag to xfs_read_agi 2022-07-07 19:07:47 +10:00
xfs_iext_tree.c xfs: prevent metadata files from being inactivated 2021-03-25 16:47:50 -07:00
xfs_inode_buf.c xfs: make attr forks permanent 2022-07-14 09:46:37 -07:00
xfs_inode_buf.h xfs: kill xfs_sb_version_has_v3inode() 2021-08-19 10:07:14 -07:00
xfs_inode_fork.c xfs: clean up "%Ld/%Lu" which doesn't meet C standard 2022-09-19 06:47:14 +10:00
xfs_inode_fork.h xfs: replace inode fork size macros with functions 2022-07-12 11:17:27 -07:00
xfs_log_format.h xfs: refactor all the EFI/EFD log item sizeof logic 2022-10-31 08:58:20 -07:00
xfs_log_recover.h xfs: convert buf_cancel_table allocation to kmalloc_array 2022-05-27 10:27:19 +10:00
xfs_log_rlimit.c xfs: reduce transaction reservations with reflink 2022-04-28 10:25:42 -07:00
xfs_quota_defs.h xfs: remove warning counters from struct xfs_dquot_res 2022-05-11 17:12:09 +10:00
xfs_refcount_btree.c xfs: track cow/shared record domains explicitly in xfs_refcount_irec 2022-10-31 08:58:21 -07:00
xfs_refcount_btree.h xfs: use separate btree cursor cache for each btree type 2021-10-19 11:45:16 -07:00
xfs_refcount.c xfs: pass refcount intent directly through the log intent code 2023-02-05 08:48:11 -08:00
xfs_refcount.h xfs: pass refcount intent directly through the log intent code 2023-02-05 08:48:11 -08:00
xfs_rmap_btree.c xfs: make is_log_ag() a first class helper 2022-07-07 19:13:21 +10:00
xfs_rmap_btree.h xfs: use separate btree cursor cache for each btree type 2021-10-19 11:45:16 -07:00
xfs_rmap.c xfs: pass rmap space mapping directly through the log intent code 2023-02-05 08:48:11 -08:00
xfs_rmap.h xfs: pass rmap space mapping directly through the log intent code 2023-02-05 08:48:11 -08:00
xfs_rtbitmap.c xfs: pass explicit mount pointer to rtalloc query functions 2022-04-12 06:49:41 +10:00
xfs_sb.c xfs: fix sb write verify for lazysbcount 2022-11-16 19:20:20 -08:00
xfs_sb.h xfs: open code sb verifier feature checks 2021-08-19 10:07:13 -07:00
xfs_shared.h xfs: tag transactions that contain intent done items 2022-05-04 11:46:21 +10:00
xfs_symlink_remote.c xfs: convert XFS_IFORK_PTR to a static inline helper 2022-07-09 15:17:21 -07:00
xfs_trans_inode.c xfs: convert xfs_sb_version_has checks to use mount features 2021-08-19 10:07:14 -07:00
xfs_trans_resv.c xfs: increase rename inode reservation 2022-10-26 13:02:24 -07:00
xfs_trans_resv.h xfs: rename xfs_*alloc*_log_count to _block_count 2022-04-28 10:25:59 -07:00
xfs_trans_space.h xfs: compute the maximum height of the rmap btree when reflink enabled 2021-10-19 11:45:16 -07:00
xfs_types.c xfs: Pre-calculate per-AG agino geometry 2022-07-07 19:13:10 +10:00
xfs_types.h xfs: report refcount domain in tracepoints 2022-10-31 08:58:21 -07:00