linux/fs/btrfs
Jeff Mahoney 144439376b btrfs: fix lockup in find_free_extent with read-only block groups
If we have a block group that is all of the following:
1) uncached in memory
2) is read-only
3) has a disk cache state that indicates we need to recreate the cache

AND the file system has enough free space fragmentation such that the
request for an extent of a given size can't be honored;

AND have a single CPU core;

AND it's the block group with the highest starting offset such that
there are no opportunities (like reading from disk) for the loop to
yield the CPU;

We can end up with a lockup.

The root cause is simple.  Once we're in the position that we've read in
all of the other block groups directly and none of those block groups
can honor the request, there are no more opportunities to sleep.  We end
up trying to start a caching thread which never gets run if we only have
one core.  This *should* present as a hung task waiting on the caching
thread to make some progress, but it doesn't.  Instead, it degrades into
a busy loop because of the placement of the read-only check.

During the first pass through the loop, block_group->cached will be set
to BTRFS_CACHE_STARTED and have_caching_bg will be set.  Then we hit the
read-only check and short circuit the loop.  We're not yet in
LOOP_CACHING_WAIT, so we skip that loop back before going through the
loop again for other raid groups.

Then we move to LOOP_CACHING_WAIT state.

During the this pass through the loop, ->cached will still be
BTRFS_CACHE_STARTED, which means it's not cached, so we'll enter
cache_block_group, do a lot of nothing, and return, and also set
have_caching_bg again.  Then we hit the read-only check and short circuit
the loop.  The same thing happens as before except now we DO trigger
the LOOP_CACHING_WAIT && have_caching_bg check and loop back up to the
top.  We do this forever.

There are two fixes in this patch since they address the same underlying
bug.

The first is to add a cond_resched to the end of the loop to ensure
that the caching thread always has an opportunity to run.  This will
fix the soft lockup issue, but find_free_extent will still loop doing
nothing until the thread has completed.

The second is to move the read-only check to the top of the loop.  We're
never going to return an allocation within a read-only block group so
we may as well skip it early.  The check for ->cached == BTRFS_CACHE_ERROR
would cause the same problem except that BTRFS_CACHE_ERROR is considered
a "done" state and we won't re-set have_caching_bg again.

Many thanks to Stephan Kulow <coolo@suse.de> for his excellent help in
the testing process.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-07-24 16:04:02 +02:00
..
tests Btrfs: replace tree->mapping with tree->private_data 2017-06-19 18:25:58 +02:00
acl.c btrfs: Don't clear SGID when inheriting ACLs 2017-06-29 20:24:59 +02:00
async-thread.c btrfs: fix crash when tracepoint arguments are freed by wq callbacks 2017-01-09 11:24:50 +01:00
async-thread.h btrfs: limit async_work allocation and worker func duration 2016-12-13 11:01:30 -08:00
backref.c btrfs: use GFP_KERNEL in init_ipath 2017-06-19 18:26:02 +02:00
backref.h
btrfs_inode.h Btrfs: fix reported number of inode blocks 2017-04-26 16:27:26 +01:00
check-integrity.c btrfs: sink gfp parameter to btrfs_io_bio_alloc 2017-06-19 18:26:04 +02:00
check-integrity.h btrfs: take an fs_info directly when the root is not used otherwise 2016-12-06 16:06:59 +01:00
compression.c btrfs: cloned bios must not be iterated by bio_for_each_segment_all 2017-07-14 20:39:31 +02:00
compression.h btrfs: reduce arguments for decompress_bio ops 2017-06-19 18:26:00 +02:00
ctree.c btrfs: adjust includes after vmalloc removal 2017-06-19 18:26:02 +02:00
ctree.h btrfs: qgroup: Fix qgroup reserved space underflow by only freeing reserved ranges 2017-06-29 20:17:02 +02:00
dedupe.h btrfs: expand cow_file_range() to support in-band dedup and subpage-blocksize 2016-07-26 13:52:25 +02:00
delayed-inode.c btrfs: convert btrfs_delayed_item.refs from atomic_t to refcount_t 2017-04-18 14:07:23 +02:00
delayed-inode.h btrfs: convert btrfs_delayed_item.refs from atomic_t to refcount_t 2017-04-18 14:07:23 +02:00
delayed-ref.c Btrfs: return old and new total ref mods when adding delayed refs 2017-06-29 20:17:01 +02:00
delayed-ref.h Btrfs: return old and new total ref mods when adding delayed refs 2017-06-29 20:17:01 +02:00
dev-replace.c btrfs: fix integer overflow in calc_reclaim_items_nr 2017-06-29 20:17:02 +02:00
dev-replace.h btrfs: constify device path passed to relevant helpers 2017-02-28 14:26:07 +01:00
dir-item.c btrfs: fix validation of XATTR_ITEM dir items 2017-06-29 20:06:11 +02:00
disk-io.c btrfs: cloned bios must not be iterated by bio_for_each_segment_all 2017-07-14 20:39:31 +02:00
disk-io.h btrfs: btrfs_wait_tree_block_writeback can be void return 2017-06-19 18:26:01 +02:00
export.c btrfs: Check name_len before reading btrfs_get_name 2017-06-21 19:16:04 +02:00
export.h
extent_io.c Btrfs: fix unexpected return value of bio_readpage_error 2017-07-14 20:42:37 +02:00
extent_io.h Btrfs: fix unexpected return value of bio_readpage_error 2017-07-14 20:42:37 +02:00
extent_map.c btrfs: convert extent_map.refs from atomic_t to refcount_t 2017-04-18 14:07:23 +02:00
extent_map.h btrfs: convert extent_map.refs from atomic_t to refcount_t 2017-04-18 14:07:23 +02:00
extent-tree.c btrfs: fix lockup in find_free_extent with read-only block groups 2017-07-24 16:04:02 +02:00
file-item.c Btrfs: change how we iterate bios in endio 2017-06-19 18:25:59 +02:00
file.c btrfs: qgroup: Fix qgroup reserved space underflow by only freeing reserved ranges 2017-06-29 20:17:02 +02:00
free-space-cache.c btrfs: use clear_page where appropriate 2017-04-18 14:07:26 +02:00
free-space-cache.h btrfs: free-space-cache, clean up unnecessary root arguments 2017-02-17 12:03:56 +01:00
free-space-tree.c Btrfs: use memalloc_nofs and kvzalloc() for free space tree bitmaps 2017-06-19 18:26:01 +02:00
free-space-tree.h
hash.c crypto: Work around deallocated stack frame reference gcc bug on sparc. 2017-06-08 17:36:03 +08:00
hash.h btrfs: advertise which crc32c implementation is being used at module load 2016-06-06 14:08:28 +02:00
inode-item.c btrfs: take an fs_info directly when the root is not used otherwise 2016-12-06 16:06:59 +01:00
inode-map.c btrfs: qgroup: Introduce extent changeset for qgroup reserve functions 2017-06-29 20:17:02 +02:00
inode-map.h
inode.c btrfs: btrfs_create_repair_bio never fails, skip error handling 2017-07-14 20:42:08 +02:00
ioctl.c btrfs: fix integer overflow in calc_reclaim_items_nr 2017-06-29 20:17:02 +02:00
Kconfig
locking.c
locking.h
lzo.c btrfs: switch to kvmalloc and GFP_KERNEL in lzo/zlib alloc_workspace 2017-06-19 18:26:02 +02:00
Makefile
math.h
ordered-data.c btrfs: fix integer overflow in calc_reclaim_items_nr 2017-06-29 20:17:02 +02:00
ordered-data.h btrfs: fix integer overflow in calc_reclaim_items_nr 2017-06-29 20:17:02 +02:00
orphan.c
print-tree.c Btrfs: let btrfs_print_leaf print more about block group 2017-06-19 18:26:00 +02:00
print-tree.h btrfs: take an fs_info directly when the root is not used otherwise 2016-12-06 16:06:59 +01:00
props.c btrfs: Verify dir_item in iterate_object_props 2017-06-21 19:16:04 +02:00
props.h
qgroup.c btrfs: fix integer overflow in calc_reclaim_items_nr 2017-06-29 20:17:02 +02:00
qgroup.h btrfs: qgroup: Fix qgroup reserved space underflow by only freeing reserved ranges 2017-06-29 20:17:02 +02:00
raid56.c Btrfs: fix write corruption due to bio cloning on raid5/6 2017-07-13 19:26:01 +01:00
raid56.h btrfs: take an fs_info directly when the root is not used otherwise 2016-12-06 16:06:59 +01:00
rcu-string.h
reada.c btrfs: remove unused member err from reada_extent 2017-06-19 18:25:59 +02:00
relocation.c btrfs: fix integer overflow in calc_reclaim_items_nr 2017-06-29 20:17:02 +02:00
root-tree.c btrfs: Check name_len before in btrfs_del_root_ref 2017-06-21 19:16:04 +02:00
scrub.c btrfs: fix integer overflow in calc_reclaim_items_nr 2017-06-29 20:17:02 +02:00
send.c Btrfs: incremental send, fix invalid memory access 2017-07-06 23:02:30 +01:00
send.h
struct-funcs.c btrfs: fix string and comment grammatical issues and typos 2016-05-25 22:35:14 +02:00
super.c btrfs: fix integer overflow in calc_reclaim_items_nr 2017-06-29 20:17:02 +02:00
sysfs.c btrfs: Add quota_override knob into sysfs 2017-06-19 18:25:58 +02:00
sysfs.h
transaction.c btrfs: fix integer overflow in calc_reclaim_items_nr 2017-06-29 20:17:02 +02:00
transaction.h btrfs: remove unused qgroup members from btrfs_trans_handle 2017-04-18 14:07:25 +02:00
tree-defrag.c
tree-log.c Btrfs: fix dir item validation when replaying xattr deletes 2017-07-19 20:38:16 +02:00
tree-log.h btrfs: Make btrfs_del_inode_ref take btrfs_inode 2017-02-14 15:50:54 +01:00
ulist.c btrfs: ulist: rename ulist_fini to ulist_release 2017-02-17 12:03:50 +01:00
ulist.h btrfs: ulist: rename ulist_fini to ulist_release 2017-02-17 12:03:50 +01:00
uuid-tree.c btrfs: return the actual error value from from btrfs_uuid_tree_iterate 2016-12-19 18:08:15 +01:00
volumes.c btrfs: preallocate device flush bio 2017-06-21 19:03:38 +02:00
volumes.h btrfs: preallocate device flush bio 2017-06-21 19:03:38 +02:00
xattr.c btrfs: Check name_len with boundary in verify dir_item 2017-06-21 19:16:04 +02:00
xattr.h
zlib.c btrfs: switch to kvmalloc and GFP_KERNEL in lzo/zlib alloc_workspace 2017-06-19 18:26:02 +02:00