linux

iv/linux

Filipe Manana 9ad6d91f05 btrfs: fix log replay failure due to race with space cache rebuild

After a sudden power failure we may end up with a space cache on disk that
is not valid and needs to be rebuilt from scratch.

If that happens, during log replay when we attempt to pin an extent buffer
from a log tree, at btrfs_pin_extent_for_log_replay(), we do not wait for
the space cache to be rebuilt through the call to:

    btrfs_cache_block_group(cache, 1);

That call only waits for the task (work queue job) that loads the space
cache to change the cache state from BTRFS_CACHE_FAST to any other value.
That is ok when the space cache on disk exists and is valid, but when the
cache is not valid and needs to be rebuilt, the call returns as soon as
the cache state changes to BTRFS_CACHE_STARTED (done at caching_thread()).
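
The difference between the two wait conditions can be sketched with a small
userspace model. This is only an illustration: the toy_* names and the enum
values are hypothetical and are not the kernel's definitions, and the
"finished or error" predicate is an assumption about what waiting for the
caching task to finish completely means.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy stand-ins for the block group caching states named in the text.
 * The numeric values are arbitrary and are not taken from the kernel.
 */
enum toy_cache_state {
	TOY_CACHE_NO,
	TOY_CACHE_FAST,     /* trying to load the on-disk space cache */
	TOY_CACHE_STARTED,  /* rebuilding the cache from scratch */
	TOY_CACHE_FINISHED, /* free space information is complete */
	TOY_CACHE_ERROR,
};

/* Condition described above: satisfied once the state leaves FAST. */
bool toy_fast_load_done(enum toy_cache_state state)
{
	return state != TOY_CACHE_FAST;
}

/* Stricter condition (assumed): caching completed or failed. */
bool toy_caching_done(enum toy_cache_state state)
{
	return state == TOY_CACHE_FINISHED || state == TOY_CACHE_ERROR;
}
```

In this model a waiter using toy_fast_load_done() is released in the
TOY_CACHE_STARTED state, while a waiter using toy_caching_done() is not,
which is the window the log replay code falls into.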

So this means that we can end up trying to unpin a range which is not yet
marked as free in the block group. This results in the call to
btrfs_remove_free_space() returning -EINVAL to
btrfs_pin_extent_for_log_replay(), which in turn makes the log replay
fail, as well as the mounting of the filesystem. More specifically, the
-EINVAL comes from free_space_cache.c:remove_from_bitmap(): because the
requested range is not marked as free space (ones in the bitmap), the
following condition is triggered:

    static noinline int remove_from_bitmap(struct btrfs_free_space_ctl *ctl,
    (...)
            if (ret < 0 || search_start != *offset)
                    return -EINVAL;
    (...)

It is the "search_start != *offset" check that makes the condition
evaluate to true.
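
A simplified userspace model of that check is sketched below. It is not the
kernel code: the toy_* names are hypothetical, the bitmap is a single 64-bit
word with one bit per unit of space, and, unlike the real function, the
sketch refuses ranges that are only partially free.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_BITS 64

/* Return the first set (free) bit at or after start, or TOY_BITS if none. */
size_t toy_next_free(uint64_t bitmap, size_t start)
{
	for (size_t i = start; i < TOY_BITS; i++)
		if (bitmap & (UINT64_C(1) << i))
			return i;
	return TOY_BITS;
}

/*
 * Remove [offset, offset + len) from the free space bitmap.
 * Fails with -EINVAL when the free space found by the search does not
 * start exactly at offset, mirroring the condition quoted above.
 */
int toy_remove_free(uint64_t *bitmap, size_t offset, size_t len)
{
	size_t search_start;

	if (len == 0 || offset >= TOY_BITS || len > TOY_BITS - offset)
		return -EINVAL;

	search_start = toy_next_free(*bitmap, offset);
	if (search_start != offset)
		return -EINVAL;

	/* Simplification: require the whole range to be free. */
	for (size_t i = offset; i < offset + len; i++)
		if (!(*bitmap & (UINT64_C(1) << i)))
			return -EINVAL;

	for (size_t i = offset; i < offset + len; i++)
		*bitmap &= ~(UINT64_C(1) << i);
	return 0;
}
```

With a bitmap where the requested range was never marked free (all zeroes,
or free space that only starts later), toy_remove_free() returns -EINVAL,
which is the situation log replay hits while the cache is still being
rebuilt.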

When this happens we get the following in dmesg/syslog:

[72383.415114] BTRFS: device fsid 32b95b69-0ea9-496a-9f02-3f5a56dc9322 devid 1 transid 1432 /dev/sdb scanned by mount (3816007)
[72383.417837] BTRFS info (device sdb): disk space caching is enabled
[72383.418536] BTRFS info (device sdb): has skinny extents
[72383.423846] BTRFS info (device sdb): start tree-log replay
[72383.426416] BTRFS warning (device sdb): block group 30408704 has wrong amount of free space
[72383.427686] BTRFS warning (device sdb): failed to load free space cache for block group 30408704, rebuilding it now
[72383.454291] BTRFS: error (device sdb) in btrfs_recover_log_trees:6203: errno=-22 unknown (Failed to pin buffers while recovering log root tree.)
[72383.456725] BTRFS: error (device sdb) in btrfs_replay_log:2253: errno=-22 unknown (Failed to recover log tree)
[72383.460241] BTRFS error (device sdb): open_ctree failed

We also mark the range for the extent buffer in the excluded extents io
tree. That is fine when the space cache is valid on disk and we can load
it, in which case it causes no problems.

However, when we need to rebuild the space cache, because it is either
invalid or missing, having the extent buffer range marked in the excluded
extents io tree leads to a -EINVAL failure from the call to
btrfs_remove_free_space(), causing the log replay and the mount to fail.
This is because, with the range marked in the excluded extents io tree,
the caching thread never adds the range of the extent buffer as free
space in the block group: the calls to add_new_free_space(), made from
load_extent_tree_free(), filter out any ranges that are marked as
excluded extents.
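
The effect of that filtering can be illustrated with a toy rebuild routine.
This is a sketch under stated assumptions, not the kernel's algorithm: the
toy_* names and struct toy_range are hypothetical, space is modelled as at
most 64 units in one word, and the excluded ranges are a plain array
instead of an io tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one entry of the excluded extents io tree. */
struct toy_range {
	size_t start;
	size_t len;
};

/* True if the given unit falls inside any of the n excluded ranges. */
bool toy_is_excluded(const struct toy_range *ex, size_t n, size_t unit)
{
	for (size_t i = 0; i < n; i++)
		if (unit >= ex[i].start && unit - ex[i].start < ex[i].len)
			return true;
	return false;
}

/*
 * Model of a cache rebuild: every unit in [start, end) becomes free space
 * unless it is excluded. Units at or beyond 64 are ignored.
 */
uint64_t toy_rebuild_free_space(size_t start, size_t end,
				const struct toy_range *ex, size_t n)
{
	uint64_t free_bits = 0;

	for (size_t u = start; u < end && u < 64; u++)
		if (!toy_is_excluded(ex, n, u))
			free_bits |= UINT64_C(1) << u;
	return free_bits;
}
```

If units 4 and 5 stand for the extent buffer from the log tree, rebuilding
with that range excluded leaves those bits clear, so a later attempt to
remove them from free space has nothing to remove. Rebuilding without the
exclusion marks them free, which is what the removal expects to find.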

So fix this by making sure that during log replay we wait for the caching
task to finish completely when we need to rebuild a space cache, and by
no longer marking the extent buffer range in the excluded extents io
tree, which also removes the need to clear ranges from that tree at
btrfs_finish_extent_commit().

This started to happen with some frequency on large filesystems having
block groups with a lot of fragmentation since the recent commit
e747853cae3ae3 ("btrfs: load free space cache asynchronously"), but in
fact the issue has been there for years. It was just much less likely
to happen.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

2021-01-25 18:44:53 +01:00

fs: 9p: add generic splice_write file operation

2020-12-01 21:40:47 +01:00

adfs

Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

2020-10-24 12:26:05 -07:00

affs

Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

2020-10-24 12:26:05 -07:00

afs

afs: Fix speculative status fetch going out of order wrt to modifications

2020-11-22 11:27:03 -08:00

autofs

autofs: harden ioctl table

2020-10-16 11:11:22 -07:00

befs

[PATCH] reduce boilerplate in fsid handling

2020-09-18 16:45:50 -04:00

bfs

[PATCH] reduce boilerplate in fsid handling

2020-09-18 16:45:50 -04:00

btrfs

btrfs: fix log replay failure due to race with space cache rebuild

2021-01-25 18:44:53 +01:00

cachefiles

cachefiles: Handle readpage error correctly

2020-10-26 10:42:54 -07:00

ceph

ceph: check session state after bumping session->s_seq

2020-11-04 20:55:49 +01:00

cifs

cifs: refactor create_sd_buf() and and avoid corrupting the buffer

2020-12-03 17:12:14 -06:00

coda

docs: filesystems: convert coda.txt to ReST

2020-05-05 09:22:21 -06:00

configfs

fs: configfs: delete repeated words in comments

2020-10-16 11:11:19 -07:00

cramfs

[PATCH] reduce boilerplate in fsid handling

2020-09-18 16:45:50 -04:00

crypto

fscrypt: fix inline encryption not used on new files

2020-11-11 20:59:07 -08:00

debugfs

debugfs: remove return value of debugfs_create_devm_seqfile()

2020-10-30 08:37:39 +01:00

devpts

…

dlm

networking changes for the 5.10 merge window

2020-10-15 18:42:13 -07:00

ecryptfs

mm, treewide: rename kzfree() to kfree_sensitive()

2020-08-07 11:33:22 -07:00

efivarfs

efivarfs: revert "fix memory leak in efivarfs_create()"

2020-11-25 16:55:02 +01:00

efs

[PATCH] reduce boilerplate in fsid handling

2020-09-18 16:45:50 -04:00

erofs

erofs: fix setting up pcluster for temporary pages

2020-11-04 09:15:48 +08:00

exfat

Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

2020-10-24 12:26:05 -07:00

exportfs

…

ext2

Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

2020-10-24 12:26:05 -07:00

ext4

ext4: fix bogus warning in ext4_update_dx_flag()

2020-11-19 22:41:10 -05:00

f2fs

Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

2020-10-24 12:26:05 -07:00

fat

[PATCH] reduce boilerplate in fsid handling

2020-09-18 16:45:50 -04:00

freevxfs

…

fscache

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

2020-06-03 16:27:18 -07:00

fuse

fuse update for 5.10

2020-10-19 14:28:30 -07:00

gfs2

gfs2: Fix deadlock between gfs2_{create_inode,inode_lookup} and delete_work_func

2020-12-01 00:21:10 +01:00

hfs

fs: Replace zero-length array with flexible-array member

2020-10-29 17:22:59 -05:00

hfsplus

fs: Replace zero-length array with flexible-array member

2020-10-29 17:22:59 -05:00

hostfs

hostfs: Use kasprintf() instead of fixed buffer formatting

2020-03-29 23:23:00 +02:00

hpfs

[PATCH] reduce boilerplate in fsid handling

2020-09-18 16:45:50 -04:00

hugetlbfs

hugetlbfs: prevent filesystem stacking of hugetlbfs

2020-08-12 10:57:56 -07:00

iomap

iomap: clean up writeback state logic on writepage error

2020-11-04 08:52:46 -08:00

isofs

fs: Replace zero-length array with flexible-array member

2020-10-29 17:22:59 -05:00

jbd2

jbd2: fix kernel-doc markups

2020-11-19 22:38:29 -05:00

jffs2

treewide: Use fallthrough pseudo-keyword

2020-08-23 17:36:59 -05:00

jfs

fs: Introduce i_blocks_per_page

2020-09-21 08:59:26 -07:00

kernfs

fsnotify: pass dir and inode arguments to fsnotify()

2020-07-27 23:15:48 +02:00

lockd

The one new feature this time, from Anna Schumaker, is READ_PLUS, which

2020-10-22 09:44:27 -07:00

minix

[PATCH] reduce boilerplate in fsid handling

2020-09-18 16:45:50 -04:00

nfs

NFS: Remove unnecessary inode lock in nfs_fsync_dir()

2020-11-12 10:41:26 -05:00

nfs_common

NFSv4.2: Fix NFS4ERR_STALE error when doing inter server copy

2020-10-21 10:31:20 -04:00

nfsd

NFSD: fix missing refcount in nfsd4_copy by nfsd4_do_async_copy

2020-11-05 17:25:14 -05:00

nilfs2

Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

2020-10-24 12:26:05 -07:00

nls

treewide: replace '---help---' in Kconfig files with 'help'

2020-06-14 01:57:21 +09:00

notify

fanotify: fix logic of reporting name info with watched parent

2020-11-09 15:03:08 +01:00

ntfs

Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

2020-10-24 12:26:05 -07:00

ocfs2

ocfs2: initialize ip_next_orphan

2020-11-14 11:26:04 -08:00

omfs

fs: omfs: use kmemdup() rather than kmalloc+memcpy

2020-09-22 23:39:45 -04:00

openpromfs

…

orangefs

orangefs: remove unnecessary assignment to variable ret

2020-08-04 15:01:58 -04:00

overlayfs

ovl: use generic vfs_ioc_setflags_prepare() helper

2020-10-06 15:38:15 +02:00

proc

io_uring-5.10-2020-11-20

2020-11-20 11:47:22 -08:00

pstore

treewide: Use fallthrough pseudo-keyword

2020-08-23 17:36:59 -05:00

qnx4

[PATCH] reduce boilerplate in fsid handling

2020-09-18 16:45:50 -04:00

qnx6

[PATCH] reduce boilerplate in fsid handling

2020-09-18 16:45:50 -04:00

quota

2020-10-15 14:56:15 -07:00

ramfs

ramfs: fix nommu mmap with gaps in the page cache

2020-10-16 11:11:22 -07:00

reiserfs

reiserfs: Fix oops during mount

2020-10-01 11:15:31 +02:00

romfs

Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

2020-10-24 12:26:05 -07:00

squashfs

Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

2020-10-24 12:26:05 -07:00

sysfs

sysfs: Add sysfs_emit and sysfs_emit_at to format sysfs output

2020-10-02 12:02:30 +02:00

sysv

[PATCH] reduce boilerplate in fsid handling

2020-09-18 16:45:50 -04:00

tracefs

…

ubifs

This pull request contains fixes for UBI and UBIFS

2020-10-18 09:56:50 -07:00

udf

Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

2020-10-24 12:26:05 -07:00

ufs

Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

2020-10-24 12:26:05 -07:00

unicode

unicode: Add utf8_casefold_hash

2020-09-10 14:03:31 -07:00

vboxsf

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial

2020-10-15 15:11:56 -07:00

verity

fs-verity: use smp_load_acquire() for ->i_verity_info

2020-07-21 16:02:41 -07:00

xfs

xfs: revert "xfs: fix rmap key and record comparison functions"

2020-11-19 15:17:50 -08:00

zonefs

Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

2020-10-24 12:26:05 -07:00

aio.c

vfs: separate __sb_start_write into blocking and non-blocking helpers

2020-11-10 16:53:07 -08:00

anon_inodes.c

…

attr.c

…

bad_inode.c

fs: move the fiemap definitions out of fs.h

2020-06-03 23:16:55 -04:00

binfmt_aout.c

exec: Rename flush_old_exec begin_new_exec

2020-05-07 16:55:47 -05:00

binfmt_elf_fdpic.c

binfmt_elf, binfmt_elf_fdpic: use a VMA list snapshot

2020-10-16 11:11:21 -07:00

binfmt_elf.c

fs: Replace zero-length array with flexible-array member

2020-10-29 17:22:59 -05:00

binfmt_em86.c

Merge branch 'akpm' (patches from Andrew)

2020-06-04 19:18:29 -07:00

binfmt_flat.c

binfmt_flat: revert "binfmt_flat: don't offset the data start"

2020-08-24 08:49:13 +10:00

binfmt_misc.c

Merge branch 'akpm' (patches from Andrew)

2020-06-04 19:18:29 -07:00

binfmt_script.c

Merge branch 'akpm' (patches from Andrew)

2020-06-04 19:18:29 -07:00

block_dev.c

block: add a bdget_part helper

2020-10-05 10:38:33 -06:00

buffer.c

mm, memcg: rework remote charging API to support nesting

2020-10-18 09:27:09 -07:00

char_dev.c

vfs: allow unprivileged whiteout creation

2020-05-14 16:44:23 +02:00

compat_binfmt_elf.c

Split the old READ_IMPLIES_EXEC workaround from executable PT_GNU_STACK

2020-06-05 13:45:21 -07:00

coredump.c

coredump: fix core_pattern parse error

2020-12-06 10:19:07 -08:00

d_path.c

fs: fix NULL dereference due to data race in prepend_path()

2020-10-14 14:54:45 -07:00

dax.c

fuse update for 5.10

2020-10-19 14:28:30 -07:00

dcache.c

vfs: Use sequence counter with associated spinlock

2020-07-29 16:14:27 +02:00

dcookies.c

…

direct-io.c

2020-10-15 15:03:10 -07:00

drop_caches.c

sysctl: pass kernel pointers to ->proc_handler

2020-04-27 02:07:40 -04:00

eventfd.c

eventfd: convert to f_op->read_iter()

2020-05-06 22:33:43 -04:00

eventpoll.c

ep_create_wakeup_source(): dentry name can change under you...

2020-09-24 19:41:58 -04:00

exec.c

powerpc updates for 5.10

2020-10-16 12:21:15 -07:00

fcntl.c

treewide: Use fallthrough pseudo-keyword

2020-08-23 17:36:59 -05:00

fhandle.c

…

file_table.c

task_work: cleanup notification modes

2020-10-17 15:05:30 -06:00

file.c

io_uring: don't rely on weak ->files references

2020-09-30 20:32:32 -06:00

filesystems.c

fs/filesystems.c: downgrade user-reachable WARN_ONCE() to pr_warn_once()

2020-04-10 15:36:22 -07:00

fs_context.c

treewide: Use fallthrough pseudo-keyword

2020-08-23 17:36:59 -05:00

fs_parser.c

fs_parse: mark fs_param_bad_value() as static

2020-10-13 18:38:27 -07:00

fs_pin.c

…

fs_struct.c

vfs: Use sequence counter with associated spinlock

2020-07-29 16:14:27 +02:00

fs_types.c

…

fs-writeback.c

block-5.10-2020-10-12

2020-10-13 12:12:44 -07:00

fsopen.c

treewide: Use fallthrough pseudo-keyword

2020-08-23 17:36:59 -05:00

init.c

init: add an init_dup helper

2020-08-04 21:02:38 -04:00

inode.c

fs: add a filesystem flag for THPs

2020-10-16 11:11:15 -07:00

internal.h

fs: remove compat_sys_mount

2020-09-22 23:45:57 -04:00

io_uring.c

io_uring: fix recvmsg setup with compat buf-select

2020-11-30 11:12:03 -07:00

io-wq.c

io-wq: cancel request if it's asking for files and we don't have them

2020-11-04 10:22:56 -07:00

io-wq.h

io_uring: unify fsize with def->work_flags

2020-10-20 16:03:13 -06:00

ioctl.c

fs: remove ksys_ioctl

2020-07-31 08:16:01 +02:00

Kconfig

tmpfs: support 64-bit inums per-sb

2020-08-07 11:33:24 -07:00

Kconfig.binfmt

treewide: replace '---help---' in Kconfig files with 'help'

2020-06-14 01:57:21 +09:00

kernel_read_file.c

fs/kernel_file_read: Add "offset" arg for partial reads

2020-10-05 13:37:04 +02:00

libfs.c

libfs: fix error cast of negative value in simple_attr_write()

2020-11-22 10:48:22 -08:00

locks.c

treewide: Use fallthrough pseudo-keyword

2020-08-23 17:36:59 -05:00

Makefile

Refactored code for 5.10:

2020-10-23 11:33:41 -07:00

mbcache.c

…

mount.h

proc/mounts: add cursor

2020-05-14 16:44:24 +02:00

mpage.c

fs: convert mpage_readpages to mpage_readahead

2020-06-02 10:59:07 -07:00

namei.c

Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

2020-10-24 12:26:05 -07:00

namespace.c

Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

2020-10-24 12:26:05 -07:00

no-block.c

…

nsfs.c

nsproxy: attach to namespaces via pidfds

2020-05-13 11:41:22 +02:00

open.c

exec: move S_ISREG() check earlier

2020-08-12 10:58:01 -07:00

pipe.c

Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

2020-10-11 11:11:35 -07:00

pnode.c

propagate_one(): mnt_set_mountpoint() needs mount_lock

2020-04-27 10:37:14 -04:00

pnode.h

…

posix_acl.c

vfs: clean up posix_acl_permission() logic aroudn MAY_NOT_BLOCK

2020-06-08 11:04:19 -07:00

proc_namespace.c

Add a "nosymfollow" mount option.

2020-08-27 16:06:47 -04:00

read_write.c

Refactored code for 5.10:

2020-10-23 11:33:41 -07:00

readdir.c

fs: remove ksys_getdents64

2020-07-31 08:16:00 +02:00

remap_range.c

vfs: move the remap range helpers to remap_range.c

2020-10-15 09:48:49 -07:00

select.c

fs: Replace zero-length array with flexible-array member

2020-10-29 17:22:59 -05:00

seq_file.c

seq_file: add seq_read_iter

2020-11-06 10:05:18 -08:00

signalfd.c

treewide: Use fallthrough pseudo-keyword

2020-08-23 17:36:59 -05:00

splice.c

io_uring-5.10-2020-10-24

2020-10-24 12:40:18 -07:00

stack.c

…

stat.c

fs: remove KSTAT_QUERY_FLAGS

2020-09-26 22:55:05 -04:00

statfs.c

Add a "nosymfollow" mount option.

2020-08-27 16:06:47 -04:00

super.c

vfs: move __sb_{start,end}_write* to fs.h

2020-11-10 16:53:11 -08:00

sync.c

overlayfs update for 5.8

2020-06-09 15:40:50 -07:00

timerfd.c

…

userfaultfd.c

mm: remove the now-unnecessary mmget_still_valid() hack

2020-10-16 11:11:22 -07:00

utimes.c

fs: expose utimes_common

2020-07-31 08:16:01 +02:00

xattr.c

fs/xattr.c: fix kernel-doc warnings for setxattr & removexattr

2020-10-13 18:38:27 -07:00