linux/fs
David Howells 598ad0bd09 netfs: Fix lockdep warning from taking sb_writers whilst holding mmap_lock
Taking sb_writers whilst holding mmap_lock isn't allowed and will result in
a lockdep warning like that below.  The problem comes from cachefiles
needing to take the sb_writers lock in order to do a write to the cache,
but being asked to do this by netfslib called from readpage, readahead or
write_begin[1].

Fix this by always offloading the write to the cache off to a worker
thread.  The main thread doesn't need to wait for it, so deadlock can be
avoided.

This can be tested by running the quick xfstests on something like afs or
ceph with lockdep enabled.

WARNING: possible circular locking dependency detected
5.15.0-rc1-build2+ #292 Not tainted
------------------------------------------------------
holetest/65517 is trying to acquire lock:
ffff88810c81d730 (mapping.invalidate_lock#3){.+.+}-{3:3}, at: filemap_fault+0x276/0x7a5

but task is already holding lock:
ffff8881595b53e8 (&mm->mmap_lock#2){++++}-{3:3}, at: do_user_addr_fault+0x28d/0x59c

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #2 (&mm->mmap_lock#2){++++}-{3:3}:
       validate_chain+0x3c4/0x4a8
       __lock_acquire+0x89d/0x949
       lock_acquire+0x2dc/0x34b
       __might_fault+0x87/0xb1
       strncpy_from_user+0x25/0x18c
       removexattr+0x7c/0xe5
       __do_sys_fremovexattr+0x73/0x96
       do_syscall_64+0x67/0x7a
       entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #1 (sb_writers#10){.+.+}-{0:0}:
       validate_chain+0x3c4/0x4a8
       __lock_acquire+0x89d/0x949
       lock_acquire+0x2dc/0x34b
       cachefiles_write+0x2b3/0x4bb
       netfs_rreq_do_write_to_cache+0x3b5/0x432
       netfs_readpage+0x2de/0x39d
       filemap_read_page+0x51/0x94
       filemap_get_pages+0x26f/0x413
       filemap_read+0x182/0x427
       new_sync_read+0xf0/0x161
       vfs_read+0x118/0x16e
       ksys_read+0xb8/0x12e
       do_syscall_64+0x67/0x7a
       entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #0 (mapping.invalidate_lock#3){.+.+}-{3:3}:
       check_noncircular+0xe4/0x129
       check_prev_add+0x16b/0x3a4
       validate_chain+0x3c4/0x4a8
       __lock_acquire+0x89d/0x949
       lock_acquire+0x2dc/0x34b
       down_read+0x40/0x4a
       filemap_fault+0x276/0x7a5
       __do_fault+0x96/0xbf
       do_fault+0x262/0x35a
       __handle_mm_fault+0x171/0x1b5
       handle_mm_fault+0x12a/0x233
       do_user_addr_fault+0x3d2/0x59c
       exc_page_fault+0x85/0xa5
       asm_exc_page_fault+0x1e/0x30

other info that might help us debug this:

Chain exists of:
  mapping.invalidate_lock#3 --> sb_writers#10 --> &mm->mmap_lock#2

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&mm->mmap_lock#2);
                               lock(sb_writers#10);
                               lock(&mm->mmap_lock#2);
  lock(mapping.invalidate_lock#3);

 *** DEADLOCK ***

1 lock held by holetest/65517:
 #0: ffff8881595b53e8 (&mm->mmap_lock#2){++++}-{3:3}, at: do_user_addr_fault+0x28d/0x59c

stack backtrace:
CPU: 0 PID: 65517 Comm: holetest Not tainted 5.15.0-rc1-build2+ #292
Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
Call Trace:
 dump_stack_lvl+0x45/0x59
 check_noncircular+0xe4/0x129
 ? print_circular_bug+0x207/0x207
 ? validate_chain+0x461/0x4a8
 ? add_chain_block+0x88/0xd9
 ? hlist_add_head_rcu+0x49/0x53
 check_prev_add+0x16b/0x3a4
 validate_chain+0x3c4/0x4a8
 ? check_prev_add+0x3a4/0x3a4
 ? mark_lock+0xa5/0x1c6
 __lock_acquire+0x89d/0x949
 lock_acquire+0x2dc/0x34b
 ? filemap_fault+0x276/0x7a5
 ? rcu_read_unlock+0x59/0x59
 ? add_to_page_cache_lru+0x13c/0x13c
 ? lock_is_held_type+0x7b/0xd3
 down_read+0x40/0x4a
 ? filemap_fault+0x276/0x7a5
 filemap_fault+0x276/0x7a5
 ? pagecache_get_page+0x2dd/0x2dd
 ? __lock_acquire+0x8bc/0x949
 ? pte_offset_kernel.isra.0+0x6d/0xc3
 __do_fault+0x96/0xbf
 ? do_fault+0x124/0x35a
 do_fault+0x262/0x35a
 ? handle_pte_fault+0x1c1/0x20d
 __handle_mm_fault+0x171/0x1b5
 ? handle_pte_fault+0x20d/0x20d
 ? __lock_release+0x151/0x254
 ? mark_held_locks+0x1f/0x78
 ? rcu_read_unlock+0x3a/0x59
 handle_mm_fault+0x12a/0x233
 do_user_addr_fault+0x3d2/0x59c
 ? pgtable_bad+0x70/0x70
 ? rcu_read_lock_bh_held+0xab/0xab
 exc_page_fault+0x85/0xa5
 ? asm_exc_page_fault+0x8/0x30
 asm_exc_page_fault+0x1e/0x30
RIP: 0033:0x40192f
Code: ff 48 89 c3 48 8b 05 50 28 00 00 48 85 ed 7e 23 31 d2 4b 8d 0c 2f eb 0a 0f 1f 00 48 8b 05 39 28 00 00 48 0f af c2 48 83 c2 01 <48> 89 1c 01 48 39 d5 7f e8 8b 0d f2 27 00 00 31 c0 85 c9 74 0e 8b
RSP: 002b:00007f9931867eb0 EFLAGS: 00010202
RAX: 0000000000000000 RBX: 00007f9931868700 RCX: 00007f993206ac00
RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00007ffc13e06ee0
RBP: 0000000000000100 R08: 0000000000000000 R09: 00007f9931868700
R10: 00007f99318689d0 R11: 0000000000000202 R12: 00007ffc13e06ee0
R13: 0000000000000c00 R14: 00007ffc13e06e00 R15: 00007f993206a000

Fixes: 726218fdc2 ("netfs: Define an interface to talk to a cache")
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Jeff Layton <jlayton@kernel.org>
cc: Jan Kara <jack@suse.cz>
cc: linux-cachefs@redhat.com
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20210922110420.GA21576@quack2.suse.cz/ [1]
Link: https://lore.kernel.org/r/163887597541.1596626.2668163316598972956.stgit@warthog.procyon.org.uk/ # v1
2021-12-07 13:39:55 +00:00
..
9p netfs, 9p, afs, ceph: Use folios 2021-11-10 21:16:56 +00:00
adfs mm: require ->set_page_dirty to be explicitly wired up 2021-06-29 10:53:48 -07:00
affs affs: use bdev_nr_sectors instead of open coding it 2021-10-18 14:43:22 -06:00
afs afs: Use folios in directory handling 2021-11-10 21:17:09 +00:00
autofs autofs: fix wait name hash calculation in autofs_wait() 2021-10-20 21:09:02 -04:00
befs isystem: ship and use stdarg.h 2021-08-19 09:02:55 +09:00
bfs mm: require ->set_page_dirty to be explicitly wired up 2021-06-29 10:53:48 -07:00
btrfs for-5.16-rc2-tag 2021-11-26 11:24:32 -08:00
cachefiles for-5.16/ki_complete-2021-10-29 2021-11-01 10:17:11 -07:00
ceph One notable change here is that async creates and unlinks introduced 2021-11-13 11:31:07 -08:00
cifs cifs: avoid use of dstaddr as key for fscache client cookie 2021-12-03 12:38:25 -06:00
coda coda: bump module version to 7.2 2021-11-09 10:02:51 -08:00
configfs configfs: fix a race in configfs_lookup() 2021-08-25 07:58:49 +02:00
cramfs cramfs: use bdev_nr_bytes instead of open coding it 2021-10-18 14:43:22 -06:00
crypto fscrypt: improve a few comments 2021-10-25 19:11:50 -07:00
debugfs debugfs: debugfs_create_file_size(): use IS_ERR to check for error 2021-09-21 09:09:06 +02:00
devpts
dlm fs: dlm: avoid comms shutdown delay in release_lockspace 2021-09-01 11:29:14 -05:00
ecryptfs mm: require ->set_page_dirty to be explicitly wired up 2021-06-29 10:53:48 -07:00
efivarfs efivars: convert to fileattr 2021-04-12 15:04:29 +02:00
efs
erofs erofs: fix deadlock when shrink erofs slab 2021-11-23 14:58:16 +08:00
exfat exfat: fix incorrect loading of i_blocks for large files 2021-11-01 07:49:21 +09:00
exportfs
ext2 ext2: fix sleeping in atomic bugs on error 2021-09-22 13:05:23 +02:00
ext4 Only bug fixes and cleanups for ext4 this merge window. Of note are 2021-11-10 17:05:37 -08:00
f2fs Update to zstd-1.4.10 2021-11-13 15:32:30 -08:00
fat for-5.16/inode-sync-2021-10-29 2021-11-01 10:25:27 -07:00
freevxfs
fscache fscache: Remove an unused static variable 2021-10-04 22:13:12 +01:00
fuse fuse: release pipe buf after last use 2021-11-25 14:05:18 +01:00
gfs2 gfs2: gfs2_create_inode rework 2021-12-02 12:41:10 +01:00
hfs Merge branch 'akpm' (patches from Andrew) 2021-11-09 10:11:53 -08:00
hfsplus Merge branch 'akpm' (patches from Andrew) 2021-11-09 10:11:53 -08:00
hostfs hostfs: support splice_write 2021-08-26 22:28:02 +02:00
hpfs treewide: Replace open-coded flex arrays in unions 2021-10-18 12:28:53 -07:00
hugetlbfs mm,hugetlb: remove mlock ulimit for SHM_HUGETLB 2021-11-09 10:02:48 -08:00
iomap iomap: iomap_read_inline_data cleanup 2021-11-24 10:15:47 -08:00
isofs isofs: Fix out of bound access for corrupted isofs image 2021-10-19 12:51:02 +02:00
jbd2 jbd2: add sparse annotations for add_transaction_credits() 2021-08-30 23:36:50 -04:00
jffs2 vfs: add rcu argument to ->get_acl() callback 2021-08-18 22:08:24 +02:00
jfs Just one JFS patch 2021-11-03 09:23:25 -07:00
kernfs Merge 5.15-rc6 into driver-core-next 2021-10-18 09:43:37 +02:00
ksmbd ksmbd: fix memleak in get_file_stream_info() 2021-11-25 00:09:26 -06:00
lockd A slow cycle for nfsd: mainly cleanup, including Neil's patch dropping 2021-11-10 16:45:54 -08:00
minix mm: require ->set_page_dirty to be explicitly wired up 2021-06-29 10:53:48 -07:00
netfs netfs: Fix lockdep warning from taking sb_writers whilst holding mmap_lock 2021-12-07 13:39:55 +00:00
nfs NFS client bugfixes for Linux 5.16 2021-11-27 10:33:55 -08:00
nfs_common nfs: Fix kerneldoc warning shown up by W=1 2021-10-04 22:02:17 +01:00
nfsd This is just one bugfix for a bufferflow in knfsd's xdr decoding. 2021-11-17 08:38:00 -08:00
nilfs2 Merge branch 'akpm' (patches from Andrew) 2021-11-09 10:11:53 -08:00
nls
notify fanotify: Allow users to request FAN_FS_ERROR events 2021-10-27 12:53:45 +02:00
ntfs fs: ntfs: Limit NTFS_RW to page sizes smaller than 64k 2021-11-27 14:34:41 -08:00
ntfs3 gfs2: Fix mmap + page fault deadlocks 2021-11-02 12:25:03 -07:00
ocfs2 Merge branch 'exit-cleanups-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2021-11-10 16:15:54 -08:00
omfs mm: require ->set_page_dirty to be explicitly wired up 2021-06-29 10:53:48 -07:00
openpromfs
orangefs orangefs: three fixes from other folks... 2021-11-09 10:34:06 -08:00
overlayfs overlayfs update for 5.16 2021-11-09 10:51:12 -08:00
proc proc/vmcore: fix clearing user buffer by properly using clear_user() 2021-11-20 10:35:55 -08:00
pstore pstore/blk: Use "%lu" to format unsigned long 2021-11-21 09:44:19 -08:00
qnx4 qnx4: work around gcc false positive warning bug 2021-09-21 08:36:48 -07:00
qnx6
quota \n 2021-11-06 16:40:48 -07:00
ramfs Merge branch 'akpm' (patches from Andrew) 2021-11-09 10:11:53 -08:00
reiserfs \n 2021-11-06 16:40:48 -07:00
romfs
smbfs_common cifs: Move SMB2_Create definitions to the shared area 2021-11-05 09:55:36 -05:00
squashfs lib: zstd: Add kernel-specific API 2021-11-08 16:55:21 -08:00
sysfs fs/sysfs/dir.c: replace S_IRWXU|S_IRUGO|S_IXUGO with 0755 sysfs_create_dir_ns() 2021-10-05 16:35:05 +02:00
sysv sysv: use BUILD_BUG_ON instead of runtime check 2021-11-09 10:02:52 -08:00
tracefs tracefs: Have tracefs directories not set OTH permission bits by default 2021-10-08 18:08:43 -04:00
ubifs fscrypt: remove fscrypt_operations::max_namelen 2021-09-20 19:32:33 -07:00
udf udf: Fix crash after seekdir 2021-11-09 12:53:58 +01:00
ufs isystem: ship and use stdarg.h 2021-08-19 09:02:55 +09:00
unicode .gitignore: prefix local generated files with a slash 2021-05-02 00:43:35 +09:00
vboxsf vboxfs: fix broken legacy mount signature checking 2021-09-27 11:26:21 -07:00
verity fs-verity: fix signed integer overflow with i_size near S64_MAX 2021-09-22 10:56:34 -07:00
xfs xfs: remove incorrect ASSERT in xfs_rename 2021-12-01 17:27:48 -08:00
zonefs gfs2: Fix mmap + page fault deadlocks 2021-11-02 12:25:03 -07:00
aio.c Various hardening fixes and cleanups for 5.16-rc1 2021-11-01 17:29:10 -07:00
anon_inodes.c fs: add anon_inode_getfile_secure() similar to anon_inode_getfd_secure() 2021-09-19 22:35:37 -04:00
attr.c fs: handle circular mappings correctly 2021-11-17 09:26:09 +01:00
bad_inode.c vfs: add rcu argument to ->get_acl() callback 2021-08-18 22:08:24 +02:00
binfmt_aout.c binfmt: a.out: Fix bogus semicolon 2021-09-05 10:15:05 -07:00
binfmt_elf_fdpic.c coredump: Limit coredumps to a single thread group 2021-10-08 12:06:02 -05:00
binfmt_elf.c Merge branch 'akpm' (patches from Andrew) 2021-11-09 10:11:53 -08:00
binfmt_flat.c binfmt: remove in-tree usage of MAP_EXECUTABLE 2021-06-29 10:53:50 -07:00
binfmt_misc.c
binfmt_script.c
buffer.c fs: simplify init_page_buffers 2021-10-18 14:43:22 -06:00
char_dev.c
compat_binfmt_elf.c
coredump.c coredump: Limit coredumps to a single thread group 2021-10-08 12:06:02 -05:00
d_path.c d_path: fix Kernel doc validator complaining 2021-11-06 13:30:32 -07:00
dax.c New code for 5.15: 2021-08-31 11:13:35 -07:00
dcache.c useful constants: struct qstr for ".." 2021-04-15 22:36:45 -04:00
direct-io.c fs: get rid of the res2 iocb->ki_complete argument 2021-10-25 10:36:24 -06:00
drop_caches.c fs: drop_caches: fix skipping over shadow cache inodes 2021-09-03 09:58:10 -07:00
eventfd.c eventfd: Export eventfd_wake_count to modules 2021-09-06 07:20:56 -04:00
eventpoll.c ARM development updates for 5.15: 2021-09-09 13:25:49 -07:00
exec.c Merge branch 'exit-cleanups-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2021-11-10 16:15:54 -08:00
fcntl.c Merge branch 'akpm' (patches from Andrew) 2021-09-03 10:08:28 -07:00
fhandle.c switch file_open_root() to struct path 2021-04-07 13:56:43 -04:00
file_table.c
file.c fget: check that the fd still exists after getting a ref to it 2021-12-03 10:06:58 -08:00
filesystems.c fs: simplify get_filesystem_list / get_all_fs_names 2021-08-23 01:25:40 -04:00
fs_context.c memcg: charge fs_context and legacy_fs_context 2021-09-03 09:58:12 -07:00
fs_parser.c namei: Standardize callers of filename_lookup() 2021-09-07 16:07:47 -04:00
fs_pin.c
fs_struct.c
fs_types.c
fs-writeback.c Various hardening fixes and cleanups for 5.16-rc1 2021-11-01 17:29:10 -07:00
fsopen.c
init.c
inode.c fs: Remove FS_THP_SUPPORT 2021-11-17 10:36:35 -05:00
internal.h Merge branch 'akpm' (patches from Andrew) 2021-11-09 10:11:53 -08:00
io_uring.c io_uring: Fix undefined-behaviour in io_issue_sqe 2021-11-27 06:41:38 -07:00
io-wq.c io-wq: don't retry task_work creation failure on fatal conditions 2021-12-03 06:27:32 -07:00
io-wq.h io_uring: optimise INIT_WQ_LIST 2021-10-19 05:49:54 -06:00
ioctl.c New code for 5.15: 2021-08-31 11:06:32 -07:00
Kconfig 4 cifs/smb3 fixes, one for DFS reconnect, and one to begin creating common headers for server and client and the other two to rename the cifs_common directory to smbfs_common to be more consistent ie change use of the name cifs to smb which is more accurate 2021-09-12 10:10:21 -07:00
Kconfig.binfmt binfmt: remove support for em86 (alpha only) 2021-07-25 22:33:03 -07:00
kernel_read_file.c vfs: check fd has read access in kernel_read_file_from_fd() 2021-10-18 20:22:03 -10:00
libfs.c libfs: Support RENAME_EXCHANGE in simple_rename() 2021-11-03 15:43:08 +01:00
locks.c locks: remove changelog comments 2021-10-19 14:11:39 -04:00
Makefile 4 cifs/smb3 fixes, one for DFS reconnect, and one to begin creating common headers for server and client and the other two to rename the cifs_common directory to smbfs_common to be more consistent ie change use of the name cifs to smb which is more accurate 2021-09-12 10:10:21 -07:00
mbcache.c
mount.h
mpage.c
namei.c File locking changes for v5.16 2021-11-01 09:06:53 -07:00
namespace.c Merge branch 'akpm' (patches from Andrew) 2021-09-03 10:08:28 -07:00
no-block.c
nsfs.c
open.c Merge branch 'akpm' (patches from Andrew) 2021-11-06 14:08:17 -07:00
pipe.c Revert "mm/gup: remove try_get_page(), call try_get_compound_head() directly" 2021-09-07 11:03:45 -07:00
pnode.c
pnode.h
posix_acl.c fs/posix_acl.c: avoid -Wempty-body warning 2021-11-06 13:30:32 -07:00
proc_namespace.c
read_write.c fs: remove leftover comments from mandatory locking removal 2021-10-26 12:20:50 -04:00
readdir.c readdir: make sure to verify directory entry for legacy interfaces too 2021-04-17 11:39:49 -07:00
remap_range.c fs: remove mandatory file locking support 2021-08-23 06:15:36 -04:00
select.c Revert "memcg: enable accounting for pollfd and select bits arrays" 2021-09-07 11:26:23 -07:00
seq_file.c seq_file: move seq_escape() to a header 2021-11-09 10:02:52 -08:00
signalfd.c signal: Rename SIL_PERF_EVENT SIL_FAULT_PERF_EVENT for consistency 2021-07-23 13:16:43 -05:00
splice.c
stack.c
stat.c fs: add generic helper for filling statx attribute flags 2021-08-17 11:47:43 +02:00
statfs.c
super.c fs: explicitly unregister per-superblock BDIs 2021-11-06 13:30:34 -07:00
sync.c block: simplify the block device syncing code 2021-10-22 08:36:55 -06:00
timerfd.c timerfd: Provide timerfd_resume() 2021-08-10 17:57:22 +02:00
userfaultfd.c userfaultfd: fix a race between writeprotect and exit_mmap() 2021-10-18 20:22:02 -10:00
utimes.c
xattr.c