The order of these two parameters is simply reversed. gcc didn't warn
about it, probably because 'void *' can be converted to or from any
other object pointer type without a warning.
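
Here is a minimal, self-contained illustration of why the compiler stays
silent (hypothetical types and names, not the actual netfs cleanup hook):
because one of the parameters is 'void *', both conversions involved in a
swapped call are implicit and warning-free in C.

/*
 * Illustration only: 'struct mapping' and 'cleanup' are made up for
 * this example and are not the netfs code.
 */
#include <stdio.h>

struct mapping { const char *name; };

/* The callback is declared as (mapping, private-data)... */
static void cleanup(struct mapping *m, void *priv)
{
	printf("cleanup %s, priv=%p\n", m->name, priv);
}

int main(void)
{
	struct mapping m = { "inode-mapping" };
	static int private_state;
	void *netfs_priv = &private_state;

	cleanup(&m, netfs_priv);	/* correct order */

	/*
	 * The reversed call below also compiles without any warning from
	 * gcc, because 'netfs_priv' (void *) converts implicitly to
	 * 'struct mapping *' and '&m' converts implicitly to 'void *'.
	 * At run time the callee would then misinterpret both arguments.
	 *
	 *	cleanup(netfs_priv, &m);
	 */
	return 0;
}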
Cc: stable@vger.kernel.org
Fixes: 3d3c95046742 ("netfs: Provide readahead and readpage netfs helpers")
Fixes: e1b1240c1ff5 ("netfs: Add write_begin helper")
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Link: https://lore.kernel.org/r/20211207031449.100510-1-jefflexu@linux.alibaba.com/ # v1
Taking sb_writers whilst holding mmap_lock isn't allowed and will result in
a lockdep warning like the one below. The problem comes from cachefiles
needing to take the sb_writers lock in order to do a write to the cache,
but being asked to do this by netfslib when called from readpage, readahead
or write_begin[1].

Fix this by always offloading the write to the cache to a worker thread.
The main thread doesn't need to wait for it, so the deadlock can be
avoided.
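
As a minimal sketch of that offloading pattern (struct cache_write,
cache_write_worker and do_write_to_cache are hypothetical names; only the
workqueue API is real kernel interface): the write is queued to a
workqueue, and the submitting thread, which may hold mmap_lock, never
takes sb_writers itself and never waits for the worker.

#include <linux/workqueue.h>
#include <linux/slab.h>

struct cache_write {
	struct work_struct work;
	/* ... pages to copy, target cache file, etc. ... */
};

static void cache_write_worker(struct work_struct *work)
{
	struct cache_write *cw = container_of(work, struct cache_write, work);

	/*
	 * Safe to take sb_writers here: the worker holds no mmap_lock,
	 * so no mmap_lock -> sb_writers dependency is created.
	 */
	/* do_write_to_cache(cw); */
	kfree(cw);
}

/* Called from the readpage/readahead/write_begin paths. */
static void queue_write_to_cache(struct cache_write *cw)
{
	INIT_WORK(&cw->work, cache_write_worker);
	queue_work(system_unbound_wq, &cw->work);
	/*
	 * No flush_work() or completion wait here: the submitting thread
	 * does not depend on the cache copy finishing, which is what
	 * breaks the deadlock.
	 */
}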
This can be tested by running the quick xfstests on something like afs or
ceph with lockdep enabled.
WARNING: possible circular locking dependency detected
5.15.0-rc1-build2+ #292 Not tainted
------------------------------------------------------
holetest/65517 is trying to acquire lock:
ffff88810c81d730 (mapping.invalidate_lock#3){.+.+}-{3:3}, at: filemap_fault+0x276/0x7a5
but task is already holding lock:
ffff8881595b53e8 (&mm->mmap_lock#2){++++}-{3:3}, at: do_user_addr_fault+0x28d/0x59c
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (&mm->mmap_lock#2){++++}-{3:3}:
validate_chain+0x3c4/0x4a8
__lock_acquire+0x89d/0x949
lock_acquire+0x2dc/0x34b
__might_fault+0x87/0xb1
strncpy_from_user+0x25/0x18c
removexattr+0x7c/0xe5
__do_sys_fremovexattr+0x73/0x96
do_syscall_64+0x67/0x7a
entry_SYSCALL_64_after_hwframe+0x44/0xae
-> #1 (sb_writers#10){.+.+}-{0:0}:
validate_chain+0x3c4/0x4a8
__lock_acquire+0x89d/0x949
lock_acquire+0x2dc/0x34b
cachefiles_write+0x2b3/0x4bb
netfs_rreq_do_write_to_cache+0x3b5/0x432
netfs_readpage+0x2de/0x39d
filemap_read_page+0x51/0x94
filemap_get_pages+0x26f/0x413
filemap_read+0x182/0x427
new_sync_read+0xf0/0x161
vfs_read+0x118/0x16e
ksys_read+0xb8/0x12e
do_syscall_64+0x67/0x7a
entry_SYSCALL_64_after_hwframe+0x44/0xae
-> #0 (mapping.invalidate_lock#3){.+.+}-{3:3}:
check_noncircular+0xe4/0x129
check_prev_add+0x16b/0x3a4
validate_chain+0x3c4/0x4a8
__lock_acquire+0x89d/0x949
lock_acquire+0x2dc/0x34b
down_read+0x40/0x4a
filemap_fault+0x276/0x7a5
__do_fault+0x96/0xbf
do_fault+0x262/0x35a
__handle_mm_fault+0x171/0x1b5
handle_mm_fault+0x12a/0x233
do_user_addr_fault+0x3d2/0x59c
exc_page_fault+0x85/0xa5
asm_exc_page_fault+0x1e/0x30
other info that might help us debug this:
Chain exists of:
mapping.invalidate_lock#3 --> sb_writers#10 --> &mm->mmap_lock#2
Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(&mm->mmap_lock#2);
                               lock(sb_writers#10);
                               lock(&mm->mmap_lock#2);
  lock(mapping.invalidate_lock#3);
*** DEADLOCK ***
1 lock held by holetest/65517:
#0: ffff8881595b53e8 (&mm->mmap_lock#2){++++}-{3:3}, at: do_user_addr_fault+0x28d/0x59c
stack backtrace:
CPU: 0 PID: 65517 Comm: holetest Not tainted 5.15.0-rc1-build2+ #292
Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
Call Trace:
dump_stack_lvl+0x45/0x59
check_noncircular+0xe4/0x129
? print_circular_bug+0x207/0x207
? validate_chain+0x461/0x4a8
? add_chain_block+0x88/0xd9
? hlist_add_head_rcu+0x49/0x53
check_prev_add+0x16b/0x3a4
validate_chain+0x3c4/0x4a8
? check_prev_add+0x3a4/0x3a4
? mark_lock+0xa5/0x1c6
__lock_acquire+0x89d/0x949
lock_acquire+0x2dc/0x34b
? filemap_fault+0x276/0x7a5
? rcu_read_unlock+0x59/0x59
? add_to_page_cache_lru+0x13c/0x13c
? lock_is_held_type+0x7b/0xd3
down_read+0x40/0x4a
? filemap_fault+0x276/0x7a5
filemap_fault+0x276/0x7a5
? pagecache_get_page+0x2dd/0x2dd
? __lock_acquire+0x8bc/0x949
? pte_offset_kernel.isra.0+0x6d/0xc3
__do_fault+0x96/0xbf
? do_fault+0x124/0x35a
do_fault+0x262/0x35a
? handle_pte_fault+0x1c1/0x20d
__handle_mm_fault+0x171/0x1b5
? handle_pte_fault+0x20d/0x20d
? __lock_release+0x151/0x254
? mark_held_locks+0x1f/0x78
? rcu_read_unlock+0x3a/0x59
handle_mm_fault+0x12a/0x233
do_user_addr_fault+0x3d2/0x59c
? pgtable_bad+0x70/0x70
? rcu_read_lock_bh_held+0xab/0xab
exc_page_fault+0x85/0xa5
? asm_exc_page_fault+0x8/0x30
asm_exc_page_fault+0x1e/0x30
RIP: 0033:0x40192f
Code: ff 48 89 c3 48 8b 05 50 28 00 00 48 85 ed 7e 23 31 d2 4b 8d 0c 2f eb 0a 0f 1f 00 48 8b 05 39 28 00 00 48 0f af c2 48 83 c2 01 <48> 89 1c 01 48 39 d5 7f e8 8b 0d f2 27 00 00 31 c0 85 c9 74 0e 8b
RSP: 002b:00007f9931867eb0 EFLAGS: 00010202
RAX: 0000000000000000 RBX: 00007f9931868700 RCX: 00007f993206ac00
RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00007ffc13e06ee0
RBP: 0000000000000100 R08: 0000000000000000 R09: 00007f9931868700
R10: 00007f99318689d0 R11: 0000000000000202 R12: 00007ffc13e06ee0
R13: 0000000000000c00 R14: 00007ffc13e06e00 R15: 00007f993206a000
Fixes: 726218fdc22c ("netfs: Define an interface to talk to a cache")
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Jeff Layton <jlayton@kernel.org>
cc: Jan Kara <jack@suse.cz>
cc: linux-cachefs@redhat.com
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20210922110420.GA21576@quack2.suse.cz/ [1]
Link: https://lore.kernel.org/r/163887597541.1596626.2668163316598972956.stgit@warthog.procyon.org.uk/ # v1
It's not sufficient to skip reading when the pos is beyond the EOF.
There may be data at the head of the page that we need to fill in
before the write.
Add a new helper function that corrects and clarifies the logic of
when we can skip reads, and have it only zero out the part of the page
that won't have data copied in for the write.
Finally, don't set the page Uptodate after zeroing. It's not up to date
since the write data won't have been copied in yet.
[DH made the following changes:
- Prefixed the new function with "netfs_".
- Don't call zero_user_segments() for a full-page write.
- Altered the beyond-last-page check to avoid a DIV instruction and got
  rid of the then-redundant zero-length file check.
]
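
A sketch of the kind of helper described above (simplified to a single
PAGE_SIZE page; the function name and locals here are illustrative and
the upstream helper may differ in detail): it returns true when the
pre-write read can be skipped, zeroes only the parts of the page that the
copy won't fill, skips the zeroing entirely for a full-page write,
compares byte offsets so that i_size is never divided by the page size,
and never marks the page Uptodate itself.

#include <linux/fs.h>
#include <linux/highmem.h>
#include <linux/pagemap.h>

static bool netfs_skip_page_read(struct page *page, loff_t pos, size_t len)
{
	struct inode *inode = page->mapping->host;
	loff_t i_size = i_size_read(inode);
	size_t offset = pos & (PAGE_SIZE - 1);

	/* The write covers the whole page: nothing to read or zero. */
	if (offset == 0 && len >= PAGE_SIZE)
		return true;

	/*
	 * The page lies entirely beyond EOF: zero only the regions the
	 * copy won't fill.  Comparing the page's starting byte offset
	 * against i_size avoids a division by the page size.
	 */
	if (pos - offset >= i_size) {
		zero_user_segments(page, 0, offset,
				   offset + len, PAGE_SIZE);
		return true;
	}

	/* The write starts at the page head and runs to or past EOF. */
	if (offset == 0 && pos + len >= i_size)
		return true;

	/* Otherwise there is on-disk data we must read in first. */
	return false;
}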
Fixes: e1b1240c1ff5f ("netfs: Add write_begin helper")
Reported-by: Andrew W Elble <aweits@rit.edu>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
cc: ceph-devel@vger.kernel.org
Link: https://lore.kernel.org/r/20210613233345.113565-1-jlayton@kernel.org/
Link: https://lore.kernel.org/r/162367683365.460125.4467036947364047314.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/162391826758.1173366.11794946719301590013.stgit@warthog.procyon.org.uk/ # v2
Fix some miscellaneous things in the new netfs lib[1]:
(1) The kerneldoc for netfs_readpage() shouldn't say netfs_page().
(2) netfs_readpage() can get an integer overflow on 32-bit when it
multiplies page_index(page) by PAGE_SIZE. It should use
page_file_offset() instead.
(3) netfs_write_begin() should use page_offset() to avoid the same
overflow.
Note that netfs_readpage() needs to use page_file_offset() rather than
page_offset() as it may see swap-over-NFS.
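
A userspace illustration of the overflow in (2) and (3), using stand-in
types rather than the netfs code itself: pgoff_t is an unsigned long, so
on a 32-bit kernel the multiplication is done in 32 bits and wraps once
the byte offset reaches 4 GiB, whereas widening to loff_t before
shifting, which is what page_offset() and page_file_offset() do,
preserves the full 64-bit offset.

/* Stand-ins: pgoff_t is 32 bits on a 32-bit kernel, loff_t is 64 bits. */
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT   12
#define PAGE_SIZE_32 ((uint32_t)1 << PAGE_SHIFT)

int main(void)
{
	uint32_t index = 0x100000;	/* page 1M, i.e. byte offset 4 GiB */

	/* 32-bit multiply: wraps to 0 */
	int64_t wrong = (uint32_t)(index * PAGE_SIZE_32);

	/* widen first, then shift: 4294967296 */
	int64_t right = (int64_t)index << PAGE_SHIFT;

	printf("wrong=%lld right=%lld\n", (long long)wrong, (long long)right);
	return 0;
}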
Reported-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/161789062190.6155.12711584466338493050.stgit@warthog.procyon.org.uk/ [1]