Commit Graph

90663 Commits

Author SHA1 Message Date
Linus Torvalds
71b1543c83 five ksmbd server fixes, most also for stable
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmYmfyAACgkQiiy9cAdy
 T1H8CQv/RlvQNbfSc9V5/vUPqfEWhx+5EyFoMFPZmo2HmtxT4BI8sEICUQcC6h+Y
 L2li6ihmsF1DnxDxsFwKAhQ9nTpYG4LoBbljk/j/N+sKhUFE3ZLWxZyoEAbWWfeh
 UJt8ZuGTgbc8nyPZ1s2d5oZy37PNlGd7CilLtP5HQyEW3l8EHg0qQOp98cibh0mm
 +kS2vilb4nIlapblTq2uFDcBTolXRSjTlY8eWC9DNvWdjTJLdTEjAurqeuN71FA0
 +XXnA9EYE9MDEkCqnQdZEwGvR8ZKy9sR4cqCUGZO/tNz/KfOGfjJkrEf+OEj37Af
 I5ahxUKrYwBEC/6gaFx8i1ZiivdDOyIaM7b+rpazpiV1sCCbxlM18TlDSyODFnUC
 nqEABR3slx/oZ4xYcFuxjzN0L9ZS7+0NQ1VQovFLlNqW7f9qFdUq0JKqfNDvFOm4
 tgi5/lCnlMco0ts6h0HIw5LrwDi7HJWbYf1Fmv/1jC2Ed5Ut0GCMNpSC9WQxDo+t
 G4m8ul1S
 =ZizJ
 -----END PGP SIGNATURE-----

Merge tag '6.9-rc5-ksmbd-fixes' of git://git.samba.org/ksmbd

Pull smb server fixes from Steve French:
 "Five ksmbd server fixes, most also for stable:

   - rename fix

   - two fixes for potential out of bounds

   - fix for connections from MacOS (padding in close response)

   - fix for when to enable persistent handles"

* tag '6.9-rc5-ksmbd-fixes' of git://git.samba.org/ksmbd:
  ksmbd: add continuous availability share parameter
  ksmbd: common: use struct_group_attr instead of struct_group for network_open_info
  ksmbd: clear RENAME_NOREPLACE before calling vfs_rename
  ksmbd: validate request buffer size in smb2_allocate_rsp_buf()
  ksmbd: fix slab-out-of-bounds in smb2_allocate_rsp_buf
2024-04-22 16:28:31 -07:00
Linus Torvalds
a2c63a3f3d bcachefs fixes for 6.9-rc6
- fix a few more deadlocks in recovery
 - fix u32/u64 issues in mi_btree_bitmap
 - btree key cache shrinker now actually frees, with more instrumentation
   coming so we can verify that it's working correctly more easily in the
   future
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmYmveYACgkQE6szbY3K
 bnY+kA//dtdKfPliuQlfYhUvIZSvgEy+KxjDaDmeJFVMKFYHiip/JJt7V/YU7jWW
 DmWGtPqo1hoGZnic7h9fstbBCRgUdIYGBInqAWPzL3wmFYe5FPE02KN5fuZ+A+Mp
 sn0QFML4oA0uxD7TRXfCeNx3NRwXonztletVskXCuLa0T8iTOTdOuAH0MCGow3OC
 mlfRZyK05f6Q0UOIzvntfBl8Tkr+yk5g0GFz54U7s+qs/zJsYiYfruXuq6AxLTy2
 xVMDj3Hc6W0vggVpv68HInluubl/b7rVSy+w59GG0D3iQ9/fBdqitFLdw49ReXzi
 J//ctZLb3n+IM4lA7t5ev0lY7bvI2FwFNkrL4qW41E4un5eQ3ghbyOmMoz87svyg
 4JW/CPGP7uKNVmfRuHn1qhgJ9/vIXkObJVl9GKZF2BylaZwjMM5YrL5MZwrKFQYy
 9BMgemgvHFK+wRi74q/OUu3PyH045AoJdIKI66ypFexhGi5YeNqtRHLUKdfSrJR+
 eEkAUaHcgLLfxyk+fIRvcSK+Q9j3BibvsSkU3vLSnl2B+xdvfJqb+I/yvBt/SZQW
 P09JceDQABRBMu9beVbVqMED+PniSZVfG2eU2jBcZ+jhbGQgmiWfMNJVKS4/0uwz
 PRS33P2mViVZ6PJokWyecbgGtVxKrK2ruPdcu6/W05Qi0Vv58+k=
 =lCNd
 -----END PGP SIGNATURE-----

Merge tag 'bcachefs-2024-04-22' of https://evilpiepirate.org/git/bcachefs

Pull bcachefs fixes from Kent Overstreet:
 "Nothing too crazy in this one, and it looks like (fingers crossed) the
  recovery and repair issues are settling down - although there's going
  to be a long tail there, as we've still yet to really ramp up on error
  injection or syzbot.

   - fix a few more deadlocks in recovery

   - fix u32/u64 issues in mi_btree_bitmap

   - btree key cache shrinker now actually frees, with more
     instrumentation coming so we can verify that it's working
     correctly more easily in the future"

* tag 'bcachefs-2024-04-22' of https://evilpiepirate.org/git/bcachefs:
  bcachefs: If we run merges at a lower watermark, they must be nonblocking
  bcachefs: Fix inode early destruction path
  bcachefs: Fix deadlock in journal write path
  bcachefs: Tweak btree key cache shrinker so it actually frees
  bcachefs: bkey_cached.btree_trans_barrier_seq needs to be a ulong
  bcachefs: Fix missing call to bch2_fs_allocator_background_exit()
  bcachefs: Check for journal entries overruning end of sb clean section
  bcachefs: Fix bio alloc in check_extent_checksum()
  bcachefs: fix leak in bch2_gc_write_reflink_key
  bcachefs: KEY_TYPE_error is allowed for reflink
  bcachefs: Fix bch2_dev_btree_bitmap_marked_sectors() shift
  bcachefs: make sure to release last journal pin in replay
  bcachefs: node scan: ignore multiple nodes with same seq if interior
  bcachefs: Fix format specifier in validate_bset_keys()
  bcachefs: Fix null ptr deref in twf from BCH_IOCTL_FSCK_OFFLINE
2024-04-22 13:53:50 -07:00
Takayuki Nagata
77d8aa79ec cifs: reinstate original behavior again for forceuid/forcegid
forceuid/forcegid should be enabled by default when uid=/gid= options are
specified, but commit 24e0a1eff9 ("cifs: switch to new mount api")
changed the behavior. Due to the change, a mounted share does not show
intentional uid/gid for files and directories even though uid=/gid=
options are specified since forceuid/forcegid are not enabled.

This patch reinstates original behavior that overrides uid/gid with
specified uid/gid by the options.

Fixes: 24e0a1eff9 ("cifs: switch to new mount api")
Signed-off-by: Takayuki Nagata <tnagata@redhat.com>
Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
Acked-by: Tom Talpey <tom@talpey.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-04-22 10:50:31 -05:00
Eric Van Hensbergen
d05dcfdf5e
fs/9p: mitigate inode collisions
Detect and mitigate inode collsions that now occur since we
fixed 9p generating duplicate inode structures.  Underlying
cause of these appears to be a race condition between reuse
of inode numbers in underlying file system and cleanup of
inode numbers in the client.  Enabling caching
makes this much more likely to happen as it increases cleanup
latency due to writebacks.

Reported-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Eric Van Hensbergen <ericvh@kernel.org>
2024-04-22 15:34:27 +00:00
Kent Overstreet
e858beeddf bcachefs: If we run merges at a lower watermark, they must be nonblocking
Fix another deadlock related to the merge path; previously, we switched
to always running merges at a lower watermark (because they are
noncritical); but when we run at a lower watermark we also need to run
nonblocking or we've introduced a new deadlock.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Reported-and-tested-by: s@m-h.ug
2024-04-22 01:26:51 -04:00
Linus Torvalds
4e90ba757b Kernfs bugfix and documentation update for 6.9-rc5
Here are 2 changes for 6.9-rc5 that deal with "driver core" stuff, that
 do the following:
   - sysfs reference leak fix
   - embargoed-hardware-issues.rst update for Power
 
 Both of these have been in linux-next for over a week with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZiT4Dw8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ymFwQCfR1OysT/aO16NUYBSaGd7Tx4/3dIAn3YDU7O7
 BvGCYc/Nv7S7WdmA5KKf
 =RuJo
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-6.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull kernfs bugfix and documentation update from Greg KH:
 "Here are two changes for 6.9-rc5 that deal with "driver core" stuff,
  that do the following:

   - sysfs reference leak fix

   - embargoed-hardware-issues.rst update for Power

  Both of these have been in linux-next for over a week with no reported
  issues"

* tag 'driver-core-6.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  Documentation: embargoed-hardware-issues.rst: Add myself for Power
  fs: sysfs: Fix reference leak in sysfs_break_active_protection()
2024-04-21 10:30:21 -07:00
Kent Overstreet
0e42f38119 bcachefs: Fix inode early destruction path
discard_new_inode() is the wrong interface to use when we need to free
an inode that was never inserted into the inode hash table; we can
bypass the whole iput() -> evict() path and replace it with
__destroy_inode(); kmem_cache_free() - this fixes a WARN_ON() about
I_NEW.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-20 23:00:59 -04:00
Kent Overstreet
85ab365f7c bcachefs: Fix deadlock in journal write path
bch2_journal_write() was incorrectly waiting on earlier journal writes
synchronously; this usually worked because most of the time we'd be
running in the context of a thread that did a journal_buf_put(), but
sometimes we'd be running out of the same workqueue that completes those
prior journal writes.

Additionally, this makes sure to punt to a workqueue before submitting
preflushes - we really don't want to be calling submit_bio() in the main
transaction commit path.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-20 23:00:59 -04:00
Kent Overstreet
adfe9357c3 bcachefs: Tweak btree key cache shrinker so it actually frees
Freeing key cache items is a multi stage process; we need to wait for an
SRCU grace period to elapse, and we handle this ourselves - partially to
avoid callback overhead, but primarily so that when allocating we can
first allocate from the freed items waiting for an SRCU grace period.

Previously, the shrinker was counting the items on the 'waiting for SRCU
grace period' lists as items being scanned, but this meant that too many
items waiting for an SRCU grace period could prevent it from doing any
work at all.

After this, we're seeing that items skipped due to the accessed bit are
the main cause of the shrinker not making any progress, and we actually
want the key cache shrinker to run quite aggressively because reclaimed
items will still generally be found (more compactly) in the btree node
cache - so we also tweak the shrinker to not count those against
nr_to_scan.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-20 17:06:20 -04:00
Kent Overstreet
6e4d9bd110 bcachefs: bkey_cached.btree_trans_barrier_seq needs to be a ulong
this stores the SRCU sequence number, which we use to check if an SRCU
barrier has elapsed; this is a partial fix for the key cache shrinker
not actually freeing.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-20 15:15:51 -04:00
Kent Overstreet
ec438ac59d bcachefs: Fix missing call to bch2_fs_allocator_background_exit()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-20 00:31:59 -04:00
Kent Overstreet
fcdbc1d7a4 bcachefs: Check for journal entries overruning end of sb clean section
Fix a missing bounds check in superblock validation.

Note that we don't yet have repair code for this case - repair code for
individual items is generally low priority, since the whole superblock
is checksummed, validated prior to write, and we have backups.

Reported-by: lei lu <llfamsec@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-20 00:16:53 -04:00
Namjae Jeon
e9d8c2f95a ksmbd: add continuous availability share parameter
If capabilities of the share is not SMB2_SHARE_CAP_CONTINUOUS_AVAILABILITY,
ksmbd should not grant a persistent handle to the client.
This patch add continuous availability share parameter to control it.

Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-04-19 20:48:47 -05:00
Namjae Jeon
0268a7cc7f ksmbd: common: use struct_group_attr instead of struct_group for network_open_info
4byte padding cause the connection issue with the applications of MacOS.
smb2_close response size increases by 4 bytes by padding, And the smb
client of MacOS check it and stop the connection. This patch use
struct_group_attr instead of struct_group for network_open_info to use
 __packed to avoid padding.

Fixes: 0015eb6e12 ("smb: client, common: fix fortify warnings")
Cc: stable@vger.kernel.org
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-04-19 20:48:47 -05:00
Marios Makassikis
4973b04d3e ksmbd: clear RENAME_NOREPLACE before calling vfs_rename
File overwrite case is explicitly handled, so it is not necessary to
pass RENAME_NOREPLACE to vfs_rename.

Clearing the flag fixes rename operations when the share is a ntfs-3g
mount. The latter uses an older version of fuse with no support for
flags in the ->rename op.

Cc: stable@vger.kernel.org
Signed-off-by: Marios Makassikis <mmakassikis@freebox.fr>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-04-19 20:48:47 -05:00
Namjae Jeon
17cf0c2794 ksmbd: validate request buffer size in smb2_allocate_rsp_buf()
The response buffer should be allocated in smb2_allocate_rsp_buf
before validating request. But the fields in payload as well as smb2 header
is used in smb2_allocate_rsp_buf(). This patch add simple buffer size
validation to avoid potencial out-of-bounds in request buffer.

Cc: stable@vger.kernel.org
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-04-19 20:48:47 -05:00
Namjae Jeon
c119f4ede3 ksmbd: fix slab-out-of-bounds in smb2_allocate_rsp_buf
If ->ProtocolId is SMB2_TRANSFORM_PROTO_NUM, smb2 request size
validation could be skipped. if request size is smaller than
sizeof(struct smb2_query_info_req), slab-out-of-bounds read can happen in
smb2_allocate_rsp_buf(). This patch allocate response buffer after
decrypting transform request. smb3_decrypt_req() will validate transform
request size and avoid slab-out-of-bound in smb2_allocate_rsp_buf().

Reported-by: Norbert Szetei <norbert@doyensec.com>
Cc: stable@vger.kernel.org
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-04-19 20:48:47 -05:00
Paulo Alcantara
18d86965e3 smb: client: fix rename(2) regression against samba
After commit 2c7d399e55 ("smb: client: reuse file lease key in
compound operations") the client started reusing lease keys for
rename, unlink and set path size operations to prevent it from
breaking its own leases and thus causing unnecessary lease breaks to
same connection.

The implementation relies on positive dentries and
cifsInodeInfo::lease_granted to decide whether reusing lease keys for
the compound requests.  cifsInodeInfo::lease_granted was introduced by
commit 0ab95c2510 ("Defer close only when lease is enabled.") to
indicate whether lease caching is granted for a specific file, but
that can only happen until file is open, so
cifsInodeInfo::lease_granted was left uninitialised in ->alloc_inode
and then client started sending random lease keys for files that
hadn't any leases.

This fixes the following test case against samba:

mount.cifs //srv/share /mnt/1 -o ...,nosharesock
mount.cifs //srv/share /mnt/2 -o ...,nosharesock
touch /mnt/1/foo; tail -f /mnt/1/foo & pid=$!
mv /mnt/2/foo /mnt/2/bar # fails with -EIO
kill $pid

Fixes: 0ab95c2510 ("Defer close only when lease is enabled.")
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-04-19 16:02:45 -05:00
David Howells
afc23febd5 cifs: Add tracing for the cifs_tcon struct refcounting
Add tracing for the refcounting/lifecycle of the cifs_tcon struct, marking
different events with different labels and giving each tcon its own debug
ID so that the tracelines corresponding to individual tcons can be
distinguished.  This can be enabled with:

	echo 1 >/sys/kernel/debug/tracing/events/cifs/smb3_tcon_ref/enable

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-04-19 16:02:09 -05:00
David Howells
dad80c6bff cifs: Fix reacquisition of volume cookie on still-live connection
During mount, cifs_mount_get_tcon() gets a tcon resource connection record
and then attaches an fscache volume cookie to it.  However, it does this
irrespective of whether or not the tcon returned from cifs_get_tcon() is a
new record or one that's already in use.  This leads to a warning about a
volume cookie collision and a leaked volume cookie because tcon->fscache
gets reset.

Fix this be adding a mutex and a "we've already tried this" flag and only
doing it once for the lifetime of the tcon.

[!] Note: Looking at cifs_mount_get_tcon(), a more general solution may
actually be required.  Reacquiring the volume cookie isn't the only thing
that function does: it also partially reinitialises the tcon record without
any locking - which may cause live filesystem ops already using the tcon
through a previous mount to malfunction.

This can be reproduced simply by something like:

    mount //example.com/test /xfstest.test -o user=shares,pass=xxx,fsc
    mount //example.com/test /mnt -o user=shares,pass=xxx,fsc

Fixes: 70431bfd82 ("cifs: Support fscache indexing rewrite")
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
cc: Shyam Prasad N <sprasad@microsoft.com>
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-04-19 15:37:47 -05:00
Linus Torvalds
46b28503cd fs/9p: fixes regressions in 6.9
This series contains a reversion of one of the original 6.9
 patches which seems to have been the cause of most of the
 instability.  It also incorporates several fixes to legacy
 support and cache fixes.
 
 There are few additional changes to improve stability,
 but I want another week of testing before sending them
 upstream.
 
 Signed-off-by: Eric Van Hensbergen <ericvh@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEElpbw0ZalkJikytFRiP/V+0pf/5gFAmYiiQ0ACgkQiP/V+0pf
 /5hrCw//aJwdNAimpwPrc5UfE4Q37igQeXoT29VJbkOBO78rZ2cNgd3EFpgC2UES
 RFJejQ/IQlEkpqbHiMHIyCii2MmWGT0xzePLf3nUZW/qmoUvhvXlPG5OZb0FomXY
 gxCRFuUgegNcK3t3LtFAVn7v6NpXtOfLAgJb3MDIFP8WsCuN863pQcJCwn4aSuKc
 C1ct2tLaaIeZSAy68xytqDwRXslMGaKUp7ygBzpyaIIEqy2l9H8NRKQ8Cmg+vyKF
 2+zu3fNYIGIS3KflUtcTQDZ9IVtp/YxN7QXchZ56nnD5PFy9L9GgvBecZ0i8zzoZ
 XFmzyp7HLwyBA8oNmmEJWMz93iwx61mePxOzPu2n1VfqWRTgFp/kd3KrFKWLfHvw
 NoPGbneAhtwifKCNkxAmX6aCvnTZ18j9nds8WbRcuLRbTF0hHfkI36+vgoRWebaA
 su673A0fnFFe64EEnOLjlnAa0V8CL26V2rX2Mi2Kjaw6emc1Yz5HDnGYjckKIlvS
 fZjlfP1dtqzBecXvBLIuMQKfygpRJD83sEni+rGtAN1FKVP8eKz+/ZcyAG5xqcrZ
 dnDXBegjhieqyz4q9vykxTLmYKEKd4fqbhhjZQ3PStyXgc6iFVKvD41akSdxR6ob
 3oujNYblpkJVhHCcO+H4dWa7tznB7hqd9xv2Jerx4cKTdd9uIik=
 =M4CO
 -----END PGP SIGNATURE-----

Merge tag '9p-fixes-for-6.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs

Pull fs/9p fixes from Eric Van Hensbergen:
 "This contains a reversion of one of the original 6.9 patches which
  seems to have been the cause of most of the instability. It also
  incorporates several fixes to legacy support and cache fixes.

  There are few additional changes to improve stability, but I want
  another week of testing before sending them upstream"

* tag '9p-fixes-for-6.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
  fs/9p: drop inodes immediately on non-.L too
  fs/9p: Revert "fs/9p: fix dups even in uncached mode"
  fs/9p: remove erroneous nlink init from legacy stat2inode
  9p: explicitly deny setlease attempts
  fs/9p: fix the cache always being enabled on files with qid flags
  fs/9p: translate O_TRUNC into OTRUNC
  fs/9p: only translate RWX permissions for plain 9P2000
2024-04-19 13:36:28 -07:00
Linus Torvalds
daa757767d fuse fixes for 6.9-rc5
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCZiJcTAAKCRDh3BK/laaZ
 PK1QAP9u/S7GYKDj0k58xOVAof2x/q0puHWXoObRma+bPmeoeQEA2+K+vlnTJHub
 kLRURaTCzGyFfL+CB/JQ4Kv4tDF5qQc=
 =Eoob
 -----END PGP SIGNATURE-----

Merge tag 'fuse-fixes-6.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse

Pull fuse fixes from Miklos Szeredi:

 - Fix two bugs in the new passthrough mode

 - Fix a statx bug introduced in v6.6

 - Fix code documentation

* tag 'fuse-fixes-6.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  cuse: add kernel-doc comments to cuse_process_init_reply()
  fuse: fix leaked ENOSYS error on first statx call
  fuse: fix parallel dio write on file open in passthrough mode
  fuse: fix wrong ff->iomode state changes from parallel dio write
2024-04-19 13:16:10 -07:00
Linus Torvalds
54c23548e0 15 hotfixes. 9 are cc:stable and the remainder address post-6.8 issues
or aren't considered suitable for backporting.
 
 There are a significant number of fixups for this cycle's page_owner
 changes (series "page_owner: print stacks and their outstanding
 allocations").  Apart from that, singleton changes all over, mainly in MM.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZiGTewAKCRDdBJ7gKXxA
 jt1QAP9QxiU/+gUMVjkHyKaMBHSBMD/CWBFjDfRjx+BPqYx55gD+JWxUXwlyVkMo
 Z8fqtCGEgatev1VbwpCwByhvnH9bKgw=
 =YBZ9
 -----END PGP SIGNATURE-----

Merge tag 'mm-hotfixes-stable-2024-04-18-14-41' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "15 hotfixes. 9 are cc:stable and the remainder address post-6.8 issues
  or aren't considered suitable for backporting.

  There are a significant number of fixups for this cycle's page_owner
  changes (series "page_owner: print stacks and their outstanding
  allocations"). Apart from that, singleton changes all over, mainly in
  MM"

* tag 'mm-hotfixes-stable-2024-04-18-14-41' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  nilfs2: fix OOB in nilfs_set_de_type
  MAINTAINERS: update Naoya Horiguchi's email address
  fork: defer linking file vma until vma is fully initialized
  mm/shmem: inline shmem_is_huge() for disabled transparent hugepages
  mm,page_owner: defer enablement of static branch
  Squashfs: check the inode number is not the invalid value of zero
  mm,swapops: update check in is_pfn_swap_entry for hwpoison entries
  mm/memory-failure: fix deadlock when hugetlb_optimize_vmemmap is enabled
  mm/userfaultfd: allow hugetlb change protection upon poison entry
  mm,page_owner: fix printing of stack records
  mm,page_owner: fix accounting of pages when migrating
  mm,page_owner: fix refcount imbalance
  mm,page_owner: update metadata for tail pages
  userfaultfd: change src_folio after ensuring it's unpinned in UFFDIO_MOVE
  mm/madvise: make MADV_POPULATE_(READ|WRITE) handle VM_FAULT_RETRY properly
2024-04-19 09:13:35 -07:00
Qu Wenruo
fe1c6c7acc btrfs: fix wrong block_start calculation for btrfs_drop_extent_map_range()
[BUG]
During my extent_map cleanup/refactor, with extra sanity checks,
extent-map-tests::test_case_7() would not pass the checks.

The problem is, after btrfs_drop_extent_map_range(), the resulted
extent_map has a @block_start way too large.
Meanwhile my btrfs_file_extent_item based members are returning a
correct @disk_bytenr/@offset combination.

The extent map layout looks like this:

     0        16K    32K       48K
     | PINNED |      | Regular |

The regular em at [32K, 48K) also has 32K @block_start.

Then drop range [0, 36K), which should shrink the regular one to be
[36K, 48K).
However the @block_start is incorrect, we expect 32K + 4K, but got 52K.

[CAUSE]
Inside btrfs_drop_extent_map_range() function, if we hit an extent_map
that covers the target range but is still beyond it, we need to split
that extent map into half:

	|<-- drop range -->|
		 |<----- existing extent_map --->|

And if the extent map is not compressed, we need to forward
extent_map::block_start by the difference between the end of drop range
and the extent map start.

However in that particular case, the difference is calculated using
(start + len - em->start).

The problem is @start can be modified if the drop range covers any
pinned extent.

This leads to wrong calculation, and would be caught by my later
extent_map sanity checks, which checks the em::block_start against
btrfs_file_extent_item::disk_bytenr + btrfs_file_extent_item::offset.

This is a regression caused by commit c962098ca4 ("btrfs: fix
incorrect splitting in btrfs_drop_extent_map_range"), which removed the
@len update for pinned extents.

[FIX]
Fix it by avoiding using @start completely, and use @end - em->start
instead, which @end is exclusive bytenr number.

And update the test case to verify the @block_start to prevent such
problem from happening.

Thankfully this is not going to lead to any data corruption, as IO path
does not utilize btrfs_drop_extent_map_range() with @skip_pinned set.

So this fix is only here for the sake of consistency/correctness.

CC: stable@vger.kernel.org # 6.5+
Fixes: c962098ca4 ("btrfs: fix incorrect splitting in btrfs_drop_extent_map_range")
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-04-18 18:18:50 +02:00
Johannes Thumshirn
2f7ef5bb4a btrfs: fix information leak in btrfs_ioctl_logical_to_ino()
Syzbot reported the following information leak for in
btrfs_ioctl_logical_to_ino():

  BUG: KMSAN: kernel-infoleak in instrument_copy_to_user include/linux/instrumented.h:114 [inline]
  BUG: KMSAN: kernel-infoleak in _copy_to_user+0xbc/0x110 lib/usercopy.c:40
   instrument_copy_to_user include/linux/instrumented.h:114 [inline]
   _copy_to_user+0xbc/0x110 lib/usercopy.c:40
   copy_to_user include/linux/uaccess.h:191 [inline]
   btrfs_ioctl_logical_to_ino+0x440/0x750 fs/btrfs/ioctl.c:3499
   btrfs_ioctl+0x714/0x1260
   vfs_ioctl fs/ioctl.c:51 [inline]
   __do_sys_ioctl fs/ioctl.c:904 [inline]
   __se_sys_ioctl+0x261/0x450 fs/ioctl.c:890
   __x64_sys_ioctl+0x96/0xe0 fs/ioctl.c:890
   x64_sys_call+0x1883/0x3b50 arch/x86/include/generated/asm/syscalls_64.h:17
   do_syscall_x64 arch/x86/entry/common.c:52 [inline]
   do_syscall_64+0xcf/0x1e0 arch/x86/entry/common.c:83
   entry_SYSCALL_64_after_hwframe+0x77/0x7f

  Uninit was created at:
   __kmalloc_large_node+0x231/0x370 mm/slub.c:3921
   __do_kmalloc_node mm/slub.c:3954 [inline]
   __kmalloc_node+0xb07/0x1060 mm/slub.c:3973
   kmalloc_node include/linux/slab.h:648 [inline]
   kvmalloc_node+0xc0/0x2d0 mm/util.c:634
   kvmalloc include/linux/slab.h:766 [inline]
   init_data_container+0x49/0x1e0 fs/btrfs/backref.c:2779
   btrfs_ioctl_logical_to_ino+0x17c/0x750 fs/btrfs/ioctl.c:3480
   btrfs_ioctl+0x714/0x1260
   vfs_ioctl fs/ioctl.c:51 [inline]
   __do_sys_ioctl fs/ioctl.c:904 [inline]
   __se_sys_ioctl+0x261/0x450 fs/ioctl.c:890
   __x64_sys_ioctl+0x96/0xe0 fs/ioctl.c:890
   x64_sys_call+0x1883/0x3b50 arch/x86/include/generated/asm/syscalls_64.h:17
   do_syscall_x64 arch/x86/entry/common.c:52 [inline]
   do_syscall_64+0xcf/0x1e0 arch/x86/entry/common.c:83
   entry_SYSCALL_64_after_hwframe+0x77/0x7f

  Bytes 40-65535 of 65536 are uninitialized
  Memory access of size 65536 starts at ffff888045a40000

This happens, because we're copying a 'struct btrfs_data_container' back
to user-space. This btrfs_data_container is allocated in
'init_data_container()' via kvmalloc(), which does not zero-fill the
memory.

Fix this by using kvzalloc() which zeroes out the memory on allocation.

CC: stable@vger.kernel.org # 4.14+
Reported-by:  <syzbot+510a1abbb8116eeb341d@syzkaller.appspotmail.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Johannes Thumshirn <Johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-04-18 18:18:13 +02:00
Linus Torvalds
8cd26fd90c for-6.9-rc4-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmYgXDMACgkQxWXV+ddt
 WDsPpg//RzpLGyfFVFx+AdqIPScBvDSr6RIQAug++4OmDbIRMxzOpxKOAWThhivf
 78KIms2fj9R/zLJEdUGCLQTcy8a1eWBnoeSzXoeTta2pip5cKrc9v3hJId53l0F6
 BfltbVjpAKt6XHqeI0V2myrL/KHx5bApz5oNn/oEQCwiA2HBkasrYTRLEA7xMem2
 hRUIXrTuIdwiyWugi84xjp9D0BxEdbTBfH6SR6RG4ESy+73gdEt4BAeDI6DzWN+D
 eKUv/CthhrP7xuO8Aq9XGkwznP7lIeIwBCiV5XURLR0HztFm64vXgbPQHhwqvI43
 5uhA7wifc/VE8nOysubfET6MwVEeyOptW6+25ih/9Da9VLxRK1y/Hm94JW8t6Sxi
 VPgT5gz4YuE5/QaojETDLYgkkjKj7Lpe/Bs225J3QBCHu3fs/tp9kHKbUNJrcAeM
 b56tiRMccLVpeoslbK4ahvQqCH4/LKBMdAqfWK5/p24JkYT/ubVP3CdLS2MOeRpV
 UqDpQExuWsVJZKH8znSXXrHf2ZMYHmlA/1gRqdEmcvPF8A2vCc9aMMZHTP7v57EC
 /80NJv9HQuxcUFQCl0h4WBlB+gGQtAszz+0q1X9aedauC6Hd/7LeICLCPRczJC3g
 rD3J+EXiTg2MxqZWyXJXQ1Q9cQWNkQjG6o/rEhl5r5c3OGWgssk=
 =ZKAP
 -----END PGP SIGNATURE-----

Merge tag 'for-6.9-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

 - fixup in zoned mode for out-of-order writes of metadata that are no
   longer necessary, this used to be tracked in a separate list but now
   the old locaion needs to be zeroed out, also add assertions

 - fix bulk page allocation retry, this may stall after first failure
   for compression read/write

* tag 'for-6.9-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: do not wait for short bulk allocation
  btrfs: zoned: add ASSERT and WARN for EXTENT_BUFFER_ZONED_ZEROOUT handling
  btrfs: zoned: do not flag ZEROOUT on non-dirty extent buffer
2024-04-17 18:25:40 -07:00
Sweet Tea Dorminy
131a821a24 btrfs: fallback if compressed IO fails for ENOSPC
In commit b4ccace878 ("btrfs: refactor submit_compressed_extents()"), if
an async extent compressed but failed to find enough space, we changed
from falling back to an uncompressed write to just failing the write
altogether. The principle was that if there's not enough space to write
the compressed version of the data, there can't possibly be enough space
to write the larger, uncompressed version of the data.

However, this isn't necessarily true: due to fragmentation, there could
be enough discontiguous free blocks to write the uncompressed version,
but not enough contiguous free blocks to write the smaller but
unsplittable compressed version.

This has occurred to an internal workload which relied on write()'s
return value indicating there was space. While rare, it has happened a
few times.

Thus, in order to prevent early ENOSPC, re-add a fallback to
uncompressed writing.

Fixes: b4ccace878 ("btrfs: refactor submit_compressed_extents()")
CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Co-developed-by: Neal Gompa <neal@gompa.dev>
Signed-off-by: Neal Gompa <neal@gompa.dev>
Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-04-18 01:46:52 +02:00
Naohiro Aota
7192833c4e btrfs: scrub: run relocation repair when/only needed
When btrfs scrub finds an error, it reads mirrors to find correct data. If
all the errors are fixed, sctx->error_bitmap is cleared for the stripe
range. However, in the zoned mode, it runs relocation to repair scrub
errors when the bitmap is *not* empty, which is a flipped condition.

Also, it runs the relocation even if the scrub is read-only. This was
missed by a fix in commit 1f2030ff6e ("btrfs: scrub: respect the
read-only flag during repair").

The repair is only necessary when there is a repaired sector and should be
done on read-write scrub. So, tweak the condition for both regular and
zoned case.

Fixes: 54765392a1 ("btrfs: scrub: introduce helper to queue a stripe for scrub")
Fixes: 1f2030ff6e ("btrfs: scrub: respect the read-only flag during repair")
CC: stable@vger.kernel.org # 6.6+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-04-18 01:46:47 +02:00
David Sterba
e5a78fdec0 btrfs: remove colon from messages with state
The message format in syslog is usually made of two parts:

  prefix ":" message

Various tools parse the prefix up to the first ":". When there's
an additional status of a btrfs filesystem like

  [5.199782] BTRFS info (device nvme1n1p1: state M): use zstd compression, level 9

where 'M' is for remount, there's one more ":" that does not conform to
the format. Remove it.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-04-18 01:46:35 +02:00
Kent Overstreet
0389c09b2f bcachefs: Fix bio alloc in check_extent_checksum()
if the buffer is virtually mapped it won't be a single bvec

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-17 17:29:58 -04:00
Kent Overstreet
719aec84b1 bcachefs: fix leak in bch2_gc_write_reflink_key
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-17 17:29:58 -04:00
Kent Overstreet
605109ff5e bcachefs: KEY_TYPE_error is allowed for reflink
KEY_TYPE_error is left behind when we have to delete all pointers in an
extent in fsck; it allows errors to be correctly returned by reads
later.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-17 17:29:58 -04:00
Kent Overstreet
fa845c7349 bcachefs: Fix bch2_dev_btree_bitmap_marked_sectors() shift
Fixes: 27c15ed297 bcachefs: bch_member.btree_allocated_bitmap
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-17 17:29:53 -04:00
Kent Overstreet
79055f50a6 bcachefs: make sure to release last journal pin in replay
This fixes a deadlock when journal replay has many keys to insert that
were from fsck, not the journal.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-16 19:14:01 -04:00
Kent Overstreet
fabb4d4985 bcachefs: node scan: ignore multiple nodes with same seq if interior
Interior nodes are not really needed, when we have to scan - but if this
pops up for leaf nodes we'll need a real heuristic.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-16 19:14:00 -04:00
Nathan Chancellor
9fd5a48a1e bcachefs: Fix format specifier in validate_bset_keys()
When building for 32-bit platforms, for which size_t is 'unsigned int',
there is a warning from a format string in validate_bset_keys():

  fs/bcachefs/btree_io.c: In function 'validate_bset_keys':
  fs/bcachefs/btree_io.c:891:34: error: format '%lu' expects argument of type 'long unsigned int', but argument 12 has type 'unsigned int' [-Werror=format=]
    891 |                                  "bad k->u64s %u (min %u max %lu)", k->u64s,
        |                                  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  fs/bcachefs/btree_io.c:603:32: note: in definition of macro 'btree_err'
    603 |                                msg, ##__VA_ARGS__);                     \
        |                                ^~~
  fs/bcachefs/btree_io.c:887:21: note: in expansion of macro 'btree_err_on'
    887 |                 if (btree_err_on(!bkeyp_u64s_valid(&b->format, k),
        |                     ^~~~~~~~~~~~
  fs/bcachefs/btree_io.c:891:64: note: format string is defined here
    891 |                                  "bad k->u64s %u (min %u max %lu)", k->u64s,
        |                                                              ~~^
        |                                                                |
        |                                                                long unsigned int
        |                                                              %u
  cc1: all warnings being treated as errors

BKEY_U64s is size_t so the entire expression is promoted to size_t. Use
the '%zu' specifier so that there is no warning regardless of the width
of size_t.

Fixes: 031ad9e7db ("bcachefs: Check for packed bkeys that are too big")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202404130747.wH6Dd23p-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202404131536.HdAMBOVc-lkp@intel.com/
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-16 19:11:49 -04:00
Kent Overstreet
02bed83d59 bcachefs: Fix null ptr deref in twf from BCH_IOCTL_FSCK_OFFLINE
We need to initialize the stdio redirects before they're used.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-16 19:11:49 -04:00
Jeongjun Park
c4a7dc9523 nilfs2: fix OOB in nilfs_set_de_type
The size of the nilfs_type_by_mode array in the fs/nilfs2/dir.c file is
defined as "S_IFMT >> S_SHIFT", but the nilfs_set_de_type() function,
which uses this array, specifies the index to read from the array in the
same way as "(mode & S_IFMT) >> S_SHIFT".

static void nilfs_set_de_type(struct nilfs_dir_entry *de, struct inode
 *inode)
{
	umode_t mode = inode->i_mode;

	de->file_type = nilfs_type_by_mode[(mode & S_IFMT)>>S_SHIFT]; // oob
}

However, when the index is determined this way, an out-of-bounds (OOB)
error occurs by referring to an index that is 1 larger than the array size
when the condition "mode & S_IFMT == S_IFMT" is satisfied.  Therefore, a
patch to resize the nilfs_type_by_mode array should be applied to prevent
OOB errors.

Link: https://lkml.kernel.org/r/20240415182048.7144-1-konishi.ryusuke@gmail.com
Reported-by: syzbot+2e22057de05b9f3b30d8@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=2e22057de05b9f3b30d8
Fixes: 2ba466d74e ("nilfs2: directory entry operations")
Signed-off-by: Jeongjun Park <aha310510@gmail.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-16 15:39:52 -07:00
Phillip Lougher
9253c54e01 Squashfs: check the inode number is not the invalid value of zero
Syskiller has produced an out of bounds access in fill_meta_index().

That out of bounds access is ultimately caused because the inode
has an inode number with the invalid value of zero, which was not checked.

The reason this causes the out of bounds access is due to following
sequence of events:

1. Fill_meta_index() is called to allocate (via empty_meta_index())
   and fill a metadata index.  It however suffers a data read error
   and aborts, invalidating the newly returned empty metadata index.
   It does this by setting the inode number of the index to zero,
   which means unused (zero is not a valid inode number).

2. When fill_meta_index() is subsequently called again on another
   read operation, locate_meta_index() returns the previous index
   because it matches the inode number of 0.  Because this index
   has been returned it is expected to have been filled, and because
   it hasn't been, an out of bounds access is performed.

This patch adds a sanity check which checks that the inode number
is not zero when the inode is created and returns -EINVAL if it is.

[phillip@squashfs.org.uk: whitespace fix]
  Link: https://lkml.kernel.org/r/20240409204723.446925-1-phillip@squashfs.org.uk
Link: https://lkml.kernel.org/r/20240408220206.435788-1-phillip@squashfs.org.uk
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Reported-by: "Ubisectech Sirius" <bugreport@ubisectech.com>
Closes: https://lore.kernel.org/lkml/87f5c007-b8a5-41ae-8b57-431e924c5915.bugreport@ubisectech.com/
Cc: Christian Brauner <brauner@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-16 15:39:50 -07:00
Christian Brauner
74871791ff
ntfs3: serve as alias for the legacy ntfs driver
Johan Hovold reported that removing the legacy ntfs driver broke boot
for him since his fstab uses the legacy ntfs driver to access firmware
from the original Windows partition.

Use ntfs3 as an alias for legacy ntfs if CONFIG_NTFS_FS is selected.
This is similar to how ext3 is treated.

Link: https://lore.kernel.org/r/Zf2zPf5TO5oYt3I3@hovoldconsulting.com
Link: https://lore.kernel.org/r/20240325-hinkriegen-zuziehen-d7e2c490427a@brauner
Fixes: 7ffa8f3d30 ("fs: Remove NTFS classic")
Tested-by: Johan Hovold <johan+linaro@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Johan Hovold <johan@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-04-16 10:45:26 +02:00
Linus Torvalds
96fca68c4f nfsd-6.9 fixes:
- Fix a potential tracepoint crash
 - Fix NFSv4 GETATTR on big-endian platforms
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmYdlTMACgkQM2qzM29m
 f5esDBAAnXOgnizrGTMkpmqWL11UmpIjWDyTxQ7dWrk7dqQGXT3qAAya3dijaJiM
 a1eLdFiaaKFxtkFrR9QPCtqfpR/gNxkkHf05SK/LQ1SL2OMbAMa1/UIaf0teWM78
 CafmMT1YLMyiEDFpB0rAnoJ5VvTU2BVowjfzAW/0PkmwLlO5+XMMhPx/qd1061Ll
 gwl2pqwZPankZRWsUBZtDE5bCTuKQDePrG7e7J7FKVPR+1EqAcudsDMh1tmSTvar
 0NTeLH0LTJ2imZi21b+j9+VKtwXTtmuY2GxhADNb8goUuQI2+lqNakDk4AflQvuy
 Kg3Z0dnNkTWGKPIbV/020vhN/6Fev5RVF9SdPF5WcEfeaWDV5rjEY1s4svphUuS+
 Nh8VCPeQEAamAcShA584G8onWdXGP9sYgBiWXZvh8R38Akq6AC6LPEkbqT6dR5mU
 ftMDGb3BBvkOs7ahjaiUUaPqoRXxeS+Qh06Sa3JrZhbMFdccZRq/AgodtC7ZYGZZ
 4u7yG+y8MIytHbIljE2aCo8U8jV8f4nl6VV3xda3H9zZG0RRfpZfFetHiAWqRjoq
 BEB75eLFDjf1qAXENWzzdeS0wLRRr5PHIkBfDeFq71zyJO37RH15sfVnavinj2KY
 7a0ASn2xlqzDHY7MTZ2ULRCLYsS7XwN88KBF7tNghfQBKJYs59A=
 =wAk4
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:

 - Fix a potential tracepoint crash

 - Fix NFSv4 GETATTR on big-endian platforms

* tag 'nfsd-6.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  NFSD: fix endianness issue in nfsd4_encode_fattr4
  SUNRPC: Fix rpcgss_context trace event acceptor field
2024-04-15 14:09:47 -07:00
Linus Torvalds
cef27048e5 bcachefs fixes for 6.9-rc5
various recovery fixes:
 
 - fixes for the btree_insert_entry being resized on path allocation
   btree_path array recently became dynamically resizable, and
   btree_insert_entry along with it; this was being observed during
   journal replay, when write buffer btree updates don't use the write
   buffer and instead use the normal btree update path
 - multiple fixes for deadlock in recovery when we need to do lots of
   btree node merges; excessive merges were clocking up the whole
   pipeline
 - write buffer path now correctly does btree node merges when needed
 - fix failure to go RW when superblock indicates recovery passes needed
   (i.e. to complete an unfinished upgrade)
 
 various unsafety fixes - test case contributed by a user who had two
 drives out of a six drive array write out a whole bunch of garbage after
 power failure
 
 new (tiny) on disk format feature: since it appears the btree node scan
 tool will be a more regular thing (crappy hardware, user error) - this
 adds a 64 bit per-device bitmap of regions that have ever had btree
 nodes.
 
 a path->should_be_locked fix, from a larger patch series tightening up
 invariants and assertions around btree transaction and path locking
 state; this particular fix prevents us from keeping around btree_paths
 that are no longer needed.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmYdaRIACgkQE6szbY3K
 bnbqcA/9ETT0Jekf/V4klQmoWj9GX5nQstUz+ENABNNPL+5hld62EojiRvOW2qwU
 zVs7O0M59B8/+4v4KJoW+RqnLFjAF4z/Gf+/Uw9WarsHAKIxxFFFARxG93JpGqOn
 nGa8RSw0BaYQIdbMR0Bdacc2f0N+JkJQx956/+JV7EG5MAJqXgz00AvIuLqMZ+2t
 0m9av3n0tVmstyvvGqk8pouvQjK0XUvIDYN3oiUDl7WXOAIKXDlp6yviiGnTbusq
 DssmIt5fdeVBq/DAk5PMNEKM9NUP+weIZW1UWPWINaicarqyV+pn2fhvLrBxVl7q
 zBSN3v28viaABKC8A15b2bqj3IT2WIBDoBCEi406akMao9eiVsE6is13rFkPQwQI
 Obhc7NNDyOPPTvX25M3tKXpr8rSGoD2qHIMMKMIBe1ZWscj6lMbmUBErwzTOAW4+
 pNTvzWT2XwcS7tE8Fx50ZxcehTQl6ir0hQvjJL5JV2po8XMbdGxcImBe6xPmAa3n
 /XIzyglL8IvW494wjCsHxtTeOt+f8nW7BXJCrWB71UQeXIXq4b9FADOwWtlGTnxJ
 6XNprfi8TSp+RsSRxav6DBw2ou5viGjAjP2ddrO6Lw37XUYV0igS+BeDNEPA4dwI
 ZlbCzNE7qSXK2rjmGjyu7GCJ3+NOxJDQ8GdxkTDtpPrBF2kCOkQ=
 =NAId
 -----END PGP SIGNATURE-----

Merge tag 'bcachefs-2024-04-15' of https://evilpiepirate.org/git/bcachefs

Pull yet more bcachefs fixes from Kent Overstreet:
 "This gets recovery working again for the affected user I've been
  working with, and I'm still waiting to hear back on other bug reports
  but should fix it for everyone else who's been having issues with
  recovery.

   - Various recovery fixes:

       - fixes for the btree_insert_entry being resized on path
         allocation btree_path array recently became dynamically
         resizable, and btree_insert_entry along with it; this was being
         observed during journal replay, when write buffer btree updates
         don't use the write buffer and instead use the normal btree
         update path

       - multiple fixes for deadlock in recovery when we need to do lots
         of btree node merges; excessive merges were clocking up the
         whole pipeline

       - write buffer path now correctly does btree node merges when
         needed

       - fix failure to go RW when superblock indicates recovery passes
         needed (i.e. to complete an unfinished upgrade)

   - Various unsafety fixes - test case contributed by a user who had
     two drives out of a six drive array write out a whole bunch of
     garbage after power failure

   - New (tiny) on disk format feature: since it appears the btree node
     scan tool will be a more regular thing (crappy hardware, user
     error) - this adds a 64 bit per-device bitmap of regions that have
     ever had btree nodes.

   - A path->should_be_locked fix, from a larger patch series tightening
     up invariants and assertions around btree transaction and path
     locking state.

     This particular fix prevents us from keeping around btree_paths
     that are no longer needed"

* tag 'bcachefs-2024-04-15' of https://evilpiepirate.org/git/bcachefs: (24 commits)
  bcachefs: set_btree_iter_dontneed also clears should_be_locked
  bcachefs: fix error path of __bch2_read_super()
  bcachefs: Check for backpointer bucket_offset >= bucket size
  bcachefs: bch_member.btree_allocated_bitmap
  bcachefs: sysfs internal/trigger_journal_flush
  bcachefs: Fix bch2_btree_node_fill() for !path
  bcachefs: add safety checks in bch2_btree_node_fill()
  bcachefs: Interior known are required to have known key types
  bcachefs: add missing bounds check in __bch2_bkey_val_invalid()
  bcachefs: Fix btree node merging on write buffer btrees
  bcachefs: Disable merges from interior update path
  bcachefs: Run merges at BCH_WATERMARK_btree
  bcachefs: Fix missing write refs in fs fio paths
  bcachefs: Fix deadlock in journal replay
  bcachefs: Go rw if running any explicit recovery passes
  bcachefs: Standardize helpers for printing enum strs with bounds checks
  bcachefs: don't queue btree nodes for rewrites during scan
  bcachefs: fix race in bch2_btree_node_evict()
  bcachefs: fix unsafety in bch2_stripe_to_text()
  bcachefs: fix unsafety in bch2_extent_ptr_to_text()
  ...
2024-04-15 11:01:11 -07:00
Kent Overstreet
ad29cf999a bcachefs: set_btree_iter_dontneed also clears should_be_locked
This is part of a larger series cleaning up the semantics of
should_be_locked and adding assertions around it; if we don't need an
iterator/path anymore, it clearly doesn't need to be locked.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-15 13:31:15 -04:00
Chao Yu
3078e059a5 bcachefs: fix error path of __bch2_read_super()
In __bch2_read_super(), if kstrdup() fails, it needs to release memory
in sb->holder, fix to call bch2_free_super() in the error path.

Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-15 13:31:15 -04:00
Yang Li
09492cb451 cuse: add kernel-doc comments to cuse_process_init_reply()
This commit adds kernel-doc style comments with complete parameter
descriptions for the function  cuse_process_init_reply.

Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-04-15 11:02:10 +02:00
Danny Lin
eb4b691b91 fuse: fix leaked ENOSYS error on first statx call
FUSE attempts to detect server support for statx by trying it once and
setting no_statx=1 if it fails with ENOSYS, but consider the following
scenario:

- Userspace (e.g. sh) calls stat() on a file
  * succeeds
- Userspace (e.g. lsd) calls statx(BTIME) on the same file
  - request_mask = STATX_BASIC_STATS | STATX_BTIME
  - first pass: sync=true due to differing cache_mask
  - statx fails and returns ENOSYS
  - set no_statx and retry
  - retry sets mask = STATX_BASIC_STATS
  - now mask == cache_mask; sync=false (time_before: still valid)
  - so we take the "else if (stat)" path
  - "err" is still ENOSYS from the failed statx call

Fix this by zeroing "err" before retrying the failed call.

Fixes: d3045530bd ("fuse: implement statx")
Cc: stable@vger.kernel.org # v6.6
Signed-off-by: Danny Lin <danny@orbstack.dev>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-04-15 10:12:44 +02:00
Amir Goldstein
7cc9112628 fuse: fix parallel dio write on file open in passthrough mode
Parallel dio write takes a negative refcount of fi->iocachectr and so does
open of file in passthrough mode.

The refcount of passthrough mode is associated with attach/detach of a
fuse_backing object to fuse inode.

For parallel dio write, the backing file is irrelevant, so the call to
fuse_inode_uncached_io_start() passes a NULL fuse_backing object.

Passing a NULL fuse_backing will result in false -EBUSY error if the file
is already open in passthrough mode.

Allow taking negative fi->iocachectr refcount with NULL fuse_backing,
because it does not conflict with an already attached fuse_backing object.

Fixes: 4a90451bbc ("fuse: implement open in passthrough mode")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-04-15 10:12:44 +02:00
Amir Goldstein
4864a6dd83 fuse: fix wrong ff->iomode state changes from parallel dio write
There is a confusion with fuse_file_uncached_io_{start,end} interface.
These helpers do two things when called from passthrough open()/release():
1. Take/drop negative refcount of fi->iocachectr (inode uncached io mode)
2. State change ff->iomode IOM_NONE <-> IOM_UNCACHED (file uncached open)

The calls from parallel dio write path need to take a reference on
fi->iocachectr, but they should not be changing ff->iomode state, because
in this case, the fi->iocachectr reference does not stick around until file
release().

Factor out helpers fuse_inode_uncached_io_{start,end}, to be used from
parallel dio write path and rename fuse_file_*cached_io_{start,end} helpers
to fuse_file_*cached_io_{open,release} to clarify the difference.

Fixes: 205c1d8026 ("fuse: allow parallel dio writes with FUSE_DIRECT_IO_ALLOW_MMAP")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-04-15 10:12:03 +02:00
Kent Overstreet
f0a73d4fde bcachefs: Check for backpointer bucket_offset >= bucket size
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-14 20:02:11 -04:00
Kent Overstreet
27c15ed297 bcachefs: bch_member.btree_allocated_bitmap
This adds a small (64 bit) per-device bitmap that tracks ranges that
have btree nodes, for accelerating btree node scan if it is ever needed.

- New helpers, bch2_dev_btree_bitmap_marked() and
  bch2_dev_bitmap_mark(), for checking and updating the bitmap

- Interior btree update path updates the bitmaps when required

- The check_allocations pass has a new fsck_err check,
  btree_bitmap_not_marked

- New on disk format version, mi_btree_mitmap, which indicates the new
  bitmap is present

- Upgrade table lists the required recovery pass and expected fsck error

- Btree node scan uses the bitmap to skip ranges if we're on the new
  version

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-14 20:02:11 -04:00
Kent Overstreet
bdae2a7e60 bcachefs: sysfs internal/trigger_journal_flush
Add a sysfs knob for immediately flushing the entire journal.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-14 20:02:11 -04:00
Kent Overstreet
e879389f57 bcachefs: Fix bch2_btree_node_fill() for !path
We shouldn't be doing the unlock/relock dance when we're not using a
path - this fixes an assertion pop when called from btree node scan.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-14 20:02:11 -04:00
Kent Overstreet
8cf2036e7b bcachefs: add safety checks in bch2_btree_node_fill()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-14 18:01:12 -04:00
Kent Overstreet
d789e9a7d5 bcachefs: Interior known are required to have known key types
For forwards compatibilyt, we allow bkeys of unknown type in leaf nodes;
we can simply ignore metadata we don't understand. Pointers to btree
nodes must always be of known types, howwever.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-14 18:01:12 -04:00
Kent Overstreet
bceb86be9e bcachefs: add missing bounds check in __bch2_bkey_val_invalid()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-14 18:01:12 -04:00
Linus Torvalds
72374d71c3 Get rid of lockdep false positives around sysfs/overlayfs
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZhu2kwAKCRBZ7Krx/gZQ
 62gzAP9eeADy6rQkzgWJ8d8sKzGfmd0nup9WlCOxZSR0XojTXwEAnue47dn7PlMx
 wQY0joZ0V5FO8PNTEbWc2P/dSQrANgc=
 =MshW
 -----END PGP SIGNATURE-----

Merge tag 'pull-sysfs-annotation-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull sysfs fix from Al Viro:
 "Get rid of lockdep false positives around sysfs/overlayfs

  syzbot has uncovered a class of lockdep false positives for setups
  with sysfs being one of the backing layers in overlayfs. The root
  cause is that of->mutex allocated when opening a sysfs file read-only
  (which overlayfs might do) is confused with of->mutex of a file opened
  writable (held in write to sysfs file, which overlayfs won't do).

  Assigning them separate lockdep classes fixes that bunch and it's
  obviously safe"

* tag 'pull-sysfs-annotation-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  kernfs: annotate different lockdep class for of->mutex of writable files
2024-04-14 11:41:51 -07:00
Amir Goldstein
16b52bbee4 kernfs: annotate different lockdep class for of->mutex of writable files
The writable file /sys/power/resume may call vfs lookup helpers for
arbitrary paths and readonly files can be read by overlayfs from vfs
helpers when sysfs is a lower layer of overalyfs.

To avoid a lockdep warning of circular dependency between overlayfs
inode lock and kernfs of->mutex, use a different lockdep class for
writable and readonly kernfs files.

Reported-by: syzbot+9a5b0ced8b1bfb238b56@syzkaller.appspotmail.com
Fixes: 0fedefd4c4 ("kernfs: sysfs: support custom llseek method for sysfs entries")
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-04-14 06:55:46 -04:00
Kent Overstreet
86dbf8c566 bcachefs: Fix btree node merging on write buffer btrees
The btree write buffer flush fastpath that avoids the main transaction
commit path had the unfortunate side effect of not doing btree node
merging.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13 22:49:25 -04:00
Kent Overstreet
3f10048973 bcachefs: Disable merges from interior update path
There's been a bug in the btree write buffer where it wasn't triggering
btree node merges - and leaving behind a bunch of nearly empty btree
nodes.

Then during journal replay, when updates to the backpointers btree
aren't using the btree write buffer (because we require synchronization
with journal replay), we end up doing those merges all at once.

Then if it's the interior update path running them, we deadlock because
those run with the highest watermark.

There's no real need for the interior update path to be doing btree node
merges; other code paths can handle that at lower watermarks.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13 22:49:25 -04:00
Kent Overstreet
9054ef2ea9 bcachefs: Run merges at BCH_WATERMARK_btree
This fixes a deadlock where the interior update path during journal
replay ends up doing a ton of merges on the backpointers btree, and
deadlocking.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13 22:49:25 -04:00
Kent Overstreet
9e203c43dc bcachefs: Fix missing write refs in fs fio paths
bch2_journal_flush_seq requires us to have a write ref

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13 22:48:17 -04:00
Kent Overstreet
82cf18f23e bcachefs: Fix deadlock in journal replay
btree_key_can_insert_cached() should be checking the watermark -
BCH_TRANS_COMMIT_journal_replay really means nonblocking mode when
watermark < reclaim, it was being used incorrectly.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13 22:48:17 -04:00
Kent Overstreet
4518e80adf bcachefs: Go rw if running any explicit recovery passes
This fixes a bug where we fail to start when upgrading/downgrading
because we forgot we needed to go rw.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13 22:48:17 -04:00
Kent Overstreet
9abb6dd7ce bcachefs: Standardize helpers for printing enum strs with bounds checks
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13 22:48:17 -04:00
Kent Overstreet
ba8ed36e72 bcachefs: don't queue btree nodes for rewrites during scan
many nodes found during scan will be old nodes, overwritten by newer
nodes

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13 22:48:17 -04:00
Kent Overstreet
7b4c4ccf84 bcachefs: fix race in bch2_btree_node_evict()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13 22:48:16 -04:00
Kent Overstreet
2aeed876d7 bcachefs: fix unsafety in bch2_stripe_to_text()
.to_text() functions need to work on key values that didn't pass .valid

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13 22:48:16 -04:00
Kent Overstreet
dc32c118ec bcachefs: fix unsafety in bch2_extent_ptr_to_text()
Need to check if we have a valid bucket before checking if ptr is stale

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13 22:48:16 -04:00
Kent Overstreet
87cb0239c8 bcachefs: btree node scan: handle encrypted nodes
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13 22:48:16 -04:00
Kent Overstreet
031ad9e7db bcachefs: Check for packed bkeys that are too big
add missing validation; fixes assertion pop in bkey unpack

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13 22:48:16 -04:00
Kent Overstreet
58caa786f1 bcachefs: Fix UAFs of btree_insert_entry array
The btree paths array is now dynamically resizable - and as well the
btree_insert_entries array, as it needs to be the same size.

The merge path (and interior update path) allocates new btree paths,
thus can trigger a resize; thus we need to not retain direct pointers
after invoking merge; similarly when running btree node triggers.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13 22:48:16 -04:00
Linus Torvalds
76b0e9c429 zonefs fixes for 6.9-rc4
- Suppress a coccicheck warning using str_plural().
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQSRPv8tYSvhwAzJdzjdoc3SxdoYdgUCZhnikwAKCRDdoc3SxdoY
 ditrAP9s/r36Dl4c8BZxpoUyCmf44ww6oTMxGzjwi/4gA8Ry2gEA766WGVsDyJf2
 G0qjl7+bVneaOG6Xayzce69OjvZGZws=
 =U6kP
 -----END PGP SIGNATURE-----

Merge tag 'zonefs-6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs

Pull zonefs fix from Damien Le Moal:

 - Suppress a coccicheck warning using str_plural()

* tag 'zonefs-6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
  zonefs: Use str_plural() to fix Coccinelle warning
2024-04-13 10:25:32 -07:00
Linus Torvalds
fa4022cb73 Four cifs.ko changesets, most also for stable
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmYaS/8ACgkQiiy9cAdy
 T1G3OAwAjRyhi2XHb0QtA0YpZN0FLFghoY/+hgAYaS20WokdqbdyPfZeievRn6Jq
 zgoRYmPF9ihEAZ/UnNIu5ficHoFFsNTxPodQKOTMKxtYg7RCWv7ZF87+/OafuG/k
 FbYue7LQ1CvloEBMsNg088uFQsCp4u8k/9IaHHf+wEDmti42JTqoLjg5YsgbcotI
 Qt8VtxTVL+FzPax/qQkXtBvHggxTfs7VWvwRkv21NsotQzBoPGxHxv3bUV9G54Ez
 56w6Qg+R4ZSwIGgrRGAIjTZ0E6lPuIBanmePJlFwFB0em7Ajl3K2BBJUXpTsxgEd
 aDgJKFZ+ZmDpp5GWlYRccmy/jm6uOiLdCxPye/EQk5KK/oUNpFUUQ95OJ1clH6Su
 pR8LLk1yPQH+e8eyx0Exq7H8l0lsy4pHzerVyUgdgd9E1IZ3+MIN261vTu2TTfGq
 CtML0WNcZFnQdYbMnkADAZ7zYgr32U0UiWr6WS+V0rugyguDrPIEiIneFl3rSZsq
 MQaOHyS5
 =AHB9
 -----END PGP SIGNATURE-----

Merge tag 'v6.9-rc3-SMB3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:

 - fix for oops in cifs_get_fattr of deleted files

 - fix for the remote open counter going negative in some directory
   lease cases

 - fix for mkfifo to instantiate dentry to avoid possible crash

 - important fix to allow handling key rotation for mount and remount
   (ie cases that are becoming more common when password that was used
   for the mount will expire soon but will be replaced by new password)

* tag 'v6.9-rc3-SMB3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb3: fix broken reconnect when password changing on the server by allowing password rotation
  smb: client: instantiate when creating SFU files
  smb3: fix Open files on server counter going negative
  smb: client: fix NULL ptr deref in cifs_mark_open_handles_for_deleted_file()
2024-04-13 10:10:18 -07:00
Linus Torvalds
90d3eaaf4f Two CephFS fixes marked for stable and a MAINTAINERS update.
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmYZXkYTHGlkcnlvbW92
 QGdtYWlsLmNvbQAKCRBKf944AhHziyLNCACkJv5GjDeHC22P+UB6re3gO39wdFsL
 BvHmhyebz5iidjS1l6pC/RcLOmB+OwekOiDfFvkwu0E+3d5dZTBInUV4BWzbJVLR
 Hkjd+oCEThV+f58n2Ol/r6lBGIFCgJpgtW02DQurM8wHTLm/vqxpj2GsvJCRXNr6
 lfqWVvZBteEmqxHUGNWldOQh2jNPGyvOpDRCQTzdjEfUAijoc8arLyuwbxl9tFdA
 qVtCZk+Jv0Lz9uQUSsI7aM+SEXUgGW3pZXJL9h9PPjC9d811GCwQnUcL4hNE2B3X
 Nf5Qzo9jV2GKJh8+kHui0dmpt0dMQfFZ2YaeJDZl8c8dE2uqBL+zMO79
 =mvie
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-6.9-rc4' of https://github.com/ceph/ceph-client

Pull ceph fixes from Ilya Dryomov:
 "Two CephFS fixes marked for stable and a MAINTAINERS update"

* tag 'ceph-for-6.9-rc4' of https://github.com/ceph/ceph-client:
  MAINTAINERS: remove myself as a Reviewer for Ceph
  ceph: switch to use cap_delay_lock for the unlink delay list
  ceph: redirty page before returning AOP_WRITEPAGE_ACTIVATE
2024-04-12 10:15:46 -07:00
Linus Torvalds
5939d45155 Tracing fixes for 6.9:
- Fix the buffer_percent accounting as it is dependent on three variables:
   1) pages_read - number of subbuffers read
   2) pages_lost - number of subbuffers lost due to overwrite
   3) pages_touched - number of pages that a writer entered
   These three counters only increment, and to know how many active pages
   there are on the buffer at any given time, the pages_read and
   pages_lost are subtracted from pages_touched. But the pages touched
   was incremented whenever any writer went to the next subbuffer even
   if it wasn't the only one, so it was incremented more than it should
   be causing the counter for how many subbuffers currently have content
   incorrect, which caused the buffer_percent that holds waiters until
   the ring buffer is filled to a given percentage to wake up early.
 
 - Fix warning of unused functions when PERF_EVENTS is not configured in
 
 - Replace bad tab with space in Kconfig for FTRACE_RECORD_RECURSION_SIZE
 
 - Fix to some kerneldoc function comments in eventfs code.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZhk+khQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qvs0AP98c226UFU6Dha4wvgSulC/wKVvHN3X
 jeclMTdn8RGs2gD/b9OULKNv1//6fP16ZRun7ntRQkotVhlNhf9Ee0smiwU=
 =UYrk
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fixes from Steven Rostedt:

 - Fix the buffer_percent accounting as it is dependent on three
   variables:

     1) pages_read - number of subbuffers read
     2) pages_lost - number of subbuffers lost due to overwrite
     3) pages_touched - number of pages that a writer entered

   These three counters only increment, and to know how many active
   pages there are on the buffer at any given time, the pages_read and
   pages_lost are subtracted from pages_touched.

   But the pages touched was incremented whenever any writer went to the
   next subbuffer even if it wasn't the only one, so it was incremented
   more than it should be causing the counter for how many subbuffers
   currently have content incorrect, which caused the buffer_percent
   that holds waiters until the ring buffer is filled to a given
   percentage to wake up early.

 - Fix warning of unused functions when PERF_EVENTS is not configured in

 - Replace bad tab with space in Kconfig for FTRACE_RECORD_RECURSION_SIZE

 - Fix to some kerneldoc function comments in eventfs code.

* tag 'trace-v6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  ring-buffer: Only update pages_touched when a new page is touched
  tracing: hide unused ftrace_event_id_fops
  tracing: Fix FTRACE_RECORD_RECURSION_SIZE Kconfig entry
  eventfs: Fix kernel-doc comments to functions
2024-04-12 09:02:24 -07:00
Kent Overstreet
2b3e79fea6 bcachefs: Don't use bch2_btree_node_lock_write_nofail() in btree split path
It turns out - btree splits happen with the rest of the transaction
still locked, to avoid unnecessary restarts, which means using nofail
doesn't work here - we can deadlock.

Fortunately, we now have the ability to return errors here.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-11 23:45:12 -04:00
Joakim Sindholt
7fd524b9bd
fs/9p: drop inodes immediately on non-.L too
Signed-off-by: Joakim Sindholt <opensource@zhasha.com>
Signed-off-by: Eric Van Hensbergen <ericvh@kernel.org>
2024-04-11 23:40:55 +00:00
Eric Van Hensbergen
824f06ff81
fs/9p: Revert "fs/9p: fix dups even in uncached mode"
This reverts commit be57855f50.

It caused a regression involving duplicate inode numbers in
some tester trees.  The bad behavior seems to be dependent on inode
reuse policy in underlying file system, so it did not trigger in my
test setup.

Signed-off-by: Eric Van Hensbergen <ericvh@kernel.org>
2024-04-11 23:36:33 +00:00
Yang Li
a8fa658eeb eventfs: Fix kernel-doc comments to functions
This commit fix kernel-doc style comments with complete parameter
descriptions for the lookup_file(),lookup_dir_entry() and
lookup_file_dentry().

Link: https://lore.kernel.org/linux-trace-kernel/20240322062604.28862-1-yang.lee@linux.alibaba.com

Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-04-11 17:42:09 -04:00
Steve French
35f834265e smb3: fix broken reconnect when password changing on the server by allowing password rotation
There are various use cases that are becoming more common in which password
changes are scheduled on a server(s) periodically but the clients connected
to this server need to stay connected (even in the face of brief network
reconnects) due to mounts which can not be easily unmounted and mounted at
will, and servers that do password rotation do not always have the ability
to tell the clients exactly when to the new password will be effective,
so add support for an alt password ("password2=") on mount (and also
remount) so that we can anticipate the upcoming change to the server
without risking breaking existing mounts.

An alternative would have been to use the kernel keyring for this but the
processes doing the reconnect do not have access to the keyring but do
have access to the ses structure.

Reviewed-by: Bharath SM <bharathsm@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-04-11 16:03:48 -05:00
Paulo Alcantara
c6ff459037 smb: client: instantiate when creating SFU files
In cifs_sfu_make_node(), on success, instantiate rather than leave it
with dentry unhashed negative to support callers that expect mknod(2)
to always instantiate.

This fixes the following test case:

  mount.cifs //srv/share /mnt -o ...,sfu
  mkfifo /mnt/fifo
  ./xfstests/ltp/growfiles -b -W test -e 1 -u -i 0 -L 30 /mnt/fifo
  ...
  BUG: unable to handle page fault for address: 000000034cec4e58
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 0 P4D 0
  Oops: 0000 1 PREEMPT SMP PTI
  CPU: 0 PID: 138098 Comm: growfiles Kdump: loaded Not tainted
  5.14.0-436.3987_1240945149.el9.x86_64 #1
  Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
  RIP: 0010:_raw_callee_save__kvm_vcpu_is_preempted+0x0/0x20
  Code: e8 15 d9 61 00 e9 63 ff ff ff 41 bd ea ff ff ff e9 58 ff ff ff e8
  d0 71 c0 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 <48> 8b 04
  fd 60 2b c1 99 80 b8 90 50 03 00 00 0f 95 c0 c3 cc cc cc
  RSP: 0018:ffffb6a143cf7cf8 EFLAGS: 00010206
  RAX: ffff8a9bc30fb038 RBX: ffff8a9bc666a200 RCX: ffff8a9cc0260000
  RDX: 00000000736f622e RSI: ffff8a9bc30fb038 RDI: 000000007665645f
  RBP: ffffb6a143cf7d70 R08: 0000000000001000 R09: 0000000000000001
  R10: 0000000000000001 R11: 0000000000000000 R12: ffff8a9bc666a200
  R13: 0000559a302a12b0 R14: 0000000000001000 R15: 0000000000000000
  FS: 00007fbed1dbb740(0000) GS:ffff8a9cf0000000(0000)
  knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 000000034cec4e58 CR3: 0000000128ec6006 CR4: 0000000000770ef0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  PKRU: 55555554
  Call Trace:
   <TASK>
   ? show_trace_log_lvl+0x1c4/0x2df
   ? show_trace_log_lvl+0x1c4/0x2df
   ? __mutex_lock.constprop.0+0x5f7/0x6a0
   ? __die_body.cold+0x8/0xd
   ? page_fault_oops+0x134/0x170
   ? exc_page_fault+0x62/0x150
   ? asm_exc_page_fault+0x22/0x30
   ? _pfx_raw_callee_save__kvm_vcpu_is_preempted+0x10/0x10
   __mutex_lock.constprop.0+0x5f7/0x6a0
   ? __mod_memcg_lruvec_state+0x84/0xd0
   pipe_write+0x47/0x650
   ? do_anonymous_page+0x258/0x410
   ? inode_security+0x22/0x60
   ? selinux_file_permission+0x108/0x150
   vfs_write+0x2cb/0x410
   ksys_write+0x5f/0xe0
   do_syscall_64+0x5c/0xf0
   ? syscall_exit_to_user_mode+0x22/0x40
   ? do_syscall_64+0x6b/0xf0
   ? sched_clock_cpu+0x9/0xc0
   ? exc_page_fault+0x62/0x150
   entry_SYSCALL_64_after_hwframe+0x6e/0x76

Cc: stable@vger.kernel.org
Fixes: 72bc63f5e2 ("smb3: fix creating FIFOs when mounting with "sfu" mount option")
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-04-11 16:03:40 -05:00
Steve French
28e0947651 smb3: fix Open files on server counter going negative
We were decrementing the count of open files on server twice
for the case where we were closing cached directories.

Fixes: 8e843bf38f ("cifs: return a single-use cfid if we did not get a lease")
Cc: stable@vger.kernel.org
Acked-by: Bharath SM <bharathsm@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-04-11 16:02:02 -05:00
Xiubo Li
17f8dc2db5 ceph: switch to use cap_delay_lock for the unlink delay list
The same list item will be used in both cap_delay_list and
cap_unlink_delay_list, so it's buggy to use two different locks
to protect them.

Cc: stable@vger.kernel.org
Fixes: dbc347ef7f ("ceph: add ceph_cap_unlink_work to fire check_caps() immediately")
Link: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/AODC76VXRAMXKLFDCTK4TKFDDPWUSCN5
Reported-by: Marc Ruhmann <ruhmann@luis.uni-hannover.de>
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Tested-by: Marc Ruhmann <ruhmann@luis.uni-hannover.de>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2024-04-11 22:56:28 +02:00
Linus Torvalds
e1dc191dbf bcachefs fixes for v6.9-rc4
Notable user impacting bugs
 
 - On multi device filesystems, recovery was looping in
   btree_trans_too_many_iters(). This checks if a transaction has touched
   too many btree paths (because of iteration over many keys), and isuses
   a restart to drop unneeded paths. But it's now possible for some paths
   to exceed the previous limit without iteration in the interior btree
   update path, since the transaction commit will do alloc updates for
   every old and new btree node, and during journal replay we don't use
   the btree write buffer for locking reasons and thus those updates use
   btree paths when they wouldn't normally.
 
 - Fix a corner case in rebalance when moving extents on a durability=0
   device. This wouldn't be hit when a device was formatted with
   durability=0 since in that case we'll only use it as a write through
   cache (only cached extents will live on it), but durability can now be
   changed on an existing device.
 
 - bch2_get_acl() could rarely forget to handle a transaction restart;
   this manifested as the occasional missing acl that came back after
   dropping caches.
 
 - Fix a major performance regression on high iops multithreaded write
   workloads (only since 6.9-rc1); a previous fix for a deadlock in the
   interior btree update path to check the journal watermark introduced a
   dependency on the state of btree write buffer flushing that we didn't
   want.
 
 - Assorted other repair paths and recovery fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmYXTd8ACgkQE6szbY3K
 bna4MA//Y/CSB2JupxJPFUAb69+WmNDMnJJV3FlD/Hwo19kOR7aRbKQMUxsH51nb
 dfv5o/58n39QIBqMcMtTipnoND6jrwv7l5NimFKQmj/YehxosdsOf2BgbD/M4Ozz
 84IUmHXSs5+zezjF8IAw/bjR/p13XNJGSYTfl1RWbGUMERLpfZcLn70FvoCR/qpQ
 Bcp+z70K6bBhrPZqFYd2mEC+Cfo42aCD1lqWUQ/e0FHiNEZnCNH2lNca4phBzGt4
 f9sBxBcwmDfizQxqpyZ4izRzbS9ZwZ3ega336L2DrPpwgjMgTRLKLjdXqJDjGvDW
 ngvnaUw7SgICs+q48g3f67cGhw/4lSPdVu/9a/ldqDEP9PynQiUS1G8aDofLdboW
 xCZM6toX86p0DMNY2kP9vc5kxr377cSXAL7VeKbE+ZV5vGyEz1qFDLRAYVixHDfr
 Q7KgYvoJq8CgjfK7rIZbO2tqKhz2TP4+2rJa6tDLwKXs5ice++w/aF5d/nBZ7iCe
 +dyJ+aJiosjHEVG+QscACFrjBJdNNspJqBWDP396XeOsdl+iCeIRP0VHOGytLLRf
 gisE0Lhj4pz6bv7OhAAkdxvegVQ8HfqN1E+f/WM7Kogqos0NS2skEhy9cTuDCHUm
 qtPTUq5XNibiE6J+NOK86pu6o+6sqpWIpfTPuTbMid5sevLsomE=
 =BQC0
 -----END PGP SIGNATURE-----

Merge tag 'bcachefs-2024-04-10' of https://evilpiepirate.org/git/bcachefs

Pull more bcachefs fixes from Kent Overstreet:
 "Notable user impacting bugs

   - On multi device filesystems, recovery was looping in
     btree_trans_too_many_iters(). This checks if a transaction has
     touched too many btree paths (because of iteration over many keys),
     and isuses a restart to drop unneeded paths.

     But it's now possible for some paths to exceed the previous limit
     without iteration in the interior btree update path, since the
     transaction commit will do alloc updates for every old and new
     btree node, and during journal replay we don't use the btree write
     buffer for locking reasons and thus those updates use btree paths
     when they wouldn't normally.

   - Fix a corner case in rebalance when moving extents on a
     durability=0 device. This wouldn't be hit when a device was
     formatted with durability=0 since in that case we'll only use it as
     a write through cache (only cached extents will live on it), but
     durability can now be changed on an existing device.

   - bch2_get_acl() could rarely forget to handle a transaction restart;
     this manifested as the occasional missing acl that came back after
     dropping caches.

   - Fix a major performance regression on high iops multithreaded write
     workloads (only since 6.9-rc1); a previous fix for a deadlock in
     the interior btree update path to check the journal watermark
     introduced a dependency on the state of btree write buffer flushing
     that we didn't want.

   - Assorted other repair paths and recovery fixes"

* tag 'bcachefs-2024-04-10' of https://evilpiepirate.org/git/bcachefs: (25 commits)
  bcachefs: Fix __bch2_btree_and_journal_iter_init_node_iter()
  bcachefs: Kill read lock dropping in bch2_btree_node_lock_write_nofail()
  bcachefs: Fix a race in btree_update_nodes_written()
  bcachefs: btree_node_scan: Respect member.data_allowed
  bcachefs: Don't scan for btree nodes when we can reconstruct
  bcachefs: Fix check_topology() when using node scan
  bcachefs: fix eytzinger0_find_gt()
  bcachefs: fix bch2_get_acl() transaction restart handling
  bcachefs: fix the count of nr_freed_pcpu after changing bc->freed_nonpcpu list
  bcachefs: Fix gap buffer bug in bch2_journal_key_insert_take()
  bcachefs: Rename struct field swap to prevent macro naming collision
  MAINTAINERS: Add entry for bcachefs documentation
  Documentation: filesystems: Add bcachefs toctree
  bcachefs: JOURNAL_SPACE_LOW
  bcachefs: Disable errors=panic for BCH_IOCTL_FSCK_OFFLINE
  bcachefs: Fix BCH_IOCTL_FSCK_OFFLINE for encrypted filesystems
  bcachefs: fix rand_delete unit test
  bcachefs: fix ! vs ~ typo in __clear_bit_le64()
  bcachefs: Fix rebalance from durability=0 device
  bcachefs: Print shutdown journal sequence number
  ...
2024-04-11 11:24:55 -07:00
NeilBrown
b372e96bd0 ceph: redirty page before returning AOP_WRITEPAGE_ACTIVATE
The page has been marked clean before writepage is called.  If we don't
redirty it before postponing the write, it might never get written.

Cc: stable@vger.kernel.org
Fixes: 503d4fa6ee ("ceph: remove reliance on bdi congestion")
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2024-04-11 19:17:02 +02:00
Vasily Gorbik
f488138b52 NFSD: fix endianness issue in nfsd4_encode_fattr4
The nfs4 mount fails with EIO on 64-bit big endian architectures since
v6.7. The issue arises from employing a union in the nfsd4_encode_fattr4()
function to overlay a 32-bit array with a 64-bit values based bitmap,
which does not function as intended. Address the endianness issue by
utilizing bitmap_from_arr32() to copy 32-bit attribute masks into a
bitmap in an endianness-agnostic manner.

Cc: stable@vger.kernel.org
Fixes: fce7913b13 ("NFSD: Use a bitmask loop to encode FATTR4 results")
Link: https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/2060217
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-04-11 09:21:06 -04:00
Alan Stern
a90bca2228 fs: sysfs: Fix reference leak in sysfs_break_active_protection()
The sysfs_break_active_protection() routine has an obvious reference
leak in its error path.  If the call to kernfs_find_and_get() fails then
kn will be NULL, so the companion sysfs_unbreak_active_protection()
routine won't get called (and would only cause an access violation by
trying to dereference kn->parent if it was called).  As a result, the
reference to kobj acquired at the start of the function will never be
released.

Fix the leak by adding an explicit kobject_put() call when kn is NULL.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Fixes: 2afc9166f7 ("scsi: sysfs: Introduce sysfs_{un,}break_active_protection()")
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: stable@vger.kernel.org
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/8a4d3f0f-c5e3-4b70-a188-0ca433f9e6f9@rowland.harvard.edu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-11 15:16:48 +02:00
Linus Torvalds
03a55b6391 Bootconfig fixes for v6.9-rc3:
- fs/proc: Fix to not show original kernel cmdline more than twice on
   /proc/bootconfig.
 - fs/proc: Fix to show the original cmdline only if the bootconfig
   modifies it.
 -----BEGIN PGP SIGNATURE-----
 
 iQFPBAABCgA5FiEEh7BulGwFlgAOi5DV2/sHvwUrPxsFAmYWrVMbHG1hc2FtaS5o
 aXJhbWF0c3VAZ21haWwuY29tAAoJENv7B78FKz8bwxMH/0c/y0wimIlEgqhl27j1
 +SCIA9SBF3oZ7P9Jajs/tZZf8jNwOsuXFBQbwqgAkxdKbolaLkhbLOBqKg8LVodH
 fzpDh+w0BxYVnPLTQe1dpOrygEcfWEM7RdwskwDYyNDDuTZecYkbmHF3mlYnyj5h
 /aPSBcfqHqjM8ltUxjus2B2kMcS/Khun2HVyhQRpVeiRMZPtLpdh9RyNhAwwlfGV
 9WLH5JTg45EHN5V9GL5RzX86HIQ82ZfWtqTKdrU7u7ahph3NN6enkD1+MI2vMqjG
 t8F/GsFeFdgXfVE2B4HZPWdA07PUeh416iDeajZBY8QlyK3rt/vATkKcGyyYdVJK
 EGI=
 =N4qD
 -----END PGP SIGNATURE-----

Merge tag 'bootconfig-fixes-v6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull bootconfig fixes from Masami Hiramatsu:

 - show the original cmdline only once, and only if it was modeified by
   bootconfig

* tag 'bootconfig-fixes-v6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  fs/proc: Skip bootloader comment if no embedded kernel parameters
  fs/proc: remove redundant comments from /proc/bootconfig
2024-04-10 19:42:45 -07:00
Kent Overstreet
1189bdda6c bcachefs: Fix __bch2_btree_and_journal_iter_init_node_iter()
We weren't respecting trans->journal_replay_not_finished - we shouldn't
be searching the journal keys unless we have a ref on them.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-10 22:28:36 -04:00
Kent Overstreet
517236cb3e bcachefs: Kill read lock dropping in bch2_btree_node_lock_write_nofail()
dropping read locks in bch2_btree_node_lock_write_nofail() dates from
before we had the cycle detector; we can now tell the cycle detector
directly when taking a lock may not fail because we can't handle
transaction restarts.

This is needed for adding should_be_locked asserts.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-10 22:28:36 -04:00
Kent Overstreet
beccf29114 bcachefs: Fix a race in btree_update_nodes_written()
One btree update might have terminated in a node update, and then while
it is in flight another btree update might free that original node.

This race has to be handled in btree_update_nodes_written() - we were
missing a READ_ONCE().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-10 22:28:36 -04:00
Paulo Alcantara
ec4535b2a1 smb: client: fix NULL ptr deref in cifs_mark_open_handles_for_deleted_file()
cifs_get_fattr() may be called with a NULL inode, so check for a
non-NULL inode before calling
cifs_mark_open_handles_for_deleted_file().

This fixes the following oops:

  mount.cifs //srv/share /mnt -o ...,vers=3.1.1
  cd /mnt
  touch foo; tail -f foo &
  rm foo
  cat foo

  BUG: kernel NULL pointer dereference, address: 00000000000005c0
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 0 P4D 0
  Oops: 0000 [#1] PREEMPT SMP NOPTI
  CPU: 2 PID: 696 Comm: cat Not tainted 6.9.0-rc2 #1
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
  1.16.3-1.fc39 04/01/2014
  RIP: 0010:__lock_acquire+0x5d/0x1c70
  Code: 00 00 44 8b a4 24 a0 00 00 00 45 85 f6 0f 84 bb 06 00 00 8b 2d
  48 e2 95 01 45 89 c3 41 89 d2 45 89 c8 85 ed 0 0 <48> 81 3f 40 7a 76
  83 44 0f 44 d8 83 fe 01 0f 86 1b 03 00 00 31 d2
  RSP: 0018:ffffc90000b37490 EFLAGS: 00010002
  RAX: 0000000000000000 RBX: ffff888110021ec0 RCX: 0000000000000000
  RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000000005c0
  RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
  R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000200
  FS: 00007f2a1fa08740(0000) GS:ffff888157a00000(0000)
  knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0:
  0000000080050033
  CR2: 00000000000005c0 CR3: 000000011ac7c000 CR4: 0000000000750ef0
  PKRU: 55555554
  Call Trace:
   <TASK>
   ? __die+0x23/0x70
   ? page_fault_oops+0x180/0x490
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? exc_page_fault+0x70/0x230
   ? asm_exc_page_fault+0x26/0x30
   ? __lock_acquire+0x5d/0x1c70
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? srso_alias_return_thunk+0x5/0xfbef5
   lock_acquire+0xc0/0x2d0
   ? cifs_mark_open_handles_for_deleted_file+0x3a/0x100 [cifs]
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? kmem_cache_alloc+0x2d9/0x370
   _raw_spin_lock+0x34/0x80
   ? cifs_mark_open_handles_for_deleted_file+0x3a/0x100 [cifs]
   cifs_mark_open_handles_for_deleted_file+0x3a/0x100 [cifs]
   cifs_get_fattr+0x24c/0x940 [cifs]
   ? srso_alias_return_thunk+0x5/0xfbef5
   cifs_get_inode_info+0x96/0x120 [cifs]
   cifs_lookup+0x16e/0x800 [cifs]
   cifs_atomic_open+0xc7/0x5d0 [cifs]
   ? lookup_open.isra.0+0x3ce/0x5f0
   ? __pfx_cifs_atomic_open+0x10/0x10 [cifs]
   lookup_open.isra.0+0x3ce/0x5f0
   path_openat+0x42b/0xc30
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? srso_alias_return_thunk+0x5/0xfbef5
   do_filp_open+0xc4/0x170
   do_sys_openat2+0xab/0xe0
   __x64_sys_openat+0x57/0xa0
   do_syscall_64+0xc1/0x1e0
   entry_SYSCALL_64_after_hwframe+0x72/0x7a

Fixes: ffceb7640c ("smb: client: do not defer close open handles to deleted files")
Reviewed-by: Meetakshi Setiya <msetiya@microsoft.com>
Reviewed-by: Bharath SM <bharathsm@microsoft.com>
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-04-10 18:53:43 -05:00
Eric Van Hensbergen
6e45a30fe5
fs/9p: remove erroneous nlink init from legacy stat2inode
In 9p2000 legacy mode, stat2inode initializes nlink to 1,
which is redundant with what alloc_inode should have already set.
9p2000.u overrides this with extensions if present in the stat
structure, and 9p2000.L incorporates nlink into its stat structure.

At the very least this probably messes with directory nlink
accounting in legacy mode.

Signed-off-by: Eric Van Hensbergen <ericvh@kernel.org>
2024-04-09 23:53:00 +00:00
Kent Overstreet
9b31152fd7 bcachefs: btree_node_scan: Respect member.data_allowed
If a device wasn't used for btree nodes, no need to scan for them.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-09 18:54:46 -04:00
Thorsten Blum
60b703c71f zonefs: Use str_plural() to fix Coccinelle warning
Fixes the following Coccinelle/coccicheck warning reported by
string_choices.cocci:

	opportunity for str_plural(zgroup->g_nr_zones)

Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
2024-04-10 07:23:47 +09:00
Qu Wenruo
1db7959aac btrfs: do not wait for short bulk allocation
[BUG]
There is a recent report that when memory pressure is high (including
cached pages), btrfs can spend most of its time on memory allocation in
btrfs_alloc_page_array() for compressed read/write.

[CAUSE]
For btrfs_alloc_page_array() we always go alloc_pages_bulk_array(), and
even if the bulk allocation failed (fell back to single page
allocation) we still retry but with extra memalloc_retry_wait().

If the bulk alloc only returned one page a time, we would spend a lot of
time on the retry wait.

The behavior was introduced in commit 395cb57e85 ("btrfs: wait between
incomplete batch memory allocations").

[FIX]
Although the commit mentioned that other filesystems do the wait, it's
not the case at least nowadays.

All the mainlined filesystems only call memalloc_retry_wait() if they
failed to allocate any page (not only for bulk allocation).
If there is any progress, they won't call memalloc_retry_wait() at all.

For example, xfs_buf_alloc_pages() would only call memalloc_retry_wait()
if there is no allocation progress at all, and the call is not for
metadata readahead.

So I don't believe we should call memalloc_retry_wait() unconditionally
for short allocation.

Call memalloc_retry_wait() if it fails to allocate any page for tree
block allocation (which goes with __GFP_NOFAIL and may not need the
special handling anyway), and reduce the latency for
btrfs_alloc_page_array().

Reported-by: Julian Taylor <julian.taylor@1und1.de>
Tested-by: Julian Taylor <julian.taylor@1und1.de>
Link: https://lore.kernel.org/all/8966c095-cbe7-4d22-9784-a647d1bf27c3@1und1.de/
Fixes: 395cb57e85 ("btrfs: wait between incomplete batch memory allocations")
CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-04-09 23:20:32 +02:00
Naohiro Aota
073bda7a54 btrfs: zoned: add ASSERT and WARN for EXTENT_BUFFER_ZONED_ZEROOUT handling
Add an ASSERT to catch a faulty delayed reference item resulting from
prematurely cleared extent buffer.

Also, add a WARN to detect if we try to dirty a ZEROOUT buffer again, which
is suspicious as its update will be lost.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-04-09 23:20:29 +02:00
Naohiro Aota
6887938618 btrfs: zoned: do not flag ZEROOUT on non-dirty extent buffer
Btrfs clears the content of an extent buffer marked as
EXTENT_BUFFER_ZONED_ZEROOUT before the bio submission. This mechanism is
introduced to prevent a write hole of an extent buffer, which is once
allocated, marked dirty, but turns out unnecessary and cleaned up within
one transaction operation.

Currently, btrfs_clear_buffer_dirty() marks the extent buffer as
EXTENT_BUFFER_ZONED_ZEROOUT, and skips the entry function. If this call
happens while the buffer is under IO (with the WRITEBACK flag set,
without the DIRTY flag), we can add the ZEROOUT flag and clear the
buffer's content just before a bio submission. As a result:

1) it can lead to adding faulty delayed reference item which leads to a
   FS corrupted (EUCLEAN) error, and

2) it writes out cleared tree node on disk

The former issue is previously discussed in [1]. The corruption happens
when it runs a delayed reference update. So, on-disk data is safe.

[1] https://lore.kernel.org/linux-btrfs/3f4f2a0ff1a6c818050434288925bdcf3cd719e5.1709124777.git.naohiro.aota@wdc.com/

The latter one can reach on-disk data. But, as that node is already
processed by btrfs_clear_buffer_dirty(), that will be invalidated in the
next transaction commit anyway. So, the chance of hitting the corruption
is relatively small.

Anyway, we should skip flagging ZEROOUT on a non-DIRTY extent buffer, to
keep the content under IO intact.

Fixes: aa6313e6ff ("btrfs: zoned: don't clear dirty flag of extent buffer")
CC: stable@vger.kernel.org # 6.8
Link: https://lore.kernel.org/linux-btrfs/oadvdekkturysgfgi4qzuemd57zudeasynswurjxw3ocdfsef6@sjyufeugh63f/
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-04-09 23:20:28 +02:00
Masami Hiramatsu
c722cea208 fs/proc: Skip bootloader comment if no embedded kernel parameters
If the "bootconfig" kernel command-line argument was specified or if
the kernel was built with CONFIG_BOOT_CONFIG_FORCE, but if there are
no embedded kernel parameter, omit the "# Parameters from bootloader:"
comment from the /proc/bootconfig file.  This will cause automation
to fall back to the /proc/cmdline file, which will be identical to the
comment in this no-embedded-kernel-parameters case.

Link: https://lore.kernel.org/all/20240409044358.1156477-2-paulmck@kernel.org/

Fixes: 8b8ce6c75430 ("fs/proc: remove redundant comments from /proc/bootconfig")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: stable@vger.kernel.org
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2024-04-09 23:36:18 +09:00
Zhenhua Huang
fbbdc255fb fs/proc: remove redundant comments from /proc/bootconfig
commit 717c7c894d ("fs/proc: Add boot loader arguments as comment to
/proc/bootconfig") adds bootloader argument comments into /proc/bootconfig.

/proc/bootconfig shows boot_command_line[] multiple times following
every xbc key value pair, that's duplicated and not necessary.
Remove redundant ones.

Output before and after the fix is like:
key1 = value1
*bootloader argument comments*
key2 = value2
*bootloader argument comments*
key3 = value3
*bootloader argument comments*
...

key1 = value1
key2 = value2
key3 = value3
*bootloader argument comments*
...

Link: https://lore.kernel.org/all/20240409044358.1156477-1-paulmck@kernel.org/

Fixes: 717c7c894d ("fs/proc: Add boot loader arguments as comment to /proc/bootconfig")
Signed-off-by: Zhenhua Huang <quic_zhenhuah@quicinc.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: <linux-trace-kernel@vger.kernel.org>
Cc: <linux-fsdevel@vger.kernel.org>
Cc: stable@vger.kernel.org
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2024-04-09 23:31:54 +09:00