92093 Commits

Author SHA1 Message Date
Konstantin Komarov
2f7236ba9f fs/ntfs3: Missed NI_FLAG_UPDATE_PARENT setting
[ Upstream commit 1c308ace1fd6de93bd0b7e1a5e8963ab27e2c016 ]

Fixes: be71b5cba2e64 ("fs/ntfs3: Add attrib operations")
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03 09:00:21 +02:00
Konstantin Komarov
539bd10ab6 fs/ntfs3: Deny getting attr data block in compressed frame
[ Upstream commit 69943484b95267c94331cba41e9e64ba7b24f136 ]

Attempting to retrieve an attribute data block in a compressed frame
is ignored.

Fixes: be71b5cba2e64 ("fs/ntfs3: Add attrib operations")
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03 09:00:21 +02:00
Konstantin Komarov
ff73b91873 fs/ntfs3: Fix transform resident to nonresident for compressed files
[ Upstream commit 25610ff98d4a34e6a85cbe4fd8671be6b0829f8f ]

Сorrected calculation of required space len (in clusters)
for attribute data storage in case of compression.

Fixes: be71b5cba2e64 ("fs/ntfs3: Add attrib operations")
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03 09:00:21 +02:00
Konstantin Komarov
a62241fb49 fs/ntfs3: Merge synonym COMPRESSION_UNIT and NTFS_LZNT_CUNIT
[ Upstream commit 487f8d482a7e51a640b8f955a398f906a4f83951 ]

COMPRESSION_UNIT and NTFS_LZNT_CUNIT mean the same thing
(1u<<NTFS_LZNT_CUNIT) determines the size for compression (in clusters).

COMPRESS_MAX_CLUSTER is not used in the code.

Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Stable-dep-of: 25610ff98d4a ("fs/ntfs3: Fix transform resident to nonresident for compressed files")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03 09:00:21 +02:00
Christoph Hellwig
387e6e9d11 nfs: pass explicit offset/count to trace events
[ Upstream commit fada32ed6dbc748f447c8d050a961b75d946055a ]

nfs_folio_length is unsafe to use without having the folio locked and a
check for a NULL ->f_mapping that protects against truncations and can
lead to kernel crashes.  E.g. when running xfstests generic/065 with
all nfs trace points enabled.

Follow the model of the XFS trace points and pass in an explіcit offset
and length.  This has the additional benefit that these values can
be more accurate as some of the users touch partial folio ranges.

Fixes: eb5654b3b89d ("NFS: Enable tracing of nfs_invalidate_folio() and nfs_launder_folio()")
Reported-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03 09:00:05 +02:00
Jan Kara
024d8556f3 ext4: avoid writing unitialized memory to disk in EA inodes
[ Upstream commit 65121eff3e4c8c90f8126debf3c369228691c591 ]

If the extended attribute size is not a multiple of block size, the last
block in the EA inode will have uninitialized tail which will get
written to disk. We will never expose the data to userspace but still
this is not a good practice so just zero out the tail of the block as it
isn't going to cause a noticeable performance overhead.

Fixes: e50e5129f384 ("ext4: xattr-in-inode support")
Reported-by: syzbot+9c1fe13fcb51574b249b@syzkaller.appspotmail.com
Reported-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240613150234.25176-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03 09:00:04 +02:00
Luis Henriques (SUSE)
149a0691bf ext4: don't track ranges in fast_commit if inode has inlined data
[ Upstream commit 7882b0187bbeb647967a7b5998ce4ad26ef68a9a ]

When fast-commit needs to track ranges, it has to handle inodes that have
inlined data in a different way because ext4_fc_write_inode_data(), in the
actual commit path, will attempt to map the required blocks for the range.
However, inodes that have inlined data will have it's data stored in
inode->i_block and, eventually, in the extended attribute space.

Unfortunately, because fast commit doesn't currently support extended
attributes, the solution is to mark this commit as ineligible.

Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1039883
Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev>
Tested-by: Ben Hutchings <benh@debian.org>
Fixes: 9725958bb75c ("ext4: fast commit may miss tracking unwritten range during ftruncate")
Link: https://patch.msgid.link/20240618144312.17786-1-luis.henriques@linux.dev
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03 09:00:04 +02:00
Olga Kornievskaia
4ebf5bfd34 NFSv4.1 another fix for EXCHGID4_FLAG_USE_PNFS_DS for DS server
[ Upstream commit 4840c00003a2275668a13b82c9f5b1aed80183aa ]

Previously in order to mark the communication with the DS server,
we tried to use NFS_CS_DS in cl_flags. However, this flag would
only be saved for the DS server and in case where DS equals MDS,
the client would not find a matching nfs_client in nfs_match_client
that represents the MDS (but is also a DS).

Instead, don't rely on the NFS_CS_DS but instead use NFS_CS_PNFS.

Fixes: 379e4adfddd6 ("NFSv4.1: fixup use EXCHGID4_FLAG_USE_PNFS_DS for DS server")
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03 09:00:04 +02:00
Luis Henriques (SUSE)
81f819c537 ext4: fix infinite loop when replaying fast_commit
[ Upstream commit 907c3fe532253a6ef4eb9c4d67efb71fab58c706 ]

When doing fast_commit replay an infinite loop may occur due to an
uninitialized extent_status struct.  ext4_ext_determine_insert_hole() does
not detect the replay and calls ext4_es_find_extent_range(), which will
return immediately without initializing the 'es' variable.

Because 'es' contains garbage, an integer overflow may happen causing an
infinite loop in this function, easily reproducible using fstest generic/039.

This commit fixes this issue by unconditionally initializing the structure
in function ext4_es_find_extent_range().

Thanks to Zhang Yi, for figuring out the real problem!

Fixes: 8016e29f4362 ("ext4: fast commit recovery path")
Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/20240515082857.32730-1-luis.henriques@linux.dev
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03 09:00:02 +02:00
Jeff Layton
4799e4e51f nfsd: nfsd_file_lease_notifier_call gets a file_lease as an argument
[ Upstream commit 769d20028f45a4f442cfe558a32faba357a7f5e2 ]

"data" actually refers to a file_lease and not a file_lock. Both structs
have their file_lock_core as the first field though, so this bug should
be harmless without struct randomization in play.

Reported-by: Florian Evers <florian-evers@gmx.de>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219008
Fixes: 05580bbfc6bc ("nfsd: adapt to breakup of struct file_lock")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Florian Evers <florian-evers@gmx.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03 08:59:46 +02:00
Chuck Lever
65b97d6f32 NFSD: Fix nfsdcld warning
[ Upstream commit 18a5450684c312e98eb2253f0acf88b3f780af20 ]

Since CONFIG_NFSD_LEGACY_CLIENT_TRACKING is a new config option, its
initial default setting should have been Y (if we are to follow the
common practice of "default Y, wait, default N, wait, remove code").

Paul also suggested adding a clearer remedy action to the warning
message.

Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Message-Id: <d2ab4ee7-ba0f-44ac-b921-90c8fa5a04d2@molgen.mpg.de>
Fixes: 74fd48739d04 ("nfsd: new Kconfig option for legacy client tracking")
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03 08:59:44 +02:00
Jan Kara
40d7b3ed52 udf: Fix bogus checksum computation in udf_rename()
[ Upstream commit 27ab33854873e6fb958cb074681a0107cc2ecc4c ]

Syzbot reports uninitialized memory access in udf_rename() when updating
checksum of '..' directory entry of a moved directory. This is indeed
true as we pass on-stack diriter.fi to the udf_update_tag() and because
that has only struct fileIdentDesc included in it and not the impUse or
name fields, the checksumming function is going to checksum random stack
contents beyond the end of the structure. This is actually harmless
because the following udf_fiiter_write_fi() will recompute the checksum
from on-disk buffers where everything is properly included. So all that
is needed is just removing the bogus calculation.

Fixes: e9109a92d2a9 ("udf: Convert udf_rename() to new directory iteration code")
Link: https://lore.kernel.org/all/000000000000cf405f060d8f75a9@google.com/T/
Link: https://patch.msgid.link/20240617154201.29512-1-jack@suse.cz
Reported-by: syzbot+d31185aa54170f7fc1f5@syzkaller.appspotmail.com
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03 08:59:39 +02:00
Jan Kara
50b556a1ce udf: Fix lock ordering in udf_evict_inode()
[ Upstream commit 8832fc1e502687869606bb0a7b79848ed3bf036f ]

udf_evict_inode() calls udf_setsize() to truncate deleted inode.
However inode deletion through udf_evict_inode() can happen from inode
reclaim context and udf_setsize() grabs mapping->invalidate_lock which
isn't generally safe to acquire from fs reclaim context since we
allocate pages under mapping->invalidate_lock for example in a page
fault path.  This is however not a real deadlock possibility as by the
time udf_evict_inode() is called, nobody can be accessing the inode,
even less work with its page cache. So this is just a lockdep triggering
false positive. Fix the problem by moving mapping->invalidate_lock
locking outsize of udf_setsize() into udf_setattr() as grabbing
mapping->invalidate_lock from udf_evict_inode() is pointless.

Reported-by: syzbot+0333a6f4b88bcd68a62f@syzkaller.appspotmail.com
Fixes: b9a861fd527a ("udf: Protect truncate and file type conversion with invalidate_lock")
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03 08:59:36 +02:00
Chao Yu
312d272dc4 hfsplus: fix to avoid false alarm of circular locking
[ Upstream commit be4edd1642ee205ed7bbf66edc0453b1be1fb8d7 ]

Syzbot report potential ABBA deadlock as below:

loop0: detected capacity change from 0 to 1024
======================================================
WARNING: possible circular locking dependency detected
6.9.0-syzkaller-10323-g8f6a15f095a6 #0 Not tainted
------------------------------------------------------
syz-executor171/5344 is trying to acquire lock:
ffff88807cb980b0 (&tree->tree_lock){+.+.}-{3:3}, at: hfsplus_file_truncate+0x811/0xb50 fs/hfsplus/extents.c:595

but task is already holding lock:
ffff88807a930108 (&HFSPLUS_I(inode)->extents_lock){+.+.}-{3:3}, at: hfsplus_file_truncate+0x2da/0xb50 fs/hfsplus/extents.c:576

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&HFSPLUS_I(inode)->extents_lock){+.+.}-{3:3}:
       lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
       __mutex_lock_common kernel/locking/mutex.c:608 [inline]
       __mutex_lock+0x136/0xd70 kernel/locking/mutex.c:752
       hfsplus_file_extend+0x21b/0x1b70 fs/hfsplus/extents.c:457
       hfsplus_bmap_reserve+0x105/0x4e0 fs/hfsplus/btree.c:358
       hfsplus_rename_cat+0x1d0/0x1050 fs/hfsplus/catalog.c:456
       hfsplus_rename+0x12e/0x1c0 fs/hfsplus/dir.c:552
       vfs_rename+0xbdb/0xf00 fs/namei.c:4887
       do_renameat2+0xd94/0x13f0 fs/namei.c:5044
       __do_sys_rename fs/namei.c:5091 [inline]
       __se_sys_rename fs/namei.c:5089 [inline]
       __x64_sys_rename+0x86/0xa0 fs/namei.c:5089
       do_syscall_x64 arch/x86/entry/common.c:52 [inline]
       do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

-> #0 (&tree->tree_lock){+.+.}-{3:3}:
       check_prev_add kernel/locking/lockdep.c:3134 [inline]
       check_prevs_add kernel/locking/lockdep.c:3253 [inline]
       validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869
       __lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137
       lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
       __mutex_lock_common kernel/locking/mutex.c:608 [inline]
       __mutex_lock+0x136/0xd70 kernel/locking/mutex.c:752
       hfsplus_file_truncate+0x811/0xb50 fs/hfsplus/extents.c:595
       hfsplus_setattr+0x1ce/0x280 fs/hfsplus/inode.c:265
       notify_change+0xb9d/0xe70 fs/attr.c:497
       do_truncate+0x220/0x310 fs/open.c:65
       handle_truncate fs/namei.c:3308 [inline]
       do_open fs/namei.c:3654 [inline]
       path_openat+0x2a3d/0x3280 fs/namei.c:3807
       do_filp_open+0x235/0x490 fs/namei.c:3834
       do_sys_openat2+0x13e/0x1d0 fs/open.c:1406
       do_sys_open fs/open.c:1421 [inline]
       __do_sys_creat fs/open.c:1497 [inline]
       __se_sys_creat fs/open.c:1491 [inline]
       __x64_sys_creat+0x123/0x170 fs/open.c:1491
       do_syscall_x64 arch/x86/entry/common.c:52 [inline]
       do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&HFSPLUS_I(inode)->extents_lock);
                               lock(&tree->tree_lock);
                               lock(&HFSPLUS_I(inode)->extents_lock);
  lock(&tree->tree_lock);

This is a false alarm as tree_lock mutex are different, one is
from sbi->cat_tree, and another is from sbi->ext_tree:

Thread A			Thread B
- hfsplus_rename
 - hfsplus_rename_cat
  - hfs_find_init
   - mutext_lock(cat_tree->tree_lock)
				- hfsplus_setattr
				 - hfsplus_file_truncate
				  - mutex_lock(hip->extents_lock)
				  - hfs_find_init
				   - mutext_lock(ext_tree->tree_lock)
  - hfs_bmap_reserve
   - hfsplus_file_extend
    - mutex_lock(hip->extents_lock)

So, let's call mutex_lock_nested for tree_lock mutex lock, and pass
correct lock class for it.

Fixes: 31651c607151 ("hfsplus: avoid deadlock on file truncation")
Reported-by: syzbot+6030b3b1b9bf70e538c4@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-fsdevel/000000000000e37a4005ef129563@google.com
Cc: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20240607142304.455441-1-chao@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-03 08:59:11 +02:00
Jann Horn
ed898f9ca3 filelock: Fix fcntl/close race recovery compat path
commit f8138f2ad2f745b9a1c696a05b749eabe44337ea upstream.

When I wrote commit 3cad1bc01041 ("filelock: Remove locks reliably when
fcntl/close race is detected"), I missed that there are two copies of the
code I was patching: The normal version, and the version for 64-bit offsets
on 32-bit kernels.
Thanks to Greg KH for stumbling over this while doing the stable
backport...

Apply exactly the same fix to the compat path for 32-bit kernels.

Fixes: c293621bbf67 ("[PATCH] stale POSIX lock handling")
Cc: stable@kernel.org
Link: https://bugs.chromium.org/p/project-zero/issues/detail?id=2563
Signed-off-by: Jann Horn <jannh@google.com>
Link: https://lore.kernel.org/r/20240723-fs-lock-recover-compatfix-v1-1-148096719529@google.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-27 11:40:36 +02:00
lei lu
617cf144c2 fs/ntfs3: Validate ff offset
commit 50c47879650b4c97836a0086632b3a2e300b0f06 upstream.

This adds sanity checks for ff offset. There is a check
on rt->first_free at first, but walking through by ff
without any check. If the second ff is a large offset.
We may encounter an out-of-bound read.

Signed-off-by: lei lu <llfamsec@gmail.com>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-27 11:40:32 +02:00
Konstantin Komarov
9b71f820f7 fs/ntfs3: Add a check for attr_names and oatbl
commit 702d4930eb06dcfda85a2fa67e8a1a27bfa2a845 upstream.

Added out-of-bound checking for *ane (ATTR_NAME_ENTRY).

Reported-by: lei lu <llfamsec@gmail.com>
Fixes: 865e7a7700d93 ("fs/ntfs3: Reduce stack usage")
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-27 11:40:32 +02:00
lei lu
dbde7bc910 jfs: don't walk off the end of ealist
commit d0fa70aca54c8643248e89061da23752506ec0d4 upstream.

Add a check before visiting the members of ea to
make sure each ea stays within the ealist.

Signed-off-by: lei lu <llfamsec@gmail.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-27 11:40:32 +02:00
lei lu
edb2e67dd4 ocfs2: add bounds checking to ocfs2_check_dir_entry()
commit 255547c6bb8940a97eea94ef9d464ea5967763fb upstream.

This adds sanity checks for ocfs2_dir_entry to make sure all members of
ocfs2_dir_entry don't stray beyond valid memory region.

Link: https://lkml.kernel.org/r/20240626104433.163270-1-llfamsec@gmail.com
Signed-off-by: lei lu <llfamsec@gmail.com>
Reviewed-by: Heming Zhao <heming.zhao@suse.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-27 11:40:32 +02:00
David Howells
fafd1dcc84 cifs: Fix setting of zero_point after DIO write
commit 61ea6b3a3104fcd66364282391dd2152bc4c129a upstream.

At the moment, at the end of a DIO write, cifs calls netfs_resize_file() to
adjust the size of the file if it needs it.  This will reduce the
zero_point (the point above which we assume a read will just return zeros)
if it's more than the new i_size, but won't increase it.

With DIO writes, however, we definitely want to increase it as we have
clobbered the local pagecache and then written some data that's not
available locally.

Fix cifs to make the zero_point above the end of a DIO or unbuffered write.

This fixes corruption seen occasionally with the generic/708 xfs-test.  In
that case, the read-back of some of the written data is being
short-circuited and replaced with zeroes.

Fixes: 3ee1a1fc3981 ("cifs: Cut over to using netfslib")
Cc: stable@vger.kernel.org
Reported-by: Steve French <sfrench@samba.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-24 15:54:06 +02:00
David Howells
b1d0a56676 cifs: Fix server re-repick on subrequest retry
commit de40579b903883274fe203865f29d66b168b7236 upstream.

When a subrequest is marked for needing retry, netfs will call
cifs_prepare_write() which will make cifs repick the server for the op
before renegotiating credits; it then calls cifs_issue_write() which
invokes smb2_async_writev() - which re-repicks the server.

If a different server is then selected, this causes the increment of
server->in_flight to happen against one record and the decrement to happen
against another, leading to misaccounting.

Fix this by just removing the repick code in smb2_async_writev().  As this
is only called from netfslib-driven code, cifs_prepare_write() should
always have been called first, and so server should never be NULL and the
preparatory step is repeated in the event that we do a retry.

The problem manifests as a warning looking something like:

 WARNING: CPU: 4 PID: 72896 at fs/smb/client/smb2ops.c:97 smb2_add_credits+0x3f0/0x9e0 [cifs]
 ...
 RIP: 0010:smb2_add_credits+0x3f0/0x9e0 [cifs]
 ...
  smb2_writev_callback+0x334/0x560 [cifs]
  cifs_demultiplex_thread+0x77a/0x11b0 [cifs]
  kthread+0x187/0x1d0
  ret_from_fork+0x34/0x60
  ret_from_fork_asm+0x1a/0x30

Which may be triggered by a number of different xfstests running against an
Azure server in multichannel mode.  generic/249 seems the most repeatable,
but generic/215, generic/249 and generic/308 may also show it.

Fixes: 3ee1a1fc3981 ("cifs: Cut over to using netfslib")
Cc: stable@vger.kernel.org
Reported-by: Steve French <smfrench@gmail.com>
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Acked-by: Tom Talpey <tom@talpey.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: Aurelien Aptel <aaptel@suse.com>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-24 15:54:06 +02:00
Steve French
b5347b051d cifs: fix noisy message on copy_file_range
commit ae4ccca47195332c69176b8615c5ee17efd30c46 upstream.

There are common cases where copy_file_range can noisily
log "source and target of copy not on same server"
e.g. the mv command across mounts to two different server's shares.
Change this to informational rather than logging as an error.

A followon patch will add dynamic trace points e.g. for
cifs_file_copychunk_range

Cc: stable@vger.kernel.org
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-24 15:54:06 +02:00
David Howells
5c0a6c40c2 cifs: Fix missing fscache invalidation
commit a07d38afd15281c42613943a9a715c3ba07c21e6 upstream.

A network filesystem needs to implement a netfslib hook to invalidate
fscache if it's to be able to use the cache.

Fix cifs to implement the cache invalidation hook.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes: 3ee1a1fc3981 ("cifs: Cut over to using netfslib")
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-24 15:54:06 +02:00
David Howells
447c00d76e cifs: Fix missing error code set
commit d2c5eb57b6da10f335c30356f9696bd667601e6a upstream.

In cifs_strict_readv(), the default rc (-EACCES) is accidentally cleared by
a successful return from netfs_start_io_direct(), such that if
cifs_find_lock_conflict() fails, we don't return an error.

Fix this by resetting the default error code.

Fixes: 14b1cd25346b ("cifs: Fix locking in cifs_strict_readv()")
Cc: stable@vger.kernel.org
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-24 15:54:05 +02:00
Kees Cook
4378be89dd ext4: use memtostr_pad() for s_volume_name
commit be27cd64461c45a6088a91a04eba5cd44e1767ef upstream.

As with the other strings in struct ext4_super_block, s_volume_name is
not NUL terminated. The other strings were marked in commit 072ebb3bffe6
("ext4: add nonstring annotations to ext4.h"). Using strscpy() isn't
the right replacement for strncpy(); it should use memtostr_pad()
instead.

Reported-by: syzbot+50835f73143cc2905b9e@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/00000000000019f4c00619192c05@google.com/
Fixes: 744a56389f73 ("ext4: replace deprecated strncpy with alternatives")
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://patch.msgid.link/20240523225408.work.904-kees@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-24 15:54:05 +02:00
Linus Torvalds
d0d0cd3800 small fix, also for stable
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmaSqPwACgkQiiy9cAdy
 T1EdMAv/Q+qBEmCUybFAjkJelkt+FecWWWZ3L26TXjyAGZBlf7cl590Rr5jXRLw1
 xPDdUt7rE0Zxpg0pK8L5QRgDjc7BwiuAIEJfxdI/gAHbEueElLGdqvFp0G1HSBvY
 3lgkG5zz9uZUBemFlrxZ2Wsd4MiHBPsaBx5+TEPPGkRhWzd3LRU7fi7PGa6PUD3U
 BChQED88EhWB7BfxOqctAYfUgOxqzqiaOe5KAATsWcKpJ3sqgYCHLiVn5vZQ7tYD
 69HijShCHC8ng7KeXkW3XJf1knsDHlHsROzNQgX+pUqEZWcDsjGpJNKGtIO3IfeD
 9uOy3U+VuPwaVnVZnr5+bSqaiZbOehvGa+3T/JOwJnRfwVP6Kb97/YiEJdVvFwiI
 K0CSop3+cgBouqo9S+4j2mjosN6oCQcfTGxBXzMCIZwdawvkNAVxKg/7RpuDuRWJ
 3QVVOKzmVOYE6X1RTsnBevcgjCg/t6upfD+m99a8JnZlZislyzxj9qKUcs1XZ8WJ
 02TCRc3V
 =v/5z
 -----END PGP SIGNATURE-----

Merge tag '6.10-rc7-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fix from Steve French:
 "Small fix, also for stable"

* tag '6.10-rc7-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: fix setting SecurityFlags to true
2024-07-13 13:00:25 -07:00
Steve French
d2346e2836 cifs: fix setting SecurityFlags to true
If you try to set /proc/fs/cifs/SecurityFlags to 1 it
will set them to CIFSSEC_MUST_NTLMV2 which no longer is
relevant (the less secure ones like lanman have been removed
from cifs.ko) and is also missing some flags (like for
signing and encryption) and can even cause mount to fail,
so change this to set it to Kerberos in this case.

Also change the description of the SecurityFlags to remove mention
of flags which are no longer supported.

Cc: stable@vger.kernel.org
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-07-13 09:24:27 -05:00
Linus Torvalds
975f3b6da1 for-6.10-rc7-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmaRcQgACgkQxWXV+ddt
 WDvAGxAAknJAiREp/AmzhSwkhr+nSnqex0t+VVgsOaMTu0BEHO0xhoXc3l0QuSwS
 u2AIqmOYyzr/UQVXCuatBqAE+5T4njtYAYIWwE825yquAtHNyuok9+Sjhfvxrwgs
 HmNAN4Vvl2Fwds7xbWE8ug18QlssuRTIX8hk7ZtS6xo49g0tsbRX9KlzIPpsULD3
 BOZa+2NJwC1PGVeNPf3p06rfiUkKfmFYgdDybe2zJ17uwsRz1CFSsaEEB35ys1f0
 xYOS4epfcie03EGyZmYctuNxatUkk/J/1lTH4Z9JHwvPBvLK1U97SyJ11Wz2VQC/
 8ar8gUDRYtjWdf6vn6AWBM4MseaYm9LDMlPhbSfvpDcWiclGTE64IOP4gKKr3mCh
 WzlNSIR9I+tYgrhvcsCEzd7lvrSVHa7clwfooYgkEx0wl5lgbN0llAdtJWG3eeLn
 3stxje2FqqXsFNj5N9SrPy7f7t6xF2i8vwk4qh6EpRuT4yuatb+nWzDm9EuTT/Bc
 P+zM1KFp7Blk7Zw/Tpw0O9qjt1whStY2xrqcMzg539WVo45MmuFEFzmGBRwZsH55
 QPGLIjXPpt728AgMdhBFEG0DtWaiA3AOI/C5nYOtLu92aZVBmbaX7/d/GpJv3Vvd
 Ihvr9s1c49YvTZsIS0T0tkq/7LXZi/SToRJDjhP5HCrRGf7A30Y=
 =gtsF
 -----END PGP SIGNATURE-----

Merge tag 'for-6.10-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
 "Fix a regression in extent map shrinker behaviour.

  In the past weeks we got reports from users that there are huge
  latency spikes or freezes. This was bisected to newly added shrinker
  of extent maps (it was added to fix a build up of the structures in
  memory).

  I'm assuming that the freezes would happen to many users after release
  so I'd like to get it merged now so it's in 6.10. Although the diff
  size is not small the changes are relatively straightforward, the
  reporters verified the fixes and we did testing on our side.

  The fixes:

   - adjust behaviour under memory pressure and check lock or scheduling
     conditions, bail out if needed

   - synchronize tracking of the scanning progress so inode ranges are
     not skipped or work duplicated

   - do a delayed iput when scanning a root so evicting an inode does
     not slow things down in case of lots of dirty data, also fix
     lockdep warning, a deadlock could happen when writing the dirty
     data would need to start a transaction"

* tag 'for-6.10-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: avoid races when tracking progress for extent map shrinking
  btrfs: stop extent map shrinker if reschedule is needed
  btrfs: use delayed iput during extent map shrinking
2024-07-12 12:08:42 -07:00
Linus Torvalds
5d4c85134b bcachefs fixes for 6.10-rc8, more
- revert the SLAB_ACCOUNT patch, something crazy is going on in memcg
   and someone forgot to test
 - minor fixes: missing rcu_read_lock(), scheduling while atomic (in an
   emergency shutdown path)
 - two lockdep fixes; these could have gone earlier, but were left to
   bake awhile
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmaRRssACgkQE6szbY3K
 bnZpVw/+KBili018zGFIZjaYg89tNjyPhfdjtTSnTBm+989Naw4GjdF8nR81sVfC
 LPW3hbUE/uZ9d9++kmwWXe+c6QrCk8skbc9z/6Bs7LAeidranZWmLNg157TfqOkF
 utWhXb8oo4X0wivb6V+slnk7G4fU+p+uYfd68BFMGhNMX62ldjkVqKwpaT2bsQ03
 5igS8b2ss90Q+XLkpzHNDYl/yq8kFxUbaeDc4IKFV0aN5pq93pBOGDtZCi1dHeN+
 O9rHYyXoSd26+C/gjCscD/ZIfmEq7IgWRZP/2157fbVzSx9VzWZtb2FIuVf0cx/F
 z01+M8fZctJVAIFijvG33P5KwAiwyDzGZMN5S/U4CICvMXu/iwJrAPUOjJoII8Dl
 09ex4X0cZZjvFsA+dDMfd2Vc8U6dgpo4+7j3/rZkH/9REJypnhv/o+89Dl+Lx9P6
 UdQsqphQjz0ud4Qd5TOT+x/7n8JFlRLnzeIXf8U1qKKlkKCuhxBnDEtVUc/s7gBc
 8ekLQSHgpy14fCpq0wgOYx4OFQrY4KQ+Ocpt3i9RdzX1+Nti4v1REgVvBmi4UHra
 5itcQFGMEq0xQ0H9eZcwqqoUHuQuzCtaP12CbECHDjWfZDWKDaYHkG2jo0SK6P9e
 8ISZakmdWsu0sqP4nuexKvcb4K1ov0JtidMlGOeB3KYrQ7LHQjA=
 =vpux
 -----END PGP SIGNATURE-----

Merge tag 'bcachefs-2024-07-12' of https://evilpiepirate.org/git/bcachefs

Pull more bcachefs fixes from Kent Overstreet:

 - revert the SLAB_ACCOUNT patch, something crazy is going on in memcg
   and someone forgot to test

 - minor fixes: missing rcu_read_lock(), scheduling while atomic (in an
   emergency shutdown path)

 - two lockdep fixes; these could have gone earlier, but were left to
   bake awhile

* tag 'bcachefs-2024-07-12' of https://evilpiepirate.org/git/bcachefs:
  bcachefs: bch2_gc_btree() should not use btree_root_lock
  bcachefs: Set PF_MEMALLOC_NOFS when trans->locked
  bcachefs; Use trans_unlock_long() when waiting on allocator
  Revert "bcachefs: Mark bch_inode_info as SLAB_ACCOUNT"
  bcachefs: fix scheduling while atomic in break_cycle()
  bcachefs: Fix RCU splat
2024-07-12 08:22:43 -07:00
Kent Overstreet
1841027c7d bcachefs: bch2_gc_btree() should not use btree_root_lock
btree_root_lock is for the root keys in btree_root, not the pointers to
the nodes themselves; this fixes a lock ordering issue between
btree_root_lock and btree node locks.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-11 20:10:55 -04:00
Kent Overstreet
f236ea4bca bcachefs: Set PF_MEMALLOC_NOFS when trans->locked
proper lock ordering is: fs_reclaim -> btree node locks

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-11 20:10:55 -04:00
Kent Overstreet
f0f3e51148 bcachefs; Use trans_unlock_long() when waiting on allocator
not using unlock_long() blocks key cache reclaim, and the allocator may
take awhile

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-11 20:10:55 -04:00
Kent Overstreet
aacd897d4d Revert "bcachefs: Mark bch_inode_info as SLAB_ACCOUNT"
This reverts commit 86d81ec5f5f05846c7c6e48ffb964b24cba2e669.

This wasn't tested with memcg enabled, it immediately hits a null ptr
deref in list_lru_add().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-11 20:01:38 -04:00
Linus Torvalds
83ab4b461e vfs-6.10-rc8.fixes
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZo9dYAAKCRCRxhvAZXjc
 omYQAP4wELNW5StzljRReC6s/Kzu6IANJQlfFpuGnPIl23iRmwD+Pq433xQqSy5f
 uonMBEdxqbOrJM7A6KeHKCyuAKYpNg0=
 =zg3n
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.10-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs fixes from Christian Brauner:
 "cachefiles:

   - Export an existing and add a new cachefile helper to be used in
     filesystems to fix reference count bugs

   - Use the newly added fscache_ty_get_volume() helper to get a
     reference count on an fscache_volume to handle volumes that are
     about to be removed cleanly

   - After withdrawing a fscache_cache via FSCACHE_CACHE_IS_WITHDRAWN
     wait for all ongoing cookie lookups to complete and for the object
     count to reach zero

   - Propagate errors from vfs_getxattr() to avoid an infinite loop in
     cachefiles_check_volume_xattr() because it keeps seeing ESTALE

   - Don't send new requests when an object is dropped by raising
     CACHEFILES_ONDEMAND_OJBSTATE_DROPPING

   - Cancel all requests for an object that is about to be dropped

   - Wait for the ondemand_boject_worker to finish before dropping a
     cachefiles object to prevent use-after-free

   - Use cyclic allocation for message ids to better handle id recycling

   - Add missing lock protection when iterating through the xarray when
     polling

  netfs:

   - Use standard logging helpers for debug logging

  VFS:

   - Fix potential use-after-free in file locks during
     trace_posix_lock_inode(). The tracepoint could fire while another
     task raced it and freed the lock that was requested to be traced

   - Only increment the nr_dentry_negative counter for dentries that are
     present on the superblock LRU. Currently, DCACHE_LRU_LIST list is
     used to detect this case. However, the flag is also raised in
     combination with DCACHE_SHRINK_LIST to indicate that dentry->d_lru
     is used. So checking only DCACHE_LRU_LIST will lead to wrong
     nr_dentry_negative count. Fix the check to not count dentries that
     are on a shrink related list

  Misc:

   - hfsplus: fix an uninitialized value issue in copy_name

   - minix: fix minixfs_rename with HIGHMEM. It still uses kunmap() even
     though we switched it to kmap_local_page() a while ago"

* tag 'vfs-6.10-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  minixfs: Fix minixfs_rename with HIGHMEM
  hfsplus: fix uninit-value in copy_name
  vfs: don't mod negative dentry count when on shrinker list
  filelock: fix potential use-after-free in posix_lock_inode
  cachefiles: add missing lock protection when polling
  cachefiles: cyclic allocation of msg_id to avoid reuse
  cachefiles: wait for ondemand_object_worker to finish when dropping object
  cachefiles: cancel all requests for the object that is being dropped
  cachefiles: stop sending new request when dropping object
  cachefiles: propagate errors from vfs_getxattr() to avoid infinite loop
  cachefiles: fix slab-use-after-free in cachefiles_withdraw_cookie()
  cachefiles: fix slab-use-after-free in fscache_withdraw_volume()
  netfs, fscache: export fscache_put_volume() and add fscache_try_get_volume()
  netfs: Switch debug logging to pr_debug()
2024-07-11 09:03:28 -07:00
Filipe Manana
4484940514 btrfs: avoid races when tracking progress for extent map shrinking
We store the progress (root and inode numbers) of the extent map shrinker
in fs_info without any synchronization but we can have multiple tasks
calling into the shrinker during memory allocations when there's enough
memory pressure for example.

This can result in a task A reading fs_info->extent_map_shrinker_last_ino
after another task B updates it, and task A reading
fs_info->extent_map_shrinker_last_root before task B updates it, making
task A see an odd state that isn't necessarily harmful but may make it
skip certain inode ranges or do more work than necessary by going over
the same inodes again. These unprotected accesses would also trigger
warnings from tools like KCSAN.

So add a lock to protect access to these progress fields.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-07-11 16:50:54 +02:00
Filipe Manana
b3ebb9b7e9 btrfs: stop extent map shrinker if reschedule is needed
The extent map shrinker can be called in a variety of contexts where we
are under memory pressure, and of them is when a task is trying to
allocate memory. For this reason the shrinker is typically called with a
value of struct shrink_control::nr_to_scan that is much smaller than what
we return in the nr_cached_objects callback of struct super_operations
(fs/btrfs/super.c:btrfs_nr_cached_objects()), so that the shrinker does
not take a long time and cause high latencies. However we can still take
a lot of time in the shrinker even for a limited amount of nr_to_scan:

1) When traversing the red black tree that tracks open inodes in a root,
   as for example with millions of open inodes we get a deep tree which
   takes time searching for an inode;

2) Iterating over the extent map tree, which is a red black tree, of an
   inode when doing the rb_next() calls and when removing an extent map
   from the tree, since often that requires rebalancing the red black
   tree;

3) When trying to write lock an inode's extent map tree we may wait for a
   significant amount of time, because there's either another task about
   to do IO and searching for an extent map in the tree or inserting an
   extent map in the tree, and we can have thousands or even millions of
   extent maps for an inode. Furthermore, there can be concurrent calls
   to the shrinker so the lock might be busy simply because there is
   already another task shrinking extent maps for the same inode;

4) We often reschedule if we need to, which further increases latency.

So improve on this by stopping the extent map shrinking code whenever we
need to reschedule and make it skip an inode if we can't immediately lock
its extent map tree.

Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Reported-by: Andrea Gelmini <andrea.gelmini@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/CABXGCsMmmb36ym8hVNGTiU8yfUS_cGvoUmGCcBrGWq9OxTrs+A@mail.gmail.com/
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-07-11 16:45:42 +02:00
Filipe Manana
68a3ebd18b btrfs: use delayed iput during extent map shrinking
When putting an inode during extent map shrinking we're doing a standard
iput() but that may take a long time in case the inode is dirty and we are
doing the final iput that triggers eviction - the VFS will have to wait
for writeback before calling the btrfs evict callback (see
fs/inode.c:evict()).

This slows down the task running the shrinker which may have been
triggered while updating some tree for example, meaning locks are held
as well as an open transaction handle.

Also if the iput() ends up triggering eviction and the inode has no links
anymore, then we trigger item truncation which requires flushing delayed
items, space reservation to start a transaction and that may trigger the
space reclaim task and wait for it, resulting in deadlocks in case the
reclaim task needs for example to commit a transaction and the shrinker
is being triggered from a path holding a transaction handle.

Syzbot reported such a case with the following stack traces:

   ======================================================
   WARNING: possible circular locking dependency detected
   6.10.0-rc2-syzkaller-00010-g2ab795141095 #0 Not tainted
   ------------------------------------------------------
   kswapd0/111 is trying to acquire lock:
   ffff88801eae4610 (sb_internal#3){.+.+}-{0:0}, at: btrfs_commit_inode_delayed_inode+0x110/0x330 fs/btrfs/delayed-inode.c:1275

   but task is already holding lock:
   ffffffff8dd3a9a0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0xa88/0x1970 mm/vmscan.c:6924

   which lock already depends on the new lock.

   the existing dependency chain (in reverse order) is:

   -> #3 (fs_reclaim){+.+.}-{0:0}:
          __fs_reclaim_acquire mm/page_alloc.c:3783 [inline]
          fs_reclaim_acquire+0x102/0x160 mm/page_alloc.c:3797
          might_alloc include/linux/sched/mm.h:334 [inline]
          slab_pre_alloc_hook mm/slub.c:3890 [inline]
          slab_alloc_node mm/slub.c:3980 [inline]
          kmem_cache_alloc_lru_noprof+0x58/0x2f0 mm/slub.c:4019
          btrfs_alloc_inode+0x118/0xb20 fs/btrfs/inode.c:8411
          alloc_inode+0x5d/0x230 fs/inode.c:261
          iget5_locked fs/inode.c:1235 [inline]
          iget5_locked+0x1c9/0x2c0 fs/inode.c:1228
          btrfs_iget_locked fs/btrfs/inode.c:5590 [inline]
          btrfs_iget_path fs/btrfs/inode.c:5607 [inline]
          btrfs_iget+0xfb/0x230 fs/btrfs/inode.c:5636
          create_reloc_inode+0x403/0x820 fs/btrfs/relocation.c:3911
          btrfs_relocate_block_group+0x471/0xe60 fs/btrfs/relocation.c:4114
          btrfs_relocate_chunk+0x143/0x450 fs/btrfs/volumes.c:3373
          __btrfs_balance fs/btrfs/volumes.c:4157 [inline]
          btrfs_balance+0x211a/0x3f00 fs/btrfs/volumes.c:4534
          btrfs_ioctl_balance fs/btrfs/ioctl.c:3675 [inline]
          btrfs_ioctl+0x12ed/0x8290 fs/btrfs/ioctl.c:4742
          __do_compat_sys_ioctl+0x2c3/0x330 fs/ioctl.c:1007
          do_syscall_32_irqs_on arch/x86/entry/common.c:165 [inline]
          __do_fast_syscall_32+0x73/0x120 arch/x86/entry/common.c:386
          do_fast_syscall_32+0x32/0x80 arch/x86/entry/common.c:411
          entry_SYSENTER_compat_after_hwframe+0x84/0x8e

   -> #2 (btrfs_trans_num_extwriters){++++}-{0:0}:
          join_transaction+0x164/0xf40 fs/btrfs/transaction.c:315
          start_transaction+0x427/0x1a70 fs/btrfs/transaction.c:700
          btrfs_rebuild_free_space_tree+0xaa/0x480 fs/btrfs/free-space-tree.c:1323
          btrfs_start_pre_rw_mount+0x218/0xf60 fs/btrfs/disk-io.c:2999
          open_ctree+0x41ab/0x52e0 fs/btrfs/disk-io.c:3554
          btrfs_fill_super fs/btrfs/super.c:946 [inline]
          btrfs_get_tree_super fs/btrfs/super.c:1863 [inline]
          btrfs_get_tree+0x11e9/0x1b90 fs/btrfs/super.c:2089
          vfs_get_tree+0x8f/0x380 fs/super.c:1780
          fc_mount+0x16/0xc0 fs/namespace.c:1125
          btrfs_get_tree_subvol fs/btrfs/super.c:2052 [inline]
          btrfs_get_tree+0xa53/0x1b90 fs/btrfs/super.c:2090
          vfs_get_tree+0x8f/0x380 fs/super.c:1780
          do_new_mount fs/namespace.c:3352 [inline]
          path_mount+0x6e1/0x1f10 fs/namespace.c:3679
          do_mount fs/namespace.c:3692 [inline]
          __do_sys_mount fs/namespace.c:3898 [inline]
          __se_sys_mount fs/namespace.c:3875 [inline]
          __ia32_sys_mount+0x295/0x320 fs/namespace.c:3875
          do_syscall_32_irqs_on arch/x86/entry/common.c:165 [inline]
          __do_fast_syscall_32+0x73/0x120 arch/x86/entry/common.c:386
          do_fast_syscall_32+0x32/0x80 arch/x86/entry/common.c:411
          entry_SYSENTER_compat_after_hwframe+0x84/0x8e

   -> #1 (btrfs_trans_num_writers){++++}-{0:0}:
          join_transaction+0x148/0xf40 fs/btrfs/transaction.c:314
          start_transaction+0x427/0x1a70 fs/btrfs/transaction.c:700
          btrfs_rebuild_free_space_tree+0xaa/0x480 fs/btrfs/free-space-tree.c:1323
          btrfs_start_pre_rw_mount+0x218/0xf60 fs/btrfs/disk-io.c:2999
          open_ctree+0x41ab/0x52e0 fs/btrfs/disk-io.c:3554
          btrfs_fill_super fs/btrfs/super.c:946 [inline]
          btrfs_get_tree_super fs/btrfs/super.c:1863 [inline]
          btrfs_get_tree+0x11e9/0x1b90 fs/btrfs/super.c:2089
          vfs_get_tree+0x8f/0x380 fs/super.c:1780
          fc_mount+0x16/0xc0 fs/namespace.c:1125
          btrfs_get_tree_subvol fs/btrfs/super.c:2052 [inline]
          btrfs_get_tree+0xa53/0x1b90 fs/btrfs/super.c:2090
          vfs_get_tree+0x8f/0x380 fs/super.c:1780
          do_new_mount fs/namespace.c:3352 [inline]
          path_mount+0x6e1/0x1f10 fs/namespace.c:3679
          do_mount fs/namespace.c:3692 [inline]
          __do_sys_mount fs/namespace.c:3898 [inline]
          __se_sys_mount fs/namespace.c:3875 [inline]
          __ia32_sys_mount+0x295/0x320 fs/namespace.c:3875
          do_syscall_32_irqs_on arch/x86/entry/common.c:165 [inline]
          __do_fast_syscall_32+0x73/0x120 arch/x86/entry/common.c:386
          do_fast_syscall_32+0x32/0x80 arch/x86/entry/common.c:411
          entry_SYSENTER_compat_after_hwframe+0x84/0x8e

   -> #0 (sb_internal#3){.+.+}-{0:0}:
          check_prev_add kernel/locking/lockdep.c:3134 [inline]
          check_prevs_add kernel/locking/lockdep.c:3253 [inline]
          validate_chain kernel/locking/lockdep.c:3869 [inline]
          __lock_acquire+0x2478/0x3b30 kernel/locking/lockdep.c:5137
          lock_acquire kernel/locking/lockdep.c:5754 [inline]
          lock_acquire+0x1b1/0x560 kernel/locking/lockdep.c:5719
          percpu_down_read include/linux/percpu-rwsem.h:51 [inline]
          __sb_start_write include/linux/fs.h:1655 [inline]
          sb_start_intwrite include/linux/fs.h:1838 [inline]
          start_transaction+0xbc1/0x1a70 fs/btrfs/transaction.c:694
          btrfs_commit_inode_delayed_inode+0x110/0x330 fs/btrfs/delayed-inode.c:1275
          btrfs_evict_inode+0x960/0xe80 fs/btrfs/inode.c:5291
          evict+0x2ed/0x6c0 fs/inode.c:667
          iput_final fs/inode.c:1741 [inline]
          iput.part.0+0x5a8/0x7f0 fs/inode.c:1767
          iput+0x5c/0x80 fs/inode.c:1757
          btrfs_scan_root fs/btrfs/extent_map.c:1118 [inline]
          btrfs_free_extent_maps+0xbd3/0x1320 fs/btrfs/extent_map.c:1189
          super_cache_scan+0x409/0x550 fs/super.c:227
          do_shrink_slab+0x44f/0x11c0 mm/shrinker.c:435
          shrink_slab+0x18a/0x1310 mm/shrinker.c:662
          shrink_one+0x493/0x7c0 mm/vmscan.c:4790
          shrink_many mm/vmscan.c:4851 [inline]
          lru_gen_shrink_node+0x89f/0x1750 mm/vmscan.c:4951
          shrink_node mm/vmscan.c:5910 [inline]
          kswapd_shrink_node mm/vmscan.c:6720 [inline]
          balance_pgdat+0x1105/0x1970 mm/vmscan.c:6911
          kswapd+0x5ea/0xbf0 mm/vmscan.c:7180
          kthread+0x2c1/0x3a0 kernel/kthread.c:389
          ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
          ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

   other info that might help us debug this:

   Chain exists of:
     sb_internal#3 --> btrfs_trans_num_extwriters --> fs_reclaim

    Possible unsafe locking scenario:

          CPU0                    CPU1
          ----                    ----
     lock(fs_reclaim);
                                  lock(btrfs_trans_num_extwriters);
                                  lock(fs_reclaim);
     rlock(sb_internal#3);

    *** DEADLOCK ***

   2 locks held by kswapd0/111:
    #0: ffffffff8dd3a9a0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0xa88/0x1970 mm/vmscan.c:6924
    #1: ffff88801eae40e0 (&type->s_umount_key#62){++++}-{3:3}, at: super_trylock_shared fs/super.c:562 [inline]
    #1: ffff88801eae40e0 (&type->s_umount_key#62){++++}-{3:3}, at: super_cache_scan+0x96/0x550 fs/super.c:196

   stack backtrace:
   CPU: 0 PID: 111 Comm: kswapd0 Not tainted 6.10.0-rc2-syzkaller-00010-g2ab795141095 #0
   Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
   Call Trace:
    <TASK>
    __dump_stack lib/dump_stack.c:88 [inline]
    dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:114
    check_noncircular+0x31a/0x400 kernel/locking/lockdep.c:2187
    check_prev_add kernel/locking/lockdep.c:3134 [inline]
    check_prevs_add kernel/locking/lockdep.c:3253 [inline]
    validate_chain kernel/locking/lockdep.c:3869 [inline]
    __lock_acquire+0x2478/0x3b30 kernel/locking/lockdep.c:5137
    lock_acquire kernel/locking/lockdep.c:5754 [inline]
    lock_acquire+0x1b1/0x560 kernel/locking/lockdep.c:5719
    percpu_down_read include/linux/percpu-rwsem.h:51 [inline]
    __sb_start_write include/linux/fs.h:1655 [inline]
    sb_start_intwrite include/linux/fs.h:1838 [inline]
    start_transaction+0xbc1/0x1a70 fs/btrfs/transaction.c:694
    btrfs_commit_inode_delayed_inode+0x110/0x330 fs/btrfs/delayed-inode.c:1275
    btrfs_evict_inode+0x960/0xe80 fs/btrfs/inode.c:5291
    evict+0x2ed/0x6c0 fs/inode.c:667
    iput_final fs/inode.c:1741 [inline]
    iput.part.0+0x5a8/0x7f0 fs/inode.c:1767
    iput+0x5c/0x80 fs/inode.c:1757
    btrfs_scan_root fs/btrfs/extent_map.c:1118 [inline]
    btrfs_free_extent_maps+0xbd3/0x1320 fs/btrfs/extent_map.c:1189
    super_cache_scan+0x409/0x550 fs/super.c:227
    do_shrink_slab+0x44f/0x11c0 mm/shrinker.c:435
    shrink_slab+0x18a/0x1310 mm/shrinker.c:662
    shrink_one+0x493/0x7c0 mm/vmscan.c:4790
    shrink_many mm/vmscan.c:4851 [inline]
    lru_gen_shrink_node+0x89f/0x1750 mm/vmscan.c:4951
    shrink_node mm/vmscan.c:5910 [inline]
    kswapd_shrink_node mm/vmscan.c:6720 [inline]
    balance_pgdat+0x1105/0x1970 mm/vmscan.c:6911
    kswapd+0x5ea/0xbf0 mm/vmscan.c:7180
    kthread+0x2c1/0x3a0 kernel/kthread.c:389
    ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
    ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
    </TASK>

So fix this by using btrfs_add_delayed_iput() so that the final iput is
delegated to the cleaner kthread.

Link: https://lore.kernel.org/linux-btrfs/000000000000892280061a344581@google.com/
Reported-by: syzbot+3dad89b3993a4b275e72@syzkaller.appspotmail.com
Fixes: 956a17d9d050 ("btrfs: add a shrinker for extent maps")
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-07-11 16:45:18 +02:00
Linus Torvalds
9d9a2f29ae 21 hotfixes, 15 of which are cc:stable.
No identifiable theme here - all are singleton patches, 19 are for MM.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZo7tTQAKCRDdBJ7gKXxA
 jvhZAP977PnAwQH5khIS3xJxZrqx/+Tho7UPZzQPvHJPRpHorAD/TZfDazGtlPMD
 uLPEVslh18rks/w+kddLrnlBnkpUMwY=
 =vhts
 -----END PGP SIGNATURE-----

Merge tag 'mm-hotfixes-stable-2024-07-10-13-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "21 hotfixes, 15 of which are cc:stable.

  No identifiable theme here - all are singleton patches, 19 are for MM"

* tag 'mm-hotfixes-stable-2024-07-10-13-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (21 commits)
  mm/hugetlb: fix kernel NULL pointer dereference when migrating hugetlb folio
  mm/hugetlb: fix potential race in __update_and_free_hugetlb_folio()
  filemap: replace pte_offset_map() with pte_offset_map_nolock()
  arch/xtensa: always_inline get_current() and current_thread_info()
  sched.h: always_inline alloc_tag_{save|restore} to fix modpost warnings
  MAINTAINERS: mailmap: update Lorenzo Stoakes's email address
  mm: fix crashes from deferred split racing folio migration
  lib/build_OID_registry: avoid non-destructive substitution for Perl < 5.13.2 compat
  mm: gup: stop abusing try_grab_folio
  nilfs2: fix kernel bug on rename operation of broken directory
  mm/hugetlb_vmemmap: fix race with speculative PFN walkers
  cachestat: do not flush stats in recency check
  mm/shmem: disable PMD-sized page cache if needed
  mm/filemap: skip to create PMD-sized page cache if needed
  mm/readahead: limit page cache size in page_cache_ra_order()
  mm/filemap: make MAX_PAGECACHE_ORDER acceptable to xarray
  mm/damon/core: merge regions aggressively when max_nr_regions is unmet
  Fix userfaultfd_api to return EINVAL as expected
  mm: vmalloc: check if a hash-index is in cpu_possible_mask
  mm: prevent derefencing NULL ptr in pfn_section_valid()
  ...
2024-07-10 14:59:41 -07:00
Linus Torvalds
f6963ab4b0 bcachefs fixes for 6.10-rc8
- Switch some asserts to WARN()
 - Fix a few "transaction not locked" asserts in the data read retry
   paths and backpointers gc
 - Fix a race that would cause the journal to get stuck on a flush commit
 - Add missing fsck checks for the fragmentation LRU
 - The usual assorted ssorted syzbot fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmaOuRwACgkQE6szbY3K
 bnaCHhAAi9VRqws+zx3fSpe2OMwWqAEWA84QgIFJccy+I86d7dXkqG389gFqJwMG
 9S3BUHP1WooJmpsTRhK5cNtxZuKKOajXlxUYz3onsF7O/U3dHFY5GU7yIIjXS/0o
 q7+iryWAJ4MmlOrAJhgPMH/WlhbSVsjANUN0n/NhlOWHccFGHmpdMTb6aYzb+lfL
 iZOONKmEOR65gLzZYlO323OB2Tv00iEbOZAtxk68BLZYX+WON/j1T1A8gK4G0XSX
 8wcYpXNxGGkCufjBfAbXf4mcp/WygQq0Wj3bdVMFkZ+AwSJDcfGeK1H7f6tJ9e4n
 lqfWL4tgWIckS+41sA96B5cYry9TMDdhu3IeFaAm0ZrF55JT1JySGE1GNA+mo6xA
 mkMAqhG7rwYh6nSJfWX0Ie+zJ9TFbmi05ZbI7jaTuQjnJ5uvPpTuRfBDi+qSWmoi
 +IBDAi9hZgCUNEsLRGDm7RDQo0dpbFo6jpArn1RHK4MO/HkTrqcKpTqiGnfwFAU4
 PFxwq5G9+d38+M6YMX0tXdfQ+fdxroA6aIBJSsIpF18tPRBOBlQsM2GFP34uHbyk
 L6HOzed2QpM5ExBmViX79F+obuDQ/gzXQszYvDKL4QTFNbx43gPWRDrGm8EQen6y
 12EScamXbUWBSWnOqxscmeUsTdTKxLfw/F43JbE2fE7jSxc5tss=
 =VGT8
 -----END PGP SIGNATURE-----

Merge tag 'bcachefs-2024-07-10' of https://evilpiepirate.org/git/bcachefs

Pull bcachefs fixes from Kent Overstreet:

 - Switch some asserts to WARN()

 - Fix a few "transaction not locked" asserts in the data read retry
   paths and backpointers gc

 - Fix a race that would cause the journal to get stuck on a flush
   commit

 - Add missing fsck checks for the fragmentation LRU

 - The usual assorted ssorted syzbot fixes

* tag 'bcachefs-2024-07-10' of https://evilpiepirate.org/git/bcachefs: (22 commits)
  bcachefs: Add missing bch2_trans_begin()
  bcachefs: Fix missing error check in journal_entry_btree_keys_validate()
  bcachefs: Warn on attempting a move with no replicas
  bcachefs: bch2_data_update_to_text()
  bcachefs: Log mount failure error code
  bcachefs: Fix undefined behaviour in eytzinger1_first()
  bcachefs: Mark bch_inode_info as SLAB_ACCOUNT
  bcachefs: Fix bch2_inode_insert() race path for tmpfiles
  closures: fix closure_sync + closure debugging
  bcachefs: Fix journal getting stuck on a flush commit
  bcachefs: io clock: run timer fns under clock lock
  bcachefs: Repair fragmentation_lru in alloc_write_key()
  bcachefs: add check for missing fragmentation in check_alloc_to_lru_ref()
  bcachefs: bch2_btree_write_buffer_maybe_flush()
  bcachefs: Add missing printbuf_tabstops_reset() calls
  bcachefs: Fix loop restart in bch2_btree_transactions_read()
  bcachefs: Fix bch2_read_retry_nodecode()
  bcachefs: Don't use the new_fs() bucket alloc path on an initialized fs
  bcachefs: Fix shift greater than integer size
  bcachefs: Change bch2_fs_journal_stop() BUG_ON() to warning
  ...
2024-07-10 11:50:16 -07:00
Kent Overstreet
fd80d14005 bcachefs: fix scheduling while atomic in break_cycle()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-10 12:59:28 -04:00
Kent Overstreet
6f692b1672 bcachefs: Fix RCU splat
Reported-by: syzbot+e74fea078710bbca6f4b@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-10 12:46:22 -04:00
Kent Overstreet
7d7f71cd87 bcachefs: Add missing bch2_trans_begin()
this fixes a 'transaction should be locked' error in backpointers fsck

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-10 09:53:39 -04:00
Kent Overstreet
0f6f8f7693 bcachefs: Fix missing error check in journal_entry_btree_keys_validate()
Closes: https://syzkaller.appspot.com/bug?extid=8996d8f176cf946ef641
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-10 09:53:39 -04:00
Kent Overstreet
f49d2c9835 bcachefs: Warn on attempting a move with no replicas
Instead of popping an assert in bch2_write(), WARN and print out some
debugging info.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-10 09:53:39 -04:00
Kent Overstreet
ad8b68cd39 bcachefs: bch2_data_update_to_text()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-10 09:53:39 -04:00
Kent Overstreet
0f1f7324da bcachefs: Log mount failure error code
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-10 09:53:39 -04:00
Kent Overstreet
8ed58789fc bcachefs: Fix undefined behaviour in eytzinger1_first()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-10 09:53:39 -04:00
Youling Tang
86d81ec5f5 bcachefs: Mark bch_inode_info as SLAB_ACCOUNT
After commit 230e9fc28604 ("slab: add SLAB_ACCOUNT flag"), we need to mark
the inode cache as SLAB_ACCOUNT, similar to commit 5d097056c9a0 ("kmemcg:
account for certain kmem allocations to memcg")

Signed-off-by: Youling Tang <tangyouling@kylinos.cn>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-10 09:53:39 -04:00
Kent Overstreet
b02f973e67 bcachefs: Fix bch2_inode_insert() race path for tmpfiles
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-10 09:53:39 -04:00
Kent Overstreet
0435773239 bcachefs: Fix journal getting stuck on a flush commit
silly race

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-10 09:53:39 -04:00