linux

iv/linux

Author	SHA1	Message	Date
Konstantin Komarov	ff9400d321	fs/ntfs3: Keep runs for $MFT::$ATTR_DATA and $MFT::$ATTR_BITMAP [ Upstream commit `eb95678ee9` ] We skip the run_truncate_head call also for $MFT::$ATTR_BITMAP. Otherwise wnd_map()/run_lookup_entry will not find the disk position for the bitmap parts. Fixes: `0e5b044cbf` ("fs/ntfs3: Refactoring attr_set_size to restore after errors") Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-03 09:00:25 +02:00
Konstantin Komarov	73e38cb683	fs/ntfs3: Missed error return [ Upstream commit `2cbbd96820` ] Fixes: `3f3b442b5a` ("fs/ntfs3: Add bitmap") Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-03 09:00:25 +02:00
Konstantin Komarov	2e83375fd9	fs/ntfs3: Fix the format of the "nocase" mount option [ Upstream commit `d392e85fd1` ] The 'nocase' option was mistakenly added as fsparam_flag_no with the 'no' prefix, causing the case-insensitive mode to require the 'nonocase' option to be enabled. Fixes: `a3a956c78e` ("fs/ntfs3: Add option "nocase"") Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-03 09:00:25 +02:00
Ryusuke Konishi	31be02ef4e	nilfs2: avoid undefined behavior in nilfs_cnt32_ge macro [ Upstream commit `0f3819e8c4` ] According to the C standard 3.4.3p3, the result of signed integer overflow is undefined. The macro nilfs_cnt32_ge(), which compares two sequence numbers, uses signed integer subtraction that can overflow, and therefore the result of the calculation may differ from what is expected due to undefined behavior in different environments. Similar to an earlier change to the jiffies-related comparison macros in commit `5a581b367b` ("jiffies: Avoid undefined behavior from signed overflow"), avoid this potential issue by changing the definition of the macro to perform the subtraction as unsigned integers, then cast the result to a signed integer for comparison. Link: https://lkml.kernel.org/r/20130727225828.GA11864@linux.vnet.ibm.com Link: https://lkml.kernel.org/r/20240702183512.6390-1-konishi.ryusuke@gmail.com Fixes: `9ff05123e3` ("nilfs2: segment constructor") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-03 09:00:25 +02:00
David Hildenbrand	16a767c969	fs/proc/task_mmu: properly detect PM_MMAP_EXCLUSIVE per page of PMD-mapped THPs [ Upstream commit `2c1f057e5b` ] We added PM_MMAP_EXCLUSIVE in 2015 via commit `77bb499bb6` ("pagemap: add mmap-exclusive bit for marking pages mapped only here"), when THPs could not be partially mapped and page_mapcount() returned something that was true for all pages of the THP. In 2016, we added support for partially mapping THPs via commit `53f9263bab` ("mm: rework mapcount accounting to enable 4k mapping of THPs") but missed to determine PM_MMAP_EXCLUSIVE as well per page. Checking page_mapcount() on the head page does not tell the whole story. We should check each individual page. In a future without per-page mapcounts it will be different, but we'll change that to be consistent with PTE-mapped THPs once we deal with that. Link: https://lkml.kernel.org/r/20240607122357.115423-4-david@redhat.com Fixes: `53f9263bab` ("mm: rework mapcount accounting to enable 4k mapping of THPs") Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Lance Yang <ioworker0@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-03 09:00:24 +02:00
David Hildenbrand	65c52a91e3	fs/proc/task_mmu: don't indicate PM_MMAP_EXCLUSIVE without PM_PRESENT [ Upstream commit `da7f31ed0f` ] Relying on the mapcount for non-present PTEs that reference pages doesn't make any sense: they are not accounted in the mapcount, so page_mapcount() == 1 won't return the result we actually want to know. While we don't check the mapcount for migration entries already, we could end up checking it for swap, hwpoison, device exclusive, ... entries, which we really shouldn't. There is one exception: device private entries, which we consider fake-present (e.g., incremented the mapcount). But we won't care about that for now for PM_MMAP_EXCLUSIVE, because indicating PM_SWAP for them although they are fake-present already sounds suspiciously wrong. Let's never indicate PM_MMAP_EXCLUSIVE without PM_PRESENT. Link: https://lkml.kernel.org/r/20240607122357.115423-3-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Lance Yang <ioworker0@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Stable-dep-of: `2c1f057e5b` ("fs/proc/task_mmu: properly detect PM_MMAP_EXCLUSIVE per page of PMD-mapped THPs") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-03 09:00:24 +02:00
David Hildenbrand	680f6ceee8	fs/proc/task_mmu: indicate PM_FILE for PMD-mapped file THP [ Upstream commit `3f9f022e97` ] Patch series "fs/proc: move page_mapcount() to fs/proc/internal.h". With all other page_mapcount() users in the tree gone, move page_mapcount() to fs/proc/internal.h, rename it and extend the documentation to prevent future (ab)use. ... of course, I find some issues while working on that code that I sort first ;) We'll now only end up calling page_mapcount() [now folio_precise_page_mapcount()] on pages mapped via present page table entries. Except for /proc/kpagecount, that still does questionable things, but we'll leave that legacy interface as is for now. Did a quick sanity check. Likely we would want some better selfestest for /proc/$/pagemap + smaps. I'll see if I can find some time to write some more. This patch (of 6): Looks like we never taught pagemap_pmd_range() about the existence of PMD-mapped file THPs. Seems to date back to the times when we first added support for non-anon THPs in the form of shmem THP. Link: https://lkml.kernel.org/r/20240607122357.115423-1-david@redhat.com Link: https://lkml.kernel.org/r/20240607122357.115423-2-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Fixes: `800d8c63b2` ("shmem: add huge pages support") Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Lance Yang <ioworker0@gmail.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: David Hildenbrand <david@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-03 09:00:24 +02:00
Konstantin Komarov	c41eadba18	fs/ntfs3: Fix field-spanning write in INDEX_HDR [ Upstream commit `2f3e176fee` ] Fields flags and res[3] replaced with one 4 byte flags. Fixes: `4534a70b70` ("fs/ntfs3: Add headers and misc files") Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-03 09:00:23 +02:00
Andy Shevchenko	d7d3cbb231	fs/ntfs3: Drop stray '\' (backslash) in formatting string [ Upstream commit `b366809dd1` ] CHECK /home/andy/prj/linux-topic-uart/fs/ntfs3/super.c fs/ntfs3/super.c:471:23: warning: unknown escape sequence: '\%' Drop stray '\' (backslash) in formatting string. Fixes: `d27e202b9a` ("fs/ntfs3: Add more info into /proc/fs/ntfs3/<dev>/volinfo") Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-03 09:00:23 +02:00
Konstantin Komarov	8e0f398c6f	fs/ntfs3: Correct undo if ntfs_create_inode failed [ Upstream commit `f28d0866d8` ] Clusters allocated for Extended Attributes, must be freed when rolling back inode creation. Fixes: `82cae269cf` ("fs/ntfs3: Add initialization of super block") Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-03 09:00:22 +02:00
Konstantin Komarov	d5ad80aabd	fs/ntfs3: Replace inode_trylock with inode_lock [ Upstream commit `69505fe98f` ] The issue was detected due to xfstest 465 failing. Fixes: `4342306f0f` ("fs/ntfs3: Add file operations and implementation") Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-03 09:00:22 +02:00
Konstantin Komarov	edfef987bc	fs/ntfs3: Add missing .dirty_folio in address_space_operations [ Upstream commit `0f9579d9e0` ] After switching from pages to folio [1], it became evident that the initialization of .dirty_folio for page cache operations was missed for compressed files. [1] https://lore.kernel.org/ntfs3/20240422193203.3534108-1-willy@infradead.org Fixes: `82cae269cf` ("fs/ntfs3: Add initialization of super block") Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-03 09:00:21 +02:00
Konstantin Komarov	0bbd61e1ac	fs/ntfs3: Fix getting file type [ Upstream commit `24c5100ace` ] An additional condition causes the mft record to be read from disk and get the file type dt_type. Fixes: `22457c047e` ("fs/ntfs3: Modified fix directory element type detection") Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-03 09:00:21 +02:00
Konstantin Komarov	2f7236ba9f	fs/ntfs3: Missed NI_FLAG_UPDATE_PARENT setting [ Upstream commit `1c308ace1f` ] Fixes: `be71b5cba2` ("fs/ntfs3: Add attrib operations") Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-03 09:00:21 +02:00
Konstantin Komarov	539bd10ab6	fs/ntfs3: Deny getting attr data block in compressed frame [ Upstream commit `69943484b9` ] Attempting to retrieve an attribute data block in a compressed frame is ignored. Fixes: `be71b5cba2` ("fs/ntfs3: Add attrib operations") Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-03 09:00:21 +02:00
Konstantin Komarov	ff73b91873	fs/ntfs3: Fix transform resident to nonresident for compressed files [ Upstream commit `25610ff98d` ] Сorrected calculation of required space len (in clusters) for attribute data storage in case of compression. Fixes: `be71b5cba2` ("fs/ntfs3: Add attrib operations") Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-03 09:00:21 +02:00
Konstantin Komarov	a62241fb49	fs/ntfs3: Merge synonym COMPRESSION_UNIT and NTFS_LZNT_CUNIT [ Upstream commit `487f8d482a` ] COMPRESSION_UNIT and NTFS_LZNT_CUNIT mean the same thing (1u<<NTFS_LZNT_CUNIT) determines the size for compression (in clusters). COMPRESS_MAX_CLUSTER is not used in the code. Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Stable-dep-of: `25610ff98d` ("fs/ntfs3: Fix transform resident to nonresident for compressed files") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-03 09:00:21 +02:00
Christoph Hellwig	387e6e9d11	nfs: pass explicit offset/count to trace events [ Upstream commit `fada32ed6d` ] nfs_folio_length is unsafe to use without having the folio locked and a check for a NULL ->f_mapping that protects against truncations and can lead to kernel crashes. E.g. when running xfstests generic/065 with all nfs trace points enabled. Follow the model of the XFS trace points and pass in an explіcit offset and length. This has the additional benefit that these values can be more accurate as some of the users touch partial folio ranges. Fixes: `eb5654b3b8` ("NFS: Enable tracing of nfs_invalidate_folio() and nfs_launder_folio()") Reported-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-03 09:00:05 +02:00
Jan Kara	024d8556f3	ext4: avoid writing unitialized memory to disk in EA inodes [ Upstream commit `65121eff3e` ] If the extended attribute size is not a multiple of block size, the last block in the EA inode will have uninitialized tail which will get written to disk. We will never expose the data to userspace but still this is not a good practice so just zero out the tail of the block as it isn't going to cause a noticeable performance overhead. Fixes: `e50e5129f3` ("ext4: xattr-in-inode support") Reported-by: syzbot+9c1fe13fcb51574b249b@syzkaller.appspotmail.com Reported-by: Hugh Dickins <hughd@google.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240613150234.25176-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-03 09:00:04 +02:00
Luis Henriques (SUSE)	149a0691bf	ext4: don't track ranges in fast_commit if inode has inlined data [ Upstream commit `7882b0187b` ] When fast-commit needs to track ranges, it has to handle inodes that have inlined data in a different way because ext4_fc_write_inode_data(), in the actual commit path, will attempt to map the required blocks for the range. However, inodes that have inlined data will have it's data stored in inode->i_block and, eventually, in the extended attribute space. Unfortunately, because fast commit doesn't currently support extended attributes, the solution is to mark this commit as ineligible. Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1039883 Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev> Tested-by: Ben Hutchings <benh@debian.org> Fixes: `9725958bb7` ("ext4: fast commit may miss tracking unwritten range during ftruncate") Link: https://patch.msgid.link/20240618144312.17786-1-luis.henriques@linux.dev Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-03 09:00:04 +02:00
Olga Kornievskaia	4ebf5bfd34	NFSv4.1 another fix for EXCHGID4_FLAG_USE_PNFS_DS for DS server [ Upstream commit `4840c00003` ] Previously in order to mark the communication with the DS server, we tried to use NFS_CS_DS in cl_flags. However, this flag would only be saved for the DS server and in case where DS equals MDS, the client would not find a matching nfs_client in nfs_match_client that represents the MDS (but is also a DS). Instead, don't rely on the NFS_CS_DS but instead use NFS_CS_PNFS. Fixes: `379e4adfdd` ("NFSv4.1: fixup use EXCHGID4_FLAG_USE_PNFS_DS for DS server") Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-03 09:00:04 +02:00
Luis Henriques (SUSE)	81f819c537	ext4: fix infinite loop when replaying fast_commit [ Upstream commit `907c3fe532` ] When doing fast_commit replay an infinite loop may occur due to an uninitialized extent_status struct. ext4_ext_determine_insert_hole() does not detect the replay and calls ext4_es_find_extent_range(), which will return immediately without initializing the 'es' variable. Because 'es' contains garbage, an integer overflow may happen causing an infinite loop in this function, easily reproducible using fstest generic/039. This commit fixes this issue by unconditionally initializing the structure in function ext4_es_find_extent_range(). Thanks to Zhang Yi, for figuring out the real problem! Fixes: `8016e29f43` ("ext4: fast commit recovery path") Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20240515082857.32730-1-luis.henriques@linux.dev Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-03 09:00:02 +02:00
Jeff Layton	4799e4e51f	nfsd: nfsd_file_lease_notifier_call gets a file_lease as an argument [ Upstream commit `769d20028f` ] "data" actually refers to a file_lease and not a file_lock. Both structs have their file_lock_core as the first field though, so this bug should be harmless without struct randomization in play. Reported-by: Florian Evers <florian-evers@gmx.de> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219008 Fixes: `05580bbfc6` ("nfsd: adapt to breakup of struct file_lock") Signed-off-by: Jeff Layton <jlayton@kernel.org> Tested-by: Florian Evers <florian-evers@gmx.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-03 08:59:46 +02:00
Chuck Lever	65b97d6f32	NFSD: Fix nfsdcld warning [ Upstream commit `18a5450684` ] Since CONFIG_NFSD_LEGACY_CLIENT_TRACKING is a new config option, its initial default setting should have been Y (if we are to follow the common practice of "default Y, wait, default N, wait, remove code"). Paul also suggested adding a clearer remedy action to the warning message. Reported-by: Paul Menzel <pmenzel@molgen.mpg.de> Message-Id: <d2ab4ee7-ba0f-44ac-b921-90c8fa5a04d2@molgen.mpg.de> Fixes: `74fd48739d` ("nfsd: new Kconfig option for legacy client tracking") Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-03 08:59:44 +02:00
Jan Kara	40d7b3ed52	udf: Fix bogus checksum computation in udf_rename() [ Upstream commit `27ab338548` ] Syzbot reports uninitialized memory access in udf_rename() when updating checksum of '..' directory entry of a moved directory. This is indeed true as we pass on-stack diriter.fi to the udf_update_tag() and because that has only struct fileIdentDesc included in it and not the impUse or name fields, the checksumming function is going to checksum random stack contents beyond the end of the structure. This is actually harmless because the following udf_fiiter_write_fi() will recompute the checksum from on-disk buffers where everything is properly included. So all that is needed is just removing the bogus calculation. Fixes: `e9109a92d2` ("udf: Convert udf_rename() to new directory iteration code") Link: https://lore.kernel.org/all/000000000000cf405f060d8f75a9@google.com/T/ Link: https://patch.msgid.link/20240617154201.29512-1-jack@suse.cz Reported-by: syzbot+d31185aa54170f7fc1f5@syzkaller.appspotmail.com Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-03 08:59:39 +02:00
Jan Kara	50b556a1ce	udf: Fix lock ordering in udf_evict_inode() [ Upstream commit `8832fc1e50` ] udf_evict_inode() calls udf_setsize() to truncate deleted inode. However inode deletion through udf_evict_inode() can happen from inode reclaim context and udf_setsize() grabs mapping->invalidate_lock which isn't generally safe to acquire from fs reclaim context since we allocate pages under mapping->invalidate_lock for example in a page fault path. This is however not a real deadlock possibility as by the time udf_evict_inode() is called, nobody can be accessing the inode, even less work with its page cache. So this is just a lockdep triggering false positive. Fix the problem by moving mapping->invalidate_lock locking outsize of udf_setsize() into udf_setattr() as grabbing mapping->invalidate_lock from udf_evict_inode() is pointless. Reported-by: syzbot+0333a6f4b88bcd68a62f@syzkaller.appspotmail.com Fixes: `b9a861fd52` ("udf: Protect truncate and file type conversion with invalidate_lock") Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-03 08:59:36 +02:00
Chao Yu	312d272dc4	hfsplus: fix to avoid false alarm of circular locking [ Upstream commit `be4edd1642` ] Syzbot report potential ABBA deadlock as below: loop0: detected capacity change from 0 to 1024 ====================================================== WARNING: possible circular locking dependency detected 6.9.0-syzkaller-10323-g8f6a15f095a6 #0 Not tainted ------------------------------------------------------ syz-executor171/5344 is trying to acquire lock: ffff88807cb980b0 (&tree->tree_lock){+.+.}-{3:3}, at: hfsplus_file_truncate+0x811/0xb50 fs/hfsplus/extents.c:595 but task is already holding lock: ffff88807a930108 (&HFSPLUS_I(inode)->extents_lock){+.+.}-{3:3}, at: hfsplus_file_truncate+0x2da/0xb50 fs/hfsplus/extents.c:576 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&HFSPLUS_I(inode)->extents_lock){+.+.}-{3:3}: lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754 __mutex_lock_common kernel/locking/mutex.c:608 [inline] __mutex_lock+0x136/0xd70 kernel/locking/mutex.c:752 hfsplus_file_extend+0x21b/0x1b70 fs/hfsplus/extents.c:457 hfsplus_bmap_reserve+0x105/0x4e0 fs/hfsplus/btree.c:358 hfsplus_rename_cat+0x1d0/0x1050 fs/hfsplus/catalog.c:456 hfsplus_rename+0x12e/0x1c0 fs/hfsplus/dir.c:552 vfs_rename+0xbdb/0xf00 fs/namei.c:4887 do_renameat2+0xd94/0x13f0 fs/namei.c:5044 __do_sys_rename fs/namei.c:5091 [inline] __se_sys_rename fs/namei.c:5089 [inline] __x64_sys_rename+0x86/0xa0 fs/namei.c:5089 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f -> #0 (&tree->tree_lock){+.+.}-{3:3}: check_prev_add kernel/locking/lockdep.c:3134 [inline] check_prevs_add kernel/locking/lockdep.c:3253 [inline] validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869 __lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137 lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754 __mutex_lock_common kernel/locking/mutex.c:608 [inline] __mutex_lock+0x136/0xd70 kernel/locking/mutex.c:752 hfsplus_file_truncate+0x811/0xb50 fs/hfsplus/extents.c:595 hfsplus_setattr+0x1ce/0x280 fs/hfsplus/inode.c:265 notify_change+0xb9d/0xe70 fs/attr.c:497 do_truncate+0x220/0x310 fs/open.c:65 handle_truncate fs/namei.c:3308 [inline] do_open fs/namei.c:3654 [inline] path_openat+0x2a3d/0x3280 fs/namei.c:3807 do_filp_open+0x235/0x490 fs/namei.c:3834 do_sys_openat2+0x13e/0x1d0 fs/open.c:1406 do_sys_open fs/open.c:1421 [inline] __do_sys_creat fs/open.c:1497 [inline] __se_sys_creat fs/open.c:1491 [inline] __x64_sys_creat+0x123/0x170 fs/open.c:1491 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&HFSPLUS_I(inode)->extents_lock); lock(&tree->tree_lock); lock(&HFSPLUS_I(inode)->extents_lock); lock(&tree->tree_lock); This is a false alarm as tree_lock mutex are different, one is from sbi->cat_tree, and another is from sbi->ext_tree: Thread A Thread B - hfsplus_rename - hfsplus_rename_cat - hfs_find_init - mutext_lock(cat_tree->tree_lock) - hfsplus_setattr - hfsplus_file_truncate - mutex_lock(hip->extents_lock) - hfs_find_init - mutext_lock(ext_tree->tree_lock) - hfs_bmap_reserve - hfsplus_file_extend - mutex_lock(hip->extents_lock) So, let's call mutex_lock_nested for tree_lock mutex lock, and pass correct lock class for it. Fixes: `31651c6071` ("hfsplus: avoid deadlock on file truncation") Reported-by: syzbot+6030b3b1b9bf70e538c4@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-fsdevel/000000000000e37a4005ef129563@google.com Cc: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com> Signed-off-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20240607142304.455441-1-chao@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-03 08:59:11 +02:00
Jann Horn	ed898f9ca3	filelock: Fix fcntl/close race recovery compat path commit `f8138f2ad2` upstream. When I wrote commit `3cad1bc010` ("filelock: Remove locks reliably when fcntl/close race is detected"), I missed that there are two copies of the code I was patching: The normal version, and the version for 64-bit offsets on 32-bit kernels. Thanks to Greg KH for stumbling over this while doing the stable backport... Apply exactly the same fix to the compat path for 32-bit kernels. Fixes: `c293621bbf` ("[PATCH] stale POSIX lock handling") Cc: stable@kernel.org Link: https://bugs.chromium.org/p/project-zero/issues/detail?id=2563 Signed-off-by: Jann Horn <jannh@google.com> Link: https://lore.kernel.org/r/20240723-fs-lock-recover-compatfix-v1-1-148096719529@google.com Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-07-27 11:40:36 +02:00
lei lu	617cf144c2	fs/ntfs3: Validate ff offset commit `50c4787965` upstream. This adds sanity checks for ff offset. There is a check on rt->first_free at first, but walking through by ff without any check. If the second ff is a large offset. We may encounter an out-of-bound read. Signed-off-by: lei lu <llfamsec@gmail.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-07-27 11:40:32 +02:00
Konstantin Komarov	9b71f820f7	fs/ntfs3: Add a check for attr_names and oatbl commit `702d4930eb` upstream. Added out-of-bound checking for *ane (ATTR_NAME_ENTRY). Reported-by: lei lu <llfamsec@gmail.com> Fixes: `865e7a7700` ("fs/ntfs3: Reduce stack usage") Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-07-27 11:40:32 +02:00
lei lu	dbde7bc910	jfs: don't walk off the end of ealist commit `d0fa70aca5` upstream. Add a check before visiting the members of ea to make sure each ea stays within the ealist. Signed-off-by: lei lu <llfamsec@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-07-27 11:40:32 +02:00
lei lu	edb2e67dd4	ocfs2: add bounds checking to ocfs2_check_dir_entry() commit `255547c6bb` upstream. This adds sanity checks for ocfs2_dir_entry to make sure all members of ocfs2_dir_entry don't stray beyond valid memory region. Link: https://lkml.kernel.org/r/20240626104433.163270-1-llfamsec@gmail.com Signed-off-by: lei lu <llfamsec@gmail.com> Reviewed-by: Heming Zhao <heming.zhao@suse.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-07-27 11:40:32 +02:00
David Howells	fafd1dcc84	cifs: Fix setting of zero_point after DIO write commit `61ea6b3a31` upstream. At the moment, at the end of a DIO write, cifs calls netfs_resize_file() to adjust the size of the file if it needs it. This will reduce the zero_point (the point above which we assume a read will just return zeros) if it's more than the new i_size, but won't increase it. With DIO writes, however, we definitely want to increase it as we have clobbered the local pagecache and then written some data that's not available locally. Fix cifs to make the zero_point above the end of a DIO or unbuffered write. This fixes corruption seen occasionally with the generic/708 xfs-test. In that case, the read-back of some of the written data is being short-circuited and replaced with zeroes. Fixes: `3ee1a1fc39` ("cifs: Cut over to using netfslib") Cc: stable@vger.kernel.org Reported-by: Steve French <sfrench@samba.org> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-07-24 15:54:06 +02:00
David Howells	b1d0a56676	cifs: Fix server re-repick on subrequest retry commit `de40579b90` upstream. When a subrequest is marked for needing retry, netfs will call cifs_prepare_write() which will make cifs repick the server for the op before renegotiating credits; it then calls cifs_issue_write() which invokes smb2_async_writev() - which re-repicks the server. If a different server is then selected, this causes the increment of server->in_flight to happen against one record and the decrement to happen against another, leading to misaccounting. Fix this by just removing the repick code in smb2_async_writev(). As this is only called from netfslib-driven code, cifs_prepare_write() should always have been called first, and so server should never be NULL and the preparatory step is repeated in the event that we do a retry. The problem manifests as a warning looking something like: WARNING: CPU: 4 PID: 72896 at fs/smb/client/smb2ops.c:97 smb2_add_credits+0x3f0/0x9e0 [cifs] ... RIP: 0010:smb2_add_credits+0x3f0/0x9e0 [cifs] ... smb2_writev_callback+0x334/0x560 [cifs] cifs_demultiplex_thread+0x77a/0x11b0 [cifs] kthread+0x187/0x1d0 ret_from_fork+0x34/0x60 ret_from_fork_asm+0x1a/0x30 Which may be triggered by a number of different xfstests running against an Azure server in multichannel mode. generic/249 seems the most repeatable, but generic/215, generic/249 and generic/308 may also show it. Fixes: `3ee1a1fc39` ("cifs: Cut over to using netfslib") Cc: stable@vger.kernel.org Reported-by: Steve French <smfrench@gmail.com> Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Acked-by: Tom Talpey <tom@talpey.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Jeff Layton <jlayton@kernel.org> cc: Aurelien Aptel <aaptel@suse.com> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-07-24 15:54:06 +02:00
Steve French	b5347b051d	cifs: fix noisy message on copy_file_range commit `ae4ccca471` upstream. There are common cases where copy_file_range can noisily log "source and target of copy not on same server" e.g. the mv command across mounts to two different server's shares. Change this to informational rather than logging as an error. A followon patch will add dynamic trace points e.g. for cifs_file_copychunk_range Cc: stable@vger.kernel.org Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-07-24 15:54:06 +02:00
David Howells	5c0a6c40c2	cifs: Fix missing fscache invalidation commit `a07d38afd1` upstream. A network filesystem needs to implement a netfslib hook to invalidate fscache if it's to be able to use the cache. Fix cifs to implement the cache invalidation hook. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Cc: stable@vger.kernel.org Fixes: `3ee1a1fc39` ("cifs: Cut over to using netfslib") Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-07-24 15:54:06 +02:00
David Howells	447c00d76e	cifs: Fix missing error code set commit `d2c5eb57b6` upstream. In cifs_strict_readv(), the default rc (-EACCES) is accidentally cleared by a successful return from netfs_start_io_direct(), such that if cifs_find_lock_conflict() fails, we don't return an error. Fix this by resetting the default error code. Fixes: `14b1cd2534` ("cifs: Fix locking in cifs_strict_readv()") Cc: stable@vger.kernel.org Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-07-24 15:54:05 +02:00
Kees Cook	4378be89dd	ext4: use memtostr_pad() for s_volume_name commit `be27cd6446` upstream. As with the other strings in struct ext4_super_block, s_volume_name is not NUL terminated. The other strings were marked in commit `072ebb3bff` ("ext4: add nonstring annotations to ext4.h"). Using strscpy() isn't the right replacement for strncpy(); it should use memtostr_pad() instead. Reported-by: syzbot+50835f73143cc2905b9e@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/00000000000019f4c00619192c05@google.com/ Fixes: `744a56389f` ("ext4: replace deprecated strncpy with alternatives") Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://patch.msgid.link/20240523225408.work.904-kees@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-07-24 15:54:05 +02:00
Linus Torvalds	d0d0cd3800	Merge tag '6.10-rc7-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6 Pull smb client fix from Steve French: "Small fix, also for stable" * tag '6.10-rc7-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6: cifs: fix setting SecurityFlags to true	2024-07-13 13:00:25 -07:00
Steve French	d2346e2836	cifs: fix setting SecurityFlags to true If you try to set /proc/fs/cifs/SecurityFlags to 1 it will set them to CIFSSEC_MUST_NTLMV2 which no longer is relevant (the less secure ones like lanman have been removed from cifs.ko) and is also missing some flags (like for signing and encryption) and can even cause mount to fail, so change this to set it to Kerberos in this case. Also change the description of the SecurityFlags to remove mention of flags which are no longer supported. Cc: stable@vger.kernel.org Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-07-13 09:24:27 -05:00
Linus Torvalds	975f3b6da1	Merge tag 'for-6.10-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "Fix a regression in extent map shrinker behaviour. In the past weeks we got reports from users that there are huge latency spikes or freezes. This was bisected to newly added shrinker of extent maps (it was added to fix a build up of the structures in memory). I'm assuming that the freezes would happen to many users after release so I'd like to get it merged now so it's in 6.10. Although the diff size is not small the changes are relatively straightforward, the reporters verified the fixes and we did testing on our side. The fixes: - adjust behaviour under memory pressure and check lock or scheduling conditions, bail out if needed - synchronize tracking of the scanning progress so inode ranges are not skipped or work duplicated - do a delayed iput when scanning a root so evicting an inode does not slow things down in case of lots of dirty data, also fix lockdep warning, a deadlock could happen when writing the dirty data would need to start a transaction" * tag 'for-6.10-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: avoid races when tracking progress for extent map shrinking btrfs: stop extent map shrinker if reschedule is needed btrfs: use delayed iput during extent map shrinking	2024-07-12 12:08:42 -07:00
Linus Torvalds	5d4c85134b	Merge tag 'bcachefs-2024-07-12' of https://evilpiepirate.org/git/bcachefs Pull more bcachefs fixes from Kent Overstreet: - revert the SLAB_ACCOUNT patch, something crazy is going on in memcg and someone forgot to test - minor fixes: missing rcu_read_lock(), scheduling while atomic (in an emergency shutdown path) - two lockdep fixes; these could have gone earlier, but were left to bake awhile * tag 'bcachefs-2024-07-12' of https://evilpiepirate.org/git/bcachefs: bcachefs: bch2_gc_btree() should not use btree_root_lock bcachefs: Set PF_MEMALLOC_NOFS when trans->locked bcachefs; Use trans_unlock_long() when waiting on allocator Revert "bcachefs: Mark bch_inode_info as SLAB_ACCOUNT" bcachefs: fix scheduling while atomic in break_cycle() bcachefs: Fix RCU splat	2024-07-12 08:22:43 -07:00
Kent Overstreet	1841027c7d	bcachefs: bch2_gc_btree() should not use btree_root_lock btree_root_lock is for the root keys in btree_root, not the pointers to the nodes themselves; this fixes a lock ordering issue between btree_root_lock and btree node locks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-11 20:10:55 -04:00
Kent Overstreet	f236ea4bca	bcachefs: Set PF_MEMALLOC_NOFS when trans->locked proper lock ordering is: fs_reclaim -> btree node locks Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-11 20:10:55 -04:00
Kent Overstreet	f0f3e51148	bcachefs; Use trans_unlock_long() when waiting on allocator not using unlock_long() blocks key cache reclaim, and the allocator may take awhile Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-11 20:10:55 -04:00
Kent Overstreet	aacd897d4d	Revert "bcachefs: Mark bch_inode_info as SLAB_ACCOUNT" This reverts commit `86d81ec5f5`. This wasn't tested with memcg enabled, it immediately hits a null ptr deref in list_lru_add(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-11 20:01:38 -04:00
Linus Torvalds	83ab4b461e	Merge tag 'vfs-6.10-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: "cachefiles: - Export an existing and add a new cachefile helper to be used in filesystems to fix reference count bugs - Use the newly added fscache_ty_get_volume() helper to get a reference count on an fscache_volume to handle volumes that are about to be removed cleanly - After withdrawing a fscache_cache via FSCACHE_CACHE_IS_WITHDRAWN wait for all ongoing cookie lookups to complete and for the object count to reach zero - Propagate errors from vfs_getxattr() to avoid an infinite loop in cachefiles_check_volume_xattr() because it keeps seeing ESTALE - Don't send new requests when an object is dropped by raising CACHEFILES_ONDEMAND_OJBSTATE_DROPPING - Cancel all requests for an object that is about to be dropped - Wait for the ondemand_boject_worker to finish before dropping a cachefiles object to prevent use-after-free - Use cyclic allocation for message ids to better handle id recycling - Add missing lock protection when iterating through the xarray when polling netfs: - Use standard logging helpers for debug logging VFS: - Fix potential use-after-free in file locks during trace_posix_lock_inode(). The tracepoint could fire while another task raced it and freed the lock that was requested to be traced - Only increment the nr_dentry_negative counter for dentries that are present on the superblock LRU. Currently, DCACHE_LRU_LIST list is used to detect this case. However, the flag is also raised in combination with DCACHE_SHRINK_LIST to indicate that dentry->d_lru is used. So checking only DCACHE_LRU_LIST will lead to wrong nr_dentry_negative count. Fix the check to not count dentries that are on a shrink related list Misc: - hfsplus: fix an uninitialized value issue in copy_name - minix: fix minixfs_rename with HIGHMEM. It still uses kunmap() even though we switched it to kmap_local_page() a while ago" * tag 'vfs-6.10-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: minixfs: Fix minixfs_rename with HIGHMEM hfsplus: fix uninit-value in copy_name vfs: don't mod negative dentry count when on shrinker list filelock: fix potential use-after-free in posix_lock_inode cachefiles: add missing lock protection when polling cachefiles: cyclic allocation of msg_id to avoid reuse cachefiles: wait for ondemand_object_worker to finish when dropping object cachefiles: cancel all requests for the object that is being dropped cachefiles: stop sending new request when dropping object cachefiles: propagate errors from vfs_getxattr() to avoid infinite loop cachefiles: fix slab-use-after-free in cachefiles_withdraw_cookie() cachefiles: fix slab-use-after-free in fscache_withdraw_volume() netfs, fscache: export fscache_put_volume() and add fscache_try_get_volume() netfs: Switch debug logging to pr_debug()	2024-07-11 09:03:28 -07:00
Filipe Manana	4484940514	btrfs: avoid races when tracking progress for extent map shrinking We store the progress (root and inode numbers) of the extent map shrinker in fs_info without any synchronization but we can have multiple tasks calling into the shrinker during memory allocations when there's enough memory pressure for example. This can result in a task A reading fs_info->extent_map_shrinker_last_ino after another task B updates it, and task A reading fs_info->extent_map_shrinker_last_root before task B updates it, making task A see an odd state that isn't necessarily harmful but may make it skip certain inode ranges or do more work than necessary by going over the same inodes again. These unprotected accesses would also trigger warnings from tools like KCSAN. So add a lock to protect access to these progress fields. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-07-11 16:50:54 +02:00
Filipe Manana	b3ebb9b7e9	btrfs: stop extent map shrinker if reschedule is needed The extent map shrinker can be called in a variety of contexts where we are under memory pressure, and of them is when a task is trying to allocate memory. For this reason the shrinker is typically called with a value of struct shrink_control::nr_to_scan that is much smaller than what we return in the nr_cached_objects callback of struct super_operations (fs/btrfs/super.c:btrfs_nr_cached_objects()), so that the shrinker does not take a long time and cause high latencies. However we can still take a lot of time in the shrinker even for a limited amount of nr_to_scan: 1) When traversing the red black tree that tracks open inodes in a root, as for example with millions of open inodes we get a deep tree which takes time searching for an inode; 2) Iterating over the extent map tree, which is a red black tree, of an inode when doing the rb_next() calls and when removing an extent map from the tree, since often that requires rebalancing the red black tree; 3) When trying to write lock an inode's extent map tree we may wait for a significant amount of time, because there's either another task about to do IO and searching for an extent map in the tree or inserting an extent map in the tree, and we can have thousands or even millions of extent maps for an inode. Furthermore, there can be concurrent calls to the shrinker so the lock might be busy simply because there is already another task shrinking extent maps for the same inode; 4) We often reschedule if we need to, which further increases latency. So improve on this by stopping the extent map shrinking code whenever we need to reschedule and make it skip an inode if we can't immediately lock its extent map tree. Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Reported-by: Andrea Gelmini <andrea.gelmini@gmail.com> Link: https://lore.kernel.org/linux-btrfs/CABXGCsMmmb36ym8hVNGTiU8yfUS_cGvoUmGCcBrGWq9OxTrs+A@mail.gmail.com/ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-07-11 16:45:42 +02:00
Filipe Manana	68a3ebd18b	btrfs: use delayed iput during extent map shrinking When putting an inode during extent map shrinking we're doing a standard iput() but that may take a long time in case the inode is dirty and we are doing the final iput that triggers eviction - the VFS will have to wait for writeback before calling the btrfs evict callback (see fs/inode.c:evict()). This slows down the task running the shrinker which may have been triggered while updating some tree for example, meaning locks are held as well as an open transaction handle. Also if the iput() ends up triggering eviction and the inode has no links anymore, then we trigger item truncation which requires flushing delayed items, space reservation to start a transaction and that may trigger the space reclaim task and wait for it, resulting in deadlocks in case the reclaim task needs for example to commit a transaction and the shrinker is being triggered from a path holding a transaction handle. Syzbot reported such a case with the following stack traces: ====================================================== WARNING: possible circular locking dependency detected 6.10.0-rc2-syzkaller-00010-g2ab795141095 #0 Not tainted ------------------------------------------------------ kswapd0/111 is trying to acquire lock: ffff88801eae4610 (sb_internal#3){.+.+}-{0:0}, at: btrfs_commit_inode_delayed_inode+0x110/0x330 fs/btrfs/delayed-inode.c:1275 but task is already holding lock: ffffffff8dd3a9a0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0xa88/0x1970 mm/vmscan.c:6924 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 (fs_reclaim){+.+.}-{0:0}: __fs_reclaim_acquire mm/page_alloc.c:3783 [inline] fs_reclaim_acquire+0x102/0x160 mm/page_alloc.c:3797 might_alloc include/linux/sched/mm.h:334 [inline] slab_pre_alloc_hook mm/slub.c:3890 [inline] slab_alloc_node mm/slub.c:3980 [inline] kmem_cache_alloc_lru_noprof+0x58/0x2f0 mm/slub.c:4019 btrfs_alloc_inode+0x118/0xb20 fs/btrfs/inode.c:8411 alloc_inode+0x5d/0x230 fs/inode.c:261 iget5_locked fs/inode.c:1235 [inline] iget5_locked+0x1c9/0x2c0 fs/inode.c:1228 btrfs_iget_locked fs/btrfs/inode.c:5590 [inline] btrfs_iget_path fs/btrfs/inode.c:5607 [inline] btrfs_iget+0xfb/0x230 fs/btrfs/inode.c:5636 create_reloc_inode+0x403/0x820 fs/btrfs/relocation.c:3911 btrfs_relocate_block_group+0x471/0xe60 fs/btrfs/relocation.c:4114 btrfs_relocate_chunk+0x143/0x450 fs/btrfs/volumes.c:3373 __btrfs_balance fs/btrfs/volumes.c:4157 [inline] btrfs_balance+0x211a/0x3f00 fs/btrfs/volumes.c:4534 btrfs_ioctl_balance fs/btrfs/ioctl.c:3675 [inline] btrfs_ioctl+0x12ed/0x8290 fs/btrfs/ioctl.c:4742 __do_compat_sys_ioctl+0x2c3/0x330 fs/ioctl.c:1007 do_syscall_32_irqs_on arch/x86/entry/common.c:165 [inline] __do_fast_syscall_32+0x73/0x120 arch/x86/entry/common.c:386 do_fast_syscall_32+0x32/0x80 arch/x86/entry/common.c:411 entry_SYSENTER_compat_after_hwframe+0x84/0x8e -> #2 (btrfs_trans_num_extwriters){++++}-{0:0}: join_transaction+0x164/0xf40 fs/btrfs/transaction.c:315 start_transaction+0x427/0x1a70 fs/btrfs/transaction.c:700 btrfs_rebuild_free_space_tree+0xaa/0x480 fs/btrfs/free-space-tree.c:1323 btrfs_start_pre_rw_mount+0x218/0xf60 fs/btrfs/disk-io.c:2999 open_ctree+0x41ab/0x52e0 fs/btrfs/disk-io.c:3554 btrfs_fill_super fs/btrfs/super.c:946 [inline] btrfs_get_tree_super fs/btrfs/super.c:1863 [inline] btrfs_get_tree+0x11e9/0x1b90 fs/btrfs/super.c:2089 vfs_get_tree+0x8f/0x380 fs/super.c:1780 fc_mount+0x16/0xc0 fs/namespace.c:1125 btrfs_get_tree_subvol fs/btrfs/super.c:2052 [inline] btrfs_get_tree+0xa53/0x1b90 fs/btrfs/super.c:2090 vfs_get_tree+0x8f/0x380 fs/super.c:1780 do_new_mount fs/namespace.c:3352 [inline] path_mount+0x6e1/0x1f10 fs/namespace.c:3679 do_mount fs/namespace.c:3692 [inline] __do_sys_mount fs/namespace.c:3898 [inline] __se_sys_mount fs/namespace.c:3875 [inline] __ia32_sys_mount+0x295/0x320 fs/namespace.c:3875 do_syscall_32_irqs_on arch/x86/entry/common.c:165 [inline] __do_fast_syscall_32+0x73/0x120 arch/x86/entry/common.c:386 do_fast_syscall_32+0x32/0x80 arch/x86/entry/common.c:411 entry_SYSENTER_compat_after_hwframe+0x84/0x8e -> #1 (btrfs_trans_num_writers){++++}-{0:0}: join_transaction+0x148/0xf40 fs/btrfs/transaction.c:314 start_transaction+0x427/0x1a70 fs/btrfs/transaction.c:700 btrfs_rebuild_free_space_tree+0xaa/0x480 fs/btrfs/free-space-tree.c:1323 btrfs_start_pre_rw_mount+0x218/0xf60 fs/btrfs/disk-io.c:2999 open_ctree+0x41ab/0x52e0 fs/btrfs/disk-io.c:3554 btrfs_fill_super fs/btrfs/super.c:946 [inline] btrfs_get_tree_super fs/btrfs/super.c:1863 [inline] btrfs_get_tree+0x11e9/0x1b90 fs/btrfs/super.c:2089 vfs_get_tree+0x8f/0x380 fs/super.c:1780 fc_mount+0x16/0xc0 fs/namespace.c:1125 btrfs_get_tree_subvol fs/btrfs/super.c:2052 [inline] btrfs_get_tree+0xa53/0x1b90 fs/btrfs/super.c:2090 vfs_get_tree+0x8f/0x380 fs/super.c:1780 do_new_mount fs/namespace.c:3352 [inline] path_mount+0x6e1/0x1f10 fs/namespace.c:3679 do_mount fs/namespace.c:3692 [inline] __do_sys_mount fs/namespace.c:3898 [inline] __se_sys_mount fs/namespace.c:3875 [inline] __ia32_sys_mount+0x295/0x320 fs/namespace.c:3875 do_syscall_32_irqs_on arch/x86/entry/common.c:165 [inline] __do_fast_syscall_32+0x73/0x120 arch/x86/entry/common.c:386 do_fast_syscall_32+0x32/0x80 arch/x86/entry/common.c:411 entry_SYSENTER_compat_after_hwframe+0x84/0x8e -> #0 (sb_internal#3){.+.+}-{0:0}: check_prev_add kernel/locking/lockdep.c:3134 [inline] check_prevs_add kernel/locking/lockdep.c:3253 [inline] validate_chain kernel/locking/lockdep.c:3869 [inline] __lock_acquire+0x2478/0x3b30 kernel/locking/lockdep.c:5137 lock_acquire kernel/locking/lockdep.c:5754 [inline] lock_acquire+0x1b1/0x560 kernel/locking/lockdep.c:5719 percpu_down_read include/linux/percpu-rwsem.h:51 [inline] __sb_start_write include/linux/fs.h:1655 [inline] sb_start_intwrite include/linux/fs.h:1838 [inline] start_transaction+0xbc1/0x1a70 fs/btrfs/transaction.c:694 btrfs_commit_inode_delayed_inode+0x110/0x330 fs/btrfs/delayed-inode.c:1275 btrfs_evict_inode+0x960/0xe80 fs/btrfs/inode.c:5291 evict+0x2ed/0x6c0 fs/inode.c:667 iput_final fs/inode.c:1741 [inline] iput.part.0+0x5a8/0x7f0 fs/inode.c:1767 iput+0x5c/0x80 fs/inode.c:1757 btrfs_scan_root fs/btrfs/extent_map.c:1118 [inline] btrfs_free_extent_maps+0xbd3/0x1320 fs/btrfs/extent_map.c:1189 super_cache_scan+0x409/0x550 fs/super.c:227 do_shrink_slab+0x44f/0x11c0 mm/shrinker.c:435 shrink_slab+0x18a/0x1310 mm/shrinker.c:662 shrink_one+0x493/0x7c0 mm/vmscan.c:4790 shrink_many mm/vmscan.c:4851 [inline] lru_gen_shrink_node+0x89f/0x1750 mm/vmscan.c:4951 shrink_node mm/vmscan.c:5910 [inline] kswapd_shrink_node mm/vmscan.c:6720 [inline] balance_pgdat+0x1105/0x1970 mm/vmscan.c:6911 kswapd+0x5ea/0xbf0 mm/vmscan.c:7180 kthread+0x2c1/0x3a0 kernel/kthread.c:389 ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 other info that might help us debug this: Chain exists of: sb_internal#3 --> btrfs_trans_num_extwriters --> fs_reclaim Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(fs_reclaim); lock(btrfs_trans_num_extwriters); lock(fs_reclaim); rlock(sb_internal#3); * DEADLOCK * 2 locks held by kswapd0/111: #0: ffffffff8dd3a9a0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0xa88/0x1970 mm/vmscan.c:6924 #1: ffff88801eae40e0 (&type->s_umount_key#62){++++}-{3:3}, at: super_trylock_shared fs/super.c:562 [inline] #1: ffff88801eae40e0 (&type->s_umount_key#62){++++}-{3:3}, at: super_cache_scan+0x96/0x550 fs/super.c:196 stack backtrace: CPU: 0 PID: 111 Comm: kswapd0 Not tainted 6.10.0-rc2-syzkaller-00010-g2ab795141095 #0 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:114 check_noncircular+0x31a/0x400 kernel/locking/lockdep.c:2187 check_prev_add kernel/locking/lockdep.c:3134 [inline] check_prevs_add kernel/locking/lockdep.c:3253 [inline] validate_chain kernel/locking/lockdep.c:3869 [inline] __lock_acquire+0x2478/0x3b30 kernel/locking/lockdep.c:5137 lock_acquire kernel/locking/lockdep.c:5754 [inline] lock_acquire+0x1b1/0x560 kernel/locking/lockdep.c:5719 percpu_down_read include/linux/percpu-rwsem.h:51 [inline] __sb_start_write include/linux/fs.h:1655 [inline] sb_start_intwrite include/linux/fs.h:1838 [inline] start_transaction+0xbc1/0x1a70 fs/btrfs/transaction.c:694 btrfs_commit_inode_delayed_inode+0x110/0x330 fs/btrfs/delayed-inode.c:1275 btrfs_evict_inode+0x960/0xe80 fs/btrfs/inode.c:5291 evict+0x2ed/0x6c0 fs/inode.c:667 iput_final fs/inode.c:1741 [inline] iput.part.0+0x5a8/0x7f0 fs/inode.c:1767 iput+0x5c/0x80 fs/inode.c:1757 btrfs_scan_root fs/btrfs/extent_map.c:1118 [inline] btrfs_free_extent_maps+0xbd3/0x1320 fs/btrfs/extent_map.c:1189 super_cache_scan+0x409/0x550 fs/super.c:227 do_shrink_slab+0x44f/0x11c0 mm/shrinker.c:435 shrink_slab+0x18a/0x1310 mm/shrinker.c:662 shrink_one+0x493/0x7c0 mm/vmscan.c:4790 shrink_many mm/vmscan.c:4851 [inline] lru_gen_shrink_node+0x89f/0x1750 mm/vmscan.c:4951 shrink_node mm/vmscan.c:5910 [inline] kswapd_shrink_node mm/vmscan.c:6720 [inline] balance_pgdat+0x1105/0x1970 mm/vmscan.c:6911 kswapd+0x5ea/0xbf0 mm/vmscan.c:7180 kthread+0x2c1/0x3a0 kernel/kthread.c:389 ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 </TASK> So fix this by using btrfs_add_delayed_iput() so that the final iput is delegated to the cleaner kthread. Link: https://lore.kernel.org/linux-btrfs/000000000000892280061a344581@google.com/ Reported-by: syzbot+3dad89b3993a4b275e72@syzkaller.appspotmail.com Fixes: `956a17d9d0` ("btrfs: add a shrinker for extent maps") Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-07-11 16:45:18 +02:00

1 2 3 4 5 ...

92056 Commits