linux

iv/linux

Author	SHA1	Message	Date
Naohiro Aota	9634e5360b	btrfs: reinsert BGs failed to reclaim commit 7e27180994383b7c741ad87749db01e4989a02ba upstream. The reclaim process can temporarily fail. For example, if the space is getting tight, it fails to make the block group read-only. If there are no further writes on that block group, the block group will never get back to the reclaim list, and the BG never gets reclaimed. In a certain workload, we can leave many such block groups never reclaimed. So, let's get it back to the list and give it a chance to be reclaimed. Fixes: 18bb8bbf13c1 ("btrfs: zoned: automatically reclaim zones") CC: stable@vger.kernel.org # 5.15+ Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-07-19 16:22:13 +02:00
David Sterba	d9f1e518ab	btrfs: add block-group tree to lockdep classes commit 1a1b0e729d227f9f758f7b5f1c997e874e94156e upstream. The block group tree was not present among the lockdep classes. We could get potentially lockdep warnings but so far none has been seen, also because block-group-tree is a relatively new feature. CC: stable@vger.kernel.org # 6.1+ Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-07-19 16:22:13 +02:00
Naohiro Aota	3702c5342c	btrfs: bail out reclaim process if filesystem is read-only commit 93463ff7b54626f8276c0bd3d3f968fbf8d5d380 upstream. When a filesystem is read-only, we cannot reclaim a block group as it cannot rewrite the data. Just bail out in that case. Note that it can drop block groups in this case. As we did sb_start_write(), read-only filesystem means we got a fatal error and forced read-only. There is no chance to reclaim them again. Fixes: 18bb8bbf13c1 ("btrfs: zoned: automatically reclaim zones") CC: stable@vger.kernel.org # 5.15+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-07-19 16:22:13 +02:00
Naohiro Aota	8560861095	btrfs: delete unused BGs while reclaiming BGs commit 3ed01616bad6c7e3de196676b542ae3df8058592 upstream. The reclaiming process only starts after the filesystem volumes are allocated to a certain level (75% by default). Thus, the list of reclaiming target block groups can build up so huge at the time the reclaim process kicks in. On a test run, there were over 1000 BGs in the reclaim list. As the reclaim involves rewriting the data, it takes really long time to reclaim the BGs. While the reclaim is running, btrfs_delete_unused_bgs() won't proceed because the reclaim side is holding fs_info->reclaim_bgs_lock. As a result, we will have a large number of unused BGs kept in the unused list. On my test run, I got 1057 unused BGs. Since deleting a block group is relatively easy and fast work, we can call btrfs_delete_unused_bgs() while it reclaims BGs, to avoid building up unused BGs. Fixes: 18bb8bbf13c1 ("btrfs: zoned: automatically reclaim zones") CC: stable@vger.kernel.org # 5.15+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-07-19 16:22:13 +02:00
Matt Corallo	4fadf53fa9	btrfs: add handling for RAID1C23/DUP to btrfs_reduce_alloc_profile commit 160fe8f6fdb13da6111677be6263e5d65e875987 upstream. Callers of `btrfs_reduce_alloc_profile` expect it to return exactly one allocation profile flag, and failing to do so may ultimately result in a WARN_ON and remount-ro when allocating new blocks, like the below transaction abort on 6.1. `btrfs_reduce_alloc_profile` has two ways of determining the profile, first it checks if a conversion balance is currently running and uses the profile we're converting to. If no balance is currently running, it returns the max-redundancy profile which at least one block in the selected block group has. This works by simply checking each known allocation profile bit in redundancy order. However, `btrfs_reduce_alloc_profile` has not been updated as new flags have been added - first with the `DUP` profile and later with the RAID1C34 profiles. Because of the way it checks, if we have blocks with different profiles and at least one is known, that profile will be selected. However, if none are known we may return a flag set with multiple allocation profiles set. This is currently only possible when a balance from one of the three unhandled profiles to another of the unhandled profiles is canceled after allocating at least one block using the new profile. In that case, a transaction abort like the below will occur and the filesystem will need to be mounted with -o skip_balance to get it mounted rw again (but the balance cannot be resumed without a similar abort). [770.648] ------------[ cut here ]------------ [770.648] BTRFS: Transaction aborted (error -22) [770.648] WARNING: CPU: 43 PID: 1159593 at fs/btrfs/extent-tree.c:4122 find_free_extent+0x1d94/0x1e00 [btrfs] [770.648] CPU: 43 PID: 1159593 Comm: btrfs Tainted: G W 6.1.0-0.deb11.7-powerpc64le #1 Debian 6.1.20-2~bpo11+1a~test [770.648] Hardware name: T2P9D01 REV 1.00 POWER9 0x4e1202 opal:skiboot-bc106a0 PowerNV [770.648] NIP: c00800000f6784fc LR: c00800000f6784f8 CTR: c000000000d746c0 [770.648] REGS: c000200089afe9a0 TRAP: 0700 Tainted: G W (6.1.0-0.deb11.7-powerpc64le Debian 6.1.20-2~bpo11+1a~test) [770.648] MSR: 9000000002029033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE> CR: 28848282 XER: 20040000 [770.648] CFAR: c000000000135110 IRQMASK: 0 GPR00: c00800000f6784f8 c000200089afec40 c00800000f7ea800 0000000000000026 GPR04: 00000001004820c2 c000200089afea00 c000200089afe9f8 0000000000000027 GPR08: c000200ffbfe7f98 c000000002127f90 ffffffffffffffd8 0000000026d6a6e8 GPR12: 0000000028848282 c000200fff7f3800 5deadbeef0000122 c00000002269d000 GPR16: c0002008c7797c40 c000200089afef17 0000000000000000 0000000000000000 GPR20: 0000000000000000 0000000000000001 c000200008bc5a98 0000000000000001 GPR24: 0000000000000000 c0000003c73088d0 c000200089afef17 c000000016d3a800 GPR28: c0000003c7308800 c00000002269d000 ffffffffffffffea 0000000000000001 [770.648] NIP [c00800000f6784fc] find_free_extent+0x1d94/0x1e00 [btrfs] [770.648] LR [c00800000f6784f8] find_free_extent+0x1d90/0x1e00 [btrfs] [770.648] Call Trace: [770.648] [c000200089afec40] [c00800000f6784f8] find_free_extent+0x1d90/0x1e00 [btrfs] (unreliable) [770.648] [c000200089afed30] [c00800000f681398] btrfs_reserve_extent+0x1a0/0x2f0 [btrfs] [770.648] [c000200089afeea0] [c00800000f681bf0] btrfs_alloc_tree_block+0x108/0x670 [btrfs] [770.648] [c000200089afeff0] [c00800000f66bd68] __btrfs_cow_block+0x170/0x850 [btrfs] [770.648] [c000200089aff100] [c00800000f66c58c] btrfs_cow_block+0x144/0x288 [btrfs] [770.648] [c000200089aff1b0] [c00800000f67113c] btrfs_search_slot+0x6b4/0xcb0 [btrfs] [770.648] [c000200089aff2a0] [c00800000f679f60] lookup_inline_extent_backref+0x128/0x7c0 [btrfs] [770.648] [c000200089aff3b0] [c00800000f67b338] lookup_extent_backref+0x70/0x190 [btrfs] [770.648] [c000200089aff470] [c00800000f67b54c] __btrfs_free_extent+0xf4/0x1490 [btrfs] [770.648] [c000200089aff5a0] [c00800000f67d770] __btrfs_run_delayed_refs+0x328/0x1530 [btrfs] [770.648] [c000200089aff740] [c00800000f67ea2c] btrfs_run_delayed_refs+0xb4/0x3e0 [btrfs] [770.648] [c000200089aff800] [c00800000f699aa4] btrfs_commit_transaction+0x8c/0x12b0 [btrfs] [770.648] [c000200089aff8f0] [c00800000f6dc628] reset_balance_state+0x1c0/0x290 [btrfs] [770.648] [c000200089aff9a0] [c00800000f6e2f7c] btrfs_balance+0x1164/0x1500 [btrfs] [770.648] [c000200089affb40] [c00800000f6f8e4c] btrfs_ioctl+0x2b54/0x3100 [btrfs] [770.648] [c000200089affc80] [c00000000053be14] sys_ioctl+0x794/0x1310 [770.648] [c000200089affd70] [c00000000002af98] system_call_exception+0x138/0x250 [770.648] [c000200089affe10] [c00000000000c654] system_call_common+0xf4/0x258 [770.648] --- interrupt: c00 at 0x7fff94126800 [770.648] NIP: 00007fff94126800 LR: 0000000107e0b594 CTR: 0000000000000000 [770.648] REGS: c000200089affe80 TRAP: 0c00 Tainted: G W (6.1.0-0.deb11.7-powerpc64le Debian 6.1.20-2~bpo11+1a~test) [770.648] MSR: 900000000000d033 <SF,HV,EE,PR,ME,IR,DR,RI,LE> CR: 24002848 XER: 00000000 [770.648] IRQMASK: 0 GPR00: 0000000000000036 00007fffc9439da0 00007fff94217100 0000000000000003 GPR04: 00000000c4009420 00007fffc9439ee8 0000000000000000 0000000000000000 GPR08: 00000000803c7416 0000000000000000 0000000000000000 0000000000000000 GPR12: 0000000000000000 00007fff9467d120 0000000107e64c9c 0000000107e64d0a GPR16: 0000000107e64d06 0000000107e64cf1 0000000107e64cc4 0000000107e64c73 GPR20: 0000000107e64c31 0000000107e64bf1 0000000107e64be7 0000000000000000 GPR24: 0000000000000000 00007fffc9439ee0 0000000000000003 0000000000000001 GPR28: 00007fffc943f713 0000000000000000 00007fffc9439ee8 0000000000000000 [770.648] NIP [00007fff94126800] 0x7fff94126800 [770.648] LR [0000000107e0b594] 0x107e0b594 [770.648] --- interrupt: c00 [770.648] Instruction dump: [770.648] 3b00ffe4 e8898828 481175f5 60000000 4bfff4fc 3be00000 4bfff570 3d220000 [770.648] 7fc4f378 e8698830 4811cd95 e8410018 <0fe00000> f9c10060 f9e10068 fa010070 [770.648] ---[ end trace 0000000000000000 ]--- [770.648] BTRFS: error (device dm-2: state A) in find_free_extent_update_loop:4122: errno=-22 unknown [770.648] BTRFS info (device dm-2: state EA): forced readonly [770.648] BTRFS: error (device dm-2: state EA) in __btrfs_free_extent:3070: errno=-22 unknown [770.648] BTRFS error (device dm-2: state EA): failed to run delayed ref for logical 17838685708288 num_bytes 24576 type 184 action 2 ref_mod 1: -22 [770.648] BTRFS: error (device dm-2: state EA) in btrfs_run_delayed_refs:2144: errno=-22 unknown [770.648] BTRFS: error (device dm-2: state EA) in reset_balance_state:3599: errno=-22 unknown Fixes: 47e6f7423b91 ("btrfs: add support for 3-copy replication (raid1c3)") Fixes: 8d6fac0087e5 ("btrfs: add support for 4-copy replication (raid1c4)") CC: stable@vger.kernel.org # 5.10+ Signed-off-by: Matt Corallo <blnxfsl@bluematt.me> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-07-19 16:22:13 +02:00
Jan Kara	f40d621387	fs: Lock moved directories commit 28eceeda130f5058074dd007d9c59d2e8bc5af2e upstream. When a directory is moved to a different directory, some filesystems (udf, ext4, ocfs2, f2fs, and likely gfs2, reiserfs, and others) need to update their pointer to the parent and this must not race with other operations on the directory. Lock the directories when they are moved. Although not all filesystems need this locking, we perform it in vfs_rename() because getting the lock ordering right is really difficult and we don't want to expose these locking details to filesystems. CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230601105830.13168-5-jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-07-19 16:22:12 +02:00
Jan Kara	10c159f994	fs: Establish locking order for unrelated directories commit f23ce757185319886ca80c4864ce5f81ac6cc9e9 upstream. Currently the locking order of inode locks for directories that are not in ancestor relationship is not defined because all operations that needed to lock two directories like this were serialized by sb->s_vfs_rename_mutex. However some filesystems need to lock two subdirectories for RENAME_EXCHANGE operations and for this we need the locking order established even for two tree-unrelated directories. Provide a helper function lock_two_inodes() that establishes lock ordering for any two inodes and use it in lock_two_directories(). CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230601105830.13168-4-jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-07-19 16:22:12 +02:00
Jan Kara	6654d2a165	Revert "f2fs: fix potential corruption when moving a directory" commit cde3c9d7e2a359e337216855dcb333a19daaa436 upstream. This reverts commit d94772154e524b329a168678836745d2773a6e02. The locking is going to be provided by VFS. CC: Jaegeuk Kim <jaegeuk@kernel.org> CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230601105830.13168-3-jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-07-19 16:22:12 +02:00
Jan Kara	6aaa22ec73	ext4: Remove ext4 locking of moved directory commit 3658840cd363f2be094f5dfd2f0b174a9055dd0f upstream. Remove locking of moved directory in ext4_rename2(). We will take care of it in VFS instead. This effectively reverts commit 0813299c586b ("ext4: Fix possible corruption when moving a directory") and followup fixes. CC: Ted Tso <tytso@mit.edu> CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230601105830.13168-1-jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-07-19 16:22:12 +02:00
Thomas Weißschuh	606e463eef	fs: avoid empty option when generating legacy mount string commit 62176420274db5b5127cd7a0083a9aeb461756ee upstream. As each option string fragment is always prepended with a comma it would happen that the whole string always starts with a comma. This could be interpreted by filesystem drivers as an empty option and may produce errors. For example the NTFS driver from ntfs.ko behaves like this and fails when mounted via the new API. Link: https://github.com/util-linux/util-linux/issues/2298 Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Fixes: 3e1aeb00e6d1 ("vfs: Implement a filesystem superblock creation/configuration context") Cc: stable@vger.kernel.org Message-Id: <20230607-fs-empty-option-v1-1-20c8dbf4671b@weissschuh.net> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-07-19 16:22:11 +02:00
Fabian Frederick	6df680709d	jffs2: reduce stack usage in jffs2_build_xattr_subsystem() commit 1168f095417643f663caa341211e117db552989f upstream. Use kcalloc() for allocation/flush of 128 pointers table to reduce stack usage. Function now returns -ENOMEM or 0 on success. stackusage Before: ./fs/jffs2/xattr.c:775 jffs2_build_xattr_subsystem 1208 dynamic,bounded After: ./fs/jffs2/xattr.c:775 jffs2_build_xattr_subsystem 192 dynamic,bounded Also update definition when CONFIG_JFFS2_FS_XATTR is not enabled Tested with an MTD mount point and some user set/getfattr. Many current target on OpenWRT also suffer from a compilation warning (that become an error with CONFIG_WERROR) with the following output: fs/jffs2/xattr.c: In function 'jffs2_build_xattr_subsystem': fs/jffs2/xattr.c:887:1: error: the frame size of 1088 bytes is larger than 1024 bytes [-Werror=frame-larger-than=] 887 \| } \| ^ Using dynamic allocation fix this compilation warning. Fixes: c9f700f840bd ("[JFFS2][XATTR] using 'delete marker' for xdatum/xref deletion") Reported-by: Tim Gardner <tim.gardner@canonical.com> Reported-by: kernel test robot <lkp@intel.com> Reported-by: Ron Economos <re@w6rz.net> Reported-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Cc: stable@vger.kernel.org Message-Id: <20230506045612.16616-1-ansuelsmth@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-07-19 16:22:11 +02:00
Roberto Sassu	1f34bf8b44	shmem: use ramfs_kill_sb() for kill_sb method of ramfs-based tmpfs commit 36ce9d76b0a93bae799e27e4f5ac35478c676592 upstream. As the ramfs-based tmpfs uses ramfs_init_fs_context() for the init_fs_context method, which allocates fc->s_fs_info, use ramfs_kill_sb() to free it and avoid a memory leak. Link: https://lkml.kernel.org/r/20230607161523.2876433-1-roberto.sassu@huaweicloud.com Fixes: c3b1b1cbf002 ("ramfs: add support for "mode=" mount option") Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Cc: Hugh Dickins <hughd@google.com> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-07-19 16:22:11 +02:00
Dai Ngo	2b4e43b5ad	NFSD: add encoding of op_recall flag for write delegation commit 58f5d894006d82ed7335e1c37182fbc5f08c2f51 upstream. Modified nfsd4_encode_open to encode the op_recall flag properly for OPEN result with write delegation granted. Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-07-19 16:22:08 +02:00
Filipe Manana	b990e37603	btrfs: do not BUG_ON() on tree mod log failure at balance_level() [ Upstream commit 39020d8abc7ec62c4de9b260e3d10d4a1c2478ce ] At balance_level(), instead of doing a BUG_ON() in case we fail to record tree mod log operations, do a transaction abort and return the error to the callers. There's really no need for the BUG_ON() as we can release all resources in this context, and we have to abort because other future tree searches that use the tree mod log (btrfs_search_old_slot()) may get inconsistent results if other operations modify the tree after that failure and before the tree mod log based search. CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-07-19 16:22:08 +02:00
David Howells	f5ea303502	afs: Fix accidental truncation when storing data [ Upstream commit 03275585cabd0240944f19f33d7584a1b099a3a8 ] When an AFS FS.StoreData RPC call is made, amongst other things it is given the resultant file size to be. On the server, this is processed by truncating the file to new size and then writing the data. Now, kafs has a lock (vnode->io_lock) that serves to serialise operations against a specific vnode (ie. inode), but the parameters for the op are set before the lock is taken. This allows two writebacks (say sync and kswapd) to race - and if writes are ongoing the writeback for a later write could occur before the writeback for an earlier one if the latter gets interrupted. Note that afs_writepages() cannot take i_mutex and only takes a shared lock on vnode->validate_lock. Also note that the server does the truncation and the write inside a lock, so there's no problem at that end. Fix this by moving the calculation for the proposed new i_size inside the vnode->io_lock. Also reset the iterator (which we might have read from) and update the mtime setting there. Fixes: bd80d8a80e12 ("afs: Use ITER_XARRAY for writing") Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeffrey Altman <jaltman@auristor.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/3526895.1687960024@warthog.procyon.org.uk/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-07-19 16:22:06 +02:00
Amir Goldstein	78c6cf1dc7	fanotify: disallow mount/sb marks on kernel internal pseudo fs [ Upstream commit 69562eb0bd3e6bb8e522a7b254334e0fb30dff0c ] Hopefully, nobody is trying to abuse mount/sb marks for watching all anonymous pipes/inodes. I cannot think of a good reason to allow this - it looks like an oversight that dated back to the original fanotify API. Link: https://lore.kernel.org/linux-fsdevel/20230628101132.kvchg544mczxv2pm@quack3/ Fixes: 0ff21db9fcc3 ("fanotify: hooks the fanotify_mark syscall to the vfsmount code") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230629042044.25723-1-amir73il@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-07-19 16:22:05 +02:00
Zeng Heng	c86a2517df	ntfs: Fix panic about slab-out-of-bounds caused by ntfs_listxattr() [ Upstream commit 3c675ddffb17a8b1e32efad5c983254af18b12c2 ] Here is a BUG report from syzbot: BUG: KASAN: slab-out-of-bounds in ntfs_list_ea fs/ntfs3/xattr.c:191 [inline] BUG: KASAN: slab-out-of-bounds in ntfs_listxattr+0x401/0x570 fs/ntfs3/xattr.c:710 Read of size 1 at addr ffff888021acaf3d by task syz-executor128/3632 Call Trace: ntfs_list_ea fs/ntfs3/xattr.c:191 [inline] ntfs_listxattr+0x401/0x570 fs/ntfs3/xattr.c:710 vfs_listxattr fs/xattr.c:457 [inline] listxattr+0x293/0x2d0 fs/xattr.c:804 Fix the logic of ea_all iteration. When the ea->name_len is 0, return immediately, or Add2Ptr() would visit invalid memory in the next loop. Fixes: be71b5cba2e6 ("fs/ntfs3: Add attrib operations") Reported-by: syzbot+9fcea5ef6dc4dc72d334@syzkaller.appspotmail.com Signed-off-by: Zeng Heng <zengheng4@huawei.com> [almaz.alexandrovich@paragon-software.com: lines of the patch have changed] Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-07-19 16:22:04 +02:00
Chao Yu	c2c5c6d2c4	f2fs: fix error path handling in truncate_dnode() [ Upstream commit 0135c482fa97e2fd8245cb462784112a00ed1211 ] If truncate_node() fails in truncate_dnode(), it missed to call f2fs_put_page(), fix it. Fixes: 7735730d39d7 ("f2fs: fix to propagate error from __get_meta_page()") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-07-19 16:22:03 +02:00
Chao Yu	0623f13959	f2fs: check return value of freeze_super() [ Upstream commit 8bec7dd1b3f7d7769d433d67bde404de948a2d95 ] freeze_super() can fail, it needs to check its return value and do error handling in f2fs_resize_fs(). Fixes: 04f0b2eaa3b3 ("f2fs: ioctl for removing a range from F2FS") Fixes: b4b10061ef98 ("f2fs: refactor resize_fs to avoid meta updates in progress") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-07-19 16:22:00 +02:00
Chao Yu	ebe83e9bb8	f2fs: fix to avoid NULL pointer dereference f2fs_write_end_io() [ Upstream commit d8189834d4348ae608083e1f1f53792cfcc2a9bc ] butt3rflyh4ck reports a bug as below: When a thread always calls F2FS_IOC_RESIZE_FS to resize fs, if resize fs is failed, f2fs kernel thread would invoke callback function to update f2fs io info, it would call f2fs_write_end_io and may trigger null-ptr-deref in NODE_MAPPING. general protection fault, probably for non-canonical address KASAN: null-ptr-deref in range [0x0000000000000030-0x0000000000000037] RIP: 0010:NODE_MAPPING fs/f2fs/f2fs.h:1972 [inline] RIP: 0010:f2fs_write_end_io+0x727/0x1050 fs/f2fs/data.c:370 <TASK> bio_endio+0x5af/0x6c0 block/bio.c:1608 req_bio_endio block/blk-mq.c:761 [inline] blk_update_request+0x5cc/0x1690 block/blk-mq.c:906 blk_mq_end_request+0x59/0x4c0 block/blk-mq.c:1023 lo_complete_rq+0x1c6/0x280 drivers/block/loop.c:370 blk_complete_reqs+0xad/0xe0 block/blk-mq.c:1101 __do_softirq+0x1d4/0x8ef kernel/softirq.c:571 run_ksoftirqd kernel/softirq.c:939 [inline] run_ksoftirqd+0x31/0x60 kernel/softirq.c:931 smpboot_thread_fn+0x659/0x9e0 kernel/smpboot.c:164 kthread+0x33e/0x440 kernel/kthread.c:379 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308 The root cause is below race case can cause leaving dirty metadata in f2fs after filesystem is remount as ro: Thread A Thread B - f2fs_ioc_resize_fs - f2fs_readonly --- return false - f2fs_resize_fs - f2fs_remount - write_checkpoint - set f2fs as ro - free_segment_range - update meta_inode's data Then, if f2fs_put_super() fails to write_checkpoint due to readonly status, and meta_inode's dirty data will be writebacked after node_inode is put, finally, f2fs_write_end_io will access NULL pointer on sbi->node_inode. Thread A IRQ context - f2fs_put_super - write_checkpoint fails - iput(node_inode) - node_inode = NULL - iput(meta_inode) - write_inode_now - f2fs_write_meta_page - f2fs_write_end_io - NODE_MAPPING(sbi) : access NULL pointer on node_inode Fixes: b4b10061ef98 ("f2fs: refactor resize_fs to avoid meta updates in progress") Reported-by: butt3rflyh4ck <butterflyhuangxx@gmail.com> Closes: https://lore.kernel.org/r/1684480657-2375-1-git-send-email-yangtiezhu@loongson.cn Tested-by: butt3rflyh4ck <butterflyhuangxx@gmail.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-07-19 16:21:55 +02:00
Chao Yu	15c073e752	f2fs: fix potential deadlock due to unpaired node_write lock use [ Upstream commit f082c6b205a06953f26c40bdc7621cc5a58ceb7c ] If S_NOQUOTA is cleared from inode during data page writeback of quota file, it may miss to unlock node_write lock, result in potential deadlock, fix to use the lock in paired. Kworker Thread - writepage if (IS_NOQUOTA()) f2fs_down_read(&sbi->node_write); - vfs_cleanup_quota_inode - inode->i_flags &= ~S_NOQUOTA; if (IS_NOQUOTA()) f2fs_up_read(&sbi->node_write); Fixes: 79963d967b49 ("f2fs: shrink node_write lock coverage") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-07-19 16:21:55 +02:00
Bob Peterson	2e980eb955	gfs2: Fix duplicate should_fault_in_pages() call [ Upstream commit c8ed1b35931245087968fd95b2ec3dfc50f77769 ] In gfs2_file_buffered_write(), we currently jump from the second call of function should_fault_in_pages() to above the first call, so should_fault_in_pages() is getting called twice in a row, causing it to accidentally fall back to single-page writes rather than trying the more efficient multi-page writes first. Fix that by moving the retry label to the correct place, behind the first call to should_fault_in_pages(). Fixes: e1fa9ea85ce8 ("gfs2: Stop using glock holder auto-demotion for now") Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-07-19 16:21:54 +02:00
Muchun Song	f5d80ad7b6	kernfs: fix missing kernfs_idr_lock to remove an ID from the IDR [ Upstream commit 30480b988f88c279752f3202a26b6fee5f586aef ] The root->ino_idr is supposed to be protected by kernfs_idr_lock, fix it. Fixes: 488dee96bb62 ("kernfs: allow creating kernfs objects with arbitrary uid/gid") Signed-off-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20230523024017.24851-1-songmuchun@bytedance.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-07-19 16:21:53 +02:00
Yangtao Li	d6dd477436	f2fs: do not allow to defragment files have FI_COMPRESS_RELEASED [ Upstream commit 7cd2e5f75b86a1befa99834f3ed1d735eeff69e6 ] If a file has FI_COMPRESS_RELEASED, all writes for it should not be allowed. Fixes: 5fdb322ff2c2 ("f2fs: add F2FS_IOC_DECOMPRESS_FILE and F2FS_IOC_COMPRESS_FILE") Signed-off-by: Qi Han <hanqi@vivo.com> Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-07-19 16:21:48 +02:00
Filipe Manana	6f1c81886b	btrfs: fix race when deleting free space root from the dirty cow roots list commit babebf023e661b90b1c78b2baa384fb03a226879 upstream. When deleting the free space tree we are deleting the free space root from the list fs_info->dirty_cowonly_roots without taking the lock that protects it, which is struct btrfs_fs_info::trans_lock. This unsynchronized list manipulation may cause chaos if there's another concurrent manipulation of this list, such as when adding a root to it with ctree.c:add_root_to_dirty_list(). This can result in all sorts of weird failures caused by a race, such as the following crash: [337571.278245] general protection fault, probably for non-canonical address 0xdead000000000108: 0000 [#1] PREEMPT SMP PTI [337571.278933] CPU: 1 PID: 115447 Comm: btrfs Tainted: G W 6.4.0-rc6-btrfs-next-134+ #1 [337571.279153] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [337571.279572] RIP: 0010:commit_cowonly_roots+0x11f/0x250 [btrfs] [337571.279928] Code: 85 38 06 00 (...) [337571.280363] RSP: 0018:ffff9f63446efba0 EFLAGS: 00010206 [337571.280582] RAX: ffff942d98ec2638 RBX: ffff9430b82b4c30 RCX: 0000000449e1c000 [337571.280798] RDX: dead000000000100 RSI: ffff9430021e4900 RDI: 0000000000036070 [337571.281015] RBP: ffff942d98ec2000 R08: ffff942d98ec2000 R09: 000000000000015b [337571.281254] R10: 0000000000000009 R11: 0000000000000001 R12: ffff942fe8fbf600 [337571.281476] R13: ffff942dabe23040 R14: ffff942dabe20800 R15: ffff942d92cf3b48 [337571.281723] FS: 00007f478adb7340(0000) GS:ffff94349fa40000(0000) knlGS:0000000000000000 [337571.281950] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [337571.282184] CR2: 00007f478ab9a3d5 CR3: 000000001e02c001 CR4: 0000000000370ee0 [337571.282416] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [337571.282647] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [337571.282874] Call Trace: [337571.283101] <TASK> [337571.283327] ? __die_body+0x1b/0x60 [337571.283570] ? die_addr+0x39/0x60 [337571.283796] ? exc_general_protection+0x22e/0x430 [337571.284022] ? asm_exc_general_protection+0x22/0x30 [337571.284251] ? commit_cowonly_roots+0x11f/0x250 [btrfs] [337571.284531] btrfs_commit_transaction+0x42e/0xf90 [btrfs] [337571.284803] ? _raw_spin_unlock+0x15/0x30 [337571.285031] ? release_extent_buffer+0x103/0x130 [btrfs] [337571.285305] reset_balance_state+0x152/0x1b0 [btrfs] [337571.285578] btrfs_balance+0xa50/0x11e0 [btrfs] [337571.285864] ? __kmem_cache_alloc_node+0x14a/0x410 [337571.286086] btrfs_ioctl+0x249a/0x3320 [btrfs] [337571.286358] ? mod_objcg_state+0xd2/0x360 [337571.286577] ? refill_obj_stock+0xb0/0x160 [337571.286798] ? seq_release+0x25/0x30 [337571.287016] ? __rseq_handle_notify_resume+0x3ba/0x4b0 [337571.287235] ? percpu_counter_add_batch+0x2e/0xa0 [337571.287455] ? __x64_sys_ioctl+0x88/0xc0 [337571.287675] __x64_sys_ioctl+0x88/0xc0 [337571.287901] do_syscall_64+0x38/0x90 [337571.288126] entry_SYSCALL_64_after_hwframe+0x72/0xdc [337571.288352] RIP: 0033:0x7f478aaffe9b So fix this by locking struct btrfs_fs_info::trans_lock before deleting the free space root from that list. Fixes: a5ed91828518 ("Btrfs: implement the free space B-tree") CC: stable@vger.kernel.org # 4.14+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-07-19 16:21:47 +02:00
Arnd Bergmann	be54803be8	ksmbd: avoid field overflow warning [ Upstream commit 9cedc58bdbe9fff9aacd0ca19ee5777659f28fd7 ] clang warns about a possible field overflow in a memcpy: In file included from fs/smb/server/smb_common.c:7: include/linux/fortify-string.h:583:4: error: call to '__write_overflow_field' declared with 'warning' attribute: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Werror,-Wattribute-warning] __write_overflow_field(p_size_field, size); It appears to interpret the "&out[baselen + 4]" as referring to a single byte of the character array, while the equivalen "out + baselen + 4" is seen as an offset into the array. I don't see that kind of warning elsewhere, so just go with the simple rework. Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-07-19 16:21:44 +02:00
Paulo Alcantara	babaab6ef6	smb: client: fix broken file attrs with nodfs mounts [ Upstream commit d439b29057e26464120fc6c18f97433aa003b5fe ] *_get_inode_info() functions expect -EREMOTE when query path info calls find a DFS link, regardless whether !CONFIG_CIFS_DFS_UPCALL or 'nodfs' mount option. Otherwise, those files will miss the fake DFS file attributes. Before patch $ mount.cifs //srv/dfs /mnt/1 -o ...,nodfs $ ls -l /mnt/1 ls: cannot access '/mnt/1/link': Operation not supported total 0 -rwxr-xr-x 1 root root 0 Jul 26 2022 dfstest2_file1.txt drwxr-xr-x 2 root root 0 Aug 8 2022 dir1 d????????? ? ? ? ? ? link After patch $ mount.cifs //srv/dfs /mnt/1 -o ...,nodfs $ ls -l /mnt/1 total 0 -rwxr-xr-x 1 root root 0 Jul 26 2022 dfstest2_file1.txt drwxr-xr-x 2 root root 0 Aug 8 2022 dir1 drwx--x--x 2 root root 0 Jun 26 20:29 link Fixes: c877ce47e137 ("cifs: reduce roundtrips on create/qinfo requests") Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-07-19 16:21:44 +02:00
Shyam Prasad N	9fb981a86a	cifs: do all necessary checks for credits within or before locking [ Upstream commit 326a8d04f147e2bf393f6f9cdb74126ee6900607 ] All the server credits and in-flight info is protected by req_lock. Once the req_lock is held, and we've determined that we have enough credits to continue, this lock cannot be dropped till we've made the changes to credits and in-flight count. However, we used to drop the lock in order to avoid deadlock with the recent srv_lock. This could cause the checks already made to be invalidated. Fixed it by moving the server status check to before locking req_lock. Fixes: d7d7a66aacd6 ("cifs: avoid use of global locks for high contention data") Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-07-19 16:21:44 +02:00
Shyam Prasad N	4fe07d55a5	cifs: prevent use-after-free by freeing the cfile later [ Upstream commit 33f736187d08f6bc822117629f263b97d3df4165 ] In smb2_compound_op we have a possible use-after-free which can cause hard to debug problems later on. This was revealed during stress testing with KASAN enabled kernel. Fixing it by moving the cfile free call to a few lines below, after the usage. Fixes: 76894f3e2f71 ("cifs: improve symlink handling for smb2+") Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-07-19 16:21:44 +02:00
Bharath SM	1bf709b962	SMB3: Do not send lease break acknowledgment if all file handles have been closed [ Upstream commit da787d5b74983f7525d1eb4b9c0b4aff2821511a ] In case if all existing file handles are deferred handles and if all of them gets closed due to handle lease break then we dont need to send lease break acknowledgment to server, because last handle close will be considered as lease break ack. After closing deferred handels, we check for openfile list of inode, if its empty then we skip sending lease break ack. Fixes: 59a556aebc43 ("SMB3: drop reference to cfile before sending oplock break") Reviewed-by: Tom Talpey <tom@talpey.com> Signed-off-by: Bharath SM <bharathsm@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-07-19 16:21:43 +02:00
Olga Kornievskaia	c2bf8d7b8f	NFSv4.1: freeze the session table upon receiving NFS4ERR_BADSESSION [ Upstream commit c907e72f58ed979a24a9fdcadfbc447c51d5e509 ] When the client received NFS4ERR_BADSESSION, it schedules recovery and start the state manager thread which in turn freezes the session table and does not allow for any new requests to use the no-longer valid session. However, it is possible that before the state manager thread runs, a new operation would use the released slot that received BADSESSION and was therefore not updated its sequence number. Such re-use of the slot can lead the application errors. Fixes: 5c441544f045 ("NFSv4.x: Handle bad/dead sessions correctly in nfs41_sequence_process()") Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-07-19 16:21:43 +02:00
Qi Zheng	7053178436	NFSv4.2: fix wrong shrinker_id [ Upstream commit 7f7ab336898f281e58540ef781a8fb375acc32a9 ] Currently, the list_lru::shrinker_id corresponding to the nfs4_xattr shrinkers is wrong: >>> prog["nfs4_xattr_cache_lru"].shrinker_id (int)0 >>> prog["nfs4_xattr_entry_lru"].shrinker_id (int)0 >>> prog["nfs4_xattr_large_entry_lru"].shrinker_id (int)0 >>> prog["nfs4_xattr_cache_shrinker"].id (int)18 >>> prog["nfs4_xattr_entry_shrinker"].id (int)19 >>> prog["nfs4_xattr_large_entry_shrinker"].id (int)20 This is not what we expect, which will cause these shrinkers not to be found in shrink_slab_memcg(). We should assign shrinker::id before calling list_lru_init_memcg(), so that the corresponding list_lru::shrinker_id will be assigned the correct value like below: >>> prog["nfs4_xattr_cache_lru"].shrinker_id (int)16 >>> prog["nfs4_xattr_entry_lru"].shrinker_id (int)17 >>> prog["nfs4_xattr_large_entry_lru"].shrinker_id (int)18 >>> prog["nfs4_xattr_cache_shrinker"].id (int)16 >>> prog["nfs4_xattr_entry_shrinker"].id (int)17 >>> prog["nfs4_xattr_large_entry_shrinker"].id (int)18 So just do it. Fixes: 95ad37f90c33 ("NFSv4.2: add client side xattr caching.") Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-07-19 16:21:43 +02:00
Amir Goldstein	e4f2a1feeb	ovl: update of dentry revalidate flags after copy up [ Upstream commit b07d5cc93e1b28df47a72c519d09d0a836043613 ] After copy up, we may need to update d_flags if upper dentry is on a remote fs and lower dentries are not. Add helpers to allow incremental update of the revalidate flags. Fixes: bccece1ead36 ("ovl: allow remote upper") Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-07-19 16:21:33 +02:00
David Howells	94817712b5	ocfs2: Fix use of slab data with sendpage [ Upstream commit 86d7bd6e66e9925f0f04a7bcf3c92c05fdfefb5a ] ocfs2 uses kzalloc() to allocate buffers for o2net_hand, o2net_keep_req and o2net_keep_resp and then passes these to sendpage. This isn't really allowed as the lifetime of slab objects is not controlled by page ref - though in this case it will probably work. sendmsg() with MSG_SPLICE_PAGES will, however, print a warning and give an error. Fix it to use folio_alloc() instead to allocate a buffer for the handshake message, keepalive request and reply messages. Fixes: 98211489d414 ("[PATCH] OCFS2: The Second Oracle Cluster Filesystem") Signed-off-by: David Howells <dhowells@redhat.com> cc: Mark Fasheh <mark@fasheh.com> cc: Kurt Hackel <kurt.hackel@oracle.com> cc: Joel Becker <jlbec@evilplan.org> cc: Joseph Qi <joseph.qi@linux.alibaba.com> cc: ocfs2-devel@oss.oracle.com Link: https://lore.kernel.org/r/20230623225513.2732256-14-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-07-19 16:21:13 +02:00
Jiasheng Jiang	f57ba91a46	pstore/ram: Add check for kstrdup [ Upstream commit d97038d5ec2062733c1e016caf9baaf68cf64ea1 ] Add check for the return value of kstrdup() and return the error if it fails in order to avoid NULL pointer dereference. Fixes: e163fdb3f7f8 ("pstore/ram: Regularize prz label allocation lifetime") Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230614093733.36048-1-jiasheng@iscas.ac.cn Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-07-19 16:21:03 +02:00
Gao Xiang	9a53410038	erofs: fix compact 4B support for 16k block size [ Upstream commit 001b8ccd0650727e54ec16ef72bf1b8eeab7168e ] In compact 4B, two adjacent lclusters are packed together as a unit to form on-disk indexes for effective random access, as below: (amortized = 4, vcnt = 2) _____________________________________________ \|___@_____ encoded bits __________\|_ blkaddr _\| 0 . amortized * vcnt = 8 . . . . amortized * vcnt - 4 = 4 . . .____________________________. \|_type (2 bits)_\|_clusterofs_\| Therefore, encoded bits for each pack are 32 bits (4 bytes). IOWs, since each lcluster can get 16 bits for its type and clusterofs, the maximum supported lclustersize for compact 4B format is 16k (14 bits). Fix this to enable compact 4B format for 16k lclusters (blocks), which is tested on an arm64 server with 16k page size. Fixes: 152a333a5895 ("staging: erofs: add compacted compression indexes support") Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20230601112341.56960-1-hsiangkao@linux.alibaba.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-07-19 16:20:59 +02:00
Gao Xiang	ec94df6bcf	erofs: simplify iloc() [ Upstream commit b780d3fc6107464dcc43631a6208c43b6421f1e6 ] Actually we could pass in inodes directly to clean up all callers. Also rename iloc() as erofs_iloc(). Link: https://lore.kernel.org/r/20230114150823.432069-1-xiang@kernel.org Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Stable-dep-of: 001b8ccd0650 ("erofs: fix compact 4B support for 16k block size") Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-07-19 16:20:59 +02:00
NeilBrown	ce16368280	lockd: drop inappropriate svc_get() from locked_get() [ Upstream commit 665e89ab7c5af1f2d260834c861a74b01a30f95f ] The below-mentioned patch was intended to simplify refcounting on the svc_serv used by locked. The goal was to only ever have a single reference from the single thread. To that end we dropped a call to lockd_start_svc() (except when creating thread) which would take a reference, and dropped the svc_put(serv) that would drop that reference. Unfortunately we didn't also remove the svc_get() from lockd_create_svc() in the case where the svc_serv already existed. So after the patch: - on the first call the svc_serv was allocated and the one reference was given to the thread, so there are no extra references - on subsequent calls svc_get() was called so there is now an extra reference. This is clearly not consistent. The inconsistency is also clear in the current code in lockd_get() takes two references, one on nlmsvc_serv and one by incrementing nlmsvc_users. This clearly does not match lockd_put(). So: drop that svc_get() from lockd_get() (which used to be in lockd_create_svc(). Reported-by: Ido Schimmel <idosch@idosch.org> Closes: https://lore.kernel.org/linux-nfs/ZHsI%2FH16VX9kJQX1@shredder/T/#u Fixes: b73a2972041b ("lockd: move lockd_start_svc() call into lockd_create_svc()") Signed-off-by: NeilBrown <neilb@suse.de> Tested-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-07-19 16:20:56 +02:00
Gao Xiang	d3b39ea248	erofs: kill hooked chains to avoid loops on deduplicated compressed images [ Upstream commit 967c28b23f6c89bb8eef6a046ea88afe0d7c1029 ] After heavily stressing EROFS with several images which include a hand-crafted image of repeated patterns for more than 46 days, I found two chains could be linked with each other almost simultaneously and form a loop so that the entire loop won't be submitted. As a consequence, the corresponding file pages will remain locked forever. It can be _only_ observed on data-deduplicated compressed images. For example, consider two chains with five pclusters in total: Chain 1: 2->3->4->5 -- The tail pcluster is 5; Chain 2: 5->1->2 -- The tail pcluster is 2. Chain 2 could link to Chain 1 with pcluster 5; and Chain 1 could link to Chain 2 at the same time with pcluster 2. Since hooked chains are all linked locklessly now, I have no idea how to simply avoid the race. Instead, let's avoid hooked chains completely until I could work out a proper way to fix this and end users finally tell us that it's needed to add it back. Actually, this optimization can be found with multi-threaded workloads (especially even more often on deduplicated compressed images), yet I'm not sure about the overall system impacts of not having this compared with implementation complexity. Fixes: 267f2492c8f7 ("erofs: introduce multi-reference pclusters (fully-referenced)") Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Yue Hu <huyue2@coolpad.com> Link: https://lore.kernel.org/r/20230526201459.128169-4-hsiangkao@linux.alibaba.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-07-19 16:20:55 +02:00
Gao Xiang	daed10290b	erofs: move zdata.h into zdata.c [ Upstream commit a9a94d9373349e1a53f149d2015eb6f03a8517cf ] Definitions in zdata.h are only used in zdata.c and for internal use only. No logic changes. Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20230204093040.97967-4-hsiangkao@linux.alibaba.com Stable-dep-of: 967c28b23f6c ("erofs: kill hooked chains to avoid loops on deduplicated compressed images") Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-07-19 16:20:54 +02:00
Gao Xiang	041ff2c21b	erofs: remove tagged pointer helpers [ Upstream commit b1ed220c6262bff63cdcb53692e492be0b05206c ] Just open-code the remaining one to simplify the code. Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20230204093040.97967-3-hsiangkao@linux.alibaba.com Stable-dep-of: 967c28b23f6c ("erofs: kill hooked chains to avoid loops on deduplicated compressed images") Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-07-19 16:20:54 +02:00
Gao Xiang	3379f13ebc	erofs: avoid tagged pointers to mark sync decompression [ Upstream commit cdba55067f2f9fdc7870ffcb6aef912d3468cff8 ] We could just use a boolean in z_erofs_decompressqueue for sync decompression to simplify the code. Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20230204093040.97967-2-hsiangkao@linux.alibaba.com Stable-dep-of: 967c28b23f6c ("erofs: kill hooked chains to avoid loops on deduplicated compressed images") Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-07-19 16:20:54 +02:00
Gao Xiang	3564500b0d	erofs: clean up cached I/O strategies [ Upstream commit 1282dea37b09087b8aec59f0774572c16b52276a ] After commit 4c7e42552b3a ("erofs: remove useless cache strategy of DELAYEDALLOC"), only one cached I/O allocation strategy is supported: When cached I/O is preferred, page allocation is applied without direct reclaim. If allocation fails, fall back to inplace I/O. Let's get rid of z_erofs_cache_alloctype. No logical changes. Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Yue Hu <huyue2@coolpad.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20221206060352.152830-1-xiang@kernel.org Stable-dep-of: 967c28b23f6c ("erofs: kill hooked chains to avoid loops on deduplicated compressed images") Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-07-19 16:20:54 +02:00
Linus Torvalds	e6bbad7571	mm: always expand the stack with the mmap write lock held commit 8d7071af890768438c14db6172cc8f9f4d04e184 upstream This finishes the job of always holding the mmap write lock when extending the user stack vma, and removes the 'write_locked' argument from the vm helper functions again. For some cases, we just avoid expanding the stack at all: drivers and page pinning really shouldn't be extending any stacks. Let's see if any strange users really wanted that. It's worth noting that architectures that weren't converted to the new lock_mm_and_find_vma() helper function are left using the legacy "expand_stack()" function, but it has been changed to drop the mmap_lock and take it for writing while expanding the vma. This makes it fairly straightforward to convert the remaining architectures. As a result of dropping and re-taking the lock, the calling conventions for this function have also changed, since the old vma may no longer be valid. So it will now return the new vma if successful, and NULL - and the lock dropped - if the area could not be extended. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [6.1: Patch drivers/iommu/io-pgfault.c instead] Signed-off-by: Samuel Mendoza-Jonas <samjonas@amazon.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-07-01 13:16:25 +02:00
Linus Torvalds	c4b31d1b69	execve: expand new process stack manually ahead of time commit f313c51d26aa87e69633c9b46efb37a930faca71 upstream. This is a small step towards a model where GUP itself would not expand the stack, and any user that needs GUP to not look up existing mappings, but actually expand on them, would have to do so manually before-hand, and with the mm lock held for writing. It turns out that execve() already did almost exactly that, except it didn't take the mm lock at all (it's single-threaded so no locking technically needed, but it could cause lockdep errors). And it only did it for the CONFIG_STACK_GROWSUP case, since in that case GUP has obviously never expanded the stack downwards. So just make that CONFIG_STACK_GROWSUP case do the right thing with locking, and enable it generally. This will eventually help GUP, and in the meantime avoids a special case and the lockdep issue. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [6.1 Minor context from still having FOLL_FORCE flags set] Signed-off-by: Samuel Mendoza-Jonas <samjonas@amazon.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-07-01 13:16:25 +02:00
Liam R. Howlett	6a6b5616c3	mm: make find_extend_vma() fail if write lock not held commit f440fa1ac955e2898893f9301568435eb5cdfc4b upstream. Make calls to extend_vma() and find_extend_vma() fail if the write lock is required. To avoid making this a flag-day event, this still allows the old read-locking case for the trivial situations, and passes in a flag to say "is it write-locked". That way write-lockers can say "yes, I'm being careful", and legacy users will continue to work in all the common cases until they have been fully converted to the new world order. Co-Developed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Samuel Mendoza-Jonas <samjonas@amazon.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-07-01 13:16:25 +02:00
Steve French	29429a1f58	smb: move client and server files to common directory fs/smb commit 38c8a9a52082579090e34c033d439ed2cd1a462d upstream. Move CIFS/SMB3 related client and server files (cifs.ko and ksmbd.ko and helper modules) to new fs/smb subdirectory: fs/cifs --> fs/smb/client fs/ksmbd --> fs/smb/server fs/smbfs_common --> fs/smb/common Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> [ added to stable trees to handle the directory change to handle the future stable patches due to the constant churn in this filesystem at the moment - gregkh ] Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-06-28 11:12:40 +02:00
Andreas Gruenbacher	c2888c460d	gfs2: Don't get stuck writing page onto itself under direct I/O [ Upstream commit fa58cc888d67e640e354d8b3ceef877ea167b0cf ] When a direct I/O write is performed, iomap_dio_rw() invalidates the part of the page cache which the write is going to before carrying out the write. In the odd case, the direct I/O write will be reading from the same page it is writing to. gfs2 carries out writes with page faults disabled, so it should have been obvious that this page invalidation can cause iomap_dio_rw() to never make any progress. Currently, gfs2 will end up in an endless retry loop in gfs2_file_direct_write() instead, though. Break this endless loop by limiting the number of retries and falling back to buffered I/O after that. Also simplify should_fault_in_pages() sightly and add a comment to make the above case easier to understand. Reported-by: Jan Kara <jack@suse.cz> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-06-28 11:12:38 +02:00
Shida Zhang	3f6391062d	btrfs: fix an uninitialized variable warning in btrfs_log_inode [ Upstream commit 8fd9f4232d8152c650fd15127f533a0f6d0a4b2b ] This fixes the following warning reported by gcc 10.2.1 under x86_64: ../fs/btrfs/tree-log.c: In function ‘btrfs_log_inode’: ../fs/btrfs/tree-log.c:6211:9: error: ‘last_range_start’ may be used uninitialized in this function [-Werror=maybe-uninitialized] 6211 \| ret = insert_dir_log_key(trans, log, path, key.objectid, \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 6212 \| first_dir_index, last_dir_index); \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ../fs/btrfs/tree-log.c:6161:6: note: ‘last_range_start’ was declared here 6161 \| u64 last_range_start; \| ^~~~~~~~~~~~~~~~ This might be a false positive fixed in later compiler versions but we want to have it fixed. Reported-by: k2ci <kernel-bot@kylinos.cn> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Shida Zhang <zhangshida@kylinos.cn> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-06-28 11:12:36 +02:00
Steve French	9d8ac2726c	smb3: missing null check in SMB2_change_notify [ Upstream commit b535cc796a4b4942cd189652588e8d37c1f5925a ] If plen is null when passed in, we only checked for null in one of the two places where it could be used. Although plen is always valid (not null) for current callers of the SMB2_change_notify function, this change makes it more consistent. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Closes: https://lore.kernel.org/all/202305251831.3V1gbbFs-lkp@intel.com/ Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-06-28 11:12:35 +02:00

... 9 10 11 12 13 ...

80020 Commits