linux

iv/linux

Author	SHA1	Message	Date
David Howells	6406cce4b2	cifs: Fix FALLOC_FL_ZERO_RANGE by setting i_size if EOF moved commit 83d5518b124dfd605f10a68128482c839a239f9d upstream. Fix the cifs filesystem implementations of FALLOC_FL_ZERO_RANGE, in smb3_zero_range(), to set i_size after extending the file on the server. Fixes: 72c419d9b073 ("cifs: fix smb3_zero_range so it can expand the file-size when required") Cc: stable@vger.kernel.org Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Paulo Alcantara <pc@manguebit.com> cc: Shyam Prasad N <nspmangalore@gmail.com> cc: Rohith Surabattula <rohiths.msft@gmail.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: linux-mm@kvack.org Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-12-08 08:51:12 +01:00
Chuck Lever	5d9ddbf4b5	NFSD: Fix checksum mismatches in the duplicate reply cache [ Upstream commit bf51c52a1f3c238d72c64e14d5e7702d3a245b82 ] nfsd_cache_csum() currently assumes that the server's RPC layer has been advancing rq_arg.head[0].iov_base as it decodes an incoming request, because that's the way it used to work. On entry, it expects that buf->head[0].iov_base points to the start of the NFS header, and excludes the already-decoded RPC header. These days however, head[0].iov_base now points to the start of the RPC header during all processing. It no longer points at the NFS Call header when execution arrives at nfsd_cache_csum(). In a retransmitted RPC the XID and the NFS header are supposed to be the same as the original message, but the contents of the retransmitted RPC header can be different. For example, for krb5, the GSS sequence number will be different between the two. Thus if the RPC header is always included in the DRC checksum computation, the checksum of the retransmitted message might not match the checksum of the original message, even though the NFS part of these messages is identical. The result is that, even if a matching XID is found in the DRC, the checksum mismatch causes the server to execute the retransmitted RPC transaction again. Reviewed-by: Jeff Layton <jlayton@kernel.org> Tested-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-12-03 07:32:10 +01:00
Chuck Lever	b597f3c85d	NFSD: Fix "start of NFS reply" pointer passed to nfsd_cache_update() [ Upstream commit 1caf5f61dd8430ae5a0b4538afe4953ce7517cbb ] The "statp + 1" pointer that is passed to nfsd_cache_update() is supposed to point to the start of the egress NFS Reply header. In fact, it does point there for AUTH_SYS and RPCSEC_GSS_KRB5 requests. But both krb5i and krb5p add fields between the RPC header's accept_stat field and the start of the NFS Reply header. In those cases, "statp + 1" points at the extra fields instead of the Reply. The result is that nfsd_cache_update() caches what looks to the client like garbage. A connection break can occur for a number of reasons, but the most common reason when using krb5i/p is a GSS sequence number window underrun. When an underrun is detected, the server is obliged to drop the RPC and the connection to force a retransmit with a fresh GSS sequence number. The client presents the same XID, it hits in the server's DRC, and the server returns the garbage cache entry. The "statp + 1" argument has been used since the oldest changeset in the kernel history repo, so it has been in nfsd_dispatch() literally since before history began. The problem arose only when the server-side GSS implementation was added twenty years ago. Reviewed-by: Jeff Layton <jlayton@kernel.org> Tested-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-12-03 07:32:10 +01:00
Zhang Yi	d7eb37615b	ext4: make sure allocate pending entry not fail [ Upstream commit 8e387c89e96b9543a339f84043cf9df15fed2632 ] __insert_pending() allocate memory in atomic context, so the allocation could fail, but we are not handling that failure now. It could lead ext4_es_remove_extent() to get wrong reserved clusters, and the global data blocks reservation count will be incorrect. The same to extents_status entry preallocation, preallocate pending entry out of the i_es_lock with __GFP_NOFAIL, make sure __insert_pending() and __revise_pending() always succeeds. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Cc: stable@kernel.org Link: https://lore.kernel.org/r/20230824092619.1327976-3-yi.zhang@huaweicloud.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-12-03 07:32:10 +01:00
Baokun Li	8384d8c5cc	ext4: fix slab-use-after-free in ext4_es_insert_extent() [ Upstream commit 768d612f79822d30a1e7d132a4d4b05337ce42ec ] Yikebaer reported an issue: ================================================================== BUG: KASAN: slab-use-after-free in ext4_es_insert_extent+0xc68/0xcb0 fs/ext4/extents_status.c:894 Read of size 4 at addr ffff888112ecc1a4 by task syz-executor/8438 CPU: 1 PID: 8438 Comm: syz-executor Not tainted 6.5.0-rc5 #1 Call Trace: [...] kasan_report+0xba/0xf0 mm/kasan/report.c:588 ext4_es_insert_extent+0xc68/0xcb0 fs/ext4/extents_status.c:894 ext4_map_blocks+0x92a/0x16f0 fs/ext4/inode.c:680 ext4_alloc_file_blocks.isra.0+0x2df/0xb70 fs/ext4/extents.c:4462 ext4_zero_range fs/ext4/extents.c:4622 [inline] ext4_fallocate+0x251c/0x3ce0 fs/ext4/extents.c:4721 [...] Allocated by task 8438: [...] kmem_cache_zalloc include/linux/slab.h:693 [inline] __es_alloc_extent fs/ext4/extents_status.c:469 [inline] ext4_es_insert_extent+0x672/0xcb0 fs/ext4/extents_status.c:873 ext4_map_blocks+0x92a/0x16f0 fs/ext4/inode.c:680 ext4_alloc_file_blocks.isra.0+0x2df/0xb70 fs/ext4/extents.c:4462 ext4_zero_range fs/ext4/extents.c:4622 [inline] ext4_fallocate+0x251c/0x3ce0 fs/ext4/extents.c:4721 [...] Freed by task 8438: [...] kmem_cache_free+0xec/0x490 mm/slub.c:3823 ext4_es_try_to_merge_right fs/ext4/extents_status.c:593 [inline] __es_insert_extent+0x9f4/0x1440 fs/ext4/extents_status.c:802 ext4_es_insert_extent+0x2ca/0xcb0 fs/ext4/extents_status.c:882 ext4_map_blocks+0x92a/0x16f0 fs/ext4/inode.c:680 ext4_alloc_file_blocks.isra.0+0x2df/0xb70 fs/ext4/extents.c:4462 ext4_zero_range fs/ext4/extents.c:4622 [inline] ext4_fallocate+0x251c/0x3ce0 fs/ext4/extents.c:4721 [...] ================================================================== The flow of issue triggering is as follows: 1. remove es raw es es removed es1 \|-------------------\| -> \|----\|.......\|------\| 2. insert es es insert es1 merge with es es1 merge with es and free es1 \|----\|.......\|------\| -> \|------------\|------\| -> \|-------------------\| es merges with newes, then merges with es1, frees es1, then determines if es1->es_len is 0 and triggers a UAF. The code flow is as follows: ext4_es_insert_extent es1 = __es_alloc_extent(true); es2 = __es_alloc_extent(true); __es_remove_extent(inode, lblk, end, NULL, es1) __es_insert_extent(inode, &newes, es1) ---> insert es1 to es tree __es_insert_extent(inode, &newes, es2) ext4_es_try_to_merge_right ext4_es_free_extent(inode, es1) ---> es1 is freed if (es1 && !es1->es_len) // Trigger UAF by determining if es1 is used. We determine whether es1 or es2 is used immediately after calling __es_remove_extent() or __es_insert_extent() to avoid triggering a UAF if es1 or es2 is freed. Reported-by: Yikebaer Aizezi <yikebaer61@gmail.com> Closes: https://lore.kernel.org/lkml/CALcu4raD4h9coiyEBL4Bm0zjDwxC2CyPiTwsP3zFuhot6y9Beg@mail.gmail.com Fixes: 2a69c450083d ("ext4: using nofail preallocation in ext4_es_insert_extent()") Cc: stable@kernel.org Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230815070808.3377171-1-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Stable-dep-of: 8e387c89e96b ("ext4: make sure allocate pending entry not fail") Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-12-03 07:32:10 +01:00
Baokun Li	9164978bce	ext4: using nofail preallocation in ext4_es_insert_extent() [ Upstream commit 2a69c450083db164596c75c0f5b4d9c4c0e18eba ] Similar to in ext4_es_insert_delayed_block(), we use preallocations that do not fail to avoid inconsistencies, but we do not care about es that are not must be kept, and we return 0 even if such es memory allocation fails. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230424033846.4732-9-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Stable-dep-of: 8e387c89e96b ("ext4: make sure allocate pending entry not fail") Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-12-03 07:32:10 +01:00
Baokun Li	614b383d01	ext4: using nofail preallocation in ext4_es_insert_delayed_block() [ Upstream commit 4a2d98447b37bcb68a7f06a1078edcb4f7e6ce7e ] Similar to in ext4_es_remove_extent(), we use a no-fail preallocation to avoid inconsistencies, except that here we may have to preallocate two extent_status. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230424033846.4732-8-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Stable-dep-of: 8e387c89e96b ("ext4: make sure allocate pending entry not fail") Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-12-03 07:32:10 +01:00
Baokun Li	51cef2a5c6	ext4: using nofail preallocation in ext4_es_remove_extent() [ Upstream commit e9fe2b882bd5b26b987c9ba110c2222796f72af5 ] If __es_remove_extent() returns an error it means that when splitting extent, allocating an extent that must be kept failed, where returning an error directly would cause the extent tree to be inconsistent. So we use GFP_NOFAIL to pre-allocate an extent_status and pass it to __es_remove_extent() to avoid this problem. In addition, since the allocated memory is outside the i_es_lock, the extent_status tree may change and the pre-allocated extent_status is no longer needed, so we release the pre-allocated extent_status when es->es_len is not initialized. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230424033846.4732-7-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Stable-dep-of: 8e387c89e96b ("ext4: make sure allocate pending entry not fail") Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-12-03 07:32:10 +01:00
Baokun Li	f1c2369366	ext4: use pre-allocated es in __es_remove_extent() [ Upstream commit bda3efaf774fb687c2b7a555aaec3006b14a8857 ] When splitting extent, if the second extent can not be dropped, we return -ENOMEM and use GFP_NOFAIL to preallocate an extent_status outside of i_es_lock and pass it to __es_remove_extent() to be used as the second extent. This ensures that __es_remove_extent() is executed successfully, thus ensuring consistency in the extent status tree. If the second extent is not undroppable, we simply drop it and return 0. Then retry is no longer necessary, remove it. Now, __es_remove_extent() will always remove what it should, maybe more. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230424033846.4732-6-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Stable-dep-of: 8e387c89e96b ("ext4: make sure allocate pending entry not fail") Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-12-03 07:32:10 +01:00
Baokun Li	ce581f8631	ext4: use pre-allocated es in __es_insert_extent() [ Upstream commit 95f0b320339a977cf69872eac107122bf536775d ] Pass a extent_status pointer prealloc to __es_insert_extent(). If the pointer is non-null, it is used directly when a new extent_status is needed to avoid memory allocation failures. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230424033846.4732-5-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Stable-dep-of: 8e387c89e96b ("ext4: make sure allocate pending entry not fail") Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-12-03 07:32:10 +01:00
Baokun Li	594a5f00e5	ext4: factor out __es_alloc_extent() and __es_free_extent() [ Upstream commit 73a2f033656be11298912201ad50615307b4477a ] Factor out __es_alloc_extent() and __es_free_extent(), which only allocate and free extent_status in these two helpers. The ext4_es_alloc_extent() function is split into __es_alloc_extent() and ext4_es_init_extent(). In __es_alloc_extent() we allocate memory using GFP_KERNEL \| __GFP_NOFAIL \| __GFP_ZERO if the memory allocation cannot fail, otherwise we use GFP_ATOMIC. and the ext4_es_init_extent() is used to initialize extent_status and update related variables after a successful allocation. This is to prepare for the use of pre-allocated extent_status later. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230424033846.4732-4-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Stable-dep-of: 8e387c89e96b ("ext4: make sure allocate pending entry not fail") Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-12-03 07:32:10 +01:00
Baokun Li	9381ff6512	ext4: add a new helper to check if es must be kept [ Upstream commit 9649eb18c6288f514cacffdd699d5cd999c2f8f6 ] In the extent status tree, we have extents which we can just drop without issues and extents we must not drop - this depends on the extent's status - currently ext4_es_is_delayed() extents must stay, others may be dropped. A helper function is added to help determine if the current extent can be dropped, although only ext4_es_is_delayed() extents cannot be dropped currently. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230424033846.4732-3-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Stable-dep-of: 8e387c89e96b ("ext4: make sure allocate pending entry not fail") Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-12-03 07:32:10 +01:00
Shyam Prasad N	e9c3d6b09c	cifs: fix leak of iface for primary channel [ Upstream commit 29954d5b1e0d67a4cd61c30c2201030c97e94b1e ] My last change in this area introduced a change which accounted for primary channel in the interface ref count. However, it did not reduce this ref count on deallocation of the primary channel. i.e. during umount. Fixing this leak here, by dropping this ref count for primary channel while freeing up the session. Fixes: fa1d0508bdd4 ("cifs: account for primary channel in the interface list") Cc: stable@vger.kernel.org Reported-by: Paulo Alcantara <pc@manguebit.com> Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-12-03 07:32:09 +01:00
Shyam Prasad N	b24d42b52b	cifs: account for primary channel in the interface list [ Upstream commit fa1d0508bdd4a68c5e40f85f635712af8c12f180 ] The refcounting of server interfaces should account for the primary channel too. Although this is not strictly necessary, doing so will account for the primary channel in DebugData. Cc: stable@vger.kernel.org Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-12-03 07:32:09 +01:00
Shyam Prasad N	548893404c	cifs: distribute channels across interfaces based on speed [ Upstream commit a6d8fb54a515f0546ffdb7870102b1238917e567 ] Today, if the server interfaces RSS capable, we simply choose the fastest interface to setup a channel. This is not a scalable approach, and does not make a lot of attempt to distribute the connections. This change does a weighted distribution of channels across all the available server interfaces, where the weight is a function of the advertised interface speed. Also make sure that we don't mix rdma and non-rdma for channels. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Stable-dep-of: fa1d0508bdd4 ("cifs: account for primary channel in the interface list") Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-12-03 07:32:09 +01:00
Shyam Prasad N	5607a415d4	cifs: print last update time for interface list [ Upstream commit 05844bd661d9fd478df1175b6639bf2d9398becb ] We store the last updated time for interface list while parsing the interfaces. This change is to just print that info in DebugData. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Stable-dep-of: fa1d0508bdd4 ("cifs: account for primary channel in the interface list") Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-12-03 07:32:09 +01:00
Steve French	f4dff37111	smb3: allow dumping session and tcon id to improve stats analysis and debugging [ Upstream commit de4eceab578ead12a71e5b5588a57e142bbe8ceb ] When multiple mounts are to the same share from the same client it was not possible to determine which section of /proc/fs/cifs/Stats (and DebugData) correspond to that mount. In some recent examples this turned out to be a significant problem when trying to analyze performance data - since there are many cases where unless we know the tree id and session id we can't figure out which stats (e.g. number of SMB3.1.1 requests by type, the total time they take, which is slowest, how many fail etc.) apply to which mount. The only existing loosely related ioctl CIFS_IOC_GET_MNT_INFO does not return the information needed to uniquely identify which tcon is which mount although it does return various flags and device info. Add a cifs.ko ioctl CIFS_IOC_GET_TCON_INFO (0x800ccf0c) to return tid, session id, tree connect count. Cc: stable@vger.kernel.org Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-12-03 07:32:09 +01:00
Steve French	fbc666a9ac	cifs: minor cleanup of some headers [ Upstream commit c19204cbd65c12fdcd34fb8f5d645007238ed5cd ] checkpatch showed formatting problems with extra spaces, and extra semicolon and some missing blank lines in some cifs headers. Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Reviewed-by: Germano Percossi <germano.percossi@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com> Stable-dep-of: de4eceab578e ("smb3: allow dumping session and tcon id to improve stats analysis and debugging") Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-12-03 07:32:09 +01:00
David Howells	48b3ee0134	afs: Fix file locking on R/O volumes to operate in local mode [ Upstream commit b590eb41be766c5a63acc7e8896a042f7a4e8293 ] AFS doesn't really do locking on R/O volumes as fileservers don't maintain state with each other and thus a lock on a R/O volume file on one fileserver will not be be visible to someone looking at the same file on another fileserver. Further, the server may return an error if you try it. Fix this by doing what other AFS clients do and handle filelocking on R/O volume files entirely within the client and don't touch the server. Fixes: 6c6c1d63c243 ("afs: Provide mount-time configurable byte-range file locking emulation") Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-12-03 07:32:08 +01:00
David Howells	f9cf17836e	afs: Return ENOENT if no cell DNS record can be found [ Upstream commit 0167236e7d66c5e1e85d902a6abc2529b7544539 ] Make AFS return error ENOENT if no cell SRV or AFSDB DNS record (or cellservdb config file record) can be found rather than returning EDESTADDRREQ. Also add cell name lookup info to the cursor dump. Fixes: d5c32c89b208 ("afs: Fix cell DNS lookup") Reported-by: Markus Suvanto <markus.suvanto@gmail.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=216637 Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-12-03 07:32:08 +01:00
David Howells	d2b3bc8c7f	afs: Make error on cell lookup failure consistent with OpenAFS [ Upstream commit 2a4ca1b4b77850544408595e2433f5d7811a9daa ] When kafs tries to look up a cell in the DNS or the local config, it will translate a lookup failure into EDESTADDRREQ whereas OpenAFS translates it into ENOENT. Applications such as West expect the latter behaviour and fail if they see the former. This can be seen by trying to mount an unknown cell: # mount -t afs %example.com:cell.root /mnt mount: /mnt: mount(2) system call failed: Destination address required. Fixes: 4d673da14533 ("afs: Support the AFS dynamic root") Reported-by: Markus Suvanto <markus.suvanto@gmail.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=216637 Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeffrey Altman <jaltman@auristor.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-12-03 07:32:07 +01:00
David Howells	790ea5bc40	afs: Fix afs_server_list to be cleaned up with RCU [ Upstream commit e6bace7313d61e31f2b16fa3d774fd8cb3cb869e ] afs_server_list is accessed with the rcu_read_lock() held from volume->servers, so it needs to be cleaned up correctly. Fix this by using kfree_rcu() instead of kfree(). Fixes: 8a070a964877 ("afs: Detect cell aliases 1 - Cells with root volumes") Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-12-03 07:32:06 +01:00
Jan Kara	dc4542861e	ext4: properly sync file size update after O_SYNC direct IO commit 91562895f8030cb9a0470b1db49de79346a69f91 upstream. Gao Xiang has reported that on ext4 O_SYNC direct IO does not properly sync file size update and thus if we crash at unfortunate moment, the file can have smaller size although O_SYNC IO has reported successful completion. The problem happens because update of on-disk inode size is handled in ext4_dio_write_iter() after iomap_dio_rw() (and thus dio_complete() in particular) has returned and generic_file_sync() gets called by dio_complete(). Fix the problem by handling on-disk inode size update directly in our ->end_io completion handler. References: https://lore.kernel.org/all/02d18236-26ef-09b0-90ad-030c4fe3ee20@linux.alibaba.com Reported-by: Gao Xiang <hsiangkao@linux.alibaba.com> CC: stable@vger.kernel.org Fixes: 378f32bab371 ("ext4: introduce direct I/O write using iomap infrastructure") Signed-off-by: Jan Kara <jack@suse.cz> Tested-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/20231013121350.26872-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-11-28 17:07:22 +00:00
Kemeng Shi	e1d0f68bc0	ext4: add missed brelse in update_backups commit 9adac8b01f4be28acd5838aade42b8daa4f0b642 upstream. add missed brelse in update_backups Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Theodore Ts'o <tytso@mit.edu> Link: https://lore.kernel.org/r/20230826174712.4059355-3-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-11-28 17:07:21 +00:00
Kemeng Shi	1793dc461e	ext4: remove gdb backup copy for meta bg in setup_new_flex_group_blocks commit 40dd7953f4d606c280074f10d23046b6812708ce upstream. Wrong check of gdb backup in meta bg as following: first_group is the first group of meta_bg which contains target group, so target group is always >= first_group. We check if target group has gdb backup by comparing first_group with [group + 1] and [group + EXT4_DESC_PER_BLOCK(sb) - 1]. As group >= first_group, then [group + N] is > first_group. So no copy of gdb backup in meta bg is done in setup_new_flex_group_blocks. No need to do gdb backup copy in meta bg from setup_new_flex_group_blocks as we always copy updated gdb block to backups at end of ext4_flex_group_add as following: ext4_flex_group_add /* no gdb backup copy for meta bg any more / setup_new_flex_group_blocks / update current group number / ext4_update_super sbi->s_groups_count += flex_gd->count; / * if group in meta bg contains backup is added, the primary gdb block * of the meta bg will be copy to backup in new added group here. */ for (; gdb_num <= gdb_num_end; gdb_num++) update_backups(...) In summary, we can remove wrong gdb backup copy code in setup_new_flex_group_blocks. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Theodore Ts'o <tytso@mit.edu> Link: https://lore.kernel.org/r/20230826174712.4059355-5-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-11-28 17:07:21 +00:00
Zhang Yi	80ddcf21e7	ext4: correct the start block of counting reserved clusters commit 40ea98396a3659062267d1fe5f99af4f7e4f05e3 upstream. When big allocate feature is enabled, we need to count and update reserved clusters before removing a delayed only extent_status entry. {init\|count\|get}_rsvd() have already done this, but the start block number of this counting isn't correct in the following case. lblk end \| \| v v ------------------------- \| \| orig_es ------------------------- ^ ^ len1 is 0 \| len2 \| If the start block of the orig_es entry founded is bigger than lblk, we passed lblk as start block to count_rsvd(), but the length is correct, finally, the range to be counted is offset. This patch fix this by passing the start blocks to 'orig_es->lblk + len1'. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Cc: stable@kernel.org Link: https://lore.kernel.org/r/20230824092619.1327976-2-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-11-28 17:07:21 +00:00
Kemeng Shi	ec4ba3d62f	ext4: correct return value of ext4_convert_meta_bg commit 48f1551592c54f7d8e2befc72a99ff4e47f7dca0 upstream. Avoid to ignore error in "err". Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Link: https://lore.kernel.org/r/20230826174712.4059355-4-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-11-28 17:07:21 +00:00
Ojaswin Mujoo	32b9fb9a67	ext4: mark buffer new if it is unwritten to avoid stale data exposure commit 2cd8bdb5efc1e0d5b11a4b7ba6b922fd2736a87f upstream. Short Version In ext4 with dioread_nolock, we could have a scenario where the bh returned by get_blocks (ext4_get_block_unwritten()) in __block_write_begin_int() has UNWRITTEN and MAPPED flag set. Since such a bh does not have NEW flag set we never zero out the range of bh that is not under write, causing whatever stale data is present in the folio at that time to be written out to disk. To fix this mark the buffer as new, in case it is unwritten, in ext4_get_block_unwritten(). Long Version The issue mentioned above was resulting in two different bugs: 1. On block size < page size case in ext4, generic/269 was reliably failing with dioread_nolock. The state of the write was as follows: * The write was extending i_size. * The last block of the file was fallocated and had an unwritten extent * We were near ENOSPC and hence we were switching to non-delayed alloc allocation. In this case, the back trace that triggers the bug is as follows: ext4_da_write_begin() /* switch to nodelalloc due to low space / ext4_write_begin() ext4_should_dioread_nolock() // true since mount flags still have delalloc __block_write_begin(..., ext4_get_block_unwritten) __block_write_begin_int() for(each buffer head in page) { / first iteration, this is bh1 which contains i_size / if (!buffer_mapped) get_block() / returns bh with only UNWRITTEN and MAPPED / / second iteration, bh2 / if (!buffer_mapped) get_block() / we fail here, could be ENOSPC / } if (err) / * this would zero out all new buffers and mark them uptodate. * Since bh1 was never marked new, we skip it here which causes * the bug later. / folio_zero_new_buffers(); / ext4_wrte_begin() error handling / ext4_truncate_failed_write() ext4_truncate() ext4_block_truncate_page() __ext4_block_zero_page_range() if(!buffer_uptodate()) ext4_read_bh_lock() ext4_read_bh() -> ... ext4_submit_bh_wbc() BUG_ON(buffer_unwritten(bh)); / !!! / 2. The second issue is stale data exposure with page size >= blocksize with dioread_nolock. The conditions needed for it to happen are same as the previous issue ie dioread_nolock around ENOSPC condition. The issue is also similar where in __block_write_begin_int() when we call ext4_get_block_unwritten() on the buffer_head and the underlying extent is unwritten, we get an unwritten and mapped buffer head. Since it is not new, we never zero out the partial range which is not under write, thus writing stale data to disk. This can be easily observed with the following reproducer: fallocate -l 4k testfile xfs_io -c "pwrite 2k 2k" testfile # hexdump output will have stale data in from byte 0 to 2k in testfile hexdump -C testfile NOTE: To trigger this, we need dioread_nolock enabled and write happening via ext4_write_begin(), which is usually used when we have -o nodealloc. Since dioread_nolock is disabled with nodelalloc, the only alternate way to call ext4_write_begin() is to ensure that delayed alloc switches to nodelalloc ie ext4_da_write_begin() calls ext4_write_begin(). This will usually happen when ext4 is almost full like the way generic/269 was triggering it in Issue 1 above. This might make the issue harder to hit. Hence, for reliable replication, I used the below patch to temporarily allow dioread_nolock with nodelalloc and then mount the disk with -o nodealloc,dioread_nolock. With this you can hit the stale data issue 100% of times: @@ -508,8 +508,8 @@ static inline int ext4_should_dioread_nolock(struct inode inode) if (ext4_should_journal_data(inode)) return 0; /* temporary fix to prevent generic/422 test failures */ - if (!test_opt(inode->i_sb, DELALLOC)) - return 0; + // if (!test_opt(inode->i_sb, DELALLOC)) + // return 0; return 1; } After applying this patch to mark buffer as NEW, both the above issues are fixed. Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Cc: stable@kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/d0ed09d70a9733fbb5349c5c7b125caac186ecdf.1695033645.git.ojaswin@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-11-28 17:07:21 +00:00
Kemeng Shi	f0cc1368fa	ext4: correct offset of gdb backup in non meta_bg group to update_backups commit 31f13421c004a420c0e9d288859c9ea9259ea0cc upstream. Commit 0aeaa2559d6d5 ("ext4: fix corruption when online resizing a 1K bigalloc fs") found that primary superblock's offset in its group is not equal to offset of backup superblock in its group when block size is 1K and bigalloc is enabled. As group descriptor blocks are right after superblock, we can't pass block number of gdb to update_backups for the same reason. The root casue of the issue above is that leading 1K padding block is count as data block offset for primary block while backup block has no padding block offset in its group. Remove padding data block count to fix the issue for gdb backups. For meta_bg case, update_backups treat blk_off as block number, do no conversion in this case. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Theodore Ts'o <tytso@mit.edu> Link: https://lore.kernel.org/r/20230826174712.4059355-2-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-11-28 17:07:21 +00:00
Max Kellermann	af075d06b3	ext4: apply umask if ACL support is disabled commit 484fd6c1de13b336806a967908a927cc0356e312 upstream. The function ext4_init_acl() calls posix_acl_create() which is responsible for applying the umask. But without CONFIG_EXT4_FS_POSIX_ACL, ext4_init_acl() is an empty inline function, and nobody applies the umask. This fixes a bug which causes the umask to be ignored with O_TMPFILE on ext4: https://github.com/MusicPlayerDaemon/MPD/issues/558 https://bugs.gentoo.org/show_bug.cgi?id=686142#c3 https://bugzilla.kernel.org/show_bug.cgi?id=203625 Reviewed-by: "J. Bruce Fields" <bfields@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Link: https://lore.kernel.org/r/20230919081824.1096619-1-max.kellermann@ionos.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-11-28 17:07:21 +00:00
Mahmoud Adam	1bb61fb790	nfsd: fix file memleak on client_opens_release commit bc1b5acb40201a0746d68a7d7cfc141899937f4f upstream. seq_release should be called to free the allocated seq_file Cc: stable@vger.kernel.org # v5.3+ Signed-off-by: Mahmoud Adam <mngyadam@amazon.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Fixes: 78599c42ae3c ("nfsd4: add file to display list of client's opens") Reviewed-by: NeilBrown <neilb@suse.de> Tested-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-11-28 17:07:19 +00:00
Su Hui	526dd7540a	f2fs: avoid format-overflow warning commit e0d4e8acb3789c5a8651061fbab62ca24a45c063 upstream. With gcc and W=1 option, there's a warning like this: fs/f2fs/compress.c: In function ‘f2fs_init_page_array_cache’: fs/f2fs/compress.c:1984:47: error: ‘%u’ directive writing between 1 and 7 bytes into a region of size between 5 and 8 [-Werror=format-overflow=] 1984 \| sprintf(slab_name, "f2fs_page_array_entry-%u:%u", MAJOR(dev), MINOR(dev)); \| ^~ String "f2fs_page_array_entry-%u:%u" can up to 35. The first "%u" can up to 4 and the second "%u" can up to 7, so total size is "24 + 4 + 7 = 35". slab_name's size should be 35 rather than 32. Cc: stable@vger.kernel.org Signed-off-by: Su Hui <suhui@nfschina.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-11-28 17:07:19 +00:00
Jaegeuk Kim	6122b72ce5	f2fs: do not return EFSCORRUPTED, but try to run online repair commit 50a472bbc79ff9d5a88be8019a60e936cadf9f13 upstream. If we return the error, there's no way to recover the status as of now, since fsck does not fix the xattr boundary issue. Cc: stable@vger.kernel.org Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-11-28 17:07:18 +00:00
Naohiro Aota	a0d43e0f7c	btrfs: zoned: wait for data BG to be finished on direct IO allocation commit 776a838f1fa95670c1c1cf7109a898090b473fa3 upstream. Running the fio command below on a ZNS device results in "Resource temporarily unavailable" error. $ sudo fio --name=w --directory=/mnt --filesize=1GB --bs=16MB --numjobs=16 \ --rw=write --ioengine=libaio --iodepth=128 --direct=1 fio: io_u error on file /mnt/w.2.0: Resource temporarily unavailable: write offset=117440512, buflen=16777216 fio: io_u error on file /mnt/w.2.0: Resource temporarily unavailable: write offset=134217728, buflen=16777216 ... This happens because -EAGAIN error returned from btrfs_reserve_extent() called from btrfs_new_extent_direct() is spilling over to the userland. btrfs_reserve_extent() returns -EAGAIN when there is no active zone available. Then, the caller should wait for some other on-going IO to finish a zone and retry the allocation. This logic is already implemented for buffered write in cow_file_range(), but it is missing for the direct IO counterpart. Implement the same logic for it. Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com> Fixes: 2ce543f47843 ("btrfs: zoned: wait until zone is finished when allocation didn't progress") CC: stable@vger.kernel.org # 6.1+ Tested-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-11-28 17:07:15 +00:00
Dave Chinner	9ad4c7f065	xfs: recovery should not clear di_flushiter unconditionally commit 7930d9e103700cde15833638855b750715c12091 upstream. Because on v3 inodes, di_flushiter doesn't exist. It overlaps with zero padding in the inode, except when NREXT64=1 configurations are in use and the zero padding is no longer padding but holds the 64 bit extent counter. This manifests obviously on big endian platforms (e.g. s390) because the log dinode is in host order and the overlap is the LSBs of the extent count field. It is not noticed on little endian machines because the overlap is at the MSB end of the extent count field and we need to get more than 2^^48 extents in the inode before it manifests. i.e. the heat death of the universe will occur before we see the problem in little endian machines. This is a zero-day issue for NREXT64=1 configuraitons on big endian machines. Fix it by only clearing di_flushiter on v2 inodes during recovery. Fixes: 9b7d16e34bbe ("xfs: Introduce XFS_DIFLAG2_NREXT64 and associated helpers") cc: stable@kernel.org # 5.19+ Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-11-28 17:07:15 +00:00
Shyam Prasad N	209379924a	cifs: do not reset chan_max if multichannel is not supported at mount commit 6e5e64c9477d58e73cb1a0e83eacad1f8df247cf upstream. If the mount command has specified multichannel as a mount option, but multichannel is found to be unsupported by the server at the time of mount, we set chan_max to 1. Which means that the user needs to remount the share if the server starts supporting multichannel. This change removes this reset. What it means is that if the user specified multichannel or max_channels during mount, and at this time, multichannel is not supported, but the server starts supporting it at a later point, the client will be capable of scaling out the number of channels. Cc: stable@vger.kernel.org Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-11-28 17:07:15 +00:00
Shyam Prasad N	c9569bfd28	cifs: force interface update before a fresh session setup commit d9a6d78096056a3cb5c5f07a730ab92f2f9ac4e6 upstream. During a session reconnect, it is possible that the server moved to another physical server (happens in case of Azure files). So at this time, force a query of server interfaces again (in case of multichannel session), such that the secondary channels connect to the right IP addresses (possibly updated now). Cc: stable@vger.kernel.org Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-11-28 17:07:15 +00:00
Shyam Prasad N	5bdf34ca32	cifs: reconnect helper should set reconnect for the right channel commit c3326a61cdbf3ce1273d9198b6cbf90965d7e029 upstream. We introduced a helper function to be used by non-cifsd threads to mark the connection for reconnect. For multichannel, when only a particular channel needs to be reconnected, this had a bug. This change fixes that by marking that particular channel for reconnect. Fixes: dca65818c80c ("cifs: use a different reconnect helper for non-cifsd threads") Cc: stable@vger.kernel.org Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-11-28 17:07:15 +00:00
Paulo Alcantara	9eb44db68c	smb: client: fix potential deadlock when releasing mids commit e6322fd177c6885a21dd4609dc5e5c973d1a2eb7 upstream. All release_mid() callers seem to hold a reference of @mid so there is no need to call kref_put(&mid->refcount, __release_mid) under @server->mid_lock spinlock. If they don't, then an use-after-free bug would have occurred anyways. By getting rid of such spinlock also fixes a potential deadlock as shown below CPU 0 CPU 1 ------------------------------------------------------------------ cifs_demultiplex_thread() cifs_debug_data_proc_show() release_mid() spin_lock(&server->mid_lock); spin_lock(&cifs_tcp_ses_lock) spin_lock(&server->mid_lock) __release_mid() smb2_find_smb_tcon() spin_lock(&cifs_tcp_ses_lock) deadlock Cc: stable@vger.kernel.org Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-11-28 17:07:15 +00:00
Paulo Alcantara	558817597d	smb: client: fix use-after-free bug in cifs_debug_data_proc_show() commit d328c09ee9f15ee5a26431f5aad7c9239fa85e62 upstream. Skip SMB sessions that are being teared down (e.g. @ses->ses_status == SES_EXITING) in cifs_debug_data_proc_show() to avoid use-after-free in @ses. This fixes the following GPF when reading from /proc/fs/cifs/DebugData while mounting and umounting [ 816.251274] general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6d81: 0000 [#1] PREEMPT SMP NOPTI ... [ 816.260138] Call Trace: [ 816.260329] <TASK> [ 816.260499] ? die_addr+0x36/0x90 [ 816.260762] ? exc_general_protection+0x1b3/0x410 [ 816.261126] ? asm_exc_general_protection+0x26/0x30 [ 816.261502] ? cifs_debug_tcon+0xbd/0x240 [cifs] [ 816.261878] ? cifs_debug_tcon+0xab/0x240 [cifs] [ 816.262249] cifs_debug_data_proc_show+0x516/0xdb0 [cifs] [ 816.262689] ? seq_read_iter+0x379/0x470 [ 816.262995] seq_read_iter+0x118/0x470 [ 816.263291] proc_reg_read_iter+0x53/0x90 [ 816.263596] ? srso_alias_return_thunk+0x5/0x7f [ 816.263945] vfs_read+0x201/0x350 [ 816.264211] ksys_read+0x75/0x100 [ 816.264472] do_syscall_64+0x3f/0x90 [ 816.264750] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [ 816.265135] RIP: 0033:0x7fd5e669d381 Cc: stable@vger.kernel.org Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-11-28 17:07:15 +00:00
Steve French	49d0ff613f	smb3: fix caching of ctime on setxattr commit 5923d6686a100c2b4cabd4c2ca9d5a12579c7614 upstream. Fixes xfstest generic/728 which had been failing due to incorrect ctime after setxattr and removexattr Update ctime on successful set of xattr Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-11-28 17:07:15 +00:00
Steve French	34828baf81	smb3: fix touch -h of symlink commit 475efd9808a3094944a56240b2711349e433fb66 upstream. For example: touch -h -t 02011200 testfile where testfile is a symlink would not change the timestamp, but touch -t 02011200 testfile does work to change the timestamp of the target Suggested-by: David Howells <dhowells@redhat.com> Reported-by: Micah Veilleux <micah.veilleux@iba-group.com> Closes: https://bugzilla.samba.org/show_bug.cgi?id=14476 Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-11-28 17:07:15 +00:00
Steve French	9d96ac07ae	smb3: fix creating FIFOs when mounting with "sfu" mount option commit 72bc63f5e23a38b65ff2a201bdc11401d4223fa9 upstream. Fixes some xfstests including generic/564 and generic/157 The "sfu" mount option can be useful for creating special files (character and block devices in particular) but could not create FIFOs. It did recognize existing empty files with the "system" attribute flag as FIFOs but this is too general, so to support creating FIFOs more safely use a new tag (but the same length as those for char and block devices ie "IntxLNK" and "IntxBLK") "LnxFIFO" to indicate that the file should be treated as a FIFO (when mounted with the "sfu"). For some additional context note that "sfu" followed the way that "Services for Unix" on Windows handled these special files (at least for character and block devices and symlinks), which is different than newer Windows which can handle special files as reparse points (which isn't an option to many servers). Cc: stable@vger.kernel.org Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-11-28 17:07:15 +00:00
Jeff Layton	5691e15695	fs: add ctime accessors infrastructure commit 9b6304c1d53745c300b86f202d0dcff395e2d2db upstream. struct timespec64 has unused bits in the tv_nsec field that can be used for other purposes. In future patches, we're going to change how the inode->i_ctime is accessed in certain inodes in order to make use of them. In order to do that safely though, we'll need to eradicate raw accesses of the inode->i_ctime field from the kernel. Add new accessor functions for the ctime that we use to replace them. Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Message-Id: <20230705185812.579118-2-jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-11-28 17:07:15 +00:00
Eric Biggers	4f3135e2dd	quota: explicitly forbid quota files from being encrypted commit d3cc1b0be258191d6360c82ea158c2972f8d3991 upstream. Since commit d7e7b9af104c ("fscrypt: stop using keyrings subsystem for fscrypt_master_key"), xfstest generic/270 causes a WARNING when run on f2fs with test_dummy_encryption in the mount options: $ kvm-xfstests -c f2fs/encrypt generic/270 [...] WARNING: CPU: 1 PID: 2453 at fs/crypto/keyring.c:240 fscrypt_destroy_keyring+0x1f5/0x260 The cause of the WARNING is that not all encrypted inodes have been evicted before fscrypt_destroy_keyring() is called, which violates an assumption. This happens because the test uses an external quota file, which gets automatically encrypted due to test_dummy_encryption. Encryption of quota files has never really been supported. On ext4, ext4_quota_read() does not decrypt the data, so encrypted quota files are always considered invalid on ext4. On f2fs, f2fs_quota_read() uses the pagecache, so trying to use an encrypted quota file gets farther, resulting in the issue described above being possible. But this was never intended to be possible, and there is no use case for it. Therefore, make the quota support layer explicitly reject using IS_ENCRYPTED inodes when quotaon is attempted. Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230905003227.326998-1-ebiggers@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-11-28 17:07:13 +00:00
Zhihao Cheng	ed3cc4f3ca	jbd2: fix potential data lost in recovering journal raced with synchronizing fs bdev commit 61187fce8600e8ef90e601be84f9d0f3222c1206 upstream. JBD2 makes sure journal data is fallen on fs device by sync_blockdev(), however, other process could intercept the EIO information from bdev's mapping, which leads journal recovering successful even EIO occurs during data written back to fs device. We found this problem in our product, iscsi + multipath is chosen for block device of ext4. Unstable network may trigger kpartx to rescan partitions in device mapper layer. Detailed process is shown as following: mount kpartx irq jbd2_journal_recover do_one_pass memcpy(nbh->b_data, obh->b_data) // copy data to fs dev from journal mark_buffer_dirty // mark bh dirty vfs_read generic_file_read_iter // dio filemap_write_and_wait_range __filemap_fdatawrite_range do_writepages block_write_full_folio submit_bh_wbc >> EIO occurs in disk << end_buffer_async_write mark_buffer_write_io_error mapping_set_error set_bit(AS_EIO, &mapping->flags) // set! filemap_check_errors test_and_clear_bit(AS_EIO, &mapping->flags) // clear! err2 = sync_blockdev filemap_write_and_wait filemap_check_errors test_and_clear_bit(AS_EIO, &mapping->flags) // false err2 = 0 Filesystem is mounted successfully even data from journal is failed written into disk, and ext4/ocfs2 could become corrupted. Fix it by comparing the wb_err state in fs block device before recovering and after recovering. A reproducer can be found in the kernel bugzilla referenced below. Link: https://bugzilla.kernel.org/show_bug.cgi?id=217888 Cc: stable@vger.kernel.org Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230919012525.1783108-1-chengzhihao1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-11-28 17:07:13 +00:00
Mimi Zohar	143f450c6c	ima: detect changes to the backing overlay file commit b836c4d29f2744200b2af41e14bf50758dddc818 upstream. Commit 18b44bc5a672 ("ovl: Always reevaluate the file signature for IMA") forced signature re-evaulation on every file access. Instead of always re-evaluating the file's integrity, detect a change to the backing file, by comparing the cached file metadata with the backing file's metadata. Verifying just the i_version has not changed is insufficient. In addition save and compare the i_ino and s_dev as well. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Tested-by: Eric Snowberg <eric.snowberg@oracle.com> Tested-by: Raul E Rangel <rrangel@chromium.org> Cc: stable@vger.kernel.org Signed-off-by: Mimi Zohar <zohar@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-11-28 17:07:12 +00:00
Josef Bacik	e866ef947a	btrfs: don't arbitrarily slow down delalloc if we're committing commit 11aeb97b45ad2e0040cbb2a589bc403152526345 upstream. We have a random schedule_timeout() if the current transaction is committing, which seems to be a holdover from the original delalloc reservation code. Remove this, we have the proper flushing stuff, we shouldn't be hoping for random timing things to make everything work. This just induces latency for no reason. CC: stable@vger.kernel.org # 5.4+ Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-11-28 17:07:12 +00:00
Namjae Jeon	8387c94d73	ksmbd: fix slab out of bounds write in smb_inherit_dacl() commit eebff19acaa35820cb09ce2ccb3d21bee2156ffb upstream. slab out-of-bounds write is caused by that offsets is bigger than pntsd allocation size. This patch add the check to validate 3 offsets using allocation size. Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-22271 Cc: stable@vger.kernel.org Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-11-28 17:07:11 +00:00
Namjae Jeon	482aaa72f9	ksmbd: handle malformed smb1 message commit 5a5409d90bd05f87fe5623a749ccfbf3f7c7d400 upstream. If set_smb1_rsp_status() is not implemented, It will cause NULL pointer dereferece error when client send malformed smb1 message. This patch add set_smb1_rsp_status() to ignore malformed smb1 message. Cc: stable@vger.kernel.org Reported-by: Robert Morris <rtm@csail.mit.edu> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-11-28 17:07:10 +00:00

... 4 5 6 7 8 ...

80139 Commits