linux

iv/linux

Author	SHA1	Message	Date
Al Viro	0c8c205381	new helper: lookup_positive_unlocked() [ Upstream commit 6c2d4798a8d16cf4f3a28c3cd4af4f1dcbbb4d04 ] Most of the callers of lookup_one_len_unlocked() treat negatives are ERR_PTR(-ENOENT). Provide a helper that would do just that. Note that a pinned positive dentry remains positive - it's ->d_inode is stable, etc.; a pinned _negative_ dentry can become positive at any point as long as you are not holding its parent at least shared. So using lookup_one_len_unlocked() needs to be careful; lookup_positive_unlocked() is safer and that's what the callers end up open-coding anyway. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Stable-dep-of: 0d5a4f8f775f ("fs: Fix error checking for d_hash_and_lookup()") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-09-23 10:59:40 +02:00
Wen Yang	0a2b1eb8a9	eventfd: prevent underflow for eventfd semaphores [ Upstream commit 758b492047816a3158d027e9fca660bc5bcf20bf ] For eventfd with flag EFD_SEMAPHORE, when its ctx->count is 0, calling eventfd_ctx_do_read will cause ctx->count to overflow to ULLONG_MAX. An underflow can happen with EFD_SEMAPHORE eventfds in at least the following three subsystems: (1) virt/kvm/eventfd.c (2) drivers/vfio/virqfd.c (3) drivers/virt/acrn/irqfd.c where (2) and (3) are just modeled after (1). An eventfd must be specified for use with the KVM_IRQFD ioctl(). This can also be an EFD_SEMAPHORE eventfd. When the eventfd count is zero or has been decremented to zero an underflow can be triggered when the irqfd is shut down by raising the KVM_IRQFD_FLAG_DEASSIGN flag in the KVM_IRQFD ioctl(): // ctx->count == 0 kvm_vm_ioctl() -> kvm_irqfd() -> kvm_irqfd_deassign() -> irqfd_deactivate() -> irqfd_shutdown() -> eventfd_ctx_remove_wait_queue(&cnt) -> eventfd_ctx_do_read(&cnt) Userspace polling on the eventfd wouldn't notice the underflow because 1 is always returned as the value from eventfd_read() while ctx->count would've underflowed. It's not a huge deal because this should only be happening when the irqfd is shutdown but we should still fix it and avoid the spurious wakeup. Fixes: cb289d6244a3 ("eventfd - allow atomic read and waitqueue remove") Signed-off-by: Wen Yang <wenyang.linux@foxmail.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jens Axboe <axboe@kernel.dk> Cc: Christian Brauner <brauner@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Dylan Yudaken <dylany@fb.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Matthew Wilcox <willy@infradead.org> Cc: linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org Message-Id: <tencent_7588DFD1F365950A757310D764517A14B306@qq.com> [brauner: rewrite commit message and add explanation how this underflow can happen] Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-09-23 10:59:40 +02:00
David Woodhouse	3e9617d63e	eventfd: Export eventfd_ctx_do_read() [ Upstream commit 28f1326710555bbe666f64452d08f2d7dd657cae ] Where events are consumed in the kernel, for example by KVM's irqfd_wakeup() and VFIO's virqfd_wakeup(), they currently lack a mechanism to drain the eventfd's counter. Since the wait queue is already locked while the wakeup functions are invoked, all they really need to do is call eventfd_ctx_do_read(). Add a check for the lock, and export it for them. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Message-Id: <20201027135523.646811-2-dwmw2@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Stable-dep-of: 758b49204781 ("eventfd: prevent underflow for eventfd semaphores") Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-09-23 10:59:40 +02:00
Matthew Wilcox	f59ff66698	reiserfs: Check the return value from __getblk() [ Upstream commit ba38980add7ffc9e674ada5b4ded4e7d14e76581 ] __getblk() can return a NULL pointer if we run out of memory or if we try to access beyond the end of the device; check it and handle it appropriately. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Link: https://lore.kernel.org/lkml/CAFcO6XOacq3hscbXevPQP7sXRoYFz34ZdKPYjmd6k5sZuhGFDw@mail.gmail.com/ Tested-by: butt3rflyh4ck <butterflyhuangxx@gmail.com> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") # probably introduced in 2002 Acked-by: Edward Shishkin <edward.shishkin@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-09-23 10:59:40 +02:00
Jan Kara	b36c4a731a	udf: Handle error when adding extent to a file commit 19fd80de0a8b5170ef34704c8984cca920dffa59 upstream. When adding extent to a file fails, so far we've silently squelshed the error. Make sure to propagate it up properly. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-09-23 10:59:40 +02:00
Vladislav Efanov	7648ea9896	udf: Check consistency of Space Bitmap Descriptor commit 1e0d4adf17e7ef03281d7b16555e7c1508c8ed2d upstream. Bits, which are related to Bitmap Descriptor logical blocks, are not reset when buffer headers are allocated for them. As the result, these logical blocks can be treated as free and be used for other blocks.This can cause usage of one buffer header for several types of data. UDF issues WARNING in this situation: WARNING: CPU: 0 PID: 2703 at fs/udf/inode.c:2014 __udf_add_aext+0x685/0x7d0 fs/udf/inode.c:2014 RIP: 0010:__udf_add_aext+0x685/0x7d0 fs/udf/inode.c:2014 Call Trace: udf_setup_indirect_aext+0x573/0x880 fs/udf/inode.c:1980 udf_add_aext+0x208/0x2e0 fs/udf/inode.c:2067 udf_insert_aext fs/udf/inode.c:2233 [inline] udf_update_extents fs/udf/inode.c:1181 [inline] inode_getblk+0x1981/0x3b70 fs/udf/inode.c:885 Found by Linux Verification Center (linuxtesting.org) with syzkaller. [JK: Somewhat cleaned up the boundary checks] Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Vladislav Efanov <VEfanov@ispras.ru> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-09-23 10:59:40 +02:00
Shyam Prasad N	107f5cad23	cifs: add a warning when the in-flight count goes negative [ Upstream commit e4645cc2f1e2d6f268bb8dcfac40997c52432aed ] We've seen the in-flight count go into negative with some internal stress testing in Microsoft. Adding a WARN when this happens, in hope of understanding why this happens when it happens. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Reviewed-by: Bharath SM <bharathsm@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-09-23 10:59:39 +02:00
Winston Wen	b796adfc98	fs/nls: make load_nls() take a const parameter [ Upstream commit c1ed39ec116272935528ca9b348b8ee79b0791da ] load_nls() take a char * parameter, use it to find nls module in list or construct the module name to load it. This change make load_nls() take a const parameter, so we don't need do some cast like this: ses->local_nls = load_nls((char *)ctx->local_nls->charset); Suggested-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Winston Wen <wentao@uniontech.com> Reviewed-by: Paulo Alcantara <pc@manguebit.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-09-23 10:59:38 +02:00
Ryusuke Konishi	99a73016a5	nilfs2: fix WARNING in mark_buffer_dirty due to discarded buffer reuse commit cdaac8e7e5a059f9b5e816cda257f08d0abffacd upstream. A syzbot stress test using a corrupted disk image reported that mark_buffer_dirty() called from __nilfs_mark_inode_dirty() or nilfs_palloc_commit_alloc_entry() may output a kernel warning, and can panic if the kernel is booted with panic_on_warn. This is because nilfs2 keeps buffer pointers in local structures for some metadata and reuses them, but such buffers may be forcibly discarded by nilfs_clear_dirty_page() in some critical situations. This issue is reported to appear after commit 28a65b49eb53 ("nilfs2: do not write dirty data after degenerating to read-only"), but the issue has potentially existed before. Fix this issue by checking the uptodate flag when attempting to reuse an internally held buffer, and reloading the metadata instead of reusing the buffer if the flag was lost. Link: https://lkml.kernel.org/r/20230818131804.7758-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+cdfcae656bac88ba0e2d@syzkaller.appspotmail.com Closes: https://lkml.kernel.org/r/0000000000003da75f05fdeffd12@google.com Fixes: 8c26c4e2694a ("nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption") Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> # 3.10+ Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-09-23 10:59:37 +02:00
Ryusuke Konishi	724474dfaa	nilfs2: fix general protection fault in nilfs_lookup_dirty_data_buffers() commit f83913f8c5b882a312e72b7669762f8a5c9385e4 upstream. A syzbot stress test reported that create_empty_buffers() called from nilfs_lookup_dirty_data_buffers() can cause a general protection fault. Analysis using its reproducer revealed that the back reference "mapping" from a page/folio has been changed to NULL after dirty page/folio gang lookup in nilfs_lookup_dirty_data_buffers(). Fix this issue by excluding pages/folios from being collected if, after acquiring a lock on each page/folio, its back reference "mapping" differs from the pointer to the address space struct that held the page/folio. Link: https://lkml.kernel.org/r/20230805132038.6435-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+0ad741797f4565e7e2d2@syzkaller.appspotmail.com Closes: https://lkml.kernel.org/r/0000000000002930a705fc32b231@google.com Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-09-23 10:59:37 +02:00
Gao Xiang	e83f5d13cb	erofs: ensure that the post-EOF tails are all zeroed commit e4c1cf523d820730a86cae2c6d55924833b6f7ac upstream. This was accidentally fixed up in commit e4c1cf523d82 but we can't take the full change due to other dependancy issues, so here is just the actual bugfix that is needed. [Background] keltargw reported an issue [1] that with mmaped I/Os, sometimes the tail of the last page (after file ends) is not filled with zeroes. The root cause is that such tail page could be wrongly selected for inplace I/Os so the zeroed part will then be filled with compressed data instead of zeroes. A simple fix is to avoid doing inplace I/Os for such tail parts, actually that was already fixed upstream in commit e4c1cf523d82 ("erofs: tidy up z_erofs_do_read_page()") by accident. [1] https://lore.kernel.org/r/3ad8b469-25db-a297-21f9-75db2d6ad224@linux.alibaba.com Reported-by: keltargw <keltar.gw@gmail.com> Fixes: 3883a79abd02 ("staging: erofs: introduce VLE decompression support") Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-09-23 10:59:36 +02:00
Benjamin Coddington	3b83759fd4	nfsd: Fix race to FREE_STATEID and cl_revoked commit 3b816601e279756e781e6c4d9b3f3bd21a72ac67 upstream. We have some reports of linux NFS clients that cannot satisfy a linux knfsd server that always sets SEQ4_STATUS_RECALLABLE_STATE_REVOKED even though those clients repeatedly walk all their known state using TEST_STATEID and receive NFS4_OK for all. Its possible for revoke_delegation() to set NFS4_REVOKED_DELEG_STID, then nfsd4_free_stateid() finds the delegation and returns NFS4_OK to FREE_STATEID. Afterward, revoke_delegation() moves the same delegation to cl_revoked. This would produce the observed client/server effect. Fix this by ensuring that the setting of sc_type to NFS4_REVOKED_DELEG_STID and move to cl_revoked happens within the same cl_lock. This will allow nfsd4_free_stateid() to properly remove the delegation from cl_revoked. Link: https://bugzilla.redhat.com/show_bug.cgi?id=2217103 Link: https://bugzilla.redhat.com/show_bug.cgi?id=2176575 Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Cc: stable@vger.kernel.org # v4.17+ Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-30 16:27:25 +02:00
Benjamin Coddington	a0bc5cf2e7	NFSv4: Fix dropped lock for racing OPEN and delegation return commit 1cbc11aaa01f80577b67ae02c73ee781112125fd upstream. Commmit f5ea16137a3f ("NFSv4: Retry LOCK on OLD_STATEID during delegation return") attempted to solve this problem by using nfs4's generic async error handling, but introduced a regression where v4.0 lock recovery would hang. The additional complexity introduced by overloading that error handling is not necessary for this case. This patch expects that commit to be reverted. The problem as originally explained in the above commit is: There's a small window where a LOCK sent during a delegation return can race with another OPEN on client, but the open stateid has not yet been updated. In this case, the client doesn't handle the OLD_STATEID error from the server and will lose this lock, emitting: "NFS: nfs4_handle_delegation_recall_error: unhandled error -10024". Fix this by using the old_stateid refresh helpers if the server replies with OLD_STATEID. Suggested-by: Trond Myklebust <trondmy@hammerspace.com> Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-30 16:27:24 +02:00
Alexander Aring	7b57fc3f4c	fs: dlm: fix mismatch of plock results from userspace [ Upstream commit 57e2c2f2d94cfd551af91cedfa1af6d972487197 ] When a waiting plock request (F_SETLKW) is sent to userspace for processing (dlm_controld), the result is returned at a later time. That result could be incorrectly matched to a different waiting request in cases where the owner field is the same (e.g. different threads in a process.) This is fixed by comparing all the properties in the request and reply. The results for non-waiting plock requests are now matched based on list order because the results are returned in the same order they were sent. Cc: stable@vger.kernel.org Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-08-30 16:27:21 +02:00
Alexander Aring	721d5b514d	fs: dlm: use dlm_plock_info for do_unlock_close [ Upstream commit 4d413ae9ced4180c0e2114553c3a7560b509b0f8 ] This patch refactors do_unlock_close() by using only struct dlm_plock_info as a parameter. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com> Stable-dep-of: 57e2c2f2d94c ("fs: dlm: fix mismatch of plock results from userspace") Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-08-30 16:27:21 +02:00
Alexander Aring	da794f6dd5	fs: dlm: change plock interrupted message to debug again [ Upstream commit ea06d4cabf529eefbe7e89e3a8325f1f89355ccd ] This patch reverses the commit bcfad4265ced ("dlm: improve plock logging if interrupted") by moving it to debug level and notifying the user an op was removed. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com> Stable-dep-of: 57e2c2f2d94c ("fs: dlm: fix mismatch of plock results from userspace") Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-08-30 16:27:20 +02:00
Alexander Aring	f03726ef19	fs: dlm: add pid to debug log [ Upstream commit 19d7ca051d303622c423b4cb39e6bde5d177328b ] This patch adds the pid information which requested the lock operation to the debug log output. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com> Stable-dep-of: 57e2c2f2d94c ("fs: dlm: fix mismatch of plock results from userspace") Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-08-30 16:27:20 +02:00
Jakob Koschel	8b73497e50	dlm: replace usage of found with dedicated list iterator variable [ Upstream commit dc1acd5c94699389a9ed023e94dd860c846ea1f6 ] To move the list iterator variable into the list_for_each_entry_() macro in the future it should be avoided to use the list iterator variable after the loop body. To never* use the list iterator variable after the loop it was concluded to use a separate iterator variable instead of a found boolean [1]. This removes the need to use a found variable and simply checking if the variable was set, can determine if the break/goto was hit. Link: https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=EHreAsk5SqXPwr9Y7k9sA6cWXJ6w@mail.gmail.com/ [1] Signed-off-by: Jakob Koschel <jakobkoschel@gmail.com> Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com> Stable-dep-of: 57e2c2f2d94c ("fs: dlm: fix mismatch of plock results from userspace") Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-08-30 16:27:20 +02:00
Alexander Aring	526cc04d71	dlm: improve plock logging if interrupted [ Upstream commit bcfad4265cedf3adcac355e994ef9771b78407bd ] This patch changes the log level if a plock is removed when interrupted from debug to info. Additional it signals now that the plock entity was removed to let the user know what's happening. If on a dev_write() a pending plock cannot be find it will signal that it might have been removed because wait interruption. Before this patch there might be a "dev_write no op ..." info message and the users can only guess that the plock was removed before because the wait interruption. To be sure that is the case we log both messages on the same log level. Let both message be logged on info layer because it should not happened a lot and if it happens it should be clear why the op was not found. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com> Stable-dep-of: 57e2c2f2d94c ("fs: dlm: fix mismatch of plock results from userspace") Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-08-30 16:27:20 +02:00
Russell Harmon via samba-technical	4259dd5342	cifs: Release folio lock on fscache read hit. commit 69513dd669e243928f7450893190915a88f84a2b upstream. Under the current code, when cifs_readpage_worker is called, the call contract is that the callee should unlock the page. This is documented in the read_folio section of Documentation/filesystems/vfs.rst as: > The filesystem should unlock the folio once the read has completed, > whether it was successful or not. Without this change, when fscache is in use and cache hit occurs during a read, the page lock is leaked, producing the following stack on subsequent reads (via mmap) to the page: $ cat /proc/3890/task/12864/stack [<0>] folio_wait_bit_common+0x124/0x350 [<0>] filemap_read_folio+0xad/0xf0 [<0>] filemap_fault+0x8b1/0xab0 [<0>] __do_fault+0x39/0x150 [<0>] do_fault+0x25c/0x3e0 [<0>] __handle_mm_fault+0x6ca/0xc70 [<0>] handle_mm_fault+0xe9/0x350 [<0>] do_user_addr_fault+0x225/0x6c0 [<0>] exc_page_fault+0x84/0x1b0 [<0>] asm_exc_page_fault+0x27/0x30 This requires a reboot to resolve; it is a deadlock. Note however that the call to cifs_readpage_from_fscache does mark the page clean, but does not free the folio lock. This happens in __cifs_readpage_from_fscache on success. Releasing the lock at that point however is not appropriate as cifs_readahead also calls cifs_readpage_from_fscache and does unconditionally release the lock after its return. This change therefore effectively makes cifs_readpage_worker work like cifs_readahead. Signed-off-by: Russell Harmon <russ@har.mn> Acked-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Reviewed-by: David Howells <dhowells@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-30 16:27:19 +02:00
xiaoshoukui	a0a462a0f2	btrfs: fix BUG_ON condition in btrfs_cancel_balance commit 29eefa6d0d07e185f7bfe9576f91e6dba98189c2 upstream. Pausing and canceling balance can race to interrupt balance lead to BUG_ON panic in btrfs_cancel_balance. The BUG_ON condition in btrfs_cancel_balance does not take this race scenario into account. However, the race condition has no other side effects. We can fix that. Reproducing it with panic trace like this: kernel BUG at fs/btrfs/volumes.c:4618! RIP: 0010:btrfs_cancel_balance+0x5cf/0x6a0 Call Trace: <TASK> ? do_nanosleep+0x60/0x120 ? hrtimer_nanosleep+0xb7/0x1a0 ? sched_core_clone_cookie+0x70/0x70 btrfs_ioctl_balance_ctl+0x55/0x70 btrfs_ioctl+0xa46/0xd20 __x64_sys_ioctl+0x7d/0xa0 do_syscall_64+0x38/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Race scenario as follows: > mutex_unlock(&fs_info->balance_mutex); > -------------------- > .......issue pause and cancel req in another thread > -------------------- > ret = __btrfs_balance(fs_info); > > mutex_lock(&fs_info->balance_mutex); > if (ret == -ECANCELED && atomic_read(&fs_info->balance_pause_req)) { > btrfs_info(fs_info, "balance: paused"); > btrfs_exclop_balance(fs_info, BTRFS_EXCLOP_BALANCE_PAUSED); > } CC: stable@vger.kernel.org # 4.19+ Signed-off-by: xiaoshoukui <xiaoshoukui@ruijie.com.cn> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-30 16:27:15 +02:00
Trond Myklebust	12c4c22789	nfsd: Remove incorrect check in nfsd4_validate_stateid [ Upstream commit f75546f58a70da5cfdcec5a45ffc377885ccbee8 ] If the client is calling TEST_STATEID, then it is because some event occurred that requires it to check all the stateids for validity and call FREE_STATEID on the ones that have been revoked. In this case, either the stateid exists in the list of stateids associated with that nfs4_client, in which case it should be tested, or it does not. There are no additional conditions to be considered. Reported-by: "Frank Ch. Eigler" <fche@redhat.com> Fixes: 7df302f75ee2 ("NFSD: TEST_STATEID should not return NFS4ERR_STALE_STATEID") Cc: stable@vger.kernel.org # v5.7+ Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-08-30 16:27:14 +02:00
J. Bruce Fields	a4e3c4cd02	nfsd4: kill warnings on testing stateids with mismatched clientids [ Upstream commit 663e36f07666ff924012defa521f88875f6e5402 ] It's normal for a client to test a stateid from a previous instance, e.g. after a network partition. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Stable-dep-of: f75546f58a70 ("nfsd: Remove incorrect check in nfsd4_validate_stateid") Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-08-30 16:27:14 +02:00
Tuo Li	b4a7ab57ef	gfs2: Fix possible data races in gfs2_show_options() [ Upstream commit 6fa0a72cbbe45db4ed967a51f9e6f4e3afe61d20 ] Some fields such as gt_logd_secs of the struct gfs2_tune are accessed without holding the lock gt_spin in gfs2_show_options(): val = sdp->sd_tune.gt_logd_secs; if (val != 30) seq_printf(s, ",commit=%d", val); And thus can cause data races when gfs2_show_options() and other functions such as gfs2_reconfigure() are concurrently executed: spin_lock(&gt->gt_spin); gt->gt_logd_secs = newargs->ar_commit; To fix these possible data races, the lock sdp->sd_tune.gt_spin is acquired before accessing the fields of gfs2_tune and released after these accesses. Further changes by Andreas: - Don't hold the spin lock over the seq_printf operations. Reported-by: BassCheck <bass@buaa.edu.cn> Signed-off-by: Tuo Li <islituo@gmail.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-08-30 16:27:10 +02:00
Immad Mir	2a8807f9f5	FS: JFS: Check for read-only mounted filesystem in txBegin [ Upstream commit 95e2b352c03b0a86c5717ba1d24ea20969abcacc ] This patch adds a check for read-only mounted filesystem in txBegin before starting a transaction potentially saving from NULL pointer deref. Signed-off-by: Immad Mir <mirimmad17@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-08-30 16:27:10 +02:00
Immad Mir	a7d17d6bd7	FS: JFS: Fix null-ptr-deref Read in txBegin [ Upstream commit 47cfdc338d674d38f4b2f22b7612cc6a2763ba27 ] Syzkaller reported an issue where txBegin may be called on a superblock in a read-only mounted filesystem which leads to NULL pointer deref. This could be solved by checking if the filesystem is read-only before calling txBegin, and returning with appropiate error code. Reported-By: syzbot+f1faa20eec55e0c8644c@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?id=be7e52c50c5182cc09a09ea6fc456446b2039de3 Signed-off-by: Immad Mir <mirimmad17@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-08-30 16:27:10 +02:00
Yogesh	6e7d9d76e5	fs: jfs: Fix UBSAN: array-index-out-of-bounds in dbAllocDmapLev [ Upstream commit 4e302336d5ca1767a06beee7596a72d3bdc8d983 ] Syzkaller reported the following issue: UBSAN: array-index-out-of-bounds in fs/jfs/jfs_dmap.c:1965:6 index -84 is out of range for type 's8[341]' (aka 'signed char[341]') CPU: 1 PID: 4995 Comm: syz-executor146 Not tainted 6.4.0-rc6-syzkaller-00037-gb6dad5178cea #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/27/2023 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106 ubsan_epilogue lib/ubsan.c:217 [inline] __ubsan_handle_out_of_bounds+0x11c/0x150 lib/ubsan.c:348 dbAllocDmapLev+0x3e5/0x430 fs/jfs/jfs_dmap.c:1965 dbAllocCtl+0x113/0x920 fs/jfs/jfs_dmap.c:1809 dbAllocAG+0x28f/0x10b0 fs/jfs/jfs_dmap.c:1350 dbAlloc+0x658/0xca0 fs/jfs/jfs_dmap.c:874 dtSplitUp fs/jfs/jfs_dtree.c:974 [inline] dtInsert+0xda7/0x6b00 fs/jfs/jfs_dtree.c:863 jfs_create+0x7b6/0xbb0 fs/jfs/namei.c:137 lookup_open fs/namei.c:3492 [inline] open_last_lookups fs/namei.c:3560 [inline] path_openat+0x13df/0x3170 fs/namei.c:3788 do_filp_open+0x234/0x490 fs/namei.c:3818 do_sys_openat2+0x13f/0x500 fs/open.c:1356 do_sys_open fs/open.c:1372 [inline] __do_sys_openat fs/open.c:1388 [inline] __se_sys_openat fs/open.c:1383 [inline] __x64_sys_openat+0x247/0x290 fs/open.c:1383 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f1f4e33f7e9 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 51 14 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffc21129578 EFLAGS: 00000246 ORIG_RAX: 0000000000000101 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1f4e33f7e9 RDX: 000000000000275a RSI: 0000000020000040 RDI: 00000000ffffff9c RBP: 00007f1f4e2ff080 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f1f4e2ff110 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 </TASK> The bug occurs when the dbAllocDmapLev()function attempts to access dp->tree.stree[leafidx + LEAFIND] while the leafidx value is negative. To rectify this, the patch introduces a safeguard within the dbAllocDmapLev() function. A check has been added to verify if leafidx is negative. If it is, the function immediately returns an I/O error, preventing any further execution that could potentially cause harm. Tested via syzbot. Reported-by: syzbot+853a6f4dfa3cf37d3aea@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=ae2f5a27a07ae44b0f17 Signed-off-by: Yogesh <yogi.kernel@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-08-30 16:27:09 +02:00
Jan Kara	3f1368af47	udf: Fix uninitialized array access for some pathnames [ Upstream commit 028f6055c912588e6f72722d89c30b401bbcf013 ] For filenames that begin with . and are between 2 and 5 characters long, UDF charset conversion code would read uninitialized memory in the output buffer. The only practical impact is that the name may be prepended a "unification hash" when it is not actually needed but still it is good to fix this. Reported-by: syzbot+cd311b1e43cc25f90d18@syzkaller.appspotmail.com Link: https://lore.kernel.org/all/000000000000e2638a05fe9dc8f9@google.com Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-08-30 16:27:09 +02:00
Christian Brauner	8f203dd401	ovl: check type and offset of struct vfsmount in ovl_entry [ Upstream commit f723edb8a532cd26e1ff0a2b271d73762d48f762 ] Porting overlayfs to the new amount api I started experiencing random crashes that couldn't be explained easily. So after much debugging and reasoning it became clear that struct ovl_entry requires the point to struct vfsmount to be the first member and of type struct vfsmount. During the port I added a new member at the beginning of struct ovl_entry which broke all over the place in the form of random crashes and cache corruptions. While there's a comment in ovl_free_fs() to the effect of "Hack! Reuse ofs->layers as a vfsmount array before freeing it" there's no such comment on struct ovl_entry which makes this easy to trip over. Add a comment and two static asserts for both the offset and the type of pointer in struct ovl_entry. Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-08-30 16:27:09 +02:00
Ye Bin	3f378783c4	quota: fix warning in dqgrab() [ Upstream commit d6a95db3c7ad160bc16b89e36449705309b52bcb ] There's issue as follows when do fault injection: WARNING: CPU: 1 PID: 14870 at include/linux/quotaops.h:51 dquot_disable+0x13b7/0x18c0 Modules linked in: CPU: 1 PID: 14870 Comm: fsconfig Not tainted 6.3.0-next-20230505-00006-g5107a9c821af-dirty #541 RIP: 0010:dquot_disable+0x13b7/0x18c0 RSP: 0018:ffffc9000acc79e0 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88825e41b980 RDX: 0000000000000000 RSI: ffff88825e41b980 RDI: 0000000000000002 RBP: ffff888179f68000 R08: ffffffff82087ca7 R09: 0000000000000000 R10: 0000000000000001 R11: ffffed102f3ed026 R12: ffff888179f68130 R13: ffff888179f68110 R14: dffffc0000000000 R15: ffff888179f68118 FS: 00007f450a073740(0000) GS:ffff88882fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffe96f2efd8 CR3: 000000025c8ad000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> dquot_load_quota_sb+0xd53/0x1060 dquot_resume+0x172/0x230 ext4_reconfigure+0x1dc6/0x27b0 reconfigure_super+0x515/0xa90 __x64_sys_fsconfig+0xb19/0xd20 do_syscall_64+0x39/0xb0 entry_SYSCALL_64_after_hwframe+0x63/0xcd Above issue may happens as follows: ProcessA ProcessB ProcessC sys_fsconfig vfs_fsconfig_locked reconfigure_super ext4_remount dquot_suspend -> suspend all type quota sys_fsconfig vfs_fsconfig_locked reconfigure_super ext4_remount dquot_resume ret = dquot_load_quota_sb add_dquot_ref do_open -> open file O_RDWR vfs_open do_dentry_open get_write_access atomic_inc_unless_negative(&inode->i_writecount) ext4_file_open dquot_file_open dquot_initialize __dquot_initialize dqget atomic_inc(&dquot->dq_count); __dquot_initialize __dquot_initialize dqget if (!test_bit(DQ_ACTIVE_B, &dquot->dq_flags)) ext4_acquire_dquot -> Return error DQ_ACTIVE_B flag isn't set dquot_disable invalidate_dquots if (atomic_read(&dquot->dq_count)) dqgrab WARN_ON_ONCE(!test_bit(DQ_ACTIVE_B, &dquot->dq_flags)) -> Trigger warning In the above scenario, 'dquot->dq_flags' has no DQ_ACTIVE_B is normal when dqgrab(). To solve above issue just replace the dqgrab() use in invalidate_dquots() with atomic_inc(&dquot->dq_count). Signed-off-by: Ye Bin <yebin10@huawei.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230605140731.2427629-3-yebin10@huawei.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-08-30 16:27:09 +02:00
Jan Kara	c3a1f5ba11	quota: Properly disable quotas when add_dquot_ref() fails [ Upstream commit 6a4e3363792e30177cc3965697e34ddcea8b900b ] When add_dquot_ref() fails (usually due to IO error or ENOMEM), we want to disable quotas we are trying to enable. However dquot_disable() call was passed just the flags we are enabling so in case flags == DQUOT_USAGE_ENABLED dquot_disable() call will just fail with EINVAL instead of properly disabling quotas. Fix the problem by always passing DQUOT_LIMITS_ENABLED \| DQUOT_USAGE_ENABLED to dquot_disable() in this case. Reported-and-tested-by: Ye Bin <yebin10@huawei.com> Reported-by: syzbot+e633c79ceaecbf479854@syzkaller.appspotmail.com Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230605140731.2427629-2-yebin10@huawei.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-08-30 16:27:09 +02:00
Josef Bacik	756c024698	btrfs: set cache_block_group_error if we find an error commit 92fb94b69c6accf1e49fff699640fa0ce03dc910 upstream. We set cache_block_group_error if btrfs_cache_block_group() returns an error, this is because we could end up not finding space to allocate and mistakenly return -ENOSPC, and which could then abort the transaction with the incorrect errno, and in the case of ENOSPC result in a WARN_ON() that will trip up tests like generic/475. However there's the case where multiple threads can be racing, one thread gets the proper error, and the other thread doesn't actually call btrfs_cache_block_group(), it instead sees ->cached == BTRFS_CACHE_ERROR. Again the result is the same, we fail to allocate our space and return -ENOSPC. Instead we need to set cache_block_group_error to -EIO in this case to make sure that if we do not make our allocation we get the appropriate error returned back to the caller. CC: stable@vger.kernel.org # 4.14+ Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-16 18:19:24 +02:00
Christoph Hellwig	fa7bc2684a	btrfs: don't stop integrity writeback too early commit effa24f689ce0948f68c754991a445a8d697d3a8 upstream. extent_write_cache_pages stops writing pages as soon as nr_to_write hits zero. That is the right thing for opportunistic writeback, but incorrect for data integrity writeback, which needs to ensure that no dirty pages are left in the range. Thus only stop the writeback for WB_SYNC_NONE if nr_to_write hits 0. This is a port of write_cache_pages changes in commit 05fe478dd04e ("mm: write_cache_pages integrity fix"). Note that I've only trigger the problem with other changes to the btrfs writeback code, but this condition seems worthwhile fixing anyway. CC: stable@vger.kernel.org # 4.14+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> [ updated comment ] Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-16 18:19:24 +02:00
Ryusuke Konishi	d2c539c216	nilfs2: fix use-after-free of nilfs_root in dirtying inodes via iput commit f8654743a0e6909dc634cbfad6db6816f10f3399 upstream. During unmount process of nilfs2, nothing holds nilfs_root structure after nilfs2 detaches its writer in nilfs_detach_log_writer(). Previously, nilfs_evict_inode() could cause use-after-free read for nilfs_root if inodes are left in "garbage_list" and released by nilfs_dispose_list at the end of nilfs_detach_log_writer(), and this bug was fixed by commit 9b5a04ac3ad9 ("nilfs2: fix use-after-free bug of nilfs_root in nilfs_evict_inode()"). However, it turned out that there is another possibility of UAF in the call path where mark_inode_dirty_sync() is called from iput(): nilfs_detach_log_writer() nilfs_dispose_list() iput() mark_inode_dirty_sync() __mark_inode_dirty() nilfs_dirty_inode() __nilfs_mark_inode_dirty() nilfs_load_inode_block() --> causes UAF of nilfs_root struct This can happen after commit 0ae45f63d4ef ("vfs: add support for a lazytime mount option"), which changed iput() to call mark_inode_dirty_sync() on its final reference if i_state has I_DIRTY_TIME flag and i_nlink is non-zero. This issue appears after commit 28a65b49eb53 ("nilfs2: do not write dirty data after degenerating to read-only") when using the syzbot reproducer, but the issue has potentially existed before. Fix this issue by adding a "purging flag" to the nilfs structure, setting that flag while disposing the "garbage_list" and checking it in __nilfs_mark_inode_dirty(). Unlike commit 9b5a04ac3ad9 ("nilfs2: fix use-after-free bug of nilfs_root in nilfs_evict_inode()"), this patch does not rely on ns_writer to determine whether to skip operations, so as not to break recovery on mount. The nilfs_salvage_orphan_logs routine dirties the buffer of salvaged data before attaching the log writer, so changing __nilfs_mark_inode_dirty() to skip the operation when ns_writer is NULL will cause recovery write to fail. The purpose of using the cleanup-only flag is to allow for narrowing of such conditions. Link: https://lkml.kernel.org/r/20230728191318.33047-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+74db8b3087f293d3a13a@syzkaller.appspotmail.com Closes: https://lkml.kernel.org/r/000000000000b4e906060113fd63@google.com Fixes: 0ae45f63d4ef ("vfs: add support for a lazytime mount option") Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> # 4.0+ Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-16 18:19:23 +02:00
Xiubo Li	72c946246e	ceph: defer stopping mdsc delayed_work [ Upstream commit e7e607bd00481745550389a29ecabe33e13d67cf ] Flushing the dirty buffer may take a long time if the cluster is overloaded or if there is network issue. So we should ping the MDSs periodically to keep alive, else the MDS will blocklist the kclient. Cc: stable@vger.kernel.org Link: https://tracker.ceph.com/issues/61843 Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-08-11 11:54:01 +02:00
Jeff Layton	f82fe11a30	ceph: use kill_anon_super helper [ Upstream commit 470a5c77eac0e07bfe60413fb3d314b734392bc3 ] ceph open-codes this around some other activity and the rationale for it isn't clear. There is no need to delay free_anon_bdev until the end of kill_sb. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Stable-dep-of: e7e607bd0048 ("ceph: defer stopping mdsc delayed_work") Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-08-11 11:54:01 +02:00
Jeff Layton	82edffead5	ceph: show tasks waiting on caps in debugfs caps file [ Upstream commit 3a3430affce5de301fc8e6e50fa3543d7597820e ] Add some visibility of tasks that are waiting for caps to the "caps" debugfs file. Display the tgid of the waiting task, inode number, and the caps the task needs and wants. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Stable-dep-of: e7e607bd0048 ("ceph: defer stopping mdsc delayed_work") Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-08-11 11:54:00 +02:00
Jan Kara	040cdadf9f	ext2: Drop fragment support commit 404615d7f1dcd4cca200e9a7a9df3a1dcae1dd62 upstream. Ext2 has fields in superblock reserved for subblock allocation support. However that never landed. Drop the many years dead code. Reported-by: syzbot+af5e10f73dbff48f70af@syzkaller.appspotmail.com Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-11 11:53:59 +02:00
Jan Kara	0336b42456	fs: Protect reconfiguration of sb read-write from racing writes commit c541dce86c537714b6761a79a969c1623dfa222b upstream. The reconfigure / remount code takes a lot of effort to protect filesystem's reconfiguration code from racing writes on remounting read-only. However during remounting read-only filesystem to read-write mode userspace writes can start immediately once we clear SB_RDONLY flag. This is inconvenient for example for ext4 because we need to do some writes to the filesystem (such as preparation of quota files) before we can take userspace writes so we are clearing SB_RDONLY flag before we are fully ready to accept userpace writes and syzbot has found a way to exploit this [1]. Also as far as I'm reading the code the filesystem remount code was protected from racing writes in the legacy mount path by the mount's MNT_READONLY flag so this is relatively new problem. It is actually fairly easy to protect remount read-write from racing writes using sb->s_readonly_remount flag so let's just do that instead of having to workaround these races in the filesystem code. [1] https://lore.kernel.org/all/00000000000006a0df05f6667499@google.com/T/ Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230615113848.8439-1-jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-11 11:53:59 +02:00
Prince Kumar Maurya	0a44ceba77	fs/sysv: Null check to prevent null-ptr-deref bug commit ea2b62f305893992156a798f665847e0663c9f41 upstream. sb_getblk(inode->i_sb, parent) return a null ptr and taking lock on that leads to the null-ptr-deref bug. Reported-by: syzbot+aad58150cbc64ba41bdc@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=aad58150cbc64ba41bdc Signed-off-by: Prince Kumar Maurya <princekumarmaurya06@gmail.com> Message-Id: <20230531013141.19487-1-princekumarmaurya06@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-11 11:53:59 +02:00
Filipe Manana	0e0f324c25	btrfs: fix race between quota disable and quota assign ioctls commit 2f1a6be12ab6c8470d5776e68644726c94257c54 upstream. The quota assign ioctl can currently run in parallel with a quota disable ioctl call. The assign ioctl uses the quota root, while the disable ioctl frees that root, and therefore we can have a use-after-free triggered in the assign ioctl, leading to a trace like the following when KASAN is enabled: [672.723][T736] BUG: KASAN: slab-use-after-free in btrfs_search_slot+0x2962/0x2db0 [672.723][T736] Read of size 8 at addr ffff888022ec0208 by task btrfs_search_sl/27736 [672.724][T736] [672.725][T736] CPU: 1 PID: 27736 Comm: btrfs_search_sl Not tainted 6.3.0-rc3 #37 [672.723][T736] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 [672.727][T736] Call Trace: [672.728][T736] <TASK> [672.728][T736] dump_stack_lvl+0xd9/0x150 [672.725][T736] print_report+0xc1/0x5e0 [672.720][T736] ? __virt_addr_valid+0x61/0x2e0 [672.727][T736] ? __phys_addr+0xc9/0x150 [672.725][T736] ? btrfs_search_slot+0x2962/0x2db0 [672.722][T736] kasan_report+0xc0/0xf0 [672.729][T736] ? btrfs_search_slot+0x2962/0x2db0 [672.724][T736] btrfs_search_slot+0x2962/0x2db0 [672.723][T736] ? fs_reclaim_acquire+0xba/0x160 [672.722][T736] ? split_leaf+0x13d0/0x13d0 [672.726][T736] ? rcu_is_watching+0x12/0xb0 [672.723][T736] ? kmem_cache_alloc+0x338/0x3c0 [672.722][T736] update_qgroup_status_item+0xf7/0x320 [672.724][T736] ? add_qgroup_rb+0x3d0/0x3d0 [672.739][T736] ? do_raw_spin_lock+0x12d/0x2b0 [672.730][T736] ? spin_bug+0x1d0/0x1d0 [672.737][T736] btrfs_run_qgroups+0x5de/0x840 [672.730][T736] ? btrfs_qgroup_rescan_worker+0xa70/0xa70 [672.738][T736] ? __del_qgroup_relation+0x4ba/0xe00 [672.738][T736] btrfs_ioctl+0x3d58/0x5d80 [672.735][T736] ? tomoyo_path_number_perm+0x16a/0x550 [672.737][T736] ? tomoyo_execute_permission+0x4a0/0x4a0 [672.731][T736] ? btrfs_ioctl_get_supported_features+0x50/0x50 [672.737][T736] ? __sanitizer_cov_trace_switch+0x54/0x90 [672.734][T736] ? do_vfs_ioctl+0x132/0x1660 [672.730][T736] ? vfs_fileattr_set+0xc40/0xc40 [672.730][T736] ? _raw_spin_unlock_irq+0x2e/0x50 [672.732][T736] ? sigprocmask+0xf2/0x340 [672.737][T736] ? __fget_files+0x26a/0x480 [672.732][T736] ? bpf_lsm_file_ioctl+0x9/0x10 [672.738][T736] ? btrfs_ioctl_get_supported_features+0x50/0x50 [672.736][T736] __x64_sys_ioctl+0x198/0x210 [672.736][T736] do_syscall_64+0x39/0xb0 [672.731][T736] entry_SYSCALL_64_after_hwframe+0x63/0xcd [672.739][T736] RIP: 0033:0x4556ad [672.742][T736] </TASK> [672.743][T736] [672.748][T736] Allocated by task 27677: [672.743][T736] kasan_save_stack+0x22/0x40 [672.741][T736] kasan_set_track+0x25/0x30 [672.741][T736] __kasan_kmalloc+0xa4/0xb0 [672.749][T736] btrfs_alloc_root+0x48/0x90 [672.746][T736] btrfs_create_tree+0x146/0xa20 [672.744][T736] btrfs_quota_enable+0x461/0x1d20 [672.743][T736] btrfs_ioctl+0x4a1c/0x5d80 [672.747][T736] __x64_sys_ioctl+0x198/0x210 [672.749][T736] do_syscall_64+0x39/0xb0 [672.744][T736] entry_SYSCALL_64_after_hwframe+0x63/0xcd [672.756][T736] [672.757][T736] Freed by task 27677: [672.759][T736] kasan_save_stack+0x22/0x40 [672.759][T736] kasan_set_track+0x25/0x30 [672.756][T736] kasan_save_free_info+0x2e/0x50 [672.751][T736] ____kasan_slab_free+0x162/0x1c0 [672.758][T736] slab_free_freelist_hook+0x89/0x1c0 [672.752][T736] __kmem_cache_free+0xaf/0x2e0 [672.752][T736] btrfs_put_root+0x1ff/0x2b0 [672.759][T736] btrfs_quota_disable+0x80a/0xbc0 [672.752][T736] btrfs_ioctl+0x3e5f/0x5d80 [672.756][T736] __x64_sys_ioctl+0x198/0x210 [672.753][T736] do_syscall_64+0x39/0xb0 [672.765][T736] entry_SYSCALL_64_after_hwframe+0x63/0xcd [672.769][T736] [672.768][T736] The buggy address belongs to the object at ffff888022ec0000 [672.768][T736] which belongs to the cache kmalloc-4k of size 4096 [672.769][T736] The buggy address is located 520 bytes inside of [672.769][T736] freed 4096-byte region [ffff888022ec0000, ffff888022ec1000) [672.760][T736] [672.764][T736] The buggy address belongs to the physical page: [672.761][T736] page:ffffea00008bb000 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x22ec0 [672.766][T736] head:ffffea00008bb000 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0 [672.779][T736] flags: 0xfff00000010200(slab\|head\|node=0\|zone=1\|lastcpupid=0x7ff) [672.770][T736] raw: 00fff00000010200 ffff888012842140 ffffea000054ba00 dead000000000002 [672.770][T736] raw: 0000000000000000 0000000000040004 00000001ffffffff 0000000000000000 [672.771][T736] page dumped because: kasan: bad access detected [672.778][T736] page_owner tracks the page as allocated [672.777][T736] page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd2040(__GFP_IO\|__GFP_NOWARN\|__GFP_NORETRY\|__GFP_COMP\|__GFP_NOMEMALLOC), pid 88 [672.779][T736] get_page_from_freelist+0x119c/0x2d50 [672.779][T736] __alloc_pages+0x1cb/0x4a0 [672.776][T736] alloc_pages+0x1aa/0x270 [672.773][T736] allocate_slab+0x260/0x390 [672.771][T736] ___slab_alloc+0xa9a/0x13e0 [672.778][T736] __slab_alloc.constprop.0+0x56/0xb0 [672.771][T736] __kmem_cache_alloc_node+0x136/0x320 [672.789][T736] __kmalloc+0x4e/0x1a0 [672.783][T736] tomoyo_realpath_from_path+0xc3/0x600 [672.781][T736] tomoyo_path_perm+0x22f/0x420 [672.782][T736] tomoyo_path_unlink+0x92/0xd0 [672.780][T736] security_path_unlink+0xdb/0x150 [672.788][T736] do_unlinkat+0x377/0x680 [672.788][T736] __x64_sys_unlink+0xca/0x110 [672.789][T736] do_syscall_64+0x39/0xb0 [672.783][T736] entry_SYSCALL_64_after_hwframe+0x63/0xcd [672.784][T736] page last free stack trace: [672.787][T736] free_pcp_prepare+0x4e5/0x920 [672.787][T736] free_unref_page+0x1d/0x4e0 [672.784][T736] __unfreeze_partials+0x17c/0x1a0 [672.797][T736] qlist_free_all+0x6a/0x180 [672.796][T736] kasan_quarantine_reduce+0x189/0x1d0 [672.797][T736] __kasan_slab_alloc+0x64/0x90 [672.793][T736] kmem_cache_alloc+0x17c/0x3c0 [672.799][T736] getname_flags.part.0+0x50/0x4e0 [672.799][T736] getname_flags+0x9e/0xe0 [672.792][T736] vfs_fstatat+0x77/0xb0 [672.791][T736] __do_sys_newlstat+0x84/0x100 [672.798][T736] do_syscall_64+0x39/0xb0 [672.796][T736] entry_SYSCALL_64_after_hwframe+0x63/0xcd [672.790][T736] [672.791][T736] Memory state around the buggy address: [672.799][T736] ffff888022ec0100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [672.805][T736] ffff888022ec0180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [672.802][T736] >ffff888022ec0200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [672.809][T736] ^ [672.809][T736] ffff888022ec0280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [672.809][T736] ffff888022ec0300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb Fix this by having the qgroup assign ioctl take the qgroup ioctl mutex before calling btrfs_run_qgroups(), which is what all qgroup ioctls should call. Reported-by: butt3rflyh4ck <butterflyhuangxx@gmail.com> Link: https://lore.kernel.org/linux-btrfs/CAFcO6XN3VD8ogmHwqRk4kbiwtpUSNySu2VAxN8waEPciCHJvMA@mail.gmail.com/ CC: stable@vger.kernel.org # 5.10+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Harshvardhan Jha <harshvardhan.j.jha@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-11 11:53:54 +02:00
Marcos Paulo de Souza	4f8f86bc5d	btrfs: qgroup: return ENOTCONN instead of EINVAL when quotas are not enabled commit 8a36e408d40606e21cd4e2dd9601004a67b14868 upstream. [PROBLEM] qgroup create/remove code is currently returning EINVAL when the user tries to create a qgroup on a subvolume without quota enabled. EINVAL is already being used for too many error scenarios so that is hard to depict what is the problem. [FIX] Currently scrub and balance code return -ENOTCONN when the user tries to cancel/pause and no scrub or balance is currently running for the desired subvolume. Do the same here by returning -ENOTCONN when a user tries to create/delete/assing/list a qgroup on a subvolume without quota enabled. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Harshvardhan Jha <harshvardhan.j.jha@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-11 11:53:53 +02:00
Marcos Paulo de Souza	8c1d1f3a33	btrfs: qgroup: remove one-time use variables for quota_root checks commit e3b0edd29737d44137fc7583a9c185abda6e23b8 upstream. Remove some variables that are set only to be checked later, and never used. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Harshvardhan Jha <harshvardhan.j.jha@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-11 11:53:53 +02:00
Filipe Manana	6db2a3c5c2	btrfs: check if the transaction was aborted at btrfs_wait_for_commit() [ Upstream commit bf7ecbe9875061bf3fce1883e3b26b77f847d1e8 ] At btrfs_wait_for_commit() we wait for a transaction to finish and then always return 0 (success) without checking if it was aborted, in which case the transaction didn't happen due to some critical error. Fix this by checking if the transaction was aborted. Fixes: 462045928bda ("Btrfs: add START_SYNC, WAIT_SYNC ioctls") CC: stable@vger.kernel.org # 4.19+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-08-11 11:53:52 +02:00
Filipe Manana	d1c6e68003	btrfs: check for commit error at btrfs_attach_transaction_barrier() commit b28ff3a7d7e97456fd86b68d24caa32e1cfa7064 upstream. btrfs_attach_transaction_barrier() is used to get a handle pointing to the current running transaction if the transaction has not started its commit yet (its state is < TRANS_STATE_COMMIT_START). If the transaction commit has started, then we wait for the transaction to commit and finish before returning - however we completely ignore if the transaction was aborted due to some error during its commit, we simply return ERR_PT(-ENOENT), which makes the caller assume everything is fine and no errors happened. This could make an fsync return success (0) to user space when in fact we had a transaction abort and the target inode changes were therefore not persisted. Fix this by checking for the return value from btrfs_wait_for_commit(), and if it returned an error, return it back to the caller. Fixes: d4edf39bd5db ("Btrfs: fix uncompleted transaction") CC: stable@vger.kernel.org # 4.19+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-11 11:53:52 +02:00
Chao Yu	6b1ee62ecb	ext4: fix to check return value of freeze_bdev() in ext4_shutdown() [ Upstream commit c4d13222afd8a64bf11bc7ec68645496ee8b54b9 ] freeze_bdev() can fail due to a lot of reasons, it needs to check its reason before later process. Fixes: 783d94854499 ("ext4: add EXT4_IOC_GOINGDOWN ioctl") Cc: stable@kernel.org Signed-off-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20230606073203.1310389-1-chao@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-08-11 11:53:47 +02:00
Alexander Aring	bd020c7763	fs: dlm: interrupt posix locks only when process is killed [ Upstream commit 59e45c758ca1b9893ac923dd63536da946ac333b ] If a posix lock request is waiting for a result from user space (dlm_controld), do not let it be interrupted unless the process is killed. This reverts commit a6b1533e9a57 ("dlm: make posix locks interruptible"). The problem with the interruptible change is that all locks were cleared on any signal interrupt. If a signal was received that did not terminate the process, the process could continue running after all its dlm posix locks had been cleared. A future patch will add cancelation to allow proper interruption. Cc: stable@vger.kernel.org Fixes: a6b1533e9a57 ("dlm: make posix locks interruptible") Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-08-11 11:53:45 +02:00
Alexander Aring	f61d5752ae	dlm: rearrange async condition return [ Upstream commit a800ba77fd285c6391a82819867ac64e9ab3af46 ] This patch moves the return of FILE_LOCK_DEFERRED a little bit earlier than checking afterwards again if the request was an asynchronous request. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com> Stable-dep-of: 59e45c758ca1 ("fs: dlm: interrupt posix locks only when process is killed") Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-08-11 11:53:45 +02:00
Alexander Aring	ed092c495e	dlm: cleanup plock_op vs plock_xop [ Upstream commit bcbb4ba6c9ba81e6975b642a2cade68044cd8a66 ] Lately the different casting between plock_op and plock_xop and list holders which was involved showed some issues which were hard to see. This patch removes the "plock_xop" structure and introduces a "struct plock_async_data". This structure will be set in "struct plock_op" in case of asynchronous lock handling as the original "plock_xop" was made for. There is no need anymore to cast pointers around for additional fields in case of asynchronous lock handling. As disadvantage another allocation was introduces but only needed in the asynchronous case which is currently only used in combination with nfs lockd. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com> Stable-dep-of: 59e45c758ca1 ("fs: dlm: interrupt posix locks only when process is killed") Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-08-11 11:53:45 +02:00
Zhihao Cheng	8eb15ff216	ext4: Fix reusing stale buffer heads from last failed mounting [ Upstream commit 26fb5290240dc31cae99b8b4dd2af7f46dfcba6b ] Following process makes ext4 load stale buffer heads from last failed mounting in a new mounting operation: mount_bdev ext4_fill_super \| ext4_load_and_init_journal \| ext4_load_journal \| jbd2_journal_load \| load_superblock \| journal_get_superblock \| set_buffer_verified(bh) // buffer head is verified \| jbd2_journal_recover // failed caused by EIO \| goto failed_mount3a // skip 'sb->s_root' initialization deactivate_locked_super kill_block_super generic_shutdown_super if (sb->s_root) // false, skip ext4_put_super->invalidate_bdev-> // invalidate_mapping_pages->mapping_evict_folio-> // filemap_release_folio->try_to_free_buffers, which // cannot drop buffer head. blkdev_put blkdev_put_whole if (atomic_dec_and_test(&bdev->bd_openers)) // false, systemd-udev happens to open the device. Then // blkdev_flush_mapping->kill_bdev->truncate_inode_pages-> // truncate_inode_folio->truncate_cleanup_folio-> // folio_invalidate->block_invalidate_folio-> // filemap_release_folio->try_to_free_buffers will be skipped, // dropping buffer head is missed again. Second mount: ext4_fill_super ext4_load_and_init_journal ext4_load_journal ext4_get_journal jbd2_journal_init_inode journal_init_common bh = getblk_unmovable bh = __find_get_block // Found stale bh in last failed mounting journal->j_sb_buffer = bh jbd2_journal_load load_superblock journal_get_superblock if (buffer_verified(bh)) // true, skip journal->j_format_version = 2, value is 0 jbd2_journal_recover do_one_pass next_log_block += count_tags(journal, bh) // According to journal_tag_bytes(), 'tag_bytes' calculating is // affected by jbd2_has_feature_csum3(), jbd2_has_feature_csum3() // returns false because 'j->j_format_version >= 2' is not true, // then we get wrong next_log_block. The do_one_pass may exit // early whenoccuring non JBD2_MAGIC_NUMBER in 'next_log_block'. The filesystem is corrupted here, journal is partially replayed, and new journal sequence number actually is already used by last mounting. The invalidate_bdev() can drop all buffer heads even racing with bare reading block device(eg. systemd-udev), so we can fix it by invalidating bdev in error handling path in __ext4_fill_super(). Fetch a reproducer in [Link]. Link: https://bugzilla.kernel.org/show_bug.cgi?id=217171 Fixes: 25ed6e8a54df ("jbd2: enable journal clients to enable v2 checksumming") Cc: stable@vger.kernel.org # v3.5 Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230315013128.3911115-2-chengzhihao1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-08-11 11:53:44 +02:00

... 4 5 6 7 8 ...

63165 Commits