iv/linux

Author	SHA1	Message	Date
Kent Overstreet	ec438ac59d	bcachefs: Fix missing call to bch2_fs_allocator_background_exit() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-20 00:31:59 -04:00
Kent Overstreet	fcdbc1d7a4	bcachefs: Check for journal entries overrunning end of sb clean section Fix a missing bounds check in superblock validation. Note that we don't yet have repair code for this case - repair code for individual items is generally low priority, since the whole superblock is checksummed, validated prior to write, and we have backups. Reported-by: lei lu <llfamsec@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-20 00:16:53 -04:00
Namjae Jeon	e9d8c2f95a	ksmbd: add continuous availability share parameter If the capabilities of the share do not include SMB2_SHARE_CAP_CONTINUOUS_AVAILABILITY, ksmbd should not grant a persistent handle to the client. This patch adds a continuous availability share parameter to control it. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-04-19 20:48:47 -05:00
Namjae Jeon	0268a7cc7f	ksmbd: common: use struct_group_attr instead of struct_group for network_open_info The 4-byte padding causes connection issues with macOS applications. The padding increases the smb2_close response size by 4 bytes; the macOS SMB client checks the size and drops the connection. This patch uses struct_group_attr instead of struct_group for network_open_info so that __packed can be applied to avoid the padding. Fixes: 0015eb6e1238 ("smb: client, common: fix fortify warnings") Cc: stable@vger.kernel.org Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-04-19 20:48:47 -05:00
Marios Makassikis	4973b04d3e	ksmbd: clear RENAME_NOREPLACE before calling vfs_rename File overwrite case is explicitly handled, so it is not necessary to pass RENAME_NOREPLACE to vfs_rename. Clearing the flag fixes rename operations when the share is a ntfs-3g mount. The latter uses an older version of fuse with no support for flags in the ->rename op. Cc: stable@vger.kernel.org Signed-off-by: Marios Makassikis <mmakassikis@freebox.fr> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-04-19 20:48:47 -05:00
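The flag handling described in the ksmbd rename commit above can be sketched in user-space C. This is a minimal illustration, not the ksmbd code: `sketch_prepare_rename_flags()` and `SKETCH_RENAME_NOREPLACE` are hypothetical names, and it assumes "explicitly handled" means the caller has already checked whether the target exists before the rename reaches the lower filesystem.

```c
#include <errno.h>
#include <stdbool.h>

/* Same bit value as RENAME_NOREPLACE in <linux/fs.h>; redefined here
 * (hypothetical name) so the sketch compiles without kernel headers. */
#define SKETCH_RENAME_NOREPLACE (1u << 0)

/*
 * Hypothetical helper: enforce the no-replace semantics in the caller,
 * then strip the flag so a lower layer whose ->rename op does not
 * understand flags never sees it.
 */
static int sketch_prepare_rename_flags(unsigned int flags, bool target_exists,
                                       unsigned int *out_flags)
{
    /* The overwrite case is handled here, before the lower layer is called. */
    if ((flags & SKETCH_RENAME_NOREPLACE) && target_exists)
        return -EEXIST;

    /* Clear the flag; all other flag bits are passed through unchanged. */
    *out_flags = flags & ~SKETCH_RENAME_NOREPLACE;
    return 0;
}
```

The design point is that the check and the flag are redundant once the caller has done the existence test, so dropping the flag loses nothing and avoids depending on flag support in the backing filesystem.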
Namjae Jeon	17cf0c2794	ksmbd: validate request buffer size in smb2_allocate_rsp_buf() The response buffer should be allocated in smb2_allocate_rsp_buf() before validating the request. However, fields in the payload as well as the smb2 header are used in smb2_allocate_rsp_buf(). This patch adds a simple buffer size validation to avoid a potential out-of-bounds access in the request buffer. Cc: stable@vger.kernel.org Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-04-19 20:48:47 -05:00
Namjae Jeon	c119f4ede3	ksmbd: fix slab-out-of-bounds in smb2_allocate_rsp_buf If ->ProtocolId is SMB2_TRANSFORM_PROTO_NUM, smb2 request size validation could be skipped. If the request size is smaller than sizeof(struct smb2_query_info_req), a slab-out-of-bounds read can happen in smb2_allocate_rsp_buf(). This patch allocates the response buffer after decrypting the transform request. smb3_decrypt_req() will validate the transform request size and avoid the slab-out-of-bounds read in smb2_allocate_rsp_buf(). Reported-by: Norbert Szetei <norbert@doyensec.com> Cc: stable@vger.kernel.org Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-04-19 20:48:47 -05:00
Paulo Alcantara	18d86965e3	smb: client: fix rename(2) regression against samba After commit 2c7d399e551c ("smb: client: reuse file lease key in compound operations") the client started reusing lease keys for rename, unlink and set path size operations to prevent it from breaking its own leases and thus causing unnecessary lease breaks to same connection. The implementation relies on positive dentries and cifsInodeInfo::lease_granted to decide whether reusing lease keys for the compound requests. cifsInodeInfo::lease_granted was introduced by commit 0ab95c2510b6 ("Defer close only when lease is enabled.") to indicate whether lease caching is granted for a specific file, but that can only happen until file is open, so cifsInodeInfo::lease_granted was left uninitialised in ->alloc_inode and then client started sending random lease keys for files that hadn't any leases. This fixes the following test case against samba: mount.cifs //srv/share /mnt/1 -o ...,nosharesock mount.cifs //srv/share /mnt/2 -o ...,nosharesock touch /mnt/1/foo; tail -f /mnt/1/foo & pid=$! mv /mnt/2/foo /mnt/2/bar # fails with -EIO kill $pid Fixes: 0ab95c2510b6 ("Defer close only when lease is enabled.") Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-04-19 16:02:45 -05:00
David Howells	afc23febd5	cifs: Add tracing for the cifs_tcon struct refcounting Add tracing for the refcounting/lifecycle of the cifs_tcon struct, marking different events with different labels and giving each tcon its own debug ID so that the tracelines corresponding to individual tcons can be distinguished. This can be enabled with: echo 1 >/sys/kernel/debug/tracing/events/cifs/smb3_tcon_ref/enable Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> cc: Shyam Prasad N <nspmangalore@gmail.com> cc: linux-cifs@vger.kernel.org cc: linux-fsdevel@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>	2024-04-19 16:02:09 -05:00
David Howells	dad80c6bff	cifs: Fix reacquisition of volume cookie on still-live connection During mount, cifs_mount_get_tcon() gets a tcon resource connection record and then attaches an fscache volume cookie to it. However, it does this irrespective of whether or not the tcon returned from cifs_get_tcon() is a new record or one that's already in use. This leads to a warning about a volume cookie collision and a leaked volume cookie because tcon->fscache gets reset. Fix this by adding a mutex and a "we've already tried this" flag and only doing it once for the lifetime of the tcon. [!] Note: Looking at cifs_mount_get_tcon(), a more general solution may actually be required. Reacquiring the volume cookie isn't the only thing that function does: it also partially reinitialises the tcon record without any locking - which may cause live filesystem ops already using the tcon through a previous mount to malfunction. This can be reproduced simply by something like: mount //example.com/test /xfstest.test -o user=shares,pass=xxx,fsc mount //example.com/test /mnt -o user=shares,pass=xxx,fsc Fixes: 70431bfd825d ("cifs: Support fscache indexing rewrite") Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> cc: Shyam Prasad N <sprasad@microsoft.com> cc: linux-cifs@vger.kernel.org cc: linux-fsdevel@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>	2024-04-19 15:37:47 -05:00
Linus Torvalds	46b28503cd	fs/9p: fixes regressions in 6.9 This series contains a reversion of one of the original 6.9 patches which seems to have been the cause of most of the instability. It also incorporates several fixes to legacy support and cache fixes. There are few additional changes to improve stability, but I want another week of testing before sending them upstream. Signed-off-by: Eric Van Hensbergen <ericvh@kernel.org> Merge tag '9p-fixes-for-6.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs Pull fs/9p fixes from Eric Van Hensbergen: "This contains a reversion of one of the original 6.9 patches which seems to have been the cause of most of the instability. It also incorporates several fixes to legacy support and cache fixes. 
There are few additional changes to improve stability, but I want another week of testing before sending them upstream" * tag '9p-fixes-for-6.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs: fs/9p: drop inodes immediately on non-.L too fs/9p: Revert "fs/9p: fix dups even in uncached mode" fs/9p: remove erroneous nlink init from legacy stat2inode 9p: explicitly deny setlease attempts fs/9p: fix the cache always being enabled on files with qid flags fs/9p: translate O_TRUNC into OTRUNC fs/9p: only translate RWX permissions for plain 9P2000	2024-04-19 13:36:28 -07:00
Linus Torvalds	daa757767d	fuse fixes for 6.9-rc5 Merge tag 'fuse-fixes-6.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse fixes from Miklos Szeredi: - Fix two bugs in the new passthrough mode - Fix a statx bug introduced in v6.6 - Fix code documentation * tag 'fuse-fixes-6.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: cuse: add kernel-doc comments to cuse_process_init_reply() fuse: fix leaked ENOSYS error on first statx call fuse: fix parallel dio write on file open in passthrough mode fuse: fix wrong ff->iomode state changes from parallel dio write	2024-04-19 13:16:10 -07:00
Linus Torvalds	54c23548e0	Merge tag 'mm-hotfixes-stable-2024-04-18-14-41' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "15 hotfixes. 9 are cc:stable and the remainder address post-6.8 issues or aren't considered suitable for backporting. There are a significant number of fixups for this cycle's page_owner changes (series "page_owner: print stacks and their outstanding allocations"). Apart from that, singleton changes all over, mainly in MM" * tag 'mm-hotfixes-stable-2024-04-18-14-41' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: nilfs2: fix OOB in nilfs_set_de_type MAINTAINERS: update Naoya Horiguchi's email address fork: defer linking file vma until vma is fully initialized mm/shmem: inline shmem_is_huge() for disabled transparent hugepages mm,page_owner: defer enablement of static branch Squashfs: check the inode number is not the invalid value of zero mm,swapops: update check in is_pfn_swap_entry for hwpoison entries mm/memory-failure: fix deadlock when hugetlb_optimize_vmemmap is enabled mm/userfaultfd: allow hugetlb change protection upon poison entry mm,page_owner: fix printing of stack records mm,page_owner: fix accounting of pages when migrating mm,page_owner: fix refcount imbalance mm,page_owner: update metadata for tail pages userfaultfd: change src_folio after ensuring it's unpinned in UFFDIO_MOVE mm/madvise: make MADV_POPULATE_(READ\|WRITE) handle 
VM_FAULT_RETRY properly	2024-04-19 09:13:35 -07:00
Qu Wenruo	fe1c6c7acc	btrfs: fix wrong block_start calculation for btrfs_drop_extent_map_range() [BUG] During my extent_map cleanup/refactor, with extra sanity checks, extent-map-tests::test_case_7() would not pass the checks. The problem is, after btrfs_drop_extent_map_range(), the resulted extent_map has a @block_start way too large. Meanwhile my btrfs_file_extent_item based members are returning a correct @disk_bytenr/@offset combination. The extent map layout looks like this: 0 16K 32K 48K \| PINNED \| \| Regular \| The regular em at [32K, 48K) also has 32K @block_start. Then drop range [0, 36K), which should shrink the regular one to be [36K, 48K). However the @block_start is incorrect, we expect 32K + 4K, but got 52K. [CAUSE] Inside btrfs_drop_extent_map_range() function, if we hit an extent_map that covers the target range but is still beyond it, we need to split that extent map into half: \|<-- drop range -->\| \|<----- existing extent_map --->\| And if the extent map is not compressed, we need to forward extent_map::block_start by the difference between the end of drop range and the extent map start. However in that particular case, the difference is calculated using (start + len - em->start). The problem is @start can be modified if the drop range covers any pinned extent. This leads to wrong calculation, and would be caught by my later extent_map sanity checks, which checks the em::block_start against btrfs_file_extent_item::disk_bytenr + btrfs_file_extent_item::offset. This is a regression caused by commit c962098ca4af ("btrfs: fix incorrect splitting in btrfs_drop_extent_map_range"), which removed the @len update for pinned extents. [FIX] Fix it by avoiding using @start completely, and use @end - em->start instead, which @end is exclusive bytenr number. And update the test case to verify the @block_start to prevent such problem from happening. 
Thankfully this is not going to lead to any data corruption, as IO path does not utilize btrfs_drop_extent_map_range() with @skip_pinned set. So this fix is only here for the sake of consistency/correctness. CC: stable@vger.kernel.org # 6.5+ Fixes: c962098ca4af ("btrfs: fix incorrect splitting in btrfs_drop_extent_map_range") Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-04-18 18:18:50 +02:00
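The arithmetic in the btrfs_drop_extent_map_range() commit above can be checked with a small stand-alone model. This is a sketch, not the btrfs code: the struct and function names are hypothetical, and it assumes that @start had been advanced to the end of the pinned extent (16K) while @len stayed at 36K, which is the combination that reproduces the 52K figure quoted in the message.

```c
#include <stdint.h>

#define SKETCH_SZ_1K 1024ULL

/* Hypothetical, simplified stand-in for an uncompressed extent map. */
struct sketch_em {
    uint64_t start;       /* file offset where the extent map begins */
    uint64_t len;         /* length in bytes */
    uint64_t block_start; /* disk bytenr backing @start */
};

/* Old calculation: relies on @start, which may already have been moved
 * past a pinned extent without @len being shrunk to match. */
static uint64_t sketch_block_start_buggy(const struct sketch_em *em,
                                         uint64_t start, uint64_t len)
{
    return em->block_start + (start + len - em->start);
}

/* Fixed calculation: uses the exclusive end of the drop range, which is
 * never modified while walking the range. */
static uint64_t sketch_block_start_fixed(const struct sketch_em *em,
                                         uint64_t end)
{
    return em->block_start + (end - em->start);
}
```

With the regular extent at [32K, 48K) and block_start 32K, dropping [0, 36K) gives 32K + (36K - 32K) = 36K with the fixed form, while the old form yields 32K + (16K + 36K - 32K) = 52K under the stated assumption.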
Johannes Thumshirn	2f7ef5bb4a	btrfs: fix information leak in btrfs_ioctl_logical_to_ino() Syzbot reported the following information leak in btrfs_ioctl_logical_to_ino(): BUG: KMSAN: kernel-infoleak in instrument_copy_to_user include/linux/instrumented.h:114 [inline] BUG: KMSAN: kernel-infoleak in _copy_to_user+0xbc/0x110 lib/usercopy.c:40 instrument_copy_to_user include/linux/instrumented.h:114 [inline] _copy_to_user+0xbc/0x110 lib/usercopy.c:40 copy_to_user include/linux/uaccess.h:191 [inline] btrfs_ioctl_logical_to_ino+0x440/0x750 fs/btrfs/ioctl.c:3499 btrfs_ioctl+0x714/0x1260 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:904 [inline] __se_sys_ioctl+0x261/0x450 fs/ioctl.c:890 __x64_sys_ioctl+0x96/0xe0 fs/ioctl.c:890 x64_sys_call+0x1883/0x3b50 arch/x86/include/generated/asm/syscalls_64.h:17 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcf/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Uninit was created at: __kmalloc_large_node+0x231/0x370 mm/slub.c:3921 __do_kmalloc_node mm/slub.c:3954 [inline] __kmalloc_node+0xb07/0x1060 mm/slub.c:3973 kmalloc_node include/linux/slab.h:648 [inline] kvmalloc_node+0xc0/0x2d0 mm/util.c:634 kvmalloc include/linux/slab.h:766 [inline] init_data_container+0x49/0x1e0 fs/btrfs/backref.c:2779 btrfs_ioctl_logical_to_ino+0x17c/0x750 fs/btrfs/ioctl.c:3480 btrfs_ioctl+0x714/0x1260 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:904 [inline] __se_sys_ioctl+0x261/0x450 fs/ioctl.c:890 __x64_sys_ioctl+0x96/0xe0 fs/ioctl.c:890 x64_sys_call+0x1883/0x3b50 arch/x86/include/generated/asm/syscalls_64.h:17 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcf/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Bytes 40-65535 of 65536 are uninitialized Memory access of size 65536 starts at ffff888045a40000 This happens, because we're copying a 'struct btrfs_data_container' back to user-space. 
This btrfs_data_container is allocated in 'init_data_container()' via kvmalloc(), which does not zero-fill the memory. Fix this by using kvzalloc() which zeroes out the memory on allocation. CC: stable@vger.kernel.org # 4.14+ Reported-by: <syzbot+510a1abbb8116eeb341d@syzkaller.appspotmail.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Johannes Thumshirn <Johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-04-18 18:18:13 +02:00
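The kvmalloc() versus kvzalloc() distinction in the commit above has a direct user-space analogue in malloc() versus calloc(). The following is a hedged sketch only: `sketch_container` and `sketch_init_container()` are hypothetical stand-ins loosely modelled on the container described in the message, and calloc() plays the role of the zeroing allocator.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a header followed by a variable-length
 * result area that is later copied out to the caller in full. */
struct sketch_container {
    uint32_t bytes_left;
    uint32_t bytes_missing;
    uint32_t elem_cnt;
    uint32_t elem_missed;
    uint64_t val[]; /* results are appended here */
};

/*
 * Allocate the whole buffer zero-filled. With a non-zeroing allocator,
 * every byte of val[] that is never written would still hold stale heap
 * contents when the full buffer is copied out.
 */
static struct sketch_container *sketch_init_container(size_t total_bytes)
{
    struct sketch_container *c;

    if (total_bytes < sizeof(*c))
        return NULL;

    c = calloc(1, total_bytes); /* zeroing allocation, like kvzalloc() */
    if (!c)
        return NULL;

    c->bytes_left = (uint32_t)(total_bytes - sizeof(*c));
    return c;
}
```

Zeroing the allocation costs a memset of the buffer, but it makes the copy-out safe regardless of how many result slots were actually filled.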
Linus Torvalds	8cd26fd90c	for-6.9-rc4-tag Merge tag 'for-6.9-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - fixup in zoned mode for out-of-order writes of metadata that are no longer necessary, this used to be tracked in a separate list but now the old location needs to be zeroed out, also add assertions - fix bulk page allocation retry, this may stall after first failure for compression read/write * tag 'for-6.9-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: do not wait for short bulk allocation btrfs: zoned: add ASSERT and WARN for EXTENT_BUFFER_ZONED_ZEROOUT handling btrfs: zoned: do not flag ZEROOUT on non-dirty extent buffer	2024-04-17 18:25:40 -07:00
Sweet Tea Dorminy	131a821a24	btrfs: fallback if compressed IO fails for ENOSPC In commit b4ccace878f4 ("btrfs: refactor submit_compressed_extents()"), if an async extent compressed but failed to find enough space, we changed from falling back to an uncompressed write to just failing the write altogether. The principle was that if there's not enough space to write the compressed version of the data, there can't possibly be enough space to write the larger, uncompressed version of the data. However, this isn't necessarily true: due to fragmentation, there could be enough discontiguous free blocks to write the uncompressed version, but not enough contiguous free blocks to write the smaller but unsplittable compressed version. This has occurred to an internal workload which relied on write()'s return value indicating there was space. While rare, it has happened a few times. Thus, in order to prevent early ENOSPC, re-add a fallback to uncompressed writing. Fixes: b4ccace878f4 ("btrfs: refactor submit_compressed_extents()") CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Qu Wenruo <wqu@suse.com> Co-developed-by: Neal Gompa <neal@gompa.dev> Signed-off-by: Neal Gompa <neal@gompa.dev> Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-04-18 01:46:52 +02:00
Naohiro Aota	7192833c4e	btrfs: scrub: run relocation repair when/only needed When btrfs scrub finds an error, it reads mirrors to find correct data. If all the errors are fixed, sctx->error_bitmap is cleared for the stripe range. However, in the zoned mode, it runs relocation to repair scrub errors when the bitmap is not empty, which is a flipped condition. Also, it runs the relocation even if the scrub is read-only. This was missed by a fix in commit 1f2030ff6e49 ("btrfs: scrub: respect the read-only flag during repair"). The repair is only necessary when there is a repaired sector and should be done on read-write scrub. So, tweak the condition for both regular and zoned case. Fixes: 54765392a1b9 ("btrfs: scrub: introduce helper to queue a stripe for scrub") Fixes: 1f2030ff6e49 ("btrfs: scrub: respect the read-only flag during repair") CC: stable@vger.kernel.org # 6.6+ Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-04-18 01:46:47 +02:00
David Sterba	e5a78fdec0	btrfs: remove colon from messages with state The message format in syslog is usually made of two parts: prefix ":" message Various tools parse the prefix up to the first ":". When there's an additional status of a btrfs filesystem like [5.199782] BTRFS info (device nvme1n1p1: state M): use zstd compression, level 9 where 'M' is for remount, there's one more ":" that does not conform to the format. Remove it. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-04-18 01:46:35 +02:00
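The parsing problem described in the commit above can be illustrated with a tool that takes everything before the first ":" as the prefix. This is a sketch of such a consumer, not any particular syslog tool; `sketch_prefix_len()` is a hypothetical helper, and the post-fix message shape (same text with the inner colon dropped) is an assumption based on "Remove it".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical parser step: length of the prefix, i.e. everything
 * before the first ':' (or the whole line if there is no colon). */
static size_t sketch_prefix_len(const char *line)
{
    const char *colon = strchr(line, ':');

    return colon ? (size_t)(colon - line) : strlen(line);
}
```

Applied to the message quoted in the commit, the prefix stops inside the parenthesised device description, at "BTRFS info (device nvme1n1p1". Without the inner colon, the same parser would return the full "BTRFS info (device nvme1n1p1 state M)" prefix.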
Kent Overstreet	0389c09b2f	bcachefs: Fix bio alloc in check_extent_checksum() if the buffer is virtually mapped it won't be a single bvec Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-17 17:29:58 -04:00
Kent Overstreet	719aec84b1	bcachefs: fix leak in bch2_gc_write_reflink_key Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-17 17:29:58 -04:00
Kent Overstreet	605109ff5e	bcachefs: KEY_TYPE_error is allowed for reflink KEY_TYPE_error is left behind when we have to delete all pointers in an extent in fsck; it allows errors to be correctly returned by reads later. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-17 17:29:58 -04:00
Kent Overstreet	fa845c7349	bcachefs: Fix bch2_dev_btree_bitmap_marked_sectors() shift Fixes: 27c15ed297cb bcachefs: bch_member.btree_allocated_bitmap Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-17 17:29:53 -04:00
Kent Overstreet	79055f50a6	bcachefs: make sure to release last journal pin in replay This fixes a deadlock when journal replay has many keys to insert that were from fsck, not the journal. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-16 19:14:01 -04:00
Kent Overstreet	fabb4d4985	bcachefs: node scan: ignore multiple nodes with same seq if interior Interior nodes are not really needed, when we have to scan - but if this pops up for leaf nodes we'll need a real heuristic. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-16 19:14:00 -04:00
Nathan Chancellor	9fd5a48a1e	bcachefs: Fix format specifier in validate_bset_keys() When building for 32-bit platforms, for which size_t is 'unsigned int', there is a warning from a format string in validate_bset_keys(): fs/bcachefs/btree_io.c: In function 'validate_bset_keys': fs/bcachefs/btree_io.c:891:34: error: format '%lu' expects argument of type 'long unsigned int', but argument 12 has type 'unsigned int' [-Werror=format=] 891 \| "bad k->u64s %u (min %u max %lu)", k->u64s, \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ fs/bcachefs/btree_io.c:603:32: note: in definition of macro 'btree_err' 603 \| msg, ##__VA_ARGS__); \ \| ^~~ fs/bcachefs/btree_io.c:887:21: note: in expansion of macro 'btree_err_on' 887 \| if (btree_err_on(!bkeyp_u64s_valid(&b->format, k), \| ^~~~~~~~~~~~ fs/bcachefs/btree_io.c:891:64: note: format string is defined here 891 \| "bad k->u64s %u (min %u max %lu)", k->u64s, \| ~~^ \| \| \| long unsigned int \| %u cc1: all warnings being treated as errors BKEY_U64s is size_t so the entire expression is promoted to size_t. Use the '%zu' specifier so that there is no warning regardless of the width of size_t. Fixes: 031ad9e7dbd1 ("bcachefs: Check for packed bkeys that are too big") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202404130747.wH6Dd23p-lkp@intel.com/ Closes: https://lore.kernel.org/oe-kbuild-all/202404131536.HdAMBOVc-lkp@intel.com/ Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-16 19:11:49 -04:00
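The format-specifier fix above comes down to using `%zu` for a `size_t` argument, which is correct whether `size_t` is `unsigned int` or `unsigned long`. A minimal sketch, with a hypothetical wrapper name and plain snprintf() standing in for the kernel's error macro:

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical wrapper around the message from the commit. The third
 * value is a size_t, so it is printed with %zu; %lu would only match
 * on platforms where size_t happens to be unsigned long.
 */
static int sketch_format_u64s(char *buf, size_t buflen, unsigned int u64s,
                              unsigned int min, size_t max)
{
    return snprintf(buf, buflen, "bad k->u64s %u (min %u max %zu)",
                    u64s, min, max);
}
```

Casting the argument to `unsigned long` and keeping `%lu` would also silence the warning, but `%zu` states the intent directly and needs no cast.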
Kent Overstreet	02bed83d59	bcachefs: Fix null ptr deref in twf from BCH_IOCTL_FSCK_OFFLINE We need to initialize the stdio redirects before they're used. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-16 19:11:49 -04:00
Jeongjun Park	c4a7dc9523	nilfs2: fix OOB in nilfs_set_de_type The size of the nilfs_type_by_mode array in the fs/nilfs2/dir.c file is defined as "S_IFMT >> S_SHIFT", but the nilfs_set_de_type() function, which uses this array, specifies the index to read from the array in the same way as "(mode & S_IFMT) >> S_SHIFT". static void nilfs_set_de_type(struct nilfs_dir_entry *de, struct inode *inode) { umode_t mode = inode->i_mode; de->file_type = nilfs_type_by_mode[(mode & S_IFMT)>>S_SHIFT]; // oob } However, when the index is determined this way, an out-of-bounds (OOB) error occurs by referring to an index that is 1 larger than the array size when the condition "mode & S_IFMT == S_IFMT" is satisfied. Therefore, a patch to resize the nilfs_type_by_mode array should be applied to prevent OOB errors. Link: https://lkml.kernel.org/r/20240415182048.7144-1-konishi.ryusuke@gmail.com Reported-by: syzbot+2e22057de05b9f3b30d8@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=2e22057de05b9f3b30d8 Fixes: 2ba466d74ed7 ("nilfs2: directory entry operations") Signed-off-by: Jeongjun Park <aha310510@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-04-16 15:39:52 -07:00
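The off-by-one in the nilfs2 commit above can be made concrete with the usual numeric values. This is an illustrative sketch: the `SKETCH_` names are hypothetical, S_IFMT is taken as 0170000 and S_SHIFT as 12, and the "+ 1" sizing is one way to resize the array as the message suggests, not necessarily the exact form of the applied patch.

```c
#include <stddef.h>

#define SKETCH_S_IFMT  0170000u /* file type mask, assumed usual value */
#define SKETCH_S_SHIFT 12

/* Array sizes before and after resizing (the latter is an assumption). */
#define SKETCH_OLD_SIZE (SKETCH_S_IFMT >> SKETCH_S_SHIFT)       /* 15 */
#define SKETCH_NEW_SIZE ((SKETCH_S_IFMT >> SKETCH_S_SHIFT) + 1) /* 16 */

/* Index computed the same way as in nilfs_set_de_type(). */
static size_t sketch_type_index(unsigned int mode)
{
    return (mode & SKETCH_S_IFMT) >> SKETCH_S_SHIFT;
}

/* True if the index derived from @mode fits an array of @array_size. */
static int sketch_index_in_bounds(unsigned int mode, size_t array_size)
{
    return sketch_type_index(mode) < array_size;
}
```

A mode whose type bits are all set yields index 15, which is one past the last valid slot (14) of a 15-element array but fits a 16-element one. Ordinary file types such as a regular file (index 8) are in bounds either way.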
Phillip Lougher	9253c54e01	Squashfs: check the inode number is not the invalid value of zero Syzkaller has produced an out of bounds access in fill_meta_index(). That out of bounds access is ultimately caused because the inode has an inode number with the invalid value of zero, which was not checked. The reason this causes the out of bounds access is due to the following sequence of events: 1. fill_meta_index() is called to allocate (via empty_meta_index()) and fill a metadata index. It however suffers a data read error and aborts, invalidating the newly returned empty metadata index. It does this by setting the inode number of the index to zero, which means unused (zero is not a valid inode number). 2. When fill_meta_index() is subsequently called again on another read operation, locate_meta_index() returns the previous index because it matches the inode number of 0. Because this index has been returned it is expected to have been filled, and because it hasn't been, an out of bounds access is performed. This patch adds a sanity check which checks that the inode number is not zero when the inode is created and returns -EINVAL if it is. [phillip@squashfs.org.uk: whitespace fix] Link: https://lkml.kernel.org/r/20240409204723.446925-1-phillip@squashfs.org.uk Link: https://lkml.kernel.org/r/20240408220206.435788-1-phillip@squashfs.org.uk Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Reported-by: "Ubisectech Sirius" <bugreport@ubisectech.com> Closes: https://lore.kernel.org/lkml/87f5c007-b8a5-41ae-8b57-431e924c5915.bugreport@ubisectech.com/ Cc: Christian Brauner <brauner@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-04-16 15:39:50 -07:00
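The two-step sequence in the Squashfs commit above depends on zero doubling as the "unused slot" marker and as a value an on-disk inode could carry. The following sketch models that collision with hypothetical names and a fixed-size slot array; it is a simplification for illustration, not the Squashfs cache code.

```c
#include <errno.h>
#include <stddef.h>

#define SKETCH_SLOTS 4

/* Hypothetical cache slot: inode_number 0 marks the slot as unused. */
struct sketch_meta_index {
    unsigned int inode_number;
    int filled; /* non-zero once the index has valid contents */
};

/* Lookup by inode number, as a stand-in for the locate step. */
static struct sketch_meta_index *
sketch_locate(struct sketch_meta_index *slots, unsigned int ino)
{
    for (size_t i = 0; i < SKETCH_SLOTS; i++)
        if (slots[i].inode_number == ino)
            return &slots[i];
    return NULL;
}

/* The sanity check from the commit: reject inode number zero up front. */
static int sketch_check_inode_number(unsigned int ino)
{
    return ino == 0 ? -EINVAL : 0;
}
```

If an inode numbered zero were accepted, a lookup for it would match a slot that had been invalidated after a failed fill, and the caller would treat the empty slot as filled. Rejecting zero when the inode is created removes the only key that can alias the marker.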
Christian Brauner	74871791ff	ntfs3: serve as alias for the legacy ntfs driver Johan Hovold reported that removing the legacy ntfs driver broke boot for him since his fstab uses the legacy ntfs driver to access firmware from the original Windows partition. Use ntfs3 as an alias for legacy ntfs if CONFIG_NTFS_FS is selected. This is similar to how ext3 is treated. Link: https://lore.kernel.org/r/Zf2zPf5TO5oYt3I3@hovoldconsulting.com Link: https://lore.kernel.org/r/20240325-hinkriegen-zuziehen-d7e2c490427a@brauner Fixes: 7ffa8f3d3023 ("fs: Remove NTFS classic") Tested-by: Johan Hovold <johan+linaro@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Johan Hovold <johan@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-04-16 10:45:26 +02:00
Linus Torvalds	96fca68c4f	Merge tag 'nfsd-6.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: - Fix a potential tracepoint crash - Fix NFSv4 GETATTR on big-endian platforms * tag 'nfsd-6.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: NFSD: fix endianness issue in nfsd4_encode_fattr4 SUNRPC: Fix rpcgss_context trace event acceptor field	2024-04-15 14:09:47 -07:00
Linus Torvalds	cef27048e5	bcachefs fixes for 6.9-rc5 various recovery fixes: - fixes for the btree_insert_entry being resized on path allocation btree_path array recently became dynamically resizable, and btree_insert_entry along with it; this was being observed during journal replay, when write buffer btree updates don't use the write buffer and instead use the normal btree update path - multiple fixes for deadlock in recovery when we need to do lots of btree node merges; excessive merges were clocking up the whole pipeline - write buffer path now correctly does btree node merges when needed - fix failure to go RW when superblock indicates recovery passes needed (i.e. to complete an unfinished upgrade) various unsafety fixes - test case contributed by a user who had two drives out of a six drive array write out a whole bunch of garbage after power failure new (tiny) on disk format feature: since it appears the btree node scan tool will be a more regular thing (crappy hardware, user error) - this adds a 64 bit per-device bitmap of regions that have ever had btree nodes. a path->should_be_locked fix, from a larger patch series tightening up invariants and assertions around btree transaction and path locking state; this particular fix prevents us from keeping around btree_paths that are no longer needed. 
Merge tag 'bcachefs-2024-04-15' of https://evilpiepirate.org/git/bcachefs Pull yet more bcachefs fixes from Kent Overstreet: "This gets recovery working again for the affected user I've been working with, and I'm still waiting to hear back on other bug reports but should fix it for everyone else who's been having issues with recovery. - Various recovery fixes: - fixes for the btree_insert_entry being resized on path allocation btree_path array recently became dynamically resizable, and btree_insert_entry along with it; this was being observed during journal replay, when write buffer btree updates don't use the write buffer and instead use the normal btree update path - multiple fixes for deadlock in recovery when we need to do lots of btree node merges; excessive merges were clocking up the whole pipeline - write buffer path now correctly does btree node merges when needed - fix failure to go RW when superblock indicates recovery passes needed (i.e. 
to complete an unfinished upgrade) - Various unsafety fixes - test case contributed by a user who had two drives out of a six drive array write out a whole bunch of garbage after power failure - New (tiny) on disk format feature: since it appears the btree node scan tool will be a more regular thing (crappy hardware, user error) - this adds a 64 bit per-device bitmap of regions that have ever had btree nodes. - A path->should_be_locked fix, from a larger patch series tightening up invariants and assertions around btree transaction and path locking state. This particular fix prevents us from keeping around btree_paths that are no longer needed" * tag 'bcachefs-2024-04-15' of https://evilpiepirate.org/git/bcachefs: (24 commits) bcachefs: set_btree_iter_dontneed also clears should_be_locked bcachefs: fix error path of __bch2_read_super() bcachefs: Check for backpointer bucket_offset >= bucket size bcachefs: bch_member.btree_allocated_bitmap bcachefs: sysfs internal/trigger_journal_flush bcachefs: Fix bch2_btree_node_fill() for !path bcachefs: add safety checks in bch2_btree_node_fill() bcachefs: Interior known are required to have known key types bcachefs: add missing bounds check in __bch2_bkey_val_invalid() bcachefs: Fix btree node merging on write buffer btrees bcachefs: Disable merges from interior update path bcachefs: Run merges at BCH_WATERMARK_btree bcachefs: Fix missing write refs in fs fio paths bcachefs: Fix deadlock in journal replay bcachefs: Go rw if running any explicit recovery passes bcachefs: Standardize helpers for printing enum strs with bounds checks bcachefs: don't queue btree nodes for rewrites during scan bcachefs: fix race in bch2_btree_node_evict() bcachefs: fix unsafety in bch2_stripe_to_text() bcachefs: fix unsafety in bch2_extent_ptr_to_text() ...	2024-04-15 11:01:11 -07:00
Kent Overstreet	ad29cf999a	bcachefs: set_btree_iter_dontneed also clears should_be_locked This is part of a larger series cleaning up the semantics of should_be_locked and adding assertions around it; if we don't need an iterator/path anymore, it clearly doesn't need to be locked. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-15 13:31:15 -04:00
Chao Yu	3078e059a5	bcachefs: fix error path of __bch2_read_super() In __bch2_read_super(), if kstrdup() fails, it needs to release memory in sb->holder, fix to call bch2_free_super() in the error path. Signed-off-by: Chao Yu <chao@kernel.org> Reviewed-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-15 13:31:15 -04:00
Yang Li	09492cb451	cuse: add kernel-doc comments to cuse_process_init_reply() This commit adds kernel-doc style comments with complete parameter descriptions for the function cuse_process_init_reply. Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-04-15 11:02:10 +02:00
Danny Lin	eb4b691b91	fuse: fix leaked ENOSYS error on first statx call FUSE attempts to detect server support for statx by trying it once and setting no_statx=1 if it fails with ENOSYS, but consider the following scenario: - Userspace (e.g. sh) calls stat() on a file * succeeds - Userspace (e.g. lsd) calls statx(BTIME) on the same file - request_mask = STATX_BASIC_STATS | STATX_BTIME - first pass: sync=true due to differing cache_mask - statx fails and returns ENOSYS - set no_statx and retry - retry sets mask = STATX_BASIC_STATS - now mask == cache_mask; sync=false (time_before: still valid) - so we take the "else if (stat)" path - "err" is still ENOSYS from the failed statx call Fix this by zeroing "err" before retrying the failed call. Fixes: d3045530bdd2 ("fuse: implement statx") Cc: stable@vger.kernel.org # v6.6 Signed-off-by: Danny Lin <danny@orbstack.dev> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-04-15 10:12:44 +02:00
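
The retry scenario in the fuse statx commit above can be modeled with a short sketch. This is a toy model in Python, not the kernel code: the function name `getattr_model` and both of its parameters are hypothetical, and the control flow is reduced to the steps listed in the commit message (sync statx fails with ENOSYS, `no_statx` is set, the retry is served from the cache).

```python
import errno


def getattr_model(basic_cache_valid, zero_err_before_retry):
    """Toy model of the getattr retry flow (hypothetical simplification).

    basic_cache_valid: whether STATX_BASIC_STATS is still cached.
    zero_err_before_retry: whether the fix from the commit is applied.
    Returns the error code the caller would see (0 means success).
    """
    no_statx = False
    want_btime = True  # the caller asked for STATX_BTIME, which is not cached
    err = 0
    while True:
        # A round trip is needed if we want something outside the cache.
        sync = (want_btime and not no_statx) or not basic_cache_valid
        if sync:
            if not no_statx:
                err = -errno.ENOSYS  # server does not implement statx
                no_statx = True
                want_btime = False   # retry falls back to basic stats only
                if zero_err_before_retry:
                    err = 0          # the fix: drop the stale error
                continue
            return 0                 # plain getattr round trip succeeds
        # Cached path: attributes come from the cache, err is left untouched.
        return err


if __name__ == "__main__":
    print(getattr_model(True, False))  # stale -ENOSYS leaks to the caller
    print(getattr_model(True, True))   # fixed: 0
```

The leak only shows up when the retry is satisfied from the cache, because that path never assigns `err`; when the retry does a fresh round trip, `err` is overwritten and the bug is hidden.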
Amir Goldstein	7cc9112628	fuse: fix parallel dio write on file open in passthrough mode Parallel dio write takes a negative refcount of fi->iocachectr and so does open of file in passthrough mode. The refcount of passthrough mode is associated with attach/detach of a fuse_backing object to fuse inode. For parallel dio write, the backing file is irrelevant, so the call to fuse_inode_uncached_io_start() passes a NULL fuse_backing object. Passing a NULL fuse_backing will result in false -EBUSY error if the file is already open in passthrough mode. Allow taking negative fi->iocachectr refcount with NULL fuse_backing, because it does not conflict with an already attached fuse_backing object. Fixes: 4a90451bbc7f ("fuse: implement open in passthrough mode") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-04-15 10:12:44 +02:00
Amir Goldstein	4864a6dd83	fuse: fix wrong ff->iomode state changes from parallel dio write There is a confusion with fuse_file_uncached_io_{start,end} interface. These helpers do two things when called from passthrough open()/release(): 1. Take/drop negative refcount of fi->iocachectr (inode uncached io mode) 2. State change ff->iomode IOM_NONE <-> IOM_UNCACHED (file uncached open) The calls from parallel dio write path need to take a reference on fi->iocachectr, but they should not be changing ff->iomode state, because in this case, the fi->iocachectr reference does not stick around until file release(). Factor out helpers fuse_inode_uncached_io_{start,end}, to be used from parallel dio write path and rename fuse_file_cached_io_{start,end} helpers to fuse_file_cached_io_{open,release} to clarify the difference. Fixes: 205c1d802683 ("fuse: allow parallel dio writes with FUSE_DIRECT_IO_ALLOW_MMAP") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-04-15 10:12:03 +02:00
Kent Overstreet	f0a73d4fde	bcachefs: Check for backpointer bucket_offset >= bucket size Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-14 20:02:11 -04:00
Kent Overstreet	27c15ed297	bcachefs: bch_member.btree_allocated_bitmap This adds a small (64 bit) per-device bitmap that tracks ranges that have btree nodes, for accelerating btree node scan if it is ever needed. - New helpers, bch2_dev_btree_bitmap_marked() and bch2_dev_bitmap_mark(), for checking and updating the bitmap - Interior btree update path updates the bitmaps when required - The check_allocations pass has a new fsck_err check, btree_bitmap_not_marked - New on disk format version, mi_btree_bitmap, which indicates the new bitmap is present - Upgrade table lists the required recovery pass and expected fsck error - Btree node scan uses the bitmap to skip ranges if we're on the new version Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-14 20:02:11 -04:00
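
The idea of a 64-bit per-device bitmap described in the commit above can be sketched as follows. This is an illustration under stated assumptions, not the bcachefs implementation: the class `DevBtreeBitmap`, its methods, and the rule that each bit covers a power-of-two number of sectors chosen so that 64 bits span the device are all hypothetical.

```python
class DevBtreeBitmap:
    """Toy 64-bit bitmap of device regions that have ever held btree nodes."""

    def __init__(self, nr_sectors):
        # Assumption: pick the smallest shift so 64 regions cover the device.
        shift = 0
        while (64 << shift) < nr_sectors:
            shift += 1
        self.shift = shift
        self.bits = 0

    def _mask(self, start, sectors):
        # Bits for every region touched by [start, start + sectors).
        first = start >> self.shift
        last = (start + sectors - 1) >> self.shift
        return (((1 << (last - first + 1)) - 1) << first) & ((1 << 64) - 1)

    def mark(self, start, sectors):
        """Record that a btree node was written somewhere in this range."""
        self.bits |= self._mask(start, sectors)

    def marked(self, start, sectors):
        """True if every region in the range is already marked."""
        m = self._mask(start, sectors)
        return (self.bits & m) == m

    def may_have_btree_nodes(self, start, sectors):
        """A scan can skip the range when this returns False."""
        return (self.bits & self._mask(start, sectors)) != 0


if __name__ == "__main__":
    bm = DevBtreeBitmap(1 << 20)
    bm.mark(16383, 2)  # straddles the boundary between regions 0 and 1
    print(bin(bm.bits), bm.may_have_btree_nodes(2 << 14, 8))
```

Because bits are only ever set, the bitmap can overestimate where btree nodes live but never miss a region, which is the property a scan needs in order to skip unmarked ranges safely.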
Kent Overstreet	bdae2a7e60	bcachefs: sysfs internal/trigger_journal_flush Add a sysfs knob for immediately flushing the entire journal. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-14 20:02:11 -04:00
Kent Overstreet	e879389f57	bcachefs: Fix bch2_btree_node_fill() for !path We shouldn't be doing the unlock/relock dance when we're not using a path - this fixes an assertion pop when called from btree node scan. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-14 20:02:11 -04:00
Kent Overstreet	8cf2036e7b	bcachefs: add safety checks in bch2_btree_node_fill() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-14 18:01:12 -04:00
Kent Overstreet	d789e9a7d5	bcachefs: Interior known are required to have known key types For forwards compatibility, we allow bkeys of unknown type in leaf nodes; we can simply ignore metadata we don't understand. Pointers to btree nodes must always be of known types, however. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-14 18:01:12 -04:00
Kent Overstreet	bceb86be9e	bcachefs: add missing bounds check in __bch2_bkey_val_invalid() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-14 18:01:12 -04:00
Linus Torvalds	72374d71c3	Get rid of lockdep false positives around sysfs/overlayfs Merge tag 'pull-sysfs-annotation-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull sysfs fix from Al Viro: "Get rid of lockdep false positives around sysfs/overlayfs syzbot has uncovered a class of lockdep false positives for setups with sysfs being one of the backing layers in overlayfs. The root cause is that of->mutex allocated when opening a sysfs file read-only (which overlayfs might do) is confused with of->mutex of a file opened writable (held in write to sysfs file, which overlayfs won't do). Assigning them separate lockdep classes fixes that bunch and it's obviously safe" * tag 'pull-sysfs-annotation-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: kernfs: annotate different lockdep class for of->mutex of writable files	2024-04-14 11:41:51 -07:00
Amir Goldstein	16b52bbee4	kernfs: annotate different lockdep class for of->mutex of writable files The writable file /sys/power/resume may call vfs lookup helpers for arbitrary paths and readonly files can be read by overlayfs from vfs helpers when sysfs is a lower layer of overlayfs. To avoid a lockdep warning of circular dependency between overlayfs inode lock and kernfs of->mutex, use a different lockdep class for writable and readonly kernfs files. Reported-by: syzbot+9a5b0ced8b1bfb238b56@syzkaller.appspotmail.com Fixes: 0fedefd4c4e3 ("kernfs: sysfs: support custom llseek method for sysfs entries") Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2024-04-14 06:55:46 -04:00
Kent Overstreet	86dbf8c566	bcachefs: Fix btree node merging on write buffer btrees The btree write buffer flush fastpath that avoids the main transaction commit path had the unfortunate side effect of not doing btree node merging. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-13 22:49:25 -04:00
Kent Overstreet	3f10048973	bcachefs: Disable merges from interior update path There's been a bug in the btree write buffer where it wasn't triggering btree node merges - and leaving behind a bunch of nearly empty btree nodes. Then during journal replay, when updates to the backpointers btree aren't using the btree write buffer (because we require synchronization with journal replay), we end up doing those merges all at once. Then if it's the interior update path running them, we deadlock because those run with the highest watermark. There's no real need for the interior update path to be doing btree node merges; other code paths can handle that at lower watermarks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-13 22:49:25 -04:00
Kent Overstreet	9054ef2ea9	bcachefs: Run merges at BCH_WATERMARK_btree This fixes a deadlock where the interior update path during journal replay ends up doing a ton of merges on the backpointers btree, and deadlocking. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-13 22:49:25 -04:00

90801 Commits