IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Fix a missing bounds check in superblock validation.
Note that we don't yet have repair code for this case - repair code for
individual items is generally low priority, since the whole superblock
is checksummed, validated prior to write, and we have backups.
Reported-by: lei lu <llfamsec@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
If capabilities of the share is not SMB2_SHARE_CAP_CONTINUOUS_AVAILABILITY,
ksmbd should not grant a persistent handle to the client.
This patch add continuous availability share parameter to control it.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
4byte padding cause the connection issue with the applications of MacOS.
smb2_close response size increases by 4 bytes by padding, And the smb
client of MacOS check it and stop the connection. This patch use
struct_group_attr instead of struct_group for network_open_info to use
__packed to avoid padding.
Fixes: 0015eb6e1238 ("smb: client, common: fix fortify warnings")
Cc: stable@vger.kernel.org
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
File overwrite case is explicitly handled, so it is not necessary to
pass RENAME_NOREPLACE to vfs_rename.
Clearing the flag fixes rename operations when the share is a ntfs-3g
mount. The latter uses an older version of fuse with no support for
flags in the ->rename op.
Cc: stable@vger.kernel.org
Signed-off-by: Marios Makassikis <mmakassikis@freebox.fr>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
The response buffer should be allocated in smb2_allocate_rsp_buf
before validating request. But the fields in payload as well as smb2 header
is used in smb2_allocate_rsp_buf(). This patch add simple buffer size
validation to avoid potencial out-of-bounds in request buffer.
Cc: stable@vger.kernel.org
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
If ->ProtocolId is SMB2_TRANSFORM_PROTO_NUM, smb2 request size
validation could be skipped. if request size is smaller than
sizeof(struct smb2_query_info_req), slab-out-of-bounds read can happen in
smb2_allocate_rsp_buf(). This patch allocate response buffer after
decrypting transform request. smb3_decrypt_req() will validate transform
request size and avoid slab-out-of-bound in smb2_allocate_rsp_buf().
Reported-by: Norbert Szetei <norbert@doyensec.com>
Cc: stable@vger.kernel.org
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
After commit 2c7d399e551c ("smb: client: reuse file lease key in
compound operations") the client started reusing lease keys for
rename, unlink and set path size operations to prevent it from
breaking its own leases and thus causing unnecessary lease breaks to
same connection.
The implementation relies on positive dentries and
cifsInodeInfo::lease_granted to decide whether reusing lease keys for
the compound requests. cifsInodeInfo::lease_granted was introduced by
commit 0ab95c2510b6 ("Defer close only when lease is enabled.") to
indicate whether lease caching is granted for a specific file, but
that can only happen until file is open, so
cifsInodeInfo::lease_granted was left uninitialised in ->alloc_inode
and then client started sending random lease keys for files that
hadn't any leases.
This fixes the following test case against samba:
mount.cifs //srv/share /mnt/1 -o ...,nosharesock
mount.cifs //srv/share /mnt/2 -o ...,nosharesock
touch /mnt/1/foo; tail -f /mnt/1/foo & pid=$!
mv /mnt/2/foo /mnt/2/bar # fails with -EIO
kill $pid
Fixes: 0ab95c2510b6 ("Defer close only when lease is enabled.")
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Add tracing for the refcounting/lifecycle of the cifs_tcon struct, marking
different events with different labels and giving each tcon its own debug
ID so that the tracelines corresponding to individual tcons can be
distinguished. This can be enabled with:
echo 1 >/sys/kernel/debug/tracing/events/cifs/smb3_tcon_ref/enable
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
During mount, cifs_mount_get_tcon() gets a tcon resource connection record
and then attaches an fscache volume cookie to it. However, it does this
irrespective of whether or not the tcon returned from cifs_get_tcon() is a
new record or one that's already in use. This leads to a warning about a
volume cookie collision and a leaked volume cookie because tcon->fscache
gets reset.
Fix this be adding a mutex and a "we've already tried this" flag and only
doing it once for the lifetime of the tcon.
[!] Note: Looking at cifs_mount_get_tcon(), a more general solution may
actually be required. Reacquiring the volume cookie isn't the only thing
that function does: it also partially reinitialises the tcon record without
any locking - which may cause live filesystem ops already using the tcon
through a previous mount to malfunction.
This can be reproduced simply by something like:
mount //example.com/test /xfstest.test -o user=shares,pass=xxx,fsc
mount //example.com/test /mnt -o user=shares,pass=xxx,fsc
Fixes: 70431bfd825d ("cifs: Support fscache indexing rewrite")
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
cc: Shyam Prasad N <sprasad@microsoft.com>
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
This series contains a reversion of one of the original 6.9
patches which seems to have been the cause of most of the
instability. It also incorporates several fixes to legacy
support and cache fixes.
There are few additional changes to improve stability,
but I want another week of testing before sending them
upstream.
Signed-off-by: Eric Van Hensbergen <ericvh@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEElpbw0ZalkJikytFRiP/V+0pf/5gFAmYiiQ0ACgkQiP/V+0pf
/5hrCw//aJwdNAimpwPrc5UfE4Q37igQeXoT29VJbkOBO78rZ2cNgd3EFpgC2UES
RFJejQ/IQlEkpqbHiMHIyCii2MmWGT0xzePLf3nUZW/qmoUvhvXlPG5OZb0FomXY
gxCRFuUgegNcK3t3LtFAVn7v6NpXtOfLAgJb3MDIFP8WsCuN863pQcJCwn4aSuKc
C1ct2tLaaIeZSAy68xytqDwRXslMGaKUp7ygBzpyaIIEqy2l9H8NRKQ8Cmg+vyKF
2+zu3fNYIGIS3KflUtcTQDZ9IVtp/YxN7QXchZ56nnD5PFy9L9GgvBecZ0i8zzoZ
XFmzyp7HLwyBA8oNmmEJWMz93iwx61mePxOzPu2n1VfqWRTgFp/kd3KrFKWLfHvw
NoPGbneAhtwifKCNkxAmX6aCvnTZ18j9nds8WbRcuLRbTF0hHfkI36+vgoRWebaA
su673A0fnFFe64EEnOLjlnAa0V8CL26V2rX2Mi2Kjaw6emc1Yz5HDnGYjckKIlvS
fZjlfP1dtqzBecXvBLIuMQKfygpRJD83sEni+rGtAN1FKVP8eKz+/ZcyAG5xqcrZ
dnDXBegjhieqyz4q9vykxTLmYKEKd4fqbhhjZQ3PStyXgc6iFVKvD41akSdxR6ob
3oujNYblpkJVhHCcO+H4dWa7tznB7hqd9xv2Jerx4cKTdd9uIik=
=M4CO
-----END PGP SIGNATURE-----
Merge tag '9p-fixes-for-6.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs
Pull fs/9p fixes from Eric Van Hensbergen:
"This contains a reversion of one of the original 6.9 patches which
seems to have been the cause of most of the instability. It also
incorporates several fixes to legacy support and cache fixes.
There are few additional changes to improve stability, but I want
another week of testing before sending them upstream"
* tag '9p-fixes-for-6.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
fs/9p: drop inodes immediately on non-.L too
fs/9p: Revert "fs/9p: fix dups even in uncached mode"
fs/9p: remove erroneous nlink init from legacy stat2inode
9p: explicitly deny setlease attempts
fs/9p: fix the cache always being enabled on files with qid flags
fs/9p: translate O_TRUNC into OTRUNC
fs/9p: only translate RWX permissions for plain 9P2000
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCZiJcTAAKCRDh3BK/laaZ
PK1QAP9u/S7GYKDj0k58xOVAof2x/q0puHWXoObRma+bPmeoeQEA2+K+vlnTJHub
kLRURaTCzGyFfL+CB/JQ4Kv4tDF5qQc=
=Eoob
-----END PGP SIGNATURE-----
Merge tag 'fuse-fixes-6.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse fixes from Miklos Szeredi:
- Fix two bugs in the new passthrough mode
- Fix a statx bug introduced in v6.6
- Fix code documentation
* tag 'fuse-fixes-6.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
cuse: add kernel-doc comments to cuse_process_init_reply()
fuse: fix leaked ENOSYS error on first statx call
fuse: fix parallel dio write on file open in passthrough mode
fuse: fix wrong ff->iomode state changes from parallel dio write
or aren't considered suitable for backporting.
There are a significant number of fixups for this cycle's page_owner
changes (series "page_owner: print stacks and their outstanding
allocations"). Apart from that, singleton changes all over, mainly in MM.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZiGTewAKCRDdBJ7gKXxA
jt1QAP9QxiU/+gUMVjkHyKaMBHSBMD/CWBFjDfRjx+BPqYx55gD+JWxUXwlyVkMo
Z8fqtCGEgatev1VbwpCwByhvnH9bKgw=
=YBZ9
-----END PGP SIGNATURE-----
Merge tag 'mm-hotfixes-stable-2024-04-18-14-41' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"15 hotfixes. 9 are cc:stable and the remainder address post-6.8 issues
or aren't considered suitable for backporting.
There are a significant number of fixups for this cycle's page_owner
changes (series "page_owner: print stacks and their outstanding
allocations"). Apart from that, singleton changes all over, mainly in
MM"
* tag 'mm-hotfixes-stable-2024-04-18-14-41' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
nilfs2: fix OOB in nilfs_set_de_type
MAINTAINERS: update Naoya Horiguchi's email address
fork: defer linking file vma until vma is fully initialized
mm/shmem: inline shmem_is_huge() for disabled transparent hugepages
mm,page_owner: defer enablement of static branch
Squashfs: check the inode number is not the invalid value of zero
mm,swapops: update check in is_pfn_swap_entry for hwpoison entries
mm/memory-failure: fix deadlock when hugetlb_optimize_vmemmap is enabled
mm/userfaultfd: allow hugetlb change protection upon poison entry
mm,page_owner: fix printing of stack records
mm,page_owner: fix accounting of pages when migrating
mm,page_owner: fix refcount imbalance
mm,page_owner: update metadata for tail pages
userfaultfd: change src_folio after ensuring it's unpinned in UFFDIO_MOVE
mm/madvise: make MADV_POPULATE_(READ|WRITE) handle VM_FAULT_RETRY properly
[BUG]
During my extent_map cleanup/refactor, with extra sanity checks,
extent-map-tests::test_case_7() would not pass the checks.
The problem is, after btrfs_drop_extent_map_range(), the resulted
extent_map has a @block_start way too large.
Meanwhile my btrfs_file_extent_item based members are returning a
correct @disk_bytenr/@offset combination.
The extent map layout looks like this:
0 16K 32K 48K
| PINNED | | Regular |
The regular em at [32K, 48K) also has 32K @block_start.
Then drop range [0, 36K), which should shrink the regular one to be
[36K, 48K).
However the @block_start is incorrect, we expect 32K + 4K, but got 52K.
[CAUSE]
Inside btrfs_drop_extent_map_range() function, if we hit an extent_map
that covers the target range but is still beyond it, we need to split
that extent map into half:
|<-- drop range -->|
|<----- existing extent_map --->|
And if the extent map is not compressed, we need to forward
extent_map::block_start by the difference between the end of drop range
and the extent map start.
However in that particular case, the difference is calculated using
(start + len - em->start).
The problem is @start can be modified if the drop range covers any
pinned extent.
This leads to wrong calculation, and would be caught by my later
extent_map sanity checks, which checks the em::block_start against
btrfs_file_extent_item::disk_bytenr + btrfs_file_extent_item::offset.
This is a regression caused by commit c962098ca4af ("btrfs: fix
incorrect splitting in btrfs_drop_extent_map_range"), which removed the
@len update for pinned extents.
[FIX]
Fix it by avoiding using @start completely, and use @end - em->start
instead, which @end is exclusive bytenr number.
And update the test case to verify the @block_start to prevent such
problem from happening.
Thankfully this is not going to lead to any data corruption, as IO path
does not utilize btrfs_drop_extent_map_range() with @skip_pinned set.
So this fix is only here for the sake of consistency/correctness.
CC: stable@vger.kernel.org # 6.5+
Fixes: c962098ca4af ("btrfs: fix incorrect splitting in btrfs_drop_extent_map_range")
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Syzbot reported the following information leak for in
btrfs_ioctl_logical_to_ino():
BUG: KMSAN: kernel-infoleak in instrument_copy_to_user include/linux/instrumented.h:114 [inline]
BUG: KMSAN: kernel-infoleak in _copy_to_user+0xbc/0x110 lib/usercopy.c:40
instrument_copy_to_user include/linux/instrumented.h:114 [inline]
_copy_to_user+0xbc/0x110 lib/usercopy.c:40
copy_to_user include/linux/uaccess.h:191 [inline]
btrfs_ioctl_logical_to_ino+0x440/0x750 fs/btrfs/ioctl.c:3499
btrfs_ioctl+0x714/0x1260
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:904 [inline]
__se_sys_ioctl+0x261/0x450 fs/ioctl.c:890
__x64_sys_ioctl+0x96/0xe0 fs/ioctl.c:890
x64_sys_call+0x1883/0x3b50 arch/x86/include/generated/asm/syscalls_64.h:17
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcf/0x1e0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Uninit was created at:
__kmalloc_large_node+0x231/0x370 mm/slub.c:3921
__do_kmalloc_node mm/slub.c:3954 [inline]
__kmalloc_node+0xb07/0x1060 mm/slub.c:3973
kmalloc_node include/linux/slab.h:648 [inline]
kvmalloc_node+0xc0/0x2d0 mm/util.c:634
kvmalloc include/linux/slab.h:766 [inline]
init_data_container+0x49/0x1e0 fs/btrfs/backref.c:2779
btrfs_ioctl_logical_to_ino+0x17c/0x750 fs/btrfs/ioctl.c:3480
btrfs_ioctl+0x714/0x1260
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:904 [inline]
__se_sys_ioctl+0x261/0x450 fs/ioctl.c:890
__x64_sys_ioctl+0x96/0xe0 fs/ioctl.c:890
x64_sys_call+0x1883/0x3b50 arch/x86/include/generated/asm/syscalls_64.h:17
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcf/0x1e0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Bytes 40-65535 of 65536 are uninitialized
Memory access of size 65536 starts at ffff888045a40000
This happens, because we're copying a 'struct btrfs_data_container' back
to user-space. This btrfs_data_container is allocated in
'init_data_container()' via kvmalloc(), which does not zero-fill the
memory.
Fix this by using kvzalloc() which zeroes out the memory on allocation.
CC: stable@vger.kernel.org # 4.14+
Reported-by: <syzbot+510a1abbb8116eeb341d@syzkaller.appspotmail.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Johannes Thumshirn <Johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmYgXDMACgkQxWXV+ddt
WDsPpg//RzpLGyfFVFx+AdqIPScBvDSr6RIQAug++4OmDbIRMxzOpxKOAWThhivf
78KIms2fj9R/zLJEdUGCLQTcy8a1eWBnoeSzXoeTta2pip5cKrc9v3hJId53l0F6
BfltbVjpAKt6XHqeI0V2myrL/KHx5bApz5oNn/oEQCwiA2HBkasrYTRLEA7xMem2
hRUIXrTuIdwiyWugi84xjp9D0BxEdbTBfH6SR6RG4ESy+73gdEt4BAeDI6DzWN+D
eKUv/CthhrP7xuO8Aq9XGkwznP7lIeIwBCiV5XURLR0HztFm64vXgbPQHhwqvI43
5uhA7wifc/VE8nOysubfET6MwVEeyOptW6+25ih/9Da9VLxRK1y/Hm94JW8t6Sxi
VPgT5gz4YuE5/QaojETDLYgkkjKj7Lpe/Bs225J3QBCHu3fs/tp9kHKbUNJrcAeM
b56tiRMccLVpeoslbK4ahvQqCH4/LKBMdAqfWK5/p24JkYT/ubVP3CdLS2MOeRpV
UqDpQExuWsVJZKH8znSXXrHf2ZMYHmlA/1gRqdEmcvPF8A2vCc9aMMZHTP7v57EC
/80NJv9HQuxcUFQCl0h4WBlB+gGQtAszz+0q1X9aedauC6Hd/7LeICLCPRczJC3g
rD3J+EXiTg2MxqZWyXJXQ1Q9cQWNkQjG6o/rEhl5r5c3OGWgssk=
=ZKAP
-----END PGP SIGNATURE-----
Merge tag 'for-6.9-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- fixup in zoned mode for out-of-order writes of metadata that are no
longer necessary, this used to be tracked in a separate list but now
the old locaion needs to be zeroed out, also add assertions
- fix bulk page allocation retry, this may stall after first failure
for compression read/write
* tag 'for-6.9-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: do not wait for short bulk allocation
btrfs: zoned: add ASSERT and WARN for EXTENT_BUFFER_ZONED_ZEROOUT handling
btrfs: zoned: do not flag ZEROOUT on non-dirty extent buffer
In commit b4ccace878f4 ("btrfs: refactor submit_compressed_extents()"), if
an async extent compressed but failed to find enough space, we changed
from falling back to an uncompressed write to just failing the write
altogether. The principle was that if there's not enough space to write
the compressed version of the data, there can't possibly be enough space
to write the larger, uncompressed version of the data.
However, this isn't necessarily true: due to fragmentation, there could
be enough discontiguous free blocks to write the uncompressed version,
but not enough contiguous free blocks to write the smaller but
unsplittable compressed version.
This has occurred to an internal workload which relied on write()'s
return value indicating there was space. While rare, it has happened a
few times.
Thus, in order to prevent early ENOSPC, re-add a fallback to
uncompressed writing.
Fixes: b4ccace878f4 ("btrfs: refactor submit_compressed_extents()")
CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Co-developed-by: Neal Gompa <neal@gompa.dev>
Signed-off-by: Neal Gompa <neal@gompa.dev>
Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When btrfs scrub finds an error, it reads mirrors to find correct data. If
all the errors are fixed, sctx->error_bitmap is cleared for the stripe
range. However, in the zoned mode, it runs relocation to repair scrub
errors when the bitmap is *not* empty, which is a flipped condition.
Also, it runs the relocation even if the scrub is read-only. This was
missed by a fix in commit 1f2030ff6e49 ("btrfs: scrub: respect the
read-only flag during repair").
The repair is only necessary when there is a repaired sector and should be
done on read-write scrub. So, tweak the condition for both regular and
zoned case.
Fixes: 54765392a1b9 ("btrfs: scrub: introduce helper to queue a stripe for scrub")
Fixes: 1f2030ff6e49 ("btrfs: scrub: respect the read-only flag during repair")
CC: stable@vger.kernel.org # 6.6+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The message format in syslog is usually made of two parts:
prefix ":" message
Various tools parse the prefix up to the first ":". When there's
an additional status of a btrfs filesystem like
[5.199782] BTRFS info (device nvme1n1p1: state M): use zstd compression, level 9
where 'M' is for remount, there's one more ":" that does not conform to
the format. Remove it.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
KEY_TYPE_error is left behind when we have to delete all pointers in an
extent in fsck; it allows errors to be correctly returned by reads
later.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This fixes a deadlock when journal replay has many keys to insert that
were from fsck, not the journal.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Interior nodes are not really needed, when we have to scan - but if this
pops up for leaf nodes we'll need a real heuristic.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
When building for 32-bit platforms, for which size_t is 'unsigned int',
there is a warning from a format string in validate_bset_keys():
fs/bcachefs/btree_io.c: In function 'validate_bset_keys':
fs/bcachefs/btree_io.c:891:34: error: format '%lu' expects argument of type 'long unsigned int', but argument 12 has type 'unsigned int' [-Werror=format=]
891 | "bad k->u64s %u (min %u max %lu)", k->u64s,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
fs/bcachefs/btree_io.c:603:32: note: in definition of macro 'btree_err'
603 | msg, ##__VA_ARGS__); \
| ^~~
fs/bcachefs/btree_io.c:887:21: note: in expansion of macro 'btree_err_on'
887 | if (btree_err_on(!bkeyp_u64s_valid(&b->format, k),
| ^~~~~~~~~~~~
fs/bcachefs/btree_io.c:891:64: note: format string is defined here
891 | "bad k->u64s %u (min %u max %lu)", k->u64s,
| ~~^
| |
| long unsigned int
| %u
cc1: all warnings being treated as errors
BKEY_U64s is size_t so the entire expression is promoted to size_t. Use
the '%zu' specifier so that there is no warning regardless of the width
of size_t.
Fixes: 031ad9e7dbd1 ("bcachefs: Check for packed bkeys that are too big")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202404130747.wH6Dd23p-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202404131536.HdAMBOVc-lkp@intel.com/
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The size of the nilfs_type_by_mode array in the fs/nilfs2/dir.c file is
defined as "S_IFMT >> S_SHIFT", but the nilfs_set_de_type() function,
which uses this array, specifies the index to read from the array in the
same way as "(mode & S_IFMT) >> S_SHIFT".
static void nilfs_set_de_type(struct nilfs_dir_entry *de, struct inode
*inode)
{
umode_t mode = inode->i_mode;
de->file_type = nilfs_type_by_mode[(mode & S_IFMT)>>S_SHIFT]; // oob
}
However, when the index is determined this way, an out-of-bounds (OOB)
error occurs by referring to an index that is 1 larger than the array size
when the condition "mode & S_IFMT == S_IFMT" is satisfied. Therefore, a
patch to resize the nilfs_type_by_mode array should be applied to prevent
OOB errors.
Link: https://lkml.kernel.org/r/20240415182048.7144-1-konishi.ryusuke@gmail.com
Reported-by: syzbot+2e22057de05b9f3b30d8@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=2e22057de05b9f3b30d8
Fixes: 2ba466d74ed7 ("nilfs2: directory entry operations")
Signed-off-by: Jeongjun Park <aha310510@gmail.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Syskiller has produced an out of bounds access in fill_meta_index().
That out of bounds access is ultimately caused because the inode
has an inode number with the invalid value of zero, which was not checked.
The reason this causes the out of bounds access is due to following
sequence of events:
1. Fill_meta_index() is called to allocate (via empty_meta_index())
and fill a metadata index. It however suffers a data read error
and aborts, invalidating the newly returned empty metadata index.
It does this by setting the inode number of the index to zero,
which means unused (zero is not a valid inode number).
2. When fill_meta_index() is subsequently called again on another
read operation, locate_meta_index() returns the previous index
because it matches the inode number of 0. Because this index
has been returned it is expected to have been filled, and because
it hasn't been, an out of bounds access is performed.
This patch adds a sanity check which checks that the inode number
is not zero when the inode is created and returns -EINVAL if it is.
[phillip@squashfs.org.uk: whitespace fix]
Link: https://lkml.kernel.org/r/20240409204723.446925-1-phillip@squashfs.org.uk
Link: https://lkml.kernel.org/r/20240408220206.435788-1-phillip@squashfs.org.uk
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Reported-by: "Ubisectech Sirius" <bugreport@ubisectech.com>
Closes: https://lore.kernel.org/lkml/87f5c007-b8a5-41ae-8b57-431e924c5915.bugreport@ubisectech.com/
Cc: Christian Brauner <brauner@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Johan Hovold reported that removing the legacy ntfs driver broke boot
for him since his fstab uses the legacy ntfs driver to access firmware
from the original Windows partition.
Use ntfs3 as an alias for legacy ntfs if CONFIG_NTFS_FS is selected.
This is similar to how ext3 is treated.
Link: https://lore.kernel.org/r/Zf2zPf5TO5oYt3I3@hovoldconsulting.com
Link: https://lore.kernel.org/r/20240325-hinkriegen-zuziehen-d7e2c490427a@brauner
Fixes: 7ffa8f3d3023 ("fs: Remove NTFS classic")
Tested-by: Johan Hovold <johan+linaro@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Johan Hovold <johan@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
various recovery fixes:
- fixes for the btree_insert_entry being resized on path allocation
btree_path array recently became dynamically resizable, and
btree_insert_entry along with it; this was being observed during
journal replay, when write buffer btree updates don't use the write
buffer and instead use the normal btree update path
- multiple fixes for deadlock in recovery when we need to do lots of
btree node merges; excessive merges were clocking up the whole
pipeline
- write buffer path now correctly does btree node merges when needed
- fix failure to go RW when superblock indicates recovery passes needed
(i.e. to complete an unfinished upgrade)
various unsafety fixes - test case contributed by a user who had two
drives out of a six drive array write out a whole bunch of garbage after
power failure
new (tiny) on disk format feature: since it appears the btree node scan
tool will be a more regular thing (crappy hardware, user error) - this
adds a 64 bit per-device bitmap of regions that have ever had btree
nodes.
a path->should_be_locked fix, from a larger patch series tightening up
invariants and assertions around btree transaction and path locking
state; this particular fix prevents us from keeping around btree_paths
that are no longer needed.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmYdaRIACgkQE6szbY3K
bnbqcA/9ETT0Jekf/V4klQmoWj9GX5nQstUz+ENABNNPL+5hld62EojiRvOW2qwU
zVs7O0M59B8/+4v4KJoW+RqnLFjAF4z/Gf+/Uw9WarsHAKIxxFFFARxG93JpGqOn
nGa8RSw0BaYQIdbMR0Bdacc2f0N+JkJQx956/+JV7EG5MAJqXgz00AvIuLqMZ+2t
0m9av3n0tVmstyvvGqk8pouvQjK0XUvIDYN3oiUDl7WXOAIKXDlp6yviiGnTbusq
DssmIt5fdeVBq/DAk5PMNEKM9NUP+weIZW1UWPWINaicarqyV+pn2fhvLrBxVl7q
zBSN3v28viaABKC8A15b2bqj3IT2WIBDoBCEi406akMao9eiVsE6is13rFkPQwQI
Obhc7NNDyOPPTvX25M3tKXpr8rSGoD2qHIMMKMIBe1ZWscj6lMbmUBErwzTOAW4+
pNTvzWT2XwcS7tE8Fx50ZxcehTQl6ir0hQvjJL5JV2po8XMbdGxcImBe6xPmAa3n
/XIzyglL8IvW494wjCsHxtTeOt+f8nW7BXJCrWB71UQeXIXq4b9FADOwWtlGTnxJ
6XNprfi8TSp+RsSRxav6DBw2ou5viGjAjP2ddrO6Lw37XUYV0igS+BeDNEPA4dwI
ZlbCzNE7qSXK2rjmGjyu7GCJ3+NOxJDQ8GdxkTDtpPrBF2kCOkQ=
=NAId
-----END PGP SIGNATURE-----
Merge tag 'bcachefs-2024-04-15' of https://evilpiepirate.org/git/bcachefs
Pull yet more bcachefs fixes from Kent Overstreet:
"This gets recovery working again for the affected user I've been
working with, and I'm still waiting to hear back on other bug reports
but should fix it for everyone else who's been having issues with
recovery.
- Various recovery fixes:
- fixes for the btree_insert_entry being resized on path
allocation btree_path array recently became dynamically
resizable, and btree_insert_entry along with it; this was being
observed during journal replay, when write buffer btree updates
don't use the write buffer and instead use the normal btree
update path
- multiple fixes for deadlock in recovery when we need to do lots
of btree node merges; excessive merges were clocking up the
whole pipeline
- write buffer path now correctly does btree node merges when
needed
- fix failure to go RW when superblock indicates recovery passes
needed (i.e. to complete an unfinished upgrade)
- Various unsafety fixes - test case contributed by a user who had
two drives out of a six drive array write out a whole bunch of
garbage after power failure
- New (tiny) on disk format feature: since it appears the btree node
scan tool will be a more regular thing (crappy hardware, user
error) - this adds a 64 bit per-device bitmap of regions that have
ever had btree nodes.
- A path->should_be_locked fix, from a larger patch series tightening
up invariants and assertions around btree transaction and path
locking state.
This particular fix prevents us from keeping around btree_paths
that are no longer needed"
* tag 'bcachefs-2024-04-15' of https://evilpiepirate.org/git/bcachefs: (24 commits)
bcachefs: set_btree_iter_dontneed also clears should_be_locked
bcachefs: fix error path of __bch2_read_super()
bcachefs: Check for backpointer bucket_offset >= bucket size
bcachefs: bch_member.btree_allocated_bitmap
bcachefs: sysfs internal/trigger_journal_flush
bcachefs: Fix bch2_btree_node_fill() for !path
bcachefs: add safety checks in bch2_btree_node_fill()
bcachefs: Interior known are required to have known key types
bcachefs: add missing bounds check in __bch2_bkey_val_invalid()
bcachefs: Fix btree node merging on write buffer btrees
bcachefs: Disable merges from interior update path
bcachefs: Run merges at BCH_WATERMARK_btree
bcachefs: Fix missing write refs in fs fio paths
bcachefs: Fix deadlock in journal replay
bcachefs: Go rw if running any explicit recovery passes
bcachefs: Standardize helpers for printing enum strs with bounds checks
bcachefs: don't queue btree nodes for rewrites during scan
bcachefs: fix race in bch2_btree_node_evict()
bcachefs: fix unsafety in bch2_stripe_to_text()
bcachefs: fix unsafety in bch2_extent_ptr_to_text()
...
This is part of a larger series cleaning up the semantics of
should_be_locked and adding assertions around it; if we don't need an
iterator/path anymore, it clearly doesn't need to be locked.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
In __bch2_read_super(), if kstrdup() fails, it needs to release memory
in sb->holder, fix to call bch2_free_super() in the error path.
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This commit adds kernel-doc style comments with complete parameter
descriptions for the function cuse_process_init_reply.
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
FUSE attempts to detect server support for statx by trying it once and
setting no_statx=1 if it fails with ENOSYS, but consider the following
scenario:
- Userspace (e.g. sh) calls stat() on a file
* succeeds
- Userspace (e.g. lsd) calls statx(BTIME) on the same file
- request_mask = STATX_BASIC_STATS | STATX_BTIME
- first pass: sync=true due to differing cache_mask
- statx fails and returns ENOSYS
- set no_statx and retry
- retry sets mask = STATX_BASIC_STATS
- now mask == cache_mask; sync=false (time_before: still valid)
- so we take the "else if (stat)" path
- "err" is still ENOSYS from the failed statx call
Fix this by zeroing "err" before retrying the failed call.
Fixes: d3045530bdd2 ("fuse: implement statx")
Cc: stable@vger.kernel.org # v6.6
Signed-off-by: Danny Lin <danny@orbstack.dev>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Parallel dio write takes a negative refcount of fi->iocachectr and so does
open of file in passthrough mode.
The refcount of passthrough mode is associated with attach/detach of a
fuse_backing object to fuse inode.
For parallel dio write, the backing file is irrelevant, so the call to
fuse_inode_uncached_io_start() passes a NULL fuse_backing object.
Passing a NULL fuse_backing will result in false -EBUSY error if the file
is already open in passthrough mode.
Allow taking negative fi->iocachectr refcount with NULL fuse_backing,
because it does not conflict with an already attached fuse_backing object.
Fixes: 4a90451bbc7f ("fuse: implement open in passthrough mode")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
There is a confusion with fuse_file_uncached_io_{start,end} interface.
These helpers do two things when called from passthrough open()/release():
1. Take/drop negative refcount of fi->iocachectr (inode uncached io mode)
2. State change ff->iomode IOM_NONE <-> IOM_UNCACHED (file uncached open)
The calls from parallel dio write path need to take a reference on
fi->iocachectr, but they should not be changing ff->iomode state, because
in this case, the fi->iocachectr reference does not stick around until file
release().
Factor out helpers fuse_inode_uncached_io_{start,end}, to be used from
parallel dio write path and rename fuse_file_*cached_io_{start,end} helpers
to fuse_file_*cached_io_{open,release} to clarify the difference.
Fixes: 205c1d802683 ("fuse: allow parallel dio writes with FUSE_DIRECT_IO_ALLOW_MMAP")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
This adds a small (64 bit) per-device bitmap that tracks ranges that
have btree nodes, for accelerating btree node scan if it is ever needed.
- New helpers, bch2_dev_btree_bitmap_marked() and
bch2_dev_bitmap_mark(), for checking and updating the bitmap
- Interior btree update path updates the bitmaps when required
- The check_allocations pass has a new fsck_err check,
btree_bitmap_not_marked
- New on disk format version, mi_btree_mitmap, which indicates the new
bitmap is present
- Upgrade table lists the required recovery pass and expected fsck error
- Btree node scan uses the bitmap to skip ranges if we're on the new
version
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We shouldn't be doing the unlock/relock dance when we're not using a
path - this fixes an assertion pop when called from btree node scan.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
For forwards compatibilyt, we allow bkeys of unknown type in leaf nodes;
we can simply ignore metadata we don't understand. Pointers to btree
nodes must always be of known types, howwever.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZhu2kwAKCRBZ7Krx/gZQ
62gzAP9eeADy6rQkzgWJ8d8sKzGfmd0nup9WlCOxZSR0XojTXwEAnue47dn7PlMx
wQY0joZ0V5FO8PNTEbWc2P/dSQrANgc=
=MshW
-----END PGP SIGNATURE-----
Merge tag 'pull-sysfs-annotation-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull sysfs fix from Al Viro:
"Get rid of lockdep false positives around sysfs/overlayfs
syzbot has uncovered a class of lockdep false positives for setups
with sysfs being one of the backing layers in overlayfs. The root
cause is that of->mutex allocated when opening a sysfs file read-only
(which overlayfs might do) is confused with of->mutex of a file opened
writable (held in write to sysfs file, which overlayfs won't do).
Assigning them separate lockdep classes fixes that bunch and it's
obviously safe"
* tag 'pull-sysfs-annotation-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
kernfs: annotate different lockdep class for of->mutex of writable files
The writable file /sys/power/resume may call vfs lookup helpers for
arbitrary paths and readonly files can be read by overlayfs from vfs
helpers when sysfs is a lower layer of overalyfs.
To avoid a lockdep warning of circular dependency between overlayfs
inode lock and kernfs of->mutex, use a different lockdep class for
writable and readonly kernfs files.
Reported-by: syzbot+9a5b0ced8b1bfb238b56@syzkaller.appspotmail.com
Fixes: 0fedefd4c4e3 ("kernfs: sysfs: support custom llseek method for sysfs entries")
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The btree write buffer flush fastpath that avoids the main transaction
commit path had the unfortunate side effect of not doing btree node
merging.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
There's been a bug in the btree write buffer where it wasn't triggering
btree node merges - and leaving behind a bunch of nearly empty btree
nodes.
Then during journal replay, when updates to the backpointers btree
aren't using the btree write buffer (because we require synchronization
with journal replay), we end up doing those merges all at once.
Then if it's the interior update path running them, we deadlock because
those run with the highest watermark.
There's no real need for the interior update path to be doing btree node
merges; other code paths can handle that at lower watermarks.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This fixes a deadlock where the interior update path during journal
replay ends up doing a ton of merges on the backpointers btree, and
deadlocking.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>