IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Syzkaller reported the following bug:
sysfs: cannot create duplicate filename '/devices/iommufd_mock4'
Call Trace:
sysfs_warn_dup+0x71/0x90
sysfs_create_dir_ns+0x1ee/0x260
? sysfs_create_mount_point+0x80/0x80
? spin_bug+0x1d0/0x1d0
? do_raw_spin_unlock+0x54/0x220
kobject_add_internal+0x221/0x970
kobject_add+0x11c/0x1e0
? lockdep_hardirqs_on_prepare+0x273/0x3e0
? kset_create_and_add+0x160/0x160
? kobject_put+0x5d/0x390
? bus_get_dev_root+0x4a/0x60
? kobject_put+0x5d/0x390
device_add+0x1d5/0x1550
? __fw_devlink_link_to_consumers.isra.0+0x1f0/0x1f0
? __init_waitqueue_head+0xcb/0x150
iommufd_test+0x462/0x3b60
? lock_release+0x1fe/0x640
? __might_fault+0x117/0x170
? reacquire_held_locks+0x4b0/0x4b0
? iommufd_selftest_destroy+0xd0/0xd0
? __might_fault+0xbe/0x170
iommufd_fops_ioctl+0x256/0x350
? iommufd_option+0x180/0x180
? __lock_acquire+0x1755/0x45f0
__x64_sys_ioctl+0xa13/0x1640
The bug is triggered when Syzkaller created multiple mock devices but
didn't destroy them in the same sequence, messing up the mock_dev_num
counter. Replace the atomic with an mock_dev_ida.
Cc: stable@vger.kernel.org
Fixes: 23a1b46f15d5 ("iommufd/selftest: Make the mock iommu driver into a real driver")
Link: https://lore.kernel.org/r/5af41d5af6d5c013cc51de01427abb8141b3587e.1708636627.git.nicolinc@nvidia.com
Reported-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Syzkaller reported the following WARN_ON:
WARNING: CPU: 1 PID: 4738 at drivers/iommu/iommufd/io_pagetable.c:1360
Call Trace:
iommufd_access_change_ioas+0x2fe/0x4e0
iommufd_access_destroy_object+0x50/0xb0
iommufd_object_remove+0x2a3/0x490
iommufd_object_destroy_user
iommufd_access_destroy+0x71/0xb0
iommufd_test_staccess_release+0x89/0xd0
__fput+0x272/0xb50
__fput_sync+0x4b/0x60
__do_sys_close
__se_sys_close
__x64_sys_close+0x8b/0x110
do_syscall_x64
The mismatch between the access pointer in the list and the passed-in
pointer is resulting from an overwrite of access->iopt_access_list_id, in
iopt_add_access(). Called from iommufd_access_change_ioas() when
xa_alloc() succeeds but iopt_calculate_iova_alignment() fails.
Add a new_id in iopt_add_access() and only update iopt_access_list_id when
returning successfully.
Cc: stable@vger.kernel.org
Fixes: 9227da7816dd ("iommufd: Add iommufd_access_change_ioas(_id) helpers")
Link: https://lore.kernel.org/r/2dda7acb25b8562ec5f1310de828ef5da9ef509c.1708636627.git.nicolinc@nvidia.com
Reported-by: Jason Gunthorpe <jgg@nvidia.com>
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
but could not be used in practice so they are being removed.
Regarding the SPI-NAND area, Gigadevice chips were not using the right
buffer for an ECC status check operation.
Aside from these driver fixes, there is also a refcount fix in the MTD
core nodes parsing logic.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEE9HuaYnbmDhq/XIDIJWrqGEe9VoQFAmXccMwACgkQJWrqGEe9
VoRSjAf/SdgiZdXz6Fv2uUQYXsHNXO9PnFPSzJ8XNg16g4Jdm/uG0dTIN9W12ZTK
QtI2v8+HLtmIzxcGpHEVSzxMTfMbNbB+CibZm+lQFABeqxrhasFHAEDE0Nmo0dNs
/Dh5Em6fxxOW/pYUoPQk9zm5RNNg+yr9eQk9FKrv723Wprk6tjHjQQJEpUchtXUz
B/qfNSszIQUHPIUtYdNc0A0E1Q4Dr525gBnKFP2lvd6OmjjI35fJ4QvOQuVCQ5b9
Q01y50SojypoFMb+SjiYxFBSnjO3ZECnbf4wvAE0S08tsXUdOfg4ksoiciZbC4mz
6BCw+MooPwi2vUPldeITlAm0Ij0G1A==
=0Nf5
-----END PGP SIGNATURE-----
Merge tag 'mtd/fixes-for-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux
Pull mtd fixes from Miquel Raynal:
"Many NAND page layouts have been added to the Marvell NAND controller
but could not be used in practice so they are being removed.
Regarding the SPI-NAND area, Gigadevice chips were not using the right
buffer for an ECC status check operation.
Aside from these driver fixes, there is also a refcount fix in the MTD
core nodes parsing logic"
* tag 'mtd/fixes-for-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
mtd: rawnand: marvell: fix layouts
mtd: Fix possible refcounting issue when going through partition nodes
mtd: spinand: gigadevice: Fix the get ecc status issue
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmXcsfAACgkQxWXV+ddt
WDt3XA/6AkPT8QNT+mOyp4NjPzquR4UMIPVGGvjWTeKNtjNnco9gPkOBWsHeeDQe
aiihh3X2NpNtsduEmqaz717EJW4za9lplGiyPR51H/pTfGfOthWL6Nj+auTPva3t
GnlYh+GUQ+44JJ5+biOK5HUpEEeUR87EN2z5lTWsHAxg7PolBiKYKvV4Wp33xJqR
ILGlYw04reOAljTn0Zf738IL5WpY9etj1GnNxQeEKFRrdF1GH1i6r/JRONU1hGHu
EiZT6XwoN07V+JURB+fPqtY1IXODDC8904OwLg5fKhBggWvR2IaiW1XH+ToFXQgU
idae1+Dy85Hi4s40SL5GcSO8mVHPEGEspwM/5G87YqIu3uH4L9+Wd4zTwVYLcwNm
SSUCDGj2d+/JIug5dPBV8GL7jrhPNnPOu8HR+bIxY9XUhyf+IZVlUNYlorup3lbm
rAouZiCevRhQRBAx33Id5ZOMhlIpPONKObcCEKmdm6WLlnkkqgKQbnapd/I/1mfT
nP5N7oWUtfXO4oq4k5XpJBcTVhXU+DzpQ7EMDGv3mSmIem0wsDmXPbF2MfoSIim8
UuToZ1YF5MuxNLGwYnpkUaxWhKKOFWMvAe65eXP+ureIjOJwQ4f85Nkro0JvKbr8
nVdzl3rDy49tnqW7Qu3vaNPOQneuWaOqCoQcYDcVAiqk11UhH9E=
=mBP6
-----END PGP SIGNATURE-----
Merge tag 'for-6.8-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"A more fixes for recently reported or discovered problems:
- fix corner case of send that would generate potentially large
stream of zeros if there's a hole at the end of the file
- fix chunk validation in zoned mode on conventional zones, it was
possible to create chunks that would not be allowed on sequential
zones
- fix validation of dev-replace ioctl filenames
- fix KCSAN warnings about access to block reserve struct members"
* tag 'for-6.8-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix data race at btrfs_use_block_rsv() when accessing block reserve
btrfs: fix data races when accessing the reserved amount of block reserves
btrfs: send: don't issue unnecessary zero writes for trailing hole
btrfs: dev-replace: properly validate device names
btrfs: zoned: don't skip block group profile checks on conventional zones
The addition of bal_rank_mask with encoding version 17 was merged
into ceph.git in Oct 2022 and made it into v18.2.0 release normally.
A few months later, the much delayed addition of max_xattr_size got
merged, also with encoding version 17, placed before bal_rank_mask
in the encoding -- but it didn't make v18.2.0 release.
The way this ended up being resolved on the MDS side is that
bal_rank_mask will continue to be encoded in version 17 while
max_xattr_size is now encoded in version 18. This does mean that
older kernels will misdecode version 17, but this is also true for
v18.2.0 and v18.2.1 clients in userspace.
The best we can do is backport this adjustment -- see ceph.git
commit 78abfeaff27fee343fb664db633de5b221699a73 for details.
[ idryomov: changelog ]
Cc: stable@vger.kernel.org
Link: https://tracker.ceph.com/issues/64440
Fixes: d93231a6bc8a ("ceph: prevent a client from exceeding the MDS maximum xattr size")
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
When CONFIG_NTFS3_LZX_XPRESS is not set then we get the following build
error:
fs/ntfs3/frecord.c:2460:16: error: unused variable ‘i_size’
Signed-off-by: Mark O'Donovan <shiftee@posteo.net>
Fixes: 4fd6c08a16d7 ("fs/ntfs3: Use i_size_read and i_size_write")
Tested-by: Chris Clayton <chris2553@googlemail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When linking or renaming a file, if only one of the source or
destination directory is backed by an S_PRIVATE inode, then the related
set of layer masks would be used as uninitialized by
is_access_to_paths_allowed(). This would result to indeterministic
access for one side instead of always being allowed.
This bug could only be triggered with a mounted filesystem containing
both S_PRIVATE and !S_PRIVATE inodes, which doesn't seem possible.
The collect_domain_accesses() calls return early if
is_nouser_or_private() returns false, which means that the directory's
superblock has SB_NOUSER or its inode has S_PRIVATE. Because rename or
link actions are only allowed on the same mounted filesystem, the
superblock is always the same for both source and destination
directories. However, it might be possible in theory to have an
S_PRIVATE parent source inode with an !S_PRIVATE parent destination
inode, or vice versa.
To make sure this case is not an issue, explicitly initialized both set
of layer masks to 0, which means to allow all actions on the related
side. If at least on side has !S_PRIVATE, then
collect_domain_accesses() and is_access_to_paths_allowed() check for the
required access rights.
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Günther Noack <gnoack@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Shervin Oloumi <enlightened@chromium.org>
Cc: stable@vger.kernel.org
Fixes: b91c3e4ea756 ("landlock: Add support for file reparenting with LANDLOCK_ACCESS_FS_REFER")
Link: https://lore.kernel.org/r/20240219190345.2928627-1-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
MKTME repurposes the high bit of physical address to key id for encryption
key and, even though MAXPHYADDR in CPUID[0x80000008] remains the same,
the valid bits in the MTRR mask register are based on the reduced number
of physical address bits.
detect_tme() in arch/x86/kernel/cpu/intel.c detects TME and subtracts
it from the total usable physical bits, but it is called too late.
Move the call to early_init_intel() so that it is called in setup_arch(),
before MTRRs are setup.
This fixes boot on TDX-enabled systems, which until now only worked with
"disable_mtrr_cleanup". Without the patch, the values written to the
MTRRs mask registers were 52-bit wide (e.g. 0x000fffff_80000800) and
the writes failed; with the patch, the values are 46-bit wide, which
matches the reduced MAXPHYADDR that is shown in /proc/cpuinfo.
Reported-by: Zixi Chen <zixchen@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc:stable@vger.kernel.org
Link: https://lore.kernel.org/all/20240131230902.1867092-3-pbonzini%40redhat.com
In commit fbf6449f84bf ("x86/sev-es: Set x86_virt_bits to the correct
value straight away, instead of a two-phase approach"), the initialization
of c->x86_phys_bits was moved after this_cpu->c_early_init(c). This is
incorrect because early_init_amd() expected to be able to reduce the
value according to the contents of CPUID leaf 0x8000001f.
Fortunately, the bug was negated by init_amd()'s call to early_init_amd(),
which does reduce x86_phys_bits in the end. However, this is very
late in the boot process and, most notably, the wrong value is used for
x86_phys_bits when setting up MTRRs.
To fix this, call get_cpu_address_sizes() as soon as X86_FEATURE_CPUID is
set/cleared, and c->extended_cpuid_level is retrieved.
Fixes: fbf6449f84bf ("x86/sev-es: Set x86_virt_bits to the correct value straight away, instead of a two-phase approach")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc:stable@vger.kernel.org
Link: https://lore.kernel.org/all/20240131230902.1867092-2-pbonzini%40redhat.com
Commit a5a923038d70 (fbdev: fbcon: Properly revert changes when
vc_resize() failed) started restoring old font data upon failure (of
vc_resize()). But it performs so only for user fonts. It means that the
"system"/internal fonts are not restored at all. So in result, the very
first call to fbcon_do_set_font() performs no restore at all upon
failing vc_resize().
This can be reproduced by Syzkaller to crash the system on the next
invocation of font_get(). It's rather hard to hit the allocation failure
in vc_resize() on the first font_set(), but not impossible. Esp. if
fault injection is used to aid the execution/failure. It was
demonstrated by Sirius:
BUG: unable to handle page fault for address: fffffffffffffff8
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD cb7b067 P4D cb7b067 PUD cb7d067 PMD 0
Oops: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 8007 Comm: poc Not tainted 6.7.0-g9d1694dc91ce #20
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
RIP: 0010:fbcon_get_font+0x229/0x800 drivers/video/fbdev/core/fbcon.c:2286
Call Trace:
<TASK>
con_font_get drivers/tty/vt/vt.c:4558 [inline]
con_font_op+0x1fc/0xf20 drivers/tty/vt/vt.c:4673
vt_k_ioctl drivers/tty/vt/vt_ioctl.c:474 [inline]
vt_ioctl+0x632/0x2ec0 drivers/tty/vt/vt_ioctl.c:752
tty_ioctl+0x6f8/0x1570 drivers/tty/tty_io.c:2803
vfs_ioctl fs/ioctl.c:51 [inline]
...
So restore the font data in any case, not only for user fonts. Note the
later 'if' is now protected by 'old_userfont' and not 'old_data' as the
latter is always set now. (And it is supposed to be non-NULL. Otherwise
we would see the bug above again.)
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Fixes: a5a923038d70 ("fbdev: fbcon: Properly revert changes when vc_resize() failed")
Reported-and-tested-by: Ubisectech Sirius <bugreport@ubisectech.com>
Cc: Ubisectech Sirius <bugreport@ubisectech.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Helge Deller <deller@gmx.de>
Cc: linux-fbdev@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20240208114411.14604-1-jirislaby@kernel.org
At least the device test requires that no other driver using TTM is
loaded. So make those unit tests depend on UML || COMPILE_TEST to
prevent people from trying them on bare metal.
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/all/20240219230116.77b8ad68@yea/
Tegra DRM doesn't support display on Tegra234 and later, so make sure
not to remove any existing framebuffers in that case.
v2: - add comments explaining how this situation can come about
- clear DRIVER_MODESET and DRIVER_ATOMIC feature bits
Fixes: 6848c291a54f ("drm/aperture: Convert drivers to aperture interfaces")
Signed-off-by: Thierry Reding <treding@nvidia.com>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Signed-off-by: Robert Foss <rfoss@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20240223150333.1401582-1-thierry.reding@gmail.com
It seems that if userspace provides a correct IFA_TARGET_NETNSID value
but no IFA_ADDRESS and IFA_LOCAL attributes, inet6_rtm_getaddr()
returns -EINVAL with an elevated "struct net" refcount.
Fixes: 6ecf4c37eb3e ("ipv6: enable IFA_TARGET_NETNSID for RTM_GETADDR")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Ahern <dsahern@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Test that we keep GRO flag in sync when XDP is disabled while
the device is closed.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
veth sets NETIF_F_GRO automatically when XDP is enabled,
because both features use the same NAPI machinery.
The logic to clear NETIF_F_GRO sits in veth_disable_xdp() which
is called both on ndo_stop and when XDP is turned off.
To avoid the flag from being cleared when the device is brought
down, the clearing is skipped when IFF_UP is not set.
Bringing the device down should indeed not modify its features.
Unfortunately, this means that clearing is also skipped when
XDP is disabled _while_ the device is down. And there's nothing
on the open path to bring the device features back into sync.
IOW if user enables XDP, disables it and then brings the device
up we'll end up with a stray GRO flag set but no NAPI instances.
We don't depend on the GRO flag on the datapath, so the datapath
won't crash. We will crash (or hang), however, next time features
are sync'ed (either by user via ethtool or peer changing its config).
The GRO flag will go away, and veth will try to disable the NAPIs.
But the open path never created them since XDP was off, the GRO flag
was a stray. If NAPI was initialized before we'll hang in napi_disable().
If it never was we'll crash trying to stop uninitialized hrtimer.
Move the GRO flag updates to the XDP enable / disable paths,
instead of mixing them with the ndo_open / ndo_close paths.
Fixes: d3256efd8e8b ("veth: allow enabling NAPI even without XDP")
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: syzbot+039399a9b96297ddedca@syzkaller.appspotmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some more mostly boring fixes, but some not
User reported ones:
- the BTREE_ITER_FILTER_SNAPSHOTS one fixes a really nasty performance
bug; user reported an unter initially taking 2 seconds and then ~2
minutes
- kill a __GFP_NOFAIL in the buffered read path; this was a leftover
from the trickier fix to kill __GFP_NOFAIL in readahead, where we
can't return errors (and have to silently truncate the read
ourselves).
bcachefs can't use GFP_NOFAIL for folio state unlike iomap based
filesystems because our folio state is just barely too big, 2MB
hugepages cause us to exceed the 2 page threshhold for GFP_NOFAIL.
additionally, the flags argument was just buggy, we weren't supplying
GFP_KERNEL previously (!).
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmXbqqMACgkQE6szbY3K
bnYjnhAApY0vT6eVIYrZ7JGR6tw++xw02xRkcNW4zFE8INAvxQor5TXMEKkJs9Ui
owh8WZjydXe0FJPE+pROcHMfxkkup4yP2SafgzR8DGERBwZbV9x7hvUbdG90EngY
V/MevV+vr6UaV7133sY70K8BqUA/yAlCmmtOQVFgGRprEtEPS4Ur3vYR5+IzA0N7
OhNXu6LxzkYbrNp9qroCN2UEVgRDJ/Mtda6uHfIUrqOQMUhiq2og9kvzJXzIrW9l
URxm4eFQtJe0Yz09Ppypve+FutJIbtuDEYbcMJNT9Ig7BosD5vDjy9nhp8A5Q1Uk
oDWBbCJhDdSYSVC/EQY8bv0AaCkyCa7vshSoKq0fDCFJ8k+nQ1YMF5wNhfgJhtU9
Tl2Qytphp9/dxkvpIsR/5iNhLply9xTka1Wkp3G+3QJk0c17Dftpvz0/WhKI0P2B
d6y4mz/hfCtWoSQOJbJl3fM/ZVpjH54VHDmb7sGyb5f+bTUkX6OUoJ4os8MNKGcS
GdpEoWt/IAQj69c7w8aama5TXJ4kYe0XtXwbHTRE4j1PIQJA5SPvVt+32spRtb6i
1gIa94uWKYMuG2U0XGxookHfZZZaMQkl79oXJOYRiC589YVyZC1Lp5iqr027jHEQ
1HacrWPekPfmrhchyIzpH1mHOgaS+FKoD7eKrkvj0QSxpwfwpbI=
=KNWR
-----END PGP SIGNATURE-----
Merge tag 'bcachefs-2024-02-25' of https://evilpiepirate.org/git/bcachefs
Pull bcachefs fixes from Kent Overstreet:
"Some more mostly boring fixes, but some not
User reported ones:
- the BTREE_ITER_FILTER_SNAPSHOTS one fixes a really nasty
performance bug; user reported an untar initially taking two
seconds and then ~2 minutes
- kill a __GFP_NOFAIL in the buffered read path; this was a leftover
from the trickier fix to kill __GFP_NOFAIL in readahead, where we
can't return errors (and have to silently truncate the read
ourselves).
bcachefs can't use GFP_NOFAIL for folio state unlike iomap based
filesystems because our folio state is just barely too big, 2MB
hugepages cause us to exceed the 2 page threshhold for GFP_NOFAIL.
additionally, the flags argument was just buggy, we weren't
supplying GFP_KERNEL previously (!)"
* tag 'bcachefs-2024-02-25' of https://evilpiepirate.org/git/bcachefs:
bcachefs: fix bch2_save_backtrace()
bcachefs: Fix check_snapshot() memcpy
bcachefs: Fix bch2_journal_flush_device_pins()
bcachefs: fix iov_iter count underflow on sub-block dio read
bcachefs: Fix BTREE_ITER_FILTER_SNAPSHOTS on inodes btree
bcachefs: Kill __GFP_NOFAIL in buffered read path
bcachefs: fix backpointer_to_text() when dev does not exist
- The XFS online fsck documentation uses incredibly deeply nested
subsection and list nesting; that broke the PDF docs build. Tweak a
parameter to tell LaTeX to allow the deeper nesting.
- Fix a 6.8 PDF-build regression
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmXbi5QPHGNvcmJldEBs
d24ubmV0AAoJEBdDWhNsDH5YZSMH/RIZh48S/Jh5mhjzqnKhGf1sFn6lSk8sFY3I
uJqML/LPo6GYzX8WvYKlfyP9+UvrLiDcQF0Er6MeIhK6mhKE1Lp7w1YvRgeXcgFR
H9DtxA4fJSGWlAaMqZBwsXjF2EFwjyxHtHUeNyaJ+YocHfrT6L9Cp9uBEvdT3Iye
F191VpjWLrFD0DJEh64CcmNd3rggN5jeD/n24dbNOmnem1cak2brIIUeltdkUmQG
48Hr27xqYF1QyVckfoRtnT/C3AyaCKbxRbTxeAjwUDjU+7nCsHf1MKltiFAZHnFs
7ZLsOboLhmR+y9xiZUg7OlpRaVj1C+7JSYC+WSaNjwRkkIfJUu4=
=MEzm
-----END PGP SIGNATURE-----
Merge tag 'docs-6.8-fixes3' of git://git.lwn.net/linux
Pull two documentation build fixes from Jonathan Corbet:
- The XFS online fsck documentation uses incredibly deeply nested
subsection and list nesting; that broke the PDF docs build. Tweak a
parameter to tell LaTeX to allow the deeper nesting.
- Fix a 6.8 PDF-build regression
* tag 'docs-6.8-fixes3' of git://git.lwn.net/linux:
docs: translations: use attribute to store current language
docs: Instruct LaTeX to cope with deeper nesting
Here are some small USB fixes for 6.8-rc6 to resolve some reported
problems. These include:
- regression fixes with typec tpcm code as reported by many
- cdnsp and cdns3 driver fixes
- usb role setting code bugfixes
- build fix for uhci driver
- ncm gadget driver bugfix
- MAINTAINERS entry update
All of these have been in linux-next all week with no reported issues
and there is at least one fix in here that is in Thorsten's regression
list that is being tracked.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZdtGEA8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ymzsgCg2IsWqIR72XUGsa5rrbRnskOP/G4An24BmUb6
t34d0VjiHagZTFlfRx6g
=eOL1
-----END PGP SIGNATURE-----
Merge tag 'usb-6.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some small USB fixes for 6.8-rc6 to resolve some reported
problems. These include:
- regression fixes with typec tpcm code as reported by many
- cdnsp and cdns3 driver fixes
- usb role setting code bugfixes
- build fix for uhci driver
- ncm gadget driver bugfix
- MAINTAINERS entry update
All of these have been in linux-next all week with no reported issues
and there is at least one fix in here that is in Thorsten's regression
list that is being tracked"
* tag 'usb-6.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: typec: tpcm: Fix issues with power being removed during reset
MAINTAINERS: Drop myself as maintainer of TYPEC port controller drivers
usb: gadget: ncm: Avoid dropping datagrams of properly parsed NTBs
Revert "usb: typec: tcpm: reset counter when enter into unattached state after try role"
usb: gadget: omap_udc: fix USB gadget regression on Palm TE
usb: dwc3: gadget: Don't disconnect if not started
usb: cdns3: fix memory double free when handle zero packet
usb: cdns3: fixed memory use after free at cdns3_gadget_ep_disable()
usb: roles: don't get/set_role() when usb_role_switch is unregistered
usb: roles: fix NULL pointer issue when put module's reference
usb: cdnsp: fixed issue with incorrect detecting CDNSP family controllers
usb: cdnsp: blocked some cdns3 specific code
usb: uhci-grlib: Explicitly include linux/platform_device.h
Here are 3 small serial/tty driver fixes for 6.8-rc6 that resolve the
following reported errors:
- riscv hvc console driver fix that was reported by many
- amba-pl011 serial driver fix for RS485 mode
- stm32 serial driver fix for RS485 mode
All of these have been in linux-next all week with no reported problems.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZdtGnA8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ymrqwCfSIsUj9GLazXJTTTgMz1I94HXLrQAnjq9QOtg
EFt6xmUGcF4zFhnfSLal
=/k5+
-----END PGP SIGNATURE-----
Merge tag 'tty-6.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial driver fixes from Greg KH:
"Here are three small serial/tty driver fixes for 6.8-rc6 that resolve
the following reported errors:
- riscv hvc console driver fix that was reported by many
- amba-pl011 serial driver fix for RS485 mode
- stm32 serial driver fix for RS485 mode
All of these have been in linux-next all week with no reported
problems"
* tag 'tty-6.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
serial: amba-pl011: Fix DMA transmission in RS485 mode
serial: stm32: do not always set SER_RS485_RX_DURING_TX if RS485 is enabled
tty: hvc: Don't enable the RISC-V SBI console by default
point in the return-to-userspace path, otherwise memory accesses after
the VERW execution could cause data to land in CPU buffers again
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmXbG7IACgkQEsHwGGHe
VUoEEg//d1qt/PEWCC23wMO6gLMl4J/e4ZQAuGOKGed/jUmOaQKpHJmpDMRc0li5
llRYDdfE0ikmtQT3t9vQDs3xbWfT5bLMsijliRimb193FaS1HGlHMMS1nxhfjyfv
MecbWfkwzX2JnrxJpsbfue+7kks3HyIXYsXV7kSFiHavk4F3GFQXYLO11pKbNQwN
9UfjJDeVsrcWPGCHhoPKF5NHUnQKIA8ZC6g8yBq894AtdWOhFY7ePKBZefUWQQ1n
myc5GJ3dKFICMCZvkMABtHYCmHU/W3y/6tPtnrz3kT8GdCIAHG+K9VRUfY1ml94H
x327GoM3sEzHLsPizKy00/Uao+j6FOtv631LoDLsO2MF3sHoTZDaSgg5y2D/ZC7t
IZdK3mUGtdINRhGiWWpdxyaMfkQ62cdZk8FkeYkRAewYS6WYSdMX3cPqFNy4Ss5u
r3reMOD3JcxAatcqhHMXjARMfY+N08gQBpxBul3ejgH8t8aY7xJx6Vggty5kBlHZ
7urV9jIRxSXfbBmOcYu6HP1ucFLWNSUQCBn7Imrh+5zbE1XVv7NaAWvT4Nmgb0/X
57fHoYYSVwaJ0k3zWWM7QYEdcuJ7IZnVgTCQYx26Ec2AOQRxE9ose+awTLYtTbp1
T+XaOlItHKMRzx9K46D7xHwmC5qiokFki3exp5vfGZxGyT3+t/c=
=n5us
-----END PGP SIGNATURE-----
Merge tag 'x86_urgent_for_v6.8_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Make sure clearing CPU buffers using VERW happens at the latest
possible point in the return-to-userspace path, otherwise memory
accesses after the VERW execution could cause data to land in CPU
buffers again
* tag 'x86_urgent_for_v6.8_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
KVM/VMX: Move VERW closer to VMentry for MDS mitigation
KVM/VMX: Use BT+JNC, i.e. EFLAGS.CF to select VMRESUME vs. VMLAUNCH
x86/bugs: Use ALTERNATIVE() instead of mds_user_clear static key
x86/entry_32: Add VERW just before userspace transition
x86/entry_64: Add VERW just before userspace transition
x86/bugs: Add asm helpers for executing VERW
from silently failing to set it up
- Do not call bus_get_dev_root() for the mbigen irqchip as it always
returns NULL - use NULL directly
- Fix hardware interrupt number truncation when assigning MSI interrupts
- Correct sending end-of-interrupt messages to disabled interrupts lines on
RISC-V PLIC
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmXbGgcACgkQEsHwGGHe
VUqrig//ay2UcLEi8CwxobHXIpuUq+pMt1pLhdDtyehTKx+T44GCwMFXGML8H27A
8CszKTEJsRxXuUP1iXquECfYqYqGmOZHcIMCX0vDodezRriJXq3m549zdoVY6LIy
m7x5mN4rfc8xaK/krSz0IKgCn7TZ7Nugw8zHE9PEJ7hj/exIA6EH2f1p0dbDc2z8
PRWsexi39mVLEstOl7yf5+hys6RN07a+9+PFJrEWCC0bO5We9Z+m7gnpu3zUrwcO
LlDAU6UwWhVc+xFipW9SFYEhCqtprdfUftf1OW2BLe1TM7pHxdvA3OwlT5ZxxN90
h4wmQ084v08hcn8YpUkaK5fWEtT+1isD3/8dVUMSRQQ4jcjLiEAVdVOvKOmJNEeJ
+MYqAktCoyay9ZYCrpZRRIVYfC4/FLMEPExPwILFM6nfMMVEBfkDqFyjGyw3uls7
QT7eHSo121kQsZc5V/8SuU30f64w/vJbtaZIYOjSR9+hQMQ+i+8cMuV0RSvmDIa2
1vGdhOLvG5Rk7Wy9xd9ITXbeq+z9KD/tH8BadyfARosvQTrg6aMhhqQRHWJtS6Pf
Vg50yS6D8ETzNNZSPFABrjBKiqQmEo3ILlUpMbR8jGBaLggqZhTs8eBRj3+XIXp8
UxgB2b47qKmZI9eaFzne9scnOGmQSRuZ7x8IrwworFGzChSzwpU=
=cMur
-----END PGP SIGNATURE-----
Merge tag 'irq_urgent_for_v6.8_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Borislav Petkov:
- Make sure GICv4 always gets initialized to prevent a kexec-ed kernel
from silently failing to set it up
- Do not call bus_get_dev_root() for the mbigen irqchip as it always
returns NULL - use NULL directly
- Fix hardware interrupt number truncation when assigning MSI
interrupts
- Correct sending end-of-interrupt messages to disabled interrupts
lines on RISC-V PLIC
* tag 'irq_urgent_for_v6.8_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/gic-v3-its: Do not assume vPE tables are preallocated
irqchip/mbigen: Don't use bus_get_dev_root() to find the parent
PCI/MSI: Prevent MSI hardware interrupt number truncation
irqchip/sifive-plic: Enable interrupt if needed before EOI
- Fix page refcount leak when looking up specific inodes
introduced by metabuf reworking.
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEQ0A6bDUS9Y+83NPFUXZn5Zlu5qoFAmXbQekRHHhpYW5nQGtl
cm5lbC5vcmcACgkQUXZn5Zlu5qpkyg//cjKnuzI2gHL3ff1o0jiTMD11Y/EmwzQS
C+JgE8WDMjUlVQpbwYnFWo/6se8qvk/fwTXPcRc45piF8YNZVchwsagxO626Kab7
0xTX7ZUgjuth6ahrkAJZkJNMgO4mYf928uYB6EVWaTb4iF0iT6glEOsSVAR/cssL
wYKbtgv3OP9t6fQcN/XL31hRkqr2MPk0y5Q27KyT4zo4lrx3xyah7Ndo3aEK/RcM
+6FUwqRiDsgDF/Ga65ylDvEp9eA03OFNHBn4DrORe3B9KV75NkmSJf/8QVEceNV/
9D072Hvt7/iyOq53AxWH3Jvp7aro/i0rvAHPbXZX4RVyqcJxaLYCQyrBFvQL0/Ie
B+793Iua8zbkQCbZ85LpTGrxAb5WydlSJp10AuHTr2MO+wS8bBqf96Jp9x1MuS9D
vqq7jbjwuZfnFUjpzu49GF6htG+WRVgY5TDzU6IAr3izXuqVxmz+zw5zExbGMmKm
2S5lb+q68DOBP4YaelAQHh97k0pYDW3eQ6GhDD2FA4P/DOr8p3vszZFBeaqaL70v
WS8z1wzOgEaleTRN4iRFlMCTvRLjOtDEaUNquohcHqJLe7w/DF7gzCU+0//8dVjD
cZieFX8uZDtzzcjrlYWeTo8oHd0q2WYixbCN8P92/YXUFIVpfdl85ugyr9KvkZxd
jKzpAy2LbSw=
=/Bfj
-----END PGP SIGNATURE-----
Merge tag 'erofs-for-6.8-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs fix from Gao Xiang:
- Fix page refcount leak when looking up specific inodes
introduced by metabuf reworking
* tag 'erofs-for-6.8-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: fix refcount on the metabuf used for inode lookup
pathwalk. This series is a result of code audit (the second round
of it) and it should deal with most of that stuff. Exceptions: ntfs3
->d_hash()/->d_compare() and ceph_d_revalidate(). Up to maintainers (a
note for NTFS folks - when documentation says that a method may not block,
it *does* imply that blocking allocations are to be avoided. Really).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZdroDAAKCRBZ7Krx/gZQ
60dKAQCzp6rYr3ye4nylho9Rzu8LEpH04TuNf3C6JuyUaNHxHwEAvNLatZsyFnmV
Ksp2Rg/IlKPNtQgYJ8xPxv9DFmNe8gI=
=47Un
-----END PGP SIGNATURE-----
Merge tag 'pull-fixes.pathwalk-rcu-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull RCU pathwalk fixes from Al Viro:
"We still have some races in filesystem methods when exposed to RCU
pathwalk. This series is a result of code audit (the second round of
it) and it should deal with most of that stuff.
Still pending: ntfs3 ->d_hash()/->d_compare() and ceph_d_revalidate().
Up to maintainers (a note for NTFS folks - when documentation says
that a method may not block, it *does* imply that blocking allocations
are to be avoided. Really)"
[ More explanations for people who aren't familiar with the vagaries of
RCU path walking: most of it is hidden from filesystems, but if a
filesystem actively participates in the low-level path walking it
needs to make sure the fields involved in that walk are RCU-safe.
That "actively participate in low-level path walking" includes things
like having its own ->d_hash()/->d_compare() routines, or by having
its own directory permission function that doesn't just use the common
helpers. Having a ->d_revalidate() function will also have this issue.
Note that instead of making everything RCU safe you can also choose to
abort the RCU pathwalk if your operation cannot be done safely under
RCU, but that obviously comes with a performance penalty. One common
pattern is to allow the simple cases under RCU, and abort only if you
need to do something more complicated.
So not everything needs to be RCU-safe, and things like the inode etc
that the VFS itself maintains obviously already are. But these fixes
tend to be about properly RCU-delaying things like ->s_fs_info that
are maintained by the filesystem and that got potentially released too
early. - Linus ]
* tag 'pull-fixes.pathwalk-rcu-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
ext4_get_link(): fix breakage in RCU mode
cifs_get_link(): bail out in unsafe case
fuse: fix UAF in rcu pathwalks
procfs: make freeing proc_fs_info rcu-delayed
procfs: move dropping pde and pid from ->evict_inode() to ->free_inode()
nfs: fix UAF on pathwalk running into umount
nfs: make nfs_set_verifier() safe for use in RCU pathwalk
afs: fix __afs_break_callback() / afs_drop_open_mmap() race
hfsplus: switch to rcu-delayed unloading of nls and freeing ->s_fs_info
exfat: move freeing sbi, upcase table and dropping nls into rcu-delayed helper
affs: free affs_sb_info with kfree_rcu()
rcu pathwalk: prevent bogus hard errors from may_lookup()
fs/super.c: don't drop ->s_user_ns until we free struct super_block itself
and a fix for erofs failure exit breakage (had been there since
way back).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZdrkZAAKCRBZ7Krx/gZQ
67D8AP0eM68yZvbThA/Hb5iElDh3Aogt1AW/QAu9/alkDVHr+wD+PKqhamC8WXGk
b1QZ5AOHQFwzkzdF4738fdbujquBWQE=
=Ra0D
-----END PGP SIGNATURE-----
Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
"A couple of fixes - revert of regression from this cycle and a fix for
erofs failure exit breakage (had been there since way back)"
* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
erofs: fix handling kern_mount() failure
Revert "get rid of DCACHE_GENOCIDE"
The 'duplicates' bool argument is always true when efivar_init() is
called from its only caller so let's just drop it instead.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Al points out that kill_sb() will be called if efivarfs_fill_super()
fails and so there is no point in cleaning up the efivar entry list.
Reported-by: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Work around a quirk in a few old (2011-ish) UEFI implementations, where
a call to `GetNextVariableName` with a buffer size larger than 512 bytes
will always return EFI_INVALID_PARAMETER.
There is some lore around EFI variable names being up to 1024 bytes in
size, but this has no basis in the UEFI specification, and the upper
bounds are typically platform specific, and apply to the entire variable
(name plus payload).
Given that Linux does not permit creating files with names longer than
NAME_MAX (255) bytes, 512 bytes (== 256 UTF-16 characters) is a
reasonable limit.
Cc: <stable@vger.kernel.org> # 6.1+
Signed-off-by: Tim Schumacher <timschumi@gmx.de>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
1) errors from ext4_getblk() should not be propagated to caller
unless we are really sure that we would've gotten the same error
in non-RCU pathwalk.
2) we leak buffer_heads if ext4_getblk() is successful, but bh is
not uptodate.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
->d_revalidate() bails out there, anyway. It's not enough
to prevent getting into ->get_link() in RCU mode, but that
could happen only in a very contrieved setup. Not worth
trying to do anything fancy here unless ->d_revalidate()
stops kicking out of RCU mode at least in some cases.
Reviewed-by: Christian Brauner <brauner@kernel.org>
Acked-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
->permission(), ->get_link() and ->inode_get_acl() might dereference
->s_fs_info (and, in case of ->permission(), ->s_fs_info->fc->user_ns
as well) when called from rcu pathwalk.
Freeing ->s_fs_info->fc is rcu-delayed; we need to make freeing ->s_fs_info
and dropping ->user_ns rcu-delayed too.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
makes proc_pid_ns() safe from rcu pathwalk (put_pid_ns()
is still synchronous, but that's not a problem - it does
rcu-delay everything that needs to be)
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
that keeps both around until struct inode is freed, making access
to them safe from rcu-pathwalk
Acked-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
NFS ->d_revalidate(), ->permission() and ->get_link() need to access
some parts of nfs_server when called in RCU mode:
server->flags
server->caps
*(server->io_stats)
and, worst of all, call
server->nfs_client->rpc_ops->have_delegation
(the last one - as NFS_PROTO(inode)->have_delegation()). We really
don't want to RCU-delay the entire nfs_free_server() (it would have
to be done with schedule_work() from RCU callback, since it can't
be made to run from interrupt context), but actual freeing of
nfs_server and ->io_stats can be done via call_rcu() just fine.
nfs_client part is handled simply by making nfs_free_client() use
kfree_rcu().
Acked-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
nfs_set_verifier() relies upon dentry being pinned; if that's
the case, grabbing ->d_lock stabilizes ->d_parent and guarantees
that ->d_parent points to a positive dentry. For something
we'd run into in RCU mode that is *not* true - dentry might've
been through dentry_kill() just as we grabbed ->d_lock, with
its parent going through the same just as we get to into
nfs_set_verifier_locked(). It might get to detaching inode
(and zeroing ->d_inode) before nfs_set_verifier_locked() gets
to fetching that; we get an oops as the result.
That can happen in nfs{,4} ->d_revalidate(); the call chain in
question is nfs_set_verifier_locked() <- nfs_set_verifier() <-
nfs_lookup_revalidate_delegated() <- nfs{,4}_do_lookup_revalidate().
We have checked that the parent had been positive, but that's
done before we get to nfs_set_verifier() and it's possible for
memory pressure to pick our dentry as eviction candidate by that
time. If that happens, back-to-back attempts to kill dentry and
its parent are quite normal. Sure, in case of eviction we'll
fail the ->d_seq check in the caller, but we need to survive
until we return there...
Acked-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
In __afs_break_callback() we might check ->cb_nr_mmap and if it's non-zero
do queue_work(&vnode->cb_work). In afs_drop_open_mmap() we decrement
->cb_nr_mmap and do flush_work(&vnode->cb_work) if it reaches zero.
The trouble is, there's nothing to prevent __afs_break_callback() from
seeing ->cb_nr_mmap before the decrement and do queue_work() after both
the decrement and flush_work(). If that happens, we might be in trouble -
vnode might get freed before the queued work runs.
__afs_break_callback() is always done under ->cb_lock, so let's make
sure that ->cb_nr_mmap can change from non-zero to zero while holding
->cb_lock (the spinlock component of it - it's a seqlock and we don't
need to mess with the counter).
Acked-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
->d_hash() and ->d_compare() use those, so we need to delay freeing
them.
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
That stuff can be accessed by ->d_hash()/->d_compare(); as it is, we have
a hard-to-hit UAF if rcu pathwalk manages to get into ->d_hash() on a filesystem
that is in process of getting shut down.
Besides, having nls and upcase table cleanup moved from ->put_super() towards
the place where sbi is freed makes for simpler failure exits.
Acked-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
one of the flags in it is used by ->d_hash()/->d_compare()
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
If lazy call of ->permission() returns a hard error, check that
try_to_unlazy() succeeds before returning it. That both makes
life easier for ->permission() instances and closes the race
in ENOTDIR handling - it is possible that positive d_can_lookup()
seen in link_path_walk() applies to the state *after* unlink() +
mkdir(), while nd->inode matches the state prior to that.
Normally seeing e.g. EACCES from permission check in rcu pathwalk
means that with some timings non-rcu pathwalk would've run into
the same; however, running into a non-executable regular file
in the middle of a pathname would not get to permission check -
it would fail with ENOTDIR instead.
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Avoids fun races in RCU pathwalk... Same goes for freeing LSM shite
hanging off super_block's arse.
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
check_snapshot() copies the bch_snapshot to a temporary to easily handle
older versions that don't have all the fields of the current version,
but it lacked a min() to correctly handle keys newer and larger than the
current version.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
If a journal write errored, the list of devices it was written to could
be empty - we're not supposed to mark an empty replicas list.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
bch2_direct_IO_read() checks the request offset and size for sector
alignment and then falls through to a couple calculations to shrink
the size of the request based on the inode size. The problem is that
these checks round up to the fs block size, which runs the risk of
underflowing iter->count if the block size happens to be large
enough. This is triggered by fstest generic/361 with a 4k block
size, which subsequently leads to a crash. To avoid this crash,
check that the shorten length doesn't exceed the overall length of
the iter.
Fixes:
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Su Yue <glass.su@suse.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
If we're in FILTER_SNAPSHOTS mode and we start scanning a range of the
keyspace where no keys are visible in the current snapshot, we have a
problem - we'll scan for a very long time before scanning terminates.
Awhile back, this was fixed for most cases with peek_upto() (and
assertions that enforce that it's being used).
But the fix missed the fact that the inodes btree is different - every
key offset is in a different snapshot tree, not just the inode field.
Fixes:
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Recently, we fixed our __GFP_NOFAIL usage in the readahead path, but the
easy one in read_single_folio() (where wa can return an error) was
missed - oops.
Fixes:
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>