IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
The macro is unused, remove it to tame gcc warning:
fs/nfsd/nfs3proc.c:702:0: warning: macro "nfsd3_fhandleres" is not used
[-Wunused-macros]
Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: linux-nfs@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Have the NFSD encoders annotate the boundaries of every
direct-data-placement eligible result data payload. Then change
svcrdma to use that annotation instead of the xdr->page_len
when handling Write chunks.
For NFSv4 on RDMA, that enables the ability to recognize multiple
result payloads per compound. This is a pre-requisite for supporting
multiple Write chunks per RPC transaction.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Clean up: "result payload" is a less confusing name for these
payloads. "READ payload" reflects only the NFS usage.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
If the flexfiles mirroring is enabled, then the read code expects to be
able to set pgio->pg_mirror_idx to point to the data server that is
being used for this particular read. However it does not change the
pg_mirror_count because we only need to send a single read.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
- revert efivarfs kmemleak fix again - it was a false positive;
- make CONFIG_EFI_EARLYCON depend on CONFIG_EFI explicitly so it does not
pull in other dependencies unnecessarily if CONFIG_EFI is not set
- defer attempts to load SSDT overrides from EFI vars until after the
efivar layer is up.
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE+9lifEBpyUIVN1cpw08iOZLZjyQFAl+/i/UACgkQw08iOZLZ
jyRb+Qv/RVQSvZW+u6MrYPZthmVxnNQ4pFHKAjibD3h9isbrWdq6AETtNghUGWQr
nr1WeOj4Qa2aDe4z63Sra3QNsdLnSn0FsHuJPD8rozMd4N4jiChtpLukdShibXXz
25yfs7KpNyqwj3QnFd2LpJBXGqzdoKFrzWbnWSnFEMrSfkqptBhogslhVxzal8Uz
4hUyGhe/iBfgU720uoVCmofPpYxqV/cndEmsnA8rZnW5yTPXIn6f9c4KUOcFxgLS
LchxZYRd9GZoQ4Yt40ih9JX1ILZNhrhXh96cfNuUiVwnf9Lg7xSOGIX3+e3PR9Lz
1VI4UuA7HM8qDZx+N5iiBRrBIgtdtMgKLEip3n+x84/7P/p26HXe3OJCPBWh0Q0q
aWCcV8qiwju6VYK6dv4Gz+2/OYiSKXrXZCx57dnCr7tv5srYenwsvCCYa9wLTpMW
16/DRmkHcfbAdfS38Bhs3qY/zK+He7XE/sPJao/NuVwoJ3Nu0dA4LR7JFG7TVjKm
bepO3A8W
=gvBz
-----END PGP SIGNATURE-----
Merge tag 'efi-urgent-for-v5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI fixes from Borislav Petkov:
"More EFI fixes forwarded from Ard Biesheuvel:
- revert efivarfs kmemleak fix again - it was a false positive
- make CONFIG_EFI_EARLYCON depend on CONFIG_EFI explicitly so it does
not pull in other dependencies unnecessarily if CONFIG_EFI is not
set
- defer attempts to load SSDT overrides from EFI vars until after the
efivar layer is up"
* tag 'efi-urgent-for-v5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi: EFI_EARLYCON should depend on EFI
efivarfs: revert "fix memory leak in efivarfs_create()"
efi/efivars: Set generic ops before loading SSDT
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAl/BGBQACgkQxWXV+ddt
WDtRYQ/+IUGjJ4l6MyL3PwLgTVabKsSpm2R3y3M/0tVJ0FIhDXYbkpjB2CkpIdcH
jUvHnRL4H59hwG6rwlXU2oC298FNbbrLGIU+c9DR50RuyQCDGnT02XvxwfDIEFzp
WLNZ/CAhRkm6boj//70lV26BpeXT59KMYwNixfCNTXq9Ir0qYHHCGg4cEBQLS++2
JUU8XVTLURIYiFOLbwmABI7V43OgDhdORIr+qnR8xjCUyhusZsjVVbvIdW3BDi/S
wK7NJsfuqgsF0zD9URJjpwTFiJL9SvBLWR8JnM9NiLW3ZbkGBL+efL6mdWuH7534
gruJRS2zYPMO2/Kpjy/31CWLap3PUSD3i8cKF+uo3liojWuSUhN8kfvcNxJVd5se
NkEK+4zOjsDIVbv7gcjThSv4KTnOUO/XfN9TWUMuduaMBmGQNaQut1FpGV98utiK
yW6x8xqcR4SI+lqY6ILqCK+qUHf19BLSsuyzZdTIontKKRA9F9hY8a4XTZuzTWml
BGYmFGP640vOo8C9GjrQfpAwa7CB/DnF/cg1AAmuZ8vrEm9zYjmauFKK8ZPcveA3
KGrnmIlYjhAIX16oRbfwOgj9D2xa1loBzJyHQHByvCMXGVFBnqRTRANRHFrdQWJB
qh9+J4EJcUXPE9WGHxAW/g9vpFkV7IRABHs7aUB8zApxI9nGA0Q=
=kcxn
-----END PGP SIGNATURE-----
Merge tag 'for-5.10-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"A few fixes for various warnings that accumulated over past two weeks:
- tree-checker: add missing return values for some errors
- lockdep fixes
- when reading qgroup config and starting quota rescan
- reverse order of quota ioctl lock and VFS freeze lock
- avoid accessing potentially stale fs info during device scan,
reported by syzbot
- add scope NOFS protection around qgroup relation changes
- check for running transaction before flushing qgroups
- fix tracking of new delalloc ranges for some cases"
* tag 'for-5.10-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix lockdep splat when enabling and disabling qgroups
btrfs: do nofs allocations when adding and removing qgroup relations
btrfs: fix lockdep splat when reading qgroup config on mount
btrfs: tree-checker: add missing returns after data_ref alignment checks
btrfs: don't access possibly stale fs_info data for printing duplicate device
btrfs: tree-checker: add missing return after error in root_item
btrfs: qgroup: don't commit transaction when we already hold the handle
btrfs: fix missing delalloc new bit for new delalloc ranges
Commit 20f829999c38 ("gfs2: Rework read and page fault locking") lifted
the glock lock taking from the low-level ->readpage and ->readahead
address space operations to the higher-level ->read_iter file and
->fault vm operations. The glocks are still taken in LM_ST_SHARED mode
only. On filesystems mounted without the noatime option, ->read_iter
sometimes needs to update the atime as well, though. Right now, this
leads to a failed locking mode assertion in gfs2_dirty_inode.
Fix that by introducing a new update_time inode operation. There, if
the glock is held non-exclusively, upgrade it to an exclusive lock.
Reported-by: Alexander Aring <aahringo@redhat.com>
Fixes: 20f829999c38 ("gfs2: Rework read and page fault locking")
Cc: stable@vger.kernel.org # v5.8+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
When one task is in io_uring_cancel_files() and another is doing
io_prep_async_work() a race may happen. That's because after accounting
a request inflight in first call to io_grab_identity() it still may fail
and go to io_identity_cow(), which migh briefly keep dangling
work.identity and not only.
Grab files last, so io_prep_async_work() won't fail if it did get into
->inflight_list.
note: the bug shouldn't exist after making io_uring_cancel_files() not
poking into other tasks' requests.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
GFS2's freeze/thaw mechanism uses a special freeze glock to control its
operation. It does this with a sync glock operation (glops.c) called
freeze_go_sync. When the freeze glock is demoted (glock's do_xmote) the
glops function causes the file system to be frozen. This is intended. However,
GFS2's mount and unmount processes also hold the freeze glock to prevent other
processes, perhaps on different cluster nodes, from mounting the frozen file
system in read-write mode.
Before this patch, there was no check in freeze_go_sync for whether a freeze
in intended or whether the glock demote was caused by a normal unmount.
So it was trying to freeze the file system it's trying to unmount, which
ends up in a deadlock.
This patch adds an additional check to freeze_go_sync so that demotes of the
freeze glock are ignored if they come from the unmount process.
Fixes: 20b329129009 ("gfs2: Fix regression in freeze_go_sync")
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
If gfs2 tries to mount a (corrupt) file system that has no resource
groups it still tries to set preferences on the first one, which causes
a kernel null pointer dereference. This patch adds a check to function
gfs2_ri_update so this condition is detected and reported back as an
error.
Reported-by: syzbot+e3f23ce40269a4c9053a@syzkaller.appspotmail.com
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
The memory leak addressed by commit fe5186cf12e3 is a false positive:
all allocations are recorded in a linked list, and freed when the
filesystem is unmounted. This leads to double frees, and as reported
by David, leads to crashes if SLUB is configured to self destruct when
double frees occur.
So drop the redundant kfree() again, and instead, mark the offending
pointer variable so the allocation is ignored by kmemleak.
Cc: Vamshi K Sthambamkadi <vamshi.k.sthambamkadi@gmail.com>
Fixes: fe5186cf12e3 ("efivarfs: fix memory leak in efivarfs_create()")
Reported-by: David Laight <David.Laight@aculab.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAl+9eV4ACgkQiiy9cAdy
T1EXOAv+JF6QZiwB6TELiusDLw+6UWpT7CcT1guL+eSrFVYEPIhqF4V4QXY0oA0Y
F8jTpHxEx8wdECAPPNGHh/a4E+Y1vV/W8Nv5DkglAwjeXAD2Y84VAp8hH890jnn0
M8I9qdnbfSodRueshpKScRPHbfp4Smlz1BR9R0syk7T7TmCy8aKNwYN1lBy5Nf9f
ICMn1F5e9z4nX43NJIwzO+NSPehtLm8ULFZER/pQ+tGDhwXTdFc9HPzfu0ZoYbEO
zADjmY4PItVYRINnWBntEBLYcAFeAB0finPTP2kCfXfRDF5cPgElp84F3Uro7se5
bioboePO+bUS0jigIiP3qZ7zTHEdJoICsiJzVGmDZYsawK3MAwamp2EH3axAr4B/
h4LULgN7nCatPW5lMo3/3EPZXVbXVTOYIB2REtqJugK8USQ9+v9SMLNT/qWn0GE5
bzZoZ22wkHEOn4EIxYSCX4tgj9cJ2v9B/0NMpTQLTECKBQi3iV32GxLglvMwZru6
eWKL5tZj
=lxdD
-----END PGP SIGNATURE-----
Merge tag '5.10-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
"Four smb3 fixes for stable: one fixes a memleak, the other three
address a problem found with decryption offload that can cause a use
after free"
* tag '5.10-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb3: Handle error case during offload read path
smb3: Avoid Mid pending list corruption
smb3: Call cifs reconnect from demultiplex thread
cifs: fix a memleak with modefromsid
The stated reasons for separating fscrypt_master_key::mk_secret_sem from
the standard semaphore contained in every 'struct key' no longer apply.
First, due to commit a992b20cd4ee ("fscrypt: add
fscrypt_prepare_new_inode() and fscrypt_set_context()"),
fscrypt_get_encryption_info() is no longer called from within a
filesystem transaction.
Second, due to commit d3ec10aa9581 ("KEYS: Don't write out to userspace
while holding key semaphore"), the semaphore for the "keyring" key type
no longer ranks above page faults.
That leaves performance as the only possible reason to keep the separate
mk_secret_sem. Specifically, having mk_secret_sem reduces the
contention between setup_file_encryption_key() and
FS_IOC_{ADD,REMOVE}_ENCRYPTION_KEY. However, these ioctls aren't
executed often, so this doesn't seem to be worth the extra complexity.
Therefore, simplify the locking design by just using key->sem instead of
mk_secret_sem.
Link: https://lore.kernel.org/r/20201117032626.320275-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
In an encrypted directory, a regular dentry (one that doesn't have the
no-key name flag) can only be created if the directory's encryption key
is available.
Therefore the calls to fscrypt_require_key() in __fscrypt_prepare_link()
and __fscrypt_prepare_rename() are unnecessary, as these functions
already check that the dentries they're given aren't no-key names.
Remove these unnecessary calls to fscrypt_require_key().
Link: https://lore.kernel.org/r/20201118075609.120337-6-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
As described in "fscrypt: add fscrypt_is_nokey_name()", it's possible to
create a duplicate filename in an encrypted directory by creating a file
concurrently with adding the directory's encryption key.
Fix this bug on ubifs by rejecting no-key dentries in ubifs_create(),
ubifs_mkdir(), ubifs_mknod(), and ubifs_symlink().
Note that ubifs doesn't actually report the duplicate filenames from
readdir, but rather it seems to replace the original dentry with a new
one (which is still wrong, just a different effect from ext4).
On ubifs, this fixes xfstest generic/595 as well as the new xfstest I
wrote specifically for this bug.
Fixes: f4f61d2cc6d8 ("ubifs: Implement encrypted filenames")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20201118075609.120337-5-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
As described in "fscrypt: add fscrypt_is_nokey_name()", it's possible to
create a duplicate filename in an encrypted directory by creating a file
concurrently with adding the directory's encryption key.
Fix this bug on f2fs by rejecting no-key dentries in f2fs_add_link().
Note that the weird check for the current task in f2fs_do_add_link()
seems to make this bug difficult to reproduce on f2fs.
Fixes: 9ea97163c6da ("f2fs crypto: add filename encryption for f2fs_add_link")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20201118075609.120337-4-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
As described in "fscrypt: add fscrypt_is_nokey_name()", it's possible to
create a duplicate filename in an encrypted directory by creating a file
concurrently with adding the directory's encryption key.
Fix this bug on ext4 by rejecting no-key dentries in ext4_add_entry().
Note that the duplicate check in ext4_find_dest_de() sometimes prevented
this bug. However in many cases it didn't, since ext4_find_dest_de()
doesn't examine every dentry.
Fixes: 4461471107b7 ("ext4 crypto: enable filename encryption")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20201118075609.120337-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
It's possible to create a duplicate filename in an encrypted directory
by creating a file concurrently with adding the encryption key.
Specifically, sys_open(O_CREAT) (or sys_mkdir(), sys_mknod(), or
sys_symlink()) can lookup the target filename while the directory's
encryption key hasn't been added yet, resulting in a negative no-key
dentry. The VFS then calls ->create() (or ->mkdir(), ->mknod(), or
->symlink()) because the dentry is negative. Normally, ->create() would
return -ENOKEY due to the directory's key being unavailable. However,
if the key was added between the dentry lookup and ->create(), then the
filesystem will go ahead and try to create the file.
If the target filename happens to already exist as a normal name (not a
no-key name), a duplicate filename may be added to the directory.
In order to fix this, we need to fix the filesystems to prevent
->create(), ->mkdir(), ->mknod(), and ->symlink() on no-key names.
(->rename() and ->link() need it too, but those are already handled
correctly by fscrypt_prepare_rename() and fscrypt_prepare_link().)
In preparation for this, add a helper function fscrypt_is_nokey_name()
that filesystems can use to do this check. Use this helper function for
the existing checks that fs/crypto/ does for rename and link.
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20201118075609.120337-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
This patch introduce a new globs attribute to define the subclass of the
glock lockref spinlock. This avoid the following lockdep warning, which
occurs when we lock an inode lock while an iopen lock is held:
============================================
WARNING: possible recursive locking detected
5.10.0-rc3+ #4990 Not tainted
--------------------------------------------
kworker/0:1/12 is trying to acquire lock:
ffff9067d45672d8 (&gl->gl_lockref.lock){+.+.}-{3:3}, at: lockref_get+0x9/0x20
but task is already holding lock:
ffff9067da308588 (&gl->gl_lockref.lock){+.+.}-{3:3}, at: delete_work_func+0x164/0x260
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&gl->gl_lockref.lock);
lock(&gl->gl_lockref.lock);
*** DEADLOCK ***
May be due to missing lock nesting notation
3 locks held by kworker/0:1/12:
#0: ffff9067c1bfdd38 ((wq_completion)delete_workqueue){+.+.}-{0:0}, at: process_one_work+0x1b7/0x540
#1: ffffac594006be70 ((work_completion)(&(&gl->gl_delete)->work)){+.+.}-{0:0}, at: process_one_work+0x1b7/0x540
#2: ffff9067da308588 (&gl->gl_lockref.lock){+.+.}-{3:3}, at: delete_work_func+0x164/0x260
stack backtrace:
CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.10.0-rc3+ #4990
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
Workqueue: delete_workqueue delete_work_func
Call Trace:
dump_stack+0x8b/0xb0
__lock_acquire.cold+0x19e/0x2e3
lock_acquire+0x150/0x410
? lockref_get+0x9/0x20
_raw_spin_lock+0x27/0x40
? lockref_get+0x9/0x20
lockref_get+0x9/0x20
delete_work_func+0x188/0x260
process_one_work+0x237/0x540
worker_thread+0x4d/0x3b0
? process_one_work+0x540/0x540
kthread+0x127/0x140
? __kthread_bind_mask+0x60/0x60
ret_from_fork+0x22/0x30
Suggested-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Commit 0e539ca1bbbe ("gfs2: Fix NULL pointer dereference in gfs2_rgrp_dump")
introduced additional locking in gfs2_rgrp_go_dump, which is also used for
dumping resource group glocks via debugfs. However, on that code path, the
glock spin lock is already taken in dump_glock, and taking it again in
gfs2_glock2rgrp leads to deadlock. This can be reproduced with:
$ mkfs.gfs2 -O -p lock_nolock /dev/FOO
$ mount /dev/FOO /mnt/foo
$ touch /mnt/foo/bar
$ cat /sys/kernel/debug/gfs2/FOO/glocks
Fix that by not taking the glock spin lock inside the go_dump callback.
Fixes: 0e539ca1bbbe ("gfs2: Fix NULL pointer dereference in gfs2_rgrp_dump")
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
iov_iter::type is a bitmask that also keeps direction etc., so it
shouldn't be directly compared against ITER_*. Use proper helper.
Fixes: ff6165b2d7f6 ("io_uring: retain iov_iter state over io_read/io_write calls")
Reported-by: David Howells <dhowells@redhat.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Cc: <stable@vger.kernel.org> # 5.9
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Abaci Fuzz reported a shift-out-of-bounds BUG in io_uring_create():
[ 59.598207] UBSAN: shift-out-of-bounds in ./include/linux/log2.h:57:13
[ 59.599665] shift exponent 64 is too large for 64-bit type 'long unsigned int'
[ 59.601230] CPU: 0 PID: 963 Comm: a.out Not tainted 5.10.0-rc4+ #3
[ 59.602502] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 59.603673] Call Trace:
[ 59.604286] dump_stack+0x107/0x163
[ 59.605237] ubsan_epilogue+0xb/0x5a
[ 59.606094] __ubsan_handle_shift_out_of_bounds.cold+0xb2/0x20e
[ 59.607335] ? lock_downgrade+0x6c0/0x6c0
[ 59.608182] ? rcu_read_lock_sched_held+0xaf/0xe0
[ 59.609166] io_uring_create.cold+0x99/0x149
[ 59.610114] io_uring_setup+0xd6/0x140
[ 59.610975] ? io_uring_create+0x2510/0x2510
[ 59.611945] ? lockdep_hardirqs_on_prepare+0x286/0x400
[ 59.613007] ? syscall_enter_from_user_mode+0x27/0x80
[ 59.614038] ? trace_hardirqs_on+0x5b/0x180
[ 59.615056] do_syscall_64+0x2d/0x40
[ 59.615940] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 59.617007] RIP: 0033:0x7f2bb8a0b239
This is caused by roundup_pow_of_two() if the input entries larger
enough, e.g. 2^32-1. For sq_entries, it will check first and we allow
at most IORING_MAX_ENTRIES, so it is okay. But for cq_entries, we do
round up first, that may overflow and truncate it to 0, which is not
the expected behavior. So check the cq size first and then do round up.
Fixes: 88ec3211e463 ("io_uring: round-up cq size before comparing with rounded sq size")
Reported-by: Abaci Fuzz <abaci@linux.alibaba.com>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Although it isn't used directly by the ioctls,
"struct fsverity_descriptor" is required by userspace programs that need
to compute fs-verity file digests in a standalone way. Therefore
it's also needed to sign files in a standalone way.
Similarly, "struct fsverity_formatted_digest" (previously called
"struct fsverity_signed_digest" which was misleading) is also needed to
sign files if the built-in signature verification is being used.
Therefore, move these structs to the UAPI header.
While doing this, try to make it clear that the signature-related fields
in fsverity_descriptor aren't used in the file digest computation.
Acked-by: Luca Boccassi <luca.boccassi@microsoft.com>
Link: https://lore.kernel.org/r/20201113211918.71883-5-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
When adding or removing a qgroup relation we are doing a GFP_KERNEL
allocation which is not safe because we are holding a transaction
handle open and that can make us deadlock if the allocator needs to
recurse into the filesystem. So just surround those calls with a
nofs context.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There are sectorsize alignment checks that are reported but then
check_extent_data_ref continues. This was not intended, wrong alignment
is not a minor problem and we should return with error.
CC: stable@vger.kernel.org # 5.4+
Fixes: 0785a9aacf9d ("btrfs: tree-checker: Add EXTENT_DATA_REF check")
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Syzbot reported a possible use-after-free when printing a duplicate device
warning device_list_add().
At this point it can happen that a btrfs_device::fs_info is not correctly
setup yet, so we're accessing stale data, when printing the warning
message using the btrfs_printk() wrappers.
==================================================================
BUG: KASAN: use-after-free in btrfs_printk+0x3eb/0x435 fs/btrfs/super.c:245
Read of size 8 at addr ffff8880878e06a8 by task syz-executor225/7068
CPU: 1 PID: 7068 Comm: syz-executor225 Not tainted 5.9.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1d6/0x29e lib/dump_stack.c:118
print_address_description+0x66/0x620 mm/kasan/report.c:383
__kasan_report mm/kasan/report.c:513 [inline]
kasan_report+0x132/0x1d0 mm/kasan/report.c:530
btrfs_printk+0x3eb/0x435 fs/btrfs/super.c:245
device_list_add+0x1a88/0x1d60 fs/btrfs/volumes.c:943
btrfs_scan_one_device+0x196/0x490 fs/btrfs/volumes.c:1359
btrfs_mount_root+0x48f/0xb60 fs/btrfs/super.c:1634
legacy_get_tree+0xea/0x180 fs/fs_context.c:592
vfs_get_tree+0x88/0x270 fs/super.c:1547
fc_mount fs/namespace.c:978 [inline]
vfs_kern_mount+0xc9/0x160 fs/namespace.c:1008
btrfs_mount+0x33c/0xae0 fs/btrfs/super.c:1732
legacy_get_tree+0xea/0x180 fs/fs_context.c:592
vfs_get_tree+0x88/0x270 fs/super.c:1547
do_new_mount fs/namespace.c:2875 [inline]
path_mount+0x179d/0x29e0 fs/namespace.c:3192
do_mount fs/namespace.c:3205 [inline]
__do_sys_mount fs/namespace.c:3413 [inline]
__se_sys_mount+0x126/0x180 fs/namespace.c:3390
do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x44840a
RSP: 002b:00007ffedfffd608 EFLAGS: 00000293 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 00007ffedfffd670 RCX: 000000000044840a
RDX: 0000000020000000 RSI: 0000000020000100 RDI: 00007ffedfffd630
RBP: 00007ffedfffd630 R08: 00007ffedfffd670 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000293 R12: 000000000000001a
R13: 0000000000000004 R14: 0000000000000003 R15: 0000000000000003
Allocated by task 6945:
kasan_save_stack mm/kasan/common.c:48 [inline]
kasan_set_track mm/kasan/common.c:56 [inline]
__kasan_kmalloc+0x100/0x130 mm/kasan/common.c:461
kmalloc_node include/linux/slab.h:577 [inline]
kvmalloc_node+0x81/0x110 mm/util.c:574
kvmalloc include/linux/mm.h:757 [inline]
kvzalloc include/linux/mm.h:765 [inline]
btrfs_mount_root+0xd0/0xb60 fs/btrfs/super.c:1613
legacy_get_tree+0xea/0x180 fs/fs_context.c:592
vfs_get_tree+0x88/0x270 fs/super.c:1547
fc_mount fs/namespace.c:978 [inline]
vfs_kern_mount+0xc9/0x160 fs/namespace.c:1008
btrfs_mount+0x33c/0xae0 fs/btrfs/super.c:1732
legacy_get_tree+0xea/0x180 fs/fs_context.c:592
vfs_get_tree+0x88/0x270 fs/super.c:1547
do_new_mount fs/namespace.c:2875 [inline]
path_mount+0x179d/0x29e0 fs/namespace.c:3192
do_mount fs/namespace.c:3205 [inline]
__do_sys_mount fs/namespace.c:3413 [inline]
__se_sys_mount+0x126/0x180 fs/namespace.c:3390
do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Freed by task 6945:
kasan_save_stack mm/kasan/common.c:48 [inline]
kasan_set_track+0x3d/0x70 mm/kasan/common.c:56
kasan_set_free_info+0x17/0x30 mm/kasan/generic.c:355
__kasan_slab_free+0xdd/0x110 mm/kasan/common.c:422
__cache_free mm/slab.c:3418 [inline]
kfree+0x113/0x200 mm/slab.c:3756
deactivate_locked_super+0xa7/0xf0 fs/super.c:335
btrfs_mount_root+0x72b/0xb60 fs/btrfs/super.c:1678
legacy_get_tree+0xea/0x180 fs/fs_context.c:592
vfs_get_tree+0x88/0x270 fs/super.c:1547
fc_mount fs/namespace.c:978 [inline]
vfs_kern_mount+0xc9/0x160 fs/namespace.c:1008
btrfs_mount+0x33c/0xae0 fs/btrfs/super.c:1732
legacy_get_tree+0xea/0x180 fs/fs_context.c:592
vfs_get_tree+0x88/0x270 fs/super.c:1547
do_new_mount fs/namespace.c:2875 [inline]
path_mount+0x179d/0x29e0 fs/namespace.c:3192
do_mount fs/namespace.c:3205 [inline]
__do_sys_mount fs/namespace.c:3413 [inline]
__se_sys_mount+0x126/0x180 fs/namespace.c:3390
do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
The buggy address belongs to the object at ffff8880878e0000
which belongs to the cache kmalloc-16k of size 16384
The buggy address is located 1704 bytes inside of
16384-byte region [ffff8880878e0000, ffff8880878e4000)
The buggy address belongs to the page:
page:0000000060704f30 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x878e0
head:0000000060704f30 order:3 compound_mapcount:0 compound_pincount:0
flags: 0xfffe0000010200(slab|head)
raw: 00fffe0000010200 ffffea00028e9a08 ffffea00021e3608 ffff8880aa440b00
raw: 0000000000000000 ffff8880878e0000 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff8880878e0580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8880878e0600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff8880878e0680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8880878e0700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8880878e0780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
The syzkaller reproducer for this use-after-free crafts a filesystem image
and loop mounts it twice in a loop. The mount will fail as the crafted
image has an invalid chunk tree. When this happens btrfs_mount_root() will
call deactivate_locked_super(), which then cleans up fs_info and
fs_info::sb. If a second thread now adds the same block-device to the
filesystem, it will get detected as a duplicate device and
device_list_add() will reject the duplicate and print a warning. But as
the fs_info pointer passed in is non-NULL this will result in a
use-after-free.
Instead of printing possibly uninitialized or already freed memory in
btrfs_printk(), explicitly pass in a NULL fs_info so the printing of the
device name will be skipped altogether.
There was a slightly different approach discussed in
https://lore.kernel.org/linux-btrfs/20200114060920.4527-1-anand.jain@oracle.com/t/#u
Link: https://lore.kernel.org/linux-btrfs/000000000000c9e14b05afcc41ba@google.com
Reported-by: syzbot+582e66e5edf36a22c7b0@syzkaller.appspotmail.com
CC: stable@vger.kernel.org # 4.19+
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This adds support for the shutdown(2) system call, which is useful for
dealing with sockets.
shutdown(2) may block, so we have to punt it to async context.
Suggested-by: Norman Maurer <norman.maurer@googlemail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
CAP_SYS_ADMIN is too restrictive for a lot of uses cases, allow
CAP_SYS_NICE based on the premise that such users are already allowed
to raise the priority of tasks.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
- fix memory leak in efivarfs driver
- fix HYP mode issue in 32-bit ARM version of the EFI stub when built in
Thumb2 mode
- avoid leaking EFI pgd pages on allocation failure
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEnNKg2mrY9zMBdeK7wjcgfpV0+n0FAl+tRcoACgkQwjcgfpV0
+n1LZQf/af1p9A0zT1nC3IrcaABceJDnD3dJuV9SD6QhFuD2Dw/Mshr+MVzDsO+3
btvuu0r4CzQ5ajfpfcGcvBFFWbbPTwKvWQe++9Unwoz5acw7hpV5yxNwMivdaJEh
3o4pkgpCmwtliTwiroDficO7Vlqefqf4LZd7/iQYQTnuPK3waYQBjwp9t2D9tlx7
kXiEQDP2BDRCUrKEjlR7AHTZ156mw+UsiquAuxMCGTKBqwiELEEV6aPseqa5MmNV
RDV1IXWdhOQyQfzg0s6vTzwGeN0JubSxHng6O9UbE+tctz4EqaaHIEsRuMBq8oLD
Y8JeGp1ovypTJxeLE5t6eEzsfvTRsg==
=GnmM
-----END PGP SIGNATURE-----
Merge tag 'efi-urgent-for-v5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI fixes from Borislav Petkov:
"Forwarded EFI fixes from Ard Biesheuvel:
- fix memory leak in efivarfs driver
- fix HYP mode issue in 32-bit ARM version of the EFI stub when built
in Thumb2 mode
- avoid leaking EFI pgd pages on allocation failure"
* tag 'efi-urgent-for-v5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi/x86: Free efi_pgd with free_pages()
efivarfs: fix memory leak in efivarfs_create()
efi/arm: set HSCTLR Thumb2 bit correctly for HVC calls from HYP
Merge misc fixes from Andrew Morton:
"8 patches.
Subsystems affected by this patch series: mm (madvise, pagemap,
readahead, memcg, userfaultfd), kbuild, and vfs"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm: fix madvise WILLNEED performance problem
libfs: fix error cast of negative value in simple_attr_write()
mm/userfaultfd: do not access vma->vm_mm after calling handle_userfault()
mm: memcg/slab: fix root memcg vmstats
mm: fix readahead_page_batch for retry entries
mm: fix phys_to_target_node() and memory_add_physaddr_to_nid() exports
compiler-clang: remove version check for BPF Tracing
mm/madvise: fix memory leak from process_madvise
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAl+6tkoACgkQ8vlZVpUN
gaO1xgf/aJ5chEWFEQVrdSdd+cvhuILz1Hp2iW8xdZgPeN2ovWIC3LPCTDr9FWB0
MhpGS9avYIf8mHZgsw7HVzqUv6gPcT0khragPp348QJzxnbz/saZ5ujK/WR2zJxr
SoB9f2vdqW0gBbKMO6avXm0gTnuNemcK5oH6tzI5ECBpV3Ltk1dJWtgQkVp9rAyP
EFEb9hUYpdZ3J1cm8SCUIO99Tu2KMd+yNRv42z0BKNTfBNe2P5aG56p1sMYMcIr7
BiVUrhPkbAf3gMsMDzZQE5mHZHyzJoHNHssLHWcEU/o9Wd2wZHjAEzzn9Tz49rUg
yhYTrhLcQJcfmL0XvgrIhJaXTfMc6w==
=s/1A
-----END PGP SIGNATURE-----
Merge tag 'ext4_for_linus_fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"A final set of miscellaneous bug fixes for ext4"
* tag 'ext4_for_linus_fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fix bogus warning in ext4_update_dx_flag()
jbd2: fix kernel-doc markups
ext4: drop fast_commit from /proc/mounts
When doing a lookup in a directory, the afs filesystem uses a bulk
status fetch to speculatively retrieve the statuses of up to 48 other
vnodes found in the same directory and it will then either update extant
inodes or create new ones - effectively doing 'lookup ahead'.
To avoid the possibility of deadlocking itself, however, the filesystem
doesn't lock all of those inodes; rather just the directory inode is
locked (by the VFS).
When the operation completes, afs_inode_init_from_status() or
afs_apply_status() is called, depending on whether the inode already
exists, to commit the new status.
A case exists, however, where the speculative status fetch operation may
straddle a modification operation on one of those vnodes. What can then
happen is that the speculative bulk status RPC retrieves the old status,
and whilst that is happening, the modification happens - which returns
an updated status, then the modification status is committed, then we
attempt to commit the speculative status.
This results in something like the following being seen in dmesg:
kAFS: vnode modified {100058:861} 8->9 YFS.InlineBulkStatus
showing that for vnode 861 on volume 100058, we saw YFS.InlineBulkStatus
say that the vnode had data version 8 when we'd already recorded version
9 due to a local modification. This was causing the cache to be
invalidated for that vnode when it shouldn't have been. If it happens
on a data file, this might lead to local changes being lost.
Fix this by ignoring speculative status updates if the data version
doesn't match the expected value.
Note that it is possible to get a DV regression if a volume gets
restored from a backup - but we should get a callback break in such a
case that should trigger a recheck anyway. It might be worth checking
the volume creation time in the volsync info and, if a change is
observed in that (as would happen on a restore), invalidate all caches
associated with the volume.
Fixes: 5cf9dd55a0ec ("afs: Prospectively look up extra files when doing a single lookup")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The attr->set() receive a value of u64, but simple_strtoll() is used for
doing the conversion. It will lead to the error cast if user inputs a
negative value.
Use kstrtoull() instead of simple_strtoll() to convert a string got from
the user to an unsigned value. The former will return '-EINVAL' if it
gets a negetive value, but the latter can't handle the situation
correctly. Make 'val' unsigned long long as what kstrtoull() takes,
this will eliminate the compile warning on no 64-bit architectures.
Fixes: f7b88631a897 ("fs/libfs.c: fix simple_attr_write() on 32bit machines")
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lkml.kernel.org/r/1605341356-11872-1-git-send-email-yangyicong@hisilicon.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Fix various deficiencies in online fsck's metadata checking code.
- Fix an integer casting bug in the xattr code on 32-bit systems.
- Fix a hang in an inode walk when the inode index is corrupt.
- Fix error codes being dropped when initializing per-AG structures
- Fix nowait directio writes that partially succeed but return EAGAIN.
- Revert last week's rmap comparison patch because it was wrong.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAl+38UwACgkQ+H93GTRK
tOtMFQ/9EV2/I673TJj8GY5gJE9mUXGbiUyMedl8XammhNPL0JZMAGsnMXBWCtjO
0pmO+TS4epwlYpZ4XVLlxNUxUwDkxgq3nHKbEz2jezepKPTCasL7XZOECGQ1gKdA
11BRNP7Y91ndYtVZGxHu+92oeZAzJgTh6OVYtJytniTgF9r96hgr/+3dA8GQxkqm
bkkfWfKxxCwMYLRLRNcnVbkj0xDMgmKOILyFR63ZhW8RtrfmdIUYDUty7RGvj4bJ
csZmrkcu/wIj+9NeXw8KS5KpNOWu2q3baORXe6EodoVgFMa4I11kiuGucZehsIbH
yNgTLDaFNUm1aBCkSrYtz7m4iwLq8No7XB/OIXrALSd5yJqaXhDyMnEV/tBeAL7D
MXn032Sc6hPSyGBtCmurTSo61oKP3HjgMXA4vvNw5CxJ7Q4EoZyBXCdHtZcRnB7+
MSa+ylBTbmP/AJ2AQrPiArGlAKUTnJM6WknIBCWiIueRtadTh1cquBFVbDxoEIX5
eKcjdQrX2xNrFNE2rRuYI4ml+wwtdgk7JO41gjAw+NA2V1LJW6Q5A5RKX2PiOidC
oGdNPTLG7Rfh7sMaPo66X3xTQPoOwcV0O+ArXlFNDBZXDUw0d1tWzVfYo+/2Zym6
3sFcTKMdTKtG8NasNjvbanmZTV1VLbZAJRdevH1NFAWUICiTBkY=
=HoPI
-----END PGP SIGNATURE-----
Merge tag 'xfs-5.10-fixes-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Darrick Wong:
"The critical fixes are for a crash that someone reported in the xattr
code on 32-bit arm last week; and a revert of the rmap key comparison
change from last week as it was totally wrong. I need a vacation. :(
Summary:
- Fix various deficiencies in online fsck's metadata checking code
- Fix an integer casting bug in the xattr code on 32-bit systems
- Fix a hang in an inode walk when the inode index is corrupt
- Fix error codes being dropped when initializing per-AG structures
- Fix nowait directio writes that partially succeed but return EAGAIN
- Revert last week's rmap comparison patch because it was wrong"
* tag 'xfs-5.10-fixes-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: revert "xfs: fix rmap key and record comparison functions"
xfs: don't allow NOWAIT DIO across extent boundaries
xfs: return corresponding errcode if xfs_initialize_perag() fail
xfs: ensure inobt record walks always make forward progress
xfs: fix forkoff miscalculation related to XFS_LITINO(mp)
xfs: directory scrub should check the null bestfree entries too
xfs: strengthen rmap record flags checking
xfs: fix the minrecs logic when dealing with inode root child blocks
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAl+4JHgACgkQnJ2qBz9k
QNlXOQf/YHs4q4HgBI0tsStS/U4xFtmY77Rcm1pllqH6BZPBg1vRpzfh7hZvIPMa
GceTcMAX4OmG6++fRzVgNDIuem3Jl0oDCm++pWPev+S/V06PuTu36viuFWJ3e/5g
0wDLYXRj4dRUiQtjbSkI7LAgIX1wbTANOKSZeaKFYaGHfEcFm1GkHUuHzEBVX1Jw
bRpaod3ikmjoaoI6TTZlKKnrKksSw6F5wHUiHu2ZHdZ6kQ36elwHFu8QXJCzkZ7F
F9vt4IIKq6xzEVdwDXAPjsFkPp2B4Bz+AgcSpoitg/2L5hc2d/kxgI4zvpXY8TGs
hpW6YPXEXIjhHjKX22f99ThI4BqXww==
=bBTT
-----END PGP SIGNATURE-----
Merge tag 'fsnotify_for_v5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull fanotify fix from Jan Kara:
"A single fanotify fix from Amir"
* tag 'fsnotify_for_v5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fanotify: fix logic of reporting name info with watched parent
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl+4DAwQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgphdOD/9xOEnYPuekvVH9G9nyNd//Q9fPArG2+j6V
/MCnze07GNtDt7z15oR+T07hKXmf+Ejh4nu3JJ6MUNfe/47hhJqHSxRHU6+PJCjk
hPrsaTsDedxxLEDiLmvhXnUPzfVzJtefxVAAaKikWOb3SBqLdh7xTFSlor1HbRBl
Zk4d343cjBDYfvSSt/zMWDzwwvramdz7rJnnPMKXITu64ITL5314vuK2YVZmBOet
YujSah7J8FL1jKhiG1Iw5rayd2Q3smnHWIEQ+lvW6WiTvMJMLOxif2xNF4/VEZs1
CBGJUQt42LI6QGEzRBHohcefZFuPGoxnduSzHCOIhh7d6+k+y9mZfsPGohr3g9Ov
NotXpVonnA7GbRqzo1+IfBRve7iRONdZ3/LBwyRmqav4I4jX68wXBNH5IDpVR0Sn
c31avxa/ZL7iLIBx32enp0/r3mqNTQotEleSLUdyJQXAZTyG2INRhjLLXTqSQ5BX
oVp0fZzKCwsr6HCPZpXZ/f2G7dhzuF0ghoceC02GsOVooni22gdVnQj+AWNus398
e+wcimT4MX6AHNFxO2aUtJow0KWWZRzC1p5Mxu/9W3YiMtJiC0YOGePfSqiTqX0g
Uk0H5dOAgBUQrAsusf7bKr0K6W25yEk/JipxhWqi0rC71x42mLTsCT1wxSCvLwqs
WxhdtVKroQ==
=7PAe
-----END PGP SIGNATURE-----
Merge tag 'io_uring-5.10-2020-11-20' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
"Mostly regression or stable fodder:
- Disallow async path resolution of /proc/self
- Tighten constraints for segmented async buffered reads
- Fix double completion for a retry error case
- Fix for fixed file life times (Pavel)"
* tag 'io_uring-5.10-2020-11-20' of git://git.kernel.dk/linux-block:
io_uring: order refnode recycling
io_uring: get an active ref_node from files_data
io_uring: don't double complete failed reissue request
mm: never attempt async page lock if we've transferred data already
io_uring: handle -EOPNOTSUPP on path resolution
proc: don't allow async path resolution of /proc/self components
Currently the kernel does not provide an infrastructure to translate
architecture numbers to a human-readable name. Translating syscall
numbers to syscall names is possible through FTRACE_SYSCALL
infrastructure but it does not provide support for compat syscalls.
This will create a file for each PID as /proc/pid/seccomp_cache.
The file will be empty when no seccomp filters are loaded, or be
in the format of:
<arch name> <decimal syscall number> <ALLOW | FILTER>
where ALLOW means the cache is guaranteed to allow the syscall,
and filter means the cache will pass the syscall to the BPF filter.
For the docker default profile on x86_64 it looks like:
x86_64 0 ALLOW
x86_64 1 ALLOW
x86_64 2 ALLOW
x86_64 3 ALLOW
[...]
x86_64 132 ALLOW
x86_64 133 ALLOW
x86_64 134 FILTER
x86_64 135 FILTER
x86_64 136 FILTER
x86_64 137 ALLOW
x86_64 138 ALLOW
x86_64 139 FILTER
x86_64 140 ALLOW
x86_64 141 ALLOW
[...]
This file is guarded by CONFIG_SECCOMP_CACHE_DEBUG with a default
of N because I think certain users of seccomp might not want the
application to know which syscalls are definitely usable. For
the same reason, it is also guarded by CAP_SYS_ADMIN.
Suggested-by: Jann Horn <jannh@google.com>
Link: https://lore.kernel.org/lkml/CAG48ez3Ofqp4crXGksLmZY6=fGrF_tWyUCg7PBkAetvbbOPeOA@mail.gmail.com/
Signed-off-by: YiFei Zhu <yifeifz2@illinois.edu>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/94e663fa53136f5a11f432c661794d1ee7060779.1605101222.git.yifeifz2@illinois.edu
Currently <crypto/sha.h> contains declarations for both SHA-1 and SHA-2,
and <crypto/sha3.h> contains declarations for SHA-3.
This organization is inconsistent, but more importantly SHA-1 is no
longer considered to be cryptographically secure. So to the extent
possible, SHA-1 shouldn't be grouped together with any of the other SHA
versions, and usage of it should be phased out.
Therefore, split <crypto/sha.h> into two headers <crypto/sha1.h> and
<crypto/sha2.h>, and make everyone explicitly specify whether they want
the declarations for SHA-1, SHA-2, or both.
This avoids making the SHA-1 declarations visible to files that don't
want anything to do with SHA-1. It also prepares for potentially moving
sha1.h into a new insecure/ or dangerous/ directory.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The idea of the warning in ext4_update_dx_flag() is that we should warn
when we are clearing EXT4_INODE_INDEX on a filesystem with metadata
checksums enabled since after clearing the flag, checksums for internal
htree nodes will become invalid. So there's no need to warn (or actually
do anything) when EXT4_INODE_INDEX is not set.
Link: https://lore.kernel.org/r/20201118153032.17281-1-jack@suse.cz
Fixes: 48a34311953d ("ext4: fix checksum errors with indexed dirs")
Reported-by: Eric Biggers <ebiggers@kernel.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Kernel-doc markup should use this format:
identifier - description
They should not have any type before that, as otherwise
the parser won't do the right thing.
Also, some identifiers have different names between their
prototypes and the kernel-doc markup.
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/72f5c6628f5f278d67625f60893ffbc2ca28d46e.1605521731.git.mchehab+huawei@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
This reverts commit 6ff646b2ceb0eec916101877f38da0b73e3a5b7f.
Your maintainer committed a major braino in the rmap code by adding the
attr fork, bmbt, and unwritten extent usage bits into rmap record key
comparisons. While XFS uses the usage bits *in the rmap records* for
cross-referencing metadata in xfs_scrub and xfs_repair, it only needs
the owner and offset information to distinguish between reverse mappings
of the same physical extent into the data fork of a file at multiple
offsets. The other bits are not important for key comparisons for index
lookups, and never have been.
Eric Sandeen reports that this causes regressions in generic/299, so
undo this patch before it does more damage.
Reported-by: Eric Sandeen <sandeen@sandeen.net>
Fixes: 6ff646b2ceb0 ("xfs: fix rmap key and record comparison functions")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
The options in /proc/mounts must be valid mount options --- and
fast_commit is not a mount option. Otherwise, command sequences like
this will fail:
# mount /dev/vdc /vdc
# mkdir -p /vdc/phoronix_test_suite /pts
# mount --bind /vdc/phoronix_test_suite /pts
# mount -o remount,nodioread_nolock /pts
mount: /pts: mount point not mounted or bad option.
And in the system logs, you'll find:
EXT4-fs (vdc): Unrecognized mount option "fast_commit" or missing value
Fixes: 995a3ed67fc8 ("ext4: add fast_commit feature and handling for extended mount options")
Signed-off-by: Theodore Ts'o <tytso@mit.edu>