linux

iv/linux

Author	SHA1	Message	Date
Jens Axboe	c9fc52c173	io_uring: ensure that send/sendmsg and recv/recvmsg check sqe->ioprio commit 73911426aaaadbae54fa72359b33a7b6a56947db upstream. All other opcodes correctly check if this is set and -EINVAL if it is and they don't support that field, for some reason the these were forgotten. This was unified a bit differently in the upstream tree, but had the same effect as making sure we error on this field. Rather than have a painful backport of the upstream commit, just fixup the mentioned opcodes. Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-07-07 17:52:19 +02:00
Jens Axboe	fb2fbb3c10	io_uring: use separate list entry for iopoll requests A previous commit ended up enabling file tracking for iopoll requests, which conflicts with both of them using the same list entry for tracking. Add a separate list entry just for iopoll requests, avoid this issue. No upstream commit exists for this issue. Reported-by: Greg Thelen <gthelen@google.com> Fixes: df3f3bb5059d ("io_uring: add missing item types for various requests") Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-06-27 09:41:01 +02:00
Jens Axboe	df3f3bb505	io_uring: add missing item types for various requests Any read/write should grab current->nsproxy, denoted by IO_WQ_WORK_FILES as it refers to current->files as well, and connect and recv/recvmsg, send/sendmsg should grab current->fs which is denoted by IO_WQ_WORK_FS. No upstream commit exists for this issue. Reported-by: Bing-Jhong Billy Jheng <billy@starlabs.sg> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-06-25 15:16:09 +02:00
Pavel Begunkov	8adb751d29	io_uring: fix using under-expanded iters [ upstream commit cd65869512ab5668a5d16f789bc4da1319c435c4 ] The issue was first described and addressed in 89c2b3b7491820 ("io_uring: reexpand under-reexpanded iters"), but shortly after reimplemented as. cd65869512ab56 ("io_uring: use iov_iter state save/restore helpers"). Here we follow the approach from the second patch but without in-callback resubmissions, fixups for not yet supported in 5.10 short read retries and replacing iov_iter_state with iter copies to not pull even more dependencies, and because it's just much simpler. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-06-06 08:42:41 +02:00
Pavel Begunkov	57d01bcae7	io_uring: don't re-import iovecs from callbacks We can't re-import or modify iterators from iocb callbacks, it's not safe as it might be reverted and/or reexpanded while unwinding stack. It's also not safe to resubmit as io-wq thread will race with stack undwinding for the iterator and other data. Disallow resubmission from callbacks, it can fail some cases that were handled before, but the possibility of such a failure was a part of the API from the beginning and so it should be fine. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-06-06 08:42:41 +02:00
Jens Axboe	3c48558be5	io_uring: always grab file table for deferred statx Lee reports that there's a use-after-free of the process file table. There's an assumption that we don't need the file table for some variants of statx invocation, but that turns out to be false and we end up with not grabbing a reference for the request even if the deferred execution uses it. Get rid of the REQ_F_NO_FILE_TABLE optimization for statx, and always grab that reference. This issues doesn't exist upstream since the native workers got introduced with 5.12. Link: https://lore.kernel.org/io-uring/YoOJ%2FT4QRKC+fAZE@google.com/ Reported-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-25 09:17:51 +02:00
Jens Axboe	29f077d070	io_uring: always use original task when preparing req identity If the ring is setup with IORING_SETUP_IOPOLL and we have more than one task doing submissions on a ring, we can up in a situation where we assign the context from the current task rather than the request originator. Always use req->task rather than assume it's the same as current. No upstream patch exists for this issue, as only older kernels with the non-native workers have this problem. Reported-by: Kyle Zeng <zengyhkyle@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-18 10:23:48 +02:00
Jens Axboe	2827328e64	io_uring: fix race between timeout flush and removal commit e677edbcabee849bfdd43f1602bccbecf736a646 upstream. io_flush_timeouts() assumes the timeout isn't in progress of triggering or being removed/canceled, so it unconditionally removes it from the timeout list and attempts to cancel it. Leave it on the list and let the normal timeout cancelation take care of it. Cc: stable@vger.kernel.org # 5.5+ Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-04-13 21:01:08 +02:00
Pavel Begunkov	aed30a2054	io_uring: don't touch scm_fp_list after queueing skb [ Upstream commit a07211e3001435fe8591b992464cd8d5e3c98c5a ] It's safer to not touch scm_fp_list after we queued an skb to which it was assigned, there might be races lurking if we screw subtle sync guarantees on the io_uring side. Fixes: 6b06314c47e14 ("io_uring: add file set registration") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-04-13 21:01:06 +02:00
Pavel Begunkov	b27de7011c	io_uring: fix memory leak of uid in files registration commit c86d18f4aa93e0e66cda0e55827cd03eea6bc5f8 upstream. When there are no files for __io_sqe_files_scm() to process in the range, it'll free everything and return. However, it forgets to put uid. Fixes: 08a451739a9b5 ("io_uring: allow sparse fixed file sets") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/accee442376f33ce8aaebb099d04967533efde92.1648226048.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-04-08 14:40:42 +02:00
Jens Axboe	509565faed	io_uring: terminate manual loop iterator loop correctly for non-vecs [ Upstream commit 5e929367468c8f97cd1ffb0417316cecfebef94b ] The fix for not advancing the iterator if we're using fixed buffers is broken in that it can hit a condition where we don't terminate the loop. This results in io-wq looping forever, asking to read (or write) 0 bytes for every subsequent loop. Reported-by: Joel Jaeschke <joel.jaeschke@gmail.com> Link: https://github.com/axboe/liburing/issues/549 Fixes: 16c8d2df7ec0 ("io_uring: ensure symmetry in handling iter types in loop_rw_iter()") Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-04-08 14:40:03 +02:00
Pavel Begunkov	dc1163203a	io_uring: return back safer resurrect commit f70865db5ff35f5ed0c7e9ef63e7cca3d4947f04 upstream. Revert of revert of "io_uring: wait potential ->release() on resurrect", which adds a helper for resurrect not racing completion reinit, as was removed because of a strange bug with no clear root or link to the patch. Was improved, instead of rcu_synchronize(), just wait_for_completion() because we're at 0 refs and it will happen very shortly. Specifically use non-interruptible version to ignore all pending signals that may have ended prior interruptible wait. This reverts commit cb5e1b81304e089ee3ca948db4d29f71902eb575. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/7a080c20f686d026efade810b116b72f88abaff9.1618101759.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Cc: Lee Jones <lee.jones@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-03-19 13:44:46 +01:00
Eric Dumazet	4a93c65946	io_uring: add a schedule point in io_add_buffers() commit f240762f88b4b1b58561939ffd44837759756477 upstream. Looping ~65535 times doing kmalloc() calls can trigger soft lockups, especially with DEBUG features (like KASAN). [ 253.536212] watchdog: BUG: soft lockup - CPU#64 stuck for 26s! [b219417889:12575] [ 253.544433] Modules linked in: vfat fat i2c_mux_pca954x i2c_mux spidev cdc_acm xhci_pci xhci_hcd sha3_generic gq(O) [ 253.544451] CPU: 64 PID: 12575 Comm: b219417889 Tainted: G S O 5.17.0-smp-DEV #801 [ 253.544457] RIP: 0010:kernel_text_address (./include/asm-generic/sections.h:192 ./include/linux/kallsyms.h:29 kernel/extable.c:67 kernel/extable.c:98) [ 253.544464] Code: 0f 93 c0 48 c7 c1 e0 63 d7 a4 48 39 cb 0f 92 c1 20 c1 0f b6 c1 5b 5d c3 90 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 53 48 89 fb <48> c7 c0 00 00 80 a0 41 be 01 00 00 00 48 39 c7 72 0c 48 c7 c0 40 [ 253.544468] RSP: 0018:ffff8882d8baf4c0 EFLAGS: 00000246 [ 253.544471] RAX: 1ffff1105b175e00 RBX: ffffffffa13ef09a RCX: 00000000a13ef001 [ 253.544474] RDX: ffffffffa13ef09a RSI: ffff8882d8baf558 RDI: ffffffffa13ef09a [ 253.544476] RBP: ffff8882d8baf4d8 R08: ffff8882d8baf5e0 R09: 0000000000000004 [ 253.544479] R10: ffff8882d8baf5e8 R11: ffffffffa0d59a50 R12: ffff8882eab20380 [ 253.544481] R13: ffffffffa0d59a50 R14: dffffc0000000000 R15: 1ffff1105b175eb0 [ 253.544483] FS: 00000000016d3380(0000) GS:ffff88af48c00000(0000) knlGS:0000000000000000 [ 253.544486] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 253.544488] CR2: 00000000004af0f0 CR3: 00000002eabfa004 CR4: 00000000003706e0 [ 253.544491] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 253.544492] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 253.544494] Call Trace: [ 253.544496] <TASK> [ 253.544498] ? io_queue_sqe (fs/io_uring.c:7143) [ 253.544505] __kernel_text_address (kernel/extable.c:78) [ 253.544508] unwind_get_return_address (arch/x86/kernel/unwind_frame.c:19) [ 253.544514] arch_stack_walk (arch/x86/kernel/stacktrace.c:27) [ 253.544517] ? io_queue_sqe (fs/io_uring.c:7143) [ 253.544521] stack_trace_save (kernel/stacktrace.c:123) [ 253.544527] ____kasan_kmalloc (mm/kasan/common.c:39 mm/kasan/common.c:45 mm/kasan/common.c:436 mm/kasan/common.c:515) [ 253.544531] ? ____kasan_kmalloc (mm/kasan/common.c:39 mm/kasan/common.c:45 mm/kasan/common.c:436 mm/kasan/common.c:515) [ 253.544533] ? __kasan_kmalloc (mm/kasan/common.c:524) [ 253.544535] ? kmem_cache_alloc_trace (./include/linux/kasan.h:270 mm/slab.c:3567) [ 253.544541] ? io_issue_sqe (fs/io_uring.c:4556 fs/io_uring.c:4589 fs/io_uring.c:6828) [ 253.544544] ? __io_queue_sqe (fs/io_uring.c:?) [ 253.544551] __kasan_kmalloc (mm/kasan/common.c:524) [ 253.544553] kmem_cache_alloc_trace (./include/linux/kasan.h:270 mm/slab.c:3567) [ 253.544556] ? io_issue_sqe (fs/io_uring.c:4556 fs/io_uring.c:4589 fs/io_uring.c:6828) [ 253.544560] io_issue_sqe (fs/io_uring.c:4556 fs/io_uring.c:4589 fs/io_uring.c:6828) [ 253.544564] ? __kasan_slab_alloc (mm/kasan/common.c:45 mm/kasan/common.c:436 mm/kasan/common.c:469) [ 253.544567] ? __kasan_slab_alloc (mm/kasan/common.c:39 mm/kasan/common.c:45 mm/kasan/common.c:436 mm/kasan/common.c:469) [ 253.544569] ? kmem_cache_alloc_bulk (mm/slab.h:732 mm/slab.c:3546) [ 253.544573] ? __io_alloc_req_refill (fs/io_uring.c:2078) [ 253.544578] ? io_submit_sqes (fs/io_uring.c:7441) [ 253.544581] ? __se_sys_io_uring_enter (fs/io_uring.c:10154 fs/io_uring.c:10096) [ 253.544584] ? __x64_sys_io_uring_enter (fs/io_uring.c:10096) [ 253.544587] ? do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) [ 253.544590] ? entry_SYSCALL_64_after_hwframe (??:?) [ 253.544596] __io_queue_sqe (fs/io_uring.c:?) [ 253.544600] io_queue_sqe (fs/io_uring.c:7143) [ 253.544603] io_submit_sqe (fs/io_uring.c:?) [ 253.544608] io_submit_sqes (fs/io_uring.c:?) [ 253.544612] __se_sys_io_uring_enter (fs/io_uring.c:10154 fs/io_uring.c:10096) [ 253.544616] __x64_sys_io_uring_enter (fs/io_uring.c:10096) [ 253.544619] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) [ 253.544623] entry_SYSCALL_64_after_hwframe (??:?) Fixes: ddf0322db79c ("io_uring: add IORING_OP_PROVIDE_BUFFERS") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Pavel Begunkov <asml.silence@gmail.com> Cc: io-uring <io-uring@vger.kernel.org> Reported-by: syzbot <syzkaller@googlegroups.com> Link: https://lore.kernel.org/r/20220215041003.2394784-1-eric.dumazet@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-03-02 11:42:49 +01:00
Lee Jones	748786564a	Revert "io_uring: reinforce cancel on flush during exit" This reverts commit 88dbd085a51ec78c83dde79ad63bca8aa4272a9d. Causes the following Syzkaller reported issue: BUG: kernel NULL pointer dereference, address: 0000000000000010 PGD 0 P4D 0 Oops: 0002 [#1] PREEMPT SMP KASAN CPU: 1 PID: 546 Comm: syz-executor631 Tainted: G B 5.10.76-syzkaller-01178-g4944ec82ebb9 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:arch_atomic_try_cmpxchg syzkaller/managers/android-5-10/kernel/./arch/x86/include/asm/atomic.h:202 [inline] RIP: 0010:atomic_try_cmpxchg_acquire syzkaller/managers/android-5-10/kernel/./include/asm-generic/atomic-instrumented.h:707 [inline] RIP: 0010:queued_spin_lock syzkaller/managers/android-5-10/kernel/./include/asm-generic/qspinlock.h:82 [inline] RIP: 0010:do_raw_spin_lock_flags syzkaller/managers/android-5-10/kernel/./include/linux/spinlock.h:195 [inline] RIP: 0010:__raw_spin_lock_irqsave syzkaller/managers/android-5-10/kernel/./include/linux/spinlock_api_smp.h:119 [inline] RIP: 0010:_raw_spin_lock_irqsave+0x10d/0x210 syzkaller/managers/android-5-10/kernel/kernel/locking/spinlock.c:159 Code: 00 00 00 e8 d5 29 09 fd 4c 89 e7 be 04 00 00 00 e8 c8 29 09 fd 42 8a 04 3b 84 c0 0f 85 be 00 00 00 8b 44 24 40 b9 01 00 00 00 <f0> 41 0f b1 4d 00 75 45 48 c7 44 24 20 0e 36 e0 45 4b c7 04 37 00 RSP: 0018:ffffc90000f174e0 EFLAGS: 00010097 RAX: 0000000000000000 RBX: 1ffff920001e2ea4 RCX: 0000000000000001 RDX: 0000000000000001 RSI: 0000000000000004 RDI: ffffc90000f17520 RBP: ffffc90000f175b0 R08: dffffc0000000000 R09: 0000000000000003 R10: fffff520001e2ea5 R11: 0000000000000004 R12: ffffc90000f17520 R13: 0000000000000010 R14: 1ffff920001e2ea0 R15: dffffc0000000000 FS: 0000000000000000(0000) GS:ffff8881f7100000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000010 CR3: 000000000640f000 CR4: 00000000003506a0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: prepare_to_wait+0x9c/0x290 syzkaller/managers/android-5-10/kernel/kernel/sched/wait.c:248 io_uring_cancel_files syzkaller/managers/android-5-10/kernel/fs/io_uring.c:8690 [inline] io_uring_cancel_task_requests+0x16a9/0x1ed0 syzkaller/managers/android-5-10/kernel/fs/io_uring.c:8760 io_uring_flush+0x170/0x6d0 syzkaller/managers/android-5-10/kernel/fs/io_uring.c:8923 filp_close+0xb0/0x150 syzkaller/managers/android-5-10/kernel/fs/open.c:1319 close_files syzkaller/managers/android-5-10/kernel/fs/file.c:401 [inline] put_files_struct+0x1d4/0x350 syzkaller/managers/android-5-10/kernel/fs/file.c:429 exit_files+0x80/0xa0 syzkaller/managers/android-5-10/kernel/fs/file.c:458 do_exit+0x6d9/0x23a0 syzkaller/managers/android-5-10/kernel/kernel/exit.c:808 do_group_exit+0x16a/0x2d0 syzkaller/managers/android-5-10/kernel/kernel/exit.c:910 get_signal+0x133e/0x1f80 syzkaller/managers/android-5-10/kernel/kernel/signal.c:2790 arch_do_signal+0x8d/0x620 syzkaller/managers/android-5-10/kernel/arch/x86/kernel/signal.c:805 exit_to_user_mode_loop syzkaller/managers/android-5-10/kernel/kernel/entry/common.c:161 [inline] exit_to_user_mode_prepare+0xaa/0xe0 syzkaller/managers/android-5-10/kernel/kernel/entry/common.c:191 syscall_exit_to_user_mode+0x24/0x40 syzkaller/managers/android-5-10/kernel/kernel/entry/common.c:266 do_syscall_64+0x3d/0x70 syzkaller/managers/android-5-10/kernel/arch/x86/entry/common.c:56 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7fc6d1589a89 Code: Unable to access opcode bytes at RIP 0x7fc6d1589a5f. RSP: 002b:00007ffd2b5da728 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca RAX: fffffffffffffdfc RBX: 0000000000005193 RCX: 00007fc6d1589a89 RDX: 0000000000000000 RSI: 0000000000000080 RDI: 00007fc6d161142c RBP: 0000000000000032 R08: 00007ffd2b5eb0b8 R09: 0000000000000000 R10: 00007ffd2b5da750 R11: 0000000000000246 R12: 00007fc6d161142c R13: 00007ffd2b5da750 R14: 00007ffd2b5da770 R15: 0000000000000000 Modules linked in: CR2: 0000000000000010 ---[ end trace fe8044f7dc4d8d65 ]--- RIP: 0010:arch_atomic_try_cmpxchg syzkaller/managers/android-5-10/kernel/./arch/x86/include/asm/atomic.h:202 [inline] RIP: 0010:atomic_try_cmpxchg_acquire syzkaller/managers/android-5-10/kernel/./include/asm-generic/atomic-instrumented.h:707 [inline] RIP: 0010:queued_spin_lock syzkaller/managers/android-5-10/kernel/./include/asm-generic/qspinlock.h:82 [inline] RIP: 0010:do_raw_spin_lock_flags syzkaller/managers/android-5-10/kernel/./include/linux/spinlock.h:195 [inline] RIP: 0010:__raw_spin_lock_irqsave syzkaller/managers/android-5-10/kernel/./include/linux/spinlock_api_smp.h:119 [inline] RIP: 0010:_raw_spin_lock_irqsave+0x10d/0x210 syzkaller/managers/android-5-10/kernel/kernel/locking/spinlock.c:159 Code: 00 00 00 e8 d5 29 09 fd 4c 89 e7 be 04 00 00 00 e8 c8 29 09 fd 42 8a 04 3b 84 c0 0f 85 be 00 00 00 8b 44 24 40 b9 01 00 00 00 <f0> 41 0f b1 4d 00 75 45 48 c7 44 24 20 0e 36 e0 45 4b c7 04 37 00 RSP: 0018:ffffc90000f174e0 EFLAGS: 00010097 RAX: 0000000000000000 RBX: 1ffff920001e2ea4 RCX: 0000000000000001 RDX: 0000000000000001 RSI: 0000000000000004 RDI: ffffc90000f17520 RBP: ffffc90000f175b0 R08: dffffc0000000000 R09: 0000000000000003 R10: fffff520001e2ea5 R11: 0000000000000004 R12: ffffc90000f17520 R13: 0000000000000010 R14: 1ffff920001e2ea0 R15: dffffc0000000000 FS: 0000000000000000(0000) GS:ffff8881f7100000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000010 CR3: 000000000640f000 CR4: 00000000003506a0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 ---------------- Code disassembly (best guess), 1 bytes skipped: 0: 00 00 add %al,(%rax) 2: e8 d5 29 09 fd callq 0xfd0929dc 7: 4c 89 e7 mov %r12,%rdi a: be 04 00 00 00 mov $0x4,%esi f: e8 c8 29 09 fd callq 0xfd0929dc 14: 42 8a 04 3b mov (%rbx,%r15,1),%al 18: 84 c0 test %al,%al 1a: 0f 85 be 00 00 00 jne 0xde 20: 8b 44 24 40 mov 0x40(%rsp),%eax 24: b9 01 00 00 00 mov $0x1,%ecx * 29: f0 41 0f b1 4d 00 lock cmpxchg %ecx,0x0(%r13) <-- trapping instruction 2f: 75 45 jne 0x76 31: 48 c7 44 24 20 0e 36 movq $0x45e0360e,0x20(%rsp) 38: e0 45 3a: 4b rex.WXB 3b: c7 .byte 0xc7 3c: 04 37 add $0x37,%al Link: https://syzkaller.appspot.com/bug?extid=b0003676644cf0d6acc4 Reported-by: syzbot+b0003676644cf0d6acc4@syzkaller.appspotmail.com Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-11-06 14:10:08 +01:00
Pavel Begunkov	3f2c12ec8a	io_uring: don't take uring_lock during iowq cancel commit 792bb6eb862333658bf1bd2260133f0507e2da8d upstream. [ 97.866748] a.out/2890 is trying to acquire lock: [ 97.867829] ffff8881046763e8 (&ctx->uring_lock){+.+.}-{3:3}, at: io_wq_submit_work+0x155/0x240 [ 97.869735] [ 97.869735] but task is already holding lock: [ 97.871033] ffff88810dfe0be8 (&ctx->uring_lock){+.+.}-{3:3}, at: __x64_sys_io_uring_enter+0x3f0/0x5b0 [ 97.873074] [ 97.873074] other info that might help us debug this: [ 97.874520] Possible unsafe locking scenario: [ 97.874520] [ 97.875845] CPU0 [ 97.876440] ---- [ 97.877048] lock(&ctx->uring_lock); [ 97.877961] lock(&ctx->uring_lock); [ 97.878881] [ 97.878881] * DEADLOCK * [ 97.878881] [ 97.880341] May be due to missing lock nesting notation [ 97.880341] [ 97.881952] 1 lock held by a.out/2890: [ 97.882873] #0: ffff88810dfe0be8 (&ctx->uring_lock){+.+.}-{3:3}, at: __x64_sys_io_uring_enter+0x3f0/0x5b0 [ 97.885108] [ 97.885108] stack backtrace: [ 97.890457] Call Trace: [ 97.891121] dump_stack+0xac/0xe3 [ 97.891972] __lock_acquire+0xab6/0x13a0 [ 97.892940] lock_acquire+0x2c3/0x390 [ 97.894894] __mutex_lock+0xae/0x9f0 [ 97.901101] io_wq_submit_work+0x155/0x240 [ 97.902112] io_wq_cancel_cb+0x162/0x490 [ 97.904126] io_async_find_and_cancel+0x3b/0x140 [ 97.905247] io_issue_sqe+0x86d/0x13e0 [ 97.909122] __io_queue_sqe+0x10b/0x550 [ 97.913971] io_queue_sqe+0x235/0x470 [ 97.914894] io_submit_sqes+0xcce/0xf10 [ 97.917872] __x64_sys_io_uring_enter+0x3fb/0x5b0 [ 97.921424] do_syscall_64+0x2d/0x40 [ 97.922329] entry_SYSCALL_64_after_hwframe+0x44/0xa9 While holding uring_lock, e.g. from inline execution, async cancel request may attempt cancellations through io_wq_submit_work, which may try to grab a lock. Delay it to task_work, so we do it from a clean context and don't have to worry about locking. Cc: <stable@vger.kernel.org> # 5.5+ Fixes: c07e6719511e ("io_uring: hold uring_lock while completing failed polled io in io_wq_submit_work()") Reported-by: Abaci <abaci@linux.alibaba.com> Reported-by: Hao Xu <haoxu@linux.alibaba.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> [Lee: The first hunk solves a different (double free) issue in v5.10. Only the first hunk of the original patch is relevant to v5.10 AND the first hunk of the original patch is only relevant to v5.10] Reported-by: syzbot+59d8a1f4e60c20c066cf@syzkaller.appspotmail.com Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-11-02 19:48:18 +01:00
Kamal Mostafa	f59da9f7ef	io_uring: fix splice_fd_in checks backport typo The linux-5.10.y backport of commit "io_uring: add ->splice_fd_in checks" includes a typo: "\|" where "\|\|" should be. (The original upstream commit is fine.) Fixes: 54eb6211b979 ("io_uring: add ->splice_fd_in checks") Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: stable@vger.kernel.org # v5.10 Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-10-27 09:56:46 +02:00
Jens Axboe	ce092350b4	io_uring: put provided buffer meta data under memcg accounting [ Upstream commit 9990da93d2bf9892c2c14c958bef050d4e461a1a ] For each provided buffer, we allocate a struct io_buffer to hold the data associated with it. As a large number of buffers can be provided, account that data with memcg. Fixes: ddf0322db79c ("io_uring: add IORING_OP_PROVIDE_BUFFERS") Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-09-30 10:11:05 +02:00
Jens Axboe	ce8f81b76d	io_uring: ensure symmetry in handling iter types in loop_rw_iter() commit 16c8d2df7ec0eed31b7d3b61cb13206a7fb930cc upstream. When setting up the next segment, we check what type the iter is and handle it accordingly. However, when incrementing and processed amount we do not, and both iter advance and addr/len are adjusted, regardless of type. Split the increment side just like we do on the setup side. Fixes: 4017eb91a9e7 ("io_uring: make loop_rw_iter() use original user supplied pointers") Cc: stable@vger.kernel.org Reported-by: Valentina Palmiotti <vpalmiotti@gmail.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-09-22 12:27:54 +02:00
Pavel Begunkov	9a4e7f9038	io_uring: remove duplicated io_size from rw [ Upstream commit 632546c4b5a4dad8e3ac456406c65c0db9a0b570 ] io_size and iov_count in io_read() and io_write() hold the same value, kill the last one. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-09-18 13:40:35 +02:00
David Laight	6930a2a5be	fs/io_uring Don't use the return value from import_iovec(). [ Upstream commit 10fc72e43352753a08f9cf83aa5c40baec00d212 ] This is the only code that relies on import_iovec() returning iter.count on success. This allows a better interface to import_iovec(). Signed-off-by: David Laight <david.laight@aculab.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-09-18 13:40:35 +02:00
Pavel Begunkov	548ee201fb	io_uring: fail links of cancelled timeouts commit 2ae2eb9dde18979b40629dd413b9adbd6c894cdf upstream. When we cancel a timeout we should mark it with REQ_F_FAIL, so linked requests are cancelled as well, but not queued for further execution. Cc: stable@vger.kernel.org Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/fff625b44eeced3a5cae79f60e6acf3fbdf8f990.1631192135.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-09-18 13:40:05 +02:00
Pavel Begunkov	54eb6211b9	io_uring: add ->splice_fd_in checks commit 26578cda3db983b17cabe4e577af26306beb9987 upstream. ->splice_fd_in is used only by splice/tee, but no other request checks it for validity. Add the check for most of request types excluding reads/writes/sends/recvs, we don't want overhead for them and can leave them be as is until the field is actually used. Cc: stable@vger.kernel.org Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/f44bc2acd6777d932de3d71a5692235b5b2b7397.1629451684.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-09-18 13:40:05 +02:00
Pavel Begunkov	a3ed34bcad	io_uring: place fixed tables under memcg limits commit 0bea96f59ba40e63c0ae93ad6a02417b95f22f4d upstream. Fixed tables may be large enough, place all of them together with allocated tags under memcg limits. Cc: stable@vger.kernel.org Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/b3ac9f5da9821bb59837b5fe25e8ef4be982218c.1629451684.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-09-18 13:40:05 +02:00
Pavel Begunkov	5103b73334	io_uring: limit fixed table size by RLIMIT_NOFILE Limit the number of files in io_uring fixed tables by RLIMIT_NOFILE, that's the first and the simpliest restriction that we should impose. Cc: stable@vger.kernel.org Suggested-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/b2756c340aed7d6c0b302c26dab50c6c5907f4ce.1629451684.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-09-18 13:40:05 +02:00
Jens Axboe	24fbd77d5a	io_uring: IORING_OP_WRITE needs hash_reg_file set commit 7b3188e7ed54102a5dcc73d07727f41fb528f7c8 upstream. During some testing, it became evident that using IORING_OP_WRITE doesn't hash buffered writes like the other writes commands do. That's simply an oversight, and can cause performance regressions when doing buffered writes with this command. Correct that and add the flag, so that buffered writes are correctly hashed when using the non-iovec based write command. Cc: stable@vger.kernel.org Fixes: 3a6820f2bb8a ("io_uring: add non-vectored read/write commands") Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-09-15 09:50:47 +02:00
Jens Axboe	f15e642673	io_uring: only assign io_uring_enter() SQPOLL error in actual error case [ upstream commit 21f965221e7c42609521342403e8fb91b8b3e76e ] If an SQPOLL based ring is newly created and an application issues an io_uring_enter(2) system call on it, then we can return a spurious -EOWNERDEAD error. This happens because there's nothing to submit, and if the caller doesn't specify any other action, the initial error assignment of -EOWNERDEAD never gets overwritten. This causes us to return it directly, even if it isn't valid. Move the error assignment into the actual failure case instead. Cc: stable@vger.kernel.org Fixes: d9d05217cb69 ("io_uring: stop SQPOLL submit on creator's death") Reported-by: Sherlock Holo sherlockya@gmail.com Link: https://github.com/axboe/liburing/issues/413 Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-08-26 08:35:57 -04:00
Jens Axboe	695ab28a7f	io_uring: fix xa_alloc_cycle() error return value check [ upstream commit a30f895ad3239f45012e860d4f94c1a388b36d14 ] We currently check for ret != 0 to indicate error, but '1' is a valid return and just indicates that the allocation succeeded with a wrap. Correct the check to be for < 0, like it was before the xarray conversion. Cc: stable@vger.kernel.org Fixes: 61cf93700fe6 ("io_uring: Convert personality_idr to XArray") Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-08-26 08:35:57 -04:00
Yang Yingliang	65b2658634	io_uring: fix null-ptr-deref in io_sq_offload_start() I met a null-ptr-deref when doing fault-inject test: [ 65.441626][ T8299] general protection fault, probably for non-canonical address 0xdffffc0000000029: 0000 [#1] PREEMPT SMP KASAN [ 65.443219][ T8299] KASAN: null-ptr-deref in range [0x0000000000000148-0x000000000000014f] [ 65.444331][ T8299] CPU: 2 PID: 8299 Comm: test Not tainted 5.10.49+ #499 [ 65.445277][ T8299] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 [ 65.446614][ T8299] RIP: 0010:io_disable_sqo_submit+0x124/0x260 [ 65.447554][ T8299] Code: 7b 40 89 ee e8 2d b9 9a ff 85 ed 74 40 e8 04 b8 9a ff 49 8d be 48 01 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 22 01 00 00 49 8b ae 48 01 00 00 48 85 ed 74 0d [ 65.450860][ T8299] RSP: 0018:ffffc9000122fd70 EFLAGS: 00010202 [ 65.451826][ T8299] RAX: dffffc0000000000 RBX: ffff88801b11f000 RCX: ffffffff81d5d783 [ 65.453166][ T8299] RDX: 0000000000000029 RSI: ffffffff81d5d78c RDI: 0000000000000148 [ 65.454606][ T8299] RBP: 0000000000000002 R08: ffff88810168c280 R09: ffffed1003623e79 [ 65.456063][ T8299] R10: ffffc9000122fd70 R11: ffffed1003623e78 R12: ffff88801b11f040 [ 65.457542][ T8299] R13: ffff88801b11f3c0 R14: 0000000000000000 R15: 000000000000001a [ 65.458910][ T8299] FS: 00007ffb602e3500(0000) GS:ffff888064100000(0000) knlGS:0000000000000000 [ 65.460533][ T8299] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 65.461736][ T8299] CR2: 00007ffb5fe7eb24 CR3: 000000010a619000 CR4: 0000000000750ee0 [ 65.463146][ T8299] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 65.464618][ T8299] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 65.466052][ T8299] PKRU: 55555554 [ 65.466708][ T8299] Call Trace: [ 65.467304][ T8299] io_uring_setup+0x2041/0x3ac0 [ 65.468169][ T8299] ? io_iopoll_check+0x500/0x500 [ 65.469123][ T8299] ? syscall_enter_from_user_mode+0x1c/0x50 [ 65.470241][ T8299] do_syscall_64+0x2d/0x70 [ 65.471028][ T8299] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 65.472099][ T8299] RIP: 0033:0x7ffb5fdec839 [ 65.472925][ T8299] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1f f6 2c 00 f7 d8 64 89 01 48 [ 65.476465][ T8299] RSP: 002b:00007ffc33539ef8 EFLAGS: 00000206 ORIG_RAX: 00000000000001a9 [ 65.478026][ T8299] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ffb5fdec839 [ 65.479503][ T8299] RDX: 0000000020ffd000 RSI: 0000000020000080 RDI: 0000000000100001 [ 65.480927][ T8299] RBP: 00007ffc33539f70 R08: 0000000000000000 R09: 0000000000000000 [ 65.482416][ T8299] R10: 0000000000000000 R11: 0000000000000206 R12: 0000555e85531320 [ 65.483845][ T8299] R13: 00007ffc3353a0a0 R14: 0000000000000000 R15: 0000000000000000 [ 65.485331][ T8299] Modules linked in: [ 65.486000][ T8299] Dumping ftrace buffer: [ 65.486772][ T8299] (ftrace buffer empty) [ 65.487595][ T8299] ---[ end trace a9a5fad3ebb303b7 ]--- If io_allocate_scq_urings() fails in io_uring_create(), 'ctx->sq_data' is not set yet, when calling io_sq_offload_start() in io_disable_sqo_submit() in error path, it will lead a null-ptr-deref. The io_disable_sqo_submit() has been removed in mainline by commit 70aacfe66136 ("io_uring: kill sqo_dead and sqo submission halting"), so the bug has been eliminated in mainline, it's a fix only for stable-5.10. Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-04 12:46:39 +02:00
Pavel Begunkov	6f5d7a45f5	io_uring: fix link timeout refs [ Upstream commit a298232ee6b9a1d5d732aa497ff8be0d45b5bd82 ] WARNING: CPU: 0 PID: 10242 at lib/refcount.c:28 refcount_warn_saturate+0x15b/0x1a0 lib/refcount.c:28 RIP: 0010:refcount_warn_saturate+0x15b/0x1a0 lib/refcount.c:28 Call Trace: __refcount_sub_and_test include/linux/refcount.h:283 [inline] __refcount_dec_and_test include/linux/refcount.h:315 [inline] refcount_dec_and_test include/linux/refcount.h:333 [inline] io_put_req fs/io_uring.c:2140 [inline] io_queue_linked_timeout fs/io_uring.c:6300 [inline] __io_queue_sqe+0xbef/0xec0 fs/io_uring.c:6354 io_submit_sqe fs/io_uring.c:6534 [inline] io_submit_sqes+0x2bbd/0x7c50 fs/io_uring.c:6660 __do_sys_io_uring_enter fs/io_uring.c:9240 [inline] __se_sys_io_uring_enter+0x256/0x1d60 fs/io_uring.c:9182 io_link_timeout_fn() should put only one reference of the linked timeout request, however in case of racing with the master request's completion first io_req_complete() puts one and then io_put_req_deferred() is called. Cc: stable@vger.kernel.org # 5.12+ Fixes: 9ae1f8dd372e0 ("io_uring: fix inconsistent lock state") Reported-by: syzbot+a2910119328ce8e7996f@syzkaller.appspotmail.com Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/ff51018ff29de5ffa76f09273ef48cb24c720368.1620417627.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Tested-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-31 08:16:10 +02:00
Pavel Begunkov	fca5343b48	io_uring: remove double poll entry on arm failure commit 46fee9ab02cb24979bbe07631fc3ae95ae08aa3e upstream. __io_queue_proc() can enqueue both poll entries and still fail afterwards, so the callers trying to cancel it should also try to remove the second poll entry (if any). For example, it may leave the request alive referencing a io_uring context but not accessible for cancellation: [ 282.599913][ T1620] task:iou-sqp-23145 state:D stack:28720 pid:23155 ppid: 8844 flags:0x00004004 [ 282.609927][ T1620] Call Trace: [ 282.613711][ T1620] __schedule+0x93a/0x26f0 [ 282.634647][ T1620] schedule+0xd3/0x270 [ 282.638874][ T1620] io_uring_cancel_generic+0x54d/0x890 [ 282.660346][ T1620] io_sq_thread+0xaac/0x1250 [ 282.696394][ T1620] ret_from_fork+0x1f/0x30 Cc: stable@vger.kernel.org Fixes: 18bceab101add ("io_uring: allow POLL_ADD with double poll_wait() users") Reported-and-tested-by: syzbot+ac957324022b7132accf@syzkaller.appspotmail.com Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/0ec1228fc5eda4cb524eeda857da8efdc43c331c.1626774457.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-28 14:35:46 +02:00
Pavel Begunkov	9eef902915	io_uring: explicitly count entries for poll reqs commit 68b11e8b1562986c134764433af64e97d30c9fc0 upstream. If __io_queue_proc() fails to add a second poll entry, e.g. kmalloc() failed, but it goes on with a third waitqueue, it may succeed and overwrite the error status. Count the number of poll entries we added, so we can set pt->error to zero at the beginning and find out when the mentioned scenario happens. Cc: stable@vger.kernel.org Fixes: 18bceab101add ("io_uring: allow POLL_ADD with double poll_wait() users") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/9d6b9e561f88bcc0163623b74a76c39f712151c3.1626774457.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-28 14:35:46 +02:00
Yang Yingliang	090588059c	io_uring: fix clear IORING_SETUP_R_DISABLED in wrong function In commit 3ebba796fa25 ("io_uring: ensure that SQPOLL thread is started for exit"), the IORING_SETUP_R_DISABLED is cleared in io_sq_offload_start(), but when backport it to stable-5.10, IORING_SETUP_R_DISABLED is cleared in __io_req_task_submit(), move clearing IORING_SETUP_R_DISABLED to io_sq_offload_start() to fix this. Fixes: 6cae8095490ca ("io_uring: ensure that SQPOLL thread is started for exit") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-19 09:45:02 +02:00
Jens Axboe	9073188835	io_uring: convert io_buffer_idr to XArray commit 9e15c3a0ced5a61f320b989072c24983cb1620c1 upstream. Like we did for the personality idr, convert the IO buffer idr to use XArray. This avoids a use-after-free on removal of entries, since idr doesn't like doing so from inside an iterator, and it nicely reduces the amount of code we need to support this feature. Fixes: 5a2e745d4d43 ("io_uring: buffer registration infrastructure") Cc: stable@vger.kernel.org Cc: Matthew Wilcox <willy@infradead.org> Cc: yangerkun <yangerkun@huawei.com> Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-19 09:44:56 +02:00
Matthew Wilcox (Oracle)	c5a50a220a	io_uring: Convert personality_idr to XArray commit 61cf93700fe6359552848ed5e3becba6cd760efa upstream. You can't call idr_remove() from within a idr_for_each() callback, but you can call xa_erase() from an xa_for_each() loop, so switch the entire personality_idr from the IDR to the XArray. This manifests as a use-after-free as idr_for_each() attempts to walk the rest of the node after removing the last entry from it. Fixes: 071698e13ac6 ("io_uring: allow registering credentials") Cc: stable@vger.kernel.org # 5.6+ Reported-by: yangerkun <yangerkun@huawei.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> [Pavel: rebased (creds load was moved into io_init_req())] Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/7ccff36e1375f2b0ebf73d957f037b43becc0dde.1615212806.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-19 09:44:56 +02:00
Yejune Deng	cb2985feb1	io_uring: simplify io_remove_personalities() commit 0bead8cd39b9c9c7c4e902018ccf129107ac50ef upstream. The function io_remove_personalities() is very similar to io_unregister_personality(),so implement io_remove_personalities() calling io_unregister_personality(). Signed-off-by: Yejune Deng <yejune.deng@gmail.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-19 09:44:56 +02:00
Pavel Begunkov	a75457f630	io_uring: fix blocking inline submission commit 976517f162a05f4315b2373fd11585c395506259 upstream. There is a complaint against sys_io_uring_enter() blocking if it submits stdin reads. The problem is in __io_file_supports_async(), which sees that it's a cdev and allows it to be processed inline. Punt char devices using generic rules of io_file_supports_async(), including checking for presence of *_iter() versions of rw callbacks. Apparently, it will affect most of cdevs with some exceptions like null and zero devices. Cc: stable@vger.kernel.org Reported-by: Birk Hirdman <lonjil@gmail.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d60270856b8a4560a639ef5f76e55eb563633599.1623236455.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-14 16:56:54 +02:00
Pavel Begunkov	ec72cb50c1	io_uring: use better types for cflags [ Upstream commit 8c3f9cd1603d0e4af6c50ebc6d974ab7bdd03cf4 ] __io_cqring_fill_event() takes cflags as long to squeeze it into u32 in an CQE, awhile all users pass int or unsigned. Replace it with unsigned int and store it as u32 in struct io_completion to match CQE. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-06-10 13:39:23 +02:00
Pavel Begunkov	0b2a990e5d	io_uring: fix link timeout refs [ Upstream commit a298232ee6b9a1d5d732aa497ff8be0d45b5bd82 ] WARNING: CPU: 0 PID: 10242 at lib/refcount.c:28 refcount_warn_saturate+0x15b/0x1a0 lib/refcount.c:28 RIP: 0010:refcount_warn_saturate+0x15b/0x1a0 lib/refcount.c:28 Call Trace: __refcount_sub_and_test include/linux/refcount.h:283 [inline] __refcount_dec_and_test include/linux/refcount.h:315 [inline] refcount_dec_and_test include/linux/refcount.h:333 [inline] io_put_req fs/io_uring.c:2140 [inline] io_queue_linked_timeout fs/io_uring.c:6300 [inline] __io_queue_sqe+0xbef/0xec0 fs/io_uring.c:6354 io_submit_sqe fs/io_uring.c:6534 [inline] io_submit_sqes+0x2bbd/0x7c50 fs/io_uring.c:6660 __do_sys_io_uring_enter fs/io_uring.c:9240 [inline] __se_sys_io_uring_enter+0x256/0x1d60 fs/io_uring.c:9182 io_link_timeout_fn() should put only one reference of the linked timeout request, however in case of racing with the master request's completion first io_req_complete() puts one and then io_put_req_deferred() is called. Cc: stable@vger.kernel.org # 5.12+ Fixes: 9ae1f8dd372e0 ("io_uring: fix inconsistent lock state") Reported-by: syzbot+a2910119328ce8e7996f@syzkaller.appspotmail.com Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/ff51018ff29de5ffa76f09273ef48cb24c720368.1620417627.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-06-10 13:39:23 +02:00
Pavel Begunkov	cbbc13b115	io_uring: fix overflows checks in provide buffers [ Upstream commit 38134ada0ceea3e848fe993263c0ff6207fd46e7 ] Colin reported before possible overflow and sign extension problems in io_provide_buffers_prep(). As Linus pointed out previous attempt did nothing useful, see d81269fecb8ce ("io_uring: fix provide_buffers sign extension"). Do that with help of check_<op>_overflow helpers. And fix struct io_provide_buf::len type, as it doesn't make much sense to keep it signed. Reported-by: Colin Ian King <colin.king@canonical.com> Fixes: efe68c1ca8f49 ("io_uring: validate the full range of provided buffers for access") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/46538827e70fce5f6cdb50897cff4cacc490f380.1618488258.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-05-14 09:50:28 +02:00
Thadeu Lima de Souza Cascardo	7e916d0124	io_uring: truncate lengths larger than MAX_RW_COUNT on provide buffers commit d1f82808877bb10d3deee7cf3374a4eb3fb582db upstream. Read and write operations are capped to MAX_RW_COUNT. Some read ops rely on that limit, and that is not guaranteed by the IORING_OP_PROVIDE_BUFFERS. Truncate those lengths when doing io_add_buffers, so buffer addresses still use the uncapped length. Also, take the chance and change struct io_buffer len member to __u32, so it matches struct io_provide_buffer len member. This fixes CVE-2021-3491, also reported as ZDI-CAN-13546. Fixes: ddf0322db79c ("io_uring: add IORING_OP_PROVIDE_BUFFERS") Reported-by: Billy Jheng Bing-Jhong (@st424204) Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-05-14 09:49:55 +02:00
Jens Axboe	6fbdce3cde	io_uring: don't mark S_ISBLK async work as unbounded [ Upstream commit 4b982bd0f383db9132e892c0c5144117359a6289 ] S_ISBLK is marked as unbounded work for async preparation, because it doesn't match S_ISREG. That is incorrect, as any read/write to a block device is also a bounded operation. Fix it up and ensure that S_ISBLK isn't marked unbounded. Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-04-16 11:43:21 +02:00
Pavel Begunkov	7345d4b2d4	io_uring: fix timeout cancel return code [ Upstream commit 1ee4160c73b2102a52bc97a4128a89c34821414f ] When we cancel a timeout we should emit a sensible return code, like -ECANCELED but not 0, otherwise it may trick users. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/7b0ad1065e3bd1994722702bd0ba9e7bc9b0683b.1616696997.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-04-10 13:36:10 +02:00
Stefan Metzmacher	44c816c8b9	io_uring: call req_set_fail_links() on short send[msg]()/recv[msg]() with MSG_WAITALL [ Upstream commit 0031275d119efe16711cd93519b595e6f9b4b330 ] Without that it's not safe to use them in a linked combination with others. Now combinations like IORING_OP_SENDMSG followed by IORING_OP_SPLICE should be possible. We already handle short reads and writes for the following opcodes: - IORING_OP_READV - IORING_OP_READ_FIXED - IORING_OP_READ - IORING_OP_WRITEV - IORING_OP_WRITE_FIXED - IORING_OP_WRITE - IORING_OP_SPLICE - IORING_OP_TEE Now we have it for these as well: - IORING_OP_SENDMSG - IORING_OP_SEND - IORING_OP_RECVMSG - IORING_OP_RECV For IORING_OP_RECVMSG we also check for the MSG_TRUNC and MSG_CTRUNC flags in order to call req_set_fail_links(). There might be applications arround depending on the behavior that even short send[msg]()/recv[msg]() retuns continue an IOSQE_IO_LINK chain. It's very unlikely that such applications pass in MSG_WAITALL, which is only defined in 'man 2 recvmsg', but not in 'man 2 sendmsg'. It's expected that the low level sock_sendmsg() call just ignores MSG_WAITALL, as MSG_ZEROCOPY is also ignored without explicitly set SO_ZEROCOPY. We also expect the caller to know about the implicit truncation to MAX_RW_COUNT, which we don't detect. cc: netdev@vger.kernel.org Link: https://lore.kernel.org/r/c4e1a4cc0d905314f4d5dc567e65a7b09621aab3.1615908477.git.metze@samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-04-07 15:00:06 +02:00
Stefan Metzmacher	21c2bbc17b	io_uring: imply MSG_NOSIGNAL for send[msg]()/recv[msg]() calls [ Upstream commit 76cd979f4f38a27df22efb5773a0d567181a9392 ] We never want to generate any SIGPIPE, -EPIPE only is much better. Signed-off-by: Stefan Metzmacher <metze@samba.org> Link: https://lore.kernel.org/r/38961085c3ec49fd21550c7788f214d1ff02d2d4.1615908477.git.metze@samba.org Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-04-07 15:00:06 +02:00
Pavel Begunkov	861fc287e0	io_uring: fix ->flags races by linked timeouts [ Upstream commit efe814a471e0e58f28f1efaf430c8784a4f36626 ] It's racy to modify req->flags from a not owning context, e.g. linked timeout calling req_set_fail_links() for the master request might race with that request setting/clearing flags while being executed concurrently. Just remove req_set_fail_links(prev) from io_link_timeout_fn(), io_async_find_and_cancel() and functions down the line take care of setting the fail bit. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-04-07 15:00:05 +02:00
Pavel Begunkov	dcf2dfc161	io_uring: fix provide_buffers sign extension [ Upstream commit d81269fecb8ce16eb07efafc9ff5520b2a31c486 ] io_provide_buffers_prep()'s "p->len * p->nbufs" to sign extension problems. Not a huge problem as it's only used for access_ok() and increases the checked length, but better to keep typing right. Reported-by: Colin Ian King <colin.king@canonical.com> Fixes: efe68c1ca8f49 ("io_uring: validate the full range of provided buffers for access") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Colin Ian King <colin.king@canonical.com> Link: https://lore.kernel.org/r/562376a39509e260d8532186a06226e56eb1f594.1616149233.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-03-30 14:32:06 +02:00
Jens Axboe	76f496681d	io_uring: clear IOCB_WAITQ for non -EIOCBQUEUED return [ Upstream commit b5b0ecb736f1ce1e68eb50613c0cfecff10198eb ] The callback can only be armed, if we get -EIOCBQUEUED returned. It's important that we clear the WAITQ bit for other cases, otherwise we can queue for async retry and filemap will assume that we're armed and return -EAGAIN instead of just blocking for the IO. Cc: stable@vger.kernel.org # 5.9+ Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-03-25 09:04:13 +01:00
Jens Axboe	3c08f772ad	io_uring: don't attempt IO reissue from the ring exit path [ Upstream commit 7c977a58dc83366e488c217fd88b1469d242bee5 ] If we're exiting the ring, just let the IO fail with -EAGAIN as nobody will care anyway. It's not the right context to reissue from. Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-03-25 09:04:13 +01:00
Pavel Begunkov	1c20e9040f	io_uring: fix inconsistent lock state [ Upstream commit 9ae1f8dd372e0e4c020b345cf9e09f519265e981 ] WARNING: inconsistent lock state inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage. syz-executor217/8450 [HC1[1]:SC0[0]:HE0:SE1] takes: ffff888023d6e620 (&fs->lock){?.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:354 [inline] ffff888023d6e620 (&fs->lock){?.+.}-{2:2}, at: io_req_clean_work fs/io_uring.c:1398 [inline] ffff888023d6e620 (&fs->lock){?.+.}-{2:2}, at: io_dismantle_req+0x66f/0xf60 fs/io_uring.c:2029 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&fs->lock); <Interrupt> lock(&fs->lock); * DEADLOCK * 1 lock held by syz-executor217/8450: #0: ffff88802417c3e8 (&ctx->uring_lock){+.+.}-{3:3}, at: __do_sys_io_uring_enter+0x1071/0x1f30 fs/io_uring.c:9442 stack backtrace: CPU: 1 PID: 8450 Comm: syz-executor217 Not tainted 5.11.0-rc5-next-20210129-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <IRQ> [...] _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:151 spin_lock include/linux/spinlock.h:354 [inline] io_req_clean_work fs/io_uring.c:1398 [inline] io_dismantle_req+0x66f/0xf60 fs/io_uring.c:2029 __io_free_req+0x3d/0x2e0 fs/io_uring.c:2046 io_free_req fs/io_uring.c:2269 [inline] io_double_put_req fs/io_uring.c:2392 [inline] io_put_req+0xf9/0x570 fs/io_uring.c:2388 io_link_timeout_fn+0x30c/0x480 fs/io_uring.c:6497 __run_hrtimer kernel/time/hrtimer.c:1519 [inline] __hrtimer_run_queues+0x609/0xe40 kernel/time/hrtimer.c:1583 hrtimer_interrupt+0x334/0x940 kernel/time/hrtimer.c:1645 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1085 [inline] __sysvec_apic_timer_interrupt+0x146/0x540 arch/x86/kernel/apic/apic.c:1102 asm_call_irq_on_stack+0xf/0x20 </IRQ> __run_sysvec_on_irqstack arch/x86/include/asm/irq_stack.h:37 [inline] run_sysvec_on_irqstack_cond arch/x86/include/asm/irq_stack.h:89 [inline] sysvec_apic_timer_interrupt+0xbd/0x100 arch/x86/kernel/apic/apic.c:1096 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:629 RIP: 0010:__raw_spin_unlock_irq include/linux/spinlock_api_smp.h:169 [inline] RIP: 0010:_raw_spin_unlock_irq+0x25/0x40 kernel/locking/spinlock.c:199 spin_unlock_irq include/linux/spinlock.h:404 [inline] io_queue_linked_timeout+0x194/0x1f0 fs/io_uring.c:6525 __io_queue_sqe+0x328/0x1290 fs/io_uring.c:6594 io_queue_sqe+0x631/0x10d0 fs/io_uring.c:6639 io_queue_link_head fs/io_uring.c:6650 [inline] io_submit_sqe fs/io_uring.c:6697 [inline] io_submit_sqes+0x19b5/0x2720 fs/io_uring.c:6960 __do_sys_io_uring_enter+0x107d/0x1f30 fs/io_uring.c:9443 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Don't free requests from under hrtimer context (softirq) as it may sleep or take spinlocks improperly (e.g. non-irq versions). Cc: stable@vger.kernel.org # 5.6+ Reported-by: syzbot+81d17233a2b02eafba33@syzkaller.appspotmail.com Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-03-25 09:04:13 +01:00
Jens Axboe	6cae809549	io_uring: ensure that SQPOLL thread is started for exit commit 3ebba796fa251d042be42b929a2d916ee5c34a49 upstream. If we create it in a disabled state because IORING_SETUP_R_DISABLED is set on ring creation, we need to ensure that we've kicked the thread if we're exiting before it's been explicitly disabled. Otherwise we can run into a deadlock where exit is waiting go park the SQPOLL thread, but the SQPOLL thread itself is waiting to get a signal to start. That results in the below trace of both tasks hung, waiting on each other: INFO: task syz-executor458:8401 blocked for more than 143 seconds. Not tainted 5.11.0-next-20210226-syzkaller #0 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:syz-executor458 state:D stack:27536 pid: 8401 ppid: 8400 flags:0x00004004 Call Trace: context_switch kernel/sched/core.c:4324 [inline] __schedule+0x90c/0x21a0 kernel/sched/core.c:5075 schedule+0xcf/0x270 kernel/sched/core.c:5154 schedule_timeout+0x1db/0x250 kernel/time/timer.c:1868 do_wait_for_common kernel/sched/completion.c:85 [inline] __wait_for_common kernel/sched/completion.c:106 [inline] wait_for_common kernel/sched/completion.c:117 [inline] wait_for_completion+0x168/0x270 kernel/sched/completion.c:138 io_sq_thread_park fs/io_uring.c:7115 [inline] io_sq_thread_park+0xd5/0x130 fs/io_uring.c:7103 io_uring_cancel_task_requests+0x24c/0xd90 fs/io_uring.c:8745 __io_uring_files_cancel+0x110/0x230 fs/io_uring.c:8840 io_uring_files_cancel include/linux/io_uring.h:47 [inline] do_exit+0x299/0x2a60 kernel/exit.c:780 do_group_exit+0x125/0x310 kernel/exit.c:922 __do_sys_exit_group kernel/exit.c:933 [inline] __se_sys_exit_group kernel/exit.c:931 [inline] __x64_sys_exit_group+0x3a/0x50 kernel/exit.c:931 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x43e899 RSP: 002b:00007ffe89376d48 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7 RAX: ffffffffffffffda RBX: 00000000004af2f0 RCX: 000000000043e899 RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000 RBP: 0000000000000000 R08: ffffffffffffffc0 R09: 0000000010000000 R10: 0000000000008011 R11: 0000000000000246 R12: 00000000004af2f0 R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000001 INFO: task iou-sqp-8401:8402 can't die for more than 143 seconds. task:iou-sqp-8401 state:D stack:30272 pid: 8402 ppid: 8400 flags:0x00004004 Call Trace: context_switch kernel/sched/core.c:4324 [inline] __schedule+0x90c/0x21a0 kernel/sched/core.c:5075 schedule+0xcf/0x270 kernel/sched/core.c:5154 schedule_timeout+0x1db/0x250 kernel/time/timer.c:1868 do_wait_for_common kernel/sched/completion.c:85 [inline] __wait_for_common kernel/sched/completion.c:106 [inline] wait_for_common kernel/sched/completion.c:117 [inline] wait_for_completion+0x168/0x270 kernel/sched/completion.c:138 io_sq_thread+0x27d/0x1ae0 fs/io_uring.c:6717 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294 INFO: task iou-sqp-8401:8402 blocked for more than 143 seconds. Reported-by: syzbot+fb5458330b4442f2090d@syzkaller.appspotmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-03-25 09:04:09 +01:00

1 2 3 4 5 ...

974 Commits