IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
commit 73911426aaaadbae54fa72359b33a7b6a56947db upstream.
All other opcodes correctly check if this is set and -EINVAL if it is
and they don't support that field, for some reason the these were
forgotten.
This was unified a bit differently in the upstream tree, but had the
same effect as making sure we error on this field. Rather than have
a painful backport of the upstream commit, just fixup the mentioned
opcodes.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A previous commit ended up enabling file tracking for iopoll requests,
which conflicts with both of them using the same list entry for tracking.
Add a separate list entry just for iopoll requests, avoid this issue.
No upstream commit exists for this issue.
Reported-by: Greg Thelen <gthelen@google.com>
Fixes: df3f3bb5059d ("io_uring: add missing item types for various requests")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Any read/write should grab current->nsproxy, denoted by IO_WQ_WORK_FILES
as it refers to current->files as well, and connect and recv/recvmsg,
send/sendmsg should grab current->fs which is denoted by IO_WQ_WORK_FS.
No upstream commit exists for this issue.
Reported-by: Bing-Jhong Billy Jheng <billy@starlabs.sg>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ upstream commit cd65869512ab5668a5d16f789bc4da1319c435c4 ]
The issue was first described and addressed in
89c2b3b7491820 ("io_uring: reexpand under-reexpanded iters"), but
shortly after reimplemented as.
cd65869512ab56 ("io_uring: use iov_iter state save/restore helpers").
Here we follow the approach from the second patch but without in-callback
resubmissions, fixups for not yet supported in 5.10 short read retries
and replacing iov_iter_state with iter copies to not pull even more
dependencies, and because it's just much simpler.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We can't re-import or modify iterators from iocb callbacks, it's not
safe as it might be reverted and/or reexpanded while unwinding stack.
It's also not safe to resubmit as io-wq thread will race with stack
undwinding for the iterator and other data.
Disallow resubmission from callbacks, it can fail some cases that were
handled before, but the possibility of such a failure was a part of the
API from the beginning and so it should be fine.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Lee reports that there's a use-after-free of the process file table.
There's an assumption that we don't need the file table for some
variants of statx invocation, but that turns out to be false and we
end up with not grabbing a reference for the request even if the
deferred execution uses it.
Get rid of the REQ_F_NO_FILE_TABLE optimization for statx, and always
grab that reference.
This issues doesn't exist upstream since the native workers got
introduced with 5.12.
Link: https://lore.kernel.org/io-uring/YoOJ%2FT4QRKC+fAZE@google.com/
Reported-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If the ring is setup with IORING_SETUP_IOPOLL and we have more than
one task doing submissions on a ring, we can up in a situation where
we assign the context from the current task rather than the request
originator.
Always use req->task rather than assume it's the same as current.
No upstream patch exists for this issue, as only older kernels with
the non-native workers have this problem.
Reported-by: Kyle Zeng <zengyhkyle@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e677edbcabee849bfdd43f1602bccbecf736a646 upstream.
io_flush_timeouts() assumes the timeout isn't in progress of triggering
or being removed/canceled, so it unconditionally removes it from the
timeout list and attempts to cancel it.
Leave it on the list and let the normal timeout cancelation take care
of it.
Cc: stable@vger.kernel.org # 5.5+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a07211e3001435fe8591b992464cd8d5e3c98c5a ]
It's safer to not touch scm_fp_list after we queued an skb to which it
was assigned, there might be races lurking if we screw subtle sync
guarantees on the io_uring side.
Fixes: 6b06314c47e14 ("io_uring: add file set registration")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit c86d18f4aa93e0e66cda0e55827cd03eea6bc5f8 upstream.
When there are no files for __io_sqe_files_scm() to process in the
range, it'll free everything and return. However, it forgets to put uid.
Fixes: 08a451739a9b5 ("io_uring: allow sparse fixed file sets")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/accee442376f33ce8aaebb099d04967533efde92.1648226048.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5e929367468c8f97cd1ffb0417316cecfebef94b ]
The fix for not advancing the iterator if we're using fixed buffers is
broken in that it can hit a condition where we don't terminate the loop.
This results in io-wq looping forever, asking to read (or write) 0 bytes
for every subsequent loop.
Reported-by: Joel Jaeschke <joel.jaeschke@gmail.com>
Link: https://github.com/axboe/liburing/issues/549
Fixes: 16c8d2df7ec0 ("io_uring: ensure symmetry in handling iter types in loop_rw_iter()")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit f70865db5ff35f5ed0c7e9ef63e7cca3d4947f04 upstream.
Revert of revert of "io_uring: wait potential ->release() on resurrect",
which adds a helper for resurrect not racing completion reinit, as was
removed because of a strange bug with no clear root or link to the
patch.
Was improved, instead of rcu_synchronize(), just wait_for_completion()
because we're at 0 refs and it will happen very shortly. Specifically
use non-interruptible version to ignore all pending signals that may
have ended prior interruptible wait.
This reverts commit cb5e1b81304e089ee3ca948db4d29f71902eb575.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7a080c20f686d026efade810b116b72f88abaff9.1618101759.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Cc: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 792bb6eb862333658bf1bd2260133f0507e2da8d upstream.
[ 97.866748] a.out/2890 is trying to acquire lock:
[ 97.867829] ffff8881046763e8 (&ctx->uring_lock){+.+.}-{3:3}, at:
io_wq_submit_work+0x155/0x240
[ 97.869735]
[ 97.869735] but task is already holding lock:
[ 97.871033] ffff88810dfe0be8 (&ctx->uring_lock){+.+.}-{3:3}, at:
__x64_sys_io_uring_enter+0x3f0/0x5b0
[ 97.873074]
[ 97.873074] other info that might help us debug this:
[ 97.874520] Possible unsafe locking scenario:
[ 97.874520]
[ 97.875845] CPU0
[ 97.876440] ----
[ 97.877048] lock(&ctx->uring_lock);
[ 97.877961] lock(&ctx->uring_lock);
[ 97.878881]
[ 97.878881] *** DEADLOCK ***
[ 97.878881]
[ 97.880341] May be due to missing lock nesting notation
[ 97.880341]
[ 97.881952] 1 lock held by a.out/2890:
[ 97.882873] #0: ffff88810dfe0be8 (&ctx->uring_lock){+.+.}-{3:3}, at:
__x64_sys_io_uring_enter+0x3f0/0x5b0
[ 97.885108]
[ 97.885108] stack backtrace:
[ 97.890457] Call Trace:
[ 97.891121] dump_stack+0xac/0xe3
[ 97.891972] __lock_acquire+0xab6/0x13a0
[ 97.892940] lock_acquire+0x2c3/0x390
[ 97.894894] __mutex_lock+0xae/0x9f0
[ 97.901101] io_wq_submit_work+0x155/0x240
[ 97.902112] io_wq_cancel_cb+0x162/0x490
[ 97.904126] io_async_find_and_cancel+0x3b/0x140
[ 97.905247] io_issue_sqe+0x86d/0x13e0
[ 97.909122] __io_queue_sqe+0x10b/0x550
[ 97.913971] io_queue_sqe+0x235/0x470
[ 97.914894] io_submit_sqes+0xcce/0xf10
[ 97.917872] __x64_sys_io_uring_enter+0x3fb/0x5b0
[ 97.921424] do_syscall_64+0x2d/0x40
[ 97.922329] entry_SYSCALL_64_after_hwframe+0x44/0xa9
While holding uring_lock, e.g. from inline execution, async cancel
request may attempt cancellations through io_wq_submit_work, which may
try to grab a lock. Delay it to task_work, so we do it from a clean
context and don't have to worry about locking.
Cc: <stable@vger.kernel.org> # 5.5+
Fixes: c07e6719511e ("io_uring: hold uring_lock while completing failed polled io in io_wq_submit_work()")
Reported-by: Abaci <abaci@linux.alibaba.com>
Reported-by: Hao Xu <haoxu@linux.alibaba.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
[Lee: The first hunk solves a different (double free) issue in v5.10.
Only the first hunk of the original patch is relevant to v5.10 AND
the first hunk of the original patch is only relevant to v5.10]
Reported-by: syzbot+59d8a1f4e60c20c066cf@syzkaller.appspotmail.com
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The linux-5.10.y backport of commit "io_uring: add ->splice_fd_in checks"
includes a typo: "|" where "||" should be. (The original upstream commit
is fine.)
Fixes: 54eb6211b979 ("io_uring: add ->splice_fd_in checks")
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: stable@vger.kernel.org # v5.10
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9990da93d2bf9892c2c14c958bef050d4e461a1a ]
For each provided buffer, we allocate a struct io_buffer to hold the
data associated with it. As a large number of buffers can be provided,
account that data with memcg.
Fixes: ddf0322db79c ("io_uring: add IORING_OP_PROVIDE_BUFFERS")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 16c8d2df7ec0eed31b7d3b61cb13206a7fb930cc upstream.
When setting up the next segment, we check what type the iter is and
handle it accordingly. However, when incrementing and processed amount
we do not, and both iter advance and addr/len are adjusted, regardless
of type. Split the increment side just like we do on the setup side.
Fixes: 4017eb91a9e7 ("io_uring: make loop_rw_iter() use original user supplied pointers")
Cc: stable@vger.kernel.org
Reported-by: Valentina Palmiotti <vpalmiotti@gmail.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 632546c4b5a4dad8e3ac456406c65c0db9a0b570 ]
io_size and iov_count in io_read() and io_write() hold the same value,
kill the last one.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 10fc72e43352753a08f9cf83aa5c40baec00d212 ]
This is the only code that relies on import_iovec() returning
iter.count on success.
This allows a better interface to import_iovec().
Signed-off-by: David Laight <david.laight@aculab.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 26578cda3db983b17cabe4e577af26306beb9987 upstream.
->splice_fd_in is used only by splice/tee, but no other request checks
it for validity. Add the check for most of request types excluding
reads/writes/sends/recvs, we don't want overhead for them and can leave
them be as is until the field is actually used.
Cc: stable@vger.kernel.org
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/f44bc2acd6777d932de3d71a5692235b5b2b7397.1629451684.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7b3188e7ed54102a5dcc73d07727f41fb528f7c8 upstream.
During some testing, it became evident that using IORING_OP_WRITE doesn't
hash buffered writes like the other writes commands do. That's simply
an oversight, and can cause performance regressions when doing buffered
writes with this command.
Correct that and add the flag, so that buffered writes are correctly
hashed when using the non-iovec based write command.
Cc: stable@vger.kernel.org
Fixes: 3a6820f2bb8a ("io_uring: add non-vectored read/write commands")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ upstream commit 21f965221e7c42609521342403e8fb91b8b3e76e ]
If an SQPOLL based ring is newly created and an application issues an
io_uring_enter(2) system call on it, then we can return a spurious
-EOWNERDEAD error. This happens because there's nothing to submit, and
if the caller doesn't specify any other action, the initial error
assignment of -EOWNERDEAD never gets overwritten. This causes us to
return it directly, even if it isn't valid.
Move the error assignment into the actual failure case instead.
Cc: stable@vger.kernel.org
Fixes: d9d05217cb69 ("io_uring: stop SQPOLL submit on creator's death")
Reported-by: Sherlock Holo sherlockya@gmail.com
Link: https://github.com/axboe/liburing/issues/413
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ upstream commit a30f895ad3239f45012e860d4f94c1a388b36d14 ]
We currently check for ret != 0 to indicate error, but '1' is a valid
return and just indicates that the allocation succeeded with a wrap.
Correct the check to be for < 0, like it was before the xarray
conversion.
Cc: stable@vger.kernel.org
Fixes: 61cf93700fe6 ("io_uring: Convert personality_idr to XArray")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 46fee9ab02cb24979bbe07631fc3ae95ae08aa3e upstream.
__io_queue_proc() can enqueue both poll entries and still fail
afterwards, so the callers trying to cancel it should also try to remove
the second poll entry (if any).
For example, it may leave the request alive referencing a io_uring
context but not accessible for cancellation:
[ 282.599913][ T1620] task:iou-sqp-23145 state:D stack:28720 pid:23155 ppid: 8844 flags:0x00004004
[ 282.609927][ T1620] Call Trace:
[ 282.613711][ T1620] __schedule+0x93a/0x26f0
[ 282.634647][ T1620] schedule+0xd3/0x270
[ 282.638874][ T1620] io_uring_cancel_generic+0x54d/0x890
[ 282.660346][ T1620] io_sq_thread+0xaac/0x1250
[ 282.696394][ T1620] ret_from_fork+0x1f/0x30
Cc: stable@vger.kernel.org
Fixes: 18bceab101add ("io_uring: allow POLL_ADD with double poll_wait() users")
Reported-and-tested-by: syzbot+ac957324022b7132accf@syzkaller.appspotmail.com
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/0ec1228fc5eda4cb524eeda857da8efdc43c331c.1626774457.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 68b11e8b1562986c134764433af64e97d30c9fc0 upstream.
If __io_queue_proc() fails to add a second poll entry, e.g. kmalloc()
failed, but it goes on with a third waitqueue, it may succeed and
overwrite the error status. Count the number of poll entries we added,
so we can set pt->error to zero at the beginning and find out when the
mentioned scenario happens.
Cc: stable@vger.kernel.org
Fixes: 18bceab101add ("io_uring: allow POLL_ADD with double poll_wait() users")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/9d6b9e561f88bcc0163623b74a76c39f712151c3.1626774457.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In commit 3ebba796fa25 ("io_uring: ensure that SQPOLL thread is started for exit"),
the IORING_SETUP_R_DISABLED is cleared in io_sq_offload_start(), but when backport
it to stable-5.10, IORING_SETUP_R_DISABLED is cleared in __io_req_task_submit(),
move clearing IORING_SETUP_R_DISABLED to io_sq_offload_start() to fix this.
Fixes: 6cae8095490ca ("io_uring: ensure that SQPOLL thread is started for exit")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9e15c3a0ced5a61f320b989072c24983cb1620c1 upstream.
Like we did for the personality idr, convert the IO buffer idr to use
XArray. This avoids a use-after-free on removal of entries, since idr
doesn't like doing so from inside an iterator, and it nicely reduces
the amount of code we need to support this feature.
Fixes: 5a2e745d4d43 ("io_uring: buffer registration infrastructure")
Cc: stable@vger.kernel.org
Cc: Matthew Wilcox <willy@infradead.org>
Cc: yangerkun <yangerkun@huawei.com>
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 61cf93700fe6359552848ed5e3becba6cd760efa upstream.
You can't call idr_remove() from within a idr_for_each() callback,
but you can call xa_erase() from an xa_for_each() loop, so switch the
entire personality_idr from the IDR to the XArray. This manifests as a
use-after-free as idr_for_each() attempts to walk the rest of the node
after removing the last entry from it.
Fixes: 071698e13ac6 ("io_uring: allow registering credentials")
Cc: stable@vger.kernel.org # 5.6+
Reported-by: yangerkun <yangerkun@huawei.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
[Pavel: rebased (creds load was moved into io_init_req())]
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7ccff36e1375f2b0ebf73d957f037b43becc0dde.1615212806.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0bead8cd39b9c9c7c4e902018ccf129107ac50ef upstream.
The function io_remove_personalities() is very similar to
io_unregister_personality(),so implement io_remove_personalities()
calling io_unregister_personality().
Signed-off-by: Yejune Deng <yejune.deng@gmail.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 976517f162a05f4315b2373fd11585c395506259 upstream.
There is a complaint against sys_io_uring_enter() blocking if it submits
stdin reads. The problem is in __io_file_supports_async(), which
sees that it's a cdev and allows it to be processed inline.
Punt char devices using generic rules of io_file_supports_async(),
including checking for presence of *_iter() versions of rw callbacks.
Apparently, it will affect most of cdevs with some exceptions like
null and zero devices.
Cc: stable@vger.kernel.org
Reported-by: Birk Hirdman <lonjil@gmail.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d60270856b8a4560a639ef5f76e55eb563633599.1623236455.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8c3f9cd1603d0e4af6c50ebc6d974ab7bdd03cf4 ]
__io_cqring_fill_event() takes cflags as long to squeeze it into u32 in
an CQE, awhile all users pass int or unsigned. Replace it with unsigned
int and store it as u32 in struct io_completion to match CQE.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 38134ada0ceea3e848fe993263c0ff6207fd46e7 ]
Colin reported before possible overflow and sign extension problems in
io_provide_buffers_prep(). As Linus pointed out previous attempt did nothing
useful, see d81269fecb8ce ("io_uring: fix provide_buffers sign extension").
Do that with help of check_<op>_overflow helpers. And fix struct
io_provide_buf::len type, as it doesn't make much sense to keep it
signed.
Reported-by: Colin Ian King <colin.king@canonical.com>
Fixes: efe68c1ca8f49 ("io_uring: validate the full range of provided buffers for access")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/46538827e70fce5f6cdb50897cff4cacc490f380.1618488258.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit d1f82808877bb10d3deee7cf3374a4eb3fb582db upstream.
Read and write operations are capped to MAX_RW_COUNT. Some read ops rely on
that limit, and that is not guaranteed by the IORING_OP_PROVIDE_BUFFERS.
Truncate those lengths when doing io_add_buffers, so buffer addresses still
use the uncapped length.
Also, take the chance and change struct io_buffer len member to __u32, so
it matches struct io_provide_buffer len member.
This fixes CVE-2021-3491, also reported as ZDI-CAN-13546.
Fixes: ddf0322db79c ("io_uring: add IORING_OP_PROVIDE_BUFFERS")
Reported-by: Billy Jheng Bing-Jhong (@st424204)
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4b982bd0f383db9132e892c0c5144117359a6289 ]
S_ISBLK is marked as unbounded work for async preparation, because it
doesn't match S_ISREG. That is incorrect, as any read/write to a block
device is also a bounded operation. Fix it up and ensure that S_ISBLK
isn't marked unbounded.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1ee4160c73b2102a52bc97a4128a89c34821414f ]
When we cancel a timeout we should emit a sensible return code, like
-ECANCELED but not 0, otherwise it may trick users.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7b0ad1065e3bd1994722702bd0ba9e7bc9b0683b.1616696997.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0031275d119efe16711cd93519b595e6f9b4b330 ]
Without that it's not safe to use them in a linked combination with
others.
Now combinations like IORING_OP_SENDMSG followed by IORING_OP_SPLICE
should be possible.
We already handle short reads and writes for the following opcodes:
- IORING_OP_READV
- IORING_OP_READ_FIXED
- IORING_OP_READ
- IORING_OP_WRITEV
- IORING_OP_WRITE_FIXED
- IORING_OP_WRITE
- IORING_OP_SPLICE
- IORING_OP_TEE
Now we have it for these as well:
- IORING_OP_SENDMSG
- IORING_OP_SEND
- IORING_OP_RECVMSG
- IORING_OP_RECV
For IORING_OP_RECVMSG we also check for the MSG_TRUNC and MSG_CTRUNC
flags in order to call req_set_fail_links().
There might be applications arround depending on the behavior
that even short send[msg]()/recv[msg]() retuns continue an
IOSQE_IO_LINK chain.
It's very unlikely that such applications pass in MSG_WAITALL,
which is only defined in 'man 2 recvmsg', but not in 'man 2 sendmsg'.
It's expected that the low level sock_sendmsg() call just ignores
MSG_WAITALL, as MSG_ZEROCOPY is also ignored without explicitly set
SO_ZEROCOPY.
We also expect the caller to know about the implicit truncation to
MAX_RW_COUNT, which we don't detect.
cc: netdev@vger.kernel.org
Link: https://lore.kernel.org/r/c4e1a4cc0d905314f4d5dc567e65a7b09621aab3.1615908477.git.metze@samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit efe814a471e0e58f28f1efaf430c8784a4f36626 ]
It's racy to modify req->flags from a not owning context, e.g. linked
timeout calling req_set_fail_links() for the master request might race
with that request setting/clearing flags while being executed
concurrently. Just remove req_set_fail_links(prev) from
io_link_timeout_fn(), io_async_find_and_cancel() and functions down the
line take care of setting the fail bit.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d81269fecb8ce16eb07efafc9ff5520b2a31c486 ]
io_provide_buffers_prep()'s "p->len * p->nbufs" to sign extension
problems. Not a huge problem as it's only used for access_ok() and
increases the checked length, but better to keep typing right.
Reported-by: Colin Ian King <colin.king@canonical.com>
Fixes: efe68c1ca8f49 ("io_uring: validate the full range of provided buffers for access")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/562376a39509e260d8532186a06226e56eb1f594.1616149233.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b5b0ecb736f1ce1e68eb50613c0cfecff10198eb ]
The callback can only be armed, if we get -EIOCBQUEUED returned. It's
important that we clear the WAITQ bit for other cases, otherwise we can
queue for async retry and filemap will assume that we're armed and
return -EAGAIN instead of just blocking for the IO.
Cc: stable@vger.kernel.org # 5.9+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7c977a58dc83366e488c217fd88b1469d242bee5 ]
If we're exiting the ring, just let the IO fail with -EAGAIN as nobody
will care anyway. It's not the right context to reissue from.
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 3ebba796fa251d042be42b929a2d916ee5c34a49 upstream.
If we create it in a disabled state because IORING_SETUP_R_DISABLED is
set on ring creation, we need to ensure that we've kicked the thread if
we're exiting before it's been explicitly disabled. Otherwise we can run
into a deadlock where exit is waiting go park the SQPOLL thread, but the
SQPOLL thread itself is waiting to get a signal to start.
That results in the below trace of both tasks hung, waiting on each other:
INFO: task syz-executor458:8401 blocked for more than 143 seconds.
Not tainted 5.11.0-next-20210226-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor458 state:D stack:27536 pid: 8401 ppid: 8400 flags:0x00004004
Call Trace:
context_switch kernel/sched/core.c:4324 [inline]
__schedule+0x90c/0x21a0 kernel/sched/core.c:5075
schedule+0xcf/0x270 kernel/sched/core.c:5154
schedule_timeout+0x1db/0x250 kernel/time/timer.c:1868
do_wait_for_common kernel/sched/completion.c:85 [inline]
__wait_for_common kernel/sched/completion.c:106 [inline]
wait_for_common kernel/sched/completion.c:117 [inline]
wait_for_completion+0x168/0x270 kernel/sched/completion.c:138
io_sq_thread_park fs/io_uring.c:7115 [inline]
io_sq_thread_park+0xd5/0x130 fs/io_uring.c:7103
io_uring_cancel_task_requests+0x24c/0xd90 fs/io_uring.c:8745
__io_uring_files_cancel+0x110/0x230 fs/io_uring.c:8840
io_uring_files_cancel include/linux/io_uring.h:47 [inline]
do_exit+0x299/0x2a60 kernel/exit.c:780
do_group_exit+0x125/0x310 kernel/exit.c:922
__do_sys_exit_group kernel/exit.c:933 [inline]
__se_sys_exit_group kernel/exit.c:931 [inline]
__x64_sys_exit_group+0x3a/0x50 kernel/exit.c:931
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x43e899
RSP: 002b:00007ffe89376d48 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 00000000004af2f0 RCX: 000000000043e899
RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000
RBP: 0000000000000000 R08: ffffffffffffffc0 R09: 0000000010000000
R10: 0000000000008011 R11: 0000000000000246 R12: 00000000004af2f0
R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000001
INFO: task iou-sqp-8401:8402 can't die for more than 143 seconds.
task:iou-sqp-8401 state:D stack:30272 pid: 8402 ppid: 8400 flags:0x00004004
Call Trace:
context_switch kernel/sched/core.c:4324 [inline]
__schedule+0x90c/0x21a0 kernel/sched/core.c:5075
schedule+0xcf/0x270 kernel/sched/core.c:5154
schedule_timeout+0x1db/0x250 kernel/time/timer.c:1868
do_wait_for_common kernel/sched/completion.c:85 [inline]
__wait_for_common kernel/sched/completion.c:106 [inline]
wait_for_common kernel/sched/completion.c:117 [inline]
wait_for_completion+0x168/0x270 kernel/sched/completion.c:138
io_sq_thread+0x27d/0x1ae0 fs/io_uring.c:6717
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
INFO: task iou-sqp-8401:8402 blocked for more than 143 seconds.
Reported-by: syzbot+fb5458330b4442f2090d@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>