IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
INFO: task iou-wrk-6609:6612 blocked for more than 143 seconds.
Not tainted 5.15.0-rc5-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:iou-wrk-6609 state:D stack:27944 pid: 6612 ppid: 6526 flags:0x00004006
Call Trace:
context_switch kernel/sched/core.c:4940 [inline]
__schedule+0xb44/0x5960 kernel/sched/core.c:6287
schedule+0xd3/0x270 kernel/sched/core.c:6366
schedule_timeout+0x1db/0x2a0 kernel/time/timer.c:1857
do_wait_for_common kernel/sched/completion.c:85 [inline]
__wait_for_common kernel/sched/completion.c:106 [inline]
wait_for_common kernel/sched/completion.c:117 [inline]
wait_for_completion+0x176/0x280 kernel/sched/completion.c:138
io_worker_exit fs/io-wq.c:183 [inline]
io_wqe_worker+0x66d/0xc40 fs/io-wq.c:597
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
io-wq worker may submit a task_work to the master task and upon
io_worker_exit() wait for the tw to get executed. The problem appears
when the master task is waiting in coredump.c:
468 freezer_do_not_count();
469 wait_for_completion(&core_state->startup);
470 freezer_count();
Apparently having some dependency on children threads getting everything
stuck. Workaround it by cancelling the taks_work callback that causes it
before going into io_worker_exit() waiting.
p.s. probably a better option is to not submit tw elevating the refcount
in the first place, but let's leave this excercise for the future.
Cc: stable@vger.kernel.org
Reported-and-tested-by: syzbot+27d62ee6f256b186883e@syzkaller.appspotmail.com
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/142a716f4ed936feae868959059154362bfa8c19.1635509451.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The ring iteration is racy, which isn't necessarily a problem except it
can cause us to iterate the whole thing. That isn't desired or ideal,
and it can lead to excessive runtimes of reading fdinfo.
Cap the iteration at tail - head OR the ring size. While in there, clean
up the ring masking and just dump the raw values along with the masks.
That provides more useful debug info.
Fixes: 83f84356bc8f ("io_uring: add more uring info to fdinfo for debug")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
ioprio setup doesn't depend on other fields that are modified in
io_prep_rw() and we can move it down in the function without worrying
about performance. It's useful as it makes iocb->ki_flags
accesses/modifications closer together, so it's more likely the compiler
will cache it in a register and avoid extra reloads.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/8ee98779c06f1b59f6039b1e292db4332efd664b.1634987320.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
opcode prep functions are one of the first things that are called, we
can't have ->async_data allocated at this point and it's certainly a
bug. Reflect this assumption in io_timeout_prep() and add a WARN_ONCE
just in case.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/75a28ca7dbcc5af8b6cd9092819e8384c24dedd4.1634987320.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If an opcode doesn't support polling, just let it be executed
synchronously in iowq, otherwise it will do a nonblock attempt just to
fail in io_arm_poll_handler() and return back to blocking execution.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/6401256db01b88f448f15fcd241439cb76f5b940.1634987320.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <haoxu@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
->pollout or ->pollin are set only for opcodes that need a file, so if
io_arm_poll_handler() tests them first we can be sure that the request
has file set and the ->file check can be removed.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/9adfe4f543d984875e516fce6da35348aab48668.1634987320.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <haoxu@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If we've got IO_WQ_WORK_CANCEL in io_wq_submit_work(), handle the error
on the same lines as the check instead of having a weird code flow. The
main loop doesn't change but goes one indention left.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ff4a09cf41f7a22bbb294b6f1faea721e21fe615.1634987320.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <haoxu@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Do a bit of cleaning for the main loop of io_wq_submit_work(). Get rid
of switch, just replace it with a single if as we're retrying in both
other cases. Kill issue_sqe label, Get rid of needs_poll nesting and
disambiguate a bit the comment.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ed12ce0c64e051f9a6b8a37a24f8ea554d299c29.1634987320.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <haoxu@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The current logic of requests with IOSQE_ASYNC is first queueing it to
io-worker, then execute it in a synchronous way. For unbound works like
pollable requests(e.g. read/write a socketfd), the io-worker may stuck
there waiting for events for a long time. And thus other works wait in
the list for a long time too.
Let's introduce a new way for unbound works (currently pollable
requests), with this a request will first be queued to io-worker, then
executed in a nonblock try rather than a synchronous way. Failure of
that leads it to arm poll stuff and then the worker can begin to handle
other works.
The detail process of this kind of requests is:
step1: original context:
queue it to io-worker
step2: io-worker context:
nonblock try(the old logic is a synchronous try here)
|
|--fail--> arm poll
|
|--(fail/ready)-->synchronous issue
|
|--(succeed)-->worker finish it's job, tw
take over the req
This works much better than the old IOSQE_ASYNC logic in cases where
unbound max_worker is relatively small. In this case, number of
io-worker eazily increments to max_worker, new worker cannot be created
and running workers stuck there handling old works in IOSQE_ASYNC mode.
In my 64-core machine, set unbound max_worker to 20, run echo-server,
turns out:
(arguments: register_file, connetion number is 1000, message size is 12
Byte)
original IOSQE_ASYNC: 76664.151 tps
after this patch: 166934.985 tps
Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Link: https://lore.kernel.org/r/20211018133445.103438-1-haoxu@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Use ERR_CAST() instead of ERR_PTR(PTR_ERR()).
This makes it more readable and also fix this warning detected by
err_cast.cocci:
./fs/io_uring.c: WARNING: 3208: 11-18: ERR_CAST can be used with buf
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Changcheng Deng <deng.changcheng@zte.com.cn>
Link: https://lore.kernel.org/r/20211020084948.1038420-1-deng.changcheng@zte.com.cn
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Currently force_nonblock stands for three meanings:
- nowait or not
- in an io-worker or not(hold uring_lock or not)
Let's split the logic to two flags, IO_URING_F_NONBLOCK and
IO_URING_F_UNLOCKED for convenience of the next patch.
Suggested-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/20211018133431.103298-1-haoxu@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When enabling -Wunused warnings by building with W=1, I get an
instance of the -Wunused-but-set-parameter warning in the io_uring code:
fs/io_uring.c: In function 'io_queue_async_work':
fs/io_uring.c:1445:61: error: parameter 'locked' set but not used [-Werror=unused-but-set-parameter]
1445 | static void io_queue_async_work(struct io_kiocb *req, bool *locked)
| ~~~~~~^~~~~~
There are very few warnings of this type, so it would be nice to enable
this by default and fix all the existing instances. As the assignment
serves no purpose by itself other than to prevent developers from using
the variable, an easy workaround is to remove the assignment and just
rename the argument to "dont_use".
Fixes: f237c30a5610 ("io_uring: batch task work locking")
Link: https://lore.kernel.org/lkml/20210920121352.93063-1-arnd@kernel.org/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20211019153507.348480-1-arnd@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The block layer can use this knowledge to make smarter decisions on
how to handle the request, if it knows that N more may be coming. Switch
to using blk_start_plug_nr_ios() to pass in that information.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Merge REQ_F_NOWAIT_READ and REQ_F_NOWAIT_WRITE into one flag, i.e.
REQ_F_SUPPORT_NOWAIT. First it gets rid of dependence on CONFIG_64BIT
but also simplifies the code.
One thing to consider is when we don't have ->{read,write}_iter and go
through loop_rw_iter(). Just fail it with -EAGAIN if we expect nowait
behaviour but not sure whether it supports it.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/f832a20e5186c2e79c6519280c238f559a1d2bbc.1634425438.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Don't check if we can do nowait before arming apoll, there are several
reasons for that. First, we don't care much about files that don't
support nowait. Second, it may be useful -- we don't want to be taking
away extra workers from io-wq when it can go in some async. Even if it
will go through io-wq eventually, it make difference in the numbers of
workers actually used. And the last one, it's needed to clean nowait in
future commits.
[kernel test robot: fix unused-var]
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/9d06f3cb2c8b686d970269a87986f154edb83043.1634425438.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This commit reorders the conditions in a branch in io_write. The
reorder to check 'ret2 == -EAGAIN' first as checking
'(req->ctx->flags & IORING_SETUP_IOPOLL)' will likely be more
expensive due to 2x memory derefences.
Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Link: https://lore.kernel.org/r/20211017013229.4124279-1-goldstein.w.n@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We pass iovec** into __io_import_iovec(), which should keep it,
initialise and modify accordingly. It's expensive, return it directly
from __io_import_iovec encoding errors with ERR_PTR if needed.
io_import_iovec keeps the old interface, but it's inline and so
everything is optimised nicely.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/6230e9769982f03a8f86fa58df24666088c44d3e.1634314022.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Combine force_nonblock branches (which is already optimised by
compiler), flip branches so the most hot/common path is the first, e.g.
as with non on-stack iov setup, and add extra likely/unlikely
attributions for errror paths.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/2c2536c5896d70994de76e387ea09a0402173a3f.1634144845.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Make io_import_iovec taking struct io_rw_state instead of an iter
pointer. First it takes care of initialising iovec pointer, which can be
forgotten. Even more, we can not init it if not needed, e.g. in case of
IORING_OP_READ_FIXED or IORING_OP_READ. Also hide saving iter_state
inside of it by splitting out an inline function of it to avoid extra
ifs.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b1bbc213a95e5272d4da5867bb977d9acb6f2109.1634144845.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
First, change IO_URING_F_NONBLOCK to take sign bit of the int, so
checking for it can be turned into test + sign-based-jump, makes the
binary smaller and may be faster.
Then, instead of passing need_lock boolean into io_import_iovec() just
give it issue_flags, which is already stored somewhere. Saves some space
on stack, a couple of test + cmov operations and other conversions.
note: we still leave
force_nonblock = issue_flags & IO_URING_F_NONBLOCK
variable, but it's optimised out by the compiler into testing
issue_flags directly.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ee96547e692f6c975c229cd82fc721679571a734.1634144845.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Currently io_read() and io_write() keep separate pointers to an iter and
to struct iov_iter_state, which is not great for register spilling and
requires more on-stack copies. They are both either on-stack or in
req->async_data at the same time, so use struct io_rw_state and keep a
pointer only to it, so having all the state with just one pointer.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/5c5e7ffd7dc25fc35075c70411ba99df72f237fa.1634144845.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Don't override req->result in io_complete_rw_iopoll() when it's already
of the same value, we have an if just above it, so move the assignment
there. Also, add one simle unlikely() in __io_complete_rw_common().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/8dfeb4f84026a20172bcf82c05010abe955874ae.1634144845.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Apparently, percpu_ref_put/get() are expensive enough if done per
request, get them in a batch and cache on the submission side to avoid
getting it over and over again. Also, if we're completing under
uring_lock, return refs back into the cache instead of
perfcpu_ref_put(). Pretty similar to how we do tctx->cached_refs
accounting, but fall back to normal putting when we already changed a
rsrc node by the time of free.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b40d8c5bc77d3c9550df8a319117a374ac85f8f4.1633817310.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Looking at the assembly, the compiler decided to reload req->opcode in
io_op_defs[opcode].needs_file instead of one it had in a register, so
store it in a temp variable so it can be optimised out. Also move the
personality block later, it's better for spilling/etc. as it only
depends on @sqe, which we're keeping anyway.
By the way, zero req->opcode if it over IORING_OP_LAST, not a problem,
at the moment but is safer.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/6ba869f5f8b7b0f991c87fdf089f0abf87cbe06b.1633532552.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Plugging is only needed with requests that also need a file, so hide
plugging under a ->needs_file check. Also, place ->needs_file and ->plug
bits into the same byte of io_op_defs, it may matter for compilers, e.g.
only with the change a tested one decided to optimise two memory testb
into a more with two register testb.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1600d1287bb7d16451d4ef3343252787a5314927.1633532552.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
->async_data is a slow path, so it won't matter much if we do the clean
up inside io_clean_op(). Moreover, in many cases it's allocated together
with setting one or more of IO_REQ_CLEAN_FLAGS flags, so it'd go through
io_clean_op() anyway.
Control ->async_data allocation with a new flag REQ_F_ASYNC_DATA, so we
can do all the maintainence under io_req_needs_clean() fast check.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/6892cf5883c459f36bda26f30ceb16742b20b84b.1633373302.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Delay reading the next node in io_free_batch_list(), allows the compiler
to load the value a bit later improving register spilling in some cases.
With gcc 11.1 it helped to move @task_refs variable from the stack to a
register and optimises out a couple of per request instructions.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/cc9fdfb6f72a4e8bc9918a5e9f2d97869a263ae4.1633373302.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Attribute cold functions so compilers can optimise them for size. It
shrinks the binary by 2.5-3%
text data bss dec hex filename
90670 14002 8 104680 198e8 ./fs/io_uring.o
88053 14002 8 102063 18eaf ./fs/io_uring.o
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b53d385f91dca45170b67d7f11c7abd787e821f6.1633373302.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Currenlty, we allocate one ctx reference per request at submission time
and put them at free. It's batched and not so expensive but it still
bloats the kernel, adds 2 function calls for rcu and adds some overhead
for request counting in io_free_batch_list().
Always keep one reference with a request, even when it's freed and in
io_uring request caches. There is extra work at ring exit / quiesce
paths, which now need to put all cached requests. io_ring_exit_work() is
already looping, so it's not a problem. Add hybrid-busy waiting to
io_ctx_quiesce() as well for now.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/99613fbe396e80777228cde39bbda1aa8938554e.1633373302.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>