Callers already have room to store the addr and length information;
clean it up by having the caller just assign the previously provided
data.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Use a plain array for any group ID that's less than 64, and punt
anything beyond that to an xarray. 64 fits in a page even for 4KB
page sizes and with the planned additions.
This makes the expected group usage faster by avoiding a hash and lookup
to find our list, and it uses less memory upfront by not allocating any
memory for provided buffers unless it's actually being used.
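A rough sketch of the lookup shape this describes (BGID_ARRAY, io_bl and
io_bl_xa are illustrative names, not necessarily the final fields):

    #define BGID_ARRAY  64

    static struct io_buffer_list *io_buffer_get_list(struct io_ring_ctx *ctx,
                                                     unsigned int bgid)
    {
        if (bgid < BGID_ARRAY)
            return &ctx->io_bl[bgid];   /* flat array, no per-group alloc */
        /* larger group IDs are expected to be rare */
        return xa_load(&ctx->io_bl_xa, bgid);
    }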
Suggested-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The read/write opcodes use it already, but the recv/recvmsg do not. If
we switch them over and read and validate this at init time while we're
checking if the opcode supports it anyway, then we can do it in one spot
and we don't have to pass in a separate group ID for io_buffer_select().
Signed-off-by: Jens Axboe <axboe@kernel.dk>
There's no point in validity checking buf_index if the request doesn't
have REQ_F_BUFFER_SELECT set, as we will never use it for that case.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
After the recent changes, this is a direct call to io_buffer_select()
anyway. With this change, there are no wrappers left for provided
buffer selection.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
For all of send/sendmsg and recv/recvmsg we have the local 'sr' variable,
yet some cases still use req->sr_msg, which sr points to. Use 'sr'
consistently.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If IORING_RECVSEND_POLL_FIRST is set for recv/recvmsg or send/sendmsg,
then we arm poll first rather than attempt a receive or send upfront.
This can be useful if we expect there to be no data (or space) available
for the request, as we can then avoid wasting time on the initial
issue attempt.
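A hedged userspace sketch of applying the flag with liburing (the
queue_poll_first_recv() wrapper and its error handling are hypothetical,
not from this patch):

    #include <liburing.h>

    /* Hypothetical helper: queue a recv that arms poll before attempting
     * the actual receive, per IORING_RECVSEND_POLL_FIRST. */
    static int queue_poll_first_recv(struct io_uring *ring, int sockfd,
                                     void *buf, size_t len)
    {
        struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

        if (!sqe)
            return -EAGAIN;
        io_uring_prep_recv(sqe, sockfd, buf, len, 0);
        /* io_uring-specific send/recv flags are carried in sqe->ioprio */
        sqe->ioprio |= IORING_RECVSEND_POLL_FIRST;
        return io_uring_submit(ring);
    }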
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Don't punt this check to the op prep handlers; add support to
io_op_defs so we can check it while setting up the request.
This reduces the text size by 500 bytes on aarch64, and makes this less
fragile by having the check in one spot and needing opcodes to opt in
to IOPOLL or ioprio support.
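As a sketch of the opt-in shape this describes (bit names approximate
what io_op_defs carries, not a verbatim copy):

    struct io_op_def {
        /* ... existing per-opcode properties ... */
        unsigned    iopoll : 1;   /* opcode supports polled (IOPOLL) issue */
        unsigned    ioprio : 1;   /* opcode honors sqe->ioprio */
    };

    /* at request setup time, roughly: */
    if (sqe->ioprio && !def->ioprio)
        return -EINVAL;
    if ((ctx->flags & IORING_SETUP_IOPOLL) && !def->iopoll)
        return -EINVAL;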
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We defer file assignment to ensure that fixed files work with links
between a direct accept/open and the links that follow it. But this has
the side effect that normal file assignment is then not complete by the
time that request submission has been done.
For deferred execution, if the file is a regular file, assign it when
we do the async prep anyway.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The IORING_SQ_NEED_WAKEUP flag is now set using atomic_or() which
implies a full barrier on some architectures but it is not required to
do so. Use the more appropriate smp_mb__after_atomic() which avoids the
extra barrier on those architectures.
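The pattern, roughly (a sketch, not the exact io_uring function):

    static inline void io_ring_set_wakeup_flag(struct io_ring_ctx *ctx)
    {
        atomic_or(IORING_SQ_NEED_WAKEUP, &ctx->rings->sq_flags);
        /* Order the flag store against the later re-check of the SQ ring;
         * this is a no-op on architectures where the atomic RMW above
         * already acts as a full barrier, avoiding a redundant smp_mb(). */
        smp_mb__after_atomic();
    }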
Signed-off-by: Almog Khaikin <almogkh@gmail.com>
Link: https://lore.kernel.org/r/20220426163403.112692-1-almogkh@gmail.com
Fixes: 8018823e6987 ("io_uring: serialize ctx->rings->sq_flags with atomic_or/and")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If IORING_SETUP_COOP_TASKRUN is set to use cooperative scheduling for
running task_work, then IORING_SETUP_TASKRUN_FLAG can be set so the
application can tell if task_work is pending in the kernel for this
ring. This allows use cases like io_uring_peek_cqe() to still function
appropriately, or for the task to know when it would be useful to
call io_uring_wait_cqe() to run pending events.
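A hedged sketch of how an application might use the flag (get_cqe() is a
hypothetical wrapper; the flag read should really be a READ_ONCE/acquire
style load):

    #include <liburing.h>

    static int get_cqe(struct io_uring *ring, struct io_uring_cqe **cqe)
    {
        /* IORING_SQ_TASKRUN is set by the kernel when task_work is pending */
        if (*ring->sq.kflags & IORING_SQ_TASKRUN)
            return io_uring_wait_cqe(ring, cqe); /* enter the kernel, run task_work */
        return io_uring_peek_cqe(ring, cqe);     /* pure userspace peek */
    }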
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/20220426014904.60384-7-axboe@kernel.dk
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If this is set, io_uring will never use an IPI to deliver a task_work
notification. This can be used in the common case where a single task or
thread communicates with the ring, and doesn't rely on
io_uring_peek_cqe().
This provides a noticeable win in performance, both from eliminating
the IPI itself and from avoiding interrupting the submitting
task unnecessarily.
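A minimal setup sketch, assuming liburing and a single-threaded
submit/complete loop:

    #include <liburing.h>

    static int setup_ring(struct io_uring *ring, unsigned entries)
    {
        struct io_uring_params p = { };

        /* No IPI for task_work; pair with TASKRUN_FLAG so the task can
         * still tell when it needs to enter the kernel. */
        p.flags = IORING_SETUP_COOP_TASKRUN | IORING_SETUP_TASKRUN_FLAG;
        return io_uring_queue_init_params(entries, ring, &p);
    }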
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/20220426014904.60384-6-axboe@kernel.dk
Signed-off-by: Jens Axboe <axboe@kernel.dk>
While doing so, switch SQPOLL to TWA_SIGNAL_NO_IPI as well; that just
does a task wakeup, and we can then remove the special wakeup we have
in task_work_add.
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/20220426014904.60384-5-axboe@kernel.dk
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Rather than require ctx->completion_lock for ensuring that we don't
clobber the flags, use the atomic bitop helpers instead. This removes
the need to grab the completion_lock, in preparation for needing to set
or clear sq_flags when we don't know the status of this lock.
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/20220426014904.60384-3-axboe@kernel.dk
Signed-off-by: Jens Axboe <axboe@kernel.dk>
For now, just use a CQE flag for this; with big CQE support we could
return the actual number of bytes left.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
* for-5.19/io_uring-socket: (73 commits)
io_uring: use the text representation of ops in trace
io_uring: rename op -> opcode
io_uring: add io_uring_get_opcode
io_uring: add type to op enum
io_uring: add socket(2) support
net: add __sys_socket_file()
io_uring: fix trace for reduced sqe padding
io_uring: add fgetxattr and getxattr support
io_uring: add fsetxattr and setxattr support
fs: split off do_getxattr from getxattr
fs: split off setxattr_copy and do_setxattr function from setxattr
io_uring: return an error when cqe is dropped
io_uring: use constants for cq_overflow bitfield
io_uring: rework io_uring_enter to simplify return value
io_uring: trace cqe overflows
io_uring: add trace support for CQE overflow
io_uring: allow re-poll if we made progress
io_uring: support MSG_WAITALL for IORING_OP_SEND(MSG)
io_uring: add support for IORING_ASYNC_CANCEL_ANY
io_uring: allow IORING_OP_ASYNC_CANCEL with 'fd' key
...
Only allow the data field to be 0 in struct io_uring_rsrc_update user
arguments, to allow for possible future use.
Fixes: e7a6c00dc77a ("io_uring: add support for registering ring file descriptors")
Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com>
Link: https://lore.kernel.org/r/20220429142218.GA28696@asgard.redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
io_rw_init_file does not initialize kiocb->private, so when iocb_bio_iopoll
reads kiocb->private it can contain uninitialized data.
Fixes: 3e08773c3841 ("block: switch polling to be bio based")
Signed-off-by: Joseph Ravichandran <jravi@mit.edu>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We should check unused fields for non-zero values and return -EINVAL if
any are set, making it consistent with other opcodes.
Fixes: aa1fa28fc73e ("io_uring: add support for recvmsg()")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We should check unused fields for non-zero values and return -EINVAL if
any are set, making it consistent with other opcodes.
Fixes: 0fa03c624d8f ("io_uring: add support for sendmsg()")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In some debug scenarios it is useful to have the text representation of
the opcode. Add this function in preparation.
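An abridged sketch of such a helper (table entries shortened; the
fallback string for out-of-range values is assumed):

    static const char * const opcode_str[] = {
        [IORING_OP_NOP]     = "NOP",
        [IORING_OP_READV]   = "READV",
        [IORING_OP_WRITEV]  = "WRITEV",
        /* ... one entry per opcode ... */
    };

    const char *io_uring_get_opcode(u8 opcode)
    {
        if (opcode < ARRAY_SIZE(opcode_str) && opcode_str[opcode])
            return opcode_str[opcode];
        return "INVALID";
    }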
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220426082907.3600028-3-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If IO_URING_SCM_ALL isn't set, as it would not be on 32-bit builds,
then we trigger a warning:
fs/io_uring.c: In function '__io_sqe_files_unregister':
fs/io_uring.c:8992:13: warning: unused variable 'i' [-Wunused-variable]
8992 | int i;
| ^
Move the ifdef up to include the 'i' variable declaration.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: 5e45690a1cb8 ("io_uring: store SCM state in io_fixed_file->file_ptr")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Move common error handling to io_req_complete, so that various callers
avoid repeating it. A few callers (io_tee, io_splice) require slightly
different handling; these are changed to use __io_req_complete instead.
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220422101048.419942-1-joshi.k@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This adds support to io_uring for the fgetxattr and getxattr API.
Signed-off-by: Stefan Roesch <shr@fb.com>
Acked-by: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/r/20220323154420.3301504-5-shr@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This adds support to io_uring for the fsetxattr and setxattr API.
Signed-off-by: Stefan Roesch <shr@fb.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Link: https://lore.kernel.org/r/20220323154420.3301504-4-shr@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Right now io_uring will not actively inform userspace if a CQE is
dropped. This is extremely rare, requiring both a CQ ring overflow and
a GFP_ATOMIC kmalloc failure. However, the consequences can be severe:
an application could, for example, end up in an undefined state,
possibly waiting for a CQE that never arrives.
Return an error code (EBADR) in these cases. Since this is expected to
be incredibly rare, avoid affecting the hot code paths as much as
possible: the error is only returned lazily, and only when there are no
other CQEs available.
Once the error is returned, reset the error condition, assuming the user
is either OK with it or will clean up appropriately.
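A hedged userspace sketch of reacting to the new error
(handle_dropped_cqe() is a hypothetical application hook):

    #include <errno.h>
    #include <liburing.h>

    extern void handle_dropped_cqe(void);   /* hypothetical hook */

    static int wait_one(struct io_uring *ring, struct io_uring_cqe **cqe)
    {
        int ret = io_uring_wait_cqe(ring, cqe);

        if (ret == -EBADR) {
            /* At least one CQE was dropped (CQ overflow plus GFP_ATOMIC
             * allocation failure). The condition is cleared once it has
             * been reported, so resync in-flight accounting here. */
            handle_dropped_cqe();
        }
        return ret;
    }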
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220421091345.2115755-6-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
io_uring_enter prefers to return the count of submitted entries over an
error code. In some code paths this check is not required, so reorganise
the code so that the check is only done as needed.
This is also a prep for returning error codes only in waiting scenarios.
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220421091345.2115755-4-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Trace CQE overflows in io_uring. Print ocqe before the check, so a NULL
value indicates the CQE has been dropped.
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220421091345.2115755-3-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We currently check REQ_F_POLLED before arming async poll for a
notification to retry. If it's set, then we don't allow poll and will
punt to io-wq instead. This is done to prevent a situation where a buggy
driver will repeatedly return that there's space/data available yet we
get -EAGAIN.
However, if we already transferred data, then it should be safe to rely
on poll again. Gate the check on whether or not REQ_F_PARTIAL_IO is
also set.
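A sketch of the gating described above (helper name and return
convention are illustrative):

    static bool io_may_arm_poll_again(struct io_kiocb *req)
    {
        /* never polled for this request yet: poll retry is fine */
        if (!(req->flags & REQ_F_POLLED))
            return true;
        /* polled before, but data was transferred since: trust poll */
        return req->flags & REQ_F_PARTIAL_IO;
    }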
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Like commit 7ba89d2af17a for recv/recvmsg, support MSG_WAITALL for the
send side. If this flag is set and we do a short send, retry for a
stream or seqpacket socket.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Rather than match on a specific key, be it user_data or file, allow
canceling any request that we can lookup. Works like
IORING_ASYNC_CANCEL_ALL in that it cancels multiple requests, but it
doesn't key off user_data or the file.
Can't be set with IORING_ASYNC_CANCEL_FD, as that's a key selector.
Only one may be used at a time.
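A sketch of queueing such a cancel with raw sqe fields (liburing also
has a prep helper for cancel requests); submission is left to the caller:

    #include <string.h>
    #include <liburing.h>

    /* Illustrative only: cancel every request the ring can look up. */
    static void prep_cancel_any(struct io_uring_sqe *sqe)
    {
        memset(sqe, 0, sizeof(*sqe));
        sqe->opcode       = IORING_OP_ASYNC_CANCEL;
        sqe->cancel_flags = IORING_ASYNC_CANCEL_ANY;
        /* addr (user_data key) and fd are ignored with CANCEL_ANY */
    }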
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/20220418164402.75259-6-axboe@kernel.dk
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Currently sqe->addr must contain the user_data of the request being
canceled. Introduce the IORING_ASYNC_CANCEL_FD flag, which tells the
kernel that we're keying off the file fd instead for cancelation. This
allows canceling any request that a) uses a file, and b) was assigned the
file based on the value being passed in.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/20220418164402.75259-5-axboe@kernel.dk
The current cancelation will look up and cancel the first request it
finds based on the key passed in. Add a flag that allows canceling any
request that matches the key. It completes with the number of requests
found and canceled, or res < 0 if an error occurred.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/20220418164402.75259-4-axboe@kernel.dk
In preparation for keying cancelation off more than just the user_data,
pass the io_cancel_data struct into the various functions that deal
with request cancelation.
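Roughly, the container looks like the sketch below (layout approximate;
later patches in the series extend it for fd-based and all/any matching):

    struct io_cancel_data {
        struct io_ring_ctx  *ctx;
        u64                 data;   /* the user_data key, for now */
    };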
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/20220418164402.75259-3-axboe@kernel.dk
Move ->timeout_lock grabbing inside of io_timeout_cancel(), so
we can do io_req_task_queue_fail() outside of the lock. It's much nicer
than relying on triple nested locking.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/cde758c2897930d31e205ed8f476d4ec879a8849.1650458197.git.asml.silence@gmail.com
[axboe: drop now wrong timeout_lock annotation]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
A previous commit removed SCM accounting for non-unix sockets, as those
are the only ones that can cause a fixed file reference. While that is
true, it also means we're now dereferencing the file as part of the
workqueue driven __io_sqe_files_unregister() after the process has
exited. This isn't safe for SCM files, as unix gc may have already
reaped them when the process exited. KASAN complains about this:
[ 12.307040] Freed by task 0:
[ 12.307592] kasan_save_stack+0x28/0x4c
[ 12.308318] kasan_set_track+0x28/0x38
[ 12.309049] kasan_set_free_info+0x24/0x44
[ 12.309890] ____kasan_slab_free+0x108/0x11c
[ 12.310739] __kasan_slab_free+0x14/0x1c
[ 12.311482] slab_free_freelist_hook+0xd4/0x164
[ 12.312382] kmem_cache_free+0x100/0x1dc
[ 12.313178] file_free_rcu+0x58/0x74
[ 12.313864] rcu_core+0x59c/0x7c0
[ 12.314675] rcu_core_si+0xc/0x14
[ 12.315496] _stext+0x30c/0x414
[ 12.316287]
[ 12.316687] Last potentially related work creation:
[ 12.317885] kasan_save_stack+0x28/0x4c
[ 12.318845] __kasan_record_aux_stack+0x9c/0xb0
[ 12.319976] kasan_record_aux_stack_noalloc+0x10/0x18
[ 12.321268] call_rcu+0x50/0x35c
[ 12.322082] __fput+0x2fc/0x324
[ 12.322873] ____fput+0xc/0x14
[ 12.323644] task_work_run+0xac/0x10c
[ 12.324561] do_notify_resume+0x37c/0xe74
[ 12.325420] el0_svc+0x5c/0x68
[ 12.326050] el0t_64_sync_handler+0xb0/0x12c
[ 12.326918] el0t_64_sync+0x164/0x168
[ 12.327657]
[ 12.327976] Second to last potentially related work creation:
[ 12.329134] kasan_save_stack+0x28/0x4c
[ 12.329864] __kasan_record_aux_stack+0x9c/0xb0
[ 12.330735] kasan_record_aux_stack+0x10/0x18
[ 12.331576] task_work_add+0x34/0xf0
[ 12.332284] fput_many+0x11c/0x134
[ 12.332960] fput+0x10/0x94
[ 12.333524] __scm_destroy+0x80/0x84
[ 12.334213] unix_destruct_scm+0xc4/0x144
[ 12.334948] skb_release_head_state+0x5c/0x6c
[ 12.335696] skb_release_all+0x14/0x38
[ 12.336339] __kfree_skb+0x14/0x28
[ 12.336928] kfree_skb_reason+0xf4/0x108
[ 12.337604] unix_gc+0x1e8/0x42c
[ 12.338154] unix_release_sock+0x25c/0x2dc
[ 12.338895] unix_release+0x58/0x78
[ 12.339531] __sock_release+0x68/0xec
[ 12.340170] sock_close+0x14/0x20
[ 12.340729] __fput+0x18c/0x324
[ 12.341254] ____fput+0xc/0x14
[ 12.341763] task_work_run+0xac/0x10c
[ 12.342367] do_notify_resume+0x37c/0xe74
[ 12.343086] el0_svc+0x5c/0x68
[ 12.343510] el0t_64_sync_handler+0xb0/0x12c
[ 12.344086] el0t_64_sync+0x164/0x168
We have an extra bit we can use in file_ptr on 64-bit, use that to store
whether this file is SCM'ed or not, avoiding the need to look at the
file contents itself. This does mean that 32-bit will be stuck with SCM
for all registered files, just as 64-bit was before the referenced
commit.
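A sketch of the pointer-tagging idea (bit value and helper shape are
illustrative):

    /* one of the spare low bits in the stored file pointer on 64-bit */
    #define FFS_SCM     0x4UL

    static inline void io_fixed_file_set(struct io_fixed_file *file_slot,
                                         struct file *file, bool scm)
    {
        unsigned long file_ptr = (unsigned long) file;

        if (scm)
            file_ptr |= FFS_SCM;
        file_slot->file_ptr = file_ptr;
    }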
Fixes: 1f59bc0f18cf ("io_uring: don't scm-account for non af_unix sockets")
Signed-off-by: Jens Axboe <axboe@kernel.dk>