linux

iv/linux

Author	SHA1	Message	Date
Jens Axboe	2fcabce2d7	io_uring: disallow mixed provided buffer group registrations It's nonsensical to register a provided buffer ring, if a classic provided buffer group with the same ID exists. Depending on the order of which we decide what type to pick, the other type will never get used. Explicitly disallow it and return an error if this is attempted. Fixes: `c7fb19428d` ("io_uring: add support for ring mapped supplied buffers") Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-18 16:21:30 -06:00
Jens Axboe	1d0dbbfa28	io_uring: initialize io_buffer_list head when shared ring is unregistered We use ->buf_pages != 0 to tell if this is a shared buffer ring or a classic provided buffer group. If we unregister the shared ring and then attempt to use it, buf_pages is zero yet the classic list head isn't properly initialized. This causes io_buffer_select() to think that we have classic buffers available, but then we crash when we try and get one from the list. Just initialize the list if we unregister a shared buffer ring, leaving it in a sane state for either re-registration or for attempting to use it. And do the same for the initial setup from the classic path. Fixes: `c7fb19428d` ("io_uring: add support for ring mapped supplied buffers") Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-18 16:21:11 -06:00
Pavel Begunkov	0184f08e65	io_uring: add fully sparse buffer registration Honour IORING_RSRC_REGISTER_SPARSE not only for direct files but fixed buffers as well. It makes the rsrc API more consistent. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/66f429e4912fe39fb3318217ff33a2853d4544be.1652879898.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-18 12:53:04 -06:00
Christoph Hellwig	0bf1dbee9b	io_uring: use rcu_dereference in io_close Accessing the file table needs a rcu_dereference_protected(). Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220518084005.3255380-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-18 06:19:05 -06:00
Christoph Hellwig	a294bef57c	io_uring: consistently use the EPOLL* defines POLL* are unannotated values for the userspace ABI, while everything in-kernel should use EPOLL* and the __poll_t type. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220518084005.3255380-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-18 06:19:05 -06:00
Christoph Hellwig	58f5c8d39e	io_uring: make apoll_events a __poll_t apoll_events is fed to vfs_poll and the poll tables, so it should be a __poll_t. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220518084005.3255380-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-18 06:19:05 -06:00
Christoph Hellwig	ee67ba3b20	io_uring: drop a spurious inline on a forward declaration io_file_get_normal isn't marked inline, so don't claim it as such in the forward declaration. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220518084005.3255380-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-18 06:19:05 -06:00
Christoph Hellwig	984824db84	io_uring: don't use ERR_PTR for user pointers ERR_PTR abuses the high bits of a pointer to transport error information. This is only safe for kernel pointers and not user pointers. Fix io_buffer_select and its helpers to just return NULL for failure and get rid of this abuse. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220518084005.3255380-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-18 06:18:56 -06:00
Christoph Hellwig	20cbd21d89	io_uring: use a rwf_t for io_rw.flags Use the proper type. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220518084005.3255380-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-18 06:17:52 -06:00
Jens Axboe	c7fb19428d	io_uring: add support for ring mapped supplied buffers Provided buffers allow an application to supply io_uring with buffers that can then be grabbed for a read/receive request, when the data source is ready to deliver data. The existing scheme relies on using IORING_OP_PROVIDE_BUFFERS to do that, but it can be difficult to use in real world applications. It's pretty efficient if the application is able to supply back batches of provided buffers when they have been consumed and the application is ready to recycle them, but if fragmentation occurs in the buffer space, it can become difficult to supply enough buffers at the time. This hurts efficiency. Add a register op, IORING_REGISTER_PBUF_RING, which allows an application to setup a shared queue for each buffer group of provided buffers. The application can then supply buffers simply by adding them to this ring, and the kernel can consume then just as easily. The ring shares the head with the application, the tail remains private in the kernel. Provided buffers setup with IORING_REGISTER_PBUF_RING cannot use IORING_OP_{PROVIDE,REMOVE}_BUFFERS for adding or removing entries to the ring, they must use the mapped ring. Mapped provided buffer rings can co-exist with normal provided buffers, just not within the same group ID. To gauge overhead of the existing scheme and evaluate the mapped ring approach, a simple NOP benchmark was written. It uses a ring of 128 entries, and submits/completes 32 at the time. 'Replenish' is how many buffers are provided back at the time after they have been consumed: Test Replenish NOPs/sec ================================================================ No provided buffers NA ~30M Provided buffers 32 ~16M Provided buffers 1 ~10M Ring buffers 32 ~27M Ring buffers 1 ~27M The ring mapped buffers perform almost as well as not using provided buffers at all, and they don't care if you provided 1 or more back at the same time. This means application can just replenish as they go, rather than need to batch and compact, further reducing overhead in the application. The NOP benchmark above doesn't need to do any compaction, so that overhead isn't even reflected in the above test. Co-developed-by: Dylan Yudaken <dylany@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-18 06:12:42 -06:00
Jens Axboe	d8c2237d0a	io_uring: add io_pin_pages() helper Abstract this out from io_sqe_buffer_register() so we can use it elsewhere too without duplicating this code. No intended functional changes in this patch. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-18 06:12:42 -06:00
Jens Axboe	3d200242a6	io_uring: add buffer selection support to IORING_OP_NOP Obviously not really useful since it's not transferring data, but it is helpful in benchmarking overhead of provided buffers. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-18 06:12:42 -06:00
Jens Axboe	e7637a492b	io_uring: fix locking state for empty buffer group io_provided_buffer_select() must drop the submit lock, if needed, even in the error handling case. Failure to do so will leave us with the ctx->uring_lock held, causing spew like: ==================================== WARNING: iou-wrk-366/368 still has locks held! 5.18.0-rc6-00294-gdf8dc7004331 #994 Not tainted ------------------------------------ 1 lock held by iou-wrk-366/368: #0: ffff0000c72598a8 (&ctx->uring_lock){+.+.}-{3:3}, at: io_ring_submit_lock+0x20/0x48 stack backtrace: CPU: 4 PID: 368 Comm: iou-wrk-366 Not tainted 5.18.0-rc6-00294-gdf8dc7004331 #994 Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace.part.0+0xa4/0xd4 show_stack+0x14/0x5c dump_stack_lvl+0x88/0xb0 dump_stack+0x14/0x2c debug_check_no_locks_held+0x84/0x90 try_to_freeze.isra.0+0x18/0x44 get_signal+0x94/0x6ec io_wqe_worker+0x1d8/0x2b4 ret_from_fork+0x10/0x20 and triggering later hangs off get_signal() because we attempt to re-grab the lock. Reported-by: syzbot+987d7bb19195ae45208c@syzkaller.appspotmail.com Fixes: `149c69b04a` ("io_uring: abstract out provided buffer list selection") Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-18 06:12:41 -06:00
Hao Xu	4e86a2c980	io_uring: implement multishot mode for accept Refactor io_accept() to support multishot mode. theoretical analysis: 1) when connections come in fast - singleshot: add accept sqe(userspace) --> accept inline ^ \| \|-----------------\| - multishot: add accept sqe(userspace) --> accept inline ^ \| \|----\| we do accept repeatedly in place until get EAGAIN 2) when connections come in at a low pressure similar thing like 1), we reduce a lot of userspace-kernel context switch and useless vfs_poll() tests: Did some tests, which goes in this way: server client(multiple) accept connect read write write read close close Basically, raise up a number of clients(on same machine with server) to connect to the server, and then write some data to it, the server will write those data back to the client after it receives them, and then close the connection after write return. Then the client will read the data and then close the connection. Here I test 10000 clients connect one server, data size 128 bytes. And each client has a go routine for it, so they come to the server in short time. test 20 times before/after this patchset, time spent:(unit cycle, which is the return value of clock()) before: 1930136+1940725+1907981+1947601+1923812+1928226+1911087+1905897+1941075 +1934374+1906614+1912504+1949110+1908790+1909951+1941672+1969525+1934984 +1934226+1914385)/20.0 = 1927633.75 after: 1858905+1917104+1895455+1963963+1892706+1889208+1874175+1904753+1874112 +1874985+1882706+1884642+1864694+1906508+1916150+1924250+1869060+1889506 +1871324+1940803)/20.0 = 1894750.45 (1927633.75 - 1894750.45) / 1927633.75 = 1.65% Signed-off-by: Hao Xu <howeyxu@tencent.com> Link: https://lore.kernel.org/r/20220514142046.58072-5-haoxu.linux@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-14 08:34:22 -06:00
Hao Xu	dbc2564cfe	io_uring: let fast poll support multishot For operations like accept, multishot is a useful feature, since we can reduce a number of accept sqe. Let's integrate it to fast poll, it may be good for other operations in the future. Signed-off-by: Hao Xu <howeyxu@tencent.com> Link: https://lore.kernel.org/r/20220514142046.58072-4-haoxu.linux@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-14 08:34:22 -06:00
Hao Xu	227685ebfa	io_uring: add REQ_F_APOLL_MULTISHOT for requests Add a flag to indicate multishot mode for fast poll. currently only accept use it, but there may be more operations leveraging it in the future. Also add a mask IO_APOLL_MULTI_POLLED which stands for REQ_F_APOLL_MULTI \| REQ_F_POLLED, to make the code short and cleaner. Signed-off-by: Hao Xu <howeyxu@tencent.com> Link: https://lore.kernel.org/r/20220514142046.58072-3-haoxu.linux@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-14 08:34:22 -06:00
Dylan Yudaken	1b1d7b4bf1	io_uring: only wake when the correct events are set The check for waking up a request compares the poll_t bits, however this will always contain some common flags so this always wakes up. For files with single wait queues such as sockets this can cause the request to be sent to the async worker unnecesarily. Further if it is non-blocking will complete the request with EAGAIN which is not desired. Here exclude these common events, making sure to not exclude POLLERR which might be important. Fixes: `d7718a9d25` ("io_uring: use poll driven retry for files that support it") Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220512091834.728610-3-dylany@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-13 14:37:50 -06:00
Pavel Begunkov	e0deb6a025	io_uring: avoid io-wq -EAGAIN looping for !IOPOLL If an opcode handler semi-reliably returns -EAGAIN, io_wq_submit_work() might continue busily hammer the same handler over and over again, which is not ideal. The -EAGAIN handling in question was put there only for IOPOLL, so restrict it to IOPOLL mode only where there is no other recourse than to retry as we cannot wait. Fixes: `def596e955` ("io_uring: support for IO polling") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/f168b4f24181942f3614dd8ff648221736f572e6.1652433740.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-13 06:50:42 -06:00
Jens Axboe	a8da73a32b	io_uring: add flag for allocating a fully sparse direct descriptor space Currently to setup a fully sparse descriptor space upfront, the app needs to alloate an array of the full size and memset it to -1 and then pass that in. Make this a bit easier by allowing a flag that simply does this internally rather than needing to copy each slot separately. This works with IORING_REGISTER_FILES2 as the flag is set in struct io_uring_rsrc_register, and is only allow when the type is IORING_RSRC_FILE as this doesn't make sense for registered buffers. Reviewed-by: Hao Xu <howeyxu@tencent.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-13 06:28:50 -06:00
Jens Axboe	09893e15f1	io_uring: bump max direct descriptor count to 1M We currently limit these to 32K, but since we're now backing the table space with vmalloc when needed, there's no reason why we can't make it bigger. The total space is limited by RLIMIT_NOFILE as well. Reviewed-by: Hao Xu <howeyxu@tencent.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-13 06:28:46 -06:00
Jens Axboe	c30c3e00cb	io_uring: allow allocated fixed files for accept If the application passes in IORING_FILE_INDEX_ALLOC as the file_slot, then that's a hint to allocate a fixed file descriptor rather than have one be passed in directly. This can be useful for having io_uring manage the direct descriptor space, and also allows multi-shot support to work with fixed files. Normal accept direct requests will complete with 0 for success, and < 0 in case of error. If io_uring is asked to allocated the direct descriptor, then the direct descriptor is returned in case of success. Reviewed-by: Hao Xu <howeyxu@tencent.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-13 06:28:43 -06:00
Jens Axboe	1339f24b33	io_uring: allow allocated fixed files for openat/openat2 If the application passes in IORING_FILE_INDEX_ALLOC as the file_slot, then that's a hint to allocate a fixed file descriptor rather than have one be passed in directly. This can be useful for having io_uring manage the direct descriptor space. Normal open direct requests will complete with 0 for success, and < 0 in case of error. If io_uring is asked to allocated the direct descriptor, then the direct descriptor is returned in case of success. Reviewed-by: Hao Xu <howeyxu@tencent.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-13 06:28:42 -06:00
Jens Axboe	b70b8e3331	io_uring: add basic fixed file allocator Applications currently always pick where they want fixed files to go. In preparation for allowing these types of commands with multishot support, add a basic allocator in the fixed file table. Reviewed-by: Hao Xu <howeyxu@tencent.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-13 06:28:40 -06:00
Jens Axboe	d78bd8adfc	io_uring: track fixed files with a bitmap In preparation for adding a basic allocator for direct descriptors, add helpers that set/clear whether a file slot is used. Reviewed-by: Hao Xu <howeyxu@tencent.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-13 06:28:29 -06:00
Jens Axboe	7ccba24d3b	io_uring: don't clear req->kbuf when buffer selection is done It's not needed as the REQ_F_BUFFER_SELECTED flag tracks the state of whether or not kbuf is valid, so just drop it. Suggested-by: Dylan Yudaken <dylany@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-09 06:29:06 -06:00
Jens Axboe	1dbd023eb0	io_uring: eliminate the need to track provided buffer ID separately We have io_kiocb->buf_index which is used for either fixed buffers, or for provided buffers. For the latter, it's used to hold the buffer group ID for buffer selection. Post selection, req->kbuf->bid is used to get the buffer ID. Store the buffer ID, when selected, in req->buf_index. If we do end up recycling the buffer, reset it back to the buffer group ID. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-09 06:29:06 -06:00
Jens Axboe	660cbfa234	io_uring: move provided buffer state closer to submit state The timeout and other items that follow are less hot, so let's move the provided buffer state above that. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-09 06:29:06 -06:00
Jens Axboe	a4f8d94cfb	io_uring: move provided and fixed buffers into the same io_kiocb area These are mutually exclusive - if you use provided buffers, then you cannot use fixed buffers and vice versa. Move them into the same spot in the io_kiocb, which is also advantageous for provided buffers as they get near the submit side hot cacheline. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-09 06:29:06 -06:00
Jens Axboe	149c69b04a	io_uring: abstract out provided buffer list selection In preparation for providing another way to select a buffer, move the existing logic into a helper. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-09 06:29:06 -06:00
Jens Axboe	b66e65f414	io_uring: never call io_buffer_select() for a buffer re-select Callers already have room to store the addr and length information, clean it up by having the caller just assign the previously provided data. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-09 06:29:06 -06:00
Jens Axboe	9cfc7e94e4	io_uring: get rid of hashed provided buffer groups Use a plain array for any group ID that's less than 64, and punt anything beyond that to an xarray. 64 fits in a page even for 4KB page sizes and with the planned additions. This makes the expected group usage faster by avoiding a hash and lookup to find our list, and it uses less memory upfront by not allocating any memory for provided buffers unless it's actually being used. Suggested-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-09 06:29:06 -06:00
Jens Axboe	4e90670252	io_uring: always use req->buf_index for the provided buffer group The read/write opcodes use it already, but the recv/recvmsg do not. If we switch them over and read and validate this at init time while we're checking if the opcode supports it anyway, then we can do it in one spot and we don't have to pass in a separate group ID for io_buffer_select(). Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-09 06:29:06 -06:00
Jens Axboe	bb68d504f7	io_uring: ignore ->buf_index if REQ_F_BUFFER_SELECT isn't set There's no point in validity checking buf_index if the request doesn't have REQ_F_BUFFER_SELECT set, as we will never use it for that case. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-09 06:29:06 -06:00
Jens Axboe	e5b003495e	io_uring: kill io_rw_buffer_select() wrapper After the recent changes, this is direct call to io_buffer_select() anyway. With this change, there are no wrappers left for provided buffer selection. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-09 06:29:06 -06:00
Jens Axboe	c54d52c2d6	io_uring: make io_buffer_select() return the user address directly There's no point in having callers provide a kbuf, we're just returning the address anyway. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-09 06:29:02 -06:00
Jens Axboe	9396ed850f	io_uring: kill io_recv_buffer_select() wrapper It's just a thin wrapper around io_buffer_select(), get rid of it. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-05 17:15:15 -06:00
Jens Axboe	0a352aaa94	io_uring: use 'sr' vs 'req->sr_msg' consistently For all of send/sendmsg and recv/recvmsg we have the local 'sr' variable, yet some cases still use req->sr_msg which sr points to. Use 'sr' consistently. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-05 17:14:33 -06:00
Jens Axboe	0455d4ccec	io_uring: add POLL_FIRST support for send/sendmsg and recv/recvmsg If IORING_RECVSEND_POLL_FIRST is set for recv/recvmsg or send/sendmsg, then we arm poll first rather than attempt a receive or send upfront. This can be useful if we expect there to be no data (or space) available for the request, as we can then avoid wasting time on the initial issue attempt. Reviewed-by: Hao Xu <howeyxu@tencent.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-05 17:09:31 -06:00
Jens Axboe	73911426aa	io_uring: check IOPOLL/ioprio support upfront Don't punt this check to the op prep handlers, add the support to io_op_defs and we can check them while setting up the request. This reduces the text size by 500 bytes on aarch64, and makes this less fragile by having the check in one spot and needing opcodes to opt in to IOPOLL or ioprio support. Reviewed-by: Hao Xu <howeyxu@tencent.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-05 17:09:15 -06:00
Almog Khaikin	f2e030dd7a	io_uring: replace smp_mb() with smp_mb__after_atomic() in io_sq_thread() The IORING_SQ_NEED_WAKEUP flag is now set using atomic_or() which implies a full barrier on some architectures but it is not required to do so. Use the more appropriate smp_mb__after_atomic() which avoids the extra barrier on those architectures. Signed-off-by: Almog Khaikin <almogkh@gmail.com> Link: https://lore.kernel.org/r/20220426163403.112692-1-almogkh@gmail.com Fixes: 8018823e6987 ("io_uring: serialize ctx->rings->sq_flags with atomic_or/and") Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-04-30 08:39:54 -06:00
Jens Axboe	ef060ea9e4	io_uring: add IORING_SETUP_TASKRUN_FLAG If IORING_SETUP_COOP_TASKRUN is set to use cooperative scheduling for running task_work, then IORING_SETUP_TASKRUN_FLAG can be set so the application can tell if task_work is pending in the kernel for this ring. This allows use cases like io_uring_peek_cqe() to still function appropriately, or for the task to know when it would be useful to call io_uring_wait_cqe() to run pending events. Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/20220426014904.60384-7-axboe@kernel.dk Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-04-30 08:39:54 -06:00
Jens Axboe	e1169f06d5	io_uring: use TWA_SIGNAL_NO_IPI if IORING_SETUP_COOP_TASKRUN is used If this is set, io_uring will never use an IPI to deliver a task_work notification. This can be used in the common case where a single task or thread communicates with the ring, and doesn't rely on io_uring_cqe_peek(). This provides a noticeable win in performance, both from eliminating the IPI itself, but also from avoiding interrupting the submitting task unnecessarily. Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/20220426014904.60384-6-axboe@kernel.dk Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-04-30 08:39:54 -06:00
Jens Axboe	9f010507bb	io_uring: set task_work notify method at init time While doing so, switch SQPOLL to TWA_SIGNAL_NO_IPI as well, as that just does a task wakeup and then we can remove the special wakeup we have in task_work_add. Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/20220426014904.60384-5-axboe@kernel.dk Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-04-30 08:39:54 -06:00
Jens Axboe	3a4b89a25c	io_uring: serialize ctx->rings->sq_flags with atomic_or/and Rather than require ctx->completion_lock for ensuring that we don't clobber the flags, use the atomic bitop helpers instead. This removes the need to grab the completion_lock, in preparation for needing to set or clear sq_flags when we don't know the status of this lock. Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/20220426014904.60384-3-axboe@kernel.dk Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-04-30 08:39:54 -06:00
Jens Axboe	69cc1b6fa5	io_uring: fix compile warning for 32-bit builds If IO_URING_SCM_ALL isn't set, as it would not be on 32-bit builds, then we trigger a warning: fs/io_uring.c: In function '__io_sqe_files_unregister': fs/io_uring.c:8992:13: warning: unused variable 'i' [-Wunused-variable] 8992 \| int i; \| ^ Move the ifdef up to include the 'i' variable declaration. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Fixes: `5e45690a1c` ("io_uring: store SCM state in io_fixed_file->file_ptr") Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-04-25 16:43:45 -06:00
Dylan Yudaken	155bc9505d	io_uring: return an error when cqe is dropped Right now io_uring will not actively inform userspace if a CQE is dropped. This is extremely rare, requiring a CQ ring overflow, as well as a GFP_ATOMIC kmalloc failure. However the consequences could cause for example applications to go into an undefined state, possibly waiting for a CQE that never arrives. Return an error code (EBADR) in these cases. Since this is expected to be incredibly rare, try and avoid as much as possible affecting the hot code paths, and so it only is returned lazily and when there is no other available CQEs. Once the error is returned, reset the error condition assuming the user is either ok with it or will clean up appropriately. Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220421091345.2115755-6-dylany@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-04-24 18:18:18 -06:00
Dylan Yudaken	10988a0a67	io_uring: use constants for cq_overflow bitfield Prepare to use this bitfield for more flags by using constants instead of magic value 0 Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220421091345.2115755-5-dylany@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-04-24 18:18:18 -06:00
Dylan Yudaken	3e813c9026	io_uring: rework io_uring_enter to simplify return value io_uring_enter returns the count submitted preferrably over an error code. In some code paths this check is not required, so reorganise the code so that the check is only done as needed. This is also a prep for returning error codes only in waiting scenarios. Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220421091345.2115755-4-dylany@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-04-24 18:18:18 -06:00
Dylan Yudaken	08dcd0288f	io_uring: trace cqe overflows Trace cqe overflows in io_uring. Print ocqe before the check, so if it is NULL it indicates that it has been dropped. Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220421091345.2115755-3-dylany@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-04-24 18:18:18 -06:00
Jens Axboe	10c873334f	io_uring: allow re-poll if we made progress We currently check REQ_F_POLLED before arming async poll for a notification to retry. If it's set, then we don't allow poll and will punt to io-wq instead. This is done to prevent a situation where a buggy driver will repeatedly return that there's space/data available yet we get -EAGAIN. However, if we already transferred data, then it should be safe to rely on poll again. Gate the check on whether or not REQ_F_PARTIAL_IO is also set. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-04-24 18:18:18 -06:00

1 2 3 4 5 ...

1967 Commits