linux

iv/linux

Author	SHA1	Message	Date
Linus Torvalds	cec53f4c8d	Merge tag 'io_uring-6.0-2022-09-02' of git://git.kernel.dk/linux-block Pull io_uring fixes from Jens Axboe: - A single fix for over-eager retries for networking (Pavel) - Revert the notification slot support for zerocopy sends. It turns out that even after more than a year or development and testing, there's not full agreement on whether just using plain ordered notifications is Good Enough to avoid the complexity of using the notifications slots. Because of that, we decided that it's best left to a future final decision. We can always bring back this feature, but we can't really change it or remove it once we've released 6.0 with it enabled. The reverts leave the usual CQE notifications as the primary interface for knowing when data was sent, and when it was acked. (Pavel) * tag 'io_uring-6.0-2022-09-02' of git://git.kernel.dk/linux-block: selftests/net: return back io_uring zc send tests io_uring/net: simplify zerocopy send user API io_uring/notif: remove notif registration Revert "io_uring: rename IORING_OP_FILES_UPDATE" Revert "io_uring: add zc notification flush requests" selftests/net: temporarily disable io_uring zc test io_uring/net: fix overexcessive retries	2022-09-02 16:37:01 -07:00
Pavel Begunkov	b48c312be0	io_uring/net: simplify zerocopy send user API Following user feedback, this patch simplifies zerocopy send API. One of the main complaints is that the current API is difficult with the userspace managing notification slots, and then send retries with error handling make it even worse. Instead of keeping notification slots change it to the per-request notifications model, which posts both completion and notification CQEs for each request when any data has been sent, and only one CQE if it fails. All notification CQEs will have IORING_CQE_F_NOTIF set and IORING_CQE_F_MORE in completion CQEs indicates whether to wait a notification or not. IOSQE_CQE_SKIP_SUCCESS is disallowed with zerocopy sends for now. This is less flexible, but greatly simplifies the user API and also the kernel implementation. We reuse notif helpers in this patch, but in the future there won't be need for keeping two requests. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/95287640ab98fc9417370afb16e310677c63e6ce.1662027856.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-01 09:13:33 -06:00
Pavel Begunkov	57f332246a	io_uring/notif: remove notif registration We're going to remove the userspace exposed zerocopy notification API, remove notification registration. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/6ff00b97be99869c386958a990593c9c31cf105b.1662027856.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-01 09:13:33 -06:00
Pavel Begunkov	d9808ceb31	Revert "io_uring: rename IORING_OP_FILES_UPDATE" This reverts commit `4379d5f15b`. We removed notification flushing, also cleanup uapi preparation changes to not pollute it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/89edc3905350f91e1b6e26d9dbf42ee44fd451a2.1662027856.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-01 09:13:33 -06:00
Pavel Begunkov	23c12d5fc0	Revert "io_uring: add zc notification flush requests" This reverts commit `492dddb4f6`. Soon we won't have the very notion of notification flushing, so remove notification flushing requests. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/8850334ca56e65b413cb34fd158db81d7b2865a3.1662027856.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-01 09:13:33 -06:00
Linus Torvalds	9c9d1896fa	Merge tag 'lsm-pr-20220829' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm Pull LSM support for IORING_OP_URING_CMD from Paul Moore: "Add SELinux and Smack controls to the io_uring IORING_OP_URING_CMD. These are necessary as without them the IORING_OP_URING_CMD remains outside the purview of the LSMs (Luis' LSM patch, Casey's Smack patch, and my SELinux patch). They have been discussed at length with the io_uring folks, and Jens has given his thumbs-up on the relevant patches (see the commit descriptions). There is one patch that is not strictly necessary, but it makes testing much easier and is very trivial: the /dev/null IORING_OP_URING_CMD patch." * tag 'lsm-pr-20220829' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: Smack: Provide read control for io_uring_cmd /dev/null: add IORING_OP_URING_CMD support selinux: implement the security_uring_cmd() LSM hook lsm,io_uring: add LSM hooks for the new uring_cmd file op	2022-08-31 09:23:16 -07:00
Pavel Begunkov	dfb58b1796	io_uring/net: fix overexcessive retries Length parameter of io_sg_from_iter() can be smaller than the iterator's size, as it's with TCP, so when we set from->count at the end of the function we truncate the iterator forcing TCP to return preliminary with a short send. It affects zerocopy sends with large payload sizes and leads to retries and possible request failures. Fixes: `3ff1a0d395` ("io_uring: enable managed frags with register buffers") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/0bc0d5179c665b4ef5c328377c84c7a1f298467e.1661530037.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-26 10:31:42 -06:00
Luis Chamberlain	2a58401240	lsm,io_uring: add LSM hooks for the new uring_cmd file op io-uring cmd support was added through `ee692a21e9` ("fs,io_uring: add infrastructure for uring-cmd"), this extended the struct file_operations to allow a new command which each subsystem can use to enable command passthrough. Add an LSM specific for the command passthrough which enables LSMs to inspect the command details. This was discussed long ago without no clear pointer for something conclusive, so this enables LSMs to at least reject this new file operation. [0] https://lkml.kernel.org/r/8adf55db-7bab-f59d-d612-ed906b948d19@schaufler-ca.com Cc: stable@vger.kernel.org Fixes: `ee692a21e9` ("fs,io_uring: add infrastructure for uring-cmd") Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Acked-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Paul Moore <paul@paul-moore.com>	2022-08-26 11:19:43 -04:00
Pavel Begunkov	581711c466	io_uring/net: save address for sendzc async execution We usually copy all bits that a request needs from the userspace for async execution, so the userspace can keep them on the stack. However, send zerocopy violates this pattern for addresses and may reloads it e.g. from io-wq. Save the address if any in ->async_data as usual. Reported-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d7512d7aa9abcd36e9afe1a4d292a24cb2d157e5.1661342812.git.asml.silence@gmail.com [axboe: fold in incremental fix] Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-25 07:52:30 -06:00
Pavel Begunkov	5916943943	io_uring: conditional ->async_data allocation There are opcodes that need ->async_data only in some cases and allocation it unconditionally may hurt performance. Add an option to opdef to make move the allocation part from the core io_uring to opcode specific code. Note, we can't just set opdef->async_size to zero because there are other helpers that rely on it, e.g. io_alloc_async_data(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/9dc62be9e88dd0ed63c48365340e8922d2498293.1661342812.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-24 08:57:28 -06:00
Pavel Begunkov	53bdc88aac	io_uring/notif: order notif vs send CQEs Currently, there is no ordering between notification CQEs and completions of the send flushing it, this quite complicates the userspace, especially since we don't flush notification when the send(+flush) request fails, i.e. there will be only one CQE. What we can do is to make sure that notification completions come only after sends. The easiest way to achieve this is to not try to complete a notification inline from io_sendzc() but defer it to task_work, considering that io-wq sendzc is disallowed CQEs will be naturally ordered because task_works will only be executed after we're done with submission and so inline completion. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/cddfd1c2bf91f22b9fe08e13b7dffdd8f858a151.1661342812.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-24 08:57:28 -06:00
Pavel Begunkov	986e263def	io_uring/net: fix indentation Fix up indentation before we get complaints from tooling. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/bd5754e3764215ccd7fb04cd636ea9167aaa275d.1661342812.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-24 08:57:15 -06:00
Pavel Begunkov	5a848b7c9e	io_uring/net: fix zc send link failing Failed requests should be marked with req_set_fail(), so links and cqe skipping work correctly, which is missing in io_sendzc(). Note, io_sendzc() return IOU_OK on failure, so the core code won't do the cleanup for us. Fixes: `06a5464be8` ("io_uring: wire send zc request type") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/e47d46fda9db30154ce66a549bb0d3380b780520.1661342812.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-24 08:57:00 -06:00
Pavel Begunkov	2cacedc873	io_uring/net: fix must_hold annotation Fix up the io_alloc_notif()'s __must_hold as we don't have a ctx argument there but should get it from the slot instead. Reported-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/cbb0a920f18e0aed590bf58300af817b9befb8a3.1661342812.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-24 08:57:00 -06:00
Kanchan Joshi	a9c3eda7ea	io_uring: fix submission-failure handling for uring-cmd If ->uring_cmd returned an error value different from -EAGAIN or -EIOCBQUEUED, it gets overridden with IOU_OK. This invites trouble as caller (io_uring core code) handles IOU_OK differently than other error codes. Fix this by returning the actual error code. Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-23 09:46:17 -06:00
Jens Axboe	47abea041f	io_uring: fix off-by-one in sync cancelation file check The passed in index should be validated against the number of registered files we have, it needs to be smaller than the index value to avoid going one beyond the end. Fixes: `78a861b949` ("io_uring: add sync cancelation API through io_uring_register()") Reported-by: Luo Likang <luolikang@nsfocus.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-23 07:26:08 -06:00
Pavel Begunkov	3f743e9bbb	io_uring/net: use right helpers for async_data There is another spot where we check ->async_data directly instead of using req_has_async_data(), which is the way to do it, fix it up. Fixes: `43e0bbbd0b` ("io_uring: add netmsg cache") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/42f33b9a81dd6ae65dda92f0372b0ff82d548517.1660822636.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-18 07:27:20 -06:00
Pavel Begunkov	5993000dc6	io_uring/notif: raise limit on notification slots 1024 notification slots is rather an arbitrary value, raise it up, everything is accounted to memcg. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/eb78a0a5f2fa5941f8e845cdae5fb399bf7ba0be.1660566179.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-15 21:34:00 -06:00
Pavel Begunkov	86dc8f23bb	io_uring/net: improve zc addr import error handling We may account memory to a memcg of a request that didn't even got to the network layer. It's not a bug as it'll be routinely cleaned up on flush, but it might be confusing for the userspace. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/b8aae61f4c3ddc4da97c1da876bb73871f352d50.1660566179.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-15 21:34:00 -06:00
Pavel Begunkov	063604265f	io_uring/net: use right helpers for async recycle We have a helper that checks for whether a request contains anything in ->async_data or not, namely req_has_async_data(). It's better to use it as it might have some extra considerations. Fixes: `43e0bbbd0b` ("io_uring: add netmsg cache") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/b7414da4e7c3c32c31fc02dfd1355af4ccf4ca5f.1660566179.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-15 21:34:00 -06:00
Linus Torvalds	1da8cf961b	Merge tag 'io_uring-6.0-2022-08-13' of git://git.kernel.dk/linux-block Pull io_uring fixes from Jens Axboe: - Regression fix for this merge window, fixing a wrong order of arguments for io_req_set_res() for passthru (Dylan) - Fix for the audit code leaking context memory (Peilin) - Ensure that provided buffers are memcg accounted (Pavel) - Correctly handle short zero-copy sends (Pavel) - Sparse warning fixes for the recvmsg multishot command (Dylan) - Error handling fix for passthru (Anuj) - Remove randomization of struct kiocb fields, to avoid it growing in size if re-arranged in such a fashion that it grows more holes or padding (Keith, Linus) - Small series improving type safety of the sqe fields (Stefan) * tag 'io_uring-6.0-2022-08-13' of git://git.kernel.dk/linux-block: io_uring: add missing BUILD_BUG_ON() checks for new io_uring_sqe fields io_uring: make io_kiocb_to_cmd() typesafe fs: don't randomize struct kiocb fields io_uring: consistently make use of io_notif_to_data() io_uring: fix error handling for io_uring_cmd io_uring: fix io_recvmsg_prep_multishot sparse warnings io_uring/net: send retry for zerocopy io_uring: mem-account pbuf buckets audit, io_uring, io-wq: Fix memory leak in io_sq_thread() and io_wqe_worker() io_uring: pass correct parameters to io_req_set_res	2022-08-13 13:28:54 -07:00
Stefan Metzmacher	9c71d39aa0	io_uring: add missing BUILD_BUG_ON() checks for new io_uring_sqe fields Signed-off-by: Stefan Metzmacher <metze@samba.org> Link: https://lore.kernel.org/r/ffcaf8dc4778db4af673822df60dbda6efdd3065.1660201408.git.metze@samba.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-12 17:01:00 -06:00
Stefan Metzmacher	f2ccb5aed7	io_uring: make io_kiocb_to_cmd() typesafe We need to make sure (at build time) that struct io_cmd_data is not casted to a structure that's larger. Signed-off-by: Stefan Metzmacher <metze@samba.org> Link: https://lore.kernel.org/r/c024cdf25ae19fc0319d4180e2298bade8ed17b8.1660201408.git.metze@samba.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-12 17:01:00 -06:00
Stefan Metzmacher	da2634e89c	io_uring: consistently make use of io_notif_to_data() This makes the assignment typesafe. It prepares changing io_kiocb_to_cmd() in the next commit. Signed-off-by: Stefan Metzmacher <metze@samba.org> Link: https://lore.kernel.org/r/8da6e9d12cf95ad4bc73274406d12bca7aabf72e.1660201408.git.metze@samba.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-11 10:56:13 -06:00
Anuj Gupta	3ed159c984	io_uring: fix error handling for io_uring_cmd Commit `97b388d70b` ("io_uring: handle completions in the core") moved the error handling from handler to core. But for io_uring_cmd handler we end up completing more than once (both in handler and in core) leading to use_after_free. Change io_uring_cmd handler to avoid calling io_uring_cmd_done in case of error. Fixes: `97b388d70b` ("io_uring: handle completions in the core") Signed-off-by: Anuj Gupta <anuj20.g@samsung.com> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> Link: https://lore.kernel.org/r/20220811091459.6929-1-anuj20.g@samsung.com [axboe: fix ret vs req typo] Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-11 10:56:00 -06:00
Dylan Yudaken	d1f6222c49	io_uring: fix io_recvmsg_prep_multishot sparse warnings Fix casts missing the __user parts. This seemed to only cause errors on the alpha build, or if checked with sparse, but it was definitely an oversight. Reported-by: kernel test robot <lkp@intel.com> Fixes: `9bb66906f2` ("io_uring: support multishot in recvmsg") Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220805115450.3921352-1-dylany@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-05 08:41:18 -06:00
Pavel Begunkov	4a933e6208	io_uring/net: send retry for zerocopy io_uring handles short sends/recvs for stream sockets when MSG_WAITALL is set, however new zerocopy send is inconsistent in this regard, which might be confusing. Handle short sends. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/b876a4838597d9bba4f3215db60d72c33c448ad0.1659622472.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-04 08:35:16 -06:00
Pavel Begunkov	cc18cc5e82	io_uring: mem-account pbuf buckets Potentially, someone may create as many pbuf bucket as there are indexes in an xarray without any other restrictions bounding our memory usage, put memory needed for the buckets under memory accounting. Cc: <stable@vger.kernel.org> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d34c452e45793e978d26e2606211ec9070d329ea.1659622312.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-04 08:35:07 -06:00
Peilin Ye	f482aa9865	audit, io_uring, io-wq: Fix memory leak in io_sq_thread() and io_wqe_worker() Currently @audit_context is allocated twice for io_uring workers: 1. copy_process() calls audit_alloc(); 2. io_sq_thread() or io_wqe_worker() calls audit_alloc_kernel() (which is effectively audit_alloc()) and overwrites @audit_context, causing: BUG: memory leak unreferenced object 0xffff888144547400 (size 1024): <...> hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff8135cfc3>] audit_alloc+0x133/0x210 [<ffffffff81239e63>] copy_process+0xcd3/0x2340 [<ffffffff8123b5f3>] create_io_thread+0x63/0x90 [<ffffffff81686604>] create_io_worker+0xb4/0x230 [<ffffffff81686f68>] io_wqe_enqueue+0x248/0x3b0 [<ffffffff8167663a>] io_queue_iowq+0xba/0x200 [<ffffffff816768b3>] io_queue_async+0x113/0x180 [<ffffffff816840df>] io_req_task_submit+0x18f/0x1a0 [<ffffffff816841cd>] io_apoll_task_func+0xdd/0x120 [<ffffffff8167d49f>] tctx_task_work+0x11f/0x570 [<ffffffff81272c4e>] task_work_run+0x7e/0xc0 [<ffffffff8125a688>] get_signal+0xc18/0xf10 [<ffffffff8111645b>] arch_do_signal_or_restart+0x2b/0x730 [<ffffffff812ea44e>] exit_to_user_mode_prepare+0x5e/0x180 [<ffffffff844ae1b2>] syscall_exit_to_user_mode+0x12/0x20 [<ffffffff844a7e80>] do_syscall_64+0x40/0x80 Then, 3. io_sq_thread() or io_wqe_worker() frees @audit_context using audit_free(); 4. do_exit() eventually calls audit_free() again, which is okay because audit_free() does a NULL check. As suggested by Paul Moore, fix it by deleting audit_alloc_kernel() and redundant audit_free() calls. Fixes: `5bd2182d58` ("audit,io_uring,io-wq: add some basic audit support to io_uring") Suggested-by: Paul Moore <paul@paul-moore.com> Cc: stable@vger.kernel.org Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Acked-by: Paul Moore <paul@paul-moore.com> Link: https://lore.kernel.org/r/20220803222343.31673-1-yepeilin.cs@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-04 08:33:54 -06:00
Linus Torvalds	5264406cdb	Merge tag 'pull-work.iov_iter-base' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs iov_iter updates from Al Viro: "Part 1 - isolated cleanups and optimizations. One of the goals is to reduce the overhead of using ->read_iter() and ->write_iter() instead of ->read()/->write(). new_sync_{read,write}() has a surprising amount of overhead, in particular inside iocb_flags(). That's the explanation for the beginning of the series is in this pile; it's not directly iov_iter-related, but it's a part of the same work..." * tag 'pull-work.iov_iter-base' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: first_iovec_segment(): just return address iov_iter: massage calling conventions for first_{iovec,bvec}_segment() iov_iter: first_{iovec,bvec}_segment() - simplify a bit iov_iter: lift dealing with maxpages out of first_{iovec,bvec}_segment() iov_iter_get_pages{,_alloc}(): cap the maxsize with MAX_RW_COUNT iov_iter_bvec_advance(): don't bother with bvec_iter copy_page_{to,from}_iter(): switch iovec variants to generic keep iocb_flags() result cached in struct file iocb: delay evaluation of IS_SYNC(...) until we want to check IOCB_DSYNC struct file: use anonymous union member for rcuhead and llist btrfs: use IOMAP_DIO_NOSYNC teach iomap_dio_rw() to suppress dsync No need of likely/unlikely on calls of check_copy_size()	2022-08-03 13:50:22 -07:00
Ming Lei	ff2557b722	io_uring: pass correct parameters to io_req_set_res The two parameters of 'res' and 'cflags' are swapped, so fix it. Without this fix, 'ublk del' hangs forever. Cc: Pavel Begunkov <asml.silence@gmail.com> Fixes: `de23077eda` ("io_uring: set completion results upfront") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20220803120757.1668278-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-03 08:45:25 -06:00
Linus Torvalds	42df1cbf6a	Merge tag 'for-5.20/io_uring-zerocopy-send-2022-07-29' of git://git.kernel.dk/linux-block Pull io_uring zerocopy support from Jens Axboe: "This adds support for efficient support for zerocopy sends through io_uring. Both ipv4 and ipv6 is supported, as well as both TCP and UDP. The core network changes to support this is in a stable branch from Jakub that both io_uring and net-next has pulled in, and the io_uring changes are layered on top of that. All of the work has been done by Pavel" * tag 'for-5.20/io_uring-zerocopy-send-2022-07-29' of git://git.kernel.dk/linux-block: (34 commits) io_uring: notification completion optimisation io_uring: export req alloc from core io_uring/net: use unsigned for flags io_uring/net: make page accounting more consistent io_uring/net: checks errors of zc mem accounting io_uring/net: improve io_get_notif_slot types selftests/io_uring: test zerocopy send io_uring: enable managed frags with register buffers io_uring: add zc notification flush requests io_uring: rename IORING_OP_FILES_UPDATE io_uring: flush notifiers after sendzc io_uring: sendzc with fixed buffers io_uring: allow to pass addr into sendzc io_uring: account locked pages for non-fixed zc io_uring: wire send zc request type io_uring: add notification slot registration io_uring: add rsrc referencing for notifiers io_uring: complete notifiers in tw io_uring: cache struct io_notif io_uring: add zc notification infrastructure ...	2022-08-02 13:37:55 -07:00
Pavel Begunkov	14b146b688	io_uring: notification completion optimisation We want to use all optimisations that we have for io_uring requests like completion batching, memory caching and more but for zc notifications. Fortunately, notification perfectly fit the request model so we can overlay them onto struct io_kiocb and use all the infratructure. Most of the fields of struct io_notif natively fits into io_kiocb, so we replace struct io_notif with struct io_kiocb carrying struct io_notif_data in the cmd cache line. Then we adapt io_alloc_notif() to use io_alloc_req()/io_alloc_req_refill(), and kill leftovers of hand coded caching. __io_notif_complete_tw() is converted to use io_uring's tw infra. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/9e010125175e80baf51f0ca63bdc7cc6a4a9fa56.1658913593.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-27 08:50:50 -06:00
Pavel Begunkov	bd1a3783dd	io_uring: export req alloc from core We want to do request allocation out of the core io_uring code, make the allocation functions public for other io_uring parts. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/0314fedd3a02a514210ba42d4720332538c65956.1658913593.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-27 08:50:50 -06:00
Pavel Begunkov	293402e564	io_uring/net: use unsigned for flags Use unsigned int type for msg flags. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/5cfaed13d3191337b14b8664ca68b515d9e2d1b4.1658742118.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-25 09:48:25 -06:00
Pavel Begunkov	6a9ce66f4d	io_uring/net: make page accounting more consistent Make network page accounting more consistent with how buffer registration is working, i.e. account all memory to ctx->user. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/4aacfe64bbb81b27f9ecf5d5c219c69a07e5aa56.1658742118.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-25 09:48:25 -06:00
Pavel Begunkov	2e32ba5607	io_uring/net: checks errors of zc mem accounting mm_account_pinned_pages() may fail, don't ignore the return value. Fixes: `e29e3bd4b9` ("io_uring: account locked pages for non-fixed zc") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/dae0542ed8e6706071bb83ad3e7ad6a70d207fd9.1658742118.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-25 09:48:17 -06:00
Pavel Begunkov	cb309ae49d	io_uring/net: improve io_get_notif_slot types Don't use signed int for slot indexing. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/e4d15aefebb5e55729dd9b5ec01ab16b70033343.1658742118.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-25 09:47:45 -06:00
Pavel Begunkov	3ff1a0d395	io_uring: enable managed frags with register buffers io_uring's registered buffers infra has a good performant way of pinning pages, so let's use SKBFL_MANAGED_FRAG_REFS when our requests are purely register buffer backed. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/278731d3f20caf346cfc025fbee0b4c9ee4ed751.1657643355.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:41:07 -06:00
Pavel Begunkov	492dddb4f6	io_uring: add zc notification flush requests Overlay notification control onto IORING_OP_RSRC_UPDATE (former IORING_OP_FILES_UPDATE). It allows to flush a range of zc notifications from slots with indexes [sqe->off, sqe->off+sqe->len). If sqe->arg is not zero, it also copies sqe->arg as a new tag for all flushed notifications. Note, it doesn't flush a notification of a slot if there was no requests attached to it (since last flush or registration). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/df13e2363400682a73dd9e71c3b990b8d1ff0333.1657643355.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:41:07 -06:00
Pavel Begunkov	4379d5f15b	io_uring: rename IORING_OP_FILES_UPDATE IORING_OP_FILES_UPDATE will be a more generic opcode serving different resource types, rename it into IORING_OP_RSRC_UPDATE and add subtype handling. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/0a907133907d9af3415a8a7aa1802c6aa97c03c6.1657643355.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:41:07 -06:00
Pavel Begunkov	63809137eb	io_uring: flush notifiers after sendzc Allow to flush notifiers as a part of sendzc request by setting IORING_SENDZC_FLUSH flag. When the sendzc request succeedes it will flush the used [active] notifier. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/e0b4d9a6797e2fd6092824fe42953db7a519bbc8.1657643355.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:41:07 -06:00
Pavel Begunkov	10c7d33ecd	io_uring: sendzc with fixed buffers Allow zerocopy sends to use fixed buffers. There is an optimisation for this case, the network layer don't need to reference the pages, see SKBFL_MANAGED_FRAG_REFS, so io_uring have to ensure validity of fixed buffers until the notifier is released. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/e1d8bd1b5934e541d90c1824eb4020ae3f5f43f3.1657643355.git.asml.silence@gmail.com [axboe: fold in 32-bit pointer cast warning fix] Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:41:07 -06:00
Pavel Begunkov	092aeedb75	io_uring: allow to pass addr into sendzc Allow to specify an address to zerocopy sends making it more like sendto(2). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/70417a8f7c5b51ab454690bae08adc0c187f89e8.1657643355.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:41:07 -06:00
Pavel Begunkov	e29e3bd4b9	io_uring: account locked pages for non-fixed zc Fixed buffers are RLIMIT_MEMLOCK accounted, however it doesn't cover iovec based zerocopy sends. Do the accounting on the io_uring side. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/19b6e3975440f59f1f6199c7ee7acf977b4eecdc.1657643355.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:41:07 -06:00
Pavel Begunkov	06a5464be8	io_uring: wire send zc request type Add a new io_uring opcode IORING_OP_SENDZC. The main distinction from IORING_OP_SEND is that the user should specify a notification slot index in sqe::notification_idx and the buffers are safe to reuse only when the used notification is flushed and completes. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/a80387c6a68ce9cf99b3b6ef6f71068468761fb7.1657643355.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:41:07 -06:00
Pavel Begunkov	bc24d6bd32	io_uring: add notification slot registration Let the userspace to register and unregister notification slots. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/a0aa8161fe3ebb2a4cc6e5dbd0cffb96e6881cf5.1657643355.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:41:07 -06:00
Pavel Begunkov	68ef5578ef	io_uring: add rsrc referencing for notifiers In preparation to zerocopy sends with fixed buffers make notifiers to reference the rsrc node to protect the used fixed buffers. We can't just grab it for a send request as notifiers can likely outlive requests that used it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/3cd7a01d26837945b6982fa9cf15a63230f2ed4f.1657643355.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:41:06 -06:00
Pavel Begunkov	e58d498e81	io_uring: complete notifiers in tw We need a task context to post CQEs but using wq is too expensive. Try to complete notifiers using task_work and fall back to wq if fails. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/089799ab665b10b78fdc614ae6d59fa7ef0d5f91.1657643355.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:41:06 -06:00
Pavel Begunkov	eb4a299b2f	io_uring: cache struct io_notif kmalloc'ing struct io_notif is too expensive when done frequently, cache them as many other resources in io_uring. Keep two list, the first one is from where we're getting notifiers, it's protected by ->uring_lock. The second is protected by ->completion_lock, to which we queue released notifiers. Then we splice one list into another when needed. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/9dec18f7fcbab9f4bd40b96e5ae158b119945230.1657643355.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:41:06 -06:00

1 2 3 4 5

224 Commits