linux

iv/linux

Author	SHA1	Message	Date
Ming Lei	860e1c7f8b	io_uring: complete request via task work in case of DEFER_TASKRUN So far io_req_complete_post() only covers DEFER_TASKRUN by completing request via task work when the request is completed from IOWQ. However, uring command could be completed from any context, and if io uring is setup with DEFER_TASKRUN, the command is required to be completed from current context, otherwise wait on IORING_ENTER_GETEVENTS can't be wakeup, and may hang forever. The issue can be observed on removing ublk device, but turns out it is one generic issue for uring command & DEFER_TASKRUN, so solve it in io_uring core code. Fixes: e6aeb2721d3b ("io_uring: complete all requests in task context") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/linux-block/b3fc9991-4c53-9218-a8cc-5b4dd3952108@kernel.dk/ Reported-by: Jens Axboe <axboe@kernel.dk> Cc: Kanchan Joshi <joshi.k@samsung.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-14 06:38:23 -06:00
Jakub Kicinski	800e68c44f	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Conflicts: tools/testing/selftests/net/config 62199e3f1658 ("selftests: net: Add VXLAN MDB test") 3a0385be133e ("selftests: add the missing CONFIG_IP_SCTP in net config") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-13 16:04:28 -07:00
Pavel Begunkov	d581076b6a	io_uring/rsrc: extract SCM file put helper SCM file accounting is a slow path and is only used for UNIX files. Extract a helper out of io_rsrc_file_put() that does the SCM unaccounting. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/58cc7bffc2ee96bec8c2b89274a51febcbfa5556.1681210788.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-12 12:09:41 -06:00
Pavel Begunkov	2933ae6eaa	io_uring/rsrc: refactor io_rsrc_node_switch We use io_rsrc_node_switch() coupled with io_rsrc_node_switch_start() for a bunch of cases including initialising ctx->rsrc_node, i.e. by passing NULL instead of rsrc_data. Leave it to only deal with actual node changing. For that, first remove it from io_uring_create() and add a function allocating the first node. Then also remove all calls to io_rsrc_node_switch() from files/buffers register as we already have a node installed and it does essentially nothing. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d146fe306ff98b1a5a60c997c252534f03d423d7.1681210788.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-12 12:09:41 -06:00
Pavel Begunkov	13c223962e	io_uring/rsrc: zero node's rsrc data on alloc struct io_rsrc_node::rsrc_data field is initialised on rsrc removal and shouldn't be used before that, still let's play safe and zero the field on alloc. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/09bd03cedc8da8a7974c5e6e4bf0489fd16593ab.1681210788.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-12 12:09:41 -06:00
Pavel Begunkov	528407b1e0	io_uring/rsrc: consolidate node caching We store one pre-allocated rsrc node in ->rsrc_backup_node, merge it with ->rsrc_node_cache. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/6d5410e51ccd29be7a716be045b51d6b371baef6.1681210788.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-12 12:09:41 -06:00
Pavel Begunkov	786788a8cf	io_uring/rsrc: add lockdep checks Add a lockdep chek to make sure that file and buffer updates hold ->uring_lock. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/961bbe6e433ec9bc0375127f23468b37b729df99.1681210788.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-12 12:09:41 -06:00
Pavel Begunkov	8ce4269eee	io_uring: add irq lockdep checks We don't post CQEs from the IRQ context, add a check catching that. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/f23f7a24dbe8027b3d37873fece2b6488f878b31.1681210788.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-12 12:09:41 -06:00
Pavel Begunkov	ceac766a55	io_uring/kbuf: remove extra ->buf_ring null check The kernel test robot complains about __io_remove_buffers(). io_uring/kbuf.c:221 __io_remove_buffers() warn: variable dereferenced before check 'bl->buf_ring' (see line 219) That check is not needed as ->buf_ring will always be set, so we can remove it and so silence the warning. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/9a632bbf749d9d911e605255652ce08d18e7d2c6.1681210788.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-12 12:09:41 -06:00
Pavel Begunkov	8b1df11f97	io_uring: shut io_prep_async_work warning io_uring/io_uring.c:432 io_prep_async_work() error: we previously assumed 'req->file' could be null (see line 425). Even though it's a false positive as there will not be REQ_F_ISREG set without a file, let's add a simple check to make the kernel test robot happy. We don't care about performance here, but assumingly it'll be optimised out by the compiler. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/a6cfbe92c74b789c0b4f046f7f98d19b1ca2e5b7.1681210788.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-12 12:09:41 -06:00
Jens Axboe	27a67079c0	io_uring/uring_cmd: take advantage of completion batching We know now what the completion context is for the uring_cmd completion handling, so use that to have io_req_task_complete() decide what the best way to complete the request is. This allows batching of the posted completions if we have multiple pending, rather than always doing them one-by-one. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-12 12:07:36 -06:00
Linus Torvalds	d3f05a4c42	io_uring-6.3-2023-04-06 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmQvSOkQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgptl3D/4hgUAWqfzPq4aBz/bzPLetrb+49+zyw865 Pva8XPrVaY6qDnkb4O2XGy3KkDo2LRgSjLc1tEqrwbgbpQL7ZGowoxx9/eo4FUJ+ sgMAwP4pAn4iJjiAJVA4R4cSN72fibgDlHqeZ7YrcQ9sYtgSgCcw1mAyLsA19d7i lHM6QW4KkQxL7P21rl9IjtpoTN/uYVl8CXIeJ5/VCnRYS5kigEzNKDbvLxx0OQgx Eca/qPuhv4k6RFu2WrhGKw/Xjbq3J1FzGFwKb8JPGdF8JrZpZAKkijWMlhlNJosL CG2x1x5eVHFTnDzqTQU8GZmkFSmmUeTa6I5p6IEHsvzb3fCsOxAZJAEL8bI9n/vP pQ72XvGP+dP6iN/2A5Yii2DVp2ERoyDiLtKh6kLNzpLQ15796nCJ4PmX1483lCIO ktMymI0gydRCW5Mhx7HBLK+5pZmcUUobNRcwOs8So2/eaOVlsPp3V1y/OeNMpKZG vrcWSad84+pdmEnoZsyKOeGOF5H8lqMJjQ4edSrx2xg/eo1/yATm+vziVegPsFj5 xYXKS3Wno5EhFc/IWn14ThiEFVku+HAFKXors8P+5P6YKjR4F2699u7GvnwFQ2GF oOc8lCesewftnxxJ4gbPpgfaf/6tctE4M0BoR3JLLC65EriAwnqRh+OiIK0nkABr TaRpCqTIhg== =32lQ -----END PGP SIGNATURE----- Merge tag 'io_uring-6.3-2023-04-06' of git://git.kernel.dk/linux Pull io_uring fixes from Jens Axboe: "Just two minor fixes for provided buffers - one where we could potentially leak a buffer, and one where the returned values was off-by-one in some cases" * tag 'io_uring-6.3-2023-04-06' of git://git.kernel.dk/linux: io_uring: fix memory leak when removing provided buffers io_uring: fix return value when removing provided buffers	2023-04-08 11:34:17 -07:00
Pavel Begunkov	360cd42c4e	io_uring: optimise io_req_local_work_add Chains of memory accesses are never good for performance. The req->task->io_uring->in_cancel in io_req_local_work_add() is there so that when a task is exiting via io_uring_try_cancel_requests() and starts waiting for completions, it gets woken up by every new task_work item queued. Do a little trick by announcing waiting in io_uring_try_cancel_requests(), making io_req_local_work_add() wake us up. We also need to check for deferred tw items after prepare_to_wait(TASK_INTERRUPTIBLE); Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/fb11597e9bbcb365901824f8c5c2cf0d6ee100d0.1680782017.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-06 16:24:36 -06:00
Pavel Begunkov	c66ae3ec38	io_uring: refactor __io_cq_unlock_post_flush() Separate ->task_complete path in __io_cq_unlock_post_flush(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/baa9b8d822f024e4ee01c40209dbbe38d9c8c11d.1680782017.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-06 16:23:28 -06:00
Pavel Begunkov	8751d15426	io_uring: reduce scheduling due to tw Every task_work will try to wake the task to be executed, which causes excessive scheduling and additional overhead. For some tw it's justified, but others won't do much but post a single CQE. When a task waits for multiple cqes, every such task_work will wake it up. Instead, the task may give a hint about how many cqes it waits for, io_req_local_work_add() will compare against it and skip wake ups if #cqes + #tw is not enough to satisfy the waiting condition. Task_work that uses the optimisation should be simple enough and never post more than one CQE. It's also ignored for non DEFER_TASKRUN rings. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d2b77e99d1e86624d8a69f7037d764b739dcd225.1680782017.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-06 16:23:28 -06:00
Pavel Begunkov	5150940079	io_uring: inline llist_add() We'll need to grab some information from the previous request in the tw list, inline llist_add(), it'll be used in the following patch. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/f0165493af7b379943c792114b972f331e7d7d10.1680782017.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-06 16:23:28 -06:00
Pavel Begunkov	8501fe70ae	io_uring: add tw add flags We pass 'allow_local' into io_req_task_work_add() but will need more flags. Replace it with a flags bit field and name this allow_local flag. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/4c0f01e7ef4e6feebfb199093cc995af7a19befa.1680782017.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-06 16:23:28 -06:00
Pavel Begunkov	6e7248adf8	io_uring: refactor io_cqring_wake() Instead of smp_mb() + __io_cqring_wake() in __io_cq_unlock_post_flush() use equivalent io_cqring_wake(). With that we can clean it up further and remove __io_cqring_wake(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/662ee5d898168ac206be06038525e97b64072a46.1680782017.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-06 16:23:28 -06:00
Pavel Begunkov	d73a572df2	io_uring: optimize local tw add ctx pinning We currently pin the ctx for io_req_local_work_add() with percpu_ref_get/put, which implies two rcu_read_lock/unlock pairs and some extra overhead on top in the fast path. Replace it with a pure rcu read and let io_ring_exit_work() synchronise against it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/cbdfcb6b232627f30e9e50ef91f13c4f05910247.1680782017.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-06 16:23:10 -06:00
Pavel Begunkov	ab1c590f5c	io_uring: move pinning out of io_req_local_work_add Move ctx pinning from io_req_local_work_add() to the caller, looks better and makes working with the code a bit easier. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/49c0dbed390b0d6d04cb942dd3592879fd5bfb1b.1680782017.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-06 16:22:07 -06:00
Jakub Kicinski	d9c960675a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Conflicts: drivers/net/ethernet/google/gve/gve.h 3ce934558097 ("gve: Secure enough bytes in the first TX desc for all TCP pkts") 75eaae158b1b ("gve: Add XDP DROP and TX support for GQI-QPL format") https://lore.kernel.org/all/20230406104927.45d176f5@canb.auug.org.au/ https://lore.kernel.org/all/c5872985-1a95-0bc8-9dcc-b6f23b439e9d@tessares.net/ Adjacent changes: net/can/isotp.c 051737439eae ("can: isotp: fix race between isotp_sendsmg() and isotp_release()") 96d1c81e6a04 ("can: isotp: add module parameter for maximum pdu size") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-06 12:01:20 -07:00
Jens Axboe	758d5d64b6	io_uring/uring_cmd: assign ioucmd->cmd at async prep time Rather than check this in the fast path issue, it makes more sense to just assign the copy of the data when we're setting it up anyway. This makes the code a bit cleaner, and removes the need for this check in the issue path. Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-05 09:30:18 -06:00
Pavel Begunkov	69bbc6ade9	io_uring/rsrc: add custom limit for node caching The number of entries in the rsrc node cache is limited to 512, which still seems unnecessarily large. Add per cache thresholds and set to to 32 for the rsrc node cache. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d0cd538b944dac0bf878e276fc0199f21e6bccea.1680576071.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-04 09:30:39 -06:00
Pavel Begunkov	757ef4682b	io_uring/rsrc: optimise io_rsrc_data refcounting Every struct io_rsrc_node takes a struct io_rsrc_data reference, which means all rsrc updates do 2 extra atomics. Replace atomics refcounting with a int as it's all done under ->uring_lock. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/e73c3d6820cf679532696d790b5b8fae23537213.1680576071.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-04 09:30:39 -06:00
Pavel Begunkov	1f2c8f610a	io_uring/rsrc: add lockdep sanity checks We should hold ->uring_lock while putting nodes with io_put_rsrc_node(), add a lockdep check for that. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/b50d5f156ac41450029796738c1dfd22a521df7a.1680576071.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-04 09:30:39 -06:00
Pavel Begunkov	9eae8655f9	io_uring/rsrc: cache struct io_rsrc_node Add allocation cache for struct io_rsrc_node, it's always allocated and put under ->uring_lock, so it doesn't need any extra synchronisation around caches. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/252a9d9ef9654e6467af30fdc02f57c0118fb76e.1680576071.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-04 09:30:39 -06:00
Pavel Begunkov	36b9818a5a	io_uring/rsrc: don't offload node free struct delayed_work rsrc_put_work was previously used to offload node freeing because io_rsrc_node_ref_zero() was previously called by RCU in the IRQ context. Now, as percpu refcounting is gone, we can do it eagerly at the spot without pushing it to a worker. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/13fb1aac1e8d068ad8fd4a0c6d0d157ab61b90c0.1680576071.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-04 09:30:39 -06:00
Pavel Begunkov	ff7c75ecaa	io_uring/rsrc: optimise io_rsrc_put allocation Every io_rsrc_node keeps a list of items to put, and all entries are kmalloc()'ed. However, it's quite often to queue up only one entry per node, so let's add an inline entry there to avoid extra allocations. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/c482c1c652c45c85ac52e67c974bc758a50fed5f.1680576071.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-04 09:30:39 -06:00
Pavel Begunkov	c824986c11	io_uring/rsrc: rename rsrc_list We have too many "rsrc" around which makes the name of struct io_rsrc_node::rsrc_list confusing. The field is responsible for keeping a list of files or buffers, so call it item_list and add comments around. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/3e34d4dfc1fdbb6b520f904ee6187c2ccf680efe.1680576071.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-04 09:30:39 -06:00
Pavel Begunkov	0a4813b1ab	io_uring/rsrc: kill rsrc_ref_lock We use ->rsrc_ref_lock spinlock to protect ->rsrc_ref_list in io_rsrc_node_ref_zero(). Now we removed pcpu refcounting, which means io_rsrc_node_ref_zero() is not executed from the irq context as an RCU callback anymore, and we also put it under ->uring_lock. io_rsrc_node_switch(), which queues up nodes into the list, is also protected by ->uring_lock, so we can safely get rid of ->rsrc_ref_lock. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/6b60af883c263551190b526a55ff2c9d5ae07141.1680576071.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-04 09:30:39 -06:00
Pavel Begunkov	ef8ae64ffa	io_uring/rsrc: protect node refs with uring_lock Currently, for nodes we have an atomic counter and some cached (non-atomic) refs protected by uring_lock. Let's put all ref manipulations under uring_lock and get rid of the atomic part. It's free as in all cases we care about we already hold the lock. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/25b142feed7d831008257d90c8b17c0115d4fc15.1680576071.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-04 09:30:39 -06:00
Pavel Begunkov	03adabe81a	io_uring: io_free_req() via tw io_free_req() is not often used but nevertheless problematic as there is no way to know the current context, it may be used from the submission path or even by an irq handler. Push it to a fresh context using task_work. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/3a92fe80bb068757e51aaa0b105cfbe8f5dfee9e.1680576071.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-04 09:30:39 -06:00
Pavel Begunkov	2ad4c6d080	io_uring: don't put nodes under spinlocks io_req_put_rsrc() doesn't need any locking, so move it out of a spinlock section in __io_req_complete_post() and adjust helpers. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d5b87a5f31270dade6805f7acafc4cc34b84b241.1680576071.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-04 09:30:39 -06:00
Pavel Begunkov	8e15c0e71b	io_uring/rsrc: keep cached refs per node We cache refs of the current node (i.e. ctx->rsrc_node) in ctx->rsrc_cached_refs. We'll be moving away from atomics, so move the cached refs in struct io_rsrc_node for now. It's a prep patch and shouldn't change anything in practise. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/9edc3669c1d71b06c2dca78b2b2b8bb9292738b9.1680576071.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-04 09:30:39 -06:00
Pavel Begunkov	b8fb5b4fdd	io_uring/rsrc: use non-pcpu refcounts for nodes One problem with the current rsrc infra is that often updates will generates lots of rsrc nodes, each carry pcpu refs. That takes quite a lot of memory, especially if there is a stall, and takes lots of CPU cycles. Only pcpu allocations takes >50 of CPU with a naive benchmark updating files in a loop. Replace pcpu refs with normal refcounting. There is already a hot path avoiding atomics / refs, but following patches will further improve it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/e9ed8a9457b331a26555ff9443afc64cdaab7247.1680576071.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-04 09:30:39 -06:00
Jens Axboe	e3ef728ff0	io_uring: cap io_sqring_entries() at SQ ring size We already do this manually for the !SQPOLL case, do it in general and we can also dump the ugly min3() in io_submit_sqes(). Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-03 07:16:15 -06:00
Jens Axboe	2ad57931db	io_uring: rename trace_io_uring_submit_sqe() tracepoint It has nothing to do with the SQE at this point, it's a request submission. While in there, get rid of the 'force_nonblock' argument which is also dead, as we only pass in true. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-03 07:16:15 -06:00
Pavel Begunkov	a282967c84	io_uring: encapsulate task_work state For task works we're passing around a bool pointer for whether the current ring is locked or not, let's wrap it in a structure, that will make it more opaque preventing abuse and will also help us to pass more info in the future if needed. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1ecec9483d58696e248d1bfd52cf62b04442df1d.1679931367.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-03 07:16:15 -06:00
Pavel Begunkov	13bfa6f15d	io_uring: remove extra tw trylocks Before cond_resched()'ing in handle_tw_list() we also drop the current ring context, and so the next loop iteration will need to pick/pin a new context and do trylock. The chunk removed by this patch was intended to be an optimisation covering exactly this case, i.e. retaking the lock after reschedule, but in reality it's skipped for the first iteration after resched as described and will keep hammering the lock if it's contended. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1ecec9483d58696e248d1bfd52cf62b04442df1d.1679931367.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-03 07:16:15 -06:00
Jens Axboe	07d99096e1	io_uring/io-wq: drop outdated comment Since the move to PF_IO_WORKER, we don't juggle memory context manually anymore. Remove that outdated part of the comment for __io_worker_idle(). Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-03 07:16:15 -06:00
Gabriel Krisman Bertazi	eb47943f22	io-wq: Drop struct io_wqe Since commit 0654b05e7e65 ("io_uring: One wqe per wq"), we have just a single io_wqe instance embedded per io_wq. Drop the extra structure in favor of accessing struct io_wq directly, cleaning up quite a bit of dereferences and backpointers. No functional changes intended. Tested with liburing's testsuite and mmtests performance microbenchmarks. I didn't observe any performance regressions. Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de> Link: https://lore.kernel.org/r/20230322011628.23359-2-krisman@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-03 07:16:15 -06:00
Gabriel Krisman Bertazi	dfd63baf89	io-wq: Move wq accounting to io_wq Since we now have a single io_wqe per io_wq instead of per-node, and in preparation to its removal, move the accounting into the parent structure. Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de> Link: https://lore.kernel.org/r/20230322011628.23359-2-krisman@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-03 07:16:14 -06:00
Jens Axboe	fcb46c0ccc	io_uring/kbuf: disallow mapping a badly aligned provided ring buffer On at least parisc, we have strict requirements on how we virtually map an address that is shared between the application and the kernel. On these platforms, IOU_PBUF_RING_MMAP should be used when setting up a shared ring buffer for provided buffers. If the application is mapping these pages and asking the kernel to pin+map them as well, then we have no control over what virtual address we get in the kernel. For that case, do a sanity check if SHM_COLOUR is defined, and disallow the mapping request. The application must fall back to using IOU_PBUF_RING_MMAP for this case, and liburing will do that transparently with the set of helpers that it has. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-03 07:16:14 -06:00
Breno Leitao	e1fe7ee885	io_uring: Add KASAN support for alloc_caches Add support for KASAN in the alloc_caches (apoll and netmsg_cache). Thus, if something touches the unused caches, it will raise a KASAN warning/exception. It poisons the object when the object is put to the cache, and unpoisons it when the object is gotten or freed. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de> Link: https://lore.kernel.org/r/20230223164353.2839177-2-leitao@debian.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-03 07:16:14 -06:00
Breno Leitao	efba1a9e65	io_uring: Move from hlist to io_wq_work_node Having cache entries linked using the hlist format brings no benefit, and also requires an unnecessary extra pointer address per cache entry. Use the internal io_wq_work_node single-linked list for the internal alloc caches (async_msghdr and async_poll) This is required to be able to use KASAN on cache entries, since we do not need to touch unused (and poisoned) cache entries when adding more entries to the list. Suggested-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/20230223164353.2839177-2-leitao@debian.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-03 07:16:12 -06:00
Breno Leitao	da64d6db3b	io_uring: One wqe per wq Right now io_wq allocates one io_wqe per NUMA node. As io_wq is now bound to a task, the task basically uses only the NUMA local io_wqe, and almost never changes NUMA nodes, thus, the other wqes are mostly unused. Allocate just one io_wqe embedded into io_wq, and uses all possible cpus (cpu_possible_mask) in the io_wqe->cpumask. Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/20230310201107.4020580-1-leitao@debian.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-03 07:14:21 -06:00
Jens Axboe	c56e022c0a	io_uring: add support for user mapped provided buffer ring The ring mapped provided buffer rings rely on the application allocating the memory for the ring, and then the kernel will map it. This generally works fine, but runs into issues on some architectures where we need to be able to ensure that the kernel and application virtual address for the ring play nicely together. This at least impacts architectures that set SHM_COLOUR, but potentially also anyone setting SHMLBA. To use this variant of ring provided buffers, the application need not allocate any memory for the ring. Instead the kernel will do so, and the allocation must subsequently call mmap(2) on the ring with the offset set to: IORING_OFF_PBUF_RING \| (bgid << IORING_OFF_PBUF_SHIFT) to get a virtual address for the buffer ring. Normally the application would allocate a suitable piece of memory (and correctly aligned) and simply pass that in via io_uring_buf_reg.ring_addr and the kernel would map it. Outside of the setup differences, the kernel allocate + user mapped provided buffer ring works exactly the same. Acked-by: Helge Deller <deller@gmx.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-03 07:14:21 -06:00
Jens Axboe	81cf17cd3a	io_uring/kbuf: rename struct io_uring_buf_reg 'pad' to'flags' In preparation for allowing flags to be set for registration, rename the padding and use it for that. Acked-by: Helge Deller <deller@gmx.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-03 07:14:21 -06:00
Jens Axboe	25a2c188a0	io_uring/kbuf: add buffer_list->is_mapped member Rather than rely on checking buffer_list->buf_pages or ->buf_nr_pages, add a separate member that tracks if this is a ring mapped provided buffer list or not. Acked-by: Helge Deller <deller@gmx.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-03 07:14:20 -06:00
Jens Axboe	ba56b63242	io_uring/kbuf: move pinning of provided buffer ring into helper In preparation for allowing the kernel to allocate the provided buffer rings and have the application mmap it instead, abstract out the current method of pinning and mapping the user allocated ring. No functional changes intended in this patch. Acked-by: Helge Deller <deller@gmx.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-03 07:14:20 -06:00

... 3 4 5 6 7 ...

785 Commits