linux

iv/linux

Author	SHA1	Message	Date
Pavel Begunkov	d73a572df2	io_uring: optimize local tw add ctx pinning We currently pin the ctx for io_req_local_work_add() with percpu_ref_get/put, which implies two rcu_read_lock/unlock pairs and some extra overhead on top in the fast path. Replace it with a pure rcu read and let io_ring_exit_work() synchronise against it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/cbdfcb6b232627f30e9e50ef91f13c4f05910247.1680782017.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-06 16:23:10 -06:00
Pavel Begunkov	ab1c590f5c	io_uring: move pinning out of io_req_local_work_add Move ctx pinning from io_req_local_work_add() to the caller, looks better and makes working with the code a bit easier. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/49c0dbed390b0d6d04cb942dd3592879fd5bfb1b.1680782017.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-06 16:22:07 -06:00
Jakub Kicinski	d9c960675a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Conflicts: drivers/net/ethernet/google/gve/gve.h `3ce9345580` ("gve: Secure enough bytes in the first TX desc for all TCP pkts") `75eaae158b` ("gve: Add XDP DROP and TX support for GQI-QPL format") https://lore.kernel.org/all/20230406104927.45d176f5@canb.auug.org.au/ https://lore.kernel.org/all/c5872985-1a95-0bc8-9dcc-b6f23b439e9d@tessares.net/ Adjacent changes: net/can/isotp.c `051737439e` ("can: isotp: fix race between isotp_sendsmg() and isotp_release()") `96d1c81e6a` ("can: isotp: add module parameter for maximum pdu size") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-06 12:01:20 -07:00
Jens Axboe	758d5d64b6	io_uring/uring_cmd: assign ioucmd->cmd at async prep time Rather than check this in the fast path issue, it makes more sense to just assign the copy of the data when we're setting it up anyway. This makes the code a bit cleaner, and removes the need for this check in the issue path. Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-05 09:30:18 -06:00
Pavel Begunkov	69bbc6ade9	io_uring/rsrc: add custom limit for node caching The number of entries in the rsrc node cache is limited to 512, which still seems unnecessarily large. Add per cache thresholds and set to to 32 for the rsrc node cache. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d0cd538b944dac0bf878e276fc0199f21e6bccea.1680576071.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-04 09:30:39 -06:00
Pavel Begunkov	757ef4682b	io_uring/rsrc: optimise io_rsrc_data refcounting Every struct io_rsrc_node takes a struct io_rsrc_data reference, which means all rsrc updates do 2 extra atomics. Replace atomics refcounting with a int as it's all done under ->uring_lock. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/e73c3d6820cf679532696d790b5b8fae23537213.1680576071.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-04 09:30:39 -06:00
Pavel Begunkov	1f2c8f610a	io_uring/rsrc: add lockdep sanity checks We should hold ->uring_lock while putting nodes with io_put_rsrc_node(), add a lockdep check for that. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/b50d5f156ac41450029796738c1dfd22a521df7a.1680576071.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-04 09:30:39 -06:00
Pavel Begunkov	9eae8655f9	io_uring/rsrc: cache struct io_rsrc_node Add allocation cache for struct io_rsrc_node, it's always allocated and put under ->uring_lock, so it doesn't need any extra synchronisation around caches. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/252a9d9ef9654e6467af30fdc02f57c0118fb76e.1680576071.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-04 09:30:39 -06:00
Pavel Begunkov	36b9818a5a	io_uring/rsrc: don't offload node free struct delayed_work rsrc_put_work was previously used to offload node freeing because io_rsrc_node_ref_zero() was previously called by RCU in the IRQ context. Now, as percpu refcounting is gone, we can do it eagerly at the spot without pushing it to a worker. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/13fb1aac1e8d068ad8fd4a0c6d0d157ab61b90c0.1680576071.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-04 09:30:39 -06:00
Pavel Begunkov	ff7c75ecaa	io_uring/rsrc: optimise io_rsrc_put allocation Every io_rsrc_node keeps a list of items to put, and all entries are kmalloc()'ed. However, it's quite often to queue up only one entry per node, so let's add an inline entry there to avoid extra allocations. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/c482c1c652c45c85ac52e67c974bc758a50fed5f.1680576071.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-04 09:30:39 -06:00
Pavel Begunkov	c824986c11	io_uring/rsrc: rename rsrc_list We have too many "rsrc" around which makes the name of struct io_rsrc_node::rsrc_list confusing. The field is responsible for keeping a list of files or buffers, so call it item_list and add comments around. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/3e34d4dfc1fdbb6b520f904ee6187c2ccf680efe.1680576071.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-04 09:30:39 -06:00
Pavel Begunkov	0a4813b1ab	io_uring/rsrc: kill rsrc_ref_lock We use ->rsrc_ref_lock spinlock to protect ->rsrc_ref_list in io_rsrc_node_ref_zero(). Now we removed pcpu refcounting, which means io_rsrc_node_ref_zero() is not executed from the irq context as an RCU callback anymore, and we also put it under ->uring_lock. io_rsrc_node_switch(), which queues up nodes into the list, is also protected by ->uring_lock, so we can safely get rid of ->rsrc_ref_lock. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/6b60af883c263551190b526a55ff2c9d5ae07141.1680576071.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-04 09:30:39 -06:00
Pavel Begunkov	ef8ae64ffa	io_uring/rsrc: protect node refs with uring_lock Currently, for nodes we have an atomic counter and some cached (non-atomic) refs protected by uring_lock. Let's put all ref manipulations under uring_lock and get rid of the atomic part. It's free as in all cases we care about we already hold the lock. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/25b142feed7d831008257d90c8b17c0115d4fc15.1680576071.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-04 09:30:39 -06:00
Pavel Begunkov	03adabe81a	io_uring: io_free_req() via tw io_free_req() is not often used but nevertheless problematic as there is no way to know the current context, it may be used from the submission path or even by an irq handler. Push it to a fresh context using task_work. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/3a92fe80bb068757e51aaa0b105cfbe8f5dfee9e.1680576071.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-04 09:30:39 -06:00
Pavel Begunkov	2ad4c6d080	io_uring: don't put nodes under spinlocks io_req_put_rsrc() doesn't need any locking, so move it out of a spinlock section in __io_req_complete_post() and adjust helpers. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d5b87a5f31270dade6805f7acafc4cc34b84b241.1680576071.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-04 09:30:39 -06:00
Pavel Begunkov	8e15c0e71b	io_uring/rsrc: keep cached refs per node We cache refs of the current node (i.e. ctx->rsrc_node) in ctx->rsrc_cached_refs. We'll be moving away from atomics, so move the cached refs in struct io_rsrc_node for now. It's a prep patch and shouldn't change anything in practise. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/9edc3669c1d71b06c2dca78b2b2b8bb9292738b9.1680576071.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-04 09:30:39 -06:00
Pavel Begunkov	b8fb5b4fdd	io_uring/rsrc: use non-pcpu refcounts for nodes One problem with the current rsrc infra is that often updates will generates lots of rsrc nodes, each carry pcpu refs. That takes quite a lot of memory, especially if there is a stall, and takes lots of CPU cycles. Only pcpu allocations takes >50 of CPU with a naive benchmark updating files in a loop. Replace pcpu refs with normal refcounting. There is already a hot path avoiding atomics / refs, but following patches will further improve it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/e9ed8a9457b331a26555ff9443afc64cdaab7247.1680576071.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-04 09:30:39 -06:00
Jens Axboe	e3ef728ff0	io_uring: cap io_sqring_entries() at SQ ring size We already do this manually for the !SQPOLL case, do it in general and we can also dump the ugly min3() in io_submit_sqes(). Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-03 07:16:15 -06:00
Jens Axboe	2ad57931db	io_uring: rename trace_io_uring_submit_sqe() tracepoint It has nothing to do with the SQE at this point, it's a request submission. While in there, get rid of the 'force_nonblock' argument which is also dead, as we only pass in true. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-03 07:16:15 -06:00
Pavel Begunkov	a282967c84	io_uring: encapsulate task_work state For task works we're passing around a bool pointer for whether the current ring is locked or not, let's wrap it in a structure, that will make it more opaque preventing abuse and will also help us to pass more info in the future if needed. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1ecec9483d58696e248d1bfd52cf62b04442df1d.1679931367.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-03 07:16:15 -06:00
Pavel Begunkov	13bfa6f15d	io_uring: remove extra tw trylocks Before cond_resched()'ing in handle_tw_list() we also drop the current ring context, and so the next loop iteration will need to pick/pin a new context and do trylock. The chunk removed by this patch was intended to be an optimisation covering exactly this case, i.e. retaking the lock after reschedule, but in reality it's skipped for the first iteration after resched as described and will keep hammering the lock if it's contended. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1ecec9483d58696e248d1bfd52cf62b04442df1d.1679931367.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-03 07:16:15 -06:00
Jens Axboe	07d99096e1	io_uring/io-wq: drop outdated comment Since the move to PF_IO_WORKER, we don't juggle memory context manually anymore. Remove that outdated part of the comment for __io_worker_idle(). Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-03 07:16:15 -06:00
Gabriel Krisman Bertazi	eb47943f22	io-wq: Drop struct io_wqe Since commit 0654b05e7e65 ("io_uring: One wqe per wq"), we have just a single io_wqe instance embedded per io_wq. Drop the extra structure in favor of accessing struct io_wq directly, cleaning up quite a bit of dereferences and backpointers. No functional changes intended. Tested with liburing's testsuite and mmtests performance microbenchmarks. I didn't observe any performance regressions. Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de> Link: https://lore.kernel.org/r/20230322011628.23359-2-krisman@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-03 07:16:15 -06:00
Gabriel Krisman Bertazi	dfd63baf89	io-wq: Move wq accounting to io_wq Since we now have a single io_wqe per io_wq instead of per-node, and in preparation to its removal, move the accounting into the parent structure. Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de> Link: https://lore.kernel.org/r/20230322011628.23359-2-krisman@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-03 07:16:14 -06:00
Jens Axboe	fcb46c0ccc	io_uring/kbuf: disallow mapping a badly aligned provided ring buffer On at least parisc, we have strict requirements on how we virtually map an address that is shared between the application and the kernel. On these platforms, IOU_PBUF_RING_MMAP should be used when setting up a shared ring buffer for provided buffers. If the application is mapping these pages and asking the kernel to pin+map them as well, then we have no control over what virtual address we get in the kernel. For that case, do a sanity check if SHM_COLOUR is defined, and disallow the mapping request. The application must fall back to using IOU_PBUF_RING_MMAP for this case, and liburing will do that transparently with the set of helpers that it has. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-03 07:16:14 -06:00
Breno Leitao	e1fe7ee885	io_uring: Add KASAN support for alloc_caches Add support for KASAN in the alloc_caches (apoll and netmsg_cache). Thus, if something touches the unused caches, it will raise a KASAN warning/exception. It poisons the object when the object is put to the cache, and unpoisons it when the object is gotten or freed. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de> Link: https://lore.kernel.org/r/20230223164353.2839177-2-leitao@debian.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-03 07:16:14 -06:00
Breno Leitao	efba1a9e65	io_uring: Move from hlist to io_wq_work_node Having cache entries linked using the hlist format brings no benefit, and also requires an unnecessary extra pointer address per cache entry. Use the internal io_wq_work_node single-linked list for the internal alloc caches (async_msghdr and async_poll) This is required to be able to use KASAN on cache entries, since we do not need to touch unused (and poisoned) cache entries when adding more entries to the list. Suggested-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/20230223164353.2839177-2-leitao@debian.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-03 07:16:12 -06:00
Breno Leitao	da64d6db3b	io_uring: One wqe per wq Right now io_wq allocates one io_wqe per NUMA node. As io_wq is now bound to a task, the task basically uses only the NUMA local io_wqe, and almost never changes NUMA nodes, thus, the other wqes are mostly unused. Allocate just one io_wqe embedded into io_wq, and uses all possible cpus (cpu_possible_mask) in the io_wqe->cpumask. Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/20230310201107.4020580-1-leitao@debian.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-03 07:14:21 -06:00
Jens Axboe	c56e022c0a	io_uring: add support for user mapped provided buffer ring The ring mapped provided buffer rings rely on the application allocating the memory for the ring, and then the kernel will map it. This generally works fine, but runs into issues on some architectures where we need to be able to ensure that the kernel and application virtual address for the ring play nicely together. This at least impacts architectures that set SHM_COLOUR, but potentially also anyone setting SHMLBA. To use this variant of ring provided buffers, the application need not allocate any memory for the ring. Instead the kernel will do so, and the allocation must subsequently call mmap(2) on the ring with the offset set to: IORING_OFF_PBUF_RING \| (bgid << IORING_OFF_PBUF_SHIFT) to get a virtual address for the buffer ring. Normally the application would allocate a suitable piece of memory (and correctly aligned) and simply pass that in via io_uring_buf_reg.ring_addr and the kernel would map it. Outside of the setup differences, the kernel allocate + user mapped provided buffer ring works exactly the same. Acked-by: Helge Deller <deller@gmx.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-03 07:14:21 -06:00
Jens Axboe	81cf17cd3a	io_uring/kbuf: rename struct io_uring_buf_reg 'pad' to'flags' In preparation for allowing flags to be set for registration, rename the padding and use it for that. Acked-by: Helge Deller <deller@gmx.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-03 07:14:21 -06:00
Jens Axboe	25a2c188a0	io_uring/kbuf: add buffer_list->is_mapped member Rather than rely on checking buffer_list->buf_pages or ->buf_nr_pages, add a separate member that tracks if this is a ring mapped provided buffer list or not. Acked-by: Helge Deller <deller@gmx.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-03 07:14:20 -06:00
Jens Axboe	ba56b63242	io_uring/kbuf: move pinning of provided buffer ring into helper In preparation for allowing the kernel to allocate the provided buffer rings and have the application mmap it instead, abstract out the current method of pinning and mapping the user allocated ring. No functional changes intended in this patch. Acked-by: Helge Deller <deller@gmx.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-03 07:14:20 -06:00
Helge Deller	d808459b2e	io_uring: Adjust mapping wrt architecture aliasing requirements Some architectures have memory cache aliasing requirements (e.g. parisc) if memory is shared between userspace and kernel. This patch fixes the kernel to return an aliased address when asked by userspace via mmap(). Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-03 07:14:20 -06:00
Jens Axboe	d4755e1538	io_uring: avoid hashing O_DIRECT writes if the filesystem doesn't need it io_uring hashes writes to a given file/inode so that it can serialize them. This is useful if the file system needs exclusive access to the file to perform the write, as otherwise we end up with a ton of io-wq threads trying to lock the inode at the same time. This can cause excessive system time. But if the file system has flagged that it supports parallel O_DIRECT writes, then there's no need to serialize the writes. Check for that through FMODE_DIO_PARALLEL_WRITE and don't hash it if we don't need to. In a basic test of 8 threads writing to a file on XFS on a gen2 Optane, with each thread writing in 4k chunks, it improves performance from ~1350K IOPS (or ~5290MiB/sec) to ~1410K IOPS (or ~5500MiB/sec). Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-03 07:14:20 -06:00
Wojciech Lukowicz	b4a72c0589	io_uring: fix memory leak when removing provided buffers When removing provided buffers, io_buffer structs are not being disposed of, leading to a memory leak. They can't be freed individually, because they are allocated in page-sized groups. They need to be added to some free list instead, such as io_buffers_cache. All callers already hold the lock protecting it, apart from when destroying buffers, so had to extend the lock there. Fixes: `cc3cec8367` ("io_uring: speedup provided buffer handling") Signed-off-by: Wojciech Lukowicz <wlukowicz01@gmail.com> Link: https://lore.kernel.org/r/20230401195039.404909-2-wlukowicz01@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-01 16:52:12 -06:00
Wojciech Lukowicz	c0921e51da	io_uring: fix return value when removing provided buffers When a request to remove buffers is submitted, and the given number to be removed is larger than available in the specified buffer group, the resulting CQE result will be the number of removed buffers + 1, which is 1 more than it should be. Previously, the head was part of the list and it got removed after the loop, so the increment was needed. Now, the head is not an element of the list, so the increment shouldn't be there anymore. Fixes: `dbc7d452e7` ("io_uring: manage provided buffers strictly ordered") Signed-off-by: Wojciech Lukowicz <wlukowicz01@gmail.com> Link: https://lore.kernel.org/r/20230401195039.404909-2-wlukowicz01@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-01 16:52:12 -06:00
Linus Torvalds	f3fa7f026e	Merge tag 'io_uring-6.3-2023-03-30' of git://git.kernel.dk/linux Pull io_uring fixes from Jens Axboe: - Fix a regression with the poll retry, introduced in this merge window (me) - Fix a regression with the alloc cache not decrementing the member count on removal. Also a regression from this merge window (Pavel) - Fix race around rsrc node grabbing (Pavel) * tag 'io_uring-6.3-2023-03-30' of git://git.kernel.dk/linux: io_uring: fix poll/netmsg alloc caches io_uring/rsrc: fix rogue rsrc node grabbing io_uring/poll: clear single/double poll flags on poll arming	2023-03-31 12:30:13 -07:00
Jakub Kicinski	79548b7984	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Conflicts: drivers/net/ethernet/mediatek/mtk_ppe.c `3fbe4d8c0e` ("net: ethernet: mtk_eth_soc: ppe: add support for flow accounting") `924531326e` ("net: ethernet: mtk_eth_soc: add missing ppe cache flush when deleting a flow") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-03-30 14:43:03 -07:00
Jens Axboe	95e49cf837	iov_iter: add iter_iov_addr() and iter_iov_len() helpers These just return the address and length of the current iovec segment in the iterator. Convert existing iov_iter_iovec() users to use them instead of getting a copy of the current vec. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-03-30 08:12:29 -06:00
Jens Axboe	de4f5fed3f	iov_iter: add iter_iovec() helper This returns a pointer to the current iovec entry in the iterator. Only useful with ITER_IOVEC right now, but it prepares us to treat ITER_UBUF and ITER_IOVEC identically for the first segment. Rename struct iov_iter->iov to iov_iter->__iov to find any potentially troublesome spots, and also to prevent anyone from adding new code that accesses iter->iov directly. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-03-30 08:12:29 -06:00
Pavel Begunkov	fd30d1cdcc	io_uring: fix poll/netmsg alloc caches We increase cache->nr_cached when we free into the cache but don't decrease when we take from it, so in some time we'll get an empty cache with cache->nr_cached larger than IO_ALLOC_CACHE_MAX, that fails io_alloc_cache_put() and effectively disables caching. Fixes: `9b797a37c4` ("io_uring: add abstraction around apoll cache") Cc: stable@vger.kernel.org Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-03-30 06:53:42 -06:00
Pavel Begunkov	4ff0b50de8	io_uring/rsrc: fix rogue rsrc node grabbing We should not be looking at ctx->rsrc_node and anyhow modifying the node without holding uring_lock, grabbing references in such a way is not safe either. Cc: stable@vger.kernel.org Fixes: `5106dd6e74` ("io_uring: propagate issue_flags state down to file assignment") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1202ede2d7bb90136e3482b2b84aad9ed483e5d6.1680098433.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-03-29 09:23:46 -06:00
Jens Axboe	005308f7bd	io_uring/poll: clear single/double poll flags on poll arming Unless we have at least one entry queued, then don't call into io_poll_remove_entries(). Normally this isn't possible, but if we retry poll then we can have ->nr_entries cleared again as we're setting it up. If this happens for a poll retry, then we'll still have at least REQ_F_SINGLE_POLL set. io_poll_remove_entries() then thinks it has entries to remove. Clear REQ_F_SINGLE_POLL and REQ_F_DOUBLE_POLL unconditionally when arming a poll request. Fixes: `c16bda3759` ("io_uring/poll: allow some retries for poll triggering spuriously") Cc: stable@vger.kernel.org Reported-by: Pengfei Xu <pengfei.xu@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-03-28 07:09:01 -06:00
Linus Torvalds	83511470af	Merge tag 'block-6.3-2023-03-24' of git://git.kernel.dk/linux Pull block fixes from Jens Axboe: - NVMe pull request via Christoph: - Send Identify with CNS 06h only to I/O controllers (Martin George) - Fix nvme_tcp_term_pdu to match spec (Caleb Sander) - Pass in issue_flags for uring_cmd, so the end_io handlers don't need to assume what the right context is (me) - Fix for ublk, marking it as LIVE before adding it to avoid races on the initial IO (Ming) * tag 'block-6.3-2023-03-24' of git://git.kernel.dk/linux: nvme-tcp: fix nvme_tcp_term_pdu to match spec nvme: send Identify with CNS 06h only to I/O controllers block/io_uring: pass in issue_flags for uring_cmd task_work handling block: ublk_drv: mark device as LIVE before adding disk	2023-03-24 14:10:39 -07:00
Savino Dicanosa	02a4d923e4	io_uring/rsrc: fix null-ptr-deref in io_file_bitmap_get() When fixed files are unregistered, file_alloc_end and alloc_hint are not cleared. This can later cause a NULL pointer dereference in io_file_bitmap_get() if auto index selection is enabled via IORING_FILE_INDEX_ALLOC: [ 6.519129] BUG: kernel NULL pointer dereference, address: 0000000000000000 [...] [ 6.541468] RIP: 0010:_find_next_zero_bit+0x1a/0x70 [...] [ 6.560906] Call Trace: [ 6.561322] <TASK> [ 6.561672] io_file_bitmap_get+0x38/0x60 [ 6.562281] io_fixed_fd_install+0x63/0xb0 [ 6.562851] ? __pfx_io_socket+0x10/0x10 [ 6.563396] io_socket+0x93/0xf0 [ 6.563855] ? __pfx_io_socket+0x10/0x10 [ 6.564411] io_issue_sqe+0x5b/0x3d0 [ 6.564914] io_submit_sqes+0x1de/0x650 [ 6.565452] __do_sys_io_uring_enter+0x4fc/0xb20 [ 6.566083] ? __do_sys_io_uring_register+0x11e/0xd80 [ 6.566779] do_syscall_64+0x3c/0x90 [ 6.567247] entry_SYSCALL_64_after_hwframe+0x72/0xdc [...] To fix the issue, set file alloc range and alloc_hint to zero after file tables are freed. Cc: stable@vger.kernel.org Fixes: `4278a0deb1` ("io_uring: defer alloc_hint update to io_file_bitmap_set()") Signed-off-by: Savino Dicanosa <sd7.dev@pm.me> [axboe: add explicit bitmap == NULL check as well] Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-03-22 11:04:55 -06:00
Jens Axboe	74e2e17ee1	io_uring/net: avoid sending -ECONNABORTED on repeated connection requests Since io_uring does nonblocking connect requests, if we do two repeated ones without having a listener, the second will get -ECONNABORTED rather than the expected -ECONNREFUSED. Treat -ECONNABORTED like a normal retry condition if we're nonblocking, if we haven't already seen it. Cc: stable@vger.kernel.org Fixes: `3fb1bd6881` ("io_uring/net: handle -EINPROGRESS correct for IORING_OP_CONNECT") Link: https://github.com/axboe/liburing/issues/828 Reported-by: Hui, Chunyang <sanqian.hcy@antgroup.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-03-20 20:44:45 -06:00
Jens Axboe	9d2789ac9d	block/io_uring: pass in issue_flags for uring_cmd task_work handling io_uring_cmd_done() currently assumes that the uring_lock is held when invoked, and while it generally is, this is not guaranteed. Pass in the issue_flags associated with it, so that we have IO_URING_F_UNLOCKED available to be able to lock the CQ ring appropriately when completing events. Cc: stable@vger.kernel.org Fixes: `ee692a21e9` ("fs,io_uring: add infrastructure for uring-cmd") Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-03-20 20:01:25 -06:00
Keith Busch	54bdd67d0f	blk-mq: remove hybrid polling io_uring provides the only way user space can poll completions, and that always sets BLK_POLL_NOSLEEP. This effectively makes hybrid polling dead code, so remove it and everything supporting it. Hybrid polling was effectively killed off with `9650b453a3`, "block: ignore RWF_HIPRI hint for sync dio", but still potentially reachable through io_uring until `d729cf9acb`, "io_uring: don't sleep when polling for I/O", but hybrid polling probably should not have been reachable through that async interface from the beginning. Fixes: `9650b453a3` ("block: ignore RWF_HIPRI hint for sync dio") Fixes: `d729cf9acb` ("io_uring: don't sleep when polling for I/O") Signed-off-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20230320194926.3353144-1-kbusch@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-03-20 15:30:03 -06:00
Jakub Kicinski	1118aa4c70	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net net/wireless/nl80211.c `b27f07c50a` ("wifi: nl80211: fix puncturing bitmap policy") `cbbaf2bb82` ("wifi: nl80211: add a command to enable/disable HW timestamping") https://lore.kernel.org/all/20230314105421.3608efae@canb.auug.org.au tools/testing/selftests/net/Makefile `62199e3f16` ("selftests: net: Add VXLAN MDB test") `13715acf8a` ("selftest: Add test for bind() conflicts.") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-03-17 16:29:25 -07:00
Pavel Begunkov	d2acf78908	io_uring/rsrc: fix folio accounting \| BUG: Bad page state in process kworker/u8:0 pfn:5c001 \| page:00000000bfda61c8 refcount:0 mapcount:0 mapping:0000000000000000 index:0x20001 pfn:0x5c001 \| head:0000000011409842 order:9 entire_mapcount:0 nr_pages_mapped:0 pincount:1 \| anon flags: 0x3fffc00000b0004(uptodate\|head\|mappedtodisk\|swapbacked\|node=0\|zone=0\|lastcpupid=0xffff) \| raw: 03fffc0000000000 fffffc0000700001 ffffffff00700903 0000000100000000 \| raw: 0000000000000200 0000000000000000 00000000ffffffff 0000000000000000 \| head: 03fffc00000b0004 dead000000000100 dead000000000122 ffff00000a809dc1 \| head: 0000000000020000 0000000000000000 00000000ffffffff 0000000000000000 \| page dumped because: nonzero pincount \| CPU: 3 PID: 9 Comm: kworker/u8:0 Not tainted 6.3.0-rc2-00001-gc6811bf0cd87 #1 \| Hardware name: linux,dummy-virt (DT) \| Workqueue: events_unbound io_ring_exit_work \| Call trace: \| dump_backtrace+0x13c/0x208 \| show_stack+0x34/0x58 \| dump_stack_lvl+0x150/0x1a8 \| dump_stack+0x20/0x30 \| bad_page+0xec/0x238 \| free_tail_pages_check+0x280/0x350 \| free_pcp_prepare+0x60c/0x830 \| free_unref_page+0x50/0x498 \| free_compound_page+0xcc/0x100 \| free_transhuge_page+0x1f0/0x2b8 \| destroy_large_folio+0x80/0xc8 \| __folio_put+0xc4/0xf8 \| gup_put_folio+0xd0/0x250 \| unpin_user_page+0xcc/0x128 \| io_buffer_unmap+0xec/0x2c0 \| __io_sqe_buffers_unregister+0xa4/0x1e0 \| io_ring_exit_work+0x68c/0x1188 \| process_one_work+0x91c/0x1a58 \| worker_thread+0x48c/0xe30 \| kthread+0x278/0x2f0 \| ret_from_fork+0x10/0x20 Mark reports an issue with the recent patches coalescing compound pages while registering them in io_uring. The reason is that we try to drop excessive references with folio_put_refs(), but pages were acquired with pin_user_pages(), which has extra accounting and so should be put down with matching unpin_user_pages() or at least gup_put_folio(). As a fix unpin_user_pages() all but first page instead, and let's figure out a better API after. Fixes: `57bebf807e` ("io_uring/rsrc: optimise registered huge pages") Reported-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Tested-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/10efd5507d6d1f05ea0f3c601830e08767e189bd.1678980230.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-03-16 09:32:18 -06:00

1 2 3 4 5 ...

617 Commits