Jens Axboe
34d2bfe7d4
io_uring: improve task work cache utilization
While profiling task_work intensive workloads, I noticed that most of the time in tctx_task_work() is spending stalled on loading 'req'. This is one of the unfortunate side effects of using linked lists, particularly when they end up being passe around. Prefetch the next request, if there is one. There's a sufficient amount of work in between that this makes it available for the next loop. While fiddling with the cache layout, move the link outside of the hot completion cacheline. It's rarely used in hot workloads, so better to bring in kbuf which is used for networked loads with provided buffers. This reduces tctx_task_work() overhead from ~3% to 1-1.5% in my testing. Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linux kernel ============ There are several guides for kernel developers and users. These guides can be rendered in a number of formats, like HTML and PDF. Please read Documentation/admin-guide/README.rst first. In order to build the documentation, use ``make htmldocs`` or ``make pdfdocs``. The formatted documentation can also be read online at: https://www.kernel.org/doc/html/latest/ There are various text files in the Documentation/ subdirectory, several of them using the Restructured Text markup notation. Please read the Documentation/process/changes.rst file, as it contains the requirements for building and running the kernel, and information about the problems which may result by upgrading your kernel.
Description
Languages
C
97.6%
Assembly
1%
Shell
0.5%
Python
0.3%
Makefile
0.3%