34 Commits

Author SHA1 Message Date
Dylan Yudaken
f75d5036d0 io_uring: trace local task work run
Add tracing for io_run_local_task_work

Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220830125013.570060-8-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-21 10:30:42 -06:00
Stefan Roesch
1c849b481b io_uring: Add tracepoint for short writes
This adds the io_uring_short_write tracepoint to io_uring. A short write
is issued if not all pages that are required for a write are in the page
cache and the async buffered writes have to return EAGAIN.

Signed-off-by: Stefan Roesch <shr@fb.com>
Link: https://lore.kernel.org/r/20220616212221.2024518-13-shr@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:32 -06:00
Dylan Yudaken
9b26e811e9 io_uring: fix io_uring_cqe_overflow trace format
Make the trace format consistent with io_uring_complete for cflags

Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220630091231.1456789-12-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:17 -06:00
Dylan Yudaken
eccd880185 io_uring: add trace event for running task work
This is useful for investigating if task_work is batching

Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220622134028.2013417-8-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:15 -06:00
Pavel Begunkov
48863ffd3e io_uring: clean up tracing events
We have lots of trace events accepting an io_uring request and wanting
to print some of its fields like user_data, opcode, flags and so on.
However, as trace points were unaware of io_uring structures, we had to
pass all the fields as arguments. Teach trace/events/io_uring.h about
struct io_kiocb and stop the misery of passing a horde of arguments to
trace helpers.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/40ff72f92798114e56d400f2b003beb6cde6ef53.1655384063.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:14 -06:00
Dylan Yudaken
e70b64a3f2 io_uring: move io_uring_get_opcode out of TP_printk
The TP_printk macro's are not supposed to use custom code ([1]) or else
tools such as perf cannot use these events.

Convert the opcode string representation to use the __string wiring that
the event framework provides ([2]).

[1]: https://lwn.net/Articles/379903/
[2]: https://lwn.net/Articles/381064/

Fixes: 033b87d24f72 ("io_uring: use the text representation of ops in trace")
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220623083743.2648321-1-dylany@fb.com
[axboe: fixup spurious removal of sq_thread assignment]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-23 08:40:36 -06:00
Linus Torvalds
9836e93c0a for-5.19/io_uring-passthrough-2022-05-22
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmKKovAQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpv9oD/4qCs7k3bPZZWZ6xoWb4EObyyWOUifi26lp
 vpsJHFUbA67S/i4++LV9H18SazWJ7h08ac4bjgZ+NQz40/1WkTN8/Fa76jo+BnNK
 7T10Wp4Ak6uwWVrKaA81pnT+G9+xmHlJ3X27aKxzLuT7BEPpShZ6ouFVjTkx9CzN
 LrLjuCDTOBBN+ZoaroWYfdLwTQX2VCAl9B15lOtQIlFvuuU8VlrvLboY+80K8TvY
 1wvTA2HTjnXoYx+/cTTMIFZIwQH3r1hsbwEDD8/YJj1+ouhSRQ1b0p/nk2pA+3ws
 HF5r/YS/rLBjlPF094IzeOBaUyA433AN1VhZqnII8ek7ViT3W3x+BRrgE9O6ZkWT
 0AjX1BXReI5rdFmxBmwsSdBnrSoGaJOf2GdsCCdubXBIi+F/RvyajrPf7PTB5zbW
 9WEK/uy3xvZsRVkUGAzOb9QGdvjcllgMzwPJsDegDCw5PdcPdT3mzy6KGIWipFLp
 j8R+br7hRMpOJv/YpihJDMzSDkQ/r1/SCwR4fpLid/QdSHG/eRTQK6c4Su5bNYEy
 QDy2F6kQdBVtEJCQHcEOsbhXzSTNBcdB+ujUUM5653FkaHe6y4JbomLrsNx407Id
 i/4ROwA5K1dioJx503Eap+OhbI5rV+PFytJTwxvLrNyVGccwbH2YOVq80fsVBP2e
 cZbn6EX4Vg==
 =/peE
 -----END PGP SIGNATURE-----

Merge tag 'for-5.19/io_uring-passthrough-2022-05-22' of git://git.kernel.dk/linux-block

Pull io_uring NVMe command passthrough from Jens Axboe:
 "On top of everything else, this adds support for passthrough for
  io_uring.

  The initial feature for this is NVMe passthrough support, which allows
  non-filesystem based IO commands and admin commands.

  To support this, io_uring grows support for SQE and CQE members that
  are twice as big, allowing to pass in a full NVMe command without
  having to copy data around. And to complete with more than just a
  single 32-bit value as the output"

* tag 'for-5.19/io_uring-passthrough-2022-05-22' of git://git.kernel.dk/linux-block: (22 commits)
  io_uring: cleanup handling of the two task_work lists
  nvme: enable uring-passthrough for admin commands
  nvme: helper for uring-passthrough checks
  blk-mq: fix passthrough plugging
  nvme: add vectored-io support for uring-cmd
  nvme: wire-up uring-cmd support for io-passthru on char-device.
  nvme: refactor nvme_submit_user_cmd()
  block: wire-up support for passthrough plugging
  fs,io_uring: add infrastructure for uring-cmd
  io_uring: support CQE32 for nop operation
  io_uring: enable CQE32
  io_uring: support CQE32 in /proc info
  io_uring: add tracing for additional CQE32 fields
  io_uring: overflow processing for CQE32
  io_uring: flush completions for CQE32
  io_uring: modify io_get_cqe for CQE32
  io_uring: add CQE32 completion processing
  io_uring: add CQE32 setup processing
  io_uring: change ring size calculation for CQE32
  io_uring: store add. return values for CQE32
  ...
2022-05-23 13:06:15 -07:00
Linus Torvalds
368da430d0 for-5.19/io_uring-socket-2022-05-22
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmKKorgQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpm0eEACdTzhm7h5cXn9KjIvWLkdocAb/NOL8GYPn
 Q1mY1SqKQFZvs/fyKHkkZEiIBPxhvN6snVFXMpb4LDmPYeeH4GTUlNomrGTIjvf/
 j6SnZN4lCs9A2NlE+iDVWnFQOPQFALza2Y9BhC5xzay326qnKlO+0fQv3C1vXXrc
 /PNLqxQr7+GmO0a0PJnS6mGWGj6qF7nLqilB9apnKsTK6BKbJEec6ciKreqxU6ME
 WHaux11uIAbcf8rc6C/2myEK0k6jCOAue3vZ0lizygf+8klUCl2vMqV5BLwCBlXG
 /e7hBsUUrGr0CG0fryqhQQTUxsZLshioBbQH1vttSeZCli46mmWWAhPNy3/jb1ZU
 72bazA84Fe9ney9uVZvZoMoBsG+6t6UOatqND13MeRFAXnkRr0jZRuau2iBxgqAr
 OINJW+IVPU7IrCD+S4lV1/LCdhLhYcob8/zfKmIrdHMQnWG/gLonVpYJIBCyLDAv
 2jvHFIPJuSMUSGVjRKCb16LLNV6u7YG6VOWbKuippxfJxDdwA3TOtOhvTJIpYq0u
 TotPgpZ7bfcr4xDsGgD9mZS8E7jwsL/G0/MwsnixELykEXuhd++sgoTbr+RyUYdV
 45Hm6DsxlytjzOb/5uQrqhwrso05eVt14K74XApPa3fWKL8aWCh1jGSdo3CSbIyW
 iHwss919Ag==
 =nb5i
 -----END PGP SIGNATURE-----

Merge tag 'for-5.19/io_uring-socket-2022-05-22' of git://git.kernel.dk/linux-block

Pull io_uring socket() support from Jens Axboe:
 "This adds support for socket(2) for io_uring. This is handy when using
  direct / registered file descriptors with io_uring.

  Outside of those two patches, a small series from Dylan on top that
  improves the tracing by providing a text representation of the opcode
  rather than needing to decode this by reading the header file every
  time.

  That sits in this branch as it was the last opcode added (until it
  wasn't...)"

* tag 'for-5.19/io_uring-socket-2022-05-22' of git://git.kernel.dk/linux-block:
  io_uring: use the text representation of ops in trace
  io_uring: rename op -> opcode
  io_uring: add io_uring_get_opcode
  io_uring: add type to op enum
  io_uring: add socket(2) support
  net: add __sys_socket_file()
2022-05-23 12:42:33 -07:00
Linus Torvalds
09beaff75e for-5.19/io_uring-xattr-2022-05-22
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmKKopkQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpgakEACktFtUBQLrYOXbM/mVxMpR//ht4e29E8k0
 j/DkqK0yDKn9VkvDryALguH+ixNSI9Z4N7xSELLb/meNQsbJ7YdprL3xJn3BoUgs
 3zx44janE8J3Q5TsXvD2z2jPIMaT892t5+5aLFYZqP1g+KDXI8T1WpHsETMkKfRG
 ZPeerUrd0fhtnDpViaaYbRutIEt8V8tsPhh0XG/4GojWjUW0FTsRKBSGuQ0sQnUr
 aJDfF5VylOjOBzRGimGZ23vJIgtZ8UEpX0T2MxR5V6ffj4cI8bCFQOrphh7yHxF5
 f09xte80zX6pow5AivIpultZShR6IoQG5DIvF59woNP16uXy5yUyVTQvdnt8RlyY
 RjLd8ro9Gt4wBQGqckJLyY/o1FGhaQ8S99wOixUlpb9qKAOGmQZI97FQKFENqx/1
 Xe+bP6QmTt9uCXsYPIFBtZaaEv2u0yjHOyERFUSzKJQUuPTa5Rmen0EXYXRhe5/E
 p+sR3Qbk1wzlW7UHuCT2gcaI67SAFG+yDv1U6BAaVdcS71i0WCA+Q2a6AuB+NJzg
 ER4+JRoeOnjEXSP2UPvIUBL1Komdj4R2hnrOK4S80R3yQ3NaadrWywhBn5HNcniM
 wE2P6J0erzRFqyfBw9tyNLsZwR1iS7JqSD9/NuBLoWwb42O0l+WgqqwDTSxMsde4
 egKBaidRqg==
 =CfhD
 -----END PGP SIGNATURE-----

Merge tag 'for-5.19/io_uring-xattr-2022-05-22' of git://git.kernel.dk/linux-block

Pull io_uring xattr support from Jens Axboe:
 "Support for the xattr variants"

* tag 'for-5.19/io_uring-xattr-2022-05-22' of git://git.kernel.dk/linux-block:
  io_uring: cleanup error-handling around io_req_complete
  io_uring: fix trace for reduced sqe padding
  io_uring: add fgetxattr and getxattr support
  io_uring: add fsetxattr and setxattr support
  fs: split off do_getxattr from getxattr
  fs: split off setxattr_copy and do_setxattr function from setxattr
2022-05-23 12:30:30 -07:00
Linus Torvalds
3a166bdbf3 for-5.19/io_uring-2022-05-22
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmKKol0QHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpn+sEACbdEQqG6OoCOhJ0ZuxTdQqNMGxCImKBxjP
 8Bqf+0hYNgwfG+80/UQvmc7olb+KxvZ6KtrgViC/ujhvMQmX0Xf/881kiiKG/iHJ
 XKoL9PdqIkenIGnlyEp1uRmnUbooYF+s4iT6Gj/pjnn29GbcKjsPzKV1CUNkt3GC
 R+wpdKczHQDaSwzDY5Ntyjf68QUQOyUznkHW+6JOcBeih3ET7NfapR/zsFS93RlL
 B9pQ9NiBBQfzCAUycVyQMC+p/rJbKWgidAiFk4fXKRm8/7iNwT4dB0+oUymlECxt
 xvalRVK6ER1s4RSdQcUTZoQA+SrzzOnK1DYja9cvcLT3wH+aojana6S0rOMDi8wp
 hoWT5jdMaZN09Vcm7J4sBN15i50m9aDITp21PKOVDZXSMVsebltCL9phaN5+9x/j
 AfF6Vki1WTB4gYaDHR8v6UkW+HcF1WOmMdq8GB9UMfnTya6EJqAooYT9lhQBP/rv
 jxkdj9Fu98O87dOfy1Av9AxH1UB8d7ypCJKkSEMAUPoWf0rC9HjYr0cRq/yppAj8
 pI/0PwXaXRfQuoHPqZyETrPel77VQdBw+Hg+6TS0KlTd3WlVEJMZJPtXK466IFLp
 pYSRVnSI9PuhiClOpxriTCw0cppfRIv11IerCxRziqH9S1zijk0VBCN40//XDs1o
 JfvoA6htKQ==
 =S+Uf
 -----END PGP SIGNATURE-----

Merge tag 'for-5.19/io_uring-2022-05-22' of git://git.kernel.dk/linux-block

Pull io_uring updates from Jens Axboe:
 "Here are the main io_uring changes for 5.19. This contains:

   - Fixes for sparse type warnings (Christoph, Vasily)

   - Support for multi-shot accept (Hao)

   - Support for io_uring managed fixed files, rather than always
     needing the applicationt o manage the indices (me)

   - Fix for a spurious poll wakeup (Dylan)

   - CQE overflow fixes (Dylan)

   - Support more types of cancelations (me)

   - Support for co-operative task_work signaling, rather than always
     forcing an IPI (me)

   - Support for doing poll first when appropriate, rather than always
     attempting a transfer first (me)

   - Provided buffer cleanups and support for mapped buffers (me)

   - Improve how io_uring handles inflight SCM files (Pavel)

   - Speedups for registered files (Pavel, me)

   - Organize the completion data in a struct in io_kiocb rather than
     keep it in separate spots (Pavel)

   - task_work improvements (Pavel)

   - Cleanup and optimize the submission path, in general and for
     handling links (Pavel)

   - Speedups for registered resource handling (Pavel)

   - Support sparse buffers and file maps (Pavel, me)

   - Various fixes and cleanups (Almog, Pavel, me)"

* tag 'for-5.19/io_uring-2022-05-22' of git://git.kernel.dk/linux-block: (111 commits)
  io_uring: fix incorrect __kernel_rwf_t cast
  io_uring: disallow mixed provided buffer group registrations
  io_uring: initialize io_buffer_list head when shared ring is unregistered
  io_uring: add fully sparse buffer registration
  io_uring: use rcu_dereference in io_close
  io_uring: consistently use the EPOLL* defines
  io_uring: make apoll_events a __poll_t
  io_uring: drop a spurious inline on a forward declaration
  io_uring: don't use ERR_PTR for user pointers
  io_uring: use a rwf_t for io_rw.flags
  io_uring: add support for ring mapped supplied buffers
  io_uring: add io_pin_pages() helper
  io_uring: add buffer selection support to IORING_OP_NOP
  io_uring: fix locking state for empty buffer group
  io_uring: implement multishot mode for accept
  io_uring: let fast poll support multishot
  io_uring: add REQ_F_APOLL_MULTISHOT for requests
  io_uring: add IORING_ACCEPT_MULTISHOT for accept
  io_uring: only wake when the correct events are set
  io_uring: avoid io-wq -EAGAIN looping for !IOPOLL
  ...
2022-05-23 12:22:49 -07:00
Vasily Averin
0e7579ca73 io_uring: fix incorrect __kernel_rwf_t cast
Currently 'make C=1 fs/io_uring.o' generates sparse warning:

  CHECK   fs/io_uring.c
fs/io_uring.c: note: in included file (through
include/trace/trace_events.h, include/trace/define_trace.h, i
nclude/trace/events/io_uring.h):
./include/trace/events/io_uring.h:488:1:
 warning: incorrect type in assignment (different base types)
    expected unsigned int [usertype] op_flags
    got restricted __kernel_rwf_t const [usertype] rw_flags

This happen on cast of sqe->rw_flags which is defined as __kernel_rwf_t,
this type is bitwise and requires __force attribute for any casts.
However rw_flags is a member of the union, and its access can be safely
replaced by using of its neighbours, so let's use poll32_events to fix
the sparse warning.

Signed-off-by: Vasily Averin <vvs@openvz.org>
Link: https://lore.kernel.org/r/6f009241-a63f-ae43-a04b-62841aaef293@openvz.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-19 12:27:59 -06:00
Dylan Yudaken
2d2d5cb6ca io_uring: fix ordering of args in io_uring_queue_async_work
Fix arg ordering in TP_ARGS macro, which fixes the output.

Fixes: 502c87d65564c ("io-uring: Make tracepoints consistent.")
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220512091834.728610-2-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-12 06:18:58 -06:00
Stefan Roesch
c4bb964fa0 io_uring: add tracing for additional CQE32 fields
This adds tracing for the extra1 and extra2 fields.

Co-developed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Stefan Roesch <shr@fb.com>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Link: https://lore.kernel.org/r/20220426182134.136504-10-shr@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-09 06:35:34 -06:00
Dylan Yudaken
033b87d24f io_uring: use the text representation of ops in trace
It is annoying to translate opcodes to textwhen tracing io_uring. Use the
io_uring_get_opcode function instead to use the text representation.

A downside here might have been that if the opcode is invalid it will not
be obvious, however the opcode is already overridden in these cases to
0 (NOP) in io_init_req(). Therefore this is a non issue.

Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220426082907.3600028-5-dylany@fb.com
[axboe: don't include register, those are not req opcodes]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-28 17:06:03 -06:00
Dylan Yudaken
1460af7de6 io_uring: rename op -> opcode
do this for consistency with the other trace messages

Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220426082907.3600028-4-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-26 06:50:42 -06:00
Jens Axboe
0200ce6a57 io_uring: fix trace for reduced sqe padding
__pad2 is only 1 u64 now, the other one is addr3. Adjust the trace so
that it matches up.

Fixes: a56834e0fafe ("io_uring: add fgetxattr and getxattr support")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24 18:18:46 -06:00
Dylan Yudaken
47894438e9 io_uring: add trace support for CQE overflow
Add trace function for overflowing CQ ring.

Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220421091345.2115755-2-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24 18:18:18 -06:00
Dylan Yudaken
052ebf1fbb io_uring: make tracing format consistent
Make the tracing formatting for user_data and flags consistent.

Having consistent formatting allows one for example to grep for a specific
user_data/flags and be able to trace a single sqe through easily.

Change user_data to 0x%llx and flags to 0x%x everywhere. The '0x' is
useful to disambiguate for example "user_data 100".

Additionally remove the '=' for flags in io_uring_req_failed, again for consistency.

Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220316095204.2191498-1-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-16 05:45:28 -06:00
Stefan Roesch
502c87d655 io-uring: Make tracepoints consistent.
This makes the io-uring tracepoints consistent. Where it makes sense
the tracepoints start with the following four fields:
- context (ring)
- request
- user_data
- opcode.

Signed-off-by: Stefan Roesch <shr@fb.com>
Link: https://lore.kernel.org/r/20220214180430.70572-3-shr@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-10 06:32:49 -07:00
Usama Arif
2757be22c0 io_uring: remove trace for eventfd
The information on whether eventfd is registered is not very useful and
would result in the tracepoint being enclosed in an rcu_readlock in a
later patch that tries to avoid ring quiesce for registering eventfd.

Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Usama Arif <usama.arif@bytedance.com>
Link: https://lore.kernel.org/r/20220204145117.1186568-2-usama.arif@bytedance.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-10 06:32:49 -07:00
Jens Axboe
a87acfde94 io_uring: dump sqe contents if issue fails
I recently had to look at a production problem where a request ended
up getting the dreaded -EINVAL error on submit. The most used and
hence useless of error codes, as it just tells you that something
was wrong with your request, but not more than that.

Let's dump the full sqe contents if we run into an issue failure,
that'll allow easier diagnosing of a wide variety of issues.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19 05:49:52 -06:00
Jens Axboe
2fc2a7a62e io_uring: io_uring_complete() trace should take an integer
It currently takes a long, and while that's normally OK, the io_uring
limit is an int. Internally in io_uring it's an int, but sometimes it's
passed as a long. That can yield confusing results where a completions
seems to generate a huge result:

ou-sqp-1297-1298    [001] ...1   788.056371: io_uring_complete: ring 000000000e98e046, user_data 0x0, result 4294967171, cflags 0

which is due to -ECANCELED being stored in an unsigned, and then passed
in as a long. Using the right int type, the trace looks correct:

iou-sqp-338-339     [002] ...1    15.633098: io_uring_complete: ring 00000000e0ac60cf, user_data 0x0, result -125, cflags 0

Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-03 16:59:06 -06:00
Olivier Langlois
3d7b7b5285 io_uring: minor clean up in trace events definition
Fix tabulation to make nice columns

Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Olivier Langlois <olivier@trillion01.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-16 06:41:48 -06:00
Olivier Langlois
236daeae36 io_uring: Add to traces the req pointer when available
The req pointer uniquely identify a specific request.
Having it in traces can provide valuable insights that is not possible
to have if the calling process is reusing the same user_data value.

Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Olivier Langlois <olivier@trillion01.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-16 06:41:46 -06:00
Linus Torvalds
9b1f61d5d7 tracing updates for 5.13
New feature:
 
  The "func-no-repeats" option in tracefs/options directory. When set
  the function tracer will detect if the current function being traced
  is the same as the previous one, and instead of recording it, it will
  keep track of the number of times that the function is repeated in a row.
  And when another function is recorded, it will write a new event that
  shows the function that repeated, the number of times it repeated and
  the time stamp of when the last repeated function occurred.
 
 Enhancements:
 
  In order to implement the above "func-no-repeats" option, the ring
  buffer timestamp can now give the accurate timestamp of the event
  as it is being recorded, instead of having to record an absolute
  timestamp for all events. This helps the histogram code which no longer
  needs to waste ring buffer space.
 
  New validation logic to make sure all trace events that access
  dereferenced pointers do so in a safe way, and will warn otherwise.
 
 Fixes:
 
  No longer limit the PIDs of tasks that are recorded for "saved_cmdlines"
  to PID_MAX_DEFAULT (32768), as systemd now allows for a much larger
  range. This caused the mapping of PIDs to the task names to be dropped
  for all tasks with a PID greater than 32768.
 
  Change trace_clock_global() to never block. This caused a deadlock.
 
 Clean ups:
 
  Typos, prototype fixes, and removing of duplicate or unused code.
 
  Better management of ftrace_page allocations.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYI/1vBQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qiL0AP9EemIC5TDh2oihqLRNeUjdTu0ryEoM
 HRFqxozSF985twD/bfkt86KQC8rLHwxTbxQZ863bmdaC6cMGFhWiF+H/MAs=
 =psYt
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "New feature:

   - A new "func-no-repeats" option in tracefs/options directory.

     When set the function tracer will detect if the current function
     being traced is the same as the previous one, and instead of
     recording it, it will keep track of the number of times that the
     function is repeated in a row. And when another function is
     recorded, it will write a new event that shows the function that
     repeated, the number of times it repeated and the time stamp of
     when the last repeated function occurred.

  Enhancements:

   - In order to implement the above "func-no-repeats" option, the ring
     buffer timestamp can now give the accurate timestamp of the event
     as it is being recorded, instead of having to record an absolute
     timestamp for all events. This helps the histogram code which no
     longer needs to waste ring buffer space.

   - New validation logic to make sure all trace events that access
     dereferenced pointers do so in a safe way, and will warn otherwise.

  Fixes:

   - No longer limit the PIDs of tasks that are recorded for
     "saved_cmdlines" to PID_MAX_DEFAULT (32768), as systemd now allows
     for a much larger range. This caused the mapping of PIDs to the
     task names to be dropped for all tasks with a PID greater than
     32768.

   - Change trace_clock_global() to never block. This caused a deadlock.

  Clean ups:

   - Typos, prototype fixes, and removing of duplicate or unused code.

   - Better management of ftrace_page allocations"

* tag 'trace-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (32 commits)
  tracing: Restructure trace_clock_global() to never block
  tracing: Map all PIDs to command lines
  ftrace: Reuse the output of the function tracer for func_repeats
  tracing: Add "func_no_repeats" option for function tracing
  tracing: Unify the logic for function tracing options
  tracing: Add method for recording "func_repeats" events
  tracing: Add "last_func_repeats" to struct trace_array
  tracing: Define new ftrace event "func_repeats"
  tracing: Define static void trace_print_time()
  ftrace: Simplify the calculation of page number for ftrace_page->records some more
  ftrace: Store the order of pages allocated in ftrace_page
  tracing: Remove unused argument from "ring_buffer_time_stamp()
  tracing: Remove duplicate struct declaration in trace_events.h
  tracing: Update create_system_filter() kernel-doc comment
  tracing: A minor cleanup for create_system_filter()
  kernel: trace: Mundane typo fixes in the file trace_events_filter.c
  tracing: Fix various typos in comments
  scripts/recordmcount.pl: Make vim and emacs indent the same
  scripts/recordmcount.pl: Make indent spacing consistent
  tracing: Add a verifier to check string pointers for trace events
  ...
2021-05-03 11:19:54 -07:00
Jens Axboe
7471e1afab io_uring: include cflags in completion trace event
We should be including the completion flags for better introspection on
exactly what completion event was logged.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11 17:41:59 -06:00
Ingo Molnar
f2cc020d78 tracing: Fix various typos in comments
Fix ~59 single-word typos in the tracing code comments, and fix
the grammar in a handful of places.

Link: https://lore.kernel.org/r/20210322224546.GA1981273@gmail.com
Link: https://lkml.kernel.org/r/20210323174935.GA4176821@gmail.com

Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-03-23 14:08:18 -04:00
Jens Axboe
d7718a9d25 io_uring: use poll driven retry for files that support it
Currently io_uring tries any request in a non-blocking manner, if it can,
and then retries from a worker thread if we get -EAGAIN. Now that we have
a new and fancy poll based retry backend, use that to retry requests if
the file supports it.

This means that, for example, an IORING_OP_RECVMSG on a socket no longer
requires an async thread to complete the IO. If we get -EAGAIN reading
from the socket in a non-blocking manner, we arm a poll handler for
notification on when the socket becomes readable. When it does, the
pending read is executed directly by the task again, through the io_uring
task work handlers. Not only is this faster and more efficient, it also
means we're not generating potentially tons of async threads that just
sit and block, waiting for the IO to complete.

The feature is marked with IORING_FEAT_FAST_POLL, meaning that async
pollable IO is fast, and that poll<link>other_op is fast as well.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-02 14:06:38 -07:00
Jens Axboe
354420f705 io_uring: add opcode to issue trace event
For some test apps at least, user_data is just zeroes. So it's not a
good way to tell what the command actually is. Add the opcode to the
issue trace point.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20 17:04:06 -07:00
Jens Axboe
915967f69c io_uring: improve trace_io_uring_defer() trace point
We don't have shadow requests anymore, so get rid of the shadow
argument. Add the user_data argument, as that's often useful to easily
match up requests, instead of having to look at request pointers.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-25 19:56:10 -07:00
Jens Axboe
51c3ff62ca io_uring: add completion trace event
We currently don't have a completion event trace, add one of those. And
to better be able to match up submissions and completions, add user_data
to the submission trace as well.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-04 07:07:52 -07:00
Jens Axboe
0069fc6b1c io_uring: remove io_uring_add_to_prev() trace event
This internal logic was killed with the conversion to io-wq, so we no
longer have a need for this particular trace. Kill it.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-01 12:44:40 -06:00
Jens Axboe
561fb04a6a io_uring: replace workqueue usage with io-wq
Drop various work-arounds we have for workqueues:

- We no longer need the async_list for tracking sequential IO.

- We don't have to maintain our own mm tracking/setting.

- We don't need a separate workqueue for buffered writes. This didn't
  even work that well to begin with, as it was suboptimal for multiple
  buffered writers on multiple files.

- We can properly cancel pending interruptible work. This fixes
  deadlocks with particularly socket IO, where we cannot cancel them
  when the io_uring is closed. Hence the ring will wait forever for
  these requests to complete, which may never happen. This is different
  from disk IO where we know requests will complete in a finite amount
  of time.

- Due to being able to cancel work interruptible work that is already
  running, we can implement file table support for work. We need that
  for supporting system calls that add to a process file table.

- It gets us one step closer to adding async support for any system
  call.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-29 12:43:06 -06:00
Dmitrii Dolgov
c826bd7a74 io_uring: add set of tracing events
To trace io_uring activity one can get an information from workqueue and
io trace events, but looks like some parts could be hard to identify via
this approach. Making what happens inside io_uring more transparent is
important to be able to reason about many aspects of it, hence introduce
the set of tracing events.

All such events could be roughly divided into two categories:

* those, that are helping to understand correctness (from both kernel
  and an application point of view). E.g. a ring creation, file
  registration, or waiting for available CQE. Proposed approach is to
  get a pointer to an original structure of interest (ring context, or
  request), and then find relevant events. io_uring_queue_async_work
  also exposes a pointer to work_struct, to be able to track down
  corresponding workqueue events.

* those, that provide performance related information. Mostly it's about
  events that change the flow of requests, e.g. whether an async work
  was queued, or delayed due to some dependencies. Another important
  case is how io_uring optimizations (e.g. registered files) are
  utilized.

Signed-off-by: Dmitrii Dolgov <9erthalion6@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-29 10:24:18 -06:00