linux

iv/linux

Author	SHA1	Message	Date
Jens Axboe	1151a7cccb	io_uring: move shutdown under the general net section Gets rid of some ifdefs and enables use of the net defines for when CONFIG_NET isn't set. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-31 02:50:06 -06:00
Jens Axboe	157dc813b4	io_uring: unify calling convention for async prep handling Make them consistent in preparation for defining a req async prep handler. The readv/writev requests share a prep handler, move it one level down so the initial one is consistent with the others. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-31 02:50:06 -06:00
Jens Axboe	fcde59feb1	io_uring: add io_op_defs 'def' pointer in req init and issue Define and set it when appropriate, and use it consistently in the function rather than using io_op_defs[opcode]. Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-31 02:50:02 -06:00
Jens Axboe	54739cc6b4	io_uring: make prep and issue side of req handlers named consistently Almost all of them are, the odd ones out are the poll remove and the files update request. Name them like the others, which is: io_#cmdname_prep for request preparation io_#cmdname for request issue Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-25 05:37:06 -06:00
Jens Axboe	ecddc25d13	io_uring: make timeout prep handlers consistent with other prep handlers All other opcodes take a {req, sqe} set for prep handling, split out a timeout prep handler so that timeout and linked timeouts can use the same one. Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-25 05:36:54 -06:00
Linus Torvalds	140e40e39a	zonefs changes for 5.19-rc1 This set of patches improve zonefs open sequential file accounting and adds accounting for active sequential files to allow the user to handle the maximum number of active zones of an NVMe ZNS drive. sysfs attributes for both open and active sequential files are also added to facilitate access to this information from applications without resorting to inspecting the block device limits. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQSRPv8tYSvhwAzJdzjdoc3SxdoYdgUCYosTQQAKCRDdoc3SxdoY dqUWAQDGKoSkyRAPJAmuQXYOuOJTLu0b8DSfvyPopFLfKXpPHAEAg995JNTLUs0G R3m7lH6GK+OSBWhZ/Z5HOND3QS9BhgM= =hvqx -----END PGP SIGNATURE----- Merge tag 'zonefs-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs Pull zonefs updates from Damien Le Moal: "This improves zonefs open sequential file accounting and adds accounting for active sequential files to allow the user to handle the maximum number of active zones of an NVMe ZNS drive. sysfs attributes for both open and active sequential files are also added to facilitate access to this information from applications without resorting to inspecting the block device limits" * tag 'zonefs-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs: documentation: zonefs: Document sysfs attributes documentation: zonefs: Cleanup the mount options section zonefs: Add active seq file accounting zonefs: Export open zone resource information through sysfs zonefs: Always do seq file write open accounting zonefs: Rename super block information fields zonefs: Fix management of open zones zonefs: Clear inode information flags on inode creation	2022-05-23 14:36:45 -07:00
Linus Torvalds	115cd47132	for-5.19/block-2022-05-22 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmKKrUsQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpgDjD/44hY9h0JsOLoRH1IvFtuaH6n718JXuqG17 hHCfmnAUVqj2jT00IUbVlUTd905bCGpfrodBL3PAmPev1zZHOUd/MnJKrSynJ+/s NJEMZQaHxLmocNDpJ1sZo7UbAFErsZXB0gVYUO8cH2bFYNu84H1mhRCOReYyqmvQ aIAASX5qRB/ciBQCivzAJl2jTdn4WOn5hWi9RLidQB7kSbaXGPmgKAuN88WI4H7A zQgAkEl2EEquyMI5tV1uquS7engJaC/4PsenF0S9iTyrhJLjneczJBJZKMLeMR8d sOm6sKJdpkrfYDyaA4PIkgmLoEGTtwGpqGHl4iXTyinUAxJoca5tmPvBb3wp66GE 2Mr7pumxc1yJID2VHbsERXlOAX3aZNCowx2gum2MTRIO8g11Eu3aaVn2kv37MBJ2 4R2a/cJFl5zj9M8536cG+Yqpy0DDVCCQKUIqEupgEu1dyfpznyWH5BTAHXi1E8td nxUin7uXdD0AJkaR0m04McjS/Bcmc1dc6I8xvkdUFYBqYCZWpKOTiEpIBlHg0XJA sxdngyz5lSYTGVA4o4QCrdR0Tx1n36A1IYFuQj0wzxBJYZ02jEZuII/A3dd+8hiv EY+VeUQeVIXFFuOcY+e0ScPpn7Nr17hAd1en/j2Hcoe4ZE8plqG2QTcnwgflcbis iomvJ4yk0Q== =0Rw1 -----END PGP SIGNATURE----- Merge tag 'for-5.19/block-2022-05-22' of git://git.kernel.dk/linux-block Pull block updates from Jens Axboe: "Here are the core block changes for 5.19. This contains: - blk-throttle accounting fix (Laibin) - Series removing redundant assignments (Michal) - Expose bio cache via the bio_set, so that DM can use it (Mike) - Finish off the bio allocation interface cleanups by dealing with the weirdest member of the family. bio_kmalloc combines a kmalloc for the bio and bio_vecs with a hidden bio_init call and magic cleanup semantics (Christoph) - Clean up the block layer API so that APIs consumed by file systems are (almost) only struct block_device based, so that file systems don't have to poke into block layer internals like the request_queue (Christoph) - Clean up the blk_execute_rq* API (Christoph) - Clean up various lose end in the blk-cgroup code to make it easier to follow in preparation of reworking the blkcg assignment for bios (Christoph) - Fix use-after-free issues in BFQ when processes with merged queues get moved to different cgroups (Jan) - BFQ fixes (Jan) - Various fixes and cleanups (Bart, Chengming, Fanjun, Julia, Ming, Wolfgang, me)" * tag 'for-5.19/block-2022-05-22' of git://git.kernel.dk/linux-block: (83 commits) blk-mq: fix typo in comment bfq: Remove bfq_requeue_request_body() bfq: Remove superfluous conversion from RQ_BIC() bfq: Allow current waker to defend against a tentative one bfq: Relax waker detection for shared queues blk-cgroup: delete rcu_read_lock_held() WARN_ON_ONCE() blk-throttle: Set BIO_THROTTLED when bio has been throttled blk-cgroup: Remove unnecessary rcu_read_lock/unlock() blk-cgroup: always terminate io.stat lines block, bfq: make bfq_has_work() more accurate block, bfq: protect 'bfqd->queued' by 'bfqd->lock' block: cleanup the VM accounting in submit_bio block: Fix the bio.bi_opf comment block: reorder the REQ_ flags blk-iocost: combine local_stat and desc_stat to stat block: improve the error message from bio_check_eod block: allow passing a NULL bdev to bio_alloc_clone/bio_init_clone block: remove superfluous calls to blkcg_bio_issue_init kthread: unexport kthread_blkcg blk-cgroup: cleanup blkcg_maybe_throttle_current ...	2022-05-23 13:56:39 -07:00
Linus Torvalds	df1c5d73d2	for-5.19/writeback-2022-05-22 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmKKrAMQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpuqeD/sH85B2UQOlHtcn10NOQSv9U8KerbJ9LOoq ZqCaBcb9NEPRvQPjOmFvr08S8rBvGcpGcniKpwvCZiX78mdp+DFAHZDJppWasSdX F5EXV+40Pxtg+kAOJNEh2XNXuTGRddys9i70sxbKbkLG9m74nT8pnDmND0WZn2KS 3d1ljBKZkJ+Ohy1NuUXRTm9KkrMyjSrsOh0ge893DwY7Dmz7/M34wBvY4JOLnAjj 03tz9Ge4/HNeqtEQMYCOFetxfKuxCeL583sJNP5SpmbCWFEnFtipY0ezGMUmDPoV QdLpqJTBMNpUiSLmNVmqQaaOF7IGdklWQRHoyFl3qspygnNe2xT+Lj3QHZnHTQVJ JaZRudW5eLTWYJ4wFw1FdhOQqXxU1NqNkFRblwdntPKfuq363URcwB9rFVCleNd0 MMrUNDRZeYURfzpTMkbRKNJByDcdnbtvaxjhE8un1IwTyAzJ8TK3IvAr/sFt4xTB 89R4lxRdZ+RD3dmhU6v+OrCJ5Xl3KlbmPTdfb21XSMF/NxizSWg6IY+Xwi3rlE8g b3lHETEpLV4jBA/OA/BsW2gOKxMwj/0hGUwXGAvr73haRWAxLOKjDpU5FhGi8sO1 ioeZSO3AOlHxir0fYujvWcme4RsTWChdzZSlOUbYXV0UQVlq8s3PvuyI7XHdi7CB l+F3TvuOOw== =vChY -----END PGP SIGNATURE----- Merge tag 'for-5.19/writeback-2022-05-22' of git://git.kernel.dk/linux-block Pull writeback fix from Jens Axboe: "A single writeback fix that didn't belong in any other branch, correcting the number of skipped pages" * tag 'for-5.19/writeback-2022-05-22' of git://git.kernel.dk/linux-block: fs-writeback: writeback_sb_inodes：Recalculate 'wrote' according skipped pages	2022-05-23 13:48:23 -07:00
Linus Torvalds	9836e93c0a	for-5.19/io_uring-passthrough-2022-05-22 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmKKovAQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpv9oD/4qCs7k3bPZZWZ6xoWb4EObyyWOUifi26lp vpsJHFUbA67S/i4++LV9H18SazWJ7h08ac4bjgZ+NQz40/1WkTN8/Fa76jo+BnNK 7T10Wp4Ak6uwWVrKaA81pnT+G9+xmHlJ3X27aKxzLuT7BEPpShZ6ouFVjTkx9CzN LrLjuCDTOBBN+ZoaroWYfdLwTQX2VCAl9B15lOtQIlFvuuU8VlrvLboY+80K8TvY 1wvTA2HTjnXoYx+/cTTMIFZIwQH3r1hsbwEDD8/YJj1+ouhSRQ1b0p/nk2pA+3ws HF5r/YS/rLBjlPF094IzeOBaUyA433AN1VhZqnII8ek7ViT3W3x+BRrgE9O6ZkWT 0AjX1BXReI5rdFmxBmwsSdBnrSoGaJOf2GdsCCdubXBIi+F/RvyajrPf7PTB5zbW 9WEK/uy3xvZsRVkUGAzOb9QGdvjcllgMzwPJsDegDCw5PdcPdT3mzy6KGIWipFLp j8R+br7hRMpOJv/YpihJDMzSDkQ/r1/SCwR4fpLid/QdSHG/eRTQK6c4Su5bNYEy QDy2F6kQdBVtEJCQHcEOsbhXzSTNBcdB+ujUUM5653FkaHe6y4JbomLrsNx407Id i/4ROwA5K1dioJx503Eap+OhbI5rV+PFytJTwxvLrNyVGccwbH2YOVq80fsVBP2e cZbn6EX4Vg== =/peE -----END PGP SIGNATURE----- Merge tag 'for-5.19/io_uring-passthrough-2022-05-22' of git://git.kernel.dk/linux-block Pull io_uring NVMe command passthrough from Jens Axboe: "On top of everything else, this adds support for passthrough for io_uring. The initial feature for this is NVMe passthrough support, which allows non-filesystem based IO commands and admin commands. To support this, io_uring grows support for SQE and CQE members that are twice as big, allowing to pass in a full NVMe command without having to copy data around. And to complete with more than just a single 32-bit value as the output" * tag 'for-5.19/io_uring-passthrough-2022-05-22' of git://git.kernel.dk/linux-block: (22 commits) io_uring: cleanup handling of the two task_work lists nvme: enable uring-passthrough for admin commands nvme: helper for uring-passthrough checks blk-mq: fix passthrough plugging nvme: add vectored-io support for uring-cmd nvme: wire-up uring-cmd support for io-passthru on char-device. nvme: refactor nvme_submit_user_cmd() block: wire-up support for passthrough plugging fs,io_uring: add infrastructure for uring-cmd io_uring: support CQE32 for nop operation io_uring: enable CQE32 io_uring: support CQE32 in /proc info io_uring: add tracing for additional CQE32 fields io_uring: overflow processing for CQE32 io_uring: flush completions for CQE32 io_uring: modify io_get_cqe for CQE32 io_uring: add CQE32 completion processing io_uring: add CQE32 setup processing io_uring: change ring size calculation for CQE32 io_uring: store add. return values for CQE32 ...	2022-05-23 13:06:15 -07:00
Linus Torvalds	e1a8fde720	for-5.19/io_uring-net-2022-05-22 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmKKotMQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpmVwEACo7qBTjrrneZEwlYUWrSr45QtDNsQHPWjv aoK1dBLVH4ZjoZoOTI/aYcRgd5IJYo1P6I9tUrolM/+N3adM4UTEVC7i2PYDOaL3 WUm/YT2aSLiyHaHQON7SMyGSVU8kfM9YvJAGbj7ohalO9A2VVtHfUAmcAtBdgWqv Dl/Uu6vbogOl19xztAwN4nvwqljA+GUMnbHJ/oeASzrMzYMOdQ0q3UsQbEt+pTXt rBzv8fCsrKsT2uBc59Bi3eFKeBMM6ERzux/40TlqcOnXf3KUCK7nM4VaRgPbvXdt GOOYfYs+j9L8SSEedvdKyYNq4vVwWgYfTRAKMNB0FPiOaTGZuUthqkgRZGYY8AA9 +lJWxa+mzPmWEOmL+E44kt0OwtKDHX72ccEJUD7PHhTp0g87yKZfS6mXRNYLSxm7 IYt7N1x3cOp0lrwUTvLDnSPOTuYOSEiB2JZtfkf+y3SuI5SWowIcudKOuO5p7G1r IpAROsZrpHzMf/eniINoX3IrqBSqr254jzwq+9IgUaw/ky76oPYqM1dWP9BnVxCg PXgvfT5zj6xrU43TxTeIPU92JoAqhMeXi6dcyoiAAf9+8Vih+sbmLzAdJbYb5F2v G0ISy31+x/Goi43fQS59HzS/MNXJplcmy2mxKUYBT7/ZoJ2A26Q8SukTWD+U8sDn XIrV4HEOUQ== =PUw1 -----END PGP SIGNATURE----- Merge tag 'for-5.19/io_uring-net-2022-05-22' of git://git.kernel.dk/linux-block Pull io_uring 'more data in socket' support from Jens Axboe: "To be able to fully utilize the 'poll first' support in the core io_uring branch, it's advantageous knowing if the socket was empty after a receive. This adds support for that" * tag 'for-5.19/io_uring-net-2022-05-22' of git://git.kernel.dk/linux-block: io_uring: return hint on whether more data is available after receive tcp: pass back data left in socket after receive	2022-05-23 12:51:04 -07:00
Linus Torvalds	368da430d0	for-5.19/io_uring-socket-2022-05-22 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmKKorgQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpm0eEACdTzhm7h5cXn9KjIvWLkdocAb/NOL8GYPn Q1mY1SqKQFZvs/fyKHkkZEiIBPxhvN6snVFXMpb4LDmPYeeH4GTUlNomrGTIjvf/ j6SnZN4lCs9A2NlE+iDVWnFQOPQFALza2Y9BhC5xzay326qnKlO+0fQv3C1vXXrc /PNLqxQr7+GmO0a0PJnS6mGWGj6qF7nLqilB9apnKsTK6BKbJEec6ciKreqxU6ME WHaux11uIAbcf8rc6C/2myEK0k6jCOAue3vZ0lizygf+8klUCl2vMqV5BLwCBlXG /e7hBsUUrGr0CG0fryqhQQTUxsZLshioBbQH1vttSeZCli46mmWWAhPNy3/jb1ZU 72bazA84Fe9ney9uVZvZoMoBsG+6t6UOatqND13MeRFAXnkRr0jZRuau2iBxgqAr OINJW+IVPU7IrCD+S4lV1/LCdhLhYcob8/zfKmIrdHMQnWG/gLonVpYJIBCyLDAv 2jvHFIPJuSMUSGVjRKCb16LLNV6u7YG6VOWbKuippxfJxDdwA3TOtOhvTJIpYq0u TotPgpZ7bfcr4xDsGgD9mZS8E7jwsL/G0/MwsnixELykEXuhd++sgoTbr+RyUYdV 45Hm6DsxlytjzOb/5uQrqhwrso05eVt14K74XApPa3fWKL8aWCh1jGSdo3CSbIyW iHwss919Ag== =nb5i -----END PGP SIGNATURE----- Merge tag 'for-5.19/io_uring-socket-2022-05-22' of git://git.kernel.dk/linux-block Pull io_uring socket() support from Jens Axboe: "This adds support for socket(2) for io_uring. This is handy when using direct / registered file descriptors with io_uring. Outside of those two patches, a small series from Dylan on top that improves the tracing by providing a text representation of the opcode rather than needing to decode this by reading the header file every time. That sits in this branch as it was the last opcode added (until it wasn't...)" * tag 'for-5.19/io_uring-socket-2022-05-22' of git://git.kernel.dk/linux-block: io_uring: use the text representation of ops in trace io_uring: rename op -> opcode io_uring: add io_uring_get_opcode io_uring: add type to op enum io_uring: add socket(2) support net: add __sys_socket_file()	2022-05-23 12:42:33 -07:00
Linus Torvalds	09beaff75e	for-5.19/io_uring-xattr-2022-05-22 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmKKopkQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpgakEACktFtUBQLrYOXbM/mVxMpR//ht4e29E8k0 j/DkqK0yDKn9VkvDryALguH+ixNSI9Z4N7xSELLb/meNQsbJ7YdprL3xJn3BoUgs 3zx44janE8J3Q5TsXvD2z2jPIMaT892t5+5aLFYZqP1g+KDXI8T1WpHsETMkKfRG ZPeerUrd0fhtnDpViaaYbRutIEt8V8tsPhh0XG/4GojWjUW0FTsRKBSGuQ0sQnUr aJDfF5VylOjOBzRGimGZ23vJIgtZ8UEpX0T2MxR5V6ffj4cI8bCFQOrphh7yHxF5 f09xte80zX6pow5AivIpultZShR6IoQG5DIvF59woNP16uXy5yUyVTQvdnt8RlyY RjLd8ro9Gt4wBQGqckJLyY/o1FGhaQ8S99wOixUlpb9qKAOGmQZI97FQKFENqx/1 Xe+bP6QmTt9uCXsYPIFBtZaaEv2u0yjHOyERFUSzKJQUuPTa5Rmen0EXYXRhe5/E p+sR3Qbk1wzlW7UHuCT2gcaI67SAFG+yDv1U6BAaVdcS71i0WCA+Q2a6AuB+NJzg ER4+JRoeOnjEXSP2UPvIUBL1Komdj4R2hnrOK4S80R3yQ3NaadrWywhBn5HNcniM wE2P6J0erzRFqyfBw9tyNLsZwR1iS7JqSD9/NuBLoWwb42O0l+WgqqwDTSxMsde4 egKBaidRqg== =CfhD -----END PGP SIGNATURE----- Merge tag 'for-5.19/io_uring-xattr-2022-05-22' of git://git.kernel.dk/linux-block Pull io_uring xattr support from Jens Axboe: "Support for the xattr variants" * tag 'for-5.19/io_uring-xattr-2022-05-22' of git://git.kernel.dk/linux-block: io_uring: cleanup error-handling around io_req_complete io_uring: fix trace for reduced sqe padding io_uring: add fgetxattr and getxattr support io_uring: add fsetxattr and setxattr support fs: split off do_getxattr from getxattr fs: split off setxattr_copy and do_setxattr function from setxattr	2022-05-23 12:30:30 -07:00
Linus Torvalds	3a166bdbf3	for-5.19/io_uring-2022-05-22 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmKKol0QHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpn+sEACbdEQqG6OoCOhJ0ZuxTdQqNMGxCImKBxjP 8Bqf+0hYNgwfG+80/UQvmc7olb+KxvZ6KtrgViC/ujhvMQmX0Xf/881kiiKG/iHJ XKoL9PdqIkenIGnlyEp1uRmnUbooYF+s4iT6Gj/pjnn29GbcKjsPzKV1CUNkt3GC R+wpdKczHQDaSwzDY5Ntyjf68QUQOyUznkHW+6JOcBeih3ET7NfapR/zsFS93RlL B9pQ9NiBBQfzCAUycVyQMC+p/rJbKWgidAiFk4fXKRm8/7iNwT4dB0+oUymlECxt xvalRVK6ER1s4RSdQcUTZoQA+SrzzOnK1DYja9cvcLT3wH+aojana6S0rOMDi8wp hoWT5jdMaZN09Vcm7J4sBN15i50m9aDITp21PKOVDZXSMVsebltCL9phaN5+9x/j AfF6Vki1WTB4gYaDHR8v6UkW+HcF1WOmMdq8GB9UMfnTya6EJqAooYT9lhQBP/rv jxkdj9Fu98O87dOfy1Av9AxH1UB8d7ypCJKkSEMAUPoWf0rC9HjYr0cRq/yppAj8 pI/0PwXaXRfQuoHPqZyETrPel77VQdBw+Hg+6TS0KlTd3WlVEJMZJPtXK466IFLp pYSRVnSI9PuhiClOpxriTCw0cppfRIv11IerCxRziqH9S1zijk0VBCN40//XDs1o JfvoA6htKQ== =S+Uf -----END PGP SIGNATURE----- Merge tag 'for-5.19/io_uring-2022-05-22' of git://git.kernel.dk/linux-block Pull io_uring updates from Jens Axboe: "Here are the main io_uring changes for 5.19. This contains: - Fixes for sparse type warnings (Christoph, Vasily) - Support for multi-shot accept (Hao) - Support for io_uring managed fixed files, rather than always needing the applicationt o manage the indices (me) - Fix for a spurious poll wakeup (Dylan) - CQE overflow fixes (Dylan) - Support more types of cancelations (me) - Support for co-operative task_work signaling, rather than always forcing an IPI (me) - Support for doing poll first when appropriate, rather than always attempting a transfer first (me) - Provided buffer cleanups and support for mapped buffers (me) - Improve how io_uring handles inflight SCM files (Pavel) - Speedups for registered files (Pavel, me) - Organize the completion data in a struct in io_kiocb rather than keep it in separate spots (Pavel) - task_work improvements (Pavel) - Cleanup and optimize the submission path, in general and for handling links (Pavel) - Speedups for registered resource handling (Pavel) - Support sparse buffers and file maps (Pavel, me) - Various fixes and cleanups (Almog, Pavel, me)" * tag 'for-5.19/io_uring-2022-05-22' of git://git.kernel.dk/linux-block: (111 commits) io_uring: fix incorrect __kernel_rwf_t cast io_uring: disallow mixed provided buffer group registrations io_uring: initialize io_buffer_list head when shared ring is unregistered io_uring: add fully sparse buffer registration io_uring: use rcu_dereference in io_close io_uring: consistently use the EPOLL* defines io_uring: make apoll_events a __poll_t io_uring: drop a spurious inline on a forward declaration io_uring: don't use ERR_PTR for user pointers io_uring: use a rwf_t for io_rw.flags io_uring: add support for ring mapped supplied buffers io_uring: add io_pin_pages() helper io_uring: add buffer selection support to IORING_OP_NOP io_uring: fix locking state for empty buffer group io_uring: implement multishot mode for accept io_uring: let fast poll support multishot io_uring: add REQ_F_APOLL_MULTISHOT for requests io_uring: add IORING_ACCEPT_MULTISHOT for accept io_uring: only wake when the correct events are set io_uring: avoid io-wq -EAGAIN looping for !IOPOLL ...	2022-05-23 12:22:49 -07:00
David Howells	2aeb8c86d4	afs: Fix afs_getattr() to refetch file status if callback break occurred If a callback break occurs (change notification), afs_getattr() needs to issue an FS.FetchStatus RPC operation to update the status of the file being examined by the stat-family of system calls. Fix afs_getattr() to do this if AFS_VNODE_CB_PROMISED has been cleared on a vnode by a callback break. Skip this if AT_STATX_DONT_SYNC is set. This can be tested by appending to a file on one AFS client and then using "stat -L" to examine its length on a machine running kafs. This can also be watched through tracing on the kafs machine. The callback break is seen: kworker/1:1-46 [001] ..... 978.910812: afs_cb_call: c=0000005f YFSCB.CallBack kworker/1:1-46 [001] ...1. 978.910829: afs_cb_break: 100058:23b4c:242d2c2 b=2 s=1 break-cb kworker/1:1-46 [001] ..... 978.911062: afs_call_done: c=0000005f ret=0 ab=0 [0000000082994ead] And then the stat command generated no traffic if unpatched, but with this change a call to fetch the status can be observed: stat-4471 [000] ..... 986.744122: afs_make_fs_call: c=000000ab 100058:023b4c:242d2c2 YFS.FetchStatus stat-4471 [000] ..... 986.745578: afs_call_done: c=000000ab ret=0 ab=0 [0000000087fc8c84] Fixes: `08e0e7c82e` ("[AF_RXRPC]: Make the in-kernel AFS filesystem use AF_RXRPC.") Reported-by: Markus Suvanto <markus.suvanto@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Tested-by: Markus Suvanto <markus.suvanto@gmail.com> Tested-by: kafs-testing+fedora34_64checkkafs-build-496@auristor.com Link: https://bugzilla.kernel.org/show_bug.cgi?id=216010 Link: https://lore.kernel.org/r/165308359800.162686.14122417881564420962.stgit@warthog.procyon.org.uk/ # v1 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-05-22 09:25:47 -10:00
Jens Axboe	3fe07bcd80	io_uring: cleanup handling of the two task_work lists Rather than pass in a bool for whether or not this work item needs to go into the priority list or not, provide separate helpers for it. For most use cases, this also then gets rid of the branch for non-priority task work. While at it, rename the prior_task_list to prio_task_list. Prior is a confusing name for it, as it would seem to indicate that this is the previous task_work list. prio makes it clear that this is a priority task_work list. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-21 09:17:05 -06:00
Zhihao Cheng	68f4c6eba7	fs-writeback: writeback_sb_inodes：Recalculate 'wrote' according skipped pages Commit `505a666ee3` ("writeback: plug writeback in wb_writeback() and writeback_inodes_wb()") has us holding a plug during wb_writeback, which may cause a potential ABBA dead lock: wb_writeback fat_file_fsync blk_start_plug(&plug) for (;;) { iter i-1: some reqs have been added into plug->mq_list // LOCK A iter i: progress = __writeback_inodes_wb(wb, work) . writeback_sb_inodes // fat's bdev . __writeback_single_inode . . generic_writepages . . __block_write_full_page . . . . __generic_file_fsync . . . . sync_inode_metadata . . . . writeback_single_inode . . . . __writeback_single_inode . . . . fat_write_inode . . . . __fat_write_inode . . . . sync_dirty_buffer // fat's bdev . . . . lock_buffer(bh) // LOCK B . . . . submit_bh . . . . blk_mq_get_tag // LOCK A . . . trylock_buffer(bh) // LOCK B . . . redirty_page_for_writepage . . . wbc->pages_skipped++ . . --wbc->nr_to_write . wrote += write_chunk - wbc.nr_to_write // wrote > 0 . requeue_inode . redirty_tail_locked if (progress) // progress > 0 continue; iter i+1: queue_io // similar process with iter i, infinite for-loop ! } blk_finish_plug(&plug) // flush plug won't be called Above process triggers a hungtask like: [ 399.044861] INFO: task bb:2607 blocked for more than 30 seconds. [ 399.046824] Not tainted 5.18.0-rc1-00005-gefae4d9eb6a2-dirty [ 399.051539] task:bb state:D stack: 0 pid: 2607 ppid: 2426 flags:0x00004000 [ 399.051556] Call Trace: [ 399.051570] __schedule+0x480/0x1050 [ 399.051592] schedule+0x92/0x1a0 [ 399.051602] io_schedule+0x22/0x50 [ 399.051613] blk_mq_get_tag+0x1d3/0x3c0 [ 399.051640] __blk_mq_alloc_requests+0x21d/0x3f0 [ 399.051657] blk_mq_submit_bio+0x68d/0xca0 [ 399.051674] __submit_bio+0x1b5/0x2d0 [ 399.051708] submit_bio_noacct+0x34e/0x720 [ 399.051718] submit_bio+0x3b/0x150 [ 399.051725] submit_bh_wbc+0x161/0x230 [ 399.051734] __sync_dirty_buffer+0xd1/0x420 [ 399.051744] sync_dirty_buffer+0x17/0x20 [ 399.051750] __fat_write_inode+0x289/0x310 [ 399.051766] fat_write_inode+0x2a/0xa0 [ 399.051783] __writeback_single_inode+0x53c/0x6f0 [ 399.051795] writeback_single_inode+0x145/0x200 [ 399.051803] sync_inode_metadata+0x45/0x70 [ 399.051856] __generic_file_fsync+0xa3/0x150 [ 399.051880] fat_file_fsync+0x1d/0x80 [ 399.051895] vfs_fsync_range+0x40/0xb0 [ 399.051929] __x64_sys_fsync+0x18/0x30 In my test, 'need_resched()' (which is imported by `590dca3a71` "fs-writeback: unplug before cond_resched in writeback_sb_inodes") in function 'writeback_sb_inodes()' seldom comes true, unless cond_resched() is deleted from write_cache_pages(). Fix it by correcting wrote number according number of skipped pages in writeback_sb_inodes(). Goto Link to find a reproducer. Link: https://bugzilla.kernel.org/show_bug.cgi?id=215837 Cc: stable@vger.kernel.org # v4.3 Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220510133805.1988292-1-chengzhihao1@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-19 06:29:41 -06:00
Linus Torvalds	01464a73a6	io_uring-5.18-2022-05-18 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmKFeygQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpsXxD/9dnVf1LCfVR3wQpV/O7o1u5zRpjTSFetmS qGw4kMDUtfE63oCLzGW/WpmQLrHduoTa32Ms/v0vU1+1MixUHatzRzN5OF2Nv7Wv Ukc6JtH2PjcYFNouAFJ8r/H1uanrr9DOK7R/9n3t9CwOIoand+PUmZ0rIHVN9u0j KtpousBih66c7tajADqSxKb1uPlDrgTkzfAYVj5O4OwBAiIwY2CLJfvbLFTdoHP9 bdCrsvo9R08QRH9PedPqctdQURBxfzd0kXm3N8+d4r4GEQvbaGI+6bN88VYW2aEa VfBCZD4A8/VupcGWWhAnKu0rOwxgrh2jDEd5g+0g/f3SRTY2LBUJAlEMM7N043vX MtAuO05E3/Svr5doAwAJsp44h6SvS2s5ogvPIYjEUvY4JK2UDdWOcYrb7HqXrNBi 0jNsgUu4+N89i6jiewf0dPw6BKK7Cf9322fGy4X1dQ77gIkDWt4Z55iV46aSX3Y8 dQt2Jaoj0c/wRkOgZtrDi7FmdM0xcMzGMB/oOyI07d3ERjbyZuDTFowWh/kqd1xA 6vsrkyCQaGilmDu9xTywPSfarKoxIrX+TgZj5rnnJpv2tENJSx/o8yKWcu9wnO1L gnanzpwYW4qIOjkWKvIBLBHXXA6Qp8aXacif8rODUhIuABQpe85b+btpHeqO9lWL a3wS9Gc2nQ== =8Xxv -----END PGP SIGNATURE----- Merge tag 'io_uring-5.18-2022-05-18' of git://git.kernel.dk/linux-block Pull io_uring fixes from Jens Axboe: "Two small changes fixing issues from the 5.18 merge window: - Fix wrong ordering of a tracepoint (Dylan) - Fix MSG_RING on IOPOLL rings (me)" * tag 'io_uring-5.18-2022-05-18' of git://git.kernel.dk/linux-block: io_uring: don't attempt to IOPOLL for MSG_RING requests io_uring: fix ordering of args in io_uring_queue_async_work	2022-05-18 14:21:30 -10:00
Jens Axboe	2fcabce2d7	io_uring: disallow mixed provided buffer group registrations It's nonsensical to register a provided buffer ring, if a classic provided buffer group with the same ID exists. Depending on the order of which we decide what type to pick, the other type will never get used. Explicitly disallow it and return an error if this is attempted. Fixes: `c7fb19428d` ("io_uring: add support for ring mapped supplied buffers") Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-18 16:21:30 -06:00
Jens Axboe	1d0dbbfa28	io_uring: initialize io_buffer_list head when shared ring is unregistered We use ->buf_pages != 0 to tell if this is a shared buffer ring or a classic provided buffer group. If we unregister the shared ring and then attempt to use it, buf_pages is zero yet the classic list head isn't properly initialized. This causes io_buffer_select() to think that we have classic buffers available, but then we crash when we try and get one from the list. Just initialize the list if we unregister a shared buffer ring, leaving it in a sane state for either re-registration or for attempting to use it. And do the same for the initial setup from the classic path. Fixes: `c7fb19428d` ("io_uring: add support for ring mapped supplied buffers") Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-18 16:21:11 -06:00
Pavel Begunkov	0184f08e65	io_uring: add fully sparse buffer registration Honour IORING_RSRC_REGISTER_SPARSE not only for direct files but fixed buffers as well. It makes the rsrc API more consistent. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/66f429e4912fe39fb3318217ff33a2853d4544be.1652879898.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-18 12:53:04 -06:00
Christoph Hellwig	0bf1dbee9b	io_uring: use rcu_dereference in io_close Accessing the file table needs a rcu_dereference_protected(). Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220518084005.3255380-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-18 06:19:05 -06:00
Christoph Hellwig	a294bef57c	io_uring: consistently use the EPOLL* defines POLL* are unannotated values for the userspace ABI, while everything in-kernel should use EPOLL* and the __poll_t type. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220518084005.3255380-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-18 06:19:05 -06:00
Christoph Hellwig	58f5c8d39e	io_uring: make apoll_events a __poll_t apoll_events is fed to vfs_poll and the poll tables, so it should be a __poll_t. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220518084005.3255380-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-18 06:19:05 -06:00
Christoph Hellwig	ee67ba3b20	io_uring: drop a spurious inline on a forward declaration io_file_get_normal isn't marked inline, so don't claim it as such in the forward declaration. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220518084005.3255380-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-18 06:19:05 -06:00
Christoph Hellwig	984824db84	io_uring: don't use ERR_PTR for user pointers ERR_PTR abuses the high bits of a pointer to transport error information. This is only safe for kernel pointers and not user pointers. Fix io_buffer_select and its helpers to just return NULL for failure and get rid of this abuse. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220518084005.3255380-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-18 06:18:56 -06:00
Christoph Hellwig	20cbd21d89	io_uring: use a rwf_t for io_rw.flags Use the proper type. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220518084005.3255380-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-18 06:17:52 -06:00
Jens Axboe	c7fb19428d	io_uring: add support for ring mapped supplied buffers Provided buffers allow an application to supply io_uring with buffers that can then be grabbed for a read/receive request, when the data source is ready to deliver data. The existing scheme relies on using IORING_OP_PROVIDE_BUFFERS to do that, but it can be difficult to use in real world applications. It's pretty efficient if the application is able to supply back batches of provided buffers when they have been consumed and the application is ready to recycle them, but if fragmentation occurs in the buffer space, it can become difficult to supply enough buffers at the time. This hurts efficiency. Add a register op, IORING_REGISTER_PBUF_RING, which allows an application to setup a shared queue for each buffer group of provided buffers. The application can then supply buffers simply by adding them to this ring, and the kernel can consume then just as easily. The ring shares the head with the application, the tail remains private in the kernel. Provided buffers setup with IORING_REGISTER_PBUF_RING cannot use IORING_OP_{PROVIDE,REMOVE}_BUFFERS for adding or removing entries to the ring, they must use the mapped ring. Mapped provided buffer rings can co-exist with normal provided buffers, just not within the same group ID. To gauge overhead of the existing scheme and evaluate the mapped ring approach, a simple NOP benchmark was written. It uses a ring of 128 entries, and submits/completes 32 at the time. 'Replenish' is how many buffers are provided back at the time after they have been consumed: Test Replenish NOPs/sec ================================================================ No provided buffers NA ~30M Provided buffers 32 ~16M Provided buffers 1 ~10M Ring buffers 32 ~27M Ring buffers 1 ~27M The ring mapped buffers perform almost as well as not using provided buffers at all, and they don't care if you provided 1 or more back at the same time. This means application can just replenish as they go, rather than need to batch and compact, further reducing overhead in the application. The NOP benchmark above doesn't need to do any compaction, so that overhead isn't even reflected in the above test. Co-developed-by: Dylan Yudaken <dylany@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-18 06:12:42 -06:00
Jens Axboe	d8c2237d0a	io_uring: add io_pin_pages() helper Abstract this out from io_sqe_buffer_register() so we can use it elsewhere too without duplicating this code. No intended functional changes in this patch. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-18 06:12:42 -06:00
Jens Axboe	3d200242a6	io_uring: add buffer selection support to IORING_OP_NOP Obviously not really useful since it's not transferring data, but it is helpful in benchmarking overhead of provided buffers. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-18 06:12:42 -06:00
Jens Axboe	e7637a492b	io_uring: fix locking state for empty buffer group io_provided_buffer_select() must drop the submit lock, if needed, even in the error handling case. Failure to do so will leave us with the ctx->uring_lock held, causing spew like: ==================================== WARNING: iou-wrk-366/368 still has locks held! 5.18.0-rc6-00294-gdf8dc7004331 #994 Not tainted ------------------------------------ 1 lock held by iou-wrk-366/368: #0: ffff0000c72598a8 (&ctx->uring_lock){+.+.}-{3:3}, at: io_ring_submit_lock+0x20/0x48 stack backtrace: CPU: 4 PID: 368 Comm: iou-wrk-366 Not tainted 5.18.0-rc6-00294-gdf8dc7004331 #994 Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace.part.0+0xa4/0xd4 show_stack+0x14/0x5c dump_stack_lvl+0x88/0xb0 dump_stack+0x14/0x2c debug_check_no_locks_held+0x84/0x90 try_to_freeze.isra.0+0x18/0x44 get_signal+0x94/0x6ec io_wqe_worker+0x1d8/0x2b4 ret_from_fork+0x10/0x20 and triggering later hangs off get_signal() because we attempt to re-grab the lock. Reported-by: syzbot+987d7bb19195ae45208c@syzkaller.appspotmail.com Fixes: `149c69b04a` ("io_uring: abstract out provided buffer list selection") Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-18 06:12:41 -06:00
Jens Axboe	aa184e8671	io_uring: don't attempt to IOPOLL for MSG_RING requests We gate whether to IOPOLL for a request on whether the opcode is allowed on a ring setup for IOPOLL and if it's got a file assigned. MSG_RING is the only one that allows a file yet isn't pollable, it's merely supported to allow communication on an IOPOLL ring, not because we can poll for completion of it. Put the assigned file early and clear it, so we don't attempt to poll for it. Reported-by: syzbot+1a0a53300ce782f8b3ad@syzkaller.appspotmail.com Fixes: `3f1d52abf0` ("io_uring: defer msg-ring file validity check until command issue") Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-17 12:46:04 -06:00
Hao Xu	4e86a2c980	io_uring: implement multishot mode for accept Refactor io_accept() to support multishot mode. theoretical analysis: 1) when connections come in fast - singleshot: add accept sqe(userspace) --> accept inline ^ \| \|-----------------\| - multishot: add accept sqe(userspace) --> accept inline ^ \| \|----\| we do accept repeatedly in place until get EAGAIN 2) when connections come in at a low pressure similar thing like 1), we reduce a lot of userspace-kernel context switch and useless vfs_poll() tests: Did some tests, which goes in this way: server client(multiple) accept connect read write write read close close Basically, raise up a number of clients(on same machine with server) to connect to the server, and then write some data to it, the server will write those data back to the client after it receives them, and then close the connection after write return. Then the client will read the data and then close the connection. Here I test 10000 clients connect one server, data size 128 bytes. And each client has a go routine for it, so they come to the server in short time. test 20 times before/after this patchset, time spent:(unit cycle, which is the return value of clock()) before: 1930136+1940725+1907981+1947601+1923812+1928226+1911087+1905897+1941075 +1934374+1906614+1912504+1949110+1908790+1909951+1941672+1969525+1934984 +1934226+1914385)/20.0 = 1927633.75 after: 1858905+1917104+1895455+1963963+1892706+1889208+1874175+1904753+1874112 +1874985+1882706+1884642+1864694+1906508+1916150+1924250+1869060+1889506 +1871324+1940803)/20.0 = 1894750.45 (1927633.75 - 1894750.45) / 1927633.75 = 1.65% Signed-off-by: Hao Xu <howeyxu@tencent.com> Link: https://lore.kernel.org/r/20220514142046.58072-5-haoxu.linux@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-14 08:34:22 -06:00
Hao Xu	dbc2564cfe	io_uring: let fast poll support multishot For operations like accept, multishot is a useful feature, since we can reduce a number of accept sqe. Let's integrate it to fast poll, it may be good for other operations in the future. Signed-off-by: Hao Xu <howeyxu@tencent.com> Link: https://lore.kernel.org/r/20220514142046.58072-4-haoxu.linux@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-14 08:34:22 -06:00
Hao Xu	227685ebfa	io_uring: add REQ_F_APOLL_MULTISHOT for requests Add a flag to indicate multishot mode for fast poll. currently only accept use it, but there may be more operations leveraging it in the future. Also add a mask IO_APOLL_MULTI_POLLED which stands for REQ_F_APOLL_MULTI \| REQ_F_POLLED, to make the code short and cleaner. Signed-off-by: Hao Xu <howeyxu@tencent.com> Link: https://lore.kernel.org/r/20220514142046.58072-3-haoxu.linux@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-14 08:34:22 -06:00
Linus Torvalds	d928e8f3af	gfs2 fixes - Fix filesystem block deallocation for short writes. - Stop using glock holder auto-demotion for now. - Get rid of buffered writes inefficiencies due to page faults being disabled. - Minor other cleanups. -----BEGIN PGP SIGNATURE----- iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAmJ+wL4UHGFncnVlbmJh QHJlZGhhdC5jb20ACgkQ1b+f6wMTZTqNkRAAuBc4oJ3Wkp/dkfF6MDY9DXqCzhQo TodFIdEQvOUtncYBljE6DZQ9MT1sRvwxDe0nIjErFQzHYcW88zczILWOBRhrhlci kANL6JtjSvtE3kecvR9I3nZX44aDETJliV3FX8n7vDSNMTIwjzW38d0XmDLX9t8F bTEFv3rKsUzgcGaxpeQe7mzoQPi910WFPO/pos2ghuZwB1QEpdBrCESz4XB+OwKM +V+8nooHvYp8T+2AzrBM6hBgsYelrBPRXlz6cYEjWY2FQuvQk/thX+zO2dvXCsPA uNWJTCKJEsufWflPNI2ugZ9TVneIU1umGACoEHteeRvG6Qsg6K4Kf0EJEFf6Y0Tg PKUJLUcdi0suS8UuUCBTAVAgCv9+ueLhIbbFeRkbHjxSjET7NQl2gbkfY2V1RsFt yNFLMGU01xsb1YylncY4xQc9WVMDbPsdv1KGDF75njchuHZXhfg00ezPQys4Uy5R 0EUwqoPYNePV6VsoHLbU+kImf936VawP916yDiyflyz6UFSi5vNg7SwMqDrXpIxM T8nNgrTC+Npv7T2xc8JhSFGv9T86nZXWjQTpDzV8onPvxdCLCT1cmm352aHqXd7+ wY9ZFJZ4iMoinYUkEzgySQW00+wK/AThQKQ6ImhLEvwxMUc6dJUnesVGkzLDGh9Z KSfqgYzmlb2YdKY= =FJq+ -----END PGP SIGNATURE----- Merge tag 'gfs2-v5.18-rc4-fix3' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 fixes from Andreas Gruenbacher: "We've finally identified commit `dc732906c2` ("gfs2: Introduce flag for glock holder auto-demotion") to be the other cause of the filesystem corruption we've been seeing. This feature isn't strictly necessary anymore, so we've decided to stop using it for now. With this and the gfs_iomap_end rounding fix you've already seen ("gfs2: Fix filesystem block deallocation for short writes" in this pull request), we're corruption free again now. - Fix filesystem block deallocation for short writes. - Stop using glock holder auto-demotion for now. - Get rid of buffered writes inefficiencies due to page faults being disabled. - Minor other cleanups" * tag 'gfs2-v5.18-rc4-fix3' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: Stop using glock holder auto-demotion for now gfs2: buffered write prefaulting gfs2: Align read and write chunks to the page cache gfs2: Pull return value test out of should_fault_in_pages gfs2: Clean up use of fault_in_iov_iter_{read,write}able gfs2: Variable rename gfs2: Fix filesystem block deallocation for short writes	2022-05-13 14:32:53 -07:00
Dylan Yudaken	1b1d7b4bf1	io_uring: only wake when the correct events are set The check for waking up a request compares the poll_t bits, however this will always contain some common flags so this always wakes up. For files with single wait queues such as sockets this can cause the request to be sent to the async worker unnecesarily. Further if it is non-blocking will complete the request with EAGAIN which is not desired. Here exclude these common events, making sure to not exclude POLLERR which might be important. Fixes: `d7718a9d25` ("io_uring: use poll driven retry for files that support it") Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220512091834.728610-3-dylany@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-13 14:37:50 -06:00
Andreas Gruenbacher	e1fa9ea85c	gfs2: Stop using glock holder auto-demotion for now We're having unresolved issues with the glock holder auto-demotion mechanism introduced in commit `dc732906c2`. This mechanism was assumed to be essential for avoiding frequent short reads and writes until commit `296abc0d91` ("gfs2: No short reads or writes upon glock contention"). Since then, when the inode glock is lost, it is simply re-acquired and the operation is resumed. This means that apart from the performance penalty, we might as well drop the inode glock before faulting in pages, and re-acquire it afterwards. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2022-05-13 22:32:52 +02:00
Andreas Gruenbacher	fa5dfa645d	gfs2: buffered write prefaulting In gfs2_file_buffered_write, to increase the likelihood that all the user memory we're trying to write will be resident in memory, carry out the write in chunks and fault in each chunk of user memory before trying to write it. Otherwise, some workloads will trigger frequent short "internal" writes, causing filesystem blocks to be allocated and then partially deallocated again when writing into holes, which is wasteful and breaks reservations. Neither the chunked writes nor any of the short "internal" writes are user visible. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2022-05-13 22:32:52 +02:00
Andreas Gruenbacher	324d116c5a	gfs2: Align read and write chunks to the page cache Align the chunks that reads and writes are carried out in to the page cache rather than the user buffers. This will be more efficient in general, especially for allocating writes. Optimizing the case that the user buffer is gfs2 backed isn't very useful; we only need to make sure we won't deadlock. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2022-05-13 22:00:22 +02:00
Andreas Gruenbacher	7238226450	gfs2: Pull return value test out of should_fault_in_pages Pull the return value test of the previous read or write operation out of should_fault_in_pages(). In a following patch, we'll fault in pages before the I/O and there will be no return value to check. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2022-05-13 22:00:22 +02:00
Andreas Gruenbacher	6d22ff4710	gfs2: Clean up use of fault_in_iov_iter_{read,write}able No need to store the return value of the fault_in functions in separate variables. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2022-05-13 22:00:22 +02:00
Andreas Gruenbacher	42e4c3bdca	gfs2: Variable rename Instead of counting the number of bytes read from the filesystem, functions gfs2_file_direct_read and gfs2_file_read_iter count the number of bytes written into the user buffer. Conversely, functions gfs2_file_direct_write and gfs2_file_buffered_write count the number of bytes read from the user buffer. This is nothing but confusing, so change the read functions to count how many bytes they have read, and the write functions to count how many bytes they have written. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2022-05-13 22:00:22 +02:00
Andreas Gruenbacher	d031a8866e	gfs2: Fix filesystem block deallocation for short writes When a write cannot be carried out in full, gfs2_iomap_end() releases blocks that have been allocated for this write but haven't been used. To compute the end of the allocation, gfs2_iomap_end() incorrectly rounded the end of the attempted write down to the next block boundary to arrive at the end of the allocation. It would have to round up, but the end of the allocation is also available as iomap->offset + iomap->length, so just use that instead. In addition, use round_up() for computing the start of the unused range. Fixes: `64bc06bb32` ("gfs2: iomap buffered write support") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2022-05-13 21:59:18 +02:00
Linus Torvalds	c3f5e692bf	Two fixes to properly maintain xattrs on async creates and thus preserve SELinux context on newly created files and to avoid improper usage of folio->private field which triggered BUG_ONs, both marked for stable. -----BEGIN PGP SIGNATURE----- iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmJ+kC4THGlkcnlvbW92 QGdtYWlsLmNvbQAKCRBKf944AhHzi0dJB/9tCG5VsOa8wyh/eR6HZyAHAGtXEnPB qUn+bOvsRFw65GorwvbQw5fgXrQSaxzEgYuOEuCKOCA1KUxpYTGwYMhZIGNS0ZQm 251MyfGl3+aahe3H2pAkbuk8HnUq+989BPiRzwpTi0vZMDZNy1nyuE7VK+Nw/77w E+swcFeyvmimKSlXURH6LQFzamxcnTdFoGRL1cc+97xOFSmbq3JSS+ZhrHhlZEyi 7C5tIfz8BtNGswCh2HfHhmFWxBAe0GWll6SqvKq2go/1Cg4UUiCn5BMsGCAYnL3G eFnt3n7b3NARfSdoI+OCWa5yVERRDy7c4zxzyqdIxLKI1LVhXRhHqdaR =73K4 -----END PGP SIGNATURE----- Merge tag 'ceph-for-5.18-rc7' of https://github.com/ceph/ceph-client Pull ceph fix from Ilya Dryomov: "Two fixes to properly maintain xattrs on async creates and thus preserve SELinux context on newly created files and to avoid improper usage of folio->private field which triggered BUG_ONs. Both marked for stable" * tag 'ceph-for-5.18-rc7' of https://github.com/ceph/ceph-client: ceph: check folio PG_private bit instead of folio->private ceph: fix setting of xattrs on async created inodes	2022-05-13 11:12:04 -07:00
Linus Torvalds	6dd5884d1d	NFS client bugfixes for Linux 5.18 Highlights include: Stable fixes: - SUNRPC: Ensure that the gssproxy client can start in a connected state Bugfixes: - Revert "SUNRPC: Ensure gss-proxy connects on setup" - nfs: fix broken handling of the softreval mount option -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAmJ+j5kACgkQZwvnipYK APIQag/+JXtDt9xo0SsGTMJ2PJArnRoyd3QcZjJtoabaZTylZyILZ20sxvt/jgby miOKrI+bbsykbrwQijRLF/Yys1G+iMBzSWy4lpJ9eH4AVkfjr5qqauRbPobo/ZiI i5fDLR84FlzKYWBY1Nv6YE5VREukIlXbrq7KWe/HoS7/hSAamhkUv+a0M8iNLGO4 QtH7+M0iBZTI9yM4gFAcMAANV21SxvjqP1z62kbCp00qO2mL0PgF/2pxC9WgX24/ EZX037ykzKFkjgWfzT8+/eIfCQGIPi/9e6Ir4Bc99psVFOYd1YxkTLBycNwm1VOz 5RLORbURDVMPQH2/qZ57u7B0gJF76UZM4pH3gv+i9nFhoUqf3kFZAOy48MZEFz3X sPiQZLck63mvO9Bd3QX6pFZc0datiYmhuXRknjxV6Bz/Y41y3NzeJeX4r88s4q7l tisZDmaIm0Q+H07QOTL0aCk456amP6XLnO1+nu/PSR/3ImwJSaOpypHx/BxDJUTG TvpY1ouBZ8irfT6JrbfVnSdbedIZx4c+6btVw6mlT40edZF6M3r+3s8s2TU7x+co uiBMB4Qj/C19zqcf/DziiL1PZEJPm3lk0fBHc6JIWCV4I3eYxQ5J4nFJsy8/jPm1 lTgreoHWfnYC3WhlxUk+N3+9X/9+iFEJUxqGYfANf8GFaGqcYVQ= =/Yg+ -----END PGP SIGNATURE----- Merge tag 'nfs-for-5.18-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client bugfixes from Trond Myklebust: "One more pull request. There was a bug in the fix to ensure that gss- proxy continues to work correctly after we fixed the AF_LOCAL socket leak in the RPC code. This therefore reverts that broken patch, and replaces it with one that works correctly. Stable fixes: - SUNRPC: Ensure that the gssproxy client can start in a connected state Bugfixes: - Revert "SUNRPC: Ensure gss-proxy connects on setup" - nfs: fix broken handling of the softreval mount option" * tag 'nfs-for-5.18-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: nfs: fix broken handling of the softreval mount option SUNRPC: Ensure that the gssproxy client can start in a connected state Revert "SUNRPC: Ensure gss-proxy connects on setup"	2022-05-13 11:04:37 -07:00
Linus Torvalds	364a453ab9	hotfixes for 5.18-rc7 -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYnvwxgAKCRDdBJ7gKXxA jhymAQDvHnFT3F5ydvBqApbzrQRUk/+fkkQSrF/xYawknZNgkAEA6Tnh9XqYplJN bbmml6HTVvDjprEOCGakY/Kyz7qmdQ0= =SMJQ -----END PGP SIGNATURE----- Merge tag 'mm-hotfixes-stable-2022-05-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "Seven MM fixes, three of which address issues added in the most recent merge window, four of which are cc:stable. Three non-MM fixes, none very serious" [ And yes, that's a real pull request from Andrew, not me creating a branch from emailed patches. Woo-hoo! ] * tag 'mm-hotfixes-stable-2022-05-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: MAINTAINERS: add a mailing list for DAMON development selftests: vm: Makefile: rename TARGETS to VMTARGETS mm/kfence: reset PG_slab and memcg_data before freeing __kfence_pool mailmap: add entry for martyna.szapar-mudlaw@intel.com arm[64]/memremap: don't abuse pfn_valid() to ensure presence of linear map procfs: prevent unprivileged processes accessing fdinfo dir mm: mremap: fix sign for EFAULT error return value mm/hwpoison: use pr_err() instead of dump_page() in get_any_page() mm/huge_memory: do not overkill when splitting huge_zero_page Revert "mm/memory-failure.c: skip huge_zero_page in memory_failure()"	2022-05-13 10:22:37 -07:00
Pavel Begunkov	e0deb6a025	io_uring: avoid io-wq -EAGAIN looping for !IOPOLL If an opcode handler semi-reliably returns -EAGAIN, io_wq_submit_work() might continue busily hammer the same handler over and over again, which is not ideal. The -EAGAIN handling in question was put there only for IOPOLL, so restrict it to IOPOLL mode only where there is no other recourse than to retry as we cannot wait. Fixes: `def596e955` ("io_uring: support for IO polling") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/f168b4f24181942f3614dd8ff648221736f572e6.1652433740.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-13 06:50:42 -06:00
Jens Axboe	a8da73a32b	io_uring: add flag for allocating a fully sparse direct descriptor space Currently to setup a fully sparse descriptor space upfront, the app needs to alloate an array of the full size and memset it to -1 and then pass that in. Make this a bit easier by allowing a flag that simply does this internally rather than needing to copy each slot separately. This works with IORING_REGISTER_FILES2 as the flag is set in struct io_uring_rsrc_register, and is only allow when the type is IORING_RSRC_FILE as this doesn't make sense for registered buffers. Reviewed-by: Hao Xu <howeyxu@tencent.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-13 06:28:50 -06:00
Jens Axboe	09893e15f1	io_uring: bump max direct descriptor count to 1M We currently limit these to 32K, but since we're now backing the table space with vmalloc when needed, there's no reason why we can't make it bigger. The total space is limited by RLIMIT_NOFILE as well. Reviewed-by: Hao Xu <howeyxu@tencent.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-13 06:28:46 -06:00
Jens Axboe	c30c3e00cb	io_uring: allow allocated fixed files for accept If the application passes in IORING_FILE_INDEX_ALLOC as the file_slot, then that's a hint to allocate a fixed file descriptor rather than have one be passed in directly. This can be useful for having io_uring manage the direct descriptor space, and also allows multi-shot support to work with fixed files. Normal accept direct requests will complete with 0 for success, and < 0 in case of error. If io_uring is asked to allocated the direct descriptor, then the direct descriptor is returned in case of success. Reviewed-by: Hao Xu <howeyxu@tencent.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-05-13 06:28:43 -06:00

1 2 3 4 5 ...

75900 Commits