linux

iv/linux

Author	SHA1	Message	Date
Shin'ichiro Kawasaki	0a5aa8d161	block: fix blk_mq_attempt_bio_merge and rq_qos_throttle protection Commit `9d497e2941` ("block: don't protect submit_bio_checks by q_usage_counter") moved blk_mq_attempt_bio_merge and rq_qos_throttle calls out of q_usage_counter protection. However, these functions require q_usage_counter protection. The blk_mq_attempt_bio_merge call without the protection resulted in blktests block/005 failure with KASAN null- ptr-deref or use-after-free at bio merge. The rq_qos_throttle call without the protection caused kernel hang at qos throttle. To fix the failures, move the blk_mq_attempt_bio_merge and rq_qos_throttle calls back to q_usage_counter protection. Fixes: `9d497e2941` ("block: don't protect submit_bio_checks by q_usage_counter") Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Link: https://lore.kernel.org/r/20220308080915.3473689-1-shinichiro.kawasaki@wdc.com Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-03-08 17:48:39 -07:00
Stefano Garzarella	bb49c6fa8b	block: clear iocb->private in blkdev_bio_end_io_async() iocb_bio_iopoll() expects iocb->private to be cleared before releasing the bio. We already do this in blkdev_bio_end_io(), but we forgot in the recently added blkdev_bio_end_io_async(). Fixes: `54a88eb838` ("block: add single bio async direct IO helper") Cc: asml.silence@gmail.com Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220211090136.44471-1-sgarzare@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-02-22 06:59:49 -07:00
Laibin Qiu	e92bc4cd34	block/wbt: fix negative inflight counter when remove scsi device Now that we disable wbt by set WBT_STATE_OFF_DEFAULT in wbt_disable_default() when switch elevator to bfq. And when we remove scsi device, wbt will be enabled by wbt_enable_default. If it become false positive between wbt_wait() and wbt_track() when submit write request. The following is the scenario that triggered the problem. T1 T2 T3 elevator_switch_mq bfq_init_queue wbt_disable_default <= Set rwb->enable_state (OFF) Submit_bio blk_mq_make_request rq_qos_throttle <= rwb->enable_state (OFF) scsi_remove_device sd_remove del_gendisk blk_unregister_queue elv_unregister_queue wbt_enable_default <= Set rwb->enable_state (ON) q_qos_track <= rwb->enable_state (ON) ^^^^^^ this request will mark WBT_TRACKED without inflight add and will lead to drop rqw->inflight to -1 in wbt_done() which will trigger IO hung. Fix this by move wbt_enable_default() from elv_unregister to bfq_exit_queue(). Only re-enable wbt when bfq exit. Fixes: `76a8040817` ("blk-wbt: make sure throttle is enabled properly") Remove oneline stale comment, and kill one oneshot local variable. Signed-off-by: Ming Lei <ming.lei@rehdat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/linux-block/20211214133103.551813-1-qiulaibin@huawei.com/ Signed-off-by: Laibin Qiu <qiulaibin@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-02-17 07:54:03 -07:00
Christoph Hellwig	7a5428dcb7	block: fix surprise removal for drivers calling blk_set_queue_dying Various block drivers call blk_set_queue_dying to mark a disk as dead due to surprise removal events, but since commit `8e141f9eb8` that doesn't work given that the GD_DEAD flag needs to be set to stop I/O. Replace the driver calls to blk_set_queue_dying with a new (and properly documented) blk_mark_disk_dead API, and fold blk_set_queue_dying into the only remaining caller. Fixes: `8e141f9eb8` ("block: drain file system I/O on del_gendisk") Reported-by: Markus Blöchl <markus.bloechl@ipetronik.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Link: https://lore.kernel.org/r/20220217075231.1140-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-02-17 07:54:03 -07:00
Haimin Zhang	cc8f7fe1f5	block-map: add __GFP_ZERO flag for alloc_page in function bio_copy_kern Add __GFP_ZERO flag for alloc_page in function bio_copy_kern to initialize the buffer of a bio. Signed-off-by: Haimin Zhang <tcs.kernel@gmail.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220216084038.15635-1-tcs.kernel@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-02-17 07:54:03 -07:00
Pankaj Raghav	a12821d5e0	block: Add handling for zone append command in blk_complete_request Zone append command needs special handling to update the bi_sector field in the bio struct with the actual position of the data in the device. It is stored in __sector field of the request struct. Fixes: `5581a5ddfe` ("block: add completion handler for fast path") Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Adam Manzanares <a.manzanares@samsung.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220211093425.43262-2-p.raghav@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-02-11 09:56:13 -07:00
Martin K. Petersen	b13e0c7185	block: bio-integrity: Advance seed correctly for larger interval sizes Commit `309a62fa3a` ("bio-integrity: bio_integrity_advance must update integrity seed") added code to update the integrity seed value when advancing a bio. However, it failed to take into account that the integrity interval might be larger than the 512-byte block layer sector size. This broke bio splitting on PI devices with 4KB logical blocks. The seed value should be advanced by bio_integrity_intervals() and not the number of sectors. Cc: Dmitry Monakhov <dmonakhov@openvz.org> Cc: stable@vger.kernel.org Fixes: `309a62fa3a` ("bio-integrity: bio_integrity_advance must update integrity seed") Tested-by: Dmitry Ivanov <dmitry.ivanov2@hpe.com> Reported-by: Alexey Lyashkov <alexey.lyashkov@hpe.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20220204034209.4193-1-martin.petersen@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-02-03 21:09:24 -07:00
Ilya Dryomov	3e1f941dd9	block: fix DIO handling regressions in blkdev_read_iter() Commit `ceaa762527` ("block: move direct_IO into our own read_iter handler") introduced several regressions for bdev DIO: 1. read spanning EOF always returns 0 instead of the number of bytes read. This is because "count" is assigned early and isn't updated when the iterator is truncated: $ lsblk -o name,size /dev/vdb NAME SIZE vdb 1G $ xfs_io -d -c 'pread -b 4M 1021M 4M' /dev/vdb read 0/4194304 bytes at offset 1070596096 0.000000 bytes, 0 ops; 0.0007 sec (0.000000 bytes/sec and 0.0000 ops/sec) instead of $ xfs_io -d -c 'pread -b 4M 1021M 4M' /dev/vdb read 3145728/4194304 bytes at offset 1070596096 3 MiB, 1 ops; 0.0007 sec (3.865 GiB/sec and 1319.2612 ops/sec) 2. truncated iterator isn't reexpanded 3. iterator isn't reverted on blkdev_direct_IO() error 4. zero size read no longer skips atime update Fixes: `ceaa762527` ("block: move direct_IO into our own read_iter handler") Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220201100420.25875-1-idryomov@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-02-02 07:48:27 -07:00
Mike Snitzer	e45c47d1f9	block: add bio_start_io_acct_time() to control start_time bio_start_io_acct_time() interface is like bio_start_io_acct() that allows start_time to be passed in. This gives drivers the ability to defer starting accounting until after IO is issued (but possibily not entirely due to bio splitting). Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20220128155841.39644-2-snitzer@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-01-28 12:28:15 -07:00
Yu Kuai	592ee1197f	blk-mq: fix missing blk_account_io_done() in error path If blk_mq_request_issue_directly() failed from blk_insert_cloned_request(), the request will be accounted start. Currently, blk_insert_cloned_request() is only called by dm, and such request won't be accounted done by dm. In normal path, io will be accounted start from blk_mq_bio_to_request(), when the request is allocated, and such io will be accounted done from __blk_mq_end_request_acct() whether it succeeded or failed. Thus add blk_account_io_done() to fix the problem. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220126012132.3111551-1-yukuai3@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-01-26 06:34:41 -07:00
Miaoqian Lin	83114df32a	block: fix memory leak in disk_register_independent_access_ranges kobject_init_and_add() takes reference even when it fails. According to the doc of kobject_init_and_add() If this function returns an error, kobject_put() must be called to properly clean up the memory associated with the object. Fix this issue by adding kobject_put(). Callback function blk_ia_ranges_sysfs_release() in kobject_put() can handle the pointer "iars" properly. Fixes: `a2247f19ee` ("block: Add independent access ranges support") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Link: https://lore.kernel.org/r/20220120101025.22411-1-linmq006@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-01-23 09:13:09 -07:00
Linus Torvalds	3689f9f8b0	Merge tag 'bitmap-5.17-rc1' of git://github.com/norov/linux Pull bitmap updates from Yury Norov: - introduce for_each_set_bitrange() - use find_first__bit() instead of find_next__bit() where possible - unify for_each_bit() macros * tag 'bitmap-5.17-rc1' of git://github.com/norov/linux: vsprintf: rework bitmap_list_string lib: bitmap: add performance test for bitmap_print_to_pagebuf bitmap: unify find_bit operations mm/percpu: micro-optimize pcpu_is_populated() Replace for_each__bit_from() with for_each__bit() where appropriate find: micro-optimize for_each_{set,clear}_bit() include/linux: move for_each_bit() macros from bitops.h to find.h cpumask: replace cpumask_next_* with cpumask_first_* where appropriate tools: sync tools/bitmap with mother linux all: replace find_next{,_zero}_bit with find_first{,_zero}_bit where appropriate cpumask: use find_first_and_bit() lib: add find_first_and_bit() arch: remove GENERIC_FIND_FIRST_BIT entirely include: move find.h from asm_generic to linux bitops: move find_bit_*_le functions from le.h to find.h bitops: protect find_first_{,zero}_bit properly	2022-01-23 06:20:44 +02:00
Christoph Hellwig	0a4ee51818	mm: remove cleancache Patch series "remove Xen tmem leftovers". Since the removal of the Xen tmem driver in 2019, the cleancache hooks are entirely unused, as are large parts of frontswap. This series against linux-next (with the folio changes included) removes cleancaches, and cuts down frontswap to the bits actually used by zswap. This patch (of 13): The cleancache subsystem is unused since the removal of Xen tmem driver in commit `814bbf49dc` ("xen: remove tmem driver"). [akpm@linux-foundation.org: remove now-unreachable code] Link: https://lkml.kernel.org/r/20211224062246.1258487-1-hch@lst.de Link: https://lkml.kernel.org/r/20211224062246.1258487-2-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Juergen Gross <jgross@suse.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com> Cc: Hugh Dickins <hughd@google.com> Cc: Seth Jennings <sjenning@redhat.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-01-22 08:33:38 +02:00
Linus Torvalds	3c7c25038b	Merge tag 'block-5.17-2022-01-21' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: "Various little minor fixes that should go into this release: - Fix issue with cloned bios and IO accounting (Christoph) - Remove redundant assignments (Colin, GuoYong) - Fix an issue with the mq-deadline async_depth sysfs interface (me) - Fix brd module loading race (Tetsuo) - Shared tag map wakeup fix (Laibin) - End of bdev read fix (OGAWA) - srcu leak fix (Ming)" * tag 'block-5.17-2022-01-21' of git://git.kernel.dk/linux-block: block: fix async_depth sysfs interface for mq-deadline block: Fix wrong offset in bio_truncate() block: assign bi_bdev for cloned bios in blk_rq_prep_clone block: cleanup q->srcu block: Remove unnecessary variable assignment brd: remove brd_devices_mutex mutex aoe: remove redundant assignment on variable n loop: remove redundant initialization of pointer node blk-mq: fix tag_get wait task can't be awakened	2022-01-21 16:17:03 +02:00
Jens Axboe	46cdc45acb	block: fix async_depth sysfs interface for mq-deadline A previous commit added this feature, but it inadvertently used the wrong variable to show/store the setting from/to, victimized by copy/paste. Fix it up so that the async_depth sysfs interface reads and writes from the right setting. Fixes: `07757588e5` ("block/mq-deadline: Reserve 25% of scheduler tags for synchronous requests") Link: https://bugzilla.kernel.org/show_bug.cgi?id=215485 Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-01-20 10:54:02 -07:00
OGAWA Hirofumi	3ee859e384	block: Fix wrong offset in bio_truncate() bio_truncate() clears the buffer outside of last block of bdev, however current bio_truncate() is using the wrong offset of page. So it can return the uninitialized data. This happened when both of truncated/corrupted FS and userspace (via bdev) are trying to read the last of bdev. Reported-by: syzbot+ac94ae5f68b84197f41c@syzkaller.appspotmail.com Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/875yqt1c9g.fsf@mail.parknet.co.jp Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-01-20 06:30:12 -07:00
Christoph Hellwig	fd9f4e62a3	block: assign bi_bdev for cloned bios in blk_rq_prep_clone bio_clone_fast() sets the cloned bio to have the same ->bi_bdev as the source bio. This means that when request-based dm called setup_clone(), the cloned bio had its ->bi_bdev pointing to the dm device. After Commit `0b6e522cdc` ("blk-mq: use ->bi_bdev for I/O accounting") __blk_account_io_start() started using the request's ->bio->bi_bdev for I/O accounting, if it was set. This caused IO going to the underlying devices to use the dm device for their I/O accounting. Set up the proper ->bi_bdev in blk_rq_prep_clone based on the whole device bdev for the queue the request is cloned onto. Fixes: `0b6e522cdc` ("blk-mq: use ->bi_bdev for I/O accounting") Reported-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> [hch: the commit message is mostly from a different patch from Benjamin] Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com> Link: https://lore.kernel.org/r/20220118070444.1241739-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-01-18 06:34:05 -07:00
Ming Lei	850fd2abbe	block: cleanup q->srcu srcu structure has to be cleanup via cleanup_srcu_struct(), so fix it. Reported-by: syzbot+4f789823c1abc5accf13@syzkaller.appspotmail.com Fixes: `704b914f15` ("blk-mq: move srcu from blk_mq_hw_ctx to request_queue") Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220111123401.520192-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-01-17 07:24:45 -07:00
GuoYong Zheng	e6a2e5116e	block: Remove unnecessary variable assignment The parameter "ret" should be zero when running to this line, no need to set to zero again, remove it. Signed-off-by: GuoYong Zheng <zhenggy@chinatelecom.cn> Link: https://lore.kernel.org/r/1642414957-6785-1-git-send-email-zhenggy@chinatelecom.cn Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-01-17 07:24:04 -07:00
Yury Norov	9b51d9d866	cpumask: replace cpumask_next_* with cpumask_first_* where appropriate cpumask_first() is a more effective analogue of 'next' version if n == -1 (which means start == 0). This patch replaces 'next' with 'first' where things look trivial. There's no cpumask_first_zero() function, so create it. Signed-off-by: Yury Norov <yury.norov@gmail.com> Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>	2022-01-15 08:47:31 -08:00
Linus Torvalds	e1a7aa25ff	Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI updates from James Bottomley: "This series consists of the usual driver updates (ufs, pm80xx, lpfc, mpi3mr, mpt3sas, hisi_sas, libsas) and minor updates and bug fixes. The most impactful change is likely the switch from GFP_DMA to GFP_KERNEL in a bunch of drivers, but even that shouldn't affect too many people" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (121 commits) scsi: mpi3mr: Bump driver version to 8.0.0.61.0 scsi: mpi3mr: Fixes around reply request queues scsi: mpi3mr: Enhanced Task Management Support Reply handling scsi: mpi3mr: Use TM response codes from MPI3 headers scsi: mpi3mr: Add io_uring interface support in I/O-polled mode scsi: mpi3mr: Print cable mngnt and temp threshold events scsi: mpi3mr: Support Prepare for Reset event scsi: mpi3mr: Add Event acknowledgment logic scsi: mpi3mr: Gracefully handle online FW update operation scsi: mpi3mr: Detect async reset that occurred in firmware scsi: mpi3mr: Add IOC reinit function scsi: mpi3mr: Handle offline FW activation in graceful manner scsi: mpi3mr: Code refactor of IOC init - part2 scsi: mpi3mr: Code refactor of IOC init - part1 scsi: mpi3mr: Fault IOC when internal command gets timeout scsi: mpi3mr: Display IOC firmware package version scsi: mpi3mr: Handle unaligned PLL in unmap cmnds scsi: mpi3mr: Increase internal cmnds timeout to 60s scsi: mpi3mr: Do access status validation before adding devices scsi: mpi3mr: Add support for PCIe Managed Switch SES device ...	2022-01-14 14:37:34 +01:00
Laibin Qiu	180dccb0db	blk-mq: fix tag_get wait task can't be awakened In case of shared tags, there might be more than one hctx which allocates from the same tags, and each hctx is limited to allocate at most: hctx_max_depth = max((bt->sb.depth + users - 1) / users, 4U); tag idle detection is lazy, and may be delayed for 30sec, so there could be just one real active hctx(queue) but all others are actually idle and still accounted as active because of the lazy idle detection. Then if wake_batch is > hctx_max_depth, driver tag allocation may wait forever on this real active hctx. Fix this by recalculating wake_batch when inc or dec active_queues. Fixes: `0d2602ca30` ("blk-mq: improve support for shared tags maps") Suggested-by: Ming Lei <ming.lei@redhat.com> Suggested-by: John Garry <john.garry@huawei.com> Signed-off-by: Laibin Qiu <qiulaibin@huawei.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20220113025536.1479653-1-qiulaibin@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-01-13 12:52:14 -07:00
Linus Torvalds	f079ab01b5	Merge tag 'iomap-5.17' of git://git.infradead.org/users/willy/linux Pull iomap updates from Matthew Wilcox: "Convert xfs/iomap to use folios. This should be all that is needed for XFS to use large folios. There is no code in this pull request to create large folios, but no additional changes should be needed to XFS or iomap once they are created. Usually this would have come from Darrick, and we had intended that it would come that route. Between the holidays and various things which Darrick needed to work on, he asked if I could send things directly. There weren't any other iomap patches pending for this release, which probably also played a role" * tag 'iomap-5.17' of git://git.infradead.org/users/willy/linux: (26 commits) iomap: Inline __iomap_zero_iter into its caller xfs: Support large folios iomap: Support large folios in invalidatepage iomap: Convert iomap_migrate_page() to use folios iomap: Convert iomap_add_to_ioend() to take a folio iomap: Simplify iomap_do_writepage() iomap: Simplify iomap_writepage_map() iomap,xfs: Convert ->discard_page to ->discard_folio iomap: Convert iomap_write_end_inline to take a folio iomap: Convert iomap_write_begin() and iomap_write_end() to folios iomap: Convert __iomap_zero_iter to use a folio iomap: Allow iomap_write_begin() to be called with the full length iomap: Convert iomap_page_mkwrite to use a folio iomap: Convert readahead and readpage to use a folio iomap: Convert iomap_read_inline_data to take a folio iomap: Use folio offsets instead of page offsets iomap: Convert bio completions to use folios iomap: Pass the iomap_page into iomap_set_range_uptodate iomap: Add iomap_invalidate_folio iomap: Convert iomap_releasepage to use a folio ...	2022-01-12 12:51:41 -08:00
Linus Torvalds	d3c8108035	Merge tag 'for-5.17/block-2022-01-11' of git://git.kernel.dk/linux-block Pull block updates from Jens Axboe: - Unify where the struct request handling code is located in the blk-mq code (Christoph) - Header cleanups (Christoph) - Clean up the io_context handling code (Christoph, me) - Get rid of ->rq_disk in struct request (Christoph) - Error handling fix for add_disk() (Christoph) - request allocation cleanusp (Christoph) - Documentation updates (Eric, Matthew) - Remove trivial crypto unregister helper (Eric) - Reduce shared tag overhead (John) - Reduce poll_stats memory overhead (me) - Known indirect function call for dio (me) - Use atomic references for struct request (me) - Support request list issue for block and NVMe (me) - Improve queue dispatch pinning (Ming) - Improve the direct list issue code (Keith) - BFQ improvements (Jan) - Direct completion helper and use it in mmc block (Sebastian) - Use raw spinlock for the blktrace code (Wander) - fsync error handling fix (Ye) - Various fixes and cleanups (Lukas, Randy, Yang, Tetsuo, Ming, me) * tag 'for-5.17/block-2022-01-11' of git://git.kernel.dk/linux-block: (132 commits) MAINTAINERS: add entries for block layer documentation docs: block: remove queue-sysfs.rst docs: sysfs-block: document virt_boundary_mask docs: sysfs-block: document stable_writes docs: sysfs-block: fill in missing documentation from queue-sysfs.rst docs: sysfs-block: add contact for nomerges docs: sysfs-block: sort alphabetically docs: sysfs-block: move to stable directory block: don't protect submit_bio_checks by q_usage_counter block: fix old-style declaration nvme-pci: fix queue_rqs list splitting block: introduce rq_list_move block: introduce rq_list_for_each_safe macro block: move rq_list macros to blk-mq.h block: drop needless assignment in set_task_ioprio() block: remove unnecessary trailing '\' bio.h: fix kernel-doc warnings block: check minor range in device_add_disk() block: use "unsigned long" for blk_validate_block_size(). block: fix error unwinding in device_add_disk ...	2022-01-12 10:26:52 -08:00
Ming Lei	9d497e2941	block: don't protect submit_bio_checks by q_usage_counter Commit `cc9c884dd7` ("block: call submit_bio_checks under q_usage_counter") uses q_usage_counter to protect submit_bio_checks for avoiding IO after disk is deleted by del_gendisk(). Turns out the protection isn't necessary, because once blk_mq_freeze_queue_wait() in del_gendisk() returns: 1) all in-flight IO has been done 2) all new IO will be failed in __bio_queue_enter() because q_usage_counter is dead, and GD_DEAD is set 3) both disk and request queue instance are safe since caller of submit_bio() guarantees that the disk can't be closed. Once submit_bio_checks() needn't the protection of q_usage_counter, we can move submit_bio_checks before calling blk_mq_submit_bio() and ->submit_bio(). With this change, we needn't to throttle queue with holding one allocated request, then precise driver tag or request won't be wasted in throttling. Meantime we can unify the bio check for both bio based and request based driver. Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20220104134223.590803-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-01-09 18:54:52 -07:00
Lukas Bulwahn	669a064625	block: drop needless assignment in set_task_ioprio() Commit `5fc11eebb4` ("block: open code create_task_io_context in set_task_ioprio") introduces a needless assignment 'ioc = task->io_context', as the local variable ioc is not further used before returning. Even after the further fix, commit `a957b61254` ("block: fix error in handling dead task for ioprio setting"), the assignment still remains needless. Drop this needless assignment in set_task_ioprio(). This code smell was identified with 'make clang-analyzer'. Fixes: `5fc11eebb4` ("block: open code create_task_io_context in set_task_ioprio") Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211223125300.20691-1-lukas.bulwahn@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-12-23 07:10:07 -07:00
Alan Stern	6e1fcab00a	scsi: block: pm: Always set request queue runtime active in blk_post_runtime_resume() John Garry reported a deadlock that occurs when trying to access a runtime-suspended SATA device. For obscure reasons, the rescan procedure causes the link to be hard-reset, which disconnects the device. The rescan tries to carry out a runtime resume when accessing the device. scsi_rescan_device() holds the SCSI device lock and won't release it until it can put commands onto the device's block queue. This can't happen until the queue is successfully runtime-resumed or the device is unregistered. But the runtime resume fails because the device is disconnected, and __scsi_remove_device() can't do the unregistration because it can't get the device lock. The best way to resolve this deadlock appears to be to allow the block queue to start running again even after an unsuccessful runtime resume. The idea is that the driver or the SCSI error handler will need to be able to use the queue to resolve the runtime resume failure. This patch removes the err argument to blk_post_runtime_resume() and makes the routine act as though the resume was successful always. This fixes the deadlock. Link: https://lore.kernel.org/r/1639999298-244569-4-git-send-email-chenxiang66@hisilicon.com Fixes: `e27829dc92` ("scsi: serialize ->rescan against ->remove") Reported-and-tested-by: John Garry <john.garry@huawei.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-12-22 23:38:29 -05:00
Tetsuo Handa	e338924bd0	block: check minor range in device_add_disk() ioctl(fd, LOOP_CTL_ADD, 1048576) causes sysfs: cannot create duplicate filename '/dev/block/7:0' message because such request is treated as if ioctl(fd, LOOP_CTL_ADD, 0) due to MINORMASK == 1048575. Verify that all minor numbers for that device fit in the minor range. Reported-by: wangyangbo <wangyangbo@uniontech.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/b1b19379-23ee-5379-0eb5-94bf5f79f1b4@i-love.sakura.ne.jp Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-12-21 09:34:29 -07:00
Christoph Hellwig	99d8690aae	block: fix error unwinding in device_add_disk One device_add is called disk->ev will be freed by disk_release, so we should free it twice. Fix this by allocating disk->ev after device_add so that the extra local unwinding can be removed entirely. Based on an earlier patch from Tetsuo Handa. Reported-by: syzbot <syzbot+28a66a9fbc621c939000@syzkaller.appspotmail.com> Tested-by: syzbot <syzbot+28a66a9fbc621c939000@syzkaller.appspotmail.com> Fixes: `83cbce9574` ("block: add error handling for device_add_disk / add_disk") Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211221161851.788424-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-12-21 09:31:51 -07:00
Ming Lei	37e11c3616	block: call blk_exit_queue() before freeing q->stats blk_stat_disable_accounting() is added in commit `68497092bd` ("block: make queue stat accounting a reference"), and called in kyber_exit_sched(). So we have to free q->stats after elevator is unloaded from blk_exit_queue() in blk_release_queue(). Otherwise kernel panic is caused. Fixes: `68497092bd` ("block: make queue stat accounting a reference") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20211221040436.1333880-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-12-20 21:07:51 -07:00
Jens Axboe	a957b61254	block: fix error in handling dead task for ioprio setting Don't combine the task exiting and "already have io_context" case, we need to just abort if the task is marked as dead. Return -ESRCH, which is the documented value for ioprio_set() if the specified task could not be found. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reported-by: syzbot+8836466a79f4175961b0@syzkaller.appspotmail.com Fixes: `5fc11eebb4` ("block: open code create_task_io_context in set_task_ioprio") Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-12-20 20:32:24 -07:00
Keith Busch	518579a9af	blk-mq: blk-mq: check quiesce state before queue_rqs The low level drivers don't expect to see new requests after a successful quiesce completes. Check the queue quiesce state within the rcu protected area prior to calling the driver's queue_rqs(). Fixes: `3c67d44de7` ("block: add mq_ops->queue_rqs hook") Signed-off-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20211220205919.180191-1-kbusch@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-12-20 14:03:59 -07:00
Linus Torvalds	2da09da4ae	Merge tag 'block-5.16-2021-12-19' of git://git.kernel.dk/linux-block Pull block revert from Jens Axboe: "It turns out that the fix for not hammering on the delayed work timer too much caused a performance regression for BFQ, so let's revert the change for now. I've got some ideas on how to fix it appropriately, but they should wait for 5.17" * tag 'block-5.16-2021-12-19' of git://git.kernel.dk/linux-block: Revert "block: reduce kblockd_mod_delayed_work_on() CPU consumption"	2021-12-19 12:38:53 -08:00
Jens Axboe	87959fa16c	Revert "block: reduce kblockd_mod_delayed_work_on() CPU consumption" This reverts commit `cb2ac2912a`. Alex and the kernel test robot report that this causes a significant performance regression with BFQ. I can reproduce that result, so let's revert this one as we're close to -rc6 and we there's no point in trying to rush a fix. Link: https://lore.kernel.org/linux-block/1639853092.524jxfaem2.none@localhost/ Link: https://lore.kernel.org/lkml/20211219141852.GH14057@xsang-OptiPlex-9020/ Reported-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca> Reported-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-12-19 07:58:44 -07:00
Linus Torvalds	fa09ca5ebc	Merge tag 'block-5.16-2021-12-17' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: - Fix for hammering on the delayed run queue timer (me) - bcache regression fix for this merge window (Lin) - Fix a divide-by-zero in the blk-iocost code (Tejun) * tag 'block-5.16-2021-12-17' of git://git.kernel.dk/linux-block: bcache: fix NULL pointer reference in cached_dev_detach_finish block: reduce kblockd_mod_delayed_work_on() CPU consumption iocost: Fix divide-by-zero on donation from low hweight cgroup	2021-12-17 11:46:07 -08:00
Matthew Wilcox (Oracle)	85f5a74c2b	block: Add bio_add_folio() This is a thin wrapper around bio_add_page(). The main advantage here is the documentation that folios larger than 2GiB are not supported. It's not currently possible to allocate folios that large, but if it ever becomes possible, this function will fail gracefully instead of doing I/O to the wrong bytes. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org>	2021-12-16 15:49:51 -05:00
Christoph Hellwig	5ef1630586	block: only build the icq tracking code when needed Only bfq needs to code to track icq, so make it conditional. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211209063131.18537-12-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-12-16 10:59:02 -07:00
Christoph Hellwig	90b627f542	block: fold create_task_io_context into ioc_find_get_icq Fold create_task_io_context into the only remaining caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211209063131.18537-11-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-12-16 10:59:02 -07:00
Christoph Hellwig	5fc11eebb4	block: open code create_task_io_context in set_task_ioprio The flow in set_task_ioprio can be simplified by simply open coding create_task_io_context, which removes a refcount roundtrip on the I/O context. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211209063131.18537-10-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-12-16 10:59:02 -07:00
Christoph Hellwig	8472161b77	block: fold get_task_io_context into set_task_ioprio Fold get_task_io_context into its only caller, and simplify the code as no reference to the I/O context is required to just set the ioprio field. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211209063131.18537-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-12-16 10:59:02 -07:00
Christoph Hellwig	a411cd3cfd	block: move set_task_ioprio to blk-ioc.c Keep set_task_ioprio with the other low-level code that accesses the io_context structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211209063131.18537-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-12-16 10:59:01 -07:00
Christoph Hellwig	091abcb3ef	block: cleanup ioc_clear_queue Fold __ioc_clear_queue into ioc_clear_queue and switch to always use plain _irq locking instead of the more expensive _irqsave that is not needed here. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211209063131.18537-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-12-16 10:59:01 -07:00
Christoph Hellwig	edf70ff5a1	block: refactor put_io_context Move the code to delay freeing the icqs into a separate helper. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211209063131.18537-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-12-16 10:59:01 -07:00
Christoph Hellwig	8a20c0c7e0	block: remove the NULL ioc check in put_io_context No caller passes in a NULL pointer, so remove the check. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211209063131.18537-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-12-16 10:59:01 -07:00
Christoph Hellwig	4be8a2eaff	block: refactor put_iocontext_active Factor out a ioc_exit_icqs helper to tear down the icqs and the fold the rest of put_iocontext_active into exit_io_context. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211209063131.18537-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-12-16 10:59:01 -07:00
Christoph Hellwig	0aed2f162b	block: simplify struct io_context refcounting Don't hold a reference to ->refcount for each active reference, but just one for all active references. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211209063131.18537-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-12-16 10:59:01 -07:00
Christoph Hellwig	8a2ba1785c	block: remove the nr_task field from struct io_context Nothing ever looks at ->nr_tasks, so remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211209063131.18537-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-12-16 10:59:01 -07:00
Jens Axboe	3c67d44de7	block: add mq_ops->queue_rqs hook If we have a list of requests in our plug list, send it to the driver in one go, if possible. The driver must set mq_ops->queue_rqs() to support this, if not the usual one-by-one path is used. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-12-16 08:49:17 -07:00
Jens Axboe	fcade2ce06	block: use singly linked list for bio cache Pointless to maintain a head/tail for the list, as we never need to access the tail. Entries are always LIFO for cache hotness reasons. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-12-16 08:43:09 -07:00
Jens Axboe	5581a5ddfe	block: add completion handler for fast path The batched completions only deal with non-partial requests anyway, and it doesn't deal with any requests that have errors. Add a completion handler that assumes it's a full request and that it's all being ended successfully. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-12-16 08:42:06 -07:00

1 2 3 4 5 ...

6048 Commits