linux

iv/linux

Author	SHA1	Message	Date
Eric Biggers	eebcafaebb	blk-crypto-fallback: properly prefix function and struct names For clarity, avoid using just the "blk_crypto_" prefix for functions and structs that are specific to blk-crypto-fallback. Instead, use "blk_crypto_fallback_". Some places already did this, but others didn't. This is also a prerequisite for using "struct blk_crypto_keyslot" to mean a generic blk-crypto keyslot (which is what it sounds like). Rename the fallback one to "struct blk_crypto_fallback_keyslot". No change in behavior. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20211018180453.40441-2-ebiggers@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-21 10:49:32 -06:00
Xie Yongji	f059a1d2e2	block: Add invalidate_disk() helper to invalidate the gendisk To hide internal implementation and simplify some driver code, this adds a helper to invalidate the gendisk. It will clean the gendisk's associated buffer/page caches and reset its internal states. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210922123711.187-2-xieyongji@bytedance.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-21 10:12:41 -06:00
Pavel Begunkov	e94f68527a	block: kill extra rcu lock/unlock in queue enter blk_try_enter_queue() already takes rcu_read_lock/unlock, so we can avoid the second pair in percpu_ref_tryget_live(), use a newly added percpu_ref_tryget_live_rcu(). As rcu_read_lock/unlock imply barrier()s, it's pretty noticeable, especially for for !CONFIG_PREEMPT_RCU (default for some distributions), where __rcu_read_lock/unlock() are not inlined. 3.20% io_uring [kernel.vmlinux] [k] __rcu_read_unlock 3.05% io_uring [kernel.vmlinux] [k] __rcu_read_lock 2.52% io_uring [kernel.vmlinux] [k] __rcu_read_unlock 2.28% io_uring [kernel.vmlinux] [k] __rcu_read_lock Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/6b11c67ea495ed9d44f067622d852de4a510ce65.1634822969.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-21 08:37:26 -06:00
Pavel Begunkov	6549a874fb	block: convert fops.c magic constants to SHIFT_SECTOR Don't use shifting by a magic number 9 but replace with a more descriptive SHIFT_SECTOR. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/068782b9f7e97569fb59a99529b23bb17ea4c5e2.1634755800.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-21 08:27:17 -06:00
Pavel Begunkov	179ae84f7e	block: clean up blk_mq_submit_bio() merging Combine blk_mq_sched_bio_merge() and blk_attempt_plug_merge() under a common if, so we don't check it twice. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/daedc90d4029a5d1d73344771632b1faca3aaf81.1634755800.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-21 08:27:17 -06:00
Pavel Begunkov	6450fe1f66	block: optimise boundary blkdev_read_iter's checks Combine pos and len checks and mark unlikely. Also, don't reexpand if it's not truncated. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/fff34e613aeaae1ad12977dc4592cb1a1f5d3190.1634755800.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-21 08:27:17 -06:00
Jackie Liu	057178cf51	fs: bdev: fix conflicting comment from lookup_bdev We switched to directly use dev_t to get block device, lookup changed the meaning of use, now we fix this conflicting comment. Fixes: 4e7b5671c6a8 ("block: remove i_bdev") Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@lst.de> Cc: linux-fsdevel@vger.kernel.org Signed-off-by: Jackie Liu <liuyun01@kylinos.cn> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211021071344.1600362-1-liu.yun@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-21 08:24:14 -06:00
John Garry	0994c64eb4	blk-mq: Fix blk_mq_tagset_busy_iter() for shared tags Since it is now possible for a tagset to share a single set of tags, the iter function should not re-iter the tags for the count of #hw queues in that case. Rather it should just iter once. Fixes: e155b0c238b2 ("blk-mq: Use shared tags for shared sbitmap support") Reported-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: John Garry <john.garry@huawei.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Tested-by: Kashyap Desai <kashyap.desai@broadcom.com> Link: https://lore.kernel.org/r/1634550083-202815-1-git-send-email-john.garry@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-21 08:21:52 -06:00
Christoph Hellwig	008f75a20e	block: cleanup the flush plug helpers Consolidate the various helpers into a single blk_flush_plug helper that takes a plk_plug and the from_scheduler bool and switch all callsites to call it directly. Checks that the plug is non-NULL must be performed by the caller, something that most already do anyway. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211020144119.142582-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-20 09:56:11 -06:00
Pavel Begunkov	b600455d84	block: optimise blk_flush_plug_list Don't call flush_plug_callbacks if there are no plug callbacks. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> [hch: split from a larger patch] Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211020144119.142582-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-20 09:56:11 -06:00
Christoph Hellwig	dbb6f764a0	blk-mq: move blk_mq_flush_plug_list to block/blk-mq.h This helper is internal to the block layer. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211020144119.142582-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-20 09:56:11 -06:00
Christoph Hellwig	a214b949d8	blk-mq: only flush requests from the plug in blk_mq_submit_bio Replace the call to blk_flush_plug_list in blk_mq_submit_bio with a direct call to blk_mq_flush_plug_list. This means we do not flush plug callback from stackable devices, which doesn't really help with the accumulated requests anyway, and it also means the cached requests aren't freed here as they can still be used later on. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211020144119.142582-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-20 09:56:11 -06:00
Jens Axboe	037057a5a9	block: remove inaccurate requeue check This check is meant to catch cases where a requeue is attempted on a request that is still inserted. It's never really been useful to catch any misuse, and now it's actively wrong. Outside of that, this should not be a BUG_ON() to begin with. Remove the check as it's now causing active harm, as requeue off the plug path will trigger it even though the request state is just fine. Reported-by: Yi Zhang <yi.zhang@redhat.com> Link: https://lore.kernel.org/linux-block/CAHj4cs80zAUc2grnCZ015-2Rvd-=gXRfB_dFKy=RTm+wRo09HQ@mail.gmail.com/ Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-20 08:21:40 -06:00
Pavel Begunkov	c809084ab0	block: inline a part of bio_release_pages() Inline BIO_NO_PAGE_REF check of bio_release_pages() to avoid function call. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-20 08:08:07 -06:00
Pavel Begunkov	1497a51a32	block: don't bloat enter_queue with percpu_ref percpu_ref_put() are inlined for performance and bloat the binary, we don't care about the fail case of blk_try_enter_queue(), so we can replace it with a call to blk_queue_exit(). Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-20 08:08:06 -06:00
Pavel Begunkov	478eb72b81	block: optimise req_bio_endio() First, get rid of an extra branch and chain error checks. Also reshuffle it with bio_advance(), so it goes closer to the final check, with that the compiler loads rq->rq_flags only once, and also doesn't reload bio->bi_iter.bi_size if bio_advance() didn't actually advanced the iter. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-20 08:08:05 -06:00
Pavel Begunkov	859897c3fb	block: convert leftovers to bdev_get_queue Convert bdev->bd_disk->queue to bdev_get_queue(), which is faster. Apparently, there are a few such spots in block that got lost during rebases. Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-20 08:08:03 -06:00
Ming Lei	e70feb8b3e	blk-mq: support concurrent queue quiesce/unquiesce blk_mq_quiesce_queue() has been used a bit wide now, so far we don't support concurrent/nested quiesce. One biggest issue is that unquiesce can happen unexpectedly in case that quiesce/unquiesce are run concurrently from more than one context. This patch introduces q->mq_quiesce_depth to deal concurrent quiesce, and we only unquiesce queue when it is the last/outer-most one of all contexts. Several kernel panic issue has been reported[1][2][3] when running stress quiesce test. And this patch has been verified in these reports. [1] https://lore.kernel.org/linux-block/9b21c797-e505-3821-4f5b-df7bf9380328@huawei.com/T/#m1fc52431fad7f33b1ffc3f12c4450e4238540787 [2] https://lore.kernel.org/linux-block/9b21c797-e505-3821-4f5b-df7bf9380328@huawei.com/T/#m10ad90afeb9c8cc318334190a7c24c8b5c5e0722 [3] https://listman.redhat.com/archives/dm-devel/2021-September/msg00189.html Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211014081710.1871747-7-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-19 18:27:58 -06:00
Zheng Liang	2fc428f6b7	block, bfq: fix UAF problem in bfqg_stats_init() In bfq_pd_alloc(), the function bfqg_stats_init() init bfqg. If blkg_rwstat_init() init bfqg_stats->bytes successful and init bfqg_stats->ios failed, bfqg_stats_init() return failed, bfqg will be freed. But blkg_rwstat->cpu_cnt is not deleted from the list of percpu_counters. If we traverse the list of percpu_counters, It will have UAF problem. we should use blkg_rwstat_exit() to cleanup bfqg_stats bytes in the above scenario. Fixes: commit fd41e60331b ("bfq-iosched: stop using blkg->stat_bytes and ->stat_ios") Signed-off-by: Zheng Liang <zhengliang6@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20211018024225.1493938-1-zhengliang6@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-19 15:18:30 -06:00
Jens Axboe	a808a9d545	block: inline fast path of driver tag allocation If we don't use an IO scheduler or have shared tags, then we don't need to call into this external function at all. This saves ~2% for such a setup. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-19 15:18:25 -06:00
Christoph Hellwig	d92ca9d834	blk-mq: don't handle non-flush requests in blk_insert_flush Return to the normal blk_mq_submit_bio flow if the bio did not end up actually being a flush because the device didn't support it. Note that this is basically impossible to hit without special instrumentation given that submit_bio_checks already clears these flags usually, so we'd need a tight race to actually hit this code path. With this the call to blk_mq_run_hw_queue for the flush requests can be removed given that the actual flush requests are always issued via the requeue workqueue which runs the queue unconditionally. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211019122553.2467817-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-19 11:10:09 -06:00
Jens Axboe	dc5fc361d8	block: attempt direct issue of plug list If we have just one queue type in the plug list, then we can extend our direct issue to cover a full plug list as well. This allows sending a batch of requests for direct issue, which is more efficient than doing one-at-a-time kind of issue. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-19 09:22:04 -06:00
Jens Axboe	bc490f8173	block: change plugging to use a singly linked list Use a singly linked list for the blk_plug. This saves 8 bytes in the blk_plug struct, and makes for faster list manipulations than doubly linked lists. As we don't use the doubly linked lists for anything, singly linked is just fine. This yields a bump in default (merging enabled) performance from 7.0 to 7.1M IOPS, and ~7.5M IOPS with merging disabled. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-19 09:21:42 -06:00
Christoph Hellwig	97eeb5fc14	partitions/ibm: use bdev_nr_sectors instead of open coding it Use the proper helper to read the block device size and switch various places to pass the size in terms of sectors which is more practical. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211019062024.2171074-4-hch@lst.de [axboe: fix comment typo] Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-19 06:17:33 -06:00
Christoph Hellwig	f9831b8857	partitions/efi: use bdev_nr_bytes instead of open coding it Use the proper helper to read the block device size. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211019062024.2171074-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-19 06:16:50 -06:00
Christoph Hellwig	946e993730	block/ioctl: use bdev_nr_sectors and bdev_nr_bytes Use the proper helper to read the block device size. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211019062024.2171074-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-19 06:16:50 -06:00
Andrea Righi	480d42dc00	blk-wbt: prevent NULL pointer dereference in wb_timer_fn The timer callback used to evaluate if the latency is exceeded can be executed after the corresponding disk has been released, causing the following NULL pointer dereference: [ 119.987108] BUG: kernel NULL pointer dereference, address: 0000000000000098 [ 119.987617] #PF: supervisor read access in kernel mode [ 119.987971] #PF: error_code(0x0000) - not-present page [ 119.988325] PGD 7c4a4067 P4D 7c4a4067 PUD 7bf63067 PMD 0 [ 119.988697] Oops: 0000 [#1] SMP NOPTI [ 119.988959] CPU: 1 PID: 9353 Comm: cloud-init Not tainted 5.15-rc5+arighi #rc5+arighi [ 119.989520] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014 [ 119.990055] RIP: 0010:wb_timer_fn+0x44/0x3c0 [ 119.990376] Code: 41 8b 9c 24 98 00 00 00 41 8b 94 24 b8 00 00 00 41 8b 84 24 d8 00 00 00 4d 8b 74 24 28 01 d3 01 c3 49 8b 44 24 60 48 8b 40 78 <4c> 8b b8 98 00 00 00 4d 85 f6 0f 84 c4 00 00 00 49 83 7c 24 30 00 [ 119.991578] RSP: 0000:ffffb5f580957da8 EFLAGS: 00010246 [ 119.991937] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000004 [ 119.992412] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88f476d7f780 [ 119.992895] RBP: ffffb5f580957dd0 R08: 0000000000000000 R09: 0000000000000000 [ 119.993371] R10: 0000000000000004 R11: 0000000000000002 R12: ffff88f476c84500 [ 119.993847] R13: ffff88f4434390c0 R14: 0000000000000000 R15: ffff88f4bdc98c00 [ 119.994323] FS: 00007fb90bcd9c00(0000) GS:ffff88f4bdc80000(0000) knlGS:0000000000000000 [ 119.994952] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 119.995380] CR2: 0000000000000098 CR3: 000000007c0d6000 CR4: 00000000000006e0 [ 119.995906] Call Trace: [ 119.996130] ? blk_stat_free_callback_rcu+0x30/0x30 [ 119.996505] blk_stat_timer_fn+0x138/0x140 [ 119.996830] call_timer_fn+0x2b/0x100 [ 119.997136] __run_timers.part.0+0x1d1/0x240 [ 119.997470] ? kvm_clock_get_cycles+0x11/0x20 [ 119.997826] ? ktime_get+0x3e/0xa0 [ 119.998110] ? native_apic_msr_write+0x2c/0x30 [ 119.998456] ? lapic_next_event+0x20/0x30 [ 119.998779] ? clockevents_program_event+0x94/0xf0 [ 119.999150] run_timer_softirq+0x2a/0x50 [ 119.999465] __do_softirq+0xcb/0x26f [ 119.999764] irq_exit_rcu+0x8c/0xb0 [ 120.000057] sysvec_apic_timer_interrupt+0x43/0x90 [ 120.000429] ? asm_sysvec_apic_timer_interrupt+0xa/0x20 [ 120.000836] asm_sysvec_apic_timer_interrupt+0x12/0x20 In this case simply return from the timer callback (no action required) to prevent the NULL pointer dereference. BugLink: https://bugs.launchpad.net/bugs/1947557 Link: https://lore.kernel.org/linux-mm/YWRNVTk9N8K0RMst@arighi-desktop/ Fixes: 34dbad5d26e2 ("blk-stat: convert to callback-based statistics reporting") Signed-off-by: Andrea Righi <andrea.righi@canonical.com> Link: https://lore.kernel.org/r/YW6N2qXpBU3oc50q@arighi-desktop Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-19 06:13:41 -06:00
Jens Axboe	6155631a0c	block: align blkdev_dio inlined bio to a cacheline We get all sorts of unreliable and funky results since the bio is designed to align on a cacheline, which it does not when inlined like this. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-19 05:55:52 -06:00
Jens Axboe	e028f167ec	block: move blk_mq_tag_to_rq() inline This is in the fast path of driver issue or completion, and it's a single array index operation. Move it inline to avoid a function call for it. This does mean making struct blk_mq_tags block layer public, but there's not really much in there. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-19 05:55:41 -06:00
Jens Axboe	df87eb0fce	block: get rid of plug list sorting Even if we have multiple queues in the plug list, chances that they are very interspersed is minimal. Don't bother spending CPU cycles sorting the list. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-19 05:55:26 -06:00
Jens Axboe	87c037d11b	block: return whether or not to unplug through boolean Instead of returning the same queue request through a request pointer, use a boolean to accomplish the same. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-19 05:55:04 -06:00
Christoph Hellwig	8a7d267b4a	block: don't call blk_status_to_errno in blk_update_request We only need to call it to resolve the blk_status_t -> errno mapping for tracing, so move the conversion into the tracepoints that are not called at all when tracing isn't enabled. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-19 05:54:57 -06:00
Jens Axboe	db9a02baa2	block: move bdev_read_only() into the header This is called for every write in the fast path, move it inline next to get_disk_ro() which is called internally. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-19 05:53:22 -06:00
Jens Axboe	e0d78afeb8	block: fix too broad elevator check in blk_mq_free_request() We added RQF_ELV to tell whether there's an IO scheduler attached, and RQF_ELVPRIV tells us whether there's an IO scheduler with private data attached. Don't check RQF_ELV in blk_mq_free_request(), what we care about here is just if we have scheduler private data attached. This fixes a boot crash Fixes: 2ff0682da6e0 ("block: store elevator state in request") Reported-by: Yi Zhang <yi.zhang@redhat.com> Reported-by: syzbot+eb8104072aeab6cc1195@syzkaller.appspotmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-19 05:48:15 -06:00
Jens Axboe	f09313c57a	block: cache inode size in bdev Reading the inode size brings in a new cacheline for IO submit, and it's in the hot path being checked for every single IO. When doing millions of IOs per core per second, this is noticeable overhead. Cache the nr_sectors in the bdev itself. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-18 14:43:23 -06:00
Christoph Hellwig	2a93ad8fcb	block: use bdev_nr_bytes instead of open coding it in blkdev_fallocate Use the proper helper to read the block device size. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20211018101130.1838532-25-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-18 14:43:23 -06:00
Jens Axboe	f794f3351f	block: add support for blk_mq_end_request_batch() Instead of calling blk_mq_end_request() on a single request, add a helper that takes the new struct io_comp_batch and completes any request stored in there. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-18 14:40:43 -06:00
Jens Axboe	5a72e899ce	block: add a struct io_comp_batch argument to fops->iopoll() struct io_comp_batch contains a list head and a completion handler, which will allow completions to more effciently completed batches of IO. For now, no functional changes in this patch, we just define the io_comp_batch structure and add the argument to the file_operations iopoll handler. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-18 14:40:40 -06:00
Jens Axboe	013a7f9543	block: provide helpers for rq_list manipulation Instead of open-coding the list additions, traversal, and removal, provide a basic set of helpers. Suggested-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-18 14:40:39 -06:00
Jens Axboe	afd7de03c5	block: remove some blk_mq_hw_ctx debugfs entries Just like the blk_mq_ctx counterparts, we've got a bunch of counters in here that are only for debugfs and are of questionnable value. They are: - dispatched, index of how many requests were dispatched in one go - poll_{considered,invoked,success}, which track poll sucess rates. We're confident in the iopoll implementation at this point, don't bother tracking these. As a bonus, this shrinks each hardware queue from 576 bytes to 512 bytes, dropping a whole cacheline. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-18 14:40:33 -06:00
Jens Axboe	9a14d6ce41	block: remove debugfs blk_mq_ctx dispatched/merged/completed attributes These were added as part of early days debugging for blk-mq, and they are not really useful anymore. Rather than spend cycles updating them, just get rid of them. As a bonus, this shrinks the per-cpu software queue size from 256b to 192b. That's a whole cacheline less. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-18 14:39:48 -06:00
Pavel Begunkov	128459062b	block: cache rq_flags inside blk_mq_rq_ctx_init() Add a local variable for rq_flags, it helps to compile out some of rq_flags reloads. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-18 14:38:45 -06:00
Pavel Begunkov	605f784e4f	block: blk_mq_rq_ctx_init cache ctx/q/hctx We should have enough of registers in blk_mq_rq_ctx_init(), store them in local vars, so we don't keep reloading them. note: keeping q->elevator may look unnecessary, but it's also used inside inlined blk_mq_tags_from_data(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-18 14:38:43 -06:00
Pavel Begunkov	4f266f2be8	block: skip elevator fields init for non-elv queue Don't init rq->hash and rq->rb_node in blk_mq_rq_ctx_init() if there is no elevator. Also, move some other initialisers that imply barriers to the end, so the compiler is free to rearrange and optimise other the rest of them. note: fold in a change from Jens leaving queue_list unconditional, as it might lead to problems otherwise. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-18 14:38:42 -06:00
Zqiang	9fbfabfda2	block: fix incorrect references to disk objects When adding partitions to the disk, the reference count of the disk object is increased. then alloc partition device and called device_add(), if the device_add() return error, the reference count of the disk object will be reduced twice, at put_device(pdev) and put_disk(disk). this leads to the end of the object's life cycle prematurely, and trigger following calltrace. __init_work+0x2d/0x50 kernel/workqueue.c:519 synchronize_rcu_expedited+0x3af/0x650 kernel/rcu/tree_exp.h:847 bdi_remove_from_list mm/backing-dev.c:938 [inline] bdi_unregister+0x17f/0x5c0 mm/backing-dev.c:946 release_bdi+0xa1/0xc0 mm/backing-dev.c:968 kref_put include/linux/kref.h:65 [inline] bdi_put+0x72/0xa0 mm/backing-dev.c:976 bdev_free_inode+0x11e/0x220 block/bdev.c:408 i_callback+0x3f/0x70 fs/inode.c:226 rcu_do_batch kernel/rcu/tree.c:2508 [inline] rcu_core+0x76d/0x16c0 kernel/rcu/tree.c:2743 __do_softirq+0x1d7/0x93b kernel/softirq.c:558 invoke_softirq kernel/softirq.c:432 [inline] __irq_exit_rcu kernel/softirq.c:636 [inline] irq_exit_rcu+0xf2/0x130 kernel/softirq.c:648 sysvec_apic_timer_interrupt+0x93/0xc0 making disk is NULL when calling put_disk(). Reported-by: Hao Sun <sunhao.th@gmail.com> Signed-off-by: Zqiang <qiang.zhang1211@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211018103422.2043-1-qiang.zhang1211@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-18 11:20:38 -06:00
Jens Axboe	2ff0682da6	block: store elevator state in request Add an rq private RQF_ELV flag, which tells the block layer that this request was initialized on a queue that has an IO scheduler attached. This allows for faster checking in the fast path, rather than having to deference rq->q later on. Elevator switching does full quiesce of the queue before detaching an IO scheduler, so it's safe to cache this in the request itself. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-18 08:51:52 -06:00
Jens Axboe	90b8faa0e8	block: only mark bio as tracked if it really is tracked We set BIO_TRACKED unconditionally when rq_qos_throttle() is called, even though we may not even have an rq_qos handler. Only mark it as TRACKED if it really is potentially tracked. This saves considerable time for the case where the bio isn't tracked: 2.64% -1.65% [kernel.vmlinux] [k] bio_endio Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-18 08:50:47 -06:00
Jens Axboe	9be3e06fb7	block: move update request helpers into blk-mq.c For some reason we still have them in blk-core, with the rest of the request completion being in blk-mq. That causes and out-of-line call for each completion. Move them into blk-mq.c instead, where they belong. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-18 08:50:28 -06:00
Jens Axboe	c477b79778	block: remove useless caller argument to print_req_error() We have exactly one caller of this, just get rid of adding the useless function name to the output. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-18 08:50:22 -06:00
Jens Axboe	d4aa57a1ca	block: don't bother iter advancing a fully done bio If we're completing nbytes and nbytes is the size of the bio, don't bother with calling into the iterator increment helpers. Just clear the bio size and we're done. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-18 08:50:16 -06:00

... 3 4 5 6 7 ...

6037 Commits