linux/block
Paolo Valente 8cacc5ab3e block, bfq: do not merge queues on flash storage with queueing
To boost throughput with a set of processes doing interleaved I/O
(i.e., a set of processes whose individual I/O is random, but whose
merged cumulative I/O is sequential), BFQ merges the queues associated
with these processes, i.e., redirects the I/O of these processes into a
common, shared queue. In the shared queue, I/O requests are ordered by
their position on the medium, thus sequential I/O gets dispatched to
the device when the shared queue is served.

Queue merging costs execution time, because, to detect which queues to
merge, BFQ must maintain a list of the head I/O requests of active
queues, ordered by request positions. Measurements showed that this
costs about 10% of BFQ's total per-request processing time.

Request processing time becomes more and more critical as the speed of
the underlying storage device grows. Yet, fortunately, queue merging
is basically useless on the very devices that are so fast to make
request processing time critical. To reach a high throughput, these
devices must have many requests queued at the same time. But, in this
configuration, the internal scheduling algorithms of these devices do
also the job of queue merging: they reorder requests so as to obtain
as much as possible a sequential I/O pattern. As a consequence, with
processes doing interleaved I/O, the throughput reached by one such
device is likely to be the same, with and without queue merging.

In view of this fact, this commit disables queue merging, and all
related housekeeping, for non-rotational devices with internal
queueing. The total, single-lock-protected, per-request processing
time of BFQ drops to, e.g., 1.9 us on an Intel Core i7-2760QM@2.40GHz
(time measured with simple code instrumentation, and using the
throughput-sync.sh script of the S suite [1], in performance-profiling
mode). To put this result into context, the total,
single-lock-protected, per-request execution time of the lightest I/O
scheduler available in blk-mq, mq-deadline, is 0.7 us (mq-deadline is
~800 LOC, against ~10500 LOC for BFQ).

Disabling merging provides a further, remarkable benefit in terms of
throughput. Merging tends to make many workloads artificially more
uneven, mainly because of shared queues remaining non empty for
incomparably more time than normal queues. So, if, e.g., one of the
queues in a set of merged queues has a higher weight than a normal
queue, then the shared queue may inherit such a high weight and, by
staying almost always active, may force BFQ to perform I/O plugging
most of the time. This evidently makes it harder for BFQ to let the
device reach a high throughput.

As a practical example of this problem, and of the benefits of this
commit, we measured again the throughput in the nasty scenario
considered in previous commit messages: dbench test (in the Phoronix
suite), with 6 clients, on a filesystem with journaling, and with the
journaling daemon enjoying a higher weight than normal processes. With
this commit, the throughput grows from ~150 MB/s to ~200 MB/s on a
PLEXTOR PX-256M5 SSD. This is the same peak throughput reached by any
of the other I/O schedulers. As such, this is also likely to be the
maximum possible throughput reachable with this workload on this
device, because I/O is mostly random, and the other schedulers
basically just pass I/O requests to the drive as fast as possible.

[1] https://github.com/Algodev-github/S

Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Francesco Pollicino <fra.fra.800@gmail.com>
Signed-off-by: Alessio Masola <alessio.masola@gmail.com>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-04-01 08:15:40 -06:00
..
partitions partitions/aix: append null character to print data from disk 2018-07-27 09:17:41 -06:00
badblocks.c badblocks: fix wrong return value in badblocks_set if badblocks are disabled 2017-11-03 11:29:50 -07:00
bfq-cgroup.c block, bfq: do not merge queues on flash storage with queueing 2019-04-01 08:15:40 -06:00
bfq-iosched.c block, bfq: do not merge queues on flash storage with queueing 2019-04-01 08:15:40 -06:00
bfq-iosched.h block, bfq: do not merge queues on flash storage with queueing 2019-04-01 08:15:40 -06:00
bfq-wf2q.c block, bfq: do not idle for lowest-weight queues 2019-04-01 08:15:39 -06:00
bio-integrity.c block: remove the bio_integrity_advance export 2018-12-16 08:33:57 -07:00
bio.c block: add BIO_NO_PAGE_REF flag 2019-03-18 10:44:48 -06:00
blk-cgroup.c blkcg: Fix kernel-doc warnings 2019-03-20 14:39:09 -06:00
blk-core.c mm: refactor readahead defines in mm.h 2019-03-12 10:04:01 -07:00
blk-exec.c block: remove dead elevator code 2018-11-07 13:42:32 -07:00
blk-flush.c blk-mq: use blk_mq_put_driver_tag() to put tag 2019-03-24 10:26:16 -06:00
blk-integrity.c block: merge BIOVEC_SEG_BOUNDARY into biovec_phys_mergeable 2018-09-24 12:33:57 -06:00
blk-ioc.c block: remove the queue_lock indirection 2018-11-15 12:17:28 -07:00
blk-iolatency.c blk-iolatency: #include "blk.h" 2019-03-20 14:19:38 -06:00
blk-lib.c block: fix 32 bit overflow in __blkdev_issue_discard() 2018-11-14 08:17:18 -07:00
blk-map.c Merge branch 'for-4.16/block' of git://git.kernel.dk/linux-block 2018-01-29 11:51:49 -08:00
blk-merge.c block: fix segment calculation for passthrough IO 2019-03-06 09:42:54 -07:00
blk-mq-cpumap.c blk-mq: initial support for multiple queue maps 2018-11-07 13:45:00 -07:00
blk-mq-debugfs-zoned.c block: Cleanup license notice 2019-01-17 21:21:40 -07:00
blk-mq-debugfs.c SCSI misc on 20190306 2019-03-09 16:53:47 -08:00
blk-mq-debugfs.h blk-mq-debugfs: support rq_qos 2018-12-16 19:53:47 -07:00
blk-mq-pci.c blk-mq: initial support for multiple queue maps 2018-11-07 13:45:00 -07:00
blk-mq-rdma.c blk-mq-rdma: pass in queue map to blk_mq_rdma_map_queues 2018-12-13 09:59:08 +01:00
blk-mq-sched.c blk-mq: save queue mapping result into ctx directly 2019-02-01 08:33:04 -07:00
blk-mq-sched.h block: mq-deadline: Fix write completion handling 2018-12-17 11:19:39 -07:00
blk-mq-sysfs.c blk-mq: export hctx->type in debugfs instead of sysfs 2018-12-17 05:44:45 -07:00
blk-mq-tag.c blk-mq: save queue mapping result into ctx directly 2019-02-01 08:33:04 -07:00
blk-mq-tag.h Merge branch 'for-4.15/block' of git://git.kernel.dk/linux-block 2017-11-14 15:32:19 -08:00
blk-mq-virtio.c blk-mq: initial support for multiple queue maps 2018-11-07 13:45:00 -07:00
blk-mq.c blk-mq: fix sbitmap ws_active for shared tags 2019-03-25 13:05:47 -06:00
blk-mq.h blk-mq: use blk_mq_put_driver_tag() to put tag 2019-03-24 10:26:16 -06:00
blk-pm.c block: remove the queue_lock indirection 2018-11-15 12:17:28 -07:00
blk-pm.h block: remove the queue_lock indirection 2018-11-15 12:17:28 -07:00
blk-rq-qos.c blk-mq-debugfs: support rq_qos 2018-12-16 19:53:47 -07:00
blk-rq-qos.h block: fix blk-iolatency accounting underflow 2018-12-17 11:19:54 -07:00
blk-settings.c block: kill QUEUE_FLAG_FLUSH_NQ 2019-02-09 15:40:24 -07:00
blk-softirq.c block: remove a few unused exports 2018-11-15 12:13:25 -07:00
blk-stat.c block: remove a few unused exports 2018-11-15 12:13:25 -07:00
blk-stat.h block: deactivate blk_stat timer in wbt_disable_default() 2018-12-12 06:47:51 -07:00
blk-sysfs.c block: add BLK_MQ_POLL_CLASSIC for hybrid poll and return EINVAL for unexpected value 2019-03-20 14:02:07 -06:00
blk-throttle.c blkcg: consolidate bio_issue_init() to be a part of core 2018-12-07 22:26:37 -07:00
blk-timeout.c block: don't hold the queue_lock over blk_abort_request 2018-11-15 12:13:18 -07:00
blk-wbt.c blk-wbt: Declare local functions static 2019-01-24 11:09:21 -07:00
blk-wbt.h block: remove external dependency on wbt_flags 2018-07-09 09:07:54 -06:00
blk-zoned.c for-4.21/block-20181221 2018-12-28 13:19:59 -08:00
blk.h blk-mq: save queue mapping result into ctx directly 2019-02-01 08:33:04 -07:00
bounce.c block: bounce: make sure that bvec table is updated 2019-02-21 10:58:44 -07:00
bsg-lib.c scsi: bsg-lib: handle bidi requests without block layer help 2019-02-05 21:27:40 -05:00
bsg.c scsi: bsg-lib: handle bidi requests without block layer help 2019-02-05 21:27:40 -05:00
cmdline-parser.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
compat_ioctl.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
elevator.c block: avoid setting none scheduler if it's already none 2019-02-11 08:21:40 -07:00
genhd.c block: Replace function name in string with __func__ 2019-02-28 14:09:08 -07:00
ioctl.c block: Introduce BLKGETNRZONES ioctl 2018-10-25 11:17:40 -06:00
ioprio.c block: add ioprio_check_cap function 2018-05-31 10:50:54 -04:00
Kconfig Kconfig updates for v4.21 2018-12-29 13:03:29 -08:00
Kconfig.iosched block: remove legacy IO schedulers 2018-11-07 13:42:32 -07:00
kyber-iosched.c kyber: use sbitmap add_wait_queue/list_del wait helpers 2018-12-20 12:17:21 -07:00
Makefile block: remove legacy IO schedulers 2018-11-07 13:42:32 -07:00
mq-deadline.c block: mq-deadline: Fix write completion handling 2018-12-17 11:19:39 -07:00
opal_proto.h block: sed-opal: Set MBRDone on S3 resume path if TPER is MBREnabled 2017-09-11 09:45:52 -06:00
partition-generic.c block: return just one value from part_in_flight 2018-12-10 08:30:38 -07:00
scsi_ioctl.c block: consistently use GFP_NOIO instead of __GFP_NORECLAIM 2018-05-14 08:55:18 -06:00
sed-opal.c block: sed-opal: Fix a couple off by one bugs 2018-06-20 12:04:06 -06:00
t10-pi.c block: move dif_prepare/dif_complete functions to block layer 2018-07-30 08:27:02 -06:00