Commit Graph

796793 Commits

Author SHA1 Message Date
YueHaibing
e41128cfd9 block: remove set but not used variable 'et'
Fixes gcc '-Wunused-but-set-variable' warning:

block/blk-ioc.c: In function 'put_io_context_active':
block/blk-ioc.c:174:24: warning:
 variable 'et' set but not used [-Wunused-but-set-variable]

It not used any more after commit
a1ce35fa49 ("block: remove dead elevator code")

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-09 20:08:00 -07:00
Christoph Hellwig
0e17e06cbf block: remove the BLKPREP_* values.
Unused now.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-09 19:17:14 -07:00
Christoph Hellwig
4c1cb67c03 scsi: return blk_status_t from device handler ->prep_fn
Remove the last use of the old BLKPREP_* values, which get converted
to BLK_STS_* later anyway.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-09 19:17:14 -07:00
Christoph Hellwig
159b2cbf59 scsi: return blk_status_t from scsi_init_io and ->init_command
Replace the old BLKPREP_* values with the BLK_STS_ ones that they are
converted to later anyway.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-09 19:17:14 -07:00
Christoph Hellwig
14784565f7 scsi: clean up error handling in scsi_init_io
There is no need to call scsi_mq_free_sgtables until we have actually
allocated sgtables.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-09 19:17:14 -07:00
Christoph Hellwig
785ba83b4f scsi: push blk_status_t up into scsi_setup_{fs,scsi}_cmnd
This just moves the prep_to_mq calls up in preparation of further removal
of BLKPREP_* usage.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-09 19:17:14 -07:00
Christoph Hellwig
c092d4ec53 scsi: simplify scsi_prep_state_check
Return a blk_status_t directly, and make the code a little more compact
by handling the fast path in the caller.

Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-09 19:17:13 -07:00
Christoph Hellwig
535ac5d3fe ide: cleanup ->prep_rq calling convention
The return value is just used as a binary yes/no decision, so switch
it to a bool instead of the old BLKPREP_* values returned as an int.

Also clean up a few related comments.

Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-09 19:17:13 -07:00
Christoph Hellwig
9d037ad707 block: remove req->timeout_list
Unused now that the legacy request path is gone.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-09 11:44:10 -07:00
Christoph Hellwig
27d420bc47 mtip32xxx: use for_each_sg
Use the proper helper instead of manually iterating the scatterlist,
which is broken in the presence of chained S/G lists.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-09 08:39:21 -07:00
Christoph Hellwig
d85cb20453 mtip32xx: don't use req->special
Instead create add to the icmd into struct mtip_cmd which can be unioned
with the scatterlist used for the normal I/O path.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-09 08:39:21 -07:00
Christoph Hellwig
55c7bc37e0 mtip32xx: remove mtip_get_int_command
Merging this function into the only callers makes the code flow easier.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-09 08:39:21 -07:00
Christoph Hellwig
7bbf118f3b mtip32xx: remove mtip_init_cmd_header
There isn't much need for this helper - we can just calculate the offset
for the command header once late in the submission path and fill out
the ctba and ctbau fields there.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-09 08:39:21 -07:00
Christoph Hellwig
643b5f68d0 mtip32xx: add missing endianess annotations on struct smart_attr
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-09 08:39:21 -07:00
Christoph Hellwig
449a15d9e4 mtip32xx: remove __force_bit2int
There is no good excuse not to use proper __le16/32 types.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-09 08:39:21 -07:00
Christoph Hellwig
81e66174ab mtip32xx: return a blk_status_t from mtip_send_trim
This allows for better error propagation and simpler code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-09 08:39:21 -07:00
Christoph Hellwig
10966fa138 mtip32xx: merge mtip_submit_request into mtip_queue_rq
Factor out a new is_stopped helper that matches the existing
is_se_active helper, and merge the trivial amount of remaining code
into the only caller.  This also allows better error handling by
returning a BLK_STS_* directly instead of explicitly calling
blk_mq_end_request, and moving blk_mq_start_request closer to the
actual issue to hardware.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-09 08:39:21 -07:00
Christoph Hellwig
b5fa0e9ec9 mtip32xx: move the blk_rq_map_sg call to mtip_hw_submit_io
We have all arguments at hand in mtip_hw_submit_io, so keep the
rq to sg mapping close to the dma_map_sg call.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-09 08:39:21 -07:00
Christoph Hellwig
72d7ce8eb2 sx8: use a per-host tag_set
The current sx8 code spends a lot of effort dealing with the fact that
tags are per-host, but there might be multiple queues.  Now that the
driver has been converted to blk-mq it can take care of the blk-mq
tag_set concept that has been designed just for that.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-09 08:14:14 -07:00
Christoph Hellwig
cd94c9ed59 sx8: cleanup queue and disk allocation / freeing
Make the disk/queue alloc and free helpers per-port by moving the
trivial loops into the callers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-09 08:14:12 -07:00
Jens Axboe
ab11fe5af1 blk-mq-tag: document tag iteration helper return value
Document the fact that the strategy function passed in can
control whether to continue iterating or not.

Suggested-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-08 11:09:50 -07:00
Jens Axboe
ae8799125d blk-mq: provide a helper to check if a queue is busy
Returns true if the queue currently has requests pending,
false if not.

DM can use this to replace the atomic_inc/dec they do per device
to see if a device is busy.

Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-08 10:24:23 -07:00
Jens Axboe
7baa85727d blk-mq-tag: change busy_iter_fn to return whether to continue or not
We have this functionality in sbitmap, but we don't export it in
blk-mq for users of the tags busy iteration. This can be useful
for stopping the iteration, if the caller doesn't need to find
more requests.

Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-08 10:24:07 -07:00
Jan Kara
c28445fa06 loop: Get rid of 'nested' acquisition of loop_ctl_mutex
The nested acquisition of loop_ctl_mutex (->lo_ctl_mutex back then) has
been introduced by commit f028f3b2f9 "loop: fix circular locking in
loop_clr_fd()" to fix lockdep complains about bd_mutex being acquired
after lo_ctl_mutex during partition rereading. Now that these are
properly fixed, let's stop fooling lockdep.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-08 06:30:37 -07:00
Jan Kara
1dded9acf6 loop: Avoid circular locking dependency between loop_ctl_mutex and bd_mutex
Code in loop_change_fd() drops reference to the old file (and also the
new file in a failure case) under loop_ctl_mutex. Similarly to a
situation in loop_set_fd() this can create a circular locking dependency
if this was the last reference holding the file open. Delay dropping of
the file reference until we have released loop_ctl_mutex.

Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-08 06:30:36 -07:00
Jan Kara
0da03cab87 loop: Fix deadlock when calling blkdev_reread_part()
Calling blkdev_reread_part() under loop_ctl_mutex causes lockdep to
complain about circular lock dependency between bdev->bd_mutex and
lo->lo_ctl_mutex. The problem is that on loop device open or close
lo_open() and lo_release() get called with bdev->bd_mutex held and they
need to acquire loop_ctl_mutex. OTOH when loop_reread_partitions() is
called with loop_ctl_mutex held, it will call blkdev_reread_part() which
acquires bdev->bd_mutex. See syzbot report for details [1].

Move call to blkdev_reread_part() in __loop_clr_fd() from under
loop_ctl_mutex to finish fixing of the lockdep warning and the possible
deadlock.

[1] https://syzkaller.appspot.com/bug?id=bf154052f0eea4bc7712499e4569505907d1588

Reported-by: syzbot <syzbot+4684a000d5abdade83fac55b1e7d1f935ef1936e@syzkaller.appspotmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-08 06:30:32 -07:00
Jan Kara
85b0a54a82 loop: Move loop_reread_partitions() out of loop_ctl_mutex
Calling loop_reread_partitions() under loop_ctl_mutex causes lockdep to
complain about circular lock dependency between bdev->bd_mutex and
lo->lo_ctl_mutex. The problem is that on loop device open or close
lo_open() and lo_release() get called with bdev->bd_mutex held and they
need to acquire loop_ctl_mutex. OTOH when loop_reread_partitions() is
called with loop_ctl_mutex held, it will call blkdev_reread_part() which
acquires bdev->bd_mutex. See syzbot report for details [1].

Move all calls of loop_rescan_partitions() out of loop_ctl_mutex to
avoid lockdep warning and fix deadlock possibility.

[1] https://syzkaller.appspot.com/bug?id=bf154052f0eea4bc7712499e4569505907d1588

Reported-by: syzbot <syzbot+4684a000d5abdade83fac55b1e7d1f935ef1936e@syzkaller.appspotmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-08 06:30:31 -07:00
Jan Kara
d57f3374ba loop: Move special partition reread handling in loop_clr_fd()
The call of __blkdev_reread_part() from loop_reread_partition() happens
only when we need to invalidate partitions from loop_release(). Thus
move a detection for this into loop_clr_fd() and simplify
loop_reread_partition().

This makes loop_reread_partition() safe to use without loop_ctl_mutex
because we use only lo->lo_number and lo->lo_file_name in case of error
for reporting purposes (thus possibly reporting outdate information is
not a big deal) and we are safe from 'lo' going away under us by
elevated lo->lo_refcnt.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-08 06:30:30 -07:00
Jan Kara
c371077000 loop: Push loop_ctl_mutex down to loop_change_fd()
Push loop_ctl_mutex down to loop_change_fd(). We will need this to be
able to call loop_reread_partitions() without loop_ctl_mutex.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-08 06:30:28 -07:00
Jan Kara
757ecf40b7 loop: Push loop_ctl_mutex down to loop_set_fd()
Push lo_ctl_mutex down to loop_set_fd(). We will need this to be able to
call loop_reread_partitions() without lo_ctl_mutex.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-08 06:30:27 -07:00
Jan Kara
550df5fdac loop: Push loop_ctl_mutex down to loop_set_status()
Push loop_ctl_mutex down to loop_set_status(). We will need this to be
able to call loop_reread_partitions() without loop_ctl_mutex.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-08 06:30:22 -07:00
Jan Kara
4a5ce9ba58 loop: Push loop_ctl_mutex down to loop_get_status()
Push loop_ctl_mutex down to loop_get_status() to avoid the unusual
convention that the function gets called with loop_ctl_mutex held and
releases it.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-08 06:30:20 -07:00
Jan Kara
7ccd0791d9 loop: Push loop_ctl_mutex down into loop_clr_fd()
loop_clr_fd() has a weird locking convention that is expects
loop_ctl_mutex held, releases it on success and keeps it on failure.
Untangle the mess by moving locking of loop_ctl_mutex into
loop_clr_fd().

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-08 06:30:19 -07:00
Jan Kara
a2505b799a loop: Split setting of lo_state from loop_clr_fd
Move setting of lo_state to Lo_rundown out into the callers. That will
allow us to unlock loop_ctl_mutex while the loop device is protected
from other changes by its special state.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-08 06:30:18 -07:00
Jan Kara
a13165441d loop: Push lo_ctl_mutex down into individual ioctls
Push acquisition of lo_ctl_mutex down into individual ioctl handling
branches. This is a preparatory step for pushing the lock down into
individual ioctl handling functions so that they can release the lock as
they need it. We also factor out some simple ioctl handlers that will
not need any special handling to reduce unnecessary code duplication.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-08 06:30:16 -07:00
Jan Kara
0a42e99b58 loop: Get rid of loop_index_mutex
Now that loop_ctl_mutex is global, just get rid of loop_index_mutex as
there is no good reason to keep these two separate and it just
complicates the locking.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-08 06:30:15 -07:00
Jan Kara
967d1dc144 loop: Fold __loop_release into loop_release
__loop_release() has a single call site. Fold it there. This is
currently not a huge win but it will make following replacement of
loop_index_mutex more obvious.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-08 06:30:14 -07:00
Tetsuo Handa
310ca162d7 block/loop: Use global lock for ioctl() operation.
syzbot is reporting NULL pointer dereference [1] which is caused by
race condition between ioctl(loop_fd, LOOP_CLR_FD, 0) versus
ioctl(other_loop_fd, LOOP_SET_FD, loop_fd) due to traversing other
loop devices at loop_validate_file() without holding corresponding
lo->lo_ctl_mutex locks.

Since ioctl() request on loop devices is not frequent operation, we don't
need fine grained locking. Let's use global lock in order to allow safe
traversal at loop_validate_file().

Note that syzbot is also reporting circular locking dependency between
bdev->bd_mutex and lo->lo_ctl_mutex [2] which is caused by calling
blkdev_reread_part() with lock held. This patch does not address it.

[1] https://syzkaller.appspot.com/bug?id=f3cfe26e785d85f9ee259f385515291d21bd80a3
[2] https://syzkaller.appspot.com/bug?id=bf154052f0eea4bc7712499e4569505907d15889

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: syzbot <syzbot+bf89c128e05dd6c62523@syzkaller.appspotmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-08 06:30:11 -07:00
Tetsuo Handa
b1ab5fa309 block/loop: Don't grab "struct file" for vfs_getattr() operation.
vfs_getattr() needs "struct path" rather than "struct file".
Let's use path_get()/path_put() rather than get_file()/fput().

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-08 06:30:09 -07:00
Colin Ian King
e051bd0ddf ms_block: remove unused pointer 'set'
Pointer 'set' is declared but not used, remove it. Cleans up warning:

warning: unused variable ‘set’ [-Wunused-variable]

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-08 06:17:26 -07:00
Jens Axboe
dbef525773 sunvdc: fix compiler warning
Stephen reports:

After merging the block tree, today's linux-next build (sparc64 defconfig)
produced this warning:

/home/sfr/next/next/drivers/block/sunvdc.c: In function 'init_queue':
/home/sfr/next/next/drivers/block/sunvdc.c:788:6: warning: unused variable 'ret' [-Wunused-variable]
  int ret;
      ^~~

Kill the unused variable.

Fixes: fa182a1fa9 ("sunvdc: convert to blk-mq")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-07 21:17:57 -07:00
Jens Axboe
4b04cc6a8f nvme: add separate poll queue map
Adds support for defining a variable number of poll queues, currently
configurable with the 'poll_queues' module parameter. Defaults to
a single poll queue.

And now we finally have poll support without triggering interrupts!

Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-07 13:45:00 -07:00
Jens Axboe
d1e36282b0 block: add REQ_HIPRI and inherit it from IOCB_HIPRI
We use IOCB_HIPRI to poll for IO in the caller instead of scheduling.
This information is not available for (or after) IO submission. The
driver may make different queue choices based on the type of IO, so
make the fact that we will poll for this IO known to the lower layers
as well.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-07 13:45:00 -07:00
Jens Axboe
3b6592f70a nvme: utilize two queue maps, one for reads and one for writes
NVMe does round-robin between queues by default, which means that
sharing a queue map for both reads and writes can be problematic
in terms of read servicing. It's much easier to flood the queue
with writes and reduce the read servicing.

Implement two queue maps, one for reads and one for writes. The
write queue count is configurable through the 'write_queues'
parameter.

By default, we retain the previous behavior of having a single
queue set, shared between reads and writes. Setting 'write_queues'
to a non-zero value will create two queue sets, one for reads and
one for writes, the latter using the configurable number of
queues (hardware queue counts permitting).

Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-07 13:45:00 -07:00
Jens Axboe
843477d4cc blk-mq: initial support for multiple queue maps
Add a queue offset to the tag map. This enables users to map
iteratively, for each queue map type they support.

Bump maximum number of supported maps to 2, we're now fully
able to support more than 1 map.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-07 13:45:00 -07:00
Jens Axboe
3110fc7960 blk-mq: improve plug list sorting
Currently we only look at the software queue, but with support
for multiple maps, we should also look at the hardware queue.
This is important since we'll flush out the request list if
either the software queue or hardware queue don't match.

This sorts by software queue first, then hardware queue if
that differs. Finally we sort by request location like before.
This minimizes the flush points per plug list.

Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-07 13:45:00 -07:00
Jens Axboe
67cae4c948 blk-mq: cleanup and improve list insertion
It's somewhat strange to have a list insertion function that
relies on the fact that the caller has mapped things correctly.
Pass in the hardware queue directly for insertion, which makes
for a much cleaner interface and implementation.

Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-07 13:44:59 -07:00
Jens Axboe
ea4f995ee8 blk-mq: cache request hardware queue mapping
We call blk_mq_map_queue() a lot, at least two times for each
request per IO, sometimes more. Since we now have an indirect
call as well in that function. cache the mapping so we don't
have to re-call blk_mq_map_queue() for the same request
multiple times.

Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-07 13:44:59 -07:00
Jens Axboe
392546aed2 blk-mq: separate number of hardware queues from nr_cpu_ids
With multiple maps, nr_cpu_ids is no longer the maximum number of
hardware queues we support on a given devices. The initializer of
the tag_set can have set ->nr_hw_queues larger than the available
number of CPUs, since we can exceed that with multiple queue maps.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-07 13:44:59 -07:00
Jens Axboe
b3c661b15d blk-mq: support multiple hctx maps
Add support for the tag set carrying multiple queue maps, and
for the driver to inform blk-mq how many it wishes to support
through setting set->nr_maps.

This adds an mq_ops helper for drivers that support more than 1
map, mq_ops->rq_flags_to_type(). The function takes request/bio
flags and CPU, and returns a queue map index for that. We then
use the type information in blk_mq_map_queue() to index the map
set.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-07 13:44:59 -07:00