855 Commits

Author SHA1 Message Date
Sagi Grimberg
4054637c9b nvme: flush reset_work before safely continuing with delete operation
Prevent racing controller reset and delete flows. reset_work must not
ever self-requeue so flushing it suffices.

Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-11-01 16:28:17 +01:00
Sagi Grimberg
12fa1304ba nvme-rdma: reuse nvme_delete_ctrl when reconnect attempts expire
instead of just queueing delete work

Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-11-01 16:28:07 +01:00
Christoph Hellwig
6cd53d14aa nvme: consolidate common code from ->reset_work
No change in behavior except that the FC code cancels two work items a
little later now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-11-01 16:28:06 +01:00
Christoph Hellwig
e9bc25874c nvme-rdma: remove nvme_rdma_remove_ctrl
It is only used in two places, and some of the work done by it will
be taken into common code soon.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-11-01 16:28:05 +01:00
Christoph Hellwig
c5017e8570 nvme: move controller deletion to common code
Move the ->delete_work and the associated helpers to common code instead
of duplicating them in every driver.  This also adds the missing reference
get/put for the loop driver.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-11-01 16:28:04 +01:00
Christoph Hellwig
29c0964873 nvme-fc: merge __nvme_fc_schedule_delete_work into __nvme_fc_del_ctrl
No need to have two functions doing the same thing.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-11-01 16:28:04 +01:00
James Smart
71c691fd06 nvme-fc: avoid workqueue flush stalls
There's no need to wait for the full nvme_wq, which is now shared,
to flush. flush only the delete_work item.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Sagi Grimberg <sgi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-11-01 16:28:03 +01:00
Keith Busch
5e0fab57fb nvme: Fix setting logical block format when revalidating
Revalidating the disk needs to set the logical block format and capacity,
otherwise it can't figure out if the users modified anything about
the namespace.

Fixes: cdbff4f26bd9 ("nvme: remove nvme_revalidate_ns")

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-10-30 08:22:41 -06:00
James Smart
ecad0d2cb8 nvme-fc: remove NVME_FC_MAX_SEGMENTS
The define is an arbitrary limit to the io size on the initiator,
capping the io to 1MB-4KB.

Remove the define from the transport. I/O size will solely be limited
by the LLDD sg limits.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-27 09:25:35 +03:00
James Smart
56d5f4f108 nvme-fc: add support for duplicate_connect option
Adds support for the duplicate_connect option. When set to true,
checks whether there's an existing controller via the same host port
and target port for the same host (hostnqn, hostid) to the same
subsystem. Fails the connection request if an existing controller.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-27 09:25:32 +03:00
James Smart
36e835f243 nvme-rdma: add support for duplicate_connect option
Adds support for the duplicate_connect option. When set to true,
checks whether there's an existing controller via the same target
address (traddr), target port (trsvcid), and if specified, host
address (host_traddr). Fails the connection request if there is
an existing controller.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-27 09:25:32 +03:00
James Smart
991231dc48 nvme: add helper to compare options to controller
Adds a helper function that compares the host and subsytem
specified in a connect options list vs a controller.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-27 09:25:28 +03:00
James Smart
3b33876207 nvme: add duplicate_connect option
Add the "duplicate_connect" boolean option (presence means true).
Default is false.

When false, the transport should validate whether a new controller request
is targeted for the same host transport addressing and target transport
addressing as an existing controller. If so, the new controller request
should be rejected.

When true, the callee is explicitly requesting a duplicate controller
connection to be made and the new request should be attempted.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-27 09:25:20 +03:00
Christoph Hellwig
999ada2871 nvme: check for a live controller in nvme_dev_open
This is a much more sensible check than just the admin queue.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@rimbeg.me>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
2017-10-27 09:05:22 +03:00
Christoph Hellwig
a6a5149b10 nvme: get rid of nvme_ctrl_list
Use the core chrdev code to set up the link between the character device
and the nvme controller.  This allows us to get rid of the global list
of all controllers, and also ensures that we have both a reference to
the controller and the transport module before the open method of the
character device is called.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sgi@grimberg.me>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
2017-10-27 09:04:53 +03:00
Christoph Hellwig
d22524a478 nvme: switch controller refcounting to use struct device
Instead of allocating a separate struct device for the character device
handle embedd it into struct nvme_ctrl and use it for the main controller
refcounting.  This removes double refcounting and gets us an automatic
reference for the character device operations.  We keep ctrl->device as a
pointer for now to avoid chaning printks all over, but in the future we
could look into message printing helpers that take a controller structure
similar to what other subsystems do.

Note the delete_ctrl operation always already has a reference (either
through sysfs due this change, or because every open file on the
/dev/nvme-fabrics node has a refernece) when it is entered now, so we
don't need to do the unless_zero variant there.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
2017-10-27 09:04:07 +03:00
Christoph Hellwig
c6424a90da nvme: simplify nvme_open
Now that we are protected against lookup vs free races for the namespace
by using kref_get_unless_zero we don't need the hack of NULLing out the
disk private data during removal.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
2017-10-27 09:03:31 +03:00
Christoph Hellwig
2dd4122854 nvme: use kref_get_unless_zero in nvme_find_get_ns
For kref_get_unless_zero to protect against lookup vs free races we need
to use it in all places where we aren't guaranteed to already hold a
reference.  There is no such guarantee in nvme_find_get_ns, so switch to
kref_get_unless_zero in this function.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
2017-10-27 09:02:58 +03:00
Nitzan Carmi
e62a538da2 nvme-rdma: Add debug message when reaches timeout
Signed-off-by: Nitzan Carmi <nitzanc@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-23 16:28:42 +02:00
Max Gurtovoy
f87c89ad93 nvme-rdma: align nvme_rdma_device structure
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-23 16:28:05 +02:00
Sagi Grimberg
7db8144653 nvme-rdma: fix possible hang when issuing commands during ctrl removal
nvme_rdma_queue_is_ready() fails requests in case a queue is not
LIVE. If the controller is in RECONNECTING state, we might be in
this state for a long time (until we successfully reconnect) and
we are better off with failing the request fast. Otherwise, we
fail with BLK_STS_RESOURCE to have the block layer try again
soon.

In case we are removing the controller when the admin queue
is not LIVE, we will terminate the request with BLK_STS_RESOURCE
but it happens before we call blk_mq_start_request() so the
request timeout never expires, and the queue will never get
back to LIVE (because we are removing the controller). This
causes the removal operation to block infinitly [1].

Thus, if we are removing (state DELETING), and the queue is
not LIVE, we need to fail the request permanently as there is
no chance for it to ever complete successfully.

[1]
--
sysrq: SysRq : Show Blocked State
  task                        PC stack   pid father
kworker/u66:2   D    0   440      2 0x80000000
Workqueue: nvme-wq nvme_rdma_del_ctrl_work [nvme_rdma]
Call Trace:
 __schedule+0x3e9/0xb00
 schedule+0x40/0x90
 schedule_timeout+0x221/0x580
 io_schedule_timeout+0x1e/0x50
 wait_for_completion_io_timeout+0x118/0x180
 blk_execute_rq+0x86/0xc0
 __nvme_submit_sync_cmd+0x89/0xf0
 nvmf_reg_write32+0x4b/0x90 [nvme_fabrics]
 nvme_shutdown_ctrl+0x41/0xe0
 nvme_rdma_shutdown_ctrl+0xca/0xd0 [nvme_rdma]
 nvme_rdma_remove_ctrl+0x2b/0x40 [nvme_rdma]
 nvme_rdma_del_ctrl_work+0x25/0x30 [nvme_rdma]
 process_one_work+0x1fd/0x630
 worker_thread+0x1db/0x3b0
 kthread+0x11e/0x150
 ret_from_fork+0x27/0x40
01              D    0  2868   2862 0x00000000
Call Trace:
 __schedule+0x3e9/0xb00
 schedule+0x40/0x90
 schedule_timeout+0x260/0x580
 wait_for_completion+0x108/0x170
 flush_work+0x1e0/0x270
 nvme_rdma_del_ctrl+0x5a/0x80 [nvme_rdma]
 nvme_sysfs_delete+0x2a/0x40
 dev_attr_store+0x18/0x30
 sysfs_kf_write+0x45/0x60
 kernfs_fop_write+0x124/0x1c0
 __vfs_write+0x28/0x150
 vfs_write+0xc7/0x1b0
 SyS_write+0x49/0xa0
 entry_SYSCALL_64_fastpath+0x18/0xad
--

Reported-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-23 16:27:44 +02:00
James Smart
134aedc9c1 nvme-fc: correct io timeout behavior
The transport io timeout behavior wasn't quite correct. It ignored
that the io error handler is supposed to be synchronous so it possibly
allowed the blk request to be restarted while the io associated was
still aborting. Timeouts on reserved commands, those used for
association create, were never timing out thus they hung out forever.

To correct:
If an io is times out while a remoteport is not connected, just
restart the io timer. The lack of connectivity will simultaneously
be resetting the controller, so the reset path will abort and terminate
the io.

If an io is times out while it was marked for transport abort, just
reset the io timer. The abort process is underway and will complete
the io.

Otherwise, if an io times out, abort the io. If the abort was
unsuccessful (unlikely) give up and return not handled.

If the abort was successful, as the abort process is underway it will
terminate the io, so rather than synchronously waiting, just restart
the io timer.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-20 12:17:05 +02:00
James Smart
0a02e39fd1 nvme-fc: correct io termination handling
The io completion handling for i/o's that are failing due to
to a transport error or association termination had issues, causing
io failures (DNR set so retries didn't kick in) or long stalls.

Change the io completion handler for the following items:

When an io has been completed due to a transport abort (based on an
exchange error) or when marked as aborted as part of an association
termination (FCOP_FLAGS_TERMIO), set the NVME completion status to
NVME_SC_ABORTED. By default, do not set DNR on the status so that a
retry can be attempted after association recreate.

In cases where an io is failed (non-successful nvme status including
aborted), if the controller is being deleted (blk_queue_dying) or
the io was part of the ios used for association creation (ctrl state
is NEW or RECONNECTING), then additionally set the DNR bit so the io
will not be retried. If the failed io was part of association creation,
the failure will tear down the partially completioned association and
typically restart a new reconnect attempt (another create association
later).

Rearranged code flow to remove a largely unneeded local variable.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-20 12:16:59 +02:00
Chaitanya Kulkarni
a7a7cbe353 nvme-pci: add SGL support
This adds SGL support for NVMe PCIe driver, based on an earlier patch
from Rajiv Shanmugam Madeswaran <smrajiv15 at gmail.com>. This patch
refactors the original code and adds new module parameter sgl_threshold
to determine whether to use SGL or PRP for IOs.

The usage of SGLs is controlled by the sgl_threshold module parameter,
which allows to conditionally use SGLs if average request segment
size (avg_seg_size) is greater than sgl_threshold. In the original patch,
the decision of using SGLs was dependent only on the IO size,
with the new approach we consider not only IO size but also the
number of physical segments present in the IO.

We calculate avg_seg_size based on request payload bytes and number
of physical segments present in the request.

For e.g.:-

1. blk_rq_nr_phys_segments = 2 blk_rq_payload_bytes = 8k
avg_seg_size = 4K use sgl if avg_seg_size >= sgl_threshold.

2. blk_rq_nr_phys_segments = 2 blk_rq_payload_bytes = 64k
avg_seg_size = 32K use sgl if avg_seg_size >= sgl_threshold.

3. blk_rq_nr_phys_segments = 16 blk_rq_payload_bytes = 64k
avg_seg_size = 4K use sgl if avg_seg_size >= sgl_threshold.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-20 12:16:58 +02:00
Christoph Hellwig
9843f685ae nvme: use ida_simple_{get,remove} for the controller instance
Switch to the ida_simple_* helpers instead of opencoding them.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
2017-10-20 12:16:57 +02:00
Roy Shterman
ba2dec35e4 nvmet: Change max_nsid in subsystem due to ns_disable if needed
In case we disable namespaces which has the nsid like
subsystem max_nsid we need to search for the next largest nsid
in this subsystem. If the subsystem don't has more namespaces
we set it to 0, else we take nsid from the last namespace in
namespaces list because the list is sorted while inserting.

Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Roy Shterman <roys@lightbitslabs.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
[hch: slight refactor]
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-20 12:06:02 +02:00
Sagi Grimberg
f04b9cc87b nvme-rdma: Fix error status return in tagset allocation failure
We should make sure to escelate allocation failures to prevent a
use-after-free in nvmf_create_ctrl.

Fixes: b28a308ee777 ("nvme-rdma: move tagset allocation to a dedicated routine")
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-19 17:13:51 +02:00
Sagi Grimberg
bd9f07590a nvme-rdma: Fix possible double free in reconnect flow
The fact that we free the async event buffer in
nvme_rdma_destroy_admin_queue can cause us to free it
more than once because this happens in every reconnect
attempt since commit 31fdf1840170. we rely on the queue
state flags DELETING to avoid this for other resources.

A more complete fix is to not destroy the admin/io queues
unconditionally on every reconnect attempt, but its a bit
more extensive and will go in the next release.

Fixes: 31fdf1840170 ("nvme-rdma: reuse configure/destroy_admin_queue")
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-19 17:07:22 +02:00
Bhumika Goyal
66603a31f8 nvmet: make config_item_type const
Make config_item_type structures const as they are either passed to a
function having the argument as const or used inside an if statement or
stored in the const "ci_type" field of a config_item structure.

Done using Coccinelle

Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-19 16:15:28 +02:00
Israel Rukshin
86f36b9c6c nvme-loop: Add BLK_MQ_F_NO_SCHED flag to admin tag set
This flag is useful for admin queues that aren't used for normal IO.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-19 10:58:54 +02:00
Israel Rukshin
5a22e2bf44 nvme-fc: Add BLK_MQ_F_NO_SCHED flag to admin tag set
Since commit b86dd81
"block: get rid of blk-mq default scheduler choice Kconfig entries",
when setting nr_hw_queues to 1 the admin tag set uses mq-deadline scheduler.
This flag is useful for admin queues that aren't used for normal IO.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: James Smart  <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-19 10:58:54 +02:00
Israel Rukshin
94f29d4f78 nvme-rdma: Add BLK_MQ_F_NO_SCHED flag to admin tag set
Since commit b86dd81
"block: get rid of blk-mq default scheduler choice Kconfig entries",
when setting nr_hw_queues to 1 the admin tag set uses mq-deadline scheduler.
This flag is useful for admin queues that aren't used for normal IO.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-19 10:58:53 +02:00
Minwoo Im
16772ae6d9 nvme-pci: fix typos in comments
fixed comment typos in adapter_alloc_cq() and adapter_alloc_sq().
'the the' duplications are replaced with 'that the'.

Signed-off-by: Minwoo Im <dn3108@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-19 10:56:07 +02:00
James Smart
f9cf2a6491 nvmet: synchronize sqhd update
In testing target io in read write mix, we did indeed get into cases where
sqhd didn't update properly and slowly missed enough updates to shutdown
the queue.

Protect the updating sqhd by using cmpxchg, and for that turn the sqhd
field into a u32 so that cmpxchg works on it for all architectures.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-19 09:16:12 +02:00
James Smart
17c4dc6eb7 nvme-fc: retry initial controller connections 3 times
Currently, if a frame is lost of command fails as part of initial
association create for a new controller, the new controller connection
request will immediately fail.

Add in an immediate 3 retry loop before giving up.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-18 19:30:01 +02:00
James Smart
8a82dbf191 nvme-fc: fix iowait hang
Add missing iowait head initialization.
Fix irqsave vs irq: wait_event_lock_irq() doesn't do irq save/restore

Fixes: 36715cf4b366 ("nvme_fc: replace ioabort msleep loop with completion”)
Cc: <stable@vger.kernel.org> # 4.13
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Tested-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-18 19:30:00 +02:00
Sagi Grimberg
0ad0bfa298 nvme-rdma: stop controller reset if the controller is deleting
If the controller is deleting (in case the user decided to delete it), we
have no point to continue reset sequence.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-18 19:29:39 +02:00
Sagi Grimberg
5013e98b5e nvme-rdma: change queue flag semantics DELETING -> ALLOCATED
Instead of marking we are deleting, mark we are allocated and check that
instead. This makes the logic symmetrical to connected mark check.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-18 19:28:55 +02:00
Sagi Grimberg
60a5188633 nvme-rdma: Don't local invalidate if the queue is not live
No chance for the local invalidate to succeed if the queue-pair
is in error state. Most likely the target will do a remote
invalidation of our mr so not a big loss on the test_bit.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-18 19:27:54 +02:00
Sagi Grimberg
5e1fe61d41 nvme-rdma: teardown admin/io queues once on error recovery
Relying on the queue state while tearing down on every reconnect
attempt is not a good design. We should do it once in err_work
and simply try to establish the queues for each reconnect attempt.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-18 19:27:53 +02:00
Sagi Grimberg
0fc176dfda nvme-rdma: Check that reinit_request got a proper mr
Warn if req->mr is NULL as it should never happen.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-18 19:27:52 +02:00
Sagi Grimberg
0c5b43b9c1 nvme-rdma: move assignment to declaration
No need for the extra line for trivial assignments.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-18 19:27:52 +02:00
Sagi Grimberg
d8bfceebc4 nvme-rdma: fix wrong logging message
Not necessarily address resolution failed.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-18 19:27:51 +02:00
Sagi Grimberg
60070c78ef nvme-rdma: pass tagset to directly nvme_rdma_free_tagset
Instead of flagging admin/io.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-18 19:27:50 +02:00
Sagi Grimberg
31b8446079 nvme: introduce nvme_reinit_tagset
Move blk_mq_reinit_tagset from blk-mq to nvme core
as the only user of it. Current transports that use
it (rdma, fc) simply implement .reinit_request op.

This patch does not change any functionality.

Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-18 19:27:48 +02:00
Christoph Hellwig
761f2e1ed8 nvme: simplify compat_ioctl handling
We can just use our normal ioctl handler for the compat case and remove
the boilerplate code for it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
2017-10-16 14:54:10 +02:00
Javier González
1a94b2d484 lightnvm: implement generic path for sync I/O
Implement a generic path for sending sync I/O on LightNVM. This allows
to reuse the standard synchronous path trough blk_execute_rq(), instead
of implementing a wait_for_completion on the target side (e.g., pblk).

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-10-13 08:34:57 -06:00
Javier González
1b839187db lightnvm: fail fast on passthrough commands
Make LightNVM passhtrough commands fail fast. User space will then take
care of re-submitting.

Fixes: 84d4add793c6 ('lightnvm: add ioctls for vector I/Os')
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-10-13 08:34:57 -06:00
James Smart
469d0ef06d nvme-fc: move remote port get/put/free location
move nvme_fc_rport_get/put and rport free to higher in the file to
avoid adding prototypes to resolve references in upcoming code additions

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-05 10:02:47 +02:00
Christoph Hellwig
8969f1f829 nvme-pci: Use PCI bus address for data/queues in CMB
Currently, NVMe PCI host driver is programming CMB dma address as
I/O SQs addresses. This results in failures on systems where 1:1
outbound mapping is not used (example Broadcom iProc SOCs) because
CMB BAR will be progammed with PCI bus address but NVMe PCI EP will
try to access CMB using dma address.

To have CMB working on systems without 1:1 outbound mapping, we
program PCI bus address for I/O SQs instead of dma address. This
approach will work on systems with/without 1:1 outbound mapping.

Based on a report and previous patch from Abhishek Shah.

Fixes: 8ffaadf7 ("NVMe: Use CMB for the IO SQes if available")
Cc: stable@vger.kernel.org
Reported-by: Abhishek Shah <abhishek.shah@broadcom.com>
Tested-by: Abhishek Shah <abhishek.shah@broadcom.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-04 11:42:53 +02:00