587 Commits

Author SHA1 Message Date
Varun Prakash
4286c72c53 nvmet-tcp: fix data digest pointer calculation
commit e790de54e94a7a15fb725b34724d41d41cbaa60c upstream.

exp_ddgst is of type __le32, &cmd->exp_ddgst + cmd->offset increases
&cmd->exp_ddgst by 4 * cmd->offset, fix this by type casting
&cmd->exp_ddgst to u8 *.

Fixes: 872d26a391da ("nvmet-tcp: add NVMe over TCP target driver")
Signed-off-by: Varun Prakash <varun@chelsio.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-02 19:48:21 +01:00
Sagi Grimberg
240a7025a6 nvme: code command_id with a genctr for use-after-free validation
[ Upstream commit e7006de6c23803799be000a5dcce4d916a36541a ]

We cannot detect a (perhaps buggy) controller that is sending us
a completion for a request that was already completed (for example
sending a completion twice), this phenomenon was seen in the wild
a few times.

So to protect against this, we use the upper 4 msbits of the nvme sqe
command_id to use as a 4-bit generation counter and verify it matches
the existing request generation that is incrementing on every execution.

The 16-bit command_id structure now is constructed by:
| xxxx | xxxxxxxxxxxx |
  gen    request tag

This means that we are giving up some possible queue depth as 12 bits
allow for a maximum queue depth of 4095 instead of 65536, however we
never create such long queues anyways so no real harm done.

Suggested-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Acked-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Tested-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-18 13:40:29 +02:00
Amit Engel
10e759e350 nvmet: pass back cntlid on successful completion
[ Upstream commit e804d5abe2d74cfe23f5f83be580d1cdc9307111 ]

According to the NVMe specification, the response dword 0 value of the
Connect command is based on status code: return cntlid for successful
compeltion return IPO and IATTR for connect invalid parameters.  Fix
a missing error information for a zero sized queue, and return the
cntlid also for I/O queue Connect commands.

Signed-off-by: Amit Engel <amit.engel@dell.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-15 09:50:25 +02:00
Maurizio Lombardi
89047f0089 nvme-tcp: can't set sk_user_data without write_lock
[ Upstream commit 0755d3be2d9bb6ea38598ccd30d6bbaa1a5c3a50 ]

The sk_user_data pointer is supposed to be modified only while
holding the write_lock "sk_callback_lock", otherwise
we could race with other threads and crash the kernel.

we can't take the write_lock in nvmet_tcp_state_change()
because it would cause a deadlock, but the release_work queue
will set the pointer to NULL later so we can simply remove
the assignment.

Fixes: b5332a9f3f3d ("nvmet-tcp: fix incorrect locking in state_change sk callback")

Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-20 16:05:53 +02:00
Hannes Reinecke
950a739905 nvmet-fc: do not check for invalid target port in nvmet_fc_handle_fcp_rqst()
[ Upstream commit 2a4a910aa4f0acc428dc8d10227c42e14ed21d10 ]

When parsing a request in nvmet_fc_handle_fcp_rqst() we should not
check for invalid target ports; if we do the command is aborted
from the fcp layer, causing the host to assume a transport error.
Rather we should still forward this request to the nvmet layer, which
will then correctly fail the command with an appropriate error status.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:56:00 +02:00
Hannes Reinecke
511a010291 nvme-loop: do not warn for deleted controllers during reset
[ Upstream commit 6622f9acd29cd4f6272720e827e6406f5a970cb0 ]

During concurrent reset and delete calls the reset workqueue is
flushed, causing nvme_loop_reset_ctrl_work() to be executed when
the controller is in state DELETING or DELETING_NOIO.
But this is expected, so we shouldn't issue a WARN_ON here.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-18 10:00:05 +02:00
Hannes Reinecke
155c2fea4b nvme-loop: check for NVME_LOOP_Q_LIVE in nvme_loop_destroy_admin_queue()
[ Upstream commit 4237de2f73a669e4f89ac0aa2b44fb1a1d9ec583 ]

We need to check the NVME_LOOP_Q_LIVE flag in
nvme_loop_destroy_admin_queue() to protect against duplicate
invocations eg during concurrent reset and remove calls.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-18 10:00:05 +02:00
Hannes Reinecke
620424df29 nvme-loop: clear NVME_LOOP_Q_LIVE when nvme_loop_configure_admin_queue() fails
[ Upstream commit 1c5f8e882a05de5c011e8c3fbeceb0d1c590eb53 ]

When the call to nvme_enable_ctrl() in nvme_loop_configure_admin_queue()
fails the NVME_LOOP_Q_LIVE flag is not cleared.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-18 10:00:05 +02:00
Hannes Reinecke
1c80ca596c nvme-loop: reset queue count to 1 in nvme_loop_destroy_io_queues()
[ Upstream commit a6c144f3d2e230f2b3ac5ed8c51e0f0391556197 ]

The queue count is increased in nvme_loop_init_io_queues(), so we
need to reset it to 1 at the end of nvme_loop_destroy_io_queues().
Otherwise the function is not re-entrant safe, and crash will happen
during concurrent reset and remove calls.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-18 10:00:05 +02:00
Sagi Grimberg
590f718a64 nvmet: fix false keep-alive timeout when a controller is torn down
[ Upstream commit aaeadd7075dc9e184bc7876e9dd7b3bada771df2 ]

Controller teardown flow may take some time in case it has many I/O
queues, and the host may not send us keep-alive during this period.
Hence reset the traffic based keep-alive timer so we don't trigger
a controller teardown as a result of a keep-alive expiration.

Reported-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-16 12:01:38 +02:00
Max Gurtovoy
c440cd0807 nvmet: fix freeing unallocated p2pmem
[ Upstream commit bcd9a0797d73eeff659582f23277e7ab6e5f18f3 ]

In case p2p device was found but the p2p pool is empty, the nvme target
is still trying to free the sgl from the p2p pool instead of the
regular sgl pool and causing a crash (BUG() is called). Instead, assign
the p2p_dev for the request only if it was allocated from p2p pool.

This is the crash that was caused:

[Sun May 30 19:13:53 2021] ------------[ cut here ]------------
[Sun May 30 19:13:53 2021] kernel BUG at lib/genalloc.c:518!
[Sun May 30 19:13:53 2021] invalid opcode: 0000 [#1] SMP PTI
...
[Sun May 30 19:13:53 2021] kernel BUG at lib/genalloc.c:518!
...
[Sun May 30 19:13:53 2021] RIP: 0010:gen_pool_free_owner+0xa8/0xb0
...
[Sun May 30 19:13:53 2021] Call Trace:
[Sun May 30 19:13:53 2021] ------------[ cut here ]------------
[Sun May 30 19:13:53 2021]  pci_free_p2pmem+0x2b/0x70
[Sun May 30 19:13:53 2021]  pci_p2pmem_free_sgl+0x4f/0x80
[Sun May 30 19:13:53 2021]  nvmet_req_free_sgls+0x1e/0x80 [nvmet]
[Sun May 30 19:13:53 2021] kernel BUG at lib/genalloc.c:518!
[Sun May 30 19:13:53 2021]  nvmet_rdma_release_rsp+0x4e/0x1f0 [nvmet_rdma]
[Sun May 30 19:13:53 2021]  nvmet_rdma_send_done+0x1c/0x60 [nvmet_rdma]

Fixes: c6e3f1339812 ("nvmet: add metadata support for block devices")
Reviewed-by: Israel Rukshin <israelr@nvidia.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-10 13:39:18 +02:00
Hou Pu
cb554bbf36 nvmet-tcp: fix inline data size comparison in nvmet_tcp_queue_response
commit 25df1acd2d36eb72b14c3d00f6b861b1e00b3aab upstream.

Using "<=" instead "<" to compare inline data size.

Fixes: bdaf13279192 ("nvmet-tcp: fix a segmentation fault during io parsing error")
Signed-off-by: Hou Pu <houpu.main@gmail.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-03 09:00:39 +02:00
Daniel Wagner
e207bbf555 nvmet: seset ns->file when open fails
[ Upstream commit 85428beac80dbcace5b146b218697c73e367dcf5 ]

Reset the ns->file value to NULL also in the error case in
nvmet_file_ns_enable().

The ns->file variable points either to file object or contains the
error code after the filp_open() call. This can lead to following
problem:

When the user first setups an invalid file backend and tries to enable
the ns, it will fail. Then the user switches over to a bdev backend
and enables successfully the ns. The first received I/O will crash the
system because the IO backend is chosen based on the ns->file value:

static u16 nvmet_parse_io_cmd(struct nvmet_req *req)
{
	[...]

	if (req->ns->file)
		return nvmet_file_parse_io_cmd(req);

	return nvmet_bdev_parse_io_cmd(req);
}

Reported-by: Enzo Matsumiya <ematsumiya@suse.com>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-26 12:06:50 +02:00
Wu Bo
9c980795cc nvme-loop: fix memory leak in nvme_loop_create_ctrl()
[ Upstream commit 03504e3b54cc8118cc26c064e60a0b00c2308708 ]

When creating loop ctrl in nvme_loop_create_ctrl(), if nvme_init_ctrl()
fails, the loop ctrl should be freed before jumping to the "out" label.

Fixes: 3a85a5de29ea ("nvme-loop: add a NVMe loopback host driver")
Signed-off-by: Wu Bo <wubo40@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-26 12:06:48 +02:00
Wu Bo
4720f29acb nvmet: fix memory leak in nvmet_alloc_ctrl()
[ Upstream commit fec356a61aa3d3a66416b4321f1279e09e0f256f ]

When creating ctrl in nvmet_alloc_ctrl(), if the cntlid_min is larger
than cntlid_max of the subsystem, and jumps to the
"out_free_changed_ns_list" label, but the ctrl->sqs lack of be freed.
Fix this by jumping to the "out_free_sqs" label.

Fixes: 94a39d61f80f ("nvmet: make ctrl-id configurable")
Signed-off-by: Wu Bo <wubo40@huawei.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-26 12:06:48 +02:00
Amit
737ccd2134 nvmet: remove unused ctrl->cqs
[ Upstream commit 6d65aeab7bf6e83e75f53cfdbdb84603e52e1182 ]

remove unused cqs from nvmet_ctrl struct
this will reduce the allocated memory.

Signed-off-by: Amit <amit.engel@dell.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-26 12:06:48 +02:00
Keith Busch
3851a86c3d nvmet: remove unsupported command noise
[ Upstream commit 4a20342572f66c5b20a1ee680f5ac0a13703748f ]

Nothing can stop a host from submitting invalid commands. The target
just needs to respond with an appropriate status, but that's not a
target error. Demote invalid command messages to the debug level so
these events don't spam the kernel logs.

Reported-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-22 11:40:54 +02:00
Michal Kalderon
64f3410c7b nvmet-rdma: Fix NULL deref when SEND is completed with error
[ Upstream commit 8cc365f9559b86802afc0208389f5c8d46b4ad61 ]

When running some traffic and taking down the link on peer, a
retry counter exceeded error is received. This leads to
nvmet_rdma_error_comp which tried accessing the cq_context to
obtain the queue. The cq_context is no longer valid after the
fix to use shared CQ mechanism and should be obtained similar
to how it is obtained in other functions from the wc->qp.

[ 905.786331] nvmet_rdma: SEND for CQE 0x00000000e3337f90 failed with status transport retry counter exceeded (12).
[ 905.832048] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
[ 905.839919] PGD 0 P4D 0
[ 905.842464] Oops: 0000 1 SMP NOPTI
[ 905.846144] CPU: 13 PID: 1557 Comm: kworker/13:1H Kdump: loaded Tainted: G OE --------- - - 4.18.0-304.el8.x86_64 #1
[ 905.872135] RIP: 0010:nvmet_rdma_error_comp+0x5/0x1b [nvmet_rdma]
[ 905.878259] Code: 19 4f c0 e8 89 b3 a5 f6 e9 5b e0 ff ff 0f b7 75 14 4c 89 ea 48 c7 c7 08 1a 4f c0 e8 71 b3 a5 f6 e9 4b e0 ff ff 0f 1f 44 00 00 <48> 8b 47 48 48 85 c0 74 08 48 89 c7 e9 98 bf 49 00 e9 c3 e3 ff ff
[ 905.897135] RSP: 0018:ffffab601c45fe28 EFLAGS: 00010246
[ 905.902387] RAX: 0000000000000065 RBX: ffff9e729ea2f800 RCX: 0000000000000000
[ 905.909558] RDX: 0000000000000000 RSI: ffff9e72df9567c8 RDI: 0000000000000000
[ 905.916731] RBP: ffff9e729ea2b400 R08: 000000000000074d R09: 0000000000000074
[ 905.923903] R10: 0000000000000000 R11: ffffab601c45fcc0 R12: 0000000000000010
[ 905.931074] R13: 0000000000000000 R14: 0000000000000010 R15: ffff9e729ea2f400
[ 905.938247] FS: 0000000000000000(0000) GS:ffff9e72df940000(0000) knlGS:0000000000000000
[ 905.938249] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 905.950067] nvmet_rdma: SEND for CQE 0x00000000c7356cca failed with status transport retry counter exceeded (12).
[ 905.961855] CR2: 0000000000000048 CR3: 000000678d010004 CR4: 00000000007706e0
[ 905.961855] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 905.961856] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 905.961857] PKRU: 55555554
[ 906.010315] Call Trace:
[ 906.012778] __ib_process_cq+0x89/0x170 [ib_core]
[ 906.017509] ib_cq_poll_work+0x26/0x80 [ib_core]
[ 906.022152] process_one_work+0x1a7/0x360
[ 906.026182] ? create_worker+0x1a0/0x1a0
[ 906.030123] worker_thread+0x30/0x390
[ 906.033802] ? create_worker+0x1a0/0x1a0
[ 906.037744] kthread+0x116/0x130
[ 906.040988] ? kthread_flush_work_fn+0x10/0x10
[ 906.045456] ret_from_fork+0x1f/0x40

Fixes: ca0f1a8055be2 ("nvmet-rdma: use new shared CQ mechanism")
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19 10:13:14 +02:00
Chaitanya Kulkarni
c98ecfb182 nvmet: fix inline bio check for bdev-ns
[ Upstream commit 608a969046e6e0567d05a166be66c77d2dd8220b ]

When handling rw commands, for inline bio case we only consider
transfer size. This works well when req->sg_cnt fits into the
req->inline_bvec, but it will result in the warning in
__bio_add_page() when req->sg_cnt > NVMET_MAX_INLINE_BVEC.

Consider an I/O size 32768 and first page is not aligned to the page
boundary, then I/O is split in following manner :-

[ 2206.256140] nvmet: sg->length 3440 sg->offset 656
[ 2206.256144] nvmet: sg->length 4096 sg->offset 0
[ 2206.256148] nvmet: sg->length 4096 sg->offset 0
[ 2206.256152] nvmet: sg->length 4096 sg->offset 0
[ 2206.256155] nvmet: sg->length 4096 sg->offset 0
[ 2206.256159] nvmet: sg->length 4096 sg->offset 0
[ 2206.256163] nvmet: sg->length 4096 sg->offset 0
[ 2206.256166] nvmet: sg->length 4096 sg->offset 0
[ 2206.256170] nvmet: sg->length 656 sg->offset 0

Now the req->transfer_size == NVMET_MAX_INLINE_DATA_LEN i.e. 32768, but
the req->sg_cnt is (9) > NVMET_MAX_INLINE_BIOVEC which is (8).
This will result in the following warning message :-

nvmet_bdev_execute_rw()
	bio_add_page()
		__bio_add_page()
			WARN_ON_ONCE(bio_full(bio, len));

This scenario is very hard to reproduce on the nvme-loop transport only
with rw commands issued with the passthru IOCTL interface from the host
application and the data buffer is allocated with the malloc() and not
the posix_memalign().

Fixes: 73383adfad24 ("nvmet: don't split large I/Os unconditionally")
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19 10:13:14 +02:00
Chaitanya Kulkarni
f8ae879b77 nvmet: add lba to sect conversion helpers
[ Upstream commit 193fcf371f9e3705c14a0bf1d4bfc44af0f7c124 ]

In this preparation patch, we add helpers to convert lbas to sectors &
sectors to lba. This is needed to eliminate code duplication in the ZBD
backend.

Use these helpers in the block device backend.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19 10:13:14 +02:00
Elad Grupi
b2c55f81c1 nvmet-tcp: fix a segmentation fault during io parsing error
[ Upstream commit bdaf13279192c60b2b1fc99badef53b494fec055 ]

In case there is an io that contains inline data and it goes to
parsing error flow, command response will free command and iov
before clearing the data on the socket buffer.
This will delay the command response until receive flow is completed.

Fixes: 872d26a391da ("nvmet-tcp: add NVMe over TCP target driver")
Signed-off-by: Elad Grupi <elad.grupi@dell.com>
Signed-off-by: Hou Pu <houpu.main@gmail.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-14 09:50:27 +02:00
Sagi Grimberg
60ade0d56b nvmet-tcp: fix incorrect locking in state_change sk callback
[ Upstream commit b5332a9f3f3d884a1b646ce155e664cc558c1722 ]

We are not changing anything in the TCP connection state so
we should not take a write_lock but rather a read lock.

This caused a deadlock when running nvmet-tcp and nvme-tcp
on the same system, where state_change callbacks on the
host and on the controller side have causal relationship
and made lockdep report on this with blktests:

================================
WARNING: inconsistent lock state
5.12.0-rc3 #1 Tainted: G          I
--------------------------------
inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-R} usage.
nvme/1324 [HC0[0]:SC0[0]:HE1:SE1] takes:
ffff888363151000 (clock-AF_INET){++-?}-{2:2}, at: nvme_tcp_state_change+0x21/0x150 [nvme_tcp]
{IN-SOFTIRQ-W} state was registered at:
  __lock_acquire+0x79b/0x18d0
  lock_acquire+0x1ca/0x480
  _raw_write_lock_bh+0x39/0x80
  nvmet_tcp_state_change+0x21/0x170 [nvmet_tcp]
  tcp_fin+0x2a8/0x780
  tcp_data_queue+0xf94/0x1f20
  tcp_rcv_established+0x6ba/0x1f00
  tcp_v4_do_rcv+0x502/0x760
  tcp_v4_rcv+0x257e/0x3430
  ip_protocol_deliver_rcu+0x69/0x6a0
  ip_local_deliver_finish+0x1e2/0x2f0
  ip_local_deliver+0x1a2/0x420
  ip_rcv+0x4fb/0x6b0
  __netif_receive_skb_one_core+0x162/0x1b0
  process_backlog+0x1ff/0x770
  __napi_poll.constprop.0+0xa9/0x5c0
  net_rx_action+0x7b3/0xb30
  __do_softirq+0x1f0/0x940
  do_softirq+0xa1/0xd0
  __local_bh_enable_ip+0xd8/0x100
  ip_finish_output2+0x6b7/0x18a0
  __ip_queue_xmit+0x706/0x1aa0
  __tcp_transmit_skb+0x2068/0x2e20
  tcp_write_xmit+0xc9e/0x2bb0
  __tcp_push_pending_frames+0x92/0x310
  inet_shutdown+0x158/0x300
  __nvme_tcp_stop_queue+0x36/0x270 [nvme_tcp]
  nvme_tcp_stop_queue+0x87/0xb0 [nvme_tcp]
  nvme_tcp_teardown_admin_queue+0x69/0xe0 [nvme_tcp]
  nvme_do_delete_ctrl+0x100/0x10c [nvme_core]
  nvme_sysfs_delete.cold+0x8/0xd [nvme_core]
  kernfs_fop_write_iter+0x2c7/0x460
  new_sync_write+0x36c/0x610
  vfs_write+0x5c0/0x870
  ksys_write+0xf9/0x1d0
  do_syscall_64+0x33/0x40
  entry_SYSCALL_64_after_hwframe+0x44/0xae
irq event stamp: 10687
hardirqs last  enabled at (10687): [<ffffffff9ec376bd>] _raw_spin_unlock_irqrestore+0x2d/0x40
hardirqs last disabled at (10686): [<ffffffff9ec374d8>] _raw_spin_lock_irqsave+0x68/0x90
softirqs last  enabled at (10684): [<ffffffff9f000608>] __do_softirq+0x608/0x940
softirqs last disabled at (10649): [<ffffffff9cdedd31>] do_softirq+0xa1/0xd0

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(clock-AF_INET);
  <Interrupt>
    lock(clock-AF_INET);

 *** DEADLOCK ***

5 locks held by nvme/1324:
 #0: ffff8884a01fe470 (sb_writers#4){.+.+}-{0:0}, at: ksys_write+0xf9/0x1d0
 #1: ffff8886e435c090 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x216/0x460
 #2: ffff888104d90c38 (kn->active#255){++++}-{0:0}, at: kernfs_remove_self+0x22d/0x330
 #3: ffff8884634538d0 (&queue->queue_lock){+.+.}-{3:3}, at: nvme_tcp_stop_queue+0x52/0xb0 [nvme_tcp]
 #4: ffff888363150d30 (sk_lock-AF_INET){+.+.}-{0:0}, at: inet_shutdown+0x59/0x300

stack backtrace:
CPU: 26 PID: 1324 Comm: nvme Tainted: G          I       5.12.0-rc3 #1
Hardware name: Dell Inc. PowerEdge R640/06NR82, BIOS 2.10.0 11/12/2020
Call Trace:
 dump_stack+0x93/0xc2
 mark_lock_irq.cold+0x2c/0xb3
 ? verify_lock_unused+0x390/0x390
 ? stack_trace_consume_entry+0x160/0x160
 ? lock_downgrade+0x100/0x100
 ? save_trace+0x88/0x5e0
 ? _raw_spin_unlock_irqrestore+0x2d/0x40
 mark_lock+0x530/0x1470
 ? mark_lock_irq+0x1d10/0x1d10
 ? enqueue_timer+0x660/0x660
 mark_usage+0x215/0x2a0
 __lock_acquire+0x79b/0x18d0
 ? tcp_schedule_loss_probe.part.0+0x38c/0x520
 lock_acquire+0x1ca/0x480
 ? nvme_tcp_state_change+0x21/0x150 [nvme_tcp]
 ? rcu_read_unlock+0x40/0x40
 ? tcp_mtu_probe+0x1ae0/0x1ae0
 ? kmalloc_reserve+0xa0/0xa0
 ? sysfs_file_ops+0x170/0x170
 _raw_read_lock+0x3d/0xa0
 ? nvme_tcp_state_change+0x21/0x150 [nvme_tcp]
 nvme_tcp_state_change+0x21/0x150 [nvme_tcp]
 ? sysfs_file_ops+0x170/0x170
 inet_shutdown+0x189/0x300
 __nvme_tcp_stop_queue+0x36/0x270 [nvme_tcp]
 nvme_tcp_stop_queue+0x87/0xb0 [nvme_tcp]
 nvme_tcp_teardown_admin_queue+0x69/0xe0 [nvme_tcp]
 nvme_do_delete_ctrl+0x100/0x10c [nvme_core]
 nvme_sysfs_delete.cold+0x8/0xd [nvme_core]
 kernfs_fop_write_iter+0x2c7/0x460
 new_sync_write+0x36c/0x610
 ? new_sync_read+0x600/0x600
 ? lock_acquire+0x1ca/0x480
 ? rcu_read_unlock+0x40/0x40
 ? lock_is_held_type+0x9a/0x110
 vfs_write+0x5c0/0x870
 ksys_write+0xf9/0x1d0
 ? __ia32_sys_read+0xa0/0xa0
 ? lockdep_hardirqs_on_prepare.part.0+0x198/0x340
 ? syscall_enter_from_user_mode+0x27/0x70
 do_syscall_64+0x33/0x40
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Fixes: 872d26a391da ("nvmet-tcp: add NVMe over TCP target driver")
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-14 09:50:25 +02:00
Hou Pu
9ec74da2ac nvmet: return proper error code from discovery ctrl
[ Upstream commit 79695dcd9ad4463a82def7f42960e6d7baa76f0b ]

Return NVME_SC_INVALID_FIELD from discovery controller like normal
controller when executing identify or get log page command.

Signed-off-by: Hou Pu <houpu.main@gmail.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-11 14:47:26 +02:00
Elad Grupi
fa068ee3f3 nvmet-tcp: fix kmap leak when data digest in use
[ Upstream commit bac04454ef9fada009f0572576837548b190bf94 ]

When data digest is enabled we should unmap pdu iovec before handling
the data digest pdu.

Signed-off-by: Elad Grupi <elad.grupi@dell.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-04-07 15:00:06 +02:00
Lv Yunlong
8f0534c96a nvme-rdma: Fix a use after free in nvmet_rdma_write_data_done
[ Upstream commit abec6561fc4e0fbb19591a0b35676d8c783b5493 ]

In nvmet_rdma_write_data_done, rsp is recoverd by wc->wr_cqe and freed by
nvmet_rdma_release_rsp(). But after that, pr_info() used the freed
chunk's member object and could leak the freed chunk address with
wc->wr_cqe by computing the offset.

Signed-off-by: Lv Yunlong <lyl2019@mail.ustc.edu.cn>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-30 14:31:52 +02:00
Sagi Grimberg
628f39a57a nvmet: don't check iosqes,iocqes for discovery controllers
commit d218a8a3003e84ab136e69a4e30dd4ec7dab2d22 upstream.

From the base spec, Figure 78:

  "Controller Configuration, these fields are defined as parameters to
   configure an "I/O Controller (IOC)" and not to configure a "Discovery
   Controller (DC).

   ...
   If the controller does not support I/O queues, then this field shall
   be read-only with a value of 0h

Just perform this check for I/O controllers.

Fixes: a07b4970f464 ("nvmet: add a generic NVMe target")
Reported-by: Belanger, Martin <Martin.Belanger@dell.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-25 09:04:07 +01:00
Chaitanya Kulkarni
0bbbd44ba1 nvmet: set status to 0 in case for invalid nsid
[ Upstream commit 40244ad36bcfb796a6bb9e95bdcbf8ddf3134509 ]

For unallocated namespace in nvmet_execute_identify_ns() don't set the
status to NVME_SC_INVALID_NS, set it to zero.

Fixes: bffcd507780e ("nvmet: set right status on error in id-ns handler")
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-04 11:37:50 +01:00
Chaitanya Kulkarni
6c32db01dd nvmet: remove extra variable in identify ns
[ Upstream commit 3c7b224f1956ed232b24ed2eb2c54e4476c6acb2 ]

We remove the extra local variable struct nvmet_ns in
nvmet_execute_identify_ns() since req already has ns member that can be
reused, this also eliminates the explicit call to nvmet_put_namespace()
which is already present in the request completion path.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-04 11:37:50 +01:00
Sagi Grimberg
5f8ab7f8fe nvmet-tcp: fix potential race of tcp socket closing accept_work
[ Upstream commit 0fbcfb089a3f2f2a731d01f0aec8f7697a849c28 ]

When we accept a TCP connection and allocate an nvmet-tcp queue we should
make sure not to fully establish it or reference it as the connection may
be already closing, which triggers queue release work, which does not
fence against queue establishment.

In order to address such a race, we make sure to check the sk_state and
contain the queue reference to be done underneath the sk_callback_lock
such that the queue release work correctly fences against it.

Fixes: 872d26a391da ("nvmet-tcp: add NVMe over TCP target driver")
Reported-by: Elad Grupi <elad.grupi@dell.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-04 11:37:50 +01:00
Sagi Grimberg
91edfca6f8 nvmet-tcp: fix receive data digest calculation for multiple h2cdata PDUs
[ Upstream commit fda871c0ba5d2eed2cd1c881573168129da70058 ]

When a host sends multiple h2cdata PDUs for a single command, we
should verify the data digest calculation per PDU and not
per command.

Fixes: 872d26a391da ("nvmet-tcp: add NVMe over TCP target driver")
Reported-by: Narayan Ayalasomayajula <Narayan.Ayalasomayajula@wdc.com>
Tested-by: Narayan Ayalasomayajula <Narayan.Ayalasomayajula@wdc.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-04 11:37:49 +01:00
Sagi Grimberg
c1debbaf15 nvmet-tcp: fix out-of-bounds access when receiving multiple h2cdata PDUs
[ Upstream commit cb8563f5c735a042ea2dd7df1ad55ae06d63ffeb ]

When the host sends multiple h2cdata PDUs, we keep track on
the receive progress and calculate the scatterlist index and
offsets.

The issue is that sg_offset should only be kept for the first
iov entry we map in the iovec as this is the difference between
our cursor and the sg entry offset itself.

In addition, the sg index was calculated wrong because we should
not round up when dividing the command byte offset with PAG_SIZE.

Fixes: 872d26a391da ("nvmet-tcp: add NVMe over TCP target driver")
Reported-by: Narayan Ayalasomayajula <Narayan.Ayalasomayajula@wdc.com>
Tested-by: Narayan Ayalasomayajula <Narayan.Ayalasomayajula@wdc.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-02-10 09:29:15 +01:00
Chaitanya Kulkarni
f243b5dd98 nvmet: set right status on error in id-ns handler
[ Upstream commit bffcd507780ea614b5543c66f2e37ce0d55cd449 ]

The function nvmet_execute_identify_ns() doesn't set the status if call
to nvmet_find_namespace() fails. In that case we set the status of the
request to the value return by the nvmet_copy_sgl().

Set the status to NVME_SC_INVALID_NS and adjust the code such that
request will have the right status on nvmet_find_namespace() failure.

Without this patch :-
NVME Identify Namespace 3:
nsze    : 0
ncap    : 0
nuse    : 0
nsfeat  : 0
nlbaf   : 0
flbas   : 0
mc      : 0
dpc     : 0
dps     : 0
nmic    : 0
rescap  : 0
fpi     : 0
dlfeat  : 0
nawun   : 0
nawupf  : 0
nacwu   : 0
nabsn   : 0
nabo    : 0
nabspf  : 0
noiob   : 0
nvmcap  : 0
mssrl   : 0
mcl     : 0
msrc    : 0
nsattr	: 0
nvmsetid: 0
anagrpid: 0
endgid  : 0
nguid   : 00000000000000000000000000000000
eui64   : 0000000000000000
lbaf  0 : ms:0   lbads:0  rp:0 (in use)

With this patch-series :-
feb3b88b501e (HEAD -> nvme-5.11) nvmet: remove extra variable in identify ns
6302aa67210a nvmet: remove extra variable in id-desclist
ed57951da453 nvmet: remove extra variable in smart log nsid
be384b8c24dc nvmet: set right status on error in id-ns handler

NVMe status: INVALID_NS: The namespace or the format of that namespace is invalid(0xb)

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-02-07 15:37:16 +01:00
Israel Rukshin
a60c7aaacc nvmet-rdma: Fix NULL deref when setting pi_enable and traddr INADDR_ANY
commit 7a84665619bb5da8c8b6517157875a1fd7632014 upstream.

When setting port traddr to INADDR_ANY, the listening cm_id->device
is NULL. The associate IB device is known only when a connect request
event arrives, so checking T10-PI device capability should be done
at this stage.

Fixes: b09160c3996c ("nvmet-rdma: add metadata/T10-PI support")
Signed-off-by: Israel Rukshin <israelr@nvidia.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-19 18:27:30 +01:00
Israel Rukshin
8e57baf3cd nvmet-rdma: Fix list_del corruption on queue establishment failure
[ Upstream commit 9ceb7863537748c67fa43ac4f2f565819bbd36e4 ]

When a queue is in NVMET_RDMA_Q_CONNECTING state, it may has some
requests at rsp_wait_list. In case a disconnect occurs at this
state, no one will empty this list and will return the requests to
free_rsps list. Normally nvmet_rdma_queue_established() free those
requests after moving the queue to NVMET_RDMA_Q_LIVE state, but in
this case __nvmet_rdma_queue_disconnect() is called before. The
crash happens at nvmet_rdma_free_rsps() when calling
list_del(&rsp->free_list), because the request exists only at
the wait list. To fix the issue, simply clear rsp_wait_list when
destroying the queue.

Signed-off-by: Israel Rukshin <israelr@nvidia.com>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-01-19 18:27:26 +01:00
Christoph Hellwig
404fa09374 RDMA/core: remove use of dma_virt_ops
[ Upstream commit 5a7a9e038b032137ae9c45d5429f18a2ffdf7d42 ]

Use the ib_dma_* helpers to skip the DMA translation instead.  This
removes the last user if dma_virt_ops and keeps the weird layering
violation inside the RDMA core instead of burderning the DMA mapping
subsystems with it.  This also means the software RDMA drivers now don't
have to mess with DMA parameters that are not relevant to them at all, and
that in the future we can use PCI P2P transfers even for software RDMA, as
there is no first fake layer of DMA mapping that the P2P DMA support.

Link: https://lore.kernel.org/r/20201106181941.1878556-8-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-01-09 13:46:24 +01:00
Chaitanya Kulkarni
3c3751f2da nvmet: fix a NULL pointer dereference when tracing the flush command
When target side trace in turned on and flush command is issued from the
host it results in the following Oops.

[  856.789724] BUG: kernel NULL pointer dereference, address: 0000000000000068
[  856.790686] #PF: supervisor read access in kernel mode
[  856.791262] #PF: error_code(0x0000) - not-present page
[  856.791863] PGD 6d7110067 P4D 6d7110067 PUD 66f0ad067 PMD 0
[  856.792527] Oops: 0000 [#1] SMP NOPTI
[  856.792950] CPU: 15 PID: 7034 Comm: nvme Tainted: G           OE     5.9.0nvme-5.9+ #71
[  856.793790] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e3214
[  856.794956] RIP: 0010:trace_event_raw_event_nvmet_req_init+0x13e/0x170 [nvmet]
[  856.795734] Code: 41 5c 41 5d c3 31 d2 31 f6 e8 4e 9b b8 e0 e9 0e ff ff ff 49 8b 55 00 48 8b 38 8b 0
[  856.797740] RSP: 0018:ffffc90001be3a60 EFLAGS: 00010246
[  856.798375] RAX: 0000000000000000 RBX: ffff8887e7d2c01c RCX: 0000000000000000
[  856.799234] RDX: 0000000000000020 RSI: 0000000057e70ea2 RDI: ffff8887e7d2c034
[  856.800088] RBP: ffff88869f710578 R08: ffff888807500d40 R09: 00000000fffffffe
[  856.800951] R10: 0000000064c66670 R11: 00000000ef955201 R12: ffff8887e7d2c034
[  856.801807] R13: ffff88869f7105c8 R14: 0000000000000040 R15: ffff88869f710440
[  856.802667] FS:  00007f6a22bd8780(0000) GS:ffff888813a00000(0000) knlGS:0000000000000000
[  856.803635] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  856.804367] CR2: 0000000000000068 CR3: 00000006d73e0000 CR4: 00000000003506e0
[  856.805283] Call Trace:
[  856.805613]  nvmet_req_init+0x27c/0x480 [nvmet]
[  856.806200]  nvme_loop_queue_rq+0xcb/0x1d0 [nvme_loop]
[  856.806862]  blk_mq_dispatch_rq_list+0x123/0x7b0
[  856.807459]  ? kvm_sched_clock_read+0x14/0x30
[  856.808025]  __blk_mq_sched_dispatch_requests+0xc7/0x170
[  856.808708]  blk_mq_sched_dispatch_requests+0x30/0x60
[  856.809372]  __blk_mq_run_hw_queue+0x70/0x100
[  856.809935]  __blk_mq_delay_run_hw_queue+0x156/0x170
[  856.810574]  blk_mq_run_hw_queue+0x86/0xe0
[  856.811104]  blk_mq_sched_insert_request+0xef/0x160
[  856.811733]  blk_execute_rq+0x69/0xc0
[  856.812212]  ? blk_mq_rq_ctx_init+0xd0/0x230
[  856.812784]  nvme_execute_passthru_rq+0x57/0x130 [nvme_core]
[  856.813461]  nvme_submit_user_cmd+0xeb/0x300 [nvme_core]
[  856.814099]  nvme_user_cmd.isra.82+0x11e/0x1a0 [nvme_core]
[  856.814752]  blkdev_ioctl+0x1dc/0x2c0
[  856.815197]  block_ioctl+0x3f/0x50
[  856.815606]  __x64_sys_ioctl+0x84/0xc0
[  856.816074]  do_syscall_64+0x33/0x40
[  856.816533]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  856.817168] RIP: 0033:0x7f6a222ed107
[  856.817617] Code: 44 00 00 48 8b 05 81 cd 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 8
[  856.819901] RSP: 002b:00007ffca848f058 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[  856.820846] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f6a222ed107
[  856.821726] RDX: 00007ffca848f060 RSI: 00000000c0484e43 RDI: 0000000000000003
[  856.822603] RBP: 0000000000000003 R08: 000000000000003f R09: 0000000000000005
[  856.823478] R10: 00007ffca848ece0 R11: 0000000000000202 R12: 00007ffca84912d3
[  856.824359] R13: 00007ffca848f4d0 R14: 0000000000000002 R15: 000000000067e900
[  856.825236] Modules linked in: nvme_loop(OE) nvmet(OE) nvme_fabrics(OE) null_blk nvme(OE) nvme_corel

Move the nvmet_req_init() tracepoint after we parse the command in
nvmet_req_init() so that we can get rid of the duplicate
nvmet_find_namespace() call.
Rename __assign_disk_name() ->  __assign_req_name(). Now that we call
tracepoint after parsing the command simplify the newly added
__assign_req_name() which fixes this bug.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-27 10:02:50 +01:00
Chaitanya Kulkarni
150dfb6c83 nvmet: don't use BLK_MQ_REQ_NOWAIT for passthru
By default, we set the passthru request allocation flag such that it
returns the error in the following code path and we fail the I/O when
BLK_MQ_REQ_NOWAIT is used for request allocation :-

nvme_alloc_request()
 blk_mq_alloc_request()
  blk_mq_queue_enter()
   if (flag & BLK_MQ_REQ_NOWAIT)
        return -EBUSY; <-- return if busy.

On some controllers using BLK_MQ_REQ_NOWAIT ends up in I/O error where
the controller is perfectly healthy and not in a degraded state.

Block layer request allocation does allow us to wait instead of
immediately returning the error when we BLK_MQ_REQ_NOWAIT flag is not
used. This has shown to fix the I/O error problem reported under
heavy random write workload.

Remove the BLK_MQ_REQ_NOWAIT parameter for passthru request allocation
which resolves this issue.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-22 15:28:02 +02:00
Logan Gunthorpe
5e063101ff nvmet: cleanup nvmet_passthru_map_sg()
Clean up some confusing elements of nvmet_passthru_map_sg() by returning
early if the request is greater than the maximum bio size. This allows
us to drop the sg_cnt variable.

This should not result in any functional change but makes the code
clearer and more understandable. The original code allocated a truncated
bio then would return EINVAL when bio_add_pc_page() filled that bio. The
new code just returns EINVAL early if this would happen.

Fixes: c1fef73f793b ("nvmet: add passthru code to process commands")
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Suggested-by: Douglas Gilbert <dgilbert@interlog.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-22 15:28:02 +02:00
Logan Gunthorpe
df06047d54 nvmet: limit passthru MTDS by BIO_MAX_PAGES
nvmet_passthru_map_sg() only supports mapping a single BIO, not a chain
so the effective maximum transfer should also be limitted by
BIO_MAX_PAGES (presently this works out to 1MB).

For PCI passthru devices the max_sectors would typically be more
limitting than BIO_MAX_PAGES, but this may not be true for all passthru
devices.

Fixes: c1fef73f793b ("nvmet: add passthru code to process commands")
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-22 15:27:57 +02:00
zhenwei pi
85bd23f3dc nvmet: fix uninitialized work for zero kato
When connecting a controller with a zero kato value using the following
command line

   nvme connect -t tcp -n NQN -a ADDR -s PORT --keep-alive-tmo=0

the warning below can be reproduced:

WARNING: CPU: 1 PID: 241 at kernel/workqueue.c:1627 __queue_delayed_work+0x6d/0x90
with trace:
  mod_delayed_work_on+0x59/0x90
  nvmet_update_cc+0xee/0x100 [nvmet]
  nvmet_execute_prop_set+0x72/0x80 [nvmet]
  nvmet_tcp_try_recv_pdu+0x2f7/0x770 [nvmet_tcp]
  nvmet_tcp_io_work+0x63f/0xb2d [nvmet_tcp]
  ...

This is caused by queuing up an uninitialized work.  Althrough the
keep-alive timer is disabled during allocating the controller (fixed in
0d3b6a8d213a), ka_work still has a chance to run (called by
nvmet_start_ctrl).

Fixes: 0d3b6a8d213a ("nvmet: Disable keep-alive timer when kato is cleared to 0h")
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-22 15:27:14 +02:00
Chaitanya Kulkarni
1401fcc4e3 nvme-loop: don't put ctrl on nvme_init_ctrl error
The function nvme_init_ctrl() gets the ctrl reference & when it fails it
does put the ctrl reference in the error unwind code.

When creating loop ctrl in nvme_loop_create_ctrl() if nvme_init_ctrl()
returns non zero (i.e. error) value it jumps to the "out_put_ctrl" label
which calls nvme_put_ctrl(), that will lead to douple ctrl put in error
unwind path.

Update nvme_loop_create_ctrl() such that this patch removes the
"out_put_ctrl" label, add a new "out" label after nvme_put_ctrl() in
error unwind path and jump to newly added label when nvme_init_ctrl()
call retuns an error.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-07 07:56:16 +02:00
James Smart
ddd3d10517 nvmet-fc: fix missing check for no hostport struct
A hostport port pointer is allowed to be NULL as it is not allocated if
the lldd does not support the new interfaces for NVME LS request support.
The hostport free routine validates the handle but forgot to validate the
hostport pointer.

Validate the hostport pointer before using it to validate the handle.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-09-27 09:14:19 +02:00
Chaitanya Kulkarni
5b3356d9da nvmet: add passthru ZNS support
In the default passthru implementation NVMeOF target passthru ctrl is
not capable of handling Zoned Namespaces (ZNS).

Update the nvmet_parse_pasthru_admin_cmd() to allow
NVME_ID_CNS_CS_CTRL/NVME_CIS_ZNS and NVME_ID_CNS_CS_NS/NVME_CIS_ZNS.

With this addition NVMeOF Passthru allows Zoned Namespaces.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-09-27 09:14:19 +02:00
Amit Engel
4e683c48db nvmet: handle keep-alive timer when kato is modified by a set features cmd
A user may modify the kato by a set features cmd.  To properly deal
with races or a kato value of 0 (no keep alive enabled) change
nvmet_set_feat_kato to first disable the timer, then set the value
and then re-enable the timer.

Signed-off-by: Amit Engel <amit.engel@dell.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-09-27 09:14:19 +02:00
Mark Wunderlich
f7790e5de9 nvmet-tcp: have queue io_work context run on sock incoming cpu
No real good need to spread queues artificially. Usually the
target will serve multiple hosts, and it's better to run on the socket
incoming cpu for better affinitization rather than spread queues on all
online cpus.

We rely on RSS to spread the work around sufficiently.

Signed-off-by: Mark Wunderlich <mark.wunderlich@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-09-27 09:14:18 +02:00
Chaitanya Kulkarni
b2702aaaa4 nvme: lift the file open code from nvme_ctrl_get_by_path
Lift opening the file open/close code from nvme_ctrl_get_by_path into
the caller, just keeping a simple nvme_ctrl_from_file() helper.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
[hch: refactored a bit, split the bug fixes into a separate prep patch]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
2020-09-27 09:14:18 +02:00
Linus Torvalds
c37b718922 block-5.9-2020-09-22
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl9qLq8QHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpqrCD/4hjvDuZIDMZILnOhHtZPB+q1bnf48dcB2v
 WQlDJAZVDKIdBy49gN2mraLwbeKvnzTv25kfZkXZPL4F32TuXK8E57tHPCYEq5li
 qb6y3o0Y1y0p78PtYittfWeUWaYkT2v91QQjLc6Vyh8swL15XDQ37lBw7qqtNUhw
 WMTS1Q2bw0wjltRAC5XSTD73PwcjFMYhE/1YBWE4vckaB1K+kLN6RRaVZ03unC/U
 zhSd2WqZyHwaEsl66Vtl7ty+SMoqahfXNvBcvvLVY6mD2U0hLWCBlnWY5SjYmzLZ
 3lVqmiL+diaAoDRroQnpFkAJSRnnWv3g3gWbygbSKJScFGKh15k7Px4ztWr233Xb
 KZtoZN826PhSkujIB8wKaFrrx3Zz00a9flqvum2ejOQAP5FiS2QRLlaZuf2U2xqm
 5AhZ7ul1qm9kik0ZULPBY1myK1Y8sSoKSnu3WAVUPo974fAPTvhWUpO+i6SssYWu
 oI2VUK9BvcgP1MjMms1EYEpY8rg8G+TGzN+P6jJcBirAAecdpXhLaLwOdEQC9De/
 P/OfyHg6lIgs9PP0NOp8BeUDQe9bDW8gNxv57R57+N7ZIKP0443LwSaHpmFInBC+
 lxAGcghsl++TZZ+sCDM8Lkw5IZWcc/czewHzVFVMjpnivt0kuQrndFyOxRdHjggy
 4Vo1Sa1Ahg==
 =zT03
 -----END PGP SIGNATURE-----

Merge tag 'block-5.9-2020-09-22' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
 "A few NVMe fixes, and a dasd write zero fix"

* tag 'block-5.9-2020-09-22' of git://git.kernel.dk/linux-block:
  nvmet: get transport reference for passthru ctrl
  nvme-core: get/put ctrl and transport module in nvme_dev_open/release()
  nvme-tcp: fix kconfig dependency warning when !CRYPTO
  nvme-pci: disable the write zeros command for Intel 600P/P3100
  s390/dasd: Fix zero write for FBA devices
2020-09-22 14:31:38 -07:00
Christoph Hellwig
3a6b076168 nvmet: get transport reference for passthru ctrl
Grab a reference to the transport driver to ensure it can't be unloaded
while a passthrough controller is active.

Fixes: c1fef73f793b ("nvmet: add passthru code to process commands")
Reported-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
2020-09-17 10:36:25 +02:00
Linus Torvalds
8075fc3b11 block-5.9-2020-09-04
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl9SWMMQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgphIcD/488Q7rXb2eABp1fGs4gu+VFOCLogeHL8xh
 5xHNiOPnZG2SGr8DQJY/7EX2kE65rbZi8/g+2N6anovI2nduRu0tzSra7fRgzbys
 ZQC1CUel0MbCd7e8OaEfg108PSHNxBf1PqDcE7zCeyZ0DIs3s4vK/bQtmzzxZHgU
 wNw4OIP9gOdqgjowb6GGHo9SLN4GT8rZ0jZVPLa7GwFsvxCTwv/7lHO8rqeSeuCu
 5H6i3M/rSbtTXPLHf4Fy97x9WmBmdgu4epTXiwbOxaagpx3lm/7n1P3CpavR+Gcq
 O5VGIIzazxPwnZl9y/6rZFLGYqcj38RxUvC8KtK6tDXxEu/BDJa1d6hXI03SyXAO
 ZAiEpQTKOkJE3R8ewUDrXLvl3p6FvwZVZ5SIFwUb+0JFrVQYwrgfoRJtzb5SIUan
 T9/bSYge7lFRI92FZRIqhvk8rsEBRdu7N/rQCyGf6GuZ0vRXWRAqN7T02iDn3czX
 pdGAepU5ymw8CwyUiNNnkY0DUaQLBIO9tCA9epxLwdroQ95vJtMPRBX1STQ65GVk
 XvMFAJqDAehQ/nP5xO60cWGZHyL7L/ccpofZlA/ytgAIZRa85GvhrdVy7yc6DKto
 wu6h2tkX9+ldoUjVbn/60T+Ft3QUTlfAuDfherkNoFNB/G5i1pzOHbwvL7B3czr3
 ZMjoNiOIqA==
 =8fvz
 -----END PGP SIGNATURE-----

Merge tag 'block-5.9-2020-09-04' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
 "A bit larger than usual this week, mostly due to the NVMe fixes
  arriving late for -rc3 and hence didn't make last weeks pull request.

   - NVMe:
        - instance leak and io boundary fixes from Keith
        - fc locking fix from Christophe
        - various tcp/rdma reset during traffic fixes from Sagi
        - pci use-after-free fix from Tong
        - tcp target null deref fix from Ziye

   - Locking fix for partition removal (Christoph)

   - Ensure bdi->io_pages is always set (me)

   - Fixup for hd struct reference (Ming)

   - Fix for zero length bvecs (Ming)

   - Two small blk-iocost fixes (Tejun)"

* tag 'block-5.9-2020-09-04' of git://git.kernel.dk/linux-block:
  block: allow for_each_bvec to support zero len bvec
  blk-stat: make q->stats->lock irqsafe
  blk-iocost: ioc_pd_free() shouldn't assume irq disabled
  block: fix locking in bdev_del_partition
  block: release disk reference in hd_struct_free_work
  block: ensure bdi->io_pages is always initialized
  nvme-pci: cancel nvme device request before disabling
  nvme: only use power of two io boundaries
  nvme: fix controller instance leak
  nvmet-fc: Fix a missed _irqsave version of spin_lock in 'nvmet_fc_fod_op_done()'
  nvme: Fix NULL dereference for pci nvme controllers
  nvme-rdma: fix reset hang if controller died in the middle of a reset
  nvme-rdma: fix timeout handler
  nvme-rdma: serialize controller teardown sequences
  nvme-tcp: fix reset hang if controller died in the middle of a reset
  nvme-tcp: fix timeout handler
  nvme-tcp: serialize controller teardown sequences
  nvme: have nvme_wait_freeze_timeout return if it timed out
  nvme-fabrics: don't check state NVME_CTRL_NEW for request acceptance
  nvmet-tcp: Fix NULL dereference when a connect data comes in h2cdata pdu
2020-09-04 13:04:51 -07:00
Christophe JAILLET
70e37988db nvmet-fc: Fix a missed _irqsave version of spin_lock in 'nvmet_fc_fod_op_done()'
The way 'spin_lock()' and 'spin_lock_irqsave()' are used is not consistent
in this function.

Use 'spin_lock_irqsave()' also here, as there is no guarantee that
interruptions are disabled at that point, according to surrounding code.

Fixes: a97ec51b37ef ("nvmet_fc: Rework target side abort handling")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2020-08-28 16:43:57 -07:00