23121 Commits

Author SHA1 Message Date
Mike Christie
f24ee7391a scsi: core: Fix passthrough retry counter handling
commit fac8e558da9485e13a0ae0488aa0b8a8c307cd34 upstream.

Passthrough users will set the scsi_cmnd->allowed value and were expecting
up to $allowed retries. The problem is that before:

commit 6aded12b10e0 ("scsi: core: Remove struct scsi_request")

we used to set the retries on the scsi_request then copy them over to
scsi_cmnd->allowed in scsi_setup_scsi_cmnd. With that patch we now set
scsi_cmnd->allowed to 0 in scsi_prepare_cmd and overwrite what the
passthrough user set.

This moves the allowed initialization to after the blk_rq_is_passthrough()
check so it's only done for the non-passthrough path where the ULD
init_command will normally set an allowed value it prefers.

Link: https://lore.kernel.org/r/20220812011206.9157-1-michael.christie@oracle.com
Fixes: 6aded12b10e0 ("scsi: core: Remove struct scsi_request")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-31 17:18:20 +02:00
Saurabh Sengar
828f57ac75 scsi: storvsc: Remove WQ_MEM_RECLAIM from storvsc_error_wq
commit d957e7ffb2c72410bcc1a514153a46719255a5da upstream.

storvsc_error_wq workqueue should not be marked as WQ_MEM_RECLAIM as it
doesn't need to make forward progress under memory pressure.  Marking this
workqueue as WQ_MEM_RECLAIM may cause deadlock while flushing a
non-WQ_MEM_RECLAIM workqueue.  In the current state it causes the following
warning:

[   14.506347] ------------[ cut here ]------------
[   14.506354] workqueue: WQ_MEM_RECLAIM storvsc_error_wq_0:storvsc_remove_lun is flushing !WQ_MEM_RECLAIM events_freezable_power_:disk_events_workfn
[   14.506360] WARNING: CPU: 0 PID: 8 at <-snip->kernel/workqueue.c:2623 check_flush_dependency+0xb5/0x130
[   14.506390] CPU: 0 PID: 8 Comm: kworker/u4:0 Not tainted 5.4.0-1086-azure #91~18.04.1-Ubuntu
[   14.506391] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 05/09/2022
[   14.506393] Workqueue: storvsc_error_wq_0 storvsc_remove_lun
[   14.506395] RIP: 0010:check_flush_dependency+0xb5/0x130
		<-snip->
[   14.506408] Call Trace:
[   14.506412]  __flush_work+0xf1/0x1c0
[   14.506414]  __cancel_work_timer+0x12f/0x1b0
[   14.506417]  ? kernfs_put+0xf0/0x190
[   14.506418]  cancel_delayed_work_sync+0x13/0x20
[   14.506420]  disk_block_events+0x78/0x80
[   14.506421]  del_gendisk+0x3d/0x2f0
[   14.506423]  sr_remove+0x28/0x70
[   14.506427]  device_release_driver_internal+0xef/0x1c0
[   14.506428]  device_release_driver+0x12/0x20
[   14.506429]  bus_remove_device+0xe1/0x150
[   14.506431]  device_del+0x167/0x380
[   14.506432]  __scsi_remove_device+0x11d/0x150
[   14.506433]  scsi_remove_device+0x26/0x40
[   14.506434]  storvsc_remove_lun+0x40/0x60
[   14.506436]  process_one_work+0x209/0x400
[   14.506437]  worker_thread+0x34/0x400
[   14.506439]  kthread+0x121/0x140
[   14.506440]  ? process_one_work+0x400/0x400
[   14.506441]  ? kthread_park+0x90/0x90
[   14.506443]  ret_from_fork+0x35/0x40
[   14.506445] ---[ end trace 2d9633159fdc6ee7 ]---

Link: https://lore.kernel.org/r/1659628534-17539-1-git-send-email-ssengar@linux.microsoft.com
Fixes: 436ad9413353 ("scsi: storvsc: Allow only one remove lun work item to be issued per lun")
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-31 17:18:20 +02:00
James Smart
4eb7a1beff scsi: lpfc: Fix possible memory leak when failing to issue CMF WQE
[ Upstream commit 2f67dc7970bce3529edce93a0a14234d88b3fcd5 ]

There is no corresponding free routine if lpfc_sli4_issue_wqe fails to
issue the CMF WQE in lpfc_issue_cmf_sync_wqe.

If ret_val is non-zero, then free the iocbq request structure.

Link: https://lore.kernel.org/r/20220701211425.2708-6-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-25 11:45:43 +02:00
James Smart
2d544e9d19 scsi: lpfc: Prevent buffer overflow crashes in debugfs with malformed user input
[ Upstream commit f8191d40aa612981ce897e66cda6a88db8df17bb ]

Malformed user input to debugfs results in buffer overflow crashes.  Adapt
input string lengths to fit within internal buffers, leaving space for NULL
terminators.

Link: https://lore.kernel.org/r/20220701211425.2708-3-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-25 11:45:43 +02:00
Mike Christie
0483ffc02e scsi: iscsi: Fix HW conn removal use after free
[ Upstream commit c577ab7ba5f3bf9062db8a58b6e89d4fe370447e ]

If qla4xxx doesn't remove the connection before the session, the iSCSI
class tries to remove the connection for it. We were doing a
iscsi_put_conn() in the iter function which is not needed and will result
in a use after free because iscsi_remove_conn() will free the connection.

Link: https://lore.kernel.org/r/20220616222738.5722-2-michael.christie@oracle.com
Tested-by: Nilesh Javali <njavali@marvell.com>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Reviewed-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-25 11:45:42 +02:00
Arun Easi
b76d3c729c scsi: qla2xxx: Fix losing FCP-2 targets during port perturbation tests
commit 58d1c124cd79ea686b512043c5bd515590b2ed95 upstream.

When a mix of FCP-2 (tape) and non-FCP-2 targets are present, FCP-2 target
state was incorrectly transitioned when both of the targets were gone. Fix
this by ignoring state transition for FCP-2 targets.

Link: https://lore.kernel.org/r/20220616053508.27186-7-njavali@marvell.com
Fixes: 44c57f205876 ("scsi: qla2xxx: Changes to support FCP2 Target")
Cc: stable@vger.kernel.org
Signed-off-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-17 15:16:05 +02:00
Arun Easi
3267770578 scsi: qla2xxx: Fix losing target when it reappears during delete
commit 118b0c863c8f5629cc5271fc24d72d926e0715d9 upstream.

FC target disappeared during port perturbation tests due to a race that
tramples target state.  Fix the issue by adding state checks before
proceeding.

Link: https://lore.kernel.org/r/20220616053508.27186-8-njavali@marvell.com
Fixes: 44c57f205876 ("scsi: qla2xxx: Changes to support FCP2 Target")
Cc: stable@vger.kernel.org
Signed-off-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-17 15:16:05 +02:00
Arun Easi
11e9db717b scsi: qla2xxx: Fix losing FCP-2 targets on long port disable with I/Os
commit 2416ccd3815ba1613e10a6da0a24ef21acfe5633 upstream.

FCP-2 devices were not coming back online once they were lost, login
retries exhausted, and then came back up.  Fix this by accepting RSCN when
the device is not online.

Link: https://lore.kernel.org/r/20220616053508.27186-10-njavali@marvell.com
Fixes: 44c57f205876 ("scsi: qla2xxx: Changes to support FCP2 Target")
Cc: stable@vger.kernel.org
Signed-off-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-17 15:16:04 +02:00
Quinn Tran
dc07af0e7c scsi: qla2xxx: Wind down adapter after PCIe error
commit d3117c83ba316b3200d9f2fe900f2b9a5525a25c upstream.

Put adapter into a wind down state if OS does not make any attempt to
recover the adapter after PCIe error.

Link: https://lore.kernel.org/r/20220616053508.27186-4-njavali@marvell.com
Cc: stable@vger.kernel.org
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-17 15:16:04 +02:00
Quinn Tran
c3a824d061 scsi: qla2xxx: Fix erroneous mailbox timeout after PCI error injection
commit f260694e6463b63ae550aad25ddefe94cb1904da upstream.

Clear wait for mailbox interrupt flag to prevent stale mailbox:

Feb 22 05:22:56 ltcden4-lp7 kernel: qla2xxx [0135:90:00.1]-500a:4: LOOP UP detected (16 Gbps).
Feb 22 05:22:59 ltcden4-lp7 kernel: qla2xxx [0135:90:00.1]-d04c:4: MBX Command timeout for cmd 69, ...

To fix the issue, driver needs to clear the MBX_INTR_WAIT flag on purging
the mailbox. When the stale mailbox completion does arrive, it will be
dropped.

Link: https://lore.kernel.org/r/20220616053508.27186-11-njavali@marvell.com
Fixes: b6faaaf796d7 ("scsi: qla2xxx: Serialize mailbox request")
Cc: Naresh Bannoth <nbannoth@in.ibm.com>
Cc: Kyle Mahlkuch <Kyle.Mahlkuch@ibm.com>
Cc: stable@vger.kernel.org
Reported-by: Naresh Bannoth <nbannoth@in.ibm.com>
Tested-by: Naresh Bannoth <nbannoth@in.ibm.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-17 15:16:04 +02:00
Arun Easi
79b2ac420c scsi: qla2xxx: Fix excessive I/O error messages by default
commit bff4873c709085e09d0ffae0c25b8e65256e3205 upstream.

Disable printing I/O error messages by default.  The messages will be
printed only when logging was enabled.

Link: https://lore.kernel.org/r/20220616053508.27186-2-njavali@marvell.com
Fixes: 8e2d81c6b5be ("scsi: qla2xxx: Fix excessive messages during device logout")
Cc: stable@vger.kernel.org
Signed-off-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-17 15:16:04 +02:00
Arun Easi
7dcd49c42b scsi: qla2xxx: Fix crash due to stale SRB access around I/O timeouts
commit c39587bc0abaf16593f7abcdf8aeec3c038c7d52 upstream.

Ensure SRB is returned during I/O timeout error escalation. If that is not
possible fail the escalation path.

Following crash stack was seen:

BUG: unable to handle kernel paging request at 0000002f56aa90f8
IP: qla_chk_edif_rx_sa_delete_pending+0x14/0x30 [qla2xxx]
Call Trace:
 ? qla2x00_status_entry+0x19f/0x1c50 [qla2xxx]
 ? qla2x00_start_sp+0x116/0x1170 [qla2xxx]
 ? dma_pool_alloc+0x1d6/0x210
 ? mempool_alloc+0x54/0x130
 ? qla24xx_process_response_queue+0x548/0x12b0 [qla2xxx]
 ? qla_do_work+0x2d/0x40 [qla2xxx]
 ? process_one_work+0x14c/0x390

Link: https://lore.kernel.org/r/20220616053508.27186-6-njavali@marvell.com
Fixes: d74595278f4a ("scsi: qla2xxx: Add multiple queue pair functionality.")
Cc: stable@vger.kernel.org
Signed-off-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-17 15:16:04 +02:00
Quinn Tran
b16f0327ff scsi: qla2xxx: Turn off multi-queue for 8G adapters
commit 5304673bdb1635e27555bd636fd5d6956f1cd552 upstream.

For 8G adapters, multi-queue was enabled accidentally. Make sure
multi-queue is not enabled.

Link: https://lore.kernel.org/r/20220616053508.27186-5-njavali@marvell.com
Cc: stable@vger.kernel.org
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-17 15:16:04 +02:00
Arun Easi
182726ce80 scsi: qla2xxx: Fix discovery issues in FC-AL topology
commit 47ccb113cead905bdc236571bf8ac6fed90321b3 upstream.

A direct attach tape device, when gets swapped with another, was not
discovered. Fix this by looking at loop map and reinitialize link if there
are devices present.

Link: https://lore.kernel.org/linux-scsi/baef87c3-5dad-3b47-44c1-6914bfc90108@cybernetics.com/
Link: https://lore.kernel.org/r/20220713052045.10683-8-njavali@marvell.com
Cc: stable@vger.kernel.org
Reported-by: Tony Battersby <tonyb@cybernetics.com>
Tested-by: Tony Battersby <tonyb@cybernetics.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-17 15:16:04 +02:00
Quinn Tran
bff616494e scsi: qla2xxx: Fix imbalance vha->vref_count
commit 63fa7f2644b4b48e1913af33092c044bf48e9321 upstream.

vref_count took an extra decrement in the task management path.  Add an
extra ref count to compensate the imbalance.

Link: https://lore.kernel.org/r/20220713052045.10683-7-njavali@marvell.com
Cc: stable@vger.kernel.org
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-17 15:16:03 +02:00
Mahesh Rajashekhara
e240328df6 scsi: smartpqi: Fix DMA direction for RAID requests
[ Upstream commit 69695aeaa6621bc49cdd7a8e5a8d1042461e496e ]

Correct a SOP READ and WRITE DMA flags for some requests.

This update corrects DMA direction issues with SCSI commands removed from
the controller's internal lookup table.

Currently, SCSI READ BLOCK LIMITS (0x5) was removed from the controller
lookup table and exposed a DMA direction flag issue.

SCSI READ BLOCK LIMITS was recently removed from our controller lookup
table so the controller uses the respective IU flag field to set the DMA
data direction. Since the DMA direction is incorrect the FW never completes
the request causing a hang.

Some SCSI commands which use SCSI READ BLOCK LIMITS

      * sg_map
      * mt -f /dev/stX status

After updating controller firmware, users may notice their tape units
failing. This patch resolves the issue.

Also, the AIO path DMA direction is correct.

The DMA direction flag is a day-one bug with no reported BZ.

Fixes: 6c223761eb54 ("smartpqi: initial commit of Microsemi smartpqi driver")
Link: https://lore.kernel.org/r/165730605618.177165.9054223644512926624.stgit@brunhilda
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microchip.com>
Signed-off-by: Mahesh Rajashekhara <Mahesh.Rajashekhara@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-17 15:15:32 +02:00
James Smart
8cc35cd5ae scsi: lpfc: Revert RSCN_MEMENTO workaround for misbehaved configuration
[ Upstream commit ffc566411aded3c12c63e1310fb23830e9608c59 ]

The RSCN_MEMENTO logic was to workaround a target that does not register
both FCP and NVMe FC4 types at the same time.  This caused the
configuration to not produce a second RSCN for the NVMe FC4 type
registration in a timely manner.  The intention of the RSCN_MEMENTO flag
was to always signal to try NVMe PRLI.

However, there are other FCP-only target arrays in correctly behaved
configurations that reject the NVMe PRLI followed by a LOGO leading to
never rediscovering the target after an issue_lip (as LOGO causes a repeat
of PLOGI/PRLIs).

Revert the RSCN_MEMENTO patch as it is causing correctly behaved configs to
fail while it exists only to succeed on a misbehaved config.

Link: https://lore.kernel.org/r/20220701211425.2708-9-jsmart2021@gmail.com
Fixes: 1045592fc968 ("scsi: lpfc: Introduce FC_RSCN_MEMENTO flag for tracking post RSCN completion")
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-17 15:15:29 +02:00
Dan Carpenter
873fb210f1 scsi: qla2xxx: Check correct variable in qla24xx_async_gffid()
[ Upstream commit 7c33e477bd883f79cccec418980cb8f7f2d50347 ]

There is a copy and paste bug here.  It should check ".rsp" instead of
".req".  The error message is copy and pasted as well so update that too.

Link: https://lore.kernel.org/r/YrK1A/t3L6HKnswO@kili
Fixes: 9c40c36e75ff ("scsi: qla2xxx: edif: Reduce Initiator-Initiator thrashing")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-17 15:15:29 +02:00
Mike Christie
984be156c3 scsi: iscsi: Fix session removal on shutdown
[ Upstream commit 31500e902759322ba3c64b60dabae2704e738df8 ]

When the system is shutting down, iscsid is not running so we will not get
a response to the ISCSI_ERR_INVALID_HOST error event. The system shutdown
will then hang waiting on userspace to remove the session.

This has libiscsi force the destruction of the session from the kernel when
iscsi_host_remove() is called from a driver's shutdown callout.

This fixes a regression added in qedi boot with commit d1f2ce77638d ("scsi:
qedi: Fix host removal with running sessions") which made qedi use the
common session removal function that waits on userspace instead of rolling
its own kernel based removal.

Link: https://lore.kernel.org/r/20220616222738.5722-7-michael.christie@oracle.com
Fixes: d1f2ce77638d ("scsi: qedi: Fix host removal with running sessions")
Tested-by: Nilesh Javali <njavali@marvell.com>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Reviewed-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-17 15:15:22 +02:00
Mike Christie
9bb43a58d1 scsi: iscsi: Add helper to remove a session from the kernel
[ Upstream commit bb42856bfd54fda1cbc7c470fcf5db1596938f4f ]

During qedi shutdown we need to stop the iSCSI layer from sending new nops
as pings and from responding to target ones and make sure there is no
running connection cleanups. Commit d1f2ce77638d ("scsi: qedi: Fix host
removal with running sessions") converted the driver to use the libicsi
helper to drive session removal, so the above issues could be handled. The
problem is that during system shutdown iscsid will not be running so when
we try to remove the root session we will hang waiting for userspace to
reply.

Add a helper that will drive the destruction of sessions like these during
system shutdown.

Link: https://lore.kernel.org/r/20220616222738.5722-5-michael.christie@oracle.com
Tested-by: Nilesh Javali <njavali@marvell.com>
Reviewed-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-17 15:15:22 +02:00
Mike Christie
0ccf5a8038 scsi: iscsi: Allow iscsi_if_stop_conn() to be called from kernel
[ Upstream commit 3328333b47f4163504267440ec0a36087a407a5f ]

iscsi_if_stop_conn() is only called from the userspace interface but in a
subsequent commit we will want to call it from the kernel interface to
allow drivers like qedi to remove sessions from inside the kernel during
shutdown. This removes the iscsi_uevent code from iscsi_if_stop_conn() so we
can call it in a new helper.

Link: https://lore.kernel.org/r/20220616222738.5722-3-michael.christie@oracle.com
Tested-by: Nilesh Javali <njavali@marvell.com>
Reviewed-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-17 15:15:21 +02:00
Quinn Tran
637e64b296 scsi: qla2xxx: edif: Reduce N2N thrashing at app_start time
[ Upstream commit 37be3f9d6993a721bc019f03c97ea0fe66319997 ]

For N2N + remote WWPN is bigger than local adapter, remote adapter will
login to local adapter while authentication application is not running.
When authentication application starts, the current session in FW needs to
to be invalidated.

Make sure the old session is torn down before triggering a relogin.

Link: https://lore.kernel.org/r/20220608115849.16693-9-njavali@marvell.com
Fixes: 4de067e5df12 ("scsi: qla2xxx: edif: Add N2N support for EDIF")
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-17 15:15:07 +02:00
Quinn Tran
412eb69964 scsi: qla2xxx: edif: Fix no logout on delete for N2N
[ Upstream commit ec538eb838f334453b10e7e9b260f0c358018a37 ]

The driver failed to send implicit logout on session delete. For edif, this
failed to flush any lingering SA index in FW.

Set a flag to turn on implicit logout early in the session recovery to make
sure the logout will go out in case of error.

Link: https://lore.kernel.org/r/20220608115849.16693-8-njavali@marvell.com
Fixes: 4de067e5df12 ("scsi: qla2xxx: edif: Add N2N support for EDIF")
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-17 15:15:06 +02:00
Quinn Tran
f65746b1cc scsi: qla2xxx: edif: Fix session thrash
[ Upstream commit a8fdfb0b39c2b31722c70bdf2272b949d5af4b7b ]

Current code prematurely sends out PRLI before authentication application
has given the OK to do so. This causes PRLI failure and session teardown.

Prevents PRLI from going out before authentication app gives the OK.

Link: https://lore.kernel.org/r/20220608115849.16693-7-njavali@marvell.com
Fixes: 91f6f5fbe87b ("scsi: qla2xxx: edif: Reduce connection thrash")
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-17 15:15:06 +02:00
Quinn Tran
8f34bd2810 scsi: qla2xxx: edif: Tear down session if keys have been removed
[ Upstream commit d7e2e4a68fc047a025afcd200e6b7e1fbc8b1999 ]

If all keys for a session have been deleted, trigger a session teardown.

Link: https://lore.kernel.org/r/20220608115849.16693-6-njavali@marvell.com
Fixes: dd30706e73b7 ("scsi: qla2xxx: edif: Add key update")
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-17 15:15:06 +02:00
Quinn Tran
003389c2fb scsi: qla2xxx: edif: Fix no login after app start
[ Upstream commit 24c796098f5395477f7f7ebf8e24f3f08a139f71 ]

The scenario is this: User loaded driver but has not started authentication
app. All sessions to secure device will exhaust all login attempts, fail,
and in stay in deleted state. Then some time later the app is started. The
driver will replenish the login retry count, trigger delete to prepare for
secure login. After deletion, relogin is triggered.

For the session that is already deleted, the delete trigger is a no-op. If
none of the sessions trigger a relogin, no progress is made.

Add a relogin trigger.

Link: https://lore.kernel.org/r/20220608115849.16693-5-njavali@marvell.com
Fixes: 7ebb336e45ef ("scsi: qla2xxx: edif: Add start + stop bsgs")
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-17 15:15:06 +02:00
Quinn Tran
f95502bc5e scsi: qla2xxx: edif: Reduce disruption due to multiple app start
[ Upstream commit 0dbfce5255fe8d069a1a3b712a25b263264cfa58 ]

Multiple app start can trigger a session bounce. Make driver skip over
session teardown if app start is seen more than once.

Link: https://lore.kernel.org/r/20220608115849.16693-4-njavali@marvell.com
Fixes: 7ebb336e45ef ("scsi: qla2xxx: edif: Add start + stop bsgs")
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-17 15:15:06 +02:00
Quinn Tran
ba6202e0da scsi: qla2xxx: edif: Send LOGO for unexpected IKE message
[ Upstream commit 2b659ed67a12f39f56d8dcad9b5d5a74d67c01b3 ]

If the session is down and the local port continues to receive AUTH ELS
messages, the driver needs to send back LOGO so that the remote device
knows to tear down its session. Terminate and clean up the AUTH ELS
exchange followed by a passthrough LOGO.

Link: https://lore.kernel.org/r/20220608115849.16693-3-njavali@marvell.com
Fixes: 225479296c4f ("scsi: qla2xxx: edif: Reject AUTH ELS on session down")
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-17 15:15:06 +02:00
Quinn Tran
bbafa6ea9c scsi: qla2xxx: edif: Fix n2n login retry for secure device
[ Upstream commit aec55325ddec975216119da000092cb8664a3399 ]

After initiator has burned up all login retries, target authentication
application begins to run. This triggers a link bounce on target side.
Initiator will attempt another login. Due to N2N, the PRLI [nvme | fcp] can
fail because of the mode mismatch with target. This patch add a few more
login retries to revive the connection.

Link: https://lore.kernel.org/r/20220607044627.19563-11-njavali@marvell.com
Fixes: 4de067e5df12 ("scsi: qla2xxx: edif: Add N2N support for EDIF")
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-17 15:15:02 +02:00
Quinn Tran
2eba320770 scsi: qla2xxx: edif: Fix n2n discovery issue with secure target
[ Upstream commit 789d54a4178634850e441f60c0326124138e7269 ]

User failed to see disk via n2n topology. Driver used up all login retries
before authentication application started. When authentication application
started, driver did not have enough login retries to connect securely. On
app_start, driver will reset the login retry attempt count.

Link: https://lore.kernel.org/r/20220607044627.19563-10-njavali@marvell.com
Fixes: 4de067e5df12 ("scsi: qla2xxx: edif: Add N2N support for EDIF")
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-17 15:15:02 +02:00
Quinn Tran
9f42306701 scsi: qla2xxx: edif: Add retry for ELS passthrough
[ Upstream commit 0b3f3143d473b489a7aa0779c43bcdb344bd3014 ]

Relating to EDIF, when sending IKE message, updating key or deleting key,
driver can encounter IOCB queue full. Add additional retries to reduce
higher level recovery.

Link: https://lore.kernel.org/r/20220607044627.19563-8-njavali@marvell.com
Fixes: dd30706e73b7 ("scsi: qla2xxx: edif: Add key update")
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-17 15:15:02 +02:00
Quinn Tran
8f5aa46923 scsi: qla2xxx: edif: Synchronize NPIV deletion with authentication application
[ Upstream commit cf79716e6636400ae38c37bc8a652b1e522abbba ]

Notify authentication application of a NPIV deletion event is about to
occur. This allows app to perform cleanup.

Link: https://lore.kernel.org/r/20220607044627.19563-7-njavali@marvell.com
Fixes: 9efea843a906 ("scsi: qla2xxx: edif: Add detection of secure device")
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-17 15:15:01 +02:00
Quinn Tran
d2deafaef0 scsi: qla2xxx: edif: Fix potential stuck session in sa update
[ Upstream commit e0fb8ce2bb9e52c846e54ad2c58b5b7beb13eb09 ]

When a thread is in the process of reestablish a session, a flag is set to
prevent multiple threads/triggers from doing the same task. This flag was
left on, and any attempt to relogin was locked out. Clear this flag if the
attempt has failed.

Link: https://lore.kernel.org/r/20220607044627.19563-6-njavali@marvell.com
Fixes: dd30706e73b7 ("scsi: qla2xxx: edif: Add key update")
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-17 15:15:01 +02:00
Quinn Tran
47c7d335bc scsi: qla2xxx: edif: Add bsg interface to read doorbell events
[ Upstream commit 5ecd241bd7b1088a189581c0b560a13fe93621f6 ]

Add bsg interface for app to read doorbell events. This interface lets
driver know how much app can read based on return buffer size. When the
next event(s) occur, driver will return the bsg_job with the event(s) in
the return buffer.

If there is no event to read, driver will hold on to the bsg_job up to few
seconds as a way to control the polling interval.

Link: https://lore.kernel.org/r/20220607044627.19563-5-njavali@marvell.com
Fixes: dd30706e73b7 ("scsi: qla2xxx: edif: Add key update")
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-17 15:15:01 +02:00
Quinn Tran
86facc2718 scsi: qla2xxx: edif: Wait for app to ack on sess down
[ Upstream commit df648afa39da9c4d3af99c6c03dc3e9c7dfa99b0 ]

On session deletion, wait for app to acknowledge before moving on. This
allows both app and driver to stay in sync. In addition, this gives a
chance for authentication app to do any type of cleanup before moving on.

Link: https://lore.kernel.org/r/20220607044627.19563-4-njavali@marvell.com
Fixes: dd30706e73b7 ("scsi: qla2xxx: edif: Add key update")
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-17 15:15:01 +02:00
Quinn Tran
f02c787f70 scsi: qla2xxx: edif: bsg refactor
[ Upstream commit 7a7b0b4865d3490f62d6ef1a3aa39fa2b47859a4 ]

 - Add version field to edif bsg for future enhancement.

 - Add version edif bsg version check

 - Remove unused interfaces and fields.

Link: https://lore.kernel.org/r/20220607044627.19563-3-njavali@marvell.com
Fixes: dd30706e73b7 ("scsi: qla2xxx: edif: Add key update")
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-17 15:15:01 +02:00
Quinn Tran
733008cbbd scsi: qla2xxx: edif: Reduce Initiator-Initiator thrashing
[ Upstream commit 9c40c36e75ffd49952cd4ead0672defc4b4dbdf7 ]

This patch uses GFFID switch command to scan whether remote device is
Target or Initiator mode.  Based on that info, driver will not pass up
Initiator info to authentication application. This helps reduce unnecessary
stress for authentication application to deal with unused connections.

Link: https://lore.kernel.org/r/20220607044627.19563-2-njavali@marvell.com
Fixes: 7ebb336e45ef ("scsi: qla2xxx: edif: Add start + stop bsgs")
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-17 15:15:01 +02:00
Bikash Hazarika
84002f936a scsi: qla2xxx: Zero undefined mailbox IN registers
commit 6c96a3c7d49593ef15805f5e497601c87695abc9 upstream.

While requesting a new mailbox command, driver does not write any data to
unused registers.  Initialize the unused register value to zero while
requesting a new mailbox command to prevent stale entry access by firmware.

Link: https://lore.kernel.org/r/20220713052045.10683-4-njavali@marvell.com
Cc: stable@vger.kernel.org
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Bikash Hazarika <bhazarika@marvell.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-17 15:13:54 +02:00
Bikash Hazarika
9d08698376 scsi: qla2xxx: Fix incorrect display of max frame size
commit cf3b4fb655796674e605268bd4bfb47a47c8bce6 upstream.

Replace display field with the correct field.

Link: https://lore.kernel.org/r/20220713052045.10683-3-njavali@marvell.com
Fixes: 8777e4314d39 ("scsi: qla2xxx: Migrate NVME N2N handling into state machine")
Cc: stable@vger.kernel.org
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Bikash Hazarika <bhazarika@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-17 15:13:53 +02:00
Tony Battersby
03d8241112 scsi: sg: Allow waiting for commands to complete on removed device
commit 3455607fd7be10b449f5135c00dc306b85dc0d21 upstream.

When a SCSI device is removed while in active use, currently sg will
immediately return -ENODEV on any attempt to wait for active commands that
were sent before the removal.  This is problematic for commands that use
SG_FLAG_DIRECT_IO since the data buffer may still be in use by the kernel
when userspace frees or reuses it after getting ENODEV, leading to
corrupted userspace memory (in the case of READ-type commands) or corrupted
data being sent to the device (in the case of WRITE-type commands).  This
has been seen in practice when logging out of a iscsi_tcp session, where
the iSCSI driver may still be processing commands after the device has been
marked for removal.

Change the policy to allow userspace to wait for active sg commands even
when the device is being removed.  Return -ENODEV only when there are no
more responses to read.

Link: https://lore.kernel.org/r/5ebea46f-fe83-2d0b-233d-d0dcb362dd0a@cybernetics.com
Cc: <stable@vger.kernel.org>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-17 15:13:53 +02:00
James Smart
b0ab912836 scsi: lpfc: Remove extra atomic_inc on cmd_pending in queuecommand after VMID
commit 0948a9c5386095baae4012190a6b65aba684a907 upstream.

VMID introduced an extra increment of cmd_pending, causing double-counting
of the I/O. The normal increment ios performed in lpfc_get_scsi_buf.

Link: https://lore.kernel.org/r/20220701211425.2708-5-jsmart2021@gmail.com
Fixes: 33c79741deaf ("scsi: lpfc: vmid: Introduce VMID in I/O path")
Cc: <stable@vger.kernel.org> # v5.14+
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-17 15:13:53 +02:00
Nilesh Javali
5c5639005d scsi: Revert "scsi: qla2xxx: Fix disk failure to rediscover"
commit 5bc7b01c513a4a9b4cfe306e8d1720cfcfd3b8a3 upstream.

This fixes the regression of NVMe discovery failure during driver load
time.

This reverts commit 6a45c8e137d4e2c72eecf1ac7cf64f2fdfcead99.

Link: https://lore.kernel.org/r/20220713052045.10683-2-njavali@marvell.com
Cc: stable@vger.kernel.org
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-17 15:13:40 +02:00
Jason Yan
d9a434fa0c scsi: core: Fix warning in scsi_alloc_sgtables()
As explained in SG_IO howto[1]:

"If iovec_count is non-zero then 'dxfer_len' should be equal to the sum of
iov_len lengths. If not, the minimum of the two is the transfer length."

When iovec_count is non-zero and dxfer_len is zero, the sg_io() just
genarated a null bio, and finally caused a warning below. To fix it, skip
generating a bio for this request if dxfer_len is zero.

[1] https://tldp.org/HOWTO/SCSI-Generic-HOWTO/x198.html

WARNING: CPU: 2 PID: 3643 at drivers/scsi/scsi_lib.c:1032 scsi_alloc_sgtables+0xc7d/0xf70 drivers/scsi/scsi_lib.c:1032
Modules linked in:

CPU: 2 PID: 3643 Comm: syz-executor397 Not tainted
5.17.0-rc3-syzkaller-00316-gb81b1829e7e3 #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-204/01/2014
RIP: 0010:scsi_alloc_sgtables+0xc7d/0xf70 drivers/scsi/scsi_lib.c:1032
Code: e7 fc 31 ff 44 89 f6 e8 c1 4e e7 fc 45 85 f6 0f 84 1a f5 ff ff e8
93 4c e7 fc 83 c5 01 0f b7 ed e9 0f f5 ff ff e8 83 4c e7 fc <0f> 0b 41
   bc 0a 00 00 00 e9 2b fb ff ff 41 bc 09 00 00 00 e9 20 fb
RSP: 0018:ffffc90000d07558 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff88801bfc96a0 RCX: 0000000000000000
RDX: ffff88801c876000 RSI: ffffffff849060bd RDI: 0000000000000003
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: ffffffff849055b9 R11: 0000000000000000 R12: ffff888012b8c000
R13: ffff88801bfc9580 R14: 0000000000000000 R15: ffff88801432c000
FS:  00007effdec8e700(0000) GS:ffff88802cc00000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007effdec6d718 CR3: 00000000206d6000 CR4: 0000000000150ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 scsi_setup_scsi_cmnd drivers/scsi/scsi_lib.c:1219 [inline]
 scsi_prepare_cmd drivers/scsi/scsi_lib.c:1614 [inline]
 scsi_queue_rq+0x283e/0x3630 drivers/scsi/scsi_lib.c:1730
 blk_mq_dispatch_rq_list+0x6ea/0x22e0 block/blk-mq.c:1851
 __blk_mq_sched_dispatch_requests+0x20b/0x410 block/blk-mq-sched.c:299
 blk_mq_sched_dispatch_requests+0xfb/0x180 block/blk-mq-sched.c:332
 __blk_mq_run_hw_queue+0xf9/0x350 block/blk-mq.c:1968
 __blk_mq_delay_run_hw_queue+0x5b6/0x6c0 block/blk-mq.c:2045
 blk_mq_run_hw_queue+0x30f/0x480 block/blk-mq.c:2096
 blk_mq_sched_insert_request+0x340/0x440 block/blk-mq-sched.c:451
 blk_execute_rq+0xcc/0x340 block/blk-mq.c:1231
 sg_io+0x67c/0x1210 drivers/scsi/scsi_ioctl.c:485
 scsi_ioctl_sg_io drivers/scsi/scsi_ioctl.c:866 [inline]
 scsi_ioctl+0xa66/0x1560 drivers/scsi/scsi_ioctl.c:921
 sd_ioctl+0x199/0x2a0 drivers/scsi/sd.c:1576
 blkdev_ioctl+0x37a/0x800 block/ioctl.c:588
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:874 [inline]
 __se_sys_ioctl fs/ioctl.c:860 [inline]
 __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:860
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7effdecdc5d9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 81 14 00 00 90 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01
f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007effdec8e2f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007effded664c0 RCX: 00007effdecdc5d9
RDX: 0000000020002300 RSI: 0000000000002285 RDI: 0000000000000004
RBP: 00007effded34034 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003
R13: 00007effded34054 R14: 2f30656c69662f2e R15: 00007effded664c8

Link: https://lore.kernel.org/r/20220720025120.3226770-1-yanaijie@huawei.com
Fixes: 25636e282fe9 ("block: fix SG_IO vector request data length handling")
Reported-by: syzbot+d44b35ecfb807e5af0b5@syzkaller.appspotmail.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-07-26 21:54:30 -04:00
David Jeffery
0fde22c542 scsi: mpt3sas: Stop fw fault watchdog work item during system shutdown
During system shutdown or reboot, mpt3sas will reset the firmware back to
ready state. However, the driver leaves running a watchdog work item
intended to keep the firmware in operational state. This causes a second,
unneeded reset on shutdown and moves the firmware back to operational
instead of in ready state as intended. And if the mpt3sas_fwfault_debug
module parameter is set, this extra reset also panics the system.

mpt3sas's scsih_shutdown needs to stop the watchdog before resetting the
firmware back to ready state.

Link: https://lore.kernel.org/r/20220722142448.6289-1-djeffery@redhat.com
Fixes: fae21608c31c ("scsi: mpt3sas: Transition IOC to Ready state during shutdown")
Tested-by: Laurence Oberman <loberman@redhat.com>
Acked-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: David Jeffery <djeffery@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-07-26 21:40:43 -04:00
Changyuan Lyu
355bf2e036 scsi: pm80xx: Set stopped phy's linkrate to Disabled
Negotiated link rate needs to be updated to 'Disabled' when phy is stopped.

Link: https://lore.kernel.org/r/20220708205026.969161-1-changyuanl@google.com
Reviewed-by: Igor Pylypiv <ipylypiv@google.com>
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-07-13 23:27:59 -04:00
Changyuan Lyu
e78276cadb scsi: pm80xx: Fix 'Unknown' max/min linkrate
Currently, the data flow of the max/min linkrate in the driver is

 * in pm8001_get_lrate_mode():
   hardcoded value ==> struct sas_phy

 * in pm8001_bytes_dmaed():
   struct pm8001_phy ==> struct sas_phy

 * in pm8001_phy_control():
   libsas data ==> struct pm8001_phy

Since pm8001_bytes_dmaed() follows pm8001_get_lrate_mode(), and the fields
in struct pm8001_phy are not initialized, sysfs
`/sys/class/sas_phy/phy-*/maximum_linkrate` always shows `Unknown`.

To fix the issue, change the dataflow to the following:

 * in pm8001_phy_init():
   initial value ==> struct pm8001_phy

 * in pm8001_get_lrate_mode():
   struct pm8001_phy ==> struct sas_phy

 * in pm8001_phy_control():
   libsas data ==> struct pm8001_phy

For negotiated linkrate, the current dataflow is:

 * in pm8001_get_lrate_mode():
   iomb data ==> struct asd_sas_phy ==> struct sas_phy

 * in pm8001_bytes_dmaed():
   struct asd_sas_phy ==> struct sas_phy

Since pm8001_bytes_dmaed() follows pm8001_get_lrate_mode(), the assignment
statements in pm8001_bytes_dmaed() are unnecessary and cleaned up.

Link: https://lore.kernel.org/r/20220707175210.528858-1-changyuanl@google.com
Reviewed-by: Igor Pylypiv <ipylypiv@google.com>
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-07-13 23:27:59 -04:00
Ming Lei
8312cd3a7b scsi: megaraid: Clear READ queue map's nr_queues
The megaraid SCSI driver sets set->nr_maps as 3 if poll_queues is > 0, and
blk-mq actually initializes each map's nr_queues as nr_hw_queues.
Consequently the driver has to clear READ queue map's nr_queues, otherwise
the queue map becomes broken if poll_queues is set as non-zero.

Link: https://lore.kernel.org/r/20220706125942.528533-1-ming.lei@redhat.com
Fixes: 9e4bec5b2a23 ("scsi: megaraid_sas: mq_poll support")
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: sumit.saxena@broadcom.com
Cc: chandrakanth.patil@broadcom.com
Cc: linux-block@vger.kernel.org
Cc: Hannes Reinecke <hare@suse.de>
Reported-by: Guangwu Zhang <guazhang@redhat.com>
Tested-by: Guangwu Zhang <guazhang@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-07-13 22:57:03 -04:00
John Garry
fce54ed027 scsi: hisi_sas: Limit max hw sectors for v3 HW
If the controller is behind an IOMMU then the IOMMU IOVA caching range can
affect performance, as discussed in [0].

Limit the max HW sectors to not exceed this limit. We need to hardcode the
value until a proper DMA mapping API is available.

[0] https://lore.kernel.org/linux-iommu/20210129092120.1482-1-thunder.leizhen@huawei.com/

Link: https://lore.kernel.org/r/1655988119-223714-1-git-send-email-john.garry@huawei.com
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-06-27 22:43:57 -04:00
Tyrel Datwyler
aeaadcde1a scsi: ibmvfc: Store vhost pointer during subcrq allocation
Currently the back pointer from a queue to the vhost adapter isn't set
until after subcrq interrupt registration. The value is available when a
queue is first allocated and can/should be also set for primary and async
queues as well as subcrqs.

This fixes a crash observed during kexec/kdump on Power 9 with legacy XICS
interrupt controller where a pending subcrq interrupt from the previous
kernel can be replayed immediately upon IRQ registration resulting in
dereference of a garbage backpointer in ibmvfc_interrupt_scsi().

Kernel attempted to read user page (58) - exploit attempt? (uid: 0)
BUG: Kernel NULL pointer dereference on read at 0x00000058
Faulting instruction address: 0xc008000003216a08
Oops: Kernel access of bad area, sig: 11 [#1]
...
NIP [c008000003216a08] ibmvfc_interrupt_scsi+0x40/0xb0 [ibmvfc]
LR [c0000000082079e8] __handle_irq_event_percpu+0x98/0x270
Call Trace:
[c000000047fa3d80] [c0000000123e6180] 0xc0000000123e6180 (unreliable)
[c000000047fa3df0] [c0000000082079e8] __handle_irq_event_percpu+0x98/0x270
[c000000047fa3ea0] [c000000008207d18] handle_irq_event+0x98/0x188
[c000000047fa3ef0] [c00000000820f564] handle_fasteoi_irq+0xc4/0x310
[c000000047fa3f40] [c000000008205c60] generic_handle_irq+0x50/0x80
[c000000047fa3f60] [c000000008015c40] __do_irq+0x70/0x1a0
[c000000047fa3f90] [c000000008016d7c] __do_IRQ+0x9c/0x130
[c000000014622f60] [0000000020000000] 0x20000000
[c000000014622ff0] [c000000008016e50] do_IRQ+0x40/0xa0
[c000000014623020] [c000000008017044] replay_soft_interrupts+0x194/0x2f0
[c000000014623210] [c0000000080172a8] arch_local_irq_restore+0x108/0x170
[c000000014623240] [c000000008eb1008] _raw_spin_unlock_irqrestore+0x58/0xb0
[c000000014623270] [c00000000820b12c] __setup_irq+0x49c/0x9f0
[c000000014623310] [c00000000820b7c0] request_threaded_irq+0x140/0x230
[c000000014623380] [c008000003212a50] ibmvfc_register_scsi_channel+0x1e8/0x2f0 [ibmvfc]
[c000000014623450] [c008000003213d1c] ibmvfc_init_sub_crqs+0xc4/0x1f0 [ibmvfc]
[c0000000146234d0] [c0080000032145a8] ibmvfc_reset_crq+0x150/0x210 [ibmvfc]
[c000000014623550] [c0080000032147c8] ibmvfc_init_crq+0x160/0x280 [ibmvfc]
[c0000000146235f0] [c00800000321a9cc] ibmvfc_probe+0x2a4/0x530 [ibmvfc]

Link: https://lore.kernel.org/r/20220616191126.1281259-2-tyreld@linux.ibm.com
Fixes: 3034ebe26389 ("scsi: ibmvfc: Add alloc/dealloc routines for SCSI Sub-CRQ Channels")
Cc: stable@vger.kernel.org
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-06-16 21:42:04 -04:00
Tyrel Datwyler
72ea7fe0db scsi: ibmvfc: Allocate/free queue resource only during probe/remove
Currently, the sub-queues and event pool resources are allocated/freed for
every CRQ connection event such as reset and LPM. This exposes the driver
to a couple issues. First the inefficiency of freeing and reallocating
memory that can simply be resued after being sanitized. Further, a system
under memory pressue runs the risk of allocation failures that could result
in a crippled driver. Finally, there is a race window where command
submission/compeletion can try to pull/return elements from/to an event
pool that is being deleted or already has been deleted due to the lack of
host state around freeing/allocating resources. The following is an example
of list corruption following a live partition migration (LPM):

Oops: Exception in kernel mode, sig: 5 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
Modules linked in: vfat fat isofs cdrom ext4 mbcache jbd2 nft_counter nft_compat nf_tables nfnetlink rpadlpar_io rpaphp xsk_diag nfsv3 nfs_acl nfs lockd grace fscache netfs rfkill bonding tls sunrpc pseries_rng drm drm_panel_orientation_quirks xfs libcrc32c dm_service_time sd_mod t10_pi sg ibmvfc scsi_transport_fc ibmveth vmx_crypto dm_multipath dm_mirror dm_region_hash dm_log dm_mod ipmi_devintf ipmi_msghandler fuse
CPU: 0 PID: 2108 Comm: ibmvfc_0 Kdump: loaded Not tainted 5.14.0-70.9.1.el9_0.ppc64le #1
NIP: c0000000007c4bb0 LR: c0000000007c4bac CTR: 00000000005b9a10
REGS: c00000025c10b760 TRAP: 0700  Not tainted (5.14.0-70.9.1.el9_0.ppc64le)
MSR: 800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 2800028f XER: 0000000f
CFAR: c0000000001f55bc IRQMASK: 0
        GPR00: c0000000007c4bac c00000025c10ba00 c000000002a47c00 000000000000004e
        GPR04: c0000031e3006f88 c0000031e308bd00 c00000025c10b768 0000000000000027
        GPR08: 0000000000000000 c0000031e3009dc0 00000031e0eb0000 0000000000000000
        GPR12: c0000031e2ffffa8 c000000002dd0000 c000000000187108 c00000020fcee2c0
        GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
        GPR20: 0000000000000000 0000000000000000 0000000000000000 c008000002f81300
        GPR24: 5deadbeef0000100 5deadbeef0000122 c000000263ba6910 c00000024cc88000
        GPR28: 000000000000003c c0000002430a0000 c0000002430ac300 000000000000c300
NIP [c0000000007c4bb0] __list_del_entry_valid+0x90/0x100
LR [c0000000007c4bac] __list_del_entry_valid+0x8c/0x100
Call Trace:
[c00000025c10ba00] [c0000000007c4bac] __list_del_entry_valid+0x8c/0x100 (unreliable)
[c00000025c10ba60] [c008000002f42284] ibmvfc_free_queue+0xec/0x210 [ibmvfc]
[c00000025c10bb10] [c008000002f4246c] ibmvfc_deregister_scsi_channel+0xc4/0x160 [ibmvfc]
[c00000025c10bba0] [c008000002f42580] ibmvfc_release_sub_crqs+0x78/0x130 [ibmvfc]
[c00000025c10bc20] [c008000002f4f6cc] ibmvfc_do_work+0x5c4/0xc70 [ibmvfc]
[c00000025c10bce0] [c008000002f4fdec] ibmvfc_work+0x74/0x1e8 [ibmvfc]
[c00000025c10bda0] [c0000000001872b8] kthread+0x1b8/0x1c0
[c00000025c10be10] [c00000000000cd64] ret_from_kernel_thread+0x5c/0x64
Instruction dump:
40820034 38600001 38210060 4e800020 7c0802a6 7c641b78 3c62fe7a 7d254b78
3863b590 f8010070 4ba309cd 60000000 <0fe00000> 7c0802a6 3c62fe7a 3863b640
---[ end trace 11a2b65a92f8b66c ]---
ibmvfc 30000003: Send warning. Receive queue closed, will retry.

Add registration/deregistration helpers that are called instead during
connection resets to sanitize and reconfigure the queues.

Link: https://lore.kernel.org/r/20220616191126.1281259-3-tyreld@linux.ibm.com
Fixes: 3034ebe26389 ("scsi: ibmvfc: Add alloc/dealloc routines for SCSI Sub-CRQ Channels")
Cc: stable@vger.kernel.org
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-06-16 21:40:10 -04:00