23215 Commits

Author SHA1 Message Date
Nilesh Javali
4de0d18da9 scsi: qla2xxx: Update version to 10.02.07.700-k
Link: https://lore.kernel.org/r/20220616053508.27186-12-njavali@marvell.com
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-06-16 21:59:54 -04:00
Quinn Tran
f260694e64 scsi: qla2xxx: Fix erroneous mailbox timeout after PCI error injection
Clear wait for mailbox interrupt flag to prevent stale mailbox:

Feb 22 05:22:56 ltcden4-lp7 kernel: qla2xxx [0135:90:00.1]-500a:4: LOOP UP detected (16 Gbps).
Feb 22 05:22:59 ltcden4-lp7 kernel: qla2xxx [0135:90:00.1]-d04c:4: MBX Command timeout for cmd 69, ...

To fix the issue, driver needs to clear the MBX_INTR_WAIT flag on purging
the mailbox. When the stale mailbox completion does arrive, it will be
dropped.

Link: https://lore.kernel.org/r/20220616053508.27186-11-njavali@marvell.com
Fixes: b6faaaf796d7 ("scsi: qla2xxx: Serialize mailbox request")
Cc: Naresh Bannoth <nbannoth@in.ibm.com>
Cc: Kyle Mahlkuch <Kyle.Mahlkuch@ibm.com>
Cc: stable@vger.kernel.org
Reported-by: Naresh Bannoth <nbannoth@in.ibm.com>
Tested-by: Naresh Bannoth <nbannoth@in.ibm.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-06-16 21:59:54 -04:00
Arun Easi
2416ccd381 scsi: qla2xxx: Fix losing FCP-2 targets on long port disable with I/Os
FCP-2 devices were not coming back online once they were lost, login
retries exhausted, and then came back up.  Fix this by accepting RSCN when
the device is not online.

Link: https://lore.kernel.org/r/20220616053508.27186-10-njavali@marvell.com
Fixes: 44c57f205876 ("scsi: qla2xxx: Changes to support FCP2 Target")
Cc: stable@vger.kernel.org
Signed-off-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-06-16 21:59:54 -04:00
Arun Easi
f12d2d130e scsi: qla2xxx: Add debug prints in the device remove path
Add a debug print in the devloss callback.

Link: https://lore.kernel.org/r/20220616053508.27186-9-njavali@marvell.com
Signed-off-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-06-16 21:59:53 -04:00
Arun Easi
118b0c863c scsi: qla2xxx: Fix losing target when it reappears during delete
FC target disappeared during port perturbation tests due to a race that
tramples target state.  Fix the issue by adding state checks before
proceeding.

Link: https://lore.kernel.org/r/20220616053508.27186-8-njavali@marvell.com
Fixes: 44c57f205876 ("scsi: qla2xxx: Changes to support FCP2 Target")
Cc: stable@vger.kernel.org
Signed-off-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-06-16 21:59:53 -04:00
Arun Easi
58d1c124cd scsi: qla2xxx: Fix losing FCP-2 targets during port perturbation tests
When a mix of FCP-2 (tape) and non-FCP-2 targets are present, FCP-2 target
state was incorrectly transitioned when both of the targets were gone. Fix
this by ignoring state transition for FCP-2 targets.

Link: https://lore.kernel.org/r/20220616053508.27186-7-njavali@marvell.com
Fixes: 44c57f205876 ("scsi: qla2xxx: Changes to support FCP2 Target")
Cc: stable@vger.kernel.org
Signed-off-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-06-16 21:59:53 -04:00
Arun Easi
c39587bc0a scsi: qla2xxx: Fix crash due to stale SRB access around I/O timeouts
Ensure SRB is returned during I/O timeout error escalation. If that is not
possible fail the escalation path.

Following crash stack was seen:

BUG: unable to handle kernel paging request at 0000002f56aa90f8
IP: qla_chk_edif_rx_sa_delete_pending+0x14/0x30 [qla2xxx]
Call Trace:
 ? qla2x00_status_entry+0x19f/0x1c50 [qla2xxx]
 ? qla2x00_start_sp+0x116/0x1170 [qla2xxx]
 ? dma_pool_alloc+0x1d6/0x210
 ? mempool_alloc+0x54/0x130
 ? qla24xx_process_response_queue+0x548/0x12b0 [qla2xxx]
 ? qla_do_work+0x2d/0x40 [qla2xxx]
 ? process_one_work+0x14c/0x390

Link: https://lore.kernel.org/r/20220616053508.27186-6-njavali@marvell.com
Fixes: d74595278f4a ("scsi: qla2xxx: Add multiple queue pair functionality.")
Cc: stable@vger.kernel.org
Signed-off-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-06-16 21:59:53 -04:00
Quinn Tran
5304673bdb scsi: qla2xxx: Turn off multi-queue for 8G adapters
For 8G adapters, multi-queue was enabled accidentally. Make sure
multi-queue is not enabled.

Link: https://lore.kernel.org/r/20220616053508.27186-5-njavali@marvell.com
Cc: stable@vger.kernel.org
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-06-16 21:59:53 -04:00
Quinn Tran
d3117c83ba scsi: qla2xxx: Wind down adapter after PCIe error
Put adapter into a wind down state if OS does not make any attempt to
recover the adapter after PCIe error.

Link: https://lore.kernel.org/r/20220616053508.27186-4-njavali@marvell.com
Cc: stable@vger.kernel.org
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-06-16 21:59:53 -04:00
Bikash Hazarika
476da8faa3 scsi: qla2xxx: Add a new v2 dport diagnostic feature
FW requires minimum 72 bytes buffer size for D_port result. Buffer size
1024 is mentioned in the FW spec so buffer size is increased to 1024.
Rewrite the logic to handle START/RESTART command from SDMAPI.

Link: https://lore.kernel.org/r/20220616053508.27186-3-njavali@marvell.com
Signed-off-by: Bikash Hazarika <bhazarika@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-06-16 21:59:52 -04:00
Arun Easi
bff4873c70 scsi: qla2xxx: Fix excessive I/O error messages by default
Disable printing I/O error messages by default.  The messages will be
printed only when logging was enabled.

Link: https://lore.kernel.org/r/20220616053508.27186-2-njavali@marvell.com
Fixes: 8e2d81c6b5be ("scsi: qla2xxx: Fix excessive messages during device logout")
Cc: stable@vger.kernel.org
Signed-off-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-06-16 21:59:52 -04:00
Dmitry Bogdanov
65080c51fd scsi: iscsi: Prefer xmit of DataOut over new commands
iscsi_data_xmit() (TX worker) is iterating over the queue of new SCSI
commands concurrently with the queue being replenished. Only after the
queue is emptied will we start sending pending DataOut PDUs. That leads to
DataOut timeout on the target side and to connection reinstatement.

Give priority to pending DataOut commands over new commands.

Link: https://lore.kernel.org/r/20220607131953.11584-1-d.bogdanov@yadro.com
Reviewed-by: Konstantin Shelekhin <k.shelekhin@yadro.com>
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-06-16 21:53:39 -04:00
John Garry
42f22fe36d scsi: pm8001: Expose hardware queues for pm80xx
In commit 05c6c029a44d ("scsi: pm80xx: Increase number of supported
queues"), support for 80xx chip was improved by enabling multiple HW
queues.

In this, like other SCSI MQ HBA drivers at the time, the HW queues were not
exposed to upper layer, and instead the driver managed the queues
internally.

However, this management duplicates blk-mq code. In addition, the HW queue
management is sub-optimal for a system where the number of CPUs exceeds the
HW queues - this is because queues are selected in a round-robin fashion,
when it would be better to make adjacent CPUs submit on the same queue. And
finally, the affinity of the completion queue interrupts is not set to
mirror the cpu<->HQ queue mapping, which is suboptimal.

As such, for when MSIX is supported, expose HW queues to upper layer. We
always use queue index #0 for "internal" commands, i.e. anything which does
not come from the block layer, so omit this from the affinity spreading.

Link: https://lore.kernel.org/r/1654879602-33497-5-git-send-email-john.garry@huawei.com
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-06-16 21:45:09 -04:00
John Garry
940f5efa63 scsi: pm8001: Use non-atomic bitmap ops for tag alloc + free
In pm8001_tag_alloc() we don't require atomic set_bit() as we are already
in atomic context. In pm8001_tag_free() we should use the same host
spinlock to protect clearing the tag (and then don't require the atomic
clear_bit()).

Link: https://lore.kernel.org/r/1654879602-33497-4-git-send-email-john.garry@huawei.com
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-06-16 21:45:09 -04:00
John Garry
98132d842d scsi: pm8001: Set up tags before using them
The current code is buggy in that the tags are set up after they are needed
in pm80xx_chip_init() -> pm80xx_set_sas_protocol_timer_config().  The tag
depth is earlier read in pm80xx_chip_init() -> read_main_config_table().

Add a post init callback to do the pm80xx work which needs to be done after
reading the tags. I don't see a better way to do this.

Link: https://lore.kernel.org/r/1654879602-33497-3-git-send-email-john.garry@huawei.com
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-06-16 21:45:09 -04:00
John Garry
35a7e9dbff scsi: pm8001: Rework shost initial values
Some values in pm8001_prep_sas_ha_init() are set the same as they would be
set in scsi_host_alloc(), or could be in the sht (which would be better),
or later just overwritten, so rework the following:

 - cmd_per_lun can be set in the sht

 - max_lun and max_channel are as scsi_host_alloc() (so no need to set)

 - can_queue is later overwritten (so don't set in
   pm8001_prep_sas_ha_init())

Link: https://lore.kernel.org/r/1654879602-33497-2-git-send-email-john.garry@huawei.com
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-06-16 21:45:08 -04:00
Tyrel Datwyler
aeaadcde1a scsi: ibmvfc: Store vhost pointer during subcrq allocation
Currently the back pointer from a queue to the vhost adapter isn't set
until after subcrq interrupt registration. The value is available when a
queue is first allocated and can/should be also set for primary and async
queues as well as subcrqs.

This fixes a crash observed during kexec/kdump on Power 9 with legacy XICS
interrupt controller where a pending subcrq interrupt from the previous
kernel can be replayed immediately upon IRQ registration resulting in
dereference of a garbage backpointer in ibmvfc_interrupt_scsi().

Kernel attempted to read user page (58) - exploit attempt? (uid: 0)
BUG: Kernel NULL pointer dereference on read at 0x00000058
Faulting instruction address: 0xc008000003216a08
Oops: Kernel access of bad area, sig: 11 [#1]
...
NIP [c008000003216a08] ibmvfc_interrupt_scsi+0x40/0xb0 [ibmvfc]
LR [c0000000082079e8] __handle_irq_event_percpu+0x98/0x270
Call Trace:
[c000000047fa3d80] [c0000000123e6180] 0xc0000000123e6180 (unreliable)
[c000000047fa3df0] [c0000000082079e8] __handle_irq_event_percpu+0x98/0x270
[c000000047fa3ea0] [c000000008207d18] handle_irq_event+0x98/0x188
[c000000047fa3ef0] [c00000000820f564] handle_fasteoi_irq+0xc4/0x310
[c000000047fa3f40] [c000000008205c60] generic_handle_irq+0x50/0x80
[c000000047fa3f60] [c000000008015c40] __do_irq+0x70/0x1a0
[c000000047fa3f90] [c000000008016d7c] __do_IRQ+0x9c/0x130
[c000000014622f60] [0000000020000000] 0x20000000
[c000000014622ff0] [c000000008016e50] do_IRQ+0x40/0xa0
[c000000014623020] [c000000008017044] replay_soft_interrupts+0x194/0x2f0
[c000000014623210] [c0000000080172a8] arch_local_irq_restore+0x108/0x170
[c000000014623240] [c000000008eb1008] _raw_spin_unlock_irqrestore+0x58/0xb0
[c000000014623270] [c00000000820b12c] __setup_irq+0x49c/0x9f0
[c000000014623310] [c00000000820b7c0] request_threaded_irq+0x140/0x230
[c000000014623380] [c008000003212a50] ibmvfc_register_scsi_channel+0x1e8/0x2f0 [ibmvfc]
[c000000014623450] [c008000003213d1c] ibmvfc_init_sub_crqs+0xc4/0x1f0 [ibmvfc]
[c0000000146234d0] [c0080000032145a8] ibmvfc_reset_crq+0x150/0x210 [ibmvfc]
[c000000014623550] [c0080000032147c8] ibmvfc_init_crq+0x160/0x280 [ibmvfc]
[c0000000146235f0] [c00800000321a9cc] ibmvfc_probe+0x2a4/0x530 [ibmvfc]

Link: https://lore.kernel.org/r/20220616191126.1281259-2-tyreld@linux.ibm.com
Fixes: 3034ebe26389 ("scsi: ibmvfc: Add alloc/dealloc routines for SCSI Sub-CRQ Channels")
Cc: stable@vger.kernel.org
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-06-16 21:42:04 -04:00
Tyrel Datwyler
72ea7fe0db scsi: ibmvfc: Allocate/free queue resource only during probe/remove
Currently, the sub-queues and event pool resources are allocated/freed for
every CRQ connection event such as reset and LPM. This exposes the driver
to a couple issues. First the inefficiency of freeing and reallocating
memory that can simply be resued after being sanitized. Further, a system
under memory pressue runs the risk of allocation failures that could result
in a crippled driver. Finally, there is a race window where command
submission/compeletion can try to pull/return elements from/to an event
pool that is being deleted or already has been deleted due to the lack of
host state around freeing/allocating resources. The following is an example
of list corruption following a live partition migration (LPM):

Oops: Exception in kernel mode, sig: 5 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
Modules linked in: vfat fat isofs cdrom ext4 mbcache jbd2 nft_counter nft_compat nf_tables nfnetlink rpadlpar_io rpaphp xsk_diag nfsv3 nfs_acl nfs lockd grace fscache netfs rfkill bonding tls sunrpc pseries_rng drm drm_panel_orientation_quirks xfs libcrc32c dm_service_time sd_mod t10_pi sg ibmvfc scsi_transport_fc ibmveth vmx_crypto dm_multipath dm_mirror dm_region_hash dm_log dm_mod ipmi_devintf ipmi_msghandler fuse
CPU: 0 PID: 2108 Comm: ibmvfc_0 Kdump: loaded Not tainted 5.14.0-70.9.1.el9_0.ppc64le #1
NIP: c0000000007c4bb0 LR: c0000000007c4bac CTR: 00000000005b9a10
REGS: c00000025c10b760 TRAP: 0700  Not tainted (5.14.0-70.9.1.el9_0.ppc64le)
MSR: 800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 2800028f XER: 0000000f
CFAR: c0000000001f55bc IRQMASK: 0
        GPR00: c0000000007c4bac c00000025c10ba00 c000000002a47c00 000000000000004e
        GPR04: c0000031e3006f88 c0000031e308bd00 c00000025c10b768 0000000000000027
        GPR08: 0000000000000000 c0000031e3009dc0 00000031e0eb0000 0000000000000000
        GPR12: c0000031e2ffffa8 c000000002dd0000 c000000000187108 c00000020fcee2c0
        GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
        GPR20: 0000000000000000 0000000000000000 0000000000000000 c008000002f81300
        GPR24: 5deadbeef0000100 5deadbeef0000122 c000000263ba6910 c00000024cc88000
        GPR28: 000000000000003c c0000002430a0000 c0000002430ac300 000000000000c300
NIP [c0000000007c4bb0] __list_del_entry_valid+0x90/0x100
LR [c0000000007c4bac] __list_del_entry_valid+0x8c/0x100
Call Trace:
[c00000025c10ba00] [c0000000007c4bac] __list_del_entry_valid+0x8c/0x100 (unreliable)
[c00000025c10ba60] [c008000002f42284] ibmvfc_free_queue+0xec/0x210 [ibmvfc]
[c00000025c10bb10] [c008000002f4246c] ibmvfc_deregister_scsi_channel+0xc4/0x160 [ibmvfc]
[c00000025c10bba0] [c008000002f42580] ibmvfc_release_sub_crqs+0x78/0x130 [ibmvfc]
[c00000025c10bc20] [c008000002f4f6cc] ibmvfc_do_work+0x5c4/0xc70 [ibmvfc]
[c00000025c10bce0] [c008000002f4fdec] ibmvfc_work+0x74/0x1e8 [ibmvfc]
[c00000025c10bda0] [c0000000001872b8] kthread+0x1b8/0x1c0
[c00000025c10be10] [c00000000000cd64] ret_from_kernel_thread+0x5c/0x64
Instruction dump:
40820034 38600001 38210060 4e800020 7c0802a6 7c641b78 3c62fe7a 7d254b78
3863b590 f8010070 4ba309cd 60000000 <0fe00000> 7c0802a6 3c62fe7a 3863b640
---[ end trace 11a2b65a92f8b66c ]---
ibmvfc 30000003: Send warning. Receive queue closed, will retry.

Add registration/deregistration helpers that are called instead during
connection resets to sanitize and reconfigure the queues.

Link: https://lore.kernel.org/r/20220616191126.1281259-3-tyreld@linux.ibm.com
Fixes: 3034ebe26389 ("scsi: ibmvfc: Add alloc/dealloc routines for SCSI Sub-CRQ Channels")
Cc: stable@vger.kernel.org
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-06-16 21:40:10 -04:00
Saurabh Sengar
1d3e098078 scsi: storvsc: Correct reporting of Hyper-V I/O size limits
Current code is based on the idea that the max number of SGL entries
also determines the max size of an I/O request.  While this idea was
true in older versions of the storvsc driver when SGL entry length
was limited to 4 Kbytes, commit 3d9c3dcc58e9 ("scsi: storvsc: Enable
scatterlist entry lengths > 4Kbytes") removed that limitation. It's
now theoretically possible for the block layer to send requests that
exceed the maximum size supported by Hyper-V. This problem doesn't
currently happen in practice because the block layer defaults to a
512 Kbyte maximum, while Hyper-V in Azure supports 2 Mbyte I/O sizes.
But some future configuration of Hyper-V could have a smaller max I/O
size, and the block layer could exceed that max.

Fix this by correctly setting max_sectors as well as sg_tablesize to
reflect the maximum I/O size that Hyper-V reports. While allowing
I/O sizes larger than the block layer default of 512 Kbytes doesn’t
provide any noticeable performance benefit in the tests we ran, it's
still appropriate to report the correct underlying Hyper-V capabilities
to the Linux block layer.

Also tweak the virt_boundary_mask to reflect that the required
alignment derives from Hyper-V communication using a 4 Kbyte page size,
and not on the guest page size, which might be bigger (eg. ARM64).

Link: https://lore.kernel.org/r/1655190355-28722-1-git-send-email-ssengar@linux.microsoft.com
Fixes: 3d9c3dcc58e9 ("scsi: storvsc: Enable scatter list entry lengths > 4Kbytes")
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-06-16 21:36:03 -04:00
Xiu Jianfeng
e733f8a894 scsi: lpfc: Use memset_startat() helper in lpfc_nvmet_xmt_fcp_op_cmp()
Use memset_startat() helper to simplify the code, no functional changes.

Link: https://lore.kernel.org/r/20220613021851.59699-1-xiujianfeng@huawei.com
Reviewed-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-06-16 21:23:18 -04:00
Sergey Gorenko
f6eed15f3e scsi: iscsi: Exclude zero from the endpoint ID range
The kernel returns an endpoint ID as r.ep_connect_ret.handle in the
iscsi_uevent. The iscsid validates a received endpoint ID and treats zero
as an error. The commit referenced in the fixes line changed the endpoint
ID range, and zero is always assigned to the first endpoint ID.  So, the
first attempt to create a new iSER connection always fails.

Link: https://lore.kernel.org/r/20220613123854.55073-1-sergeygo@nvidia.com
Fixes: 3c6ae371b8a1 ("scsi: iscsi: Release endpoint ID when its freed")
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Sergey Gorenko <sergeygo@nvidia.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-06-13 22:11:36 -04:00
Damien Le Moal
3dafe0648d scsi: libsas: Introduce struct smp_rps_resp
Similarly to sas report general and discovery responses, define the
structure struct smp_rps_resp to handle SATA PHY report responses using a
structure with a size that is exactly equal to the sas defined response
size.

With this change, struct smp_resp becomes unused and is removed.

Link: https://lore.kernel.org/r/20220609022456.409087-4-damien.lemoal@opensource.wdc.com
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-06-10 13:08:06 -04:00
Damien Le Moal
44f2bfe9ef scsi: libsas: Introduce struct smp_rg_resp
When compiling with gcc 12, several warnings are thrown by gcc when
compiling drivers/scsi/libsas/sas_expander.c, e.g.:

In function ‘sas_get_ex_change_count’,
    inlined from ‘sas_find_bcast_dev’ at
    drivers/scsi/libsas/sas_expander.c:1816:8:
drivers/scsi/libsas/sas_expander.c:1781:20: warning: array subscript
‘struct smp_resp[0]’ is partly outside array bounds of ‘unsigned
char[32]’ [-Warray-bounds]
 1781 |         if (rg_resp->result != SMP_RESP_FUNC_ACC) {
      |             ~~~~~~~^~~~~~~~

This is due to the use of the struct smp_resp to aggregate all possible
response types using a union but allocating a response buffer with a size
exactly equal to the size of the response type needed. This leads to access
to fields of struct smp_resp from an allocated memory area that is smaller
than the size of struct smp_resp.

Fix this by defining struct smp_rg_resp for sas report general responses.

Link: https://lore.kernel.org/r/20220609022456.409087-3-damien.lemoal@opensource.wdc.com
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-06-10 13:08:06 -04:00
Damien Le Moal
c3752f4460 scsi: libsas: Introduce struct smp_disc_resp
When compiling with gcc 12, several warnings are thrown by gcc when
compiling drivers/scsi/libsas/sas_expander.c, e.g.:

In function ‘sas_get_phy_change_count’,
    inlined from ‘sas_find_bcast_phy.constprop’ at
drivers/scsi/libsas/sas_expander.c:1737:9:
drivers/scsi/libsas/sas_expander.c:1697:39: warning: array subscript
‘struct smp_resp[0]’ is partly outside array bounds of ‘unsigned
char[56]’ [-Warray-bounds]
 1697 |                 *pcc = disc_resp->disc.change_count;
      |                        ~~~~~~~~~~~~~~~^~~~~~~~~~~~~

This is due to the use of the struct smp_resp to aggregate all possible
response types using a union but allocating a response buffer with a size
exactly equal to the size of the response type needed. This leads to access
to fields of struct smp_resp from an allocated memory area that is smaller
than the size of struct smp_resp.

Fix this by defining struct smp_disc_resp for sas discovery operations.
Since this structure and the generic struct smp_resp are identical for
the little endian and big endian archs, move the definition of these
structures at the end of include/scsi/sas.h to avoid repeating their
definition.

Link: https://lore.kernel.org/r/20220609022456.409087-2-damien.lemoal@opensource.wdc.com
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-06-10 13:08:06 -04:00
Nilesh Javali
0f4d7d5561 scsi: qla2xxx: Update version to 10.02.07.600-k
Link: https://lore.kernel.org/r/20220608115849.16693-11-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-06-10 13:04:05 -04:00
Quinn Tran
bcf536072f scsi: qla2xxx: edif: Fix slow session teardown
User experience slow recovery when target device went through a stop/start
of the authentication application (app_stop/app_start).

Between the period of app_stop and app_start on the target device, target
device choose to send ELS Reject for any receive AUTH ELS command.  At this
time, authentication application does not do ELS reject if it encounters
error.

Therefore, AUTH ELS reject signify authentication application is not
running. If driver passes up the AUTH ELS Reject to the authentication
application, then it would result in authentication application
retrying/resending the same AUTH ELS command again + delay.

As a work around, driver should trigger a session tear down where it tells
the local authentication application to also tear down.  At the next
relogin, both sides are then synchronized.

Link: https://lore.kernel.org/r/20220608115849.16693-10-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-06-10 13:04:05 -04:00
Quinn Tran
37be3f9d69 scsi: qla2xxx: edif: Reduce N2N thrashing at app_start time
For N2N + remote WWPN is bigger than local adapter, remote adapter will
login to local adapter while authentication application is not running.
When authentication application starts, the current session in FW needs to
to be invalidated.

Make sure the old session is torn down before triggering a relogin.

Link: https://lore.kernel.org/r/20220608115849.16693-9-njavali@marvell.com
Fixes: 4de067e5df12 ("scsi: qla2xxx: edif: Add N2N support for EDIF")
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-06-10 13:04:04 -04:00
Quinn Tran
ec538eb838 scsi: qla2xxx: edif: Fix no logout on delete for N2N
The driver failed to send implicit logout on session delete. For edif, this
failed to flush any lingering SA index in FW.

Set a flag to turn on implicit logout early in the session recovery to make
sure the logout will go out in case of error.

Link: https://lore.kernel.org/r/20220608115849.16693-8-njavali@marvell.com
Fixes: 4de067e5df12 ("scsi: qla2xxx: edif: Add N2N support for EDIF")
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-06-10 13:04:04 -04:00
Quinn Tran
a8fdfb0b39 scsi: qla2xxx: edif: Fix session thrash
Current code prematurely sends out PRLI before authentication application
has given the OK to do so. This causes PRLI failure and session teardown.

Prevents PRLI from going out before authentication app gives the OK.

Link: https://lore.kernel.org/r/20220608115849.16693-7-njavali@marvell.com
Fixes: 91f6f5fbe87b ("scsi: qla2xxx: edif: Reduce connection thrash")
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-06-10 13:04:04 -04:00
Quinn Tran
d7e2e4a68f scsi: qla2xxx: edif: Tear down session if keys have been removed
If all keys for a session have been deleted, trigger a session teardown.

Link: https://lore.kernel.org/r/20220608115849.16693-6-njavali@marvell.com
Fixes: dd30706e73b7 ("scsi: qla2xxx: edif: Add key update")
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-06-10 13:04:04 -04:00
Quinn Tran
24c796098f scsi: qla2xxx: edif: Fix no login after app start
The scenario is this: User loaded driver but has not started authentication
app. All sessions to secure device will exhaust all login attempts, fail,
and in stay in deleted state. Then some time later the app is started. The
driver will replenish the login retry count, trigger delete to prepare for
secure login. After deletion, relogin is triggered.

For the session that is already deleted, the delete trigger is a no-op. If
none of the sessions trigger a relogin, no progress is made.

Add a relogin trigger.

Link: https://lore.kernel.org/r/20220608115849.16693-5-njavali@marvell.com
Fixes: 7ebb336e45ef ("scsi: qla2xxx: edif: Add start + stop bsgs")
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-06-10 13:04:04 -04:00
Quinn Tran
0dbfce5255 scsi: qla2xxx: edif: Reduce disruption due to multiple app start
Multiple app start can trigger a session bounce. Make driver skip over
session teardown if app start is seen more than once.

Link: https://lore.kernel.org/r/20220608115849.16693-4-njavali@marvell.com
Fixes: 7ebb336e45ef ("scsi: qla2xxx: edif: Add start + stop bsgs")
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-06-10 13:04:04 -04:00
Quinn Tran
2b659ed67a scsi: qla2xxx: edif: Send LOGO for unexpected IKE message
If the session is down and the local port continues to receive AUTH ELS
messages, the driver needs to send back LOGO so that the remote device
knows to tear down its session. Terminate and clean up the AUTH ELS
exchange followed by a passthrough LOGO.

Link: https://lore.kernel.org/r/20220608115849.16693-3-njavali@marvell.com
Fixes: 225479296c4f ("scsi: qla2xxx: edif: Reject AUTH ELS on session down")
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-06-10 13:04:03 -04:00
Quinn Tran
63ab6cb582 scsi: qla2xxx: edif: Fix I/O timeout due to over-subscription
The current edif code does not keep track of FW IOCB resources.  This led
to IOCB queue full on error recovery (I/O timeout).  Make use of the
existing code that tracks IOCB resources to prevent over-subscription.

Link: https://lore.kernel.org/r/20220608115849.16693-2-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-06-10 13:04:03 -04:00
Damien Le Moal
566d3c57eb scsi: scsi_debug: Fix zone transition to full condition
When a write command to a sequential write required or sequential write
preferred zone result in the zone write pointer reaching the end of the
zone, the zone condition must be set to full AND the number of implicitly
or explicitly open zones updated to have a correct accounting for zone
resources. However, the function zbc_inc_wp() only sets the zone condition
to full without updating the open zone counters, resulting in a zone state
machine breakage.

Introduce the helper function zbc_set_zone_full() and use it in
zbc_inc_wp() to correctly transition zones to the full condition.

Link: https://lore.kernel.org/r/20220608011302.92061-1-damien.lemoal@opensource.wdc.com
Fixes: f0d1cf9378bd ("scsi: scsi_debug: Add ZBC zone commands")
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-06-10 12:50:05 -04:00
Chengguang Xu
ec1e8adcbd scsi: pmcraid: Fix missing resource cleanup in error case
Fix missing resource cleanup (when '(--i) == 0') for error case in
pmcraid_register_interrupt_handler().

Link: https://lore.kernel.org/r/20220529153456.4183738-6-cgxu519@mykernel.net
Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-06-07 22:05:14 -04:00
Chengguang Xu
d64c491911 scsi: ipr: Fix missing/incorrect resource cleanup in error case
Fix missing resource cleanup (when '(--i) == 0') for error case in
ipr_alloc_mem() and skip incorrect resource cleanup (when '(--i) == 0') for
error case in ipr_request_other_msi_irqs() because variable i started from
1.

Link: https://lore.kernel.org/r/20220529153456.4183738-4-cgxu519@mykernel.net
Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-06-07 22:05:14 -04:00
Helge Deller
120f1d95ef scsi: mpt3sas: Fix out-of-bounds compiler warning
I'm facing this warning when building for the parisc64 architecture:

drivers/scsi/mpt3sas/mpt3sas_base.c: In function ‘_base_make_ioc_operational’:
drivers/scsi/mpt3sas/mpt3sas_base.c:5396:40: warning: array subscript ‘Mpi2SasIOUnitPage1_t {aka struct _MPI2_CONFIG_PAGE_SASIOUNIT_1}[0]’ is partly outside array bounds of ‘unsigned char[20]’ [-Warray-bounds]
 5396 |             (le16_to_cpu(sas_iounit_pg1->SASWideMaxQueueDepth)) ?
drivers/scsi/mpt3sas/mpt3sas_base.c:5382:26: note: referencing an object of size 20 allocated by ‘kzalloc’
 5382 |         sas_iounit_pg1 = kzalloc(sz, GFP_KERNEL);
      |                          ^~~~~~~~~~~~~~~~~~~~~~~

The problem is, that only 20 bytes are allocated with kmalloc(), which is
sufficient to hold the bytes which are needed.  Nevertheless, gcc complains
because the whole Mpi2SasIOUnitPage1_t struct is 32 bytes in size and thus
doesn't fit into those 20 bytes.

This patch simply allocates all 32 bytes (instead of 20) and thus avoids
the warning. There is no functional change introduced by this patch.

While touching the code I cleaned up to calculation of max_wideport_qd,
max_narrowport_qd and max_sata_qd to make it easier readable.

Test successfully tested on a HP C8000 PA-RISC workstation with 64-bit
kernel.

Link: https://lore.kernel.org/r/YpZ197iZdDZSCzrT@p100
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-06-07 22:01:41 -04:00
keliu
3fd3a52ca6 scsi: core: iscsi: Directly use ida_alloc()/ida_free()
Use ida_alloc()/ida_free() instead of the deprecated
ida_simple_get()/ida_simple_remove() interface.

Link: https://lore.kernel.org/r/20220527083049.2552526-1-liuke94@huawei.com
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: keliu <liuke94@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-06-07 21:58:06 -04:00
Nilesh Javali
4dc48a107a scsi: qla2xxx: Update version to 10.02.07.500-k
Link: https://lore.kernel.org/r/20220607044627.19563-12-njavali@marvell.com
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-06-07 21:50:11 -04:00
Quinn Tran
aec55325dd scsi: qla2xxx: edif: Fix n2n login retry for secure device
After initiator has burned up all login retries, target authentication
application begins to run. This triggers a link bounce on target side.
Initiator will attempt another login. Due to N2N, the PRLI [nvme | fcp] can
fail because of the mode mismatch with target. This patch add a few more
login retries to revive the connection.

Link: https://lore.kernel.org/r/20220607044627.19563-11-njavali@marvell.com
Fixes: 4de067e5df12 ("scsi: qla2xxx: edif: Add N2N support for EDIF")
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-06-07 21:50:10 -04:00
Quinn Tran
789d54a417 scsi: qla2xxx: edif: Fix n2n discovery issue with secure target
User failed to see disk via n2n topology. Driver used up all login retries
before authentication application started. When authentication application
started, driver did not have enough login retries to connect securely. On
app_start, driver will reset the login retry attempt count.

Link: https://lore.kernel.org/r/20220607044627.19563-10-njavali@marvell.com
Fixes: 4de067e5df12 ("scsi: qla2xxx: edif: Add N2N support for EDIF")
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-06-07 21:50:10 -04:00
Quinn Tran
1040e5f75d scsi: qla2xxx: edif: Remove old doorbell interface
Recently driver has implemented a new doorbell mechanism via bsg.  The new
doorbell tells driver the exact buffer size application has where driver
can fill it up with events. The old doorbell guestimated application buffer
size is 256.

Remove duplicate functionality, the application has moved on to the new
doorbell interface.

Link: https://lore.kernel.org/r/20220607044627.19563-9-njavali@marvell.com
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-06-07 21:50:10 -04:00
Quinn Tran
0b3f3143d4 scsi: qla2xxx: edif: Add retry for ELS passthrough
Relating to EDIF, when sending IKE message, updating key or deleting key,
driver can encounter IOCB queue full. Add additional retries to reduce
higher level recovery.

Link: https://lore.kernel.org/r/20220607044627.19563-8-njavali@marvell.com
Fixes: dd30706e73b7 ("scsi: qla2xxx: edif: Add key update")
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-06-07 21:50:10 -04:00
Quinn Tran
cf79716e66 scsi: qla2xxx: edif: Synchronize NPIV deletion with authentication application
Notify authentication application of a NPIV deletion event is about to
occur. This allows app to perform cleanup.

Link: https://lore.kernel.org/r/20220607044627.19563-7-njavali@marvell.com
Fixes: 9efea843a906 ("scsi: qla2xxx: edif: Add detection of secure device")
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-06-07 21:50:10 -04:00
Quinn Tran
e0fb8ce2bb scsi: qla2xxx: edif: Fix potential stuck session in sa update
When a thread is in the process of reestablish a session, a flag is set to
prevent multiple threads/triggers from doing the same task. This flag was
left on, and any attempt to relogin was locked out. Clear this flag if the
attempt has failed.

Link: https://lore.kernel.org/r/20220607044627.19563-6-njavali@marvell.com
Fixes: dd30706e73b7 ("scsi: qla2xxx: edif: Add key update")
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-06-07 21:50:10 -04:00
Quinn Tran
5ecd241bd7 scsi: qla2xxx: edif: Add bsg interface to read doorbell events
Add bsg interface for app to read doorbell events. This interface lets
driver know how much app can read based on return buffer size. When the
next event(s) occur, driver will return the bsg_job with the event(s) in
the return buffer.

If there is no event to read, driver will hold on to the bsg_job up to few
seconds as a way to control the polling interval.

Link: https://lore.kernel.org/r/20220607044627.19563-5-njavali@marvell.com
Fixes: dd30706e73b7 ("scsi: qla2xxx: edif: Add key update")
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-06-07 21:50:10 -04:00
Quinn Tran
df648afa39 scsi: qla2xxx: edif: Wait for app to ack on sess down
On session deletion, wait for app to acknowledge before moving on. This
allows both app and driver to stay in sync. In addition, this gives a
chance for authentication app to do any type of cleanup before moving on.

Link: https://lore.kernel.org/r/20220607044627.19563-4-njavali@marvell.com
Fixes: dd30706e73b7 ("scsi: qla2xxx: edif: Add key update")
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-06-07 21:50:10 -04:00
Quinn Tran
7a7b0b4865 scsi: qla2xxx: edif: bsg refactor
- Add version field to edif bsg for future enhancement.

 - Add version edif bsg version check

 - Remove unused interfaces and fields.

Link: https://lore.kernel.org/r/20220607044627.19563-3-njavali@marvell.com
Fixes: dd30706e73b7 ("scsi: qla2xxx: edif: Add key update")
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-06-07 21:50:10 -04:00
Quinn Tran
9c40c36e75 scsi: qla2xxx: edif: Reduce Initiator-Initiator thrashing
This patch uses GFFID switch command to scan whether remote device is
Target or Initiator mode.  Based on that info, driver will not pass up
Initiator info to authentication application. This helps reduce unnecessary
stress for authentication application to deal with unused connections.

Link: https://lore.kernel.org/r/20220607044627.19563-2-njavali@marvell.com
Fixes: 7ebb336e45ef ("scsi: qla2xxx: edif: Add start + stop bsgs")
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-06-07 21:50:09 -04:00