608 Commits

Author SHA1 Message Date
John Garry
f5f2a27160 scsi: hisi_sas: Don't send bcast events from HW during nexus HA reset
Remote devices may go missing from the per-device nexus reset part of the
HA nexus, i.e after the controller reset. This is because libsas may find
the devices to be gone as the phy may be temporarily down when processing
the bcast event generated from the nexus reset. Filter out bcast events
during this time to stop the devices being lost.

Link: https://lore.kernel.org/r/1662378529-101489-6-git-send-email-john.garry@huawei.com
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-06 22:28:11 -04:00
John Garry
e9b6bada98 scsi: hisi_sas: Add helper to process bcast events
Add a helper for bcast processing to reduce duplication.

Link: https://lore.kernel.org/r/1662378529-101489-5-git-send-email-john.garry@huawei.com
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-06 22:28:11 -04:00
John Garry
11ff0c98fc scsi: hisi_sas: Drain bcast events in hisi_sas_rescan_topology()
In resetting the controller, SATA devices may be lost.

The issue is that when we insert the bcast events to rescan the topology in
hisi_sas_rescan_topology(), when we subsequently nexus reset the SATA
devices in hisi_sas_async_I_T_nexus_reset(), there is a small timing window
in which the remote phy is down and we process the bcast event (meaning
that libsas judges that the disk is lost).

Ensure that all bcast events are processed prior to the nexus reset to
close this window.

Link: https://lore.kernel.org/r/1662378529-101489-4-git-send-email-john.garry@huawei.com
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-06 22:28:11 -04:00
John Garry
bc5551157a scsi: hisi_sas: Clear HISI_SAS_HW_FAULT_BIT earlier
Once the controller HW has been reset then we can unset flag
HISI_SAS_HW_FAULT_BIT. In clearing this flag earlier we can now
successfully execute commands in hisi_sas_controller_reset_done(), like
bcast processing.

Link: https://lore.kernel.org/r/1662378529-101489-3-git-send-email-john.garry@huawei.com
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-06 22:28:10 -04:00
John Garry
245050af5d scsi: hisi_sas: Revert change to limit max hw sectors for v3 HW
Now that libsas and the SCSI core code limits the default sectors from
commit 4cbfca5f7750 ("scsi: scsi_transport_sas: cap shost opt_sectors
according to DMA optimal limit") and commit 608128d391fa ("scsi: sd: allow
max_sectors be capped at DMA optimal size limit"), there is no need for
the hack to limit the max HW sectors.

Link: https://lore.kernel.org/r/1662378529-101489-2-git-send-email-john.garry@huawei.com
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-06 22:28:10 -04:00
Xingui Yang
7e15334f5d scsi: hisi_sas: Modify v3 HW SATA completion error processing
If the I/O completion response frame returned by the target device has been
written to the host memory and the err bit in the status field of the
received fis is 1, ts->stat should set to SAS_PROTO_RESPONSE, and this will
let EH analyze and further determine cause of failure.

Link: https://lore.kernel.org/r/1657823002-139010-5-git-send-email-john.garry@huawei.com
Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-07-18 23:04:12 -04:00
Xiang Chen
f0902095a7 scsi: hisi_sas: Relocate DMA unmap of SMP task
Currently SMP tasks are DMA unmapped only when cq of SMP I/O is returned
normally. If the cq of SMP I/O is returned with exception actually SMP TAS
is never unmapped. Relocate DMA unmap of SMP task to fix the issue.

Link: https://lore.kernel.org/r/1657823002-139010-4-git-send-email-john.garry@huawei.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-07-18 23:04:12 -04:00
Xiang Chen
bc22f9c06c scsi: hisi_sas: Remove unnecessary variable to hold DMA map elements
Use slot->n_elem to store the return value of dma_map_sg() for SSP and SMP
IOs, and remove unnecessary variable n_elem_req.

Link: https://lore.kernel.org/r/1657823002-139010-3-git-send-email-john.garry@huawei.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-07-18 23:04:11 -04:00
John Garry
eed9f513bf scsi: hisi_sas: Call hisi_sas_slave_configure() from slave_configure_v3_hw()
There is duplicated code between slave_configure_v3_hw() and
hisi_sas_slave_configure(), so call common function
hisi_sas_slave_configure() from slave_configure_v3_hw().

Link: https://lore.kernel.org/r/1657823002-139010-2-git-send-email-john.garry@huawei.com
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-07-18 23:04:11 -04:00
Martin K. Petersen
11e50ed239 Merge branch '5.19/scsi-fixes' into 5.20/scsi-staging
Bring in fixes to resolve a merge conflict in the lpfc driver update.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-07-07 17:20:43 -04:00
John Garry
fce54ed027 scsi: hisi_sas: Limit max hw sectors for v3 HW
If the controller is behind an IOMMU then the IOMMU IOVA caching range can
affect performance, as discussed in [0].

Limit the max HW sectors to not exceed this limit. We need to hardcode the
value until a proper DMA mapping API is available.

[0] https://lore.kernel.org/linux-iommu/20210129092120.1482-1-thunder.leizhen@huawei.com/

Link: https://lore.kernel.org/r/1655988119-223714-1-git-send-email-john.garry@huawei.com
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-06-27 22:43:57 -04:00
Jiang Jian
e1397bc6ad scsi: hisi_sas: Align comments
Properly align comment lines in slot_index_alloc_quirk_v2_hw().

Link: https://lore.kernel.org/r/20220621072405.34394-1-jiangjian@cdjrlc.com
Signed-off-by: Jiang Jian <jiangjian@cdjrlc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-06-21 21:33:19 -04:00
John Garry
6c6ac8b777 scsi: hisi_sas: Fix memory ordering in hisi_sas_task_deliver()
The memories for the slot should be observed to be written prior to
observing the slot as ready.

Prior to commit 26fc0ea74fcb ("scsi: libsas: Drop SAS_TASK_AT_INITIATOR"),
we had a spin_lock() + spin_unlock() immediately before marking the slot as
ready. The spin_unlock() - with release semantics - caused the slot memory
to be observed to be written.

Now that the spin_lock() + spin_unlock() is gone, use a smp_wmb().

Link: https://lore.kernel.org/r/1652774661-12935-1-git-send-email-john.garry@huawei.com
Fixes: 26fc0ea74fcb ("scsi: libsas: Drop SAS_TASK_AT_INITIATOR")
Reported-by: Yihang Li <liyihang6@hisilicon.com>
Tested-by: Yihang Li <liyihang6@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-05-19 20:16:26 -04:00
John Garry
e9dedc13bb scsi: hisi_sas: Fix rescan after deleting a disk
Removing an ATA device via sysfs means that the device may not be found
through re-scanning:

root@ubuntu:/home/john# lsscsi
[0:0:0:0] disk SanDisk LT0200MO P404 /dev/sda
[0:0:1:0] disk ATA HGST HUS724040AL A8B0 /dev/sdb
[0:0:8:0] enclosu 12G SAS Expander RevB -
root@ubuntu:/home/john# echo 1 > /sys/block/sdb/device/delete
root@ubuntu:/home/john# echo "- - -" > /sys/class/scsi_host/host0/scan
root@ubuntu:/home/john# lsscsi
[0:0:0:0] disk SanDisk LT0200MO P404 /dev/sda
[0:0:8:0] enclosu 12G SAS Expander RevB -
root@ubuntu:/home/john#

The problem is that the rescan of the device may conflict with the device
in being re-initialized, as follows:

 - In the rescan we call hisi_sas_slave_alloc() in store_scan() ->
   sas_user_scan() -> [__]scsi_scan_target() -> scsi_probe_and_add_lunc()
   -> scsi_alloc_sdev() -> hisi_sas_slave_alloc() -> hisi_sas_init_device()
   In hisi_sas_init_device() we issue an IT nexus reset for ATA devices

 - That IT nexus causes the remote PHY to go down and this triggers a bcast
   event

 - In parallel libsas processes the bcast event, finds that the phy is down
   and marks the device as gone

The hard reset issued in hisi_sas_init_device() is unncessary - as
described in the code comment - so remove it. Also set dev status as
HISI_SAS_DEV_NORMAL as the hisi_sas_init_device() call.

Link: https://lore.kernel.org/r/1652354134-171343-4-git-send-email-john.garry@huawei.com
Fixes: 36c6b7613ef1 ("scsi: hisi_sas: Initialise devices in .slave_alloc callback")
Tested-by: Yihang Li <liyihang6@hisilicon.com>
Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-05-19 20:16:26 -04:00
John Garry
71453bd9d1 scsi: hisi_sas: Use sas_ata_wait_after_reset() in IT nexus reset
We have seen errors like this when a SATA device is probed:

[524.566298] hisi_sas_v3_hw 0000L74:02.0: erroneous completion iptt=4096 ...
[524.582827] sas: TMF task open reject failed 500e004aaaaaaaa00

Since commit 21c7e972475e ("scsi: hisi_sas: Disable SATA disk phy for
severe I_T nexus reset failure"), we issue an ATA softreset to disks after
a phy reset to ensure that they are in sound working order. If the
softreset is issued before the remote phy has come back up then the
softreset will fail (errors as above). Remedy this by waiting for the phy
to come back up after the reset.

Link: https://lore.kernel.org/r/1652354134-171343-3-git-send-email-john.garry@huawei.com
Tested-by: Yihang Li <liyihang6@hisilicon.com>
Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-05-19 20:16:26 -04:00
Xiang Chen
9b5387fe5a scsi: hisi_sas: Undo RPM resume for failed notify phy event for v3 HW
If we fail to notify the phy up event then undo the RPM resume, as the phy
up notify event handling pairs with that RPM resume.

Link: https://lore.kernel.org/r/1651839939-101188-1-git-send-email-john.garry@huawei.com
Reported-by: Yihang Li <liyihang6@hisilicon.com>
Tested-by: Yihang Li <liyihang6@hisilicon.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-05-10 21:47:55 -04:00
Dan Carpenter
066f4c3194 scsi: hisi_sas: Remove stray fallthrough annotation
This case statement doesn't fall through any more so remove the fallthrough
annotation.

Link: https://lore.kernel.org/r/20220317075214.GC25237@kili
Acked-by: John Garry <john.garry@huawei.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-03-29 23:53:03 -04:00
John Garry
095478a6e5 scsi: hisi_sas: Use libsas internal abort support
Use the common libsas internal abort functionality.

In addition, this driver has special handling for internal abort timeouts -
specifically whether to reset the controller in that instance, so extend
the API for that.

Timeout is now increased to 20 * Hz from 6 * Hz.

We also retry for failure now, but this should not make a difference.

Link: https://lore.kernel.org/r/1647001432-239276-5-git-send-email-john.garry@huawei.com
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-03-14 23:33:24 -04:00
Xingui Yang
62413199cd scsi: hisi_sas: Modify v3 HW SSP underflow error processing
In case of SSP underflow allow the response frame IU to be examined for
setting the response stat value rather than always setting
SAS_DATA_UNDERRUN.

This will mean that we call sas_ssp_task_response() in those scenarios and
may send sense data to upper layer.

Such a condition would be for bad blocks were we just reporting an
underflow error to upper layer, but now the sense data will tell
immediately that the media is faulty.

Link: https://lore.kernel.org/r/1645703489-87194-7-git-send-email-john.garry@huawei.com
Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Signed-off-by: Qi Liu <liuqi115@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-02-27 21:46:41 -05:00
Xiang Chen
286ce4c65f scsi: hisi_sas: Limit users changing debugfs BIST count value
Add a file operation for "cnt" file under bist directory, so users can only
read "cnt" or clear "cnt" to zero, but cannot randomly modify.

Link: https://lore.kernel.org/r/1645703489-87194-6-git-send-email-john.garry@huawei.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Qi Liu <liuqi115@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-02-27 21:46:40 -05:00
Qi Liu
86287065fa scsi: hisi_sas: Rename error labels in hisi_sas_v3_probe()
To avoid doubt, rename the error labels to indicate the action they will
take.

Link: https://lore.kernel.org/r/1645703489-87194-5-git-send-email-john.garry@huawei.com
Signed-off-by: Qi Liu <liuqi115@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-02-27 21:46:40 -05:00
Qi Liu
554fb72ee3 scsi: hisi_sas: Free irq vectors in order for v3 HW
If the driver probe fails to request the channel IRQ or fatal IRQ, the
driver will free the IRQ vectors before freeing the IRQs in free_irq(),
and this will cause a kernel BUG like this:

------------[ cut here ]------------
kernel BUG at drivers/pci/msi.c:369!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
Call trace:
   free_msi_irqs+0x118/0x13c
   pci_disable_msi+0xfc/0x120
   pci_free_irq_vectors+0x24/0x3c
   hisi_sas_v3_probe+0x360/0x9d0 [hisi_sas_v3_hw]
   local_pci_probe+0x44/0xb0
   work_for_cpu_fn+0x20/0x34
   process_one_work+0x1d0/0x340
   worker_thread+0x2e0/0x460
   kthread+0x180/0x190
   ret_from_fork+0x10/0x20
---[ end trace b88990335b610c11 ]---

So we use devm_add_action() to control the order in which we free the
vectors.

Link: https://lore.kernel.org/r/1645703489-87194-4-git-send-email-john.garry@huawei.com
Signed-off-by: Qi Liu <liuqi115@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-02-27 21:46:40 -05:00
Xiang Chen
512623de52 scsi: hisi_sas: Change hisi_sas_control_phy() phyup timeout
The time of phyup not only depends on the controller but also the type of
disk connected. As an example, from experience, for some SATA disks the
amount of time from reset/power-on to receive the D2H FIS for phyup can
take upto and more than 10s sometimes. According to the specification of
some SATA disks such as ST14000NM0018, the max time from power-on to ready
is 30s.

Based on this the current timeout of phyup at 2s which is not enough. So
set the value as HISI_SAS_WAIT_PHYUP_TIMEOUT (30s) in
hisi_sas_control_phy().

For v3 hw there is a pre-existing workaround for a HW bug, being that we
issue a link reset when the OOB occurs but the phyup does not. The current
phyup timeout is HISI_SAS_WAIT_PHYUP_TIMEOUT. So if this does occur from
when issuing a phy enable or similar via hisi_sas_control_phy(), the
subsequent HW workaround linkreset processing calls hisi_sas_control_phy(),
but this will pend the original phy reset timing out, so it is safe.

Link: https://lore.kernel.org/r/1645703489-87194-3-git-send-email-john.garry@huawei.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-02-27 21:46:40 -05:00
Xiang Chen
c4e070457a scsi: hisi_sas: Change permission of parameter prot_mask
Currently the permission of parameter prot_mask is 0x0, which means that
the member does not appear in sysfs. Change it as other module parameters
to 0444 for world-readable.

[mkp: s/v3/v2/]

Link: https://lore.kernel.org/r/1645703489-87194-2-git-send-email-john.garry@huawei.com
Fixes: d6a9000b81be ("scsi: hisi_sas: Add support for DIF feature for v2 hw")
Reported-by: Yihang Li <liyihang6@hisilicon.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-02-27 21:46:40 -05:00
Yang Li
07dd40b307 scsi: hisi_sas: Remove unnecessary print function dev_err()
The print function dev_err() is redundant because platform_get_irq()
already prints an error.

Eliminate the follow coccicheck warnings:
./drivers/scsi/hisi_sas/hisi_sas_v1_hw.c:1661:3-10: line 1661 is
redundant because platform_get_irq() already prints an error
./drivers/scsi/hisi_sas/hisi_sas_v1_hw.c:1642:4-11: line 1642 is
redundant because platform_get_irq() already prints an error
./drivers/scsi/hisi_sas/hisi_sas_v1_hw.c:1679:3-10: line 1679 is
redundant because platform_get_irq() already prints an error

Link: https://lore.kernel.org/r/20220215020524.44268-1-yang.lee@linux.alibaba.com
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Acked-by: John Garry <john.garry@huawei.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-02-22 21:11:08 -05:00
John Garry
3f2e252ef7 scsi: libsas: Add sas_execute_ata_cmd()
Add a function to execute an ATA command using the TMF code, and use in the
hisi_sas driver. That driver needs to be able to issue the command on a
specific phy, so add an interface for that.

With that, hisi_sas_exec_internal_tmf_task() may be deleted.

Link: https://lore.kernel.org/r/1645534259-27068-19-git-send-email-john.garry@huawei.com
Tested-by: Yihang Li <liyihang6@hisilicon.com>
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-02-22 21:11:02 -05:00
John Garry
4fea759edf scsi: libsas: Add sas_abort_task()
Add a generic implementation of abort task TMF handler, and use in LLDDs.

With that, some LLDDs custom TMF functions can now be deleted.

Link: https://lore.kernel.org/r/1645112566-115804-18-git-send-email-john.garry@huawei.com
Tested-by: Yihang Li <liyihang6@hisilicon.com>
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-02-19 15:59:36 -05:00
John Garry
72f8810e1f scsi: libsas: Add sas_query_task()
Add a generic implementation of query task TMF handler, and use in LLDDs.

Link: https://lore.kernel.org/r/1645112566-115804-17-git-send-email-john.garry@huawei.com
Tested-by: Yihang Li <liyihang6@hisilicon.com>
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-02-19 15:59:36 -05:00
John Garry
29d7769055 scsi: libsas: Add sas_lu_reset()
Add a generic implementation of LU reset TMF handler, and use in LLDDs.

Link: https://lore.kernel.org/r/1645112566-115804-16-git-send-email-john.garry@huawei.com
Tested-by: Yihang Li <liyihang6@hisilicon.com>
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-02-19 15:59:36 -05:00
John Garry
e858545295 scsi: libsas: Add sas_clear_task_set()
Add a generic implementation of clear task set TMF handler, and use in
LLDDs.

Link: https://lore.kernel.org/r/1645112566-115804-15-git-send-email-john.garry@huawei.com
Tested-by: Yihang Li <liyihang6@hisilicon.com>
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-02-19 15:59:36 -05:00
John Garry
69b80a0ed0 scsi: libsas: Add sas_abort_task_set()
Add a generic implementation of abort task set TMF handler, and use in
LLDDs.

Link: https://lore.kernel.org/r/1645112566-115804-14-git-send-email-john.garry@huawei.com
Tested-by: Yihang Li <liyihang6@hisilicon.com>
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-02-19 15:59:36 -05:00
John Garry
693e66a0a6 scsi: libsas: Add TMF handler aborted callback
The hisi_sas and pm8001 TMF handlers have some special processing for when
the TMF is aborted, so add a callback and fill it in for those drivers.

Link: https://lore.kernel.org/r/1645112566-115804-13-git-send-email-john.garry@huawei.com
Tested-by: Yihang Li <liyihang6@hisilicon.com>
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-02-19 15:59:35 -05:00
John Garry
96e54376a8 scsi: libsas: Add sas_task.tmf
Add a pointer to a sas_tmf_task to the sas_task struct, as this will be
used when the common LLDD TMF code is factored out.

Also set it for the LLDDs to store per-sas_task TMF info.

Link: https://lore.kernel.org/r/1645112566-115804-9-git-send-email-john.garry@huawei.com
Tested-by: Yihang Li <liyihang6@hisilicon.com>
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-02-19 15:59:35 -05:00
John Garry
bbfe82cdba scsi: libsas: Add struct sas_tmf_task
Some of the LLDDs which use libsas have their own definition of a struct
to hold TMF info, so add a common struct for libsas.

Also add an interim force phy id field for hisi_sas driver, which will be
removed once the STP "TMF" code is factored out.

Even though some LLDDs (pm8001) use a u32 for the tag, u16 will be adequate,
as that named driver only uses tags in range [0, 1024).

Link: https://lore.kernel.org/r/1645112566-115804-8-git-send-email-john.garry@huawei.com
Tested-by: Yihang Li <liyihang6@hisilicon.com>
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-02-19 15:59:35 -05:00
John Garry
da19eaba6e scsi: hisi_sas: Delete unused I_T_NEXUS_RESET_PHYUP_TIMEOUT
There is no user, so delete it.

Link: https://lore.kernel.org/r/1645112566-115804-6-git-send-email-john.garry@huawei.com
Tested-by: Yihang Li <liyihang6@hisilicon.com>
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-02-19 15:59:34 -05:00
John Garry
25882c82f8 scsi: libsas: Delete lldd_clear_aca callback
This callback is never called, so remove support.

Link: https://lore.kernel.org/r/1645112566-115804-4-git-send-email-john.garry@huawei.com
Tested-by: Yihang Li <liyihang6@hisilicon.com>
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-02-19 15:59:34 -05:00
Martin K. Petersen
ac2beb4e3b Merge branch '5.17/scsi-fixes' into 5.18/scsi-staging
Pull 5.17 fixes branch into 5.18 tree to resolve a few pm8001 driver
merge conflicts.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-02-14 21:51:29 -05:00
John Garry
26fc0ea74f scsi: libsas: Drop SAS_TASK_AT_INITIATOR
This flag is now only ever set, so delete it.

This also avoids a use-after-free in the pm8001 queue path, as reported in
the following:

https://lore.kernel.org/linux-scsi/c3cb7228-254e-9584-182b-007ac5e6fe0a@huawei.com/T/#m28c94c6d3ff582ec4a9fa54819180740e8bd4cfb

https://lore.kernel.org/linux-scsi/0cc0c435-b4f2-9c76-258d-865ba50a29dd@huawei.com/

[mkp: checkpatch + two SAS_TASK_AT_INITIATOR references]

Link: https://lore.kernel.org/r/1644489804-85730-3-git-send-email-john.garry@huawei.com
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-02-11 17:02:50 -05:00
John Garry
c763ec4c10 scsi: hisi_sas: Fix setting of hisi_sas_slot.is_internal
The hisi_sas_slot.is_internal member is not set properly for ATA commands
which the driver sends directly. A TMF struct pointer is normally used as a
test to set this, but it is NULL for those commands. It's not ideal, but
pass an empty TMF struct to set that member properly.

Link: https://lore.kernel.org/r/1643627607-138785-1-git-send-email-john.garry@huawei.com
Fixes: dc313f6b125b ("scsi: hisi_sas: Factor out task prep and delivery code")
Reported-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-01-31 16:50:59 -05:00
Christophe JAILLET
8001fa240f scsi: hisi_sas: Remove useless DMA-32 fallback configuration
As stated in [1], dma_set_mask() with a 64-bit mask never fails if
dev->dma_mask is non-NULL. So, if it fails, the 32-bit case will also fail
for the same reason.

Simplify code and remove some dead code accordingly.

[1]: https://lore.kernel.org/linux-kernel/YL3vSPK5DXTNvgdx@infradead.org/#t

Link: https://lore.kernel.org/r/1bf2d3660178b0e6f172e5208bc0bd68d31d9268.1642237482.git.christophe.jaillet@wanadoo.fr
Acked-by: John Garry <john.garry@huawei.com>
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-01-24 23:30:29 -05:00
Xiang Chen
5d9224fb07 scsi: hisi_sas: Remove unused variable and check in hisi_sas_send_ata_reset_each_phy()
In commit 29e2bac87421 ("scsi: hisi_sas: Fix some issues related to
asd_sas_port->phy_list"), we use asd_sas_port->phy_mask instead of
accessing asd_sas_port->phy_list, and it is enough to use
asd_sas_port->phy_mask to check the state of phy, so remove the unused
check and variable.

Link: https://lore.kernel.org/r/1641300126-53574-1-git-send-email-chenxiang66@hisilicon.com
Fixes: 29e2bac87421 ("scsi: hisi_sas: Fix some issues related to asd_sas_port->phy_list")
Reported-by: Nathan Chancellor <nathan@kernel.org>
Reported-by: Colin King <colin.i.king@gmail.com>
Acked-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-01-07 09:09:39 -05:00
Xiang Chen
b4cc094922 scsi: hisi_sas: Use autosuspend for the host controller
The controller may frequently enter and exit suspend for each I/O which we
need to deal with. This is inefficient and may cause too much suspend and
resume activity for the controller.  To avoid this, use a default 5s
autosuspend for the controller to stop frequently suspending and
resuming. This value may still be modified via sysfs interfaces.

Link: https://lore.kernel.org/r/1639999298-244569-16-git-send-email-chenxiang66@hisilicon.com
Acked-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-22 23:38:31 -05:00
Xiang Chen
ae9b69e85e scsi: hisi_sas: Keep controller active between ISR of phyup and the event being processed
It is possible that controller may become suspended between processing a
phyup interrupt and the event being processed by libsas. As such, we can't
ensure the controller is active when processing the phyup event - this may
cause the phyup event to be lost or other issues.  To avoid any possible
issues, add pm_runtime_get_noresume() in phyup interrupt handler and
pm_runtime_put_sync() in the work handler exit to ensure that we stay
always active. Since we only want to call pm_runtime_get_noresume() for v3
hw, signal this will a new event, HISI_PHYE_PHY_UP_PM.

Link: https://lore.kernel.org/r/1639999298-244569-14-git-send-email-chenxiang66@hisilicon.com
Acked-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-22 23:38:31 -05:00
Xiang Chen
97f4100939 scsi: hisi_sas: Add more logs for runtime suspend/resume
Add some logs at the beginning and end of suspend/resume.

Link: https://lore.kernel.org/r/1639999298-244569-9-git-send-email-chenxiang66@hisilicon.com
Acked-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-22 23:38:30 -05:00
Xiang Chen
29e2bac874 scsi: hisi_sas: Fix some issues related to asd_sas_port->phy_list
Most places that use asd_sas_port->phy_list are protected by spinlock
asd_sas_port->phy_list_lock, however there are still some places which miss
grabbing the lock. Add it in function hisi_sas_refresh_port_id() when
accessing asd_sas_port->phy_list. This carries a risk that list mutates
while at the same time dropping the lock in function
hisi_sas_send_ata_reset_each_phy(). Read asd_sas_port->phy_mask instead of
accessing asd_sas_port->phy_list to avoid this risk.

Link: https://lore.kernel.org/r/1639999298-244569-6-git-send-email-chenxiang66@hisilicon.com
Acked-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-22 23:38:29 -05:00
John Garry
6cc7390877 scsi: Revert "scsi: hisi_sas: Filter out new PHY up events during suspend"
This reverts commit b14a37e011d829404c29a5ae17849d7efb034893.

In that commit, we had to filter out phy-up events during suspend, as it
work cause a deadlock between processing the phyup event and the resume HA
function try to drain the HA event workqueue to complete the resume
process.

Now that we no longer try to drain the HA event queue during the HA resume
processor, the deadlock would not occur, so remove the special handling for
it.

Link: https://lore.kernel.org/r/1639999298-244569-3-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-22 23:38:29 -05:00
John Garry
fbefe22811 scsi: libsas: Don't always drain event workqueue for HA resume
For the hisi_sas driver, if a directly attached disk is removed during
suspend, a hang will occur in the resume process:

The background is that in commit 16fd4a7c5917 ("scsi: hisi_sas: Add device
link between SCSI devices and hisi_hba"), it is ensured that the HBA device
cannot be runtime suspended when any SCSI device associated is active.

Other drivers which use libsas don't worry about this as none support
runtime suspend.

The mentioned hang occurs when an disk is removed during suspend. In the
removal process - from PHYE_RESUME_TIMEOUT event processing - we call into
scsi_remove_device(), which is being processed in the HA event workqueue.
Here we wait for all suppliers of the SCSI device to resume, which includes
the HBA device (from the above commit). However the HBA device cannot
resume, as it is waiting for the PHYE_RESUME_TIMEOUT to be processed (from
calling sas_resume_ha() -> sas_drain_work()). This is the deadlock.

There does not appear to be any need for the sas_drain_work() to be called
at all in sas_resume_ha() as it is not syncing against anything, so allow
LLDDs to avoid this by providing a variant of sas_resume_ha() which does
"sync", i.e. doesn't drain the event workqueue.

Link: https://lore.kernel.org/r/1639999298-244569-2-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-22 23:38:29 -05:00
Qi Liu
37310bad7f scsi: hisi_sas: Fix phyup timeout on FPGA
The OOB interrupt and phyup interrupt handlers may run out-of-order in high
CPU usage scenarios. Since the hisi_sas_phy.timer is added in
hisi_sas_phy_oob_ready() and disarmed in phy_up_v3_hw(), this out-of-order
execution will cause hisi_sas_phy.timer timeout to trigger.

To solve, protect hisi_sas_phy.timer and .attached with a lock, and ensure
that the timer won't be added after phyup handler completes.

Link: https://lore.kernel.org/r/1639579061-179473-8-git-send-email-john.garry@huawei.com
Signed-off-by: Qi Liu <liuqi115@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-16 22:59:58 -05:00
Qi Liu
16775db613 scsi: hisi_sas: Prevent parallel FLR and controller reset
If we issue a controller reset command during executing a FLR a hung task
may be found:

 Call trace:
  __switch_to+0x158/0x1cc
  __schedule+0x2e8/0x85c
  schedule+0x7c/0x110
  schedule_timeout+0x190/0x1cc
  __down+0x7c/0xd4
  down+0x5c/0x7c
  hisi_sas_task_exec+0x510/0x680 [hisi_sas_main]
  hisi_sas_queue_command+0x24/0x30 [hisi_sas_main]
  smp_execute_task_sg+0xf4/0x23c [libsas]
  sas_smp_phy_control+0x110/0x1e0 [libsas]
  transport_sas_phy_reset+0xc8/0x190 [libsas]
  phy_reset_work+0x2c/0x40 [libsas]
  process_one_work+0x1dc/0x48c
  worker_thread+0x15c/0x464
  kthread+0x160/0x170
  ret_from_fork+0x10/0x18

This is a race condition which occurs when the FLR completes first.

Here the host HISI_SAS_RESETTING_BIT flag out gets of sync as
HISI_SAS_RESETTING_BIT is not always cleared with the hisi_hba.sem held, so
now only set/unset HISI_SAS_RESETTING_BIT under hisi_hba.sem .

Link: https://lore.kernel.org/r/1639579061-179473-7-git-send-email-john.garry@huawei.com
Signed-off-by: Qi Liu <liuqi115@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-16 22:59:58 -05:00
Qi Liu
20c634932a scsi: hisi_sas: Prevent parallel controller reset and control phy command
A user may issue a control phy command from sysfs at any time, even if the
controller is resetting.

If a phy is disabled by hardreset/linkreset command before calling
get_phys_state() in the reset path, the saved phy state may be incorrect.

To avoid incorrectly recording the phy state, use hisi_hba.sem to ensure
that the controller reset may not run at the same time as when the phy
control function is running.

Link: https://lore.kernel.org/r/1639579061-179473-6-git-send-email-john.garry@huawei.com
Signed-off-by: Qi Liu <liuqi115@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-16 22:59:57 -05:00