IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
John Garry <john.g.garry@oracle.com> says:
This series tidies-up libsas a bit, including:
- delete structure(s) with only one member
- delete structure members which are only ever set
- delete structure members which are never set and code which relies on
that member being set
This conflicts with the following series:
https://lore.kernel.org/linux-scsi/20230809132249.37948-1-yuehaibing@huawei.com/
Any conflict should be trivial to resolve.
Based on mkp-scsi staging at a18e81d17a7e ("scsi: ufs: ufs-pci: Add support for QEMU")
Link: https://lore.kernel.org/r/20230815115156.343535-1-john.g.garry@oracle.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Since libsas was introduced in commit 2908d778ab3e ("[SCSI] aic94xx: new
driver"), sas_ssp_task.task_prio is never set, so delete it and any
references which depend on it being set (all of them).
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20230815115156.343535-8-john.g.garry@oracle.com
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Since commit 79855d178557 ("libsas: remove task_collector mode"), struct
scsi_core only contains a reference to the shost. struct scsi_core is only
used in sas_ha_struct.core, so delete scsi_core and replace with a
reference to the shost there.
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20230815115156.343535-5-john.g.garry@oracle.com
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Since libsas was introduced in commit 2908d778ab3e ("[SCSI] aic94xx: new
driver"), sas_ha_struct.lldd_module has only ever been set, so remove it.
Struct scsi_host_template already has a reference to the LLD driver
module as to stop the driver being removed unexpectedly.
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20230815115156.343535-2-john.g.garry@oracle.com
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
LKP reports below warning when building for RISC-V with randconfig
configuration.
drivers/scsi/hisi_sas/hisi_sas_v3_hw.c:4567:35: sparse:
sparse: incorrect type in argument 4 (different base types)
@@ expected restricted __le32 [usertype] *[assigned] ptr
@@ got unsigned int * @@
Type cast to fix this warning.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202307260823.whMNpZ1C-lkp@intel.com/
Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
Link: https://lore.kernel.org/r/20230726051759.30038-1-sunilvl@ventanamicro.com
Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When FIO and debugfs snapshot occur concurrently, some SATA I/Os are failed
to return to the upper layer due to the setting of HISI_SAS_REJECT_CMD_BIT.
Then the SCSI layer invokes the error processing thread. However,
sas_ata_hard_reset() in EH also fails to be reset due to the setting of
HISI_SAS_REJECT_CMD_BIT. As a result, the device is disabled.
Calling scsi_block_requests() in the front of a debugfs snapshot and wait
command complete before setting HISI_SAS_REJECT_CMD_BIT to avoid SATA I/O
failures.
Signed-off-by: Yihang Li <liyihang9@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Link: https://lore.kernel.org/r/1689045300-44318-3-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The PIO read command has no response frame and the struct iu[1024] won't be
filled. I/Os which are normally completed will be treated as failed in
sas_ata_task_done() when iu contains abnormal dirty data.
Consequently ending_fis should not be filled by iu when the response frame
hasn't been written to memory.
Fixes: d380f55503ed ("scsi: hisi_sas: Don't bother clearing status buffer IU in task prep")
Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Link: https://lore.kernel.org/r/1689045300-44318-2-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
DMA setup lock timeout protection is added when DMA setup frames are
received. It's a function outside the protocol and used to prevent SATA
disk I/Os from being delivered for a long time. The default value is 100ms,
it's too strict and easily triggered timeout when the disk is overloaded or
faulty. Based on the average I/O latency of 300 disks, we adjust the value
to 2.5s.
Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Signed-off-by: Yihang Li <liyihang9@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Link: https://lore.kernel.org/r/1684118481-95908-3-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
For SAS HBAs of 920 and previous version, we use init_reg_v3_hw() to set
some registers which are related to HW boards. For SAS HBAs of 920B and
later version, those HW registers are set through firmware. And different
HBA models are distinguished through pci_dev->revision.
Signed-off-by: Yihang Li <liyihang9@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Link: https://lore.kernel.org/r/1684118481-95908-2-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The suspend/resume functions in this driver seem to have multiple problems,
the latest one just got introduced by a bugfix:
drivers/scsi/hisi_sas/hisi_sas_v3_hw.c: In function '_suspend_v3_hw':
drivers/scsi/hisi_sas/hisi_sas_v3_hw.c:5142:39: error: 'struct dev_pm_info' has no member named 'usage_count'
5142 | if (atomic_read(&device->power.usage_count)) {
drivers/scsi/hisi_sas/hisi_sas_v3_hw.c: In function '_suspend_v3_hw':
drivers/scsi/hisi_sas/hisi_sas_v3_hw.c:5142:39: error: 'struct dev_pm_info' has no member named 'usage_count'
5142 | if (atomic_read(&device->power.usage_count)) {
As far as I can tell, the 'usage_count' is not meant to be accessed by
device drivers at all, though I don't know what the driver is supposed to
do instead.
Another problem is the use of the deprecated UNIVERSAL_DEV_PM_OPS(), and
marking functions as __maybe_unused to avoid warnings about unused
functions. This should probably be changed to using
DEFINE_RUNTIME_DEV_PM_OPS().
Both changes require actually understanding what the driver needs to do,
and being able to test this, so instead here is the simplest patch to make
it pass the randconfig builds instead.
Fixes: e368d38cb952 ("scsi: hisi_sas: Exit suspend state when usage count is greater than 0")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20230405083611.3376739-1-arnd@kernel.org
Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
chenxiang <chenxiang66@hisilicon.com> says:
This series contain some fixes including:
- Grab sas_dev lock when traversing sas_dev list to avoid NULL
pointer
- Handle NCQ error when IPTT is valid
- Ensure all enabled PHYs up during controller reset
- Exit suspend state when usage count of runtime PM is greater than 0
https://lore.kernel.org/r/1679283265-115066-1-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When the current status of the host controller is suspended, enabling a
local PHY just after disabling all local PHYs in expander environment, a
hang as follows occurs:
[ 486.854655] INFO: task kworker/u256:1:899 blocked for more than 120 seconds.
[ 486.862207] Not tainted 6.1.0-rc4+ #1
[ 486.870545] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 486.878893] task:kworker/u256:1 state:D stack:0 pid:899 ppid:2 flags:0x00000008
[ 486.887745] Workqueue: 0000:74:02.0_disco_q sas_discover_domain [libsas]
[ 486.894704] Call trace:
[ 486.897400] __switch_to+0xf0/0x170
[ 486.901146] __schedule+0x3e4/0x1160
[ 486.904970] schedule+0x64/0x104
[ 486.908442] rpm_resume+0x158/0x6a0
[ 486.912163] __pm_runtime_resume+0x5c/0x84
[ 486.916489] smp_execute_task_sg+0x1f8/0x264 [libsas]
[ 486.921773] sas_discover_expander.part.0+0xbc/0x720 [libsas]
[ 486.927750] sas_discover_root_expander+0x90/0x154 [libsas]
[ 486.933552] sas_discover_domain+0x444/0x6d0 [libsas]
[ 486.938826] process_one_work+0x1e0/0x450
[ 486.943057] worker_thread+0x150/0x44c
[ 486.947015] kthread+0x114/0x120
[ 486.950447] ret_from_fork+0x10/0x20
[ 486.954292] INFO: task kworker/u256:2:1780 blocked for more than 120 seconds.
[ 486.961637] Not tainted 6.1.0-rc4+ #1
[ 486.966087] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 486.974356] task:kworker/u256:2 state:D stack:0 pid:1780 ppid:2 flags:0x00000208
[ 486.983141] Workqueue: 0000:74:02.0_event_q sas_port_event_worker [libsas]
[ 486.990252] Call trace:
[ 486.992930] __switch_to+0xf0/0x170
[ 486.996645] __schedule+0x3e4/0x1160
[ 487.000439] schedule+0x64/0x104
[ 487.003886] schedule_timeout+0x17c/0x1c0
[ 487.008102] wait_for_completion+0x7c/0x160
[ 487.012488] __flush_workqueue+0x104/0x3e0
[ 487.016782] sas_porte_bytes_dmaed+0x414/0x454 [libsas]
[ 487.022203] sas_port_event_worker+0x38/0x60 [libsas]
[ 487.027449] process_one_work+0x1e0/0x450
[ 487.031645] worker_thread+0x150/0x44c
[ 487.035594] kthread+0x114/0x120
[ 487.039017] ret_from_fork+0x10/0x20
[ 487.042828] INFO: task bash:11488 blocked for more than 121 seconds.
[ 487.049366] Not tainted 6.1.0-rc4+ #1
[ 487.053746] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 487.061953] task:bash state:D stack:0 pid:11488 ppid:10977 flags:0x00000204
[ 487.070698] Call trace:
[ 487.073355] __switch_to+0xf0/0x170
[ 487.077050] __schedule+0x3e4/0x1160
[ 487.080833] schedule+0x64/0x104
[ 487.084270] schedule_timeout+0x17c/0x1c0
[ 487.088474] wait_for_completion+0x7c/0x160
[ 487.092851] __flush_workqueue+0x104/0x3e0
[ 487.097137] drain_workqueue+0xb8/0x160
[ 487.101159] __sas_drain_work+0x50/0x90 [libsas]
[ 487.105963] sas_suspend_ha+0x64/0xd4 [libsas]
[ 487.110590] suspend_v3_hw+0x198/0x1e8 [hisi_sas_v3_hw]
[ 487.115989] pci_pm_runtime_suspend+0x5c/0x1d0
[ 487.120606] __rpm_callback+0x50/0x150
[ 487.124535] rpm_callback+0x74/0x80
[ 487.128204] rpm_suspend+0x110/0x640
[ 487.131955] rpm_idle+0x1f4/0x2d0
[ 487.135447] __pm_runtime_idle+0x58/0x94
[ 487.139538] queue_phy_enable+0xcc/0xf0 [libsas]
[ 487.144330] store_sas_phy_enable+0x74/0x100
[ 487.148770] dev_attr_store+0x20/0x34
[ 487.152606] sysfs_kf_write+0x4c/0x5c
[ 487.156437] kernfs_fop_write_iter+0x120/0x1b0
[ 487.161049] vfs_write+0x2d0/0x36c
[ 487.164625] ksys_write+0x70/0x100
[ 487.168194] __arm64_sys_write+0x24/0x30
[ 487.172280] invoke_syscall+0x50/0x120
[ 487.176186] el0_svc_common.constprop.0+0x168/0x190
[ 487.181214] do_el0_svc+0x34/0xc0
[ 487.184680] el0_svc+0x2c/0xb4
[ 487.187879] el0t_64_sync_handler+0xb8/0xbc
[ 487.192205] el0t_64_sync+0x19c/0x1a0
We find that when all local PHYs are disabled, all the devices will be
removed, the ->runtime_suspend() callback suspend_v3_hw() directly execute
since the controller usage count drop to 0. On the other side, the first
local PHY is enabled through the sysfs interface, and ensures that function
phy_up_v3_hw() is completed due to suspend_v3_hw()->
interrupt_disable_v3_hw(). In the expander scenario,
sas_discover_root_expander() is executed in event work
DISCE_DISCOVER_DOMAIN, which will increases the controller usage count and
carry out a resume and sends SMPIO, it cannot be completed because the
runtime PM status of the controller is RPM_SUSPENDING. At the same time,
the ->runtime_suspend() callback suspend_v3_hw() also cannot complete the
process because of drain libsas event queue in sas_suspend_ha(), so hung
occurs.
(thread 1) | (thread 2)
... |
rpm_idle() |
... |
__update_runtime_status(RPM_SUSPENDING)|
... | ...
suspend_v3_hw() | smp_execute_task_sg()
... | ...
interrupt_disable_v3_hw() | pm_runtime_get_sync()
| ...
... | rpm_resume() //RPM_SUSPENDING
|
__sas_drain_work() |
To fix this, check if the current runtime PM status of the controller
allows to be suspended continue after interrupt_disable_v3_hw(), return
immediately if not.
Signed-off-by: Yihang Li <liyihang9@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hislicon.com>
Link: https://lore.kernel.org/r/1679283265-115066-5-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If an NCQ error occurs when the IPTT is valid and slot->abort flag is set
in completion path, sas_task_abort() will be called to abort only one NCQ
command now, and the host would be set to SHOST_RECOVERY state. But this
may not kick-off EH Immediately until other outstanding QCs timeouts. As a
result, the host may remain in the SHOST_RECOVERY state for up to 30
seconds, such as follows:
[7972317.645234] hisi_sas_v3_hw 0000:74:04.0: erroneous completion iptt=3264 task=00000000466116b8 dev id=2 sas_addr=0x5000000000000502 CQ hdr: 0x1883 0x20cc0 0x40000 0x20420000 Error info: 0x0 0x0 0x200000 0x0
[7972341.508264] sas: Enter sas_scsi_recover_host busy: 32 failed: 32
[7972341.984731] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 32 tries: 1
All NCQ commands that are in the queue should be aborted when an NCQ error
occurs in this scenario.
Fixes: 05d91b557af9 ("scsi: hisi_sas: Directly trigger SCSI error handling for completion errors")
Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Link: https://lore.kernel.org/r/1679283265-115066-3-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Bart Van Assche <bvanassche@acm.org> says:
It helps humans and the compiler if it is made explicit that SCSI host
templates are not modified. Hence this patch series that constifies most
SCSI host templates. Please consider this patch series for the next merge
window.
Link: https://lore.kernel.org/r/20230322195515.1267197-1-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Make it explicit that the SCSI host template is not modified.
Acked-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20230322195515.1267197-42-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently we sync irq to avoid freeing task before using task in I/O
completion. After adding io_uring support, we need to do something similar
for poll queues. As the process of CQ entries on poll queue are protected
by spinlock cq->lock, we can use spin_lock() + spin_unlock() on cq->lock to
make sure that CQ entries are processed to completion and then the complete
queue is synced.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Link: https://lore.kernel.org/r/1678169355-76215-4-git-send-email-chenxiang66@hisilicon.com
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add a module parameter to set how many queues are used for iopoll. Also
fill the interface mq_poll. For internal I/Os from libsas and libata we use
non-iopoll queue (queue 0) to deliver and complete them. But for internal
abort I/Os, just don't send them for poll queues.
There is still a risk associated as this sends internal abort commands to
non-iopoll queues which actually requires sending an internal abort command
to every queue. As a result, make the module parameter as "experimental"
for now.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Link: https://lore.kernel.org/r/1678169355-76215-3-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Put the work of processing cq slots in a separate function,
complete_v3_hw(), which can then be used by cq_thread_v3_hw() and other
functions when adding poll support.
Co-developed-by: John Garry <john.garry@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Link: https://lore.kernel.org/r/1678169355-76215-2-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In case devm_add_action() fails, check it in the caller of
interrupt_preinit_v3_hw().
Link: https://lore.kernel.org/r/20230227031030.893324-1-void0red@gmail.com
Signed-off-by: Kang Chen <void0red@gmail.com>
Acked-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When an NCQ error occurs, the controller will abnormally complete the I/Os
that are newly delivered to disk, and bit8 in CQ dw3 will be set which
indicates that the SATA disk is in error state. The current processing flow
is to set ts->stat to SAS_OPEN_REJECT and then sas_ata_task_done() will set
FIS stat to ATA_ERR. After analyzing the I/O by ata_eh_analyze_tf(),
err_mask will set to AC_ERR_HSM. If media error occurs for four times
within 10 minutes and the chip rejects new I/Os for four times, NCQ will be
disabled due to excessive errors, which is undesirable.
Therefore, use sas_task_abort() to handle abnormally completed I/Os when
SATA disk is in error state, as these abnormally completed I/Os are already
processed by sas_ata_device_link_abort() and qc->flag are set to
ATA_QCFLAG_FAILED. If sas_task_abort() is used, qc->err_mask will not be
modified in EH. Unlike the current process flow, it will not increase the
count of ECAT_TOUT_HSM and not turn off NCQ. Like other I/Os on the disk
that do not have an error but do not return after the NCQ error, they are
retried after the EH.
Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Link: https://lore.kernel.org/r/1665998435-199946-5-git-send-email-john.garry@huawei.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When CQ header dw3 SATA_DISK_ERR is set it means this SATA disk is in error
state and the current IPTT is invalid. An invalid IPTT does not correspond
to any slot.
In this scenario, new I/Os that delivered to disk will be rejected by the
controller and all I/Os remaining in the disk should be aborted, which we
add here with the sas_ata_device_link_abort() call.
In hisi_sas_abort_task() we don't want to issue a soft reset as it may
cause info to be lost in the target disk for the ATA EH autopsy. In this
case, just release resources - the disk won't return other I/Os normally
after NCQ Error, so this is safe.
Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Link: https://lore.kernel.org/r/1665998435-199946-4-git-send-email-john.garry@huawei.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Updates to the usual drivers (qla2xxx, lpfc, ufs, hisi_sas, mpi3mr,
mpt3sas, target); the biggest change (from my biased viewpoint) being
that the mpi3mr now attached to the SAS transport class, making it the
first fusion type device to do so. Beyond the usual bug fixing and
security class reworks, there aren't a huge number of core changes.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCY0B74yYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishW2NAP9CPp2R
7NRmSyxcyVYvtCNUW3WxXh65Gn+KgArmg8XucgEAhUBX1fSjOzpERWEU+UaXitbE
Rb+FbjxSc5YxR+nJ/Qc=
=0Wlp
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"Updates to the usual drivers (qla2xxx, lpfc, ufs, hisi_sas, mpi3mr,
mpt3sas, target). The biggest change (from my biased viewpoint) being
that the mpi3mr now attached to the SAS transport class, making it the
first fusion type device to do so.
Beyond the usual bug fixing and security class reworks, there aren't a
huge number of core changes"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (141 commits)
scsi: iscsi: iscsi_tcp: Fix null-ptr-deref while calling getpeername()
scsi: mpi3mr: Remove unnecessary cast
scsi: stex: Properly zero out the passthrough command structure
scsi: mpi3mr: Update driver version to 8.2.0.3.0
scsi: mpi3mr: Fix scheduling while atomic type bug
scsi: mpi3mr: Scan the devices during resume time
scsi: mpi3mr: Free enclosure objects during driver unload
scsi: mpi3mr: Handle 0xF003 Fault Code
scsi: mpi3mr: Graceful handling of surprise removal of PCIe HBA
scsi: mpi3mr: Schedule IRQ kthreads only on non-RT kernels
scsi: mpi3mr: Support new power management framework
scsi: mpi3mr: Update mpi3 header files
scsi: mpt3sas: Revert "scsi: mpt3sas: Fix ioc->base_readl() use"
scsi: mpt3sas: Revert "scsi: mpt3sas: Fix writel() use"
scsi: wd33c93: Remove dead code related to the long-gone config WD33C93_PIO
scsi: core: Add I/O timeout count for SCSI device
scsi: qedf: Populate sysfs attributes for vport
scsi: pm8001: Replace one-element array with flexible-array member
scsi: 3w-xxxx: Replace one-element array with flexible-array member
scsi: hptiop: Replace one-element array with flexible-array member in struct hpt_iop_request_ioctl_command()
...
Now that libsas and the SCSI core code limits the default sectors from
commit 4cbfca5f7750 ("scsi: scsi_transport_sas: cap shost opt_sectors
according to DMA optimal limit") and commit 608128d391fa ("scsi: sd: allow
max_sectors be capped at DMA optimal size limit"), there is no need for
the hack to limit the max HW sectors.
Link: https://lore.kernel.org/r/1662378529-101489-2-git-send-email-john.garry@huawei.com
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Since blk_mq_map_queues() and the .map_queues() callbacks always return 0,
change their return type into void. Most callers ignore the returned value
anyway.
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Doug Gilbert <dgilbert@interlog.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: John Garry <john.garry@huawei.com>
Acked-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Link: https://lore.kernel.org/r/20220815170043.19489-3-bvanassche@acm.org
[axboe: fold in fix from Bart]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If the I/O completion response frame returned by the target device has been
written to the host memory and the err bit in the status field of the
received fis is 1, ts->stat should set to SAS_PROTO_RESPONSE, and this will
let EH analyze and further determine cause of failure.
Link: https://lore.kernel.org/r/1657823002-139010-5-git-send-email-john.garry@huawei.com
Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently SMP tasks are DMA unmapped only when cq of SMP I/O is returned
normally. If the cq of SMP I/O is returned with exception actually SMP TAS
is never unmapped. Relocate DMA unmap of SMP task to fix the issue.
Link: https://lore.kernel.org/r/1657823002-139010-4-git-send-email-john.garry@huawei.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
There is duplicated code between slave_configure_v3_hw() and
hisi_sas_slave_configure(), so call common function
hisi_sas_slave_configure() from slave_configure_v3_hw().
Link: https://lore.kernel.org/r/1657823002-139010-2-git-send-email-john.garry@huawei.com
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If we fail to notify the phy up event then undo the RPM resume, as the phy
up notify event handling pairs with that RPM resume.
Link: https://lore.kernel.org/r/1651839939-101188-1-git-send-email-john.garry@huawei.com
Reported-by: Yihang Li <liyihang6@hisilicon.com>
Tested-by: Yihang Li <liyihang6@hisilicon.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Use the common libsas internal abort functionality.
In addition, this driver has special handling for internal abort timeouts -
specifically whether to reset the controller in that instance, so extend
the API for that.
Timeout is now increased to 20 * Hz from 6 * Hz.
We also retry for failure now, but this should not make a difference.
Link: https://lore.kernel.org/r/1647001432-239276-5-git-send-email-john.garry@huawei.com
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In case of SSP underflow allow the response frame IU to be examined for
setting the response stat value rather than always setting
SAS_DATA_UNDERRUN.
This will mean that we call sas_ssp_task_response() in those scenarios and
may send sense data to upper layer.
Such a condition would be for bad blocks were we just reporting an
underflow error to upper layer, but now the sense data will tell
immediately that the media is faulty.
Link: https://lore.kernel.org/r/1645703489-87194-7-git-send-email-john.garry@huawei.com
Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Signed-off-by: Qi Liu <liuqi115@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add a file operation for "cnt" file under bist directory, so users can only
read "cnt" or clear "cnt" to zero, but cannot randomly modify.
Link: https://lore.kernel.org/r/1645703489-87194-6-git-send-email-john.garry@huawei.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Qi Liu <liuqi115@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If the driver probe fails to request the channel IRQ or fatal IRQ, the
driver will free the IRQ vectors before freeing the IRQs in free_irq(),
and this will cause a kernel BUG like this:
------------[ cut here ]------------
kernel BUG at drivers/pci/msi.c:369!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
Call trace:
free_msi_irqs+0x118/0x13c
pci_disable_msi+0xfc/0x120
pci_free_irq_vectors+0x24/0x3c
hisi_sas_v3_probe+0x360/0x9d0 [hisi_sas_v3_hw]
local_pci_probe+0x44/0xb0
work_for_cpu_fn+0x20/0x34
process_one_work+0x1d0/0x340
worker_thread+0x2e0/0x460
kthread+0x180/0x190
ret_from_fork+0x10/0x20
---[ end trace b88990335b610c11 ]---
So we use devm_add_action() to control the order in which we free the
vectors.
Link: https://lore.kernel.org/r/1645703489-87194-4-git-send-email-john.garry@huawei.com
Signed-off-by: Qi Liu <liuqi115@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently the permission of parameter prot_mask is 0x0, which means that
the member does not appear in sysfs. Change it as other module parameters
to 0444 for world-readable.
[mkp: s/v3/v2/]
Link: https://lore.kernel.org/r/1645703489-87194-2-git-send-email-john.garry@huawei.com
Fixes: d6a9000b81be ("scsi: hisi_sas: Add support for DIF feature for v2 hw")
Reported-by: Yihang Li <liyihang6@hisilicon.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Some of the LLDDs which use libsas have their own definition of a struct
to hold TMF info, so add a common struct for libsas.
Also add an interim force phy id field for hisi_sas driver, which will be
removed once the STP "TMF" code is factored out.
Even though some LLDDs (pm8001) use a u32 for the tag, u16 will be adequate,
as that named driver only uses tags in range [0, 1024).
Link: https://lore.kernel.org/r/1645112566-115804-8-git-send-email-john.garry@huawei.com
Tested-by: Yihang Li <liyihang6@hisilicon.com>
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The controller may frequently enter and exit suspend for each I/O which we
need to deal with. This is inefficient and may cause too much suspend and
resume activity for the controller. To avoid this, use a default 5s
autosuspend for the controller to stop frequently suspending and
resuming. This value may still be modified via sysfs interfaces.
Link: https://lore.kernel.org/r/1639999298-244569-16-git-send-email-chenxiang66@hisilicon.com
Acked-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
It is possible that controller may become suspended between processing a
phyup interrupt and the event being processed by libsas. As such, we can't
ensure the controller is active when processing the phyup event - this may
cause the phyup event to be lost or other issues. To avoid any possible
issues, add pm_runtime_get_noresume() in phyup interrupt handler and
pm_runtime_put_sync() in the work handler exit to ensure that we stay
always active. Since we only want to call pm_runtime_get_noresume() for v3
hw, signal this will a new event, HISI_PHYE_PHY_UP_PM.
Link: https://lore.kernel.org/r/1639999298-244569-14-git-send-email-chenxiang66@hisilicon.com
Acked-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
For the hisi_sas driver, if a directly attached disk is removed during
suspend, a hang will occur in the resume process:
The background is that in commit 16fd4a7c5917 ("scsi: hisi_sas: Add device
link between SCSI devices and hisi_hba"), it is ensured that the HBA device
cannot be runtime suspended when any SCSI device associated is active.
Other drivers which use libsas don't worry about this as none support
runtime suspend.
The mentioned hang occurs when an disk is removed during suspend. In the
removal process - from PHYE_RESUME_TIMEOUT event processing - we call into
scsi_remove_device(), which is being processed in the HA event workqueue.
Here we wait for all suppliers of the SCSI device to resume, which includes
the HBA device (from the above commit). However the HBA device cannot
resume, as it is waiting for the PHYE_RESUME_TIMEOUT to be processed (from
calling sas_resume_ha() -> sas_drain_work()). This is the deadlock.
There does not appear to be any need for the sas_drain_work() to be called
at all in sas_resume_ha() as it is not syncing against anything, so allow
LLDDs to avoid this by providing a variant of sas_resume_ha() which does
"sync", i.e. doesn't drain the event workqueue.
Link: https://lore.kernel.org/r/1639999298-244569-2-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The OOB interrupt and phyup interrupt handlers may run out-of-order in high
CPU usage scenarios. Since the hisi_sas_phy.timer is added in
hisi_sas_phy_oob_ready() and disarmed in phy_up_v3_hw(), this out-of-order
execution will cause hisi_sas_phy.timer timeout to trigger.
To solve, protect hisi_sas_phy.timer and .attached with a lock, and ensure
that the timer won't be added after phyup handler completes.
Link: https://lore.kernel.org/r/1639579061-179473-8-git-send-email-john.garry@huawei.com
Signed-off-by: Qi Liu <liuqi115@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If we issue a controller reset command during executing a FLR a hung task
may be found:
Call trace:
__switch_to+0x158/0x1cc
__schedule+0x2e8/0x85c
schedule+0x7c/0x110
schedule_timeout+0x190/0x1cc
__down+0x7c/0xd4
down+0x5c/0x7c
hisi_sas_task_exec+0x510/0x680 [hisi_sas_main]
hisi_sas_queue_command+0x24/0x30 [hisi_sas_main]
smp_execute_task_sg+0xf4/0x23c [libsas]
sas_smp_phy_control+0x110/0x1e0 [libsas]
transport_sas_phy_reset+0xc8/0x190 [libsas]
phy_reset_work+0x2c/0x40 [libsas]
process_one_work+0x1dc/0x48c
worker_thread+0x15c/0x464
kthread+0x160/0x170
ret_from_fork+0x10/0x18
This is a race condition which occurs when the FLR completes first.
Here the host HISI_SAS_RESETTING_BIT flag out gets of sync as
HISI_SAS_RESETTING_BIT is not always cleared with the hisi_hba.sem held, so
now only set/unset HISI_SAS_RESETTING_BIT under hisi_hba.sem .
Link: https://lore.kernel.org/r/1639579061-179473-7-git-send-email-john.garry@huawei.com
Signed-off-by: Qi Liu <liuqi115@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>