Commit Graph

24237 Commits

Author SHA1 Message Date
Nilesh Javali
641671d97b Revert "scsi: qla2xxx: Fix buffer overrun"
Revert due to Get PLOGI Template failed.
This reverts commit b68710a809.

Cc: stable@vger.kernel.org
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230821130045.34850-9-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-08-21 17:45:15 -04:00
Nilesh Javali
b496953dd0 scsi: qla2xxx: Fix smatch warn for qla_init_iocb_limit()
Fix indentation for warning reported by smatch:

drivers/scsi/qla2xxx/qla_init.c:4199 qla_init_iocb_limit() warn: inconsistent indenting

Fixes: efa74a62aa ("scsi: qla2xxx: Adjust IOCB resource on qpair create")
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230821130045.34850-8-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-08-21 17:45:15 -04:00
Manish Rangankar
e9105c4b7a scsi: qla2xxx: Remove unsupported ql2xenabledif option
User accidently passed module parameter ql2xenabledif=1 which is
unsupported. However, driver still initialized which lead to guard tag
errors during device discovery.

Remove unsupported ql2xenabledif=1 option and validate the user input.

Cc: stable@vger.kernel.org
Signed-off-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230821130045.34850-7-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-08-21 17:45:15 -04:00
Quinn Tran
0ba0b018f9 scsi: qla2xxx: Error code did not return to upper layer
TMF was returned with an error code. The error code was not preserved to be
returned to upper layer. Instead, the error code from the Marker was
returned.

Preserve error code from TMF and return it to upper layer.

Cc: stable@vger.kernel.org
Fixes: da7c21b72a ("scsi: qla2xxx: Fix command flush during TMF")
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230821130045.34850-6-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-08-21 17:45:14 -04:00
Bikash Hazarika
cd248a95f8 scsi: qla2xxx: Add logs for SFP temperature monitoring
Add logs for SFP Temperature Alert async event to check if laser is
enabled/disabled.

Signed-off-by: Bikash Hazarika <bhazarika@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230821130045.34850-5-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-08-21 17:45:14 -04:00
Quinn Tran
e370b64c7d scsi: qla2xxx: Fix firmware resource tracking
The storage was not draining I/Os and the work load was not spread out
across different CPUs evenly. This led to firmware resource counters
getting overrun on the busy CPU. This overrun prevented error recovery from
happening in a timely manner.

By switching the counter to atomic, it allows the count to be little more
accurate to prevent the overrun.

Cc: stable@vger.kernel.org
Fixes: da7c21b72a ("scsi: qla2xxx: Fix command flush during TMF")
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230821130045.34850-4-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-08-21 17:45:14 -04:00
Quinn Tran
6d0b65569c scsi: qla2xxx: Flush mailbox commands on chip reset
Fix race condition between Interrupt thread and Chip reset thread in trying
to flush the same mailbox. With the race condition, the "ha->mbx_intr_comp"
will get an extra complete() call. The extra complete call create erroneous
mailbox timeout condition when the next mailbox is sent where the mailbox
call does not wait for interrupt to arrive. Instead, it advances without
waiting.

Add lock protection around the check for mailbox completion.

Cc: stable@vger.kernel.org
Fixes: b2000805a9 ("scsi: qla2xxx: Flush mailbox commands on chip reset")
Signed-off-by: Quinn Tran <quinn.tran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230821130045.34850-3-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-08-21 17:45:14 -04:00
Manish Rangankar
875386b988 scsi: qla2xxx: Add Unsolicited LS Request and Response Support for NVMe
Introduce infrastructure in the driver to support the processing of
unsolicited LS (Link Service) requests. This will involve the utilization
of a new pass-up of unsolicited FC-NVMe request IOCB interface. Unsolicited
requests will be submitted to the NVMe transport layer through
nvme_fc_rcv_ls_req(). Any received LS responses, which are sent using
xmt_ls_rsp(), will be forwarded to the firmware through the existing
Pass-Through IOCB interface, responsible for sending FC-NVMe Link Service
requests and responses.

Signed-off-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230821130045.34850-2-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-08-21 17:45:14 -04:00
Quinn Tran
ae25f65a35 scsi: qla2xxx: Allow 32-byte CDBs
System crashes when a 32-byte CDB was sent to a non T10 PI disk:

[  177.143279]  ? qla2xxx_dif_start_scsi_mq+0xcd8/0xce0 [qla2xxx]
[  177.149165]  ? internal_add_timer+0x42/0x70
[  177.153372]  qla2xxx_mqueuecommand+0x207/0x2b0 [qla2xxx]
[  177.158730]  scsi_queue_rq+0x2b7/0xc00
[  177.162501]  blk_mq_dispatch_rq_list+0x3ea/0x7e0

Current code attempted to use CRC IOCB to send the command but failed.
Instead, type 6 IOCB should be used to send the I/O.

Clone existing type 6 IOCB code with addition of MQ support to allow
32-byte CDBs to go through.

Signed-off-by: Quinn Tran <qutran@marvell.com>
Cc: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230817063132.21900-3-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-08-21 17:37:42 -04:00
Quinn Tran
efeda3bf91 scsi: qla2xxx: Move resource to allow code reuse
dsd_list contains a list of dsd buffer resources allocated during traffic
time. It resides in the qla_hw_data location where some of the code is not
reusable.

Move this list to qpair to allow reuse by either single queue or multi
queue adapter / code.

Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230817063132.21900-2-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-08-21 17:37:42 -04:00
Andy Shevchenko
19d7102a95 scsi: lpfc: Do not abuse UUID APIs and LPFC_COMPRESS_VMID_SIZE
The lpfc_vmid_host_uuid is not defined as uuid_t and its usage is not the
same as for uuid_t operations (like exporting or importing).  Hence replace
call to uuid_is_null() by respective memchr_inv() without abusing casting.

With that, replace LPFC_COMPRESS_VMID_SIZE with plain number and respective
sizeof() to make code robust to changes in the future, if any.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20230818155452.875781-1-andriy.shevchenko@linux.intel.com
Reviewed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-08-21 17:13:57 -04:00
Yue Haibing
04aff456af scsi: pm8001: Remove unused declarations
Commit 4fcf812ca3 ("[SCSI] libsas: export sas_alloc_task()") removed
these implementations but not the declarations.

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Link: https://lore.kernel.org/r/20230818124700.49724-1-yuehaibing@huawei.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-08-21 17:13:57 -04:00
Chengfeng Ye
1a19755519 scsi: fcoe: Fix potential deadlock on &fip->ctlr_lock
There is a long call chain that &fip->ctlr_lock is acquired by isr
fnic_isr_msix_wq_copy() under hard IRQ context. Thus other process context
code acquiring the lock should disable IRQ, otherwise deadlock could happen
if the IRQ preempts the execution while the lock is held in process context
on the same CPU.

[ISR]
fnic_isr_msix_wq_copy()
 -> fnic_wq_copy_cmpl_handler()
 -> fnic_fcpio_cmpl_handler()
 -> fnic_fcpio_flogi_reg_cmpl_handler()
 -> fnic_flush_tx()
 -> fnic_send_frame()
 -> fcoe_ctlr_els_send()
 -> spin_lock_bh(&fip->ctlr_lock)

[Process Context]
1. fcoe_ctlr_timer_work()
 -> fcoe_ctlr_flogi_send()
 -> spin_lock_bh(&fip->ctlr_lock)

2. fcoe_ctlr_recv_work()
 -> fcoe_ctlr_recv_handler()
 -> fcoe_ctlr_recv_els()
 -> fcoe_ctlr_announce()
 -> spin_lock_bh(&fip->ctlr_lock)

3. fcoe_ctlr_recv_work()
 -> fcoe_ctlr_recv_handler()
 -> fcoe_ctlr_recv_els()
 -> fcoe_ctlr_flogi_retry()
 -> spin_lock_bh(&fip->ctlr_lock)

4. -> fcoe_xmit()
 -> fcoe_ctlr_els_send()
 -> spin_lock_bh(&fip->ctlr_lock)

spin_lock_bh() is not enough since fnic_isr_msix_wq_copy() is a
hardirq.

These flaws were found by an experimental static analysis tool I am
developing for irq-related deadlock.

The patch fix the potential deadlocks by spin_lock_irqsave() to disable
hard irq.

Fixes: 794d98e77f ("[SCSI] libfcoe: retry rejected FLOGI to another FCF if possible")
Signed-off-by: Chengfeng Ye <dg573847474@gmail.com>
Link: https://lore.kernel.org/r/20230817074708.7509-1-dg573847474@gmail.com
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-08-21 17:13:56 -04:00
Rajeshwar R Shinde
2d6f70fe17 scsi: elx: sli4: Remove code duplication
In the function sli_xmit_bls_rsp64_wqe(), the 'if' and 'else' conditions
evaluates the same expression and give the same output. Also, params->s_id
shall not be equal to U32_MAX. Remove the unused code.

This fixes coccinelle warning such as:
drivers/scsi/elx/libefc_sli/sli4.c:2320:2-4: WARNING: possible
condition with no effect (if == else)

Signed-off-by: Rajeshwar R Shinde <coolrrsh@gmail.com>
Link: https://lore.kernel.org/r/20230817114301.17601-1-coolrrsh@gmail.com
Reviewed-by: Ram Vegesna <ram.vegesna@broadcom.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-08-21 17:13:56 -04:00
Gustavo A. R. Silva
56a4d69a26 scsi: bfa: Replace one-element array with flexible-array member in struct fc_rscn_pl_s
One-element and zero-length arrays are deprecated. So, replace one-element
array in struct fc_rscn_pl_s with flexible-array member.

This results in no differences in binary output.

Link: https://github.com/KSPP/linux/issues/339
Signed-off-by: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/ZN0VTpDBOSVHGayb@work
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-08-21 17:13:56 -04:00
Yue Haibing
1e4474c845 scsi: qla2xxx: Remove unused declarations
These declarations are not used anymore, remove them.

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Link: https://lore.kernel.org/r/20230816130842.16684-1-yuehaibing@huawei.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-08-21 17:13:56 -04:00
Zheng Zengkai
5d344c5eb4 scsi: pmcraid: Use pci_dev_id() to simplify the code
PCI core API pci_dev_id() can be used to get the BDF number for a PCI
device. We don't need to compose it manually. Use pci_dev_id() to simplify
the code a little bit.

Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Link: https://lore.kernel.org/r/20230811111310.32364-1-zhengzengkai@huawei.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-08-21 17:13:56 -04:00
Igor Pylypiv
5454329595 scsi: pm80xx: Set RETFIS when requested by libsas
By default PM80xx HBAs return FIS only when a drive reports an error.
The RETFIS bit forces the controller to populate FIS even when a drive
reports no error.

Signed-off-by: Igor Pylypiv <ipylypiv@google.com>
Link: https://lore.kernel.org/r/20230819213040.1101044-3-ipylypiv@google.com
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-08-21 17:11:41 -04:00
Igor Pylypiv
72875018f6 scsi: libsas: Add return_fis_on_success to sas_ata_task
Set return_fis_on_success when libata requests result taskfile.

For Command Duration Limits policy 0xD (command completes without
an error) libata needs FIS in order to detect the ATA_SENSE bit and
read the Sense Data for Successful NCQ Commands log (0Fh).

Signed-off-by: Igor Pylypiv <ipylypiv@google.com>
Link: https://lore.kernel.org/r/20230819213040.1101044-2-ipylypiv@google.com
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-08-21 17:11:41 -04:00
Jialin Zhang
bb1459cb84 scsi: megaraid: Use pci_dev_id() to simplify the code
PCI core API pci_dev_id() can be used to get the BDF number for a PCI
device. We don't need to compose it manually. Use pci_dev_id() to simplify
the code a little bit.

Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
Link: https://lore.kernel.org/r/20230815025419.3523236-4-zhangjialin11@huawei.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-08-21 16:47:44 -04:00
Jialin Zhang
a46421fdf7 scsi: megaraid_sas: Use pci_dev_id() to simplify the code
PCI core API pci_dev_id() can be used to get the BDF number for a PCI
device. We don't need to compose it manually. Use pci_dev_id() to simplify
the code a little bit.

Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
Link: https://lore.kernel.org/r/20230815025419.3523236-3-zhangjialin11@huawei.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-08-21 16:47:44 -04:00
Jialin Zhang
48e590218d scsi: mvumi: Use pci_dev_id() to simplify the code
PCI core API pci_dev_id() can be used to get the BDF number for a PCI
device. We don't need to compose it mannally. Use pci_dev_id() to simplify
the code a little bit.

Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
Link: https://lore.kernel.org/r/20230815025419.3523236-2-zhangjialin11@huawei.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-08-21 16:47:44 -04:00
Tony Battersby
62ec209209 scsi: core: Use 32-bit hostnum in scsi_host_lookup()
Change scsi_host_lookup() hostnum argument type from unsigned short to
unsigned int to match the type used everywhere else.

Fixes: 6d49f63b41 ("[SCSI] Make host_no an unsigned int")
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Link: https://lore.kernel.org/r/a02497e7-c12b-ef15-47fc-3f0a0b00ffce@cybernetics.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-08-21 16:42:03 -04:00
Artem Chernyshev
9a23ed57ab scsi: isci: Return result of sas_register_ha()
To properly manage possible failure of sas_register_ha() in
isci_register_sas_ha(), return its result instead of zero

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Signed-off-by: Artem Chernyshev <artem.chernyshev@red-soft.ru>
Link: https://lore.kernel.org/r/20230813202336.240874-1-artem.chernyshev@red-soft.ru
Reviewed-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-08-21 16:39:30 -04:00
Arnd Bergmann
bfaa4a0ce1 scsi: gvp11: Remove unused gvp11_setup() function
This function has no declaration, which causes a warning:

drivers/scsi/gvp11.c:53:6: error: no previous prototype for 'gvp11_setup' [-Werror=missing-prototypes]

Since there is also no caller, just remove the function.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20230810141947.1236730-12-arnd@kernel.org
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com> # RISC-V
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-08-21 16:37:11 -04:00
Arnd Bergmann
71cc486335 scsi: qlogicpti: Mark qlogicpti_info() static
The qlogicpti_info() function is only used in this file and should be
static to avoid a warning:

drivers/scsi/qlogicpti.c:846:13: error: no previous prototype for 'qlogicpti_info' [-Werror=missing-prototypes]

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20230810141947.1236730-8-arnd@kernel.org
Reviewed-by: Jack Wang <jinpu.wang@ionos.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com> # RISC-V
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-08-21 16:37:11 -04:00
Alex Henrie
68a4f84a17 scsi: ppa: Add a module parameter for the transfer mode
I have an Iomega Z100P2 zip drive, but it does not work with my StarTech
PEX1P2 AX99100 PCIe parallel port, which evidently does not support 16-bit
or 32-bit EPP. Currently the only way to tell the PPA driver to use 8-bit
EPP is to write 'mode=3' to /proc/scsi/ppa/*, but the driver doesn't
actually distinguish between the three EPP modes and still tries to use
16-bit or 32-bit EPP. And even if writing to that file did make the driver
use 8-bit EPP, it still wouldn't do me any good because by the time that
file exists, the drive has already failed to initialize.

Add a new parameter /sys/module/ppa/mode to set the transfer mode before
initializing the drive. This parameter replaces the use of
CONFIG_SCSI_IZIP_EPP16 in the PPA driver.

At the same time, default to 8-bit EPP. 16-bit and 32-bit EPP are not
necessary for the drive to function, nor are they part of the IEEE 1284
standard, so the driver should not assume that they are available.

Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Link: https://lore.kernel.org/r/20230807155856.362864-2-alexhenrie24@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-08-21 16:32:40 -04:00
Alex Henrie
b68442ebda scsi: ppa: Fix compilation with PPA_DEBUG=1
Fix a regression introduced in February 2003 in Linux 2.5.61 by a patch
from Alan Cox titled "fix ppa for new scsi".[1]

Link: https://lore.kernel.org/lkml/E18jn1B-0005gQ-00@the-village.bc.nu/
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Link: https://lore.kernel.org/r/20230807155856.362864-1-alexhenrie24@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-08-21 16:32:40 -04:00
Xiang Yang
e9b525b6cc scsi: arcmsr: Add __init and __exit for arcmsr_module_{init,exit}()
Add __init and __exit for arcmsr_module_{init,exit}().

Signed-off-by: Xiang Yang <xiangyang3@huawei.com>
Link: https://lore.kernel.org/r/20230804022913.1917023-1-xiangyang3@huawei.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-08-21 16:27:10 -04:00
Yue Haibing
a905b5cddc scsi: core: Remove unused extern declarations
These functions have never been implemented since the beginning of git
history.

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Link: https://lore.kernel.org/r/20230809142107.42756-1-yuehaibing@huawei.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-08-21 16:23:41 -04:00
Yue Haibing
2fcd1e2b64 scsi: libsas: Remove unused declarations
Commit 042ebd293b ("scsi: libsas: kill useless ha_event and do some
cleanup") removed sas_hae_reset() but not its declaration.  Commit
2908d778ab ("[SCSI] aic94xx: new driver") declared but never implemented
other functions.

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Link: https://lore.kernel.org/r/20230809132249.37948-1-yuehaibing@huawei.com
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-08-21 16:22:09 -04:00
Linus Torvalds
7308e92756 SCSI fixes on 20230813
Eleven small fixes, ten in drivers.  Of the two fixes marked core, one
 is in the raid helper class (used by some raid device drivers) and the
 other one is the /proc/scsi/scsi parsing fix for potential reads
 beyond the end of the buffer.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCZNiFDyYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishWjjAP4mtUh5
 75CoeRVGifOeMAgyoCJJrP0hsco3E/8/6U69wwD/diZOzdmYs+aKzYEJj+Y7m+rO
 PuXwF7i8QAzNoxGERiE=
 =rOpe
 -----END PGP SIGNATURE-----

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "Eleven small fixes, ten in drivers.

  Of the two fixes marked core, one is in the raid helper class (used by
  some raid device drivers) and the other one is the /proc/scsi/scsi
  parsing fix for potential reads beyond the end of the buffer"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: qedf: Fix firmware halt over suspend and resume
  scsi: qedi: Fix firmware halt over suspend and resume
  scsi: qedi: Fix potential deadlock on &qedi_percpu->p_work_lock
  scsi: lpfc: Remove reftag check in DIF paths
  scsi: ufs: renesas: Fix private allocation
  scsi: snic: Fix possible memory leak if device_add() fails
  scsi: core: Fix possible memory leak if device_add() fails
  scsi: core: Fix legacy /proc parsing buffer overflow
  scsi: 53c700: Check that command slot is not NULL
  scsi: fnic: Replace return codes in fnic_clean_pending_aborts()
  scsi: storvsc: Fix handling of virtual Fibre Channel timeouts
2023-08-13 08:43:26 -07:00
Martin K. Petersen
9640d57d15 Merge patch series "mpi3mr: Few Enhancements and minor fixes"
Ranjan Kumar <ranjan.kumar@broadcom.com> says:

Few Enhancements and minor fixes of mpi3mr driver.

Link: https://lore.kernel.org/r/20230804104248.118924-1-ranjan.kumar@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-08-07 21:43:24 -04:00
Ranjan Kumar
9a9068b2af scsi: mpi3mr: Update driver version to 8.5.0.0.0
Update driver version to 8.5.0.0.0

Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Link: https://lore.kernel.org/r/20230804104248.118924-7-ranjan.kumar@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-08-07 21:41:48 -04:00
Ranjan Kumar
d9a5ab0ea9 scsi: mpi3mr: Enhance handling of devices removed after controller reset
Mark all of the devices that are exposed to the OS prior to a controller
reset and not detected by the controller after the reset as removed devices
and the I/Os to those devices are unblocked (and returned with
DID_NO_CONNECT) prior to removing the devices one after the other.

Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Link: https://lore.kernel.org/r/20230804104248.118924-6-ranjan.kumar@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-08-07 21:41:48 -04:00
Ranjan Kumar
e7a8648e1c scsi: mpi3mr: WRITE SAME implementation
Enhance driver to divert the WRITE SAME commands that are issued with
UNMAP=1 and NDOB=1 and with the transfer length greater than the max WRITE
SAME length specified by the firmware for the particular drive to the
controller firmware.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202307280034.DXU5pTVV-lkp@intel.com/
Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Link: https://lore.kernel.org/r/20230804104248.118924-5-ranjan.kumar@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-08-07 21:41:48 -04:00
Ranjan Kumar
d9adb81e67 scsi: mpi3mr: Add support for more than 1MB I/O
Enhance the driver to get the maximum data length per I/O request from IOC
Facts data and report that to the upper layers.  If the IOC facts data is
not reported then a default I/O size of 1MB is reported to the OS.

Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Link: https://lore.kernel.org/r/20230804104248.118924-4-ranjan.kumar@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-08-07 21:41:48 -04:00
Ranjan Kumar
6f81b1cfdf scsi: mpi3mr: Update MPI Headers to version 3.00.28
Updated MPI Headers to version 3.00.28.

Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Link: https://lore.kernel.org/r/20230804104248.118924-3-ranjan.kumar@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-08-07 21:41:48 -04:00
Ranjan Kumar
9134211f7b scsi: mpi3mr: Invoke soft reset upon TSU or event ack time out
When a timestamp update or an event acknowledgment command times out, the
driver invokes the soft reset handler to recover the controller while
holding a mutex lock. The soft reset handler also tries to acquire the same
mutex to send initialization commands to the controller which leads to a
deadlock scenario.

To resolve the issue the driver will check thestatus and if this indicates
the controller is operational, the driver will issue a diagnostic fault
reset and exit out of the command processing function. If the controller is
already faulted or asynchronously reset, then the driver will just exit the
command processing function.

Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Link: https://lore.kernel.org/r/20230804104248.118924-2-ranjan.kumar@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-08-07 21:41:47 -04:00
Nilesh Javali
ef222f551e scsi: qedf: Fix firmware halt over suspend and resume
While performing certain power-off sequences, PCI drivers are called to
suspend and resume their underlying devices through PCI PM (power
management) interface. However the hardware does not support PCI PM
suspend/resume operations so system wide suspend/resume leads to bad MFW
(management firmware) state which causes various follow-up errors in driver
when communicating with the device/firmware.

To fix this driver implements PCI PM suspend handler to indicate
unsupported operation to the PCI subsystem explicitly, thus avoiding system
to go into suspended/standby mode.

Fixes: 61d8658b4a ("scsi: qedf: Add QLogic FastLinQ offload FCoE driver framework.")
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230807093725.46829-1-njavali@marvell.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-08-07 21:34:08 -04:00
Nilesh Javali
1516ee035d scsi: qedi: Fix firmware halt over suspend and resume
While performing certain power-off sequences, PCI drivers are called to
suspend and resume their underlying devices through PCI PM (power
management) interface. However the hardware does not support PCI PM
suspend/resume operations so system wide suspend/resume leads to bad MFW
(management firmware) state which causes various follow-up errors in driver
when communicating with the device/firmware.

To fix this driver implements PCI PM suspend handler to indicate
unsupported operation to the PCI subsystem explicitly, thus avoiding system
to go into suspended/standby mode.

Fixes: ace7f46ba5 ("scsi: qedi: Add QLogic FastLinQ offload iSCSI driver framework.")
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230807093725.46829-2-njavali@marvell.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-08-07 21:34:08 -04:00
Chengfeng Ye
dd64f80587 scsi: qedi: Fix potential deadlock on &qedi_percpu->p_work_lock
As &qedi_percpu->p_work_lock is acquired by hard IRQ qedi_msix_handler(),
other acquisitions of the same lock under process context should disable
IRQ, otherwise deadlock could happen if the IRQ preempts the execution
while the lock is held in process context on the same CPU.

qedi_cpu_offline() is one such function which acquires the lock in process
context.

[Deadlock Scenario]
qedi_cpu_offline()
    ->spin_lock(&p->p_work_lock)
        <irq>
        ->qedi_msix_handler()
        ->edi_process_completions()
        ->spin_lock_irqsave(&p->p_work_lock, flags); (deadlock here)

This flaw was found by an experimental static analysis tool I am developing
for IRQ-related deadlocks.

The tentative patch fix the potential deadlock by spin_lock_irqsave()
under process context.

Signed-off-by: Chengfeng Ye <dg573847474@gmail.com>
Link: https://lore.kernel.org/r/20230726125655.4197-1-dg573847474@gmail.com
Acked-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-08-07 21:34:08 -04:00
Justin Tee
8eebf0e84f scsi: lpfc: Remove reftag check in DIF paths
When preparing protection DIF I/O for DMA, the driver obtains reference
tags from scsi_prot_ref_tag().  Previously, there was a wrong assumption
that an all 0xffffffff value meant error and thus the driver failed the
I/O.  This patch removes the evaluation code and accepts whatever the upper
layer returns.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230803211932.155745-1-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-08-07 21:34:08 -04:00
Zhu Wang
41320b18a0 scsi: snic: Fix possible memory leak if device_add() fails
If device_add() returns error, the name allocated by dev_set_name() needs
be freed. As the comment of device_add() says, put_device() should be used
to give up the reference in the error path. So fix this by calling
put_device(), then the name can be freed in kobject_cleanp().

Fixes: c8806b6c9e ("snic: driver for Cisco SCSI HBA")
Signed-off-by: Zhu Wang <wangzhu9@huawei.com>
Acked-by: Narsimhulu Musini <nmusini@cisco.com>
Link: https://lore.kernel.org/r/20230801111421.63651-1-wangzhu9@huawei.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-08-07 21:34:08 -04:00
Zhu Wang
04b5b5cb01 scsi: core: Fix possible memory leak if device_add() fails
If device_add() returns error, the name allocated by dev_set_name() needs
be freed. As the comment of device_add() says, put_device() should be used
to decrease the reference count in the error path. So fix this by calling
put_device(), then the name can be freed in kobject_cleanp().

Fixes: ee959b00c3 ("SCSI: convert struct class_device to struct device")
Signed-off-by: Zhu Wang <wangzhu9@huawei.com>
Link: https://lore.kernel.org/r/20230803020230.226903-1-wangzhu9@huawei.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-08-07 21:34:08 -04:00
Justin Tee
dded1dc31a scsi: lpfc: Modify when a node should be put in device recovery mode during RSCN
Only nodes whose state is at least past a PLOGI issue and strictly less
than a PRLI issue should be put into device recovery mode upon RSCN
receipt.  Previously, the allowance of LOGO and PRLI completion states did
not make sense because those nodes should be allowed to flow through and
marked as NPort dissappeared as is normally done.  A follow up RSCN GID_FT
would recover those nodes in such cases.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230804195546.157839-1-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-08-07 21:25:47 -04:00
Linus Torvalds
fb0d91991c ata fixes for 6.5-rc5
- Prevent the scsi disk driver from issuing a START STOP UNIT command
    for ATA devices during system resume as this causes various issues
    reported by multiple users.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQSRPv8tYSvhwAzJdzjdoc3SxdoYdgUCZM73RgAKCRDdoc3SxdoY
 dng8AP4qmIrU9K95uy7S9Ix8aMJj0HCWvFlBr6Evh8kpyEw7HgD/SHHlvbYg+g8n
 lD9/JWRzpHkHl5XM8DqWyKSvi906pgM=
 =m3pD
 -----END PGP SIGNATURE-----

Merge tag 'ata-6.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata

Pull ata fix from Damien Le Moal:

 - Prevent the scsi disk driver from issuing a START STOP UNIT command
   for ATA devices during system resume as this causes various issues
   reported by multiple users.

* tag 'ata-6.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
  ata,scsi: do not issue START STOP UNIT on resume
2023-08-05 18:45:18 -07:00
Niklas Cassel
541528170a ata,scsi: remove ata_sas_port_init()
ata_sas_port_init() now only contains a single initialization.

Move this single initialization to ata_sas_port_alloc(), since:
1) ata_sas_port_alloc() already initializes some of the struct members.
2) ata_sas_port_alloc() is only used by libsas.

Suggested-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
2023-08-02 17:45:30 +09:00
Hannes Reinecke
a76f1b637c ata,scsi: cleanup __ata_port_probe()
Rename __ata_port_probe() to ata_port_probe() and drop the wrapper
ata_sas_async_probe().

Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
2023-08-02 17:45:27 +09:00
Hannes Reinecke
6c2fe21e08 ata,scsi: remove ata_sas_port_destroy()
Is now a wrapper around kfree(), so call it directly.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
2023-08-02 17:45:16 +09:00
Hannes Reinecke
43aa43351b ata,scsi: remove ata_sas_port_{start,stop} callbacks
Callbacks are empty now, so remove them.

Also, remove the call to ap->ops->port_start() in ata_sas_port_init(),
as this would otherwise cause a NULL pointer dereference, now when the
callback is gone.

Signed-off-by: Hannes Reinecke <hare@suse.de>
[niklas: remove the call to ap->ops->port_start() in ata_sas_port_init()]
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
2023-08-02 17:45:13 +09:00
Damien Le Moal
0a85890559 ata,scsi: do not issue START STOP UNIT on resume
During system resume, ata_port_pm_resume() triggers ata EH to
1) Resume the controller
2) Reset and rescan the ports
3) Revalidate devices
This EH execution is started asynchronously from ata_port_pm_resume(),
which means that when sd_resume() is executed, none or only part of the
above processing may have been executed. However, sd_resume() issues a
START STOP UNIT to wake up the drive from sleep mode. This command is
translated to ATA with ata_scsi_start_stop_xlat() and issued to the
device. However, depending on the state of execution of the EH process
and revalidation triggerred by ata_port_pm_resume(), two things may
happen:
1) The START STOP UNIT fails if it is received before the controller has
   been reenabled at the beginning of the EH execution. This is visible
   with error messages like:

ata10.00: device reported invalid CHS sector 0
sd 9:0:0:0: [sdc] Start/Stop Unit failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK
sd 9:0:0:0: [sdc] Sense Key : Illegal Request [current]
sd 9:0:0:0: [sdc] Add. Sense: Unaligned write command
sd 9:0:0:0: PM: dpm_run_callback(): scsi_bus_resume+0x0/0x90 returns -5
sd 9:0:0:0: PM: failed to resume async: error -5

2) The START STOP UNIT command is received while the EH process is
   on-going, which mean that it is stopped and must wait for its
   completion, at which point the command is rather useless as the drive
   is already fully spun up already. This case results also in a
   significant delay in sd_resume() which is observable by users as
   the entire system resume completion is delayed.

Given that ATA devices will be woken up by libata activity on resume,
sd_resume() has no need to issue a START STOP UNIT command, which solves
the above mentioned problems. Do not issue this command by introducing
the new scsi_device flag no_start_on_resume and setting this flag to 1
in ata_scsi_dev_config(). sd_resume() is modified to issue a START STOP
UNIT command only if this flag is not set.

Reported-by: Paul Ausbeck <paula@soe.ucsc.edu>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=215880
Fixes: a19a93e4c6 ("scsi: core: pm: Rely on the device driver core for async power management")
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Tested-by: Tanner Watkins <dalzot@gmail.com>
Tested-by: Paul Ausbeck <paula@soe.ucsc.edu>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
2023-08-02 17:01:12 +09:00
Tony Battersby
9426d3cef5 scsi: core: Fix legacy /proc parsing buffer overflow
(lightly modified commit message mostly by Linus Torvalds)

The parsing code for /proc/scsi/scsi is disgusting and broken.  We should
have just used 'sscanf()' or something simple like that, but the logic may
actually predate our kernel sscanf library routine for all I know.  It
certainly predates both git and BK histories.

And we can't change it to be something sane like that now, because the
string matching at the start is done case-insensitively, and the separator
parsing between numbers isn't done at all, so *any* separator will work,
including a possible terminating NUL character.

This interface is root-only, and entirely for legacy use, so there is
absolutely no point in trying to tighten up the parsing.  Because any
separator has traditionally worked, it's entirely possible that people have
used random characters rather than the suggested space.

So don't bother to try to pretty it up, and let's just make a minimal patch
that can be back-ported and we can forget about this whole sorry thing for
another two decades.

Just make it at least not read past the end of the supplied data.

Link: https://lore.kernel.org/linux-scsi/b570f5fe-cb7c-863a-6ed9-f6774c219b88@cybernetics.com/
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin K Petersen <martin.petersen@oracle.com>
Cc: James Bottomley <jejb@linux.ibm.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: stable@kernel.org
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Signed-off-by: Martin K Petersen <martin.petersen@oracle.com>
2023-07-31 15:39:39 -04:00
Oleksandr Natalenko
25dbc20dea scsi: qedf: Do not touch __user pointer in qedf_dbg_fp_int_cmd_read() directly
The qedf_dbg_fp_int_cmd_read() function invokes sprintf() directly on a
__user pointer, which may crash the kernel.

Avoid doing that by vmalloc()'ating a buffer for scnprintf() and then
calling simple_read_from_buffer() which does a proper copy_to_user() call.

Fixes: 61d8658b4a ("scsi: qedf: Add QLogic FastLinQ offload FCoE driver framework.")
Link: https://lore.kernel.org/lkml/20230724120241.40495-1-oleksandr@redhat.com/
Link: https://lore.kernel.org/linux-scsi/20230726101236.11922-1-skashyap@marvell.com/
Cc: Saurav Kashyap <skashyap@marvell.com>
Cc: Rob Evers <revers@redhat.com>
Cc: Johannes Thumshirn <Johannes.Thumshirn@wdc.com>
Cc: David Laight <David.Laight@ACULAB.COM>
Cc: Jozef Bacik <jobacik@redhat.com>
Cc: Laurence Oberman <loberman@redhat.com>
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: GR-QLogic-Storage-Upstream@marvell.com
Cc: linux-scsi@vger.kernel.org
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Acked-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Oleksandr Natalenko <oleksandr@redhat.com>
Link: https://lore.kernel.org/r/20230731084034.37021-4-oleksandr@redhat.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-31 14:42:59 -04:00
Oleksandr Natalenko
31b5991a9a scsi: qedf: Do not touch __user pointer in qedf_dbg_debug_cmd_read() directly
The qedf_dbg_debug_cmd_read() function invokes sprintf() directly on a
__user pointer, which may crash the kernel.

Avoid doing that by using a small on-stack buffer for scnprintf() and then
calling simple_read_from_buffer() which does a proper copy_to_user() call.

Fixes: 61d8658b4a ("scsi: qedf: Add QLogic FastLinQ offload FCoE driver framework.")
Link: https://lore.kernel.org/lkml/20230724120241.40495-1-oleksandr@redhat.com/
Link: https://lore.kernel.org/linux-scsi/20230726101236.11922-1-skashyap@marvell.com/
Cc: Saurav Kashyap <skashyap@marvell.com>
Cc: Rob Evers <revers@redhat.com>
Cc: Johannes Thumshirn <Johannes.Thumshirn@wdc.com>
Cc: David Laight <David.Laight@ACULAB.COM>
Cc: Jozef Bacik <jobacik@redhat.com>
Cc: Laurence Oberman <loberman@redhat.com>
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: GR-QLogic-Storage-Upstream@marvell.com
Cc: linux-scsi@vger.kernel.org
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Acked-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Oleksandr Natalenko <oleksandr@redhat.com>
Link: https://lore.kernel.org/r/20230731084034.37021-3-oleksandr@redhat.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-31 14:42:59 -04:00
Oleksandr Natalenko
7d3d20dee4 scsi: qedf: Do not touch __user pointer in qedf_dbg_stop_io_on_error_cmd_read() directly
The qedf_dbg_stop_io_on_error_cmd_read() function invokes sprintf()
directly on a __user pointer, which may crash the kernel.

Avoid doing that by using a small on-stack buffer for scnprintf() and then
calling simple_read_from_buffer() which does a proper copy_to_user() call.

Fixes: 61d8658b4a ("scsi: qedf: Add QLogic FastLinQ offload FCoE driver framework.")
Link: https://lore.kernel.org/lkml/20230724120241.40495-1-oleksandr@redhat.com/
Link: https://lore.kernel.org/linux-scsi/20230726101236.11922-1-skashyap@marvell.com/
Cc: Saurav Kashyap <skashyap@marvell.com>
Cc: Rob Evers <revers@redhat.com>
Cc: Johannes Thumshirn <Johannes.Thumshirn@wdc.com>
Cc: David Laight <David.Laight@ACULAB.COM>
Cc: Jozef Bacik <jobacik@redhat.com>
Cc: Laurence Oberman <loberman@redhat.com>
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: GR-QLogic-Storage-Upstream@marvell.com
Cc: linux-scsi@vger.kernel.org
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Acked-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Oleksandr Natalenko <oleksandr@redhat.com>
Link: https://lore.kernel.org/r/20230731084034.37021-2-oleksandr@redhat.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-31 14:42:59 -04:00
Alexandra Diupina
8366d1f124 scsi: 53c700: Check that command slot is not NULL
Add a check for the command slot value to avoid dereferencing a NULL
pointer.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Co-developed-by: Vladimir Telezhnikov <vtelezhnikov@astralinux.ru>
Signed-off-by: Vladimir Telezhnikov <vtelezhnikov@astralinux.ru>
Signed-off-by: Alexandra Diupina <adiupina@astralinux.ru>
Link: https://lore.kernel.org/r/20230728123521.18293-1-adiupina@astralinux.ru
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-31 14:38:17 -04:00
Karan Tilak Kumar
5a43b07a87 scsi: fnic: Replace return codes in fnic_clean_pending_aborts()
fnic_clean_pending_aborts() was returning a non-zero value irrespective of
failure or success.  This caused the caller of this function to assume that
the device reset had failed, even though it would succeed in most cases. As
a consequence, a successful device reset would escalate to host reset.

Reviewed-by: Sesidhar Baddela <sebaddel@cisco.com>
Tested-by: Karan Tilak Kumar <kartilak@cisco.com>
Signed-off-by: Karan Tilak Kumar <kartilak@cisco.com>
Link: https://lore.kernel.org/r/20230727193919.2519-1-kartilak@cisco.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-31 14:29:21 -04:00
Michael Kelley
175544ad48 scsi: storvsc: Fix handling of virtual Fibre Channel timeouts
Hyper-V provides the ability to connect Fibre Channel LUNs to the host
system and present them in a guest VM as a SCSI device. I/O to the vFC
device is handled by the storvsc driver. The storvsc driver includes a
partial integration with the FC transport implemented in the generic
portion of the Linux SCSI subsystem so that FC attributes can be displayed
in /sys.  However, the partial integration means that some aspects of vFC
don't work properly. Unfortunately, a full and correct integration isn't
practical because of limitations in what Hyper-V provides to the guest.

In particular, in the context of Hyper-V storvsc, the FC transport timeout
function fc_eh_timed_out() causes a kernel panic because it can't find the
rport and dereferences a NULL pointer. The original patch that added the
call from storvsc_eh_timed_out() to fc_eh_timed_out() is faulty in this
regard.

In many cases a timeout is due to a transient condition, so the situation
can be improved by just continuing to wait like with other I/O requests
issued by storvsc, and avoiding the guaranteed panic. For a permanent
failure, continuing to wait may result in a hung thread instead of a panic,
which again may be better.

So fix the panic by removing the storvsc call to fc_eh_timed_out().  This
allows storvsc to keep waiting for a response.  The change has been tested
by users who experienced a panic in fc_eh_timed_out() due to transient
timeouts, and it solves their problem.

In the future we may want to deprecate the vFC functionality in storvsc
since it can't be fully fixed. But it has current users for whom it is
working well enough, so it should probably stay for a while longer.

Fixes: 3930d73098 ("scsi: storvsc: use default I/O timeout handler for FC devices")
Cc: stable@vger.kernel.org
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1690606764-79669-1-git-send-email-mikelley@microsoft.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-31 14:23:28 -04:00
Sunil V L
b7fc2caf20 scsi: hisi_sas: Fix warning detected by sparse
LKP reports below warning when building for RISC-V with randconfig
configuration.

drivers/scsi/hisi_sas/hisi_sas_v3_hw.c:4567:35: sparse:
sparse: incorrect type in argument 4 (different base types)
@@     expected restricted __le32 [usertype] *[assigned] ptr
@@     got unsigned int * @@

Type cast to fix this warning.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202307260823.whMNpZ1C-lkp@intel.com/
Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
Link: https://lore.kernel.org/r/20230726051759.30038-1-sunilvl@ventanamicro.com
Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-31 12:59:58 -04:00
Wang Jinchao
ec6c7c9f5f scsi: aic7xxx: Fix firmware build fatal error
When building with CONFIG_AIC7XXX_BUILD_FIRMWARE=y, two fatal errors
are reported as shown below:

  aicasm_gram.tab.c:203:10: fatal error: aicasm_gram.tab.h:
  No such file or directory

  aicasm_macro_gram.tab.c:167:10: fatal error: aicasm_macro_gram.tab.h:
  No such file or directory

Fix these issues to make randconfig builds more reliable.

[mkp: add missing include]

Signed-off-by: Wang Jinchao <wangjinchao@xfusion.com>
Link: https://lore.kernel.org/r/ZK0XIj6XzY5MCvtd@fedora
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-31 11:28:55 -04:00
Yang Yingliang
d4e0265345 scsi: pm80xx: Fix error return code in pm8001_pci_probe()
If pm8001_init_sas_add() fails, return error code in pm8001_pci_probe().

Fixes: 14a8f116cd ("scsi: pm80xx: Add GET_NVMD timeout during probe")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Link: https://lore.kernel.org/r/20230725125706.566990-1-yangyingliang@huawei.com
Reviewed-by: Igor Pylypiv <ipylypiv@google.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-25 21:54:37 -04:00
Lin Ma
47cd3770e3 scsi: qla4xxx: Add length check when parsing nlattrs
There are three places that qla4xxx parses nlattrs:

 - qla4xxx_set_chap_entry()

 - qla4xxx_iface_set_param()

 - qla4xxx_sysfs_ddb_set_param()

and each of them directly converts the nlattr to specific pointer of
structure without length checking. This could be dangerous as those
attributes are not validated and a malformed nlattr (e.g., length 0) could
result in an OOB read that leaks heap dirty data.

Add the nla_len check before accessing the nlattr data and return EINVAL if
the length check fails.

Fixes: 26ffd7b45f ("[SCSI] qla4xxx: Add support to set CHAP entries")
Fixes: 1e9e2be3ee ("[SCSI] qla4xxx: Add flash node mgmt support")
Fixes: 00c31889f7 ("[SCSI] qla4xxx: fix data alignment and use nl helpers")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Link: https://lore.kernel.org/r/20230723080053.3714534-1-linma@zju.edu.cn
Reviewed-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-25 21:51:04 -04:00
Lin Ma
ee0268f230 scsi: be2iscsi: Add length check when parsing nlattrs
beiscsi_iface_set_param() parses nlattr with nla_for_each_attr and assumes
every attributes can be viewed as struct iscsi_iface_param_info.

This is not true because there is no any nla_policy to validate the
attributes passed from the upper function iscsi_set_iface_params().

Add the nla_len check before accessing the nlattr data and return EINVAL if
the length check fails.

Fixes: 0e43895ec1 ("[SCSI] be2iscsi: adding functionality to change network settings using iscsiadm")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Link: https://lore.kernel.org/r/20230723075938.3713864-1-linma@zju.edu.cn
Reviewed-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-25 21:49:32 -04:00
Lin Ma
ce51c81700 scsi: iscsi: Add strlen() check in iscsi_if_set{_host}_param()
The functions iscsi_if_set_param() and iscsi_if_set_host_param() convert an
nlattr payload to type char* and then call C string handling functions like
sscanf and kstrdup:

  char *data = (char*)ev + sizeof(*ev);
  ...
  sscanf(data, "%d", &value);

However, since the nlattr is provided by the user-space program and the
nlmsg skb is allocated with GFP_KERNEL instead of GFP_ZERO flag (see
netlink_alloc_large_skb() in netlink_sendmsg()), dirty data on the heap can
lead to an OOB access for those string handling functions.

By investigating how the bug is introduced, we find it is really
interesting as the old version parsing code starting from commit
fd7255f51a ("[SCSI] iscsi: add sysfs attrs for uspace sync up") treated
the nlattr as integer bytes instead of string and had length check in
iscsi_copy_param():

  if (ev->u.set_param.len != sizeof(uint32_t))
    BUG();

But, since the commit a54a52caad ("[SCSI] iscsi: fixup set/get param
functions"), the code treated the nlattr as C string while forgetting to
add any strlen checks(), opening the possibility of an OOB access.

Fix the potential OOB by adding the strlen() check before accessing the
buf. If the data passes this check, all low-level set_param handlers can
safely treat this buf as legal C string.

Fixes: fd7255f51a ("[SCSI] iscsi: add sysfs attrs for uspace sync up")
Fixes: 1d9bf13a9c ("[SCSI] iscsi class: add iscsi host set param event")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Link: https://lore.kernel.org/r/20230723075820.3713119-1-linma@zju.edu.cn
Reviewed-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-25 21:48:13 -04:00
Lin Ma
971dfcb74a scsi: iscsi: Add length check for nlattr payload
The current NETLINK_ISCSI netlink parsing loop checks every nlmsg to make
sure the length is bigger than sizeof(struct iscsi_uevent) and then calls
iscsi_if_recv_msg().

  nlh = nlmsg_hdr(skb);
  if (nlh->nlmsg_len < sizeof(*nlh) + sizeof(*ev) ||
    skb->len < nlh->nlmsg_len) {
    break;
  }
  ...
  err = iscsi_if_recv_msg(skb, nlh, &group);

Hence, in iscsi_if_recv_msg() the nlmsg_data can be safely converted to
iscsi_uevent as the length is already checked.

However, in other cases the length of nlattr payload is not checked before
the payload is converted to other data structures. One example is
iscsi_set_path() which converts the payload to type iscsi_path without any
checks:

  params = (struct iscsi_path *)((char *)ev + sizeof(*ev));

Whereas iscsi_if_transport_conn() correctly checks the pdu_len:

  pdu_len = nlh->nlmsg_len - sizeof(*nlh) - sizeof(*ev);
  if ((ev->u.send_pdu.hdr_size > pdu_len) ..
    err = -EINVAL;

To sum up, some code paths called in iscsi_if_recv_msg() do not check the
length of the data (see below picture) and directly convert the data to
another data structure. This could result in an out-of-bound reads and heap
dirty data leakage.

             _________  nlmsg_len(nlh) _______________
            /                                         \
+----------+--------------+---------------------------+
| nlmsghdr | iscsi_uevent |          data              |
+----------+--------------+---------------------------+
                          \                          /
                         iscsi_uevent->u.set_param.len

Fix the issue by adding the length check before accessing it. To clean up
the code, an additional parameter named rlen is added. The rlen is
calculated at the beginning of iscsi_if_recv_msg() which avoids duplicated
calculation.

Fixes: ac20c7bf07 ("[SCSI] iscsi_transport: Added Ping support")
Fixes: 43514774ff ("[SCSI] iscsi class: Add new NETLINK_ISCSI messages for cnic/bnx2i driver.")
Fixes: 1d9bf13a9c ("[SCSI] iscsi class: add iscsi host set param event")
Fixes: 01cb225dad ("[SCSI] iscsi: add target discvery event to transport class")
Fixes: 264faaaa12 ("[SCSI] iscsi: add transport end point callbacks")
Fixes: fd7255f51a ("[SCSI] iscsi: add sysfs attrs for uspace sync up")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Link: https://lore.kernel.org/r/20230725024529.428311-1-linma@zju.edu.cn
Reviewed-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-25 21:43:24 -04:00
Bart Van Assche
65a558f66c block: Improve performance for BLK_MQ_F_BLOCKING drivers
blk_mq_run_queue() runs the queue asynchronously if BLK_MQ_F_BLOCKING
has been set. This is suboptimal since running the queue asynchronously
is slower than running the queue synchronously. This patch modifies
blk_mq_run_queue() as follows if BLK_MQ_F_BLOCKING has been set:
- Run the queue synchronously if it is allowed to sleep.
- Run the queue asynchronously if it is not allowed to sleep.
Additionally, blk_mq_run_hw_queue(hctx, false) calls are modified into
blk_mq_run_hw_queue(hctx, hctx->flags & BLK_MQ_F_BLOCKING) if the caller
may be invoked from atomic context.

The following caller chains have been reviewed:

blk_mq_run_hw_queue(hctx, false)
  blk_mq_get_tag()      /* may sleep, hence the functions it calls may also sleep */
  blk_execute_rq()             /* may sleep */
  blk_mq_run_hw_queues(q, async=false)
    blk_freeze_queue_start()   /* may sleep */
    blk_mq_requeue_work()      /* may sleep */
    scsi_kick_queue()
      scsi_requeue_run_queue() /* may sleep */
      scsi_run_host_queues()
        scsi_ioctl_reset()     /* may sleep */
  blk_mq_insert_requests(hctx, ctx, list, run_queue_async=false)
    blk_mq_dispatch_plug_list(plug, from_sched=false)
      blk_mq_flush_plug_list(plug, from_schedule=false)
        __blk_flush_plug(plug, from_schedule=false)
	blk_add_rq_to_plug()
	  blk_mq_submit_bio()  /* may sleep if REQ_NOWAIT has not been set */
  blk_mq_plug_issue_direct()
    blk_mq_flush_plug_list()   /* see above */
  blk_mq_dispatch_plug_list(plug, from_sched=false)
    blk_mq_flush_plug_list()   /* see above */
  blk_mq_try_issue_directly()
    blk_mq_submit_bio()        /* may sleep if REQ_NOWAIT has not been set */
  blk_mq_try_issue_list_directly(hctx, list)
    blk_mq_insert_requests() /* see above */

Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20230721172731.955724-4-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-07-24 20:13:12 -06:00
Bart Van Assche
d42e2e3448 scsi: Remove a blk_mq_run_hw_queues() call
blk_mq_kick_requeue_list() calls blk_mq_run_hw_queues() asynchronously.
Leave out the direct blk_mq_run_hw_queues() call. This patch causes
scsi_run_queue() to call blk_mq_run_hw_queues() asynchronously instead
of synchronously. Since scsi_run_queue() is not called from the hot I/O
submission path, this patch does not affect the hot path.

This patch prepares for allowing blk_mq_run_hw_queue() to sleep if
BLK_MQ_F_BLOCKING has been set. scsi_run_queue() may be called from
atomic context and must not sleep. Hence the removal of the
blk_mq_run_hw_queues(q, false) call. See also scsi_unblock_requests().

Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20230721172731.955724-3-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-07-24 20:13:12 -06:00
Bart Van Assche
b5ca9acff5 scsi: Inline scsi_kick_queue()
Inline scsi_kick_queue() to prepare for modifying the second argument
passed to blk_mq_run_hw_queues().

Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20230721172731.955724-2-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-07-24 20:13:12 -06:00
Martin K. Petersen
72b81768e8 Merge patch series: "qla2xxx driver bug fixes"
Nilesh Javali <njavali@marvell.com> says:

Martin,

Please apply the qla2xxx driver bug fixes to the scsi tree at your
earliest convenience.

Link: https://lore.kernel.org/r/20230714070104.40052-1-njavali@marvell.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-23 16:28:42 -04:00
Nilesh Javali
a31a596a42 scsi: qla2xxx: Update version to 10.02.08.500-k
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230714070104.40052-11-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-23 16:27:48 -04:00
Quinn Tran
009e7fe4a1 scsi: qla2xxx: fix inconsistent TMF timeout
Different behavior were experienced of session being torn down vs not when
TMF is timed out. When FW detects the time out, the session is torn down.
When driver detects the time out, the session is not torn down.

Allow TMF error to return to upper layer without session tear down.

Cc: stable@vger.kernel.org
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230714070104.40052-10-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-23 16:27:47 -04:00
Quinn Tran
5d3148d8e8 scsi: qla2xxx: Fix TMF leak through
Task management can retry up to 5 times when FW resource becomes bottle
neck. Between the retries, there is a short sleep.  Current code assumes
the chip has not reset or session has not changed.

Check for chip reset or session change before sending Task management.

Cc: stable@vger.kernel.org
Fixes: 9803fb5d27 ("scsi: qla2xxx: Fix task management cmd failure")
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230714070104.40052-9-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-23 16:27:47 -04:00
Quinn Tran
8ebaa45163 scsi: qla2xxx: Turn off noisy message log
Some consider noisy log as test failure.  Turn off noisy message log.

Cc: stable@vger.kernel.org
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230714070104.40052-8-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-23 16:27:47 -04:00
Quinn Tran
39d2274071 scsi: qla2xxx: Fix session hang in gnl
Connection does not resume after a host reset / chip reset. The cause of
the blockage is due to the FCF_ASYNC_ACTIVE left on. The gnl command was
interrupted by the chip reset. On exiting the command, this flag should be
turn off to allow relogin to reoccur. Clear this flag to prevent blockage.

Cc: stable@vger.kernel.org
Fixes: 17e64648aa ("scsi: qla2xxx: Correct fcport flags handling")
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230714070104.40052-7-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-23 16:27:47 -04:00
Quinn Tran
5b51f35d12 scsi: qla2xxx: Fix erroneous link up failure
Link up failure occurred where driver failed to see certain events from FW
indicating link up (AEN 8011) and fabric login completion (AEN 8014).
Without these 2 events, driver would not proceed forward to scan the
fabric. The cause of this is due to delay in the receive of interrupt for
Mailbox 60 that causes qla to set the fw_started flag late.  The late
setting of this flag causes other interrupts to be dropped.  These dropped
interrupts happen to be the link up (AEN 8011) and fabric login completion
(AEN 8014).

Set fw_started flag early to prevent interrupts being dropped.

Cc: stable@vger.kernel.org
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230714070104.40052-6-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-23 16:27:47 -04:00
Quinn Tran
da7c21b72a scsi: qla2xxx: Fix command flush during TMF
For each TMF request, driver iterates through each qpair and flushes
commands associated to the TMF. At the end of the qpair flush, a Marker is
used to complete the flush transaction. This process was repeated for each
qpair. The multiple flush and marker for this TMF request seems to cause
confusion for FW.

Instead, 1 flush is sent to FW. Driver would wait for FW to go through all
the I/Os on each qpair to be read then return. Driver then closes out the
transaction with a Marker.

Cc: stable@vger.kernel.org
Fixes: d90171dd0d ("scsi: qla2xxx: Multi-que support for TMF")
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230714070104.40052-5-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-23 16:27:47 -04:00
Quinn Tran
a8ec192427 scsi: qla2xxx: Limit TMF to 8 per function
Per FW recommendation, 8 TMF's can be outstanding for each
function. Previously, it allowed 8 per target.

Limit TMF to 8 per function.

Cc: stable@vger.kernel.org
Fixes: 6a87679626 ("scsi: qla2xxx: Fix task management cmd fail due to unavailable resource")
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230714070104.40052-4-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-23 16:27:47 -04:00
Quinn Tran
efa74a62aa scsi: qla2xxx: Adjust IOCB resource on qpair create
During NVMe queue creation, a new qpair is created. FW resource limit needs
to be re-adjusted to take into account the new qpair. Otherwise, NVMe
command can not go through.  This issue was discovered while
testing/forcing FW execution to fail at load time.

Add call to readjust IOCB and exchange limit.

In addition, get FW state command and require FW to be running. Otherwise,
error is generated.

Cc: stable@vger.kernel.org
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230714070104.40052-3-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-23 16:27:47 -04:00
Quinn Tran
6dfe4344c1 scsi: qla2xxx: Fix deletion race condition
System crash when using debug kernel due to link list corruption. The cause
of the link list corruption is due to session deletion was allowed to queue
up twice.  Here's the internal trace that show the same port was allowed to
double queue for deletion on different cpu.

20808683956 015 qla2xxx [0000:13:00.1]-e801:4: Scheduling sess ffff93ebf9306800 for deletion 50:06:0e:80:12:48:ff:50 fc4_type 1
20808683957 027 qla2xxx [0000:13:00.1]-e801:4: Scheduling sess ffff93ebf9306800 for deletion 50:06:0e:80:12:48:ff:50 fc4_type 1

Move the clearing/setting of deleted flag lock.

Cc: stable@vger.kernel.org
Fixes: 726b854870 ("qla2xxx: Add framework for async fabric discovery")
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230714070104.40052-2-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-23 16:27:46 -04:00
Martin K. Petersen
d417a6ffdb Merge patch series "lpfc: Update lpfc to revision 14.2.0.14"
Justin Tee <justintee8345@gmail.com> says:

Update lpfc to revision 14.2.0.14

This patch set contains logging improvements, kref handling fixes,
discovery bug fixes, and refactoring of repeated code.

The patches were cut against Martin's 6.6/scsi-queue tree.

Link: https://lore.kernel.org/r/20230712180522.112722-1-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-23 16:17:57 -04:00
Justin Tee
71fe5ddac5 scsi: lpfc: Copyright updates for 14.2.0.14 patches
Update copyrights to 2023 for files modified in the 14.2.0.14 patch set.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230712180522.112722-13-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-23 16:17:08 -04:00
Justin Tee
cfb9b8f506 scsi: lpfc: Update lpfc version to 14.2.0.14
Update lpfc version to 14.2.0.14

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230712180522.112722-12-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-23 16:17:08 -04:00
Justin Tee
81907422ca scsi: lpfc: Clean up SLI-4 sysfs resource reporting
Currently, we have dated logic to work around the differences between SLI-4
and SLI-3 resource reporting through sysfs.

Leave the SLI-3 path untouched, but for SLI4 path, retrieve resource values
from the phba->sli4_hba->max_cfg_param structure.  Max values are populated
during ACQE events right after READ_CONFIG mbox cmd is sent.  Instead of
the dated subtraction logic, used resource calculation is directly fed into
sysfs for display.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230712180522.112722-11-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-23 16:17:08 -04:00
Justin Tee
d668b368ef scsi: lpfc: Refactor cpu affinity assignment paths
During initialization, a lot of the same logic is used on MSI-X vector CPU
affinity assignment.

Create a lpfc_next_present_cpu() helper routine, and apply its usage for
refactoring purposes.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230712180522.112722-10-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-23 16:17:07 -04:00
Justin Tee
089ea22e37 scsi: lpfc: Abort outstanding ELS cmds when mailbox timeout error is detected
A mailbox timeout error usually indicates something has gone wrong, and a
follow up reset of the HBA is a typical recovery mechanism.  Introduce a
MBX_TMO_ERR flag to detect such cases and have lpfc_els_flush_cmd abort ELS
commands if the MBX_TMO_ERR flag condition was set.  This ensures all of
the registered SGL resources meant for ELS traffic are not leaked after an
HBA reset.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230712180522.112722-9-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-23 16:17:07 -04:00
Justin Tee
9388da3037 scsi: lpfc: Make fabric zone discovery more robust when handling unsolicited LOGO
This patch provides better target rport recovery when a target rport is
running in initiator mode to discover the fabric.  Such a target will issue
a LOGO before switching back to strict target mode and changes are made to
recover the login.  Log messages are also updated accordingly.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230712180522.112722-8-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-23 16:17:07 -04:00
Justin Tee
04c3200114 scsi: lpfc: Set Establish Image Pair service parameter only for Target Functions
Previously, Establish Image Pair was set in all PRLI_ACC responses
regardless if the received PRLI was from an initiator or target function.
Specific target vendors that can operate in both initiator and target mode,
may view the PRLI_ACC with Establish Image Pair set as an invalid service
parameter when operating in initiator only mode.  This causes discovery
issues later when the target switches on its target mode function.

Revise logic that determines an rport's role as an initiator or target and
set the Establish Image Pair service parameter bit only if the Target
Function bit is set.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230712180522.112722-7-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-23 16:17:07 -04:00
Justin Tee
90cec07f53 scsi: lpfc: Revise ndlp kref handling for dev_loss_tmo_callbk and lpfc_drop_node
The ndlp kref count implementation in lpfc_dev_loss_tmo_callbk() removes
the initial node reference when a vport is unloading.  When lpfc_cleanup()
sends a DEVICE_RM event and is in NPR state, the driver calls
lpfc_drop_node().  Subsequently, lpfc_drop_node() also removes an ndlp kref
thinking it is the initial reference.  This unintentionally introduces an
extra kref decrement on the ndlp object.

Fix by using the NLP_DROPPED node flag in lpfc_dev_loss_tmo_callbk() and
lpfc_drop_node() to coordinate the removal of the initial node reference.

In lpfc_dev_loss_tmo_callbk(), remove the SCSI transport reference provided
the node is registered in the dev_loss context because the driver cannot
call the SCSI transport in dev_loss context or afterwards.  And, have
lpfc_drop_node() not remove a reference if another thread is acting or has
already acted on it.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230712180522.112722-6-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-23 16:17:07 -04:00
Justin Tee
377d7abadd scsi: lpfc: Qualify ndlp discovery state when processing RSCN
Conditionalize when to put an ndlp into recovery mode when processing
RSCNs.  As long as an ndlp state is beyond a PLOGI issue and has been
mapped to a transport layer before, the ndlp qualifies to be put into
recovery mode.  Otherwise, treat the ndlp rport normally through the
discovery engine.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230712180522.112722-5-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-23 16:17:07 -04:00
Justin Tee
869ab8b8a3 scsi: lpfc: Remove extra ndlp kref decrement in FLOGI cmpl for loop topology
In lpfc_cmpl_els_flogi(), the return out: label decrements the ndlp kref
signaling that FLOGI processing on the ndlp is complete.  In loop topology
path, there is an unnecessary ndlp put because it also branches to the out:
label.  This also signals ndlp usage completion too soon.  As such, remove
the extra lpfc_nlp_put() when in loop topology.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230712180522.112722-4-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-23 16:17:07 -04:00
Justin Tee
1a5cd3d073 scsi: lpfc: Simplify fcp_abort transport callback log message
The driver is reaching into a nvme_fc_cmd_iu ptr that belongs to the
transport during an abort.  This could cause an unintentional ptr
dereference into memory that the driver does not own.  Since the
nvme_fc_cmd_iu ptr was for logging purposes only, simplify the log message
such that the nvme_fc_cmd_iu reference is no longer needed.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230712180522.112722-3-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-23 16:17:07 -04:00
Justin Tee
4cf7cfa8ba scsi: lpfc: Pull out fw diagnostic dump log message from driver's trace buffer
The firmware diagnostic dump log message does not need to be a part of the
driver's log trace buffer because it is an expected user triggered event.
Change LOG_TRACE_EVENT verbose flag to LOG_SLI.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230712180522.112722-2-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-23 16:17:07 -04:00
Rob Herring
109a2a48fc scsi: sun_esp: Explicitly include correct DT includes
The DT of_device.h and of_platform.h date back to the separate
of_platform_bus_type before it as merged into the regular platform bus.  As
part of that merge prepping Arm DT support 13 years ago, they "temporarily"
include each other. They also include platform_device.h and of.h. As a
result, there's a pretty much random mix of those include files used
throughout the tree. In order to detangle these headers and replace the
implicit includes with struct declarations, users need to explicitly
include the correct includes.

Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20230714175052.4066150-1-robh@kernel.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-23 15:49:41 -04:00
Rob Herring
c4ca20f0f1 scsi: qlogicpti: Explicitly include correct DT includes
The DT of_device.h and of_platform.h date back to the separate
of_platform_bus_type before it as merged into the regular platform bus.  As
part of that merge prepping Arm DT support 13 years ago, they "temporarily"
include each other. They also include platform_device.h and of.h. As a
result, there's a pretty much random mix of those include files used
throughout the tree. In order to detangle these headers and replace the
implicit includes with struct declarations, users need to explicitly
include the correct includes.

Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20230714175052.4066150-1-robh@kernel.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-23 15:49:41 -04:00
Michael Kelley
010c1e1c57 scsi: storvsc: Limit max_sectors for virtual Fibre Channel devices
The Hyper-V host is queried to get the max transfer size that it supports,
and this value is used to set max_sectors for the synthetic SCSI
controller.  However, this max transfer size may be too large for virtual
Fibre Channel devices, which are limited to 512 Kbytes.  If a larger
transfer size is used with a vFC device, Hyper-V always returns an error,
and storvsc logs a message like this where the SRB status and SCSI status
are both zero:

hv_storvsc <GUID>: tag#197 cmd 0x8a status: scsi 0x0 srb 0x0 hv 0xc0000001

Add logic to limit the max transfer size to 512 Kbytes for vFC devices.

Fixes: 1d3e098078 ("scsi: storvsc: Correct reporting of Hyper-V I/O size limits")
Cc: stable@vger.kernel.org
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1689887102-32806-1-git-send-email-mikelley@microsoft.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-23 15:43:15 -04:00
Yihang Li
29f45ed18a scsi: hisi_sas: Delete unused lock in hisi_sas_port_notify_formed()
Currently spinlock hisi_hba->lock is used by both interrupts and threads
which requires the use of spin_lock_irqsave()/spin_unlock_irqrestore().
However, some places still use spin_lock()/spin_unlock().  Reviewing the
code revealed that it is unnecessary to use hisi_hba->lock in the function
hisi_sas_port_notify_formed() which is the only place that uses the
spinlock in interrupt context. So delete unused lock in
hisi_sas_port_notify_formed().

Signed-off-by: Yihang Li <liyihang9@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Link: https://lore.kernel.org/r/1689045300-44318-4-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-23 15:18:25 -04:00
Yihang Li
32be33747d scsi: hisi_sas: Block requests before a debugfs snapshot
When FIO and debugfs snapshot occur concurrently, some SATA I/Os are failed
to return to the upper layer due to the setting of HISI_SAS_REJECT_CMD_BIT.
Then the SCSI layer invokes the error processing thread. However,
sas_ata_hard_reset() in EH also fails to be reset due to the setting of
HISI_SAS_REJECT_CMD_BIT. As a result, the device is disabled.

Calling scsi_block_requests() in the front of a debugfs snapshot and wait
command complete before setting HISI_SAS_REJECT_CMD_BIT to avoid SATA I/O
failures.

Signed-off-by: Yihang Li <liyihang9@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Link: https://lore.kernel.org/r/1689045300-44318-3-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-23 15:18:25 -04:00
Xingui Yang
f5393a5602 scsi: hisi_sas: Fix normally completed I/O analysed as failed
The PIO read command has no response frame and the struct iu[1024] won't be
filled. I/Os which are normally completed will be treated as failed in
sas_ata_task_done() when iu contains abnormal dirty data.

Consequently ending_fis should not be filled by iu when the response frame
hasn't been written to memory.

Fixes: d380f55503 ("scsi: hisi_sas: Don't bother clearing status buffer IU in task prep")
Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Link: https://lore.kernel.org/r/1689045300-44318-2-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-23 15:18:25 -04:00
Yu Kuai
80b6051085 scsi: sg: Fix checking return value of blk_get_queue()
Commit fcaa174a9c ("scsi/sg: don't grab scsi host module reference") make
a mess how blk_get_queue() is called, blk_get_queue() returns true on
success while the caller expects it returns 0 on success.

Fix this problem and also add a corresponding error message on failure.

Fixes: fcaa174a9c ("scsi/sg: don't grab scsi host module reference")
Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Closes: https://lore.kernel.org/all/87lefv622n.fsf@linux.ibm.com/
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20230705024001.177585-1-yukuai1@huaweicloud.com
Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Tested-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-19 23:12:13 -04:00