2322 Commits

Author SHA1 Message Date
Linus Torvalds
6ed92e559a SCSI misc on 20231102
Updates to the usual drivers (ufs, megaraid_sas, lpfc, target, ibmvfc,
 scsi_debug) plus the usual assorted minor fixes and updates.  The
 major change this time around is a prep patch for rethreading of the
 driver reset handler API not to take a scsi_cmd structure which starts
 to reduce various drivers' dependence on scsi_cmd in error handling.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCZUORLiYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishQ4WAQDDIhzp
 /PiJBBtt0U9ii/lYqRLrOVnN0extKEgEGO+FbwEAssKgs+5Jn/7XCgdpSrx8Co3/
 0cPXrZGxs7tFpFWLZjM=
 =AlRU
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI updates from James Bottomley:
 "Updates to the usual drivers (ufs, megaraid_sas, lpfc, target, ibmvfc,
  scsi_debug) plus the usual assorted minor fixes and updates.

  The major change this time around is a prep patch for rethreading of
  the driver reset handler API not to take a scsi_cmd structure which
  starts to reduce various drivers' dependence on scsi_cmd in error
  handling"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (132 commits)
  scsi: ufs: core: Leave space for '\0' in utf8 desc string
  scsi: ufs: core: Conversion to bool not necessary
  scsi: ufs: core: Fix race between force complete and ISR
  scsi: megaraid: Fix up debug message in megaraid_abort_and_reset()
  scsi: aic79xx: Fix up NULL command in ahd_done()
  scsi: message: fusion: Initialize return value in mptfc_bus_reset()
  scsi: mpt3sas: Fix loop logic
  scsi: snic: Remove useless code in snic_dr_clean_pending_req()
  scsi: core: Add comment to target_destroy in scsi_host_template
  scsi: core: Clean up scsi_dev_queue_ready()
  scsi: pmcraid: Add missing scsi_device_put() in pmcraid_eh_target_reset_handler()
  scsi: target: core: Fix kernel-doc comment
  scsi: pmcraid: Fix kernel-doc comment
  scsi: core: Handle depopulation and restoration in progress
  scsi: ufs: core: Add support for parsing OPP
  scsi: ufs: core: Add OPP support for scaling clocks and regulators
  scsi: ufs: dt-bindings: common: Add OPP table
  scsi: scsi_debug: Add param to control sdev's allow_restart
  scsi: scsi_debug: Add debugfs interface to fail target reset
  scsi: scsi_debug: Add new error injection type: Reset LUN failed
  ...
2023-11-02 15:13:50 -10:00
Linus Torvalds
eb55307e67 X86 core code updates:
- Limit the hardcoded topology quirk for Hygon CPUs to those which have a
     model ID less than 4. The newer models have the topology CPUID leaf 0xB
     correctly implemented and are not affected.
 
   - Make SMT control more robust against enumeration failures
 
     SMT control was added to allow controlling SMT at boottime or
     runtime. The primary purpose was to provide a simple mechanism to
     disable SMT in the light of speculation attack vectors.
 
     It turned out that the code is sensible to enumeration failures and
     worked only by chance for XEN/PV. XEN/PV has no real APIC enumeration
     which means the primary thread mask is not set up correctly. By chance
     a XEN/PV boot ends up with smp_num_siblings == 2, which makes the
     hotplug control stay at its default value "enabled". So the mask is
     never evaluated.
 
     The ongoing rework of the topology evaluation caused XEN/PV to end up
     with smp_num_siblings == 1, which sets the SMT control to "not
     supported" and the empty primary thread mask causes the hotplug core to
     deny the bringup of the APS.
 
     Make the decision logic more robust and take 'not supported' and 'not
     implemented' into account for the decision whether a CPU should be
     booted or not.
 
   - Fake primary thread mask for XEN/PV
 
     Pretend that all XEN/PV vCPUs are primary threads, which makes the
     usage of the primary thread mask valid on XEN/PV. That is consistent
     with because all of the topology information on XEN/PV is fake or even
     non-existent.
 
   - Encapsulate topology information in cpuinfo_x86
 
     Move the randomly scattered topology data into a separate data
     structure for readability and as a preparatory step for the topology
     evaluation overhaul.
 
   - Consolidate APIC ID data type to u32
 
     It's fixed width hardware data and not randomly u16, int, unsigned long
     or whatever developers decided to use.
 
   - Cure the abuse of cpuinfo for persisting logical IDs.
 
     Per CPU cpuinfo is used to persist the logical package and die
     IDs. That's really not the right place simply because cpuinfo is
     subject to be reinitialized when a CPU goes through an offline/online
     cycle.
 
     Use separate per CPU data for the persisting to enable the further
     topology management rework. It will be removed once the new topology
     management is in place.
 
   - Provide a debug interface for inspecting topology information
 
     Useful in general and extremly helpful for validating the topology
     management rework in terms of correctness or "bug" compatibility.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmU+yX0THHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoROUD/4vlvKEcpm9rbI5DzLcaq4DFHKbyEZF
 cQtzuOSM/9vTc9DHnuoNNLl9TWSYxiVYnejf3E21evfsqspYlzbTH8bId9XBCUid
 6B68AJW842M2erNuwj0b0HwF1z++zpDmBDyhGOty/KQhoM8pYOHMvntAmbzJbuso
 Dgx6BLVFcboTy6RwlfRa0EE8f9W5V+JbmG/VBDpdyCInal7VrudoVFZmWQnPIft7
 zwOJpAoehkp8OKq7geKDf79yWxu9a1sNPd62HtaVEvfHwehHqE6OaMLss1us+0vT
 SJ/D6gmRQBOwcXaZL0wL1dG7Km9Et4AisOvzhXGvTa5b2D5oljVoqJ7V7FTf5g3u
 y3aqWbeUJzERUbeJt1HoGVAKyA4GtZOvg+TNIysf6F1Z4khl9alfa9jiqjj4g1au
 zgItq/ZMBEBmJ7X4FxQUEUVBG2CDsEidyNBDRcimWQUDfBakV/iCs0suD8uu8ZOD
 K5jMx8Hi2+xFx7r1YqsfsyMBYOf/zUZw65RbNe+kI992JbJ9nhcODbnbo5MlAsyv
 vcqlK5FwXgZ4YAC8dZHU/tyTiqAW7oaOSkqKwTP5gcyNEqsjQHV//q6v+uqtjfYn
 1C4oUsRHT2vJiV9ktNJTA4GQHIYF4geGgpG8Ih2SjXsSzdGtUd3DtX1iq0YiLEOk
 eHhYsnniqsYB5g==
 =xrz8
 -----END PGP SIGNATURE-----

Merge tag 'x86-core-2023-10-29-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 core updates from Thomas Gleixner:

 - Limit the hardcoded topology quirk for Hygon CPUs to those which have
   a model ID less than 4.

   The newer models have the topology CPUID leaf 0xB correctly
   implemented and are not affected.

 - Make SMT control more robust against enumeration failures

   SMT control was added to allow controlling SMT at boottime or
   runtime. The primary purpose was to provide a simple mechanism to
   disable SMT in the light of speculation attack vectors.

   It turned out that the code is sensible to enumeration failures and
   worked only by chance for XEN/PV. XEN/PV has no real APIC enumeration
   which means the primary thread mask is not set up correctly. By
   chance a XEN/PV boot ends up with smp_num_siblings == 2, which makes
   the hotplug control stay at its default value "enabled". So the mask
   is never evaluated.

   The ongoing rework of the topology evaluation caused XEN/PV to end up
   with smp_num_siblings == 1, which sets the SMT control to "not
   supported" and the empty primary thread mask causes the hotplug core
   to deny the bringup of the APS.

   Make the decision logic more robust and take 'not supported' and 'not
   implemented' into account for the decision whether a CPU should be
   booted or not.

 - Fake primary thread mask for XEN/PV

   Pretend that all XEN/PV vCPUs are primary threads, which makes the
   usage of the primary thread mask valid on XEN/PV. That is consistent
   with because all of the topology information on XEN/PV is fake or
   even non-existent.

 - Encapsulate topology information in cpuinfo_x86

   Move the randomly scattered topology data into a separate data
   structure for readability and as a preparatory step for the topology
   evaluation overhaul.

 - Consolidate APIC ID data type to u32

   It's fixed width hardware data and not randomly u16, int, unsigned
   long or whatever developers decided to use.

 - Cure the abuse of cpuinfo for persisting logical IDs.

   Per CPU cpuinfo is used to persist the logical package and die IDs.
   That's really not the right place simply because cpuinfo is subject
   to be reinitialized when a CPU goes through an offline/online cycle.

   Use separate per CPU data for the persisting to enable the further
   topology management rework. It will be removed once the new topology
   management is in place.

 - Provide a debug interface for inspecting topology information

   Useful in general and extremly helpful for validating the topology
   management rework in terms of correctness or "bug" compatibility.

* tag 'x86-core-2023-10-29-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
  x86/apic, x86/hyperv: Use u32 in hv_snp_boot_ap() too
  x86/cpu: Provide debug interface
  x86/cpu/topology: Cure the abuse of cpuinfo for persisting logical ids
  x86/apic: Use u32 for wakeup_secondary_cpu[_64]()
  x86/apic: Use u32 for [gs]et_apic_id()
  x86/apic: Use u32 for phys_pkg_id()
  x86/apic: Use u32 for cpu_present_to_apicid()
  x86/apic: Use u32 for check_apicid_used()
  x86/apic: Use u32 for APIC IDs in global data
  x86/apic: Use BAD_APICID consistently
  x86/cpu: Move cpu_l[l2]c_id into topology info
  x86/cpu: Move logical package and die IDs into topology info
  x86/cpu: Remove pointless evaluation of x86_coreid_bits
  x86/cpu: Move cu_id into topology info
  x86/cpu: Move cpu_core_id into topology info
  hwmon: (fam15h_power) Use topology_core_id()
  scsi: lpfc: Use topology_core_id()
  x86/cpu: Move cpu_die_id into topology info
  x86/cpu: Move phys_proc_id into topology info
  x86/cpu: Encapsulate topology information in cpuinfo_x86
  ...
2023-10-30 17:37:47 -10:00
Martin K. Petersen
af46076d66 Merge patch series "lpfc: Update lpfc to revision 14.2.0.15"
Justin Tee <justintee8345@gmail.com> says:

Update lpfc to revision 14.2.0.15

This patch set contains error handling fixes, ELS bug fixes, and
logging improvements.

The patches were cut against Martin's 6.7/scsi-queue tree.

Link: https://lore.kernel.org/r/20231009161812.97232-1-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-10-13 17:00:47 -04:00
Justin Tee
8a9a690b5a scsi: lpfc: Update lpfc version to 14.2.0.15
Update lpfc version to 14.2.0.15.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20231009161812.97232-7-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-10-13 16:58:27 -04:00
Justin Tee
41c831bbb0 scsi: lpfc: Introduce LOG_NODE_VERBOSE messaging flag
The preexisting LOG_NODE message flag frequently spams a subset of the same
log messages during normal FC driver operations.  When analyzing driver
logs, this sometimes leads to difficulty in troubleshooting.

Because LOG_IP log message flag is unused, convert it to a new
LOG_NODE_VERBOSE flag.  The LOG_NODE_VERBOSE shall specifically be used for
diagnosing issues that require precise ndlp tracking detail.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20231009161812.97232-6-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-10-13 16:58:27 -04:00
Justin Tee
a3c3c0a806 scsi: lpfc: Validate ELS LS_ACC completion payload
A WCQE success completion status does not guarantee valid LS_ACC receipt
for ELS commands.  So, introduce a small helper routine that validates ELS
LS_ACC frames in ELS cmpl routines.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20231009161812.97232-5-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-10-13 16:58:27 -04:00
Justin Tee
12e896c742 scsi: lpfc: Reject received PRLIs with only initiator fcn role for NPIV ports
Currently, NPIV ports send PRLI_ACC to all received unsolicited PRLI
requests.  For an NPIV port, there is no point to PRLI_ACC if the received
PRLI request has the initiator function bit set and the target function bit
unset.  Modify the lpfc_rcv_prli_support_check() routine to send a PRLI_RJT
in such cases.  NPIV ports are expected to send PRLI_ACC only if the Target
function bit is set.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20231009161812.97232-4-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-10-13 16:58:27 -04:00
Justin Tee
d472a76603 scsi: lpfc: Treat IOERR_SLI_DOWN I/O completion status the same as pci offline
During receipt of a hardware error attention ACQE, IOERR_SLI_DOWN status is
set by the driver for all outstanding I/Os.

In such hardware error attention cases, we can treat the situation exactly
the same as pci_channel_offline.  Thus, add IOERR_SLI_DOWN status to the
same category as pci_channel_offline handling in lpfc_nvme_io_cmd_cmpl.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20231009161812.97232-3-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-10-13 16:58:27 -04:00
Justin Tee
0506814609 scsi: lpfc: Remove unnecessary zero return code assignment in lpfc_sli4_hba_setup
In order to enter the !rc if statement block in question, rc had to have
been zero to begin with.  Thus, the rc = 0 assignment is unnecessary and
can be removed.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20231009161812.97232-2-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-10-13 16:58:27 -04:00
Thomas Gleixner
09253672b5 scsi: lpfc: Use topology_core_id()
Use the provided topology helper.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230814085112.446856860@linutronix.de
2023-10-10 14:38:17 +02:00
Thomas Gleixner
02fb601d27 x86/cpu: Move phys_proc_id into topology info
Rename it to pkg_id which is the terminology used in the kernel.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230814085112.329006989@linutronix.de
2023-10-10 14:38:17 +02:00
Justin Tee
dae40be7a1 scsi: lpfc: Prevent use-after-free during rmmod with mapped NVMe rports
During rmmod, when dev_loss_tmo callback is called, an ndlp kref count is
decremented twice.  Once for SCSI transport registration and second to
remove the initial node allocation kref.  If there is also an NVMe
transport registration, another reference count decrement is expected in
lpfc_nvme_unregister_port().

Race conditions between the NVMe transport remoteport_delete and
dev_loss_tmo callbacks sometimes results in premature ndlp object release
resulting in use-after-free issues.

Fix by not dropping the ndlp object in dev_loss_tmo callback with an
outstanding NVMe transport registration.  Inversely, mark the final
NLP_DROPPED flag in lpfc_nvme_unregister_port when rmmod flag is set.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230908211923.37603-1-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-09-13 20:51:16 -04:00
Justin Tee
9c3034968e scsi: lpfc: Early return after marking final NLP_DROPPED flag in dev_loss_tmo
When a dev_loss_tmo event occurs, an ndlp lock is taken before checking
nlp_flag for NLP_DROPPED.  There is an attempt to restore the ndlp lock
when exiting the if statement, but the nlp_put kref could be the final
decrement causing a use-after-free memory access on a released ndlp object.

Instead of trying to reacquire the ndlp lock after checking nlp_flag, just
return after calling nlp_put.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230908211852.37576-1-justintee8345@gmail.com
Reviewed-by: "Ewan D. Milne" <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-09-13 20:49:34 -04:00
Jinjie Ruan
7dcc683db3 scsi: lpfc: Fix the NULL vs IS_ERR() bug for debugfs_create_file()
Since debugfs_create_file() returns ERR_PTR and never NULL, use IS_ERR() to
check the return value.

Fixes: 2fcbc569b9f5 ("scsi: lpfc: Make debugfs ktime stats generic for NVME and SCSI")
Fixes: 4c47efc140fa ("scsi: lpfc: Move SCSI and NVME Stats to hardware queue structures")
Fixes: 6a828b0f6192 ("scsi: lpfc: Support non-uniform allocation of MSIX vectors to hardware queues")
Fixes: 95bfc6d8ad86 ("scsi: lpfc: Make FW logging dynamically configurable")
Fixes: 9f77870870d8 ("scsi: lpfc: Add debugfs support for cm framework buffers")
Fixes: c490850a0947 ("scsi: lpfc: Adapt partitioned XRI lists to efficient sharing")
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Link: https://lore.kernel.org/r/20230906030809.2847970-1-ruanjinjie@huawei.com
Reviewed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-09-13 20:48:36 -04:00
James Bottomley
e03843a0f0 Merge branch 'fixes' into misc 2023-09-02 08:25:19 +01:00
Andy Shevchenko
19d7102a95 scsi: lpfc: Do not abuse UUID APIs and LPFC_COMPRESS_VMID_SIZE
The lpfc_vmid_host_uuid is not defined as uuid_t and its usage is not the
same as for uuid_t operations (like exporting or importing).  Hence replace
call to uuid_is_null() by respective memchr_inv() without abusing casting.

With that, replace LPFC_COMPRESS_VMID_SIZE with plain number and respective
sizeof() to make code robust to changes in the future, if any.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20230818155452.875781-1-andriy.shevchenko@linux.intel.com
Reviewed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-08-21 17:13:57 -04:00
Justin Tee
8eebf0e84f scsi: lpfc: Remove reftag check in DIF paths
When preparing protection DIF I/O for DMA, the driver obtains reference
tags from scsi_prot_ref_tag().  Previously, there was a wrong assumption
that an all 0xffffffff value meant error and thus the driver failed the
I/O.  This patch removes the evaluation code and accepts whatever the upper
layer returns.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230803211932.155745-1-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-08-07 21:34:08 -04:00
Justin Tee
dded1dc31a scsi: lpfc: Modify when a node should be put in device recovery mode during RSCN
Only nodes whose state is at least past a PLOGI issue and strictly less
than a PRLI issue should be put into device recovery mode upon RSCN
receipt.  Previously, the allowance of LOGO and PRLI completion states did
not make sense because those nodes should be allowed to flow through and
marked as NPort dissappeared as is normally done.  A follow up RSCN GID_FT
would recover those nodes in such cases.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230804195546.157839-1-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-08-07 21:25:47 -04:00
Justin Tee
71fe5ddac5 scsi: lpfc: Copyright updates for 14.2.0.14 patches
Update copyrights to 2023 for files modified in the 14.2.0.14 patch set.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230712180522.112722-13-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-23 16:17:08 -04:00
Justin Tee
cfb9b8f506 scsi: lpfc: Update lpfc version to 14.2.0.14
Update lpfc version to 14.2.0.14

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230712180522.112722-12-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-23 16:17:08 -04:00
Justin Tee
81907422ca scsi: lpfc: Clean up SLI-4 sysfs resource reporting
Currently, we have dated logic to work around the differences between SLI-4
and SLI-3 resource reporting through sysfs.

Leave the SLI-3 path untouched, but for SLI4 path, retrieve resource values
from the phba->sli4_hba->max_cfg_param structure.  Max values are populated
during ACQE events right after READ_CONFIG mbox cmd is sent.  Instead of
the dated subtraction logic, used resource calculation is directly fed into
sysfs for display.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230712180522.112722-11-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-23 16:17:08 -04:00
Justin Tee
d668b368ef scsi: lpfc: Refactor cpu affinity assignment paths
During initialization, a lot of the same logic is used on MSI-X vector CPU
affinity assignment.

Create a lpfc_next_present_cpu() helper routine, and apply its usage for
refactoring purposes.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230712180522.112722-10-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-23 16:17:07 -04:00
Justin Tee
089ea22e37 scsi: lpfc: Abort outstanding ELS cmds when mailbox timeout error is detected
A mailbox timeout error usually indicates something has gone wrong, and a
follow up reset of the HBA is a typical recovery mechanism.  Introduce a
MBX_TMO_ERR flag to detect such cases and have lpfc_els_flush_cmd abort ELS
commands if the MBX_TMO_ERR flag condition was set.  This ensures all of
the registered SGL resources meant for ELS traffic are not leaked after an
HBA reset.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230712180522.112722-9-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-23 16:17:07 -04:00
Justin Tee
9388da3037 scsi: lpfc: Make fabric zone discovery more robust when handling unsolicited LOGO
This patch provides better target rport recovery when a target rport is
running in initiator mode to discover the fabric.  Such a target will issue
a LOGO before switching back to strict target mode and changes are made to
recover the login.  Log messages are also updated accordingly.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230712180522.112722-8-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-23 16:17:07 -04:00
Justin Tee
04c3200114 scsi: lpfc: Set Establish Image Pair service parameter only for Target Functions
Previously, Establish Image Pair was set in all PRLI_ACC responses
regardless if the received PRLI was from an initiator or target function.
Specific target vendors that can operate in both initiator and target mode,
may view the PRLI_ACC with Establish Image Pair set as an invalid service
parameter when operating in initiator only mode.  This causes discovery
issues later when the target switches on its target mode function.

Revise logic that determines an rport's role as an initiator or target and
set the Establish Image Pair service parameter bit only if the Target
Function bit is set.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230712180522.112722-7-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-23 16:17:07 -04:00
Justin Tee
90cec07f53 scsi: lpfc: Revise ndlp kref handling for dev_loss_tmo_callbk and lpfc_drop_node
The ndlp kref count implementation in lpfc_dev_loss_tmo_callbk() removes
the initial node reference when a vport is unloading.  When lpfc_cleanup()
sends a DEVICE_RM event and is in NPR state, the driver calls
lpfc_drop_node().  Subsequently, lpfc_drop_node() also removes an ndlp kref
thinking it is the initial reference.  This unintentionally introduces an
extra kref decrement on the ndlp object.

Fix by using the NLP_DROPPED node flag in lpfc_dev_loss_tmo_callbk() and
lpfc_drop_node() to coordinate the removal of the initial node reference.

In lpfc_dev_loss_tmo_callbk(), remove the SCSI transport reference provided
the node is registered in the dev_loss context because the driver cannot
call the SCSI transport in dev_loss context or afterwards.  And, have
lpfc_drop_node() not remove a reference if another thread is acting or has
already acted on it.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230712180522.112722-6-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-23 16:17:07 -04:00
Justin Tee
377d7abadd scsi: lpfc: Qualify ndlp discovery state when processing RSCN
Conditionalize when to put an ndlp into recovery mode when processing
RSCNs.  As long as an ndlp state is beyond a PLOGI issue and has been
mapped to a transport layer before, the ndlp qualifies to be put into
recovery mode.  Otherwise, treat the ndlp rport normally through the
discovery engine.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230712180522.112722-5-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-23 16:17:07 -04:00
Justin Tee
869ab8b8a3 scsi: lpfc: Remove extra ndlp kref decrement in FLOGI cmpl for loop topology
In lpfc_cmpl_els_flogi(), the return out: label decrements the ndlp kref
signaling that FLOGI processing on the ndlp is complete.  In loop topology
path, there is an unnecessary ndlp put because it also branches to the out:
label.  This also signals ndlp usage completion too soon.  As such, remove
the extra lpfc_nlp_put() when in loop topology.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230712180522.112722-4-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-23 16:17:07 -04:00
Justin Tee
1a5cd3d073 scsi: lpfc: Simplify fcp_abort transport callback log message
The driver is reaching into a nvme_fc_cmd_iu ptr that belongs to the
transport during an abort.  This could cause an unintentional ptr
dereference into memory that the driver does not own.  Since the
nvme_fc_cmd_iu ptr was for logging purposes only, simplify the log message
such that the nvme_fc_cmd_iu reference is no longer needed.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230712180522.112722-3-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-23 16:17:07 -04:00
Justin Tee
4cf7cfa8ba scsi: lpfc: Pull out fw diagnostic dump log message from driver's trace buffer
The firmware diagnostic dump log message does not need to be a part of the
driver's log trace buffer because it is an expected user triggered event.
Change LOG_TRACE_EVENT verbose flag to LOG_SLI.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230712180522.112722-2-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-23 16:17:07 -04:00
Martin K. Petersen
e96277a570 Merge branch '6.5/scsi-staging' into 6.5/scsi-fixes
Pull in the currently staged SCSI fixes for 6.5.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-11 12:15:15 -04:00
Linus Torvalds
7fcd473a64 SCSI misc on 20230708
A few late arriving patches that missed the initial pull request.
 It's mostly bug fixes (the dt-bindings is a fix for the initial pull).
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCZKmvwSYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishResAQCPDbBh
 omMRBE+W+Vx2TgOJGjo/F+T1D2JjBhLIGpNVggEApJtgrQutAToiCU/qIP9GOTl7
 evetzh5boMMuyD2s7ak=
 =pi4v
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull more SCSI updates from James Bottomley:
 "A few late arriving patches that missed the initial pull request. It's
  mostly bug fixes (the dt-bindings is a fix for the initial pull)"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: ufs: core: Remove unused function declaration
  scsi: target: docs: Remove tcm_mod_builder.py
  scsi: target: iblock: Quiet bool conversion warning with pr_preempt use
  scsi: dt-bindings: ufs: qcom: Fix ICE phandle
  scsi: core: Simplify scsi_cdl_check_cmd()
  scsi: isci: Fix comment typo
  scsi: smartpqi: Replace one-element arrays with flexible-array members
  scsi: target: tcmu: Replace strlcpy() with strscpy()
  scsi: ncr53c8xx: Replace strlcpy() with strscpy()
  scsi: lpfc: Fix lpfc_name struct packing
2023-07-08 12:35:18 -07:00
Tuo Li
0e881c0a4b scsi: lpfc: Fix a possible data race in lpfc_unregister_fcf_rescan()
The variable phba->fcf.fcf_flag is often protected by the lock
phba->hbalock() when is accessed. Here is an example in
lpfc_unregister_fcf_rescan():

  spin_lock_irq(&phba->hbalock);
  phba->fcf.fcf_flag |= FCF_INIT_DISC;
  spin_unlock_irq(&phba->hbalock);

However, in the same function, phba->fcf.fcf_flag is assigned with 0
without holding the lock, and thus can cause a data race:

  phba->fcf.fcf_flag = 0;

To fix this possible data race, a lock and unlock pair is added when
accessing the variable phba->fcf.fcf_flag.

Reported-by: BassCheck <bass@buaa.edu.cn>
Signed-off-by: Tuo Li <islituo@gmail.com>
Link: https://lore.kernel.org/r/20230630024748.1035993-1-islituo@gmail.com
Reviewed-by: Justin Tee <justin.tee@broadcom.com>
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-05 21:14:22 -04:00
Linus Torvalds
ca7ce08d6a SCSI misc on 20230629
Updates to the usual drivers (ufs, pm80xx, libata-scsi, smartpqi,
 lpfc, qla2xxx).  We have a couple of major core changes impacting
 other systems: Command Duration Limits, which spills into block and
 ATA and block level Persistent Reservation Operations, which touches
 block, nvme, target and dm (both of which are added with merge commits
 containing a cover letter explaining what's going on).
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCZJ19cSYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishfZpAQCQBuWR
 ELcOhsaG5KzO6xLWcH8mjsOoxffKvazZjTKXlAD5ATEv7++E250oKS3t+yfjae5I
 Lc195MlDju85ItUQgfk=
 =U9ik
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI updates from James Bottomley:
 "Updates to the usual drivers (ufs, pm80xx, libata-scsi, smartpqi,
  lpfc, qla2xxx).

  We have a couple of major core changes impacting other systems:

   - Command Duration Limits, which spills into block and ATA

   - block level Persistent Reservation Operations, which touches block,
     nvme, target and dm

  Both of these are added with merge commits containing a cover letter
  explaining what's going on"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (187 commits)
  scsi: core: Improve warning message in scsi_device_block()
  scsi: core: Replace scsi_target_block() with scsi_block_targets()
  scsi: core: Don't wait for quiesce in scsi_device_block()
  scsi: core: Don't wait for quiesce in scsi_stop_queue()
  scsi: core: Merge scsi_internal_device_block() and device_block()
  scsi: sg: Increase number of devices
  scsi: bsg: Increase number of devices
  scsi: qla2xxx: Remove unused nvme_ls_waitq wait queue
  scsi: ufs: ufs-pci: Add support for Intel Arrow Lake
  scsi: sd: sd_zbc: Use PAGE_SECTORS_SHIFT
  scsi: ufs: wb: Add explicit flush_threshold sysfs attribute
  scsi: ufs: ufs-qcom: Switch to the new ICE API
  scsi: ufs: dt-bindings: qcom: Add ICE phandle
  scsi: ufs: ufs-mediatek: Set UFSHCD_QUIRK_MCQ_BROKEN_RTC quirk
  scsi: ufs: ufs-mediatek: Set UFSHCD_QUIRK_MCQ_BROKEN_INTR quirk
  scsi: ufs: core: Add host quirk UFSHCD_QUIRK_MCQ_BROKEN_RTC
  scsi: ufs: core: Add host quirk UFSHCD_QUIRK_MCQ_BROKEN_INTR
  scsi: ufs: core: Remove dedicated hwq for dev command
  scsi: ufs: core: mcq: Fix the incorrect OCS value for the device command
  scsi: ufs: dt-bindings: samsung,exynos: Drop unneeded quotes
  ...
2023-06-30 11:57:07 -07:00
Arnd Bergmann
00c2cae6b6 scsi: lpfc: Fix lpfc_name struct packing
clang points out that the lpfc_name structure has an 8-byte alignment
requirement on most architectures, but is embedded in a number of other
structures that are forced to be only 1-byte aligned:

drivers/scsi/lpfc/lpfc_hw.h:1516:30: error: field pe within 'struct lpfc_fdmi_reg_port_list' is less aligned than 'struct lpfc_fdmi_port_entry' and is usually due to 'struct lpfc_fdmi_reg_port_list' being packed, which can lead to unaligned accesses [-Werror,-Wunaligned-access]
        struct lpfc_fdmi_port_entry pe;
drivers/scsi/lpfc/lpfc_hw.h:850:19: error: field portName within 'struct _ADISC' is less aligned than 'struct lpfc_name' and is usually due to 'struct _ADISC' being packed, which can lead to unaligned accesses [-Werror,-Wunaligned-access]
drivers/scsi/lpfc/lpfc_hw.h:851:19: error: field nodeName within 'struct _ADISC' is less aligned than 'struct lpfc_name' and is usually due to 'struct _ADISC' being packed, which can lead to unaligned accesses [-Werror,-Wunaligned-access]
drivers/scsi/lpfc/lpfc_hw.h:922:19: error: field portName within 'struct _RNID' is less aligned than 'struct lpfc_name' and is usually due to 'struct _RNID' being packed, which can lead to unaligned accesses [-Werror,-Wunaligned-access]
drivers/scsi/lpfc/lpfc_hw.h:923:19: error: field nodeName within 'struct _RNID' is less aligned than 'struct lpfc_name' and is usually due to 'struct _RNID' being packed, which can lead to unaligned accesses [-Werror,-Wunaligned-access]

From the git history, I can see that all the __packed annotations were done
specifically to avoid introducing implicit padding around the lpfc_name
instances, though this was probably the wrong approach.

To improve this, only annotate the one uint64_t field inside of lpfc_name
as packed, with an explicit 4-byte alignment, as is the default already on
the 32-bit x86 ABI but not on most others. With this, the other __packed
annotations can be removed again, as this avoids the incorrect padding.

Two other structures change their layout as a result of this change:

 - struct _LOGO never gained a __packed annotation even though it has the
   same alignment problem as the others but is not used anywhere in the
   driver today.

 - struct serv_param similarly has this issue, and it is used, my guess is
   that this is only an internal structure rather than part of a binary
   interface, so the padding has no negative effect here.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20230616090705.2623408-1-arnd@kernel.org
Reviewed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-06-21 21:09:18 -04:00
Justin Tee
9cefd6e7e0 scsi: lpfc: Fix incorrect big endian type assignment in bsg loopback path
The kernel test robot reported sparse warnings regarding incorrect type
assignment for __be16 variables in bsg loopback path.

Change the flagged lines to use the be16_to_cpu() and cpu_to_be16() macros
appropriately.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230614175944.3577-1-justintee8345@gmail.com
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202306110819.sDIKiGgg-lkp@intel.com/
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-06-14 21:57:48 -04:00
Gustavo A. R. Silva
a48e2c328c scsi: lpfc: Avoid -Wstringop-overflow warning
Prevent any potential integer wrapping issue, and avoid a
-Wstringop-overflow warning by using the check_mul_overflow() helper.

drivers/scsi/lpfc/lpfc.h:
837:#define LPFC_RAS_MIN_BUFF_POST_SIZE (256 * 1024)

drivers/scsi/lpfc/lpfc_debugfs.c:
2266 size = LPFC_RAS_MIN_BUFF_POST_SIZE * phba->cfg_ras_fwlog_buffsize;

this can wrap to negative if cfg_ras_fwlog_buffsize is large
enough. And even when in practice this is not possible (due to
phba->cfg_ras_fwlog_buffsize never being larger than 4[1]), the
compiler is legitimately warning us about potentially buggy code.

Fix the following warning seen under GCC-13:
In function ‘lpfc_debugfs_ras_log_data’,
    inlined from ‘lpfc_debugfs_ras_log_open’ at drivers/scsi/lpfc/lpfc_debugfs.c:2271:15:
drivers/scsi/lpfc/lpfc_debugfs.c:2210:25: warning: ‘memcpy’ specified bound between 18446744071562067968 and 18446744073709551615 exceeds maximum object size 9223372036854775807 [-Wstringop-overflow=]
 2210 |                         memcpy(buffer + copied, dmabuf->virt,
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 2211 |                                size - copied - 1);
      |                                ~~~~~~~~~~~~~~~~~~

Link: https://github.com/KSPP/linux/issues/305
Link: https://lore.kernel.org/linux-hardening/CABPRKS8zyzrbsWt4B5fp7kMowAZFiMLKg5kW26uELpg1cDKY3A@mail.gmail.com/ [1]
Co-developed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/ZHkseX6TiFahvxJA@work
Reviewed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-06-07 21:20:21 -04:00
Justin Tee
bb26224ed4 scsi: lpfc: Use struct_size() helper
Prefer struct_size() over open-coded versions of idiom:

sizeof(struct-with-flex-array) + sizeof(typeof-flex-array-elements) * count

where count is the max number of items the flexible array is supposed to
contain.

Link: https://github.com/KSPP/linux/issues/160
Co-developed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Co-developed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230531223319.24328-1-justintee8345@gmail.com
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-06-07 21:20:21 -04:00
Justin Tee
6e8a669e61 scsi: lpfc: Fix incorrect big endian type assignments in FDMI and VMID paths
The kernel test robot reported sparse warnings regarding the improper usage
of beXX_to_cpu() macros.

Change the flagged FDMI and VMID member variables to __beXX and redo the
beXX_to_cpu() macros appropriately.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230530191405.21580-1-justintee8345@gmail.com
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202305261159.lTW5NYrv-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202305260751.NWFvhLY5-lkp@intel.com/
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-05-31 18:17:52 -04:00
Martin K. Petersen
21be4d0344 Merge patch series "lpfc: Update lpfc to revision 14.2.0.13"
Justin Tee <justintee8345@gmail.com> says:

Update lpfc to revision 14.2.0.13

This patch set contains discovery bug fixes, firmware logging
improvements, clean up of CQ handling, and statistics collection
enhancements.

The patches were cut against Martin's 6.5/scsi-queue tree.

Link: https://lore.kernel.org/r/20230523183206.7728-1-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-05-31 18:15:07 -04:00
Justin Tee
b93f9eb8f4 scsi: lpfc: Copyright updates for 14.2.0.13 patches
Update copyrights to 2023 for files modified in the 14.2.0.13 patch set.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230523183206.7728-10-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-05-31 18:14:20 -04:00
Justin Tee
48abf8b4b5 scsi: lpfc: Update lpfc version to 14.2.0.13
Update lpfc version to 14.2.0.13

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230523183206.7728-9-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-05-31 18:14:20 -04:00
Justin Tee
93190ac1d4 scsi: lpfc: Enhance congestion statistics collection
Various improvements are made for collecting congestion statistics:

 - Pre-existing logic is replaced with use of an hrtimer for increased
   reporting accuracy.

 - Congestion timestamp information is reorganized into a single struct.

 - Common statistic collection logic is refactored into a helper routine.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230523183206.7728-8-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-05-31 18:14:20 -04:00
Justin Tee
6a84d01508 scsi: lpfc: Clean up SLI-4 CQE status handling
There is mishandling of SLI-4 CQE status values larger than what is allowed
by the LPFC_IOCB_STATUS_MASK of 4 bits.  The LPFC_IOCB_STATUS_MASK is a
leftover SLI-3 construct and serves no purpose in SLI-4 path.

Remove the LPFC_IOCB_STATUS_MASK and clean up general CQE status handling
in SLI-4 completion paths.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230523183206.7728-7-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-05-31 18:14:20 -04:00
Justin Tee
b9951e1cff scsi: lpfc: Change firmware upgrade logging to KERN_NOTICE instead of TRACE_EVENT
A firmware upgrade does not necessitate dumping of phba->dbg_log[] to kmsg
via LOG_TRACE_EVENT.  A simple KERN_NOTICE log message should suffice to
notify the user of successful or unsuccessful firmware upgrade.  As such,
firmware upgrade log messages are updated to use KERN_NOTICE instead of
LOG_TRACE_EVENT.  Additionally, in order to notify the user of reset type
for instantiating newly downloaded firmware, lpfc_log_msg's default
KERN_LEVEL is updated to 5 or KERN_NOTICE.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230523183206.7728-6-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-05-31 18:14:20 -04:00
Justin Tee
9914a3d033 scsi: lpfc: Revise NPIV ELS unsol rcv cmpl logic to drop ndlp based on nlp_state
When NPIV ports are zoned to devices that support both initiator and target
mode, a remote device's initiated PRLI results in unintended final kref
clean up of the device's ndlp structure.  This disrupts NPIV ports'
discovery for target devices that support both initiator and target mode.

Modify the NPIV lpfc_drop_node clause such that we allow the ndlp to live
so long as it was in NLP_STE_PLOGI_ISSUE, NLP_STE_REG_LOGIN_ISSUE, or
NLP_STE_PRLI_ISSUE nlp_state.  This allows lpfc's issued PRLI completion
routine to determine if the final kref clean up should execute rather than
a remote device's issued PRLI.

Fixes: db651ec22524 ("scsi: lpfc: Correct used_rpi count when devloss tmo fires with no recovery")
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230523183206.7728-5-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-05-31 18:14:20 -04:00
Justin Tee
73ded37869 scsi: lpfc: Account for fabric domain ctlr device loss recovery
Pre-existing device loss recovery logic via the NLP_IN_RECOV_POST_DEV_LOSS
flag only handled Fabric Port Login, Fabric Controller, Management, and
Name Server addresses.

Fabric domain controllers fall under the same category for usage of the
NLP_IN_RECOV_POST_DEV_LOSS flag.  Add a default case statement to mark an
ndlp for device loss recovery.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230523183206.7728-4-justintee8345@gmail.com
Acked-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-05-31 18:14:20 -04:00
Justin Tee
fd57a687d4 scsi: lpfc: Clear NLP_IN_DEV_LOSS flag if already in rediscovery
In dev_loss_tmo callback routine, we early return if the ndlp is in a state
of rediscovery.  This occurs when a target proactively PLOGIs or PRLIs
after an RSCN before the dev_loss_tmo callback routine is scheduled to run.
Move clear of the NLP_IN_DEV_LOSS flag before the ndlp state check in such
cases.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230523183206.7728-3-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-05-31 18:14:20 -04:00
Justin Tee
a4157aaf0f scsi: lpfc: Fix use-after-free rport memory access in lpfc_register_remote_port()
Due to a target port D_ID swap, it is possible for the
lpfc_register_remote_port() routine to touch post mortem fc_rport memory
when trying to access fc_rport->dd_data.

The D_ID swap causes a simultaneous call to lpfc_unregister_remote_port(),
where fc_remote_port_delete() reclaims fc_rport memory.

Remove the fc_rport->dd_data->pnode NULL assignment because the following
line reassigns ndlp->rport with an fc_rport object from
fc_remote_port_add() anyways.  The pnode nullification is superfluous.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230523183206.7728-2-justintee8345@gmail.com
Acked-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-05-31 18:14:19 -04:00
Azeem Shaikh
73be26b12d scsi: lpfc: Replace all non-returning strlcpy() with strscpy()
strlcpy() reads the entire source buffer first.  This read may exceed the
destination size limit.  This is both inefficient and can lead to linear
read overflows if a source string is not NUL-terminated [1].  In an effort
to remove strlcpy() completely [2], replace strlcpy() here with strscpy().
No return values were used, so direct replacement is safe.

[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy
[2] https://github.com/KSPP/linux/issues/89

Signed-off-by: Azeem Shaikh <azeemshaikh38@gmail.com>
Link: https://lore.kernel.org/r/20230530155745.343032-1-azeemshaikh38@gmail.com
Reviewed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-05-31 17:59:06 -04:00