This series consists of the usual driver updates (ufs, qla2xxx,
target, smartpqi, lpfc, mpt3sas). The core change causing the most
churn was replacing the command request field request with a macro,
allowing us to offset map to it and remove the redundant field; the
same was also done for the tag field. The most impactful change is
the final removal of scsi_ioctl, which has been deprecated for over a
decade.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"This series consists of the usual driver updates (ufs, qla2xxx,
target, smartpqi, lpfc, mpt3sas).
The core change causing the most churn was replacing the command
request field request with a macro, allowing us to offset map to it
and remove the redundant field; the same was also done for the tag
field.
The most impactful change is the final removal of scsi_ioctl, which
has been deprecated for over a decade"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (293 commits)
scsi: ufs: Fix ufshcd_request_sense_async() for Samsung KLUFG8RHDA-B2D1
scsi: ufs: ufs-exynos: Fix static checker warning
scsi: mpt3sas: Use the proper SCSI midlayer interfaces for PI
scsi: lpfc: Use the proper SCSI midlayer interfaces for PI
scsi: lpfc: Copyright updates for 14.0.0.1 patches
scsi: lpfc: Update lpfc version to 14.0.0.1
scsi: lpfc: Add bsg support for retrieving adapter cmf data
scsi: lpfc: Add cmf_info sysfs entry
scsi: lpfc: Add debugfs support for cm framework buffers
scsi: lpfc: Add support for maintaining the cm statistics buffer
scsi: lpfc: Add rx monitoring statistics
scsi: lpfc: Add support for the CM framework
scsi: lpfc: Add cmfsync WQE support
scsi: lpfc: Add support for cm enablement buffer
scsi: lpfc: Add cm statistics buffer support
scsi: lpfc: Add EDC ELS support
scsi: lpfc: Expand FPIN and RDF receive logging
scsi: lpfc: Add MIB feature enablement support
scsi: lpfc: Add SET_HOST_DATA mbox cmd to pass date/time info to firmware
scsi: fc: Add EDC ELS definition
...
Allow abbreviated cm framework status information to be obtained via sysfs.
Link: https://lore.kernel.org/r/20210816162901.121235-14-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add the logic to move the congestion management and event information into
the cm statistics buffer maintained for the adapter. The update includes
rolling up values for the last minute, hour, and day information.
Link: https://lore.kernel.org/r/20210816162901.121235-12-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver provides overwatch of the cm behavior by maintaining a set of rx
I/O statistics. This information is also used in later updating of the cm
statistics buffer.
Link: https://lore.kernel.org/r/20210816162901.121235-11-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Complete the enablement of the cm framework feature in the adapter. Perform
the following:
- Detect the presence of the congestion management framework feature.
When the cm framework is present:
- Issue the SET_FEATURE command to enable the feature.
- Register the cm statistics buffer with the adapter.
- Read the cm enablement buffer to determine the cm framework state for cm
management.
When cm management is enabled:
- Monitor all FPIN and congestion signalling events, incrementing
counters.
- Regularly sync with the adapter to communicate congestion events and to
receive an rx request limit.
- Monitor requests for rx data and ensure that no more than the
adapter prescribed limit is issued on the link. If the limit is
exceeded, SCSI and/or NVMe traffic is temporarily suspended.
- Maintain the minute, hourly, daily statistics buffer.
- Monitor for congestion enablement change events, causing a reread of the
enablement buffer and acting on any change in enablement.
And:
- Add teardown logic, including buffer deregistration, on adapter
detachment or reset.
Link: https://lore.kernel.org/r/20210816162901.121235-10-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
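The rx limit handling described in the commit above can be sketched in plain userspace C. This is a minimal illustration, not the lpfc implementation: `struct rx_budget`, `rx_budget_sync()` and `rx_budget_request()` are hypothetical names, and the policy of letting the request that crosses the limit through before suspending is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-adapter rx budget; not an lpfc structure. */
struct rx_budget {
    uint64_t limit_bytes;   /* limit handed back by the last sync */
    uint64_t issued_bytes;  /* rx bytes requested since the last sync */
    bool     suspended;     /* true once the limit has been exceeded */
};

/* Periodic sync: accept a new limit and start a fresh interval. */
static void rx_budget_sync(struct rx_budget *b, uint64_t new_limit)
{
    assert(b != NULL);
    b->limit_bytes = new_limit;
    b->issued_bytes = 0;
    b->suspended = false;
}

/*
 * Returns true if the request may be issued now. Crossing the limit
 * suspends further requests until the next sync (assumed policy).
 */
static bool rx_budget_request(struct rx_budget *b, uint64_t bytes)
{
    assert(b != NULL);
    if (b->suspended)
        return false;
    b->issued_bytes += bytes;
    if (b->issued_bytes > b->limit_bytes)
        b->suspended = true;
    return true;
}
```

In this sketch the suspension is lifted only by the next sync, which mirrors the "temporarily suspended" wording of the commit message without claiming how the driver actually resumes traffic.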
As part of the cmf framework, the firmware maintains a table with
congestion related state information, specifically whether enabled and if
enabled, whether monitoring or actively managing congestion.
Add definition of the table and add support to read the table from the
adapter and determine if it is enabled. In support of this, the READ_OBJECT
mailbox command definition is added to the driver.
Link: https://lore.kernel.org/r/20210816162901.121235-8-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The cmf framework requires the driver to maintain a cm statistics table,
accessible inband, of congestion related statistics that are reported per
minute, rolled up to per hour, and rolled up again per day. Several days
worth may be maintained. The table is registered with the adapter when the
MIB feature is enabled.
Add definition of the table and add support to register the table with the
adapter. Includes definition and initialization of event counters that are
later added to the statistics table.
Link: https://lore.kernel.org/r/20210816162901.121235-7-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
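The per-minute, per-hour and per-day roll-up described above can be illustrated with a small userspace C sketch. The layout is an assumption made for illustration only: `struct cm_rollup`, `cm_rollup_minute()` and the `DAYS_KEPT` value are hypothetical and do not reflect the table format the adapter expects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MINUTES_PER_HOUR 60
#define HOURS_PER_DAY    24
#define DAYS_KEPT        4   /* assumed: "several days worth" */

/* Hypothetical roll-up state; not the lpfc statistics table. */
struct cm_rollup {
    uint64_t minute[MINUTES_PER_HOUR];
    uint64_t hour[HOURS_PER_DAY];
    uint64_t day[DAYS_KEPT];      /* circular, oldest overwritten */
    unsigned int n_minutes;       /* minutes filled in current hour */
    unsigned int n_hours;         /* hours filled in current day */
    unsigned int n_days;          /* total days completed */
};

static uint64_t sum_u64(const uint64_t *v, unsigned int n)
{
    uint64_t total = 0;

    for (unsigned int i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Record one minute of events, rolling up to hour and day when full. */
static void cm_rollup_minute(struct cm_rollup *r, uint64_t events)
{
    assert(r != NULL);
    r->minute[r->n_minutes++] = events;
    if (r->n_minutes < MINUTES_PER_HOUR)
        return;

    r->hour[r->n_hours++] = sum_u64(r->minute, MINUTES_PER_HOUR);
    r->n_minutes = 0;
    if (r->n_hours < HOURS_PER_DAY)
        return;

    r->day[r->n_days % DAYS_KEPT] = sum_u64(r->hour, HOURS_PER_DAY);
    r->n_days++;
    r->n_hours = 0;
}
```

Calling `cm_rollup_minute()` once per minute is enough to keep all three levels current. The caller never has to trigger the hourly or daily roll-up explicitly.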
When congestion management is enabled, issue EDC ELS to register congestion
signaling capabilities with the fabric. The response handling will process
the fabric parameters and set the reporting parameters.
Similarly, add support for receiving an EDC request from the fabric and
generating a corresponding response.
Implement handlers for congestion signals from the fabric and maintain
statistics for them.
Link: https://lore.kernel.org/r/20210816162901.121235-6-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
MIB support is currently limited to detecting support in the adapter and
ensuring FDMI support is enabled if present. For the new framework MIB
support also requires active enablement of support via the SET_FEATURES
command with the firmware.
Rework the MIB detection and enablement for the following:
- Move detection away from the get_sli4_parameters routine, and into the
hba_setup path. get_sli4_parameters is only called once at attachment
while hba_setup is called as part of any SLI port reset path. This
ensures detection after firmware download.
- Update SET_FEATURES mbx command for the MIB enablement feature and add
support for the feature.
- Create the cmf_setup routine to encapsulate the detection of MIB support
and perform the enablement of the MIB support feature.
Link: https://lore.kernel.org/r/20210816162901.121235-4-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Update routines to support 256 Gb link speed for LPe37000/LPe38000
adapters. 256 Gb speeds can be seen on trunk links.
Link: https://lore.kernel.org/r/20210722221721.74388-5-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Support for Topology and RAS logging capabilities was qualified by PCIe
device ID checks, necessitating additional driver changes for new device
IDs.
Reduce reliance on specific PCIe device IDs by substituting checks for SLI
family information. This automatically picks up support on the newest
hardware.
Link: https://lore.kernel.org/r/20210722221721.74388-4-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Update supported pci_device_id table to include the values for the G7+ ASIC
Device ID utilized by LPe37xxx and LPe38xxx series of adapters. The
default reporting string will be "LPe38000".
Link: https://lore.kernel.org/r/20210722221721.74388-2-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The SLI4 interface changed the manner used to indicate PBDE support.
Rework the driver to check for PBDE support via the PBDE feature bit in
COMMON_GET_SLI4_PARAMETERS.
Link: https://lore.kernel.org/r/20210707184351.67872-14-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Mailbox commands sent via ioctl/bsg from user applications may be
interrupted from processing by a concurrently triggered PCI function
reset. The command will not generate a completion due to the reset. This
results in a user application hang waiting for the mailbox command to
complete.
Resolve by changing the function reset handler to detect that there was an
outstanding mailbox command and simulate a mailbox completion. Add some
additional debug when a mailbox command times out.
Link: https://lore.kernel.org/r/20210707184351.67872-13-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
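The idea of simulating a completion for a mailbox command that a reset has orphaned can be sketched as follows. This is a simplified userspace model under stated assumptions: `struct mbox_cmd`, `struct adapter`, `adapter_function_reset()` and the `MBOX_STATUS_RESET` code are hypothetical and are not lpfc definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MBOX_STATUS_OK     0
#define MBOX_STATUS_RESET  (-1)   /* hypothetical "aborted by reset" code */

struct mbox_cmd;
typedef void (*mbox_done_fn)(struct mbox_cmd *cmd);

/* Hypothetical mailbox command with a completion callback. */
struct mbox_cmd {
    int          status;
    bool         done;
    mbox_done_fn on_done;
};

struct adapter {
    struct mbox_cmd *active;   /* outstanding command, or NULL */
};

/* Example completion: wakes the (simulated) waiter. */
static void mbox_mark_done(struct mbox_cmd *cmd)
{
    cmd->done = true;
}

/*
 * Reset handler: the hardware will never complete the outstanding
 * command, so complete it here with an error status instead.
 */
static void adapter_function_reset(struct adapter *a)
{
    struct mbox_cmd *cmd;

    assert(a != NULL);
    cmd = a->active;
    a->active = NULL;
    if (cmd == NULL)
        return;

    cmd->status = MBOX_STATUS_RESET;
    if (cmd->on_done != NULL)
        cmd->on_done(cmd);
}
```

Because the waiter is released through the same callback used for a normal completion, the user application sees an error instead of hanging.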
In lpfc_offline_prep() an RPI is freed and nlp_rpi set to 0xFFFF before
calling lpfc_unreg_rpi(). Unfortunately, lpfc_unreg_rpi() uses nlp_rpi to
index the sli4_hba.rpi_ids[] array.
In lpfc_offline_prep(), unreg rpi before freeing the rpi.
Link: https://lore.kernel.org/r/20210707184351.67872-12-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
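The ordering problem in this fix can be reproduced in a few lines of userspace C. The sketch below is illustrative only: the structures and the `demo_` functions are hypothetical, and a bounds check stands in for what would be an out-of-range array access in the buggy order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_RPI_INVALID 0xFFFFu
#define DEMO_MAX_RPI     8

/* Hypothetical adapter and node state; not lpfc structures. */
struct demo_hba {
    bool registered[DEMO_MAX_RPI];
};

struct demo_node {
    uint16_t nlp_rpi;
};

/* Uses nlp_rpi as an index, so it must run while the RPI is valid. */
static bool demo_unreg_rpi(struct demo_hba *h, const struct demo_node *n)
{
    assert(h != NULL && n != NULL);
    if (n->nlp_rpi >= DEMO_MAX_RPI)
        return false;   /* stands in for an out-of-range index */
    h->registered[n->nlp_rpi] = false;
    return true;
}

/* Frees the RPI and invalidates the node's copy of it. */
static void demo_free_rpi(struct demo_node *n)
{
    assert(n != NULL);
    n->nlp_rpi = DEMO_RPI_INVALID;
}

/* Buggy order: the index is invalidated before it is used. */
static bool demo_offline_prep_buggy(struct demo_hba *h, struct demo_node *n)
{
    demo_free_rpi(n);
    return demo_unreg_rpi(h, n);
}

/* Fixed order: unregister first, then free. */
static bool demo_offline_prep_fixed(struct demo_hba *h, struct demo_node *n)
{
    bool ok = demo_unreg_rpi(h, n);

    demo_free_rpi(n);
    return ok;
}
```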
Update comment headers for functions lpfc_vmid_cmd and lpfc_vmid_poll.
Link: https://lore.kernel.org/r/20210707184351.67872-5-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
There are instances when trace event logs are triggered from an interrupt
context. The trace event log may attempt to allocate memory, causing
"scheduling while atomic" bug call traces.
Remove the need for the kmalloc'ed vport array when checking the
log_verbose flag, which eliminates the need for any allocation.
Link: https://lore.kernel.org/r/20210707184351.67872-3-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The NVMe support indicator in log message 6422 is displaying a field that
was initialized but never set to indicate NVMe support. Remove obsolete
nvme_support element from the lpfc_hba structure and change log message to
display NVMe support status as reported in SLI4 Config Parameters mailbox
command.
Link: https://lore.kernel.org/r/20210707184351.67872-2-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Here is the small set of driver core and debugfs updates for 5.14-rc1.
Included in here are:
- debugfs api cleanups (touched some drivers)
- devres updates
- tiny driver core updates and tweaks
Nothing major in here at all, and all have been in linux-next for a
while with no reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Merge tag 'driver-core-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core changes from Greg KH:
"Here is the small set of driver core and debugfs updates for 5.14-rc1.
Included in here are:
- debugfs api cleanups (touched some drivers)
- devres updates
- tiny driver core updates and tweaks
Nothing major in here at all, and all have been in linux-next for a
while with no reported issues"
* tag 'driver-core-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (27 commits)
docs: ABI: testing: sysfs-firmware-memmap: add some memmap types.
devres: Enable trace events
devres: No need to call remove_nodes() when there none present
devres: Use list_for_each_safe_from() in remove_nodes()
devres: Make locking straight forward in release_nodes()
kernfs: move revalidate to be near lookup
drivers/base: Constify static attribute_group structs
firmware_loader: remove unneeded 'comma' macro
devcoredump: remove contact information
driver core: Drop helper devm_platform_ioremap_resource_wc()
component: Rename 'dev' to 'parent'
component: Drop 'dev' argument to component_match_realloc()
device property: Don't check for NULL twice in the loops
driver core: auxiliary bus: Fix typo in the docs
drivers/base/node.c: make CACHE_ATTR define static DEVICE_ATTR_RO
debugfs: remove return value of debugfs_create_ulong()
debugfs: remove return value of debugfs_create_bool()
scsi: snic: debugfs: remove local storage of debugfs files
b43: don't save dentries for debugfs
b43legacy: don't save dentries for debugfs
...
Implement timeout functionality for the VMID. After the set time period of
inactivity, the VMID is deregistered from the switch.
Link: https://lore.kernel.org/r/20210608043556.274139-12-muneendra.kumar@broadcom.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Gaurav Srivastava <gaurav.srivastava@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Muneendra Kumar <muneendra.kumar@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
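An inactivity timeout of the kind described above can be sketched with a timestamp per entry and a periodic sweep. This is a userspace illustration under assumptions: `struct vmid_entry`, `vmid_touch()` and `vmid_expire_idle()` are hypothetical names, and the real deregistration with the switch is reduced to clearing a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical VMID table entry; not the lpfc structure. */
struct vmid_entry {
    bool     registered;
    uint64_t last_used_ms;   /* monotonic time of last I/O activity */
};

/* Called on I/O: registers the entry if needed and refreshes its timer. */
static void vmid_touch(struct vmid_entry *e, uint64_t now_ms)
{
    assert(e != NULL);
    e->registered = true;
    e->last_used_ms = now_ms;
}

/*
 * Periodic sweep: deregister every entry idle for at least timeout_ms.
 * Returns the number of entries that were deregistered.
 */
static size_t vmid_expire_idle(struct vmid_entry *tbl, size_t n,
                               uint64_t now_ms, uint64_t timeout_ms)
{
    size_t expired = 0;

    assert(tbl != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!tbl[i].registered)
            continue;
        if (now_ms - tbl[i].last_used_ms < timeout_ms)
            continue;
        tbl[i].registered = false;   /* stands in for the deregistration */
        expired++;
    }
    return expired;
}
```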
Add supporting data structures for the mailbox command that helps
determine whether the firmware supports appid. Allocate resources for VMID
at initialization time and clean them up on removal.
Link: https://lore.kernel.org/r/20210608043556.274139-7-muneendra.kumar@broadcom.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Gaurav Srivastava <gaurav.srivastava@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Muneendra Kumar <muneendra.kumar@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
While testing NPIV and watching logins and used RPI levels, it was seen the
used RPI count was much higher than the number of remote ports discovered.
Code inspection showed that remote port removals on any NPIV instance are
releasing the RPI, but not performing an UNREG_RPI with the adapter thus
the reference counting never fully drops and the RPI is never fully
released. This was happening on NPIV nodes due to a lot of fabric ELS's to
fabric addresses. This lack of UNREG_RPI was introduced by a prior node
rework patch that performed the UNREG_RPI as part of node cleanup.
To resolve the issue, do the following:
- Restore the RPI release code, but move its location so that it is in
line with the new node cleanup design.
- NPIV ports now release the RPI and drop the node when the caller sets
the NLP_RELEASE_RPI flag.
- Set the NLP_RELEASE_RPI flag in node cleanup which will trigger a
release of RPI to free pool.
- Ensure there's an UNREG_RPI at LOGO completion so that RPI release is
completed.
- Stop offline_prep from skipping nodes that are UNUSED. The RPI may
not have been released.
- Stop the default RPI handling in lpfc_cmpl_els_rsp() for SLI4.
- Fixed up debugfs RPI displays for better debugging.
Fixes: a70e63eee1c1 ("scsi: lpfc: Fix NPIV Fabric Node reference counting")
Link: https://lore.kernel.org/r/20210514195559.119853-2-jsmart2021@gmail.com
Cc: <stable@vger.kernel.org> # v5.11+
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
With commit 312c004d36ce ("[PATCH] driver core: replace "hotplug" by
"uevent"") already in the tree for over a decade, update the name of
FW_ACTION defines to follow semantics, and reflect what the defines are
really meant for, i.e. whether or not to generate a user space event.
Acked-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Link: https://lore.kernel.org/r/20210425020024.28057-1-shawn.guo@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The dump command for reading a region passes a requested read length
specified in words (4-byte units). The response overwrites the same field
with the actual number of bytes read.
The mailbox handler for DUMP which reads VPD data (region 23) is treating
the response field as if it were still a word_cnt, thus multiplying it by 4
to set the read's "length". Given the read value was calculated based on
the size of the read buffer, the longer response length runs off the end of
the buffer.
Fix by reworking the code to use the response field as a byte count.
Link: https://lore.kernel.org/r/20210421234511.102206-1-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
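The unit mix-up behind this overrun is easy to show in isolation. The following userspace C sketch is not the lpfc code: `struct dump_mbox`, `fake_dump()` and the two `resp_len_` helpers are hypothetical, and the firmware is simulated by a plain `memcpy`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical mailbox field: on request it holds a length in 4-byte
 * words, on response it holds the number of bytes actually read.
 */
struct dump_mbox {
    uint32_t count;
};

/* Simulated firmware: copies the data and rewrites count in bytes. */
static void fake_dump(struct dump_mbox *mb, const uint8_t *region,
                      size_t region_len, uint8_t *dst)
{
    size_t want, got;

    assert(mb != NULL && region != NULL && dst != NULL);
    want = (size_t)mb->count * 4;            /* request is in words */
    got = want < region_len ? want : region_len;
    memcpy(dst, region, got);
    mb->count = (uint32_t)got;               /* response is in bytes */
}

/* Buggy reading: still treats the response as a word count. */
static size_t resp_len_buggy(const struct dump_mbox *mb)
{
    return (size_t)mb->count * 4;
}

/* Fixed reading: the response is already a byte count. */
static size_t resp_len_fixed(const struct dump_mbox *mb)
{
    return mb->count;
}
```

With a 32-byte buffer the request carries 8 words, the response carries 32 bytes, and the buggy interpretation reports 128 bytes, four times the size of the buffer that was actually filled.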
Code inspection showed lpfc was using three different pointer formats when
logging discovery object pointers.
Standardize the pointer format to x%px.
Note: %px use is limited to discovery objects in order to aid core
analysis.
Link: https://lore.kernel.org/r/20210412013127.2387-14-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
SLI-4 does not contain a PORT_CAPABILITIES mailbox command (only SLI-3
does, and SLI-3 doesn't use it), yet there are SLI-4 code paths that have
code to issue the command. The command will always fail.
Remove the code for the mailbox command and leave only the resulting
"failure path" logic.
Link: https://lore.kernel.org/r/20210412013127.2387-12-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In SLI-4, when performing a mailbox command with MBX_POLL, the driver uses
the BMBX register to send the command rather than the MQ. A flag is set
indicating the BMBX register is active and saves the mailbox job struct
(mboxq) in the mbox_active element of the adapter. The routine then waits
for completion or timeout. The mailbox job struct is not freed by the
routine. In cases of timeout, the adapter will be reset. The
lpfc_sli_mbox_sys_flush() routine will clean up the mbox in preparation for
the reset. It clears the BMBX active flag and marks the job structure as
MBX_NOT_FINISHED. But, it never frees the mboxq job structure. Expectation
in both normal completion and timeout cases is that the issuer of the mbx
command will free the structure. Unfortunately, not all calling paths are
freeing the memory in cases of error.
All calling paths were looked at and updated, if missing, to free the mboxq
memory regardless of completion status.
Link: https://lore.kernel.org/r/20210412013127.2387-7-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
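The ownership rule stated above (the issuer frees the job structure whether the command completes or fails) can be sketched as follows. This is a userspace model under assumptions: the job pool, `issue_polled()` and both `run_cmd_` variants are hypothetical, and a fixed pool replaces dynamic allocation so that a leak is directly observable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define JOB_POOL_SIZE 4

/* Hypothetical mailbox job drawn from a small fixed pool. */
struct mbox_job {
    bool in_use;
    int  cmd;
};

static struct mbox_job job_pool[JOB_POOL_SIZE];

static struct mbox_job *job_alloc(int cmd)
{
    for (size_t i = 0; i < JOB_POOL_SIZE; i++) {
        if (!job_pool[i].in_use) {
            job_pool[i].in_use = true;
            job_pool[i].cmd = cmd;
            return &job_pool[i];
        }
    }
    return NULL;
}

static void job_free(struct mbox_job *job)
{
    assert(job != NULL && job->in_use);
    job->in_use = false;
}

static size_t jobs_in_use(void)
{
    size_t n = 0;

    for (size_t i = 0; i < JOB_POOL_SIZE; i++)
        n += job_pool[i].in_use ? 1 : 0;
    return n;
}

/* Simulated polled issue: it never frees the job, on success or failure. */
static int issue_polled(struct mbox_job *job, bool fail)
{
    (void)job;
    return fail ? -1 : 0;
}

/* Leaky caller: the error path returns without freeing the job. */
static int run_cmd_leaky(int cmd, bool fail)
{
    struct mbox_job *job = job_alloc(cmd);
    int rc;

    if (job == NULL)
        return -1;
    rc = issue_polled(job, fail);
    if (rc != 0)
        return rc;
    job_free(job);
    return 0;
}

/* Fixed caller: frees the job regardless of completion status. */
static int run_cmd_fixed(int cmd, bool fail)
{
    struct mbox_job *job = job_alloc(cmd);
    int rc;

    if (job == NULL)
        return -1;
    rc = issue_polled(job, fail);
    job_free(job);
    return rc;
}
```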
Remove hbalock dependency for lpfc_abts_els_sgl_list and
lpfc_abts_nvmet_ctx_list. The lists are adequately synchronized with the
sgl_list_lock and abts_nvmet_buf_list_lock.
Link: https://lore.kernel.org/r/20210412013127.2387-5-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
For the files modified in 2021 via the 12.8.0.7 and 12.8.0.8 patch sets,
update the copyright for 2021.
Link: https://lore.kernel.org/r/20210301171821.3427-23-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
While testing recent discovery node rework, several items were seen that
could be done better with respect to the new trace event logic.
1) in the following msg:
kernel: lpfc 0000:44:00.0: start 35 end 35 cnt 0
If cnt is zero in the 1st message, there is no reason to display the
1st message, which is just giving start/end positioning.
Fix by not displaying message if cnt is 0.
2) If the driver is loaded with module log verbosity off, and later a
single NPIV host instance verbosity is enabled via sysfs, it enables
messages on all instances. This is due to the trace log verbosity checks
(lpfc_dmp_dbg) looking at the phba only. It should look at the phba and
the vport.
Fix by enabling a check on both phba and vport.
3) in the following messages:
2904 Firmware Dump Image Present on Adapter
2887 Reset Needed: Attempting Port Recovery...
These messages are not necessary for the trace event log, which is
primarily for discovery.
Fix by changing log level on these 2 messages to LOG_SLI.
Link: https://lore.kernel.org/r/20210104180240.46824-15-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Several errors have occurred where the adapter stops or fails but does not
raise the register values for the driver to detect failure. Thus the driver
is unaware of the failure. The failure typically results in I/O timeouts,
the I/O timeout handler failing (after several seconds), and the error
handler escalating recovery policy and resulting in more errors.
Eventually, the driver is in a position where things have spiraled and it
can't do recovery because other recovery ops are still outstanding and it
becomes unusable.
Resolve the situation by having the I/O timeout handler (actually an ELS,
SCSI I/O, NVMe LS, or NVMe I/O timeout), in addition to aborting the I/O,
perform a mailbox command and look for a response from the hardware. If
the mailbox command fails, it will mark the adapter offline and then invoke
the adapter reset handler to clean up.
The new I/O timeout test will be limited to one test every 5s. If there are
multiple I/O timeouts concurrently, only the 1st I/O timeout will generate
the mailbox command. Further testing will only occur when a timeout occurs
after the 5s delay from the last mailbox command has expired.
Link: https://lore.kernel.org/r/20210104180240.46824-14-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
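The 5s limit on the timeout-triggered hardware test amounts to a simple rate limiter. A minimal userspace sketch follows, assuming a monotonic millisecond clock supplied by the caller. `struct probe_limiter` and `probe_allowed()` are hypothetical names, not driver symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PROBE_INTERVAL_MS 5000u

/* Hypothetical limiter state; not an lpfc structure. */
struct probe_limiter {
    bool     armed;           /* false until the first probe is issued */
    uint64_t last_probe_ms;   /* monotonic time of the last probe */
};

/*
 * Called from each I/O timeout. Returns true if this timeout should
 * issue the mailbox probe, false if one was issued within the last 5s.
 */
static bool probe_allowed(struct probe_limiter *pl, uint64_t now_ms)
{
    assert(pl != NULL);
    if (pl->armed && now_ms - pl->last_probe_ms < PROBE_INTERVAL_MS)
        return false;
    pl->armed = true;
    pl->last_probe_ms = now_ms;
    return true;
}
```

Of several timeouts arriving close together, only the first returns true. The next probe is issued by whichever timeout arrives after the interval has elapsed, so no timer is needed.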
If a mailbox command times out, the SLI port is deemed in error and the
port is reset. The HBA cleanup is not returning I/Os to the NVMe layer
before the port is unregistered. This is due to the HBA being marked
offline (!SLI_ACTIVE) and cleanup being done by the mailbox timeout handler
rather than a general adapter reset routine. The mailbox timeout handler
only cleaned up SCSI I/Os.
Fix by reworking the mailbox handler to:
- After handling the mailbox error, detect the board is already in
failure (may be due to another error), and leave cleanup to the
other handler.
- If the mailbox command timeout is the initial detector of the port error,
continue with the board cleanup and marking the adapter offline
(!SLI_ACTIVE). Remove the SCSI-only I/O cleanup routine. The generic
reset adapter routine that is subsequently invoked, will clean up the
I/Os.
- Have the reset adapter routine flush all NVMe and SCSI I/Os if the
adapter has been marked failed (!SLI_ACTIVE).
- Rework the NVMe I/O terminate routine to take a status code to fail the
I/O with and update so that cleaned up I/O calls the wqe completion
routine. Currently it is bypassing the wqe cleanup and calling the NVMe
I/O completion directly. The wqe completion routine will take care of
data structure and node cleanup then call the NVMe I/O completion
handler.
Link: https://lore.kernel.org/r/20210104180240.46824-11-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In the lpfc offline routine, called for various reasons such as sysfs
attribute, driver unload, or port error, the driver is calling
__lpfc_cpuhp_remove() to destroy the hot plug data. If the offline routine
is called while the driver is in the process of being unloaded, a request
using lpfc_cpuhp_remove() is also made from lpfc_sli4_hba_unset(). The
cpuhp elements are no longer valid when the second removal request is made.
Fix by only calling the cpuhp removal once when the adapter is in the
process of unloading.
Link: https://lore.kernel.org/r/20210104180240.46824-8-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
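The double-removal hazard and the guard that avoids it can be sketched in userspace C. The model below is an assumption-laden illustration: `struct hba_state` and the three functions are hypothetical, and a boolean stands in for the hot plug data that becomes invalid after the first removal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical adapter state; not an lpfc structure. */
struct hba_state {
    bool unloading;     /* driver unload is in progress */
    bool cpuhp_valid;   /* hot plug data is still registered */
};

/* Returns false if the data was already removed (the bug condition). */
static bool cpuhp_remove(struct hba_state *h)
{
    assert(h != NULL);
    if (!h->cpuhp_valid)
        return false;
    h->cpuhp_valid = false;
    return true;
}

/* Offline path: skip the removal when the unload path will do it. */
static bool hba_offline(struct hba_state *h)
{
    assert(h != NULL);
    if (h->unloading)
        return true;
    return cpuhp_remove(h);
}

/* Unload path: always performs the removal. */
static bool hba_unset(struct hba_state *h)
{
    return cpuhp_remove(h);
}
```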
If the port is configured for NVME and has any outstanding IOs when a FW
reset is requested, outstanding I/Os are not properly cleaned up. This
causes the fw download request to fail.
Fix by clearing the LPFC_SLI_ACTIVE flag to signify the I/O must be
manually flushed by the driver on port reset.
Link: https://lore.kernel.org/r/20210104180240.46824-7-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
A very long time ago, there was a feature: auto sli mode. It gave the user
the ability to auto select the SLI mode (SLI2 or SLI3) to run the port in,
or even force SLI2 mode if configured. Because of the convoluted logic,
the CONFIG_PORT mbox command ends up being called 2 or 3 times. It should
have been called only once. Additionally, the driver no longer supports
SLI-2, so only SLI-3 mode should be allowed.
The following changes were made:
- Force module parameter to SLI3 only.
- Rip out redundant CONFIG_PORT mbox commands.
- Force CONFIG_PORT mbox command to be in beginning of enable ISR routine.
- Added changes for offline to online behavior
Link: https://lore.kernel.org/r/20210104180240.46824-3-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Drivers should do only device-specific jobs. But in general, drivers using
legacy PCI PM framework for .suspend()/.resume() have to manage many PCI
PM-related tasks themselves which can be done by PCI Core itself. This
brings extra load on the driver and it directly calls PCI helper functions
to handle them.
Switch to the new generic framework by updating function signatures and
define a "struct dev_pm_ops" variable to bind PM callbacks. Also, remove
unnecessary calls to the PCI Helper functions along with the legacy
.suspend & .resume bindings.
Link: https://lore.kernel.org/r/20201102164730.324035-18-vaibhavgupta40@gmail.com
Signed-off-by: Vaibhav Gupta <vaibhavgupta40@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Remove local variables that are set but not used.
Link: https://lore.kernel.org/r/20201119203340.121819-1-james.smart@broadcom.com
Fixes: c6adba150191 ("scsi: lpfc: Rework remote port lock handling")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver is currently using SLI-4 WQE templates only for NVMe. Refactor
the template and the placement of the service routine so that it can be
used by both SCSI and NVMe.
Link: https://lore.kernel.org/r/20201115192646.12977-12-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
While testing initiator-side cable swaps with NPIV, oops occur. The
reference counts for the Fabric nodes on the NPIV vports aren't balanced,
resulting in premature node removal.
The following fixes were made:
- Removed the FC_LBIT check in lpfc_linkup_port. This removed the special
case for vports that didn't have them clean up just like the physical
port.
- Removed the unreg_rpi call in lpfc_cleanup_node. In this section, the
node is being removed in the context of a reference count release and a
mailbox command can't be issued at this point.
- Removed special case handling in the default mailbox completion handler
that allowed the skipping of a node reference. Now, reference counting
always requires the removal of the reference.
- Moved where the DEVICE_RM event is done during LOGO handling, as
the driver has additional work to do on the ndlp before puts/releases
can be performed.
Link: https://lore.kernel.org/r/20201115192646.12977-10-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently the discovery layers within the driver use the SCSI midlayer
host_lock to access node-specific structures. This can contend with the I/O
path and is too coarse of a lock.
Rework the driver so that it uses a lock specific to the remote port node
structure when accessing the structure contents. A few of the changes
brought out spots where some slightly reorganized routines worked better.
Link: https://lore.kernel.org/r/20201115192646.12977-6-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Due to bug history and code review, the node reference counting approach in
the driver isn't implemented consistently with how the scsi and nvme
transport perform registrations and unregistrations and their callbacks.
This resulted in many bad/stale node pointers.
Rework the driver so that reference handling is performed as follows:
- The initial node reference is taken on structure allocation
- Take a reference on any add/register call to the transport
- Remove a reference on any delete/unregister call to the transport
- After the node has been fully removed from both the SCSI and NVMe transports
(dev_loss_callbacks have called back) call the discovery engine
DEVICE_RM event which will remove the final reference and release the
node structure.
- Alter dev_loss handling when a vport or base port is unloading.
- Remove the put_node handling - no longer needed.
- Rewrite the vport_delete handling on reference counts. Part of this
effort was driven from the FDISC not registering with the transport and
disrupting the model for node reference counting.
- Deleted lpfc_nlp_remove. Pushed its remaining ops into
lpfc_nlp_release.
- Several other small code cleanups.
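The reference rules listed above can be illustrated with a small user-space
model. This is only a sketch, not the driver's code: "struct node",
node_get/node_put, transport_register/transport_unregister and the
released_flag field are hypothetical names, and a plain int stands in for
the kernel's struct kref.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver's remote port node (ndlp). */
struct node {
	int refcount;        /* plain counter; kernel code would use struct kref */
	bool scsi_registered;
	bool nvme_registered;
	bool *released_flag; /* illustration only: lets a caller observe release */
};

/* Allocation takes the initial reference. */
struct node *node_alloc(bool *released_flag)
{
	struct node *n = calloc(1, sizeof(*n));

	if (!n)
		return NULL;
	n->refcount = 1;
	n->released_flag = released_flag;
	return n;
}

void node_get(struct node *n)
{
	assert(n->refcount > 0);
	n->refcount++;
}

/* Drop one reference; free the node and return true when it reaches zero. */
bool node_put(struct node *n)
{
	assert(n->refcount > 0);
	if (--n->refcount > 0)
		return false;
	if (n->released_flag)
		*n->released_flag = true;
	free(n);
	return true;
}

/* Any add/register call to a transport takes a reference. */
void transport_register(struct node *n, bool nvme)
{
	bool *reg = nvme ? &n->nvme_registered : &n->scsi_registered;

	if (*reg)
		return;
	*reg = true;
	node_get(n);
}

/*
 * Any delete/unregister call drops the transport's reference. Once the node
 * is gone from both transports, the modelled DEVICE_RM step drops the
 * initial reference, which releases the node. Returns true if released.
 */
bool transport_unregister(struct node *n, bool nvme)
{
	bool *reg = nvme ? &n->nvme_registered : &n->scsi_registered;

	if (!*reg)
		return false;
	*reg = false;
	node_put(n); /* cannot hit zero: the initial reference is still held */
	if (!n->scsi_registered && !n->nvme_registered)
		return node_put(n); /* DEVICE_RM: final reference */
	return false;
}
```

In this model a node registered with both transports holds three
references, and it is freed only by the unregister call that removes it
from the last transport.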
Link: https://lore.kernel.org/r/20201115192646.12977-5-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The lpfc driver is calling get_device and put_device on the scsi_fc_transport
device structure. When this code was removed, the driver triggered an oops
in "scsi_is_host_dev" when the first SCSI target was unregistered from the
transport.
The reason the calls were necessary is that the driver is calling
scsi_remove_host too early, before the target rports are unregistered and
the scsi devices disconnected from the scsi_host. The fc_host was torn
down during fc_remove_host.
Fix by moving the lpfc_pci_remove_one_s3/s4 calls to scsi_remove_host to
after the nodes are cleaned up. Remove the get_device and put_device calls
and the supporting code.
Link: https://lore.kernel.org/r/20201115192646.12977-4-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When a remote port is disconnected and disappears, its node structure
(ndlp) stays allocated and on a vport node list. While on the list it can
be matched, which requires validation checks on state to be added in
numerous code paths. If the node comes back, it's possible for there to be
multiple node structures for the same device on the vport node list. There
is no reason to keep the node structure around after it is no longer in
existence, and the current implementation creates problems for itself
(multiple nodes) and lots of unnecessary code for state validation.
Additionally, the reference taking on the node structure didn't follow the
normal model used by the kernel kref api. It included lots of odd logic to
match state with reference count. The combination of this odd logic plus
the way it was implicitly used in the discovery engine made its reference
taking implementation suspect and extremely hard to follow.
Change the driver such that the reference taking routines are now normal
ref increments/decrements and callout on refcount=0.
With this in place, the rework can be done such that the node structure is
fully removed and deallocated when the remote port no longer exists and all
references are removed. This removal logic, and the basic ref counting are
intrinsically tied, and are thus in a single patch.
Link: https://lore.kernel.org/r/20201115192646.12977-2-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Created a new attribute, lpfc_enable_mi, which is enabled by default.
Add command definition bits for SLI-4 parameters that recognize whether the
adapter has MIB information support and what revision of MIB data. Using
the adapter information, register vendor-specific MIB support with FDMI.
The registration will be done on every link up.
During FDMI registration, encountered a couple of errors when reverting to
FDMI rev1. Code needed to exit once reverting. Fixed these.
Link: https://lore.kernel.org/r/20201020202719.54726-8-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver supports arbitrarily large scatter-gather lists and the current
value for max_sectors is limiting.
Change max_sectors to the largest value. This was done previously, but
it only corrected one template and that template was later removed.
So change the remaining 2 templates. Other areas which hard-set the sectors
value should be inheriting what is in the template.
Link: https://lore.kernel.org/r/20201020202719.54726-7-james.smart@broadcom.com
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The following call trace was seen during HBA reset testing:
BUG: scheduling while atomic: swapper/2/0/0x10000100
...
Call Trace:
dump_stack+0x19/0x1b
__schedule_bug+0x64/0x72
__schedule+0x782/0x840
__cond_resched+0x26/0x30
_cond_resched+0x3a/0x50
mempool_alloc+0xa0/0x170
lpfc_unreg_rpi+0x151/0x630 [lpfc]
lpfc_sli_abts_recover_port+0x171/0x190 [lpfc]
lpfc_sli4_abts_err_handler+0xb2/0x1f0 [lpfc]
lpfc_sli4_io_xri_aborted+0x256/0x300 [lpfc]
lpfc_sli4_sp_handle_abort_xri_wcqe.isra.51+0xa3/0x190 [lpfc]
lpfc_sli4_fp_handle_cqe+0x89/0x4d0 [lpfc]
__lpfc_sli4_process_cq+0xdb/0x2e0 [lpfc]
__lpfc_sli4_hba_process_cq+0x41/0x100 [lpfc]
lpfc_cq_poll_hdler+0x1a/0x30 [lpfc]
irq_poll_softirq+0xc7/0x100
__do_softirq+0xf5/0x280
call_softirq+0x1c/0x30
do_softirq+0x65/0xa0
irq_exit+0x105/0x110
do_IRQ+0x56/0xf0
common_interrupt+0x16a/0x16a
The conversion to blk_io_poll, done for better interrupt latency in normal
cases, introduced this code path. It is executed when I/O aborts or logouts
are seen and attempts to allocate memory for a mailbox command to be
issued. The allocation is GFP_KERNEL, thus it could attempt to sleep.
Fix by creating a work element that performs the event handling for the
remote port. This will have the mailbox commands and other items performed
in the work element, not the irq. This is a much better method, as the "irq"
routine does not stall while performing all this deep handling code.
Ensure that allocation failures are handled and send LOGO on failure.
Additionally, enlarge the mailbox memory pool to reduce the possibility of
additional allocation in this path.
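The deferral pattern described above can be sketched in user-space C. This
is an illustrative model under stated assumptions, not the lpfc
implementation: the event queue, irq_queue_event, worker_drain and the
mbox_avail counter (standing in for the mailbox memory pool) are all
hypothetical, and the locking a real driver would need between the two
contexts is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_EVENTS 8

/* Hypothetical record of a remote port event seen in the "irq" path. */
struct rport_event {
	int rpi;
};

/* Fixed-size ring, so the "irq" side never has to allocate. */
struct event_queue {
	struct rport_event slots[MAX_EVENTS];
	size_t head, tail, count;
};

struct worker_stats {
	int unreg_issued; /* events handled with a mailbox command */
	int logo_sent;    /* events that fell back to LOGO */
};

/* "irq" side: only records the event. No allocation, nothing that sleeps. */
bool irq_queue_event(struct event_queue *q, int rpi)
{
	assert(q != NULL);
	if (q->count == MAX_EVENTS)
		return false;
	q->slots[q->tail].rpi = rpi;
	q->tail = (q->tail + 1) % MAX_EVENTS;
	q->count++;
	return true;
}

/*
 * Worker side: runs later in a context that may sleep. *mbox_avail models
 * the number of mailbox buffers left in the pool; when none is available
 * the event is handled by the LOGO fallback instead.
 */
void worker_drain(struct event_queue *q, int *mbox_avail,
		  struct worker_stats *st)
{
	while (q->count > 0) {
		q->head = (q->head + 1) % MAX_EVENTS;
		q->count--;
		if (*mbox_avail > 0) {
			(*mbox_avail)--;
			st->unreg_issued++;
		} else {
			st->logo_sent++;
		}
	}
}
```

The point of the split is that the interrupt-side function does a bounded
amount of work, while everything that may block or fail on allocation
lives in the worker.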
Link: https://lore.kernel.org/r/20201020202719.54726-3-james.smart@broadcom.com
Fixes: 317aeb83c92b ("scsi: lpfc: Add blk_io_poll support for latency improvment")
Cc: <stable@vger.kernel.org> # v5.9+
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Some systems are reporting the following log message during driver unload
or system shutdown:
ics_rtas_set_affinity: No online cpus in the mask
A prior commit introduced the writing of an empty affinity mask in calls to
irq_set_affinity_hint() when disabling interrupts or when there are no
remaining online CPUs to service an eq interrupt. At least some ppc64
systems are checking whether affinity masks are empty or not.
Do not call irq_set_affinity_hint() with an empty CPU mask.
Fixes: dcaa21367938 ("scsi: lpfc: Change default IRQ model on AMD architectures")
Link: https://lore.kernel.org/r/20200828175332.130300-2-james.smart@broadcom.com
Cc: <stable@vger.kernel.org> # v5.5+
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This is the set of patches which arrived too late to stabilise in
-next for the first pull. It's really just an lpfc driver update and
an assortment of minor fixes, all in drivers. The only core update is
to the zone block device driver, which isn't the one most people use.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCXza9yiYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishYyiAQCUJJ8m
vraqyBzjAPWpPsoNZmrvUxciXEDhoLrFo4Nl8wEAoJPVjUd79dXCJHB1Oq+MaeaF
LODgv2bO1F4TCf59F+4=
=+wdb
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull more SCSI updates from James Bottomley:
"This is the set of patches which arrived too late to stabilise in
-next for the first pull.
It's really just an lpfc driver update and an assortment of minor
fixes, all in drivers. The only core update is to the zone block
device driver, which isn't the one most people use"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: lpfc: Update lpfc version to 12.8.0.3
scsi: lpfc: Fix LUN loss after cable pull
scsi: lpfc: Fix validation of bsg reply lengths
scsi: lpfc: Fix retry of PRLI when status indicates its unsupported
scsi: lpfc: Fix oops when unloading driver while running mds diags
scsi: lpfc: Fix RSCN timeout due to incorrect gidft counter
scsi: lpfc: Fix no message shown for lpfc_hdw_queue out of range value
scsi: lpfc: Fix FCoE speed reporting
scsi: lpfc: Add missing misc_deregister() for lpfc_init()
scsi: lpfc: nvmet: Avoid hang / use-after-free again when destroying targetport
scsi: scsi_transport_sas: Add spaces around binary operator "|"
scsi: sd_zbc: Improve zone revalidation
scsi: libfc: Free skb in fc_disc_gpn_id_resp() for valid cases
scsi: fcoe: Memory leak fix in fcoe_sysfs_fcf_del()
scsi: target: Make iscsit_register_transport() return void