Go to file
Steffen Maier 8c9db6679b scsi: zfcp: Fix failed recovery on gone remote port with non-NPIV FCP devices
Suppose we have an environment with a number of non-NPIV FCP devices
(virtual HBAs / FCP devices / zfcp "adapter"s) sharing the same physical
FCP channel (HBA port) and its I_T nexus. Plus a number of storage target
ports zoned to such shared channel. Now one target port logs out of the
fabric causing an RSCN. Zfcp reacts with an ADISC ELS and subsequent port
recovery depending on the ADISC result. This happens on all such FCP
devices (in different Linux images) concurrently as they all receive a copy
of this RSCN. In the following we look at one of those FCP devices.

Requests other than FSF_QTCB_FCP_CMND can be slow until they get a
response.

Depending on which requests are affected by slow responses, there are
different recovery outcomes. Here we want to fix failed recoveries on port
or adapter level by avoiding recovery requests that can be slow.

We need the cached N_Port_ID for the remote port "link" test with ADISC.
Just before sending the ADISC, we now intentionally forget the old cached
N_Port_ID. The idea is that on receiving an RSCN for a port, we have to
assume that any cached information about this port is stale.  This forces a
fresh new GID_PN [FC-GS] nameserver lookup on any subsequent recovery for
the same port. Since we typically can still communicate with the nameserver
efficiently, we now reach steady state quicker: Either the nameserver still
does not know about the port so we stop recovery, or the nameserver already
knows the port potentially with a new N_Port_ID and we can successfully and
quickly perform open port recovery.  For the one case, where ADISC returns
successfully, we re-initialize port->d_id because that case does not
involve any port recovery.

This also solves a problem if the storage WWPN quickly logs into the fabric
again but with a different N_Port_ID. Such as on virtual WWPN takeover
during target NPIV failover.
[https://www.redbooks.ibm.com/abstracts/redp5477.html] In that case the
RSCN from the storage FDISC was ignored by zfcp and we could not
successfully recover the failover. On some later failback on the storage,
we could have been lucky if the virtual WWPN got the same old N_Port_ID
from the SAN switch as we still had cached.  Then the related RSCN
triggered a successful port reopen recovery.  However, there is no
guarantee to get the same N_Port_ID on NPIV FDISC.

Even though NPIV-enabled FCP devices are not affected by this problem, this
code change optimizes recovery time for gone remote ports as a side effect.
The timely drop of cached N_Port_IDs prevents unnecessary slow open port
attempts.

While the problem might have been in code before v2.6.32 commit
799b76d09a ("[SCSI] zfcp: Decouple gid_pn requests from erp") this fix
depends on the gid_pn_work introduced with that commit, so we mark it as
culprit to satisfy fix dependencies.

Note: Point-to-point remote port is already handled separately and gets its
N_Port_ID from the cached peer_d_id. So resetting port->d_id in general
does not affect PtP.

Link: https://lore.kernel.org/r/20220118165803.3667947-1-maier@linux.ibm.com
Fixes: 799b76d09a ("[SCSI] zfcp: Decouple gid_pn requests from erp")
Cc: <stable@vger.kernel.org> #2.6.32+
Suggested-by: Benjamin Block <bblock@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-01-24 23:30:27 -05:00
arch bitmap patches for 5.17-rc1 2022-01-23 06:20:44 +02:00
block bitmap patches for 5.17-rc1 2022-01-23 06:20:44 +02:00
certs certs: Fix build error when CONFIG_MODULE_SIG_KEY is empty 2022-01-23 00:08:44 +09:00
crypto lib/crypto: add prompts back to crypto libraries 2022-01-18 13:03:55 +01:00
Documentation Merge branch 'akpm' (patches from Andrew) 2022-01-22 11:28:23 +02:00
drivers scsi: zfcp: Fix failed recovery on gone remote port with non-NPIV FCP devices 2022-01-24 23:30:27 -05:00
fs bitmap patches for 5.17-rc1 2022-01-23 06:20:44 +02:00
include bitmap patches for 5.17-rc1 2022-01-23 06:20:44 +02:00
init lib/stackdepot: allow optional init and stack_table allocation by kvmalloc() 2022-01-22 08:33:37 +02:00
ipc proc: remove PDE_DATA() completely 2022-01-22 08:33:37 +02:00
kernel ftrace: Fix s390 breakage from sorting mcount tables 2022-01-23 08:07:02 +02:00
lib bitmap patches for 5.17-rc1 2022-01-23 06:20:44 +02:00
LICENSES LICENSES/LGPL-2.1: Add LGPL-2.1-or-later as valid identifiers 2021-12-16 14:33:10 +01:00
mm bitmap patches for 5.17-rc1 2022-01-23 06:20:44 +02:00
net bitmap patches for 5.17-rc1 2022-01-23 06:20:44 +02:00
samples Merge branch 'akpm' (patches from Andrew) 2022-01-20 10:41:01 +02:00
scripts Devicetree fixes for v5.17, take 1: 2022-01-22 09:52:17 +02:00
security fs.idmapped.v5.17 2022-01-11 14:26:55 -08:00
sound proc: remove PDE_DATA() completely 2022-01-22 08:33:37 +02:00
tools perf tools changes for v5.17: 2nd batch 2022-01-23 08:14:21 +02:00
usr usr/include/Makefile: add linux/nfc.h to the compile-test coverage 2022-01-22 21:48:45 +09:00
virt Generic: 2022-01-22 09:40:01 +02:00
.clang-format genirq/msi: Make interrupt allocation less convoluted 2021-12-16 22:22:20 +01:00
.cocciconfig
.get_maintainer.ignore Opt out of scripts/get_maintainer.pl 2019-05-16 10:53:40 -07:00
.gitattributes .gitattributes: use 'dts' diff driver for dts files 2019-12-04 19:44:11 -08:00
.gitignore .gitignore: ignore only top-level modules.builtin 2021-05-02 00:43:35 +09:00
.mailmap RISCV: 2022-01-16 16:15:14 +02:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS MAINTAINERS: Removing Ohad from remoteproc/rpmsg maintenance 2021-12-08 10:09:40 -07:00
Kbuild kbuild: rename hostprogs-y/always to hostprogs/always-y 2020-02-04 01:53:07 +09:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS bitmap patches for 5.17-rc1 2022-01-23 06:20:44 +02:00
Makefile Linux 5.17-rc1 2022-01-23 10:12:53 +02:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.