linux/drivers
Steffen Maier 6f2ce1c6af scsi: zfcp: fix rport unblock race with LUN recovery
It is unavoidable that zfcp_scsi_queuecommand() has to finish requests
with DID_IMM_RETRY (like fc_remote_port_chkready()) during the time
window when zfcp detected an unavailable rport but
fc_remote_port_delete(), which is asynchronous via
zfcp_scsi_schedule_rport_block(), has not yet blocked the rport.

However, for the case when the rport becomes available again, we should
prevent unblocking the rport too early.  In contrast to other FCP LLDDs,
zfcp has to open each LUN with the FCP channel hardware before it can
send I/O to a LUN.  So if a port already has LUNs attached and we
unblock the rport just after port recovery, recoveries of LUNs behind
this port can still be pending which in turn force
zfcp_scsi_queuecommand() to unnecessarily finish requests with
DID_IMM_RETRY.

This also opens a time window with unblocked rport (until the follow-up
LUN reopen recovery has finished).  If a scsi_cmnd timeout occurs during
this time window, fc_timed_out() cannot work as desired and such a command
would indeed time out and trigger scsi_eh. This prevents a clean and
timely path failover.  This should not happen if the path issue can be
recovered on FC transport layer such as path issues involving RSCNs.

Fix this by only calling zfcp_scsi_schedule_rport_register(), to
asynchronously trigger fc_remote_port_add(), after all LUN recoveries as
children of the rport have finished and no new recoveries of equal or
higher order were triggered meanwhile.  Finished intentionally includes
any recovery result no matter if successful or failed (still unblock
rport so other successful LUNs work).  For simplicity, we check after
each finished LUN recovery if there is another LUN recovery pending on
the same port and then do nothing.  We handle the special case of a
successful recovery of a port without LUN children the same way without
changing this case's semantics.
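
The "check after each finished LUN recovery" rule can be sketched as a
small userspace model. This is only an illustration under stated
assumptions, not the zfcp code: the sketch_* names, the flag value and
the array of LUNs are hypothetical stand-ins for the driver's port and
LUN status handling.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag value for this sketch, not the zfcp definition. */
#define SKETCH_ERP_INUSE 0x1u

struct sketch_lun {
	unsigned int status;	/* SKETCH_ERP_INUSE set while recovery is pending */
};

struct sketch_port {
	struct sketch_lun *luns;	/* LUN children of this port, may be NULL */
	size_t nr_luns;
	bool rport_register_scheduled;
};

/*
 * Called after a LUN (or port) recovery has finished, regardless of
 * whether it succeeded or failed.  Returns true if the rport
 * registration (unblock) was scheduled.
 */
bool sketch_try_rport_unblock(struct sketch_port *port)
{
	size_t i;

	for (i = 0; i < port->nr_luns; i++) {
		if (port->luns[i].status & SKETCH_ERP_INUSE)
			return false;	/* a sibling LUN recovery is still pending */
	}

	/* No LUN children, or all of them are done: schedule the unblock. */
	port->rport_register_scheduled = true;
	return true;
}
```

In this model the last LUN recovery to finish is the one that schedules
the unblock, and a port without LUN children takes the same path.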

For debugging we introduce 2 new trace records written if the rport
unblock attempt was aborted due to still unfinished or freshly triggered
recovery. The records are only written above the default trace level.

Benjamin noticed the important special case of new recovery that can be
triggered between having given up the erp_lock and before calling
zfcp_erp_action_cleanup() within zfcp_erp_strategy().  We must avoid the
following sequence:

ERP thread                 rport_work      other context
-------------------------  --------------  --------------------------------
port is unblocked, rport still blocked,
 due to pending/running ERP action,
 so ((port->status & ...UNBLOCK) != 0)
 and (port->rport == NULL)
unlock ERP
zfcp_erp_action_cleanup()
case ZFCP_ERP_ACTION_REOPEN_LUN:
zfcp_erp_try_rport_unblock()
((status & ...UNBLOCK) != 0) [OLD!]
                                           zfcp_erp_port_reopen()
                                           lock ERP
                                           zfcp_erp_port_block()
                                           port->status clear ...UNBLOCK
                                           unlock ERP
                                           zfcp_scsi_schedule_rport_block()
                                           port->rport_task = RPORT_DEL
                                           queue_work(rport_work)
                           zfcp_scsi_rport_work()
                           (port->rport_task != RPORT_ADD)
                           port->rport_task = RPORT_NONE
                           zfcp_scsi_rport_block()
                           if (!port->rport) return
zfcp_scsi_schedule_rport_register()
port->rport_task = RPORT_ADD
queue_work(rport_work)
                           zfcp_scsi_rport_work()
                           (port->rport_task == RPORT_ADD)
                           port->rport_task = RPORT_NONE
                           zfcp_scsi_rport_register()
                           (port->rport == NULL)
                           rport = fc_remote_port_add()
                           port->rport = rport;

Now the rport was erroneously unblocked while the zfcp_port is blocked.
This is another situation we want to avoid due to scsi_eh
potential. This state would at least remain until the new recovery from
the other context finished successfully, or potentially forever if it
failed.  In order to close this race, we take the erp_lock inside
zfcp_erp_try_rport_unblock() when checking the status of zfcp_port or
LUN.  With that, the possible corresponding rport state sequences would
be: (unblock[ERP thread],block[other context]) if the ERP thread gets
erp_lock first and still sees ((port->status & ...UNBLOCK) != 0), or
(block[other context],NOP[ERP thread]) if the ERP thread gets erp_lock
after the other context has already cleared ...UNBLOCK from port->status.
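
The two orderings can be replayed in a single-threaded model. The sketch
below assumes that each context's step is indivisible, which stands in
for serialization by erp_lock; it makes no claim about where zfcp
actually takes or drops the lock. All sketch_* names and the flag value
are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical flag value for this sketch, not the zfcp definition. */
#define SKETCH_UNBLOCKED 0x1u

enum sketch_rport_task {
	SKETCH_RPORT_NONE,
	SKETCH_RPORT_ADD,	/* rport unblock requested */
	SKETCH_RPORT_DEL,	/* rport block requested */
};

struct sketch_port {
	unsigned int status;
	enum sketch_rport_task rport_task;
};

/* ERP thread: check the port status and request the unblock, or do nothing. */
void sketch_erp_thread(struct sketch_port *port)
{
	if (port->status & SKETCH_UNBLOCKED)
		port->rport_task = SKETCH_RPORT_ADD;
	/* else: NOP, a new recovery has blocked the port meanwhile */
}

/* Other context: new port recovery blocks the port and requests a block. */
void sketch_other_context(struct sketch_port *port)
{
	port->status &= ~SKETCH_UNBLOCKED;
	port->rport_task = SKETCH_RPORT_DEL;
}

/* Replay one ordering and return the last requested rport task. */
enum sketch_rport_task sketch_run(bool erp_thread_first)
{
	struct sketch_port port = { SKETCH_UNBLOCKED, SKETCH_RPORT_NONE };

	if (erp_thread_first) {
		sketch_erp_thread(&port);	/* unblock */
		sketch_other_context(&port);	/* block */
	} else {
		sketch_other_context(&port);	/* block */
		sketch_erp_thread(&port);	/* NOP */
	}
	return port.rport_task;
}
```

In both orderings of this model the last request is the block, so the
rport does not end up unblocked while the port is blocked.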

Since checking fields of struct erp_action is unsafe because they could
have been overwritten (re-used for new recovery) meanwhile, we only
check status of zfcp_port and LUN since these are only changed under
erp_lock elsewhere. Regarding the check of the proper status flags (port
or port_forced are similar to the shown adapter recovery):

[zfcp_erp_adapter_shutdown()]
zfcp_erp_adapter_reopen()
 zfcp_erp_adapter_block()
  * clear UNBLOCK ---------------------------------------+
 zfcp_scsi_schedule_rports_block()                       |
 write_lock_irqsave(&adapter->erp_lock, flags);-------+  |
 zfcp_erp_action_enqueue()                            |  |
  zfcp_erp_setup_act()                                |  |
   * set ERP_INUSE -----------------------------------|--|--+
 write_unlock_irqrestore(&adapter->erp_lock, flags);--+  |  |
.context-switch.                                         |  |
zfcp_erp_thread()                                        |  |
 zfcp_erp_strategy()                                     |  |
  write_lock_irqsave(&adapter->erp_lock, flags);------+  |  |
  ...                                                 |  |  |
  zfcp_erp_strategy_check_target()                    |  |  |
   zfcp_erp_strategy_check_adapter()                  |  |  |
    zfcp_erp_adapter_unblock()                        |  |  |
     * set UNBLOCK -----------------------------------|--+  |
  zfcp_erp_action_dequeue()                           |     |
   * clear ERP_INUSE ---------------------------------|-----+
  ...                                                 |
  write_unlock_irqrestore(&adapter->erp_lock, flags);-+

Hence, we should check for both UNBLOCK and ERP_INUSE because they are
interleaved.  Also we need to explicitly check ERP_FAILED for the link
down case which currently does not clear the UNBLOCK flag in
zfcp_fsf_link_down_info_eval().
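
The resulting status check can be written as a small predicate. This is
a sketch under stated assumptions: the flag values and the function name
are hypothetical, and only the combination of the three conditions is
taken from the description above.

```c
#include <stdbool.h>

/* Hypothetical flag values for this sketch, not the zfcp definitions. */
#define SKETCH_UNBLOCKED  0x1u
#define SKETCH_ERP_INUSE  0x2u
#define SKETCH_ERP_FAILED 0x4u

/*
 * Returns true only if the status permits unblocking the rport:
 * UNBLOCK must be set, and neither ERP_INUSE (a new recovery is queued
 * or running) nor ERP_FAILED (e.g. link down, where UNBLOCK stays set)
 * may be set.
 */
bool sketch_port_may_unblock(unsigned int status)
{
	if (!(status & SKETCH_UNBLOCKED))
		return false;
	if (status & (SKETCH_ERP_INUSE | SKETCH_ERP_FAILED))
		return false;
	return true;
}
```

Checking UNBLOCK alone would miss the window between setting ERP_INUSE
and the next clear of UNBLOCK, which is why the flags are tested together.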

Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: 8830271c48 ("[SCSI] zfcp: Dont fail SCSI commands when transitioning to blocked fc_rport")
Fixes: a2fa0aede0 ("[SCSI] zfcp: Block FC transport rports early on errors")
Fixes: 5f852be9e1 ("[SCSI] zfcp: Fix deadlock between zfcp ERP and SCSI")
Fixes: 338151e066 ("[SCSI] zfcp: make use of fc_remote_port_delete when target port is unavailable")
Fixes: 3859f6a248 ("[PATCH] zfcp: add rports to enable scsi_add_device to work again")
Cc: <stable@vger.kernel.org> #2.6.32+
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-12-14 15:17:20 -05:00
accessibility
acpi ACPI material for v4.10-rc1 2016-12-13 11:06:21 -08:00
amba
android
ata Merge branch 'for-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata 2016-12-13 15:30:50 -08:00
atm Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-12-06 21:33:19 -05:00
auxdisplay auxdisplay: ht16k33: select framebuffer helper modules 2016-11-30 13:04:31 +01:00
base Driver core patches for 4.10-rc1 2016-12-13 11:42:18 -08:00
bcma bcma: add Dell Inspiron 3148 2016-11-29 17:35:14 +02:00
block SCSI misc on 20161213 2016-12-14 10:49:33 -08:00
bluetooth Bluetooth: btmrvl: drop duplicate header slab.h 2016-12-08 07:44:56 +01:00
bus
cdrom
char xen: features and fixes for 4.10 rc0 2016-12-13 16:07:55 -08:00
clk clk: bcm: Fix 'maybe-uninitialized' warning in bcm2835_clock_choose_div_and_prate() 2016-12-12 11:25:40 -08:00
clocksource
connector
cpufreq Power management material for v4.10-rc1 2016-12-13 10:41:53 -08:00
cpuidle cpuidle: Add a kerneldoc comment to cpuidle_use_deepest_state() 2016-12-06 02:25:03 +01:00
crypto Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-12-12 14:27:49 -08:00
dax device-dax: fix private mapping restriction, permit read-only 2016-12-06 17:42:37 -08:00
dca
devfreq devfreq: rk3399_dmc: Don't use OPP structures outside of RCU locks 2016-12-08 01:46:07 +01:00
dio
dma remoteproc updates for v4.10 2016-12-13 08:49:12 -08:00
dma-buf
edac EDAC, amd64: Fix improper return value 2016-12-04 10:51:42 +01:00
eisa
extcon
firewire
firmware arm64 updates for 4.10: 2016-12-13 16:39:21 -08:00
fmc
fpga fpga: Clarify how write_init works streaming modes 2016-11-29 15:51:49 -06:00
gpio Bulk GPIO changes for the v4.10 kernel cycle: 2016-12-13 07:54:57 -08:00
gpu Main pull request for drm for 4.10 kernel 2016-12-13 09:35:09 -08:00
hid HID: hid-sensor-hub: clear memory to avoid random data 2016-11-23 17:54:58 +01:00
hsi
hv uio-hv-generic: new userspace i/o driver for VMBus 2016-12-06 11:52:49 +01:00
hwmon hwmon: (g762) Fix overflows and crash seen when writing limit attributes 2016-12-12 11:33:44 -08:00
hwspinlock
hwtracing coresight: perf: Add a missing call to etm_free_aux 2016-11-29 20:05:32 +01:00
i2c Revert "i2c: octeon: thunderx: Limit register access retries" 2016-11-29 20:04:21 +01:00
ide
idle Power management material for v4.10-rc1 2016-12-13 10:41:53 -08:00
iio iio: magnetometer: separate the values of attributes based on their usage type for HID compass sensor 2016-11-24 20:41:30 +00:00
infiniband
input xen: features and fixes for 4.10 rc0 2016-12-13 16:07:55 -08:00
iommu Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-12-12 19:25:04 -08:00
ipack
irqchip arm64 updates for 4.10: 2016-12-13 16:39:21 -08:00
isdn Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-12-10 16:21:55 -05:00
leds leds: pca955x: Add ACPI support 2016-12-02 09:31:50 +01:00
lguest
lightnvm Char/Misc driver patches for 4.10-rc1 2016-12-13 12:11:01 -08:00
macintosh
mailbox
mcb
md . various fixes and improvements to request-based DM and DM multipath 2016-12-14 11:01:00 -08:00
media USB/PHY patches for 4.10-rc1 2016-12-13 11:10:36 -08:00
memory
memstick Merge branch 'for-4.10/block' of git://git.kernel.dk/linux-block 2016-12-13 10:19:16 -08:00
message SCSI misc on 20161213 2016-12-14 10:49:33 -08:00
mfd Staging/IIO patches for 4.10-rc1 2016-12-13 11:35:00 -08:00
misc Char/Misc driver patches for 4.10-rc1 2016-12-13 12:11:01 -08:00
mmc MMC core: 2016-12-14 10:55:56 -08:00
mtd
net scsi: cxgb4i: libcxgbi: cxgb4: add T6 iSCSI completion feature 2016-12-14 15:09:13 -05:00
nfc
ntb
nubus
nvdimm These are the documentation changes for 4.10. 2016-12-12 21:58:13 -08:00
nvme Just one simple change from Andrzej to drop the pointless return value 2016-12-14 10:31:25 -08:00
nvmem
of Char/Misc driver patches for 4.10-rc1 2016-12-13 12:11:01 -08:00
oprofile oprofile/nmi timer: Convert to hotplug state machine 2016-12-02 00:52:34 +01:00
parisc
parport
pci Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux 2016-12-13 16:33:33 -08:00
pcmcia drivers/pcmcia/m32r_pcc.c: check return from add_pcc_socket 2016-12-12 18:55:06 -08:00
perf
phy SCSI misc on 20161213 2016-12-14 10:49:33 -08:00
pinctrl Bulk pin control changes for the v4.10 kernel cycle: 2016-12-13 07:59:10 -08:00
platform Char/Misc driver patches for 4.10-rc1 2016-12-13 12:11:01 -08:00
pnp
power
powercap powercap / RAPL: Add Knights Mill CPUID 2016-11-30 23:41:33 +01:00
pps
ps3
ptp Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-12-12 19:56:15 -08:00
pwm pwm: Fix device reference leak 2016-11-29 16:43:24 +01:00
rapidio
ras
regulator Merge remote-tracking branches 'regulator/topic/tps65086' and 'regulator/topic/twl' into regulator-next 2016-12-12 12:17:31 +00:00
remoteproc remoteproc: qcom_adsp_pil: select qcom_scm 2016-12-09 16:16:56 -08:00
reset
rpmsg rpmsg updates for v4.10 2016-12-13 08:52:45 -08:00
rtc Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-12-12 19:56:15 -08:00
s390 scsi: zfcp: fix rport unblock race with LUN recovery 2016-12-14 15:17:20 -05:00
sbus
scsi scsi: libcxgbi: return error if interface is not up 2016-12-14 15:11:53 -05:00
sfi
sh lib: radix-tree: check accounting of existing slot replacement users 2016-12-12 18:55:08 -08:00
sn
soc This is a fairly quiet release. We don't have any patches to the core 2016-12-13 08:54:27 -08:00
spi Merge remote-tracking branches 'spi/topic/spidev', 'spi/topic/sunxi', 'spi/topic/ti-qspi', 'spi/topic/topcliff-pch' and 'spi/topic/xlp' into spi-next 2016-12-12 15:54:20 +00:00
spmi
ssb
staging Staging/IIO patches for 4.10-rc1 2016-12-13 11:35:00 -08:00
target SCSI misc on 20161213 2016-12-14 10:49:33 -08:00
tc
thermal Power management material for v4.10-rc1 2016-12-13 10:41:53 -08:00
thunderbolt Char/Misc driver patches for 4.10-rc1 2016-12-13 12:11:01 -08:00
tty Merge branch 'for-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq 2016-12-13 12:59:57 -08:00
uio uio-hv-generic: store physical addresses instead of virtual 2016-12-10 14:57:58 +01:00
usb Just one simple change from Andrzej to drop the pointless return value 2016-12-14 10:31:25 -08:00
uwb
vfio vfio iommu type1: Fix size argument to vfio_find_dma() in pin_pages/unpin_pages 2016-12-06 12:35:53 -07:00
vhost Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-12-12 10:48:02 -08:00
video xen: features and fixes for 4.10 rc0 2016-12-13 16:07:55 -08:00
virt
virtio
vlynq
vme
w1
watchdog Char/Misc driver patches for 4.10-rc1 2016-12-13 12:11:01 -08:00
xen xen: features and fixes for 4.10 rc0 2016-12-13 16:07:55 -08:00
zorro
Kconfig
Makefile