Go to file
Michael J. Ruhl b4a4957d3d IB/hfi1: Fix destroy_qp hang after a link down
rvt_destroy_qp() cannot complete until all in process packets have
been released from the underlying hardware.  If a link down event
occurs, an application can hang with a kernel stack similar to:

cat /proc/<app PID>/stack
 quiesce_qp+0x178/0x250 [hfi1]
 rvt_reset_qp+0x23d/0x400 [rdmavt]
 rvt_destroy_qp+0x69/0x210 [rdmavt]
 ib_destroy_qp+0xba/0x1c0 [ib_core]
 nvme_rdma_destroy_queue_ib+0x46/0x80 [nvme_rdma]
 nvme_rdma_free_queue+0x3c/0xd0 [nvme_rdma]
 nvme_rdma_destroy_io_queues+0x88/0xd0 [nvme_rdma]
 nvme_rdma_error_recovery_work+0x52/0xf0 [nvme_rdma]
 process_one_work+0x17a/0x440
 worker_thread+0x126/0x3c0
 kthread+0xcf/0xe0
 ret_from_fork+0x58/0x90
 0xffffffffffffffff

quiesce_qp() waits until all outstanding packets have been freed.
This wait should be momentary.  During a link down event, the cleanup
handling does not ensure that all packets caught by the link down are
flushed properly.

This is caused by the fact that the freeze path and the link down
event is handled the same.  This is not correct.  The freeze path
waits until the HFI is unfrozen and then restarts PIO.  A link down
is not a freeze event.  The link down path cannot restart the PIO
until link is restored.  If the PIO path is restarted before the link
comes up, the application (QP) using the PIO path will hang (until
link is restored).

Fix by separating the linkdown path from the freeze path and use the
link down path for link down events.

Close a race condition sc_disable() by acquiring both the progress
and release locks.

Close a race condition in sc_stop() by moving the setting of the flag
bits under the alloc lock.

Cc: <stable@vger.kernel.org> # 4.9.x+
Fixes: 7724105686 ("IB/hfi1: add driver files")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-20 19:24:51 -06:00
arch ARM: SoC fixes 2018-09-02 10:44:28 -07:00
block block: bsg: move atomic_t ref_count variable to refcount API 2018-08-27 19:17:02 -06:00
certs export.h: remove VMLINUX_SYMBOL() and VMLINUX_SYMBOL_STR() 2018-08-22 23:21:44 +09:00
crypto DMAengine updates for v4.19-rc1 2018-08-18 15:55:59 -07:00
Documentation linux-watchdog 4.19-rc2 tag 2018-09-01 13:17:15 -07:00
drivers IB/hfi1: Fix destroy_qp hang after a link down 2018-09-20 19:24:51 -06:00
firmware kbuild: remove all dummy assignments to obj- 2017-11-18 11:46:06 +09:00
fs Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-09-02 09:41:45 -07:00
include Devicetree updates for 4.19-rc2: 2018-09-02 10:56:01 -07:00
init Kbuild updates for v4.19 (2nd) 2018-08-25 13:40:38 -07:00
ipc ipc/util.c: update return value of ipc_getref from int to bool 2018-08-22 10:52:52 -07:00
kernel Merge branch 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-09-02 10:09:35 -07:00
lib Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-09-02 09:41:45 -07:00
LICENSES LICENSES: Add Linux-OpenIB license text 2018-04-27 16:41:53 -06:00
mm notifier: Remove notifier header file wherever not used 2018-08-30 12:56:40 +02:00
net Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-09-02 09:41:45 -07:00
samples samples/bpf: all XDP samples should unload xdp/bpf prog on SIGTERM 2018-08-16 21:55:32 +02:00
scripts Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-09-02 10:11:30 -07:00
security + Cleanups 2018-08-24 13:00:33 -07:00
sound Merge branch 'akpm' (patches from Andrew) 2018-08-23 19:20:12 -07:00
tools Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-08-27 11:59:39 -07:00
usr initramfs: move gen_initramfs_list.sh from scripts/ to usr/ 2018-08-22 23:21:44 +09:00
virt ARM: Support for Group0 interrupts in guests, Cache management 2018-08-22 13:52:44 -07:00
.clang-format clang-format: Set IndentWrappedFunctionNames false 2018-08-01 18:38:51 +02:00
.cocciconfig
.get_maintainer.ignore
.gitattributes .gitattributes: set git diff driver for C source code files 2016-10-07 18:46:30 -07:00
.gitignore Kbuild updates for v4.17 (2nd) 2018-04-15 17:21:30 -07:00
.mailmap libnvdimm-for-4.19_misc 2018-08-25 18:13:10 -07:00
COPYING COPYING: use the new text with points to the license files 2018-03-23 12:41:45 -06:00
CREDITS 9p: remove Ron Minnich from MAINTAINERS 2018-08-17 16:20:26 -07:00
Kbuild Kbuild updates for v4.15 2017-11-17 17:45:29 -08:00
Kconfig kconfig: move the "Executable file formats" menu to fs/Kconfig.binfmt 2018-08-02 08:06:55 +09:00
MAINTAINERS Fix up libata MAINTAINERS entry 2018-08-27 08:07:25 -07:00
Makefile Linux 4.19-rc2 2018-09-02 14:37:30 -07:00
README Docs: Added a pointer to the formatted docs to README 2018-03-21 09:02:53 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.
See Documentation/00-INDEX for a list of what is contained in each file.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.