Go to file
Tao Chiu d4060d2be1 nvme-pci: fix controller reset hang when racing with nvme_timeout
reset_work() in nvme-pci may hang forever in the following scenario:
1) A reset caused by a command timeout occurs due to a controller being
   temporarily irresponsive.
2) nvme_reset_work() restarts admin queue at nvme_alloc_admin_tags(). At
   the same time, a user-submitted admin command is queued and waiting
   for completion. Then, reset_work() changes its state to CONNECTING,
   and submits an identify command.
3) However, the controller does still not respond to any command,
   causing a timeout being fired at the user-submitted command.
   Unfortunately, nvme_timeout() does not see the completion on cq, and
   any timeout that takes place under CONNECTING state causes a
   controller shutdown.
4) Normally, the identify command in reset_work() would be canceled with
   SC_HOST_ABORTED by nvme_dev_disable(), then reset_work can tear down
   the controller accordingly. But the controller happens to return
   online and respond the identify command before nvme_dev_disable()
   should have been reaped it off.
5) reset_work() continues to setup_io_queues() as it observes no error
   in init_identify(). However, the admin queue has already been
   quiesced in dev_disable(). Thus, any following commands would be
   blocked forever in blk_execute_rq().

This can be fixed by restricting usercmd commands when controller is not
in a LIVE state in nvme_queue_rq(), as what has been done previously in
fabrics.

```
nvme_reset_work():                     |
    nvme_alloc_admin_tags()            |
                                       | nvme_submit_user_cmd():
    nvme_init_identify():              |     ...
        __nvme_submit_sync_cmd():      |
            ...                        |     ...
---------------------------------------> nvme_timeout():
(Controller starts reponding commands) |     nvme_dev_disable(, true):
    nvme_setup_io_queues():            |
        __nvme_submit_sync_cmd():      |
            (hung in blk_execute_rq    |
             since run_hw_queue sees   |
             queue quiesced)           |

```

Signed-off-by: Tao Chiu <taochiu@synology.com>
Signed-off-by: Cody Wong <codywong@synology.com>
Reviewed-by: Leon Chien <leonchien@synology.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-05-04 09:35:50 +02:00
arch The x86 MM changes in this cycle were: 2021-04-29 11:41:43 -07:00
block bio: limit bio max size 2021-05-03 11:00:11 -06:00
certs certs: add 'x509_revocation_list' to gitignore 2021-04-26 10:48:07 -07:00
crypto for-5.13/drivers-2021-04-27 2021-04-28 14:39:37 -07:00
Documentation - removed get_fs/set_fs 2021-04-29 11:28:08 -07:00
drivers nvme-pci: fix controller reset hang when racing with nvme_timeout 2021-05-04 09:35:50 +02:00
fs \n 2021-04-29 11:06:13 -07:00
include bio: limit bio max size 2021-05-03 11:00:11 -06:00
init Merge branch 'for-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2021-04-27 18:47:42 -07:00
ipc
kernel The x86 MM changes in this cycle were: 2021-04-29 11:41:43 -07:00
lib - removed get_fs/set_fs 2021-04-29 11:28:08 -07:00
LICENSES
mm \n 2021-04-29 11:06:13 -07:00
net Scheduler updates for this cycle are: 2021-04-28 13:33:57 -07:00
samples
scripts Devicetree updates for v5.13: 2021-04-28 15:50:24 -07:00
security Devicetree updates for v5.13: 2021-04-28 15:50:24 -07:00
sound - Core Frameworks 2021-04-28 15:59:13 -07:00
tools Perf events changes in this cycle were: 2021-04-28 13:03:44 -07:00
usr
virt
.clang-format
.cocciconfig
.get_maintainer.ignore
.gitattributes
.gitignore
.mailmap It's been a relatively busy cycle in docsland, though more than usually 2021-04-26 13:22:43 -07:00
COPYING
CREDITS - Core Frameworks 2021-04-28 15:59:13 -07:00
Kbuild
Kconfig
MAINTAINERS Here's a collection of largely clk driver updates for the merge window. The 2021-04-28 17:13:56 -07:00
Makefile CFI on arm64 series for v5.13-rc1 2021-04-27 10:16:46 -07:00
README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.