939 Commits

Author SHA1 Message Date
Ming Lei
1b95e81791 nvme: fix possible hang when removing a controller during error recovery
Error recovery can be interrupted by controller removal, then the
controller is left as quiesced, and IO hang can be caused.

Fix the issue by unquiescing controller unconditionally when removing
namespaces.

This way is reasonable and safe given forward progress can be made
when removing namespaces.

Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reported-by: Chunguang Xu <brookxu.cn@gmail.com>
Closes: https://lore.kernel.org/linux-nvme/cover.1685350577.git.chunguang.xu@shopee.com/
Cc: stable@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-07-21 00:53:26 -07:00
Christoph Hellwig
ac522fc6c3 nvme: don't reject probe due to duplicate IDs for single-ported PCIe devices
While duplicate IDs are still very harmful, including the potential to easily
see changing devices in /dev/disk/by-id, it turn out they are extremely
common for cheap end user NVMe devices.

Relax our check for them for so that it doesn't reject the probe on
single-ported PCIe devices, but prints a big warning instead.  In doubt
we'd still like to see quirk entries to disable the potential for
changing supposed stable device identifier links, but this will at least
allow users how have two (or more) of these devices to use them without
having to manually add a new PCI ID entry with the quirk through sysfs or
by patching the kernel.

Fixes: 2079f41ec6ff ("nvme: check that EUI/GUID/UUID are globally unique")
Cc: stable@vger.kernel.org # 6.0+
Co-developed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-07-13 08:12:50 -07:00
Linus Torvalds
e50df24979 block-6.5-2023-07-03
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmSjJ2IQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpsQMEACQiUBw81tXvetYhz3P/4KrrjvUobgqMU0w
 jtrxqMgPee9FbqCShpj76c+La5wu23DnlrCXoHZxFQuiQLnsX5xFV66NYVi+W1CN
 k5MHP7f2e9V0T7qJ9UoHFRV1k22LF4X6T8njEZimxsm/uXfpav/knkhI7nUDnB1K
 wxlu9akD2Bo/X9O2NTS+X6qjoawZ6rDWN15THMXlC45VzJPLmIcs07Ev+mvw21KE
 XqasoZrxEO0S8dWxmJgJGqnRIOQptTS5U+0OPBZT8H220Qp/1q0pQHPw6iLXNrkc
 w1a2W1Bge012gjJt7gCMkdDnZb76sKiyGuMbFME7DoRbLCQeaOtoSfmg7NoRI2gp
 74TCSr7dPWZUVUy5Tmsy0DCv0552vIbnlQ69W6Xwx8YkplM3FPiMpWrQ5JWEHdvv
 Zl84mLP6Yyo54JVuk9zi8q/2L0HfyfMDj4UM/mNs8hwmcUSbPO2TKdIWDaq8xPuS
 Ed+D+kg6XFux8tLnCSDLNbaD5JE+ak9gTVhNdRa/zFE04o/OeidscKEqRSYTkdXL
 2p34qtw5kEQocO4Pa3eUGO6KJCDTR36Rms5p6ZFybL4O2oZYrAbRi1TGDxaG2Hag
 GCr2vaFbmz1zbGuMpFhLha5B7HeDLs+PHOn+B1iUNjEr9RC0EOHV7moJKqjxlnCh
 4mBkK/Nlyg==
 =kSeX
 -----END PGP SIGNATURE-----

Merge tag 'block-6.5-2023-07-03' of git://git.kernel.dk/linux

Pull more block updates from Jens Axboe:
 "Mostly items that came in a bit late for the initial pull request,
  wanted to make sure they had the appropriate amount of linux-next soak
  before going upstream.

  Outside of stragglers, just generic fixes for either merge window
  items, or longer standing bugs"

* tag 'block-6.5-2023-07-03' of git://git.kernel.dk/linux: (25 commits)
  md/raid0: add discard support for the 'original' layout
  nvme: disable controller on reset state failure
  nvme: sync timeout work on failed reset
  nvme: ensure unquiesce on teardown
  cdrom/gdrom: Fix build error
  nvme: improved uring polling
  block: add request polling helper
  nvme-mpath: fix I/O failure with EAGAIN when failing over I/O
  nvme: host: fix command name spelling
  blk-sysfs: add a new attr_group for blk_mq
  blk-iocost: move wbt_enable/disable_default() out of spinlock
  blk-wbt: cleanup rwb_enabled() and wbt_disabled()
  blk-wbt: remove dead code to handle wbt enable/disable with io inflight
  blk-wbt: don't create wbt sysfs entry if CONFIG_BLK_WBT is disabled
  blk-mq: fix two misuses on RQF_USE_SCHED
  blk-throttle: Fix io statistics for cgroup v1
  bcache: Fix bcache device claiming
  bcache: Alloc holder object before async registration
  raid10: avoid spin_lock from fastpath from raid10_unplug()
  md: fix 'delete_mutex' deadlock
  ...
2023-07-03 18:48:38 -07:00
Jens Axboe
6e34e784e7 nvme fixes for Linux 6.5
- Reduce spamming kernel logs on repeated controller updates (Breno)
  - Improved struct packing (Christophe JAILLET)
  - Misspelled command name in error logging (Damien)
  - Failover fix for temporary frozen queue (Sagi)
  - Reset error handling fixes (Keith)
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE3Fbyvv+648XNRdHTPe3zGtjzRgkFAmSfHiQACgkQPe3zGtjz
 RglOEg/+IBogV9vdGGaExlt6jKeVCLeq/dvBjTtq9adHFYnyF6s9B39GDW9QuhPv
 wGvE8FsWOVcLBVIvwS9ZiPMnjwyOD5DGQ7kqBVRiMT65JOA7sMHz22H432CjcOTM
 54m5ezdeAwdxMJrbElf6cpCQXxdpu6rlQrcpKPw5F6Gl5PWV1sG4U8c5YGsUQ+Jj
 KTjhQIIcvwWRNWfru/6CcZ3b6YxixtuGbCXnls2WhVPN1e8DIgVvQfy7eMLKYT4w
 3OrFe64nTEPSX1aT4FlV+AZN2icB8GJlhHUjIH4kjdquR49ZpItvVO38DdilJ8SU
 2cFfN1rLE9o4Kwozlxi0VgUe/TUwGL3QLlrQRvQMEdG80Y/pe4vwD0n8N/6btZLo
 EJcK4spSdsAmdAx+RTzizl+IhgV2cBWmwLfPv9qZmBODPIEHwVLXky5iVkFqYifk
 jm6PIUYur5RLZsYlW6SiZY1NBc62cigRbtC5Cu7XuxFHXZFC2MW7KWpyAHYSQxJJ
 k8MQr6vvlBLypzAZzZCm9zC3FlMvaFNM7Q4cq52s9vaL4lxQ5zgQ5SjGZnXp2Ow/
 K4PU7mo0BHSoKwfy42uYnWm42hinFSVKROEZo/E7+RoUIjl8S6uAeUyBnWYYGqsv
 g0VtZCUbSAlM8aP3ffi8UZCm84Po3zdB0s62eKNC5FG/CFtveI4=
 =ZzGx
 -----END PGP SIGNATURE-----

Merge tag 'nvme-6.5-2023-06-30' of git://git.infradead.org/nvme into block-6.5

Pull NVMe fixes from Keith:

"nvme fixes for Linux 6.5

 - Reduce spamming kernel logs on repeated controller updates (Breno)
 - Improved struct packing (Christophe JAILLET)
 - Misspelled command name in error logging (Damien)
 - Failover fix for temporary frozen queue (Sagi)
 - Reset error handling fixes (Keith)"

* tag 'nvme-6.5-2023-06-30' of git://git.infradead.org/nvme:
  nvme: disable controller on reset state failure
  nvme: sync timeout work on failed reset
  nvme: ensure unquiesce on teardown
  nvme-mpath: fix I/O failure with EAGAIN when failing over I/O
  nvme: host: fix command name spelling
  nvmet: Reorder fields in 'struct nvmet_ns'
  nvme: Print capabilities changes just once
2023-06-30 14:04:08 -06:00
Linus Torvalds
ca7ce08d6a SCSI misc on 20230629
Updates to the usual drivers (ufs, pm80xx, libata-scsi, smartpqi,
 lpfc, qla2xxx).  We have a couple of major core changes impacting
 other systems: Command Duration Limits, which spills into block and
 ATA and block level Persistent Reservation Operations, which touches
 block, nvme, target and dm (both of which are added with merge commits
 containing a cover letter explaining what's going on).
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCZJ19cSYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishfZpAQCQBuWR
 ELcOhsaG5KzO6xLWcH8mjsOoxffKvazZjTKXlAD5ATEv7++E250oKS3t+yfjae5I
 Lc195MlDju85ItUQgfk=
 =U9ik
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI updates from James Bottomley:
 "Updates to the usual drivers (ufs, pm80xx, libata-scsi, smartpqi,
  lpfc, qla2xxx).

  We have a couple of major core changes impacting other systems:

   - Command Duration Limits, which spills into block and ATA

   - block level Persistent Reservation Operations, which touches block,
     nvme, target and dm

  Both of these are added with merge commits containing a cover letter
  explaining what's going on"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (187 commits)
  scsi: core: Improve warning message in scsi_device_block()
  scsi: core: Replace scsi_target_block() with scsi_block_targets()
  scsi: core: Don't wait for quiesce in scsi_device_block()
  scsi: core: Don't wait for quiesce in scsi_stop_queue()
  scsi: core: Merge scsi_internal_device_block() and device_block()
  scsi: sg: Increase number of devices
  scsi: bsg: Increase number of devices
  scsi: qla2xxx: Remove unused nvme_ls_waitq wait queue
  scsi: ufs: ufs-pci: Add support for Intel Arrow Lake
  scsi: sd: sd_zbc: Use PAGE_SECTORS_SHIFT
  scsi: ufs: wb: Add explicit flush_threshold sysfs attribute
  scsi: ufs: ufs-qcom: Switch to the new ICE API
  scsi: ufs: dt-bindings: qcom: Add ICE phandle
  scsi: ufs: ufs-mediatek: Set UFSHCD_QUIRK_MCQ_BROKEN_RTC quirk
  scsi: ufs: ufs-mediatek: Set UFSHCD_QUIRK_MCQ_BROKEN_INTR quirk
  scsi: ufs: core: Add host quirk UFSHCD_QUIRK_MCQ_BROKEN_RTC
  scsi: ufs: core: Add host quirk UFSHCD_QUIRK_MCQ_BROKEN_INTR
  scsi: ufs: core: Remove dedicated hwq for dev command
  scsi: ufs: core: mcq: Fix the incorrect OCS value for the device command
  scsi: ufs: dt-bindings: samsung,exynos: Drop unneeded quotes
  ...
2023-06-30 11:57:07 -07:00
Linus Torvalds
a0433f8cae for-6.5/block-2023-06-23
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmSV8dwQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpilGD/9Yys1oxIXJpRf00fzrylAlBthRxMjFQVWw
 zAut106hAQiBHvU8IkmGA3MvEFVHxtzwYhHI7IR8K3aZBIqscweCqmVI9JyogJw9
 U9Twnzel47VmuKdM94FeoN+hbj1fP8EWTjzmy67/zEEfFCdmHvNlMi3lSrGYIpFy
 39LxTB99Y4UarM5PtWbes37GYYljzMSWKuo4AfBkvq1eQa+sZ0Vq2xAABKq3UM7f
 apqhgHtkJooRePDP0eQp+kAyyVMgW2jIK+oIdJDxNF3CKTu2w40RzaYz6fp+jVSU
 H4R/xS59GW4/xql+VBJDh/qJg9K62DPPYjlW8BmSR8+IjvfFpsyH3/MacE50CD3P
 20fs/Mnj49H79fDrQEHJI53cOOb2EmUitbwLbvOcColNTPpt8loBtdQxjF2RMU8R
 Nyort9DJPFclYCxky1LYg1CNEC2Ln4Zy/jD47wPvqRmOQphOoVlV/hPnOEqvjaZC
 49Vn70W2DeE9cXvYI7ha+XIg6/oj+Gs3iusEbV08Ci7EAtXgI+ZUUsQ97K8UNiUh
 h2lqSJtuI7lBpYP9sf+BeCch5UCC+xGYyTdoM5f58lehWBBPtbs0g7S9RyRyOYxe
 n+yxEUo3dAGzJ/xsKAjinbZfeWIpr0b1TkAh4w3Cq/BKzRr9Bp8lBAxYuancbQ+Y
 1ADPteUOTA==
 =zP4Y
 -----END PGP SIGNATURE-----

Merge tag 'for-6.5/block-2023-06-23' of git://git.kernel.dk/linux

Pull block updates from Jens Axboe:

 - NVMe pull request via Keith:
      - Various cleanups all around (Irvin, Chaitanya, Christophe)
      - Better struct packing (Christophe JAILLET)
      - Reduce controller error logs for optional commands (Keith)
      - Support for >=64KiB block sizes (Daniel Gomez)
      - Fabrics fixes and code organization (Max, Chaitanya, Daniel
        Wagner)

 - bcache updates via Coly:
      - Fix a race at init time (Mingzhe Zou)
      - Misc fixes and cleanups (Andrea, Thomas, Zheng, Ye)

 - use page pinning in the block layer for dio (David)

 - convert old block dio code to page pinning (David, Christoph)

 - cleanups for pktcdvd (Andy)

 - cleanups for rnbd (Guoqing)

 - use the unchecked __bio_add_page() for the initial single page
   additions (Johannes)

 - fix overflows in the Amiga partition handling code (Michael)

 - improve mq-deadline zoned device support (Bart)

 - keep passthrough requests out of the IO schedulers (Christoph, Ming)

 - improve support for flush requests, making them less special to deal
   with (Christoph)

 - add bdev holder ops and shutdown methods (Christoph)

 - fix the name_to_dev_t() situation and use cases (Christoph)

 - decouple the block open flags from fmode_t (Christoph)

 - ublk updates and cleanups, including adding user copy support (Ming)

 - BFQ sanity checking (Bart)

 - convert brd from radix to xarray (Pankaj)

 - constify various structures (Thomas, Ivan)

 - more fine grained persistent reservation ioctl capability checks
   (Jingbo)

 - misc fixes and cleanups (Arnd, Azeem, Demi, Ed, Hengqi, Hou, Jan,
   Jordy, Li, Min, Yu, Zhong, Waiman)

* tag 'for-6.5/block-2023-06-23' of git://git.kernel.dk/linux: (266 commits)
  scsi/sg: don't grab scsi host module reference
  ext4: Fix warning in blkdev_put()
  block: don't return -EINVAL for not found names in devt_from_devname
  cdrom: Fix spectre-v1 gadget
  block: Improve kernel-doc headers
  blk-mq: don't insert passthrough request into sw queue
  bsg: make bsg_class a static const structure
  ublk: make ublk_chr_class a static const structure
  aoe: make aoe_class a static const structure
  block/rnbd: make all 'class' structures const
  block: fix the exclusive open mask in disk_scan_partitions
  block: add overflow checks for Amiga partition support
  block: change all __u32 annotations to __be32 in affs_hardblocks.h
  block: fix signed int overflow in Amiga partition support
  block: add capacity validation in bdev_add_partition()
  block: fine-granular CAP_SYS_ADMIN for Persistent Reservation
  block: disallow Persistent Reservation on partitions
  reiserfs: fix blkdev_put() warning from release_journal_dev()
  block: fix wrong mode for blkdev_get_by_dev() from disk_scan_partitions()
  block: document the holder argument to blkdev_get_by_path
  ...
2023-06-26 12:47:20 -07:00
Breno Leitao
d0dd594bed nvme: Print capabilities changes just once
This current dev_info() could be very verbose and being printed very
frequently depending on some userspace application sending some specific
commands.

Just print this message once and skip it until the controller resets.
Use a controller flag (NVME_CTRL_DIRTY_CAPABILITY) to track if the
capability needs a reset.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-06-21 09:40:05 -07:00
Keith Busch
c917dd96fe nvme: skip optional id ctrl csi if it failed
A frequently recieved report is the driver requests the optional Command
Set Specific Identify Controller structure. Some controllers report this
in their error log, which tiggers other warnings to user space
monitoring the devices.

These error entries are harmless and of questionable value to save in
the log, but let's reduce their occurance by not resending the command
if it previously failed. This will not prevent the errors on the initial
module load, but will greatly reduce their occurance on any rescans and
resumes from suspend.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=217445
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-06-12 18:24:15 -07:00
Irvin Cote
35e797b024 nvme-core: use nvme_ns_head_multipath instead of ns->head->disk
Change the way we check for a multipath nshead so as
to consistently use the same check to assert the same condition.

Signed-off-by: Irvin Cote <irvincoteg@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-06-12 10:44:57 -07:00
Daniel Gomez
bdbfcd5f6c nvme: Increase block size variable size to 32-bit
Increase block size variable size to 32-bit unsigned to be able to
support block devices larger than 32k (starting from 64 KiB).

Physical and logical block size already support unsigned 32-bit.

Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-06-12 10:36:59 -07:00
Irvin Cote
2110a6bcd7 nvme-core: remove redundant check from nvme_init_ns_head
nvme_find_ns_head already checks that the list of namescpaces
in an already existing namespace head is not empty

Signed-off-by: Irvin Cote <irvincoteg@gmail.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-06-12 10:36:59 -07:00
Max Gurtovoy
942e21c042 nvme: move sysfs code to a dedicated sysfs.c file
The core.c file became long and hard to maintain. Create a dedicated
file to centralize the sysfs functionality. This is a common practice to
separate sysfs/configfs related logic from the main driver logic .c file.
For example, in the nvmet module the configfs interface has its own
dedicated file.

This patch does not include any functional changes.

Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
[merged dhchap memleak fixes, include nvme-auth.h]
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-06-12 10:36:59 -07:00
Chaitanya Kulkarni
7ed5cf8e6d nvme-core: fix dev_pm_qos memleak
Call dev_pm_qos_hide_latency_tolerance() in the error unwind patch to
avoid following kmemleak:-

blktests (master) # kmemleak-clear; ./check nvme/044;
blktests (master) # kmemleak-scan ; kmemleak-show
nvme/044 (Test bi-directional authentication)                [passed]
    runtime  2.111s  ...  2.124s
unreferenced object 0xffff888110c46240 (size 96):
  comm "nvme", pid 33461, jiffies 4345365353 (age 75.586s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<0000000069ac2cec>] kmalloc_trace+0x25/0x90
    [<000000006acc66d5>] dev_pm_qos_update_user_latency_tolerance+0x6f/0x100
    [<00000000cc376ea7>] nvme_init_ctrl+0x38e/0x410 [nvme_core]
    [<000000007df61b4b>] 0xffffffffc05e88b3
    [<00000000d152b985>] 0xffffffffc05744cb
    [<00000000f04a4041>] vfs_write+0xc5/0x3c0
    [<00000000f9491baf>] ksys_write+0x5f/0xe0
    [<000000001c46513d>] do_syscall_64+0x3b/0x90
    [<00000000ecf348fe>] entry_SYSCALL_64_after_hwframe+0x72/0xdc

Link: https://lore.kernel.org/linux-nvme/CAHj4cs-nDaKzMx2txO4dbE+Mz9ePwLtU0e3egz+StmzOUgWUrA@mail.gmail.com/
Fixes: f50fff73d620 ("nvme: implement In-Band authentication")
Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-06-12 10:33:02 -07:00
Chaitanya Kulkarni
3a12a0b868 nvme-core: add missing fault-injection cleanup
Add missing fault-injection cleanup in nvme_init_ctrl() in the error
unwind path that also fixes following message for blktests:-

linux-block (for-next) # grep debugfs debugfs-err.log
[  147.853464] debugfs: Directory 'nvme1' with parent '/' already present!
[  147.853973] nvme1: failed to create debugfs attr
[  148.802490] debugfs: Directory 'nvme1' with parent '/' already present!
[  148.803244] nvme1: failed to create debugfs attr
[  148.877304] debugfs: Directory 'nvme1' with parent '/' already present!
[  148.877775] nvme1: failed to create debugfs attr
[  149.816652] debugfs: Directory 'nvme1' with parent '/' already present!
[  149.818011] nvme1: failed to create debugfs attr

Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-06-12 10:33:02 -07:00
Chaitanya Kulkarni
99c2dcc8ff nvme-core: fix memory leak in dhchap_ctrl_secret
Free dhchap_secret in nvme_ctrl_dhchap_ctrl_secret_store() before we
return when nvme_auth_generate_key() returns error.

Fixes: f50fff73d620 ("nvme: implement In-Band authentication")
Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-06-12 10:33:02 -07:00
Chaitanya Kulkarni
a836ca33c5 nvme-core: fix memory leak in dhchap_secret_store
Free dhchap_secret in nvme_ctrl_dhchap_secret_store() before we return
fix following kmemleack:-

unreferenced object 0xffff8886376ea800 (size 64):
  comm "check", pid 22048, jiffies 4344316705 (age 92.199s)
  hex dump (first 32 bytes):
    44 48 48 43 2d 31 3a 30 30 3a 6e 78 72 35 4b 67  DHHC-1:00:nxr5Kg
    75 58 34 75 6f 41 78 73 4a 61 34 63 2f 68 75 4c  uX4uoAxsJa4c/huL
  backtrace:
    [<0000000030ce5d4b>] __kmalloc+0x4b/0x130
    [<000000009be1cdc1>] nvme_ctrl_dhchap_secret_store+0x8f/0x160 [nvme_core]
    [<00000000ac06c96a>] kernfs_fop_write_iter+0x12b/0x1c0
    [<00000000437e7ced>] vfs_write+0x2ba/0x3c0
    [<00000000f9491baf>] ksys_write+0x5f/0xe0
    [<000000001c46513d>] do_syscall_64+0x3b/0x90
    [<00000000ecf348fe>] entry_SYSCALL_64_after_hwframe+0x72/0xdc
unreferenced object 0xffff8886376eaf00 (size 64):
  comm "check", pid 22048, jiffies 4344316736 (age 92.168s)
  hex dump (first 32 bytes):
    44 48 48 43 2d 31 3a 30 30 3a 6e 78 72 35 4b 67  DHHC-1:00:nxr5Kg
    75 58 34 75 6f 41 78 73 4a 61 34 63 2f 68 75 4c  uX4uoAxsJa4c/huL
  backtrace:
    [<0000000030ce5d4b>] __kmalloc+0x4b/0x130
    [<000000009be1cdc1>] nvme_ctrl_dhchap_secret_store+0x8f/0x160 [nvme_core]
    [<00000000ac06c96a>] kernfs_fop_write_iter+0x12b/0x1c0
    [<00000000437e7ced>] vfs_write+0x2ba/0x3c0
    [<00000000f9491baf>] ksys_write+0x5f/0xe0
    [<000000001c46513d>] do_syscall_64+0x3b/0x90
    [<00000000ecf348fe>] entry_SYSCALL_64_after_hwframe+0x72/0xdc

Fixes: f50fff73d620 ("nvme: implement In-Band authentication")
Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-06-12 10:33:02 -07:00
Christoph Hellwig
05bdb99653 block: replace fmode_t with a block-specific type for block open flags
The only overlap between the block open flags mapped into the fmode_t and
other uses of fmode_t are FMODE_READ and FMODE_WRITE.  Define a new
blk_mode_t instead for use in blkdev_get_by_{dev,path}, ->open and
->ioctl and stop abusing fmode_t.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jack Wang <jinpu.wang@ionos.com>		[rnbd]
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/r/20230608110258.189493-28-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-12 08:04:05 -06:00
Christoph Hellwig
ae220766d8 block: remove the unused mode argument to ->release
The mode argument to the ->release block_device_operation is never used,
so remove it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Christian Brauner <brauner@kernel.org>
Acked-by: Jack Wang <jinpu.wang@ionos.com>			[rnbd]
Link: https://lore.kernel.org/r/20230608110258.189493-10-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-12 08:04:04 -06:00
Christoph Hellwig
d32e2bf837 block: pass a gendisk to ->open
->open is only called on the whole device.  Make that explicit by
passing a gendisk instead of the block_device.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Christian Brauner <brauner@kernel.org>
Acked-by: Jack Wang <jinpu.wang@ionos.com>		[rnbd]
Link: https://lore.kernel.org/r/20230608110258.189493-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-12 08:04:04 -06:00
Uday Shankar
c7275ce6a5 nvme: improve handling of long keep alives
Upon keep alive completion, nvme_keep_alive_work is scheduled with the
same delay every time. If keep alive commands are completing slowly,
this may cause a keep alive timeout. The following trace illustrates the
issue, taking KATO = 8 and TBKAS off for simplicity:

1. t = 0: run nvme_keep_alive_work, send keep alive
2. t = ε: keep alive reaches controller, controller restarts its keep
          alive timer
3. t = 4: host receives keep alive completion, schedules
          nvme_keep_alive_work with delay 4
4. t = 8: run nvme_keep_alive_work, send keep alive

Here, a keep alive having RTT of 4 causes a delay of at least 8 - ε
between the controller receiving successive keep alives. With ε small,
the controller is likely to detect a keep alive timeout.

Fix this by calculating the RTT of the keep alive command, and adjusting
the scheduling delay of the next keep alive work accordingly.

Reported-by: Costa Sapuntzakis <costa@purestorage.com>
Reported-by: Randy Jennings <randyj@purestorage.com>
Signed-off-by: Uday Shankar <ushankar@purestorage.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-05-30 20:57:42 -07:00
Uday Shankar
774a963651 nvme: check IO start time when deciding to defer KA
When a command completes, we set a flag which will skip sending a
keep alive at the next run of nvme_keep_alive_work when TBKAS is on.
However, if the command was submitted long ago, it's possible that
the controller may have also restarted its keep alive timer (as a
result of receiving the command) long ago. The following trace
demonstrates the issue, assuming TBKAS is on and KATO = 8 for
simplicity:

1. t = 0: submit I/O commands A, B, C, D, E
2. t = 0.5: commands A, B, C, D, E reach controller, restart its keep
            alive timer
3. t = 1: A completes
4. t = 2: run nvme_keep_alive_work, see recent completion, do nothing
5. t = 3: B completes
6. t = 4: run nvme_keep_alive_work, see recent completion, do nothing
7. t = 5: C completes
8. t = 6: run nvme_keep_alive_work, see recent completion, do nothing
9. t = 7: D completes
10. t = 8: run nvme_keep_alive_work, see recent completion, do nothing
11. t = 9: E completes

At this point, 8.5 seconds have passed without restarting the
controller's keep alive timer, so the controller will detect a keep
alive timeout.

Fix this by checking the IO start time when deciding to defer sending a
keep alive command. Only set comp_seen if the command started after the
most recent run of nvme_keep_alive_work. With this change, the
completions of B, C, and D will not set comp_seen and the run of
nvme_keep_alive_work at t = 4 will send a keep alive.

Reported-by: Costa Sapuntzakis <costa@purestorage.com>
Reported-by: Randy Jennings <randyj@purestorage.com>
Signed-off-by: Uday Shankar <ushankar@purestorage.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-05-30 09:20:47 -07:00
Uday Shankar
ea4d453b9e nvme: double KA polling frequency to avoid KATO with TBKAS on
With TBKAS on, the completion of one command can defer sending a
keep alive for up to twice the delay between successive runs of
nvme_keep_alive_work. The current delay of KATO / 2 thus makes it
possible for one command to defer sending a keep alive for up to
KATO, which can result in the controller detecting a KATO. The following
trace demonstrates the issue, taking KATO = 8 for simplicity:

1. t = 0: run nvme_keep_alive_work, no keep-alive sent
2. t = ε: I/O completion seen, set comp_seen = true
3. t = 4: run nvme_keep_alive_work, see comp_seen == true,
          skip sending keep-alive, set comp_seen = false
4. t = 8: run nvme_keep_alive_work, see comp_seen == false,
          send a keep-alive command.

Here, there is a delay of 8 - ε between receiving a command completion
and sending the next command. With ε small, the controller is likely to
detect a keep alive timeout.

Fix this by running nvme_keep_alive_work with a delay of KATO / 4
whenever TBKAS is on. Going through the above trace now gives us a
worst-case delay of 4 - ε, which is in line with the recommendation of
sending a command every KATO / 2 in the NVMe specification.

Reported-by: Costa Sapuntzakis <costa@purestorage.com>
Reported-by: Randy Jennings <randyj@purestorage.com>
Signed-off-by: Uday Shankar <ushankar@purestorage.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-05-30 09:20:44 -07:00
min15.li
31a5978243 nvme: fix miss command type check
In the function nvme_passthru_end(), only the value of the command
opcode is checked, without checking the command type (IO command or
Admin command). When we send a Dataset Management command (The opcode
of the Dataset Management command is the same as the Set Feature
command), kernel thinks it is a set feature command, then sets the
controller's keep alive interval, and calls nvme_keep_alive_work().

Signed-off-by: min15.li <min15.li@samsung.com>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-05-30 08:50:24 -07:00
Martin K. Petersen
7907ad748b Merge patch series "Use block pr_ops in LIO"
Mike Christie <michael.christie@oracle.com> says:

The patches in this thread allow us to use the block pr_ops with LIO's
target_core_iblock module to support cluster applications in VMs. They
were built over Linus's tree. They also apply over linux-next and
Martin's tree and Jens's trees.

Currently, to use windows clustering or linux clustering (pacemaker +
cluster labs scsi fence agents) in VMs with LIO and vhost-scsi, you
have to use tcmu or pscsi or use a cluster aware FS/framework for the
LIO pr file. Setting up a cluster FS/framework is pain and waste when
your real backend device is already a distributed device, and pscsi
and tcmu are nice for specific use cases, but iblock gives you the
best performance and allows you to use stacked devices like
dm-multipath. So these patches allow iblock to work like pscsi/tcmu
where they can pass a PR command to the backend module. And then
iblock will use the pr_ops to pass the PR command to the real devices
similar to what we do for unmap today.

The patches are separated in the following groups:

Patch 1 - 2:

 - Add block layer callouts for reading reservations and rename reservation
   error code.

Patch 3 - 5:

 - SCSI support for new callouts.

Patch 6:

 - DM support for new callouts.

Patch 7 - 13:

 - NVMe support for new callouts.

Patch 14 - 18:

 - LIO support for new callouts.

This patchset has been tested with the libiscsi PGR ops and with
window's failover cluster verification test. Note that for scsi
backend devices we need this patchset:

https://lore.kernel.org/linux-scsi/20230123221046.125483-1-michael.christie@oracle.com/T/#m4834a643ffb5bac2529d65d40906d3cfbdd9b1b7

to handle UAs. To reduce the size of this patchset that's being done
separately to make reviewing easier. And to make merging easier this
patchset and the one above do not have any conflicts so can be merged
in different trees.

Link: https://lore.kernel.org/r/20230407200551.12660-1-michael.christie@oracle.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-05-22 16:35:02 -04:00
Jens Axboe
1878b736e8 Merge tag 'nvme-6.4-2023-05-18' of git://git.infradead.org/nvme into block-6.4
Pull NVMe fixes from Keith:

"nvme fixes for Linux 6.4

 - More device quirks (Sagi, Hristo, Adrian, Daniel)
 - Controller delete race (Maurizo)
 - Multipath cleanup fix (Christoph)"

* tag 'nvme-6.4-2023-05-18' of git://git.infradead.org/nvme:
  nvme-pci: Add quirk for Teamgroup MP33 SSD
  nvme: do not let the user delete a ctrl before a complete initialization
  nvme-multipath: don't call blk_mark_disk_dead in nvme_mpath_remove_disk
  nvme-pci: clamp max_hw_sectors based on DMA optimized limitation
  nvme-pci: add quirk for missing secondary temperature thresholds
  nvme-pci: add NVME_QUIRK_BOGUS_NID for HS-SSD-FUTURE 2048G
2023-05-18 19:46:42 -06:00
Maurizio Lombardi
2eb94dd56a nvme: do not let the user delete a ctrl before a complete initialization
If a userspace application performes a "delete_controller" command
early during the ctrl initialization, the delete operation
may race against the init code and the kernel will crash.

nvme nvme5: Connect command failed: host path error
nvme nvme5: failed to connect queue: 0 ret=880
PF: supervisor write access in kernel mode
PF: error_code(0x0002) - not-present page
 blk_mq_quiesce_queue+0x18/0x90
 nvme_tcp_delete_ctrl+0x24/0x40 [nvme_tcp]
 nvme_do_delete_ctrl+0x7f/0x8b [nvme_core]
 nvme_sysfs_delete.cold+0x8/0xd [nvme_core]
 kernfs_fop_write_iter+0x124/0x1b0
 new_sync_write+0xff/0x190
 vfs_write+0x1ef/0x280

Fix the crash by checking the NVME_CTRL_STARTED_ONCE bit;
if it's not set it means that the nvme controller is still
in the process of getting initialized and the kernel
will return an -EBUSY error to userspace.
Set the NVME_CTRL_STARTED_ONCE later in the nvme_start_ctrl()
function, after the controller start operation is completed.

Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-05-17 07:36:17 -07:00
Linus Torvalds
556eb8b791 Driver core changes for 6.4-rc1
Here is the large set of driver core changes for 6.4-rc1.
 
 Once again, a busy development cycle, with lots of changes happening in
 the driver core in the quest to be able to move "struct bus" and "struct
 class" into read-only memory, a task now complete with these changes.
 
 This will make the future rust interactions with the driver core more
 "provably correct" as well as providing more obvious lifetime rules for
 all busses and classes in the kernel.
 
 The changes required for this did touch many individual classes and
 busses as many callbacks were changed to take const * parameters
 instead.  All of these changes have been submitted to the various
 subsystem maintainers, giving them plenty of time to review, and most of
 them actually did so.
 
 Other than those changes, included in here are a small set of other
 things:
   - kobject logging improvements
   - cacheinfo improvements and updates
   - obligatory fw_devlink updates and fixes
   - documentation updates
   - device property cleanups and const * changes
   - firwmare loader dependency fixes.
 
 All of these have been in linux-next for a while with no reported
 problems.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZEp7Sw8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ykitQCfamUHpxGcKOAGuLXMotXNakTEsxgAoIquENm5
 LEGadNS38k5fs+73UaxV
 =7K4B
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here is the large set of driver core changes for 6.4-rc1.

  Once again, a busy development cycle, with lots of changes happening
  in the driver core in the quest to be able to move "struct bus" and
  "struct class" into read-only memory, a task now complete with these
  changes.

  This will make the future rust interactions with the driver core more
  "provably correct" as well as providing more obvious lifetime rules
  for all busses and classes in the kernel.

  The changes required for this did touch many individual classes and
  busses as many callbacks were changed to take const * parameters
  instead. All of these changes have been submitted to the various
  subsystem maintainers, giving them plenty of time to review, and most
  of them actually did so.

  Other than those changes, included in here are a small set of other
  things:

   - kobject logging improvements

   - cacheinfo improvements and updates

   - obligatory fw_devlink updates and fixes

   - documentation updates

   - device property cleanups and const * changes

   - firwmare loader dependency fixes.

  All of these have been in linux-next for a while with no reported
  problems"

* tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (120 commits)
  device property: make device_property functions take const device *
  driver core: update comments in device_rename()
  driver core: Don't require dynamic_debug for initcall_debug probe timing
  firmware_loader: rework crypto dependencies
  firmware_loader: Strip off \n from customized path
  zram: fix up permission for the hot_add sysfs file
  cacheinfo: Add use_arch[|_cache]_info field/function
  arch_topology: Remove early cacheinfo error message if -ENOENT
  cacheinfo: Check cache properties are present in DT
  cacheinfo: Check sib_leaf in cache_leaves_are_shared()
  cacheinfo: Allow early level detection when DT/ACPI info is missing/broken
  cacheinfo: Add arm64 early level initializer implementation
  cacheinfo: Add arch specific early level initializer
  tty: make tty_class a static const structure
  driver core: class: remove struct class_interface * from callbacks
  driver core: class: mark the struct class in struct class_interface constant
  driver core: class: make class_register() take a const *
  driver core: class: mark class_release() as taking a const *
  driver core: remove incorrect comment for device_create*
  MIPS: vpe-cmp: remove module owner pointer from struct class usage.
  ...
2023-04-27 11:53:57 -07:00
Linus Torvalds
9dd6956b38 for-6.4/block-2023-04-21
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmRCvcIQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpk+JEACj01t7Xen2+Razagu3aTx9tmRGFnTNR3MY
 raFG6B1TADk1TgCWWa2C4Dj67SOispPLm8hbIcOxqB1UscDWCCwjmnr/debADFzW
 Ap6shv/IRwVGmDp+F7ocYas0ynwooOJg4WJTwkSKz2o4m4p3vzlwAKi4fLiSjbXp
 gJTrA7WEvDOVjzajlTFUtjr8rc6PdunbGm25cPIufAxUEhvttYex2VbVqjDmfNsE
 8tyyk9RWbe4AY/ZYaGXVn4yQ/CgL/sXFkVc5noRXNfAQ/K3CVLQrFLJ3JlwUHpiA
 xXBor21TUWCZEo33Y2G5NConAYqE7etoPTkaTDO3/aZ+dAMFyhC/WAYLz1KZGMh1
 +g1fDX1QKEd40H2lfDXvqF1ob7Ut8EzUx+gvBXcc3/AiRpJ5rjfOcj6LPUMUqQJk
 nucLLFTiMKecnDMBERbvixqbaTyrjvkFEj2wYJvgj1LKXAd+x/bj8SGajs9r88Nb
 9YT9ai/+Yl7Ppfb67rCgXJU7oNZQSAQ2H+X/l2jbiqImOgq1u/45AmINnbanS7HH
 Y1I8pbH45AcnCgkJRoQwrNX3BnTOTBJ+D/4Fl4b8jsihq0D3UtwCwPCObHP4LW9S
 MUNPhP3tUuYsAgXqX80+Sao6SYvXDwnbWOM+LOaaZXgjb1ndwDUZXpto8Ra8WB1u
 8kM6s6ZR7g==
 =W1Zb
 -----END PGP SIGNATURE-----

Merge tag 'for-6.4/block-2023-04-21' of git://git.kernel.dk/linux

Pull block updates from Jens Axboe:

 - drbd patches, bringing us closer to unifying the out-of-tree version
   and the in tree one (Andreas, Christoph)

 - support for auto-quiesce for the s390 dasd driver (Stefan)

 - MD pull request via Song:
      - md/bitmap: Optimal last page size (Jon Derrick)
      - Various raid10 fixes (Yu Kuai, Li Nan)
      - md: add error_handlers for raid0 and linear (Mariusz Tkaczyk)

 - NVMe pull request via Christoph:
      - Drop redundant pci_enable_pcie_error_reporting (Bjorn Helgaas)
      - Validate nvmet module parameters (Chaitanya Kulkarni)
      - Fence TCP socket on receive error (Chris Leech)
      - Fix async event trace event (Keith Busch)
      - Minor cleanups (Chaitanya Kulkarni, zhenwei pi)
      - Fix and cleanup nvmet Identify handling (Damien Le Moal,
        Christoph Hellwig)
      - Fix double blk_mq_complete_request race in the timeout handler
        (Lei Yin)
      - Fix irq locking in nvme-fcloop (Ming Lei)
      - Remove queue mapping helper for rdma devices (Sagi Grimberg)

 - use structured request attribute checks for nbd (Jakub)

 - fix blk-crypto race conditions between keyslot management (Eric)

 - add sed-opal support for reading read locking range attributes
   (Ondrej)

 - make fault injection configurable for null_blk (Akinobu)

 - clean up the request insertion API (Christoph)

 - clean up the queue running API (Christoph)

 - blkg config helper cleanups (Tejun)

 - lazy init support for blk-iolatency (Tejun)

 - various fixes and tweaks to ublk (Ming)

 - remove hybrid polling. It hasn't really been useful since we got
   async polled IO support, and these days we don't support sync polled
   IO at all (Keith)

 - misc fixes, cleanups, improvements (Zhong, Ondrej, Colin, Chengming,
   Chaitanya, me)

* tag 'for-6.4/block-2023-04-21' of git://git.kernel.dk/linux: (118 commits)
  nbd: fix incomplete validation of ioctl arg
  ublk: don't return 0 in case of any failure
  sed-opal: geometry feature reporting command
  null_blk: Always check queue mode setting from configfs
  block: ublk: switch to ioctl command encoding
  blk-mq: fix the blk_mq_add_to_requeue_list call in blk_kick_flush
  block, bfq: Fix division by zero error on zero wsum
  fault-inject: fix build error when FAULT_INJECTION_CONFIGFS=y and CONFIGFS_FS=m
  block: store bdev->bd_disk->fops->submit_bio state in bdev
  block: re-arrange the struct block_device fields for better layout
  md/raid5: remove unused working_disks variable
  md/raid10: don't call bio_start_io_acct twice for bio which experienced read error
  md/raid10: fix memleak of md thread
  md/raid10: fix memleak for 'conf->bio_split'
  md/raid10: fix leak of 'r10bio->remaining' for recovery
  md/raid10: don't BUG_ON() in raise_barrier()
  md: fix soft lockup in status_resync
  md: add error_handlers for raid0 and linear
  md: Use optimal I/O size for last bitmap page
  md: Fix types in sb writer
  ...
2023-04-26 12:52:58 -07:00
Lei Yin
d4f1d5f7a4 nvme: fix double blk_mq_complete_request for timeout request with low probability
When nvme_cancel_tagset traverses all tagsets and executes
nvme_cancel_request, this request may be executing blk_mq_free_request
that is called by nvme_rdma_complete_timed_out/nvme_tcp_complete_timed_out.
When blk_mq_free_request executes to WRITE_ONCE(rq->state, MQ_RQ_IDLE) and
__blk_mq_free_request(rq), it will cause double blk_mq_complete_request for
this request, and it will cause a null pointer error in the second
execution of this function because rq->mq_hctx has set to NULL in first
execution.

Signed-off-by: Lei Yin <yinlei2@lenovo.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-04-13 08:59:04 +02:00
Keith Busch
6622b76fe9 nvme: fix async event trace event
Mixing AER Event Type and Event Info has masking clashes. Just print the
event type, but also include the event info of the AER result in the
trace.

Fixes: 09bd1ff4b15143b ("nvme-core: add async event trace helper")
Reported-by: Nate Thornton <nate.thornton@samsung.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Minwoo Im <minwoo.im@samsung.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-04-13 08:59:04 +02:00
Mike Christie
b668f2f546 nvme: Move pr code to it's own file
This patch moves the pr code to it's own file because I'm going to be
adding more functions and core.c is getting bigger.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Link: https://lore.kernel.org/r/20230407200551.12660-10-michael.christie@oracle.com
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-04-11 21:55:35 -04:00
Mike Christie
d45b446bd8 nvme: Don't hardcode the data len for pr commands
Reservation Report support needs to pass in a variable sized buffer, so
this patch has the pr command helpers take a data length argument.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Link: https://lore.kernel.org/r/20230407200551.12660-9-michael.christie@oracle.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-04-11 21:55:35 -04:00
Mike Christie
7ba150834b block: Rename BLK_STS_NEXUS to BLK_STS_RESV_CONFLICT
BLK_STS_NEXUS is used for NVMe/SCSI reservation conflicts and DASD's
locking feature which works similar to NVMe/SCSI reservations where a
host can get a lock on a device and when the lock is taken it will get
failures.

This patch renames BLK_STS_NEXUS so it better reflects this type of
use.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Link: https://lore.kernel.org/r/20230407200551.12660-3-michael.christie@oracle.com
Acked-by: Stefan Haberland <sth@linux.ibm.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-04-11 21:55:35 -04:00
Keith Busch
d3205ab75e nvme: fix discard support without oncs
The device can report discard support without setting the ONCS DSM bit.
When not set, the driver clears max_discard_size expecting it to be set
later. We don't know the size until we have the namespace format,
though, so setting it is deferred until configuring one, but the driver
was abandoning the discard settings due to that initial clearing.

Move the max_discard_size calculation above the check for a '0' discard
size.

Fixes: 1a86924e4f46475 ("nvme: fix interpretation of DMRSL")
Reported-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Tested-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-04-05 17:13:17 +02:00
Greg Kroah-Hartman
cd8fe5b6db Merge 6.3-rc5 into driver-core-next
We need the fixes in here for testing, as well as the driver core
changes for documentation updates to build on.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-03 09:33:30 +02:00
Martin George
def84ab600 nvme: send Identify with CNS 06h only to I/O controllers
Identify CNS 06h (I/O Command Set Specific Identify Controller data
structure) is supported only on i/o controllers.

But nvme_init_non_mdts_limits() currently invokes this on all
controllers.  Correct this by ensuring this is sent to I/O
controllers only.

Signed-off-by: Martin George <marting@netapp.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-03-22 09:17:52 +01:00
Greg Kroah-Hartman
1aaba11da9 driver core: class: remove module * from class_create()
The module pointer in class_create() never actually did anything, and it
shouldn't have been requred to be set as a parameter even if it did
something.  So just remove it and fix up all callers of the function in
the kernel tree at the same time.

Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Acked-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Link: https://lore.kernel.org/r/20230313181843.1207845-4-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-17 15:16:33 +01:00
Ming Lei
37f0dc2ec7 nvme: fix handling single range discard request
When investigating one customer report on warning in nvme_setup_discard,
we observed the controller(nvme/tcp) actually exposes
queue_max_discard_segments(req->q) == 1.

Obviously the current code can't handle this situation, since contiguity
merge like normal RW request is taken.

Fix the issue by building range from request sector/nr_sectors directly.

Fixes: b35ba01ea697 ("nvme: support ranged discard requests")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-03-15 14:58:51 +01:00
Linus Torvalds
9d0281b56b block-6.3-2023-03-03
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmQB57MQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgputpEADVrc1OFzHOivJq+LJ3HS3ufhLBthtgu1Lp
 sEHvDNp9tBGXMLkomuCYpAju5TBAEKC+AJTZyj9iS1j++ItoezdoP55YRIH7t2Or
 UTy8ex3rLPGkQk6k3o8roWCyajTW/ZS+4fmk+NkVYMLsQBp9I+kFbxgJa5bbREdU
 Z8b/9hcBGz58R8Kq+TEMp/bO7oCV4c8xWumrKER+MktDDx0kc5d+afWXoy7bEKFg
 jLB3gleTM9HUpa9a2GPc4fxqdb0KanQdMtiyn/oplg0JcZLMiHfRbiRnsgQkjN0O
 RVtUcdxXmOkQeFra4GXPiHmQBcIfE85wP4wxb8p/F2StYRhb1epzzeCXOhuNZvv4
 dd6OSARgtzWt3OlHka4aC63H4kzs9SxJp0F2uwuPLV0fM91TP1oOTWV+53FrQr9Z
 OQYyB8d9Il4K72NFLwU4ukJ1fPoCRHjpgAXIIkasEjaBftpJlMNnfblncTZTBumy
 XumFVdKfvqc3OFt8LLKWqLDV0j3TknVeCMPKhsbRwQ0NG4vlNOSWaLkGJCDLJ7ga
 ebf8AD5eaLCT9qyYquBuW5VBKZH5Z4rf5yHta9Dx+Omu0JTQYtTkiiM3UTdpDbtq
 SObZ31UvLoYK2dOZcVgjhE2RgM/AV5jJcx7aHhT3UptavAehHbePgiNhuEEntlKv
 L87kXJkSSQ==
 =ezrg
 -----END PGP SIGNATURE-----

Merge tag 'block-6.3-2023-03-03' of git://git.kernel.dk/linux

Pull block fixes from Jens Axboe:

 - NVMe pull request via Christoph:
      - Don't access released socket during error recovery (Akinobu
        Mita)
      - Bring back auto-removal of deleted namespaces during sequential
        scan (Christoph Hellwig)
      - Fix an error code in nvme_auth_process_dhchap_challenge (Dan
        Carpenter)
      - Show well known discovery name (Daniel Wagner)
      - Add a missing endianess conversion in effects masking (Keith
        Busch)

 - Fix for a regression introduced in blk-rq-qos during init in this
   merge window (Breno)

 - Reorder a few fields in struct blk_mq_tag_set, eliminating a few
   holes and shrinking it (Christophe)

 - Remove redundant bdev_get_queue() NULL checks (Juhyung)

 - Add sed-opal single user mode support flag (Luca)

 - Remove SQE128 check in ublk as it isn't needed, saving some memory
   (Ming)

 - Op specific segment checking for cloned requests (Uday)

 - Exclusive open partition scan fixes (Yu)

 - Loop offset/size checking before assigning them in the device (Zhong)

 - Bio polling fixes (me)

* tag 'block-6.3-2023-03-03' of git://git.kernel.dk/linux:
  blk-mq: enforce op-specific segment limits in blk_insert_cloned_request
  nvme-fabrics: show well known discovery name
  nvme-tcp: don't access released socket during error recovery
  nvme-auth: fix an error code in nvme_auth_process_dhchap_challenge()
  nvme: bring back auto-removal of deleted namespaces during sequential scan
  blk-iocost: Pass gendisk to ioc_refresh_params
  nvme: fix sparse warning on effects masking
  block: be a bit more careful in checking for NULL bdev while polling
  block: clear bio->bi_bdev when putting a bio back in the cache
  loop: loop_set_status_from_info() check before assignment
  ublk: remove check IO_URING_F_SQE128 in ublk_ch_uring_cmd
  block: remove more NULL checks after bdev_get_queue()
  blk-mq: Reorder fields in 'struct blk_mq_tag_set'
  block: fix scan partition for exclusively open device again
  block: Revert "block: Do not reread partition table on exclusively open device"
  sed-opal: add support flag for SUM in status ioctl
2023-03-03 10:21:39 -08:00
Christoph Hellwig
0dd6fff2aa nvme: bring back auto-removal of deleted namespaces during sequential scan
Bring back the check of the Identify Namespace return value for the
legacy NVMe 1.0-style sequential scanning.  While NVMe 1.0 does not
support namespace management, there are "modern" cloud solutions like
Google Cloud Platform that claim the obsolete 1.0 compliance for no
good reason while supporting proprietary sideband namespace management.

Fixes: 1a893c2bfef4 ("nvme: refactor namespace probing")
Reported-by: Nils Hanke <nh@edgeless.systems>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Tested-by: Nils Hanke <nh@edgeless.systems>
2023-02-28 06:14:29 -07:00
Keith Busch
c0c33b94cf nvme: fix sparse warning on effects masking
The log entries are stored in le32, so use appropriate byte swapping
macros.

Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202302242222.PevBhzvC-lkp@intel.com/
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-02-27 06:56:04 -07:00
Linus Torvalds
5b0ed59649 for-6.3/block-2023-02-16
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmPvfncQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpob2EADXJxcr2jjYHm/7cjKkyuVX8fr80dNdMeuY
 JFdsjG1k6Uj73BVhQQWYTcs/PsrWBHWRsv6uz4WgOELj55eXmf5Q0kJszyUeJW33
 /DjqLvtoppVcYf80xE13wKvCfn73BjwQo6xkGM0qAYn15eaXiD/Ax3xC6eJlsBeK
 PEw7EJyhacbSxZa/1D2B6+mqII1jUQWProTCc3udZ4JHi3WvdWa3Rda0qCqHl4a1
 +K2aP2YTFIRPxBzfMNa/CafWVIFubTdht+4Ds6R60RImzB9e0VUBfcsiUyW5Zg7L
 Fwv7ptXuWrALwVNdW56Oz1QikBxn2pdRR2HMLwKJW1MD8kP9r8LMm2jV5Rhiwe0B
 OQsGRYkOzBvw+bxeP5fvk0iPGVMz6ActH4gkraA5QdLqayDaFYOadlhqz0uRo5SH
 Fb42Vl658K/MHDSIk8U58TNkmrsIJsBGohXI9DOGINPPvv3XOPi4Q1HmXkGRmii0
 y+lNU/QEGh7xXXew29SPP76uQpQaYfC7NxXCMw/OpOMwehzjsjshmM2lpxi8zsgt
 PJUmfHv5qxCplNmTJXmUpmX7sS7550HUdu9FJb13DM+gzKg8bk9jWVuLrzqrVlG5
 1hKWEl1+heg1heRfaIuJVLbPI0au6Sb4uqhih/PHyrP9TWIoAruDbDJM65GKTxyE
 2uEgcHzHQw==
 =poRc
 -----END PGP SIGNATURE-----

Merge tag 'for-6.3/block-2023-02-16' of git://git.kernel.dk/linux

Pull block updates from Jens Axboe:

 - NVMe updates via Christoph:
      - Small improvements to the logging functionality (Amit Engel)
      - Authentication cleanups (Hannes Reinecke)
      - Cleanup and optimize the DMA mapping cod in the PCIe driver
        (Keith Busch)
      - Work around the command effects for Format NVM (Keith Busch)
      - Misc cleanups (Keith Busch, Christoph Hellwig)
      - Fix and cleanup freeing single sgl (Keith Busch)

 - MD updates via Song:
      - Fix a rare crash during the takeover process
      - Don't update recovery_cp when curr_resync is ACTIVE
      - Free writes_pending in md_stop
      - Change active_io to percpu

 - Updates to drbd, inching us closer to unifying the out-of-tree driver
   with the in-tree one (Andreas, Christoph, Lars, Robert)

 - BFQ update adding support for multi-actuator drives (Paolo, Federico,
   Davide)

 - Make brd compliant with REQ_NOWAIT (me)

 - Fix for IOPOLL and queue entering, fixing stalled IO waiting on
   timeouts (me)

 - Fix for REQ_NOWAIT with multiple bios (me)

 - Fix memory leak in blktrace cleanup (Greg)

 - Clean up sbitmap and fix a potential hang (Kemeng)

 - Clean up some bits in BFQ, and fix a bug in the request injection
   (Kemeng)

 - Clean up the request allocation and issue code, and fix some bugs
   related to that (Kemeng)

 - ublk updates and fixes:
      - Add support for unprivileged ublk (Ming)
      - Improve device deletion handling (Ming)
      - Misc (Liu, Ziyang)

 - s390 dasd fixes (Alexander, Qiheng)

 - Improve utility of request caching and fixes (Anuj, Xiao)

 - zoned cleanups (Pankaj)

 - More constification for kobjs (Thomas)

 - blk-iocost cleanups (Yu)

 - Remove bio splitting from drivers that don't need it (Christoph)

 - Switch blk-cgroups to use struct gendisk. Some of this is now
   incomplete as select late reverts were done. (Christoph)

 - Add bvec initialization helpers, and convert callers to use that
   rather than open-coding it (Christoph)

 - Misc fixes and cleanups (Jinke, Keith, Arnd, Bart, Li, Martin,
   Matthew, Ulf, Zhong)

* tag 'for-6.3/block-2023-02-16' of git://git.kernel.dk/linux: (169 commits)
  brd: use radix_tree_maybe_preload instead of radix_tree_preload
  block: use proper return value from bio_failfast()
  block: bio-integrity: Copy flags when bio_integrity_payload is cloned
  block: Fix io statistics for cgroup in throttle path
  brd: mark as nowait compatible
  brd: check for REQ_NOWAIT and set correct page allocation mask
  brd: return 0/-error from brd_insert_page()
  block: sync mixed merged request's failfast with 1st bio's
  Revert "blk-cgroup: pin the gendisk in struct blkcg_gq"
  Revert "blk-cgroup: pass a gendisk to blkg_lookup"
  Revert "blk-cgroup: delay blk-cgroup initialization until add_disk"
  Revert "blk-cgroup: delay calling blkcg_exit_disk until disk_release"
  Revert "blk-cgroup: move the cgroup information to struct gendisk"
  nvme-pci: remove iod use_sgls
  nvme-pci: fix freeing single sgl
  block: ublk: check IO buffer based on flag need_get_data
  s390/dasd: Fix potential memleak in dasd_eckd_init()
  s390/dasd: sort out physical vs virtual pointers usage
  block: Remove the ALLOC_CACHE_SLACK constant
  block: make kobj_type structures constant
  ...
2023-02-20 14:27:21 -08:00
Jens Axboe
455944e4e4 nvme updates for Linux 6.3
- small improvements to the logging functionality (Amit Engel)
  - authentication cleanups (Hannes Reinecke)
  - cleanup and optimize the DMA mapping cod in the PCIe driver
    (Keith Busch)
  - work around the command effects for Format NVM (Keith Busch)
  - misc cleanups (Keith Busch, Christoph Hellwig)
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmPh7B8LHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYOIgxAAtLJaXzRFrKF6mK6j2PcXulJDVcayQYIiEBBTn4Li
 DkUr7QBqzRf+Y6TlkycNlAKroUa/tUEAFZWLpRf2t4Vvx2wqCMJGfJGz5/cJa+cs
 wM3T/9qijq+FG1t/ZA8L774vDUnwDHmU7WJgdva2xnXrzFGnh0GqgLyOS37G5Mqh
 kU3iQgMo+089IHmvD10IZjFjaWifbxQbC4gKxno53JYxvn3Ktu57PxBI/jPKIsbT
 yQj21LvCZwDIehpQqtCieLjjNrj1g130hYxplcLWB2D4BkC0rcqTCh+Jb7RtprzP
 wHrZig1zAA7IzGXyouQymi56HJiCAdTOgynVrg4j/VmQIqE3+H/KKQ0n7xMGEqIM
 YGO3CXuyS3DcX3SefuHroZapl95EbnF/EBHE1gGG49hPY2SoEGHlZwou3Hu/mWbm
 lOSuLFCCwvahKC6SzGr7+Vk5LgrwzAhl2Nwf9i1B+q0mnBHxiOXCeGmrtMde63W0
 qlpBhbwquoVbdia3oEvLfos7QZmw1eoyL4MWF6IG+rqxut+3A9eoyIAR61XLepny
 0yr7Kn0hZGtiZPJXhTG43p9hoVUP+FoihlwyGpZ8O2/HQFXZOmWRBPBF+L3mL/u2
 TPeiDEycWx3MtiGENts9UzaxEz9mo1pcky3vcsj8Ch4qYoYfFOmUd5F8Ay+nBETX
 VbQ=
 =yiTz
 -----END PGP SIGNATURE-----

Merge tag 'nvme-6.3-2023-02-07' of git://git.infradead.org/nvme into for-6.3/block

Pull NVMe updates from Christoph:

"nvme updates for Linux 6.3

 - small improvements to the logging functionality (Amit Engel)
 - authentication cleanups (Hannes Reinecke)
 - cleanup and optimize the DMA mapping cod in the PCIe driver
   (Keith Busch)
 - work around the command effects for Format NVM (Keith Busch)
 - misc cleanups (Keith Busch, Christoph Hellwig)"

* tag 'nvme-6.3-2023-02-07' of git://git.infradead.org/nvme:
  nvme: mask CSE effects for security receive
  nvme: always initialize known command effects
  nvmet: for nvme admin set_features cmd, call nvmet_check_data_len_lte()
  nvme-tcp: add additional info for nvme_tcp_timeout log
  nvme: add nvme_opcode_str function for all nvme cmd types
  nvme: remove nvme_execute_passthru_rq
  nvme-pci: place descriptor addresses in iod
  nvme-pci: use mapped entries for sgl decision
  nvme-pci: remove SGL segment descriptors
  nvme-auth: don't use NVMe status codes
  nvme-fabrics: clarify AUTHREQ result handling
2023-02-07 07:22:58 -07:00
Linus Torvalds
0136d86b78 block-6.2-2023-02-03
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmPdRq8QHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpjcqEADcWlRjkcLzRpEMD9g3IyDShasT1JVeSvV6
 xqDuA0kRF6DyObu82jE2wiZ49FRpeCUw6S6ZdVhvwGHgPpfLBuPWonFnTqxYAnSz
 XCYnt4QdZHGiydIHVxkyP8Raz6d24kZawlUmbE7dcfksNziyGR5UjbCsk1HNJhmf
 EvnLZ2EozZwsZLW/RRYZrh9Q8ccB8kJeX+JuUVw7sboNyJ+bW+x+7prlm3CKgopX
 IiP69E6qIPe6RHkyLRdKgYgxRdcgeq6uJk/nuZ/6uPCcyrz+0QEtge3CkTe7zLkF
 CPmbWlqngmNfNsS93nPTK2kHWTz8P2spo+UTkXIegSYBA8CIr9lDxazSFKT0B6zH
 yIWzmQoE7YXRI5B21rlPvNGE/gPSy48mSn1ym/MCf+UyWGneRypeU/K//2Ww3UJK
 F1Xl2c1v/EEr28qPuC8VQbAsQ56GOcZ6zW4Q0grxTYm0KzzJ2O5B3FEHdCWlS/x9
 KY5v3a8a3nXg9rNio0ruXiyD5l7PE5nFESNrBFDS4kEfxk4cx50ZfgDH68d515/W
 //EnNjx9nN20yF+LcKD70KJHxPdWaUXGT2c1+E/tdbrgUKReCpER+5hQc8+YxQML
 DCbzr7LJjX5mmDQ5YI6Y09/L6luzFMjrnxpmXkL7nyWQlSYkMqus3vPtDcJ5Xk2J
 shHBlzIcuw==
 =/+rE
 -----END PGP SIGNATURE-----

Merge tag 'block-6.2-2023-02-03' of git://git.kernel.dk/linux

Pull block fixes from Jens Axboe:
 "A bit bigger than I'd like at this point, but mostly a bunch of little
  fixes. In detail:

   - NVMe pull request via Christoph:
       - Fix a missing queue put in nvmet_fc_ls_create_association
         (Amit Engel)
       - Clear queue pointers on tag_set initialization failure
         (Maurizio Lombardi)
       - Use workqueue dedicated to authentication (Shin'ichiro
         Kawasaki)

   - Fix for an overflow in ublk (Liu)

   - Fix for leaking a queue reference in block cgroups (Ming)

   - Fix for a use-after-free in BFQ (Yu)"

* tag 'block-6.2-2023-02-03' of git://git.kernel.dk/linux:
  blk-cgroup: don't update io stat for root cgroup
  nvme-auth: use workqueue dedicated to authentication
  nvme: clear the request_queue pointers on failure in nvme_alloc_io_tag_set
  nvme: clear the request_queue pointers on failure in nvme_alloc_admin_tag_set
  nvme-fc: fix a missing queue put in nvmet_fc_ls_create_association
  block: Fix the blk_mq_destroy_queue() documentation
  block: ublk: extending queue_size to fix overflow
  block, bfq: fix uaf for bfqq in bic_set_bfqq()
2023-02-03 11:35:42 -08:00
Christoph Hellwig
4bee16daf1 nvme: use bvec_set_virt to initialize special_vec
Use the bvec_set_virt helper to initialize the special_vec.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20230203150634.3199647-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-03 08:20:55 -07:00
Keith Busch
baff649144 nvme: mask CSE effects for security receive
The nvme driver will freeze the IO queues in response to an admin
command with CSE bits set. These bits notify the host that the command
that's about to be executed needs to be done exclusively, hence the
freeze.

The Security Receive command is often reported by multiple vendors with
CSE bits set. The reason for this is that the result depends on the
previous Security Send. This has nothing to do with IO queues, though,
so the driver is taking an overly cautious response to seeing this
passthrough command, while unable to fufill the intended admin queue
action.

Rather than freeze IO during this harmless command, mask off the
effects. This freezing is observed to cause IO latency spikes when host
software periodically validates the security state of the drives.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-02-01 16:10:10 +01:00
Keith Busch
cc115cbe12 nvme: always initialize known command effects
Instead of appending command effects flags per IO, set the known effects
flags the driver needs to react to just once during initial setup.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-02-01 16:10:09 +01:00
Christoph Hellwig
62281b9ed6 nvme: remove nvme_execute_passthru_rq
After moving the nvme_passthru_end call to the callers of
nvme_execute_passthru_rq, this function has become quite pointless,
so remove it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
2023-02-01 14:21:59 +01:00
Maurizio Lombardi
6fbf13c0e2 nvme: clear the request_queue pointers on failure in nvme_alloc_io_tag_set
In nvme_alloc_io_tag_set(), the connect_q pointer should be set to NULL
in case of error to avoid potential invalid pointer dereferences.

Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-02-01 14:18:46 +01:00
Maurizio Lombardi
fd62678ab5 nvme: clear the request_queue pointers on failure in nvme_alloc_admin_tag_set
If nvme_alloc_admin_tag_set() fails, the admin_q and fabrics_q pointers
are left with an invalid, non-NULL value. Other functions may then check
the pointers and dereference them, e.g. in

  nvme_probe() -> out_disable: -> nvme_dev_remove_admin().

Fix the bug by setting admin_q and fabrics_q to NULL in case of error.

Also use the set variable to free the tag_set as ctrl->admin_tagset isn't
initialized yet.

Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-02-01 14:18:46 +01:00