linux

iv/linux

Author	SHA1	Message	Date
Keith Busch	071a839286	nvme: fix passthrough csi check [ Upstream commit 85eee6341abb81ac6a35062ffd5c3029eb53be6b ] The namespace head saves the Command Set Indicator enum, so use that instead of the Command Set Selected. The two values are not the same. Fixes: 831ed60c2aca2d ("nvme: also return I/O command effects from nvme_command_effects") Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-02-01 08:27:28 +01:00
Keith Busch	5bd69d2ea8	nvme-pci: fix timeout request state check [ Upstream commit 1c5842085851f786eba24a39ecd02650ad892064 ] Polling the completion can progress the request state to IDLE, either inline with the completion, or through softirq. Either way, the state may not be COMPLETED, so don't check for that. We only care if the state isn't IN_FLIGHT. This is fixing an issue where the driver aborts an IO that we just completed. Seeing the "aborting" message instead of "polled" is very misleading as to where the timeout problem resides. Fixes: bf392a5dc02a9b ("nvme-pci: Remove tag from process cq") Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-02-01 08:27:13 +01:00
Jens Axboe	7ec9a45fc4	block: handle bio_split_to_limits() NULL return commit 613b14884b8595e20b9fac4126bf627313827fbe upstream. This can't happen right now, but in preparation for allowing bio_split_to_limits() returning NULL if it ended the bio, check for it in all the callers. Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-01-18 11:48:58 +01:00
Christoph Hellwig	69f4bda5f4	nvme: also return I/O command effects from nvme_command_effects [ Upstream commit 831ed60c2aca2d7c517b2da22897a90224a97d27 ] To be able to use the Commands Supported and Effects Log for allowing unprivileged passtrough, it needs to be corretly reported for I/O commands as well. Return the I/O command effects from nvme_command_effects, and also add a default list of effects for the NVM command set. For other command sets, the Commands Supported and Effects log is required to be present already. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-01-12 11:59:17 +01:00
Christoph Hellwig	a6a4b057cd	nvmet: use NVME_CMD_EFFECTS_CSUPP instead of open coding it [ Upstream commit 61f37154c599cf9f2f84dcbd9be842f8645a7099 ] Use NVME_CMD_EFFECTS_CSUPP instead of open coding it and assign a single value to multiple array entries instead of repeated assignments. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-01-12 11:59:17 +01:00
Yanjun Zhang	4df413d469	nvme: fix multipath crash caused by flush request when blktrace is enabled [ Upstream commit 3659fb5ac29a5e6102bebe494ac789fd47fb78f4 ] The flush request initialized by blk_kick_flush has NULL bio, and it may be dealt with nvme_end_req during io completion. When blktrace is enabled, nvme_trace_bio_complete with multipath activated trying to access NULL pointer bio from flush request results in the following crash: [ 2517.831677] BUG: kernel NULL pointer dereference, address: 000000000000001a [ 2517.835213] #PF: supervisor read access in kernel mode [ 2517.838724] #PF: error_code(0x0000) - not-present page [ 2517.842222] PGD 7b2d51067 P4D 0 [ 2517.845684] Oops: 0000 [#1] SMP NOPTI [ 2517.849125] CPU: 2 PID: 732 Comm: kworker/2:1H Kdump: loaded Tainted: G S 5.15.67-0.cl9.x86_64 #1 [ 2517.852723] Hardware name: XFUSION 2288H V6/BC13MBSBC, BIOS 1.13 07/27/2022 [ 2517.856358] Workqueue: nvme_tcp_wq nvme_tcp_io_work [nvme_tcp] [ 2517.859993] RIP: 0010:blk_add_trace_bio_complete+0x6/0x30 [ 2517.863628] Code: 1f 44 00 00 48 8b 46 08 31 c9 ba 04 00 10 00 48 8b 80 50 03 00 00 48 8b 78 50 e9 e5 fe ff ff 0f 1f 44 00 00 41 54 49 89 f4 55 <0f> b6 7a 1a 48 89 d5 e8 3e 1c 2b 00 48 89 ee 4c 89 e7 5d 89 c1 ba [ 2517.871269] RSP: 0018:ff7f6a008d9dbcd0 EFLAGS: 00010286 [ 2517.875081] RAX: ff3d5b4be00b1d50 RBX: 0000000002040002 RCX: ff3d5b0a270f2000 [ 2517.878966] RDX: 0000000000000000 RSI: ff3d5b0b021fb9f8 RDI: 0000000000000000 [ 2517.882849] RBP: ff3d5b0b96a6fa00 R08: 0000000000000001 R09: 0000000000000000 [ 2517.886718] R10: 000000000000000c R11: 000000000000000c R12: ff3d5b0b021fb9f8 [ 2517.890575] R13: 0000000002000000 R14: ff3d5b0b021fb1b0 R15: 0000000000000018 [ 2517.894434] FS: 0000000000000000(0000) GS:ff3d5b42bfc80000(0000) knlGS:0000000000000000 [ 2517.898299] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 2517.902157] CR2: 000000000000001a CR3: 00000004f023e005 CR4: 0000000000771ee0 [ 2517.906053] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 2517.909930] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 2517.913761] PKRU: 55555554 [ 2517.917558] Call Trace: [ 2517.921294] <TASK> [ 2517.924982] nvme_complete_rq+0x1c3/0x1e0 [nvme_core] [ 2517.928715] nvme_tcp_recv_pdu+0x4d7/0x540 [nvme_tcp] [ 2517.932442] nvme_tcp_recv_skb+0x4f/0x240 [nvme_tcp] [ 2517.936137] ? nvme_tcp_recv_pdu+0x540/0x540 [nvme_tcp] [ 2517.939830] tcp_read_sock+0x9c/0x260 [ 2517.943486] nvme_tcp_try_recv+0x65/0xa0 [nvme_tcp] [ 2517.947173] nvme_tcp_io_work+0x64/0x90 [nvme_tcp] [ 2517.950834] process_one_work+0x1e8/0x390 [ 2517.954473] worker_thread+0x53/0x3c0 [ 2517.958069] ? process_one_work+0x390/0x390 [ 2517.961655] kthread+0x10c/0x130 [ 2517.965211] ? set_kthread_struct+0x40/0x40 [ 2517.968760] ret_from_fork+0x1f/0x30 [ 2517.972285] </TASK> To avoid this situation, add a NULL check for req->bio before calling trace_block_bio_complete. Signed-off-by: Yanjun Zhang <zhangyanjun@cestc.cn> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-01-12 11:59:16 +01:00
Christoph Hellwig	8e228ac90c	nvmet: don't defer passthrough commands with trivial effects to the workqueue [ Upstream commit 2a459f6933e1c459bffb7cc73fd6c900edc714bd ] Mask out the "Command Supported" and "Logical Block Content Change" bits and only defer execution of commands that have non-trivial effects to the workqueue for synchronous execution. This allows to execute admin commands asynchronously on controllers that provide a Command Supported and Effects log page, and will keep allowing to execute Write commands asynchronously once command effects on I/O commands are taken into account. Fixes: c1fef73f793b ("nvmet: add passthru code to process commands") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-01-12 11:58:43 +01:00
Keith Busch	4c5fee0d88	nvme-pci: fix page size checks [ Upstream commit 841734234a28fd5cd0889b84bd4d93a0988fa11e ] The size allocated out of the dma pool is at most NVME_CTRL_PAGE_SIZE, which may be smaller than the PAGE_SIZE. Fixes: c61b82c7b7134 ("nvme-pci: fix PRP pool size") Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-01-12 11:58:42 +01:00
Keith Busch	9141144b37	nvme-pci: fix mempool alloc size [ Upstream commit c89a529e823d51dd23c7ec0c047c7a454a428541 ] Convert the max size to bytes to match the units of the divisor that calculates the worst-case number of PRP entries. The result is used to determine how many PRP Lists are required. The code was previously rounding this to 1 list, but we can require 2 in the worst case. In that scenario, the driver would corrupt memory beyond the size provided by the mempool. While unlikely to occur (you'd need a 4MB in exactly 127 phys segments on a queue that doesn't support SGLs), this memory corruption has been observed by kfence. Cc: Jens Axboe <axboe@kernel.dk> Fixes: 943e942e6266f ("nvme-pci: limit max IO size and segments to avoid high order allocations") Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-01-12 11:58:42 +01:00
Klaus Jensen	f17cf8fa2c	nvme-pci: fix doorbell buffer value endianness [ Upstream commit b5f96cb719d8ba220b565ddd3ba4ac0d8bcfb130 ] When using shadow doorbells, the event index and the doorbell values are written to host memory. Prior to this patch, the values written would erroneously be written in host endianness. This causes trouble on big-endian platforms. Fix this by adding missing endian conversions. This issue was noticed by Guenter while testing various big-endian platforms under QEMU[1]. A similar fix required for hw/nvme in QEMU is up for review as well[2]. [1]: https://lore.kernel.org/qemu-devel/20221209110022.GA3396194@roeck-us.net/ [2]: https://lore.kernel.org/qemu-devel/20221212114409.34972-4-its@irrelevant.dk/ Fixes: f9f38e33389c ("nvme: improve performance for virtual NVMe devices") Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-01-12 11:58:42 +01:00
Joel Granados	63d011ad05	nvme: return err on nvme_init_non_mdts_limits fail [ Upstream commit bcaf434b8f04e1ee82a8b1e1bce0de99fbff67fa ] In nvme_init_non_mdts_limits function we were returning 0 when kzalloc failed; it now returns -ENOMEM. Fixes: 5befc7c26e5a ("nvme: implement non-mdts command limits") Signed-off-by: Joel Granados <j.granados@samsung.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-12-31 13:14:10 +01:00
Christoph Hellwig	1a5aaa5736	nvmet: only allocate a single slab for bvecs [ Upstream commit fa8f9ac42350edd3ce82d0d148a60f0fa088f995 ] There is no need to have a separate slab cache for each namespace, and having separate ones creates duplicate debugs file names as well. Fixes: d5eff33ee6f8 ("nvmet: add simple file backed ns support") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-12-31 13:14:09 +01:00
Lei Rao	de0866b94a	nvme-pci: clear the prp2 field when not used [ Upstream commit a56ea6147facce4ac1fc38675455f9733d96232b ] If the prp2 field is not filled in nvme_setup_prp_simple(), the prp2 field is garbage data. According to nvme spec, the prp2 is reserved if the data transfer does not cross a memory page boundary, so clear it to zero if it is not used. Signed-off-by: Lei Rao <lei.rao@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-12-19 12:36:44 +01:00
Pankaj Raghav	b8c2f0392d	nvme initialize core quirks before calling nvme_init_subsystem [ Upstream commit 6f2d71524bcfdeb1fcbd22a4a92a5b7b161ab224 ] A device might have a core quirk for NVME_QUIRK_IGNORE_DEV_SUBNQN (such as Samsung X5) but it would still give a: "missing or invalid SUBNQN field" warning as core quirks are filled after calling nvme_init_subnqn. Fill ctrl->quirks from struct core_quirks before calling nvme_init_subsystem to fix this. Tested on a Samsung X5. Fixes: ab9e00cc72fa ("nvme: track subsystems") Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-12-14 11:37:27 +01:00
Caleb Sander	787d81d4eb	nvme: fix SRCU protection of nvme_ns_head list [ Upstream commit 899d2a05dc14733cfba6224083c6b0dd5a738590 ] Walking the nvme_ns_head siblings list is protected by the head's srcu in nvme_ns_head_submit_bio() but not nvme_mpath_revalidate_paths(). Removing namespaces from the list also fails to synchronize the srcu. Concurrent scan work can therefore cause use-after-frees. Hold the head's srcu lock in nvme_mpath_revalidate_paths() and synchronize with the srcu, not the global RCU, in nvme_ns_remove(). Observed the following panic when making NVMe/RDMA connections with native multipath on the Rocky Linux 8.6 kernel (it seems the upstream kernel has the same race condition). Disassembly shows the faulting instruction is cmp 0x50(%rdx),%rcx; computing capacity != get_capacity(ns->disk). Address 0x50 is dereferenced because ns->disk is NULL. The NULL disk appears to be the result of concurrent scan work freeing the namespace (note the log line in the middle of the panic). [37314.206036] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050 [37314.206036] nvme0n3: detected capacity change from 0 to 11811160064 [37314.299753] PGD 0 P4D 0 [37314.299756] Oops: 0000 [#1] SMP PTI [37314.299759] CPU: 29 PID: 322046 Comm: kworker/u98:3 Kdump: loaded Tainted: G W X --------- - - 4.18.0-372.32.1.el8test86.x86_64 #1 [37314.299762] Hardware name: Dell Inc. PowerEdge R720/0JP31P, BIOS 2.7.0 05/23/2018 [37314.299763] Workqueue: nvme-wq nvme_scan_work [nvme_core] [37314.299783] RIP: 0010:nvme_mpath_revalidate_paths+0x26/0xb0 [nvme_core] [37314.299790] Code: 1f 44 00 00 66 66 66 66 90 55 53 48 8b 5f 50 48 8b 83 c8 c9 00 00 48 8b 13 48 8b 48 50 48 39 d3 74 20 48 8d 42 d0 48 8b 50 20 <48> 3b 4a 50 74 05 f0 80 60 70 ef 48 8b 50 30 48 8d 42 d0 48 39 d3 [37315.058803] RSP: 0018:ffffabe28f913d10 EFLAGS: 00010202 [37315.121316] RAX: ffff927a077da800 RBX: ffff92991dd70000 RCX: 0000000001600000 [37315.206704] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff92991b719800 [37315.292106] RBP: ffff929a6b70c000 R08: 000000010234cd4a R09: c0000000ffff7fff [37315.377501] R10: 0000000000000001 R11: ffffabe28f913a30 R12: 0000000000000000 [37315.462889] R13: ffff92992716600c R14: ffff929964e6e030 R15: ffff92991dd70000 [37315.548286] FS: 0000000000000000(0000) GS:ffff92b87fb80000(0000) knlGS:0000000000000000 [37315.645111] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [37315.713871] CR2: 0000000000000050 CR3: 0000002208810006 CR4: 00000000000606e0 [37315.799267] Call Trace: [37315.828515] nvme_update_ns_info+0x1ac/0x250 [nvme_core] [37315.892075] nvme_validate_or_alloc_ns+0x2ff/0xa00 [nvme_core] [37315.961871] ? __blk_mq_free_request+0x6b/0x90 [37316.015021] nvme_scan_work+0x151/0x240 [nvme_core] [37316.073371] process_one_work+0x1a7/0x360 [37316.121318] ? create_worker+0x1a0/0x1a0 [37316.168227] worker_thread+0x30/0x390 [37316.212024] ? create_worker+0x1a0/0x1a0 [37316.258939] kthread+0x10a/0x120 [37316.297557] ? set_kthread_struct+0x50/0x50 [37316.347590] ret_from_fork+0x35/0x40 [37316.390360] Modules linked in: nvme_rdma nvme_tcp(X) nvme_fabrics nvme_core netconsole iscsi_tcp libiscsi_tcp dm_queue_length dm_service_time nf_conntrack_netlink br_netfilter bridge stp llc overlay nft_chain_nat ipt_MASQUERADE nf_nat xt_addrtype xt_CT nft_counter xt_state xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_comment xt_multiport nft_compat nf_tables libcrc32c nfnetlink dm_multipath tg3 rpcrdma sunrpc rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm intel_rapl_msr iTCO_wdt iTCO_vendor_support dcdbas intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel ipmi_ssif kvm irqbypass crct10dif_pclmul crc32_pclmul mlx5_ib ghash_clmulni_intel ib_uverbs rapl intel_cstate intel_uncore ib_core ipmi_si joydev mei_me pcspkr ipmi_devintf mei lpc_ich wmi ipmi_msghandler acpi_power_meter ext4 mbcache jbd2 sd_mod t10_pi sg mgag200 mlx5_core drm_kms_helper syscopyarea [37316.390419] sysfillrect ahci sysimgblt fb_sys_fops libahci drm crc32c_intel libata mlxfw pci_hyperv_intf tls i2c_algo_bit psample dm_mirror dm_region_hash dm_log dm_mod fuse [last unloaded: nvme_core] [37317.645908] CR2: 0000000000000050 Fixes: e7d65803e2bb ("nvme-multipath: revalidate paths during rescan") Signed-off-by: Caleb Sander <csander@purestorage.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-12-08 11:28:44 +01:00
Aleksandr Miloserdov	44d50fccf8	nvmet: fix memory leak in nvmet_subsys_attr_model_store_locked [ Upstream commit becc4cac309dc867571f0080fde4426a6c2222e0 ] Since model_number is allocated before it needs to be freed before kmemdump_nul. Reviewed-by: Konstantin Shelekhin <k.shelekhin@yadro.com> Reviewed-by: Dmitriy Bogdanov <d.bogdanov@yadro.com> Signed-off-by: Aleksandr Miloserdov <a.miloserdov@yadro.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-12-02 17:41:02 +01:00
Tiago Dias Ferreira	f0ee88e83c	nvme-pci: add NVME_QUIRK_BOGUS_NID for Netac NV7000 [ Upstream commit 8d6e38f636ac063e8062a21e7616f7d9bf0df5d8 ] Added a quirk to fix the Netac NV7000 SSD reporting duplicate NGUIDs. Cc: <stable@vger.kernel.org> Signed-off-by: Tiago Dias Ferreira <tiagodfer@gmail.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-12-02 17:41:00 +01:00
Xander Li	a61716cd24	nvme-pci: disable write zeroes on various Kingston SSD [ Upstream commit ac9b57d4e1e3ecf0122e915bbba1bd4c90ec3031 ] Kingston SSDs do support NVMe Write_Zeroes cmd but take long time to process. The firmware version is locked by these SSDs, we can not expect firmware improvement, so disable Write_Zeroes cmd. Signed-off-by: Xander Li <xander_li@kingston.com.tw> Signed-off-by: Christoph Hellwig <hch@lst.de> Stable-dep-of: 8d6e38f636ac ("nvme-pci: add NVME_QUIRK_BOGUS_NID for Netac NV7000") Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-12-02 17:41:00 +01:00
Christoph Hellwig	19b60f3363	nvme-pci: disable namespace identifiers for the MAXIO MAP1001 [ Upstream commit 70ce3455345d056b5fc427c3bb4a3ff4d126b6d5 ] The MAXIO MAP1001 controllers reports completely bogus Namespace identifiers that even change after suspend cycles. Disable using the Identifiers entirely. Reported-by: Arman Hajishafieha <arman.hajishafieha@hotmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Tested-by: Arman Hajishafieha <arman.hajishafieha@hotmail.com> Stable-dep-of: 8d6e38f636ac ("nvme-pci: add NVME_QUIRK_BOGUS_NID for Netac NV7000") Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-12-02 17:41:00 +01:00
Bean Huo	d537e19306	nvme-pci: add NVME_QUIRK_BOGUS_NID for Micron Nitro [ Upstream commit d5ceb4d1c50786d21de3d4b06c3f43109ec56dd8 ] Added a quirk to fix Micron Nitro NVMe reporting duplicate NGUIDs. Cc: <stable@vger.kernel.org> Signed-off-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-12-02 17:41:00 +01:00
Leo Savernik	af03ce894c	nvme: add a bogus subsystem NQN quirk for Micron MTFDKBA2T0TFH [ Upstream commit 41f38043f884c66af4114a7109cf540d6222f450 ] The Micron MTFDKBA2T0TFH device reports the same subsysem NQN for all devices. Add a quick to ignore it. Signed-off-by: Leo Savernik <l.savernik@aon.at> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Stable-dep-of: d5ceb4d1c507 ("nvme-pci: add NVME_QUIRK_BOGUS_NID for Micron Nitro") Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-12-02 17:41:00 +01:00
Keith Busch	b1a27b2aad	nvme: ensure subsystem reset is single threaded commit 1e866afd4bcdd01a70a5eddb4371158d3035ce03 upstream. The subsystem reset writes to a register, so we have to ensure the device state is capable of handling that otherwise the driver may access unmapped registers. Use the state machine to ensure the subsystem reset doesn't try to write registers on a device already undergoing this type of reset. Link: https://bugzilla.kernel.org/show_bug.cgi?id=214771 Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-11-26 09:24:49 +01:00
Keith Busch	bccece3c33	nvme: restrict management ioctls to admin commit 23e085b2dead13b51fe86d27069895b740f749c0 upstream. The passthrough commands already have this restriction, but the other operations do not. Require the same capabilities for all users as all of these operations, which include resets and rescans, can be disruptive. Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-11-26 09:24:48 +01:00
Sagi Grimberg	5efed7578d	nvmet: fix workqueue MEM_RECLAIM flushing dependency [ Upstream commit ddd2b8de9f85b388925e7dc46b3890fc1a0d8d24 ] The keep alive timer needs to stay on nvmet_wq, and not modified to reschedule on the system_wq. This fixes a warning: ------------[ cut here ]------------ workqueue: WQ_MEM_RECLAIM nvmet-wq:nvmet_rdma_release_queue_work [nvmet_rdma] is flushing !WQ_MEM_RECLAIM events:nvmet_keep_alive_timer [nvmet] WARNING: CPU: 3 PID: 1086 at kernel/workqueue.c:2628 check_flush_dependency+0x16c/0x1e0 Reported-by: Yi Zhang <yi.zhang@redhat.com> Fixes: 8832cf922151 ("nvmet: use a private workqueue instead of the system workqueue") Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-10-29 10:12:56 +02:00
Serge Semin	2f2b84b020	nvme-hwmon: kmalloc the NVME SMART log buffer [ Upstream commit c94b7f9bab22ac504f9153767676e659988575ad ] Recent commit 52fde2c07da6 ("nvme: set dma alignment to dword") has caused a regression on our platform. It turned out that the nvme_get_log() method invocation caused the nvme_hwmon_data structure instance corruption. In particular the nvme_hwmon_data.ctrl pointer was overwritten either with zeros or with garbage. After some research we discovered that the problem happened even before the actual NVME DMA execution, but during the buffer mapping. Since our platform is DMA-noncoherent, the mapping implied the cache-line invalidations or write-backs depending on the DMA-direction parameter. In case of the NVME SMART log getting the DMA was performed from-device-to-memory, thus the cache-invalidation was activated during the buffer mapping. Since the log-buffer isn't cache-line aligned, the cache-invalidation caused the neighbour data to be discarded. The neighbouring data turned to be the data surrounding the buffer in the framework of the nvme_hwmon_data structure. In order to fix that we need to make sure that the whole log-buffer is defined within the cache-line-aligned memory region so the cache-invalidation procedure wouldn't involve the adjacent data. One of the option to guarantee that is to kmalloc the DMA-buffer [1]. Seeing the rest of the NVME core driver prefer that method it has been chosen to fix this problem too. Note after a deeper researches we found out that the denoted commit wasn't a root cause of the problem. It just revealed the invalidity by activating the DMA-based NVME SMART log getting performed in the framework of the NVME hwmon driver. The problem was here since the initial commit of the driver. [1] Documentation/core-api/dma-api-howto.rst Fixes: 400b6a7b13a3 ("nvme: Add hardware monitoring support") Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-10-29 10:12:56 +02:00
Christoph Hellwig	66c56b2328	nvme-hwmon: consistently ignore errors from nvme_hwmon_init [ Upstream commit 6b8cf94005187952f794c0c4ed3920a1e8accfa3 ] An NVMe controller works perfectly fine even when the hwmon initialization fails. Stop returning errors that do not come from a controller reset from nvme_hwmon_init to handle this case consistently. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Stable-dep-of: c94b7f9bab22 ("nvme-hwmon: kmalloc the NVME SMART log buffer") Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-10-29 10:12:56 +02:00
Varun Prakash	ec8adf767e	nvmet-tcp: add bounds check on Transfer Tag [ Upstream commit b6a545ffa2c192b1e6da4a7924edac5ba9f4ea2b ] ttag is used as an index to get cmd in nvmet_tcp_handle_h2c_data_pdu(), add a bounds check to avoid out-of-bounds access. Signed-off-by: Varun Prakash <varun@chelsio.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-10-26 12:35:51 +02:00
Keith Busch	1c64328840	nvme: copy firmware_rev on each init [ Upstream commit a8eb6c1ba48bddea82e8d74cbe6e119f006be97d ] The firmware revision can change on after a reset so copy the most recent info each time instead of just the first time, otherwise the sysfs firmware_rev entry may contain stale data. Reported-by: Jeff Lien <jeff.lien@wdc.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Chao Leng <lengchao@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-10-26 12:35:51 +02:00
Rishabh Bhatnagar	4bdedc3b53	nvme-pci: set min_align_mask before calculating max_hw_sectors commit 61ce339f19fabbc3e51237148a7ef6f2270e44fa upstream. If swiotlb is force enabled dma_max_mapping_size ends up calling swiotlb_max_mapping_size which takes into account the min align mask for the device. Set the min align mask for nvme driver before calling dma_max_mapping_size while calculating max hw sectors. Signed-off-by: Rishabh Bhatnagar <risbhat@amazon.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-10-26 12:34:23 +02:00
Sagi Grimberg	32aa0b3f0c	nvme-multipath: fix possible hang in live ns resize with ANA access commit 72e3b8883a36e80ebfa41015c7b6926ce31ace05 upstream. When we revalidate paths as part of ns size change (as of commit e7d65803e2bb), it is possible that during the path revalidation, the only paths that is IO capable (i.e. optimized/non-optimized) are the ones that ns resize was not yet informed to the host, which will cause inflight requests to be requeued (as we have available paths but none are IO capable). These requests on the requeue list are waiting for someone to resubmit them at some point. The IO capable paths will eventually notify the ns resize change to the host, but there is nothing that will kick the requeue list to resubmit the queued requests. Fix this by always kicking the requeue list, and if no IO capable path exists, these requests will be queued again. A typical log that indicates that IOs are requeued: -- nvme nvme1: creating 4 I/O queues. nvme nvme1: new ctrl: "testnqn1" nvme nvme2: creating 4 I/O queues. nvme nvme2: mapped 4/0/0 default/read/poll queues. nvme nvme2: new ctrl: NQN "testnqn1", addr 127.0.0.1:8009 nvme nvme1: rescanning namespaces. nvme1n1: detected capacity change from 2097152 to 4194304 block nvme1n1: no usable path - requeuing I/O block nvme1n1: no usable path - requeuing I/O block nvme1n1: no usable path - requeuing I/O block nvme1n1: no usable path - requeuing I/O block nvme1n1: no usable path - requeuing I/O block nvme1n1: no usable path - requeuing I/O block nvme1n1: no usable path - requeuing I/O block nvme1n1: no usable path - requeuing I/O block nvme1n1: no usable path - requeuing I/O block nvme1n1: no usable path - requeuing I/O nvme nvme2: rescanning namespaces. -- Reported-by: Yogev Cohen <yogev@lightbitslabs.com> Fixes: e7d65803e2bb ("nvme-multipath: revalidate paths during rescan") Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Cc: <stable@vger.kernel.org> # v5.15+ Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-10-26 12:34:23 +02:00
Michael Kelley	520e434a08	nvme: Fix IOC_PR_CLEAR and IOC_PR_RELEASE ioctls for nvme devices [ Upstream commit c292a337d0e45a292c301e3cd51c35aa0ae91e95 ] The IOC_PR_CLEAR and IOC_PR_RELEASE ioctls are non-functional on NVMe devices because the nvme_pr_clear() and nvme_pr_release() functions set the IEKEY field incorrectly. The IEKEY field should be set only when the key is zero (i.e, not specified). The current code does it backwards. Furthermore, the NVMe spec describes the persistent reservation "clear" function as an option on the reservation release command. The current implementation of nvme_pr_clear() erroneously uses the reservation register command. Fix these errors. Note that NVMe version 1.3 and later specify that setting the IEKEY field will return an error of Invalid Field in Command. The fix will set IEKEY when the key is zero, which is appropriate as these ioctls consider a zero key to be "unspecified", and the intention of the spec change is to require a valid key. Tested on a version 1.4 PCI NVMe device in an Azure VM. Fixes: 1673f1f08c88 ("nvme: move block_device_operations and ns/ctrl freeing to common code") Fixes: 1d277a637a71 ("NVMe: Add persistent reservation ops") Signed-off-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-10-05 10:39:43 +02:00
Maurizio Lombardi	a1347be8f0	nvmet-tcp: fix unhandled tcp states in nvmet_tcp_state_change() [ Upstream commit 478814a5584197fa1fb18377653626e3416e7cd6 ] TCP_FIN_WAIT2 and TCP_LAST_ACK were not handled, the connection is closing so we can ignore them and avoid printing the "unhandled state" warning message. [ 1298.852386] nvmet_tcp: queue 2 unhandled state 5 [ 1298.879112] nvmet_tcp: queue 7 unhandled state 5 [ 1298.884253] nvmet_tcp: queue 8 unhandled state 5 [ 1298.889475] nvmet_tcp: queue 9 unhandled state 5 v2: Do not call nvmet_tcp_schedule_release_queue(), just ignore the fin_wait2 and last_ack states. Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-09-20 12:39:45 +02:00
Dennis Maisenbacher	a1d7c8647c	nvmet: fix mar and mor off-by-one errors [ Upstream commit b7e97872a65e1d57b4451769610554c131f37a0a ] Maximum Active Resources (MAR) and Maximum Open Resources (MOR) are 0's based vales where a value of 0xffffffff indicates that there is no limit. Decrement the values that are returned by bdev_max_open_zones and bdev_max_active_zones as the block layer helpers are not 0's based. A 0 returned by the block layer helpers indicates no limit, thus convert it to 0xffffffff (U32_MAX). Fixes: aaf2e048af27 ("nvmet: add ZBD over ZNS backend support") Suggested-by: Niklas Cassel <niklas.cassel@wdc.com> Signed-off-by: Dennis Maisenbacher <dennis.maisenbacher@wdc.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-09-15 11:30:06 +02:00
Sagi Grimberg	8589bbfad2	nvme-tcp: fix regression that causes sporadic requests to time out [ Upstream commit 3770a42bb8ceb856877699257a43c0585a5d2996 ] When we queue requests, we strive to batch as much as possible and also signal the network stack that more data is about to be sent over a socket with MSG_SENDPAGE_NOTLAST. This flag looks at the pending requests queued as well as queue->more_requests that is derived from the block layer last-in-batch indication. We set more_request=true when we flush the request directly from .queue_rq submission context (in nvme_tcp_send_all), however this is wrongly assuming that no other requests may be queued during the execution of nvme_tcp_send_all. Due to this, a race condition may happen where: 1. request X is queued as !last-in-batch 2. request X submission context calls nvme_tcp_send_all directly 3. nvme_tcp_send_all is preempted and schedules to a different cpu 4. request Y is queued as last-in-batch 5. nvme_tcp_send_all context sends request X+Y, however signals for both MSG_SENDPAGE_NOTLAST because queue->more_requests=true. ==> none of the requests is pushed down to the wire as the network stack is waiting for more data, both requests timeout. To fix this, we eliminate queue->more_requests and only rely on the queue req_list and send_list to be not-empty. Fixes: 122e5b9f3d37 ("nvme-tcp: optimize network stack with setting msg flags according to batch size") Reported-by: Jonathan Nicklin <jnicklin@blockbridge.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Tested-by: Jonathan Nicklin <jnicklin@blockbridge.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-09-15 11:30:06 +02:00
Sagi Grimberg	13c80a6c11	nvme-tcp: fix UAF when detecting digest errors [ Upstream commit 160f3549a907a50e51a8518678ba2dcf2541abea ] We should also bail from the io_work loop when we set rd_enabled to true, so we don't attempt to read data from the socket when the TCP stream is already out-of-sync or corrupted. Fixes: 3f2304f8c6d6 ("nvme-tcp: add NVMe over TCP host driver") Reported-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-09-15 11:30:06 +02:00
Bart Van Assche	ebf46da50b	nvmet: fix a use-after-free commit 6a02a61e81c231cc5c680c5dbf8665275147ac52 upstream. Fix the following use-after-free complaint triggered by blktests nvme/004: BUG: KASAN: user-memory-access in blk_mq_complete_request_remote+0xac/0x350 Read of size 4 at addr 0000607bd1835943 by task kworker/13:1/460 Workqueue: nvmet-wq nvme_loop_execute_work [nvme_loop] Call Trace: show_stack+0x52/0x58 dump_stack_lvl+0x49/0x5e print_report.cold+0x36/0x1e2 kasan_report+0xb9/0xf0 __asan_load4+0x6b/0x80 blk_mq_complete_request_remote+0xac/0x350 nvme_loop_queue_response+0x1df/0x275 [nvme_loop] __nvmet_req_complete+0x132/0x4f0 [nvmet] nvmet_req_complete+0x15/0x40 [nvmet] nvmet_execute_io_connect+0x18a/0x1f0 [nvmet] nvme_loop_execute_work+0x20/0x30 [nvme_loop] process_one_work+0x56e/0xa70 worker_thread+0x2d1/0x640 kthread+0x183/0x1c0 ret_from_fork+0x1f/0x30 Cc: stable@vger.kernel.org Fixes: a07b4970f464 ("nvmet: add a generic NVMe target") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-09-15 11:30:02 +02:00
Christoph Hellwig	dd2ee2fd1f	block: add a bdev_max_zone_append_sectors helper commit 2aba0d19f4d8c8929b4b3b94a9cfde2aa20e6ee2 upstream Add a helper to check the max supported sectors for zone append based on the block_device instead of having to poke into the block layer internal request_queue. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220415045258.199825-16-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-31 17:16:34 +02:00
Sagi Grimberg	a600ed25e3	nvmet-tcp: fix lockdep complaint on nvmet_tcp_wq flush during queue teardown [ Upstream commit 533d2e8b4d5e4c89772a0adce913525fb86cbbee ] We probably need nvmet_tcp_wq to have MEM_RECLAIM as we are sending/receiving for the socket from works on this workqueue. Also this eliminates lockdep complaints: -- [ 6174.010200] workqueue: WQ_MEM_RECLAIM nvmet-wq:nvmet_tcp_release_queue_work [nvmet_tcp] is flushing !WQ_MEM_RECLAIM nvmet_tcp_wq:nvmet_tcp_io_work [nvmet_tcp] [ 6174.010216] WARNING: CPU: 20 PID: 14456 at kernel/workqueue.c:2628 check_flush_dependency+0x110/0x14c Reported-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-08-25 11:40:39 +02:00
Christoph Hellwig	54e5b14c9b	nvme: catch -ENODEV from nvme_revalidate_zones again [ Upstream commit e06b425bc835ead08b9fd935bf5e47eef473e7a0 ] nvme_revalidate_zones can also return -ENODEV if e.g. zone sizes aren't constant or not a power of two. In that case we should jump to marking the gendisk hidden and only support pass through. Fixes: 602e57c9799c ("nvme: also mark passthrough-only namespaces ready in nvme_update_ns_info") Reported-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Joel Granados <j.granados@samsung.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-08-17 14:24:00 +02:00
Christoph Hellwig	a3f6aeba67	nvme: don't return an error from nvme_configure_metadata [ Upstream commit 363f6368603743072e5f318c668c632bccb097a3 ] When a fabrics controller claims to support an invalidate metadata configuration we already warn and disable metadata support. No need to also return an error during revalidation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Daniel Wagner <dwagner@suse.de> Tested-by: Kanchan Joshi <joshi.k@samsung.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-08-17 14:24:00 +02:00
Keith Busch	92a6233585	nvme: disable namespace access for unsupported metadata [ Upstream commit d39ad2a45c0e38def3e0c95f5b90d9af4274c939 ] The only fabrics target that supports metadata handling through the separate integrity buffer is RDMA. It is currently usable only if the size is 8B per block and formatted for protection information. If an rdma target were to export a namespace with a different format (ex: 4k+64B), the driver will not be able to submit valid read/write commands for that namespace. Suppress setting the metadata feature in the namespace so that the gendisk capacity will be set to 0. This will prevent read/write access through the block stack, but will continue to allow ioctl passthrough commands. Cc: Max Gurtovoy <mgurtovoy@nvidia.com> Cc: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-08-17 14:24:00 +02:00
Nick Bowler	52cd55a4fb	nvme: define compat_ioctl again to unbreak 32-bit userspace. [ Upstream commit a25d4261582cf00dad884c194d21084836663d3d ] Commit 89b3d6e60550 ("nvme: simplify the compat ioctl handling") removed the initialization of compat_ioctl from the nvme block_device_operations structures. Presumably the expectation was that 32-bit ioctls would be directed through the regular handler but this is not the case: failing to assign .compat_ioctl actually means that the compat case is disabled entirely, and any attempt to submit nvme ioctls from 32-bit userspace fails outright with -ENOTTY. For example: % smartctl -x /dev/nvme0n1 [...] Read NVMe Identify Controller failed: NVME_IOCTL_ADMIN_CMD: Inappropriate ioctl for device The blkdev_compat_ptr_ioctl helper can be used to direct compat calls through the main ioctl handler and makes things work again. Fixes: 89b3d6e60550 ("nvme: simplify the compat ioctl handling") Signed-off-by: Nick Bowler <nbowler@draconx.ca> Reviewed-by: Guixin Liu <kanie@linux.alibaba.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-08-17 14:24:00 +02:00
Bean Huo	34552bf35f	nvme: use command_id instead of req->tag in trace_nvme_complete_rq() [ Upstream commit 679c54f2de672b7d79d02f8c4ad483ff6dd8ce2e ] Use command_id instead of req->tag in trace_nvme_complete_rq(), because of commit e7006de6c238 ("nvme: code command_id with a genctr for use authentication after release"), cmd->common.command_id is set to ((genctl & 0xf)< 12 \| req->tag), no longer req->tag, which makes cid in trace_nvme_complete_rq and trace_nvme_setup_cmd are not the same. Fixes: e7006de6c238 ("nvme: code command_id with a genctr for use authentication after release") Signed-off-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-08-17 14:24:00 +02:00
Israel Rukshin	98d81b2b15	nvme: fix block device naming collision [ Upstream commit 6961b5e02876b3b47f030a1f1ee8fd3e631ac270 ] The issue exists when multipath is enabled and the namespace is shared, but all the other controller checks at nvme_is_unique_nsid() are false. The reason for this issue is that nvme_is_unique_nsid() returns false when is called from nvme_mpath_alloc_disk() due to an uninitialized value of head->shared. The patch fixes it by setting head->shared before nvme_mpath_alloc_disk() is called. Fixes: 5974ea7ce0f9 ("nvme: allow duplicate NSIDs for private namespaces") Signed-off-by: Israel Rukshin <israelr@nvidia.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-07-29 17:25:12 +02:00
Christoph Hellwig	321abf90c5	nvme: check for duplicate identifiers earlier [ Upstream commit e2d77d2e11c4f1e70a1a24cc8fe63ff3dc9b53ef ] Lift the check for duplicate identifiers into nvme_init_ns_head, which avoids pointless error unwinding in case they don't match, and also matches where we check identifier validity for the multipath case. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-07-29 17:25:12 +02:00
Keith Busch	c01793517d	nvme-pci: phison e16 has bogus namespace ids [ Upstream commit 73029c9b23cf1213e5f54c2b59efce08665199e7 ] Add the quirk. Link: https://bugzilla.kernel.org/show_bug.cgi?id=216049 Reported-by: Chris Egolf <cegolf@ugholf.net> Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-07-21 21:24:41 +02:00
Ruozhu Li	7a2294c5f2	nvme: fix regression when disconnect a recovering ctrl [ Upstream commit f7f70f4aa09dc43d7455c060143e86a017c30548 ] We encountered a problem that the disconnect command hangs. After analyzing the log and stack, we found that the triggering process is as follows: CPU0 CPU1 nvme_rdma_error_recovery_work nvme_rdma_teardown_io_queues nvme_do_delete_ctrl nvme_stop_queues nvme_remove_namespaces --clear ctrl->namespaces nvme_start_queues --no ns in ctrl->namespaces nvme_ns_remove return(because ctrl is deleting) blk_freeze_queue blk_mq_freeze_queue_wait --wait for ns to unquiesce to clean infligt IO, hang forever This problem was not found in older kernels because we will flush err work in nvme_stop_ctrl before nvme_remove_namespaces.It does not seem to be modified for functional reasons, the patch can be revert to solve the problem. Revert commit 794a4cb3d2f7 ("nvme: remove the .stop_ctrl callout") Signed-off-by: Ruozhu Li <liruozhu@huawei.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-07-21 21:24:36 +02:00
Sagi Grimberg	1e4427aa2f	nvme-tcp: always fail a request when sending it failed [ Upstream commit 41d07df7de841bfbc32725ce21d933ad358f2844 ] queue stoppage and inflight requests cancellation is fully fenced from io_work and thus failing a request from this context. Hence we don't need to try to guess from the socket retcode if this failure is because the queue is about to be torn down or not. We are perfectly safe to just fail it, the request will not be cancelled later on. This solves possible very long shutdown delays when the users issues a 'nvme disconnect-all' Reported-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-07-21 21:24:35 +02:00
Lamarque Vieira Souza	526b53192d	nvme-pci: add NVME_QUIRK_BOGUS_NID for ADATA IM2P33F8ABR1 commit e1c70d79346356bb1ede3f79436df80917845ab9 upstream. ADATA IM2P33F8ABR1 reports bogus eui64 values that appear to be the same across all drives. Quirk them out so they are not marked as "non globally unique" duplicates. Co-developed-by: Felipe de Jesus Araujo da Conceição <felipe.conceicao@petrosoftdesign.com> Signed-off-by: Felipe de Jesus Araujo da Conceição <felipe.conceicao@petrosoftdesign.com> Signed-off-by: Lamarque V. Souza <lamarque.souza@petrosoftdesign.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: stable@vger.kernel.org Signed-off-by: Christoph Hellwig <hch@lst.de>	2022-07-07 17:53:24 +02:00
Pablo Greco	58caf60ce2	nvme-pci: add NVME_QUIRK_BOGUS_NID for ADATA XPG SX6000LNP (AKA SPECTRIX S40G) commit 1629de0e0373e04d68e88e6d9d3071fbf70b7ea8 upstream. ADATA XPG SPECTRIX S40G drives report bogus eui64 values that appear to be the same across drives in one system. Quirk them out so they are not marked as "non globally unique" duplicates. Before: [ 2.258919] nvme nvme1: pci function 0000:06:00.0 [ 2.264898] nvme nvme2: pci function 0000:05:00.0 [ 2.323235] nvme nvme1: failed to set APST feature (2) [ 2.326153] nvme nvme2: failed to set APST feature (2) [ 2.333935] nvme nvme1: allocated 64 MiB host memory buffer. [ 2.336492] nvme nvme2: allocated 64 MiB host memory buffer. [ 2.339611] nvme nvme1: 7/0/0 default/read/poll queues [ 2.341805] nvme nvme2: 7/0/0 default/read/poll queues [ 2.346114] nvme1n1: p1 [ 2.347197] nvme nvme2: globally duplicate IDs for nsid 1 After: [ 2.427715] nvme nvme1: pci function 0000:06:00.0 [ 2.427771] nvme nvme2: pci function 0000:05:00.0 [ 2.488154] nvme nvme2: failed to set APST feature (2) [ 2.489895] nvme nvme1: failed to set APST feature (2) [ 2.498773] nvme nvme2: allocated 64 MiB host memory buffer. [ 2.500587] nvme nvme1: allocated 64 MiB host memory buffer. [ 2.504113] nvme nvme2: 7/0/0 default/read/poll queues [ 2.507026] nvme nvme1: 7/0/0 default/read/poll queues [ 2.509467] nvme nvme2: Ignoring bogus Namespace Identifiers [ 2.512804] nvme nvme1: Ignoring bogus Namespace Identifiers [ 2.513698] nvme1n1: p1 Signed-off-by: Pablo Greco <pgreco@centosproject.org> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-07-07 17:53:24 +02:00

1 2 3 4 5 ...

2582 Commits