IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
[ Upstream commit d3205ab75e99a47539ec91ef85ba488f4ddfeaa9 ]
The device can report discard support without setting the ONCS DSM bit.
When not set, the driver clears max_discard_size expecting it to be set
later. We don't know the size until we have the namespace format,
though, so setting it is deferred until configuring one, but the driver
was abandoning the discard settings due to that initial clearing.
Move the max_discard_size calculation above the check for a '0' discard
size.
Fixes: 1a86924e4f46475 ("nvme: fix interpretation of DMRSL")
Reported-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Tested-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 9d2789ac9d60c049d26ef6d3005d9c94c5a559e9 upstream.
io_uring_cmd_done() currently assumes that the uring_lock is held
when invoked, and while it generally is, this is not guaranteed.
Pass in the issue_flags associated with it, so that we have
IO_URING_F_UNLOCKED available to be able to lock the CQ ring
appropriately when completing events.
Cc: stable@vger.kernel.org
Fixes: ee692a21e9bf ("fs,io_uring: add infrastructure for uring-cmd")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6173a77b7e9d3e202bdb9897b23f2a8afe7bf286 ]
An nvme target ->queue_response() operation implementation may free the
request passed as argument. Such implementation potentially could result
in a use after free of the request pointer when percpu_ref_put() is
called in nvmet_req_complete().
Avoid such problem by using a local variable to save the sq pointer
before calling __nvmet_req_complete(), thus avoiding dereferencing the
req pointer after that function call.
Fixes: a07b4970f464 ("nvmet: add a generic NVMe target")
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 37f0dc2ec78af0c3f35dd05578763de059f6fe77 ]
When investigating one customer report on warning in nvme_setup_discard,
we observed the controller(nvme/tcp) actually exposes
queue_max_discard_segments(req->q) == 1.
Obviously the current code can't handle this situation, since contiguity
merge like normal RW request is taken.
Fix the issue by building range from request sector/nr_sectors directly.
Fixes: b35ba01ea697 ("nvme: support ranged discard requests")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 26a57cb35548ae67c14871cccbf50da3edb01ea4 ]
The kernel always logs the unique subsystem name for a discovery
controller, even in the case user space asked for the well known.
This has lead to confusion as the logs of nvme-cli and the kernel
logs didn't match.
First, nvme-cli connects to the well known discovery controller to
figure out if it supports TP8013. If so then nvme-cli disconnects and
connects to the unique discovery controller. Currently, the kernel show
that user space connected twice to the unique one.
To avoid further confusion, show the well known discovery controller if
user space asked for it:
$ nvme connect-all -v -t tcp -a 192.168.0.1
nvme0: nqn.2014-08.org.nvmexpress.discovery connected
nvme0: nqn.2014-08.org.nvmexpress.discovery disconnected
nvme0: nqn.discovery connected
kernel log:
nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 192.168.0.1:8009
nvme nvme0: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
nvme nvme0: new ctrl: NQN "nqn.discovery", addr 192.168.0.1:8009
Fixes: e5ea42faa773 ("nvme: display correct subsystem NQN")
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 76d54bf20cdcc1ed7569a89885e09636e9a8d71d ]
While the error recovery work is temporarily failing reconnect attempts,
running the 'nvme list' command causes a kernel NULL pointer dereference
by calling getsockname() with a released socket.
During error recovery work, the nvme tcp socket is released and a new one
created, so it is not safe to access the socket without proper check.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Fixes: 02c57a82c008 ("nvme-tcp: print actual source IP address through sysfs "address" attr")
Reviewed-by: Martin Belanger <martin.belanger@dell.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0dd6fff2aad4e35633fef1ea72838bec5b47559a ]
Bring back the check of the Identify Namespace return value for the
legacy NVMe 1.0-style sequential scanning. While NVMe 1.0 does not
support namespace management, there are "modern" cloud solutions like
Google Cloud Platform that claim the obsolete 1.0 compliance for no
good reason while supporting proprietary sideband namespace management.
Fixes: 1a893c2bfef4 ("nvme: refactor namespace probing")
Reported-by: Nils Hanke <nh@edgeless.systems>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Tested-by: Nils Hanke <nh@edgeless.systems>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit e917a849c3fc317c4a5f82bb18726000173d39e6 upstream.
The sysfs group containing the cmb attributes is registered before the
driver knows if they need to be visible or not. Update the group when
cmb attributes are known to exist so the visibility setting is correct.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217037
Fixes: 86adbf0cdb9ec65 ("nvme: simplify transport specific device attribute handling")
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 91c11d5f32547a08d462934246488fe72f3d44c3 ]
when starting error recovery there might be a authentication work
running, and it involves I/O commands. Given the controller is tearing
down there is no chance for the I/O to complete other than timing out
which may unnecessarily take a full io timeout.
So first tear down the queues, fail/cancel all inflight I/O (including
potentially authentication) and only then stop authentication. This
ensures that failover is not stalled due to blocked authentication I/O.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1f1a4f89562d3b33b6ca4fc8a4f3bd4cd35ab4ea ]
when starting error recovery there might be a authentication work
running, and it involves I/O commands. Given the controller is tearing
down there is no chance for the I/O to complete other than timing out
which may unnecessarily take a full io timeout.
So first tear down the queues, fail/cancel all inflight I/O (including
potentially authentication) and only then stop authentication. This
ensures that failover is not stalled due to blocked authentication I/O.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6fbf13c0e24fd86ab2e4477cd8484a485b687421 ]
In nvme_alloc_io_tag_set(), the connect_q pointer should be set to NULL
in case of error to avoid potential invalid pointer dereferences.
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fd62678ab55cb01e11a404d302cdade222bf4022 ]
If nvme_alloc_admin_tag_set() fails, the admin_q and fabrics_q pointers
are left with an invalid, non-NULL value. Other functions may then check
the pointers and dereference them, e.g. in
nvme_probe() -> out_disable: -> nvme_dev_remove_admin().
Fix the bug by setting admin_q and fabrics_q to NULL in case of error.
Also use the set variable to free the tag_set as ctrl->admin_tagset isn't
initialized yet.
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0cab4404874f2de52617de8400c844891c6ea1ce ]
As part of nvmet_fc_ls_create_association there is a case where
nvmet_fc_alloc_target_queue fails right after a new association with an
admin queue is created. In this case, no one releases the get taken in
nvmet_fc_alloc_target_assoc. This fix is adding the missing put.
Signed-off-by: Amit Engel <Amit.Engel@dell.com>
Reviewed-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit de4eda9de2d957ef2d6a8365a01e26a435e958cb ]
READ/WRITE proved to be actively confusing - the meanings are
"data destination, as used with read(2)" and "data source, as
used with write(2)", but people keep interpreting those as
"we read data from it" and "we write data to it", i.e. exactly
the wrong way.
Call them ITER_DEST and ITER_SOURCE - at least that is harder
to misinterpret...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Stable-dep-of: 6dd88fd59da8 ("vhost-scsi: unbreak any layout for response")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c0a4a1eafbd48e02829045bba3e6163c03037276 ]
NVMe controller register access hangs indefinitely when the co-processor
is not running. A missed reset is preferable over a hanging thread since
it could be recoverable.
Signed-off-by: Janne Grunau <j@jannau.net>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 85eee6341abb81ac6a35062ffd5c3029eb53be6b ]
The namespace head saves the Command Set Indicator enum, so use that
instead of the Command Set Selected. The two values are not the same.
Fixes: 831ed60c2aca2d ("nvme: also return I/O command effects from nvme_command_effects")
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 98e3528012cd571c48bbae7c7c0f868823254b6c ]
ctrl->ops is used by nvme_alloc_admin_tag_set() but set by
nvme_init_ctrl() so reorder the calls to avoid a NULL pointer
dereference.
Fixes: 6dfba1c09c10 ("nvme-fc: use the tagset alloc/free helpers")
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit db45e1a5ddccc034eb60d62fc5352022d7963ae2 ]
All nvme transports should be using the same flags for their tagsets,
with the exception for the blocking flag that should only be set for
transports that can block in ->queue_rq.
Add a NVME_F_BLOCKING flag to nvme_ctrl_ops to control the blocking
behavior and lift setting the flags into nvme_alloc_{admin,io}_tag_set.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Stable-dep-of: 98e3528012cd ("nvme-fc: fix initialization order")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 86adbf0cdb9ec6533234696c3e243184d4d0d040 ]
Allow the transport driver to override the attribute groups for the
control device, so that the PCIe driver doesn't manually have to add a
group after device creation and keep track of it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Tested-by Gerd Bayer <gbayer@linxu.ibm.com>
Stable-dep-of: 98e3528012cd ("nvme-fc: fix initialization order")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1c5842085851f786eba24a39ecd02650ad892064 ]
Polling the completion can progress the request state to IDLE, either
inline with the completion, or through softirq. Either way, the state
may not be COMPLETED, so don't check for that. We only care if the state
isn't IN_FLIGHT.
This is fixing an issue where the driver aborts an IO that we just
completed. Seeing the "aborting" message instead of "polled" is very
misleading as to where the timeout problem resides.
Fixes: bf392a5dc02a9b ("nvme-pci: Remove tag from process cq")
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 613b14884b8595e20b9fac4126bf627313827fbe upstream.
This can't happen right now, but in preparation for allowing
bio_split_to_limits() returning NULL if it ended the bio, check for it
in all the callers.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 831ed60c2aca2d7c517b2da22897a90224a97d27 ]
To be able to use the Commands Supported and Effects Log for allowing
unprivileged passtrough, it needs to be corretly reported for I/O
commands as well. Return the I/O command effects from
nvme_command_effects, and also add a default list of effects for the
NVM command set. For other command sets, the Commands Supported and
Effects log is required to be present already.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 61f37154c599cf9f2f84dcbd9be842f8645a7099 ]
Use NVME_CMD_EFFECTS_CSUPP instead of open coding it and assign a
single value to multiple array entries instead of repeated assignments.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2a459f6933e1c459bffb7cc73fd6c900edc714bd ]
Mask out the "Command Supported" and "Logical Block Content Change" bits
and only defer execution of commands that have non-trivial effects to
the workqueue for synchronous execution. This allows to execute admin
commands asynchronously on controllers that provide a Command Supported
and Effects log page, and will keep allowing to execute Write commands
asynchronously once command effects on I/O commands are taken into
account.
Fixes: c1fef73f793b ("nvmet: add passthru code to process commands")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 841734234a28fd5cd0889b84bd4d93a0988fa11e ]
The size allocated out of the dma pool is at most NVME_CTRL_PAGE_SIZE,
which may be smaller than the PAGE_SIZE.
Fixes: c61b82c7b7134 ("nvme-pci: fix PRP pool size")
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c89a529e823d51dd23c7ec0c047c7a454a428541 ]
Convert the max size to bytes to match the units of the divisor that
calculates the worst-case number of PRP entries.
The result is used to determine how many PRP Lists are required. The
code was previously rounding this to 1 list, but we can require 2 in the
worst case. In that scenario, the driver would corrupt memory beyond the
size provided by the mempool.
While unlikely to occur (you'd need a 4MB in exactly 127 phys segments
on a queue that doesn't support SGLs), this memory corruption has been
observed by kfence.
Cc: Jens Axboe <axboe@kernel.dk>
Fixes: 943e942e6266f ("nvme-pci: limit max IO size and segments to avoid high order allocations")
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b5f96cb719d8ba220b565ddd3ba4ac0d8bcfb130 ]
When using shadow doorbells, the event index and the doorbell values are
written to host memory. Prior to this patch, the values written would
erroneously be written in host endianness. This causes trouble on
big-endian platforms. Fix this by adding missing endian conversions.
This issue was noticed by Guenter while testing various big-endian
platforms under QEMU[1]. A similar fix required for hw/nvme in QEMU is
up for review as well[2].
[1]: https://lore.kernel.org/qemu-devel/20221209110022.GA3396194@roeck-us.net/
[2]: https://lore.kernel.org/qemu-devel/20221212114409.34972-4-its@irrelevant.dk/
Fixes: f9f38e33389c ("nvme: improve performance for virtual NVMe devices")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 01604350e14560d4d69323eb1ba12a257a643ea8 ]
Replace ctrl ctrl_key/host_key only after nvme_auth_generate_key is successful.
Also, this fixes a bug where the keys are leaked.
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dcef77274ae52136925287b6b59d5c6e6a4adfb9 ]
Don't look at ctrl->ops as only RDMA and TCP actually support multiple
maps.
Fixes: 6dfba1c09c10 ("nvme-fc: use the tagset alloc/free helpers")
Fixes: ceee1953f923 ("nvme-loop: use the tagset alloc/free helpers")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bcaf434b8f04e1ee82a8b1e1bce0de99fbff67fa ]
In nvme_init_non_mdts_limits function we were returning 0 when kzalloc
failed; it now returns -ENOMEM.
Fixes: 5befc7c26e5a ("nvme: implement non-mdts command limits")
Signed-off-by: Joel Granados <j.granados@samsung.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fa8f9ac42350edd3ce82d0d148a60f0fa088f995 ]
There is no need to have a separate slab cache for each namespace,
and having separate ones creates duplicate debugs file names as well.
Fixes: d5eff33ee6f8 ("nvmet: add simple file backed ns support")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmOSTGAQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpkW4D/49G+WuEFbBE4kM2Jk56tDgdNH611jsetvk
k5MmaK62FkAGBfoNl6pRpiqpV/MyJyS//SytyJpsv1Fj7InNkpEzbI7cxvbflm4t
D4/7Pg9VZgtNwrtq2M2t5NeM28scFFjQq3buzYGM6iKrwfcsLagkKiVU7cx0kTEl
7hzlG2t/FDwBLWCmDSRHVKMB3JJa5hIxpnZklHBmNBpmNh9rl4F2hCwpmi5x+0t+
qyti+1PRSknEQKspCMNcZvZwVmz0G3QZh2xYWNPkL0fxdQ7hpM65SV5DUs3SLjAr
FUt9UsvgTdeZ8uhfS1Ft6KgjM9x1hiZx0UYwASQxRdz7fhoG7ygRK9KY5r5v1cbr
lcUdwl5NJkPllDm5CZNCXMQYJlYMuA7J1VAMG+IZ/Iu5XiEFaEmOEzNrmmW0NZ57
5Z+2isfo24GGhRk78ryjuqXuwMhM3+DaYeS9+9/h84JcldUtrglOlG6CzX0sHhch
8xVCN3JVYc9/uWmIwb6QSIEZKNlsqkbiv5Gru1uu2pzX8MtuyC21rIIh8AUOSFl+
740prC6//wUxDcOHrA0aphubQADImi9RF5J5+40lE1NxSnAz1nMisZ1G7ywIwb+j
WjFbzW5p7ddO3DZFV+FENZ4QKFTDsR+3/tbbNdQpSmGEKk/KoT1jZyOVnoHsBSkd
Q7B23nEe8w==
=JkGh
-----END PGP SIGNATURE-----
Merge tag 'block-6.1-2022-12-08' of git://git.kernel.dk/linux
Pull block fix from Jens Axboe:
"A small fix for initializing the NVMe quirks before initializing the
subsystem"
* tag 'block-6.1-2022-12-08' of git://git.kernel.dk/linux:
nvme initialize core quirks before calling nvme_init_subsystem
A device might have a core quirk for NVME_QUIRK_IGNORE_DEV_SUBNQN
(such as Samsung X5) but it would still give a:
"missing or invalid SUBNQN field"
warning as core quirks are filled after calling nvme_init_subnqn. Fill
ctrl->quirks from struct core_quirks before calling nvme_init_subsystem
to fix this.
Tested on a Samsung X5.
Fixes: ab9e00cc72fa ("nvme: track subsystems")
Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmOKM1MQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgprErD/4vyIhYg4ZM9HOWNjpuT8oZCG6yRZ4gLhz0
GT7VRcb8GKEkKUMmeazaxocWbC3fc+yvj49Oan1Uj7/teHTmJDM0pF/fMpJdkJrF
z+PAy2++MGF++QNBq+wrDEIDsJ4QvRxDDJe9N+KDTtX6UsoBFYxJhem4JzZpM4BI
4GY8jYiKlx42WM58stZ0DXOucG1DsKaOQKYRQGjtKYvA0dTn7dj9btY+n6rGerEX
4265huzW5iY+MZWc5KLXGSr0wIJqAiKMoecN03JSBHONFVB4cjMQpZuQfSChqkUS
3fhVmFOZnYMzMIZgiwhFxuIP/QzLjctdibwU9JusqChYP9Mx7HQ2+gs7H7i5PSdS
9m64g2u+GuRjbgIeeGPVMPnBR3UG2GE8BDRfFBBCtbdmHXIKoolXdKvG9enRjXit
e4wjGQDHk6x9iV6LITH1Jn82kzk6TTuBkdSBJN6u8KASeOCoPwWuhgyRXo6+jh5D
1wd2mYxtM1UB2mZilPpflDSpzZCrp/CMjbLVPIV0aTxmmeEJN+Ao2PnduNjEBxoh
kYwlScoz9DPvMf59UU45MLc9/vYchL14VoPOl59osLlQrWf9vPMATlU1CaRgQSVa
apBNAMzWFTMGxXCtIsUoClNX7uuHrqrMEjBbhWuWp4DSOVQoJORrU5ymX9M92MYP
f0incJSEZQ==
=Gdkx
-----END PGP SIGNATURE-----
Merge tag 'block-6.1-2022-12-02' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
"Just a small NVMe merge for this week, fixing protection of the name
space list, and a missing clear of a reserved field when unused"
* tag 'block-6.1-2022-12-02' of git://git.kernel.dk/linux:
nvme: fix SRCU protection of nvme_ns_head list
nvme-pci: clear the prp2 field when not used
If the prp2 field is not filled in nvme_setup_prp_simple(), the prp2
field is garbage data. According to nvme spec, the prp2 is reserved if
the data transfer does not cross a memory page boundary, so clear it to
zero if it is not used.
Signed-off-by: Lei Rao <lei.rao@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmN38ZUQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpgXxD/9tUSFUKIVGIn4pmNILfY3XV45HOi1w44yR
zCxCELupcBeT+YixmaJcT8sunrrg2fLPOXMrDJk1cG/izXHzkjAQsHZvERfqC7hC
f5onH+2MyGm3qBwxV0iGqITJgTwQGInVJijT4f9UZd/8ultymyZR2nOdIdIydHCF
qzlOjq6hgIuGKHhFgOqRUg/OAkx510ZEEilUDcZ6XVV+zL7ccN6J9+eNTI3c58wT
7jvxZC4u6QGKteGvVniE3WXgk3QdFiQRORvV09g+PkbG/vPjAIZ5tJFb9PdIOebD
3guDiNUasgz2vnDetMK+yk4LcedcRfWnqgn+Vm8C26j5Fxs13eDx5kMDteVy7CYh
3bokOATHohoZZ9qTApgQUswTfGJfBdoy0nUTPuffxPdKDyUPteIxFCADcnyDHnDG
d/+PjU3FKF31o2HcUfvYp7OMO0VZP0hJSWps8znoVXKxb+LH9qKkYzHVlfni5kkS
k9XqqD1Ki98Erb346YqgvQjCkz+CUd5DxtGyh9Oh2+oS2qHP6WjdKo1QPFmWD5dp
EyXGSqGoZrIPtnKohLUN9EiVXanRQWJr3L0gw2CYXpmwfSKfMC3CQraEC1jOc01l
TfsLJGbl3L5XpLzxoBwDu44cqp+VvbalergdcmsDTLDFHhONY2g5LJh6C9/EDdnQ
Cde1uHikGw==
=sOGG
-----END PGP SIGNATURE-----
Merge tag 'block-6.1-2022-11-18' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
- NVMe pull request via Christoph:
- Two more bogus nid quirks (Bean Huo, Tiago Dias Ferreira)
- Memory leak fix in nvmet (Sagi Grimberg)
- Regression fix for block cgroups pinning the wrong blkcg, causing
leaks of cgroups and blkcgs (Chris)
- UAF fix for drbd setup error handling (Dan)
- Fix DMA alignment propagation in DM (Keith)
* tag 'block-6.1-2022-11-18' of git://git.kernel.dk/linux:
dm-log-writes: set dma_alignment limit in io_hints
dm-integrity: set dma_alignment limit in io_hints
block: make blk_set_default_limits() private
dm-crypt: provide dma_alignment limit in io_hints
block: make dma_alignment a stacking queue_limit
nvmet: fix a memory leak in nvmet_auth_set_key
nvme-pci: add NVME_QUIRK_BOGUS_NID for Netac NV7000
drbd: use after free in drbd_create_device()
nvme-pci: add NVME_QUIRK_BOGUS_NID for Micron Nitro
blk-cgroup: properly pin the parent in blkcg_css_online
Since model_number is allocated before it needs to be freed before
kmemdump_nul.
Reviewed-by: Konstantin Shelekhin <k.shelekhin@yadro.com>
Reviewed-by: Dmitriy Bogdanov <d.bogdanov@yadro.com>
Signed-off-by: Aleksandr Miloserdov <a.miloserdov@yadro.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
The driver is spamming the kernel logs for entirely harmless errors from
user space submitting unsupported commands. Just silence the errors.
The application has direct access to command status, so there's no need
to log these.
And since every passthrough command now uses the quiet flag, move the
setting to the common initializer.
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Alan Adamson <alan.adamson@oracle.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Tested-by: Alan Adamson <alan.adamson@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmNcRTYQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpkc2D/4yxjQ+lAmXLrqZOGAZc+r2GpiCFsgFcpKP
i3ezeeo3zmdaoUH778DcbPo0oeWY/iIvV2RDo3/0PBHIlGL43W9e7zsnauRxUwtw
/Aj140Tsm5/lKnBy8n0nT+DO4LE22JnBHi5XjlFELBwM+deBxS+izinFtcOQC8sj
58XWSmKag/Lv5JLvcYMj+PprtGOKzfNAacXvTjouy0IlEyb9E/yPMELS8lWFv8i2
QELtvuEDxODpQtA+Ph0O/o00A8Fg/lC4EH5uvExFMr8k74CGFm32Bar1UuaJ9QAs
5b8wateTra51yOGW3NEl1ph+4qVe9e4mutrLOFrChYylk5LePOVAki3wYb3lREiU
rTOEKzUj3P/LHLpl4els0yIQ0gHXs60/M/Vn3TC50+2DnV00qEfvaocZ8vtXOux4
YR+2cKUxk2CRNyj2BB3WRlrIkCIVk+ehl17E2cdrg0m8SMqk0GAYbpXD753L9uiy
I7IQEqYB+op501pmTcVskFUfW9ozT96YD53fwSOTR/pEK+esHN0GfqxI6lcA6Q0O
M2AWEiu8t1PbSONVH/p895gfgGHdRHl6zgvR+ADJMDEmc7dpEoAxsoTj4HIirXbe
sGHi7ycrQR6aLdHahjCukjUVkZkuhXJkAQmq2XURJgmEcz7iJme23WqtWWUUoQvi
pk6e1RSqSA==
=Zfnu
-----END PGP SIGNATURE-----
Merge tag 'block-6.1-2022-10-28' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
- NVMe pull request via Christoph:
- make the multipath dma alignment match the non-multipath one
(Keith Busch)
- fix a bogus use of sg_init_marker() (Nam Cao)
- fix circulr locking in nvme-tcp (Sagi Grimberg)
- Initialization fix for requests allocated via the special hw queue
allocator (John)
- Fix for a regression added in this release with the batched
completions of end_io backed requests (Ming)
- Error handling leak fix for rbd (Yang)
- Error handling leak fix for add_disk() failure (Yu)
* tag 'block-6.1-2022-10-28' of git://git.kernel.dk/linux:
blk-mq: Properly init requests from blk_mq_alloc_request_hctx()
blk-mq: don't add non-pt request with ->end_io to batch
rbd: fix possible memory leak in rbd_sysfs_init()
nvme-multipath: set queue dma alignment to 3
nvme-tcp: fix possible circular locking when deleting a controller under memory pressure
nvme-tcp: replace sg_init_marker() with sg_init_table()
block: fix memory leak for elevator on add_disk failure
NVMe spec requires all transports support dword aligned addresses, which
is already set in the namespace request_queue. Set the same limit in the
multipath device's request_queue as well.
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
When destroying a queue, when calling sock_release, the network stack
might need to allocate an skb to send a FIN/RST. When that happens
during memory pressure, there is a need to reclaim memory, which
in turn may ask the nvme-tcp device to write out dirty pages, however
this is not possible due to a ctrl teardown that is going on.
Set PF_MEMALLOC to the task that releases the socket to grant access
to PF_MEMALLOC reserves. In addition, do the same for the nvme-tcp
thread as this may also originate from the swap itself and should
be more resilient to memory pressure situations.
This fixes the following lockdep complaint:
--
======================================================
WARNING: possible circular locking dependency detected
6.0.0-rc2+ #25 Tainted: G W
------------------------------------------------------
kswapd0/92 is trying to acquire lock:
ffff888114003240 (sk_lock-AF_INET-NVME){+.+.}-{0:0}, at: tcp_sendpage+0x23/0xa0
but task is already holding lock:
ffffffff97e95ca0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0x987/0x10d0
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (fs_reclaim){+.+.}-{0:0}:
fs_reclaim_acquire+0x11e/0x160
kmem_cache_alloc_node+0x44/0x530
__alloc_skb+0x158/0x230
tcp_send_active_reset+0x7e/0x730
tcp_disconnect+0x1272/0x1ae0
__tcp_close+0x707/0xd90
tcp_close+0x26/0x80
inet_release+0xfa/0x220
sock_release+0x85/0x1a0
nvme_tcp_free_queue+0x1fd/0x470 [nvme_tcp]
nvme_do_delete_ctrl+0x130/0x13d [nvme_core]
nvme_sysfs_delete.cold+0x8/0xd [nvme_core]
kernfs_fop_write_iter+0x356/0x530
vfs_write+0x4e8/0xce0
ksys_write+0xfd/0x1d0
do_syscall_64+0x58/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
-> #0 (sk_lock-AF_INET-NVME){+.+.}-{0:0}:
__lock_acquire+0x2a0c/0x5690
lock_acquire+0x18e/0x4f0
lock_sock_nested+0x37/0xc0
tcp_sendpage+0x23/0xa0
inet_sendpage+0xad/0x120
kernel_sendpage+0x156/0x440
nvme_tcp_try_send+0x48a/0x2630 [nvme_tcp]
nvme_tcp_queue_rq+0xefb/0x17e0 [nvme_tcp]
__blk_mq_try_issue_directly+0x452/0x660
blk_mq_plug_issue_direct.constprop.0+0x207/0x700
blk_mq_flush_plug_list+0x6f5/0xc70
__blk_flush_plug+0x264/0x410
blk_finish_plug+0x4b/0xa0
shrink_lruvec+0x1263/0x1ea0
shrink_node+0x736/0x1a80
balance_pgdat+0x740/0x10d0
kswapd+0x5f2/0xaf0
kthread+0x256/0x2f0
ret_from_fork+0x1f/0x30
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(fs_reclaim);
lock(sk_lock-AF_INET-NVME);
lock(fs_reclaim);
lock(sk_lock-AF_INET-NVME);
*** DEADLOCK ***
3 locks held by kswapd0/92:
#0: ffffffff97e95ca0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0x987/0x10d0
#1: ffff88811f21b0b0 (q->srcu){....}-{0:0}, at: blk_mq_flush_plug_list+0x6b3/0xc70
#2: ffff888170b11470 (&queue->send_mutex){+.+.}-{3:3}, at: nvme_tcp_queue_rq+0xeb9/0x17e0 [nvme_tcp]
Fixes: 3f2304f8c6d6 ("nvme-tcp: add NVMe over TCP host driver")
Reported-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Tested-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>