IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Tested both with Corsairs firmware 11.3 and 13.0 for the Corsairs MP600
and both have the issue as reported by the kernel.
nvme nvme0: missing or invalid SUBNQN field.
Signed-off-by: Claus Stovgaard <claus.stovgaard@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
The "list" of nvme_ns_head is used as rcu list, now in nvme_init_ns_head
list_add_tail is used to add ns->siblings to the rcu list. It is not safe.
Should use list_add_tail_rcu instead of list_add_tail.
Signed-off-by: Chao Leng <lengchao@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
nvme_round_robin_path() should test if the return ns pointer is valid.
nvme_next_ns() will return a NULL pointer if there is no path left.
Fixes: 75c10e732724 ("nvme-multipath: round-robin I/O policy")
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Properly unwind step by step using refactored helpers from nvme_unmap_data
to avoid a potential double dma_unmap on a mapping failure.
Fixes: 7fe07d14f71f ("nvme-pci: merge nvme_free_iod into nvme_unmap_data")
Reported-by: Marc Orr <marcorr@google.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Marc Orr <marcorr@google.com>
Split out three helpers from nvme_unmap_data that will allow finer grained
unwinding from nvme_map_data.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Marc Orr <marcorr@google.com>
Since NVMe v1.4 the Controller Memory Buffer must be explicitly enabled
by the host.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
[hch: avoid a local variable and add a comment]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Each name space has a request queue, if complete request long time,
multi request queues may have time out requests at the same time,
nvme_tcp_timeout will execute concurrently. Multi requests in different
request queues may be queued in the same tcp queue, multi
nvme_tcp_timeout may call nvme_tcp_stop_queue at the same time.
The first nvme_tcp_stop_queue will clear NVME_TCP_Q_LIVE and continue
stopping the tcp queue(cancel io_work), but the others check
NVME_TCP_Q_LIVE is already cleared, and then directly complete the
requests, complete request before the io work is completely canceled may
lead to a use-after-free condition.
Add a multex lock to serialize nvme_tcp_stop_queue.
Signed-off-by: Chao Leng <lengchao@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
A crash happens when inject completing request long time(nearly 30s).
Each name space has a request queue, when inject completing request long
time, multi request queues may have time out requests at the same time,
nvme_rdma_timeout will execute concurrently. Multi requests in different
request queues may be queued in the same rdma queue, multi
nvme_rdma_timeout may call nvme_rdma_stop_queue at the same time.
The first nvme_rdma_timeout will clear NVME_RDMA_Q_LIVE and continue
stopping the rdma queue(drain qp), but the others check NVME_RDMA_Q_LIVE
is already cleared, and then directly complete the requests, complete
request before the qp is fully drained may lead to a use-after-free
condition.
Add a multex lock to serialize nvme_rdma_stop_queue.
Signed-off-by: Chao Leng <lengchao@huawei.com>
Tested-by: Israel Rukshin <israelr@nvidia.com>
Reviewed-by: Israel Rukshin <israelr@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
According to NVMe spec v1.4, section 8.3.1, the PRINFO bit and
the metadata size play a vital role in deteriming the host buffer size.
If PRIFNO bit is set and MS==8, the host doesn't add the metadata buffer,
instead the controller adds it.
Signed-off-by: Revanth Rajashekar <revanth.rajashekar@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Discovery controllers usually don't support smart log page command.
So when we connect to the discovery controller we see this warning:
nvme nvme0: Failed to read smart log (error 24577)
nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 192.168.123.1:8009
nvme nvme0: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
Introduce a new helper to understand if the controller is a discovery
controller and use this helper to skip nvme_init_hwmon (also use it in
other places that we check if the controller is a discovery controller).
Fixes: 400b6a7b13a3 ("nvme: Add hardware monitoring support")
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
When a bio merges, we can get a request that spans multiple
bios, and the overall request payload size is the sum of
all bios. When we calculate how much we need to send
from the existing bio (and bvec), we did not take into
account the iov_iter byte count cap.
Since multipage bvecs support, bvecs can split in the middle
which means that when we account for the last bvec send we
should also take the iov_iter byte count cap as it might be
lower than the last bvec size.
Reported-by: Hao Wang <pkuwangh@gmail.com>
Fixes: 3f2304f8c6d6 ("nvme-tcp: add NVMe over TCP host driver")
Tested-by: Hao Wang <pkuwangh@gmail.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
We shouldn't call smp_processor_id() in a preemptible
context, but this is advisory at best, so instead
call __smp_processor_id().
Fixes: db5ad6b7f8cd ("nvme-tcp: try to send request in queue_rq context")
Reported-by: Or Gerlitz <gerlitz.or@gmail.com>
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
The only used argument in this function is the "req".
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
There are no callers for nvme_reset_ctrl_sync() and
nvme_alloc_request_qid() so that we keep the symbols exported.
Unexport those functions, mark them static and update the header file
respectively.
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
While handling the completion queue, keep a local copy of the command id
from the DMA-accessible completion entry. This silences a time-of-check
to time-of-use (TOCTOU) warning from KF/x[1], with respect to a
Thunderclap[2] vulnerability analysis. The double-read impact appears
benign.
There may be a theoretical window for @command_id to be used as an
adversary-controlled array-index-value for mounting a speculative
execution attack, but that mitigation is saved for a potential follow-on.
A man-in-the-middle attack on the data payload is out of scope for this
analysis and is hopefully mitigated by filesystem integrity mechanisms.
[1] https://github.com/intel/kernel-fuzzer-for-xen-project
[2] http://thunderclap.io/thunderclap-paper-ndss2019.pdf
Signed-off-by: Lalithambika Krishna Kumar <lalithambika.krishnakumar@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
We may send a request (with or without its data) from two paths:
1. From our I/O context nvme_tcp_io_work which is triggered from:
- queue_rq
- r2t reception
- socket data_ready and write_space callbacks
2. Directly from queue_rq if the send_list is empty (because we want to
save the context switch associated with scheduling our io_work).
However, given that now we have the send_mutex, we may run into a race
condition where none of these contexts will send the pending payload to
the controller. Both io_work send path and queue_rq send path
opportunistically attempt to acquire the send_mutex however queue_rq only
attempts to send a single request, and if io_work context fails to
acquire the send_mutex it will complete without rescheduling itself.
The race can trigger with the following sequence:
1. queue_rq sends request (no incapsule data) and blocks
2. RX path receives r2t - prepares data PDU to send, adds h2cdata PDU
to the send_list and schedules io_work
3. io_work triggers and cannot acquire the send_mutex - because of (1),
ends without self rescheduling
4. queue_rq completes the send, and completes
==> no context will send the h2cdata - timeout.
Fix this by having queue_rq sending as much as it can from the send_list
such that if it still has any left, its because the socket buffer is
full and the socket write_space callback will trigger, thus guaranteeing
that a context will be scheduled to send the h2cdata PDU.
Fixes: db5ad6b7f8cd ("nvme-tcp: try to send request in queue_rq context")
Reported-by: Potnuri Bharat Teja <bharat@chelsio.com>
Reported-by: Samuel Jones <sjones@kalrayinc.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Tested-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
A system with more than one of these SSDs will only have one usable.
Hence the kernel fails to detect nvme devices due to duplicate cntlids.
[ 6.274554] nvme nvme1: Duplicate cntlid 33 with nvme0, rejecting
[ 6.274566] nvme nvme1: Removing after probe failure status: -22
Adding the NVME_QUIRK_IGNORE_DEV_SUBNQN quirk to resolves the issue.
Signed-off-by: Gopal Tiwari <gtiwari@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Recent patches changed calling sequences. nvme_fc_abort_outstanding_ios
used to be called from a timeout or work context. Now it is being called
in an io completion context, which can be an interrupt handler.
Unfortunately, the abort outstanding ios routine attempts to stop nvme
queues and nested routines that may try to sleep, which is in conflict
with the interrupt handler.
Correct replacing the direct call with a work element scheduling, and the
abort outstanding ios routine will be called in the work element.
Fixes: 95ced8a2c72d ("nvme-fc: eliminate terminate_io use by nvme_fc_error_recovery")
Signed-off-by: James Smart <james.smart@broadcom.com>
Reported-by: Daniel Wagner <dwagner@suse.de>
Tested-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
A smaller set of patches, nothing stands out as being particularly major
this cycle:
- Driver bug fixes and updates: bnxt_re, cxgb4, rxe, hns, i40iw, cxgb4,
mlx4 and mlx5
- Bug fixes and polishing for the new rts ULP
- Cleanup of uverbs checking for allowed driver operations
- Use sysfs_emit all over the place
- Lots of bug fixes and clarity improvements for hns
- hip09 support for hns
- NDR and 50/100Gb signaling rates
- Remove dma_virt_ops and go back to using the IB DMA wrappers
- mlx5 optimizations for contiguous DMA regions
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl/aNXUACgkQOG33FX4g
mxqlMQ/+O6UhxKnDAnMB+HzDGvOm+KXNHOQBuzxz4ZWXqtUrW8WU5ca3PhXovc4z
/QX0HhMhQmVsva5mjp1OGVATxQ2E+yasqFLg4QXAFWFR3N7s0u/sikE9i1DoPvOC
lsmLTeRauCFaE4mJD5nvYwm+riECX0GmyVVW7v6V05xwAp0hwdhyU7Kb6Yh3lxsE
umTz+onPNJcD6Tc4snziyC5QEp5ebEjAaj4dVI1YPR5X0c2RwC5E1CIDI6u4OQ2k
j7/+Kvo8LNdYNERGiR169x6c1L7WS6dYnGMMeXRgyy0BVbVdRGDnvCV9VRmF66w5
99fHfDjNMNmqbGNt/4/gwNdVrR9aI4jMZWCh7SmsguX6XwNOlhYldy3x3WnlkfkQ
e4O0huJceJqcB2Uya70GqufnAetRXsbjzcvWxpR5YAwRmcRkm1f6aGK3BxPjWEbr
BbYRpiKMxxT4yTe65BuuThzx6g4pNQHe0z3BM/dzMJQAX+PZcs1CPQR8F8PbCrZR
Ad7qw4HJ587PoSxPi3toVMpYZRP6cISh1zx9q/JCj8cxH9Ri4MovUCS3cF63Ny3B
1LJ2q0x8FuLLjgZJogKUyEkS8OO6q7NL8WumjvrYWWx19+jcYsV81jTRGSkH3bfY
F7Esv5K2T1F2gVsCe1ZFFplQg6ja1afIcc+LEl8cMJSyTdoSub4=
=9t8b
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma updates from Jason Gunthorpe:
"A smaller set of patches, nothing stands out as being particularly
major this cycle. The biggest item would be the new HIP09 HW support
from HNS, otherwise it was pretty quiet for new work here:
- Driver bug fixes and updates: bnxt_re, cxgb4, rxe, hns, i40iw,
cxgb4, mlx4 and mlx5
- Bug fixes and polishing for the new rts ULP
- Cleanup of uverbs checking for allowed driver operations
- Use sysfs_emit all over the place
- Lots of bug fixes and clarity improvements for hns
- hip09 support for hns
- NDR and 50/100Gb signaling rates
- Remove dma_virt_ops and go back to using the IB DMA wrappers
- mlx5 optimizations for contiguous DMA regions"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (147 commits)
RDMA/cma: Don't overwrite sgid_attr after device is released
RDMA/mlx5: Fix MR cache memory leak
RDMA/rxe: Use acquire/release for memory ordering
RDMA/hns: Simplify AEQE process for different types of queue
RDMA/hns: Fix inaccurate prints
RDMA/hns: Fix incorrect symbol types
RDMA/hns: Clear redundant variable initialization
RDMA/hns: Fix coding style issues
RDMA/hns: Remove unnecessary access right set during INIT2INIT
RDMA/hns: WARN_ON if get a reserved sl from users
RDMA/hns: Avoid filling sl in high 3 bits of vlan_id
RDMA/hns: Do shift on traffic class when using RoCEv2
RDMA/hns: Normalization the judgment of some features
RDMA/hns: Limit the length of data copied between kernel and userspace
RDMA/mlx4: Remove bogus dev_base_lock usage
RDMA/uverbs: Fix incorrect variable type
RDMA/core: Do not indicate device ready when device enablement fails
RDMA/core: Clean up cq pool mechanism
RDMA/core: Update kernel documentation for ib_create_named_qp()
MAINTAINERS: SOFT-ROCE: Change Zhu Yanjun's email address
...
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl/XgdYQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpjTBD/4me2TNvGOogbcL0b1leAotndJ7spI/IcFM
NUMNy3pOGuRBcRjwle85xq44puAjlNkZE2LLatem5sT7ZvS+8lPNnOIoTYgfaCjt
PhKx2sKlLumVm3BwymYAPcPtke4fikGG15Mwu5nX1oOehmyGrjObGAr3Lo6gexCT
tQoCOczVqaTsV+iTXrLlmgEgs07J9Tm93uh2cNR8Jgroxb8ivuWeUq4YgbV4kWk+
Y8XvOyVE/yba0vQf5/hHtWuVoC6RdELnqZ6NCkcP/EicdBecwk1GMJAej1S3zPS1
0BT7GSFTpm3YUHcygD6LRmRg4I/BmWDTDtMi84+jLat6VvSG1HwIm//qHiCJh3ku
SlvFZENIWAv5LP92x2vlR5Lt7uE3GK2V/5Pxt2fekyzCth6mzu+hLH4CBPQ3xgyd
E1JqIQ/ilbXstp+EYoivV5x8yltZQnKEZRopws0EOqj1LsmDPj9XT1wzE9RnB0o+
PWu/DNhQFhhcmP7Z8uLgPiKIVpyGs+vjxiJLlTtGDFTCy6M5JbcgzGkEkSmnybxH
7lSanjpLt1dWj85FBMc6fNtJkv2rBPfb4+j0d1kZ45Dzcr4umirGIh7wtCHcgc83
brmXSt29hlKHseSHMMuNWK8haXcgAE7gq9tD8GZ/kzM7+vkmLLxHJa22Qhq5rp4w
URPeaBaQJw==
=ayp2
-----END PGP SIGNATURE-----
Merge tag 'for-5.11/drivers-2020-12-14' of git://git.kernel.dk/linux-block
Pull block driver updates from Jens Axboe:
"Nothing major in here:
- NVMe pull request from Christoph:
- nvmet passthrough improvements (Chaitanya Kulkarni)
- fcloop error injection support (James Smart)
- read-only support for zoned namespaces without Zone Append
(Javier González)
- improve some error message (Minwoo Im)
- reject I/O to offline fabrics namespaces (Victor Gladkov)
- PCI queue allocation cleanups (Niklas Schnelle)
- remove an unused allocation in nvmet (Amit Engel)
- a Kconfig spelling fix (Colin Ian King)
- nvme_req_qid simplication (Baolin Wang)
- MD pull request from Song:
- Fix race condition in md_ioctl() (Dae R. Jeong)
- Initialize read_slot properly for raid10 (Kevin Vigor)
- Code cleanup (Pankaj Gupta)
- md-cluster resync/reshape fix (Zhao Heming)
- Move null_blk into its own directory (Damien Le Moal)
- null_blk zone and discard improvements (Damien Le Moal)
- bcache race fix (Dongsheng Yang)
- Set of rnbd fixes/improvements (Gioh Kim, Guoqing Jiang, Jack Wang,
Lutz Pogrell, Md Haris Iqbal)
- lightnvm NULL pointer deref fix (tangzhenhao)
- sr in_interrupt() removal (Sebastian Andrzej Siewior)
- FC endpoint security support for s390/dasd (Jan Höppner, Sebastian
Ott, Vineeth Vijayan). From the s390 arch guys, arch bits included
as it made it easier for them to funnel the feature through the
block driver tree.
- Follow up fixes (Colin Ian King)"
* tag 'for-5.11/drivers-2020-12-14' of git://git.kernel.dk/linux-block: (64 commits)
block: drop dead assignments in loop_init()
sr: Remove in_interrupt() usage in sr_init_command().
sr: Switch the sector size back to 2048 if sr_read_sector() changed it.
cdrom: Reset sector_size back it is not 2048.
drivers/lightnvm: fix a null-ptr-deref bug in pblk-core.c
null_blk: Move driver into its own directory
null_blk: Allow controlling max_hw_sectors limit
null_blk: discard zones on reset
null_blk: cleanup discard handling
null_blk: Improve implicit zone close
null_blk: improve zone locking
block: Align max_hw_sectors to logical blocksize
null_blk: Fail zone append to conventional zones
null_blk: Fix zone size initialization
bcache: fix race between setting bdev state to none and new write request direct to backing
block/rnbd: fix a null pointer dereference on dev->blk_symlink_name
block/rnbd-clt: Dynamically alloc buffer for pathname & blk_symlink_name
block/rnbd: call kobject_put in the failure path
Documentation/ABI/rnbd-srv: add document for force_close
block/rnbd-srv: close a mapped device from server side.
...
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl/Xec8QHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpoLbEACzXypgZWwMdfgRckA/Vt333rXHtbhUV+hK
2XP+P81iRvr9Esi31UPbRp82vrgcDO0cpI1QmQojS5U5TIQP88BfXptfRZZu48eb
wT5RDDNQ34HItqAh/yEuYsv9yUKcxeIrB99tBVvM+4UmQg9zTdIW3mg6PvCBdbhV
N38jI0tCF/PJatjfRuphT/nXonQLPWBlVDmZk06KZQFOwQe9ep1vUi1+nbiRPuo3
geFBpTh1Kp6Vl1B3n4RpECs6Y7I0RRuJdaH2sDizICla1/BW91F9fQwHimNnUxUq
e1Q1kMuh6ftcQGkYlHSYcPhuv6CvorldTZCO5arPxWpcwvxriTSMRPWAgUr5pEiF
fhiGhqeDu9e6vl9vS31wUD1B30hy+jFz9wyjRrDwJ3cPHH1JVBjTzvdX+cIh/1ku
IbIwUMteUtvUrzqAv/DzbGhedp7xWtOFaVo8j0QFYh9zkjd6b8yDOF/yztwX2gjY
Xt1cd+KpDSiN449ZRaoMI0sCJAxqzhMa6nsWlb0L7KuNyWKAbvKQBm9Rb47FLV9A
Vx70KC+zkFoyw23capvIahmQazerriUJ5PGe0lVm6ROgmIFdCpXTPDjnrvq/6RZ/
GEpD7gTW9atGJ7EuEE8686sAfKD5kneChWLX5EHXf0d0AG5Mr2lKsluiGp5LpPJg
Q1Xqs6xwww==
=zo4w
-----END PGP SIGNATURE-----
Merge tag 'for-5.11/block-2020-12-14' of git://git.kernel.dk/linux-block
Pull block updates from Jens Axboe:
"Another series of killing more code than what is being added, again
thanks to Christoph's relentless cleanups and tech debt tackling.
This contains:
- blk-iocost improvements (Baolin Wang)
- part0 iostat fix (Jeffle Xu)
- Disable iopoll for split bios (Jeffle Xu)
- block tracepoint cleanups (Christoph Hellwig)
- Merging of struct block_device and hd_struct (Christoph Hellwig)
- Rework/cleanup of how block device sizes are updated (Christoph
Hellwig)
- Simplification of gendisk lookup and removal of block device
aliasing (Christoph Hellwig)
- Block device ioctl cleanups (Christoph Hellwig)
- Removal of bdget()/blkdev_get() as exported API (Christoph Hellwig)
- Disk change rework, avoid ->revalidate_disk() (Christoph Hellwig)
- sbitmap improvements (Pavel Begunkov)
- Hybrid polling fix (Pavel Begunkov)
- bvec iteration improvements (Pavel Begunkov)
- Zone revalidation fixes (Damien Le Moal)
- blk-throttle limit fix (Yu Kuai)
- Various little fixes"
* tag 'for-5.11/block-2020-12-14' of git://git.kernel.dk/linux-block: (126 commits)
blk-mq: fix msec comment from micro to milli seconds
blk-mq: update arg in comment of blk_mq_map_queue
blk-mq: add helper allocating tagset->tags
Revert "block: Fix a lockdep complaint triggered by request queue flushing"
nvme-loop: use blk_mq_hctx_set_fq_lock_class to set loop's lock class
blk-mq: add new API of blk_mq_hctx_set_fq_lock_class
block: disable iopoll for split bio
block: Improve blk_revalidate_disk_zones() checks
sbitmap: simplify wrap check
sbitmap: replace CAS with atomic and
sbitmap: remove swap_lock
sbitmap: optimise sbitmap_deferred_clear()
blk-mq: skip hybrid polling if iopoll doesn't spin
blk-iocost: Factor out the base vrate change into a separate function
blk-iocost: Factor out the active iocgs' state check into a separate function
blk-iocost: Move the usage ratio calculation to the correct place
blk-iocost: Remove unnecessary advance declaration
blk-iocost: Fix some typos in comments
blktrace: fix up a kerneldoc comment
block: remove the request_queue to argument request based tracepoints
...
The request_queue can trivially be derived from the bio.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Allow ZNS NVMe SSDs to present a read-only namespace when append is not
supported, instead of rejecting the namespace directly.
This allows (i) the namespace to be used in read-only mode, which is not
a problem as the append command only affects the write path, and (ii) to
use standard management tools such as nvme-cli to choose a different
format or firmware slot that is compatible with the Linux zoned block
device.
Signed-off-by: Javier González <javier.gonz@samsung.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Remane block device operations in preparation to add char device file
operations.
Signed-off-by: Javier González <javier.gonz@samsung.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Rename controller base dev_t char device in preparation for adding a
namespace char device.
Signed-off-by: Javier González <javier.gonz@samsung.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cleanup unnecessary ret values that are not checked or used in
nvme_alloc_ns().
Signed-off-by: Javier González <javier.gonz@samsung.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
During the scan_work, an Identify command is issued to figure out which
namespaces are active. If this command fails, the nvme driver falls back
to scanning namespaces sequentially. In this situation, we don't see
any warnings and don't even know whether list-ns command has been failed
or not easiliy.
Printa warning when the Identify command executin fail:
[ 1.108399] nvme nvme0: Identify NS List failed (status=0x400b)
[ 1.109583] nvme0n1: detected capacity change from 0 to 1048576
[ 1.112186] nvme nvme0: Identify Descriptors failed (nsid=2, status=0x4002)
[ 1.113929] nvme nvme0: Identify Descriptors failed (nsid=3, status=0x4002)
[ 1.116537] nvme nvme0: Identify Descriptors failed (nsid=4, status=0x4002)
...
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Add the namespace ID to the error message when the Identify command used
to retrieve the Namespace Identification Descriptor list fails.
This avoids rather useless and duplicative messages like the following:
[ 1.321031] nvme nvme0: Identify Descriptors failed (16386)
[ 1.321948] nvme nvme0: Identify Descriptors failed (16386)
[ 1.322872] nvme nvme0: Identify Descriptors failed (16386)
[ 1.323775] nvme nvme0: Identify Descriptors failed (16386)
[ 1.324687] nvme nvme0: Identify Descriptors failed (16386)
...
Also, print the nvme status code in hexadecimal rather than decimal
format rather for better readability.
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Commands get stuck while Host NVMe-oF controller is in reconnect state.
The controller enters into reconnect state when it loses connection with
the target. It tries to reconnect every 10 seconds (default) until
a successful reconnect or until the reconnect time-out is reached.
The default reconnect time out is 10 minutes.
Applications are expecting commands to complete with success or error
within a certain timeout (30 seconds by default). The NVMe host is
enforcing that timeout while it is connected, but during reconnect the
timeout is not enforced and commands may get stuck for a long period or
even forever.
To fix this long delay due to the default timeout, introduce new
"fast_io_fail_tmo" session parameter. The timeout is measured in seconds
from the controller reconnect and any command beyond that timeout is
rejected. The new parameter value may be passed during 'connect'.
The default value of -1 means no timeout (similar to current behavior).
Signed-off-by: Victor Gladkov <victor.gladkov@kioxia.com>
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chao Leng <lengchao@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
currently the NVME_QUIRK_SHARED_TAGS quirk for Apple devices is handled
during the assignment of nr_io_queues in nvme_setup_io_queues().
This however means that for these devices nvme_max_io_queues() will
actually not return the supported maximum which is confusing and
unexpected and also means that in nvme_probe() we are allocating
for I/O queues that will never be used.
Fix this by moving the quirk handling into nvme_max_io_queues().
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
in nvme_setup_io_queues() the number of I/O queues is set to either 1 in
case of a quirky Apple device or to the min of nvme_max_io_queues() or
dev->nr_allocated_queues - 1.
This is unnecessarily complicated as dev->nr_allocated_queues is only
assigned once and is nvme_max_io_queues() + 1.
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Right now nvme_alloc_request() allocates a request from block layer
based on the value of the qid. When qid set to NVME_QID_ANY it used
blk_mq_alloc_request() else blk_mq_alloc_request_hctx().
The function nvme_alloc_request() is called from different context, The
only place where it uses non NVME_QID_ANY value is for fabrics connect
commands :-
nvme_submit_sync_cmd() NVME_QID_ANY
nvme_features() NVME_QID_ANY
nvme_sec_submit() NVME_QID_ANY
nvmf_reg_read32() NVME_QID_ANY
nvmf_reg_read64() NVME_QID_ANY
nvmf_reg_write32() NVME_QID_ANY
nvmf_connect_admin_queue() NVME_QID_ANY
nvme_submit_user_cmd() NVME_QID_ANY
nvme_alloc_request()
nvme_keep_alive() NVME_QID_ANY
nvme_alloc_request()
nvme_timeout() NVME_QID_ANY
nvme_alloc_request()
nvme_delete_queue() NVME_QID_ANY
nvme_alloc_request()
nvmet_passthru_execute_cmd() NVME_QID_ANY
nvme_alloc_request()
nvmf_connect_io_queue() QID
__nvme_submit_sync_cmd()
nvme_alloc_request()
With passthru nvme_alloc_request() now falls into the I/O fast path such
that blk_mq_alloc_request_hctx() is never gets called and that adds
additional branch check in fast path.
Split the nvme_alloc_request() into nvme_alloc_request() and
nvme_alloc_request_qid().
Replace each call of the nvme_alloc_request() with NVME_QID_ANY param
with a call to newly added nvme_alloc_request() without NVME_QID_ANY.
Replace a call to nvme_alloc_request() with QID param with a call to
newly added nvme_alloc_request() and nvme_alloc_request_qid()
based on the qid value set in the __nvme_submit_sync_cmd().
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
This is purely a clenaup patch, add prefix NVME to the ADMIN_TIMEOUT to
make consistent with NVME_IO_TIMEOUT.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
The function nvme_alloc_request() is called from different context
(I/O and Admin queue) where callers do not consider the I/O timeout when
called from I/O queue context.
Update nvme_alloc_request() to set the default I/O and Admin timeout
value based on whether the queuedata is set or not.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Use the request's '->mq_hctx->queue_num' directly to simplify the
nvme_req_qid() function.
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl+69egeHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGTSYH/ifRBlaxy5UiHFc0
2zdR7pkjWrYfDTTT3sazIAhdlzzcfnkUqgFxOP45F4ZIqeTzunH3sUY+5UlT9IX7
liUgnLxQ/1R9Gx8kPGQfu+tLCey78xVFydGsqJoW9sPRw2R+apMdGGa/lOrk+OXz
DXIN+dDnGFqwCCNJpK+rxQQhFf++IPpSI8z6Y23moOFhsDZrEziHuVFy2FGyRM6z
prZ/us/tcobE8ptCk1RmOxLoJ1DR6UxpA2vLimTE+JD8siOsSWPbjE0KudnWCnd5
BLqIjrsPJbSxyuzzK3v9dnO5wMv7tMDuMIuYM/MQTXDttNwtsqt/aP6gdnUCym7N
5eHEj5g=
=MuO1
-----END PGP SIGNATURE-----
Merge tag 'v5.10-rc5' into rdma.git for-next
For dependencies in following patches
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Use the block layer helper to update both the disk and block device
sizes. Contrary to the name no notification is sent in this case,
as a size 0 is special cased.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The update_bdev argument is always set to true, so remove it. Also
rename the function to the slighly less verbose set_capacity_and_notify,
as propagating the disk size to the block device isn't really
revalidation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Petr Vorel <pvorel@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
There is no good reason to call revalidate_disk_size separately.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
xa_destroy() frees only internal data. The caller is responsible for
freeing the exteranl objects referenced by an xarray.
Fixes: 1cf7a12e09aa4 ("nvme: use an xarray to lookup the Commands Supported and Effects log")
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Remove the struct used for tracking known command effects logs in a
list. This is now saved in an xarray that doesn't use these elements.
Instead, store the log directly instead of the wrapper struct.
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
If Doorbell Buffer Config command fails even 'dev->dbbuf_dbs != NULL'
which means OACS indicates that NVME_CTRL_OACS_DBBUF_SUPP is set,
nvme_dbbuf_update_and_check_event() will check event even it's not been
successfully set.
This patch fixes mismatch among dbbuf for sq/cqs in case that dbbuf
command fails.
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
->dma_device is a private implementation detail of the RDMA core. Use the
ibdev_to_node helper to get the NUMA node for a ib_device instead of
poking into ->dma_device.
Link: https://lore.kernel.org/r/20201106181941.1878556-5-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The offending commit breaks BLKROSET ioctl because a device
revalidation will blindly override BLKROSET setting. Hence,
we remove the disk rw setting in case NVME_NS_ATTR_RO is cleared
from by the controller.
Fixes: 1293477f4f32 ("nvme: set gendisk read only based on nsattr")
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
The request may be executed asynchronously, and rq->state may be
changed to IDLE. To avoid repeated request completion, only
MQ_RQ_COMPLETE of rq->state is checked in nvme_tcp_complete_timed_out.
It is not safe, so need adding check IDLE for rq->state.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Chao Leng <lengchao@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
The request may be executed asynchronously, and rq->state may be
changed to IDLE. To avoid repeated request completion, only
MQ_RQ_COMPLETE of rq->state is checked in nvme_rdma_complete_timed_out.
It is not safe, so need adding check IDLE for rq->state.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Chao Leng <lengchao@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>