1072709 Commits

Author SHA1 Message Date
Randy Dunlap
bcfe9b6cbb virtio_blk: eliminate anonymous module_init & module_exit
Eliminate anonymous module_init() and module_exit(), which can lead to
confusion or ambiguity when reading System.map, crashes/oops/bugs,
or an initcall_debug log.

Give each of these init and exit functions unique driver-specific
names to eliminate the anonymous names.

Example 1: (System.map)
 ffffffff832fc78c t init
 ffffffff832fc79e t init
 ffffffff832fc8f8 t init

Example 2: (initcall_debug log)
 calling  init+0x0/0x12 @ 1
 initcall init+0x0/0x12 returned 0 after 15 usecs
 calling  init+0x0/0x60 @ 1
 initcall init+0x0/0x60 returned 0 after 2 usecs
 calling  init+0x0/0x9a @ 1
 initcall init+0x0/0x9a returned 0 after 74 usecs

Fixes: e467cde23818 ("Block driver using virtio.")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: virtualization@lists.linux-foundation.org
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20220316192010.19001-2-rdunlap@infradead.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-17 10:11:29 -06:00
Juergen Gross
85d9abcd73 xen/blkfront: speed up purge_persistent_grants()
purge_persistent_grants() is scanning the grants list for persistent
grants being no longer in use by the backend. When having found such a
grant, it will be set to "invalid" and pushed to the tail of the list.

Instead of pushing it directly to the end of the list, add it first to
a temporary list, avoiding to scan those entries again in the main
list traversal. After having finished the scan, append the temporary
list to the grant list.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Link: https://lore.kernel.org/r/20220311103527.12931-1-jgross@suse.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-11 05:15:33 -07:00
Jens Axboe
67b56134ce Merge branch 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-5.18/drivers
Pull MD updates from Song:

"This set contains raid5 bio handling cleanups for raid5."

* 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
  raid5: initialize the stripe_head embeeded bios as needed
  raid5-cache: statically allocate the recovery ra bio
  raid5-cache: fully initialize flush_bio when needed
  raid5-ppl: fully initialize the bio in ppl_new_iounit
2022-03-10 16:04:03 -07:00
Christoph Hellwig
03a6b195e8 raid5: initialize the stripe_head embeeded bios as needed
Use bio_init to initialize the bios when needed to the full state
instead of a partial initialization plus later setting of dev and op
and bio_reset.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Song Liu <song@kernel.org>
2022-03-08 22:55:13 -08:00
Christoph Hellwig
89f94b6440 raid5-cache: statically allocate the recovery ra bio
There is no need to preallocate the bio and reset it when use.  Just
allocate it on-stack and use a bvec places next to the pages used for
it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Song Liu <song@kernel.org>
2022-03-08 22:55:09 -08:00
Christoph Hellwig
0dd00cba99 raid5-cache: fully initialize flush_bio when needed
Stop using bio_reset and just initialize the bio fully when needed.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Song Liu <song@kernel.org>
2022-03-08 22:55:04 -08:00
Christoph Hellwig
9f7c3f837a raid5-ppl: fully initialize the bio in ppl_new_iounit
We have all the information to pass the bdev and op directly to bio_init,
so do that.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Song Liu <song@kernel.org>
2022-03-08 22:55:00 -08:00
Jens Axboe
a2daeab5cf Merge branch 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-5.18/drivers
Pull MD fixes from Song:

"Most of these changes are minor fixes and clean-ups."

* 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
  md: use msleep() in md_notify_reboot()
  lib/raid6: Include <asm/ppc-opcode.h> for VPERMXOR
  lib/raid6/test/Makefile: Use $(pound) instead of \# for Make 4.3
  lib/raid6/test: fix multiple definition linking error
  md: raid1/raid10: drop pending_cnt
2022-03-08 16:40:12 -07:00
Eric Dumazet
7d959f6e97 md: use msleep() in md_notify_reboot()
Calling mdelay(1000) from process context, even while a reboot
is in progress, does not make sense.

Using msleep() allows other threads to make progress.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: linux-raid@vger.kernel.org
Signed-off-by: Song Liu <song@kernel.org>
2022-03-08 15:20:21 -08:00
Paul Menzel
5b401e4e9a lib/raid6: Include <asm/ppc-opcode.h> for VPERMXOR
On Ubuntu 21.10 (ppc64le) building raid6test with gcc (Ubuntu
11.2.0-7ubuntu2) 11.2.0 fails with the error below.

    gcc -I.. -I ../../../include -g -O2                       \
             -I../../../arch/powerpc/include -DCONFIG_ALTIVEC \
             -c -o vpermxor1.o vpermxor1.c
    vpermxor1.c: In function ‘raid6_vpermxor1_gen_syndrome_real’:
    vpermxor1.c:64:29: error: expected string literal before ‘VPERMXOR’
       64 |   asm(VPERMXOR(%0,%1,%2,%3):"=v"(wq0):"v"(gf_high), "v"(gf_low), "v"(wq0));
          |       ^~~~~~~~
    make: *** [Makefile:58: vpermxor1.o] Error 1

So, include the header asm/ppc-opcode.h defining this macro also when
not building the Linux kernel but only this too.

Cc: Matt Brown <matthew.brown.dev@gmail.com>
Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Song Liu <song@kernel.org>
2022-03-08 15:20:21 -08:00
Paul Menzel
633174a704 lib/raid6/test/Makefile: Use $(pound) instead of \# for Make 4.3
Buidling raid6test on Ubuntu 21.10 (ppc64le) with GNU Make 4.3 shows the
errors below:

    $ cd lib/raid6/test/
    $ make
    <stdin>:1:1: error: stray ‘\’ in program
    <stdin>:1:2: error: stray ‘#’ in program
    <stdin>:1:11: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ \
        before ‘<’ token

    [...]

The errors come from the HAS_ALTIVEC test, which fails, and the POWER
optimized versions are not built. That’s also reason nobody noticed on the
other architectures.

GNU Make 4.3 does not remove the backslash anymore. From the 4.3 release
announcment:

> * WARNING: Backward-incompatibility!
>   Number signs (#) appearing inside a macro reference or function invocation
>   no longer introduce comments and should not be escaped with backslashes:
>   thus a call such as:
>     foo := $(shell echo '#')
>   is legal.  Previously the number sign needed to be escaped, for example:
>     foo := $(shell echo '\#')
>   Now this latter will resolve to "\#".  If you want to write makefiles
>   portable to both versions, assign the number sign to a variable:
>     H := \#
>     foo := $(shell echo '$H')
>   This was claimed to be fixed in 3.81, but wasn't, for some reason.
>   To detect this change search for 'nocomment' in the .FEATURES variable.

So, do the same as commit 9564a8cf422d ("Kbuild: fix # escaping in .cmd
files for future Make") and commit 929bef467771 ("bpf: Use $(pound) instead
of \# in Makefiles") and define and use a $(pound) variable.

Reference for the change in make:
https://git.savannah.gnu.org/cgit/make.git/commit/?id=c6966b323811c37acedff05b57

Cc: Matt Brown <matthew.brown.dev@gmail.com>
Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Song Liu <song@kernel.org>
2022-03-08 15:20:21 -08:00
Dirk Müller
a5359ddd05 lib/raid6/test: fix multiple definition linking error
GCC 10+ defaults to -fno-common, which enforces proper declaration of
external references using "extern". without this change a link would
fail with:

  lib/raid6/test/algos.c:28: multiple definition of `raid6_call';
  lib/raid6/test/test.c:22: first defined here

the pq.h header that is included already includes an extern declaration
so we can just remove the redundant one here.

Cc: <stable@vger.kernel.org>
Signed-off-by: Dirk Müller <dmueller@suse.de>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Song Liu <song@kernel.org>
2022-03-08 15:20:21 -08:00
Mariusz Tkaczyk
daae161fd2 md: raid1/raid10: drop pending_cnt
Those counters are not necessary after commit 11bb45e8aaf6 ("md: drop queue
limitation for RAID1 and RAID10"). Remove them from all code (conf and
plug structs). raid1_plug_cb and raid10_plug_cb are identical, so move
definition of raid1_plug_cb to common raid1-10 definitions and use it for
RAID10 too.

Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
Signed-off-by: Song Liu <song@kernel.org>
2022-03-08 15:16:54 -08:00
Jens Axboe
a76370690c Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/colyli/linux-bcache into for-5.18/drivers
Pull bcache updates from Coly:

"We have 2 patches for Linux v5.18, both of them are from Mingzhe Zou.
 The first patch improves bcache initialization speed by avoid
 unnecessary cost of cache consistency, the second one fixes a potential
 NULL pointer deference in bcache initialization time."

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/colyli/linux-bcache:
  bcache: fixup multiple threads crash
  bcache: fixup bcache_dev_sectors_dirty_add() multithreaded CPU false sharing
2022-03-06 08:13:09 -07:00
Mingzhe Zou
887554ab96 bcache: fixup multiple threads crash
When multiple threads to check btree nodes in parallel, the main
thread wait for all threads to stop or CACHE_SET_IO_DISABLE flag:

wait_event_interruptible(check_state->wait,
                         atomic_read(&check_state->started) == 0 ||
                         test_bit(CACHE_SET_IO_DISABLE, &c->flags));

However, the bch_btree_node_read and bch_btree_node_read_done
maybe call bch_cache_set_error, then the CACHE_SET_IO_DISABLE
will be set. If the flag already set, the main thread return
error. At the same time, maybe some threads still running and
read NULL pointer, the kernel will crash.

This patch change the event wait condition, the main thread must
wait for all threads to stop.

Fixes: 8e7102273f597 ("bcache: make bch_btree_check() to be multithreaded")
Signed-off-by: Mingzhe Zou <mingzhe.zou@easystack.cn>
Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: Coly Li <colyli@suse.de>
2022-03-06 22:33:45 +08:00
Mingzhe Zou
7b1002f7cf bcache: fixup bcache_dev_sectors_dirty_add() multithreaded CPU false sharing
When attaching a cached device (a.k.a backing device) to a cache
device, bch_sectors_dirty_init() is called to count dirty sectors
and stripes (see what bcache_dev_sectors_dirty_add() does) on the
cache device.

When bcache_dev_sectors_dirty_add() is called, set_bit(stripe,
d->full_dirty_stripes) or clear_bit(stripe, d->full_dirty_stripes)
operation will always be performed. In full_dirty_stripes, each 1bit
represents stripe_size (8192) sectors (512B), so 1bit=4MB (8192*512),
and each CPU cache line=64B=512bit=2048MB. When 20 threads process
a cached disk with 100G dirty data, a single thread processes about
23M at a time, and 20 threads total 460M. These full_dirty_stripes
bits corresponding to the 460M data is likely to fall in the same CPU
cache line. When one of these threads performs a set_bit or clear_bit
operation, the same CPU cache line of other threads will become invalid
and must read the full_dirty_stripes from the main memory again. Compared
with single thread, the time of a bcache_dev_sectors_dirty_add()
call is increased by about 50 times in our test (100G dirty data,
20 threads, bcache_dev_sectors_dirty_add() is called more than
20 million times).

This patch tries to test_bit before set_bit or clear_bit operation.
Therefore, a lot of force set and clear operations will be avoided,
and most of bcache_dev_sectors_dirty_add() calls will only read CPU
cache line.

Signed-off-by: Mingzhe Zou <mingzhe.zou@easystack.cn>
Signed-off-by: Coly Li <colyli@suse.de>
2022-03-06 22:33:37 +08:00
Christoph Hellwig
13d4ef0f66 floppy: use memcpy_{to,from}_bvec
Use the helpers instead of open coding them.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20220303111905.321089-11-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-04 12:29:21 -07:00
Christoph Hellwig
3eddaa60b8 drbd: use bvec_kmap_local in recv_dless_read
Using local kmaps slightly reduces the chances to stray writes, and
the bvec interface cleans up the code a little bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20220303111905.321089-10-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-04 12:29:21 -07:00
Christoph Hellwig
472278508d drbd: use bvec_kmap_local in drbd_csum_bio
Using local kmaps slightly reduces the chances to stray writes, and
the bvec interface cleans up the code a little bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20220303111905.321089-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-04 12:29:21 -07:00
Christoph Hellwig
07fee7aba5 bcache: use bvec_kmap_local in bio_csum
Using local kmaps slightly reduces the chances to stray writes, and
the bvec interface cleans up the code a little bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20220303111905.321089-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-04 12:29:21 -07:00
Christoph Hellwig
3205190655 nvdimm-btt: use bvec_kmap_local in btt_rw_integrity
Using local kmaps slightly reduces the chances to stray writes, and
the bvec interface cleans up the code a little bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20220303111905.321089-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-04 12:29:21 -07:00
Christoph Hellwig
20072ec828 nvdimm-blk: use bvec_kmap_local in nd_blk_rw_integrity
Using local kmaps slightly reduces the chances to stray writes, and
the bvec interface cleans up the code a little bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20220303111905.321089-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-04 12:29:21 -07:00
Christoph Hellwig
bd3d3203eb zram: use memcpy_from_bvec in zram_bvec_write
Use memcpy_from_bvec instead of open coding the logic.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20220303111905.321089-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-04 12:29:21 -07:00
Christoph Hellwig
b3bd0a8a74 zram: use memcpy_to_bvec in zram_bvec_read
Use the proper helper instead of open coding the copy.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20220303111905.321089-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-04 12:29:20 -07:00
Christoph Hellwig
b7ab4611b6 aoe: use bvec_kmap_local in bvcpy
Using local kmaps slightly reduces the chances to stray writes, and
the bvec interface cleans up the code a little bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220303111905.321089-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-04 12:29:20 -07:00
Christoph Hellwig
143a70b8b4 iss-simdisk: use bvec_kmap_local in simdisk_submit_bio
Using local kmaps slightly reduces the chances to stray writes, and
the bvec interface cleans up the code a little bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Acked-by: Max Filippov <jcmvbkbc@gmail.com>
Link: https://lore.kernel.org/r/20220303111905.321089-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-04 12:29:20 -07:00
Jens Axboe
c48d8c5c0c nvme updates for Linux 5.18
- add vectored-io support for user-passthrough (Kanchan Joshi)
  - add verbose error logging (Alan Adamson)
  - support buffered I/O on block devices in nvmet (Chaitanya Kulkarni)
  - central discovery controller support (Martin Belanger)
  - fix and extended the globally unique idenfier validation (me)
  - move away from the deprecated IDA APIs (Sagi Grimberg)
  - misc code cleanup (Keith Busch, Max Gurtovoy, Qinghua Jin,
    Chaitanya Kulkarni)
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmIgnC4LHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYPuyg//RFQB11sLkNQREMBOyi4hMZyhhg0QoATUnzWzfeIA
 WHMRz/RIMfTeHU2yZVkapaSr+o4SX1UoGrE7nxBa66JKDflJK96mHAMJKruCmDAy
 /f0NzmsQ47F7RfgPEPQ9eYzXgqdrAiIbjWmevDCa9JocCBbqTMklM8IcRsVCGIvM
 aUuTIujvG6UcXXmaNmkZuejW2UhAZPpK6Yrpy6o6Zo0FRHxJ7mJdGvG1tCe6yuFR
 H07tvjQ0N9d6smAxvSykU0qtIXTyuG1WKdkm4c3Z7TQ0jzMQSadb/7FMCeCr5hWo
 2+XmbFEw9tx6t/FI522z+PmsA5rdjx9Qh16kEFZ5jknsNRI4YhrX670VzwD3litv
 hNH24KnnpfK8gOaTXddlrhobin9snCITTYnFr7g5INgxONf3q936U5EwHvJMPDAV
 9goYt8UN1ATgChlLhLnTWqVG7mTncAXJV7j0P9NHthe6fkn3W+/GEuVBv1nigPTF
 Kgz+ld6hjP3uygjwKhgYwKJ+0NewrqWCikEPKWG4VCMUGuFIhk1zcl+KCdqr9473
 OCK4TYRymWwQT6pTL4ScOW+cLhHLofgfcD6EEnND8+JwwqOM80t96x2mZvvFrNgf
 8mEVcfrqNBmOSzlsGLUPSuWYJ1T3CnNG/a9x4iku9jPhN0W79BHj0Hyc4Ryyq09/
 qYY=
 =UkwS
 -----END PGP SIGNATURE-----

Merge tag 'nvme-5.18-2022-03-03' of git://git.infradead.org/nvme into for-5.18/drivers

Pull NVMe updates from Christoph:

"nvme updates for Linux 5.18

 - add vectored-io support for user-passthrough (Kanchan Joshi)
 - add verbose error logging (Alan Adamson)
 - support buffered I/O on block devices in nvmet (Chaitanya Kulkarni)
 - central discovery controller support (Martin Belanger)
 - fix and extended the globally unique idenfier validation (me)
 - move away from the deprecated IDA APIs (Sagi Grimberg)
 - misc code cleanup (Keith Busch, Max Gurtovoy, Qinghua Jin,
   Chaitanya Kulkarni)"

* tag 'nvme-5.18-2022-03-03' of git://git.infradead.org/nvme: (27 commits)
  nvme: check that EUI/GUID/UUID are globally unique
  nvme: check for duplicate identifiers earlier
  nvme: fix the check for duplicate unique identifiers
  nvme: cleanup __nvme_check_ids
  nvme: remove nssa from struct nvme_ctrl
  nvme: explicitly set non-error for directives
  nvme: expose cntrltype and dctype through sysfs
  nvme: send uevent on connection up
  nvme: add vectored-io support for user-passthrough
  nvme: add verbose error logging
  nvme: add a helper to initialize connect_q
  nvme-rdma: add helpers for mapping/unmapping request
  nvmet-tcp: replace ida_simple[get|remove] with the simler ida_[alloc|free]
  nvmet-rdma: replace ida_simple[get|remove] with the simler ida_[alloc|free]
  nvmet-fc: replace ida_simple[get|remove] with the simler ida_[alloc|free]
  nvmet: replace ida_simple[get|remove] with the simler ida_[alloc|free]
  nvme-fc: replace ida_simple[get|remove] with the simler ida_[alloc|free]
  nvme: replace ida_simple[get|remove] with the simler ida_[alloc|free]
  nvmet: allow bdev in buffered_io mode
  nvmet: use i_size_read() to set size for file-ns
  ...
2022-03-03 08:51:48 -07:00
Christoph Hellwig
2079f41ec6 nvme: check that EUI/GUID/UUID are globally unique
Add a check to verify that the unique identifiers are unique globally
in addition to the existing check that verifies that they are unique
inside a single subsystem.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
2022-02-28 13:45:07 +02:00
Christoph Hellwig
e2d77d2e11 nvme: check for duplicate identifiers earlier
Lift the check for duplicate identifiers into nvme_init_ns_head, which
avoids pointless error unwinding in case they don't match, and also
matches where we check identifier validity for the multipath case.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
2022-02-28 13:45:07 +02:00
Christoph Hellwig
e2724cb9f0 nvme: fix the check for duplicate unique identifiers
nvme_subsys_check_duplicate_ids should needs to return an error if any of
the identifiers matches, not just if all of them match.  But it does not
need to and should not look at the CSI value for this sanity check.

Rewrite the logic to be separate from nvme_ns_ids_equal and optimize it
by reducing duplicate checks for non-present identifiers.

Fixes: ed754e5deeb1 ("nvme: track shared namespaces")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
2022-02-28 13:45:07 +02:00
Christoph Hellwig
fd8099e791 nvme: cleanup __nvme_check_ids
Pass the actual nvme_ns_ids used for the comparison instead of the
ns_head that isn't needed and use a more descriptive function name.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
2022-02-28 13:45:06 +02:00
Keith Busch
0a9f850061 nvme: remove nssa from struct nvme_ctrl
The reported number of streams is not used outside the function that
gets it, so no need to stash it in the controller structure. Use a local
variable instead.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-02-28 13:45:06 +02:00
Keith Busch
1c3adf0de1 nvme: explicitly set non-error for directives
Stream directives is an optional feature. It is not an error if a
controller doesn't support as many as the kernel can optionally use.
Explicitly set the non-error return value on this condition with a
comment explaining why.

Note, the return value was already 0 in this condition, so the setting
is redundant. This patch should just silence bots that falsely believe
the condition contains an error omission.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-02-28 13:45:06 +02:00
Martin Belanger
86c2457a8e nvme: expose cntrltype and dctype through sysfs
TP8010 introduces the Discovery Controller Type attribute (dctype).
The dctype is returned in the response to the Identify command. This
patch exposes the dctype through the sysfs. Since the dctype depends on
the Controller Type (cntrltype), another attribute of the Identify
response, the patch also exposes the cntrltype as well. The dctype will
only be displayed for discovery controllers.

A note about the naming of this attribute:
Although TP8010 calls this attribute the Discovery Controller Type,
note that the dctype is now part of the response to the Identify
command for all controller types. I/O, Discovery, and Admin controllers
all share the same Identify response PDU structure. Non-discovery
controllers as well as pre-TP8010 discovery controllers will continue
to set this field to 0 (which has always been the default for reserved
bytes). Per TP8010, the value 0 now means "Discovery controller type is
not reported" instead of "Reserved". One could argue that this
definition is correct even for non-discovery controllers, and by
extension, exposing it in the sysfs for non-discovery controllers is
appropriate.

Signed-off-by: Martin Belanger <martin.belanger@dell.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: John Meneghini <jmeneghi@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-02-28 13:45:06 +02:00
Martin Belanger
20d64911e7 nvme: send uevent on connection up
When connectivity with a controller is lost, the driver will keep
trying to reconnect once every 10 sec. When connection is restored,
user-space apps need to be informed so that they can take proper
action. For example, TP8010 introduces the DIM PDU, which is used to
register with a discovery controller (DC). The DIM PDU is sent from
user-space.  The DIM PDU must be sent every time a connection is
established with a DC. Therefore, the kernel must tell user-space apps
when connection is restored so that registration can happen.

The uevent sent is a "change" uevent with environmental data
set to: "NVME_EVENT=connected".

Signed-off-by: Martin Belanger <martin.belanger@dell.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: John Meneghini <jmeneghi@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-02-28 13:45:06 +02:00
Kanchan Joshi
89377bc197 nvme: add vectored-io support for user-passthrough
Add a new NVME_IOCTL_IO64_CMD_VEC ioctl that works like the existing
NVME_IOCTL_IO64_CMD ioctl except that it takes and array of iovecs
and thus supports vectored I/O.

  - cmd.addr is base address of user iovec array
  - cmd.vec_cnt is count of iovec array elements

This patch does not include vectored-variant for admin-commands as most
of them are light on buffers and likely to have low invocation frequency.

Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-02-28 13:45:06 +02:00
Alan Adamson
bd83fe6f2c nvme: add verbose error logging
Improves logging of NVMe errors.  If NVME_VERBOSE_ERRORS is configured,
a verbose description of the error is logged, otherwise only status
codes/bits is logged.

Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
[kch]: fix several nits, cosmetics, and trim down code.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Alan Adamson <alan.adamson@oracle.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-02-28 13:45:06 +02:00
Chaitanya Kulkarni
72e8b5cd7d nvme: add a helper to initialize connect_q
Add and use helper to remove duplicate code for fabrics connect_q
initialization and error handling for all the transports.

Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-02-28 13:45:06 +02:00
Max Gurtovoy
4686af885a nvme-rdma: add helpers for mapping/unmapping request
Introduce nvme_rdma_dma_map_req/nvme_rdma_dma_unmap_req helper functions
to improve code readability and ease on the error flow.

Reviewed-by: Israel Rukshin <israelr@nvidia.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-02-28 13:45:05 +02:00
Sagi Grimberg
44f331a630 nvmet-tcp: replace ida_simple[get|remove] with the simler ida_[alloc|free]
ida_simple_[get|remove] are wrappers anyways.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-02-28 13:45:05 +02:00
Sagi Grimberg
7c2566394f nvmet-rdma: replace ida_simple[get|remove] with the simler ida_[alloc|free]
ida_simple_[get|remove] are wrappers anyways.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-02-28 13:45:05 +02:00
Sagi Grimberg
6dd0f465d5 nvmet-fc: replace ida_simple[get|remove] with the simler ida_[alloc|free]
ida_simple_[get|remove] are wrappers anyways.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-02-28 13:45:05 +02:00
Sagi Grimberg
22027a9811 nvmet: replace ida_simple[get|remove] with the simler ida_[alloc|free]
ida_simple_[get|remove] are wrappers anyways.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-02-28 13:45:05 +02:00
Sagi Grimberg
3dd83f4013 nvme-fc: replace ida_simple[get|remove] with the simler ida_[alloc|free]
ida_simple_[get|remove] are wrappers anyways.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-02-28 13:45:05 +02:00
Sagi Grimberg
8b850475c0 nvme: replace ida_simple[get|remove] with the simler ida_[alloc|free]
ida_simple_[get|remove] are wrappers anyways.

Also, use ida_alloc_min with the ns_ida as namespace
enumeration starts with 1.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-02-28 13:45:05 +02:00
Chaitanya Kulkarni
6f6d604b4e nvmet: allow bdev in buffered_io mode
Allow block device to be configured in the buffered I/O mode by using
the file backend. In this way now we can use cache for the block
device namespace which shows significant performance improvement.

We update the block device ns enable function and return early when
buffered_io flag is set.

Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-02-28 13:45:05 +02:00
Chaitanya Kulkarni
2caecd62ea nvmet: use i_size_read() to set size for file-ns
Instead of calling vfs_getattr() use i_size_read() to read the size of
file so we can read the size of not only file type but also block type
with one call. This is needed to implement buffered_io support for the
NVMeOF block device backend.

We also change return type of function nvmet_file_ns_revalidate() from
int to void, since this function does not return any meaning value.

Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-02-28 13:45:05 +02:00
Chaitanya Kulkarni
581f19dd72 nvme-fabrics: remove unnecessary braces for case
Braces are not required for enum value NVME_SC_CONNECT_INVALID_PARAM
when used on the switch-case statement, remove the braces.

Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-02-28 13:45:04 +02:00
Chaitanya Kulkarni
72b3eab456 nvme-fabrics: use consistent zeroout pattern
Remove zeroout memeset call & zeroout local variable cmd at the time
of declaration in nvmf_ref_read32() similar to what we have done in
nvmf_reg_read64(), nvmf_reg_write32(), nvmf_connect_admin_queue(), and
nvmf_connect_io_queue().

Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-02-28 13:45:04 +02:00
Chaitanya Kulkarni
0801a4b630 nvme-fabrics: use unsigned int type
Loop variable i will never have a negative value, so use
unsigned int type instaed of int.

Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-02-28 13:45:04 +02:00