IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Use bio generic_start_io_acct() and generic_end_io_acct() to account
device's block layer statistics. This will let users to monitor zram
activities using sysstat and similar packages/tools.
Apart from the usual per-stat sysfs attr, zram IO stats are now also
available in '/sys/block/zram<id>/stat' and '/proc/diskstats' files.
We will slowly get rid of per-stat sysfs files.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A cosmetic change. We have a new code layout and keep zram per-device
sysfs store and show functions in one place. Move compact_store() to that
handlers block to conform to current layout.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch introduces rework to zram stats. We have per-stat sysfs nodes,
and it makes things a bit hard to use in user space: it doesn't give an
immediate stats 'snapshot', it requires user space to use more syscalls -
open, read, close for every stat file, with appropriate error checks on
every step, etc.
First, zram now accounts block layer statistics, available in
/sys/block/zram<id>/stat and /proc/diskstats files. So some new stats are
available (see Documentation/block/stat.txt), besides, zram's activities
now can be monitored by sysstat's iostat or similar tools.
Example:
cat /sys/block/zram0/stat
248 0 1984 0 251029 0 2008232 5120 0 5116 5116
Second, group currently exported on per-stat basis nodes into two
categories (files):
-- zram<id>/io_stat
accumulates device's IO stats, that are not accounted by block layer,
and contains:
failed_reads
failed_writes
invalid_io
notify_free
Example:
cat /sys/block/zram0/io_stat
0 0 0 652572
-- zram<id>/mm_stat
accumulates zram mm stats and contains:
orig_data_size
compr_data_size
mem_used_total
mem_limit
mem_used_max
zero_pages
num_migrated
Example:
cat /sys/block/zram0/mm_stat
434634752 270288572 279158784 0 579895296 15060 0
per-stat sysfs nodes are now considered to be deprecated and we plan to
remove them (and clean up some of the existing stat code) in two years (as
of now, there is no warning printed to syslog about deprecated stats being
used). User space is advised to use the above mentioned 3 files.
This patch (of 7):
Remove sysfs `num_migrated' attribute. We are moving away from per-stat
device attrs towards 3 stat files that will accumulate io and mm stats in
a format similar to block layer statistics in /sys/block/<dev>/stat. That
will be easier to use in user space, and reduce the number of syscalls
needed to read zram device statistics.
`num_migrated' will return back in zram<id>/mm_stat file.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now that zsmalloc supports compaction, zram can use it. For the first
step, this patch exports compact knob via sysfs so user can do compaction
via "echo 1 > /sys/block/zram0/compact".
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Juneho Choi <juno.choi@lge.com>
Cc: Gunho Lee <gunho.lee@lge.com>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjennings@variantweb.net>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull second vfs update from Al Viro:
"Now that net-next went in... Here's the next big chunk - killing
->aio_read() and ->aio_write().
There'll be one more pile today (direct_IO changes and
generic_write_checks() cleanups/fixes), but I'd prefer to keep that
one separate"
* 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (37 commits)
->aio_read and ->aio_write removed
pcm: another weird API abuse
infinibad: weird APIs switched to ->write_iter()
kill do_sync_read/do_sync_write
fuse: use iov_iter_get_pages() for non-splice path
fuse: switch to ->read_iter/->write_iter
switch drivers/char/mem.c to ->read_iter/->write_iter
make new_sync_{read,write}() static
coredump: accept any write method
switch /dev/loop to vfs_iter_write()
serial2002: switch to __vfs_read/__vfs_write
ashmem: use __vfs_read()
export __vfs_read()
autofs: switch to __vfs_write()
new helper: __vfs_write()
switch hugetlbfs to ->read_iter()
coda: switch to ->read_iter/->write_iter
ncpfs: switch to ->read_iter/->write_iter
net/9p: remove (now-)unused helpers
p9_client_attach(): set fid->uid correctly
...
Originally Xen PV drivers only use single-page ring to pass along
information. This might limit the throughput between frontend and
backend.
The patch extends Xenbus driver to support multi-page ring, which in
general should improve throughput if ring is the bottleneck. Changes to
various frontend / backend to adapt to the new interface are also
included.
Affected Xen drivers:
* blkfront/back
* netfront/back
* pcifront/back
* scsifront/back
* vtpmfront
The interface is documented, as before, in xenbus_client.c.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Cc: Konrad Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
The current string_get_size() overflows when the device size goes over
2^64 bytes because the string helper routine computes the suffix from
the size in bytes. However, the entirety of Linux thinks in terms of
blocks, not bytes, so this will artificially induce an overflow on very
large devices. Fix this by making the function string_get_size() take
blocks and the block size instead of bytes. This should allow us to
keep working until the current SCSI standard overflows.
Also fix virtio_blk and mmc (both of which were also artificially
multiplying by the block size to pass a byte side to string_get_size()).
The mathematics of this is pretty simple: we're taking a product of
size in blocks (S) and block size (B) and trying to re-express this in
exponential form: S*B = R*N^E (where N, the exponent is either 1000 or
1024) and R < N. Mathematically, S = RS*N^ES and B=RB*N^EB, so if RS*RB
< N it's easy to see that S*B = RS*RB*N^(ES+EB). However, if RS*BS > N,
we can see that this can be re-expressed as RS*BS = R*N (where R =
RS*BS/N < N) so the whole exponent becomes R*N^(ES+EB+1)
[jejb: fix incorrect 32 bit do_div spotted by kbuild test robot <fengguang.wu@intel.com>]
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Konrad writes:
This pull has one fix and an cleanup.
Note that David Vrabel in the xen/tip.git tree has other changes for the
Xen block drivers that are related to his grant work - and they do not
conflict with this git pull.
This adds support for the extended metadata formats through the submit
IO ioctl, and simplifies the rest when using a separate metadata format.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Checking fails static analysis due to additional arithmetic prior to
the NULL check. Mapping doesn't return NULL here anyway, so removing
the check.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
class_create() returns ERR_PTR on failure,
so IS_ERR() should be used instead of check for NULL.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Define pr_fmt macro with {xen-blkback: } prefix, then remove all use
of DRV_PFX in the pr sentences. Replace all DPRINTK with pr sentences,
and get rid of DPRINTK macro. It will simplify the code.
And if the pr sentences miss a \n, add it in the end. If the DPRINTK
sentences have redundant \n, remove it. It will format the code.
These all make the readability of the code become better.
Signed-off-by: Tao Chen <boby.chen@huawei.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
The blkback name is like blkback.domid.xvd[a-z], if domid has four digits
(means larger than 1000), then the backmost xvd wouldn't be fully shown.
Define a BLKBACK_NAME_LEN macro to be 20, enlarge the array size of
blkback name, so it will be fully shown in any case.
Signed-off-by: Tao Chen <boby.chen@huawei.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
By returning the error code directly, we can avoid the jump label
error_out.
Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
dprintk has some name collisions with other frameworks and drivers. It
is also not necessary to have these custom debug print filters. Dynamic
debug offers the same amount of filtered debugging.
This patch replaces all dprintks with dev_dbg(). It also removes the
ioctl dprintk which prints the ingoing ioctls which should be
replaceable by strace or similar stuff.
Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
The block subsystem uses loff_t to store the device size. Change the
type for nbd_device bytesize to loff_t.
Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
kthread_run includes the wake_up_process() call, so instead of
kthread_create() followed by wake_up_process() we can use this macro.
Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
The header is not included anywhere. Remove it and include the private
nbd_device struct in nbd.c.
Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
Fix:
drivers/block/pmem.c: In function ‘pmem_alloc’:
drivers/block/pmem.c:138:7: warning: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 3 has type ‘phys_addr_t’ [-Wformat=]
By using the proper %pa format specifier we use for 'phys_addr_t' arguments.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-nvdimm@ml01.01.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
PMEM is a new driver that presents a reserved range of memory as
a block device. This is useful for developing with NV-DIMMs,
and can be used with volatile memory as a development platform.
This patch contains the initial driver from Ross Zwisler, with
various changes: converted it to use a platform_device for
discovery, fixed partition support and merged various patches
from Boaz Harrosh.
Tested-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-nvdimm@ml01.01.org
Link: http://lkml.kernel.org/r/1427872339-6688-3-git-send-email-hch@lst.de
[ Minor cleanups. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Usually the admin queue depth of 64 is plenty, but for some use cases we
really need it larger. Examples are use cases like MAT, where you have
to touch all of NAND for init/format like purposes. In those cases, we
see a good 2x increase with an increased queue depth.
Signed-off-by: Jens Axboe <axboe@fb.com>
Acked-by: Keith Busch <keith.busch@intel.com>
PRP list calculation is supposed to be based on device's page size.
Systems with page size larger than device's page size cause corruption
to the name space as well as system memory with out this fix.
Systems like x86 might not experience this issue because it uses
PAGE_SIZE of 4K where as powerpc uses PAGE_SIZE of 64k while NVMe device's
page size varies depending upon the vendor.
Signed-off-by: Murali Iyer <mniyer@us.ibm.com>
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
The driver may issue commands to a device that may never return, so its
request_queue could always have active requests while the controller is
running. Waiting for the queue to freeze could block forever, which is
what blk-mq's hot cpu notification handler was doing when nvme drives
were in use.
This has the nvme driver make the asynchronous event command's tag
reserved and does not keep the request active. We can't have more than
one since the request is released back to the request_queue before the
command is completed. Having only one avoids potential tag collisions,
and reserving the tag for this purpose prevents other admin tasks from
reusing the tag.
I also couldn't think of a scenario where issuing AEN requests single
depth is worse than issuing them in batches, so I don't think we lose
anything with this change.
As an added bonus, doing it this way removes "Cancelling I/O" warnings
observed when unbinding the nvme driver from a device.
Reported-by: Yigal Korman <yigal@plexistor.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
This fixes a race accessing an invalid address when a controller's admin
queue is in use during a reset for failure or hot removal occurs. The
admin queue will be frozen to prevent new users from entering prior to
the doorbell queue being unmapped.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Mempools created for slab caches should use
mempool_create_slab_pool().
Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
Cc: Jens Axboe <axboe@fb.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
mempool_alloc() does not support __GFP_ZERO since elements may come from
memory that has already been released by mempool_free().
Remove __GFP_ZERO from mempool_alloc() in drbd_req_new() and properly
initialize it to 0.
Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
Cc: Jens Axboe <axboe@fb.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Driver recovery requires the device's list node to have been initialized.
Fixes: https://lkml.org/lkml/2015/3/22/262
Reported-by: Steven Noonan <steven@uplinklabs.net>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Jens Axboe <axboe@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
ppc has special instruction forms to efficiently load and store values
in non-native endianness. These can be accessed via the arch-specific
{ld,st}_le{16,32}() inlines in arch/powerpc/include/asm/swab.h.
However, gcc is perfectly capable of generating the byte-reversing
load/store instructions when using the normal, generic cpu_to_le*() and
le*_to_cpu() functions eaning the arch-specific functions don't have much
point.
Worse the "le" in the names of the arch specific functions is now
misleading, because they always generate byte-reversing forms, but some
ppc machines can now run a little-endian kernel.
To start getting rid of the arch-specific forms, this patch removes them
from all the old Power Macintosh drivers, replacing them with the
generic byteswappers.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The IRQF_DISABLED flag is a NOOP and has been scheduled for removal
since Linux v2.6.36 by commit 6932bf37bed4 ("genirq: Remove
IRQF_DISABLED from core code").
According to commit e58aa3d2d0cc ("genirq: Run irq handlers with
interrupts disabled"), running IRQ handlers with interrupts
enabled can cause stack overflows when the interrupt line of the
issuing device is still active.
This patch ends the grace period for IRQF_DISABLED (i.e.,
SA_INTERRUPT in older versions of Linux) and removes the
definition and all remaining usages of this flag.
There's still a few non-functional references left in the kernel
source:
- The bigger hunk in Documentation/scsi/ncr53c8xx.txt is removed entirely
as IRQF_DISABLED is gone now; the usage in older kernel versions
(including the old SA_INTERRUPT flag) should be discouraged. The
trouble of using IRQF_SHARED is a general problem and not specific to
any driver.
- I left the reference in Documentation/PCI/MSI-HOWTO.txt untouched since
it has already been removed in linux-next.
- All remaining references are changelogs that I suggest to keep.
Signed-off-by: Valentin Rothberg <valentinrothberg@gmail.com>
Cc: Afzal Mohammed <afzal@ti.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Brian Norris <computersforpeace@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Ewan Milne <emilne@redhat.com>
Cc: Eyal Perry <eyalpe@mellanox.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Hongliang Tao <taohl@lemote.com>
Cc: Huacai Chen <chenhc@lemote.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Keerthy <j-keerthy@ti.com>
Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nishanth Menon <nm@ti.com>
Cc: Paul Bolle <pebolle@tiscali.nl>
Cc: Peter Ujfalusi <peter.ujfalusi@ti.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Lambert <lambert.quentin@gmail.com>
Cc: Rajendra Nayak <rnayak@ti.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Santosh Shilimkar <santosh.shilimkar@ti.com>
Cc: Sricharan R <r.sricharan@ti.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Lindgren <tony@atomide.com>
Cc: Zhou Wang <wangzhou1@hisilicon.com>
Cc: iss_storagedev@hp.com
Cc: linux-mips@linux-mips.org
Cc: linux-mtd@lists.infradead.org
Link: http://lkml.kernel.org/r/1425565425-12604-1-git-send-email-valentinrothberg@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
we have already allocated memory for nbd_dev, but we were not
releasing that memory and just returning the error value.
Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
Acked-by: Paul Clements <Paul.Clements@SteelEye.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
Pull block layer fixes from Jens Axboe:
"Two smaller fixes for this cycle:
- A fixup from Keith so that NVMe compiles without BLK_INTEGRITY,
basically just moving the code around appropriately.
- A fixup for shm, fixing an oops in shmem_mapping() for mapping with
no inode. From Sasha"
[ The shmem fix doesn't look block-layer-related, but fixes a bug that
happened due to the backing_dev_info removal.. - Linus ]
* 'for-linus' of git://git.kernel.dk/linux-block:
mm: shmem: check for mapping owner before dereferencing
NVMe: Fix for BLK_DEV_INTEGRITY not set
max_used_pages is defined as atomic_long_t so we need to use unsigned
long to keep temporary value for it rather than int which is smaller
than unsigned long in a 64 bit system.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Need to define and use appropriate functions for when BLK_DEV_INTEGRITY
is not set.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
This makes all sync commands uninterruptible and schedules without timeout
so the controller either has to post a completion or the timeout recovery
fails the command. This fixes potential memory or data corruption from
a command timing out too early or woken by a signal. Previously any DMA
buffers mapped for that command would have been released even though we
don't know what the controller is planning to do with those addresses.
Signed-off-by: Keith Busch <keith.busch@intel.com>
We don't track queues in a llist, subscribe to hot-cpu notifications,
or internally retry commands. Delete the unused artifacts.
Signed-off-by: Keith Busch <keith.busch@intel.com>
The driver has to end unreturned commands at some point even if the
controller has not provided a completion. The driver tried to be safe by
deleting IO queues prior to ending all unreturned commands. That should
cause the controller to internally abort inflight commands, but IO queue
deletion request does not have to be successful, so all bets are off. We
still have to make progress, so to be extra safe, this patch doesn't
clear a queue to release the dma mapping for a command until after the
pci device has been disabled.
This patch removes the special handling during device initialization
so controller recovery can be done all the time. This is possible since
initialization is not inlined with pci probe anymore.
Reported-by: Nilish Choudhury <nilesh.choudhury@oracle.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
This performs the longest parts of nvme device probe in scheduled work.
This speeds up probe significantly when multiple devices are in use.
Signed-off-by: Keith Busch <keith.busch@intel.com>
This creates a new class type for nvme devices to register their
management character devices with. This is so we do not rely on miscdev
to provide enough minors for as many nvme devices some people plan to
use. The previous limit was approximately 60 NVMe controllers, depending
on the platform and kernel. Now the limit is 1M, which ought to be enough
for anybody.
Since we have a new device class, it makes sense to attach the block
devices under this as well, so part of this patch moves the management
handle initialization prior to the namespaces discovery.
Signed-off-by: Keith Busch <keith.busch@intel.com>
The original translation created collisions on Inquiry VPD 83 for many
existing devices. Newer specifications provide other ways to translate
based on the device's version can be used to create unique identifiers.
Version 1.1 provides an EUI64 field that uniquely identifies each
namespace, and 1.2 added the longer NGUID field for the same reason.
Both follow the IEEE EUI format and readily translate to the SCSI device
identification EUI designator type 2h. For devices implementing either,
the translation will use this type, defaulting to the EUI64 8-byte type if
implemented then NGUID's 16 byte version if not. If neither are provided,
the 1.0 translation is used, and is updated to use the SCSI String format
to guarantee a unique identifier.
Knowing when to use the new fields depends on the nvme controller's
revision. The NVME_VS macro was not decoding this correctly, so that is
fixed in this patch and moved to a more appropriate place.
Since the Identify Namespace structure required an update for the NGUID
field, this patch adds the remaining new 1.2 fields to the structure.
Signed-off-by: Keith Busch <keith.busch@intel.com>