linux

iv/linux

Author	SHA1	Message	Date
Sagi Grimberg	4054637c9b	nvme: flush reset_work before safely continuing with delete operation Prevent racing controller reset and delete flows. reset_work must not ever self-requeue so flushing it suffices. Reported-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>	2017-11-01 16:28:17 +01:00
Christoph Hellwig	6cd53d14aa	nvme: consolidate common code from ->reset_work No change in behavior except that the FC code cancels two work items a little later now. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2017-11-01 16:28:06 +01:00
Christoph Hellwig	c5017e8570	nvme: move controller deletion to common code Move the ->delete_work and the associated helpers to common code instead of duplicating them in every driver. This also adds the missing reference get/put for the loop driver. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2017-11-01 16:28:04 +01:00
Christoph Hellwig	999ada2871	nvme: check for a live controller in nvme_dev_open This is a much more sensible check than just the admin queue. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@rimbeg.me> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>	2017-10-27 09:05:22 +03:00
Christoph Hellwig	a6a5149b10	nvme: get rid of nvme_ctrl_list Use the core chrdev code to set up the link between the character device and the nvme controller. This allows us to get rid of the global list of all controllers, and also ensures that we have both a reference to the controller and the transport module before the open method of the character device is called. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sgi@grimberg.me> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.com>	2017-10-27 09:04:53 +03:00
Christoph Hellwig	d22524a478	nvme: switch controller refcounting to use struct device Instead of allocating a separate struct device for the character device handle embedd it into struct nvme_ctrl and use it for the main controller refcounting. This removes double refcounting and gets us an automatic reference for the character device operations. We keep ctrl->device as a pointer for now to avoid chaning printks all over, but in the future we could look into message printing helpers that take a controller structure similar to what other subsystems do. Note the delete_ctrl operation always already has a reference (either through sysfs due this change, or because every open file on the /dev/nvme-fabrics node has a refernece) when it is entered now, so we don't need to do the unless_zero variant there. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com>	2017-10-27 09:04:07 +03:00
Christoph Hellwig	c6424a90da	nvme: simplify nvme_open Now that we are protected against lookup vs free races for the namespace by using kref_get_unless_zero we don't need the hack of NULLing out the disk private data during removal. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>	2017-10-27 09:03:31 +03:00
Christoph Hellwig	2dd4122854	nvme: use kref_get_unless_zero in nvme_find_get_ns For kref_get_unless_zero to protect against lookup vs free races we need to use it in all places where we aren't guaranteed to already hold a reference. There is no such guarantee in nvme_find_get_ns, so switch to kref_get_unless_zero in this function. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>	2017-10-27 09:02:58 +03:00
Christoph Hellwig	9843f685ae	nvme: use ida_simple_{get,remove} for the controller instance Switch to the ida_simple_* helpers instead of opencoding them. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>	2017-10-20 12:16:57 +02:00
Sagi Grimberg	31b8446079	nvme: introduce nvme_reinit_tagset Move blk_mq_reinit_tagset from blk-mq to nvme core as the only user of it. Current transports that use it (rdma, fc) simply implement .reinit_request op. This patch does not change any functionality. Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>	2017-10-18 19:27:48 +02:00
Christoph Hellwig	761f2e1ed8	nvme: simplify compat_ioctl handling We can just use our normal ioctl handler for the compat case and remove the boilerplate code for it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Max Gurtovoy <maxg@mellanox.com>	2017-10-16 14:54:10 +02:00
Marc Olson	8ae4e4477d	nvme: update timeout module parameter type The underlying blk_mq_tag_set, and request timeout parameters support an unsigned int. Extend the size of the nvme module parameters for io and admin commands to match. Signed-off-by: Marc Olson <marcolso@amazon.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2017-10-04 09:43:09 +02:00
Sagi Grimberg	1a40d97288	nvme-core: Use nvme_wq to queue async events and fw activation async_event_work might race as it is executed from two different workqueues at the moment. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-09-25 12:42:11 -06:00
James Smart	0951338d96	nvme: allow timed-out ios to retry Currently the nvme_req_needs_retry() applies several checks to see if a retry is allowed. On of those is whether the current time has exceeded the start time of the io plus the timeout length. This check, if an io times out, means there is never a retry allowed for the io. Which means applications see the io failure. Remove this check and allow the io to timeout, like it does on other protocols, and retries to be made. On the FC transport, a frame can be lost for an individual io, and there may be no other errors that escalate for the connection/association. The io will timeout, which causes the transport to escalate into creating a new association, but the io that timed out, due to this retry logic, has already failed back to the application and things are hosed. Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-09-25 08:56:05 -06:00
James Smart	cd48282cc7	nvme: stop aer posting if controller state not live If an nvme async_event command completes, in most cases, a new async event is posted. However, if the controller enters a resetting or reconnecting state, there is nothing to block the scheduled work element from posting the async event again. Nor are there calls from the transport to stop async events when an association dies. In the case of FC, where the association is torn down, the aer must be aborted on the FC link and completes through the normal job completion path. Thus the terminated async event ends up being rescheduled even though the controller isn't in a valid state for the aer, and the reposting gets the transport into a partially torn down data structure. It's possible to hit the scenario on rdma, although much less likely due to an aer completing right as the association is terminated and as the association teardown reclaims the blk requests via nvme_cancel_request() so its immediate, not a link-related action like on FC. Fix by putting controller state checks in both the async event completion routine where it schedules the async event and in the async event work routine before it calls into the transport. It's effectively a "stop_async_events()" behavior. The transport, when it creates a new association with the subsystem will transition the state back to live and is already restarting the async event posting. Signed-off-by: James Smart <james.smart@broadcom.com> [hch: remove taking a lock over reading the controller state] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-09-25 08:56:05 -06:00
Christoph Hellwig	044a9df1a7	nvme-pci: implement the HMB entry number and size limitations Adds support for the new Host Memory Buffer Minimum Descriptor Entry Size and Host Memory Maximum Descriptors Entries field that were added in TP 4002 HMB Enhancements. These allow the controller to advertise limits for the usual number of segments in the host memory buffer, as well as a minimum usable per-segment size. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com>	2017-09-11 12:29:40 -04:00
Christoph Hellwig	608cc4b14a	nvme: fix lightnvm check nvme_nvm_ns_supported assumes every device is a pci_dev, which leads to reading an incorrect field, or possible even a dereference of unallocated memory for fabrics controllers. Fix this by introducing a quirk for lighnvm capable devices instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Matias Bjørling <mb@lightnvm.io> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>	2017-09-11 12:29:36 -04:00
Keith Busch	63263d60e0	nvme: Use metadata for passthrough commands The ioctls' struct allows the user to provide a metadata address and length for a passthrough command. This patch uses these values that were previously ignored and deletes the now unused wrapper function. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2017-08-30 14:59:35 +02:00
Keith Busch	485783ca63	nvme: Make nvme user functions static These functions are used only locally in the nvme core. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2017-08-30 14:59:34 +02:00
Christoph Hellwig	1cad65620f	nvme: factor metadata handling out of __nvme_submit_user_cmd Keep the metadata code in a separate helper instead of making the main function more complicated. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2017-08-30 14:59:30 +02:00
Christoph Hellwig	1d5df6af8c	nvme: don't blindly overwrite identifiers on disk revalidate Instead validate that these identifiers do not change, as that is prohibited by the specification. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com>	2017-08-29 10:23:04 +02:00
Christoph Hellwig	cdbff4f26b	nvme: remove nvme_revalidate_ns The function is used in two places, and the shared code for those will diverge later in this series. Instead factor out a new helper to get the ids for a namespace, simplify the calling conventions for nvme_identify_ns and just open code the sequence. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>	2017-08-29 10:22:39 +02:00
Christoph Hellwig	0a72bbba49	nvme: allow calling nvme_change_ctrl_state from irq context Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>	2017-08-29 10:22:23 +02:00
Christoph Hellwig	a751da3394	nvme: report more detailed status codes to the block layer Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>	2017-08-29 10:22:18 +02:00
Martin K. Petersen	07fbd32a6b	nvme: honor RTD3 Entry Latency for shutdowns If an NVMe controller reports RTD3 Entry Latency larger than shutdown_timeout, up to a maximum of 60 seconds, use that value to set the shutdown timer. Otherwise fall back to the module parameter which defaults to 5 seconds. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> [hch: removed do_div, made transition time local scope] Signed-off-by: Christoph Hellwig <hch@lst.de>	2017-08-28 23:00:44 +03:00
Max Gurtovoy	60b43f627a	nvme: rename AMS symbolic constants to fit specification Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2017-08-28 23:00:39 +03:00
Sagi Grimberg	caaa15c509	nvme: fix identify namespace logging Use ctrl->device and lose the func name. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>	2017-08-28 23:00:38 +03:00
Jon Derrick	dbf86b3900	nvme: add support for NVMe 1.3 Timestamp Feature NVME's Timestamp feature allows controllers to be aware of the epoch time in milliseconds. This patch adds the set features hook for various transports through the identify path, so that resets and resumes can update the controller as necessary. Signed-off-by: Jon Derrick <jonathan.derrick@intel.com> [hch: rebased on top of nvme-4.13 error handling changes, changed nvme_configure_timestamp to return the status] Signed-off-by: Christoph Hellwig <hch@lst.de>	2017-08-28 21:38:18 +02:00
Arnav Dawn	62346eaeb2	nvme: define NVME_NSID_ALL Define the constant "0xffffffff" (used as nsid for all namespaces) as NVME_NSID_ALL. Signed-off-by: Arnav Dawn <a.dawn@samsung.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>	2017-08-28 21:38:17 +02:00
Arnav Dawn	b6dccf7fae	nvme: add support for FW activation without reset This patch adds support for handling Fw activation without reset On completion of FW-activation-starting AER, all queues are paused till CSTS.PP is cleared or timed out (exceeds max time for fw activtion MTFA). If device fails to clear CSTS.PP within MTFA, driver issues reset controller. Signed-off-by: Arnav Dawn <a.dawn@samsung.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>	2017-08-28 21:38:17 +02:00
Jens Axboe	cd996fb47c	Linux 4.13-rc7 -----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJZo2HiAAoJEHm+PkMAQRiG3OcIAJqSeVK2uQ/QhmqFN1ExYay4 bdTjSTtSk7GH6PxI2C0cqfZvsxOUU7ICDHG8bYM1LA0S0SxfOtoFhHGKc/BcFLX8 MiKJWlF51ZbX0mkIEpKF+C8pRrXPgSqtk3N450/k2BzG9qCZSM93A2NCOB7v9T9w XOBUIYHqfTS2tdmCinjwu8Ls+w8oPOGH1gLjxZyGnBlg4lTqHMcUufmHeVEAh11d giGByqqqXH69kGD1HNC7H6quzXN9rz4n0gEwEG0mIhfkJ98b+ESSWwSEXXypOAQD QT5/6+2YizXf5DPCqR46xasQCPjRsS6Sv0cF2cntW2PEAb4jBjhx5gTFlJcoOC8= =efWJ -----END PGP SIGNATURE----- Merge tag 'v4.13-rc7' into for-4.14/block-postmerge Linux 4.13-rc7 Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-08-28 13:00:44 -06:00
Christoph Hellwig	74d46992e0	block: replace bi_bdev with a gendisk pointer and partitions index This way we don't need a block_device structure to submit I/O. The block_device has different life time rules from the gendisk and request_queue and is usually only available when the block device node is open. Other callers need to explicitly create one (e.g. the lightnvm passthrough code, or the new nvme multipathing code). For the actual I/O path all that we need is the gendisk, which exists once per block device. But given that the block layer also does partition remapping we additionally need a partition index, which is used for said remapping in generic_make_request. Note that all the block drivers generally want request_queue or sometimes the gendisk, so this removes a layer of indirection all over the stack. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-08-23 12:49:55 -06:00
Kwan (Hingkwan) Huen-SSI	a082b42628	nvme: fix directive command numd calculation The numd field of directive receive command takes number of dwords to transfer. This fix has the correct calculation for numd. Signed-off-by: Kwan (Hingkwan) Huen-SSI <kwan.huen@samsung.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Christoph Hellwig <hch@lst.de>	2017-08-10 19:53:44 +02:00
Keith Busch	634b832590	nvme: fix nvme reset command timeout handling We need to return an error if a timeout occurs on any NVMe command during initialization. Without this, the nvme reset work will be stuck. A timeout will have a negative error code, meaning we need to stop initializing the controller. All postitive returns mean the controller is still usable. bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=196325 Signed-off-by: Keith Busch <keith.busch@intel.com> Cc: Martin Peres <martin.peres@intel.com> [jth consolidated cleanup path ] Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>	2017-08-10 19:53:38 +02:00
Martin Wilck	758f373558	nvme: strip trailing 0-bytes in wwid_show Some broken controllers (such as earlier Linux targets) pad model or serial fields with 0-bytes rather than spaces. The NVMe spec disallows 0 bytes in "ASCII" fields. Thus strip trailing 0-bytes, too. Also make sure that we get no underflow for pathological input. Signed-off-by: Martin Wilck <mwilck@suse.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2017-08-10 10:43:31 +02:00
Scott Bauer	7dd1ab163c	nvme: validate admin queue before unquiesce With a misbehaving controller it's possible we'll never enter the live state and create an admin queue. When we fail out of reset work it's possible we failed out early enough without setting up the admin queue. We tear down queues after a failed reset, but needed to do some more sanitization. Fixes `443bd90f2c`: "nvme: host: unquiesce queue in nvme_kill_queues()" [ 189.650995] nvme nvme1: pci function 0000:0b:00.0 [ 317.680055] nvme nvme0: Device not ready; aborting reset [ 317.680183] nvme nvme0: Removing after probe failure status: -19 [ 317.681258] kasan: GPF could be caused by NULL-ptr deref or user memory access [ 317.681397] general protection fault: 0000 [#1] SMP KASAN [ 317.682984] CPU: 3 PID: 477 Comm: kworker/3:2 Not tainted 4.13.0-rc1+ #5 [ 317.683112] Hardware name: Gigabyte Technology Co., Ltd. Z170X-UD5/Z170X-UD5-CF, BIOS F5 03/07/2016 [ 317.683284] Workqueue: events nvme_remove_dead_ctrl_work [nvme] [ 317.683398] task: ffff8803b0990000 task.stack: ffff8803c2ef0000 [ 317.683516] RIP: 0010:blk_mq_unquiesce_queue+0x2b/0xa0 [ 317.683614] RSP: 0018:ffff8803c2ef7d40 EFLAGS: 00010282 [ 317.683716] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 1ffff1006fbdcde3 [ 317.683847] RDX: 0000000000000038 RSI: 1ffff1006f5a9245 RDI: 0000000000000000 [ 317.683978] RBP: ffff8803c2ef7d58 R08: 1ffff1007bcdc974 R09: 0000000000000000 [ 317.684108] R10: 1ffff1007bcdc975 R11: 0000000000000000 R12: 00000000000001c0 [ 317.684239] R13: ffff88037ad49228 R14: ffff88037ad492d0 R15: ffff88037ad492e0 [ 317.684371] FS: 0000000000000000(0000) GS:ffff8803de6c0000(0000) knlGS:0000000000000000 [ 317.684519] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 317.684627] CR2: 0000002d1860c000 CR3: 000000045b40d000 CR4: 00000000003406e0 [ 317.684758] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 317.684888] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 317.685018] Call Trace: [ 317.685084] nvme_kill_queues+0x4d/0x170 [nvme_core] [ 317.685185] nvme_remove_dead_ctrl_work+0x3a/0x90 [nvme] [ 317.685289] process_one_work+0x771/0x1170 [ 317.685372] worker_thread+0xde/0x11e0 [ 317.685452] ? pci_mmcfg_check_reserved+0x110/0x110 [ 317.685550] kthread+0x2d3/0x3d0 [ 317.685617] ? process_one_work+0x1170/0x1170 [ 317.685704] ? kthread_create_on_node+0xc0/0xc0 [ 317.685785] ret_from_fork+0x25/0x30 [ 317.685798] Code: 0f 1f 44 00 00 55 48 b8 00 00 00 00 00 fc ff df 48 89 e5 41 54 4c 8d a7 c0 01 00 00 53 48 89 fb 4c 89 e2 48 c1 ea 03 48 83 ec 08 <80> 3c 02 00 75 50 48 8b bb c0 01 00 00 e8 33 8a f9 00 0f ba b3 [ 317.685872] RIP: blk_mq_unquiesce_queue+0x2b/0xa0 RSP: ffff8803c2ef7d40 [ 317.685908] ---[ end trace a3f8704150b1e8b4 ]--- Signed-off-by: Scott Bauer <scott.bauer@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2017-07-26 17:41:41 +02:00
Johannes Thumshirn	6484f5d16f	nvme: also provide a UUID in the WWID sysfs attribute The WWID sysfs attribute can provide multiple means of a World Wide ID for a NVMe device. It can either be a NGUID, a EUI-64 or a concatenation of VID, Serial Number, Model and the Namespace ID in this order of preference. If the target also sends us a UUID use the UUID for identification and give it the highest priority. This eases generation of /dev/disk/by-* symlinks. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>	2017-07-25 17:58:22 +02:00
Christoph Hellwig	dc1a0afbac	nvme: fix byte swapping in the streams code Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-07-20 08:41:56 -06:00
Sagi Grimberg	d09f2b45f3	nvme: split nvme_uninit_ctrl into stop and uninit Usually before we teardown the controller we want to: 1. complete/cancel any ctrl inflight works 2. remove ctrl namespaces (only for removal though, resets shouldn't remove any namespaces). but we do not want to destroy the controller device as we might use it for logging during the teardown stage. This patch adds nvme_start_ctrl() which queues inflight controller works (aen, ns scan, queue start and keep-alive if kato is set) and nvme_stop_ctrl() which cancels the works namespace removal is left to the callers to handle. Move nvme_uninit_ctrl after we are done with the controller device. Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>	2017-07-06 09:49:42 +03:00
Sagi Grimberg	8d7b8fafad	nvme: kick requeue list when requeueing a request instead of when starting the queues When we requeue a request, we can always insert the request back to the scheduler instead of doing it when restarting the queues and kicking the requeue work, so get rid of the requeue kick in nvme (core and drivers). Also, now there is no need start hw queues in nvme_kill_queues We don't stop the hw queues anymore, so no need to start them. Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>	2017-07-06 09:48:59 +03:00
Christoph Hellwig	49d3d50b0d	nvme: simplify nvme_dev_attrs_are_visible Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-06-28 08:14:13 -06:00
Christoph Hellwig	180de00700	nvme: read the subsystem NQN from Identify Controller NVMe 1.2.1 or later requires controllers to provide a subsystem NQN in the Identify controller data structures. Use this NQN for the subsysnqn sysfs attribute by storing it in the nvme_ctrl structure after verifying it. For older controllers we generate a "fake" NQN per non-normative text in the NVMe 1.3 spec. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-06-28 08:14:13 -06:00
Kai-Heng Feng	76a5af8417	nvme: explicitly disable APST on quirked devices A user reports APST is enabled, even when the NVMe is quirked or with option "default_ps_max_latency_us=0". The current logic will not set APST if the device is quirked. But the NVMe in question will enable APST automatically. Separate the logic "apst is supported" and "to enable apst", so we can use the latter one to explicitly disable APST at initialiaztion. BugLink: https://bugs.launchpad.net/bugs/1699004 Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Reviewed-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-06-28 08:14:13 -06:00
Keith Busch	3f7f25a910	nvme: Remove SCSI translations The SCSI-to-NVMe translations were added to assist storage applications utilizing SG_IO transitioning to NVMe. It was always recommended, however, to use native NVMe for device management as too much is lost in translation and the maintenance burden in keeping this kludgey layer around has been neglected such that much of the translations are completely broken. This patch removes SG_IO handling from NVMe to avoid any confusion regarding maintenance support for this interface. The config option for NVMe SCSI emulation has been disabled by default since 4.5. The driver has supported native nvme user commands since the beginning, and native tooling is publicly available for use or as reference for anyone writing their own tools, so there's no excuse for hanging onto a broken crutch. Signed-off-by: Keith Busch <keith.busch@intel.com> Acked-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Guan Junxiong <guanjunxiong@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-06-28 08:14:13 -06:00
Jens Axboe	f5d1184062	nvme: add support for streams and directives This adds support for Directives in NVMe, particular for the Streams directive. Support for Directives is a new feature in NVMe 1.3. It allows a user to pass in information about where to store the data, so that it the device can do so most effiently. If an application is managing and writing data with different life times, mixing differently retentioned data onto the same locations on flash can cause write amplification to grow. This, in turn, will reduce performance and life time of the device. Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-06-27 12:05:56 -06:00
Ming Lei	443bd90f2c	nvme: host: unquiesce queue in nvme_kill_queues() When nvme_kill_queues() is run, queues may be in quiesced state, so we forcibly unquiesce queues to avoid blocking dispatch, and I/O hang can be avoided in remove path. Peviously we use blk_mq_start_stopped_hw_queues() as counterpart of blk_mq_quiesce_queue(), now we have introduced blk_mq_unquiesce_queue(), so use it explicitly. Cc: linux-nvme@lists.infradead.org Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-06-18 20:52:58 -06:00
Ming Lei	f660174e8b	blk-mq: use the introduced blk_mq_unquiesce_queue() blk_mq_unquiesce_queue() is used for unquiescing the queue explicitly, so replace blk_mq_start_stopped_hw_queues() with it. For the scsi part, this patch takes Bart's suggestion to switch to block quiesce/unquiesce API completely. Cc: linux-nvme@lists.infradead.org Cc: linux-scsi@vger.kernel.org Cc: dm-devel@redhat.com Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-06-18 14:20:04 -06:00
Scott Bauer	6b8190d61a	nvme: implement NS Optimal IO Boundary from 1.3 Spec The NVMe 1.3 spec introduces Namespace Optimal IO Boundaries (NOIOB), which standardizes the stripe mechanism we currently have quirks for. This patch implements the necessary logic to handle this new feature. Signed-off-by: Scott Bauer <scott.bauer@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2017-06-16 08:25:54 +02:00
Sagi Grimberg	8fa611213d	nvme: don't hard code size of struct t10_pi_tuple Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>	2017-06-15 15:50:18 +02:00
Christoph Hellwig	39bdc5901f	nvme: no need to wait for the reset when keepalive fails We don't need to wait for the reset from the delayed work item that is kicked off when we don't get a keepalive. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>	2017-06-15 15:48:45 +02:00

1 2 3 4 5

213 Commits