nvme: add error message on mismatching controller ids

We've seen a few devices that return different controller id's to
the Fabric Connect command vs the Identify(controller) command. It's
currently hard to identify this failure by existing error messages. It
comes across as a (re)connect attempt in the transport that fails with
a -22 (-EINVAL) status. The issue is compounded by older kernels not
having the controller id check or had the identify command overwrite the
fabrics controller id value before it checked. Both resulted in cases
where the devices appeared fine until more recent kernels.

Clarify the reject by adding an error message on controller id mismatches.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
This commit is contained in:
James Smart 2019-11-21 09:58:10 -08:00 committed by Keith Busch
parent 863fbae929
commit a8157ff360

View File

@ -2862,6 +2862,10 @@ int nvme_init_identify(struct nvme_ctrl *ctrl)
* admin connect
*/
if (ctrl->cntlid != le16_to_cpu(id->cntlid)) {
dev_err(ctrl->device,
"Mismatching cntlid: Connect %u vs Identify "
"%u, rejecting\n",
ctrl->cntlid, le16_to_cpu(id->cntlid));
ret = -EINVAL;
goto out_free;
}