linux

iv/linux

Author	SHA1	Message	Date
Luo Jiaxing	e8a4d0daae	scsi: hisi_sas: Speed up error handling when internal abort timeout occurs If an internal task abort timeout occurs, the controller has developed a fault, and needs to be reset to be recovered. When this occurs during error handling, the current policy is to allow error handling to continue, and the inevitable nexus ha reset will handle the required reset. However various steps of error handling need to taken before this happens. These also involve some level of HW interaction, which will also fail with various timeouts. Speed up this process by recording a HW fault bit for an internal abort timeout - when this is set, just automatically error any HW interaction, and essentially go straight to clear nexus ha (to reset the controller). Link: https://lore.kernel.org/r/1623058179-80434-6-git-send-email-john.garry@huawei.com Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-06-09 23:21:52 -04:00
Luo Jiaxing	63ece9eb35	scsi: hisi_sas: Reset controller for internal abort timeout If an internal task abort timeout occurs, the controller has developed a fault, and needs to be reset to be recovered. However if a timeout occurs during SCSI error handling, issuing a controller reset immediately may conflict with the error handling. To handle internal abort in these two scenarios, only queue the reset when not in an error handling function. In the case of a timeout during error handling, do nothing and rely on the inevitable ha nexus reset to reset the controller. Link: https://lore.kernel.org/r/1623058179-80434-5-git-send-email-john.garry@huawei.com Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-06-09 23:21:51 -04:00
Luo Jiaxing	2f12a49951	scsi: hisi_sas: Include HZ in timer macros Include HZ in timer macros to make the code more concise. Link: https://lore.kernel.org/r/1623058179-80434-4-git-send-email-john.garry@huawei.com Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-06-09 23:21:51 -04:00
Luo Jiaxing	0f75733991	scsi: hisi_sas: Run I_T nexus resets in parallel for clear nexus reset For a clear nexus reset operation, the I_T nexus resets are executed serially for each device. For devices attached through an expander, this may take 2s per device; so, in total, could take a long time. Reduce the total time by running the I_T nexus resets in parallel through async operations. Link: https://lore.kernel.org/r/1623058179-80434-3-git-send-email-john.garry@huawei.com Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-06-09 23:21:51 -04:00
Luo Jiaxing	366da0da1f	scsi: hisi_sas: Put a limit of link reset retries If an OOB event is received but the phy still fails to come up, a link reset will be issued repeatedly at an interval of 20s until the phy comes up. Set a limit for link reset issue retries to avoid printing the timeout message endlessly. Link: https://lore.kernel.org/r/1623058179-80434-2-git-send-email-john.garry@huawei.com Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-06-09 23:21:51 -04:00
Mike Christie	d1f2ce7763	scsi: qedi: Fix host removal with running sessions qedi_clear_session_ctx() could race with the in-kernel or userspace driven recovery/removal and we could access a NULL conn or do a double free. We should be using iscsi_host_remove() to start the removal process from the driver. It will start the in-kernel recovery and notify userspace that the driver's scsi_hosts are being removed. iscsid will then drive the session removal like is done when the logout command is run. When the sessions are removed, iscsi_host_remove() will return so qedi can finish knowing there are no running sessions and no new sessions will be allowed. This also fixes an issue where we check for a NULL conn after already accessing it introduced in commit 27e986289e73 ("scsi: iscsi: Drop suspend calls from ep_disconnect") by just removing the function completely. Link: https://lore.kernel.org/r/20210609192709.5094-1-michael.christie@oracle.com Fixes: 27e986289e73 ("scsi: iscsi: Drop suspend calls from ep_disconnect") Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-06-09 23:06:57 -04:00
Dan Carpenter	2938bedd0e	scsi: mpi3mr: Fix error handling in mpi3mr_setup_isr() The pci_alloc_irq_vectors_affinity() function returns negative error codes or it returns a number between the minimum vectors (1 in this case) and max_vectors. It won't return zero. Because "i" is a u16 then the error handling won't work. And also if it did work the error code was not set. Really "max_vectors" can be an int as well because we're doing a min_t() on int type. The other change is that it's better to remove unnecessary initialization so that static checkers can warn us if there are ever uninitialized variable bugs introduced in the future. I changed the error code from -1 (-EPERM) if the kmalloc() failed to -ENOMEM. And on success path I changed it from "return retval;" to "return 0;" which shouldn't affect the compiled code but makes it more readable. Link: https://lore.kernel.org/r/YMCJcnmSI4kOIyv/@mwanda Fixes: 824a156633df ("scsi: mpi3mr: Base driver code") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-06-09 23:01:24 -04:00
Dan Carpenter	d46bdecd9f	scsi: mpi3mr: Delete unnecessary NULL check The "mrioc->intr_info" pointer can't be NULL, but if it could then the second iteration through the loop would Oops. Let's delete the confusing and impossible NULL check. Link: https://lore.kernel.org/r/YMCJKgykDYtyvY44@mwanda Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-06-09 23:01:24 -04:00
Tomas Henzl	d3d61f9c8c	scsi: mpi3mr: Fix a double free Fix a double free, scsi_tgt_priv_data will be freed in mpi3mr_target_destroy() so remove the kfree() from mpi3mr_target_alloc(). I've also removed few unneeded initialisations. Link: https://lore.kernel.org/r/20210608145712.16386-1-thenzl@redhat.com Acked-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-06-09 22:58:24 -04:00
Can Guo	eb783bb8bb	scsi: ufs: core: Fix a possible use before initialization case In ufshcd_exec_dev_cmd(), if error happens before lrpb is initialized, then we should bail out instead of letting trace record the error. Link: https://lore.kernel.org/r/1623227044-22635-1-git-send-email-cang@codeaurora.org Fixes: a45f937110fa ("scsi: ufs: Optimize host lock on transfer requests send/compl paths") Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Can Guo <cang@codeaurora.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-06-09 22:56:47 -04:00
Bean Huo	105424895c	scsi: ufs: core: Use UPIU query trace in devman_upiu_cmd() Since devman_upiu_cmd() is not COMMAND UPIU, and doesn't have CDB, it is better to use UPIU query trace, which provides more helpful information for issue troubleshooting. Link: https://lore.kernel.org/r/20210531104308.391842-5-huobean@gmail.com Reviewed-by: Can Guo <cang@codeaurora.org> Signed-off-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-06-07 22:36:20 -04:00
Bean Huo	44b5de3635	scsi: ufs: core: Capture command trace only for the cmd != NULL case For the query request, we already have query_trace, but in ufshcd_send_command(), there will add two more redundant traces. Since lrbp->cmd is NULL in the query request, the two trace events below provide nothing except the tag and DB. Instead of letting them take up the limited trace ring buffer, it’s better not to print these traces in case of cmd == NULL. ufshcd_command: send_req: ff3b0000.ufs: tag: 28, DB: 0x0, size: -1, IS: 0, LBA: 18446744073709551615, opcode: 0x0 (0x0), group_id: 0x0 ufshcd_command: dev_complete: ff3b0000.ufs: tag: 28, DB: 0x0, size: -1, IS: 0, LBA: 18446744073709551615, opcode: 0x0 (0x0), group_id: 0x0 Link: https://lore.kernel.org/r/20210531104308.391842-4-huobean@gmail.com Reviewed-by: Can Guo <cang@codeaurora.org> Signed-off-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-06-07 22:35:58 -04:00
Bean Huo	89ac2c3b28	scsi: ufs: core: Let UPIU completion trace print RSP UPIU header The current UPIU completion event trace still prints the COMMAND UPIU header, rather than the RSP UPIU header. This makes UPIU command trace useless in problem shooting in case we receive a trace log from the customer/field. There are two important fields in RSP UPIU: 1. The response field, which indicates the UFS defined overall success or failure of the series of Command, Data and RESPONSE UPIU’s that make up the execution of a task. 2. The Status field, which contains the command set specific status for a specific command issued by the initiator device. Before this commit, the UPIU paired trace events: ufshcd_upiu: send_req: fe3b0000.ufs: HDR:01 20 00 1c 00 00 00 00 00 00 00 00, CDB:3b e1 00 00 00 00 00 00 30 00 00 00 00 00 00 00 ufshcd_upiu: complete_rsp: fe3b0000.ufs: HDR:01 20 00 1c 00 00 00 00 00 00 00 00, CDB:3b e1 00 00 00 00 00 00 30 00 00 00 00 00 00 00 After this commit: ufshcd_upiu: send_req: fe3b0000.ufs: HDR:01 20 00 1c 00 00 00 00 00 00 00 00, CDB:3b e1 00 00 00 00 00 00 30 00 00 00 00 00 00 00 ufshcd_upiu: complete_rsp: fe3b0000.ufs: HDR:21 00 00 1c 00 00 00 00 00 00 00 00, CDB:3b e1 00 00 00 00 00 00 30 00 00 00 00 00 00 00 Link: https://lore.kernel.org/r/20210531104308.391842-3-huobean@gmail.com Reviewed-by: Can Guo <cang@codeaurora.org> Signed-off-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-06-07 22:35:58 -04:00
Bean Huo	04c073feb1	scsi: ufs: core: Clean up ufshcd_add_command_trace() To consistent with trace event print, convert the value of the variable 'lba' from a block layer sector address to a logical block adress. Link: https://lore.kernel.org/r/20210531104308.391842-2-huobean@gmail.com Suggested-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Can Guo <cang@codeaurora.org> Signed-off-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-06-07 22:35:58 -04:00
Keoseong Park	3242490233	scsi: ufs: core: Remove repeated word in comment Remove repeated word "for" in comment. Link: https://lore.kernel.org/r/1891546521.01622777101796.JavaMail.epsvc@epcpadp3 Signed-off-by: Keoseong Park <keosung.park@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-06-07 22:24:47 -04:00
Gustavo A. R. Silva	7b8a49881b	scsi: mpi3mr: Fix fall-through warning for Clang In preparation to enable -Wimplicit-fallthrough for Clang, fix a fall-through warning by explicitly adding a break statement instead of just letting the code fall through to the next case. Link: https://github.com/KSPP/linux/issues/115 Link: https://lore.kernel.org/r/20210604023530.GA180997@embeddedor Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-06-07 22:23:46 -04:00
Gustavo A. R. Silva	61f4f11b48	scsi: NCR5380: Fix fall-through warning for Clang In preparation to enable -Wimplicit-fallthrough for Clang, fix a fall-through warning by replacing a /* fallthrough */ comment with the new pseudo-keyword macro fallthrough; Link: https://github.com/KSPP/linux/issues/115 Link: https://lore.kernel.org/r/20210604022752.GA168289@embeddedor Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-06-07 22:22:53 -04:00
Can Guo	6f71517296	scsi: ufs: Utilize Transfer Request List Completion Notification Register By reading the UTP Transfer Request List Completion Notification Register, which is added in UFSHCI Ver 3.0, SW can easily get the compeleted transfer requests. Thus, SW can get rid of host lock, which is used to synchronize the tr_doorbell and outstanding_reqs, on transfer requests dispatch and completion paths. This can further benefit random read/write performance. Link: https://lore.kernel.org/r/1621845419-14194-4-git-send-email-cang@codeaurora.org Cc: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Bean Huo <beanhuo@micron.com> Co-developed-by: Asutosh Das <asutoshd@codeaurora.org> Signed-off-by: Asutosh Das <asutoshd@codeaurora.org> Signed-off-by: Can Guo <cang@codeaurora.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-06-07 22:18:03 -04:00
Can Guo	a45f937110	scsi: ufs: Optimize host lock on transfer requests send/compl paths Current UFS IRQ handler is completely wrapped by host lock, and because ufshcd_send_command() is also protected by host lock, when IRQ handler fires, not only the CPU running the IRQ handler cannot send new requests, the rest CPUs can neither. Move the host lock wrapping the IRQ handler into specific branches, i.e., ufshcd_uic_cmd_compl(), ufshcd_check_errors(), ufshcd_tmc_handler() and ufshcd_transfer_req_compl(). Meanwhile, to further reduce occpuation of host lock in ufshcd_transfer_req_compl(), host lock is no longer required to call __ufshcd_transfer_req_compl(). As per test, the optimization can bring considerable gain to random read/write performance. Link: https://lore.kernel.org/r/1621845419-14194-3-git-send-email-cang@codeaurora.org Cc: Stanley Chu <stanley.chu@mediatek.com> Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Co-developed-by: Asutosh Das <asutoshd@codeaurora.org> Signed-off-by: Asutosh Das <asutoshd@codeaurora.org> Signed-off-by: Can Guo <cang@codeaurora.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-06-07 22:18:03 -04:00
Can Guo	1cca0c3fdc	scsi: ufs: Remove a redundant command completion logic in error handler ufshcd_host_reset_and_restore() anyways completes all pending requests before starts re-probing, so there is no need to complete the command on the highest bit in tr_doorbell in advance. Link: https://lore.kernel.org/r/1621845419-14194-2-git-send-email-cang@codeaurora.org Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Can Guo <cang@codeaurora.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-06-07 22:18:03 -04:00
Dan Carpenter	80927822e8	scsi: scsi_dh_alua: Fix signedness bug in alua_rtpg() The "retval" variable needs to be signed for the error handling to work. Link: https://lore.kernel.org/r/YLjMEAFNxOas1mIp@mwanda Fixes: 7e26e3ea0287 ("scsi: scsi_dh_alua: Check for negative result value") Reviewed-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-06-07 21:48:16 -04:00
Avri Altman	8b1afb7ab0	scsi: ufs: core: Remove irrelevant reference to non-existing doc Remove all references to the description of __ufshcd_wl_{suspend,resume} as no such description exist. Fixes: b294ff3e3449 (scsi: ufs: core: Enable power management for wlun) Link: https://lore.kernel.org/r/20210603122209.635799-1-avri.altman@wdc.com Signed-off-by: Avri Altman <avri.altman@wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-06-07 21:46:31 -04:00
Kees Cook	ebab8e09a0	scsi: fcoe: Statically initialize flogi_maddr In preparation for FORTIFY_SOURCE performing compile-time and run-time field bounds checking for memcpy() avoid using an inline const buffer argument and instead just statically initialize the destination array directly. Link: https://lore.kernel.org/r/20210602180000.3326448-1-keescook@chromium.org Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-06-07 21:30:09 -04:00
Saurav Kashyap	1b67f3d74e	scsi: qedf: Update the max_id value in host structure host->max_id defines the maximum target id that the SCSI midlayer will attempt to manually scan. The default is 8. Update the value to the max sessions the driver supports. [mkp: applied by hand] Link: https://lore.kernel.org/r/20210602104653.17278-1-jhasan@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Javed Hasan <jhasan@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-06-07 21:26:24 -04:00
Bart Van Assche	62af0ee94b	scsi: core: Change the type of the second argument of scsi_host_complete_all_commands() Allow the compiler to verify the type of the second argument passed to scsi_host_complete_all_commands(). Link: https://lore.kernel.org/r/20210524025457.11299-4-bvanassche@acm.org Cc: Hannes Reinecke <hare@suse.com> Cc: John Garry <john.garry@huawei.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-06-02 23:09:39 -04:00
Bart Van Assche	149d0e489e	scsi: core: Introduce enums for the SAM and host status codes Make it possible for the compiler to verify whether SAM and host status codes are used correctly. [mkp: resolve conflicts with Hannes' SCSI result series] Link: https://lore.kernel.org/r/20210524025457.11299-3-bvanassche@acm.org Cc: Hannes Reinecke <hare@suse.com> Reviewed-by: John Garry <john.garry@huawei.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-06-02 23:09:39 -04:00
Bart Van Assche	d377f415dd	scsi: libsas: Introduce more SAM status code aliases in enum exec_status This patch prepares for converting SAM status codes into an enum. Without this patch converting SAM status codes into an enumeration type would trigger complaints about enum type mismatches for the SAS code. Link: https://lore.kernel.org/r/20210524025457.11299-2-bvanassche@acm.org Cc: Hannes Reinecke <hare@suse.com> Cc: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Cc: Jason Yan <yanaijie@huawei.com> Reviewed-by: John Garry <john.garry@huawei.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-06-02 16:10:46 -04:00
Martin K. Petersen	1ff28f229b	Merge branch '5.14/scsi-result' into 5.14/scsi-staging Include Hannes' SCSI command result rework in the staging branch. [mkp: remove DRIVER_SENSE from mpi3mr] Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-06-02 01:37:04 -04:00
Mike Christie	ed1b86ba0f	scsi: qedi: Wake up if cmd_cleanup_req is set If we got a response then we should always wake up the conn. For both the cmd_cleanup_req == 0 or cmd_cleanup_req > 0, we shouldn't dig into iscsi_itt_to_task because we don't know what the upper layers are doing. We can also remove the qedi_clear_task_idx call here because once we signal success libiscsi will loop over the affected commands and end up calling the cleanup_task callout which will release it. Link: https://lore.kernel.org/r/20210525181821.7617-29-michael.christie@oracle.com Reviewed-by: Manish Rangankar <mrangankar@marvell.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-06-02 01:28:23 -04:00
Mike Christie	b40f3894e3	scsi: qedi: Complete TMF works before disconnect We need to make sure that abort and reset completion work has completed before ep_disconnect returns. After ep_disconnect we can't manipulate cmds because libiscsi will call conn_stop and take onwership. We are trying to make sure abort work and reset completion work has completed before we do the cmd clean up in ep_disconnect. The problem is that: 1. the work function sets the QEDI_CONN_FW_CLEANUP bit, so if the work was still pending we would not see the bit set. We need to do this before the work is queued. 2. If we had multiple works queued then we could break from the loop in qedi_ep_disconnect early because when abort work 1 completes it could clear QEDI_CONN_FW_CLEANUP. qedi_ep_disconnect could then see that before work 2 has run. 3. A TMF reset completion work could run after ep_disconnect starts cleaning up cmds via qedi_clearsq. ep_disconnect's call to qedi_clearsq -> qedi_cleanup_all_io would might think it's done cleaning up cmds, but the reset completion work could still be running. We then return from ep_disconnect while still doing cleanup. This replaces the bit with a counter to track the number of queued TMF works, and adds a bool to prevent new works from starting from the completion path once a ep_disconnect starts. Link: https://lore.kernel.org/r/20210525181821.7617-28-michael.christie@oracle.com Reviewed-by: Manish Rangankar <mrangankar@marvell.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-06-02 01:28:23 -04:00
Mike Christie	60a0d379f1	scsi: qedi: Pass send_iscsi_tmf task to abort qedi_abort_work knows what task to abort so just pass it to send_iscsi_tmf. Link: https://lore.kernel.org/r/20210525181821.7617-27-michael.christie@oracle.com Reviewed-by: Manish Rangankar <mrangankar@marvell.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-06-02 01:28:23 -04:00
Mike Christie	0c72191da6	scsi: qedi: Fix cleanup session block/unblock use Drivers shouldn't be calling block/unblock session for cmd cleanup because the functions can change the session state from under libiscsi. This adds a new a driver level bit so it can block all I/O the host while it drains the card. Link: https://lore.kernel.org/r/20210525181821.7617-26-michael.christie@oracle.com Reviewed-by: Manish Rangankar <mrangankar@marvell.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-06-02 01:28:23 -04:00
Mike Christie	2819b4ae28	scsi: qedi: Fix TMF session block/unblock use Drivers shouldn't be calling block/unblock session for tmf handling because the functions can change the session state from under libiscsi. iscsi_queuecommand's call to iscsi_prep_scsi_cmd_pdu-> iscsi_check_tmf_restrictions will prevent new cmds from being sent to qedi after we've started handling a TMF. So we don't need to try and block it in the driver, and we can remove these block calls. Link: https://lore.kernel.org/r/20210525181821.7617-25-michael.christie@oracle.com Reviewed-by: Manish Rangankar <mrangankar@marvell.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-06-02 01:28:23 -04:00
Mike Christie	140d63b73f	scsi: qedi: Use GFP_NOIO for TMF allocation We run from a workqueue with no locks held so use GFP_NOIO. Link: https://lore.kernel.org/r/20210525181821.7617-24-michael.christie@oracle.com Reviewed-by: Manish Rangankar <mrangankar@marvell.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-06-02 01:28:22 -04:00
Mike Christie	f7eea75262	scsi: qedi: Fix TMF tid allocation qedi_iscsi_abort_work and qedi_tmf_work both allocate a tid then call qedi_send_iscsi_tmf which also allocates a tid. This removes the tid allocation from the callers. Link: https://lore.kernel.org/r/20210525181821.7617-23-michael.christie@oracle.com Reviewed-by: Manish Rangankar <mrangankar@marvell.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-06-02 01:28:22 -04:00
Mike Christie	5b04d050cd	scsi: qedi: Fix use after free during abort cleanup If qedi_tmf_work's qedi_wait_for_cleanup_request call times out we will also force the clean up of the qedi_work_map but qedi_process_cmd_cleanup_resp could still be accessing the qedi_cmd. To fix this issue we extend where we hold the tmf_work_lock and back_lock so the qedi_process_cmd_cleanup_resp access is serialized with the cleanup done in qedi_tmf_work and any completion handling for the iscsi_task. Link: https://lore.kernel.org/r/20210525181821.7617-22-michael.christie@oracle.com Reviewed-by: Manish Rangankar <mrangankar@marvell.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-06-02 01:28:22 -04:00
Mike Christie	2ce002366a	scsi: qedi: Fix race during abort timeouts If the SCSI cmd completes after qedi_tmf_work calls iscsi_itt_to_task then the qedi qedi_cmd->task_id could be freed and used for another cmd. If we then call qedi_iscsi_cleanup_task with that task_id we will be cleaning up the wrong cmd. Wait to release the task_id until the last put has been done on the iscsi_task. Because libiscsi grabs a ref to the task when sending the abort, we know that for the non-abort timeout case that the task_id we are referencing is for the cmd that was supposed to be aborted. A latter commit will fix the case where the abort times out while we are running qedi_tmf_work. Link: https://lore.kernel.org/r/20210525181821.7617-21-michael.christie@oracle.com Reviewed-by: Manish Rangankar <mrangankar@marvell.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-06-02 01:28:22 -04:00
Mike Christie	5777b7f0f0	scsi: qedi: Fix null ref during abort handling If qedi_process_cmd_cleanup_resp finds the cmd it frees the work and sets list_tmf_work to NULL, so qedi_tmf_work should check if list_tmf_work is non-NULL when it wants to force cleanup. Link: https://lore.kernel.org/r/20210525181821.7617-20-michael.christie@oracle.com Reviewed-by: Manish Rangankar <mrangankar@marvell.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-06-02 01:28:22 -04:00
Mike Christie	a1f3486b3b	scsi: iscsi: Move pool freeing This doesn't fix any bugs, but it makes more sense to free the pool after we have removed the session. At that time we know nothing is touching any of the session fields, because all devices have been removed and scans are stopped. Link: https://lore.kernel.org/r/20210525181821.7617-19-michael.christie@oracle.com Reviewed-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-06-02 01:28:22 -04:00
Mike Christie	99b0603313	scsi: iscsi: Hold task ref during TMF timeout handling For aborts, qedi needs to cleanup the FW then send the TMF from a worker thread. While it's doing these the cmd could complete normally and the TMF could time out. libiscsi would then complete the iscsi_task which will call into the driver to cleanup the driver level resources while it still might be accessing them for the cleanup/abort. This has iscsi_eh_abort keep the iscsi_task ref if the TMF times out, so qedi does not have to worry about if the task is being freed while in use and does not need to get its own ref. Link: https://lore.kernel.org/r/20210525181821.7617-18-michael.christie@oracle.com Reviewed-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-06-02 01:28:22 -04:00
Mike Christie	7ce9fc5ecd	scsi: iscsi: Flush block work before unblock We set the max_active iSCSI EH works to 1, so all work is going to execute in order by default. However, userspace can now override this in sysfs. If max_active > 1, we can end up with the block_work on CPU1 and iscsi_unblock_session running the unblock_work on CPU2 and the session and target/device state will end up out of sync with each other. This adds a flush of the block_work in iscsi_unblock_session. Link: https://lore.kernel.org/r/20210525181821.7617-17-michael.christie@oracle.com Fixes: 1d726aa6ef57 ("scsi: iscsi: Optimize work queue flush use") Reviewed-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-06-02 01:28:21 -04:00
Mike Christie	f6f9645744	scsi: iscsi: Fix completion check during abort races We have a ref to the task being aborted, so SCp.ptr will never be NULL. We need to use iscsi_task_is_completed to check for the completed state. Link: https://lore.kernel.org/r/20210525181821.7617-16-michael.christie@oracle.com Reviewed-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-06-02 01:28:21 -04:00
Mike Christie	bdd4aad7ff	scsi: iscsi: Fix shost->max_id use The iscsi offload drivers are setting the shost->max_id to the max number of sessions they support. The problem is that max_id is not the max number of targets but the highest identifier the targets can have. To use it to limit the number of targets we need to set it to max sessions - 1, or we can end up with a session we might not have preallocated resources for. Link: https://lore.kernel.org/r/20210525181821.7617-15-michael.christie@oracle.com Reviewed-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-06-02 01:28:21 -04:00
Mike Christie	ec29d0ac29	scsi: iscsi: Fix conn use after free during resets If we haven't done a unbind target call we can race where iscsi_conn_teardown wakes up the EH thread and then frees the conn while those threads are still accessing the conn ehwait. We can only do one TMF per session so this just moves the TMF fields from the conn to the session. We can then rely on the iscsi_session_teardown->iscsi_remove_session->__iscsi_unbind_session call to remove the target and it's devices, and know after that point there is no device or scsi-ml callout trying to access the session. Link: https://lore.kernel.org/r/20210525181821.7617-14-michael.christie@oracle.com Reviewed-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-06-02 01:28:21 -04:00
Mike Christie	fda290c5ae	scsi: iscsi: Get ref to conn during reset handling The comment in iscsi_eh_session_reset is wrong and we don't wait for the EH to complete before tearing down the conn. This has us get a ref to the conn when we are not holding the eh_mutex/frwd_lock so it does not get freed from under us. Link: https://lore.kernel.org/r/20210525181821.7617-13-michael.christie@oracle.com Reviewed-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-06-02 01:28:21 -04:00
Mike Christie	d39df15851	scsi: iscsi: Have abort handler get ref to conn If SCSI midlayer is aborting a task when we are tearing down the conn we could free the conn while the abort thread is accessing the conn. This has the abort handler get a ref to the conn so it won't be freed from under it. Note: this is not needed for device/target reset because we are holding the eh_mutex when accessing the conn. Link: https://lore.kernel.org/r/20210525181821.7617-12-michael.christie@oracle.com Reviewed-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-06-02 01:28:21 -04:00
Mike Christie	b1d19e8c92	scsi: iscsi: Add iscsi_cls_conn refcount helpers There are a couple places where we could free the iscsi_cls_conn while it's still in use. This adds some helpers to get/put a refcount on the struct and converts an exiting user. Subsequent commits will then use the helpers to fix 2 bugs in the eh code. Link: https://lore.kernel.org/r/20210525181821.7617-11-michael.christie@oracle.com Reviewed-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-06-02 01:28:20 -04:00
Mike Christie	788b71c54f	scsi: iscsi: iscsi_tcp: Start socket shutdown during conn stop Make sure the conn socket shutdown starts before we start the timer to fail commands to upper layers. Link: https://lore.kernel.org/r/20210525181821.7617-10-michael.christie@oracle.com Reviewed-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-06-02 01:28:20 -04:00
Mike Christie	c0920cd36f	scsi: iscsi: iscsi_tcp: Set no linger Userspace (open-iscsi based tools at least) sets no linger on the socket to prevent stale data from being sent. However, with the in-kernel cleanup if userspace is not up the sockfd_put will release the socket without having set that sockopt. iscsid sets that opt at socket close time, but it seems ok to set this at setup time in the kernel for all tools. Link: https://lore.kernel.org/r/20210525181821.7617-9-michael.christie@oracle.com Reviewed-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-06-02 01:28:20 -04:00
Mike Christie	23d6fefbb3	scsi: iscsi: Fix in-kernel conn failure handling Commit 0ab710458da1 ("scsi: iscsi: Perform connection failure entirely in kernel space") has the following regressions/bugs that this patch fixes: 1. It can return cmds to upper layers like dm-multipath where that can retry them. After they are successful the fs/app can send new I/O to the same sectors, but we've left the cmds running in FW or in the net layer. We need to be calling ep_disconnect if userspace is not up. This patch only fixes the issue for offload drivers. iscsi_tcp will be fixed in separate commit because it doesn't have a ep_disconnect call. 2. The drivers that implement ep_disconnect expect that it's called before conn_stop. Besides crashes, if the cleanup_task callout is called before ep_disconnect it might free up driver/card resources for session1 then they could be allocated for session2. But because the driver's ep_disconnect is not called it has not cleaned up the firmware so the card is still using the resources for the original cmd. 3. The stop_conn_work_fn can run after userspace has done its recovery and we are happily using the session. We will then end up with various bugs depending on what is going on at the time. We may also run stop_conn_work_fn late after userspace has called stop_conn and ep_disconnect and is now going to call start/bind conn. If stop_conn_work_fn runs after bind but before start, we would leave the conn in a unbound but sort of started state where IO might be allowed even though the drivers have been set in a state where they no longer expect I/O. 4. Returning -EAGAIN in iscsi_if_destroy_conn if we haven't yet run the in kernel stop_conn function is breaking userspace. We should have been doing this for the caller. Link: https://lore.kernel.org/r/20210525181821.7617-8-michael.christie@oracle.com Fixes: 0ab710458da1 ("scsi: iscsi: Perform connection failure entirely in kernel space") Reviewed-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-06-02 01:28:20 -04:00

1 2 3 4 5 ...

1013657 Commits