accel/habanalabs: abort device reset for consecutive heartbeat failures
The mechanism for aborting device reset on consecutive fatal errors currently covers only fatal errors that are reported by FW.
A non-responsive FW and consecutive heartbeat failures are also considered fatal, so add them to this mechanism as well to avoid recurring device resets in such a case.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
parent d0df8a35a7
commit 246d8b6cfb
@@ -1769,14 +1769,16 @@ kill_processes:
 	hdev->device_cpu_disabled = false;
 	hdev->reset_info.hard_reset_pending = false;
 
+	/*
+	 * Put the device in an unusable state if there are 2 back to back resets due to
+	 * fatal errors.
+	 */
 	if (hdev->reset_info.reset_trigger_repeated &&
-			(hdev->reset_info.prev_reset_trigger ==
-					HL_DRV_RESET_FW_FATAL_ERR)) {
-		/* if there 2 back to back resets from FW,
-		 * ensure driver puts the driver in a unusable state
-		 */
+			(hdev->reset_info.prev_reset_trigger == HL_DRV_RESET_FW_FATAL_ERR ||
+				hdev->reset_info.prev_reset_trigger ==
+						HL_DRV_RESET_HEARTBEAT)) {
 		dev_crit(hdev->dev,
-			"%s Consecutive FW fatal errors received, stopping hard reset\n",
+			"%s Consecutive fatal errors, stopping hard reset\n",
 			dev_name(&(hdev)->pdev->dev));
 		rc = -EIO;
 		goto out_err;
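The condition changed in this hunk can be modeled outside the kernel as a rough user-space sketch. Everything prefixed with `sketch_` or `SKETCH_` is hypothetical: the flag values are illustrative rather than the driver's real definitions, and the bookkeeping in `sketch_record_trigger()` is only an assumption about how `prev_reset_trigger` and `reset_trigger_repeated` get updated, since that code is not part of this commit.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's reset trigger flags (values are illustrative). */
#define SKETCH_RESET_NONE          0u
#define SKETCH_RESET_HEARTBEAT     (1u << 0)
#define SKETCH_RESET_TDR           (1u << 1)
#define SKETCH_RESET_FW_FATAL_ERR  (1u << 2)

/* Hypothetical subset of the reset bookkeeping used by the check. */
struct sketch_reset_info {
	uint32_t prev_reset_trigger;
	bool reset_trigger_repeated;
};

/*
 * Assumed bookkeeping: remember the latest trigger and note whether it
 * matches the previous one. The real driver may track this differently.
 */
static void sketch_record_trigger(struct sketch_reset_info *info, uint32_t trigger)
{
	info->reset_trigger_repeated = (info->prev_reset_trigger == trigger);
	info->prev_reset_trigger = trigger;
}

/*
 * Mirrors the condition after this commit: two back to back resets caused by
 * either a FW fatal error or a heartbeat failure abort the reset with -EIO.
 */
static int sketch_check_consecutive_fatal(const struct sketch_reset_info *info)
{
	if (info->reset_trigger_repeated &&
	    (info->prev_reset_trigger == SKETCH_RESET_FW_FATAL_ERR ||
	     info->prev_reset_trigger == SKETCH_RESET_HEARTBEAT))
		return -EIO;

	return 0;
}
```

In this model, a first heartbeat-triggered reset passes the check, a second consecutive one is rejected, and a repeated non-fatal trigger (the hypothetical `SKETCH_RESET_TDR`) never is.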