accel/habanalabs: abort device reset for consecutive heartbeat failures
The mechanism for aborting device reset after consecutive fatal errors
currently covers only fatal errors that are reported by FW.
A non-responsive FW and consecutive heartbeat failures are also
considered fatal, so add them to this mechanism as well, to avoid
recurring device resets in such a case.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
parent d0df8a35a7
commit 246d8b6cfb
@@ -1769,14 +1769,16 @@ kill_processes:
 	hdev->device_cpu_disabled = false;
 	hdev->reset_info.hard_reset_pending = false;
 
-	if (hdev->reset_info.reset_trigger_repeated &&
-			(hdev->reset_info.prev_reset_trigger ==
-					HL_DRV_RESET_FW_FATAL_ERR)) {
-		/* if there 2 back to back resets from FW,
-		 * ensure driver puts the driver in a unusable state
-		 */
+	/*
+	 * Put the device in an unusable state if there are 2 back to back resets due to
+	 * fatal errors.
+	 */
+	if (hdev->reset_info.reset_trigger_repeated &&
+			(hdev->reset_info.prev_reset_trigger == HL_DRV_RESET_FW_FATAL_ERR ||
+				hdev->reset_info.prev_reset_trigger ==
+						HL_DRV_RESET_HEARTBEAT)) {
 		dev_crit(hdev->dev,
-			"%s Consecutive FW fatal errors received, stopping hard reset\n",
+			"%s Consecutive fatal errors, stopping hard reset\n",
 			dev_name(&(hdev)->pdev->dev));
 		rc = -EIO;
 		goto out_err;
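
The check in this hunk can be illustrated outside the driver with a minimal, self-contained sketch. The field names prev_reset_trigger and reset_trigger_repeated and the two HL_DRV_RESET_* names come from the patch. Everything else is an assumption made for illustration: the flag values, the SKETCH_RESET_USER flag, the helper record_reset_trigger() and the way it updates the fields. The real driver may track repeated triggers differently.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Assumed values: only the names of the first two flags appear in the patch. */
#define HL_DRV_RESET_HEARTBEAT    (1u << 0)
#define HL_DRV_RESET_FW_FATAL_ERR (1u << 1)
/* Hypothetical non-fatal trigger, used only to show the contrast. */
#define SKETCH_RESET_USER         (1u << 2)

/* Reduced stand-in for hdev->reset_info; only the fields the check reads. */
struct reset_info {
	unsigned int prev_reset_trigger;
	bool reset_trigger_repeated;
};

/*
 * Hypothetical helper: remember why a reset was requested and whether the
 * cause is the same as for the previous reset.
 */
static void record_reset_trigger(struct reset_info *info, unsigned int trigger)
{
	/* 0 means "no previous reset" in this sketch, so it is not a valid cause. */
	assert(trigger != 0);

	info->reset_trigger_repeated = (info->prev_reset_trigger == trigger);
	info->prev_reset_trigger = trigger;
}

/*
 * Same condition as the patched code: give up with -EIO when two back to
 * back resets were caused by a FW fatal error or by a heartbeat failure.
 */
static int check_consecutive_fatal(const struct reset_info *info)
{
	if (info->reset_trigger_repeated &&
	    (info->prev_reset_trigger == HL_DRV_RESET_FW_FATAL_ERR ||
	     info->prev_reset_trigger == HL_DRV_RESET_HEARTBEAT))
		return -EIO;

	return 0;
}
```

In this sketch, two heartbeat-triggered resets in a row make check_consecutive_fatal() return -EIO, while a repeated SKETCH_RESET_USER trigger or a change of cause between resets does not.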