linux/drivers/accel
Koby Elbaz a6685b573c habanalabs: block soft-reset on an unusable device
A device whose status is malfunction cannot be used.
In such a case, we do not support certain reset types, e.g.,
all kinds of soft-resets (compute reset, inference soft-reset)
and reset upon device release.
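The policy can be sketched as a guard that runs before any reset starts. This is a minimal, hypothetical model, not the habanalabs driver code: the enum names and `reset_is_allowed()` are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device states; only the two relevant here are modeled. */
enum dev_status {
	DEV_STATUS_OPERATIONAL,
	DEV_STATUS_MALFUNCTION,
};

/* Hypothetical reset types, named after those in the commit message. */
enum reset_type {
	RESET_HARD,
	RESET_COMPUTE,          /* soft-reset */
	RESET_INFERENCE_SOFT,   /* soft-reset */
	RESET_ON_DEV_RELEASE,   /* reset triggered when the user closes the FD */
};

/*
 * Return true if a reset of the given type may start.
 * A malfunctioning device accepts only a hard-reset; every other reset
 * type is refused, so it cannot move the device out of that state.
 */
static bool reset_is_allowed(enum dev_status status, enum reset_type type)
{
	assert(type >= RESET_HARD && type <= RESET_ON_DEV_RELEASE);

	if (status == DEV_STATUS_MALFUNCTION)
		return type == RESET_HARD;

	return true;
}
```

Checking the status before choosing the reset flow keeps the decision in one place, rather than relying on each soft-reset path to notice the device is unusable.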

A hard-reset is the only way an unusable device can change its
status. No other reset procedure should be allowed to put the device
through a reset flow, since that might ultimately change its status,
unintentionally, back to operational.

Such a scenario recently occurred when a user requested a hard-reset
while another heavy user workload was ongoing (so the reset request
was queued).
Since the workload couldn't finish within the reset's timeout limit,
the reset failed and set the device status to malfunction.
Eventually, when the user released the FD, an unsuccessful soft-reset
occurred and was followed by an additional hard-reset, which changed
the ASIC's status back to operational.
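The incident can be replayed against a toy state model to show why blocking the release-time reset matters. This is a hypothetical simulation, not driver code: `struct device_model`, `hard_reset()`, `release_fd()` and `replay_incident()` are invented for illustration, and the escalated hard-reset is assumed to succeed, as it did in the reported case.

```c
#include <assert.h>
#include <stdbool.h>

enum model_status { MODEL_OPERATIONAL, MODEL_MALFUNCTION };

struct device_model {
	enum model_status status;
	bool block_soft_reset_when_unusable; /* models the fix */
};

/*
 * Hard-reset: the only path in this model that changes status.
 * It fails, and marks the device as malfunctioning, if the ongoing
 * workload did not drain within the reset timeout.
 */
static void hard_reset(struct device_model *dev, bool workload_drained)
{
	dev->status = workload_drained ? MODEL_OPERATIONAL : MODEL_MALFUNCTION;
}

/* Reset upon FD release: a soft-reset that escalates to a hard-reset on failure. */
static void release_fd(struct device_model *dev, bool soft_reset_ok)
{
	if (dev->block_soft_reset_when_unusable &&
	    dev->status == MODEL_MALFUNCTION)
		return; /* refused: the status stays malfunction */

	if (!soft_reset_ok)
		hard_reset(dev, true); /* escalation, assumed to succeed */
}

/* Replay: the queued hard-reset times out, then the user closes the FD. */
static enum model_status replay_incident(bool with_fix)
{
	struct device_model dev = { MODEL_OPERATIONAL, with_fix };

	hard_reset(&dev, false); /* workload outlived the reset timeout */
	assert(dev.status == MODEL_MALFUNCTION);

	release_fd(&dev, false); /* the soft-reset on release fails */
	return dev.status;
}
```

Without the guard, the replay ends with the device operational again; with it, the device stays in malfunction until a user explicitly requests a hard-reset.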

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-01-26 11:52:13 +02:00
habanalabs habanalabs: block soft-reset on an unusable device 2023-01-26 11:52:13 +02:00
ivpu accel/ivpu: Add PM support 2023-01-19 11:12:08 +01:00
drm_accel.c Fix mismerge due to devnode now taking a 'const *' device 2022-12-16 13:04:15 -06:00
Kconfig habanalabs: move driver to accel subsystem 2023-01-26 11:52:10 +02:00
Makefile habanalabs: move driver to accel subsystem 2023-01-26 11:52:10 +02:00