Ofir Bitton a39725819c habanalabs/gaudi: don't use disabled ports in collective wait
In the collective wait, we put jobs on the QMANs of all the NICs. The
code takes into account if a port is disabled only in case of PCI card.
When this info arrives from the f/w, the code doesn't take it into
account, and it tries to schedule jobs on NICs that aren't enabled and
thats a bug.

To fix this, after the f/w sends us the list of disabled ports, we
update the state of the QMANs according to that list. In addition,
we need to update the HW_CAP bits so the collective wait operation
will not try to use those QMANs. We also need to update the collective
master monitor mask.

Moreover, we need to add a protection for such future cases and in case
the user will try to submit work to those QMANs.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18 15:23:41 +03:00
..
2020-07-24 20:31:37 +03:00