Guilherme G. Piccoli 70f1872e38 drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini
Currently amdgpu calls drm_sched_fini() from the fence driver sw fini
routine. That function is expected to be called only after the
respective init function, drm_sched_init(), has executed successfully.

We recently hit a driver probe failure on the Steam Deck, and
drm_sched_fini() was called even though its counterpart had never
been called, causing the following oops:

amdgpu: probe of 0000:04:00.0 failed with error -110
BUG: kernel NULL pointer dereference, address: 0000000000000090
PGD 0 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 609 Comm: systemd-udevd Not tainted 6.2.0-rc3-gpiccoli #338
Hardware name: Valve Jupiter/Jupiter, BIOS F7A0113 11/04/2022
RIP: 0010:drm_sched_fini+0x84/0xa0 [gpu_sched]
[...]
Call Trace:
 <TASK>
 amdgpu_fence_driver_sw_fini+0xc8/0xd0 [amdgpu]
 amdgpu_device_fini_sw+0x2b/0x3b0 [amdgpu]
 amdgpu_driver_release_kms+0x16/0x30 [amdgpu]
 devm_drm_dev_init_release+0x49/0x70
 [...]

To prevent that, check whether the drm_sched was properly initialized
for a given ring before calling its fini counterpart.

Ideally we'd use sched.ready for that; this field is set as the last
thing in drm_sched_init(). But amdgpu seems to "override" the meaning of this
field: in the above oops, for example, it was a GFX ring causing the crash, and
the sched.ready field was set to true in the ring init routine, regardless of
the state of the DRM scheduler. Hence, we ended up using sched.ops as per
Christian's suggestion [0], and also removed the no_scheduler check [1].
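
The guard described above can be sketched in plain userspace C. This is only
an illustration under stated assumptions: every mock_* type and function below
is a hypothetical stand-in invented for this sketch, not the kernel's real
definition, and the assert() merely models the NULL pointer dereference seen
in the oops.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct mock_sched_ops {
	int unused;
};

struct mock_sched {
	const struct mock_sched_ops *ops; /* set only by a successful init */
	bool ready;                       /* the driver may set this on its own */
	int fini_calls;                   /* bookkeeping for this sketch only */
};

struct mock_ring {
	struct mock_sched sched;
};

static const struct mock_sched_ops mock_ops = { 0 };

/* Stand-in for drm_sched_init(): records ops, then marks the scheduler ready. */
static void mock_sched_init(struct mock_sched *sched)
{
	sched->ops = &mock_ops;
	sched->ready = true;
}

/* Stand-in for drm_sched_fini(): only valid after a successful init. */
static void mock_sched_fini(struct mock_sched *sched)
{
	assert(sched->ops != NULL); /* models the oops on an uninitialized sched */
	sched->ready = false;
	sched->fini_calls++;
}

/*
 * Guarded teardown: key the check on ops rather than ready, because
 * ready may have been set by the ring init path without the scheduler
 * ever being initialized. Returns true if fini was actually called.
 */
static bool mock_ring_sw_fini(struct mock_ring *ring)
{
	if (!ring->sched.ops)
		return false;

	mock_sched_fini(&ring->sched);
	return true;
}
```

With a zero-initialized ring whose ready flag was forced to true,
mock_ring_sw_fini() returns false and never reaches the fini stand-in; after
mock_sched_init() it returns true and fini runs once.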

[0] https://lore.kernel.org/amd-gfx/984ee981-2906-0eaf-ccec-9f80975cb136@amd.com/
[1] https://lore.kernel.org/amd-gfx/cd0e2994-f85f-d837-609f-7056d5fb7231@amd.com/

Fixes: 067f44c8b459 ("drm/amdgpu: avoid over-handle of fence driver fini in s3 test (v2)")
Suggested-by: Christian König <christian.koenig@amd.com>
Cc: Guchun Chen <guchun.chen@amd.com>
Cc: Luben Tuikov <luben.tuikov@amd.com>
Cc: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-02-08 22:04:12 -05:00