29 Commits

Author SHA1 Message Date
Andrey Grodzovsky
0b10ab8069 drm/sched: Avoid data corruptions
Wait for all dependencies of a job  to complete before
killing it to avoid data corruptions.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210519141407.88444-1-andrey.grodzovsky@amd.com
2021-05-19 23:50:28 -04:00
Andrey Grodzovsky
c61cdbdbff drm/scheduler: Fix hang when sched_entity released
Problem: If scheduler is already stopped by the time sched_entity
is released and entity's job_queue not empty I encountred
a hang in drm_sched_entity_flush. This is because drm_sched_entity_is_idle
never becomes false.

Fix: In drm_sched_fini detach all sched_entities from the
scheduler's run queues. This will satisfy drm_sched_entity_is_idle.
Also wakeup all those processes stuck in sched_entity flushing
as the scheduler main thread which wakes them up is stopped by now.

v2:
Reverse order of drm_sched_rq_remove_entity and marking
s_entity as stopped to prevent reinserion back to rq due
to race.

v3:
Drop drm_sched_rq_remove_entity, only modify entity->stopped
and check for it in drm_sched_entity_is_idle

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210512142648.666476-14-andrey.grodzovsky@amd.com
2021-05-19 23:50:28 -04:00
Lee Jones
04be0c5b40 drm/scheduler/sched_entity: Fix some function name disparity
Fixes the following W=1 kernel build warning(s):

 drivers/gpu/drm/scheduler/sched_entity.c:204: warning: expecting prototype for drm_sched_entity_kill_jobs(). Prototype was for drm_sched_entity_kill_jobs_cb() instead
 drivers/gpu/drm/scheduler/sched_entity.c:262: warning: expecting prototype for drm_sched_entity_cleanup(). Prototype was for drm_sched_entity_fini() instead
 drivers/gpu/drm/scheduler/sched_entity.c:305: warning: expecting prototype for drm_sched_entity_fini(). Prototype was for drm_sched_entity_destroy() instead

Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: dri-devel@lists.freedesktop.org
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20210416143725.2769053-25-lee.jones@linaro.org
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
2021-04-22 14:05:17 +02:00
Christian König
ac4eb83ab2 drm/sched: select new rq even if there is only one v3
This is necessary when changing priorities of an entity.

v2: test the sched_list instead of num_sched.
v3: set the sched_list to NULL when there is only one entry

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Sonny Jiang <sonny.jiang@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210305125155.2312-1-christian.koenig@amd.com
2021-03-08 14:06:17 +01:00
Christian König
f2f12eb9c3 drm/scheduler: provide scheduler score externally
Allow multiple schedulers to share the load balancing score.

This is useful when one engine has different hw rings.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-and-Tested-by: Leo Liu <leo.liu@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210204144405.2737-1-christian.koenig@amd.com
2021-02-05 10:47:11 +01:00
Lee Jones
00d44b966d gpu: drm: scheduler: sched_entity: Demote non-conformant kernel-doc headers
Fixes the following W=1 kernel build warning(s):

 drivers/gpu/drm/scheduler/sched_entity.c:316: warning: Function parameter or member 'f' not described in 'drm_sched_entity_clear_dep'
 drivers/gpu/drm/scheduler/sched_entity.c:316: warning: Function parameter or member 'cb' not described in 'drm_sched_entity_clear_dep'
 drivers/gpu/drm/scheduler/sched_entity.c:330: warning: Function parameter or member 'f' not described in 'drm_sched_entity_wakeup'
 drivers/gpu/drm/scheduler/sched_entity.c:330: warning: Function parameter or member 'cb' not described in 'drm_sched_entity_wakeup'

Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Nirmoy Das <nirmoy.aiemd@gmail.com>
Cc: dri-devel@lists.freedesktop.org
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-11-13 00:03:31 -05:00
Boris Brezillon
170fb58ee3 drm/sched: Avoid infinite waits in the drm_sched_entity_destroy() path
If we don't initialize the entity to idle and the entity is never
scheduled before being destroyed we end up with an infinite wait in the
destroy path.

v2:
- Add Steven's R-b

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/393486/
2020-10-05 15:06:33 +02:00
Nirmoy Das
d41a39dda1 drm/scheduler: improve job distribution with multiple queues
This patch uses score to select a new drm scheduler for better
loadbalance between multiple drm schedulers instead of num_jobs.

Below are test results after running amdgpu_test for ~10 times.

Before this patch:

sched_name     num of many times it got schedule
=========      ==================================
sdma0          1463
sdma1          198
comp_1.0.1     280

After this patch:

sched_name     num of many times it got schedule
=========      ==================================
sdma0          925
sdma1          928
comp_1.0.1     177
comp_1.1.1     44
comp_1.2.1     43
comp_1.3.1     44

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/373000/
Signed-off-by: Christian König <christian.koenig@amd.com>
2020-06-26 14:16:29 +02:00
Nirmoy Das
ec2edcc279 drm/sched: implement and export drm_sched_pick_best
Remove drm_sched_entity_get_free_sched() and use the logic of picking
the least loaded drm scheduler from a drm scheduler list to implement
drm_sched_pick_best(). This patch also exports drm_sched_pick_best() so
that it can be utilized by other drm drivers.

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-16 16:21:32 -04:00
changzhu
d164bebb95 Revert "drm/scheduler: improve job distribution with multiple queues"
It needs to revert this patch to avoid amdgpu_test compute hang problem
on picasso.

This reverts commit 56822db194232c089601728d68ed078dccb97f8b.

Signed-off-by: changzhu <Changfeng.Zhu@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-16 16:21:32 -04:00
Nirmoy Das
b37aced31e drm/scheduler: implement a function to modify sched list
Implement drm_sched_entity_modify_sched() which modifies existing
sched_list with a different one. This is going to be helpful when
userspace changes priority of a ctx/entity then the driver can switch
to the corresponding HW scheduler list for that priority.

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-09 13:51:35 -04:00
Nirmoy Das
2639f453f2 drm/amdgpu: fix doc by clarifying sched_list definition
expand sched_list definition for better understanding.
Also fix a typo atleast -> at least

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-27 16:46:44 -05:00
Nirmoy Das
9e3e90c50d drm/scheduler: fix documentation by replacing rq_list with sched_list
This also replaces old artifacts with a correct one in drm_sched_entity_init()
declaration

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-16 13:41:00 -05:00
Nirmoy Das
56822db194 drm/scheduler: improve job distribution with multiple queues
This patch uses score based logic to select a new rq for better
loadbalance between multiple rq/scheds instead of num_jobs.

Below are test results after running amdgpu_test from mesa drm

Before this patch:

sched_name     num of many times it got scheduled
=========      ==================================
sdma0          314
sdma1          32
comp_1.0.0     56
comp_1.0.1     0
comp_1.1.0     0
comp_1.1.1     0
comp_1.2.0     0
comp_1.2.1     0
comp_1.3.0     0
comp_1.3.1     0
After this patch:

sched_name     num of many times it got scheduled
=========      ==================================
sdma0          216
sdma1          185
comp_1.0.0     39
comp_1.0.1     9
comp_1.1.0     12
comp_1.1.1     0
comp_1.2.0     12
comp_1.2.1     0
comp_1.3.0     12
comp_1.3.1     0

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-16 13:37:54 -05:00
Nirmoy Das
8c23056bdc drm/scheduler: do not keep a copy of sched list
entity should not keep copy and maintain sched list for
itself.

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-12-18 16:09:12 -05:00
Nirmoy Das
b3ac17667f drm/scheduler: rework entity creation
Entity currently keeps a copy of run_queue list and modify it in
drm_sched_entity_set_priority(). Entities shouldn't modify run_queue
list. Use drm_gpu_scheduler list instead of drm_sched_rq list
in drm_sched_entity struct. In this way we can select a runqueue based
on entity/ctx's priority for a  drm scheduler.

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-12-18 16:09:12 -05:00
Andrey Grodzovsky
83a7772ba2 drm/sched: Use completion to wait for sched->thread idle v2.
Removes thread park/unpark hack from drm_sched_entity_fini and
by this fixes reactivation of scheduler thread while the thread
is supposed to be stopped.

v2: Per sched entity completion.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-11-07 18:08:07 -05:00
Christian König
85cb9d5067 drm/scheduler: use job count instead of peek
The spsc_queue_peek function is accessing queue->head which belongs to
the consumer thread and shouldn't be accessed by the producer

This is fixing a rare race condition when destroying entities.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Monk.liu@amd.com
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-15 10:52:10 -05:00
Sam Ravnborg
7c1be93c8e drm/scheduler: drop use of drmP.h
Drop use of the deprecated drmP.h header file.
Fix fallout.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Reviewed-by: Christian König <christian.koenig@amd.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Eric Anholt <eric@anholt.net>
Cc: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: Sharat Masetty <smasetty@codeaurora.org>
Cc: Nathan Chancellor <natechancellor@gmail.com>
Cc: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20190630061922.7254-26-sam@ravnborg.org
2019-07-15 18:11:31 +02:00
Dave Airlie
fbac3c48fa Merge branch 'drm-next-5.1' of git://people.freedesktop.org/~agd5f/linux into drm-next
Fixes for 5.1:
amdgpu:
- Fix missing fw declaration after dropping old CI DPM code
- Fix debugfs access to registers beyond the MMIO bar size
- Fix context priority handling
- Add missing license on some new files
- Various cleanups and bug fixes

radeon:
- Fix missing break in CS parser for evergreen
- Various cleanups and bug fixes

sched:
- Fix entities with 0 run queues

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190221214134.3308-1-alexander.deucher@amd.com
2019-02-22 15:56:42 +10:00
Bas Nieuwenhuizen
1decbf6bb0 drm/sched: Fix entities with 0 rqs.
Some blocks in amdgpu can have 0 rqs.

Job creation already fails with -ENOENT when entity->rq is NULL,
so jobs cannot be pushed. Without a rq there is no scheduler to
pop jobs, and rq selection already does the right thing with a
list of length 0.

So the operations we need to fix are:
  - Creation, do not set rq to rq_list[0] if the list can have length 0.
  - Do not flush any jobs when there is no rq.
  - On entity destruction handle the rq = NULL case.
  - on set_priority, do not try to change the rq if it is NULL.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-02-15 11:15:08 -05:00
Eric Anholt
82abf33766 drm/sched: Always trace the dependencies we wait on, to fix a race.
The entity->dependency can go away completely once we've called
drm_sched_entity_add_dependency_cb() (if the cb is called before we
get around to tracing).  The tracepoint is more useful if we trace
every dependency instead of just ones that get callbacks installed,
anyway, so just do that.

Fixes any easy-to-produce OOPS when tracing the scheduler on V3D with
"perf record -a -e gpu_scheduler:.\* glxgears" and DEBUG_SLAB enabled.

Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Christian König <christian.koenig@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-02-08 14:02:33 -05:00
Sharat Masetty
26efecf955 drm/scheduler: Add drm_sched_job_cleanup
This patch adds a new API to clean up the scheduler job resources. This
is primarliy needed in cases the job was created but was not queued to
the scheduler queue. Additionally with this change, the layer which
creates the scheduler job also gets to free up the job's resources and
this entails moving the dma_fence_put(finished_fence) to the drivers
ops free handler routines.

Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
Reviewed-by: Christian König <christian.koenig@amd.com>
Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-11-05 14:21:27 -05:00
Andrey Grodzovsky
faf6e1a87e drm/sched: Add boolean to mark if sched is ready to work v5
Problem:
A particular scheduler may become unsuable (underlying HW) after
some event (e.g. GPU reset). If it's later chosen by
the get free sched. policy a command will fail to be
submitted.

Fix:
Add a driver specific callback to report the sched status so
rq with bad sched can be avoided in favor of working one or
none in which case job init will fail.

v2: Switch from driver callback to flag in scheduler.

v3: rebase

v4: Remove ready paramter from drm_sched_init, set
uncoditionally to true once init done.

v5: fix missed change in v3d in v4 (Alex)

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-11-05 14:21:22 -05:00
Nathan Chancellor
c1f0320e03 drm/scheduler: Simplify spsc_queue_count check in drm_sched_entity_select_rq
Clang generates a warning when it sees a logical not followed by a
conditional operator like ==, >, or <.

drivers/gpu/drm/scheduler/sched_entity.c:470:6: warning: logical not is
only applied to the left hand side of this comparison
[-Wlogical-not-parentheses]
        if (!spsc_queue_count(&entity->job_queue) == 0 ||
            ^                                     ~~
drivers/gpu/drm/scheduler/sched_entity.c:470:6: note: add parentheses
after the '!' to evaluate the comparison first
        if (!spsc_queue_count(&entity->job_queue) == 0 ||
            ^
             (                                        )
drivers/gpu/drm/scheduler/sched_entity.c:470:6: note: add parentheses
around left hand side expression to silence this warning
        if (!spsc_queue_count(&entity->job_queue) == 0 ||
            ^
            (                                    )
1 warning generated.

It assumes the author might have made a mistake in their logic:

if (!a == b) -> if (!(a == b))

Sometimes that is the case; other times, it's just a super convoluted
way of saying 'if (a)' when b = 0:

if (!1 == 0) -> if (0 == 0) -> if (true)

Alternatively:

if (!1 == 0) -> if (!!1) -> if (1)

Simplify this comparison so that Clang doesn't complain.

Fixes: 35e160e781a0 ("drm/scheduler: change entities rq even earlier")
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-10-09 17:06:00 -05:00
Nayan Deshmukh
c89677afb3 drm/scheduler: avoid redundant shifting of the entity v2
do not remove entity from the rq if the current rq is from
the least loaded scheduler.

Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-08-27 11:11:14 -05:00
Andrey Grodzovsky
62347a3300 drm/scheduler: Add stopped flag to drm_sched_entity
The flag will prevent another thread from same process to
reinsert the entity queue into scheduler's rq after it was already
removewd from there by another thread during drm_sched_entity_flush.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-08-27 11:11:10 -05:00
Christian König
7b10574eac drm/scheduler: cleanup entity coding style
Cleanup coding style in sched_entity.c

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-08-27 11:10:43 -05:00
Christian König
620e762f9a drm/scheduler: move entity handling into separate file
This is complex enough on it's own. Move it into a separate C file.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-08-27 11:10:43 -05:00