771 Commits

Author SHA1 Message Date
Chris Wilson
c80274bb58 drm/i915: Downgrade NEWCLIENT to non-preemptive
Commit 1413b2bc0717 ("drm/i915: Trim NEWCLIENT boosting") had the
intended consequence of not allowing a sequence of work that merely
crossed into a new engine the privilege to be promoted to NEWCLIENT
status. It also had the unintended consequence of actually making
NEWCLIENT effective on heavily oversubscribed transcode machines and
impacting upon their throughput.

If we consider a client packet composed of (rcsA, rcsB, vcs) and 30 of
those clients, using the NEWCLIENT boost that will be scheduled as

	rcsA x 30, (rcsB, vcs) x 30

whereas before it would have been

	(rcsA, rcsB, vcs) x 30

That is with NEWCLIENT only boosting the first request of each client,
we would execute all rcsA requests prior to running on the vcs engines;
accruing a lot of dead time as compared to the previous case where the
vcs engine would be started in parallel to processing the second client.

The previous patch has the effect of delaying submission until it is
required by a third party (either the user with an explicit wait, or by
another client/engine). We reduce the NEWCLIENT bump to a mere WAIT,
which has the effect of removing its preemptive grant and reducing it to
the same level as any other user interaction -- that it will not be
promoted above the interengine dependencies, and so preventing NEWCLIENTS
from starving other engines. This is a large nerf to the rrul properties of
the current NEWCLIENT, but it still does give prioritised submission to
new requests from light workloads.

References: b16c765122f9 ("drm/i915: Priority boost for new clients")
Fixes: 1413b2bc0717 ("drm/i915: Trim NEWCLIENT boosting") # customer impact
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Cc: Dmitry Ermilov <dmitry.ermilov@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190515130052.4475-4-chris@chris-wilson.co.uk
(cherry picked from commit 68fc728b01fcc93b26d52f6e884e738962a49a66)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2019-05-20 18:28:22 +03:00
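The scheduling argument in c80274bb58 can be checked with a little arithmetic. The following is a minimal sketch, not driver code: it assumes one rcs engine and one vcs engine, a fixed cost per request, and that each client's vcs request can only start once that client's rcsB request has finished. All names (`example_simulate`, `struct example_result`) are hypothetical.

```c
#include <assert.h>

struct example_result {
	long first_vcs_start;	/* time at which the vcs engine first gets work */
	long makespan;		/* time at which the last vcs request finishes */
};

/*
 * boosted != 0: rcs runs rcsA x N, then rcsB x N (first request of each
 *               client promoted ahead of everything else).
 * boosted == 0: rcs runs (rcsA, rcsB) x N in client order.
 */
static struct example_result example_simulate(int clients, long rcs_cost,
					      long vcs_cost, int boosted)
{
	struct example_result res = { 0, 0 };
	long vcs_free = 0;	/* time at which the vcs engine becomes idle */

	assert(clients >= 0 && rcs_cost > 0 && vcs_cost > 0);

	for (int i = 1; i <= clients; i++) {
		/* number of rcs requests executed once rcsB of client i is done */
		long slot = boosted ? (long)clients + i : 2L * i;
		long rcsb_done = slot * rcs_cost;
		long start = rcsb_done > vcs_free ? rcsb_done : vcs_free;

		if (i == 1)
			res.first_vcs_start = start;
		vcs_free = start + vcs_cost;
	}

	res.makespan = vcs_free;
	return res;
}
```

Under these assumptions, with 30 clients and unit costs the vcs engine sits idle until t=31 in the boosted ordering but only until t=2 in the interleaved ordering; once the vcs work is the bottleneck, that initial dead time carries straight through to the total runtime.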
Chris Wilson
0edda1d681 drm/i915: Flush the CSB pointer reset
The HW resets its CSB tail pointer on resetting the engine. Most of the
time. In case it doesn't (and for system resume) we write the expected
value anyway. For extra paranoia, flush the write before we invalidate
the cacheline.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190412110159.10495-1-chris@chris-wilson.co.uk
2019-04-12 14:32:11 +01:00
Chris Wilson
1863e3020a drm/i915/execlists: Always reset the context's RING registers
During reset, we try and stop the active ring. This has the consequence
that we often clobber the RING registers within the context image. When
we find an active request, we update the context image to rerun that
request (if it was guilty, we replace the hanging user payload with
NOPs). However, we were ignoring an active context if the request had
completed, with the consequence that the next submission on that request
would start with RING_HEAD==0 and not the tail of the previous request,
causing all requests still in the ring to be rerun. Rare, but
occasionally seen within CI where we would spot that the context seqno
would reverse and complain that we were retiring an incomplete request.

    <0> [412.390350]   <idle>-0       3d.s2 408373352us : __i915_request_submit: rcs0 fence 1e95b:3640 -> current 3638
    <0> [412.390350]   <idle>-0       3d.s2 408373353us : __i915_request_submit: rcs0 fence 1e95b:3642 -> current 3638
    <0> [412.390350]   <idle>-0       3d.s2 408373354us : __i915_request_submit: rcs0 fence 1e95b:3644 -> current 3638
    <0> [412.390350]   <idle>-0       3d.s2 408373354us : __i915_request_submit: rcs0 fence 1e95b:3646 -> current 3638
    <0> [412.390350]   <idle>-0       3d.s2 408373356us : __execlists_submission_tasklet: rcs0 in[0]:  ctx=2.1, fence 1e95b:3646 (current 3638), prio=4
    <0> [412.390350] i915_sel-4613    0.... 408373374us : __i915_request_commit: rcs0 fence 1e95b:3648
    <0> [412.390350] i915_sel-4613    0d..1 408373377us : process_csb: rcs0 cs-irq head=2, tail=3
    <0> [412.390350] i915_sel-4613    0d..1 408373377us : process_csb: rcs0 csb[3]: status=0x00000001:0x00000000, active=0x1
    <0> [412.390350] i915_sel-4613    0d..1 408373378us : __i915_request_submit: rcs0 fence 1e95b:3648 -> current 3638
    <0> [412.390350]   <idle>-0       3..s1 408373378us : execlists_submission_tasklet: rcs0 awake?=1, active=5
    <0> [412.390350] i915_sel-4613    0d..1 408373379us : __execlists_submission_tasklet: rcs0 in[0]:  ctx=2.2, fence 1e95b:3648 (current 3638), prio=4
    <0> [412.390350] i915_sel-4613    0.... 408373381us : i915_reset_engine: rcs0 flags=4
    <0> [412.390350] i915_sel-4613    0.... 408373382us : execlists_reset_prepare: rcs0: depth<-0
    <0> [412.390350]   <idle>-0       3d.s2 408373390us : process_csb: rcs0 cs-irq head=3, tail=4
    <0> [412.390350]   <idle>-0       3d.s2 408373390us : process_csb: rcs0 csb[4]: status=0x00008002:0x00000002, active=0x1
    <0> [412.390350]   <idle>-0       3d.s2 408373390us : process_csb: rcs0 out[0]: ctx=2.2, fence 1e95b:3648 (current 3640), prio=4
    <0> [412.390350] i915_sel-4613    0.... 408373401us : intel_engine_stop_cs: rcs0
    <0> [412.390350] i915_sel-4613    0d..1 408373402us : process_csb: rcs0 cs-irq head=4, tail=4
    <0> [412.390350] i915_sel-4613    0.... 408373403us : intel_gpu_reset: engine_mask=1
    <0> [412.390350] i915_sel-4613    0d..1 408373408us : execlists_cancel_port_requests: rcs0:port0 fence 1e95b:3648, (current 3648)
    <0> [412.390350] i915_sel-4613    0.... 408373442us : intel_engine_cancel_stop_cs: rcs0
    <0> [412.390350] i915_sel-4613    0.... 408373442us : execlists_reset_finish: rcs0: depth->0
    <0> [412.390350] ksoftirq-26      3..s. 408373442us : execlists_submission_tasklet: rcs0 awake?=1, active=0
    <0> [412.390350] ksoftirq-26      3d.s1 408373443us : process_csb: rcs0 cs-irq head=5, tail=5
    <0> [412.390350] i915_sel-4613    0.... 408373475us : i915_request_retire: rcs0 fence 1e95b:3640, current 3648
    <0> [412.390350] i915_sel-4613    0.... 408373476us : i915_request_retire: __retire_engine_request(rcs0) fence 1e95b:3640, current 3648
    <0> [412.390350] i915_sel-4613    0.... 408373494us : __i915_request_commit: rcs0 fence 1e95b:3650
    <0> [412.390350] i915_sel-4613    0d..1 408373496us : process_csb: rcs0 cs-irq head=5, tail=5
    <0> [412.390350] i915_sel-4613    0d..1 408373496us : __i915_request_submit: rcs0 fence 1e95b:3650 -> current 3648
    <0> [412.390350] i915_sel-4613    0d..1 408373498us : __execlists_submission_tasklet: rcs0 in[0]:  ctx=2.1, fence 1e95b:3650 (current 3648), prio=6
    <0> [412.390350] i915_sel-4613    0.... 408373500us : i915_request_retire_upto: rcs0 fence 1e95b:3648, current 3648
    <0> [412.390350] i915_sel-4613    0.... 408373500us : i915_request_retire: rcs0 fence 1e95b:3642, current 3648
    <0> [412.390350] i915_sel-4613    0.... 408373501us : i915_request_retire: __retire_engine_request(rcs0) fence 1e95b:3642, current 3648
    <0> [412.390350] i915_sel-4613    0.... 408373514us : i915_request_retire: rcs0 fence 1e95b:3644, current 3648
    <0> [412.390350] i915_sel-4613    0.... 408373515us : i915_request_retire: __retire_engine_request(rcs0) fence 1e95b:3644, current 3648
    <0> [412.390350] i915_sel-4613    0.... 408373527us : i915_request_retire: rcs0 fence 1e95b:3646, current 3640
    <0> [412.390350]   <idle>-0       3..s1 408373569us : execlists_submission_tasklet: rcs0 awake?=1, active=1
    <0> [412.390350]   <idle>-0       3d.s2 408373569us : process_csb: rcs0 cs-irq head=5, tail=1
    <0> [412.390350]   <idle>-0       3d.s2 408373570us : process_csb: rcs0 csb[0]: status=0x00000001:0x00000000, active=0x1
    <0> [412.390350]   <idle>-0       3d.s2 408373570us : process_csb: rcs0 csb[1]: status=0x00000018:0x00000002, active=0x5
    <0> [412.390350]   <idle>-0       3d.s2 408373570us : process_csb: rcs0 out[0]: ctx=2.1, fence 1e95b:3650 (current 3650), prio=6
    <0> [412.390350]   <idle>-0       3d.s2 408373571us : process_csb: rcs0 completed ctx=2
    <0> [412.390350] i915_sel-4613    0.... 408373621us : i915_request_retire: i915_request_retire:253 GEM_BUG_ON(!i915_request_completed(request))

v2: Fixup the cancellation path to drain the CSB and reset the pointers.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190411130515.20716-2-chris@chris-wilson.co.uk
2019-04-11 20:48:52 +01:00
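The rule described in 1863e3020a can be sketched as follows. This is a simplified model under stated assumptions, not the execlists code: `struct example_ctx_regs`, `struct example_request` and `example_reset_ring_regs()` are hypothetical names, and the ring size is assumed to be a power of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct example_ctx_regs {
	uint32_t ring_head;
	uint32_t ring_tail;
};

struct example_request {
	uint32_t head;		/* ring offset where this request starts */
	uint32_t tail;		/* ring offset just past this request */
	bool completed;
};

static uint32_t example_ring_wrap(uint32_t size, uint32_t pos)
{
	assert(size && !(size & (size - 1)));	/* power of two */
	return pos & (size - 1);
}

/*
 * Always rewrite the RING registers in the context image after a reset.
 * A completed request must not be rerun, so execution resumes from its
 * tail; an incomplete request is rerun from its head.
 */
static void example_reset_ring_regs(struct example_ctx_regs *regs,
				    const struct example_request *rq,
				    uint32_t ring_size)
{
	uint32_t resume = rq->completed ? rq->tail : rq->head;

	regs->ring_head = example_ring_wrap(ring_size, resume);
	regs->ring_tail = example_ring_wrap(ring_size, rq->tail);
}
```

Leaving `ring_head` at 0 for the completed case is the failure mode described in the commit message: everything still in the ring would be executed again.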
Chris Wilson
292ad25c22 drm/i915/guc: Implement reset locally
Before causing guc and execlists to diverge further (breaking guc in the
process), take a copy of the current reset procedure and make it local to
the guc submission backend.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190411130515.20716-1-chris@chris-wilson.co.uk
2019-04-11 20:48:51 +01:00
Mika Kuoppala
632c7ad6f4 drm/i915/icl: Switch to using 12 deep CSB status FIFO
Now that we can support variable csb fifo sizes, disable legacy mode.
By disabling legacy we hope to get better hw testing coverage by
assuming everyone else has switched over.

v2: rebase

References: https://bugs.freedesktop.org/show_bug.cgi?id=110338
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Kelvin Gardiner <kelvin.gardiner@intel.com>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190405204657.12887-2-chris@chris-wilson.co.uk
2019-04-11 09:20:10 +01:00
Mika Kuoppala
7d4c75d909 drm/i915: Prepare for larger CSB status FIFO size
Make csb entry count variable in preparation for larger
CSB status FIFO size found on gen11+ hardware.

v2: adapt to hwsp access only (Chris)
    non continuous mmio (Daniele)
v3: entries (Chris), fix macro for checkpatch
v4: num_entries (Chris)
v5: consistency on num_entries

Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190405204657.12887-1-chris@chris-wilson.co.uk
2019-04-11 09:20:04 +01:00
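What a variable CSB entry count means for the consumer can be sketched as below. This is an illustrative model, assuming `head` is the index of the last entry read and `tail` the index of the last entry written, as the `process_csb` trace lines elsewhere in this log suggest; `example_csb_pending()` is a hypothetical helper, not a driver function.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Collect the indices of the CSB entries that still need processing,
 * walking from head to tail and wrapping at num_entries. Returns the
 * number of indices written to out[].
 */
static size_t example_csb_pending(unsigned int head, unsigned int tail,
				  unsigned int num_entries,
				  unsigned int *out, size_t out_len)
{
	size_t n = 0;

	assert(num_entries > 0 && head < num_entries && tail < num_entries);

	while (head != tail && n < out_len) {
		if (++head == num_entries)
			head = 0;
		out[n++] = head;
	}

	return n;
}
```

With 6 entries, head=5 and tail=1 yield entries 0 and 1; with 12 entries the same pointers would cover 6..11 followed by 0 and 1, which is why the count has to be a per-engine value rather than a constant.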
Chris Wilson
9726920b7e drm/i915: Only reset the pinned kernel contexts on resume
On resume, we know that the only pinned contexts in danger of seeing
corruption are the kernel contexts, and so we do not need to walk the
list of all GEM contexts as we tracked them on each engine.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190410190120.830-1-chris@chris-wilson.co.uk
2019-04-10 21:18:11 +01:00
Chris Wilson
6d4257284a drm/i915: Make RING_PDP relative to engine->mmio_base
The PDP registers are an oddity inside the set of context saved
registers in that they take the engine as a parameter to the macro and
not the mmio_base as the others do. Make it accept the engine->mmio_base
for consistency in programming the context registers.

add/remove: 0/0 grow/shrink: 2/1 up/down: 3/-32 (-29)
Function                                     old     new   delta
emit_ppgtt_update                            324     326      +2
capture                                     5102    5103      +1
execlists_init_reg_state.isra               1128    1096     -32

And similar savings later!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190405123831.9724-1-chris@chris-wilson.co.uk
2019-04-05 15:23:40 +01:00
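The shape of the change in 6d4257284a can be sketched like this. The macro names, the 0x270 register offset and the 0x2000 base used in the note below are assumptions chosen for illustration; consult the driver's register definitions for the real values.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed offset of the first PDP register pair from the engine base. */
#define EXAMPLE_PDP_OFFSET 0x270u

/* Take the mmio base directly, like the other context register macros. */
#define EXAMPLE_RING_PDP_LDW(base, n) \
	((base) + EXAMPLE_PDP_OFFSET + (n) * 8u)
#define EXAMPLE_RING_PDP_UDW(base, n) \
	((base) + EXAMPLE_PDP_OFFSET + (n) * 8u + 4u)

struct example_engine {
	uint32_t mmio_base;
};

/* A caller holding an engine simply passes engine->mmio_base. */
static uint32_t example_pdp_udw(const struct example_engine *engine,
				unsigned int n)
{
	assert(n < 4);	/* four PDP pairs are assumed in this sketch */
	return EXAMPLE_RING_PDP_UDW(engine->mmio_base, n);
}
```

With an engine whose (assumed) base is 0x2000, the upper dword of PDP3 lands at 0x228c. Code that already has the base in a local variable no longer needs to reload it from the engine for each register.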
Chris Wilson
bac24f59f4 drm/i915/execlists: Enable coarse preemption boundaries for gen8
When we introduced preemption, we chose to keep it disabled for gen8 as
supporting preemption inside GPGPU user batches required various w/a in
userspace. Since then, the desire to preempt long queues of requests
between batches (e.g. within busywaiting semaphores) has grown. So allow
arbitration within the busywaits and between requests, but disable
arbitration within user batches so that we can preempt between requests
and not risk breaking GPGPU.

However, since this preemption is much coarser and doesn't interfere
with userspace, we decline to include it amongst the scheduler
capabilities. (This is also required for us to skip over the preemption
selftests that expect to be able to preempt user batches.)

Michal suggested that we could perhaps allow preemption inside gen8
userspace batches if we can satisfy ourselves that the default
preemption settings are viable with existing userspace (principally
OpenCL which already should carry any known workaround). We could then
merge the two code paths back into one, even dropping the artificial
has-preemption device feature flag.

Testcase: igt/gem_exec_scheduler/semaphore-user
References: beecec901790 ("drm/i915/execlists: Preemption!")
Fixes: e88619646971 ("drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Michal Winiarski <michal.winiarski@intel.com> #irc
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190329134024.5254-1-chris@chris-wilson.co.uk
2019-04-05 11:00:28 +01:00
Zhenyu Wang
a2deb87396 drm/i915: Disable semaphore on vGPU for now
Disable semaphore usage when running on a vGPU for now. Unfortunately
GVT-g hasn't fully enabled semaphore usage yet, so a current guest that
uses semaphores would cause a vGPU failure.

Although the current semaphore failure with vGPU can be resolved simply by
allowing the cmd parser to accept the MI_SEMAPHORE_WAIT command with an
address audit, we're checking the general usage of semaphores and how we
should handle it properly for virtualization, considering functional and
security concerns. So we have decided to request that it be disabled for
now in the guest driver. Once GVT can support it, we will add a new compat
bit to turn it on.

Fixes: e88619646971 ("drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+") #vgpu
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190327090636.3547-1-zhenyuw@linux.intel.com
2019-03-27 15:13:28 +00:00
Daniele Ceraolo Spurio
baba6e572b drm/i915: take a reference to uncore in the engine and use it
A few advantages:

- Prepares us for the planned split of display uncore from GT uncore

- Improves our engine-centric view of the world in the engine code
  and allows us to avoid jumping back to dev_priv.

- Allows us to wrap accesses to engine register in nice macros that
  automatically pick the right mmio base.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190325214940.23632-10-daniele.ceraolospurio@intel.com
2019-03-26 20:20:40 +00:00
Chris Wilson
ea593dbba4 drm/i915: Allow contexts to share a single timeline across all engines
Previously, our view has been always to run the engines independently
within a context. (Multiple engines happened before we had contexts and
timelines, so they always operated independently and that behaviour
persisted into contexts.) However, at the user level the context often
represents a single timeline (e.g. GL contexts) and userspace must
ensure that the individual engines are serialised to present that
ordering to the client (or forget about this detail entirely and hope no
one notices - a fair ploy if the client can only directly control one
engine themselves ;)

In the next patch, we will want to construct a set of engines that
operate as one, that have a single timeline interwoven between them, to
present a single virtual engine to the user. (They submit to the virtual
engine, then we decide which engine to execute on.)

To that end, we want to be able to create contexts which have a single
timeline (fence context) shared between all engines, rather than multiple
timelines.

v2: Move the specialised timeline ordering to its own function.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190322092325.5883-4-chris@chris-wilson.co.uk
2019-03-22 13:12:38 +00:00
Chris Wilson
a679f58d05 drm/i915: Flush pages on acquisition
When we return pages to the system, we ensure that they are marked as
being in the CPU domain since any external access is uncontrolled and we
must assume the worst. This means that we need to always flush the pages
on acquisition if we need to use them on the GPU, and from the beginning
have used set-domain. Set-domain is overkill for the purpose as it is a
general synchronisation barrier, but our intent is to only flush the
pages being swapped in. If we move that flush into the pages acquisition
phase, we know then that when we have obj->mm.pages, they are coherent
with the GPU and need only maintain that status without resorting to
heavy handed use of set-domain.

The principal knock-on effect for userspace is through mmap-gtt
pagefaulting. Our uAPI has always implied that the GTT mmap was async
(especially as when any pagefault occurs is unpredictable to userspace)
and so userspace had to apply explicit domain control itself
(set-domain). However, swapping is transparent to the kernel, and so on
first fault we need to acquire the pages and make them coherent for
access through the GTT. Our use of set-domain here leaks into the uABI
that the first pagefault was synchronous. This is unintentional and
barring a few igt should go unnoticed, nevertheless we bump the uABI
version for mmap-gtt to reflect the change in behaviour.

Another implication of the change is that gem_create() is presumed to
create an object that is coherent with the CPU and is in the CPU write
domain, so a set-domain(CPU) following a gem_create() would be a minor
operation that merely checked whether we could allocate all pages for
the object. On applying this change, a set-domain(CPU) causes a clflush
as we acquire the pages. This will have a small impact on mesa as we move
the clflush here on !llc from execbuf time to create, but that should
have minimal performance impact as the same clflush exists but is now
done early and because of the clflush issue, userspace recycles bo and
so should resist allocating fresh objects.

Internally, the presumption that objects are created in the CPU
write-domain and remain so through writes to obj->mm.mapping is more
prevalent than I expected; but easy enough to catch and apply a manual
flush.

For the future, we should push the page flush from the central
set_pages() into the callers so that we can more finely control when it
is applied, but for now doing it in one location is easier to validate, at
the cost of sometimes flushing when there is no need.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Antonio Argenziano <antonio.argenziano@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190321161908.8007-1-chris@chris-wilson.co.uk
2019-03-21 17:28:12 +00:00
Chris Wilson
4daffb664a drm/i915: Stop storing the context name as the timeline name
The timeline->name is only used for convenience in pretty printing the
i915_request.fence->ops->get_timeline_name() and it is just as
convenient to pull it from the gem_context directly. The few instances
of its use inside GEM_TRACE() have proven more of a nuisance than
helpful, so not worth saving imo.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190321140711.11190-4-chris@chris-wilson.co.uk
2019-03-21 15:59:31 +00:00
Daniele Ceraolo Spurio
25286aaca9 drm/i915: move regs pointer inside the uncore structure
This will allow further simplifications in the uncore handling.

v2: move register access setup under uncore (Chris)

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190319183543.13679-8-daniele.ceraolospurio@intel.com
2019-03-20 21:12:50 +00:00
Chris Wilson
4c5896dc4c drm/i915: Hold a reference to the active HW context
For virtual engines, we need to keep the HW context alive while it
remains in use. For regular HW contexts, they are created and kept alive
until the end of the GEM context. For simplicity, generalise the
requirements and keep an active reference to each HW context.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190318212347.30146-2-chris@chris-wilson.co.uk
2019-03-19 08:21:13 +00:00
Chris Wilson
206c2f812f drm/i915: Lock the gem_context->active_list while dropping the link
On unpinning the intel_context, we remove it from the active list
inside the GEM context. This list is supposed to be guarded by the GEM
context mutex, so remember to take it!

Fixes: 7e3d9a59410d ("drm/i915: Track active engines within a context")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190318212347.30146-1-chris@chris-wilson.co.uk
2019-03-19 08:21:11 +00:00
Chris Wilson
65baf0ef04 drm/i915: Hold a ref to the ring while retiring
As the final request on a ring may hold the reference to this ring (via
retiring the last pinned context), we may find ourselves chasing a
dangling pointer on completion of the list.

A quick solution is to hold a reference to the ring itself as we retire
along it so that we only free it after we stop dereferencing it.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190318095204.9913-4-chris@chris-wilson.co.uk
2019-03-18 21:00:28 +00:00
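The lifetime problem fixed by 65baf0ef04 can be modelled in a few lines. This is a simplified sketch with hypothetical names: the ring starts with a single reference held by its pinned context, and retiring the last request drops that reference. The `freed` pointer exists only so the outcome can be observed.

```c
#include <assert.h>
#include <stdlib.h>

struct example_ring {
	int refcount;
	int nrequests;
	int *freed;	/* set to 1 when the ring is released */
};

static struct example_ring *example_ring_create(int nrequests, int *freed)
{
	struct example_ring *ring = malloc(sizeof(*ring));

	if (!ring)
		return NULL;
	ring->refcount = 1;	/* held by the pinned context */
	ring->nrequests = nrequests;
	ring->freed = freed;
	return ring;
}

static void example_ring_get(struct example_ring *ring)
{
	ring->refcount++;
}

static void example_ring_put(struct example_ring *ring)
{
	assert(ring->refcount > 0);
	if (--ring->refcount == 0) {
		*ring->freed = 1;
		free(ring);
	}
}

static int example_ring_retire_all(struct example_ring *ring)
{
	int retired = 0;

	example_ring_get(ring);	/* keep the ring alive while we walk it */
	while (ring->nrequests > 0) {
		ring->nrequests--;
		retired++;
		if (ring->nrequests == 0)
			example_ring_put(ring);	/* last request unpins the context */
	}
	example_ring_put(ring);	/* only now may the ring be freed */

	return retired;
}
```

Without the extra get/put pair, the loop condition would read `ring->nrequests` after the final request had already released the ring.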
Chris Wilson
54939ea0bd drm/i915: Switch to use HWS indices rather than addresses
If we use the STORE_DATA_INDEX function we can use a fixed offset and
avoid having to look up the engine HWS address. A step closer to being
able to emit the final breadcrumb during request_add rather than later
in the submission interrupt handler.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190318095204.9913-9-chris@chris-wilson.co.uk
2019-03-18 20:55:28 +00:00
Chris Wilson
41a1bde367 drm/i915: Always kick the execlists tasklet after reset
With direct submission being disabled while the reset is in progress, we
have a small window where we may forgo the submission of a new request
and not notice its addition during execlists_reset_finish. To close this
window, always schedule the submission tasklet on coming out of reset to
catch any residual work.

<6> [333.144082] i915: Running intel_hangcheck_live_selftests/igt_reset_engines
<3> [333.296927] i915_reset_engine(rcs0:idle): failed to idle after reset
<6> [333.296932] i915 0000:00:02.0: [drm] rcs0
<6> [333.296934] i915 0000:00:02.0: [drm] 	Hangcheck 0:a9ddf7a5 [4157 ms]
<6> [333.296936] i915 0000:00:02.0: [drm] 	Reset count: 36048 (global 754)
<6> [333.296938] i915 0000:00:02.0: [drm] 	Requests:
<6> [333.296997] i915 0000:00:02.0: [drm] 	RING_START: 0x00000000
<6> [333.296999] i915 0000:00:02.0: [drm] 	RING_HEAD:  0x00000000
<6> [333.297001] i915 0000:00:02.0: [drm] 	RING_TAIL:  0x00000000
<6> [333.297003] i915 0000:00:02.0: [drm] 	RING_CTL:   0x00000000
<6> [333.297005] i915 0000:00:02.0: [drm] 	RING_MODE:  0x00000200 [idle]
<6> [333.297007] i915 0000:00:02.0: [drm] 	RING_IMR: fffffeff
<6> [333.297010] i915 0000:00:02.0: [drm] 	ACTHD:  0x00000000_00000000
<6> [333.297012] i915 0000:00:02.0: [drm] 	BBADDR: 0x00000000_00000000
<6> [333.297015] i915 0000:00:02.0: [drm] 	DMA_FADDR: 0x00000000_00000000
<6> [333.297017] i915 0000:00:02.0: [drm] 	IPEIR: 0x00000000
<6> [333.297019] i915 0000:00:02.0: [drm] 	IPEHR: 0x00000000
<6> [333.297021] i915 0000:00:02.0: [drm] 	Execlist status: 0x00000001 00000000
<6> [333.297023] i915 0000:00:02.0: [drm] 	Execlist CSB read 5, write 5 [mmio:7], tasklet queued? no (enabled)
<6> [333.297025] i915 0000:00:02.0: [drm] 		ELSP[0] idle
<6> [333.297027] i915 0000:00:02.0: [drm] 		ELSP[1] idle
<6> [333.297028] i915 0000:00:02.0: [drm] 		HW active? 0x0
<6> [333.297044] i915 0000:00:02.0: [drm] 		Queue priority hint: -8186
<6> [333.297067] i915 0000:00:02.0: [drm] 		Q  2afac:5f2+  prio=-8186 @ 50ms: (null)
<6> [333.297068] i915 0000:00:02.0: [drm] HWSP:
<6> [333.297071] i915 0000:00:02.0: [drm] [0000] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<6> [333.297073] i915 0000:00:02.0: [drm] *
<6> [333.297075] i915 0000:00:02.0: [drm] [0040] 00000001 00000000 00000018 00000002 00000001 00000000 00000018 00000000
<6> [333.297077] i915 0000:00:02.0: [drm] [0060] 00000001 00000000 00008002 00000002 00000000 00000000 00000000 00000005
<6> [333.297079] i915 0000:00:02.0: [drm] [0080] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<6> [333.297081] i915 0000:00:02.0: [drm] *
<6> [333.297083] i915 0000:00:02.0: [drm] [00c0] 00000000 00000000 00000000 00000000 a9ddf7a5 00000000 00000000 00000000
<6> [333.297085] i915 0000:00:02.0: [drm] [00e0] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<6> [333.297087] i915 0000:00:02.0: [drm] *
<6> [333.297089] i915 0000:00:02.0: [drm] Idle? no
<6> [333.297090] i915_reset_engine(rcs0:idle): 3000 resets
<3> [333.297092] i915/intel_hangcheck_live_selftests: igt_reset_engines failed with error -5
<3> [333.455460] i915 0000:00:02.0: Failed to idle engines, declaring wedged!
...
<0> [333.491294] i915_sel-4916    1.... 333262143us : i915_reset_engine: rcs0 flags=4
<0> [333.491328] i915_sel-4916    1.... 333262143us : execlists_reset_prepare: rcs0: depth<-0
<0> [333.491362] i915_sel-4916    1.... 333262143us : intel_engine_stop_cs: rcs0
<0> [333.491396] i915_sel-4916    1d..1 333262144us : process_csb: rcs0 cs-irq head=5, tail=5
<0> [333.491424] i915_sel-4916    1.... 333262145us : intel_gpu_reset: engine_mask=1
<0> [333.491454] kworker/-214     5.... 333262184us : i915_gem_switch_to_kernel_context: awake?=yes
<0> [333.491487] kworker/-214     5.... 333262192us : i915_request_add: rcs0 fence 2afac:1522
<0> [333.491520] kworker/-214     5.... 333262193us : i915_request_add: marking (null) as active
<0> [333.491553] i915_sel-4916    1.... 333262199us : intel_engine_cancel_stop_cs: rcs0
<0> [333.491587] i915_sel-4916    1.... 333262199us : execlists_reset_finish: rcs0: depth->0

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190313162835.30228-1-chris@chris-wilson.co.uk
2019-03-15 10:58:23 +00:00
Chris Wilson
a9fe9ca44c drm/i915/gtt: Rename i915_vm_is_48b to i915_vm_is_4lvl
Large ppGTT are differentiated by the requirement to go to four levels
to address more than 32b. Given the introduction of more 4 level ppGTT
with different sizes of addressable bits, rename i915_vm_is_48b() to
better reflect the commonality of using 4 levels.

Based on a patch by Bob Paauwe.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Bob Paauwe <bob.j.paauwe@intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190314223839.28258-4-chris@chris-wilson.co.uk
2019-03-15 09:04:54 +00:00
Chris Wilson
4b378c0672 drm/i915: Consolidate reset-request debug message
Move the pair of messages to the common callsite where it makes sense to
include a bit more information about which request is being reset.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190312111146.10662-1-chris@chris-wilson.co.uk
2019-03-12 12:49:29 +00:00
Chris Wilson
0881954965 drm/i915: Introduce intel_context.pin_mutex for pin management
Introduce a mutex to start locking the HW contexts independently of
struct_mutex, with a view to reducing the coarse struct_mutex. The
intel_context.pin_mutex is used to guard the transition to and from being
pinned on the gpu, and so is required before starting to build any
request. The intel_context will then remain pinned until the request
completes, but the mutex can be released immediately upon completion of
pinning the context.

A slight variant of the above is used by per-context sseu that wants to
inspect the pinned status of the context, and requires that it remains
stable (either !pinned or pinned) across its operation. By using the
pin_mutex to serialise operations while pin_count==0, we can take that
pin_mutex to stabilise the boolean pin status.

v2: for Tvrtko!
* Improved commit message.
* Dropped _gpu suffix from gen8_modify_rpcs_gpu.
v3: Repair the locking for sseu selftests

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190308132522.21573-7-chris@chris-wilson.co.uk
2019-03-08 14:04:19 +00:00
Chris Wilson
9dbfea98d7 drm/i915: Track the pinned kernel contexts on each engine
Each engine acquires a pin on the kernel contexts (normal and preempt)
so that the logical state is always available on demand. Keep track of
each engine's pin by storing the returned pointer on the engine for quick
access.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190308132522.21573-6-chris@chris-wilson.co.uk
2019-03-08 14:00:02 +00:00
Chris Wilson
95f697eb02 drm/i915: Make context pinning part of intel_context_ops
Push the intel_context pin callback down from intel_engine_cs onto the
context itself by virtue of having a central caller for
intel_context_pin() being able to look up the intel_context itself.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190308132522.21573-5-chris@chris-wilson.co.uk
2019-03-08 13:59:59 +00:00
Chris Wilson
c4d52feb2c drm/i915: Move over to intel_context_lookup()
In preparation for an ever growing number of engines and so ever
increasing static array of HW contexts within the GEM context, move the
array over to an rbtree, allocated upon first use.

Unfortunately, this imposes an rbtree lookup at a few frequent callsites,
but we should be able to mitigate those by moving over to using the HW
context as our primary type and so only incur the lookup on the boundary
with the user GEM context and engines.

v2: Check for no HW context in guc_stage_desc_init

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190308132522.21573-4-chris@chris-wilson.co.uk
2019-03-08 13:59:52 +00:00
Chris Wilson
4dc84b77b0 drm/i915: Store the intel_context_ops in the intel_engine_cs
If we place a pointer to the engine specific intel_context_ops in the
engine itself, we can assign the ops pointer on initialising the
context, and then rely on it being set. This simplifies the code in
later patches.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190308132522.21573-3-chris@chris-wilson.co.uk
2019-03-08 13:59:50 +00:00
Chris Wilson
7e3d9a5941 drm/i915: Track active engines within a context
For use in the next patch, if we track which engines have been used by
the HW, we can reduce the work required to flush our state off the HW to
those engines.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190308132522.21573-1-chris@chris-wilson.co.uk
2019-03-08 13:59:41 +00:00
Chris Wilson
b146e5efe6 drm/i915: Pass around the intel_context
Instead of passing the gem_context and engine to find the instance of
the intel_context to use, pass around the intel_context instead. This is
useful for the next few patches, where the intel_context is no longer a
direct lookup.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190306084704.15755-1-chris@chris-wilson.co.uk
2019-03-06 10:16:33 +00:00
Chris Wilson
8a68d46436 drm/i915: Store the BIT(engine->id) as the engine's mask
In the next patch, we are introducing a broad virtual engine to encompass
multiple physical engines, losing the 1:1 nature of BIT(engine->id). To
reflect the broader set of engines implied by the virtual instance, let's
store the full bitmask.
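
To illustrate why a stored mask generalises where BIT(engine->id) does not, here is a minimal sketch. The type and helper names (engine_mask_t, ENGINE_BIT, virtual_engine and so on) are hypothetical, and the engine ids in the usage are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t engine_mask_t; /* hypothetical name for the mask type */

#define ENGINE_BIT(id) ((engine_mask_t)1u << (id))

struct engine_sketch {
	unsigned int id;
	engine_mask_t mask; /* stored once, instead of recomputing the bit */
};

/* A physical engine keeps the 1:1 relationship between id and mask. */
static struct engine_sketch physical_engine(unsigned int id)
{
	assert(id < 32);
	struct engine_sketch e = { .id = id, .mask = ENGINE_BIT(id) };
	return e;
}

/* A virtual engine covers the union of the engines it may run on. */
static struct engine_sketch virtual_engine(const struct engine_sketch *siblings,
					   unsigned int count)
{
	struct engine_sketch ve = { .id = ~0u, .mask = 0 };

	for (unsigned int i = 0; i < count; i++)
		ve.mask |= siblings[i].mask;

	return ve;
}

static bool masks_overlap(engine_mask_t a, engine_mask_t b)
{
	return (a & b) != 0;
}
```

Callers that test engine->mask work unchanged for both kinds of engine, whereas callers that recompute a single bit from the id cannot describe the virtual case.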

v2: Use intel_engine_mask_t (s/ring_mask/engine_mask/)
v3: Tvrtko voted for moah churn so teach everyone to not mention ring
and use $class$instance throughout.
v4: Comment upon the disparity in bspec for using VCS1,VCS2 in gen8 and
VCS[0-4] in later gen. We opt to keep the code consistent and use
0-index naming throughout.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190305180332.30900-1-chris@chris-wilson.co.uk
2019-03-05 18:19:50 +00:00
Chris Wilson
f9e9e9de58 drm/i915: Prioritise non-busywait semaphore workloads
We don't want to busywait on the GPU if we have other work to do. If we
give non-busywaiting workloads higher (initial) priority than workloads
that require a busywait, we will prioritise work that is ready to run
immediately. We then also have to be careful that we don't give earlier
semaphores an accidental boost because later work doesn't wait on other
rings, hence we keep a history of semaphore usage of the dependency chain.

v2: Stop rolling the bits into a chain and just use a flag in case this
request or any of our dependencies use a semaphore. The rolling around
was contagious as Tvrtko was heard to fall off his chair.
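
A minimal sketch of the v2 approach, assuming a single inherited flag and a one-level priority bump; the names and the bump value are hypothetical, not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PRIO_NOSEMAPHORE 1 /* hypothetical bump for ready-to-run work */

struct req_sketch {
	bool chain_semaphore; /* this request, or any dependency, busywaits */
	int priority;
};

/*
 * Initialise a request. The flag is inherited from every dependency so
 * that later work in a chain cannot regain the bump merely because it
 * does not itself wait on another engine.
 */
static void req_init(struct req_sketch *rq, int base_prio, bool uses_semaphore,
		     const struct req_sketch *const *deps, size_t ndeps)
{
	assert(rq != NULL);

	rq->chain_semaphore = uses_semaphore;
	for (size_t i = 0; i < ndeps; i++) {
		if (deps[i]->chain_semaphore)
			rq->chain_semaphore = true;
	}

	rq->priority = base_prio;
	if (!rq->chain_semaphore)
		rq->priority += PRIO_NOSEMAPHORE;
}
```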

Testcase: igt/gem_exec_schedule/semaphore
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190301170901.8340-4-chris@chris-wilson.co.uk
2019-03-01 17:45:11 +00:00
Chris Wilson
e886196469 drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+
Having introduced per-context seqno, we now have a means to identify
progress across the system without fear of rollback as befell the
global_seqno. That is we can program a MI_SEMAPHORE_WAIT operation in
advance of submission safe in the knowledge that our target seqno and
address is stable.

However, since we are telling the GPU to busy-spin on the target address
until it matches the signaling seqno, we only want to do so when we are
sure that busy-spin will be completed quickly. To achieve this we only
submit the request to HW once the signaler is itself executing (modulo
preemption causing us to wait longer), and we only do so for default and
above priority requests (so that idle priority tasks never themselves
hog the GPU waiting for others).
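
The gating described above can be sketched on the CPU as follows. This only models the decision and the spin; the real wait is performed by the GPU's MI_SEMAPHORE_WAIT. The helper names, the PRIO_DEFAULT value and the bounded poll count are assumptions made for illustration, and the signed cast relies on the usual two's complement conversion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PRIO_DEFAULT 0 /* hypothetical value for the default priority */

/* Wraparound-safe test of whether seq has reached target. */
static bool seqno_passed(uint32_t seq, uint32_t target)
{
	return (int32_t)(seq - target) >= 0;
}

/* Only spin if the signaler is already running and we are not idle-prio. */
static bool may_use_hw_semaphore(bool signaler_started, int waiter_prio)
{
	return signaler_started && waiter_prio >= PRIO_DEFAULT;
}

/* Bounded CPU model of busy-spinning on the target address. */
static bool spin_until_passed(const volatile uint32_t *addr, uint32_t target,
			      unsigned int max_polls)
{
	assert(addr != NULL);

	while (max_polls--) {
		if (seqno_passed(*addr, target))
			return true;
	}

	return false;
}
```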

As might be reasonably expected, HW semaphores excel in inter-engine
synchronisation microbenchmarks (where the 3x reduced latency / increased
throughput more than offset the power cost of spinning on a second ring)
and have significant improvement (can be up to ~10%, most see no change)
for single clients that utilize multiple engines (typically media players
and transcoders), without regressing multiple clients that can saturate
the system or changing the power envelope dramatically.

v3: Drop the older NEQ branch, now we pin the signaler's HWSP anyway.
v4: Tell the world and include it as part of scheduler caps.

Testcase: igt/gem_exec_whisper
Testcase: igt/benchmarks/gem_wsim
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190301170901.8340-3-chris@chris-wilson.co.uk
2019-03-01 17:45:07 +00:00
Chris Wilson
1e3f697e47 drm/i915/execlists: Suppress redundant preemption
On unwinding the active request we give it a small (limited to internal
priority levels) boost to prevent it from being gazumped a second time.
However, this means that it can be promoted to above the request that
triggered the preemption request, causing a preempt-to-idle cycle for no
change. We can avoid this if we take the boost into account when
checking if the preemption request is valid.

v2: After preemption the active request will be after the preemptee if
they end up with equal priority.

v3: Tvrtko pointed out that this, the existing logic, makes
I915_PRIORITY_WAIT non-preemptible. Document this interesting quirk!

v4: Prove Tvrtko was right about WAIT being non-preemptible and test it.
v5: Except not all priorities were made equal, and the WAIT not preempting
is only if we start off as !NEWCLIENT.

v6: More commentary after coming to an understanding about what I had
forgotten to say.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190301170901.8340-1-chris@chris-wilson.co.uk
2019-03-01 17:40:32 +00:00
Chris Wilson
b5773a3616 drm/i915/execlists: Suppress mere WAIT preemption
WAIT is occasionally suppressed by virtue of preempted requests being
promoted to NEWCLIENT if they have not already received that boost.
Make this consistent for all WAIT boosts that they are not allowed to
preempt executing contexts and are merely granted the right to be at the
front of the queue for the next execution slot. This is in keeping with
the desire that the WAIT boost be a minor tweak that does not give
excessive promotion to its user and open ourselves to trivial abuse.

The problem with the inconsistent WAIT preemption becomes more apparent
as the preemption is propagated across the engines, where one engine may
preempt and the other not, and we may be relying on the exact execution
order being consistent across engines (e.g. using HW semaphores to
coordinate parallel execution).

v2: Also protect GuC submission from false preemption loops.
v3: Build bug safeguards and better debug messages for st.
v4: Do the priority bumping in unsubmit (i.e. on preemption/reset
unwind), applying it earlier during submit causes out-of-order execution
combined with execute fences.
v5: Call sw_fence_fini for our dummy request (Matthew)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190228220639.3173-1-chris@chris-wilson.co.uk
2019-02-28 23:10:43 +00:00
Chris Wilson
32eb6bcfdd drm/i915: Make request allocation caches global
As kmem_caches share the same properties (size, allocation/free behaviour)
for all potential devices, we can use global caches. While this
potentially has worse fragmentation behaviour (one can argue that
different devices would have different activity lifetimes, but you can
also argue that activity is temporal across the system) it is the
default behaviour of the system at large to amalgamate matching caches.

The benefit for us is much reduced pointer dancing along the frequent
allocation paths.

v2: Defer shrinking until after a global grace period for futureproofing
multiple consumers of the slab caches, similar to the current strategy
for avoiding shrinking too early.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190228102035.5857-1-chris@chris-wilson.co.uk
2019-02-28 11:07:56 +00:00
Chris Wilson
2d5eaad007 drm/i915: Compute the global scheduler caps
Do a pass over all the engines upon starting to determine the global
scheduler capability flags (those that are agreed upon by all).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190226102404.29153-7-chris@chris-wilson.co.uk
2019-02-28 08:58:37 +00:00
Chris Wilson
b300fde896 drm/i915: Remove i915_request.global_seqno
Having weaned the interrupt handling off using a single global execution
queue, we no longer need to emit a global_seqno. Note that we still have
a few assumptions about execution order along engine timelines, but this
removes the most obvious artefact!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190226094922.31617-3-chris@chris-wilson.co.uk
2019-02-26 09:55:37 +00:00
Chris Wilson
8892f47742 drm/i915: Remove access to global seqno in the HWSP
Stop accessing the HWSP to read the global seqno, and stop tracking the
mirror in the engine's execution timeline -- it is unused.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190226094922.31617-2-chris@chris-wilson.co.uk
2019-02-26 09:55:33 +00:00
Chris Wilson
89531e7d8e drm/i915: Replace global_seqno with a hangcheck heartbeat seqno
To determine whether an engine has 'stuck', we simply check whether or
not it is still on the same seqno for several seconds. To keep this simple
mechanism intact over the loss of a global seqno, we can simply add a
new global heartbeat seqno instead. As we cannot know the sequence in
which requests will then be completed, we use a primitive random number
generator instead (with a cycle long enough to not matter over an
interval of a few thousand requests between hangcheck samples).
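
A sketch of the idea, assuming xorshift32 as the primitive generator and a fixed number of unchanged samples as the stall criterion. Both choices, and all names, are illustrative assumptions rather than what the patch necessarily uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STALL_THRESHOLD 3 /* hypothetical: unchanged samples before "stuck" */

/*
 * xorshift32: consecutive values never repeat until the full cycle of
 * 2^32 - 1 non-zero states has been walked. The state must be non-zero.
 */
static uint32_t next_heartbeat(uint32_t state)
{
	assert(state != 0);
	state ^= state << 13;
	state ^= state >> 17;
	state ^= state << 5;
	return state;
}

struct hangcheck_sketch {
	uint32_t last_seen;
	unsigned int stalled_samples;
};

/* Returns true once the same heartbeat was seen STALL_THRESHOLD times over. */
static bool hangcheck_sample(struct hangcheck_sketch *hc, uint32_t current)
{
	if (current != hc->last_seen) {
		hc->last_seen = current;
		hc->stalled_samples = 0;
		return false;
	}

	return ++hc->stalled_samples >= STALL_THRESHOLD;
}
```

Only inequality between samples matters here, which is why the values need not be ordered.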

The alternative to using a dedicated seqno on every request is to issue
a heartbeat request and query its progress through the system. Sadly
this requires us to reduce struct_mutex so that we can issue requests
without requiring that bkl.

v2: And without the extra CS_STALL for the hangcheck seqno -- we don't
need strict serialisation with what comes later, we just need to be sure
we don't write the hangcheck seqno before our batch is flushed.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190226094922.31617-1-chris@chris-wilson.co.uk
2019-02-26 09:55:31 +00:00
Chris Wilson
9a3b19a16d drm/i915: Only try to park engines after a failed reset
Currently we try to stop the engine by programming the ring registers to
be disabled before we perform the reset. Sometimes, we see the context
image also have invalid ring registers, which one presumes may be
actually caused by us doing so. Let's risk not programming the
ring to zero on the first attempt to avoid preserving that corruption
into the context image, leaving the w/a in place for subsequent
reset attempts.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190213232047.8486-1-chris@chris-wilson.co.uk
2019-02-14 21:41:45 +00:00
Chris Wilson
c10c78ade5 drm/i915/execlists: Refactor out can_merge_rq()
In the next patch, we add another user that wants to check whether
requests can be merged into a single HW execution, and in the future we
want to add more conditions under which requests from the same context
cannot be merged. In preparation, extract out can_merge_rq().

v2: Reorder tests to decide if we can continue filling ELSP and bonus
comments.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190208235108.23127-1-chris@chris-wilson.co.uk
2019-02-09 00:31:41 +00:00
Chris Wilson
21182b3c4c drm/i915: Don't claim an unstarted request was guilty
If we haven't even begun executing the payload of the stalled request,
then we should not claim that its userspace context was guilty of
submitting a hanging batch.

v2: Check for context corruption before trying to restart.
v3: Preserve semaphores on skipping requests (need to keep the timelines
intact).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190208153708.20023-7-chris@chris-wilson.co.uk
2019-02-08 16:47:40 +00:00
Tvrtko Ursulin
e46c2e99f6 drm/i915: Expose RPCS (SSEU) configuration to userspace (Gen11 only)
We want to allow userspace to reconfigure the subslice configuration on a
per context basis.

This is required for the functional requirement of shutting down non-VME
enabled sub-slices on Gen11 parts.

To do so, we expose a context parameter to allow adjustment of the RPCS
register stored within the context image (and currently not accessible via
LRI).

If the context is adjusted before first use or whilst idle, the adjustment
is for "free"; otherwise if the context is active we queue a request to do
so (using the kernel context), following all other activity by that
context, which is also marked as barrier for all following submission
against the same context.

Since the overhead of device re-configuration during context switching can
be significant, especially in multi-context workloads, we limit this new
uAPI to only support the Gen11 VME use case. In this use case either the
device is fully enabled, or exactly one slice and half of the subslices
are enabled.

Example usage:

	struct drm_i915_gem_context_param_sseu sseu = { };
	struct drm_i915_gem_context_param arg = {
		.param = I915_CONTEXT_PARAM_SSEU,
		.ctx_id = gem_context_create(fd),
		.size = sizeof(sseu),
		.value = to_user_pointer(&sseu)
	};

	/* Query device defaults. */
	gem_context_get_param(fd, &arg);

	/* Set VME configuration on a 1x6x8 part. */
	sseu.slice_mask = 0x1;
	sseu.subslice_mask = 0xe0;
	gem_context_set_param(fd, &arg);

v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu()
    (Lionel)

v3: Add ability to program this per engine (Chris)

v4: Move most get_sseu() into i915_gem_context.c (Lionel)

v5: Validate sseu configuration against the device's capabilities (Lionel)

v6: Change context powergating settings through MI_SDM on kernel context
    (Chris)

v7: Synchronize the requests following a powergating setting change using
    a global dependency (Chris)
    Iterate timelines through dev_priv.gt.active_rings (Tvrtko)
    Disable RPCS configuration setting for non capable users
    (Lionel/Tvrtko)

v8: s/union intel_sseu/struct intel_sseu/ (Lionel)
    s/dev_priv/i915/ (Tvrtko)
    Change uapi class/instance fields to u16 (Tvrtko)
    Bump mask fields to 64bits (Lionel)
    Don't return EPERM when dynamic sseu is disabled (Tvrtko)

v9: Import context image into kernel context's ppgtt only when
    reconfiguring powergated slice/subslices (Chris)
    Use aliasing ppgtt when needed (Michel)

Tvrtko Ursulin:

v10:
 * Update for upstream changes.
 * Request submit needs a RPM reference.
 * Reject on !FULL_PPGTT for simplicity.
 * Pull out get/set param to helpers for readability and less indent.
 * Use i915_request_await_dma_fence in add_global_barrier to skip waits
   on the same timeline and avoid GEM_BUG_ON.
 * No need to explicitly assign a NULL pointer to engine in legacy mode.
 * No need to move gen8_make_rpcs up.
 * Factored out global barrier as prep patch.
 * Allow to only CAP_SYS_ADMIN if !Gen11.

v11:
 * Remove engine vfunc in favour of local helper. (Chris Wilson)
 * Stop retiring requests before updates since it is not needed
   (Chris Wilson)
 * Implement direct CPU update path for idle contexts. (Chris Wilson)
 * Left side dependency needs only be on the same context timeline.
   (Chris Wilson)
 * It is sufficient to order the timeline. (Chris Wilson)
 * Reject !RCS configuration attempts with -ENODEV for now.

v12:
 * Rebase for make_rpcs.

v13:
 * Centralize SSEU normalization to make_rpcs.
 * Type width checking (uAPI <-> implementation).
 * Gen11 restrictions uAPI checks.
 * Gen11 subslice count differences handling.
 Chris Wilson:
 * args->size handling fixes.
 * Update context image from GGTT.
 * Postpone context image update to pinning.
 * Use i915_gem_active_raw instead of last_request_on_engine.

v14:
 * Add activity tracker on intel_context to fix the lifetime issues
   and simplify the code. (Chris Wilson)

v15:
 * Fix context pin leak if no space in ring by simplifying the
   context pinning sequence.

v16:
 * Rebase for context get/set param locking changes.
 * Just -ENODEV on !Gen11. (Joonas)

v17:
 * Fix one Gen11 subslice enablement rule.
 * Handle error from i915_sw_fence_await_sw_fence_gfp. (Chris Wilson)

v18:
 * Update commit message. (Joonas)
 * Restrict uAPI to VME use case. (Joonas)

v19:
 * Rebase.

v20:
 * Rebase for ce->active_tracker.

v21:
 * Rebase for IS_GEN changes.

v22:
 * Reserve uAPI for flags straight away. (Chris Wilson)

v23:
 * Rebase for RUNTIME_INFO.

v24:
 * Added some headline docs for the uapi usage. (Joonas/Chris)

v25:
 * Renamed class/instance to engine_class/engine_instance to avoid clash
   with C++ keyword. (Tony Ye)

v26:
 * Rebased for runtime pm api changes.

v27:
 * Rebased for intel_context_init.
 * Wrap commit msg to 75.

v28:
 (Chris Wilson)
 * Use i915_gem_ggtt.
 * Use i915_request_await_dma_fence to show a better example.

v29:
 * i915_timeline_set_barrier can now fail. (Chris Wilson)

v30:
 * Capture some acks.

v31:
 * Drop the WARN_ON from user controllable paths. (Chris Wilson)
 * Use overflows_type for all checks.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107634
Issue: https://github.com/intel/media-driver/issues/267
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Zhipeng Gong <zhipeng.gong@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tony Ye <tony.ye@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: Timo Aaltonen <timo.aaltonen@canonical.com>
Acked-by: Takashi Iwai <tiwai@suse.de>
Acked-by: Stéphane Marchesin <marcheu@chromium.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20190205095032.22673-4-tvrtko.ursulin@linux.intel.com
2019-02-05 11:32:03 +00:00
Lionel Landwerlin
ec431eae8f drm/i915/perf: lock powergating configuration to default when active
If some of the contexts submitting workloads to the GPU have been
configured to shut down slices/subslices, we might lose the NOA
configurations written in the NOA muxes.

One possible solution to this problem is to reprogram the NOA muxes
when we switch to a new context. We initially tried this in the
workaround batchbuffer but some concerns were raised about the cost
of reprogramming at every context switch. This solution is also not
without consequences from the userspace point of view. Reprogramming
of the muxes can only happen once the powergating configuration has
changed (which happens after context switch). This means for a window
of time during the recording, counters recorded by the OA unit might
be invalid. This requires userspace dealing with OA reports to discard
the invalid values.

Minimizing the reprogramming could be implemented by tracking of the
last programmed configuration somewhere in GGTT and use MI_PREDICATE
to discard some of the programming commands, but the command streamer
would still have to parse all the MI_LRI instructions in the
workaround batchbuffer.

Another solution, which this change implements, is to simply disregard
the user requested configuration for the period of time when i915/perf
is active.

On most platforms there are no issues with this apart from a performance
penalty for some media workloads that benefit from running on a partially
powergated GPU. We already prevent RC6 from affecting the programming so
it doesn't sound completely unreasonable to hold on powergating for the
same reason.

On Icelake however there would be a functional problem if the slices not
containing the VME block were left enabled with a running media workload
which explicitly disabled them. To avoid a GPU hang in this case, on
Icelake we lock the enablement to only slices which contain VME blocks.
Downside is that it means degraded GPU performance when OA is active but
there is no known alternative solution for this.

v2: Leave RPCS programming in intel_lrc.c (Lionel)

v3: Update for s/union intel_sseu/struct intel_sseu/ (Lionel)
    More to_intel_context() (Tvrtko)
    s/dev_priv/i915/ (Tvrtko)

Tvrtko Ursulin:

v4:
 * Rebase for make_rpcs changes.

v5:
 * Apply OA restriction from make_rpcs directly.

v6:
 * Rebase for context image setup changes.

v7:
 * Move stream assignment before metric enable.

v8-9:
 * Rebase.

v10:
 * Squashed with ICL support patch.

Bspec: 21140
Co-developed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> # v9
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190205095032.22673-2-tvrtko.ursulin@linux.intel.com
2019-02-05 11:31:52 +00:00
Lionel Landwerlin
87f1ef2252 drm/i915: Record the sseu configuration per-context & engine
We want to expose the ability to reconfigure the slices, subslice and
eu per context and per engine. To facilitate that, store the current
configuration on the context for each engine, which is initially set
to the device default upon creation.

v2: record sseu configuration per context & engine (Chris)

v3: introduce the i915_gem_context_sseu to store powergating
    programming, sseu_dev_info has grown quite a bit (Lionel)

v4: rename i915_gem_sseu into intel_sseu (Chris)
    use to_intel_context() (Chris)

v5: More to_intel_context() (Tvrtko)
    Switch intel_sseu from union to struct (Tvrtko)
    Move context default sseu in existing loop (Chris)

v6: s/intel_sseu_from_device_sseu/intel_device_default_sseu/ (Tvrtko)

Tvrtko Ursulin:

v7:
 * Pass intel_sseu by pointer instead of value to make_rpcs.
 * Rebase for make_rpcs changes.

v8:
 * Rebase for RPCS edit on pin.

v9:
 * Rebase for context image setup changes.

v10:
 * Rename dev_priv to i915. (Chris Wilson)

v11:
 * Rebase.

v12:
 * Rebase for IS_GEN changes.

v13:
 * Rebase for RUNTIME_INFO.

v14:
 * Rebase for intel_context_init.

v15:
 * Rebase for drm-tip changes.

v16:
 * Moved struct intel_sseu definition to i915_gem_context.h.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190205095032.22673-1-tvrtko.ursulin@linux.intel.com
2019-02-05 11:31:27 +00:00
Chris Wilson
c9a6462288 drm/i915/execlists: Suppress preempting self
In order to avoid preempting ourselves, we currently refuse to schedule
the tasklet if we reschedule an inflight context. However, this glosses
over a few issues such as what happens after a CS completion event and
we then preempt the newly executing context with itself, or if something
else causes a tasklet_schedule triggering the same evaluation to
preempt the active context with itself.

However, when we avoid preempting ELSP[0], we still retain the preemption
value as it may match a second preemption request within the same time period
that we need to resolve after the next CS event. However, since we only
store the maximum preemption priority seen, it may not match the
subsequent event and so we should double check whether or not we
actually do need to trigger a preempt-to-idle by comparing the top
priorities from each queue. Later, this gives us a hook for finer
control over deciding whether the preempt-to-idle is justified.

The sequence of events where we end up preempting to no avail is:

1. Queue requests/contexts A, B
2. Priority boost A; no preemption as it is executing, but keep hint
3. After CS switch, B is less than hint, force preempt-to-idle
4. Resubmit B after idling
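
The double check described above can be sketched as below. The struct, its fields and the use of INT_MIN for "nothing there" are hypothetical simplifications; the point is only that a stale hint alone should not trigger a preempt-to-idle.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, flattened view of one engine's scheduling state. */
struct engine_sched_sketch {
	int queue_priority_hint; /* maximum priority seen, may be stale */
	int active_prio;         /* priority of ELSP[0], INT_MIN if idle */
	int queue_top_prio;      /* real top of the queue, INT_MIN if empty */
};

/* What acting on the hint alone would conclude. */
static bool hint_wants_preempt(const struct engine_sched_sketch *e)
{
	assert(e != NULL);
	return e->queue_priority_hint > e->active_prio;
}

/* Confirm the hint against the top priorities actually present. */
static bool need_preempt(const struct engine_sched_sketch *e)
{
	if (!hint_wants_preempt(e))
		return false;

	return e->queue_top_prio > e->active_prio;
}
```

In the sequence above, after step 3 the hint left behind by boosting A exceeds B's priority, yet nothing of higher priority is queued, so need_preempt() declines where the hint alone would have forced an idle cycle.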

v2: We can simplify a bunch of tests based on the knowledge that PI will
ensure that earlier requests along the same context will have the highest
priority.
v3: Demonstrate the stale preemption hint with a selftest

References: a2bf92e8cc16 ("drm/i915/execlists: Avoid kicking priority on the current context")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190129185452.20989-4-chris@chris-wilson.co.uk
2019-01-29 20:00:05 +00:00
Chris Wilson
4d97cbe019 drm/i915: Rename execlists->queue_priority to queue_priority_hint
After noticing that we trigger preemption events for currently executing
requests, as well as requests that complete before the preemption and
attempting to suppress those preemption events, it is wise to not
consider the queue_priority to be authoritative. As we only track the
maximum priority seen between dequeue passes, if the maximum priority
request is no longer available for dequeuing (it completed or is even
executing on another engine), we have no knowledge of the previous
queue_priority as it would require us to keep a full history of enqueued
requests -- but we already have that history in the priolists!

Rename the queue_priority to queue_priority_hint so that we do not
confuse it as being exactly the maximum priority in the queue, but merely
an indication that we have seen a new maximum priority value and as such
we should check whether it should preempt the currently running request.

v2: s/preempt_priority_hint/queue_priority_hint/ as preempt implies it
being only used for the singular task of preemption and not the wider
question of waking up due to a change in the queue.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190129185452.20989-3-chris@chris-wilson.co.uk
2019-01-29 20:00:03 +00:00
Chris Wilson
8547444137 drm/i915: Identify active requests
To allow requests to forgo a common execution timeline, one question we
need to be able to answer is "is this request running?". To track
whether a request has started on HW, we can emit a breadcrumb at the
beginning of the request and check its timeline's HWSP to see if the
breadcrumb has advanced past the start of this request. (This is in
contrast to the global timeline where we need only ask if we are on the
global timeline and if the timeline has advanced past the end of the
previous request.)
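
A minimal sketch of the check, assuming that the breadcrumb emitted at the start of a request writes seqno - 1 into the timeline's HWSP and the one at the end writes seqno. That convention, and the names used, are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Wraparound-safe test of whether seq has reached target. */
static bool seqno_passed(uint32_t seq, uint32_t target)
{
	return (int32_t)(seq - target) >= 0;
}

struct request_sketch {
	const uint32_t *hwsp; /* the timeline's breadcrumb location */
	uint32_t seqno;       /* value written on completion */
};

/* Started: the breadcrumb has advanced past the start of this request. */
static bool request_started(const struct request_sketch *rq)
{
	assert(rq != NULL && rq->hwsp != NULL);
	return seqno_passed(*rq->hwsp, rq->seqno - 1);
}

static bool request_completed(const struct request_sketch *rq)
{
	assert(rq != NULL && rq->hwsp != NULL);
	return seqno_passed(*rq->hwsp, rq->seqno);
}

/* "Is this request running?" in the sense used above. */
static bool request_is_running(const struct request_sketch *rq)
{
	return request_started(rq) && !request_completed(rq);
}
```

A preempted request still reports as running under this test, which is the discrepancy the following paragraph discusses.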

There is still confusion from a preempted request, which has already
started but relinquished the HW to a high priority request. For the
common case, this discrepancy should be negligible. However, for
identification of hung requests, knowing which one was running at the
time of the hang will be much more important.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190129185452.20989-2-chris@chris-wilson.co.uk
2019-01-29 19:59:59 +00:00
Chris Wilson
5013eb8cd6 drm/i915: Track the context's seqno in its own timeline HWSP
Now that we have allocated ourselves a cacheline to store a breadcrumb,
we can emit a write from the GPU into the timeline's HWSP of the
per-context seqno as we complete each request. This drops the mirroring
of the per-engine HWSP and allows each context to operate independently.
We do not need to unwind the per-context timeline, and so requests are
always consistent with the timeline breadcrumb, greatly simplifying the
completion checks as we no longer need to be concerned about the
global_seqno changing mid check.

One complication though is that we have to be wary that the request may
outlive the HWSP and so avoid touching the potentially dangling pointer
after we have retired the fence. We also have to guard our access of the
HWSP with RCU, the release of the obj->mm.pages should already be RCU-safe.

At this point, we are emitting both per-context and global seqno and
still using the single per-engine execution timeline for resolving
interrupts.

v2: s/fake_complete/mark_complete/

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190128181812.22804-5-chris@chris-wilson.co.uk
2019-01-28 19:07:09 +00:00
Chris Wilson
52954edd1f drm/i915: Allocate a status page for each timeline
Allocate a page for use as a status page by a group of timelines, as we
only need a dword of storage for each (rounded up to the cacheline for
safety) we can pack multiple timelines into the same page. Each timeline
will then be able to track its own HW seqno.
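
The packing arithmetic can be sketched as follows, assuming a 4096 byte page and 64 byte cachelines, which gives 64 slots tracked by a 64-bit bitmap. The names are hypothetical, and note that per the v3 entry below this patch itself still allocates one page per timeline, with sharing to follow.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE  4096
#define SKETCH_CACHELINE  64
#define SLOTS_PER_PAGE    (SKETCH_PAGE_SIZE / SKETCH_CACHELINE) /* 64 */

/* One shared status page; a set bit marks a free cacheline slot. */
struct hwsp_page_sketch {
	uint64_t free_bitmap;
};

static void hwsp_page_init(struct hwsp_page_sketch *page)
{
	page->free_bitmap = ~(uint64_t)0;
}

/* Returns the byte offset of a free cacheline in the page, or -1 if full. */
static int hwsp_alloc_slot(struct hwsp_page_sketch *page)
{
	for (int i = 0; i < SLOTS_PER_PAGE; i++) {
		uint64_t bit = (uint64_t)1 << i;

		if (page->free_bitmap & bit) {
			page->free_bitmap &= ~bit;
			return i * SKETCH_CACHELINE;
		}
	}

	return -1;
}

static void hwsp_free_slot(struct hwsp_page_sketch *page, int offset)
{
	assert(offset >= 0 && offset < SKETCH_PAGE_SIZE);
	assert(offset % SKETCH_CACHELINE == 0);
	page->free_bitmap |= (uint64_t)1 << (offset / SKETCH_CACHELINE);
}
```

Each timeline would only use the first dword of its slot; the rest of the cacheline is padding so that neighbouring timelines do not share a line.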

v2: Reuse the common per-engine HWSP for the solitary ringbuffer
timeline, so that we do not have to emit (using per-gen specialised
vfuncs) the breadcrumb into the distinct timeline HWSP and instead can
keep on using the common MI_STORE_DWORD_INDEX. However, to maintain the
sleight-of-hand for the global/per-context seqno switchover, we will
store both temporarily (and so use a custom offset for the shared timeline
HWSP until the switch over).

v3: Keep things simple and allocate a page for each timeline, page
sharing comes next.

v4: I was caught repeating the same MI_STORE_DWORD_IMM over and over
again in selftests.

v5: And caught red handed copying create timeline + check.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190128181812.22804-3-chris@chris-wilson.co.uk
2019-01-28 19:07:02 +00:00