IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
This was done by the following semantic patch:
@@ expression i915; @@
- INTEL_GEN(i915)
+ GRAPHICS_VER(i915)
@@ expression i915; expression E; @@
- INTEL_GEN(i915) >= E
+ GRAPHICS_VER(i915) >= E
@@ expression dev_priv; expression E; @@
- !IS_GEN(dev_priv, E)
+ GRAPHICS_VER(dev_priv) != E
@@ expression dev_priv; expression E; @@
- IS_GEN(dev_priv, E)
+ GRAPHICS_VER(dev_priv) == E
@@
expression dev_priv;
expression from, until;
@@
- IS_GEN_RANGE(dev_priv, from, until)
+ IS_GRAPHICS_VER(dev_priv, from, until)
@def@
expression E;
identifier id =~ "^gen$";
@@
- id = GRAPHICS_VER(E)
+ ver = GRAPHICS_VER(E)
@@
identifier def.id;
@@
- id
+ ver
It also takes care of renaming the variable we assign to GRAPHICS_VER()
so to use "ver" rather than "gen".
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210606045050.103862-2-lucas.demarchi@intel.com
The functions can be called both in _rcu context as well
as while holding the lock.
v2: add some kerneldoc as suggested by Daniel
v3: fix indentation
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210602111714.212426-7-christian.koenig@amd.com
Reference needs to be taken before arming the timer. Luckily, given the
default timer period of 20s, the potential to hit the race is extremely
unlikely.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: 9b4d0598ee94 ("drm/i915: Request watchdog infrastructure")
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210326105759.2387104-1-tvrtko.ursulin@linux.intel.com
(cherry picked from commit f7c379779161d364eb30338529490eac7dc377b7)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reference needs to be taken before arming the timer. Luckily, given the
default timer period of 20s, the potential to hit the race is extremely
unlikely.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: 9b4d0598ee94 ("drm/i915: Request watchdog infrastructure")
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210326105759.2387104-1-tvrtko.ursulin@linux.intel.com
Driver Changes:
- Prepare for local/device memory support on DG1 by starting
to use it for kernel internal allocations: context, ring
and engine scratch (Matt A, CQ, Abdiel, Imre)
- Sandybridge fix to avoid hard hang on ring resume (Chris)
- Limit imported dma-buf size to int32 (Matt A)
- Double check heartbeat timeout before resetting (Chris)
- Use new tasklet API for execution list (Emil)
- Fix SPDX checkpats warnings (Chris)
- Fixes for various checkpatch warnings (Chris)
- Selftest improvements (Chris)
- Move the defer_request waiter active assertion to correct spot (Chris)
- Make local-memory probing a GT operation (Matt, Tvrtko)
- Protect against request freeing during cancellation on wedging (Chris)
- Retire unexpected starting state error dumping (Chris)
- Distinction of memory regions in debugging (Zbigniew)
- Always flush the submission queue on checking for idle (Chris)
- Consolidate 2big error check to helper (Matt)
- Decrease number of subplatform bits (Tvrtko)
- Remove unused internal request priority levels (Chris)
- Document the unused internal header bits in buddy allocator (Matt)
- Cleanup the region class/instance encoding (Matt)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/YGxksaZGXHnFxlwg@jlahtine-mobl.ger.corp.intel.com
Prepares the plumbing for setting request/fence expiration time. All code
is put in place but is never activated due yet missing ability to actually
configure the timer.
Outline of the basic operation:
A timer is started when request is ready for execution. If the request
completes (retires) before the timer fires, timer is cancelled and nothing
further happens.
If the timer fires request is added to a lockless list and worker queued.
Purpose of this is twofold: a) It allows request cancellation from a more
friendly context and b) coalesces multiple expirations into a single event
of consuming the list.
Worker locklessly consumes the list of expired requests and cancels them
all using previous added i915_request_cancel().
Associated timeout value is stored in rq->context.watchdog.timeout_us.
v2:
* Log expiration.
v3:
* Include more information about user timeline in the log message.
v4:
* Remove obsolete comment and fix formatting. (Matt)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210324121335.2307063-6-tvrtko.ursulin@linux.intel.com
Currently, we cancel outstanding requests within a context when the
context is closed. We may also want to cancel individual requests using
the same graceful preemption mechanism.
v2 (Tvrtko):
* Cancel waiters carefully considering no timeline lock and RCU.
* Fixed selftests.
v3 (Tvrtko):
* Remove error propagation to waiters for now.
v4 (Tvrtko):
* Rebase for extracted i915_request_active_engine. (Matt)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
[danvet: Resolve conflict because intel_engine_flush_scheduler is
still called intel_engine_flush_submission]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210324121335.2307063-3-tvrtko.ursulin@linux.intel.com
Move active engine lookup to exported i915_request_active_engine.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
[danvet: Slight rebase, engine->sched.lock is still called
engine->active.lock.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210324121335.2307063-2-tvrtko.ursulin@linux.intel.com
As soon as we mark a request as completed, it may be retired. So when
cancelling a request and marking it complete, make sure we first keep a
reference to the request.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210201085715.27435-4-chris@chris-wilson.co.uk
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Instead of sharing pages with breadcrumbs, give each timeline a
single page. This allows unrelated timelines not to share locks
any more during command submission.
As an additional benefit, seqno wraparound no longer requires
i915_vma_pin, which means we no longer need to worry about a
potential -EDEADLK at a point where we are ready to submit.
Changes since v1:
- Fix erroneous i915_vma_acquire that should be a i915_vma_release (ickle).
- Extra check for completion in intel_read_hwsp().
Changes since v2:
- Fix inconsistent indent in hwsp_alloc() (kbuild)
- memset entire cacheline to 0.
Changes since v3:
- Do same in intel_timeline_reset_seqno(), and clflush for good measure.
Changes since v4:
- Use refcounting on timeline, instead of relying on i915_active.
- Fix waiting on kernel requests.
Changes since v5:
- Bump amount of slots to maximum (256), for best wraparounds.
- Add hwsp_offset to i915_request to fix potential wraparound hang.
- Ensure timeline wrap test works with the changes.
- Assign hwsp in intel_timeline_read_hwsp() within the rcu lock to
fix a hang.
Changes since v6:
- Rename i915_request_active_offset to i915_request_active_seqno(),
and elaborate the function. (tvrtko)
Changes since v7:
- Move hunk to where it belongs. (jekstrand)
- Replace CACHELINE_BYTES with TIMELINE_SEQNO_BYTES. (jekstrand)
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> #v1
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210323155059.628690-2-maarten.lankhorst@linux.intel.com
Originally, we used the signal->lock as a means of following the
previous link in its timeline and peeking at the previous fence.
However, we have replaced the explicit serialisation with a series of
very careful probes that anticipate the links being deleted and the
fences recycled before we are able to acquire a strong reference to it.
We do not need the signal->lock crutch anymore, nor want the contention.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210114135612.13210-2-chris@chris-wilson.co.uk
When we know that we are inside the timeline mutex, or inside the
submission flow (under active.lock or the holder's rcu lock), we know
that the rq->hwsp is stable and we can use the simpler direct version.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210114135612.13210-1-chris@chris-wilson.co.uk
UAPI Changes:
- Deprecate I915_PMU_LAST and optimize state tracking (Tvrtko)
Avoid relying on last item ABI marker in i915_drm.h, add a
comment to mark as deprecated.
Cross-subsystem Changes:
Core Changes:
Driver Changes:
- Restore clear residuals security mitigations for Ivybridge and
Baytrail (Chris)
- Close#1858: Allow sysadmin to choose applied GPU security mitigations
through i915.mitigations=... similar to CPU (Chris)
- Fix for #2024: GPU hangs on HSW GT1 (Chris)
- Fix for #2707: Driver hang when editing UVs in Blender (Chris, Ville)
- Fix for #2797: False positive GuC loading error message (Chris)
- Fix for #2859: Missing GuC firmware for older Cometlakes (Chris)
- Lessen probability of GPU hang due to DMAR faults [reason 7,
next page table ptr is invalid] on Tigerlake (Chris)
- Fix REVID macros for TGL to fetch correct stepping (Aditya)
- Limit frequency drop to RPe on parking (Chris, Edward)
- Limit W/A 1406941453 to TGL, RKL and DG1 (Swathi)
- Make W/A 22010271021 permanent on DG1 (Lucas)
- Implement W/A 16011163337 to prevent a HS/DS hang on DG1 (Swathi)
- Only disable preemption on gen8 render engines (Chris)
- Disable arbitration around Braswell's PDP updates (Chris)
- Disable arbitration on no-preempt requests (Chris)
- Check for arbitration after writing start seqno before busywaiting (Chris)
- Retain default context state across shrinking (Venkata, CQ)
- Fix mismatch between misplaced vma check and vma insert for 32-bit
addressing userspaces (Chris, CQ)
- Propagate error for vmap() failure instead kernel NULL deref (Chris)
- Propagate error from cancelled submit due to context closure
immediately (Chris)
- Fix RCU race on HWSP tracking per request (Chris)
- Clear CMD parser shadow and GPU reloc batches (Matt A)
- Populate logical context during first pin (Maarten)
- Optimistically prune dma-resv from the shrinker (Chris)
- Fix for virtual engine ownership race (Chris)
- Remove timeslice suppression to restore fairness for virtual engines (Chris)
- Rearrange IVB/HSW workarounds properly between GT and engine (Chris)
- Taint the reset mutex with the shrinker (Chris)
- Replace direct submit with direct call to tasklet (Chris)
- Multiple corrections to virtual engine dequeue and breadcrumbs code (Chris)
- Avoid wakeref from potentially hard IRQ context in PMU (Tvrtko)
- Use raw clock for RC6 time estimation in PMU (Tvrtko)
- Differentiate OOM failures from invalid map types (Chris)
- Fix Gen9 to have 64 MOCS entries similar to Gen11 (Chris)
- Ignore repeated attempts to suspend request flow across reset (Chris)
- Remove livelock from "do_idle_maps" VT-d W/A (Chris)
- Cancel the preemption timeout early in case engine reset fails (Chris)
- Code flow optimization in the scheduling code (Chris)
- Clear the execlists timers upon reset (Chris)
- Drain the breadcrumbs just once (Chris, Matt A)
- Track the overall GT awake/busy time (Chris)
- Tweak submission tasklet flushing to avoid starvation (Chris)
- Track timelines created using the HWSP to restore on resume (Chris)
- Use cmpxchg64 for 32b compatilibity for active tracking (Chris)
- Prefer recycling an idle GGTT fence to avoid GPU wait (Chris)
- Restructure GT code organization for clearer split between GuC
and execlists (Chris, Daniele, John, Matt A)
- Remove GuC code that will remain unused by new interfaces (Matt B)
- Restructure the CS timestamp clocks code to local to GT (Chris)
- Fix error return paths in perf code (Zhang)
- Replace idr_init() by idr_init_base() in perf (Deepak)
- Fix shmem_pin_map error path (Colin)
- Drop redundant free_work worker for GEM contexts (Chris, Mika)
- Increase readability and understandability of intel_workarounds.c (Lucas)
- Defer enabling the breadcrumb interrupt to after submission (Chris)
- Deal with buddy alloc block sizes beyond 4G (Venkata, Chris)
- Encode fence specific waitqueue behaviour into the wait.flags (Chris)
- Don't cancel the breadcrumb interrupt shadow too early (Chris)
- Cancel submitted requests upon context reset (Chris)
- Use correct locks in GuC code (Tvrtko)
- Prevent use of engine->wa_ctx after error (Chris, Matt R)
- Fix build warning on 32-bit (Arnd)
- Avoid memory leak if platform would have more than 16 W/A (Tvrtko)
- Avoid unnecessary #if CONFIG_PM in PMU code (Chris, Tvrtko)
- Improve debugging output (Chris, Tvrtko, Matt R)
- Make file local variables static (Jani)
- Avoid uint*_t types in i915 (Jani)
- Selftest improvements (Chris, Matt A, Dan)
- Documentation fixes (Chris, Jose)
Signed-off-by: Dave Airlie <airlied@redhat.com>
# Conflicts:
# drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
# drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
# drivers/gpu/drm/i915/gt/intel_lrc.c
# drivers/gpu/drm/i915/gvt/mmio_context.h
# drivers/gpu/drm/i915/i915_drv.h
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210114152232.GA21588@jlahtine-mobl.ger.corp.intel.com
When wedging the device, we cancel all outstanding requests and mark
them as EIO. Rather than duplicate the small function to do so between
each submission backend, export one.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Andi Shyti <andi.shyti@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210109163455.28466-3-chris@chris-wilson.co.uk
Since we use a flag within i915_request.flags to indicate when we have
boosted the request (so that we only apply the boost) once, this can be
used as the serialisation with i915_request_retire() to avoid having to
explicitly take the i915_request.lock which is more heavily contended.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201231093149.19086-1-chris@chris-wilson.co.uk
We only need to evaluate the current status of the context when it is
scheduled in, we will force a reschedule when the context is closed
propagating the change to inflight contexts.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201231093946.11649-1-chris@chris-wilson.co.uk
Since we process schedule-in of a context after submitting the request,
if we decide to reset the context at that time, we also have to cancel
the requets we have marked for submission.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201230220028.17089-1-chris@chris-wilson.co.uk
Rather than having special case code for opportunistically calling
process_csb() and performing a direct submit while holding the engine
spinlock for submitting the request, simply call the tasklet directly.
This allows us to retain the direct submission path, including the CS
draining to allow fast/immediate submissions, without requiring any
duplicated code paths, and most importantly greatly simplifying the
control flow by removing reentrancy. This will enable us to close a few
races in the virtual engines in the next few patches.
The trickiest part here is to ensure that paired operations (such as
schedule_in/schedule_out) remain under consistent locking domains,
e.g. when pulled outside of the engine->active.lock
v2: Use bh kicking, see commit 3c53776e29f8 ("Mark HI and TASKLET
softirq synchronous").
v3: Update engine-reset to be tasklet aware
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201224135544.1713-1-chris@chris-wilson.co.uk
tasklet_kill() ensures that we _yield_ the processor until a remote
tasklet is completed. However, this leads to a starvation condition as
being at the bottom of the scheduler's runqueue means that anything else
is able to run, including all hogs keeping the tasklet occupied.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201220134858.10510-1-chris@chris-wilson.co.uk
Since the introduction of preempt-to-busy, requests can complete in the
background, even while they are not on the engine->active.requests list.
As such, the engine->active.request list itself is not in strict
retirement order, and we have to scan the entire list while unwinding to
not miss any. However, if the request is completed we currently leave it
on the list [until retirement], but we could just as simply remove it
and stop treating it as active. We would only have to then traverse it
once while unwinding in quick succession.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201126140407.31952-1-chris@chris-wilson.co.uk
Get rid of the __call_single_node union and clean up the API a little
to avoid external code relying on the structure layout as much.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
We plan to expand upon the number of available statuses for when we
pretty-print the requests along the timelines, and so need a new set of
flags. We have settled upon:
Unready [U]
- initial status after being submitted, the request is not
ready for execution as it is waiting for external fences
Ready [R]
- all fences the request was waiting on have been signaled,
and the request is now ready for execution and will be
in a backend queue
- a ready request may still need to wait on semaphores
[internal fences]
Ready/virtual [V]
- same as ready, but queued over multiple backends
Executing [E]
- the request has been transferred from the backend queue and
submitted for execution on HW
- a completed request may still be regarded as executing, its
status may not be updated until it is retired and removed
from the lists
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201119165616.10834-3-chris@chris-wilson.co.uk
Extract i915_request_show for reuse in other request chain pretty
printers.
For a bonus point, quietly change the seqno format from %llx to %lld to
match everywhere else.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201119165616.10834-2-chris@chris-wilson.co.uk
We only allow persistent requests to remain on the GPU past the closure
of their containing context (and process) so long as they are continuously
checked for hangs or allow other requests to preempt them, as we need to
ensure forward progress of the system. If we allow persistent contexts
to remain on the system after the the hangcheck mechanism is disabled,
the system may grind to a halt. On disabling the mechanism, we sent a
pulse along the engine to remove all executing contexts from the engine
which would check for hung contexts -- but we did not prevent those
contexts from being resubmitted if they survived the final hangcheck.
Fixes: 9a40bddd47ca ("drm/i915/gt: Expose heartbeat interval via sysfs")
Testcase: igt/gem_ctx_persistence/heartbeat-stop
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.7+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200928221510.26044-1-chris@chris-wilson.co.uk
(cherry picked from commit 7a991cd3e3da9a56d5616b62d425db000a3242f2)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
The reordering and rebasing of commit 2e4c6c1a9db5 ("drm/i915: Remove
i915_request.lock requirement for execution callbacks") caused it to
revert an earlier correction. Let us restore commit 99f0a640d464
("drm/i915: Remove requirement for holding i915_request.lock for
breadcrumbs")
Fixes: 2e4c6c1a9db5 ("drm/i915: Remove i915_request.lock requirement for execution callbacks")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200925101107.27869-1-chris@chris-wilson.co.uk
(cherry picked from commit 35faeb7de9ef83da510a048f2016061f1e31d5fc)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
To implement preempt-to-busy (and so efficient timeslicing and best utilization
of the hardware submission ports) we let the GPU run asynchronously in respect
to the ELSP submission queue. This created challenges in keeping and accessing
the driver state mirroring the asynchronous GPU execution.
Previous fix 1d9221e9d395 ("drm/i915: Skip signaling a signaled request")
however did not correctly serialize request retirement with the execution
callbacks.
We were using the i915_request.lock to serialise adding an execution callback
with __i915_request_submit. However, if we use an atomic llist_add to serialise
multiple waiters and then check to see if the request is already executing, we
can remove the irq-spinlock and fix serialization between retirement and
execution callbacks in one go.
v2: Avoid using the irq_work when outside of the irq-spinlocks, where we
can execute the callbacks immediately.
v3: Pay close attention to the order of setting ACTIVE on retirement, we
need to ensure the request is signaled and breadcrumbs detached before
we finish removing the request from the engine.
v4: Expanded commit message.
Fixes: 1d9221e9d395 ("drm/i915: Skip signaling a signaled request")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200716142207.13003-2-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[Joonas: Rebased and reordered into drm-intel-gt-next branch]
[Joonas: Added expanded commit message from Tvrtko and Chris]
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
To implement preempt-to-busy (and so efficient timeslicing and best utilization
of the hardware submission ports) we let the GPU run asynchronously in respect
to the ELSP submission queue. This created challenges in keeping and accessing
the driver state mirroring the asynchronous GPU execution.
The latest occurence of this was spotted by KCSAN:
[ 1413.563200] BUG: KCSAN: data-race in __await_execution+0x217/0x370 [i915]
[ 1413.563221]
[ 1413.563236] race at unknown origin, with read to 0xffff88885bb6c478 of 8 bytes by task 9654 on cpu 1:
[ 1413.563548] __await_execution+0x217/0x370 [i915]
[ 1413.563891] i915_request_await_dma_fence+0x4eb/0x6a0 [i915]
[ 1413.564235] i915_request_await_object+0x421/0x490 [i915]
[ 1413.564577] i915_gem_do_execbuffer+0x29b7/0x3c40 [i915]
[ 1413.564967] i915_gem_execbuffer2_ioctl+0x22f/0x5c0 [i915]
[ 1413.564998] drm_ioctl_kernel+0x156/0x1b0
[ 1413.565022] drm_ioctl+0x2ff/0x480
[ 1413.565046] __x64_sys_ioctl+0x87/0xd0
[ 1413.565069] do_syscall_64+0x4d/0x80
[ 1413.565094] entry_SYSCALL_64_after_hwframe+0x44/0xa9
To complicate matters, we have to both avoid the read tearing of *active and
avoid any write tearing as perform the pending[] -> inflight[] promotion of the
execlists.
This is because we cannot rely on the memcpy doing u64 aligned copies on all
kernels/platforms and so we opt to open-code it with explicit WRITE_ONCE
annotations to satisfy KCSAN.
v2: When in doubt, write the same comment again.
v3: Expanded commit message.
Fixes: b55230e5e800 ("drm/i915: Check for awaits on still currently executing requests")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200716142207.13003-1-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[Joonas: Rebased and reordered into drm-intel-gt-next branch]
[Joonas: Added expanded commit message from Tvrtko and Chris]
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Currently we hold no actual reference to the request nor context while
they are attached to a breadcrumb. To avoid freeing the request/context
too early, we serialise with cancel-breadcrumbs by taking the irq
spinlock in i915_request_retire(). The alternative is to take a
reference for a new breadcrumb and release it upon signaling; removing
the more frequently hit contention point in i915_request_retire().
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200801160225.6814-2-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[Joonas: Rebased and reordered into drm-intel-gt-next branch]
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
On the virtual engines, we only use the intel_breadcrumbs for tracking
signaling of stale breadcrumbs from the irq_workers. They do not have
any associated interrupt handling, active requests are passed to a
physical engine and associated breadcrumb interrupt handler. This causes
issues for us as we need to ensure that we do not actually try and
enable interrupts and the powermanagement required for them on the
virtual engine, as they will never be disabled. Instead, let's
specify the physical engine used for interrupt handler on a particular
breadcrumb.
v2: Drop b->irq_armed = true mocking for no interrupt HW
Fixes: 4fe6abb8f513 ("drm/i915/gt: Ignore irq enabling on the virtual engines")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200731154834.8378-4-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
After staring at the breadcrumb enabling/cancellation and coming to the
conclusion that the cause of the mysterious stale breadcrumbs must the
act of submitting a completed requests, we can then redirect those
completed requests onto a dedicated signaled_list at the time of
construction and so eliminate intel_engine_transfer_stale_breadcrumbs().
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200731154834.8378-2-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Since the breadcrumb enabling/cancelling itself is serialised by the
breadcrumbs.irq_lock, with a bit of care we can remove the outer
serialisation with i915_request.lock for concurrent
dma_fence_enable_signaling(). This has the important side-effect of
eliminating the nested i915_request.lock within request submission.
The challenge in serialisation is around the unsubmission where we take
an active request that wants a breadcrumb on the signaling engine and
put it to sleep. We do not want a concurrent
dma_fence_enable_signaling() to attach a breadcrumb as we unsubmit, so
we must mark the request as no longer active before serialising with the
concurrent enable-signaling.
On retire, we serialise with the concurrent enable-signaling, but
instead of clearing ACTIVE, we mark it as SIGNALED.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200731154834.8378-1-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[Joonas: Rebased and reordered into drm-intel-gt-next branch]
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
I915_GEM_THROTTLE dates back to the time before contexts where there was
just a single engine, and therefore a single timeline and request list
globally. That request list was in execution/retirement order, and so
walking it to find a particular aged request made sense and could be
split per file.
That is no more. We now have many timelines with a file, as many as the
user wants to construct (essentially per-engine, per-context). Each of
those run independently and so make the single list futile. Remove the
disordered list, and iterate over all the timelines to find a request to
wait on in each to satisfy the criteria that the CPU is no more than 20ms
ahead of its oldest request.
It should go without saying that the I915_GEM_THROTTLE ioctl is no
longer used as the primary means of throttling, so it makes sense to push
the complication into the ioctl where it only impacts upon its few
irregular users, rather than the execbuf/retire where everybody has to
pay the cost. Fortunately, the few users do not create vast amount of
contexts, so the loops over contexts/engines should be concise.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200728152010.30701-1-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
We include a tasklet flush before waiting on a request as a precaution
against the HW being lax in event signaling. We now have a precautionary
flush in the engine's heartbeat and so do not need to be quite so
zealous on every request wait. If we focus on the request, the only
tasklet flush that matters is if there is a delay in submitting this
request to HW, so if the request is not ready to be executed, no
advantage in reducing this wait can be gained by running the tasklet.
And there is little point in doing busy work for no result.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200715115147.11866-10-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Currently, we use i915_request_completed() directly in
i915_request_wait() and follow up with a manual invocation of
dma_fence_signal(). This appears to cause a large number of contentions
on i915_request.lock as when the process is woken up after the fence is
signaled by an interrupt, we will then try and call dma_fence_signal()
ourselves while the signaler is still holding the lock.
dma_fence_is_signaled() has the benefit of checking the
DMA_FENCE_FLAG_SIGNALED_BIT prior to calling dma_fence_signal() and so
avoids most of that contention.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200716100754.5670-1-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
It's been a while since gen6_rps_boost() [that only worked on gen6+] was
replaced by intel_rps_boost() that understood itself when rps was
active. Since the intel_rps_boost() is gen-agnostic, just call it.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200728152219.1387-1-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Preempt-to-busy introduces various fascinating complications in that the
requests may complete as we are unsubmitting them from HW. As they may
then signal after unsubmission, we may find ourselves having to cleanup
the signaling request from within the signaling callback. This causes us
to recurse onto the same i915_request.lock.
However, if the request is already signaled (as it will be before we
enter the signal callbacks), we know we can skip the signaling of that
request during submission, neatly evading the spinlock recursion.
unsubmit(ve.rq0) # timeslice expiration or other preemption
-> virtual_submit_request(ve.rq0)
dma_fence_signal(ve.rq0) # request completed before preemption ack
-> submit_notify(ve.rq1)
-> virtual_submit_request(ve.rq1) # sees that we have completed ve.rq0
-> __i915_request_submit(ve.rq0)
[ 264.210142] BUG: spinlock recursion on CPU#2, sample_multi_tr/2093
[ 264.210150] lock: 0xffff9efd6ac55080, .magic: dead4ead, .owner: sample_multi_tr/2093, .owner_cpu: 2
[ 264.210155] CPU: 2 PID: 2093 Comm: sample_multi_tr Tainted: G U
[ 264.210158] Hardware name: Intel Corporation CoffeeLake Client Platform/CoffeeLake S UDIMM RVP, BIOS CNLSFWR1.R00.X212.B01.1909060036 09/06/2019
[ 264.210160] Call Trace:
[ 264.210167] dump_stack+0x98/0xda
[ 264.210174] spin_dump.cold+0x24/0x3c
[ 264.210178] do_raw_spin_lock+0x9a/0xd0
[ 264.210184] _raw_spin_lock_nested+0x6a/0x70
[ 264.210314] __i915_request_submit+0x10a/0x3c0 [i915]
[ 264.210415] virtual_submit_request+0x9b/0x380 [i915]
[ 264.210516] submit_notify+0xaf/0x14c [i915]
[ 264.210602] __i915_sw_fence_complete+0x8a/0x230 [i915]
[ 264.210692] i915_sw_fence_complete+0x2d/0x40 [i915]
[ 264.210762] __dma_i915_sw_fence_wake+0x19/0x30 [i915]
[ 264.210767] dma_fence_signal_locked+0xb1/0x1c0
[ 264.210772] dma_fence_signal+0x29/0x50
[ 264.210871] i915_request_wait+0x5cb/0x830 [i915]
[ 264.210876] ? dma_resv_get_fences_rcu+0x294/0x5d0
[ 264.210974] i915_gem_object_wait_fence+0x2f/0x40 [i915]
[ 264.211084] i915_gem_object_wait+0xce/0x400 [i915]
[ 264.211178] i915_gem_wait_ioctl+0xff/0x290 [i915]
Fixes: 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy")
References: 6d06779e8672 ("drm/i915: Load balancing across a virtual engine")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: "Nayana, Venkata Ramana" <venkata.ramana.nayana@intel.com>
Cc: <stable@vger.kernel.org> # v5.4+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200713141636.29326-1-chris@chris-wilson.co.uk
We infrequently use the direct i915 backpointer from the i915_request,
so do we really need to waste the space in the struct for it? 8 bytes
from the most frequently allocated struct vs an 3 bytes and pointer
chasing in using rq->engine->i915?
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200602220953.21178-1-chris@chris-wilson.co.uk
With the advent of preempt-to-busy, a request may still be on the GPU as
we unwind. And in the case of a unpreemptible [due to HW] request, that
request will remain indefinitely on the GPU even though we have
returned it back to our submission queue, and cleared the active bit.
We only run the execution callbacks on transferring the request from our
submission queue to the execution queue, but if this is a bonded request
that the HW is waiting for, we will not submit it (as we wait for a
fresh execution) even though it is still being executed.
As we know that there are always preemption points between requests, we
know that only the currently executing request may be still active even
though we have cleared the flag. However, we do not precisely know which
request is in ELSP[0] due to a delay in processing events, and
furthermore we only store the last request in a context in our state
tracker.
Fixes: 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy")
Testcase: igt/gem_exec_balancer/bonded-dual
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200529143926.3245-1-chris@chris-wilson.co.uk
When we push a virtual request onto the HW, we update the rq->engine to
point to the physical engine. A request that is then submitted by the
user that waits upon the virtual engine, but along the physical engine
in use, will then see that it is due to be submitted to the same engine
and take a shortcut (and be queued without waiting for the completion
fence). However, the virtual request may be preempted (either by higher
priority users, or by timeslicing) and removed from the physical engine
to be migrated over to one of its siblings. The dependent normal request
however is oblivious to the removal of the virtual request and remains
queued to execute on HW, believing that once it reaches the head of its
queue all of its predecessors will have completed executing!
v2: Beware restriction of signal->execution_mask prior to submission.
Fixes: 6d06779e8672 ("drm/i915: Load balancing across a virtual engine")
Testcase: igt/gem_exec_balancer/sliced
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <stable@vger.kernel.org> # v5.3+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200526090753.11329-2-chris@chris-wilson.co.uk
Reduce the irq_work llist for attaching the callbacks to the signal for
both smaller structs (two fewer pointers!) and simpler [debug] code:
Function old new delta
irq_execute_cb 35 34 -1
__igt_breadcrumbs_smoketest 1684 1682 -2
i915_request_retire 2003 1996 -7
__i915_request_create 1047 1040 -7
__notify_execute_cb 135 126 -9
__i915_request_ctor 188 178 -10
__await_execution.part.constprop 451 440 -11
igt_wait_request 924 714 -210
One minor artifact is that the order of cb exection is reversed. No
current use cases are affected by that change.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200526112051.10229-1-chris@chris-wilson.co.uk
In order to be valid to dereference during the i915_fence_release, after
retiring the fence and releasing its refererences, we assume that
rq->engine can only be a real engine (that stay intact until the device
is shutdown after all fences have been flushed). However, due to a quirk
of preempt-to-busy, we may retire a request that still belongs to a
virtual engine and so eventually free it with rq->engine being invalid.
To avoid dereferencing that invalid engine, we look at the
execution_mask which if it indicates it may be executed on more than one
engine, we know it originated on a virtual engine and may still be on
one.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1906
Fixes: 43acd6516ca9 ("drm/i915: Keep a per-engine request pool")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200521140617.30015-2-chris@chris-wilson.co.uk
Now that we have fast timeslicing on semaphores, we no longer need to
prioritise none-semaphore work as we will yield any work blocked on a
semaphore to the next in the queue. Previously with no timeslicing,
blocking on the semaphore caused extremely bad scheduling with multiple
clients utilising multiple rings. Now, there is no impact and we can
remove the complication.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200513173504.28322-1-chris@chris-wilson.co.uk