1224 Commits

Author SHA1 Message Date
Chris Wilson
89d19b2b45 drm/i915: Release shortlived maps of longlived objects
Some objects we map once during their construction, and then never
access their mappings again, even if they are kept around for the
duration of the driver. Keeping those pages mapped, often vmapped, is
therefore wasteful and we should release the maps as soon as we no
longer need them.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200708173748.32734-3-chris@chris-wilson.co.uk
2020-07-08 22:05:50 +01:00
Chris Wilson
59c94b9d26 drm/i915/gt: Replace opencoded i915_gem_object_pin_map()
As we have a pin_map interface, that knows how to flush the data to the
device, use it. The only downside is that we keep the kmap around, as
once acquired we keep the mapping cached until the object's backing
store is released.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200708173748.32734-2-chris@chris-wilson.co.uk
2020-07-08 22:05:49 +01:00
Chris Wilson
09137e9454 drm/i915/gem: Unpin idle contexts from kswapd reclaim
We removed retiring requests from the shrinker in order to decouple the
mutexes from reclaim in preparation for unravelling the struct_mutex.
The impact of not retiring is that we are much less agressive in making
global objects available for shrinking, as such objects remain pinned
until they are flushed by a heartbeat pulse following the last retired
request along their timeline. In order to ensure that pulse occurs in
time for memory reclamation, we should kick it from kswapd.

The catch is that we have added some flush_work() into the retirement
phase (to ensure that we reach a global idle in a timely manner), but
these flush_work() are not eligible (i.e do not belong to WQ_MEM_RELCAIM)
for use from inside kswapd. To avoid flushing those workqueues, we teach
the retirer not to do so unless we are actually waiting, and only do the
plain retire from inside the shrinker.

Note that for execlists, we already retire completed contexts as they
are scheduled out, so it should not be keeping global state
unnecessarily pinned. The legacy ringbuffer however...

References: 9e9539800dd4 ("drm/i915: Remove waiting & retiring from shrinker paths")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200708173748.32734-1-chris@chris-wilson.co.uk
2020-07-08 22:05:49 +01:00
Daniele Ceraolo Spurio
a00eda7d89 drm/i915: Move sseu debugfs under gt/
In line with what happened for other gt-related features, move the sseu
debugfs files under gt/.

The sseu_status debugfs has also been kept at the top level as we do
have tests that use it; it will be removed once we teach the tests to
look into the new path.

Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Andi Shyti <andi.shyti@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200708003952.21831-10-daniele.ceraolospurio@intel.com
2020-07-08 21:40:15 +01:00
Venkata Sandeep Dhanalakota
0b6613c6b9 drm/i915/sseu: Move sseu_info under gt_info
SSEUs are a GT capability, so track them under gt_info.

Signed-off-by: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Andi Shyti <andi.shyti@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200708003952.21831-8-daniele.ceraolospurio@intel.com
2020-07-08 21:13:09 +01:00
Daniele Ceraolo Spurio
9b413f011c drm/i915/sseu: Move sseu detection and dump to intel_sseu
Keep all the SSEU code in the relevant file. The code has also been
updated to use intel_gt instead of dev_priv.

Based on an original patch by Sandeep.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Andi Shyti <andi.shyti@intel.com>
Cc: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200708003952.21831-7-daniele.ceraolospurio@intel.com
2020-07-08 21:11:39 +01:00
Daniele Ceraolo Spurio
d0eb686687 drm/i915: Introduce gt_init_mmio
We already call 2 gt-related init_mmio functions in driver_mmio_probe
and a 3rd one will be added by a follow-up patch, so pre-emptively
introduce a gt_init_mmio function to group them.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Andi Shyti <andi.shyti@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200708003952.21831-6-daniele.ceraolospurio@intel.com
2020-07-08 21:07:13 +01:00
Daniele Ceraolo Spurio
792592e72a drm/i915: Move the engine mask to intel_gt_info
Since the engines belong to the GT, move the runtime-updated list of
available engines to the intel_gt struct. The original mask has been
renamed to indicate it contains the maximum engine list that can be
found on a matching device.

In preparation for other info being moved to the gt in follow up patches
(sseu), introduce an intel_gt_info structure to group all gt-related
runtime info.

v2: s/max_engine_mask/platform_engine_mask (tvrtko), fix selftest

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Andi Shyti <andi.shyti@intel.com>
Cc: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> #v1
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200708003952.21831-5-daniele.ceraolospurio@intel.com
2020-07-08 21:07:11 +01:00
Daniele Ceraolo Spurio
f6beb38100 drm/i915: Move engine-related mmio init to engines_init_mmio
All the info we read in intel_device_info_init_mmio are engine-related
and since we already have an engine_init_mmio function we can just
perform the operations from there.

v2: clarify comment about forcewake requirements and pruning (Chris)

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Andi Shyti <andi.shyti@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> #v1
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200708003952.21831-4-daniele.ceraolospurio@intel.com
2020-07-08 21:07:10 +01:00
Daniele Ceraolo Spurio
242613af55 drm/i915: Use the gt in HAS_ENGINE
A follow up patch will move the engine mask under the gt structure,
so get ready for that.

v2: switch the remaining gvt case using dev_priv->gt to gvt->gt (Chris)

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Andi Shyti <andi.shyti@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> #v1
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200708003952.21831-3-daniele.ceraolospurio@intel.com
2020-07-08 21:07:09 +01:00
Michał Winiarski
9459fd5945 drm/i915/huc: Adjust HuC state accordingly after GuC fetch error
Firmware "Selected" state is a transient state - we don't expect to see
it after finishing driver probe, we even have asserts sprinkled over
i915 to confirm whether that's the case.
Unfortunately - we don't handle the transition out of "Selected" in case
of GuC fetch error, leading those asserts to fire when calling
"intel_huc_is_used()".

v2: Add dbg print when moving HuC into error state (Daniele)

Reported-by: Marcin Bernatowicz <marcin.bernatowicz@intel.com>
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Marcin Bernatowicz <marcin.bernatowicz@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200708100843.297655-2-michal@hardline.pl
2020-07-08 13:02:02 +01:00
Michał Winiarski
7f67deeb7f drm/i915/uc: Extract uc usage details into separate debugfs
It has been pointed out that information about HuC usage doesn't belong
in guc_info debugfs. Let's move "supported/used/wanted" matrix to a
separate debugfs file, keeping guc_info strictly about GuC.

Suggested-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Lukasz Fiedorowicz <lukasz.fiedorowicz@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200708100843.297655-1-michal@hardline.pl
2020-07-08 13:02:01 +01:00
Chris Wilson
8567774e87 drm/i915/gt: Pin the rings before marking active
On eviction, we acquire the vm->mutex and then wait on the vma->active.
Therefore when binding and pinning the vma, we must follow the same
sequence, lock/pin the vma then mark it active. Otherwise, we mark the
vma as active, then wait for the vm->mutex, and meanwhile the evictor
holding the mutex waits upon us to complete our activity.

Fixes: 8ccfc20a7d56 ("drm/i915/gt: Mark ring->vma as active while pinned")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <stable@vger.kernel.org> # v5.6+
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200706170138.8993-1-chris@chris-wilson.co.uk
2020-07-07 10:40:38 +01:00
Michał Winiarski
65706203d1 drm/i915: Print caller when tainting for CI
We can add taint from multiple places, printing the caller allows us to
have a better overview of what exactly caused us to do the tainting.

v2: Tweak format and print the device (Chris)
v3: Move things around (Chris)

Suggested-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Petri Latvala <petri.latvala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200706144107.204821-2-michal@hardline.pl
2020-07-06 19:21:07 +01:00
Michał Winiarski
3f04bdce72 drm/i915: Reboot CI if we get wedged during driver init
Getting wedged device on driver init is pretty much unrecoverable.
Since we're running various scenarios that may potentially hit this in
CI (module reload / selftests / hotunplug), and if it happens, it means
that we can't trust any subsequent CI results, we should just apply the
taint to let the CI know that it should reboot (CI checks taint between
test runs).

v2: Comment that WEDGED_ON_INIT is non-recoverable, distinguish
    WEDGED_ON_INIT from WEDGED_ON_FINI (Chris)
v3: Appease checkpatch, fixup search-replace logic expression mindbomb
    in assert (Chris)

Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Petri Latvala <petri.latvala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200706144107.204821-1-michal@hardline.pl
2020-07-06 19:21:07 +01:00
Chris Wilson
12b07256c2 drm/i915: Export ppgtt_bind_vma
Reuse the ppgtt_bind_vma() for aliasing_ppgtt_bind_vma() so we can
reduce some code near-duplication. The catch is that we need to then
pass along the i915_address_space and not rely on vma->vm, as they
differ with the aliasing-ppgtt.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200703102519.26539-1-chris@chris-wilson.co.uk
2020-07-03 15:14:35 +01:00
Michał Winiarski
ba06216d00 drm/i915/guc: Expand guc_info debugfs with more information
The information about platform/driver/user view of GuC firmware usage
currently requires user to either go through kernel log or parse the
combination of "enable_guc" modparam and various debugfs entries.
Let's keep things simple and add a "supported/used/wanted" matrix
(already used internally by i915) in guc_info debugfs.

Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Lukasz Fiedorowicz <lukasz.fiedorowicz@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Lukasz Fiedorowicz <lukasz.fiedorowicz@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200701142752.419878-1-michal@hardline.pl
2020-07-03 10:27:29 +01:00
Chris Wilson
8f125dafb3 drm/i915/gt: Move the heartbeat into the high priority system wq
As we ensure that the heartbeat is reasonably fast (and should not
block), move the heartbeat work into the system_highpri_wq to avoid
having this essential task be blocked behind other slow work, such as
our own retire_work_handler.

References: https://gitlab.freedesktop.org/drm/intel/-/issues/2119
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200702095219.963-2-chris@chris-wilson.co.uk
2020-07-02 12:30:24 +01:00
Chris Wilson
aab4707fdd drm/i915/gt: Harden the heartbeat against a stuck driver
If the driver gets stuck holding the kernel timeline, we cannot issue a
heartbeat and so fail to discover that the driver is indeed stuck and do
not issue a GPU reset (which would hopefully unstick the driver!).
Switch to using a trylock so that we can query if the heartbeat's
timeline mutex is locked elsewhere, and then use the timer to probe if it
remains stuck at the same spot for consecutive heartbeats, indicating
that the mutex has not been released and the engine has not progressed.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200702095219.963-1-chris@chris-wilson.co.uk
2020-07-02 12:30:23 +01:00
Jani Nikula
8a25c4be58 drm/i915/params: switch to device specific parameters
Start using device specific parameters instead of module parameters for
most things. The module parameters become the immutable initial values
for i915 parameters. The device specific parameters in i915->params
start life as a copy of i915_modparams. Any later changes are only
reflected in the debugfs.

The stragglers are:

* i915.force_probe and i915.modeset. Needed before dev_priv is
  available. This is fine because the parameters are read-only and never
  modified.

* i915.verbose_state_checks. Passing dev_priv to I915_STATE_WARN and
  I915_STATE_WARN_ON would result in massive and ugly churn. This is
  handled by not exposing the parameter via debugfs, and leaving the
  parameter writable in sysfs. This may be fixed up in follow-up work.

* i915.inject_probe_failure. Only makes sense in terms of the module,
  not the device. This is handled by not exposing the parameter via
  debugfs.

v2: Fix uc i915 lookup code (Michał Winiarski)

Cc: Juha-Pekka Heikkilä <juha-pekka.heikkila@intel.com>
Cc: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20200618150402.14022-1-jani.nikula@intel.com
2020-06-22 23:26:40 +03:00
Chris Wilson
cf46143fe2 drm/i915/gt: Replace manual kmap_atomic() with pin_map for renderstate
We only emit the renderstate once now during module load, it is no
longer a concern that we are delaying context creation and so do not
need to so eagerly optimise. Since the last time we have looked at the
renderstate, we have a pin_map / flush_map facility that supports simple
single mappings, replacing the open-coded kmap_atomic() and
prepare_write. As it should be a single page, of which we only write a
small portion, we stick to a simple WB [kmap] and use clflush on !llc
platforms, rather than creating a temporary WC vmapping for the single
page.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200619234543.17499-2-chris@chris-wilson.co.uk
2020-06-20 09:57:10 +01:00
Chris Wilson
4fb3395343 drm/i915/gt: Show the culmative runtime as part of the engine info
Since we always enable the busy-stats, the culmulative runtime should be
accurate, and might be useful for diagnosing issues with the engine.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200619191053.9654-1-chris@chris-wilson.co.uk
2020-06-19 21:13:45 +01:00
Chris Wilson
5a15550e56 drm/i915/gt: Initialise rps timestamp
Smatch warns that we may iterate over an empty array of gt->engines[].
One hopes that this is impossible, but nevertheless we can simply
appease smatch by initialising the timestamp to zero before we starting
probing the busy-time from the engines.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200619151938.21740-1-chris@chris-wilson.co.uk
2020-06-19 20:00:18 +01:00
Chris Wilson
810b7ee300 drm/i915/gt: Always report the sample time for busy-stats
Return the monotonic timestamp (ktime_get()) at the time of sampling the
busy-time. This is used in preference to taking ktime_get() separately
before or after the read seqlock as there can be some large variance in
reported timestamps. For selftests trying to ascertain that we are
reporting accurate to within a few microseconds, even a small delay
leads to the test failing.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200617130916.15261-2-chris@chris-wilson.co.uk
2020-06-18 09:26:54 +01:00
Chris Wilson
1b90e4a43b drm/i915/selftests: Enable selftesting of busy-stats
A couple of very simple tests to ensure that the basic properties of
per-engine busyness accounting [0% and 100% busy] are faithful.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200617130916.15261-1-chris@chris-wilson.co.uk
2020-06-18 09:26:53 +01:00
Colin Ian King
0ff0fc97d3 drm/i915/selftests: fix spelling mistake "submited" -> "submitted"
There is a spelling mistake in a pr_err message. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200617085207.167552-1-colin.king@canonical.com
2020-06-17 12:59:31 +01:00
Chris Wilson
dfdfbd3823 drm/i915/selftests: Check preemption rollback of different ring queue depths
Like live_unlite_ring, but instead of simply looking at the impact of
intel_ring_direction(), check that preemption more generally works with
different depths of queued requests in the ring.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200616233733.18050-1-chris@chris-wilson.co.uk
2020-06-17 01:56:10 +01:00
Chris Wilson
ba0cada976 drm/i915/selftests: Use friendly request names for live_timeslice_rewind
Rather than mixing [012] and (A1, A2, B2) for the request indices, use
the enums throughout.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200616185518.11948-2-chris@chris-wilson.co.uk
2020-06-17 00:32:29 +01:00
Chris Wilson
9199c070cd drm/i915/selftests: Exercise far preemption rollbacks
Not too long ago, we realised we had issues with a rolling back a
context so far for a preemption request we considered the resubmit not
to be a rollback but a forward roll. This means we would issue a lite
restore instead of forcing a full restore, continuing execution of the
old requests rather than causing a preemption. Add a selftest to
exercise such a far rollback, such that if we were to skip the full
restore, we would execute invalid instructions in the ring and hang.

Note that while I was able to confirm that this causes us to do a
lite-restore preemption rollback (with commit e36ba817fa96 ("drm/i915/gt:
Incrementally check for rewinding") disabled), it did not trick the HW
into rolling past the old RING_TAIL. Myybe on other HW.

References: e36ba817fa96 ("drm/i915/gt: Incrementally check for rewinding")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200616185518.11948-1-chris@chris-wilson.co.uk
2020-06-17 00:32:29 +01:00
Gustavo A. R. Silva
f29e08800b drm/i915/selftests: Fix inconsistent IS_ERR and PTR_ERR
Fix inconsistent IS_ERR and PTR_ERR in live_timeslice_nopreempt().

The proper pointer to be passed as argument to PTR_ERR() is ce.

This bug was detected with the help of Coccinelle.

Fixes: b72f02d78e4f ("drm/i915/gt: Prevent timeslicing into unpreemptable requests")
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200616145452.GA25291@embeddedor
2020-06-16 20:56:37 +01:00
Chris Wilson
570af07d79 drm/i915/gt: Don't flush the tasklet if not setup
If the tasklet is not being used, don't try and flush it.

Fixes: 594893870044 ("drm/i915/gt: Add a safety submission flush in the heartbeat")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200615183935.17389-1-chris@chris-wilson.co.uk
2020-06-15 21:15:02 +01:00
Rodrigo Vivi
5a7eeb8ba1 drm/i915: Include asm sources for {ivb, hsw}_clear_kernel.c
Alexandre Oliva has recently removed these files from Linux Libre
with concerns that the sources weren't available.

The sources are available on IGT repository, and only open source
tools are used to generate the {ivb,hsw}_clear_kernel.c files.

However, the remaining concern from Alexandre Oliva was around
GPL license and the source not been present when distributing
the code.

So, it looks like 2 alternatives are possible, the use of
linux-firmware.git repository to store the blob or making sure
that the source is also present in our tree. Since the goal
is to limit the i915 firmware to only the micro-controller blobs
let's make sure that we do include the asm sources here in our tree.

Btw, I tried to have some diligence here and make sure that the
asms that these commits are adding are truly the source for
the mentioned files:

igt$ ./scripts/generate_clear_kernel.sh -g ivb \
     -m ~/mesa/build/src/intel/tools/i965_asm
Output file not specified - using default file "ivb-cb_assembled"

Generating gen7 CB Kernel assembled file "ivb_clear_kernel.c"
for i915 driver...

igt$ diff ~/i915/drm-tip/drivers/gpu/drm/i915/gt/ivb_clear_kernel.c \
     ivb_clear_kernel.c

<  * Generated by: IGT Gpu Tools on Fri 21 Feb 2020 05:29:32 AM UTC
>  * Generated by: IGT Gpu Tools on Mon 08 Jun 2020 10:00:54 AM PDT
61c61
< };
> };
\ No newline at end of file

igt$ ./scripts/generate_clear_kernel.sh -g hsw \
     -m ~/mesa/build/src/intel/tools/i965_asm
Output file not specified - using default file "hsw-cb_assembled"

Generating gen7.5 CB Kernel assembled file "hsw_clear_kernel.c"
for i915 driver...

igt$ diff ~/i915/drm-tip/drivers/gpu/drm/i915/gt/hsw_clear_kernel.c \
     hsw_clear_kernel.c
5c5
<  * Generated by: IGT Gpu Tools on Fri 21 Feb 2020 05:30:13 AM UTC
>  * Generated by: IGT Gpu Tools on Mon 08 Jun 2020 10:01:42 AM PDT
61c61
< };
> };
\ No newline at end of file

Used IGT and Mesa master repositories from Fri Jun 5 2020)
IGT: 53e8c878a6fb ("tests/kms_chamelium: Force reprobe after replugging
     the connector")
Mesa: 5d13c7477eb1 ("radv: set keep_statistic_info with
      RADV_DEBUG=shaderstats")
Mesa built with: meson build -D platforms=drm,x11 -D dri-drivers=i965 \
                 -D gallium-drivers=iris -D prefix=/usr \
		 -D libdir=/usr/lib64/ -Dtools=intel \
		 -Dkulkan-drivers=intel && ninja -C build

v2: Header clean-up and include build instructions in a readme (Chris)
    Modified commit message to respect check-patch

Reference: http://www.fsfla.org/pipermail/linux-libre/2020-June/003374.html
Reference: http://www.fsfla.org/pipermail/linux-libre/2020-June/003375.html
Fixes: 47f8253d2b89 ("drm/i915/gen7: Clear all EU/L3 residual contexts")
Cc: <stable@vger.kernel.org> # v5.7+
Cc: Alexandre Oliva <lxoliva@fsfla.org>
Cc: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Jon Bloomfield <jon.bloomfield@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200610201807.191440-1-rodrigo.vivi@intel.com
2020-06-15 11:53:40 -07:00
Chris Wilson
5948938700 drm/i915/gt: Add a safety submission flush in the heartbeat
Just in case everything fails (like for example "missed interrupt
syndrome" on Sandybridge), always flush the submission tasklet from the
heartbeat. This papers over such issues, but will still appear as a
second long glitch, and prevents us from detecting it unless we happen
to be performing a timed test.

v2: We rely on flush_submission() synchronizing with the tasklet on
another CPU.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200615165013.22973-3-chris@chris-wilson.co.uk
2020-06-15 18:32:28 +01:00
Chris Wilson
f2e85e5736 drm/i915/selftests: Dump engine state and trace upon hanging after reset
If the engine dies after a reset, and so we fail to submit a request
but need to be interrupted by the CI runner, dump the engine state.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200615165013.22973-2-chris@chris-wilson.co.uk
2020-06-15 18:31:44 +01:00
Chris Wilson
7102a76043 drm/i915/selftests: Disable preemptive heartbeats over preemption tests
Since the heartbeat may cause a preemption event, disable it over the
preemption suppression tests.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200615165013.22973-1-chris@chris-wilson.co.uk
2020-06-15 18:31:16 +01:00
Chris Wilson
2267f68404 drm/i915/gt: Flush gen3 relocs harder, again
gen3 does not fully flush MI stores to memory on MI_FLUSH, such that a
subsequent read from e.g. the sampler can bypass the store and read the
stale value from memory. This is a serious issue when we are using MI
stores to rewrite the batches for relocation, as it means that the batch
is reading from random user/kernel memory. While it is particularly
sensitive [and detectable] for relocations, reading stale data at any
time is a worry.

Having started with a small number of delaying stores and doubling until
no more incoherency was seen over a few hours (with and without
background memory pressure), 32 was the magic number.

Note that it definitely doesn't fix the issue, merely adds a long delay
between requests, sufficient to mostly hide the problem, enough to raise
the mtbf to several hours. This is merely a stop gap.

v2: Follow more closer with the gen5 w/a and include some
post-invalidate flushes as well.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2018
References: a889580c087a ("drm/i915: Flush GPU relocs harder for gen3")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200612123949.7093-1-chris@chris-wilson.co.uk
2020-06-13 10:30:01 +01:00
Chris Wilson
d4b02a4c61 drm/i915/selftests: Trim execlists runtime
Reduce the smoke depth by trimming the number of contexts, repetitions
and wait times. This is in preparation for a less greedy scheduler that
tries to be fair across contexts, resulting in a great many more context
switches. A thousand context switches may be 50-100ms, causing us to
timeout as the HW is not fast enough to complete the deep smoketests.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200607222108.14401-5-chris@chris-wilson.co.uk
2020-06-13 10:24:26 +01:00
Chris Wilson
3d09677a07 drm/i915/execlists: Lift opportunistic process_csb to before engine lock
Since the process_csb() does not require us to hold the
engine->active.lock, we can move the opportunistic flush before
direction submission to outside of the lock.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200612221113.9129-1-chris@chris-wilson.co.uk
2020-06-13 10:15:54 +01:00
Chris Wilson
2bcefd0d26 drm/i915/gt: Move gen4 GT workarounds from init_clock_gating to workarounds
Rescue the GT workarounds from being buried inside init_clock_gating so
that we remember to apply them after a GT reset, and that they are
included in our verification that the workarounds are applied.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20200611080140.30228-6-chris@chris-wilson.co.uk
2020-06-11 16:11:39 +01:00
Chris Wilson
806a45c083 drm/i915/gt: Move ilk GT workarounds from init_clock_gating to workarounds
Rescue the GT workarounds from being buried inside init_clock_gating so
that we remember to apply them after a GT reset, and that they are
included in our verification that the workarounds are applied.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20200611080140.30228-5-chris@chris-wilson.co.uk
2020-06-11 16:11:39 +01:00
Chris Wilson
c3b93a943f drm/i915/gt: Move snb GT workarounds from init_clock_gating to workarounds
Rescue the GT workarounds from being buried inside init_clock_gating so
that we remember to apply them after a GT reset, and that they are
included in our verification that the workarounds are applied.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20200611080140.30228-4-chris@chris-wilson.co.uk
2020-06-11 16:11:39 +01:00
Chris Wilson
7331c356b6 drm/i915/gt: Move vlv GT workarounds from init_clock_gating to workarounds
Rescue the GT workarounds from being buried inside init_clock_gating so
that we remember to apply them after a GT reset, and that they are
included in our verification that the workarounds are applied.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20200611080140.30228-3-chris@chris-wilson.co.uk
2020-06-11 16:11:39 +01:00
Chris Wilson
19f1f627b3 drm/i915/gt: Move ivb GT workarounds from init_clock_gating to workarounds
Rescue the GT workarounds from being buried inside init_clock_gating so
that we remember to apply them after a GT reset, and that they are
included in our verification that the workarounds are applied.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20200611080140.30228-2-chris@chris-wilson.co.uk
2020-06-11 16:11:39 +01:00
Chris Wilson
f93ec5fb56 drm/i915/gt: Move hsw GT workarounds from init_clock_gating to workarounds
Rescue the GT workarounds from being buried inside init_clock_gating so
that we remember to apply them after a GT reset, and that they are
included in our verification that the workarounds are applied.

v2: Leave HSW_SCRATCH to set an explicit value, not or in our disable
bit.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2011
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20200611093015.11370-1-chris@chris-wilson.co.uk
2020-06-11 16:11:39 +01:00
Chris Wilson
ad2ad80e64 drm/i915/selftests: Remove live_suppress_wait_preempt
With the removal of the internal wait-priority boosting, we can also
remove the selftest to ensure that those waits were being suppressed
from causing preemptions.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200607222108.14401-4-chris@chris-wilson.co.uk
2020-06-11 16:11:28 +01:00
Chris Wilson
3e48e836cf drm/i915/gt: Include context status in debug dumps
This may be useful to identify contexts that are running even though
they are supposed to be closed or banned.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200610154046.22449-1-chris@chris-wilson.co.uk
2020-06-10 16:47:31 +01:00
Chris Wilson
174b976d56 drm/i915/selftests: Teach hang-self to target only itself
We have a test case to exercise resetting an engine while the other
engines are busy, all the TEST_SELF adds on top is that the target
engine also has background activity. In this case it is useful to first
test resetting the engine while there is background activity, as a
separate flag from exercising all others.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200607222108.14401-3-chris@chris-wilson.co.uk
2020-06-10 15:53:52 +01:00
Chris Wilson
e36ba817fa drm/i915/gt: Incrementally check for rewinding
In commit 5ba32c7be81e ("drm/i915/execlists: Always force a context
reload when rewinding RING_TAIL"), we placed the check for rewinding a
context on actually submitting the next request in that context. This
was so that we only had to check once, and could do so with precision
avoiding as many forced restores as possible. For example, to ensure
that we can resubmit the same request a couple of times, we include a
small wa_tail such that on the next submission, the ring->tail will
appear to move forwards when resubmitting the same request. This is very
common as it will happen for every lite-restore to fill the second port
after a context switch.

However, intel_ring_direction() is limited in precision to movements of
upto half the ring size. The consequence being that if we tried to
unwind many requests, we could exceed half the ring and flip the sense
of the direction, so missing a force restore. As no request can be
greater than half the ring (i.e. 2048 bytes in the smallest case), we
can check for rollback incrementally. As we check against the tail that
would be submitted, we do not lose any sensitivity and allow lite
restores for the simple case. We still need to double check upon
submitting the context, to allow for multiple preemptions and
resubmissions.

Fixes: 5ba32c7be81e ("drm/i915/execlists: Always force a context reload when rewinding RING_TAIL")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.4+
Reviewed-by: Bruce Chang <yu.bruce.chang@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200609151723.12971-1-chris@chris-wilson.co.uk
2020-06-10 15:42:47 +01:00
Chris Wilson
94ed47531d drm/i915/selftests: Make the hanging request non-preemptible
In some of our hangtests, we try to reset an active engine while it is
spinning inside the recursive spinner. However, we also try to flood the
engine with requests that preempt the hang, and so should disable the
preemption to be sure that we reset the right request.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200607222108.14401-2-chris@chris-wilson.co.uk
2020-06-08 23:20:48 +01:00
Tvrtko Ursulin
8733a06323 drm/i915: Adjust the sentinel assert to match implementation
Sentinels are supposed to be last requests in the elsp queue, not the
only one, so adjust the assert accordingly.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200607222108.14401-1-chris@chris-wilson.co.uk
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2020-06-08 23:20:24 +01:00