IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Some objects we map once during their construction, and then never
access their mappings again, even if they are kept around for the
duration of the driver. Keeping those pages mapped, often vmapped, is
therefore wasteful and we should release the maps as soon as we no
longer need them.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200708173748.32734-3-chris@chris-wilson.co.uk
As we have a pin_map interface, that knows how to flush the data to the
device, use it. The only downside is that we keep the kmap around, as
once acquired we keep the mapping cached until the object's backing
store is released.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200708173748.32734-2-chris@chris-wilson.co.uk
We removed retiring requests from the shrinker in order to decouple the
mutexes from reclaim in preparation for unravelling the struct_mutex.
The impact of not retiring is that we are much less agressive in making
global objects available for shrinking, as such objects remain pinned
until they are flushed by a heartbeat pulse following the last retired
request along their timeline. In order to ensure that pulse occurs in
time for memory reclamation, we should kick it from kswapd.
The catch is that we have added some flush_work() into the retirement
phase (to ensure that we reach a global idle in a timely manner), but
these flush_work() are not eligible (i.e do not belong to WQ_MEM_RELCAIM)
for use from inside kswapd. To avoid flushing those workqueues, we teach
the retirer not to do so unless we are actually waiting, and only do the
plain retire from inside the shrinker.
Note that for execlists, we already retire completed contexts as they
are scheduled out, so it should not be keeping global state
unnecessarily pinned. The legacy ringbuffer however...
References: 9e9539800dd4 ("drm/i915: Remove waiting & retiring from shrinker paths")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200708173748.32734-1-chris@chris-wilson.co.uk
In line with what happened for other gt-related features, move the sseu
debugfs files under gt/.
The sseu_status debugfs has also been kept at the top level as we do
have tests that use it; it will be removed once we teach the tests to
look into the new path.
Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Andi Shyti <andi.shyti@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200708003952.21831-10-daniele.ceraolospurio@intel.com
SSEUs are a GT capability, so track them under gt_info.
Signed-off-by: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Andi Shyti <andi.shyti@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200708003952.21831-8-daniele.ceraolospurio@intel.com
Keep all the SSEU code in the relevant file. The code has also been
updated to use intel_gt instead of dev_priv.
Based on an original patch by Sandeep.
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Andi Shyti <andi.shyti@intel.com>
Cc: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200708003952.21831-7-daniele.ceraolospurio@intel.com
We already call 2 gt-related init_mmio functions in driver_mmio_probe
and a 3rd one will be added by a follow-up patch, so pre-emptively
introduce a gt_init_mmio function to group them.
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Andi Shyti <andi.shyti@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200708003952.21831-6-daniele.ceraolospurio@intel.com
Since the engines belong to the GT, move the runtime-updated list of
available engines to the intel_gt struct. The original mask has been
renamed to indicate it contains the maximum engine list that can be
found on a matching device.
In preparation for other info being moved to the gt in follow up patches
(sseu), introduce an intel_gt_info structure to group all gt-related
runtime info.
v2: s/max_engine_mask/platform_engine_mask (tvrtko), fix selftest
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Andi Shyti <andi.shyti@intel.com>
Cc: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> #v1
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200708003952.21831-5-daniele.ceraolospurio@intel.com
All the info we read in intel_device_info_init_mmio are engine-related
and since we already have an engine_init_mmio function we can just
perform the operations from there.
v2: clarify comment about forcewake requirements and pruning (Chris)
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Andi Shyti <andi.shyti@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> #v1
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200708003952.21831-4-daniele.ceraolospurio@intel.com
A follow up patch will move the engine mask under the gt structure,
so get ready for that.
v2: switch the remaining gvt case using dev_priv->gt to gvt->gt (Chris)
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Andi Shyti <andi.shyti@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> #v1
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200708003952.21831-3-daniele.ceraolospurio@intel.com
Firmware "Selected" state is a transient state - we don't expect to see
it after finishing driver probe, we even have asserts sprinkled over
i915 to confirm whether that's the case.
Unfortunately - we don't handle the transition out of "Selected" in case
of GuC fetch error, leading those asserts to fire when calling
"intel_huc_is_used()".
v2: Add dbg print when moving HuC into error state (Daniele)
Reported-by: Marcin Bernatowicz <marcin.bernatowicz@intel.com>
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Marcin Bernatowicz <marcin.bernatowicz@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200708100843.297655-2-michal@hardline.pl
It has been pointed out that information about HuC usage doesn't belong
in guc_info debugfs. Let's move "supported/used/wanted" matrix to a
separate debugfs file, keeping guc_info strictly about GuC.
Suggested-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Lukasz Fiedorowicz <lukasz.fiedorowicz@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200708100843.297655-1-michal@hardline.pl
On eviction, we acquire the vm->mutex and then wait on the vma->active.
Therefore when binding and pinning the vma, we must follow the same
sequence, lock/pin the vma then mark it active. Otherwise, we mark the
vma as active, then wait for the vm->mutex, and meanwhile the evictor
holding the mutex waits upon us to complete our activity.
Fixes: 8ccfc20a7d56 ("drm/i915/gt: Mark ring->vma as active while pinned")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <stable@vger.kernel.org> # v5.6+
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200706170138.8993-1-chris@chris-wilson.co.uk
We can add taint from multiple places, printing the caller allows us to
have a better overview of what exactly caused us to do the tainting.
v2: Tweak format and print the device (Chris)
v3: Move things around (Chris)
Suggested-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Petri Latvala <petri.latvala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200706144107.204821-2-michal@hardline.pl
Getting wedged device on driver init is pretty much unrecoverable.
Since we're running various scenarios that may potentially hit this in
CI (module reload / selftests / hotunplug), and if it happens, it means
that we can't trust any subsequent CI results, we should just apply the
taint to let the CI know that it should reboot (CI checks taint between
test runs).
v2: Comment that WEDGED_ON_INIT is non-recoverable, distinguish
WEDGED_ON_INIT from WEDGED_ON_FINI (Chris)
v3: Appease checkpatch, fixup search-replace logic expression mindbomb
in assert (Chris)
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Petri Latvala <petri.latvala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200706144107.204821-1-michal@hardline.pl
Reuse the ppgtt_bind_vma() for aliasing_ppgtt_bind_vma() so we can
reduce some code near-duplication. The catch is that we need to then
pass along the i915_address_space and not rely on vma->vm, as they
differ with the aliasing-ppgtt.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200703102519.26539-1-chris@chris-wilson.co.uk
The information about platform/driver/user view of GuC firmware usage
currently requires user to either go through kernel log or parse the
combination of "enable_guc" modparam and various debugfs entries.
Let's keep things simple and add a "supported/used/wanted" matrix
(already used internally by i915) in guc_info debugfs.
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Lukasz Fiedorowicz <lukasz.fiedorowicz@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Lukasz Fiedorowicz <lukasz.fiedorowicz@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200701142752.419878-1-michal@hardline.pl
If the driver gets stuck holding the kernel timeline, we cannot issue a
heartbeat and so fail to discover that the driver is indeed stuck and do
not issue a GPU reset (which would hopefully unstick the driver!).
Switch to using a trylock so that we can query if the heartbeat's
timeline mutex is locked elsewhere, and then use the timer to probe if it
remains stuck at the same spot for consecutive heartbeats, indicating
that the mutex has not been released and the engine has not progressed.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200702095219.963-1-chris@chris-wilson.co.uk
Start using device specific parameters instead of module parameters for
most things. The module parameters become the immutable initial values
for i915 parameters. The device specific parameters in i915->params
start life as a copy of i915_modparams. Any later changes are only
reflected in the debugfs.
The stragglers are:
* i915.force_probe and i915.modeset. Needed before dev_priv is
available. This is fine because the parameters are read-only and never
modified.
* i915.verbose_state_checks. Passing dev_priv to I915_STATE_WARN and
I915_STATE_WARN_ON would result in massive and ugly churn. This is
handled by not exposing the parameter via debugfs, and leaving the
parameter writable in sysfs. This may be fixed up in follow-up work.
* i915.inject_probe_failure. Only makes sense in terms of the module,
not the device. This is handled by not exposing the parameter via
debugfs.
v2: Fix uc i915 lookup code (Michał Winiarski)
Cc: Juha-Pekka Heikkilä <juha-pekka.heikkila@intel.com>
Cc: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20200618150402.14022-1-jani.nikula@intel.com
We only emit the renderstate once now during module load, it is no
longer a concern that we are delaying context creation and so do not
need to so eagerly optimise. Since the last time we have looked at the
renderstate, we have a pin_map / flush_map facility that supports simple
single mappings, replacing the open-coded kmap_atomic() and
prepare_write. As it should be a single page, of which we only write a
small portion, we stick to a simple WB [kmap] and use clflush on !llc
platforms, rather than creating a temporary WC vmapping for the single
page.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200619234543.17499-2-chris@chris-wilson.co.uk
Smatch warns that we may iterate over an empty array of gt->engines[].
One hopes that this is impossible, but nevertheless we can simply
appease smatch by initialising the timestamp to zero before we starting
probing the busy-time from the engines.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200619151938.21740-1-chris@chris-wilson.co.uk
Return the monotonic timestamp (ktime_get()) at the time of sampling the
busy-time. This is used in preference to taking ktime_get() separately
before or after the read seqlock as there can be some large variance in
reported timestamps. For selftests trying to ascertain that we are
reporting accurate to within a few microseconds, even a small delay
leads to the test failing.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200617130916.15261-2-chris@chris-wilson.co.uk
A couple of very simple tests to ensure that the basic properties of
per-engine busyness accounting [0% and 100% busy] are faithful.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200617130916.15261-1-chris@chris-wilson.co.uk
Like live_unlite_ring, but instead of simply looking at the impact of
intel_ring_direction(), check that preemption more generally works with
different depths of queued requests in the ring.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200616233733.18050-1-chris@chris-wilson.co.uk
Not too long ago, we realised we had issues with a rolling back a
context so far for a preemption request we considered the resubmit not
to be a rollback but a forward roll. This means we would issue a lite
restore instead of forcing a full restore, continuing execution of the
old requests rather than causing a preemption. Add a selftest to
exercise such a far rollback, such that if we were to skip the full
restore, we would execute invalid instructions in the ring and hang.
Note that while I was able to confirm that this causes us to do a
lite-restore preemption rollback (with commit e36ba817fa96 ("drm/i915/gt:
Incrementally check for rewinding") disabled), it did not trick the HW
into rolling past the old RING_TAIL. Myybe on other HW.
References: e36ba817fa96 ("drm/i915/gt: Incrementally check for rewinding")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200616185518.11948-1-chris@chris-wilson.co.uk
Fix inconsistent IS_ERR and PTR_ERR in live_timeslice_nopreempt().
The proper pointer to be passed as argument to PTR_ERR() is ce.
This bug was detected with the help of Coccinelle.
Fixes: b72f02d78e4f ("drm/i915/gt: Prevent timeslicing into unpreemptable requests")
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200616145452.GA25291@embeddedor
If the tasklet is not being used, don't try and flush it.
Fixes: 594893870044 ("drm/i915/gt: Add a safety submission flush in the heartbeat")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200615183935.17389-1-chris@chris-wilson.co.uk
Alexandre Oliva has recently removed these files from Linux Libre
with concerns that the sources weren't available.
The sources are available on IGT repository, and only open source
tools are used to generate the {ivb,hsw}_clear_kernel.c files.
However, the remaining concern from Alexandre Oliva was around
GPL license and the source not been present when distributing
the code.
So, it looks like 2 alternatives are possible, the use of
linux-firmware.git repository to store the blob or making sure
that the source is also present in our tree. Since the goal
is to limit the i915 firmware to only the micro-controller blobs
let's make sure that we do include the asm sources here in our tree.
Btw, I tried to have some diligence here and make sure that the
asms that these commits are adding are truly the source for
the mentioned files:
igt$ ./scripts/generate_clear_kernel.sh -g ivb \
-m ~/mesa/build/src/intel/tools/i965_asm
Output file not specified - using default file "ivb-cb_assembled"
Generating gen7 CB Kernel assembled file "ivb_clear_kernel.c"
for i915 driver...
igt$ diff ~/i915/drm-tip/drivers/gpu/drm/i915/gt/ivb_clear_kernel.c \
ivb_clear_kernel.c
< * Generated by: IGT Gpu Tools on Fri 21 Feb 2020 05:29:32 AM UTC
> * Generated by: IGT Gpu Tools on Mon 08 Jun 2020 10:00:54 AM PDT
61c61
< };
> };
\ No newline at end of file
igt$ ./scripts/generate_clear_kernel.sh -g hsw \
-m ~/mesa/build/src/intel/tools/i965_asm
Output file not specified - using default file "hsw-cb_assembled"
Generating gen7.5 CB Kernel assembled file "hsw_clear_kernel.c"
for i915 driver...
igt$ diff ~/i915/drm-tip/drivers/gpu/drm/i915/gt/hsw_clear_kernel.c \
hsw_clear_kernel.c
5c5
< * Generated by: IGT Gpu Tools on Fri 21 Feb 2020 05:30:13 AM UTC
> * Generated by: IGT Gpu Tools on Mon 08 Jun 2020 10:01:42 AM PDT
61c61
< };
> };
\ No newline at end of file
Used IGT and Mesa master repositories from Fri Jun 5 2020)
IGT: 53e8c878a6fb ("tests/kms_chamelium: Force reprobe after replugging
the connector")
Mesa: 5d13c7477eb1 ("radv: set keep_statistic_info with
RADV_DEBUG=shaderstats")
Mesa built with: meson build -D platforms=drm,x11 -D dri-drivers=i965 \
-D gallium-drivers=iris -D prefix=/usr \
-D libdir=/usr/lib64/ -Dtools=intel \
-Dkulkan-drivers=intel && ninja -C build
v2: Header clean-up and include build instructions in a readme (Chris)
Modified commit message to respect check-patch
Reference: http://www.fsfla.org/pipermail/linux-libre/2020-June/003374.html
Reference: http://www.fsfla.org/pipermail/linux-libre/2020-June/003375.html
Fixes: 47f8253d2b89 ("drm/i915/gen7: Clear all EU/L3 residual contexts")
Cc: <stable@vger.kernel.org> # v5.7+
Cc: Alexandre Oliva <lxoliva@fsfla.org>
Cc: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Jon Bloomfield <jon.bloomfield@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200610201807.191440-1-rodrigo.vivi@intel.com
Just in case everything fails (like for example "missed interrupt
syndrome" on Sandybridge), always flush the submission tasklet from the
heartbeat. This papers over such issues, but will still appear as a
second long glitch, and prevents us from detecting it unless we happen
to be performing a timed test.
v2: We rely on flush_submission() synchronizing with the tasklet on
another CPU.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200615165013.22973-3-chris@chris-wilson.co.uk
gen3 does not fully flush MI stores to memory on MI_FLUSH, such that a
subsequent read from e.g. the sampler can bypass the store and read the
stale value from memory. This is a serious issue when we are using MI
stores to rewrite the batches for relocation, as it means that the batch
is reading from random user/kernel memory. While it is particularly
sensitive [and detectable] for relocations, reading stale data at any
time is a worry.
Having started with a small number of delaying stores and doubling until
no more incoherency was seen over a few hours (with and without
background memory pressure), 32 was the magic number.
Note that it definitely doesn't fix the issue, merely adds a long delay
between requests, sufficient to mostly hide the problem, enough to raise
the mtbf to several hours. This is merely a stop gap.
v2: Follow more closer with the gen5 w/a and include some
post-invalidate flushes as well.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2018
References: a889580c087a ("drm/i915: Flush GPU relocs harder for gen3")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200612123949.7093-1-chris@chris-wilson.co.uk
Reduce the smoke depth by trimming the number of contexts, repetitions
and wait times. This is in preparation for a less greedy scheduler that
tries to be fair across contexts, resulting in a great many more context
switches. A thousand context switches may be 50-100ms, causing us to
timeout as the HW is not fast enough to complete the deep smoketests.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200607222108.14401-5-chris@chris-wilson.co.uk
Since the process_csb() does not require us to hold the
engine->active.lock, we can move the opportunistic flush before
direction submission to outside of the lock.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200612221113.9129-1-chris@chris-wilson.co.uk
With the removal of the internal wait-priority boosting, we can also
remove the selftest to ensure that those waits were being suppressed
from causing preemptions.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200607222108.14401-4-chris@chris-wilson.co.uk
We have a test case to exercise resetting an engine while the other
engines are busy, all the TEST_SELF adds on top is that the target
engine also has background activity. In this case it is useful to first
test resetting the engine while there is background activity, as a
separate flag from exercising all others.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200607222108.14401-3-chris@chris-wilson.co.uk
In commit 5ba32c7be81e ("drm/i915/execlists: Always force a context
reload when rewinding RING_TAIL"), we placed the check for rewinding a
context on actually submitting the next request in that context. This
was so that we only had to check once, and could do so with precision
avoiding as many forced restores as possible. For example, to ensure
that we can resubmit the same request a couple of times, we include a
small wa_tail such that on the next submission, the ring->tail will
appear to move forwards when resubmitting the same request. This is very
common as it will happen for every lite-restore to fill the second port
after a context switch.
However, intel_ring_direction() is limited in precision to movements of
upto half the ring size. The consequence being that if we tried to
unwind many requests, we could exceed half the ring and flip the sense
of the direction, so missing a force restore. As no request can be
greater than half the ring (i.e. 2048 bytes in the smallest case), we
can check for rollback incrementally. As we check against the tail that
would be submitted, we do not lose any sensitivity and allow lite
restores for the simple case. We still need to double check upon
submitting the context, to allow for multiple preemptions and
resubmissions.
Fixes: 5ba32c7be81e ("drm/i915/execlists: Always force a context reload when rewinding RING_TAIL")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.4+
Reviewed-by: Bruce Chang <yu.bruce.chang@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200609151723.12971-1-chris@chris-wilson.co.uk
In some of our hangtests, we try to reset an active engine while it is
spinning inside the recursive spinner. However, we also try to flood the
engine with requests that preempt the hang, and so should disable the
preemption to be sure that we reset the right request.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200607222108.14401-2-chris@chris-wilson.co.uk
Sentinels are supposed to be last requests in the elsp queue, not the
only one, so adjust the assert accordingly.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200607222108.14401-1-chris@chris-wilson.co.uk
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>