IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
This was done by the following semantic patch:
@@ expression i915; @@
- INTEL_GEN(i915)
+ GRAPHICS_VER(i915)
@@ expression i915; expression E; @@
- INTEL_GEN(i915) >= E
+ GRAPHICS_VER(i915) >= E
@@ expression dev_priv; expression E; @@
- !IS_GEN(dev_priv, E)
+ GRAPHICS_VER(dev_priv) != E
@@ expression dev_priv; expression E; @@
- IS_GEN(dev_priv, E)
+ GRAPHICS_VER(dev_priv) == E
@@
expression dev_priv;
expression from, until;
@@
- IS_GEN_RANGE(dev_priv, from, until)
+ IS_GRAPHICS_VER(dev_priv, from, until)
@def@
expression E;
identifier id =~ "^gen$";
@@
- id = GRAPHICS_VER(E)
+ ver = GRAPHICS_VER(E)
@@
identifier def.id;
@@
- id
+ ver
It also takes care of renaming the variable we assign to GRAPHICS_VER()
so to use "ver" rather than "gen".
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210606045050.103862-2-lucas.demarchi@intel.com
I doubt anyone has used the display error state since CS flips
went the way of the dodo. Just nuke it.
It might be semi interesting to have something like this for
FIFO underruns and the like, but as it stands this wouldn't
provide a sufficient amount of information. So would need
an extensive rewrite anyway.
The lockless power well handling is also racy, so this could
just be contributing noise to test results if we end up
accessing something with the relevant power well already
disabled.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210505191140.14215-1-ville.syrjala@linux.intel.com
Acked-by: Jani Nikula <jani.nikula@intel.com>
Cross-subsystem Changes:
- DMA mapped scatterlist fixes in i915 to unblock merging of
https://lkml.org/lkml/2020/9/27/70 (Tvrtko, Tom)
Driver Changes:
- Fix for user reported issue #2381 (Graphical output stops with "switching to inteldrmfb from simple"):
Mark ininitial fb obj as WT on eLLC machines to avoid rcu lockup during fbdev init (Ville, Chris)
- Fix for Tigerlake (and earlier) to avoid spurious empty CSB events leading to hang (Chris, Bruce)
- Delay execlist processing for Tigerlake to avoid hang (Chris)
- Fix for Tigerlake RCS engine health check through heartbeat (Chris)
- Fix for Tigerlake reserved MOCS entries (Ayaz, Chris)
- Fix Media power gate sequence on Tigerlake (Rodrigo)
- Enable eLLC caching of display buffers for SKL+ (Ville)
- Support parsing of oversize batches on Gen9 (Matt, Chris)
- Exclude low pages (128KiB) of stolen from use to avoid thrashing during reset (Chris)
- Flush engines before Tigerlake breadcrumbs (Chris)
- Use the local HWSP offset during submission (Chris)
- Flush coherency domains on first set-domain-ioctl (Chris, Zbigniew)
- Use the active reference on the vma while capturing to avoid use-after-free (Chris)
- Fix MOCS PTE setting for gen9+ (Ville)
- Avoid NULL dereference on IPS driver callback while unbinding i915 (Chris)
- Avoid NULL dereference from PT/PD stash allocation error (Matt)
- Hold request reference for canceling an active context (Chris)
- Avoid infinite loop on x86-32 when mapping a lot of objects (Chris)
- Disallow WC mappings when processor doesn't support them (Chris)
- Return correct error in i915_gem_object_copy_blt() error path (Dan)
- Return correct error in intel_context_create_request() error path (Maarten)
- Tune down GuC communication enabled/disabled messages to debug (Jani)
- Fix rebased commit "Remove i915_request.lock requirement for execution callbacks" (Chris)
- Cancel outstanding work after disabling heartbeats on an engine (Chris)
- Signal cancelled requests (Chris)
- Retire cancelled requests on unload (Chris)
- Scrub HW state on driver remove (Chris)
- Undo forced context restores after trivial preemptions (Chris)
- Handle PCI unbind in PMU code (Tvrtko)
- Fix CPU hotplug with multiple GPUs in PMU code (Trtkko)
- Correctly set SFC capability for video engines (Venkata)
- Update GuC code to use firmware v49.0.1 (John, Matthew B., Daniele, Oscar, Michel, Rodrigo, Michal)
- Improve GuC warnings on loading failure (John)
- Avoid ownership race in buffer pool by clearing age (Chris)
- Use MMIO to read CSB in case of failure (Chris, Mika)
- Show engine properties in engine state dump to indicate changes (Chris, Joonas)
- Break up error capture compression loops with cond_resched() (Chris)
- Reduce GPU error capture mutex hold time to avoid khungtaskd (Chris)
- Serialise debugfs i915_gem_objects with ctx->mutex (Chris)
- Always test execution status on closing the context and close if not persistent (Chris)
- Avoid mixing integer types during batch copies (Chris, Jared)
- Skip over MI_NOOP when parsing to avoid overhead (Chris)
- Hold onto an explicit ref to i915_vma_work.pinned (Chris)
- Perform all asynchronous waits prior to marking payload start (Chris)
- Pull phys pread/pwrite implementations to the backend (Matt)
- Improve record of hung engines in error state (Tvrtko)
- Allow backends to override pread implementation (Matt)
- Reinforce LRC poisoning checks to confirm context survives execution (Chris)
- Fix memory region max size calculation (Matt)
- Fix order when adding blocks to memory region (Matt)
- Eliminate unused intel_virtual_engine_get_sibling func (Chris)
- Cleanup kasan warning for on-stack (unsigned long) casting (Chris)
- Onion unwind for scratch page allocation failure (Chris)
- Poison stolen pages before use (Chris)
- Selftest improvements (Chris)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201112163407.GA20320@jlahtine-mobl.ger.corp.intel.com
Instead of printing out the internal engine mask, which can change between
kernel versions making it difficult to map to actual engines, present a
bitmask of hanging engines ABI classes. For example:
[drm] GPU HANG: ecode 9:8:24dffffd, in gem_exec_schedu [1334]
Engine ABI class is useful to quickly categorize render vs media etc hangs
in bug reports. Considering virtual engine even more so than the current
scheme.
v2:
* Do not re-order fields. (Chris)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20201105113842.1395391-1-tvrtko.ursulin@linux.intel.com
Between events which trigger engine and GPU resets and capturing the error
state we lose information on which engine triggered the reset. Improve
this by passing in the hung engine mask down to error capture.
Result is that the list of engines in user visible "GPU HANG: ecode
<gen>:<engines>:<ecode>, <process>" is now a list of hanging and not just
active engines. Most importantly the displayed process is now the one
which was actually hung.
v2:
* Stub prototype. (Chris)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20201104134743.916027-1-tvrtko.ursulin@linux.intel.com
Shrink the hold time for the error capture mutex to just around the
acquire/release of the PTE used for reading back the object via the
Global GTT. For platforms that do not need the GGTT read back, we can
skip the mutex entirely and allow concurrent error capture. Where we do
use the GGTT, by restricting the hold time around the slow readback and
compression, we are more resilient against softlockups (khungtaskd) as
the heartbeat may well also trigger an error while the first is on
going, and this allows the heartbeat reset to skip past the capture and
not be stalled.
Testcase: igt/gem_exec_capture/many-*
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200916090059.3189-3-chris@chris-wilson.co.uk
Start using device specific parameters instead of module parameters for
most things. The module parameters become the immutable initial values
for i915 parameters. The device specific parameters in i915->params
start life as a copy of i915_modparams. Any later changes are only
reflected in the debugfs.
The stragglers are:
* i915.force_probe and i915.modeset. Needed before dev_priv is
available. This is fine because the parameters are read-only and never
modified.
* i915.verbose_state_checks. Passing dev_priv to I915_STATE_WARN and
I915_STATE_WARN_ON would result in massive and ugly churn. This is
handled by not exposing the parameter via debugfs, and leaving the
parameter writable in sysfs. This may be fixed up in follow-up work.
* i915.inject_probe_failure. Only makes sense in terms of the module,
not the device. This is handled by not exposing the parameter via
debugfs.
v2: Fix uc i915 lookup code (Michał Winiarski)
Cc: Juha-Pekka Heikkilä <juha-pekka.heikkila@intel.com>
Cc: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20200618150402.14022-1-jani.nikula@intel.com
We need to keep the default context state around to instantiate new
contexts (aka golden rendercontext), and we also keep it pinned while
the engine is active so that we can quickly reset a hanging context.
However, the default contexts are large enough to merit keeping in
swappable memory as opposed to kernel memory, so we store them inside
shmemfs. Currently, we use the normal GEM objects to create the default
context image, but we can throw away all but the shmemfs file.
This greatly simplifies the tricky power management code which wants to
run underneath the normal GT locking, and we definitely do not want to
use any high level objects that may appear to recurse back into the GT.
Though perhaps the primary advantage of the complex GEM object is that
we aggressively cache the mapping, but here we are recreating the
vm_area everytime time we unpark. At the worst, we add a lightweight
cache, but first find a microbenchmark that is impacted.
Having started to create some utility functions to make working with
shmemfs objects easier, we can start putting them to wider use, where
GEM objects are overkill, such as storing persistent error state.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Ramalingam C <ramalingam.c@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200429172429.6054-1-chris@chris-wilson.co.uk
GPU saves accumulated context runtime (in CS timestamp units) in PPHWSP
which will be useful for us in cases when we are not able to track context
busyness ourselves (like with GuC). Keep a copy of this in struct
intel_context from where it can be easily read even if the context is not
pinned.
v2:
(Chris)
* Do not store pphwsp address in intel_context.
* Log CS wrap-around.
* Simplify calculation by relying on integer wraparound.
v3:
* Include total/avg in traces and error state for debugging
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200216133620.394962-1-chris@chris-wilson.co.uk
Now that we have offline error capture and can reset an engine from
inside an atomic context while also preserving the GPU state for
post-mortem analysis, it is time to handle error interrupts thrown by
the command parser.
This provides a much, much faster mechanism for us to detect known
problems than using heartbeats/hangchecks, and also provides a mechanism
for when those are disabled. However, it is limited to problems the HW
can detect in the CS and so not a complete solution for detecting lockups.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200128204318.4182039-2-chris@chris-wilson.co.uk
io_mapping_map_atomic/kmap_atomic are occasionally taken in error capture
(if there is no aperture preallocated for the use of error capture), but
the error capture and compression routines are now run in normal
context:
<3> [113.316247] BUG: sleeping function called from invalid context at mm/page_alloc.c:4653
<3> [113.318190] in_atomic(): 1, irqs_disabled(): 0, pid: 678, name: debugfs_test
<4> [113.319900] no locks held by debugfs_test/678.
<3> [113.321002] Preemption disabled at:
<4> [113.321130] [<ffffffffa02506d4>] i915_error_object_create+0x494/0x610 [i915]
<4> [113.327259] Call Trace:
<4> [113.327871] dump_stack+0x67/0x9b
<4> [113.328683] ___might_sleep+0x167/0x250
<4> [113.329618] __alloc_pages_nodemask+0x26b/0x1110
<4> [113.334614] pool_alloc.constprop.19+0x14/0x60 [i915]
<4> [113.335951] compress_page+0x7c/0x100 [i915]
<4> [113.337110] i915_error_object_create+0x4bd/0x610 [i915]
<4> [113.338515] i915_capture_gpu_state+0x384/0x1680 [i915]
However, it is not a good idea to run the slow compression inside atomic
context, so we choose not to.
Fixes: 895d8ebeaa ("drm/i915: error capture with no ggtt slot")
Signed-off-by: Bruce Chang <yu.bruce.chang@intel.com>
Reviewed-by: Brian Welty <brian.welty@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191113231104.24208-1-yu.bruce.chang@intel.com