IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
This allows us to finally get rid of all the assumptions that vma->obj
is NULL.
Changes since v1:
- Ensure the mock_ring vma is pinned to prevent a fault.
- Pin it high to avoid failure in evict_for_vma selftest.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211117142024.1043017-3-matthew.auld@intel.com
We currently have to special case vma->obj being NULL because
of gen6 ppgtt and mock_engine. Fix gen6 ppgtt, so we may soon
be able to remove a few checks. As the object only exists as
a fake object pointing to ggtt, we have no backing storage,
so no real object is created. It just has to look real enough.
Also kill pin_mutex, it's not compatible with ww locking,
and we can use the vm lock instead.
v2:
- Drop IS_SHRINKABLE and shorten overly long line
v3:
- Checkpatch fix for alignment
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211117142024.1043017-2-matthew.auld@intel.com
In intel_context_do_pin_ww, when calling into the pre_pin hook(which is
passed the ww context) it could in theory return -EDEADLK(which is very
likely with debug kernels), once we start adding more ww locking in there,
like in the next patch. If so then we need to be mindful of having to
restart the do_pin at this point.
If this is the kernel_context, or some other early in-kernel context
where we have yet to setup the default_state, then we always inhibit the
context restore, and instead rely on the delayed active_release to set
the CONTEXT_VALID_BIT for us(if we even care), which should indicate
that we have context switched away, and that our newly saved context
state should now be valid. However, since we currently grab the active
reference before the potential ww dance, we can end up setting the
CONTEXT_VALID_BIT much too early, if we need to backoff, and then upon
re-trying the do_pin, we could potentially cause the hardware to
incorrectly load some garbage context state when later context switching
to that context, but at the very least this will trigger the
GEM_BUG_ON() in __engine_unpark. For now let's just move any ww dance
stuff prior to arming the active reference.
For normal user contexts this shouldn't be a concern, since we should
already have the default_state ready when initialising the lrc state,
and so there should be no concern with active_release somehow
prematurely setting the CONTEXT_VALID_BIT.
v2(Thomas):
- Also re-order the onion unwind
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211117142024.1043017-1-matthew.auld@intel.com
The intel_engine_create_virtual() function does not return NULL. It
returns error pointers.
Fixes: e5e32171a2cf ("drm/i915/guc: Connect UAPI to GuC multi-lrc interface")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211116114916.GB11936@kili
Trying to capture uninitialised engines when we wedged on init ends in
tears. Skip that together with uC capture, since failure to initialise the
latter can actually be one of the reasons for wedging on init.
v2:
* Use i915_disable_error_state when wedging on init/fini.
v3:
* Handle mock tests.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com> # v1
Link: https://patchwork.freedesktop.org/patch/msgid/20211111130634.266098-1-tvrtko.ursulin@linux.intel.com
Pre-HSW platforms don't use the gt SSEU structures; this means that
calling intel_sseu_get_subslices() on slice 0 for these platforms will
trip a GEM_BUG_ON(slice >= sseu->max_slices) warning.
Let's move the DSS lookup for a DG2 workaround into a helper function
that will only get called after we've already decided that we're on a
DG2 platform.
Fixes: 645cc0b9d972 ("drm/i915/dg2: Add initial gt/ctx/engine workarounds")
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211112160107.1593906-1-matthew.d.roper@intel.com
The bspec's performance guide suggests programming specific values into
a few registers for optimal performance. Although these aren't
workarounds, it's easiest to handle them inside the GT workaround
functions (which will also ensure that the values set here are properly
melded with other bits in the same registers that _are_ set by
workarounds).
Bspec: 68331, 45395
Cc: Matt Atwood <matthew.s.atwood@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Siddiqui Ayaz A <ayaz.siddiqui@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Clint Taylor <Clinton.A.Taylor@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211102222511.534310-4-matthew.d.roper@intel.com
Add the initial set of workarounds for Xe_HP SDV.
There are some additional workarounds specific to the compute engines
that we're holding back for now. Those will be added later, after
general compute engine support lands.
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Clint Taylor <Clinton.A.Taylor@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211102222511.534310-2-matthew.d.roper@intel.com
There's a small window of opportunity during which the adjust_lru()
function can be called with a GEM refcount of zero from the TTM
eviction code. This results in a kernel BUG().
Ensure that we don't attempt to modify the GEM shrinker lists unless
we have a GEM refcount.
Fixes: ebd4a8ec7799 ("drm/i915/ttm: move shrinker management into adjust_lru")
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211110085527.1033475-1-thomas.hellstrom@linux.intel.com
In coming patches we'll be doing the actual tile initialization between
these two uncore init phases.
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211029032817.3747750-3-matthew.d.roper@intel.com
We'll be adding multi-tile support soon; on multi-tile platforms
interrupts are per-tile and every tile has the full set of
interrupt registers.
In this commit we start passing intel_gt instead of dev_priv for the
functions that are related to Xe_HP irq handling. Right now we're still
passing tile 0 everywhere, but in later patches we'll start actually
passing the correct tile.
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Co-authored-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211029032817.3747750-2-matthew.d.roper@intel.com
Some selftests assume that nothing will attempt to grab these bitlocks
while they are held by the selftests. With GuC, for example, that is
not true because the hanging workloads may cause the GuC code to attempt
to grab them for a global reset, and that may cause it to end up
sleeping on the bit never waking up. Regardless whether that will be
the final solution for GuC, use clear_and_wake_up_bit() pending a more
thorough investigation on how this should be handled moving forward.
To be clear this needs to be a temporary solution. If we can't find
an in-kernel locking primitive to use here, we should at the very least
add lockdep annotation to these bitlocks with a thorough explanation
as to why we need to use bits.
v3:
- Use GEM_BUG_ON(test_and_set_bit()) rather than set_bit() to verify
the assumption that nothing is holding the reset locks when we
attempt to grab them. (Chris Wilson)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211105150146.834052-1-thomas.hellstrom@linux.intel.com
Gem-TTM objects that are backed by shmem might have populated
page-vectors without having the GEM pages set. Those objects
aren't moved to the correct shrinker / purge list by gem_madvise.
For such objects, identified by having the
_SELF_MANAGED_SHRINK_LIST set, make sure they end up on the
correct list.
v2:
- Revert a change that made swapped-out objects inaccessible for
truncating. (Matthew Auld)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211108123637.929617-1-thomas.hellstrom@linux.intel.com
When i915 receives a context reset notification from GuC, it triggers
an error capture before resetting any outstanding requsts of that
context. Unfortunately, the error capture is not a time bound
operation. In certain situations it can take a long time, particularly
when multiple large LMEM buffers must be read back and eoncoded. If
this delay is longer than other timeouts (heartbeat, test recovery,
etc.) then a full GT reset can be triggered in the middle.
That can result in the context being reset by GuC actually being
destroyed before the error capture completes and the GuC submission
code resumes. Thus, the GuC side can start dereferencing stale
pointers and Bad Things ensue.
So add a refcount get of the context during the entire reset
operation. That way, the context can't be destroyed part way through
no matter what other resets or user interactions occur.
v2:
(Matthew Brost)
- Update patch to work with async error capture
v3:
(Matthew Brost)
- Drop async capture support as that hasn't landed yet
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211108164054.23588-1-matthew.brost@intel.com
Don't set, test for, or clear per-engine reset bits with GuC submission
as the GuC owns the per engine resets not the i915. Setting, testing
for, and clearing these bits is causing issues with the hangcheck
selftest. Rather than change to test to not use these bits, rip the use
of these bits out from the reset code.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211028224224.32693-1-matthew.brost@intel.com
In the next commit, we don't evict when refcount = 0, so we need to
call drain freed objects, because we want to pin new bo's in the same
place, causing a test failure.
Furthermore, since each subtest is separated, it's a lot better to use
i915_live_selftests, so each subtest starts with a clean slate, and a
clean address space.
v2(Reported-by: kernel test robot <lkp@intel.com>):
- Make hugepage_ctx static.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211028125855.3281674-9-matthew.auld@intel.com
If the initial fill blit or copy blit of an object fails, the old
content of the data might be exposed and read as soon as either CPU- or
GPU PTEs are set up to point at the pages.
Intercept the blit fence with an async callback that checks the
blit fence for errors and if there are errors performs an async cpu blit
instead. If there is a failure to allocate the async dma_fence_work,
allocate it on the stack and sync wait for the blit to complete.
Add selftests that simulate gpu blit failures and failure to allocate
the async dma_fence_work.
A previous version of this pach used dma_fence_work, now that's
opencoded which adds more code but might lower the latency
somewhat in the common non-error case.
v3:
- Style fixes (Matthew Auld)
v4:
- Use "#if IS_ENABLED()" instead of #ifdef (Matthew Auld)
v5:
- Fix an issue where we, if the dependency was already signaled, might
end up waiting for a memcpy fence that would never signal.
v6:
- Add a missing i915_ttm_memcpy_release() (Matthew Auld)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211104110718.688420-3-thomas.hellstrom@linux.intel.com
We are about to introduce failsafe- and asynchronous migration and
ttm moves.
This will add complexity and code to the TTM move code so it makes sense
to split it out to a separate file to make the i915 TTM code easer to
digest.
Split the i915 TTM move code out and since we will have to change the name
of the gpu_binds_iomem() and cpu_maps_iomem() functions anyway,
we alter the name of gpu_binds_iomem() to i915_ttm_gtt_binds_lmem() which
is more reflecting what it is used for.
With this we also add some more documentation. Otherwise there should be
no functional change.
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211104110718.688420-2-thomas.hellstrom@linux.intel.com
Add a helper to sort through the SLPC/RPS paths of get/set methods.
Boost frequency will be modified as long as it is within the constraints
of RP0 and if it is different from the existing one. We will set min
freq to boost only if there is at least one active waiter.
v2: Add num_boosts to guc_slpc_info and changes for worker function
v3: Review comments (Ashutosh)
Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211102012608.8609-4-vinay.belgaumkar@intel.com
Add helper in RPS code for handling SLPC and non-SLPC paths.
When boost is requested in the SLPC path, we can ask GuC to ramp
up the frequency req by setting the minimum frequency to boost freq.
Reset freq back to the min softlimit when there are no more waiters.
v2: Schedule a worker thread which can boost freq from within
an interrupt context as well.
v3: No need to check against requested freq before scheduling boost
work (Ashutosh)
Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211102012608.8609-3-vinay.belgaumkar@intel.com
Define helpers and struct members required to record boost info.
Boost frequency is initialized to RP0 at SLPC init. Also define num_waiters
which can track the pending boost requests.
Boost will be done by scheduling a worker thread. This will avoid
the need to make H2G calls inside an interrupt context. Initialize the
worker function during SLPC init as well. Had to move intel_guc_slpc_init
a few lines below to accommodate this.
v2: Add a workqueue to handle waitboost
v3: Code review comments (Ashutosh)
Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211102012608.8609-2-vinay.belgaumkar@intel.com
As now graphics and media can have different steppings this patch is
renaming all _GT_STEP macros to _GRAPHICS_STEP.
Future platforms will properly choose between _MEDIA_STEP and
_GRAPHICS_STEP for each new workaround.
Cc: Matt Atwood <matthew.s.atwood@intel.com>
Cc: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Caz Yokoyama <caz.yokoyama@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211020002353.193893-3-jose.souza@intel.com
Graphics and media IPs can have different stepping so a new field is
needed in intel_step_info.
The next patch will take care of rename gt_step to graphics_step.
Cc: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
Cc: Matt Atwood <matthew.s.atwood@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211020002353.193893-2-jose.souza@intel.com
Adding a structure to standardize access to IP versioning as future
platforms will have this information populated at runtime.
The constant platform display version is not using this new struct but
the runtime variant will definitely use it.
Cc: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
Cc: Matt Atwood <matthew.s.atwood@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Caz Yokoyama <caz.yokoyama@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211020002353.193893-1-jose.souza@intel.com
We were overzealous here; even though discrete is non-LLC, it should
still be always coherent.
v2(Thomas & Daniel)
- Be extra cautious and limit to DG1
- Add some more commentary
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211029122137.3484203-1-matthew.auld@intel.com
Should not be needed. Even with non-coherent display, we should be using
device local-memory there, and not system memory.
v2: also add a warning in i915_gem_clflush_object
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> #v1
Link: https://patchwork.freedesktop.org/patch/msgid/20211027161813.3094681-4-matthew.auld@intel.com
Move it next to its partner in crime; gpu_write_needs_clflush. For
better readability lets keep gpu vs cpu at least in the same file.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211027161813.3094681-3-matthew.auld@intel.com
We seem to have an unfortunate issue where we arrive from:
i915_gem_object_flush_if_display+0x86/0xd0 [i915]
intel_user_framebuffer_dirty+0x1a/0x50 [i915]
drm_mode_dirtyfb_ioctl+0xfb/0x1b0
which can be before the pages are populated(and pinned for display), and
so i915_gem_object_has_struct_page() might still return true, as per the
ttm backend. We could re-order the later get_pages() call here, but
since on discrete everything should already be coherent, with the
exception of the display engine, and even there display surfaces must be
allocated in device local-memory anyway, so there should in theory be no
conceivable reason to ever call i915_gem_clflush_object() on discrete.
References: https://gitlab.freedesktop.org/drm/intel/-/issues/4320
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211027161813.3094681-2-matthew.auld@intel.com
In theory if clflush_work_create() somehow fails here, and we don't yet
have mm.pages populated then we end up resetting cache_dirty, which is
likely wrong, since that will potentially skip the flush-on-acquire, if
it was needed.
It looks like intel_user_framebuffer_dirty() can arrive here before the
pages are populated.
v2(Thomas):
- Move setting cache_dirty out of the async portion, also add a
comment for why that should still be safe.
v3:
- Add Thomas' irc r-b
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211027161813.3094681-1-matthew.auld@intel.com
As we start to introduce asynchronous failsafe object migration,
where we update the object state and then submit asynchronous
commands we need to record what memory resources are actually used
by various part of the command stream. Initially for three purposes:
1) Error capture.
2) Asynchronous migration error recovery.
3) Asynchronous vma bind.
At the time where these happens, the object state may have been updated
to be several migrations ahead and object sg-tables discarded.
In order to make it possible to keep sg-tables with memory resource
information for these operations, introduce refcounted sg-tables that
aren't freed until the last user is done with them.
The alternative would be to reference information sitting on the
corresponding ttm_resources which typically have the same lifetime as
these refcountes sg_tables, but that leads to other awkward constructs:
Due to the design direction chosen for ttm resource managers that would
lead to diamond-style inheritance, the LMEM resources may sometimes be
prematurely freed, and finally the subclassed struct ttm_resource would
have to bleed into the asynchronous vma bind code.
v3:
- Address a number of style issues (Matthew Auld)
v4:
- Dont check for st->sgl being NULL in i915_ttm_tt__shmem_unpopulate(),
that should never happen. (Matthew Auld)
v5:
- Fix a Potential double-free (Matthew Auld)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211101122444.114607-1-thomas.hellstrom@linux.intel.com
This implements WaProgramMgsrForCorrectSliceSpecificMmioReads which
was omitted by mistake from Gen9 documentation, while it is actually
applicable to fused off parts.
Workaround consists of making sure MCR packet control register is
programmed to point to enabled slice/subslice pair before doing any
MMIO reads from the affected registers.
Failure do to this can result in complete system hangs when running
certain workloads. Two known cases which can cause system hangs are:
1. "test_basic progvar_prog_scope_uninit" test which is part of
Khronos OpenCL conformance suite
(https://github.com/KhronosGroup/OpenCL-CTS) with the Intel
OpenCL driver (https://github.com/intel/compute-runtime).
2. VP8 media hardware encoding using the full-feature build of the
Intel media-driver (https://github.com/intel/media-driver) and
ffmpeg.
For the former case patch was verified to fix the hard system hang
when executing the OCL test on Intel Pentium CPU 6405U which contains
fused off GT1 graphics.
Reference: HSD#1508045018,1405586840, BSID#0575
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: William Tseng <william.tseng@intel.com>
Cc: Shawn C Lee <shawn.c.lee@intel.com>
Cc: Pawel Wilma <pawel.wilma@intel.com>
Signed-off-by: Cooper Chiou <cooper.chiou@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211025042623.3876-1-cooper.chiou@intel.com
Gone with userptr rewrite by Maarten in ed29c2691188 ("drm/i915: Fix
userptr so we do not have to worry about obj->mm.lock, v7.")
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211022082200.2684194-1-daniel.vetter@ffwll.ch
Normal users shouldn't be hitting this, likely this would indicate a
userspace bug. So don't bother caching, which should be safe now that we
manually flush the page.
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ramalingam C <ramalingam.c@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211028092638.3142258-2-matthew.auld@intel.com
The scratch page is directly visible in the users address space, and
while this is forced as CACHE_LLC, by the kernel, we still have to
contend with things like "Bypass-LLC" MOCS. So just flush no matter
what.
v2(Thomas):
- Make sure we use drm_clflush_virt_range here, in case clflush support
is missing.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ramalingam C <ramalingam.c@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211028092638.3142258-1-matthew.auld@intel.com
With GuC handling scheduling, i915 is not aware of the time that a
context is scheduled in and out of the engine. Since i915 pmu relies on
this info to provide engine busyness to the user, GuC shares this info
with i915 for all engines using shared memory. For each engine, this
info contains:
- total busyness: total time that the context was running (total)
- id: id of the running context (id)
- start timestamp: timestamp when the context started running (start)
At the time (now) of sampling the engine busyness, if the id is valid
(!= ~0), and start is non-zero, then the context is considered to be
active and the engine busyness is calculated using the below equation
engine busyness = total + (now - start)
All times are obtained from the gt clock base. For inactive contexts,
engine busyness is just equal to the total.
The start and total values provided by GuC are 32 bits and wrap around
in a few minutes. Since perf pmu provides busyness as 64 bit
monotonically increasing values, there is a need for this implementation
to account for overflows and extend the time to 64 bits before returning
busyness to the user. In order to do that, a worker runs periodically at
frequency = 1/8th the time it takes for the timestamp to wrap. As an
example, that would be once in 27 seconds for a gt clock frequency of
19.2 MHz.
Note:
There might be an over-accounting of busyness due to the fact that GuC
may be updating the total and start values while kmd is reading them.
(i.e kmd may read the updated total and the stale start). In such a
case, user may see higher busyness value followed by smaller ones which
would eventually catch up to the higher value.
v2: (Tvrtko)
- Include details in commit message
- Move intel engine busyness function into execlist code
- Use union inside engine->stats
- Use natural type for ping delay jiffies
- Drop active_work condition checks
- Use for_each_engine if iterating all engines
- Drop seq locking, use spinlock at GuC level to update engine stats
- Document worker specific details
v3: (Tvrtko/Umesh)
- Demarcate GuC and execlist stat objects with comments
- Document known over-accounting issue in commit
- Provide a consistent view of GuC state
- Add hooks to gt park/unpark for GuC busyness
- Stop/start worker in gt park/unpark path
- Drop inline
- Move spinlock and worker inits to GuC initialization
- Drop helpers that are called only once
v4: (Tvrtko/Matt/Umesh)
- Drop addressed opens from commit message
- Get runtime pm in ping, remove from the park path
- Use cancel_delayed_work_sync in disable_submission path
- Update stats during reset prepare
- Skip ping if reset in progress
- Explicitly name execlists and GuC stats objects
- Since disable_submission is called from many places, move resetting
stats to intel_guc_submission_reset_prepare
v5: (Tvrtko)
- Add a trylock helper that does not sleep and synchronize PMU event
callbacks and worker with gt reset
v6: (CI BAT failures)
- DUTs using execlist submission failed to boot since __gt_unpark is
called during i915 load. This ends up calling the GuC busyness unpark
hook and results in kick-starting an uninitialized worker. Let
park/unpark hooks check if GuC submission has been initialized.
- drop cant_sleep() from trylock helper since rcu_read_lock takes care
of that.
v7: (CI) Fix igt@i915_selftest@live@gt_engines
- For GuC mode of submission the engine busyness is derived from gt time
domain. Use gt time elapsed as reference in the selftest.
- Increase busyness calculation to 10ms duration to ensure batch runs
longer and falls within the busyness tolerances in selftest.
v8:
- Use ktime_get in selftest as before
- intel_reset_trylock_no_wait results in a lockdep splat that is not
trivial to fix since the PMU callback runs in irq context and the
reset paths are tightly knit into the driver. The test that uncovers
this is igt@perf_pmu@faulting-read. Drop intel_reset_trylock_no_wait,
instead use the reset_count to synchronize with gt reset during pmu
callback. For the ping, continue to use intel_reset_trylock since ping
is not run in irq context.
- GuC PM timestamp does not tick when GuC is idle. This can potentially
result in wrong busyness values when a context is active on the
engine, but GuC is idle. Use the RING TIMESTAMP as GPU timestamp to
process the GuC busyness stats. This works since both GuC timestamp and
RING timestamp are synced with the same clock.
- The busyness stats may get updated after the batch starts running.
This delay causes the busyness reported for 100us duration to fall
below 95% in the selftest. The only option at this time is to wait for
GuC busyness to change from idle to active before we sample busyness
over a 100us period.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211027004821.66097-2-umesh.nerlige.ramappa@intel.com
In preparation for GuC pmu stats, add a name to the execlists stats
structure so that it can be differentiated from the GuC stats.
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211027004821.66097-1-umesh.nerlige.ramappa@intel.com
Avoid adding backend specific data to the tracepoints outside of
the LOW_LEVEL_TRACEPOINTS kernel config protection. These bits of
information are bound to change depending on the selected submission
method per platform and are not necessarily possible to maintain in
the future.
Fixes: dbf9da8d55ef ("drm/i915/guc: Add trace point for GuC submit")
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: John Harrison <john.c.harrison@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211027093255.66489-1-joonas.lahtinen@linux.intel.com
Add Tvrtko Ursulin as a co-maintainer for drm/i915 driver.
Tvrtko will bring added bandwidth and focus to the GT/GEM domain
(drm-intel-gt-next).
This will help with the increased driver maintenance efforts, allows
alternating the drm-intel-gt-next pull requests and also should increase
the coverage for the maintenance in general.
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Acked-by: Jani Nikula <jani.nikula@linux.intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Sean Paul <sean@poorly.run>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: dri-devel@lists.freedesktop.org
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211025134907.20078-1-joonas.lahtinen@linux.intel.com
Fix following coccicheck warning:
./drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:3117:15-22: WARNING:
ERR_CAST can be used with eb->requests[i].
Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211025113316.24424-1-wanjiabing@vivo.com
Update live.evict to wait on last request and idle GPU after each loop.
This not only enhances the test to fill the GGTT on each engine class
but also avoid timeouts from igt_flush_test when using GuC submission.
igt_flush_test (idle GPU) can take a long time with GuC submission if
losts of contexts are created due to H2G / G2H required to destroy
contexts.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211021214040.33292-1-matthew.brost@intel.com
perf_parallel_engines is micro benchmark to test i915 request
scheduling. The test creates a thread per physical engine and submits
NOP requests and waits the requests to complete in a loop. In execlists
mode this works perfectly fine as powerful CPU has enough cores to feed
each engine and process the CSBs. With GuC submission the uC gets
overwhelmed as all threads feed into a single CTB channel and the GuC
gets bombarded with CSBs as contexts are immediately switched in and out
on the engines due to the zero runtime of the requests. When the GuC is
overwhelmed scheduling of contexts is unfair due to the nature of the
GuC scheduling algorithm. This behavior is understood and deemed
acceptable as this micro benchmark isn't close to real world use case.
Increasing the timeout of wait period for requests to complete. This
makes the test understand that is ok for contexts to get starved in this
scenario.
A future patch / cleanup may just delete these micro benchmark tests as
they basically mean nothing. We care about real workloads not made up
ones.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211011175704.28509-1-matthew.brost@intel.com