IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
This means that the proto-context needs to grow support for engine
configuration information as well as setparam logic. Fortunately, we'll
be deleting a lot of setparam logic on the primary context shortly so it
will hopefully balance out.
There's an extra bit of fun here when it comes to setting SSEU and the
way it interacts with PARAM_ENGINES. Unfortunately, thanks to
SET_CONTEXT_PARAM and not being allowed to pick the order in which we
handle certain parameters, we have think about those interactions.
v2 (Daniel Vetter):
- Add a proto_context_free_user_engines helper
- Comment on SSEU in the commit message
- Use proto_context_set_persistence in set_proto_ctx_param
v3 (Daniel Vetter):
- Fix a doc comment
- Do an explicit HAS_FULL_PPGTT check in set_proto_ctx_vm instead of
relying on pc->vm != NULL.
- Handle errors for CONTEXT_PARAM_PERSISTENCE
- Don't allow more resetting user engines
- Rework initialization of UCONTEXT_PERSISTENCE
v4 (Jason Ekstrand):
- Move hand-rolled initialization of UCONTEXT_PERSISTENCE to an
earlier patch
v5 (Jason Ekstrand):
- Move proto_context_set_persistence to this patch
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210708154835.528166-22-jason@jlekstrand.net
What we really want to check is that size of the engines array, i.e.
args->size - sizeof(*user) is divisible by the element size, i.e.
sizeof(*user->engines) because that's what's required for computing the
array length right below the check. However, we're currently not doing
this and instead doing a compile-time check that sizeof(*user) is
divisible by sizeof(*user->engines) and avoiding the subtraction. As
far as I can tell, the only reason for the more confusing pair of checks
is to avoid a single subtraction of a constant.
The other thing the BUILD_BUG_ON might be trying to implicitly check is
that offsetof(user->engines) == sizeof(*user) and we don't have any
weird padding throwing us off. However, that's not the check it's doing
and it's not even a reliable way to do that check.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210708154835.528166-21-jason@jlekstrand.net
This is the VM equivalent of i915_gem_context_lookup. It's only used
once in this patch but future patches will need to duplicate this lookup
code so it's better to have it in a helper.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210708154835.528166-20-jason@jlekstrand.net
For now this is a no-op because everyone passes in a null SSEU but it
lets us get some of the error handling and selftest refactoring plumbed
through.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210708154835.528166-19-jason@jlekstrand.net
The current context uAPI allows for two methods of setting context
parameters: SET_CONTEXT_PARAM and CONTEXT_CREATE_EXT_SETPARAM. The
former is allowed to be called at any time while the later happens as
part of GEM_CONTEXT_CREATE. Currently, everything settable via one is
settable via the other. While some params are fairly simple and setting
them on a live context is harmless such the context priority, others are
far trickier such as the VM or the set of engines. In order to swap out
the VM, for instance, we have to delay until all current in-flight work
is complete, swap in the new VM, and then continue. This leads to a
plethora of potential race conditions we'd really rather avoid.
Unfortunately, both methods of setting the VM and the engine set are in
active use today so we can't simply disallow setting the VM or engine
set vial SET_CONTEXT_PARAM. In order to work around this wart, this
commit adds a proto-context struct which contains all the context create
parameters.
v2 (Daniel Vetter):
- Better commit message
- Use __set/clear_bit instead of set/clear_bit because there's no race
and we don't need the atomics
v3 (Daniel Vetter):
- Use manual bitops and BIT() instead of __set_bit
v4 (Daniel Vetter):
- Add a changelog to the commit message
- Better hyperlinking in docs
- Create the default PPGTT in i915_gem_create_context
v5 (Daniel Vetter):
- Hand-roll the initialization of UCONTEXT_PERSISTENCE
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210708154835.528166-17-jason@jlekstrand.net
In order to prevent kernel doc warnings, also fill out docs for any
missing fields and fix those that forgot the "@".
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210708154835.528166-16-jason@jlekstrand.net
With the proto-context stuff added later in this series, we end up
having to duplicate set_priority. This lets us avoid duplicating the
validation logic.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210708154835.528166-15-jason@jlekstrand.net
As far as I can tell, the only real reason for this is to avoid taking a
reference to the i915_gem_context. The cost of those two atomics
probably pales in comparison to the cost of the ioctl itself so we're
really not buying ourselves anything here. We're about to make context
lookup a tiny bit more complicated, so let's get rid of the one hand-
rolled case.
Some usermode drivers such as our Vulkan driver call GET_RESET_STATS on
every execbuf so the perf here could theoretically be an issue. If this
ever does become a performance issue for any such userspace drivers,
they can use set CONTEXT_PARAM_RECOVERABLE to false and look for -EIO
coming from execbuf to check for hangs instead.
v2 (Daniel Vetter):
- Add a comment in the commit message about recoverable contexts
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210708154835.528166-14-jason@jlekstrand.net
This was only ever used for FENCE_SUBMIT automatic engine selection
which was removed in the previous commit.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210708154835.528166-12-jason@jlekstrand.net
Even though FENCE_SUBMIT is only documented to wait until the request in
the in-fence starts instead of waiting until it completes, it has a bit
more magic than that. If FENCE_SUBMIT is used to submit something to a
balanced engine, we would wait to assign engines until the primary
request was ready to start and then attempt to assign it to a different
engine than the primary. There is an IGT test (the bonded-slice subtest
of gem_exec_balancer) which exercises this by submitting a primary batch
to a specific VCS and then using FENCE_SUBMIT to submit a secondary
which can run on any VCS and have i915 figure out which VCS to run it on
such that they can run in parallel.
However, this functionality has never been used in the real world. The
media driver (the only user of FENCE_SUBMIT) always picks exactly two
physical engines to bond and never asks us to pick which to use.
v2 (Daniel Vetter):
- Mention the exact IGT test this breaks
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210708154835.528166-11-jason@jlekstrand.net
This adds a bunch of complexity which the media driver has never
actually used. The media driver does technically bond a balanced engine
to another engine but the balanced engine only has one engine in the
sibling set. This doesn't actually result in a virtual engine.
This functionality was originally added to handle cases where we may
have more than two video engines and media might want to load-balance
their bonded submits by, for instance, submitting to a balanced vcs0-1
as the primary and then vcs2-3 as the secondary. However, no such
hardware has shipped thus far and, if we ever want to enable such
use-cases in the future, we'll use the up-and-coming parallel submit API
which targets GuC submission.
This makes I915_CONTEXT_ENGINES_EXT_BOND a total no-op. We leave the
validation code in place in case we ever decide we want to do something
interesting with the bonding information.
v2 (Jason Ekstrand):
- Don't delete quite as much code.
v3 (Tvrtko Ursulin):
- Add some history to the commit message
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210708154835.528166-10-jason@jlekstrand.net
This has never been used by any userspace except IGT and provides no
real functionality beyond parroting back parameters userspace passed in
as part of context creation or via setparam. If the context is in
legacy mode (where you use I915_EXEC_RENDER and friends), it returns
success with zero data so it's not useful for discovering what engines
are in the context. It's also not a replacement for the recently
removed I915_CONTEXT_CLONE_ENGINES because it doesn't return any of the
balancing or bonding information.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210708154835.528166-9-jason@jlekstrand.net
This API is entirely unnecessary and I'd love to get rid of it. If
userspace wants a single timeline across multiple contexts, they can
either use implicit synchronization or a syncobj, both of which existed
at the time this feature landed. The justification given at the time
was that it would help GL drivers which are inherently single-timeline.
However, neither of our GL drivers actually wanted the feature. i965
was already in maintenance mode at the time and iris uses syncobj for
everything.
Unfortunately, as much as I'd love to get rid of it, it is used by the
media driver so we can't do that. We can, however, do the next-best
thing which is to embed a syncobj in the context and do exactly what
we'd expect from userspace internally. This isn't an entirely identical
implementation because it's no longer atomic if userspace races with
itself by calling execbuffer2 twice simultaneously from different
threads. It won't crash in that case; it just doesn't guarantee any
ordering between those two submits. It also means that sync files
exported from different engines on a SINGLE_TIMELINE context will have
different fence contexts. This is visible to userspace if it looks at
the obj_name field of sync_fence_info.
Moving SINGLE_TIMELINE to a syncobj emulation has a couple of technical
advantages beyond mere annoyance. One is that intel_timeline is no
longer an api-visible object and can remain entirely an implementation
detail. This may be advantageous as we make scheduler changes going
forward. Second is that, together with deleting the CLONE_CONTEXT API,
we should now have a 1:1 mapping between intel_context and
intel_timeline which may help us reduce locking.
v2 (Tvrtko Ursulin):
- Update the comment on i915_gem_context::syncobj to mention that it's
an emulation and the possible race if userspace calls execbuffer2
twice on the same context concurrently.
v2 (Jason Ekstrand):
- Wrap the checks for eb.gem_context->syncobj in unlikely()
- Drop the dma_fence reference
- Improved commit message
v3 (Jason Ekstrand):
- Move the dma_fence_put() to before the error exit
v4 (Tvrtko Ursulin):
- Add a comment about fence contexts to the commit message
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210708154835.528166-8-jason@jlekstrand.net
This API allows one context to grab bits out of another context upon
creation. It can be used as a short-cut for setparam(getparam()) for
things like I915_CONTEXT_PARAM_VM. However, it's never been used by any
real userspace. It's used by a few IGT tests and that's it. Since it
doesn't add any real value (most of the stuff you can CLONE you can copy
in other ways), drop it.
There is one thing that this API allows you to clone which you cannot
clone via getparam/setparam: timelines. However, timelines are an
implementation detail of i915 and not really something that needs to be
exposed to userspace. Also, sharing timelines between contexts isn't
obviously useful and supporting it has the potential to complicate i915
internally. It also doesn't add any functionality that the client can't
get in other ways. If a client really wants a shared timeline, they can
use a syncobj and set it as an in and out fence on every submit.
v2 (Jason Ekstrand):
- More detailed commit message
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210708154835.528166-7-jason@jlekstrand.net
Instead of handling it like a context param, unconditionally set it when
intel_contexts are created. For years we've had the idea of a watchdog
uAPI floating about. The aim was for media, so that they could set very
tight deadlines for their transcodes jobs, so that if you have a corrupt
bitstream (especially for decoding) you don't hang your desktop too
hard. But it's been stuck in limbo since forever, and this simplifies
things a bit in preparation for the proto-context work. If we decide to
actually make said uAPI a reality, we can do it through the proto-
context easily enough.
This does mean that we move from reading the request_timeout_ms param
once per engine when engines are created instead of once at context
creation. If someone changes request_timeout_ms between creating a
context and setting engines, it will mean that they get the new timeout.
If someone races setting request_timeout_ms and context creation, they
can theoretically end up with different timeouts. However, since both
of these are fairly harmless and require changing kernel params, we
don't care.
v2 (Tvrtko Ursulin):
- Add a comment about races with request_timeout_ms
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210708154835.528166-5-jason@jlekstrand.net
The idea behind this param is to support OpenCL drivers with relocations
because OpenCL reserves 0x0 for NULL and, if we placed memory there, it
would confuse CL kernels. It was originally sent out as part of a patch
series including libdrm [1] and Beignet [2] support. However, the
libdrm and Beignet patches never landed in their respective upstream
projects so this API has never been used. It's never been used in Mesa
or any other driver, either.
Dropping this API allows us to delete a small bit of code.
[1]: https://lists.freedesktop.org/archives/intel-gfx/2015-May/067030.html
[2]: https://lists.freedesktop.org/archives/intel-gfx/2015-May/067031.html
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210708154835.528166-4-jason@jlekstrand.net
Previously, we were storing the ring size in the ring pointer before it
was actually allocated. We would then guard setting the ring size on
checking for CONTEXT_ALLOC_BIT. This is error-prone at best and really
only saves us a few bytes on something that already burns at least 4K.
Instead, this patch adds a new ring_size field and makes everything use
that.
v2 (Daniel Vetter):
- Replace 512 * SZ_4K with SZ_2M
v2 (Jason Ekstrand):
- Rebase on top of page migration code
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210708154835.528166-3-jason@jlekstrand.net
This reverts commit 88be76cdafc7 ("drm/i915: Allow userspace to specify
ringsize on construction"). This API was originally added for OpenCL
but the compute-runtime PR has sat open for a year without action so we
can still pull it out if we want. I argue we should drop it for three
reasons:
1. If the compute-runtime PR has sat open for a year, this clearly
isn't that important.
2. It's a very leaky API. Ring size is an implementation detail of the
current execlist scheduler and really only makes sense there. It
can't apply to the older ring-buffer scheduler on pre-execlist
hardware because that's shared across all contexts and it won't
apply to the GuC scheduler that's in the pipeline.
3. Having userspace set a ring size in bytes is a bad solution to the
problem of having too small a ring. There is no way that userspace
has the information to know how to properly set the ring size so
it's just going to detect the feature and always set it to the
maximum of 512K. This is what the compute-runtime PR does. The
scheduler in i915, on the other hand, does have the information to
make an informed choice. It could detect if the ring size is a
problem and grow it itself. Or, if that's too hard, we could just
increase the default size from 16K to 32K or even 64K instead of
relying on userspace to do it.
Let's drop this API for now and, if someone decides they really care
about solving this problem, they can do it properly.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210708154835.528166-2-jason@jlekstrand.net
Add ADL-P to the list of supported GuC and HuC firmware versions. For
HuC, it reuses the existing TGL firmware file. For GuC, there is a
dedicated firmware release.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210626004522.1699509-3-John.C.Harrison@Intel.com
46 bit addressing enables you to use 4 bits to support some
MKTME features, and 3 more bits for Optane support that uses
a subset of MTKME for persistent memory.
But GTT addressing sticking to 39 bit addressing, thus setting
dma_mask_size to 39 fixes below tests :
igt@i915_selftest@live@mman
igt@kms_big_fb@linear-32bpp-rotate-0
igt@gem_create@create-clear
igt@gem_mmap_offset@clear
igt@gem_mmap_gtt@cpuset-big-copy
In a way solves Gitlab#3142
https://gitlab.freedesktop.org/drm/intel/-/issues/3142, which had
following errors :
DMAR: DRHD: handling fault status reg 2
DMAR: [DMA Write] Request device [00:02.0] PASID ffffffff fault addr
7effff9000 [fault reason 05] PTE Write access is not set
0x7effff9000 is suspiciously exactly 39 bits, so it seems likely that
the HW just ends up masking off those extra bits hence DMA errors.
Changes since V2 :
- dim checkpatch error solved
Changes since V1 :
- Added more details to commit message - Matthew Auld
Signed-off-by: Tejas Upadhyay <tejaskumarx.surendrakumar.upadhyay@intel.com>
Acked-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210708071222.955455-1-tejaskumarx.surendrakumar.upadhyay@intel.com
Commit c816723b6b8a ("drm/i915/gt: replace IS_GEN and friends with
GRAPHICS_VER") converted INTEL_GEN and friends to the new version check
macros. Meanwhile, some changes sneaked in to use INTEL_GEN. Remove the
last users so we can remove the macros.
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210707181325.2130821-2-lucas.demarchi@intel.com
If mock_region_create fails then mem will be an error pointer. Instead
we just need to use the correct ordering for the onion unwind.
igt_mock_reserve() error: 'mem' dereferencing possible ERR_PTR()
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Ramalingam C <ramalingam.c@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210702104642.1189978-2-matthew.auld@intel.com
The block here can't be NULL, especially since we already dereferenced
it earlier, so remove the redundant check.
igt_check_blocks() warn: variable dereferenced before check 'block' (see line 126)
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Ramalingam C <ramalingam.c@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210702104642.1189978-1-matthew.auld@intel.com
We're not consistently recommending these for developers only.
I stumbled over this due to DRM_I915_LOW_LEVEL_TRACEPOINTS, which was
added in
commit 354d036fcf70654cff2e2cbdda54a835d219b9d2
Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Date: Tue Feb 21 11:01:42 2017 +0000
drm/i915/tracepoints: Add request submit and execute tracepoints
to "alleviate the performance impact concerns."
Which is nonsense.
Tvrtko and Joonas pointed out on irc that the real (but undocumented
reason) was stable abi concerns for tracepoints, see
https://lwn.net/Articles/705270/
and the specific change that was blocked around tracepoints:
https://lwn.net/Articles/442113/
Anyway to make it a notch clearer why we have this Kconfig option
consistly add the "Recommended for driver developers only." to it and
all the other debug options we have.
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210702201708.2075793-1-daniel.vetter@ffwll.ch
The code in xcs_resume() probably didn't work as intended. It uses
struct drm_device.irq, which is allocated to 0, but never initialized
by i915 to the device's interrupt number.
Change all calls to synchronize_hardirq() to intel_synchronize_irq(),
which uses the correct interrupt. _hardirq() functions are not needed
in this context.
v5:
* go back to _hardirq() after PCI probe reported wrong
context; add rsp comment
v4:
* switch everything to intel_synchronize_irq() (Daniel)
v3:
* also use intel_synchronize_hardirq() at another callsite
v2:
* wrap irq code in intel_synchronize_hardirq() (Ville)
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Fixes: 536f77b1caa0 ("drm/i915/gt: Call stop_ring() from ring resume, again")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210701173618.10718-2-tzimmermann@suse.de
The min_page_size is only needed for pages inserted into the GTT, and
for our paging structures we only need at most 4K bytes, so simply
ignore the min_page_size restrictions here, otherwise we might see some
severe overallocation on some devices.
v2(Thomas): add some commentary
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210625103824.558481-2-matthew.auld@intel.com
For some specialised objects we might need something larger than the
regions min_page_size due to some hw restriction, and slightly more
hairy is needing something smaller with the guarantee that such objects
will never be inserted into any GTT, which is the case for the paging
structures.
This also fixes how we setup the BO page_alignment, if we later migrate
the object somewhere else. For example if the placements are {SMEM,
LMEM}, then we might get this wrong. Pushing the min_page_size behaviour
into the manager should fix this.
v2(Thomas): push the default page size behaviour into buddy_man, and let
the user override it with the page-alignment, which looks cleaner
v3: rebase on ttm sys changes
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210625103824.558481-1-matthew.auld@intel.com
Objects intended to be used as display framebuffers must reside in
LMEM for discrete. If they happen to not do that, migrate them to
LMEM before pinning.
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210629151203.209465-4-thomas.hellstrom@linux.intel.com
A selftest for the gem object migrate functionality. Slightly adapted
from the original by Matthew to the new interface and new fill blit
code.
v4:
- Initialize buffers and check contents after migration
(Suggested by Matthew Auld)
- Perform async migration (if implemented) in the igt_lmem_pages_migrate
test
- Test also migration to the current region.
Co-developed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> #v3
Link: https://patchwork.freedesktop.org/patch/msgid/20210629151203.209465-3-thomas.hellstrom@linux.intel.com
Introduce an interface to migrate objects between regions.
This is primarily intended to migrate objects to LMEM for display and
to SYSTEM for dma-buf, but might be reused in one form or another for
performance-based migration.
v2:
- Verify that the memory region given as an id really exists.
(Reported by Matthew Auld)
- Call i915_gem_object_{init,release}_memory_region() when switching region
to handle also switching region lists. (Reported by Matthew Auld)
v3:
- Fix i915_gem_object_can_migrate() to return true if object is already in
the correct region, even if the object ops doesn't have a migrate()
callback.
- Update typo in commit message.
- Fix kerneldoc of i915_gem_object_wait_migration().
v4:
- Improve documentation (Suggested by Mattew Auld and Michael Ruhl)
- Always assume TTM migration hits a TTM move and unsets the pages through
move_notify. (Reported by Matthew Auld)
- Add a dma_fence_might_wait() annotation to
i915_gem_object_wait_migration() (Suggested by Daniel Vetter)
v5:
- Re-add might_sleep() instead of __dma_fence_might_wait(), Sent
v4 with the wrong version, didn't compile and __dma_fence_might_wait()
is not exported.
- Added an R-B.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210629151203.209465-2-thomas.hellstrom@linux.intel.com
Add entry for i915 new parallel submission uAPI plan.
v2:
(Daniel Vetter):
- Expand logical order explaination
- Add dummy header
- Only allow N BBs in execbuf IOCTL
- Configure parallel submission per slot not per gem context
v3:
(Marcin Ślusarz):
- Lot's of typos / bad english fixed
(Tvrtko Ursulin):
- Consistent pseudo code, clean up wording in descriptions
v4:
(Daniel Vetter)
- Drop flags
- Add kernel doc
- Reword a few things / fix typos
(Tvrtko)
- Reword a few things / fix typos
v5:
(Checkpatch)
- Fix typos
(Docs)
- Fix warning
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Tony Ye <tony.ye@intel.com>
CC: Carl Zhang <carl.zhang@intel.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Tony Ye <tony.ye@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210629193511.124099-3-matthew.brost@intel.com
Add entry for i915 GuC submission / DRM scheduler integration plan.
Follow up patch with details of new parallel submission uAPI to come.
v2:
(Daniel Vetter)
- Expand explaination of why bonding isn't supported for GuC
submission
- CC some of the DRM scheduler maintainers
- Add priority inheritance / boosting use case
- Add reasoning for removing in order assumptions
(Daniel Stone)
- Add links to priority spec
v4:
(Tvrtko)
- Add TODOs section
(Daniel Vetter)
- Pull in 1 line from following patch
v5:
(Checkpatch)
- Fix typos
Cc: Christian König <christian.koenig@amd.com>
Cc: Luben Tuikov <luben.tuikov@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Steven Price <steven.price@arm.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Dave Airlie <airlied@gmail.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210629193511.124099-2-matthew.brost@intel.com
EHL and JSL are also observing requirement for 80ns interval for
CTX_TIMESTAMP thus extending it to GEN11.
Changes since V1:
- IS_GEN replaced by GRAPHICS_VER - Tvrtko
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tejas Upadhyay <tejaskumarx.surendrakumar.upadhyay@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210624112250.895410-1-tejaskumarx.surendrakumar.upadhyay@intel.com
Reinstate the mmap ioctl for all current integrated platforms.
The intention was really to have it disabled for discrete graphics
where we enforce a single mmap mode.
This was reported to break ADL-P with the media stack, which was not the
intention. Although longer term we do still plan to sunset this ioctl
even for integrated, in favour of using mmap_offset instead.
Fixes: 35cbd91eb541 ("drm/i915: Disable mmap ioctl for gen12+")
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210624112914.311984-1-thomas.hellstrom@linux.intel.com
For discrete, use TTM for both cached and WC system memory. That means
we currently rely on the TTM memory accounting / shrinker. For cached
system memory we should consider remaining shmem-backed, which can be
implemented from our ttm_tt_populate callback. We can then also reuse our
own very elaborate shrinker for that memory.
If an object is evicted to a gem allowable region, we will now consider
the object migrated, and we flip the gem region and move the object to a
different region list. Since we are now changing gem regions, we can't
any longer rely on the CONTIGUOUS flag being set based on the region
min page size, so remove that flag update. If we want to reintroduce it,
we need to put it in the mutable flags.
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210624084240.270219-4-thomas.hellstrom@linux.intel.com
After a TTM move or object init we need to update the i915 gem flags and
caching settings to reflect the new placement. Currently caching settings
are not changed during the lifetime of an object, although that might
change moving forward if we run into performance issues or issues with
WC system page allocations.
Also introduce gpu_binds_iomem() and cpu_maps_iomem() to clean up the
various ways we previously used to detect this.
Finally, initialize the TTM object reserved to be able to update
flags and caching before anyone else gets hold of the object.
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210624084240.270219-3-thomas.hellstrom@linux.intel.com
The object ops i915_GEM_OBJECT_HAS_IOMEM and the object
I915_BO_ALLOC_STRUCT_PAGE flags are considered immutable by
much of our code. Introduce a new mem_flags member to hold these
and make sure checks for these flags being set are either done
under the object lock or with pages properly pinned. The flags
will change during migration under the object lock.
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210624084240.270219-2-thomas.hellstrom@linux.intel.com
In
commit ebc0808fa2da0548a78e715858024cb81cd732bc
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Oct 18 13:02:51 2016 +0100
drm/i915: Restrict pagefault disabling to just around copy_from_user()
we entirely missed that there's a slow path call to eb_relocate_entry
(or i915_gem_execbuffer_relocate_entry as it was called back then)
which was left fully wrapped by pagefault_disable/enable() calls.
Previously any issues with blocking calls where handled by the
following code:
/* we can't wait for rendering with pagefaults disabled */
if (pagefault_disabled() && !object_is_idle(obj))
return -EFAULT;
Now at this point the prefaulting was still around, which means in
normal applications it was very hard to hit this bug. No idea why the
regressions in igts weren't caught.
Now this all changed big time with 2 patches merged closely together.
First
commit 2889caa9232109afc8881f29a2205abeb5709d0c
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Fri Jun 16 15:05:19 2017 +0100
drm/i915: Eliminate lots of iterations over the execobjects array
removes the prefaulting from the first relocation path, pushing it into
the first slowpath (of which this patch added a total of 3 escalation
levels). This would have really quickly uncovered the above bug, were
it not for immediate adding a duct-tape on top with
commit 7dd4f6729f9243bd7046c6f04c107a456bda38eb
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Fri Jun 16 15:05:24 2017 +0100
drm/i915: Async GPU relocation processing
by pushing all all the relocation patching to the gpu if the buffer
was busy, which avoided all the possible blocking calls.
The entire slowpath was then furthermore ditched in
commit 7dc8f1143778a35b190f9413f228b3cf28f67f8d
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Wed Mar 11 16:03:10 2020 +0000
drm/i915/gem: Drop relocation slowpath
and resurrected in
commit fd1500fcd4420eee06e2c7f3aa6067b78ac05871
Author: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Date: Wed Aug 19 16:08:43 2020 +0200
Revert "drm/i915/gem: Drop relocation slowpath".
but this did not further impact what's going on.
Since pagefault_disable/enable is an atomic section, any sleeping in
there is prohibited, and we definitely do that without gpu relocations
since we have to wait for the gpu usage to finish before we can patch
up the relocations.
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Link: https://patchwork.freedesktop.org/patch/msgid/20210618214503.1773805-1-daniel.vetter@ffwll.ch
A little bit of documentation covering the topics of engine discovery,
context engine maps and virtual engines. It is not very detailed but
supposed to be a starting point of giving a brief high level overview of
general principles and intended use cases.
v2:
* Have the text in uapi header and link from there.
v4:
* Link from driver-uapi.rst.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210618150036.2507653-1-tvrtko.ursulin@linux.intel.com
GuC ABI documentation is now ready to be included in i915.rst
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: Piotr Piórkowski <piotr.piorkowski@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210616001302.84233-4-matthew.brost@intel.com
Most of the changes to the 62.0.0 firmware revolved around CTB
communication channel. Conform to the new (stable) CTB protocol.
v2:
(Michal)
Add values back to kernel DOC for actions
(Docs)
Add 'CT buffer' back in to fix warning
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
[mattrope: Tweaked kerneldoc while pushing as suggested by Daniele/Michal]
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210616001302.84233-3-matthew.brost@intel.com
New GuC firmware will unify format of MMIO and CTB H2G messages.
Introduce their definitions now to allow gradual transition of
our code to match new changes.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210616001302.84233-2-matthew.brost@intel.com
The submission tasklet operates on i915_sched_engine, thus it is the
correct place for it.
v3:
(Jason Ekstrand)
Change sched_engine->engine to a void* private data pointer
Add kernel doc
v4:
(Daniele)
Update private_data comment
Set queue_priority_hint in kick_execlists
v5:
(CI)
Rebase and fix build error
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210618010638.98941-9-matthew.brost@intel.com
Rather passing around an intel_engine_cs in the scheduling code, pass
around a i915_sched_engine.
v3:
(Jason Ekstrand)
Add READ_ONCE around rq->engine in lock_sched_engine
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210618010638.98941-8-matthew.brost@intel.com