IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Execlists are indeed a brave new world with respect to workload
submission to the GPU.
In previous version of these series, I have tried to impact the
legacy ringbuffer submission path as little as possible (mostly,
passing the context around and using the correct ringbuffer when I
needed one) but Daniel is afraid (probably with a reason) that
these changes and, especially, future ones, will end up breaking
older gens.
This commit and some others coming next will try to limit the
damage by creating an alternative path for workload submission.
The first step is here: laying out a new ring init/fini.
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As suggested by Daniel Vetter. The idea, in subsequent patches, is to
provide an alternative to these vfuncs for the Execlists submission
mechanism.
v2: Splitted into two and reordered to illustrate our intentions, instead
of showing it off. Also, remove the add_request vfunc and added the
stop_ring one.
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet:
- Make checkpatch happy.
- Be grumpy about the excessive vtable.
- Ditch gt->is_ring_initialized.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The backing objects and ringbuffers for contexts created via open
fd are actually empty until the user starts sending execbuffers to
them. At that point, we allocate & populate them. We do this because,
at create time, we really don't know which engine is going to be used
with the context later on (and we don't want to waste memory on
objects that we might never use).
v2: As contexts created via ioctl can only be used with the render
ring, we have enough information to allocate & populate them right
away.
v3: Defer the creation always, even with ioctl-created contexts, as
requested by Daniel Vetter.
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
For the most part, logical ring context objects are similar to hardware
contexts in that the backing object is meant to be opaque. There are
some exceptions where we need to poke certain offsets of the object for
initialization, updating the tail pointer or updating the PDPs.
For our basic execlist implementation we'll only need our PPGTT PDs,
and ringbuffer addresses in order to set up the context. With previous
patches, we have both, so start prepping the context to be load.
Before running a context for the first time you must populate some
fields in the context object. These fields begin 1 PAGE + LRCA, ie. the
first page (in 0 based counting) of the context image. These same
fields will be read and written to as contexts are saved and restored
once the system is up and running.
Many of these fields are completely reused from previous global
registers: ringbuffer head/tail/control, context control matches some
previous MI_SET_CONTEXT flags, and page directories. There are other
fields which we don't touch which we may want in the future.
v2: CTX_LRI_HEADER_0 is MI_LOAD_REGISTER_IMM(14) for render and (11)
for other engines.
v3: Several rebases and general changes to the code.
v4: Squash with "Extract LR context object populating"
Also, Damien's review comments:
- Set the Force Posted bit on the LRI header, as the BSpec suggest we do.
- Prevent warning when compiling a 32-bits kernel without HIGHMEM64.
- Add a clarifying comment to the context population code.
v5: Damien's review comments:
- The third MI_LOAD_REGISTER_IMM in the context does not set Force Posted.
- Remove dead code.
v6: Add a note about the (presumed) differences between BDW and CHV state
contexts. Also, Brad's review comments:
- Use the _MASKED_BIT_ENABLE, upper_32_bits and lower_32_bits macros.
- Be less magical about how we set the ring size in the context.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1)
Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com> (v2)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Any given ringbuffer is unequivocally tied to one context and one engine.
By setting the appropriate pointers to them, the ringbuffer struct holds
all the infromation you might need to submit a workload for processing,
Execlists style.
v2: Drop ring->ctx since that looks terribly ill-defined for legacy
ringbuffer submission.
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v1)
Acked-by: Damien Lespiau <damien.lespiau@intel.com> (v2)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As we have said a couple of times by now, logical ring contexts have
their own ringbuffers: not only the backing pages, but the whole
management struct.
In a previous version of the series, this was achieved with two separate
patches:
drm/i915/bdw: Allocate ringbuffer backing objects for default global LRC
drm/i915/bdw: Allocate ringbuffer for user-created LRCs
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Now that we have the ability to allocate our own context backing objects
and we have multiplexed one of them per engine inside the context structs,
we can finally allocate and free them correctly.
Regarding the context size, reading the register to calculate the sizes
can work, I think, however the docs are very clear about the actual
context sizes on GEN8, so just hardcode that and use it.
v2: Rebased on top of the Full PPGTT series. It is important to notice
that at this point we have one global default context per engine, all
of them using the aliasing PPGTT (as opposed to the single global
default context we have with legacy HW contexts).
v3:
- Go back to one single global default context, this time with multiple
backing objects inside.
- Use different context sizes for non-render engines, as suggested by
Damien (still hardcoded, since the information about the context size
registers in the BSpec is, well, *lacking*).
- Render ctx size is 20 (or 19) pages, but not 21 (caught by Damien).
- Move default context backing object creation to intel_init_ring (so
that we don't waste memory in rings that might not get initialized).
v4:
- Reuse the HW legacy context init/fini.
- Create a separate free function.
- Rename the functions with an intel_ preffix.
v5: Several rebases to account for the changes in the previous patches.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
A context backing object only makes sense for a given engine (because
it holds state data specific to that engine).
In legacy ringbuffer sumission mode, the only MI_SET_CONTEXT we really
perform is for the render engine, so one backing object is all we nee.
With Execlists, however, we need backing objects for every engine, as
contexts become the only way to submit workloads to the GPU. To tackle
this problem, we multiplex the context struct to contain <no-of-engines>
objects.
Originally, I colored this code by instantiating one new context for
every engine I wanted to use, but this change suggested by Brad Volkin
makes it more elegant.
v2: Leave the old backing object pointer behind. Daniel Vetter suggested
using a union, but it makes more sense to keep rcs_state as a NULL
pointer behind, to make sure no one uses it incorrectly when Execlists
are enabled, similar to what he suggested for ring->buffer (Rusty's API
level 5).
v3: Use the name "state" instead of the too-generic "obj", so that it
mirrors the name choice for the legacy rcs_state.
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
For the moment this is just a placeholder, but it shows one of the
main differences between the good ol' HW contexts and the shiny
new Logical Ring Contexts: LR contexts allocate and free their
own backing objects. Another difference is that the allocation is
deferred (as the create function name suggests), but that does not
happen in this patch yet, because for the moment we are only dealing
with the default context.
Early in the series we had our own gen8_gem_context_init/fini
functions, but the truth is they now look almost the same as the
legacy hw context init/fini functions. We can always split them
later if this ceases to be the case.
Also, we do not fall back to legacy ringbuffers when logical ring
context initialization fails (not very likely to happen and, even
if it does, hw contexts would probably fail as well).
v2: Daniel says "explain, do not showcase".
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: s/BUG_ON/WARN_ON/.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Depending upon one module option to be sanitized (through USES_PPGTT)
for the other is a bit too fragile for my taste. At least WARN about
this.
Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: Damien Lespiau <damien.lespiau@intel.com>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
GEN8 brings an expansion of the HW contexts: "Logical Ring Contexts".
These expanded contexts enable a number of new abilities, especially
"Execlists".
The macro is defined to off until we have things in place to hope to
work.
v2: Rename "advanced contexts" to the more correct "logical ring
contexts".
v3: Add a module parameter to enable execlists. Execlist are relatively
new, and so it'd be wise to be able to switch back to ring submission
to debug subtle problems that will inevitably arise.
v4: Add an intel_enable_execlists function.
v5: Sanitize early, as suggested by Daniel. Remove lrc_enabled.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1)
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com> (v3)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v2, v4 & v5)
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Some legacy HW context code assumptions don't make sense for this new
submission method, so we will place this stuff in a separate file.
Note for reviewers: I've carefully considered the best name for this file
and this was my best option (other possibilities were intel_lr_context.c
or intel_execlist.c). I am open to a certain bikeshedding on this matter,
anyway.
And some point in time, it would be a good idea to split intel_lrc.c/.h
even further, but for the moment just shove everything together.
v2: Change to intel_lrc.c
v3: Squash together with the header file addition
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As usual in both a crtc index and a struct drm_crtc * version.
The function assumes that no one drivers their display below 10Hz, and
it will complain if the vblank wait takes longer than that.
v2: Also check dev->max_vblank_counter since some drivers register a
fake get_vblank_counter function.
v3: Use drm_vblank_count instead of calling the low-level
->get_vblank_counter callback. That way we'll get the sw-cooked
counter for platforms without proper vblank support and so can ditch
the max_vblank_counter check again.
v4: Review from Michel Dänzer:
- Restore lost notes about v3:
- Spelling in kerneldoc.
- Inline wait_event condition.
- s/vblank_wait/wait_one_vblank/
Cc: Michel Dänzer <michel@daenzer.net>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
In general having this can't hurt, and the atomic helpers will need
it to be able to reset the state objects properly. The overall idea
is to reset in the order pixels flow, so planes -> crtcs ->
encoders -> connectors.
v2: Squash in fixup from Ville to correctly deference struct drm_plane
instead of drm_crtc when walking the plane list. Fixes an oops in
driver init and resume.
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Even though we should not try to use 4+GiB GTTs on 32-bit systems, by
using a local variable we can future proof the code whilst making it
easier to read.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Appease checkpatch a bit.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Part of the pre-validation for an execbuffer call is that there is at
least one object in the execlist. As we bail if we fail to lookup any
object, we can be sure that after the eb_lookup_vma() there is at least
one object in the vma list and so we do not need to assert.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We have an implementation requirement that precludes the user from
requesting a ggtt entry when the device is operating in ppgtt mode. Move
the current check from inside the execbuffer object collation to the
prevalidation phase.
v2: Roll both invalid flags checks into one
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Based upon a hunk from a patch from Chris Wilson, but augmented to:
- Process the batch in the full ppgtt vm so that self-relocations
match again with userspace's expectations..
- Add a comment why plain pin for the global gtt binding is safe at
that point.
v2: Drop local bind_vm variable (Chris).
v3: Explain why this works despite the lack of proper active tracking
for the ggtt batch vma.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Adapt the macro so that we can pass either the struct drm_device or the
struct drm_i915_private pointers and get the answer we want. Over time,
my plan is to convert all users over to using drm_i915_private and so
trimming down the pointer dance. Having spent a few hours chasing that
goal and achieved over 8k of object code saving, it appears to be a
worthwhile target. This interim macro allows us to slowly convert over.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Drop the (struct drm_device *) cast per the m-l discussion.
Also explain the seemingly unecessary first cast.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
During ring initialisation, sometimes we observe, though not in
production hardware, that the idle flag is not set even though the ring
is empty. Double check before giving up.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Damien Lespiau <damien.lespiau@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This is so that we can make the drm_i915_private->info always the
preferred source for chipset type and feature queries.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This migrates the fence tracking onto the existing seqno
infrastructure so that the later conversion to tracking via requests is
simplified.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Move the decision on whether we need to have a mappable object during
execbuffer to the fore and then reuse that decision by propagating the
flag through to reservation. As a corollary, before doing the actual
relocation through the GTT, we can make sure that we do have a GTT
mapping through which to operate.
Note that the key to make this work is to ditch the
obj->map_and_fenceable unbind optimization - with full ppgtt it
doesn't make a lot of sense any more anyway.
v2: Revamp and resend to ease future patches.
v3: Refresh patch rationale
References: https://bugs.freedesktop.org/show_bug.cgi?id=81094
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
[danvet: Explain why obj->map_and_fenceable is key and split out the
secure batch fix.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
If an object is not bound into the global GTT, then it cannot be
accessed via the GTT. This restores the original code that was muddled
by ppGTT. In the process, we remove a WARN that had long outlived its
usefulness and was simply being coded around instead.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
I keep telling myself that those tables aren't great because their size
is the number of dwords we need to program and not the number of entries
(number of dwords = number of entries * 2).
And... I got it wrong when I refactored the code. Fortunately, it was
only wrong when the VBT table (or the code parsing it) is itself
erroneous. Long story short, it shouldn't matter, but still, there's a
potential array overflow and random programming of the DDI translation
tables.
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Removing the check for HAS_PCH_SPLIT, it looks redundant here. Anyways all the
platforms are checked separately.
v2: Reordering as per the gen (Ville)
Signed-off-by: Sonika Jindal <sonika.jindal@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Pull nouveau drm updates from Ben Skeggs:
"Apologies for not getting this done in time for Dave's drm-next merge
window. As he mentioned, a pre-existing bug reared its head a lot
more obviously after this lot of changes. It took quite a bit of time
to track it down. In any case, Dave suggested I try my luck by
sending directly to you this time.
Overview:
- more code for Tegra GK20A from NVIDIA - probing, reclockig
- better fix for Kepler GPUs that have the graphics engine powered
off on startup, method courtesy of info provided by NVIDIA
- unhardcoding of a bunch of graphics engine setup on
Fermi/Kepler/Maxwell, will hopefully solve some issues people have
noticed on higher-end models
- support for "Zero Bandwidth Clear" on Fermi/Kepler/Maxwell, needs
userspace support in general, but some lucky apps will benefit
automagically
- reviewed/exposed the full object APIs to userspace (finally), gives
it access to perfctrs, ZBC controls, various events. More to come
in the future.
- various other fixes"
Acked-by: Dave Airlie <airlied@redhat.com>
* 'linux-3.17' of git://anongit.freedesktop.org/git/nouveau/linux-2.6: (87 commits)
drm/nouveau: expose the full object/event interfaces to userspace
drm/nouveau: fix headless mode
drm/nouveau: hide sysfs pstate file behind an option again
drm/nv50/disp: shhh compiler
drm/gf100-/gr: implement the proper SetShaderExceptions method
drm/gf100-/gr: remove some broken ltc bashing, for now
drm/gf100-/gr: unhardcode attribute cb config
drm/gf100-/gr: fetch tpcs-per-ppc info on startup
drm/gf100-/gr: unhardcode pagepool config
drm/gf100-/gr: unhardcode bundle cb config
drm/gf100-/gr: improve initial context patch list helpers
drm/gf100-/gr: add support for zero bandwidth clear
drm/nouveau/ltc: add zbc drivers
drm/nouveau/ltc: s/ltcg/ltc/ + cleanup
drm/nouveau: use ram info from nvif_device
drm/nouveau/disp: implement nvif event sources for vblank/connector notifiers
drm/nouveau/disp: allow user direct access to channel control registers
drm/nouveau/disp: audit and version display classes
drm/nouveau/disp: audit and version SCANOUTPOS method
drm/nv50-/disp: audit and version PIOR_PWR method
...
No-one has yet had time to move this to debugfs as discussed during
the last merge window. Until this happens, hide the option to make
it clear it's not going to be here forever.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
We have another version of it implemented in SW, however, that version
isn't serialised with normal PGRAPH operation and can possibly clobber
the enables for another context.
This is the same method that's implemented by the NVIDIA binary driver.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
... and hope that the defaults are good enough. This was always
supposed to be a read/modify/write thing anyway, so we're writing
very wrong stuff for some boards already.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Should be the same values as before, except:
GF117 has smaller buffer allocated, as per register setup.
GK20A now uses values from Tegra driver, not GK104's.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Removes need for fixed buffer indices, and allows the functions
utilising them to also be run outside of context generation.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Default ZBC table is compatible with binary driver defaults.
Userspace will need to be updated to take full advantage of this
feature, however, some applications will see a performance boost
without updated drivers.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
The full object interfaces are about to be exposed to userspace, so we
need to check for any security-related issues and version the structs
to make it easier to handle any changes we may need in the future.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
The full object interfaces are about to be exposed to userspace, so we
need to check for any security-related issues and version the structs
to make it easier to handle any changes we may need in the future.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
The full object interfaces are about to be exposed to userspace, so we
need to check for any security-related issues and version the structs
to make it easier to handle any changes we may need in the future.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
The full object interfaces are about to be exposed to userspace, so we
need to check for any security-related issues and version the structs
to make it easier to handle any changes we may need in the future.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>