Commit Graph

1518 Commits

Author SHA1 Message Date
695dc55b57 drm/i915/tgl: Fix Media power gate sequence.
Some media power gates are disabled by default. commit 5d86923060
("drm/i915/tgl: Enable VD HCP/MFX sub-pipe power gating")
tried to enable it, but it duplicated an existent register.
So, the main PG setup sequences ended up overwriting it.

So, let's now merge this to the main PG setup sequence.

v2: (Chris): s/BIT/REG_BIT, remove useless comment,
    	     remove useless =0, use the right gt,
	     remove rc6 sequence doubt from commit message.

Fixes: 5d86923060 ("drm/i915/tgl: Enable VD HCP/MFX sub-pipe power gating")
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: stable@vger.kernel.org#v5.5+
Cc: Dale B Stimson <dale.b.stimson@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20201111072859.1186070-1-rodrigo.vivi@intel.com
2020-11-11 15:07:10 +00:00
512bce50a4 Merge v5.10-rc3 into drm-next
We need commit f8f6ae5d07 ("mm: always have io_remap_pfn_range() set
pgprot_decrypted()") to be able to merge Jason's cleanup patch.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2020-11-10 14:36:36 +01:00
bda3002485 drm/i915: Improve record of hung engines in error state
Between events which trigger engine and GPU resets and capturing the error
state we lose information on which engine triggered the reset. Improve
this by passing in the hung engine mask down to error capture.

Result is that the list of engines in user visible "GPU HANG: ecode
<gen>:<engines>:<ecode>, <process>" is now a list of hanging and not just
active engines. Most importantly the displayed process is now the one
which was actually hung.

v2:
 * Stub prototype. (Chris)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20201104134743.916027-1-tvrtko.ursulin@linux.intel.com
2020-11-09 11:59:43 +00:00
ad18fa0f5f drm/i915: Correctly set SFC capability for video engines
SFC capability of video engines is not set correctly because i915
is testing for incorrect bits.

Fixes: c5d3e39caa ("drm/i915: Engine discovery query")
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <stable@vger.kernel.org> # v5.3+
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20201106011842.36203-1-daniele.ceraolospurio@intel.com
2020-11-06 10:04:15 +00:00
e047c7be17 Merge tag 'drm-intel-next-queued-2020-11-03' of git://anongit.freedesktop.org/drm/drm-intel into drm-next
drm/i915 features for v5.11

Highlights:
- More DG1 enabling (Lucas, Matt, Aditya, Anshuman, Clinton, Matt, Stuart, Venkata)
- Integer scaling filter support (Pankaj Bharadiya)
- Asynchronous flip support (Karthik)

Generic:
- Fix gen12 forcewake tables (Matt)
- Haswell PCI ID updates (Alexei Podtelezhnikov)

Display:
- ICL+ DSI command mode enabling (Vandita)
- Shutdown displays grafecully on reboot/shutdown (Ville)
- Don't register display debugfs when there is no display (Lucas)
- Fix RKL CDCLK table (Matt)
- Limit EHL/JSL eDP to HBR2 (José)
- Handle incorrectly set (by BIOS) PLLs and DP link rates at probe (Imre)
- Fix mode valid check wrt bpp for "YCbCr 4:2:0 only" modes (Ville)
- State checker and dump fixes (Ville)
- DP AUX backlight updates (Aaron Ma, Sean Paul)
- Add DP LTTPR non-transparent link training mode (Imre)
- PSR2 selective fetch enabling (José)
- VBT updates (José)
- HDCP updates (Ramalingam)

Cleanups and refactoring:
- HPD pin, AUX channel, and Type-C port identifier cleanup (Ville)
- Hotplug and irq refactoring (Ville)
- Better DDI encoder and AUX channel names (Ville)
- Color LUT code cleanups (Ville)
- Combo PHY code cleanups (Ville)
- LSPCON code cleanups (Ville)
- Documentation fixes (Mauro, Chris)

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/87o8kehbaj.fsf@intel.com
2020-11-04 12:17:34 +10:00
e67d01d849 drm/i915/gt: Flush xcs before tgl breadcrumbs
In a simple test case that writes to scratch and then busy-waits for the
batch to be signaled, we observe that the signal is before the write is
posted. That is bad news.

Splitting the flush + write_dword into two separate flush_dw prevents
the issue from being reproduced, we can presume the post-sync op is not
so post-sync.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/216
Testcase: igt/gem_exec_fence/parallel
Testcase: igt/i915_selftest/live/gt_timelines
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: stable@vger.kernel.org
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201102221057.29626-2-chris@chris-wilson.co.uk
(cherry picked from commit 09212e81e5)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2020-11-03 19:22:42 -05:00
306bb61d6b drm/i915/gt: Expose more parameters for emitting writes into the ring
Add another lower level to emit_ggtt_write so that the GGTT nature of
the write is not hardcoded into the emitter.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201102221057.29626-1-chris@chris-wilson.co.uk
(cherry picked from commit 2739d8cfc5)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2020-11-03 19:21:42 -05:00
8ce70996f7 drm/i915/gt: Use the local HWSP offset during submission
We wrap the timeline on construction of the next request, but there may
still be requests in flight that have not yet finalized the breadcrumb.
(The breadcrumb is delayed as we need engine-local offsets, and for the
virtual engine that is not known until execution.) As such, by the time
we write to the timeline's HWSP offset it may have changed, and we
should use the value we preserved in the request instead.

Though the window is small and infrequent (at full flow we can expect a
timeline's seqno to wrap once every 30 minutes), the impact of writing
the old seqno into the new HWSP is severe: the old requests are never
completed, and the new requests are completed before they are even
submitted.

Fixes: ebece75392 ("drm/i915: Keep timeline HWSP allocated until idle across the system")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.2+
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201022064127.10159-1-chris@chris-wilson.co.uk
(cherry picked from commit c10f6019d0)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2020-11-03 19:14:08 -05:00
09212e81e5 drm/i915/gt: Flush xcs before tgl breadcrumbs
In a simple test case that writes to scratch and then busy-waits for the
batch to be signaled, we observe that the signal is before the write is
posted. That is bad news.

Splitting the flush + write_dword into two separate flush_dw prevents
the issue from being reproduced, we can presume the post-sync op is not
so post-sync.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/216
Testcase: igt/gem_exec_fence/parallel
Testcase: igt/i915_selftest/live/gt_timelines
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: stable@vger.kernel.org
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201102221057.29626-2-chris@chris-wilson.co.uk
2020-11-03 14:29:10 +00:00
2739d8cfc5 drm/i915/gt: Expose more parameters for emitting writes into the ring
Add another lower level to emit_ggtt_write so that the GGTT nature of
the write is not hardcoded into the emitter.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201102221057.29626-1-chris@chris-wilson.co.uk
2020-11-03 14:16:43 +00:00
0f41e31a7b drm/i915/guc: Clear pointers on free
Clear out some pointers when objects have been de-allocated. This
makes it much easier to track down use-after-free type issues.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201028145826.2949180-4-John.C.Harrison@Intel.com
2020-10-29 13:46:28 +02:00
164e57ca15 drm/i915/guc: Improved reporting when GuC fails to load
Rather than just saying 'GuC failed to load: -110', actually print out
the GuC status register and break it down into the individual fields.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201028145826.2949180-3-John.C.Harrison@Intel.com
2020-10-29 13:46:27 +02:00
c784e5249e drm/i915/guc: Update to use firmware v49.0.1
The latest GuC firmware includes a number of interface changes that
require driver updates to match.

* Starting from Gen11, the ID to be provided to GuC needs to contain
  the engine class in bits [0..2] and the instance in bits [3..6].

  NOTE: this patch breaks pointer dereferences in some existing GuC
  functions that use the guc_id to dereference arrays but these functions
  are not used for now as we have GuC submission disabled and we will
  update these functions in follow up patch which requires new IDs.

* The new GuC requires the additional data structure (ADS) and associated
  'private_data' pointer to be setup. This is basically a scratch area
  of memory that the GuC owns. The size is read from the CSS header.

* There is now a physical to logical engine mapping table in the ADS
  which needs to be configured in order for the firmware to load. For
  now, the table is initialised with a 1 to 1 mapping.

* GUC_CTL_CTXINFO has been removed from the initialization params.

* reg_state_buffer is maintained internally by the GuC as part of
  the private data.

* The ADS layout has changed significantly. This patch updates the
  shared structure and also adds better documentation of the layout.

* While i915 does not use GuC doorbells, the firmware now requires
  that some initialisation is done.

* The number of engine classes and instances supported in the ADS has
  been increased.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Cc: Tomasz Lis <tomasz.lis@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201028145826.2949180-2-John.C.Harrison@Intel.com
2020-10-29 13:46:26 +02:00
fc03b2d6a9 Merge tag 'drm-next-2020-10-23' of git://anongit.freedesktop.org/drm/drm
Pull more drm fixes from Dave Airlie:
 "This should be the last round of things for rc1, a bunch of i915
  fixes, some amdgpu, more font OOB fixes and one ttm fix just found
  reading code:

  fbcon/fonts:
   - Two patches to prevent OOB access

  ttm:
   - fix for evicition value range check

  amdgpu:
   - Sienna Cichlid fixes
   - MST manager resource leak fix
   - GPU reset fix

  amdkfd:
   - Luxmark fix for Navi1x

  i915:
   - Tweak initial DPCD backlight.enabled value (Sean)
   - Initialize reserved MOCS indices (Ayaz)
   - Mark initial fb obj as WT on eLLC machines to avoid rcu lockup (Ville)
   - Support parsing of oversize batches (Chris)
   - Delay execlists processing for TGL (Chris)
   - Use the active reference on the vma during error capture (Chris)
   - Widen CSB pointer (Chris)
   - Wait for CSB entries on TGL (Chris)
   - Fix unwind for scratch page allocation (Chris)
   - Exclude low patches of stolen memory (Chris)
   - Force VT'd workarounds when running as a guest OS (Chris)
   - Drop runtime-pm assert from vpgu io accessors (Chris)"

* tag 'drm-next-2020-10-23' of git://anongit.freedesktop.org/drm/drm: (31 commits)
  drm/amdgpu: correct the cu and rb info for sienna cichlid
  drm/amd/pm: remove the average clock value in sysfs
  drm/amd/pm: fix pp_dpm_fclk
  Revert drm/amdgpu: disable sienna chichlid UMC RAS
  drm/amd/pm: fix pcie information for sienna cichlid
  drm/amdkfd: Use same SQ prefetch setting as amdgpu
  drm/amd/swsmu: correct wrong feature bit mapping
  drm/amd/psp: Fix sysfs: cannot create duplicate filename
  drm/amd/display: Avoid MST manager resource leak.
  drm/amd/display: Revert "drm/amd/display: Fix a list corruption"
  drm/amdgpu: update golden setting for sienna_cichlid
  drm/amd/swsmu: add missing feature map for sienna_cichlid
  drm/amdgpu: correct the gpu reset handling for job != NULL case
  drm/amdgpu: add rlc iram and dram firmware support
  drm/amdgpu: add function to program pbb mode for sienna cichlid
  drm/i915: Drop runtime-pm assert from vgpu io accessors
  drm/i915: Force VT'd workarounds when running as a guest OS
  drm/i915: Exclude low pages (128KiB) of stolen from use
  drm/i915/gt: Onion unwind for scratch page allocation failure
  drm/ttm: fix eviction valuable range check.
  ...
2020-10-23 13:56:34 -07:00
6e7a21e7ab drm/i915/selftests: Exercise intel_timeline_read_hwsp()
intel_timeline_read_hwsp() is used to support semaphore waits between
engines, that may themselves be deferred for arbitrary periods -- that
is the read of the target request's HWSP is at an indeterminant point in
the future. To support this, we need to prevent overwriting a HWSP that
is being watched across a seqno wrap (otherwise the next request will
write its value into the old HWSP preventing the watcher from making
progress, ad infinitum.) To simulate the observer across a wrap, let's
create a request that reads from the HWSP and dispatch it at different
points around a wrap to see if the value is lost.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201021220411.5777-2-chris@chris-wilson.co.uk
2020-10-23 13:35:39 +01:00
c10f6019d0 drm/i915/gt: Use the local HWSP offset during submission
We wrap the timeline on construction of the next request, but there may
still be requests in flight that have not yet finalized the breadcrumb.
(The breadcrumb is delayed as we need engine-local offsets, and for the
virtual engine that is not known until execution.) As such, by the time
we write to the timeline's HWSP offset it may have changed, and we
should use the value we preserved in the request instead.

Though the window is small and infrequent (at full flow we can expect a
timeline's seqno to wrap once every 30 minutes), the impact of writing
the old seqno into the new HWSP is severe: the old requests are never
completed, and the new requests are completed before they are even
submitted.

Fixes: ebece75392 ("drm/i915: Keep timeline HWSP allocated until idle across the system")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.2+
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201022064127.10159-1-chris@chris-wilson.co.uk
2020-10-23 13:35:38 +01:00
b1cff58578 drm/i915/selftests: Skip RPS tests on Ironlake (only IPS)
Since Ironlake uses intel_ips.ko for its dynamic frequency adjustment,
we do not have direct control over the frequency management so such
tests are defunct. Similarly, we can't check the gen6+ RPS registers on
Ironlake.

Hopefully this catches all the invalid tests now that Ironlake has
rejoined the dynamic GPU frequency club. There is an opportunity for the
reader to add tests to exercise MEMINTRSTS and co.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201022210814.23004-1-chris@chris-wilson.co.uk
2020-10-23 09:57:54 +01:00
c6073d4c92 drm/i915: Clean up the irq enable/disable for ilk rps
Let's unmask the PCU event irq _after_ we've set up the
hardware and software to deal with the fallout. We can
also drop the PCU event bit from DEIER except when we
need it for rps.

And on the disable side we replace the hand rolled (and
unlocked) DEIER/IIR/IMR frobbing with ilk_disable_display_irq().
Ocd does require me to reorder it to be symmetric with
the enable path however.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201021131443.25616-5-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2020-10-21 23:21:45 +03:00
d08c4e2327 drm/i915: Fix potential overflows in ilk ips calculations
A bunch of the ips calculations require 64bit math. In particular
'corr' and 'corr2' look like they can overflow on 32bit systems.
Switch to explicit u64 for those.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201021131443.25616-3-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2020-10-21 23:20:19 +03:00
e82351e74d drm/i915: Read actual GPU frequency from MEMSTAT_ILK on ILK
There is no GEN6_RPSTAT1 on ILK. Instead of reading that let's
try to get the same information from MEMSTAT_ILK. At least it
seems to track MEMSWCTL frequency request perfectly on my ILK.
It needs the same invert trick as the request value.

We don't want to put the invert thing into intel_gpu_freq()
and intel_freq_opcode() because that would incorrectly invert
the min/max/etc frequencies also.

One day someone might want to reverse engineer the formula for
converting these numbers to Hz, but for now we'll just report
them raw.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201021131443.25616-2-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2020-10-21 23:19:36 +03:00
8f2b4b684a drm/i915/selftests: Flush the old heartbeat more gently
In order to test how fast the heartbeat can respond, we measure with the
interval set to its minimum. Before we measure though, we want to be
sure we start with a fresh pulse, and so wait until any old one is
complete. During that wait though, we were continually flushing the
work, and so continually re-evaluating to see if the pulse was complete,
and each attempt would count as an unresponsive system. If the engine
did not complete the request in the couple of busy-spins, it would flag
an error. This is unfortunate, so let's not busy-spin waiting for the
old heartbeat, but terminate it and start afresh.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201019142841.32273-1-chris@chris-wilson.co.uk
2020-10-21 19:20:24 +01:00
3da3c5c1c9 drm/i915: Exclude low pages (128KiB) of stolen from use
The GPU is trashing the low pages of its reserved memory upon reset. If
we are using this memory for ringbuffers, then we will dutiful resubmit
the trashed rings after the reset causing further resets, and worse. We
must exclude this range from our own use. The value of 128KiB was found
by empirical measurement (and verified now with a selftest) on gen9.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201019165005.18128-2-chris@chris-wilson.co.uk
(cherry picked from commit d3606757e6)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2020-10-21 08:32:28 -04:00
b8cff311a4 drm/i915/gt: Onion unwind for scratch page allocation failure
In switching to using objects for our ppGTT scratch pages, care was not
taken to avoid trying to unref NULL objects on failure. And for gen6
ppGTT, it appears we forgot entirely to unwind after a partial allocation
failure.

Fixes: 89351925a4 ("drm/i915/gt: Switch to object allocations for page directories")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201019083444.1286-1-chris@chris-wilson.co.uk
(cherry picked from commit fa812ce96a)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2020-10-21 08:32:25 -04:00
d3606757e6 drm/i915: Exclude low pages (128KiB) of stolen from use
The GPU is trashing the low pages of its reserved memory upon reset. If
we are using this memory for ringbuffers, then we will dutiful resubmit
the trashed rings after the reset causing further resets, and worse. We
must exclude this range from our own use. The value of 128KiB was found
by empirical measurement (and verified now with a selftest) on gen9.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201019165005.18128-2-chris@chris-wilson.co.uk
2020-10-20 10:16:54 +01:00
4a9bb58aba drm/i915/gt: Wait for CSB entries on Tigerlake
On Tigerlake, we are seeing a repeat of commit d8f5053117 ("drm/i915/icl:
Forcibly evict stale csb entries") where, presumably, due to a missing
Global Observation Point synchronisation, the write pointer of the CSB
ringbuffer is updated _prior_ to the contents of the ringbuffer. That is
we see the GPU report more context-switch entries for us to parse, but
those entries have not been written, leading us to process stale events,
and eventually report a hung GPU.

However, this effect appears to be much more severe than we previously
saw on Icelake (though it might be best if we try the same approach
there as well and measure), and Bruce suggested the good idea of resetting
the CSB entry after use so that we can detect when it has been updated by
the GPU. By instrumenting how long that may be, we can set a reliable
upper bound for how long we should wait for:

    513 late, avg of 61 retries (590 ns), max of 1061 retries (10099 ns)

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2045
References: d8f5053117 ("drm/i915/icl: Forcibly evict stale csb entries")
References: HSDES#22011327657, HSDES#1508287568
Suggested-by: Bruce Chang <yu.bruce.chang@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Bruce Chang <yu.bruce.chang@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: stable@vger.kernel.org # v5.4
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200915134923.30088-2-chris@chris-wilson.co.uk
(cherry picked from commit 233c1ae3c8)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2020-10-19 14:32:31 -04:00
ca05277e40 drm/i915/gt: Widen CSB pointer to u64 for the parsers
A CSB entry is 64b, and it is simpler for us to treat it as an array of
64b entries than as an array of pairs of 32b entries.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200915134923.30088-1-chris@chris-wilson.co.uk
(cherry picked from commit f24a44e52f)
(cherry picked from commit 3d4dbe0e0f0d04ebcea917b7279586817da8cf46)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2020-10-19 14:31:59 -04:00
64402570e1 drm/i915/gt: Undo forced context restores after trivial preemptions
We may try to preempt the currently executing request, only to find that
after unravelling all the dependencies that the original executing
context is still the earliest in the topological sort and re-submitted
back to HW (if we do detect some change in the ELSP that requires
re-submission). However, due to the way we check for wrap-around during
the unravelling, we mark any context that has been submitted just once
(i.e. with the rq->wa_tail set, but the ring->tail earlier) as
potentially wrapping and requiring a forced restore on resubmission.
This was expected to be not a problem, as it was anticipated that most
unwinding for preemption would result in a context switch and the few
that did not would be lost in the noise. It did not take long for
someone to find one particular workload where the cost of those extra
context restores was measurable.

However, since we know the wa_tail is of fixed size, and we know that a
request must be larger than the wa_tail itself, we can safely maintain
the check for request wrapping and check against a slightly future point
in the ring that includes an expected wa_tail. (That is if the
ring->tail is already set to rq->wa_tail, including another 8 bytes in
the check does not invalidate the incremental wrap detection.)

Fixes: 8ab3a3812a ("drm/i915/gt: Incrementally check for rewinding")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Bruce Chang <yu.bruce.chang@intel.com>
Cc: Ramalingam C <ramalingam.c@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.4+
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201002083425.4605-1-chris@chris-wilson.co.uk
(cherry picked from commit bb65548e3c)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2020-10-19 13:29:55 -04:00
9b99e5ba3e drm/i915/gt: Delay execlist processing for tgl
When running gem_exec_nop, it floods the system with many requests (with
the goal of userspace submitting faster than the HW can process a single
empty batch). This causes the driver to continually resubmit new
requests onto the end of an active context, a flood of lite-restore
preemptions. If we time this just right, Tigerlake hangs.

Inserting a small delay between the processing of CS events and
submitting the next context, prevents the hang. Naturally it does not
occur with debugging enabled. The suspicion then is that this is related
to the issues with the CS event buffer, and inserting an mmio read of
the CS pointer status appears to be very successful in preventing the
hang. Other registers, or uncached reads, or plain mb, do not prevent
the hang, suggesting that register is key -- but that the hang can be
prevented by a simple udelay, suggests it is just a timing issue like
that encountered by commit 233c1ae3c8 ("drm/i915/gt: Wait for CSB
entries on Tigerlake"). Also note that the hang is not prevented by
applying CTX_DESC_FORCE_RESTORE, or by inserting a delay on the GPU
between requests.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Bruce Chang <yu.bruce.chang@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: stable@vger.kernel.org
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201015195023.32346-1-chris@chris-wilson.co.uk
(cherry picked from commit 6ca7217dff)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2020-10-19 13:29:52 -04:00
849c0fe9e8 drm/i915/gt: Initialize reserved and unspecified MOCS indices
In order to avoid functional breakage of mis-programmed applications that
have grown to depend on unused MOCS entries, we are programming
those entries to be equal to fully cached ("L3 + LLC") entry.

These reserved and unspecified entries should not be used as they may be
changed to less performant variants with better coherency in the future
if more entries are needed.

v2: As suggested by Lucas De Marchi to utilise __init_mocs_table for
programming default value, setting I915_MOCS_PTE index of tgl_mocs_table
with desired value.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Tomasz Lis <tomasz.lis@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Francisco Jerez <currojerez@riseup.net>
Cc: Mathew Alwin <alwin.mathew@intel.com>
Cc: Mcguire Russell W <russell.w.mcguire@intel.com>
Cc: Spruit Neil R <neil.r.spruit@intel.com>
Cc: Zhou Cheng <cheng.zhou@intel.com>
Cc: Benemelis Mike G <mike.g.benemelis@intel.com>

Signed-off-by: Ayaz A Siddiqui <ayaz.siddiqui@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200729102539.134731-2-ayaz.siddiqui@intel.com
Cc: stable@vger.kernel.org
(cherry picked from commit 4d8a5cfe3b)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2020-10-19 13:29:44 -04:00
fa812ce96a drm/i915/gt: Onion unwind for scratch page allocation failure
In switching to using objects for our ppGTT scratch pages, care was not
taken to avoid trying to unref NULL objects on failure. And for gen6
ppGTT, it appears we forgot entirely to unwind after a partial allocation
failure.

Fixes: 89351925a4 ("drm/i915/gt: Switch to object allocations for page directories")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201019083444.1286-1-chris@chris-wilson.co.uk
2020-10-19 12:36:38 +01:00
bfed6708d6 drm/i915: use vmap in shmem_pin_map
shmem_pin_map somewhat awkwardly reimplements vmap using alloc_vm_area and
manual pte setup.  The only practical difference is that alloc_vm_area
prefeaults the vmalloc area PTEs, which doesn't seem to be required here
(and could be added to vmap using a flag if actually required).  Switch to
use vmap, and use vfree to free both the vmalloc mapping and the page
array, as well as dropping the references to each page.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Link: https://lkml.kernel.org/r/20201002122204.1534411-7-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-18 09:27:10 -07:00
89db95377b drm/i915/gt: Confirm the context survives execution
Repeat our sanitychecks from before execution to after execution. One
expects that if we were to see these, the gpu would already be on fire,
but the timing may be informative.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201015190816.31763-1-chris@chris-wilson.co.uk
2020-10-16 12:13:46 +01:00
6971e07b6b drm/i915/gt: Cleanup kasan warning for on-stack (unsigned long) casting
Kasan is gving a warning for passing a u32 parameter into find_first_bit
(casting to a unsigned long *, with appropriate length restrictions):

[   44.678262] BUG: KASAN: stack-out-of-bounds in find_first_bit+0x2e/0x50
[   44.678295] Read of size 8 at addr ffff888233f4fc30 by task core_hotunplug/474
[   44.678326]
[   44.678358] CPU: 0 PID: 474 Comm: core_hotunplug Not tainted 5.9.0+ #608
[   44.678465] Hardware name: BESSTAR (HK) LIMITED GN41/Default string, BIOS BLT-BI-MINIPC-F4G-EX3R110-GA65A-101-D 10/12/2018
[   44.678500] Call Trace:
[   44.678534]  dump_stack+0x84/0xba
[   44.678569]  print_address_description.constprop.0+0x21/0x220
[   44.678605]  ? kmsg_dump_rewind_nolock+0x5f/0x5f
[   44.678638]  ? _raw_spin_lock_irqsave+0x6d/0xb0
[   44.678669]  ? _raw_write_lock_irqsave+0xb0/0xb0
[   44.678702]  ? set_task_cpu+0x1e0/0x1e0
[   44.678733]  ? find_first_bit+0x2e/0x50
[   44.678763]  kasan_report.cold+0x20/0x42
[   44.678794]  ? find_first_bit+0x2e/0x50
[   44.678825]  __asan_load8+0x69/0x90
[   44.678856]  find_first_bit+0x2e/0x50
[   44.679027]  __caps_show.isra.0+0x9e/0x1f0 [i915]

Since we are only using the shorter type for our own convenience,
accommodate kasan and use unsigned long.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201013110845.16127-1-chris@chris-wilson.co.uk
2020-10-16 11:08:44 +01:00
bb65548e3c drm/i915/gt: Undo forced context restores after trivial preemptions
We may try to preempt the currently executing request, only to find that
after unravelling all the dependencies that the original executing
context is still the earliest in the topological sort and re-submitted
back to HW (if we do detect some change in the ELSP that requires
re-submission). However, due to the way we check for wrap-around during
the unravelling, we mark any context that has been submitted just once
(i.e. with the rq->wa_tail set, but the ring->tail earlier) as
potentially wrapping and requiring a forced restore on resubmission.
This was expected to be not a problem, as it was anticipated that most
unwinding for preemption would result in a context switch and the few
that did not would be lost in the noise. It did not take long for
someone to find one particular workload where the cost of those extra
context restores was measurable.

However, since we know the wa_tail is of fixed size, and we know that a
request must be larger than the wa_tail itself, we can safely maintain
the check for request wrapping and check against a slightly future point
in the ring that includes an expected wa_tail. (That is if the
ring->tail is already set to rq->wa_tail, including another 8 bytes in
the check does not invalidate the incremental wrap detection.)

Fixes: 8ab3a3812a ("drm/i915/gt: Incrementally check for rewinding")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Bruce Chang <yu.bruce.chang@intel.com>
Cc: Ramalingam C <ramalingam.c@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.4+
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201002083425.4605-1-chris@chris-wilson.co.uk
2020-10-16 11:00:47 +01:00
6ca7217dff drm/i915/gt: Delay execlist processing for tgl
When running gem_exec_nop, it floods the system with many requests (with
the goal of userspace submitting faster than the HW can process a single
empty batch). This causes the driver to continually resubmit new
requests onto the end of an active context, a flood of lite-restore
preemptions. If we time this just right, Tigerlake hangs.

Inserting a small delay between the processing of CS events and
submitting the next context, prevents the hang. Naturally it does not
occur with debugging enabled. The suspicion then is that this is related
to the issues with the CS event buffer, and inserting an mmio read of
the CS pointer status appears to be very successful in preventing the
hang. Other registers, or uncached reads, or plain mb, do not prevent
the hang, suggesting that register is key -- but that the hang can be
prevented by a simple udelay, suggests it is just a timing issue like
that encountered by commit 233c1ae3c8 ("drm/i915/gt: Wait for CSB
entries on Tigerlake"). Also note that the hang is not prevented by
applying CTX_DESC_FORCE_RESTORE, or by inserting a delay on the GPU
between requests.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Bruce Chang <yu.bruce.chang@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: stable@vger.kernel.org
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201015195023.32346-1-chris@chris-wilson.co.uk
2020-10-16 11:00:47 +01:00
da94275092 drm/i915/dg1: Add initial DG1 workarounds
DG1 shares some workarounds with TGL and RKL and also has some
additional workarounds of its own.

v2: Correct location of Wa_1408615072 (JohnH).
v3: Apply WAs 1606700617, 18011464164 and 22010931296 to DG1 (José)
v4 (Anusha)
  - Add Wa_22010271021
  - s/Wa_14010096844/Wa_1409836686
v5:
  - Extend Wa_14010919138 to all revs (Matt Atwood)
  - Power gate media is global gen12 design. (Rodrigo)
  - Rebase (Lucas)
v6: use REG_BIT() to fix checkpatch warning (Lucas)

BSpec: 53508

Cc: Matt Atwood <matthew.s.atwood@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Anusha Srivatsa <anusha.srivatsa@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201014191937.1266226-8-lucas.demarchi@intel.com
2020-10-15 14:14:34 -07:00
a04ac82736 drm/i915/gt: Fixup tgl mocs for PTE tracking
Forcing mocs:1 [used for our winsys follows-pte mode] to be cached
caused display glitches. Though it is documented as deprecated (and so
likely behaves as uncached) use the follow-pte bit and force it out of
L3 cache.

Fixes: 4d8a5cfe3b ("drm/i915/gt: Initialize reserved and unspecified MOCS indices")
Testcase: igt/kms_frontbuffer_tracking
Testcase: igt/kms_big_fb
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ayaz A Siddiqui <ayaz.siddiqui@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201015122138.30161-4-chris@chris-wilson.co.uk
2020-10-15 15:38:21 +01:00
c0888e9e22 drm/i915: Enable eLLC caching of display buffers for SKL+
Since SKL the eLLC has been sitting on the far side of the system
agent, meaning the display engine can utilize it. Let's enable that.

I chose WB for the caching mode, because my numbers are indicating
that WT might actually be WB and WC might actually be UC. I'm not
100% sure that is indeed the case but at least my simple rendercopy
based benchmark didn't see any difference in performance.

Also if I configure things to do LLCeLLC+WT I still get cache dirt
on my screen, suggesting that is in fact operating in WB mode
anyway. This is also the reason I had to fix the MOCS target cache
to really say PTE rather than LLC+eLLC.
Since SKL the eLLC has been sitting on the far side of the system agent,
meaning the display engine can utilize it. Let's enable that.

Eero's earlier benchmarks numbers:
"* Results in GfxBench and Unigine (Valley/Heaven) tests were within daily
   variation on the tested SKL machines

 * SKL GT4e (128MB eLLC) / Wayland / Weston:
   +15-20% SynMark TexMem512 (512MB of textures)
   +4-6% SynMark TerrainFly*, CSCloth, ShMapVsm
   -5-10% SynMark TexMem128 (128MB of textures)

 * SKL GT3e (64MB eLLC) / Xorg / Unity:
   +4-8% GpuTest Triangle fullscreen (FullHD)
   -5-10% GpuTest Triangle windowed (1/2 screen)

 * SKL GT2 (no eLLC) / Xorg / Unity:
   * Some of the higher FPS SynMark pixel and vertex shader tests
     are few percent higher, more than daily variance
   => Do you see any reason why this machine would be impacted
      although it doesn't eLLC?"

Caveats:
- Still haven't tested with a prime setup
- Still not entirely sure this a good idea, but I've been
  using it on my cfl anyway :)

v2: Split the MOCS PTE change out

Cc: Eero Tamminen <eero.t.tamminen@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20201007120329.17076-3-ville.syrjala@linux.intel.com
Link: https://patchwork.freedesktop.org/patch/msgid/20201015122138.30161-3-chris@chris-wilson.co.uk
2020-10-15 15:38:20 +01:00
36b6b68169 drm/i915: Fix MOCS PTE setting for gen9+
Fix up the MOCS PTE setting to really get the LLC cacheability
from the PTE rather than hardocoding it to LLC or LLC+eLLC.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20201007120329.17076-2-ville.syrjala@linux.intel.com
Link: https://patchwork.freedesktop.org/patch/msgid/20201015122138.30161-2-chris@chris-wilson.co.uk
2020-10-15 15:38:20 +01:00
24ea098b7c drm/i915/jsl: Split EHL/JSL platform info and PCI ids
Recently we came across requirement to identify EHL and JSL
platform to program them differently. Thus Split the basic
platform definition, macros, and PCI IDs to differentiate
between EHL and JSL platforms. Also, IS_ELKHARTLAKE is replaced
with IS_JSL_EHL everywhere.

Changes since V1 :
	- Rebased to avoid merge conflicts
	- Added missed check for jasperlake in intel_uc_fw.c

Cc : Matt Roper <matthew.d.roper@intel.com>
Cc : Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Tejas Upadhyay <tejaskumarx.surendrakumar.upadhyay@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201013192948.63470-1-tejaskumarx.surendrakumar.upadhyay@intel.com
2020-10-14 09:31:34 +02:00
4d8a5cfe3b drm/i915/gt: Initialize reserved and unspecified MOCS indices
In order to avoid functional breakage of mis-programmed applications that
have grown to depend on unused MOCS entries, we are programming
those entries to be equal to fully cached ("L3 + LLC") entry.

These reserved and unspecified entries should not be used as they may be
changed to less performant variants with better coherency in the future
if more entries are needed.

v2: As suggested by Lucas De Marchi to utilise __init_mocs_table for
programming default value, setting I915_MOCS_PTE index of tgl_mocs_table
with desired value.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Tomasz Lis <tomasz.lis@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Francisco Jerez <currojerez@riseup.net>
Cc: Mathew Alwin <alwin.mathew@intel.com>
Cc: Mcguire Russell W <russell.w.mcguire@intel.com>
Cc: Spruit Neil R <neil.r.spruit@intel.com>
Cc: Zhou Cheng <cheng.zhou@intel.com>
Cc: Benemelis Mike G <mike.g.benemelis@intel.com>

Signed-off-by: Ayaz A Siddiqui <ayaz.siddiqui@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200729102539.134731-2-ayaz.siddiqui@intel.com
Cc: stable@vger.kernel.org
2020-10-13 10:24:12 +01:00
3bcacad3d7 drm/i915: Update gen12 multicast register ranges
The updated bspec forcewake table also provides us with new multicast
ranges that should be reflected in our workaround code.

Note that there are different types of multicast registers with
different styles of replication and different steering registers.  The
i915 MCR range lists we're updating here are only used to ensure we can
verify workarounds properly (i.e., if we can't steer register reads we
don't want to verify workarounds where an unsteered read might hit a
fused-off instance of the unit).  Because of this, we don't need to
include any of the multicast ranges where all instances of the register
will always present and fusing doesn't play a role.  Specifically, that
means that we are not including the MCR ranges designated as "SQIDI" in
the bspec.

Bspec: 66696
Cc: Caz Yokoyama <caz.yokoyama@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201009194442.3668677-4-matthew.d.roper@intel.com
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
2020-10-09 18:54:44 -07:00
55e3c17095 drm/i915: Rename FORCEWAKE_BLITTER to FORCEWAKE_GT
The power well that we've been referring to as the 'blitter' well is
actually more of a general GT power well which contains a lot of things
other than the blitter engine registers.  The FORCEWAKE_BLITTER name in
the code was used for historic reasons, but no longer matches how the
bspec describes this power well and just causes confusion for people not
familiar with this area of the code.  Let's rename it to FORCEWAKE_GT to
more accurately describe the role of the power well and match how the
modern bspec refers to it.

v2:
 - Add a comment noting that the GT power well includes the blitter
   engine. (Jose)

Bspec: 66696, 66534, 67609
Cc: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201009194442.3668677-2-matthew.d.roper@intel.com
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
2020-10-09 18:51:27 -07:00
2606b26923 drm/i915/dg1: Define MOCS table for DG1
DG1 has a new MOCS table. We still use the old definition of the table,
but as for any dgfx card it doesn't contain the control_value values
(these values don't matter as we won't program them).

Bspec: 45101

v2: Reword the comment to state that the last few entries are reserved
    instead of "the last two". DG1 reserves four instead of two from
    previous platforms (from Matt Roper)

Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201007002210.3678024-3-lucas.demarchi@intel.com
2020-10-07 13:51:20 -07:00
bf9bd6a512 drm/i915/gt: Track the most recent pulse for the heartbeat
Since we track the idle_pulse for flushing the barriers and avoid
re-emitting the pulse upon idling if no futher action is required, this
also impacts the heartbeat. Before emitting a fresh heartbeat, we look
at the engine idle status and assume that if the pulse was the last
request emitted along the heartbeat, the engine is idling and a
heartbeat pulse not required. This assumption fails, but we can reuse
the idle pulse as the heartbeat if we are yet to emit one, and so track
the status of that pulse for our engine health check.

This impacts tgl/rcs0 as we rely on the heartbeat for our healthcheck for
the normal preemption detection mechanism is disabled by default.

Testcase: igt/gem_exec_schedule/preempt-hang/rcs0 #tgl
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201006094653.7558-1-chris@chris-wilson.co.uk
2020-10-07 10:23:11 +01:00
934941ed5a drm/i915: Fix DMA mapped scatterlist lookup
As the previous patch fixed the places where we walk the whole scatterlist
for DMA addresses, this patch fixes the random lookup functionality.

To achieve this we have to add a second lookup iterator and add a
i915_gem_object_get_sg_dma helper, to be used analoguous to existing
i915_gem_object_get_sg_dma. Therefore two lookup caches are maintained per
object and they are flushed at the same point for simplicity. (Strictly
speaking the DMA cache should be flushed from i915_gem_gtt_finish_pages,
but today this conincides with unsetting of the pages in general.)

Partial VMA view is then fixed to use the new DMA lookup and properly
query sg length.

v2:
 * Checkpatch.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Cc: Tom Murphy <murphyt7@tcd.ie>
Cc: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20201006092508.1064287-2-tvrtko.ursulin@linux.intel.com
2020-10-06 12:49:12 +01:00
8a473dbadc drm/i915: Fix DMA mapped scatterlist walks
When walking DMA mapped scatterlists sg_dma_len has to be used since it
can be different (coalesced) from the backing store entry.

This also means we have to end the walk when encountering a zero length
DMA entry and cannot rely on the normal sg list end marker.

Both issues were there in theory for some time but were hidden by the fact
Intel IOMMU driver was never coalescing entries. As there are ongoing
efforts to change this we need to start handling it.

v2:
 * Use unsigned int for local storing sg_dma_len. (Logan)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
References: 85d1225ec0 ("drm/i915: Introduce & use new lightweight SGL iterators")
References: b31144c0da ("drm/i915: Micro-optimise gen6_ppgtt_insert_entries()")
Reported-by: Tom Murphy <murphyt7@tcd.ie>
Suggested-by: Tom Murphy <murphyt7@tcd.ie> # __sgt_iter
Suggested-by: Logan Gunthorpe <logang@deltatee.com> # __sgt_iter
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201006092508.1064287-1-tvrtko.ursulin@linux.intel.com
2020-10-06 12:49:07 +01:00
25dc89d527 drm/i915/gt: Scrub HW state on remove
Currently we do a final scrub of the HW state upon release. However,
when rebinding the device, this is too late as the device may either
have been partially rebound or the device is no longer accessible. If
the device has been removed before release, the reset goes astray
leaving the device in an inconsistent state, unlikely to work without a
full PCI reset. Furthermore, if the device is partially rebound before
the HW scrubbing, there may be leftover HW state that should have been
scrubbed. Either way, we need to push the scrubbing earlier before the
removal, so into unregister. The danger is that on older machines,
resetting the GPU also impact the display engine and so the reset should
be after modesetting is disabled (and before reuse we need to recover
modesetting).

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2508
Testcase: igt/core_hotunplug
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200929112639.24223-1-chris@chris-wilson.co.uk
2020-10-06 12:09:30 +01:00
b1e93a85f8 drm/i915: don't conflate is_dgfx with fake lmem
When using fake lmem for tests, we are overriding the setting in
device info for dgfx devices. Current users of IS_DGFX() except one are
correct. However, as we add support for DG1, we are going to use it in
additional places to trigger dgfx-only code path.

In future if we need we can use HAS_LMEM() instead of IS_DGFX() in the
places that make sense to also contemplate fake lmem use.

v2: update gen8_gmch_probe() to use HAS_LMEM(): we need to steal the
mappable aperture later(which is fine since it doesn't exist on "DGFX"),
and use it as a substitute for LMEMBAR. The !mappable aperture property
is also useful since it exercises some other parts of the code too.
(Matthew Auld)

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201001063917.3133475-1-lucas.demarchi@intel.com
2020-10-05 15:54:44 -07:00
ca65fc0d8e drm/i915/gt: Always send a pulse down the engine after disabling heartbeat
Currently, we check we can send a pulse prior to disabling the
heartbeat to verify that we can change the heartbeat, but since we may
re-evaluate execution upon changing the heartbeat interval we need another
pulse afterwards to refresh execution.

v2: Tvrtko asked if we could reduce the double pulse to a single, which
opened up a discussion of how we should handle the pulse-error after
attempting to change the property, and the desire to serialise
adjustment of the property with its validating pulse, and unwind upon
failure.

Fixes: 9a40bddd47 ("drm/i915/gt: Expose heartbeat interval via sysfs")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <stable@vger.kernel.org> # v5.7+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200928221510.26044-2-chris@chris-wilson.co.uk
(cherry picked from commit 3dd66a94de)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2020-09-30 14:24:48 -04:00