14 Commits

Author SHA1 Message Date
Chris Wilson
8f125dafb3 drm/i915/gt: Move the heartbeat into the high priority system wq
As we ensure that the heartbeat is reasonably fast (and should not
block), move the heartbeat work into the system_highpri_wq to avoid
having this essential task be blocked behind other slow work, such as
our own retire_work_handler.

References: https://gitlab.freedesktop.org/drm/intel/-/issues/2119
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200702095219.963-2-chris@chris-wilson.co.uk
2020-07-02 12:30:24 +01:00
Chris Wilson
aab4707fdd drm/i915/gt: Harden the heartbeat against a stuck driver
If the driver gets stuck holding the kernel timeline, we cannot issue a
heartbeat and so fail to discover that the driver is indeed stuck and do
not issue a GPU reset (which would hopefully unstick the driver!).
Switch to using a trylock so that we can query if the heartbeat's
timeline mutex is locked elsewhere, and then use the timer to probe if it
remains stuck at the same spot for consecutive heartbeats, indicating
that the mutex has not been released and the engine has not progressed.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200702095219.963-1-chris@chris-wilson.co.uk
2020-07-02 12:30:23 +01:00
Jani Nikula
8a25c4be58 drm/i915/params: switch to device specific parameters
Start using device specific parameters instead of module parameters for
most things. The module parameters become the immutable initial values
for i915 parameters. The device specific parameters in i915->params
start life as a copy of i915_modparams. Any later changes are only
reflected in the debugfs.

The stragglers are:

* i915.force_probe and i915.modeset. Needed before dev_priv is
  available. This is fine because the parameters are read-only and never
  modified.

* i915.verbose_state_checks. Passing dev_priv to I915_STATE_WARN and
  I915_STATE_WARN_ON would result in massive and ugly churn. This is
  handled by not exposing the parameter via debugfs, and leaving the
  parameter writable in sysfs. This may be fixed up in follow-up work.

* i915.inject_probe_failure. Only makes sense in terms of the module,
  not the device. This is handled by not exposing the parameter via
  debugfs.

v2: Fix uc i915 lookup code (Michał Winiarski)

Cc: Juha-Pekka Heikkilä <juha-pekka.heikkila@intel.com>
Cc: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20200618150402.14022-1-jani.nikula@intel.com
2020-06-22 23:26:40 +03:00
Chris Wilson
5948938700 drm/i915/gt: Add a safety submission flush in the heartbeat
Just in case everything fails (like for example "missed interrupt
syndrome" on Sandybridge), always flush the submission tasklet from the
heartbeat. This papers over such issues, but will still appear as a
second long glitch, and prevents us from detecting it unless we happen
to be performing a timed test.

v2: We rely on flush_submission() synchronizing with the tasklet on
another CPU.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200615165013.22973-3-chris@chris-wilson.co.uk
2020-06-15 18:32:28 +01:00
Chris Wilson
ba03a63d76 drm/i915/gt: Don't declare hangs if engine is stalled
If the ring submission is stalled on an external request, nothing can be
submitted, not even the heartbeat in the kernel context. Since nothing
is running, resetting the engine/device does not unblock the system and
is pointless. We can see if the heartbeat is supposed to be running
before declaring foul.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200528074109.28235-2-chris@chris-wilson.co.uk
2020-05-28 17:53:52 +01:00
Chris Wilson
500f9ac302 drm/i915/gt: Always reschedule the new heartbeat
In order to better respond to new heartbeat intervals given via sysfs,
always reprogramme an active heartbeat upon change (i.e. use
mod_delayed_work to reschedule rather than queue_delayed_work which
ignores an already active work.)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200317163208.30010-1-chris@chris-wilson.co.uk
2020-03-17 18:26:15 +00:00
Chris Wilson
fbcb52db41 drm/i915/gt: Fix up missing error propagation for heartbeat pulses
Just missed setting err along an interruptible error path for the
intel_engine_pulse().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200218162150.1300405-4-chris@chris-wilson.co.uk
2020-02-18 20:32:21 +00:00
Chris Wilson
e1c31fb5dd drm/i915: Merge i915_request.flags with i915_request.fence.flags
As we already have a flags field buried within i915_request, reuse it!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200106114234.2529613-3-chris@chris-wilson.co.uk
2020-01-06 14:38:55 +00:00
Chris Wilson
de5825beae drm/i915: Serialise with engine-pm around requests on the kernel_context
As the engine->kernel_context is used within the engine-pm barrier, we
have to be careful when emitting requests outside of the barrier, as the
strict timeline locking rules do not apply. Instead, we must ensure the
engine_park() cannot be entered as we build the request, which is
simplest by taking an explicit engine-pm wakeref around the request
construction.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191125105858.1718307-1-chris@chris-wilson.co.uk
2019-11-25 13:17:18 +00:00
Chris Wilson
3466a3def2 drm/i915/gt: Cleanup heartbeat systole first
Before we grab the engine wakeref, tidy up the previous heartbeat
request. If we then abort because the engine powerwell is off, we ensure
the request is freed as we know we will not have freed it when
cancelling the work (as the work is running!).

Fixes: 841e86728615 ("drm/i915/gt: Only drop heartbeat.systole if the sole owner")
References: 058179e72e09 ("drm/i915/gt: Replace hangcheck by heartbeats")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191106223410.30334-1-chris@chris-wilson.co.uk
2019-11-07 07:49:55 +00:00
Chris Wilson
841e867286 drm/i915/gt: Only drop heartbeat.systole if the sole owner
Mika spotted that only using cancel_delayed_work() could mean that we
attempted to clear the heartbeat.systole while the worker was still
running. Rectify the situation by only touching the systole from outside
the worker if we suceeded in cancelling the worker before it could run.
The worker is expected to clean up by itself upon idling.

Reported-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Fixes: 058179e72e09 ("drm/i915/gt: Replace hangcheck by heartbeats")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191106133129.17732-1-chris@chris-wilson.co.uk
2019-11-06 15:45:53 +00:00
Chris Wilson
babaab2f47 drm/i915: Encapsulate kconfig constant values inside boolean predicates
Avoid angering clang and smatch by using a constant value in a '&&' test,
by forcing that constant value into a boolean.

E.g.,
drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c:159:13: warning: use of logical '&&' with constant operand [-Wconstant-logical-operand]
	if (!delay && CONFIG_DRM_I915_PREEMPT_TIMEOUT) {
                      ^  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Nathan Chancellor <natechancellor@gmail.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191025135943.12524-1-chris@chris-wilson.co.uk
2019-10-26 09:25:25 +01:00
Chris Wilson
058179e72e drm/i915/gt: Replace hangcheck by heartbeats
Replace sampling the engine state every so often with a periodic
heartbeat request to measure the health of an engine. This is coupled
with the forced-preemption to allow long running requests to survive so
long as they do not block other users.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Reviewed-by: Jon Bloomfield <jon.bloomfield@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191023133108.21401-5-chris@chris-wilson.co.uk
2019-10-23 23:52:10 +01:00
Chris Wilson
b5e8e954eb drm/i915/gt: Introduce barrier pulses along engines
To flush idle barriers, and even inflight requests, we want to send a
preemptive 'pulse' along an engine. We use a no-op request along the
pinned kernel_context at high priority so that it should run or else
kick off the stuck requests. We can use this to ensure idle barriers are
immediately flushed, as part of a context cancellation mechanism, or as
part of a heartbeat mechanism to detect and reset a stuck GPU.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191021174339.5389-1-chris@chris-wilson.co.uk
2019-10-21 21:01:52 +01:00