linux/drivers/gpu/drm/i915/i915_drv.h

1784 lines
55 KiB
C
Raw Normal View History

/* i915_drv.h -- Private header for the I915 driver -*- linux-c -*-
*/
/*
*
* Copyright 2003 Tungsten Graphics, Inc., Cedar Park, Texas.
* All Rights Reserved.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the
* "Software"), to deal in the Software without restriction, including
* without limitation the rights to use, copy, modify, merge, publish,
* distribute, sub license, and/or sell copies of the Software, and to
* permit persons to whom the Software is furnished to do so, subject to
* the following conditions:
*
* The above copyright notice and this permission notice (including the
* next paragraph) shall be included in all copies or substantial portions
* of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
* OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
* IN NO EVENT SHALL TUNGSTEN GRAPHICS AND/OR ITS SUPPLIERS BE LIABLE FOR
* ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
* TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
* SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*
*/
#ifndef _I915_DRV_H_
#define _I915_DRV_H_
#include <uapi/drm/i915_drm.h>
#include <uapi/drm/drm_fourcc.h>
#include <asm/hypervisor.h>
#include <linux/io-mapping.h>
#include <linux/i2c.h>
#include <linux/i2c-algo-bit.h>
#include <linux/backlight.h>
#include <linux/hash.h>
#include <linux/intel-iommu.h>
#include <linux/kref.h>
#include <linux/mm_types.h>
drm/i915/pmu: Expose a PMU interface for perf queries From: Chris Wilson <chris@chris-wilson.co.uk> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> From: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com> The first goal is to be able to measure GPU (and invidual ring) busyness without having to poll registers from userspace. (Which not only incurs holding the forcewake lock indefinitely, perturbing the system, but also runs the risk of hanging the machine.) As an alternative we can use the perf event counter interface to sample the ring registers periodically and send those results to userspace. Functionality we are exporting to userspace is via the existing perf PMU API and can be exercised via the existing tools. For example: perf stat -a -e i915/rcs0-busy/ -I 1000 Will print the render engine busynnes once per second. All the performance counters can be enumerated (perf list) and have their unit of measure correctly reported in sysfs. v1-v2 (Chris Wilson): v2: Use a common timer for the ring sampling. v3: (Tvrtko Ursulin) * Decouple uAPI from i915 engine ids. * Complete uAPI defines. * Refactor some code to helpers for clarity. * Skip sampling disabled engines. * Expose counters in sysfs. * Pass in fake regs to avoid null ptr deref in perf core. * Convert to class/instance uAPI. * Use shared driver code for rc6 residency, power and frequency. v4: (Dmitry Rogozhkin) * Register PMU with .task_ctx_nr=perf_invalid_context * Expose cpumask for the PMU with the single CPU in the mask * Properly support pmu->stop(): it should call pmu->read() * Properly support pmu->del(): it should call stop(event, PERF_EF_UPDATE) * Introduce refcounting of event subscriptions. * Make pmu.busy_stats a refcounter to avoid busy stats going away with some deleted event. * Expose cpumask for i915 PMU to avoid multiple events creation of the same type followed by counter aggregation by perf-stat. * Track CPUs getting online/offline to migrate perf context. If (likely) cpumask will initially set CPU0, CONFIG_BOOTPARAM_HOTPLUG_CPU0 will be needed to see effect of CPU status tracking. * End result is that only global events are supported and perf stat works correctly. * Deny perf driver level sampling - it is prohibited for uncore PMU. v5: (Tvrtko Ursulin) * Don't hardcode number of engine samplers. * Rewrite event ref-counting for correctness and simplicity. * Store initial counter value when starting already enabled events to correctly report values to all listeners. * Fix RC6 residency readout. * Comments, GPL header. v6: * Add missing entry to v4 changelog. * Fix accounting in CPU hotplug case by copying the approach from arch/x86/events/intel/cstate.c. (Dmitry Rogozhkin) v7: * Log failure message only on failure. * Remove CPU hotplug notification state on unregister. v8: * Fix error unwind on failed registration. * Checkpatch cleanup. v9: * Drop the energy metric, it is available via intel_rapl_perf. (Ville Syrjälä) * Use HAS_RC6(p). (Chris Wilson) * Handle unsupported non-engine events. (Dmitry Rogozhkin) * Rebase for intel_rc6_residency_ns needing caller managed runtime pm. * Drop HAS_RC6 checks from the read callback since creating those events will be rejected at init time already. * Add counter units to sysfs so perf stat output is nicer. * Cleanup the attribute tables for brevity and readability. v10: * Fixed queued accounting. v11: * Move intel_engine_lookup_user to intel_engine_cs.c * Commit update. (Joonas Lahtinen) v12: * More accurate sampling. (Chris Wilson) * Store and report frequency in MHz for better usability from perf stat. * Removed metrics: queued, interrupts, rc6 counters. * Sample engine busyness based on seqno difference only for less MMIO (and forcewake) on all platforms. (Chris Wilson) v13: * Comment spelling, use mul_u32_u32 to work around potential GCC issue and somne code alignment changes. (Chris Wilson) v14: * Rebase. v15: * Rebase for RPS refactoring. v16: * Use the dynamic slot in the CPU hotplug state machine so that we are free to setup our state as multi-instance. Previously we were re-using the CPUHP_AP_PERF_X86_UNCORE_ONLINE slot which is neither used as multi-instance, nor owned by our driver to start with. * Register the CPU hotplug handlers after the PMU, otherwise the callback will get called before the PMU is initialized which can end up in perf_pmu_migrate_context with an un-initialized base. * Added workaround for a probable bug in cpuhp core. v17: * Remove workaround for the cpuhp bug. v18: * Rebase for drm_i915_gem_engine_class getting upstream before us. v19: * Rebase. (trivial) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171121181852.16128-2-tvrtko.ursulin@linux.intel.com
2017-11-21 18:18:45 +00:00
#include <linux/perf_event.h>
drm/i915: irq-drive the dp aux communication At least on the platforms that have a dp aux irq and also have it enabled - vlvhsw should have one, too. But I don't have a machine to test this on. Judging from docs there's no dp aux interrupt for gm45. Also, I only have an ivb cpu edp machine, so the dp aux A code for snb/ilk is untested. For dpcd probing when nothing is connected it slashes about 5ms of cpu time (cpu time is now negligible), which agrees with 3 * 5 400 usec timeouts. A previous version of this patch increases the time required to go through the dp_detect cycle (which includes reading the edid) from around 33 ms to around 40 ms. Experiments indicated that this is purely due to the irq latency - the hw doesn't allow us to queue up dp aux transactions and hence irq latency directly affects throughput. gmbus is much better, there we have a 8 byte buffer, and we get the irq once another 4 bytes can be queued up. But by using the pm_qos interface to request the lowest possible cpu wake-up latency this slowdown completely disappeared. Since all our output detection logic is single-threaded with the mode_config mutex right now anyway, I've decide not ot play fancy and to just reuse the gmbus wait queue. But this would definitely prep the way to run dp detection on different ports in parallel v2: Add a timeout for dp aux transfers when using interrupts - the hw _does_ prevent this with the hw-based 400 usec timeout, but if the irq somehow doesn't arrive we're screwed. Lesson learned while developing this ;-) v3: While at it also convert the busy-loop to wait_for_atomic, so that we don't run the risk of an infinite loop any more. v4: Ensure we have the smallest possible irq latency by using the pm_qos interface. v5: Add a comment to the code to explain why we frob pm_qos. Suggested by Chris Wilson. v6: Disable dp irq for vlv, that's easier than trying to get at docs and hw. v7: Squash in a fix for Haswell that Paulo Zanoni tracked down - the dp aux registers aren't at a fixed offset any more, but can be on the PCH while the DP port is on the cpu die. Reviewed-by: Imre Deak <imre.deak@intel.com> (v6) Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-12-01 13:53:48 +01:00
#include <linux/pm_qos.h>
#include <linux/dma-resv.h>
#include <linux/shmem_fs.h>
#include <linux/stackdepot.h>
#include <linux/xarray.h>
#include <drm/drm_gem.h>
#include <drm/drm_auth.h>
#include <drm/drm_cache.h>
#include <drm/drm_util.h>
#include <drm/drm_dsc.h>
drm/i915: Make sure we have enough memory bandwidth on ICL ICL has so many planes that it can easily exceed the maximum effective memory bandwidth of the system. We must therefore check that we don't exceed that limit. The algorithm is very magic number heavy and lacks sufficient explanation for now. We also have no sane way to query the memory clock and timings, so we must rely on a combination of raw readout from the memory controller and hardcoded assumptions. The memory controller values obviously change as the system jumps between the different SAGV points, so we try to stabilize it first by disabling SAGV for the duration of the readout. The utilized bandwidth is tracked via a device wide atomic private object. That is actually not robust because we can't afford to enforce strict global ordering between the pipes. Thus I think I'll need to change this to simply chop up the available bandwidth between all the active pipes. Each pipe can then do whatever it wants as long as it doesn't exceed its budget. That scheme will also require that we assume that any number of planes could be active at any time. TODO: make it robust and deal with all the open questions v2: Sleep longer after disabling SAGV v3: Poll for the dclk to get raised (seen it take 250ms!) If the system has 2133MT/s memory then we pointlessly wait one full second :( v4: Use the new pcode interface to get the qgv points rather that using hardcoded numbers v5: Move the pcode stuff into intel_bw.c (Matt) s/intel_sagv_info/intel_qgv_info/ Do the NV12/P010 as per spec for now (Matt) s/IS_ICELAKE/IS_GEN11/ v6: Ignore bandwidth limits if the pcode query fails Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Acked-by: Clint Taylor <Clinton.A.Taylor@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190524153614.32410-1-ville.syrjala@linux.intel.com
2019-05-24 18:36:14 +03:00
#include <drm/drm_atomic.h>
#include <drm/drm_connector.h>
#include <drm/i915_mei_hdcp_interface.h>
#include <drm/ttm/ttm_device.h>
#include "i915_params.h"
#include "i915_reg.h"
#include "i915_utils.h"
#include "display/intel_bios.h"
#include "display/intel_cdclk.h"
#include "display/intel_display.h"
#include "display/intel_display_power.h"
#include "display/intel_dmc.h"
#include "display/intel_dpll_mgr.h"
#include "display/intel_dsb.h"
#include "display/intel_fbc.h"
#include "display/intel_frontbuffer.h"
#include "display/intel_global_state.h"
#include "display/intel_gmbus.h"
#include "display/intel_opregion.h"
#include "gem/i915_gem_context_types.h"
#include "gem/i915_gem_shrinker.h"
#include "gem/i915_gem_stolen.h"
#include "gem/i915_gem_lmem.h"
#include "gt/intel_engine.h"
#include "gt/intel_gt_types.h"
#include "gt/intel_region_lmem.h"
#include "gt/intel_workarounds.h"
#include "gt/uc/intel_uc.h"
#include "intel_device_info.h"
#include "intel_memory_region.h"
#include "intel_pch.h"
#include "intel_pm_types.h"
#include "intel_runtime_pm.h"
#include "intel_step.h"
#include "intel_uncore.h"
#include "intel_wakeref.h"
drm/i915: Implement dynamic GuC WOPCM offset and size calculation Hardware may have specific restrictions on GuC WOPCM offset and size. On Gen9, the value of the GuC WOPCM size register needs to be larger than the value of GuC WOPCM offset register + a Gen9 specific offset (144KB) for reserved GuC WOPCM. Fail to enforce such a restriction on GuC WOPCM size will lead to GuC firmware execution failures. On the other hand, with current static GuC WOPCM offset and size values (512KB for both offset and size), the GuC WOPCM size verification will fail on Gen9 even if it can be fixed by lowering the GuC WOPCM offset by calculating its value based on HuC firmware size (which is likely less than 200KB on Gen9), so that we can have a GuC WOPCM size value which is large enough to pass the GuC WOPCM size check. This patch updates the reserved GuC WOPCM size for RC6 context on Gen9 to 24KB to strictly align with the Gen9 GuC WOPCM layout. It also adds support to verify the GuC WOPCM size aganist the Gen9 hardware restrictions. To meet all above requirements, let's provide dynamic partitioning of the WOPCM that will be based on platform specific HuC/GuC firmware sizes. v2: - Removed intel_wopcm_init (Ville/Sagar/Joonas) - Renamed and Moved the intel_wopcm_partition into intel_guc (Sagar) - Removed unnecessary function calls (Joonas) - Init GuC WOPCM partition as soon as firmware fetching is completed v3: - Fixed indentation issues (Chris) - Removed layering violation code (Chris/Michal) - Created separat files for GuC wopcm code (Michal) - Used inline function to avoid code duplication (Michal) v4: - Preset the GuC WOPCM top during early GuC init (Chris) - Fail intel_uc_init_hw() as soon as GuC WOPCM partitioning failed v5: - Moved GuC DMA WOPCM register updating code into intel_wopcm.c - Took care of the locking status before writing to GuC DMA Write-Once registers. (Joonas) v6: - Made sure the GuC WOPCM size to be multiple of 4K (4K aligned) v8: - Updated comments and fixed naming issues (Sagar/Joonas) - Updated commit message to include more description about the hardware restriction on GuC WOPCM size (Sagar) v9: - Minor changes variable names and code comments (Sagar) - Added detailed GuC WOPCM layout drawing (Sagar/Michal) - Refined macro definitions to be reader friendly (Michal) - Removed redundent check to valid flag (Michal) - Unified first parameter for exported GuC WOPCM functions (Michal) - Refined the name and parameter list of hardware restriction checking functions (Michal) v10: - Used shorter function name for internal functions (Joonas) - Moved init-ealry function into c file (Joonas) - Consolidated and removed redundant size checks (Joonas/Michal) - Removed unnecessary unlikely() from code which is only called once during boot (Joonas) - More fixes to kernel-doc format and content (Michal) - Avoided the use of PAGE_MASK for 4K pages (Michal) - Added error log messages to error paths (Michal) v11: - Replaced intel_guc_wopcm with more generic intel_wopcm and attached intel_wopcm to drm_i915_private instead intel_guc (Michal) - dynamic calculation of GuC non-wopcm memory start (a.k.a WOPCM Top offset from GuC WOPCM base) (Michal) - Moved WOPCM marco definitions into .c source file (Michal) - Exported WOPCM layout diagram as kernel-doc (Michal) v12: - Updated naming, function kernel-doc to align with new changes (Michal) v13: - Updated the ordering of s-o-b/cc/r-b tags (Sagar) - Corrected one tense error in comment (Sagar) - Corrected typos and removed spurious comments (Joonas) Bspec: 12690 Signed-off-by: Jackie Li <yaodong.li@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com> Cc: Sujaritha Sundaresan <sujaritha.sundaresan@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Spotswood <john.a.spotswood@intel.com> Cc: Oscar Mateo <oscar.mateo@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Sagar Arun Kamble <sagar.a.kamble@intel.com> (v8) Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> (v9) Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> (v11) Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> (v12) Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/1520987574-19351-2-git-send-email-yaodong.li@intel.com
2018-03-13 17:32:50 -07:00
#include "intel_wopcm.h"
#include "i915_gem.h"
#include "i915_gem_gtt.h"
#include "i915_gpu_error.h"
#include "i915_perf_types.h"
#include "i915_request.h"
#include "i915_scheduler.h"
#include "gt/intel_timeline.h"
#include "i915_vma.h"
#include "i915_irq.h"
/* General customization:
*/
#define DRIVER_NAME "i915"
#define DRIVER_DESC "Intel Graphics"
#define DRIVER_DATE "20201103"
#define DRIVER_TIMESTAMP 1604406085
struct drm_i915_gem_object;
drm/i915: Add short HPD IRQ storm detection for non-MST systems Unfortunately, it seems that the HPD IRQ storm problem from the early days of Intel GPUs was never entirely solved, only mostly. Within the last couple of days, I got a bug report from one of our customers who had been having issues with their machine suddenly booting up very slowly after having updated. The amount of time it took to boot went from around 30 seconds, to over 6 minutes consistently. After some investigation, I discovered that i915 was reporting massive amounts of short HPD IRQ spam on this system from the DisplayPort port, despite there not being anything actually connected. The symptoms would start with one "long" HPD IRQ being detected at boot: [ 1.891398] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 0x00440000, dig 0x00440000, pins 0x000000a0 [ 1.891436] [drm:intel_hpd_irq_handler [i915]] digital hpd port B - long [ 1.891472] [drm:intel_hpd_irq_handler [i915]] Received HPD interrupt on PIN 5 - cnt: 0 [ 1.891508] [drm:intel_hpd_irq_handler [i915]] digital hpd port D - long [ 1.891544] [drm:intel_hpd_irq_handler [i915]] Received HPD interrupt on PIN 7 - cnt: 0 [ 1.891592] [drm:intel_dp_hpd_pulse [i915]] got hpd irq on port B - long [ 1.891628] [drm:intel_dp_hpd_pulse [i915]] got hpd irq on port D - long … followed by constant short IRQs afterwards: [ 1.895091] [drm:intel_encoder_hotplug [i915]] [CONNECTOR:66:DP-1] status updated from unknown to disconnected [ 1.895129] [drm:i915_hotplug_work_func [i915]] Connector DP-3 (pin 7) received hotplug event. [ 1.895165] [drm:intel_dp_detect [i915]] [CONNECTOR:72:DP-3] [ 1.895275] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 0x00200000, dig 0x00200000, pins 0x00000080 [ 1.895312] [drm:intel_hpd_irq_handler [i915]] digital hpd port D - short [ 1.895762] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 0x00200000, dig 0x00200000, pins 0x00000080 [ 1.895799] [drm:intel_hpd_irq_handler [i915]] digital hpd port D - short [ 1.896239] [drm:intel_dp_aux_xfer [i915]] dp_aux_ch timeout status 0x71450085 [ 1.896293] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 0x00200000, dig 0x00200000, pins 0x00000080 [ 1.896330] [drm:intel_hpd_irq_handler [i915]] digital hpd port D - short [ 1.896781] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 0x00200000, dig 0x00200000, pins 0x00000080 [ 1.896817] [drm:intel_hpd_irq_handler [i915]] digital hpd port D - short [ 1.897275] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 0x00200000, dig 0x00200000, pins 0x00000080 The customer's system in question has a GM45 GPU, which is apparently well known for hotplugging storms. So, workaround this impressively broken hardware by changing the default HPD storm threshold from 5 to 50. Then, make long IRQs count for 10, and short IRQs count for 1. This makes it so that 5 long IRQs will trigger an HPD storm, and on systems with short HPD storm detection 50 short IRQs will trigger an HPD storm. 50 short IRQs amounts to 100ms of constant pulsing, which seems like a good middleground between being too sensitive and not being sensitive enough (which would cause visible stutters in userspace every time a storm occurs). And just to be extra safe: we don't enable this by default on systems with MST support. There's too high of a chance of MST support triggering storm detection, and systems that are new enough to support MST are a lot less likely to have issues with IRQ storms anyway. As a note: this patch was tested using a ThinkPad T450s and a Chamelium to simulate the short IRQ storms. Changes since v1: - Don't use two separate thresholds, just make long IRQs count for 10 each and short IRQs count for 1. This simplifies the code a bit - Ville Syrjälä Changes since v2: - Document @long_hpd in intel_hpd_irq_storm_detect, no functional changes Changes since v4: - Remove !! in long_hpd assignment - Ville Syrjälä - queue_hp = true - Ville Syrjälä Signed-off-by: Lyude Paul <lyude@redhat.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181106213017.14563-6-lyude@redhat.com
2018-11-06 16:30:16 -05:00
/* Threshold == 5 for long IRQs, 50 for short */
#define HPD_STORM_DEFAULT_THRESHOLD 50
struct i915_hotplug {
struct delayed_work hotplug_work;
const u32 *hpd, *pch_hpd;
struct {
unsigned long last_jiffies;
int count;
enum {
HPD_ENABLED = 0,
HPD_DISABLED = 1,
HPD_MARK_DISABLED = 2
} state;
} stats[HPD_NUM_PINS];
u32 event_bits;
u32 retry_bits;
struct delayed_work reenable_work;
u32 long_port_mask;
u32 short_port_mask;
struct work_struct dig_port_work;
drm/i915: Enable polling when we don't have hpd Unfortunately, there's two situations where we lose hpd right now: - Runtime suspend - When we've shut off all of the power wells on Valleyview/Cherryview While it would be nice if this didn't cause issues, this has the ability to get us in some awkward states where a user won't be able to get their display to turn on. For instance; if we boot a Valleyview system without any monitors connected, it won't need any of it's power wells and thus shut them off. Since this causes us to lose HPD, this means that unless the user knows how to ssh into their machine and do a manual reprobe for monitors, none of the monitors they connect after booting will actually work. Eventually we should come up with a better fix then having to enable polling for this, since this makes rpm a lot less useful, but for now the infrastructure in i915 just isn't there yet to get hpd in these situations. Changes since v1: - Add comment explaining the addition of the if (!mode_config->poll_running) in intel_hpd_init() - Remove unneeded if (!dev->mode_config.poll_enabled) in i915_hpd_poll_init_work() - Call to drm_helper_hpd_irq_event() after we disable polling - Add cancel_work_sync() call to intel_hpd_cancel_work() Changes since v2: - Apparently dev->mode_config.poll_running doesn't actually reflect whether or not a poll is currently in progress, and is actually used for dynamic module paramter enabling/disabling. So now we instead keep track of our own poll_running variable in dev_priv->hotplug - Clean i915_hpd_poll_init_work() a little bit Changes since v3: - Remove the now-redundant connector loop in intel_hpd_init(), just rely on intel_hpd_poll_enable() for setting connector->polled correctly on each connector - Get rid of poll_running - Don't assign enabled in i915_hpd_poll_init_work before we actually lock dev->mode_config.mutex - Wrap enabled assignment in i915_hpd_poll_init_work() in READ_ONCE() for doc purposes - Do the same for dev_priv->hotplug.poll_enabled with WRITE_ONCE in intel_hpd_poll_enable() - Add some comments about racing not mattering in intel_hpd_poll_enable Changes since v4: - Rename intel_hpd_poll_enable() to intel_hpd_poll_init() - Drop the bool argument from intel_hpd_poll_init() - Remove redundant calls to intel_hpd_poll_init() - Rename poll_enable_work to poll_init_work - Add some kerneldoc for intel_hpd_poll_init() - Cross-reference intel_hpd_poll_init() in intel_hpd_init() - Just copy the loop from intel_hpd_init() in intel_hpd_poll_init() Changes since v5: - Minor kerneldoc nitpicks Cc: stable@vger.kernel.org Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Lyude <cpaul@redhat.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2016-06-21 17:03:44 -04:00
struct work_struct poll_init_work;
bool poll_enabled;
unsigned int hpd_storm_threshold;
drm/i915: Add short HPD IRQ storm detection for non-MST systems Unfortunately, it seems that the HPD IRQ storm problem from the early days of Intel GPUs was never entirely solved, only mostly. Within the last couple of days, I got a bug report from one of our customers who had been having issues with their machine suddenly booting up very slowly after having updated. The amount of time it took to boot went from around 30 seconds, to over 6 minutes consistently. After some investigation, I discovered that i915 was reporting massive amounts of short HPD IRQ spam on this system from the DisplayPort port, despite there not being anything actually connected. The symptoms would start with one "long" HPD IRQ being detected at boot: [ 1.891398] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 0x00440000, dig 0x00440000, pins 0x000000a0 [ 1.891436] [drm:intel_hpd_irq_handler [i915]] digital hpd port B - long [ 1.891472] [drm:intel_hpd_irq_handler [i915]] Received HPD interrupt on PIN 5 - cnt: 0 [ 1.891508] [drm:intel_hpd_irq_handler [i915]] digital hpd port D - long [ 1.891544] [drm:intel_hpd_irq_handler [i915]] Received HPD interrupt on PIN 7 - cnt: 0 [ 1.891592] [drm:intel_dp_hpd_pulse [i915]] got hpd irq on port B - long [ 1.891628] [drm:intel_dp_hpd_pulse [i915]] got hpd irq on port D - long … followed by constant short IRQs afterwards: [ 1.895091] [drm:intel_encoder_hotplug [i915]] [CONNECTOR:66:DP-1] status updated from unknown to disconnected [ 1.895129] [drm:i915_hotplug_work_func [i915]] Connector DP-3 (pin 7) received hotplug event. [ 1.895165] [drm:intel_dp_detect [i915]] [CONNECTOR:72:DP-3] [ 1.895275] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 0x00200000, dig 0x00200000, pins 0x00000080 [ 1.895312] [drm:intel_hpd_irq_handler [i915]] digital hpd port D - short [ 1.895762] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 0x00200000, dig 0x00200000, pins 0x00000080 [ 1.895799] [drm:intel_hpd_irq_handler [i915]] digital hpd port D - short [ 1.896239] [drm:intel_dp_aux_xfer [i915]] dp_aux_ch timeout status 0x71450085 [ 1.896293] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 0x00200000, dig 0x00200000, pins 0x00000080 [ 1.896330] [drm:intel_hpd_irq_handler [i915]] digital hpd port D - short [ 1.896781] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 0x00200000, dig 0x00200000, pins 0x00000080 [ 1.896817] [drm:intel_hpd_irq_handler [i915]] digital hpd port D - short [ 1.897275] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 0x00200000, dig 0x00200000, pins 0x00000080 The customer's system in question has a GM45 GPU, which is apparently well known for hotplugging storms. So, workaround this impressively broken hardware by changing the default HPD storm threshold from 5 to 50. Then, make long IRQs count for 10, and short IRQs count for 1. This makes it so that 5 long IRQs will trigger an HPD storm, and on systems with short HPD storm detection 50 short IRQs will trigger an HPD storm. 50 short IRQs amounts to 100ms of constant pulsing, which seems like a good middleground between being too sensitive and not being sensitive enough (which would cause visible stutters in userspace every time a storm occurs). And just to be extra safe: we don't enable this by default on systems with MST support. There's too high of a chance of MST support triggering storm detection, and systems that are new enough to support MST are a lot less likely to have issues with IRQ storms anyway. As a note: this patch was tested using a ThinkPad T450s and a Chamelium to simulate the short IRQ storms. Changes since v1: - Don't use two separate thresholds, just make long IRQs count for 10 each and short IRQs count for 1. This simplifies the code a bit - Ville Syrjälä Changes since v2: - Document @long_hpd in intel_hpd_irq_storm_detect, no functional changes Changes since v4: - Remove !! in long_hpd assignment - Ville Syrjälä - queue_hp = true - Ville Syrjälä Signed-off-by: Lyude Paul <lyude@redhat.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181106213017.14563-6-lyude@redhat.com
2018-11-06 16:30:16 -05:00
/* Whether or not to count short HPD IRQs in HPD storms */
u8 hpd_short_storm_enabled;
/*
* if we get a HPD irq from DP and a HPD irq from non-DP
* the non-DP HPD could block the workqueue on a mode config
* mutex getting, that userspace may have taken. However
* userspace is waiting on the DP workqueue to run which is
* blocked behind the non-DP one.
*/
struct workqueue_struct *dp_wq;
};
#define I915_GEM_GPU_DOMAINS \
(I915_GEM_DOMAIN_RENDER | \
I915_GEM_DOMAIN_SAMPLER | \
I915_GEM_DOMAIN_COMMAND | \
I915_GEM_DOMAIN_INSTRUCTION | \
I915_GEM_DOMAIN_VERTEX)
struct drm_i915_private;
drm/i915: Prevent recursive deadlock on releasing a busy userptr During release of the GEM object we hold the struct_mutex. As the object may be holding onto the last reference for the task->mm, calling mmput() may trigger exit_mmap() which close the vma which will call drm_gem_vm_close() and attempt to reacquire the struct_mutex. In order to avoid that recursion, we have to defer the mmput() until after we drop the struct_mutex, i.e. we need to schedule a worker to do the clean up. A further issue spotted by Tvrtko was caused when we took a GTT mmapping of a userptr buffer object. In that case, we would never call mmput as the object would be cyclically referenced by the GTT mmapping and not freed upon process exit - keeping the entire process mm alive after the process task was reaped. The fix employed is to replace the mm_users/mmput() reference handling to mm_count/mmdrop() for the shared i915_mm_struct. INFO: task test_surfaces:1632 blocked for more than 120 seconds.       Tainted: GF          O 3.14.5+ #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. test_surfaces   D 0000000000000000     0  1632   1590 0x00000082  ffff88014914baa8 0000000000000046 0000000000000000 ffff88014914a010  0000000000012c40 0000000000012c40 ffff8800a0058210 ffff88014784b010  ffff88014914a010 ffff880037b1c820 ffff8800a0058210 ffff880037b1c824 Call Trace:  [<ffffffff81582499>] schedule+0x29/0x70  [<ffffffff815825fe>] schedule_preempt_disabled+0xe/0x10  [<ffffffff81583b93>] __mutex_lock_slowpath+0x183/0x220  [<ffffffff81583c53>] mutex_lock+0x23/0x40  [<ffffffffa005c2a3>] drm_gem_vm_close+0x33/0x70 [drm]  [<ffffffff8115a483>] remove_vma+0x33/0x70  [<ffffffff8115a5dc>] exit_mmap+0x11c/0x170  [<ffffffff8104d6eb>] mmput+0x6b/0x100  [<ffffffffa00f44b9>] i915_gem_userptr_release+0x89/0xc0 [i915]  [<ffffffffa00e6706>] i915_gem_free_object+0x126/0x250 [i915]  [<ffffffffa005c06a>] drm_gem_object_free+0x2a/0x40 [drm]  [<ffffffffa005cc32>] drm_gem_object_handle_unreference_unlocked+0xe2/0x120 [drm]  [<ffffffffa005ccd4>] drm_gem_object_release_handle+0x64/0x90 [drm]  [<ffffffff8127ffeb>] idr_for_each+0xab/0x100  [<ffffffffa005cc70>] ? drm_gem_object_handle_unreference_unlocked+0x120/0x120 [drm]  [<ffffffff81583c46>] ? mutex_lock+0x16/0x40  [<ffffffffa005c354>] drm_gem_release+0x24/0x40 [drm]  [<ffffffffa005b82b>] drm_release+0x3fb/0x480 [drm]  [<ffffffff8118d482>] __fput+0xb2/0x260  [<ffffffff8118d6de>] ____fput+0xe/0x10  [<ffffffff8106f27f>] task_work_run+0x8f/0xf0  [<ffffffff81052228>] do_exit+0x1a8/0x480  [<ffffffff81052551>] do_group_exit+0x51/0xc0  [<ffffffff810525d7>] SyS_exit_group+0x17/0x20  [<ffffffff8158e092>] system_call_fastpath+0x16/0x1b v2: Incorporate feedback from Tvrtko and remove the unnessary mm referencing when creating the i915_mm_struct and improve some of the function names and comments. Reported-by: Jacek Danecki <jacek.danecki@intel.com> Test-case: igt/gem_userptr_blits/process-exit* Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Tested-by: "Gong, Zhipeng" <zhipeng.gong@intel.com> Cc: Jacek Danecki <jacek.danecki@intel.com> Cc: "Ursulin, Tvrtko" <tvrtko.ursulin@intel.com> Reviewed-by: "Ursulin, Tvrtko" <tvrtko.ursulin@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: stable@vger.kernel.org # hold off until 3.17 ships for additional testing Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2014-08-07 14:20:40 +01:00
struct i915_mm_struct;
drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl By exporting the ability to map user address and inserting PTEs representing their backing pages into the GTT, we can exploit UMA in order to utilize normal application data as a texture source or even as a render target (depending upon the capabilities of the chipset). This has a number of uses, with zero-copy downloads to the GPU and efficient readback making the intermixed streaming of CPU and GPU operations fairly efficient. This ability has many widespread implications from faster rendering of client-side software rasterisers (chromium), mitigation of stalls due to read back (firefox) and to faster pipelining of texture data (such as pixel buffer objects in GL or data blobs in CL). v2: Compile with CONFIG_MMU_NOTIFIER v3: We can sleep while performing invalidate-range, which we can utilise to drop our page references prior to the kernel manipulating the vma (for either discard or cloning) and so protect normal users. v4: Only run the invalidate notifier if the range intercepts the bo. v5: Prevent userspace from attempting to GTT mmap non-page aligned buffers v6: Recheck after reacquire mutex for lost mmu. v7: Fix implicit padding of ioctl struct by rounding to next 64bit boundary. v8: Fix rebasing error after forwarding porting the back port. v9: Limit the userptr to page aligned entries. We now expect userspace to handle all the offset-in-page adjustments itself. v10: Prevent vma from being copied across fork to avoid issues with cow. v11: Drop vma behaviour changes -- locking is nigh on impossible. Use a worker to load user pages to avoid lock inversions. v12: Use get_task_mm()/mmput() for correct refcounting of mm. v13: Use a worker to release the mmu_notifier to avoid lock inversion v14: Decouple mmu_notifier from struct_mutex using a custom mmu_notifer with its own locking and tree of objects for each mm/mmu_notifier. v15: Prevent overlapping userptr objects, and invalidate all objects within the mmu_notifier range v16: Fix a typo for iterating over multiple objects in the range and rearrange error path to destroy the mmu_notifier locklessly. Also close a race between invalidate_range and the get_pages_worker. v17: Close a race between get_pages_worker/invalidate_range and fresh allocations of the same userptr range - and notice that struct_mutex was presumed to be held when during creation it wasn't. v18: Sigh. Fix the refactor of st_set_pages() to allocate enough memory for the struct sg_table and to clear it before reporting an error. v19: Always error out on read-only userptr requests as we don't have the hardware infrastructure to support them at the moment. v20: Refuse to implement read-only support until we have the required infrastructure - but reserve the bit in flags for future use. v21: use_mm() is not required for get_user_pages(). It is only meant to be used to fix up the kernel thread's current->mm for use with copy_user(). v22: Use sg_alloc_table_from_pages for that chunky feeling v23: Export a function for sanity checking dma-buf rather than encode userptr details elsewhere, and clean up comments based on suggestions by Bradley. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: "Gong, Zhipeng" <zhipeng.gong@intel.com> Cc: Akash Goel <akash.goel@intel.com> Cc: "Volkin, Bradley D" <bradley.d.volkin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com> [danvet: Frob ioctl allocation to pick the next one - will cause a bit of fuss with create2 apparently, but such are the rules.] [danvet2: oops, forgot to git add after manual patch application] [danvet3: Appease sparse.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-16 14:22:37 +01:00
struct i915_mmu_object;
struct drm_i915_file_private {
struct drm_i915_private *dev_priv;
drm/i915: Keep drm_i915_file_private around under RCU Ensure that the drm_i915_file_private continues to exist as we attempt to remove a request from its list, which may race with the destruction of the file. <6> [38.380714] [IGT] gem_ctx_create: starting subtest basic-files <0> [42.201329] BUG: spinlock bad magic on CPU#0, kworker/u16:0/7 <4> [42.201356] general protection fault: 0000 [#1] PREEMPT SMP PTI <4> [42.201371] CPU: 0 PID: 7 Comm: kworker/u16:0 Tainted: G U 5.3.0-rc5-CI-Patchwork_14169+ #1 <4> [42.201391] Hardware name: Dell Inc. OptiPlex 745 /0GW726, BIOS 2.3.1 05/21/2007 <4> [42.201594] Workqueue: i915 retire_work_handler [i915] <4> [42.201614] RIP: 0010:spin_dump+0x5a/0x90 <4> [42.201625] Code: 00 48 8d 88 c0 06 00 00 48 c7 c7 00 71 09 82 e8 35 ef 00 00 48 85 db 44 8b 4d 08 41 b8 ff ff ff ff 48 c7 c1 0b cd 0f 82 74 0e <44> 8b 83 e0 04 00 00 48 8d 8b c0 06 00 00 8b 55 04 48 89 ee 48 c7 <4> [42.201660] RSP: 0018:ffffc9000004bd80 EFLAGS: 00010202 <4> [42.201673] RAX: 0000000000000031 RBX: 6b6b6b6b6b6b6b6b RCX: ffffffff820fcd0b <4> [42.201688] RDX: 0000000000000000 RSI: ffff88803de266f8 RDI: 00000000ffffffff <4> [42.201703] RBP: ffff888038381ff8 R08: 00000000ffffffff R09: 000000006b6b6b6b <4> [42.201718] R10: 0000000041cb0b89 R11: 646162206b636f6c R12: ffff88802a618500 <4> [42.201733] R13: ffff88802b32c288 R14: ffff888038381ff8 R15: ffff88802b32c250 <4> [42.201748] FS: 0000000000000000(0000) GS:ffff88803de00000(0000) knlGS:0000000000000000 <4> [42.201765] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4> [42.201778] CR2: 00007f2cefc6d180 CR3: 00000000381ee000 CR4: 00000000000006f0 <4> [42.201793] Call Trace: <4> [42.201805] do_raw_spin_lock+0x66/0xb0 <4> [42.201898] i915_request_retire+0x548/0x7c0 [i915] <4> [42.201989] retire_requests+0x4d/0x60 [i915] <4> [42.202078] i915_retire_requests+0x144/0x2e0 [i915] <4> [42.202169] retire_work_handler+0x10/0x40 [i915] Recently, in commit 44c22f3f1a0a ("drm/i915: Serialize insertion into the file->mm.request_list"), we fixed a race on insertion. Now, it appears we also have a race with destruction! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190823181455.31910-1-chris@chris-wilson.co.uk
2019-08-23 19:14:55 +01:00
union {
struct drm_file *file;
struct rcu_head rcu;
};
drm/i915/gem: Delay context creation (v3) The current context uAPI allows for two methods of setting context parameters: SET_CONTEXT_PARAM and CONTEXT_CREATE_EXT_SETPARAM. The former is allowed to be called at any time while the later happens as part of GEM_CONTEXT_CREATE. Currently, everything settable via one is settable via the other. While some params are fairly simple and setting them on a live context is harmless such as the context priority, others are far trickier such as the VM or the set of engines. In order to swap out the VM, for instance, we have to delay until all current in-flight work is complete, swap in the new VM, and then continue. This leads to a plethora of potential race conditions we'd really rather avoid. In previous patches, we added a i915_gem_proto_context struct which is capable of storing and tracking all such create parameters. This commit delays the creation of the actual context until after the client is done configuring it with SET_CONTEXT_PARAM. From the perspective of the client, it has the same u32 context ID the whole time. From the perspective of i915, however, it's an i915_gem_proto_context right up until the point where we attempt to do something which the proto-context can't handle. Then the real context gets created. This is accomplished via a little xarray dance. When GEM_CONTEXT_CREATE is called, we create a proto-context, reserve a slot in context_xa but leave it NULL, the proto-context in the corresponding slot in proto_context_xa. Then, whenever we go to look up a context, we first check context_xa. If it's there, we return the i915_gem_context and we're done. If it's not, we look in proto_context_xa and, if we find it there, we create the actual context and kill the proto-context. In order for this dance to work properly, everything which ever touches a proto-context is guarded by drm_i915_file_private::proto_context_lock, including context creation. Yes, this means context creation now takes a giant global lock but it can't really be helped and that should never be on any driver's fast-path anyway. v2 (Daniel Vetter): - Commit message grammatical fixes. - Use WARN_ON instead of GEM_BUG_ON - Rename lazy_create_context_locked to finalize_create_context_locked - Rework the control-flow logic in the setparam ioctl - Better documentation all around v3 (kernel test robot): - Make finalize_create_context_locked static Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210708154835.528166-25-jason@jlekstrand.net
2021-07-08 10:48:29 -05:00
/** @proto_context_lock: Guards all struct i915_gem_proto_context
* operations
*
* This not only guards @proto_context_xa, but is always held
* whenever we manipulate any struct i915_gem_proto_context,
* including finalizing it on first actual use of the GEM context.
*
* See i915_gem_proto_context.
*/
struct mutex proto_context_lock;
/** @proto_context_xa: xarray of struct i915_gem_proto_context
*
* Historically, the context uAPI allowed for two methods of
* setting context parameters: SET_CONTEXT_PARAM and
* CONTEXT_CREATE_EXT_SETPARAM. The former is allowed to be called
* at any time while the later happens as part of
* GEM_CONTEXT_CREATE. Everything settable via one was settable
* via the other. While some params are fairly simple and setting
* them on a live context is harmless such as the context priority,
* others are far trickier such as the VM or the set of engines.
* In order to swap out the VM, for instance, we have to delay
* until all current in-flight work is complete, swap in the new
* VM, and then continue. This leads to a plethora of potential
* race conditions we'd really rather avoid.
*
* We have since disallowed setting these more complex parameters
* on active contexts. This works by delaying the creation of the
* actual context until after the client is done configuring it
* with SET_CONTEXT_PARAM. From the perspective of the client, it
* has the same u32 context ID the whole time. From the
* perspective of i915, however, it's a struct i915_gem_proto_context
* right up until the point where we attempt to do something which
* the proto-context can't handle. Then the struct i915_gem_context
* gets created.
*
* This is accomplished via a little xarray dance. When
* GEM_CONTEXT_CREATE is called, we create a struct
* i915_gem_proto_context, reserve a slot in @context_xa but leave
* it NULL, and place the proto-context in the corresponding slot
* in @proto_context_xa. Then, in i915_gem_context_lookup(), we
* first check @context_xa. If it's there, we return the struct
* i915_gem_context and we're done. If it's not, we look in
* @proto_context_xa and, if we find it there, we create the actual
* context and kill the proto-context.
*
* In order for this dance to work properly, everything which ever
* touches a struct i915_gem_proto_context is guarded by
* @proto_context_lock, including context creation. Yes, this
* means context creation now takes a giant global lock but it
* can't really be helped and that should never be on any driver's
* fast-path anyway.
*/
struct xarray proto_context_xa;
/** @context_xa: xarray of fully created i915_gem_context
*
* Write access to this xarray is guarded by @proto_context_lock.
* Otherwise, writers may race with finalize_create_context_locked().
*
* See @proto_context_xa.
*/
struct xarray context_xa;
struct xarray vm_xa;
unsigned int bsd_engine;
/*
* Every context ban increments per client ban score. Also
* hangs in short succession increments ban score. If ban threshold
* is reached, client is considered banned and submitting more work
* will fail. This is a stop gap measure to limit the badly behaving
* clients access to gpu. Note that unbannable contexts never increment
* the client ban score.
*/
#define I915_CLIENT_SCORE_HANG_FAST 1
#define I915_CLIENT_FAST_HANG_JIFFIES (60 * HZ)
#define I915_CLIENT_SCORE_CONTEXT_BAN 3
#define I915_CLIENT_SCORE_BANNED 9
/** ban_score: Accumulated score of all ctx bans and fast hangs. */
atomic_t ban_score;
unsigned long hang_timestamp;
};
/* Interface history:
*
* 1.1: Original.
* 1.2: Add Power Management
* 1.3: Add vblank support
* 1.4: Fix cmdbuffer path, add heap destroy
* 1.5: Add vblank pipe configuration
* 1.6: - New ioctl for scheduling buffer swaps on vertical blank
* - Support vertical blank on secondary display pipe
*/
#define DRIVER_MAJOR 1
#define DRIVER_MINOR 6
#define DRIVER_PATCHLEVEL 0
struct intel_overlay;
struct intel_overlay_error_state;
struct sdvo_device_mapping {
u8 initialized;
u8 dvo_port;
u8 slave_addr;
u8 dvo_wiring;
u8 i2c_pin;
u8 ddc_pin;
};
struct intel_connector;
struct intel_encoder;
struct intel_atomic_state;
struct intel_cdclk_config;
struct intel_cdclk_funcs;
struct intel_cdclk_state;
struct intel_cdclk_vals;
struct intel_initial_plane_config;
struct intel_crtc;
drm/i915: move find_pll callback to dev_priv->display Now that the DP madness is cleared out, this is all only per-platform. So move it out from the intel clock limits structure. While at it drop the intel prefix on the static functions, call the vtable entry find_dpll (since it's for the display pll) and rip out the now unnecessary forward declarations. Note that the parameters of ->find_dpll are still unchanged, but they eventually need to be moved over to just take in a pipe configuration. But currently a lot of things are still missing from the pipe configuration (reflock, output-specific dpll limits and preferences, downclocked dotclock). So this will happen in a later step. Note that intel_g4x_limit has a peculiar case where it selects intel_limits_i9xx_sdvo as the limit. This is pretty bogus and also not used since the only output types left are DP and native TV-out which both use special pre-tuned dpll values. v2: Re-add comment for the find_pll callback (requested by Paulo) and elaborate on why the transformation is correct for g4x platforms (to clarify a review question from Paulo). Double up on that by adding a WARN as suggested by Paulo Zanoni on irc. v3: Initialize limits to NULL since gcc is now unhappy. v4: v2/3 will blow up with a NULL dereference in ->find_dpll for dp and TV-out ports, spotted by Paulo on irc. So just give up on this madness for now, and leave this to be fixed in a later patch. v5: Since the ever-so-slight change for g4x might result in some dpll parameter computation failing spuriously where before it didn't for ports with preset dpll settings (DP & TV-out) override this. For paranoia also do it in the ilk+ code. Cc: Paulo Zanoni <przanoni@gmail.com> Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-06-03 22:40:22 +02:00
struct intel_limit;
struct dpll;
/* functions used internal in intel_pm.c */
struct drm_i915_clock_gating_funcs {
void (*init_clock_gating)(struct drm_i915_private *dev_priv);
};
/* functions used for watermark calcs for display. */
struct drm_i915_wm_disp_funcs {
/* update_wm is for legacy wm management */
void (*update_wm)(struct drm_i915_private *dev_priv);
int (*compute_pipe_wm)(struct intel_atomic_state *state,
struct intel_crtc *crtc);
int (*compute_intermediate_wm)(struct intel_atomic_state *state,
struct intel_crtc *crtc);
void (*initial_watermarks)(struct intel_atomic_state *state,
struct intel_crtc *crtc);
void (*atomic_update_watermarks)(struct intel_atomic_state *state,
struct intel_crtc *crtc);
void (*optimize_watermarks)(struct intel_atomic_state *state,
struct intel_crtc *crtc);
int (*compute_global_watermarks)(struct intel_atomic_state *state);
};
struct intel_color_funcs {
int (*color_check)(struct intel_crtc_state *crtc_state);
/*
* Program double buffered color management registers during
* vblank evasion. The registers should then latch during the
* next vblank start, alongside any other double buffered registers
* involved with the same commit.
*/
void (*color_commit)(const struct intel_crtc_state *crtc_state);
/*
* Load LUTs (and other single buffered color management
* registers). Will (hopefully) be called during the vblank
* following the latching of any double buffered registers
* involved with the same commit.
*/
void (*load_luts)(const struct intel_crtc_state *crtc_state);
void (*read_luts)(struct intel_crtc_state *crtc_state);
};
struct intel_hotplug_funcs {
void (*hpd_irq_setup)(struct drm_i915_private *dev_priv);
};
struct intel_fdi_funcs {
void (*fdi_link_train)(struct intel_crtc *crtc,
const struct intel_crtc_state *crtc_state);
};
struct intel_dpll_funcs {
int (*crtc_compute_clock)(struct intel_crtc_state *crtc_state);
};
struct drm_i915_display_funcs {
/* Returns the active state of the crtc, and if the crtc is active,
* fills out the pipe-config with the hw state. */
bool (*get_pipe_config)(struct intel_crtc *,
struct intel_crtc_state *);
void (*get_initial_plane_config)(struct intel_crtc *,
struct intel_initial_plane_config *);
void (*crtc_enable)(struct intel_atomic_state *state,
struct intel_crtc *crtc);
void (*crtc_disable)(struct intel_atomic_state *state,
struct intel_crtc *crtc);
void (*commit_modeset_enables)(struct intel_atomic_state *state);
};
#define I915_COLOR_UNEVICTABLE (-1) /* a non-vma sharing the address space */
/*
* HIGH_RR is the highest eDP panel refresh rate read from EDID
* LOW_RR is the lowest eDP panel refresh rate found from EDID
* parsing for same resolution.
*/
enum drrs_refresh_rate_type {
DRRS_HIGH_RR,
DRRS_LOW_RR,
DRRS_MAX_RR, /* RR count */
};
enum drrs_support_type {
DRRS_NOT_SUPPORTED = 0,
STATIC_DRRS_SUPPORT = 1,
SEAMLESS_DRRS_SUPPORT = 2
drm/i915: Add support for DRRS to switch RR This patch computes and stored 2nd M/N/TU for switching to different refresh rate dynamically. PIPECONF_EDP_RR_MODE_SWITCH bit helps toggle between alternate refresh rates programmed in 2nd M/N/TU registers. v2: Daniel's review comments Computing M2/N2 in compute_config and storing it in crtc_config v3: Modified reference to edp_downclock and edp_downclock_avail based on the changes made to move them from dev_private to intel_panel. v4: Modified references to is_drrs_supported based on the changes made to rename it to drrs_support. v5: Jani's review comments Removed superfluous return statements. Changed support for Gen 7 and above. Corrected indentation. Re-structured the code which finds crtc and connector from encoder. Changed some logs to be less verbose. v6: Modifying i915_drrs to include only intel connector as intel_dp can be derived from intel connector when required. v7: As per internal review comments, acquiring mutex just before accessing drrs RR. As per Chris's review comments, added documentation about the use of locking in the function. v8: Incorporated Jani's review comments. Removed reference to edp_downclock. v9: Jani's review comments. Modified comment in set_drrs. Changed index to type edp_drrs_refresh_rate_type. Check if PSR is enabled before setting registers fo DRRS. Signed-off-by: Pradeep Bhat <pradeep.bhat@intel.com> Signed-off-by: Vandana Kannan <vandana.kannan@intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-04-05 12:13:28 +05:30
};
struct intel_dp;
struct i915_drrs {
struct mutex mutex;
struct delayed_work work;
struct intel_dp *dp;
unsigned busy_frontbuffer_bits;
enum drrs_refresh_rate_type refresh_rate_type;
enum drrs_support_type type;
};
#define QUIRK_LVDS_SSC_DISABLE (1<<1)
#define QUIRK_INVERT_BRIGHTNESS (1<<2)
#define QUIRK_BACKLIGHT_PRESENT (1<<3)
#define QUIRK_PIN_SWIZZLED_PAGES (1<<5)
#define QUIRK_INCREASE_T12_DELAY (1<<6)
#define QUIRK_INCREASE_DDI_DISABLED_TIME (1<<7)
#define QUIRK_NO_PPS_BACKLIGHT_POWER_HOOK (1<<8)
struct intel_fbdev;
struct intel_gmbus {
struct i2c_adapter adapter;
#define GMBUS_FORCE_BIT_RETRY (1U << 31)
u32 force_bit;
u32 reg0;
drm/i915: Type safe register read/write Make I915_READ and I915_WRITE more type safe by wrapping the register offset in a struct. This should eliminate most of the fumbles we've had with misplaced parens. This only takes care of normal mmio registers. We could extend the idea to other register types and define each with its own struct. That way you wouldn't be able to accidentally pass the wrong thing to a specific register access function. The gpio_reg setup is probably the ugliest thing left. But I figure I'd just leave it for now, and wait for some divine inspiration to strike before making it nice. As for the generated code, it's actually a bit better sometimes. Eg. looking at i915_irq_handler(), we can see the following change: lea 0x70024(%rdx,%rax,1),%r9d mov $0x1,%edx - movslq %r9d,%r9 - mov %r9,%rsi - mov %r9,-0x58(%rbp) - callq *0xd8(%rbx) + mov %r9d,%esi + mov %r9d,-0x48(%rbp) callq *0xd8(%rbx) So previously gcc thought the register offset might be signed and decided to sign extend it, just in case. The rest appears to be mostly just minor shuffling of instructions. v2: i915_mmio_reg_{offset,equal,valid}() helpers added s/_REG/_MMIO/ in the register defines mo more switch statements left to worry about ring_emit stuff got sorted in a prep patch cmd parser, lrc context and w/a batch buildup also in prep patch vgpu stuff cleaned up and moved to a prep patch all other unrelated changes split out v3: Rebased due to BXT DSI/BLC, MOCS, etc. v4: Rebased due to churn, s/i915_mmio_reg_t/i915_reg_t/ Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1447853606-2751-1-git-send-email-ville.syrjala@linux.intel.com
2015-11-18 15:33:26 +02:00
i915_reg_t gpio_reg;
struct i2c_algo_bit_data bit_algo;
struct drm_i915_private *dev_priv;
};
struct i915_suspend_saved_registers {
u32 saveDSPARB;
u32 saveSWF0[16];
u32 saveSWF1[16];
u32 saveSWF3[3];
u16 saveGCDGMBUS;
};
struct vlv_s0ix_state;
#define MAX_L3_SLICES 2
struct intel_l3_parity {
u32 *remap_info[MAX_L3_SLICES];
struct work_struct error_work;
int which_slice;
};
struct i915_gem_mm {
/*
* Shortcut for the stolen region. This points to either
* INTEL_REGION_STOLEN_SMEM for integrated platforms, or
* INTEL_REGION_STOLEN_LMEM for discrete, or NULL if the device doesn't
* support stolen.
*/
struct intel_memory_region *stolen_region;
/** Memory allocator for GTT stolen memory */
struct drm_mm stolen;
/** Protects the usage of the GTT stolen memory allocator. This is
* always the inner lock when overlapping with struct_mutex. */
struct mutex stolen_lock;
/* Protects bound_list/unbound_list and #drm_i915_gem_object.mm.link */
spinlock_t obj_lock;
/**
* List of objects which are purgeable.
*/
struct list_head purge_list;
/**
* List of objects which have allocated pages and are shrinkable.
*/
struct list_head shrink_list;
/**
* List of objects which are pending destruction.
*/
struct llist_head free_list;
struct work_struct free_work;
/**
* Count of objects pending destructions. Used to skip needlessly
* waiting on an RCU barrier if no objects are waiting to be freed.
*/
atomic_t free_count;
/**
* tmpfs instance used for shmem backed objects
*/
struct vfsmount *gemfs;
struct intel_memory_region *regions[INTEL_REGION_UNKNOWN];
struct notifier_block oom_notifier;
struct notifier_block vmap_notifier;
struct shrinker shrinker;
#ifdef CONFIG_MMU_NOTIFIER
/**
drm/i915: Fix userptr so we do not have to worry about obj->mm.lock, v7. Instead of doing what we do currently, which will never work with PROVE_LOCKING, do the same as AMD does, and something similar to relocation slowpath. When all locks are dropped, we acquire the pages for pinning. When the locks are taken, we transfer those pages in .get_pages() to the bo. As a final check before installing the fences, we ensure that the mmu notifier was not called; if it is, we return -EAGAIN to userspace to signal it has to start over. Changes since v1: - Unbinding is done in submit_init only. submit_begin() removed. - MMU_NOTFIER -> MMU_NOTIFIER Changes since v2: - Make i915->mm.notifier a spinlock. Changes since v3: - Add WARN_ON if there are any page references left, should have been 0. - Return 0 on success in submit_init(), bug from spinlock conversion. - Release pvec outside of notifier_lock (Thomas). Changes since v4: - Mention why we're clearing eb->[i + 1].vma in the code. (Thomas) - Actually check all invalidations in eb_move_to_gpu. (Thomas) - Do not wait when process is exiting to fix gem_ctx_persistence.userptr. Changes since v5: - Clarify why check on PF_EXITING is (temporarily) required. Changes since v6: - Ensure userptr validity is checked in set_domain through a special path. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Acked-by: Dave Airlie <airlied@redhat.com> [danvet: s/kfree/kvfree/ in i915_gem_object_userptr_drop_ref in the previous review round, but which got lost. The other open questions around page refcount are imo better discussed in a separate series, with amdgpu folks involved]. Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210323155059.628690-17-maarten.lankhorst@linux.intel.com
2021-03-23 16:50:05 +01:00
* notifier_lock for mmu notifiers, memory may not be allocated
* while holding this lock.
*/
rwlock_t notifier_lock;
#endif
drm/i915: Report all objects with allocated pages to the shrinker Currently, we try to report to the shrinker the precise number of objects (pages) that are available to be reaped at this moment. This requires searching all objects with allocated pages to see if they fulfill the search criteria, and this count is performed quite frequently. (The shrinker tries to free ~128 pages on each invocation, before which we count all the objects; counting takes longer than unbinding the objects!) If we take the pragmatic view that with sufficient desire, all objects are eventually reapable (they become inactive, or no longer used as framebuffer etc), we can simply return the count of pinned pages maintained during get_pages/put_pages rather than walk the lists every time. The downside is that we may (slightly) over-report the number of objects/pages we could shrink and so penalize ourselves by shrinking more than required. This is mitigated by keeping the order in which we shrink objects such that we avoid penalizing active and frequently used objects, and if memory is so tight that we need to free them we would need to anyway. v2: Only expose shrinkable objects to the shrinker; a small reduction in not considering stolen and foreign objects. v3: Restore the tracking from a "backup" copy from before the gem/ split Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190530203500.26272-2-chris@chris-wilson.co.uk
2019-05-30 21:35:00 +01:00
/* shrinker accounting, also useful for userland debugging */
u64 shrink_memory;
u32 shrink_count;
};
#define I915_IDLE_ENGINES_TIMEOUT (200) /* in ms */
unsigned long i915_fence_context_timeout(const struct drm_i915_private *i915,
u64 context);
static inline unsigned long
i915_fence_timeout(const struct drm_i915_private *i915)
{
return i915_fence_context_timeout(i915, U64_MAX);
}
/* Amount of SAGV/QGV points, BSpec precisely defines this */
#define I915_NUM_QGV_POINTS 8
#define HAS_HW_SAGV_WM(i915) (DISPLAY_VER(i915) >= 13 && !IS_DGFX(i915))
drm/i915: Implement PSF GV point support PSF GV points are an additional factor that can limit the bandwidth available to display, separate from the traditional QGV points. Whereas traditional QGV points represent possible memory clock frequencies, PSF GV points reflect possible frequencies of the memory fabric. Switching between PSF GV points has the advantage of incurring almost no memory access block time and thus does not need to be accounted for in watermark calculations. This patch adds support for those on top of regular QGV points. Those are supposed to be used simultaneously, i.e we are always at some QGV and some PSF GV point, based on the current video mode requirements. Bspec: 64631, 53998 v2: Seems that initial assumption made during ml conversation was wrong, PCode rejects any masks containing points beyond the ones returned, so even though BSpec says we have around 8 points theoretically, we can mask/unmask only those which are returned, trying to manipulate those beyond causes a failure from PCode. So switched back to generating mask from 1 - num_qgv_points, where num_qgv_points is the actual amount of points, advertised by PCode. v3: - Extended restricted qgv point mask to 0xf, as we have now 3:2 bits for PSF GV points(Matt Roper) - Replaced val2 with NULL from PCode request, since its not being used(Matt Roper) - Replaced %d to 0x%x for better readability(thanks for spotting) Signed-off-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210531064845.4389-2-stanislav.lisovskiy@intel.com
2021-05-31 09:48:45 +03:00
/* Amount of PSF GV points, BSpec precisely defines this */
#define I915_NUM_PSF_GV_POINTS 3
enum psr_lines_to_wait {
PSR_0_LINES_TO_WAIT = 0,
PSR_1_LINE_TO_WAIT,
PSR_4_LINES_TO_WAIT,
PSR_8_LINES_TO_WAIT
};
struct intel_vbt_data {
/* bdb version */
u16 version;
struct drm_display_mode *lfp_lvds_vbt_mode; /* if any */
struct drm_display_mode *sdvo_lvds_vbt_mode; /* if any */
/* Feature bits */
unsigned int int_tv_support:1;
unsigned int lvds_dither:1;
unsigned int int_crt_support:1;
unsigned int lvds_use_ssc:1;
unsigned int int_lvds_support:1;
unsigned int display_clock_mode:1;
unsigned int fdi_rx_polarity_inverted:1;
unsigned int panel_type:4;
int lvds_ssc_freq;
unsigned int bios_lvds_val; /* initial [PCH_]LVDS reg val in VBIOS */
enum drm_panel_orientation orientation;
enum drrs_support_type drrs_type;
struct {
int rate;
int lanes;
int preemphasis;
int vswing;
bool low_vswing;
bool initialized;
int bpp;
struct edp_power_seq pps;
bool hobl;
} edp;
struct {
bool enable;
bool full_link;
bool require_aux_wakeup;
int idle_frames;
enum psr_lines_to_wait lines_to_wait;
drm/i915/psr: vbt change for psr For psr block #9, the vbt description has moved to options [0-3] for TP1,TP2,TP3 Wakeup time from decimal value without any change to vbt structure. Since spec does not mention from which VBT version this change was added to vbt.bsf file, we cannot depend on bdb->version check to change for all the platforms. There is RCR inplace for GOP team to provide the version number to make generic change. Since Kabylake with bdb version 209 is having this change, limiting this change to gen9_bc and version 209+ to unblock google. Tested on skl(bdb version 203,without options) and kabylake(bdb version 209,212) having new options. bspec 20131 v2: (Jani and Rodrigo) move the 165 version check to intel_bios.c v3: Jani Move the abstraction to intel_bios. v4: Jani Rename tp*_wakeup_time to have "us" suffix. For values outside range[0-3],default to max 2500us. Old decimal value was wake up time in multiples of 100us. v5: Jani and Rodrigo Handle option 2 in default condition. Print oustide range value. For negetive values default to 2500us. v6: Jani Handle default first and then fall through for case 2. v7: Rodrigo Apply this change for IS_GEN9_BC and vbt version > 209 v8: Puthik Add new function vbt_psr_to_us. v9: Jani Change to v7 version as it's more readable. DK add comment /*fall through*/ after case2. Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Puthikorn Voravootivat <puthik@chromium.org> Cc: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: José Roberto de Souza <jose.souza@intel.com> Signed-off-by: Maulik V Vaghela <maulik.v.vaghela@intel.com> Signed-off-by: Vathsala Nagaraju <vathsala.nagaraju@intel.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/1526981243-2745-1-git-send-email-vathsala.nagaraju@intel.com
2018-05-22 14:57:23 +05:30
int tp1_wakeup_time_us;
int tp2_tp3_wakeup_time_us;
int psr2_tp2_tp3_wakeup_time_us;
} psr;
struct {
u16 pwm_freq_hz;
u16 brightness_precision_bits;
bool present;
bool active_low_pwm;
u8 min_brightness; /* min_brightness/255 of max */
u8 controller; /* brightness controller number */
enum intel_backlight_type type;
} backlight;
/* MIPI DSI */
struct {
u16 panel_id;
struct mipi_config *config;
struct mipi_pps_data *pps;
u16 bl_ports;
u16 cabc_ports;
u8 seq_version;
u32 size;
u8 *data;
const u8 *sequence[MIPI_SEQ_MAX];
drm/i915: Fix DSI panels with v1 MIPI sequences without a DEASSERT sequence v3 So far models of the Dell Venue 8 Pro, with a panel with MIPI panel index = 3, one of which has been kindly provided to me by Jan Brummer, where not working with the i915 driver, giving a black screen on the first modeset. The problem with at least these Dells is that their VBT defines a MIPI ASSERT sequence, but not a DEASSERT sequence. Instead they DEASSERT the reset in their INIT_OTP sequence, but the deassert must be done before calling intel_dsi_device_ready(), so that is too late. Simply doing the INIT_OTP sequence earlier is not enough to fix this, because the INIT_OTP sequence also sends various MIPI packets to the panel, which can only happen after calling intel_dsi_device_ready(). This commit fixes this by splitting the INIT_OTP sequence into everything before the first DSI packet and everything else, including the first DSI packet. The first part (everything before the first DSI packet) is then used as deassert sequence. Changed in v2: -Split the init OTP sequence into a deassert reset and the actual init OTP sequence, instead of calling it earlier and then having the first mipi_exec_send_packet() call call intel_dsi_device_ready(). Changes in v3: -Move the whole shebang to intel_bios.c Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82880 References: https://bugs.freedesktop.org/show_bug.cgi?id=101205 Cc: Jan-Michael Brummer <jan.brummer@tabos.org> Reported-by: Jan-Michael Brummer <jan.brummer@tabos.org> Tested-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Acked-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180214082151.25015-3-hdegoede@redhat.com
2018-02-14 09:21:51 +01:00
u8 *deassert_seq; /* Used by fixup_mipi_sequences() */
enum drm_panel_orientation orientation;
} dsi;
int crt_ddc_pin;
struct list_head display_devices;
struct intel_bios_encoder_data *ports[I915_MAX_PORTS]; /* Non-NULL if port present. */
struct sdvo_device_mapping sdvo_mappings[2];
};
drm/i915: Track frontbuffer invalidation/flushing So these are the guts of the new beast. This tracks when a frontbuffer gets invalidated (due to frontbuffer rendering) and hence should be constantly scaned out, and when it's flushed again and can be compressed/one-shot-upload. Rules for flushing are simple: The frontbuffer needs one more full upload starting from the next vblank. Which means that the flushing can _only_ be called once the frontbuffer update has been latched. But this poses a problem for pageflips: We can't just delay the flushing until the pageflip is latched, since that would pose the risk that we override frontbuffer rendering that has been scheduled in-between the pageflip ioctl and the actual latching. To handle this track asynchronous invalidations (and also pageflip) state per-ring and delay any in-between flushing until the rendering has completed. And also cancel any delayed flushing if we get a new invalidation request (whether delayed or not). Also call intel_mark_fb_busy in both cases in all cases to make sure that we keep the screen at the highest refresh rate both on flips, synchronous plane updates and for frontbuffer rendering. v2: Lots of improvements Suggestions from Chris: - Move invalidate/flush in flush_*_domain and set_to_*_domain. - Drop the flush in busy_ioctl since it's redundant. Was a leftover from an earlier concept to track flips/delayed flushes. - Don't forget about the initial modeset enable/final disable. Suggested by Chris. Track flips accurately, too. Since flips complete independently of rendering we need to track pending flips in a separate mask. Again if an invalidate happens we need to cancel the evenutal flush to avoid races. v3: Provide correct header declarations for flip functions. Currently not needed outside of intel_display.c, but part of the proper interface. v4: Add proper domain management to fbcon so that the fbcon buffer is also tracked correctly. v5: Fixup locking around the fbcon set_to_gtt_domain call. v6: More comments from Chris: - Split out fbcon changes. - Drop superflous checks for potential scanout before calling intel_fb functions - we can micro-optimize this later. - s/intel_fb_/intel_fb_obj_/ to make it clear that this deals in gem object. We already have precedence for fb_obj in the pin_and_fence functions. v7: Clarify the semantics of the flip flush handling by renaming things a bit: - Don't go through a gem object but take the relevant frontbuffer bits directly. These functions center on the plane, the actual object is irrelevant - even a flip to the same object as already active should cause a flush. - Add a new intel_frontbuffer_flip for synchronous plane updates. It currently just calls intel_frontbuffer_flush since the implemenation differs. This way we achieve a clear split between one-shot update events on one side and frontbuffer rendering with potentially a very long delay between the invalidate and flush. Chris and I also had some discussions about mark_busy and whether it is appropriate to call from flush. But mark busy is a state which should be derived from the 3 events (invalidate, flush, flip) we now have by the users, like psr does by tracking relevant information in psr.busy_frontbuffer_bits. DRRS (the only real use of mark_busy for frontbuffer) needs to have similar logic. With that the overall mark_busy in the core could be removed. v8: Only when retiring gpu buffers only flush frontbuffer bits we actually invalidated in a batch. Just for safety since before any additional usage/invalidate we should always retire current rendering. Suggested by Chris Wilson. v9: Actually use intel_frontbuffer_flip in all appropriate places. Spotted by Chris. v10: Address more comments from Chris: - Don't call _flip in set_base when the crtc is inactive, avoids redunancy in the modeset case with the initial enabling of all planes. - Add comments explaining that the initial/final plane enable/disable still has work left to do before it's fully generic. v11: Only invalidate for gtt/cpu access when writing. Spotted by Chris. v12: s/_flush/_flip/ in intel_overlay.c per Chris' comment. Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-06-19 16:01:59 +02:00
struct i915_frontbuffer_tracking {
spinlock_t lock;
drm/i915: Track frontbuffer invalidation/flushing So these are the guts of the new beast. This tracks when a frontbuffer gets invalidated (due to frontbuffer rendering) and hence should be constantly scaned out, and when it's flushed again and can be compressed/one-shot-upload. Rules for flushing are simple: The frontbuffer needs one more full upload starting from the next vblank. Which means that the flushing can _only_ be called once the frontbuffer update has been latched. But this poses a problem for pageflips: We can't just delay the flushing until the pageflip is latched, since that would pose the risk that we override frontbuffer rendering that has been scheduled in-between the pageflip ioctl and the actual latching. To handle this track asynchronous invalidations (and also pageflip) state per-ring and delay any in-between flushing until the rendering has completed. And also cancel any delayed flushing if we get a new invalidation request (whether delayed or not). Also call intel_mark_fb_busy in both cases in all cases to make sure that we keep the screen at the highest refresh rate both on flips, synchronous plane updates and for frontbuffer rendering. v2: Lots of improvements Suggestions from Chris: - Move invalidate/flush in flush_*_domain and set_to_*_domain. - Drop the flush in busy_ioctl since it's redundant. Was a leftover from an earlier concept to track flips/delayed flushes. - Don't forget about the initial modeset enable/final disable. Suggested by Chris. Track flips accurately, too. Since flips complete independently of rendering we need to track pending flips in a separate mask. Again if an invalidate happens we need to cancel the evenutal flush to avoid races. v3: Provide correct header declarations for flip functions. Currently not needed outside of intel_display.c, but part of the proper interface. v4: Add proper domain management to fbcon so that the fbcon buffer is also tracked correctly. v5: Fixup locking around the fbcon set_to_gtt_domain call. v6: More comments from Chris: - Split out fbcon changes. - Drop superflous checks for potential scanout before calling intel_fb functions - we can micro-optimize this later. - s/intel_fb_/intel_fb_obj_/ to make it clear that this deals in gem object. We already have precedence for fb_obj in the pin_and_fence functions. v7: Clarify the semantics of the flip flush handling by renaming things a bit: - Don't go through a gem object but take the relevant frontbuffer bits directly. These functions center on the plane, the actual object is irrelevant - even a flip to the same object as already active should cause a flush. - Add a new intel_frontbuffer_flip for synchronous plane updates. It currently just calls intel_frontbuffer_flush since the implemenation differs. This way we achieve a clear split between one-shot update events on one side and frontbuffer rendering with potentially a very long delay between the invalidate and flush. Chris and I also had some discussions about mark_busy and whether it is appropriate to call from flush. But mark busy is a state which should be derived from the 3 events (invalidate, flush, flip) we now have by the users, like psr does by tracking relevant information in psr.busy_frontbuffer_bits. DRRS (the only real use of mark_busy for frontbuffer) needs to have similar logic. With that the overall mark_busy in the core could be removed. v8: Only when retiring gpu buffers only flush frontbuffer bits we actually invalidated in a batch. Just for safety since before any additional usage/invalidate we should always retire current rendering. Suggested by Chris Wilson. v9: Actually use intel_frontbuffer_flip in all appropriate places. Spotted by Chris. v10: Address more comments from Chris: - Don't call _flip in set_base when the crtc is inactive, avoids redunancy in the modeset case with the initial enabling of all planes. - Add comments explaining that the initial/final plane enable/disable still has work left to do before it's fully generic. v11: Only invalidate for gtt/cpu access when writing. Spotted by Chris. v12: s/_flush/_flip/ in intel_overlay.c per Chris' comment. Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-06-19 16:01:59 +02:00
/*
* Tracking bits for delayed frontbuffer flushing du to gpu activity or
* scheduled flips.
*/
unsigned busy_bits;
unsigned flip_bits;
};
drm/i915: Introduce a PV INFO page structure for Intel GVT-g. Introduce a PV INFO structure, to facilitate the Intel GVT-g technology, which is a GPU virtualization solution with mediated pass-through. This page contains the shared information between i915 driver and the host emulator. For now, this structure utilizes an area of 4K bytes on HSW GPU's unused MMIO space. Future hardware will have the reserved window architecturally defined, and layout of the page will be added in future BSpec. The i915 driver load routine detects if it is running in a VM by reading the contents of this PV INFO page. Thereafter a flag, vgpu.active is set, and intel_vgpu_active() is used by checking this flag to conclude if GPU is virtualized with Intel GVT-g. By now, intel_vgpu_active() will return true, only when the driver is running as a guest in the Intel GVT-g enhanced environment on HSW platform. v2: take Chris' comments: - call the i915_check_vgpu() in intel_uncore_init() - sanitize i915_check_vgpu() by adding BUILD_BUG_ON() and debug info take Daniel's comments: - put the definition of PV INFO into a new header - i915_vgt_if.h other changes: - access mmio regs by readq/readw in i915_check_vgpu() v3: take Daniel's comments: - move the i915/vgt interfaces into a new i915_vgpu.c - update makefile - add kerneldoc to functions which are non-static - add a DOC: section describing some of the high-level design - update drm docbook other changes: - rename i915_vgt_if.h to i915_vgpu.h v4: take Tvrtko's comments: - fix a typo in commit message - add debug message when vgt version mismatches - rename low_gmadr/high_gmadr to mappable/non-mappable in PV INFO structure Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> Signed-off-by: Jike Song <jike.song@intel.com> Signed-off-by: Eddie Dong <eddie.dong@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-02-10 19:05:47 +08:00
struct i915_virtual_gpu {
struct mutex lock; /* serialises sending of g2v_notify command pkts */
drm/i915: Introduce a PV INFO page structure for Intel GVT-g. Introduce a PV INFO structure, to facilitate the Intel GVT-g technology, which is a GPU virtualization solution with mediated pass-through. This page contains the shared information between i915 driver and the host emulator. For now, this structure utilizes an area of 4K bytes on HSW GPU's unused MMIO space. Future hardware will have the reserved window architecturally defined, and layout of the page will be added in future BSpec. The i915 driver load routine detects if it is running in a VM by reading the contents of this PV INFO page. Thereafter a flag, vgpu.active is set, and intel_vgpu_active() is used by checking this flag to conclude if GPU is virtualized with Intel GVT-g. By now, intel_vgpu_active() will return true, only when the driver is running as a guest in the Intel GVT-g enhanced environment on HSW platform. v2: take Chris' comments: - call the i915_check_vgpu() in intel_uncore_init() - sanitize i915_check_vgpu() by adding BUILD_BUG_ON() and debug info take Daniel's comments: - put the definition of PV INFO into a new header - i915_vgt_if.h other changes: - access mmio regs by readq/readw in i915_check_vgpu() v3: take Daniel's comments: - move the i915/vgt interfaces into a new i915_vgpu.c - update makefile - add kerneldoc to functions which are non-static - add a DOC: section describing some of the high-level design - update drm docbook other changes: - rename i915_vgt_if.h to i915_vgpu.h v4: take Tvrtko's comments: - fix a typo in commit message - add debug message when vgt version mismatches - rename low_gmadr/high_gmadr to mappable/non-mappable in PV INFO structure Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> Signed-off-by: Jike Song <jike.song@intel.com> Signed-off-by: Eddie Dong <eddie.dong@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-02-10 19:05:47 +08:00
bool active;
drm/i915: Enable guest i915 full ppgtt functionality Enable the guest i915 full ppgtt functionality when host can provide this capability. vgt_caps is introduced to guest i915 driver to get the vgpu capabilities from the device model. VGT_CPAS_FULL_PPGTT is one of the capabilities type to let guest i915 dirver know that the guest i915 full ppgtt is supported by device model. Notice that the minor version of pvinfo isn't bumped because of this vgt_caps introduction, due to older guest would be broken by simply increasing the pvinfo version. Although the pvinfo minor version doesn't increase, the compatibility won't be blocked. The compatibility is ensured by checking the value of caps field in pvinfo. Zero means no full ppgtt support and BIT(2) means this feature is provided. Changes since v1: - Use u32 instead of uint32_t (Joonas) - Move VGT_CAPS_FULL_PPGTT introduction to this patch and use #define instead of enum (Joonas) - Rewrite the vgpu full ppgtt capability checking logic. (Joonas) - Some coding style refine. (Joonas) Changes since v2: - Divide the whole patch set into two separate patch series, with one patch in i915 side to check guest i915 full ppgtt capability and enable it when this capability is supported by the device model, and the other one in gvt side which fixs the blocking issue and enables the device model to provide the capability to guest. And this patch focuses on guest i915 side. (Joonas) - Change the title from "introduce vgt_caps to pvinfo" to "Enable guest i915 full ppgtt functionality". (Tina) Change since v3: - Add some comments about pvinfo caps and version. (Joonas) Change since v4: - Tested by Tina Zhang. Change since v5: - Add limitation about supporting 32bit full ppgtt. Change since v6: - Change the fallback to 48bit full ppgtt if i915.ppgtt_enable=2. (Zhenyu) Change in v9: - Remove the fixme comment due to no plan for 32bit full ppgtt support. (Zhenyu) - Reorder the patch-set to fix compiling issue with git-bisect. (Zhenyu) - Add print log when forcing guest 48bit full ppgtt. (Zhenyu) v10: - Update against Joonas's has_full_ppgtt and has_full_48bit_ppgtt disconnect change. (Zhenyu) Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> # in v2 Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tina Zhang <tina.zhang@intel.com> Signed-off-by: Tina Zhang <tina.zhang@intel.com> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
2017-08-14 15:20:46 +08:00
u32 caps;
drm/i915: Introduce a PV INFO page structure for Intel GVT-g. Introduce a PV INFO structure, to facilitate the Intel GVT-g technology, which is a GPU virtualization solution with mediated pass-through. This page contains the shared information between i915 driver and the host emulator. For now, this structure utilizes an area of 4K bytes on HSW GPU's unused MMIO space. Future hardware will have the reserved window architecturally defined, and layout of the page will be added in future BSpec. The i915 driver load routine detects if it is running in a VM by reading the contents of this PV INFO page. Thereafter a flag, vgpu.active is set, and intel_vgpu_active() is used by checking this flag to conclude if GPU is virtualized with Intel GVT-g. By now, intel_vgpu_active() will return true, only when the driver is running as a guest in the Intel GVT-g enhanced environment on HSW platform. v2: take Chris' comments: - call the i915_check_vgpu() in intel_uncore_init() - sanitize i915_check_vgpu() by adding BUILD_BUG_ON() and debug info take Daniel's comments: - put the definition of PV INFO into a new header - i915_vgt_if.h other changes: - access mmio regs by readq/readw in i915_check_vgpu() v3: take Daniel's comments: - move the i915/vgt interfaces into a new i915_vgpu.c - update makefile - add kerneldoc to functions which are non-static - add a DOC: section describing some of the high-level design - update drm docbook other changes: - rename i915_vgt_if.h to i915_vgpu.h v4: take Tvrtko's comments: - fix a typo in commit message - add debug message when vgt version mismatches - rename low_gmadr/high_gmadr to mappable/non-mappable in PV INFO structure Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> Signed-off-by: Jike Song <jike.song@intel.com> Signed-off-by: Eddie Dong <eddie.dong@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-02-10 19:05:47 +08:00
};
struct i915_selftest_stash {
atomic_t counter;
struct ida mock_region_instances;
};
/* intel_audio.c private */
struct intel_audio_funcs;
struct intel_audio_private {
/* Display internal audio functions */
const struct intel_audio_funcs *funcs;
/* hda/i915 audio component */
struct i915_audio_component *component;
bool component_registered;
/* mutex for audio/video sync */
struct mutex mutex;
int power_refcount;
u32 freq_cntrl;
/* Used to save the pipe-to-encoder mapping for audio */
struct intel_encoder *encoder_map[I915_MAX_PIPES];
/* necessary resource sharing with HDMI LPE audio driver. */
struct {
struct platform_device *platdev;
int irq;
} lpe;
};
struct drm_i915_private {
struct drm_device drm;
drm/i915: Use drmm_add_final_kfree With this we can drop the final kfree from the release function. The mock device in the selftests needed it's pci_device split up from the drm_device. In the future we could simplify this again by allocating the pci_device as a managed allocation too. v2: I overlooked that i915_driver_destroy is also called in the unwind code of the error path. There we need a drm_dev_put. Similar for the mock object. Now the problem with that is that the drm_driver->release callbacks for both the real driver and the mock one assume everything has been set up. Hence going through that path for a partially set up driver will result in issues. Quickest fix is to disable the ->release() hook until the driver is fully initialized, and keep the onion unwinding. Long term would be cleanest to move everything over to drmm_ release actions, but that's a lot of work for a big driver like i915. Plus more core work needed first anyway. v3: Fix i915_drm pointer wrangling in mock_gem_device. Also switch over to start using drm_dev_put() to clean up even on the error path. Aside I think the current error path is leaking the allocation. v4: more fixes for intel-gfx-ci, some if it damage from v3 :-/ Reviewed-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Andi Shyti <andi.shyti@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Abdiel Janulgue <abdiel.janulgue@linux.intel.com> Cc: intel-gfx@lists.freedesktop.org Link: https://patchwork.freedesktop.org/patch/msgid/20200323144950.3018436-9-daniel.vetter@ffwll.ch
2020-03-23 15:49:07 +01:00
/* FIXME: Device release actions should all be moved to drmm_ */
bool do_release;
/* i915 device parameters */
struct i915_params params;
const struct intel_device_info __info; /* Use INTEL_INFO() to access. */
struct intel_runtime_info __runtime; /* Use RUNTIME_INFO() to access. */
struct intel_driver_caps caps;
/**
* Data Stolen Memory - aka "i915 stolen memory" gives us the start and
* end of stolen which we can optionally use to create GEM objects
* backed by stolen memory. Note that stolen_usable_size tells us
* exactly how much of this we are actually allowed to use, given that
* some portion of it is in fact reserved for use by hardware functions.
*/
struct resource dsm;
/**
* Reseved portion of Data Stolen Memory
*/
struct resource dsm_reserved;
/*
* Stolen memory is segmented in hardware with different portions
* offlimits to certain functions.
*
* The drm_mm is initialised to the total accessible range, as found
* from the PCI config. On Broadwell+, this is further restricted to
* avoid the first page! The upper end of stolen memory is reserved for
* hardware functions and similarly removed from the accessible range.
*/
resource_size_t stolen_usable_size; /* Total size minus reserved ranges */
struct intel_uncore uncore;
struct intel_uncore_mmio_debug mmio_debug;
drm/i915: Introduce a PV INFO page structure for Intel GVT-g. Introduce a PV INFO structure, to facilitate the Intel GVT-g technology, which is a GPU virtualization solution with mediated pass-through. This page contains the shared information between i915 driver and the host emulator. For now, this structure utilizes an area of 4K bytes on HSW GPU's unused MMIO space. Future hardware will have the reserved window architecturally defined, and layout of the page will be added in future BSpec. The i915 driver load routine detects if it is running in a VM by reading the contents of this PV INFO page. Thereafter a flag, vgpu.active is set, and intel_vgpu_active() is used by checking this flag to conclude if GPU is virtualized with Intel GVT-g. By now, intel_vgpu_active() will return true, only when the driver is running as a guest in the Intel GVT-g enhanced environment on HSW platform. v2: take Chris' comments: - call the i915_check_vgpu() in intel_uncore_init() - sanitize i915_check_vgpu() by adding BUILD_BUG_ON() and debug info take Daniel's comments: - put the definition of PV INFO into a new header - i915_vgt_if.h other changes: - access mmio regs by readq/readw in i915_check_vgpu() v3: take Daniel's comments: - move the i915/vgt interfaces into a new i915_vgpu.c - update makefile - add kerneldoc to functions which are non-static - add a DOC: section describing some of the high-level design - update drm docbook other changes: - rename i915_vgt_if.h to i915_vgpu.h v4: take Tvrtko's comments: - fix a typo in commit message - add debug message when vgt version mismatches - rename low_gmadr/high_gmadr to mappable/non-mappable in PV INFO structure Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> Signed-off-by: Jike Song <jike.song@intel.com> Signed-off-by: Eddie Dong <eddie.dong@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-02-10 19:05:47 +08:00
struct i915_virtual_gpu vgpu;
struct intel_gvt *gvt;
drm/i915: gvt: Introduce the basic architecture of GVT-g This patch introduces the very basic framework of GVT-g device model, includes basic prototypes, definitions, initialization. v12: - Call intel_gvt_init() in driver early initialization stage. (Chris) v8: - Remove the GVT idr and mutex in intel_gvt_host. (Joonas) v7: - Refine the URL link in Kconfig. (Joonas) - Refine the introduction of GVT-g host support in Kconfig. (Joonas) - Remove the macro GVT_ALIGN(), use round_down() instead. (Joonas) - Make "struct intel_gvt" a data member in struct drm_i915_private.(Joonas) - Remove {alloc, free}_gvt_device() - Rename intel_gvt_{create, destroy}_gvt_device() - Expost intel_gvt_init_host() - Remove the dummy "struct intel_gvt" declaration in intel_gvt.h (Joonas) v6: - Refine introduction in Kconfig. (Chris) - The exposed API functions will take struct intel_gvt * instead of void *. (Chris/Tvrtko) - Remove most memebers of strct intel_gvt_device_info. Will add them in the device model patches.(Chris) - Remove gvt_info() and gvt_err() in debug.h. (Chris) - Move GVT kernel parameter into i915_params. (Chris) - Remove include/drm/i915_gvt.h, as GVT-g will be built within i915. - Remove the redundant struct i915_gvt *, as the functions in i915 will directly take struct intel_gvt *. - Add more comments for reviewer. v5: Take Tvrtko's comments: - Fix the misspelled words in Kconfig - Let functions take drm_i915_private * instead of struct drm_device * - Remove redundant prints/local varible initialization v3: Take Joonas' comments: - Change file name i915_gvt.* to intel_gvt.* - Move GVT kernel parameter into intel_gvt.c - Remove redundant debug macros - Change error handling style - Add introductions for some stub functions - Introduce drm/i915_gvt.h. Take Kevin's comments: - Move GVT-g host/guest check into intel_vgt_balloon in i915_gem_gtt.c v2: - Introduce i915_gvt.c. It's necessary to introduce the stubs between i915 driver and GVT-g host, as GVT-g components is configurable in kernel config. When disabled, the stubs here do nothing. Take Joonas' comments: - Replace boolean return value with int. - Replace customized info/warn/debug macros with DRM macros. - Document all non-static functions like i915. - Remove empty and unused functions. - Replace magic number with marcos. - Set GVT-g in kernel config to "n" by default. Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Zhi Wang <zhi.a.wang@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1466078825-6662-5-git-send-email-zhi.a.wang@intel.com Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-06-16 08:07:00 -04:00
drm/i915: Implement dynamic GuC WOPCM offset and size calculation Hardware may have specific restrictions on GuC WOPCM offset and size. On Gen9, the value of the GuC WOPCM size register needs to be larger than the value of GuC WOPCM offset register + a Gen9 specific offset (144KB) for reserved GuC WOPCM. Fail to enforce such a restriction on GuC WOPCM size will lead to GuC firmware execution failures. On the other hand, with current static GuC WOPCM offset and size values (512KB for both offset and size), the GuC WOPCM size verification will fail on Gen9 even if it can be fixed by lowering the GuC WOPCM offset by calculating its value based on HuC firmware size (which is likely less than 200KB on Gen9), so that we can have a GuC WOPCM size value which is large enough to pass the GuC WOPCM size check. This patch updates the reserved GuC WOPCM size for RC6 context on Gen9 to 24KB to strictly align with the Gen9 GuC WOPCM layout. It also adds support to verify the GuC WOPCM size aganist the Gen9 hardware restrictions. To meet all above requirements, let's provide dynamic partitioning of the WOPCM that will be based on platform specific HuC/GuC firmware sizes. v2: - Removed intel_wopcm_init (Ville/Sagar/Joonas) - Renamed and Moved the intel_wopcm_partition into intel_guc (Sagar) - Removed unnecessary function calls (Joonas) - Init GuC WOPCM partition as soon as firmware fetching is completed v3: - Fixed indentation issues (Chris) - Removed layering violation code (Chris/Michal) - Created separat files for GuC wopcm code (Michal) - Used inline function to avoid code duplication (Michal) v4: - Preset the GuC WOPCM top during early GuC init (Chris) - Fail intel_uc_init_hw() as soon as GuC WOPCM partitioning failed v5: - Moved GuC DMA WOPCM register updating code into intel_wopcm.c - Took care of the locking status before writing to GuC DMA Write-Once registers. (Joonas) v6: - Made sure the GuC WOPCM size to be multiple of 4K (4K aligned) v8: - Updated comments and fixed naming issues (Sagar/Joonas) - Updated commit message to include more description about the hardware restriction on GuC WOPCM size (Sagar) v9: - Minor changes variable names and code comments (Sagar) - Added detailed GuC WOPCM layout drawing (Sagar/Michal) - Refined macro definitions to be reader friendly (Michal) - Removed redundent check to valid flag (Michal) - Unified first parameter for exported GuC WOPCM functions (Michal) - Refined the name and parameter list of hardware restriction checking functions (Michal) v10: - Used shorter function name for internal functions (Joonas) - Moved init-ealry function into c file (Joonas) - Consolidated and removed redundant size checks (Joonas/Michal) - Removed unnecessary unlikely() from code which is only called once during boot (Joonas) - More fixes to kernel-doc format and content (Michal) - Avoided the use of PAGE_MASK for 4K pages (Michal) - Added error log messages to error paths (Michal) v11: - Replaced intel_guc_wopcm with more generic intel_wopcm and attached intel_wopcm to drm_i915_private instead intel_guc (Michal) - dynamic calculation of GuC non-wopcm memory start (a.k.a WOPCM Top offset from GuC WOPCM base) (Michal) - Moved WOPCM marco definitions into .c source file (Michal) - Exported WOPCM layout diagram as kernel-doc (Michal) v12: - Updated naming, function kernel-doc to align with new changes (Michal) v13: - Updated the ordering of s-o-b/cc/r-b tags (Sagar) - Corrected one tense error in comment (Sagar) - Corrected typos and removed spurious comments (Joonas) Bspec: 12690 Signed-off-by: Jackie Li <yaodong.li@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com> Cc: Sujaritha Sundaresan <sujaritha.sundaresan@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Spotswood <john.a.spotswood@intel.com> Cc: Oscar Mateo <oscar.mateo@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Sagar Arun Kamble <sagar.a.kamble@intel.com> (v8) Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> (v9) Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> (v11) Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> (v12) Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/1520987574-19351-2-git-send-email-yaodong.li@intel.com
2018-03-13 17:32:50 -07:00
struct intel_wopcm wopcm;
struct intel_dmc dmc;
drm/i915/skl: Add support to load SKL CSR firmware. Display Context Save and Restore support is needed for various SKL Display C states like DC5, DC6. This implementation is added based on first version of DMC CSR program that we received from h/w team. Here we are using request_firmware based design. Finally this firmware should end up in linux-firmware tree. For SKL platform its mandatory to ensure that we load this csr program before enabling DC states like DC5/DC6. As CSR program gets reset on various conditions, we should ensure to load it during boot and in future change to be added to load this system resume sequence too. v1: Initial relese as RFC patch v2: Design change as per Daniel, Damien and Shobit's review comments request firmware method followed. v3: Some optimization and functional changes. Pulled register defines into drivers/gpu/drm/i915/i915_reg.h Used kmemdup to allocate and duplicate firmware content. Ensured to free allocated buffer. v4: Modified as per review comments from Satheesh and Daniel Removed temporary buffer. Optimized number of writes by replacing I915_WRITE with I915_WRITE64. v5: Modified as per review comemnts from Damien. - Changed name for functions and firmware. - Introduced HAS_CSR. - Reverted back previous change and used csr_buf with u8 size. - Using cpu_to_be64 for endianness change. Modified as per review comments from Imre. - Modified registers and macro names to be a bit closer to bspec terminology and the existing register naming in the driver. - Early return for non SKL platforms in intel_load_csr_program function. - Added locking around CSR program load function as it may be called concurrently during system/runtime resume. - Releasing the fw before loading the program for consistency - Handled error path during f/w load. v6: Modified as per review comments from Imre. - Corrected out_freecsr sequence. v7: Modified as per review comments from Imre. Fail loading fw if fw->size%8!=0. v8: Rebase to latest. v9: Rebase on top of -nightly (Damien) v10: Enabled support for dmc firmware ver 1.0. According to ver 1.0 in a single binary package all the firmware's that are required for different stepping's of the product will be stored. The package contains the css header, followed by the package header and the actual dmc firmwares. Package header contains the firmware/stepping mapping table and the corresponding firmware offsets to the individual binaries, within the package. Each individual program binary contains the header and the payload sections whose size is specified in the header section. This changes are done to extract the specific firmaware from the package. (Animesh) v11: Modified as per review comemnts from Imre. - Added code comment from bpec for header structure elements. - Added __packed to avoid structure padding. - Added helper functions for stepping and substepping info. - Added code comment for CSR_MAX_FW_SIZE. - Disabled BXT firmware loading, will be enabled with dmc 1.0 support. - Changed skl_stepping_info based on bspec, earlier used from config DB. - Removed duplicate call of cpu_to_be* from intel_csr_load_program function. - Used cpu_to_be32 instead of cpu_to_be64 as firmware binary in dword aligned. - Added sanity check for header length. - Added sanity check for mmio address got from firmware binary. - kmalloc done separately for dmc header and dmc firmware. (Animesh) v12: Modified as per review comemnts from Imre. - Corrected the typo error in skl stepping info structure. - Added out-of-bound access for skl_stepping_info. - Sanity check for mmio address modified. - Sanity check added for stepping and substeppig. - Modified the intel_dmc_info structure, cache only the required header info. (Animesh) v13: clarify firmware load error message. The reason for a firmware loading failure can be obscure if the driver is built-in. Provide an explanation to the user about the likely reason for the failure and how to resolve it. (Imre) v14: Suggested by Jani. - fix s/I915/CONFIG_DRM_I915/ typo - add fw_path to the firmware object instead of using a static ptr (Jani) v15: 1) Changed the firmware name as dmc_gen9.bin, everytime for a new firmware version a symbolic link with same name will help not to build kernel again. 2) Changes done as per review comments from Imre. - Error check removed for intel_csr_ucode_init. - Moved csr-specific data structure to intel_csr.h and optimization done on structure definition. - fw->data used directly for parsing the header info & memory allocation only done separately for payload. (Animesh) v16: - No need for out_regs label in i915_driver_load(), so removed it. - Changed the firmware name as skl_dmc_ver1.bin, followed naming convention <platform>_dmc_<api-version>.bin (Animesh) Issue: VIZ-2569 Signed-off-by: A.Sunil Kamath <sunil.kamath@intel.com> Signed-off-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Animesh Manna <animesh.manna@intel.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Imre Deak <imre.deak@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-05-04 14:58:44 +02:00
struct intel_gmbus gmbus[GMBUS_NUM_PINS];
drm/i915: use the gmbus irq for waits We need two special things to properly wire this up: - Add another argument to gmbus_wait_hw_status to pass in the correct interrupt bit in gmbus4. - Since we can only get an irq for one of the two events we want, hand-roll the wait_event_timeout code so that we wake up every jiffie and can check for NAKs. This way we also subsume gmbus support for platforms without interrupts (or where those are not yet enabled). The important bit really is to only enable one gmbus interrupt source at the same time - with that piece of lore figured out, this seems to work flawlessly. Ben Widawsky rightfully complained the lack of measurements for the claimed benefits (especially since the first version was actually broken and fell back to bit-banging). Previously reading the 256 byte hdmi EDID takes about 72 ms here. With this patch it's down to 33 ms. Given that transfering the 256 bytes over i2c at wire speed takes 20.5ms alone, the reduction in additional overhead is rather nice. v2: Chris Wilson wondered whether GMBUS4 might contain some set bits when booting up an hence result in some spurious interrupts. Since we clear GMBUS4 after every wait and we do gmbus transfer really early in the setup sequence to detect displays the window is small, but still be paranoid and clear it properly. v3: Clarify the comment that gmbus irq generation can only support one kind of event, why it bothers us and how we work around that limit. Cc: Daniel Kurtz <djkurtz@chromium.org> Reviewed-by: Imre Deak <imre.deak@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-12-01 13:53:45 +01:00
/** gmbus_mutex protects against concurrent usage of the single hw gmbus
* controller on different i2c buses. */
struct mutex gmbus_mutex;
/**
* Base address of where the gmbus and gpio blocks are located (either
* on PCH or on SoC for platforms without PCH).
*/
u32 gpio_mmio_base;
/* MMIO base address for MIPI regs */
u32 mipi_mmio_base;
u32 pps_mmio_base;
drm/i915: use the gmbus irq for waits We need two special things to properly wire this up: - Add another argument to gmbus_wait_hw_status to pass in the correct interrupt bit in gmbus4. - Since we can only get an irq for one of the two events we want, hand-roll the wait_event_timeout code so that we wake up every jiffie and can check for NAKs. This way we also subsume gmbus support for platforms without interrupts (or where those are not yet enabled). The important bit really is to only enable one gmbus interrupt source at the same time - with that piece of lore figured out, this seems to work flawlessly. Ben Widawsky rightfully complained the lack of measurements for the claimed benefits (especially since the first version was actually broken and fell back to bit-banging). Previously reading the 256 byte hdmi EDID takes about 72 ms here. With this patch it's down to 33 ms. Given that transfering the 256 bytes over i2c at wire speed takes 20.5ms alone, the reduction in additional overhead is rather nice. v2: Chris Wilson wondered whether GMBUS4 might contain some set bits when booting up an hence result in some spurious interrupts. Since we clear GMBUS4 after every wait and we do gmbus transfer really early in the setup sequence to detect displays the window is small, but still be paranoid and clear it properly. v3: Clarify the comment that gmbus irq generation can only support one kind of event, why it bothers us and how we work around that limit. Cc: Daniel Kurtz <djkurtz@chromium.org> Reviewed-by: Imre Deak <imre.deak@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-12-01 13:53:45 +01:00
wait_queue_head_t gmbus_wait_queue;
struct pci_dev *bridge_dev;
struct rb_root uabi_engines;
struct resource mch_res;
/* protects the irq masks */
spinlock_t irq_lock;
bool display_irqs_enabled;
/* Sideband mailbox protection */
struct mutex sb_lock;
drm/i915: Disable preemption and sleeping while using the punit sideband While we talk to the punit over its sideband, we need to prevent the cpu from sleeping in order to prevent a potential machine hang. Note that by itself, it appears that pm_qos_update_request (via intel_idle) doesn't provide a sufficient barrier to ensure that all core are indeed awake (out of Cstate) and that the package is awake. To do so, we need to supplement the pm_qos with a manual ping on_each_cpu. v2: Restrict the heavy-weight wakeup to just the ISOF_PORT_PUNIT, there is insufficient evidence to implicate a wider problem atm. Similarly, restrict the w/a to Valleyview, as Cherryview doesn't have an angry cadre of users. The working theory, courtesy of Ville and Hans, is the issue lies within the power delivery and so is likely to be unit and board specific and occurs when both the unit/fw require extra power at the same time as the cpu package is changing its own power state. References: https://bugzilla.kernel.org/show_bug.cgi?id=109051 References: https://bugs.freedesktop.org/show_bug.cgi?id=102657 References: https://bugzilla.kernel.org/show_bug.cgi?id=195255 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Hans de Goede <hdegoede@redhat.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190426081725.31217-1-chris@chris-wilson.co.uk
2019-04-26 09:17:18 +01:00
struct pm_qos_request sb_qos;
/** Cached value of IMR to avoid reads in updating the bitfield */
drm/i915/bdw: Implement interrupt changes The interrupt handling implementation remains the same as previous generations with the 4 types of registers, status, identity, mask, and enable. However the layout of where the bits go have changed entirely. To address these changes, all of the interrupt vfuncs needed special gen8 code. The way it works is there is a top level status register now which informs the interrupt service routine which unit caused the interrupt, and therefore which interrupt registers to read to process the interrupt. For display the division is quite logical, a set of interrupt registers for each pipe, and in addition to those, a set each for "misc" and port. For GT the things get a bit hairy, as seen by the code. Each of the GT units has it's own bits defined. They all look *very similar* and resides in 16 bits of a GT register. As an example, RCS and BCS share register 0. To compact the code a bit, at a slight expense to complexity, this is exactly how the code works as well. 2 structures are added to the ring buffer so that our ring buffer interrupt handling code knows which ring shares the interrupt registers, and a shift value (ie. the top or bottom 16 bits of the register). The above allows us to kept the interrupt register caching scheme, the per interrupt enables, and the code to mask and unmask interrupts relatively clean (again at the cost of some more complexity). Most of the GT units mentioned above are command streamers, and so the symmetry should work quite well for even the yet to be implemented rings which Broadwell adds. v2: Fixes up a couple of bugs, and is more verbose about errors in the Broadwell interrupt handler. v3: fix DE_MISC IER offset v4: Simplify interrupts: I totally misread the docs the first time I implemented interrupts, and so this should greatly simplify the mess. Unlike GEN6, we never touch the regular mask registers in irq_get/put. v5: Rebased on to of recent pch hotplug setup changes. v6: Fixup on top of moving num_pipes to intel_info. v7: Rebased on top of Egbert Eich's hpd irq handling rework. Also wired up ibx_hpd_irq_setup for gen8. v8: Rebase on top of Jani's asle handling rework. v9: Rebase on top of Ben's VECS enabling for Haswell, where he unfortunately went OCD on the gt irq #defines. Not that they're still not yet fully consistent: - Used the GT_RENDER_ #defines + bdw shifts. - Dropped the shift from the L3_PARITY stuff, seemed clearer. - s/irq_refcount/irq_refcount.gt/ v10: Squash in VECS enabling patches and the gen8_gt_irq_handler refactoring from Zhao Yakui <yakui.zhao@intel.com> v11: Rebase on top of the interrupt cleanups in upstream. v12: Rebase on top of Ben's DPF changes in upstream. v13: Drop bdw from the HAS_L3_DPF feature flag for now, it's unclear what exactly needs to be done. Requested by Ben. v14: Fix the patch. - Drop the mask of reserved bits and assorted logic, it doesn't match the spec. - Do the posting read inconditionally instead of commenting it out. - Add a GEN8_MASTER_IRQ_CONTROL definition and use it. - Fix up the GEN8_PIPE interrupt defines and give the GEN8_ prefixes - we actually will need to use them. - Enclose macros in do {} while (0) (checkpatch). - Clear DE_MISC interrupt bits only after having processed them. - Fix whitespace fail (checkpatch). - Fix overtly long lines where appropriate (checkpatch). - Don't use typedef'ed private_t (maintainer-scripts). - Align the function parameter list correctly. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v4) Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> bikeshed
2013-11-02 21:07:09 -07:00
union {
u32 irq_mask;
u32 de_irq_mask[I915_MAX_PIPES];
};
u32 pipestat_irq_mask[I915_MAX_PIPES];
struct i915_hotplug hotplug;
struct intel_fbc *fbc[I915_MAX_FBCS];
drm/i915: Add support for DRRS to switch RR This patch computes and stored 2nd M/N/TU for switching to different refresh rate dynamically. PIPECONF_EDP_RR_MODE_SWITCH bit helps toggle between alternate refresh rates programmed in 2nd M/N/TU registers. v2: Daniel's review comments Computing M2/N2 in compute_config and storing it in crtc_config v3: Modified reference to edp_downclock and edp_downclock_avail based on the changes made to move them from dev_private to intel_panel. v4: Modified references to is_drrs_supported based on the changes made to rename it to drrs_support. v5: Jani's review comments Removed superfluous return statements. Changed support for Gen 7 and above. Corrected indentation. Re-structured the code which finds crtc and connector from encoder. Changed some logs to be less verbose. v6: Modifying i915_drrs to include only intel connector as intel_dp can be derived from intel connector when required. v7: As per internal review comments, acquiring mutex just before accessing drrs RR. As per Chris's review comments, added documentation about the use of locking in the function. v8: Incorporated Jani's review comments. Removed reference to edp_downclock. v9: Jani's review comments. Modified comment in set_drrs. Changed index to type edp_drrs_refresh_rate_type. Check if PSR is enabled before setting registers fo DRRS. Signed-off-by: Pradeep Bhat <pradeep.bhat@intel.com> Signed-off-by: Vandana Kannan <vandana.kannan@intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-04-05 12:13:28 +05:30
struct i915_drrs drrs;
struct intel_opregion opregion;
struct intel_vbt_data vbt;
bool preserve_bios_swizzle;
/* overlay */
struct intel_overlay *overlay;
/* backlight registers and fields in struct intel_panel */
struct mutex backlight_lock;
/* protects panel power sequencer state */
struct mutex pps_mutex;
unsigned int fsb_freq, mem_freq, is_ddr3;
unsigned int skl_preferred_vco_freq;
unsigned int max_cdclk_freq;
unsigned int max_dotclk_freq;
unsigned int hpll_freq;
drm/i915: Read ilk FDI PLL frequency once during initialisation During intel_atomic_check(), we do not take the intel_runtime_pm_get() wakeref and so should do the atomic modeset precalculations without referring to the HW. However, on Ironlake we see <7>[ 23.487557] [drm:intel_atomic_check [i915]] [CONNECTOR:47:VGA-1] checking for sink bpp constrains <7>[ 23.487615] [drm:intel_atomic_check [i915]] clamping display bpp (was 36) to default limit of 24 <4>[ 23.487621] RPM wakelock ref not held during HW access <4>[ 23.487652] ------------[ cut here ]------------ <4>[ 23.487697] WARNING: CPU: 0 PID: 1343 at drivers/gpu/drm/i915/intel_drv.h:1813 gen5_read32+0x183/0x200 [i915] <4>[ 23.487701] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul snd_hda_intel ghash_clmulni_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm lpc_ich e1000e mei_me ptp mei pps_core prime_numbers <4>[ 23.487784] CPU: 0 PID: 1343 Comm: debugfs_test Tainted: G W 4.14.0-rc7-CI-Trybot_1378+ #1 <4>[ 23.487788] Hardware name: Hewlett-Packard HP Compaq 8100 Elite SFF PC/304Ah, BIOS 786H1 v01.13 07/14/2011 <4>[ 23.487793] task: ffff8801f90aa6c0 task.stack: ffffc900013ec000 <4>[ 23.487838] RIP: 0010:gen5_read32+0x183/0x200 [i915] <4>[ 23.487842] RSP: 0018:ffffc900013efb58 EFLAGS: 00010292 <4>[ 23.487849] RAX: 000000000000002a RBX: ffff880205c00000 RCX: 0000000000000006 <4>[ 23.487854] RDX: 000000000000140a RSI: ffffffff81d0eb14 RDI: ffffffff81cc26f6 <4>[ 23.487857] RBP: ffffc900013efb80 R08: ffff8801f90aaff8 R09: 0000000000000000 <4>[ 23.487861] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001 <4>[ 23.487865] R13: 0000000000046000 R14: ffff88020ffaba78 R15: ffff88020b109bf8 <4>[ 23.487870] FS: 00007f53b5e40a40(0000) GS:ffff88021bc00000(0000) knlGS:0000000000000000 <4>[ 23.487874] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4>[ 23.487878] CR2: 000055e41900c0e8 CR3: 00000001fa0d6005 CR4: 00000000000206f0 <4>[ 23.487882] Call Trace: <4>[ 23.487931] intel_atomic_check+0x745/0x1290 [i915] <4>[ 23.487948] drm_atomic_check_only+0x459/0x560 <4>[ 23.487956] ? drm_atomic_set_crtc_for_connector+0xc9/0x100 <4>[ 23.488025] drm_atomic_commit+0x18/0x50 <4>[ 23.488035] restore_fbdev_mode_atomic+0x190/0x1f0 <4>[ 23.488059] restore_fbdev_mode+0x32/0x120 <4>[ 23.488072] drm_fb_helper_restore_fbdev_mode_unlocked+0x50/0xa0 <4>[ 23.488139] intel_fbdev_restore_mode+0x34/0x90 [i915] <4>[ 23.488194] i915_driver_lastclose+0xe/0x10 [i915] <4>[ 23.488208] drm_lastclose+0x39/0xf0 <4>[ 23.488219] drm_release+0x30c/0x3c0 <4>[ 23.488236] __fput+0xb9/0x200 <4>[ 23.488252] ____fput+0xe/0x10 <4>[ 23.488264] task_work_run+0x89/0xc0 <4>[ 23.488278] exit_to_usermode_loop+0x83/0x90 <4>[ 23.488290] syscall_return_slowpath+0xd0/0x110 <4>[ 23.488304] entry_SYSCALL_64_fastpath+0xaf/0xb1 <4>[ 23.488312] RIP: 0033:0x7f53b4317560 <4>[ 23.488320] RSP: 002b:00007ffca7e70748 EFLAGS: 00000246 ORIG_RAX: 0000000000000003 <4>[ 23.488333] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 00007f53b4317560 <4>[ 23.488340] RDX: 0000000000000005 RSI: 00007ffca7e70640 RDI: 0000000000000004 <4>[ 23.488347] RBP: 000055e417783900 R08: 000055e418f9e290 R09: 0000000000000000 <4>[ 23.488356] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001 <4>[ 23.488363] R13: 00007f53b4302c40 R14: 0000000000000000 R15: 0000000000000000 <4>[ 23.488384] Code: b5 f2 f2 e0 0f ff e9 c5 fe ff ff 80 3d 0e 4b 10 00 00 0f 85 c6 fe ff ff 48 c7 c7 30 73 29 a0 c6 05 fa 4a 10 00 01 e8 8e f2 f2 e0 <0f> ff e9 ac fe ff ff e8 51 9d f3 e0 85 c0 0f 85 01 ff ff ff 48 <4>[ 23.488780] ---[ end trace 6bc72ce7f1596190 ]--- <7>[ 23.488844] [drm:intel_atomic_check [i915]] checking fdi config on pipe A, lanes 1 <7>[ 23.488911] [drm:intel_atomic_check [i915]] hw max bpp: 36, pipe bpp: 24, dithering: 0 due to intel_fdi_link_freq() poking at FDI_PLL_BIOS_0. Avoid this by recording the fdi pll frequency during device initiailisation. v2: Also extract the static FDI PLL frequencies for Sandybridge and Ivybridge. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171107214713.18704-1-chris@chris-wilson.co.uk Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
2017-11-05 13:49:05 +00:00
unsigned int fdi_pll_freq;
unsigned int czclk_freq;
struct {
/* The current hardware cdclk configuration */
struct intel_cdclk_config hw;
drm/i915: Use literal representation of cdclk tables The bspec lays out legal cdclk frequencies, PLL ratios, and CD2X dividers in an easy-to-read table for most recent platforms. We've been translating the data from that table into platform-specific code logic, but it's easy to overlook an area we need to update when adding new cdclk values or enabling new platforms. Let's just add a form of the bspec table to the code and then adjust our functions to pull what they need directly out of the table. v2: Fix comparison when finding best cdclk. v3: Another logic fix for calc_cdclk. v4: - Use named initializers for cdclk tables. (Ville) - Include refclk as a field in the table instead of adding all three ratios for each entry. (Ville) - Terminate tables with an empty entry to avoid needing to store the table size. (Ville) - Don't try so hard to return reasonable values from our lookup functions if we get impossible inputs; just WARN and return 0. (Ville) - Keep a bxt_ prefix on the lookup functions since they're still only used on bxt+ for now. We can rename them later if we extend this table-based approach back to older platforms. (Ville) v5: - Fix cnl table's ratios for 24mhz refclk. (Ville) - Don't miss the named initializers on the cnl table. (Ville) - Represent refclk in table as u16 rather than u32. (Ville) Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190910161506.7158-1-matthew.d.roper@intel.com
2019-09-10 09:15:06 -07:00
/* cdclk, divider, and ratio table from bspec */
const struct intel_cdclk_vals *table;
struct intel_global_obj obj;
} cdclk;
struct {
/* The current hardware dbuf configuration */
u8 enabled_slices;
struct intel_global_obj obj;
} dbuf;
drm/i915: fix hpd work vs. flush_work in the pageflip code deadlock Historically we've run our own driver hotplug handling in our own work-queue, which then launched the drm core hotplug handling in the system workqueue. This is important since we flush our own driver workqueue in the pageflip code while hodling modeset locks, and only the drm hotplug code grabbed these locks. But with commit 69787f7da6b2adc4054357a661aaa1701a9ca76f Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Tue Oct 23 18:23:34 2012 +0000 drm: run the hpd irq event code directly this was changed and now we could deadlock in our flip handler if there's a hotplug work blocking the progress of the crucial unpin works. So this broke the careful deadlock avoidance implemented in commit b4a98e57fc27854b5938fc8b08b68e5e68b91e1f Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Thu Nov 1 09:26:26 2012 +0000 drm/i915: Flush outstanding unpin tasks before pageflipping Since the rule thus far has been that work items on our own workqueue may never grab modeset locks simply restore that rule again. v2: Add a comment to the declaration of dev_priv->wq to warn readers about the tricky implications of using it. Suggested by Chris Wilson. Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Stuart Abercrombie <sabercrombie@chromium.org> Reported-by: Stuart Abercrombie <sabercrombie@chromium.org> References: http://permalink.gmane.org/gmane.comp.freedesktop.xorg.drivers.intel/26239 Cc: stable@vger.kernel.org Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> [danvet: Squash in a comment at the place where we schedule the work. Requested after-the-fact by Chris on irc since the hpd work isn't the only place we botch this.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-02 16:22:25 +02:00
/**
* wq - Driver workqueue for GEM.
*
* NOTE: Work items scheduled here are not allowed to grab any modeset
* locks, for otherwise the flushing done in the pageflip code will
* result in deadlocks.
*/
struct workqueue_struct *wq;
/* ordered wq for modesets */
struct workqueue_struct *modeset_wq;
/* unbound hipri wq for page flips/plane updates */
struct workqueue_struct *flip_wq;
/* pm private clock gating functions */
const struct drm_i915_clock_gating_funcs *clock_gating_funcs;
/* pm display functions */
const struct drm_i915_wm_disp_funcs *wm_disp;
/* irq display functions */
const struct intel_hotplug_funcs *hotplug_funcs;
/* fdi display functions */
const struct intel_fdi_funcs *fdi_funcs;
/* display pll funcs */
const struct intel_dpll_funcs *dpll_funcs;
/* Display functions */
const struct drm_i915_display_funcs *display;
/* Display internal color functions */
const struct intel_color_funcs *color_funcs;
/* Display CDCLK functions */
const struct intel_cdclk_funcs *cdclk_funcs;
/* PCH chipset type */
enum intel_pch pch_type;
unsigned short pch_id;
unsigned long quirks;
struct drm_atomic_state *modeset_restore_state;
struct drm_modeset_acquire_ctx reset_ctx;
struct i915_ggtt ggtt; /* VM representing the global address space */
struct i915_gem_mm mm;
/* Kernel Modesetting */
/**
* dpll and cdclk state is protected by connection_mutex
* dpll.lock serializes intel_{prepare,enable,disable}_shared_dpll.
* Must be global rather than per dpll, because on some platforms plls
* share registers.
*/
struct {
struct mutex lock;
int num_shared_dpll;
struct intel_shared_dpll shared_dplls[I915_NUM_PLLS];
const struct intel_dpll_mgr *mgr;
struct {
int nssc;
int ssc;
} ref_clks;
} dpll;
struct list_head global_obj_list;
/*
* For reading active_pipes holding any crtc lock is
* sufficient, for writing must hold all of them.
*/
u8 active_pipes;
drm/i915: Track frontbuffer invalidation/flushing So these are the guts of the new beast. This tracks when a frontbuffer gets invalidated (due to frontbuffer rendering) and hence should be constantly scaned out, and when it's flushed again and can be compressed/one-shot-upload. Rules for flushing are simple: The frontbuffer needs one more full upload starting from the next vblank. Which means that the flushing can _only_ be called once the frontbuffer update has been latched. But this poses a problem for pageflips: We can't just delay the flushing until the pageflip is latched, since that would pose the risk that we override frontbuffer rendering that has been scheduled in-between the pageflip ioctl and the actual latching. To handle this track asynchronous invalidations (and also pageflip) state per-ring and delay any in-between flushing until the rendering has completed. And also cancel any delayed flushing if we get a new invalidation request (whether delayed or not). Also call intel_mark_fb_busy in both cases in all cases to make sure that we keep the screen at the highest refresh rate both on flips, synchronous plane updates and for frontbuffer rendering. v2: Lots of improvements Suggestions from Chris: - Move invalidate/flush in flush_*_domain and set_to_*_domain. - Drop the flush in busy_ioctl since it's redundant. Was a leftover from an earlier concept to track flips/delayed flushes. - Don't forget about the initial modeset enable/final disable. Suggested by Chris. Track flips accurately, too. Since flips complete independently of rendering we need to track pending flips in a separate mask. Again if an invalidate happens we need to cancel the evenutal flush to avoid races. v3: Provide correct header declarations for flip functions. Currently not needed outside of intel_display.c, but part of the proper interface. v4: Add proper domain management to fbcon so that the fbcon buffer is also tracked correctly. v5: Fixup locking around the fbcon set_to_gtt_domain call. v6: More comments from Chris: - Split out fbcon changes. - Drop superflous checks for potential scanout before calling intel_fb functions - we can micro-optimize this later. - s/intel_fb_/intel_fb_obj_/ to make it clear that this deals in gem object. We already have precedence for fb_obj in the pin_and_fence functions. v7: Clarify the semantics of the flip flush handling by renaming things a bit: - Don't go through a gem object but take the relevant frontbuffer bits directly. These functions center on the plane, the actual object is irrelevant - even a flip to the same object as already active should cause a flush. - Add a new intel_frontbuffer_flip for synchronous plane updates. It currently just calls intel_frontbuffer_flush since the implemenation differs. This way we achieve a clear split between one-shot update events on one side and frontbuffer rendering with potentially a very long delay between the invalidate and flush. Chris and I also had some discussions about mark_busy and whether it is appropriate to call from flush. But mark busy is a state which should be derived from the 3 events (invalidate, flush, flip) we now have by the users, like psr does by tracking relevant information in psr.busy_frontbuffer_bits. DRRS (the only real use of mark_busy for frontbuffer) needs to have similar logic. With that the overall mark_busy in the core could be removed. v8: Only when retiring gpu buffers only flush frontbuffer bits we actually invalidated in a batch. Just for safety since before any additional usage/invalidate we should always retire current rendering. Suggested by Chris Wilson. v9: Actually use intel_frontbuffer_flip in all appropriate places. Spotted by Chris. v10: Address more comments from Chris: - Don't call _flip in set_base when the crtc is inactive, avoids redunancy in the modeset case with the initial enabling of all planes. - Add comments explaining that the initial/final plane enable/disable still has work left to do before it's fully generic. v11: Only invalidate for gtt/cpu access when writing. Spotted by Chris. v12: s/_flush/_flip/ in intel_overlay.c per Chris' comment. Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-06-19 16:01:59 +02:00
struct i915_frontbuffer_tracking fb_tracking;
drm/i915: Move atomic state free from out of fence release Fences are required to support being released from under an atomic context. The drm_atomic_state struct may take a mutex when being released and so we cannot drop a reference to the drm_atomic_state from the fence release path directly, and so we need to defer that unreference to a worker. [ 326.576697] WARNING: CPU: 2 PID: 366 at kernel/sched/core.c:7737 __might_sleep+0x5d/0x80 [ 326.576816] do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffffc0359549>] intel_breadcrumbs_signaler+0x59/0x270 [i915] [ 326.576818] Modules linked in: rfcomm fuse snd_hda_codec_hdmi bnep snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device snd_timer input_leds led_class snd punit_atom_debug btusb btrtl btbcm btintel intel_rapl bluetooth i915 drm_kms_helper syscopyarea sysfillrect iwlwifi sysimgblt soundcore fb_sys_fops mei_txe cfg80211 drm pwm_lpss_platform pwm_lpss pinctrl_cherryview fjes acpi_pad parport_pc ppdev parport autofs4 [ 326.576899] CPU: 2 PID: 366 Comm: i915/signal:0 Tainted: G U 4.10.0-rc3-patser+ #5030 [ 326.576902] Hardware name: /NUC5PPYB, BIOS PYBSWCEL.86A.0031.2015.0601.1712 06/01/2015 [ 326.576905] Call Trace: [ 326.576920] dump_stack+0x4d/0x6d [ 326.576926] __warn+0xc0/0xe0 [ 326.576931] warn_slowpath_fmt+0x5a/0x80 [ 326.577004] ? intel_breadcrumbs_signaler+0x59/0x270 [i915] [ 326.577075] ? intel_breadcrumbs_signaler+0x59/0x270 [i915] [ 326.577079] __might_sleep+0x5d/0x80 [ 326.577087] mutex_lock+0x1b/0x40 [ 326.577133] drm_property_free_blob+0x1e/0x80 [drm] [ 326.577167] ? drm_property_destroy+0xe0/0xe0 [drm] [ 326.577200] drm_mode_object_unreference+0x5c/0x70 [drm] [ 326.577233] drm_property_unreference_blob+0xe/0x10 [drm] [ 326.577260] __drm_atomic_helper_crtc_destroy_state+0x14/0x40 [drm_kms_helper] [ 326.577278] drm_atomic_helper_crtc_destroy_state+0x10/0x20 [drm_kms_helper] [ 326.577352] intel_crtc_destroy_state+0x9/0x10 [i915] [ 326.577388] drm_atomic_state_default_clear+0xea/0x1d0 [drm] [ 326.577462] intel_atomic_state_clear+0xd/0x20 [i915] [ 326.577497] drm_atomic_state_clear+0x1a/0x30 [drm] [ 326.577532] __drm_atomic_state_free+0x13/0x60 [drm] [ 326.577607] intel_atomic_commit_ready+0x6f/0x78 [i915] [ 326.577670] i915_sw_fence_release+0x3a/0x50 [i915] [ 326.577733] dma_i915_sw_fence_wake+0x39/0x80 [i915] [ 326.577741] dma_fence_signal+0xda/0x120 [ 326.577812] ? intel_breadcrumbs_signaler+0x59/0x270 [i915] [ 326.577884] intel_breadcrumbs_signaler+0xb1/0x270 [i915] [ 326.577889] kthread+0x127/0x130 [ 326.577961] ? intel_engine_remove_wait+0x1a0/0x1a0 [i915] [ 326.577964] ? kthread_stop+0x120/0x120 [ 326.577970] ret_from_fork+0x22/0x30 Fixes: c004a90b7263 ("drm/i915: Restore nonblocking awaits for modesetting") Reported-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/20170123212939.30345-1-chris@chris-wilson.co.uk Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.10-rc1+ Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-01-23 21:29:39 +00:00
struct intel_atomic_helper {
struct llist_head free_list;
struct work_struct free_work;
} atomic_helper;
bool mchbar_need_disable;
struct intel_l3_parity l3_parity;
/*
* HTI (aka HDPORT) state read during initial hw readout. Most
* platforms don't have HTI, so this will just stay 0. Those that do
* will use this later to figure out which PLLs and PHYs are unavailable
* for driver usage.
*/
u32 hti_state;
/*
* edram size in MB.
* Cannot be determined by PCIID. You must always read a register.
*/
u32 edram_size_mb;
struct i915_power_domains power_domains;
struct i915_gpu_error gpu_error;
struct drm_i915_gem_object *vlv_pctx;
/* list of fbdev register on this device */
struct intel_fbdev *fbdev;
struct work_struct fbdev_suspend_work;
struct drm_property *broadcast_rgb_property;
struct drm_property *force_audio_property;
u32 fdi_rx_config;
drm/i915: Implement WaPixelRepeatModeFixForC0:chv DPLL_MD(PIPE_C) is AWOL on CHV. Instead of fixing it someone added chicken bits to propagate the pixel multiplier from DPLL_MD(PIPE_B) to either pipe B or C. So do that to make pixel repeat work on pipes B and C. Pipe A is fine without any tricks. Fortunately the pixel repeat propagation appears to be a oneshot operation, so once the value has been written we can clear the chicken bits. So it is still possible to drive pipe B and C with different pixel multipliers simultaneosly. Looks like DPLL_VGA_MODE_DIS must also be set in DPLL(PIPE_B) for this to work. But since we keep that bit always set in all DPLLs there's no problem. This of course means we can't reliably read out the pixel multiplier for pipes B and C. That would make the state checker unhappy, so I added shadow copies of those registers in to dev_priv. The other option would have been to skip pixel multiplier, dpll_md an dotclock checks entirely on CHV, but that feels like a serious loss of cross checking, so just pretending that we have working DPLL MD registers seemed better. Obviously with the shadow copies we can't detect if the pixel multiplier was properly configured, nor can we take over its state from the BIOS, but hopefully people won't have displays that would be limitd to such crappy modes. There is one strange flicker still remaining. It's visible on pipe C/HDMID when HDMIB is enabled while driven by pipe B. It doesn't occur if pipe A drives HDMIB, nor is there any glitch on pipe B/HDMIB when port C/HDMID starts up. I don't have a board with HDMIC so not sure if it happens there too. So I'm not sure if it's somehow tied in with this strange linkage between pipe B and C. Sadly I was unable to find an enable sequence that would avoid the glitch, but at least it's not fatal ie. the output recovers afterwards. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1458052809-23426-4-git-send-email-ville.syrjala@linux.intel.com Reviewed-by: Jani Nikula <jani.nikula@intel.com>
2016-03-15 16:39:56 +02:00
/* Shadow for DISPLAY_PHY_CONTROL which can't be safely read */
drm/i915: Work around DISPLAY_PHY_CONTROL register corruption on CHV Sometimes (exactly when is a bit unclear) DISPLAY_PHY_CONTROL appears to get corrupted. The values I've managed to read from it seem to have some pattern but vary quite a lot. The corruption doesn't seem to just happen when the register is accessed, but can also happen spontaneosly during modeset. When this happens during a modeset things go south and the display doesn't light up. I've managed to hit the problemn when toggling HDMI on port D on and off. When things get corrupted the display doesn't light up, but as soon as I manually write the correct value to the register the display comes up. First I was suspicious that we ourselves accidentally overwrite it with garbage, but didn't catch anything with the reg_rw tracepoint. Also I sprinkled check all over the modeset path to see exactly when the corruption happens, and eg. the read back value was fine just before intel_dp_set_m(), and corrupted immediately after it. I also made my check function repair the register value whenever it was wrong, and with this approach the corruption repeated several times during the modeset operation, always seeming to trigger in the same exact calls to the check function, while other calls to the function never caught anything. So far I've not seen this problem occurring when carefully avoiding all read accesses to DISPLAY_PHY_CONTROL. Not sure if that's just pure luck or an actual workaround, but we can hope it works. So let's avoid reading the register and instead track the desired value of the register in dev_priv. v2: Read out the power well state to determine initial register value v3: Use DPIO_CHx names instead of raw numbers Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Deepak S <deepak.s@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-04-10 18:21:28 +03:00
u32 chv_phy_control;
drm/i915: Implement WaPixelRepeatModeFixForC0:chv DPLL_MD(PIPE_C) is AWOL on CHV. Instead of fixing it someone added chicken bits to propagate the pixel multiplier from DPLL_MD(PIPE_B) to either pipe B or C. So do that to make pixel repeat work on pipes B and C. Pipe A is fine without any tricks. Fortunately the pixel repeat propagation appears to be a oneshot operation, so once the value has been written we can clear the chicken bits. So it is still possible to drive pipe B and C with different pixel multipliers simultaneosly. Looks like DPLL_VGA_MODE_DIS must also be set in DPLL(PIPE_B) for this to work. But since we keep that bit always set in all DPLLs there's no problem. This of course means we can't reliably read out the pixel multiplier for pipes B and C. That would make the state checker unhappy, so I added shadow copies of those registers in to dev_priv. The other option would have been to skip pixel multiplier, dpll_md an dotclock checks entirely on CHV, but that feels like a serious loss of cross checking, so just pretending that we have working DPLL MD registers seemed better. Obviously with the shadow copies we can't detect if the pixel multiplier was properly configured, nor can we take over its state from the BIOS, but hopefully people won't have displays that would be limitd to such crappy modes. There is one strange flicker still remaining. It's visible on pipe C/HDMID when HDMIB is enabled while driven by pipe B. It doesn't occur if pipe A drives HDMIB, nor is there any glitch on pipe B/HDMIB when port C/HDMID starts up. I don't have a board with HDMIC so not sure if it happens there too. So I'm not sure if it's somehow tied in with this strange linkage between pipe B and C. Sadly I was unable to find an enable sequence that would avoid the glitch, but at least it's not fatal ie. the output recovers afterwards. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1458052809-23426-4-git-send-email-ville.syrjala@linux.intel.com Reviewed-by: Jani Nikula <jani.nikula@intel.com>
2016-03-15 16:39:56 +02:00
/*
* Shadows for CHV DPLL_MD regs to keep the state
* checker somewhat working in the presence hardware
* crappiness (can't read out DPLL_MD for pipes B & C).
*/
u32 chv_dpll_md[I915_MAX_PIPES];
u32 bxt_phy_grc;
drm/i915: Work around DISPLAY_PHY_CONTROL register corruption on CHV Sometimes (exactly when is a bit unclear) DISPLAY_PHY_CONTROL appears to get corrupted. The values I've managed to read from it seem to have some pattern but vary quite a lot. The corruption doesn't seem to just happen when the register is accessed, but can also happen spontaneosly during modeset. When this happens during a modeset things go south and the display doesn't light up. I've managed to hit the problemn when toggling HDMI on port D on and off. When things get corrupted the display doesn't light up, but as soon as I manually write the correct value to the register the display comes up. First I was suspicious that we ourselves accidentally overwrite it with garbage, but didn't catch anything with the reg_rw tracepoint. Also I sprinkled check all over the modeset path to see exactly when the corruption happens, and eg. the read back value was fine just before intel_dp_set_m(), and corrupted immediately after it. I also made my check function repair the register value whenever it was wrong, and with this approach the corruption repeated several times during the modeset operation, always seeming to trigger in the same exact calls to the check function, while other calls to the function never caught anything. So far I've not seen this problem occurring when carefully avoiding all read accesses to DISPLAY_PHY_CONTROL. Not sure if that's just pure luck or an actual workaround, but we can hope it works. So let's avoid reading the register and instead track the desired value of the register in dev_priv. v2: Read out the power well state to determine initial register value v3: Use DPIO_CHx names instead of raw numbers Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Deepak S <deepak.s@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-04-10 18:21:28 +03:00
u32 suspend_count;
drm/i915: Fix hibernation with ACPI S0 target state After commit dd9f31c7a3887950cbd0d49eb9d43f7a1518a356 Author: Imre Deak <imre.deak@intel.com> Date: Wed Aug 16 17:46:07 2017 +0300 drm/i915/gen9+: Set same power state before hibernation image save/restore during hibernation/suspend the power domain functionality got disabled, after which resume could leave it incorrectly disabled if the ACPI target state was S0 during suspend and i915 was not loaded by the loader kernel. This was caused by not considering if we resumed from hibernation as the condition for power domains reiniting. Fix this by simply tracking if we suspended power domains during system suspend and reinit power domains accordingly during resume. This will result in reiniting power domains always when resuming from hibernation, regardless of the platform and whether or not i915 is loaded by the loader kernel. The reason we didn't catch this earlier is that the enabled/disabled state of power domains during PMSG_FREEZE/PMSG_QUIESCE is platform and kernel config dependent: on my SKL the target state is S4 during PMSG_FREEZE and (with the driver loaded in the loader kernel) S0 during PMSG_QUIESCE. On the reporter's machine it's S0 during PMSG_FREEZE but (contrary to this) power domains are not initialized during PMSG_QUIESCE since i915 is not loaded in the loader kernel, or it's loaded but without the DMC firmware being available. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105196 Reported-and-tested-by: amn-bas@hotmail.com Fixes: dd9f31c7a388 ("drm/i915/gen9+: Set same power state before hibernation image save/restore") Cc: amn-bas@hotmail.com Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180322143642.26883-1-imre.deak@intel.com
2018-03-22 16:36:42 +02:00
bool power_domains_suspended;
struct i915_suspend_saved_registers regfile;
struct vlv_s0ix_state *vlv_s0ix_state;
drm/i915/skl: Add support for the SAGV, fix underrun hangs Since the watermark calculations for Skylake are still broken, we're apt to hitting underruns very easily under multi-monitor configurations. While it would be lovely if this was fixed, it's not. Another problem that's been coming from this however, is the mysterious issue of underruns causing full system hangs. An easy way to reproduce this with a skylake system: - Get a laptop with a skylake GPU, and hook up two external monitors to it - Move the cursor from the built-in LCD to one of the external displays as quickly as you can - You'll get a few pipe underruns, and eventually the entire system will just freeze. After doing a lot of investigation and reading through the bspec, I found the existence of the SAGV, which is responsible for adjusting the system agent voltage and clock frequencies depending on how much power we need. According to the bspec: "The display engine access to system memory is blocked during the adjustment time. SAGV defaults to enabled. Software must use the GT-driver pcode mailbox to disable SAGV when the display engine is not able to tolerate the blocking time." The rest of the bspec goes on to explain that software can simply leave the SAGV enabled, and disable it when we use interlaced pipes/have more then one pipe active. Sure enough, with this patchset the system hangs resulting from pipe underruns on Skylake have completely vanished on my T460s. Additionally, the bspec mentions turning off the SAGV with more then one pipe enabled as a workaround for display underruns. While this patch doesn't entirely fix that, it looks like it does improve the situation a little bit so it's likely this is going to be required to make watermarks on Skylake fully functional. This will still need additional work in the future: we shouldn't be enabling the SAGV if any of the currently enabled planes can't enable WM levels that introduce latencies >= 30 µs. Changes since v11: - Add skl_can_enable_sagv() - Make sure we don't enable SAGV when not all planes can enable watermarks >= the SAGV engine block time. I was originally going to save this for later, but I recently managed to run into a machine that was having problems with a single pipe configuration + SAGV. - Make comparisons to I915_SKL_SAGV_NOT_CONTROLLED explicit - Change I915_SAGV_DYNAMIC_FREQ to I915_SAGV_ENABLE - Move printks outside of mutexes - Don't print error messages twice Changes since v10: - Apparently sandybridge_pcode_read actually writes values and reads them back, despite it's misleading function name. This means we've been doing this mostly wrong and have been writing garbage to the SAGV control. Because of this, we no longer attempt to read the SAGV status during initialization (since there are no helpers for this). - mlankhorst noticed that this patch was breaking on some very early pre-release Skylake machines, which apparently don't allow you to disable the SAGV. To prevent machines from failing tests due to SAGV errors, if the first time we try to control the SAGV results in the mailbox indicating an invalid command, we just disable future attempts to control the SAGV state by setting dev_priv->skl_sagv_status to I915_SKL_SAGV_NOT_CONTROLLED and make a note of it in dmesg. - Move mutex_unlock() a little higher in skl_enable_sagv(). This doesn't actually fix anything, but lets us release the lock a little sooner since we're finished with it. Changes since v9: - Only enable/disable sagv on Skylake Changes since v8: - Add intel_state->modeset guard to the conditional for skl_enable_sagv() Changes since v7: - Remove GEN9_SAGV_LOW_FREQ, replace with GEN9_SAGV_IS_ENABLED (that's all we use it for anyway) - Use GEN9_SAGV_IS_ENABLED instead of 0x1 for clarification - Fix a styling error that snuck past me Changes since v6: - Protect skl_enable_sagv() with intel_state->modeset conditional in intel_atomic_commit_tail() Changes since v5: - Don't use is_power_of_2. Makes things confusing - Don't use the old state to figure out whether or not to enable/disable the sagv, use the new one - Split the loop in skl_disable_sagv into it's own function - Move skl_sagv_enable/disable() calls into intel_atomic_commit_tail() Changes since v4: - Use is_power_of_2 against active_crtcs to check whether we have > 1 pipe enabled - Fix skl_sagv_get_hw_state(): (temp & 0x1) indicates disabled, 0x0 enabled - Call skl_sagv_enable/disable() from pre/post-plane updates Changes since v3: - Use time_before() to compare timeout to jiffies Changes since v2: - Really apply minor style nitpicks to patch this time Changes since v1: - Added comments about this probably being one of the requirements to fixing Skylake's watermark issues - Minor style nitpicks from Matt Roper - Disable these functions on Broxton, since it doesn't have an SAGV Signed-off-by: Lyude <cpaul@redhat.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: stable@vger.kernel.org Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1471463761-26796-3-git-send-email-cpaul@redhat.com [mlankhorst: ENOSYS -> ENXIO, whitespace fixes]
2016-08-17 15:55:54 -04:00
enum {
I915_SAGV_UNKNOWN = 0,
I915_SAGV_DISABLED,
I915_SAGV_ENABLED,
I915_SAGV_NOT_CONTROLLED
} sagv_status;
drm/i915/skl: Add support for the SAGV, fix underrun hangs Since the watermark calculations for Skylake are still broken, we're apt to hitting underruns very easily under multi-monitor configurations. While it would be lovely if this was fixed, it's not. Another problem that's been coming from this however, is the mysterious issue of underruns causing full system hangs. An easy way to reproduce this with a skylake system: - Get a laptop with a skylake GPU, and hook up two external monitors to it - Move the cursor from the built-in LCD to one of the external displays as quickly as you can - You'll get a few pipe underruns, and eventually the entire system will just freeze. After doing a lot of investigation and reading through the bspec, I found the existence of the SAGV, which is responsible for adjusting the system agent voltage and clock frequencies depending on how much power we need. According to the bspec: "The display engine access to system memory is blocked during the adjustment time. SAGV defaults to enabled. Software must use the GT-driver pcode mailbox to disable SAGV when the display engine is not able to tolerate the blocking time." The rest of the bspec goes on to explain that software can simply leave the SAGV enabled, and disable it when we use interlaced pipes/have more then one pipe active. Sure enough, with this patchset the system hangs resulting from pipe underruns on Skylake have completely vanished on my T460s. Additionally, the bspec mentions turning off the SAGV with more then one pipe enabled as a workaround for display underruns. While this patch doesn't entirely fix that, it looks like it does improve the situation a little bit so it's likely this is going to be required to make watermarks on Skylake fully functional. This will still need additional work in the future: we shouldn't be enabling the SAGV if any of the currently enabled planes can't enable WM levels that introduce latencies >= 30 µs. Changes since v11: - Add skl_can_enable_sagv() - Make sure we don't enable SAGV when not all planes can enable watermarks >= the SAGV engine block time. I was originally going to save this for later, but I recently managed to run into a machine that was having problems with a single pipe configuration + SAGV. - Make comparisons to I915_SKL_SAGV_NOT_CONTROLLED explicit - Change I915_SAGV_DYNAMIC_FREQ to I915_SAGV_ENABLE - Move printks outside of mutexes - Don't print error messages twice Changes since v10: - Apparently sandybridge_pcode_read actually writes values and reads them back, despite it's misleading function name. This means we've been doing this mostly wrong and have been writing garbage to the SAGV control. Because of this, we no longer attempt to read the SAGV status during initialization (since there are no helpers for this). - mlankhorst noticed that this patch was breaking on some very early pre-release Skylake machines, which apparently don't allow you to disable the SAGV. To prevent machines from failing tests due to SAGV errors, if the first time we try to control the SAGV results in the mailbox indicating an invalid command, we just disable future attempts to control the SAGV state by setting dev_priv->skl_sagv_status to I915_SKL_SAGV_NOT_CONTROLLED and make a note of it in dmesg. - Move mutex_unlock() a little higher in skl_enable_sagv(). This doesn't actually fix anything, but lets us release the lock a little sooner since we're finished with it. Changes since v9: - Only enable/disable sagv on Skylake Changes since v8: - Add intel_state->modeset guard to the conditional for skl_enable_sagv() Changes since v7: - Remove GEN9_SAGV_LOW_FREQ, replace with GEN9_SAGV_IS_ENABLED (that's all we use it for anyway) - Use GEN9_SAGV_IS_ENABLED instead of 0x1 for clarification - Fix a styling error that snuck past me Changes since v6: - Protect skl_enable_sagv() with intel_state->modeset conditional in intel_atomic_commit_tail() Changes since v5: - Don't use is_power_of_2. Makes things confusing - Don't use the old state to figure out whether or not to enable/disable the sagv, use the new one - Split the loop in skl_disable_sagv into it's own function - Move skl_sagv_enable/disable() calls into intel_atomic_commit_tail() Changes since v4: - Use is_power_of_2 against active_crtcs to check whether we have > 1 pipe enabled - Fix skl_sagv_get_hw_state(): (temp & 0x1) indicates disabled, 0x0 enabled - Call skl_sagv_enable/disable() from pre/post-plane updates Changes since v3: - Use time_before() to compare timeout to jiffies Changes since v2: - Really apply minor style nitpicks to patch this time Changes since v1: - Added comments about this probably being one of the requirements to fixing Skylake's watermark issues - Minor style nitpicks from Matt Roper - Disable these functions on Broxton, since it doesn't have an SAGV Signed-off-by: Lyude <cpaul@redhat.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: stable@vger.kernel.org Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1471463761-26796-3-git-send-email-cpaul@redhat.com [mlankhorst: ENOSYS -> ENXIO, whitespace fixes]
2016-08-17 15:55:54 -04:00
u32 sagv_block_time_us;
struct {
/*
* Raw watermark latency values:
* in 0.1us units for WM0,
* in 0.5us units for WM1+.
*/
/* primary */
u16 pri_latency[5];
/* sprite */
u16 spr_latency[5];
/* cursor */
u16 cur_latency[5];
/*
* Raw watermark memory latency values
* for SKL for all 8 levels
* in 1us units.
*/
u16 skl_latency[8];
/* current hardware state */
drm/i915/skl: SKL Watermark Computation This patch implements the watermark algorithm and its necessary functions. Two function pointers skl_update_wm and skl_update_sprite_wm are provided. The skl_update_wm will update the watermarks for the crtc provided as an argument and then checks for change in DDB allocation for other active pipes and recomputes the watermarks for those Pipes and planes as well. Finally it does the register programming for all dirty pipes. The trigger of the Watermark double buffer registers will have to be once the plane configurations are done by the caller. v2: fixed the divide-by-0 error in the results computation func. Also reworked the PLANE_WM register values computation func to make it more compact. Incorporated all other review comments from Damien. v3: Changed the skl_compute_plane_wm function to now return success or failure. Also the result blocks and lines are computed here instead of in skl_compute_wm_results function. v4: Adjust skl_ddb_alloc_changed() to the new planes/cursor split (Damien) v5: Reworked the affected functions to implement new plane/cursor split. v6: Rework the logic that triggers the DDB allocation and WM computation of skl_update_other_pipe_wm() to not depend on non-computed DDB values. Always give a valid cursor_width (at boot it's 0) to keep the invariant that we consider the cursor plane always enabled. Otherwise we end up dividing by 0 in skl_compute_plane_wm() (Damien Lespiau) v7: Spell out allocation skl_ddb_ functions should have the ddb as first argument Make the skl_ddb_alloc_changed() parameters const (Damien) v8: Rebase on top of the crtc->primary changes v9: Split the staging results structure to not exceed the 1Kb stack allocation in skl_update_wm() v10: Make skl_pipe_pixel_rate() take a pointer to the pipe config Add a comment about overflow considerations for skl_wm_method1() Various additions of const Various use of sizeof(variable) instead of sizeof(type) Various move of variable definitons to a narrower scope Zero initialize some stack allocated structures to make sure we don't have garbage in case we don't write all the values (Ville) v11: Remove non-necessary default number of blocks/lines when the plane is disabled (Ville) Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Pradeep Bhat <pradeep.bhat@intel.com> Signed-off-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-11-04 17:06:42 +00:00
union {
struct ilk_wm_values hw;
struct vlv_wm_values vlv;
drm/i915: Two stage watermarks for g4x Implement proper two stage watermark programming for g4x. As with other pre-SKL platforms, the watermark registers aren't double buffered on g4x. Hence we must sequence the watermark update carefully around plane updates. The code is quite heavily modelled on the VLV/CHV code, with some fairly significant differences due to the different hardware architecture: * g4x doesn't use inverted watermark values * CxSR actually affects the watermarks since it controls memory self refresh in addition to the max FIFO mode * A further HPLL SR mode is possible with higher memory wakeup latency * g4x has FBC2 and so it also has FBC watermarks * max FIFO mode for primary plane only (cursor is allowed, sprite is not) * g4x has no manual FIFO repartitioning * some TLB miss related workarounds are needed for the watermarks Actually the hardware is quite similar to ILK+ in many ways. The most visible differences are in the actual watermakr register layout. ILK revamped that part quite heavily whereas g4x is still using the layout inherited from earlier platforms. Note that we didn't previously enable the HPLL SR on g4x. So in order to not introduce too many functional changes in this patch I've not actually enabled it here either, even though the code is now fully ready for it. We'll enable it separately later on. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170421181432.15216-13-ville.syrjala@linux.intel.com Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
2017-04-21 21:14:29 +03:00
struct g4x_wm_values g4x;
drm/i915/skl: SKL Watermark Computation This patch implements the watermark algorithm and its necessary functions. Two function pointers skl_update_wm and skl_update_sprite_wm are provided. The skl_update_wm will update the watermarks for the crtc provided as an argument and then checks for change in DDB allocation for other active pipes and recomputes the watermarks for those Pipes and planes as well. Finally it does the register programming for all dirty pipes. The trigger of the Watermark double buffer registers will have to be once the plane configurations are done by the caller. v2: fixed the divide-by-0 error in the results computation func. Also reworked the PLANE_WM register values computation func to make it more compact. Incorporated all other review comments from Damien. v3: Changed the skl_compute_plane_wm function to now return success or failure. Also the result blocks and lines are computed here instead of in skl_compute_wm_results function. v4: Adjust skl_ddb_alloc_changed() to the new planes/cursor split (Damien) v5: Reworked the affected functions to implement new plane/cursor split. v6: Rework the logic that triggers the DDB allocation and WM computation of skl_update_other_pipe_wm() to not depend on non-computed DDB values. Always give a valid cursor_width (at boot it's 0) to keep the invariant that we consider the cursor plane always enabled. Otherwise we end up dividing by 0 in skl_compute_plane_wm() (Damien Lespiau) v7: Spell out allocation skl_ddb_ functions should have the ddb as first argument Make the skl_ddb_alloc_changed() parameters const (Damien) v8: Rebase on top of the crtc->primary changes v9: Split the staging results structure to not exceed the 1Kb stack allocation in skl_update_wm() v10: Make skl_pipe_pixel_rate() take a pointer to the pipe config Add a comment about overflow considerations for skl_wm_method1() Various additions of const Various use of sizeof(variable) instead of sizeof(type) Various move of variable definitons to a narrower scope Zero initialize some stack allocated structures to make sure we don't have garbage in case we don't write all the values (Ville) v11: Remove non-necessary default number of blocks/lines when the plane is disabled (Ville) Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Pradeep Bhat <pradeep.bhat@intel.com> Signed-off-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-11-04 17:06:42 +00:00
};
u8 max_level;
drm/i915: Add two-stage ILK-style watermark programming (v11) In addition to calculating final watermarks, let's also pre-calculate a set of intermediate watermark values at atomic check time. These intermediate watermarks are a combination of the watermarks for the old state and the new state; they should satisfy the requirements of both states which means they can be programmed immediately when we commit the atomic state (without waiting for a vblank). Once the vblank does happen, we can then re-program watermarks to the more optimal final value. v2: Significant rebasing/rewriting. v3: - Move 'need_postvbl_update' flag to CRTC state (Daniel) - Don't forget to check intermediate watermark values for validity (Maarten) - Don't due async watermark optimization; just do it at the end of the atomic transaction, after waiting for vblanks. We do want it to be async eventually, but adding that now will cause more trouble for Maarten's in-progress work. (Maarten) - Don't allocate space in crtc_state for intermediate watermarks on platforms that don't need it (gen9+). - Move WaCxSRDisabledForSpriteScaling:ivb into intel_begin_crtc_commit now that ilk_update_wm is gone. v4: - Add a wm_mutex to cover updates to intel_crtc->active and the need_postvbl_update flag. Since we don't have async yet it isn't terribly important yet, but might as well add it now. - Change interface to program watermarks. Platforms will now expose .initial_watermarks() and .optimize_watermarks() functions to do watermark programming. These should lock wm_mutex, copy the appropriate state values into intel_crtc->active, and then call the internal program watermarks function. v5: - Skip intermediate watermark calculation/check during initial hardware readout since we don't trust the existing HW values (and don't have valid values of our own yet). - Don't try to call .optimize_watermarks() on platforms that don't have atomic watermarks yet. (Maarten) v6: - Rebase v7: - Further rebase v8: - A few minor indentation and line length fixes v9: - Yet another rebase since Maarten's patches reworked a bunch of the code (wm_pre, wm_post, etc.) that this was previously based on. v10: - Move wm_mutex to dev_priv to protect against racing commits against disjoint CRTC sets. (Maarten) - Drop unnecessary clearing of cstate->wm.need_postvbl_update (Maarten) v11: - Now that we've moved to atomic watermark updates, make sure we call the proper function to program watermarks in {ironlake,haswell}_crtc_enable(); the failure to do so on the previous patch iteration led to us not actually programming the watermarks before turning on the CRTC, which was the cause of the underruns that the CI system was seeing. - Fix inverted logic for determining when to optimize watermarks. We were needlessly optimizing when the intermediate/optimal values were the same (harmless), but not actually optimizing when they differed (also harmless, but wasteful from a power/bandwidth perspective). Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1456276813-5689-1-git-send-email-matthew.d.roper@intel.com
2016-02-23 17:20:13 -08:00
/*
* Should be held around atomic WM register writing; also
* protects * intel_crtc->wm.active and
* crtc_state->wm.need_postvbl_update.
drm/i915: Add two-stage ILK-style watermark programming (v11) In addition to calculating final watermarks, let's also pre-calculate a set of intermediate watermark values at atomic check time. These intermediate watermarks are a combination of the watermarks for the old state and the new state; they should satisfy the requirements of both states which means they can be programmed immediately when we commit the atomic state (without waiting for a vblank). Once the vblank does happen, we can then re-program watermarks to the more optimal final value. v2: Significant rebasing/rewriting. v3: - Move 'need_postvbl_update' flag to CRTC state (Daniel) - Don't forget to check intermediate watermark values for validity (Maarten) - Don't due async watermark optimization; just do it at the end of the atomic transaction, after waiting for vblanks. We do want it to be async eventually, but adding that now will cause more trouble for Maarten's in-progress work. (Maarten) - Don't allocate space in crtc_state for intermediate watermarks on platforms that don't need it (gen9+). - Move WaCxSRDisabledForSpriteScaling:ivb into intel_begin_crtc_commit now that ilk_update_wm is gone. v4: - Add a wm_mutex to cover updates to intel_crtc->active and the need_postvbl_update flag. Since we don't have async yet it isn't terribly important yet, but might as well add it now. - Change interface to program watermarks. Platforms will now expose .initial_watermarks() and .optimize_watermarks() functions to do watermark programming. These should lock wm_mutex, copy the appropriate state values into intel_crtc->active, and then call the internal program watermarks function. v5: - Skip intermediate watermark calculation/check during initial hardware readout since we don't trust the existing HW values (and don't have valid values of our own yet). - Don't try to call .optimize_watermarks() on platforms that don't have atomic watermarks yet. (Maarten) v6: - Rebase v7: - Further rebase v8: - A few minor indentation and line length fixes v9: - Yet another rebase since Maarten's patches reworked a bunch of the code (wm_pre, wm_post, etc.) that this was previously based on. v10: - Move wm_mutex to dev_priv to protect against racing commits against disjoint CRTC sets. (Maarten) - Drop unnecessary clearing of cstate->wm.need_postvbl_update (Maarten) v11: - Now that we've moved to atomic watermark updates, make sure we call the proper function to program watermarks in {ironlake,haswell}_crtc_enable(); the failure to do so on the previous patch iteration led to us not actually programming the watermarks before turning on the CRTC, which was the cause of the underruns that the CI system was seeing. - Fix inverted logic for determining when to optimize watermarks. We were needlessly optimizing when the intermediate/optimal values were the same (harmless), but not actually optimizing when they differed (also harmless, but wasteful from a power/bandwidth perspective). Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1456276813-5689-1-git-send-email-matthew.d.roper@intel.com
2016-02-23 17:20:13 -08:00
*/
struct mutex wm_mutex;
} wm;
struct dram_info {
bool wm_lv_0_adjust_needed;
u8 num_channels;
bool symmetric_memory;
enum intel_dram_type {
INTEL_DRAM_UNKNOWN,
INTEL_DRAM_DDR3,
INTEL_DRAM_DDR4,
INTEL_DRAM_LPDDR3,
INTEL_DRAM_LPDDR4,
INTEL_DRAM_DDR5,
INTEL_DRAM_LPDDR5,
} type;
u8 num_qgv_points;
drm/i915: Implement PSF GV point support PSF GV points are an additional factor that can limit the bandwidth available to display, separate from the traditional QGV points. Whereas traditional QGV points represent possible memory clock frequencies, PSF GV points reflect possible frequencies of the memory fabric. Switching between PSF GV points has the advantage of incurring almost no memory access block time and thus does not need to be accounted for in watermark calculations. This patch adds support for those on top of regular QGV points. Those are supposed to be used simultaneously, i.e we are always at some QGV and some PSF GV point, based on the current video mode requirements. Bspec: 64631, 53998 v2: Seems that initial assumption made during ml conversation was wrong, PCode rejects any masks containing points beyond the ones returned, so even though BSpec says we have around 8 points theoretically, we can mask/unmask only those which are returned, trying to manipulate those beyond causes a failure from PCode. So switched back to generating mask from 1 - num_qgv_points, where num_qgv_points is the actual amount of points, advertised by PCode. v3: - Extended restricted qgv point mask to 0xf, as we have now 3:2 bits for PSF GV points(Matt Roper) - Replaced val2 with NULL from PCode request, since its not being used(Matt Roper) - Replaced %d to 0x%x for better readability(thanks for spotting) Signed-off-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210531064845.4389-2-stanislav.lisovskiy@intel.com
2021-05-31 09:48:45 +03:00
u8 num_psf_gv_points;
} dram_info;
drm/i915: Make sure we have enough memory bandwidth on ICL ICL has so many planes that it can easily exceed the maximum effective memory bandwidth of the system. We must therefore check that we don't exceed that limit. The algorithm is very magic number heavy and lacks sufficient explanation for now. We also have no sane way to query the memory clock and timings, so we must rely on a combination of raw readout from the memory controller and hardcoded assumptions. The memory controller values obviously change as the system jumps between the different SAGV points, so we try to stabilize it first by disabling SAGV for the duration of the readout. The utilized bandwidth is tracked via a device wide atomic private object. That is actually not robust because we can't afford to enforce strict global ordering between the pipes. Thus I think I'll need to change this to simply chop up the available bandwidth between all the active pipes. Each pipe can then do whatever it wants as long as it doesn't exceed its budget. That scheme will also require that we assume that any number of planes could be active at any time. TODO: make it robust and deal with all the open questions v2: Sleep longer after disabling SAGV v3: Poll for the dclk to get raised (seen it take 250ms!) If the system has 2133MT/s memory then we pointlessly wait one full second :( v4: Use the new pcode interface to get the qgv points rather that using hardcoded numbers v5: Move the pcode stuff into intel_bw.c (Matt) s/intel_sagv_info/intel_qgv_info/ Do the NV12/P010 as per spec for now (Matt) s/IS_ICELAKE/IS_GEN11/ v6: Ignore bandwidth limits if the pcode query fails Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Acked-by: Clint Taylor <Clinton.A.Taylor@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190524153614.32410-1-ville.syrjala@linux.intel.com
2019-05-24 18:36:14 +03:00
struct intel_bw_info {
/* for each QGV point */
unsigned int deratedbw[I915_NUM_QGV_POINTS];
drm/i915: Implement PSF GV point support PSF GV points are an additional factor that can limit the bandwidth available to display, separate from the traditional QGV points. Whereas traditional QGV points represent possible memory clock frequencies, PSF GV points reflect possible frequencies of the memory fabric. Switching between PSF GV points has the advantage of incurring almost no memory access block time and thus does not need to be accounted for in watermark calculations. This patch adds support for those on top of regular QGV points. Those are supposed to be used simultaneously, i.e we are always at some QGV and some PSF GV point, based on the current video mode requirements. Bspec: 64631, 53998 v2: Seems that initial assumption made during ml conversation was wrong, PCode rejects any masks containing points beyond the ones returned, so even though BSpec says we have around 8 points theoretically, we can mask/unmask only those which are returned, trying to manipulate those beyond causes a failure from PCode. So switched back to generating mask from 1 - num_qgv_points, where num_qgv_points is the actual amount of points, advertised by PCode. v3: - Extended restricted qgv point mask to 0xf, as we have now 3:2 bits for PSF GV points(Matt Roper) - Replaced val2 with NULL from PCode request, since its not being used(Matt Roper) - Replaced %d to 0x%x for better readability(thanks for spotting) Signed-off-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210531064845.4389-2-stanislav.lisovskiy@intel.com
2021-05-31 09:48:45 +03:00
/* for each PSF GV point */
unsigned int psf_bw[I915_NUM_PSF_GV_POINTS];
u8 num_qgv_points;
drm/i915: Implement PSF GV point support PSF GV points are an additional factor that can limit the bandwidth available to display, separate from the traditional QGV points. Whereas traditional QGV points represent possible memory clock frequencies, PSF GV points reflect possible frequencies of the memory fabric. Switching between PSF GV points has the advantage of incurring almost no memory access block time and thus does not need to be accounted for in watermark calculations. This patch adds support for those on top of regular QGV points. Those are supposed to be used simultaneously, i.e we are always at some QGV and some PSF GV point, based on the current video mode requirements. Bspec: 64631, 53998 v2: Seems that initial assumption made during ml conversation was wrong, PCode rejects any masks containing points beyond the ones returned, so even though BSpec says we have around 8 points theoretically, we can mask/unmask only those which are returned, trying to manipulate those beyond causes a failure from PCode. So switched back to generating mask from 1 - num_qgv_points, where num_qgv_points is the actual amount of points, advertised by PCode. v3: - Extended restricted qgv point mask to 0xf, as we have now 3:2 bits for PSF GV points(Matt Roper) - Replaced val2 with NULL from PCode request, since its not being used(Matt Roper) - Replaced %d to 0x%x for better readability(thanks for spotting) Signed-off-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210531064845.4389-2-stanislav.lisovskiy@intel.com
2021-05-31 09:48:45 +03:00
u8 num_psf_gv_points;
u8 num_planes;
drm/i915: Make sure we have enough memory bandwidth on ICL ICL has so many planes that it can easily exceed the maximum effective memory bandwidth of the system. We must therefore check that we don't exceed that limit. The algorithm is very magic number heavy and lacks sufficient explanation for now. We also have no sane way to query the memory clock and timings, so we must rely on a combination of raw readout from the memory controller and hardcoded assumptions. The memory controller values obviously change as the system jumps between the different SAGV points, so we try to stabilize it first by disabling SAGV for the duration of the readout. The utilized bandwidth is tracked via a device wide atomic private object. That is actually not robust because we can't afford to enforce strict global ordering between the pipes. Thus I think I'll need to change this to simply chop up the available bandwidth between all the active pipes. Each pipe can then do whatever it wants as long as it doesn't exceed its budget. That scheme will also require that we assume that any number of planes could be active at any time. TODO: make it robust and deal with all the open questions v2: Sleep longer after disabling SAGV v3: Poll for the dclk to get raised (seen it take 250ms!) If the system has 2133MT/s memory then we pointlessly wait one full second :( v4: Use the new pcode interface to get the qgv points rather that using hardcoded numbers v5: Move the pcode stuff into intel_bw.c (Matt) s/intel_sagv_info/intel_qgv_info/ Do the NV12/P010 as per spec for now (Matt) s/IS_ICELAKE/IS_GEN11/ v6: Ignore bandwidth limits if the pcode query fails Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Acked-by: Clint Taylor <Clinton.A.Taylor@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190524153614.32410-1-ville.syrjala@linux.intel.com
2019-05-24 18:36:14 +03:00
} max_bw[6];
struct intel_global_obj bw_obj;
drm/i915: Make sure we have enough memory bandwidth on ICL ICL has so many planes that it can easily exceed the maximum effective memory bandwidth of the system. We must therefore check that we don't exceed that limit. The algorithm is very magic number heavy and lacks sufficient explanation for now. We also have no sane way to query the memory clock and timings, so we must rely on a combination of raw readout from the memory controller and hardcoded assumptions. The memory controller values obviously change as the system jumps between the different SAGV points, so we try to stabilize it first by disabling SAGV for the duration of the readout. The utilized bandwidth is tracked via a device wide atomic private object. That is actually not robust because we can't afford to enforce strict global ordering between the pipes. Thus I think I'll need to change this to simply chop up the available bandwidth between all the active pipes. Each pipe can then do whatever it wants as long as it doesn't exceed its budget. That scheme will also require that we assume that any number of planes could be active at any time. TODO: make it robust and deal with all the open questions v2: Sleep longer after disabling SAGV v3: Poll for the dclk to get raised (seen it take 250ms!) If the system has 2133MT/s memory then we pointlessly wait one full second :( v4: Use the new pcode interface to get the qgv points rather that using hardcoded numbers v5: Move the pcode stuff into intel_bw.c (Matt) s/intel_sagv_info/intel_qgv_info/ Do the NV12/P010 as per spec for now (Matt) s/IS_ICELAKE/IS_GEN11/ v6: Ignore bandwidth limits if the pcode query fails Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Acked-by: Clint Taylor <Clinton.A.Taylor@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190524153614.32410-1-ville.syrjala@linux.intel.com
2019-05-24 18:36:14 +03:00
struct intel_runtime_pm runtime_pm;
struct i915_perf perf;
drm/i915: Add i915 perf infrastructure Adds base i915 perf infrastructure for Gen performance metrics. This adds a DRM_IOCTL_I915_PERF_OPEN ioctl that takes an array of uint64 properties to configure a stream of metrics and returns a new fd usable with standard VFS system calls including read() to read typed and sized records; ioctl() to enable or disable capture and poll() to wait for data. A stream is opened something like: uint64_t properties[] = { /* Single context sampling */ DRM_I915_PERF_PROP_CTX_HANDLE, ctx_handle, /* Include OA reports in samples */ DRM_I915_PERF_PROP_SAMPLE_OA, true, /* OA unit configuration */ DRM_I915_PERF_PROP_OA_METRICS_SET, metrics_set_id, DRM_I915_PERF_PROP_OA_FORMAT, report_format, DRM_I915_PERF_PROP_OA_EXPONENT, period_exponent, }; struct drm_i915_perf_open_param parm = { .flags = I915_PERF_FLAG_FD_CLOEXEC | I915_PERF_FLAG_FD_NONBLOCK | I915_PERF_FLAG_DISABLED, .properties_ptr = (uint64_t)properties, .num_properties = sizeof(properties) / 16, }; int fd = drmIoctl(drm_fd, DRM_IOCTL_I915_PERF_OPEN, &param); Records read all start with a common { type, size } header with DRM_I915_PERF_RECORD_SAMPLE being of most interest. Sample records contain an extensible number of fields and it's the DRM_I915_PERF_PROP_SAMPLE_xyz properties given when opening that determine what's included in every sample. No specific streams are supported yet so any attempt to open a stream will return an error. v2: use i915_gem_context_get() - Chris Wilson v3: update read() interface to avoid passing state struct - Chris Wilson fix some rebase fallout, with i915-perf init/deinit v4: s/DRM_IORW/DRM_IOW/ - Emil Velikov Signed-off-by: Robert Bragg <robert@sixbynine.org> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Sourab Gupta <sourab.gupta@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/20161107194957.3385-2-robert@sixbynine.org
2016-11-07 19:49:47 +00:00
/* Abstract the submission mechanism (legacy ringbuffer or execlists) away */
struct intel_gt gt;
struct {
struct i915_gem_contexts {
spinlock_t lock; /* locks list */
struct list_head list;
} contexts;
/*
* We replace the local file with a global mappings as the
* backing storage for the mmap is on the device and not
* on the struct file, and we do not want to prolong the
* lifetime of the local fd. To minimise the number of
* anonymous inodes we create, we use a global singleton to
* share the global mapping.
*/
struct file *mmap_singleton;
} gem;
u8 framestart_delay;
/* Window2 specifies time required to program DSB (Window2) in number of scan lines */
u8 window2_delay;
u8 pch_ssc_use;
/* For i915gm/i945gm vblank irq workaround */
u8 vblank_enabled;
bool irq_enabled;
/* perform PHY state sanity checks? */
bool chv_phy_assert[2];
bool ipc_enabled;
struct intel_audio_private audio;
drm/i915/pmu: Expose a PMU interface for perf queries From: Chris Wilson <chris@chris-wilson.co.uk> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> From: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com> The first goal is to be able to measure GPU (and invidual ring) busyness without having to poll registers from userspace. (Which not only incurs holding the forcewake lock indefinitely, perturbing the system, but also runs the risk of hanging the machine.) As an alternative we can use the perf event counter interface to sample the ring registers periodically and send those results to userspace. Functionality we are exporting to userspace is via the existing perf PMU API and can be exercised via the existing tools. For example: perf stat -a -e i915/rcs0-busy/ -I 1000 Will print the render engine busynnes once per second. All the performance counters can be enumerated (perf list) and have their unit of measure correctly reported in sysfs. v1-v2 (Chris Wilson): v2: Use a common timer for the ring sampling. v3: (Tvrtko Ursulin) * Decouple uAPI from i915 engine ids. * Complete uAPI defines. * Refactor some code to helpers for clarity. * Skip sampling disabled engines. * Expose counters in sysfs. * Pass in fake regs to avoid null ptr deref in perf core. * Convert to class/instance uAPI. * Use shared driver code for rc6 residency, power and frequency. v4: (Dmitry Rogozhkin) * Register PMU with .task_ctx_nr=perf_invalid_context * Expose cpumask for the PMU with the single CPU in the mask * Properly support pmu->stop(): it should call pmu->read() * Properly support pmu->del(): it should call stop(event, PERF_EF_UPDATE) * Introduce refcounting of event subscriptions. * Make pmu.busy_stats a refcounter to avoid busy stats going away with some deleted event. * Expose cpumask for i915 PMU to avoid multiple events creation of the same type followed by counter aggregation by perf-stat. * Track CPUs getting online/offline to migrate perf context. If (likely) cpumask will initially set CPU0, CONFIG_BOOTPARAM_HOTPLUG_CPU0 will be needed to see effect of CPU status tracking. * End result is that only global events are supported and perf stat works correctly. * Deny perf driver level sampling - it is prohibited for uncore PMU. v5: (Tvrtko Ursulin) * Don't hardcode number of engine samplers. * Rewrite event ref-counting for correctness and simplicity. * Store initial counter value when starting already enabled events to correctly report values to all listeners. * Fix RC6 residency readout. * Comments, GPL header. v6: * Add missing entry to v4 changelog. * Fix accounting in CPU hotplug case by copying the approach from arch/x86/events/intel/cstate.c. (Dmitry Rogozhkin) v7: * Log failure message only on failure. * Remove CPU hotplug notification state on unregister. v8: * Fix error unwind on failed registration. * Checkpatch cleanup. v9: * Drop the energy metric, it is available via intel_rapl_perf. (Ville Syrjälä) * Use HAS_RC6(p). (Chris Wilson) * Handle unsupported non-engine events. (Dmitry Rogozhkin) * Rebase for intel_rc6_residency_ns needing caller managed runtime pm. * Drop HAS_RC6 checks from the read callback since creating those events will be rejected at init time already. * Add counter units to sysfs so perf stat output is nicer. * Cleanup the attribute tables for brevity and readability. v10: * Fixed queued accounting. v11: * Move intel_engine_lookup_user to intel_engine_cs.c * Commit update. (Joonas Lahtinen) v12: * More accurate sampling. (Chris Wilson) * Store and report frequency in MHz for better usability from perf stat. * Removed metrics: queued, interrupts, rc6 counters. * Sample engine busyness based on seqno difference only for less MMIO (and forcewake) on all platforms. (Chris Wilson) v13: * Comment spelling, use mul_u32_u32 to work around potential GCC issue and somne code alignment changes. (Chris Wilson) v14: * Rebase. v15: * Rebase for RPS refactoring. v16: * Use the dynamic slot in the CPU hotplug state machine so that we are free to setup our state as multi-instance. Previously we were re-using the CPUHP_AP_PERF_X86_UNCORE_ONLINE slot which is neither used as multi-instance, nor owned by our driver to start with. * Register the CPU hotplug handlers after the PMU, otherwise the callback will get called before the PMU is initialized which can end up in perf_pmu_migrate_context with an un-initialized base. * Added workaround for a probable bug in cpuhp core. v17: * Remove workaround for the cpuhp bug. v18: * Rebase for drm_i915_gem_engine_class getting upstream before us. v19: * Rebase. (trivial) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171121181852.16128-2-tvrtko.ursulin@linux.intel.com
2017-11-21 18:18:45 +00:00
struct i915_pmu pmu;
struct i915_hdcp_comp_master *hdcp_master;
bool hdcp_comp_added;
/* Mutex to protect the above hdcp component related values. */
struct mutex hdcp_comp_mutex;
/* The TTM device structure. */
struct ttm_device bdev;
I915_SELFTEST_DECLARE(struct i915_selftest_stash selftest;)
/*
* NOTE: This is the dri1/ums dungeon, don't add stuff here. Your patch
* will be rejected. Instead look for a better place.
*/
};
static inline struct drm_i915_private *to_i915(const struct drm_device *dev)
{
return container_of(dev, struct drm_i915_private, drm);
}
static inline struct drm_i915_private *kdev_to_i915(struct device *kdev)
{
return dev_get_drvdata(kdev);
}
static inline struct drm_i915_private *pdev_to_i915(struct pci_dev *pdev)
{
return pci_get_drvdata(pdev);
}
/* Simple iterator over all initialised engines */
drm/i915: Allocate intel_engine_cs structure only for the enabled engines With the possibility of addition of many more number of rings in future, the drm_i915_private structure could bloat as an array, of type intel_engine_cs, is embedded inside it. struct intel_engine_cs engine[I915_NUM_ENGINES]; Though this is still fine as generally there is only a single instance of drm_i915_private structure used, but not all of the possible rings would be enabled or active on most of the platforms. Some memory can be saved by allocating intel_engine_cs structure only for the enabled/active engines. Currently the engine/ring ID is kept static and dev_priv->engine[] is simply indexed using the enums defined in intel_engine_id. To save memory and continue using the static engine/ring IDs, 'engine' is defined as an array of pointers. struct intel_engine_cs *engine[I915_NUM_ENGINES]; dev_priv->engine[engine_ID] will be NULL for disabled engine instances. There is a text size reduction of 928 bytes, from 1028200 to 1027272, for i915.o file (but for i915.ko file text size remain same as 1193131 bytes). v2: - Remove the engine iterator field added in drm_i915_private structure, instead pass a local iterator variable to the for_each_engine** macros. (Chris) - Do away with intel_engine_initialized() and instead directly use the NULL pointer check on engine pointer. (Chris) v3: - Remove for_each_engine_id() macro, as the updated macro for_each_engine() can be used in place of it. (Chris) - Protect the access to Render engine Fault register with a NULL check, as engine specific init is done later in Driver load sequence. v4: - Use !!dev_priv->engine[VCS] style for the engine check in getparam. (Chris) - Kill the superfluous init_engine_lists(). v5: - Cleanup the intel_engines_init() & intel_engines_setup(), with respect to allocation of intel_engine_cs structure. (Chris) v6: - Rebase. v7: - Optimize the for_each_engine_masked() macro. (Chris) - Change the type of 'iter' local variable to enum intel_engine_id. (Chris) - Rebase. v8: Rebase. v9: Rebase. v10: - For index calculation use engine ID instead of pointer based arithmetic in intel_engine_sync_index() as engine pointers are not contiguous now (Chris) - For appropriateness, rename local enum variable 'iter' to 'id'. (Joonas) - Use for_each_engine macro for cleanup in intel_engines_init() and remove check for NULL engine pointer in cleanup() routines. (Joonas) v11: Rebase. Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Akash Goel <akash.goel@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1476378888-7372-1-git-send-email-akash.goel@intel.com
2016-10-13 22:44:48 +05:30
#define for_each_engine(engine__, dev_priv__, id__) \
for ((id__) = 0; \
(id__) < I915_NUM_ENGINES; \
(id__)++) \
for_each_if ((engine__) = (dev_priv__)->engine[(id__)])
/* Iterator over subset of engines selected by mask */
#define for_each_engine_masked(engine__, gt__, mask__, tmp__) \
for ((tmp__) = (mask__) & (gt__)->info.engine_mask; \
(tmp__) ? \
((engine__) = (gt__)->engine[__mask_next_bit(tmp__)]), 1 : \
0;)
#define rb_to_uabi_engine(rb) \
rb_entry_safe(rb, struct intel_engine_cs, uabi_node)
#define for_each_uabi_engine(engine__, i915__) \
for ((engine__) = rb_to_uabi_engine(rb_first(&(i915__)->uabi_engines));\
(engine__); \
(engine__) = rb_to_uabi_engine(rb_next(&(engine__)->uabi_node)))
#define for_each_uabi_class_engine(engine__, class__, i915__) \
for ((engine__) = intel_engine_lookup_user((i915__), (class__), 0); \
(engine__) && (engine__)->uabi_class == (class__); \
(engine__) = rb_to_uabi_engine(rb_next(&(engine__)->uabi_node)))
#define I915_GTT_OFFSET_NONE ((u32)-1)
drm/i915: Introduce accurate frontbuffer tracking So from just a quick look we seem to have enough information to accurately figure out whether a given gem bo is used as a frontbuffer and where exactly: We have obj->pin_count as a first check with no false negatives and only negligible false positives. And then we can just walk the modeset objects and figure out where exactly a buffer is used as scanout. Except that we can't due to locking order: If we already hold dev->struct_mutex we can't acquire any modeset locks, so could potential chase freed pointers and other evil stuff. So we need something else. For that introduce a new set of bits obj->frontbuffer_bits to track where a buffer object is used. That we can then chase without grabbing any modeset locks. Of course the consumers of this (DRRS, PSR, FBC, ...) still need to be able to do their magic both when called from modeset and from gem code. But that can be easily achieved by adding locks for these specific subsystems which always nest within either kms or gem locking. This patch just adds the relevant update code to all places. Note that if we ever support multi-planar scanout targets then we need one frontbuffer tracking bit per attachment point that we expose to userspace. v2: - Fix more oopsen. Oops. - WARN if we leak obj->frontbuffer_bits when freeing a gem buffer. Fix the bugs this brought to light. - s/update_frontbuffer_bits/update_fb_bits/. More consistent with the fb tracking functions (fb for gem object, frontbuffer for raw bits). And the function name was way too long. v3: Size obj->frontbuffer_bits correctly so that all pipes fit in. v4: Don't update fb bits in set_base on failure. Noticed by Chris. v5: s/i915_gem_update_fb_bits/i915_gem_track_fb/ Also remove a few local enum pipe variables which are now no longer needed to make the function arguments no drop over the 80 char limit. Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-06-18 23:28:09 +02:00
/*
* Frontbuffer tracking bits. Set in obj->frontbuffer_bits while a gem bo is
* considered to be the frontbuffer for the given plane interface-wise. This
drm/i915: Introduce accurate frontbuffer tracking So from just a quick look we seem to have enough information to accurately figure out whether a given gem bo is used as a frontbuffer and where exactly: We have obj->pin_count as a first check with no false negatives and only negligible false positives. And then we can just walk the modeset objects and figure out where exactly a buffer is used as scanout. Except that we can't due to locking order: If we already hold dev->struct_mutex we can't acquire any modeset locks, so could potential chase freed pointers and other evil stuff. So we need something else. For that introduce a new set of bits obj->frontbuffer_bits to track where a buffer object is used. That we can then chase without grabbing any modeset locks. Of course the consumers of this (DRRS, PSR, FBC, ...) still need to be able to do their magic both when called from modeset and from gem code. But that can be easily achieved by adding locks for these specific subsystems which always nest within either kms or gem locking. This patch just adds the relevant update code to all places. Note that if we ever support multi-planar scanout targets then we need one frontbuffer tracking bit per attachment point that we expose to userspace. v2: - Fix more oopsen. Oops. - WARN if we leak obj->frontbuffer_bits when freeing a gem buffer. Fix the bugs this brought to light. - s/update_frontbuffer_bits/update_fb_bits/. More consistent with the fb tracking functions (fb for gem object, frontbuffer for raw bits). And the function name was way too long. v3: Size obj->frontbuffer_bits correctly so that all pipes fit in. v4: Don't update fb bits in set_base on failure. Noticed by Chris. v5: s/i915_gem_update_fb_bits/i915_gem_track_fb/ Also remove a few local enum pipe variables which are now no longer needed to make the function arguments no drop over the 80 char limit. Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-06-18 23:28:09 +02:00
* doesn't mean that the hw necessarily already scans it out, but that any
* rendering (by the cpu or gpu) will land in the frontbuffer eventually.
*
* We have one bit per pipe and per scanout plane type.
*/
#define INTEL_FRONTBUFFER_BITS_PER_PIPE 8
#define INTEL_FRONTBUFFER(pipe, plane_id) ({ \
BUILD_BUG_ON(INTEL_FRONTBUFFER_BITS_PER_PIPE * I915_MAX_PIPES > 32); \
BUILD_BUG_ON(I915_MAX_PLANES > INTEL_FRONTBUFFER_BITS_PER_PIPE); \
BIT((plane_id) + INTEL_FRONTBUFFER_BITS_PER_PIPE * (pipe)); \
})
drm/i915: Introduce accurate frontbuffer tracking So from just a quick look we seem to have enough information to accurately figure out whether a given gem bo is used as a frontbuffer and where exactly: We have obj->pin_count as a first check with no false negatives and only negligible false positives. And then we can just walk the modeset objects and figure out where exactly a buffer is used as scanout. Except that we can't due to locking order: If we already hold dev->struct_mutex we can't acquire any modeset locks, so could potential chase freed pointers and other evil stuff. So we need something else. For that introduce a new set of bits obj->frontbuffer_bits to track where a buffer object is used. That we can then chase without grabbing any modeset locks. Of course the consumers of this (DRRS, PSR, FBC, ...) still need to be able to do their magic both when called from modeset and from gem code. But that can be easily achieved by adding locks for these specific subsystems which always nest within either kms or gem locking. This patch just adds the relevant update code to all places. Note that if we ever support multi-planar scanout targets then we need one frontbuffer tracking bit per attachment point that we expose to userspace. v2: - Fix more oopsen. Oops. - WARN if we leak obj->frontbuffer_bits when freeing a gem buffer. Fix the bugs this brought to light. - s/update_frontbuffer_bits/update_fb_bits/. More consistent with the fb tracking functions (fb for gem object, frontbuffer for raw bits). And the function name was way too long. v3: Size obj->frontbuffer_bits correctly so that all pipes fit in. v4: Don't update fb bits in set_base on failure. Noticed by Chris. v5: s/i915_gem_update_fb_bits/i915_gem_track_fb/ Also remove a few local enum pipe variables which are now no longer needed to make the function arguments no drop over the 80 char limit. Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-06-18 23:28:09 +02:00
#define INTEL_FRONTBUFFER_OVERLAY(pipe) \
BIT(INTEL_FRONTBUFFER_BITS_PER_PIPE - 1 + INTEL_FRONTBUFFER_BITS_PER_PIPE * (pipe))
#define INTEL_FRONTBUFFER_ALL_MASK(pipe) \
GENMASK(INTEL_FRONTBUFFER_BITS_PER_PIPE * ((pipe) + 1) - 1, \
INTEL_FRONTBUFFER_BITS_PER_PIPE * (pipe))
drm/i915: Introduce accurate frontbuffer tracking So from just a quick look we seem to have enough information to accurately figure out whether a given gem bo is used as a frontbuffer and where exactly: We have obj->pin_count as a first check with no false negatives and only negligible false positives. And then we can just walk the modeset objects and figure out where exactly a buffer is used as scanout. Except that we can't due to locking order: If we already hold dev->struct_mutex we can't acquire any modeset locks, so could potential chase freed pointers and other evil stuff. So we need something else. For that introduce a new set of bits obj->frontbuffer_bits to track where a buffer object is used. That we can then chase without grabbing any modeset locks. Of course the consumers of this (DRRS, PSR, FBC, ...) still need to be able to do their magic both when called from modeset and from gem code. But that can be easily achieved by adding locks for these specific subsystems which always nest within either kms or gem locking. This patch just adds the relevant update code to all places. Note that if we ever support multi-planar scanout targets then we need one frontbuffer tracking bit per attachment point that we expose to userspace. v2: - Fix more oopsen. Oops. - WARN if we leak obj->frontbuffer_bits when freeing a gem buffer. Fix the bugs this brought to light. - s/update_frontbuffer_bits/update_fb_bits/. More consistent with the fb tracking functions (fb for gem object, frontbuffer for raw bits). And the function name was way too long. v3: Size obj->frontbuffer_bits correctly so that all pipes fit in. v4: Don't update fb bits in set_base on failure. Noticed by Chris. v5: s/i915_gem_update_fb_bits/i915_gem_track_fb/ Also remove a few local enum pipe variables which are now no longer needed to make the function arguments no drop over the 80 char limit. Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-06-18 23:28:09 +02:00
#define INTEL_INFO(dev_priv) (&(dev_priv)->__info)
#define RUNTIME_INFO(dev_priv) (&(dev_priv)->__runtime)
#define DRIVER_CAPS(dev_priv) (&(dev_priv)->caps)
#define INTEL_DEVID(dev_priv) (RUNTIME_INFO(dev_priv)->device_id)
drm/i915: Add release id version Besides the arch version returned by GRAPHICS_VER(), new platforms contain a "release id" to make clear the difference from one platform to another. The release id number is not formally defined by hardware until future platforms that will expose it via a new GMD_ID register. For the platforms we support before that register becomes available we will set the values in software and we can set them as we please. So the plan is to set them so we can group different features under a single GRAPHICS_VER_FULL() check. After GMD_ID is used, the usefulness of a "full version check" will be greatly reduced and will be mostly used for deciding workarounds and a few code paths. So it makes sense to keep it as a separate field from graphics_ver. Also, as a platform with `release == n` may be closer feature-wise to `n - 2` than to `n - 1`, use the word "release" rather than the more common "minor" for this This is a mix of 2 independent changes: one by me and the other by Matt Roper. v2: - Reword commit message to make it clearer why we don't call it "minor" (Matt Roper and Tvrtko) - Rename variables s/*_ver_release/*_rel/ and print them in a single line formatted as {ver}.{rel:2} (Jani and Matt Roper) Cc: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210707235921.2416911-2-lucas.demarchi@intel.com
2021-07-07 16:59:21 -07:00
#define IP_VER(ver, rel) ((ver) << 8 | (rel))
drm/i915: Add release id version Besides the arch version returned by GRAPHICS_VER(), new platforms contain a "release id" to make clear the difference from one platform to another. The release id number is not formally defined by hardware until future platforms that will expose it via a new GMD_ID register. For the platforms we support before that register becomes available we will set the values in software and we can set them as we please. So the plan is to set them so we can group different features under a single GRAPHICS_VER_FULL() check. After GMD_ID is used, the usefulness of a "full version check" will be greatly reduced and will be mostly used for deciding workarounds and a few code paths. So it makes sense to keep it as a separate field from graphics_ver. Also, as a platform with `release == n` may be closer feature-wise to `n - 2` than to `n - 1`, use the word "release" rather than the more common "minor" for this This is a mix of 2 independent changes: one by me and the other by Matt Roper. v2: - Reword commit message to make it clearer why we don't call it "minor" (Matt Roper and Tvrtko) - Rename variables s/*_ver_release/*_rel/ and print them in a single line formatted as {ver}.{rel:2} (Jani and Matt Roper) Cc: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210707235921.2416911-2-lucas.demarchi@intel.com (cherry picked from commit ca6374e267e2735fe382fe95de2a8a9c30c6bdb3) Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2021-07-07 16:59:21 -07:00
#define GRAPHICS_VER(i915) (INTEL_INFO(i915)->graphics_ver)
drm/i915: Add release id version Besides the arch version returned by GRAPHICS_VER(), new platforms contain a "release id" to make clear the difference from one platform to another. The release id number is not formally defined by hardware until future platforms that will expose it via a new GMD_ID register. For the platforms we support before that register becomes available we will set the values in software and we can set them as we please. So the plan is to set them so we can group different features under a single GRAPHICS_VER_FULL() check. After GMD_ID is used, the usefulness of a "full version check" will be greatly reduced and will be mostly used for deciding workarounds and a few code paths. So it makes sense to keep it as a separate field from graphics_ver. Also, as a platform with `release == n` may be closer feature-wise to `n - 2` than to `n - 1`, use the word "release" rather than the more common "minor" for this This is a mix of 2 independent changes: one by me and the other by Matt Roper. v2: - Reword commit message to make it clearer why we don't call it "minor" (Matt Roper and Tvrtko) - Rename variables s/*_ver_release/*_rel/ and print them in a single line formatted as {ver}.{rel:2} (Jani and Matt Roper) Cc: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210707235921.2416911-2-lucas.demarchi@intel.com (cherry picked from commit ca6374e267e2735fe382fe95de2a8a9c30c6bdb3) Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2021-07-07 16:59:21 -07:00
#define GRAPHICS_VER_FULL(i915) IP_VER(INTEL_INFO(i915)->graphics_ver, \
INTEL_INFO(i915)->graphics_rel)
#define IS_GRAPHICS_VER(i915, from, until) \
(GRAPHICS_VER(i915) >= (from) && GRAPHICS_VER(i915) <= (until))
#define MEDIA_VER(i915) (INTEL_INFO(i915)->media_ver)
drm/i915: Add release id version Besides the arch version returned by GRAPHICS_VER(), new platforms contain a "release id" to make clear the difference from one platform to another. The release id number is not formally defined by hardware until future platforms that will expose it via a new GMD_ID register. For the platforms we support before that register becomes available we will set the values in software and we can set them as we please. So the plan is to set them so we can group different features under a single GRAPHICS_VER_FULL() check. After GMD_ID is used, the usefulness of a "full version check" will be greatly reduced and will be mostly used for deciding workarounds and a few code paths. So it makes sense to keep it as a separate field from graphics_ver. Also, as a platform with `release == n` may be closer feature-wise to `n - 2` than to `n - 1`, use the word "release" rather than the more common "minor" for this This is a mix of 2 independent changes: one by me and the other by Matt Roper. v2: - Reword commit message to make it clearer why we don't call it "minor" (Matt Roper and Tvrtko) - Rename variables s/*_ver_release/*_rel/ and print them in a single line formatted as {ver}.{rel:2} (Jani and Matt Roper) Cc: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210707235921.2416911-2-lucas.demarchi@intel.com (cherry picked from commit ca6374e267e2735fe382fe95de2a8a9c30c6bdb3) Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2021-07-07 16:59:21 -07:00
#define MEDIA_VER_FULL(i915) IP_VER(INTEL_INFO(i915)->media_ver, \
INTEL_INFO(i915)->media_rel)
#define IS_MEDIA_VER(i915, from, until) \
(MEDIA_VER(i915) >= (from) && MEDIA_VER(i915) <= (until))
#define DISPLAY_VER(i915) (INTEL_INFO(i915)->display.ver)
drm/i915/display: rename display version macros While converting the rest of the driver to use GRAPHICS_VER() and MEDIA_VER(), following what was done for display, some discussions went back on what we did for display: 1) Why is the == comparison special that deserves a separate macro instead of just getting the version and comparing directly like is done for >, >=, <=? 2) IS_DISPLAY_RANGE() is weird in that it omits the "_VER" for brevity. If we remove the current users of IS_DISPLAY_VER(), we could actually repurpose it for a range check With (1) there could be an advantage if we used gen_mask since multiple conditionals be combined by the compiler in a single and instruction and check the result. However a) INTEL_GEN() doesn't use the mask since it would make the code bigger everywhere else and b) in the cases it made sense, it also made sense to convert to the _RANGE() variant. So here we repurpose IS_DISPLAY_VER() to work with a [ from, to ] range like was the IS_DISPLAY_RANGE() and convert the current IS_DISPLAY_VER() users to use == and != operators. Aside from the definition changes, this was done by the following semantic patch: @@ expression dev_priv, E1; @@ - !IS_DISPLAY_VER(dev_priv, E1) + DISPLAY_VER(dev_priv) != E1 @@ expression dev_priv, E1; @@ - IS_DISPLAY_VER(dev_priv, E1) + DISPLAY_VER(dev_priv) == E1 @@ expression dev_priv, from, until; @@ - IS_DISPLAY_RANGE(dev_priv, from, until) + IS_DISPLAY_VER(dev_priv, from, until) Cc: Jani Nikula <jani.nikula@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> [Jani: Minor conflict resolve while applying.] Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20210413051002.92589-4-lucas.demarchi@intel.com
2021-04-12 22:09:53 -07:00
#define IS_DISPLAY_VER(i915, from, until) \
drm/i915: Add DISPLAY_VER() and related macros Although we've long referred to platforms by a single "GEN" number, the hardware teams have recommended that we stop doing this since the various component IP blocks are going to start using independent number schemes with varying cadence. To support this, hardware platforms a bit down the road are going to start providing MMIO registers that the driver can read to obtain the "graphics version," "media version," and "display version" without needing to do a PCI ID -> platform -> version translation. Although our current platforms don't yet expose these registers (and the next couple we release probably won't have them yet either), the hardware teams would still like to see us move to this independent numbering scheme now in preparation. For i915 that means we should try to eliminate all usage of INTEL_GEN() throughout our code and instead replace it with separate GRAPHICS_VER(), MEDIA_VER(), and DISPLAY_VER() constructs in the code. For old platforms, these will all usually give the same value for each IP block (aside from a few special cases like GLK which we can no more accurately represent as graphics=9 + display=10), but future platforms will have more flexibility to bump IP version numbers independently. The upcoming ADL-P platform will have a display version of 13 and a graphics version of 12, so let's just the first step of breaking out DISPLAY_VER(), but leaving the rest of INTEL_GEN() untouched for now. For now we'll automatically derive the display version from the platform's INTEL_GEN() value except in cases where an alternative display version is explicitly provided in the device info structure. We also add some helper macros IS_DISPLAY_VER(i915, ver) and IS_DISPLAY_RANGE(i915, from, until) that match the behavior of the existing gen-based macros. However unlike IS_GEN(), we will implement those macros with direct comparisons rather than trying to maintain a mask to help compiler optimization. In practice the optimization winds up not being used in very many places (since the vast majority of our platform checks are of the form "gen >= x") so there is pretty minimal size reduction in the final driver binary[1]. We're also likely going to need to extend these version numbers to non-integer major.minor values at some point in the future, so the mask approach won't work at all once we get to platforms like that. [1] The results before/after the next patch in this series, which switches our code over to the new display macros: $ size i915.ko.{orig,new} text data bss dec hex filename 2940291 102944 5384 3048619 2e84ab i915.ko.orig 2940723 102956 5384 3049063 2e8667 i915.ko.new v2: - Move version into device info's display sub-struct. (Jani) - Add extra parentheses to macros. (Jani) - Note the lack of genmask optimization in the display-based macros and give size data. (Lucas) Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210320044245.3920043-3-matthew.d.roper@intel.com
2021-03-19 21:42:41 -07:00
(DISPLAY_VER(i915) >= (from) && DISPLAY_VER(i915) <= (until))
#define INTEL_REVID(dev_priv) (to_pci_dev((dev_priv)->drm.dev)->revision)
#define HAS_DSB(dev_priv) (INTEL_INFO(dev_priv)->display.has_dsb)
#define INTEL_DISPLAY_STEP(__i915) (RUNTIME_INFO(__i915)->step.display_step)
#define INTEL_GT_STEP(__i915) (RUNTIME_INFO(__i915)->step.gt_step)
#define IS_DISPLAY_STEP(__i915, since, until) \
(drm_WARN_ON(&(__i915)->drm, INTEL_DISPLAY_STEP(__i915) == STEP_NONE), \
drm/i915: Make display workaround upper bounds exclusive Workarounds are documented in the bspec with an exclusive upper bound (i.e., a "fixed" stepping that no longer needs the workaround). This makes our driver's use of an inclusive upper bound for stepping ranges confusing; the differing notation between code and bspec makes it very easy for mistakes to creep in. Let's switch the upper bound of our IS_{GT,DISP}_STEP macros over to use an exclusive upper bound like the bspec does. This also has the benefit of helping make sure workarounds are properly handled for new minor steppings that show up (e.g., an A1 between the A0 and B0 we already knew about) --- if the new intermediate stepping pulls in hardware fixes early, there will be an update to the workaround definition which lets us know we need to change our code. If the new stepping does not pull a hardware fix earlier, then the new stepping will already be captured properly by the "[begin, fix)" range in the code. We'll probably need to be extra vigilant in code review of new workarounds for the near future to make sure developers notice the new semantics of workaround bounds. But we just migrated a bunch of our platforms from the IS_REVID bounds over to IS_{GT,DISP}_STEP, so people are already adjusting to the new macros and now is a good time to make this change too. [mattrope: Split out display changes to apply through intel-next tree] Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210717051426.4120328-8-matthew.d.roper@intel.com
2021-07-16 22:14:26 -07:00
INTEL_DISPLAY_STEP(__i915) >= (since) && INTEL_DISPLAY_STEP(__i915) < (until))
#define IS_GT_STEP(__i915, since, until) \
(drm_WARN_ON(&(__i915)->drm, INTEL_GT_STEP(__i915) == STEP_NONE), \
drm/i915: Make GT workaround upper bounds exclusive Workarounds are documented in the bspec with an exclusive upper bound (i.e., a "fixed" stepping that no longer needs the workaround). This makes our driver's use of an inclusive upper bound for stepping ranges confusing; the differing notation between code and bspec makes it very easy for mistakes to creep in. Let's switch the upper bound of our IS_{GT,DISP}_STEP macros over to use an exclusive upper bound like the bspec does. This also has the benefit of helping make sure workarounds are properly handled for new minor steppings that show up (e.g., an A1 between the A0 and B0 we already knew about) --- if the new intermediate stepping pulls in hardware fixes early, there will be an update to the workaround definition which lets us know we need to change our code. If the new stepping does not pull a hardware fix earlier, then the new stepping will already be captured properly by the "[begin, fix)" range in the code. We'll probably need to be extra vigilant in code review of new workarounds for the near future to make sure developers notice the new semantics of workaround bounds. But we just migrated a bunch of our platforms from the IS_REVID bounds over to IS_{GT,DISP}_STEP, so people are already adjusting to the new macros and now is a good time to make this change too. [mattrope: Split out GT changes to apply through gt-next tree] Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210717051426.4120328-8-matthew.d.roper@intel.com
2021-07-16 22:14:26 -07:00
INTEL_GT_STEP(__i915) >= (since) && INTEL_GT_STEP(__i915) < (until))
drm/i915: Introduce concept of a sub-platform Concept of a sub-platform already exist in our code (like ULX and ULT platform variants and similar),implemented via the macros which check a list of device ids to determine a match. With this patch we consolidate device ids checking into a single function called during early driver load. A few low bits in the platform mask are reserved for sub-platform identification and defined as a per-platform namespace. At the same time it future proofs the platform_mask handling by preparing the code for easy extending, and tidies the very verbose WARN strings generated when IS_PLATFORM macros are embedded into a WARN type statements. v2: Fixed IS_SUBPLATFORM. Updated commit msg. v3: Chris was right, there is an ordering problem. v4: * Catch-up with new sub-platforms. * Rebase for RUNTIME_INFO. * Drop subplatform mask union tricks and convert platform_mask to an array for extensibility. v5: * Fix subplatform check. * Protect against forgetting to expand subplatform bits. * Remove platform enum tallying. * Add subplatform to error state. (Chris) * Drop macros and just use static inlines. * Remove redundant IRONLAKE_M. (Ville) v6: * Split out Ironlake change. * Optimize subplatform check. * Use __always_inline. (Lucas) * Add platform_mask comment. (Paulo) * Pass stored runtime info in error capture. (Chris) v7: * Rebased for new AML ULX device id. * Bump platform mask array size for EHL. * Stop mentioning device ids in intel_device_subplatform_init by using the trick of splitting macros i915_pciids.h. (Jani) * AML seems to be either a subplatform of KBL or CFL so express it like that. v8: * Use one device id table per subplatform. (Jani) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Jose Souza <jose.souza@intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190327142328.31780-1-tvrtko.ursulin@linux.intel.com
2019-03-27 14:23:28 +00:00
static __always_inline unsigned int
__platform_mask_index(const struct intel_runtime_info *info,
enum intel_platform p)
{
const unsigned int pbits =
BITS_PER_TYPE(info->platform_mask[0]) - INTEL_SUBPLATFORM_BITS;
/* Expand the platform_mask array if this fails. */
BUILD_BUG_ON(INTEL_MAX_PLATFORMS >
pbits * ARRAY_SIZE(info->platform_mask));
return p / pbits;
}
static __always_inline unsigned int
__platform_mask_bit(const struct intel_runtime_info *info,
enum intel_platform p)
{
const unsigned int pbits =
BITS_PER_TYPE(info->platform_mask[0]) - INTEL_SUBPLATFORM_BITS;
return p % pbits + INTEL_SUBPLATFORM_BITS;
}
static inline u32
intel_subplatform(const struct intel_runtime_info *info, enum intel_platform p)
{
const unsigned int pi = __platform_mask_index(info, p);
return info->platform_mask[pi] & INTEL_SUBPLATFORM_MASK;
drm/i915: Introduce concept of a sub-platform Concept of a sub-platform already exist in our code (like ULX and ULT platform variants and similar),implemented via the macros which check a list of device ids to determine a match. With this patch we consolidate device ids checking into a single function called during early driver load. A few low bits in the platform mask are reserved for sub-platform identification and defined as a per-platform namespace. At the same time it future proofs the platform_mask handling by preparing the code for easy extending, and tidies the very verbose WARN strings generated when IS_PLATFORM macros are embedded into a WARN type statements. v2: Fixed IS_SUBPLATFORM. Updated commit msg. v3: Chris was right, there is an ordering problem. v4: * Catch-up with new sub-platforms. * Rebase for RUNTIME_INFO. * Drop subplatform mask union tricks and convert platform_mask to an array for extensibility. v5: * Fix subplatform check. * Protect against forgetting to expand subplatform bits. * Remove platform enum tallying. * Add subplatform to error state. (Chris) * Drop macros and just use static inlines. * Remove redundant IRONLAKE_M. (Ville) v6: * Split out Ironlake change. * Optimize subplatform check. * Use __always_inline. (Lucas) * Add platform_mask comment. (Paulo) * Pass stored runtime info in error capture. (Chris) v7: * Rebased for new AML ULX device id. * Bump platform mask array size for EHL. * Stop mentioning device ids in intel_device_subplatform_init by using the trick of splitting macros i915_pciids.h. (Jani) * AML seems to be either a subplatform of KBL or CFL so express it like that. v8: * Use one device id table per subplatform. (Jani) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Jose Souza <jose.souza@intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190327142328.31780-1-tvrtko.ursulin@linux.intel.com
2019-03-27 14:23:28 +00:00
}
static __always_inline bool
IS_PLATFORM(const struct drm_i915_private *i915, enum intel_platform p)
{
const struct intel_runtime_info *info = RUNTIME_INFO(i915);
const unsigned int pi = __platform_mask_index(info, p);
const unsigned int pb = __platform_mask_bit(info, p);
BUILD_BUG_ON(!__builtin_constant_p(p));
return info->platform_mask[pi] & BIT(pb);
}
static __always_inline bool
IS_SUBPLATFORM(const struct drm_i915_private *i915,
enum intel_platform p, unsigned int s)
{
const struct intel_runtime_info *info = RUNTIME_INFO(i915);
const unsigned int pi = __platform_mask_index(info, p);
const unsigned int pb = __platform_mask_bit(info, p);
const unsigned int msb = BITS_PER_TYPE(info->platform_mask[0]) - 1;
const u32 mask = info->platform_mask[pi];
BUILD_BUG_ON(!__builtin_constant_p(p));
BUILD_BUG_ON(!__builtin_constant_p(s));
BUILD_BUG_ON((s) >= INTEL_SUBPLATFORM_BITS);
/* Shift and test on the MSB position so sign flag can be used. */
return ((mask << (msb - pb)) & (mask << (msb - s))) & BIT(msb);
}
#define IS_MOBILE(dev_priv) (INTEL_INFO(dev_priv)->is_mobile)
#define IS_DGFX(dev_priv) (INTEL_INFO(dev_priv)->is_dgfx)
#define IS_I830(dev_priv) IS_PLATFORM(dev_priv, INTEL_I830)
#define IS_I845G(dev_priv) IS_PLATFORM(dev_priv, INTEL_I845G)
#define IS_I85X(dev_priv) IS_PLATFORM(dev_priv, INTEL_I85X)
#define IS_I865G(dev_priv) IS_PLATFORM(dev_priv, INTEL_I865G)
#define IS_I915G(dev_priv) IS_PLATFORM(dev_priv, INTEL_I915G)
#define IS_I915GM(dev_priv) IS_PLATFORM(dev_priv, INTEL_I915GM)
#define IS_I945G(dev_priv) IS_PLATFORM(dev_priv, INTEL_I945G)
#define IS_I945GM(dev_priv) IS_PLATFORM(dev_priv, INTEL_I945GM)
#define IS_I965G(dev_priv) IS_PLATFORM(dev_priv, INTEL_I965G)
#define IS_I965GM(dev_priv) IS_PLATFORM(dev_priv, INTEL_I965GM)
#define IS_G45(dev_priv) IS_PLATFORM(dev_priv, INTEL_G45)
#define IS_GM45(dev_priv) IS_PLATFORM(dev_priv, INTEL_GM45)
#define IS_G4X(dev_priv) (IS_G45(dev_priv) || IS_GM45(dev_priv))
#define IS_PINEVIEW(dev_priv) IS_PLATFORM(dev_priv, INTEL_PINEVIEW)
#define IS_G33(dev_priv) IS_PLATFORM(dev_priv, INTEL_G33)
#define IS_IRONLAKE(dev_priv) IS_PLATFORM(dev_priv, INTEL_IRONLAKE)
#define IS_IRONLAKE_M(dev_priv) \
(IS_PLATFORM(dev_priv, INTEL_IRONLAKE) && IS_MOBILE(dev_priv))
#define IS_SANDYBRIDGE(dev_priv) IS_PLATFORM(dev_priv, INTEL_SANDYBRIDGE)
#define IS_IVYBRIDGE(dev_priv) IS_PLATFORM(dev_priv, INTEL_IVYBRIDGE)
#define IS_IVB_GT1(dev_priv) (IS_IVYBRIDGE(dev_priv) && \
INTEL_INFO(dev_priv)->gt == 1)
#define IS_VALLEYVIEW(dev_priv) IS_PLATFORM(dev_priv, INTEL_VALLEYVIEW)
#define IS_CHERRYVIEW(dev_priv) IS_PLATFORM(dev_priv, INTEL_CHERRYVIEW)
#define IS_HASWELL(dev_priv) IS_PLATFORM(dev_priv, INTEL_HASWELL)
#define IS_BROADWELL(dev_priv) IS_PLATFORM(dev_priv, INTEL_BROADWELL)
#define IS_SKYLAKE(dev_priv) IS_PLATFORM(dev_priv, INTEL_SKYLAKE)
#define IS_BROXTON(dev_priv) IS_PLATFORM(dev_priv, INTEL_BROXTON)
#define IS_KABYLAKE(dev_priv) IS_PLATFORM(dev_priv, INTEL_KABYLAKE)
#define IS_GEMINILAKE(dev_priv) IS_PLATFORM(dev_priv, INTEL_GEMINILAKE)
#define IS_COFFEELAKE(dev_priv) IS_PLATFORM(dev_priv, INTEL_COFFEELAKE)
#define IS_COMETLAKE(dev_priv) IS_PLATFORM(dev_priv, INTEL_COMETLAKE)
#define IS_ICELAKE(dev_priv) IS_PLATFORM(dev_priv, INTEL_ICELAKE)
#define IS_JSL_EHL(dev_priv) (IS_PLATFORM(dev_priv, INTEL_JASPERLAKE) || \
IS_PLATFORM(dev_priv, INTEL_ELKHARTLAKE))
#define IS_TIGERLAKE(dev_priv) IS_PLATFORM(dev_priv, INTEL_TIGERLAKE)
#define IS_ROCKETLAKE(dev_priv) IS_PLATFORM(dev_priv, INTEL_ROCKETLAKE)
#define IS_DG1(dev_priv) IS_PLATFORM(dev_priv, INTEL_DG1)
drm/i915/adl_s: Add ADL-S platform info and PCI ids - Add the initial platform information for Alderlake-S. - Specify ppgtt_size value - Add dma_mask_size - Add ADLS REVIDs - HW tracking(Selective Update Tracking Enable) has been removed from ADLS. Disable PSR2 till we enable software/ manual tracking. v2: - Add support for different ADLS SOC steppings to select correct GT/DISP stepping based on Bspec 53655 based on feedback from Matt Roper.(aswarup) v3: - Make display/gt steppings info generic for reuse with TGL and ADLS. - Modify the macros to reuse tgl_revids_get() - Add HTI support to adls device info.(mdroper) v4: - Rebase on TGL patch for applying WAs based on stepping info from Matt Roper's feedback.(aswarup) v5: - Replace macros with PCI IDs in revid to stepping table. v6: remove stray adls_revids (Lucas) Bspec: 53597 Bspec: 53648 Bspec: 53655 Bspec: 48028 Bspec: 53650 BSpec: 50422 Cc: José Roberto de Souza <jose.souza@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Anusha Srivatsa <anusha.srivatsa@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Imre Deak <imre.deak@intel.com> Signed-off-by: Caz Yokoyama <caz.yokoyama@intel.com> Signed-off-by: Aditya Swarup <aditya.swarup@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210119192931.1116500-2-lucas.demarchi@intel.com
2021-01-19 11:29:31 -08:00
#define IS_ALDERLAKE_S(dev_priv) IS_PLATFORM(dev_priv, INTEL_ALDERLAKE_S)
#define IS_ALDERLAKE_P(dev_priv) IS_PLATFORM(dev_priv, INTEL_ALDERLAKE_P)
#define IS_XEHPSDV(dev_priv) IS_PLATFORM(dev_priv, INTEL_XEHPSDV)
#define IS_DG2(dev_priv) IS_PLATFORM(dev_priv, INTEL_DG2)
#define IS_DG2_G10(dev_priv) \
IS_SUBPLATFORM(dev_priv, INTEL_DG2, INTEL_SUBPLATFORM_G10)
#define IS_DG2_G11(dev_priv) \
IS_SUBPLATFORM(dev_priv, INTEL_DG2, INTEL_SUBPLATFORM_G11)
#define IS_ADLS_RPLS(dev_priv) \
IS_SUBPLATFORM(dev_priv, INTEL_ALDERLAKE_S, INTEL_SUBPLATFORM_RPL_S)
#define IS_HSW_EARLY_SDV(dev_priv) (IS_HASWELL(dev_priv) && \
(INTEL_DEVID(dev_priv) & 0xFF00) == 0x0C00)
drm/i915: Introduce concept of a sub-platform Concept of a sub-platform already exist in our code (like ULX and ULT platform variants and similar),implemented via the macros which check a list of device ids to determine a match. With this patch we consolidate device ids checking into a single function called during early driver load. A few low bits in the platform mask are reserved for sub-platform identification and defined as a per-platform namespace. At the same time it future proofs the platform_mask handling by preparing the code for easy extending, and tidies the very verbose WARN strings generated when IS_PLATFORM macros are embedded into a WARN type statements. v2: Fixed IS_SUBPLATFORM. Updated commit msg. v3: Chris was right, there is an ordering problem. v4: * Catch-up with new sub-platforms. * Rebase for RUNTIME_INFO. * Drop subplatform mask union tricks and convert platform_mask to an array for extensibility. v5: * Fix subplatform check. * Protect against forgetting to expand subplatform bits. * Remove platform enum tallying. * Add subplatform to error state. (Chris) * Drop macros and just use static inlines. * Remove redundant IRONLAKE_M. (Ville) v6: * Split out Ironlake change. * Optimize subplatform check. * Use __always_inline. (Lucas) * Add platform_mask comment. (Paulo) * Pass stored runtime info in error capture. (Chris) v7: * Rebased for new AML ULX device id. * Bump platform mask array size for EHL. * Stop mentioning device ids in intel_device_subplatform_init by using the trick of splitting macros i915_pciids.h. (Jani) * AML seems to be either a subplatform of KBL or CFL so express it like that. v8: * Use one device id table per subplatform. (Jani) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Jose Souza <jose.souza@intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190327142328.31780-1-tvrtko.ursulin@linux.intel.com
2019-03-27 14:23:28 +00:00
#define IS_BDW_ULT(dev_priv) \
IS_SUBPLATFORM(dev_priv, INTEL_BROADWELL, INTEL_SUBPLATFORM_ULT)
#define IS_BDW_ULX(dev_priv) \
IS_SUBPLATFORM(dev_priv, INTEL_BROADWELL, INTEL_SUBPLATFORM_ULX)
#define IS_BDW_GT3(dev_priv) (IS_BROADWELL(dev_priv) && \
INTEL_INFO(dev_priv)->gt == 3)
drm/i915: Introduce concept of a sub-platform Concept of a sub-platform already exist in our code (like ULX and ULT platform variants and similar),implemented via the macros which check a list of device ids to determine a match. With this patch we consolidate device ids checking into a single function called during early driver load. A few low bits in the platform mask are reserved for sub-platform identification and defined as a per-platform namespace. At the same time it future proofs the platform_mask handling by preparing the code for easy extending, and tidies the very verbose WARN strings generated when IS_PLATFORM macros are embedded into a WARN type statements. v2: Fixed IS_SUBPLATFORM. Updated commit msg. v3: Chris was right, there is an ordering problem. v4: * Catch-up with new sub-platforms. * Rebase for RUNTIME_INFO. * Drop subplatform mask union tricks and convert platform_mask to an array for extensibility. v5: * Fix subplatform check. * Protect against forgetting to expand subplatform bits. * Remove platform enum tallying. * Add subplatform to error state. (Chris) * Drop macros and just use static inlines. * Remove redundant IRONLAKE_M. (Ville) v6: * Split out Ironlake change. * Optimize subplatform check. * Use __always_inline. (Lucas) * Add platform_mask comment. (Paulo) * Pass stored runtime info in error capture. (Chris) v7: * Rebased for new AML ULX device id. * Bump platform mask array size for EHL. * Stop mentioning device ids in intel_device_subplatform_init by using the trick of splitting macros i915_pciids.h. (Jani) * AML seems to be either a subplatform of KBL or CFL so express it like that. v8: * Use one device id table per subplatform. (Jani) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Jose Souza <jose.souza@intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190327142328.31780-1-tvrtko.ursulin@linux.intel.com
2019-03-27 14:23:28 +00:00
#define IS_HSW_ULT(dev_priv) \
IS_SUBPLATFORM(dev_priv, INTEL_HASWELL, INTEL_SUBPLATFORM_ULT)
#define IS_HSW_GT3(dev_priv) (IS_HASWELL(dev_priv) && \
INTEL_INFO(dev_priv)->gt == 3)
#define IS_HSW_GT1(dev_priv) (IS_HASWELL(dev_priv) && \
INTEL_INFO(dev_priv)->gt == 1)
/* ULX machines are also considered ULT. */
drm/i915: Introduce concept of a sub-platform Concept of a sub-platform already exist in our code (like ULX and ULT platform variants and similar),implemented via the macros which check a list of device ids to determine a match. With this patch we consolidate device ids checking into a single function called during early driver load. A few low bits in the platform mask are reserved for sub-platform identification and defined as a per-platform namespace. At the same time it future proofs the platform_mask handling by preparing the code for easy extending, and tidies the very verbose WARN strings generated when IS_PLATFORM macros are embedded into a WARN type statements. v2: Fixed IS_SUBPLATFORM. Updated commit msg. v3: Chris was right, there is an ordering problem. v4: * Catch-up with new sub-platforms. * Rebase for RUNTIME_INFO. * Drop subplatform mask union tricks and convert platform_mask to an array for extensibility. v5: * Fix subplatform check. * Protect against forgetting to expand subplatform bits. * Remove platform enum tallying. * Add subplatform to error state. (Chris) * Drop macros and just use static inlines. * Remove redundant IRONLAKE_M. (Ville) v6: * Split out Ironlake change. * Optimize subplatform check. * Use __always_inline. (Lucas) * Add platform_mask comment. (Paulo) * Pass stored runtime info in error capture. (Chris) v7: * Rebased for new AML ULX device id. * Bump platform mask array size for EHL. * Stop mentioning device ids in intel_device_subplatform_init by using the trick of splitting macros i915_pciids.h. (Jani) * AML seems to be either a subplatform of KBL or CFL so express it like that. v8: * Use one device id table per subplatform. (Jani) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Jose Souza <jose.souza@intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190327142328.31780-1-tvrtko.ursulin@linux.intel.com
2019-03-27 14:23:28 +00:00
#define IS_HSW_ULX(dev_priv) \
IS_SUBPLATFORM(dev_priv, INTEL_HASWELL, INTEL_SUBPLATFORM_ULX)
#define IS_SKL_ULT(dev_priv) \
IS_SUBPLATFORM(dev_priv, INTEL_SKYLAKE, INTEL_SUBPLATFORM_ULT)
#define IS_SKL_ULX(dev_priv) \
IS_SUBPLATFORM(dev_priv, INTEL_SKYLAKE, INTEL_SUBPLATFORM_ULX)
#define IS_KBL_ULT(dev_priv) \
IS_SUBPLATFORM(dev_priv, INTEL_KABYLAKE, INTEL_SUBPLATFORM_ULT)
#define IS_KBL_ULX(dev_priv) \
IS_SUBPLATFORM(dev_priv, INTEL_KABYLAKE, INTEL_SUBPLATFORM_ULX)
drm/i915/perf: Add OA unit support for Gen 8+ Enables access to OA unit metrics for BDW, CHV, SKL and BXT which all share (more-or-less) the same OA unit design. Of particular note in comparison to Haswell: some OA unit HW config state has become per-context state and as a consequence it is somewhat more complicated to manage synchronous state changes from the cpu while there's no guarantee of what context (if any) is currently actively running on the gpu. The periodic sampling frequency which can be particularly useful for system-wide analysis (as opposed to command stream synchronised MI_REPORT_PERF_COUNT commands) is perhaps the most surprising state to have become per-context save and restored (while the OABUFFER destination is still a shared, system-wide resource). This support for gen8+ takes care to consider a number of timing challenges involved in synchronously updating per-context state primarily by programming all config state from the cpu and updating all current and saved contexts synchronously while the OA unit is still disabled. The driver intentionally avoids depending on command streamer programming to update OA state considering the lack of synchronization between the automatic loading of OACTXCONTROL state (that includes the periodic sampling state and enable state) on context restore and the parsing of any general purpose BB the driver can control. I.e. this implementation is careful to avoid the possibility of a context restore temporarily enabling any out-of-date periodic sampling state. In addition to the risk of transiently-out-of-date state being loaded automatically; there are also internal HW latencies involved in the loading of MUX configurations which would be difficult to account for from the command streamer (and we only want to enable the unit when once the MUX configuration is complete). Since the Gen8+ OA unit design no longer supports clock gating the unit off for a single given context (which effectively stopped any progress of counters while any other context was running) and instead supports tagging OA reports with a context ID for filtering on the CPU, it means we can no longer hide the system-wide progress of counters from a non-privileged application only interested in metrics for its own context. Although we could theoretically try and subtract the progress of other contexts before forwarding reports via read() we aren't in a position to filter reports captured via MI_REPORT_PERF_COUNT commands. As a result, for Gen8+, we always require the dev.i915.perf_stream_paranoid to be unset for any access to OA metrics if not root. v5: Drain submitted requests when enabling metric set to ensure no lite-restore erases the context image we just updated (Lionel) v6: In addition to drain, switch to kernel context & update all context in place (Chris) v7: Add missing mutex_unlock() if switching to kernel context fails (Matthew) v8: Simplify OA period/flex-eu-counters programming by using the batchbuffer instead of modifying ctx-image (Lionel) v9: Back to updating the context image (due to erroneous testing, batchbuffer programming the OA unit doesn't actually work) (Lionel) Pin context before updating context image (Chris) Drop MMIO programming now that we switch to a kernel context with right values in initial context image (Chris) v10: Just pin_map the contexts we want to modify or let the configuration happen on first use (Chris) v11: Update kernel context OA config through the batchbuffer rather than on the fly ctx-image update (Lionel) v12: Rework OA context registers update again by swithing away from user contexts and reconfiguring the kernel context through the batchbuffer and updating all the other contexts' context image. Also take care to lock slice/subslice configuration when OA is on. (Lionel) v13: Request rpcs updates on all engine when updating the OA config (Lionel) v14: Drop any kind of rpcs management now that we monitor sseu configuration changes in a later patch (Lionel) Remove usleep after programming the NOA configs on Gen8+, this doesn't seem to be needed (Lionel) v15: Respect coding style for block comments (Chris) v16: Add missing i915_add_request() in case we fail to emit OA configuration (Matthew) Signed-off-by: Robert Bragg <robert@sixbynine.org> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> \o/ Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2017-06-13 12:23:03 +01:00
#define IS_SKL_GT2(dev_priv) (IS_SKYLAKE(dev_priv) && \
INTEL_INFO(dev_priv)->gt == 2)
#define IS_SKL_GT3(dev_priv) (IS_SKYLAKE(dev_priv) && \
INTEL_INFO(dev_priv)->gt == 3)
#define IS_SKL_GT4(dev_priv) (IS_SKYLAKE(dev_priv) && \
INTEL_INFO(dev_priv)->gt == 4)
#define IS_KBL_GT2(dev_priv) (IS_KABYLAKE(dev_priv) && \
INTEL_INFO(dev_priv)->gt == 2)
#define IS_KBL_GT3(dev_priv) (IS_KABYLAKE(dev_priv) && \
INTEL_INFO(dev_priv)->gt == 3)
drm/i915: Introduce concept of a sub-platform Concept of a sub-platform already exist in our code (like ULX and ULT platform variants and similar),implemented via the macros which check a list of device ids to determine a match. With this patch we consolidate device ids checking into a single function called during early driver load. A few low bits in the platform mask are reserved for sub-platform identification and defined as a per-platform namespace. At the same time it future proofs the platform_mask handling by preparing the code for easy extending, and tidies the very verbose WARN strings generated when IS_PLATFORM macros are embedded into a WARN type statements. v2: Fixed IS_SUBPLATFORM. Updated commit msg. v3: Chris was right, there is an ordering problem. v4: * Catch-up with new sub-platforms. * Rebase for RUNTIME_INFO. * Drop subplatform mask union tricks and convert platform_mask to an array for extensibility. v5: * Fix subplatform check. * Protect against forgetting to expand subplatform bits. * Remove platform enum tallying. * Add subplatform to error state. (Chris) * Drop macros and just use static inlines. * Remove redundant IRONLAKE_M. (Ville) v6: * Split out Ironlake change. * Optimize subplatform check. * Use __always_inline. (Lucas) * Add platform_mask comment. (Paulo) * Pass stored runtime info in error capture. (Chris) v7: * Rebased for new AML ULX device id. * Bump platform mask array size for EHL. * Stop mentioning device ids in intel_device_subplatform_init by using the trick of splitting macros i915_pciids.h. (Jani) * AML seems to be either a subplatform of KBL or CFL so express it like that. v8: * Use one device id table per subplatform. (Jani) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Jose Souza <jose.souza@intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190327142328.31780-1-tvrtko.ursulin@linux.intel.com
2019-03-27 14:23:28 +00:00
#define IS_CFL_ULT(dev_priv) \
IS_SUBPLATFORM(dev_priv, INTEL_COFFEELAKE, INTEL_SUBPLATFORM_ULT)
#define IS_CFL_ULX(dev_priv) \
IS_SUBPLATFORM(dev_priv, INTEL_COFFEELAKE, INTEL_SUBPLATFORM_ULX)
#define IS_CFL_GT2(dev_priv) (IS_COFFEELAKE(dev_priv) && \
INTEL_INFO(dev_priv)->gt == 2)
#define IS_CFL_GT3(dev_priv) (IS_COFFEELAKE(dev_priv) && \
INTEL_INFO(dev_priv)->gt == 3)
#define IS_CML_ULT(dev_priv) \
IS_SUBPLATFORM(dev_priv, INTEL_COMETLAKE, INTEL_SUBPLATFORM_ULT)
#define IS_CML_ULX(dev_priv) \
IS_SUBPLATFORM(dev_priv, INTEL_COMETLAKE, INTEL_SUBPLATFORM_ULX)
#define IS_CML_GT2(dev_priv) (IS_COMETLAKE(dev_priv) && \
INTEL_INFO(dev_priv)->gt == 2)
drm/i915: Introduce concept of a sub-platform Concept of a sub-platform already exist in our code (like ULX and ULT platform variants and similar),implemented via the macros which check a list of device ids to determine a match. With this patch we consolidate device ids checking into a single function called during early driver load. A few low bits in the platform mask are reserved for sub-platform identification and defined as a per-platform namespace. At the same time it future proofs the platform_mask handling by preparing the code for easy extending, and tidies the very verbose WARN strings generated when IS_PLATFORM macros are embedded into a WARN type statements. v2: Fixed IS_SUBPLATFORM. Updated commit msg. v3: Chris was right, there is an ordering problem. v4: * Catch-up with new sub-platforms. * Rebase for RUNTIME_INFO. * Drop subplatform mask union tricks and convert platform_mask to an array for extensibility. v5: * Fix subplatform check. * Protect against forgetting to expand subplatform bits. * Remove platform enum tallying. * Add subplatform to error state. (Chris) * Drop macros and just use static inlines. * Remove redundant IRONLAKE_M. (Ville) v6: * Split out Ironlake change. * Optimize subplatform check. * Use __always_inline. (Lucas) * Add platform_mask comment. (Paulo) * Pass stored runtime info in error capture. (Chris) v7: * Rebased for new AML ULX device id. * Bump platform mask array size for EHL. * Stop mentioning device ids in intel_device_subplatform_init by using the trick of splitting macros i915_pciids.h. (Jani) * AML seems to be either a subplatform of KBL or CFL so express it like that. v8: * Use one device id table per subplatform. (Jani) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Jose Souza <jose.souza@intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190327142328.31780-1-tvrtko.ursulin@linux.intel.com
2019-03-27 14:23:28 +00:00
#define IS_ICL_WITH_PORT_F(dev_priv) \
IS_SUBPLATFORM(dev_priv, INTEL_ICELAKE, INTEL_SUBPLATFORM_PORTF)
#define IS_TGL_U(dev_priv) \
IS_SUBPLATFORM(dev_priv, INTEL_TIGERLAKE, INTEL_SUBPLATFORM_ULT)
#define IS_TGL_Y(dev_priv) \
IS_SUBPLATFORM(dev_priv, INTEL_TIGERLAKE, INTEL_SUBPLATFORM_ULX)
#define IS_SKL_GT_STEP(p, since, until) (IS_SKYLAKE(p) && IS_GT_STEP(p, since, until))
#define IS_KBL_GT_STEP(dev_priv, since, until) \
(IS_KABYLAKE(dev_priv) && IS_GT_STEP(dev_priv, since, until))
#define IS_KBL_DISPLAY_STEP(dev_priv, since, until) \
(IS_KABYLAKE(dev_priv) && IS_DISPLAY_STEP(dev_priv, since, until))
#define IS_JSL_EHL_GT_STEP(p, since, until) \
(IS_JSL_EHL(p) && IS_GT_STEP(p, since, until))
#define IS_JSL_EHL_DISPLAY_STEP(p, since, until) \
(IS_JSL_EHL(p) && IS_DISPLAY_STEP(p, since, until))
#define IS_TGL_DISPLAY_STEP(__i915, since, until) \
(IS_TIGERLAKE(__i915) && \
IS_DISPLAY_STEP(__i915, since, until))
#define IS_TGL_UY_GT_STEP(__i915, since, until) \
((IS_TGL_U(__i915) || IS_TGL_Y(__i915)) && \
IS_GT_STEP(__i915, since, until))
#define IS_TGL_GT_STEP(__i915, since, until) \
(IS_TIGERLAKE(__i915) && !(IS_TGL_U(__i915) || IS_TGL_Y(__i915)) && \
IS_GT_STEP(__i915, since, until))
#define IS_RKL_DISPLAY_STEP(p, since, until) \
(IS_ROCKETLAKE(p) && IS_DISPLAY_STEP(p, since, until))
#define IS_DG1_GT_STEP(p, since, until) \
(IS_DG1(p) && IS_GT_STEP(p, since, until))
#define IS_DG1_DISPLAY_STEP(p, since, until) \
(IS_DG1(p) && IS_DISPLAY_STEP(p, since, until))
#define IS_ADLS_DISPLAY_STEP(__i915, since, until) \
(IS_ALDERLAKE_S(__i915) && \
IS_DISPLAY_STEP(__i915, since, until))
drm/i915/adl_s: Add ADL-S platform info and PCI ids - Add the initial platform information for Alderlake-S. - Specify ppgtt_size value - Add dma_mask_size - Add ADLS REVIDs - HW tracking(Selective Update Tracking Enable) has been removed from ADLS. Disable PSR2 till we enable software/ manual tracking. v2: - Add support for different ADLS SOC steppings to select correct GT/DISP stepping based on Bspec 53655 based on feedback from Matt Roper.(aswarup) v3: - Make display/gt steppings info generic for reuse with TGL and ADLS. - Modify the macros to reuse tgl_revids_get() - Add HTI support to adls device info.(mdroper) v4: - Rebase on TGL patch for applying WAs based on stepping info from Matt Roper's feedback.(aswarup) v5: - Replace macros with PCI IDs in revid to stepping table. v6: remove stray adls_revids (Lucas) Bspec: 53597 Bspec: 53648 Bspec: 53655 Bspec: 48028 Bspec: 53650 BSpec: 50422 Cc: José Roberto de Souza <jose.souza@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Anusha Srivatsa <anusha.srivatsa@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Imre Deak <imre.deak@intel.com> Signed-off-by: Caz Yokoyama <caz.yokoyama@intel.com> Signed-off-by: Aditya Swarup <aditya.swarup@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210119192931.1116500-2-lucas.demarchi@intel.com
2021-01-19 11:29:31 -08:00
#define IS_ADLS_GT_STEP(__i915, since, until) \
(IS_ALDERLAKE_S(__i915) && \
IS_GT_STEP(__i915, since, until))
drm/i915/adl_s: Add ADL-S platform info and PCI ids - Add the initial platform information for Alderlake-S. - Specify ppgtt_size value - Add dma_mask_size - Add ADLS REVIDs - HW tracking(Selective Update Tracking Enable) has been removed from ADLS. Disable PSR2 till we enable software/ manual tracking. v2: - Add support for different ADLS SOC steppings to select correct GT/DISP stepping based on Bspec 53655 based on feedback from Matt Roper.(aswarup) v3: - Make display/gt steppings info generic for reuse with TGL and ADLS. - Modify the macros to reuse tgl_revids_get() - Add HTI support to adls device info.(mdroper) v4: - Rebase on TGL patch for applying WAs based on stepping info from Matt Roper's feedback.(aswarup) v5: - Replace macros with PCI IDs in revid to stepping table. v6: remove stray adls_revids (Lucas) Bspec: 53597 Bspec: 53648 Bspec: 53655 Bspec: 48028 Bspec: 53650 BSpec: 50422 Cc: José Roberto de Souza <jose.souza@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Anusha Srivatsa <anusha.srivatsa@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Imre Deak <imre.deak@intel.com> Signed-off-by: Caz Yokoyama <caz.yokoyama@intel.com> Signed-off-by: Aditya Swarup <aditya.swarup@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210119192931.1116500-2-lucas.demarchi@intel.com
2021-01-19 11:29:31 -08:00
#define IS_ADLP_DISPLAY_STEP(__i915, since, until) \
(IS_ALDERLAKE_P(__i915) && \
IS_DISPLAY_STEP(__i915, since, until))
#define IS_ADLP_GT_STEP(__i915, since, until) \
(IS_ALDERLAKE_P(__i915) && \
IS_GT_STEP(__i915, since, until))
#define IS_XEHPSDV_GT_STEP(__i915, since, until) \
(IS_XEHPSDV(__i915) && IS_GT_STEP(__i915, since, until))
/*
* DG2 hardware steppings are a bit unusual. The hardware design was forked
* to create two variants (G10 and G11) which have distinct workaround sets.
* The G11 fork of the DG2 design resets the GT stepping back to "A0" for its
* first iteration, even though it's more similar to a G10 B0 stepping in terms
* of functionality and workarounds. However the display stepping does not
* reset in the same manner --- a specific stepping like "B0" has a consistent
* meaning regardless of whether it belongs to a G10 or G11 DG2.
*
* TLDR: All GT workarounds and stepping-specific logic must be applied in
* relation to a specific subplatform (G10 or G11), whereas display workarounds
* and stepping-specific logic will be applied with a general DG2-wide stepping
* number.
*/
#define IS_DG2_GT_STEP(__i915, variant, since, until) \
(IS_SUBPLATFORM(__i915, INTEL_DG2, INTEL_SUBPLATFORM_##variant) && \
IS_GT_STEP(__i915, since, until))
#define IS_DG2_DISPLAY_STEP(__i915, since, until) \
(IS_DG2(__i915) && \
IS_DISPLAY_STEP(__i915, since, until))
#define IS_LP(dev_priv) (INTEL_INFO(dev_priv)->is_lp)
#define IS_GEN9_LP(dev_priv) (GRAPHICS_VER(dev_priv) == 9 && IS_LP(dev_priv))
#define IS_GEN9_BC(dev_priv) (GRAPHICS_VER(dev_priv) == 9 && !IS_LP(dev_priv))
#define __HAS_ENGINE(engine_mask, id) ((engine_mask) & BIT(id))
#define HAS_ENGINE(gt, id) __HAS_ENGINE((gt)->info.engine_mask, id)
#define ENGINE_INSTANCES_MASK(gt, first, count) ({ \
unsigned int first__ = (first); \
unsigned int count__ = (count); \
((gt)->info.engine_mask & \
GENMASK(first__ + count__ - 1, first__)) >> first__; \
})
#define VDBOX_MASK(gt) \
ENGINE_INSTANCES_MASK(gt, VCS0, I915_MAX_VCS)
#define VEBOX_MASK(gt) \
ENGINE_INSTANCES_MASK(gt, VECS0, I915_MAX_VECS)
/*
* The Gen7 cmdparser copies the scanned buffer to the ggtt for execution
* All later gens can run the final buffer from the ppgtt
*/
#define CMDPARSER_USES_GGTT(dev_priv) (GRAPHICS_VER(dev_priv) == 7)
#define HAS_LLC(dev_priv) (INTEL_INFO(dev_priv)->has_llc)
#define HAS_SNOOP(dev_priv) (INTEL_INFO(dev_priv)->has_snoop)
#define HAS_EDRAM(dev_priv) ((dev_priv)->edram_size_mb)
#define HAS_SECURE_BATCHES(dev_priv) (GRAPHICS_VER(dev_priv) < 6)
drm/i915: Enable eLLC caching of display buffers for SKL+ Since SKL the eLLC has been sitting on the far side of the system agent, meaning the display engine can utilize it. Let's enable that. I chose WB for the caching mode, because my numbers are indicating that WT might actually be WB and WC might actually be UC. I'm not 100% sure that is indeed the case but at least my simple rendercopy based benchmark didn't see any difference in performance. Also if I configure things to do LLCeLLC+WT I still get cache dirt on my screen, suggesting that is in fact operating in WB mode anyway. This is also the reason I had to fix the MOCS target cache to really say PTE rather than LLC+eLLC. Since SKL the eLLC has been sitting on the far side of the system agent, meaning the display engine can utilize it. Let's enable that. Eero's earlier benchmarks numbers: "* Results in GfxBench and Unigine (Valley/Heaven) tests were within daily variation on the tested SKL machines * SKL GT4e (128MB eLLC) / Wayland / Weston: +15-20% SynMark TexMem512 (512MB of textures) +4-6% SynMark TerrainFly*, CSCloth, ShMapVsm -5-10% SynMark TexMem128 (128MB of textures) * SKL GT3e (64MB eLLC) / Xorg / Unity: +4-8% GpuTest Triangle fullscreen (FullHD) -5-10% GpuTest Triangle windowed (1/2 screen) * SKL GT2 (no eLLC) / Xorg / Unity: * Some of the higher FPS SynMark pixel and vertex shader tests are few percent higher, more than daily variance => Do you see any reason why this machine would be impacted although it doesn't eLLC?" Caveats: - Still haven't tested with a prime setup - Still not entirely sure this a good idea, but I've been using it on my cfl anyway :) v2: Split the MOCS PTE change out Cc: Eero Tamminen <eero.t.tamminen@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20201007120329.17076-3-ville.syrjala@linux.intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20201015122138.30161-3-chris@chris-wilson.co.uk
2020-10-15 13:21:37 +01:00
#define HAS_WT(dev_priv) HAS_EDRAM(dev_priv)
#define HWS_NEEDS_PHYSICAL(dev_priv) (INTEL_INFO(dev_priv)->hws_needs_physical)
#define HAS_LOGICAL_RING_CONTEXTS(dev_priv) \
(INTEL_INFO(dev_priv)->has_logical_ring_contexts)
drm/i915/icl: Enhanced execution list support Enhanced Execlists is an upgraded version of execlists which supports up to 8 ports. The lrcs to be submitted are written to a submit queue (the ExecLists Submission Queue - ELSQ), which is then loaded on the HW. When writing to the ELSP register, the lrcs are written cyclically in the queue from position 0 to position 7. Alternatively, it is possible to write directly in the individual positions of the queue using the ELSQC registers. To be able to re-use all the existing code we're using the latter method and we're currently limiting ourself to only using 2 elements. v2: Rebase. v3: Switch from !IS_GEN11 to GEN < 11 (Daniele Ceraolo Spurio). v4: Use the elsq registers instead of elsp. (Daniele Ceraolo Spurio) v5: Reword commit, rename regs to be closer to specs, turn off preemption (Daniele), reuse engine->execlists.elsp (Chris) v6: use has_logical_ring_elsq to differentiate the new paths v7: add preemption support, rename els to submit_reg (Chris) v8: save the ctrl register inside the execlists struct, drop CSB handling updates (superseded by preempt_complete_status) (Chris) v9: s/drm_i915_gem_request/i915_request (Mika) v10: resolved conflict in inject_preempt_context (Mika) Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Signed-off-by: Thomas Daniel <thomas.daniel@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20180302161501.28594-4-mika.kuoppala@linux.intel.com
2018-03-02 18:14:59 +02:00
#define HAS_LOGICAL_RING_ELSQ(dev_priv) \
(INTEL_INFO(dev_priv)->has_logical_ring_elsq)
#define HAS_EXECLISTS(dev_priv) HAS_LOGICAL_RING_CONTEXTS(dev_priv)
#define INTEL_PPGTT(dev_priv) (INTEL_INFO(dev_priv)->ppgtt_type)
#define HAS_PPGTT(dev_priv) \
(INTEL_PPGTT(dev_priv) != INTEL_PPGTT_NONE)
#define HAS_FULL_PPGTT(dev_priv) \
(INTEL_PPGTT(dev_priv) >= INTEL_PPGTT_FULL)
#define HAS_PAGE_SIZES(dev_priv, sizes) ({ \
GEM_BUG_ON((sizes) == 0); \
((sizes) & ~INTEL_INFO(dev_priv)->page_sizes) == 0; \
})
#define HAS_OVERLAY(dev_priv) (INTEL_INFO(dev_priv)->display.has_overlay)
#define OVERLAY_NEEDS_PHYSICAL(dev_priv) \
(INTEL_INFO(dev_priv)->display.overlay_needs_physical)
/* Early gen2 have a totally busted CS tlb and require pinned batches. */
#define HAS_BROKEN_CS_TLB(dev_priv) (IS_I830(dev_priv) || IS_I845G(dev_priv))
#define NEEDS_RC6_CTX_CORRUPTION_WA(dev_priv) \
(IS_BROADWELL(dev_priv) || GRAPHICS_VER(dev_priv) == 9)
/* WaRsDisableCoarsePowerGating:skl,cnl */
#define NEEDS_WaRsDisableCoarsePowerGating(dev_priv) \
(IS_SKL_GT3(dev_priv) || IS_SKL_GT4(dev_priv))
#define HAS_GMBUS_IRQ(dev_priv) (GRAPHICS_VER(dev_priv) >= 4)
#define HAS_GMBUS_BURST_READ(dev_priv) (GRAPHICS_VER(dev_priv) >= 11 || \
IS_GEMINILAKE(dev_priv) || \
IS_KABYLAKE(dev_priv))
/* With the 945 and later, Y tiling got adjusted so that it was 32 128-byte
* rows, which changed the alignment requirements and fence programming.
*/
#define HAS_128_BYTE_Y_TILING(dev_priv) (GRAPHICS_VER(dev_priv) != 2 && \
!(IS_I915G(dev_priv) || IS_I915GM(dev_priv)))
#define SUPPORTS_TV(dev_priv) (INTEL_INFO(dev_priv)->display.supports_tv)
#define I915_HAS_HOTPLUG(dev_priv) (INTEL_INFO(dev_priv)->display.has_hotplug)
#define HAS_FW_BLC(dev_priv) (GRAPHICS_VER(dev_priv) > 2)
#define HAS_FBC(dev_priv) (INTEL_INFO(dev_priv)->display.fbc_mask != 0)
#define HAS_CUR_FBC(dev_priv) (!HAS_GMCH(dev_priv) && GRAPHICS_VER(dev_priv) >= 7)
#define HAS_IPS(dev_priv) (IS_HSW_ULT(dev_priv) || IS_BROADWELL(dev_priv))
#define HAS_DP_MST(dev_priv) (INTEL_INFO(dev_priv)->display.has_dp_mst)
#define HAS_DP20(dev_priv) (IS_DG2(dev_priv))
#define HAS_CDCLK_CRAWL(dev_priv) (INTEL_INFO(dev_priv)->display.has_cdclk_crawl)
#define HAS_DDI(dev_priv) (INTEL_INFO(dev_priv)->display.has_ddi)
#define HAS_FPGA_DBG_UNCLAIMED(dev_priv) (INTEL_INFO(dev_priv)->display.has_fpga_dbg)
#define HAS_PSR(dev_priv) (INTEL_INFO(dev_priv)->display.has_psr)
#define HAS_PSR_HW_TRACKING(dev_priv) \
(INTEL_INFO(dev_priv)->display.has_psr_hw_tracking)
#define HAS_PSR2_SEL_FETCH(dev_priv) (GRAPHICS_VER(dev_priv) >= 12)
#define HAS_TRANSCODER(dev_priv, trans) ((INTEL_INFO(dev_priv)->display.cpu_transcoder_mask & BIT(trans)) != 0)
#define HAS_RC6(dev_priv) (INTEL_INFO(dev_priv)->has_rc6)
#define HAS_RC6p(dev_priv) (INTEL_INFO(dev_priv)->has_rc6p)
#define HAS_RC6pp(dev_priv) (false) /* HW was never validated */
#define HAS_RPS(dev_priv) (INTEL_INFO(dev_priv)->has_rps)
#define HAS_DMC(dev_priv) (INTEL_INFO(dev_priv)->display.has_dmc)
drm/i915/skl: Add support to load SKL CSR firmware. Display Context Save and Restore support is needed for various SKL Display C states like DC5, DC6. This implementation is added based on first version of DMC CSR program that we received from h/w team. Here we are using request_firmware based design. Finally this firmware should end up in linux-firmware tree. For SKL platform its mandatory to ensure that we load this csr program before enabling DC states like DC5/DC6. As CSR program gets reset on various conditions, we should ensure to load it during boot and in future change to be added to load this system resume sequence too. v1: Initial relese as RFC patch v2: Design change as per Daniel, Damien and Shobit's review comments request firmware method followed. v3: Some optimization and functional changes. Pulled register defines into drivers/gpu/drm/i915/i915_reg.h Used kmemdup to allocate and duplicate firmware content. Ensured to free allocated buffer. v4: Modified as per review comments from Satheesh and Daniel Removed temporary buffer. Optimized number of writes by replacing I915_WRITE with I915_WRITE64. v5: Modified as per review comemnts from Damien. - Changed name for functions and firmware. - Introduced HAS_CSR. - Reverted back previous change and used csr_buf with u8 size. - Using cpu_to_be64 for endianness change. Modified as per review comments from Imre. - Modified registers and macro names to be a bit closer to bspec terminology and the existing register naming in the driver. - Early return for non SKL platforms in intel_load_csr_program function. - Added locking around CSR program load function as it may be called concurrently during system/runtime resume. - Releasing the fw before loading the program for consistency - Handled error path during f/w load. v6: Modified as per review comments from Imre. - Corrected out_freecsr sequence. v7: Modified as per review comments from Imre. Fail loading fw if fw->size%8!=0. v8: Rebase to latest. v9: Rebase on top of -nightly (Damien) v10: Enabled support for dmc firmware ver 1.0. According to ver 1.0 in a single binary package all the firmware's that are required for different stepping's of the product will be stored. The package contains the css header, followed by the package header and the actual dmc firmwares. Package header contains the firmware/stepping mapping table and the corresponding firmware offsets to the individual binaries, within the package. Each individual program binary contains the header and the payload sections whose size is specified in the header section. This changes are done to extract the specific firmaware from the package. (Animesh) v11: Modified as per review comemnts from Imre. - Added code comment from bpec for header structure elements. - Added __packed to avoid structure padding. - Added helper functions for stepping and substepping info. - Added code comment for CSR_MAX_FW_SIZE. - Disabled BXT firmware loading, will be enabled with dmc 1.0 support. - Changed skl_stepping_info based on bspec, earlier used from config DB. - Removed duplicate call of cpu_to_be* from intel_csr_load_program function. - Used cpu_to_be32 instead of cpu_to_be64 as firmware binary in dword aligned. - Added sanity check for header length. - Added sanity check for mmio address got from firmware binary. - kmalloc done separately for dmc header and dmc firmware. (Animesh) v12: Modified as per review comemnts from Imre. - Corrected the typo error in skl stepping info structure. - Added out-of-bound access for skl_stepping_info. - Sanity check for mmio address modified. - Sanity check added for stepping and substeppig. - Modified the intel_dmc_info structure, cache only the required header info. (Animesh) v13: clarify firmware load error message. The reason for a firmware loading failure can be obscure if the driver is built-in. Provide an explanation to the user about the likely reason for the failure and how to resolve it. (Imre) v14: Suggested by Jani. - fix s/I915/CONFIG_DRM_I915/ typo - add fw_path to the firmware object instead of using a static ptr (Jani) v15: 1) Changed the firmware name as dmc_gen9.bin, everytime for a new firmware version a symbolic link with same name will help not to build kernel again. 2) Changes done as per review comments from Imre. - Error check removed for intel_csr_ucode_init. - Moved csr-specific data structure to intel_csr.h and optimization done on structure definition. - fw->data used directly for parsing the header info & memory allocation only done separately for payload. (Animesh) v16: - No need for out_regs label in i915_driver_load(), so removed it. - Changed the firmware name as skl_dmc_ver1.bin, followed naming convention <platform>_dmc_<api-version>.bin (Animesh) Issue: VIZ-2569 Signed-off-by: A.Sunil Kamath <sunil.kamath@intel.com> Signed-off-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Animesh Manna <animesh.manna@intel.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Imre Deak <imre.deak@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-05-04 14:58:44 +02:00
#define HAS_MSO(i915) (GRAPHICS_VER(i915) >= 12)
#define HAS_RUNTIME_PM(dev_priv) (INTEL_INFO(dev_priv)->has_runtime_pm)
#define HAS_64BIT_RELOC(dev_priv) (INTEL_INFO(dev_priv)->has_64bit_reloc)
#define HAS_MSLICES(dev_priv) \
(INTEL_INFO(dev_priv)->has_mslices)
#define HAS_IPC(dev_priv) (INTEL_INFO(dev_priv)->display.has_ipc)
#define HAS_REGION(i915, i) (INTEL_INFO(i915)->memory_regions & (i))
#define HAS_LMEM(i915) HAS_REGION(i915, REGION_LMEM)
#define HAS_GT_UC(dev_priv) (INTEL_INFO(dev_priv)->has_gt_uc)
#define HAS_POOLED_EU(dev_priv) (INTEL_INFO(dev_priv)->has_pooled_eu)
drm/i915:bxt: Enable Pooled EU support This mode allows to assign EUs to pools which can process work collectively. The command to enable this mode should be issued as part of context initialization. The pooled mode is global, once enabled it has to stay the same across all contexts until HW reset hence this is sent in auxiliary golden context batch. Thanks to Mika for the preliminary review and comments. v2: explain why this is enabled in golden context, use feature flag while enabling the support (Chris) v3: Include only kernel support as userspace support is not available yet. User space clients need to know when the pooled EU feature is present and enabled on the hardware so that they can adapt work submissions. Create a new device info flag for this purpose. Set has_pooled_eu to true in the Broxton static device info - Broxton supports the feature in hardware and the driver will enable it by default. We need to add getparam ioctls to enable userspace to query availability of this feature and to retrieve min. no of eus in a pool but we will expose them once userspace support is available. Opensource users for this feature are mesa, libva and beignet. Beignet team is currently working on adding userspace support. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v2) Cc: Winiarski, Michal <michal.winiarski@intel.com> Cc: Zou, Nanhai <nanhai.zou@intel.com> Cc: Yang, Rong R <rong.r.yang@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Armin Reese <armin.c.reese@intel.com> Cc: Tim Gore <tim.gore@intel.com> Signed-off-by: Jeff McGee <jeff.mcgee@intel.com> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2016-06-03 06:34:33 +01:00
#define HAS_GLOBAL_MOCS_REGISTERS(dev_priv) (INTEL_INFO(dev_priv)->has_global_mocs)
#define HAS_PXP(dev_priv) ((IS_ENABLED(CONFIG_DRM_I915_PXP) && \
INTEL_INFO(dev_priv)->has_pxp) && \
VDBOX_MASK(&dev_priv->gt))
#define HAS_GMCH(dev_priv) (INTEL_INFO(dev_priv)->display.has_gmch)
#define HAS_LSPCON(dev_priv) (IS_GRAPHICS_VER(dev_priv, 9, 10))
/* DPF == dynamic parity feature */
#define HAS_L3_DPF(dev_priv) (INTEL_INFO(dev_priv)->has_l3_dpf)
#define NUM_L3_SLICES(dev_priv) (IS_HSW_GT3(dev_priv) ? \
2 : HAS_L3_DPF(dev_priv))
#define GT_FREQUENCY_MULTIPLIER 50
#define GEN9_FREQ_SCALER 3
#define INTEL_NUM_PIPES(dev_priv) (hweight8(INTEL_INFO(dev_priv)->display.pipe_mask))
#define HAS_DISPLAY(dev_priv) (INTEL_INFO(dev_priv)->display.pipe_mask != 0)
#define HAS_VRR(i915) (GRAPHICS_VER(i915) >= 11)
#define HAS_ASYNC_FLIPS(i915) (DISPLAY_VER(i915) >= 5)
/* Only valid when HAS_DISPLAY() is true */
#define INTEL_DISPLAY_ENABLED(dev_priv) \
(drm_WARN_ON(&(dev_priv)->drm, !HAS_DISPLAY(dev_priv)), !(dev_priv)->params.disable_display)
static inline bool run_as_guest(void)
{
return !hypervisor_is_type(X86_HYPER_NATIVE);
}
#define HAS_D12_PLANE_MINIMIZATION(dev_priv) (IS_ROCKETLAKE(dev_priv) || \
IS_ALDERLAKE_S(dev_priv))
static inline bool intel_vtd_active(void)
{
#ifdef CONFIG_INTEL_IOMMU
if (intel_iommu_gfx_mapped)
return true;
#endif
/* Running as a guest, we assume the host is enforcing VT'd */
return run_as_guest();
}
static inline bool intel_scanout_needs_vtd_wa(struct drm_i915_private *dev_priv)
{
return GRAPHICS_VER(dev_priv) >= 6 && intel_vtd_active();
}
drm/i915: Serialize GTT/Aperture accesses on BXT BXT has a H/W issue with IOMMU which can lead to system hangs when Aperture accesses are queued within the GAM behind GTT Accesses. This patch avoids the condition by wrapping all GTT updates in stop_machine and using a flushing read prior to restarting the machine. The stop_machine guarantees no new Aperture accesses can begin while the PTE writes are being emmitted. The flushing read ensures that any following Aperture accesses cannot begin until the PTE writes have been cleared out of the GAM's fifo. Only FOLLOWING Aperture accesses need to be separated from in flight PTE updates. PTE Writes may follow tightly behind already in flight Aperture accesses, so no flushing read is required at the start of a PTE update sequence. This issue was reproduced by running igt/gem_readwrite and igt/gem_render_copy simultaneously from different processes, each in a tight loop, with INTEL_IOMMU enabled. This patch was originally published as: drm/i915: Serialize GTT Updates on BXT v2: Move bxt/iommu detection into static function Remove #ifdef CONFIG_INTEL_IOMMU protection Make function names more reflective of purpose Move flushing read into static function v3: Tidy up for checkpatch.pl Testcase: igt/gem_concurrent_blit Signed-off-by: Jon Bloomfield <jon.bloomfield@intel.com> Cc: John Harrison <john.C.Harrison@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1495641251-30022-1-git-send-email-jon.bloomfield@intel.com Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-05-24 08:54:11 -07:00
static inline bool
drm/i915: Use trylock in shrinker for ggtt on bsw vt-d and bxt, v2. The stop_machine() lock may allocate memory, but is called inside vm->mutex, which is taken in the shrinker. This will cause a lockdep splat, as can be seen below: <4>[ 462.585762] ====================================================== <4>[ 462.585768] WARNING: possible circular locking dependency detected <4>[ 462.585773] 5.12.0-rc5-CI-Trybot_7644+ #1 Tainted: G U <4>[ 462.585779] ------------------------------------------------------ <4>[ 462.585783] i915_selftest/5540 is trying to acquire lock: <4>[ 462.585788] ffffffff826440b0 (cpu_hotplug_lock){++++}-{0:0}, at: stop_machine+0x12/0x30 <4>[ 462.585814] but task is already holding lock: <4>[ 462.585818] ffff888125369c70 (&vm->mutex/1){+.+.}-{3:3}, at: i915_vma_pin_ww+0x38e/0xb40 [i915] <4>[ 462.586301] which lock already depends on the new lock. <4>[ 462.586305] the existing dependency chain (in reverse order) is: <4>[ 462.586309] -> #2 (&vm->mutex/1){+.+.}-{3:3}: <4>[ 462.586323] i915_gem_shrinker_taints_mutex+0x2d/0x50 [i915] <4>[ 462.586719] i915_address_space_init+0x12d/0x130 [i915] <4>[ 462.587092] ppgtt_init+0x4e/0x80 [i915] <4>[ 462.587467] gen8_ppgtt_create+0x3e/0x5c0 [i915] <4>[ 462.587828] i915_ppgtt_create+0x28/0xf0 [i915] <4>[ 462.588203] intel_gt_init+0x123/0x370 [i915] <4>[ 462.588572] i915_gem_init+0x129/0x1f0 [i915] <4>[ 462.588971] i915_driver_probe+0x753/0xd80 [i915] <4>[ 462.589320] i915_pci_probe+0x43/0x1d0 [i915] <4>[ 462.589671] pci_device_probe+0x9e/0x110 <4>[ 462.589680] really_probe+0xea/0x410 <4>[ 462.589690] driver_probe_device+0xd9/0x140 <4>[ 462.589697] device_driver_attach+0x4a/0x50 <4>[ 462.589704] __driver_attach+0x83/0x140 <4>[ 462.589711] bus_for_each_dev+0x75/0xc0 <4>[ 462.589718] bus_add_driver+0x14b/0x1f0 <4>[ 462.589724] driver_register+0x66/0xb0 <4>[ 462.589731] i915_init+0x70/0x87 [i915] <4>[ 462.590053] do_one_initcall+0x56/0x2e0 <4>[ 462.590061] do_init_module+0x55/0x200 <4>[ 462.590068] load_module+0x2703/0x2990 <4>[ 462.590074] __do_sys_finit_module+0xad/0x110 <4>[ 462.590080] do_syscall_64+0x33/0x80 <4>[ 462.590089] entry_SYSCALL_64_after_hwframe+0x44/0xae <4>[ 462.590096] -> #1 (fs_reclaim){+.+.}-{0:0}: <4>[ 462.590109] fs_reclaim_acquire+0x9f/0xd0 <4>[ 462.590118] kmem_cache_alloc_trace+0x3d/0x430 <4>[ 462.590126] intel_cpuc_prepare+0x3b/0x1b0 <4>[ 462.590133] cpuhp_invoke_callback+0x9e/0x890 <4>[ 462.590141] _cpu_up+0xa4/0x130 <4>[ 462.590147] cpu_up+0x82/0x90 <4>[ 462.590153] bringup_nonboot_cpus+0x4a/0x60 <4>[ 462.590159] smp_init+0x21/0x5c <4>[ 462.590167] kernel_init_freeable+0x8a/0x1b7 <4>[ 462.590175] kernel_init+0x5/0xff <4>[ 462.590181] ret_from_fork+0x22/0x30 <4>[ 462.590187] -> #0 (cpu_hotplug_lock){++++}-{0:0}: <4>[ 462.590199] __lock_acquire+0x1520/0x2590 <4>[ 462.590207] lock_acquire+0xd1/0x3d0 <4>[ 462.590213] cpus_read_lock+0x39/0xc0 <4>[ 462.590219] stop_machine+0x12/0x30 <4>[ 462.590226] bxt_vtd_ggtt_insert_entries__BKL+0x36/0x50 [i915] <4>[ 462.590601] ggtt_bind_vma+0x5d/0x80 [i915] <4>[ 462.590970] i915_vma_bind+0xdc/0x1c0 [i915] <4>[ 462.591374] i915_vma_pin_ww+0x435/0xb40 [i915] <4>[ 462.591779] make_obj_busy+0xcb/0x330 [i915] <4>[ 462.592170] igt_mmap_offset_exhaustion+0x45f/0x4c0 [i915] <4>[ 462.592562] __i915_subtests.cold.7+0x42/0x92 [i915] <4>[ 462.592995] __run_selftests.part.3+0x10d/0x172 [i915] <4>[ 462.593428] i915_live_selftests.cold.5+0x1f/0x47 [i915] <4>[ 462.593860] i915_pci_probe+0x93/0x1d0 [i915] <4>[ 462.594210] pci_device_probe+0x9e/0x110 <4>[ 462.594217] really_probe+0xea/0x410 <4>[ 462.594226] driver_probe_device+0xd9/0x140 <4>[ 462.594233] device_driver_attach+0x4a/0x50 <4>[ 462.594240] __driver_attach+0x83/0x140 <4>[ 462.594247] bus_for_each_dev+0x75/0xc0 <4>[ 462.594254] bus_add_driver+0x14b/0x1f0 <4>[ 462.594260] driver_register+0x66/0xb0 <4>[ 462.594267] i915_init+0x70/0x87 [i915] <4>[ 462.594586] do_one_initcall+0x56/0x2e0 <4>[ 462.594592] do_init_module+0x55/0x200 <4>[ 462.594599] load_module+0x2703/0x2990 <4>[ 462.594605] __do_sys_finit_module+0xad/0x110 <4>[ 462.594612] do_syscall_64+0x33/0x80 <4>[ 462.594618] entry_SYSCALL_64_after_hwframe+0x44/0xae <4>[ 462.594625] other info that might help us debug this: <4>[ 462.594629] Chain exists of: cpu_hotplug_lock --> fs_reclaim --> &vm->mutex/1 <4>[ 462.594645] Possible unsafe locking scenario: <4>[ 462.594648] CPU0 CPU1 <4>[ 462.594652] ---- ---- <4>[ 462.594655] lock(&vm->mutex/1); <4>[ 462.594664] lock(fs_reclaim); <4>[ 462.594671] lock(&vm->mutex/1); <4>[ 462.594679] lock(cpu_hotplug_lock); <4>[ 462.594686] *** DEADLOCK *** <4>[ 462.594690] 4 locks held by i915_selftest/5540: <4>[ 462.594696] #0: ffff888100fbc240 (&dev->mutex){....}-{3:3}, at: device_driver_attach+0x18/0x50 <4>[ 462.594715] #1: ffffc900006cb9a0 (reservation_ww_class_acquire){+.+.}-{0:0}, at: make_obj_busy+0x81/0x330 [i915] <4>[ 462.595118] #2: ffff88812a6081e8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: make_obj_busy+0x21f/0x330 [i915] <4>[ 462.595519] #3: ffff888125369c70 (&vm->mutex/1){+.+.}-{3:3}, at: i915_vma_pin_ww+0x38e/0xb40 [i915] <4>[ 462.595934] stack backtrace: <4>[ 462.595939] CPU: 0 PID: 5540 Comm: i915_selftest Tainted: G U 5.12.0-rc5-CI-Trybot_7644+ #1 <4>[ 462.595947] Hardware name: GOOGLE Kefka/Kefka, BIOS MrChromebox 02/04/2018 <4>[ 462.595952] Call Trace: <4>[ 462.595961] dump_stack+0x7f/0xad <4>[ 462.595974] check_noncircular+0x12e/0x150 <4>[ 462.595982] ? save_stack.isra.17+0x3f/0x70 <4>[ 462.595991] ? drm_mm_insert_node_in_range+0x34a/0x5b0 <4>[ 462.596000] ? i915_vma_pin_ww+0x9ec/0xb40 [i915] <4>[ 462.596410] __lock_acquire+0x1520/0x2590 <4>[ 462.596419] ? do_init_module+0x55/0x200 <4>[ 462.596429] lock_acquire+0xd1/0x3d0 <4>[ 462.596435] ? stop_machine+0x12/0x30 <4>[ 462.596445] ? gen8_ggtt_insert_entries+0xf0/0xf0 [i915] <4>[ 462.596816] cpus_read_lock+0x39/0xc0 <4>[ 462.596824] ? stop_machine+0x12/0x30 <4>[ 462.596831] stop_machine+0x12/0x30 <4>[ 462.596839] bxt_vtd_ggtt_insert_entries__BKL+0x36/0x50 [i915] <4>[ 462.597210] ggtt_bind_vma+0x5d/0x80 [i915] <4>[ 462.597580] i915_vma_bind+0xdc/0x1c0 [i915] <4>[ 462.597986] i915_vma_pin_ww+0x435/0xb40 [i915] <4>[ 462.598395] ? make_obj_busy+0xcb/0x330 [i915] <4>[ 462.598786] make_obj_busy+0xcb/0x330 [i915] <4>[ 462.599180] ? 0xffffffff81000000 <4>[ 462.599187] ? debug_mutex_unlock+0x50/0xa0 <4>[ 462.599198] igt_mmap_offset_exhaustion+0x45f/0x4c0 [i915] <4>[ 462.599592] __i915_subtests.cold.7+0x42/0x92 [i915] <4>[ 462.600026] ? i915_perf_selftests+0x20/0x20 [i915] <4>[ 462.600422] ? __i915_nop_setup+0x10/0x10 [i915] <4>[ 462.600820] __run_selftests.part.3+0x10d/0x172 [i915] <4>[ 462.601253] i915_live_selftests.cold.5+0x1f/0x47 [i915] <4>[ 462.601686] i915_pci_probe+0x93/0x1d0 [i915] <4>[ 462.602037] ? _raw_spin_unlock_irqrestore+0x3d/0x60 <4>[ 462.602047] pci_device_probe+0x9e/0x110 <4>[ 462.602057] really_probe+0xea/0x410 <4>[ 462.602067] driver_probe_device+0xd9/0x140 <4>[ 462.602075] device_driver_attach+0x4a/0x50 <4>[ 462.602084] __driver_attach+0x83/0x140 <4>[ 462.602091] ? device_driver_attach+0x50/0x50 <4>[ 462.602099] ? device_driver_attach+0x50/0x50 <4>[ 462.602107] bus_for_each_dev+0x75/0xc0 <4>[ 462.602116] bus_add_driver+0x14b/0x1f0 <4>[ 462.602124] driver_register+0x66/0xb0 <4>[ 462.602133] i915_init+0x70/0x87 [i915] <4>[ 462.602453] ? 0xffffffffa0606000 <4>[ 462.602458] do_one_initcall+0x56/0x2e0 <4>[ 462.602466] ? kmem_cache_alloc_trace+0x374/0x430 <4>[ 462.602476] do_init_module+0x55/0x200 <4>[ 462.602484] load_module+0x2703/0x2990 <4>[ 462.602500] ? __do_sys_finit_module+0xad/0x110 <4>[ 462.602507] __do_sys_finit_module+0xad/0x110 <4>[ 462.602519] do_syscall_64+0x33/0x80 <4>[ 462.602527] entry_SYSCALL_64_after_hwframe+0x44/0xae <4>[ 462.602535] RIP: 0033:0x7fab69d8d89d Changes since v1: - Add lockdep annotations during init, to ensure that lockdep is primed. This also fixes a false positive when reading /proc/lockdep_stats during module reload. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210426102351.921874-1-maarten.lankhorst@linux.intel.com Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2021-04-26 12:23:51 +02:00
intel_ggtt_update_needs_vtd_wa(struct drm_i915_private *i915)
drm/i915: Serialize GTT/Aperture accesses on BXT BXT has a H/W issue with IOMMU which can lead to system hangs when Aperture accesses are queued within the GAM behind GTT Accesses. This patch avoids the condition by wrapping all GTT updates in stop_machine and using a flushing read prior to restarting the machine. The stop_machine guarantees no new Aperture accesses can begin while the PTE writes are being emmitted. The flushing read ensures that any following Aperture accesses cannot begin until the PTE writes have been cleared out of the GAM's fifo. Only FOLLOWING Aperture accesses need to be separated from in flight PTE updates. PTE Writes may follow tightly behind already in flight Aperture accesses, so no flushing read is required at the start of a PTE update sequence. This issue was reproduced by running igt/gem_readwrite and igt/gem_render_copy simultaneously from different processes, each in a tight loop, with INTEL_IOMMU enabled. This patch was originally published as: drm/i915: Serialize GTT Updates on BXT v2: Move bxt/iommu detection into static function Remove #ifdef CONFIG_INTEL_IOMMU protection Make function names more reflective of purpose Move flushing read into static function v3: Tidy up for checkpatch.pl Testcase: igt/gem_concurrent_blit Signed-off-by: Jon Bloomfield <jon.bloomfield@intel.com> Cc: John Harrison <john.C.Harrison@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1495641251-30022-1-git-send-email-jon.bloomfield@intel.com Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-05-24 08:54:11 -07:00
{
drm/i915: Use trylock in shrinker for ggtt on bsw vt-d and bxt, v2. The stop_machine() lock may allocate memory, but is called inside vm->mutex, which is taken in the shrinker. This will cause a lockdep splat, as can be seen below: <4>[ 462.585762] ====================================================== <4>[ 462.585768] WARNING: possible circular locking dependency detected <4>[ 462.585773] 5.12.0-rc5-CI-Trybot_7644+ #1 Tainted: G U <4>[ 462.585779] ------------------------------------------------------ <4>[ 462.585783] i915_selftest/5540 is trying to acquire lock: <4>[ 462.585788] ffffffff826440b0 (cpu_hotplug_lock){++++}-{0:0}, at: stop_machine+0x12/0x30 <4>[ 462.585814] but task is already holding lock: <4>[ 462.585818] ffff888125369c70 (&vm->mutex/1){+.+.}-{3:3}, at: i915_vma_pin_ww+0x38e/0xb40 [i915] <4>[ 462.586301] which lock already depends on the new lock. <4>[ 462.586305] the existing dependency chain (in reverse order) is: <4>[ 462.586309] -> #2 (&vm->mutex/1){+.+.}-{3:3}: <4>[ 462.586323] i915_gem_shrinker_taints_mutex+0x2d/0x50 [i915] <4>[ 462.586719] i915_address_space_init+0x12d/0x130 [i915] <4>[ 462.587092] ppgtt_init+0x4e/0x80 [i915] <4>[ 462.587467] gen8_ppgtt_create+0x3e/0x5c0 [i915] <4>[ 462.587828] i915_ppgtt_create+0x28/0xf0 [i915] <4>[ 462.588203] intel_gt_init+0x123/0x370 [i915] <4>[ 462.588572] i915_gem_init+0x129/0x1f0 [i915] <4>[ 462.588971] i915_driver_probe+0x753/0xd80 [i915] <4>[ 462.589320] i915_pci_probe+0x43/0x1d0 [i915] <4>[ 462.589671] pci_device_probe+0x9e/0x110 <4>[ 462.589680] really_probe+0xea/0x410 <4>[ 462.589690] driver_probe_device+0xd9/0x140 <4>[ 462.589697] device_driver_attach+0x4a/0x50 <4>[ 462.589704] __driver_attach+0x83/0x140 <4>[ 462.589711] bus_for_each_dev+0x75/0xc0 <4>[ 462.589718] bus_add_driver+0x14b/0x1f0 <4>[ 462.589724] driver_register+0x66/0xb0 <4>[ 462.589731] i915_init+0x70/0x87 [i915] <4>[ 462.590053] do_one_initcall+0x56/0x2e0 <4>[ 462.590061] do_init_module+0x55/0x200 <4>[ 462.590068] load_module+0x2703/0x2990 <4>[ 462.590074] __do_sys_finit_module+0xad/0x110 <4>[ 462.590080] do_syscall_64+0x33/0x80 <4>[ 462.590089] entry_SYSCALL_64_after_hwframe+0x44/0xae <4>[ 462.590096] -> #1 (fs_reclaim){+.+.}-{0:0}: <4>[ 462.590109] fs_reclaim_acquire+0x9f/0xd0 <4>[ 462.590118] kmem_cache_alloc_trace+0x3d/0x430 <4>[ 462.590126] intel_cpuc_prepare+0x3b/0x1b0 <4>[ 462.590133] cpuhp_invoke_callback+0x9e/0x890 <4>[ 462.590141] _cpu_up+0xa4/0x130 <4>[ 462.590147] cpu_up+0x82/0x90 <4>[ 462.590153] bringup_nonboot_cpus+0x4a/0x60 <4>[ 462.590159] smp_init+0x21/0x5c <4>[ 462.590167] kernel_init_freeable+0x8a/0x1b7 <4>[ 462.590175] kernel_init+0x5/0xff <4>[ 462.590181] ret_from_fork+0x22/0x30 <4>[ 462.590187] -> #0 (cpu_hotplug_lock){++++}-{0:0}: <4>[ 462.590199] __lock_acquire+0x1520/0x2590 <4>[ 462.590207] lock_acquire+0xd1/0x3d0 <4>[ 462.590213] cpus_read_lock+0x39/0xc0 <4>[ 462.590219] stop_machine+0x12/0x30 <4>[ 462.590226] bxt_vtd_ggtt_insert_entries__BKL+0x36/0x50 [i915] <4>[ 462.590601] ggtt_bind_vma+0x5d/0x80 [i915] <4>[ 462.590970] i915_vma_bind+0xdc/0x1c0 [i915] <4>[ 462.591374] i915_vma_pin_ww+0x435/0xb40 [i915] <4>[ 462.591779] make_obj_busy+0xcb/0x330 [i915] <4>[ 462.592170] igt_mmap_offset_exhaustion+0x45f/0x4c0 [i915] <4>[ 462.592562] __i915_subtests.cold.7+0x42/0x92 [i915] <4>[ 462.592995] __run_selftests.part.3+0x10d/0x172 [i915] <4>[ 462.593428] i915_live_selftests.cold.5+0x1f/0x47 [i915] <4>[ 462.593860] i915_pci_probe+0x93/0x1d0 [i915] <4>[ 462.594210] pci_device_probe+0x9e/0x110 <4>[ 462.594217] really_probe+0xea/0x410 <4>[ 462.594226] driver_probe_device+0xd9/0x140 <4>[ 462.594233] device_driver_attach+0x4a/0x50 <4>[ 462.594240] __driver_attach+0x83/0x140 <4>[ 462.594247] bus_for_each_dev+0x75/0xc0 <4>[ 462.594254] bus_add_driver+0x14b/0x1f0 <4>[ 462.594260] driver_register+0x66/0xb0 <4>[ 462.594267] i915_init+0x70/0x87 [i915] <4>[ 462.594586] do_one_initcall+0x56/0x2e0 <4>[ 462.594592] do_init_module+0x55/0x200 <4>[ 462.594599] load_module+0x2703/0x2990 <4>[ 462.594605] __do_sys_finit_module+0xad/0x110 <4>[ 462.594612] do_syscall_64+0x33/0x80 <4>[ 462.594618] entry_SYSCALL_64_after_hwframe+0x44/0xae <4>[ 462.594625] other info that might help us debug this: <4>[ 462.594629] Chain exists of: cpu_hotplug_lock --> fs_reclaim --> &vm->mutex/1 <4>[ 462.594645] Possible unsafe locking scenario: <4>[ 462.594648] CPU0 CPU1 <4>[ 462.594652] ---- ---- <4>[ 462.594655] lock(&vm->mutex/1); <4>[ 462.594664] lock(fs_reclaim); <4>[ 462.594671] lock(&vm->mutex/1); <4>[ 462.594679] lock(cpu_hotplug_lock); <4>[ 462.594686] *** DEADLOCK *** <4>[ 462.594690] 4 locks held by i915_selftest/5540: <4>[ 462.594696] #0: ffff888100fbc240 (&dev->mutex){....}-{3:3}, at: device_driver_attach+0x18/0x50 <4>[ 462.594715] #1: ffffc900006cb9a0 (reservation_ww_class_acquire){+.+.}-{0:0}, at: make_obj_busy+0x81/0x330 [i915] <4>[ 462.595118] #2: ffff88812a6081e8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: make_obj_busy+0x21f/0x330 [i915] <4>[ 462.595519] #3: ffff888125369c70 (&vm->mutex/1){+.+.}-{3:3}, at: i915_vma_pin_ww+0x38e/0xb40 [i915] <4>[ 462.595934] stack backtrace: <4>[ 462.595939] CPU: 0 PID: 5540 Comm: i915_selftest Tainted: G U 5.12.0-rc5-CI-Trybot_7644+ #1 <4>[ 462.595947] Hardware name: GOOGLE Kefka/Kefka, BIOS MrChromebox 02/04/2018 <4>[ 462.595952] Call Trace: <4>[ 462.595961] dump_stack+0x7f/0xad <4>[ 462.595974] check_noncircular+0x12e/0x150 <4>[ 462.595982] ? save_stack.isra.17+0x3f/0x70 <4>[ 462.595991] ? drm_mm_insert_node_in_range+0x34a/0x5b0 <4>[ 462.596000] ? i915_vma_pin_ww+0x9ec/0xb40 [i915] <4>[ 462.596410] __lock_acquire+0x1520/0x2590 <4>[ 462.596419] ? do_init_module+0x55/0x200 <4>[ 462.596429] lock_acquire+0xd1/0x3d0 <4>[ 462.596435] ? stop_machine+0x12/0x30 <4>[ 462.596445] ? gen8_ggtt_insert_entries+0xf0/0xf0 [i915] <4>[ 462.596816] cpus_read_lock+0x39/0xc0 <4>[ 462.596824] ? stop_machine+0x12/0x30 <4>[ 462.596831] stop_machine+0x12/0x30 <4>[ 462.596839] bxt_vtd_ggtt_insert_entries__BKL+0x36/0x50 [i915] <4>[ 462.597210] ggtt_bind_vma+0x5d/0x80 [i915] <4>[ 462.597580] i915_vma_bind+0xdc/0x1c0 [i915] <4>[ 462.597986] i915_vma_pin_ww+0x435/0xb40 [i915] <4>[ 462.598395] ? make_obj_busy+0xcb/0x330 [i915] <4>[ 462.598786] make_obj_busy+0xcb/0x330 [i915] <4>[ 462.599180] ? 0xffffffff81000000 <4>[ 462.599187] ? debug_mutex_unlock+0x50/0xa0 <4>[ 462.599198] igt_mmap_offset_exhaustion+0x45f/0x4c0 [i915] <4>[ 462.599592] __i915_subtests.cold.7+0x42/0x92 [i915] <4>[ 462.600026] ? i915_perf_selftests+0x20/0x20 [i915] <4>[ 462.600422] ? __i915_nop_setup+0x10/0x10 [i915] <4>[ 462.600820] __run_selftests.part.3+0x10d/0x172 [i915] <4>[ 462.601253] i915_live_selftests.cold.5+0x1f/0x47 [i915] <4>[ 462.601686] i915_pci_probe+0x93/0x1d0 [i915] <4>[ 462.602037] ? _raw_spin_unlock_irqrestore+0x3d/0x60 <4>[ 462.602047] pci_device_probe+0x9e/0x110 <4>[ 462.602057] really_probe+0xea/0x410 <4>[ 462.602067] driver_probe_device+0xd9/0x140 <4>[ 462.602075] device_driver_attach+0x4a/0x50 <4>[ 462.602084] __driver_attach+0x83/0x140 <4>[ 462.602091] ? device_driver_attach+0x50/0x50 <4>[ 462.602099] ? device_driver_attach+0x50/0x50 <4>[ 462.602107] bus_for_each_dev+0x75/0xc0 <4>[ 462.602116] bus_add_driver+0x14b/0x1f0 <4>[ 462.602124] driver_register+0x66/0xb0 <4>[ 462.602133] i915_init+0x70/0x87 [i915] <4>[ 462.602453] ? 0xffffffffa0606000 <4>[ 462.602458] do_one_initcall+0x56/0x2e0 <4>[ 462.602466] ? kmem_cache_alloc_trace+0x374/0x430 <4>[ 462.602476] do_init_module+0x55/0x200 <4>[ 462.602484] load_module+0x2703/0x2990 <4>[ 462.602500] ? __do_sys_finit_module+0xad/0x110 <4>[ 462.602507] __do_sys_finit_module+0xad/0x110 <4>[ 462.602519] do_syscall_64+0x33/0x80 <4>[ 462.602527] entry_SYSCALL_64_after_hwframe+0x44/0xae <4>[ 462.602535] RIP: 0033:0x7fab69d8d89d Changes since v1: - Add lockdep annotations during init, to ensure that lockdep is primed. This also fixes a false positive when reading /proc/lockdep_stats during module reload. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210426102351.921874-1-maarten.lankhorst@linux.intel.com Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2021-04-26 12:23:51 +02:00
return IS_BROXTON(i915) && intel_vtd_active();
}
static inline bool
intel_vm_no_concurrent_access_wa(struct drm_i915_private *i915)
drm/i915: Serialize GTT/Aperture accesses on BXT BXT has a H/W issue with IOMMU which can lead to system hangs when Aperture accesses are queued within the GAM behind GTT Accesses. This patch avoids the condition by wrapping all GTT updates in stop_machine and using a flushing read prior to restarting the machine. The stop_machine guarantees no new Aperture accesses can begin while the PTE writes are being emmitted. The flushing read ensures that any following Aperture accesses cannot begin until the PTE writes have been cleared out of the GAM's fifo. Only FOLLOWING Aperture accesses need to be separated from in flight PTE updates. PTE Writes may follow tightly behind already in flight Aperture accesses, so no flushing read is required at the start of a PTE update sequence. This issue was reproduced by running igt/gem_readwrite and igt/gem_render_copy simultaneously from different processes, each in a tight loop, with INTEL_IOMMU enabled. This patch was originally published as: drm/i915: Serialize GTT Updates on BXT v2: Move bxt/iommu detection into static function Remove #ifdef CONFIG_INTEL_IOMMU protection Make function names more reflective of purpose Move flushing read into static function v3: Tidy up for checkpatch.pl Testcase: igt/gem_concurrent_blit Signed-off-by: Jon Bloomfield <jon.bloomfield@intel.com> Cc: John Harrison <john.C.Harrison@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1495641251-30022-1-git-send-email-jon.bloomfield@intel.com Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-05-24 08:54:11 -07:00
{
drm/i915: Use trylock in shrinker for ggtt on bsw vt-d and bxt, v2. The stop_machine() lock may allocate memory, but is called inside vm->mutex, which is taken in the shrinker. This will cause a lockdep splat, as can be seen below: <4>[ 462.585762] ====================================================== <4>[ 462.585768] WARNING: possible circular locking dependency detected <4>[ 462.585773] 5.12.0-rc5-CI-Trybot_7644+ #1 Tainted: G U <4>[ 462.585779] ------------------------------------------------------ <4>[ 462.585783] i915_selftest/5540 is trying to acquire lock: <4>[ 462.585788] ffffffff826440b0 (cpu_hotplug_lock){++++}-{0:0}, at: stop_machine+0x12/0x30 <4>[ 462.585814] but task is already holding lock: <4>[ 462.585818] ffff888125369c70 (&vm->mutex/1){+.+.}-{3:3}, at: i915_vma_pin_ww+0x38e/0xb40 [i915] <4>[ 462.586301] which lock already depends on the new lock. <4>[ 462.586305] the existing dependency chain (in reverse order) is: <4>[ 462.586309] -> #2 (&vm->mutex/1){+.+.}-{3:3}: <4>[ 462.586323] i915_gem_shrinker_taints_mutex+0x2d/0x50 [i915] <4>[ 462.586719] i915_address_space_init+0x12d/0x130 [i915] <4>[ 462.587092] ppgtt_init+0x4e/0x80 [i915] <4>[ 462.587467] gen8_ppgtt_create+0x3e/0x5c0 [i915] <4>[ 462.587828] i915_ppgtt_create+0x28/0xf0 [i915] <4>[ 462.588203] intel_gt_init+0x123/0x370 [i915] <4>[ 462.588572] i915_gem_init+0x129/0x1f0 [i915] <4>[ 462.588971] i915_driver_probe+0x753/0xd80 [i915] <4>[ 462.589320] i915_pci_probe+0x43/0x1d0 [i915] <4>[ 462.589671] pci_device_probe+0x9e/0x110 <4>[ 462.589680] really_probe+0xea/0x410 <4>[ 462.589690] driver_probe_device+0xd9/0x140 <4>[ 462.589697] device_driver_attach+0x4a/0x50 <4>[ 462.589704] __driver_attach+0x83/0x140 <4>[ 462.589711] bus_for_each_dev+0x75/0xc0 <4>[ 462.589718] bus_add_driver+0x14b/0x1f0 <4>[ 462.589724] driver_register+0x66/0xb0 <4>[ 462.589731] i915_init+0x70/0x87 [i915] <4>[ 462.590053] do_one_initcall+0x56/0x2e0 <4>[ 462.590061] do_init_module+0x55/0x200 <4>[ 462.590068] load_module+0x2703/0x2990 <4>[ 462.590074] __do_sys_finit_module+0xad/0x110 <4>[ 462.590080] do_syscall_64+0x33/0x80 <4>[ 462.590089] entry_SYSCALL_64_after_hwframe+0x44/0xae <4>[ 462.590096] -> #1 (fs_reclaim){+.+.}-{0:0}: <4>[ 462.590109] fs_reclaim_acquire+0x9f/0xd0 <4>[ 462.590118] kmem_cache_alloc_trace+0x3d/0x430 <4>[ 462.590126] intel_cpuc_prepare+0x3b/0x1b0 <4>[ 462.590133] cpuhp_invoke_callback+0x9e/0x890 <4>[ 462.590141] _cpu_up+0xa4/0x130 <4>[ 462.590147] cpu_up+0x82/0x90 <4>[ 462.590153] bringup_nonboot_cpus+0x4a/0x60 <4>[ 462.590159] smp_init+0x21/0x5c <4>[ 462.590167] kernel_init_freeable+0x8a/0x1b7 <4>[ 462.590175] kernel_init+0x5/0xff <4>[ 462.590181] ret_from_fork+0x22/0x30 <4>[ 462.590187] -> #0 (cpu_hotplug_lock){++++}-{0:0}: <4>[ 462.590199] __lock_acquire+0x1520/0x2590 <4>[ 462.590207] lock_acquire+0xd1/0x3d0 <4>[ 462.590213] cpus_read_lock+0x39/0xc0 <4>[ 462.590219] stop_machine+0x12/0x30 <4>[ 462.590226] bxt_vtd_ggtt_insert_entries__BKL+0x36/0x50 [i915] <4>[ 462.590601] ggtt_bind_vma+0x5d/0x80 [i915] <4>[ 462.590970] i915_vma_bind+0xdc/0x1c0 [i915] <4>[ 462.591374] i915_vma_pin_ww+0x435/0xb40 [i915] <4>[ 462.591779] make_obj_busy+0xcb/0x330 [i915] <4>[ 462.592170] igt_mmap_offset_exhaustion+0x45f/0x4c0 [i915] <4>[ 462.592562] __i915_subtests.cold.7+0x42/0x92 [i915] <4>[ 462.592995] __run_selftests.part.3+0x10d/0x172 [i915] <4>[ 462.593428] i915_live_selftests.cold.5+0x1f/0x47 [i915] <4>[ 462.593860] i915_pci_probe+0x93/0x1d0 [i915] <4>[ 462.594210] pci_device_probe+0x9e/0x110 <4>[ 462.594217] really_probe+0xea/0x410 <4>[ 462.594226] driver_probe_device+0xd9/0x140 <4>[ 462.594233] device_driver_attach+0x4a/0x50 <4>[ 462.594240] __driver_attach+0x83/0x140 <4>[ 462.594247] bus_for_each_dev+0x75/0xc0 <4>[ 462.594254] bus_add_driver+0x14b/0x1f0 <4>[ 462.594260] driver_register+0x66/0xb0 <4>[ 462.594267] i915_init+0x70/0x87 [i915] <4>[ 462.594586] do_one_initcall+0x56/0x2e0 <4>[ 462.594592] do_init_module+0x55/0x200 <4>[ 462.594599] load_module+0x2703/0x2990 <4>[ 462.594605] __do_sys_finit_module+0xad/0x110 <4>[ 462.594612] do_syscall_64+0x33/0x80 <4>[ 462.594618] entry_SYSCALL_64_after_hwframe+0x44/0xae <4>[ 462.594625] other info that might help us debug this: <4>[ 462.594629] Chain exists of: cpu_hotplug_lock --> fs_reclaim --> &vm->mutex/1 <4>[ 462.594645] Possible unsafe locking scenario: <4>[ 462.594648] CPU0 CPU1 <4>[ 462.594652] ---- ---- <4>[ 462.594655] lock(&vm->mutex/1); <4>[ 462.594664] lock(fs_reclaim); <4>[ 462.594671] lock(&vm->mutex/1); <4>[ 462.594679] lock(cpu_hotplug_lock); <4>[ 462.594686] *** DEADLOCK *** <4>[ 462.594690] 4 locks held by i915_selftest/5540: <4>[ 462.594696] #0: ffff888100fbc240 (&dev->mutex){....}-{3:3}, at: device_driver_attach+0x18/0x50 <4>[ 462.594715] #1: ffffc900006cb9a0 (reservation_ww_class_acquire){+.+.}-{0:0}, at: make_obj_busy+0x81/0x330 [i915] <4>[ 462.595118] #2: ffff88812a6081e8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: make_obj_busy+0x21f/0x330 [i915] <4>[ 462.595519] #3: ffff888125369c70 (&vm->mutex/1){+.+.}-{3:3}, at: i915_vma_pin_ww+0x38e/0xb40 [i915] <4>[ 462.595934] stack backtrace: <4>[ 462.595939] CPU: 0 PID: 5540 Comm: i915_selftest Tainted: G U 5.12.0-rc5-CI-Trybot_7644+ #1 <4>[ 462.595947] Hardware name: GOOGLE Kefka/Kefka, BIOS MrChromebox 02/04/2018 <4>[ 462.595952] Call Trace: <4>[ 462.595961] dump_stack+0x7f/0xad <4>[ 462.595974] check_noncircular+0x12e/0x150 <4>[ 462.595982] ? save_stack.isra.17+0x3f/0x70 <4>[ 462.595991] ? drm_mm_insert_node_in_range+0x34a/0x5b0 <4>[ 462.596000] ? i915_vma_pin_ww+0x9ec/0xb40 [i915] <4>[ 462.596410] __lock_acquire+0x1520/0x2590 <4>[ 462.596419] ? do_init_module+0x55/0x200 <4>[ 462.596429] lock_acquire+0xd1/0x3d0 <4>[ 462.596435] ? stop_machine+0x12/0x30 <4>[ 462.596445] ? gen8_ggtt_insert_entries+0xf0/0xf0 [i915] <4>[ 462.596816] cpus_read_lock+0x39/0xc0 <4>[ 462.596824] ? stop_machine+0x12/0x30 <4>[ 462.596831] stop_machine+0x12/0x30 <4>[ 462.596839] bxt_vtd_ggtt_insert_entries__BKL+0x36/0x50 [i915] <4>[ 462.597210] ggtt_bind_vma+0x5d/0x80 [i915] <4>[ 462.597580] i915_vma_bind+0xdc/0x1c0 [i915] <4>[ 462.597986] i915_vma_pin_ww+0x435/0xb40 [i915] <4>[ 462.598395] ? make_obj_busy+0xcb/0x330 [i915] <4>[ 462.598786] make_obj_busy+0xcb/0x330 [i915] <4>[ 462.599180] ? 0xffffffff81000000 <4>[ 462.599187] ? debug_mutex_unlock+0x50/0xa0 <4>[ 462.599198] igt_mmap_offset_exhaustion+0x45f/0x4c0 [i915] <4>[ 462.599592] __i915_subtests.cold.7+0x42/0x92 [i915] <4>[ 462.600026] ? i915_perf_selftests+0x20/0x20 [i915] <4>[ 462.600422] ? __i915_nop_setup+0x10/0x10 [i915] <4>[ 462.600820] __run_selftests.part.3+0x10d/0x172 [i915] <4>[ 462.601253] i915_live_selftests.cold.5+0x1f/0x47 [i915] <4>[ 462.601686] i915_pci_probe+0x93/0x1d0 [i915] <4>[ 462.602037] ? _raw_spin_unlock_irqrestore+0x3d/0x60 <4>[ 462.602047] pci_device_probe+0x9e/0x110 <4>[ 462.602057] really_probe+0xea/0x410 <4>[ 462.602067] driver_probe_device+0xd9/0x140 <4>[ 462.602075] device_driver_attach+0x4a/0x50 <4>[ 462.602084] __driver_attach+0x83/0x140 <4>[ 462.602091] ? device_driver_attach+0x50/0x50 <4>[ 462.602099] ? device_driver_attach+0x50/0x50 <4>[ 462.602107] bus_for_each_dev+0x75/0xc0 <4>[ 462.602116] bus_add_driver+0x14b/0x1f0 <4>[ 462.602124] driver_register+0x66/0xb0 <4>[ 462.602133] i915_init+0x70/0x87 [i915] <4>[ 462.602453] ? 0xffffffffa0606000 <4>[ 462.602458] do_one_initcall+0x56/0x2e0 <4>[ 462.602466] ? kmem_cache_alloc_trace+0x374/0x430 <4>[ 462.602476] do_init_module+0x55/0x200 <4>[ 462.602484] load_module+0x2703/0x2990 <4>[ 462.602500] ? __do_sys_finit_module+0xad/0x110 <4>[ 462.602507] __do_sys_finit_module+0xad/0x110 <4>[ 462.602519] do_syscall_64+0x33/0x80 <4>[ 462.602527] entry_SYSCALL_64_after_hwframe+0x44/0xae <4>[ 462.602535] RIP: 0033:0x7fab69d8d89d Changes since v1: - Add lockdep annotations during init, to ensure that lockdep is primed. This also fixes a false positive when reading /proc/lockdep_stats during module reload. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210426102351.921874-1-maarten.lankhorst@linux.intel.com Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2021-04-26 12:23:51 +02:00
return IS_CHERRYVIEW(i915) || intel_ggtt_update_needs_vtd_wa(i915);
drm/i915: Serialize GTT/Aperture accesses on BXT BXT has a H/W issue with IOMMU which can lead to system hangs when Aperture accesses are queued within the GAM behind GTT Accesses. This patch avoids the condition by wrapping all GTT updates in stop_machine and using a flushing read prior to restarting the machine. The stop_machine guarantees no new Aperture accesses can begin while the PTE writes are being emmitted. The flushing read ensures that any following Aperture accesses cannot begin until the PTE writes have been cleared out of the GAM's fifo. Only FOLLOWING Aperture accesses need to be separated from in flight PTE updates. PTE Writes may follow tightly behind already in flight Aperture accesses, so no flushing read is required at the start of a PTE update sequence. This issue was reproduced by running igt/gem_readwrite and igt/gem_render_copy simultaneously from different processes, each in a tight loop, with INTEL_IOMMU enabled. This patch was originally published as: drm/i915: Serialize GTT Updates on BXT v2: Move bxt/iommu detection into static function Remove #ifdef CONFIG_INTEL_IOMMU protection Make function names more reflective of purpose Move flushing read into static function v3: Tidy up for checkpatch.pl Testcase: igt/gem_concurrent_blit Signed-off-by: Jon Bloomfield <jon.bloomfield@intel.com> Cc: John Harrison <john.C.Harrison@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1495641251-30022-1-git-send-email-jon.bloomfield@intel.com Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-05-24 08:54:11 -07:00
}
/* i915_getparam.c */
int i915_getparam_ioctl(struct drm_device *dev, void *data,
struct drm_file *file_priv);
/* i915_gem.c */
int i915_gem_init_userptr(struct drm_i915_private *dev_priv);
void i915_gem_cleanup_userptr(struct drm_i915_private *dev_priv);
void i915_gem_init_early(struct drm_i915_private *dev_priv);
void i915_gem_cleanup_early(struct drm_i915_private *dev_priv);
static inline void i915_gem_drain_freed_objects(struct drm_i915_private *i915)
{
/*
* A single pass should suffice to release all the freed objects (along
* most call paths) , but be a little more paranoid in that freeing
* the objects does take a little amount of time, during which the rcu
* callbacks could have added new objects into the freed list, and
* armed the work again.
*/
while (atomic_read(&i915->mm.free_count)) {
flush_work(&i915->mm.free_work);
rcu_barrier();
}
}
static inline void i915_gem_drain_workqueue(struct drm_i915_private *i915)
{
/*
* Similar to objects above (see i915_gem_drain_freed-objects), in
* general we have workers that are armed by RCU and then rearm
* themselves in their callbacks. To be paranoid, we need to
* drain the workqueue a second time after waiting for the RCU
* grace period so that we catch work queued via RCU from the first
* pass. As neither drain_workqueue() nor flush_workqueue() report
* a result, we make an assumption that we only don't require more
* than 3 passes to catch all _recursive_ RCU delayed work.
*
*/
int pass = 3;
do {
flush_workqueue(i915->wq);
rcu_barrier();
i915_gem_drain_freed_objects(i915);
} while (--pass);
drain_workqueue(i915->wq);
}
struct i915_vma * __must_check
i915_gem_object_ggtt_pin_ww(struct drm_i915_gem_object *obj,
struct i915_gem_ww_ctx *ww,
const struct i915_ggtt_view *view,
u64 size, u64 alignment, u64 flags);
static inline struct i915_vma * __must_check
i915_gem_object_ggtt_pin(struct drm_i915_gem_object *obj,
const struct i915_ggtt_view *view,
u64 size, u64 alignment, u64 flags)
{
return i915_gem_object_ggtt_pin_ww(obj, NULL, view, size, alignment, flags);
}
drm/i915: Infrastructure for supporting different GGTT views per object Things like reliable GGTT mappings and mirrored 2d-on-3d display will need to map objects into the same address space multiple times. Added a GGTT view concept and linked it with the VMA to distinguish between multiple instances per address space. New objects and GEM functions which do not take this new view as a parameter assume the default of zero (I915_GGTT_VIEW_NORMAL) which preserves the previous behaviour. This now means that objects can have multiple VMA entries so the code which assumed there will only be one also had to be modified. Alternative GGTT views are supposed to borrow DMA addresses from obj->pages which is DMA mapped on first VMA instantiation and unmapped on the last one going away. v2: * Removed per view special casing in i915_gem_ggtt_prepare / finish_object in favour of creating and destroying DMA mappings on first VMA instantiation and last VMA destruction. (Daniel Vetter) * Simplified i915_vma_unbind which does not need to count the GGTT views. (Daniel Vetter) * Also moved obj->map_and_fenceable reset under the same check. * Checkpatch cleanups. v3: * Only retire objects once the last VMA is unbound. v4: * Keep scatter-gather table for alternative views persistent for the lifetime of the VMA. * Propagate binding errors to callers and handle appropriately. v5: * Explicitly look for normal GGTT view in i915_gem_obj_bound to align usage in i915_gem_object_ggtt_unpin. (Michel Thierry) * Change to single if statement in i915_gem_obj_to_ggtt. (Michel Thierry) * Removed stray semi-colon in i915_gem_object_set_cache_level. For: VIZ-4544 Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Michel Thierry <michel.thierry@intel.com> [danvet: Drop hunk from i915_gem_shrink since it's just prettification but upsets a __must_check warning.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-12-10 17:27:58 +00:00
int i915_gem_object_unbind(struct drm_i915_gem_object *obj,
unsigned long flags);
#define I915_GEM_OBJECT_UNBIND_ACTIVE BIT(0)
drm/i915/gem: Avoid rcu_barrier() from shrinker paths As i915_gem_object_unbind() waits on an rcu_barrier() to flush vm releases (and destruction of their bound vma), we have to be careful not to invoke that barrier from beneath the shrinker: <4> [430.222671] WARNING: possible circular locking dependency detected <4> [430.222673] 5.4.0-rc8-CI-CI_DRM_7508+ #1 Tainted: G U <4> [430.222675] ------------------------------------------------------ <4> [430.222677] gem_pwrite/2317 is trying to acquire lock: <4> [430.222678] ffffffff82248218 (rcu_state.barrier_mutex){+.+.}, at: rcu_barrier+0x23/0x190 <4> [430.222685] but task is already holding lock: <4> [430.222687] ffffffff82263a40 (fs_reclaim){+.+.}, at: fs_reclaim_acquire.part.117+0x0/0x30 <4> [430.222691] which lock already depends on the new lock. <4> [430.222693] the existing dependency chain (in reverse order) is: <4> [430.222695] -> #2 (fs_reclaim){+.+.}: <4> [430.222698] fs_reclaim_acquire.part.117+0x24/0x30 <4> [430.222702] kmem_cache_alloc_trace+0x2a/0x2c0 <4> [430.222705] intel_cpuc_prepare+0x37/0x1a0 <4> [430.222709] cpuhp_invoke_callback+0x9b/0x9d0 <4> [430.222712] _cpu_up+0xa2/0x140 <4> [430.222714] do_cpu_up+0x61/0xa0 <4> [430.222718] smp_init+0x57/0x96 <4> [430.222722] kernel_init_freeable+0xac/0x1c7 <4> [430.222725] kernel_init+0x5/0x100 <4> [430.222728] ret_from_fork+0x24/0x50 <4> [430.222729] -> #1 (cpu_hotplug_lock.rw_sem){++++}: <4> [430.222733] cpus_read_lock+0x34/0xd0 <4> [430.222734] rcu_barrier+0xaa/0x190 <4> [430.222736] kernel_init+0x21/0x100 <4> [430.222737] ret_from_fork+0x24/0x50 <4> [430.222739] -> #0 (rcu_state.barrier_mutex){+.+.}: <4> [430.222742] __lock_acquire+0x1328/0x15d0 <4> [430.222743] lock_acquire+0xa7/0x1c0 <4> [430.222746] __mutex_lock+0x9a/0x9d0 <4> [430.222747] rcu_barrier+0x23/0x190 <4> [430.222850] i915_gem_object_unbind+0x264/0x3d0 [i915] <4> [430.222882] i915_gem_shrink+0x297/0x5f0 [i915] <4> [430.222912] i915_gem_shrink_all+0x38/0x60 [i915] <4> [430.222934] i915_drop_caches_set+0x1f0/0x240 [i915] <4> [430.222938] simple_attr_write+0xb0/0xd0 <4> [430.222941] full_proxy_write+0x51/0x80 <4> [430.222943] vfs_write+0xb9/0x1d0 <4> [430.222944] ksys_write+0x9f/0xe0 <4> [430.222946] do_syscall_64+0x4f/0x210 <4> [430.222948] entry_SYSCALL_64_after_hwframe+0x49/0xbe <4> [430.222950] other info that might help us debug this: <4> [430.222952] Chain exists of: rcu_state.barrier_mutex --> cpu_hotplug_lock.rw_sem --> fs_reclaim <4> [430.222955] Possible unsafe locking scenario: <4> [430.222957] CPU0 CPU1 <4> [430.222958] ---- ---- <4> [430.222960] lock(fs_reclaim); <4> [430.222961] lock(cpu_hotplug_lock.rw_sem); <4> [430.222963] lock(fs_reclaim); <4> [430.222964] lock(rcu_state.barrier_mutex); <4> [430.222966] *** DEADLOCK *** <4> [430.222968] 3 locks held by gem_pwrite/2317: <4> [430.222969] #0: ffff88849e2d9408 (sb_writers#14){.+.+}, at: vfs_write+0x1a4/0x1d0 <4> [430.222973] #1: ffff888496976db0 (&attr->mutex){+.+.}, at: simple_attr_write+0x36/0xd0 <4> [430.222976] #2: ffffffff82263a40 (fs_reclaim){+.+.}, at: fs_reclaim_acquire.part.117+0x0/0x30 <4> [430.222980] stack backtrace: <4> [430.222982] CPU: 1 PID: 2317 Comm: gem_pwrite Tainted: G U 5.4.0-rc8-CI-CI_DRM_7508+ #1 <4> [430.222985] Hardware name: Intel Corporation Tiger Lake Client Platform/TigerLake U DDR4 SODIMM RVP, BIOS TGLSFWI1.R00.2321.A08.1909162051 09/16/2019 <4> [430.222989] Call Trace: <4> [430.222992] dump_stack+0x71/0x9b <4> [430.222995] check_noncircular+0x19b/0x1c0 <4> [430.222998] ? __lock_acquire+0x1328/0x15d0 <4> [430.222999] __lock_acquire+0x1328/0x15d0 <4> [430.223001] ? mark_held_locks+0x49/0x70 <4> [430.223003] lock_acquire+0xa7/0x1c0 <4> [430.223005] ? rcu_barrier+0x23/0x190 <4> [430.223008] __mutex_lock+0x9a/0x9d0 <4> [430.223009] ? rcu_barrier+0x23/0x190 <4> [430.223011] ? rcu_barrier+0x23/0x190 <4> [430.223013] ? find_held_lock+0x2d/0x90 <4> [430.223045] ? i915_gem_object_unbind+0x24a/0x3d0 [i915] <4> [430.223048] ? rcu_barrier+0x23/0x190 <4> [430.223049] rcu_barrier+0x23/0x190 <4> [430.223081] i915_gem_object_unbind+0x264/0x3d0 [i915] <4> [430.223119] i915_gem_shrink+0x297/0x5f0 [i915] Closes: https://gitlab.freedesktop.org/drm/intel/issues/743 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191208161252.3015727-1-chris@chris-wilson.co.uk
2019-12-08 16:12:51 +00:00
#define I915_GEM_OBJECT_UNBIND_BARRIER BIT(1)
#define I915_GEM_OBJECT_UNBIND_TEST BIT(2)
drm/i915: Use trylock in shrinker for ggtt on bsw vt-d and bxt, v2. The stop_machine() lock may allocate memory, but is called inside vm->mutex, which is taken in the shrinker. This will cause a lockdep splat, as can be seen below: <4>[ 462.585762] ====================================================== <4>[ 462.585768] WARNING: possible circular locking dependency detected <4>[ 462.585773] 5.12.0-rc5-CI-Trybot_7644+ #1 Tainted: G U <4>[ 462.585779] ------------------------------------------------------ <4>[ 462.585783] i915_selftest/5540 is trying to acquire lock: <4>[ 462.585788] ffffffff826440b0 (cpu_hotplug_lock){++++}-{0:0}, at: stop_machine+0x12/0x30 <4>[ 462.585814] but task is already holding lock: <4>[ 462.585818] ffff888125369c70 (&vm->mutex/1){+.+.}-{3:3}, at: i915_vma_pin_ww+0x38e/0xb40 [i915] <4>[ 462.586301] which lock already depends on the new lock. <4>[ 462.586305] the existing dependency chain (in reverse order) is: <4>[ 462.586309] -> #2 (&vm->mutex/1){+.+.}-{3:3}: <4>[ 462.586323] i915_gem_shrinker_taints_mutex+0x2d/0x50 [i915] <4>[ 462.586719] i915_address_space_init+0x12d/0x130 [i915] <4>[ 462.587092] ppgtt_init+0x4e/0x80 [i915] <4>[ 462.587467] gen8_ppgtt_create+0x3e/0x5c0 [i915] <4>[ 462.587828] i915_ppgtt_create+0x28/0xf0 [i915] <4>[ 462.588203] intel_gt_init+0x123/0x370 [i915] <4>[ 462.588572] i915_gem_init+0x129/0x1f0 [i915] <4>[ 462.588971] i915_driver_probe+0x753/0xd80 [i915] <4>[ 462.589320] i915_pci_probe+0x43/0x1d0 [i915] <4>[ 462.589671] pci_device_probe+0x9e/0x110 <4>[ 462.589680] really_probe+0xea/0x410 <4>[ 462.589690] driver_probe_device+0xd9/0x140 <4>[ 462.589697] device_driver_attach+0x4a/0x50 <4>[ 462.589704] __driver_attach+0x83/0x140 <4>[ 462.589711] bus_for_each_dev+0x75/0xc0 <4>[ 462.589718] bus_add_driver+0x14b/0x1f0 <4>[ 462.589724] driver_register+0x66/0xb0 <4>[ 462.589731] i915_init+0x70/0x87 [i915] <4>[ 462.590053] do_one_initcall+0x56/0x2e0 <4>[ 462.590061] do_init_module+0x55/0x200 <4>[ 462.590068] load_module+0x2703/0x2990 <4>[ 462.590074] __do_sys_finit_module+0xad/0x110 <4>[ 462.590080] do_syscall_64+0x33/0x80 <4>[ 462.590089] entry_SYSCALL_64_after_hwframe+0x44/0xae <4>[ 462.590096] -> #1 (fs_reclaim){+.+.}-{0:0}: <4>[ 462.590109] fs_reclaim_acquire+0x9f/0xd0 <4>[ 462.590118] kmem_cache_alloc_trace+0x3d/0x430 <4>[ 462.590126] intel_cpuc_prepare+0x3b/0x1b0 <4>[ 462.590133] cpuhp_invoke_callback+0x9e/0x890 <4>[ 462.590141] _cpu_up+0xa4/0x130 <4>[ 462.590147] cpu_up+0x82/0x90 <4>[ 462.590153] bringup_nonboot_cpus+0x4a/0x60 <4>[ 462.590159] smp_init+0x21/0x5c <4>[ 462.590167] kernel_init_freeable+0x8a/0x1b7 <4>[ 462.590175] kernel_init+0x5/0xff <4>[ 462.590181] ret_from_fork+0x22/0x30 <4>[ 462.590187] -> #0 (cpu_hotplug_lock){++++}-{0:0}: <4>[ 462.590199] __lock_acquire+0x1520/0x2590 <4>[ 462.590207] lock_acquire+0xd1/0x3d0 <4>[ 462.590213] cpus_read_lock+0x39/0xc0 <4>[ 462.590219] stop_machine+0x12/0x30 <4>[ 462.590226] bxt_vtd_ggtt_insert_entries__BKL+0x36/0x50 [i915] <4>[ 462.590601] ggtt_bind_vma+0x5d/0x80 [i915] <4>[ 462.590970] i915_vma_bind+0xdc/0x1c0 [i915] <4>[ 462.591374] i915_vma_pin_ww+0x435/0xb40 [i915] <4>[ 462.591779] make_obj_busy+0xcb/0x330 [i915] <4>[ 462.592170] igt_mmap_offset_exhaustion+0x45f/0x4c0 [i915] <4>[ 462.592562] __i915_subtests.cold.7+0x42/0x92 [i915] <4>[ 462.592995] __run_selftests.part.3+0x10d/0x172 [i915] <4>[ 462.593428] i915_live_selftests.cold.5+0x1f/0x47 [i915] <4>[ 462.593860] i915_pci_probe+0x93/0x1d0 [i915] <4>[ 462.594210] pci_device_probe+0x9e/0x110 <4>[ 462.594217] really_probe+0xea/0x410 <4>[ 462.594226] driver_probe_device+0xd9/0x140 <4>[ 462.594233] device_driver_attach+0x4a/0x50 <4>[ 462.594240] __driver_attach+0x83/0x140 <4>[ 462.594247] bus_for_each_dev+0x75/0xc0 <4>[ 462.594254] bus_add_driver+0x14b/0x1f0 <4>[ 462.594260] driver_register+0x66/0xb0 <4>[ 462.594267] i915_init+0x70/0x87 [i915] <4>[ 462.594586] do_one_initcall+0x56/0x2e0 <4>[ 462.594592] do_init_module+0x55/0x200 <4>[ 462.594599] load_module+0x2703/0x2990 <4>[ 462.594605] __do_sys_finit_module+0xad/0x110 <4>[ 462.594612] do_syscall_64+0x33/0x80 <4>[ 462.594618] entry_SYSCALL_64_after_hwframe+0x44/0xae <4>[ 462.594625] other info that might help us debug this: <4>[ 462.594629] Chain exists of: cpu_hotplug_lock --> fs_reclaim --> &vm->mutex/1 <4>[ 462.594645] Possible unsafe locking scenario: <4>[ 462.594648] CPU0 CPU1 <4>[ 462.594652] ---- ---- <4>[ 462.594655] lock(&vm->mutex/1); <4>[ 462.594664] lock(fs_reclaim); <4>[ 462.594671] lock(&vm->mutex/1); <4>[ 462.594679] lock(cpu_hotplug_lock); <4>[ 462.594686] *** DEADLOCK *** <4>[ 462.594690] 4 locks held by i915_selftest/5540: <4>[ 462.594696] #0: ffff888100fbc240 (&dev->mutex){....}-{3:3}, at: device_driver_attach+0x18/0x50 <4>[ 462.594715] #1: ffffc900006cb9a0 (reservation_ww_class_acquire){+.+.}-{0:0}, at: make_obj_busy+0x81/0x330 [i915] <4>[ 462.595118] #2: ffff88812a6081e8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: make_obj_busy+0x21f/0x330 [i915] <4>[ 462.595519] #3: ffff888125369c70 (&vm->mutex/1){+.+.}-{3:3}, at: i915_vma_pin_ww+0x38e/0xb40 [i915] <4>[ 462.595934] stack backtrace: <4>[ 462.595939] CPU: 0 PID: 5540 Comm: i915_selftest Tainted: G U 5.12.0-rc5-CI-Trybot_7644+ #1 <4>[ 462.595947] Hardware name: GOOGLE Kefka/Kefka, BIOS MrChromebox 02/04/2018 <4>[ 462.595952] Call Trace: <4>[ 462.595961] dump_stack+0x7f/0xad <4>[ 462.595974] check_noncircular+0x12e/0x150 <4>[ 462.595982] ? save_stack.isra.17+0x3f/0x70 <4>[ 462.595991] ? drm_mm_insert_node_in_range+0x34a/0x5b0 <4>[ 462.596000] ? i915_vma_pin_ww+0x9ec/0xb40 [i915] <4>[ 462.596410] __lock_acquire+0x1520/0x2590 <4>[ 462.596419] ? do_init_module+0x55/0x200 <4>[ 462.596429] lock_acquire+0xd1/0x3d0 <4>[ 462.596435] ? stop_machine+0x12/0x30 <4>[ 462.596445] ? gen8_ggtt_insert_entries+0xf0/0xf0 [i915] <4>[ 462.596816] cpus_read_lock+0x39/0xc0 <4>[ 462.596824] ? stop_machine+0x12/0x30 <4>[ 462.596831] stop_machine+0x12/0x30 <4>[ 462.596839] bxt_vtd_ggtt_insert_entries__BKL+0x36/0x50 [i915] <4>[ 462.597210] ggtt_bind_vma+0x5d/0x80 [i915] <4>[ 462.597580] i915_vma_bind+0xdc/0x1c0 [i915] <4>[ 462.597986] i915_vma_pin_ww+0x435/0xb40 [i915] <4>[ 462.598395] ? make_obj_busy+0xcb/0x330 [i915] <4>[ 462.598786] make_obj_busy+0xcb/0x330 [i915] <4>[ 462.599180] ? 0xffffffff81000000 <4>[ 462.599187] ? debug_mutex_unlock+0x50/0xa0 <4>[ 462.599198] igt_mmap_offset_exhaustion+0x45f/0x4c0 [i915] <4>[ 462.599592] __i915_subtests.cold.7+0x42/0x92 [i915] <4>[ 462.600026] ? i915_perf_selftests+0x20/0x20 [i915] <4>[ 462.600422] ? __i915_nop_setup+0x10/0x10 [i915] <4>[ 462.600820] __run_selftests.part.3+0x10d/0x172 [i915] <4>[ 462.601253] i915_live_selftests.cold.5+0x1f/0x47 [i915] <4>[ 462.601686] i915_pci_probe+0x93/0x1d0 [i915] <4>[ 462.602037] ? _raw_spin_unlock_irqrestore+0x3d/0x60 <4>[ 462.602047] pci_device_probe+0x9e/0x110 <4>[ 462.602057] really_probe+0xea/0x410 <4>[ 462.602067] driver_probe_device+0xd9/0x140 <4>[ 462.602075] device_driver_attach+0x4a/0x50 <4>[ 462.602084] __driver_attach+0x83/0x140 <4>[ 462.602091] ? device_driver_attach+0x50/0x50 <4>[ 462.602099] ? device_driver_attach+0x50/0x50 <4>[ 462.602107] bus_for_each_dev+0x75/0xc0 <4>[ 462.602116] bus_add_driver+0x14b/0x1f0 <4>[ 462.602124] driver_register+0x66/0xb0 <4>[ 462.602133] i915_init+0x70/0x87 [i915] <4>[ 462.602453] ? 0xffffffffa0606000 <4>[ 462.602458] do_one_initcall+0x56/0x2e0 <4>[ 462.602466] ? kmem_cache_alloc_trace+0x374/0x430 <4>[ 462.602476] do_init_module+0x55/0x200 <4>[ 462.602484] load_module+0x2703/0x2990 <4>[ 462.602500] ? __do_sys_finit_module+0xad/0x110 <4>[ 462.602507] __do_sys_finit_module+0xad/0x110 <4>[ 462.602519] do_syscall_64+0x33/0x80 <4>[ 462.602527] entry_SYSCALL_64_after_hwframe+0x44/0xae <4>[ 462.602535] RIP: 0033:0x7fab69d8d89d Changes since v1: - Add lockdep annotations during init, to ensure that lockdep is primed. This also fixes a false positive when reading /proc/lockdep_stats during module reload. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210426102351.921874-1-maarten.lankhorst@linux.intel.com Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2021-04-26 12:23:51 +02:00
#define I915_GEM_OBJECT_UNBIND_VM_TRYLOCK BIT(3)
void i915_gem_runtime_suspend(struct drm_i915_private *dev_priv);
int i915_gem_dumb_create(struct drm_file *file_priv,
struct drm_device *dev,
struct drm_mode_create_dumb *args);
int __must_check i915_gem_set_global_seqno(struct drm_device *dev, u32 seqno);
static inline u32 i915_reset_count(struct i915_gpu_error *error)
{
return atomic_read(&error->reset_count);
drm/i915: clear up wedged transitions We have two important transitions of the wedged state in the current code: - 0 -> 1: This means a hang has been detected, and signals to everyone that they please get of any locks, so that the reset work item can do its job. - 1 -> 0: The reset handler has completed. Now the last transition mixes up two states: "Reset completed and successful" and "Reset failed". To distinguish these two we do some tricks with the reset completion, but I simply could not convince myself that this doesn't race under odd circumstances. Hence split this up, and add a new terminal state indicating that the hw is gone for good. Also add explicit #defines for both states, update comments. v2: Split out the reset handling bugfix for the throttle ioctl. v3: s/tmp/wedged/ sugested by Chris Wilson. Also fixup up a rebase error which prevented this patch from actually compiling. v4: To unify the wedged state with the reset counter, keep the reset-in-progress state just as a flag. The terminally-wedged state is now denoted with a big number. v5: Add a comment to the reset_counter special values explaining that WEDGED & RESET_IN_PROGRESS needs to be true for the code to be correct. v6: Fixup logic errors introduced with the wedged+reset_counter unification. Since WEDGED implies reset-in-progress (in a way we're terminally stuck in the dead-but-reset-not-completed state), we need ensure that we check for this everywhere. The specific bug was in wait_for_error, which would simply have timed out. v7: Extract an inline i915_reset_in_progress helper to make the code more readable. Also annote the reset-in-progress case with an unlikely, to help the compiler optimize the fastpath. Do the same for the terminally wedged case with i915_terminally_wedged. Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-11-15 17:17:22 +01:00
}
drm/i915: Record the tail at each request and use it to estimate the head By recording the location of every request in the ringbuffer, we know that in order to retire the request the GPU must have finished reading it and so the GPU head is now beyond the tail of the request. We can therefore provide a conservative estimate of where the GPU is reading from in order to avoid having to read back the ring buffer registers when polling for space upon starting a new write into the ringbuffer. A secondary effect is that this allows us to convert intel_ring_buffer_wait() to use i915_wait_request() and so consolidate upon the single function to handle the complicated task of waiting upon the GPU. A necessary precaution is that we need to make that wait uninterruptible to match the existing conditions as all the callers of intel_ring_begin() have not been audited to handle ERESTARTSYS correctly. By using a conservative estimate for the head, and always processing all outstanding requests first, we prevent a race condition between using the estimate and direct reads of I915_RING_HEAD which could result in the value of the head going backwards, and the tail overflowing once again. We are also careful to mark any request that we skip over in order to free space in ring as consumed which provides a self-consistency check. Given sufficient abuse, such as a set of unthrottled GPU bound cairo-traces, avoiding the use of I915_RING_HEAD gives a 10-20% boost on Sandy Bridge (i5-2520m): firefox-paintball 18927ms -> 15646ms: 1.21x speedup firefox-fishtank 12563ms -> 11278ms: 1.11x speedup which is a mild consolation for the performance those traces achieved from exploiting the buggy autoreported head. v2: Add a few more comments and make request->tail a conservative estimate as suggested by Daniel Vetter. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> [danvet: resolve conflicts with retirement defering and the lack of the autoreport head removal (that will go in through -fixes).] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-02-15 11:25:36 +00:00
static inline u32 i915_reset_engine_count(struct i915_gpu_error *error,
const struct intel_engine_cs *engine)
{
return atomic_read(&error->reset_engine_count[engine->uabi_class]);
}
int __must_check i915_gem_init(struct drm_i915_private *dev_priv);
void i915_gem_driver_register(struct drm_i915_private *i915);
void i915_gem_driver_unregister(struct drm_i915_private *i915);
void i915_gem_driver_remove(struct drm_i915_private *dev_priv);
void i915_gem_driver_release(struct drm_i915_private *dev_priv);
void i915_gem_suspend(struct drm_i915_private *dev_priv);
void i915_gem_suspend_late(struct drm_i915_private *dev_priv);
void i915_gem_resume(struct drm_i915_private *dev_priv);
int i915_gem_open(struct drm_i915_private *i915, struct drm_file *file);
int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
enum i915_cache_level cache_level);
i915: add dmabuf/prime buffer sharing support. This adds handle->fd and fd->handle support to i915, this is to allow for offloading of rendering in one direction and outputs in the other. v2 from Daniel Vetter: - fixup conflicts with the prepare/finish gtt prep work. - implement ppgtt binding support. Note that we have squat i-g-t testcoverage for any of the lifetime and access rules dma_buf/prime support brings along. And there are quite a few intricate situations here. Also note that the integration with the existing code is a bit hackish, especially around get_gtt_pages and put_gtt_pages. It imo would be easier with the prep code from Chris Wilson's unbound series, but that is for 3.6. Also note that I didn't bother to put the new prepare/finish gtt hooks to good use by moving the dma_buf_map/unmap_attachment calls in there (like we've originally planned for). Last but not least this patch is only compile-tested, but I've changed very little compared to Dave Airlie's version. So there's a decent chance v2 on drm-next works as well as v1 on 3.4-rc. v3: Right when I've hit sent I've noticed that I've screwed up one obj->sg_list (for dmar support) and obj->sg_table (for prime support) disdinction. We should be able to merge these 2 paths, but that's material for another patch. v4: fix the error reporting bugs pointed out by ickle. v5: fix another error, and stop non-gtt mmaps on shared objects stop pread/pwrite on imported objects, add fake kmap Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-05-10 15:25:09 +02:00
struct drm_gem_object *i915_gem_prime_import(struct drm_device *dev,
struct dma_buf *dma_buf);
drm/prime: Align gem_prime_export with obj_funcs.export The idea is that gem_prime_export is deprecated in favor of obj_funcs.export. That's much easier to do if both have matching function signatures. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maxime Ripard <maxime.ripard@bootlin.com> Cc: Sean Paul <sean@poorly.run> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Zhi Wang <zhi.a.wang@intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: "David (ChunMing) Zhou" <David1.Zhou@amd.com> Cc: Thierry Reding <thierry.reding@gmail.com> Cc: Jonathan Hunter <jonathanh@nvidia.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Eric Anholt <eric@anholt.net> Cc: "Michel Dänzer" <michel.daenzer@amd.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Huang Rui <ray.huang@amd.com> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Cc: Feifei Xu <Feifei.Xu@amd.com> Cc: Jim Qu <Jim.Qu@amd.com> Cc: Evan Quan <evan.quan@amd.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Thomas Zimmermann <tdz@users.sourceforge.net> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Jilayne Lovejoy <opensource@jilayne.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Mikulas Patocka <mpatocka@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Junwei Zhang <Jerry.Zhang@amd.com> Cc: intel-gvt-dev@lists.freedesktop.org Cc: intel-gfx@lists.freedesktop.org Cc: amd-gfx@lists.freedesktop.org Cc: linux-tegra@vger.kernel.org Link: https://patchwork.freedesktop.org/patch/msgid/20190614203615.12639-10-daniel.vetter@ffwll.ch
2019-06-14 22:35:25 +02:00
struct dma_buf *i915_gem_prime_export(struct drm_gem_object *gem_obj, int flags);
i915: add dmabuf/prime buffer sharing support. This adds handle->fd and fd->handle support to i915, this is to allow for offloading of rendering in one direction and outputs in the other. v2 from Daniel Vetter: - fixup conflicts with the prepare/finish gtt prep work. - implement ppgtt binding support. Note that we have squat i-g-t testcoverage for any of the lifetime and access rules dma_buf/prime support brings along. And there are quite a few intricate situations here. Also note that the integration with the existing code is a bit hackish, especially around get_gtt_pages and put_gtt_pages. It imo would be easier with the prep code from Chris Wilson's unbound series, but that is for 3.6. Also note that I didn't bother to put the new prepare/finish gtt hooks to good use by moving the dma_buf_map/unmap_attachment calls in there (like we've originally planned for). Last but not least this patch is only compile-tested, but I've changed very little compared to Dave Airlie's version. So there's a decent chance v2 on drm-next works as well as v1 on 3.4-rc. v3: Right when I've hit sent I've noticed that I've screwed up one obj->sg_list (for dmar support) and obj->sg_table (for prime support) disdinction. We should be able to merge these 2 paths, but that's material for another patch. v4: fix the error reporting bugs pointed out by ickle. v5: fix another error, and stop non-gtt mmaps on shared objects stop pread/pwrite on imported objects, add fake kmap Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-05-10 15:25:09 +02:00
static inline struct i915_address_space *
i915_gem_vm_lookup(struct drm_i915_file_private *file_priv, u32 id)
{
struct i915_address_space *vm;
xa_lock(&file_priv->vm_xa);
vm = xa_load(&file_priv->vm_xa, id);
if (vm)
kref_get(&vm->ref);
xa_unlock(&file_priv->vm_xa);
return vm;
}
/* i915_gem_evict.c */
int __must_check i915_gem_evict_something(struct i915_address_space *vm,
u64 min_size, u64 alignment,
unsigned long color,
u64 start, u64 end,
unsigned flags);
int __must_check i915_gem_evict_for_node(struct i915_address_space *vm,
struct drm_mm_node *node,
unsigned int flags);
drm/i915: Eliminate lots of iterations over the execobjects array The major scaling bottleneck in execbuffer is the processing of the execobjects. Creating an auxiliary list is inefficient when compared to using the execobject array we already have allocated. Reservation is then split into phases. As we lookup up the VMA, we try and bind it back into active location. Only if that fails, do we add it to the unbound list for phase 2. In phase 2, we try and add all those objects that could not fit into their previous location, with fallback to retrying all objects and evicting the VM in case of severe fragmentation. (This is the same as before, except that phase 1 is now done inline with looking up the VMA to avoid an iteration over the execobject array. In the ideal case, we eliminate the separate reservation phase). During the reservation phase, we only evict from the VM between passes (rather than currently as we try to fit every new VMA). In testing with Unreal Engine's Atlantis demo which stresses the eviction logic on gen7 class hardware, this speed up the framerate by a factor of 2. The second loop amalgamation is between move_to_gpu and move_to_active. As we always submit the request, even if incomplete, we can use the current request to track active VMA as we perform the flushes and synchronisation required. The next big advancement is to avoid copying back to the user any execobjects and relocations that are not changed. v2: Add a Theory of Operation spiel. v3: Fall back to slow relocations in preparation for flushing userptrs. v4: Document struct members, factor out eb_validate_vma(), add a few more comments to explain some magic and hide other magic behind macros. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-06-16 15:05:19 +01:00
int i915_gem_evict_vm(struct i915_address_space *vm);
drm/i915: Introduce an internal allocator for disposable private objects Quite a few of our objects used for internal hardware programming do not benefit from being swappable or from being zero initialised. As such they do not benefit from using a shmemfs backing storage and since they are internal and never directly exposed to the user, we do not need to worry about providing a filp. For these we can use an drm_i915_gem_object wrapper around a sg_table of plain struct page. They are not swap backed and not automatically pinned. If they are reaped by the shrinker, the pages are released and the contents discarded. For the internal use case, this is fine as for example, ringbuffers are pinned from being written by a request to be read by the hardware. Once they are idle, they can be discarded entirely. As such they are a good match for execlist ringbuffers and a small variety of other internal objects. In the first iteration, this is limited to the scratch batch buffers we use (for command parsing and state initialisation). v2: Allocate physically contiguous pages, where possible. v3: Reduce maximum order on subsequent requests following an allocation failure. v4: Fix up mismatch between swiotlb segment size and page count (it counts in 2k units, not 4k pages) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-7-chris@chris-wilson.co.uk
2016-10-28 13:58:30 +01:00
/* i915_gem_internal.c */
struct drm_i915_gem_object *
i915_gem_object_create_internal(struct drm_i915_private *dev_priv,
phys_addr_t size);
drm/i915: Introduce an internal allocator for disposable private objects Quite a few of our objects used for internal hardware programming do not benefit from being swappable or from being zero initialised. As such they do not benefit from using a shmemfs backing storage and since they are internal and never directly exposed to the user, we do not need to worry about providing a filp. For these we can use an drm_i915_gem_object wrapper around a sg_table of plain struct page. They are not swap backed and not automatically pinned. If they are reaped by the shrinker, the pages are released and the contents discarded. For the internal use case, this is fine as for example, ringbuffers are pinned from being written by a request to be read by the hardware. Once they are idle, they can be discarded entirely. As such they are a good match for execlist ringbuffers and a small variety of other internal objects. In the first iteration, this is limited to the scratch batch buffers we use (for command parsing and state initialisation). v2: Allocate physically contiguous pages, where possible. v3: Reduce maximum order on subsequent requests following an allocation failure. v4: Fix up mismatch between swiotlb segment size and page count (it counts in 2k units, not 4k pages) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-7-chris@chris-wilson.co.uk
2016-10-28 13:58:30 +01:00
/* i915_gem_tiling.c */
static inline bool i915_gem_object_needs_bit17_swizzle(struct drm_i915_gem_object *obj)
{
struct drm_i915_private *i915 = to_i915(obj->base.dev);
return i915->ggtt.bit_6_swizzle_x == I915_BIT_6_SWIZZLE_9_10_17 &&
i915_gem_object_is_tiled(obj);
}
u32 i915_gem_fence_size(struct drm_i915_private *dev_priv, u32 size,
unsigned int tiling, unsigned int stride);
u32 i915_gem_fence_alignment(struct drm_i915_private *dev_priv, u32 size,
unsigned int tiling, unsigned int stride);
const char *i915_cache_level_str(struct drm_i915_private *i915, int type);
/* i915_cmd_parser.c */
int i915_cmd_parser_get_version(struct drm_i915_private *dev_priv);
int intel_engine_init_cmd_parser(struct intel_engine_cs *engine);
void intel_engine_cleanup_cmd_parser(struct intel_engine_cs *engine);
int intel_engine_cmd_parser(struct intel_engine_cs *engine,
struct i915_vma *batch,
unsigned long batch_offset,
unsigned long batch_length,
drm/i915/gem: Prepare gen7 cmdparser for async execution The gen7 cmdparser is primarily a promotion-based system to allow access to additional registers beyond the HW validation, and allows fallback to normal execution of the user batch buffer if valid and requires chaining. In the next patch, we will do the cmdparser validation in the pipeline asynchronously and so at the point of request construction we will not know if we want to execute the privileged and validated batch, or the original user batch. The solution employed here is to execute both batches, one with raised privileges and one as normal. This is because the gen7 MI_BATCH_BUFFER_START command cannot change privilege level within a batch and must strictly use the current privilege level (or undefined behaviour kills the GPU). So in order to execute the original batch, we need a second non-priviledged batch buffer chain from the ring, i.e. we need to emit two batches for each user batch. Inside the two batches we determine which one should actually execute, we provide a conditional trampoline to call the original batch. Implementation-wise, we create a single buffer and write the shadow and the trampoline inside it at different offsets; and bind the buffer into both the kernel GGTT for the privileged execution of the shadow and into the user ppGTT for the non-privileged execution of the trampoline and original batch. One buffer, two batches and two vma. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191211230858.599030-1-chris@chris-wilson.co.uk
2019-12-11 23:08:56 +00:00
struct i915_vma *shadow,
drm/i915: Revert "drm/i915/gem: Asynchronous cmdparser" This reverts 686c7c35abc2 ("drm/i915/gem: Asynchronous cmdparser"). The justification for this commit in the git history was a vague comment about getting it out from under the struct_mutex. While this may improve perf for some workloads on Gen7 platforms where we rely on the command parser for features such as indirect rendering, no numbers were provided to prove such an improvement. It claims to closed two gitlab/bugzilla issues but with no explanation whatsoever as to why or what bug it's fixing. Meanwhile, by moving command parsing off to an async callback, it leaves us with a problem of what to do on error. When things were synchronous, EXECBUFFER2 would fail with an error code if parsing failed. When moving it to async, we needed another way to handle that error and the solution employed was to set an error on the dma_fence and then trust that said error gets propagated to the client eventually. Moving back to synchronous will help us untangle the fence error propagation mess. This also reverts most of 0edbb9ba1bfe ("drm/i915: Move cmd parser pinning to execbuffer") which is a refactor of some of our allocation paths for asynchronous parsing. Now that everything is synchronous, we don't need it. v2 (Daniel Vetter): - Add stabel Cc and Fixes tag Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Cc: <stable@vger.kernel.org> # v5.6+ Fixes: 9e31c1fe45d5 ("drm/i915: Propagate errors on awaiting already signaled fences") Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Jon Bloomfield <jon.bloomfield@intel.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210714193419.1459723-2-jason@jlekstrand.net (cherry picked from commit 93b713304188844b8514074dc13ffd56d12235d3) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2021-07-14 14:34:15 -05:00
bool trampoline);
drm/i915/gem: Prepare gen7 cmdparser for async execution The gen7 cmdparser is primarily a promotion-based system to allow access to additional registers beyond the HW validation, and allows fallback to normal execution of the user batch buffer if valid and requires chaining. In the next patch, we will do the cmdparser validation in the pipeline asynchronously and so at the point of request construction we will not know if we want to execute the privileged and validated batch, or the original user batch. The solution employed here is to execute both batches, one with raised privileges and one as normal. This is because the gen7 MI_BATCH_BUFFER_START command cannot change privilege level within a batch and must strictly use the current privilege level (or undefined behaviour kills the GPU). So in order to execute the original batch, we need a second non-priviledged batch buffer chain from the ring, i.e. we need to emit two batches for each user batch. Inside the two batches we determine which one should actually execute, we provide a conditional trampoline to call the original batch. Implementation-wise, we create a single buffer and write the shadow and the trampoline inside it at different offsets; and bind the buffer into both the kernel GGTT for the privileged execution of the shadow and into the user ppGTT for the non-privileged execution of the trampoline and original batch. One buffer, two batches and two vma. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191211230858.599030-1-chris@chris-wilson.co.uk
2019-12-11 23:08:56 +00:00
#define I915_CMD_PARSER_TRAMPOLINE_SIZE 8
/* intel_device_info.c */
static inline struct intel_device_info *
mkwrite_device_info(struct drm_i915_private *dev_priv)
{
return (struct intel_device_info *)INTEL_INFO(dev_priv);
}
int i915_reg_read_ioctl(struct drm_device *dev, void *data,
struct drm_file *file);
drm/i915/execlists: Read the context-status HEAD from the HWSP The engine also provides a mirror of the CSB write pointer in the HWSP, but not of our read pointer. To take advantage of this we need to remember where we read up to on the last interrupt and continue off from there. This poses a problem following a reset, as we don't know where the hw will start writing from, and due to the use of power contexts we cannot perform that query during the reset itself. So we continue the current modus operandi of delaying the first read of the context-status read/write pointers until after the first interrupt. With this we should now have eliminated all uncached mmio reads in handling the context-status interrupt, though we still have the uncached mmio writes for submitting new work, and many uncached mmio reads in the global interrupt handler itself. Still a step in the right direction towards reducing our resubmit latency, although it appears lost in the noise! v2: Cannonlake moved the CSB write index v3: Include the sw/hwsp state in debugfs/i915_engine_info v4: Also revert to using CSB mmio for GVT-g v5: Prevent the compiler reloading tail (Mika) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michel Thierry <michel.thierry@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Zhi Wang <zhi.a.wang@intel.com> Acked-by: Michel Thierry <michel.thierry@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170913085605.18299-6-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-09-13 09:56:05 +01:00
static inline int intel_hws_csb_write_index(struct drm_i915_private *i915)
{
if (GRAPHICS_VER(i915) >= 11)
return ICL_HWS_CSB_WRITE_INDEX;
drm/i915/execlists: Read the context-status HEAD from the HWSP The engine also provides a mirror of the CSB write pointer in the HWSP, but not of our read pointer. To take advantage of this we need to remember where we read up to on the last interrupt and continue off from there. This poses a problem following a reset, as we don't know where the hw will start writing from, and due to the use of power contexts we cannot perform that query during the reset itself. So we continue the current modus operandi of delaying the first read of the context-status read/write pointers until after the first interrupt. With this we should now have eliminated all uncached mmio reads in handling the context-status interrupt, though we still have the uncached mmio writes for submitting new work, and many uncached mmio reads in the global interrupt handler itself. Still a step in the right direction towards reducing our resubmit latency, although it appears lost in the noise! v2: Cannonlake moved the CSB write index v3: Include the sw/hwsp state in debugfs/i915_engine_info v4: Also revert to using CSB mmio for GVT-g v5: Prevent the compiler reloading tail (Mika) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michel Thierry <michel.thierry@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Zhi Wang <zhi.a.wang@intel.com> Acked-by: Michel Thierry <michel.thierry@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170913085605.18299-6-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-09-13 09:56:05 +01:00
else
return I915_HWS_CSB_WRITE_INDEX;
}
static inline enum i915_map_type
i915_coherent_map_type(struct drm_i915_private *i915,
struct drm_i915_gem_object *obj, bool always_coherent)
{
if (i915_gem_object_is_lmem(obj))
return I915_MAP_WC;
if (HAS_LLC(i915) || always_coherent)
return I915_MAP_WB;
else
return I915_MAP_WC;
}
#endif