iv/linux

Author	SHA1	Message	Date
Rodrigo Vivi	8ed9aaae39	drm/xe: Force wedged state and block GT reset upon any GPU hang In many validation situations when debugging GPU Hangs, it is useful to preserve the GT situation from the moment that the timeout occurred. This patch introduces a module parameter that could be used on situations like this. If xe.wedged module parameter is set to 2, Xe will be declared wedged on every single execution timeout (a.k.a. GPU hang) right after devcoredump snapshot capture and without attempting any kind of GT reset and blocking entirely any kind of execution. v2: Really block gt_reset from guc side. (Lucas) s/wedged/busted (Lucas) v3: - s/busted/wedged - Really use global_flags (Dafna) - More robust timeout handling when wedging it. v4: A really robust clean exit done by Matt Brost. No more kernel warns on unbind. v5: Simplify error message (Lucas) Cc: Matthew Brost <matthew.brost@intel.com> Cc: Dafna Hirschfeld <dhirschfeld@habana.ai> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Cc: Himanshu Somaiya <himanshu.somaiya@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240423221817.1285081-3-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-04-24 12:12:58 -04:00
Rodrigo Vivi	692818678e	drm/xe: declare wedged upon GuC load failure Let's block the device upon any GuC load failure. But let's continue with the probe so guc logs can be read from the debugfs. v2: - s/wedged/busted - do not block probe or we lose guc_logs in debugfs (Matt) v3: - s/busted/wedged v4: Do not change __xe_guc_upload return. (Himal) Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240423221817.1285081-2-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-04-24 12:12:58 -04:00
Rodrigo Vivi	fb74b205cd	drm/xe: Introduce a simple wedged state Introduce a very simple 'wedged' state where any attempt to access the GPU is entirely blocked. On some critical cases, like on gt_reset failure, we need to block any other attempt to use the GPU. Otherwise we are at a risk of reaching cases that would force us to reboot the machine. So, when these cases are identified we corner and block any GPU access. No IOCTL and not even another GT reset should be attempted. The 'wedged' state in Xe is an end state with no way back. Only a device "re-probe" (unbind + bind) can restore the GPU access. v2: - s/wedged/busted (Lucas) - use unbind+bind instead of module reload (Lucas) - added more info on unbind operations and instruction on bug report - only print the message once. v3: - s/busted/wedged (Ashutosh, Tvrtko, Thomas) - don't assume user has sudo and tee available (Lucas) v4: - remove unnecessary cases around ct communication or migration. Cc: Ashutosh Dixit <ashutosh.dixit@intel.com> Cc: Tvrtko Ursulin <tursulin@ursulin.net> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Anshuman Gupta <anshuman.gupta@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> #v2 Link: https://patchwork.freedesktop.org/patch/msgid/20240423221817.1285081-1-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-04-24 12:12:58 -04:00
José Roberto de Souza	c8d4524ecc	drm/xe: Add INSTDONE registers to devcoredump These registers contain important information that can help with debug of GPU hangs. While at it also fixing the double line jump at the end of engine registers for CCS engines. v2: - print other INSTDONE registers v3: - add for_each_geometry/compute_dss() v4: - print one slice_common_instdone per glice in DG2+ v5: - rename registers prefix from DG2 to XEHPG (Zhanjun) Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Zhanjun Dong <zhanjun.dong@intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240424140319.61651-3-jose.souza@intel.com	2024-04-24 09:06:39 -07:00
José Roberto de Souza	082a634f60	drm/xe: Add helpers to loop over geometry and compute DSS Some DSS can only be available for geometry while others can only be available for compute. So here adding helpers to loop only available DSS for given usage. User of this helper will come in the next patch. v2: - drop has_dss() Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Zhanjun Dong <zhanjun.dong@intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240424140319.61651-2-jose.souza@intel.com	2024-04-24 09:06:38 -07:00
José Roberto de Souza	f332625733	drm/xe: Store xe_hw_engine in xe_hw_engine_snapshot A future patch will require gt and xe device structs, so here replacing class by hwe. Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Zhanjun Dong <zhanjun.dong@intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240424140319.61651-1-jose.souza@intel.com	2024-04-24 09:06:37 -07:00
Michal Wajdeczko	49f853c78e	drm/xe/pf: Clamp maximum execution quantum to 100s GuC is silently clamping values of the execution quantum and preemption timeout KLVs to 100s. Perform explicit clamping on the driver side as later there is no way to read back values used by the firmware and we shouldn't mislead the user about actual values being used when we print them in dmesg or debugfs. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240419123543.270-3-michal.wajdeczko@intel.com	2024-04-24 15:32:26 +02:00
Michal Wajdeczko	5a8c292f74	drm/xe/guc: Update VF configuration KLVs definitions GuC firmware specification says that maximum value for the execution quantum KLV is 100s and anything exceeding that will be clamped. The same limitation applies to the preemption timeout KLV. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240419123543.270-2-michal.wajdeczko@intel.com	2024-04-24 15:32:24 +02:00
Michal Wajdeczko	2cab6319b4	drm/xe/pf: Expose SR-IOV policy settings over debugfs We already have functions to configure SR-IOV policies. Allow to tweak those policy settings over debugfs. Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240423131244.2045-4-michal.wajdeczko@intel.com	2024-04-24 15:18:41 +02:00
Michal Wajdeczko	b00240b6a2	drm/xe/pf: Expose SR-IOV VF control commands over debugfs We already have functions to control the VF. Allow to control the VF using debugfs. Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240423131244.2045-3-michal.wajdeczko@intel.com	2024-04-24 15:18:40 +02:00
Michal Wajdeczko	e42a51fb9c	drm/xe/pf: Expose SR-IOV VFs configuration over debugfs We already have functions to configure VF resources and to print actual provisioning details. Expose this functionality in debugfs to allow experiment with different settings or inspect details in case of unexpected issues with the provisioning. As debugfs attributes are per-VF, we use parent d_inode->i_private to store VFID, similarly how we did for per-GT attributes. Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240423131244.2045-2-michal.wajdeczko@intel.com	2024-04-24 15:18:38 +02:00
Michal Wajdeczko	11294bf38f	drm/xe/kunit: Add PF service tests Start with basic tests for VF/PF ABI version negotiation. As we treat all platforms in the same way, we can run the tests on one platform. More tests will likely come later. Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240423180436.2089-6-michal.wajdeczko@intel.com	2024-04-24 15:10:46 +02:00
Michal Wajdeczko	98e6280592	drm/xe/pf: Add SR-IOV GuC Relay PF services We already have mechanism that allows a VF driver to communicate with the PF driver, now add PF side handlers for VF2PF requests defined in version 1.0 of VF/PF GuC Relay ABI specification. The VF2PF_HANDSHAKE request must be used by the VF driver to negotiate the ABI version prior to sending any other request. We will reset any negotiated version later during FLR. The outcome of the VF2PF_QUERY_RUNTIME requests depends on actual platform, for legacy platforms used as SDV is provided as-is, for latest platforms it is preliminary, and might be changed. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240423180436.2089-5-michal.wajdeczko@intel.com	2024-04-24 15:10:42 +02:00
Michal Wajdeczko	dec793860d	drm/xe: Add few more GT register definitions While we are not using these registers right now, they are part of some runtime register lists that PF driver share with VFs on some legacy platforms that we might want to support as SDV. Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240423180436.2089-4-michal.wajdeczko@intel.com	2024-04-24 15:10:41 +02:00
Michal Wajdeczko	1cb4db30cf	drm/xe: Add helper to calculate adjusted register offset Our MMIO accessing functions automatically adjust addresses for the media registers based on mmio.adj_limit and mmio.adj_offset logic. Move it to the separate helper to avoid code duplication and to allow using it by the upcoming changes to PF driver code. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Piotr Piórkowski <piotr.piorkowski@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240423180436.2089-3-michal.wajdeczko@intel.com	2024-04-24 15:10:40 +02:00
Michal Wajdeczko	8f21f82d8b	drm/xe/guc: Add GuC Relay ABI version 1.0 definitions This initial GuC Relay ABI specification includes messages for ABI version negotiation and to query values of runtime/fuse registers. We will start handling those messages on the PF driver soon. Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240423180436.2089-2-michal.wajdeczko@intel.com	2024-04-24 15:10:39 +02:00
Thomas Hellström	06e7139a03	drm/xe: Fix unexpected backmerge results The recent backmerge from drm-next to drm-xe-next brought with it some silent unexpected results. One code snippet was added twice and a partial revert had merge errors. Fix that up to reinstate the affected code as it was before the backmerge. v2: - Commit log message rewording (Lucas DeMarchi) Fixes: 79790b6818e9 ("Merge drm/drm-next into drm-xe-next") Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240423121114.39325-1-thomas.hellstrom@linux.intel.com	2024-04-24 10:25:51 +02:00
Rodrigo Vivi	869e54d4d5	drm/xe: make xe_pm_runtime_lockdep_map a static struct Fix the new sparse warning: >> drivers/gpu/drm/xe/xe_pm.c:72:20: sparse: sparse: symbol 'xe_pm_runtime_lockdep_map' was not declared. Should it be static? Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202404191329.EZzOTzwK-lkp@intel.com/ Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com> Reviewed-by: Badal Nilawar <badal.nilawar@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240422201454.699089-1-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-04-23 10:43:22 -04:00
Michal Wajdeczko	48c64d495f	drm/xe/guc: Fix arguments passed to relay G2H handlers By default CT code was passing just payload of the G2H event message, while Relay code expects full G2H message including HXG header which contains DATA0 field. Fix that. Fixes: 26d4481ac23f ("drm/xe/guc: Start handling GuC Relay event messages") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240419150351.358-1-michal.wajdeczko@intel.com	2024-04-22 20:08:04 +02:00
Michal Wajdeczko	d3b80dc7aa	drm/xe/pf: Fix xe_gt_sriov_pf_config_print_available_ggtt() This function is using internal helper pf_get_spare_ggtt() that expects PF's master mutex to be locked. Fix that. Fixes: ac6598aed1b3 ("drm/xe/pf: Add support to configure SR-IOV VFs") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Piotr Piórkowski <piotr.piorkowski@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240419141000.314-1-michal.wajdeczko@intel.com	2024-04-22 19:50:47 +02:00
Rodrigo Vivi	783d6cdc82	drm/xe: Kill xe_device_mem_access_{get*,put} Let's simply convert all the current callers towards direct xe_pm_runtime access and remove this extra layer of indirection. No functional change is expected with this patch since xe_mem_access_get was already using the xe_pm_runtime_get_noresume at this point. v2: Convert all the current callers instead of a big refactor at once. v3: - Rebased - Squashed the GSC/HDCP - Added a new case: sriov_pf_policy - Improved commit message to highlight that there's no functional change in this patch. Reviewed-by: Matthew Auld <matthew.auld@intel.com> #v2 Cc: Suraj Kandpal <suraj.kandpal@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240418143049.43231-1-rodrigo.vivi@intel.com	2024-04-22 09:03:09 -04:00
Matt Roper	62422b7be4	drm/xe: Define all possible engines in media IP descriptors Rather than trying to identify exactly which engines are available on each platform in the IP descriptor, just include the list of all media engines that the IP could theoretically support (i.e., 8 VCS + 4 VECS). We still rely on the media fuse registers to tell us which specific engine instances are actually present on a given platform, so there shouldn't be any functional change. This will help prevent mistakes with engine numbering (for example ambiguity about whether the 2nd VCS engine on a platform with exactly two engines is numbered "VCS1" or "VCS2") and will also future-proof the code a bit more in case new SKUs or platform refreshes extend the engine list in the future. Note that the media fuse register technically has an 8-bit field for VECS engine presence starting on Xe2. However there's still no MMIO register range reserved for VE engines above VECS3, so VE0-VE3 is still considered the "maximum" VE engine mask that the driver can support for now. Bspec: 52614, 52615, 62567 Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240417152621.3357990-2-matthew.d.roper@intel.com	2024-04-19 07:26:28 -07:00
Rodrigo Vivi	7af6b11626	drm/i915: Convert intel_runtime_pm_get_noresume towards raw wakeref In the past, the noresume function was used by the GEM code to ensure wakelocks were held and bump its usage. This is no longer the case and this function was totally unused until it started to be used again by display with commit 77e619a82fc3 ("drm/i915/display: convert inner wakeref get towards get_if_in_use") However, on the display code, most of the callers are using the raw wakeref, rather than the wakelock version, which caused a major regression caught by CI. Another option to this patch is to go with the original plan and use the get_if_in_use variant in the display code, which is enough to fulfil our needs. Then, an extra patch to delete the unused _noresume variant. v2: Keep grabbing wakelock but only assert for wakeref. (Imre) Cc: Imre Deak <imre.deak@intel.com> Cc: Francois Dugast <francois.dugast@intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Fixes: 77e619a82fc3 ("drm/i915/display: convert inner wakeref get towards get_if_in_use") Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/10875 Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Imre Deak <imre.deak@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240418223756.68427-1-rodrigo.vivi@intel.com	2024-04-19 11:27:14 +03:00
Ashutosh Dixit	5bc9de065b	drm/i915/hwmon: Get rid of devm When both hwmon and hwmon drvdata (on which hwmon depends) are device managed resources, the expectation, on device unbind, is that hwmon will be released before drvdata. However, in i915 there are two separate code paths, which both release either drvdata or hwmon and either can be released before the other. These code paths (for device unbind) are as follows (see also the bug referenced below): Call Trace: release_nodes+0x11/0x70 devres_release_group+0xb2/0x110 component_unbind_all+0x8d/0xa0 component_del+0xa5/0x140 intel_pxp_tee_component_fini+0x29/0x40 [i915] intel_pxp_fini+0x33/0x80 [i915] i915_driver_remove+0x4c/0x120 [i915] i915_pci_remove+0x19/0x30 [i915] pci_device_remove+0x32/0xa0 device_release_driver_internal+0x19c/0x200 unbind_store+0x9c/0xb0 and Call Trace: release_nodes+0x11/0x70 devres_release_all+0x8a/0xc0 device_unbind_cleanup+0x9/0x70 device_release_driver_internal+0x1c1/0x200 unbind_store+0x9c/0xb0 This means that in i915, if we use devm, we cannot guarantee that hwmon will always be released before drvdata. Which means that we have a uaf if hwmon sysfs is accessed when drvdata has been released but hwmon hasn't. The only way out of this seems to be to get rid of devm_ and release/free everything explicitly during device unbind. v2: Change commit message and other minor code changes v3: Cleanup from i915_hwmon_register on error (Armin Wolf) v4: Eliminate potential static analyzer warning (Rodrigo) Eliminate fetch_and_zero (Jani) v5: Restore previous logic for ddat_gt->hwmon_dev error return (Andi) Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/10366 Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240417145646.793223-1-ashutosh.dixit@intel.com	2024-04-18 17:59:31 -07:00
Himal Prasad Ghimiray	c086bfc6ff	drm/xe/pm: Capture errors and handle them xe_pm_init may encounter failures for various reasons, such as a failure in initializing drmm_mutex, or when dealing with a d3cold-capable device for vram_threshold sysfs creation and setting default threshold. Presently, all these potential failures are disregarded. Move d3cold.lock initialization to xe_pm_init_early and cause driver abort if mutex initialization has failed. For xe_pm_init failures cleanup the driver and return error code -v2 Make mutex init cleaner (Lucas) Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240412181211.1155732-8-himal.prasad.ghimiray@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2024-04-18 13:30:24 -07:00
Himal Prasad Ghimiray	e3d0839aa5	drm/xe/tile: Abort driver load for sysfs creation failure Ensure that the status of all tile associated sysfs entries creation is relayed to xe_tile_init_noalloc, leading to a driver load abort if any sysfs creation failures occur. -v2 Avoid unnecessary warn/error messages. (Lucas) Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240412181211.1155732-7-himal.prasad.ghimiray@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2024-04-18 13:30:17 -07:00
Himal Prasad Ghimiray	9c3f72a342	drm/xe/gt: Abort driver load for sysfs creation failure Instead of allowing the driver to load with incomplete sysfs entries in case of sysfs creation failure, we should terminate the driver loading. This change ensures that the status of all gt associated sysfs entries creation is relayed to xe_gt_init, leading to a driver load abort if any sysfs creation failures occur. -v2 use err_force_wake label instead of new. (Lucas) Avoid unnecessary warn/error messages. (Lucas) Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240412181211.1155732-6-himal.prasad.ghimiray@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2024-04-18 13:26:34 -07:00
Himal Prasad Ghimiray	6e40f142c5	drm/xe: Return NULL in case of drmm_add_action_or_reset failure In case of drmm_add_action_or_reset failure return NULL and no need to print warning messages as they will be printed implicitly. Cc: Tejas Upadhyay <tejas.upadhyay@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240412181211.1155732-5-himal.prasad.ghimiray@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2024-04-18 13:26:34 -07:00
Himal Prasad Ghimiray	22bf0bc04d	drm/xe: call free_gsc_pkt only once on action add failure The drmm_add_action_or_reset function automatically invokes the action (free_gsc_pkt) in the event of a failure; therefore, there's no necessity to call it within the return check. -v2 Fix commit message. (Lucas) Fixes: d8b1571312b7 ("drm/xe/huc: HuC authentication via GSC") Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240412181211.1155732-4-himal.prasad.ghimiray@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2024-04-18 13:26:34 -07:00
Himal Prasad Ghimiray	a99641e387	drm/xe: Remove sysfs only once on action add failure The drmm_add_action_or_reset function automatically invokes the action (sysfs removal) in the event of a failure; therefore, there's no necessity to call it within the return check. Modify the return type of xe_gt_ccs_mode_sysfs_init to int, allowing the caller to pass errors up the call chain. Should sysfs creation or drmm_add_action_or_reset fail, error propagation will prompt a driver load abort. -v2 Edit commit message (Nikula/Lucas) use err_force_wake label instead of new. (Lucas) Avoid unnecessary warn/error messages. (Lucas) Fixes: f3bc5bb4d53d ("drm/xe: Allow userspace to configure CCS mode") Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240412181211.1155732-3-himal.prasad.ghimiray@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2024-04-18 13:26:34 -07:00
Himal Prasad Ghimiray	5a73dd61a0	drm/xe: Simplify function return using drmm_add_action_or_reset() Instead of assigning the value of drmm_add_action_or_reset() to err and returning err in case of failure and 0 in case of success, simply return the result of drmm_add_action_or_reset(). -v2: cleanup in xe_display too. Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240412181211.1155732-2-himal.prasad.ghimiray@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2024-04-18 13:26:34 -07:00
Gustavo Sousa	cba22c911c	drm/xe/xe2lpg: Extend Wa_14020338487 Wa_14020338487 also applies to Xe2_LPG. Replicate the existing entry to one specific for Xe2_LPG. Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240417212501.312346-1-gustavo.sousa@intel.com	2024-04-18 11:01:56 -07:00
Rodrigo Vivi	f9116f658a	drm/xe: Add outer runtime_pm protection to xe_live_ktest@xe_dma_buf Any kunit doing any memory access should get their own runtime_pm outer references since they don't use the standard driver API entries. In special this dma_buf from the same driver. Found by pre-merge CI on adding WARN calls for unprotected inner callers: <6> [318.639739] # xe_dma_buf_kunit: running xe_test_dmabuf_import_same_driver <4> [318.639957] ------------[ cut here ]------------ <4> [318.639967] xe 0000:4d:00.0: Missing outer runtime PM protection <4> [318.640049] WARNING: CPU: 117 PID: 3832 at drivers/gpu/drm/xe/xe_pm.c:533 xe_pm_runtime_get_noresume+0x48/0x60 [xe] Cc: Matthew Auld <matthew.auld@intel.com> Cc: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240417203952.25503-10-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-04-18 08:31:40 -04:00
Rodrigo Vivi	e1feade077	drm/xe: Ensure all the inner access are using the _noresume variant At this point mem_access references should be only used as inner points of the execution and a get with synchronous resume previously called at an outer point. So, before killing mem_access in favor of direct access, let's ensure that we first convert them towards the new _noresume variant that will WARN us if no inner caller happened. Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240417203952.25503-9-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-04-18 08:31:40 -04:00
Rodrigo Vivi	16b57c90bb	drm/xe: Convert mem_access_if_ongoing to direct xe_pm_runtime_get_if_active Now that assert_mem_access is relying directly on the pm_runtime state instead of the counters, there's no reason why we cannot use the pm_runtime functions directly. Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240417203952.25503-8-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-04-18 08:31:40 -04:00
Rodrigo Vivi	a382291017	drm/xe: Removing extra mem_access protection from runtime pm This is not needed any longer, now that we have all the protection in place with the runtime pm itself. Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240417203952.25503-7-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-04-18 08:31:40 -04:00
Rodrigo Vivi	fdea94a4c2	drm/xe: Convert xe_gem_fault to use direct xe_pm_runtime calls The gem page fault is one of the outer bound protections where we want to ensure that the hardware is in D0 before proceeding with memory access. Let's convert it towards the xe_pm_runtime functions directly so we can then convert the mem_access to be inner protection only and then Kill it for good. Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240417203952.25503-6-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-04-18 08:31:40 -04:00
Rodrigo Vivi	152c37bf40	drm/xe: Remove useless mem_access during probe xe_pm_init is the very last thing during the xe_pci_probe(), hence these protections are useless from the point of view of ensuring that the device is awake. Let's remove it so we continue towards the goal of killing xe_device_mem_access. v2: Adding more cases v3: Provide a separate fix for xe_tile_init_noalloc return (Matt) Adding a new case where display HDCP init calls which are also called at display probe time. Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240417203952.25503-5-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-04-18 08:31:39 -04:00
Rodrigo Vivi	8ae84a2744	drm/xe: Move lockdep protection from mem_access to xe_pm_runtime The mem_access itself is not holding any lock, but attempting to train lockdep with possible scarring locks happening during runtime pm. We are going soon to kill the mem_access get and put helpers in favor of direct xe_pm_runtime calls, so let's just move this lock around to where it now belongs. v2: s/lockdep_training/lockdep_prime (Matt Auld) Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240417203952.25503-4-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-04-18 08:31:39 -04:00
Rodrigo Vivi	77e619a82f	drm/i915/display: convert inner wakeref get towards get_if_in_use This patch brings no functional change. Since at this point of the code we are already asserting a wakeref was held, it means that we are with runtime_pm 'in_use' and in practical terms we are only bumping the pm_runtime usage counter and moving on. However, xe driver has a lockdep annotation that warned us that if a sync resume was actually called at this point, we could have a deadlock because we are inside the power_domains->lock locked area and the resume would call the irq_reset, which would also try to get the power_domains->lock. For this reason, let's convert this call to a safer option and calm lockdep on. v2: use _noresume variant instead of get_in_use (Ville, Imre) Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Acked-by: Imre Deak <imre.deak@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Francois Dugast <francois.dugast@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240417203952.25503-3-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-04-18 08:31:39 -04:00
Rodrigo Vivi	82e279a49a	drm/xe: Introduce intel_runtime_pm_get_noresume at compat-i915-headers for display The i915-display will start using the intel_runtime_pm_noresume. So we need to add the compat header before it. Reviewed-by: Francois Dugast <francois.dugast@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240417203952.25503-2-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-04-18 08:31:39 -04:00
Rodrigo Vivi	cbb6a7413b	drm/xe: Introduce xe_pm_runtime_get_noresume for inner callers Let's ensure that we have an option for inner callers that will raise WARN if device is not active and not protected by outer callers. Make this also a void function forcing every caller to unconditionally put the reference back afterwards. This will be very important for cases where we want to hold the reference before scheduling a work in a queue. Then the work job will be responsible for putting it back. While at this, already convert a case from mem_access_get_ongoing where it is not checking for the reference and put it back, what would cause the underflow. v2: Fix indentation. v3: Convert equivalent missing put from mem_access towards pm_runtime. Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240417203952.25503-1-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-04-18 08:31:39 -04:00
Vinay Belgaumkar	2817a1f1bf	drm/xe/lnl: Apply GuC Wa_13011645652 Enable WA for a bug that could cause the C6 state machine to hang during RC6 exit. v2: Add comment clarifying the WA (John H) v3: Add more details to the comment (John H) Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240417054802.1766359-1-vinay.belgaumkar@intel.com	2024-04-17 15:21:12 -07:00
Matthew Auld	8eae42f175	drm/xe/vm: don't include xe_gt.h clangd complains here, since nothing in xe_gt.h seems to be needed. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240412113144.259426-6-matthew.auld@intel.com	2024-04-17 13:38:32 +01:00
Matthew Auld	5b259c0d1d	drm/xe/vm: drop vm->destroy_work Now that we no longer grab the usm.lock mutex (which might sleep) it looks like it should be safe to directly perform xe_vm_free when vm refcount reaches zero, instead of punting that off to some worker. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240412113144.259426-5-matthew.auld@intel.com	2024-04-17 13:38:32 +01:00
Matthew Auld	83967c5732	drm/xe/vm: prevent UAF with asid based lookup The asid is only erased from the xarray when the vm refcount reaches zero, however this leads to potential UAF since the xe_vm_get() only works on a vm with refcount != 0. Since the asid is allocated in the vm create ioctl, rather erase it when closing the vm, prior to dropping the potential last ref. This should also work when user closes driver fd without explicit vm destroy. Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1594 Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: <stable@vger.kernel.org> # v6.8+ Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240412113144.259426-4-matthew.auld@intel.com	2024-04-17 13:38:11 +01:00
Matthew Auld	48b1f11c95	drm/xe/stolen: ignore first page for FBC We have observed underruns on some platforms if the CFB offset is within the first page of stolen. Just like i915 skip the first page. v2 (Maarten) - Also align the start. BSpec: 50214 Reported-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240412150301.273344-4-matthew.auld@intel.com	2024-04-17 13:09:47 +01:00
Matthew Auld	9890821f3e	drm/xe/stolen: lower the default alignment No need to be so aggressive here. The upper layers will already apply the needed alignment, plus some allocations might wish to skip it. Main issue is that we might want to have start/end bias range which doesn't match the default alignment which is rejected by the allocator. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240412150301.273344-3-matthew.auld@intel.com	2024-04-17 13:09:46 +01:00
Lu Yao	67a9e86dc1	drm/xe: select X86_PLATFORM_DEVICES when ACPI_WMI is selected ACPI_WMI is a subitem of X86_PLATFORM_DEVICES. And X86_PLATFORM_DEVICES is not selected in the current Kconfig, and may cause Kconfig warnings: WARNING: unmet direct dependencies detected for ACPI_WMI Depends on [n]: X86_PLATFORM_DEVICES [=n] && ACPI [=y] Selected by [m]: - DRM_XE [=m] && HAS_IOMEM [=y] && DRM [=m] && PCI [=y] && MMU [=y] && (m && MODULES [=y] || y && KUNIT [=y]=y) && X86 [=y] && ACPI [=y] Signed-off-by: Lu Yao <yaolu@kylinos.cn> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240415025215.15811-1-yaolu@kylinos.cn Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2024-04-16 11:52:12 -07:00
John Harrison	09700beeba	drm/xe/bmg: Some LNL workarounds also apply to BMG Enable a couple of existing workarounds for a new platform. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240410002646.3002394-3-John.C.Harrison@Intel.com	2024-04-16 10:50:37 -07:00

1265376 Commits