linux

iv/linux

Author	SHA1	Message	Date
Michal Wajdeczko	4edadc41a3	drm/xe/vf: Use register values obtained from the PF As part of the its initialization, the VF driver has already obtained a list of the runtime (fuse) register values from the PF driver. When VF driver is attempting to read register that is inaccessible to the VF, use the values from this list instead. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240523192240.844-2-michal.wajdeczko@intel.com	2024-05-24 10:02:26 +02:00
John Harrison	b0ac1b42db	drm/xe/guc: Port over the slow GuC loading support from i915 GuC loading can take longer than it is supposed to for various reasons. So add in the code to cope with that and to report it when it happens. There are also many different reasons why GuC loading can fail, so add in the code for checking for those and for reporting issues in a meaningful manner rather than just hitting a timeout and saying 'fail: status = %x'. Also, remove the 'FIXME' comment about an i915 bug that has never been applicable to Xe! v2: Actually report the requested and granted frequencies rather than showing granted twice (review feedback from Badal). v3: Locally code all the timeout and end condition handling because a helper function is not allowed (review feedback from Lucas/Rodrigo). v4: Add more documentation comments and rename a define to add units (review feedback from Lucas). v5: Fix copy/paste error in xe_mmio_wait32_not (review feedback from Lucas) and rebase (no more return value from guc_wait_ucode). Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240518043700.3264362-3-John.C.Harrison@Intel.com	2024-05-23 10:55:31 -07:00
John Harrison	fcc8f80517	drm/xe: Make read_perf_limit_reasons globally accessible Other driver code beyond the sysfs interface wants to know about throttling. So make the query function globally accessible. v2: Revert include order change (review feedback from Lucas) v3: Remove '_sysfs' from throttle file names and keep limit query in the same file rather than moving elsewhere (review feedback from Rodrigo). v4: Correct #include while renaming header file (review feedback from Lucas). Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240518043700.3264362-2-John.C.Harrison@Intel.com	2024-05-23 10:55:28 -07:00
José Roberto de Souza	83ee002df0	drm/xe: Nuke simple error capture This error capture prints into dmesg HW state when a gpu hang happens. It was useful when we did not had devcoredump, now it is a incompleted version of devcoredump that has potential to flood dmesg. Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522203431.191594-1-jose.souza@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-05-23 13:38:26 -04:00
José Roberto de Souza	b10d0c5e9d	drm/xe: Add process name to devcoredump Process name help us track what application caused the gpug hang, this is crucial when running several applications at the same time. v2: - handle Xe KMD exec_queues without VM v3: - use get_pid_task() (suggested by Nirmoy) Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Nirmoy Das <nirmoy.das@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522201203.145403-1-jose.souza@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-05-23 13:37:56 -04:00
Dr. David Alan Gilbert	e8ac8048a7	drm/xe: remove unused struct 'xe_gt_desc' 'xe_gt_desc' is unused since commit 1e6c20be6c83 ("drm/xe: Drop extra_gts[] declarations and XE_GT_TYPE_REMOTE"). Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Link: https://patchwork.freedesktop.org/patch/msgid/20240522175840.382107-1-linux@treblig.org Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-05-23 13:33:55 -04:00
Rodrigo Vivi	f91806033f	drm/xe: Enable D3Cold on 'low' VRAM utilization Now that we eliminated all the mem_access get/put with its locking issues from the inner calls of migration, we can allow D3Cold. Enable it when VRAM utilization is lower then 300Mb. On higher utilization we only allow D3hot so we don't increase so much the latency on runtime resume due to the memory restoration. Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522170105.327472-7-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-05-23 11:54:08 -04:00
Rodrigo Vivi	8d490e019b	drm/xe: Stop checking for power_lost on D3Cold GuC reset status is not reliable for this purpose and it is once in a while ending up in a situation of D3Cold, where power_reset is false and without the proper memory restoration the GuC reload and Display will fail to come back from D3Cold. So, let's do a full restoration of everything if we have a risk of losing power, without further optimizations. v2: also remove the gut_in_reset function (Anshuman) Cc: Anshuman Gupta <anshuman.gupta@intel.com> Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com> Reviewed-by: Badal Nilawar <badal.nilawar@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522170105.327472-6-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-05-23 11:54:07 -04:00
Rodrigo Vivi	e7b180b220	drm/xe: Prepare display for D3Cold Prepare power-well and DC handling for a full power lost during D3Cold, then sanitize it upon D3->D0. Otherwise we get a bunch of state mismatch. Ideally we could leave DC9 enabled and wouldn't need to move DC9->DC0 on every runtime resume, however, the disable_DC is part of the power-well checks and intrinsic to the dc_off power well. In the future that can be detangled so we can have even bigger power savings. But for now, let's focus on getting a D3Cold, which saves much more power by itself. v2: create new functions to avoid full-suspend-resume path, which would result in a deadlock between xe_gem_fault and the modeset-ioctl. v3: Only avoid the full modeset to avoid the race, for a more robust suspend-resume. Cc: Anshuman Gupta <anshuman.gupta@intel.com> Cc: Uma Shankar <uma.shankar@intel.com> Tested-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522170105.327472-5-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-05-23 11:54:07 -04:00
Rodrigo Vivi	73ba282e7f	drm/xe: Relax runtime pm protection around VM In the regular use case scenario, user space will create a VM, and keep it alive for the entire duration of its workload. For the regular desktop cases, it means that the VM is alive even on idle scenarios where display goes off. This is unacceptable since this would entirely block runtime PM indefinitely, blocking deeper Package-C state. This would be a waste drainage of power. Limit the VM protection solely for long-running workloads that are not protected by the scheduler references. By design, run_job for long-running workloads returns NULL and the scheduler drops all the references of it, hence protecting the VM for this case is necessary. v2: Update commit message to a more imperative language and to reflect why the VM protection is really needed. Also add a comment in the code to let the reason visbible. v3: Remove vma_access case and the mentions to mmap. Mmap cases are already protected by the gem page fault. Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Tested-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522170105.327472-4-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-05-23 11:53:50 -04:00
Rodrigo Vivi	ad1e331fc4	drm/xe: Relax runtime pm protection during execution Limit the protection only during moments of actual job execution, and introduce protection for guc submit fini, which is currently unprotected due to the absence of exec_queue life protection. In the regular use case scenario, user space will create an exec queue, and keep it alive to reuse that until it is done with that kind of workload. For the regular desktop cases, it means that the exec_queue is alive even on idle scenarios where display goes off. This is unacceptable since this would entirely block runtime PM indefinitely, blocking deeper Package-C state. This would be a waste drainage of power. Cc: Matthew Brost <matthew.brost@intel.com> Tested-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522170105.327472-3-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-05-23 11:52:56 -04:00
Rodrigo Vivi	967c5d7c64	drm/xe: Fix xe_pm_runtime_get_if_in_use documentation Let's be clear on what it is actually doing and align with xe_pm_runtime_get_if_active doc style. Tested-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522170105.327472-2-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-05-23 11:52:56 -04:00
Rodrigo Vivi	46edb0a3eb	drm/xe: Fix xe_pm_runtime_get_if_active return Current callers of this function are already taking the result to a boolean and using in an if. It might be a problem because current function might return negative error codes on failure, without increasing the reference counter. In this scenario we could end up with extra 'put' call ending in unbalanced scenarios. Let's fix it, while aligning with the current xe_pm_get_if_in_use style. Tested-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522170105.327472-1-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-05-23 11:52:56 -04:00
Niranjana Vishwanathapura	40672b792a	drm/xe: Properly handle alloc_guc_id() failure Release the submission_state lock if alloc_guc_id() fails. v2: Add Fixes tag and CC stable kernel Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: <stable@vger.kernel.org> # v6.8+ Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240521201711.4934-1-niranjana.vishwanathapura@intel.com	2024-05-22 12:33:37 -07:00
Michal Wajdeczko	3ec3b42752	drm/xe/uc: Don't emit false error if running in execlist mode When running in execlist mode (using force_execlist=1 modparam) we incorrectly select the error path in xe_uc_init(), leading to an unwanted error message like this: [ ] xe 0000:00:00.0: [drm] ERROR GT0: Failed to initialize uC (0000000000000000) Fix that by doing early return like we do in other similar cases. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240521114857.712-1-michal.wajdeczko@intel.com	2024-05-22 18:26:22 +02:00
Matthew Auld	dc51c682dd	drm/xe/display: move device_remove over to drmm i915 display calls this when releasing the drm_device, match this also in xe by using drmm. intel_display_device_remove() is freeing purely software state for the drm_device. v2: fix build error Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Andrzej Hajda <andrzej.hajda@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-36-matthew.auld@intel.com	2024-05-22 13:22:40 +01:00
Matthew Auld	48d74a0a45	drm/xe/display: stop calling domains_driver_remove twice Unclear why we call this twice. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Andrzej Hajda <andrzej.hajda@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-35-matthew.auld@intel.com	2024-05-22 13:22:40 +01:00
Matthew Auld	5b6937b65e	drm/xe/display: move display fini stuff to devm Match the i915 display handling here with calling both no_irq and noaccel when removing the device. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Andrzej Hajda <andrzej.hajda@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-34-matthew.auld@intel.com	2024-05-22 13:22:40 +01:00
Matthew Auld	c711741978	drm/xe: reset mmio mappings with devm Set our various mmio mappings to NULL. This should make it easier to catch something rogue trying to mess with mmio after device removal. For example, we might unmap everything and then start hitting some mmio address which has already been unmamped by us and then remapped by something else, causing all kinds of carnage. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Andrzej Hajda <andrzej.hajda@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-33-matthew.auld@intel.com	2024-05-22 13:22:40 +01:00
Matthew Auld	a0b834c895	drm/xe/mmio: move mmio_fini over to devm Not valid to touch mmio once the device is removed, so make sure we unmap on removal and not just when driver instance goes away. Also set the mmio pointers to NULL to hopefully catch such issues more easily. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Andrzej Hajda <andrzej.hajda@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-32-matthew.auld@intel.com	2024-05-22 13:22:40 +01:00
Matthew Auld	cd506a33b0	drm/xe: make gt_remove use devm No need to hand roll the onion unwind here, just move gt_remove over to devm which will already have the correct ordering. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Andrzej Hajda <andrzej.hajda@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-31-matthew.auld@intel.com	2024-05-22 13:22:40 +01:00
Matthew Auld	1bd985ff9f	drm/xe/gt: break out gt_fini into sw vs hw state Have a cleaner separation between hw vs sw. v2: Fix missing return Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Andrzej Hajda <andrzej.hajda@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-30-matthew.auld@intel.com	2024-05-22 13:22:39 +01:00
Matthew Auld	cf13ae6b81	drm/xe/coredump: move over to devm Here we are using drmm to ensure we release the coredump when unloading the module, however the coredump is very much tied to the struct device underneath. We can see this when we hotunplug the device, for which we have already got a coredump attached. In such a case the coredump still remains and adding another is not possible. However we still register the release action via xe_driver_devcoredump_fini(), so in effect two or more releases for one dump. The other consideration is that the coredump state is embedded in the xe_driver instance, so technically once the drmm release action fires we might free the coredumpe state from a different driver instance, assuming we have two release actions and they can race. Rather use devm here to remove the coredump when the device is released. References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1679 Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Andrzej Hajda <andrzej.hajda@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-29-matthew.auld@intel.com	2024-05-22 13:22:39 +01:00
Matthew Auld	cee70645a7	drm/xe/device: move xe_device_sanitize over to devm Disable GuC submission when removing the device. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-28-matthew.auld@intel.com	2024-05-22 13:22:39 +01:00
Matthew Auld	bc54f42c0e	drm/xe/device: move flr to devm Should be called when driver is removed, not when this particular driver instance is destroyed. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-27-matthew.auld@intel.com	2024-05-22 13:22:39 +01:00
Matthew Auld	bbc9651fe9	drm/xe/irq: move irq_uninstall over to devm Makes sense to trigger this when the device is removed. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-26-matthew.auld@intel.com	2024-05-22 13:22:39 +01:00
Matthew Auld	6d95155ae7	drm/xe/guc_pc: s/pc_fini/pc_fini_hw/ Make it clear that is about cleaning up the HW/FW side, and not software state. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Andrzej Hajda <andrzej.hajda@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-25-matthew.auld@intel.com	2024-05-22 13:22:39 +01:00
Matthew Auld	c9f422de07	drm/xe/guc_pc: move pc_fini to devm Here we are touching the HW/GuC and presumably this should happen when the device is removed. Currently if you hotunplug the device this is skipped if there is already open driver instance. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-24-matthew.auld@intel.com	2024-05-22 13:22:39 +01:00
Matthew Auld	19fa7aa4d2	drm/xe/guc: s/guc_fini/guc_fini_hw/ Make it clear that is about cleaning up the HW/FW side, and not software state. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Andrzej Hajda <andrzej.hajda@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-23-matthew.auld@intel.com	2024-05-22 13:22:39 +01:00
Matthew Auld	241f5d25ff	drm/xe/guc: move guc_fini over to devm Make sure to actually call this when the device is removed. Currently we only trigger it when the driver instance goes away, but that doesn't work too well with hotunplug, since device can be removed and re-probed with a new driver instance, where the guc_fini() is called too late. Move the fini over to devm to ensure this is called when device is removed. References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1717 Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-22-matthew.auld@intel.com	2024-05-22 13:22:39 +01:00
Matthew Auld	3a1c27cd01	drm/xe/ggtt: use drm_dev_enter to mark device section Device can be hotunplugged before we start destroying gem objects. In such a case don't touch the GGTT entries, trigger any invalidations or mess around with rpm. This should already be taken care of when removing the device, we just need to take care of dealing with the software state, like removing the mm node. v2: (Andrzej) - Avoid some duplication by tracking the bound status and checking that instead. References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1717 Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Andrzej Hajda <andrzej.hajda@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Reviewed-by: Jagmeet Randhawa <jagmeet.randhawa@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-21-matthew.auld@intel.com	2024-05-22 13:22:39 +01:00
Matthew Auld	c60f91bbc4	drm/xe: covert sysfs over to devm Hotunplugging the device seems to result in stuff like: kobject_add_internal failed for tile0 with -EEXIST, don't try to register things with the same name in the same directory. We only remove the sysfs as part of drmm, however that is tied to the lifetime of the driver instance and not the device underneath. Attempt to fix by using devm for all of the remaining sysfs stuff related to the device. Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1667 Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1432 Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-20-matthew.auld@intel.com	2024-05-22 13:22:38 +01:00
Matthew Auld	4465b8c6d3	drm/xe/pci: remove broken driver_release This is quite broken since we are nuking the pdev link to the private driver struct, but note here that driver_release is called when the drm_device is released (poor mans drmm), which can be long after the device has been removed. So here what we are actually doing is nuking the pdev link for what is potentially bound to a different drm_device. If that happens before our pci remove callback is triggered (for the new drm_device) we silently exit and skip some important cleanup steps, resulting in hilarity. There should be no reason to implement driver_release, when we already have nicer stuff like drmm, so just remove completely. The actual pdev link is already nuked when removing the device. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Andrzej Hajda <andrzej.hajda@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-19-matthew.auld@intel.com	2024-05-22 13:22:38 +01:00
Michal Wajdeczko	d8a417c4bd	drm/xe/vf: Custom GuC initialization if VF The GuC firmware is loaded and initialized by the PF driver. Make sure VF drivers only perform permitted operations. For submission initialization, use number of GuC context IDs from self config. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240521092518.624-3-michal.wajdeczko@intel.com	2024-05-22 12:53:45 +02:00
Michal Wajdeczko	7065b19bd5	drm/xe/guc: Allow to initialize submission with limited set of IDs While PF and native drivers may initialize submission code to use all available GuC contexts IDs, the VF driver may only use limited number of IDs. Update init function to accept number of context IDs available for use. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240521092518.624-2-michal.wajdeczko@intel.com	2024-05-22 12:53:43 +02:00
Michal Wajdeczko	f7e20cfb59	drm/xe: Cleanup xe_mmio.h We don't need <linux/delay.h> include since commit 5c09bd6ccd41 ("drm/xe/mmio: Move xe_mmio_wait32() to xe_mmio.c"). We don't need <linux/io-64-nonatomic-lo-hi.h> include since commit 54c659660d63 ("drm/xe: Make xe_mmio_read\|write() functions non-inline"). And since commit 924e6a9789a0 ("drm/xe/uapi: Remove MMIO ioctl") we don't need forward declarations of drm_device and drm_file. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240520181814.2392-4-michal.wajdeczko@intel.com	2024-05-22 12:15:10 +02:00
Michal Wajdeczko	26a22952c8	drm/xe: Don't rely on indirect includes from xe_mmio.h These compilation units use udelay() or some GT oriented printk functions without explicitly including proper header files, and relying on #includes from the xe_mmio.h instead. Fix that. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Francois Dugast <francois.dugast@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240520181814.2392-3-michal.wajdeczko@intel.com	2024-05-22 12:11:26 +02:00
Michal Wajdeczko	31a278b5a1	drm/i915/display: Add missing include to intel_vga.c This compilation unit uses udelay() function without including it's header file. Fix that to break dependency on other code. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Acked-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240520181814.2392-2-michal.wajdeczko@intel.com	2024-05-22 12:11:25 +02:00
Michal Wajdeczko	a6bc7cda37	drm/xe: Fix xe_guc_pc.h Prefer forward declaration over #include xe_guc_pc_types.h Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Francois Dugast <francois.dugast@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240521102828.668-5-michal.wajdeczko@intel.com	2024-05-22 12:03:56 +02:00
Michal Wajdeczko	de1429a99f	drm/xe: Fix xe_huc.h Prefer forward declaration over #include xe_huc_types.h Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Francois Dugast <francois.dugast@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240521102828.668-4-michal.wajdeczko@intel.com	2024-05-22 12:03:55 +02:00
Michal Wajdeczko	2291c09110	drm/xe: Fix xe_gsc.h Prefer forward declaration over #include xe_gsc_types.h Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Francois Dugast <francois.dugast@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240521102828.668-3-michal.wajdeczko@intel.com	2024-05-22 12:03:54 +02:00
Michal Wajdeczko	bdc9abed51	drm/xe: Fix xe_uc.h Prefer forward declaration over #include xe_uc_types.h Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Francois Dugast <francois.dugast@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240521102828.668-2-michal.wajdeczko@intel.com	2024-05-22 12:03:53 +02:00
Nirmoy Das	01d71dff61	drm/xe/tests: Use uninterruptible VM lock Interruptible lock can return error and needed a return value check. This test should finish quick enough so use a uninterruptible lock instead. Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240521102715.22700-1-nirmoy.das@intel.com Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>	2024-05-21 22:18:51 +02:00
Nirmoy Das	735940f999	drm/xe: Add warn when level can not be zero. At xe_pt_zap_ptes_entry() and xe_pt_stage_unbind_entry, the level cannot be 0. Therefore, add an independent check for the level. Since the level cannot be zero at this point, there is no need to check for `is_compact`, so remove that instead. Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240521103623.11645-1-nirmoy.das@intel.com Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>	2024-05-21 22:09:20 +02:00
Francois Dugast	995f7dafd1	drm/xe/uapi: Expose the L3 bank mask The L3 bank mask is already generated and stored internally with the rest of the GT topology. In user space, the compute runtime now needs this information to be added to the device properties therefore the topology mask query is extended to provide a new mask which represents the L3 banks enabled on the GT. The changes in the compute runtime are ready and approved, see link below. v2: Rewrite commit message and add a link to the compute runtime PR (Francois Dugast) Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Robert Krzemien <robert.krzemien@intel.com> Cc: Mateusz Jablonski <mateusz.jablonski@intel.com> Link: https://github.com/intel/compute-runtime/pull/722 Signed-off-by: Francois Dugast <francois.dugast@intel.com> Acked-by: Mateusz Jablonski <mateusz.jablonski@intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240416145037.7-2-francois.dugast@intel.com	2024-05-21 09:01:40 -07:00
Lucas De Marchi	188ced1e0f	drm/xe/client: Print runtime to fdinfo Print the accumulated runtime for client when printing fdinfo. Each time a query is done it first does 2 things: 1) loop through all the exec queues for the current client and accumulate the runtime, per engine class. CTX_TIMESTAMP is used for that, being read from the context image. 2) Read a "GPU timestamp" that can be used for considering "how much GPU time has passed" and that has the same unit/refclock as the one recording the runtime. RING_TIMESTAMP is used for that via MMIO. Since for all current platforms RING_TIMESTAMP follows the same refclock, just read it once, using any first engine available. This is exported to userspace as 2 numbers in fdinfo: drm-cycles-<class>: <RUNTIME> drm-total-cycles-<class>: <TIMESTAMP> Userspace is expected to collect at least 2 samples, which allows to know the client engine busyness as per: RUNTIME1 - RUNTIME0 busyness = --------------------- T1 - T0 Since drm-cycles-<class> always starts at 0, it's also possible to know if and engine was ever used by a client. It's expected that userspace will read any 2 samples every few seconds. Given the update frequency of the counters involved and that CTX_TIMESTAMP is 32-bits, the counter for each exec_queue can wrap around (assuming 100% utilization) after ~200s. The wraparound is not perceived by userspace since it's just accumulated for all the exec_queues in a 64-bit counter) but the measurement will not be accurate if the samples are too far apart. This could be mitigated by adding a workqueue to accumulate the counters every so often, but it's additional complexity for something that is done already by userspace every few seconds in tools like gputop (from igt), htop, nvtop, etc, with none of them really defaulting to 1 sample per minute or more. Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240517204310.88854-9-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2024-05-21 06:33:40 -07:00
Lucas De Marchi	6aa18d7436	drm/xe: Add helper to return any available hw engine Get the first available engine from a gt, which helps in the case any engine serves as a context, like when reading RING_TIMESTAMP. Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240517204310.88854-8-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2024-05-21 06:33:40 -07:00
Lucas De Marchi	baa1486552	drm/xe: Cache data about user-visible engines gt->info.engine_mask used to indicate the available engines, but that is not always true anymore: some engines are reserved to kernel and some may be exposed as a single engine (e.g. with ccs_mode). Runtime changes only happen when no clients exist, so it's safe to cache the list of engines in the gt and update that when it's needed. This will help implementing per client engine utilization so this (mostly constant) information doesn't need to be re-calculated on every query. Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240517204310.88854-7-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2024-05-21 06:33:40 -07:00
Umesh Nerlige Ramappa	6109f24f87	drm/xe: Add helper to accumulate exec queue runtime Add a helper to accumulate per-client runtime of all its exec queues. This is called every time a sched job is finished. v2: - Use guc_exec_queue_free_job() and execlist_job_free() to accumulate runtime when job is finished since xe_sched_job_completed() is not a notification that job finished. - Stop trying to update runtime from xe_exec_queue_fini() - that is redundant and may happen after xef is closed, leading to a use-after-free - Do not special case the first timestamp read: the default LRC sets CTX_TIMESTAMP to zero, so even the first sample should be a valid one. - Handle the parallel submission case by multiplying the runtime by width. v3: Update comments Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240517204310.88854-6-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2024-05-21 06:33:40 -07:00
Lucas De Marchi	f2f6b667c6	drm/xe: Add helper to capture engine timestamp Just like CTX_TIMESTAMP is used to calculate runtime, add a helper to get the timestamp for the engine so it can be used to calculate the "engine time" with the same unit as the runtime is recorded. Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240517204310.88854-5-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2024-05-21 06:33:40 -07:00

1 2 3 4 5 ...

1268006 Commits