1283570 Commits

Author SHA1 Message Date
Dave Airlie
412dbc662e Two fixes for v3d to fix an array indexing on newer V3D revisions.
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRcEzekXsqa64kGDp7j7w1vZxhRxQUCZpoYMQAKCRDj7w1vZxhR
 xVs2AP0QGVNgOAY3ZVSrtPViz9EFqEwxMipGs7DqQHQUyUKy2gEA5vxb12Yu3Akp
 HjCO+1D734fKkSti3Asvw04/WgCntwg=
 =4xEO
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-next-fixes-2024-07-19' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next

Two fixes for v3d to fix an array indexing on newer V3D revisions.

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maxime Ripard <mripard@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240719-emerald-newt-of-skill-89b54a@houat
2024-07-22 13:03:03 +10:00
Dave Airlie
78e6e468e1 - Xe_exec ioctl minor fix on sync entry cleanup upon error (Ashutosh)
- SRIOV: limit VF LMEM provisioning (Michal)
 - Wedge mode fixes (Brost)
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEbSBwaO7dZQkcLOKj+mJfZA7rE8oFAmaZOW8ACgkQ+mJfZA7r
 E8qN/Af+OftrVNA+eH5VvJcIdgjqSuNdAzIP7xC9yZAM43we1YnBzRoU2Z9vsw9D
 t61fCUWsqzNw+/peep1CpFnlk2cDqyV9vUfJQd2mKreoWIvQ4J2+Cv+qLGZ3xazh
 hxYznXN9c5FXB9ODPvcGUJfWyvwBdJWyALZB8kwELtK7i6RkFQupjsWxyhPvhjMu
 NNmIn8Iiducb6UjKDeYEILQdZHxBLgf6bbqQfGj548S7dZijZd//ap0uXoA/Zetn
 xpNYkjoo1CLgrAiWc4IcETMCKwK7pF0/Y0WvRdpAzpzfiuhlymsHpQ9Q2gCPm58S
 vQdSc/qIPX4/bXKME3VtdtBRjufdFg==
 =VxZY
 -----END PGP SIGNATURE-----

Merge tag 'drm-xe-next-fixes-2024-07-18' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-next

- Xe_exec ioctl minor fix on sync entry cleanup upon error (Ashutosh)
- SRIOV: limit VF LMEM provisioning (Michal)
- Wedge mode fixes (Brost)

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/Zpk6CI0FDoTJwkSb@intel.com
2024-07-22 11:51:53 +10:00
Dave Airlie
7d4ecf3707 Merge tag 'drm-intel-next-fixes-2024-07-18' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-next
- Reset intel_dp->link_trained before retraining the link [dp] (Imre Deak)
- Don't switch the LTTPR mode on an active link [dp] (Imre Deak)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Tvrtko Ursulin <tursulin@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ZpjgtowjpUZoHvrl@linux
2024-07-22 10:51:48 +10:00
Matthew Brost
90936a0a4c
drm/xe: Don't suspend device upon wedge
When wedging a device we shouldn't be suspending device as state for
debug will be lost.

Also this appears to not work as the below stack trace pops upon trying
to resume a wedged device:

[  304.245044] INFO: task cat:12115 blocked for more than 151 seconds.
[  304.251333]       Tainted: G        W          6.10.0-rc7-xe+ #3518
[  304.257617] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  304.265459] task:cat             state:D stack:13384 pid:12115 tgid:12115 ppid:3986   flags:0x00000006
[  304.265465] Call Trace:
[  304.265467]  <TASK>
[  304.265469]  __schedule+0x3c4/0xdf0
[  304.265478]  schedule+0x3c/0x140
[  304.265481]  rpm_resume+0x1cc/0x740
[  304.265484]  ? __pfx_autoremove_wake_function+0x10/0x10
[  304.265489]  __pm_runtime_resume+0x49/0x80
[  304.265494]  guc_info+0x6b/0xb0 [xe]
[  304.265538]  ? __pfx___drm_printfn_seq_file+0x10/0x10
[  304.265541]  ? __pfx___drm_puts_seq_file+0x10/0x10
[  304.265545]  seq_read_iter+0x111/0x4c0
[  304.265551]  seq_read+0xfc/0x140
[  304.265556]  full_proxy_read+0x58/0x80
[  304.265560]  vfs_read+0xa7/0x360
[  304.265563]  ? find_held_lock+0x2b/0x80
[  304.265568]  ksys_read+0x64/0xe0
[  304.265571]  do_syscall_64+0x68/0x140
[  304.265575]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  304.265578] RIP: 0033:0x7f4254d14992
[  304.265580] RSP: 002b:00007ffc558666f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[  304.265583] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f4254d14992
[  304.265584] RDX: 0000000000020000 RSI: 00007f4254ebb000 RDI: 0000000000000003
[  304.265586] RBP: 00007f4254ebb000 R08: 00007f4254eba010 R09: 00007f4254eba010
[  304.265587] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000022000
[  304.265588] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000
[  304.265593]  </TASK>
[  304.265594]
               Showing all locks held in the system:
[  304.265598] 1 lock held by khungtaskd/57:
[  304.265599]  #0: ffffffff8273b860 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x36/0x1c0
[  304.265607] 3 locks held by kworker/6:1/90:
[  304.265610] 1 lock held by in:imklog/547:
[  304.265611]  #0: ffff88810498cd88 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0x76/0xc0
[  304.265620] 1 lock held by dmesg/1310:

v2: Drop local 'err' variable (Jonathan)

Fixes: 8ed9aaae39f3 ("drm/xe: Force wedged state and block GT reset upon any GPU hang")
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240716063902.1390130-2-matthew.brost@intel.com
(cherry picked from commit 452bca0edbd0764ca0284239d5438b3edd305ab3)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-07-18 10:25:37 -04:00
Matthew Brost
c9474b726b
drm/xe: Wedge the entire device
Wedge the entire device, not just GT which may have triggered the wedge.
To implement this, cleanup the layering so xe_device_declare_wedged()
calls into the lower layers (GT) to ensure entire device is wedged.

While we are here, also signal any pending GT TLB invalidations upon
wedging device.

Lastly, short circuit reset wait if device is wedged.

v2:
 - Short circuit reset wait if device is wedged (Local testing)

Fixes: 8ed9aaae39f3 ("drm/xe: Force wedged state and block GT reset upon any GPU hang")
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240716063902.1390130-1-matthew.brost@intel.com
(cherry picked from commit 7dbe8af13c189f5937e87e9fb924d5bbc49e6f71)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-07-18 10:25:33 -04:00
Michal Wajdeczko
bf07ca963d
drm/xe/pf: Limit fair VF LMEM provisioning
Due to the current design of the BO and VRAM manager, any object
with XE_BO_FLAG_PINNED flag, which the PF driver uses during VF
LMEM provisionining, is created with the TTM_PL_FLAG_CONTIGUOUS
flag, which may cause VRAM fragmentation that prevents subsequent
allocations of larger objects, like fair VF LMEM provisioning.

To avoid such failures, round down fair VF LMEM provisioning size
to next power of two size, to compensate what xe_ttm_vram_mgr is
doing to achieve contiguous allocations.

Fixes: ac6598aed1b3 ("drm/xe/pf: Add support to configure SR-IOV VFs")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240711192320.1198-2-michal.wajdeczko@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 4c3fe5eae46b92e2fd961b19f7779608352e5368)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-07-18 10:25:27 -04:00
Ashutosh Dixit
408c2f14a5
drm/xe/exec: Fix minor bug related to xe_sync_entry_cleanup
Increment num_syncs after xe_sync_entry_parse() is successful to ensure
the xe_sync_entry_cleanup() logic under "err_syncs" label works correctly.

v2: Use the same pattern as that in xe_vm.c (Matt Brost)

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240711211203.3728180-1-ashutosh.dixit@intel.com
(cherry picked from commit 43a6faa6d9b5e9139758200a79fe9c8f4aaa0c8d)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-07-18 10:25:17 -04:00
Dave Airlie
478a52707b amd-drm-next-6.11-2024-07-12:
amdgpu:
 - RAS fixes
 - SMU fixes
 - GC 12 updates
 - SR-IOV fixes
 - IH 7 updates
 - DCC fixes
 - GC 11.5 fixes
 - DP MST fixes
 - GFX 9.4.4 fixes
 - SMU 14 updates
 - Documentation updates
 - MAINTAINERS updates
 - PSR SU fix
 - Misc small fixes
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZpFf9AAKCRC93/aFa7yZ
 2H4lAP44+2MbaTiQr42ojyuE/CKybMP9km2yxoaEznIQXRT0PwD7BqVq7YEeCix2
 ls2WNi6NA6/cO/aam7na/Q2NHzdBewI=
 =Blew
 -----END PGP SIGNATURE-----

Merge tag 'amd-drm-next-6.11-2024-07-12' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-6.11-2024-07-12:

amdgpu:
- RAS fixes
- SMU fixes
- GC 12 updates
- SR-IOV fixes
- IH 7 updates
- DCC fixes
- GC 11.5 fixes
- DP MST fixes
- GFX 9.4.4 fixes
- SMU 14 updates
- Documentation updates
- MAINTAINERS updates
- PSR SU fix
- Misc small fixes

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240712171637.2581787-1-alexander.deucher@amd.com
2024-07-18 09:20:00 +10:00
Imre Deak
509580fad7 drm/i915/dp: Don't switch the LTTPR mode on an active link
Switching to transparent mode leads to a loss of link synchronization,
so prevent doing this on an active link. This happened at least on an
Intel N100 system / DELL UD22 dock, the LTTPR residing either on the
host or the dock. To fix the issue, keep the current mode on an active
link, adjusting the LTTPR count accordingly (resetting it to 0 in
transparent mode).

v2: Adjust code comment during link training about reiniting the LTTPRs.
   (Ville)

Fixes: 7b2a4ab8b0ef ("drm/i915: Switch to LTTPR transparent mode link training")
Reported-and-tested-by: Gareth Yu <gareth.yu@intel.com>
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/10902
Cc: <stable@vger.kernel.org> # v5.15+
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ankit Nautiyal <ankit.k.nautiyal@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240708190029.271247-3-imre.deak@intel.com
(cherry picked from commit 211ad49cf8ccfdc798a719b4d1e000d0a8a9e588)
Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>
2024-07-16 08:14:29 +00:00
Imre Deak
d13e2a6e95 drm/i915/dp: Reset intel_dp->link_trained before retraining the link
Regularly retraining a link during an atomic commit happens with the
given pipe/link already disabled and hence intel_dp->link_trained being
false. Ensure this also for retraining a DP SST link via direct calls to
the link training functions (vs. an actual commit as for DP MST). So far
nothing depended on this, however the next patch will depend on
link_trained==false for changing the LTTPR mode to non-transparent.

Cc: <stable@vger.kernel.org> # v5.15+
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ankit Nautiyal <ankit.k.nautiyal@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240708190029.271247-2-imre.deak@intel.com
(cherry picked from commit a4d5ce61765c08ab364aa4b327f6739b646e6cfa)
Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>
2024-07-16 08:14:26 +00:00
Maíra Canal
1fe1c66274
drm/v3d: Fix Indirect Dispatch configuration for V3D 7.1.6 and later
`args->cfg[4]` is configured in Indirect Dispatch using the number of
batches. Currently, for all V3D tech versions, `args->cfg[4]` equals the
number of batches subtracted by 1. But, for V3D 7.1.6 and later, we must not
subtract 1 from the number of batches.

Implement the fix by checking the V3D tech version and revision.

Fixes several `dEQP-VK.synchronization*` CTS tests related to Indirect Dispatch.

Fixes: 18b8413b25b7 ("drm/v3d: Create a CPU job extension for a indirect CSD job")
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240714145243.1223131-2-mcanal@igalia.com
2024-07-15 12:49:52 -03:00
Maíra Canal
7c78fdbace
drm/v3d: Add V3D tech revision to the device information
The V3D tech revision can be a useful information when configuring
jobs. Therefore, expose it in the `struct v3d_dev` with the V3D tech
version.

Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240714145243.1223131-1-mcanal@igalia.com
2024-07-15 12:45:59 -03:00
Alex Deucher
1cff1010be drm/amdgpu/mes12: add missing opcode string
Fixes the indexing of the string array.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-12 11:46:46 -04:00
Alex Deucher
478cb8badf drm/amdgpu/mes11: update opcode strings
Add new packet.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-12 11:46:46 -04:00
Leo Li
7ed58b68ac Revert "drm/amd/display: Reset freesync config before update new state"
This change caused PSR SU panels to not read from their remote fb,
preventing us from entering self-refresh. It is a regression.

This reverts commit eb6dfbb7a9c67c7d9bcdb9f9b9131270e2144e3d.

Signed-off-by: Leo Li <sunpeng.li@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit dc1000bf463d1d89f66d6b5369cf76603f32c4d3)
2024-07-12 11:46:46 -04:00
Nathan Chancellor
c58c39163a drm/omap: Restrict compile testing to PAGE_SIZE less than 64KB
Prior to commit dc6fcaaba5a5 ("drm/omap: Allow build with
COMPILE_TEST=y"), it was only possible to build the omapdrm driver with
a 4KB page size. After that change, when the PAGE_SIZE is 64KB or
larger, clang points out that the driver has some assumptions around the
page size implicitly by passing PAGE_SIZE to a parameter with a type of
u16:

  drivers/gpu/drm/omapdrm/omap_gem.c:758:7: error: implicit conversion from 'unsigned long' to 'u16' (aka 'unsigned short') changes value from 65536 to 0 [-Werror,-Wconstant-conversion]
    757 |                 block = tiler_reserve_2d(fmt, omap_obj->width, omap_obj->height,
        |                         ~~~~~~~~~~~~~~~~
    758 |                                          PAGE_SIZE);
        |                                          ^~~~~~~~~
  arch/powerpc/include/asm/page.h:25:34: note: expanded from macro 'PAGE_SIZE'
     25 | #define PAGE_SIZE               (ASM_CONST(1) << PAGE_SHIFT)
        |                                  ~~~~~~~~~~~~~^~~~~~~~~~~~~
  drivers/gpu/drm/omapdrm/omap_gem.c:1504:44: error: implicit conversion from 'unsigned long' to 'u16' (aka 'unsigned short') changes value from 65536 to 0 [-Werror,-Wconstant-conversion]
   1504 |                         block = tiler_reserve_2d(fmts[i], w, h, PAGE_SIZE);
        |                                 ~~~~~~~~~~~~~~~~                ^~~~~~~~~
  arch/powerpc/include/asm/page.h:25:34: note: expanded from macro 'PAGE_SIZE'
     25 | #define PAGE_SIZE               (ASM_CONST(1) << PAGE_SHIFT)
        |                                  ~~~~~~~~~~~~~^~~~~~~~~~~~~
  2 errors generated.

As there is a lot of use of a u16 type throughout this driver and it
will only ever be run on hardware that has a 4KB page size, just
restrict compile testing to when the page size is less than 64KB (as no
other issues have been discussed and it keeps compile testing relatively
more available).

Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240620-omapdrm-restrict-compile-test-to-sub-64kb-page-size-v1-1-5e56de71ffca@kernel.org
2024-07-12 13:13:15 +10:00
Dave Airlie
864204e467 UAPI Changes:
- Rename xe perf layer as xe observation layer (Ashutosh)
 
 Driver Changes:
 - Drop trace_xe_hw_fence_free (Brost)
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEbSBwaO7dZQkcLOKj+mJfZA7rE8oFAmaP5OEACgkQ+mJfZA7r
 E8omlQf/SruWuGV3SaQq+gj3DmwPgYeFEE8Oml+xOTukex9OOEI2CbTKiKLpBm/y
 b83DYCCH2J0J3vb59GJsyHUQm8ETTbx1k6frErGHcQ5hYZoddrZw7bonH6gGCNoc
 Itg4j95FHIJPcysMrcmyYqWSnJpJxFfXuoFekgz3NQYnvCPHWNMmtaqD1PEyK9dr
 kQIg0V+mKyd2L4p4EoDxzIaEEyyp3zbrjQssWqTuE6Nh3R+1AvfgAOWmcOAUDASR
 Ptmy7helXgCPh/HO4qap8+aKseMUGqTr7LOJ+WTsWpbgp0zuMMHiGmqxSLMQgjiY
 Xn9I5ltbzcjH91SyWND9YVC9GXAdiQ==
 =+fR2
 -----END PGP SIGNATURE-----

Merge tag 'drm-xe-next-fixes-2024-07-11' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-next

UAPI Changes:
- Rename xe perf layer as xe observation layer (Ashutosh)

Driver Changes:
- Drop trace_xe_hw_fence_free (Brost)

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/Zo_3ustogPDVKZwu@intel.com
2024-07-12 12:52:28 +10:00
Dave Airlie
38e73004c2 A fix for fbdev on big endian systems, a condition fix for a sharp panel
at removal, and a fix for qxl to prevent unpinned buffer access under
 certain conditions.
 -----BEGIN PGP SIGNATURE-----
 
 iJUEABMJAB0WIQTkHFbLp4ejekA/qfgnX84Zoj2+dgUCZo/FPgAKCRAnX84Zoj2+
 dkt/AX9LM8fZS7ArDwJC706l+lEJSOgP6GUE2/xKuYaIkylbX7RYFD9bQGXZchxj
 o8mA4EoBgLTfgDFREmk8Fy+tWijwBdsisCzgpyktxBc2viutPMhDyJa0m2aSGn0h
 fMJbrJaFKw==
 =P2TW
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-next-fixes-2024-07-11' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next

A fix for fbdev on big endian systems, a condition fix for a sharp panel
at removal, and a fix for qxl to prevent unpinned buffer access under
certain conditions.

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maxime Ripard <mripard@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240711-benign-rich-mouflon-2eeafe@houat
2024-07-12 12:50:30 +10:00
Matthew Brost
26d289158e
drm/xe: Drop trace_xe_hw_fence_free
fence->ctx may be stale memory when trace_xe_hw_fence_free is called
resuling UAF bug when deriving the device name. This tracepoint is not
all that useful, so just drop it.

Fixes: 501c4255c409 ("drm/xe/trace: Print device_id in xe_trace events")
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Gustavo Sousa <gustavo.sousa@intel.com>
Cc: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240708211008.956384-1-matthew.brost@intel.com
(cherry picked from commit caaf1f44a6a27bae33eee189842c4d8fc21c3b02)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-07-11 09:54:24 -04:00
Ashutosh Dixit
63347fe031
drm/xe/uapi: Rename xe perf layer as xe observation layer
In Xe, the perf layer allows capture of HW counter streams. These HW
counters are generally performance related but don't have to be necessarily
so. Also, the name "perf" is a carryover from i915 and is not preferred.

Here we propose the name "observation" for this common layer which allows
capture of different types of these counter streams.

v2: Rename observability layer to observation layer (Lucas/Rodrigo)
v3: Rename sysctl file to "observation_paranoid" (Jose)

Fixes: 52c2e956dceb ("drm/xe/perf/uapi: "Perf" layer to support multiple perf counter stream types")
Fixes: fe8929bdf835 ("drm/xe/perf/uapi: Add perf_stream_paranoid sysctl")
Acked-by: Lucas De Marchi <lucas.demarchi@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Acked-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240703164801.2561423-1-ashutosh.dixit@intel.com
(cherry picked from commit 8169b2097d88d99d7e4a72e20e4b549efe9eb8d7)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-07-11 09:54:24 -04:00
Alex Deucher
8030f6533e drm/amdgpu: remove exp hw support check for gfx12
Enable it by default.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-10 13:45:57 -04:00
YiPeng Chai
e23300dfff drm/amdgpu: timely save bad pages to eeprom after gpu ras reset is completed
The problem case is as follows:
1. GPU A triggers a gpu ras reset, and GPU A drives
   GPU B to also perform a gpu ras reset.
2. After gpu B ras reset started, gpu B queried a DE
   data. Since the DE data was queried in the ras reset
   thread instead of the page retirement thread, bad
   page retirement work would not be triggered. Then
   even if all gpu resets are completed, the bad pages
   will be cached in RAM until GPU B's bad page retirement
   work is triggered again and then saved to eeprom.

This patch can save the bad pages to eeprom in time after gpu
ras reset is completed.

v2:
  1. Add the above description to code comments.
  2. Reuse existing function.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-10 10:13:41 -04:00
YiPeng Chai
c04706914d drm/amdgpu: flush all cached ras bad pages to eeprom
Before uninstalling gpu driver, flush all cached ras
bad pages to eeprom.

v2:
  Put the same code into a function and reuse the function.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-10 10:13:35 -04:00
Sunil Khatri
c39385710c drm/amdgpu: select compute ME engines dynamically
GFX ME right now is one but this could change in
future SOC's. Use no of ME for GFX as start point
for ME for compute for GFX12.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-10 10:13:22 -04:00
Aurabindo Pillai
21e6f6085b drm/amd/display: Allow display DCC for DCN401
To enable mesa to use display dcc, DM should expose them in the
supported modifiers. Add the best (most efficient) modifiers first.

Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-10 10:13:10 -04:00
Sunil Khatri
a85cc86cce drm/amdgpu: select compute ME engines dynamically
GFX ME right now is one but this could change in
future SOC's. Use no of ME for GFX as start point
for ME for compute for GFX11.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-10 10:13:04 -04:00
Alex Deucher
7d570f56f1 drm/amdgpu/job: Replace DRM_INFO/ERROR logging
Use the dev_info/err variants so we get per device logging
in multi-GPU cases.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-10 10:12:55 -04:00
Sunil Khatri
948f2828a6 drm/amdgpu: select compute ME engines dynamically
GFX ME right now is one but this could change in
future SOC's. Use no of ME for GFX as start point
for ME for compute for GFX10.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-10 10:12:48 -04:00
Danijel Slivka
708f220567 drm/amd/pm: Ignore initial value in smu response register
Why:
If the reg mmMP1_SMN_C2PMSG_90 is being written to during amdgpu driver
load or driver unload, subsequent amdgpu driver load will fail at
smu_hw_init. The default of mmMP1_SMN_C2PMSG_90 register at a clean
environment is 0x1 and if value differs from expected, amdgpu driver
load will fail.

How to fix:
Ignore the initial value in smu response register before the first smu
message is sent,if smc in SMU_FW_INIT state, just proceed further to
send the message. If register holds an unexpected value after smu message
was sent set, smc_state to SMU_FW_HANG state and no further smu messages
will be sent.

v2:
Set SMU_FW_INIT state at the start of smu hw_init/resume.
Check smc_fw_state before sending smu message if in hang state skip
sending message.
Set SMU_FW_HANG only in case unexpected value is detected

Signed-off-by: Danijel Slivka <danijel.slivka@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-10 10:12:37 -04:00
Lijo Lazar
d02ddefc7e drm/amdgpu: Initialize VF partition mode
For SOCs with GFX v9.4.3, a VF may have multiple compute partitions.
Fetch the partition information during init and initialize partition
nodes. There is no support to switch partition mode in VF mode, hence
disable the same.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-10 10:12:28 -04:00
Gavin Wan
5d64af40e3 drm/amd/amdgpu: fix SDMA IRQ client ID <-> req mapping.
sdma has 2 instances in SRIOV cpx mode. Odd numbered VFs have
sdma0/sdma1 instances. Even numbered vfs have sdma2/sdma3. For
Even numbered vfs, the sdma2 & sdma3 (irq srouce id
CLIENTID_SDMA2 and CLIENTID_SDMA3) should map to irq seq 0 & 1.

Signed-off-by: Gavin Wan <Gavin.Wan@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-10 10:12:19 -04:00
Alex Deucher
89d568ab90 MAINTAINERS: fix Xinhui's name
Switch to fist last for consistency.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Xinhui Pan <Xinhui.Pan@amd.com>
2024-07-10 10:12:15 -04:00
Alex Deucher
1fe5fa5ba1 MAINTAINERS: update powerplay and swsmu
Evan is no longer maintaining powerplay and swsmu.
Add Kenneth Feng as his replacement.

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Kenneth Feng <kenneth.feng@amd.com>
2024-07-10 10:12:00 -04:00
Daniel Vetter
dbf35b4dea Merge tag 'drm-intel-next-2024-06-28' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-next
drm/i915 feature pull #2 for v6.11:

Features and functionality:
- More eDP Panel Replay enabling (Jouni)
- Add async flip and flip done tracepoints (Ville)

Refactoring and cleanups:
- Clean up BDW+ pipe interrupt register definitions (Ville)
- Prep work for DSB based plane programming (Ville)
- Relocate encoder suspend/shutdown helpers (Imre)
- Polish plane surface alignment handling (Ville)

Fixes:
- Enable more fault interrupts on TGL+/MTL+ (Ville)
- Fix CMRR 32-bit build (Mitul)
- Fix PSR Selective Update Region Scan Line Capture Indication (Jouni)
- Fix cursor fb unpinning (Maarten, Ville)
- Fix Cx0 PHY PLL state verification in TBT mode (Imre)
- Fix unnecessary MG DP programming on MTL+ Type-C (Imre)

DRM changes:
- Rename drm_plane_check_pixel_format() to drm_plane_has_format() and export
  (Ville)
- Add drm_vblank_work_flush_all() (Maarten)

Xe driver changes:
- Call encoder .suspend_complete() hook also on Xe (Imre)

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/875xttazx2.fsf@intel.com
2024-07-10 10:36:47 +02:00
Thomas Zimmermann
c537fb4e3d drm/qxl: Pin buffer objects for internal mappings
Add qxl_bo_pin_and_vmap() that pins and vmaps a buffer object in one
step. Update callers of the regular qxl_bo_vmap(). Fixes a bug where
qxl accesses an unpinned buffer object while it is being moved; such
as with the monitor-description BO. An typical error is shown below.

[    4.303586] [drm:drm_atomic_helper_commit_planes] *ERROR* head 1 wrong: 65376256x16777216+0+0
[    4.586883] [drm:drm_atomic_helper_commit_planes] *ERROR* head 1 wrong: 65376256x16777216+0+0
[    4.904036] [drm:drm_atomic_helper_commit_planes] *ERROR* head 1 wrong: 65335296x16777216+0+0
[    5.374347] [drm:qxl_release_from_id_locked] *ERROR* failed to find id in release_idr

Commit b33651a5c98d ("drm/qxl: Do not pin buffer objects for vmap")
removed the implicit pin operation from qxl's vmap code. This is the
correct behavior for GEM and PRIME interfaces, but the pin is still
needed for qxl internal operation.

Also add a corresponding function qxl_bo_vunmap_and_unpin() and remove
the old qxl_bo_vmap() helpers.

Future directions: BOs should not be pinned or vmapped unnecessarily.
The pin-and-vmap operation should be removed from the driver and a
temporary mapping should be established with a vmap_local-like helper.
See the client helper drm_client_buffer_vmap_local() for semantics.

v2:
- unreserve BO on errors in qxl_bo_pin_and_vmap() (Dmitry)

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Fixes: b33651a5c98d ("drm/qxl: Do not pin buffer objects for vmap")
Reported-by: David Kaplan <david.kaplan@amd.com>
Closes: https://lore.kernel.org/dri-devel/ab0fb17d-0f96-4ee6-8b21-65d02bb02655@suse.de/
Tested-by: David Kaplan <david.kaplan@amd.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Zack Rusin <zack.rusin@broadcom.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: virtualization@lists.linux.dev
Cc: spice-devel@lists.freedesktop.org
Reviewed-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Reviewed-by: Zack Rusin <zack.rusin@broadcom.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240708142208.194361-1-tzimmermann@suse.de
2024-07-10 09:12:42 +02:00
Douglas Anderson
ec85147a35 drm/panel: sharp-lq101r1sx01: Fixed reversed "if" in remove
Commit d7d473d8464e ("drm/panel: sharp-lq101r1sx01: Don't call disable
at shutdown/remove") had a subtle bug. We should be calling
sharp_panel_del() when the "sharp" variable is non-NULL, not when it's
NULL. Fix.

Fixes: d7d473d8464e ("drm/panel: sharp-lq101r1sx01: Don't call disable at shutdown/remove")
Cc: Thierry Reding <treding@nvidia.com>
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/r/202406261525.SkhtM3ZV-lkp@intel.com/
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20240708105221.1.I576751c661c7edb6b804dda405d10e2e71153e32@changeid
2024-07-09 08:24:05 -07:00
Zhigang Luo
ee98fb71ba drm/amdgpu: set CP_HQD_PQ_DOORBELL_CONTROL.DOORBELL_MODE to 1
to avoid reading wrong WPTR from doorbell in sriov vf, set
CP_HQD_PQ_DOORBELL_CONTROL.DOORBELL_MODE to 1 to read WPTR from MQD.

Signed-off-by: Zhigang Luo <Zhigang.Luo@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:56:27 -04:00
Kent Russell
9e4c9ee0ba Documentation/amdgpu: Clarify MI200 and MI300 entries
Add "Series" to MI200 and MI300 to clarify that they represent the
series of cards, and to more closely match the product information
materials. This also matches other entries in this list

Also correct a typo in the MI300 codename (Vangaram->Vanjaram)

Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:56:20 -04:00
Yang Wang
59f488be76 drm/amdgpu: add ras event state device attribute support
add amdgpu ras 'event_state' sysfs device attribute support

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:56:13 -04:00
Li Ma
1dd34092c1 drm/amd/swsmu: enable more Pstates profile levels for SMU v14.0.0 and v14.0.1
V1: 	This patch enables following UMD stable Pstates profile
	levels for power_dpm_force_performance_level interface.

	- profile_peak
	- profile_min_mclk
	- profile_min_sclk
	- profile_standard

V2:	Fix conflict with commit "drm/amd/pm: smu v14.0.4 reuse smu v14.0.0 dpmtable "

V3:	Add VCLK1 and DCLK1 support for SMU V14.0.1
	And avoid to set VCLK1 and DCLK1 for SMU v14.0.0

Signed-off-by: Li Ma <li.ma@amd.com>
Reviewed-by: Tim Huang <tim.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:55:43 -04:00
Yang Wang
12b435a40c drm/amdgpu: add ras POSION_CONSUMPTION event id support
add amdgpu ras POSION_CONSUMPTION event id support.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:55:37 -04:00
Stanley.Yang
91ba536ead drm/amdkfd: Use mode1 reset for GFX v9.4.4
GFX v9.4.4 uses mode1 reset to handle poison consumption.

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:55:26 -04:00
Yang Wang
5b9de2596f drm/amdgpu: add ras POSION_CREATION event id support
add amdgpu ras POSION_CREATION event id support.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:55:18 -04:00
Yang Wang
75ac6a2506 drm/amdgpu: refine amdgpu ras event id core code
v1:
- use unified event id to manage ras events
- add a new function amdgpu_ras_query_error_status_with_event() to accept
  event type as parameter.

v2:
add a warn log to show the location of function failure
when calling amdgpu_ras_mark_event(). (Tao Zhou)

v3:
change RAS_EVENT_TYPE_ISR to RAS_EVENT_TYPE_FATAL.

v4:
rename amdgpu_ras_get_recovery_event() to
amdgpu_ras_get_fatal_error_event().

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:55:11 -04:00
Wayne Lin
e33697141b drm/amd/display: Solve mst monitors blank out problem after resume
[Why]
In dm resume, we firstly restore dc state and do the mst resume for topology
probing thereafter. If we change dpcd DP_MSTM_CTRL value after LT in mst reume,
it will cause light up problem on the hub.

[How]
Revert commit 202dc359adda ("drm/amd/display: Defer handling mst up request in resume").
And adjust the reason to trigger dc_link_detect by DETECT_REASON_RESUMEFROMS3S4.

Cc: stable@vger.kernel.org
Fixes: 202dc359adda ("drm/amd/display: Defer handling mst up request in resume")
Signed-off-by: Wayne Lin <Wayne.Lin@amd.com>
Reviewed-by: Fangzhi Zuo <jerry.zuo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:51:02 -04:00
Christian König
320debca1b drm/amdgpu: reject gang submit on reserved VMIDs
A gang submit won't work if the VMID is reserved and we can't flush out
VM changes from multiple engines at the same time.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:50:53 -04:00
Saleemkhan Jamadar
b6ad109166 drm/amdgpu: enable dpg for vcn and jpeg on GC 11_5_2
DPG mode is enabled for vcn and jpeg on VCN v4_0_5

Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Reviewed-by: Tim Huang <tim.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:50:40 -04:00
Yang Wang
332210c13a drm/amdgpu: remove redundant semicolons in RAS_EVENT_LOG
remove redundant semicolons in RAS_EVENT_LOG to avoid
code format check warning.

Fixes: b712d7c20133 ("drm/amdgpu: fix compiler 'side-effect' check issue for RAS_EVENT_LOG()")
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:47:43 -04:00
Frank Min
54837bd2be drm/amdgpu: restore dcc bo tilling configs while moving
While moving buffer which has dcc tiling config, it is needed to restore
its original dcc tiling.

1. extend copy flag to cover tiling bits
2. add logic to restore original dcc tiling config

Signed-off-by: Frank Min <Frank.Min@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:47:27 -04:00
Sunil Khatri
c8714ac982 drm/amdgpu: add gfx queue support for gfx12 ipdump
Add support of all the CP GFX queues for gfx12 ipdump
to be used by devcoredump.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:47:22 -04:00