27063 Commits

Author SHA1 Message Date
YuBiao Wang
cc39f9ccb8 drm/amdkfd: Use gpu_offset for user queue's wptr
Directly use tbo's start address will miss the domain start offset. Need
to use gpu_offset instead.

Signed-off-by: YuBiao Wang <YuBiao.Wang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2023-09-20 17:30:42 -04:00
Hamza Mahfooz
2de19022c5 drm/amd/display: fix the ability to use lower resolution modes on eDP
On eDP we can receive invalid modes from dm_update_crtc_state() for
entirely new streams for which drm_mode_set_crtcinfo() shouldn't be
called on. So, instead of calling drm_mode_set_crtcinfo() from within
create_stream_for_sink() we can instead call it from
amdgpu_dm_connector_mode_valid(). Since, we are guaranteed to only call
drm_mode_set_crtcinfo() for valid modes from that function (invalid
modes are rejected by that callback) and that is the only user
of create_validate_stream_for_sink() that we need to call
drm_mode_set_crtcinfo() for (as before commit cb841d27b876
("drm/amd/display: Always pass connector_state to stream validation"),
that is the only place where create_validate_stream_for_sink()'s
dm_state was NULL).

Cc: stable@vger.kernel.org
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2693
Fixes: cb841d27b876 ("drm/amd/display: Always pass connector_state to stream validation")
Tested-by: Mark Broadworth <mark.broadworth@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-20 17:28:34 -04:00
Cong Liu
f387bb578d drm/amdgpu: fix a memory leak in amdgpu_ras_feature_enable
This patch fixes a memory leak in the amdgpu_ras_feature_enable() function.
The leak occurs when the function sends a command to the firmware to enable
or disable a RAS feature for a GFX block. If the command fails, the kfree()
function is not called to free the info memory.

Fixes: 9f051d6ff13f ("drm/amdgpu: Free ras cmd input buffer properly")
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Cong Liu <liucong2@kylinos.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-20 17:27:04 -04:00
Lijo Lazar
06cce38ef5 Revert "drm/amdgpu: Report vbios version instead of PN"
This reverts commit 7748ce5b69581325cae40c2134088820f0957902.

vbios_version sysfs node is used to identify Part Number also. Revert to
the same so that it doesn't break scripts/software which parse this.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-20 17:26:15 -04:00
Muhammad Ahmed
6f6583e58d drm/amd/display: Fix MST recognizes connected displays as one
[What]
MST now recognizes both connected displays

Fixes: 927e784c180c ("drm/amd/display: Add symclk enable/disable during stream enable/disable")
Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Acked-by: Stylon Wang <stylon.wang@amd.com>
Signed-off-by: Muhammad Ahmed <ahmed.ahmed@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-20 17:23:30 -04:00
Dave Airlie
1216d49178 amd-drm-fixes-6.6-2023-09-13:
amdgpu:
 - GC 9.4.3 fixes
 - Fix white screen issues with S/G display on system with >= 64G of ram
 - Replay fixes
 - SMU 13.0.6 fixes
 - AUX backlight fix
 - NBIO 4.3 SR-IOV fixes for HDP
 - RAS fixes
 - DP MST resume fix
 - Fix segfault on systems with no vbios
 - DPIA fixes
 
 amdkfd:
 - CWSR grace period fix
 - Unaligned doorbell fix
 - CRIU fix for GFX11
 - Add missing TLB flush on gfx10 and newer
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZQIRSAAKCRC93/aFa7yZ
 2O/nAP4zB0fdLB46Hhz11aYsE9Zghe91b2rcmF4EYpEAQs7awwEAhSjy0Wiy6EYb
 prEGCdW0O8Tq7fdjr7+JrPmF7dasAQk=
 =SUbg
 -----END PGP SIGNATURE-----

Merge tag 'amd-drm-fixes-6.6-2023-09-13' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-6.6-2023-09-13:

amdgpu:
- GC 9.4.3 fixes
- Fix white screen issues with S/G display on system with >= 64G of ram
- Replay fixes
- SMU 13.0.6 fixes
- AUX backlight fix
- NBIO 4.3 SR-IOV fixes for HDP
- RAS fixes
- DP MST resume fix
- Fix segfault on systems with no vbios
- DPIA fixes

amdkfd:
- CWSR grace period fix
- Unaligned doorbell fix
- CRIU fix for GFX11
- Add missing TLB flush on gfx10 and newer

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230913195009.7714-1-alexander.deucher@amd.com
2023-09-15 09:50:50 +10:00
Daniel Vetter
15794f9dc3 One doc fix for drm/connector, one fix for amdgpu for an crash when
VRAM usage is high, and one fix in gm12u320 to fix the timeout units in
 the code
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRcEzekXsqa64kGDp7j7w1vZxhRxQUCZPl/TAAKCRDj7w1vZxhR
 xWZZAP0b3k5vIuQdbiZBdXy7+guakiJ2DqOMxJJ+sYS5Mun53AEA73Cu1gmBNMoT
 d8H1uBjOfvPcXANNI0t0OgJfrESOdg8=
 =atPC
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-fixes-2023-09-07' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes

One doc fix for drm/connector, one fix for amdgpu for an crash when
VRAM usage is high, and one fix in gm12u320 to fix the timeout units in
the code

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Maxime Ripard <mripard@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/w5nlld5ukeh6bgtljsxmkex3e7s7f4qquuqkv5lv4cv3uxzwqr@pgokpejfsyef
2023-09-14 14:00:51 +02:00
Harish Kasiviswanathan
edcfe22985 drm/amdkfd: Insert missing TLB flush on GFX10 and later
Heavy-weight TLB flush is required after unmap on all GPUs for
correctness and security.

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2023-09-12 17:45:40 -04:00
Mustapha Ghaddar
2931937844 drm/amd/display: Fix 2nd DPIA encoder Assignment
[HOW & Why]
There seems to be an issue with 2nd DPIA acquiring link encoder for tiled displays.
Solution is to remove check for eng_id before we get first dynamic encoder for it

Reviewed-by: Cruise Hung <cruise.hung@amd.com>
Reviewed-by: Meenakshikumar Somasundaram <meenakshikumar.somasundaram@amd.com>
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Acked-by: Stylon Wang <stylon.wang@amd.com>
Signed-off-by: Mustapha Ghaddar <mghaddar@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11 18:30:15 -04:00
Mustapha Ghaddar
64be47ba28 drm/amd/display: Add DPIA Link Encoder Assignment Fix
For DPIA we should have preferred DIG assignment based on DPIA selected
as per the ASIC design.

Reviewed-by: George Shen <george.shen@amd.com>
Acked-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Mustapha Ghaddar <mghaddar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2023-09-11 18:30:03 -04:00
Randy Dunlap
db5494a852 drm/amd/display: fix replay_mode kernel-doc warning
Fix the typo in the kernel-doc for @replay_mode to prevent
kernel-doc warnings:

drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h:623: warning: Incorrect use of kernel-doc format:          * @replay mode: Replay supported
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h:626: warning: Function parameter or member 'replay_mode' not described in 'amdgpu_hdmi_vsdb_info'

Fixes: ec8e59cb4e0c ("drm/amd/display: Get replay info from VSDB")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Leo Li <sunpeng.li@amd.com>
Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11 18:25:39 -04:00
David Francis
5e7e822542 drm/amdgpu: Handle null atom context in VBIOS info ioctl
On some APU systems, there is no atom context and so the
atom_context struct is null.

Add a check to the VBIOS_INFO branch of amdgpu_info_ioctl
to handle this case, returning all zeroes.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: David Francis <David.Francis@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11 18:25:26 -04:00
David Francis
9296da8c40 drm/amdkfd: Checkpoint and restore queues on GFX11
The code in kfd_mqd_manager_v11.c to support criu dump and
restore of queue state was missing.

Added it; should be equivalent to kfd_mqd_manager_v10.c.

CC: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: David Francis <David.Francis@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11 18:22:38 -04:00
Wayne Lin
ec5fa9fcde drm/amd/display: Adjust the MST resume flow
[Why]
In drm_dp_mst_topology_mgr_resume() today, it will resume the
mst branch to be ready handling mst mode and also consecutively do
the mst topology probing. Which will cause the dirver have chance
to fire hotplug event before restoring the old state. Then Userspace
will react to the hotplug event based on a wrong state.

[How]
Adjust the mst resume flow as:
1. set dpcd to resume mst branch status
2. restore source old state
3. Do mst resume topology probing

For drm_dp_mst_topology_mgr_resume(), it's better to adjust it to
pull out topology probing work into a 2nd part procedure of the mst
resume. Will have a follow up patch in drm.

Reviewed-by: Chao-kai Wang <stylon.wang@amd.com>
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Acked-by: Stylon Wang <stylon.wang@amd.com>
Signed-off-by: Wayne Lin <wayne.lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11 18:21:50 -04:00
Hawking Zhang
ffd6bde302 drm/amdgpu: fallback to old RAS error message for aqua_vanjaram
So driver doesn't generate incorrect message until
the new format is settled down for aqua_vanjaram

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11 18:20:07 -04:00
Alex Deucher
ab43213e7a drm/amdgpu/nbio4.3: set proper rmmio_remap.reg_offset for SR-IOV
Needed for HDP flush to work correctly.

Reviewed-by: Timmy Tsai <timmtsai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11 18:19:48 -04:00
Alex Deucher
1832403cd4 drm/amdgpu/soc21: don't remap HDP registers for SR-IOV
This matches the behavior for soc15 and nv.

Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Timmy Tsai <timmtsai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11 18:19:42 -04:00
Swapnil Patel
f5b2c10b57 drm/amd/display: Don't check registers, if using AUX BL control
[Why]
Currently the driver looks DCN registers to access if BL is on or not.
This check is not valid if we are using AUX based brightness control.
This causes driver to not send out "backlight off" command during power off
sequence as it already thinks it is off.

[How]
Only check DCN registers if we aren't using AUX based brightness control.

Reviewed-by: Wenjing Liu <wenjing.liu@amd.com>
Acked-by: Stylon Wang <stylon.wang@amd.com>
Signed-off-by: Swapnil Patel <swapnil.patel@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11 18:19:31 -04:00
Dan Carpenter
81cc8779cf drm/amdgpu: fix retry loop test
This loop will exit with "retry" set to -1 if it fails but the code
checks for if "retry" is zero.  Fix this by changing post-op to a
pre-op.  --retry vs retry--.

Fixes: e01eeffc3f86 ("drm/amd/pm: avoid driver getting empty metrics table for the first time")
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11 18:19:03 -04:00
Bhawanpreet Lakha
679fc891bf drm/amd/display: Add dirty rect support for Replay
Dirty rect can be used with replay, so enable them to allow for more
powersaving.

Reviewed-by: Sun peng Li <sunpeng.li@amd.com>
Acked-by: Stylon Wang <stylon.wang@amd.com>
Signed-off-by: Bhawanpreet Lakha <bhawanpreet.lakha@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11 18:18:53 -04:00
Hamza Mahfooz
169ed4ece8 Revert "drm/amd: Disable S/G for APUs when 64GB or more host memory"
This reverts commit 70e64c4d522b732e31c6475a3be2349de337d321.

Since, we now have an actual fix for this issue, we can get rid of this
workaround as it can cause pin failures if enough VRAM isn't carved out
by the BIOS.

Cc: stable@vger.kernel.org # 6.1+
Acked-by: Harry Wentland <harry.wentland@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11 18:18:17 -04:00
Yifan Zhang
ef064187a9 drm/amd/display: fix the white screen issue when >= 64GB DRAM
Dropping bit 31:4 of page table base is wrong, it makes page table
base points to wrong address if phys addr is beyond 64GB; dropping
page_table_start/end bit 31:4 is unnecessary since dcn20_vmid_setup
will do that. Also, while we are at it, cleanup the assignments using
upper_32_bits()/lower_32_bits() and AMDGPU_GPU_PAGE_SHIFT.

Cc: stable@vger.kernel.org
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2354
Fixes: 81d0bcf99009 ("drm/amdgpu: make display pinning more flexible (v2)")
Acked-by: Harry Wentland <harry.wentland@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Co-developed-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11 18:18:09 -04:00
Mukul Joshi
fc6efed2c7 drm/amdkfd: Update CU masking for GFX 9.4.3
The CU mask passed from user-space will change based on
different spatial partitioning mode. As a result, update
CU masking code for GFX9.4.3 to work for all partitioning
modes.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11 18:17:27 -04:00
Mukul Joshi
0752e66e91 drm/amdkfd: Update cache info reporting for GFX v9.4.3
Update cache info reporting in sysfs to report the correct
number of CUs and associated cache information based on
different spatial partitioning modes.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11 18:17:20 -04:00
Mukul Joshi
97e3c6a853 drm/amdgpu: Store CU info from all XCCs for GFX v9.4.3
Currently, we store CU info only for a single XCC assuming
that it is the same for all XCCs. However, that may not be
true. As a result, store CU info for all XCCs. This info is
later used for CU masking.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11 18:16:31 -04:00
Mukul Joshi
2f06b27444 drm/amdkfd: Fix unaligned 64-bit doorbell warning
This patch fixes the following unaligned 64-bit doorbell
warning seen when submitting packets on HIQ on GFX v9.4.3
by making the HIQ doorbell 64-bit aligned.
The warning is seen when GPU is loaded in any mode other
than SPX mode.

[  +0.000301] ------------[ cut here ]------------
[  +0.000003] Unaligned 64-bit doorbell
[  +0.000030] WARNING: /amdkfd/kfd_doorbell.c:339 write_kernel_doorbell64+0x72/0x80
[  +0.000003] RIP: 0010:write_kernel_doorbell64+0x72/0x80
[  +0.000004] RSP: 0018:ffffc90004287730 EFLAGS: 00010246
[  +0.000005] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[  +0.000003] RDX: 0000000000000001 RSI: ffffffff82837c71 RDI: 00000000ffffffff
[  +0.000003] RBP: ffffc90004287748 R08: 0000000000000003 R09: 0000000000000001
[  +0.000002] R10: 000000000000001a R11: ffff88a034008198 R12: ffffc900013bd004
[  +0.000003] R13: 0000000000000008 R14: ffffc900042877b0 R15: 000000000000007f
[  +0.000003] FS:  00007fa8c7b62000(0000) GS:ffff889f88400000(0000) knlGS:0000000000000000
[  +0.000004] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  +0.000003] CR2: 000056111c45aaf0 CR3: 00000001414f2002 CR4: 0000000000770ee0
[  +0.000003] PKRU: 55555554
[  +0.000002] Call Trace:
[  +0.000004]  <TASK>
[  +0.000006]  kq_submit_packet+0x45/0x50 [amdgpu]
[  +0.000524]  pm_send_set_resources+0x7f/0xc0 [amdgpu]
[  +0.000500]  set_sched_resources+0xe4/0x160 [amdgpu]
[  +0.000503]  start_cpsch+0x1c5/0x2a0 [amdgpu]
[  +0.000497]  kgd2kfd_device_init.cold+0x816/0xb42 [amdgpu]
[  +0.000743]  amdgpu_amdkfd_device_init+0x15f/0x1f0 [amdgpu]
[  +0.000602]  amdgpu_device_init.cold+0x1813/0x2176 [amdgpu]
[  +0.000684]  ? pci_bus_read_config_word+0x4a/0x80
[  +0.000012]  ? do_pci_enable_device+0xdc/0x110
[  +0.000008]  amdgpu_driver_load_kms+0x1a/0x110 [amdgpu]
[  +0.000545]  amdgpu_pci_probe+0x197/0x400 [amdgpu]

Fixes: c31866651086 ("drm/amdgpu: use doorbell mgr for kfd kernel doorbells")
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11 18:15:49 -04:00
Mukul Joshi
81faf9e0c3 drm/amdkfd: Fix reg offset for setting CWSR grace period
This patch fixes the case where the code currently passes
absolute register address and not the reg offset, which HWS
expects, when sending the PM4 packet to set/update CWSR grace
period. Additionally, cleanup the signature of
build_grace_period_packet_info function as it no longer needs
the inst parameter.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11 18:15:43 -04:00
Linus Torvalds
a48fa7efaf drm fixes for 6.6-rc1
amdgpu:
 - Display replay fixes
 - Fixes for headless boards
 - Fix documentation breakage
 - RAS fixes
 - Handle newer IP discovery tables
 - SMU 13.0.6 fixes
 - SR-IOV fixes
 - Display vstartup fixes
 - NBIO 7.9 fixes
 - Display scaling mode fixes
 - Debugfs power reporting fix
 - GC 9.4.3 fixes
 - Dirty framebuffer fixes for fbcon
 - eDP fixes
 - DCN 3.1.5 fix
 - Display ODM fixes
 - GPU core dump fix
 - Re-enable zops property now that IGT test is fixed
 - Fix possible UAF in CS code
 - Cursor degamma fix
 
 amdkfd:
 - HMM fixes
 - Interrupt masking fix
 - GFX11 MQD fixes
 
 i915:
 - Mark requests for GuC virtual engines to avoid use-after-free
 
 nouveau:
 - Fix fence state in nouveau_fence_emit()
 
 ivpu:
 - replace strncpy
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmT6ihUACgkQDHTzWXnE
 hr6Vqg/+OGfbxx0qev5C93bLYpg8d4zbply0zTed2hU48zczARkOyDH+h2uYM4tP
 rmB6mK/0KRy3H8vkngKduR/IF6QBwzOnLDpS+C/TrHYQYqMDwvs3qEDVYqXh3V5H
 GPIFuu9sb2Nb/o9Fid70pvNACbgGsAUgqMUhQ/is3NjzOR8S/qjdBQ7wSdoUoOqx
 okTdlwuMK5SEYrihSyTZvhNcwpR/1L8JuxOUXXXUSQ0tRXBer/ZNF2lcEyYmQ0zs
 bZHKM4dNbdew/EhygbH6LVB5RjFaT5pGw08Xm8zJry+q5tXQV/NIXPQHL3vWqQoX
 i2QLbvGX/Uu8LJg9YNdsa1kPwNKADAxF64cW38Llv8ybsPHyva/I255j689/TvSG
 Se7HkTooURKS6GWFHPOkyMNC0Y+Fb/7WG5zUPSDVk9stqJz9pzx48okTsCWeuovD
 cBOssp8If1QsTyPvDq5A47l5z1oO3J1rdJ9fL0GnpOuXulPhgJGIoUSkftyI6lbw
 rhUoRd7w6VcgOsA9WIkAf+/325em0Y0AKZBgQnr2jfF56IE4iLa8yFuJNeReZ9oy
 W9yf14AB0orVm9+P4+PqATXBh+PdCTD1CPcB0MyK1SZAjh836Tc0HPKvJAUU6jp+
 8aw3BKXcaLjIP/dCyhSddXCnuuTPWreff5isktwgEXUtNFv9jx0=
 =fPeN
 -----END PGP SIGNATURE-----

Merge tag 'drm-next-2023-09-08' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
 "Regular rounds of rc1 fixes, a large bunch for amdgpu since it's three
  weeks in one go, one i915, one nouveau and one ivpu.

  I think there might be a few more fixes in misc that I haven't pulled
  in yet, but we should get them all for rc2.

  amdgpu:
   - Display replay fixes
   - Fixes for headless boards
   - Fix documentation breakage
   - RAS fixes
   - Handle newer IP discovery tables
   - SMU 13.0.6 fixes
   - SR-IOV fixes
   - Display vstartup fixes
   - NBIO 7.9 fixes
   - Display scaling mode fixes
   - Debugfs power reporting fix
   - GC 9.4.3 fixes
   - Dirty framebuffer fixes for fbcon
   - eDP fixes
   - DCN 3.1.5 fix
   - Display ODM fixes
   - GPU core dump fix
   - Re-enable zops property now that IGT test is fixed
   - Fix possible UAF in CS code
   - Cursor degamma fix

  amdkfd:
   - HMM fixes
   - Interrupt masking fix
   - GFX11 MQD fixes

  i915:
   - Mark requests for GuC virtual engines to avoid use-after-free

  nouveau:
   - Fix fence state in nouveau_fence_emit()

  ivpu:
   - replace strncpy"

* tag 'drm-next-2023-09-08' of git://anongit.freedesktop.org/drm/drm: (51 commits)
  drm/amdgpu: Restrict bootloader wait to SMUv13.0.6
  drm/amd/display: prevent potential division by zero errors
  drm/amd/display: enable cursor degamma for DCN3+ DRM legacy gamma
  drm/amd/display: limit the v_startup workaround to ASICs older than DCN3.1
  Revert "drm/amd/display: Remove v_startup workaround for dcn3+"
  drm/amdgpu: fix amdgpu_cs_p1_user_fence
  Revert "Revert "drm/amd/display: Implement zpos property""
  drm/amdkfd: Add missing gfx11 MQD manager callbacks
  drm/amdgpu: Free ras cmd input buffer properly
  drm/amdgpu: Hide xcp partition sysfs under SRIOV
  drm/amdgpu: use read-modify-write mode for gfx v9_4_3 SQ setting
  drm/amdkfd: use mask to get v9 interrupt sq data bits correctly
  drm/amdgpu: Allocate coredump memory in a nonblocking way
  drm/amdgpu: Support query ecc cap for aqua_vanjaram
  drm/amdgpu: Add umc_info v4_0 structure
  drm/amd/display: always switch off ODM before committing more streams
  drm/amd/display: Remove wait while locked
  drm/amd/display: update blank state on ODM changes
  drm/amd/display: Add smu write msg id fail retry process
  drm/amdgpu: Add SMU v13.0.6 default reset methods
  ...
2023-09-07 19:47:04 -07:00
Lijo Lazar
fbe1a9e0c7 drm/amdgpu: Restrict bootloader wait to SMUv13.0.6
Restrict the wait for boot loader steady state only to SMUv13.0.6. For
older SOCs, ASIC init has a longer wait period and that takes care.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-06 22:11:51 -04:00
Hamza Mahfooz
07e388aab0 drm/amd/display: prevent potential division by zero errors
There are two places in apply_below_the_range() where it's possible for
a divide by zero error to occur. So, to fix this make sure the divisor
is non-zero before attempting the computation in both cases.

Cc: stable@vger.kernel.org
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2637
Fixes: a463b263032f ("drm/amd/display: Fix frames_to_insert math")
Fixes: ded6119e825a ("drm/amd/display: Reinstate LFC optimization")
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-06 22:10:11 -04:00
Melissa Wen
57a943ebfc drm/amd/display: enable cursor degamma for DCN3+ DRM legacy gamma
For DRM legacy gamma, AMD display manager applies implicit sRGB degamma
using a pre-defined sRGB transfer function. It works fine for DCN2
family where degamma ROM and custom curves go to the same color block.
But, on DCN3+, degamma is split into two blocks: degamma ROM for
pre-defined TFs and `gamma correction` for user/custom curves and
degamma ROM settings doesn't apply to cursor plane. To get DRM legacy
gamma working as expected, enable cursor degamma ROM for implict sRGB
degamma on HW with this configuration.

Cc: stable@vger.kernel.org
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2803
Fixes: 96b020e2163f ("drm/amd/display: check attr flag before set cursor degamma on DCN3+")
Signed-off-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-06 22:09:33 -04:00
Hamza Mahfooz
47428f4b63 drm/amd/display: limit the v_startup workaround to ASICs older than DCN3.1
Since, calling dcn20_adjust_freesync_v_startup() on DCN3.1+ ASICs
can cause the display to flicker and underflow to occur, we shouldn't
call it for them. So, ensure that the DCN version is less than
DCN_VERSION_3_1 before calling dcn20_adjust_freesync_v_startup().

Cc: stable@vger.kernel.org
Reviewed-by: Fangzhi Zuo <jerry.zuo@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-06 22:08:26 -04:00
Hamza Mahfooz
a81de4a22b Revert "drm/amd/display: Remove v_startup workaround for dcn3+"
This reverts commit 3a31e8b89b7240d9a17ace8a1ed050bdcb560f9e.

We still need to call dcn20_adjust_freesync_v_startup() for older DCN3+
ASICs. Otherwise, it can cause DP to HDMI 2.1 PCONs to fail to light up.

Cc: stable@vger.kernel.org
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2809
Reviewed-by: Fangzhi Zuo <jerry.zuo@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-06 22:08:10 -04:00
Simon Pilkington
e2884fe84a drm/amd: Make fence wait in suballocator uninterruptible
Commit c103a23f2f29
("drm/amd: Convert amdgpu to use suballocation helper.")
made the fence wait in amdgpu_sa_bo_new() interruptible but there is no
code to handle an interrupt. This caused the kernel to randomly explode
in high-VRAM-pressure situations so make it uninterruptible again.

Signed-off-by: Simon Pilkington <simonp.git@gmail.com>
Fixes: c103a23f2f29 ("drm/amd: Convert amdgpu to use suballocation helper.")
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2761
CC: stable@vger.kernel.org # 6.4+
2023-09-01 15:12:07 +02:00
Christian König
35588314e9 drm/amdgpu: fix amdgpu_cs_p1_user_fence
The offset is just 32bits here so this can potentially overflow if
somebody specifies a large value. Instead reduce the size to calculate
the last possible offset.

The error handling path incorrectly drops the reference to the user
fence BO resulting in potential reference count underflow.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2023-08-31 18:14:49 -04:00
Hamza Mahfooz
46528db355 Revert "Revert "drm/amd/display: Implement zpos property""
This reverts commit e2066eb4efe0e7d2d329d6e6765ed637a523ac45.

The problematic IGT test case (i.e. kms_atomic@plane-immutable-zpos) has
been fixed as of commit cb77add45011 ("tests/kms_atomic: remove zpos <
N-planes assert") to the IGT repo. So, reintroduce the reverted code.

Link: cb77add450
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 18:14:49 -04:00
Jay Cornwall
e9dca969b2 drm/amdkfd: Add missing gfx11 MQD manager callbacks
mqd_stride function was introduced in commit 2f77b9a242a2
("drm/amdkfd: Update MQD management on multi XCC setup")
but not assigned for gfx11. Fixes a NULL dereference in debugfs.

Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.5.x
2023-08-31 18:14:28 -04:00
Hawking Zhang
9f051d6ff1 drm/amdgpu: Free ras cmd input buffer properly
Do not access the pointer for ras input cmd buffer
if it is even not allocated.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Stanley Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 18:12:13 -04:00
Rajneesh Bhardwaj
2031c46b09 drm/amdgpu: Hide xcp partition sysfs under SRIOV
XCP partitions should not be visible for the VF for GFXIP 9.4.3.

Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 18:11:41 -04:00
Tao Zhou
e23b10675a drm/amdgpu: use read-modify-write mode for gfx v9_4_3 SQ setting
Instead of using direct update, avoid touching unrelated fields.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 18:11:09 -04:00
Alex Sierra
a1fe9e9f73 drm/amdkfd: use mask to get v9 interrupt sq data bits correctly
Interrupt sq data bits were not taken properly from contextid0 and contextid1.
Use macro KFD_CONTEXT_ID_GET_SQ_INT_DATA instead.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 18:10:28 -04:00
André Almeida
6d1b345548 drm/amdgpu: Allocate coredump memory in a nonblocking way
During a GPU reset, a normal memory reclaim could block to reclaim
memory. Giving that coredump is a best effort mechanism, it shouldn't
disturb the reset path. Change its memory allocation flag to a
nonblocking one.

Signed-off-by: André Almeida <andrealmeid@igalia.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 18:10:11 -04:00
Hawking Zhang
e0c5c387ac drm/amdgpu: Support query ecc cap for aqua_vanjaram
Driver queries umc_info v4_0 to identify ecc cap
for aqua_vanjaram

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Candice Li <candice.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 18:09:45 -04:00
Hawking Zhang
48d02dcba1 drm/amdgpu: Add umc_info v4_0 structure
To be used by aqua_vanjaram

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Candice Li <candice.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 18:09:38 -04:00
Wenjing Liu
49a30c3d1a drm/amd/display: always switch off ODM before committing more streams
ODM power optimization is only supported with single stream. When ODM
power optimization is enabled, we might not have enough free pipes for
enabling other stream. So when we are committing more than 1 stream we
should first switch off ODM power optimization to make room for new
stream and then allocating pipe resource for the new stream.

Cc: stable@vger.kernel.org
Fixes: 59de751e3845 ("drm/amd/display: add ODM case when looking for first split pipe")
Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Acked-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 18:09:14 -04:00
Gabe Teeger
5a3ccb1400 drm/amd/display: Remove wait while locked
[Why]
We wait for mpc idle while in a locked state, leading to potential
deadlock.

[What]
Move the wait_for_idle call to outside of HW lock. This and a
call to wait_drr_doublebuffer_pending_clear are moved added to a new
static helper function called wait_for_outstanding_hw_updates, to make
the interface clearer.

Cc: stable@vger.kernel.org
Fixes: 8f0d304d21b3 ("drm/amd/display: Do not commit pipe when updating DRR")
Reviewed-by: Jun Lei <jun.lei@amd.com>
Acked-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Gabe Teeger <gabe.teeger@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 18:08:19 -04:00
Wenjing Liu
1482650bc7 drm/amd/display: update blank state on ODM changes
When we are dynamically adding new ODM slices, we didn't update
blank state, if the pipe used by new ODM slice is previously blanked,
we will continue outputting blank pixel data on that slice causing
right half of the screen showing blank image.

The previous fix was a temporary hack to directly update current state
when committing new state. This could potentially cause hw and sw
state synchronization issues and it is not permitted by dc commit
design.

Cc: stable@vger.kernel.org
Fixes: 7fbf451e7639 ("drm/amd/display: Reinit DPG when exiting dynamic ODM")
Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Acked-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 18:06:41 -04:00
Fudong Wang
72105dcfa3 drm/amd/display: Add smu write msg id fail retry process
A benchmark stress test (12-40 machines x 48hours) found that DCN315 has
cases where DC writes to an indirect register to set the smu clock msg
id, but when we go to read the same indirect register the returned msg
id doesn't match with what we just set it to. So, to fix this retry the
write until the register's value matches with the requested value.

Cc: stable@vger.kernel.org # 6.1+
Fixes: f94903996140 ("drm/amd/display: Add DCN315 CLK_MGR")
Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Acked-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Fudong Wang <fudong.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 18:06:25 -04:00
Lijo Lazar
05347402d1 drm/amdgpu: Add SMU v13.0.6 default reset methods
For APUs with SMU v13.0.6, mode-2 reset is kept as default and for
others mode-1 is the default reset method.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Tested-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 18:05:43 -04:00
Wenjing Liu
39c8b93a10 Partially revert "drm/amd/display: update add plane to context logic with a new algorithm"
This partially reverts commit 460ea8980511 ("drm/amd/display: update add
plane to context logic with a new algorithm").

The new secondary pipe allocation logic triggers an issue with a
specific hardware state transition and causes a frame of corruption when
toggling between windowed MPO and ODM desktop only mode. Ideally hwss is
supposed to handle this scenario. We are temporarily reverting the logic
and investigate the root cause why this transition would cause
corruptions.

Fixes: 460ea8980511 ("drm/amd/display: update add plane to context logic with a new algorithm")
Reviewed-by: Martin Leung <martin.leung@amd.com>
Acked-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 18:05:26 -04:00