81260 Commits

Author SHA1 Message Date
Nicholas Kazlauskas
c856f16c33 drm/amd/display: Set optimize_pwr_state for DCN31
[Why]
We'll exit optimized power state to do link detection but we won't enter
back into the optimized power state.

This could potentially block s2idle entry depending on the sequencing,
but it also means we're losing some power during the transition period.

[How]
Hook up the handler like DCN21. It was also missed like the
exit_optimized_pwr_state callback.

Fixes: 64b1d0e8d500 ("drm/amd/display: Add DCN3.1 HWSEQ")

Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Reviewed-by: Eric Yang <Eric.Yang2@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-30 08:54:44 -05:00
George Shen
0d988e5de7 drm/amd/display: Remove CR AUX RD Interval limit for LTTPR
[Why]
DP spec specifies that DPRX shall use the read interval in the
TRAINING_AUX_RD_INTERVAL_PHY_REPEATER LTTPR DPCD register. This
register's bit definition is the same as the AUX read interval register
for DPRX.

[How}
Remove logic which forces AUX read interval to 100us for repeaters when
in LTTPR non-transparent mode.

Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Reviewed-by: Wesley Chalmers <wesley.chalmers@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: George Shen <George.Shen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-30 08:54:44 -05:00
Nicholas Kazlauskas
3db817fce4 drm/amd/display: Send s0i2_rdy in stream_count == 0 optimization
[Why]
Otherwise SMU won't mark Display as idle when trying to perform s2idle.

[How]
Mark the bit in the dcn31 codepath, doesn't apply to older ASIC.

It needed to be split from phy refclk off to prevent entering s2idle
when PSR was engaged but driver was not ready.

Fixes: 118a33151658 ("drm/amd/display: Add DCN3.1 clock manager support")

Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Reviewed-by: Eric Yang <Eric.Yang2@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-30 08:54:44 -05:00
Alvin Lee
e56e9ad037 drm/amd/display: Fix check for null function ptr
[Why]
Bug fix for null function ptr (should check for NULL instead of not
NULL)

[How]
Fix if condition

Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Reviewed-by: Samson Tam <samson.tam@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alvin Lee <Alvin.Lee2@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-30 08:54:44 -05:00
Lai, Derek
cdbc58386b drm/amd/display: Added power down for DCN10
[Why]
The change of setting a timer callback on boot for 10 seconds is still
working, just lacked power down for DCN10.

[How]
Added power down for DCN10.

Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Reviewed-by: Anthony Koo <Anthony.Koo@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Derek Lai <Derek.Lai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-30 08:54:44 -05:00
Nicholas Kazlauskas
2d0158497a drm/amd/display: Block z-states when stutter period exceeds criteria
[Why]
Stutter period won't be less than 5000.0, but if PSR is enabled then we
can potentially enter Z9 when MPO is enabled.

SMU will try to enter Z9 too early in these cases (before PSR is
enabled) and we'll see underflow.

[How]
Block z-states (z9, z10) until we can add a new interface to SMU to
signal when we can support z10 but not z9.

We can revert this once the interface change is in.

Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Reviewed-by: Eric Yang <Eric.Yang2@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-30 08:54:44 -05:00
Shen, George
21bf3e6f14 drm/amd/display: Refactor vendor specific link training sequence
[Why]
Current implementation is not scalable and retrofits the existing
standard link training code for purposes outside of its original design.

[How]
Refactor vendor specific link training sequence into its own separate
function to be called instead of the standard link training function.

Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Reviewed-by: Wenjing Liu <Wenjing.Liu@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: George Shen <George.Shen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-30 08:54:44 -05:00
George Shen
fddb024537 drm/amd/display: Limit max link cap with LTTPR caps
[Why]
Max link rate should be limited to the maximum link rate support by any
LTTPR that are connected, including when operating in transparent mode.

[How]
Include transparent mode when factoring in LTTPR max supported link
rate.

Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Reviewed-by: Wesley Chalmers <wesley.chalmers@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: George Shen <George.Shen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-30 08:54:44 -05:00
Charlene Liu
bf252ce1fa drm/amd/display: fix B0 TMDS deepcolor no dislay issue
[why]
B0 PHY C map to F, D map to G driver use logic instance, dmub does the
remap. Driver still need use the right PHY instance to access right HW.

[how]
use phyical instance when program PHY register.

[note]
could move resync_control programming to dmub next.

Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Reviewed-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com>
Reviewed-by: Jun Lei <Jun.Lei@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Charlene Liu <Charlene.Liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-30 08:54:44 -05:00
Surbhi Kakarya
b6fd6e0f5e drm/amdgpu: Check the memory can be accesssed by ttm_device_clear_dma_mappings.
If the event guard is enabled and VF doesn't receive an ack from PF for full access,
the guest driver load crashes.
This is caused due to the call to ttm_device_clear_dma_mappings with non-initialized
mman during driver tear down.

This patch adds the necessary condition to check if the mman initialization passed or not
and takes the path based on the condition output.

Signed-off-by: Surbhi Kakarya <Surbhi.Kakarya@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-30 08:54:43 -05:00
Guchun Chen
f89c6bf734 drm/amdkfd: correct sdma queue number in kfd device init (v3)
This patch keeps the setting of sdma queue number to the same
after recent KFD code refactor. Additionally, improve code to
use switch case to list IP version to complete kfd device_info
structure filling for IH version assignment. This makes consistency
with the IP parse code in amdgpu_discovery.c.

v2: use dev_warn for the default switch case;
    set default sdma queue per engine(8) and IH handler to v9. (Jonathan)

v3: Fix missed IP version check of Raven.

Fixes: f0dc99a6f742bc ("drm/amdkfd: add kfd_device_info_init function")
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Graham Sider <Graham.Sider@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-30 08:54:43 -05:00
Kent Russell
67416bf853 drm/amdgpu: Access the FRU on Aldebaran
This is supported, although the offset is different from VG20, so fix
that with a variable and enable getting the product name and serial
number from the FRU. Do this for all SKUs since all SKUs have the FRU

Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-30 08:54:43 -05:00
Kent Russell
de0af8a65e drm/amdgpu: Only overwrite serial if field is empty
On Aldebaran, the serial may be obtained from the FRU. Only overwrite
the serial with the unique_id if the serial is empty. This will support
printing serial numbers for mGPU devices where there are 2 unique_ids
for the 2 GPUs, but only one serial number for the board

Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-30 08:54:43 -05:00
Kent Russell
4ad31fa15b drm/amdgpu: Enable unique_id for Aldebaran
It's supported, so support the unique_id sysfs file

Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-30 08:54:43 -05:00
Kent Russell
6c92fe5fa5 drm/amdgpu: Increase potential product_name to 64 characters
Having seen at least 1 42-character product_name, bump the number up to
64, and put that definition into amdgpu.h to make future adjustments
simpler.

Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-30 08:54:43 -05:00
yipechai
fd5256cbe1 drm/amdgpu: Remove the redundant code of psp bootloader functions
The psp bootloader functions code of psp_v13_0.c had been
optimized before. According the code style of psp_v13_0.c
to remove the redundant code of psp_v11_0.c.

v2: squash in drop unused variable (Alex)

Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-30 08:54:43 -05:00
Leslie Shi
87172e89dc drm/amdgpu: Call amdgpu_device_unmap_mmio() if device is unplugged to prevent crash in GPU initialization failure
[Why]
In amdgpu_driver_load_kms, when amdgpu_device_init returns error during driver modprobe, it
will start the error handle path immediately and call into amdgpu_device_unmap_mmio as well
to release mapped VRAM. However, in the following release callback, driver stills visits the
unmapped memory like vcn.inst[i].fw_shared_cpu_addr in vcn_v3_0_sw_fini. So a kernel crash occurs.

[How]
call amdgpu_device_unmap_mmio() if device is unplugged to prevent invalid memory address in
vcn_v3_0_sw_fini() when GPU initialization failure.

Signed-off-by: Leslie Shi <Yuliang.Shi@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-30 08:54:24 -05:00
Dave Airlie
011e8c3239 Merge tag 'drm-intel-next-fixes-2021-12-29' of git://anongit.freedesktop.org/drm/drm-intel into drm-next
drm/i915 fixes for the v5.17-rc1:
- Update FBC state even when not reallocating CFB

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/87ee5vk54u.fsf@intel.com
2021-12-30 15:20:18 +10:00
Dave Airlie
aeeb82fd61 Merge tag 'amd-drm-fixes-5.16-2021-12-29' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-5.16-2021-12-29:

amdgpu:
- Fencing fix
- XGMI fix
- VCN regression fix
- IP discovery regression fixes
- Fix runpm documentation
- Suspend/resume fixes
- Yellow Carp display fixes
- MCLK power management fix

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211229155129.5789-1-alexander.deucher@amd.com
2021-12-30 13:55:48 +10:00
Dave Airlie
2b534e90a1 Merge tag 'drm-msm-next-2021-12-26' of ssh://gitlab.freedesktop.org/drm/msm into drm-next
* dpu plane state cleanup in prep for multirect
* dpu debugfs cleanup (and moving things to atomic_print_state) in prep
  for multirect
* dp support for sc7280
* struct_mutex removal
* include more GMU state in gpu devcore dumps
* add support for a506
* remove old eDP sub-driver (never was used in any upstream supported
  devices and modern things with eDP will use DP sub-driver instead)
* debugfs to disable hw gpu hang detect for (igt tests)
* debugfs for dumping display hw state
* and the usual assortment of cleanup and bug fixes

There still seems to be a timing issue with dpu, showing up on sc7180
devices, after the bridge probe-order change. Ie. things work great if
loglevel is high enough (or enough debug options are enabled, etc).
We'll continue to debug this in the new year.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rob Clark <robdclark@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGs+vwr0nkwgYzuYAsCoHtypWpWav+yVvLZGsEJy8tJ56A@mail.gmail.com
2021-12-29 14:02:44 +10:00
Angus Wang
ee2698cf79 drm/amd/display: Changed pipe split policy to allow for multi-display pipe split
[WHY]
Current implementation of pipe split policy prevents pipe split with
multiple displays connected, which caused the MCLK speed to be stuck at
max

[HOW]
Changed the pipe split policies so that pipe split is allowed for
multi-display configurations

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1522
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1709
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1655
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1403

Note this is a backport of this commit from amdgpu drm-next for 5.16.

Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Reviewed-by: Aric Cyr <Aric.Cyr@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Angus Wang <angus.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2021-12-28 17:02:31 -05:00
Nicholas Kazlauskas
33bb63915f drm/amd/display: Fix USB4 null pointer dereference in update_psp_stream_config
[Why]
A porting error on a previous patch left the block of code that
causes the crash from a NULL pointer dereference.

More specifically, we try to access link_enc before it's assigned in
the USB4 case in the following assignment:

config.dio_output_idx = link_enc->transmitter - TRANSMITTER_UNIPHY_A;

[How]
That assignment occurs later depending on the ASIC version. It's only
needed on DCN31 and only after link_enc is already assigned.

Fixes: 986430446c917b ("drm/amd/display: fix a crash on USB4 over C20 PHY")
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-28 16:51:24 -05:00
Nicholas Kazlauskas
33735c1c8d drm/amd/display: Set optimize_pwr_state for DCN31
[Why]
We'll exit optimized power state to do link detection but we won't enter
back into the optimized power state.

This could potentially block s2idle entry depending on the sequencing,
but it also means we're losing some power during the transition period.

[How]
Hook up the handler like DCN21. It was also missed like the
exit_optimized_pwr_state callback.

Fixes: 64b1d0e8d500 ("drm/amd/display: Add DCN3.1 HWSEQ")

Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Reviewed-by: Eric Yang <Eric.Yang2@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-28 16:51:24 -05:00
Nicholas Kazlauskas
a07f8b9983 drm/amd/display: Send s0i2_rdy in stream_count == 0 optimization
[Why]
Otherwise SMU won't mark Display as idle when trying to perform s2idle.

[How]
Mark the bit in the dcn31 codepath, doesn't apply to older ASIC.

It needed to be split from phy refclk off to prevent entering s2idle
when PSR was engaged but driver was not ready.

Fixes: 118a33151658 ("drm/amd/display: Add DCN3.1 clock manager support")

Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Reviewed-by: Eric Yang <Eric.Yang2@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-28 16:51:24 -05:00
Lai, Derek
d97e631af2 drm/amd/display: Added power down for DCN10
[Why]
The change of setting a timer callback on boot for 10 seconds is still
working, just lacked power down for DCN10.

[How]
Added power down for DCN10.

Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Reviewed-by: Anthony Koo <Anthony.Koo@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Derek Lai <Derek.Lai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-28 16:51:24 -05:00
Charlene Liu
2eb82577a1 drm/amd/display: fix B0 TMDS deepcolor no dislay issue
[why]
B0 PHY C map to F, D map to G driver use logic instance, dmub does the
remap. Driver still need use the right PHY instance to access right HW.

[how]
use phyical instance when program PHY register.

[note]
could move resync_control programming to dmub next.

Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Reviewed-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com>
Reviewed-by: Jun Lei <Jun.Lei@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Charlene Liu <Charlene.Liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-28 16:51:24 -05:00
Zongmin Zhou
11544d77e3 drm/amdgpu: fixup bad vram size on gmc v8
Some boards(like RX550) seem to have garbage in the upper
16 bits of the vram size register.  Check for
this and clamp the size properly.  Fixes
boards reporting bogus amounts of vram.

after add this patch,the maximum GPU VRAM size is 64GB,
otherwise only 64GB vram size will be used.

Signed-off-by: Zongmin Zhou<zhouzongmin@kylinos.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-28 16:05:21 -05:00
Nicholas Kazlauskas
88eabcb8e6 drm/amd/display: Fix USB4 null pointer dereference in update_psp_stream_config
[Why]
A porting error on a previous patch left the block of code that
causes the crash from a NULL pointer dereference.

More specifically, we try to access link_enc before it's assigned in
the USB4 case in the following assignment:

config.dio_output_idx = link_enc->transmitter - TRANSMITTER_UNIPHY_A;

[How]
That assignment occurs later depending on the ASIC version. It's only
needed on DCN31 and only after link_enc is already assigned.

Fixes: 986430446c917b ("drm/amd/display: fix a crash on USB4 over C20 PHY")
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-28 16:03:35 -05:00
sashank saye
4da8b63944 drm/amdgpu: Send Message to SMU on aldebaran passthrough for sbr handling
For Aldebaran chip passthrough case we need to intimate SMU
about special handling for SBR.On older chips we send
LightSBR to SMU, enabling the same for Aldebaran. Slight
difference, compared to previous chips, is on Aldebaran, SMU
would do a heavy reset on SBR. Hence, the word Heavy
instead of Light SBR is used for SMU to differentiate.

Reviewed by: Shaoyun.liu <Shaoyun.liu@amd.com>
Signed-off-by: sashank saye <sashank.saye@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-28 16:03:19 -05:00
Rajneesh Bhardwaj
fbcdbfde87 drm/amdgpu: Don't inherit GEM object VMAs in child process
When an application having open file access to a node forks, its shared
mappings also get reflected in the address space of child process even
though it cannot access them with the object permissions applied. With the
existing permission checks on the gem objects, it might be reasonable to
also create the VMAs with VM_DONTCOPY flag so a user space application
doesn't need to explicitly call the madvise(addr, len, MADV_DONTFORK)
system call to prevent the pages in the mapped range to appear in the
address space of the child process. It also prevents the memory leaks
due to additional reference counts on the mapped BOs in the child
process that prevented freeing the memory in the parent for which we had
worked around earlier in the user space inside the thunk library.

Additionally, we faced this issue when using CRIU to checkpoint restore
an application that had such inherited mappings in the child which
confuse CRIU when it mmaps on restore. Having this flag set for the
render node VMAs helps. VMAs mapped via KFD already take care of this so
this is needed only for the render nodes.

To limit the impact of the change to user space consumers such as OpenGL
etc, limit it to KFD BOs only.

Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: David Yat Sin <david.yatsin@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-28 16:03:08 -05:00
Tao Zhou
b6485bed40 drm/amdkfd: reset queue which consumes RAS poison (v2)
CP supports unmap queue with reset mode which only destroys specific queue without affecting others.
Replacing whole gpu reset with reset queue mode for RAS poison consumption
saves much time, and we can also fallback to gpu reset solution if reset
queue fails.

v2: Return directly if process is NULL;
    Reset queue solution is not applicable to SDMA, fallback to legacy
way;
    Call kfd_unref_process after lookup process.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-28 16:02:59 -05:00
Tao Zhou
dec6344338 drm/amdkfd: add reset queue function for RAS poison (v2)
The new interface unmaps queues with reset mode for the process consumes
RAS poison, it's only for compute queue.

v2: rename the function to reset_queues.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-28 16:02:47 -05:00
Tao Zhou
f6b80c04aa drm/amdkfd: add reset parameter for unmap queues
So we can set reset mode for unmap operation, no functional change.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-28 16:02:39 -05:00
Tao Zhou
f4409ee846 drm/amdgpu: add gpu reset control for umc page retirement
Add a reset parameter for umc page retirement, let user decide whether
call gpu reset in umc page retirement.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-28 16:02:32 -05:00
Victor Skvortsov
d764fb2af6 drm/amdgpu: Modify indirect register access for gfx9 sriov
Expand RLCG interface for new GC read & write commands.
New interface will only be used if the PF enables the flag in pf2vf msg.

v2: Added a description for the scratch registers

Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com>
Reviewed-by: David Nieto <david.nieto@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-28 16:02:25 -05:00
Victor Skvortsov
4a0165f060 drm/amdgpu: get xgmi info before ip_init
Driver needs to call get_xgmi_info() before ip_init
to determine whether it needs to handle a pending hive reset.

Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com>
Reviewed-by: David Nieto <david.nieto@amd.com>
Reviewed by: shaoyun.liu <Shaoyun.lui@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-28 16:02:17 -05:00
Victor Skvortsov
4aa325ae54 drm/amdgpu: Modify indirect register access for amdkfd_gfx_v9 sriov
Modify GC register access from MMIO to RLCG if the indirect
flag is set

Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com>
Reviewed-by: David Nieto <david.nieto@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-28 16:02:10 -05:00
Victor Skvortsov
92f153bb5a drm/amdgpu: Modify indirect register access for gmc_v9_0 sriov
Modify GC register access from MMIO to RLCG if the
indirect flag is set

v2: Replaced ternary operator with if-else for better
readability

Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com>
Reviewed-by: David Nieto <david.nieto@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-28 16:02:02 -05:00
Victor Skvortsov
0da6f6e587 drm/amdgpu: Add *_SOC15_IP_NO_KIQ() macro definitions
Add helper macros to change register access
from direct to indirect.

Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com>
Reviewed-by: David Nieto <david.nieto@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-28 16:01:55 -05:00
Bokun Zhang
b18ff6925d drm/amdgpu: Filter security violation registers
Recently, there is security policy update under SRIOV.
We need to filter the registers that hit the violation
and move the code to the host driver side so that
the guest driver can execute correctly.

Signed-off-by: Bokun Zhang <Bokun.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-28 16:00:47 -05:00
Alex Deucher
ebae897388 drm/amdgpu: no DC support for headless chips
Chips with no display hardware should return false for
DC support.

v2: drop Arcturus and Aldebaran

Fixes: f7f12b25823c0d ("drm/amdgpu: default to true in amdgpu_device_asic_has_dc_support")
Reviewed-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Reported-by: Tareque Md.Hanif <tarequemd.hanif@yahoo.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-28 09:07:30 -05:00
Evan Quan
7be3be2b02 drm/amdgpu: put SMU into proper state on runpm suspending for BOCO capable platform
By setting mp1_state as PP_MP1_STATE_UNLOAD, MP1 will do some proper cleanups and
put itself into a state ready for PNP. That can workaround some random resuming
failure observed on BOCO capable platforms.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-27 13:08:28 -05:00
Alex Deucher
daf8de0874 drm/amdgpu: always reset the asic in suspend (v2)
If the platform suspend happens to fail and the power rail
is not turned off, the GPU will be in an unknown state on
resume, so reset the asic so that it will be in a known
good state on resume even if the platform suspend failed.

v2: handle s0ix

Acked-by: Luben Tuikov <luben.tuikov@amd.com>
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-27 12:16:34 -05:00
Prike Liang
8c45096c60 drm/amd/pm: skip setting gfx cgpg in the s0ix suspend-resume
In the s0ix entry need retain gfx in the gfxoff state,so here need't
set gfx cgpg in the S0ix suspend-resume process. Moreover move the S0ix
check into SMU12 can simplify the code condition check.

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1712
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-27 11:57:25 -05:00
Ville Syrjälä
c65fe9cbbf drm/i915/fbc: Remember to update FBC state even when not reallocating CFB
We mustn't forget to update our FBC state even if we don't have
to reallocate the CFB. Otherwise we won't refresh our notion
of what eg. the new fence or the new override CFB stride
should be. Using the wrong CFB stride in particular can cause
underruns and could even corrupt other stuff in stolen.

Fixes: f4cfdbb02ca8 ("drm/i915/fbc: Nuke state_cache")
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/4774
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211216110822.8461-1-ville.syrjala@linux.intel.com
Reviewed-by: Mika Kahola <mika.kahola@intel.com>
(cherry picked from commit 798c5daf3cddff3f39c5542a50a2dbd83879b05d)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2021-12-27 11:46:48 +02:00
Matthew Brost
d46f329a3f drm/i915: Increment composite fence seqno
Increment composite fence seqno on each fence creation.

Fixes: 544460c33821 ("drm/i915: Multi-BB execbuf")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211214195913.35735-1-matthew.brost@intel.com
(cherry picked from commit 62eeb9ae1364cd96991ccc6e3c5c69d66b8c64df)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2021-12-27 11:33:40 +02:00
Matthew Brost
0f9d36af8f drm/i915: Fix possible uninitialized variable in parallel extension
'prev_engine' was declared inside the output loop and checked in the
inner after at least 1 pass of either loop. The variable should be
declared outside both loops as it needs to be persistent across the
entire loop structure.

Fixes: e5e32171a2cf ("drm/i915/guc: Connect UAPI to GuC multi-lrc interface")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211219001909.24348-1-matthew.brost@intel.com
(cherry picked from commit cbffbac9c14220b8716b0a9c29d72243f6b14ef3)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2021-12-27 11:33:34 +02:00
Dave Airlie
040bf2a944 Short summary of fixes pull:
* bridge/lvds: Fix DT bindings
  * vmwgfx: Fix several issues with the recent conversion to GEM
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEchf7rIzpz2NEoWjlaA3BHVMLeiMFAmHEP1sACgkQaA3BHVML
 eiOgnQgAmjsV2Y21demsRAgdAFdQa4QgNClnDUOPZpwfoWpaV9lXl4eh0h2z7Jv8
 Akh/s219AoGUAuvJ050tCp3w7H0b9qGSwCohyUa7BLQCcA+Ecd3/2ogD12vbQLMU
 WHfPD1a7r5UHaj7X9vtm8kAU/SrbAxtdL+JHPHBqbbeG0PyiU7tWYqC5JoLL81+D
 EjYhMxtxdv0WbkgOE4K+cMnYZTq3FVhYHSXUOWaP/nwH9w/l+tDjCYwKPV9bKYYW
 XL71PHg0k/3YKPo/iPqh7hmLY+NXDDN7wioTvrk60cBUIeH8YgHpDJEXaFxmEYrs
 fiLfJBo706zS/gsZGAP3XqYY7spU5A==
 =uWo8
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-next-fixes-2021-12-23' of git://anongit.freedesktop.org/drm/drm-misc into drm-next

Short summary of fixes pull:

 * bridge/lvds: Fix DT bindings
 * vmwgfx: Fix several issues with the recent conversion to GEM

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/YcRAH8lYbsoSCeY9@linux-uq9g.fritz.box
2021-12-24 06:24:06 +10:00
Dave Airlie
4817c37d71 Merge tag 'drm-intel-gt-next-2021-12-23' of git://anongit.freedesktop.org/drm/drm-intel into drm-next
Driver Changes:

- Added bits of DG2 support around page table handling (Stuart Summers, Matthew Auld)
- Fixed wakeref leak in PMU busyness during reset in GuC mode (Umesh Nerlige Ramappa)
- Fixed debugfs access crash if GuC failed to load (John Harrison)
- Bring back GuC error log to error capture, undoing accidental earlier breakage (Thomas Hellström)
- Fixed memory leak in error capture caused by earlier refactoring (Thomas Hellström)
- Exclude reserved stolen from driver use (Chris Wilson)
- Add memory region sanity checking and optional full test (Chris Wilson)
- Fixed buffer size truncation in TTM shmemfs backend (Robert Beckett)
- Use correct lock and don't overwrite internal data structures when stealing GuC context ids (Matthew Brost)
- Don't hog IRQs when destroying GuC contexts (John Harrison)
- Make GuC to Host communication more robust (Matthew Brost)
- Continuation of locking refactoring around VMA and backing store handling (Maarten Lankhorst)
- Improve performance of reading GuC log from debugfs (John Harrison)
- Log when GuC fails to reset an engine (John Harrison)
- Speed up GuC/HuC firmware loading by requesting RP0 (Vinay Belgaumkar)
- Further work on asynchronous VMA unbinding (Thomas Hellström, Christian König)

- Refactor GuC/HuC firmware handling to prepare for future platforms (John Harrison)
- Prepare for future different GuC/HuC firmware signing key sizes (Daniele Ceraolo Spurio, Michal Wajdeczko)
- Add noreclaim annotations (Matthew Auld)
- Remove racey GEM_BUG_ON between GPU reset and GuC communication handling (Matthew Brost)
- Refactor i915->gt with to_gt(i915) to prepare for future platforms (Michał Winiarski, Andi Shyti)
- Increase GuC log size for CONFIG_DEBUG_GEM (John Harrison)

- Fixed engine busyness in selftests when in GuC mode (Umesh Nerlige Ramappa)
- Make engine parking work with PREEMPT_RT (Sebastian Andrzej Siewior)
- Replace X86_FEATURE_PAT with pat_enabled() (Lucas De Marchi)
- Selftest for stealing of guc ids (Matthew Brost)

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/YcRvKO5cyPvIxVCi@tursulin-mobl2
2021-12-24 06:14:51 +10:00
Dave Airlie
78942ae41d Merge branch 'etnaviv/next' of https://git.pengutronix.de/git/lst/linux into drm-next
- make etnaviv work on IOMMU enabled systems
- fix mapping of command buffers on systems with more than 4GB RAM
- close a DoS vector
- fix spurious GPU resets

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Lucas Stach <l.stach@pengutronix.de>
Link: https://patchwork.freedesktop.org/patch/msgid/59619f8e9eb1d7ed7ea72cbead1f0aabc49f4e68.camel@pengutronix.de
2021-12-24 06:04:40 +10:00