80140 Commits

Author SHA1 Message Date
George Shen
0d988e5de7 drm/amd/display: Remove CR AUX RD Interval limit for LTTPR
[Why]
DP spec specifies that DPRX shall use the read interval in the
TRAINING_AUX_RD_INTERVAL_PHY_REPEATER LTTPR DPCD register. This
register's bit definition is the same as the AUX read interval register
for DPRX.

[How}
Remove logic which forces AUX read interval to 100us for repeaters when
in LTTPR non-transparent mode.

Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Reviewed-by: Wesley Chalmers <wesley.chalmers@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: George Shen <George.Shen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-30 08:54:44 -05:00
Nicholas Kazlauskas
3db817fce4 drm/amd/display: Send s0i2_rdy in stream_count == 0 optimization
[Why]
Otherwise SMU won't mark Display as idle when trying to perform s2idle.

[How]
Mark the bit in the dcn31 codepath, doesn't apply to older ASIC.

It needed to be split from phy refclk off to prevent entering s2idle
when PSR was engaged but driver was not ready.

Fixes: 118a33151658 ("drm/amd/display: Add DCN3.1 clock manager support")

Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Reviewed-by: Eric Yang <Eric.Yang2@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-30 08:54:44 -05:00
Alvin Lee
e56e9ad037 drm/amd/display: Fix check for null function ptr
[Why]
Bug fix for null function ptr (should check for NULL instead of not
NULL)

[How]
Fix if condition

Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Reviewed-by: Samson Tam <samson.tam@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alvin Lee <Alvin.Lee2@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-30 08:54:44 -05:00
Lai, Derek
cdbc58386b drm/amd/display: Added power down for DCN10
[Why]
The change of setting a timer callback on boot for 10 seconds is still
working, just lacked power down for DCN10.

[How]
Added power down for DCN10.

Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Reviewed-by: Anthony Koo <Anthony.Koo@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Derek Lai <Derek.Lai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-30 08:54:44 -05:00
Nicholas Kazlauskas
2d0158497a drm/amd/display: Block z-states when stutter period exceeds criteria
[Why]
Stutter period won't be less than 5000.0, but if PSR is enabled then we
can potentially enter Z9 when MPO is enabled.

SMU will try to enter Z9 too early in these cases (before PSR is
enabled) and we'll see underflow.

[How]
Block z-states (z9, z10) until we can add a new interface to SMU to
signal when we can support z10 but not z9.

We can revert this once the interface change is in.

Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Reviewed-by: Eric Yang <Eric.Yang2@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-30 08:54:44 -05:00
Shen, George
21bf3e6f14 drm/amd/display: Refactor vendor specific link training sequence
[Why]
Current implementation is not scalable and retrofits the existing
standard link training code for purposes outside of its original design.

[How]
Refactor vendor specific link training sequence into its own separate
function to be called instead of the standard link training function.

Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Reviewed-by: Wenjing Liu <Wenjing.Liu@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: George Shen <George.Shen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-30 08:54:44 -05:00
George Shen
fddb024537 drm/amd/display: Limit max link cap with LTTPR caps
[Why]
Max link rate should be limited to the maximum link rate support by any
LTTPR that are connected, including when operating in transparent mode.

[How]
Include transparent mode when factoring in LTTPR max supported link
rate.

Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Reviewed-by: Wesley Chalmers <wesley.chalmers@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: George Shen <George.Shen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-30 08:54:44 -05:00
Charlene Liu
bf252ce1fa drm/amd/display: fix B0 TMDS deepcolor no dislay issue
[why]
B0 PHY C map to F, D map to G driver use logic instance, dmub does the
remap. Driver still need use the right PHY instance to access right HW.

[how]
use phyical instance when program PHY register.

[note]
could move resync_control programming to dmub next.

Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Reviewed-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com>
Reviewed-by: Jun Lei <Jun.Lei@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Charlene Liu <Charlene.Liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-30 08:54:44 -05:00
Surbhi Kakarya
b6fd6e0f5e drm/amdgpu: Check the memory can be accesssed by ttm_device_clear_dma_mappings.
If the event guard is enabled and VF doesn't receive an ack from PF for full access,
the guest driver load crashes.
This is caused due to the call to ttm_device_clear_dma_mappings with non-initialized
mman during driver tear down.

This patch adds the necessary condition to check if the mman initialization passed or not
and takes the path based on the condition output.

Signed-off-by: Surbhi Kakarya <Surbhi.Kakarya@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-30 08:54:43 -05:00
Guchun Chen
f89c6bf734 drm/amdkfd: correct sdma queue number in kfd device init (v3)
This patch keeps the setting of sdma queue number to the same
after recent KFD code refactor. Additionally, improve code to
use switch case to list IP version to complete kfd device_info
structure filling for IH version assignment. This makes consistency
with the IP parse code in amdgpu_discovery.c.

v2: use dev_warn for the default switch case;
    set default sdma queue per engine(8) and IH handler to v9. (Jonathan)

v3: Fix missed IP version check of Raven.

Fixes: f0dc99a6f742bc ("drm/amdkfd: add kfd_device_info_init function")
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Graham Sider <Graham.Sider@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-30 08:54:43 -05:00
Kent Russell
67416bf853 drm/amdgpu: Access the FRU on Aldebaran
This is supported, although the offset is different from VG20, so fix
that with a variable and enable getting the product name and serial
number from the FRU. Do this for all SKUs since all SKUs have the FRU

Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-30 08:54:43 -05:00
Kent Russell
de0af8a65e drm/amdgpu: Only overwrite serial if field is empty
On Aldebaran, the serial may be obtained from the FRU. Only overwrite
the serial with the unique_id if the serial is empty. This will support
printing serial numbers for mGPU devices where there are 2 unique_ids
for the 2 GPUs, but only one serial number for the board

Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-30 08:54:43 -05:00
Kent Russell
4ad31fa15b drm/amdgpu: Enable unique_id for Aldebaran
It's supported, so support the unique_id sysfs file

Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-30 08:54:43 -05:00
Kent Russell
6c92fe5fa5 drm/amdgpu: Increase potential product_name to 64 characters
Having seen at least 1 42-character product_name, bump the number up to
64, and put that definition into amdgpu.h to make future adjustments
simpler.

Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-30 08:54:43 -05:00
yipechai
fd5256cbe1 drm/amdgpu: Remove the redundant code of psp bootloader functions
The psp bootloader functions code of psp_v13_0.c had been
optimized before. According the code style of psp_v13_0.c
to remove the redundant code of psp_v11_0.c.

v2: squash in drop unused variable (Alex)

Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-30 08:54:43 -05:00
Leslie Shi
87172e89dc drm/amdgpu: Call amdgpu_device_unmap_mmio() if device is unplugged to prevent crash in GPU initialization failure
[Why]
In amdgpu_driver_load_kms, when amdgpu_device_init returns error during driver modprobe, it
will start the error handle path immediately and call into amdgpu_device_unmap_mmio as well
to release mapped VRAM. However, in the following release callback, driver stills visits the
unmapped memory like vcn.inst[i].fw_shared_cpu_addr in vcn_v3_0_sw_fini. So a kernel crash occurs.

[How]
call amdgpu_device_unmap_mmio() if device is unplugged to prevent invalid memory address in
vcn_v3_0_sw_fini() when GPU initialization failure.

Signed-off-by: Leslie Shi <Yuliang.Shi@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-30 08:54:24 -05:00
Zongmin Zhou
11544d77e3 drm/amdgpu: fixup bad vram size on gmc v8
Some boards(like RX550) seem to have garbage in the upper
16 bits of the vram size register.  Check for
this and clamp the size properly.  Fixes
boards reporting bogus amounts of vram.

after add this patch,the maximum GPU VRAM size is 64GB,
otherwise only 64GB vram size will be used.

Signed-off-by: Zongmin Zhou<zhouzongmin@kylinos.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-28 16:05:21 -05:00
Nicholas Kazlauskas
88eabcb8e6 drm/amd/display: Fix USB4 null pointer dereference in update_psp_stream_config
[Why]
A porting error on a previous patch left the block of code that
causes the crash from a NULL pointer dereference.

More specifically, we try to access link_enc before it's assigned in
the USB4 case in the following assignment:

config.dio_output_idx = link_enc->transmitter - TRANSMITTER_UNIPHY_A;

[How]
That assignment occurs later depending on the ASIC version. It's only
needed on DCN31 and only after link_enc is already assigned.

Fixes: 986430446c917b ("drm/amd/display: fix a crash on USB4 over C20 PHY")
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-28 16:03:35 -05:00
sashank saye
4da8b63944 drm/amdgpu: Send Message to SMU on aldebaran passthrough for sbr handling
For Aldebaran chip passthrough case we need to intimate SMU
about special handling for SBR.On older chips we send
LightSBR to SMU, enabling the same for Aldebaran. Slight
difference, compared to previous chips, is on Aldebaran, SMU
would do a heavy reset on SBR. Hence, the word Heavy
instead of Light SBR is used for SMU to differentiate.

Reviewed by: Shaoyun.liu <Shaoyun.liu@amd.com>
Signed-off-by: sashank saye <sashank.saye@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-28 16:03:19 -05:00
Rajneesh Bhardwaj
fbcdbfde87 drm/amdgpu: Don't inherit GEM object VMAs in child process
When an application having open file access to a node forks, its shared
mappings also get reflected in the address space of child process even
though it cannot access them with the object permissions applied. With the
existing permission checks on the gem objects, it might be reasonable to
also create the VMAs with VM_DONTCOPY flag so a user space application
doesn't need to explicitly call the madvise(addr, len, MADV_DONTFORK)
system call to prevent the pages in the mapped range to appear in the
address space of the child process. It also prevents the memory leaks
due to additional reference counts on the mapped BOs in the child
process that prevented freeing the memory in the parent for which we had
worked around earlier in the user space inside the thunk library.

Additionally, we faced this issue when using CRIU to checkpoint restore
an application that had such inherited mappings in the child which
confuse CRIU when it mmaps on restore. Having this flag set for the
render node VMAs helps. VMAs mapped via KFD already take care of this so
this is needed only for the render nodes.

To limit the impact of the change to user space consumers such as OpenGL
etc, limit it to KFD BOs only.

Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: David Yat Sin <david.yatsin@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-28 16:03:08 -05:00
Tao Zhou
b6485bed40 drm/amdkfd: reset queue which consumes RAS poison (v2)
CP supports unmap queue with reset mode which only destroys specific queue without affecting others.
Replacing whole gpu reset with reset queue mode for RAS poison consumption
saves much time, and we can also fallback to gpu reset solution if reset
queue fails.

v2: Return directly if process is NULL;
    Reset queue solution is not applicable to SDMA, fallback to legacy
way;
    Call kfd_unref_process after lookup process.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-28 16:02:59 -05:00
Tao Zhou
dec6344338 drm/amdkfd: add reset queue function for RAS poison (v2)
The new interface unmaps queues with reset mode for the process consumes
RAS poison, it's only for compute queue.

v2: rename the function to reset_queues.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-28 16:02:47 -05:00
Tao Zhou
f6b80c04aa drm/amdkfd: add reset parameter for unmap queues
So we can set reset mode for unmap operation, no functional change.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-28 16:02:39 -05:00
Tao Zhou
f4409ee846 drm/amdgpu: add gpu reset control for umc page retirement
Add a reset parameter for umc page retirement, let user decide whether
call gpu reset in umc page retirement.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-28 16:02:32 -05:00
Victor Skvortsov
d764fb2af6 drm/amdgpu: Modify indirect register access for gfx9 sriov
Expand RLCG interface for new GC read & write commands.
New interface will only be used if the PF enables the flag in pf2vf msg.

v2: Added a description for the scratch registers

Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com>
Reviewed-by: David Nieto <david.nieto@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-28 16:02:25 -05:00
Victor Skvortsov
4a0165f060 drm/amdgpu: get xgmi info before ip_init
Driver needs to call get_xgmi_info() before ip_init
to determine whether it needs to handle a pending hive reset.

Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com>
Reviewed-by: David Nieto <david.nieto@amd.com>
Reviewed by: shaoyun.liu <Shaoyun.lui@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-28 16:02:17 -05:00
Victor Skvortsov
4aa325ae54 drm/amdgpu: Modify indirect register access for amdkfd_gfx_v9 sriov
Modify GC register access from MMIO to RLCG if the indirect
flag is set

Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com>
Reviewed-by: David Nieto <david.nieto@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-28 16:02:10 -05:00
Victor Skvortsov
92f153bb5a drm/amdgpu: Modify indirect register access for gmc_v9_0 sriov
Modify GC register access from MMIO to RLCG if the
indirect flag is set

v2: Replaced ternary operator with if-else for better
readability

Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com>
Reviewed-by: David Nieto <david.nieto@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-28 16:02:02 -05:00
Victor Skvortsov
0da6f6e587 drm/amdgpu: Add *_SOC15_IP_NO_KIQ() macro definitions
Add helper macros to change register access
from direct to indirect.

Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com>
Reviewed-by: David Nieto <david.nieto@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-28 16:01:55 -05:00
Bokun Zhang
b18ff6925d drm/amdgpu: Filter security violation registers
Recently, there is security policy update under SRIOV.
We need to filter the registers that hit the violation
and move the code to the host driver side so that
the guest driver can execute correctly.

Signed-off-by: Bokun Zhang <Bokun.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-28 16:00:47 -05:00
Alex Deucher
0cd7f378b0 drm/amdgpu: add support for IP discovery gc_info table v2
Used on gfx9 based systems. Fixes incorrect CU counts reported
in the kernel log.

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1833
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-16 14:08:20 -05:00
Jiapeng Chong
799dce6fbd drm/amd/display: Fix warning comparing pointer to 0
Fix the following coccicheck warning:

./drivers/gpu/drm/amd/display/dc/clk_mgr/dcn301/vg_clk_mgr.c:744:35-36:
WARNING comparing pointer to 0.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-16 14:08:20 -05:00
Alex Deucher
109a357f28 drm/amdgpu: clean up some leftovers from bring up
Some old registers leftover from pre-silicon.  No longer
relevant on real hardware.  Remove.

Reviewed-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-16 14:08:20 -05:00
Victor Skvortsov
892deb4826 drm/amdgpu: Separate vf2pf work item init from virt data exchange
We want to be able to call virt data exchange conditionally
after gmc sw init to reserve bad pages as early as possible.
Since this is a conditional call, we will need
to call it again unconditionally later in the init sequence.

Refactor the data exchange function so it can be
called multiple times without re-initializing the work item.

v2: Cleaned up the code. Kept the original call to init_exchange_data()
inside early init to initialize the work item, afterwards call
exchange_data() when needed.

Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com>
Reviewed By: Shaoyun.liu <Shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-16 14:08:20 -05:00
Changcheng Deng
d999bc81ac drm/amdkfd: use max() and min() to make code cleaner
Use max() and min() in order to make code cleaner.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Changcheng Deng <deng.changcheng@zte.com.cn>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-16 14:08:20 -05:00
chen gong
d4c2933fb8 drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled
Play a video on the raven (or PCO, raven2) platform, and then do the S3
test. When resume, the following error will be reported:

amdgpu 0000:02:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring
vcn_dec test failed (-110)
[drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block
<vcn_v1_0> failed -110
amdgpu 0000:02:00.0: amdgpu: amdgpu_device_ip_resume failed (-110).
PM: dpm_run_callback(): pci_pm_resume+0x0/0x90 returns -110

[why]
When playing the video: The power state flag of the vcn block is set to
POWER_STATE_ON.

When doing suspend: There is no change to the power state flag of the
vcn block, it is still POWER_STATE_ON.

When doing resume: Need to open the power gate of the vcn block and set
the power state flag of the VCN block to POWER_STATE_ON.
But at this time, the power state flag of the vcn block is already
POWER_STATE_ON. The power status flag check in the "8f2cdef drm/amd/pm:
avoid duplicate powergate/ungate setting" patch will return the
amdgpu_dpm_set_powergating_by_smu function directly.
As a result, the gate of the power was not opened, causing the
subsequent ring test to fail.

[how]
In the suspend function of the vcn block, explicitly change the power
state flag of the vcn block to POWER_STATE_OFF.

BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1828
Signed-off-by: chen gong <curry.gong@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-16 14:08:10 -05:00
Lijo Lazar
3c27abee3f drm/amd/pm: Fix xgmi link control on aldebaran
Fix the message argument.
	0: Allow power down
	1: Disallow power down

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-16 13:43:46 -05:00
Mario Limonciello
0ff76b5334 drm/amd/pm: restore SMU version print statement for dGPUs
This is still needed for thoes in case the firmware fails to load
then the message is the only way to tell what firmware was on them

Suggested-by: Lijo Lazar <lijo.Lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-16 13:43:46 -05:00
Philip Yang
69879b3083 drm/amdkfd: fix svm_bo release invalid wait context warning
Add svm_range_bo_unref_async to schedule work to wait for svm_bo
eviction work done and then free svm_bo. __do_munmap put_page
is atomic context, call svm_range_bo_unref_async to avoid warning
invalid wait context. Other non atomic context call svm_range_bo_unref.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-16 13:43:46 -05:00
Huang Rui
5c1e6fa49e drm/amdgpu: introduce new amdgpu_fence object to indicate the job embedded fence
The job embedded fence donesn't initialize the flags at
dma_fence_init(). Then we will go a wrong way in
amdgpu_fence_get_timeline_name callback and trigger a null pointer panic
once we enabled the trace event here. So introduce new amdgpu_fence
object to indicate the job embedded fence.

[  156.131790] BUG: kernel NULL pointer dereference, address: 00000000000002a0
[  156.131804] #PF: supervisor read access in kernel mode
[  156.131811] #PF: error_code(0x0000) - not-present page
[  156.131817] PGD 0 P4D 0
[  156.131824] Oops: 0000 [#1] PREEMPT SMP PTI
[  156.131832] CPU: 6 PID: 1404 Comm: sdma0 Tainted: G           OE     5.16.0-rc1-custom #1
[  156.131842] Hardware name: Gigabyte Technology Co., Ltd. Z170XP-SLI/Z170XP-SLI-CF, BIOS F20 11/04/2016
[  156.131848] RIP: 0010:strlen+0x0/0x20
[  156.131859] Code: 89 c0 c3 0f 1f 80 00 00 00 00 48 01 fe eb 0f 0f b6 07 38 d0 74 10 48 83 c7 01 84 c0 74 05 48 39 f7 75 ec 31 c0 c3 48 89 f8 c3 <80> 3f 00 74 10 48 89 f8 48 83 c0 01 80 38 00 75 f7 48 29 f8 c3 31
[  156.131872] RSP: 0018:ffff9bd0018dbcf8 EFLAGS: 00010206
[  156.131880] RAX: 00000000000002a0 RBX: ffff8d0305ef01b0 RCX: 000000000000000b
[  156.131888] RDX: ffff8d03772ab924 RSI: ffff8d0305ef01b0 RDI: 00000000000002a0
[  156.131895] RBP: ffff9bd0018dbd60 R08: ffff8d03002094d0 R09: 0000000000000000
[  156.131901] R10: 000000000000005e R11: 0000000000000065 R12: ffff8d03002094d0
[  156.131907] R13: 000000000000001f R14: 0000000000070018 R15: 0000000000000007
[  156.131914] FS:  0000000000000000(0000) GS:ffff8d062ed80000(0000) knlGS:0000000000000000
[  156.131923] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  156.131929] CR2: 00000000000002a0 CR3: 000000001120a005 CR4: 00000000003706e0
[  156.131937] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  156.131942] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  156.131949] Call Trace:
[  156.131953]  <TASK>
[  156.131957]  ? trace_event_raw_event_dma_fence+0xcc/0x200
[  156.131973]  ? ring_buffer_unlock_commit+0x23/0x130
[  156.131982]  dma_fence_init+0x92/0xb0
[  156.131993]  amdgpu_fence_emit+0x10d/0x2b0 [amdgpu]
[  156.132302]  amdgpu_ib_schedule+0x2f9/0x580 [amdgpu]
[  156.132586]  amdgpu_job_run+0xed/0x220 [amdgpu]

v2: fix mismatch warning between the prototype and function name (Ray, kernel test robot)

Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-16 13:42:36 -05:00
Prike Liang
f296a0bcc9 drm/amd/pm: skip setting gfx cgpg in the s0ix suspend-resume
In the s0ix entry need retain gfx in the gfxoff state,so here need't
set gfx cgpg in the S0ix suspend-resume process. Moreover move the S0ix
check into SMU12 can simplify the code condition check.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-16 13:34:14 -05:00
Mario Limonciello
9f952378fc drivers/amd/pm: smu13: use local variable adev
Since this variable was made available by the previous commit, use
it to make function access cleaner.

Suggested-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-16 13:34:03 -05:00
Xiang wangx
a342655865 drm/radeon: Fix syntax errors in comments
Delete the redundant word 'we'.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Xiang wangx <wangxiang@cdjrlc.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-14 16:11:02 -05:00
Yann Dirson
326db0dc00 amdgpu: fix some comment typos
Signed-off-by: Yann Dirson <ydirson@free.fr>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-14 16:10:58 -05:00
Yann Dirson
03f2abb07e amdgpu: fix some kernel-doc markup
Those are not today pulled by the sphinx doc, but better be ready.

Signed-off-by: Yann Dirson <ydirson@free.fr>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-14 16:10:53 -05:00
Lang Yu
583637d66a drm/amd/pm: fix a potential gpu_metrics_table memory leak
Memory is allocated for gpu_metrics_table in renoir_init_smc_tables(),
but not freed in int smu_v12_0_fini_smc_tables(). Free it!

Fixes: 95868b85764a ("drm/amd/powerplay: add Renoir support for gpu metrics export")

Signed-off-by: Lang Yu <lang.yu@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-14 16:10:08 -05:00
Jingwen Chen
948e7ce014 drm/amd/amdgpu: fix gmc bo pin count leak in SRIOV
[Why]
gmc bo will be pinned during loading amdgpu and reset in SRIOV while
only unpinned in unload amdgpu

[How]
add amdgpu_in_reset and sriov judgement to skip pin bo

v2: fix wrong judgement

Signed-off-by: Jingwen Chen <Jingwen.Chen2@amd.com>
Reviewed-by: Horace Chen <horace.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-14 16:10:00 -05:00
Jingwen Chen
85dfc1d692 drm/amd/amdgpu: fix psp tmr bo pin count leak in SRIOV
[Why]
psp tmr bo will be pinned during loading amdgpu and reset in SRIOV while
only unpinned in unload amdgpu

[How]
add amdgpu_in_reset and sriov judgement to skip pin bo

v2: fix wrong judgement

Signed-off-by: Jingwen Chen <Jingwen.Chen2@amd.com>
Reviewed-by: Horace Chen <horace.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-14 16:09:49 -05:00
Lijo Lazar
91e16017b6 drm/amd/pm: Skip power state allocation
Power states are not valid for arcturus and aldebaran, no need to
allocate memory.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-14 16:09:42 -05:00
Evan Quan
17252701ec drm/amdgpu: correct the wrong cached state for GMC on PICASSO
Pair the operations did in GMC ->hw_init and ->hw_fini. That
can help to maintain correct cached state for GMC and avoid
unintention gate operation dropping due to wrong cached state.

BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1828

Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-14 16:09:31 -05:00