8299 Commits

Author SHA1 Message Date
Prike Liang
b2a7e9735a drm/amdgpu: fix the hw hang during perform system reboot and reset
The system reboot failed because some IP blocks entered power gating before
the hw resources were destroyed. Meanwhile, using a unified interface to set
the device CGPG to the ungated state simplifies the amdgpu poweroff and reset
ungate guard.

Fixes: 487eca11a321ef ("drm/amdgpu: fix gfx hang during suspend with video playback (v2)")
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Tested-by: Mengbing Wang <Mengbing.Wang@amd.com>
Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2020-04-14 12:48:01 -04:00
Evan Quan
028cfb2444 drm/amdgpu: fix wrong vram lost counter increment V2
Vram lost counter is wrongly increased by two during baco reset.

V2: assumed vram lost for mode1 reset on all ASICs

Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-13 12:07:09 -04:00
Jason Yan
8e2f842063 drm/amdgpu: remove dead code in si_dpm.c
This code is dead, let's remove it.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-13 12:02:42 -04:00
Aurabindo Pillai
dd4fa6c1b8 drm/amd/amdgpu: remove hardcoded module name in prints
Let format prefixes take care of printing the module name
through pr_fmt and dev_fmt definitions.

Signed-off-by: Aurabindo Pillai <mail@aurabindo.in>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-13 12:02:40 -04:00
Aurabindo Pillai
539489fc91 drm/amd/amdgpu: add print prefix for dev_* variants
Define dev_fmt macro for informative print messages

Signed-off-by: Aurabindo Pillai <mail@aurabindo.in>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-13 12:02:37 -04:00
Aurabindo Pillai
d57229b1da drm/amd/amdgpu: add prefix for pr_* prints
amdgpu uses lots of pr_* calls for printing error messages.
With this prefix, the origin of an error is more obvious to
the end user, which may help debugging.

Prefix format:

[xxx.xxxxx] amdgpu: ...

Signed-off-by: Aurabindo Pillai <mail@aurabindo.in>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-13 12:02:35 -04:00
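
As an illustration of the prefix described in d57229b1da above, here is a minimal userspace sketch of the pr_fmt convention. It is not the amdgpu code: demo_log_error() and the message text are hypothetical, and snprintf() stands in for printk(). In the kernel, pr_fmt is defined before the printk header is included, and the [xxx.xxxxx] timestamp is added by printk itself rather than by the macro.

```c
#include <stdio.h>

/* Stand-in for the kernel convention: every format string passed
 * through pr_fmt() gets the module name pasted in front of it. */
#define pr_fmt(fmt) "amdgpu: " fmt

/* Hypothetical helper: formats an error message the way a pr_err()
 * call would, but into a caller-supplied buffer. */
static int demo_log_error(char *buf, size_t size, int code)
{
	/* Adjacent string literals are concatenated at compile time,
	 * so the prefix costs nothing at run time. */
	return snprintf(buf, size, pr_fmt("init failed (%d)\n"), code);
}
```

Because the prefix lives in one macro, individual messages no longer need to hardcode the module name, which is what dd4fa6c1b8 relies on.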
Alex Deucher
a4c2468027 drm/amdgpu/ring: simplify scheduler setup logic
Set up a GPU scheduler based on the ring flag rather
than the ring type.

Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-13 12:02:26 -04:00
Alex Deucher
a783910d5c drm/amdgpu/kiq: add no_scheduler flag to KIQ
We don't want a GPU scheduler for this ring.

Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-13 12:02:24 -04:00
Alex Deucher
cb3d108501 drm/amdgpu/ring: add no_scheduler flag
This allows IPs to flag whether a specific ring requires
a GPU scheduler or not.  E.g., sometimes instances of an
IP are asymmetric and have different capabilities.

Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-13 12:02:19 -04:00
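
The idea behind cb3d108501, a783910d5c and a4c2468027 above can be sketched as follows. This is a simplified illustration under assumed names (struct demo_ring and both helpers are hypothetical, not the driver's types): each ring carries its own flag, so the setup code asks the ring instead of switching on the ring type.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down ring descriptor. */
struct demo_ring {
	const char *name;
	bool no_scheduler;	/* set by the IP block, e.g. for KIQ */
};

/* A ring gets a GPU scheduler unless it opted out via the flag. */
static bool demo_ring_needs_scheduler(const struct demo_ring *ring)
{
	return !ring->no_scheduler;
}

/* Counts how many rings in the array would get a scheduler. */
static size_t demo_count_scheduled_rings(const struct demo_ring *rings,
					 size_t count)
{
	size_t i, scheduled = 0;

	for (i = 0; i < count; i++)
		if (demo_ring_needs_scheduler(&rings[i]))
			scheduled++;
	return scheduled;
}
```

A per-ring flag also covers the asymmetric case mentioned in the commit message, where two instances of the same IP need different treatment.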
Evan Quan
dadce777e0 drm/amdgpu: fix wrong vram lost counter increment V2
Vram lost counter is wrongly increased by two during baco reset.

V2: assumed vram lost for mode1 reset on all ASICs

Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-13 12:02:08 -04:00
Guchun Chen
ed72aa21c7 drm/amdgpu: replace DRM prefix with PCI device info for GFX RAS
Prefix RAS message printing in GFX IP with PCI device info,
which helps debugging in the multi-GPU case.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-13 12:02:03 -04:00
Yintian Tao
d32709dac6 drm/amdgpu: resume kiq access debugfs
If there is no GPU hang, the user can still access
debugfs through KIQ.

Signed-off-by: Yintian Tao <yttao@amd.com>
Reviewed-by: Monk Liu <Monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-13 12:01:56 -04:00
Guchun Chen
6952e99cfd drm/amdgpu: refine ras related message print
Prefix RAS related kernel message logging with PCI
device info by replacing DRM_INFO/WARN/ERROR with
dev_info/warn/err. This tells the user clearly which
GPU device a RAS message refers to. Also add some
other RAS message printing to make the output clearer
and friendlier.

Suggested-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-13 12:01:50 -04:00
Guchun Chen
1f3ef0efba drm/amdgpu: add uncorrectable error count print in UMC ecc irq cb
The uncorrectable error count is not printed when issuing a UMC
UE injection. By the time the error count log function runs in the GPU
recovery work thread, the correct error count from the last injection
can no longer be read and printed, because the error status
register is automatically cleared after it is read in the UMC ECC irq
callback. So print the message in the UMC ECC irq callback, to be
consistent with other RAS error interrupt cases.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-13 12:01:44 -04:00
Yintian Tao
95a2f91738 drm/amdgpu: restrict debugfs register access under SR-IOV
On bare metal, GPU registers can be accessed through
MMIO without any special handling.
Under virtualization, GPU register access is
implemented through KIQ at run time because of
world switching.

Therefore, under SR-IOV the user can only access
debugfs to r/w GPU registers when all three
conditions below are met.
- amdgpu_gpu_recovery=0
- TDR happened
- in_gpu_reset=0

v2: merge amdgpu_virt_can_access_debugfs() into
    amdgpu_virt_enable_access_debugfs()

v3: drop ret variable in amdgpu_virt_enable_access_debugfs()
    and directly return result

Signed-off-by: Yintian Tao <yttao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-13 12:01:04 -04:00
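
The three conditions listed in 95a2f91738 above can be sketched as a single predicate. This is only an illustration: struct demo_vf_state, its fields and demo_enable_access_debugfs() are hypothetical names, and the -EPERM return value is an assumption rather than something the commit message states.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical snapshot of the state the check depends on. */
struct demo_vf_state {
	bool is_sriov_vf;	/* running as an SR-IOV virtual function */
	int gpu_recovery;	/* value of the amdgpu_gpu_recovery setting */
	bool tdr_happened;	/* a timeout detection and recovery occurred */
	bool in_gpu_reset;	/* a GPU reset is currently in progress */
};

/* Returns 0 when debugfs register access is allowed, -EPERM otherwise. */
static int demo_enable_access_debugfs(const struct demo_vf_state *s)
{
	/* Bare metal: MMIO access needs no extra gating. */
	if (!s->is_sriov_vf)
		return 0;

	/* SR-IOV: all three conditions must hold at once. */
	if (s->gpu_recovery == 0 && s->tdr_happened && !s->in_gpu_reset)
		return 0;

	return -EPERM;
}
```

Returning the result directly, without a temporary variable, mirrors the v3 note in the commit message.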
Linus Torvalds
21c5b3c6d7 drm fixes for 5.7-rc1 (part two)
legacy:
 - fix drm_local_map.offset type
 
 ttm:
 - temporarily disable hugepages to debug amdgpu problems.
 
 prime:
 - fix sg extraction
 
 amdgpu:
 - Various Renoir fixes
 - Fix gfx clockgating sequence on gfx10
 - RAS fixes
 - Avoid MST property creation after registration
 - Various cursor/viewport fixes
 - Fix a confusing log message about optional firmwares
 
 i915:
 - Flush all the reloc_gpu batch (Chris)
 - Ignore readonly failures when updating relocs (Chris)
 - Fill all the unused space in the GGTT (Chris)
 - Return the right vswing table (Jose)
 - Don't enable DDI IO power on a TypeC port in TBT mode for ICL+ (Imre)
 
 analogix_dp:
 - probe fix
 
 virtio:
 - oob fix in object create

Merge tag 'drm-next-2020-04-10' of git://anongit.freedesktop.org/drm/drm

Pull more drm fixes from Dave Airlie:
 "As expected, more fixes did turn up in the latter part of the week.

  The drm_local_map build regression fix is here, along with temporary
  disabling of the hugepage work due to some amdgpu related crashes.

  Otherwise it's just a bunch of i915, and amdgpu fixes.

  legacy:
   - fix drm_local_map.offset type

  ttm:
   - temporarily disable hugepages to debug amdgpu problems.

  prime:
   - fix sg extraction

  amdgpu:
   - Various Renoir fixes
   - Fix gfx clockgating sequence on gfx10
   - RAS fixes
   - Avoid MST property creation after registration
   - Various cursor/viewport fixes
   - Fix a confusing log message about optional firmwares

  i915:
   - Flush all the reloc_gpu batch (Chris)
   - Ignore readonly failures when updating relocs (Chris)
   - Fill all the unused space in the GGTT (Chris)
   - Return the right vswing table (Jose)
   - Don't enable DDI IO power on a TypeC port in TBT mode for ICL+ (Imre)

  analogix_dp:
   - probe fix

  virtio:
   - oob fix in object create"

* tag 'drm-next-2020-04-10' of git://anongit.freedesktop.org/drm/drm: (34 commits)
  drm/ttm: Temporarily disable the huge_fault() callback
  drm/bridge: analogix_dp: Split bind() into probe() and real bind()
  drm/legacy: Fix type for drm_local_map.offset
  drm/amdgpu/display: fix warning when compiling without debugfs
  drm/amdgpu: unify fw_write_wait for new gfx9 asics
  drm/amd/powerplay: error out on forcing clock setting not supported
  drm/amdgpu: fix gfx hang during suspend with video playback (v2)
  drm/amd/display: Check for null fclk voltage when parsing clock table
  drm/amd/display: Acknowledge wm_optimized_required
  drm/amd/display: Make cursor source translation adjustment optional
  drm/amd/display: Calculate scaling ratios on every medium/full update
  drm/amd/display: Program viewport when source pos changes for DCN20 hw seq
  drm/amd/display: Fix incorrect cursor pos on scaled primary plane
  drm/amd/display: change default pipe_split policy for DCN1
  drm/amd/display: Translate cursor position by source rect
  drm/amd/display: Update stream adjust in dc_stream_adjust_vmin_vmax
  drm/amd/display: Avoid create MST prop after registration
  drm/amdgpu/psp: dont warn on missing optional TA's
  drm/amdgpu: update RAS related dmesg print
  drm/amdgpu: resolve mGPU RAS query instability
  ...
2020-04-10 12:38:28 -07:00
Likun Gao
bfa5807d4d Revert "drm/amdgpu: change SH MEM alignment mode for gfx10"
This reverts commit b74fb888f4927e2079be576ce6dcdbf0c420f1f8.
Revert the auto alignment mode setting of the SH MEM config, as it causes
the OCL Conformance Test to fail.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09 10:44:41 -04:00
John Clements
9a785c7ad1 drm/amdgpu: increased atom cmd timeout
added macro to define timeout

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09 10:43:33 -04:00
Aurabindo Pillai
ad36d71b3f amdgpu_kms: Remove unnecessary condition check
Execution will only reach here if the asserted condition is true.
Hence there is no need for the additional check.

Signed-off-by: Aurabindo Pillai <mail@aurabindo.in>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09 10:43:18 -04:00
Aaron Liu
ba714a56fc drm/amdgpu: unify fw_write_wait for new gfx9 asics
Make the fw_write_wait default case true since presumably all new
gfx9 asics will have updated firmware. That is, they use the unique
WAIT_REG_MEM packet with operation=1.

Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Tested-by: Aaron Liu <aaron.liu@amd.com>
Tested-by: Yuxian Dai <Yuxian.Dai@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09 10:43:18 -04:00
Hawking Zhang
2eee0229f6 drm/amdgpu: support access regs outside of mmio bar
add indirect access support to registers outside of
mmio bar.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09 10:43:18 -04:00
Hawking Zhang
f384ff95f6 drm/amdgpu: retire AMDGPU_REGS_KIQ flag
all the register access through kiq is redirected
to amdgpu_kiq_rreg/amdgpu_kiq_wreg

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09 10:43:18 -04:00
Hawking Zhang
ec59847e74 drm/amdgpu: retire RREG32_IDX/WREG32_IDX
those are not needed anymore

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09 10:43:18 -04:00
Hawking Zhang
3c888c1635 drm/amdgpu: retire indirect mmio reg support from cgs
not needed anymore

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09 10:43:18 -04:00
Hawking Zhang
46e840ed10 drm/amdgpu: replace indirect mmio access in non-dc code path
all the mmCUR_CONTROL instances are in mmr range and
can be accessed directly by using RREG32/WREG32

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09 10:43:18 -04:00
Hawking Zhang
dec0520aff drm/amdgpu: remove inproper workaround for vega10
the workaround is not needed for soc15 ASICs except
for vega10. It is not even needed with the latest vega10
vbios.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09 10:43:18 -04:00
Prike Liang
a23ca7f76d drm/amdgpu: fix gfx hang during suspend with video playback (v2)
The system hangs during S3 suspend because the SMU is left pending
when the GC does not respond to the CP_HQD_ACTIVE register access request.
The root cause is accessing the GC register after entering GFX CGPG; this
can be fixed by disabling GFX CGPG before performing suspend.

v2: Disable GFX CGPG instead of using the RLC safe mode guard.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Tested-by: Mengbing Wang <Mengbing.Wang@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09 10:43:17 -04:00
Kent Russell
1ea2b260eb drm/amdgpu: Re-enable FRU check for most models v5
There is at least 1 VG20 DID that does not have an FRU, and trying to read
that will cause a hang. For now, explicitly support reading the FRU for
Arcturus and for the WKS VG20 DIDs, and skip for everything else.
This re-enables serial number reporting for server cards

v2: Add ASIC check
v3: Don't default to true for pre-VG20
v4: Use DID instead of parsing the VBIOS
v5: Squash in overflow warning fix

Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09 10:43:17 -04:00
Jack Zhang
b639c22c98 drm/amdgpu/sriov add amdgpu_amdkfd_pre_reset in gpu reset
[PATCH 2/2]
kfd_pre_reset will free mem_objs allocated by kfd_gtt_sa_allocate

Without this change, the sriov tdr code path never frees that
allocated memory, which causes a memory leak.

Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
Reviewed-by: Monk Liu <monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09 10:43:15 -04:00
Jack Zhang
fe9824d15e drm/amdkfd Avoid destroy hqd when GPU is on reset
This reverts commit 5161bba4311f in order to split it into two
different patches, and this will make it easier to understand.

[PATCH 1/2]
porting to gfx10 from
commit 1b0bfcff463f390c40 ("drm/amdgpu: Avoid destroy hqd when GPU is on reset")

Originally, the MEC was touched
before the GPU had been initialized.

Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
Reviewed-by: Monk Liu <monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09 10:43:15 -04:00
John Clements
4a06686b94 drm/amdgpu: update RAS related dmesg print
prefix RAS error related dmesg print with pci device info

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09 10:43:15 -04:00
John Clements
b3dbd6d3ec drm/amdgpu: resolve mGPU RAS query instability
upon receiving uncorrectable error, query every GPU node for ras errors

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09 10:43:15 -04:00
Chengming Gui
c419bdf5b8 drm/amd/amdgpu: Correct gfx10's CG sequence
An incorrect CG sequence causes a gfx timeout
if we keep switching the power profile mode
(entering a profile mode such as PEAK disables CG,
leaving it with profile mode EXIT enables CG)
while running a Vulkan test case (case used for test: vkexample).

Signed-off-by: Chengming Gui <Jack.Gui@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09 10:43:15 -04:00
Tianci.Yin
b2d92682ff drm/amdgpu: add SPM golden settings for Navi12
Add RLC_SPM golden settings

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Tianci.Yin <tianci.yin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09 10:43:15 -04:00
Tianci.Yin
a900f562c8 drm/amdgpu: add SPM golden settings for Navi14
Add RLC_SPM golden settings

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Tianci.Yin <tianci.yin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09 10:43:15 -04:00
Tianci.Yin
4189425d30 drm/amdgpu: add SPM golden settings for Navi10(v2)
Add RLC_SPM golden settings

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Tianci.Yin <tianci.yin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09 10:43:15 -04:00
Oak Zeng
d2155a719d drm/amdgpu: Print UTCL2 client ID on a gpuvm fault
The UTCL2 client ID is useful for finding out which
UTCL2 client caused the gpuvm fault. Print it out
for debugging purposes.

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian Konig <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09 10:43:15 -04:00
James Zhu
21b704d783 drm/amdgpu/vcn: add shared memory restore after wake up from sleep.
VCN shared memory needs to be restored after wake-up during the S3 test.

v2: Allocate shared memory saved_bo at sw_init and free it in sw_fini.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09 10:43:15 -04:00
Aaron Ma
2a20e630f8 drm/amdgpu: Fix oops when pp_funcs is unset in ACPI event
On ARCTURUS and RENOIR, powerplay is not supported yet.
When the power jack is plugged in or unplugged, an ACPI event is issued.
This then triggers a kernel NULL pointer BUG.
Check for NULL pointers before calling.

Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09 10:43:15 -04:00
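
The guard described in 2a20e630f8 above follows a common pattern for optional function tables, sketched here with hypothetical names (struct demo_pm_funcs, struct demo_device and both functions are illustrative, not the driver's definitions): check the table and the callback before calling through them.

```c
#include <stddef.h>

/* Hypothetical table of optional power-management callbacks. */
struct demo_pm_funcs {
	int (*set_power_source)(int on_ac);
};

/* Hypothetical device; pp_funcs stays NULL when powerplay is absent. */
struct demo_device {
	const struct demo_pm_funcs *pp_funcs;
};

/* Example callback: reports 1 for AC power and 2 for battery. */
static int demo_record_source(int on_ac)
{
	return on_ac ? 1 : 2;
}

/* Event handler: silently does nothing when no callback is installed. */
static int demo_handle_power_event(const struct demo_device *dev, int on_ac)
{
	if (!dev->pp_funcs || !dev->pp_funcs->set_power_source)
		return 0;	/* nothing to notify, and nothing to crash on */

	return dev->pp_funcs->set_power_source(on_ac);
}
```

Checking both the table pointer and the individual callback covers a missing table as well as a table that leaves this one entry unset.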
Alex Deucher
a45a9e5e10 drm/amdgpu/psp: dont warn on missing optional TA's
Replace dev_warn() with dev_info() and note that they are
optional to avoid confusing users.

The RAS TAs only exist on server boards and the HDCP and DTM
TAs only exist on client boards.  They are optional either way.

Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09 10:43:15 -04:00
Nirmoy Das
1c6d567bdf drm/amdgpu: rework sched_list generation
Generate HW IP's sched_list in amdgpu_ring_init() instead of
amdgpu_ctx.c. This makes amdgpu_ctx_init_compute_sched(),
ring.has_high_prio and amdgpu_ctx_init_sched() unnecessary.
This patch also stores sched_list for all HW IPs in one big
array in struct amdgpu_device which makes amdgpu_ctx_init_entity()
much leaner.

v2:
fix a coding style issue
do not use drm hw_ip const to populate amdgpu_ring_type enum

v3:
remove ctx reference and move sched array and num_sched to a struct
use num_scheds to detect uninitialized scheduler list

v4:
use array_index_nospec for user space controlled variables
fix possible checkpatch.pl warnings

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09 10:43:14 -04:00
Nirmoy Das
07e14845d1 drm/amdgpu: sync ring type and drm hw_ip type
Use AMDGPU_HW_IP_* to set amdgpu_ring_type enum values

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09 10:43:14 -04:00
Jack Zhang
04bef61e5d drm/amdgpu/sriov add amdgpu_amdkfd_pre_reset in gpu reset
kfd_pre_reset will free mem_objs allocated by kfd_gtt_sa_allocate

Without this change, the sriov tdr code path never frees that allocated
memory, which causes a memory leak.

v2:add a bugfix for kiq ring test fail

Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
Reviewed-by: Monk Liu <monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09 10:43:14 -04:00
Aaron Liu
2960758cce drm/amdgpu: unify fw_write_wait for new gfx9 asics
Make the fw_write_wait default case true since presumably all new
gfx9 asics will have updated firmware. That is, they use the unique
WAIT_REG_MEM packet with operation=1.

Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Tested-by: Aaron Liu <aaron.liu@amd.com>
Tested-by: Yuxian Dai <Yuxian.Dai@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2020-04-08 17:52:38 -04:00
Prike Liang
487eca11a3 drm/amdgpu: fix gfx hang during suspend with video playback (v2)
The system hangs during S3 suspend because the SMU is left pending
when the GC does not respond to the CP_HQD_ACTIVE register access request.
The root cause is accessing the GC register after entering GFX CGPG; this
can be fixed by disabling GFX CGPG before performing suspend.

v2: Disable GFX CGPG instead of using the RLC safe mode guard.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Tested-by: Mengbing Wang <Mengbing.Wang@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2020-04-08 17:51:03 -04:00
Linus Torvalds
f5e94d10e4 drm fixes for 5.7-rc1
core:
 - revert drm_mm atomic patch
 - dt binding fixes
 
 fbcon:
 - null ptr error fix
 
 i915:
 - GVT fixes
 
 nouveau:
 - runpm fix
 - svm fixes
 
 amdgpu:
 - HDCP fixes
 - gfx10 fix
 - Misc display fixes
 - BACO fixes
 
 amdkfd:
 - Fix memory leak
 
 vboxvideo:
 - remove conflicting fbs
 
 vc4:
 - mode validation fix
 
 xen:
 - fix PTR_ERR usage

Merge tag 'drm-next-2020-04-08' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
 "This is a set of fixes that have queued up, I think I might have
  another pull with some more before rc1 but I'd like to dequeue what I
  have now just in case Easter is more eggciting than expected.

  The main thing in here is a fix for a longstanding nouveau power
  management issue on certain laptops, it should help runtime
  suspend/resume for a lot of people.

  There is also a reverted patch for some drm_mm behaviour in atomic
  contexts.

  Summary:

  core:
   - revert drm_mm atomic patch
   - dt binding fixes

  fbcon:
   - null ptr error fix

  i915:
   - GVT fixes

  nouveau:
   - runpm fix
   - svm fixes

  amdgpu:
   - HDCP fixes
   - gfx10 fix
   - Misc display fixes
   - BACO fixes

  amdkfd:
   - Fix memory leak

  vboxvideo:
   - remove conflicting fbs

  vc4:
   - mode validation fix

  xen:
   - fix PTR_ERR usage"

* tag 'drm-next-2020-04-08' of git://anongit.freedesktop.org/drm/drm: (41 commits)
  drm/nouveau/kms/nv50-: wait for FIFO space on PIO channels
  drm/nouveau/nvif: protect waits against GPU falling off the bus
  drm/nouveau/nvif: access PTIMER through usermode class, if available
  drm/nouveau/gr/gp107,gp108: implement workaround for HW hanging during init
  drm/nouveau: workaround runpm fail by disabling PCI power management on certain intel bridges
  drm/nouveau/svm: remove useless SVM range check
  drm/nouveau/svm: check for SVM initialized before migrating
  drm/nouveau/svm: fix vma range check for migration
  drm/nouveau: remove checks for return value of debugfs functions
  drm/nouveau/ttm: evict other IO mappings when running out of BAR1 space
  drm/amdkfd: kfree the wrong pointer
  drm/amd/display: increase HDCP authentication delay
  drm/amd/display: Correctly cancel future watchdog and callback events
  drm/amd/display: Don't try hdcp1.4 when content_type is set to type1
  drm/amd/powerplay: move the ASIC specific nbio operation out of smu_v11_0.c
  drm/amd/powerplay: drop redundant BIF doorbell interrupt operations
  drm/amd/display: Fix dcn21 num_states
  drm/amd/display: Enable BT2020 in COLOR_ENCODING property
  drm/amd/display: LFC not working on 2.0x range monitors (v2)
  drm/amd/display: Support plane level CTM
  ...
2020-04-07 20:24:34 -07:00
Alex Deucher
8f0622a19b drm/amdgpu/psp: dont warn on missing optional TA's
Replace dev_warn() with dev_info() and note that they are
optional to avoid confusing users.

The RAS TAs only exist on server boards and the HDCP and DTM
TAs only exist on client boards.  They are optional either way.

Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-07 14:07:41 -04:00
John Clements
2b961e6a95 drm/amdgpu: update RAS related dmesg print
prefix RAS error related dmesg print with pci device info

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-07 14:02:36 -04:00
John Clements
0b9ebd7eeb drm/amdgpu: resolve mGPU RAS query instability
upon receiving uncorrectable error, query every GPU node for ras errors

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-07 14:01:43 -04:00
Chengming Gui
dec7880579 drm/amd/amdgpu: Correct gfx10's CG sequence
An incorrect CG sequence causes a gfx timeout
if we keep switching the power profile mode
(entering a profile mode such as PEAK disables CG,
leaving it with profile mode EXIT enables CG)
while running a Vulkan test case (case used for test: vkexample).

Signed-off-by: Chengming Gui <Jack.Gui@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-07 14:01:06 -04:00