8299 Commits

Author SHA1 Message Date
Luben Tuikov
9af5e21dac drm/scheduler: Remove priority macro INVALID (v2)
Remove DRM_SCHED_PRIORITY_INVALID. We no longer
carry around an invalid priority and cut it off
at the source.

Backwards compatibility behaviour of AMDGPU CTX
IOCTL passing in garbage for context priority
from user space and then mapping that to
DRM_SCHED_PRIORITY_NORMAL is preserved.

v2: Revert "res"  --> "r" and
           "prio" --> "priority".

Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-18 18:20:26 -04:00
Luben Tuikov
e2d732fdb7 drm/scheduler: Scheduler priority fixes (v2)
Remove DRM_SCHED_PRIORITY_LOW, as it was used
in only one place.

Rename and separate by a line
DRM_SCHED_PRIORITY_MAX to DRM_SCHED_PRIORITY_COUNT
as it represents a (total) count of said
priorities and it is used as such in loops
throughout the code. (0-based indexing is the
the count number.)

Remove redundant word HIGH in priority names,
and rename *KERNEL* to *HIGH*, as it really
means that, high.

v2: Add back KERNEL and remove SW and HW,
    in lieu of a single HIGH between NORMAL and KERNEL.

Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-18 18:20:17 -04:00
Huang Rui
34174b89bf drm/amdkfd: fix the wrong sdma instance query for renoir
Renoir only has one sdma instance, it will get failed once query the
sdma1 registers. So use switch-case instead of static register array.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-18 17:54:12 -04:00
Bhawanpreet Lakha
0a668aee0a drm/amdgpu: parse ta firmware for navy_flounder
Use the same case as sienna_cichlid

Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-18 17:53:50 -04:00
Guchun Chen
1a68d96f81 drm/amdgpu: fix NULL pointer access issue when unloading driver
When unloading driver by "modprobe -r amdgpu", one NULL pointer
dereference bug occurs in ras debugfs releasing. The cause is the
duplicated debugfs_remove, as drm debugfs_root dir has been cleaned
up already by drm_minor_unregister.

BUG: kernel NULL pointer dereference, address: 00000000000000a0
PGD 0 P4D 0
Oops: 0002 [#1] SMP PTI
CPU: 11 PID: 1526 Comm: modprobe Tainted: G           OE     5.6.0-guchchen #1
Hardware name: System manufacturer System Product Name/TUF Z370-PLUS GAMING II, BIOS 0411 09/21/2018
RIP: 0010:down_write+0x15/0x40
Code: eb de e8 7e 17 72 ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc 0f 1f 44 00 00 53 48 89 fb e8 92
d8 ff ff 31 c0 ba 01 00 00 00 <f0> 48 0f b1 13 75 0f 65 48 8b 04 25 c0 8b 01 00 48 89 43 08 5b c3
RSP: 0018:ffffb1590386fcd0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 00000000000000a0 RCX: 0000000000000000
RDX: 0000000000000001 RSI: ffffffff85b2fcc2 RDI: 00000000000000a0
RBP: ffffb1590386fd30 R08: ffffffff85b2fcc2 R09: 000000000002b3c0
R10: ffff97a330618c40 R11: 00000000000005f6 R12: ffff97a3481beb40
R13: 00000000000000a0 R14: ffff97a3481beb40 R15: 0000000000000000
FS:  00007fb11a717540(0000) GS:ffff97a376cc0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000000a0 CR3: 00000004066d6006 CR4: 00000000003606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 simple_recursive_removal+0x63/0x370
 ? debugfs_remove+0x60/0x60
 debugfs_remove+0x40/0x60
 amdgpu_ras_fini+0x82/0x230 [amdgpu]
 ? __kernfs_remove.part.17+0x101/0x1f0
 ? kernfs_name_hash+0x12/0x80
 amdgpu_device_fini+0x1c0/0x580 [amdgpu]
 amdgpu_driver_unload_kms+0x3e/0x70 [amdgpu]
 amdgpu_pci_remove+0x36/0x60 [amdgpu]
 pci_device_remove+0x3b/0xb0
 device_release_driver_internal+0xe5/0x1c0
 driver_detach+0x46/0x90
 bus_remove_driver+0x58/0xd0
 pci_unregister_driver+0x29/0x90
 amdgpu_exit+0x11/0x25 [amdgpu]
 __x64_sys_delete_module+0x13d/0x210
 do_syscall_64+0x5f/0x250
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-18 17:00:46 -04:00
Jiansong Chen
9c9b17a7d1 drm/amdgpu: disable gfxoff for navy_flounder
gfxoff is temporarily disabled for navy_flounder,
since at present the feature has broken some basic
amdgpu test.

Signed-off-by: Jiansong Chen <Jiansong.Chen@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-18 16:58:59 -04:00
Maxime Ripard
d85ddd1318 Linux 5.9-rc1
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl85kWkeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGGPwIAJpEmEBkMoQ+KARK
 PaaVDQW9fwAlC1nThMpGv/m8Ym7KbfLkTgEJQiQyNv3pDDhyLP8jvcZcscIkfs4s
 56IMjFndRHWNeCVu9YPXWmAEp/WycZNC7YVPu0j1bI9VgvaHvbHOqUWzxB716RbY
 K4TFprJEA3sotNm0vdda2NgSlSup/0NVKiP2LwQPjkwH+Kf6/Ol1j2uxbWywEo75
 BdW5LreDtUoJ7W5BeX8GJ0IVgWdyxBV61eVbaINNY3EOPc7+uMGOgR9oHeGWRceH
 V4ELYww5yjizUDtKFvVTc/k0tj+Rq73mtOADdaF0YWItqxtDBvAcdKIpC0KYzVaa
 2fB+rts=
 =9Pnj
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRcEzekXsqa64kGDp7j7w1vZxhRxQUCXzvIXwAKCRDj7w1vZxhR
 xYXLAQC80uF6JkpBeNyuewyY7CDadDG1qDchDmYquwGVDnO+HwEAmvL84csLcxBy
 ah3UMOKUyWz5Sahlg48ZIaaUhRaulwE=
 =Lu/B
 -----END PGP SIGNATURE-----

Merge v5.9-rc1 into drm-misc-next

Sam needs 5.9-rc1 to have dev_err_probe in to merge some patches.

Signed-off-by: Maxime Ripard <maxime@cerno.tech>
2020-08-18 14:14:25 +02:00
Kevin Wang
4444457450 drm/amdgpu: add condition check for trace_amdgpu_cs()
v1:
add trace event enabled check to avoid nop loop when submit multi ibs
in amdgpu_cs_ioctl() function.

v2:
add a new wrapper function to trace all amdgpu cs ibs.

Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-17 14:06:37 -04:00
Kevin Wang
736b172978 drm/amdgpu: fix amdgpu_bo_release_notify() comment error
fix amdgpu_bo_release_notify() comment error.

Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Dennis Li <Dennis.Li@amd.com>
Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-17 14:06:28 -04:00
Huang Rui
d95c42a150 drm/amdkfd: fix the wrong sdma instance query for renoir
Renoir only has one sdma instance, it will get failed once query the
sdma1 registers. So use switch-case instead of static register array.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-17 14:06:13 -04:00
Alex Deucher
11043b7a99 drm/amdgpu: note what type of reset we are using
When we reset the GPU, note what type of reset will be
used.  This makes debugging different reset scenarios
more clear as the driver may use different reset
methods depending on conditions on the system.

Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-14 17:03:20 -04:00
Alex Deucher
bddbacc9e0 drm/amdgpu: print where we get the vbios image from
ACPI, ROM, PCI BAR, etc.

Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-14 17:03:20 -04:00
Bhawanpreet Lakha
31e726ca3d drm/amdgpu: parse ta firmware for navy_flounder
Use the same case as sienna_cichlid

Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-14 16:22:41 -04:00
James Zhu
ac1128c996 drm/amdgpu/vcn3.0: only SIENNA_CICHLID need specify instance for dec/enc
Only SIENNA_CICHLID(VCN3) has 2 unsymmetrical instances, there're less
codecs on instance 1, we use 0 for decode and 1 for encode.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-14 16:22:41 -04:00
Evan Quan
e098bc9612 drm/amd/pm: optimize the power related source code layout
The target is to provide a clear entry point(for power routines).
Also this can help to maintain a clear view about the frameworks
used on different ASICs. Hopefully all these can make power part
more friendly to play with.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-14 16:22:41 -04:00
Evan Quan
e9372d2371 drm/amd/powerplay: put those exposed power interfaces in amdgpu_dpm.c
As other power interfaces.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-14 16:22:41 -04:00
Evan Quan
20d3c28ce4 drm/amd/powerplay: optimize i2c bus access implementation
The caller needs not care about the internal details how the powerplay
API implemented.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-14 16:22:41 -04:00
Evan Quan
70bdb6ed22 drm/amd/powerplay: drop unnecessary pp_funcs checker
It's redundant. Also, the callers should not care about
the implementation details.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-14 16:22:41 -04:00
Evan Quan
b89e9eb681 drm/amd/powerplay: optimize amdgpu_dpm_set_clockgating_by_smu() implementation
Cover the implementation details from outside(of power part).

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-14 16:22:41 -04:00
Guchun Chen
ae2bf61ff3 drm/amdgpu: guard ras debugfs creation/removal based on CONFIG_DEBUG_FS
It can avoid potential build warn/error when
CONFIG_DEBUG_FS is not set.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-14 16:22:40 -04:00
Guchun Chen
2e2f5dd514 drm/amdgpu: fix NULL pointer access issue when unloading driver
When unloading driver by "modprobe -r amdgpu", one NULL pointer
dereference bug occurs in ras debugfs releasing. The cause is the
duplicated debugfs_remove, as drm debugfs_root dir has been cleaned
up already by drm_minor_unregister.

BUG: kernel NULL pointer dereference, address: 00000000000000a0
PGD 0 P4D 0
Oops: 0002 [#1] SMP PTI
CPU: 11 PID: 1526 Comm: modprobe Tainted: G           OE     5.6.0-guchchen #1
Hardware name: System manufacturer System Product Name/TUF Z370-PLUS GAMING II, BIOS 0411 09/21/2018
RIP: 0010:down_write+0x15/0x40
Code: eb de e8 7e 17 72 ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc 0f 1f 44 00 00 53 48 89 fb e8 92
d8 ff ff 31 c0 ba 01 00 00 00 <f0> 48 0f b1 13 75 0f 65 48 8b 04 25 c0 8b 01 00 48 89 43 08 5b c3
RSP: 0018:ffffb1590386fcd0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 00000000000000a0 RCX: 0000000000000000
RDX: 0000000000000001 RSI: ffffffff85b2fcc2 RDI: 00000000000000a0
RBP: ffffb1590386fd30 R08: ffffffff85b2fcc2 R09: 000000000002b3c0
R10: ffff97a330618c40 R11: 00000000000005f6 R12: ffff97a3481beb40
R13: 00000000000000a0 R14: ffff97a3481beb40 R15: 0000000000000000
FS:  00007fb11a717540(0000) GS:ffff97a376cc0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000000a0 CR3: 00000004066d6006 CR4: 00000000003606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 simple_recursive_removal+0x63/0x370
 ? debugfs_remove+0x60/0x60
 debugfs_remove+0x40/0x60
 amdgpu_ras_fini+0x82/0x230 [amdgpu]
 ? __kernfs_remove.part.17+0x101/0x1f0
 ? kernfs_name_hash+0x12/0x80
 amdgpu_device_fini+0x1c0/0x580 [amdgpu]
 amdgpu_driver_unload_kms+0x3e/0x70 [amdgpu]
 amdgpu_pci_remove+0x36/0x60 [amdgpu]
 pci_device_remove+0x3b/0xb0
 device_release_driver_internal+0xe5/0x1c0
 driver_detach+0x46/0x90
 bus_remove_driver+0x58/0xd0
 pci_unregister_driver+0x29/0x90
 amdgpu_exit+0x11/0x25 [amdgpu]
 __x64_sys_delete_module+0x13d/0x210
 do_syscall_64+0x5f/0x250
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-14 16:22:40 -04:00
Christian König
f1403342eb drm/amdgpu: revert "fix system hang issue during GPU reset"
The whole approach wasn't thought through till the end.

We already had a reset lock like this in the past and it caused the same problems like this one.

Completely revert the patch for now and add individual trylock protection to the hardware access functions as necessary.

This reverts commit df9c8d1aa278c435c30a69b8f2418b4a52fcb929.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-14 16:22:40 -04:00
Evan Quan
9f979a49e2 drm/amd/powerplay: enable swSMU mgpu fan boost support
Enable mgpu fan boost feature on swSMU routines.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-14 16:22:40 -04:00
Evan Quan
f10bb940d8 drm/amd/powerplay: optimize the interface for mgpu fan boost enablement
Cover the implementation details from outside(of power). Also preparing
for expanding this to swSMU.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-14 16:22:40 -04:00
Jiansong Chen
ba4e049e63 drm/amdgpu: disable gfxoff for navy_flounder
gfxoff is temporarily disabled for navy_flounder,
since at present the feature has broken some basic
amdgpu test.

Signed-off-by: Jiansong Chen <Jiansong.Chen@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-14 16:22:40 -04:00
Oak Zeng
9fb1506eb6 drm/amdgpu: Use function pointer for some mmhub functions
Add more function pointers to amdgpu_mmhub_funcs. ASIC specific
implementation of most mmhub functions are called from a general
function pointer, instead of calling different function for
different ASIC. Simplify the code by deleting duplicate functions

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-14 16:22:40 -04:00
Nirmoy Das
2f53072434 drm/amdgpu: pass NULL pointer instead of 0
Fixes: c030f2e4166c3f55 ("drm/amdgpu: add amdgpu_ras.c to support ras (v2)")
Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-14 16:22:40 -04:00
Dennis Li
72e14ebf9f drm/amdgpu: annotate a false positive recursive locking
[  584.110304] ============================================
[  584.110590] WARNING: possible recursive locking detected
[  584.110876] 5.6.0-deli-v5.6-2848-g3f3109b0e75f #1 Tainted: G           OE
[  584.111164] --------------------------------------------
[  584.111456] kworker/38:1/553 is trying to acquire lock:
[  584.111721] ffff9b15ff0a47a0 (&adev->reset_sem){++++}, at: amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu]
[  584.112112]
               but task is already holding lock:
[  584.112673] ffff9b1603d247a0 (&adev->reset_sem){++++}, at: amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu]
[  584.113068]
               other info that might help us debug this:
[  584.113689]  Possible unsafe locking scenario:

[  584.114350]        CPU0
[  584.114685]        ----
[  584.115014]   lock(&adev->reset_sem);
[  584.115349]   lock(&adev->reset_sem);
[  584.115678]
                *** DEADLOCK ***

[  584.116624]  May be due to missing lock nesting notation

[  584.117284] 4 locks held by kworker/38:1/553:
[  584.117616]  #0: ffff9ad635c1d348 ((wq_completion)events){+.+.}, at: process_one_work+0x21f/0x630
[  584.117967]  #1: ffffac708e1c3e58 ((work_completion)(&con->recovery_work)){+.+.}, at: process_one_work+0x21f/0x630
[  584.118358]  #2: ffffffffc1c2a5d0 (&tmp->hive_lock){+.+.}, at: amdgpu_device_gpu_recover+0xae/0x1030 [amdgpu]
[  584.118786]  #3: ffff9b1603d247a0 (&adev->reset_sem){++++}, at: amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu]
[  584.119222]
               stack backtrace:
[  584.119990] CPU: 38 PID: 553 Comm: kworker/38:1 Kdump: loaded Tainted: G           OE     5.6.0-deli-v5.6-2848-g3f3109b0e75f #1
[  584.120782] Hardware name: Supermicro SYS-7049GP-TRT/X11DPG-QT, BIOS 3.1 05/23/2019
[  584.121223] Workqueue: events amdgpu_ras_do_recovery [amdgpu]
[  584.121638] Call Trace:
[  584.122050]  dump_stack+0x98/0xd5
[  584.122499]  __lock_acquire+0x1139/0x16e0
[  584.122931]  ? trace_hardirqs_on+0x3b/0xf0
[  584.123358]  ? cancel_delayed_work+0xa6/0xc0
[  584.123771]  lock_acquire+0xb8/0x1c0
[  584.124197]  ? amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu]
[  584.124599]  down_write+0x49/0x120
[  584.125032]  ? amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu]
[  584.125472]  amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu]
[  584.125910]  ? amdgpu_ras_error_query+0x1b8/0x2a0 [amdgpu]
[  584.126367]  amdgpu_ras_do_recovery+0x159/0x190 [amdgpu]
[  584.126789]  process_one_work+0x29e/0x630
[  584.127208]  worker_thread+0x3c/0x3f0
[  584.127621]  ? __kthread_parkme+0x61/0x90
[  584.128014]  kthread+0x12f/0x150
[  584.128402]  ? process_one_work+0x630/0x630
[  584.128790]  ? kthread_park+0x90/0x90
[  584.129174]  ret_from_fork+0x3a/0x50

Each adev has owned lock_class_key to avoid false positive
recursive locking.

v2:
1. register adev->lock_key into lockdep, otherwise lockdep will
report the below warning

[ 1216.705820] BUG: key ffff890183b647d0 has not been registered!
[ 1216.705924] ------------[ cut here ]------------
[ 1216.705972] DEBUG_LOCKS_WARN_ON(1)
[ 1216.705997] WARNING: CPU: 20 PID: 541 at kernel/locking/lockdep.c:3743 lockdep_init_map+0x150/0x210

v3:
change to use down_write_nest_lock to annotate the false dead-lock
warning.

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-14 16:22:40 -04:00
Wenhui Sheng
a4322e1881 drm/amdgpu: add debugfs interface for RAP test
After amdgpu driver loading successfully, we can use
RAP debugfs interface <debugfs_dir>/dri/xxx/rap_test
to trigger RAP test.

Currently only L0 validate test is supported.

v2: refine amdgpu_rap.h

Signed-off-by: Wenhui Sheng <Wenhui.Sheng@amd.com>
Reviewed-by: Guchun Chen <Guchun.Chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-14 16:22:40 -04:00
Wenhui Sheng
8602692b6f drm/amdgpu: enable RAP TA load
Enable the RAP TA loading path and add RAP test
trigger interface.

v2: fix potential mem leak issue

Signed-off-by: Wenhui Sheng <Wenhui.Sheng@amd.com>
Reviewed-by: Guchun Chen <Guchun.Chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-14 16:22:39 -04:00
Wenhui Sheng
a189d0ae0c drm/amdgpu: add RAP TA header file
The RAP TA contains tests used to verify if
RAP(Register Access Policy), or otherwise known
as Security Policy is applied correctly
by PSP BL&TOS.

The RAP test is a measure to ensure that we reduce
the avenue of complexity and mistakes when dealing
with RAP in post-si execution, where debugging failures
related to RAP is quite difficult and expensive.

v2: add introduction for RAP TA

Signed-off-by: Wenhui Sheng <Wenhui.Sheng@amd.com>
Reviewed-by: Guchun Chen <Guchun.Chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-14 16:22:39 -04:00
Tianci.Yin
425a78f43b drm/amdgpu: reconfigure spm golden settings on Navi1x after GFXOFF exit(v3)
On Navi1x, the SPM golden settings are lost after GFXOFF
enter/exit, so reconfigure the golden settings after GFXOFF
exit.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Tianci.Yin <tianci.yin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-14 16:22:39 -04:00
Tianci.Yin
d58fe3cf11 drm/amdgpu: add interface amdgpu_gfx_init_spm_golden for Navi1x
On Navi1x, the SPM golden settings are lost after GFXOFF
enter/exit, so reconfiguration is needed. Make the
configuration code as an interface for future use.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Tianci.Yin <tianci.yin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-14 16:22:29 -04:00
Guchun Chen
66459e1db2 drm/amdgpu: add debugfs node to toggle ras error cnt harvest
Before ras recovery is issued, user could operate this debugfs
node to enable/disable the harvest of all RAS IPs' ras error
count registers, which will help keep hardware's registers'
status instead of cleaning up them.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-14 16:12:47 -04:00
Guchun Chen
f75e94d868 drm/amdgpu: bypass querying ras error count registers
Once ras recovery is issued by ras sync flood interrupt or
ras controller interrupt, add this guard to bypass or execute
ras error count register harvest of all IPs.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-14 16:12:22 -04:00
Linus Torvalds
ea6ec77437 drm fixes for 5.9-rc1
core:
 - Fix drm_dp_mst_port refcount leaks in drm_dp_mst_allocate_vcpi
 - Remove null check for kfree in drm_dev_release.
 - Fix DRM_FORMAT_MOD_AMLOGIC_FBC definition.
 - re-added docs for drm_gem_flink_ioctl()
 - add orientation quirk for ASUS T103HAF
 
 ttm:
 - ttm: fix page-offset calculation within TTM
 - revert patch causing vmwgfx regressions
 
 fbcon:
 - Fix a fbcon OOB read in fbdev, found by syzbot.
 
 vga:
 - Mark vga_tryget static as it's not used elsewhere.
 
 amdgpu:
 - Re-add spelling typo fix
 - Sienna Cichlid fixes
 - Navy Flounder fixes
 - DC fixes
 - SMU i2c fix
 - Power fixes
 
 vmwgfx:
 - regression fixes for modesetting crashes
 - misc fixes
 
 xlnx:
 - Small fixes to xlnx.
 
 omap:
 - Fix mode initialization in omap_connector_mode_valid().
 - force runtime PM suspend on system suspend
 
 tidss:
 - fix modeset init for DPI panels
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJfM3QrAAoJEAx081l5xIa+4H8P/0YLHHK52yuYqyYEtvfehlDC
 i3ur47QdQS/6fkspVSbVndsToQS+D9ZNAZFUUSumVr1kR40yGC6oGZGmo+UWYH+T
 AiZC1LHmz8hsGMni40uCb+ble1mtcsbqjAkb5hG/9RxPfBiunQ6OJJGU7X2FoB3e
 TZNWwYKyF/Y5f/OlKdRbi9qfszWwWb/+ZW94q9ER1MjlcwrW3EqgRrkRGT3hH4UO
 AvzYhgGlzz7FXwQic3mJw3Sirm6NnbuUv9aRPRBCGm9i/BDSRHFpIcXIuLuMsiOk
 L/WtdYodMDbhB0xRKwDQzf9TIUxSwulrXO0Jfo8CVKknyHha62fz2bMGpf+4ron2
 3WrZmV+5w0Q0qcfbh1EgHkVXmDry+mtFws1dvAtvyNp7GSI675tNI2qH3cgLfvQX
 FgMrfbmOp+f+ohXL7fUw+J/7cUjrVYZxtkCAEctB/6B44INjFdRuwQfP98x5CcmX
 CpZHLJKsMI+Y6r6Jno5foIQJIZJrDiUJLhRE2sN7cWv54+2gmHXbqsLAfMMpB6Mc
 2gMZNIwB+qSVjULR+MHvzC3iZHHTkD3hijkYshb8+s5pICGo+eq2v4S/r31C4Wn1
 DpToox1t/RPY+WPS5dbi95zxg1QDwr/BsExTCKTUB2k+ovoWE6byPvt2Vslh+F/K
 i+LrkziEqKb/BhyGA7z9
 =dr7v
 -----END PGP SIGNATURE-----

Merge tag 'drm-next-2020-08-12' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
 "This has a few vmwgfx regression fixes we hit from the merge window
  (one in TTM), it also has a bunch of amdgpu fixes along with a
  scattering everywhere else.

  core:
   - Fix drm_dp_mst_port refcount leaks in drm_dp_mst_allocate_vcpi
   - Remove null check for kfree in drm_dev_release.
   - Fix DRM_FORMAT_MOD_AMLOGIC_FBC definition.
   - re-added docs for drm_gem_flink_ioctl()
   - add orientation quirk for ASUS T103HAF

  ttm:
   - ttm: fix page-offset calculation within TTM
   - revert patch causing vmwgfx regressions

  fbcon:
   - Fix a fbcon OOB read in fbdev, found by syzbot.

  vga:
   - Mark vga_tryget static as it's not used elsewhere.

  amdgpu:
   - Re-add spelling typo fix
   - Sienna Cichlid fixes
   - Navy Flounder fixes
   - DC fixes
   - SMU i2c fix
   - Power fixes

  vmwgfx:
   - regression fixes for modesetting crashes
   - misc fixes

  xlnx:
   - Small fixes to xlnx.

  omap:
   - Fix mode initialization in omap_connector_mode_valid().
   - force runtime PM suspend on system suspend

  tidss:
   - fix modeset init for DPI panels"

* tag 'drm-next-2020-08-12' of git://anongit.freedesktop.org/drm/drm: (70 commits)
  drm/ttm: revert "drm/ttm: make TT creation purely optional v3"
  drm/vmwgfx: fix spelling mistake "Cant" -> "Can't"
  drm/vmwgfx: fix spelling mistake "Cound" -> "Could"
  drm/vmwgfx/ldu: Use drm_mode_config_reset
  drm/vmwgfx/sou: Use drm_mode_config_reset
  drm/vmwgfx/stdu: Use drm_mode_config_reset
  drm/vmwgfx: Fix two list_for_each loop exit tests
  drm/vmwgfx: Use correct vmw_legacy_display_unit pointer
  drm/vmwgfx: Use struct_size() helper
  drm/amdgpu: Fix bug where DPM is not enabled after hibernate and resume
  drm/amd/powerplay: put VCN/JPEG into PG ungate state before dpm table setup(V3)
  drm/amd/powerplay: update swSMU VCN/JPEG PG logics
  drm/amdgpu: use mode1 reset by default for sienna_cichlid
  drm/amdgpu/smu: rework i2c adpater registration
  drm/amd/display: Display goes blank after inst
  drm/amd/display: Change null plane state swizzle mode to 4kb_s
  drm/amd/display: Use helper function to check for HDMI signal
  drm/amd/display: AMD OUI (DPCD 0x00300) skipped on some sink
  drm/amd/display: Fix logger context
  drm/amd/display: populate new dml variable
  ...
2020-08-12 11:53:01 -07:00
Thomas Zimmermann
534b1f9071 Merge drm/drm-next into drm-misc-next
Backmerging drm-next into drm-misc-next for nouveau and panel updates.
Resolves a conflict between ttm and nouveau, where struct ttm_mem_res got
renamed to struct ttm_resource.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2020-08-12 20:42:08 +02:00
Christian König
b2458726b3 drm/ttm: give resource functions their own [ch] files
This is a separate object we work within TTM.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/384338/?series=80346&rev=1
2020-08-12 15:51:03 +02:00
Christian König
e92ae67d6e drm/ttm: rename ttm_resource_manager_func callbacks
The names get/put are associated with reference counting
in the Linux kernel, use alloc/free instead.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/384340/?series=80346&rev=1
2020-08-12 15:50:51 +02:00
Arunpravin
0cf0ee983b drm/amdgpu: Enable P2P dmabuf over XGMI
Access the exported P2P dmabuf over XGMI, if available.
Otherwise, fall back to the existing PCIe method.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Arunpravin <apaneers@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-11 11:47:35 -04:00
Oleg Vasilev
65bf2cf95d drm/amdgpu: utilize subconnector property for DP through atombios
Since DP-specific information is stored in driver's structures, every
driver needs to implement subconnector property by itself.

v2: rebase

v3: renamed a function call

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: David (ChunMing) Zhou <David1.Zhou@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Signed-off-by: Jeevan B <jeevan.b@intel.com>
Signed-off-by: Oleg Vasilev <oleg.vasilev@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1587732655-17544-4-git-send-email-jeevan.b@intel.com
2020-08-11 14:19:41 +02:00
Dave Airlie
16e6eea29d Merge tag 'amd-drm-fixes-5.9-2020-08-07' of git://people.freedesktop.org/~agd5f/linux into drm-next
amd-drm-fixes-5.9-2020-08-07:

amdgpu:
- Re-add spelling typo fix
- Sienna Cichlid fixes
- Navy Flounder fixes
- DC fixes
- SMU i2c fix
- Power fixes

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200807222843.3909-1-alexander.deucher@amd.com
2020-08-11 13:08:45 +10:00
Linus Torvalds
97d052ea3f A set of locking fixes and updates:
- Untangle the header spaghetti which causes build failures in various
     situations caused by the lockdep additions to seqcount to validate that
     the write side critical sections are non-preemptible.
 
   - The seqcount associated lock debug addons which were blocked by the
     above fallout.
 
     seqcount writers contrary to seqlock writers must be externally
     serialized, which usually happens via locking - except for strict per
     CPU seqcounts. As the lock is not part of the seqcount, lockdep cannot
     validate that the lock is held.
 
     This new debug mechanism adds the concept of associated locks.
     sequence count has now lock type variants and corresponding
     initializers which take a pointer to the associated lock used for
     writer serialization. If lockdep is enabled the pointer is stored and
     write_seqcount_begin() has a lockdep assertion to validate that the
     lock is held.
 
     Aside of the type and the initializer no other code changes are
     required at the seqcount usage sites. The rest of the seqcount API is
     unchanged and determines the type at compile time with the help of
     _Generic which is possible now that the minimal GCC version has been
     moved up.
 
     Adding this lockdep coverage unearthed a handful of seqcount bugs which
     have been addressed already independent of this.
 
     While generaly useful this comes with a Trojan Horse twist: On RT
     kernels the write side critical section can become preemtible if the
     writers are serialized by an associated lock, which leads to the well
     known reader preempts writer livelock. RT prevents this by storing the
     associated lock pointer independent of lockdep in the seqcount and
     changing the reader side to block on the lock when a reader detects
     that a writer is in the write side critical section.
 
  - Conversion of seqcount usage sites to associated types and initializers.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl8xmPYTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoTuQEACyzQCjU8PgehPp9oMqWzaX2fcVyuZO
 QU2yw6gmz2oTz3ZHUNwdW8UnzGh2OWosK3kDruoD9FtSS51lER1/ISfSPCGfyqxC
 KTjOcB1Kvxwq/3LcCx7Zi3ZxWApat74qs3EhYhKtEiQ2Y9xv9rLq8VV1UWAwyxq0
 eHpjlIJ6b6rbt+ARslaB7drnccOsdK+W/roNj4kfyt+gezjBfojGRdMGQNMFcpnv
 shuTC+vYurAVIiVA/0IuizgHfwZiXOtVpjVoEWaxg6bBH6HNuYMYzdSa/YrlDkZs
 n/aBI/Xkvx+Eacu8b1Zwmbzs5EnikUK/2dMqbzXKUZK61eV4hX5c2xrnr1yGWKTs
 F/juh69Squ7X6VZyKVgJ9RIccVueqwR2EprXWgH3+RMice5kjnXH4zURp0GHALxa
 DFPfB6fawcH3Ps87kcRFvjgm6FBo0hJ1AxmsW1dY4ACFB9azFa2euW+AARDzHOy2
 VRsUdhL9CGwtPjXcZ/9Rhej6fZLGBXKr8uq5QiMuvttp4b6+j9FEfBgD4S6h8csl
 AT2c2I9LcbWqyUM9P4S7zY/YgOZw88vHRuDH7tEBdIeoiHfrbSBU7EQ9jlAKq/59
 f+Htu2Io281c005g7DEeuCYvpzSYnJnAitj5Lmp/kzk2Wn3utY1uIAVszqwf95Ul
 81ppn2KlvzUK8g==
 =7Gj+
 -----END PGP SIGNATURE-----

Merge tag 'locking-urgent-2020-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking updates from Thomas Gleixner:
 "A set of locking fixes and updates:

   - Untangle the header spaghetti which causes build failures in
     various situations caused by the lockdep additions to seqcount to
     validate that the write side critical sections are non-preemptible.

   - The seqcount associated lock debug addons which were blocked by the
     above fallout.

     seqcount writers contrary to seqlock writers must be externally
     serialized, which usually happens via locking - except for strict
     per CPU seqcounts. As the lock is not part of the seqcount, lockdep
     cannot validate that the lock is held.

     This new debug mechanism adds the concept of associated locks.
     sequence count has now lock type variants and corresponding
     initializers which take a pointer to the associated lock used for
     writer serialization. If lockdep is enabled the pointer is stored
     and write_seqcount_begin() has a lockdep assertion to validate that
     the lock is held.

     Aside of the type and the initializer no other code changes are
     required at the seqcount usage sites. The rest of the seqcount API
     is unchanged and determines the type at compile time with the help
     of _Generic which is possible now that the minimal GCC version has
     been moved up.

     Adding this lockdep coverage unearthed a handful of seqcount bugs
     which have been addressed already independent of this.

     While generally useful this comes with a Trojan Horse twist: On RT
     kernels the write side critical section can become preemtible if
     the writers are serialized by an associated lock, which leads to
     the well known reader preempts writer livelock. RT prevents this by
     storing the associated lock pointer independent of lockdep in the
     seqcount and changing the reader side to block on the lock when a
     reader detects that a writer is in the write side critical section.

   - Conversion of seqcount usage sites to associated types and
     initializers"

* tag 'locking-urgent-2020-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
  locking/seqlock, headers: Untangle the spaghetti monster
  locking, arch/ia64: Reduce <asm/smp.h> header dependencies by moving XTP bits into the new <asm/xtp.h> header
  x86/headers: Remove APIC headers from <asm/smp.h>
  seqcount: More consistent seqprop names
  seqcount: Compress SEQCNT_LOCKNAME_ZERO()
  seqlock: Fold seqcount_LOCKNAME_init() definition
  seqlock: Fold seqcount_LOCKNAME_t definition
  seqlock: s/__SEQ_LOCKDEP/__SEQ_LOCK/g
  hrtimer: Use sequence counter with associated raw spinlock
  kvm/eventfd: Use sequence counter with associated spinlock
  userfaultfd: Use sequence counter with associated spinlock
  NFSv4: Use sequence counter with associated spinlock
  iocost: Use sequence counter with associated spinlock
  raid5: Use sequence counter with associated spinlock
  vfs: Use sequence counter with associated spinlock
  timekeeping: Use sequence counter with associated raw spinlock
  xfrm: policy: Use sequence counters with associated lock
  netfilter: nft_set_rbtree: Use sequence counter with associated rwlock
  netfilter: conntrack: Use sequence counter with associated spinlock
  sched: tasks: Use sequence counter with associated spinlock
  ...
2020-08-10 19:07:44 -07:00
Dave Airlie
c44264f9f7 Linux 5.8
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl8nLmkeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGBEkH/RJAnEan4gcdkBDf
 2xS0yk4XLMjKZwbz61VSeKMUBkGCsh1cWsaAtJJAIVYj/6o/7mld01sZCnOASLnJ
 ET0nXL2NiT/f+prYkTE5qeYH225/Yfh5jgmrqZtx/uXFCwgE5Nzi3f72IXQDmCR+
 kmpNhNox3YqQTKXhv7DXobDKcO0n8nZavnhxmA9SBZn2h9RHvmvJghD0UOfLjMpA
 1SbknaE67n5JN/JjI6TkYWk4nuJmqfvmBL5IYVDEZYO4UlM5Bqzhw0XN7Ax70K3M
 KRK/eiqRmNwun5MxWnbzQU7t7iTgVmzjHLTpWGcM3V4blgGXC3uhjc+p/R8KTQUE
 bIydSzs=
 =fDeo
 -----END PGP SIGNATURE-----

Merge tag 'v5.8' into drm-next

I need to backmerge 5.8 as I've got a bunch of fixes sitting
on an rc7 base that I want to land.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2020-08-11 11:58:31 +10:00
shiwu.zhang
97a9b60fa3 drm/amdgpu: update gc golden register for arcturus
Update golden setting to improve performance on HPC
and ML apps

Signed-off-by: shiwu.zhang <shiwu.zhang@amd.com>
Tested-by: gang.long <gang.long@amd.com>
Reviewed-by: guchun.chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-10 18:08:21 -04:00
Liu ChengZhe
d5bbb4761c drm/amdgpu: Skip some registers config for SRIOV
Some registers are not accessible to virtual function setup, so
skip their initialization when in VF-SRIOV mode.

v2: move SRIOV VF check into specify functions;
modify commit description and comment.

Signed-off-by: Liu ChengZhe <ChengZhe.Liu@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-10 18:06:35 -04:00
Christophe JAILLET
78484d7c74 drm: amdgpu: Use the correct size when allocating memory
When '*sgt' is allocated, we must allocated 'sizeof(**sgt)' bytes instead
of 'sizeof(*sg)'.

The sizeof(*sg) is bigger than sizeof(**sgt) so this wastes memory but
it won't lead to corruption.

Fixes: f44ffd677fb3 ("drm/amdgpu: add support for exporting VRAM using DMA-buf v3")
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2020-08-10 18:06:10 -04:00
Monk Liu
bcca629806 drm/amdgpu: fix reload KMD hang on GFX10 KIQ
GFX10 KIQ will hang if we try below steps:
modprobe amdgpu
rmmod amdgpu
modprobe amdgpu sched_hw_submission=4

Due to KIQ is always living there even after KMD unloaded
thus when doing the realod KIQ will crash upon its register
being programed by different values with the previous loading
(the config like HQD addr, ring size, is easily changed if we alter
the sched_hw_submission)

the fix is we must inactive KIQ first before touching any
of its registgers

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-10 17:26:52 -04:00
shiwu.zhang
5a58abf5ed drm/amdgpu: update gc golden register for arcturus
Update golden setting to improve performance on HPC
and ML apps

Signed-off-by: shiwu.zhang <shiwu.zhang@amd.com>
Tested-by: gang.long <gang.long@amd.com>
Reviewed-by: guchun.chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-10 17:26:52 -04:00
Liu ChengZhe
1d44732619 drm/amdgpu: Skip some registers config for SRIOV
Some registers are not accessible to virtual function setup, so
skip their initialization when in VF-SRIOV mode.

v2: move SRIOV VF check into specify functions;
modify commit description and comment.

Signed-off-by: Liu ChengZhe <ChengZhe.Liu@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-10 17:26:52 -04:00