9532 Commits

Author SHA1 Message Date
Hawking Zhang
6e36f23193 drm/amdgpu: split nbio callbacks into ras and non-ras ones
nbio ras is not managed by gpu driver when gpu is
connected to cpu through xgmi. split nbio callbacks
into ras and non-ras ones so gpu driver only
initializes nbio ras callbacks when it manages
nbio ras.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:51:04 -04:00
Hawking Zhang
87da0cc101 drm/amdgpu: implement query_ras_error_address callback
query_ras_error_address will be invoked to query bad
page address when there is poison data in HBM consumed
by GPU engines.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:51:01 -04:00
Hawking Zhang
878b9e944c drm/amdgpu: implement umc query error count callback
umc query_ras_error_count will be invoked to query
umc correctable and uncorrectable error. It will
reset the umc ras error counter after the query.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:50:58 -04:00
Hawking Zhang
3f903560d1 drm/amdgpu: add helper funtion to query umc ras error
Add helper functions to query correctable and
uncorrectable umc ras error.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:50:56 -04:00
Hawking Zhang
1696bf3589 drm/amdgpu: create umc_v6_7_funcs for aldebaran
umc_v6_7_funcs are callbacks to support umc ras
functionalities in aldebaran

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:50:52 -04:00
Hawking Zhang
75f06251c9 drm/amdgpu: initialze ras caps per paltform config
Driver only manages GFX/SDMA/MMHUB RAS in platforms
that gpu node is connected to cpu through XGMI, other
than that, it queries VBIOS for RAS capabilities.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:50:48 -04:00
Alex Deucher
c108aef148 drm/amdgpu: drop some unused atombios functions
These were leftover from the old CI dpm code which was
retired a while ago.

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:50:34 -04:00
Bernard Zhao
f08726868c drm/amd: cleanup coding style a bit
Fix patch check warning:
WARNING: suspect code indent for conditional statements (8, 17)
+       if (obj && obj->use < 0) {
+                DRM_ERROR("RAS ERROR: Unbalance obj(%s) use\n", obj->head.name);

WARNING: braces {} are not necessary for single statement blocks
+       if (obj && obj->use < 0) {
+                DRM_ERROR("RAS ERROR: Unbalance obj(%s) use\n", obj->head.name);
+       }

Signed-off-by: Bernard Zhao <bernard@vivo.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:50:31 -04:00
Stanley.Yang
5a43452704 drm/amdgpu: support sdma error injection
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reivewed-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:50:23 -04:00
Philip Yang
2b665c3735 drm/amdgpu: reserve fence slot to update page table
Forgot to reserve a fence slot to use sdma to update page table, cause
below kernel BUG backtrace to handle vm retry fault while application is
exiting.

[  133.048143] kernel BUG at /home/yangp/git/compute_staging/kernel/drivers/dma-buf/dma-resv.c:281!
[  133.048487] Workqueue: events amdgpu_irq_handle_ih1 [amdgpu]
[  133.048506] RIP: 0010:dma_resv_add_shared_fence+0x204/0x280
[  133.048672]  amdgpu_vm_sdma_commit+0x134/0x220 [amdgpu]
[  133.048788]  amdgpu_vm_bo_update_range+0x220/0x250 [amdgpu]
[  133.048905]  amdgpu_vm_handle_fault+0x202/0x370 [amdgpu]
[  133.049031]  gmc_v9_0_process_interrupt+0x1ab/0x310 [amdgpu]
[  133.049165]  ? kgd2kfd_interrupt+0x9a/0x180 [amdgpu]
[  133.049289]  ? amdgpu_irq_dispatch+0xb6/0x240 [amdgpu]
[  133.049408]  amdgpu_irq_dispatch+0xb6/0x240 [amdgpu]
[  133.049534]  amdgpu_ih_process+0x9b/0x1c0 [amdgpu]
[  133.049657]  amdgpu_irq_handle_ih1+0x21/0x60 [amdgpu]
[  133.049669]  process_one_work+0x29f/0x640
[  133.049678]  worker_thread+0x39/0x3f0
[  133.049685]  ? process_one_work+0x640/0x640

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:50:20 -04:00
Peng Ju Zhou
5e025531b7 drm/amdgpu: indirect register access for nv12 sriov
1. expand rlcg interface for gc & mmhub indirect access
2. add rlcg interface for no kiq

v2: squash in fix for gfx9 (Changfeng)

Signed-off-by: Peng Ju Zhou <PengJu.Zhou@amd.com>
Reviewed-by: Emily.Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:50:17 -04:00
Peng Ju Zhou
5d23851029 drm/amdgpu: indirect register access for nv12 sriov
using the control bits got from host to control registers access.

Signed-off-by: Peng Ju Zhou <PengJu.Zhou@amd.com>
Reviewed-by: Emily.Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:50:13 -04:00
Peng Ju Zhou
77eabc6f59 drm/amdgpu: indirect register access for nv12 sriov
get pf2vf msg info at it's earliest time so that
guest driver can use these info to decide whether
register indirect access enabled.

Signed-off-by: Peng Ju Zhou <PengJu.Zhou@amd.com>
Reviewed-by: Emily.Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:50:09 -04:00
Peng Ju Zhou
8b8a162da8 drm/amdgpu: indirect register access for nv12 sriov
unify host driver and guest driver indirect access
control bits names

Signed-off-by: Peng Ju Zhou <PengJu.Zhou@amd.com>
Reviewed-by: Emily.Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:50:06 -04:00
Xℹ Ruoyao
9a89a721b4 drm/amdgpu: check alignment on CPU page for bo map
The page table of AMDGPU requires an alignment to CPU page so we should
check ioctl parameters for it.  Return -EINVAL if some parameter is
unaligned to CPU page, instead of corrupt the page table sliently.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Xi Ruoyao <xry111@mengyan1223.wang>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:50:00 -04:00
Huacai Chen
f4d3da72a7 drm/amdgpu: Set a suitable dev_info.gart_page_size
In Mesa, dev_info.gart_page_size is used for alignment and it was
set to AMDGPU_GPU_PAGE_SIZE(4KB). However, the page table of AMDGPU
driver requires an alignment on CPU pages.  So, for non-4KB page system,
gart_page_size should be max_t(u32, PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE).

Signed-off-by: Rui Wang <wangr@lemote.com>
Signed-off-by: Huacai Chen <chenhc@lemote.com>
Link: https://github.com/loongson-community/linux-stable/commit/caa9c0a1
[Xi: rebased for drm-next, use max_t for checkpatch,
     and reworded commit message.]
Signed-off-by: Xi Ruoyao <xry111@mengyan1223.wang>
BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1549
Tested-by: Dan Horák <dan@danny.cz>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:49:53 -04:00
Guchun Chen
9973de10b5 drm/amdgpu: fix compiler warning(v2)
warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
  int write = !(gtt->userflags & AMDGPU_GEM_USERPTR_READONLY);

v2: put short variable declaration last

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:49:42 -04:00
Guchun Chen
3c3dc65433 drm/amdgpu: fix NULL pointer dereference
ttm->sg needs to be checked before accessing its child member.

Call Trace:
 amdgpu_ttm_backend_destroy+0x12/0x70 [amdgpu]
 ttm_bo_cleanup_memtype_use+0x3a/0x60 [ttm]
 ttm_bo_release+0x17d/0x300 [ttm]
 amdgpu_bo_unref+0x1a/0x30 [amdgpu]
 amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu+0x78b/0x8b0 [amdgpu]
 kfd_ioctl_alloc_memory_of_gpu+0x118/0x220 [amdgpu]
 kfd_ioctl+0x222/0x400 [amdgpu]
 ? kfd_dev_is_large_bar+0x90/0x90 [amdgpu]
 __x64_sys_ioctl+0x8e/0xd0
 ? __context_tracking_exit+0x52/0x90
 do_syscall_64+0x33/0x80
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f97f264d317
Code: b3 66 90 48 8b 05 71 4b 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 41 4b 2d 00 f7 d8 64 89 01 48
RSP: 002b:00007ffdb402c338 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007f97f3cc63a0 RCX: 00007f97f264d317
RDX: 00007ffdb402c380 RSI: 00000000c0284b16 RDI: 0000000000000003
RBP: 00007ffdb402c380 R08: 00007ffdb402c428 R09: 00000000c4000004
R10: 00000000c4000004 R11: 0000000000000246 R12: 00000000c0284b16
R13: 0000000000000003 R14: 00007f97f3cc63a0 R15: 00007f8836200000

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:49:38 -04:00
Rohit Khaire
4d675e1eb8 drm/amdgpu: Add new PF2VF flags for VF register access method
Add 3 sub flags to notify guest for indirect reg access of
gc, mmhub and ih

The host sets these flags depending on L1 RAP version,
asic and other scenarios. These flags ensure that
there is compatibility between different guest/host/vbios versions.

Signed-off-by: Rohit Khaire <rohit.khaire@amd.com>
Reviewed-by: Monk Liu <monk.liu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:49:22 -04:00
Feifei Xu
0698b13403 drm/amdgpu: skip PP_MP1_STATE_UNLOAD on aldebaran
This message is not needed on Aldebaran.

Signed-off-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:47:50 -04:00
Chengming Gui
4a7ffbdb27 drm/amd/amdgpu: set MP1 state to UNLOAD before reload its FW for vega20/ALDEBARAN
When resume from gpu reset, need set MP1 state to UNLOAD before reload SMU
FW otherwise will cause following errors:
[  121.642772] [drm] reserve 0x400000 from 0x87fec00000 for PSP TMR [  123.801051] [drm] failed to load ucode id (24) [  123.801055] [drm] psp command (0x6) failed and response status is (0x0) [  123.801214] [drm:psp_load_smu_fw [amdgpu]] *ERROR* PSP load smu failed!
[  123.801398] [drm:psp_resume [amdgpu]] *ERROR* PSP resume failed [  123.801536] [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* resume of IP block <psp> failed -22 [  123.801632] amdgpu 0000:04:00.0: amdgpu: GPU reset(9) failed [  123.801691] amdgpu 0000:07:00.0: amdgpu: GPU reset(9) failed [  123.802899] amdgpu 0000:04:00.0: amdgpu: GPU reset end with ret = -22

v2: add error info and including ALDEBARAN also

Signed-off-by: Chengming Gui <Jack.Gui@amd.com>
Reviewed-and-tested-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:47:43 -04:00
Lijo Lazar
404b277bbe drm/amdgpu: Reset error code for 'no handler' case
If reset handler is not implemented, reset error before proceeding.

Fixes issue with the following trace -
[  106.508592] amdgpu 0000:b1:00.0: amdgpu: ASIC reset failed with error, -38 for drm dev, 0000:b1:00.0
[  106.508972] amdgpu 0000:b1:00.0: amdgpu: GPU reset succeeded, trying to resume
[  106.509116] [drm] PCIE GART of 512M enabled.
[  106.509120] [drm] PTB located at 0x0000008000000000
[  106.509136] [drm] VRAM is lost due to GPU reset!
[  106.509332] [drm] PSP is resuming...

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-and-tested-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:47:37 -04:00
Alex Sierra
03e70a0271 drm/amdgpu: ih reroute for newer asics than vega20
Starting Arcturus, it supports ih reroute through mmio directly
in bare metal environment. This is also valid for newer asics
such as Aldebaran.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:47:11 -04:00
Nirmoy Das
84e070f58a drm/amdgpu: fix offset calculation in amdgpu_vm_bo_clear_mappings()
Offset calculation wasn't correct as start addresses are in pfn
not in bytes.

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:47:02 -04:00
Evan Quan
2d64d23e95 drm/amd/pm: unify the interface for gfx state setting
No need to have special handling for swSMU supported ASICs.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:46:51 -04:00
Evan Quan
2e4b2f7b57 drm/amd/pm: unify the interface for loading SMU microcode
No need to have special handling for swSMU supported ASICs.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:46:38 -04:00
Lijo Lazar
928a0fe688 drm/amdgpu: Fix build warnings
Fix header guard and make internal functions static. Fixes the below warnings:

drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgpu_reset.h:24:9: warning: '__AMDUGPU_RESET_H__' is used as a header guard here, followed by #define of a different macro [-Wheader-guard]
drivers/gpu/drm/amd/amdgpu/aldebaran.c:110:6: warning: no previous prototype for function 'aldebaran_async_reset' [-Wmissing-prototypes]
drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu13/aldebaran_ppt.c:1435:5: warning: no previous prototype for function 'aldebaran_mode2_reset' [-Wmissing-prototypes]

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:46:32 -04:00
Lijo Lazar
ea4e96a7b3 drm/amdgpu: Enable recovery on aldebaran
Add aldebaran to devices which support recovery

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:46:29 -04:00
Lijo Lazar
142600e854 drm/amdgpu: Add mode2 reset support for aldebaran
v1: Aldebaran uses reset control to support mode2 reset. The sequences to
reset and restore hardware context are specific to a particular
configuration.

v2: Clear bus mastering before reset.
Fix coding style issues, drop unwanted variables and info log.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:46:26 -04:00
Lijo Lazar
5d89bb2d2f drm/amdgpu: Make set PG/CG state functions public
Expose PG/CG set states functions for other clients

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:46:22 -04:00
Lijo Lazar
a2052839cd drm/amdgpu: Add PSP public function to load a list of FWs
v1: Adds a function to load a list of FWs as passed by the caller. This is
needed as only a select need to loaded for some use cases.

v2: Omit unrelated change, remove info log, fix return value when count is 0

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:46:18 -04:00
Lijo Lazar
04442bf70d drm/amdgpu: Add reset control handling to reset workflow
This prefers reset control based handling if it's implemented
for a particular ASIC. If not, it takes the legacy path. It uses
the legacy method of preparing environment (job, scheduler tasks)
and restoring environment.

v2: remove unused variable (Alex)

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:46:14 -04:00
Lijo Lazar
e071dce38f drm/amdgpu: Add reset control to amdgpu_device
v1: Add generic amdgpu_reset_control to handle different types of resets. It
may be added at device, hive or ip level. Each reset control has a list
of handlers associated with it to handle different types of reset. Reset
control is responsible for choosing the right handler given a particular
reset context.

Handler objects may implement a set of functions on how to handle a
particular type of reset.

prepare_env = Prepare environment/software context (not used currently).
prepare_hwcontext = Prepare hardware context for the reset.
perform_reset = Perform the type of reset.
restore_hwcontext = Restore the hw context after reset.
restore_env = Restore the environment after reset (not used currently).

Reset context carries the context of reset, as of now this is based on
the parameters used for current set of resets.

v2: Fix coding style

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:46:08 -04:00
Jack Zhang
e6c6338f39 drm/amd/amdgpu implement tdr advanced mode
[Why]
Previous tdr design treats the first job in job_timeout as the bad job.
But sometimes a later bad compute job can block a good gfx job and
cause an unexpected gfx job timeout because gfx and compute ring share
internal GC HW mutually.

[How]
This patch implements an advanced tdr mode.It involves an additinal
synchronous pre-resubmit step(Step0 Resubmit) before normal resubmit
step in order to find the real bad job.

1. At Step0 Resubmit stage, it synchronously submits and pends for the
first job being signaled. If it gets timeout, we identify it as guilty
and do hw reset. After that, we would do the normal resubmit step to
resubmit left jobs.

2. For whole gpu reset(vram lost), do resubmit as the old way.

v2: squash in build fix (Alex)

Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:45:45 -04:00
Nirmoy Das
030bb4addb drm/amdgpu: make BO type check less restrictive
BO with ttm_bo_type_sg type can also have tiling_flag and metadata.
So so BO type check for only ttm_bo_type_kernel.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reported-by: Tom StDenis <Tom.StDenis@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:45:37 -04:00
Nirmoy Das
cc1bcf85b0 drm/amdgpu: use amdgpu_bo_user bo for metadata and tiling flag
Tiling flag and metadata are only needed for BOs created by
amdgpu_gem_object_create(), so we can remove those from the
base class.

v2: * squash tiling_flags and metadata relared patches into one
    * use BUG_ON for non ttm_bo_type_device type when accessing
    tiling_flags and metadata._
v3: *include to_amdgpu_bo_user

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:45:31 -04:00
Nirmoy Das
22b40f7a3a drm/amdgpu: use amdgpu_bo_create_user() for when possible
Use amdgpu_bo_create_user() for all the BO allocations for
ttm_bo_type_device type.

v2: include amdgpu_amdkfd_alloc_gws() as well it calls amdgpu_bo_create()
    for  ttm_bo_type_device

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:45:28 -04:00
Nirmoy Das
9ad0d033ed drm/amdgpu: introduce struct amdgpu_bo_user
Implement a new struct amdgpu_bo_user as subclass of
struct amdgpu_bo and a function to created amdgpu_bo_user
bo with a flag to identify the owner.

v2: amdgpu_bo_to_amdgpu_bo_user -> to_amdgpu_bo_user()

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:45:19 -04:00
Nirmoy Das
9fd5543e95 drm/amdgpu: allow variable BO struct creation
Allow allocating BO structures with different structure size
than struct amdgpu_bo.

v2: Check bo_ptr_size in all amdgpu_bo_create() caller.

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:45:12 -04:00
Christian König
87cc7f9ebf drm/amdgpu: load balance VCN3 decode as well v8
Add VCN3 IB parsing to figure out to which instance we can send the
stream for decode.

v2: remove VCN instance limit as well, fix amdgpu_cs_find_mapping,
    check supported formats instead of unsupported.
v3: fix typo and error handling
v4: make sure the message BO is CPU accessible
v5: fix addr calculation once more
v6: only check message buffers
v7: fix constant and use defines
v8: fix create msg calculation

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Sonny Jiang <sonny.jiang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:45:07 -04:00
Christian König
c62dfdbbf7 drm/amdgpu: share scheduler score on VCN3 instances
The VCN3 instances can do both decode as well as encode.

Share the scheduler load balancing score and remove fixing encode to
only the second instance.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-and-Tested-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:45:03 -04:00
Christian König
c107171b8d drm/amdgpu: add the sched_score to amdgpu_ring_init
Allow separate ring to share the same scheduler score.

No functional change.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-and-Tested-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:44:56 -04:00
Bhaskar Chowdhury
63a93023ee drm/amd/amdgpu/gfx_v7_0: Trivial typo fixes
s/acccess/access/
s/inferface/interface/
s/sequnce/sequence/  .....two different places.
s/retrive/retrieve/
s/sheduling/scheduling/
s/independant/independent/
s/wether/whether/ ......two different places.
s/emmit/emit/
s/synce/sync/

Reviewed-by: Nirmoy Das<nirmoy.das@amd.com>
Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:44:29 -04:00
Arnd Bergmann
9e76e7b206 amdgpu: securedisplay: simplify i2c hexdump output
A previous fix I did left a rather complicated loop in
amdgpu_securedisplay_debugfs_write() for what could be expressed in a
simple sprintf, as Rasmus pointed out.

This drops the leading 0x for each byte, but is otherwise
much nicer.

Suggested-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:44:08 -04:00
Mark Yacoub
f4a9be998c drm/amdgpu: Ensure that the modifier requested is supported by plane.
On initializing the framebuffer, call drm_any_plane_has_format to do a
check if the modifier is supported. drm_any_plane_has_format calls
dm_plane_format_mod_supported which is extended to validate that the
modifier is on the list of the plane's supported modifiers.

The bug was caught using igt-gpu-tools test: kms_addfb_basic.addfb25-bad-modifier

Tested on ChromeOS Zork by turning on the display, running an overlay
test, and running a YT video.

=== Changes from v1 ===
Explicitly handle DRM_FORMAT_MOD_INVALID modifier.

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Mark Yacoub <markyacoub@chromium.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:43:54 -04:00
Tian Tao
36000c7a51 drm/amdgpu: Convert sysfs sprintf/snprintf family to sysfs_emit
Fix the following coccicheck warning:
drivers/gpu//drm/amd/amdgpu/amdgpu_ras.c:434:9-17: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_xgmi.c:220:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_xgmi.c:249:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/df_v3_6.c:208:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_psp.c:2973:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_vram_mgr.c:75:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_vram_mgr.c:112:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_vram_mgr.c:58:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_vram_mgr.c:93:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_vram_mgr.c:125:9-17: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_gtt_mgr.c:52:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_gtt_mgr.c:71:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_device.c:140:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_device.c:164:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_device.c:186:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_device.c:208:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_atombios.c:1916:8-16: WARNING:
use scnprintf or sprintf

Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:43:47 -04:00
Christian König
266b2d25e3 drm/amdgpu: remove irq_src->data handling
That is unused for quite some time now.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:43:28 -04:00
Luben Tuikov
084e2640e5 drm/amdgpu: Fix check for RAS support
Use positive logic to check for RAS
support. Rename the function to actually indicate
what it is testing for. Essentially, make the
function a predicate with the correct name.

Cc: Stanley Yang <Stanley.Yang@amd.com>
Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:43:18 -04:00
Philip Cox
9b7f1e0467 drm/amdgpu: Set amdgpu.noretry=1 for Arcturus
Setting amdgpu.noretry=1 as default for Arcturus.

Signed-off-by: Philip Cox <Philip.Cox@amd.com>
Reviewed-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:43:10 -04:00
John Clements
cad7b7510c drm/amdgpu: added support for dynamic GECC
updated host to send boot config to psp to enable GECC for sienna cichlid

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:43:06 -04:00