29 Commits

Author SHA1 Message Date
Lee Jones
5f7d8ee71e drm/amd/amdgpu/mmhub_v9_4: Fix naming disparity with 'mmhub_v9_4_set_fault_enable_default()'
Fixes the following W=1 kernel build warning(s):

 drivers/gpu/drm/amd/amdgpu/mmhub_v9_4.c:446: warning: expecting prototype for mmhub_v1_0_set_fault_enable_default(). Prototype was for mmhub_v9_4_set_fault_enable_default() instead

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-21 10:32:16 -04:00
Hawking Zhang
53ee6609b4 drm/amdgpu: only harvest gcea/mmea error status in arcturus
SDP RdRspStatus/WrRspStatus or first parity error on
RdRsp data can cause system fatal error in arcturus.
GPU will be freezed in such case.

Driver needs to harvest these error information before
reset the GPU. Check error type to avoid harvest normal
gcea/mmea information.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Stanley Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-20 21:35:45 -04:00
Oak Zeng
0ca565ab97 drm/amdgpu: Calling address translation functions to simplify codes
Use amdgpu_gmc_vram_pa and amdgpu_gmc_vram_cpu_pa
to simplify codes. No logic change.

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-15 16:03:01 -04:00
Hawking Zhang
8bc7b360ad drm/amdgpu: split mmhub callbacks into ras and non-ras ones
mmhub ras is only avaiable in cerntain mmhub ip
generation.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09 16:51:19 -04:00
Nirmoy Das
68fce5f07c drm/amdgpu: use AMDGPU_NUM_VMID when possible
Replace hardcoded vmid number with AMDGPU_NUM_VMID macro.

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-12-08 23:05:40 -05:00
Alex Deucher
9b498efae2 drm/amdgpu: store noretry parameter per driver instance
This will allow us to have different defaults per asic
in a future patch.

Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-09-25 16:55:16 -04:00
Stanley.Yang
3f975d0f71 drm/amdgpu: update athub interrupt harvesting handle
GCEA/MMHUB EA error should not result to DF freeze, this is
fixed in next generation, but for some reasons the GCEA/MMHUB
EA error will result to DF freeze in previous generation,
diver should avoid to indicate GCEA/MMHUB EA error as hw fatal
error in kernel message by read GCEA/MMHUB err status registers.

Changed from V1:
    make query_ras_error_status function more general
    make read mmhub er status register more friendly

Changed from V2:
    move ras error status query function into do_recovery workqueue

Changed from V3:
    remove useless code from V2, print GCEA error status
    instance number

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-09-22 17:37:38 -04:00
Oak Zeng
9fb1506eb6 drm/amdgpu: Use function pointer for some mmhub functions
Add more function pointers to amdgpu_mmhub_funcs. ASIC specific
implementation of most mmhub functions are called from a general
function pointer, instead of calling different function for
different ASIC. Simplify the code by deleting duplicate functions

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-14 16:22:40 -04:00
Huang Rui
714ec7a2bd drm/amdgpu: use register distance member instead of hardcode in mmhub v9.4
This patch updates to use register distance member instead of hardcode
in mmhub v9.4.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Tested-by: AnZhong Huang <anzhong.huang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-07-08 09:04:08 -04:00
Huang Rui
1f9d56c309 drm/amdgpu: add register distance members into vmhub structure
This patch is to abstract register distances between two continuous
context domains and invalidation engines. In different ip headers, these
distances may be differences.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Tested-by: AnZhong Huang <anzhong.huang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-07-08 09:03:00 -04:00
John Clements
2b961e6a95 drm/amdgpu: update RAS related dmesg print
prefix RAS error related dmesg print with pci device info

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-07 14:02:36 -04:00
Hawking Zhang
fe5211f19a drm/amdgpu: add reset_ras_error_count function for MMHUB
MMHUB ras error counters are dirty ones after cold reboot
Read operation is needed to reset them to 0

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-05 00:32:40 -05:00
Nirmoy Das
a9d4fe2fd6 drm/amdgpu: remove unnecessary conversion to bool
Better clean that up before some automation starts to complain about it

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-22 16:55:27 -05:00
Dennis Li
39aa0ef163 drm/amdgpu: enable RAS feature for more mmhub sub-blocks of Acrturus
Compared with Vg20, the size of mmhub range is changed from 2 to 8.

Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-22 16:35:56 -05:00
Zhigang Luo
20bf2f6fef drm/amd/amdgpu: L1 Policy(1/5) - removed VM settings for mmhub and gfxhub from VF
Signed-off-by: Zhigang Luo <zhigang.luo@amd.com>
Signed-off-by: Jane Jian <jane.jian@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-07 12:00:17 -05:00
Frank.Min
55d62fe10f drm/amdgpu: remove FB location config for sriov
FB location is already programmed by HV driver
for arcutus so remove this part

Signed-off-by: Frank.Min <Frank.Min@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-12-23 14:59:43 -05:00
Yong Zhao
6dcab16b41 drm/amdkfd: Contain MMHUB number in mmhub_v9_4_setup_vm_pt_regs()
Adjust the exposed function prototype so that the caller does not need
to know the MMHUB number.

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-12-03 11:08:24 -05:00
Oak Zeng
3d3f9ba8c4 drm/amdgpu: Apply noretry setting for mmhub9.4
Config the translation retry behavior according to noretry
kernel parameter

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Suggested-by: Jay Cornwall <Jay.Cornwall@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-11-25 11:19:55 -05:00
changzhu
dab5ef2722 drm/amdgpu: initialize vm_inv_eng0_sem for gfxhub and mmhub
SW must acquire/release one of the vm_invalidate_eng*_sem around the
invalidation req/ack. Through this way,it can avoid losing invalidate
acknowledge state across power-gating off cycle.
To use vm_invalidate_eng*_sem, it needs to initialize
vm_invalidate_eng*_sem firstly.

Signed-off-by: changzhu <Changfeng.Zhu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-11-22 14:27:11 -05:00
Dennis Li
f6c3623b7b drm/amdgpu: implement querying ras error count for mmhub9.4
Get mmhub error counter by accessing EDC_CNT registers.

v2: Add mmhub_v9_4_ prefix for local static variable and function

Signed-off-by: Dennis Li <dennis.li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-11-22 14:27:11 -05:00
Alex Deucher
e2f619aa14 drm/amdgpu/arcturus: properly set BANK_SELECT and FRAGMENT_SIZE
These were not aligned for optimal performance for GPUVM.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-11-06 14:20:08 -05:00
Felix Kuehling
7cae706193 drm/amdgpu: Disable retry faults in VMID0
There is no point retrying page faults in VMID0. Those faults are
always fatal.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-and-Tested-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-09-16 09:54:34 -05:00
Le Ma
a840159c82 drm/amdgpu: enable mmhub clock gating for Arcturus
Init MC_MGCG/LS flag. Also apply to athub CG.

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-12 12:47:49 -05:00
Le Ma
cb15e8046d drm/amdgpu: add mmhub clock gating for Arcturus
Add 2 mmhub instances CG

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-12 12:47:49 -05:00
Le Ma
6c54afc7e8 drm/amdgpu: assign fb_start/end in mmhub v9.4 interface
Align with mmhub v1.0.

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-18 14:18:06 -05:00
Le Ma
7d0670f441 drm/amdgpu: set system aperture to cover whole FB region in mmhub v9.4
In XGMI configuration, the FB region covers vram region from peer
device, adjust system aperture to cover all of them

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-18 14:18:05 -05:00
Yong Zhao
c9ffdf5acd drm/amdgpu: Set VM_L2_CNTL.PDE_FAULT_CLASSIFICATION to 0 for MMHUB 9.4
Should be set to 0 for mmhub 9.4.

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-18 14:18:04 -05:00
Yong Zhao
6d5311ab2c drm/amdkfd: Expose function mmhub_v9_4_setup_vm_pt_regs() for kfd to use
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-18 14:18:04 -05:00
Le Ma
2cb2ea1e07 drm/amdgpu: add mmhub v9.4.1 block for Arcturus (v2)
Arcturus as an updated mmhub block. mmhub is the
memory controller hub used for sdma and multimedia.

v2: squash in AGP BAR programming (Alex)

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-18 14:18:02 -05:00