Hawking Zhang
49070c4ea3
drm/amdgpu: split umc callbacks to ras and non-ras ones
...
umc ras is not managed by gpu driver when gpu is
connected to cpu through xgmi. split umc callbacks
into ras and non-ras ones so gpu driver only
initializes umc ras callbacks when it manages
umc ras.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com >
Reviewed-by: Dennis Li <Dennis.Li@amd.com >
Reviewed-by: John Clements <John.Clements@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2021-04-09 16:51:11 -04:00
Deepak R Varma
58b5a793ff
drm/amdgpu/umc: use "*" adjacent to data name
...
When declaring pointer data, the "*" symbol should be used adjacent to
the data name as per the coding standards. This resolves following
issues reported by checkpatch script:
ERROR: "foo * bar" should be "foo *bar"
ERROR: "foo * bar" should be "foo *bar"
ERROR: "foo* bar" should be "foo *bar"
ERROR: "(foo*)" should be "(foo *)"
Signed-off-by: Deepak R Varma <mh12gx2825@gmail.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2020-11-02 15:35:44 -05:00
John Clements
c5a4ef3e20
drm/amdgpu: move umc specific macros to header
...
certain umc macros are common across umc versions
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com >
Signed-off-by: John Clements <john.clements@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2020-07-23 10:45:00 -04:00
Guchun Chen
fd90456c75
drm/amdgpu: decouple EccErrCnt query and clear operation
...
Due to hardware bug that when RSMU UMC index is disabled,
clear EccErrCnt at the first UMC instance will clean up all other
EccErrCnt registes from other instances at the same time. This
will break the correctable error count log in EccErrCnt register
once querying it. So decouple both to make error count query workable.
Signed-off-by: Guchun Chen <guchun.chen@amd.com >
Reviewed-by: Tao Zhou <tao.zhou1@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2020-04-27 15:52:10 -04:00
Guchun Chen
40e733147f
drm/amdgpu: switch to SMN interface to operate RSMU index mode
...
This makes consistent with other regsiters' access in this module.
Signed-off-by: Guchun Chen <guchun.chen@amd.com >
Reviewed-by: Tao Zhou <tao.zhou1@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2020-04-27 15:52:03 -04:00
John Clements
1a2172b5ee
drm/amdgpu: update page retirement sequence
...
check UMC status and exit prior to making and erroneus register access
this resolved unexpected behaviour with UMC indexing mode broadcasting writes
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com >
Signed-off-by: John Clements <john.clements@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2020-03-06 14:32:06 -05:00
Guchun Chen
d38c3ac716
drm/amdgpu: toggle DF-Cstate when accessing UMC ras error related registers
...
On arcturus, DF-Cstate needs to be toggled off/on
before and after accessing UMC error counter and
error address registers, otherwise, clearing such
registers may fail.
Signed-off-by: Guchun Chen <guchun.chen@amd.com >
Reviewed-by: John Clements <John.Clements@amd.com >
Reviewed-by: Tao Zhou <tao.zhou1@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2020-03-06 14:31:59 -05:00
John Clements
eee2eabafe
drm/amdgpu: preserve RSMU UMC index mode state
...
between UMC RAS err register access restore previous RSMU UMC index mode state
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com >
Signed-off-by: John Clements <john.clements@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2020-01-14 10:18:10 -05:00
Guchun Chen
5d4667ec33
drm/amdgpu: calculate MCUMC_ADDRT0 per asic's UMC offset
...
Hardcoded offset is not friendly. And another benifit of this
patch is to keep read and write access to this register be
consistent with other similar UMC regsiters in this file.
Signed-off-by: Guchun Chen <guchun.chen@amd.com >
Reviewed-by: Alex Deucher <alexander.deucher@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2020-01-14 10:18:09 -05:00
John Clements
c8aa6ae30c
drm/amdgpu: updated UMC error address record with correct channel index
...
defined macros for repetitive for loops
Reviewed-by: Guchun Chen <guchun.chen@amd.com >
Signed-off-by: John Clements <john.clements@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2020-01-07 12:02:08 -05:00
John Clements
0ee51f1d94
drm/amdgpu: resolved bug in UMC RAS CE query
...
switch CE counter register access' to use SMN
disable UMC indexing mode
Reviewed-by: Tao Zhou <tao.zhou1@amd.com >
Signed-off-by: John Clements <john.clements@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2020-01-07 12:02:01 -05:00
John Clements
bd68fb94b3
drm/amdgpu: resolve bug in UMC 6 error counter query
...
iterate over all error counter registers in SMN space
removed support error counter access via MMIO
Reviewed-by: Guchun Chen <guchun.chen@amd.com >
Signed-off-by: John Clements <john.clements@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2020-01-07 11:58:37 -05:00
John Clements
955c712007
drm/amdgpu: update UMC 6.1 RAS error counter register access path
...
use proper method for SMN register access
Reviewed-by: Tao Zhou <tao.zhou1@amd.com >
Signed-off-by: John Clements <john.clements@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2020-01-07 11:57:48 -05:00
Guchun Chen
fb71a336cd
drm/amdgpu: move umc offset to one new header file for Arcturus
...
Code refactor and no functional change.
Fixes: 4cf781c24c ("drm/amdgpu: Added RAS UMC error query support for Arcturus")
Signed-off-by: Guchun Chen <guchun.chen@amd.com >
Reviewed-by: Alex Deucher <alexander.deucher@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2019-12-18 16:33:01 -05:00
John Clements
4cf781c24c
drm/amdgpu: Added RAS UMC error query support for Arcturus
...
Updated UMC 6.1 function set to support UMC 6.1.1 and 6.1.2 devices
Reviewed-by: Alex Deucher <alexander.deucher@amd.com >
Signed-off-by: John Clements <john.clements@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2019-12-11 15:22:07 -05:00
Tao Zhou
afa44809a4
drm/amdgpu: use GPU PAGE SHIFT for umc retired page
...
umc retired page belongs to vram and it should be aligned to gpu page
size
Signed-off-by: Tao Zhou <tao.zhou1@amd.com >
Reviewed-by: Guchun Chen <guchun.chen@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2019-10-03 09:10:59 -05:00
Tao Zhou
d99659a062
drm/amdgpu: rename umc ras_init to err_cnt_init
...
this interface is related to specific version of umc, distinguish it
from ras_late_init
Signed-off-by: Tao Zhou <tao.zhou1@amd.com >
Reviewed-by: Guchun Chen <guchun.chen@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2019-09-16 10:06:20 -05:00
Tao Zhou
86edcc7dba
drm/amdgpu: move umc late init from gmc to umc block
...
umc late init is umc specific, it's more suitable to be put in umc block
Signed-off-by: Tao Zhou <tao.zhou1@amd.com >
Reviewed-by: Guchun Chen <guchun.chen@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2019-09-16 10:06:03 -05:00
Tao Zhou
87d2b92f1e
drm/amdgpu: save umc error records
...
save umc error records to ras bad page array
v2: add bad pages before gpu reset
v3: add NULL check for adev->umc.funcs
Signed-off-by: Tao Zhou <tao.zhou1@amd.com >
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com >
Reviewed-by: Guchun Chen <guchun.chen@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2019-09-13 17:50:40 -05:00
Tao Zhou
dd21a572c9
drm/amdgpu: implement UMC 64 bits REG operations
...
implement 64 bits operations via 32 bits interface
v2: make use of lower_32_bits() and upper_32_bits() macros
Reviewed-by: Christian König <christian.koenig@amd.com >
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com >
Signed-off-by: Tao Zhou <tao.zhou1@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2019-08-09 11:17:10 -05:00
Tao Zhou
b1a5895352
drm/amdgpu: update the calc algorithm of umc ecc error count
...
the initial value of ecc error count can be adjusted
Signed-off-by: Tao Zhou <tao.zhou1@amd.com >
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com >
Reviewed-by: Alex Deucher <alexander.deucher@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2019-08-02 10:30:38 -05:00
Tao Zhou
b7f92097f5
drm/amdgpu: implement umc ras init function
...
enable umc ce interrupt and initialize ecc error count
Signed-off-by: Tao Zhou <tao.zhou1@amd.com >
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com >
Reviewed-by: Alex Deucher <alexander.deucher@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2019-08-02 10:30:38 -05:00
Tao Zhou
2b671b6049
drm/amdgpu: apply umc_for_each_channel macro to umc_6_1
...
use umc_for_each_channel to make code simpler
Signed-off-by: Tao Zhou <tao.zhou1@amd.com >
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2019-08-02 10:30:38 -05:00
Tao Zhou
3aacf4ea11
drm/amdgpu: initialize new parameters and functions for amdgpu_umc structure
...
add initialization for new members of amdgpu_umc structure
Signed-off-by: Tao Zhou <tao.zhou1@amd.com >
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2019-08-02 10:30:38 -05:00
Tao Zhou
a55c8d7bda
drm/amdgpu: remove the clear of MCA_ADDR
...
clearing MCA_STATUS is enough to reset the whole MCA, writing zero to
MCA_ADDR is unnecessary
Signed-off-by: Tao Zhou <tao.zhou1@amd.com >
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com >
Reviewed-by: Alex Deucher <alexander.deucher@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2019-08-02 10:30:38 -05:00
Tao Zhou
8c94810357
drm/amdgpu: query umc ras error address
...
query umc ras error address, translate it to gpu 4k page view
and save it.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com >
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com >
Reviewed-by: Dennis Li <dennis.li@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2019-07-31 14:50:17 -05:00
Tao Zhou
c2742aef4d
drm/amdgpu: add structures for umc error address translation
...
add related registers, callback function and channel index table
Signed-off-by: Tao Zhou <tao.zhou1@amd.com >
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2019-07-31 14:50:11 -05:00
Tao Zhou
f1ed4afa13
drm/amdgpu: update algorithm of umc uncorrectable error counting
...
remove the check of ErrorCodeExt
v2: refine the if condition for ue counting
Signed-off-by: Tao Zhou <tao.zhou1@amd.com >
Reviewed-by: Dennis Li <dennis.li@amd.com >
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2019-07-31 14:49:58 -05:00
Tao Zhou
5bbfb64a17
drm/amdgpu: use 64bit operation macros for umc
...
replace some 32bit macros with 64bit operations to simplify code
Signed-off-by: Tao Zhou <tao.zhou1@amd.com >
Reviewed-by: Dennis Li <dennis.li@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2019-07-31 14:49:46 -05:00
Hawking Zhang
9884c2b1c3
drm/amdgpu: add umc v6_1 query error count support
...
Implement umc query_ras_error_count function to support querry
both correctable and uncorrectable error
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com >
Signed-off-by: Tao Zhou <tao.zhou1@amd.com >
Reviewed-by: Dennis Li <dennis.li@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2019-07-31 14:49:16 -05:00