linux

iv/linux

Author	SHA1	Message	Date
Hawking Zhang	86153f1be2	drm/amdgpu: add reset_ras_error_count function for SDMA SDMA ras error counters are dirty ones after cold reboot Read operation is needed to reset them to 0 Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-03-05 00:32:32 -05:00
Nirmoy Das	a9d4fe2fd6	drm/amdgpu: remove unnecessary conversion to bool Better clean that up before some automation starts to complain about it Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-01-22 16:55:27 -05:00
Aaron Liu	d8459d1b7f	drm/amdgpu: update goldensetting for renoir Update mmSDMA0_UTCL1_WATERMK golden setting for renoir. Signed-off-by: Aaron Liu <aaron.liu@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-01-14 10:18:10 -05:00
Hawking Zhang	49da2ccd2d	drm/amdgpu: check sdma ras funcs pointer before accessing sdma ras funcs are not supported by ASIC prior to vega20 Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Le Ma <Le.Ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-01-14 10:18:09 -05:00
Hawking Zhang	5e62db9df6	drm/amdgpu: read sdma edc counter to clear the counters SDMA edc counter registers were added in gfx edc counters array. When querying gfx error counter in that array, there is no way to differentiate sdma instance number for different asic and then results to NULL pointer access when trying to read sdma register base address for instances greater than 2 on Vega20. In addition, this also results to wrong gfx error counters since it actually added sdma edc counters. Therefore, sdma edc counter registers should be separated from gfx edc counter regsiter array and only get initialized when driver tries to enable sdma ras. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-01-14 10:18:08 -05:00
Hawking Zhang	1dd5ead294	drm/amdgpu: add ras_late_init and ras_fini for sdma v4 move ras_late_init and ras_fini to sdma_ras_funcs table Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-01-14 10:18:08 -05:00
Hawking Zhang	93070deb58	drm/amdgpu: add query_ras_error_count function for sdma v4 query_ras_error_count function will be invoked to query single bit error count detected in sdma ip block Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-01-14 10:18:08 -05:00
Luben Tuikov	ce73516d42	drm/amdgpu: simplify padding calculations (v2) Simplify padding calculations. v2: Comment update and spacing. Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-12-23 14:55:44 -05:00
Nirmoy Das	0c88b43032	drm/amdgpu: replace vm_pte's run-queue list with drm gpu scheds list drm_sched_entity_init() takes drm gpu scheduler list instead of drm_sched_rq list. This makes conversion of drm_sched_rq list to drm gpu scheduler list unnecessary Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-12-18 16:09:12 -05:00
chen gong	5aed95bbdd	drm/amdgpu: Fix SDMA hang when performing VKexample test VKexample test hang during Occlusion/SDMA/Varia runs. Clear XNACK_WATERMK in reg SDMA0_UTCL1_WATERMK to fix this issue. Signed-off-by: chen gong <curry.gong@amd.com> Reviewed-by: Aaron Liu <aaron.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-10-25 16:50:10 -04:00
chen gong	6696b8adb8	drm/amdgpu: Do not implement power-on for SDMA after do mode2 reset on Renoir Find that ring sdma0 test failed if turn on SDMA powergating after do mode2 reset. Perhaps the mode2 reset does not reset the SDMA PG state, SDMA is already powered up so there is no need to ask the SMU to power it up again. So I skip this function for a moment. Signed-off-by: chen gong <curry.gong@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-10-10 19:39:06 -05:00
Tao Zhou	3d8361b11c	drm/amdgpu: add comments in ras interrupt callback add comments to clarify why checking GFX IP BLOCK for each ras interrupt callback Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-10-03 09:11:03 -05:00
Tao Zhou	e536c81850	drm/amdgpu: add common sdma_ras_fini function sdma_ras_fini can be shared among all generations of sdma Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-10-03 09:11:02 -05:00
Tao Zhou	fc04e6b484	drm/amdgpu: refine sdma4 ras_data_cb simplify code logic and refine return value v2: remove unused error source code Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-10-03 09:11:02 -05:00
Tao Zhou	4c65dd1041	drm/amdgpu: move sdma ecc functions to generic sdma file sdma ras ecc functions can be reused among all sdma generations Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-10-03 09:11:02 -05:00
Tao Zhou	f5f06e21e9	drm/amdgpu: update parameter of ras_ih_cb change struct ras_err_data err_data to void err_data, align with umc code and the callback's declaration in each ras block could pay no attention to the structure type Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-10-03 09:11:01 -05:00
Prike Liang	a90a24d581	drm/amd/amdgpu: power up sdma engine when S3 resume back The sdma_v4 should be ungated when the IP resume back, otherwise it will hang up and resume time out error. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-09-16 10:16:07 -05:00
Hawking Zhang	bfcf62c2a5	drm/amdgpu/sdma: switch to amdgpu_sdma_ras_late_init helper function amdgpu_sdma_ras_late_init is used to init sdma specfic ras debugfs/sysfs node and sdma specific interrupt handler. It can be shared among sdma generations Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-09-13 17:41:49 -05:00
Hawking Zhang	d094aea312	drm/amdgpu: set ip specific ras interface pointer to NULL after free it to prevent access to dangling pointers Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-09-13 17:41:29 -05:00
Andrey Grodzovsky	7c6e68c777	drm/amdgpu: Avoid HW GPU reset for RAS. Problem: Under certain conditions, when some IP bocks take a RAS error, we can get into a situation where a GPU reset is not possible due to issues in RAS in SMU/PSP. Temporary fix until proper solution in PSP/SMU is ready: When uncorrectable error happens the DF will unconditionally broadcast error event packets to all its clients/slave upon receiving fatal error event and freeze all its outbound queues, err_event_athub interrupt will be triggered. In such case and we use this interrupt to issue GPU reset. THe GPU reset code is modified for such case to avoid HW reset, only stops schedulers, deatches all in progress and not yet scheduled job's fences, set error code on them and signals. Also reject any new incoming job submissions from user space. All this is done to notify the applications of the problem. v2: Extract amdgpu_amdkfd_pre/post_reset from amdgpu_device_lock/unlock_adev Move amdgpu_job_stop_all_jobs_on_sched to amdgpu_job.c Remove print param from amdgpu_ras_query_error_count v3: Update based on prevoius bug fixing patch to properly call amdgpu_amdkfd_pre_reset for other XGMI hive memebers. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-09-13 17:41:05 -05:00
Hawking Zhang	8bf2485aec	drm/amdgpu: fix memory leak when ras is not supported on specific ip block free ras_if if ras is not supported Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-09-13 17:36:22 -05:00
Hawking Zhang	7d0a31e8cc	drm/amdgpu: switch to amdgpu_ras_late_init for sdma v4 block (v2) call helper function in late init phase to handle ras init for sdma ip block v2: call ras_late_fini to do clean up when fail to enable interrupt Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-09-13 17:11:04 -05:00
Hawking Zhang	bebc076285	drm/amdgpu: switch to new amdgpu_nbio structure no functional change, just switch to new structures Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-09-13 17:11:03 -05:00
Prike Liang	334ffd0daa	drm/amdgpu: Initialize and update SDMA power gating Init SDMA HW base configuration and enable idle INT for rn. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Aaron Liu <aaron.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-08-29 15:52:32 -05:00
Gang Ba	250af743c0	Revert "drm/amdgpu: free up the first paging queue v2" This reverts commit 4f8bc72fbf10f2dc8bca74d5da08b3a981b2e5cd. It turned out that a single reserved queue wouldn't be sufficient for page fault handling. Signed-off-by: Gang Ba <gaba@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-08-27 08:16:26 -05:00
Aaron Liu	f13580a947	drm/amdgpu: update gc/sdma goldensetting for rn This patch updates gc/sdma goldensetting for renoir Signed-off-by: Aaron Liu <aaron.liu@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-08-22 17:48:46 -05:00
Prike Liang	91c5b6b326	drm/amdgpu/sdma4: set sdma clock gating for rn Add support for SDMA clockgating on RN. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-08-22 17:40:58 -05:00
Huang Rui	a46e1716f3	drm/amdgpu: add sdma golden settings for renoir This patch adds sdma golden settings for renoir asic. Acked-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-08-12 12:47:50 -05:00
Huang Rui	2d49738ae1	drm/amdgpu: add sdma support for renoir Add renoir checks to appropriate places. Acked-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-08-12 12:47:49 -05:00
Le Ma	8dc7e07cff	drm/amdgpu: add sdma clock gating for Arcturus Add ARCTURUS case in sdma set clockgating function Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-08-12 12:47:48 -05:00
Le Ma	78864760c2	drm/amdgpu: support sdma clock gating for more instances Shorten the code with RREG32_SDMA/WREG32_SDMA macro in CG part. Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-08-12 12:47:48 -05:00
John Clements	8c2ef8ca0e	drm/amdgpu: update SDMA V4 microcode init Removed loading duplicate instances of SDMA FW for Arcturus. We use a single image for all instances. Signed-off-by: John Clements <john.clements@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-08-02 10:30:39 -05:00
Tao Zhou	bd2280da46	drm/amdgpu: replace AMDGPU_RAS_UE with AMDGPU_RAS_SUCCESS ce can also trigger interrupt, and even both ce and ue error can be found in one ras query, distinguishing between ce and ue in interrupt handler is uncessary. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Suggested-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-08-02 10:30:39 -05:00
Colin Ian King	ac4bf4a1eb	drm/amdgpu: fix unsigned variable instance compared to less than zero Currenly the error check on variable instance is always false because it is a uint32_t type and this is never less than zero. Fix this by making it an int type. Addresses-Coverity: ("Unsigned compared against 0") Fixes: 7d0e6329dfdc ("drm/amdgpu: update more sdma instances irq support") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-08-02 10:30:38 -05:00
Monk Liu	4cd4c5c064	drm/amdgpu: cleanup vega10 SRIOV code path we can simplify all those unnecessary function under SRIOV for vega10 since: 1) PSP L1 policy is by force enabled in SRIOV 2) original logic always set all flags which make itself a dummy step besides, 1) the ih_doorbell_range set should also be skipped for VEGA10 SRIOV. 2) the gfx_common registers should also be skipped for VEGA10 SRIOV. Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Emily Deng <Emily.Deng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-08-02 10:17:21 -05:00
Tao Zhou	81e02619e9	drm/amdgpu: update interrupt callback for all ras clients add err_data parameter in interrupt cb for ras clients Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Dennis Li <dennis.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-07-31 14:50:29 -05:00
Hawking Zhang	861324983d	drm/amdgpu: correct irq type used for sdma ecc we should pass irq type, instead of irq client id, to irq_get/put interface Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-07-30 23:48:35 -05:00
Le Ma	7d0e6329df	drm/amdgpu: update more sdma instances irq support Update for sdma ras ecc_irq and other minors. Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-07-30 23:48:34 -05:00
Hawking Zhang	d52d6de280	drm/amdgpu: set sdma irq src num according to sdma instances Otherwise, it will cause driver access non-existing sdma registers in gpu reset code path Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Le Ma <Le.Ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-07-22 14:57:31 -05:00
Le Ma	69d4de94f8	drm/amdgpu: enable all 8 sdma instances for Arcturus silicon The more 6 sdma instances work fine now with DF fix in vbios: * mmDF_PIE_AON_MiscClientsEnable(0x1c728)=0x3fe(DF_ALL_INSTANCE) [9:4]MmhubsEnable=3f (change from 0) Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-07-18 14:18:07 -05:00
Le Ma	fc1e272e8d	drm/amdgpu: limit sdma instances to 2 for Arcturus in BU phase Another 6 sdma instances do not work at present. Disable them to unblock KFD for silicon bringup as a workaround Signed-off-by: Le Ma <le.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-07-18 14:18:06 -05:00
Hawking Zhang	ca1961a2f5	drm/amdgpu: add arct sdma golden settings Golden SDMA register settings from the hw team. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Le Ma <Le.Ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-07-18 14:18:06 -05:00
Le Ma	eec28ef03c	drm/amdgpu: declare sdma firmware binary files for Arcturus So that they are properly picked up as a driver dependency. Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-07-18 14:18:04 -05:00
Le Ma	f864e3e655	drm/amdgpu: add paging queue support for 8 SDMA instances on Arcturus Properly enable all 8 instances for paging queue. Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-07-18 14:18:03 -05:00
Le Ma	5ce40fd86c	drm/amdgpu: add Arcturus chip_name for init sdma microcode So we load the proper firmware for arcturus. Signed-off-by: Le Ma <le.ma@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-07-18 14:18:03 -05:00
Le Ma	121d859918	drm/amdgpu: enable 8 SDMA instances for Arcturus All the 8 SDMA instances work fine on the latest Gopher build model. Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Snow Zhang <Snow.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-07-18 14:18:03 -05:00
Le Ma	5cd54ab85d	drm/amdgpu: correct Arcturus SDMA address space base index Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-07-18 14:18:03 -05:00
Le Ma	0fe6a7b49f	drm/amdgpu: support hdp flush for more sdma instances The bit RSVD_ENG0 to RSVD_ENG5 in GPU_HDP_FLUSH_REQ/GPU_HDP_FLUSH_DONE can be leveraged for sdma instance 2~7 to poll register/memory. Signed-off-by: Le Ma <le.ma@amd.com> Acked-by: Snow Zhang < Snow.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-07-18 14:18:02 -05:00
Le Ma	b482a134ad	drm/amdgpu: specify sdma instance 5~7 with second mmhub type On Arcturus, sdma instance 5~7 is connected to the second mmhub. The vmhub type in amdgpu_ring_funcs is constant, so we create an individual amdgpu_ring_funcs with different vmhub type(AMDGPU_MMHUB_1) for these sdma instances. Signed-off-by: Le Ma <le.ma@amd.com> Acked-by: Snow Zhang < Snow.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-07-18 14:18:02 -05:00
Le Ma	667a48226e	drm/amdgpu: reorganize sdma v4 code to support more instances This change is needed for Arcturus which has 8 sdma instances. The CG/PG part is not covered for now. Signed-off-by: Le Ma <le.ma@amd.com> Acked-by: Snow Zhang < Snow.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-07-18 14:18:02 -05:00

1 2 3 4

199 Commits