From 66a5710beaf42903d553378f609166034bd219c7 Mon Sep 17 00:00:00 2001 From: Dennis Li Date: Wed, 2 Sep 2020 12:57:59 +0800 Subject: [PATCH 01/10] drm/kfd: fix a system crash issue during GPU recovery The crash log as the below: [Thu Aug 20 23:18:14 2020] general protection fault: 0000 [#1] SMP NOPTI [Thu Aug 20 23:18:14 2020] CPU: 152 PID: 1837 Comm: kworker/152:1 Tainted: G OE 5.4.0-42-generic #46~18.04.1-Ubuntu [Thu Aug 20 23:18:14 2020] Hardware name: GIGABYTE G482-Z53-YF/MZ52-G40-00, BIOS R12 05/13/2020 [Thu Aug 20 23:18:14 2020] Workqueue: events amdgpu_ras_do_recovery [amdgpu] [Thu Aug 20 23:18:14 2020] RIP: 0010:evict_process_queues_cpsch+0xc9/0x130 [amdgpu] [Thu Aug 20 23:18:14 2020] Code: 49 8d 4d 10 48 39 c8 75 21 eb 44 83 fa 03 74 36 80 78 72 00 74 0c 83 ab 68 01 00 00 01 41 c6 45 41 00 48 8b 00 48 39 c8 74 25 <80> 78 70 00 c6 40 6d 01 74 ee 8b 50 28 c6 40 70 00 83 ab 60 01 00 [Thu Aug 20 23:18:14 2020] RSP: 0018:ffffb29b52f6fc90 EFLAGS: 00010213 [Thu Aug 20 23:18:14 2020] RAX: 1c884edb0a118914 RBX: ffff8a0d45ff3c00 RCX: ffff8a2d83e41038 [Thu Aug 20 23:18:14 2020] RDX: 0000000000000000 RSI: 0000000000000082 RDI: ffff8a0e2e4178c0 [Thu Aug 20 23:18:14 2020] RBP: ffffb29b52f6fcb0 R08: 0000000000001b64 R09: 0000000000000004 [Thu Aug 20 23:18:14 2020] R10: ffffb29b52f6fb78 R11: 0000000000000001 R12: ffff8a0d45ff3d28 [Thu Aug 20 23:18:14 2020] R13: ffff8a2d83e41028 R14: 0000000000000000 R15: 0000000000000000 [Thu Aug 20 23:18:14 2020] FS: 0000000000000000(0000) GS:ffff8a0e2e400000(0000) knlGS:0000000000000000 [Thu Aug 20 23:18:14 2020] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [Thu Aug 20 23:18:14 2020] CR2: 000055c783c0e6a8 CR3: 00000034a1284000 CR4: 0000000000340ee0 [Thu Aug 20 23:18:14 2020] Call Trace: [Thu Aug 20 23:18:14 2020] kfd_process_evict_queues+0x43/0xd0 [amdgpu] [Thu Aug 20 23:18:14 2020] kfd_suspend_all_processes+0x60/0xf0 [amdgpu] [Thu Aug 20 23:18:14 2020] kgd2kfd_suspend.part.7+0x43/0x50 [amdgpu] [Thu Aug 20 23:18:14 2020] kgd2kfd_pre_reset+0x46/0x60 [amdgpu] [Thu Aug 20 23:18:14 2020] amdgpu_amdkfd_pre_reset+0x1a/0x20 [amdgpu] [Thu Aug 20 23:18:14 2020] amdgpu_device_gpu_recover+0x377/0xf90 [amdgpu] [Thu Aug 20 23:18:14 2020] ? amdgpu_ras_error_query+0x1b8/0x2a0 [amdgpu] [Thu Aug 20 23:18:14 2020] amdgpu_ras_do_recovery+0x159/0x190 [amdgpu] [Thu Aug 20 23:18:14 2020] process_one_work+0x20f/0x400 [Thu Aug 20 23:18:14 2020] worker_thread+0x34/0x410 When GPU hang, user process will fail to create a compute queue whose struct object will be freed later, but driver wrongly add this queue to queue list of the proccess. And then kfd_process_evict_queues will access a freed memory, which cause a system crash. v2: The failure to execute_queues should probably not be reported to the caller of create_queue, because the queue was already created. Therefore change to ignore the return value from execute_queues. Reviewed-by: Felix Kuehling Signed-off-by: Dennis Li Signed-off-by: Alex Deucher Cc: stable@vger.kernel.org --- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c index e0e60b0d0669..ef78492ef956 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c @@ -1326,7 +1326,7 @@ static int create_queue_cpsch(struct device_queue_manager *dqm, struct queue *q, if (q->properties.is_active) { increment_queue_count(dqm, q->properties.type); - retval = execute_queues_cpsch(dqm, + execute_queues_cpsch(dqm, KFD_UNMAP_QUEUES_FILTER_DYNAMIC_QUEUES, 0); } From 087d764159996ae378b08c0fdd557537adfd6899 Mon Sep 17 00:00:00 2001 From: Dennis Li Date: Wed, 2 Sep 2020 17:11:09 +0800 Subject: [PATCH 02/10] drm/amdkfd: fix a memory leak issue In the resume stage of GPU recovery, start_cpsch will call pm_init which set pm->allocated as false, cause the next pm_release_ib has no chance to release ib memory. Add pm_release_ib in stop_cpsch which will be called in the suspend stage of GPU recovery. Reviewed-by: Felix Kuehling Signed-off-by: Dennis Li Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c index ef78492ef956..0f4508b4903e 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c @@ -1216,6 +1216,8 @@ static int stop_cpsch(struct device_queue_manager *dqm) dqm->sched_running = false; dqm_unlock(dqm); + pm_release_ib(&dqm->packets); + kfd_gtt_sa_free(dqm->dev, dqm->fence_mem); pm_uninit(&dqm->packets, hanging); From cc8e66e769ebd1d10c406a3152474bab24ac1730 Mon Sep 17 00:00:00 2001 From: Jiansong Chen Date: Mon, 14 Sep 2020 14:42:51 +0800 Subject: [PATCH 03/10] drm/amd/pm: support runtime pptable update for sienna_cichlid etc. This avoids smu issue when enabling runtime pptable update for sienna_cichlid and so on. Runtime pptable udpate is needed for test and debug purpose. Signed-off-by: Jiansong Chen Reviewed-by: Kenneth Feng Reviewed-by: Evan Quan Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/powerplay/amdgpu_smu.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c index 0826625573dc..63f945f9f331 100644 --- a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c +++ b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c @@ -1126,7 +1126,7 @@ static int smu_disable_dpms(struct smu_context *smu) */ if (smu->uploading_custom_pp_table && (adev->asic_type >= CHIP_NAVI10) && - (adev->asic_type <= CHIP_NAVI12)) + (adev->asic_type <= CHIP_NAVY_FLOUNDER)) return 0; /* @@ -1211,7 +1211,9 @@ static int smu_hw_fini(void *handle) int smu_reset(struct smu_context *smu) { struct amdgpu_device *adev = smu->adev; - int ret = 0; + int ret; + + amdgpu_gfx_off_ctrl(smu->adev, false); ret = smu_hw_fini(adev); if (ret) @@ -1222,8 +1224,12 @@ int smu_reset(struct smu_context *smu) return ret; ret = smu_late_init(adev); + if (ret) + return ret; - return ret; + amdgpu_gfx_off_ctrl(smu->adev, true); + + return 0; } static int smu_suspend(void *handle) From 4cdd7b332ed139b1e37faeb82409a14490adb644 Mon Sep 17 00:00:00 2001 From: Bhawanpreet Lakha Date: Fri, 28 Aug 2020 11:09:38 -0400 Subject: [PATCH 04/10] drm/amd/display: Don't use DRM_ERROR() for DTM add topology [Why] Previously we were only calling add_topology when hdcp was being enabled. Now we call add_topology by default so the ERROR messages are printed if the firmware is not loaded. This error message is not relevant for normal display functionality so no need to print a ERROR message. [How] Change DRM_ERROR to DRM_INFO Signed-off-by: Bhawanpreet Lakha Acked-by: Aurabindo Pillai Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/display/modules/hdcp/hdcp_psp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/display/modules/hdcp/hdcp_psp.c b/drivers/gpu/drm/amd/display/modules/hdcp/hdcp_psp.c index fb1161dd7ea8..3a367a5968ae 100644 --- a/drivers/gpu/drm/amd/display/modules/hdcp/hdcp_psp.c +++ b/drivers/gpu/drm/amd/display/modules/hdcp/hdcp_psp.c @@ -88,7 +88,7 @@ enum mod_hdcp_status mod_hdcp_add_display_to_topology(struct mod_hdcp *hdcp, enum mod_hdcp_status status = MOD_HDCP_STATUS_SUCCESS; if (!psp->dtm_context.dtm_initialized) { - DRM_ERROR("Failed to add display topology, DTM TA is not initialized."); + DRM_INFO("Failed to add display topology, DTM TA is not initialized."); display->state = MOD_HDCP_DISPLAY_INACTIVE; return MOD_HDCP_STATUS_FAILURE; } From c4790a8894232f39c25c7c546c06efe074e63384 Mon Sep 17 00:00:00 2001 From: Jun Lei Date: Thu, 3 Sep 2020 16:17:46 -0400 Subject: [PATCH 05/10] drm/amd/display: update nv1x stutter latencies [why] Recent characterization shows increased stutter latencies on some SKUs, leading to underflow. [how] Update SOC params to account for this worst case latency. Signed-off-by: Jun Lei Acked-by: Aurabindo Pillai Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c index 9140b3fc767a..f31f48dd0da2 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c +++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c @@ -409,8 +409,8 @@ static struct _vcs_dpi_soc_bounding_box_st dcn2_0_nv14_soc = { }, }, .num_states = 5, - .sr_exit_time_us = 8.6, - .sr_enter_plus_exit_time_us = 10.9, + .sr_exit_time_us = 11.6, + .sr_enter_plus_exit_time_us = 13.9, .urgent_latency_us = 4.0, .urgent_latency_pixel_data_only_us = 4.0, .urgent_latency_pixel_mixed_with_vm_data_us = 4.0, From 5367eb6d8a98b8682877961f4ccee1088d5735f3 Mon Sep 17 00:00:00 2001 From: Andrey Grodzovsky Date: Thu, 10 Sep 2020 13:59:33 -0400 Subject: [PATCH 06/10] drm/amdgpu: Include sienna_cichlid in USBC PD FW support. Create sysfs interface also for sienna_cichlid. Reviewed-by: Alex Deucher Signed-off-by: Andrey Grodzovsky Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c index d8c6520ff74a..06757681b2ce 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c @@ -178,7 +178,7 @@ static int psp_sw_init(void *handle) return ret; } - if (adev->asic_type == CHIP_NAVI10) { + if (adev->asic_type == CHIP_NAVI10 || adev->asic_type == CHIP_SIENNA_CICHLID) { ret= psp_sysfs_init(adev); if (ret) { return ret; From 40eab0f8956724b0c2bb9e5679269632afb72b26 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Christian=20K=C3=B6nig?= Date: Wed, 9 Sep 2020 13:12:46 +0200 Subject: [PATCH 07/10] drm/radeon: revert "Prefer lower feedback dividers" MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Turns out this breaks a lot of different hardware. This reverts commit fc8c70526bd30733ea8667adb8b8ffebea30a8ed. Signed-off-by: Christian König Acked-by: Nirmoy Das Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher --- drivers/gpu/drm/radeon/radeon_display.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/radeon/radeon_display.c b/drivers/gpu/drm/radeon/radeon_display.c index 7b69d6dfe44a..e0ae911ef427 100644 --- a/drivers/gpu/drm/radeon/radeon_display.c +++ b/drivers/gpu/drm/radeon/radeon_display.c @@ -933,7 +933,7 @@ static void avivo_get_fb_ref_div(unsigned nom, unsigned den, unsigned post_div, /* get matching reference and feedback divider */ *ref_div = min(max(den/post_div, 1u), ref_div_max); - *fb_div = max(nom * *ref_div * post_div / den, 1u); + *fb_div = DIV_ROUND_CLOSEST(nom * *ref_div * post_div, den); /* limit fb divider to its maximum */ if (*fb_div > fb_div_max) { From 2f228aab21bbc74e90e267a721215ec8be51daf7 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michel=20D=C3=A4nzer?= Date: Fri, 4 Sep 2020 12:43:04 +0200 Subject: [PATCH 08/10] drm/amdgpu/dc: Require primary plane to be enabled whenever the CRTC is MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Don't check drm_crtc_state::active for this either, per its documentation in include/drm/drm_crtc.h: * Hence drivers must not consult @active in their various * &drm_mode_config_funcs.atomic_check callback to reject an atomic * commit. atomic_remove_fb disables the CRTC as needed for disabling the primary plane. This prevents at least the following problems if the primary plane gets disabled (e.g. due to destroying the FB assigned to the primary plane, as happens e.g. with mutter in Wayland mode): * The legacy cursor ioctl returned EINVAL for a non-0 cursor FB ID (which enables the cursor plane). * If the cursor plane was enabled, changing the legacy DPMS property value from off to on returned EINVAL. v2: * Minor changes to code comment and commit log, per review feedback. GitLab: https://gitlab.gnome.org/GNOME/mutter/-/issues/1108 GitLab: https://gitlab.gnome.org/GNOME/mutter/-/issues/1165 GitLab: https://gitlab.gnome.org/GNOME/mutter/-/issues/1344 Suggested-by: Daniel Vetter Acked-by: Daniel Vetter Reviewed-by: Nicholas Kazlauskas Signed-off-by: Michel Dänzer Signed-off-by: Alex Deucher --- .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 32 ++++++------------- 1 file changed, 10 insertions(+), 22 deletions(-) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index b51c527a3f0d..4ba8b54a2695 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -5278,19 +5278,6 @@ static void dm_crtc_helper_disable(struct drm_crtc *crtc) { } -static bool does_crtc_have_active_cursor(struct drm_crtc_state *new_crtc_state) -{ - struct drm_device *dev = new_crtc_state->crtc->dev; - struct drm_plane *plane; - - drm_for_each_plane_mask(plane, dev, new_crtc_state->plane_mask) { - if (plane->type == DRM_PLANE_TYPE_CURSOR) - return true; - } - - return false; -} - static int count_crtc_active_planes(struct drm_crtc_state *new_crtc_state) { struct drm_atomic_state *state = new_crtc_state->state; @@ -5354,19 +5341,20 @@ static int dm_crtc_helper_atomic_check(struct drm_crtc *crtc, return ret; } + /* + * We require the primary plane to be enabled whenever the CRTC is, otherwise + * drm_mode_cursor_universal may end up trying to enable the cursor plane while all other + * planes are disabled, which is not supported by the hardware. And there is legacy + * userspace which stops using the HW cursor altogether in response to the resulting EINVAL. + */ + if (state->enable && + !(state->plane_mask & drm_plane_mask(crtc->primary))) + return -EINVAL; + /* In some use cases, like reset, no stream is attached */ if (!dm_crtc_state->stream) return 0; - /* - * We want at least one hardware plane enabled to use - * the stream with a cursor enabled. - */ - if (state->enable && state->active && - does_crtc_have_active_cursor(state) && - dm_crtc_state->active_planes == 0) - return -EINVAL; - if (dc_validate_stream(dc, dm_crtc_state->stream) == DC_OK) return 0; From e60c27f1ffc733e729319662f75419f4d4fb6a80 Mon Sep 17 00:00:00 2001 From: Jiansong Chen Date: Wed, 16 Sep 2020 19:17:20 +0800 Subject: [PATCH 09/10] drm/amdgpu: declare ta firmware for navy_flounder The firmware provided via MODULE_FIRMWARE appears in the module information. External tools(eg. dracut) may use the list of fw files to include them as appropriate in an initramfs, thus missing declaration will lead to request firmware failure in boot time. Signed-off-by: Jiansong Chen Reviewed-by: Tianci Yin Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/psp_v11_0.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c b/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c index e16874f30d5d..6c5d9612abcb 100644 --- a/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c +++ b/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c @@ -58,7 +58,7 @@ MODULE_FIRMWARE("amdgpu/arcturus_ta.bin"); MODULE_FIRMWARE("amdgpu/sienna_cichlid_sos.bin"); MODULE_FIRMWARE("amdgpu/sienna_cichlid_ta.bin"); MODULE_FIRMWARE("amdgpu/navy_flounder_sos.bin"); -MODULE_FIRMWARE("amdgpu/navy_flounder_asd.bin"); +MODULE_FIRMWARE("amdgpu/navy_flounder_ta.bin"); /* address block */ #define smnMP1_FIRMWARE_FLAGS 0x3010024 From 875d369d8f75275d30e59421602d9366426abff7 Mon Sep 17 00:00:00 2001 From: Bhawanpreet Lakha Date: Tue, 15 Sep 2020 17:26:29 -0400 Subject: [PATCH 10/10] drm/amd/display: Don't log hdcp module warnings in dmesg [Why] DTM topology updates happens by default now. This results in DTM warnings when hdcp is not even being enabled. This spams the dmesg and doesn't effect normal display functionality so it is better to log it using DRM_DEBUG_KMS() [How] Change the DRM_WARN() to DRM_DEBUG_KMS() Signed-off-by: Bhawanpreet Lakha Acked-by: Alex Deucher Reviewed-by: Rodrigo Siqueira Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/display/modules/hdcp/hdcp_log.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/display/modules/hdcp/hdcp_log.h b/drivers/gpu/drm/amd/display/modules/hdcp/hdcp_log.h index d3192b9d0c3d..47f8ee2832ff 100644 --- a/drivers/gpu/drm/amd/display/modules/hdcp/hdcp_log.h +++ b/drivers/gpu/drm/amd/display/modules/hdcp/hdcp_log.h @@ -27,7 +27,7 @@ #define MOD_HDCP_LOG_H_ #ifdef CONFIG_DRM_AMD_DC_HDCP -#define HDCP_LOG_ERR(hdcp, ...) DRM_WARN(__VA_ARGS__) +#define HDCP_LOG_ERR(hdcp, ...) DRM_DEBUG_KMS(__VA_ARGS__) #define HDCP_LOG_VER(hdcp, ...) DRM_DEBUG_KMS(__VA_ARGS__) #define HDCP_LOG_FSM(hdcp, ...) DRM_DEBUG_KMS(__VA_ARGS__) #define HDCP_LOG_TOP(hdcp, ...) pr_debug("[HDCP_TOP]:"__VA_ARGS__)