iv/linux

Author	SHA1	Message	Date
Roman Li	cf8b92a756	drm/amd/display: fix potential gpu reset deadlock [Why] In gpu reset dc_lock acquired in dm_suspend(). Asynchronously handle_hpd_rx_irq can also be called through amdgpu_dm_irq_suspend->flush_work, which also tries to acquire dc_lock. That causes a deadlock. [How] Check if amdgpu executing reset before acquiring dc_lock. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Signed-off-by: Roman Li <Roman.Li@amd.com> Reviewed-by: Qingqing Zhuo <Qingqing.Zhuo@amd.com> Acked-by: Wayne Lin <Wayne.Lin@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-10 18:06:44 -04:00
Felix Kuehling	2e4ec25162	drm/amdkfd: Make svm_migrate_put_sys_page static This function is only used in this source file. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-10 18:06:44 -04:00
Nirmoy Das	b617207e80	drm/amdgpu: remove excess function parameter Fix below htmldocs build warning: "warning: Excess function parameter 'vm_context' description in 'amdgpu_vm_init'" Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-10 18:06:44 -04:00
Philip Yang	1704ac8e43	drm/amdkfd: flush TLB after updating GPU page table To workaround the situation that vm retry fault keep coming after page table update. We are investigating the root cause, but once this issue happens, application will stuck and sometimes have to reboot to recover. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-10 18:06:44 -04:00
Peng Ju Zhou	589bb0ca47	drm/amdgpu: Rename the flags to eliminate ambiguity v2 The flags vf_reg_access_* may cause confusion, rename the flags to make it more clear. Signed-off-by: Peng Ju Zhou <PengJu.Zhou@amd.com> Reviewed-by: Emily Deng <Emily.Deng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-10 18:06:44 -04:00
Evan Quan	a1b6aa4947	drm/amdgpu: add new MC firmware for Polaris12 32bit ASIC Polaris12 32bit ASIC needs a special MC firmware. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-10 18:06:43 -04:00
Zhigang Luo	e7de0d844e	drm/amdgpu: Add Aldebaran virtualization support 1. add Aldebaran in virtualization detection list. 2. disable Aldebaran virtual display support as there is no GFX engine in Aldebaran. 3. skip TMR loading if Aldebaran is in virtualization mode as it shares the one host loaded. Signed-off-by: Zhigang Luo <zhigang.luo@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-10 18:06:43 -04:00
Zhigang Luo	cecd91b4f7	drm/amdkfd: Add Aldebaran virtualization support update kfd_supported_devices to enable Aldebaran virtualization support Signed-off-by: Zhigang Luo <zhigang.luo@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-10 18:06:43 -04:00
Zhigang Luo	838eb73c8d	drm/amdgpu: Add a new device ID for Aldebaran It is Aldebaran VF device ID, for virtualization support. Signed-off-by: Zhigang Luo <zhigang.luo@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-10 18:06:43 -04:00
Jonathan Kim	559f418ed6	drm/amdkfd: report the numa weight between host and device over xgmi GPUs connected to CPUs over xGMI are bidirectional so set weight by a single hop both ways. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Tested-by: Ramesh Errabolu <ramesh.errabolu@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-10 18:06:43 -04:00
Jonathan Kim	deb689832f	drm/amdkfd: report atomics support in io_links over xgmi Link atomics support over xGMI should be reported independently of PCIe. Do not set NO_ATOMICS flags on devices that support xGMI but that do not have atomics support over PCIe. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Tested-by: Ramesh Errabolu <ramesh.errabolu@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-10 18:06:43 -04:00
Wan Jiabing	d1dfd370c3	drm/amd/display: Remove duplicate declaration of dc_state There are two declarations of struct dc_state here. Remove the later duplicate more secure. Signed-off-by: Wan Jiabing <wanjiabing@vivo.com> Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-10 18:06:43 -04:00
Wan Jiabing	4034fba138	drm/amd/display: Remove duplicate include of hubp.h In commit 482812d56698e ("drm/amd/display: Set max TTU on DPG enable"), "hubp.h" was added which caused the duplicate include. To be on the safe side, remove the later duplicate include. Signed-off-by: Wan Jiabing <wanjiabing@vivo.com> Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-10 18:06:43 -04:00
Rodrigo Siqueira	ddab8bd788	drm/amd/display: Fix two cursor duplication when using overlay Our driver supports overlay planes, and as expected, some userspace compositor takes advantage of these features. If the userspace is not enabling the cursor, they can use multiple planes as they please. Nevertheless, we start to have constraints when userspace tries to enable hardware cursor with various planes. Basically, we cannot draw the cursor at the same size and position on two separated pipes since it uses extra bandwidth and DML only run with one cursor. For those reasons, when we enable hardware cursor and multiple planes, our driver should accept variations like the ones described below: [ASCII diagram of accepted Primary/Overlay plane layouts omitted] In this scenario, we can have the desktop UI in the overlay and some other framebuffer attached to the primary plane (e.g., video). However, userspace needs to obey some rules and avoid scenarios like the ones described below (when enabling hw cursor): [ASCII diagram of rejected Primary/Overlay plane layouts omitted] If the userspace violates some of the above scenarios, our driver needs to reject the commit; otherwise, we can have unexpected behavior. Since we don't have a proper driver validation for the above case, we can see some problems like a duplicate cursor in applications that use multiple planes.
This commit fixes the cursor issue and others by adding adequate verification for multiple planes. Change since V1 (Harry and Sean): - Remove cursor verification from the equation. Cc: Louis Li <Ching-shih.Li@amd.com> Cc: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com> Cc: Harry Wentland <Harry.Wentland@amd.com> Cc: Hersen Wu <hersenxs.wu@amd.com> Cc: Sean Paul <seanpaul@chromium.org> Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-10 18:06:43 -04:00
Hawking Zhang	1f6e8eb153	drm/amdgpu: enable gfx ras in aldebran by default gfx ras now can be enabled by default in aldebaran Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: John Clements <John.Clements@amd.com> Reviewed-by: Dennis Li <Dennis.Li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-10 18:06:43 -04:00
Hawking Zhang	9adaac6eb4	drm/amdgpu: switch to mmhub ras callback for ras fini invoke callback function for mmhub ras fini Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: John Clements <John.Clements@amd.com> Reviewed-by: Dennis Li <Dennis.Li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-10 18:06:43 -04:00
Hawking Zhang	8e17ddc2e2	drm/amdgpu: retired reset_ras_error_count from hdp callbacks It was moved to hdp ras callbacks Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: John Clements <John.Clements@amd.com> Reviewed-by: Dennis Li <Dennis.Li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-10 18:06:43 -04:00
Hawking Zhang	78871b6c8b	drm/amdgpu: enable ras error count query and reset for HDP add hdp block ras error query and reset support in amdgpu ras error count query and reset interface Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: John Clements <John.Clements@amd.com> Reviewed-by: Dennis Li <Dennis.Li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-10 18:06:43 -04:00
Hawking Zhang	7c63694eb9	drm/amdgpu: init/fini hdp v4_0 ras invoke hdp v4_0 ras init in gmc late_init phase while ras fini in gmc sw_fini phase Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: John Clements <John.Clements@amd.com> Reviewed-by: Dennis Li <Dennis.Li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-10 18:06:43 -04:00
Hawking Zhang	6f12507fad	drm/amdgpu: initialize hdp v4_0 ras functions hdp v4_0 support ras features Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: John Clements <John.Clements@amd.com> Reviewed-by: Dennis Li <Dennis.Li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-10 18:06:43 -04:00
Hawking Zhang	ca81b26d21	drm/amdgpu: implement hdp v4_0 ras functions implement hdp v4_0 ras functions, including ras init/fini, query/reset_error_counter Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: John Clements <John.Clements@amd.com> Reviewed-by: Dennis Li <Dennis.Li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-10 18:06:43 -04:00
Hawking Zhang	b11625f56f	drm/amdgpu: add helpers for hdp ras init/fini hdp ras init/fini are common functions that can be shared among hdp generations Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: John Clements <John.Clements@amd.com> Reviewed-by: Dennis Li <Dennis.Li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-10 18:06:43 -04:00
Hawking Zhang	8f4a92937b	drm/amdgpu: add hdp ras structures centralize all hdp ras operation to ras_funcs Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: John Clements <John.Clements@amd.com> Reviewed-by: Dennis Li <Dennis.Li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-10 18:06:43 -04:00
Simon Ser	b44cdca7fd	amdgpu: fix GEM obj leak in amdgpu_display_user_framebuffer_create This error code-path is missing a drm_gem_object_put call. Other error code-paths are fine. Signed-off-by: Simon Ser <contact@emersion.fr> Fixes: 1769152ac64b ("drm/amdgpu: Fail fb creation from imported dma-bufs. (v2)") Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Harry Wentland <hwentlan@amd.com> Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Cc: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-10 18:06:42 -04:00
Guenter Roeck	5760dcb953	drm/amd/display: Fix build warnings Fix the following build warnings. drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c: In function ‘dm_update_mst_vcpi_slots_for_dsc’: drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:6242:46: warning: variable ‘old_con_state’ set but not used drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c: In function ‘amdgpu_dm_commit_cursors’: drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:7709:44: warning: variable ‘new_plane_state’ set but not used The variables were introduced to be used in iterators, but not used. Use other iterators which don't require the unused variables. Fixes: 8ad278062de4e ("drm/amd/display: Disable cursors before disabling planes") Fixes: 29b9ba74f6384 ("drm/amd/display: Recalculate VCPI slots for new DSC connectors") Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-10 18:06:42 -04:00
Fabio M. De Francesco	1fdbbc123f	drm/amd/amdgpu: Fix errors in documentation of function parameters In the function documentation, I removed the excess parameters, described the undocumented ones, and fixed the syntax errors. Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-10 18:06:42 -04:00
Alex Deucher	a273f315b9	drm/amdgpu/display: add documentation for dmcub_trace_event_en Was missing when this structure was updated. Fixes: 46a83eba276cd3 ("drm/amd/display: Add debugfs to control DMUB trace buffer events") Reviewed-by: Leo (Hanghong) Ma <hanghong.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-10 18:06:42 -04:00
Fabio M. De Francesco	d477eb1719	drm/amd/pm/powerplay/hwmgr: Fix kernel-doc syntax in documentation Fixed kernel-doc syntax errors in documentation of functions. Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-10 18:06:42 -04:00
Kai-Heng Feng	440d8774ef	drm/amdgpu: Register VGA clients after init can no longer fail When an amdgpu device fails to init, it makes another VGA device cause kernel splat: kernel: amdgpu 0000:08:00.0: amdgpu: amdgpu_device_ip_init failed kernel: amdgpu 0000:08:00.0: amdgpu: Fatal error during GPU init kernel: amdgpu: probe of 0000:08:00.0 failed with error -110 ... kernel: amdgpu 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none kernel: BUG: kernel NULL pointer dereference, address: 0000000000000018 kernel: #PF: supervisor read access in kernel mode kernel: #PF: error_code(0x0000) - not-present page kernel: PGD 0 P4D 0 kernel: Oops: 0000 [#1] SMP NOPTI kernel: CPU: 6 PID: 1080 Comm: Xorg Tainted: G W 5.12.0-rc8+ #12 kernel: Hardware name: HP HP EliteDesk 805 G6/872B, BIOS S09 Ver. 02.02.00 12/30/2020 kernel: RIP: 0010:amdgpu_device_vga_set_decode+0x13/0x30 [amdgpu] kernel: Code: 06 31 c0 c3 b8 ea ff ff ff 5d c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00 55 48 8b 87 90 06 00 00 48 89 e5 53 89 f3 <48> 8b 40 18 40 0f b6 f6 e8 40 58 39 fd 80 fb 01 5b 5d 19 c0 83 e0 kernel: RSP: 0018:ffffae3c0246bd68 EFLAGS: 00010002 kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 kernel: RDX: ffff8dd1af5a8560 RSI: 0000000000000000 RDI: ffff8dce8c160000 kernel: RBP: ffffae3c0246bd70 R08: ffff8dd1af5985c0 R09: ffffae3c0246ba38 kernel: R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000246 kernel: R13: 0000000000000000 R14: 0000000000000003 R15: ffff8dce81490000 kernel: FS: 00007f9303d8fa40(0000) GS:ffff8dd1af580000(0000) knlGS:0000000000000000 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 kernel: CR2: 0000000000000018 CR3: 0000000103cfa000 CR4: 0000000000350ee0 kernel: Call Trace: kernel: vga_arbiter_notify_clients.part.0+0x4a/0x80 kernel: vga_get+0x17f/0x1c0 kernel: vga_arb_write+0x121/0x6a0 kernel: ? apparmor_file_permission+0x1c/0x20 kernel: ? 
security_file_permission+0x30/0x180 kernel: vfs_write+0xca/0x280 kernel: ksys_write+0x67/0xe0 kernel: __x64_sys_write+0x1a/0x20 kernel: do_syscall_64+0x38/0x90 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae kernel: RIP: 0033:0x7f93041e02f7 kernel: Code: 75 05 48 83 c4 58 c3 e8 f7 33 ff ff 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24 kernel: RSP: 002b:00007fff60e49b28 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 kernel: RAX: ffffffffffffffda RBX: 000000000000000b RCX: 00007f93041e02f7 kernel: RDX: 000000000000000b RSI: 00007fff60e49b40 RDI: 000000000000000f kernel: RBP: 00007fff60e49b40 R08: 00000000ffffffff R09: 00007fff60e499d0 kernel: R10: 00007f93049350b5 R11: 0000000000000246 R12: 000056111d45e808 kernel: R13: 0000000000000000 R14: 000056111d45e7f8 R15: 000056111d46c980 kernel: Modules linked in: nls_iso8859_1 snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_seq input_leds snd_seq_device snd_timer snd soundcore joydev kvm_amd serio_raw k10temp mac_hid hp_wmi ccp kvm sparse_keymap wmi_bmof ucsi_acpi efi_pstore typec_ucsi rapl typec video wmi sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx libcrc32c xor raid6_pq raid1 raid0 multipath linear dm_mirror dm_region_hash dm_log hid_generic usbhid hid amdgpu drm_ttm_helper ttm iommu_v2 gpu_sched i2c_algo_bit drm_kms_helper syscopyarea sysfillrect crct10dif_pclmul sysimgblt crc32_pclmul fb_sys_fops ghash_clmulni_intel cec rc_core aesni_intel crypto_simd psmouse cryptd r8169 i2c_piix4 drm ahci xhci_pci realtek libahci xhci_pci_renesas gpio_amdpt gpio_generic kernel: CR2: 0000000000000018 kernel: ---[ end trace 76d04313d4214c51 ]--- Commit 4192f7b57689 ("drm/amdgpu: unmap 
register bar on device init failure") makes amdgpu_driver_unload_kms() skips amdgpu_device_fini(), so the VGA clients remain registered. So when vga_arbiter_notify_clients() iterates over registered clients, it causes NULL pointer dereference. Since there's no reason to register VGA clients that early, so solve the issue by putting them after all the goto cleanups. v2: - Remove redundant vga_switcheroo cleanup in failed: label. Fixes: 4192f7b57689 ("drm/amdgpu: unmap register bar on device init failure") Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-10 18:06:42 -04:00
Pavan Kumar Ramayanam	8e4d5d43cc	drm/amdgpu: Handling of amdgpu_device_resume return value for graceful teardown The runtime resume PM op disregards the return value from amdgpu_device_resume(), masking errors for failed resumes at the PM layer. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Pavan Kumar Ramayanam <pavan.ramayanam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-10 18:06:42 -04:00
Victor Zhao	db7f1e0140	drm/amdgpu: fix r initial values Sriov gets suspend of IP block <dce_virtual> failed as return value was not initialized. v2: return 0 directly to align original code semantic before this was broken out into a separate helper function instead of setting initial values Signed-off-by: Victor Zhao <Victor.Zhao@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-05-10 18:06:42 -04:00
Dennis Li	0e0036c7d1	drm/amdgpu: fix no full coverage issue for gprs initialization The wave's number per simd in aldebaran is changed to 8, so it is impossible to use old algorithm to initiate all sgprs with one threadgroup. The new algorithm firstly use three threadgroups to initiate most sgprs simultaneously and then use another threadgroup with 4 waves to cover other uninitiated sgprs. v2: Add more description about the new algorithm to clear sgprs and add some comment for shader binaries Signed-off-by: Dennis Li <Dennis.Li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-28 23:36:05 -04:00
Harish Kasiviswanathan	8baa6018b7	drm/amdkfd: Add Aldebaran gws support v2: updated MEC FW version after validating gws with debugger Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Reviewed-by: Joseph Greathouse <Joseph.Greathouse@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-28 23:36:05 -04:00
Philip Yang	b3dc91f973	drm/amdkfd: enable subsequent retry fault After draining the stale retry fault, or failed to validate the range to recover, have to remove the fault address from fault filter ring, to be able to handle subsequent retry interrupt on same address. Otherwise the retry fault will not be processed to recover until timeout passed. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-28 23:36:05 -04:00
Philip Yang	36255b5f61	drm/amdgpu: address remove from fault filter Add interface to remove address from fault filter ring by resetting fault ring entry key, then future vm fault on the address will be processed to recover. Define fault key as atomic64_t type to use atomic read/set/cmpxchg key to protect fault ring access by interrupt handler and interrupt deferred work for vg20. Change fault->timestamp to 48-bit to share same uint64_t with 8-bit fault->next, it is enough for 48bit IH timestamp. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-28 23:36:05 -04:00
Philip Yang	373e3ccd85	drm/amdkfd: handle stale retry fault Retry fault interrupt maybe pending in IH ring after GPU page table is updated to recover the vm fault, because each page of the range generate retry fault interrupt. There is race if application unmap range to remove and free the range first and then retry fault work restore_pages handle the retry fault interrupt, because range can not be found, this vm fault can not be recovered and report incorrect GPU vm fault to application. Before unmap to remove and free range, drain retry fault interrupt from IH ring1 to ensure no retry fault comes after the range is removed. Drain retry fault interrupt skip the range which is on deferred list to remove, or the range is child range, which is split by unmap, does not add to svms and have interval notifier. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-28 23:36:05 -04:00
Philip Yang	11dd55d174	drm/amdgpu: return IH ring drain finished if ring is empty Sometimes IH do not setup ring wptr overflow flag after wptr exceed rptr. As a workaround, if IH rptr equals to wptr, ring is empty, return true to indicate IH ring checkpoint is processed, IH ring drain is finished. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-28 23:36:05 -04:00
Philip Yang	4999e398e2	drm/amdkfd: retry validation to recover range GPU vm retry fault recover range need retry validation if 1. range is split in parallel by unmap while recover 2. range migrate to system memory and range is updated in system memory while recover Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-28 23:36:05 -04:00
Jonathan Kim	c3c5cc9a83	drm/amdkfd: fix spelling mistake in packet manager The plural of 'process' should be 'processes'. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-28 23:36:05 -04:00
Jack Zhang	95ea3dbc4e	drm/amd/amdgpu/sriov disable all ip hw status by default Disable all ip's hw status to false before any hw_init. Only set it to true until its hw_init is executed. The old 5.9 branch has this change but somehow the 5.11 kernel does not have this fix. Without this change, sriov tdr have gfx IB test fail. Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com> Review-by: Emily Deng <Emily.Deng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-28 23:36:05 -04:00
Christian König	dd03daec0f	drm/amdgpu: restructure amdgpu_vram_mgr_new Merge the two loops, loosen the restriction for big allocations. This reduces the CPU overhead in the good case, but increases it a bit under memory pressure. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-and-Tested-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-28 23:36:05 -04:00
Philip Yang	c0f76fc8ad	drm/amdkfd: fix double free device pgmap resource Use devm_memunmap_pages instead of memunmap_pages to release pgmap and remove pgmap from device action, to avoid double free pgmap when unloading driver module. Release device memory region if failed to create device memory pages structure. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-28 23:36:04 -04:00
Colin Ian King	dd57e65f7c	drm/amdkfd: Fix spelling mistake "unregisterd" -> "unregistered" There is a spelling mistake in a pr_debug message. Fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-28 23:36:04 -04:00
Feifei Xu	5d11699914	drm/amdgpu: Correct and simplify sdma 4.x irq.num_types Correct and init the sdma4.x irq.num_types. v2: squash in fix (Alex) Signed-off-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-28 23:36:04 -04:00
Feifei Xu	3d2bee9188	drm/amdgpu: Change the sdma interrupt print level Change the print level into debug. Signed-off-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-28 23:35:51 -04:00
Victor Zhao	041e69160d	drm/amdgpu/sriov: Remove clear vf fw support PSP clear_vf_fw feature is outdated and has been removed. Remove the related functions. Signed-off-by: Victor Zhao <Victor.Zhao@amd.com> Reviewed-by: Emily Deng <Emily.Deng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-28 23:35:51 -04:00
Aric Cyr	18fa44625c	drm/amd/display: 3.2.133 Signed-off-by: Aric Cyr <aric.cyr@amd.com> Acked-by: Wayne Lin <waynelin@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-28 23:35:51 -04:00
Anthony Koo	8167538ffb	drm/amd/display: [FW Promotion] Release 0.0.63 Signed-off-by: Anthony Koo <Anthony.Koo@amd.com> Acked-by: Wayne Lin <waynelin@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-28 23:35:50 -04:00
Max.Tseng	069a11cca5	drm/amd/display: Add SE_DCN3_REG_LIST for control SDP num [Why] New platform. Need to add corresponding register control Signed-off-by: Max.Tseng <Max.Tseng@amd.com> Reviewed-by: Anthony Koo <Anthony.Koo@amd.com> Acked-by: Wayne Lin <waynelin@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-28 23:35:50 -04:00
Yu-ting Shen	088bebc79e	drm/amd/display: avoid to authentication when DEVICE_COUNT=0 [why] we don't support authentication with DEVICE_COUNT=0 [how] check value DEVICE_COUNT before doing authentication Signed-off-by: Yu-ting Shen <Yu-ting.Shen@amd.com> Reviewed-by: Wenjing Liu <Wenjing.Liu@amd.com> Acked-by: Wayne Lin <waynelin@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-28 23:35:50 -04:00


999918 Commits