Commit Graph

11904 Commits

Author SHA1 Message Date
6ce2ea07c5 drm/amdgpu/soc21: Add video cap query support for VCN_4_0_4
Added the video capability query support for VCN version 4_0_4

Signed-off-by: Veerabadhran Gopalakrishnan <veerabadhran.gopalakrishnan@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.1.x
2023-03-09 22:06:19 -05:00
b42fee5e0b drm/amdgpu: fix error checking in amdgpu_read_mm_registers for nv
Properly skip non-existent registers as well.

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2442
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2023-03-09 22:06:19 -05:00
2915e43a03 drm/amdgpu: fix error checking in amdgpu_read_mm_registers for soc21
Properly skip non-existent registers as well.

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2442
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2023-03-09 22:06:19 -05:00
0dcdf8498e drm/amdgpu: fix error checking in amdgpu_read_mm_registers for soc15
Properly skip non-existent registers as well.

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2442
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2023-03-09 22:06:19 -05:00
8879ec6dfd drm/amdgpu: Fix the warning info when removing amdgpu device
Actually, the drm_dev_enter in psp_cmd_submit_buf does not
protect anything. If DRM device is unplugged, it will always
check the condition in WARN_ON. So drop drm_dev_enter and
drm_dev_exit in psp_cmd_submit_buf.

When removing amdgpu, the calling order is as follows:
amdgpu_pci_remove
    drm_dev_unplug
    amdgpu_driver_unload_kms
        amdgpu_device_fini_hw
            amdgpu_device_ip_fini_early
                psp_hw_fini
                    psp_ras_terminate
                        psp_ta_unloadye
                            psp_cmd_submit_buf

[ 4507.740388] Call Trace:
[ 4507.740389]  <TASK>
[ 4507.740391]  psp_ta_unload+0x44/0x70 [amdgpu]
[ 4507.740485]  psp_ras_terminate+0x4d/0x70 [amdgpu]
[ 4507.740575]  psp_hw_fini+0x28/0xa0 [amdgpu]
[ 4507.740662]  amdgpu_device_fini_hw+0x328/0x442 [amdgpu]
[ 4507.740791]  amdgpu_driver_unload_kms+0x51/0x60 [amdgpu]
[ 4507.740875]  amdgpu_pci_remove+0x5a/0x140 [amdgpu]
[ 4507.740962]  ? _raw_spin_unlock_irqrestore+0x27/0x43
[ 4507.740965]  ? __pm_runtime_resume+0x60/0x90
[ 4507.740968]  pci_device_remove+0x39/0xb0
[ 4507.740971]  device_remove+0x46/0x70
[ 4507.740972]  device_release_driver_internal+0xd1/0x160
[ 4507.740974]  driver_detach+0x4a/0x90
[ 4507.740975]  bus_remove_driver+0x6c/0xf0
[ 4507.740976]  driver_unregister+0x31/0x50
[ 4507.740977]  pci_unregister_driver+0x40/0x90
[ 4507.740978]  amdgpu_exit+0x15/0x120 [amdgpu]

v2: fix commit message style issue

Signed-off-by: lyndonli <Lyndon.Li@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-09 22:06:19 -05:00
1717cc5f29 drm/amd: Fix initialization mistake for NBIO 7.3.0
The same strapping initialization issue that happened on NBIO 7.5.1
appears to be happening on NBIO 7.3.0.
Apply the same fix to 7.3.0 as well.

Note: This workaround relies upon the integrated GPU being enabled
in BIOS. If the integrated GPU is disabled in BIOS a different
workaround will be required.

Reported-by: Thomas Glanzmann <thomas@glanzmann.de>
Cc: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
Link: https://lore.kernel.org/linux-usb/Y%2Fz9GdHjPyF2rNG3@glanzmann.de/T/#u
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-09 22:06:19 -05:00
93bb18d2a8 drm/amdgpu: Fix call trace warning and hang when removing amdgpu device
On GPUs with RAS enabled, below call trace and hang are observed when
shutting down device.

v2: use DRM device unplugged flag instead of shutdown flag as the check to
prevent memory wipe in shutdown stage.

[ +0.000000] RIP: 0010:amdgpu_vram_mgr_fini+0x18d/0x1c0 [amdgpu]
[ +0.000001] PKRU: 55555554
[ +0.000001] Call Trace:
[ +0.000001] <TASK>
[ +0.000002] amdgpu_ttm_fini+0x140/0x1c0 [amdgpu]
[ +0.000183] amdgpu_bo_fini+0x27/0xa0 [amdgpu]
[ +0.000184] gmc_v11_0_sw_fini+0x2b/0x40 [amdgpu]
[ +0.000163] amdgpu_device_fini_sw+0xb6/0x510 [amdgpu]
[ +0.000152] amdgpu_driver_release_kms+0x16/0x30 [amdgpu]
[ +0.000090] drm_dev_release+0x28/0x50 [drm]
[ +0.000016] devm_drm_dev_init_release+0x38/0x60 [drm]
[ +0.000011] devm_action_release+0x15/0x20
[ +0.000003] release_nodes+0x40/0xc0
[ +0.000001] devres_release_all+0x9e/0xe0
[ +0.000001] device_unbind_cleanup+0x12/0x80
[ +0.000003] device_release_driver_internal+0xff/0x160
[ +0.000001] driver_detach+0x4a/0x90
[ +0.000001] bus_remove_driver+0x6c/0xf0
[ +0.000001] driver_unregister+0x31/0x50
[ +0.000001] pci_unregister_driver+0x40/0x90
[ +0.000003] amdgpu_exit+0x15/0x120 [amdgpu]

Signed-off-by: lyndonli <Lyndon.Li@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-09 22:06:19 -05:00
06630fb9fc drm/amdgpu: Support umc node harvest config on umc v8_10
Don't need to query error count and error address on harvest umc nodes.
v2: Fix code bug, use active_mask instead of harvsest_config
    and remove unnecessary argument in LOOP macro.
v3: Leave adev->gmc.num_umc unchanged.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-07 15:55:58 -05:00
2eb29d59dd Merge tag 'drm-next-2023-03-03-1' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
 "fbdev:
   - fix uninit var in error path

  shmem:
   - revert unGPLing an export

  i915:
   - Don't use stolen memory or BAR mappings for ring buffers with LLC
   - Add inverted backlight quirk for HP 14-r206nv
   - Fix GSI offset for MCR lookups
   - GVT fixes (memleak, debugfs attributes, kconfig, typos)

  amdgpu:
   - SMU 13 fixes
   - Enable TMZ for GC 10.3.6
   - Misc display fixes
   - Buddy allocator fixes
   - GC 11 fixes
   - S0ix fix
   - INFO IOCTL queries for GC 11
   - VCN harvest fixes for SR-IOV
   - UMC 8.10 RAS fixes
   - Don't restrict bpc to 8
   - NBIO 7.5 fix
   - Allow freesync on PCon for more devices

  amdkfd:
   - SDMA fix
   - Illegal memory access fix"

* tag 'drm-next-2023-03-03-1' of git://anongit.freedesktop.org/drm/drm: (45 commits)
  drm/amdgpu/vcn: fix compilation issue with legacy gcc
  drm/amd/display: Extend Freesync over PCon support for more devices
  Revert "drm/amd/display: Do not set DRR on pipe commit"
  drm/amd/display: fix shift-out-of-bounds in CalculateVMAndRowBytes
  drm/amd/display: Ext displays with dock can't recognized after resume
  drm/amdgpu: fix ttm_bo calltrace warning in psp_hw_fini
  drm/amdgpu: remove unused variable ring
  drm/amd/display: fix dm irq error message in gpu recover
  drm/amd: Fix initialization for nbio 7.5.1
  drm/amd/display: Don't restrict bpc to 8 bpc
  drm/amdgpu: Make umc_v8_10_convert_error_address static and remove unused variable
  drm/radeon: Fix eDP for single-display iMac11,2
  drm/shmem-helper: Revert accidental non-GPL export
  drm: omapdrm: Do not use helper unininitialized in omap_fbdev_init()
  drm/amd/pm: downgrade log level upon SMU IF version mismatch
  drm/amdgpu: Add ecc info query interface for umc v8_10
  drm/amdgpu: Add convert_error_address function for umc v8_10
  drm/amdgpu: add bad_page_threshold check in ras_eeprom_check_err
  drm/amdgpu: change default behavior of bad_page_threshold parameter
  drm/amdgpu: exclude duplicate pages from UMC RAS UE count
  ...
2023-03-02 15:08:54 -08:00
6bb811d0ee drm/amdgpu/vcn: fix compilation issue with legacy gcc
This patch is used to fix following compilation issue with legacy gcc
error: ‘for’ loop initial declarations are only allowed in C99 mode
	for (int i = 0; i < adev->vcn.num_vcn_inst; ++i) {

Signed-off-by: bobzhou <bob.zhou@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-01 22:45:08 -05:00
23f4a2d29b drm/amdgpu: fix ttm_bo calltrace warning in psp_hw_fini
The call trace occurs when the amdgpu is removed after
the mode1 reset. During mode1 reset, from suspend to resume,
there is no need to reinitialize the ta firmware buffer
which caused the bo pin_count increase redundantly.

[  489.885525] Call Trace:
[  489.885525]  <TASK>
[  489.885526]  amdttm_bo_put+0x34/0x50 [amdttm]
[  489.885529]  amdgpu_bo_free_kernel+0xe8/0x130 [amdgpu]
[  489.885620]  psp_free_shared_bufs+0xb7/0x150 [amdgpu]
[  489.885720]  psp_hw_fini+0xce/0x170 [amdgpu]
[  489.885815]  amdgpu_device_fini_hw+0x2ff/0x413 [amdgpu]
[  489.885960]  ? blocking_notifier_chain_unregister+0x56/0xb0
[  489.885962]  amdgpu_driver_unload_kms+0x51/0x60 [amdgpu]
[  489.886049]  amdgpu_pci_remove+0x5a/0x140 [amdgpu]
[  489.886132]  ? __pm_runtime_resume+0x60/0x90
[  489.886134]  pci_device_remove+0x3e/0xb0
[  489.886135]  __device_release_driver+0x1ab/0x2a0
[  489.886137]  driver_detach+0xf3/0x140
[  489.886138]  bus_remove_driver+0x6c/0xf0
[  489.886140]  driver_unregister+0x31/0x60
[  489.886141]  pci_unregister_driver+0x40/0x90
[  489.886142]  amdgpu_exit+0x15/0x451 [amdgpu]

Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>
Signed-off-by: longlyao <Longlong.Yao@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-01 22:41:54 -05:00
cca3306488 drm/amdgpu: remove unused variable ring
building with gcc and W=1 reports
drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c:81:29: error: variable
  ‘ring’ set but not used [-Werror=unused-but-set-variable]
   81 |         struct amdgpu_ring *ring;
      |                             ^~~~

ring is not used so remove it.

Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-01 22:40:58 -05:00
65a2400080 drm/amd: Fix initialization for nbio 7.5.1
A mistake has been made in the BIOS for some ASICs with NBIO 7.5.1
where some NBIO registers aren't properly setup.

Ensure that they're set during initialization.

Tested-by: Richard Gong <richard.gong@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.1.x
2023-03-01 22:38:22 -05:00
1bf56f2525 drm/amdgpu: Make umc_v8_10_convert_error_address static and remove unused variable
Fixes following warnings:
warning: no previous prototype for 'umc_v8_10_convert_error_address'
warning: variable 'channel_index' set but not used

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-01 22:37:55 -05:00
3822a7c409 Merge tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:

 - Daniel Verkamp has contributed a memfd series ("mm/memfd: add
   F_SEAL_EXEC") which permits the setting of the memfd execute bit at
   memfd creation time, with the option of sealing the state of the X
   bit.

 - Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset()
   thread-safe for pmd unshare") which addresses a rare race condition
   related to PMD unsharing.

 - Several folioification patch serieses from Matthew Wilcox, Vishal
   Moola, Sidhartha Kumar and Lorenzo Stoakes

 - Johannes Weiner has a series ("mm: push down lock_page_memcg()")
   which does perform some memcg maintenance and cleanup work.

 - SeongJae Park has added DAMOS filtering to DAMON, with the series
   "mm/damon/core: implement damos filter".

   These filters provide users with finer-grained control over DAMOS's
   actions. SeongJae has also done some DAMON cleanup work.

 - Kairui Song adds a series ("Clean up and fixes for swap").

 - Vernon Yang contributed the series "Clean up and refinement for maple
   tree".

 - Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series. It
   adds to MGLRU an LRU of memcgs, to improve the scalability of global
   reclaim.

 - David Hildenbrand has added some userfaultfd cleanup work in the
   series "mm: uffd-wp + change_protection() cleanups".

 - Christoph Hellwig has removed the generic_writepages() library
   function in the series "remove generic_writepages".

 - Baolin Wang has performed some maintenance on the compaction code in
   his series "Some small improvements for compaction".

 - Sidhartha Kumar is doing some maintenance work on struct page in his
   series "Get rid of tail page fields".

 - David Hildenbrand contributed some cleanup, bugfixing and
   generalization of pte management and of pte debugging in his series
   "mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with
   swap PTEs".

 - Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation
   flag in the series "Discard __GFP_ATOMIC".

 - Sergey Senozhatsky has improved zsmalloc's memory utilization with
   his series "zsmalloc: make zspage chain size configurable".

 - Joey Gouly has added prctl() support for prohibiting the creation of
   writeable+executable mappings.

   The previous BPF-based approach had shortcomings. See "mm: In-kernel
   support for memory-deny-write-execute (MDWE)".

 - Waiman Long did some kmemleak cleanup and bugfixing in the series
   "mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF".

 - T.J. Alumbaugh has contributed some MGLRU cleanup work in his series
   "mm: multi-gen LRU: improve".

 - Jiaqi Yan has provided some enhancements to our memory error
   statistics reporting, mainly by presenting the statistics on a
   per-node basis. See the series "Introduce per NUMA node memory error
   statistics".

 - Mel Gorman has a second and hopefully final shot at fixing a CPU-hog
   regression in compaction via his series "Fix excessive CPU usage
   during compaction".

 - Christoph Hellwig does some vmalloc maintenance work in the series
   "cleanup vfree and vunmap".

 - Christoph Hellwig has removed block_device_operations.rw_page() in
   ths series "remove ->rw_page".

 - We get some maple_tree improvements and cleanups in Liam Howlett's
   series "VMA tree type safety and remove __vma_adjust()".

 - Suren Baghdasaryan has done some work on the maintainability of our
   vm_flags handling in the series "introduce vm_flags modifier
   functions".

 - Some pagemap cleanup and generalization work in Mike Rapoport's
   series "mm, arch: add generic implementation of pfn_valid() for
   FLATMEM" and "fixups for generic implementation of pfn_valid()"

 - Baoquan He has done some work to make /proc/vmallocinfo and
   /proc/kcore better represent the real state of things in his series
   "mm/vmalloc.c: allow vread() to read out vm_map_ram areas".

 - Jason Gunthorpe rationalized the GUP system's interface to the rest
   of the kernel in the series "Simplify the external interface for
   GUP".

 - SeongJae Park wishes to migrate people from DAMON's debugfs interface
   over to its sysfs interface. To support this, we'll temporarily be
   printing warnings when people use the debugfs interface. See the
   series "mm/damon: deprecate DAMON debugfs interface".

 - Andrey Konovalov provided the accurately named "lib/stackdepot: fixes
   and clean-ups" series.

 - Huang Ying has provided a dramatic reduction in migration's TLB flush
   IPI rates with the series "migrate_pages(): batch TLB flushing".

 - Arnd Bergmann has some objtool fixups in "objtool warning fixes".

* tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (505 commits)
  include/linux/migrate.h: remove unneeded externs
  mm/memory_hotplug: cleanup return value handing in do_migrate_range()
  mm/uffd: fix comment in handling pte markers
  mm: change to return bool for isolate_movable_page()
  mm: hugetlb: change to return bool for isolate_hugetlb()
  mm: change to return bool for isolate_lru_page()
  mm: change to return bool for folio_isolate_lru()
  objtool: add UACCESS exceptions for __tsan_volatile_read/write
  kmsan: disable ftrace in kmsan core code
  kasan: mark addr_has_metadata __always_inline
  mm: memcontrol: rename memcg_kmem_enabled()
  sh: initialize max_mapnr
  m68k/nommu: add missing definition of ARCH_PFN_OFFSET
  mm: percpu: fix incorrect size in pcpu_obj_full_size()
  maple_tree: reduce stack usage with gcc-9 and earlier
  mm: page_alloc: call panic() when memoryless node allocation fails
  mm: multi-gen LRU: avoid futile retries
  migrate_pages: move THP/hugetlb migration support check to simplify code
  migrate_pages: batch flushing TLB
  migrate_pages: share more code between _unmap and _move
  ...
2023-02-23 17:09:35 -08:00
b1e9a718af drm/amdgpu: Add ecc info query interface for umc v8_10
Support ecc info query for umc v8_10.

v2: Simplied by convert_error_address.
v3: Remove unused variable and invalid checking.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-02-23 17:35:59 -05:00
2d53b579f3 drm/amdgpu: Add convert_error_address function for umc v8_10
Add convert_error_address for umc v8_10.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-02-23 17:35:59 -05:00
22106ed0be drm/amdgpu: add bad_page_threshold check in ras_eeprom_check_err
bad_page_threshold controls page retirement behavior and it should be
also checked.

v2: simplify the condition of bad page handling path.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-02-23 17:35:59 -05:00
f3cbe70e21 drm/amdgpu: change default behavior of bad_page_threshold parameter
Ignore ras umc bad page threshold by default, GPU initialization won't
be stopped in this mode.

v2: refine the description of bad_page_threshold.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-02-23 17:35:59 -05:00
4d33e0f134 drm/amdgpu: exclude duplicate pages from UMC RAS UE count
If a UMC bad page is reserved but not freed by an application, the
application may trigger uncorrectable error repeatly by accessing the page.

v2: add specific function to do the check.
v3: remove duplicate pages, calculate new added bad page number.
v4: reuse save_bad_pages to calculate new added bad page number.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-02-23 17:35:59 -05:00
e69c785723 drm/amdgpu: add umc retire unit element
It records how many bad pages are retired in one uncorrectable error.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-02-23 17:35:59 -05:00
b299221faf drm/amdgpu: add more fields into device info, caches sizes, etc.
AMDGPU_IDS_FLAGS_CONFORMANT_TRUNC_COORD: important for conformance on gfx11
Other fields are exposed from IP discovery.
enabled_rb_pipes_mask_hi is added for future chips, currently 0.

Mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21403

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-02-23 17:35:58 -05:00
6dcb38a19e drm/amdgpu/vcn: set and use harvest config
in early init to set harvest config if the vcn0/1 is disabled
rather than hard-code the ring attributes as before did

Signed-off-by: Jane Jian <Jane.Jian@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-02-23 17:35:58 -05:00
ca47518663 drm/amd: Don't allow s0ix on APUs older than Raven
APUs before Raven didn't support s0ix.  As we just relieved some
of the safety checks for s0ix to improve power consumption on
APUs that support it but that are missing BIOS support a new
blind spot was introduced that a user could "try" to run s0ix.

Plug this hole so that if users try to run s0ix on anything older
than Raven it will just skip suspend of the GPU.

Fixes: cf488dcd0a ("drm/amd: Allow s0ix without BIOS support")
Suggested-by: Alexander Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-02-23 17:35:58 -05:00
f9c35f4fff drm/amdgpu: fix incorrect active rb bitmap for gfx11
GFX v11 changes RB_BACKEND_DISABLE related registers
from per SA to global ones. The approach to query active
rb bitmap needs to be changed accordingly. Query per
SE setting returns wrong active RB bitmap especially
in the case when some of SA are disabled. With the new
approach, driver will generate the active rb bitmap
based on active SA bitmap and global active RB bitmap.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-02-23 17:35:58 -05:00
2866cc0961 drm/amdgpu: optimize VRAM allocation when using drm buddy
Since the VRAM manager changed from drm mm to drm buddy. It's
not necessary to allocate 2MB aligned VRAM for more than 2MB
unaligned size, and then do trim. This method improves the
allocation efficiency and reduces memory fragmentation.

v2: Correct the remainder operation

Signed-off-by: Shane Xiao <shane.xiao@amd.com>
Reviewed-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-02-23 17:35:58 -05:00
c105518679 drm/amdgpu: remove TOPDOWN flags when allocating VRAM in large bar system
Since VRAM manager is changed from drm mm to drm buddy, the
TOP_DOWN flag should not be set by default in the large bar system.
Removing this flag helps improve drm buddy allocator efficiency and
reduce the risk of splitting higher order block into lower order.

Signed-off-by: Shane Xiao <shane.xiao@amd.com>
Reviewed-by: Christian K�nig <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-02-23 17:35:58 -05:00
455ad25997 drm/amdgpu: Select DRM_DISPLAY_HDCP_HELPER in amdgpu
Keeps this selection with the rest of the DRM HELPER
selection.

Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-02-23 17:35:58 -05:00
08c6ab7fb4 drm/amdgpu: add tmz support for GC 10.3.6
this patch to add tmz support for GC 10.3.6

Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-02-23 17:35:58 -05:00
a5c95ca18a Merge tag 'drm-next-2023-02-23' of git://anongit.freedesktop.org/drm/drm
Pull drm updates from Dave Airlie:
 "There are a bunch of changes all over in the usual places.

  Highlights:

   - habanalabs moves from misc to accel

   - first accel driver for Intel VPU (Versatile Processing Unit)
     inference engine

   - dropped all the ancient legacy DRI1 drivers. I think it's been at
     least 10 years since anyone has heard about these.

   - Intel DG2 updates and prelim Meteorlake enablement

   - etnaviv adds support for Versilicon NPU device (a GPU like engine
     with inference accelerators)

  Detailed summary:

  Removals:
   - remove legacy dri1 drivers: i810, mga, r128, savage, sis, tdfx, via

  New driver:
   - intel VPU accelerator driver
   - habanalabs comes via drm tree now

  drm/core:
   - use drm_dbg_ helpers in several places
   - Document defaults for CRTC backgrounds
   - Document use of drm_minor

  edid:
   - improve mode parsing and refactoring

  connector:
   - support analog TV mode property

  media:
   - add some common formats

  udmabuf:
   - add vmap/vunmap methods

  fourcc:
   - add XRGB1555 and RGB565 formats
   - document open source user waiver

  firmware:
   - fix color-format selection for system framebuffer

  format-helper:
   - Add conversion from XRGB8888 to various sysfb formats
   - Make XRGB8888 the only driver-emulated legacy format
   - Add conversion from XRGB8888 to XBGR8888 and ABGR8888

  fb-helper:
   - fix preferred depth and bpp values across drivers
   - Avoid blank consoles from selecting an incorrect color format

  probe-helper:
   - Enable/disable HPD on connectors

  scheduler:
   - Fix lockup in drm_sched_entity_kill()
   - Deprecate drm_sched_resubmit_jobs()

  bridge:
   - remove unused functions
   - implement i2c probe_new in various drivers
   - ite-it6505: Locking fixes, Cache EDID data
   - ite-it66121: Support IT6610 chip
   - lontium-tl9611: Fix HDMI on DragonBoard 845c
   - parade-ps8640: Use atomic bridge functions
   - Support i.MX93 LDB plus DT bindings

  debugfs:
   - add per device helpers and convert drivers

  displayport:
   - mst fixes
   - add DP adaptive sync DPCD definitions

  fbdev:
   - always pick 32bpp as default
   - remove some unused code

  simpledrm:
   - support system memory framebuffers

  panel:
   - add orientation quirks for Lenovo Yoga Tab 3 X90F and DynaBook K50
   - Use ktime_get_boottime() to measure power-down delay
   - Fix auto-suspend delay
   - Visionox VTDR6130 AMOLED DSI
   - Support Himax HX8394
   - Convert many drivers to common generic DSI write-sequence helper
   - AUO A030JTN01

  ttm:
   - drop bo wait wrapper
   - fix MIPS build

  habanalabs:
   - moved driver to accel subsystem
   - gaudi2 decoder error improvement
   - more trace events
   - Gaudi2 abrupt reset by firmware support
   - add uAPI to flush memory transactions
   - add uAPI to pass through userspace reqs to fw
   - remove dma-buf export by handle

  amdgpu:
   - add new INFO queries for peak and min sclk/mclk for profile modes
   - Add PCIe info to the INFO IOCTL
   - secure display support for multiple displays
   - DML optimizations
   - DCN 3.2 updates
   - PSR updates
   - DP 2.1 updates
   - SR-IOV RAS updates
   - VCN RAS support
   - SMU 13.x updates
   - Switch 1 element arrays to flexible arrays
   - Add RAS support for DF 4.3
   - Stack size improvements
   - S0ix rework
   - Allow 0 as a vram limit on APUs
   - Handle profiling modes for SMU13.x
   - Fix possible segfault in failure case
   - Rework FW requests to happen in early_init for all IPs so that we
     don't lose the sbios console if FW is missing
   - Fix power reporting on certain firmwares for CZN/RN
   - Allow S0ix without BIOS support
   - Enable freesync over PCon
   - Re-enable the AGP aperture on GMC 11.x

  amdkfd:
   - Error handling fixes
   - PASID fixes
   - Fix for cleared VRAM BOs
   - Fix cleanup if GPUVM creation fails
   - Memory accounting fix
   - Use resource_size rather than open codeing it
   - GC11 mGPU fix

  radeon:
   - Switch 1 element arrays to flexible arrays
   - Fix memory leak on shutdown
   - move to new logging

  i915:
   - Meteorlake display/OA/GSC fw/workarounds enabling
   - DP MST DSC support
   - Gamma/degamma readout support for the state checker
   - Enable SDP split support for DP 2.0
   - Add probe blocking support to i915.force_probe parameter
   - Enable Xe HP 4tile support
   - Avoid display direct calls to uncore
   - Fix HuC delayed load memory leaks
   - Add DG2 workarounds Wa_18018764978 and Wa_18019271663
   - Improve suspend / resume times with VT-d scanout workaround active
   - Fix DG2 visual corruption on small BAR systems by not forgetting to
     copy CCS aux state
   - Fix TLB invalidation for Gen12.50 video and compute engines
   - Enable HF-EEODB by switching HDMI, DP and LVDS to use struct
     drm_edid
   - Start using unversioned DMC firmware paths for new platforms
   - ELD refactor: Stop using hardware buffer, precompute ELD
   - lots of display code refactoring

  nouveau:
   - drop legacy ioctl support
   - replace 0-sized array

  msm:
   - dpu/dsi/mdss: Support for SM8350, SM8450 SM8550 and SC8280XP platform
   - Added bindings for SM8150
   - dpu: Partial support for DSC on SM8150 and SM8250
   - dpu: Fixed color transformation matrix being lost on suspend/resume
   - dp: Support SDM845 and SC8280XP platforms
   - dp: Support for limiting DP link rate via DT property
   - dsi: Validate display modes according to the DSI OPP table
   - dsi: DSI PHY support for the SM6375 platform
   - Add MSM_SUBMIT_BO_NO_IMPLICI
   - a2xx: Support to load legacy firmware
   - a6xx: GPU devcore dump updates for a650/a660
   - GPU devfreq tuning and fixes
   - Turn 8960 HDMI PHY into clock provider,
   - Make 8960 HDMI PHY use PXO clock from DT

  etnaviv:
   - experimental versilicon NPU support
   - report GPU load via fdinfo format
   - MMU fault message improvements

  tegra:
   - rework syncpoint interrupt

  mediatek:
   - DSI timing fix
   - fix config deps

  ast:
   - various fixes

  exynos:
   - restore bridge chain order fixes

  gud:
   - convert to shadow plane buffers
   - perform flushing synchronously during atomic update
   - Use new debugfs helpers

  arm/hdlcd:
   - Use new debugfs helper

  ili9486:
   - Support 16-bit pixel data

  imx:
   - Split off IPUv3 driver

  mipi-dbi:
   - convert to DRM shadow-plane helpers
   - rsp driver changes
   - Support separate I/O-voltage supply

  mxsfb:
   - Depend on ARCH_MXS or ARCH_MXC

  sun4i:
   - convert to new TV mode property

  vc4:
   - convert to new TV mode property
   - kunit tests
   - Support RGB565 and RGB666 formats
   - convert dsi driver to bridge
   - Various HVS an CRTC fixes

  v3d:
   - Do not opencode drm_gem_object_lookup()

  virtio:
   - improve tracing

  vkms:
   - support small cursors in IGT tests
   - Fix SEGFAULT from incorrect GEM-buffer mapping

  rcar-du:
   - fixes and improvements"

* tag 'drm-next-2023-02-23' of git://anongit.freedesktop.org/drm/drm: (1455 commits)
  msm/fbdev: fix unused variable warning with clang.
  drm/fb-helper: Remove drm_fb_helper_unprepare() from drm_fb_helper_fini()
  dma-buf: make kobj_type structure constant
  drm/shmem-helper: Fix locking for drm_gem_shmem_get_pages_sgt()
  drm/amd/display: disable SubVP + DRR to prevent underflow
  drm/amd/display: Fail atomic_check early on normalize_zpos error
  drm/amd/pm: avoid unaligned access warnings
  drm/amd/display: avoid unaligned access warnings
  drm/amd/display: Remove duplicate/repeating expressions
  drm/amd/display: Remove duplicate/repeating expression
  drm/amd/display: Make variables declaration inside ifdef guard
  drm/amd/display: Fix excess arguments on kernel-doc
  drm/amd/display: Add previously missing includes
  drm/amd/amdgpu: Add function prototypes to headers
  drm/amd/display: Add function prototypes to headers
  drm/amd/display: Turn global functions into static
  drm/amd/display: remove unused _calculate_degamma_curve function
  drm/amd/display: remove unused func declaration from resource headers
  drm/amd/display: unset initial value for tf since it's never used
  drm/amd/display: camel case cleanup in color_gamma file
  ...
2023-02-22 18:28:03 -08:00
8f32378986 drm/amd/amdgpu: fix warning during suspend
Freeing memory was warned during suspend.
Move the self test out of suspend.

Link: https://bugzilla.redhat.com/show_bug.cgi?id=2151825
Cc: jfalempe@redhat.com
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-and-tested-by: Evan Quan <evan.quan@amd.com>
Tested-by: Jocelyn Falempe <jfalempe@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.1.x
2023-02-15 22:46:04 -05:00
01543dcf99 drm/amd/display: Fix excess arguments on kernel-doc
Remove arguments present on kernel-doc that are not present on the
function declaration and add the new ones if present.

Signed-off-by: Arthur Grillo <arthurgrillo@riseup.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-02-15 22:24:43 -05:00
877b57c6b5 drm/amd/amdgpu: Add function prototypes to headers
Add function prototypes to headers to reduce the number of
-Wmissing-prototypes warnings.

Signed-off-by: Arthur Grillo <arthurgrillo@riseup.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-02-15 22:24:36 -05:00
dc907c9db8 drm/amd/amdgpu: fix warning during suspend
Freeing memory was warned during suspend.
Move the self test out of suspend.

Link: https://bugzilla.redhat.com/show_bug.cgi?id=2151825
Cc: jfalempe@redhat.com
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-and-tested-by: Evan Quan <evan.quan@amd.com>
Tested-by: Jocelyn Falempe <jfalempe@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-02-15 22:19:43 -05:00
fa9b4155c3 drm/amdgpu: Revert programming GRBM_GFX_* in RLCG interface to support GFX9
[Why]
Regression of commit 72fef4980d ("drm/amdgpu: Remove writing GRBM_GFX_CNTL in RLCG interface under SRIOV") on GFX9.
According to GFX9 VF using different method to access GC registers including MMIO(direct) and RLCG(indirect),
removing GRBM_GFX_* writing would make PIPE/ME/VM/QUEUE selection chaos leading to some OCL benchmark failure.

For example,
using RLCG interface to program GRBM_GFX_CNTL/INDEX for selecting MEC(actually the value is only in scratch2/3),
then using MMIO directly program a MEC register in VF driver.
The register programming are invalid due to GC switched to incorrect ME.

[How]
With checking RLCG accessing flag, keep writing GRBM_GFX_* as a legacy way.
But it is still skipped on GFX10+ to avoid violation occurrence.

Signed-off-by: Yifan Zha <Yifan.Zha@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-02-14 16:04:48 -05:00
230dd6bb61 drm/amd/amdgpu: implement mode2 reset on smu_v13_0_10
implement mode2 reset on smu_v13_0_10

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-02-14 15:47:15 -05:00
be9f1daad7 drm/amdgpu: Fix the warning info when unload or remove amdgpu
Checking INVOKE_CMD  to fix the below warning info when
unload or remove amdgpu driver

[  319.489809] Call Trace:
[  319.489810]  <TASK>
[  319.489812]  psp_ta_unload+0x9a/0xd0 [amdgpu]
[  319.489926]  ? smu_smc_hw_cleanup+0x2f6/0x360 [amdgpu]
[  319.490072]  psp_hw_fini+0xea/0x170 [amdgpu]
[  319.490231]  amdgpu_device_fini_hw+0x2fc/0x413 [amdgpu]
[  319.490398]  ? blocking_notifier_chain_unregister+0x56/0xb0
[  319.490401]  amdgpu_driver_unload_kms+0x51/0x60 [amdgpu]
[  319.490493]  amdgpu_pci_remove+0x5a/0x140 [amdgpu]
[  319.490583]  ? __pm_runtime_resume+0x60/0x90
[  319.490586]  pci_device_remove+0x3b/0xb0
[  319.490588]  __device_release_driver+0x1a8/0x2a0
[  319.490591]  driver_detach+0xf3/0x140
[  319.490593]  bus_remove_driver+0x6c/0xf0
[  319.490595]  driver_unregister+0x31/0x60
[  319.490597]  pci_unregister_driver+0x40/0x90
[  319.490599]  amdgpu_exit+0x15/0x44e [amdgpu]

Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-02-14 15:46:55 -05:00
1c71222e5f mm: replace vma->vm_flags direct modifications with modifier calls
Replace direct modifications to vma->vm_flags with calls to modifier
functions to be able to track flag changes and to keep vma locking
correctness.

[akpm@linux-foundation.org: fix drivers/misc/open-dice.c, per Hyeonggon Yoo]
Link: https://lkml.kernel.org/r/20230126193752.297968-5-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjun Roy <arjunroy@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Minchan Kim <minchan@google.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Peter Oskolkov <posk@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Punit Agrawal <punit.agrawal@bytedance.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-09 16:51:39 -08:00
777c1e01cb Merge tag 'amd-drm-fixes-6.2-2023-02-09' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-6.2-2023-02-09:

amdgpu:
- Add a parameter to disable S/G display
- Re-enable S/G display on all DCNs

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230209174504.7577-1-alexander.deucher@amd.com
2023-02-10 09:49:13 +10:00
337d5b5edc Merge tag 'drm-misc-fixes-2023-02-09' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
A fix for a circular refcounting in drm/client, one for a memory leak in
amdgpu and a virtio fence fix when interrupted

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20230209083600.7hi6roht6xxgldgz@houat
2023-02-10 09:15:57 +10:00
bf0207e172 drm/amdgpu: add S/G display parameter
Some users have reported flickerng with S/G display.  We've
tried extensively to reproduce and debug the issue on a wide
variety of platform configurations (DRAM bandwidth, etc.) and
a variety of monitors, but so far have not been able to.  We
disabled S/G display on a number of platforms to address this
but that leads to failure to pin framebuffers errors and
blank displays when there is memory pressure or no displays
at all on systems with limited carveout (e.g., Chromebooks).
Add a option to disable this as a debugging option as a
way for users to disable this, depending on their use case,
and for us to help debug this further.

v2: fix typo

Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-02-09 10:30:36 -05:00
4693e852f1 drm/amdgpu: add S/G display parameter
Some users have reported flickerng with S/G display.  We've
tried extensively to reproduce and debug the issue on a wide
variety of platform configurations (DRAM bandwidth, etc.) and
a variety of monitors, but so far have not been able to.  We
disabled S/G display on a number of platforms to address this
but that leads to failure to pin framebuffers errors and
blank displays when there is memory pressure or no displays
at all on systems with limited carveout (e.g., Chromebooks).
Add a option to disable this as a debugging option as a
way for users to disable this, depending on their use case,
and for us to help debug this further.

v2: fix typo

Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-02-09 10:29:28 -05:00
73ac3f22f5 drm/amdgpu/gmc11: fix system aperture set when AGP is enabled
Need to cover both FB and AGP apertures.

v2: fix missed gfxhub_v3_0_3.c

Fixes: c6eafee038 ("Revert "Revert "drm/amdgpu/gmc11: enable AGP aperture""")
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-02-09 10:02:53 -05:00
6c1a6d0b64 amd/amdgpu: remove test ib on hw ring
test ib function is not necessary on hw ring,
so remove it.

v2: squash in NULL check fix

Signed-off-by: JesseZhang <Jesse.Zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-02-08 22:33:31 -05:00
5ad7bbf3db drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini
Currently amdgpu calls drm_sched_fini() from the fence driver sw fini
routine - such function is expected to be called only after the
respective init function - drm_sched_init() - was executed successfully.

Happens that we faced a driver probe failure in the Steam Deck
recently, and the function drm_sched_fini() was called even without
its counter-part had been previously called, causing the following oops:

amdgpu: probe of 0000:04:00.0 failed with error -110
BUG: kernel NULL pointer dereference, address: 0000000000000090
PGD 0 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 609 Comm: systemd-udevd Not tainted 6.2.0-rc3-gpiccoli #338
Hardware name: Valve Jupiter/Jupiter, BIOS F7A0113 11/04/2022
RIP: 0010:drm_sched_fini+0x84/0xa0 [gpu_sched]
[...]
Call Trace:
 <TASK>
 amdgpu_fence_driver_sw_fini+0xc8/0xd0 [amdgpu]
 amdgpu_device_fini_sw+0x2b/0x3b0 [amdgpu]
 amdgpu_driver_release_kms+0x16/0x30 [amdgpu]
 devm_drm_dev_init_release+0x49/0x70
 [...]

To prevent that, check if the drm_sched was properly initialized for a
given ring before calling its fini counter-part.

Notice ideally we'd use sched.ready for that; such field is set as the latest
thing on drm_sched_init(). But amdgpu seems to "override" the meaning of such
field - in the above oops for example, it was a GFX ring causing the crash, and
the sched.ready field was set to true in the ring init routine, regardless of
the state of the DRM scheduler. Hence, we ended-up using sched.ops as per
Christian's suggestion [0], and also removed the no_scheduler check [1].

[0] https://lore.kernel.org/amd-gfx/984ee981-2906-0eaf-ccec-9f80975cb136@amd.com/
[1] https://lore.kernel.org/amd-gfx/cd0e2994-f85f-d837-609f-7056d5fb7231@amd.com/

Fixes: 067f44c8b4 ("drm/amdgpu: avoid over-handle of fence driver fini in s3 test (v2)")
Suggested-by: Christian König <christian.koenig@amd.com>
Cc: Guchun Chen <guchun.chen@amd.com>
Cc: Luben Tuikov <luben.tuikov@amd.com>
Cc: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2023-02-08 22:32:50 -05:00
e53448e0a1 drm/amdgpu: Use the TGID for trace_amdgpu_vm_update_ptes
The pid field corresponds to the result of gettid() in userspace.
However, userspace cannot reliably attribute PTE events to processes
with just the thread id. This patch allows userspace to easily
attribute PTE update events to specific processes by comparing this
field with the result of getpid().

For attributing events to specific threads, the thread id is also
contained in the common fields of each trace event.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2023-02-08 22:32:40 -05:00
5630a35024 drm/amd/amdgpu: enable athub cg 11.0.3
enable athub cg on gc 11.0.3

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-02-08 22:18:13 -05:00
e8a9c68842 amd/amdgpu: remove test ib on hw ring
test ib function is not necessary on hw ring,
so remove it.

v2: squash in NULL check fix

Signed-off-by: JesseZhang <Jesse.Zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-02-08 22:04:12 -05:00
70f1872e38 drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini
Currently amdgpu calls drm_sched_fini() from the fence driver sw fini
routine - such function is expected to be called only after the
respective init function - drm_sched_init() - was executed successfully.

Happens that we faced a driver probe failure in the Steam Deck
recently, and the function drm_sched_fini() was called even without
its counter-part had been previously called, causing the following oops:

amdgpu: probe of 0000:04:00.0 failed with error -110
BUG: kernel NULL pointer dereference, address: 0000000000000090
PGD 0 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 609 Comm: systemd-udevd Not tainted 6.2.0-rc3-gpiccoli #338
Hardware name: Valve Jupiter/Jupiter, BIOS F7A0113 11/04/2022
RIP: 0010:drm_sched_fini+0x84/0xa0 [gpu_sched]
[...]
Call Trace:
 <TASK>
 amdgpu_fence_driver_sw_fini+0xc8/0xd0 [amdgpu]
 amdgpu_device_fini_sw+0x2b/0x3b0 [amdgpu]
 amdgpu_driver_release_kms+0x16/0x30 [amdgpu]
 devm_drm_dev_init_release+0x49/0x70
 [...]

To prevent that, check if the drm_sched was properly initialized for a
given ring before calling its fini counter-part.

Notice ideally we'd use sched.ready for that; such field is set as the latest
thing on drm_sched_init(). But amdgpu seems to "override" the meaning of such
field - in the above oops for example, it was a GFX ring causing the crash, and
the sched.ready field was set to true in the ring init routine, regardless of
the state of the DRM scheduler. Hence, we ended-up using sched.ops as per
Christian's suggestion [0], and also removed the no_scheduler check [1].

[0] https://lore.kernel.org/amd-gfx/984ee981-2906-0eaf-ccec-9f80975cb136@amd.com/
[1] https://lore.kernel.org/amd-gfx/cd0e2994-f85f-d837-609f-7056d5fb7231@amd.com/

Fixes: 067f44c8b4 ("drm/amdgpu: avoid over-handle of fence driver fini in s3 test (v2)")
Suggested-by: Christian König <christian.koenig@amd.com>
Cc: Guchun Chen <guchun.chen@amd.com>
Cc: Luben Tuikov <luben.tuikov@amd.com>
Cc: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-02-08 22:04:12 -05:00
0082e2fcd7 drm/amdgpu: Use the TGID for trace_amdgpu_vm_update_ptes
The pid field corresponds to the result of gettid() in userspace.
However, userspace cannot reliably attribute PTE events to processes
with just the thread id. This patch allows userspace to easily
attribute PTE update events to specific processes by comparing this
field with the result of getpid().

For attributing events to specific threads, the thread id is also
contained in the common fields of each trace event.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-02-08 17:36:44 -05:00