9730 Commits

Author SHA1 Message Date
Philip Yang
d7eff46c21 drm/amdgpu: fix fdinfo race with process exit
Get process vm root BO ref in case process is exiting and root BO is
freed, to avoid NULL pointer dereference backtrace:

BUG: unable to handle kernel NULL pointer dereference at
0000000000000000
Call Trace:
amdgpu_show_fdinfo+0xfe/0x2a0 [amdgpu]
seq_show+0x12c/0x180
seq_read+0x153/0x410
vfs_read+0x91/0x140[ 3427.206183]  ksys_read+0x4f/0xb0
do_syscall_64+0x5b/0x1a0
entry_SYSCALL_64_after_hwframe+0x65/0xca

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-31 14:20:40 -04:00
xinhui pan
703677d934 drm/amdgpu: Fix a deadlock if previous GEM object allocation fails
Fall through to handle the error instead of return.

Fixes: f8aab60422c37 ("drm/amdgpu: Initialise drm_gem_object_funcs for imported BOs")
Cc: stable@vger.kernel.org
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-31 14:20:08 -04:00
Guchun Chen
f7d6779df6 drm/amdgpu: stop scheduler when calling hw_fini (v2)
This gurantees no more work on the ring can be submitted
to hardware in suspend/resume case, otherwise a potential
race will occur and the ring will get no chance to stay
empty before suspend.

v2: Call drm_sched_resubmit_job before drm_sched_start to
restart jobs from the pending list.

Suggested-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2021-08-31 14:19:47 -04:00
John Clements
156872b07e drm/amdgpu: Clear RAS interrupt status on aldebaran
Resolve incorrect register address

Reviewed-by: Candice Li <candice.li@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-31 14:19:37 -04:00
Lang Yu
50c6dedeb1 drm/amdgpu: show both cmd id and name when psp cmd failed
To cover the corner case that people want to know the ID
of an UNKNOWN CMD.

Suggested-by: John Clements <john.clements@amd.com>
Signed-off-by: Lang Yu <lang.yu@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-30 15:09:21 -04:00
Nicholas Kazlauskas
c5d3c9a093 drm/amdgpu: Enable S/G for Yellow Carp
Missing code for Yellow Carp to enable scatter gather - follows how
DCN21 support was added.

Tested that 8k framebuffer allocation and display can now succeed after
applying the patch.

v2: Add hookup in DM

Reviewed-by: Aaron Liu <aaron.liu@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2021-08-30 15:08:38 -04:00
Evan Quan
602e338ffe drm/amdgpu: reenable BACO support for 699F:C7 polaris12 SKU
This reverts the commit below:
"drm/amdgpu: disable BACO support for 699F:C7 polaris12 SKU temporarily".
As the S3 hang issue has been fixed by another commit:
"drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend".

Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-30 14:59:33 -04:00
YuBiao Wang
64261a0d06 drm/amd/amdgpu: Add ready_to_reset resp for vega10
Send response to host after received the flr notification from host.
Port NV change to vega10.

Signed-off-by: YuBiao Wang <YuBiao.Wang@amd.com>
Reviewed-by: Jingwen Chen <Jingwen.Chen2@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-30 14:59:33 -04:00
Alex Deucher
8f0c93f454 drm/amdgpu: add some additional RDNA2 PCI IDs
New PCI IDs.

Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-30 14:59:33 -04:00
Yifan Zhang
6333a495f5 drm/amdgpu: correct comments in memory type managers
The parameters were renamed.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-30 14:59:33 -04:00
Luben Tuikov
cc947bf91b drm/amdgpu: Process any VBIOS RAS EEPROM address
We can now process any RAS EEPROM address from
VBIOS. Generalize so as to compute the top three
bits of the 19-bit EEPROM address, from any byte
returned as the "i2c address" from VBIOS.

Cc: John Clements <john.clements@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Cc: Alex Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Alex Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-30 14:59:33 -04:00
Luben Tuikov
a6a355a22f drm/amdgpu: Fixes to returning VBIOS RAS EEPROM address
1) Generalize the function--if the user didn't set
   i2c_address, still return true/false to
   indicate whether VBIOS contains the RAS EEPROM
   address.  This function shouldn't evaluate
   whether the user set the i2c_address pointer or
   not.

2) Don't touch the caller's i2c_address, unless
   you have to--this function shouldn't have side
   effects.

3) Correctly set the function comment as a
   kernel-doc comment.

Cc: John Clements <john.clements@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Cc: Alex Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Alex Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-30 14:59:33 -04:00
Dave Airlie
8f0284f190 Merge tag 'amd-drm-next-5.15-2021-08-27' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-5.15-2021-08-27:

amdgpu:
- PLL fix for SI
- Misc code cleanups
- RAS fixes
- PSP cleanups
- Polaris UVD/VCE suspend fixes
- aldebaran fixes
- DCN3.x mclk fixes

amdkfd:
- CWSR fixes for arcturus and aldebaran
- SVM fixes

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210827192336.4649-1-alexander.deucher@amd.com
2021-08-30 09:06:03 +10:00
Hawking Zhang
192fb630fb drm/amdgpu: disable GFX CGCG in aldebaran
disable GFX CGCG and CGLS to workaround
a hardware issue found in aldebaran.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-26 13:56:58 -04:00
John Clements
54e6badbed drm/amdgpu: Clear RAS interrupt status on aldebaran
resolve register address issue for detecting/clearing RAS interrupt

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-26 13:56:48 -04:00
John Clements
3c4ff2dcc0 drm/amdgpu: Add support for RAS XGMI err query
Update XGMI RAS to support error query on aldebaran

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-26 13:56:14 -04:00
Dave Airlie
697b6e28d0 Merge tag 'amd-drm-next-5.15-2021-08-20' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-5.15-2021-08-20:

amdgpu:
- embed hw fence into job
- Misc SMU fixes
- PSP TA code cleanup
- RAS fixes
- PWM fan speed fixes
- DC workqueue cleanups
- SR-IOV fixes
- gfxoff delayed work fix
- Pin domain check fix

amdkfd:
- SVM fixes

radeon:
- Code cleanup

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210820172335.4190-1-alexander.deucher@amd.com
2021-08-26 12:18:27 +10:00
Yifan Zhang
d035f84d83 drm/amdgpu: rename amdgpu_bo_get_preferred_pin_domain
amdgpu_bo_get_preferred_pin_domain is used for page tables
creation, which is not involved with page pinning. And it is used in
more cases than display scanout, modify its documentation as well.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-25 18:15:17 -04:00
Evan Quan
416e1fab47 drm/amdgpu: drop redundant cancel_delayed_work_sync call
As those _sw_fini() APIs follow just after _suspend() APIs.
And the cancel_delayed_work_sync was already called in latter.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-25 18:15:10 -04:00
Evan Quan
859e465927 drm/amdgpu: add missing cleanups for more ASICs on UVD/VCE suspend
This is a supplement for commit below:
"drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend".

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-25 18:15:02 -04:00
Evan Quan
bf756fb833 drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend
Perform proper cleanups on UVD/VCE suspend: powergate enablement,
clockgating enablement and dpm disablement. This can fix some hangs
observed on suspending when UVD/VCE still using(e.g. issue
"pm-suspend" when video is still playing).

Signed-off-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-25 18:14:48 -04:00
Philip Yang
ff891a2e64 drm/amdkfd: check access permisson to restore retry fault
Check range access permission to restore GPU retry fault, if GPU retry
fault on address which belongs to VMA, and VMA has no read or write
permission requested by GPU, failed to restore the address. The vm fault
event will pass back to user space.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-24 15:36:50 -04:00
John Clements
f24d991bb9 drm/amdgpu: Update RAS XGMI Error Query
Resolve bug querying error on unsupported ASIC

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-24 15:36:43 -04:00
John Clements
3907c49218 drm/amdgpu: Add driver infrastructure for MCA RAS
Add MCA specific IP blocks targetting RAS features

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-24 15:36:18 -04:00
Candice Li
30acef3c4a drm/amd/amdgpu: consolidate PSP TA init shared buf functions
Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-24 15:36:05 -04:00
Candice Li
355e3e4ccc drm/amd/amdgpu: add name field back to ras_common_if
Adding name field back to ras_common_if to work around error
injection failure with amdgpuras tool.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-24 15:35:57 -04:00
Borislav Petkov
a47f6a5806 drm/amdgpu: Fix build with missing pm_suspend_target_state module export
Building a randconfig here triggered:

  ERROR: modpost: "pm_suspend_target_state" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!

because the module export of that symbol happens in
kernel/power/suspend.c which is enabled with CONFIG_SUSPEND.

The ifdef guards in amdgpu_acpi_is_s0ix_supported(), however, test for
CONFIG_PM_SLEEP which is defined like this:

  config PM_SLEEP
          def_bool y
          depends on SUSPEND || HIBERNATE_CALLBACKS

and that randconfig has:

  # CONFIG_SUSPEND is not set
  CONFIG_HIBERNATE_CALLBACKS=y

leading to the module export missing.

Change the ifdeffery to depend directly on CONFIG_SUSPEND.

Fixes: 5706cb3c910c ("drm/amdgpu: fix checking pmops when PM_SLEEP is not enabled")
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/YSP6Lv53QV0cOAsd@zn.tnic
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2021-08-24 15:35:50 -04:00
Christophe JAILLET
8a1d1bdb84 drm/amdgpu: switch from 'pci_' to 'dma_' API
The wrappers in include/linux/pci-dma-compat.h should go away.

The patch has been generated with the coccinelle script below.

It has been compile tested.

@@
@@
-    PCI_DMA_BIDIRECTIONAL
+    DMA_BIDIRECTIONAL

@@
@@
-    PCI_DMA_TODEVICE
+    DMA_TO_DEVICE

@@
@@
-    PCI_DMA_FROMDEVICE
+    DMA_FROM_DEVICE

@@
@@
-    PCI_DMA_NONE
+    DMA_NONE

@@
expression e1, e2, e3;
@@
-    pci_alloc_consistent(e1, e2, e3)
+    dma_alloc_coherent(&e1->dev, e2, e3, GFP_)

@@
expression e1, e2, e3;
@@
-    pci_zalloc_consistent(e1, e2, e3)
+    dma_alloc_coherent(&e1->dev, e2, e3, GFP_)

@@
expression e1, e2, e3, e4;
@@
-    pci_free_consistent(e1, e2, e3, e4)
+    dma_free_coherent(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_map_single(e1, e2, e3, e4)
+    dma_map_single(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_unmap_single(e1, e2, e3, e4)
+    dma_unmap_single(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4, e5;
@@
-    pci_map_page(e1, e2, e3, e4, e5)
+    dma_map_page(&e1->dev, e2, e3, e4, e5)

@@
expression e1, e2, e3, e4;
@@
-    pci_unmap_page(e1, e2, e3, e4)
+    dma_unmap_page(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_map_sg(e1, e2, e3, e4)
+    dma_map_sg(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_unmap_sg(e1, e2, e3, e4)
+    dma_unmap_sg(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_dma_sync_single_for_cpu(e1, e2, e3, e4)
+    dma_sync_single_for_cpu(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_dma_sync_single_for_device(e1, e2, e3, e4)
+    dma_sync_single_for_device(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_dma_sync_sg_for_cpu(e1, e2, e3, e4)
+    dma_sync_sg_for_cpu(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_dma_sync_sg_for_device(e1, e2, e3, e4)
+    dma_sync_sg_for_device(&e1->dev, e2, e3, e4)

@@
expression e1, e2;
@@
-    pci_dma_mapping_error(e1, e2)
+    dma_mapping_error(&e1->dev, e2)

@@
expression e1, e2;
@@
-    pci_set_dma_mask(e1, e2)
+    dma_set_mask(&e1->dev, e2)

@@
expression e1, e2;
@@
-    pci_set_consistent_dma_mask(e1, e2)
+    dma_set_coherent_mask(&e1->dev, e2)

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-24 15:35:41 -04:00
Mukul Joshi
f270921a17 drm/amdkfd: CWSR with sw scheduler on Aldebaran and Arcturus
Program trap handler settings to enable CWSR with software scheduler
on Aldebaran and Arcturus.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Amber Lin <Amber.Lin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-24 15:35:33 -04:00
Shashank Sharma
7301757ea1 drm/amdgpu/OLAND: clip the ref divider max value
This patch limits the ref_div_max value to 100, during the
calculation of PLL feedback reference divider. With current
value (128), the produced fb_ref_div value generates unstable
output at particular frequencies. Radeon driver limits this
value at 100.

On Oland, when we try to setup mode 2048x1280@60 (a bit weird,
I know), it demands a clock of 221270 Khz. It's been observed
that the PLL calculations using values 128 and 100 are vastly
different, and look like this:

+------------------------------------------+
|Parameter    |AMDGPU        |Radeon       |
|             |              |             |
+-------------+----------------------------+
|Clock feedback              |             |
|divider max  |  128         |   100       |
|cap value    |              |             |
|             |              |             |
|             |              |             |
+------------------------------------------+
|ref_div_max  |              |             |
|             |  42          |  20         |
|             |              |             |
|             |              |             |
+------------------------------------------+
|ref_div      |  42          |  20         |
|             |              |             |
+------------------------------------------+
|fb_div       |  10326       |  8195       |
+------------------------------------------+
|fb_div       |  1024        |  163        |
+------------------------------------------+
|fb_dev_p     |  4           |  9          |
|frac fb_de^_p|              |             |
+----------------------------+-------------+

With ref_div_max value clipped at 100, AMDGPU driver can also
drive videmode 2048x1280@60 (221Mhz) and produce proper output
without any blanking and distortion on the screen.

PS: This value was changed from 128 to 100 in Radeon driver also, here:
4b21ce1b4b

V1:
Got acks from:
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>

V2:
- Restricting the changes only for OLAND, just to avoid any regression
  for other cards.
- Changed unsigned -> unsigned int to make checkpatch quiet.

V3: Apply the change on SI family (not only oland) (Christian)

Cc: Alex Deucher <Alexander.Deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Eddy Qin <Eddy.Qin@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-24 15:35:25 -04:00
Michel Dänzer
90a9266269 drm/amdgpu: Cancel delayed work when GFXOFF is disabled
schedule_delayed_work does not push back the work if it was already
scheduled before, so amdgpu_device_delay_enable_gfx_off ran ~100 ms
after the first time GFXOFF was disabled and re-enabled, even if GFXOFF
was disabled and re-enabled again during those 100 ms.

This resulted in frame drops / stutter with the upcoming mutter 41
release on Navi 14, due to constantly enabling GFXOFF in the HW and
disabling it again (for getting the GPU clock counter).

To fix this, call cancel_delayed_work_sync when the disable count
transitions from 0 to 1, and only schedule the delayed work on the
reverse transition, not if the disable count was already 0. This makes
sure the delayed work doesn't run at unexpected times, and allows it to
be lock-free.

v2:
* Use cancel_delayed_work_sync & mutex_trylock instead of
  mod_delayed_work.
v3:
* Make amdgpu_device_delay_enable_gfx_off lock-free (Christian König)
v4:
* Fix race condition between amdgpu_gfx_off_ctrl incrementing
  adev->gfx.gfx_off_req_count and amdgpu_device_delay_enable_gfx_off
  checking for it to be 0 (Evan Quan)

Cc: stable@vger.kernel.org
Reviewed-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> # v3
Acked-by: Christian König <christian.koenig@amd.com> # v3
Signed-off-by: Michel Dänzer <mdaenzer@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-20 12:09:44 -04:00
Christian König
9deb0b3dcf drm/amdgpu: use the preferred pin domain after the check
For some reason we run into an use case where a BO is already pinned
into GTT, but should be pinned into VRAM|GTT again.

Handle that case gracefully as well.

Reviewed-by: Shashank Sharma <Shashank.sharma@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2021-08-20 12:09:32 -04:00
YuBiao Wang
691191a2f4 drm/amd/amdgpu:flush ttm delayed work before cancel_sync
[Why]
In some cases when we unload driver, warning call trace
will show up in vram_mgr_fini which claims that LRU is not empty, caused
by the ttm bo inside delay deleted queue.

[How]
We should flush delayed work to make sure the delay deleting is done.

Signed-off-by: YuBiao Wang <YuBiao.Wang@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-18 18:23:00 -04:00
Candice Li
ce97f37be8 drm/amd: consolidate TA shared memory structures
Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-18 18:22:53 -04:00
Hawking Zhang
f2bd514d85 drm/amdgpu: increase max xgmi physical node for aldebaran
aldebaran supports up to 16 xgmi physical nodes.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-18 18:22:46 -04:00
Evan Quan
42447deb88 drm/amdgpu: disable BACO support for 699F:C7 polaris12 SKU temporarily
We have a S3 issue on that SKU with BACO enabled. Will bring back this
when that root caused.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-18 18:22:25 -04:00
Zhigang Luo
424f2b2e26 drm/amdgpu: correct MMSCH 1.0 version
MMSCH 1.0 doesn't have major/minor version, only verison.

Signed-off-by: Zhigang Luo <zhigang.luo@amd.com>
Reviewed by Shaoyun.liu <Shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-18 18:22:25 -04:00
Jonathan Kim
44357a1bd5 drm/amdgpu: get extended xgmi topology data
The TA has a limit to the amount of data that can be retrieved from
GET_TOPOLOGY.  For setups that exceed this limit, the xGMI topology
needs to be re-initialized and data needs to be re-fetched from the
extended link records by setting a flag in the shared command buffer.

The number of hops and the number of links must be accumulated by the
driver. Other data points are all fetched from the first request.
Because the TA has already exceeded its link record limit, it
cannot hold bidirectional information.  Otherwise the driver would
have to do more than two fetches so the driver has to reflect the
topology information in the opposite direction.

v2: squashed with internal reviewed fix

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Hawking Zhang <hawking.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-18 18:22:24 -04:00
Evan Quan
0d8318e112 drm/amd/pm: drop the unnecessary intermediate percent-based transition
Currently, the readout of fan speed pwm is transited into percent-based
and then pwm-based. However, the transition into percent-based is totally
unnecessary and make the final output less accurate.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-16 15:35:56 -04:00
Candice Li
893cf382c0 drm/amd/amdgpu: remove unnecessary RAS context field
Delete ras_if->name in the RAS ctx structure and remove related lines.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-16 15:35:55 -04:00
Candice Li
6457205c07 drm/amd/amdgpu: consolidate PSP TA context
Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-16 15:18:04 -04:00
Jiange Zhao
3e183e2fae drm/amdgpu: Add MB_REQ_MSG_READY_TO_RESET response when VF get FLR notification.
When guest received FLR notification from host, it would
lock adapter into reset state. There will be no more
job submission and hardware access after that.

Then it should send a response to host that it has prepared
for host reset.

Signed-off-by: Jiange Zhao <Jiange.Zhao@amd.com>
Signed-off-by: Peng Ju Zhou <PengJu.Zhou@amd.com>
Reviewed-by: Emily.Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-16 15:17:57 -04:00
Jack Zhang
c530b02f39 drm/amd/amdgpu embed hw_fence into amdgpu_job
Why: Previously hw fence is alloced separately with job.
It caused historical lifetime issues and corner cases.
The ideal situation is to take fence to manage both job
and fence's lifetime, and simplify the design of gpu-scheduler.

How:
We propose to embed hw_fence into amdgpu_job.
1. We cover the normal job submission by this method.
2. For ib_test, and submit without a parent job keep the
legacy way to create a hw fence separately.
v2:
use AMDGPU_FENCE_FLAG_EMBED_IN_JOB_BIT to show that the fence is
embedded in a job.
v3:
remove redundant variable ring in amdgpu_job
v4:
add tdr sequence support for this feature. Add a job_run_counter to
indicate whether this job is a resubmit job.
v5
add missing handling in amdgpu_fence_enable_signaling

Signed-off-by: Jingwen Chen <Jingwen.Chen2@amd.com>
Signed-off-by: Jack Zhang <Jack.Zhang7@hotmail.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed by: Monk Liu <monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-16 15:16:58 -04:00
Dave Airlie
2819cf0e7d drm-misc-next for v5.15:
UAPI Changes:
 
 Cross-subsystem Changes:
 - Add lockdep_assert(once) helpers.
 
 Core Changes:
 - Add lockdep assert to drm_is_current_master_locked.
 - Fix typos in dma-buf documentation.
 - Mark drm irq midlayer as legacy only.
 - Fix GPF in udmabuf_create.
 - Rename member to correct value in drm_edid.h
 
 Driver Changes:
 - Build fix to make nouveau build with NOUVEAU_BACKLIGHT.
 - Add MI101AIT-ICP1, LTTD800480070-L6WWH-RT panels.
 - Assorted fixes to bridge/it66121, anx7625.
 - Add custom crtc_state to simple helpers, and use it to
   convert pll handling in mgag200 to atomic.
 - Convert drivers to use offset-adjusted framebuffer bo mappings.
 - Assorted small fixes and fix for a use-after-free in vmwgfx.
 - Convert remaining callers of non-legacy drivers to use linux irqs directly.
 - Small cleanup in ingenic.
 - Small fixes to virtio and ti-sn65dsi86.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEuXvWqAysSYEJGuVH/lWMcqZwE8MFAmEVd7cACgkQ/lWMcqZw
 E8Pw4hAApFWMPThb1OGxAsozhfOAc+Nhd/XjClWrEeu+auowge0OJzjUTOpUci7o
 NrL5GWLPDoIXm5/8eLm6B9k+hPmsuLQUgriMMpPyulvgckSYiLKzFEygbKtilfsJ
 k+ucLYlZ93ADLIcUIaode5eeWIbtsiuXNdfpH91uLKT2Lo1k4azJTfLJl0t1TB78
 uAumaeLXirmqfyN8U/zgPSmaNyTgBlMjMrJG/0+NGaoiq3sKtC/vCU/4r2aEcsMd
 ZrYbAJph5ND/u449P4vk2mDNmpz6GE1i8y1zeq7OWq+EOFHaUv0M2v1ZGTl9NSCw
 CascPEb+Hzt4eMFEvpVn9Vz/p7NpVeT78733YkF6jO898pruDbJFz/1PFLUcWp5S
 jssvgYyhqTrLTCabg8OMrkww8S5on+qcRPVjrpaND2QJUrF58acrEQVCJmCf/XIg
 G2XXotZxv96upWj1n/bZz/KcNN7xc+J6z9SEZml/elz5er9msbL8OFEq9bxWxfKL
 xjDY0CwvRtkwnxkin6UpZHYlGGciY1j3pCMHiRpQNiHDQOfNE2pFbKHN0lWE7jiS
 wQaSMhxC8cjM1TpfHCeHmM522QS6oqJbPev9Z+2twO8KtfetbbBE3xe+Rm/Rf7eM
 vPRvHavWaRDlkq742qZqsiqP0aiqgr04i73Kw5mGLOCT/42uCuU=
 =G09r
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-next-2021-08-12' of git://anongit.freedesktop.org/drm/drm-misc into drm-next

drm-misc-next for v5.15:

UAPI Changes:

Cross-subsystem Changes:
- Add lockdep_assert(once) helpers.

Core Changes:
- Add lockdep assert to drm_is_current_master_locked.
- Fix typos in dma-buf documentation.
- Mark drm irq midlayer as legacy only.
- Fix GPF in udmabuf_create.
- Rename member to correct value in drm_edid.h

Driver Changes:
- Build fix to make nouveau build with NOUVEAU_BACKLIGHT.
- Add MI101AIT-ICP1, LTTD800480070-L6WWH-RT panels.
- Assorted fixes to bridge/it66121, anx7625.
- Add custom crtc_state to simple helpers, and use it to
  convert pll handling in mgag200 to atomic.
- Convert drivers to use offset-adjusted framebuffer bo mappings.
- Assorted small fixes and fix for a use-after-free in vmwgfx.
- Convert remaining callers of non-legacy drivers to use linux irqs directly.
- Small cleanup in ingenic.
- Small fixes to virtio and ti-sn65dsi86.

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1cf2d7fc-402d-1852-574a-21cbbd2eaebf@linux.intel.com
2021-08-16 12:57:33 +10:00
Tuo Li
a211260c34 gpu: drm: amd: amdgpu: amdgpu_i2c: fix possible uninitialized-variable access in amdgpu_i2c_router_select_ddc_port()
The variable val is declared without initialization, and its address is
passed to amdgpu_i2c_get_byte(). In this function, the value of val is
accessed in:
  DRM_DEBUG("i2c 0x%02x 0x%02x read failed\n",
       addr, *val);

Also, when amdgpu_i2c_get_byte() returns, val may remain uninitialized,
but it is accessed in:
  val &= ~amdgpu_connector->router.ddc_mux_control_pin;

To fix this possible uninitialized-variable access, initialize val to 0 in
amdgpu_i2c_router_select_ddc_port().

Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Signed-off-by: Tuo Li <islituo@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-11 17:19:54 -04:00
Mukul Joshi
b53ef0df1b drm/amdkfd: CWSR with software scheduler
This patch adds support to program trap handler settings
when loading driver with software scheduler (sched_policy=2).

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Suggested-by: Jay Cornwall <Jay.Cornwall@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-11 17:19:54 -04:00
Thomas Zimmermann
450d61794d drm/amdgpu: Convert to Linux IRQ interfaces
Drop the DRM IRQ midlayer in favor of Linux IRQ interfaces. DRM's
IRQ helpers are mostly useful for UMS drivers. Modern KMS drivers
don't benefit from using it.

DRM IRQ callbacks are now being called directly or inlined.

The interrupt number returned by pci_msi_vector() is now stored
in struct amdgpu_irq. Calls to pci_msi_vector() can fail and return
a negative errno code. Abort initlaizaton in thi case. The DRM IRQ
midlayer does not handle this correctly.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210803090704.32152-2-tzimmermann@suse.de
2021-08-10 20:00:44 +02:00
Alex Deucher
59066d0083 drm/amdgpu: handle VCN instances when harvesting (v2)
There may be multiple instances and only one is harvested.

v2: fix typo in commit message

Fixes: 83a0b8639185 ("drm/amdgpu: add judgement when add ip blocks (v2)")
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1673
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-10 10:34:25 -04:00
Sergio Miguéns Iglesias
6940db0fd1 drm/amdgpu: Removed unnecessary if statement
There was an "if" statement that did nothing so it was removed.

Signed-off-by: Sergio Miguéns Iglesias <sergio@lony.xyz>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-09 15:44:50 -04:00
Randy Dunlap
7b42552be6 drm/amdgpu: fix kernel-doc warnings on non-kernel-doc comments
Don't use "begin kernel-doc notation" (/**) for comments that are
not kernel-doc. This eliminates warnings reported by the 0day bot.

drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c:89: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
    * This shader is used to clear VGPRS and LDS, and also write the input
drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c:209: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
    * The below shaders are used to clear SGPRS, and also write the input
drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c:301: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
    * This shader is used to clear the uninitiated sgprs after the above

Fixes: 0e0036c7d13b ("drm/amdgpu: fix no full coverage issue for gprs initialization")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com>
Cc: Dennis Li <Dennis.Li@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-09 15:44:47 -04:00