757 Commits

Author SHA1 Message Date
Nirmoy Das
204d8998ce drm/amdkfd: label internally used symbols as static
Used sparse(make C=1) to find these loose ends.

v2:
removed unwanted extra line

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-07-01 01:59:23 -04:00
Alex Deucher
1c1ada37af drm/amdkfd: fix ref count leak when pm_runtime_get_sync fails
The call to pm_runtime_get_sync increments the counter even in case of
failure, leading to incorrect ref count.
In case of failure, decrement the ref count before returning.

Reviewed-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-07-01 01:59:23 -04:00
Qiushi Wu
20eca0123a drm/amdkfd: Fix reference count leaks.
kobject_init_and_add() takes reference even when it fails.
If this function returns an error, kobject_put() must be called to
properly clean up the memory associated with the object.

Signed-off-by: Qiushi Wu <wu000273@umn.edu>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-07-01 01:59:21 -04:00
Felix Kuehling
b205795677 drm/amdkfd: Add eviction debug messages
Use WARN to print messages with backtrace when evictions are triggered.
This can help determine the root cause of evictions and help spot driver
bugs triggering evictions unintentionally, or help with performance tuning
by avoiding conditions that cause evictions in a specific workload.

The messages are controlled by a new module parameter that can be changed
at runtime:

  echo Y > /sys/module/amdgpu/parameters/debug_evictions
  echo N > /sys/module/amdgpu/parameters/debug_evictions

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-07-01 01:59:21 -04:00
Lorenz Brun
7159562a16 drm/amdkfd: Use correct major in devcgroup check
The existing code used the major version number of the DRM driver
instead of the device major number of the DRM subsystem for
validating access for a devices cgroup.

This meant that accesses allowed by the devices cgroup weren't
permitted and certain accesses denied by the devices cgroup were
permitted (if they matched the wrong major device number).

Signed-off-by: Lorenz Brun <lorenz@brun.one>
Fixes: 6b855f7b83d2f ("drm/amdkfd: Check against device cgroup")
Reviewed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-07-01 01:59:21 -04:00
shaoyunl
adab4dadd9 drm/amdkfd: sienna_cichlid virtual function support
amdkfd add support for sienna_cichlid virtual function

Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Reviewed-by: Yong Zhao <Yong.Zhao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-07-01 01:59:11 -04:00
Jay Cornwall
3cefc7189c drm/amdkfd: Support debugger in Navi1x trap handler
- Preserve scalar GPRs ttmp[4:11] and ttmp13
- Add single step exception during context save workaround
- Remove incorrect PC adjustment during context save

Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Reviewed-by: Yong Zhao <Yong.Zhao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-07-01 01:59:11 -04:00
Jay Cornwall
d0f1a85366 drm/amdkfd: Support newer assemblers in gfx10 trap handler
The contents of macros are parsed by the assembler before conditions
have been tested. This causes assembly errors when using IP-specific
instructions in the IP-unified trap handler.

Add a preprocessing step to filter IP-specific code.

Also guard a Navi1x-specific instruction (no effect on Sienna_Cichlid).

Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Reviewed-by: Yong Zhao <Yong.Zhao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-07-01 01:59:11 -04:00
Jay Cornwall
80b6cfedd3 drm/amdkfd: Add Sienna_Cichlid trap handler support
- Replace SQC stores with TCP stores
- Synchronize with MSG_SAVEWAVE via lgkmcnt
- HW_REG_IB_STS is now read-only

Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-07-01 01:59:11 -04:00
Yong Zhao
3a2f0c813b drm/amdkfd: Support Sienna_Cichlid KFD v4
v4: drop get_tile_config, comment out other callbacks

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-07-01 01:59:11 -04:00
Dave Airlie
8a7a3d1d0d Merge tag 'amd-drm-fixes-5.8-2020-06-17' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
amd-drm-fixes-5.8-2020-06-17:

amdgpu:
- Fix kvfree/kfree mixup
- Fix hawaii device id in powertune configuration
- Display FP fixes
- Documentation fixes

amdkfd:
- devcgroup check fix

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200617220733.3773183-1-alexander.deucher@amd.com
2020-06-19 10:02:30 +10:00
Lorenz Brun
99c7b30947 drm/amdkfd: Use correct major in devcgroup check
The existing code used the major version number of the DRM driver
instead of the device major number of the DRM subsystem for
validating access for a devices cgroup.

This meant that accesses allowed by the devices cgroup weren't
permitted and certain accesses denied by the devices cgroup were
permitted (if they matched the wrong major device number).

Signed-off-by: Lorenz Brun <lorenz@brun.one>
Fixes: 6b855f7b83d2f ("drm/amdkfd: Check against device cgroup")
Reviewed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2020-06-17 17:41:22 -04:00
Michel Lespinasse
d8ed45c5dc mmap locking API: use coccinelle to convert mmap_sem rwsem call sites
This change converts the existing mmap_sem rwsem calls to use the new mmap
locking API instead.

The change is generated using coccinelle with the following rule:

// spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir .

@@
expression mm;
@@
(
-init_rwsem
+mmap_init_lock
|
-down_write
+mmap_write_lock
|
-down_write_killable
+mmap_write_lock_killable
|
-down_write_trylock
+mmap_write_trylock
|
-up_write
+mmap_write_unlock
|
-downgrade_write
+mmap_write_downgrade
|
-down_read
+mmap_read_lock
|
-down_read_killable
+mmap_read_lock_killable
|
-down_read_trylock
+mmap_read_trylock
|
-up_read
+mmap_read_unlock
)
-(&mm->mmap_sem)
+(mm)

Signed-off-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Liam Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ying Han <yinghan@google.com>
Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:14 -07:00
Linus Torvalds
faa392181a drm pull for 5.8-rc1
core:
 - uapi: error out EBUSY when existing master
 - uapi: rework SET/DROP MASTER permission handling
 - remove drm_pci.h
 - drm_pci* are now legacy
 - introduced managed DRM resources
 - subclassing support for drm_framebuffer
 - simple encoder helper
 - edid improvements
 - vblank + writeback documentation improved
 - drm/mm - optimise tree searches
 - port drivers to use devm_drm_dev_alloc
 
 dma-buf:
 - add flag for p2p buffer support
 
 mst:
 - ACT timeout improvements
 - remove drm_dp_mst_has_audio
 - don't use 2nd TX slot - spec recommends against it
 
 bridge:
 - dw-hdmi various improvements
 - chrontel ch7033 support
 - fix stack issues with old gcc
 
 hdmi:
 - add unpack function for drm infoframe
 
 fbdev:
 - misc fbdev driver fixes
 
 i915:
 - uapi: global sseu pinning
 - uapi: OA buffer polling
 - uapi: remove generated perf code
 - uapi: per-engine default property values in sysfs
 - Tigerlake GEN12 enabled.
 - Lots of gem refactoring
 - Tigerlake enablement patches
 - move to drm_device logging
 - Icelake gamma HW readout
 - push MST link retrain to hotplug work
 - bandwidth atomic helpers
 - ICL fixes
 - RPS/GT refactoring
 - Cherryview full-ppgtt support
 - i915 locking guidelines documented
 - require linear fb stride to be 512 multiple on gen9
 - Tigerlake SAGV support
 
 amdgpu:
 - uapi: encrypted GPU memory handling
 - uapi: add MEM_SYNC IB flag
 - p2p dma-buf support
 - export VRAM dma-bufs
 - FRU chip access support
 - RAS/SR-IOV updates
 - Powerplay locking fixes
 - VCN DPG (powergating) enablement
 - GFX10 clockgating fixes
 - DC fixes
 - GPU reset fixes
 - navi SDMA fix
 - expose FP16 for modesetting
 - DP 1.4 compliance fixes
 - gfx10 soft recovery
 - Improved Critical Thermal Faults handling
 - resizable BAR on gmc10
 
 amdkfd:
 - uapi: GWS resource management
 - track GPU memory per process
 - report PCI domain in topology
 
 radeon:
 - safe reg list generator fixes
 
 nouveau:
 - HD audio fixes on recent systems
 - vGPU detection (fail probe if we're on one, for now)
 - Interlaced mode fixes (mostly avoidance on Turing, which doesn't support it)
 - SVM improvements/fixes
 - NVIDIA format modifier support
 - Misc other fixes.
 
 adv7511:
 - HDMI SPDIF support
 
 ast:
 - allocate crtc state size
 - fix double assignment
 - fix suspend
 
 bochs:
 - drop connector register
 
 cirrus:
 - move to tiny drivers.
 
 exynos:
 - fix imported dma-buf mapping
 - enable runtime PM
 - fixes and cleanups
 
 mediatek:
 - DPI pin mode swap
 - config mipi_tx current/impedance
 
 lima:
 - devfreq + cooling device support
 - task handling improvements
 - runtime PM support
 
 pl111:
 - vexpress init improvements
 - fix module auto-load
 
 rcar-du:
 - DT bindings conversion to YAML
 - Planes zpos sanity check and fix
 - MAINTAINERS entry for LVDS panel driver
 
 mcde:
 - fix return value
 
 mgag200:
 - use managed config init
 
 stm:
 - read endpoints from DT
 
 vboxvideo:
 - use PCI managed functions
 - drop WC mtrr
 
 vkms:
 - enable cursor by default
 
 rockchip:
 - afbc support
 
 virtio:
 - various cleanups
 
 qxl:
 - fix cursor notify port
 
 hisilicon:
 - 128-byte stride alignment fix
 
 sun4i:
 - improved format handling
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJe1edsAAoJEAx081l5xIa+bKEQAJAZv/8OMM2rx+p+GyKgrNpl
 ihTX/oyToy8dw97s1kWF7V5kKU+qjF8aWlKoPS0xovzaMAzYSFz9FRNEUgqtTXMI
 zIAzSXioqP21oL9/ZTHcXDULtz8Gk3uiPomgXMWLlNBdt3X5qvCwsmPRIYSwG0GJ
 00VCvxDbVxGSM3wzcvbfyRwHCq3SrFvIusXv5jHnnxEFGH0C7Mj2/FLYMKLNjvli
 Q8VEI2wQPZj1QdA8fLFVneIQsR6YUSko9OfFMANP8VJGpPMnUkvVxTJ5ACGJspvn
 U/h6NYqJeUU2Y3BSKqtjIC3a1LY51tp5tL9q4H9TD1hqMckt6F2V7T2IeFU8i6+V
 YzUsSiT4q1xB+uiFVcgopx2hyIp8INOEyWrVdYgw2JviROeRD+pDHvJd13ZNMnTe
 GvLWQ/PfBFrcz8eligjiYjOf66ZTU+j/rivaOBFyrs9gdlsaEW2QRurFrcNX+0lZ
 kDbLsIFjhYnPXsvHP87x4BuQCKQIEh8wWuxXuJjunBPdqVrJyltZWbBiKO571b5/
 BtX6xj6ztUOffR2RdiVanzY546I2hEi7SHMUuWnMqXsOV46GBN0QvlpZad/47n9x
 ZUy8HDDD0/qWuGwvPOJGIeAnUteWge9AhWXTeN5+1h5m+QEOzYkPKqC3Hp8TW1pM
 gToTWgAhnu731fhzLWyt
 =H7IS
 -----END PGP SIGNATURE-----

Merge tag 'drm-next-2020-06-02' of git://anongit.freedesktop.org/drm/drm

Pull drm updates from Dave Airlie:
 "Highlights:

   - Core DRM had a lot of refactoring around managed drm resources to
     make drivers simpler.

   - Intel Tigerlake support is on by default

   - amdgpu now support p2p PCI buffer sharing and encrypted GPU memory

  Details:

  core:
   - uapi: error out EBUSY when existing master
   - uapi: rework SET/DROP MASTER permission handling
   - remove drm_pci.h
   - drm_pci* are now legacy
   - introduced managed DRM resources
   - subclassing support for drm_framebuffer
   - simple encoder helper
   - edid improvements
   - vblank + writeback documentation improved
   - drm/mm - optimise tree searches
   - port drivers to use devm_drm_dev_alloc

  dma-buf:
   - add flag for p2p buffer support

  mst:
   - ACT timeout improvements
   - remove drm_dp_mst_has_audio
   - don't use 2nd TX slot - spec recommends against it

  bridge:
   - dw-hdmi various improvements
   - chrontel ch7033 support
   - fix stack issues with old gcc

  hdmi:
   - add unpack function for drm infoframe

  fbdev:
   - misc fbdev driver fixes

  i915:
   - uapi: global sseu pinning
   - uapi: OA buffer polling
   - uapi: remove generated perf code
   - uapi: per-engine default property values in sysfs
   - Tigerlake GEN12 enabled.
   - Lots of gem refactoring
   - Tigerlake enablement patches
   - move to drm_device logging
   - Icelake gamma HW readout
   - push MST link retrain to hotplug work
   - bandwidth atomic helpers
   - ICL fixes
   - RPS/GT refactoring
   - Cherryview full-ppgtt support
   - i915 locking guidelines documented
   - require linear fb stride to be 512 multiple on gen9
   - Tigerlake SAGV support

  amdgpu:
   - uapi: encrypted GPU memory handling
   - uapi: add MEM_SYNC IB flag
   - p2p dma-buf support
   - export VRAM dma-bufs
   - FRU chip access support
   - RAS/SR-IOV updates
   - Powerplay locking fixes
   - VCN DPG (powergating) enablement
   - GFX10 clockgating fixes
   - DC fixes
   - GPU reset fixes
   - navi SDMA fix
   - expose FP16 for modesetting
   - DP 1.4 compliance fixes
   - gfx10 soft recovery
   - Improved Critical Thermal Faults handling
   - resizable BAR on gmc10

  amdkfd:
   - uapi: GWS resource management
   - track GPU memory per process
   - report PCI domain in topology

  radeon:
   - safe reg list generator fixes

  nouveau:
   - HD audio fixes on recent systems
   - vGPU detection (fail probe if we're on one, for now)
   - Interlaced mode fixes (mostly avoidance on Turing, which doesn't support it)
   - SVM improvements/fixes
   - NVIDIA format modifier support
   - Misc other fixes.

  adv7511:
   - HDMI SPDIF support

  ast:
   - allocate crtc state size
   - fix double assignment
   - fix suspend

  bochs:
   - drop connector register

  cirrus:
   - move to tiny drivers.

  exynos:
   - fix imported dma-buf mapping
   - enable runtime PM
   - fixes and cleanups

  mediatek:
   - DPI pin mode swap
   - config mipi_tx current/impedance

  lima:
   - devfreq + cooling device support
   - task handling improvements
   - runtime PM support

  pl111:
   - vexpress init improvements
   - fix module auto-load

  rcar-du:
   - DT bindings conversion to YAML
   - Planes zpos sanity check and fix
   - MAINTAINERS entry for LVDS panel driver

  mcde:
   - fix return value

  mgag200:
   - use managed config init

  stm:
   - read endpoints from DT

  vboxvideo:
   - use PCI managed functions
   - drop WC mtrr

  vkms:
   - enable cursor by default

  rockchip:
   - afbc support

  virtio:
   - various cleanups

  qxl:
   - fix cursor notify port

  hisilicon:
   - 128-byte stride alignment fix

  sun4i:
   - improved format handling"

* tag 'drm-next-2020-06-02' of git://anongit.freedesktop.org/drm/drm: (1401 commits)
  drm/amd/display: Fix potential integer wraparound resulting in a hang
  drm/amd/display: drop cursor position check in atomic test
  drm/amdgpu: fix device attribute node create failed with multi gpu
  drm/nouveau: use correct conflicting framebuffer API
  drm/vblank: Fix -Wformat compile warnings on some arches
  drm/amdgpu: Sync with VM root BO when switching VM to CPU update mode
  drm/amd/display: Handle GPU reset for DC block
  drm/amdgpu: add apu flags (v2)
  drm/amd/powerpay: Disable gfxoff when setting manual mode on picasso and raven
  drm/amdgpu: fix pm sysfs node handling (v2)
  drm/amdgpu: move gpu_info parsing after common early init
  drm/amdgpu: move discovery gfx config fetching
  drm/nouveau/dispnv50: fix runtime pm imbalance on error
  drm/nouveau: fix runtime pm imbalance on error
  drm/nouveau: fix runtime pm imbalance on error
  drm/nouveau/debugfs: fix runtime pm imbalance on error
  drm/nouveau/nouveau/hmm: fix migrate zero page to GPU
  drm/nouveau/nouveau/hmm: fix nouveau_dmem_chunk allocations
  drm/nouveau/kms/nv50-: Share DP SST mode_valid() handling with MST
  drm/nouveau/kms/nv50-: Move 8BPC limit for MST into nv50_mstc_get_modes()
  ...
2020-06-02 15:04:15 -07:00
Colin Ian King
2652bda7b4 drm/amdkfd: fix a dereference of pdd before it is null checked
Currently pointer pdd is being dereferenced when assigning pointer
dpm and then pdd is being null checked.  Fix this by checking if
pdd is null before the dereference of pdd occurs.

Addresses-Coverity: ("Dereference before null check")
Fixes: 32cb59f31362 ("drm/amdkfd: Track SDMA utilization per process")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-29 13:54:38 -04:00
Felix Kuehling
83a13ef590 drm/amdkfd: Fix GCC 10 compiler warning
GCC 10 was complaining about how we append data to a buffer using snprintf:

drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c: In function ‘perf_show’:
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c:214:3: warning: ‘snprintf’ argument 4 overlaps destination object ‘buf’ [-Wrestrict]
  214 |   snprintf(buffer, PAGE_SIZE, "%s"fmt, buffer, __VA_ARGS__)
      |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This patch fixes the warnings and makes the sysfs code more efficient
by remembering the offset in the buffer between append operations.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Aaron Liu <aaron.liu@amd.com>
Tested-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: Amber Lin <Amber.Lin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-28 17:46:20 -04:00
Mukul Joshi
32cb59f313 drm/amdkfd: Track SDMA utilization per process
Track SDMA usage on a per process basis and report it through sysfs.
The value in the sysfs file indicates the amount of time SDMA has
been in-use by this process since the creation of the process.
This value is in microsecond granularity.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-28 14:00:50 -04:00
Evan Quan
997769fa80 drm/amdkfd: report the real PCI bus number
Since the PCI bus number retrieved by PCI_BUS_NUM(pdev->devfn)
is wrong.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-21 12:48:42 -04:00
Aishwarya Ramakrishnan
8c8e1f6984 drm/amdkfd: Fix boolreturn.cocci warnings
Return statements in functions returning bool should use
true/false instead of 1/0.

drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c:40:9-10:
WARNING: return of 0/1 in function 'event_interrupt_isr_v9' with return type bool

Generated by: scripts/coccinelle/misc/boolreturn.cocci

Signed-off-by: Aishwarya Ramakrishnan <aishwaryarj100@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-21 12:37:19 -04:00
Yong Zhao
d09f85d52a drm/amdkfd: Use a systematic method to calculate queue mask bit
The queue mask used for set_resources always assumes the queue number
per pipe is 8, so KFD needs to align with that by using function
amdgpu_queue_mask_bit_to_set_resource_bit().

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-01 15:19:08 -04:00
Felix Kuehling
0aeaaf64e6 drm/amdkfd: Fix comment formatting
Corrected two function names. Added a missing space.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-01 10:00:19 -04:00
Ori Messinger
3e58e95ace drm/amdkfd: Report domain with topology
PCI domain has moved to 32-bits to accommodate virtualization,
so a 32-bit integer is exposed for domain to reflect this change.

Domain can be found in here:
/sys/class/kfd/kfd/topology/nodes/X/properties
Where X is the card number

Signed-off-by: Ori Messinger <ori.messinger@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-01 09:59:51 -04:00
Mukul Joshi
d4566dee84 drm/amdkfd: Track GPU memory utilization per process
Track GPU VRAM usage on a per process basis and report it through
sysfs.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-30 16:47:34 -04:00
Joseph Greathouse
b8020b0304 drm/amdkfd: Enable over-subscription with >1 GWS queue
The current GWS usage model will only allows a single GWS-enabled
process to be active on the GPU at once. This ensures that a
barrier-using kernel gets a known amount of GPU hardware, to
prevent deadlock due to inability to go beyond the GWS barrier.

The HWS watches how many GWS entries are assigned to each process,
and goes into over-subscription mode when two processes need more
than the 64 that are available. The current KFD method for working
with this is to allocate all 64 GWS entries to each GWS-capable
process.

When more than one GWS-enabled process is in the runlist, we must
make sure the runlist is in over-subscription mode, so that the
HWS gets a chained RUN_LIST packet and continues scheduling
kernels.

Signed-off-by: Joseph Greathouse <Joseph.Greathouse@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-28 16:20:30 -04:00
Joseph Greathouse
29633d0e20 drm/amdkfd: Enable GWS based on FW Support
Rather than only enabling GWS support based on the hws_gws_support
modparm, also check whether the GPU's HWS firmware supports GWS.
Leave the old modparm in place in case users want to test GWS
on GPUs not yet in the support list.

v2: fix broken syntax from the first patch.

Signed-off-by: Joseph Greathouse <Joseph.Greathouse@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-28 16:20:30 -04:00
Oak Zeng
5bb4b78be9 drm/amdkfd: New IOCTL to allocate queue GWS (v2)
Add a new kfd ioctl to allocate queue GWS. Queue
GWS is released on queue destroy.

v2: re-introduce this API with the following fixes squashed in:
- drm/amdkfd: fix null pointer dereference on dev
- drm/amdkfd: Return proper error code for gws alloc API
- drm/amdkfd: Remove GPU ID in GWS queue creation

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-28 16:20:30 -04:00
Joseph Greathouse
c6d1ec4134 drm/amdkfd: Put ASIC revision into HSA capability
In order to surface the ASIC revision to user level, we want
to put it into the HSA topology. This can be because different
ASIC revisions may require user-level software to do different
things (e.g. patch code for things that are changed in later
hardware revisions).

The ASIC revision from the hardware is maximum of 4 bits at this
time, so put it into 4 of the open bits in the HSA capability.
Then user-level software can use this capability information to
know -- for each ASIC -- what revision-based things must be done.

Signed-off-by: Joseph Greathouse <Joseph.Greathouse@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-28 11:04:56 -04:00
Yong Zhao
de430916b4 drm/amdkfd: Adjust three kfd dmesg printings during initialization
Delete two printings which are not very useful, and change one from
pr_info() to pr_debug().

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-22 18:11:49 -04:00
Odin Ugedal
eec8fd0277 device_cgroup: Cleanup cgroup eBPF device filter code
Original cgroup v2 eBPF code for filtering device access made it
possible to compile with CONFIG_CGROUP_DEVICE=n and still use the eBPF
filtering. Change
commit 4b7d4d453fc4 ("device_cgroup: Export devcgroup_check_permission")
reverted this, making it required to set it to y.

Since the device filtering (and all the docs) for cgroup v2 is no longer
a "device controller" like it was in v1, someone might compile their
kernel with CONFIG_CGROUP_DEVICE=n. Then (for linux 5.5+) the eBPF
filter will not be invoked, and all processes will be allowed access
to all devices, no matter what the eBPF filter says.

Signed-off-by: Odin Ugedal <odin@ugedal.com>
Acked-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2020-04-13 14:41:54 -04:00
Jack Zhang
3148a6a0ef drm/amdkfd: kfree the wrong pointer
Originally, it kfrees the wrong pointer for mem_obj.
It would cause memory leak under stress test.

Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-01 14:44:22 -04:00
Colin Ian King
8cd296082c drm: amd: fix spelling mistake "shoudn't" -> "shouldn't"
There are spelling mistakes in pr_err messages and a comment. Fix these.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-19 00:03:05 -04:00
Yong Zhao
1d251d9008 drm/amdkfd: Consolidate duplicated bo alloc flags
ALLOC_MEM_FLAGS_* used are the same as the KFD_IOC_ALLOC_MEM_FLAGS_*,
but they are interweavedly used in kernel driver, resulting in bad
readability. For example, KFD_IOC_ALLOC_MEM_FLAGS_COHERENT is not
referenced in kernel, and it functions implicitly in kernel through
ALLOC_MEM_FLAGS_COHERENT, causing unnecessary confusion.

Replace all occurrences of ALLOC_MEM_FLAGS_* with
KFD_IOC_ALLOC_MEM_FLAGS_* to solve the problem.

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-10 15:54:34 -04:00
Yong Zhao
8f2e0c0333 drm/amdkfd: Use pr_debug to print the message of reaching event limit
People are inclined to think of the previous pr_warn message as an
error, so use pre_debug instead.

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-10 15:54:27 -04:00
Felix Kuehling
129657c86f drm/amdkfd: Signal eviction fence on process destruction (v2)
Otherwise BOs may wait for the fence indefinitely and never be destroyed.

v2: Signal the fence right after destroying queues to avoid unnecessary
    delaye-delete in kfd_process_wq_release

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: xinhui pan <xinhui.pan@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-06 14:40:30 -05:00
Yong Zhao
2f6ae2de13 drm/amdkfd: Add more comments on GFX9 user CP queue MQD workaround
Because too many things are involved in this workaround, we need more
comments to avoid pitfalls.

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Philip Yang <philip.yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-06 14:34:30 -05:00
Colin Ian King
b84fe6ffc1 drm/amdkfd: fix indentation issue
There is a statement that is indented with spaces instead of a tab.
Replace spaces with a tab.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-05 00:26:45 -05:00
Eric Huang
f2cc50cefd drm/amdkfd: change SDMA MQD memory type
SDMA MQD memory type is NC that causes MQD data overwritten
accidentally by an old stable cache line. Changing it to UC
default for GART will fix the issue.

The mqd_gfx9 parameter is meant for control stacks that are
allocated together with user mode queue MQDs. Setting
mqd_gfx9 to true maps the control stack pages as NC.
Here it was accidentally applied to SDMA MQDs,
which are allocated together with the HIQ MQD. Setting
the mqd_gfx9 to false avoids that.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Acked-by: Yong Zhao <Yong.Zhao@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-28 16:59:20 -05:00
Yong Zhao
fd7d08bad7 drm/amdkfd: Make get_tile_config() generic
Given we can query all the asic specific information from amdgpu_gfx_config,
we can make get_tile_config() generic.

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-28 16:59:20 -05:00
Yong Zhao
c7637c95ab drm/amdkfd: Delete unnecessary unmap queue package submissions
The previous way of using SDMA queue count to infer whether we should unmap
SDMA engines has bugs. The reason it did not cause issues is because MEC
firmware unmaps all queues (CP + SDMA) when a unmap package for compute
engine is received. Becasue of that, only one unmap queue package
is needed, instead of one unmap queue package for CP and each SDMA engine,
which results in much simpler driver code.

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-26 14:20:33 -05:00
Yong Zhao
1e21647402 drm/amdkfd: Delete excessive printings
Those printings are duplicated or useless.

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-26 14:20:26 -05:00
Yong Zhao
66f28b9a16 drm/amdkfd: Fix a memory leak in queue creation error handling
When the queue creation failed, some resources were not freed. Fix it.

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-26 14:20:20 -05:00
Yong Zhao
b42902f4af drm/amdkfd: Count active CP queues directly
The previous code of calculating active CP queues is problematic if
some SDMA queues are inactive. Fix that by counting CP queues directly.

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-26 14:20:13 -05:00
Yong Zhao
e694530418 drm/amdkfd: Avoid ambiguity by indicating it's cp queue
The queues represented in queue_bitmap are only CP queues.

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-26 14:20:05 -05:00
Yong Zhao
81b820b304 drm/amdkfd: Rename queue_count to active_queue_count
The name is easier to understand the code.

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-26 14:19:38 -05:00
Divya Shikre
0c663695a6 drm/amd: Extend ROCt to surface UUID for devices that have them
Devices from Arcturus onwards will have their UUID exposed to Thunk.
Adding neccessary functions to the kernel to propagate the uuid.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-26 14:18:17 -05:00
Rajneesh Bhardwaj
9593f4d6a6 drm/amdkfd: refactor runtime pm for baco
So far the kfd driver implemented same routines for runtime and system
wide suspend and resume (s2idle or mem). During system wide suspend the
kfd aquires an atomic lock that prevents any more user processes to
create queues and interact with kfd driver and amd gpu. This mechanism
created problem when amdgpu device is runtime suspended with BACO
enabled. Any application that relies on kfd driver fails to load because
the driver reports a locked kfd device since gpu is runtime suspended.

However, in an ideal case, when gpu is runtime  suspended the kfd driver
should be able to:

 - auto resume amdgpu driver whenever a client requests compute service
 - prevent runtime suspend for amdgpu  while kfd is in use

This change refactors the amdgpu and amdkfd drivers to support BACO and
runtime power management.

Reviewed-by: Oak Zeng <oak.zeng@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-12 16:00:54 -05:00
Rajneesh Bhardwaj
3c1224c02e drm/amdkfd: show warning when kfd is locked
During system suspend the kfd driver aquires a lock that prohibits
further kfd actions unless the gpu is resumed. This adds some info which
can be useful while debugging.

Reviewed-by: Oak Zeng <oak.zeng@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-12 16:00:47 -05:00
Amber Lin
6d220a7e79 drm/amdkfd: Add queue information to sysfs
Provide compute queues information in sysfs under /sys/class/kfd/kfd/proc.
The format is /sys/class/kfd/kfd/proc/<pid>/queues/<queue id>/XX where
XX are size, type, and gpuid three files to represent queue size, queue
type, and the GPU this queue uses. <queue id> folder and files underneath
are generated when a queue is created. They are removed when the queue is
destroyed.

Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-06 15:04:38 -05:00
Yong Zhao
f38abc15d1 drm/amdkfd: Fix a bug in SDMA RLC queue counting under HWS mode
The sdma_queue_count increment should be done before
execute_queues_cpsch(), which calls pm_calc_rlib_size() where
sdma_queue_count is used to calculate whether over_subscription is
triggered.

With the previous code, when a SDMA queue is created,
compute_queue_count in pm_calc_rlib_size() is one more than the
actual compute queue number, because the queue_count has been
incremented while sdma_queue_count has not. This patch fixes that.

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-04 10:32:41 -05:00
Yong Zhao
5205503929 drm/amdkfd: Add a message when SW scheduler is used
SW scheduler is previously called non HW scheduler, or non HWS. This
message is useful when triaging issues from dmesg.

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-16 13:38:07 -05:00