402 Commits

Author SHA1 Message Date
Stanislaw Gruszka
8f7fb1e21e accel/ivpu: Add debugfs files for testing device reset
Add new debugfs files to validate device recovery functionality.

Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230524074847.866711-4-stanislaw.gruszka@linux.intel.com
2023-07-07 09:33:23 +02:00
Stanislaw Gruszka
d4e4257afa accel/ivpu: Add firmware tracing support
Add support for firmware tracing and logging via debugfs.

Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230524074847.866711-3-stanislaw.gruszka@linux.intel.com
2023-07-07 09:33:20 +02:00
Stanislaw Gruszka
edde4caec1 accel/ivpu: Initial debugfs support
Add initial debugfs support. Provide below functionality:

- print buffer objects
- print latest boot mode
- trigger vpu engine reset

Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230524074847.866711-2-stanislaw.gruszka@linux.intel.com
2023-07-07 09:33:13 +02:00
Karol Wachowski
7f34e01f77 accel/ivpu: Clear specific interrupt status bits on C0
MTL C0 stepping fixed issue related to butrress interrupt status clearing,
to clear an interrupt status it is required to write 1 to specific
status bit field. This allows to execute read, modify and write routine.

Writing 0 will not clear the interrupt and will cause interrupt storm.

Fixes: 35b137630f08 ("accel/ivpu: Introduce a new DRM driver for Intel VPU")
Cc: stable@vger.kernel.org # 6.3.x
Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230703080725.2065635-2-stanislaw.gruszka@linux.intel.com
2023-07-05 12:29:39 +02:00
Karol Wachowski
020b527b55 accel/ivpu: Fix VPU register access in irq disable
Incorrect REGB_WR32() macro was used to access VPUIP register.
Use correct REGV_WR32().

Fixes: 35b137630f08 ("accel/ivpu: Introduce a new DRM driver for Intel VPU")
Cc: stable@vger.kernel.org # 6.3.x
Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230703080725.2065635-1-stanislaw.gruszka@linux.intel.com
2023-07-05 12:29:29 +02:00
Linus Torvalds
1b722407a1 drm changes for 6.5-rc1:
core:
 - replace strlcpy with strscpy
 - EDID changes to support further conversion to struct drm_edid
 - Move i915 DSC parameter code to common DRM helpers
 - Add Colorspace functionality
 
 aperture:
 - ignore framebuffers with non-primary devices
 
 fbdev:
 - use fbdev i/o helpers
 - add Kconfig options for fb_ops helpers
 - use new fb io helpers directly in drivers
 
 sysfs:
 - export DRM connector ID
 
 scheduler:
 - Avoid an infinite loop
 
 ttm:
 - store function table in .rodata
 - Add query for TTM mem limit
 - Add NUMA awareness to pools
 - Export ttm_pool_fini()
 
 bridge:
 - fsl-ldb: support i.MX6SX
 - lt9211, lt9611: remove blanking packets
 - tc358768: implement input bus formats, devm cleanups
 - ti-snd65dsi86: implement wait_hpd_asserted
 - analogix: fix endless probe loop
 - samsung-dsim: support swapped clock, fix enabling, support var clock
 - display-connector: Add support for external power supply
 - imx: Fix module linking
 - tc358762: Support reset GPIO
 
 panel:
 - nt36523: Support Lenovo J606F
 - st7703: Support Anbernic RG353V-V2
 - InnoLux G070ACE-L01 support
 - boe-tv101wum-nl6: Improve initialization
 - sharp-ls043t1le001: Mode fixes
 - simple: BOE EV121WXM-N10-1850, S6D7AA0
 - Ampire AM-800480L1TMQW-T00H
 - Rocktech RK043FN48H
 - Starry himax83102-j02
 - Starry ili9882t
 
 amdgpu:
 - add new ctx query flag to handle reset better
 - add new query/set shadow buffer for rdna3
 - DCN 3.2/3.1.x/3.0.x updates
 - Enable DC_FP on loongarch
 - PCIe fix for RDNA2
 - improve DC FAMS/SubVP support for better power management
 - partition support for lots of engines
 - Take NUMA into account when allocating memory
 - Add new DRM_AMDGPU_WERROR config parameter to help with CI
 - Initial SMU13 overdrive support
 - Add support for new colorspace KMS API
 - W=1 fixes
 
 amdkfd:
 - Query TTM mem limit rather than hardcoding it
 - GC 9.4.3 partition support
 - Handle NUMA for partitions
 - Add debugger interface for enabling gdb
 - Add KFD event age tracking
 
 radeon:
 - Fix possible UAF
 
 i915:
 - new getparam for PXP support
 - GSC/MEI proxy driver
 - Meteorlake display enablement
 - avoid clearing preallocated framebuffers with TTM
 - implement framebuffer mmap support
 - Disable sampler indirect state in bindless heap
 - Enable fdinfo for GuC backends
 - GuC loading and firmware table handling fixes
 - Various refactors for multi-tile enablement
 - Define MOCS and PAT tables for MTL
 - GSC/MEI support for Meteorlake
 - PMU multi-tile support
 - Large driver kernel doc cleanup
 - Allow VRR toggling and arbitrary refresh rates
 - Support async flips on linear buffers on display ver 12+
 - Expose CRTC CTM property on ILK/SNB/VLV
 - New debugfs for display clock frequencies
 - Hotplug refactoring
 - Display refactoring
 - I915_GEM_CREATE_EXT_SET_PAT for Mesa on Meteorlake
 - Use large rings for compute contexts
 - HuC loading for MTL
 - Allow user to set cache at BO creation
 - MTL powermanagement enhancements
 - Switch to dedicated workqueues to stop using flush_scheduled_work()
 - Move display runtime init under display/
 - Remove 10bit gamma on desktop gen3 parts, they don't support it
 
 habanalabs:
 - uapi: return 0 for user queries if there was a h/w or f/w error
 - Add pci health check when we lose connection with the firmware. This can be used to
   distinguish between pci link down and firmware getting stuck.
 - Add more info to the error print when TPC interrupt occur.
 - Firmware fixes
 
 msm:
 - Adreno A660 bindings
 - SM8350 MDSS bindings fix
 - Added support for DPU on sm6350 and sm6375 platforms
 - Implemented tearcheck support to support vsync on SM150 and newer platforms
 - Enabled missing features (DSPP, DSC, split display) on sc8180x, sc8280xp, sm8450
 - Added support for DSI and 28nm DSI PHY on MSM8226 platform
 - Added support for DSI on sm6350 and sm6375 platforms
 - Added support for display controller on MSM8226 platform
 - A690 GPU support
 - Move cmdstream dumping out of fence signaling path
 - a610 support
 - Support for a6xx devices without GMU
 
 nouveau:
 - NULL ptr before deref fixes
 
 armada:
 - implement fbdev emulation as client
 
 sun4i:
 - fix mipi-dsi dotclock
 - release clocks
 
 vc4:
 - rgb range toggle property
 - BT601 / BT2020 HDMI support
 
 vkms:
 - convert to drmm helpers
 - add reflection and rotation support
 - fix rgb565 conversion
 
 gma500:
 - fix iomem access
 
 shmobile:
 - support renesas soc platform
 - enable fbdev
 
 mxsfb:
 - Add support for i.MX93 LCDIF
 
 stm:
 - dsi: Use devm_ helper
 - ltdc: Fix potential invalid pointer deref
 
 renesas:
 - Group drivers in renesas subdirectory to prepare for new platform
 - Drop deprecated R-Car H3 ES1.x support
 
 meson:
 - Add support for MIPI DSI displays
 
 virtio:
 - add sync object support
 
 mediatek:
 - Add display binding document for MT6795
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmSc3UwACgkQDHTzWXnE
 hr69fQ/+PF9L7FSB/qfjaoqJnk6wJyCehv7pDX2/UK7FUrW0e4EwNVx4KKIRqO/P
 pKSU9wRlC72ViGgqOYnw0pwzuh45630vWo1stbgxipU2cvM6Ywlq8FiQFdymFe+P
 tLYWe5MR55Y+E9Y+bCrKn2yvQ7v+f6EZ6ITIX7mrXL77Bpxhv58VzmZawkxmw5MV
 vwhSqJaaeeWNoyfSIDdN8Oj9fE6ScTyiA0YisOP6jnK/TiQofXQxFrMIdKctCcoA
 HjolfEEPVCDOSBipkV3hLiyN8lXmt47BmuHp9opSL/g1aASteVeD1/GrccTaA4xV
 ah+Jx1hBLcH5sm8CZzbCcHhNu3ILnPCFZFCx8gwflQqmDIOZvoMdL75j7lgqJZG8
 TePEiifG3kYO/ZiDc5TUBdeMfbgeehPOsxbvOlA3LxJrgyxe/5o9oejX2Uvvzhoq
 9fno1PLqeCILqYaMiCocJwyTw/2VKYCCH7Wiypd4o3h0nmAbbqPT3KeZgNOjoa2X
 GXpiIU9rTQ8LZgSmOXdCt2rc9Jb6q+eCiDgrZzAukbP8veQyOvO16Nx1+XzLhOYc
 BfjEOoA7nBJD+UPLWkwj42gKtoEWN7IOMTHgcK11d8jdpGISGupl/1nntGhYk0jO
 +3RRZXMB/Gjwe9ge4K9bFC81pbfuAE7ELQtPsgV9LapMmWHKccY=
 =FmUA
 -----END PGP SIGNATURE-----

Merge tag 'drm-next-2023-06-29' of git://anongit.freedesktop.org/drm/drm

Pull drm updates from Dave Airlie:
 "There is one set of patches to misc for a i915 gsc/mei proxy driver.

  Otherwise it's mostly amdgpu/i915/msm, lots of hw enablement and lots
  of refactoring.

  core:
   - replace strlcpy with strscpy
   - EDID changes to support further conversion to struct drm_edid
   - Move i915 DSC parameter code to common DRM helpers
   - Add Colorspace functionality

  aperture:
   - ignore framebuffers with non-primary devices

  fbdev:
   - use fbdev i/o helpers
   - add Kconfig options for fb_ops helpers
   - use new fb io helpers directly in drivers

  sysfs:
   - export DRM connector ID

  scheduler:
   - Avoid an infinite loop

  ttm:
   - store function table in .rodata
   - Add query for TTM mem limit
   - Add NUMA awareness to pools
   - Export ttm_pool_fini()

  bridge:
   - fsl-ldb: support i.MX6SX
   - lt9211, lt9611: remove blanking packets
   - tc358768: implement input bus formats, devm cleanups
   - ti-snd65dsi86: implement wait_hpd_asserted
   - analogix: fix endless probe loop
   - samsung-dsim: support swapped clock, fix enabling, support var
     clock
   - display-connector: Add support for external power supply
   - imx: Fix module linking
   - tc358762: Support reset GPIO

  panel:
   - nt36523: Support Lenovo J606F
   - st7703: Support Anbernic RG353V-V2
   - InnoLux G070ACE-L01 support
   - boe-tv101wum-nl6: Improve initialization
   - sharp-ls043t1le001: Mode fixes
   - simple: BOE EV121WXM-N10-1850, S6D7AA0
   - Ampire AM-800480L1TMQW-T00H
   - Rocktech RK043FN48H
   - Starry himax83102-j02
   - Starry ili9882t

  amdgpu:
   - add new ctx query flag to handle reset better
   - add new query/set shadow buffer for rdna3
   - DCN 3.2/3.1.x/3.0.x updates
   - Enable DC_FP on loongarch
   - PCIe fix for RDNA2
   - improve DC FAMS/SubVP support for better power management
   - partition support for lots of engines
   - Take NUMA into account when allocating memory
   - Add new DRM_AMDGPU_WERROR config parameter to help with CI
   - Initial SMU13 overdrive support
   - Add support for new colorspace KMS API
   - W=1 fixes

  amdkfd:
   - Query TTM mem limit rather than hardcoding it
   - GC 9.4.3 partition support
   - Handle NUMA for partitions
   - Add debugger interface for enabling gdb
   - Add KFD event age tracking

  radeon:
   - Fix possible UAF

  i915:
   - new getparam for PXP support
   - GSC/MEI proxy driver
   - Meteorlake display enablement
   - avoid clearing preallocated framebuffers with TTM
   - implement framebuffer mmap support
   - Disable sampler indirect state in bindless heap
   - Enable fdinfo for GuC backends
   - GuC loading and firmware table handling fixes
   - Various refactors for multi-tile enablement
   - Define MOCS and PAT tables for MTL
   - GSC/MEI support for Meteorlake
   - PMU multi-tile support
   - Large driver kernel doc cleanup
   - Allow VRR toggling and arbitrary refresh rates
   - Support async flips on linear buffers on display ver 12+
   - Expose CRTC CTM property on ILK/SNB/VLV
   - New debugfs for display clock frequencies
   - Hotplug refactoring
   - Display refactoring
   - I915_GEM_CREATE_EXT_SET_PAT for Mesa on Meteorlake
   - Use large rings for compute contexts
   - HuC loading for MTL
   - Allow user to set cache at BO creation
   - MTL powermanagement enhancements
   - Switch to dedicated workqueues to stop using flush_scheduled_work()
   - Move display runtime init under display/
   - Remove 10bit gamma on desktop gen3 parts, they don't support it

  habanalabs:
   - uapi: return 0 for user queries if there was a h/w or f/w error
   - Add pci health check when we lose connection with the firmware.
     This can be used to distinguish between pci link down and firmware
     getting stuck.
   - Add more info to the error print when TPC interrupt occur.
   - Firmware fixes

  msm:
   - Adreno A660 bindings
   - SM8350 MDSS bindings fix
   - Added support for DPU on sm6350 and sm6375 platforms
   - Implemented tearcheck support to support vsync on SM150 and newer
     platforms
   - Enabled missing features (DSPP, DSC, split display) on sc8180x,
     sc8280xp, sm8450
   - Added support for DSI and 28nm DSI PHY on MSM8226 platform
   - Added support for DSI on sm6350 and sm6375 platforms
   - Added support for display controller on MSM8226 platform
   - A690 GPU support
   - Move cmdstream dumping out of fence signaling path
   - a610 support
   - Support for a6xx devices without GMU

  nouveau:
   - NULL ptr before deref fixes

  armada:
   - implement fbdev emulation as client

  sun4i:
   - fix mipi-dsi dotclock
   - release clocks

  vc4:
   - rgb range toggle property
   - BT601 / BT2020 HDMI support

  vkms:
   - convert to drmm helpers
   - add reflection and rotation support
   - fix rgb565 conversion

  gma500:
   - fix iomem access

  shmobile:
   - support renesas soc platform
   - enable fbdev

  mxsfb:
   - Add support for i.MX93 LCDIF

  stm:
   - dsi: Use devm_ helper
   - ltdc: Fix potential invalid pointer deref

  renesas:
   - Group drivers in renesas subdirectory to prepare for new platform
   - Drop deprecated R-Car H3 ES1.x support

  meson:
   - Add support for MIPI DSI displays

  virtio:
   - add sync object support

  mediatek:
   - Add display binding document for MT6795"

* tag 'drm-next-2023-06-29' of git://anongit.freedesktop.org/drm/drm: (1791 commits)
  drm/i915: Fix a NULL vs IS_ERR() bug
  drm/i915: make i915_drm_client_fdinfo() reference conditional again
  drm/i915/huc: Fix missing error code in intel_huc_init()
  drm/i915/gsc: take a wakeref for the proxy-init-completion check
  drm/msm/a6xx: Add A610 speedbin support
  drm/msm/a6xx: Add A619_holi speedbin support
  drm/msm/a6xx: Use adreno_is_aXYZ macros in speedbin matching
  drm/msm/a6xx: Use "else if" in GPU speedbin rev matching
  drm/msm/a6xx: Fix some A619 tunables
  drm/msm/a6xx: Add A610 support
  drm/msm/a6xx: Add support for A619_holi
  drm/msm/adreno: Disable has_cached_coherent in GMU wrapper configurations
  drm/msm/a6xx: Introduce GMU wrapper support
  drm/msm/a6xx: Move CX GMU power counter enablement to hw_init
  drm/msm/a6xx: Extend and explain UBWC config
  drm/msm/a6xx: Remove both GBIF and RBBM GBIF halt on hw init
  drm/msm/a6xx: Add a helper for software-resetting the GPU
  drm/msm/a6xx: Improve a6xx_bus_clear_pending_transactions()
  drm/msm/a6xx: Move a6xx_bus_clear_pending_transactions to a6xx_gpu
  drm/msm/a6xx: Move force keepalive vote removal to a6xx_gmu_force_off()
  ...
2023-06-29 11:00:17 -07:00
Thomas Zimmermann
71e801b9b4 drm: Clear fd/handle callbacks in struct drm_driver
Clear all assignments of struct drm_driver's fd/handle callbacks to
drm_gem_prime_fd_to_handle() and drm_gem_prime_handle_to_fd(). These
functions are called by default. Add a TODO item to convert vmwgfx
to the defaults as well.

v2:
	* remove TODO item (Zack)
	* also update amdgpu's amdgpu_partition_driver

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Simon Ser <contact@emersion.fr>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Jeffrey Hugo <quic_jhugo@quicinc.com> # qaic
Link: https://patchwork.freedesktop.org/patch/msgid/20230620080252.16368-3-tzimmermann@suse.de
2023-06-26 11:08:41 +02:00
Pranjal Ramajor Asha Kanojiya
8d0d16a3ef accel/qaic: Call DRM helper function to destroy prime GEM
smatch warning:
	drivers/accel/qaic/qaic_data.c:620 qaic_free_object() error:
		dereferencing freed memory 'obj->import_attach'

obj->import_attach is detached and freed using dma_buf_detach().
But used after free to decrease the dmabuf ref count using
dma_buf_put().

drm_prime_gem_destroy() handles this issue and performs the proper clean
up instead of open coding it in the driver.

Fixes: ff13be830333 ("accel/qaic: Add datapath")
Reported-by: Sukrut Bellary <sukrut.bellary@linux.com>
Closes: https://lore.kernel.org/all/20230610021200.377452-1-sukrut.bellary@linux.com/
Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy@quicinc.com>
Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230614161528.11710-1-quic_jhugo@quicinc.com
2023-06-20 08:07:29 -06:00
Thomas Zimmermann
de8a334f21 Merge drm/drm-next into drm-misc-next
Backmerging into drm-misc-next to get commit 2c1c7ba457d4
("drm/amdgpu: support partition drm devices"), which is required to fix
commit 0adec22702d4 ("drm: Remove struct drm_driver.gem_prime_mmap").

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2023-06-19 16:33:14 +02:00
Thomas Zimmermann
0adec22702 drm: Remove struct drm_driver.gem_prime_mmap
All drivers initialize this field with drm_gem_prime_mmap(). Call
the function directly and remove the field. Simplifies the code and
resolves a long-standing TODO item.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230613150441.17720-3-tzimmermann@suse.de
2023-06-19 13:56:40 +02:00
Dave Airlie
cce3b573a5 Linux 6.4-rc7
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmSPcdMeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGWrQH/3KmuvZsWMC4PpJY
 VcF9VfF9i+Zv7DoG8sjD5VpNh47e87RsR6WNOFnKol5SUrM6vsBAb5i2rfQahNIv
 NSj0fPCE4/Nj9LMecKVC9WD8CitxYdbR+CF9Is21AQj1VihUl9eHXGcAWxuaMyhk
 TjPUwmbOOsRVMXXdGJzjX78cvLsxqpSv8A/5OTh16IBimbh7p+YjKJFkbfj/PMWf
 aF1quFkIEXgzJcHCpP6KDZHr2KbpY+jIN9hUENnGKJxHYNso5u+KrIW1kAm8meP1
 x26ETSquM0T70OAzovOWg+BeVkLDac/3Rh30ztLAI4AtajrlSzycvFsU9UNEJCc2
 BnM2IZI=
 =ANT5
 -----END PGP SIGNATURE-----

Backmerge tag 'v6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux into drm-next

Linux 6.4-rc7

Need this to pull in the msm work.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2023-06-19 16:01:25 +10:00
Jeffrey Hugo
61d8cdb787 accel/qaic: Fix NULL pointer deref in qaic_destroy_drm_device()
If qaic_destroy_drm_device() is called before the device has fully
initialized it will cause a NULL pointer dereference as the drm device
has not yet been created. Fix this with a NULL check.

Fixes: c501ca23a6a3 ("accel/qaic: Add uapi and core driver file")
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
Reviewed-by: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy@quicinc.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230602210440.8411-3-quic_jhugo@quicinc.com
2023-06-09 11:07:28 -06:00
Carl Vanderlip
3e1b9b2d81 accel/qaic: Free user handle on interrupted mutex
After user handle is allocated, if mutex is interrupted, we do not free
the user handle and return an error. Kref had been initialized, but not
added to users list, so device teardown would also not call free_usr.

Fixes: c501ca23a6a3 ("accel/qaic: Add uapi and core driver file")
Signed-off-by: Carl Vanderlip <quic_carlv@quicinc.com>
Reviewed-by: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy@quicinc.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230602210440.8411-2-quic_jhugo@quicinc.com
2023-06-09 11:06:33 -06:00
Dani Liberman
e6f49e96bc accel/habanalabs: refactor error info reset
Moved error info reset code to single function for future use from
other places in the driver.

Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-06-08 12:35:56 +03:00
Ofir Bitton
fac91dd54f accel/habanalabs: add event queue extra validation
In order to increase reliability of the event queue interface,
we apply to Gaudi2 the same mechanism we have in Gaudi1.
The extra validation is basically checking that the received
event index matches the expected index.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-06-08 12:35:56 +03:00
Ofir Bitton
19aa21b980 accel/habanalabs: unsecure TSB_CFG_MTRR regs
In order to utilize Engine Barrier padding, user must have access to
this register set.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-06-08 12:35:56 +03:00
Oded Gabbay
ff5c702522 accel/habanalabs: move ioctl error print to debug level
We don't want to allow users to spam the kernel log and sending
ioctls with bad opcodes is a sure way to do it.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Reviewed-by: Ofir Bitton <obitton@habana.ai>
2023-06-08 12:35:56 +03:00
Ofir Bitton
8a20b38164 accel/habanalabs: fix bug of not fetching addr_dec info
addr_dec info should always be fetched, regardless of cause value.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-06-08 12:35:56 +03:00
Oded Gabbay
569210233a accel/habanalabs: remove sim code
There were a few places where simulator only code got into the upstream.
Remove those places that can confuse other developers.

Fixes: 2a0a839b6a28 ("habanalabs: extend fatal messages to contain PCI info")
Cc: Moti Haimovski <mhaimovski@habana.ai>
Cc: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-06-08 12:35:56 +03:00
Dani Liberman
5d658d0c51 accel/habanalabs: mask part of hmmu page fault captured address
When receiving page fault from hmmu, the captured address is scrambled
both by HW and by driver. The driver part is unscrambled but the HW
part isn't getting unscrambled.
To avoid declaring wrong address, the HW scrambled part will be
masked.

Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-06-08 12:35:56 +03:00
Koby Elbaz
7e63f317c0 accel/habanalabs: update state when loading boot fit
Any FW component we load must be followed by a corresponding state
update. However, it seems that so far we skipped doing so for the
bootfit case, so fix that.

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-06-08 12:35:56 +03:00
Tomer Tayar
6092cedfff accel/habanalabs: print qman data on error only for lower qman
By default, the upper QMANs are not used, and instead engines ARCs
access the lower QMANs directly.
Errors for upper QMANs are therefore not expected, and the debug print
of the PQ entries is not needed.

Modify the QMAN debug data print on errors to include only information
for the lower QMAN.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-06-08 12:35:56 +03:00
Tomer Tayar
54381ee809 accel/habanalabs: use lower QM in QM errors handling
The QMAN GLBL_ERR_STS_4 register has indications for errors also in the
lower CQ and the ARC CQ, and not just for errors in the lower CP.
Modify the relevant define/struct and the related print to use "lower
QM" instead of "lower CP".

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-06-08 12:35:55 +03:00
Dani Liberman
dcc8fa88d4 accel/habanalabs: use binning info when handling razwi
When receiving sei interrupt from tpc or decoder, we need to check
the binning mask because if the engine is binned, the razwi info
won't be in the router of the binned engine, instead will be in the
router of the substitute engine.

Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-06-08 12:35:55 +03:00
Ofir Bitton
583f12a80d accel/habanalabs: remove support for mmu disable
As mmu disable mode is only used for bring-up stages, let's remove this
option and all code related to it.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-06-08 12:35:55 +03:00
Koby Elbaz
b2d61fecb4 accel/habanalabs: upon DMA errors, use FW-extracted error cause
Initially, the driver used to read the error cause data directly from
the ASIC. However, the FW now clears it before the driver could read
it. Therefore we should use the error cause data that is extracted by
the FW.

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-06-08 12:35:55 +03:00
Oded Gabbay
adda800c04 accel/habanalabs: print max timeout value on CS stuck
If a workload got stuck, we print an error to the kernel log about it.
Add to that print the configured max timeout value, as that value is
not fixed between ASICs and in addition it can be configured using
a kernel module parameter.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Reviewed-by: Ofir Bitton <obitton@habana.ai>
2023-06-08 12:35:55 +03:00
Oded Gabbay
dcfce96ee8 accel/habanalabs: align to latest firmware specs
Update the firmware common interface files with the latest version.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Reviewed-by: Ofir Bitton <obitton@habana.ai>
2023-06-08 12:35:55 +03:00
Moti Haimovski
314a7ffd7c accel/habanalabs: fix mem leak in capture user mappings
This commit fixes a memory leak caused when clearing the user_mappings
info when a new context is opened immediately after user_mapping is
captured and a hard reset is performed.

Signed-off-by: Moti Haimovski <mhaimovski@habana.ai>
Reviewed-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-06-08 12:35:55 +03:00
Oded Gabbay
e715008b7c accel/habanalabs: set unused bit as reserved
Get latest f/w gaudi2 interface file which marks unused
bist_need_iatu_config bit in cold_rst_data structure as reserved bit.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Reviewed-by: Ofir Bitton <obitton@habana.ai>
2023-06-08 12:35:55 +03:00
Koby Elbaz
964234aba5 accel/habanalabs: rename security functions related arguments
Make the argument names specify the registers array represent
registers that should be unsecured so the user can access them.

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-06-08 12:35:55 +03:00
Dan Carpenter
9ec7639b5e accel/habanalabs: fix gaudi2_get_tpc_idle_status() return
The gaudi2_get_tpc_idle_status() function returned the incorrect variable
so it always returned true.

Fixes: d85f0531b928 ("accel/habanalabs: break is_idle function into per-engine sub-routines")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-06-08 12:35:55 +03:00
Yang Li
cc1eeaa335 accel/habanalabs: Fix some kernel-doc comments
Make the description of @regs_range_array and @regs_range_array_size
to @user_regs_range_array and @user_regs_range_array_size  to silence
the warnings:

drivers/accel/habanalabs/common/security.c:506: warning: Function parameter or member 'user_regs_range_array' not described in 'hl_init_pb_ranges_single_dcore'
drivers/accel/habanalabs/common/security.c:506: warning: Function parameter or member 'user_regs_range_array_size' not described in 'hl_init_pb_ranges_single_dcore'
drivers/accel/habanalabs/common/security.c:506: warning: Excess function parameter 'regs_range_array' description in 'hl_init_pb_ranges_single_dcore'
drivers/accel/habanalabs/common/security.c:506: warning: Excess function parameter 'regs_range_array_size' description in 'hl_init_pb_ranges_single_dcore'

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=4940
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-06-08 12:35:54 +03:00
Ofir Bitton
d0dcd4bbfa accel/habanalabs: always fetch pci addr_dec error info
Due to missing indication of address decode source (LBW/HBW bus),
we should always try and fetch extended information.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-06-08 12:35:54 +03:00
Koby Elbaz
7d21296336 accel/habanalabs: fix a static warning - 'dubious: x & !y'
Use a straight forward approach to get a conditional result.

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-06-08 12:35:54 +03:00
Koby Elbaz
67d19a2f49 accel/habanalabs: poll for device status update following WFE cmd
Currently, we rely on COMMS protocol's ack to verify that WFE command
has been acknowledged by the FW. However, this does not guarantee that
the device status has been updated.
Although unlikely, this could trigger a race since the driver expects
the device to be halted at that stage, but it might not be.
Therefore, we increase WFE's robustness by polling on the status
register that will be updated once the device is actually halted.

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-06-08 12:35:54 +03:00
Tomer Tayar
3b9abb4fa6 accel/habanalabs: expose debugfs files later
Currently the debugfs root folder and files for a device are created at
an early step, before the device initialization and before the char
device and sysfs files are exposed to user.
As there is no real reason not to do it together with the device
creation, postpone it to be done right afterwards.

The initialization of the debugfs entry structure is left in its
current position because it is used before creating the files.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-06-08 12:35:54 +03:00
Ofir Bitton
d8b9cea584 accel/habanalabs: add pci health check during heartbeat
Currently upon a heartbeat failure, we don't know if the failure
is due to firmware hang or due to a bad PCI link. Hence, we
are reading a PCI config space register with a known value (vendor ID)
so we will know which of the two possibilities caused the heartbeat
failure.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-06-08 12:35:54 +03:00
Dafna Hirschfeld
3d21ec6424 accel/habanalabs: add missing tpc interrupt info
For some reason the last possible tpc interrupt cause in
gaudi2_tpc_interrupts_cause is missing from the code.

Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-06-08 12:35:54 +03:00
Koby Elbaz
9a4e44a4ee accel/habanalabs: refactor abort of completions and waits
Aborting CS completions should be in command_submission.c but aborting
waiting for user interrupts should be in device.c.

This separation is also for adding more abort operations in the future.

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-06-08 12:35:54 +03:00
Koby Elbaz
ad8bfd3619 accel/habanalabs: minimize encapsulation signal mutex lock time
Sync Stream Encapsulated Signal Handlers can be managed from different
contexts, and as such they are protected via a spin_lock.
However, spin_lock was unnecessarily protecting a larger code section
than really needed, covering a sleepable code section as well.
Since spin_lock disables preemption, it could lead to sleeping in
atomic context.

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-06-08 12:35:54 +03:00
Andrzej Kacprowski
a3efabee58 accel/ivpu: Fix sporadic VPU boot failure
Wait for AON bit in HOST_SS_CPR_RST_CLR to return 0 before
starting VPUIP power up sequence, otherwise the VPU device
may sporadically fail to boot.

An error in power up sequence is propagated to the runtime
power management - the device will be in an error state
until the VPU driver is reloaded.

Fixes: 35b137630f08 ("accel/ivpu: Introduce a new DRM driver for Intel VPU")
Cc: stable@vger.kernel.org # 6.3.x
Signed-off-by: Andrzej Kacprowski <andrzej.kacprowski@linux.intel.com>
Reviewed-by: Krystian Pradzynski <krystian.pradzynski@linux.intel.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230607094502.388489-1-stanislaw.gruszka@linux.intel.com
2023-06-08 08:17:27 +02:00
Stanislaw Gruszka
b563e47957 accel/ivpu: Do not use mutex_lock_interruptible
If we get signal when waiting for the mmu->lock we do not invalidate
current MMU configuration that might result in undefined behavior.

Additionally there is little or no benefit on break waiting for
ipc->lock. In current code base, we keep this lock for short periods.

Fixes: 263b2ba5fc93 ("accel/ivpu: Add Intel VPU MMU support")
Reviewed-by: Krystian Pradzynski <krystian.pradzynski@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230525103818.877590-2-stanislaw.gruszka@linux.intel.com
2023-06-08 08:15:46 +02:00
Andrzej Kacprowski
9f7e3611f6 accel/ivpu: Do not trigger extra VPU reset if the VPU is idle
Turning off the PLL and entering D0i3 will reset the VPU so
an explicit IP reset is redundant.
But if the VPU is active, it may interfere with PLL disabling
and to avoid that, we have to issue an additional IP reset
to silence the VPU before turning off the PLL.

Fixes: a8fed6d1e0b9 ("accel/ivpu: Fix power down sequence")
Cc: stable@vger.kernel.org # 6.3.x
Signed-off-by: Andrzej Kacprowski <andrzej.kacprowski@linux.intel.com>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230525103818.877590-1-stanislaw.gruszka@linux.intel.com
2023-06-08 08:15:30 +02:00
Karol Wachowski
95d440188d accel/ivpu: Mark 64 kB contiguous areas as contiguous in PTEs
Whenever KMD maps region larger than 64kB that is both aligned and
contiguous, set contiguous bit (52) in MMU PTE descriptor for each page
in that region.

This allows to treat 16 contiguous pages as one and reduce
number of MMU page walks required which results in lower latency.

Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230518131605.650622-6-stanislaw.gruszka@linux.intel.com
2023-06-08 07:54:00 +02:00
Karol Wachowski
103d2ea139 accel/ivpu: Rename and cleanup MMU600 page tables
Simplify and unify naming convention in MMU600 page tables
configuration.

All DMA addresses in page tables directly accessed by VPU are called
with _dma sufix and all CPU pointers to those page tables have _ptr
sufix.

Base pointers used to do a page walk on the CPU have corresponding
names:

 pud_ptrs (pointers used to get access to PUD DMA)
 pmd_ptrs (pointers used to get access to PMD DMA)
 pte_ptrs (pointers used to get access to PTE DMA)

with the following convention:

 u64 *pud_dma_ptr = pud_ptrs[pgd_idx];
 *pud_dma_ptr = pud_dma;

 u64 *pmd_dma_ptr = pmd_ptrs[pgd_idx][pud_idx];
 *pmd_dma_ptr = pmd_dma;

 u64 *pte_dma_ptr = pte_ptrs[pgd_idx][pud_idx][pmd_idx];
 *pte_dma_ptr = pte_dma;

On the way change to coherent dma allocation, _wc is only valid on ARM
and was used by mistake.

Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230518131605.650622-5-stanislaw.gruszka@linux.intel.com
2023-06-08 07:53:51 +02:00
Karol Wachowski
a4172d6cf0 accel/ivpu: Make DMA bit mask HW specific
Future devices will have different dma bit mask, make it hw specific.

Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230518131605.650622-4-stanislaw.gruszka@linux.intel.com
2023-06-08 07:53:40 +02:00
Karol Wachowski
a2fd4a6fae accel/ivpu: Add MMU support for 4 level page mappings
Program additional fourth level required for mappings with VA above 38bits.

Co-developed-by: Raymond Tan <raymond.tan@intel.com>
Signed-off-by: Raymond Tan <raymond.tan@intel.com>
Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230518131605.650622-3-stanislaw.gruszka@linux.intel.com
2023-06-08 07:53:33 +02:00
Karol Wachowski
cab032239a accel/ivpu: Remove configuration of MMU TBU1 and TBU3
MTL HW only uses StreamId0 and StreamId3 that map to TBU0 and TBU2.

Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230518131605.650622-2-stanislaw.gruszka@linux.intel.com
2023-06-08 07:53:05 +02:00
Christophe JAILLET
9230d5dcb2 accel/ivpu: Use struct_size()
Use struct_size() instead of hand-writing it. It is less verbose, more
robust and more informative.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Marco Pagani <marpagan@redhat.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/0ae53be873c27c9a8740c4fe6d8e7cd1b1224994.1685366864.git.christophe.jaillet@wanadoo.fr
2023-06-08 07:46:51 +02:00