linux

iv/linux

Author	SHA1	Message	Date
Jani Nikula	d09aa85258	drm/i915: move i915_coherent_map_type() to i915_gem_pages.c and un-inline The inline function has no place in i915_drv.h. Move it away, un-inline, and untangle some header dependencies while at it. Cc: Matthew Auld <matthew.auld@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Acked-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220914163514.1837467-1-jani.nikula@intel.com	2022-09-26 12:21:08 +03:00
Matthew Auld	8676145eb2	drm/i915/ttm: fix CCS handling Crucible + recent Mesa seems to sometimes hit: GEM_BUG_ON(num_ccs_blks > NUM_CCS_BLKS_PER_XFER) And it looks like we can also trigger this with gem_lmem_swapping, if we modify the test to use slightly larger object sizes. Looking closer it looks like we have the following issues in migrate_copy(): - We are using plain integer in various places, which we can easily overflow with a large object. - We pass the entire object size (when the src is lmem) into emit_pte() and then try to copy it, which doesn't work, since we only have a few fixed sized windows in which to map the pages and perform the copy. With an object > 8M we therefore aren't properly copying the pages. And then with an object > 64M we trigger the GEM_BUG_ON(num_ccs_blks > NUM_CCS_BLKS_PER_XFER). So it looks like our copy handling for any object > 8M (which is our CHUNK_SZ) is currently broken on DG2. Fixes: da0595ae91da ("drm/i915/migrate: Evict and restore the flatccs capable lmem obj") Testcase: igt@gem_lmem_swapping Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Ramalingam C <ramalingam.c@intel.com> Reviewed-by: Ramalingam C<ramalingam.c@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220805132240.442747-2-matthew.auld@intel.com	2022-08-09 11:36:51 +01:00
Matthew Auld	dba4d442be	drm/i915/ttm: remove calc_ctrl_surf_instr_size We only ever need to emit one ccs block copy command. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Ramalingam C <ramalingam.c@intel.com> Reviewed-by: Ramalingam C<ramalingam.c@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220805132240.442747-1-matthew.auld@intel.com	2022-08-09 11:36:19 +01:00
Matthew Auld	353819d85f	drm/i915/ttm: don't leak the ccs state The kernel only manages the ccs state with lmem-only objects, however the kernel should still take care not to leak the CCS state from the previous user. Fixes: 48760ffe923a ("drm/i915/gt: Clear compress metadata for Flat-ccs objects") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Ramalingam C <ramalingam.c@intel.com> Reviewed-by: Ramalingam C <ramalingam.c@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220727164346.282407-1-matthew.auld@intel.com	2022-07-28 11:37:13 +01:00
Jason Wang	2be1959ece	drm/i915/gt: Remove unneeded semicolon The semicolon after the `}' is unneeded. Signed-off-by: Jason Wang <wangborong@cdjrlc.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> [Removed line mention when pushing] Link: https://patchwork.freedesktop.org/patch/msgid/20220716184439.72056-1-wangborong@cdjrlc.com	2022-07-21 04:06:44 -04:00
Ramalingam C	6e29832f61	drm/i915/gt: Document the eviction of the Flat-CCS objects Capture the eviction details for Flat-CCS capable, lmem objects. v2: Fix the Flat-ccs capbility of lmem obj with smem residency possibility [Thomas] v3: Fixed the suggestions [Matt] Signed-off-by: Ramalingam C <ramalingam.c@intel.com> cc: Thomas Hellstrom <thomas.hellstrom@linux.intel.com> cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220502142618.2704-4-ramalingam.c@intel.com	2022-05-03 07:42:09 +05:30
Ramalingam C	b8c9d486af	drm/i915/gt: optimize the ccs_sz calculation per chunk Calculate the ccs_sz that needs to be emitted based on the src and dst pages emitted per chunk. And handle the return value of emit_pte for the ccs pages. v2: ccs_sz moved to the reduced scope [Matt] Signed-off-by: Ramalingam C <ramalingam.c@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220502142618.2704-3-ramalingam.c@intel.com	2022-05-03 07:42:02 +05:30
Ramalingam C	da0595ae91	drm/i915/migrate: Evict and restore the flatccs capable lmem obj When we are swapping out the local memory obj on flat-ccs capable platform, we need to capture the ccs data too along with main meory and we need to restore it when we are swapping in the content. When lmem object is swapped into a smem obj, smem obj will have the extra pages required to hold the ccs data corresponding to the lmem main memory. So main memory of lmem will be copied into the initial pages of the smem and then ccs data corresponding to the main memory will be copied to the subsequent pages of smem. ccs data is 1/256 of lmem size. Swapin happens exactly in reverse order. First main memory of lmem is restored from the smem's initial pages and the ccs data will be restored from the subsequent pages of smem. Extracting and restoring the CCS data is done through a special cmd called XY_CTRL_SURF_COPY_BLT v2: Fixing the ccs handling v3: Handle the ccs data at same loop as main memory [Thomas] v4: changes for emit_copy_ccs v5: handle non-flat-ccs scenario Signed-off-by: Ramalingam C <ramalingam.c@intel.com> Reviewed-by: Thomas Hellstrom <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220405150840.29351-10-ramalingam.c@intel.com	2022-04-14 13:20:29 +05:30
Ramalingam C	48760ffe92	drm/i915/gt: Clear compress metadata for Flat-ccs objects Xe-HP and latest devices support Flat CCS which reserved a portion of the device memory to store compression metadata, during the clearing of device memory buffer object we also need to clear the associated CCS buffer. XY_CTRL_SURF_COPY_BLT is a BLT cmd used for reading and writing the ccs surface of a lmem memory. So on Flat-CCS capable platform we use XY_CTRL_SURF_COPY_BLT to clear the CCS meta data. v2: Fixed issues with platform naming [Lucas] v3: Rebased [Ram] Used the round_up funcs [Bob] v4: Fixed ccs blk calculation [Ram] Added Kdoc on flat-ccs. v5: GENMASK is used [Matt] mocs fix [Matt] Comments Fix [Matt] Flush address programming [Ram] v6: FLUSH_DW is fixed Few coding style fix v7: Adopting the XY_FAST_COLOR_BLT (Thomas] v8: XY_CTRL_SURF_COPY_BLT for ccs clearing. v9: emit_copy_ccs is used. v10: ctrl_surf cmds are filled in caller itself. [Thomas] only one ctrl surf cmd is used as size of lmem is <=8M [Thomas] Signed-off-by: Ramalingam C <ramalingam.c@intel.com> Signed-off-by: Ayaz A Siddiqui <ayaz.siddiqui@intel.com> Reviewed-by: Thomas Hellstrom <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220405150840.29351-6-ramalingam.c@intel.com	2022-04-14 13:20:29 +05:30
Ramalingam C	310bf25df2	drm/i915/gt: Pass the -EINVAL when emit_pte doesn't update any PTE When emit_pte doesn't update any PTE with return value as 0, interpret it as -EINVAL. v2: Add missing goto [Thomas] Signed-off-by: Ramalingam C <ramalingam.c@intel.com> Reviewed-by: Thomas Hellstrom <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220405150840.29351-5-ramalingam.c@intel.com	2022-04-14 13:20:18 +05:30
Ramalingam C	6e6bc8c0a8	drm/i915/gt: Optimize the migration and clear loop Move the static calculations out of the loops for copy and clear. v2: Fix the loss of proper error code on emit_pte Signed-off-by: Ramalingam C <ramalingam.c@intel.com> Reviewed-by: Thomas Hellstrom <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220405150840.29351-4-ramalingam.c@intel.com	2022-04-14 13:20:08 +05:30
Ramalingam C	a0ed9c95cc	drm/i915/gt: Use XY_FAST_COLOR_BLT to clear obj on graphics ver 12+ Use faster XY_FAST_COLOR_BLT cmd on graphics version of 12 and more, for clearing (Zero out) the pages of the newly allocated object. XY_FAST_COLOR_BLT is faster than the older XY_COLOR_BLT. v2: Typo fix at title [Thomas] v3: XY_FAST_COLOR_BLT is used only for FLAT_CCS capable gen12+ Signed-off-by: Ramalingam C <ramalingam.c@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Thomas Hellstrom <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220405150840.29351-3-ramalingam.c@intel.com	2022-04-14 13:17:05 +05:30
Ramalingam C	fd5803e5ee	drm/i915/gt: use engine instance directly for offset To make it uniform across copy and clear, use the engine offset directly to calculate the offset in the cmd forming for emit_clear. Signed-off-by: Ramalingam C <ramalingam.c@intel.com> Reviewed-by: Thomas Hellstrom <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220405150840.29351-2-ramalingam.c@intel.com	2022-04-14 13:17:05 +05:30
Matthew Auld	552caa1fdb	drm/i915/migrate: move the sanity check Move the sanity check that both src and dst are never both system memory, which should never happen on discrete, and likely means we have a bug. The only exception is on integrated where we trigger this path in the selftests. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Nirmoy Das <nirmoy.das@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220324172143.377104-2-matthew.auld@intel.com	2022-03-29 09:10:59 +01:00
Matthew Auld	00e27ad85b	drm/i915/migrate: add acceleration support for DG2 This is all kinds of awkward since we now have to contend with using 64K GTT pages when mapping anything in LMEM(including the page-tables themselves). v2(Ram) - Document the ppGTT layout and add a better description for the different windows. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Ramalingam C <ramalingam.c@intel.com> Reviewed-by: Ramalingam C <ramalingam.c@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220218184752.7524-12-ramalingam.c@intel.com	2022-02-19 22:26:47 -08:00
Thomas Hellström	1193081710	drm/i915: Avoid using the i915_fence_array when collecting dependencies Since the gt migration code was using only a single fence for dependencies, these were collected in a dma_fence_array. However, it turns out that it's illegal to use some dma_fences in a dma_fence_array, in particular other dma_fence_arrays and dma_fence_chains, and this causes trouble for us moving forward. Have the gt migration code instead take a const struct i915_deps for dependencies. This means we can skip the dma_fence_array creation and instead pass the struct i915_deps instead to circumvent the problem. v2: - Make the prev_deps() function static. (kernel test robot <lkp@intel.com>) - Update the struct i915_deps kerneldoc. v4: - Rebase. Fixes: 5652df829b3c ("drm/i915/ttm: Update i915_gem_obj_copy_ttm() to be asynchronous") Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211221200050.436316-2-thomas.hellstrom@linux.intel.com	2021-12-22 08:14:30 +01:00
Matthew Auld	31d70749bf	drm/i915/migrate: fix length calculation No need to insert PTEs for the PTE window itself, also foreach expects a length not an end offset, which could be gigantic here with a second engine. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Ramalingam C <ramalingam.c@intel.com> Reviewed-by: Ramalingam C <ramalingam.c@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211206112539.3149779-3-matthew.auld@intel.com	2021-12-08 10:25:26 +00:00
Matthew Auld	08c7c122ad	drm/i915/migrate: fix offset calculation Ensure we add the engine base only after we calculate the qword offset into the PTE window. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Ramalingam C <ramalingam.c@intel.com> Reviewed-by: Ramalingam C <ramalingam.c@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211206112539.3149779-2-matthew.auld@intel.com	2021-12-08 10:25:25 +00:00
Matthew Auld	8eb7fcce34	drm/i915/migrate: don't check the scratch page The scratch page might not be allocated in LMEM(like on DG2), so instead of using that as the deciding factor for where the paging structures live, let's just query the pt before mapping it. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Ramalingam C <ramalingam.c@intel.com> Reviewed-by: Ramalingam C <ramalingam.c@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211206112539.3149779-1-matthew.auld@intel.com	2021-12-08 10:25:24 +00:00
Thomas Hellström	a259cc14ec	drm/i915: Reduce the number of objects subject to memcpy recover We really only need memcpy restore for objects that affect the operability of the migrate context. That is, primarily the page-table objects of the migrate VM. Add an object flag, I915_BO_ALLOC_PM_EARLY for objects that need early restores using memcpy and a way to assign LMEM page-table object flags to be used by the vms. Restore objects without this flag with the gpu blitter and only objects carrying the flag using TTM memcpy. Initially mark the migrate, gt, gtt and vgpu vms to use this flag, and defer for a later audit which vms actually need it. Most importantly, user- allocated vms with pinned page-table objects can be restored using the blitter. Performance-wise memcpy restore is probably as fast as gpu restore if not faster, but using gpu restore will help tackling future restrictions in mappable LMEM size. v4: - Don't mark the aliasing ppgtt page table flags for early resume, but rather the ggtt page table flags as intended. (Matthew Auld) - The check for user buffer objects during early resume is pointless, since they are never marked I915_BO_ALLOC_PM_EARLY. (Matthew Auld) v5: - Mark GuC LMEM objects with I915_BO_ALLOC_PM_EARLY to have them restored before we fire up the migrate context. Cc: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210922062527.865433-8-thomas.hellstrom@linux.intel.com	2021-09-24 08:19:16 +02:00
Dan Carpenter	ff12ce2c9c	drm/i915/gt: Potential error pointer dereference in pinned_context() If the intel_engine_create_pinned_context() function returns an error pointer, then dereferencing "ce" will Oops. Use "vm" instead of "ce->vm". Fixes: cf586021642d ("drm/i915/gt: Pipelined page migration") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210813113600.GC30697@kili	2021-08-20 12:32:38 -04:00
Jason Ekstrand	74e4b90988	drm/i915: Stop storing the ring size in the ring pointer (v3) Previously, we were storing the ring size in the ring pointer before it was actually allocated. We would then guard setting the ring size on checking for CONTEXT_ALLOC_BIT. This is error-prone at best and really only saves us a few bytes on something that already burns at least 4K. Instead, this patch adds a new ring_size field and makes everything use that. v2 (Daniel Vetter): - Replace 512 * SZ_4K with SZ_2M v2 (Jason Ekstrand): - Rebase on top of page migration code Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210708154835.528166-3-jason@jlekstrand.net	2021-07-08 19:43:35 +02:00
Lucas De Marchi	7c517f83fa	drm/i915/gt: finish INTEL_GEN and friends conversion Commit c816723b6b8a ("drm/i915/gt: replace IS_GEN and friends with GRAPHICS_VER") converted INTEL_GEN and friends to the new version check macros. Meanwhile, some changes sneaked in to use INTEL_GEN. Remove the last users so we can remove the macros. Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210707181325.2130821-2-lucas.demarchi@intel.com	2021-07-07 16:29:39 -07:00
Chris Wilson	94ce0d6507	drm/i915/gt: Setup a default migration context on the GT Set up a default migration context on the GT and use it from the selftests. Add a perf selftest and make sure we exercise LMEM if available. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Co-developed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210617063018.92802-10-thomas.hellstrom@linux.intel.com	2021-06-17 14:23:11 +01:00
Chris Wilson	563baae187	drm/i915/gt: Pipelined clear Update the PTE and emit a clear within a single unpreemptible packet such that we can schedule and pipeline clears. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Co-developed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210617063018.92802-9-thomas.hellstrom@linux.intel.com	2021-06-17 14:23:09 +01:00
Chris Wilson	cf58602164	drm/i915/gt: Pipelined page migration If we pipeline the PTE updates and then do the copy of those pages within a single unpreemptible command packet, we can submit the copies and leave them to be scheduled without having to synchronously wait under a global lock. In order to manage migration, we need to preallocate the page tables (and keep them pinned and available for use at any time), causing a bottleneck for migrations as all clients must contend on the limited resources. By inlining the ppGTT updates and performing the blit atomically, each client only owns the PTE while in use, and so we can reschedule individual operations however we see fit. And most importantly, we do not need to take a global lock on the shared vm, and wait until the operation is complete before releasing the lock for others to claim the PTE for themselves. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Co-developed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210617063018.92802-8-thomas.hellstrom@linux.intel.com	2021-06-17 14:23:05 +01:00

26 Commits