linux

iv/linux

Author	SHA1	Message	Date
Jason Gunthorpe	9744a7ab62	iommufd: Rename IOMMUFD_OBJ_HW_PAGETABLE to IOMMUFD_OBJ_HWPT_PAGING To add a new IOMMUFD_OBJ_HWPT_NESTED, rename the HWPT object to confine it to PAGING hwpts/domains. The following patch will separate the hwpt structure as well. Link: https://lore.kernel.org/r/20231026043938.63898-3-yi.l.liu@intel.com Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-10-26 11:15:56 -03:00
Lu Baolu	e82c175e63	Revert "iommu/vt-d: Remove unused function" This reverts commit `c61c255e11`. The pasid_set_wpe() helper, which was removed by the reverted commit, is still used by the nesting translation support in the iommufd tree. To avoid a merge conflict, revert the commit. Link: https://lore.kernel.org/linux-kernel/20231025153455.283c5b12@canb.auug.org.au/ Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20231025131854.375388-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-10-25 17:32:03 +02:00
Nicolin Chen	2ccabf81dd	iommufd: Only enforce cache coherency in iommufd_hw_pagetable_alloc According to the conversation in the following link: https://lore.kernel.org/linux-iommu/20231020135501.GG3952@nvidia.com/ The enforce_cache_coherency should be set/enforced in the hwpt allocation routine. The iommu driver in its attach_dev() op should decide whether to reject or not a device that doesn't match with the configuration of cache coherency. Drop the enforce_cache_coherency piece in the attach/replace() and move the remaining "num_devices" piece closer to the refcount that is using it. Accordingly drop its function prototype in the header and mark it static. Also add some extra comments to clarify the expected behaviors. Link: https://lore.kernel.org/r/20231024012958.30842-1-nicolinc@nvidia.com Suggested-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-10-24 12:56:37 -03:00
Joao Martins	0795b305da	iommufd/selftest: Test IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR flag Change test_mock_dirty_bitmaps() to pass a flag where it specifies the flag under test. The test does the same thing as the GET_DIRTY_BITMAP regular test. Except that it tests whether the dirtied bits are fetched all the same a second time, as opposed to observing them cleared. Link: https://lore.kernel.org/r/20231024135109.73787-19-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-10-24 11:58:44 -03:00
Joao Martins	ae36fe70ce	iommufd/selftest: Test out_capabilities in IOMMU_GET_HW_INFO Enumerate the capabilities from the mock device and test whether it advertises as expected. Include it as part of the iommufd_dirty_tracking fixture. Link: https://lore.kernel.org/r/20231024135109.73787-18-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-10-24 11:58:44 -03:00
Joao Martins	a9af47e382	iommufd/selftest: Test IOMMU_HWPT_GET_DIRTY_BITMAP Add a new test ioctl for simulating the dirty IOVAs in the mock domain, and implement the mock iommu domain ops that get the dirty tracking supported. The selftest exercises the usual main workflow of: 1) Setting dirty tracking from the iommu domain 2) Read and clear dirty IOPTEs Different fixtures will test different IOVA range sizes, that exercise corner cases of the bitmaps. Link: https://lore.kernel.org/r/20231024135109.73787-17-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-10-24 11:58:44 -03:00
Joao Martins	7adf267d66	iommufd/selftest: Test IOMMU_HWPT_SET_DIRTY_TRACKING Change mock_domain to supporting dirty tracking and add tests to exercise the new SET_DIRTY_TRACKING API in the iommufd_dirty_tracking selftest fixture. Link: https://lore.kernel.org/r/20231024135109.73787-16-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-10-24 11:58:44 -03:00
Joao Martins	266ce58989	iommufd/selftest: Test IOMMU_HWPT_ALLOC_DIRTY_TRACKING In order to selftest the iommu domain dirty enforcing implement the mock_domain necessary support and add a new dev_flags to test that the hwpt_alloc/attach_device fails as expected. Expand the existing mock_domain fixture with a enforce_dirty test that exercises the hwpt_alloc and device attachment. Link: https://lore.kernel.org/r/20231024135109.73787-15-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-10-24 11:58:44 -03:00
Joao Martins	e04b23c8d4	iommufd/selftest: Expand mock_domain with dev_flags Expand mock_domain test to be able to manipulate the device capabilities. This allows testing with mockdev without dirty tracking support advertised and thus make sure enforce_dirty test does the expected. To avoid breaking IOMMUFD_TEST UABI replicate the mock_domain struct and thus add an input dev_flags at the end. Link: https://lore.kernel.org/r/20231024135109.73787-14-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-10-24 11:58:44 -03:00
Joao Martins	f35f22cc76	iommu/vt-d: Access/Dirty bit support for SS domains IOMMU advertises Access/Dirty bits for second-stage page table if the extended capability DMAR register reports it (ECAP, mnemonic ECAP.SSADS). The first stage table is compatible with CPU page table thus A/D bits are implicitly supported. Relevant Intel IOMMU SDM ref for first stage table "3.6.2 Accessed, Extended Accessed, and Dirty Flags" and second stage table "3.7.2 Accessed and Dirty Flags". First stage page table is enabled by default so it's allowed to set dirty tracking and no control bits needed, it just returns 0. To use SSADS, set bit 9 (SSADE) in the scalable-mode PASID table entry and flush the IOTLB via pasid_flush_caches() following the manual. Relevant SDM refs: "3.7.2 Accessed and Dirty Flags" "6.5.3.3 Guidance to Software for Invalidations, Table 23. Guidance to Software for Invalidations" PTE dirty bit is located in bit 9 and it's cached in the IOTLB so flush IOTLB to make sure IOMMU attempts to set the dirty bit again. Note that iommu_dirty_bitmap_record() will add the IOVA to iotlb_gather and thus the caller of the iommu op will flush the IOTLB. Relevant manuals over the hardware translation is chapter 6 with some special mention to: "6.2.3.1 Scalable-Mode PASID-Table Entry Programming Considerations" "6.2.4 IOTLB" Select IOMMUFD_DRIVER only if IOMMUFD is enabled, given that IOMMU dirty tracking requires IOMMUFD. Link: https://lore.kernel.org/r/20231024135109.73787-13-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-10-24 11:58:43 -03:00
Joao Martins	421a511a29	iommu/amd: Access/Dirty bit support in IOPTEs IOMMU advertises Access/Dirty bits if the extended feature register reports it. Relevant AMD IOMMU SDM ref[0] "1.3.8 Enhanced Support for Access and Dirty Bits" To enable it set the DTE flag in bits 7 and 8 to enable access, or access+dirty. With that, the IOMMU starts marking the D and A flags on every Memory Request or ATS translation request. It is on the VMM side to steer whether to enable dirty tracking or not, rather than wrongly doing in IOMMU. Relevant AMD IOMMU SDM ref [0], "Table 7. Device Table Entry (DTE) Field Definitions" particularly the entry "HAD". To actually toggle on and off it's relatively simple as it's setting 2 bits on DTE and flush the device DTE cache. To get what's dirtied use existing AMD io-pgtable support, by walking the pagetables over each IOVA, with fetch_pte(). The IOTLB flushing is left to the caller (much like unmap), and iommu_dirty_bitmap_record() is the one adding page-ranges to invalidate. This allows caller to batch the flush over a big span of IOVA space, without the iommu wondering about when to flush. Worthwhile sections from AMD IOMMU SDM: "2.2.3.1 Host Access Support" "2.2.3.2 Host Dirty Support" For details on how IOMMU hardware updates the dirty bit see, and expects from its consequent clearing by CPU: "2.2.7.4 Updating Accessed and Dirty Bits in the Guest Address Tables" "2.2.7.5 Clearing Accessed and Dirty Bits" Quoting the SDM: "The setting of accessed and dirty status bits in the page tables is visible to both the CPU and the peripheral when sharing guest page tables. The IOMMU interlocked operations to update A and D bits must be 64-bit operations and naturally aligned on a 64-bit boundary" .. and for the IOMMU update sequence to Dirty bit, essentially is states: 1. Decodes the read and write intent from the memory access. 2. If P=0 in the page descriptor, fail the access. 3. Compare the A & D bits in the descriptor with the read and write intent in the request. 4. If the A or D bits need to be updated in the descriptor: * Start atomic operation. * Read the descriptor as a 64-bit access. * If the descriptor no longer appears to require an update, release the atomic lock with no further action and continue to step 5. * Calculate the new A & D bits. * Write the descriptor as a 64-bit access. * End atomic operation. 5. Continue to the next stage of translation or to the memory access. Access/Dirty bits readout also need to consider the non-default page-sizes (aka replicated PTEs as mentined by manual), as AMD supports all powers of two (except 512G) page sizes. Select IOMMUFD_DRIVER only if IOMMUFD is enabled considering that IOMMU dirty tracking requires IOMMUFD. Link: https://lore.kernel.org/r/20231024135109.73787-12-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-10-24 11:58:43 -03:00
Joao Martins	134288158a	iommu/amd: Add domain_alloc_user based domain allocation Add the domain_alloc_user op implementation. To that end, refactor amd_iommu_domain_alloc() to receive a dev pointer and flags, while renaming it too, such that it becomes a common function shared with domain_alloc_user() implementation. The sole difference with domain_alloc_user() is that we initialize also other fields that iommu_domain_alloc() does. It lets it return the iommu domain correctly initialized in one function. This is in preparation to add dirty enforcement on AMD implementation of domain_alloc_user. Link: https://lore.kernel.org/r/20231024135109.73787-11-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-10-24 11:58:43 -03:00
Joao Martins	609848132c	iommufd: Add a flag to skip clearing of IOPTE dirty VFIO has an operation where it unmaps an IOVA while returning a bitmap with the dirty data. In reality the operation doesn't quite query the IO pagetables that the PTE was dirty or not. Instead it marks as dirty on anything that was mapped, and doing so in one syscall. In IOMMUFD the equivalent is done in two operations by querying with GET_DIRTY_IOVA followed by UNMAP_IOVA. However, this would incur two TLB flushes given that after clearing dirty bits IOMMU implementations require invalidating their IOTLB, plus another invalidation needed for the UNMAP. To allow dirty bits to be queried faster, add a flag (IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR) that requests to not clear the dirty bits from the PTE (but just reading them), under the expectation that the next operation is the unmap. An alternative is to unmap and just perpectually mark as dirty as that's the same behaviour as today. So here equivalent functionally can be provided with unmap alone, and if real dirty info is required it will amortize the cost while querying. There's still a race against DMA where in theory the unmap of the IOVA (when the guest invalidates the IOTLB via emulated iommu) would race against the VF performing DMA on the same IOVA. As discussed in [0], we are accepting to resolve this race as throwing away the DMA and it doesn't matter if it hit physical DRAM or not, the VM can't tell if we threw it away because the DMA was blocked or because we failed to copy the DRAM. [0] https://lore.kernel.org/linux-iommu/20220502185239.GR8364@nvidia.com/ Link: https://lore.kernel.org/r/20231024135109.73787-10-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-10-24 11:58:43 -03:00
Joao Martins	7623683857	iommufd: Add capabilities to IOMMU_GET_HW_INFO Extend IOMMUFD_CMD_GET_HW_INFO op to query generic iommu capabilities for a given device. Capabilities are IOMMU agnostic and use device_iommu_capable() API passing one of the IOMMU_CAP_*. Enumerate IOMMU_CAP_DIRTY_TRACKING for now in the out_capabilities field returned back to userspace. Link: https://lore.kernel.org/r/20231024135109.73787-9-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-10-24 11:58:43 -03:00
Joao Martins	b9a60d6f85	iommufd: Add IOMMU_HWPT_GET_DIRTY_BITMAP Connect a hw_pagetable to the IOMMU core dirty tracking read_and_clear_dirty iommu domain op. It exposes all of the functionality for the UAPI that read the dirtied IOVAs while clearing the Dirty bits from the PTEs. In doing so, add an IO pagetable API iopt_read_and_clear_dirty_data() that performs the reading of dirty IOPTEs for a given IOVA range and then copying back to userspace bitmap. Underneath it uses the IOMMU domain kernel API which will read the dirty bits, as well as atomically clearing the IOPTE dirty bit and flushing the IOTLB at the end. The IOVA bitmaps usage takes care of the iteration of the bitmaps user pages efficiently and without copies. Within the iterator function we iterate over io-pagetable contigous areas that have been mapped. Contrary to past incantation of a similar interface in VFIO the IOVA range to be scanned is tied in to the bitmap size, thus the application needs to pass a appropriately sized bitmap address taking into account the iova range being passed and page size ... as opposed to allowing bitmap-iova != iova. Link: https://lore.kernel.org/r/20231024135109.73787-8-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-10-24 11:58:43 -03:00
Joao Martins	e2a4b29478	iommufd: Add IOMMU_HWPT_SET_DIRTY_TRACKING Every IOMMU driver should be able to implement the needed iommu domain ops to control dirty tracking. Connect a hw_pagetable to the IOMMU core dirty tracking ops, specifically the ability to enable/disable dirty tracking on an IOMMU domain (hw_pagetable id). To that end add an io_pagetable kernel API to toggle dirty tracking: * iopt_set_dirty_tracking(iopt, [domain], state) The intended caller of this is via the hw_pagetable object that is created. Internally it will ensure the leftover dirty state is cleared /right before/ dirty tracking starts. This is also useful for iommu drivers which may decide that dirty tracking is always-enabled at boot without wanting to toggle dynamically via corresponding iommu domain op. Link: https://lore.kernel.org/r/20231024135109.73787-7-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-10-24 11:58:43 -03:00
Joao Martins	5f9bdbf4c6	iommufd: Add a flag to enforce dirty tracking on attach Throughout IOMMU domain lifetime that wants to use dirty tracking, some guarantees are needed such that any device attached to the iommu_domain supports dirty tracking. The idea is to handle a case where IOMMU in the system are assymetric feature-wise and thus the capability may not be supported for all devices. The enforcement is done by adding a flag into HWPT_ALLOC namely: IOMMU_HWPT_ALLOC_DIRTY_TRACKING .. Passed in HWPT_ALLOC ioctl() flags. The enforcement is done by creating a iommu_domain via domain_alloc_user() and validating the requested flags with what the device IOMMU supports (and failing accordingly) advertised). Advertising the new IOMMU domain feature flag requires that the individual iommu driver capability is supported when a future device attachment happens. Link: https://lore.kernel.org/r/20231024135109.73787-6-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-10-24 11:58:42 -03:00
Joao Martins	13578d4ebe	iommufd/iova_bitmap: Move symbols to IOMMUFD namespace Have the IOVA bitmap exported symbols adhere to the IOMMUFD symbol export convention i.e. using the IOMMUFD namespace. In doing so, import the namespace in the current users. This means VFIO and the vfio-pci drivers that use iova_bitmap_set(). Link: https://lore.kernel.org/r/20231024135109.73787-4-joao.m.martins@oracle.com Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Brett Creeley <brett.creeley@amd.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-10-24 11:58:42 -03:00
Joao Martins	8c9c727b61	vfio: Move iova_bitmap into iommufd Both VFIO and IOMMUFD will need iova bitmap for storing dirties and walking the user bitmaps, so move to the common dependency into IOMMUFD. In doing so, create the symbol IOMMUFD_DRIVER which designates the builtin code that will be used by drivers when selected. Today this means MLX5_VFIO_PCI and PDS_VFIO_PCI. IOMMU drivers will do the same (in future patches) when supporting dirty tracking and select IOMMUFD_DRIVER accordingly. Given that the symbol maybe be disabled, add header definitions in iova_bitmap.h for when IOMMUFD_DRIVER=n Link: https://lore.kernel.org/r/20231024135109.73787-3-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Brett Creeley <brett.creeley@amd.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-10-24 11:58:42 -03:00
Vasant Hegde	cedc811c76	iommu/amd: Remove DMA_FQ type from domain allocation path .. as drivers won't see DMA_FQ any more. See commit `a4fdd97622` ("iommu: Use flush queue capability") for details. Suggested-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/20231016051305.13091-1-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-10-16 09:39:41 +02:00
Gustavo A. R. Silva	9e13ec61de	iommu/virtio: Add __counted_by for struct viommu_request and use struct_size() Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). While there, use struct_size() helper, instead of the open-coded version, to calculate the size for the allocation of the whole flexible structure, including of course, the flexible-array member. This code was found with the help of Coccinelle, and audited and fixed manually. Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Justin Stitt <justinstitt@google.com> Link: https://lore.kernel.org/r/ZSRFW0yDlDo8+at3@work Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-10-16 09:37:31 +02:00
Jingqi Liu	2b437e8045	iommu/vt-d: debugfs: Support dumping a specified page table The original debugfs only dumps all page tables without pasid. With pasid supported, the page table with pasid also needs to be dumped. This patch supports dumping a specified page table in legacy mode or scalable mode with or without a specified pasid. For legacy mode, according to bus number and DEVFN, traverse the root table and context table to get the pointer of page table in the context table entry, then dump the specified page table. For scalable mode, according to bus number, DEVFN and pasid, traverse the root table, context table, pasid directory and pasid table to get the pointer of page table in the pasid table entry, then dump the specified page table.. Examples are as follows: 1) Dump the page table of device "0000:00:1f.0" that only supports legacy mode. $ sudo cat /sys/kernel/debug/iommu/intel/0000:00:1f.0/domain_translation_struct 2) Dump the page table of device "0000:00:0a.0" with PASID "1" that supports scalable mode. $ sudo cat /sys/kernel/debug/iommu/intel/0000:00:0a.0/1/domain_translation_struct Suggested-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jingqi Liu <Jingqi.liu@intel.com> Link: https://lore.kernel.org/r/20231013135811.73953-4-Jingqi.liu@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-10-16 09:34:52 +02:00
Jingqi Liu	d87731f609	iommu/vt-d: debugfs: Create/remove debugfs file per {device, pasid} Add a debugfs directory per pair of {device, pasid} if the mappings of its page table are created and destroyed by the iommu_map/unmap() interfaces. i.e. /sys/kernel/debug/iommu/intel/<device source id>/<pasid>. Create a debugfs file in the directory for users to dump the page table corresponding to {device, pasid}. e.g. /sys/kernel/debug/iommu/intel/0000:00:02.0/1/domain_translation_struct. For the default domain without pasid, it creates a debugfs file in the debugfs device directory for users to dump its page table. e.g. /sys/kernel/debug/iommu/intel/0000:00:02.0/domain_translation_struct. When setting a domain to a PASID of device, create a debugfs file in the pasid debugfs directory for users to dump the page table of the specified pasid. Remove the debugfs device directory of the device when releasing a device. e.g. /sys/kernel/debug/iommu/intel/0000:00:01.0 Signed-off-by: Jingqi Liu <Jingqi.liu@intel.com> Link: https://lore.kernel.org/r/20231013135811.73953-3-Jingqi.liu@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-10-16 09:34:51 +02:00
Jingqi Liu	e8aa45f8cc	iommu/vt-d: debugfs: Dump entry pointing to huge page For the page table entry pointing to a huge page, the data below the level of the huge page is meaningless and does not need to be dumped. Signed-off-by: Jingqi Liu <Jingqi.liu@intel.com> Link: https://lore.kernel.org/r/20231013135811.73953-2-Jingqi.liu@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-10-16 09:34:51 +02:00
Jiapeng Chong	c61c255e11	iommu/vt-d: Remove unused function The function are defined in the pasid.c file, but not called elsewhere, so delete the unused function. drivers/iommu/intel/pasid.c:342:20: warning: unused function 'pasid_set_wpe'. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=6185 Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Link: https://lore.kernel.org/r/20230818091603.64800-1-jiapeng.chong@linux.alibaba.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-10-16 09:34:50 +02:00
Michael Shavit	37ed36448f	iommu/arm-smmu-v3-sva: Remove bond refcount Always allocate a new arm_smmu_bond in __arm_smmu_sva_bind and remove the bond refcount since arm_smmu_bond can never be shared across calls to __arm_smmu_sva_bind. The iommu framework will not allocate multiple SVA domains for the same (device/mm) pair, nor will it call set_dev_pasid for a device if a domain is already attached on the given pasid. There's also a one-to-one mapping between MM and PASID. __arm_smmu_sva_bind is therefore never called with the same (device/mm) pair, and so there's no reason to try and normalize allocations of the arm_smmu_bond struct for a (device/mm) pair across set_dev_pasid. Signed-off-by: Michael Shavit <mshavit@google.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20230905194849.v1.2.Id3ab7cf665bcead097654937233a645722a4cce3@changeid Signed-off-by: Will Deacon <will@kernel.org>	2023-10-12 18:37:06 +01:00
Michael Shavit	d912aed14f	iommu/arm-smmu-v3-sva: Remove unused iommu_sva handle The __arm_smmu_sva_bind function returned an unused iommu_sva handle that can be removed. Signed-off-by: Michael Shavit <mshavit@google.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20230905194849.v1.1.Ib483f67c9e2ad90ea2254b4b5ac696e4b68aa638@changeid Signed-off-by: Will Deacon <will@kernel.org>	2023-10-12 18:37:06 +01:00
Michael Shavit	475918e9c4	iommu/arm-smmu-v3: Rename cdcfg to cd_table cdcfg is a confusing name, especially given other variables with the cfg suffix in this driver. cd_table more clearly describes what is being operated on. Tested-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Michael Shavit <mshavit@google.com> Link: https://lore.kernel.org/r/20230915211705.v8.9.I5ee79793b444ddb933e8bc1eb7b77e728d7f8350@changeid Signed-off-by: Will Deacon <will@kernel.org>	2023-10-12 17:08:17 +01:00
Michael Shavit	6032f58498	iommu/arm-smmu-v3: Update comment about STE liveness Update the comment to reflect the fact that the STE is not always installed. arm_smmu_domain_finalise_s1 intentionnaly calls arm_smmu_write_ctx_desc while the STE is not installed. Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Michael Shavit <mshavit@google.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/20230915211705.v8.8.I7a8beb615e2520ad395d96df94b9ab9708ee0d9c@changeid Signed-off-by: Will Deacon <will@kernel.org>	2023-10-12 17:08:17 +01:00
Michael Shavit	5e14313df2	iommu/arm-smmu-v3: Cleanup arm_smmu_domain_finalise Remove unused master parameter now that the CD table is allocated elsewhere. Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Michael Shavit <mshavit@google.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/20230915211705.v8.7.Iff18df41564b9df82bf40b3ec7af26b87f08ef6e@changeid Signed-off-by: Will Deacon <will@kernel.org>	2023-10-12 17:08:17 +01:00
Michael Shavit	10e4968cd5	iommu/arm-smmu-v3: Move CD table to arm_smmu_master With this change, each master will now own its own CD table instead of sharing one with other masters attached to the same domain. Attaching a stage 1 domain installs CD entries into the master's CD table. SVA writes its CD entries into each master's CD table if the domain is shared across masters. Also add the device to the devices list before writing the CD to the table so that SVA will know that the CD needs to be re-written to this device's CD table as well if it decides to update the CD's ASID concurrently with this function. Tested-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Michael Shavit <mshavit@google.com> Link: https://lore.kernel.org/r/20230915211705.v8.6.Ice063dcf87d1b777a72e008d9e3406d2bcf6d876@changeid Signed-off-by: Will Deacon <will@kernel.org>	2023-10-12 17:08:17 +01:00
Michael Shavit	24503148c5	iommu/arm-smmu-v3: Refactor write_ctx_desc Update arm_smmu_write_ctx_desc and downstream functions to operate on a master instead of an smmu domain. We expect arm_smmu_write_ctx_desc() to only be called to write a CD entry into a CD table owned by the master. Under the hood, arm_smmu_write_ctx_desc still fetches the CD table from the domain that is attached to the master, but a subsequent commit will move that table's ownership to the master. Note that this change isn't a nop refactor since SVA will call arm_smmu_write_ctx_desc in a loop for every master the domain is attached to despite the fact that they all share the same CD table. This loop may look weird but becomes necessary when the CD table becomes per-master in a subsequent commit. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Michael Shavit <mshavit@google.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/20230915211705.v8.5.I219054a6cf538df5bb22f4ada2d9933155d6058c@changeid Signed-off-by: Will Deacon <will@kernel.org>	2023-10-12 17:08:17 +01:00
Michael Shavit	1228cc509f	iommu/arm-smmu-v3: move stall_enabled to the cd table A domain can be attached to multiple masters with different master->stall_enabled values. The stall bit of a CD entry should follow master->stall_enabled and has an inverse relationship with the STE.S1STALLD bit. The stall_enabled bit does not depend on any property of the domain, so move it out of the arm_smmu_domain struct. Move it to the CD table struct so that it can fully describe how CD entries should be written to it. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Michael Shavit <mshavit@google.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/20230915211705.v8.4.I5aa89c849228794a64146cfe86df21fb71629384@changeid Signed-off-by: Will Deacon <will@kernel.org>	2023-10-12 17:08:17 +01:00
Michael Shavit	e3aad74c51	iommu/arm-smmu-v3: Encapsulate ctx_desc_cfg init in alloc_cd_tables This is slighlty cleaner: arm_smmu_ctx_desc_cfg is initialized in a single function instead of having pieces set ahead-of time by its caller. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Michael Shavit <mshavit@google.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/20230915211705.v8.3.I875254464d044a8ce8b3a2ad6beb655a4a006456@changeid Signed-off-by: Will Deacon <will@kernel.org>	2023-10-12 17:08:17 +01:00
Michael Shavit	1f85888340	iommu/arm-smmu-v3: Replace s1_cfg with cdtab_cfg Remove struct arm_smmu_s1_cfg. This is really just a CD table with a bit of extra information. Move other attributes of the CD table that were held there into the existing CD table structure, struct arm_smmu_ctx_desc_cfg, and replace all usages of arm_smmu_s1_cfg with arm_smmu_ctx_desc_cfg. For clarity, use the name "cd_table" for the variables pointing to arm_smmu_ctx_desc_cfg in the new code instead of cdcfg. A later patch will make this fully consistent. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Michael Shavit <mshavit@google.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/20230915211705.v8.2.I1ef1ed19d7786c8176a0d05820c869e650c8d68f@changeid Signed-off-by: Will Deacon <will@kernel.org>	2023-10-12 17:08:16 +01:00
Michael Shavit	987a878e09	iommu/arm-smmu-v3: Move ctx_desc out of s1_cfg arm_smmu_s1_cfg (and by extension arm_smmu_domain) owns both a CD table and the CD inserted into that table's non-pasid CD entry. This limits arm_smmu_domain's ability to represent non-pasid domains, where multiple domains need to be inserted into a common CD table. Rather than describing an STE entry (which may have multiple domains installed into it with PASID), a domain should describe a single CD entry instead. This is precisely the role of arm_smmu_ctx_desc. A subsequent commit will also move the CD table outside of arm_smmu_domain. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Michael Shavit <mshavit@google.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/20230915211705.v8.1.I67ab103c18d882aedc8a08985af1fba70bca084e@changeid Signed-off-by: Will Deacon <will@kernel.org>	2023-10-12 17:08:16 +01:00
Danila Tikhonov	70c613602b	iommu/arm-smmu-qcom: Add SM7150 SMMUv2 SM7150 uses a qcom,smmu-v2-style SMMU just for Adreno and friends. Add a compatible for it. Signed-off-by: Danila Tikhonov <danila@jiaxyga.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org> Link: https://lore.kernel.org/r/20230913184526.20016-3-danila@jiaxyga.com Signed-off-by: Will Deacon <will@kernel.org>	2023-10-12 13:49:18 +01:00
Richard Acayan	270a147040	iommu/arm-smmu-qcom: Add SDM670 MDSS compatible Add the compatible for the MDSS client on the Snapdragon 670 so it can be properly configured by the IOMMU driver. Otherwise, there is an unhandled context fault. Signed-off-by: Richard Acayan <mailingradian@gmail.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Link: https://lore.kernel.org/r/20230925234246.900351-3-mailingradian@gmail.com Signed-off-by: Will Deacon <will@kernel.org>	2023-10-12 13:46:23 +01:00
Yi Liu	c97d1b20d3	iommu/vt-d: Add domain_alloc_user op Add the domain_alloc_user() op implementation. It supports allocating domains to be used as parent under nested translation. Unlike other drivers VT-D uses only a single page table format so it only needs to check if the HW can support nesting. Link: https://lore.kernel.org/r/20230928071528.26258-7-yi.l.liu@intel.com Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-10-10 13:31:24 -03:00
Yi Liu	408663619f	iommufd/selftest: Add domain_alloc_user() support in iommu mock Add mock_domain_alloc_user() and a new test case for IOMMU_HWPT_ALLOC_NEST_PARENT. Link: https://lore.kernel.org/r/20230928071528.26258-6-yi.l.liu@intel.com Co-developed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-10-10 13:31:24 -03:00
Yi Liu	4ff5421633	iommufd: Support allocating nested parent domain Extend IOMMU_HWPT_ALLOC to allocate domains to be used as parent (stage-2) in nested translation. Add IOMMU_HWPT_ALLOC_NEST_PARENT to the uAPI. Link: https://lore.kernel.org/r/20230928071528.26258-5-yi.l.liu@intel.com Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-10-10 13:31:24 -03:00
Yi Liu	89d63875d8	iommufd: Flow user flags for domain allocation to domain_alloc_user() Extends iommufd_hw_pagetable_alloc() to accept user flags, the uAPI will provide the flags. Link: https://lore.kernel.org/r/20230928071528.26258-4-yi.l.liu@intel.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-10-10 13:31:24 -03:00
Yi Liu	7975b72208	iommufd: Use the domain_alloc_user() op for domain allocation Make IOMMUFD use iommu_domain_alloc_user() by default for iommu_domain creation. IOMMUFD needs to support iommu_domain allocation with parameters from userspace in nested support, and a driver is expected to implement everything under this op. If the iommu driver doesn't provide domain_alloc_user callback then IOMMUFD falls back to use iommu_domain_alloc() with an UNMANAGED type if possible. Link: https://lore.kernel.org/r/20230928071528.26258-3-yi.l.liu@intel.com Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Co-developed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-10-10 13:31:24 -03:00
Vasant Hegde	189116d5ad	Revert "iommu: Fix false ownership failure on AMD systems with PASID activated" This reverts commit `2380f1e819`. Previous patch removed AMD iommu_v2 module. Hence its safe to revert this workaround. Suggested-by: Jason Gunthorpe <jgg@ziepe.ca> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Joerg Roedel <jroedel@suse.de> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20231006095706.5694-6-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-10-06 16:01:56 +02:00
Vasant Hegde	45d08d85e6	iommu/amd: Remove unused EXPORT_SYMBOLS Drop EXPORT_SYMBOLS for the functions that are not used by any modules. Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Tested-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20231006095706.5694-5-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-10-06 16:01:55 +02:00
Vasant Hegde	37b282fa04	iommu/amd: Remove amd_iommu_device_info() No one is using this function. Hence remove it. Also move PCI device feature detection flags to amd_iommu_types.h as its only used inside AMD IOMMU driver. Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Tested-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20231006095706.5694-4-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-10-06 16:01:54 +02:00
Vasant Hegde	d55b0d2e07	iommu/amd: Remove PPR support Remove PPR handler and notifier related functions as its not used anymore. Note that we are retaining PPR interrupt handler support as it will be re-used when we introduce IOPF support. Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Tested-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20231006095706.5694-3-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-10-06 16:01:54 +02:00
Vasant Hegde	5a0b11a180	iommu/amd: Remove iommu_v2 module AMD GPU driver which was the only in-kernel user of iommu_v2 module removed dependency on iommu_v2 module. Also we are working on adding SVA support in AMD IOMMU driver. Device drivers are expected to use common SVA framework to enable device PASID/PRI features. Removing iommu_v2 module and then adding SVA simplifies the development. Hence remove iommu_v2 module. Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Tested-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20231006095706.5694-2-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-10-06 16:01:52 +02:00
Jason Gunthorpe	b85b4f3084	iommu: Fix return code in iommu_group_alloc_default_domain() This function returns NULL on errors, not ERR_PTR. Fixes: `1c68cbc64f` ("iommu: Add IOMMU_DOMAIN_PLATFORM") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://lore.kernel.org/r/8fb75157-6c81-4a9c-9992-d73d49902fa8@moroto.mountain Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/0-v2-ee2bae9af0f2+96-iommu_ga_err_ptr_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-10-05 13:03:40 +02:00
Jason Gunthorpe	0f6a90436a	iommu: Do not use IOMMU_DOMAIN_DMA if CONFIG_IOMMU_DMA is not enabled msm_iommu platforms do not select either CONFIG_IOMMU_DMA or CONFIG_ARM_DMA_USE_IOMMU so they create a IOMMU_DOMAIN_DMA domain by default and never populate it. This acts like a BLOCKED domain and breaks the GPU driver on the platform. Detect this and force use of IDENTITY instead. Fixes: `98ac73f99b` ("iommu: Require a default_domain for all iommu drivers") Reported-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Link: https://lore.kernel.org/linux-iommu/CAA8EJprz7VVmBG68U9zLuqPd0UdSRHYoLDJSP6tCj6H6qanuTQ@mail.gmail.com/ Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Tested-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Link: https://lore.kernel.org/r/0-v1-20700abdf239+19c-iommu_no_dma_iommu_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-10-05 12:58:56 +02:00
Niklas Schnelle	9f5b681e2a	iommu/dma: Use a large flush queue and timeout for shadow_on_flush Flush queues currently use a fixed compile time size of 256 entries. This being a power of 2 allows the compiler to use shift and mask instead of more expensive modulo operations. With per-CPU flush queues larger queue sizes would hit per-CPU allocation limits, with a single flush queue these limits do not apply however. Also with single queues being particularly suitable for virtualized environments with expensive IOTLB flushes these benefit especially from larger queues and thus fewer flushes. To this end re-order struct iova_fq so we can use a dynamic array and introduce the flush queue size and timeouts as new options in the iommu_dma_options struct. So as not to lose the shift and mask optimization, use a power of 2 for the length and use explicit shift and mask instead of letting the compiler optimize this. A large queue size and 1 second timeout is then set for the shadow on flush case set by s390 paged memory guests. This then brings performance on par with the previous s390 specific DMA API implementation. Acked-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> #s390 Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Link: https://lore.kernel.org/r/20230928-dma_iommu-v13-6-9e5fc4dacc36@linux.ibm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-10-02 08:43:03 +02:00
Niklas Schnelle	32d5bc8b09	iommu/dma: Allow a single FQ in addition to per-CPU FQs In some virtualized environments, including s390 paged memory guests, IOTLB flushes are used to update IOMMU shadow tables. Due to this, they are much more expensive than in typical bare metal environments or non-paged s390 guests. In addition they may parallelize poorly in virtualized environments. This changes the trade off for flushing IOVAs such that minimizing the number of IOTLB flushes trumps any benefit of cheaper queuing operations or increased paralellism. In this scenario per-CPU flush queues pose several problems. Firstly per-CPU memory is often quite limited prohibiting larger queues. Secondly collecting IOVAs per-CPU but flushing via a global timeout reduces the number of IOVAs flushed for each timeout especially on s390 where PCI interrupts may not be bound to a specific CPU. Let's introduce a single flush queue mode that reuses the same queue logic but only allocates a single global queue. This mode is selected by dma-iommu if a newly introduced .shadow_on_flush flag is set in struct dev_iommu. As a first user the s390 IOMMU driver sets this flag during probe_device. With the unchanged small FQ size and timeouts this setting is worse than per-CPU queues but a follow up patch will make the FQ size and timeout variable. Together this allows the common IOVA flushing code to more closely resemble the global flush behavior used on s390's previous internal DMA API implementation. Link: https://lore.kernel.org/all/9a466109-01c5-96b0-bf03-304123f435ee@arm.com/ Acked-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> #s390 Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Link: https://lore.kernel.org/r/20230928-dma_iommu-v13-5-9e5fc4dacc36@linux.ibm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-10-02 08:43:03 +02:00
Niklas Schnelle	53f8e9ad76	iommu/s390: Disable deferred flush for ISM devices ISM devices are virtual PCI devices used for cross-LPAR communication. Unlike real PCI devices ISM devices do not use the hardware IOMMU but inspects IOMMU translation tables directly on IOTLB flush (s390 RPCIT instruction). ISM devices keep their DMA allocations static and only very rarely DMA unmap at all. For each IOTLB flush that occurs after unmap the ISM devices will however inspect the area of the IOVA space indicated by the flush. This means that for the global IOTLB flushes used by the flush queue mechanism the entire IOVA space would be inspected. In principle this would be fine, albeit potentially unnecessarily slow, it turns out however that ISM devices are sensitive to seeing IOVA addresses that are currently in use in the IOVA range being flushed. Seeing such in-use IOVA addresses will cause the ISM device to enter an error state and become unusable. Fix this by claiming IOMMU_CAP_DEFERRED_FLUSH only for non-ISM devices. This makes sure IOTLB flushes only cover IOVAs that have been unmapped and also restricts the range of the IOTLB flush potentially reducing latency spikes. Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Link: https://lore.kernel.org/r/20230928-dma_iommu-v13-4-9e5fc4dacc36@linux.ibm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-10-02 08:43:02 +02:00
Niklas Schnelle	c76c067e48	s390/pci: Use dma-iommu layer While s390 already has a standard IOMMU driver and previous changes have added I/O TLB flushing operations this driver is currently only used for user-space PCI access such as vfio-pci. For the DMA API s390 instead utilizes its own implementation in arch/s390/pci/pci_dma.c which drives the same hardware and shares some code but requires a complex and fragile hand over between DMA API and IOMMU API use of a device and despite code sharing still leads to significant duplication and maintenance effort. Let's utilize the common code DMAP API implementation from drivers/iommu/dma-iommu.c instead allowing us to get rid of arch/s390/pci/pci_dma.c. Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Link: https://lore.kernel.org/r/20230928-dma_iommu-v13-3-9e5fc4dacc36@linux.ibm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-10-02 08:43:00 +02:00
Niklas Schnelle	fa4c450709	iommu: Allow .iotlb_sync_map to fail and handle s390's -ENOMEM return On s390 when using a paging hypervisor, .iotlb_sync_map is used to sync mappings by letting the hypervisor inspect the synced IOVA range and updating a shadow table. This however means that .iotlb_sync_map can fail as the hypervisor may run out of resources while doing the sync. This can be due to the hypervisor being unable to pin guest pages, due to a limit on mapped addresses such as vfio_iommu_type1.dma_entry_limit or lack of other resources. Either way such a failure to sync a mapping should result in a DMA_MAPPING_ERROR. Now especially when running with batched IOTLB flushes for unmap it may be that some IOVAs have already been invalidated but not yet synced via .iotlb_sync_map. Thus if the hypervisor indicates running out of resources, first do a global flush allowing the hypervisor to free resources associated with these mappings as well a retry creating the new mappings and only if that also fails report this error to callers. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com> # sun50i Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Link: https://lore.kernel.org/r/20230928-dma_iommu-v13-1-9e5fc4dacc36@linux.ibm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-10-02 08:42:57 +02:00
Zhang Rui	59df44bfb0	iommu/vt-d: Avoid memory allocation in iommu_suspend() The iommu_suspend() syscore suspend callback is invoked with IRQ disabled. Allocating memory with the GFP_KERNEL flag may re-enable IRQs during the suspend callback, which can cause intermittent suspend/hibernation problems with the following kernel traces: Calling iommu_suspend+0x0/0x1d0 ------------[ cut here ]------------ WARNING: CPU: 0 PID: 15 at kernel/time/timekeeping.c:868 ktime_get+0x9b/0xb0 ... CPU: 0 PID: 15 Comm: rcu_preempt Tainted: G U E 6.3-intel #r1 RIP: 0010:ktime_get+0x9b/0xb0 ... Call Trace: <IRQ> tick_sched_timer+0x22/0x90 ? __pfx_tick_sched_timer+0x10/0x10 __hrtimer_run_queues+0x111/0x2b0 hrtimer_interrupt+0xfa/0x230 __sysvec_apic_timer_interrupt+0x63/0x140 sysvec_apic_timer_interrupt+0x7b/0xa0 </IRQ> <TASK> asm_sysvec_apic_timer_interrupt+0x1f/0x30 ... ------------[ cut here ]------------ Interrupts enabled after iommu_suspend+0x0/0x1d0 WARNING: CPU: 0 PID: 27420 at drivers/base/syscore.c:68 syscore_suspend+0x147/0x270 CPU: 0 PID: 27420 Comm: rtcwake Tainted: G U W E 6.3-intel #r1 RIP: 0010:syscore_suspend+0x147/0x270 ... Call Trace: <TASK> hibernation_snapshot+0x25b/0x670 hibernate+0xcd/0x390 state_store+0xcf/0xe0 kobj_attr_store+0x13/0x30 sysfs_kf_write+0x3f/0x50 kernfs_fop_write_iter+0x128/0x200 vfs_write+0x1fd/0x3c0 ksys_write+0x6f/0xf0 __x64_sys_write+0x1d/0x30 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc Given that only 4 words memory is needed, avoid the memory allocation in iommu_suspend(). CC: stable@kernel.org Fixes: `33e0715710` ("iommu/vt-d: Avoid GFP_ATOMIC where it is not needed") Signed-off-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Ooi, Chin Hao <chin.hao.ooi@intel.com> Link: https://lore.kernel.org/r/20230921093956.234692-1-rui.zhang@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20230925120417.55977-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-09-25 16:10:36 +02:00
Jinjie Ruan	f7da9c0815	iommu/tegra-smmu: Drop unnecessary error check for for debugfs_create_dir() The debugfs_create_dir() function returns error pointers. It never returns NULL. As Baolu suggested, this patch removes the error checking for debugfs_create_dir in tegra-smmu.c. This is because the DebugFS kernel API is developed in a way that the caller can safely ignore the errors that occur during the creation of DebugFS nodes. The debugfs APIs have a IS_ERR() judge in start_creating() which can handle it gracefully. So these checks are unnecessary. Fixes: `d1313e7896` ("iommu/tegra-smmu: Add debugfs support") Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Suggested-by: Baolu Lu <baolu.lu@linux.intel.com> Acked-by: Thierry Reding <treding@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20230901073056.1364755-1-ruanjinjie@huawei.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-09-25 12:42:02 +02:00
Jiapeng Chong	ccb76c5751	iommu: Remove duplicate include ./drivers/iommu/iommu.c: iommu-priv.h is included more than once. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=6186 Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Link: https://lore.kernel.org/r/20230818092620.91748-1-jiapeng.chong@linux.alibaba.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-09-25 12:41:02 +02:00
Vasant Hegde	7016b30055	iommu/amd: Initialize iommu_device->max_pasids Commit `1adf3cc20d` ("iommu: Add max_pasids field in struct iommu_device") introduced a variable struct iommu_device.max_pasids to track max PASIDS supported by each IOMMU. Let us initialize this field for AMD IOMMU. IOMMU core will use this value to set max PASIDs per device (see __iommu_probe_device()). Also remove unused global 'amd_iommu_max_pasid' variable. Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20230921092147.5930-15-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-09-25 12:39:07 +02:00
Vasant Hegde	eda8c2860a	iommu/amd: Enable device ATS/PASID/PRI capabilities independently Introduce helper functions to enable/disable device ATS/PASID/PRI capabilities independently along with the new pasid_enabled and pri_enabled variables in struct iommu_dev_data to keep track, which allows attach_device() and detach_device() to be simplified. Co-developed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20230921092147.5930-14-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-09-25 12:39:06 +02:00
Vasant Hegde	92e2bd56a5	iommu/amd: Introduce iommu_dev_data.flags to track device capabilities Currently we use struct iommu_dev_data.iommu_v2 to keep track of the device ATS, PRI, and PASID capabilities. But these capabilities can be enabled independently (except PRI requires ATS support). Hence, replace the iommu_v2 variable with a flags variable, which keep track of the device capabilities. From commit `9bf49e36d7` ("PCI/ATS: Handle sharing of PF PRI Capability with all VFs"), device PRI/PASID is shared between PF and any associated VFs. Hence use pci_pri_supported() and pci_pasid_features() instead of pci_find_ext_capability() to check device PRI/PASID support. Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20230921092147.5930-13-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-09-25 12:39:06 +02:00
Suravee Suthikulpanit	739eb25514	iommu/amd: Introduce iommu_dev_data.ppr For AMD IOMMU, the PPR feature is needed to support IO page fault (IOPF). PPR is enabled per PCI end-point device, and is configured by the PPR bit in the IOMMU device table entry (i.e DTE[PPR]). Introducing struct iommu_dev_data.ppr track PPR setting for each device. Also iommu_dev_data.ppr will be set only when IOMMU supports PPR. Hence remove redundant feature support check in set_dte_entry(). Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Co-developed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20230921092147.5930-12-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-09-25 12:39:05 +02:00
Vasant Hegde	b0cc5dae1a	iommu/amd: Rename ats related variables Remove nested structure and make it as 'ats_{enable/qdep}'. Also convert 'dev_data.pri_tlp' to bit field. No functional changes intended. Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20230921092147.5930-11-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-09-25 12:39:05 +02:00
Suravee Suthikulpanit	e339b51c13	iommu/amd: Modify logic for checking GT and PPR features In order to support v2 page table, IOMMU driver need to check if the hardware can support Guest Translation (GT) and Peripheral Page Request (PPR) features. Currently, IOMMU driver uses global (amd_iommu_v2_present) and per-iommu (struct amd_iommu.is_iommu_v2) variables to track the features. There variables area redundant since we could simply just check the global EFR mask. Therefore, replace it with a helper function with appropriate name. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Co-developed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20230921092147.5930-10-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-09-25 12:39:04 +02:00
Suravee Suthikulpanit	7b7563a934	iommu/amd: Consolidate feature detection and reporting logic Currently, IOMMU driver assumes capabilities on all IOMMU instances to be homogeneous. During early_amd_iommu_init(), the driver probes all IVHD blocks and do sanity check to make sure that only features common among all IOMMU instances are supported. This is tracked in the global amd_iommu_efr and amd_iommu_efr2, which should be used whenever the driver need to check hardware capabilities. Therefore, introduce check_feature() and check_feature2(), and modify the driver to adopt the new helper functions. In addition, clean up the print_iommu_info() to avoid reporting redundant EFR/EFR2 for each IOMMU instance. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20230921092147.5930-9-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-09-25 12:39:03 +02:00
Suravee Suthikulpanit	45677ab1e5	iommu/amd: Miscellaneous clean up when free domain * Use the protection_domain_free() helper function to free domain. The function has been modified to also free memory used for the v1 and v2 page tables. Also clear gcr3 table in v2 page table free path. * Refactor code into cleanup_domain() for reusability. Change BUG_ON to WARN_ON in cleanup path. * Protection domain dev_cnt should be read when the domain is locked. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Co-developed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20230921092147.5930-8-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-09-25 12:39:02 +02:00
Vasant Hegde	4c721d6a08	iommu/amd: Do not set amd_iommu_pgtable in pass-through mode Since AMD IOMMU page table is not used in passthrough mode, switching to v1 page table is not required. Therefore, remove redundant amd_iommu_pgtable update and misleading warning message. Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20230921092147.5930-7-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-09-25 12:39:02 +02:00
Suravee Suthikulpanit	206fb06dc5	iommu/amd: Introduce helper functions for managing GCR3 table Refactor domain_enable_v2() into helper functions for managing GCR3 table (i.e. setup_gcr3_table() and get_gcr3_levels()), which will be used in subsequent patches. Also re-arrange code and remove forward declaration. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Co-developed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20230921092147.5930-6-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-09-25 12:39:01 +02:00
Vasant Hegde	bac05772fa	iommu/amd: Refactor protection domain allocation code To replace if-else with switch-case statement due to increasing number of domain types. No functional changes intended. Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20230921092147.5930-5-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-09-25 12:39:01 +02:00
Suravee Suthikulpanit	ba7d263b77	iommu/amd: Consolidate logic to allocate protection domain Move the logic into the common caller function to simplify the code. No functional changes intended. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20230921092147.5930-4-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-09-25 12:39:00 +02:00
Suravee Suthikulpanit	75e6d7edfd	iommu/amd: Consolidate timeout pre-define to amd_iommu_type.h To allow inclusion in other files in subsequent patches. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20230921092147.5930-3-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-09-25 12:39:00 +02:00
Suravee Suthikulpanit	ade4bec9e1	iommu/amd: Remove unused amd_io_pgtable.pt_root variable It has been no longer used since the commit `6eedb59c18` ("iommu/amd: Remove amd_iommu_domain_get_pgtable"). Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20230921092147.5930-2-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-09-25 12:38:59 +02:00
Robin Murphy	233045378d	iommu/iova: Manage the depot list size Automatically scaling the depot up to suit the peak capacity of a workload is all well and good, but it would be nice to have a way to scale it back down again if the workload changes. To that end, add backround reclaim that will gradually free surplus magazines if the depot size remains above a reasonable threshold for long enough. Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/03170665c56d89c6ce6081246b47f68d4e483308.1694535580.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-09-25 12:07:44 +02:00
Robin Murphy	911aa1245d	iommu/iova: Make the rcache depot scale better The algorithm in the original paper specifies the storage of full magazines in the depot as an unbounded list rather than a fixed-size array. It turns out to be pretty straightforward to do this in our implementation with no significant loss of efficiency. This allows the depot to scale up to the working set sizes of larger systems, while also potentially saving some memory on smaller ones too. Since this involves touching struct iova_magazine with the requisite care, we may as well reinforce the comment with a proper assertion too. Reviewed-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/f597aa72fc3e1d315bc4574af0ce0ebe5c31cd22.1694535580.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-09-25 12:07:43 +02:00
Robin Murphy	afad94a93e	iommu: Improve map/unmap sanity checks The current checks for the __IOMMU_DOMAIN_PAGING capability seem a bit stifled, since it is quite likely now that a non-paging domain won't have a pgsize_bitmap and/or mapping ops, and thus get caught by the earlier condition anyway. Swap them around to test the more fundamental condition first, then we can reasonably also upgrade the other to a WARN_ON, since if a driver does ever expose a paging domain without the means to actually page, it's clearly very broken. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/524db1ec0139c964d26928a6a264945aa66d010c.1694525662.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-09-25 12:04:38 +02:00
Robin Murphy	bd111e987e	iommu: Retire map/unmap ops With everyone now implementing the new interfaces, clean up the last remnants of the old map/unmap ops and simplify the calling logic again. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/d2afdf13b2fbf537713c3ec642dfd49d16dd9e6a.1694525662.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-09-25 12:04:38 +02:00
Robin Murphy	39f823dfb5	iommu/tegra-smmu: Update to {map,unmap}_pages Trivially update map/unmap to the new interface, which is quite happy for drivers to still process just one page per call. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/338c520ed947d6d5b9d0509ccb4588908bd9ce1e.1694525662.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-09-25 12:04:37 +02:00
Robin Murphy	8c63d1e3d3	iommu/sun50i: Update to {map,unmap}_pages Trivially update map/unmap to the new interface, which is quite happy for drivers to still process just one page per call. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/395995e5097803f9a65f2fb79e0732d41c2b8a84.1694525662.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-09-25 12:04:37 +02:00
Robin Murphy	d47b9777e7	iommu/rockchip: Update to {map,unmap}_pages Trivially update map/unmap to the new interface, which is quite happy for drivers to still process just one page per call. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/ccc21bf7d1d0da8989d4d517a13d0846d6b71a38.1694525662.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-09-25 12:04:37 +02:00
Robin Murphy	dc9ffd8dba	iommu/omap: Update to {map,unmap}_pages Trivially update map/unmap to the new interface, which is quite happy for drivers to still process just one page per call. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/7bad94ffccd4cba32bded72e0860974012881e24.1694525662.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-09-25 12:04:36 +02:00
Robin Murphy	983efefa44	iommu/exynos: Update to {map,unmap}_pages Trivially update map/unmap to the new interface, which is quite happy for drivers to still process just one page per call. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/579176033e92d49ec9fc9f3d33d7b9d4c474f0b4.1694525662.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-09-25 12:04:36 +02:00
Jason Gunthorpe	ebfdc4569e	iommu/omap: Convert to generic_single_device_group() Use the new helper. For some reason omap will probe its driver even if it doesn't load an iommu driver. Keep this working by keeping a bool to track if the iommu driver was started. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/7-v1-c869a95191f2+5e8-iommu_single_grp_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-09-25 11:52:08 +02:00
Jason Gunthorpe	8f68911efc	iommu/ipmmu-vmsa: Convert to generic_single_device_group() Use the new helper. This driver is kind of weird since in ARM mode it pretends it has per-device groups, but ARM64 mode does not. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/6-v1-c869a95191f2+5e8-iommu_single_grp_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-09-25 11:51:07 +02:00
Jason Gunthorpe	ef0f48c6be	iommu/rockchip: Convert to generic_single_device_group() Use the new helper. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/5-v1-c869a95191f2+5e8-iommu_single_grp_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-09-25 11:49:26 +02:00
Jason Gunthorpe	a62cafe124	iommu/sprd: Convert to generic_single_device_group() Use the new helper. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/4-v1-c869a95191f2+5e8-iommu_single_grp_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-09-25 11:48:23 +02:00
Jason Gunthorpe	4f43b6b6d1	iommu/sun50i: Convert to generic_single_device_group() Use the new helper. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com> Link: https://lore.kernel.org/r/3-v1-c869a95191f2+5e8-iommu_single_grp_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-09-25 11:47:59 +02:00
Jason Gunthorpe	e8f52d84cf	iommu: Add generic_single_device_group() This implements the common pattern seen in drivers of a single iommu_group for the entire iommu driver instance. Implement this in core code so the drivers that want this can select it from their ops. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/2-v1-c869a95191f2+5e8-iommu_single_grp_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-09-25 11:45:29 +02:00
Jason Gunthorpe	e946f8e3e6	iommu: Remove useless group refcounting Several functions obtain the group reference and then release it before returning. This gives the impression that the refcount is protecting something for the duration of the function. In truth all of these functions are called in places that know a device driver is probed to the device and our locking rules already require that dev->iommu_group cannot change while a driver is attached to the struct device. If this was not the case then this code is already at risk of triggering UAF as it is racy if the dev->iommu_group is concurrently going to NULL/free. refcount debugging will throw a WARN if kobject_get() is called on a 0 refcount object to highlight the bug. Remove the confusing refcounting and leave behind a comment about the restriction. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/1-v1-c869a95191f2+5e8-iommu_single_grp_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-09-25 11:45:28 +02:00
Jason Gunthorpe	4efd98d41e	iommu: Convert remaining simple drivers to domain_alloc_paging() These drivers don't support IOMMU_DOMAIN_DMA, so this commit effectively allows them to support that mode. The prior work to require default_domains makes this safe because every one of these drivers is either compilation incompatible with dma-iommu.c, or already establishing a default_domain. In both cases alloc_domain() will never be called with IOMMU_DOMAIN_DMA for these drivers so it is safe to drop the test. Removing these tests clarifies that the domain allocation path is only about the functionality of a paging domain and has nothing to do with policy of how the paging domain is used for UNMANAGED/DMA/DMA_FQ. Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> Tested-by: Steven Price <steven.price@arm.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/24-v8-81230027b2fa+9d-iommu_all_defdom_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-09-25 11:41:06 +02:00
Jason Gunthorpe	3529375e77	iommu: Convert simple drivers with DOMAIN_DMA to domain_alloc_paging() These drivers are all trivially converted since the function is only called if the domain type is going to be IOMMU_DOMAIN_UNMANAGED/DMA. Tested-by: Heiko Stuebner <heiko@sntech.de> Tested-by: Steven Price <steven.price@arm.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Yong Wu <yong.wu@mediatek.com> #For mtk_iommu.c Link: https://lore.kernel.org/r/23-v8-81230027b2fa+9d-iommu_all_defdom_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-09-25 11:41:04 +02:00
Jason Gunthorpe	4601cd2d7c	iommu: Add ops->domain_alloc_paging() This callback requests the driver to create only a __IOMMU_DOMAIN_PAGING domain, so it saves a few lines in a lot of drivers needlessly checking the type. More critically, this allows us to sweep out all the IOMMU_DOMAIN_UNMANAGED and IOMMU_DOMAIN_DMA checks from a lot of the drivers, simplifying what is going on in the code and ultimately removing the now-unused special cases in drivers where they did not support IOMMU_DOMAIN_DMA. domain_alloc_paging() should return a struct iommu_domain that is functionally compatible with ARM_DMA_USE_IOMMU, dma-iommu.c and iommufd. Be forwards looking and pass in a 'struct device *' argument. We can provide this when allocating the default_domain. No drivers will look at this. Tested-by: Steven Price <steven.price@arm.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/22-v8-81230027b2fa+9d-iommu_all_defdom_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-09-25 11:41:03 +02:00
Jason Gunthorpe	8359cf39ac	iommu: Add __iommu_group_domain_alloc() Allocate a domain from a group. Automatically obtains the iommu_ops to use from the device list of the group. Convert the internal callers to use it. Tested-by: Steven Price <steven.price@arm.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/21-v8-81230027b2fa+9d-iommu_all_defdom_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-09-25 11:41:03 +02:00
Jason Gunthorpe	98ac73f99b	iommu: Require a default_domain for all iommu drivers At this point every iommu driver will cause a default_domain to be selected, so we can finally remove this gap from the core code. The following table explains what each driver supports and what the resulting default_domain will be: ops->defaut_domain IDENTITY DMA PLATFORM v ARM32 dma-iommu ARCH amd/iommu.c Y Y N/A either apple-dart.c Y Y N/A either arm-smmu.c Y Y IDENTITY either qcom_iommu.c G Y IDENTITY either arm-smmu-v3.c Y Y N/A either exynos-iommu.c G Y IDENTITY either fsl_pamu_domain.c Y Y N/A N/A PLATFORM intel/iommu.c Y Y N/A either ipmmu-vmsa.c G Y IDENTITY either msm_iommu.c G IDENTITY N/A mtk_iommu.c G Y IDENTITY either mtk_iommu_v1.c G IDENTITY N/A omap-iommu.c G IDENTITY N/A rockchip-iommu.c G Y IDENTITY either s390-iommu.c Y Y N/A N/A PLATFORM sprd-iommu.c Y N/A DMA sun50i-iommu.c G Y IDENTITY either tegra-smmu.c G Y IDENTITY IDENTITY virtio-iommu.c Y Y N/A either spapr Y Y N/A N/A PLATFORM * G means ops->identity_domain is used * N/A means the driver will not compile in this configuration ARM32 drivers select an IDENTITY default domain through either the ops->identity_domain or directly requesting an IDENTIY domain through alloc_domain(). In ARM64 mode tegra-smmu will still block the use of dma-iommu.c and forces an IDENTITY domain. S390 uses a PLATFORM domain to represent when the dma_ops are set to the s390 iommu code. fsl_pamu uses an PLATFORM domain. POWER SPAPR uses PLATFORM and blocking to enable its weird VFIO mode. The x86 drivers continue unchanged. After this patch group->default_domain is only NULL for a short period during bus iommu probing while all the groups are constituted. Otherwise it is always !NULL. This completes changing the iommu subsystem driver contract to a system where the current iommu_domain always represents some form of translation and the driver is continuously asserting a definable translation mode. It resolves the confusion that the original ops->detach_dev() caused around what translation, exactly, is the IOMMU performing after detach. There were at least three different answers to that question in the tree, they are all now clearly named with domain types. Tested-by: Heiko Stuebner <heiko@sntech.de> Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> Tested-by: Steven Price <steven.price@arm.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20-v8-81230027b2fa+9d-iommu_all_defdom_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-09-25 11:41:02 +02:00
Jason Gunthorpe	5b167dea64	iommu/sun50i: Add an IOMMU_IDENTITIY_DOMAIN Prior to commit `1b932ceddd` ("iommu: Remove detach_dev callbacks") the sun50i_iommu_detach_device() function was being called by ops->detach_dev(). This is an IDENTITY domain so convert sun50i_iommu_detach_device() into sun50i_iommu_identity_attach() and a full IDENTITY domain and thus hook it back up the same was as the old ops->detach_dev(). Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/19-v8-81230027b2fa+9d-iommu_all_defdom_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-09-25 11:41:02 +02:00
Jason Gunthorpe	b01b125738	iommu/mtk_iommu: Add an IOMMU_IDENTITIY_DOMAIN This brings back the ops->detach_dev() code that commit `1b932ceddd` ("iommu: Remove detach_dev callbacks") deleted and turns it into an IDENTITY domain. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/18-v8-81230027b2fa+9d-iommu_all_defdom_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-09-25 11:41:02 +02:00
Jason Gunthorpe	666c9f1ef7	iommu/ipmmu: Add an IOMMU_IDENTITIY_DOMAIN This brings back the ops->detach_dev() code that commit `1b932ceddd` ("iommu: Remove detach_dev callbacks") deleted and turns it into an IDENTITY domain. Also reverts commit `584d334b13` ("iommu/ipmmu-vmsa: Remove ipmmu_utlb_disable()") Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/17-v8-81230027b2fa+9d-iommu_all_defdom_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-09-25 11:41:01 +02:00
Jason Gunthorpe	786478a902	iommu/qcom_iommu: Add an IOMMU_IDENTITIY_DOMAIN This brings back the ops->detach_dev() code that commit `1b932ceddd` ("iommu: Remove detach_dev callbacks") deleted and turns it into an IDENTITY domain. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/16-v8-81230027b2fa+9d-iommu_all_defdom_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-09-25 11:41:01 +02:00
Jason Gunthorpe	24b1d47616	iommu: Remove ops->set_platform_dma_ops() All drivers are now using IDENTITY or PLATFORM domains for what this did, we can remove it now. It is no longer possible to attach to a NULL domain. Tested-by: Heiko Stuebner <heiko@sntech.de> Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> Tested-by: Steven Price <steven.price@arm.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/15-v8-81230027b2fa+9d-iommu_all_defdom_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-09-25 11:41:00 +02:00
Jason Gunthorpe	78fc30b4bb	iommu/msm: Implement an IDENTITY domain What msm does during msm_iommu_set_platform_dma() is actually putting the iommu into identity mode. Move to the new core support for ARM_DMA_USE_IOMMU by defining ops->identity_domain. This driver does not support IOMMU_DOMAIN_DMA, however it cannot be compiled on ARM64 either. Most likely it is fine to support dma-iommu.c Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/14-v8-81230027b2fa+9d-iommu_all_defdom_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-09-25 11:40:59 +02:00
Jason Gunthorpe	bbe39b7633	iommu/omap: Implement an IDENTITY domain What omap does during omap_iommu_set_platform_dma() is actually putting the iommu into identity mode. Move to the new core support for ARM_DMA_USE_IOMMU by defining ops->identity_domain. This driver does not support IOMMU_DOMAIN_DMA, however it cannot be compiled on ARM64 either. Most likely it is fine to support dma-iommu.c Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/13-v8-81230027b2fa+9d-iommu_all_defdom_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-09-25 11:40:59 +02:00

1 2 3 4 5 ...

4912 Commits