linux

iv/linux

Author	SHA1	Message	Date
Linus Torvalds	0468be89b3	IOMMU Updates for Linux v6.6 Including: - Core changes: - Consolidate probe_device path - Make the PCI-SAC IOVA allocation trick PCI-only - AMD IOMMU: - Consolidate PPR log handling - Interrupt handling improvements - Refcount fixes for amd_iommu_v2 driver - Intel VT-d driver: - Enable idxd device DMA with pasid through iommu dma ops. - Lift RESV_DIRECT check from VT-d driver to core. - Miscellaneous cleanups and fixes. - ARM-SMMU drivers: - Device-tree binding updates: - Add additional compatible strings for Qualcomm SoCs - Allow ASIDs to be configured in the DT to work around Qualcomm's broken hypervisor - Fix clocks for Qualcomm's MSM8998 SoC - SMMUv2: - Support for Qualcomm's legacy firmware implementation featured on at least MSM8956 and MSM8976. - Match compatible strings for Qualcomm SM6350 and SM6375 SoC variants - SMMUv3: - Use 'ida' instead of a bitmap for VMID allocation - Rockchip IOMMU: - Lift page-table allocation restrictions on newer hardware - Mediatek IOMMU: - Add MT8188 IOMMU Support - Renesas IOMMU: - Allow PCIe devices - Usual set of cleanups an smaller fixes -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmTx7IMACgkQK/BELZcB GuMxUA/+P/wYvAKCbDpXyszIpyCTx37BkeRTBaVqG0vEKLG6439i+PIm3oudQK+6 0y+1clJi0Ddu0uv1ck90cIEP1YDuKaKdrOVeE7TtlK+6LKYxTyeN+mz4csMIbahI 6JMrWzrIEPIyMBHzAepQiGDCsmDkrCngPj0WmA7+EQZSSHVYp+TLe6OLzNs74vDF zCITkYNq6aKyg/dNJpMRy6VOHvw9PUiwRvm7ko7WONP4VCtpW4g3Jpkerf19zoV2 s0nwZuGn3o7F0aFOpRJPPKQNfQnNjOjHdxjcsGBafD9qqAk4TLvnZH24njKtPidJ P8CiAu//HxhDyUPTgTIrDroVOGVG7s85XO+WesjPkEI3vnNjXy+qEIinQBJ3oIaI ppDLSnArEhfSRgt6dXvPCJ/g4+WGS9jNV85GCa7XBtal2Msu8G89NKC97mpmjCkb lnGmCF9t7Tkt/fLWxw4GADBN3m2tOib1GQMvPYAF2WM3jH5aRq2UliIRuCHZkzwv EF3SiFQQqab6oogU9tF/A1QLUKQ8QfYOdabqL9z2COgF5tS00VC6b/6VTNkKeBHe qIiOpI7IWo76tFJule5gRaUth9nVkjpEo6kL9I6rEldOlFJrX6uaHTta6/isY3gx vkN98V/OThRUbDwMD122YVKNNjZE2MNsTeptXqB3jHvl3UWiLsQ= =RV+G -----END PGP SIGNATURE----- Merge tag 'iommu-updates-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu updates from Joerg Roedel: "Core changes: - Consolidate probe_device path - Make the PCI-SAC IOVA allocation trick PCI-only AMD IOMMU: - Consolidate PPR log handling - Interrupt handling improvements - Refcount fixes for amd_iommu_v2 driver Intel VT-d driver: - Enable idxd device DMA with pasid through iommu dma ops - Lift RESV_DIRECT check from VT-d driver to core - Miscellaneous cleanups and fixes ARM-SMMU drivers: - Device-tree binding updates: - Add additional compatible strings for Qualcomm SoCs - Allow ASIDs to be configured in the DT to work around Qualcomm's broken hypervisor - Fix clocks for Qualcomm's MSM8998 SoC - SMMUv2: - Support for Qualcomm's legacy firmware implementation featured on at least MSM8956 and MSM8976 - Match compatible strings for Qualcomm SM6350 and SM6375 SoC variants - SMMUv3: - Use 'ida' instead of a bitmap for VMID allocation - Rockchip IOMMU: - Lift page-table allocation restrictions on newer hardware - Mediatek IOMMU: - Add MT8188 IOMMU Support - Renesas IOMMU: - Allow PCIe devices .. and the usual set of cleanups an smaller fixes" * tag 'iommu-updates-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (64 commits) iommu: Explicitly include correct DT includes iommu/amd: Remove unused declarations iommu/arm-smmu-qcom: Add SM6375 SMMUv2 iommu/arm-smmu-qcom: Add SM6350 DPU compatible iommu/arm-smmu-qcom: Add SM6375 DPU compatible iommu/arm-smmu-qcom: Sort the compatible list alphabetically dt-bindings: arm-smmu: Fix MSM8998 clocks description iommu/vt-d: Remove unused extern declaration dmar_parse_dev_scope() iommu/vt-d: Fix to convert mm pfn to dma pfn iommu/vt-d: Fix to flush cache of PASID directory table iommu/vt-d: Remove rmrr check in domain attaching device path iommu: Prevent RESV_DIRECT devices from blocking domains dmaengine/idxd: Re-enable kernel workqueue under DMA API iommu/vt-d: Add set_dev_pasid callback for dma domain iommu/vt-d: Prepare for set_dev_pasid callback iommu/vt-d: Make prq draining code generic iommu/vt-d: Remove pasid_mutex iommu/vt-d: Add domain_flush_pasid_iotlb() iommu: Move global PASID allocation from SVA to core iommu: Generalize PASID 0 for normal DMA w/o PASID ...	2023-09-01 16:54:25 -07:00
Linus Torvalds	1687d8aca5	* Rework apic callbacks, getting rid of unnecessary ones and coalescing lots of silly duplicates. * Use static_calls() instead of indirect calls for apic->foo() * Tons of cleanups an crap removal along the way -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmTvfO8ACgkQaDWVMHDJ krAP2A//ccii/LuvtTnNEIMMR5w2rwTdHv91ancgFkC8pOeNk37Z8sSLq8tKuLFA vgjBIysVIqunuRcNCJ+eqwIIxYfU+UGCWHppzLwO+DY3Q7o9EoTL0BgytdAqxpQQ ntEVarqWq25QYXKFoAqbUTJ1UXa42/8HfiXAX/jvP+ACXfilkGPZre6ASxlXeOhm XbgPuNQPmXi2WYQH9GCQEsz2Nh80hKap8upK2WbQzzJ3lXsm+xA//4klab0HCYwl Uc302uVZozyXRMKbAlwmgasTFOLiV8KKriJ0oHoktBpWgkpdR9uv/RDeSaFR3DAl aFmecD4k/Hqezg4yVl+4YpEn2KjxiwARCm4PMW5AV7lpWBPBHAOOai65yJlAi9U6 bP8pM0+aIx9xg7oWfsTnQ7RkIJ+GZ0w+KZ9LXFM59iu3eV1pAJE3UVyUehe/J1q9 n8OcH0UeHRlAb8HckqVm1AC7IPvfHw4OAPtUq7z3NFDwbq6i651Tu7f+i2bj31cX 77Ames+fx6WjxUjyFbJwaK44E7Qez3waztdBfn91qw+m0b+gnKE3ieDNpJTqmm5b mKulV7KJwwS6cdqY3+Kr+pIlN+uuGAv7wGzVLcaEAXucDsVn/YAMJHY2+v97xv+n J9N+yeaYtmSXVlDsJ6dndMrTQMmcasK1CVXKxs+VYq5Lgf+A68w= =eoKm -----END PGP SIGNATURE----- Merge tag 'x86_apic_for_6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 apic updates from Dave Hansen: "This includes a very thorough rework of the 'struct apic' handlers. Quite a variety of them popped up over the years, especially in the 32-bit days when odd apics were much more in vogue. The end result speaks for itself, which is a removal of a ton of code and static calls to replace indirect calls. If there's any breakage here, it's likely to be around the 32-bit museum pieces that get light to no testing these days. Summary: - Rework apic callbacks, getting rid of unnecessary ones and coalescing lots of silly duplicates. - Use static_calls() instead of indirect calls for apic->foo() - Tons of cleanups an crap removal along the way" * tag 'x86_apic_for_6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (64 commits) x86/apic: Turn on static calls x86/apic: Provide static call infrastructure for APIC callbacks x86/apic: Wrap IPI calls into helper functions x86/apic: Mark all hotpath APIC callback wrappers __always_inline x86/xen/apic: Mark apic __ro_after_init x86/apic: Convert other overrides to apic_update_callback() x86/apic: Replace acpi_wake_cpu_handler_update() and apic_set_eoi_cb() x86/apic: Provide apic_update_callback() x86/xen/apic: Use standard apic driver mechanism for Xen PV x86/apic: Provide common init infrastructure x86/apic: Wrap apic->native_eoi() into a helper x86/apic: Nuke ack_APIC_irq() x86/apic: Remove pointless arguments from [native_]eoi_write() x86/apic/noop: Tidy up the code x86/apic: Remove pointless NULL initializations x86/apic: Sanitize APIC ID range validation x86/apic: Prepare x2APIC for using apic::max_apic_id x86/apic: Simplify X2APIC ID validation x86/apic: Add max_apic_id member x86/apic: Wrap APIC ID validation into an inline ...	2023-08-30 10:44:46 -07:00
Alistair Popple	1af5a81099	mmu_notifiers: rename invalidate_range notifier There are two main use cases for mmu notifiers. One is by KVM which uses mmu_notifier_invalidate_range_start()/end() to manage a software TLB. The other is to manage hardware TLBs which need to use the invalidate_range() callback because HW can establish new TLB entries at any time. Hence using start/end() can lead to memory corruption as these callbacks happen too soon/late during page unmap. mmu notifier users should therefore either use the start()/end() callbacks or the invalidate_range() callbacks. To make this usage clearer rename the invalidate_range() callback to arch_invalidate_secondary_tlbs() and update documention. Link: https://lkml.kernel.org/r/6f77248cd25545c8020a54b4e567e8b72be4dca1.1690292440.git-series.apopple@nvidia.com Signed-off-by: Alistair Popple <apopple@nvidia.com> Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Cc: Andrew Donnellan <ajd@linux.ibm.com> Cc: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com> Cc: Frederic Barrat <fbarrat@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Kevin Tian <kevin.tian@intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Nicolin Chen <nicolinc@nvidia.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sean Christopherson <seanjc@google.com> Cc: SeongJae Park <sj@kernel.org> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Will Deacon <will@kernel.org> Cc: Zhi Wang <zhi.wang.linux@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-08-18 10:12:41 -07:00
Yue Haibing	0652cf98e0	iommu/amd: Remove unused declarations Commit `aafd8ba0ca` ("iommu/amd: Implement add_device and remove_device") removed the implementations but left declarations in place. Remove it. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Link: https://lore.kernel.org/r/20230814135502.4808-1-yuehaibing@huawei.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-08-17 12:58:24 +02:00
Vasant Hegde	8e11876a11	iommu/amd: Rearrange DTE bit definations Rearrage according to 64bit word they are in. Note that I have not rearranged gcr3 related macros even though they belong to different 64bit word as its easy to read it in current format. No functional changes intended. Suggested-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20230619131908.5887-1-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-08-08 08:30:19 +02:00
Thomas Gleixner	a539cc86a1	x86/vector: Rename send_cleanup_vector() to vector_schedule_cleanup() Rename send_cleanup_vector() to vector_schedule_cleanup() to prepare for replacing the vector cleanup IPI with a timer callback. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Xin Li <xin3.li@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Steve Wahl <steve.wahl@hpe.com> Link: https://lore.kernel.org/r/20230621171248.6805-2-xin3.li@intel.com	2023-08-06 14:15:09 +02:00
Vasant Hegde	a48130e92f	iommu/amd: Enable PPR/GA interrupt after interrupt handler setup Current code enables PPR and GA interrupts before setting up the interrupt handler (in state_next()). Make sure interrupt handler is in place before enabling these interrupt. amd_iommu_enable_interrupts() gets called in normal boot, kdump as well as in suspend/resume path. Hence moving interrupt enablement to this function works fine. Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20230628054554.6131-4-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-07-14 16:21:42 +02:00
Vasant Hegde	f52c895a2d	iommu/amd: Consolidate PPR log enablement Move PPR log interrupt bit setting to iommu_enable_ppr_log(). Also rearrange iommu_enable_ppr_log() such that PPREn bit is enabled before enabling PPRLog and PPRInt bits. So that when PPRLog bit is set it will clear the PPRLogOverflow bit and sets the PPRLogRun bit in the IOMMU Status Register [MMIO Offset 2020h]. Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20230628054554.6131-3-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-07-14 16:21:41 +02:00
Vasant Hegde	7827a2689e	iommu/amd: Disable PPR log/interrupt in iommu_disable() Similar to other logs, disable PPR log/interrupt in iommu_disable() path. Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20230628054554.6131-2-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-07-14 16:21:41 +02:00
Vasant Hegde	e5ebd90d1b	iommu/amd: Enable separate interrupt for PPR and GA log AMD IOMMU has three log buffers (i.e. Event, PPR, and GA). These logs can be configured to generate different interrupts when an entry is inserted into a log buffer. However, current implementation share single interrupt to handle all three logs. With increasing usages of the GA (for IOMMU AVIC) and PPR logs (for IOMMUv2 APIs and SVA), interrupt sharing could potentially become performance bottleneck. Hence, separate IOMMU interrupt into use three separate vectors and irq threads with corresponding name, which will be displayed in the /proc/interrupts as "AMD-Vi<x>-[Evt/PPR/GA]", where "x" is an IOMMU id. Note that this patch changes interrupt handling only in IOMMU x2apic mode (MMIO 0x18[IntCapXTEn]=1). In legacy mode it will continue to use single MSI interrupt. Signed-off-by: Vasant Hegde<vasant.hegde@amd.com> Reviewed-by: Alexey Kardashevskiy<aik@amd.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20230628053222.5962-3-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-07-14 16:20:38 +02:00
Vasant Hegde	2379f34852	iommu/amd: Refactor IOMMU interrupt handling logic for Event, PPR, and GA logs The AMD IOMMU has three log buffers (i.e. Event, PPR, and GA). The IOMMU driver processes these log entries when it receive an IOMMU interrupt. Then, it needs to clear the corresponding interrupt status bits. Also, when an overflow occurs, it needs to handle the log overflow by clearing the specific overflow status bit and restart the log. Since, logic for handling these logs is the same, refactor the code into a helper function called amd_iommu_handle_irq(), which handles the steps described. Then, reuse it for all types of log. Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Vasant Hegde<vasant.hegde@amd.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20230628053222.5962-2-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-07-14 16:20:37 +02:00
Vasant Hegde	274c2218b8	iommu/amd: Handle PPR log overflow Some ATS-capable peripherals can issue requests to the processor to service peripheral page requests using PCIe PRI (the Page Request Interface). IOMMU supports PRI using PPR log buffer. IOMMU writes PRI request to PPR log buffer and sends PPR interrupt to host. When there is no space in the PPR log buffer (PPR log overflow) it will set PprOverflow bit in 'MMIO Offset 2020h IOMMU Status Register'. When this happens PPR log needs to be restarted as specified in IOMMU spec [1] section 2.6.2. When handling the event it just resumes the PPR log without resizing (similar to the way event and GA log overflow is handled). Failing to handle PPR overflow means device may not work properly as IOMMU stops processing new PPR events from device. [1] https://www.amd.com/system/files/TechDocs/48882_3.07_PUB.pdf Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/20230628051624.5792-3-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-07-14 16:19:36 +02:00
Vasant Hegde	386ae59bd7	iommu/amd: Generalize log overflow handling Each IOMMU has three log buffers (Event, GA and PPR log). Once a buffer becomes full, IOMMU generates an interrupt with the corresponding overflow status bit, and stop processing the log. To handle an overflow, the IOMMU driver needs to disable the log, clear the overflow status bit, and re-enable the log. This procedure is same among all types of log buffer except it uses different overflow status bit and enabling bit. Hence, to consolidate the log buffer restarting logic, introduce a helper function amd_iommu_restart_log(), which caller can specify parameters specific for each type of log buffer. Also rename MMIO_STATUS_EVT_OVERFLOW_INT_MASK as MMIO_STATUS_EVT_OVERFLOW_MASK. Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/20230628051624.5792-2-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-07-14 16:19:36 +02:00
Vasant Hegde	d269ab61f4	iommu/amd/iommu_v2: Clear pasid state in free path Clear pasid state in device amd_iommu_free_device() path. It will make sure no new ppr notifier is registered in free path. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/20230609105146.7773-3-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-07-14 16:16:44 +02:00
Daniel Marcovitch	534103bcd5	iommu/amd/iommu_v2: Fix pasid_state refcount dec hit 0 warning on pasid unbind When unbinding pasid - a race condition exists vs outstanding page faults. To prevent this, the pasid_state object contains a refcount. * set to 1 on pasid bind * incremented on each ppr notification start * decremented on each ppr notification done * decremented on pasid unbind Since refcount_dec assumes that refcount will never reach 0: the current implementation causes the following to be invoked on pasid unbind: REFCOUNT_WARN("decrement hit 0; leaking memory") Fix this issue by changing refcount_dec to refcount_dec_and_test to explicitly handle refcount=1. Fixes: `8bc54824da` ("iommu/amd: Convert from atomic_t to refcount_t on pasid_state->count") Signed-off-by: Daniel Marcovitch <dmarcovitch@nvidia.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/20230609105146.7773-2-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-07-14 16:16:44 +02:00
Linus Torvalds	d35ac6ac0e	IOMMU Updates for Linux v6.5 Including: - Core changes: - iova_magazine_alloc() optimization - Make flush-queue an IOMMU driver capability - Consolidate the error handling around device attachment - AMD IOMMU changes: - AVIC Interrupt Remapping Improvements - Some minor fixes and cleanups - Intel VT-d changes from Lu Baolu: - Small and misc cleanups - ARM-SMMU changes from Will Deacon: - Device-tree binding updates: * Add missing clocks for SC8280XP and SA8775 Adreno SMMUs * Add two new Qualcomm SMMUs in SDX75 and SM6375 - Workarounds for Arm MMU-700 errata: * 1076982: Avoid use of SEV-based cmdq wakeup * 2812531: Terminate command batches with a CMD_SYNC * Enforce single-stage translation to avoid nesting-related errata - Set the correct level hint for range TLB invalidation on teardown - Some other minor fixes and cleanups (including Freescale PAMU and virtio-iommu changes) -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmSVnS0ACgkQK/BELZcB GuM4txAAvtE5pMxM4V/9uTJt+de/vd8XiaH2kfQEULCJm2Yz07Z5+oE+QRtjPc2D No+98IGMJCNOg+U+6JZ8P2GR3/soFvKdYjhY/iKTXK+C6jiy3dStIFN/KzzHkbpu Y/fUZ5B+DizTO6837osDWIdAz3PcwV3Vk/ogHe3FoHWU13RJYOMp2FAox0QreBNE kb7tK3ki/RCasbF9rMt9ClB0SZEVDysRkYF7AtXtsMNVm5jpQAITXVcNUYMeaJFL n0J8hjn3EiZj7dgzxbL5bRgDyfPadwJkWz2BxkQ6x0gopgHu0EimGL8p2Bei2f8x lv2y692L6zZth2ZgjSkecf3Lo4YHirsP/1U1zrLDjEgeBZ0vRxiX0qsvCb9692C1 +shy5jOX22ub+zJ2UFHMNGKu3ZdhcKi+meejdqM/GrHcRfZABh26bQILFnPF3Oxp 2WFb2v7Hq9qdQP50jsGbLji6n165aRW969fBdsk1uDUoCDHNOcdHQS3FsiKAAz5d /Z/3PR9tQgnF9bDXJB6RbGJ1rQxHlfvarOQCAYiC02ALj4FnuSLiFSBLe1bI4InR AgmnQaH2jmFMWHibdvj3q3sm33sLhOjmAE+ZX0YOhFfgrRGHq88qRwV53IfW477E 8a+6A+tnu28axk7yVJMvvz5/PeYkD2CMeplYQycUiaQutjvN9sk= =aRMe -----END PGP SIGNATURE----- Merge tag 'iommu-updates-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu updates from Joerg Roedel: "Core changes: - iova_magazine_alloc() optimization - Make flush-queue an IOMMU driver capability - Consolidate the error handling around device attachment AMD IOMMU changes: - AVIC Interrupt Remapping Improvements - Some minor fixes and cleanups Intel VT-d changes from Lu Baolu: - Small and misc cleanups ARM-SMMU changes from Will Deacon: - Device-tree binding updates: - Add missing clocks for SC8280XP and SA8775 Adreno SMMUs - Add two new Qualcomm SMMUs in SDX75 and SM6375 - Workarounds for Arm MMU-700 errata: - 1076982: Avoid use of SEV-based cmdq wakeup - 2812531: Terminate command batches with a CMD_SYNC - Enforce single-stage translation to avoid nesting-related errata - Set the correct level hint for range TLB invalidation on teardown .. and some other minor fixes and cleanups (including Freescale PAMU and virtio-iommu changes)" * tag 'iommu-updates-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (50 commits) iommu/vt-d: Remove commented-out code iommu/vt-d: Remove two WARN_ON in domain_context_mapping_one() iommu/vt-d: Handle the failure case of dmar_reenable_qi() iommu/vt-d: Remove unnecessary (void*) conversions iommu/amd: Remove extern from function prototypes iommu/amd: Use BIT/BIT_ULL macro to define bit fields iommu/amd: Fix DTE_IRQ_PHYS_ADDR_MASK macro iommu/amd: Fix compile error for unused function iommu/amd: Improving Interrupt Remapping Table Invalidation iommu/amd: Do not Invalidate IRT when IRTE caching is disabled iommu/amd: Introduce Disable IRTE Caching Support iommu/amd: Remove the unused struct amd_ir_data.ref iommu/amd: Switch amd_iommu_update_ga() to use modify_irte_ga() iommu/arm-smmu-v3: Set TTL invalidation hint better iommu/arm-smmu-v3: Document nesting-related errata iommu/arm-smmu-v3: Add explicit feature for nesting iommu/arm-smmu-v3: Document MMU-700 erratum 2812531 iommu/arm-smmu-v3: Work around MMU-600 erratum 1076982 dt-bindings: arm-smmu: Add SDX75 SMMU compatible dt-bindings: arm-smmu: Add SM6375 GPU SMMU ...	2023-06-29 20:51:03 -07:00
Linus Torvalds	9471f1f2f5	Merge branch 'expand-stack' This modifies our user mode stack expansion code to always take the mmap_lock for writing before modifying the VM layout. It's actually something we always technically should have done, but because we didn't strictly need it, we were being lazy ("opportunistic" sounds so much better, doesn't it?) about things, and had this hack in place where we would extend the stack vma in-place without doing the proper locking. And it worked fine. We just needed to change vm_start (or, in the case of grow-up stacks, vm_end) and together with some special ad-hoc locking using the anon_vma lock and the mm->page_table_lock, it all was fairly straightforward. That is, it was all fine until Ruihan Li pointed out that now that the vma layout uses the maple tree code, we really don't just change vm_start and vm_end any more, and the locking really is broken. Oops. It's not actually all _that_ horrible to fix this once and for all, and do proper locking, but it's a bit painful. We have basically three different cases of stack expansion, and they all work just a bit differently: - the common and obvious case is the page fault handling. It's actually fairly simple and straightforward, except for the fact that we have something like 24 different versions of it, and you end up in a maze of twisty little passages, all alike. - the simplest case is the execve() code that creates a new stack. There are no real locking concerns because it's all in a private new VM that hasn't been exposed to anybody, but lockdep still can end up unhappy if you get it wrong. - and finally, we have GUP and page pinning, which shouldn't really be expanding the stack in the first place, but in addition to execve() we also use it for ptrace(). And debuggers do want to possibly access memory under the stack pointer and thus need to be able to expand the stack as a special case. None of these cases are exactly complicated, but the page fault case in particular is just repeated slightly differently many many times. And ia64 in particular has a fairly complicated situation where you can have both a regular grow-down stack _and_ a special grow-up stack for the register backing store. So to make this slightly more manageable, the bulk of this series is to first create a helper function for the most common page fault case, and convert all the straightforward architectures to it. Thus the new 'lock_mm_and_find_vma()' helper function, which ends up being used by x86, arm, powerpc, mips, riscv, alpha, arc, csky, hexagon, loongarch, nios2, sh, sparc32, and xtensa. So we not only convert more than half the architectures, we now have more shared code and avoid some of those twisty little passages. And largely due to this common helper function, the full diffstat of this series ends up deleting more lines than it adds. That still leaves eight architectures (ia64, m68k, microblaze, openrisc, parisc, s390, sparc64 and um) that end up doing 'expand_stack()' manually because they are doing something slightly different from the normal pattern. Along with the couple of special cases in execve() and GUP. So there's a couple of patches that first create 'locked' helper versions of the stack expansion functions, so that there's a obvious path forward in the conversion. The execve() case is then actually pretty simple, and is a nice cleanup from our old "grow-up stackls are special, because at execve time even they grow down". The #ifdef CONFIG_STACK_GROWSUP in that code just goes away, because it's just more straightforward to write out the stack expansion there manually, instead od having get_user_pages_remote() do it for us in some situations but not others and have to worry about locking rules for GUP. And the final step is then to just convert the remaining odd cases to a new world order where 'expand_stack()' is called with the mmap_lock held for reading, but where it might drop it and upgrade it to a write, only to return with it held for reading (in the success case) or with it completely dropped (in the failure case). In the process, we remove all the stack expansion from GUP (where dropping the lock wouldn't be ok without special rules anyway), and add it in manually to __access_remote_vm() for ptrace(). Thanks to Adrian Glaubitz and Frank Scheiner who tested the ia64 cases. Everything else here felt pretty straightforward, but the ia64 rules for stack expansion are really quite odd and very different from everything else. Also thanks to Vegard Nossum who caught me getting one of those odd conditions entirely the wrong way around. Anyway, I think I want to actually move all the stack expansion code to a whole new file of its own, rather than have it split up between mm/mmap.c and mm/memory.c, but since this will have to be backported to the initial maple tree vma introduction anyway, I tried to keep the patches _fairly_ minimal. Also, while I don't think it's valid to expand the stack from GUP, the final patch in here is a "warn if some crazy GUP user wants to try to expand the stack" patch. That one will be reverted before the final release, but it's left to catch any odd cases during the merge window and release candidates. Reported-by: Ruihan Li <lrh2000@pku.edu.cn> * branch 'expand-stack': gup: add warning if some caller would seem to want stack expansion mm: always expand the stack with the mmap write lock held execve: expand new process stack manually ahead of time mm: make find_extend_vma() fail if write lock not held powerpc/mm: convert coprocessor fault to lock_mm_and_find_vma() mm/fault: convert remaining simple cases to lock_mm_and_find_vma() arm/mm: Convert to using lock_mm_and_find_vma() riscv/mm: Convert to using lock_mm_and_find_vma() mips/mm: Convert to using lock_mm_and_find_vma() powerpc/mm: Convert to using lock_mm_and_find_vma() arm64/mm: Convert to using lock_mm_and_find_vma() mm: make the page fault mmap locking killable mm: introduce new 'lock_mm_and_find_vma()' page fault helper	2023-06-28 20:35:21 -07:00
Linus Torvalds	bc6cb4d5bc	Locking changes for v6.5: - Introduce cmpxchg128() -- aka. the demise of cmpxchg_double(). The cmpxchg128() family of functions is basically & functionally the same as cmpxchg_double(), but with a saner interface: instead of a 6-parameter horror that forced u128 - u64/u64-halves layout details on the interface and exposed users to complexity, fragility & bugs, use a natural 3-parameter interface with u128 types. - Restructure the generated atomic headers, and add kerneldoc comments for all of the generic atomic{,64,_long}_t operations. Generated definitions are much cleaner now, and come with documentation. - Implement lock_set_cmp_fn() on lockdep, for defining an ordering when taking multiple locks of the same type. This gets rid of one use of lockdep_set_novalidate_class() in the bcache code. - Fix raw_cpu_generic_try_cmpxchg() bug due to an unintended variable shadowing generating garbage code on Clang on certain ARM builds. Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmSav3wRHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1gDyxAAjCHQjpolrre7fRpyiTDwqzIKT27H04vQ zrQVlVc42WBnn9pe8LthGy43/RvYvqlZvLoLONA4fMkuYriM6nSMsoZjeUmE+6Rs QAElQC74P5YvEBOa67VNY3/M7sj22ftDe7ODtVV8OrnPjMk1sQNRvaK025Cs3yig 8MAI//hHGNmyVAp1dPYZMJNqxGCvluReLZ4SaUJFCMrg7YgUXgCBj/5Gi07TlKxn sT8BFCssoEW/B9FXkh59B1t6FBCZoSy4XSZfsZe0uVAUJ4XDEOO+zBgaWFCedNQT wP323ryBgMrkzUKA8j2/o5d3QnMA1GcBfHNNlvAl/fOfrxWXzDZnOEY26YcaLMa0 YIuRF/JNbPZlt6DCUVBUEvMPpfNYi18dFN0rat1a6xL2L4w+tm55y3mFtSsg76Ka r7L2nWlRrAGXnuA+VEPqkqbSWRUSWOv5hT2Mcyb5BqqZRsxBETn6G8GVAzIO6j6v giyfUdA8Z9wmMZ7NtB6usxe3p1lXtnZ/shCE7ZHXm6xstyZrSXaHgOSgAnB9DcuJ 7KpGIhhSODQSwC/h/J0KEpb9Pr/5jCWmXAQ2DWnZK6ndt1jUfFi8pfK58wm0AuAM o9t8Mx3o8wZjbMdt6up9OIM1HyFiMx2BSaZK+8f/bWemHQ0xwez5g4k5O5AwVOaC x9Nt+Tp0Ze4= =DsYj -----END PGP SIGNATURE----- Merge tag 'locking-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: - Introduce cmpxchg128() -- aka. the demise of cmpxchg_double() The cmpxchg128() family of functions is basically & functionally the same as cmpxchg_double(), but with a saner interface. Instead of a 6-parameter horror that forced u128 - u64/u64-halves layout details on the interface and exposed users to complexity, fragility & bugs, use a natural 3-parameter interface with u128 types. - Restructure the generated atomic headers, and add kerneldoc comments for all of the generic atomic{,64,_long}_t operations. The generated definitions are much cleaner now, and come with documentation. - Implement lock_set_cmp_fn() on lockdep, for defining an ordering when taking multiple locks of the same type. This gets rid of one use of lockdep_set_novalidate_class() in the bcache code. - Fix raw_cpu_generic_try_cmpxchg() bug due to an unintended variable shadowing generating garbage code on Clang on certain ARM builds. * tag 'locking-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (43 commits) locking/atomic: scripts: fix ${atomic}_dec_if_positive() kerneldoc percpu: Fix self-assignment of __old in raw_cpu_generic_try_cmpxchg() locking/atomic: treewide: delete arch_atomic_() kerneldoc locking/atomic: docs: Add atomic operations to the driver basic API documentation locking/atomic: scripts: generate kerneldoc comments docs: scripts: kernel-doc: accept bitwise negation like ~@var locking/atomic: scripts: simplify raw_atomic() definitions locking/atomic: scripts: simplify raw_atomic_long() definitions locking/atomic: scripts: split pfx/name/sfx/order locking/atomic: scripts: restructure fallback ifdeffery locking/atomic: scripts: build raw_atomic_long() directly locking/atomic: treewide: use raw_atomic_<op>() locking/atomic: scripts: add trivial raw_atomic_<op>() locking/atomic: scripts: factor out order template generation locking/atomic: scripts: remove leftover "${mult}" locking/atomic: scripts: remove bogus order parameter locking/atomic: xtensa: add preprocessor symbols locking/atomic: x86: add preprocessor symbols locking/atomic: sparc: add preprocessor symbols locking/atomic: sh: add preprocessor symbols ...	2023-06-27 14:14:30 -07:00
Linus Torvalds	8d7071af89	mm: always expand the stack with the mmap write lock held This finishes the job of always holding the mmap write lock when extending the user stack vma, and removes the 'write_locked' argument from the vm helper functions again. For some cases, we just avoid expanding the stack at all: drivers and page pinning really shouldn't be extending any stacks. Let's see if any strange users really wanted that. It's worth noting that architectures that weren't converted to the new lock_mm_and_find_vma() helper function are left using the legacy "expand_stack()" function, but it has been changed to drop the mmap_lock and take it for writing while expanding the vma. This makes it fairly straightforward to convert the remaining architectures. As a result of dropping and re-taking the lock, the calling conventions for this function have also changed, since the old vma may no longer be valid. So it will now return the new vma if successful, and NULL - and the lock dropped - if the area could not be extended. Tested-by: Vegard Nossum <vegard.nossum@oracle.com> Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> # ia64 Tested-by: Frank Scheiner <frank.scheiner@web.de> # ia64 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2023-06-27 09:41:30 -07:00
Joerg Roedel	a7a334076d	Merge branches 'iommu/fixes', 'arm/smmu', 'ppc/pamu', 'virtio', 'x86/vt-d', 'core' and 'x86/amd' into next	2023-06-19 10:12:42 +02:00
Su Hui	5b00369fcf	iommu/amd: Fix possible memory leak of 'domain' Move allocation code down to avoid memory leak. Fixes: `29f54745f2` ("iommu/amd: Add missing domain type checks") Signed-off-by: Su Hui <suhui@nfschina.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/20230608021933.856045-1-suhui@nfschina.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-06-16 16:36:45 +02:00
Vasant Hegde	78db2985c2	iommu/amd: Remove extern from function prototypes The kernel coding style does not require 'extern' in function prototypes. Hence remove them from header file. No functional change intended. Suggested-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20230609090631.6052-2-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-06-16 16:33:58 +02:00
Vasant Hegde	d18f4ee219	iommu/amd: Use BIT/BIT_ULL macro to define bit fields Make use of BIT macro when defining bitfields which makes it easy to read. No functional change intended. Suggested-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/20230609090631.6052-1-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-06-16 16:32:30 +02:00
Vasant Hegde	85751a8af5	iommu/amd: Fix DTE_IRQ_PHYS_ADDR_MASK macro Interrupt Table Root Pointer is 52 bit and table must be aligned to start on a 128-byte boundary. Hence first 6 bits are ignored. Current code uses address mask as 45 instead of 46bit. Use GENMASK_ULL macro instead of manually generating address mask. Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20230609090327.5923-1-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-06-16 16:30:59 +02:00
Joerg Roedel	1ce018df87	iommu/amd: Fix compile error for unused function Recent changes introduced a compile error: drivers/iommu/amd/iommu.c:1285:13: error: ‘iommu_flush_irt_and_complete’ defined but not used [-Werror=unused-function] 1285 \| static void iommu_flush_irt_and_complete(struct amd_iommu *iommu, u16 devid) \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ This happens with defconfig-x86_64 because AMD IOMMU is enabled but CONFIG_IRQ_REMAP is disabled. Move the function under #ifdef CONFIG_IRQ_REMAP to fix the error. Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-06-09 15:18:12 +02:00
Suravee Suthikulpanit	bccc37a8a2	iommu/amd: Improving Interrupt Remapping Table Invalidation Invalidating Interrupt Remapping Table (IRT) requires, the AMD IOMMU driver to issue INVALIDATE_INTERRUPT_TABLE and COMPLETION_WAIT commands. Currently, the driver issues the two commands separately, which requires calling raw_spin_lock_irqsave() twice. In addition, the COMPLETION_WAIT could potentially be interleaved with other commands causing delay of the COMPLETION_WAIT command. Therefore, combine issuing of the two commands in one spin-lock, and changing struct amd_iommu.cmd_sem_val to use atomic64 to minimize locking. Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Link: https://lore.kernel.org/r/20230530141137.14376-6-suravee.suthikulpanit@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-06-09 14:47:10 +02:00
Suravee Suthikulpanit	98aeb4ea55	iommu/amd: Do not Invalidate IRT when IRTE caching is disabled With the Interrupt Remapping Table cache disabled, there is no need to issue invalidate IRT and wait for its completion. Therefore, add logic to bypass the operation. Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Suggested-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Link: https://lore.kernel.org/r/20230530141137.14376-5-suravee.suthikulpanit@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-06-09 14:47:10 +02:00
Suravee Suthikulpanit	66419036f6	iommu/amd: Introduce Disable IRTE Caching Support An Interrupt Remapping Table (IRT) stores interrupt remapping configuration for each device. In a normal operation, the AMD IOMMU caches the table to optimize subsequent data accesses. This requires the IOMMU driver to invalidate IRT whenever it updates the table. The invalidation process includes issuing an INVALIDATE_INTERRUPT_TABLE command following by a COMPLETION_WAIT command. However, there are cases in which the IRT is updated at a high rate. For example, for IOMMU AVIC, the IRTE[IsRun] bit is updated on every vcpu scheduling (i.e. amd_iommu_update_ga()). On system with large amount of vcpus and VFIO PCI pass-through devices, the invalidation process could potentially become a performance bottleneck. Introducing a new kernel boot option: amd_iommu=irtcachedis which disables IRTE caching by setting the IRTCachedis bit in each IOMMU Control register, and bypass the IRT invalidation process. Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Co-developed-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Link: https://lore.kernel.org/r/20230530141137.14376-4-suravee.suthikulpanit@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-06-09 14:47:09 +02:00
Suravee Suthikulpanit	74a37817bd	iommu/amd: Remove the unused struct amd_ir_data.ref Since the amd_iommu_update_ga() has been switched to use the modify_irte_ga() helper function to update the IRTE, the parameter is no longer needed. Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Suggested-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Link: https://lore.kernel.org/r/20230530141137.14376-3-suravee.suthikulpanit@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-06-09 14:47:08 +02:00
Joao Martins	a42f0c7a41	iommu/amd: Switch amd_iommu_update_ga() to use modify_irte_ga() The modify_irte_ga() uses cmpxchg_double() to update the IRTE in one shot, which is necessary when adding IRTE cache disabling support since the driver no longer need to flush the IRT for hardware to take effect. Please note that there is a functional change where the IsRun and Destination bits of IRTE are now cached in the struct amd_ir_data.entry. Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Link: https://lore.kernel.org/r/20230530141137.14376-2-suravee.suthikulpanit@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-06-09 14:47:08 +02:00
Peter Zijlstra	0a0a6800b0	x86,amd_iommu: Replace cmpxchg_double() Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Vasant Hegde <vasant.hegde@amd.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20230531132323.788955257@infradead.org	2023-06-05 09:36:38 +02:00
Vasant Hegde	11c439a194	iommu/amd/pgtbl_v2: Fix domain max address IOMMU v2 page table supports 4 level (47 bit) or 5 level (56 bit) virtual address space. Current code assumes it can support 64bit IOVA address space. If IOVA allocator allocates virtual address > 47/56 bit (depending on page table level) then it will do wrong mapping and cause invalid translation. Hence adjust aperture size to use max address supported by the page table. Reported-by: Jerry Snitselaar <jsnitsel@redhat.com> Fixes: `aaac38f614` ("iommu/amd: Initial support for AMD IOMMU v2 page table") Cc: <Stable@vger.kernel.org> # v6.0+ Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20230518054351.9626-1-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-05-23 08:29:25 +02:00
Robin Murphy	4a20ce0ff6	iommu: Add a capability for flush queue support Passing a special type to domain_alloc to indirectly query whether flush queues are a worthwhile optimisation with the given driver is a bit clunky, and looking increasingly anachronistic. Let's put that into an explicit capability instead. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Tested-by: Jerry Snitselaar <jsnitsel@redhat.com> # amd, intel, smmu-v3 Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/f0086a93dbccb92622e1ace775846d81c1c4b174.1683233867.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-05-22 17:38:44 +02:00
Jon Pan-Doh	2212fc2acf	iommu/amd: Fix domain flush size when syncing iotlb When running on an AMD vIOMMU, we observed multiple invalidations (of decreasing power of 2 aligned sizes) when unmapping a single page. Domain flush takes gather bounds (end-start) as size param. However, gather->end is defined as the last inclusive address (start + size - 1). This leads to an off by 1 error. With this patch, verified that 1 invalidation occurs when unmapping a single page. Fixes: `a270be1b3f` ("iommu/amd: Use only natural aligned flushes in a VM") Cc: stable@vger.kernel.org # >= 5.15 Signed-off-by: Jon Pan-Doh <pandoh@google.com> Tested-by: Sudheer Dantuluri <dantuluris@google.com> Suggested-by: Gary Zibrat <gzibrat@google.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Acked-by: Nadav Amit <namit@vmware.com> Link: https://lore.kernel.org/r/20230426203256.237116-1-pandoh@google.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-05-22 17:33:43 +02:00
Jason Gunthorpe	29f54745f2	iommu/amd: Add missing domain type checks Drivers are supposed to list the domain types they support in their domain_alloc() ops so when we add new domain types, like BLOCKING or SVA, they don't start breaking. This ended up providing an empty UNMANAGED domain when the core code asked for a BLOCKING domain, which happens to be the fallback for drivers that don't support it, but this is completely wrong for SVA. Check for the DMA types AMD supports and reject every other kind. Fixes: `136467962e` ("iommu: Add IOMMU SVA domain support") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/0-v1-2ac37b893728+da-amd_check_types_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-05-22 17:26:45 +02:00
Jerry Snitselaar	8ec4e2befe	iommu/amd: Fix up merge conflict resolution Merge commit `e17c6debd4` ("Merge branches 'arm/mediatek', 'arm/msm', 'arm/renesas', 'arm/rockchip', 'arm/smmu', 'x86/vt-d' and 'x86/amd' into next") added amd_iommu_init_devices, amd_iommu_uninit_devices, and amd_iommu_init_notifier back to drivers/iommu/amd/amd_iommu.h. The only references to them are here, so clean them up. Fixes: `e17c6debd4` ("Merge branches 'arm/mediatek', 'arm/msm', 'arm/renesas', 'arm/rockchip', 'arm/smmu', 'x86/vt-d' and 'x86/amd' into next") Cc: Joerg Roedel <joro@8bytes.org> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: Will Deacon <will@kernel.org> Cc: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/20230420192013.733331-1-jsnitsel@redhat.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-05-22 17:23:32 +02:00
Carlos Bilbao	75a616168b	iommu/amd: Update copyright notice The most recent changes to AMD'S IOMMU, such as level 5 guest page table support date to the year 2023. Update copyright statement accordingly. Signed-off-by: Carlos Bilbao <carlos.bilbao@amd.com> Link: https://lore.kernel.org/r/20230420173006.3100682-1-carlos.bilbao@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-05-22 17:20:50 +02:00
Jerry Snitselaar	354440a761	iommu/amd: Use page mode macros in fetch_pte() Use the page mode macros instead of magic numbers in fetch_pte. Cc: Robin Murphy <robin.murphy@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: Joerg Roedel <joro@8bytes.org> Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/20230420080718.523132-1-jsnitsel@redhat.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-05-22 17:17:41 +02:00
Joao Martins	af47b0a240	iommu/amd: Handle GALog overflows GALog exists to propagate interrupts into all vCPUs in the system when interrupts are marked as non running (e.g. when vCPUs aren't running). A GALog overflow happens when there's in no space in the log to record the GATag of the interrupt. So when the GALOverflow condition happens, the GALog queue is processed and the GALog is restarted, as the IOMMU manual indicates in section "2.7.4 Guest Virtual APIC Log Restart Procedure": \| * Wait until MMIO Offset 2020h[GALogRun]=0b so that all request \| entries are completed as circumstances allow. GALogRun must be 0b to \| modify the guest virtual APIC log registers safely. \| * Write MMIO Offset 0018h[GALogEn]=0b. \| * As necessary, change the following values (e.g., to relocate or \| resize the guest virtual APIC event log): \| - the Guest Virtual APIC Log Base Address Register \| [MMIO Offset 00E0h], \| - the Guest Virtual APIC Log Head Pointer Register \| [MMIO Offset 2040h][GALogHead], and \| - the Guest Virtual APIC Log Tail Pointer Register \| [MMIO Offset 2048h][GALogTail]. \| * Write MMIO Offset 2020h[GALOverflow] = 1b to clear the bit (W1C). \| * Write MMIO Offset 0018h[GALogEn] = 1b, and either set \| MMIO Offset 0018h[GAIntEn] to enable the GA log interrupt or clear \| the bit to disable it. Failing to handle the GALog overflow means that none of the VFs (in any guest) will work with IOMMU AVIC forcing the user to power cycle the host. When handling the event it resumes the GALog without resizing much like how it is done in the event handler overflow. The [MMIO Offset 2020h][GALOverflow] bit might be set in status register without the [MMIO Offset 2020h][GAInt] bit, so when deciding to poll for GA events (to clear space in the galog), also check the overflow bit. [suravee: Check for GAOverflow without GAInt, toggle CONTROL_GAINT_EN] Co-developed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/20230419201154.83880-3-joao.m.martins@oracle.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-05-22 17:16:04 +02:00
Joao Martins	ed8a2f4dde	iommu/amd: Don't block updates to GATag if guest mode is on On KVM GSI routing table updates, specially those where they have vIOMMUs with interrupt remapping enabled (to boot >255vcpus setups without relying on KVM_FEATURE_MSI_EXT_DEST_ID), a VMM may update the backing VF MSIs with a new VCPU affinity. On AMD with AVIC enabled, the new vcpu affinity info is updated via: avic_pi_update_irte() irq_set_vcpu_affinity() amd_ir_set_vcpu_affinity() amd_iommu_{de}activate_guest_mode() Where the IRTE[GATag] is updated with the new vcpu affinity. The GATag contains VM ID and VCPU ID, and is used by IOMMU hardware to signal KVM (via GALog) when interrupt cannot be delivered due to vCPU is in blocking state. The issue is that amd_iommu_activate_guest_mode() will essentially only change IRTE fields on transitions from non-guest-mode to guest-mode and otherwise returns with no changes to IRTE on already configured guest-mode interrupts. To the guest this means that the VF interrupts remain affined to the first vCPU they were first configured, and guest will be unable to issue VF interrupts and receive messages like this from spurious interrupts (e.g. from waking the wrong vCPU in GALog): [ 167.759472] __common_interrupt: 3.34 No irq handler for vector [ 230.680927] mlx5_core 0000:00:02.0: mlx5_cmd_eq_recover:247:(pid 3122): Recovered 1 EQEs on cmd_eq [ 230.681799] mlx5_core 0000:00:02.0: wait_func_handle_exec_timeout:1113:(pid 3122): cmd[0]: CREATE_CQ(0x400) recovered after timeout [ 230.683266] __common_interrupt: 3.34 No irq handler for vector Given the fact that amd_ir_set_vcpu_affinity() uses amd_iommu_activate_guest_mode() underneath it essentially means that VCPU affinity changes of IRTEs are nops. Fix it by dropping the check for guest-mode at amd_iommu_activate_guest_mode(). Same thing is applicable to amd_iommu_deactivate_guest_mode() although, even if the IRTE doesn't change underlying DestID on the host, the VFIO IRQ handler will still be able to poke at the right guest-vCPU. Fixes: `b9c6ff94e4` ("iommu/amd: Re-factor guest virtual APIC (de-)activation code") Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Link: https://lore.kernel.org/r/20230419201154.83880-2-joao.m.martins@oracle.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-05-22 17:16:04 +02:00
Joerg Roedel	e51b419839	Merge branches 'iommu/fixes', 'arm/allwinner', 'arm/exynos', 'arm/mediatek', 'arm/omap', 'arm/renesas', 'arm/rockchip', 'arm/smmu', 'ppc/pamu', 'unisoc', 'x86/vt-d', 'x86/amd', 'core' and 'platform-remove_new' into next	2023-04-14 13:45:50 +02:00
Kishon Vijay Abraham I	ccc62b8277	iommu/amd: Fix "Guest Virtual APIC Table Root Pointer" configuration in IRTE commit `b9c6ff94e4` ("iommu/amd: Re-factor guest virtual APIC (de-)activation code") while refactoring guest virtual APIC activation/de-activation code, stored information for activate/de-activate in "struct amd_ir_data". It used 32-bit integer data type for storing the "Guest Virtual APIC Table Root Pointer" (ga_root_ptr), though the "ga_root_ptr" is actually a 40-bit field in IRTE (Interrupt Remapping Table Entry). This causes interrupts from PCIe devices to not reach the guest in the case of PCIe passthrough with SME (Secure Memory Encryption) enabled as _SME_ bit in the "ga_root_ptr" is lost before writing it to the IRTE. Fix it by using 64-bit data type for storing the "ga_root_ptr". While at that also change the data type of "ga_tag" to u32 in order to match the IOMMU spec. Fixes: `b9c6ff94e4` ("iommu/amd: Re-factor guest virtual APIC (de-)activation code") Cc: stable@vger.kernel.org # v5.4+ Reported-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Kishon Vijay Abraham I <kvijayab@amd.com> Link: https://lore.kernel.org/r/20230405130317.9351-1-kvijayab@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-04-13 11:57:30 +02:00
Jerry Snitselaar	8f880d19e6	iommu/amd: Set page size bitmap during V2 domain allocation With the addition of the V2 page table support, the domain page size bitmap needs to be set prior to iommu core setting up direct mappings for reserved regions. When reserved regions are mapped, if this is not done, it will be looking at the V1 page size bitmap when determining the page size to use in iommu_pgsize(). When it gets into the actual amd mapping code, a check of see if the page size is supported can fail, because at that point it is checking it against the V2 page size bitmap which only supports 4K, 2M, and 1G. Add a check to __iommu_domain_alloc() to not override the bitmap if it was already set by the iommu ops domain_alloc() code path. Cc: Vasant Hegde <vasant.hegde@amd.com> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Joerg Roedel <joro@8bytes.org> Fixes: `4db6c41f09` ("iommu/amd: Add support for using AMD IOMMU v2 page table for DMA-API") Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/20230404072742.1895252-1-jsnitsel@redhat.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-04-13 11:56:19 +02:00
Vasant Hegde	f594496403	iommu/amd: Add 5 level guest page table support Newer AMD IOMMU supports 5 level guest page table (v2 page table). If both processor and IOMMU supports 5 level page table then enable it. Otherwise fall back to 4 level page table. Co-developed-by: Wei Huang <wei.huang2@amd.com> Signed-off-by: Wei Huang <wei.huang2@amd.com> Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/20230310090000.1117786-1-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-03-28 15:31:31 +02:00
Lu Baolu	c33fcc13ee	iommu: Use sysfs_emit() for sysfs show Use sysfs_emit() instead of the sprintf() for sysfs entries. sysfs_emit() knows the maximum of the temporary buffer used for outputting sysfs content and avoids overrunning the buffer length. Prefer 'long long' over 'long long int' as suggested by checkpatch.pl. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20230322123421.278852-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-03-22 15:47:10 +01:00
Vasant Hegde	4d4a0dbab2	iommu/amd: Allocate IOMMU irqs using numa locality info Use numa information to allocate irq resources and also to set irq affinity. This optimizes the IOMMU interrupt handling. Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Reviewed-by: Alexey Kardashevskiy <aik@amd.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/20230321092348.6127-3-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-03-22 15:43:40 +01:00
Vasant Hegde	0d571dcbe7	iommu/amd: Allocate page table using numa locality info Introduce 'struct protection_domain->nid' variable. It will contain IOMMU NUMA node ID. And allocate page table pages using IOMMU numa locality info. This optimizes page table walk by IOMMU. Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/20230321092348.6127-2-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-03-22 15:43:39 +01:00
Linus Torvalds	143c7bc649	iommufd for 6.3 Some polishing and small fixes for iommufd: - Remove IOMMU_CAP_INTR_REMAP, instead rely on the interrupt subsystem - Use GFP_KERNEL_ACCOUNT inside the iommu_domains - Support VFIO_NOIOMMU mode with iommufd - Various typos - A list corruption bug if HWPTs are used for attach -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCY/TgzQAKCRCFwuHvBreF Ya3AAP4/WxTJIbDvtTyH3Fae3NxTdO8j8gsUvU1vrRYG83zdnAEAxd1yii7GEO8D crkeq9D4FUiPAkFnJ64Exw2FHb060Qg= =RABK -----END PGP SIGNATURE----- Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd Pull iommufd updates from Jason Gunthorpe: "Some polishing and small fixes for iommufd: - Remove IOMMU_CAP_INTR_REMAP, instead rely on the interrupt subsystem - Use GFP_KERNEL_ACCOUNT inside the iommu_domains - Support VFIO_NOIOMMU mode with iommufd - Various typos - A list corruption bug if HWPTs are used for attach" * tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: iommufd: Do not add the same hwpt to the ioas->hwpt_list twice iommufd: Make sure to zero vfio_iommu_type1_info before copying to user vfio: Support VFIO_NOIOMMU with iommufd iommufd: Add three missing structures in ucmd_buffer selftests: iommu: Fix test_cmd_destroy_access() call in user_copy iommu: Remove IOMMU_CAP_INTR_REMAP irq/s390: Add arch_is_isolated_msi() for s390 iommu/x86: Replace IOMMU_CAP_INTR_REMAP with IRQ_DOMAIN_FLAG_ISOLATED_MSI genirq/msi: Rename IRQ_DOMAIN_MSI_REMAP to IRQ_DOMAIN_ISOLATED_MSI genirq/irqdomain: Remove unused irq_domain_check_msi_remap() code iommufd: Convert to msi_device_has_isolated_msi() vfio/type1: Convert to iommu_group_has_isolated_msi() iommu: Add iommu_group_has_isolated_msi() genirq/msi: Add msi_device_has_isolated_msi()	2023-02-24 14:34:12 -08:00
Joerg Roedel	bedd29d793	Merge branches 'apple/dart', 'arm/exynos', 'arm/renesas', 'arm/smmu', 'x86/vt-d', 'x86/amd' and 'core' into next	2023-02-18 15:43:04 +01:00
Vasant Hegde	f451c7a5a3	iommu/amd: Skip attach device domain is same as new domain If device->domain is same as new domain then we can skip the device attach process. Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/20230215052642.6016-2-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-02-18 15:36:33 +01:00

1 2 3 4 5 ...

308 Commits