linux

iv/linux

Author	SHA1	Message	Date
Robin Murphy	804e72062b	iommu/arm-smmu-v3: Document nesting-related errata commit 0bfbfc526c70606bf0fad302e4821087cbecfaf4 upstream Both MMU-600 and MMU-700 have similar errata around TLB invalidation while both stages of translation are active, which will need some consideration once nesting support is implemented. For now, though, it's very easy to make our implicit lack of nesting support explicit for those cases, so they're less likely to be missed in future. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/696da78d32bb4491f898f11b0bb4d850a8aa7c6a.1683731256.git.robin.murphy@arm.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Easwar Hariharan <eahariha@linux.microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-11 15:13:49 +02:00
Robin Murphy	744e6b80b8	iommu/arm-smmu-v3: Add explicit feature for nesting commit 1d9777b9f3d55b4b6faf186ba4f1d6fb560c0523 upstream In certain cases we may want to refuse to allow nested translation even when both stages are implemented, so let's add an explicit feature for nesting support which we can control in its own right. For now this merely serves as documentation, but it means a nice convenient check will be ready and waiting for the future nesting code. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/136c3f4a3a84cc14a5a1978ace57dfd3ed67b688.1683731256.git.robin.murphy@arm.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Easwar Hariharan <eahariha@linux.microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-11 15:13:49 +02:00
Robin Murphy	fd86b59442	iommu/arm-smmu-v3: Document MMU-700 erratum 2812531 commit 309a15cb16bb075da1c99d46fb457db6a1a2669e upstream To work around MMU-700 erratum 2812531 we need to ensure that certain sequences of commands cannot be issued without an intervening sync. In practice this falls out of our current command-batching machinery anyway - each batch only contains a single type of invalidation command, and ends with a sync. The only exception is when a batch is sufficiently large to need issuing across multiple command queue slots, wherein the earlier slots will not contain a sync and thus may in theory interleave with another batch being issued in parallel to create an affected sequence across the slot boundary. Since MMU-700 supports range invalidate commands and thus we will prefer to use them (which also happens to avoid conditions for other errata), I'm not entirely sure it's even possible for a single high-level invalidate call to generate a batch of more than 63 commands, but for the sake of robustness and documentation, wire up an option to enforce that a sync is always inserted for every slot issued. The other aspect is that the relative order of DVM commands cannot be controlled, so DVM cannot be used. Again that is already the status quo, but since we have at least defined ARM_SMMU_FEAT_BTM, we can explicitly disable it for documentation purposes even if it's not wired up anywhere yet. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/330221cdfd0003cd51b6c04e7ff3566741ad8374.1683731256.git.robin.murphy@arm.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Easwar Hariharan <eahariha@linux.microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-11 15:13:49 +02:00
Robin Murphy	2de9f3dcfe	iommu/arm-smmu-v3: Work around MMU-600 erratum 1076982 commit f322e8af35c7f23a8c08b595c38d6c855b2d836f upstream MMU-600 versions prior to r1p0 fail to correctly generate a WFE wakeup event when the command queue transitions fom full to non-full. We can easily work around this by simply hiding the SEV capability such that we fall back to polling for space in the queue - since MMU-600 implements MSIs we wouldn't expect to need SEV for sync completion either, so this should have little to no impact. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/08adbe3d01024d8382a478325f73b56851f76e49.1683731256.git.robin.murphy@arm.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Easwar Hariharan <eahariha@linux.microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-11 15:13:48 +02:00
Jon Pan-Doh	d42d869b2c	iommu/amd: Fix domain flush size when syncing iotlb commit 2212fc2acf3f6ee690ea36506fb882a19d1bfcab upstream. When running on an AMD vIOMMU, we observed multiple invalidations (of decreasing power of 2 aligned sizes) when unmapping a single page. Domain flush takes gather bounds (end-start) as size param. However, gather->end is defined as the last inclusive address (start + size - 1). This leads to an off by 1 error. With this patch, verified that 1 invalidation occurs when unmapping a single page. Fixes: a270be1b3fdf ("iommu/amd: Use only natural aligned flushes in a VM") Cc: stable@vger.kernel.org # >= 5.15 Signed-off-by: Jon Pan-Doh <pandoh@google.com> Tested-by: Sudheer Dantuluri <dantuluris@google.com> Suggested-by: Gary Zibrat <gzibrat@google.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Acked-by: Nadav Amit <namit@vmware.com> Link: https://lore.kernel.org/r/20230426203256.237116-1-pandoh@google.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-06-09 10:32:31 +02:00
Joao Martins	c4be5d71d7	iommu/amd: Don't block updates to GATag if guest mode is on [ Upstream commit ed8a2f4ddef2eaaf864ab1efbbca9788187036ab ] On KVM GSI routing table updates, specially those where they have vIOMMUs with interrupt remapping enabled (to boot >255vcpus setups without relying on KVM_FEATURE_MSI_EXT_DEST_ID), a VMM may update the backing VF MSIs with a new VCPU affinity. On AMD with AVIC enabled, the new vcpu affinity info is updated via: avic_pi_update_irte() irq_set_vcpu_affinity() amd_ir_set_vcpu_affinity() amd_iommu_{de}activate_guest_mode() Where the IRTE[GATag] is updated with the new vcpu affinity. The GATag contains VM ID and VCPU ID, and is used by IOMMU hardware to signal KVM (via GALog) when interrupt cannot be delivered due to vCPU is in blocking state. The issue is that amd_iommu_activate_guest_mode() will essentially only change IRTE fields on transitions from non-guest-mode to guest-mode and otherwise returns with no changes to IRTE on already configured guest-mode interrupts. To the guest this means that the VF interrupts remain affined to the first vCPU they were first configured, and guest will be unable to issue VF interrupts and receive messages like this from spurious interrupts (e.g. from waking the wrong vCPU in GALog): [ 167.759472] __common_interrupt: 3.34 No irq handler for vector [ 230.680927] mlx5_core 0000:00:02.0: mlx5_cmd_eq_recover:247:(pid 3122): Recovered 1 EQEs on cmd_eq [ 230.681799] mlx5_core 0000:00:02.0: wait_func_handle_exec_timeout:1113:(pid 3122): cmd[0]: CREATE_CQ(0x400) recovered after timeout [ 230.683266] __common_interrupt: 3.34 No irq handler for vector Given the fact that amd_ir_set_vcpu_affinity() uses amd_iommu_activate_guest_mode() underneath it essentially means that VCPU affinity changes of IRTEs are nops. Fix it by dropping the check for guest-mode at amd_iommu_activate_guest_mode(). Same thing is applicable to amd_iommu_deactivate_guest_mode() although, even if the IRTE doesn't change underlying DestID on the host, the VFIO IRQ handler will still be able to poke at the right guest-vCPU. Fixes: b9c6ff94e43a ("iommu/amd: Re-factor guest virtual APIC (de-)activation code") Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Link: https://lore.kernel.org/r/20230419201154.83880-2-joao.m.martins@oracle.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-06-09 10:32:15 +02:00
Chao Wang	b4fd38c0c7	iommu/rockchip: Fix unwind goto issue [ Upstream commit ec014683c564fb74fc68e8f5e84691d3b3839d24 ] Smatch complains that drivers/iommu/rockchip-iommu.c:1306 rk_iommu_probe() warn: missing unwind goto? The rk_iommu_probe function, after obtaining the irq value through platform_get_irq, directly returns an error if the returned value is negative, without releasing any resources. Fix this by adding a new error handling label "err_pm_disable" and use a goto statement to redirect to the error handling process. In order to preserve the original semantics, set err to the value of irq. Fixes: 1aa55ca9b14a ("iommu/rockchip: Move irq request past pm_runtime_enable") Signed-off-by: Chao Wang <D202280639@hust.edu.cn> Reviewed-by: Dongliang Mu <dzm91@hust.edu.cn> Reviewed-by: Heiko Stuebner <heiko@sntech.de> Link: https://lore.kernel.org/r/20230417030421.2777-1-D202280639@hust.edu.cn Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-06-09 10:32:15 +02:00
Chunyan Zhang	92c089a931	iommu/sprd: Release dma buffer to avoid memory leak [ Upstream commit 9afea57384d4ae7b2034593eac7fa76c7122762a ] When attaching to a domain, the driver would alloc a DMA buffer which is used to store address mapping table, and it need to be released when the IOMMU domain is freed. Signed-off-by: Chunyan Zhang <chunyan.zhang@unisoc.com> Link: https://lore.kernel.org/r/20230331033124.864691-2-zhang.lyra@gmail.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-05-24 17:36:48 +01:00
Tomas Krcka	a2bc5241ee	iommu/arm-smmu-v3: Acknowledge pri/event queue overflow if any [ Upstream commit 67ea0b7ce41844eae7c10bb04dfe66a23318c224 ] When an overflow occurs in the PRI queue, the SMMU toggles the overflow flag in the PROD register. To exit the overflow condition, the PRI thread is supposed to acknowledge it by toggling this flag in the CONS register. Unacknowledged overflow causes the queue to stop adding anything new. Currently, the priq thread always writes the CONS register back to the SMMU after clearing the queue. The writeback is not necessary if the OVFLG in the PROD register has not been changed, no overflow has occured. This commit checks the difference of the overflow flag between CONS and PROD register. If it's different, toggles the OVACKFLG flag in the CONS register and write it to the SMMU. The situation is similar for the event queue. The acknowledge register is also toggled after clearing the event queue but never propagated to the hardware. This would only be done the next time when executing evtq thread. Unacknowledged event queue overflow doesn't affect the event queue, because the SMMU still adds elements to that queue when the overflow condition is active. But it feel nicer to keep SMMU in sync when possible, so use the same way here as well. Signed-off-by: Tomas Krcka <krckatom@amazon.de> Link: https://lore.kernel.org/r/20230329123420.34641-1-tomas.krcka@gmail.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-05-24 17:36:48 +01:00
Manivannan Sadhasivam	8ebcbd1811	iommu/arm-smmu-qcom: Limit the SMR groups to 128 [ Upstream commit 12261134732689b7e30c59db9978f81230965181 ] Some platforms support more than 128 stream matching groups than what is defined by the ARM SMMU architecture specification. But due to some unknown reasons, those additional groups don't exhibit the same behavior as the architecture supported ones. For instance, the additional groups will not detect the quirky behavior of some firmware versions intercepting writes to S2CR register, thus skipping the quirk implemented in the driver and causing boot crash. So let's limit the groups to 128 for now until the issue with those groups are fixed and issue a notice to users in that case. Reviewed-by: Johan Hovold <johan+linaro@kernel.org> Tested-by: Johan Hovold <johan+linaro@kernel.org> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Link: https://lore.kernel.org/r/20230327080029.11584-1-manivannan.sadhasivam@linaro.org [will: Reworded the comment slightly] Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-05-24 17:36:47 +01:00
Kishon Vijay Abraham I	dc299bd1d5	iommu/amd: Fix "Guest Virtual APIC Table Root Pointer" configuration in IRTE commit ccc62b827775915a9b82db42a29813d04f92df7a upstream. commit b9c6ff94e43a ("iommu/amd: Re-factor guest virtual APIC (de-)activation code") while refactoring guest virtual APIC activation/de-activation code, stored information for activate/de-activate in "struct amd_ir_data". It used 32-bit integer data type for storing the "Guest Virtual APIC Table Root Pointer" (ga_root_ptr), though the "ga_root_ptr" is actually a 40-bit field in IRTE (Interrupt Remapping Table Entry). This causes interrupts from PCIe devices to not reach the guest in the case of PCIe passthrough with SME (Secure Memory Encryption) enabled as _SME_ bit in the "ga_root_ptr" is lost before writing it to the IRTE. Fix it by using 64-bit data type for storing the "ga_root_ptr". While at that also change the data type of "ga_tag" to u32 in order to match the IOMMU spec. Fixes: b9c6ff94e43a ("iommu/amd: Re-factor guest virtual APIC (de-)activation code") Cc: stable@vger.kernel.org # v5.4+ Reported-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Kishon Vijay Abraham I <kvijayab@amd.com> Link: https://lore.kernel.org/r/20230405130317.9351-1-kvijayab@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-05-11 23:00:18 +09:00
Lu Baolu	16c951f3eb	iommu/vt-d: Allow zero SAGAW if second-stage not supported [ Upstream commit bfd3c6b9fa4a1dc78139dd1621d5bea321ffa69d ] The VT-d spec states (in section 11.4.2) that hardware implementations reporting second-stage translation support (SSTS) field as Clear also report the SAGAW field as 0. Fix an inappropriate check in alloc_iommu(). Fixes: 792fb43ce2c9 ("iommu/vt-d: Enable Intel IOMMU scalable mode by default") Suggested-by: Raghunathan Srinivasan <raghunathan.srinivasan@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20230318024824.124542-1-baolu.lu@linux.intel.com Link: https://lore.kernel.org/r/20230329134721.469447-3-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-04-05 11:24:58 +02:00
Gavrilov Ilia	c513043e0a	iommu/amd: Add a length limitation for the ivrs_acpihid command-line parameter [ Upstream commit b6b26d86c61c441144c72f842f7469bb686e1211 ] The 'acpiid' buffer in the parse_ivrs_acpihid function may overflow, because the string specifier in the format string sscanf() has no width limitation. Found by InfoTeCS on behalf of Linux Verification Center (linuxtesting.org) with SVACE. Fixes: ca3bf5d47cec ("iommu/amd: Introduces ivrs_acpihid kernel parameter") Cc: stable@vger.kernel.org Signed-off-by: Ilia.Gavrilov <Ilia.Gavrilov@infotecs.ru> Reviewed-by: Kim Phillips <kim.phillips@amd.com> Link: https://lore.kernel.org/r/20230202082719.1513849-1-Ilia.Gavrilov@infotecs.ru Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-17 08:48:58 +01:00
Kim Phillips	0fd72f1d1b	iommu/amd: Fix ill-formed ivrs_ioapic, ivrs_hpet and ivrs_acpihid options [ Upstream commit 1198d2316dc4265a97d0e8445a22c7a6d17580a4 ] Currently, these options cause the following libkmod error: libkmod: ERROR ../libkmod/libkmod-config.c:489 kcmdline_parse_result: \ Ignoring bad option on kernel command line while parsing module \ name: 'ivrs_xxxx[XX:XX' Fix by introducing a new parameter format for these options and throw a warning for the deprecated format. Users are still allowed to omit the PCI Segment if zero. Adding a Link: to the reason why we're modding the syntax parsing in the driver and not in libkmod. Fixes: ca3bf5d47cec ("iommu/amd: Introduces ivrs_acpihid kernel parameter") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/linux-modules/20200310082308.14318-2-lucas.demarchi@intel.com/ Reported-by: Kim Phillips <kim.phillips@amd.com> Co-developed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Kim Phillips <kim.phillips@amd.com> Link: https://lore.kernel.org/r/20220919155638.391481-2-kim.phillips@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Stable-dep-of: b6b26d86c61c ("iommu/amd: Add a length limitation for the ivrs_acpihid command-line parameter") Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-17 08:48:58 +01:00
Suravee Suthikulpanit	2af1716780	iommu/amd: Add PCI segment support for ivrs_[ioapic/hpet/acpihid] commands [ Upstream commit bbe3a106580c21bc883fb0c9fa3da01534392fe8 ] By default, PCI segment is zero and can be omitted. To support system with non-zero PCI segment ID, modify the parsing functions to allow PCI segment ID. Co-developed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Link: https://lore.kernel.org/r/20220706113825.25582-33-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Stable-dep-of: b6b26d86c61c ("iommu/amd: Add a length limitation for the ivrs_acpihid command-line parameter") Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-17 08:48:58 +01:00
Jacob Pan	5c65f09712	iommu/vt-d: Fix PASID directory pointer coherency [ Upstream commit 194b3348bdbb7db65375c72f3f774aee4cc6614e ] On platforms that do not support IOMMU Extended capability bit 0 Page-walk Coherency, CPU caches are not snooped when IOMMU is accessing any translation structures. IOMMU access goes only directly to memory. Intel IOMMU code was missing a flush for the PASID table directory that resulted in the unrecoverable fault as shown below. This patch adds clflush calls whenever allocating and updating a PASID table directory to ensure cache coherency. On the reverse direction, there's no need to clflush the PASID directory pointer when we deactivate a context entry in that IOMMU hardware will not see the old PASID directory pointer after we clear the context entry. PASID directory entries are also never freed once allocated. DMAR: DRHD: handling fault status reg 3 DMAR: [DMA Read NO_PASID] Request device [00:0d.2] fault addr 0x1026a4000 [fault reason 0x51] SM: Present bit in Directory Entry is clear DMAR: Dump dmar1 table entries for IOVA 0x1026a4000 DMAR: scalable mode root entry: hi 0x0000000102448001, low 0x0000000101b3e001 DMAR: context entry: hi 0x0000000000000000, low 0x0000000101b4d401 DMAR: pasid dir entry: 0x0000000101b4e001 DMAR: pasid table entry[0]: 0x0000000000000109 DMAR: pasid table entry[1]: 0x0000000000000001 DMAR: pasid table entry[2]: 0x0000000000000000 DMAR: pasid table entry[3]: 0x0000000000000000 DMAR: pasid table entry[4]: 0x0000000000000000 DMAR: pasid table entry[5]: 0x0000000000000000 DMAR: pasid table entry[6]: 0x0000000000000000 DMAR: pasid table entry[7]: 0x0000000000000000 DMAR: PTE not present at level 4 Cc: <stable@vger.kernel.org> Fixes: 0bbeb01a4faf ("iommu/vt-d: Manage scalalble mode PASID tables") Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reported-by: Sukumar Ghorai <sukumar.ghorai@intel.com> Signed-off-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Link: https://lore.kernel.org/r/20230209212843.1788125-1-jacob.jun.pan@linux.intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-17 08:48:51 +01:00
Vasant Hegde	1f11ed61d6	iommu/amd: Fix error handling for pdev_pri_ats_enable() [ Upstream commit 080920e52148b4fbbf9360d5345fdcd7846e4841 ] Current code throws kernel warning if it fails to enable pasid/pri [1]. Do not call pci_disable_[pasid/pri] if pci_enable_[pasid/pri] failed. [1] https://lore.kernel.org/linux-iommu/15d0f9ff-2a56-b3e9-5b45-e6b23300ae3b@leemhuis.info/ Reported-by: Matt Fagnani <matt.fagnani@bell.net> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Link: https://lore.kernel.org/r/20230111121503.5931-1-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-11 13:57:33 +01:00
Christophe JAILLET	6ac2adcc2b	iommu/vt-d: Fix an unbalanced rcu_read_lock/rcu_read_unlock() commit 4e5973dd2725bb30c3db622f7d73f7a5864ce718 upstream. If we return -EOPNOTSUPP, the rcu lock remains lock. This is spurious. Go through the end of the function instead. This way, the missing 'rcu_read_unlock()' is called. Fixes: 7afd7f6aa21a ("iommu/vt-d: Check FL and SL capability sanity in scalable mode") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/40cc077ca5f543614eab2a10e84d29dd190273f6.1636217517.git.christophe.jaillet@wanadoo.fr Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20211126135556.397932-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-03-10 09:40:15 +01:00
Jacob Pan	2fd6f6c8cb	iommu/vt-d: Avoid superfluous IOTLB tracking in lazy mode commit 16a75bbe480c3598b3af57a2504ea89b1e32c3ac upstream. Intel IOMMU driver implements IOTLB flush queue with domain selective or PASID selective invalidations. In this case there's no need to track IOVA page range and sync IOTLBs, which may cause significant performance hit. This patch adds a check to avoid IOVA gather page and IOTLB sync for the lazy path. The performance difference on Sapphire Rapids 100Gb NIC is improved by the following (as measured by iperf send): w/o this fix~48 Gbits/s. with this fix ~54 Gbits/s Cc: <stable@vger.kernel.org> Fixes: 2a2b8eaa5b25 ("iommu: Handle freelists when using deferred flushing in iommu drivers") Reviewed-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Sanjay Kumar <sanjay.k.kumar@intel.com> Signed-off-by: Sanjay Kumar <sanjay.k.kumar@intel.com> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Link: https://lore.kernel.org/r/20230209175330.1783556-1-jacob.jun.pan@linux.intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-03-10 09:40:13 +01:00
Tina Zhang	a495b6a5d0	iommu/vt-d: Allow to use flush-queue when first level is default [ Upstream commit 257ec290741924f8df678927d0dfecb1deebb9c5 ] Commit 29b32839725f ("iommu/vt-d: Do not use flush-queue when caching-mode is on") forced default domains to be strict mode as long as IOMMU caching-mode is flagged. The reason for doing this is that when vIOMMU uses VT-d caching mode to synchronize shadowing page tables, the strict mode shows better performance. However, this optimization is orthogonal to the first-level page table because the Intel VT-d architecture does not define the caching mode of the first-level page table. Refer to VT-d spec, section 6.1, "When the CM field is reported as Set, any software updates to remapping structures other than first-stage mapping (including updates to not- present entries or present entries whose programming resulted in translation faults) requires explicit invalidation of the caches." Exclude the first-level page table from this optimization. Generally using first-stage translation in vIOMMU implies nested translation enabled in the physical IOMMU. In this case the first-stage page table is wholly captured by the guest. The vIOMMU only needs to transfer the cache invalidations on vIOMMU to the physical IOMMU. Forcing the default domain to strict mode will cause more frequent cache invalidations, resulting in performance degradation. In a real performance benchmark test measured by iperf receive, the performance result on Sapphire Rapids 100Gb NIC shows: w/ this fix ~51 Gbits/s, w/o this fix ~39.3 Gbits/s. Theoretically a first-stage IOMMU page table can still be shadowed in absence of the caching mode, e.g. with host write-protecting guest IOMMU page table to synchronize changed PTEs with the physical IOMMU page table. In this case the shadowing overhead is decoupled from emulating IOTLB invalidation then the overhead of the latter part is solely decided by the frequency of IOTLB invalidations. Hence allowing guest default dma domain to be lazy can also benefit the overall performance by reducing the total VM-exit numbers. Fixes: 29b32839725f ("iommu/vt-d: Do not use flush-queue when caching-mode is on") Reported-by: Sanjay Kumar <sanjay.k.kumar@intel.com> Suggested-by: Sanjay Kumar <sanjay.k.kumar@intel.com> Signed-off-by: Tina Zhang <tina.zhang@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20230214025618.2292889-1-tina.zhang@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-10 09:39:43 +01:00
Lu Baolu	990c539e9c	iommu/vt-d: Use second level for GPA->HPA translation [ Upstream commit 032c5ee40e9fc68ed650a3f86f23259376ec93fc ] The IOMMU VT-d implementation uses the first level for GPA->HPA translation by default. Although both the first level and the second level could handle the DMA translation, they're different in some way. For example, the second level translation has separate controls for the Access/Dirty page tracking. With the first level translation, there's no such control. On the other hand, the second level translation has the page-level control for forcing snoop, but the first level only has global control with pasid granularity. This uses the second level for GPA->HPA translation so that we can provide a consistent hardware interface for use cases like dirty page tracking for live migration. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20210926114535.923263-1-baolu.lu@linux.intel.com Link: https://lore.kernel.org/r/20211014053839.727419-6-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Stable-dep-of: 257ec2907419 ("iommu/vt-d: Allow to use flush-queue when first level is default") Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-10 09:39:43 +01:00
Lu Baolu	727fb414fe	iommu/vt-d: Check FL and SL capability sanity in scalable mode [ Upstream commit 7afd7f6aa21a2929aff3a059b741933ee1819c6b ] An iommu domain could be allocated and mapped before it's attached to any device. This requires that in scalable mode, when the domain is allocated, the format (FL or SL) of the page table must be determined. In order to achieve this, the platform should support consistent SL or FL capabilities on all IOMMU's. This adds a check for this and aborts IOMMU probing if it doesn't meet this requirement. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20210926114535.923263-1-baolu.lu@linux.intel.com Link: https://lore.kernel.org/r/20211014053839.727419-5-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Stable-dep-of: 257ec2907419 ("iommu/vt-d: Allow to use flush-queue when first level is default") Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-10 09:39:43 +01:00
Lu Baolu	b0a2bf28af	iommu/vt-d: Remove duplicate identity domain flag [ Upstream commit b34380a6d767c54480a937951e6189a7f9699443 ] The iommu_domain data structure already has the "type" field to keep the type of a domain. It's unnecessary to have the DOMAIN_FLAG_STATIC_IDENTITY flag in the vt-d implementation. This cleans it up with no functionality change. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20210926114535.923263-1-baolu.lu@linux.intel.com Link: https://lore.kernel.org/r/20211014053839.727419-4-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Stable-dep-of: 257ec2907419 ("iommu/vt-d: Allow to use flush-queue when first level is default") Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-10 09:39:42 +01:00
Lu Baolu	db05a58ed4	iommu/vt-d: Fix error handling in sva enable/disable paths [ Upstream commit 60b1daa3b168fbc648ae2ad28a84759223e49e18 ] Roll back all previous actions in error paths of intel_iommu_enable_sva() and intel_iommu_disable_sva(). Fixes: d5b9e4bfe0d8 ("iommu/vt-d: Report prq to io-pgfault framework") Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20230208051559.700109-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-10 09:39:42 +01:00
Jason Gunthorpe	f119ef452e	iommu: Fix error unwind in iommu_group_alloc() [ Upstream commit 4daa861174d56023c2068ddb03de0752f07fa199 ] If either iommu_group_grate_file() fails then the iommu_group is leaked. Destroy it on these error paths. Found by kselftest/iommu/iommufd_fail_nth Fixes: bc7d12b91bd3 ("iommu: Implement reserved_regions iommu-group sysfs file") Fixes: c52c72d3dee8 ("iommu: Add sysfs attribyte for domain type") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/0-v1-8f616bee028d+8b-iommu_group_alloc_leak_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-10 09:39:42 +01:00
Lu Baolu	c4f590e84a	iommu/vt-d: Set No Execute Enable bit in PASID table entry [ Upstream commit e06d24435596c8afcaa81c0c498f5b0ec4ee2b7c ] Setup No Execute Enable bit (Bit 133) of a scalable mode PASID entry. This is to allow the use of XD bit of the first level page table. Fixes: ddf09b6d43ec ("iommu/vt-d: Setup pasid entries for iova over first level") Signed-off-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20230126095438.354205-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-10 09:39:40 +01:00
Christophe JAILLET	d766ccadbe	iommu/mediatek-v1: Fix an error handling path in mtk_iommu_v1_probe() commit 142e821f68cf5da79ce722cb9c1323afae30e185 upstream. A clk, prepared and enabled in mtk_iommu_v1_hw_init(), is not released in the error handling path of mtk_iommu_v1_probe(). Add the corresponding clk_disable_unprepare(), as already done in the remove function. Fixes: b17336c55d89 ("iommu/mediatek: add support for mtk iommu generation one HW") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Yong Wu <yong.wu@mediatek.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com> Link: https://lore.kernel.org/r/593e7b7d97c6e064b29716b091a9d4fd122241fb.1671473163.git.christophe.jaillet@wanadoo.fr Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-01-18 11:48:52 +01:00
Yunfei Wang	c929a230c8	iommu/iova: Fix alloc iova overflows issue commit dcdb3ba7e2a8caae7bfefd603bc22fd0ce9a389c upstream. In __alloc_and_insert_iova_range, there is an issue that retry_pfn overflows. The value of iovad->anchor.pfn_hi is ~0UL, then when iovad->cached_node is iovad->anchor, curr_iova->pfn_hi + 1 will overflow. As a result, if the retry logic is executed, low_pfn is updated to 0, and then new_pfn < low_pfn returns false to make the allocation successful. This issue occurs in the following two situations: 1. The first iova size exceeds the domain size. When initializing iova domain, iovad->cached_node is assigned as iovad->anchor. For example, the iova domain size is 10M, start_pfn is 0x1_F000_0000, and the iova size allocated for the first time is 11M. The following is the log information, new->pfn_lo is smaller than iovad->cached_node. Example log as follows: [ 223.798112][T1705487] sh: [name:iova&]__alloc_and_insert_iova_range start_pfn:0x1f0000,retry_pfn:0x0,size:0xb00,limit_pfn:0x1f0a00 [ 223.799590][T1705487] sh: [name:iova&]__alloc_and_insert_iova_range success start_pfn:0x1f0000,new->pfn_lo:0x1efe00,new->pfn_hi:0x1f08ff 2. The node with the largest iova->pfn_lo value in the iova domain is deleted, iovad->cached_node will be updated to iovad->anchor, and then the alloc iova size exceeds the maximum iova size that can be allocated in the domain. After judging that retry_pfn is less than limit_pfn, call retry_pfn+1 to fix the overflow issue. Signed-off-by: jianjiao zeng <jianjiao.zeng@mediatek.com> Signed-off-by: Yunfei Wang <yf.wang@mediatek.com> Cc: <stable@vger.kernel.org> # 5.15.* Fixes: 4e89dce72521 ("iommu/iova: Retry from last rb tree node if iova search fails") Acked-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/20230111063801.25107-1-yf.wang@mediatek.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-01-18 11:48:52 +01:00
Kim Phillips	9b615f957c	iommu/amd: Fix ivrs_acpihid cmdline parsing code commit 5f18e9f8868c6d4eae71678e7ebd4977b7d8c8cf upstream. The second (UID) strcmp in acpi_dev_hid_uid_match considers "0" and "00" different, which can prevent device registration. Have the AMD IOMMU driver's ivrs_acpihid parsing code remove any leading zeroes to make the UID strcmp succeed. Now users can safely specify "AMDxxxxx:00" or "AMDxxxxx:0" and expect the same behaviour. Fixes: ca3bf5d47cec ("iommu/amd: Introduces ivrs_acpihid kernel parameter") Signed-off-by: Kim Phillips <kim.phillips@amd.com> Cc: stable@vger.kernel.org Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Cc: Joerg Roedel <jroedel@suse.de> Link: https://lore.kernel.org/r/20220919155638.391481-1-kim.phillips@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-01-12 11:58:59 +01:00
Jason Gunthorpe	37ea9a6c41	iommu/sun50i: Remove IOMMU_DOMAIN_IDENTITY [ Upstream commit ef5bb8e7a7127218f826b9ccdf7508e7a339f4c2 ] This driver treats IOMMU_DOMAIN_IDENTITY the same as UNMANAGED, which cannot possibly be correct. UNMANAGED domains are required to start out blocking all DMAs. This seems to be what this driver does as it allocates a first level 'dt' for the IO page table that is 0 filled. Thus UNMANAGED looks like a working IO page table, and so IDENTITY must be a mistake. Remove it. Fixes: 4100b8c229b3 ("iommu: Add Allwinner H6 IOMMU driver") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/0-v1-97f0adf27b5e+1f0-s50_identity_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-12-31 13:14:35 +01:00
Yuan Can	e42b543d08	iommu/fsl_pamu: Fix resource leak in fsl_pamu_probe() [ Upstream commit 73f5fc5f884ad0c5f7d57f66303af64f9f002526 ] The fsl_pamu_probe() returns directly when create_csd() failed, leaving irq and memories unreleased. Fix by jumping to error if create_csd() returns error. Fixes: 695093e38c3e ("iommu/fsl: Freescale PAMU driver and iommu implementation.") Signed-off-by: Yuan Can <yuancan@huawei.com> Link: https://lore.kernel.org/r/20221121082022.19091-1-yuancan@huawei.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-12-31 13:14:35 +01:00
Yang Yingliang	6e501b3fd7	iommu/amd: Fix pci device refcount leak in ppr_notifier() [ Upstream commit 6cf0981c2233f97d56938d9d61845383d6eb227c ] As comment of pci_get_domain_bus_and_slot() says, it returns a pci device with refcount increment, when finish using it, the caller must decrement the reference count by calling pci_dev_put(). So call it before returning from ppr_notifier() to avoid refcount leak. Fixes: daae2d25a477 ("iommu/amd: Don't copy GCR3 table root pointer") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Link: https://lore.kernel.org/r/20221118093604.216371-1-yangyingliang@huawei.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-12-31 13:14:35 +01:00
Michael Riesch	ae00848e55	iommu/rockchip: fix permission bits in page table entries v2 [ Upstream commit 7eb99841f340b80be0d0973b0deb592d75fb8928 ] As pointed out in the corresponding downstream fix [0], the permission bits of the page table entries are compatible between v1 and v2 of the IOMMU. This is in contrast to the current mainline code that incorrectly assumes that the read and write permission bits are switched. Fix the permission bits by reusing the v1 bit defines. [0] `e3bc123a22` Fixes: c55356c534aa ("iommu: rockchip: Add support for iommu v2") Signed-off-by: Michael Riesch <michael.riesch@wolfvision.net> Reviewed-by: Heiko Stuebner <heiko@sntech.de> Link: https://lore.kernel.org/r/20221102063553.2464161-1-michael.riesch@wolfvision.net Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-12-31 13:14:33 +01:00
Jernej Skrabec	a7f6ad2c42	iommu/sun50i: Fix flush size [ Upstream commit 67a8a67f9eceb72e4c73d1d09ed9ab04f4b8e12d ] Function sun50i_table_flush() takes number of entries as an argument, not number of bytes. Fix that mistake in sun50i_dte_get_page_table(). Fixes: 4100b8c229b3 ("iommu: Add Allwinner H6 IOMMU driver") Signed-off-by: Jernej Skrabec <jernej.skrabec@gmail.com> Link: https://lore.kernel.org/r/20221025165415.307591-5-jernej.skrabec@gmail.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-12-31 13:14:33 +01:00
Jernej Skrabec	38ccb9b469	iommu/sun50i: Fix R/W permission check [ Upstream commit eac0104dc69be50bed86926d6f32e82b44f8c921 ] Because driver has enum type permissions and iommu subsystem has bitmap type, we have to be careful how check for combined read and write permissions is done. In such case, we have to mask both permissions and check that both are set at the same time. Current code just masks both flags but doesn't check that both are set. In short, it always sets R/W permission, regardles if requested permissions were RO, WO or RW. Fix that. Fixes: 4100b8c229b3 ("iommu: Add Allwinner H6 IOMMU driver") Signed-off-by: Jernej Skrabec <jernej.skrabec@gmail.com> Link: https://lore.kernel.org/r/20221025165415.307591-4-jernej.skrabec@gmail.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-12-31 13:14:33 +01:00
Jernej Skrabec	ae4ab47a0b	iommu/sun50i: Consider all fault sources for reset [ Upstream commit cef20703e2b2276aaa402ec5a65ec9a09963b83e ] We have to reset masters for all faults - permissions, L1 fault or L2 fault. Currently it's done only for permissions. If other type of fault happens, master is in locked up state. Fix that by really considering all fault sources. Fixes: 4100b8c229b3 ("iommu: Add Allwinner H6 IOMMU driver") Signed-off-by: Jernej Skrabec <jernej.skrabec@gmail.com> Link: https://lore.kernel.org/r/20221025165415.307591-3-jernej.skrabec@gmail.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-12-31 13:14:33 +01:00
Jernej Skrabec	84fee3ce82	iommu/sun50i: Fix reset release [ Upstream commit 9ad0c1252e84dbc664f0462707182245ed603237 ] Reset signal is asserted by writing 0 to the corresponding locations of masters we want to reset. So in order to deassert all reset signals, we should write 1's to all locations. Current code writes 1's to locations of masters which were just reset which is good. However, at the same time it also writes 0's to other locations and thus asserts reset signals of remaining masters. Fix code by writing all 1's when we want to deassert all reset signals. This bug was discovered when working with Cedrus (video decoder). When it faulted, display went blank due to reset signal assertion. Fixes: 4100b8c229b3 ("iommu: Add Allwinner H6 IOMMU driver") Signed-off-by: Jernej Skrabec <jernej.skrabec@gmail.com> Link: https://lore.kernel.org/r/20221025165415.307591-2-jernej.skrabec@gmail.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-12-31 13:14:33 +01:00
Xiongfeng Wang	bdb613ef17	iommu/vt-d: Fix PCI device refcount leak in dmar_dev_scope_init() [ Upstream commit 4bedbbd782ebbe7287231fea862c158d4f08a9e3 ] for_each_pci_dev() is implemented by pci_get_device(). The comment of pci_get_device() says that it will increase the reference count for the returned pci_dev and also decrease the reference count for the input pci_dev @from if it is not NULL. If we break for_each_pci_dev() loop with pdev not NULL, we need to call pci_dev_put() to decrease the reference count. Add the missing pci_dev_put() for the error path to avoid reference count leak. Fixes: 2e4552893038 ("iommu/vt-d: Unify the way to process DMAR device scope array") Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Link: https://lore.kernel.org/r/20221121113649.190393-3-wangxiongfeng2@huawei.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-12-08 11:28:44 +01:00
Xiongfeng Wang	b6eea8b2e8	iommu/vt-d: Fix PCI device refcount leak in has_external_pci() [ Upstream commit afca9e19cc720bfafc75dc5ce429c185ca93f31d ] for_each_pci_dev() is implemented by pci_get_device(). The comment of pci_get_device() says that it will increase the reference count for the returned pci_dev and also decrease the reference count for the input pci_dev @from if it is not NULL. If we break for_each_pci_dev() loop with pdev not NULL, we need to call pci_dev_put() to decrease the reference count. Add the missing pci_dev_put() before 'return true' to avoid reference count leak. Fixes: 89a6079df791 ("iommu/vt-d: Force IOMMU on for platform opt in hint") Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Link: https://lore.kernel.org/r/20221121113649.190393-2-wangxiongfeng2@huawei.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-12-08 11:28:44 +01:00
Tina Zhang	052d0e79ef	iommu/vt-d: Set SRE bit only when hardware has SRS cap commit 7fc961cf7ffcb130c4e93ee9a5628134f9de700a upstream. SRS cap is the hardware cap telling if the hardware IOMMU can support requests seeking supervisor privilege or not. SRE bit in scalable-mode PASID table entry is treated as Reserved(0) for implementation not supporting SRS cap. Checking SRS cap before setting SRE bit can avoid the non-recoverable fault of "Non-zero reserved field set in PASID Table Entry" caused by setting SRE bit while there is no SRS cap support. The fault messages look like below: DMAR: DRHD: handling fault status reg 2 DMAR: [DMA Read NO_PASID] Request device [00:0d.0] fault addr 0x1154e1000 [fault reason 0x5a] SM: Non-zero reserved field set in PASID Table Entry Fixes: 6f7db75e1c46 ("iommu/vt-d: Add second level page table interface") Cc: stable@vger.kernel.org Signed-off-by: Tina Zhang <tina.zhang@intel.com> Link: https://lore.kernel.org/r/20221115070346.1112273-1-tina.zhang@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20221116051544.26540-3-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-11-26 09:24:47 +01:00
Tina Zhang	c31a792a82	iommu/vt-d: Preset Access bit for IOVA in FL non-leaf paging entries commit 242b0aaeabbe2efbef1b9d42a8e56627e800964c upstream. The A/D bits are preseted for IOVA over first level(FL) usage for both kernel DMA (i.e, domain typs is IOMMU_DOMAIN_DMA) and user space DMA usage (i.e., domain type is IOMMU_DOMAIN_UNMANAGED). Presetting A bit in FL requires to preset the bit in every related paging entries, including the non-leaf ones. Otherwise, hardware may treat this as an error. For example, in a case of ECAP_REG.SMPWC==0, DMA faults might occur with below DMAR fault messages (wrapped for line length) dumped. DMAR: DRHD: handling fault status reg 2 DMAR: [DMA Read NO_PASID] Request device [aa:00.0] fault addr 0x10c3a6000 [fault reason 0x90] SM: A/D bit update needed in first-level entry when set up in no snoop Fixes: 289b3b005cb9 ("iommu/vt-d: Preset A/D bits for user space DMA usage") Cc: stable@vger.kernel.org Signed-off-by: Tina Zhang <tina.zhang@intel.com> Link: https://lore.kernel.org/r/20221113010324.1094483-1-tina.zhang@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20221116051544.26540-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-11-26 09:24:47 +01:00
Jerry Snitselaar	0365d6af75	iommu/vt-d: Clean up si_domain in the init_dmars() error path [ Upstream commit 620bf9f981365c18cc2766c53d92bf8131c63f32 ] A splat from kmem_cache_destroy() was seen with a kernel prior to commit ee2653bbe89d ("iommu/vt-d: Remove domain and devinfo mempool") when there was a failure in init_dmars(), because the iommu_domain cache still had objects. While the mempool code is now gone, there still is a leak of the si_domain memory if init_dmars() fails. So clean up si_domain in the init_dmars() error path. Cc: Lu Baolu <baolu.lu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Will Deacon <will@kernel.org> Cc: Robin Murphy <robin.murphy@arm.com> Fixes: 86080ccc223a ("iommu/vt-d: Allocate si_domain in init_dmars()") Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20221010144842.308890-1-jsnitsel@redhat.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-10-29 10:12:57 +02:00
Yicong Yang	3d69461807	iommu/arm-smmu-v3: Make default domain type of HiSilicon PTT device to identity [ Upstream commit 24b6c7798a0122012ca848ea0d25e973334266b0 ] The DMA operations of HiSilicon PTT device can only work properly with identical mappings. So add a quirk for the device to force the domain as passthrough. Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Reviewed-by: John Garry <john.garry@huawei.com> Link: https://lore.kernel.org/r/20220816114414.4092-2-yangyicong@huawei.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-10-26 12:35:47 +02:00
Dan Carpenter	0c7043a5b5	iommu/omap: Fix buffer overflow in debugfs [ Upstream commit 184233a5202786b20220acd2d04ddf909ef18f29 ] There are two issues here: 1) The "len" variable needs to be checked before the very first write. Otherwise if omap2_iommu_dump_ctx() with "bytes" less than 32 it is a buffer overflow. 2) The snprintf() function returns the number of bytes that would have been copied if there were enough space. But we want to know the number of bytes which were actually copied so use scnprintf() instead. Fixes: bd4396f09a4a ("iommu/omap: Consolidate OMAP IOMMU modules") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Link: https://lore.kernel.org/r/YuvYh1JbE3v+abd5@kili Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-10-26 12:35:25 +02:00
Yi Liu	b02f86689a	iommu/vt-d: Check correct capability for sagaw determination commit 154897807050c1161cb2660e502fc0470d46b986 upstream. Check 5-level paging capability for 57 bits address width instead of checking 1GB large page capability. Fixes: 53fc7ad6edf2 ("iommu/vt-d: Correctly calculate sagaw value of IOMMU") Cc: stable@vger.kernel.org Reported-by: Raghunathan Srinivasan <raghunathan.srinivasan@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Raghunathan Srinivasan <raghunathan.srinivasan@intel.com> Link: https://lore.kernel.org/r/20220916071212.2223869-2-yi.l.liu@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-09-28 11:11:42 +02:00
Lu Baolu	709edbac4c	iommu/vt-d: Fix kdump kernels boot failure with scalable mode [ Upstream commit 0c5f6c0d8201a809a6585b07b6263e9db2c874a3 ] The translation table copying code for kdump kernels is currently based on the extended root/context entry formats of ECS mode defined in older VT-d v2.5, and doesn't handle the scalable mode formats. This causes the kexec capture kernel boot failure with DMAR faults if the IOMMU was enabled in scalable mode by the previous kernel. The ECS mode has already been deprecated by the VT-d spec since v3.0 and Intel IOMMU driver doesn't support this mode as there's no real hardware implementation. Hence this converts ECS checking in copying table code into scalable mode. The existing copying code consumes a bit in the context entry as a mark of copied entry. It needs to work for the old format as well as for the extended context entries. As it's hard to find such a common bit for both legacy and scalable mode context entries. This replaces it with a per- IOMMU bitmap. Fixes: 7373a8cc38197 ("iommu/vt-d: Setup context and enable RID2PASID support") Cc: stable@vger.kernel.org Reported-by: Jerry Snitselaar <jsnitsel@redhat.com> Tested-by: Wen Jin <wen.jin@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20220817011035.3250131-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-09-20 12:39:43 +02:00
Lu Baolu	4740910867	iommu/vt-d: Correctly calculate sagaw value of IOMMU commit 53fc7ad6edf210b497230ce74b61b322a202470c upstream. The Intel IOMMU driver possibly selects between the first-level and the second-level translation tables for DMA address translation. However, the levels of page-table walks for the 4KB base page size are calculated from the SAGAW field of the capability register, which is only valid for the second-level page table. This causes the IOMMU driver to stop working if the hardware (or the emulated IOMMU) advertises only first-level translation capability and reports the SAGAW field as 0. This solves the above problem by considering both the first level and the second level when calculating the supported page table levels. Fixes: b802d070a52a1 ("iommu/vt-d: Use iova over first level") Cc: stable@vger.kernel.org Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20220817023558.3253263-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-09-15 11:30:08 +02:00
John Sperbeck	a557ae0942	iommu/amd: use full 64-bit value in build_completion_wait() [ Upstream commit 94a568ce32038d8ff9257004bb4632e60eb43a49 ] We started using a 64 bit completion value. Unfortunately, we only stored the low 32-bits, so a very large completion value would never be matched in iommu_completion_wait(). Fixes: c69d89aff393 ("iommu/amd: Use 4K page for completion wait write-back semaphore") Signed-off-by: John Sperbeck <jsperbeck@google.com> Link: https://lore.kernel.org/r/20220801192229.3358786-1-jsperbeck@google.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-09-15 11:30:07 +02:00
Yunfei Wang	2097c78351	iommu/io-pgtable-arm-v7s: Add a quirk to allow pgtable PA up to 35bit [ Upstream commit bfdd231374181254742c5e2faef0bef2d30c0ee4 ] Single memory zone feature will remove ZONE_DMA32 and ZONE_DMA and cause pgtable PA size larger than 32bit. Since Mediatek IOMMU hardware support at most 35bit PA in pgtable, so add a quirk to allow the PA of pgtables support up to bit35. Signed-off-by: Ning Li <ning.li@mediatek.com> Signed-off-by: Yunfei Wang <yf.wang@mediatek.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20220630092927.24925-2-yf.wang@mediatek.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-08-25 11:40:41 +02:00
Alexander Lobakin	0b4c0003ae	iommu/vt-d: avoid invalid memory access via node_online(NUMA_NO_NODE) [ Upstream commit b0b0b77ea611e3088e9523e60860f4f41b62b235 ] KASAN reports: [ 4.668325][ T0] BUG: KASAN: wild-memory-access in dmar_parse_one_rhsa (arch/x86/include/asm/bitops.h:214 arch/x86/include/asm/bitops.h:226 include/asm-generic/bitops/instrumented-non-atomic.h:142 include/linux/nodemask.h:415 drivers/iommu/intel/dmar.c:497) [ 4.676149][ T0] Read of size 8 at addr 1fffffff85115558 by task swapper/0/0 [ 4.683454][ T0] [ 4.685638][ T0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.19.0-rc3-00004-g0e862838f290 #1 [ 4.694331][ T0] Hardware name: Supermicro SYS-5018D-FN4T/X10SDV-8C-TLN4F, BIOS 1.1 03/02/2016 [ 4.703196][ T0] Call Trace: [ 4.706334][ T0] <TASK> [ 4.709133][ T0] ? dmar_parse_one_rhsa (arch/x86/include/asm/bitops.h:214 arch/x86/include/asm/bitops.h:226 include/asm-generic/bitops/instrumented-non-atomic.h:142 include/linux/nodemask.h:415 drivers/iommu/intel/dmar.c:497) after converting the type of the first argument (@nr, bit number) of arch_test_bit() from `long` to `unsigned long`[0]. Under certain conditions (for example, when ACPI NUMA is disabled via command line), pxm_to_node() can return %NUMA_NO_NODE (-1). It is valid 'magic' number of NUMA node, but not valid bit number to use in bitops. node_online() eventually descends to test_bit() without checking for the input, assuming it's on caller side (which might be good for perf-critical tasks). There, -1 becomes %ULONG_MAX which leads to an insane array index when calculating bit position in memory. For now, add an explicit check for @node being not %NUMA_NO_NODE before calling test_bit(). The actual logics didn't change here at all. [0] `0e862838f2` Fixes: ee34b32d8c29 ("dmar: support for parsing Remapping Hardware Static Affinity structure") Cc: stable@vger.kernel.org # 2.6.33+ Reported-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-08-17 14:24:22 +02:00

1 2 3 4 5 ...

3910 Commits