Merge tag 'dma-mapping-6.9-2024-03-24' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fixes from Christoph Hellwig:
"This has a set of swiotlb alignment fixes for sometimes very long
standing bugs from Will. We've been discussion them for a while and
they should be solid now"
* tag 'dma-mapping-6.9-2024-03-24' of git://git.infradead.org/users/hch/dma-mapping:
swiotlb: Reinstate page-alignment for mappings >= PAGE_SIZE
iommu/dma: Force swiotlb_max_mapping_size on an untrusted device
swiotlb: Fix alignment checks when both allocation and DMA masks are present
swiotlb: Honour dma_alloc_coherent() alignment in swiotlb_alloc()
swiotlb: Enforce page alignment in swiotlb_alloc()
swiotlb: Fix double-allocation of slots due to broken alignment handling
The swiotlb does not support a mapping size > swiotlb_max_mapping_size().
On the other hand, with a 64KB PAGE_SIZE configuration, an NVMe device
has been observed to request mappings of between 300KB and 512KB, which
exceed that limit and therefore fail, even though the default swiotlb
pool still has many free slots:
systemd[1]: Started Journal Service.
=> nvme 0000:00:01.0: swiotlb buffer is full (sz: 327680 bytes), total 32768 (slots), used 32 (slots)
note: journal-offline[392] exited with irqs disabled
note: journal-offline[392] exited with preempt_count 1
Call trace:
[ 3.099918] swiotlb_tbl_map_single+0x214/0x240
[ 3.099921] iommu_dma_map_page+0x218/0x328
[ 3.099928] dma_map_page_attrs+0x2e8/0x3a0
[ 3.101985] nvme_prep_rq.part.0+0x408/0x878 [nvme]
[ 3.102308] nvme_queue_rqs+0xc0/0x300 [nvme]
[ 3.102313] blk_mq_flush_plug_list.part.0+0x57c/0x600
[ 3.102321] blk_add_rq_to_plug+0x180/0x2a0
[ 3.102323] blk_mq_submit_bio+0x4c8/0x6b8
[ 3.103463] __submit_bio+0x44/0x220
[ 3.103468] submit_bio_noacct_nocheck+0x2b8/0x360
[ 3.103470] submit_bio_noacct+0x180/0x6c8
[ 3.103471] submit_bio+0x34/0x130
[ 3.103473] ext4_bio_write_folio+0x5a4/0x8c8
[ 3.104766] mpage_submit_folio+0xa0/0x100
[ 3.104769] mpage_map_and_submit_buffers+0x1a4/0x400
[ 3.104771] ext4_do_writepages+0x6a0/0xd78
[ 3.105615] ext4_writepages+0x80/0x118
[ 3.105616] do_writepages+0x90/0x1e8
[ 3.105619] filemap_fdatawrite_wbc+0x94/0xe0
[ 3.105622] __filemap_fdatawrite_range+0x68/0xb8
[ 3.106656] file_write_and_wait_range+0x84/0x120
[ 3.106658] ext4_sync_file+0x7c/0x4c0
[ 3.106660] vfs_fsync_range+0x3c/0xa8
[ 3.106663] do_fsync+0x44/0xc0
Since untrusted devices might go down the swiotlb pathway with dma-iommu,
these devices should not map a size larger than swiotlb_max_mapping_size().
To fix this bug, add iommu_dma_max_mapping_size() so that untrusted
devices take swiotlb_max_mapping_size() into account, in addition to the
iova_rcache_range() limit applied by iommu_dma_opt_mapping_size().
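A minimal sketch of the helper, assuming it is wired into iommu_dma_ops
as .max_mapping_size; the exact upstream form may differ slightly:

  /* Sketch: cap mapping sizes for devices that will be bounced via swiotlb. */
  static size_t iommu_dma_max_mapping_size(struct device *dev)
  {
          if (dev_is_untrusted(dev))
                  return swiotlb_max_mapping_size(dev);

          return SIZE_MAX;
  }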
Fixes: 82612d66d51d ("iommu: Allow the dma-iommu api to use bounce buffers")
Link: https://lore.kernel.org/r/ee51a3a5c32cf885b18f6416171802669f4a718a.1707851466.git.nicolinc@nvidia.com
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
[will: Drop redundant is_swiotlb_active(dev) check]
Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
iommu-dma does not explicitly reference min_align_mask since we already
assume it will be less than or equal to any typical IOVA granule.
We wouldn't realistically expect to see the case where it is larger, and
that would be non-trivial to support; however, for the sake of reasoning
(particularly around the interaction with SWIOTLB), let's clearly
enforce the assumption.
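A hypothetical shape of such a check, placed during iommu-dma domain
setup with dev and iovad in scope; the real patch may differ in form and
placement:

  /* Assumption: refuse devices whose min_align_mask exceeds the IOVA granule. */
  if (WARN_ON_ONCE(dma_get_min_align_mask(dev) > iova_mask(iovad)))
          return -EINVAL;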
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/dbb4d2d8e5d1691ac9a6c67e9758904e6c447ba5.1709553942.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Merge tag 'iommu-updates-v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu updates from Joerg Roedel:
"Core changes:
- Fix race conditions in device probe path
- Retire IOMMU bus_ops
- Support for passing custom allocators to page table drivers
- Clean up Kconfig around IOMMU_SVA
- Support for sharing SVA domains with all devices bound to a mm
- Firmware data parsing cleanup
- Tracing improvements for iommu-dma code
- Some smaller fixes and cleanups
ARM-SMMU drivers:
- Device-tree binding updates:
- Add additional compatible strings for Qualcomm SoCs
- Document Adreno clocks for Qualcomm's SM8350 SoC
- SMMUv2:
- Implement support for the ->domain_alloc_paging() callback
- Ensure Secure context is restored following suspend of Qualcomm
SMMU implementation
- SMMUv3:
- Disable stalling mode for the "quiet" context descriptor
- Minor refactoring and driver cleanups
Intel VT-d driver:
- Cleanup and refactoring
AMD IOMMU driver:
- Improve IO TLB invalidation logic
- Small cleanups and improvements
Rockchip IOMMU driver:
- DT binding update to add Rockchip RK3588
Apple DART driver:
- Apple M1 USB4/Thunderbolt DART support
- Cleanups
Virtio IOMMU driver:
- Add support for iotlb_sync_map
- Enable deferred IO TLB flushes"
* tag 'iommu-updates-v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (66 commits)
iommu: Don't reserve 0-length IOVA region
iommu/vt-d: Move inline helpers to header files
iommu/vt-d: Remove unused vcmd interfaces
iommu/vt-d: Remove unused parameter of intel_pasid_setup_pass_through()
iommu/vt-d: Refactor device_to_iommu() to retrieve iommu directly
iommu/sva: Fix memory leak in iommu_sva_bind_device()
dt-bindings: iommu: rockchip: Add Rockchip RK3588
iommu/dma: Trace bounce buffer usage when mapping buffers
iommu/arm-smmu: Convert to domain_alloc_paging()
iommu/arm-smmu: Pass arm_smmu_domain to internal functions
iommu/arm-smmu: Implement IOMMU_DOMAIN_BLOCKED
iommu/arm-smmu: Convert to a global static identity domain
iommu/arm-smmu: Reorganize arm_smmu_domain_add_master()
iommu/arm-smmu-v3: Remove ARM_SMMU_DOMAIN_NESTED
iommu/arm-smmu-v3: Master cannot be NULL in arm_smmu_write_strtab_ent()
iommu/arm-smmu-v3: Add a type for the STE
iommu/arm-smmu-v3: disable stall for quiet_cd
iommu/qcom: restore IOMMU state if needed
iommu/arm-smmu-qcom: Add QCM2290 MDSS compatible
iommu/arm-smmu-qcom: Add missing GMU entry to match table
...
commit 23baf831a32c ("mm, treewide: redefine MAX_ORDER sanely") has
changed the definition of MAX_ORDER to be inclusive. This has caused
issues with code that was not yet upstream and depended on the previous
definition.
To draw attention to the altered meaning of the define, rename MAX_ORDER
to MAX_PAGE_ORDER.
Link: https://lkml.kernel.org/r/20231228144704.14033-2-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When commit 82612d66d51d ("iommu: Allow the dma-iommu api to
use bounce buffers") was introduced, it did not add the logic
for tracing the bounce buffer usage from iommu_dma_map_page().
All of the users of swiotlb_tbl_map_single() trace their bounce
buffer usage, except iommu_dma_map_page(). This makes it difficult
to track SWIOTLB usage from that function. Thus, trace bounce buffer
usage from iommu_dma_map_page().
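Something along these lines, at the point where iommu_dma_map_page()
decides to bounce (the tracepoint is declared in <trace/events/swiotlb.h>;
the exact call-site details are an assumption):

  /* Record the bounce so SWIOTLB usage from dma-iommu shows up in tracing. */
  trace_swiotlb_bounced(dev, phys, size);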
Fixes: 82612d66d51d ("iommu: Allow the dma-iommu api to use bounce buffers")
Cc: stable@vger.kernel.org # v5.15+
Cc: Tom Murphy <murphyt7@tcd.ie>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Cc: Saravana Kannan <saravanak@google.com>
Signed-off-by: Isaac J. Manjarres <isaacmanjarres@google.com>
Link: https://lore.kernel.org/r/20231208234141.2356157-1-isaacmanjarres@google.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Flush queues currently use a fixed compile-time size of 256 entries.
This being a power of 2 allows the compiler to use shift and mask
instead of more expensive modulo operations. With per-CPU flush queues,
larger queue sizes would hit per-CPU allocation limits; with a single
flush queue these limits do not apply. Single queues are also
particularly suitable for virtualized environments with expensive
IOTLB flushes, and these especially benefit from larger queues and thus
fewer flushes.
To this end re-order struct iova_fq so we can use a dynamic array and
introduce the flush queue size and timeouts as new options in the
iommu_dma_options struct. So as not to lose the shift and mask
optimization, use a power of 2 for the length and use explicit shift and
mask instead of letting the compiler optimize this.
A large queue size and a 1 second timeout are then set for the
shadow-on-flush case used by s390 paged memory guests. This brings
performance on par with the previous s390-specific DMA API
implementation.
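The shift-and-mask trick itself, illustrated standalone (generic C, not
the kernel's actual flush-queue structures):

  #include <assert.h>
  #include <stddef.h>

  /* With a runtime size that is a power of two, "idx % size" can be
   * replaced by "idx & (size - 1)", keeping the cheap wrap-around. */
  struct ring {
          size_t size;      /* must be a power of two */
          size_t mod_mask;  /* size - 1 */
  };

  static void ring_init(struct ring *r, size_t size)
  {
          assert(size && (size & (size - 1)) == 0);
          r->size = size;
          r->mod_mask = size - 1;
  }

  static size_t ring_next(const struct ring *r, size_t idx)
  {
          return (idx + 1) & r->mod_mask;
  }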
Acked-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> #s390
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Link: https://lore.kernel.org/r/20230928-dma_iommu-v13-6-9e5fc4dacc36@linux.ibm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
In some virtualized environments, including s390 paged memory guests,
IOTLB flushes are used to update IOMMU shadow tables. Due to this, they
are much more expensive than in typical bare metal environments or
non-paged s390 guests. In addition they may parallelize poorly in
virtualized environments. This changes the trade-off for flushing IOVAs
such that minimizing the number of IOTLB flushes trumps any benefit of
cheaper queuing operations or increased parallelism.
In this scenario per-CPU flush queues pose several problems. Firstly,
per-CPU memory is often quite limited, prohibiting larger queues.
Secondly, collecting IOVAs per CPU but flushing via a global timeout
reduces the number of IOVAs flushed per timeout, especially on s390
where PCI interrupts may not be bound to a specific CPU.
Let's introduce a single flush queue mode that reuses the same queue
logic but only allocates a single global queue. This mode is selected by
dma-iommu if a newly introduced .shadow_on_flush flag is set in struct
dev_iommu. As a first user the s390 IOMMU driver sets this flag during
probe_device. With the unchanged small FQ size and timeouts this setting
is worse than per-CPU queues but a follow up patch will make the FQ size
and timeout variable. Together this allows the common IOVA flushing code
to more closely resemble the global flush behavior used on s390's
previous internal DMA API implementation.
Link: https://lore.kernel.org/all/9a466109-01c5-96b0-bf03-304123f435ee@arm.com/
Acked-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> #s390
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Link: https://lore.kernel.org/r/20230928-dma_iommu-v13-5-9e5fc4dacc36@linux.ibm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Per the reasoning in commit 4bf7fda4dce2 ("iommu/dma: Add config for
PCI SAC address trick") and its subsequent revert, this mechanism no
longer serves its original purpose, but now only works around broken
hardware/drivers in a way that is unfortunately too impactful to remove.
This does not, however, prevent us from solving the performance impact
which that workaround has on large-scale systems that don't need it.
Once the 32-bit IOVA space fills up and a workload starts allocating and
freeing on both sides of the boundary, the opportunistic SAC allocation
can then end up spending significant time hunting down scattered
fragments of free 32-bit space, or just reestablishing max32_alloc_size.
This can easily be exacerbated by a change in allocation pattern, such
as by changing the network MTU, which can increase pressure on the
32-bit space by leaving a large quantity of cached IOVAs which are now
the wrong size to be recycled, but also won't be freed since the
non-opportunistic allocations can still be satisfied from the whole
64-bit space without triggering the reclaim path.
However, in the context of a workaround where smaller DMA addresses
aren't simply a preference but a necessity, if we get to that point at
all then in fact it's already the endgame. The nature of the allocator
is currently such that the first IOVA we give to a device after the
32-bit space runs out will be the highest possible address for that
device, ever. If that works, then great, we know we can optimise for
speed by always allocating from the full range. And if it doesn't, then
the worst has already happened and any brokenness is now showing, so
there's little point in continuing to try to hide it.
To that end, implement a flag to refine the SAC business into a
per-device policy that can automatically get itself out of the way if
and when it stops being useful.
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Tested-by: Vasant Hegde <vasant.hegde@amd.com>
Tested-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/b8502b115b915d2a3fabde367e099e39106686c8.1681392791.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Merge tag 'iommu-updates-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu updates from Joerg Roedel:
"Core changes:
- iova_magazine_alloc() optimization
- Make flush-queue an IOMMU driver capability
- Consolidate the error handling around device attachment
AMD IOMMU changes:
- AVIC Interrupt Remapping Improvements
- Some minor fixes and cleanups
Intel VT-d changes from Lu Baolu:
- Small and misc cleanups
ARM-SMMU changes from Will Deacon:
- Device-tree binding updates:
- Add missing clocks for SC8280XP and SA8775 Adreno SMMUs
- Add two new Qualcomm SMMUs in SDX75 and SM6375
- Workarounds for Arm MMU-700 errata:
- 1076982: Avoid use of SEV-based cmdq wakeup
- 2812531: Terminate command batches with a CMD_SYNC
- Enforce single-stage translation to avoid nesting-related errata
- Set the correct level hint for range TLB invalidation on teardown
.. and some other minor fixes and cleanups (including Freescale PAMU
and virtio-iommu changes)"
* tag 'iommu-updates-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (50 commits)
iommu/vt-d: Remove commented-out code
iommu/vt-d: Remove two WARN_ON in domain_context_mapping_one()
iommu/vt-d: Handle the failure case of dmar_reenable_qi()
iommu/vt-d: Remove unnecessary (void*) conversions
iommu/amd: Remove extern from function prototypes
iommu/amd: Use BIT/BIT_ULL macro to define bit fields
iommu/amd: Fix DTE_IRQ_PHYS_ADDR_MASK macro
iommu/amd: Fix compile error for unused function
iommu/amd: Improving Interrupt Remapping Table Invalidation
iommu/amd: Do not Invalidate IRT when IRTE caching is disabled
iommu/amd: Introduce Disable IRTE Caching Support
iommu/amd: Remove the unused struct amd_ir_data.ref
iommu/amd: Switch amd_iommu_update_ga() to use modify_irte_ga()
iommu/arm-smmu-v3: Set TTL invalidation hint better
iommu/arm-smmu-v3: Document nesting-related errata
iommu/arm-smmu-v3: Add explicit feature for nesting
iommu/arm-smmu-v3: Document MMU-700 erratum 2812531
iommu/arm-smmu-v3: Work around MMU-600 erratum 1076982
dt-bindings: arm-smmu: Add SDX75 SMMU compatible
dt-bindings: arm-smmu: Add SM6375 GPU SMMU
...
As with direct DMA, bounce small allocations as they may have
originated from a kmalloc() cache not safe for DMA. Unlike direct
DMA, iommu_dma_map_sg() cannot call iommu_dma_map_sg_swiotlb() for all
non-coherent devices, as this would break some cases where the iova is
expected to be contiguous (dmabuf). Instead, scan the scatterlist for
any small sizes and only take the swiotlb path if any element of the
list needs bouncing (note that iommu_dma_map_page() would still only
bounce those buffers which are not DMA-aligned).
To avoid scanning the scatterlist on the 'sync' operations, introduce an
SG_DMA_SWIOTLB flag set by iommu_dma_map_sg_swiotlb(). The
dev_use_swiotlb() function together with the newly added
dev_use_sg_swiotlb() now check for both untrusted devices and unaligned
kmalloc() buffers (suggested by Robin Murphy).
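A sketch of the scan described above; dma_kmalloc_needs_bounce() is
assumed to be the per-element check introduced by the same series:

  static bool dev_use_sg_swiotlb(struct device *dev, struct scatterlist *sg,
                                 int nents, enum dma_data_direction dir)
  {
          struct scatterlist *s;
          int i;

          if (!IS_ENABLED(CONFIG_SWIOTLB))
                  return false;

          if (dev_is_untrusted(dev))
                  return true;

          /* Bounce the whole list if any element may come from an
           * unaligned kmalloc() cache. */
          for_each_sg(sg, s, nents, i)
                  if (dma_kmalloc_needs_bounce(dev, s->length, dir))
                          return true;

          return false;
  }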
Link: https://lkml.kernel.org/r/20230612153201.554742-16-catalin.marinas@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Tested-by: Isaac J. Manjarres <isaacmanjarres@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Jerry Snitselaar <jsnitsel@redhat.com>
Cc: Jonathan Cameron <jic23@kernel.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Mike Snitzer <snitzer@kernel.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Saravana Kannan <saravanak@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
sg_is_dma_bus_address() is inconsistent with the naming pattern of its
corresponding setters and its own kerneldoc, so take the majority vote and
rename it sg_dma_is_bus_address() (and fix up the missing underscores in
the kerneldoc too). This gives us a nice clear pattern where SG DMA flags
are SG_DMA_<NAME>, and the helpers for acting on them are
sg_dma_<action>_<name>().
Link: https://lkml.kernel.org/r/20230612153201.554742-14-catalin.marinas@arm.com
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Link: https://lore.kernel.org/r/fa2eca2862c7ffc41b50337abffb2dfd2864d3ea.1685036694.git.robin.murphy@arm.com
Tested-by: Isaac J. Manjarres <isaacmanjarres@google.com>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Jonathan Cameron <jic23@kernel.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Mike Snitzer <snitzer@kernel.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Saravana Kannan <saravanak@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
It remains really handy to have distinct DMA domain types within core
code for the sake of default domain policy selection, but we can now
hide that detail from drivers by using the new capability instead.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Tested-by: Jerry Snitselaar <jsnitsel@redhat.com> # amd, intel, smmu-v3
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/1c552d99e8ba452bdac48209fa74c0bdd52fd9d9.1683233867.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
MAX_ORDER is not inclusive: the maximum allocation order buddy allocator
can deliver is MAX_ORDER-1.
Fix MAX_ORDER usage in __iommu_dma_alloc_pages().
Also use GENMASK() instead of hard to read "(2U << order) - 1" magic.
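For reference, GENMASK(order, 0) sets bits 0..order and is therefore the
same value as (2U << order) - 1, just easier to read (variable name here
is illustrative only):

  unsigned int order_mask = GENMASK(order, 0);  /* == (2U << order) - 1 */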
Link: https://lkml.kernel.org/r/20230315113133.11326-10-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Acked-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Merge patch-set from Jason:
"Let iommufd charge IOPTE allocations to the memory cgroup"
Description:
IOMMUFD follows the same design as KVM and uses memory cgroups to limit
the amount of kernel memory an iommufd file descriptor can pin down. The
various internal data structures already use GFP_KERNEL_ACCOUNT to charge
its own memory.
However, one of the biggest consumers of kernel memory is the IOPTEs
stored under the iommu_domain and these allocations are not tracked.
This series is the first step in fixing it.
The iommu driver contract already includes a 'gfp' argument to the
map_pages op, allowing iommufd to specify GFP_KERNEL_ACCOUNT and then
having the driver allocate the IOPTE tables with that flag will capture a
significant amount of the allocations.
Update the iommu_map() API to pass in the GFP argument, and fix all call
sites. Replace iommu_map_atomic().
Audit the "enterprise" iommu drivers to make sure they do the right thing.
Intel and S390 ignore the GFP argument and always use GFP_ATOMIC. This is
problematic for iommufd anyhow, so fix it. AMD and ARM SMMUv2/3 are
already correct.
A follow up series will be needed to capture the allocations made when the
iommu_domain itself is allocated, which will complete the job.
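For reference, the iommu_map() prototype after gaining the GFP argument
described above looks like this:

  int iommu_map(struct iommu_domain *domain, unsigned long iova,
                phys_addr_t paddr, size_t size, int prot, gfp_t gfp);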
Link: https://lore.kernel.org/linux-iommu/0-v3-76b587fe28df+6e3-iommu_map_gfp_jgg@nvidia.com/
This function allocates a buffer to return to the caller and
then goes on to allocate some internal memory, e.g. the scatterlist and
IOPTEs.
Instead of hard wiring GFP_KERNEL and a wrong GFP_ATOMIC, continue to use
the passed in gfp flags for all of the allocations. Clear the zone and
policy bits that are only relevant for the buffer allocation before
re-using them for internal allocations.
Auditing says this is never called from an atomic context, so the
GFP_ATOMIC is the incorrect flag.
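A sketch of the flag handling described above; the exact set of bits
stripped in the real patch may differ:

  /* Zone/policy bits only matter for the buffer itself; drop them for
   * the internal metadata allocations. */
  gfp &= ~(__GFP_DMA | __GFP_DMA32 | __GFP_HIGHMEM | __GFP_COMP);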
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/4-v3-76b587fe28df+6e3-iommu_map_gfp_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Follow the pattern for iommu_map() and remove iommu_map_sg_atomic().
This allows __iommu_dma_alloc_noncontiguous() to use a GFP_KERNEL
allocation here, based on the provided gfp flags.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/3-v3-76b587fe28df+6e3-iommu_map_gfp_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
There is only one call site and it can now just pass the GFP_ATOMIC to the
normal iommu_map().
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/2-v3-76b587fe28df+6e3-iommu_map_gfp_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The internal mechanisms support this, but instead of exposing the gfp to
the caller it is wrapped up inside iommu_map() and iommu_map_atomic().
Fix this instead of adding more variants for GFP_KERNEL_ACCOUNT.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Link: https://lore.kernel.org/r/1-v3-76b587fe28df+6e3-iommu_map_gfp_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
For device tree nodes, use the standard of_iommu_get_resv_regions()
implementation to obtain the reserved memory regions associated with a
device.
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Frank Rowand <frowand.list@gmail.com>
Cc: devicetree@vger.kernel.org
Acked-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Link: https://lore.kernel.org/r/20230120174251.4004100-5-thierry.reding@gmail.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
DMA allocations can never be turned back into a page pointer, so
requesting compound pages doesn't make sense and it can't even be
supported at all by various backends.
Reject __GFP_COMP with a warning in dma_alloc_attrs, and stop clearing
the flag in the arm dma ops and dma-iommu.
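The check in dma_alloc_attrs() is roughly of this shape (sketch):

  if (WARN_ON_ONCE(flag & __GFP_COMP))
          return NULL;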
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
The iommu-dma layer is now mostly encapsulated by iommu_dma_ops, with
only a couple more public interfaces left pertaining to MSI integration.
Since these depend on the main IOMMU API header anyway, move their
declarations there, taking the opportunity to update the half-baked
comments to proper kerneldoc along the way.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/9cd99738f52094e6bed44bfee03fa4f288d20695.1660668998.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This reverts commit 4bf7fda4dce22214c70c49960b1b6438e6260b67.
It turns out that it was hopelessly naive to think that this would work,
considering that we've always done this. The first machine I actually
tested this on broke at bootup, getting to
Reached target cryptsetup.target - Local Encrypted Volumes.
and then hanging. It's unclear what actually fails, since there's a lot
else going on around that time (eg amdgpu probing also happens around
that same time, but it could be some other random init thing that didn't
complete earlier and just caused the boot to hang at that point).
The expectation that we should default to some unsafe and untested mode
seems entirely unfounded, and the belief that this wouldn't affect
modern systems is clearly entirely false. The machine in question is
about two years old, so it's not exactly shiny, but it's also not some
dusty old museum piece PDP-11 in a closet.
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Garry <john.garry@huawei.com>
Cc: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge tag 'dma-mapping-5.20-2022-08-06' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping updates from Christoph Hellwig:
- convert arm32 to the common dma-direct code (Arnd Bergmann, Robin
Murphy, Christoph Hellwig)
- restructure the PCIe peer to peer mapping support (Logan Gunthorpe)
- allow the IOMMU code to communicate an optional DMA mapping length
and use that in scsi and libata (John Garry)
- split the global swiotlb lock (Tianyu Lan)
- various fixes and cleanup (Chao Gao, Dan Carpenter, Dongli Zhang,
Lukas Bulwahn, Robin Murphy)
* tag 'dma-mapping-5.20-2022-08-06' of git://git.infradead.org/users/hch/dma-mapping: (45 commits)
swiotlb: fix passing local variable to debugfs_create_ulong()
dma-mapping: reformat comment to suppress htmldoc warning
PCI/P2PDMA: Remove pci_p2pdma_[un]map_sg()
RDMA/rw: drop pci_p2pdma_[un]map_sg()
RDMA/core: introduce ib_dma_pci_p2p_dma_supported()
nvme-pci: convert to using dma_map_sgtable()
nvme-pci: check DMA ops when indicating support for PCI P2PDMA
iommu/dma: support PCI P2PDMA pages in dma-iommu map_sg
iommu: Explicitly skip bus address marked segments in __iommu_map_sg()
dma-mapping: add flags to dma_map_ops to indicate PCI P2PDMA support
dma-direct: support PCI P2PDMA pages in dma-direct map_sg
dma-mapping: allow EREMOTEIO return code for P2PDMA transfers
PCI/P2PDMA: Introduce helpers for dma_map_sg implementations
PCI/P2PDMA: Attempt to set map_type if it has not been set
lib/scatterlist: add flag for indicating P2PDMA segments in an SGL
swiotlb: clean up some coding style and minor issues
dma-mapping: update comment after dmabounce removal
scsi: sd: Add a comment about limiting max_sectors to shost optimal limit
ata: libata-scsi: cap ata_device->max_sectors according to shost->max_sectors
scsi: scsi_transport_sas: cap shost opt_sectors according to DMA optimal limit
...
Call pci_p2pdma_map_segment() when a PCI P2PDMA page is seen so the bus
address is set in the dma address and the segment is marked with
sg_dma_mark_bus_address(). iommu_map_sg() will then skip these segments.
Then, in __finalise_sg(), copy the dma address from the input segment
to the output segment. __invalidate_sg() must also learn to skip these
segments.
A P2PDMA page may have three possible outcomes when being mapped:
1) If the data path between the two devices doesn't go through
the root port, then it should be mapped with a PCI bus address
2) If the data path goes through the host bridge, it should be mapped
normally with an IOMMU IOVA.
3) It is not possible for the two devices to communicate and thus
the mapping operation should fail (and it will return -EREMOTEIO).
Similar to dma-direct, the sg_dma_mark_pci_p2pdma() flag is used to
indicate bus address segments. On unmap, P2PDMA segments are skipped
over when determining the start and end IOVA addresses.
With this change, the flags variable in the dma_map_ops is set to
DMA_F_PCI_P2PDMA_SUPPORTED to indicate support for P2PDMA pages.
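Roughly, the per-segment decision in map_sg looks like this (helper and
enum names come from the P2PDMA series; the loop variable s, the state
struct, and the labels are assumptions from the surrounding code):

  switch (pci_p2pdma_map_segment(&p2pdma_state, dev, s)) {
  case PCI_P2PDMA_MAP_BUS_ADDR:
          /* Bus address already set by the helper; skip IOMMU mapping. */
          sg_dma_mark_bus_address(s);
          continue;
  case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
          /* Map through the IOMMU like any other page. */
          break;
  default:
          ret = -EREMOTEIO;
          goto out_restore_sg;
  }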
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Add the IOMMU callback for DMA mapping API dma_opt_mapping_size(), which
allows the drivers to know the optimal mapping limit and thus limit the
requested IOVA lengths.
This value is based on the IOVA rcache range limit, as IOVAs allocated
above this limit must always be newly allocated, which may be quite slow.
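The callback itself is tiny; a sketch:

  static size_t iommu_dma_opt_mapping_size(void)
  {
          return iova_rcache_range();
  }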
Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Currently IORT provides a helper to retrieve HW MSI reserve regions.
Change this to a generic helper to retrieve any IORT related reserve
regions. This will be useful when we add support for RMR nodes in
subsequent patches.
[Lorenzo: For ACPI IORT]
Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Steven Price <steven.price@arm.com>
Tested-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/20220615101044.1972-4-shameerali.kolothum.thodi@huawei.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
When many devices share the same iova domain, iommu_dma_init_domain()
may be called concurrently. Each caller then sees iovad->start_pfn as
unset and enters init_iova_domain(), so the iovad is initialized more
than once.
Fix this by protecting init_iova_domain() with iommu_dma_cookie->mutex.
Exception backtrace:
rb_insert_color(param1=0xFFFFFF80CD2BDB40, param3=1) + 64
init_iova_domain() + 180
iommu_setup_dma_ops() + 260
arch_setup_dma_ops() + 132
of_dma_configure_id() + 468
platform_dma_configure() + 32
really_probe() + 1168
driver_probe_device() + 268
__device_attach_driver() + 524
__device_attach() + 524
bus_probe_device() + 64
deferred_probe_work_func() + 260
process_one_work() + 580
worker_thread() + 1076
kthread() + 332
ret_from_fork() + 16
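A sketch of the serialization described above (field placement and
surrounding details assumed):

  /* In iommu_dma_init_domain(), serialize the start_pfn check and init. */
  mutex_lock(&cookie->mutex);
  if (!iovad->start_pfn)
          init_iova_domain(iovad, 1UL << order, base_pfn);
  mutex_unlock(&cookie->mutex);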
Signed-off-by: Ning Li <ning.li@mediatek.com>
Signed-off-by: Yunfei Wang <yf.wang@mediatek.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Miles Chen <miles.chen@mediatek.com>
Link: https://lore.kernel.org/r/20220530120748.31733-1-yf.wang@mediatek.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
For devices stuck behind a conventional PCI bus, saving extra cycles at
33MHz is probably fairly significant. However since native PCI Express
is now the norm for high-performance devices, the optimisation to always
prefer 32-bit addresses for the sake of avoiding DAC is starting to look
rather anachronistic. Technically 32-bit addresses do have shorter TLPs
on PCIe, but unless the device is saturating its link bandwidth with
small transfers it seems unlikely that the difference is appreciable.
What definitely is appreciable, however, is that the IOVA allocator
doesn't behave all that well once the 32-bit space starts getting full.
As DMA working sets get bigger, this optimisation increasingly backfires
and adds considerable overhead to the dma_map path for use-cases like
high-bandwidth networking. We've increasingly bandaged the allocator
in attempts to mitigate this, but it remains fundamentally at odds with
other valid requirements to try as hard as possible to satisfy a request
within the given limit; what we really need is to just avoid this odd
notion of a speculative allocation when it isn't beneficial anyway.
Unfortunately that's where things get awkward... Having been present on
x86 for 15 years or so now, it turns out there are systems which fail to
properly define the upper limit of usable IOVA space for certain devices
and this trick was the only thing letting them work OK. I had a similar
ulterior motive for a couple of early arm64 systems when originally
adding it to iommu-dma, but those really should be fixed with proper
firmware bindings by now. Let's be brave and default it to off in the
hope that CI systems and developers will find and fix those bugs, but
expect that desktop-focused distro configs are likely to want to turn
it back on for maximum compatibility.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: John Garry <john.garry@huawei.com>
Link: https://lore.kernel.org/r/3f06994f9f370f9d35b2630ab75171ecd2065621.1654782107.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Originally, creating the dma_ranges resource list in pre-sorted fashion
was the simplest and most efficient way to enforce the order required by
iova_reserve_pci_windows(). However since then at least one PCI host
driver is now re-sorting the list for its own probe-time processing,
which doesn't seem entirely unreasonable, so that basic assumption no
longer holds. Make iommu-dma robust and get the sort order it needs by
explicitly sorting, which means we can also save the effort at creation
time and just build the list in whatever natural order the DT had.
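A sketch of the explicit sort; the comparison key is assumed to be the
resource start address:

  static int iommu_dma_ranges_sort(void *priv, const struct list_head *a,
                                   const struct list_head *b)
  {
          struct resource_entry *res_a = list_entry(a, typeof(*res_a), node);
          struct resource_entry *res_b = list_entry(b, typeof(*res_b), node);

          return res_a->res->start > res_b->res->start;
  }

  /* then, in iova_reserve_pci_windows(): */
  list_sort(NULL, &bridge->dma_ranges, iommu_dma_ranges_sort);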
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/35661036a7e4160850895f9b37f35408b6a29f2f.1652091160.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The return type of iommu_map_sg_atomic() is ssize_t while the iova size
is a size_t, i.e. one is signed and the other unsigned.
When the iommu_map_sg_atomic() return value is compared with the iova
size, the signed value is implicitly converted to unsigned. If the iova
map fails and iommu_map_sg_atomic() returns a negative error code, then
(ret < iova_len) is false, so the iova is not freed and the master can
still "successfully" obtain the iova of a failed mapping, which is not
expected.
Therefore, check the return value of iommu_map_sg_atomic() in two steps,
first testing whether it is less than 0.
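A sketch of the two-step check (variable names assumed):

  ssize_t ret = iommu_map_sg_atomic(domain, iova, sg, nents, ioprot);

  /* Check for an error code first, then for a short mapping. */
  if (ret < 0 || (size_t)ret < iova_len)
          goto out_free_iova;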
Fixes: ad8f36e4b6b1 ("iommu: return full error code from iommu_map_sg[_atomic]()")
Signed-off-by: Yunfei Wang <yf.wang@mediatek.com>
Cc: <stable@vger.kernel.org> # 5.15.*
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Miles Chen <miles.chen@mediatek.com>
Link: https://lore.kernel.org/r/20220507085204.16914-1-yf.wang@mediatek.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
If the IOMMU is in use and an untrusted device is connected to an
external-facing port, a request whose address isn't page aligned will
cause the kernel to attempt to use bounce buffers.
If for some reason the bounce buffers have not been allocated, this is a
problem that should be made apparent to the user.
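The added guard is roughly (sketch):

  if (!is_swiotlb_active(dev)) {
          dev_warn_once(dev, "DMA bounce buffers are inactive, unable to map unaligned transaction.\n");
          return DMA_MAPPING_ERROR;
  }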
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220404204723.9767-3-mario.limonciello@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Merge tag 'dma-mapping-5.18' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping updates from Christoph Hellwig:
- do not zero buffer in set_memory_decrypted (Kirill A. Shutemov)
- fix return value of dma-debug __setup handlers (Randy Dunlap)
- swiotlb cleanups (Robin Murphy)
- remove most remaining users of the pci-dma-compat.h API
(Christophe JAILLET)
- share the ABI header for the DMA map_benchmark with userspace
(Tian Tao)
- update the maintainer for DMA MAPPING BENCHMARK (Xiang Chen)
- remove CONFIG_DMA_REMAP (me)
* tag 'dma-mapping-5.18' of git://git.infradead.org/users/hch/dma-mapping:
dma-mapping: benchmark: extract a common header file for map_benchmark definition
dma-debug: fix return value of __setup handlers
dma-mapping: remove CONFIG_DMA_REMAP
media: v4l2-pci-skeleton: Remove usage of the deprecated "pci-dma-compat.h" API
rapidio/tsi721: Remove usage of the deprecated "pci-dma-compat.h" API
sparc: Remove usage of the deprecated "pci-dma-compat.h" API
agp/intel: Remove usage of the deprecated "pci-dma-compat.h" API
alpha: Remove usage of the deprecated "pci-dma-compat.h" API
MAINTAINERS: update maintainer list of DMA MAPPING BENCHMARK
swiotlb: simplify array allocation
swiotlb: tidy up includes
swiotlb: simplify debugfs setup
swiotlb: do not zero buffer in set_memory_decrypted()
CONFIG_DMA_REMAP is used to build a few helpers around the core
vmalloc code, and to use them in case there is a highmem page in
dma-direct, and to make dma coherent allocations be able to use
non-contiguous pages allocations for DMA allocations in the dma-iommu
layer.
Right now it needs to be explicitly selected, and only architectures
that require remapping to deal with devices that are not DMA coherent
do so. Make it unconditional for
builds with CONFIG_MMU as it is very little extra code, but makes
it much more likely that large DMA allocations succeed on x86.
This fixes hot plugging a NVMe thunderbolt SSD for me, which tries
to allocate a 1MB buffer that is otherwise hard to obtain due to
memory fragmentation on a heavily used laptop.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Currently the rcache structures are allocated for all IOVA domains, even if
they do not use the "fast" alloc+free interface. This is wasteful of memory.
In addition, failures in init_iova_rcaches() are not handled safely, which is
less than ideal.
Make "fast" users call a separate rcache init explicitly, which includes
error checking.
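Callers that want the rcaches then opt in explicitly, roughly (assuming
the new helper is iova_domain_init_rcaches()):

  init_iova_domain(iovad, granule, start_pfn);
  ret = iova_domain_init_rcaches(iovad);
  if (ret)
          return ret;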
Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/r/1643882360-241739-1-git-send-email-john.garry@huawei.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Complete the move into iommu-dma by refactoring the flush queues
themselves to belong to the DMA cookie rather than the IOVA domain.
The refactoring may as well extend to some minor cosmetic aspects
too, to help us stay one step ahead of the style police.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/24304722005bc6f144e2a1fdd865d1465722fc2e.1639753638.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Flush queues are specific to DMA ops, which are now handled exclusively
by iommu-dma. As such, now that the historical artefacts from being
shared directly with drivers have been cleaned up, move the flush queue
code into iommu-dma itself to get it out of the way of other IOVA users.
This is pure code movement with no functional change; refactoring to
clean up the headers and definitions will follow.
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/1d9a1ee1392e96eaae5e6467181b3e83edfdfbad.1639753638.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
page->freelist is for the use of slab. We already have the ability
to free a list of pages in the core mm, but it requires the use of a
list_head and for the pages to be chained together through page->lru.
Switch the Intel IOMMU and IOVA code over to using free_pages_list().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
[rm: split from original patch, cosmetic tweaks, fix fq entries]
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/2115b560d9a0ce7cd4b948bd51a2b7bde8fdfd59.1639753638.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Once again, with iommu-dma now being the only flush queue user, we no
longer need the extra level of indirection through flush_cb. Squash that
and let the flush queue code call the domain method directly. This does
mean temporarily having to carry an additional copy of the IOMMU domain
pointer around instead, but only until a later patch untangles it again.
Reviewed-by: John Garry <john.garry@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/e3f9b4acdd6640012ef4fbc819ac868d727b64a9.1639753638.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
All flush queues are driven by iommu-dma now, so there is no need to
abstract entry_dtor or its data any more. Squash the now-canonical
implementation directly into the IOVA code to get it out of the way.
Reviewed-by: John Garry <john.garry@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/2260f8de00ab5e0f9d2a1cf8978e6ae7cd4f182c.1639753638.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
It really is a property of the IOVA rcache code that we need to alloc a
power-of-2 size, so relocate the functionality to resize into
alloc_iova_fast(), rather than the callsites.
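A sketch of the relocated rounding inside alloc_iova_fast(), using the
existing rcache size constant:

  /* Non-power-of-2 sizes would not fit back into the rcaches on free,
   * so round up anything small enough to be cacheable. */
  if (size < (1 << (IOVA_RANGE_CACHE_MAX_SIZE - 1)))
          size = roundup_pow_of_two(size);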
Signed-off-by: John Garry <john.garry@huawei.com>
Acked-by: Will Deacon <will@kernel.org>
Reviewed-by: Xie Yongji <xieyongji@bytedance.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/1638875846-23993-1-git-send-email-john.garry@huawei.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
scsi_dma_map() was reporting a failure during boot on an AMD machine
with the IOMMU enabled.
scsi_dma_map failed: request for 36 bytes!
The issue was tracked down to a mistake in logic: iommu_dma_map_sg()
should not return an error if iommu_deferred_attach() returns zero.
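The fix is roughly (sketch):

  if (static_branch_unlikely(&iommu_deferred_attach_enabled)) {
          ret = iommu_deferred_attach(dev, domain);
          if (ret)  /* previously fell through to the error path even on 0 */
                  goto out;
  }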
Reported-by: Marshall Midden <marshallmidden@gmail.com>
Fixes: dabb16f67215 ("iommu/dma: return error code from iommu_dma_map_sg()")
Link: https://lore.kernel.org/all/CAD2CkAWjS8=kKwEEN4cgVNjyFORUibzEiCUA-X+SMtbo0JoMmA@mail.gmail.com
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20211027174757.119755-1-logang@deltatee.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Pass the non-aligned size to __iommu_dma_map when using swiotlb bounce
buffers in iommu_dma_map_page, to account for min_align_mask.
To deal with granule alignment, __iommu_dma_map maps iova_align(size +
iova_off) bytes starting at phys - iova_off. If iommu_dma_map_page
passes aligned size when using swiotlb, then this becomes
iova_align(iova_align(orig_size) + iova_off). Normally iova_off will be
zero when using swiotlb. However, this is not the case for devices that
set min_align_mask. When iova_off is non-zero, __iommu_dma_map ends up
mapping an extra page at the end of the buffer. Beyond just being a
security issue, the extra page is not cleaned up by __iommu_dma_unmap.
This causes problems when the IOVA is reused, due to collisions in the
iommu driver. Just passing the original size is sufficient, since
__iommu_dma_map will take care of granule alignment.
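In iommu_dma_map_page() this amounts to something like the following
(argument list abbreviated and assumed):

  /* Before (problematic): the granule-aligned size was passed down. */
  iova = __iommu_dma_map(dev, phys, aligned_size, prot, dma_mask);
  /* After: pass the original size; __iommu_dma_map() aligns internally. */
  iova = __iommu_dma_map(dev, phys, size, prot, dma_mask);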
Fixes: 1f221a0d0dbf ("swiotlb: respect min_align_mask")
Signed-off-by: David Stevens <stevensd@chromium.org>
Link: https://lore.kernel.org/r/20210929023300.335969-8-stevensd@google.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Add an argument to swiotlb_tbl_map_single that specifies the desired
alignment of the allocated buffer. This is used by dma-iommu to ensure
the buffer is aligned to the iova granule size when using swiotlb with
untrusted sub-granule mappings. This addresses an issue where adjacent
slots could be exposed to the untrusted device if IO_TLB_SIZE < iova
granule < PAGE_SIZE.
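The dma-iommu call site then looks roughly like this, passing the IOVA
granule mask as the new alignment argument (sketch):

  aligned_size = iova_align(iovad, size);
  phys = swiotlb_tbl_map_single(dev, phys, size, aligned_size,
                                iova_mask(iovad), dir, attrs);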
Signed-off-by: David Stevens <stevensd@chromium.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210929023300.335969-7-stevensd@google.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Introduce a new dev_use_swiotlb function to guard swiotlb code, instead
of overloading dev_is_untrusted. This allows CONFIG_SWIOTLB to be
checked more broadly, so the swiotlb related code can be removed more
aggressively.
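The helper is essentially (sketch):

  static bool dev_use_swiotlb(struct device *dev)
  {
          return IS_ENABLED(CONFIG_SWIOTLB) && dev_is_untrusted(dev);
  }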
Signed-off-by: David Stevens <stevensd@chromium.org>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210929023300.335969-6-stevensd@google.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Fold the _swiotlb helper functions into the respective _page functions,
since recent fixes have moved all logic from the _page functions to the
_swiotlb functions.
Signed-off-by: David Stevens <stevensd@chromium.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/20210929023300.335969-5-stevensd@google.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>