ea01fa7031
When IOMMU is on, the actual synchronization happens in the same cases as with the direct DMA. Advertise %DMA_F_CAN_SKIP_SYNC in IOMMU DMA to skip sync ops calls (indirect) for non-SWIOTLB buffers. perf profile before the patch: 18.53% [kernel] [k] gq_rx_skb 14.77% [kernel] [k] napi_reuse_skb 8.95% [kernel] [k] skb_release_data 5.42% [kernel] [k] dev_gro_receive 5.37% [kernel] [k] memcpy <*> 5.26% [kernel] [k] iommu_dma_sync_sg_for_cpu 4.78% [kernel] [k] tcp_gro_receive <*> 4.42% [kernel] [k] iommu_dma_sync_sg_for_device 4.12% [kernel] [k] ipv6_gro_receive 3.65% [kernel] [k] gq_pool_get 3.25% [kernel] [k] skb_gro_receive 2.07% [kernel] [k] napi_gro_frags 1.98% [kernel] [k] tcp6_gro_receive 1.27% [kernel] [k] gq_rx_prep_buffers 1.18% [kernel] [k] gq_rx_napi_handler 0.99% [kernel] [k] csum_partial 0.74% [kernel] [k] csum_ipv6_magic 0.72% [kernel] [k] free_pcp_prepare 0.60% [kernel] [k] __napi_poll 0.58% [kernel] [k] net_rx_action 0.56% [kernel] [k] read_tsc <*> 0.50% [kernel] [k] __x86_indirect_thunk_r11 0.45% [kernel] [k] memset After patch, lines with <*> no longer show up, and overall cpu usage looks much better (~60% instead of ~72%): 25.56% [kernel] [k] gq_rx_skb 9.90% [kernel] [k] napi_reuse_skb 7.39% [kernel] [k] dev_gro_receive 6.78% [kernel] [k] memcpy 6.53% [kernel] [k] skb_release_data 6.39% [kernel] [k] tcp_gro_receive 5.71% [kernel] [k] ipv6_gro_receive 4.35% [kernel] [k] napi_gro_frags 4.34% [kernel] [k] skb_gro_receive 3.50% [kernel] [k] gq_pool_get 3.08% [kernel] [k] gq_rx_napi_handler 2.35% [kernel] [k] tcp6_gro_receive 2.06% [kernel] [k] gq_rx_prep_buffers 1.32% [kernel] [k] csum_partial 0.93% [kernel] [k] csum_ipv6_magic 0.65% [kernel] [k] net_rx_action iavf yields +10% of Mpps on Rx. This also unblocks batched allocations of XSk buffers when IOMMU is active. Co-developed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> |
||
---|---|---|
.. | ||
amd | ||
arm | ||
intel | ||
iommufd | ||
apple-dart.c | ||
dma-iommu.c | ||
dma-iommu.h | ||
exynos-iommu.c | ||
fsl_pamu_domain.c | ||
fsl_pamu_domain.h | ||
fsl_pamu.c | ||
fsl_pamu.h | ||
hyperv-iommu.c | ||
io-pgfault.c | ||
io-pgtable-arm-v7s.c | ||
io-pgtable-arm.c | ||
io-pgtable-arm.h | ||
io-pgtable-dart.c | ||
io-pgtable.c | ||
iommu-debugfs.c | ||
iommu-priv.h | ||
iommu-sva.c | ||
iommu-sysfs.c | ||
iommu-traces.c | ||
iommu.c | ||
iova.c | ||
ipmmu-vmsa.c | ||
irq_remapping.c | ||
irq_remapping.h | ||
Kconfig | ||
Makefile | ||
msm_iommu_hw-8xxx.h | ||
msm_iommu.c | ||
msm_iommu.h | ||
mtk_iommu_v1.c | ||
mtk_iommu.c | ||
of_iommu.c | ||
omap-iommu-debug.c | ||
omap-iommu.c | ||
omap-iommu.h | ||
omap-iopgtable.h | ||
rockchip-iommu.c | ||
s390-iommu.c | ||
sprd-iommu.c | ||
sun50i-iommu.c | ||
tegra-smmu.c | ||
virtio-iommu.c |