linux/drivers/iommu
Adrian Huang b00833768e iommu/vt-d: Fix double list_add when enabling VMD in scalable mode
When enabling VMD and IOMMU scalable mode, the following kernel panic
call trace/kernel log is shown in Eagle Stream platform (Sapphire Rapids
CPU) during booting:

pci 0000:59:00.5: Adding to iommu group 42
...
vmd 0000:59:00.5: PCI host bridge to bus 10000:80
pci 10000:80:01.0: [8086:352a] type 01 class 0x060400
pci 10000:80:01.0: reg 0x10: [mem 0x00000000-0x0001ffff 64bit]
pci 10000:80:01.0: enabling Extended Tags
pci 10000:80:01.0: PME# supported from D0 D3hot D3cold
pci 10000:80:01.0: DMAR: Setup RID2PASID failed
pci 10000:80:01.0: Failed to add to iommu group 42: -16
pci 10000:80:03.0: [8086:352b] type 01 class 0x060400
pci 10000:80:03.0: reg 0x10: [mem 0x00000000-0x0001ffff 64bit]
pci 10000:80:03.0: enabling Extended Tags
pci 10000:80:03.0: PME# supported from D0 D3hot D3cold
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:29!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 7 Comm: kworker/0:1 Not tainted 5.17.0-rc3+ #7
Hardware name: Lenovo ThinkSystem SR650V3/SB27A86647, BIOS ESE101Y-1.00 01/13/2022
Workqueue: events work_for_cpu_fn
RIP: 0010:__list_add_valid.cold+0x26/0x3f
Code: 9a 4a ab ff 4c 89 c1 48 c7 c7 40 0c d9 9e e8 b9 b1 fe ff 0f
      0b 48 89 f2 4c 89 c1 48 89 fe 48 c7 c7 f0 0c d9 9e e8 a2 b1
      fe ff <0f> 0b 48 89 d1 4c 89 c6 4c 89 ca 48 c7 c7 98 0c d9
      9e e8 8b b1 fe
RSP: 0000:ff5ad434865b3a40 EFLAGS: 00010246
RAX: 0000000000000058 RBX: ff4d61160b74b880 RCX: ff4d61255e1fffa8
RDX: 0000000000000000 RSI: 00000000fffeffff RDI: ffffffff9fd34f20
RBP: ff4d611d8e245c00 R08: 0000000000000000 R09: ff5ad434865b3888
R10: ff5ad434865b3880 R11: ff4d61257fdc6fe8 R12: ff4d61160b74b8a0
R13: ff4d61160b74b8a0 R14: ff4d611d8e245c10 R15: ff4d611d8001ba70
FS:  0000000000000000(0000) GS:ff4d611d5ea00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ff4d611fa1401000 CR3: 0000000aa0210001 CR4: 0000000000771ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 <TASK>
 intel_pasid_alloc_table+0x9c/0x1d0
 dmar_insert_one_dev_info+0x423/0x540
 ? device_to_iommu+0x12d/0x2f0
 intel_iommu_attach_device+0x116/0x290
 __iommu_attach_device+0x1a/0x90
 iommu_group_add_device+0x190/0x2c0
 __iommu_probe_device+0x13e/0x250
 iommu_probe_device+0x24/0x150
 iommu_bus_notifier+0x69/0x90
 blocking_notifier_call_chain+0x5a/0x80
 device_add+0x3db/0x7b0
 ? arch_memremap_can_ram_remap+0x19/0x50
 ? memremap+0x75/0x140
 pci_device_add+0x193/0x1d0
 pci_scan_single_device+0xb9/0xf0
 pci_scan_slot+0x4c/0x110
 pci_scan_child_bus_extend+0x3a/0x290
 vmd_enable_domain.constprop.0+0x63e/0x820
 vmd_probe+0x163/0x190
 local_pci_probe+0x42/0x80
 work_for_cpu_fn+0x13/0x20
 process_one_work+0x1e2/0x3b0
 worker_thread+0x1c4/0x3a0
 ? rescuer_thread+0x370/0x370
 kthread+0xc7/0xf0
 ? kthread_complete_and_exit+0x20/0x20
 ret_from_fork+0x1f/0x30
 </TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
...
Kernel panic - not syncing: Fatal exception
Kernel Offset: 0x1ca00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
---[ end Kernel panic - not syncing: Fatal exception ]---

The following 'lspci' output shows devices '10000:80:*' are subdevices of
the VMD device 0000:59:00.5:

  $ lspci
  ...
  0000:59:00.5 RAID bus controller: Intel Corporation Volume Management Device NVMe RAID Controller (rev 20)
  ...
  10000:80:01.0 PCI bridge: Intel Corporation Device 352a (rev 03)
  10000:80:03.0 PCI bridge: Intel Corporation Device 352b (rev 03)
  10000:80:05.0 PCI bridge: Intel Corporation Device 352c (rev 03)
  10000:80:07.0 PCI bridge: Intel Corporation Device 352d (rev 03)
  10000:81:00.0 Non-Volatile memory controller: Intel Corporation NVMe Datacenter SSD [3DNAND, Beta Rock Controller]
  10000:82:00.0 Non-Volatile memory controller: Intel Corporation NVMe Datacenter SSD [3DNAND, Beta Rock Controller]

The symptom 'list_add double add' is caused by the following failure
message:

  pci 10000:80:01.0: DMAR: Setup RID2PASID failed
  pci 10000:80:01.0: Failed to add to iommu group 42: -16
  pci 10000:80:03.0: [8086:352b] type 01 class 0x060400

Device 10000:80:01.0 is the subdevice of the VMD device 0000:59:00.5,
so invoking intel_pasid_alloc_table() gets the pasid_table of the VMD
device 0000:59:00.5. Here is call path:

  intel_pasid_alloc_table
    pci_for_each_dma_alias
     get_alias_pasid_table
       search_pasid_table

pci_real_dma_dev() in pci_for_each_dma_alias() gets the real dma device
which is the VMD device 0000:59:00.5. However, pte of the VMD device
0000:59:00.5 has been configured during this message "pci 0000:59:00.5:
Adding to iommu group 42". So, the status -EBUSY is returned when
configuring pasid entry for device 10000:80:01.0.

It then invokes dmar_remove_one_dev_info() to release
'struct device_domain_info *' from iommu_devinfo_cache. But, the pasid
table is not released because of the following statement in
__dmar_remove_one_dev_info():

	if (info->dev && !dev_is_real_dma_subdevice(info->dev)) {
		...
		intel_pasid_free_table(info->dev);
        }

The subsequent dmar_insert_one_dev_info() operation of device
10000:80:03.0 allocates 'struct device_domain_info *' from
iommu_devinfo_cache. The allocated address is the same address that
is released previously for device 10000:80:01.0. Finally, invoking
device_attach_pasid_table() causes the issue.

`git bisect` points to the offending commit 474dd1c650 ("iommu/vt-d:
Fix clearing real DMA device's scalable-mode context entries"), which
releases the pasid table if the device is not the subdevice by
checking the returned status of dev_is_real_dma_subdevice().
Reverting the offending commit can work around the issue.

The solution is to prevent from allocating pasid table if those
devices are subdevices of the VMD device.

Fixes: 474dd1c650 ("iommu/vt-d: Fix clearing real DMA device's scalable-mode context entries")
Cc: stable@vger.kernel.org # v5.14+
Signed-off-by: Adrian Huang <ahuang12@lenovo.com>
Link: https://lore.kernel.org/r/20220216091307.703-1-adrianhuang0701@gmail.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20220221053348.262724-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-02-28 13:48:44 +01:00
..
amd iommu/amd: Fix I/O page table memory leak 2022-02-14 12:52:40 +01:00
arm Rework of the MSI interrupt infrastructure: 2022-01-13 09:05:29 -08:00
intel iommu/vt-d: Fix double list_add when enabling VMD in scalable mode 2022-02-28 13:48:44 +01:00
apple-dart.c pci-v5.16-changes 2021-11-06 14:36:12 -07:00
dma-iommu.c iommu: Move flush queue data into iommu_dma_cookie 2021-12-20 09:03:05 +01:00
exynos-iommu.c iommu/exynos: Drop IOVA cookie management 2021-08-18 13:25:31 +02:00
fsl_pamu_domain.c iommu: Streamline registration interface 2021-04-16 17:20:45 +02:00
fsl_pamu_domain.h iommu/fsl_pamu: remove the snoop_id field 2021-04-07 10:56:52 +02:00
fsl_pamu.c iommu/fsl_pamu: hardcode the window address and size in pamu_config_ppaace 2021-04-07 10:56:52 +02:00
fsl_pamu.h iommu/fsl_pamu: hardcode the window address and size in pamu_config_ppaace 2021-04-07 10:56:52 +02:00
hyperv-iommu.c iommu/hyperv: setup an IO-APIC IRQ remapping domain for root partition 2021-02-11 08:47:07 +00:00
io-pgfault.c iommu: Add a page fault handler 2021-04-07 10:54:29 +02:00
io-pgtable-arm-v7s.c iommu/io-pgtable-arm-v7s: Add error handle for page table allocation failure 2021-12-14 14:45:35 +00:00
io-pgtable-arm.c iommu/io-pgtable-arm: Fix table descriptor paddr formatting 2021-12-06 13:03:01 +01:00
io-pgtable-arm.h iommu/io-pgtable-arm: Move some definitions to a header 2020-09-28 23:48:06 +01:00
io-pgtable.c iommu/io-pgtable: Add DART pagetable format 2021-08-12 13:15:02 +02:00
ioasid.c iommu: Fix some W=1 warnings 2022-01-31 16:49:54 +01:00
iommu-debugfs.c
iommu-sva-lib.c iommu/sva: Add PASID helpers 2020-11-23 14:16:55 +00:00
iommu-sva-lib.h iommu: Add a page fault handler 2021-04-07 10:54:29 +02:00
iommu-sysfs.c
iommu-traces.c
iommu.c iommu: Fix some W=1 warnings 2022-01-31 16:49:54 +01:00
iova.c iommu: Move flush queue data into iommu_dma_cookie 2021-12-20 09:03:05 +01:00
ipmmu-vmsa.c iommu/ipmmu-vmsa: Hook up r8a77980 DT matching code 2021-09-28 11:43:50 +02:00
irq_remapping.c x86: Kill all traces of irq_remapping_get_irq_domain() 2020-10-28 20:26:28 +01:00
irq_remapping.h x86: Kill all traces of irq_remapping_get_irq_domain() 2020-10-28 20:26:28 +01:00
Kconfig iommu/arm: fix ARM_SMMU_QCOM compilation 2021-10-13 21:28:44 +02:00
Makefile iommu/dart: Add DART iommu driver 2021-08-12 13:15:02 +02:00
msm_iommu_hw-8xxx.h
msm_iommu.c iommu: Drop unnecessary of_iommu.h includes 2021-06-08 14:15:46 +02:00
msm_iommu.h
mtk_iommu_v1.c iommu/mtk: Drop IOVA cookie management 2021-08-18 13:25:32 +02:00
mtk_iommu.c iommu/mediatek: Fix out-of-range warning with clang 2021-09-28 11:59:11 +02:00
mtk_iommu.h iommu/mediatek: Add mt8192 support 2021-02-01 11:31:19 +00:00
of_iommu.c iommu: Remove unused of_get_dma_window() 2021-06-08 14:15:46 +02:00
omap-iommu-debug.c
omap-iommu.c iommu: Fix some W=1 warnings 2022-01-31 16:49:54 +01:00
omap-iommu.h
omap-iopgtable.h
rockchip-iommu.c iommu/rockchip: Fix PAGE_DESC_HI_MASKs for RK3568 2021-11-26 22:54:20 +01:00
s390-iommu.c s390/pci: use physical addresses in DMA tables 2021-12-06 14:42:26 +01:00
sprd-iommu.c iommu/sprd: Drop IOVA cookie management 2021-08-18 13:25:32 +02:00
sun50i-iommu.c iommu/sun50i: Drop IOVA cookie management 2021-08-18 13:25:32 +02:00
tegra-gart.c iommu: Streamline registration interface 2021-04-16 17:20:45 +02:00
tegra-smmu.c iommu/tegra-smmu: Use devm_bitmap_zalloc when applicable 2021-10-18 13:39:41 +02:00
virtio-iommu.c virtio,vdpa,qemu_fw_cfg: features, cleanups, fixes 2022-01-18 10:05:48 +02:00