linux/drivers/pci
Dexuan Cui c234ba8042 PCI: hv: Only reuse existing IRTE allocation for Multi-MSI
Jeffrey added Multi-MSI support to the pci-hyperv driver with these 4 patches:
08e61e861a ("PCI: hv: Fix multi-MSI to allow more than one MSI vector")
455880dfe2 ("PCI: hv: Fix hv_arch_irq_unmask() for multi-MSI")
b4b77778ec ("PCI: hv: Reuse existing IRTE allocation in compose_msi_msg()")
a2bad844a6 ("PCI: hv: Fix interrupt mapping for multi-MSI")

It turns out that the third patch (b4b77778ec) causes a performance
regression because all the interrupts now happen on 1 physical CPU (or two
pCPUs, if one pCPU doesn't have enough vectors). When a guest has many PCI
devices, it may suffer from soft lockups if the workload is heavy, e.g.,
see https://lwn.net/ml/linux-kernel/20220804025104.15673-1-decui@microsoft.com/

Commit b4b77778ec itself is good. The real issue is that the hypercall in
hv_irq_unmask() -> hv_arch_irq_unmask() ->
hv_do_hypercall(HVCALL_RETARGET_INTERRUPT...) only changes the target
virtual CPU rather than physical CPU; with b4b77778ec, the pCPU is
determined only once in hv_compose_msi_msg() where only vCPU0 is specified;
consequently the hypervisor only uses 1 target pCPU for all the interrupts.

Note: before b4b77778ec, the pCPU was determined twice, and when the pCPU
was determined the second time, the vCPU in the effective affinity mask was
used (i.e., it wasn't always vCPU0), so the hypervisor chose a different
pCPU for each interrupt.

The hypercall will be fixed in the future to update the pCPU as well, but
that will take quite a while, so let's restore the old behavior in
hv_compose_msi_msg(), i.e., don't reuse the existing IRTE allocation for
single-MSI and MSI-X; for multi-MSI, we choose the vCPU in a round-robin
manner for each PCI device, so the interrupts of different devices can
happen on different pCPUs, though the interrupts of each device happen on
some single pCPU.
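
The policy described above can be sketched in user-space C. This is only an
illustration, not the driver code: NR_CPUS, cpu_online[], should_reuse_irte(),
next_online_wrap() and multi_msi_pick_vcpu() are hypothetical names, the online
mask is a plain array, and there is no locking (a real driver would have to
serialize access to the round-robin cursor).

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 8 /* hypothetical vCPU count for this sketch */

/* Hypothetical stand-in for the online vCPU mask; vCPU2 is offline. */
static bool cpu_online[NR_CPUS] = {
	true, true, false, true, true, true, true, true
};

/*
 * Reuse an existing IRTE allocation only for multi-MSI. Single-MSI and
 * MSI-X always compose a new message, so the pCPU is determined again.
 */
static bool should_reuse_irte(bool is_multi_msi, bool has_allocation)
{
	return is_multi_msi && has_allocation;
}

/* Return the next online vCPU after prev, wrapping around; -1 if none. */
static int next_online_wrap(int prev)
{
	assert(prev >= -1 && prev < NR_CPUS);

	for (int i = 1; i <= NR_CPUS; i++) {
		int cpu = (prev + i) % NR_CPUS;

		if (cpu_online[cpu])
			return cpu;
	}
	return -1;
}

/*
 * Pick the target vCPU for one multi-MSI device. Successive devices get
 * successive online vCPUs, so different devices can end up on different
 * pCPUs, while all vectors of one device share a single vCPU.
 */
static int multi_msi_pick_vcpu(void)
{
	static int cpu_next = -1; /* -1 means start with vCPU0 */

	cpu_next = next_online_wrap(cpu_next);
	return cpu_next;
}
```

With the mask above, calling multi_msi_pick_vcpu() once per device walks the
online vCPUs in order, skips the offline vCPU2, and wraps back to vCPU0 after
vCPU7.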

The hypercall fix may not be backported to all old versions of Hyper-V, so
we want to have this guest side change forever (or at least till we're sure
the old affected versions of Hyper-V are no longer supported).

Fixes: b4b77778ec ("PCI: hv: Reuse existing IRTE allocation in compose_msi_msg()")
Co-developed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Co-developed-by: Carl Vanderlip <quic_carlv@quicinc.com>
Signed-off-by: Carl Vanderlip <quic_carlv@quicinc.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20221104222953.11356-1-decui@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-11-12 12:43:59 +00:00
controller PCI: hv: Only reuse existing IRTE allocation for Multi-MSI 2022-11-12 12:43:59 +00:00
endpoint Fix of heap data and clang warnings, support for a new Intel NTB device, 2022-08-13 14:00:45 -07:00
hotplug PCI: hotplug: Clean up include files 2022-04-05 11:13:33 -05:00
msi PCI/MSI: Correct 'can_mask' test in msi_add_msi_desc() 2022-08-26 10:47:54 -05:00
pcie Merge branch 'pci/pm' 2022-10-05 17:32:53 -05:00
switch PCI: switchtec: Prefer ida_alloc()/free() over ida_simple_get()/remove() 2022-06-09 12:28:21 -05:00
access.c PCI: Reduce warnings on possible RW1C corruption 2022-03-04 15:59:52 -06:00
ats.c
bus.c
doe.c PCI/DOE: Add DOE mailbox support functions 2022-07-19 15:38:04 -07:00
ecam.c
host-bridge.c PCI: VMD: ACPI: Make ACPI companion lookup work for VMD bus 2021-09-02 17:59:58 +02:00
iov.c PCI/IOV: Fix wrong kernel-doc identifier 2022-03-07 12:06:10 -07:00
irq.c
Kconfig cxl for 6.0 2022-08-10 11:07:26 -07:00
Makefile PCI/DOE: Add DOE mailbox support functions 2022-07-19 15:38:04 -07:00
mmap.c PCI: Remove pci_mmap_page_range() wrapper 2022-07-29 12:08:44 -05:00
of.c IOMMU Updates for Linux v5.19 2022-05-31 09:56:54 -07:00
p2pdma.c PCI/P2PDMA: Use for_each_pci_dev() helper 2022-09-19 13:44:38 -05:00
pci-acpi.c PCI/ACPI: Update link to PCI firmware specification 2022-07-22 14:38:38 -05:00
pci-bridge-emul.c PCI: pci-bridge-emul: Set position of PCI capabilities to real HW value 2022-08-25 12:07:56 +02:00
pci-bridge-emul.h PCI: pci-bridge-emul: Set position of PCI capabilities to real HW value 2022-08-25 12:07:56 +02:00
pci-driver.c PCI/PM: Simplify pci_pm_suspend_noirq() 2022-09-12 15:30:18 -05:00
pci-label.c
pci-mid.c PCI: PM: Do not use pci_platform_pm_ops for Intel MID PM 2021-09-27 17:13:21 +02:00
pci-pf-stub.c
pci-stub.c PCI: pci_stub: Set driver_managed_dma 2022-04-28 15:32:20 +02:00
pci-sysfs.c PCI: Expose PCIe Resizable BAR support via sysfs 2022-10-05 12:21:02 -05:00
pci.c Merge branch 'pci/pm' 2022-10-05 17:32:53 -05:00
pci.h Merge branch 'remotes/lorenzo/pci/misc' 2022-10-05 17:32:57 -05:00
probe.c PCI: Fix typo in pci_scan_child_bus_extend() 2022-09-21 14:50:35 -05:00
proc.c PCI: Remove pci_mmap_page_range() wrapper 2022-07-29 12:08:44 -05:00
quirks.c PCI/DPC: Quirk PIO log size for certain Intel Root Ports 2022-09-27 18:13:18 -05:00
remove.c
rom.c PCI: Prefer 'unsigned int' over bare 'unsigned' 2021-10-27 13:41:22 -05:00
search.c
setup-bus.c Revert "PCI: Distribute available resources for root buses, too" 2022-10-14 14:27:58 -05:00
setup-irq.c PCI: Tidy comments 2021-09-28 13:43:17 -05:00
setup-res.c PCI: Sanitise firmware BAR assignments behind a PCI-PCI bridge 2022-09-21 17:52:47 -05:00
slot.c PCI/sysfs: Use default_groups in kobj_type for slot attrs 2021-12-29 13:42:04 -06:00
syscall.c
vc.c
vgaarb.c PCI/VGA: Replace full MIT license text with SPDX identifier 2022-03-09 18:31:34 -06:00
vpd.c PCI/VPD: Use pci_read_vpd_any() in pci_vpd_size() 2021-10-25 19:12:23 -05:00
xen-pcifront.c xen/pcifront: move xenstore config scanning into sub-function 2022-10-07 07:36:44 +02:00