Merge tag 'pci-v6.8-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci updates from Bjorn Helgaas:
"Enumeration:
- Reserve ECAM so we don't assign it to PCI BARs; this works around
bugs where BIOS included ECAM in a PNP0A03 host bridge window,
didn't reserve it via a PNP0C02 motherboard device, and didn't
allocate space for SR-IOV VF BARs (Bjorn Helgaas)
- Add MMCONFIG/ECAM debug logging (Bjorn Helgaas)
- Rename 'MMCONFIG' to 'ECAM' to match spec usage (Bjorn Helgaas)
- Log device type (Root Port, Switch Port, etc) during enumeration
(Bjorn Helgaas)
- Log bridges before downstream devices so the dmesg order is more
logical (Bjorn Helgaas)
- Log resource names (BAR 0, VF BAR 0, bridge window, etc)
consistently instead of a mix of names and "reg 0x10" (Puranjay
Mohan, Bjorn Helgaas)
- Fix 64GT/s effective data rate calculation to use 1b/1b encoding
rather than the 8b/10b or 128b/130b used by lower rates (Ilpo
Järvinen)
- Use PCI_HEADER_TYPE_* instead of literals in x86, powerpc, SCSI
lpfc (Ilpo Järvinen)
- Clean up open-coded PCIBIOS return code mangling (Ilpo Järvinen)
Resource management:
- Restructure pci_dev_for_each_resource() to avoid computing the
address of an out-of-bounds array element (the bounds check was
performed later so the element was never actually *read*, but it's
nicer to avoid even computing an out-of-bounds address) (Andy
Shevchenko)
Driver binding:
- Convert pci-host-common.c platform .remove() callback to
.remove_new() returning 'void' since it's not useful to return
error codes here (Uwe Kleine-König)
- Convert exynos, keystone, kirin from .remove() to .remove_new(),
which returns void instead of int (Uwe Kleine-König)
- Drop unused struct pci_driver.node member (Mathias Krause)
Virtualization:
- Add ACS quirk for more Zhaoxin Root Ports (LeoLiuoc)
Error handling:
- Log AER errors as "Correctable" (not "Corrected") or
"Uncorrectable" to match spec terminology (Bjorn Helgaas)
- Decode Requester ID when no error info found instead of printing
the raw hex value (Bjorn Helgaas)
Endpoint framework:
- Use a unique test pattern for each BAR in the pci_endpoint_test to
make it easier to debug address translation issues (Niklas Cassel)
Broadcom STB PCIe controller driver:
- Add DT property "brcm,clkreq-mode" and driver support for different
CLKREQ# modes to make ASPM L1.x states possible (Jim Quinlan)
Freescale Layerscape PCIe controller driver:
- Add suspend/resume support for Layerscape LS1043a and LS1021a,
including software-managed PME_Turn_Off and transitions between L0,
L2/L3_Ready Link states (Frank Li)
MediaTek PCIe controller driver:
- Clear MSI interrupt status before handler to avoid missing MSIs
that occur after the handler (qizhong cheng)
MediaTek PCIe Gen3 controller driver:
- Update mediatek-gen3 translation window setup to handle MMIO space
that is not a power of two in size (Jianjun Wang)
Qualcomm PCIe controller driver:
- Increase qcom iommu-map maxItems to accommodate SDX55 (five
entries) and SDM845 (sixteen entries) (Krzysztof Kozlowski)
- Describe qcom,pcie-sc8180x clocks and resets accurately (Krzysztof
Kozlowski)
- Describe qcom,pcie-sm8150 clocks and resets accurately (Krzysztof
Kozlowski)
- Correct the qcom "reset-names" property, previously incorrectly
called "resets-names" (Krzysztof Kozlowski)
- Document qcom,pcie-sm8650, based on qcom,pcie-sm8550 (Neil
Armstrong)
Renesas R-Car PCIe controller driver:
- Replace of_device.h with explicit of.h include to untangle header
usage (Rob Herring)
- Add DT and driver support for optional miniPCIe 1.5V and 3.3V
regulators on KingFisher (Wolfram Sang)
SiFive FU740 PCIe controller driver:
- Convert fu740 CONFIG_PCIE_FU740 dependency from SOC_SIFIVE to
ARCH_SIFIVE (Conor Dooley)
Synopsys DesignWare PCIe controller driver:
- Align iATU mapping for endpoint MSI-X (Niklas Cassel)
- Drop "host_" prefix from struct dw_pcie_host_ops members (Yoshihiro
Shimoda)
- Drop "ep_" prefix from struct dw_pcie_ep_ops members (Yoshihiro
Shimoda)
- Rename struct dw_pcie_ep_ops.func_conf_select() to
.get_dbi_offset() to be more descriptive (Yoshihiro Shimoda)
- Add Endpoint DBI accessors to encapsulate offset lookups (Yoshihiro
Shimoda)
TI J721E PCIe driver:
- Add j721e DT and driver support for 'num-lanes' for devices that
support x1, x2, or x4 Links (Matt Ranostay)
- Add j721e DT compatible strings and driver support for j784s4 (Matt
Ranostay)
- Make TI J721E Kconfig depend on ARCH_K3 since the hardware is
specific to those TI SoC parts (Peter Robinson)
TI Keystone PCIe controller driver:
- Hold power management references to all PHYs while enabling them to
avoid a race when one provides clocks to others (Siddharth
Vadapalli)
Xilinx XDMA PCIe controller driver:
- Remove redundant dev_err(), since platform_get_irq() and
platform_get_irq_byname() already log errors (Yang Li)
- Fix uninitialized symbols in xilinx_pl_dma_pcie_setup_irq()
(Krzysztof Wilczyński)
- Fix xilinx_pl_dma_pcie_init_irq_domain() error return when
irq_domain_add_linear() fails (Harshit Mogalapalli)
MicroSemi Switchtec management driver:
- Do dma_mrpc cleanup during switchtec_pci_remove() to match its devm
ioremapping in switchtec_pci_probe(). Previously the cleanup was
done in stdev_release(), which used stale pointers if stdev->cdev
happened to be open when the PCI device was removed (Daniel
Stodden)
Miscellaneous:
- Convert interrupt terminology from "legacy" to "INTx" to be more
specific and match spec terminology (Damien Le Moal)
- In dw-xdata-pcie, pci_endpoint_test, and vmd, replace usage of
deprecated ida_simple_*() API with ida_alloc() and ida_free()
(Christophe JAILLET)"
* tag 'pci-v6.8-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (97 commits)
PCI: Fix kernel-doc issues
PCI: brcmstb: Configure HW CLKREQ# mode appropriate for downstream device
dt-bindings: PCI: brcmstb: Add property "brcm,clkreq-mode"
PCI: mediatek-gen3: Fix translation window size calculation
PCI: mediatek: Clear interrupt status before dispatching handler
PCI: keystone: Fix race condition when initializing PHYs
PCI: xilinx-xdma: Fix error code in xilinx_pl_dma_pcie_init_irq_domain()
PCI: xilinx-xdma: Fix uninitialized symbols in xilinx_pl_dma_pcie_setup_irq()
PCI: rcar-gen4: Fix -Wvoid-pointer-to-enum-cast error
PCI: iproc: Fix -Wvoid-pointer-to-enum-cast warning
PCI: dwc: Add dw_pcie_ep_{read,write}_dbi[2] helpers
PCI: dwc: Rename .func_conf_select to .get_dbi_offset in struct dw_pcie_ep_ops
PCI: dwc: Rename .ep_init to .init in struct dw_pcie_ep_ops
PCI: dwc: Drop host prefix from struct dw_pcie_host_ops members
misc: pci_endpoint_test: Use a unique test pattern for each BAR
PCI: j721e: Make TI J721E depend on ARCH_K3
PCI: j721e: Add TI J784S4 PCIe configuration
PCI/AER: Use explicit register sizes for struct members
PCI/AER: Decode Requester ID when no error info found
PCI/AER: Use 'Correctable' and 'Uncorrectable' spec terms for errors
...
The vmd_pm_enable_quirk() helper is called from pci_walk_bus() during
probe to enable ASPM for controllers with VMD_FEAT_BIOS_PM_QUIRK set.
Since pci_walk_bus() already holds a pci_bus_sem read lock, use
pci_enable_link_state_locked() to enable link states in order to avoid a
potential deadlock (e.g. in case someone takes a write lock before
reacquiring the read lock).
Fixes: f492edb40b54 ("PCI: vmd: Add quirk to configure PCIe ASPM and LTR")
Link: https://lore.kernel.org/r/20231128081512.19387-3-johan+linaro@kernel.org
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
[bhelgaas: add "potential" in subject since the deadlock has only been
reported by lockdep, include helper name in commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Cc: <stable@vger.kernel.org> # 6.3
Cc: Michael Bottini <michael.a.bottini@linux.intel.com>
Cc: David E. Box <david.e.box@linux.intel.com>
Replace literals under drivers/pci/ with PCI_HEADER_TYPE_MASK,
PCI_HEADER_TYPE_NORMAL, and PCI_HEADER_TYPE_MFD.
Also replace !! boolean conversions with FIELD_GET().
Link: https://lore.kernel.org/r/20231003125300.5541-4-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com> # for Renesas R-Car
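The !! to FIELD_GET() conversion described above can be illustrated with a minimal C sketch. It assumes the kernel's field values (PCI_HEADER_TYPE_MASK = 0x7f, PCI_HEADER_TYPE_MFD = 0x80). FIELD_GET_SKETCH() is a hypothetical, simplified stand-in for the kernel's FIELD_GET(): it masks the value, then shifts it down by the mask's lowest set bit using the GCC/Clang __builtin_ctz(). The helper names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Header Type register fields (config offset 0x0e), same values as the
 * kernel's pci_regs.h. */
#define PCI_HEADER_TYPE_MASK   0x7f /* bits 6:0: header layout */
#define PCI_HEADER_TYPE_NORMAL 0    /* layout 0: ordinary function */
#define PCI_HEADER_TYPE_MFD    0x80 /* bit 7: Multi-Function Device */

/* Hypothetical, simplified FIELD_GET(): keep the bits selected by 'mask'
 * and shift them down by the index of the mask's lowest set bit. */
#define FIELD_GET_SKETCH(mask, reg) \
	(((reg) & (mask)) >> __builtin_ctz(mask))

/* Before: literal 0x80 and a !! boolean conversion. */
static bool is_multifunction_old(uint8_t hdr_type)
{
	return !!(hdr_type & 0x80);
}

/* After: named mask and field extraction. */
static bool is_multifunction_new(uint8_t hdr_type)
{
	return FIELD_GET_SKETCH(PCI_HEADER_TYPE_MFD, hdr_type);
}

/* Header layout: 0 = normal, 1 = bridge, 2 = CardBus bridge. */
static uint8_t header_layout(uint8_t hdr_type)
{
	return FIELD_GET_SKETCH(PCI_HEADER_TYPE_MASK, hdr_type);
}
```

For a Header Type byte of 0x81, both helpers report a multi-function device, and header_layout() returns 1 (bridge layout).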
vmd_domain_reset() attempts to find whether the device may contain multiple
functions by checking 0x80 (Multi-Function Device). However, the hdr_type
variable has already been masked with PCI_HEADER_TYPE_MASK, so the check can
never be true.
To fix the issue, don't mask the read with PCI_HEADER_TYPE_MASK.
Fixes: 6aab5622296b ("PCI: vmd: Clean up domain before enumeration")
Link: https://lore.kernel.org/r/20231003125300.5541-2-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Nirmal Patel <nirmal.patel@linux.intel.com>
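The bug fixed above can be reproduced in a few lines. This sketch assumes PCI_HEADER_TYPE_MASK = 0x7f and PCI_HEADER_TYPE_MFD = 0x80. The two functions are hypothetical and only mimic the shape of the check in vmd_domain_reset(), not its actual code. Masking with 0x7f clears bit 7, so testing 0x80 afterwards can only yield false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PCI_HEADER_TYPE_MASK 0x7f /* bits 6:0: header layout */
#define PCI_HEADER_TYPE_MFD  0x80 /* bit 7: Multi-Function Device */

/* Buggy shape: bit 7 is masked off before it is tested, so this always
 * returns false. */
static bool may_have_multiple_functions_buggy(uint8_t raw)
{
	uint8_t hdr_type = raw & PCI_HEADER_TYPE_MASK;

	return hdr_type & PCI_HEADER_TYPE_MFD;
}

/* Fixed shape: test the unmasked value; apply PCI_HEADER_TYPE_MASK only
 * where the header layout itself is needed. */
static bool may_have_multiple_functions_fixed(uint8_t raw)
{
	return raw & PCI_HEADER_TYPE_MFD;
}
```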
The if-statement within the vmd_resume() function has inconsistent
indentation that triggers a static checker warning.
Thus, correct the inconsistent indentation. While at it, remove the
if-statement completely, which makes the code simpler.
This was detected by Smatch:
drivers/pci/controller/vmd.c:1066 vmd_resume() warn: inconsistent indenting
No functional changes are intended.
[kwilczynski: use correct tags, commit log]
Suggested-by: Christoph Hellwig <hch@infradead.org>
Link: https://lore.kernel.org/linux-pci/20230627113808.269716-1-korantwork@gmail.com
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Xinghui Li <korantli@tencent.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
During the domain reset process, vmd_domain_reset() clears the PCI
configuration space of VMD root ports. However, certain platforms have
observed the following errors and failed to boot.
...
DMAR: VT-d detected Invalidation Queue Error: Reason f
DMAR: VT-d detected Invalidation Time-out Error: SID ffff
DMAR: VT-d detected Invalidation Completion Error: SID ffff
DMAR: QI HEAD: UNKNOWN qw0 = 0x0, qw1 = 0x0
DMAR: QI PRIOR: UNKNOWN qw0 = 0x0, qw1 = 0x0
DMAR: Invalidation Time-out Error (ITE) cleared
The root cause is that memset_io() clears the Prefetchable Memory Base/Limit
registers and the Prefetchable Base/Limit Upper 32 Bits registers
sequentially. This appears to enable prefetchable memory if the device
originally had it disabled.
Here is an example (before memset_io()):
PCI configuration space for 10000:00:00.0:
86 80 30 20 06 00 10 00 04 00 04 06 00 00 01 00
00 00 00 00 00 00 00 00 00 01 01 00 00 00 00 20
00 00 00 00 01 00 01 00 ff ff ff ff 75 05 00 00
...
So, prefetchable memory is ffffffff00000000-575000fffff, which is
disabled. When memset_io() clears prefetchable base 32 bits register,
the prefetchable memory becomes 0000000000000000-575000fffff, which is
enabled and incorrect.
Here is a quote from section 7.5.1.3.9 of the PCI Express Base 6.0 spec:
The Prefetchable Memory Limit register must be programmed to a smaller
value than the Prefetchable Memory Base register if there is no
prefetchable memory on the secondary side of the bridge.
This is believed to be the reason for the failure. In addition, the
sequence of operations in vmd_domain_reset() does not follow the PCIe
spec.
Disable the bridge window by executing a sequence of operations, borrowed
from pci_disable_bridge_window() and pci_setup_bridge_io(), that complies
with the PCI specifications.
Link: https://lore.kernel.org/r/20230810215029.1177379-1-nirmal.patel@linux.intel.com
Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
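The window arithmetic in this commit message can be checked with a small C simulation. It is a sketch, not the driver code, and it simplifies in three ways:

- It models only the four prefetchable window registers of a Type 1 header.
- It treats bits 3:0 of the Base/Limit registers as the read-only 64-bit capability field.
- It approximates memset_io() as zeroing one register at a time in address order.

The "disable" order (raise the base first, then lower the limit) is an assumption in the spirit of pci_disable_bridge_window(). It is not necessarily the exact sequence the patch uses. The initial values come from the 10000:00:00.0 dump in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Prefetchable window registers of a Type 1 (bridge) header. */
struct pref_window {
	uint16_t base;        /* 0x24: bits 15:4 = address bits 31:20 */
	uint16_t limit;       /* 0x26: bits 15:4 = address bits 31:20 */
	uint32_t base_upper;  /* 0x28: address bits 63:32 of the base */
	uint32_t limit_upper; /* 0x2c: address bits 63:32 of the limit */
};

/* Bits 3:0 of base/limit are a read-only capability field
 * (1 = 64-bit decoding), so a write cannot change them. */
static void write_base(struct pref_window *w, uint16_t v)
{
	w->base = (v & 0xfff0) | (w->base & 0xf);
}

static void write_limit(struct pref_window *w, uint16_t v)
{
	w->limit = (v & 0xfff0) | (w->limit & 0xf);
}

static uint64_t pref_base(const struct pref_window *w)
{
	uint64_t hi = (w->base & 0xf) == 1 ? w->base_upper : 0;

	return (hi << 32) | ((uint64_t)(w->base & 0xfff0) << 16);
}

static uint64_t pref_limit(const struct pref_window *w)
{
	uint64_t hi = (w->limit & 0xf) == 1 ? w->limit_upper : 0;

	/* The limit's low 20 bits are implicitly all ones. */
	return (hi << 32) | ((uint64_t)(w->limit & 0xfff0) << 16) | 0xfffff;
}

/* The bridge forwards the window only while base <= limit. */
static bool window_enabled(const struct pref_window *w)
{
	return pref_base(w) <= pref_limit(w);
}

/* Zero the registers in address order, roughly what memset_io() over the
 * header does. Returns true if any intermediate state is enabled. */
static bool zeroing_enables_window(struct pref_window w)
{
	bool seen = false;

	write_base(&w, 0);
	seen |= window_enabled(&w);
	write_limit(&w, 0);
	seen |= window_enabled(&w);
	w.base_upper = 0;
	seen |= window_enabled(&w);
	w.limit_upper = 0;
	seen |= window_enabled(&w);
	return seen;
}

/* Assumed safer order: raise the base to the top and drop the limit to the
 * bottom, so limit < base holds after every single write. */
static bool disable_sequence_enables_window(struct pref_window w)
{
	bool seen = false;

	write_base(&w, 0xfff0);
	seen |= window_enabled(&w);
	write_limit(&w, 0);
	seen |= window_enabled(&w);
	w.base_upper = 0xffffffff;
	seen |= window_enabled(&w);
	w.limit_upper = 0;
	seen |= window_enabled(&w);
	return seen;
}
```

The dump's values are Base/Limit 0x0001, upper base 0xffffffff and upper limit 0x00000575. With these, pref_base() and pref_limit() give ffffffff00000000 and 575000fffff, matching the message.

- zeroing_enables_window() returns true: the window opens as soon as the upper base is cleared, and a window at 0-0xfffff stays enabled at the end.
- disable_sequence_enables_window() returns false: limit < base holds after every write.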
The ret variable in the vmd_enable_domain() function was used
uninitialized when printing a warning message upon failure of
the pci_reset_bus() function.
Thus, fix the issue by assigning ret with the value returned from
pci_reset_bus() before referencing it in the warning message.
This was detected by Smatch:
drivers/pci/controller/vmd.c:931 vmd_enable_domain() error: uninitialized symbol 'ret'.
[kwilczynski: drop the second patch from the series, add missing reported
by tag, commit log]
Fixes: 0a584655ef89 ("PCI: vmd: Fix secondary bus reset for Intel bridges")
Link: https://lore.kernel.org/all/202305270219.B96IiIfv-lkp@intel.com
Link: https://lore.kernel.org/linux-pci/20230420094332.1507900-2-korantwork@gmail.com
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Xinghui Li <korantli@tencent.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Nirmal Patel <nirmal.patel@linux.intel.com>
The VMD driver can disable or enable MSI remapping by changing the
VMCONFIG_MSI_REMAP register. This register needs to be restored to its
default value during soft reboots: drives failed to enumerate when Windows
booted after a soft reboot from Linux. Windows doesn't support the MSI
remapping disable feature, and the stale register value hinders the Windows
VMD driver's initialization process.
Add a vmd_shutdown() function to make sure the VMCONFIG register is set to
its default value.
Link: https://lore.kernel.org/r/20230224202811.644370-1-nirmal.patel@linux.intel.com
Fixes: ee81ee84f873 ("PCI: vmd: Disable MSI-X remapping when possible")
Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Reviewed-by: Jon Derrick <jonathan.derrick@linux.dev>
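The shutdown fix can be sketched with VMCONFIG simulated as a plain variable. The sketch relies on semantics stated in this log: setting VMCONFIG_MSI_REMAP disables MSI remapping, and firmware leaves the bit cleared at boot. The bit value 0x2 and all function names are assumptions for illustration, not the vmd.c definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit value, for illustration. Per this log, setting the bit
 * disables MSI remapping; firmware leaves it cleared (remapping on). */
#define VMCONFIG_MSI_REMAP_SKETCH 0x2

struct vmd_sketch {
	uint16_t vmconfig; /* simulated VMCONFIG register */
};

/* Change only the remap bit; other VMCONFIG bits are preserved. */
static void set_msi_remapping(struct vmd_sketch *vmd, bool enable)
{
	if (enable)
		vmd->vmconfig &= (uint16_t)~VMCONFIG_MSI_REMAP_SKETCH;
	else
		vmd->vmconfig |= VMCONFIG_MSI_REMAP_SKETCH;
}

/* Hypothetical shutdown hook: put the register back to the firmware
 * default so that the next OS (which may not support bypass) finds MSI
 * remapping enabled. */
static void vmd_shutdown_sketch(struct vmd_sketch *vmd)
{
	set_msi_remapping(vmd, true);
}
```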
PCIe ports reserved for VMD use are not visible to BIOS and therefore are not
configured to enable PCIe ASPM or LTR values (which BIOS would otherwise
configure). Lack of this programming results in high power consumption on
laptops, as reported in bugzilla. For affected products, use
pci_enable_link_state() to set the allowed link states for devices on the
root ports. Also set the LTR value to the maximum value needed for the SoC.
This is a workaround for products from Rocket Lake through Alder Lake.
Raptor Lake, the latest product at this time, already configures LTR in
BIOS. Future products will move ASPM configuration back to BIOS as well. As
this solution is intended for laptops, support is not added for hotplug or
for devices downstream of a switch on the root port.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=212355
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215063
Link: https://bugzilla.kernel.org/show_bug.cgi?id=213717
Link: https://lore.kernel.org/r/20230120031522.2304439-5-david.e.box@linux.intel.com
Signed-off-by: Michael Bottini <michael.a.bottini@linux.intel.com>
Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Reviewed-by: Jon Derrick <jonathan.derrick@linux.dev>
Reviewed-by: Nirmal Patel <nirmal.patel@linux.intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
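The log does not state which LTR value the driver programs. As background, a PCIe LTR latency field has two parts:

- a 10-bit value in bits 9:0;
- a 3-bit scale in bits 12:10, where each scale step multiplies by 32 ns.

The sketch below decodes such a field. The example encoding 0x1003 used afterwards is illustrative only and is not claimed to be the driver's value.

```c
#include <assert.h>
#include <stdint.h>

/* Decode a PCIe LTR latency field: bits 9:0 = value, bits 12:10 = scale,
 * where each scale step multiplies by 32 ns (scale 0 = 1 ns ... 5 = 2^25 ns).
 * Scales 6 and 7 are not permitted; this sketch returns 0 for them. */
static uint64_t ltr_to_ns(uint16_t ltr)
{
	uint64_t value = ltr & 0x3ff;
	unsigned int scale = (ltr >> 10) & 0x7;

	if (scale > 5)
		return 0;
	return value << (5 * scale); /* 32^scale == 2^(5 * scale) */
}
```

For instance, 0x1003 has value 3 and scale 4, so ltr_to_ns(0x1003) is 3 x 32^4 = 3,145,728 ns (about 3.1 ms).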
Simplify the device ID list by creating a grouping of features shared by
client products.
Suggested-by: Jon Derrick <jonathan.derrick@linux.dev>
Link: https://lore.kernel.org/r/20230120031522.2304439-4-david.e.box@linux.intel.com
Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Use PCI_VDEVICE to simplify the device table.
Link: https://lore.kernel.org/r/20230120031522.2304439-3-david.e.box@linux.intel.com
Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Reviewed-by: Jon Derrick <jonathan.derrick@linux.dev>
Reviewed-by: Nirmal Patel <nirmal.patel@linux.intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
The reset was never applied in the current implementation because Intel
bridges owned by VMD are parentless. Internally, pci_reset_bus() applies
a reset to the parent of the PCI device supplied as an argument, but in this
case it failed because there was no parent.
In more detail, this change allows the VMD driver to enumerate NVMe devices
in pass-through configurations when guest reboots are performed. There was
an attempt to fix this, but we later discovered that the code inside
pci_reset_bus() wasn't triggering secondary bus resets. Therefore, we
updated the parameters passed to it, and now NVMe SSDs attached to VMD
bridges are properly enumerated in VT-d pass-through scenarios.
Link: https://lore.kernel.org/r/20221206001637.4744-1-francisco.munoz.ruiz@linux.intel.com
Fixes: 6aab5622296b ("PCI: vmd: Clean up domain before enumeration")
Signed-off-by: Francisco Munoz <francisco.munoz.ruiz@linux.intel.com>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Reviewed-by: Nirmal Patel <nirmal.patel@linux.intel.com>
Reviewed-by: Jonathan Derrick <jonathan.derrick@linux.dev>
The VMD driver disables MSI remapping on Intel's Icelake and newer
systems, to improve performance, by setting VMCONFIG_MSI_REMAP. By design,
the VMCONFIG_MSI_REMAP bit is cleared by firmware during boot, and it is
also cleared when the system is put into the S3 power state. The VMD driver
needs to set it again on resume to avoid interrupt issues with devices
behind VMD if MSI remapping was disabled before.
Link: https://lore.kernel.org/r/20221109142652.450998-1-nirmal.patel@linux.intel.com
Fixes: ee81ee84f873 ("PCI: vmd: Disable MSI-X remapping when possible")
Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Reviewed-by: Francisco Munoz <francisco.munoz.ruiz@linux.intel.com>
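The resume fix can be sketched using the semantics stated in this commit: setting VMCONFIG_MSI_REMAP disables MSI remapping, and the register is cleared again across S3. The driver records what it configured at probe time and reapplies it on resume. The bit value, struct, and function names are hypothetical, and clearing the whole register is a simplification of what firmware does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit value; setting it disables MSI remapping. */
#define VMCONFIG_MSI_REMAP_SKETCH 0x2

struct vmd_pm_sketch {
	uint16_t vmconfig;     /* simulated register */
	bool bypass_msi_remap; /* what the driver chose at probe time */
};

/* Program the register from the driver's saved decision. */
static void apply_msi_remap_setting(struct vmd_pm_sketch *vmd)
{
	if (vmd->bypass_msi_remap)
		vmd->vmconfig |= VMCONFIG_MSI_REMAP_SKETCH;
	else
		vmd->vmconfig &= (uint16_t)~VMCONFIG_MSI_REMAP_SKETCH;
}

/* Stand-in for firmware clearing the register across S3. */
static void simulate_s3(struct vmd_pm_sketch *vmd)
{
	vmd->vmconfig = 0;
}

/* Hypothetical resume hook: reapply the setting lost during S3. */
static void vmd_resume_sketch(struct vmd_pm_sketch *vmd)
{
	apply_msi_remap_setting(vmd);
}
```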
Add support for VMD devices in MTL-H/P/U/S/M with bus restriction mode and
vector 0 disabled for MSI-X remapping.
Link: https://lore.kernel.org/r/20220628221023.190547-1-francisco.munoz.ruiz@linux.intel.com
Signed-off-by: Francisco Munoz <francisco.munoz.ruiz@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Nirmal Patel <nirmal.patel@linux.intel.com>
Revert 2565e5b69c44 ("PCI: vmd: Do not disable MSI-X remapping if
interrupt remapping is enabled by IOMMU.")
Commit 2565e5b69c44 was added as a workaround to keep MSI-X remapping
enabled if the IOMMU enables interrupt remapping, which kept VMD running in
low-performance mode. There is no dependency between MSI-X remapping by VMD
and interrupt remapping by the IOMMU.
Link: https://lore.kernel.org/r/20220511095707.25403-3-nirmal.patel@linux.intel.com
Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
During the boot process, all PCI devices, including VMD endpoint devices,
are assigned the default PCI-MSI IRQ domain. If interrupt remapping is
enabled by the IOMMU, all PCI devices except VMD get the new INTEL-IR-MSI
IRQ domain. VMD is supposed to create and assign a separate VMD-MSI IRQ
domain for its child devices in order to support MSI-X remapping
capabilities.
Now, when MSI-X remapping in VMD is disabled in order to improve
performance, VMD skips the VMD-MSI IRQ domain assignment for its child
devices. Thus, the devices behind VMD get the default PCI-MSI IRQ domain
instead of the INTEL-IR-MSI IRQ domain when VMD creates the root bus and
configures child devices.
As a result, the host OS failed to boot and DMAR errors were observed when
interrupt remapping was enabled on Intel Icelake CPUs. For instance:
DMAR: DRHD: handling fault status reg 2
DMAR: [INTR-REMAP] Request device [0xe2:0x00.0] fault index 0xa00 [fault reason 0x25] Blocked a compatibility format interrupt request
To fix this issue, use the dev_msi_info struct in struct device, which
maintains the correct IRQ domain. VMD uses this information to assign the
proper IRQ domain to its child devices when it doesn't create a separate
IRQ domain.
Link: https://lore.kernel.org/r/20220511095707.25403-2-nirmal.patel@linux.intel.com
Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Tejas reported the following recursive locking issue:
swapper/0/1 is trying to acquire lock:
ffff8881074fd0a0 (&md->mutex){+.+.}-{3:3}, at: msi_get_virq+0x30/0xc0
but task is already holding lock:
ffff8881017cd6a0 (&md->mutex){+.+.}-{3:3}, at: __pci_enable_msi_range+0xf2/0x290
stack backtrace:
__mutex_lock+0x9d/0x920
msi_get_virq+0x30/0xc0
pci_irq_vector+0x26/0x30
vmd_msi_init+0xcc/0x210
msi_domain_alloc+0xbf/0x150
msi_domain_alloc_irqs_descs_locked+0x3e/0xb0
__pci_enable_msi_range+0x155/0x290
pci_alloc_irq_vectors_affinity+0xba/0x100
pcie_port_device_register+0x307/0x550
pcie_portdrv_probe+0x3c/0xd0
pci_device_probe+0x95/0x110
This is caused by the VMD MSI code, which looks up the Linux interrupt
number for a VMD-managed MSI[X] vector. The lookup function tries to
acquire the already held mutex.
Avoid that by caching the Linux interrupt number at initialization time
instead of looking it up over and over.
Fixes: 82ff8e6b78fc ("PCI/MSI: Use msi_get_virq() in pci_get_vector()")
Reported-by: "Surendrakumar Upadhyay, TejaskumarX" <tejaskumarx.surendrakumar.upadhyay@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: "Surendrakumar Upadhyay, TejaskumarX" <tejaskumarx.surendrakumar.upadhyay@intel.com>
Cc: linux-pci@vger.kernel.org
Link: https://lore.kernel.org/r/87a6euub2a.ffs@tglx
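The recursion and its fix can be shown with a hypothetical model. A plain flag stands in for md->mutex. lookup_virq() returns -1 where a real non-recursive mutex would deadlock. The fix resolves the Linux interrupt number once, before the locked allocation path runs, and the callback only reads the cached value. None of these names are the kernel's.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical MSI core state: a flag models the descriptor mutex. */
struct msi_core_sketch {
	bool desc_lock_held;
	int virq_base; /* Linux IRQ numbers are virq_base + index */
};

/* Lookup that needs the lock; -1 marks where a real mutex would
 * self-deadlock because the caller already holds it. */
static int lookup_virq(struct msi_core_sketch *core, int index)
{
	if (core->desc_lock_held)
		return -1;
	return core->virq_base + index;
}

struct vmd_irq_sketch {
	int cached_virq;
};

/* Fix: resolve the Linux IRQ number once, before the locked path runs. */
static void vmd_cache_virq(struct msi_core_sketch *core,
			   struct vmd_irq_sketch *irq, int index)
{
	irq->cached_virq = lookup_virq(core, index);
}

/* Runs with the lock held: read the cached number, never look it up. */
static int vmd_msi_init_sketch(struct msi_core_sketch *core,
			       struct vmd_irq_sketch *irq)
{
	assert(core->desc_lock_held);
	return irq->cached_virq;
}
```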
- Add PCI_ERROR_RESPONSE and related definitions for signaling and checking
for transaction errors on PCI (Naveen Naidu)
- Fabricate PCI_ERROR_RESPONSE data (~0) in config read wrappers, instead
of in host controller drivers, when transactions fail on PCI (Naveen
Naidu)
- Use PCI_POSSIBLE_ERROR() to check for possible failure of config reads
(Naveen Naidu)
* pci/errors:
PCI: xgene: Use PCI_ERROR_RESPONSE to identify config read errors
PCI: hv: Use PCI_ERROR_RESPONSE to identify config read errors
PCI: keystone: Use PCI_ERROR_RESPONSE to identify config read errors
PCI: Use PCI_ERROR_RESPONSE to identify config read errors
PCI: cpqphp: Use PCI_POSSIBLE_ERROR() to check config reads
PCI/PME: Use PCI_POSSIBLE_ERROR() to check config reads
PCI/DPC: Use PCI_POSSIBLE_ERROR() to check config reads
PCI: pciehp: Use PCI_POSSIBLE_ERROR() to check config reads
PCI: vmd: Use PCI_POSSIBLE_ERROR() to check config reads
PCI/ERR: Use PCI_POSSIBLE_ERROR() to check config reads
PCI: rockchip-host: Drop error data fabrication when config read fails
PCI: rcar-host: Drop error data fabrication when config read fails
PCI: altera: Drop error data fabrication when config read fails
PCI: mvebu: Drop error data fabrication when config read fails
PCI: aardvark: Drop error data fabrication when config read fails
PCI: kirin: Drop error data fabrication when config read fails
PCI: histb: Drop error data fabrication when config read fails
PCI: exynos: Drop error data fabrication when config read fails
PCI: mediatek: Drop error data fabrication when config read fails
PCI: iproc: Drop error data fabrication when config read fails
PCI: thunder: Drop error data fabrication when config read fails
PCI: Drop error data fabrication when config read fails
PCI: Use PCI_SET_ERROR_RESPONSE() for disconnected devices
PCI: Set error response data when config read fails
PCI: Add PCI_ERROR_RESPONSE and related definitions
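The idea of fabricating error data in a generic config read wrapper, rather than in each host controller driver, can be sketched in C.

- PCI_ERROR_RESPONSE and PCI_SET_ERROR_RESPONSE() are modeled on the kernel definitions and use the GCC/Clang __typeof__ extension.
- PCIBIOS_DEVICE_NOT_FOUND mirrors the kernel's 0x86 return code.
- host_read32() and read_config_dword() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Modeled on the kernel's definitions; __typeof__ is a GCC/Clang extension. */
#define PCI_ERROR_RESPONSE (~0ULL)
#define PCI_SET_ERROR_RESPONSE(val) \
	(*(val) = ((__typeof__(*(val)))PCI_ERROR_RESPONSE))

#define PCIBIOS_SUCCESSFUL       0x00
#define PCIBIOS_DEVICE_NOT_FOUND 0x86

/* Hypothetical host controller accessor: it reports failure only through
 * its return code and no longer fabricates data itself. */
static int host_read32(int link_up, uint32_t *val)
{
	if (!link_up)
		return PCIBIOS_DEVICE_NOT_FOUND;
	*val = 0x20308086; /* arbitrary register contents */
	return PCIBIOS_SUCCESSFUL;
}

/* Hypothetical generic wrapper: the one place where ~0 of the right
 * width is fabricated when the transaction fails. */
static int read_config_dword(int link_up, uint32_t *val)
{
	int ret = host_read32(link_up, val);

	if (ret != PCIBIOS_SUCCESSFUL)
		PCI_SET_ERROR_RESPONSE(val);
	return ret;
}
```

Because the cast uses the type of *val, the same macro yields 0xff, 0xffff or 0xffffffff for byte, word and dword reads.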
Add support for this VMD device which supports the bus restriction mode.
The feature that turns off vector 0 for MSI-X remapping is also enabled.
Link: https://lore.kernel.org/r/20211217231211.46018-1-francisco.munoz.ruiz@linux.intel.com
Signed-off-by: Karthik L Gopalakrishnan <karthik.l.gopalakrishnan@intel.com>
Signed-off-by: Francisco Munoz <francisco.munoz.ruiz@linux.intel.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Jon Derrick <jonathan.derrick@linux.dev>
When a Samsung PCIe Gen4 NVMe drive is connected to Intel ADL VMD, the
combination causes an AER message flood and drags system performance
down.
The issue doesn't happen when VMD mode is disabled in BIOS, since AER
isn't enabled by acpi_pci_root_create(). When VMD mode is enabled, AER
is enabled regardless of _OSC:
[ 0.410076] acpi PNP0A08:00: _OSC: platform does not support [AER]
...
[ 1.486704] pcieport 10000:e0:06.0: AER: enabled with IRQ 146
Since VMD is an aperture to regular PCIe root ports, honor ACPI _OSC to
disable PCIe features accordingly to resolve the issue.
Suggested-by: Rafael J. Wysocki <rafael@kernel.org>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=215027
Link: https://lore.kernel.org/r/20211203031541.1428904-1-kai.heng.feng@canonical.com
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
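Honoring _OSC for the VMD domain can be sketched as copying the feature-ownership flags negotiated for the ACPI host bridge onto the VMD host bridge, while keeping the bridge's own identity (here, the domain number). The struct and field names are hypothetical, loosely patterned on the native_* flags of struct pci_host_bridge. In the example, native_aer is false to match the "_OSC: platform does not support [AER]" message above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical host bridge record; the native_* names loosely follow
 * struct pci_host_bridge. true = the OS owns the feature. */
struct host_bridge_sketch {
	int domain; /* identity of this bridge: not copied */
	bool native_aer;
	bool native_pcie_hotplug;
	bool native_pme;
	bool native_ltr;
	bool native_dpc;
};

/* Inherit the _OSC outcome of the ACPI host bridge that the VMD
 * endpoint sits under, field by field, so other members are untouched. */
static void copy_osc_flags(struct host_bridge_sketch *vmd_bridge,
			   const struct host_bridge_sketch *acpi_bridge)
{
	vmd_bridge->native_aer = acpi_bridge->native_aer;
	vmd_bridge->native_pcie_hotplug = acpi_bridge->native_pcie_hotplug;
	vmd_bridge->native_pme = acpi_bridge->native_pme;
	vmd_bridge->native_ltr = acpi_bridge->native_ltr;
	vmd_bridge->native_dpc = acpi_bridge->native_dpc;
}
```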
During VT-d pass-through, the VMD driver occasionally fails to
enumerate underlying NVMe devices when repetitive reboots are
performed in the guest OS. The issue can be resolved by resetting
VMD root ports for proper enumeration and triggering secondary bus
reset which will also propagate reset through downstream bridges.
Link: https://lore.kernel.org/r/20211116221136.85134-1-nirmal.patel@linux.intel.com
Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Jon Derrick <jonathan.derrick@linux.dev>
When config pci_ops.read() can detect failed PCI transactions, the data
returned to the CPU is PCI_ERROR_RESPONSE (~0 or 0xffffffff).
Obviously a successful PCI config read may *also* return that data if a
config register happens to contain ~0, so it doesn't definitively indicate
an error unless we know the register cannot contain ~0.
Use PCI_POSSIBLE_ERROR() to check the response we get when we read data
from hardware. This unifies PCI error response checking and makes error
checks consistent and easier to find.
Link: https://lore.kernel.org/r/ed01cad87a2e35f3865275b5fb34290817a1ebf8.1637243717.git.naveennaidu479@gmail.com
Signed-off-by: Naveen Naidu <naveennaidu479@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Derrick <jonathan.derrick@linux.dev>
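As the commit notes, ~0 only indicates a possible error, because a register may legitimately read as all ones. The sketch below models PCI_POSSIBLE_ERROR() on the kernel macro, using the GCC/Clang __typeof__ extension. It applies the macro where ~0 cannot be valid data: a Vendor ID of 0xffff is never assigned, so reading it back means the device did not respond. vendor_read_failed() is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Modeled on the kernel macros; __typeof__ is a GCC/Clang extension. The
 * comparison uses ~0 truncated to the width of 'val'. */
#define PCI_ERROR_RESPONSE (~0ULL)
#define PCI_POSSIBLE_ERROR(val) ((val) == ((__typeof__(val))PCI_ERROR_RESPONSE))

/* Hypothetical helper: 0xffff is never a valid Vendor ID, so here an
 * all-ones read really does mean the device did not respond. */
static bool vendor_read_failed(uint16_t vendor_id)
{
	return PCI_POSSIBLE_ERROR(vendor_id);
}
```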
- Assign a number to each VMD controller to distinguish them in
/proc/interrupts (Chunguang Xu)
- Don't disable VMD MSI-X remapping if IOMMU remapping is enabled (Adrian
Huang)
- Add Kconfig dependency on !UML for allyesconfig build issue (Johannes
Berg)
* remotes/lorenzo/pci/vmd:
PCI: vmd: depend on !UML
PCI: vmd: Do not disable MSI-X remapping if interrupt remapping is enabled by IOMMU
PCI: vmd: Assign a number to each VMD controller
We already include <linux/device.h> and <linux/msi.h>, which
include <asm/device.h> and <asm/msi.h>.
Drop the redundant includes of <asm/device.h> and <asm/msi.h>.
[bhelgaas: squash in fix from Wan Jiabing <wanjiabing@vivo.com>:
https://lore.kernel.org/r/20211104063720.29375-1-wanjiabing@vivo.com]
Link: https://lore.kernel.org/r/20211013003145.1107148-1-kw@linux.com
Signed-off-by: Krzysztof Wilczyński <kw@linux.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Derrick <jonathan.derrick@linux.dev>
When enabling VMD in BIOS setup (Ice Lake Processor: Whitley platform),
the host OS cannot boot successfully with the following error message:
nvme nvme0: I/O 12 QID 0 timeout, completion polled
nvme nvme0: Shutdown timeout set to 6 seconds
DMAR: DRHD: handling fault status reg 2
DMAR: [INTR-REMAP] Request device [0x00:0x00.5] fault index 0xa00 [fault reason 0x25] Blocked a compatibility format interrupt request
The request device is the VMD controller:
# lspci -s 0000:00.5 -nn
0000:00:00.5 RAID bus controller [0104]: Intel Corporation Volume
Management Device NVMe RAID Controller [8086:28c0] (rev 04)
`git bisect` points to the offending commit ee81ee84f873 ("PCI:
vmd: Disable MSI-X remapping when possible"), which disables VMD MSI
remapping. The IOMMU hardware blocks the compatibility format
interrupt request because Interrupt Remapping Enable Status (IRES) and
Extended Interrupt Mode Enable (EIME) are enabled. Please refer to
section "5.1.4 Interrupt-Remapping Hardware Operation" in Intel VT-d
spec.
To fix the issue, the VMD driver keeps MSI-X remapping enabled,
irrespective of VMD_FEAT_CAN_BYPASS_MSI_REMAP, if the IOMMU subsystem
enables interrupt remapping.
Test configuration is shown as follows:
* Two VMD controllers
1. 8086:28c0 (Whitley's VMD)
2. 8086:201d (Purley's VMD: the issue does not appear on this
controller; it was tested only to make sure no side effects occur.)
* With and without intremap=off
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=214219
Link: https://lore.kernel.org/r/20210901124047.1615-1-adrianhuang0701@gmail.com
Signed-off-by: Adrian Huang <ahuang12@lenovo.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Jon Derrick <jonathan.derrick@intel.com>
Cc: Jon Derrick <jonathan.derrick@intel.com>
Cc: Nirmal Patel <nirmal.patel@linux.intel.com>
Cc: Joerg Roedel <jroedel@suse.de>
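The policy this commit describes can be sketched as a single predicate. VMD_FEAT_CAN_BYPASS_MSI_REMAP is given a hypothetical bit value, and a plain bool stands in for the IOMMU's interrupt-remapping state. This commit is 2565e5b69c44, which the revert listed earlier in this log later undid; after the revert, only the feature bit decides.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit value for the feature flag named in the log. */
#define VMD_FEAT_CAN_BYPASS_MSI_REMAP_SKETCH (1u << 4)

/* Policy from this commit: bypass MSI-X remapping only when the device
 * supports it AND the IOMMU is not doing interrupt remapping. */
static bool should_bypass_msi_remap(unsigned int features,
				    bool iommu_irq_remap_enabled)
{
	return (features & VMD_FEAT_CAN_BYPASS_MSI_REMAP_SKETCH) &&
	       !iommu_irq_remap_enabled;
}
```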
If the system has multiple VMD controllers, the driver does not assign
a number to each controller, so all controllers show up with the same name
in /proc/interrupts, which makes problem analysis inconvenient. Assign a
number to each VMD controller.
Link: https://lore.kernel.org/r/1631884404-24141-1-git-send-email-brookxu.cn@gmail.com
Signed-off-by: Chunguang Xu <brookxu@tencent.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Jon Derrick <jonathan.derrick@intel.com>
Reviewed-by: Krzysztof Wilczyński <kw@linux.com>
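Per-controller numbering can be sketched with an allocator that hands out the lowest free number. This is in the spirit of the IDA that vmd uses (the 6.8 notes above mention its conversion to ida_alloc()/ida_free()). The "vmd%d" name format and all function names are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_VMD_SKETCH 8

/* Hypothetical instance table: true = number in use. */
static bool used[MAX_VMD_SKETCH];

/* Return the lowest free instance number, or -1 if none is left. */
static int alloc_instance(void)
{
	for (int i = 0; i < MAX_VMD_SKETCH; i++) {
		if (!used[i]) {
			used[i] = true;
			return i;
		}
	}
	return -1;
}

static void free_instance(int id)
{
	if (id >= 0 && id < MAX_VMD_SKETCH)
		used[id] = false;
}

/* Build a per-controller IRQ name (assumed format). */
static void irq_name(char *buf, size_t len, int instance)
{
	snprintf(buf, len, "vmd%d", instance);
}
```

Freed numbers are reused, so a controller that is unbound and rebound typically gets its old name back.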
On some systems, in order to get to the deepest low-power state of
the platform (which may be necessary to save significant enough
amounts of energy while suspended to idle, for example), devices on
the PCI bus exposed by the VMD driver need to be power-managed via
ACPI. However, the layout of the ACPI namespace below the VMD
controller device object does not reflect the layout of the PCI bus
under the VMD host bridge, so in order to identify the ACPI companion
objects for the devices on that bus, it is necessary to use a special
_ADR encoding on the ACPI side. In other words, acpi_pci_find_companion()
does not work for these devices, so it needs to be amended with a
special lookup logic specific to the VMD bus.
Address this issue by allowing the VMD driver to temporarily install
an ACPI companion lookup hook containing the code matching the devices
on the VMD PCI bus with the corresponding objects in the ACPI
namespace.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Jon Derrick <jonathan.derrick@intel.com>
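The temporary lookup-hook pattern described above can be sketched as follows. A generic companion lookup consults an optional hook first. The VMD driver installs its matcher while it is bound and removes it afterwards. The types, ACPI paths, and function names are hypothetical. The special _ADR encoding is not described in the log, so the matcher simply claims any device flagged as being on the VMD bus.

```c
#include <assert.h>
#include <stddef.h>

struct acpi_dev_sketch {
	const char *path;
};

struct pci_dev_sketch {
	int on_vmd_bus; /* nonzero if the device sits on the VMD-exposed bus */
};

typedef struct acpi_dev_sketch *(*companion_hook_t)(const struct pci_dev_sketch *);

/* Optional hook; NULL when no VMD driver has installed one. */
static companion_hook_t companion_hook;

static void set_companion_hook(companion_hook_t hook)
{
	companion_hook = hook;
}

static struct acpi_dev_sketch default_companion = { "\\_SB.PCI0.RP01" };
static struct acpi_dev_sketch vmd_companion = { "\\_SB.PCI0.VMD0.PRT0" };

/* Generic lookup: ask the hook first, then fall back to the ordinary
 * _ADR match (reduced here to a fixed result). */
static struct acpi_dev_sketch *find_companion(const struct pci_dev_sketch *dev)
{
	if (companion_hook) {
		struct acpi_dev_sketch *adev = companion_hook(dev);

		if (adev)
			return adev;
	}
	return &default_companion;
}

/* VMD matcher: only claims devices on the VMD bus; the special _ADR
 * encoding is not modeled. */
static struct acpi_dev_sketch *vmd_find_companion(const struct pci_dev_sketch *dev)
{
	return dev->on_vmd_bus ? &vmd_companion : NULL;
}
```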
VMD will retransmit child device MSI-X using its own MSI-X table and
requester-id. This limits the number of MSI-X available to the whole
child device domain to the number of VMD MSI-X interrupts.
Some VMD devices have a mode where this remapping can be disabled,
allowing child device interrupts to bypass processing by the VMD MSI-X
domain interrupt handler and go straight to the child device interrupt
handler, allowing for better performance and scaling. The requester-id
still gets changed to the VMD endpoint's requester-id, and the interrupt
remapping handlers have been updated to properly set IRTE for child
device interrupts to the VMD endpoint's context.
Some VMD platforms have existing production BIOSes which rely on MSI-X
remapping and won't explicitly program the MSI-X remapping bit, so this
patch re-enables MSI-X remapping on unload.
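The probe/unload behavior above can be modeled as toggling one control bit. This is a sketch under stated assumptions: `VMD_CTRL_REMAP_DISABLE`, its bit position, the `can_bypass` flag and the helper names are all hypothetical, and the register is simulated in memory. The point is that remapping is only bypassed when the device supports it, and is always restored on unload for firmware that depends on it.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical control bit (position is illustrative): set = remapping bypassed. */
#define VMD_CTRL_REMAP_DISABLE (1u << 1)

struct vmd_ctrl {
	uint32_t ctrl;    /* simulated config register */
	bool can_bypass;  /* hypothetical per-device capability flag */
};

/* On probe, bypass remapping only if the device supports that mode. */
static void vmd_probe_remap(struct vmd_ctrl *v)
{
	if (v->can_bypass)
		v->ctrl |= VMD_CTRL_REMAP_DISABLE;   /* child IRQs go direct */
	else
		v->ctrl &= ~VMD_CTRL_REMAP_DISABLE;  /* keep remapping */
}

/* On unload, always restore remapping for firmware that expects it. */
static void vmd_remove_remap(struct vmd_ctrl *v)
{
	v->ctrl &= ~VMD_CTRL_REMAP_DISABLE;
}

static bool vmd_remapping_enabled(const struct vmd_ctrl *v)
{
	return !(v->ctrl & VMD_CTRL_REMAP_DISABLE);
}
```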
Link: https://lore.kernel.org/r/20210210161315.316097-3-jonathan.derrick@intel.com
Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Krzysztof Wilczyński <kw@linux.com>
Acked-by: Joerg Roedel <jroedel@suse.de>
Merge tag 'pci-v5.11-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI updates from Bjorn Helgaas:
"Enumeration:
- Decode PCIe 64 GT/s link speed (Gustavo Pimentel)
- Remove unused HAVE_PCI_SET_MWI (Heiner Kallweit)
- Reduce pci_set_cacheline_size() message to debug level (Heiner
Kallweit)
- Fix pci_slot_release() NULL pointer dereference (Jubin Zhong)
- Unify ECAM constants in native PCI Express drivers (Krzysztof
Wilczyński)
- Return u8 from pci_find_capability() and similar (Puranjay Mohan)
- Return u16 from pci_find_ext_capability() and similar (Bjorn
Helgaas)
- Fix ACPI companion lookup for device 0 on the root bus (Rafael J.
Wysocki)
Resource management:
- Keep both device and resource name for config space remaps
(Alexander Lobakin)
- Bounds-check command-line resource alignment requests (Bjorn
Helgaas)
- Fix overflow in command-line resource alignment requests (Colin Ian
King)
Driver binding:
- Avoid duplicate IDs in driver dynamic IDs list (Zhenzhong Duan)
Power management:
- Save/restore Precision Time Measurement Capability for
suspend/resume (David E. Box)
- Disable PTM during suspend to save power (David E. Box)
- Add sysfs attribute for device power state (Maximilian Luz)
- Rename pci_wakeup_bus() to pci_resume_bus() (Mika Westerberg)
- Do not generate wakeup event when runtime resuming device (Mika
Westerberg)
- Save/restore ASPM L1SS Capability for suspend/resume (Vidya Sagar)
Virtualization:
- Mark AMD Raven iGPU ATS as broken on some platforms (Alex Deucher)
- Add function 1 DMA alias quirk for Marvell 9215 SATA controller
(Bjorn Helgaas)
MSI:
- Disable MSI for Pericom PCIe-USB adapter (Andy Shevchenko)
- Improve warnings for 32-bit-limited MSI support (Vidya Sagar)
Error handling:
- Cache RCEC EA Capability offset in pci_init_capabilities() (Sean V
Kelley)
- Rename reset_link() to reset_subordinates() (Sean V Kelley)
- Write AER Capability only when we control it (Sean V Kelley)
- Clear AER status only when we control AER (Sean V Kelley)
- Bind RCEC devices to the Root Port driver (Qiuxu Zhuo)
- Recover from RCiEP AER errors (Qiuxu Zhuo)
- Recover from RCEC AER errors (Sean V Kelley)
- Add pcie_link_rcec() to associate RCiEPs (Sean V Kelley)
- Add pcie_walk_rcec() to RCEC AER handling (Sean V Kelley)
- Add pcie_walk_rcec() to RCEC PME handling (Sean V Kelley)
- Add RCEC AER error injection support (Qiuxu Zhuo)
Broadcom iProc PCIe controller driver:
- Fix out-of-bound array accesses (Bharat Gooty)
- Invalidate correct PAXB inbound windows (Roman Bacik)
- Enhance PCIe Link information display (Srinath Mannam)
Cadence PCIe controller driver:
- Make "cdns,max-outbound-regions" property optional (Kishon Vijay
Abraham I)
Intel VMD host bridge driver:
- Offset client MSI-X vectors (Jon Derrick)
- Update type of __iomem pointers (Krzysztof Wilczyński)
NVIDIA Tegra PCIe controller driver:
- Move "dbi" accesses to post common DWC initialization (Vidya Sagar)
- Read "dbi" base address to program in application logic (Vidya
Sagar)
- Fix ASPM-L1SS advertisement disable code (Vidya Sagar)
- Set DesignWare IP version (Vidya Sagar)
- Continue unconfig sequence even if parts fail (Vidya Sagar)
- Check return value of tegra_pcie_init_controller() (Vidya Sagar)
- Disable LTSSM during L2 entry (Vidya Sagar)
Qualcomm PCIe controller driver:
- Document PCIe bindings for SM8250 SoC (Manivannan Sadhasivam)
- Add SM8250 SoC support (Manivannan Sadhasivam)
- Add support for configuring BDF to SID mapping for SM8250
(Manivannan Sadhasivam)
Renesas R-Car PCIe controller driver:
- Drop unused members from struct rcar_pcie_host (Lad Prabhakar)
- Document r8a774e1 bindings (Lad Prabhakar)
- Convert bindings to json-schema (Yoshihiro Shimoda)
- Document r8a77965 bindings (Yoshihiro Shimoda)
Samsung Exynos PCIe controller driver:
- Rework driver to support Exynos5433 PCIe PHY (Jaehoon Chung)
- Rework driver to support Exynos5433 variant (Jaehoon Chung)
- Drop samsung,exynos5440-pcie binding (Marek Szyprowski)
- Add the samsung,exynos-pcie binding (Marek Szyprowski)
- Add the samsung,exynos-pcie-phy binding (Marek Szyprowski)
Synopsys DesignWare PCIe controller driver:
- Support multiple ATU memory regions (Rob Herring)
- Move intel-gw ATU offset out of driver match data (Rob Herring)
- Move "dbi", "dbi2", and "addr_space" resource setup into common
code (Rob Herring)
- Remove intel-gw unneeded function wrappers (Rob Herring)
- Ensure all outbound ATU windows are reset (Rob Herring)
- Use the common MSI irq_chip in dra7xx (Rob Herring)
- Drop the .set_num_vectors() host op (Rob Herring)
- Move MSI interrupt setup into DWC common code (Rob Herring)
- Rework MSI initialization (Rob Herring)
- Move link handling into common code (Rob Herring)
- Move dw_pcie_msi_init() into core (Rob Herring)
- Move dw_pcie_setup_rc() to DWC common code (Rob Herring)
- Remove unnecessary wrappers around dw_pcie_host_init() (Rob
Herring)
- Drop keystone duplicated 'num-viewport' (Rob Herring)
- Move inbound and outbound windows to common struct (Rob Herring)
- Detect number of iATU windows (Rob Herring)
- Warn if non-prefetchable memory aperture size is > 32-bit (Vidya
Sagar)
- Add support to program ATU for >4GB memory (Vidya Sagar)
- Set 32-bit DMA mask for MSI target address allocation (Vidya Sagar)
TI J721E PCIe driver:
- Fix "ti,syscon-pcie-ctrl" to take argument (Kishon Vijay Abraham I)
- Add host mode dt-bindings for TI's J7200 SoC (Kishon Vijay Abraham
I)
- Add EP mode dt-bindings for TI's J7200 SoC (Kishon Vijay Abraham I)
- Get offset within "syscon" from "ti,syscon-pcie-ctrl" phandle arg
(Kishon Vijay Abraham I)
TI Keystone PCIe controller driver:
- Enable compile-testing on !ARM (Alex Dewar)"
* tag 'pci-v5.11-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (100 commits)
PCI: Add function 1 DMA alias quirk for Marvell 9215 SATA controller
PCI/ACPI: Fix companion lookup for device 0 on the root bus
PCI: Keep both device and resource name for config space remaps
PCI: xgene: Removed unused ".bus_shift" initialisers from pci-xgene.c
PCI: vmd: Update type of the __iomem pointers
PCI: iproc: Convert to use the new ECAM constants
PCI: thunder-pem: Add constant for custom ".bus_shift" initialiser
PCI: Unify ECAM constants in native PCI Express drivers
PCI: Disable PTM during suspend to save power
PCI/PTM: Save/restore Precision Time Measurement Capability for suspend/resume
PCI: Mark AMD Raven iGPU ATS as broken in some platforms
PCI: j721e: Get offset within "syscon" from "ti,syscon-pcie-ctrl" phandle arg
dt-bindings: PCI: Add EP mode dt-bindings for TI's J7200 SoC
dt-bindings: PCI: Add host mode dt-bindings for TI's J7200 SoC
dt-bindings: pci: ti,j721e: Fix "ti,syscon-pcie-ctrl" to take argument
PCI: dwc: Set 32-bit DMA mask for MSI target address allocation
PCI: qcom: Add support for configuring BDF to SID mapping for SM8250
PCI: Reduce pci_set_cacheline_size() message to debug level
PCI: Remove unused HAVE_PCI_SET_MWI
PCI: qcom: Add SM8250 SoC support
...
Use "void __iomem" instead of "char __iomem" pointer type when working with
the accessor functions (with names like readb() or writel(), etc.) to
better match the accessor function signatures, which commonly take the
address of an I/O memory region as a "void __iomem" pointer.
Related: https://lwn.net/Articles/102232/
Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20201129230743.3006978-5-kw@linux.com
Signed-off-by: Krzysztof Wilczyński <kw@linux.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jon Derrick <jonathan.derrick@intel.com>
Add ECAM-related constants to provide a set of standard constants
defining memory address shift values to the byte-level address that can
be used to access the PCI Express Configuration Space, and then move
native PCI Express controller drivers to use the newly introduced
definitions retiring driver-specific ones.
Refactor the pci_ecam_map_bus() function to use the newly added constants so
that limits on the bus, device, function, and offset (now limited to 4K as
per the specification) are in place to prevent a defective or
malicious caller from supplying an incorrect configuration offset and thus
targeting the wrong device when accessing extended configuration space.
This refactor also allows the ".bus_shift" initialisers to be
dropped when the user is not using a custom value, as a default value
will be used as per the PCI Express Specification.
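The ECAM address layout these constants describe can be sketched as follows: the bus number occupies bits 27:20, devfn bits 19:12, and the register offset bits 11:0. The macro and function names here are illustrative, not the kernel's own definitions. Masking each field is what keeps an out-of-range offset from spilling into the devfn or bus bits and addressing the wrong device.

```c
#include <stdint.h>

/* ECAM layout: bus in bits 27:20, devfn in bits 19:12, register in 11:0.
 * Names are illustrative, not the kernel's constants. */
#define ECAM_BUS_SHIFT   20
#define ECAM_DEVFN_SHIFT 12
#define ECAM_REG_MASK    0xfffu  /* 4 KiB of config space per function */

#define ECAM_BUS(b)    (((uint32_t)(b) & 0xffu) << ECAM_BUS_SHIFT)
#define ECAM_DEVFN(df) (((uint32_t)(df) & 0xffu) << ECAM_DEVFN_SHIFT)
#define ECAM_REG(r)    ((uint32_t)(r) & ECAM_REG_MASK)

/* devfn packs a 5-bit device number and a 3-bit function number. */
#define DEVFN(dev, fn) ((((dev) & 0x1fu) << 3) | ((fn) & 0x07u))

/* Byte offset of (bus, devfn, where) from the start of the ECAM window. */
static uint32_t ecam_offset(uint8_t bus, uint8_t devfn, uint32_t where)
{
	return ECAM_BUS(bus) | ECAM_DEVFN(devfn) | ECAM_REG(where);
}
```

For example, bus 1, device 2, function 3, register 0x10 maps to offset 0x113010, while a bogus register offset of 0x1010 is clipped to 0x010 instead of reaching the next function.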
Thanks to Qian Cai <qcai@redhat.com>, Michael Walle <michael@walle.cc>,
and Vladimir Oltean <olteanv@gmail.com> for reporting a pci_ecam_create()
issue with .bus_shift and to Vladimir for proposing the fix.
[bhelgaas: incorporate Vladimir's fix, update commit log]
Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20201129230743.3006978-2-kw@linux.com
Tested-by: Michael Walle <michael@walle.cc>
Signed-off-by: Krzysztof Wilczyński <kw@linux.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jon Derrick <jonathan.derrick@intel.com>
Reviewed-by: Bjorn Helgaas <bhelgaas@google.com>
Client VMD platforms have a software-triggered MSI-X vector 0 that will
not forward hardware-remapped MSI from the sub-device domain. This
causes an issue with VMD platforms that use AHCI behind VMD and have a
single MSI-X vector remapped to VMD vector 0. Add a VMD MSI-X vector
offset for these platforms.
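A vector offset like the one described above can be sketched as a mapping from child interrupt index to VMD vector that skips the first `first_vec` vectors. The helper name and the round-robin distribution are simplifying assumptions, not the driver's actual policy. With `first_vec` set to 1, no child interrupt, including a lone AHCI vector, is ever remapped to VMD vector 0.

```c
/* Pick the VMD MSI-X vector a child interrupt is remapped to (hypothetical).
 * first_vec is 1 on the affected client platforms (vector 0 is
 * software-triggered only) and 0 elsewhere. Round-robin is a
 * simplification for illustration. */
static int vmd_pick_vector(unsigned int child_idx, unsigned int nr_vecs,
			   unsigned int first_vec)
{
	if (nr_vecs <= first_vec)
		return -1;  /* no usable vector */
	return (int)(first_vec + child_idx % (nr_vecs - first_vec));
}
```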
Link: https://lore.kernel.org/r/20201102222223.92978-1-jonathan.derrick@intel.com
Tested-by: Jian-Hong Pan <jhp@endlessos.org>
Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Use the x86 shadow structs in msi_msg instead of the macros.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201024213535.443185-16-dwmw2@infradead.org
The pci_save_state() call in vmd_suspend() can be performed by
pci_pm_suspend_irq(). This also allows VMD to benefit from the call into
pci_prepare_to_sleep().
The pci_restore_state() call in vmd_resume() was restoring state after
pci_pm_resume()::pci_restore_standard_config() had already restored state.
It's also been suspected that the config state should have been restored
before re-requesting IRQs instead of afterwards.
Remove the pci_save_state()/pci_restore_state() calls in
vmd_suspend()/vmd_resume() to allow proper flow through generic PCI core
Power Management code.
Link: https://lore.kernel.org/r/20200806210017.5654-1-jonathan.derrick@intel.com
Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Kai-Heng Feng <kai.heng.feng@canonical.com>
Cc: You-Sheng Yang <vicamo.yang@canonical.com>
Move the IRQ allocation and SRCU initialization code to a new helper. No
functional changes.
Link: https://lore.kernel.org/r/20200728194945.14126-5-jonathan.derrick@intel.com
Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Move the IRQ and MSI Domain configuration code to new helpers. No
functional changes.
Link: https://lore.kernel.org/r/20200728194945.14126-4-jonathan.derrick@intel.com
Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Move the bus offset configuration discovery code to a new helper. Modify
the bus offset 2-bit decode switch to have a 0 case and a default error
case, just in case the field is expanded in future hardware.
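The decode switch described above, with an explicit 0 case and a default error case, might look like the sketch below. The function name is hypothetical, and the starting bus numbers 128 and 224 are assumptions for illustration. The caller passes the extracted field unmasked, so a reserved encoding or a wider field in future hardware falls into the default branch instead of being silently truncated.

```c
#include <errno.h>
#include <stdint.h>

/* Decode a bus-restriction field into a starting bus number (hypothetical).
 * The 128 and 224 values are assumptions for illustration. */
static int vmd_decode_bus_offset(uint32_t field, int *offset)
{
	switch (field) {
	case 0:
		*offset = 0;
		break;
	case 1:
		*offset = 128;
		break;
	case 2:
		*offset = 224;
		break;
	default:
		/* Reserved or unknown encoding: refuse rather than guess. */
		return -ENODEV;
	}
	return 0;
}
```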
Link: https://lore.kernel.org/r/20200728194945.14126-3-jonathan.derrick@intel.com
Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Move the guest-passthrough physical offset discovery code to a new helper.
No functional changes.
Link: https://lore.kernel.org/r/20200728194945.14126-2-jonathan.derrick@intel.com
Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Devices on the VMD bus use their own MSI irq domain, but it is not
distinguishable from regular PCI/MSI irq domains. It must be
distinguishable so that VMD devices can be excluded from getting the irq
domain pointer set by interrupt remapping.
Override the default bus token.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Joerg Roedel <jroedel@suse.de>
Link: https://lore.kernel.org/r/20200826112333.047315047@linutronix.de
VMD has its own PCI/MSI interrupt domain, which does not depend in any way
on the x86 vector domain. PCI devices behind VMD share the VMD MSIX vector
entries via a VMD-specific message translation to the actual VMD MSIX
vector. The VMD device interrupt handler for the VMD MSIX vectors invokes
all interrupt handlers of the devices which share a vector.
Making the x86 vector domain the actual parent of the VMD irq domain is
pointless and actually counterproductive. When a device interrupt is
requested, activating it traverses down the hierarchy and consumes an
interrupt vector in the vector domain that is never used.
The domain is self-contained and has no parent dependencies, so just hand
in NULL for the parent and be done with it.
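The sharing scheme described above, where one VMD vector fans out to every child handler that shares it with no parent vector domain involved, can be sketched in userspace C. `struct shared_vector`, `vector_add()`, `vmd_vector_irq()` and `count_hit()` are hypothetical names, and a fixed-size array stands in for the driver's real per-vector bookkeeping.

```c
#include <stddef.h>

#define MAX_SHARERS 8  /* hypothetical limit */

typedef void (*child_handler_t)(void *data);

/* One VMD vector and the child handlers that share it. */
struct shared_vector {
	child_handler_t handler[MAX_SHARERS];
	void *data[MAX_SHARERS];
	size_t count;
};

/* Register a child handler on a shared vector; -1 if the vector is full. */
static int vector_add(struct shared_vector *v, child_handler_t h, void *data)
{
	if (v->count == MAX_SHARERS)
		return -1;
	v->handler[v->count] = h;
	v->data[v->count] = data;
	v->count++;
	return 0;
}

/* Called when the VMD vector fires: run every child handler sharing it. */
static void vmd_vector_irq(const struct shared_vector *v)
{
	for (size_t i = 0; i < v->count; i++)
		v->handler[i](v->data[i]);
}

/* Example child handler: counts how often it was invoked. */
static void count_hit(void *data)
{
	(*(int *)data)++;
}
```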
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20200826112330.928952181@linutronix.de
Commit 711419e504eb ("irqdomain: Add the missing assignment of
domain->fwnode for named fwnode") unintentionally caused a dangling pointer
page fault issue on firmware nodes that were freed after IRQ domain
allocation. Commit e3beca48a45b fixed that dangling pointer issue by only
freeing the firmware node after an IRQ domain allocation failure. That fix
no longer frees the firmware node immediately, but leaves the firmware node
allocated after the domain is removed.
The firmware node must be kept around through irq_domain_remove(), but
should be freed afterwards.
Add the missing free operations after domain removal where appropriate.
Fixes: e3beca48a45b ("irqdomain/treewide: Keep firmware node unconditionally allocated")
Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com> # drivers/pci
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1595363169-7157-1-git-send-email-jonathan.derrick@intel.com
Quite a few non-OF/ACPI users of irqdomains allocate firmware nodes of type
IRQCHIP_FWNODE_NAMED or IRQCHIP_FWNODE_NAMED_ID and free them right after
creating the irqdomain. The only purpose of these FW nodes is to convey
name information. When this was introduced, the core code did not store the
pointer to the node in the irqdomain. A recent change stored the firmware
node pointer in the irqdomain for other reasons and failed to notice that the
usage sites which do the alloc_fwnode/create_domain/free_fwnode sequence
are broken by this. Storing a dangling pointer is dangerous in itself, and if
the domain is destroyed later on, it leads to a double free.
Remove the freeing of the firmware node after creating the irqdomain from
all affected call sites to cure this.
Fixes: 711419e504eb ("irqdomain: Add the missing assignment of domain->fwnode for named fwnode")
Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/873661qakd.fsf@nanos.tec.linutronix.de
VMD device 28C0 natively assists guest passthrough of the VMD endpoint
through the use of shadow registers that provide Host Physical Addresses
to correctly assign bridge windows. These shadow registers are only
available if VMD config space register 0x70, bit 1 is set.
To support this mode on existing VMD devices which don't natively
provide the shadow registers, it was decided that the hypervisor could
offer the shadow registers in a vendor-specific PCI capability.
QEMU has been modified to create this vendor-specific capability and
supply the shadow membar registers for VMDs which don't natively support
this feature. This patch adds this mode and updates the supported device
list to allow this feature to be used on these VMDs.
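Locating a vendor-specific capability, as described above, means walking the standard PCI capability list for ID 0x09. The sketch below operates on a 256-byte config-space image. It assumes the status register already indicates that a capability list exists, and it stops at finding the capability because the layout of the shadow registers inside it is not described here. `find_capability()` is a hypothetical helper. The iteration bound guards against malformed, cyclic lists.

```c
#include <stdint.h>
#include <stddef.h>

#define PCI_CAPABILITY_LIST 0x34  /* standard capabilities pointer */
#define PCI_CAP_ID_VNDR     0x09  /* vendor-specific capability ID */

/* Return the offset of the first capability with the given ID in a
 * 256-byte config-space image, or 0 if none is found. */
static uint8_t find_capability(const uint8_t cfg[256], uint8_t cap_id)
{
	uint8_t pos = cfg[PCI_CAPABILITY_LIST] & 0xfc;
	int ttl = 48;  /* at most 48 capabilities fit in 256 bytes */

	/* Capabilities live above the 64-byte standard header. */
	while (pos >= 0x40 && ttl--) {
		if (cfg[pos] == cap_id)
			return pos;
		pos = cfg[pos + 1] & 0xfc;  /* next-capability pointer */
	}
	return 0;
}
```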
Link: https://lore.kernel.org/r/20200528030240.16024-4-jonathan.derrick@intel.com
Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Versions of VMD with the Host Physical Address shadow register use this
register to calculate the bus address offset needed to do guest
passthrough of the domain. This register shadows the Host Physical
Address registers including the resource type bits. After calculating
the offset, the extra resource type bits lead to the VMD resources being
over-provisioned at the front and under-provisioned at the back.
Example:
pci 10000:80:02.0: reg 0x10: [mem 0xf801fffc-0xf803fffb 64bit]
Expected:
pci 10000:80:02.0: reg 0x10: [mem 0xf8020000-0xf803ffff 64bit]
If other devices are mapped in the over-provisioned front, it could lead
to resource conflict issues with VMD or those devices.
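The fix described above amounts to clearing the low four flag bits of the shadowed BAR value before computing the offset. In a 64-bit memory BAR, type bits 2:1 read as 0x4, which is consistent with the 4-byte shift between 0xf801fffc and 0xf8020000 shown above. In the sketch below, the function name, the sign convention (guest start minus host physical address) and the sample addresses are illustrative assumptions. The 0x0f mask is the standard PCI memory BAR flag field.

```c
#include <stdint.h>

/* The low 4 bits of a memory BAR encode type/prefetch flags, not address. */
#define BAR_MEM_ADDR_MASK (~(uint64_t)0x0f)

/* Offset between the guest's membar start and the host physical address
 * read from the shadow register (hypothetical helper, assumed sign). */
static uint64_t vmd_shadow_offset(uint64_t guest_start, uint64_t shadow_raw)
{
	return guest_start - (shadow_raw & BAR_MEM_ADDR_MASK);
}
```

With a shadow value of 0xa0000004 (64-bit type bits set), the masked offset is 0x58000000. Leaving the flag bits in would give 0x57fffffc, which is 4 bytes short.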
Link: https://lore.kernel.org/r/20200528030240.16024-3-jonathan.derrick@intel.com
Fixes: a1a30170138c9 ("PCI: vmd: Fix shadow offsets to reflect spec changes")
Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>