linux

iv/linux

Author	SHA1	Message	Date
Bjorn Helgaas	55fd033bae	Merge branch 'pci/error' - Clear AER "multiple errors" bits to avoid race that left them set forever (Kuppuswamy Sathyanarayanan) * pci/error: PCI/AER: Clear MULTI_ERR_COR/UNCOR_RCV bits	2022-05-24 16:42:21 -05:00
Johan Hovold	83013631f0	PCI: qcom: Fix unbalanced PHY init on probe errors Undo the PHY initialisation (e.g. balance runtime PM) if host initialisation fails during probe. Link: https://lore.kernel.org/r/20220401133854.10421-3-johan+linaro@kernel.org Fixes: 82a823833f4e ("PCI: qcom: Add Qualcomm PCIe controller driver") Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Acked-by: Stanimir Varbanov <svarbanov@mm-sol.com> Cc: stable@vger.kernel.org # 4.5	2022-05-24 16:40:45 -05:00
Johan Hovold	87d83b96c8	PCI: qcom: Fix runtime PM imbalance on probe errors Drop the leftover pm_runtime_disable() calls from the late probe error paths that would, for example, prevent runtime PM from being reenabled after a probe deferral. Link: https://lore.kernel.org/r/20220401133854.10421-2-johan+linaro@kernel.org Fixes: 6e5da6f7d824 ("PCI: qcom: Fix error handling in runtime PM support") Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Acked-by: Stanimir Varbanov <svarbanov@mm-sol.com> Cc: stable@vger.kernel.org # 4.20 Cc: Bjorn Andersson <bjorn.andersson@linaro.org>	2022-05-24 16:40:45 -05:00
Johan Hovold	fdf6a2f533	PCI: qcom: Fix pipe clock imbalance Fix a clock imbalance introduced by ed8cc3b1fc84 ("PCI: qcom: Add support for SDM845 PCIe controller"), which enables the pipe clock both in init() and in post_init() but only disables in post_deinit(). Note that the pipe clock was also never disabled in the init() error paths and that enabling the clock before powering up the PHY looks questionable. Link: https://lore.kernel.org/r/20220401133351.10113-1-johan+linaro@kernel.org Fixes: ed8cc3b1fc84 ("PCI: qcom: Add support for SDM845 PCIe controller") Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Cc: stable@vger.kernel.org # 5.6	2022-05-24 16:39:51 -05:00
Bhupesh Sharma	a935601eed	PCI: qcom: Add SM8150 SoC support The PCIe IP (rev 1.5.0) on SM8150 SoC is similar to the one used on SM8250. Add SM8150 support, reusing the members of ops_1_9_0. Link: https://lore.kernel.org/r/20220326060810.1797516-3-bhupesh.sharma@linaro.org Signed-off-by: Bhupesh Sharma <bhupesh.sharma@linaro.org> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Reviewed-by: Rob Herring <robh@kernel.org> Cc: Vinod Koul <vkoul@kernel.org>	2022-05-24 16:39:15 -05:00
Linus Torvalds	d613060475	xen: branch for v5.19-rc1 -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCYosaQAAKCRCAXGG7T9hj vil9AP9b4C+f9LTG0kAinjxLPyWE0Mo/iq3gO60MteZ2HyeI+AD/eSzJioJA0vyH 4pnU/UaGLJSp/B1LitLdjwoWIvwcEws= =pDcW -----END PGP SIGNATURE----- Merge tag 'for-linus-5.19-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from Juergen Gross: - decouple the PV interface from kernel internals in the Xen scsifront/scsiback pv drivers - harden the Xen scsifront PV driver against a malicious backend driver - simplify Xen PV frontend driver ring page setup - support Xen setups with multiple domains created at boot time to tolerate Xenstore coming up late - two small cleanup patches * tag 'for-linus-5.19-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (29 commits) xen: add support for initializing xenstore later as HVM domain xen: sync xs_wire.h header with upstream xen x86: xen: remove STACK_FRAME_NON_STANDARD from xen_cpuid xen-blk{back,front}: Update contact points for buffer_squeeze_duration_ms and feature_persistent xen/xenbus: eliminate xenbus_grant_ring() xen/sndfront: use xenbus_setup_ring() and xenbus_teardown_ring() xen/usbfront: use xenbus_setup_ring() and xenbus_teardown_ring() xen/scsifront: use xenbus_setup_ring() and xenbus_teardown_ring() xen/pcifront: use xenbus_setup_ring() and xenbus_teardown_ring() xen/drmfront: use xenbus_setup_ring() and xenbus_teardown_ring() xen/tpmfront: use xenbus_setup_ring() and xenbus_teardown_ring() xen/netfront: use xenbus_setup_ring() and xenbus_teardown_ring() xen/blkfront: use xenbus_setup_ring() and xenbus_teardown_ring() xen/xenbus: add xenbus_setup_ring() service function xen: update ring.h xen/shbuf: switch xen-front-pgdir-shbuf to use INVALID_GRANT_REF xen/dmabuf: switch gntdev-dmabuf to use INVALID_GRANT_REF xen/sound: switch xen_snd_front to use INVALID_GRANT_REF xen/drm: switch xen_drm_front to use INVALID_GRANT_REF xen/usb: switch xen-hcd to use INVALID_GRANT_REF ...	2022-05-23 20:49:45 -07:00
Joerg Roedel	b0dacee202	Merge branches 'apple/dart', 'arm/mediatek', 'arm/msm', 'arm/smmu', 'ppc/pamu', 'x86/vt-d', 'x86/amd' and 'vfio-notifier-fix' into next	2022-05-20 12:27:17 +02:00
Juergen Gross	0e6b139dbd	xen/pcifront: use xenbus_setup_ring() and xenbus_teardown_ring() Simplify pcifront's shared page creation and removal via xenbus_setup_ring() and xenbus_teardown_ring(). Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Juergen Gross <jgross@suse.com>	2022-05-19 14:22:02 +02:00
Daire McNamara	7013654af6	PCI: microchip: Fix potential race in interrupt handling Clear the MSI bit in ISTATUS_LOCAL register after reading it, but before reading and handling individual MSI bits from the ISTATUS_MSI register. This avoids a potential race where new MSI bits may be set on the ISTATUS_MSI register after it was read and be missed when the MSI bit in the ISTATUS_LOCAL register is cleared. ISTATUS_LOCAL is a read/write/clear register; the register's bits are set when the corresponding interrupt source is activated. Each source is independent and thus multiple sources may be active simultaneously. The processor can monitor and clear status bits. If one or more ISTATUS_LOCAL interrupt sources are active, the RootPort issues an interrupt towards the processor (on the AXI domain). Bit 28 of this register reports an MSI has been received by the RootPort. ISTATUS_MSI is a read/write/clear register. Bits 31-0 are asserted when an MSI with message number 31-0 is received by the RootPort. The processor must monitor and clear these bits. Effectively, Bit 28 of ISTATUS_LOCAL informs the processor that an MSI has arrived at the RootPort and ISTATUS_MSI informs the processor which MSI (in the range 0 - 31) needs handling. Reported by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/linux-pci/20220127202000.GA126335@bhelgaas/ Link: https://lore.kernel.org/r/20220517141622.145581-1-daire.mcnamara@microchip.com Fixes: 6f15a9c9f941 ("PCI: microchip: Add Microchip PolarFire PCIe controller driver") Signed-off-by: Daire McNamara <daire.mcnamara@microchip.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>	2022-05-18 17:14:21 +01:00
Linus Torvalds	210e04ff76	pci-v5.18-fixes-1 -----BEGIN PGP SIGNATURE----- iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmKEBFkUHGJoZWxnYWFz QGdvb2dsZS5jb20ACgkQWYigwDrT+vx3Qw/+LrVFKL6YvTJKgQf3dKiTkTXwoOiD AWnLUEB3dSh1RSJh42o5NDFerqWR4uKIJReOZ1SFWC/AGBorxsYtmmKbTrs6CZ5H x1jixWxI773XWiye6c8GlPwsdqhw6Zm6yjVGNuh9NX6ma0Mfw2wHWGwu0nrlTNeO VEh6US2McrnhFlqVmIEGR6op14JP/10haLPW1uy4a3mDjGltpprXiFjLug/4amrU rEl7H5bNl0GVDWl/WiCBB+ouS4QgK+krWENH63YyDlXdSPQwEjhNTiff/NDM/Hzc MW7sJgcOxS6mU6TbKM+oQ3aoewph8TvC3wN56VXncfwALVpkp1qlDBeNYkapecBO qO4nTPyL0nwHvLon62D+p2QFjlBq1FJM0jrCB4vXV4PwPux9TZG1UjaiMXrIuJMb 3gXYtxiiSZjWat+qhnRsNiyNu5Dtvcctd9YIUtx2LLvEgAB4CeC66mBIt6qh5QZv j39ITKJGmDPUmGOnEbtvS6knmWOUQYGFgiUjSgX1ODQ9LYR9xJb3TheTkv257a2l t3kgHzXZO5cebqAu444N1f6NeRu6rEU9OiR9HQBfEDfp9+cgDjDltHp2KlmS968o xY2pn+neC+w4gMg3CjIdWu5x4yfZ0OFeCzXtWewqNX18pljwvKGjKCRUF5vosYdd Q3KJu8/CuGkV9MU= =5giZ -----END PGP SIGNATURE----- Merge tag 'pci-v5.18-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fixes from Bjorn Helgaas: - Avoid putting Elo i2 PCIe Ports in D3cold because downstream devices are inaccessible after going back to D0 (Rafael J. Wysocki) - Qualcomm SM8250 has a ddrss_sf_tbu clock but SC8180X does not; make a SC8180X-specific config without the clock so it probes correctly (Bjorn Andersson) - Revert aardvark chained IRQ handler rewrite because it broke interrupt affinity (Pali Rohár) * tag 'pci-v5.18-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: Revert "PCI: aardvark: Rewrite IRQ code to chained IRQ handler" PCI: qcom: Remove ddrss_sf_tbu clock from SC8180X PCI/PM: Avoid putting Elo i2 PCIe Ports in D3cold	2022-05-17 13:46:22 -10:00
Kuppuswamy Sathyanarayanan	203926da2b	PCI/AER: Clear MULTI_ERR_COR/UNCOR_RCV bits When a Root Port or Root Complex Event Collector receives an error Message e.g., ERR_COR, it sets PCI_ERR_ROOT_COR_RCV in the Root Error Status register and logs the Requester ID in the Error Source Identification register. If it receives a second ERR_COR Message before software clears PCI_ERR_ROOT_COR_RCV, hardware sets PCI_ERR_ROOT_MULTI_COR_RCV and the Requester ID is lost. In the following scenario, PCI_ERR_ROOT_MULTI_COR_RCV was never cleared: - hardware receives ERR_COR message - hardware sets PCI_ERR_ROOT_COR_RCV - aer_irq() entered - aer_irq(): status = pci_read_config_dword(PCI_ERR_ROOT_STATUS) - aer_irq(): now status == PCI_ERR_ROOT_COR_RCV - hardware receives second ERR_COR message - hardware sets PCI_ERR_ROOT_MULTI_COR_RCV - aer_irq(): pci_write_config_dword(PCI_ERR_ROOT_STATUS, status) - PCI_ERR_ROOT_COR_RCV is cleared; PCI_ERR_ROOT_MULTI_COR_RCV is set - aer_irq() entered again - aer_irq(): status = pci_read_config_dword(PCI_ERR_ROOT_STATUS) - aer_irq(): now status == PCI_ERR_ROOT_MULTI_COR_RCV - aer_irq() exits because PCI_ERR_ROOT_COR_RCV not set - PCI_ERR_ROOT_MULTI_COR_RCV is still set The same problem occurred with ERR_NONFATAL/ERR_FATAL Messages and PCI_ERR_ROOT_UNCOR_RCV and PCI_ERR_ROOT_MULTI_UNCOR_RCV. Fix the problem by queueing an AER event and clearing the Root Error Status bits when any of these bits are set: PCI_ERR_ROOT_COR_RCV PCI_ERR_ROOT_UNCOR_RCV PCI_ERR_ROOT_MULTI_COR_RCV PCI_ERR_ROOT_MULTI_UNCOR_RCV See the bugzilla link for details from Eric about how to reproduce this problem. [bhelgaas: commit log, move repro details to bugzilla] Fixes: e167bfcaa4cd ("PCI: aerdrv: remove magical ROOT_ERR_STATUS_MASKS") Link: https://bugzilla.kernel.org/show_bug.cgi?id=215992 Link: https://lore.kernel.org/r/20220418150237.1021519-1-sathyanarayanan.kuppuswamy@linux.intel.com Reported-by: Eric Badger <ebadger@purestorage.com> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Ashok Raj <ashok.raj@intel.com>	2022-05-17 16:30:07 -05:00
Pali Rohár	a3b69dd0ad	Revert "PCI: aardvark: Rewrite IRQ code to chained IRQ handler" This reverts commit 1571d67dc190e50c6c56e8f88cdc39f7cc53166e. This commit broke support for setting interrupt affinity. It looks like that it is related to the chained IRQ handler. Revert this commit until issue with setting interrupt affinity is fixed. Fixes: 1571d67dc190 ("PCI: aardvark: Rewrite IRQ code to chained IRQ handler") Link: https://lore.kernel.org/r/20220515125815.30157-1-pali@kernel.org Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2022-05-16 15:58:47 -05:00
Andrea Parri (Microsoft)	b4927bd272	PCI: hv: Fix synchronization between channel callback and hv_pci_bus_exit() [ Similarly to commit a765ed47e4516 ("PCI: hv: Fix synchronization between channel callback and hv_compose_msi_msg()"): ] The (on-stack) teardown packet becomes invalid once the completion timeout in hv_pci_bus_exit() has expired and hv_pci_bus_exit() has returned. Prevent the channel callback from accessing the invalid packet by removing the ID associated to such packet from the VMbus requestor in hv_pci_bus_exit(). Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Link: https://lore.kernel.org/r/20220511223207.3386-3-parri.andrea@gmail.com Signed-off-by: Wei Liu <wei.liu@kernel.org>	2022-05-13 16:57:32 +00:00
Andrea Parri (Microsoft)	9937fa6d1e	PCI: hv: Add validation for untrusted Hyper-V values For additional robustness in the face of Hyper-V errors or malicious behavior, validate all values that originate from packets that Hyper-V has sent to the guest in the host-to-guest ring buffer. Ensure that invalid values cannot cause data being copied out of the bounds of the source buffer in hv_pci_onchannelcallback(). While at it, remove a redundant validation in hv_pci_generic_compl(): hv_pci_onchannelcallback() already ensures that all processed incoming packets are "at least as large as [in fact larger than] a response". Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Link: https://lore.kernel.org/r/20220511223207.3386-2-parri.andrea@gmail.com Signed-off-by: Wei Liu <wei.liu@kernel.org>	2022-05-13 16:57:32 +00:00
Robin Murphy	b8397a8f4e	iommu/dma: Explicitly sort PCI DMA windows Originally, creating the dma_ranges resource list in pre-sorted fashion was the simplest and most efficient way to enforce the order required by iova_reserve_pci_windows(). However since then at least one PCI host driver is now re-sorting the list for its own probe-time processing, which doesn't seem entirely unreasonable, so that basic assumption no longer holds. Make iommu-dma robust and get the sort order it needs by explicitly sorting, which means we can also save the effort at creation time and just build the list in whatever natural order the DT had. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/35661036a7e4160850895f9b37f35408b6a29f2f.1652091160.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2022-05-13 15:08:20 +02:00
Parshuram Thombare	95b00f6820	PCI: cadence: Clear FLR in device capabilities register Clear FLR (Function Level Reset) from device capabilities registers for all physical functions. During FLR, the Margining Lane Status and Margining Lane Control registers should not be reset, as per PCIe specification. However, the controller incorrectly resets these registers upon FLR. This causes PCISIG compliance FLR test to fail. Hence preventing all functions from advertising FLR support if flag quirk_disable_flr is set. Link: https://lore.kernel.org/r/1635165075-89864-1-git-send-email-pthombar@cadence.com Signed-off-by: Parshuram Thombare <pthombar@cadence.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>	2022-05-12 22:19:40 +01:00
Christian Gmeiner	a1f67bc131	PCI: cadence: Allow PTM Responder to be enabled This enables the Controller [RP] to automatically respond with Response/ResponseD messages if CDNS_PCIE_LM_TPM_CTRL_PTMRSEN and PCI_PTM_CTRL_ENABLE bits are both set. Link: https://lore.kernel.org/r/20220512055539.1782437-1-christian.gmeiner@gmail.com Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>	2022-05-12 22:03:05 +01:00
Nirmal Patel	c94f732e80	PCI: vmd: Revert 2565e5b69c44 ("PCI: vmd: Do not disable MSI-X remapping if interrupt remapping is enabled by IOMMU.") Revert 2565e5b69c44 ("PCI: vmd: Do not disable MSI-X remapping if interrupt remapping is enabled by IOMMU.") The commit 2565e5b69c44 was added as a workaround to keep MSI-X remapping enabled if IOMMU enables interrupt remapping. VMD would keep running in low performance mode. There is no dependency between MSI-X remapping by VMD and interrupt remapping by IOMMU. Link: https://lore.kernel.org/r/20220511095707.25403-3-nirmal.patel@linux.intel.com Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>	2022-05-12 15:54:14 +01:00
Nirmal Patel	886e67100b	PCI: vmd: Assign VMD IRQ domain before enumeration During the boot process all the PCI devices are assigned default PCI-MSI IRQ domain including VMD endpoint devices. If interrupt-remapping is enabled by IOMMU, the PCI devices except VMD get new INTEL-IR-MSI IRQ domain. And VMD is supposed to create and assign a separate VMD-MSI IRQ domain for its child devices in order to support MSI-X remapping capabilities. Now when MSI-X remapping in VMD is disabled in order to improve performance, VMD skips VMD-MSI IRQ domain assignment process to its child devices. Thus the devices behind VMD get default PCI-MSI IRQ domain instead of INTEL-IR-MSI IRQ domain when VMD creates root bus and configures child devices. As a result host OS fails to boot and DMAR errors were observed when interrupt remapping was enabled on Intel Icelake CPUs. For instance: DMAR: DRHD: handling fault status reg 2 DMAR: [INTR-REMAP] Request device [0xe2:0x00.0] fault index 0xa00 [fault reason 0x25] Blocked a compatibility format interrupt request To fix this issue, dev_msi_info struct in dev struct maintains correct value of IRQ domain. VMD will use this information to assign proper IRQ domain to its child devices when it doesn't create a separate IRQ domain. Link: https://lore.kernel.org/r/20220511095707.25403-2-nirmal.patel@linux.intel.com Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>	2022-05-12 15:54:14 +01:00
Yicong Yang	a91ee0e9fc	PCI: Avoid pci_dev_lock() AB/BA deadlock with sriov_numvfs_store() The sysfs sriov_numvfs_store() path acquires the device lock before the config space access lock: sriov_numvfs_store device_lock # A (1) acquire device lock sriov_configure vfio_pci_sriov_configure # (for example) vfio_pci_core_sriov_configure pci_disable_sriov sriov_disable pci_cfg_access_lock pci_wait_cfg # B (4) wait for dev->block_cfg_access == 0 Previously, pci_dev_lock() acquired the config space access lock before the device lock: pci_dev_lock pci_cfg_access_lock dev->block_cfg_access = 1 # B (2) set dev->block_cfg_access = 1 device_lock # A (3) wait for device lock Any path that uses pci_dev_lock(), e.g., pci_reset_function(), may deadlock with sriov_numvfs_store() if the operations occur in the sequence (1) (2) (3) (4). Avoid the deadlock by reversing the order in pci_dev_lock() so it acquires the device lock before the config space access lock, the same as the sriov_numvfs_store() path. [bhelgaas: combined and adapted commit log from Jay Zhou's independent subsequent posting: https://lore.kernel.org/r/20220404062539.1710-1-jianjay.zhou@huawei.com] Link: https://lore.kernel.org/linux-pci/1583489997-17156-1-git-send-email-yangyicong@hisilicon.com/ Also-posted-by: Jay Zhou <jianjay.zhou@huawei.com> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2022-05-11 17:20:09 -05:00
Jeffrey Hugo	a2bad844a6	PCI: hv: Fix interrupt mapping for multi-MSI According to Dexuan, the hypervisor folks beleive that multi-msi allocations are not correct. compose_msi_msg() will allocate multi-msi one by one. However, multi-msi is a block of related MSIs, with alignment requirements. In order for the hypervisor to allocate properly aligned and consecutive entries in the IOMMU Interrupt Remapping Table, there should be a single mapping request that requests all of the multi-msi vectors in one shot. Dexuan suggests detecting the multi-msi case and composing a single request related to the first MSI. Then for the other MSIs in the same block, use the cached information. This appears to be viable, so do it. Suggested-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Tested-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/1652282599-21643-1-git-send-email-quic_jhugo@quicinc.com Signed-off-by: Wei Liu <wei.liu@kernel.org>	2022-05-11 17:51:02 +00:00
Jeffrey Hugo	b4b77778ec	PCI: hv: Reuse existing IRTE allocation in compose_msi_msg() Currently if compose_msi_msg() is called multiple times, it will free any previous IRTE allocation, and generate a new allocation. While nothing prevents this from occurring, it is extraneous when Linux could just reuse the existing allocation and avoid a bunch of overhead. However, when future IRTE allocations operate on blocks of MSIs instead of a single line, freeing the allocation will impact all of the lines. This could cause an issue where an allocation of N MSIs occurs, then some of the lines are retargeted, and finally the allocation is freed/reallocated. The freeing of the allocation removes all of the configuration for the entire block, which requires all the lines to be retargeted, which might not happen since some lines might already be unmasked/active. Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Tested-by: Dexuan Cui <decui@microsoft.com> Tested-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/1652282582-21595-1-git-send-email-quic_jhugo@quicinc.com Signed-off-by: Wei Liu <wei.liu@kernel.org>	2022-05-11 17:50:20 +00:00
Peter Geis	e8aae154df	PCI: rockchip-dwc: Add legacy interrupt support The legacy interrupts on the rk356x PCIe controller are handled by a single muxed interrupt. Add IRQ domain support to the pcie-dw-rockchip driver to support the virtual domain. Link: https://lore.kernel.org/r/20220429123832.2376381-4-pgwipeout@gmail.com Signed-off-by: Peter Geis <pgwipeout@gmail.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org>	2022-05-11 16:01:26 +01:00
Peter Geis	431e7d2eec	PCI: rockchip-dwc: Reset core at driver probe The PCIe controller is in an unknown state at driver probe. This can lead to undesireable effects when the driver attempts to configure the controller. Prevent issues in the future by resetting the core during probe. Link: https://lore.kernel.org/r/20220429123832.2376381-3-pgwipeout@gmail.com Tested-by: Nicolas Frattaroli <frattaroli.nicolas@gmail.com> Signed-off-by: Peter Geis <pgwipeout@gmail.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>	2022-05-11 16:01:26 +01:00
AngeloGioacchino Del Regno	1d565935e3	PCI: mediatek-gen3: Assert resets to ensure expected init state The controller may have been left out of reset by the bootloader, in which case, before the powerup sequence, the controller will be found preconfigured with values that were set before booting the kernel: this produces a controller failure, with the result of a failure during the mtk_pcie_startup_port() sequence as the PCIe link never gets up. To ensure that we get a clean start in an expected state, assert both the PHY and MAC resets before executing the controller power-up sequence. Link: https://lore.kernel.org/r/20220404144858.92390-1-angelogioacchino.delregno@collabora.com Fixes: d3bf75b579b9 ("PCI: mediatek-gen3: Add MediaTek Gen3 driver for MT8192") Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>	2022-05-11 15:25:26 +01:00
Conor Dooley	30097efa33	PCI: microchip: Add missing chained_irq_enter()/exit() calls Two of the chained IRQ handlers miss their chained_irq_enter()/chained_irq_exit() calls, so add them in to avoid potentially lost interrupts. Reported by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/linux-pci/87h76b8nxc.wl-maz@kernel.org Link: https://lore.kernel.org/r/20220511095504.2273799-1-conor.dooley@microchip.com Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>	2022-05-11 14:02:42 +01:00
Francesco Dolcini	a6809941c1	PCI: imx6: Fix PERST# start-up sequence According to the PCIe standard the PERST# signal (reset-gpio in fsl,imx* compatible dts) should be kept asserted for at least 100 usec before the PCIe refclock is stable, should be kept asserted for at least 100 msec after the power rails are stable and the host should wait at least 100 msec after it is de-asserted before accessing the configuration space of any attached device. From PCIe CEM r2.0, sec 2.6.2 T-PVPERL: Power stable to PERST# inactive - 100 msec T-PERST-CLK: REFCLK stable before PERST# inactive - 100 usec. From PCIe r5.0, sec 6.6.1 With a Downstream Port that does not support Link speeds greater than 5.0 GT/s, software must wait a minimum of 100 ms before sending a Configuration Request to the device immediately below that Port. Failure to do so could prevent PCIe devices to be working correctly, and this was experienced with real devices. Move reset assert to imx6_pcie_assert_core_reset(), this way we ensure that PERST# is asserted before enabling any clock, move de-assert to the end of imx6_pcie_deassert_core_reset() after the clock is enabled and deemed stable and add a new delay of 100 msec just afterward. Link: https://lore.kernel.org/all/20220211152550.286821-1-francesco.dolcini@toradex.com Link: https://lore.kernel.org/r/20220404081509.94356-1-francesco.dolcini@toradex.com Fixes: bb38919ec56e ("PCI: imx6: Add support for i.MX6 PCIe controller") Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Lucas Stach <l.stach@pengutronix.de> Acked-by: Richard Zhu <hongxing.zhu@nxp.com>	2022-05-11 13:50:45 +01:00
Dmitry Baryshkov	bc49681c96	PCI: qcom-ep: Move enable/disable resources code to common functions Remove code duplication by moving the code related to enabling/disabling the resources (PHY, CLK, Reset) to common functions so that they can be called from multiple places. [mani: renamed the functions and reworded the commit message] Link: https://lore.kernel.org/r/20220502104938.97033-1-manivannan.sadhasivam@linaro.org Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>	2022-05-11 10:48:35 +01:00
Rafael J. Wysocki	0f40ac35e4	PCI/PM: Replace pci_set_power_state() in pci_pm_thaw_noirq() Calling pci_set_power_state() to put the given device into D0 in pci_pm_thaw_noirq() may cause it to restore the device's BARs, which is redundant before calling pci_restore_state(), so replace it with a direct pci_power_up() call followed by pci_update_current_state() if it returns a nonzero value, in analogy with pci_pm_default_resume_early(). Avoid code duplication by introducing a wrapper function to contain the repeating pattern and calling it in both places. Link: https://lore.kernel.org/r/3639079.MHq7AAxBmi@kreacher Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2022-05-05 14:19:49 -05:00
Rafael J. Wysocki	3cc2a2b270	PCI/PM: Rearrange pci_set_power_state() The part of pci_set_power_state() related to transitions into low-power states is unnecessary convoluted, so clearly divide it into the D3cold special case and the general case covering all of the other states. Also fix a potential issue with calling pci_bus_set_current_state() to set the current state of all devices on the subordinate bus to D3cold without checking if the power state of the parent bridge has really changed to D3cold. Link: https://lore.kernel.org/r/2139440.Mh6RI2rZIc@kreacher Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>	2022-05-05 14:19:49 -05:00
Rafael J. Wysocki	0aacdc9574	PCI/PM: Clean up pci_set_low_power_state() Make the following assorted non-essential changes in pci_set_low_power_state(): 1. Drop two redundant checks from it (the caller takes care of these conditions). 2. Change the log level of a messages printed by it to "debug", because it only indicates a programming mistake. Link: https://lore.kernel.org/r/2539071.Lt9SDvczpP@kreacher Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>	2022-05-05 14:19:49 -05:00
Rafael J. Wysocki	0ce74a3b9c	PCI/PM: Do not restore BARs if device is not in D0 Do not attempt to restore the device's BARs in pci_set_full_power_state() if the actual current power state of the device is not D0. Link: https://lore.kernel.org/r/1849718.CQOukoFCf9@kreacher Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2022-05-05 14:19:49 -05:00
Rafael J. Wysocki	e200904b27	PCI/PM: Split pci_power_up() One of the two callers of pci_power_up() invokes pci_update_current_state() and pci_restore_state() right after calling it, in which case running the part of it happening after the mandatory transition delays is redundant, so move that part out of it into a new function called pci_set_full_power_state() that will be invoked from pci_set_power_state() which is the other caller of pci_power_up(). Link: https://lore.kernel.org/r/1942150.usQuhbGJ8B@kreacher Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2022-05-05 14:19:49 -05:00
Rafael J. Wysocki	f0881d38c7	PCI/PM: Write 0 to PMCSR in pci_power_up() in all cases Make pci_power_up() write 0 to the device's PCI_PM_CTRL register in order to put it into D0 regardless of the power state returned by the previous read from that register which should not matter. Link: https://lore.kernel.org/r/5748066.MhkbZ0Pkbq@kreacher Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2022-05-05 14:19:49 -05:00
Rafael J. Wysocki	0b59193548	PCI/PM: Do not call pci_update_current_state() from pci_power_up() Notice that calling pci_update_current_state() from pci_power_up() is redundant and may be harmful in some cases. First, if the device is in a low-power state before pci_power_up() gets called for it and platform_pci_set_power_state() successfully changes its power state to D0, pci_update_current_state() will update current_state to reflect that and pci_power_up() will return success right away without restoring the device's BARs or reconfiguring ASPM which may be necessary. This is arguably incorrect and definitely inconsistent with the case when platform_pci_set_power_state() returns an error (for example, because the device is not power-manageable by the platform firmware). Second, current_state should not be overwritten until the decision whether or not to restore the device's BARs is made, because that decision generally depends on its value. Again, calling pci_update_current_state() in pci_power_up() is not consistent with this observation. Next, pci_power_up() attempts to read from the device's PCI_PM_CTRL register regardless of the current_state value unless it is PCI_D0, including the case when pci_update_current_state() sets current_state to PCI_D3cold to indicate that the device is not accessible. If the register read is not successful, current_state will be set to PCI_D3cold anyway, so that pci_update_current_state() action is redundant. Further, if pci_update_current_state() reads the device's PCI_PM_CTRL register, pci_power_up() will repeat that read going forward and it is not necessary to update current_state in the meantime. Finally, if pm_cap is not set (in which case the PCI_PM_CTRL register is not present), the power state of the device should be determined with the help of the platform firmware or set to D0 if that's not possible and pci_update_current_state() does not do that. Accordingly, rearrange pci_power_up() so as to address the above shortcomings. Link: https://lore.kernel.org/r/3695055.kQq0lBPeGt@kreacher Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2022-05-05 14:19:49 -05:00
Rafael J. Wysocki	6d8c016a55	PCI/PM: Unfold pci_platform_power_transition() in pci_power_up() Some actions carried out by pci_platform_power_transition(() in pci_power_up() are redundant, but before dealing with them, make pci_power_up() call the pci_platform_power_transition() code directly (and avoid a redundant check when pm_cap is unset while at it). Link: https://lore.kernel.org/r/1922486.PYKUYFuaPT@kreacher Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2022-05-05 14:19:48 -05:00
Rafael J. Wysocki	1aa85bb14d	PCI/PM: Set current_state to D3cold if the device is not accessible Make pci_power_up() and pci_set_low_power_state() change current_state to PCI_D3cold when the device is not accessible along the lines of pci_update_current_state(). Link: https://lore.kernel.org/r/10104376.nUPlyArG6x@kreacher Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2022-05-05 14:19:48 -05:00
Rafael J. Wysocki	7957d20145	PCI/PM: Relocate pci_set_low_power_state() Because pci_set_power_state() is the only caller of pci_set_low_power_state(), put the latter next to the former. No functional impact. Link: https://lore.kernel.org/r/3202976.44csPzL39Z@kreacher Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2022-05-05 14:19:48 -05:00
Rafael J. Wysocki	10aa5377fc	PCI/PM: Split pci_raw_set_power_state() The transitions from low-power states to D0 and the other way around are unnecessarily tangled in pci_raw_set_power_state() which makes it rather hard to follow. Moreover, the only caller of pci_raw_set_power_state() passing PCI_D0 as its state argument is pci_power_up(), so the code carrying out transitions into D0 can be put directly into that function. Accordingly, move the code handling transitions from low-power states into D0 directly into pci_power_up() and rename the remaining part of pci_raw_set_power_state() to pci_set_low_power_state(), because it only handles transitions into low-power state now. While at it, fix up some white space, update some comments and modify messages printed by pci_power_up() and pci_set_low_power_state() to be less confusing (which is the only expected functional impact of this change). Link: https://lore.kernel.org/r/13038676.uLZWGnKmhe@kreacher Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2022-05-05 14:19:48 -05:00
Rafael J. Wysocki	9c384ddd6e	PCI/PM: Rearrange pci_update_current_state() Save one config space access in pci_update_current_state() by testing the retrieved PCI_PM_CTRL register value against PCI_POSSIBLE_ERROR() instead of invoking pci_device_is_present() separately. While at it, drop a pair of unnecessary parens. No expected functional impact. Link: https://lore.kernel.org/r/1917095.PYKUYFuaPT@kreacher Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2022-05-05 14:19:48 -05:00
Rafael J. Wysocki	8221ecd4e4	PCI/PM: Drop the runtime_d3cold device flag The runtime_d3cold flag is not needed any more, so drop it. Link: https://lore.kernel.org/r/8077784.T7Z3S40VBb@kreacher Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>	2022-05-05 14:19:48 -05:00
Rafael J. Wysocki	730643d33e	PCI/PM: Resume subordinate bus in bus type callbacks Calling pci_resume_bus() on the secondary bus from pci_power_up() as it is done now is questionable, because it depends on the mandatory bridge power-up delays that are only covered by the PCI bus type PM callbacks. For this reason, move the subordinate bus resume to those callbacks too and use the observation that if a bridge is being powered-up during resume from system-wide suspend, it may be still desirable to runtime-resume its subordinate bus after completing the system-wide transition (in case the resume of the devices on that bus is skipped during it). Link: https://lore.kernel.org/r/3190097.aeNJFYEL58@kreacher Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>	2022-05-05 14:19:45 -05:00
Rafael J. Wysocki	9a6058312e	PCI/PM: Power up all devices during runtime resume Currently, endpoint devices may not be powered up entirely during runtime resume that follows a D3hot -> D0 transition of the parent bridge. Namely, even if the power state of an endpoint device, as indicated by its PCI_PM_CTRL register, is D0 after powering up its parent bridge, it may be still necessary to bring its ACPI companion into D0 and that should be done before accessing it. However, the current code assumes that reading the PCI_PM_CTRL register is sufficient to establish the endpoint device's power state, which may lead to problems. Address that by forcing a power-up of all PCI devices, including the platform firmware part of it, during runtime resume. Link: https://lore.kernel.org/linux-pm/11967527.O9o76ZdvQC@kreacher Fixes: 5775b843a619 ("PCI: Restore config space on runtime resume despite being unbound") Link: https://lore.kernel.org/r/2652115.mvXUDI8C0e@kreacher Reported-by: Abhishek Sahu <abhsahu@nvidia.com> Tested-by: Abhishek Sahu <abhsahu@nvidia.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>	2022-05-05 14:19:11 -05:00
Krzysztof Kozlowski	18a94192e2	PCI/PM: Define pci_restore_standard_config() only for CONFIG_PM_SLEEP pci_restore_standard_config() was defined under CONFIG_PM but called only by pci_pm_resume() (defined under CONFIG_SUSPEND) and pci_pm_restore() (defined under CONFIG_HIBERNATE_CALLBACKS). A configuration with only CONFIG_PM leads to a warning: drivers/pci/pci-driver.c:533:12: error: ‘pci_restore_standard_config’ defined but not used [-Werror=unused-function] CONFIG_PM_SLEEP depends on CONFIG_SUSPEND and CONFIG_HIBERNATE_CALLBACKS, so define pci_restore_standard_config() under that instead. Link: https://lore.kernel.org/r/20220420141135.444820-1-krzysztof.kozlowski@linaro.org Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2022-05-05 14:10:24 -05:00
Bjorn Andersson	134b5ce3ed	PCI: qcom: Remove ddrss_sf_tbu clock from SC8180X The Qualcomm SC8180X platform was piggy-backing on the SM8250 qcom_pcie_cfg, but SC8180X doesn't have the ddrss_sf_tbu clock, so it now fails to probe due to the missing clock. Give SC8180X its own qcom_pcie_cfg, without the ddrss_sf_tbu flag set. Fixes: 0614f98bbb9f ("PCI: qcom: Add ddrss_sf_tbu flag") Link: https://lore.kernel.org/r/20220331013415.592748-1-bjorn.andersson@linaro.org Tested-by: Steev Klimaszewski <steev@kali.org> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Acked-by: Stanimir Varbanov <svarbanov@mm-sol.com>	2022-05-03 17:41:28 -05:00
Dexuan Cui	23e118a48a	PCI: hv: Do not set PCI_COMMAND_MEMORY to reduce VM boot time Currently when the pci-hyperv driver finishes probing and initializing the PCI device, it sets the PCI_COMMAND_MEMORY bit; later when the PCI device is registered to the core PCI subsystem, the core PCI driver's BAR detection and initialization code toggles the bit multiple times, and each toggling of the bit causes the hypervisor to unmap/map the virtual BARs from/to the physical BARs, which can be slow if the BAR sizes are huge, e.g., a Linux VM with 14 GPU devices has to spend more than 3 minutes on BAR detection and initialization, causing a long boot time. Reduce the boot time by not setting the PCI_COMMAND_MEMORY bit when we register the PCI device (there is no need to have it set in the first place). The bit stays off till the PCI device driver calls pci_enable_device(). With this change, the boot time of such a 14-GPU VM is reduced by almost 3 minutes. Link: https://lore.kernel.org/lkml/20220419220007.26550-1-decui@microsoft.com/ Tested-by: Boqun Feng (Microsoft) <boqun.feng@gmail.com> Signed-off-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: Jake Oshins <jakeo@microsoft.com> Link: https://lore.kernel.org/r/20220502074255.16901-1-decui@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>	2022-05-03 10:59:10 +00:00
Jeffrey Hugo	455880dfe2	PCI: hv: Fix hv_arch_irq_unmask() for multi-MSI In the multi-MSI case, hv_arch_irq_unmask() will only operate on the first MSI of the N allocated. This is because only the first msi_desc is cached and it is shared by all the MSIs of the multi-MSI block. This means that hv_arch_irq_unmask() gets the correct address, but the wrong data (always 0). This can break MSIs. Lets assume MSI0 is vector 34 on CPU0, and MSI1 is vector 33 on CPU0. hv_arch_irq_unmask() is called on MSI0. It uses a hypercall to configure the MSI address and data (0) to vector 34 of CPU0. This is correct. Then hv_arch_irq_unmask is called on MSI1. It uses another hypercall to configure the MSI address and data (0) to vector 33 of CPU0. This is wrong, and results in both MSI0 and MSI1 being routed to vector 33. Linux will observe extra instances of MSI1 and no instances of MSI0 despite the endpoint device behaving correctly. For the multi-MSI case, we need unique address and data info for each MSI, but the cached msi_desc does not provide that. However, that information can be gotten from the int_desc cached in the chip_data by compose_msi_msg(). Fix the multi-MSI case to use that cached information instead. Since hv_set_msi_entry_from_desc() is no longer applicable, remove it. Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/1651068453-29588-1-git-send-email-quic_jhugo@quicinc.com Signed-off-by: Wei Liu <wei.liu@kernel.org>	2022-04-28 15:09:02 +00:00
Lu Baolu	c7d4698497	PCI: portdrv: Set driver_managed_dma If a switch lacks ACS P2P Request Redirect, a device below the switch can bypass the IOMMU and DMA directly to other devices below the switch, so all the downstream devices must be in the same IOMMU group as the switch itself. The existing VFIO framework allows the portdrv driver to be bound to the bridge while its downstream devices are assigned to user space. The pci_dma_configure() marks the IOMMU group as containing only devices with kernel drivers that manage DMA. Avoid this default behavior for the portdrv driver in order for compatibility with the current VFIO usage. We achieve this by setting ".driver_managed_dma = true" in pci_driver structure. It is safe because the portdrv driver meets below criteria: - This driver doesn't use DMA, as you can't find any related calls like pci_set_master() or any kernel DMA API (dma_map_*() and etc.). - It doesn't use MMIO as you can't find ioremap() or similar calls. It's tolerant to userspace possibly also touching the same MMIO registers via P2P DMA access. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Suggested-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/20220418005000.897664-7-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2022-04-28 15:32:20 +02:00
Lu Baolu	18c7a349d0	PCI: pci_stub: Set driver_managed_dma The current VFIO implementation allows pci-stub driver to be bound to a PCI device with other devices in the same IOMMU group being assigned to userspace. The pci-stub driver has no dependencies on DMA or the IOVA mapping of the device, but it does prevent the user from having direct access to the device, which is useful in some circumstances. The pci_dma_configure() marks the iommu_group as containing only devices with kernel drivers that manage DMA. For compatibility with the VFIO usage, avoid this default behavior for the pci_stub. This allows the pci_stub still able to be used by the admin to block driver binding after applying the DMA ownership to VFIO. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/20220418005000.897664-6-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2022-04-28 15:32:20 +02:00
Lu Baolu	512881eacf	bus: platform,amba,fsl-mc,PCI: Add device DMA ownership management The devices on platform/amba/fsl-mc/PCI buses could be bound to drivers with the device DMA managed by kernel drivers or user-space applications. Unfortunately, multiple devices may be placed in the same IOMMU group because they cannot be isolated from each other. The DMA on these devices must either be entirely under kernel control or userspace control, never a mixture. Otherwise the driver integrity is not guaranteed because they could access each other through the peer-to-peer accesses which by-pass the IOMMU protection. This checks and sets the default DMA mode during driver binding, and cleanups during driver unbinding. In the default mode, the device DMA is managed by the device driver which handles DMA operations through the kernel DMA APIs (see Documentation/core-api/dma-api.rst). For cases where the devices are assigned for userspace control through the userspace driver framework(i.e. VFIO), the drivers(for example, vfio_pci/ vfio_platfrom etc.) may set a new flag (driver_managed_dma) to skip this default setting in the assumption that the drivers know what they are doing with the device DMA. Calling iommu_device_use_default_domain() before {of,acpi}_dma_configure is currently a problem. As things stand, the IOMMU driver ignored the initial iommu_probe_device() call when the device was added, since at that point it had no fwspec yet. In this situation, {of,acpi}_iommu_configure() are retriggering iommu_probe_device() after the IOMMU driver has seen the firmware data via .of_xlate to learn that it actually responsible for the given device. As the result, before that gets fixed, iommu_use_default_domain() goes at the end, and calls arch_teardown_dma_ops() if it fails. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Stuart Yoder <stuyoder@gmail.com> Cc: Laurentiu Tudor <laurentiu.tudor@nxp.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Tested-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/20220418005000.897664-5-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2022-04-28 15:32:20 +02:00

... 3 4 5 6 7 ...

9670 Commits