linux

iv/linux

Author	SHA1	Message	Date
Christoph Hellwig	d8668bb045	memremap: pass a struct dev_pagemap to ->kill and ->cleanup Passing the actual typed structure leads to more understandable code vs just passing the ref member. Reported-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-07-02 14:32:44 -03:00
Christoph Hellwig	1e240e8d4a	memremap: move dev_pagemap callbacks into a separate structure The dev_pagemap is a growing too many callbacks. Move them into a separate ops structure so that they are not duplicated for multiple instances, and an attacker can't easily overwrite them. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-07-02 14:32:44 -03:00
Rafael J. Wysocki	28ad4b4e34	Merge back PCI power management material for v5.3.	2019-06-30 13:41:52 +02:00
David S. Miller	d96ff269a0	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net The new route handling in ip_mc_finish_output() from 'net' overlapped with the new support for returning congestion notifications from BPF programs. In order to handle this I had to take the dev_loopback_xmit() calls out of the switch statement. The aquantia driver conflicts were simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-27 21:06:39 -07:00
Vidya Sagar	ca98329d3b	PCI: dwc: Export APIs to support .remove() implementation Export all configuration space access APIs and also other APIs to support host controller drivers of dwc core based implementations while adding support for .remove() hook to build their respective drivers as modules. Signed-off-by: Vidya Sagar <vidyas@nvidia.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com>	2019-06-27 12:02:46 +01:00
Vidya Sagar	7bc082d7e9	PCI: dwc: Cleanup DBI,ATU read and write APIs Cleanup DBI read and write APIs by removing leading "__" (underscore) from their names as there is no reason to have leading underscores in the first place in the function definition. Remove dbi/dbi2 base address parameters as the same behaviour can be obtained through read and write APIs. Since dw_pcie_{readl/writel}_dbi() APIs can't be used for ATU read/write as ATU base address could be different from DBI base address, implement ATU read/write APIs using ATU base address without using dw_pcie_{readl/writel}_dbi() APIs. Signed-off-by: Vidya Sagar <vidyas@nvidia.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Jingoo Han <jingoohan1@gmail.com>	2019-06-27 12:02:46 +01:00
Vidya Sagar	9d071cade3	PCI: dwc: Add API support to de-initialize host Add an API to group all the tasks to be done to de-initialize host which can then be called by any dwc core based driver implementations while adding .remove() support in their respective drivers. Signed-off-by: Vidya Sagar <vidyas@nvidia.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com>	2019-06-27 12:02:46 +01:00
Rafael J. Wysocki	b51033e06c	PCI: PM/ACPI: Refresh all stale power state data in pci_pm_complete() In pci_pm_complete() there are checks to decide whether or not to resume devices that were left in runtime-suspend during the preceding system-wide transition into a sleep state. They involve checking the current power state of the device and comparing it with the power state of it set before the preceding system-wide transition, but the platform component of the device's power state is not handled correctly in there. Namely, on platforms with ACPI, the device power state information needs to be updated with care, so that the reference counters of power resources used by the device (if any) are set to ensure that the refreshed power state of it will be maintained going forward. To that end, introduce a new ->refresh_state() platform PM callback for PCI devices, for asking the platform to refresh the device power state data and ensure that the corresponding power state will be maintained going forward, make it invoke acpi_device_update_power() (for devices with ACPI PM) on platforms with ACPI and make pci_pm_complete() use it, through a new pci_refresh_power_state() wrapper function. Fixes: a0d2a959d3da (PCI: Avoid unnecessary resume after direct-complete) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>	2019-06-27 12:33:26 +02:00
Mika Westerberg	53b22f900c	PCI / ACPI: Add _PR0 dependent devices If otherwise unrelated PCI devices share ACPI power resources turning them on causes the devices to enter D0uninitialized power state which may cause problems. For example in Intel Ice Lake two root ports (RP0 and RP1), Thunderbolt controller (NHI) and xHCI controller all share power resources as can be ween in the topology below where power resources are marked with []: Host bridge \| +- RP0 ---\ +- RP1 ---\|--+--> [TBT] +- NHI --/ \| \| \| \| v +- xHCI --> [D3C] In a situation where all devices sharing the power resources are in D3cold (the power resources are turned off) and for example the Thunderbolt controller is runtime resumed resulting that the power resources are turned on. This means that the other devices sharing them (RP0, RP1 and xHCI) are transitioned into D0uninitialized state. If they were configured to trigger wake (PME) on a certain event that configuration gets lost after reset so we would need to re-initialize them to get the wakeup working as expected again. To do so we would need to runtime resume all of them to make sure their registers get restored properly before we can runtime suspend them again. Since we just added concept of "_PR0 dependent device" we can solve this by calling the relevant add/remove functions when the PCI device is bind to its ACPI representation. If it has power resources the PCI device will be added as dependent device to them and runtime resumed whenever they are physically turned on. This should make sure PCI core can reconfigure wakes after the device is transitioned into D0uninitialized. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2019-06-27 12:31:57 +02:00
Mika Westerberg	83a16e3f6d	PCI / ACPI: Use cached ACPI device state to get PCI device power state The ACPI power state returned by acpi_device_get_power() may depend on the configuration of ACPI power resources in the system which may change any time after acpi_device_get_power() has returned, unless the reference counters of the ACPI power resources in question are set to prevent that from happening. Thus it is invalid to use acpi_device_get_power() in acpi_pci_get_power_state() the way it is done now and the value of the ->power.state field in the corresponding struct acpi_device objects (which reflects the ACPI power resources reference counting, among other things) should be used instead. As an example where this becomes an issue is Intel Ice Lake where the Thunderbolt controller (NHI), two PCIe root ports (RP0 and RP1) and xHCI all share the same power resources. The following picture with power resources marked with [] shows the topology: Host bridge \| +- RP0 ---\ +- RP1 ---\|--+--> [TBT] +- NHI --/ \| \| \| \| v +- xHCI --> [D3C] Here TBT and D3C are the shared ACPI power resources. ACPI _PR3() method of the devices in question returns either TBT or D3C or both. Say we runtime suspend first the root ports RP0 and RP1, then NHI. Now since the TBT power resource is still on when the root ports are runtime suspended their dev->current_state is set to D3hot. When NHI is runtime suspended TBT is finally turned off but state of the root ports remain to be D3hot. Now when the xHCI is runtime suspended D3C gets also turned off. PCI core thus has power states of these devices cached in their dev->current_state as follows: RP0 -> D3hot RP1 -> D3hot NHI -> D3cold xHCI -> D3cold If the user now runs lspci for instance, the result is all 1's like in the below output (00:07.0 is the first root port, RP0): 00:07.0 PCI bridge: Intel Corporation Device 8a1d (rev ff) (prog-if ff) !!! Unknown header type 7f Kernel driver in use: pcieport In short the hardware state is not in sync with the software state anymore. The exact same thing happens with the PME polling thread which ends up bringing the root ports back into D0 after they are runtime suspended. For this reason, modify acpi_pci_get_power_state() so that it uses the ACPI device power state that was cached by the ACPI core. This makes the PCI device power state match the ACPI device power state regardless of state of the shared power resources which may still be on at this point. Link: https://lore.kernel.org/r/20190618161858.77834-2-mika.westerberg@linux.intel.com Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2019-06-27 12:31:57 +02:00
Rafael J. Wysocki	471a739a47	PCI: PM: Avoid skipping bus-level PM on platforms without ACPI There are platforms that do not call pm_set_suspend_via_firmware(), so pm_suspend_via_firmware() returns 'false' on them, but the power states of PCI devices (PCIe ports in particular) are changed as a result of powering down core platform components during system-wide suspend. Thus the pm_suspend_via_firmware() checks in pci_pm_suspend_noirq() and pci_pm_resume_noirq() introduced by commit 3e26c5feed2a ("PCI: PM: Skip devices in D0 for suspend-to- idle") are not sufficient to determine that devices left in D0 during suspend will remain in D0 during resume and so the bus-level power management can be skipped for them. For this reason, introduce a new global suspend flag, PM_SUSPEND_FLAG_NO_PLATFORM, set it for suspend-to-idle only and replace the pm_suspend_via_firmware() checks mentioned above with checks against this flag. Fixes: 3e26c5feed2a ("PCI: PM: Skip devices in D0 for suspend-to-idle") Reported-by: Jon Hunter <jonathanh@nvidia.com> Tested-by: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>	2019-06-26 23:51:56 +02:00
Bharat Kumar Gogada	181fa434d0	PCI: xilinx-nwl: Fix Multi MSI data programming According to the PCI Local Bus specification Revision 3.0, section 6.8.1.3 (Message Control for MSI), endpoints that are Multiple Message Capable as defined by bits [3:1] in the Message Control for MSI can request a number of vectors that is power of two aligned. As specified in section 6.8.1.6 "Message data for MSI", the Multiple Message Enable field (bits [6:4] of the Message Control register) defines the number of low order message data bits the function is permitted to modify to generate its system software allocated vectors. The MSI controller in the Xilinx NWL PCIe controller supports a number of MSI vectors specified through a bitmap and the hwirq number for an MSI, that is the value written in the MSI data TLP is determined by the bitmap allocation. For instance, in a situation where two endpoints sitting on the PCI bus request the following MSI configuration, with the current PCI Xilinx bitmap allocation code (that does not align MSI vector allocation on a power of two boundary): Endpoint #1: Requesting 1 MSI vector - allocated bitmap bits 0 Endpoint #2: Requesting 2 MSI vectors - allocated bitmap bits [1,2] The bitmap value(s) corresponds to the hwirq number that is programmed into the Message Data for MSI field in the endpoint MSI capability and is detected by the root complex to fire the corresponding MSI irqs. The value written in Message Data for MSI field corresponds to the first bit allocated in the bitmap for Multi MSI vectors. The current Xilinx NWL MSI allocation code allows a bitmap allocation that is not a power of two boundaries, so endpoint #2, is allowed to toggle Message Data bit[0] to differentiate between its two vectors (meaning that the MSI data will be respectively 0x0 and 0x1 for the two vectors allocated to endpoint #2). This clearly aliases with the Endpoint #1 vector allocation, resulting in a broken Multi MSI implementation. Update the code to allocate MSI bitmap ranges with a power of two alignment, fixing the bug. Fixes: ab597d35ef11 ("PCI: xilinx-nwl: Add support for Xilinx NWL PCIe Host Controller") Suggested-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Bharat Kumar Gogada <bharat.kumar.gogada@xilinx.com> [lorenzo.pieralisi@arm.com: updated commit log] Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com>	2019-06-26 10:56:51 +01:00
Rafael J. Wysocki	25bc694a8a	Merge back PCI power management material for v5.3.	2019-06-24 10:11:27 +02:00
Suzuki K Poulose	418e3ea157	bus_find_device: Unify the match callback with class_find_device There is an arbitrary difference between the prototypes of bus_find_device() and class_find_device() preventing their callers from passing the same pair of data and match() arguments to both of them, which is the const qualifier used in the prototype of class_find_device(). If that qualifier is also used in the bus_find_device() prototype, it will be possible to pass the same match() callback function to both bus_find_device() and class_find_device(), which will allow some optimizations to be made in order to avoid code duplication going forward. Also with that, constify the "data" parameter as it is passed as a const to the match function. For this reason, change the prototype of bus_find_device() to match the prototype of class_find_device() and adjust its callers to use the const qualifier in accordance with the new prototype of it. Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Andreas Noever <andreas.noever@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Corey Minyard <minyard@acm.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: David Kershner <david.kershner@unisys.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: David Airlie <airlied@linux.ie> Cc: Felipe Balbi <balbi@kernel.org> Cc: Frank Rowand <frowand.list@gmail.com> Cc: Grygorii Strashko <grygorii.strashko@ti.com> Cc: Harald Freudenberger <freude@linux.ibm.com> Cc: Hartmut Knaack <knaack.h@gmx.de> Cc: Heiko Stuebner <heiko@sntech.de> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jonathan Cameron <jic23@kernel.org> Cc: "James E.J. Bottomley" <jejb@linux.ibm.com> Cc: Len Brown <lenb@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michael Jamet <michael.jamet@intel.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Peter Oberparleiter <oberpar@linux.ibm.com> Cc: Sebastian Ott <sebott@linux.ibm.com> Cc: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Cc: Yehezkel Bernat <YehezkelShB@gmail.com> Cc: rafael@kernel.org Acked-by: Corey Minyard <minyard@acm.org> Acked-by: David Kershner <david.kershner@unisys.com> Acked-by: Mark Brown <broonie@kernel.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Acked-by: Wolfram Sang <wsa@the-dreams.de> # for the I2C parts Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-06-24 05:22:31 +02:00
Linus Torvalds	b253d5f3ec	pci-v5.2-fixes-1 -----BEGIN PGP SIGNATURE----- iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAl0OTC0UHGJoZWxnYWFz QGdvb2dsZS5jb20ACgkQWYigwDrT+vxBkQ//Zye+wY3Md9M3TWHYjj73jeJTmfb/ BQgWU2OsOvGWDgQe7gX2c5sjkJe1S450Mf9CvRYu77z1SHgT/2E25yY3OxV1wNMS UX1xjm91N1/KBnPj2L6ks5exequobVdAkkhUF0WsRB8L09KHe+E/eg8hKNhRA5jK mqNCmRox20LjbkKpKJF2p20ynU8+psFoM3Enm1JRo5UprgXsfFwBJaB75qQBGhN8 SaRBQVMP2vJoghRVeofj2y/cZWSIKEcZ/dOY+q6MMzd2hsRxjiuqRLrm4f6MStdV W+0qr6qd0V6BzxMO+NrCrrhWrkjEb8cqB0F8V/hw3xU8G/17CBEk34HLrqWi2+3D //puT7TjXA8t/awkuz+wH2saDldZU4BfeDgpEriop//jQa30EhXM0RLiUBZofKfS U88qkd4N/CqPbScTe71ve5pUW5WH5kdcBYWHTN5venEW3sxCR13vFtlfDheki/Mc C865E7+ZEep8FhakhGrwiS6MjQrlF1Mzq3BxGviED0Cw92Rz3SEShdp+C0Qk/Av6 5OYUaDfOw6tx92hBz6DtlbTHUNYbXCMv9aXCR2ju1DjDrIkFeIIr8cJwKI64f4LZ EJXIQrEKVWD7QOLe1ebRBlZKV+mEN0q9ZTII2waMcUfZ7GXLLueqYkCKRC8hd805 +5sZtKspodBpl04= =0Qnd -----END PGP SIGNATURE----- Merge tag 'pci-v5.2-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fix from Bjorn Helgaas: "If an IOMMU is present, ignore the P2PDMA whitelist we added for v5.2 because we don't yet know how to support P2PDMA in that case (Logan Gunthorpe)" * tag 'pci-v5.2-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: PCI/P2PDMA: Ignore root complex whitelist when an IOMMU is present	2019-06-22 09:42:29 -07:00
David S. Miller	92ad6325cb	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Minor SPDX change conflict. Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-22 08:59:24 -04:00
Heiner Kallweit	4cfd218855	PCI: let pci_disable_link_state propagate errors Drivers may rely on pci_disable_link_state() having disabled certain ASPM link states. If OS can't control ASPM then pci_disable_link_state() turns into a no-op w/o informing the caller. The driver therefore may falsely assume the respective ASPM link states are disabled. Let pci_disable_link_state() propagate errors to the caller, enabling the caller to react accordingly. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-21 22:05:42 -04:00
Benjamin Herrenschmidt	7ac0d094fb	PCI: Don't auto-realloc if we're preserving firmware config Prevent auto-enabling of bridges reallocation when the FW tells us that the initial configuration must be preserved for a given host bridge. Link: https://lore.kernel.org/r/20190615002359.29577-3-benh@kernel.crashing.org Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2019-06-21 18:11:53 -05:00
Marek Vasut	dc6b698a86	PCI: sysfs: Ignore lockdep for remove attribute With CONFIG_PROVE_LOCKING=y, using sysfs to remove a bridge with a device below it causes a lockdep warning, e.g., # echo 1 > /sys/class/pci_bus/0000:00/device/0000:00:00.0/remove ============================================ WARNING: possible recursive locking detected ... pci_bus 0000:01: busn_res: [bus 01] is released The remove recursively removes the subtree below the bridge. Each call uses a different lock so there's no deadlock, but the locks were all created with the same lockdep key so the lockdep checker can't tell them apart. Mark the "remove" sysfs attribute with __ATTR_IGNORE_LOCKDEP() as it is safe to ignore the lockdep check between different "remove" kernfs instances. There's discussion about a similar issue in USB at [1], which resulted in 356c05d58af0 ("sysfs: get rid of some lockdep false positives") and e9b526fe7048 ("i2c: suppress lockdep warning on delete_device"), which do basically the same thing for USB "remove" and i2c "delete_device" files. [1] https://lore.kernel.org/r/Pine.LNX.4.44L0.1204251436140.1206-100000@iolanthe.rowland.org Link: https://lore.kernel.org/r/20190526225151.3865-1-marek.vasut@gmail.com Signed-off-by: Marek Vasut <marek.vasut+renesas@gmail.com> [bhelgaas: trim commit log, details at above links] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Phil Edworthy <phil.edworthy@renesas.com> Cc: Simon Horman <horms+renesas@verge.net.au> Cc: Tejun Heo <tj@kernel.org> Cc: Wolfram Sang <wsa@the-dreams.de>	2019-06-21 09:40:13 -05:00
Manikanta Maddireddy	2d8c736158	PCI: tegra: Put PEX CLK & BIAS pads in DPD mode In Tegra210 AFI design has clamp value for the BIAS pad as 0, which keeps the bias pad in non power down mode. This is leading to power consumption of 2 mW in BIAS pad, even if the PCIe partition is powergated. To avoid unnecessary power consumption, put PEX CLK & BIAS pads in deep power down mode when PCIe partition is power gated. Signed-off-by: Manikanta Maddireddy <mmaddireddy@nvidia.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Thierry Reding <treding@nvidia.com>	2019-06-20 17:40:48 +01:00
Manikanta Maddireddy	adb2653b3d	PCI: tegra: Add AFI_PEX2_CTRL reg offset as part of SoC struct Tegra186 and Tegra30 have three PCIe root ports. AFI_PEX2_CTRL register is defined for third root port. Offset of this register in Tegra186 is different from Tegra30, so add the offset as part of SoC data structure. Signed-off-by: Manikanta Maddireddy <mmaddireddy@nvidia.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Thierry Reding <treding@nvidia.com>	2019-06-20 17:40:48 +01:00
Manikanta Maddireddy	c894121d01	PCI: tegra: Change PRSNT_SENSE IRQ log to debug PRSNT_MAP bit field is programmed to update the slot present status. PRSNT_SENSE IRQ is triggered when this bit field is programmed, which is not an error. Add a new if condition to trap PRSNT_SENSE code and print it with debug log level. Signed-off-by: Manikanta Maddireddy <mmaddireddy@nvidia.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Thierry Reding <treding@nvidia.com>	2019-06-20 17:40:48 +01:00
Manikanta Maddireddy	b5b4717ea0	PCI: tegra: Program AFI_CACHE_BAR_{0,1}_{ST,SZ} registers only for Tegra20 Cacheable upstream transactions are supported in Tegra20 and Tegra186 only. AFI_CACHE_BAR_{0,1}_{ST,SZ} registers are available in Tegra20 to support cacheable upstream transactions. In Tegra186, AFI_AXCACHE register is defined instead of AFI_CACHE_BAR_{0,1}_{ST,SZ} to be in line with its memory subsystem design. Therefore, program AFI_CACHE_BAR_{0,1}_{ST,SZ} registers only for Tegra20. Signed-off-by: Manikanta Maddireddy <mmaddireddy@nvidia.com> [lorenzo.pieralisi@arm.com: updated commit log] Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Thierry Reding <treding@nvidia.com>	2019-06-20 17:40:48 +01:00
Manikanta Maddireddy	eef4a35026	PCI: tegra: Fix PLLE power down issue due to CLKREQ# signal Disable controllers which failed to bring the link up and configure CLKREQ# signals of these controllers as GPIO. This is required to avoid CLKREQ# signal of inactive controllers interfering with PLLE power down sequence. PCIE_CLKREQ_GPIO bits are defined only in Tegra186, however programming these bits in other SoCs doesn't cause any side effects. Program these bits for all Tegra SoCs to avoid a conditional check. Signed-off-by: Manikanta Maddireddy <mmaddireddy@nvidia.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Thierry Reding <treding@nvidia.com>	2019-06-20 17:23:05 +01:00
Manikanta Maddireddy	c23ae2aec5	PCI: tegra: Set target speed as Gen1 before starting LTSSM PCIe link up fails with few legacy endpoints if root port advertises both Gen-1 and Gen-2 speeds in Tegra. This is because link number negotiation fails if both Gen1 & Gen2 are advertised. Tegra doesn't retry link up by advertising only Gen1. Hence, the strategy followed here is to initially advertise only Gen-1 and after link is up, retrain link to Gen-2 speed. Tegra doesn't support HW autonomous speed change. Link comes up in Gen1 even if Gen2 is advertised, so there is no downside of this change. This behavior is observed with following two PCIe devices on Tegra: - Fusion HDTV 5 Express card - IOGear SIL - PCIE - SATA card Signed-off-by: Manikanta Maddireddy <mmaddireddy@nvidia.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Thierry Reding <treding@nvidia.com>	2019-06-20 17:22:27 +01:00
Manikanta Maddireddy	9f570b6c24	PCI: tegra: Update flow control timer frequency in Tegra210 Recommended UpdateFC threshold in Tegra210 is 0x60 for best performance of x1 link. Setting this to 0x60 provides the best balance between number of UpdateFC packets and read data sent over the link. UpdateFC timer frequency is equal to twice the value of register content in nsec, i.e (2 * 0x60) = 192 nsec. Signed-off-by: Manikanta Maddireddy <mmaddireddy@nvidia.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Thierry Reding <treding@nvidia.com>	2019-06-20 17:22:12 +01:00
Manikanta Maddireddy	191cd6fb5d	PCI: tegra: Add SW fixup for RAW violations The logic which blocks read requests till AFI gets ACK for all outstanding writes from memory controller does not behave correctly when number of outstanding writes become more than 32 in Tegra124 and Tegra132. SW fixup is to prevent writes from accumulating more than 32 by: - limiting outstanding posted writes to 14 - modifying Gen1 and Gen2 UpdateFC timer frequency UpdateFC timer frequency is equal to twice the value of register content in nsec. These settings are recommended after stress testing with different values. Signed-off-by: Manikanta Maddireddy <mmaddireddy@nvidia.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Thierry Reding <treding@nvidia.com>	2019-06-20 17:21:52 +01:00
Manikanta Maddireddy	b2634cd0d2	PCI: tegra: Increase the deskew retry time Sometimes link speed change from Gen2 to Gen1 fails due to instability in deskew logic on lane-0 in Tegra210. Increase the deskew retry time to resolve this issue. Signed-off-by: Manikanta Maddireddy <mmaddireddy@nvidia.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Thierry Reding <treding@nvidia.com>	2019-06-20 17:21:44 +01:00
Manikanta Maddireddy	f1178099a6	PCI: tegra: Enable PCIe xclk clock clamping Enable xclk clock clamping when entering L1. Clamp threshold will determine the time spent waiting for clock module to turn on xclk after signaling it. Default threshold value in Tegra124 and Tegra210 is not enough to turn on xclk clock. Increase the clamp threshold to meet the clock module timing in Tegra124 and Tegra210. Default threshold value is enough in Tegra20, Tegra30 and Tegra186. Signed-off-by: Manikanta Maddireddy <mmaddireddy@nvidia.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Thierry Reding <treding@nvidia.com>	2019-06-20 17:21:31 +01:00
Manikanta Maddireddy	52db2fd89e	PCI: tegra: Process pending DLL transactions before entering L1 or L2 PM message are truncated while entering L1 or L2, which is resulting in receiver errors. Set the required bit to finish processing DLLP before link enter L1 or L2. Signed-off-by: Manikanta Maddireddy <mmaddireddy@nvidia.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Thierry Reding <treding@nvidia.com>	2019-06-20 17:21:21 +01:00
Manikanta Maddireddy	92bd94f1fd	PCI: tegra: Disable AFI dynamic clock gating Outstanding write counter in AFI is used to generate idle signal to dynamically gate the AFI clock. When there are 32 outstanding writes from AFI to memory, the outstanding write counter overflows and indicates that there are "0" outstanding write transactions. When memory controller is under heavy load, write completions to AFI gets delayed and AFI write counter overflows. This causes AFI clock gating even when there are outstanding transactions towards memory controller resulting in a system hang. Disable dynamic clock gating of AFI clock to avoid system hang. CLKEN_OVERRIDE bit is not defined in Tegra20 and Tegra30, however programming this bit doesn't cause any side effects. Program this bit for all Tegra SoCs to avoid conditional check. Signed-off-by: Manikanta Maddireddy <mmaddireddy@nvidia.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Thierry Reding <treding@nvidia.com>	2019-06-20 17:20:51 +01:00
Manikanta Maddireddy	7763cc24e2	PCI: tegra: Enable opportunistic UpdateFC and ACK Enable opportunistic UpdateFC and ACK to allow data link layer send pending ACKs and UpdateFC packets when link is idle instead of waiting for timers to expire. This improves the PCIe performance due to better utilization of PCIe bandwidth. Signed-off-by: Manikanta Maddireddy <mmaddireddy@nvidia.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Thierry Reding <treding@nvidia.com>	2019-06-20 17:20:40 +01:00
Manikanta Maddireddy	2513a4ee47	PCI: tegra: Program UPHY electrical settings for Tegra210 UPHY electrical programming guidelines are documented in Tegra210 TRM. Program these electrical settings for proper eye diagram in Gen1 and Gen2 link speeds. Signed-off-by: Manikanta Maddireddy <mmaddireddy@nvidia.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Thierry Reding <treding@nvidia.com>	2019-06-20 17:20:32 +01:00
Manikanta Maddireddy	c635a815c8	PCI: tegra: Advertise PCIe Advanced Error Reporting (AER) capability Default root port setting hides AER capability. This patch enables the advertisement of AER capability by root port. Signed-off-by: Manikanta Maddireddy <mmaddireddy@nvidia.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Thierry Reding <treding@nvidia.com>	2019-06-20 17:20:25 +01:00
Manikanta Maddireddy	538123a29a	PCI: tegra: Add PCIe Gen2 link speed support Tegra124, Tegra132, Tegra210 and Tegra186 support Gen2 link speed. After PCIe link is up in Gen1, set target link speed as Gen2 and retrain link. Link switches to Gen2 speed if Gen2 capable end point is connected, otherwise the link stays in Gen1. Per PCIe 4.0r0.9 sec 7.6.3.7 implementation note, driver needs to wait for PCIe LTSSM to come back from recovery before retraining the link. Signed-off-by: Manikanta Maddireddy <mmaddireddy@nvidia.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Thierry Reding <treding@nvidia.com>	2019-06-20 17:19:47 +01:00
Manikanta Maddireddy	d1f9113faf	PCI: tegra: Fix PCIe host power up sequence The PCIe host power up sequence requires to program AFI(AXI to FPCI bridge) registers first and then PCIe registers, otherwise AFI register settings may not latch to PCIe IP. PCIe root port starts LTSSM as soon as PCIe xrst is deasserted. So deassert PCIe xrst after programming PCIe registers. Modify PCIe power up sequence as follows: - Power ungate PCIe partition - Enable AFI clock - Deassert AFI reset - Program AFI registers - Enable PCIe clock - Deassert PCIe reset - Program PCIe PHY - Program PCIe pad control registers - Program PCIe root port registers - Deassert PCIe xrst to start LTSSM Signed-off-by: Manikanta Maddireddy <mmaddireddy@nvidia.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Thierry Reding <treding@nvidia.com>	2019-06-20 17:18:50 +01:00
Manikanta Maddireddy	316b9ef1ee	PCI: tegra: Mask AFI_INTR in runtime suspend AFI_INTR is unmasked in tegra_pcie_enable_controller(), mask it to avoid unwanted interrupts raised by AFI after pex_rst is asserted. The following sequence triggers such scenario: - tegra_pcie_remove() triggers runtime suspend - pex_rst is asserted in runtime suspend - PRSNT_MAP bit field in RP_PRIV_MISC register changes from EP_PRSNT to EP_ABSNT - This is sensed by AFI and triggers "Slot present pin change" interrupt - tegra_pcie_isr() function accesses AFI register when runtime suspend is going through power off sequence Resulting faulty backtrace: rmmod pci-tegra pci_generic_config_write32: 108 callbacks suppressed pci_bus 0002:00: 2-byte config write to 0002:00:02.0 offset 0x4c may corrupt adjacent RW1C bits pci_bus 0002:00: 2-byte config write to 0002:00:02.0 offset 0x9c may corrupt adjacent RW1C bits pci_bus 0002:00: 2-byte config write to 0002:00:02.0 offset 0x88 may corrupt adjacent RW1C bits pci_bus 0002:00: 2-byte config write to 0002:00:02.0 offset 0x90 may corrupt adjacent RW1C bits pci_bus 0002:00: 2-byte config write to 0002:00:02.0 offset 0x4 may corrupt adjacent RW1C bits igb 0002:04:00.1: removed PHC on enP2p4s0f1 igb 0002:04:00.0: removed PHC on enP2p4s0f0 pci_bus 0002:00: 2-byte config write to 0002:00:01.0 offset 0x4c may corrupt adjacent RW1C bits pci_bus 0002:00: 2-byte config write to 0002:00:01.0 offset 0x9c may corrupt adjacent RW1C bits pci_bus 0002:00: 2-byte config write to 0002:00:01.0 offset 0x88 may corrupt adjacent RW1C bits pci_bus 0002:00: 2-byte config write to 0002:00:01.0 offset 0x90 may corrupt adjacent RW1C bits pci_bus 0002:00: 2-byte config write to 0002:00:01.0 offset 0x4 may corrupt adjacent RW1C bits rcu: INFO: rcu_preempt self-detected stall on CPU SError Interrupt on CPU0, code 0xbf000002 -- SError CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 5.1.0-rc3-next-20190405-00027-gcd8110499e6f-dirty #42 Hardware name: NVIDIA Jetson TX1 Developer Kit (DT) pstate: 20000085 (nzCv daIf -PAN -UAO) pc : tegra_pcie_isr+0x58/0x178 [pci_tegra] lr : tegra_pcie_isr+0x40/0x178 [pci_tegra] sp : ffff000010003da0 x29: ffff000010003da0 x28: 0000000000000000 x27: ffff8000f9e61000 x26: ffff000010fbf420 x25: ffff000011427f93 x24: ffff8000fa600410 x23: ffff00001129d000 x22: ffff00001129d000 x21: ffff8000f18bf3c0 x20: 0000000000000070 x19: 00000000ffffffff x18: 0000000000000000 x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 x14: ffff000008d40a48 x13: ffff000008d40a30 x12: ffff000008d40a20 x11: ffff000008d40a10 x10: ffff000008d40a00 x9 : ffff000008d409e8 x8 : ffff000008d40ae8 x7 : ffff000008d40ad0 x6 : ffff000010003e58 x5 : ffff8000fac00248 x4 : 0000000000000000 x3 : ffff000008d40b08 x2 : fffffffffffffff8 x1 : ffff000008d3f4e8 x0 : 00000000ffffffff Kernel panic - not syncing: Asynchronous SError Interrupt CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 5.1.0-rc3-next-20190405-00027-gcd8110499e6f-dirty #42 Hardware name: NVIDIA Jetson TX1 Developer Kit (DT) Call trace: dump_backtrace+0x0/0x158 show_stack+0x14/0x20 dump_stack+0xa8/0xcc panic+0x140/0x2f4 nmi_panic+0x6c/0x70 arm64_serror_panic+0x74/0x80 __pte_error+0x0/0x28 el1_error+0x84/0xf8 tegra_pcie_isr+0x58/0x178 [pci_tegra] __handle_irq_event_percpu+0x70/0x198 handle_irq_event_percpu+0x34/0x88 handle_irq_event+0x48/0x78 handle_fasteoi_irq+0xb4/0x190 generic_handle_irq+0x24/0x38 __handle_domain_irq+0x5c/0xb8 gic_handle_irq+0x58/0xa8 el1_irq+0xb8/0x180 cpuidle_enter_state+0x138/0x358 cpuidle_enter+0x18/0x20 call_cpuidle+0x1c/0x48 do_idle+0x230/0x2d0 cpu_startup_entry+0x20/0x28 rest_init+0xd4/0xe0 arch_call_rest_init+0xc/0x14 start_kernel+0x444/0x470 AFI_INTR is re-enabled on resume in tegra_pcie_pm_resume() through tegra_pcie_enable_controller(). Signed-off-by: Manikanta Maddireddy <mmaddireddy@nvidia.com> [lorenzo.pieralisi@arm.com: updated log] Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Thierry Reding <treding@nvidia.com>	2019-06-20 17:14:28 +01:00
Manikanta Maddireddy	973d7499c5	PCI: tegra: Rearrange Tegra PCIe driver functions Tegra PCIe has register specifications for: - AXI to FPCI(AFI) bridge - Multiple PCIe root ports - PCIe PHY - PCIe pad control Rearrange Tegra PCIe driver functions so that each function programs the required module only. - tegra_pcie_enable_controller(): Program AFI module and enable PCIe controller - tegra_pcie_phy_power_on(): Bring up PCIe PHY - tegra_pcie_apply_pad_settings(): Program PCIe REFCLK pad settings - tegra_pcie_enable_ports(): Program each root port and bring up PCIe link Signed-off-by: Manikanta Maddireddy <mmaddireddy@nvidia.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Thierry Reding <treding@nvidia.com>	2019-06-20 17:12:56 +01:00
Manikanta Maddireddy	1056dda8a8	PCI: tegra: Handle failure cases in tegra_pcie_power_on() Unroll the PCIe power on sequence if any one of the steps fails in tegra_pcie_power_on(). Signed-off-by: Manikanta Maddireddy <mmaddireddy@nvidia.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Thierry Reding <treding@nvidia.com>	2019-06-20 17:12:45 +01:00
Logan Gunthorpe	6dbbd053e6	PCI/P2PDMA: Ignore root complex whitelist when an IOMMU is present Presently, there is no path to DMA map P2PDMA memory, so if a TLP targeting this memory hits the root complex and an IOMMU is present, the IOMMU will reject the transaction, even if the RC would support P2PDMA. So until the kernel knows to map these DMA addresses in the IOMMU, we should not enable the whitelist when an IOMMU is present. Link: https://lore.kernel.org/linux-pci/20190522201252.2997-1-logang@deltatee.com/ Fixes: 0f97da831026 ("PCI/P2PDMA: Allow P2P DMA between any devices under AMD ZEN Root Complex") Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Christoph Hellwig <hch@lst.de>	2019-06-19 16:43:42 -05:00
Linus Torvalds	abf02e2964	Power management fix for 5.2-rc6 Prevent PCI bridges in general (and PCIe ports in particular) from being put into low-power states during system-wide suspend transitions if there are any devices in D0 below them and refine the handling of PCI devices in D0 during suspend-to-idle cycles. -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAl0KAdUSHHJqd0Byand5 c29ja2kubmV0AAoJEILEb/54YlRxejIQAIoR8FCLoKcxD4wJ6sDp5CtGVaw65pc9 i0WaTGlQWiBcr3bkxCRERl+NNjolVrUu7aAVrrUNe5SUQduFXZuGsreF0q3SPMUh OZwSb+EpN6gSM3GTjrsF2P9nyvlJ80r5t9HI6vG1hAEFBU8T15gGVS6bnwm4ci7I +KuIb4zwkOQ+LCwjqwkGjn6s4ZHmx2KxGnI58GBTAd4KsvV3G7QIaa7Lfa/js88C pDhz8BiQqs/HTU0gHY52hsEvhKPeefMKH3QDpBFhoR0p1ZOkMoqK1jA+Kc5r/JF+ 36Fj/rPlD26pmqYMZA7bZi4Ij0M+vR8SWCdefcvzPqUZpzkHh9y7/foi01DVNjsf QGhlJODgGUl78mjEQwdPXz/ntzj4DEyo/3Re9Xf/SZ09sMeoyhbNi5Qolri05LqV 8hAshCNcFLbOzF1emcAa+Yq76tggWnW78q3oKAsfUqg4Olyvcbxy6J3GDRpzTwPz D/4lEM7jtSqbcgprqWUcANB/zE3Jw93et0QtQNUfdOJ6a+LsS2XAhqenkQn2JQpk 7ZjaVfNNm3YDQlKt6nPaWCCxVv/g6KHSYXDeWB5VJpCOrSVhdXAZgPU+UCGrk9TU 3TqdqFoKi0LVZJVuWT+oyNfwzfolGZ7gd7TJVndFxVM8kbcLTzrj1ZQgpwP1l/tI Xs12WM7cw1dy =n3GG -----END PGP SIGNATURE----- Merge tag 'pm-5.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fix from Rafael Wysocki: "Prevent PCI bridges in general (and PCIe ports in particular) from being put into low-power states during system-wide suspend transitions if there are any devices in D0 below them and refine the handling of PCI devices in D0 during suspend-to-idle cycles" * tag 'pm-5.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PCI: PM: Skip devices in D0 for suspend-to-idle	2019-06-19 11:44:04 -07:00
Mika Westerberg	000dd5316e	PCI: Do not poll for PME if the device is in D3cold PME polling does not take into account that a device that is directly connected to the host bridge may go into D3cold as well. This leads to a situation where the PME poll thread reads from a config space of a device that is in D3cold and gets incorrect information because the config space is not accessible. Here is an example from Intel Ice Lake system where two PCIe root ports are in D3cold (I've instrumented the kernel to log the PMCSR register contents): [ 62.971442] pcieport 0000:00:07.1: Check PME status, PMCSR=0xffff [ 62.971504] pcieport 0000:00:07.0: Check PME status, PMCSR=0xffff Since 0xffff is interpreted so that PME is pending, the root ports will be runtime resumed. This repeats over and over again essentially blocking all runtime power management. Prevent this from happening by checking whether the device is in D3cold before its PME status is read. Fixes: 71a83bd727cc ("PCI/PM: add runtime PM support to PCIe port") Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: Lukas Wunner <lukas@wunner.de> Cc: 3.6+ <stable@vger.kernel.org> # v3.6+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2019-06-18 01:40:41 +02:00
Mika Westerberg	c2bf1fc212	PCI: Add missing link delays required by the PCIe spec Currently Linux does not follow PCIe spec regarding the required delays after reset. A concrete example is a Thunderbolt add-in-card that consists of a PCIe switch and two PCIe endpoints: +-1b.0-[01-6b]----00.0-[02-6b]--+-00.0-[03]----00.0 TBT controller +-01.0-[04-36]-- DS hotplug port +-02.0-[37]----00.0 xHCI controller \-04.0-[38-6b]-- DS hotplug port The root port (1b.0) and the PCIe switch downstream ports are all PCIe gen3 so they support 8GT/s link speeds. We wait for the PCIe hierarchy to enter D3cold (runtime): pcieport 0000:00:1b.0: power state changed by ACPI to D3cold When it wakes up from D3cold, according to the PCIe 4.0 section 5.8 the PCIe switch is put to reset and its power is re-applied. This means that we must follow the rules in PCIe 4.0 section 6.6.1. For the PCIe gen3 ports we are dealing with here, the following applies: With a Downstream Port that supports Link speeds greater than 5.0 GT/s, software must wait a minimum of 100 ms after Link training completes before sending a Configuration Request to the device immediately below that Port. Software can determine when Link training completes by polling the Data Link Layer Link Active bit or by setting up an associated interrupt (see Section 6.7.3.3). Translating this into the above topology we would need to do this (DLLLA stands for Data Link Layer Link Active): pcieport 0000:00:1b.0: wait for 100ms after DLLLA is set before access to 0000:01:00.0 pcieport 0000:02:00.0: wait for 100ms after DLLLA is set before access to 0000:03:00.0 pcieport 0000:02:02.0: wait for 100ms after DLLLA is set before access to 0000:37:00.0 I've instrumented the kernel with additional logging so we can see the actual delays the kernel performs: pcieport 0000:00:1b.0: power state changed by ACPI to D0 pcieport 0000:00:1b.0: waiting for D3cold delay of 100 ms pcieport 0000:00:1b.0: waking up bus pcieport 0000:00:1b.0: waiting for D3hot delay of 10 ms pcieport 0000:00:1b.0: restoring config space at offset 0x2c (was 0x60, writing 0x60) ... pcieport 0000:00:1b.0: PME# disabled pcieport 0000:01:00.0: restoring config space at offset 0x3c (was 0x1ff, writing 0x201ff) ... pcieport 0000:01:00.0: PME# disabled pcieport 0000:02:00.0: restoring config space at offset 0x3c (was 0x1ff, writing 0x201ff) ... pcieport 0000:02:00.0: PME# disabled pcieport 0000:02:01.0: restoring config space at offset 0x3c (was 0x1ff, writing 0x201ff) ... pcieport 0000:02:01.0: restoring config space at offset 0x4 (was 0x100000, writing 0x100407) pcieport 0000:02:01.0: PME# disabled pcieport 0000:02:02.0: restoring config space at offset 0x3c (was 0x1ff, writing 0x201ff) ... pcieport 0000:02:02.0: PME# disabled pcieport 0000:02:04.0: restoring config space at offset 0x3c (was 0x1ff, writing 0x201ff) ... pcieport 0000:02:04.0: PME# disabled pcieport 0000:02:01.0: PME# enabled pcieport 0000:02:01.0: waiting for D3hot delay of 10 ms pcieport 0000:02:04.0: PME# enabled pcieport 0000:02:04.0: waiting for D3hot delay of 10 ms thunderbolt 0000:03:00.0: restoring config space at offset 0x14 (was 0x0, writing 0x8a040000) ... thunderbolt 0000:03:00.0: PME# disabled xhci_hcd 0000:37:00.0: restoring config space at offset 0x10 (was 0x0, writing 0x73f00000) ... xhci_hcd 0000:37:00.0: PME# disabled For the switch upstream port (01:00.0) we wait for 100ms but not taking into account the DLLLA requirement. We then wait 10ms for D3hot -> D0 transition of the root port and the two downstream hotplug ports. This means that we deviate from what the spec requires. Performing the same check for system sleep (s2idle) transitions we can see following when resuming from s2idle: pcieport 0000:00:1b.0: power state changed by ACPI to D0 pcieport 0000:00:1b.0: restoring config space at offset 0x2c (was 0x60, writing 0x60) ... pcieport 0000:01:00.0: restoring config space at offset 0x3c (was 0x1ff, writing 0x201ff) ... pcieport 0000:02:02.0: restoring config space at offset 0x3c (was 0x1ff, writing 0x201ff) pcieport 0000:02:02.0: restoring config space at offset 0x2c (was 0x0, writing 0x0) pcieport 0000:02:01.0: restoring config space at offset 0x3c (was 0x1ff, writing 0x201ff) pcieport 0000:02:04.0: restoring config space at offset 0x3c (was 0x1ff, writing 0x201ff) pcieport 0000:02:02.0: restoring config space at offset 0x28 (was 0x0, writing 0x0) pcieport 0000:02:00.0: restoring config space at offset 0x3c (was 0x1ff, writing 0x201ff) pcieport 0000:02:02.0: restoring config space at offset 0x24 (was 0x10001, writing 0x1fff1) pcieport 0000:02:01.0: restoring config space at offset 0x2c (was 0x0, writing 0x60) pcieport 0000:02:02.0: restoring config space at offset 0x20 (was 0x0, writing 0x73f073f0) pcieport 0000:02:04.0: restoring config space at offset 0x2c (was 0x0, writing 0x60) pcieport 0000:02:01.0: restoring config space at offset 0x28 (was 0x0, writing 0x60) pcieport 0000:02:00.0: restoring config space at offset 0x2c (was 0x0, writing 0x0) pcieport 0000:02:02.0: restoring config space at offset 0x1c (was 0x101, writing 0x1f1) pcieport 0000:02:04.0: restoring config space at offset 0x28 (was 0x0, writing 0x60) pcieport 0000:02:01.0: restoring config space at offset 0x24 (was 0x10001, writing 0x1ff10001) pcieport 0000:02:00.0: restoring config space at offset 0x28 (was 0x0, writing 0x0) pcieport 0000:02:02.0: restoring config space at offset 0x18 (was 0x0, writing 0x373702) pcieport 0000:02:04.0: restoring config space at offset 0x24 (was 0x10001, writing 0x49f12001) pcieport 0000:02:01.0: restoring config space at offset 0x20 (was 0x0, writing 0x73e05c00) pcieport 0000:02:00.0: restoring config space at offset 0x24 (was 0x10001, writing 0x1fff1) pcieport 0000:02:04.0: restoring config space at offset 0x20 (was 0x0, writing 0x89f07400) pcieport 0000:02:01.0: restoring config space at offset 0x1c (was 0x101, writing 0x5151) pcieport 0000:02:00.0: restoring config space at offset 0x20 (was 0x0, writing 0x8a008a00) pcieport 0000:02:02.0: restoring config space at offset 0xc (was 0x10000, writing 0x10020) pcieport 0000:02:04.0: restoring config space at offset 0x1c (was 0x101, writing 0x6161) pcieport 0000:02:01.0: restoring config space at offset 0x18 (was 0x0, writing 0x360402) pcieport 0000:02:00.0: restoring config space at offset 0x1c (was 0x101, writing 0x1f1) pcieport 0000:02:04.0: restoring config space at offset 0x18 (was 0x0, writing 0x6b3802) pcieport 0000:02:02.0: restoring config space at offset 0x4 (was 0x100000, writing 0x100407) pcieport 0000:02:00.0: restoring config space at offset 0x18 (was 0x0, writing 0x30302) pcieport 0000:02:01.0: restoring config space at offset 0xc (was 0x10000, writing 0x10020) pcieport 0000:02:04.0: restoring config space at offset 0xc (was 0x10000, writing 0x10020) pcieport 0000:02:00.0: restoring config space at offset 0xc (was 0x10000, writing 0x10020) pcieport 0000:02:01.0: restoring config space at offset 0x4 (was 0x100000, writing 0x100407) pcieport 0000:02:04.0: restoring config space at offset 0x4 (was 0x100000, writing 0x100407) pcieport 0000:02:00.0: restoring config space at offset 0x4 (was 0x100000, writing 0x100407) xhci_hcd 0000:37:00.0: restoring config space at offset 0x10 (was 0x0, writing 0x73f00000) ... thunderbolt 0000:03:00.0: restoring config space at offset 0x14 (was 0x0, writing 0x8a040000) This is even worse. None of the mandatory delays are performed. If this would be S3 instead of s2idle then according to PCI FW spec 3.2 section 4.6.8. there is a specific _DSM that allows the OS to skip the delays but this platform does not provide the _DSM and does not go to S3 anyway so no firmware is involved that could already handle these delays. In this particular Intel Coffee Lake platform these delays are not actually needed because there is an additional delay as part of the ACPI power resource that is used to turn on power to the hierarchy but since that additional delay is not required by any of standards (PCIe, ACPI) it is not present in the Intel Ice Lake, for example where missing the mandatory delays causes pciehp to start tearing down the stack too early (links are not yet trained). For this reason, change the PCIe portdrv PM resume hooks so that they perform the mandatory delays before the downstream component gets resumed. We perform the delays before port services are resumed because otherwise pciehp might find that the link is not up (even if it is just training) and tears-down the hierarchy. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2019-06-18 01:40:41 +02:00
Ley Foon Tan	7a28db0a25	PCI: altera: Fix configuration type based on secondary number Stratix 10 PCIe controller does not support Type 1 to Type 0 conversion as previous version (V1) does so the PCIe controller configuration mechanism needs to send Type 0 config TLP if the target bus number matches with the secondary bus number. Implement a function to form a TLP header that depends on the PCIe controller version, so that the header can be formed according to specific host controller HW internals, fixing the type conversion issue. Signed-off-by: Ley Foon Tan <ley.foon.tan@intel.com> [lorenzo.pieralisi@arm.com: commit log] Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>	2019-06-17 12:22:25 +01:00
Miquel Raynal	c369b536f8	PCI: armada8k: Add PHYs support Bring PHY support for the Armada8k driver. The Armada8k IP only supports x1, x2 or x4 link widths. Iterate over the DT 'phys' entries and configure them one by one. Use phy_set_mode_ext() to make use of the submode parameter (initially introduced for Ethernet modes). For PCI configuration, let the submode be the width (1, 2, 4, etc) so that the PHY driver knows how many lanes are bundled. Do not error out in case of error for compatibility reasons. Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>	2019-06-17 12:15:07 +01:00
Rafael J. Wysocki	0c7376ada9	PCI: PM: Replace pci_dev_keep_suspended() with two functions The code in pci_dev_keep_suspended() is relatively hard to follow due to the negative checks in it and in its callers and the function has a possible side-effect (disabling the PME) which doesn't really match its role. For this reason, move the PME disabling from pci_dev_keep_suspended() to a separate function and change the semantics (and name) of the rest of it, so that 'true' is returned when the device needs to be resumed (and not the other way around). Change the callers of pci_dev_keep_suspended() accordingly. While at it, make the code flow in pci_pm_poweroff() reflect the pci_pm_suspend() more closely to avoid arbitrary differences between them. This is a cosmetic change with no intention to alter behavior. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>	2019-06-17 12:30:24 +02:00
Rafael J. Wysocki	234f223d63	PCI: PM: Avoid resuming devices in D3hot during system suspend The current code resumes devices in D3hot during system suspend if the target power state for them is D3cold, but that is not necessary in general. It only is necessary to do that if the platform firmware requires the device to be resumed, but that should be covered by the platform_pci_need_resume() check anyway, so rework pci_dev_keep_suspended() to avoid returning 'false' for devices in D3hot which need not be resumed due to platform firmware requirements. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>	2019-06-17 12:30:24 +02:00
Dan Williams	50f44ee724	mm/devm_memremap_pages: fix final page put race Logan noticed that devm_memremap_pages_release() kills the percpu_ref drops all the page references that were acquired at init and then immediately proceeds to unplug, arch_remove_memory(), the backing pages for the pagemap. If for some reason device shutdown actually collides with a busy / elevated-ref-count page then arch_remove_memory() should be deferred until after that reference is dropped. As it stands the "wait for last page ref drop" happens after devm_memremap_pages_release() returns, which is obviously too late and can lead to crashes. Fix this situation by assigning the responsibility to wait for the percpu_ref to go idle to devm_memremap_pages() with a new ->cleanup() callback. Implement the new cleanup callback for all devm_memremap_pages() users: pmem, devdax, hmm, and p2pdma. Link: http://lkml.kernel.org/r/155727339156.292046.5432007428235387859.stgit@dwillia2-desk3.amr.corp.intel.com Fixes: 41e94a851304 ("add devm_memremap_pages") Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reported-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-06-13 17:34:56 -10:00
Dan Williams	1570175abd	PCI/P2PDMA: track pgmap references per resource, not globally In preparation for fixing a race between devm_memremap_pages_release() and the final put of a page from the device-page-map, allocate a percpu-ref per p2pdma resource mapping. Link: http://lkml.kernel.org/r/155727338646.292046.9922678317501435597.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-06-13 17:34:56 -10:00
Dan Williams	e615a19121	PCI/P2PDMA: fix the gen_pool_add_virt() failure path The pci_p2pdma_add_resource() implementation immediately frees the pgmap if gen_pool_add_virt() fails. However, that means that when @dev triggers a devres release devm_memremap_pages_release() will crash trying to access the freed @pgmap. Use the new devm_memunmap_pages() to manually free the mapping in the error path. Link: http://lkml.kernel.org/r/155727337603.292046.13101332703665246702.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Fixes: 52916982af48 ("PCI/P2PDMA: Support peer-to-peer memory") Reviewed-by: Ira Weiny <ira.weiny@intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-06-13 17:34:56 -10:00

... 3 4 5 6 7 ...

7435 Commits