From eeee3b5e6d0bf331befa57b4dcb079f827bcd829 Mon Sep 17 00:00:00 2001
From: Kai-Heng Feng
Date: Wed, 27 Mar 2024 10:45:09 +0800
Subject: [PATCH 1/2] PCI: Mask Replay Timer Timeout errors for Genesys GL975x SD host controller

Due to a hardware defect in GL975x, config accesses when ASPM is enabled
frequently cause Replay Timer Timeouts in the Port leading to the device.

These are Correctable Errors, so the Downstream Port logs them in its AER
Correctable Error Status register and, when they are not masked, sends an
ERR_COR message upstream. The message terminates at a Root Port, which may
generate an AER interrupt so the OS can log it.

The Correctable Error logging is an annoyance but not a major issue in
itself. But when the AER interrupt happens during suspend, it can prevent
the system from suspending.

015c9cbcf0ad ("mmc: sdhci-pci-gli: GL9750: Mask the replay timer timeout
of AER") masked these errors in the GL975x itself.

Mask these errors in the Port leading to GL975x as well.

Note that Replay Timer Timeouts will still be logged in the AER Correctable
Error Status register, but they will not cause AER interrupts.
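The masking behavior described above can be modeled in a few lines of C. The following is a minimal userspace sketch, not kernel code: it operates on a fake 4 KB config-space buffer standing in for the upstream Port, and the helpers cfg_read32(), cfg_write32(), find_aer_cap(), mask_replay_timer_timeout() and port_signals_err_cor() are hypothetical names introduced for illustration. The register offsets are assumed to match include/uapi/linux/pci_regs.h, and port_signals_err_cor() is a deliberately simplified model of a Port that always sets the Status bit but sends ERR_COR only when the error is unmasked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Offsets and bits assumed to match include/uapi/linux/pci_regs.h. */
#define PCI_CFG_SPACE_SIZE      256         /* extended caps start here */
#define PCI_CFG_SPACE_EXP_SIZE  4096
#define PCI_EXT_CAP_ID_ERR      0x01        /* Advanced Error Reporting */
#define PCI_ERR_COR_STATUS      0x10        /* Correctable Error Status */
#define PCI_ERR_COR_MASK        0x14        /* Correctable Error Mask */
#define PCI_ERR_COR_REP_TIMER   0x00001000  /* Replay Timer Timeout (bit 12) */

/* Config space is little-endian; read/write 32-bit registers byte-wise. */
static uint32_t cfg_read32(const uint8_t *cfg, unsigned int off)
{
	assert(off <= PCI_CFG_SPACE_EXP_SIZE - 4);
	return (uint32_t)cfg[off] | (uint32_t)cfg[off + 1] << 8 |
	       (uint32_t)cfg[off + 2] << 16 | (uint32_t)cfg[off + 3] << 24;
}

static void cfg_write32(uint8_t *cfg, unsigned int off, uint32_t val)
{
	assert(off <= PCI_CFG_SPACE_EXP_SIZE - 4);
	for (int i = 0; i < 4; i++)
		cfg[off + i] = (uint8_t)(val >> (8 * i));
}

/* Walk the extended capability list; return the AER offset or 0. */
static unsigned int find_aer_cap(const uint8_t *cfg)
{
	unsigned int pos = PCI_CFG_SPACE_SIZE;
	/* Bound the walk so a malformed (looping) list terminates. */
	int ttl = (PCI_CFG_SPACE_EXP_SIZE - PCI_CFG_SPACE_SIZE) / 8;

	while (ttl-- > 0) {
		uint32_t header = cfg_read32(cfg, pos);

		if (header == 0)
			return 0;		/* no extended capabilities */
		if ((header & 0xffff) == PCI_EXT_CAP_ID_ERR)
			return pos;
		pos = (header >> 20) & 0xffc;	/* next capability pointer */
		if (pos < PCI_CFG_SPACE_SIZE)
			return 0;		/* end of list */
	}
	return 0;
}

/* Hypothetical mirror of the quirk: read-modify-write of the Port's mask. */
static bool mask_replay_timer_timeout(uint8_t *parent_cfg)
{
	unsigned int aer = find_aer_cap(parent_cfg);

	if (!aer || aer + PCI_ERR_COR_MASK + 4 > PCI_CFG_SPACE_EXP_SIZE)
		return false;	/* like the quirk's !parent->aer_cap check */
	uint32_t val = cfg_read32(parent_cfg, aer + PCI_ERR_COR_MASK);
	val |= PCI_ERR_COR_REP_TIMER;
	cfg_write32(parent_cfg, aer + PCI_ERR_COR_MASK, val);
	return true;
}

/*
 * Hypothetical, simplified Port model: a Replay Timer Timeout is always
 * logged in the Status register; ERR_COR is sent only if it is unmasked.
 */
static bool port_signals_err_cor(uint8_t *cfg, unsigned int aer)
{
	uint32_t status = cfg_read32(cfg, aer + PCI_ERR_COR_STATUS);

	cfg_write32(cfg, aer + PCI_ERR_COR_STATUS,
		    status | PCI_ERR_COR_REP_TIMER);
	return !(cfg_read32(cfg, aer + PCI_ERR_COR_MASK) &
		 PCI_ERR_COR_REP_TIMER);
}
```

The capability walk follows the extended capability header layout (ID in bits 15:0, next pointer in bits 31:20); the kernel obtains `parent->aer_cap` with a similar walk (pci_find_ext_capability()) before the quirk adds PCI_ERR_COR_MASK to it.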
Link: https://lore.kernel.org/r/20240327024509.1071189-1-kai.heng.feng@canonical.com
Signed-off-by: Kai-Heng Feng
[bhelgaas: commit log, update dmesg note]
Signed-off-by: Bjorn Helgaas
Cc: Victor Shih
Cc: Ben Chuang
---
 drivers/pci/quirks.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index bf4833221816..5cb0f7fae3b8 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -6261,3 +6261,23 @@ static void pci_fixup_d3cold_delay_1sec(struct pci_dev *pdev)
 	pdev->d3cold_delay = 1000;
 }
 DECLARE_PCI_FIXUP_FINAL(0x5555, 0x0004, pci_fixup_d3cold_delay_1sec);
+
+#ifdef CONFIG_PCIEAER
+static void pci_mask_replay_timer_timeout(struct pci_dev *pdev)
+{
+	struct pci_dev *parent = pci_upstream_bridge(pdev);
+	u32 val;
+
+	if (!parent || !parent->aer_cap)
+		return;
+
+	pci_info(parent, "mask Replay Timer Timeout Correctable Errors due to %s hardware defect",
+		 pci_name(pdev));
+
+	pci_read_config_dword(parent, parent->aer_cap + PCI_ERR_COR_MASK, &val);
+	val |= PCI_ERR_COR_REP_TIMER;
+	pci_write_config_dword(parent, parent->aer_cap + PCI_ERR_COR_MASK, val);
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_GLI, 0x9750, pci_mask_replay_timer_timeout);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_GLI, 0x9755, pci_mask_replay_timer_timeout);
+#endif

From a29e5290e3566ae4db4e6fe5f31caf23118c82b6 Mon Sep 17 00:00:00 2001
From: Kuppuswamy Sathyanarayanan
Date: Tue, 16 Apr 2024 05:50:35 +0000
Subject: [PATCH 2/2] PCI/AER: Update aer-inject tool source URL

The aer-inject tool is no longer maintained in the original repository
and is missing a fix related to the musl library. So, with the consent
of the author (Huang Ying), it has been moved to a new repository [1].

Update all references to the repository link.
Link: https://github.com/intel/aer-inject.git [1]
Link: https://lore.kernel.org/r/20240416055035.200085-1-sathyanarayanan.kuppuswamy@linux.intel.com
Signed-off-by: Kuppuswamy Sathyanarayanan
Signed-off-by: Bjorn Helgaas
Cc: Huang Ying
---
 Documentation/PCI/pcieaer-howto.rst | 2 +-
 drivers/pci/pcie/Kconfig            | 2 +-
 drivers/pci/pcie/aer_inject.c       | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/PCI/pcieaer-howto.rst b/Documentation/PCI/pcieaer-howto.rst
index e00d63971695..f013f3b27c82 100644
--- a/Documentation/PCI/pcieaer-howto.rst
+++ b/Documentation/PCI/pcieaer-howto.rst
@@ -241,7 +241,7 @@ After reboot with new kernel or insert the module, a device file named
 Then, you need a user space tool named aer-inject, which can be gotten
 from:
 
-    https://git.kernel.org/cgit/linux/kernel/git/gong.chen/aer-inject.git/
+    https://github.com/intel/aer-inject.git
 
 More information about aer-inject can be found in the document in
 its source code.
diff --git a/drivers/pci/pcie/Kconfig b/drivers/pci/pcie/Kconfig
index 8999fcebde6a..17919b99fa66 100644
--- a/drivers/pci/pcie/Kconfig
+++ b/drivers/pci/pcie/Kconfig
@@ -47,7 +47,7 @@ config PCIEAER_INJECT
 	  error injection can fake almost all kinds of errors with the
 	  help of a user space helper tool aer-inject, which can be
 	  gotten from:
-	     https://git.kernel.org/cgit/linux/kernel/git/gong.chen/aer-inject.git/
+	     https://github.com/intel/aer-inject.git
 
 config PCIEAER_CXL
 	bool "PCI Express CXL RAS support"
diff --git a/drivers/pci/pcie/aer_inject.c b/drivers/pci/pcie/aer_inject.c
index 2dab275d252f..f81b2303bf6a 100644
--- a/drivers/pci/pcie/aer_inject.c
+++ b/drivers/pci/pcie/aer_inject.c
@@ -6,7 +6,7 @@
  * trigger various real hardware errors. Software based error
  * injection can fake almost all kinds of errors with the help of a
  * user space helper tool aer-inject, which can be gotten from:
- * https://git.kernel.org/cgit/linux/kernel/git/gong.chen/aer-inject.git/
+ * https://github.com/intel/aer-inject.git
  *
  * Copyright 2009 Intel Corporation.
  *     Huang Ying