/*
 * drivers/pci/pcie/aer/aerdrv_core.c
 *
 * This file is subject to the terms and conditions of the GNU General Public
 * License. See the file "COPYING" in the main directory of this archive
 * for more details.
 *
 * This file implements the core part of PCI Express AER. When a PCI Express
 * error is delivered, an error message is collected and printed to the
 * console, and then an error recovery procedure is executed following
 * the PCI error recovery rules.
 *
 * Copyright (C) 2006 Intel Corp.
 *	Tom Long Nguyen (tom.l.nguyen@intel.com)
 *	Zhang Yanmin (yanmin.zhang@intel.com)
 *
 */
#include <linux/module.h>
#include <linux/pci.h>
#include <linux/kernel.h>
#include <linux/errno.h>
#include <linux/pm.h>
#include <linux/suspend.h>
#include <linux/delay.h>
#include <linux/slab.h>

#include "aerdrv.h"

static int forceload;
static int nosourceid;
module_param(forceload, bool, 0);
module_param(nosourceid, bool, 0);

int pci_enable_pcie_error_reporting(struct pci_dev *dev)
{
	u16 reg16 = 0;
	int pos;

	/*
	 * In ACPI HEST Firmware First mode the platform firmware owns the
	 * AER registers, so the OS must not change them.
	 */
	if (pcie_aer_get_firmware_first(dev))
		return -EIO;

	pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR);
	if (!pos)
		return -EIO;

	pos = pci_pcie_cap(dev);
	if (!pos)
		return -EIO;

	pci_read_config_word(dev, pos + PCI_EXP_DEVCTL, &reg16);
	reg16 |= (PCI_EXP_DEVCTL_CERE |
		PCI_EXP_DEVCTL_NFERE |
		PCI_EXP_DEVCTL_FERE |
		PCI_EXP_DEVCTL_URRE);
	pci_write_config_word(dev, pos + PCI_EXP_DEVCTL, reg16);

	return 0;
}
EXPORT_SYMBOL_GPL(pci_enable_pcie_error_reporting);

int pci_disable_pcie_error_reporting(struct pci_dev *dev)
{
	u16 reg16 = 0;
	int pos;

	/* Leave the AER registers alone in Firmware First mode */
	if (pcie_aer_get_firmware_first(dev))
		return -EIO;

	pos = pci_pcie_cap(dev);
	if (!pos)
		return -EIO;

	pci_read_config_word(dev, pos + PCI_EXP_DEVCTL, &reg16);
	reg16 &= ~(PCI_EXP_DEVCTL_CERE |
		PCI_EXP_DEVCTL_NFERE |
		PCI_EXP_DEVCTL_FERE |
		PCI_EXP_DEVCTL_URRE);
	pci_write_config_word(dev, pos + PCI_EXP_DEVCTL, reg16);

	return 0;
}
EXPORT_SYMBOL_GPL(pci_disable_pcie_error_reporting);

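/*
 * Illustrative userspace sketch, not part of this driver: the enable and
 * disable helpers above do a read-modify-write of the PCIe Device Control
 * register, so only the four error-reporting enable bits change and every
 * other field is preserved. The bit values follow the PCI Express
 * specification (the same values as PCI_EXP_DEVCTL_CERE..URRE); struct
 * fake_dev and the function names are hypothetical, and the register is
 * modelled as a plain variable rather than accessed via
 * pci_read_config_word()/pci_write_config_word().
 */
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Device Control error-reporting enables (same values as PCI_EXP_DEVCTL_*) */
#define DEVCTL_CERE	0x0001u	/* Correctable Error Reporting Enable */
#define DEVCTL_NFERE	0x0002u	/* Non-Fatal Error Reporting Enable */
#define DEVCTL_FERE	0x0004u	/* Fatal Error Reporting Enable */
#define DEVCTL_URRE	0x0008u	/* Unsupported Request Reporting Enable */
#define DEVCTL_AER_BITS	(DEVCTL_CERE | DEVCTL_NFERE | DEVCTL_FERE | DEVCTL_URRE)

/* Hypothetical device: only its Device Control word is modelled. */
struct fake_dev {
	uint16_t devctl;
};

/* Set the four reporting enables, leaving all other bits untouched. */
static void enable_error_reporting(struct fake_dev *d)
{
	uint16_t reg16;

	assert(d != NULL);
	reg16 = d->devctl;		/* read */
	reg16 |= DEVCTL_AER_BITS;	/* modify */
	d->devctl = reg16;		/* write back */
}

/* Clear the four reporting enables, leaving all other bits untouched. */
static void disable_error_reporting(struct fake_dev *d)
{
	uint16_t reg16;

	assert(d != NULL);
	reg16 = d->devctl;
	reg16 &= (uint16_t)~DEVCTL_AER_BITS;
	d->devctl = reg16;
}
```
/* Example: a Device Control value of 0x2810 becomes 0x281f after enabling and returns to 0x2810 after disabling. */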
int pci_cleanup_aer_uncorrect_error_status(struct pci_dev *dev)
{
	int pos;
	u32 status;

	pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR);
	if (!pos)
		return -EIO;

	/*
	 * The status bits are write-1-to-clear: writing back the bits
	 * that were read as set clears exactly those bits.
	 */
	pci_read_config_dword(dev, pos + PCI_ERR_UNCOR_STATUS, &status);
	if (status)
		pci_write_config_dword(dev, pos + PCI_ERR_UNCOR_STATUS, status);

	return 0;
}
EXPORT_SYMBOL_GPL(pci_cleanup_aer_uncorrect_error_status);

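/*
 * Illustrative userspace sketch, not part of this driver: a model of the
 * write-1-to-clear (RW1C) semantics that pci_cleanup_aer_uncorrect_error_status()
 * relies on. Hardware latches error bits; software clears a bit by writing 1
 * to it, and writing 0 leaves it unchanged. Because only the bits that were
 * read are written back, a bit latched between the read and the write stays
 * set. struct rw1c_reg and the helper names are hypothetical.
 */
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical RW1C status register. */
struct rw1c_reg {
	uint32_t value;
};

/* Hardware side: latch new error bits. */
static void hw_set_status(struct rw1c_reg *r, uint32_t bits)
{
	assert(r != NULL);
	r->value |= bits;
}

/* Software write with RW1C semantics: each 1 written clears that bit. */
static void rw1c_write(struct rw1c_reg *r, uint32_t written)
{
	assert(r != NULL);
	r->value &= ~written;
}

/* Read-then-write-back, as in the cleanup helper; returns the bits cleared. */
static uint32_t cleanup_status(struct rw1c_reg *r)
{
	uint32_t status;

	assert(r != NULL);
	status = r->value;
	if (status)
		rw1c_write(r, status);
	return status;
}
```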
/**
 * add_error_device - list device to be handled
 * @e_info: pointer to error info
 * @dev: pointer to pci_dev to be added
 *
 * Return 0 on success, or -ENOSPC if e_info->dev[] is already full.
 */
static int add_error_device(struct aer_err_info *e_info, struct pci_dev *dev)
{
	if (e_info->error_dev_num < AER_MAX_MULTI_ERR_DEVICES) {
		e_info->dev[e_info->error_dev_num] = dev;
		e_info->error_dev_num++;
		return 0;
	}
	return -ENOSPC;
}

#define	PCI_BUS(x)	(((x) >> 8) & 0xff)

/**
 * is_error_source - check whether the device is source of reported error
 * @dev: pointer to pci_dev to be checked
 * @e_info: pointer to reported error info
 */
static bool is_error_source(struct pci_dev *dev, struct aer_err_info *e_info)
{
	int pos;
	u32 status, mask;
	u16 reg16;

	/*
	 * When the bus number is 0, it might be a bad ID
	 * reported by the Root Port.
	 */
	if (!nosourceid && (PCI_BUS(e_info->id) != 0)) {
		/* Device ID match? */
		if (e_info->id == ((dev->bus->number << 8) | dev->devfn))
			return true;

		/* With a single error, a failed ID match rules this device out */
		if (!e_info->multi_error_valid)
			return false;
	}

	/*
	 * We check the AER status registers to find a possible reporter
	 * when any of the following holds:
	 *	1) nosourceid == y;
	 *	2) the bus number is 0 (some ports might lose the bus number
	 *	   of the error source ID);
	 *	3) there are multiple errors and the ID comparison above failed.
	 */
	if (atomic_read(&dev->enable_cnt) == 0)
		return false;
	pos = pci_pcie_cap(dev);
	if (!pos)
		return false;

	/* Check if AER is enabled */
	pci_read_config_word(dev, pos + PCI_EXP_DEVCTL, &reg16);
	if (!(reg16 & (
		PCI_EXP_DEVCTL_CERE |
		PCI_EXP_DEVCTL_NFERE |
		PCI_EXP_DEVCTL_FERE |
		PCI_EXP_DEVCTL_URRE)))
		return false;
	pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR);
	if (!pos)
		return false;

	/* Check if an unmasked error is recorded */
	if (e_info->severity == AER_CORRECTABLE) {
		pci_read_config_dword(dev, pos + PCI_ERR_COR_STATUS, &status);
		pci_read_config_dword(dev, pos + PCI_ERR_COR_MASK, &mask);
	} else {
		pci_read_config_dword(dev, pos + PCI_ERR_UNCOR_STATUS, &status);
		pci_read_config_dword(dev, pos + PCI_ERR_UNCOR_MASK, &mask);
	}
	if (status & ~mask)
		return true;

	return false;
}
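/*
 * Illustrative userspace sketch, not part of this driver: how the 16-bit
 * source ID compared in is_error_source() is laid out (bus number in bits
 * 15:8, devfn in bits 7:0, with devfn holding the device number in bits 7:3
 * and the function in bits 2:0), and the "status & ~mask" test used when
 * falling back to the status registers. make_id(), make_devfn() and
 * error_recorded() are hypothetical helpers; PCI_BUS() is copied from above.
 */
```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PCI_BUS(x)	(((x) >> 8) & 0xff)	/* same as the macro above */

/* Pack device (0..31) and function (0..7) into a devfn byte. */
static uint8_t make_devfn(uint8_t slot, uint8_t fn)
{
	assert(slot < 32 && fn < 8);
	return (uint8_t)((slot << 3) | fn);
}

/* Build the ID the same way is_error_source() does: (bus << 8) | devfn. */
static uint16_t make_id(uint8_t bus, uint8_t devfn)
{
	return (uint16_t)((bus << 8) | devfn);
}

/* An error counts only if some status bit is set and not masked. */
static bool error_recorded(uint32_t status, uint32_t mask)
{
	return (status & ~mask) != 0;
}
```
/*
 * An ID whose PCI_BUS() is 0 is treated as unreliable by the driver, which
 * then falls back to scanning the status registers instead of matching IDs.
 */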

static int find_device_iter(struct pci_dev *dev, void *data)
{
	struct aer_err_info *e_info = (struct aer_err_info *)data;

	if (is_error_source(dev, e_info)) {
		/* List this device */
		if (add_error_device(e_info, dev)) {
			/* We cannot handle more... Stop iteration */
			/* TODO: Should print error message here? */
			return 1;
		}

		/* If there is only a single error, stop iteration */
		if (!e_info->multi_error_valid)
			return 1;
	}
	return 0;
}
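/*
 * Illustrative userspace sketch, not part of this driver: the two stop rules
 * of find_device_iter() combined with the bounded list of add_error_device().
 * The walk ends at the first source when there is a single error, or when
 * the list is full. walk() stands in for pci_walk_bus(), which stops as soon
 * as the callback returns nonzero; devices are plain integers, and
 * MAX_ERR_DEVS, struct err_info and all function names are hypothetical.
 */
```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ERR_DEVS 5	/* hypothetical stand-in for AER_MAX_MULTI_ERR_DEVICES */

/* Hypothetical, simplified counterpart of struct aer_err_info. */
struct err_info {
	bool multi_error_valid;
	int error_dev_num;
	int dev[MAX_ERR_DEVS];	/* device numbers instead of struct pci_dev * */
};

/* Like add_error_device(): append if there is room, else -ENOSPC. */
static int add_dev(struct err_info *e, int dev)
{
	assert(e != NULL);
	if (e->error_dev_num < MAX_ERR_DEVS) {
		e->dev[e->error_dev_num++] = dev;
		return 0;
	}
	return -ENOSPC;
}

/* Like find_device_iter(): a nonzero return stops the walk. */
static int iter(int dev, bool is_source, struct err_info *e)
{
	if (is_source) {
		if (add_dev(e, dev))
			return 1;	/* list full */
		if (!e->multi_error_valid)
			return 1;	/* single error: first match is enough */
	}
	return 0;
}

/* Visit devices in order; stop when the callback returns nonzero. */
static void walk(const bool *is_source, int ndevs, struct err_info *e)
{
	int i;

	e->error_dev_num = 0;	/* reset, as find_source_device() does */
	for (i = 0; i < ndevs; i++)
		if (iter(i, is_source[i], e))
			break;
}
```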

/**
 * find_source_device - search through device hierarchy for source device
 * @parent: pointer to Root Port pci_dev data structure
 * @e_info: detailed error information, such as the source ID
 *
 * Return true if found.
 *
 * Invoked by DPC when error is detected at the Root Port.
 * The caller must set the id, severity, and multi_error_valid fields of
 * the struct aer_err_info pointed to by @e_info. This function fills in
 * e_info->error_dev_num and e_info->dev[] based on that information.
 */
static bool find_source_device(struct pci_dev *parent,
		struct aer_err_info *e_info)
{
	struct pci_dev *dev = parent;
	int result;

	/* Must reset in this function */
	e_info->error_dev_num = 0;

	/* Is Root Port an agent that sends error message? */
	result = find_device_iter(dev, e_info);
	if (result)
		return true;

	pci_walk_bus(parent->subordinate, find_device_iter, e_info);

	if (!e_info->error_dev_num) {
		dev_printk(KERN_DEBUG, &parent->dev,
				"can't find device of ID%04x\n",
				e_info->id);
		return false;
	}
	return true;
}

static int report_error_detected(struct pci_dev *dev, void *data)
{
	pci_ers_result_t vote;
	struct pci_error_handlers *err_handler;
	struct aer_broadcast_data *result_data;
	result_data = (struct aer_broadcast_data *) data;

	dev->error_state = result_data->state;

	if (!dev->driver ||
		!dev->driver->err_handler ||
		!dev->driver->err_handler->error_detected) {
		if (result_data->state == pci_channel_io_frozen &&
			!(dev->hdr_type & PCI_HEADER_TYPE_BRIDGE)) {
			/*
			 * In case of fatal recovery, if one of the downstream
			 * devices has no driver, we might be unable to recover
			 * because a later insmod of a driver for this device
			 * is unaware of its hardware state.
			 */
			dev_printk(KERN_DEBUG, &dev->dev, "device has %s\n",
				   dev->driver ?
				   "no AER-aware driver" : "no driver");
		}
		return 0;
	}

	err_handler = dev->driver->err_handler;
	vote = err_handler->error_detected(dev, result_data->state);
	result_data->result = merge_result(result_data->result, vote);
	return 0;
}

static int report_mmio_enabled(struct pci_dev *dev, void *data)
{
	pci_ers_result_t vote;
	struct pci_error_handlers *err_handler;
	struct aer_broadcast_data *result_data;
	result_data = (struct aer_broadcast_data *) data;

	if (!dev->driver ||
		!dev->driver->err_handler ||
		!dev->driver->err_handler->mmio_enabled)
		return 0;

	err_handler = dev->driver->err_handler;
	vote = err_handler->mmio_enabled(dev);
	result_data->result = merge_result(result_data->result, vote);
	return 0;
}

static int report_slot_reset(struct pci_dev *dev, void *data)
{
	pci_ers_result_t vote;
	struct pci_error_handlers *err_handler;
	struct aer_broadcast_data *result_data;
	result_data = (struct aer_broadcast_data *) data;

	if (!dev->driver ||
		!dev->driver->err_handler ||
		!dev->driver->err_handler->slot_reset)
		return 0;

	err_handler = dev->driver->err_handler;
	vote = err_handler->slot_reset(dev);
	result_data->result = merge_result(result_data->result, vote);
	return 0;
}

static int report_resume(struct pci_dev *dev, void *data)
{
	struct pci_error_handlers *err_handler;

	dev->error_state = pci_channel_io_normal;

	if (!dev->driver ||
		!dev->driver->err_handler ||
		!dev->driver->err_handler->resume)
		return 0;

	err_handler = dev->driver->err_handler;
	err_handler->resume(dev);
	return 0;
}

/**
 * broadcast_error_message - handle message broadcast to downstream drivers
 * @dev: device from which the message is broadcast down the hierarchy
 * @state: error state
 * @error_mesg: message to print
 * @cb: callback to be broadcast
 *
 * Invoked during the error recovery process. Once invoked, it broadcasts
 * the error severity to all downstream drivers in the hierarchy in question.
 */
static pci_ers_result_t broadcast_error_message(struct pci_dev *dev,
	enum pci_channel_state state,
	char *error_mesg,
	int (*cb)(struct pci_dev *, void *))
{
	struct aer_broadcast_data result_data;

	dev_printk(KERN_DEBUG, &dev->dev, "broadcast %s message\n", error_mesg);
	result_data.state = state;
	if (cb == report_error_detected)
		result_data.result = PCI_ERS_RESULT_CAN_RECOVER;
	else
		result_data.result = PCI_ERS_RESULT_RECOVERED;

	if (dev->hdr_type & PCI_HEADER_TYPE_BRIDGE) {
		/*
		 * If the error is reported by a bridge, we assume it is
		 * related to the bridge's downstream link, so we do error
		 * recovery on all subordinates of the bridge instead of the
		 * bridge itself, and clear the bridge's error status.
		 */
		if (cb == report_error_detected)
			dev->error_state = state;
		pci_walk_bus(dev->subordinate, cb, &result_data);
		if (cb == report_resume) {
			pci_cleanup_aer_uncorrect_error_status(dev);
			dev->error_state = pci_channel_io_normal;
		}
	} else {
		/*
		 * If the error is reported by an endpoint, we assume it is
		 * related to the endpoint's upstream link.
		 */
		pci_walk_bus(dev->bus, cb, &result_data);
	}

	return result_data.result;
}

/**
 * aer_do_secondary_bus_reset - perform secondary bus reset
 * @dev: pointer to bridge's pci_dev data structure
 *
 * Invoked when performing link reset at Root Port or Downstream Port.
 */
void aer_do_secondary_bus_reset(struct pci_dev *dev)
{
	u16 p2p_ctrl;

	/* Assert Secondary Bus Reset */
	pci_read_config_word(dev, PCI_BRIDGE_CONTROL, &p2p_ctrl);
	p2p_ctrl |= PCI_BRIDGE_CTL_BUS_RESET;
	pci_write_config_word(dev, PCI_BRIDGE_CONTROL, p2p_ctrl);

	/*
	 * Keep the reset asserted for 2 ms so that the hot reset message
	 * has time to propagate to all downstream ports.
	 */
	msleep(2);

	/* De-assert Secondary Bus Reset */
	p2p_ctrl &= ~PCI_BRIDGE_CTL_BUS_RESET;
	pci_write_config_word(dev, PCI_BRIDGE_CONTROL, p2p_ctrl);

	/*
	 * System software must wait for at least 100 ms from the end
	 * of a reset of one or more devices before it is permitted
	 * to issue Configuration Requests to those devices; 200 ms
	 * is used here, well above that minimum.
	 */
	msleep(200);
}
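/*
 * Illustrative userspace sketch, not part of this driver: the ordering of
 * aer_do_secondary_bus_reset() against a simulated bridge. Time is a counter
 * instead of msleep(), and the model records how long the reset bit stayed
 * asserted and how long the code waited after de-asserting it. The value
 * 0x40 matches PCI_BRIDGE_CTL_BUS_RESET; struct fake_bridge, fake_msleep()
 * and secondary_bus_reset() are hypothetical.
 */
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BRIDGE_CTL_BUS_RESET 0x40	/* Secondary Bus Reset in Bridge Control */

/* Hypothetical bridge model that tracks elapsed time per phase. */
struct fake_bridge {
	uint16_t bridge_ctl;
	unsigned int reset_asserted_ms;	/* time spent with reset asserted */
	unsigned int settle_ms;		/* time waited after de-assert */
	unsigned int now_ms;
};

/* Advance simulated time, attributing it to the current reset phase. */
static void fake_msleep(struct fake_bridge *b, unsigned int ms)
{
	b->now_ms += ms;
	if (b->bridge_ctl & BRIDGE_CTL_BUS_RESET)
		b->reset_asserted_ms += ms;
	else
		b->settle_ms += ms;
}

static void secondary_bus_reset(struct fake_bridge *b)
{
	assert(b != NULL);
	b->bridge_ctl |= BRIDGE_CTL_BUS_RESET;			/* assert */
	fake_msleep(b, 2);					/* let hot reset propagate */
	b->bridge_ctl &= (uint16_t)~BRIDGE_CTL_BUS_RESET;	/* de-assert */
	fake_msleep(b, 200);					/* >= 100 ms before config requests */
}
```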
/**
 * default_downstream_reset_link - default reset function for Downstream Port
 * @dev: pointer to downstream port's pci_dev data structure
 *
 * Invoked when performing link reset at a Downstream Port that has no
 * AER service driver.
 */
static pci_ers_result_t default_downstream_reset_link(struct pci_dev *dev)
{
	aer_do_secondary_bus_reset(dev);
	dev_printk(KERN_DEBUG, &dev->dev,
		"Downstream Port link has been reset\n");
	return PCI_ERS_RESULT_RECOVERED;
}

static int find_aer_service_iter(struct device *device, void *data)
{
	struct pcie_port_service_driver *service_driver, **drv;

	drv = (struct pcie_port_service_driver **) data;

	if (device->bus == &pcie_port_bus_type && device->driver) {
		service_driver = to_service_driver(device->driver);
		if (service_driver->service == PCIE_PORT_SERVICE_AER) {
			*drv = service_driver;
			return 1;
		}
	}

	return 0;
}

static struct pcie_port_service_driver *find_aer_service(struct pci_dev *dev)
{
	struct pcie_port_service_driver *drv = NULL;

	device_for_each_child(&dev->dev, &drv, find_aer_service_iter);

	return drv;
}
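/*
 * Illustrative userspace sketch, not part of this driver: the first-match
 * search used by find_aer_service(). The iterator reports the match through
 * a double pointer passed as void *data and returns nonzero to stop the walk,
 * which is how device_for_each_child() treats a nonzero return. The bus-type
 * check of the real iterator is omitted; for_each_child(), the structs and
 * the service IDs are hypothetical stand-ins.
 */
```c
#include <assert.h>
#include <stddef.h>

enum service { SVC_PME, SVC_AER, SVC_HP };	/* hypothetical service IDs */

struct svc_driver {
	const char *name;
	enum service service;
};

struct child {
	const struct svc_driver *driver;	/* NULL if unbound */
};

/* Stand-in for device_for_each_child(): stop at the first nonzero return. */
static int for_each_child(struct child *c, size_t n, void *data,
			  int (*fn)(struct child *, void *))
{
	size_t i;
	int ret;

	for (i = 0; i < n; i++) {
		ret = fn(&c[i], data);
		if (ret)
			return ret;
	}
	return 0;
}

/* Like find_aer_service_iter(): store the match through the out-pointer. */
static int find_aer_iter(struct child *c, void *data)
{
	const struct svc_driver **drv = data;

	if (c->driver && c->driver->service == SVC_AER) {
		*drv = c->driver;
		return 1;
	}
	return 0;
}

/* Like find_aer_service(): NULL when no child is bound to an AER driver. */
static const struct svc_driver *find_aer(struct child *c, size_t n)
{
	const struct svc_driver *drv = NULL;

	assert(c != NULL || n == 0);
	for_each_child(c, n, &drv, find_aer_iter);
	return drv;
}
```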

static pci_ers_result_t reset_link(struct pcie_device *aerdev,
		struct pci_dev *dev)
{
	struct pci_dev *udev;
	pci_ers_result_t status;
	struct pcie_port_service_driver *driver;

	if (dev->hdr_type & PCI_HEADER_TYPE_BRIDGE) {
		/* Reset this port for all subordinates */
		udev = dev;
	} else {
		/* Reset the upstream component (likely downstream port) */
		udev = dev->bus->self;
	}

	/* Use the AER driver of the component first */
	driver = find_aer_service(udev);

	if (driver && driver->reset_link) {
		status = driver->reset_link(udev);
	} else if (udev->pcie_type == PCI_EXP_TYPE_DOWNSTREAM) {
		status = default_downstream_reset_link(udev);
	} else {
		dev_printk(KERN_DEBUG, &dev->dev,
			"no link-reset support at upstream device %s\n",
			pci_name(udev));
		return PCI_ERS_RESULT_DISCONNECT;
	}

	if (status != PCI_ERS_RESULT_RECOVERED) {
		dev_printk(KERN_DEBUG, &dev->dev,
			"link reset at upstream device %s failed\n",
			pci_name(udev));
		return PCI_ERS_RESULT_DISCONNECT;
	}

	return status;
}
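/*
 * Illustrative sketch, not part of this driver: the decision order in
 * reset_link(), modelled with hypothetical rl_* types. A bridge resets
 * its own link, and any other device asks its upstream bridge to do it.
 * The component's AER service driver is tried first, and a Downstream
 * Port falls back to the default secondary bus reset. Any other device,
 * or any failed reset, ends in "disconnect".
 */
```c
#include <assert.h>
#include <stddef.h>

enum rl_result { RL_RECOVERED, RL_DISCONNECT };
enum rl_type { RL_ENDPOINT, RL_ROOT_PORT, RL_DOWNSTREAM_PORT };

struct rl_dev {
	int is_bridge;              /* models hdr_type & PCI_HEADER_TYPE_BRIDGE */
	enum rl_type type;          /* models pcie_type */
	struct rl_dev *upstream;    /* models dev->bus->self */
	/* models an AER service driver's reset_link hook; NULL if absent */
	enum rl_result (*aer_reset_link)(struct rl_dev *);
	int resets;                 /* number of resets this device performed */
};

/* Stand-in for default_downstream_reset_link(): always succeeds. */
static enum rl_result rl_default_reset(struct rl_dev *d)
{
	d->resets++;
	return RL_RECOVERED;
}

static enum rl_result rl_reset_link(struct rl_dev *dev)
{
	struct rl_dev *udev = dev->is_bridge ? dev : dev->upstream;
	enum rl_result status;

	if (udev->aer_reset_link)
		status = udev->aer_reset_link(udev);
	else if (udev->type == RL_DOWNSTREAM_PORT)
		status = rl_default_reset(udev);
	else
		return RL_DISCONNECT;   /* no link-reset support */

	return status == RL_RECOVERED ? RL_RECOVERED : RL_DISCONNECT;
}
```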

/**
 * do_recovery - handle nonfatal/fatal error recovery process
 * @aerdev: pointer to a pcie_device data structure of root port
 * @dev: pointer to a pci_dev data structure of agent detecting an error
 * @severity: error severity type
 *
 * Invoked when an error is nonfatal or fatal. Broadcasts an error_detected
 * message to all downstream drivers within the hierarchy in question. For a
 * fatal error it then resets the link. It continues with the mmio_enabled,
 * slot_reset and resume steps as the drivers' return codes require.
 */
static void do_recovery(struct pcie_device *aerdev, struct pci_dev *dev,
		int severity)
{
	pci_ers_result_t status, result = PCI_ERS_RESULT_RECOVERED;
	enum pci_channel_state state;

	if (severity == AER_FATAL)
		state = pci_channel_io_frozen;
	else
		state = pci_channel_io_normal;

	status = broadcast_error_message(dev,
			state,
			"error_detected",
			report_error_detected);

	if (severity == AER_FATAL) {
		result = reset_link(aerdev, dev);
		if (result != PCI_ERS_RESULT_RECOVERED)
			goto failed;
	}

	if (status == PCI_ERS_RESULT_CAN_RECOVER)
		status = broadcast_error_message(dev,
				state,
				"mmio_enabled",
				report_mmio_enabled);

	if (status == PCI_ERS_RESULT_NEED_RESET) {
		/*
		 * TODO: Should call platform-specific
		 * functions to reset slot before calling
		 * drivers' slot_reset callbacks?
		 */
		status = broadcast_error_message(dev,
				state,
				"slot_reset",
				report_slot_reset);
	}

	if (status != PCI_ERS_RESULT_RECOVERED)
		goto failed;

	broadcast_error_message(dev,
				state,
				"resume",
				report_resume);

	dev_printk(KERN_DEBUG, &dev->dev,
		"AER driver successfully recovered\n");
	return;

failed:
	/* TODO: Should kernel panic here? */
	dev_printk(KERN_DEBUG, &dev->dev,
		"AER driver didn't recover\n");
}
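/*
 * Illustrative sketch, not part of this driver: the control flow of
 * do_recovery() written as a pure function. The rec_outcomes fields are
 * hypothetical inputs. Each one stands for the combined result of one
 * broadcast, or of reset_link(). The returned bitmask records which steps
 * ran. The channel state (frozen for fatal errors, normal otherwise) is
 * not modelled.
 */
```c
#include <assert.h>

enum rec_result {
	REC_CAN_RECOVER,
	REC_NEED_RESET,
	REC_RECOVERED,
	REC_DISCONNECT,
};

enum rec_severity { REC_NONFATAL, REC_FATAL };

/* Hypothetical inputs: the result each stage would report. */
struct rec_outcomes {
	enum rec_result error_detected;
	enum rec_result mmio_enabled;
	enum rec_result slot_reset;
	enum rec_result link_reset;
};

#define REC_STEP_ERROR_DETECTED 0x01u
#define REC_STEP_LINK_RESET     0x02u
#define REC_STEP_MMIO_ENABLED   0x04u
#define REC_STEP_SLOT_RESET     0x08u
#define REC_STEP_RESUME         0x10u

/* Returns the steps taken. *recovered is 1 only if "resume" ran. */
static unsigned int rec_do_recovery(enum rec_severity sev,
				    const struct rec_outcomes *o,
				    int *recovered)
{
	unsigned int steps = REC_STEP_ERROR_DETECTED;
	enum rec_result status = o->error_detected;

	*recovered = 0;

	if (sev == REC_FATAL) {
		steps |= REC_STEP_LINK_RESET;
		if (o->link_reset != REC_RECOVERED)
			return steps;           /* the "failed:" path */
	}

	if (status == REC_CAN_RECOVER) {
		steps |= REC_STEP_MMIO_ENABLED;
		status = o->mmio_enabled;
	}

	if (status == REC_NEED_RESET) {
		steps |= REC_STEP_SLOT_RESET;
		status = o->slot_reset;
	}

	if (status != REC_RECOVERED)
		return steps;                   /* the "failed:" path */

	steps |= REC_STEP_RESUME;
	*recovered = 1;
	return steps;
}
```
/*
 * mmio_enabled can itself return NEED_RESET. In that case slot_reset
 * still runs, which matches the order of the checks above.
 */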

/**
 * handle_error_source - clear a correctable error or start error recovery
 * @aerdev: pointer to pcie_device data structure of the root port
 * @dev: pointer to pci_dev data structure of error source device
 * @info: comprehensive error information
 *
 * Invoked when an error is detected by the Root Port.
 */
static void handle_error_source(struct pcie_device *aerdev,
	struct pci_dev *dev,
	struct aer_err_info *info)
{
	int pos;

	if (info->severity == AER_CORRECTABLE) {
		/*
		 * Correctable error does not need software intervention.
		 * No need to go through error recovery process.
		 */
		pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR);
		if (pos)
			pci_write_config_dword(dev, pos + PCI_ERR_COR_STATUS,
					info->status);
	} else {
		do_recovery(aerdev, dev, info->severity);
	}
}
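/*
 * Illustrative sketch, not part of this driver: why writing info->status
 * back to PCI_ERR_COR_STATUS clears the correctable error. The AER status
 * registers are write-1-to-clear, so the write clears only the bits that
 * were logged. An error raised after the status was read stays pending.
 * The cor_* names are hypothetical.
 */
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one write-1-to-clear status register. */
struct cor_reg {
	uint32_t status;
};

/* Writing 1 to a bit clears it. Writing 0 leaves it unchanged. */
static void cor_rw1c_write(struct cor_reg *r, uint32_t val)
{
	r->status &= ~val;
}

/* Correctable path: read the status, then write the same value back. */
static uint32_t cor_read_and_clear(struct cor_reg *r)
{
	uint32_t logged = r->status;

	cor_rw1c_write(r, logged);
	return logged;
}
```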

/**
 * get_device_error_info - read error status from dev and store it to info
 * @dev: pointer to the device expected to have an error record
 * @info: pointer to structure to store the error record
 *
 * Return 1 if there is an error record to report, or 0 if every logged
 * status bit is masked. A device without the AER capability also
 * yields 1.
 *
 * Note that @info is reused among all error devices. Clear fields properly.
 */
static int get_device_error_info(struct pci_dev *dev, struct aer_err_info *info)
{
	int pos, temp;

	/* Must reset in this function */
	info->status = 0;
	info->tlp_header_valid = 0;

	pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR);

	/* The device might not support AER */
	if (!pos)
		return 1;

	if (info->severity == AER_CORRECTABLE) {
		pci_read_config_dword(dev, pos + PCI_ERR_COR_STATUS,
			&info->status);
		pci_read_config_dword(dev, pos + PCI_ERR_COR_MASK,
			&info->mask);
		if (!(info->status & ~info->mask))
			return 0;
	} else if (dev->hdr_type & PCI_HEADER_TYPE_BRIDGE ||
		info->severity == AER_NONFATAL) {

		/* Link is still healthy for IO reads */
		pci_read_config_dword(dev, pos + PCI_ERR_UNCOR_STATUS,
			&info->status);
		pci_read_config_dword(dev, pos + PCI_ERR_UNCOR_MASK,
			&info->mask);
		if (!(info->status & ~info->mask))
			return 0;

		/* Get First Error Pointer */
		pci_read_config_dword(dev, pos + PCI_ERR_CAP, &temp);
		info->first_error = PCI_ERR_CAP_FEP(temp);

		if (info->status & AER_LOG_TLP_MASKS) {
			info->tlp_header_valid = 1;
			pci_read_config_dword(dev,
				pos + PCI_ERR_HEADER_LOG, &info->tlp.dw0);
			pci_read_config_dword(dev,
				pos + PCI_ERR_HEADER_LOG + 4, &info->tlp.dw1);
			pci_read_config_dword(dev,
				pos + PCI_ERR_HEADER_LOG + 8, &info->tlp.dw2);
			pci_read_config_dword(dev,
				pos + PCI_ERR_HEADER_LOG + 12, &info->tlp.dw3);
		}
	}

	return 1;
}
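/*
 * Illustrative sketch, not part of this driver: the checks that
 * get_device_error_info() applies to an uncorrectable error record.
 * Only unmasked status bits count as reportable. The First Error Pointer
 * is the low five bits of the AER Capabilities and Control register,
 * which is what PCI_ERR_CAP_FEP() extracts. The einfo_* names are
 * hypothetical. Bits 12 (Poisoned TLP) and 19 (ECRC) serve as the
 * example error bits.
 */
```c
#include <assert.h>
#include <stdint.h>

/* First Error Pointer: bits 4:0 of the capabilities/control register. */
#define EINFO_CAP_FEP(x) ((x) & 31u)

/* Two Uncorrectable Error Status bits, used by the example. */
#define EINFO_UNC_POISON_TLP (1u << 12)
#define EINFO_UNC_ECRC       (1u << 19)

struct einfo {
	uint32_t status, mask;
	unsigned int first_error;
};

/* Same test as above: return 1 only if some unmasked bit is set. */
static int einfo_fill(struct einfo *info, uint32_t status, uint32_t mask,
		      uint32_t err_cap)
{
	info->status = status;
	info->mask = mask;
	info->first_error = 0;
	if (!(status & ~mask))
		return 0;
	info->first_error = EINFO_CAP_FEP(err_cap);
	return 1;
}

/* Count the unmasked status bits that are set. */
static int einfo_count_unmasked(const struct einfo *info)
{
	uint32_t bits = info->status & ~info->mask;
	int n = 0;

	while (bits) {
		bits &= bits - 1;  /* clear the lowest set bit */
		n++;
	}
	return n;
}
```
/*
 * With status 0x00081000 and mask 0x00100000, bits 12 and 19 are
 * reportable. A First Error Pointer of 12 marks Poisoned TLP as the
 * error that was logged first.
 */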

static inline void aer_process_err_devices(struct pcie_device *p_device,
			struct aer_err_info *e_info)
{
	int i;

	/*
	 * Report all errors before handling any of them, so that no record
	 * is lost to a recovery action such as a slot reset.
	 */
	for (i = 0; i < e_info->error_dev_num && e_info->dev[i]; i++) {
if (get_device_error_info(e_info->dev[i], e_info))
2009-06-16 13:35:16 +08:00
aer_print_error(e_info->dev[i], e_info);
2009-09-07 17:16:59 +09:00
}
for (i = 0; i < e_info->error_dev_num && e_info->dev[i]; i++) {
if (get_device_error_info(e_info->dev[i], e_info))
handle_error_source(p_device, e_info->dev[i], e_info);
2009-06-16 13:35:16 +08:00
}
}
2006-07-31 15:21:33 +08:00
/**
* aer_isr_one_error - consume an error detected by the root port
* @p_device: pointer to the error root port service device
* @e_src: pointer to an error source
2007-11-28 09:04:23 -08:00
*/
2006-07-31 15:21:33 +08:00
static void aer_isr_one_error(struct pcie_device *p_device,
struct aer_err_source *e_src)
{
2009-06-16 13:35:11 +08:00
struct aer_err_info *e_info;
/* struct aer_err_info might be big, so we allocate it with slab */
e_info = kmalloc(sizeof(struct aer_err_info), GFP_KERNEL);
2010-04-15 13:15:08 +09:00
if (!e_info) {
2009-06-16 13:35:11 +08:00
dev_printk(KERN_DEBUG, &p_device->port->dev,
"Can't allocate mem when processing AER errors\n");
return;
}
2006-07-31 15:21:33 +08:00
/*
* Both a correctable and an uncorrectable error may have been logged.
* Report the correctable error first.
*/
2010-04-15 13:15:08 +09:00
if (e_src->status & PCI_ERR_ROOT_COR_RCV) {
e_info->id = ERR_COR_ID(e_src->id);
e_info->severity = AER_CORRECTABLE;
if (e_src->status & PCI_ERR_ROOT_MULTI_COR_RCV)
e_info->multi_error_valid = 1;
else
e_info->multi_error_valid = 0;
aer_print_port_info(p_device->port, e_info);
if (find_source_device(p_device->port, e_info))
aer_process_err_devices(p_device, e_info);
}
if (e_src->status & PCI_ERR_ROOT_UNCOR_RCV) {
e_info->id = ERR_UNCOR_ID(e_src->id);
if (e_src->status & PCI_ERR_ROOT_FATAL_RCV)
e_info->severity = AER_FATAL;
else
e_info->severity = AER_NONFATAL;
if (e_src->status & PCI_ERR_ROOT_MULTI_UNCOR_RCV)
2009-09-07 17:16:20 +09:00
e_info->multi_error_valid = 1;
2010-04-15 13:15:08 +09:00
else
e_info->multi_error_valid = 0;
2009-06-16 13:35:11 +08:00
PCI: pcie, aer: change error print format
Use a dev_printk-like format.
Sample (real machine + dummy error injected by aer-inject):
- Before:
+------ PCI-Express Device Error ------+
Error Severity : Corrected
PCIE Bus Error type : Data Link Layer
Bad TLP :
Receiver ID : 2800
VendorID=8086h, DeviceID=1096h, Bus=28h, Device=00h, Function=00h
+------ PCI-Express Device Error ------+
Error Severity : Corrected
PCIE Bus Error type : Data Link Layer
Bad TLP :
Bad DLLP :
Receiver ID : 2801
VendorID=8086h, DeviceID=1096h, Bus=28h, Device=00h, Function=01h
Error of this Agent(2801) is reported first
- After:
pcieport-driver 0000:00:02.0: AER: Multiple Corrected error received: id=2801
e1000e 0000:28:00.0: PCIE Bus Error: severity=Corrected, type=Data Link Layer, id=2800(Receiver ID)
e1000e 0000:28:00.0: device [8086:1096] error status/mask=00000040/00000000
e1000e 0000:28:00.0: [ 6] Bad TLP
e1000e 0000:28:00.1: PCIE Bus Error: severity=Corrected, type=Data Link Layer, id=2801(Receiver ID)
e1000e 0000:28:00.1: device [8086:1096] error status/mask=000000c0/00000000
e1000e 0000:28:00.1: [ 6] Bad TLP
e1000e 0000:28:00.1: [ 7] Bad DLLP
e1000e 0000:28:00.1: Error of this Agent(2801) is reported first
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-09-07 17:16:45 +09:00
aer_print_port_info(p_device->port, e_info);
2010-04-15 13:11:42 +09:00
if (find_source_device(p_device->port, e_info))
aer_process_err_devices(p_device, e_info);
2006-07-31 15:21:33 +08:00
}
2009-06-16 13:35:11 +08:00
kfree(e_info);
2006-07-31 15:21:33 +08:00
}
2010-04-15 13:16:16 +09:00
/**
* get_e_source - retrieve an error source
* @rpc: pointer to the root port which holds an error
* @e_src: pointer to store the retrieved error source
*
* Return 1 if an error source is retrieved, otherwise 0.
*
* Invoked by the DPC handler to consume an error.
*/
static int get_e_source(struct aer_rpc *rpc, struct aer_err_source *e_src)
{
unsigned long flags;
/* Lock access to Root error producer/consumer index */
spin_lock_irqsave(&rpc->e_lock, flags);
2010-05-27 11:21:11 +09:00
if (rpc->prod_idx == rpc->cons_idx) {
spin_unlock_irqrestore(&rpc->e_lock, flags);
return 0;
2010-04-15 13:16:16 +09:00
}
2010-05-27 11:21:11 +09:00
*e_src = rpc->e_sources[rpc->cons_idx];
rpc->cons_idx++;
if (rpc->cons_idx == AER_ERROR_SOURCES_MAX)
rpc->cons_idx = 0;
2010-04-15 13:16:16 +09:00
spin_unlock_irqrestore(&rpc->e_lock, flags);
2010-05-27 11:21:11 +09:00
return 1;
2010-04-15 13:16:16 +09:00
}
2006-07-31 15:21:33 +08:00
/**
* aer_isr - consume errors detected by the root port
2006-11-22 14:55:48 +00:00
* @work: definition of this work item
2006-07-31 15:21:33 +08:00
*
* Invoked, as DPC, when the root port records a newly detected error
2007-11-28 09:04:23 -08:00
*/
2006-11-22 14:55:48 +00:00
void aer_isr(struct work_struct *work)
2006-07-31 15:21:33 +08:00
{
2006-11-22 14:55:48 +00:00
struct aer_rpc *rpc = container_of(work, struct aer_rpc, dpc_handler);
struct pcie_device *p_device = rpc->rpd;
2010-08-03 15:18:43 -04:00
struct aer_err_source uninitialized_var(e_src);
2006-07-31 15:21:33 +08:00
mutex_lock(&rpc->rpc_mutex);
2010-04-15 13:16:16 +09:00
while (get_e_source(rpc, &e_src))
aer_isr_one_error(p_device, &e_src);
2006-07-31 15:21:33 +08:00
mutex_unlock(&rpc->rpc_mutex);
wake_up(&rpc->wait_release);
}
/**
* aer_init - provide AER initialization
* @dev: pointer to the AER PCIe device
*
* Invoked when the AER service driver is loaded.
2007-11-28 09:04:23 -08:00
*/
2006-07-31 15:21:33 +08:00
int aer_init(struct pcie_device *dev)
{
PCI: PCIe AER: honor ACPI HEST FIRMWARE FIRST mode
Feedback from Hidetoshi Seto and Kenji Kaneshige incorporated. This
correctly handles PCI-X bridges, PCIe root ports and endpoints, and
prints debug messages when invalid/reserved types are found in the
HEST. PCI devices not in domain/segment 0 are not represented in
HEST and will thus be ignored.
Today, the PCIe Advanced Error Reporting (AER) driver attaches itself
to every PCIe root port for which BIOS reports it should, via ACPI
_OSC.
However, _OSC alone is insufficient for newer BIOSes. Part of ACPI
4.0 is the new APEI (ACPI Platform Error Interfaces) which is a way
for OS and BIOS to handshake over which errors for which components
each will handle. One table in ACPI 4.0 is the Hardware Error Source
Table (HEST), where BIOS can define that errors for certain PCIe
devices (or all devices) should be handled by BIOS ("Firmware First
mode"), rather than be handled by the OS.
Dell PowerEdge 11G server BIOS defines Firmware First mode in HEST, so
that it may manage such errors, log them to the System Event Log, and
possibly take other actions. The AER driver should honor this and
not attach itself to devices noted as such.
Furthermore, Kenji Kaneshige reminded us to disallow changing the AER
registers when respecting Firmware First mode. Platform firmware is
expected to manage these, and if changes to them are allowed, it could
break that firmware's behavior.
The HEST parsing code may be replaced in the future by a more
feature-rich implementation. This patch provides the minimum needed
to prevent breakage until that implementation is available.
Reviewed-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Matt Domsch <Matt_Domsch@dell.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-02 11:51:24 -06:00
if (forceload) {
dev_printk(KERN_DEBUG, &dev->device,
"aerdrv forceload requested.\n");
2010-05-18 14:35:16 +08:00
pcie_aer_force_firmware_first(dev->port, 0);
2009-11-02 11:51:24 -06:00
}
PCI: PCIe: Ask BIOS for control of all native services at once
After commit 852972acff8f10f3a15679be2059bb94916cba5d (ACPI: Disable
ASPM if the platform won't provide _OSC control for PCIe) control of
the PCIe Capability Structure is unconditionally requested by
acpi_pci_root_add(), which in principle may cause problems to
happen in two ways. First, the BIOS may refuse to give control of
the PCIe Capability Structure if it is not asked for any of the
_OSC features depending on it at the same time. Second, the BIOS may
assume that control of the _OSC features depending on the PCIe
Capability Structure will be requested in the future and may behave
incorrectly if that doesn't happen. For this reason, control of
the PCIe Capability Structure should always be requested along with
control of any other _OSC features that may depend on it (i.e. PCIe
native PME, PCIe native hot-plug, PCIe AER).
Rework the PCIe port driver so that (1) it checks which native PCIe
port services can be enabled, according to the BIOS, and (2) it
requests control of all these services simultaneously. In
particular, this causes pcie_portdrv_probe() to fail if the BIOS
refuses to grant control of the PCIe Capability Structure, which
means that no native PCIe port services can be enabled for the PCIe
Root Complex the given port belongs to. If that happens, ASPM is
disabled to avoid problems with mishandling it by the part of the
PCIe hierarchy for which control of the PCIe Capability Structure
has not been received.
Make it possible to override this behavior using 'pcie_ports=native'
(use the PCIe native services regardless of the BIOS response to the
control request), or 'pcie_ports=compat' (do not use the PCIe native
services at all).
Accordingly, rework the existing PCIe port service drivers so that
they don't request control of the services directly.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-08-21 22:02:38 +02:00
return 0;
2006-07-31 15:21:33 +08:00
}