2018-01-26 14:22:04 -06:00
// SPDX-License-Identifier: GPL-2.0+
2005-04-16 15:20:36 -07:00
/*
 * PCI Hot Plug Controller Driver for RPA-compliant PPC64 platform.
 * Copyright (C) 2003 Linda Xie <lxie@us.ibm.com>
 *
 * All rights reserved.
 *
 * Send feedback to <lxie@us.ibm.com>
 *
 */
2022-04-02 12:11:56 +02:00
#include <linux/of.h>
2005-04-16 15:20:36 -07:00
#include <linux/pci.h>
2005-10-30 15:03:48 -08:00
#include <linux/string.h>
2005-04-16 15:20:36 -07:00
#include <asm/pci-bridge.h>
#include <asm/rtas.h>
#include <asm/machdep.h>
2005-10-30 15:03:48 -08:00
#include "../pci.h" /* for pci_add_new_bus */
2005-04-16 15:20:36 -07:00
#include "rpaphp.h"
PCI: rpaphp: Error out on busy status from get-sensor-state
When certain PHB HW failures cause pHyp to recover the PHB, it marks the PE
state as temporarily unavailable until recovery is complete. This also
triggers an EEH handler in Linux, which needs to notify drivers and perform
recovery. But before notifying the driver about the PCI error, it uses the
get_adapter_status()->rpaphp_get_sensor_state()->rtas_call(get-sensor-state)
operation of the hotplug_slot to determine whether the slot contains a
device. If the slot is empty, the recovery is skipped entirely.
eeh_event_handler()
->eeh_handle_normal_event()
->eeh_slot_presence_check()
->get_adapter_status()
->rpaphp_get_sensor_state()
->rtas_get_sensor()
->rtas_call(get-sensor-state)
However, on certain PHB failures, rtas_call(get-sensor-state) returns an
extended busy error (9902) until the PHB is recovered by pHyp. Once the PHB
is recovered, rtas_call(get-sensor-state) returns success with the correct
presence status. The RTAS call interface rtas_get_sensor() loops over the
RTAS call on the extended delay return code (9902) until the return value is
either success (0) or error (-1). This causes the EEH handler to get stuck
for ~6 seconds before it can notify drivers that the PCI error has been
detected and stop any active operations. Hence, with running I/O traffic,
the network driver continues its operation during these 6 seconds and hits
a timeout (netdev watchdog).
------------
[52732.244731] DEBUG: ibm_read_slot_reset_state2()
[52732.244762] DEBUG: ret = 0, rets[0]=5, rets[1]=1, rets[2]=4000, rets[3]=>
[52732.244798] DEBUG: in eeh_slot_presence_check
[52732.244804] DEBUG: error state check
[52732.244807] DEBUG: Is slot hotpluggable
[52732.244810] DEBUG: hotpluggable ops ?
[52732.244953] DEBUG: Calling ops->get_adapter_status
[52732.244958] DEBUG: calling rpaphp_get_sensor_state
[52736.564262] ------------[ cut here ]------------
[52736.564299] NETDEV WATCHDOG: enP64p1s0f3 (tg3): transmit queue 0 timed o>
[52736.564324] WARNING: CPU: 1442 PID: 0 at net/sched/sch_generic.c:478 dev>
[...]
[52736.564505] NIP [c000000000c32368] dev_watchdog+0x438/0x440
[52736.564513] LR [c000000000c32364] dev_watchdog+0x434/0x440
------------
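The blocking behaviour described above can be illustrated with a small
userspace simulation. This is only a sketch under stated assumptions, not the
kernel's rtas_get_sensor(): struct sim_firmware, sim_get_sensor_state() and
sim_blocking_get_sensor() are hypothetical names, the firmware is modelled as
returning 9902 for a fixed number of polls, and the 100 ms wait per poll is
the delay hint assumed for status 9902. Instead of sleeping, the loop only
adds up the time a real caller would have waited.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_STATUS_SUCCESS   0
#define SIM_STATUS_EXT_9902  9902
#define SIM_DELAY_9902_MS    100u  /* assumed delay hint for 9902 (10^2 ms) */

/* Hypothetical firmware model: busy for a fixed number of polls. */
struct sim_firmware {
	int busy_polls_left;  /* polls that still answer with 9902 */
	int presence;         /* sensor value reported after recovery */
};

/* One simulated get-sensor-state call. */
static int sim_get_sensor_state(struct sim_firmware *fw, int *state)
{
	assert(fw != NULL && state != NULL);

	if (fw->busy_polls_left > 0) {
		fw->busy_polls_left--;
		return SIM_STATUS_EXT_9902;
	}
	*state = fw->presence;
	return SIM_STATUS_SUCCESS;
}

/*
 * Blocking style: retry for as long as the firmware reports extended delay.
 * The caller cannot do anything else until this returns.
 */
static int sim_blocking_get_sensor(struct sim_firmware *fw, int *state,
				   unsigned int *waited_ms)
{
	int rc;

	*waited_ms = 0;
	while ((rc = sim_get_sensor_state(fw, state)) == SIM_STATUS_EXT_9902)
		*waited_ms += SIM_DELAY_9902_MS; /* a real caller would sleep here */
	return rc;
}
```

With 60 busy polls (a number chosen here only to land near the ~6 seconds
reported above), the caller accumulates 6000 ms of waiting before it sees the
presence state.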
On timeouts, the network driver starts dumping debug information to the
console (e.g. the bnx2x driver calls bnx2x_panic_dump()) and goes into its
recovery path while pHyp is still recovering the PHB. As part of recovery,
the driver tries to reset the device, which keeps failing since every PCI
read/write returns ff's. When EEH recovery then kicks in, the driver is
unable to recover the device. This impacts the ssh connection and leaves the
system inaccessible. To get the NIC working again, the system needs a reboot
or the I/O adapter must be re-assigned from the HMC.
[ 9531.168587] EEH: Beginning: 'slot_reset'
[ 9531.168601] PCI 0013:01:00.0#10000: EEH: Invoking bnx2x->slot_reset()
[...]
[ 9614.110094] bnx2x: [bnx2x_func_stop:9129(enP19p1s0f0)]FUNC_STOP ramrod failed. Running a dry transaction
[ 9614.110300] bnx2x: [bnx2x_igu_int_disable:902(enP19p1s0f0)]BUG! Proper val not read from IGU!
[ 9629.178067] bnx2x: [bnx2x_fw_command:3055(enP19p1s0f0)]FW failed to respond!
[ 9629.178085] bnx2x 0013:01:00.0 enP19p1s0f0: bc 7.10.4
[ 9629.178091] bnx2x: [bnx2x_fw_dump_lvl:789(enP19p1s0f0)]Cannot dump MCP info while in PCI error
[ 9644.241813] bnx2x: [bnx2x_io_slot_reset:14245(enP19p1s0f0)]IO slot reset --> driver unload
[...]
[ 9644.241819] PCI 0013:01:00.0#10000: EEH: bnx2x driver reports: 'disconnect'
[ 9644.241823] PCI 0013:01:00.1#10000: EEH: Invoking bnx2x->slot_reset()
[ 9644.241827] bnx2x: [bnx2x_io_slot_reset:14229(enP19p1s0f1)]IO slot reset initializing...
[ 9644.241916] bnx2x 0013:01:00.1: enabling device (0140 -> 0142)
[ 9644.258604] bnx2x: [bnx2x_io_slot_reset:14245(enP19p1s0f1)]IO slot reset --> driver unload
[ 9644.258612] PCI 0013:01:00.1#10000: EEH: bnx2x driver reports: 'disconnect'
[ 9644.258615] EEH: Finished:'slot_reset' with aggregate recovery state:'disconnect'
[ 9644.258620] EEH: Unable to recover from failure from PHB#13-PE#10000.
[ 9644.261811] EEH: Beginning: 'error_detected(permanent failure)'
[...]
[ 9644.261823] EEH: Finished:'error_detected(permanent failure)'
Hence, it is important to inform the driver about the PCI error as early
as possible, so that the driver is aware of the error and waits for the
EEH handler's next action for successful recovery.
The current implementation uses the rtas_get_sensor() API, which blocks the
slot state check until the RTAS call returns success. To avoid this, fix the
PCI hotplug driver (rpaphp) to return an error (-EBUSY) if the slot presence
state cannot be detected immediately while the PE is in EEH recovery state.
Change rpaphp_get_sensor_state() to invoke rtas_call(get-sensor-state)
directly only if the respective PE is in EEH recovery state, and take
action based on the RTAS return status. This way the EEH handler is not
blocked on rpaphp_get_sensor_state() and can immediately notify the driver
about the PCI error and stop any active operations.
In normal cases (non-EEH case), rpaphp_get_sensor_state() continues to
invoke rtas_get_sensor() as before, with no change in existing behavior.
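The decision described above (fail fast while the PE is recovering, keep the
old blocking path otherwise) can be sketched in userspace as follows. This is
a simplified model, not the driver code: struct sim_slot and both functions
are hypothetical, the blocking path is modelled as "eventually succeeds", and
every failure other than busy is collapsed to -EIO for brevity.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_RTAS_BUSY      (-2)
#define SIM_EXT_DELAY_MIN  9900
#define SIM_EXT_DELAY_MAX  9905

/* Hypothetical slot model used only for this illustration. */
struct sim_slot {
	bool pe_recovering;  /* stands in for pe->state & EEH_PE_RECOVERING */
	int  fw_status;      /* status the simulated firmware returns right now */
	int  presence;       /* sensor value available once firmware is ready */
};

/* Map a raw firmware status to an errno-style value (simplified). */
static int sim_status_to_errno(int status)
{
	if (status == 0)
		return 0;
	if (status == SIM_RTAS_BUSY ||
	    (status >= SIM_EXT_DELAY_MIN && status <= SIM_EXT_DELAY_MAX))
		return -EBUSY;
	return -EIO;  /* simplification: all other failures */
}

static int sim_slot_get_sensor_state(const struct sim_slot *slot, int *state)
{
	assert(slot != NULL && state != NULL);

	if (slot->pe_recovering) {
		/* Fail-fast path: one call, no retry loop. */
		if (slot->fw_status == 0)
			*state = slot->presence;
		return sim_status_to_errno(slot->fw_status);
	}

	/*
	 * Normal path: modelled as a blocking call that waits out any busy
	 * status and eventually succeeds.
	 */
	*state = slot->presence;
	return 0;
}
```

In this model a recovering slot whose firmware still answers 9902 yields
-EBUSY immediately and leaves *state untouched, while a slot that is not in
recovery behaves as it did before the change.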
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/169235815601.193557.13989873835811325343.stgit@jupiter
2023-08-18 16:59:43 +05:30
/*
 * RTAS call get-sensor-state(DR_ENTITY_SENSE) return values as per PAPR:
 * -- generic return codes ---
 *    -1: Hardware Error
 *    -2: RTAS_BUSY
 *    -3: Invalid sensor. RTAS Parameter Error.
 * -- rtas_get_sensor function specific return codes ---
 *    -9000: Need DR entity to be powered up and unisolated before RTAS call
 *    -9001: Need DR entity to be powered up, but not unisolated, before RTAS call
 *    -9002: DR entity unusable
 *    990x: Extended delay - where x is a number in the range of 0-5
 */
#define RTAS_SLOT_UNISOLATED		-9000
#define RTAS_SLOT_NOT_UNISOLATED	-9001
#define RTAS_SLOT_NOT_USABLE		-9002

static int rtas_get_sensor_errno(int rtas_rc)
{
	switch (rtas_rc) {
	case 0:
		/* Success case */
		return 0;
	case RTAS_SLOT_UNISOLATED:
	case RTAS_SLOT_NOT_UNISOLATED:
		return -EFAULT;
	case RTAS_SLOT_NOT_USABLE:
		return -ENODEV;
	case RTAS_BUSY:
	case RTAS_EXTENDED_DELAY_MIN...RTAS_EXTENDED_DELAY_MAX:
		return -EBUSY;
	default:
		return rtas_error_rc(rtas_rc);
	}
}
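The 990x extended-delay codes listed in the comment above carry a delay hint
in their last digit: the suggested wait is 10^x milliseconds, so 9902 asks
the caller to come back after roughly 100 ms. The following userspace sketch
decodes that hint. The names sim_ext_delay_ms() and sim_ext_delay_check() are
hypothetical and exist only for this illustration; this is not a kernel
helper.

```c
#include <assert.h>

#define SIM_EXT_DELAY_MIN  9900
#define SIM_EXT_DELAY_MAX  9905

/*
 * Return the suggested delay in milliseconds for a 990x status,
 * or 0 if the status is not an extended-delay code.
 */
static unsigned int sim_ext_delay_ms(int status)
{
	unsigned int ms = 1;
	int order;

	if (status < SIM_EXT_DELAY_MIN || status > SIM_EXT_DELAY_MAX)
		return 0;

	/* Multiply by 10 once per step above 9900: 1, 10, 100, ... */
	for (order = status - SIM_EXT_DELAY_MIN; order > 0; order--)
		ms *= 10;

	return ms;
}

/* Small usage check of the decoder. */
static void sim_ext_delay_check(void)
{
	assert(sim_ext_delay_ms(9900) == 1);    /* 10^0 ms */
	assert(sim_ext_delay_ms(9902) == 100);  /* 10^2 ms */
	assert(sim_ext_delay_ms(-2) == 0);      /* RTAS_BUSY carries no hint */
}
```

A caller that honours this hint and keeps retrying on 9902 waits about 100 ms
per attempt, which is how a retry loop can add up to several seconds while
the firmware stays busy.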
/*
 * get_adapter_status() can be called by the EEH handler during EEH recovery.
 * On certain PHB failures, the RTAS call rtas_call(get-sensor-state) returns
 * extended busy error (9902) until PHB is recovered by pHyp. The RTAS call
 * interface rtas_get_sensor() loops over the RTAS call on extended delay
 * return code (9902) until the return value is either success (0) or error
 * (-1). This causes the EEH handler to get stuck for ~6 seconds before it
 * could notify that the PCI error has been detected and stop any active
 * operations. This sometimes causes EEH recovery to fail. To avoid this issue,
 * invoke rtas_call(get-sensor-state) directly if the respective PE is in EEH
 * recovery state and return -EBUSY error based on RTAS return status. This
 * will help the EEH handler to notify the driver about the PCI error
 * immediately and successfully proceed with EEH recovery steps.
 */
static int __rpaphp_get_sensor_state(struct slot *slot, int *state)
{
	int rc;
	int token = rtas_token("get-sensor-state");
	struct pci_dn *pdn;
	struct eeh_pe *pe;
	struct pci_controller *phb = PCI_DN(slot->dn)->phb;

	if (token == RTAS_UNKNOWN_SERVICE)
		return -ENOENT;

	/*
	 * Fallback to existing method for empty slot or PE isn't in EEH
	 * recovery.
	 */
	pdn = list_first_entry_or_null(&PCI_DN(phb->dn)->child_list,
					struct pci_dn, list);
	if (!pdn)
		goto fallback;

	pe = eeh_dev_to_pe(pdn->edev);
	if (pe && (pe->state & EEH_PE_RECOVERING)) {
		rc = rtas_call(token, 2, 2, state, DR_ENTITY_SENSE,
			       slot->index);
		return rtas_get_sensor_errno(rc);
	}
fallback:
	return rtas_get_sensor(DR_ENTITY_SENSE, slot->index, state);
}
2006-01-12 18:28:22 -06:00
int rpaphp_get_sensor_state(struct slot *slot, int *state)
2005-04-16 15:20:36 -07:00
{
	int rc;
	int setlevel;
2023-08-18 16:59:43 +05:30
	rc = __rpaphp_get_sensor_state(slot, state);
2005-04-16 15:20:36 -07:00
	if (rc < 0) {
		if (rc == -EFAULT || rc == -EEXIST) {
			dbg("%s: slot must be power up to get sensor-state\n",
2008-03-03 19:09:46 -08:00
			    __func__);
2005-04-16 15:20:36 -07:00
2013-11-14 11:28:18 -07:00
			/* some slots have to be powered up
2005-04-16 15:20:36 -07:00
			 * before get-sensor will succeed.
			 */
			rc = rtas_set_power_level(slot->power_domain, POWER_ON,
						  &setlevel);
			if (rc < 0) {
				dbg("%s: power on slot[%s] failed rc=%d.\n",
2008-03-03 19:09:46 -08:00
				    __func__, slot->name, rc);
2005-04-16 15:20:36 -07:00
			} else {
2023-08-18 16:59:43 +05:30
				rc = __rpaphp_get_sensor_state(slot, state);
2005-04-16 15:20:36 -07:00
			}
		} else if (rc == -ENODEV)
2008-03-03 19:09:46 -08:00
			info("%s: slot is unusable\n", __func__);
2005-04-16 15:20:36 -07:00
		else
2008-03-03 19:09:46 -08:00
			err("%s failed to get sensor state\n", __func__);
2005-04-16 15:20:36 -07:00
	}
	return rc;
}
2007-04-13 15:34:20 -07:00
/**
 * rpaphp_enable_slot - record slot state, config pci device
2007-11-28 09:04:30 -08:00
 * @slot: target &slot
2007-04-13 15:34:20 -07:00
 *
PCI: hotplug: Drop hotplug_slot_info
Ever since the PCI hotplug core was introduced in 2002, drivers had to
allocate and register a struct hotplug_slot_info for every slot:
https://git.kernel.org/tglx/history/c/a8a2069f432c
Apparently the idea was that drivers furnish the hotplug core with an
up-to-date card presence status, power status, latch status and
attention indicator status as well as notify the hotplug core of changes
thereof. However only 4 out of 12 hotplug drivers bother to notify the
hotplug core with pci_hp_change_slot_info() and the hotplug core never
made any use of the information: There is just a single macro in
pci_hotplug_core.c, GET_STATUS(), which uses the hotplug_slot_info if
the driver lacks the corresponding callback in hotplug_slot_ops. The
macro is called when the user reads the attribute via sysfs.
Now, if the callback isn't defined, the attribute isn't exposed in sysfs
in the first place (see e.g. has_power_file()). There are only two
situations when the hotplug_slot_info would actually be accessed:
* If the driver defines ->enable_slot or ->disable_slot but not
->get_power_status.
* If the driver defines ->set_attention_status but not
->get_attention_status.
There is no driver doing the former and just a single driver doing the
latter, namely pnv_php.c. Amend it with a ->get_attention_status
callback. With that, the hotplug_slot_info becomes completely unused by
the PCI hotplug core. But a few drivers use it internally as a cache:
cpcihp uses it to cache the latch_status and adapter_status.
cpqhp uses it to cache the adapter_status.
pnv_php and rpaphp use it to cache the attention_status.
shpchp uses it to cache all four values.
Amend these drivers to cache the information in their private slot
struct. shpchp's slot struct already contains members to cache the
power_status and adapter_status, so additional members are only needed
for the other two values. In the case of cpqphp, the cached value is
only accessed in a single place, so instead of caching it, read the
current value from the hardware.
Caution: acpiphp, cpci, cpqhp, shpchp, asus-wmi and eeepc-laptop
populate the hotplug_slot_info with initial values on probe. That code
is herewith removed. There is a theoretical chance that the code has
side effects without which the driver fails to function, e.g. if the
ACPI method to read the adapter status needs to be executed at least
once on probe. That seems unlikely to me, still maintainers should
review the changes carefully for this possibility.
Rafael adds: "I'm not aware of any case in which it will break anything,
[...] but if that happens, it may be necessary to add the execution of
the control methods in question directly to the initialization part."
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> # drivers/pci/hotplug/rpa*
Acked-by: Sebastian Ott <sebott@linux.ibm.com> # drivers/pci/hotplug/s390*
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com> # drivers/platform/x86
Cc: Len Brown <lenb@kernel.org>
Cc: Scott Murray <scott@spiteful.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
Cc: Gavin Shan <gwshan@linux.vnet.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Corentin Chary <corentin.chary@gmail.com>
Cc: Darren Hart <dvhart@infradead.org>
2018-09-08 09:59:01 +02:00
 * Initialize values in the slot structure to indicate if there is a pci card
 * plugged into the slot. If the slot is not empty, run the pcibios routine
2007-04-13 15:34:20 -07:00
 * to get pcibios stuff correctly set up.
 */
int rpaphp_enable_slot(struct slot *slot)
2005-04-16 15:20:36 -07:00
{
2007-04-13 15:34:14 -07:00
	int rc, level, state;
	struct pci_bus *bus;
2007-04-13 15:34:12 -07:00
2007-04-13 15:34:16 -07:00
	slot->state = EMPTY;
2007-04-13 15:34:14 -07:00
	/* Find out if the power is turned on for the slot */
2007-04-13 15:34:13 -07:00
	rc = rtas_get_power_level(slot->power_domain, &level);
	if (rc)
		return rc;
2007-04-13 15:34:14 -07:00
	/* Figure out if there is an adapter in the slot */
	rc = rpaphp_get_sensor_state(slot, &state);
	if (rc)
		return rc;
2007-04-13 15:34:12 -07:00
2016-05-03 15:41:38 +10:00
	bus = pci_find_bus_by_node(slot->dn);
2007-04-13 15:34:16 -07:00
	if (!bus) {
2017-07-18 16:43:21 -05:00
		err("%s: no pci_bus for dn %pOF\n", __func__, slot->dn);
2007-04-13 15:34:16 -07:00
		return -EINVAL;
2007-04-13 15:34:12 -07:00
	}
2005-04-16 15:20:36 -07:00
2007-04-13 15:34:16 -07:00
	slot->bus = bus;
	slot->pci_devs = &bus->devices;
	/* if there's an adapter in the slot, go add the pci devices */
	if (state == PRESENT) {
		slot->state = NOT_CONFIGURED;
		/* non-empty slot has to have child */
		if (!slot->dn->child) {
			err("%s: slot[%s]'s device_node doesn't have child for adapter\n",
2008-03-03 19:09:46 -08:00
			    __func__, slot->name);
2007-04-13 15:34:16 -07:00
			return -EINVAL;
		}
2020-03-06 18:39:01 +11:00
		if (list_empty(&bus->devices)) {
2020-03-06 18:39:03 +11:00
			pseries_eeh_init_edev_recursive(PCI_DN(slot->dn));
2016-05-03 15:41:37 +10:00
			pci_hp_add_devices(bus);
2020-03-06 18:39:01 +11:00
		}
2007-04-13 15:34:16 -07:00
		if (!list_empty(&bus->devices)) {
			slot->state = CONFIGURED;
		}
2007-04-13 15:34:17 -07:00
2008-10-13 09:59:12 -07:00
		if (rpaphp_debug) {
2007-04-13 15:34:17 -07:00
			struct pci_dev *dev;
2017-07-18 16:43:21 -05:00
			dbg("%s: pci_devs of slot[%pOF]\n", __func__, slot->dn);
2015-12-27 13:21:11 -08:00
			list_for_each_entry(dev, &bus->devices, bus_list)
2007-04-13 15:34:17 -07:00
				dbg("\t%s\n", pci_name(dev));
		}
2007-04-13 15:34:16 -07:00
	}
2007-04-13 15:34:12 -07:00
2007-04-13 15:34:19 -07:00
	return 0;
2005-04-16 15:20:36 -07:00
}