2018-01-26 14:22:04 -06:00
// SPDX-License-Identifier: GPL-2.0+
2005-04-16 15:20:36 -07:00
/*
 * PCI Express Hot Plug Controller Driver
 *
 * Copyright (C) 1995,2001 Compaq Computer Corporation
 * Copyright (C) 2001 Greg Kroah-Hartman (greg@kroah.com)
 * Copyright (C) 2001 IBM Corp.
 * Copyright (C) 2003-2004 Intel Corporation
 *
 * All rights reserved.
 *
2005-08-16 15:16:10 -07:00
 * Send feedback to <greg@kroah.com>, <kristen.c.accardi@intel.com>
2005-04-16 15:20:36 -07:00
 *
2016-11-15 07:57:30 -06:00
 * Authors:
 *   Dan Zink <dan.zink@compaq.com>
 *   Greg Kroah-Hartman <greg@kroah.com>
 *   Dely Sy <dely.l.sy@intel.com>
2005-04-16 15:20:36 -07:00
 */
2019-05-07 18:24:51 -05:00
#define pr_fmt(fmt) "pciehp: " fmt
#define dev_fmt pr_fmt
2023-10-18 14:32:50 +03:00
#include <linux/bitfield.h>
2005-04-16 15:20:36 -07:00
#include <linux/moduleparam.h>
#include <linux/kernel.h>
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the following.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and tries to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
widely available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build tests were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-24 17:04:11 +09:00
#include <linux/slab.h>
2005-04-16 15:20:36 -07:00
#include <linux/types.h>
#include <linux/pci.h>
#include "pciehp.h"
2018-07-19 17:27:57 -05:00
#include "../pci.h"
2005-04-16 15:20:36 -07:00
/* Global variables */
2012-01-13 09:32:20 +10:30
bool pciehp_poll_mode;
2005-04-16 15:20:36 -07:00
int pciehp_poll_time;
2016-08-24 16:57:52 -04:00
/*
 * not really modular, but the easiest way to keep compat with existing
 * bootargs behaviour is to continue using module_param here.
 */
2005-04-16 15:20:36 -07:00
module_param(pciehp_poll_mode, bool, 0644);
module_param(pciehp_poll_time, int, 0644);
MODULE_PARM_DESC(pciehp_poll_mode, "Using polling mechanism for hot-plug events or not");
MODULE_PARM_DESC(pciehp_poll_time, "Polling mechanism frequency, in seconds");
2015-12-27 13:21:11 -08:00
static int set_attention_status(struct hotplug_slot *slot, u8 value);
static int get_power_status(struct hotplug_slot *slot, u8 *value);
static int get_latch_status(struct hotplug_slot *slot, u8 *value);
static int get_adapter_status(struct hotplug_slot *slot, u8 *value);
2005-04-16 15:20:36 -07:00
2009-09-15 17:24:46 +09:00
static int init_slot(struct controller *ctrl)
2005-04-16 15:20:36 -07:00
{
2018-09-08 09:59:01 +02:00
	struct hotplug_slot_ops *ops;
2008-10-20 17:41:38 -06:00
	char name[SLOT_NAME_SIZE];
2018-09-08 09:59:01 +02:00
	int retval;
2009-09-15 17:24:46 +09:00
2009-10-05 17:41:37 +09:00
	/* Setup hotplug slot ops */
	ops = kzalloc(sizeof(*ops), GFP_KERNEL);
	if (!ops)
2018-09-08 09:59:01 +02:00
		return -ENOMEM;
2014-02-11 15:26:29 -07:00
2018-08-19 16:29:00 +02:00
	ops->enable_slot = pciehp_sysfs_enable_slot;
	ops->disable_slot = pciehp_sysfs_disable_slot;
2009-10-05 17:41:37 +09:00
	ops->get_power_status = get_power_status;
	ops->get_adapter_status = get_adapter_status;
2018-08-19 16:29:00 +02:00
	ops->reset_slot = pciehp_reset_slot;
2009-10-05 17:41:37 +09:00
	if (MRL_SENS(ctrl))
		ops->get_latch_status = get_latch_status;
	if (ATTN_LED(ctrl)) {
2018-08-19 16:29:00 +02:00
		ops->get_attention_status = pciehp_get_attention_status;
2009-10-05 17:41:37 +09:00
		ops->set_attention_status = set_attention_status;
PCI: pciehp: Allow exclusive userspace control of indicators
PCIe hotplug supports optional Attention and Power Indicators, which are
used internally by pciehp. Users can't control the Power Indicator, but
they can control the Attention Indicator by writing to a sysfs "attention"
file.
The Slot Control register has two bits for each indicator, and the PCIe
spec defines the encodings for each as (Reserved/On/Blinking/Off). For
sysfs "attention" writes, pciehp_set_attention_status() maps into these
encodings, so the only useful write values are 0 (Off), 1 (On), and 2
(Blinking).
However, some platforms use all four bits for platform-specific indicators,
and they need to allow direct user control of them while preventing pciehp
from using them at all.
Add a "hotplug_user_indicators" flag to the pci_dev structure. When set,
pciehp does not use either the Attention Indicator or the Power Indicator,
and the low four bits (values 0x0 - 0xf) of sysfs "attention" write values
are written directly to the Attention Indicator Control and Power Indicator
Control fields.
[bhelgaas: changelog, rename flag and accessors to s/attention/indicator/]
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-09-13 10:31:59 -06:00
	} else if (ctrl->pcie->port->hotplug_user_indicators) {
		ops->get_attention_status = pciehp_get_raw_indicator_status;
		ops->set_attention_status = pciehp_set_raw_indicator_status;
2009-10-05 17:41:37 +09:00
	}
2009-09-15 17:24:46 +09:00
	/* register this slot with the hotplug pci core */
2018-09-08 09:59:01 +02:00
	ctrl->hotplug_slot.ops = ops;
2009-09-15 17:31:16 +09:00
	snprintf(name, SLOT_NAME_SIZE, "%u", PSN(ctrl));
2009-09-15 17:24:46 +09:00
2018-09-08 09:59:01 +02:00
	retval = pci_hp_initialize(&ctrl->hotplug_slot,
2018-07-19 17:27:43 -05:00
				   ctrl->pcie->port->subordinate, 0, name);
2009-09-15 17:24:46 +09:00
	if (retval) {
2018-09-08 09:59:01 +02:00
		ctrl_err(ctrl, "pci_hp_initialize failed: error %d\n", retval);
2009-10-05 17:41:37 +09:00
		kfree(ops);
2005-04-16 15:20:36 -07:00
	}
2006-12-21 17:01:02 -08:00
	return retval;
2005-04-16 15:20:36 -07:00
}
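The capability-dependent wiring in init_slot() can be illustrated outside the kernel. What follows is a minimal userspace sketch, not the driver's code: struct slot_ops, build_slot_ops(), has_latch_status() and the stub callbacks are hypothetical names introduced for illustration, and only the two Slot Capabilities bit values are taken from include/uapi/linux/pci_regs.h. It shows optional callbacks staying NULL when the matching capability bit is clear.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Slot Capabilities bits, values as in include/uapi/linux/pci_regs.h */
#define PCI_EXP_SLTCAP_MRLSP 0x00000004 /* MRL Sensor Present */
#define PCI_EXP_SLTCAP_AIP   0x00000008 /* Attention Indicator Present */

/* Hypothetical, trimmed-down stand-in for struct hotplug_slot_ops */
struct slot_ops {
	int (*get_power_status)(uint8_t *value);
	int (*get_latch_status)(uint8_t *value);
	int (*get_attention_status)(uint8_t *value);
};

/* Stub callbacks: each reports a fixed value so the sketch is observable */
static int stub_power(uint8_t *value)     { *value = 1; return 0; }
static int stub_latch(uint8_t *value)     { *value = 0; return 0; }
static int stub_attention(uint8_t *value) { *value = 2; return 0; }

/*
 * Allocate a zeroed ops table and fill in only the callbacks that the
 * slot's capabilities support. Returns NULL on allocation failure; the
 * caller owns the result and releases it with free().
 */
static struct slot_ops *build_slot_ops(uint32_t slot_cap)
{
	struct slot_ops *ops = calloc(1, sizeof(*ops));

	if (!ops)
		return NULL;

	ops->get_power_status = stub_power;	/* always available */
	if (slot_cap & PCI_EXP_SLTCAP_MRLSP)
		ops->get_latch_status = stub_latch;
	if (slot_cap & PCI_EXP_SLTCAP_AIP)
		ops->get_attention_status = stub_attention;
	return ops;
}

/* Hypothetical helper: does this table offer a latch status callback? */
static int has_latch_status(const struct slot_ops *ops)
{
	assert(ops != NULL);
	return ops->get_latch_status != NULL;
}
```

Leaving an unsupported callback NULL lets a caller test for the feature with a plain pointer check, which is the property this sketch is meant to show.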
2009-09-15 17:24:46 +09:00
static void cleanup_slot(struct controller *ctrl)
2005-04-16 15:20:36 -07:00
{
2018-09-08 09:59:01 +02:00
	struct hotplug_slot *hotplug_slot = &ctrl->hotplug_slot;
PCI: hotplug: Demidlayer registration with the core
When a hotplug driver calls pci_hp_register(), all steps necessary for
registration are carried out in one go, including creation of a kobject
and addition to sysfs. That's a problem for pciehp once it's converted
to enable/disable the slot exclusively from the IRQ thread: The thread
needs to be spawned after creation of the kobject (because it uses the
kobject's name), but before addition to sysfs (because it will handle
enable/disable requests submitted via sysfs).
pci_hp_deregister() does offer a ->release callback that's invoked
after deletion from sysfs and before destruction of the kobject. But
because pci_hp_register() doesn't offer a counterpart, hotplug drivers'
->probe and ->remove code becomes asymmetric, which is error prone
as recently discovered use-after-free bugs in pciehp's ->remove hook
have shown.
In a sense, this appears to be a case of the midlayer antipattern:
"The core thesis of the "midlayer mistake" is that midlayers are
bad and should not exist. That common functionality which it is
so tempting to put in a midlayer should instead be provided as
library routines which can [be] used, augmented, or ignored by
each bottom level driver independently. Thus every subsystem
that supports multiple implementations (or drivers) should
provide a very thin top layer which calls directly into the
bottom layer drivers, and a rich library of support code that
eases the implementation of those drivers. This library is
available to, but not forced upon, those drivers."
-- Neil Brown (2009), https://lwn.net/Articles/336262/
The presence of midlayer traits in the PCI hotplug core might be ascribed
to its age: When it was introduced in February 2002, the blessings of a
library approach might not have been well known:
https://git.kernel.org/tglx/history/c/a8a2069f432c
For comparison, the driver core does offer split functions for creating
a kobject (device_initialize()) and addition to sysfs (device_add()) as
an alternative to carrying out everything at once (device_register()).
This was introduced in October 2002:
https://git.kernel.org/tglx/history/c/8b290eb19962
The odd ->release callback in the PCI hotplug core was added in 2003:
https://git.kernel.org/tglx/history/c/69f8d663b595
Clearly, a library approach would not force every hotplug driver to
implement a ->release callback, but rather allow the driver to remove
the sysfs files, release its data structures and finally destroy the
kobject. Alternatively, a driver may choose to remove everything with
pci_hp_deregister(), then release its data structures.
To this end, offer drivers pci_hp_initialize() and pci_hp_add() as a
split-up version of pci_hp_register(). Likewise, offer pci_hp_del()
and pci_hp_destroy() as a split-up version of pci_hp_deregister().
Eliminate the ->release callback and move its code into each driver's
teardown routine.
Declare pci_hp_deregister() void, in keeping with the usual kernel
pattern that enablement can fail, but disablement cannot. It only
returned an error if the caller passed in a NULL pointer or a slot which
has never or is no longer registered or is sharing its name with another
slot. Those would be bugs, so WARN about them. Few hotplug drivers
actually checked the return value and those that did only printed a
useless error message to dmesg. Remove that.
For most drivers the conversion was straightforward since it doesn't
matter whether the code in the ->release callback is executed before or
after destruction of the kobject. But in the case of ibmphp, it was
unclear to me whether setting slot_cur->ctrl and slot_cur->bus_on to
NULL needs to happen before the kobject is destroyed, so I erred on
the side of caution and ensured that the order stays the same. Another
nontrivial case is pnv_php: I found the list and kref logic difficult
to understand, but my impression was that it is safe to defer deleting
the list element and dropping the references until after the kobject is
destroyed.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com> # drivers/platform/x86
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Len Brown <lenb@kernel.org>
Cc: Scott Murray <scott@spiteful.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Gavin Shan <gwshan@linux.vnet.ibm.com>
Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Corentin Chary <corentin.chary@gmail.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Andy Shevchenko <andy@infradead.org>
2018-07-19 17:27:43 -05:00
2018-07-19 17:27:43 -05:00
	pci_hp_destroy(hotplug_slot);
2018-07-19 17:27:43 -05:00
	kfree(hotplug_slot->ops);
2005-04-16 15:20:36 -07:00
}
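The split registration described in the "Demidlayer" commit message (initialize, then add; del, then destroy) can be modelled as a small state machine. This is a userspace sketch under stated assumptions: struct slot, the slot_*() functions and the thread_running flag are hypothetical stand-ins and do not reproduce the PCI hotplug core API. It only illustrates why a worker that needs the object's name fits between the "initialize" and "add" steps, and why teardown mirrors that order.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical lifecycle states for a slot object */
enum slot_state { SLOT_NONE, SLOT_INITIALIZED, SLOT_ADDED };

struct slot {
	enum slot_state state;
	int thread_running;	/* stands in for the IRQ thread */
	char name[16];
};

/* Step 1: create the object and name it; nothing is user-visible yet */
static int slot_initialize(struct slot *s, const char *name)
{
	assert(s != NULL && name != NULL);
	if (s->state != SLOT_NONE)
		return -1;
	snprintf(s->name, sizeof(s->name), "%s", name);
	s->state = SLOT_INITIALIZED;
	return 0;
}

/* Step 2: publish the object; user requests may arrive from now on */
static int slot_add(struct slot *s)
{
	if (s->state != SLOT_INITIALIZED)
		return -1;
	s->state = SLOT_ADDED;
	return 0;
}

/* Reverse of slot_add(): stop accepting user requests */
static void slot_del(struct slot *s)
{
	if (s->state == SLOT_ADDED)
		s->state = SLOT_INITIALIZED;
}

/* Reverse of slot_initialize(): drop the name and the object state */
static void slot_destroy(struct slot *s)
{
	s->name[0] = '\0';
	s->state = SLOT_NONE;
}

static int slot_probe(struct slot *s, const char *name)
{
	int ret = slot_initialize(s, name);

	if (ret)
		return ret;
	s->thread_running = 1;	/* needs s->name, must precede slot_add() */
	ret = slot_add(s);
	if (ret) {
		s->thread_running = 0;
		slot_destroy(s);
	}
	return ret;
}

static void slot_remove(struct slot *s)
{
	slot_del(s);		/* no new requests */
	s->thread_running = 0;	/* worker no longer needed */
	slot_destroy(s);	/* finally release the object */
}
```

Because each step has an explicit counterpart, the probe and remove paths in the sketch are symmetric, which is the property the commit message argues for.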
/*
2019-09-03 14:10:20 +03:00
 * set_attention_status - Turns the Attention Indicator on, off or blinking
2005-04-16 15:20:36 -07:00
 */
static int set_attention_status(struct hotplug_slot *hotplug_slot, u8 status)
{
2018-09-08 09:59:01 +02:00
	struct controller *ctrl = to_ctrl(hotplug_slot);
2018-09-18 21:46:17 +02:00
	struct pci_dev *pdev = ctrl->pcie->port;
2005-04-16 15:20:36 -07:00
2019-09-03 14:10:20 +03:00
	if (status)
2023-10-18 14:32:50 +03:00
		status = FIELD_PREP(PCI_EXP_SLTCTL_AIC, status);
2019-09-03 14:10:20 +03:00
	else
		status = PCI_EXP_SLTCTL_ATTN_IND_OFF;
2018-07-19 17:27:57 -05:00
	pci_config_pm_runtime_get(pdev);
2019-09-03 14:10:20 +03:00
	pciehp_set_indicators(ctrl, INDICATOR_NOOP, status);
2018-07-19 17:27:57 -05:00
	pci_config_pm_runtime_put(pdev);
2013-12-14 13:06:16 -07:00
	return 0;
2005-04-16 15:20:36 -07:00
}
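The mapping that set_attention_status() applies to a sysfs "attention" value can be checked in isolation. The sketch below is a userspace approximation: the register constants carry the values from include/uapi/linux/pci_regs.h, while AIC_SHIFT and attention_to_aic() are hypothetical names, with the shift-and-mask standing in for FIELD_PREP(PCI_EXP_SLTCTL_AIC, status). It shows that 1 and 2 land on the On and Blinking encodings and that 0 is translated to Off.

```c
#include <assert.h>
#include <stdint.h>

/* Slot Control register fields, values as in include/uapi/linux/pci_regs.h */
#define PCI_EXP_SLTCTL_AIC            0x00c0 /* Attention Indicator Control */
#define PCI_EXP_SLTCTL_ATTN_IND_ON    0x0040
#define PCI_EXP_SLTCTL_ATTN_IND_BLINK 0x0080
#define PCI_EXP_SLTCTL_ATTN_IND_OFF   0x00c0

/* Hypothetical name: bit position of the lowest bit of the AIC field */
#define AIC_SHIFT 6

/*
 * Translate a sysfs-style attention value (0 = off, 1 = on, 2 = blinking)
 * into the two-bit Attention Indicator Control encoding.
 */
static uint16_t attention_to_aic(uint8_t status)
{
	uint16_t field;

	if (status)
		/* stand-in for FIELD_PREP(): shift into place, then mask */
		field = (uint16_t)(((uint16_t)status << AIC_SHIFT) &
				   PCI_EXP_SLTCTL_AIC);
	else
		field = PCI_EXP_SLTCTL_ATTN_IND_OFF;

	/* the result never touches bits outside the AIC field */
	assert((field & ~PCI_EXP_SLTCTL_AIC) == 0);
	return field;
}
```

Zero needs the explicit branch because the all-zero encoding is Reserved in the PCIe spec, so "off" has to be written as the 11b pattern.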
static int get_power_status(struct hotplug_slot *hotplug_slot, u8 *value)
{
2018-09-08 09:59:01 +02:00
	struct controller *ctrl = to_ctrl(hotplug_slot);
2018-09-18 21:46:17 +02:00
	struct pci_dev *pdev = ctrl->pcie->port;
2005-04-16 15:20:36 -07:00
2018-07-19 17:27:57 -05:00
	pci_config_pm_runtime_get(pdev);
2018-09-18 21:46:17 +02:00
	pciehp_get_power_status(ctrl, value);
2018-07-19 17:27:57 -05:00
	pci_config_pm_runtime_put(pdev);
2013-12-14 13:06:16 -07:00
	return 0;
2005-04-16 15:20:36 -07:00
}
static int get_latch_status(struct hotplug_slot *hotplug_slot, u8 *value)
{
2018-09-08 09:59:01 +02:00
	struct controller *ctrl = to_ctrl(hotplug_slot);
2018-09-18 21:46:17 +02:00
	struct pci_dev *pdev = ctrl->pcie->port;
2005-04-16 15:20:36 -07:00
2018-07-19 17:27:57 -05:00
	pci_config_pm_runtime_get(pdev);
2018-09-18 21:46:17 +02:00
	pciehp_get_latch_status(ctrl, value);
2018-07-19 17:27:57 -05:00
	pci_config_pm_runtime_put(pdev);
2013-12-14 13:06:16 -07:00
	return 0;
2005-04-16 15:20:36 -07:00
}
static int get_adapter_status(struct hotplug_slot *hotplug_slot, u8 *value)
{
2018-09-08 09:59:01 +02:00
	struct controller *ctrl = to_ctrl(hotplug_slot);
2018-09-18 21:46:17 +02:00
	struct pci_dev *pdev = ctrl->pcie->port;
2019-10-29 20:00:22 +03:00
	int ret;
2005-04-16 15:20:36 -07:00
2018-07-19 17:27:57 -05:00
	pci_config_pm_runtime_get(pdev);
2019-10-29 20:00:22 +03:00
	ret = pciehp_card_present_or_link_active(ctrl);
2018-07-19 17:27:57 -05:00
	pci_config_pm_runtime_put(pdev);
2019-10-29 20:00:22 +03:00
	if (ret < 0)
		return ret;
	*value = ret;
2013-12-14 13:06:16 -07:00
	return 0;
2005-04-16 15:20:36 -07:00
}
PCI: pciehp: Deduplicate presence check on probe & resume
On driver probe and on resume from system sleep, pciehp checks the
Presence Detect State bit in the Slot Status register to bring up an
occupied slot or bring down an unoccupied slot. Both code paths are
identical, so deduplicate them per Mika's request.
On probe, an additional check is performed to disable power of an
unoccupied slot. This can e.g. happen if power was enabled by BIOS.
It cannot happen once pciehp has taken control, hence is not necessary
on resume: The Slot Control register is set to the same value that it
had on suspend by pci_restore_state(), so if the slot was occupied,
power is enabled and if it wasn't, power is disabled. Should occupancy
have changed during the system sleep transition, power is adjusted by
bringing up or down the slot per the paragraph above.
To allow for deduplication of the presence check, move the power check
to pcie_init(). This seems safer anyway, because right now it is
performed while interrupts are already enabled, and although I can't
think of a scenario where pciehp_power_off_slot() and the IRQ thread
collide, it does feel brittle.
However this means that pcie_init() may now write to the Slot Control
register before the IRQ is requested. If both the CCIE and HPIE bits
happen to be set, pcie_wait_cmd() will wait for an interrupt (instead
of polling the Command Completed bit) and eventually emit a timeout
message. Additionally, if a level-triggered INTx interrupt is used,
the user may see a spurious interrupt splat. Avoid by disabling
interrupts before disabling power. (Normally the HPIE and CCIE bits
should be clear on probe, but conceivably they may already have been
set e.g. by BIOS.)
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2018-07-28 07:22:00 +02:00
/**
 * pciehp_check_presence() - synthesize event if presence has changed
2020-07-29 22:12:19 +02:00
 * @ctrl: controller to check
2018-07-28 07:22:00 +02:00
 *
 * On probe and resume, an explicit presence check is necessary to bring up an
 * occupied slot or bring down an unoccupied slot. This can't be triggered by
 * events in the Slot Status register, they may be stale and are therefore
 * cleared. Secondly, sending an interrupt for "events that occur while
 * interrupt generation is disabled [when] interrupt generation is subsequently
 * enabled" is optional per PCIe r4.0, sec 6.7.3.4.
 */
static void pciehp_check_presence(struct controller *ctrl)
{
2019-10-29 20:00:22 +03:00
	int occupied;
2018-07-28 07:22:00 +02:00
PCI: pciehp: Use down_read/write_nested(reset_lock) to fix lockdep errors
Use down_read_nested() and down_write_nested() when taking the
ctrl->reset_lock rw-sem, passing the number of PCIe hotplug controllers in
the path to the PCI root bus as lock subclass parameter.
This fixes the following false-positive lockdep report when unplugging a
Lenovo X1C8 from a Lenovo 2nd gen TB3 dock:
pcieport 0000:06:01.0: pciehp: Slot(1): Link Down
pcieport 0000:06:01.0: pciehp: Slot(1): Card not present
============================================
WARNING: possible recursive locking detected
5.16.0-rc2+ #621 Not tainted
--------------------------------------------
irq/124-pciehp/86 is trying to acquire lock:
ffff8e5ac4299ef8 (&ctrl->reset_lock){.+.+}-{3:3}, at: pciehp_check_presence+0x23/0x80
but task is already holding lock:
ffff8e5ac4298af8 (&ctrl->reset_lock){.+.+}-{3:3}, at: pciehp_ist+0xf3/0x180
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&ctrl->reset_lock);
lock(&ctrl->reset_lock);
*** DEADLOCK ***
May be due to missing lock nesting notation
3 locks held by irq/124-pciehp/86:
#0: ffff8e5ac4298af8 (&ctrl->reset_lock){.+.+}-{3:3}, at: pciehp_ist+0xf3/0x180
#1: ffffffffa3b024e8 (pci_rescan_remove_lock){+.+.}-{3:3}, at: pciehp_unconfigure_device+0x31/0x110
#2: ffff8e5ac1ee2248 (&dev->mutex){....}-{3:3}, at: device_release_driver+0x1c/0x40
stack backtrace:
CPU: 4 PID: 86 Comm: irq/124-pciehp Not tainted 5.16.0-rc2+ #621
Hardware name: LENOVO 20U90SIT19/20U90SIT19, BIOS N2WET30W (1.20 ) 08/26/2021
Call Trace:
<TASK>
dump_stack_lvl+0x59/0x73
__lock_acquire.cold+0xc5/0x2c6
lock_acquire+0xb5/0x2b0
down_read+0x3e/0x50
pciehp_check_presence+0x23/0x80
pciehp_runtime_resume+0x5c/0xa0
device_for_each_child+0x45/0x70
pcie_port_device_runtime_resume+0x20/0x30
pci_pm_runtime_resume+0xa7/0xc0
__rpm_callback+0x41/0x110
rpm_callback+0x59/0x70
rpm_resume+0x512/0x7b0
__pm_runtime_resume+0x4a/0x90
__device_release_driver+0x28/0x240
device_release_driver+0x26/0x40
pci_stop_bus_device+0x68/0x90
pci_stop_bus_device+0x2c/0x90
pci_stop_and_remove_bus_device+0xe/0x20
pciehp_unconfigure_device+0x6c/0x110
pciehp_disable_slot+0x5b/0xe0
pciehp_handle_presence_or_link_change+0xc3/0x2f0
pciehp_ist+0x179/0x180
This lockdep warning is triggered because with Thunderbolt, hotplug ports
are nested. When removing multiple devices in a daisy-chain, each hotplug
port's reset_lock may be acquired recursively. It's never the same lock, so
the lockdep splat is a false positive.
Because locks at the same hierarchy level are never acquired recursively, a
per-level lockdep class is sufficient to fix the lockdep warning.
The choice to use one lockdep subclass per pcie-hotplug controller in the
path to the root-bus was made to conserve class keys because their number
is limited and the complexity grows quadratically with number of keys
according to Documentation/locking/lockdep-design.rst.
Link: https://lore.kernel.org/linux-pci/20190402021933.GA2966@mit.edu/
Link: https://lore.kernel.org/linux-pci/de684a28-9038-8fc6-27ca-3f6f2f6400d7@redhat.com/
Link: https://lore.kernel.org/r/20211217141709.379663-1-hdegoede@redhat.com
Link: https://bugzilla.kernel.org/show_bug.cgi?id=208855
Reported-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Cc: stable@vger.kernel.org
2021-12-17 15:17:09 +01:00
	down_read_nested(&ctrl->reset_lock, ctrl->depth);
2018-09-08 09:59:01 +02:00
	mutex_lock(&ctrl->state_lock);
2018-07-28 07:22:00 +02:00
2018-09-08 09:59:01 +02:00
occupied = pciehp_card_present_or_link_active ( ctrl ) ;
2019-10-29 20:00:22 +03:00
if ( ( occupied > 0 & & ( ctrl - > state = = OFF_STATE | |
2018-09-18 21:46:17 +02:00
ctrl - > state = = BLINKINGON_STATE ) ) | |
( ! occupied & & ( ctrl - > state = = ON_STATE | |
ctrl - > state = = BLINKINGOFF_STATE ) ) )
2018-07-28 07:22:00 +02:00
pciehp_request ( ctrl , PCI_EXP_SLTSTA_PDC ) ;
2018-09-08 09:59:01 +02:00
mutex_unlock ( & ctrl - > state_lock ) ;
2018-07-28 07:22:00 +02:00
up_read ( & ctrl - > reset_lock ) ;
}
2009-01-13 14:44:19 +01:00
static int pciehp_probe ( struct pcie_device * dev )
2005-04-16 15:20:36 -07:00
{
int rc ;
struct controller * ctrl ;
2008-05-28 14:57:30 +09:00
2015-05-19 15:27:58 +02:00
/* If this is not a "hotplug" service, we have no business here. */
if ( dev - > service ! = PCIE_PORT_SERVICE_HP )
return - ENODEV ;
2007-08-09 16:09:35 -07:00
2014-06-09 23:03:32 +02:00
if ( ! dev - > port - > subordinate ) {
/* Can happen if we run out of bus numbers during probe */
2019-05-07 18:24:51 -05:00
pci_err ( dev - > port ,
2014-06-09 23:03:32 +02:00
" Hotplug bridge without secondary bus, ignoring \n " ) ;
2015-05-23 00:38:57 +02:00
return - ENODEV ;
2014-06-09 23:03:32 +02:00
}
2008-06-20 12:07:08 +09:00
ctrl = pcie_init ( dev ) ;
2005-04-16 15:20:36 -07:00
if ( ! ctrl ) {
2019-05-07 18:24:51 -05:00
pci_err ( dev - > port , " Controller initialization failed \n " ) ;
2015-05-23 00:38:57 +02:00
return - ENODEV ;
2005-04-16 15:20:36 -07:00
}
2008-06-26 20:06:24 +09:00
set_service_data ( dev , ctrl ) ;
2005-04-16 15:20:36 -07:00
/* Set up the slot information structures */
2009-09-15 17:24:46 +09:00
rc = init_slot ( ctrl ) ;
2005-04-16 15:20:36 -07:00
if ( rc ) {
2008-06-10 15:28:50 -06:00
if ( rc = = - EBUSY )
2014-04-18 20:13:50 -04:00
ctrl_warn ( ctrl , " Slot already registered by another hotplug driver \n " ) ;
2008-06-10 15:28:50 -06:00
else
2015-06-15 16:28:29 -05:00
ctrl_err ( ctrl , " Slot initialization failed (%d) \n " , rc ) ;
2006-12-21 17:01:05 -08:00
goto err_out_release_ctlr ;
2005-04-16 15:20:36 -07:00
}
2009-01-28 19:31:18 -08:00
/* Enable events after we have set up the data structures */
rc = pcie_init_notification ( ctrl ) ;
if ( rc ) {
2015-06-15 16:28:29 -05:00
ctrl_err ( ctrl , " Notification initialization failed (%d) \n " , rc ) ;
2009-10-05 17:43:29 +09:00
goto err_out_free_ctrl_slot ;
2009-01-28 19:31:18 -08:00
}
2018-07-19 17:27:43 -05:00
/* Publish to user space */
2018-09-08 09:59:01 +02:00
rc = pci_hp_add ( & ctrl - > hotplug_slot ) ;
2018-07-19 17:27:43 -05:00
if ( rc ) {
ctrl_err ( ctrl , " Publication to user space failed (%d) \n " , rc ) ;
goto err_out_shutdown_notification ;
}
2018-07-28 07:22:00 +02:00
pciehp_check_presence ( ctrl ) ;
2005-04-16 15:20:36 -07:00
return 0 ;
2018-07-19 17:27:43 -05:00
err_out_shutdown_notification :
pcie_shutdown_notification ( ctrl ) ;
2005-04-16 15:20:36 -07:00
err_out_free_ctrl_slot :
2009-09-15 17:24:46 +09:00
cleanup_slot ( ctrl ) ;
2006-12-21 17:01:05 -08:00
err_out_release_ctlr :
2009-09-15 17:30:48 +09:00
pciehp_release_ctrl ( ctrl ) ;
2005-04-16 15:20:36 -07:00
return - ENODEV ;
}
2009-09-15 17:30:48 +09:00
static void pciehp_remove ( struct pcie_device * dev )
2005-04-16 15:20:36 -07:00
{
2008-06-26 20:06:24 +09:00
struct controller * ctrl = get_service_data ( dev ) ;
2005-04-16 15:20:36 -07:00
2018-09-08 09:59:01 +02:00
pci_hp_del ( & ctrl - > hotplug_slot ) ;
2018-07-19 17:27:32 -05:00
pcie_shutdown_notification ( ctrl ) ;
2009-09-15 17:24:46 +09:00
cleanup_slot ( ctrl ) ;
2009-09-15 17:30:48 +09:00
pciehp_release_ctrl ( ctrl ) ;
2005-04-16 15:20:36 -07:00
}
# ifdef CONFIG_PM
2018-09-27 16:38:19 -05:00
static bool pme_is_native ( struct pcie_device * dev )
{
const struct pci_host_bridge * host ;
host = pci_find_host_bridge ( dev - > port - > bus ) ;
return pcie_ports_native | | host - > native_pme ;
}
2019-10-29 20:00:21 +03:00
static void pciehp_disable_interrupt ( struct pcie_device * dev )
2005-04-16 15:20:36 -07:00
{
2018-09-27 16:38:19 -05:00
/*
* Disable hotplug interrupt so that it does not trigger
* immediately when the downstream link goes down .
*/
if ( pme_is_native ( dev ) )
pcie_disable_interrupt ( get_service_data ( dev ) ) ;
2019-10-29 20:00:21 +03:00
}
2018-09-27 16:38:19 -05:00
2019-10-29 20:00:21 +03:00
# ifdef CONFIG_PM_SLEEP
static int pciehp_suspend ( struct pcie_device * dev )
{
/*
* If the port is already runtime suspended we can keep it that
* way .
*/
2020-04-18 18:52:48 +02:00
if ( dev_pm_skip_suspend ( & dev - > port - > dev ) )
2019-10-29 20:00:21 +03:00
return 0 ;
pciehp_disable_interrupt ( dev ) ;
2005-04-16 15:20:36 -07:00
return 0 ;
}
PCI: pciehp: Detect device replacement during system sleep
Ricky reports that replacing a device in a hotplug slot during ACPI sleep
state S3 does not cause re-enumeration on resume, as one would expect.
Instead, the new device is treated as if it was the old one.
There is no bulletproof way to detect device replacement, but as a
heuristic, check whether the device identity in config space matches cached
data in struct pci_dev (Vendor ID, Device ID, Class Code, Revision ID,
Subsystem Vendor ID, Subsystem ID). Additionally, cache and compare the
Device Serial Number (PCIe r6.2 sec 7.9.3). If a mismatch is detected,
mark the old device disconnected (to prevent its driver from accessing the
new device) and synthesize a Presence Detect Changed event.
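
The identity comparison can be sketched in user space as follows. `struct cached_id`, `struct config_snapshot` and `device_replaced()` are hypothetical names for illustration. The register packing (Vendor ID in the low half of the first dword, Revision ID in the low byte of the dword at offset 0x08) follows the PCI configuration header layout. The Subsystem IDs are left out to keep the sketch short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cached identity; loosely mirrors fields of struct pci_dev. */
struct cached_id {
	uint16_t vendor;
	uint16_t device;
	uint8_t  revision;
	uint32_t class_code;	/* 24-bit class code */
	uint64_t dsn;		/* Device Serial Number */
};

/* Hypothetical snapshot of what config space returns after resume. */
struct config_snapshot {
	uint32_t vendor_device;		/* dword at offset 0x00 */
	uint32_t class_revision;	/* dword at offset 0x08 */
	uint64_t dsn;
};

/* Return true if any identity field differs from the cached value. */
bool device_replaced(const struct cached_id *c,
		     const struct config_snapshot *s)
{
	if (s->vendor_device != (c->vendor | ((uint32_t)c->device << 16)))
		return true;
	if (s->class_revision != (c->revision | (c->class_code << 8)))
		return true;

	return s->dsn != c->dsn;
}
```

As the commit message notes, this remains a heuristic: two devices of the same model without a serial number are indistinguishable at this layer.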
The device identity in config space which is compared here is the same as
the one included in the signed Subject Alternative Name per PCIe r6.1 sec
6.31.3. Thus, the present commit prevents attacks where a valid device is
replaced with a malicious device during system sleep and the valid device's
driver obliviously accesses the malicious device.
This is about as much as can be done at the PCI layer. Drivers may have
additional ways to identify devices (such as reading a WWID from some
register) and may trigger re-enumeration when detecting an identity change
on resume.
Link: https://lore.kernel.org/r/a1afaa12f341d146ecbea27c1743661c71683833.1716992815.git.lukas@wunner.de
Reported-by: Ricky Wu <ricky_wu@realtek.com>
Closes: https://lore.kernel.org/r/a608b5930d0a48f092f717c0e137454b@realtek.com
Tested-by: Ricky Wu <ricky_wu@realtek.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-05-29 16:32:09 +02:00
static bool pciehp_device_replaced ( struct controller * ctrl )
{
struct pci_dev * pdev __free ( pci_dev_put ) ;
u32 reg ;
pdev = pci_get_slot ( ctrl - > pcie - > port - > subordinate , PCI_DEVFN ( 0 , 0 ) ) ;
if ( ! pdev )
return true ;
if ( pci_read_config_dword ( pdev , PCI_VENDOR_ID , & reg ) | |
reg ! = ( pdev - > vendor | ( pdev - > device < < 16 ) ) | |
pci_read_config_dword ( pdev , PCI_CLASS_REVISION , & reg ) | |
reg ! = ( pdev - > revision | ( pdev - > class < < 8 ) ) )
return true ;
if ( pdev - > hdr_type = = PCI_HEADER_TYPE_NORMAL & &
( pci_read_config_dword ( pdev , PCI_SUBSYSTEM_VENDOR_ID , & reg ) | |
reg ! = ( pdev - > subsystem_vendor | ( pdev - > subsystem_device < < 16 ) ) ) )
return true ;
if ( pci_get_dsn ( pdev ) ! = ctrl - > dsn )
return true ;
return false ;
}
2018-07-19 17:27:53 -05:00
static int pciehp_resume_noirq ( struct pcie_device * dev )
{
struct controller * ctrl = get_service_data ( dev ) ;
2018-07-19 17:27:54 -05:00
/* pci_restore_state() just wrote to the Slot Control register */
ctrl - > cmd_started = jiffies ;
ctrl - > cmd_busy = true ;
2018-07-19 17:27:53 -05:00
/* clear spurious events from rediscovery of inserted card */
2024-05-29 16:32:09 +02:00
if ( ctrl - > state = = ON_STATE | | ctrl - > state = = BLINKINGOFF_STATE ) {
2018-07-19 17:27:53 -05:00
pcie_clear_hotplug_events ( ctrl ) ;
2024-05-29 16:32:09 +02:00
/*
* If hotplugged device was replaced with a different one
* during system sleep , mark the old device disconnected
* ( to prevent its driver from accessing the new device )
* and synthesize a Presence Detect Changed event .
*/
if ( pciehp_device_replaced ( ctrl ) ) {
ctrl_dbg ( ctrl , " device replaced during system sleep \n " ) ;
pci_walk_bus ( ctrl - > pcie - > port - > subordinate ,
pci_dev_set_disconnected , NULL ) ;
pciehp_request ( ctrl , PCI_EXP_SLTSTA_PDC ) ;
}
}
2018-07-19 17:27:53 -05:00
return 0 ;
}
2019-10-29 20:00:21 +03:00
# endif
2018-07-19 17:27:53 -05:00
2014-04-18 20:13:49 -04:00
static int pciehp_resume ( struct pcie_device * dev )
2005-04-16 15:20:36 -07:00
{
2018-07-28 07:22:00 +02:00
struct controller * ctrl = get_service_data ( dev ) ;
2007-11-28 15:12:00 -08:00
2018-09-27 16:38:19 -05:00
if ( pme_is_native ( dev ) )
pcie_enable_interrupt ( ctrl ) ;
2018-07-28 07:22:00 +02:00
pciehp_check_presence ( ctrl ) ;
PCI: pciehp: Enable/disable exclusively from IRQ thread
Besides the IRQ thread, there are several other places in the driver
which enable or disable the slot:
- pciehp_probe() enables the slot if it's occupied and the pciehp_force
module parameter is used.
- pciehp_resume() enables or disables the slot after system sleep.
- pciehp_queue_pushbutton_work() enables or disables the slot after the
5 second delay following an Attention Button press.
- pciehp_sysfs_enable_slot() and pciehp_sysfs_disable_slot() enable or
disable the slot on sysfs write.
This requires locking and complicates pciehp's state machine.
A simplification can be achieved by enabling and disabling the slot
exclusively from the IRQ thread.
Amend the functions listed above to request slot enable/disablement from
the IRQ thread by either synthesizing a Presence Detect Changed event or,
in the case of a disable user request (via sysfs or an Attention Button
press), submitting a newly introduced force disable request. The latter
is needed because the slot shall be forced off despite being occupied.
For this force disable request, avoid colliding with Slot Status register
bits by using a bit number greater than 16.
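
A minimal sketch of why a software-only request bit above the register's range cannot collide with hardware events. The Slot Status register is 16 bits wide and Presence Detect Changed is bit 3 (0x0008). `REQ_FORCE_DISABLE`, `queue_event()` and `is_software_only()` are hypothetical names, and the chosen bit position is an assumption for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SLTSTA_PDC		0x0008u		/* Presence Detect Changed */
#define SLTSTA_MASK		0xffffu		/* Slot Status is 16 bits wide */
#define REQ_FORCE_DISABLE	(1u << 16)	/* hypothetical software-only bit */

/* Hardware events and software requests share one pending-events word. */
uint32_t queue_event(uint32_t pending, uint32_t event)
{
	return pending | event;
}

/* True if @event lies entirely outside the Slot Status bit range. */
bool is_software_only(uint32_t event)
{
	return event != 0 && (event & SLTSTA_MASK) == 0;
}
```

A consumer such as the IRQ thread can then test and clear the request bit independently of whatever the hardware reported in the low 16 bits.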
For synchronous execution of requests (on sysfs write), wait for the
request to finish and retrieve the result. There can only ever be one
sysfs write in flight due to the locking in kernfs_fop_write(), hence
there is no risk of returning the result of a different sysfs request to
user space.
The POWERON_STATE and POWEROFF_STATE are now no longer entered by the
above-listed functions, but solely by the IRQ thread when it begins a
power transition. Afterwards, it moves to STATIC_STATE. The same
applies to canceling the Attention Button work, it likewise becomes an
IRQ thread only operation.
An immediate consequence is that the POWERON_STATE and POWEROFF_STATE are
never observed by the IRQ thread itself, only by functions called in a
different context, such as pciehp_sysfs_enable_slot(). So remove
handling of these states from pciehp_handle_button_press() and
pciehp_handle_link_change() which are exclusively called from the IRQ
thread.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-07-19 17:27:46 -05:00
2005-04-16 15:20:36 -07:00
return 0 ;
}
2018-09-27 16:41:49 -05:00
2019-10-29 20:00:21 +03:00
static int pciehp_runtime_suspend ( struct pcie_device * dev )
{
pciehp_disable_interrupt ( dev ) ;
return 0 ;
}
2018-09-27 16:41:49 -05:00
static int pciehp_runtime_resume ( struct pcie_device * dev )
{
struct controller * ctrl = get_service_data ( dev ) ;
/* pci_restore_state() just wrote to the Slot Control register */
ctrl - > cmd_started = jiffies ;
ctrl - > cmd_busy = true ;
/* clear spurious events from rediscovery of inserted card */
if ( ( ctrl - > state = = ON_STATE | | ctrl - > state = = BLINKINGOFF_STATE ) & &
pme_is_native ( dev ) )
pcie_clear_hotplug_events ( ctrl ) ;
return pciehp_resume ( dev ) ;
}
2009-02-15 22:32:48 +01:00
# endif /* PM */
2005-04-16 15:20:36 -07:00
static struct pcie_port_service_driver hpdriver_portdrv = {
2019-05-08 15:23:39 -05:00
. name = " pciehp " ,
2009-01-13 14:46:46 +01:00
. port_type = PCIE_ANY_PORT ,
. service = PCIE_PORT_SERVICE_HP ,
2005-04-16 15:20:36 -07:00
. probe = pciehp_probe ,
. remove = pciehp_remove ,
# ifdef CONFIG_PM
2019-10-29 20:00:21 +03:00
# ifdef CONFIG_PM_SLEEP
2005-04-16 15:20:36 -07:00
. suspend = pciehp_suspend ,
2018-07-19 17:27:53 -05:00
. resume_noirq = pciehp_resume_noirq ,
2005-04-16 15:20:36 -07:00
. resume = pciehp_resume ,
2019-10-29 20:00:21 +03:00
# endif
. runtime_suspend = pciehp_runtime_suspend ,
2018-09-27 16:41:49 -05:00
. runtime_resume = pciehp_runtime_resume ,
2005-04-16 15:20:36 -07:00
# endif /* PM */
PCI: pciehp: Ignore Link Down/Up caused by error-induced Hot Reset
Stuart Hayes reports that an error handled by DPC at a Root Port results
in pciehp gratuitously bringing down a subordinate hotplug port:
RP -- UP -- DP -- UP -- DP (hotplug) -- EP
pciehp brings the slot down because the Link to the Endpoint goes down.
That is caused by a Hot Reset being propagated as a result of DPC.
Per PCIe Base Spec 5.0, section 6.6.1 "Conventional Reset":
For a Switch, the following must cause a hot reset to be sent on all
Downstream Ports: [...]
* The Data Link Layer of the Upstream Port reporting DL_Down status.
In Switches that support Link speeds greater than 5.0 GT/s, the
Upstream Port must direct the LTSSM of each Downstream Port to the
Hot Reset state, but not hold the LTSSMs in that state. This permits
each Downstream Port to begin Link training immediately after its
hot reset completes. This behavior is recommended for all Switches.
* Receiving a hot reset on the Upstream Port.
Once DPC recovers, pcie_do_recovery() walks down the hierarchy and
invokes pcie_portdrv_slot_reset() to restore each port's config space.
At that point, a hotplug interrupt is signaled per PCIe Base Spec r5.0,
section 6.7.3.4 "Software Notification of Hot-Plug Events":
If the Port is enabled for edge-triggered interrupt signaling using
MSI or MSI-X, an interrupt message must be sent every time the logical
AND of the following conditions transitions from FALSE to TRUE: [...]
* The Hot-Plug Interrupt Enable bit in the Slot Control register is
set to 1b.
* At least one hot-plug event status bit in the Slot Status register
and its associated enable bit in the Slot Control register are both
set to 1b.
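
The quoted rule can be modeled as a predicate over the Slot Control and Slot Status registers plus a FALSE-to-TRUE edge test. This is a sketch covering only the two conditions quoted above; any conditions elided by "[...]" are assumed to hold. `irq_condition()` and `msi_sent()` are hypothetical names. The bit values follow the register layout: the five low event bits share positions between both registers, while Data Link Layer State Changed uses 0x0100 in Slot Status and 0x1000 in Slot Control.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SLTCTL_HPIE	0x0020u	/* Hot-Plug Interrupt Enable */
#define SLTCTL_DLLSCE	0x1000u	/* Data Link Layer State Changed Enable */
#define SLTSTA_DLLSC	0x0100u	/* Data Link Layer State Changed */
#define SLT_LOW_EVENTS	0x001fu	/* events with matching status/enable bits */

/* Logical AND of the two quoted conditions. */
bool irq_condition(uint16_t sltctl, uint16_t sltsta)
{
	uint16_t events = sltsta & sltctl & SLT_LOW_EVENTS;

	if ((sltsta & SLTSTA_DLLSC) && (sltctl & SLTCTL_DLLSCE))
		events |= SLTSTA_DLLSC;

	return (sltctl & SLTCTL_HPIE) && events;
}

/* An edge-triggered interrupt fires only on a FALSE-to-TRUE transition. */
bool msi_sent(bool before, bool after)
{
	return !before && after;
}
```

Under this model, a status bit that is already set when config space restoration sets the enable bits again produces such a transition, which is consistent with the interrupt described in the commit message.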
Prevent pciehp from gratuitously bringing down the slot by clearing the
error-induced Data Link Layer State Changed event before restoring
config space. Afterwards, check whether the link has unexpectedly
failed to retrain and synthesize a DLLSC event if so.
Allow each pcie_port_service_driver (one of them being pciehp) to define
a slot_reset callback and re-use the existing pm_iter() function to
iterate over the callbacks.
Thereby, the Endpoint driver remains bound throughout error recovery and
may restore the device to working state.
Surprise removal during error recovery is detected through a Presence
Detect Changed event. The hotplug port is expected to not signal that
event as a result of a Hot Reset.
The issue isn't DPC-specific, it also occurs when an error is handled by
AER through aer_root_reset(). So while the issue was noticed only now,
it's been around since 2006 when AER support was first introduced.
[bhelgaas: drop PCI_ERROR_RECOVERY Kconfig, split pm_iter() rename to
preparatory patch]
Link: https://lore.kernel.org/linux-pci/08c046b0-c9f2-3489-eeef-7e7aca435bb9@gmail.com/
Fixes: 6c2b374d7485 ("PCI-Express AER implemetation: AER core and aerdriver")
Link: https://lore.kernel.org/r/251f4edcc04c14f873ff1c967bc686169cd07d2d.1627638184.git.lukas@wunner.de
Reported-by: Stuart Hayes <stuart.w.hayes@gmail.com>
Tested-by: Stuart Hayes <stuart.w.hayes@gmail.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org # v2.6.19+: ba952824e6c1: PCI/portdrv: Report reset for frozen channel
Cc: Keith Busch <kbusch@kernel.org>
2021-07-31 14:39:01 +02:00
. slot_reset = pciehp_slot_reset ,
2005-04-16 15:20:36 -07:00
} ;
2018-09-20 10:27:06 -06:00
int __init pcie_hp_init ( void )
2005-04-16 15:20:36 -07:00
{
int retval = 0 ;
2005-10-31 16:20:07 -08:00
retval = pcie_port_service_register ( & hpdriver_portdrv ) ;
2019-05-07 18:24:53 -05:00
pr_debug ( " pcie_port_service_register = %d \n " , retval ) ;
2013-01-11 10:15:54 +08:00
if ( retval )
2019-05-07 18:24:53 -05:00
pr_debug ( " Failure to register service \n " ) ;
2013-01-11 10:15:54 +08:00
2005-04-16 15:20:36 -07:00
return retval ;
}