IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
When platform_get_irq() fails we should propagate the real error value
instead of always returning -EINVAL.
Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Kishon Vijay Abraham I <kishon@ti.com>
platform_get_irq() returns a negative number on failure, so adjust the
logic to detect such condition and propagate the real error value on
failure.
Reported-by: Bjorn Helgaas <helgaas@kernel.org>
Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Jingoo Han <jingoohan1@gmail.com>
PCI_EXP_CAP is an iProc-specific value, so rename it to IPROC_PCI_EXP_CAP
to make it obvious that it's not related to the generic values like
PCI_EXP_RTCTL, etc. No functional change intended.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
During soft reset (e.g., "reboot" from Linux) on some iProc-based SOCs, the
LCPLL clock and PERST both go off simultaneously. This seems in accordance
with the PCIe Card Electromechanical spec, r2.0, sec 2.2.3, which says the
clock goes inactive after PERST# goes active, but doesn't specify how long
the clock should be valid after PERST#.
However, we have observed that with the iProc Stingray, some Intel NVMe
endpoints, e.g., the P3700 400GB series, are not detected correctly upon
the next boot sequence unless the clock remains valid for some time after
PERST# is asserted.
Delay 500ms after asserting PERST# before performing a reboot. The 500ms
is experimentally determined.
Signed-off-by: Oza Pawandeep <oza.oza@broadcom.com>
[bhelgaas: changelog, add spec reference, fold in iproc_pcie_shutdown()
export from Arnd Bergmann <arnd@arndb.de>]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Ray Jui <ray.jui@broadcom.com>
Reviewed-by: Scott Branden <scott.branden@broadcom.com>
- Lots of hfi1 driver updates (mixed with a few qib and core updates as
well)
- rxe updates
- various mlx updates
- Set default roce type to RoCEv2
- Several larger fixes for bnxt_re that were too big for -rc
- Several larger fixes for qedr that, likewise, were too big for -rc
- Misc core changes
- Make the hns_roce driver compilable on arches other than aarch64 so we
can more easily debug build issues related to it
- Add rdma-netlink infrastructure updates
- Add automatic IRQ affinity infrastructure
- Add 32bit lid support
- Lots of misc fixes across the subsystem from random people
- Autoloading of RDMA netlink modules
- PCI pool cleanups from Romain Perier
- mlx5 driver feature additions and fixes
- Hardware tag matchine feature
- Fix sleeping in atomic when resolving roce ah
- Add experimental ioctl interface as posted to linux-api@
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJZqBDtAAoJELgmozMOVy/dNlcQAJhYNRGaNUBx0L6+8t2xwUrt
7ndP6qlMar30DJY9FjTQCzRBw0CRMWkXdJD8rYlyaHy07pjWDKG8LZtxEXu1FLdZ
oNRvQX6ZJh8Bz7db2SQFBCTF2uWGZZFqWQCrSbQwjj9xxjMDs59u/knmwHVY9dKk
egjPG4IQBDmcTeNY7h1otG2hXpx7QPIOilQW2EFN5SWAuBAazdF2JKxjjxqhnUfp
gD2pSdgsm3VSMoo0zpMa6qOP+9GcOu8J97fYFhasRYWCavPdWHyq+XNu9S/eicRd
xbv+seCYM+9jPb2dsNdjEKll7w3yyWdu7h6tSCMPYv54eN9sDDiO1w2L2ZnESMZa
JRnSfB+HXru1r4RyHOTPO8peaNhYlR1V4u8bTS5G2dffbHis9BajkWoAR/oSiUcB
AIjIIDcdJFVGfpF9KIt/pEl+adHNgESibSijzOUYkyw6RNbPqDmdd7YakPHcQhKN
clE3zQfIsPRLWsToP/nkBE0tUd3tQocRuLy7ote7hXQK+0p7TBz0a6Kkj87MvX33
8dVbUI+q6WRlEY90l71y0ZdXy/AvkxkFxAc4Y7FQZyJxhEArTaKgfa5fmpRwVxBm
yi9baoYCspHNRNv6AO4IL86ZCJqmWBuch8CBY1n2X3h8IGfKYEZUAZ+T/mnTTeUq
A4joXduz94ZD4w23leD1
=2ntC
-----END PGP SIGNATURE-----
Merge tag 'for-linus-ioctl' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull rdma updates from Doug Ledford:
"This is a big pull request.
Of note is that I'm sending you the new ioctl API for the rdma
subsystem. We put it up on linux-api@, but didn't get much response.
The API is complex, but it solves two different problems in one go:
1) The bi-directional nature of the RDMA file write calls, which
created the security hole we had to handle (and for which the fix
is now causing problems for systems in production, we were a bit
over zealous in the fix and the ability to open a device, then
fork, then create new queue pairs on the device and use them is
broken).
2) The bloat caused by different vendors implementing extensions to
the base verbs API. Each vendor's hardware is slightly different,
and the hardware might be suitable for one extension but not
another.
By the time we add generic extensions for all the different ways
that the different hardware can offload things, the API becomes
bloated. Things like our completion structs have started to exceed
a cache line in size because of all the elements needed to support
this. That in turn shows up heavily in the performance graphs with
a noticable drop in performance on 100Gigabit links as our
completion structs go from occupying one cache line to 1+.
This API makes things like the completion structs modular in a
very similar way to netlink so that your structs can only include
the items needed for the offloads/features you are actually using
on a given queue pair. In that way we support everything, but only
use what we need, and our structs stay smaller.
The ioctl API is better explained by the posting on linux-api@ than I
can explain it here, so I'll just leave it at that.
The rest of the pull request is typical stuff.
Updates for 4.14 kernel merge window
- Lots of hfi1 driver updates (mixed with a few qib and core updates
as well)
- rxe updates
- various mlx updates
- Set default roce type to RoCEv2
- Several larger fixes for bnxt_re that were too big for -rc
- Several larger fixes for qedr that, likewise, were too big for -rc
- Misc core changes
- Make the hns_roce driver compilable on arches other than aarch64 so
we can more easily debug build issues related to it
- Add rdma-netlink infrastructure updates
- Add automatic IRQ affinity infrastructure
- Add 32bit lid support
- Lots of misc fixes across the subsystem from random people
- Autoloading of RDMA netlink modules
- PCI pool cleanups from Romain Perier
- mlx5 driver feature additions and fixes
- Hardware tag matchine feature
- Fix sleeping in atomic when resolving roce ah
- Add experimental ioctl interface as posted to linux-api@"
* tag 'for-linus-ioctl' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (328 commits)
IB/core: Expose ioctl interface through experimental Kconfig
IB/core: Assign root to all drivers
IB/core: Add completion queue (cq) object actions
IB/core: Add legacy driver's user-data
IB/core: Export ioctl enum types to user-space
IB/core: Explicitly destroy an object while keeping uobject
IB/core: Add macros for declaring methods and attributes
IB/core: Add uverbs merge trees functionality
IB/core: Add DEVICE object and root tree structure
IB/core: Declare an object instead of declaring only type attributes
IB/core: Add new ioctl interface
RDMA/vmw_pvrdma: Fix a signedness
RDMA/vmw_pvrdma: Report network header type in WC
IB/core: Add might_sleep() annotation to ib_init_ah_from_wc()
IB/cm: Fix sleeping in atomic when RoCE is used
IB/core: Add support to finalize objects in one transaction
IB/core: Add a generic way to execute an operation on a uobject
Documentation: Hardware tag matching
IB/mlx5: Support IB_SRQT_TM
net/mlx5: Add XRQ support
...
The "res" variable in pci_resource_io() is never used. Remove it.
Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
VMD currently only exists for Intel x86 products, so move the VMD quirk to
arch/x86.
Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
VMD hardware has to share its vectors among child devices in its PCI
domain so we should allocate as many as possible rather than just ones
that can be affinitized.
pci_alloc_irq_vectors_affinity() limits the number of affinitized IRQs to
the number of present CPUs (see irq_calc_affinity_vectors()). But we'd
prefer to have more vectors, even if they aren't distributed across the
CPUs, so use pci_alloc_irq_vectors() instead.
Reported-by: Brad Goodman <Bradley.Goodman@dell.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
[bhelgaas: add irq_calc_affinity_vectors() reference to changelog]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Switch from using custom INTX_NUM macro to the generic PCI_NUM_INTX definition
for the number of INTx interrupts.
Signed-off-by: Honghui Zhang <honghui.zhang@mediatek.com>
[bhelgaas: use subject/changelog from similar patches]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
MT2712 and MT7622's PCIe host controller support MSI, but only 32-bit MSI
addresses are supported. It connects to GIC with the same IRQ number as the
INTx IRQ, so it shares the same IRQ with INTx IRQ.
Add MSI support for MT2712 and MT7622.
Signed-off-by: Honghui Zhang <honghui.zhang@mediatek.com>
[bhelgaas: changes to follow rcar & tegra: rename to mtk_pcie_msi_alloc(),
add mtk_pcie_msi_free(), free hwirq if irq_create_mapping() fails, call
irq_dispose_mapping() from mtk_msi_teardown_irq()]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Ryder Lee <ryder.lee@mediatek.com>
75983c6d1f38 ("PCI: mediatek: Add controller support for MT2712 and
MT7622") has put the mtk_pcie * into bus->sysdata. Take advantage of that
to get the private data and simplify the code.
Signed-off-by: Honghui Zhang <honghui.zhang@mediatek.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Ryder Lee <ryder.lee@mediatek.com>
MT2712 and MT7622 using a new IP block of Gen2 controller which has two
root ports and shares the same probing flow with MT2701/MT7623.
Both MT2712 and MT7622 have the same per-port control registers, but
there are slight differences between them:
- MT7622 has more clocks than MT2712.
- MT7622 has shared control registers which are used to enable LTSSM and
ASPM while MT2712 does not.
Add host controller support for MT2712/MT7622.
Signed-off-by: Ryder Lee <ryder.lee@mediatek.com>
Signed-off-by: Honghui Zhang <honghui.zhang@mediatek.com>
[bhelgaas: folded in fix from http://lkml.kernel.org/r/1502715868-17651-2-git-send-email-honghui.zhang@mediatek.com]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
This is a transitional patch. We currently use platfarm_get_resource() for
retrieving the IOMEM resources, but there might be some chips don't have
subsys/shared registers part, which depends on platform design, and these
will be introduced in further patches.
Switch this function to use the platform_get_resource_byname() so that the
binding can be agnostic of the resource order.
Signed-off-by: Ryder Lee <ryder.lee@mediatek.com>
Signed-off-by: Honghui Zhang <honghui.zhang@mediatek.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Introduce a structure "mtk_pcie_soc" to abstract the differences between
controller generations, and the .startup() hook is used to encapsulate some
SoC-dependent related setting. In doing so, the common code which will be
reused by future chips.
Signed-off-by: Ryder Lee <ryder.lee@mediatek.com>
Signed-off-by: Honghui Zhang <honghui.zhang@mediatek.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Rename "port->index" to "port->slot" since the ports are hardwired at
PCI_SLOT. Also rename "mtk_pcie_parse_ports()" to "mtk_pcie_parse_port()"
since it parses one port each time.
No functional change in this patch.
Signed-off-by: Honghui Zhang <honghui.zhang@mediatek.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Wait for Gen2 training with readl_poll_timeout(), and simplify the hardware
assert logical by merging it into a new mtk_pcie_startup_port() interface.
Signed-off-by: Ryder Lee <ryder.lee@mediatek.com>
Signed-off-by: Honghui Zhang <honghui.zhang@mediatek.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Commit a53e35db70d1 ("reset: Ensure drivers are explicit when requesting
reset lines") started to transition the reset control request API calls to
explicitly state whether the driver needs exclusive or shared reset control
behavior. Convert all drivers requesting exclusive resets to the explicit
API call so the temporary transition helpers can be removed.
No functional changes.
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Ryder Lee <ryder.lee@mediatek.com>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
A struct resource represents the address space consumed by a device. We
should not modify that resource while the device is actively using the
address space. For VFs, pci_iov_update_resource() enforces this by
printing a warning and doing nothing if the VFE (VF Enable) and MSE (VF
Memory Space Enable) bits are set.
Previously, both sriov_enable() and sriov_disable() called the
pcibios_sriov_disable() arch hook, which may update the struct resource,
while VFE and MSE were enabled. This effectively dropped the resource
update pcibios_sriov_disable() intended to do.
Disable VF memory decoding before calling pcibios_sriov_disable().
Reported-by: Carol L Soto <clsoto@us.ibm.com>
Tested-by: Carol L Soto <clsoto@us.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[bhelgaas: changelog]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: shan.gavin@gmail.com
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
The ls2088a PCIe controller's register addresses are different from
ls2080a, so add a match entry to identify ls2088a PCIe.
Signed-off-by: Hou Zhiqiang <Zhiqiang.Hou@nxp.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Minghuan Lian <minghuan.Lian@nxp.com>
Previously we enabled writes to the DBI read-only registers so the Class
Code fix in dw_pcie_setup_rc() would work. But now dw_pcie_setup_rc()
enables write permission itself, so we don't need to do it here.
Stop enabling writes to the DBI read-only registers.
Signed-off-by: Hou Zhiqiang <Zhiqiang.Hou@nxp.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Roy Zang <tie-fei.zang@freescale.com>
Now that the Class Code fixup in dw_pcie_setup_rc() works, remove the fixup
from the Layerscape driver.
Signed-off-by: Hou Zhiqiang <Zhiqiang.Hou@nxp.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Roy Zang <tie-fei.zang@freescale.com>
dw_pcie_setup_rc() contains fixes to update the Class Code and Interrupt
Pin registers, but the fixes don't actually work because these registers
are read-only.
Enable write permission before updating the Class Code and Interrupt
Pin.
Signed-off-by: Hou Zhiqiang <Zhiqiang.Hou@nxp.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Joao Pinto <jpinto@synopsys.com>
Acked-by: Roy Zang <tie-fei.zang@freescale.com>
The read-only DBI registers can be written only when the "Write to RO
Registers Using DBI" (DBI_RO_WR_EN) field of MISC_CONTROL_1_OFF is set.
Add accessors to enable and disable write permission, and use them instead
of accessing MISC_CONTROL_1_OFF directly.
Signed-off-by: Hou Zhiqiang <Zhiqiang.Hou@nxp.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Joao Pinto <jpinto@synopsys.com>
Acked-by: Roy Zang <tie-fei.zang@freescale.com>
Disable all the outbound windows to avoid one transaction hitting multiple
outbound windows. dw_pcie_setup_rc() will reconfigure the outbound
windows, which may conflict with windows configured by the bootloader.
Signed-off-by: Hou Zhiqiang <Zhiqiang.Hou@nxp.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Roy Zang <tie-fei.zang@freescale.com>
ls1021_pcie_host_init() duplicated the code in the generic
ls_pcie_host_init(). Call ls_pcie_host_init() instead of duplicating the
code.
Signed-off-by: Hou Zhiqiang <Zhiqiang.Hou@nxp.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Roy Zang <tie-fei.zang@freescale.com>
Some platforms like K2G has reserved use of BAR_0 which shouldn't be
disabled by software. Avoid disabling all BARs during initialization.
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
dra7xx has all base address registers (BAR) enabled by default. Reset all
BARs during initialization and so that BARs are enabled only if they are
actually used.
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Use the newly introduced __pci_epc_mem_init() instead of pci_epc_mem_init()
to provide page_size to pci_epc_mem. This is in preparation for
adding EP support to K2G which has a restriction that the
address region should be either divided into 1MB/2MB/4MB or 8MB
sizes (Ref: 11.14.4.9.1 Outbound Address Translation in K2G TRM SPRUHY8F
January 2016 – Revised May 2017).
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
epf_test is allocated using devm_kzalloc(). Hence it's not required to
explicitly free it in remove() callback. Since ->remove() callback doesn't
do anything other than freeing epf_test, remove the ->remove() callback.
Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Certain platforms like TI's K2G doesn't support link-up notification. Add
support to poll early (without waiting for the linkup notification) for
commands from the host.
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
pci_epf_test always maps the PCI_ENDPOINT_TEST registers to BAR_0. But if
BAR_0 is reserved for some other purpose (like in TI's K2G BAR_0 is mapped
to application registers and cannot be used to map any other regions),
PCI_ENDPOINT_TEST registers cannot be mapped making pci_epf_test unusable.
Add support to use any BAR to map PCI_ENDPOINT_TEST registers.
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
pci_epf_test_cmd_handler() is the delayed work function which reads
*command* (set by the host) and performs various actions requested by the
host periodically. If the value in *command* is '0', it goes to the
reset_handler where it resets *command* to '0' and queues
pci_epf_test_cmd_handler().
However if the host writes a value to the *command* just after the
pci-epf-test driver checks *command* for '0' and before the control goes to
reset_handler, the *command* will be reset to '0' and the pci-epf-test
driver won't be able to perform the actions requested by the host. Fix it
here by not resetting the *command* in the reset_handler.
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Sekhar Nori <nsekhar@ti.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
struct pci_epf_test_reg is the MEMSPACE of pci-epf-test function driver
that will be accessed by the "host" for programming the pci-epf-test
device. So this structure shouldn't be subjected to compiler optimization
in pci_epf_test_cmd_handler() since the values can be changed by code
outside the scope of current code at any time.
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Sekhar Nori <nsekhar@ti.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
pci-epc-mem uses a page size equal to *PAGE_SIZE* (usually 4KB) to manage
the address space. However certain platforms like TI's K2G have a
restriction that this address space should be either divided into
1MB/2MB/4MB or 8MB sizes (Ref: 11.14.4.9.1 Outbound Address Translation in
K2G TRM SPRUHY8F January 2016 – Revised May 2017). Add support to handle
different page sizes here.
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Make ->remove() callback optional so that endpoint function drivers don't
have to populate empty ->remove() callback functions.
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
We will use the generic ls_pcie_link_up() and ls_pcie_host_init() from
device-specific routines. Move the generic functions earlier in the file
so we won't need forward declarations. This is strictly a code move with
no functional change intended.
Signed-off-by: Hou Zhiqiang <Zhiqiang.Hou@nxp.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Roy Zang <tie-fei.zang@freescale.com>
The current code depends on class code and multifunction fixups done by the
bootloader. Perform these fixups in ls1021_pcie_host_init() to remove this
dependency.
Signed-off-by: Hou Zhiqiang <Zhiqiang.Hou@nxp.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Roy Zang <tie-fei.zang@freescale.com>
The STRFMR1 is not a DBI read-only register, so move it out from the
write-enable bracket.
Signed-off-by: Hou Zhiqiang <Zhiqiang.Hou@nxp.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Roy Zang <tie-fei.zang@freescale.com>
We called dw_pcie_setup_rc() from the ls1021a host init function, but not
from the common ls_pcie_host_init() function, so platforms other than
ls1021a still depended on initialization by the bootloader.
Call dw_pcie_setup_rc() from ls_pcie_host_init() to reduce dependencies on
the bootloader.
Signed-off-by: Hou Zhiqiang <Zhiqiang.Hou@nxp.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Roy Zang <tie-fei.zang@freescale.com>
Add a print statement in pci_bus_wait_crs() so that user observes the
progress of device polling instead of silently waiting for timeout to be
reached.
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
[bhelgaas: check for timeout first so we don't print "waiting, giving up",
always print time we've slept (not the actual timeout, print a "ready"
message if we've printed a "waiting" message]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Sporadic reset issues have been observed with an Intel 750 NVMe drive while
assigning the physical function to the guest machine. The sequence of
events observed is as follows:
- perform a Function Level Reset (FLR)
- sleep up to 1000ms total
- read ~0 from PCI_COMMAND (CRS completion for config read)
- warn that the device didn't return from FLR
- touch the device before it's ready
- device drops config writes when we restore register settings (there's
no mechanism for software to learn about CRS completions for writes)
- incomplete register restore leaves device in inconsistent state
- device probe fails because device is in inconsistent state
After reset, an endpoint may respond to config requests with Configuration
Request Retry Status (CRS) to indicate that it is not ready to accept new
requests. See PCIe r3.1, sec 2.3.1 and 6.6.2.
Increase the timeout value from 1 second to 60 seconds to cover the period
where device responds with CRS and also report polling progress.
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
[bhelgaas: include the mandatory 100ms in the delays we print]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Configuration Request Retry Status (CRS) was previously hidden inside
pci_bus_read_dev_vendor_id(). We want to add support for CRS in other
situations, such as waiting for a device to become ready after a Function
Level Reset.
Move CRS handling into pci_bus_wait_crs() so it can be called from other
places.
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
[bhelgaas: pass pointer, not value, to pci_bus_wait_crs() so caller gets
correct Vendor ID]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Add pci_bus_crs_vendor_id() to determine whether data returned for a config
read of the Vendor ID indicates a Configuration Request Retry Status (CRS)
response.
Per PCIe r3.1, sec 2.3.2, this data is only returned if:
- CRS Software Visibility is enabled,
- a config read includes both bytes of the Vendor ID, and
- the read receives a CRS completion
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
[bhelgaas: changelog, change name to pci_bus_crs_vendor_id(), make static
in probe.c, use it in pci_bus_read_dev_vendor_id()]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
While waiting for a device to become ready (i.e., to return a non-CRS
completion to a read of its Vendor ID), if we got a valid response to the
very last read before timing out, we printed a warning and gave up on the
device even though it was actually ready.
For a typical 60s timeout, we wait about 65s (it's not exact because of the
exponential backoff), but we treated devices that became ready between 33s
and 65s as though they failed.
Move the Device ID read later so we check whether the device is ready
before checking for a timeout.
Thanks to Sinan Kaya <okaya@codeaurora.org>, reorder reads so we always
check device presence after sleep, since it's pointless to sleep unless we
recheck afterwards.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>