Brett Creeley
7dabb1bcd1
vfio/pds: Add support for firmware recovery
It's possible that the device firmware crashes and is able to recover
due to some configuration and/or other issue. If a live migration
is in progress while the firmware crashes, the live migration will
fail. However, the VF PCI device should still be functional post
crash recovery and subsequent migrations should go through as
expected.

When the pds_core device notices that firmware crashes it sends an
event to all its client drivers. When the pds_vfio driver receives
this event while migration is in progress it will request a deferred
reset on the next migration state transition. This state transition
will report failure as well as any subsequent state transition
requests from the VMM/VFIO. Based on uapi/vfio.h the only way out of
VFIO_DEVICE_STATE_ERROR is by issuing VFIO_DEVICE_RESET. Once this
reset is done, the migration state will be reset to
VFIO_DEVICE_STATE_RUNNING and migration can be performed.

If the event is received while no migration is in progress (i.e.
the VM is in normal operating mode), then no actions are taken
and the migration state remains VFIO_DEVICE_STATE_RUNNING.

Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20230807205755.29579-8-brett.creeley@amd.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
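The recovery flow described in the commit message can be modeled as a small state machine. What follows is a user-space sketch only, not the pds_vfio driver code: the names model_dev, fw_recovery_event, set_state and device_reset are hypothetical, and the MIG_* enum is a simplified stand-in for the VFIO_DEVICE_STATE_* values in uapi/vfio.h. It assumes that "migration in progress" simply means the device has left the running state.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the VFIO migration states (assumption). */
enum mig_state { MIG_ERROR, MIG_STOP, MIG_RUNNING, MIG_STOP_COPY, MIG_RESUMING };

/* Hypothetical device model; not the real pds_vfio structure. */
struct model_dev {
	enum mig_state state;
	bool migration_in_progress;
	bool deferred_reset;	/* requested by the firmware recovery event */
};

/* Firmware recovery event, as delivered to a client driver. */
void fw_recovery_event(struct model_dev *d)
{
	/* Only a migration in flight needs a deferred reset. */
	if (d->migration_in_progress)
		d->deferred_reset = true;
	/* Otherwise no action is taken and the state stays RUNNING. */
}

/* Migration state transition request; returns 0 on success, -1 on failure. */
int set_state(struct model_dev *d, enum mig_state next)
{
	/*
	 * A pending deferred reset fails this transition and moves the
	 * device to the error state; every later request fails as well.
	 */
	if (d->deferred_reset || d->state == MIG_ERROR) {
		d->deferred_reset = false;
		d->state = MIG_ERROR;
		return -1;
	}

	d->state = next;
	d->migration_in_progress = (next != MIG_RUNNING);
	return 0;
}

/* Models VFIO_DEVICE_RESET: the only way out of the error state. */
void device_reset(struct model_dev *d)
{
	d->deferred_reset = false;
	d->migration_in_progress = false;
	d->state = MIG_RUNNING;
}
```

In this model, a device that has moved to MIG_STOP_COPY and then receives fw_recovery_event() fails its next set_state() call and stays in MIG_ERROR until device_reset() is called, after which a new migration can start. An event received while the device is running leaves the model untouched.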
Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``. The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.
Languages: C 97.6%, Assembly 1%, Shell 0.5%, Python 0.3%, Makefile 0.3%