Currently, if runtime power management is enabled for a vfio-pci
based device in the guest OS, the guest OS writes to the PCI_PM_CTRL
register. This write request is handled in vfio_pm_config_write(),
which performs the actual write to the PCI_PM_CTRL register. With this
approach, the lowest power state that can be reached is D3hot. If we
use the runtime PM framework, then we can achieve the D3cold state (on
supported systems), which helps save the most power.
1. The D3cold state can't be achieved by writing the PCI standard
PM config registers. This patch implements the following
newly added low power related device features:
- VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY
- VFIO_DEVICE_FEATURE_LOW_POWER_EXIT
The VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY feature will allow the
device to make use of low power platform states on the host
while the VFIO_DEVICE_FEATURE_LOW_POWER_EXIT will prevent
further use of those power states.
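As an illustration of how user space might drive these two features,
here is a minimal sketch. It assumes an already opened VFIO device file
descriptor and a UAPI header that provides VFIO_DEVICE_FEATURE; the
helper name vfio_low_power_set() is hypothetical, and the fallback
feature numbers are an assumption for older headers that lack them.

```c
#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Fallbacks for older UAPI headers; values assumed to match linux/vfio.h. */
#ifndef VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY
#define VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY 3
#endif
#ifndef VFIO_DEVICE_FEATURE_LOW_POWER_EXIT
#define VFIO_DEVICE_FEATURE_LOW_POWER_EXIT 5
#endif

/*
 * Hypothetical helper: allow (enter != 0) or stop allowing (enter == 0)
 * the use of low power platform states for an open VFIO device fd.
 * Returns 0 on success or a negative errno value on failure.
 */
static int vfio_low_power_set(int device_fd, int enter)
{
	struct vfio_device_feature feature;

	memset(&feature, 0, sizeof(feature));
	feature.argsz = sizeof(feature);	/* no payload follows the header */
	feature.flags = VFIO_DEVICE_FEATURE_SET |
			(enter ? VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY
			       : VFIO_DEVICE_FEATURE_LOW_POWER_EXIT);

	if (ioctl(device_fd, VFIO_DEVICE_FEATURE, &feature) < 0)
		return -errno;
	return 0;
}
```

A hypervisor could call vfio_low_power_set(fd, 1) when the guest asks
for D3cold and vfio_low_power_set(fd, 0) before the guest returns the
device to D0.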
2. The vfio-pci driver uses the runtime PM framework for low power entry
and exit. On platforms where the D3cold state is supported, the runtime
PM framework will put the device into D3cold; otherwise, D3hot or some
other power state will be used.
There are various cases where the device will not go into the runtime
suspended state. For example:
- The runtime power management is disabled on the host side for
the device.
- The user keeps the device busy after calling LOW_POWER_ENTRY.
- There are dependent devices that are still in runtime active state.
In these cases, the device stays in the power state that the user
configured through the PCI_PM_CTRL register.
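The outcome described in item 2 can be summarized with a small
user-space model. This is only a sketch of the decision, not driver
code: every name below is hypothetical, and the "D3hot or some other
power state" fallback is simplified to D3hot.

```c
#include <stdbool.h>

/* User-space model only; none of these names exist in the kernel. */
enum model_pstate { MODEL_D0, MODEL_D3HOT, MODEL_D3COLD };

struct model_host {
	bool runtime_pm_enabled;	/* host allows runtime PM for the device */
	bool device_busy;		/* user keeps the device busy after entry */
	bool dependents_active;		/* dependent devices still runtime active */
	bool d3cold_supported;		/* platform supports D3cold */
};

/*
 * Returns the state the device ends up in. 'user_state' is the state the
 * user configured through PCI_PM_CTRL.
 */
static enum model_pstate
model_effective_state(const struct model_host *h, bool low_power_entered,
		      enum model_pstate user_state)
{
	/* Any blocker keeps the device out of runtime suspend. */
	if (!low_power_entered || !h->runtime_pm_enabled ||
	    h->device_busy || h->dependents_active)
		return user_state;

	/* Simplification: the real fallback may be a state other than D3hot. */
	return h->d3cold_supported ? MODEL_D3COLD : MODEL_D3HOT;
}
```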
3. Hypervisors can implement virtual ACPI methods. For example,
in a guest Linux OS, if the PCI device's ACPI node has _PR3 and _PR0
power resources with _ON/_OFF methods, then the guest invokes
the _OFF method during the D3cold transition and the _ON method during
the D0 transition. The hypervisor can intercept these virtual ACPI calls
and then call the low power device feature IOCTL.
4. The 'pm_runtime_engaged' flag tracks entry to and exit from
runtime PM. This flag is protected by the 'memory_lock' semaphore.
5. All config and other region accesses are wrapped in
pm_runtime_resume_and_get() and pm_runtime_put(). So, if any
device access happens while the device is in the runtime suspended
state, the device is resumed first. Once the access has finished,
the device goes back into the runtime suspended state.
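The wrapping described in item 5 follows a get/access/put pattern. The
sketch below is a user-space model of that pattern, not the driver
code: model_resume_and_get() and model_put() are hypothetical stand-ins
for pm_runtime_resume_and_get() and pm_runtime_put(), and the model
suspends immediately on the last put, whereas the real pm_runtime_put()
only queues an asynchronous idle request.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* User-space model only; none of these names exist in the kernel. */
struct model_dev {
	int usage_count;	/* models the runtime PM usage counter */
	bool suspended;		/* true while runtime suspended */
	unsigned int resumes;	/* counts resume transitions */
	uint8_t config[256];	/* stand-in for config space */
};

/* Models pm_runtime_resume_and_get(): take a reference, resume if needed. */
static int model_resume_and_get(struct model_dev *dev)
{
	dev->usage_count++;
	if (dev->suspended) {
		dev->suspended = false;
		dev->resumes++;
	}
	return 0;
}

/* Models pm_runtime_put(): drop the reference, suspend again when idle. */
static void model_put(struct model_dev *dev)
{
	if (--dev->usage_count == 0)
		dev->suspended = true;
}

/* A config read wrapped by the get/put pair. */
static int model_config_read(struct model_dev *dev, size_t off, uint8_t *val)
{
	int ret;

	if (off >= sizeof(dev->config))
		return -1;

	ret = model_resume_and_get(dev);
	if (ret < 0)
		return ret;

	*val = dev->config[off];	/* device is active at this point */
	model_put(dev);
	return 0;
}
```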
6. Memory region access through mmap is not allowed in the low
power state. Since __vfio_pci_memory_enabled() is a common function,
the check for 'pm_runtime_engaged' has been added explicitly in
vfio_pci_mmap_fault() to block only mmap'ed access.
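To show why the check lives in the fault path rather than in the common
helper, here is a small user-space model. The struct and functions are
hypothetical stand-ins for the driver's device structure,
__vfio_pci_memory_enabled() and vfio_pci_mmap_fault(), and the
'memory_lock' protection is omitted for brevity.

```c
#include <stdbool.h>

/* User-space model only; none of these names exist in the kernel. */
enum model_fault { MODEL_FAULT_OK, MODEL_FAULT_SIGBUS };

struct model_vdev {
	bool memory_enabled;		/* models the PCI memory decode state */
	bool pm_runtime_engaged;	/* set on low power entry, cleared on exit */
};

/* Common check shared by all access paths; it ignores the low power flag. */
static bool model_memory_enabled(const struct model_vdev *vdev)
{
	return vdev->memory_enabled;
}

/* Fault path: the only caller that also rejects while low power is engaged. */
static enum model_fault model_mmap_fault(const struct model_vdev *vdev)
{
	if (vdev->pm_runtime_engaged || !model_memory_enabled(vdev))
		return MODEL_FAULT_SIGBUS;
	return MODEL_FAULT_OK;
}
```

Keeping the flag out of the common helper means other access paths,
which resume the device on demand, are not blocked by low power entry.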
Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Link: https://lore.kernel.org/r/20220829114850.4341-5-abhsahu@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>