.. SPDX-License-Identifier: GPL-2.0-only
.. include:: <isonum.txt>

=====================
VFIO Mediated devices
=====================

:Copyright: |copy| 2016, NVIDIA CORPORATION. All rights reserved.
:Author: Neo Jia <cjia@nvidia.com>
:Author: Kirti Wankhede <kwankhede@nvidia.com>

Virtual Function I/O (VFIO) Mediated devices[1]
===============================================

The number of use cases for virtualizing DMA devices that do not have built-in
SR-IOV capability is increasing. Previously, to virtualize such devices,
developers had to create their own management interfaces and APIs, and then
integrate them with user space software. To simplify integration with user space
software, we have identified common requirements and a unified management
interface for such devices.

The VFIO driver framework provides unified APIs for direct device access. It is
an IOMMU/device-agnostic framework for exposing direct device access to user
space in a secure, IOMMU-protected environment. This framework is used for
multiple devices, such as GPUs, network adapters, and compute accelerators. With
direct device access, virtual machines or user space applications have direct
access to the physical device. This framework is reused for mediated devices.

The mediated core driver provides a common interface for mediated device
management that can be used by drivers of different devices. This module
provides a generic interface to perform these operations:

* Create and destroy a mediated device
* Add a mediated device to and remove it from a mediated bus driver
* Add a mediated device to and remove it from an IOMMU group

The mediated core driver also provides an interface to register a bus driver.
For example, the mediated VFIO mdev driver is designed for mediated devices and
supports VFIO APIs. The mediated bus driver adds a mediated device to and
removes it from a VFIO group.

The following high-level block diagram shows the main components and interfaces
in the VFIO mediated driver framework. The diagram shows NVIDIA, Intel, and IBM
devices as examples, as these devices are the first devices to use this module::

     +---------------+
     |               |
     | +-----------+ |  mdev_register_driver() +--------------+
     | |           | +<------------------------+              |
     | |  mdev     | |                         |              |
     | |  bus      | +------------------------>+ vfio_mdev.ko |<-> VFIO user
     | |  driver   | |     probe()/remove()    |              |    APIs
     | |           | |                         +--------------+
     | +-----------+ |
     |               |
     |  MDEV CORE    |
     |   MODULE      |
     |   mdev.ko     |
     | +-----------+ |  mdev_register_device() +--------------+
     | |           | +<------------------------+              |
     | |           | |                         |  nvidia.ko   |<-> physical
     | |           | +------------------------>+              |    device
     | |           | |        callbacks        +--------------+
     | | Physical  | |
     | |  device   | |  mdev_register_device() +--------------+
     | | interface | |<------------------------+              |
     | |           | |                         |  i915.ko     |<-> physical
     | |           | +------------------------>+              |    device
     | |           | |        callbacks        +--------------+
     | |           | |
     | |           | |  mdev_register_device() +--------------+
     | |           | +<------------------------+              |
     | |           | |                         | ccw_device.ko|<-> physical
     | |           | +------------------------>+              |    device
     | |           | |        callbacks        +--------------+
     | +-----------+ |
     +---------------+


Registration Interfaces
=======================

The mediated core driver provides the following types of registration
interfaces:

* Registration interface for a mediated bus driver
* Physical device driver interface

Registration Interface for a Mediated Bus Driver
------------------------------------------------

The registration interface for a mediated device driver provides the following
structure to represent a mediated device's driver::

     /*
      * struct mdev_driver [2] - Mediated device's driver
      * @probe: called when new device created
      * @remove: called when device removed
      * @supported_type_groups: attribute groups describing the supported types
      * @driver: device driver structure
      */
     struct mdev_driver {
             int  (*probe)  (struct mdev_device *dev);
             void (*remove) (struct mdev_device *dev);
             struct attribute_group **supported_type_groups;
             struct device_driver    driver;
     };

A mediated bus driver for mdev should use this structure in the function calls
to register and unregister itself with the core driver:

* Register::

    int mdev_register_driver(struct mdev_driver *drv);

* Unregister::

    void mdev_unregister_driver(struct mdev_driver *drv);

The mediated bus driver's probe function should create a vfio_device on top of
the mdev_device and connect it to an appropriate implementation of
vfio_device_ops.

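The probe()/remove() contract can be illustrated with a small userspace model.
This is only a sketch, not kernel code: the model_* types and helpers below are
simplified stand-ins invented for illustration, and the per-device state is a
hypothetical placeholder for whatever a real driver would attach in probe().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-in for struct mdev_device (illustrative only). */
struct model_mdev_device {
	void *drvdata;		/* per-device state owned by the bound driver */
};

/* Simplified stand-in for the probe/remove part of struct mdev_driver. */
struct model_mdev_driver {
	int  (*probe)(struct model_mdev_device *dev);
	void (*remove)(struct model_mdev_device *dev);
};

/* Hypothetical state a bus driver might allocate for each device. */
struct model_vfio_state {
	int open_count;
};

/* probe(): allocate per-device state and attach it to the device. */
static int model_probe(struct model_mdev_device *dev)
{
	struct model_vfio_state *st = calloc(1, sizeof(*st));

	if (!st)
		return -1;	/* a kernel driver would return -ENOMEM */
	dev->drvdata = st;
	return 0;
}

/* remove(): undo everything probe() did. */
static void model_remove(struct model_mdev_device *dev)
{
	free(dev->drvdata);
	dev->drvdata = NULL;
}

static struct model_mdev_driver model_driver = {
	.probe  = model_probe,
	.remove = model_remove,
};

/* Models the core calling probe() when a new mediated device appears. */
static int model_bind(struct model_mdev_driver *drv,
		      struct model_mdev_device *dev)
{
	assert(drv && drv->probe && drv->remove);
	return drv->probe(dev);
}

/* Models the core calling remove() when the mediated device goes away. */
static void model_unbind(struct model_mdev_driver *drv,
			 struct model_mdev_device *dev)
{
	drv->remove(dev);
}
```

The point of the model is the symmetry: whatever probe() attaches to the device
must be released by remove(), because the core may create and destroy mediated
devices many times over the lifetime of the driver.
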
When a driver wants to add the GUID creation sysfs files to an existing device
that it has probed, it should call::

    int mdev_register_device(struct device *dev,
                             struct mdev_driver *mdev_driver);

This will provide the 'mdev_supported_types/XX/create' files which can then be
used to trigger the creation of a mdev_device. The created mdev_device will be
attached to the specified driver.

When the driver needs to remove itself it calls::

    void mdev_unregister_device(struct device *dev);

This unbinds and destroys all the created mdevs and removes the sysfs files.

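The lifecycle described above (register the parent, create mediated devices
through the 'create' file, unregister and lose them all) can be modeled in a
few lines of userspace C. This is a sketch under stated assumptions: the
model_* names, the fixed-size table, and the length-only UUID check are all
invented for illustration and do not reflect the kernel's data structures.

```c
#include <assert.h>
#include <string.h>

#define MODEL_MAX_MDEVS 4	/* arbitrary limit for the model */
#define MODEL_UUID_LEN  36	/* 8-4-4-4-12 textual UUID */

/* Userspace model of a parent device and the mdevs created on it. */
struct model_parent {
	int registered;
	int nr_mdevs;
	char uuids[MODEL_MAX_MDEVS][MODEL_UUID_LEN + 1];
};

/* Models mdev_register_device(): the 'create' files become usable. */
static int model_register_device(struct model_parent *p)
{
	if (p->registered)
		return -1;
	memset(p, 0, sizeof(*p));
	p->registered = 1;
	return 0;
}

/* Models writing a UUID to mdev_supported_types/XX/create. */
static int model_create(struct model_parent *p, const char *uuid)
{
	if (!p->registered || p->nr_mdevs == MODEL_MAX_MDEVS)
		return -1;
	if (strlen(uuid) != MODEL_UUID_LEN)
		return -1;
	strcpy(p->uuids[p->nr_mdevs++], uuid);
	return 0;
}

/* Models mdev_unregister_device(): every created mdev is destroyed. */
static void model_unregister_device(struct model_parent *p)
{
	p->nr_mdevs = 0;
	p->registered = 0;
}
```

In the model, creation fails before registration and every created device
disappears on unregistration, which mirrors the ordering the text describes.
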
Mediated Device Management Interface Through sysfs
==================================================

The management interface through sysfs enables user space software, such as
libvirt, to query and configure mediated devices in a hardware-agnostic fashion.
This management interface provides flexibility to the underlying physical
device's driver to support features such as:

* Mediated device hot plug
* Multiple mediated devices in a single virtual machine
* Multiple mediated devices from different physical devices

Links in the mdev_bus Class Directory
-------------------------------------

The /sys/class/mdev_bus/ directory contains links to devices that are registered
with the mdev core driver.

Directories and Files Under the sysfs for Each Physical Device
--------------------------------------------------------------

::

  |- [parent physical device]
  |--- Vendor-specific-attributes [optional]
  |--- [mdev_supported_types]
  |     |--- [<type-id>]
  |     |   |--- create
  |     |   |--- name
  |     |   |--- available_instances
  |     |   |--- device_api
  |     |   |--- description
  |     |   |--- [devices]
  |     |--- [<type-id>]
  |     |   |--- create
  |     |   |--- name
  |     |   |--- available_instances
  |     |   |--- device_api
  |     |   |--- description
  |     |   |--- [devices]
  |     |--- [<type-id>]
  |          |--- create
  |          |--- name
  |          |--- available_instances
  |          |--- device_api
  |          |--- description
  |          |--- [devices]

* [mdev_supported_types]

  The list of currently supported mediated device types and their details.

  [<type-id>], device_api, and available_instances are mandatory attributes
  that should be provided by the vendor driver.

* [<type-id>]

  The [<type-id>] name is created by adding the device driver string as a prefix
  to the string provided by the vendor driver. The format of this name is as
  follows::

        sprintf(buf, "%s-%s", dev_driver_string(parent->dev), group->name);

  (or using mdev_parent_dev(mdev) to arrive at the parent device outside
  of the core mdev code)

* device_api

  This attribute should show which device API is being created, for example,
  "vfio-pci" for a PCI device.

* available_instances

  This attribute should show the number of devices of type <type-id> that can be
  created.

* [devices]

  This directory contains links to the devices of type <type-id> that have been
  created.

* name

  This attribute should show a human-readable name. This attribute is optional.

* description

  This attribute should show a brief description of the type and its features.
  This attribute is optional.

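As an illustration of the attributes above, the following userspace sketch
builds a type-id with the same "%s-%s" format and reads a count from an
available_instances-style file. The helper names are hypothetical, and the
assumption that the file holds a single decimal number is ours; the sample
strings "mtty" and "2" are chosen to match the mtty-2 type used in the sample
code section.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Build "<driver>-<group>" into buf using the format shown above.
 * Returns 0 on success, -1 if the result does not fit in len bytes.
 */
static int build_type_id(char *buf, size_t len,
			 const char *driver, const char *group)
{
	int n = snprintf(buf, len, "%s-%s", driver, group);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Read a non-negative decimal count from the file at path.
 * Returns the count, or -1 if the file is missing or malformed.
 */
static long read_available_instances(const char *path)
{
	char line[32];
	char *end;
	long val;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(line, sizeof(line), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	val = strtol(line, &end, 10);
	if (end == line || val < 0)
		return -1;
	return val;
}
```

A management tool could call read_available_instances() on
mdev_supported_types/<type-id>/available_instances before writing to the
matching create file, and skip types that report zero.
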
Directories and Files Under the sysfs for Each mdev Device
----------------------------------------------------------

::

  |- [parent phy device]
  |--- [$MDEV_UUID]
         |--- remove
         |--- mdev_type {link to its type}
         |--- vendor-specific-attributes [optional]

* remove (write only)

Writing '1' to the 'remove' file destroys the mdev device. The vendor driver can
fail the remove() callback if that device is active and the vendor driver
doesn't support hot unplug.

Example::

        # echo 1 > /sys/bus/mdev/devices/$MDEV_UUID/remove

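The same removal can be done from a program instead of the shell. The sketch
below formats the path used in the example and writes the single character
'1' to it, reporting failures as negative errno values. The function names are
hypothetical, and how a particular vendor driver reports a refused removal is
not specified here.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Format /sys/bus/mdev/devices/<uuid>/remove into buf.
 * Returns 0 on success, -1 if the path does not fit in len bytes.
 */
static int format_remove_path(char *buf, size_t len, const char *uuid)
{
	int n = snprintf(buf, len, "/sys/bus/mdev/devices/%s/remove", uuid);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Write "1" to the file at path.
 * Returns 0 on success or a negative errno value on failure.
 */
static int write_one(const char *path)
{
	ssize_t n;
	int fd = open(path, O_WRONLY);

	if (fd < 0)
		return -errno;

	n = write(fd, "1", 1);
	if (n != 1) {
		int err = (n < 0) ? errno : EIO;

		close(fd);
		return -err;
	}
	return close(fd) == 0 ? 0 : -errno;
}
```

Checking the return value matters here because, as noted above, the vendor
driver may refuse to remove a device that is still in use.
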
Mediated Device Hot Plug
------------------------

Mediated devices can be created and assigned at runtime. The procedure to hot
plug a mediated device is the same as the procedure to hot plug a PCI device.

Translation APIs for Mediated Devices
=====================================

The following APIs are provided for translating user pfn to host pfn in a VFIO
driver::

        int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
                                  int npage, int prot, struct page **pages);

        void vfio_unpin_pages(struct vfio_device *device, dma_addr_t iova,
                                    int npage);

These functions call back into the back-end IOMMU module by using the pin_pages
and unpin_pages callbacks of the struct vfio_iommu_driver_ops[4]. Currently
these callbacks are supported in the TYPE1 IOMMU module. To enable them for
other IOMMU backend modules, such as the PPC64 sPAPR module, those modules need
to provide these two callback functions.

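Both functions take a starting IOVA and a page count, so a caller that starts
from a byte range first has to work out how many pages the range touches. The
arithmetic might look like the sketch below. It assumes 4 KiB pages and uses
hypothetical sketch_* names; kernel code would use PAGE_SIZE and the kernel's
own helpers instead, and the sketch ignores address wrap-around.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */
#define SKETCH_PAGE_SIZE  ((uint64_t)1 << SKETCH_PAGE_SHIFT)
#define SKETCH_PAGE_MASK  (~(SKETCH_PAGE_SIZE - 1))

/* Page-aligned IOVA of the page that contains iova. */
static uint64_t sketch_first_page(uint64_t iova)
{
	return iova & SKETCH_PAGE_MASK;
}

/*
 * Number of pages touched by the byte range [iova, iova + len).
 * Wrap-around of iova + len is not handled in this sketch.
 */
static uint64_t sketch_npage(uint64_t iova, uint64_t len)
{
	uint64_t start, end;

	if (len == 0)
		return 0;

	start = sketch_first_page(iova);
	/* Round the end of the range up to the next page boundary. */
	end = (iova + len + SKETCH_PAGE_SIZE - 1) & SKETCH_PAGE_MASK;
	return (end - start) >> SKETCH_PAGE_SHIFT;
}
```

Note that a short buffer can still span two pages when it straddles a page
boundary, which is why the count is derived from both the start and the end of
the range rather than from the length alone.
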
Using the Sample Code
=====================

mtty.c in the samples/vfio-mdev/ directory is a sample driver program that
demonstrates how to use the mediated device framework.

The sample driver creates an mdev device that simulates a serial port over a PCI
card.

1. Build and load the mtty.ko module.

   This step creates a dummy device, /sys/devices/virtual/mtty/mtty/

   Files in this device directory in sysfs are similar to the following::

     # tree /sys/devices/virtual/mtty/mtty/
        /sys/devices/virtual/mtty/mtty/
        |-- mdev_supported_types
        |   |-- mtty-1
        |   |   |-- available_instances
        |   |   |-- create
        |   |   |-- device_api
        |   |   |-- devices
        |   |   `-- name
        |   `-- mtty-2
        |       |-- available_instances
        |       |-- create
        |       |-- device_api
        |       |-- devices
        |       `-- name
        |-- mtty_dev
        |   `-- sample_mtty_dev
        |-- power
        |   |-- autosuspend_delay_ms
        |   |-- control
        |   |-- runtime_active_time
        |   |-- runtime_status
        |   `-- runtime_suspended_time
        |-- subsystem -> ../../../../class/mtty
        `-- uevent

2. Create a mediated device by using the dummy device that you created in the
   previous step::

     # echo "83b8f4f2-509f-382f-3c1e-e6bfe0fa1001" >	\
              /sys/devices/virtual/mtty/mtty/mdev_supported_types/mtty-2/create

3. Add parameters to qemu-kvm::

     -device vfio-pci,\
      sysfsdev=/sys/bus/mdev/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1001

4. Boot the VM.

   In the Linux guest VM, with no hardware on the host, the device appears
   as follows::

     # lspci -s 00:05.0 -xxvv
     00:05.0 Serial controller: Device 4348:3253 (rev 10) (prog-if 02 [16550])
             Subsystem: Device 4348:3253
             Physical Slot: 5
             Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr-
                     Stepping- SERR- FastB2B- DisINTx-
             Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort-
                     <TAbort- <MAbort- >SERR- <PERR- INTx-
             Interrupt: pin A routed to IRQ 10
             Region 0: I/O ports at c150 [size=8]
             Region 1: I/O ports at c158 [size=8]
             Kernel driver in use: serial
     00: 48 43 53 32 01 00 00 02 10 02 00 07 00 00 00 00
     10: 51 c1 00 00 59 c1 00 00 00 00 00 00 00 00 00 00
     20: 00 00 00 00 00 00 00 00 00 00 00 00 48 43 53 32
     30: 00 00 00 00 00 00 00 00 00 00 00 00 0a 01 00 00

   In the Linux guest VM, dmesg output for the device is as follows::

     serial 0000:00:05.0: PCI INT A -> Link[LNKA] -> GSI 10 (level, high) -> IRQ 10
     0000:00:05.0: ttyS1 at I/O 0xc150 (irq = 10) is a 16550A
     0000:00:05.0: ttyS2 at I/O 0xc158 (irq = 10) is a 16550A


5. In the Linux guest VM, check the serial ports::

     # setserial -g /dev/ttyS*
     /dev/ttyS0, UART: 16550A, Port: 0x03f8, IRQ: 4
     /dev/ttyS1, UART: 16550A, Port: 0xc150, IRQ: 10
     /dev/ttyS2, UART: 16550A, Port: 0xc158, IRQ: 10

6. Using minicom or any terminal emulation program, open port /dev/ttyS1 or
   /dev/ttyS2 with hardware flow control disabled.

7. Type data on the minicom terminal or send data to the terminal emulation
   program and read the data.

   Data is looped back by the host's mtty driver.

8. Destroy the mediated device that you created::

     # echo 1 > /sys/bus/mdev/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1001/remove

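When the steps above are scripted, it can help to check the UUID before it is
written to the create file and to build the QEMU option from the same string.
The following is a minimal sketch, assuming the usual 8-4-4-4-12 hexadecimal
text form; is_uuid() and format_qemu_arg() are hypothetical helpers and not
part of any kernel or QEMU interface.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if s has the 8-4-4-4-12 hexadecimal UUID form, else 0. */
static int is_uuid(const char *s)
{
	size_t i;

	if (strlen(s) != 36)
		return 0;

	for (i = 0; i < 36; i++) {
		if (i == 8 || i == 13 || i == 18 || i == 23) {
			if (s[i] != '-')
				return 0;
		} else if (!isxdigit((unsigned char)s[i])) {
			return 0;
		}
	}
	return 1;
}

/*
 * Format the option from step 3 for the given UUID.
 * Returns 0 on success, -1 on a malformed UUID or a too-small buffer.
 */
static int format_qemu_arg(char *buf, size_t len, const char *uuid)
{
	int n;

	if (!is_uuid(uuid))
		return -1;

	n = snprintf(buf, len,
		     "-device vfio-pci,sysfsdev=/sys/bus/mdev/devices/%s",
		     uuid);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Using one validated string for both the create step and the QEMU option avoids
the situation where the device is created under one UUID and the VM is pointed
at another.
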
References
==========

1. See Documentation/driver-api/vfio.rst for more information on VFIO.
2. struct mdev_driver in include/linux/mdev.h
3. struct mdev_parent_ops in include/linux/mdev.h
4. struct vfio_iommu_driver_ops in include/linux/vfio.h