2b034e82ff
A sentence that checks for later spec version is meant for developers hacking the documentation source. Make it comment block (hidden from actual output). Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Message-Id: <20221220095828.27588-4-bagasdotme@gmail.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
146 lines
5.2 KiB
ReStructuredText
146 lines
5.2 KiB
ReStructuredText
.. SPDX-License-Identifier: GPL-2.0
|
|
|
|
.. _virtio:
|
|
|
|
===============
|
|
Virtio on Linux
|
|
===============
|
|
|
|
Introduction
|
|
============
|
|
|
|
Virtio is an open standard that defines a protocol for communication
|
|
between drivers and devices of different types, see Chapter 5 ("Device
|
|
Types") of the virtio spec (`[1]`_). Originally developed as a standard
|
|
for paravirtualized devices implemented by a hypervisor, it can be used
|
|
to interface any compliant device (real or emulated) with a driver.
|
|
|
|
For illustrative purposes, this document will focus on the common case
|
|
of a Linux kernel running in a virtual machine and using paravirtualized
|
|
devices provided by the hypervisor, which exposes them as virtio devices
|
|
via standard mechanisms such as PCI.
|
|
|
|
|
|
Device - Driver communication: virtqueues
|
|
=========================================
|
|
|
|
Although the virtio devices are really an abstraction layer in the
|
|
hypervisor, they're exposed to the guest as if they are physical devices
|
|
using a specific transport method -- PCI, MMIO or CCW -- that is
|
|
orthogonal to the device itself. The virtio spec defines these transport
|
|
methods in detail, including device discovery, capabilities and
|
|
interrupt handling.
|
|
|
|
The communication between the driver in the guest OS and the device in
|
|
the hypervisor is done through shared memory (that's what makes virtio
|
|
devices so efficient) using specialized data structures called
|
|
virtqueues, which are actually ring buffers [#f1]_ of buffer descriptors
|
|
similar to the ones used in a network device:
|
|
|
|
.. kernel-doc:: include/uapi/linux/virtio_ring.h
|
|
:identifiers: struct vring_desc
|
|
|
|
All the buffers the descriptors point to are allocated by the guest and
|
|
used by the host either for reading or for writing but not for both.
|
|
|
|
Refer to Chapter 2.5 ("Virtqueues") of the virtio spec (`[1]`_) for the
|
|
reference definitions of virtqueues and "Virtqueues and virtio ring: How
|
|
the data travels" blog post (`[2]`_) for an illustrated overview of how
|
|
the host device and the guest driver communicate.
|
|
|
|
The :c:type:`vring_virtqueue` struct models a virtqueue, including the
|
|
ring buffers and management data. Embedded in this struct is the
|
|
:c:type:`virtqueue` struct, which is the data structure that's
|
|
ultimately used by virtio drivers:
|
|
|
|
.. kernel-doc:: include/linux/virtio.h
|
|
:identifiers: struct virtqueue
|
|
|
|
The callback function pointed by this struct is triggered when the
|
|
device has consumed the buffers provided by the driver. More
|
|
specifically, the trigger will be an interrupt issued by the hypervisor
|
|
(see vring_interrupt()). Interrupt request handlers are registered for
|
|
a virtqueue during the virtqueue setup process (transport-specific).
|
|
|
|
.. kernel-doc:: drivers/virtio/virtio_ring.c
|
|
:identifiers: vring_interrupt
|
|
|
|
|
|
Device discovery and probing
|
|
============================
|
|
|
|
In the kernel, the virtio core contains the virtio bus driver and
|
|
transport-specific drivers like `virtio-pci` and `virtio-mmio`. Then
|
|
there are individual virtio drivers for specific device types that are
|
|
registered to the virtio bus driver.
|
|
|
|
How a virtio device is found and configured by the kernel depends on how
|
|
the hypervisor defines it. Taking the `QEMU virtio-console
|
|
<https://gitlab.com/qemu-project/qemu/-/blob/master/hw/char/virtio-console.c>`__
|
|
device as an example. When using PCI as a transport method, the device
|
|
will present itself on the PCI bus with vendor 0x1af4 (Red Hat, Inc.)
|
|
and device id 0x1003 (virtio console), as defined in the spec, so the
|
|
kernel will detect it as it would do with any other PCI device.
|
|
|
|
During the PCI enumeration process, if a device is found to match the
|
|
virtio-pci driver (according to the virtio-pci device table, any PCI
|
|
device with vendor id = 0x1af4)::
|
|
|
|
/* Qumranet donated their vendor ID for devices 0x1000 thru 0x10FF. */
|
|
static const struct pci_device_id virtio_pci_id_table[] = {
|
|
{ PCI_DEVICE(PCI_VENDOR_ID_REDHAT_QUMRANET, PCI_ANY_ID) },
|
|
{ 0 }
|
|
};
|
|
|
|
then the virtio-pci driver is probed and, if the probing goes well, the
|
|
device is registered to the virtio bus::
|
|
|
|
static int virtio_pci_probe(struct pci_dev *pci_dev,
|
|
const struct pci_device_id *id)
|
|
{
|
|
...
|
|
|
|
if (force_legacy) {
|
|
rc = virtio_pci_legacy_probe(vp_dev);
|
|
/* Also try modern mode if we can't map BAR0 (no IO space). */
|
|
if (rc == -ENODEV || rc == -ENOMEM)
|
|
rc = virtio_pci_modern_probe(vp_dev);
|
|
if (rc)
|
|
goto err_probe;
|
|
} else {
|
|
rc = virtio_pci_modern_probe(vp_dev);
|
|
if (rc == -ENODEV)
|
|
rc = virtio_pci_legacy_probe(vp_dev);
|
|
if (rc)
|
|
goto err_probe;
|
|
}
|
|
|
|
...
|
|
|
|
rc = register_virtio_device(&vp_dev->vdev);
|
|
|
|
When the device is registered to the virtio bus the kernel will look
|
|
for a driver in the bus that can handle the device and call that
|
|
driver's ``probe`` method.
|
|
|
|
At this point, the virtqueues will be allocated and configured by
|
|
calling the appropriate ``virtio_find`` helper function, such as
|
|
virtio_find_single_vq() or virtio_find_vqs(), which will end up calling
|
|
a transport-specific ``find_vqs`` method.
|
|
|
|
|
|
References
|
|
==========
|
|
|
|
_`[1]` Virtio Spec v1.2:
|
|
https://docs.oasis-open.org/virtio/virtio/v1.2/virtio-v1.2.html
|
|
|
|
.. Check for later versions of the spec as well.
|
|
|
|
_`[2]` Virtqueues and virtio ring: How the data travels
|
|
https://www.redhat.com/en/blog/virtqueues-and-virtio-ring-how-data-travels
|
|
|
|
.. rubric:: Footnotes
|
|
|
|
.. [#f1] that's why they may be also referred to as virtrings.
|