mirror of
git://git.proxmox.com/git/pve-docs.git
synced 2025-01-26 10:03:45 +03:00
1045 lines
42 KiB
Plaintext
[[chapter_virtual_machines]]
|
||
ifdef::manvolnum[]
|
||
qm(1)
|
||
=====
|
||
:pve-toplevel:
|
||
|
||
NAME
|
||
----
|
||
|
||
qm - Qemu/KVM Virtual Machine Manager
|
||
|
||
|
||
SYNOPSIS
|
||
--------
|
||
|
||
include::qm.1-synopsis.adoc[]
|
||
|
||
DESCRIPTION
|
||
-----------
|
||
endif::manvolnum[]
|
||
ifndef::manvolnum[]
|
||
Qemu/KVM Virtual Machines
|
||
=========================
|
||
:pve-toplevel:
|
||
endif::manvolnum[]
|
||
|
||
// deprecates
|
||
// http://pve.proxmox.com/wiki/Container_and_Full_Virtualization
|
||
// http://pve.proxmox.com/wiki/KVM
|
||
// http://pve.proxmox.com/wiki/Qemu_Server
|
||
|
||
Qemu (short form for Quick Emulator) is an open source hypervisor that emulates a
|
||
physical computer. From the perspective of the host system where Qemu is
|
||
running, Qemu is a user program which has access to a number of local resources
|
||
like partitions, files, network cards which are then passed to an
|
||
emulated computer which sees them as if they were real devices.
|
||
|
||
A guest operating system running in the emulated computer accesses these
|
||
devices, and runs as if it were running on real hardware. For instance, you can pass
an ISO image as a parameter to Qemu, and the OS running in the emulated computer
|
||
will see a real CDROM inserted in a CD drive.
|
||
|
||
Qemu can emulate a great variety of hardware from ARM to Sparc, but {pve} is
|
||
only concerned with 32-bit and 64-bit PC clone emulation, since it represents the
|
||
overwhelming majority of server hardware. The emulation of PC clones is also one
|
||
of the fastest due to the availability of processor extensions which greatly
|
||
speed up Qemu when the emulated architecture is the same as the host
|
||
architecture.
|
||
|
||
NOTE: You may sometimes encounter the term _KVM_ (Kernel-based Virtual Machine).
|
||
It means that Qemu is running with the support of the virtualization processor
|
||
extensions, via the Linux kvm module. In the context of {pve} _Qemu_ and
|
||
_KVM_ can be used interchangeably as Qemu in {pve} will always try to load the kvm
|
||
module.
|
||
|
||
Qemu inside {pve} runs as a root process, since this is required to access block
|
||
and PCI devices.
|
||
|
||
|
||
Emulated devices and paravirtualized devices
|
||
--------------------------------------------
|
||
|
||
The PC hardware emulated by Qemu includes a mainboard, network controllers,
|
||
SCSI, IDE and SATA controllers, serial ports (the complete list can be seen in
the `kvm(1)` man page), all of them emulated in software. All these devices
|
||
are the exact software equivalent of existing hardware devices, and if the OS
|
||
running in the guest has the proper drivers it will use the devices as if it
|
||
were running on real hardware. This allows Qemu to run _unmodified_ operating
|
||
systems.
|
||
|
||
This however has a performance cost, as running in software what was meant to
|
||
run in hardware involves a lot of extra work for the host CPU. To mitigate this,
|
||
Qemu can present to the guest operating system _paravirtualized devices_, where
|
||
the guest OS recognizes it is running inside Qemu and cooperates with the
|
||
hypervisor.
|
||
|
||
Qemu relies on the virtio virtualization standard, and is thus able to present
|
||
paravirtualized virtio devices, which include a paravirtualized generic disk
controller, a paravirtualized network card, a paravirtualized serial port,
a paravirtualized SCSI controller, etc.
|
||
|
||
It is highly recommended to use the virtio devices whenever you can, as they
|
||
provide a big performance improvement. Using the virtio generic disk controller
|
||
versus an emulated IDE controller will double the sequential write throughput,
|
||
as measured with `bonnie++(8)`. Using the virtio network interface can deliver
|
||
up to three times the throughput of an emulated Intel E1000 network card, as
|
||
measured with `iperf(1)`. footnote:[See this benchmark on the KVM wiki
|
||
http://www.linux-kvm.org/page/Using_VirtIO_NIC]
|
||
|
||
|
||
[[qm_virtual_machines_settings]]
|
||
Virtual Machines Settings
|
||
-------------------------
|
||
|
||
Generally speaking {pve} tries to choose sane defaults for virtual machines
|
||
(VM). Make sure you understand the meaning of the settings you change, as
changing them could incur a performance slowdown or put your data at risk.
|
||
|
||
|
||
[[qm_general_settings]]
|
||
General Settings
|
||
~~~~~~~~~~~~~~~~
|
||
|
||
[thumbnail="gui-create-vm-general.png"]
|
||
|
||
General settings of a VM include
|
||
|
||
* the *Node* : the physical server on which the VM will run
|
||
* the *VM ID*: a unique number in this {pve} installation used to identify your VM
|
||
* *Name*: a free form text string you can use to describe the VM
|
||
* *Resource Pool*: a logical group of VMs
|
||
|
||
|
||
[[qm_os_settings]]
|
||
OS Settings
|
||
~~~~~~~~~~~
|
||
|
||
[thumbnail="gui-create-vm-os.png"]
|
||
|
||
When creating a VM, setting the proper Operating System (OS) allows {pve} to
optimize some low level parameters. For instance, Windows OSes expect the BIOS
clock to use the local time, while Unix-based OSes expect the BIOS clock to have
|
||
the UTC time.
|
||
|
||
|
||
[[qm_hard_disk]]
|
||
Hard Disk
|
||
~~~~~~~~~
|
||
|
||
Qemu can emulate a number of storage controllers:
|
||
|
||
* the *IDE* controller has a design which goes back to the 1984 PC/AT disk
|
||
controller. Even if this controller has been superseded by recent designs,
|
||
each and every OS you can think of has support for it, making it a great choice
|
||
if you want to run an OS released before 2003. You can connect up to 4 devices
|
||
on this controller.
|
||
|
||
* the *SATA* (Serial ATA) controller, dating from 2003, has a more modern
|
||
design, allowing higher throughput and a greater number of devices to be
|
||
connected. You can connect up to 6 devices on this controller.
|
||
|
||
* the *SCSI* controller, designed in 1985, is commonly found on server grade
|
||
hardware, and can connect up to 14 storage devices. By default, {pve} emulates an
LSI 53C895A controller.
|
||
+
|
||
A SCSI controller of type _VirtIO SCSI_ is the recommended setting if you aim for
|
||
performance and is automatically selected for newly created Linux VMs since
|
||
{pve} 4.3. Linux distributions have support for this controller since 2012, and
|
||
FreeBSD since 2014. For Windows OSes, you need to provide an extra iso
|
||
containing the drivers during the installation.
|
||
// https://pve.proxmox.com/wiki/Paravirtualized_Block_Drivers_for_Windows#During_windows_installation.
|
||
If you aim at maximum performance, you can select a SCSI controller of type
|
||
_VirtIO SCSI single_ which will allow you to select the *IO Thread* option.
|
||
When selecting _VirtIO SCSI single_ Qemu will create a new controller for
|
||
each disk, instead of adding all disks to the same controller.
|
||
|
||
* The *VirtIO Block* controller, often just called VirtIO or virtio-blk,
|
||
is an older type of paravirtualized controller. It has been superseded by the
|
||
VirtIO SCSI Controller, in terms of features.
|
||
|
||
[thumbnail="gui-create-vm-hard-disk.png"]
|
||
On each controller you attach a number of emulated hard disks, which are backed
|
||
by a file or a block device residing in the configured storage. The choice of
|
||
a storage type will determine the format of the hard disk image. Storages which
|
||
present block devices (LVM, ZFS, Ceph) will require the *raw disk image format*,
|
||
whereas file based storages (Ext4, NFS, CIFS, GlusterFS) will let you choose
|
||
either the *raw disk image format* or the *QEMU image format*.
|
||
|
||
* the *QEMU image format* is a copy on write format which allows snapshots, and
|
||
thin provisioning of the disk image.
|
||
* the *raw disk image* is a bit-to-bit image of a hard disk, similar to what
|
||
you would get when executing the `dd` command on a block device in Linux. This
|
||
format does not support thin provisioning or snapshots by itself, requiring
|
||
cooperation from the storage layer for these tasks. It may, however, be up to
|
||
10% faster than the *QEMU image format*. footnote:[See this benchmark for details
|
||
http://events.linuxfoundation.org/sites/events/files/slides/CloudOpen2013_Khoa_Huynh_v3.pdf]
|
||
* the *VMware image format* only makes sense if you intend to import/export the
|
||
disk image to other hypervisors.
|
||
|
||
Setting the *Cache* mode of the hard drive will impact how the host system will
|
||
notify the guest systems of block write completions. The *No cache* default
|
||
means that the guest system will be notified that a write is complete when each
|
||
block reaches the physical storage write queue, ignoring the host page cache.
|
||
This provides a good balance between safety and speed.
|
||
|
||
If you want the {pve} backup manager to skip a disk when doing a backup of a VM,
|
||
you can set the *No backup* option on that disk.
|
||
|
||
If you want the {pve} storage replication mechanism to skip a disk when starting
|
||
a replication job, you can set the *Skip replication* option on that disk.
|
||
As of {pve} 5.0, replication requires the disk images to be on a storage of type
|
||
`zfspool`, so adding a disk image to other storages when the VM has replication
|
||
configured requires you to skip replication for this disk image.
|
||
|
||
If your storage supports _thin provisioning_ (see the storage chapter in the
|
||
{pve} guide), and your VM has a *SCSI* controller you can activate the *Discard*
|
||
option on the hard disks connected to that controller. With *Discard* enabled,
|
||
when the filesystem of a VM marks blocks as unused after removing files, the
|
||
emulated SCSI controller will relay this information to the storage, which will
|
||
then shrink the disk image accordingly.
|
||
|
||
.IO Thread
|
||
The option *IO Thread* can only be used when using a disk with the
|
||
*VirtIO* controller, or with the *SCSI* controller, when the emulated controller
|
||
type is *VirtIO SCSI single*.
|
||
With this enabled, Qemu creates one I/O thread per storage controller,
|
||
instead of a single thread for all I/O, so it increases performance when
|
||
multiple disks are used and each disk has its own storage controller.
|
||
Note that backups do not currently work with *IO Thread* enabled.
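
As a minimal sketch of the controller and disk options described in this
section, the following commands switch a VM to the _VirtIO SCSI single_
controller and enable *IO Thread* and *Discard* on one disk. The disk key
`scsi0` and the `<storage>:<volume>` name are placeholders chosen for this
example; substitute the volume your VM already uses.

----
qm set <vmid> -scsihw virtio-scsi-single
qm set <vmid> -scsi0 <storage>:<volume>,iothread=1,discard=on
----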
|
||
|
||
|
||
[[qm_cpu]]
|
||
CPU
|
||
~~~
|
||
|
||
[thumbnail="gui-create-vm-cpu.png"]
|
||
|
||
A *CPU socket* is a physical slot on a PC motherboard where you can plug a CPU.
|
||
This CPU can then contain one or many *cores*, which are independent
|
||
processing units. Whether you have a single CPU socket with 4 cores, or two CPU
|
||
sockets with two cores is mostly irrelevant from a performance point of view.
|
||
However some software licenses depend on the number of sockets a machine has,
|
||
in that case it makes sense to set the number of sockets to what the license
|
||
allows you.
|
||
|
||
Increasing the number of virtual cpus (cores and sockets) will usually provide a
|
||
performance improvement though that is heavily dependent on the use of the VM.
|
||
Multithreaded applications will of course benefit from a large number of
|
||
virtual cpus, as for each virtual cpu you add, Qemu will create a new thread of
|
||
execution on the host system. If you're not sure about the workload of your VM,
|
||
it is usually a safe bet to set the number of *Total cores* to 2.
|
||
|
||
NOTE: It is perfectly safe if the _overall_ number of cores of all your VMs
|
||
is greater than the number of cores on the server (e.g., 4 VMs with each 4
|
||
cores on a machine with only 8 cores). In that case the host system will
|
||
balance the Qemu execution threads between your server cores, just as if you
|
||
were running a standard multithreaded application. However, {pve} will prevent
|
||
you from assigning more virtual CPU cores than physically available, as this will
|
||
only bring the performance down due to the cost of context switches.
|
||
|
||
[[qm_cpu_resource_limits]]
|
||
Resource Limits
|
||
^^^^^^^^^^^^^^^
|
||
|
||
In addition to the number of virtual cores, you can configure how many resources
|
||
a VM can get in relation to the host CPU time and also in relation to other
|
||
VMs.
|
||
With the *cpulimit* (``Host CPU Time'') option you can limit how much CPU time
|
||
the whole VM can use on the host. It is a floating point value representing CPU
|
||
time in percent, so `1.0` is equal to `100%`, `2.5` to `250%` and so on. If a
|
||
single process would fully use one single core it would have `100%` CPU Time
|
||
usage. If a VM with four cores utilizes all its cores fully it would
|
||
theoretically use `400%`. In reality the usage may be even a bit higher as Qemu
|
||
can have additional threads for VM peripherals besides the vCPU core ones.
|
||
This setting can be useful if a VM should have multiple vCPUs, as it runs a few
|
||
processes in parallel, but the VM as a whole should not be able to run all
|
||
vCPUs at 100% at the same time. As a specific example: let's say we have a VM
which would profit from having 8 vCPUs, but those 8 cores should never all
run at full load at the same time, as this would overload the server so much that
other VMs and CTs would get too little CPU time. So, we set the *cpulimit* to
`4.0` (=400%). If all cores do the same heavy work, they would each get 50% of a
real host core's CPU time. But if only 4 of them do work, they could still get
almost 100% of a real core each.
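
The arithmetic of this example can be sketched in shell. The helper name
`per_vcpu_share` is hypothetical and not part of {pve}; it only models the
simplified case where the `cpulimit` budget is split evenly among the busy
vCPUs, with no vCPU getting more than one full core.

----
# per_vcpu_share CPULIMIT BUSY_VCPUS
# Prints the percentage of one host core each busy vCPU can get,
# assuming the cpulimit budget is split evenly among busy vCPUs.
per_vcpu_share() {
    awk -v limit="$1" -v busy="$2" 'BEGIN {
        share = limit / busy
        if (share > 1) share = 1   # a vCPU cannot use more than one core
        printf "%.0f%%\n", share * 100
    }'
}

per_vcpu_share 4.0 8   # all 8 vCPUs busy
per_vcpu_share 4.0 4   # only 4 vCPUs busy
----

The second call shows the upper bound. In practice the value stays slightly
below it, because the additional Qemu threads of the VM count against the same
limit.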
|
||
|
||
NOTE: VMs can, depending on their configuration, use additional threads e.g.,
|
||
for networking or IO operations but also live migration. Thus a VM can appear
to use more CPU time than just its virtual CPUs could use. To ensure that a VM
never uses more CPU time than its assigned virtual CPUs, set the *cpulimit* setting
|
||
to the same value as the total core count.
|
||
|
||
The second CPU resource limiting setting, *cpuunits* (nowadays often called CPU
shares or CPU weight), controls how much CPU time a VM gets compared to other
running VMs. It is a relative weight which defaults to `1024`. If you increase
this for a VM, it will be prioritized by the scheduler in comparison to other
VMs with lower weight. E.g., if VM 100 has the default of `1024` and VM 200 was
changed to `2048`, VM 200 would receive twice the CPU bandwidth of
VM 100.
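
When both VMs compete for the same cores, the expected split follows directly
from the weights. The following sketch uses a hypothetical helper `cpu_share`
(not a {pve} command) which divides one weight by the sum of all weights. It
assumes that only these two VMs are contending for CPU time.

----
# cpu_share WEIGHT TOTAL_WEIGHT
# Prints the share of contended CPU time as a percentage.
cpu_share() {
    awk -v w="$1" -v total="$2" 'BEGIN { printf "%.1f%%\n", 100 * w / total }'
}

total=$((1024 + 2048))
cpu_share 1024 "$total"   # VM 100, default weight
cpu_share 2048 "$total"   # VM 200, doubled weight
----

The weight itself can be changed with `qm set <vmid> -cpuunits 2048`.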
|
||
|
||
For more information see `man systemd.resource-control`. There, `CPUQuota`
corresponds to `cpulimit` and `CPUShares` corresponds to our `cpuunits`
setting. See its Notes section for references and implementation details.
|
||
|
||
CPU Type
|
||
^^^^^^^^
|
||
|
||
Qemu can emulate a number of different *CPU types* from 486 to the latest Xeon
|
||
processors. Each new processor generation adds new features, like hardware
|
||
assisted 3D rendering, random number generation, memory protection, etc.
|
||
Usually you should select for your VM a processor type which closely matches the
|
||
CPU of the host system, as it means that the host CPU features (also called _CPU
|
||
flags_ ) will be available in your VMs. If you want an exact match, you can set
|
||
the CPU type to *host* in which case the VM will have exactly the same CPU flags
|
||
as your host system.
|
||
|
||
This has a downside though. If you want to do a live migration of VMs between
|
||
different hosts, your VM might end up on a new system with a different CPU type.
|
||
If the CPU flags passed to the guest are missing, the Qemu process will stop. To
remedy this, Qemu also has its own CPU type *kvm64*, which {pve} uses by default.
kvm64 is a Pentium 4 lookalike CPU type, which has a reduced CPU flags set,
|
||
but is guaranteed to work everywhere.
|
||
|
||
In short, if you care about live migration and moving VMs between nodes, leave
|
||
the kvm64 default. If you don’t care about live migration or have a homogeneous
|
||
cluster where all nodes have the same CPU, set the CPU type to host, as in
|
||
theory this will give your guests maximum performance.
|
||
|
||
Meltdown / Spectre related CPU flags
|
||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||
|
||
There are two CPU flags related to the Meltdown and Spectre vulnerabilities
|
||
footnote:[Meltdown Attack https://meltdownattack.com/] which need to be set
|
||
manually unless the selected CPU type of your VM already enables them by default.
|
||
|
||
The first, called 'pcid', helps to reduce the performance impact of the Meltdown
|
||
mitigation called 'Kernel Page-Table Isolation (KPTI)', which effectively hides
|
||
the Kernel memory from the user space. Without PCID, KPTI is quite an expensive
|
||
mechanism footnote:[PCID is now a critical performance/security feature on x86
|
||
https://groups.google.com/forum/m/#!topic/mechanical-sympathy/L9mHTbeQLNU].
|
||
|
||
The second CPU flag is called 'spec-ctrl', which allows an operating system to
|
||
selectively disable or restrict speculative execution in order to limit the
|
||
ability of attackers to exploit the Spectre vulnerability.
|
||
|
||
There are two requirements that need to be fulfilled in order to use these two
|
||
CPU flags:
|
||
|
||
* The host CPU(s) must support the feature and propagate it to the guest's virtual CPU(s)
|
||
* The guest operating system must be updated to a version which mitigates the
|
||
attacks and is able to utilize the CPU feature
|
||
|
||
In order to use 'spec-ctrl', your CPU or system vendor also needs to provide a
|
||
so-called ``microcode update'' footnote:[You can use `intel-microcode' /
|
||
`amd-microcode' from Debian non-free if your vendor does not provide such an
|
||
update. Note that not all affected CPUs can be updated to support spec-ctrl.]
|
||
for your CPU.
|
||
|
||
To check if the {pve} host supports PCID, execute the following command as root:
|
||
|
||
----
|
||
# grep ' pcid ' /proc/cpuinfo
|
||
----
|
||
|
||
If this command prints any output, your host's CPU has support for 'pcid'.
|
||
|
||
To check if the {pve} host supports spec-ctrl, execute the following command as root:
|
||
|
||
----
|
||
# grep ' spec_ctrl ' /proc/cpuinfo
|
||
----
|
||
|
||
If this command prints any output, your host's CPU has support for 'spec-ctrl'.
|
||
|
||
If you use `host' or another CPU type which enables the desired flags by
|
||
default, and you updated your guest OS to make use of the associated CPU
|
||
features, you're already set.
|
||
|
||
Otherwise you need to set the desired CPU flag of the virtual CPU, either by
|
||
editing the CPU options in the WebUI, or by setting the 'flags' property of the
|
||
'cpu' option in the VM configuration file.
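
The same property can also be set from the command line. The following is a
sketch which assumes that the VM uses the default `kvm64` CPU type and that
the host supports both flags. The quotes are needed because the flags are
separated by `;`, which the shell would otherwise treat as a command
separator.

----
qm set <vmid> -cpu 'kvm64,flags=+pcid;+spec-ctrl'
----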
|
||
|
||
NUMA
|
||
^^^^
|
||
You can also optionally emulate a *NUMA*
|
||
footnote:[https://en.wikipedia.org/wiki/Non-uniform_memory_access] architecture
|
||
in your VMs. The basics of the NUMA architecture mean that instead of having a
|
||
global memory pool available to all your cores, the memory is spread into local
|
||
banks close to each socket.
|
||
This can bring speed improvements as the memory bus is not a bottleneck
|
||
anymore. If your system has a NUMA architecture footnote:[if the command
|
||
`numactl --hardware | grep available` returns more than one node, then your host
|
||
system has a NUMA architecture], we recommend activating the option, as this
|
||
will allow proper distribution of the VM resources on the host system.
|
||
This option is also required to hot-plug cores or RAM in a VM.
|
||
|
||
If the NUMA option is used, it is recommended to set the number of sockets to
|
||
the number of sockets of the host system.
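
The host check mentioned above can be scripted. The helper name
`numa_node_count` is made up for this example; it assumes that the first line
printed by `numactl --hardware` has the form `available: N nodes (...)` and
extracts `N`.

----
# Reads `numactl --hardware` output on stdin and prints the node count.
numa_node_count() {
    awk '/^available:/ { print $2; exit }'
}

# On the host (requires the numactl package):
#   numactl --hardware | numa_node_count
# Self-contained demonstration with sample output:
printf 'available: 2 nodes (0-1)\n' | numa_node_count
----

If the result is greater than one, the option can be enabled with
`qm set <vmid> -numa 1`.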
|
||
|
||
vCPU hot-plug
|
||
^^^^^^^^^^^^^
|
||
|
||
Modern operating systems introduced the capability to hot-plug and, to a
|
||
certain extent, hot-unplug CPUs in a running system. Virtualisation allows us
|
||
to avoid a lot of the (physical) problems real hardware can cause in such
|
||
scenarios.
|
||
Still, this is a rather new and complicated feature, so its use should be
|
||
restricted to cases where it's absolutely needed. Most of the functionality can
|
||
be replicated with other, well tested and less complicated, features, see
|
||
xref:qm_cpu_resource_limits[Resource Limits].
|
||
|
||
In {pve} the maximal number of plugged CPUs is always `cores * sockets`.
|
||
To start a VM with fewer than this total core count of CPUs, you may use the
*vcpus* setting, which denotes how many vCPUs should be plugged in at VM start.
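
As a sketch, the following command defines a VM with one socket and four
cores, of which only two vCPUs are plugged in at start. The numbers are
arbitrary example values.

----
qm set <vmid> -sockets 1 -cores 4 -vcpus 2
----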
|
||
|
||
Currently this feature is only supported on Linux; a kernel newer than 3.10
is needed, and a kernel newer than 4.7 is recommended.
|
||
|
||
You can use a udev rule as follows to automatically set new CPUs as online in
|
||
the guest:
|
||
|
||
----
|
||
SUBSYSTEM=="cpu", ACTION=="add", TEST=="online", ATTR{online}=="0", ATTR{online}="1"
|
||
----
|
||
|
||
Save this under /etc/udev/rules.d/ as a file ending in `.rules`.
|
||
|
||
NOTE: CPU hot-remove is machine dependent and requires guest cooperation.
The deletion command does not guarantee that CPU removal actually happens;
typically it is a request forwarded to the guest using a target-dependent mechanism,
e.g., ACPI on x86/amd64.
|
||
|
||
|
||
[[qm_memory]]
|
||
Memory
|
||
~~~~~~
|
||
|
||
For each VM you have the option to set a fixed memory size or to ask
|
||
{pve} to dynamically allocate memory based on the current RAM usage of the
|
||
host.
|
||
|
||
.Fixed Memory Allocation
|
||
[thumbnail="gui-create-vm-memory.png"]
|
||
|
||
When setting memory and minimum memory to the same amount
|
||
{pve} will simply allocate what you specify to your VM.
|
||
|
||
Even when using a fixed memory size, the ballooning device gets added to the
|
||
VM, because it delivers useful information such as how much memory the guest
|
||
really uses.
|
||
In general, you should leave *ballooning* enabled, but if you want to disable
|
||
it (e.g. for debugging purposes), simply uncheck
|
||
*Ballooning Device* or set
|
||
|
||
balloon: 0
|
||
|
||
in the configuration.
|
||
|
||
.Automatic Memory Allocation
|
||
|
||
// see autoballoon() in pvestatd.pm
|
||
When setting the minimum memory lower than memory, {pve} will make sure that the
|
||
minimum amount you specified is always available to the VM, and if RAM usage on
|
||
the host is below 80%, will dynamically add memory to the guest up to the
|
||
maximum memory specified.
|
||
|
||
When the host is becoming short on RAM, the VM will then release some memory
|
||
back to the host, swapping running processes if needed and starting the OOM
killer as a last resort. The passing around of memory between host and guest is
|
||
done via a special `balloon` kernel driver running inside the guest, which will
|
||
grab or release memory pages from the host.
|
||
footnote:[A good explanation of the inner workings of the balloon driver can be found here https://rwmj.wordpress.com/2010/07/17/virtio-balloon/]
|
||
|
||
When multiple VMs use the autoallocate facility, it is possible to set a
|
||
*Shares* coefficient which indicates the relative amount of the free host memory
|
||
that each VM should take. Suppose for instance you have four VMs, three of them
|
||
running an HTTP server and the last one is a database server. To cache more
|
||
database blocks in the database server RAM, you would like to prioritize the
|
||
database VM when spare RAM is available. For this you assign a Shares property
|
||
of 3000 to the database VM, leaving the other VMs to the Shares default setting
|
||
of 1000. The host server has 32GB of RAM, and is currently using 16GB, leaving
32 * 80/100 - 16 = 9.6GB RAM to be allocated to the VMs. The database VM will get
9.6 * 3000 / (3000 + 1000 + 1000 + 1000) = 4.8 GB extra RAM and each HTTP server
will get 1.6 GB.
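
The calculation can be checked with a few lines of shell. The helper name
`extra_ram` is hypothetical; it only mirrors the formula given in the text
(80% of the host RAM minus the RAM in use, split in proportion to the *Shares*
values) and is not the actual logic used by {pve}. Evaluated without rounding,
the formula yields 9.6 GB of spare RAM for this example.

----
# extra_ram TOTAL_GB USED_GB VM_SHARES SUM_OF_SHARES
# Prints the extra RAM in GB that one VM receives.
extra_ram() {
    awk -v total="$1" -v used="$2" -v shares="$3" -v sum="$4" 'BEGIN {
        spare = total * 80 / 100 - used     # RAM below the 80% threshold
        printf "%.1f\n", spare * shares / sum
    }'
}

sum=$((3000 + 1000 + 1000 + 1000))
extra_ram 32 16 3000 "$sum"   # database VM
extra_ram 32 16 1000 "$sum"   # each HTTP server VM
----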
|
||
|
||
All Linux distributions released after 2010 have the balloon kernel driver
|
||
included. For Windows OSes, the balloon driver needs to be added manually and can
|
||
incur a slowdown of the guest, so we don't recommend using it on critical
|
||
systems.
|
||
// see https://forum.proxmox.com/threads/solved-hyper-threading-vs-no-hyper-threading-fixed-vs-variable-memory.20265/
|
||
|
||
When allocating RAM to your VMs, a good rule of thumb is always to leave 1GB
|
||
of RAM available to the host.
|
||
|
||
|
||
[[qm_network_device]]
|
||
Network Device
|
||
~~~~~~~~~~~~~~
|
||
|
||
[thumbnail="gui-create-vm-network.png"]
|
||
|
||
Each VM can have many _Network interface controllers_ (NIC), of four different
|
||
types:
|
||
|
||
* *Intel E1000* is the default, and emulates an Intel Gigabit network card.
|
||
* the *VirtIO* paravirtualized NIC should be used if you aim for maximum
|
||
performance. Like all VirtIO devices, the guest OS should have the proper driver
|
||
installed.
|
||
* the *Realtek 8139* emulates an older 100 Mbit/s network card, and should
only be used when emulating older operating systems (released before 2002)
|
||
* the *vmxnet3* is another paravirtualized device, which should only be used
|
||
when importing a VM from another hypervisor.
|
||
|
||
{pve} will generate for each NIC a random *MAC address*, so that your VM is
|
||
addressable on Ethernet networks.
|
||
|
||
The NIC you added to the VM can follow one of two different models:
|
||
|
||
* in the default *Bridged mode* each virtual NIC is backed on the host by a
|
||
_tap device_ (a software loopback device simulating an Ethernet NIC). This
|
||
tap device is added to a bridge, by default vmbr0 in {pve}. In this mode, VMs
|
||
have direct access to the Ethernet LAN on which the host is located.
|
||
* in the alternative *NAT mode*, each virtual NIC will only communicate with
|
||
the Qemu user networking stack, where a built-in router and DHCP server can
|
||
provide network access. This built-in DHCP will serve addresses in the private
|
||
10.0.2.0/24 range. The NAT mode is much slower than the bridged mode, and
|
||
should only be used for testing. This mode is only available via CLI or the API,
|
||
but not via the WebUI.
|
||
|
||
You can also skip adding a network device when creating a VM by selecting *No
|
||
network device*.
|
||
|
||
.Multiqueue
|
||
If you are using the VirtIO driver, you can optionally activate the
|
||
*Multiqueue* option. This option allows the guest OS to process networking
|
||
packets using multiple virtual CPUs, providing an increase in the total number
|
||
of packets transferred.
|
||
|
||
//http://blog.vmsplice.net/2011/09/qemu-internals-vhost-architecture.html
|
||
When using the VirtIO driver with {pve}, each NIC network queue is passed to the
|
||
host kernel, where the queue will be processed by a kernel thread spawned by the
|
||
vhost driver. With this option activated, it is possible to pass _multiple_
|
||
network queues to the host kernel for each NIC.
|
||
|
||
//https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Virtualization_Tuning_and_Optimization_Guide/sect-Virtualization_Tuning_Optimization_Guide-Networking-Techniques.html#sect-Virtualization_Tuning_Optimization_Guide-Networking-Multi-queue_virtio-net
|
||
When using Multiqueue, it is recommended to set it to a value equal
|
||
to the number of Total Cores of your guest. You also need to set in
|
||
the VM the number of multi-purpose channels on each VirtIO NIC with the ethtool
|
||
command:
|
||
|
||
`ethtool -L ens1 combined X`
|
||
|
||
where X is the number of vCPUs of the VM.
|
||
|
||
You should note that setting the Multiqueue parameter to a value greater
|
||
than one will increase the CPU load on the host and guest systems as the
|
||
traffic increases. We recommend setting this option only when the VM has to
|
||
process a great number of incoming connections, such as when the VM is running
|
||
as a router, reverse proxy or a busy HTTP server doing long polling.
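
Putting the steps together for a guest with four vCPUs might look like the
following sketch. The queue count `4`, the bridge `vmbr0` and the guest
interface name `ens1` are example values. `<mac>` stands for the existing MAC
address of the NIC, which you probably want to keep so that the guest does not
see a new network card.

----
# On the host: enable four queues on the first NIC of the VM
qm set <vmid> -net0 virtio=<mac>,bridge=vmbr0,queues=4

# Inside the guest: use all four queues
ethtool -L ens1 combined 4
----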
|
||
|
||
|
||
[[qm_usb_passthrough]]
|
||
USB Passthrough
|
||
~~~~~~~~~~~~~~~
|
||
|
||
There are two different types of USB passthrough devices:
|
||
|
||
* Host USB passthrough
|
||
* SPICE USB passthrough
|
||
|
||
Host USB passthrough works by giving a VM a USB device of the host.
|
||
This can either be done via the vendor- and product-id, or
|
||
via the host bus and port.
|
||
|
||
The vendor/product-id looks like this: *0123:abcd*,
|
||
where *0123* is the id of the vendor, and *abcd* is the id
|
||
of the product, meaning two devices of the same model
have the same id.
|
||
|
||
The bus/port looks like this: *1-2.3.4*, where *1* is the bus
|
||
and *2.3.4* is the port path. This represents the physical
|
||
ports of your host (depending on the internal order of the
USB controllers).
|
||
|
||
If a device is present in a VM configuration when the VM starts up,
|
||
but the device is not present in the host, the VM can boot without problems.
|
||
As soon as the device/port is available in the host, it gets passed through.
|
||
|
||
WARNING: Using this kind of USB passthrough means that you cannot move
|
||
a VM online to another host, since the hardware is only available
|
||
on the host where the VM currently resides.
|
||
|
||
The second type of passthrough is SPICE USB passthrough. This is useful
|
||
if you use a SPICE client which supports it. If you add a SPICE USB port
|
||
to your VM, you can pass through a USB device from where your SPICE client is,
|
||
directly to the VM (for example an input device or hardware dongle).
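
From the command line, the three variants can be configured roughly as
follows. The slot names `usb0` to `usb2` are arbitrary choices for this
sketch, and the IDs are the illustrative values used in the text above, not
real devices.

----
# Host passthrough by vendor/product id
qm set <vmid> -usb0 host=0123:abcd

# Host passthrough by bus and port path
qm set <vmid> -usb1 host=1-2.3.4

# SPICE USB port
qm set <vmid> -usb2 spice
----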
|
||
|
||
|
||
[[qm_bios_and_uefi]]
|
||
BIOS and UEFI
|
||
~~~~~~~~~~~~~
|
||
|
||
In order to properly emulate a computer, QEMU needs to use a firmware.
|
||
By default QEMU uses *SeaBIOS* for this, which is an open-source, x86 BIOS
|
||
implementation. SeaBIOS is a good choice for most standard setups.
|
||
|
||
There are, however, some scenarios in which a BIOS is not a good firmware
|
||
to boot from, e.g. if you want to do VGA passthrough. footnote:[Alex Williamson has a very good blog entry about this.
|
||
http://vfio.blogspot.co.at/2014/08/primary-graphics-assignment-without-vga.html]
|
||
In such cases, you should rather use *OVMF*, which is an open-source UEFI implementation. footnote:[See the OVMF Project http://www.tianocore.org/ovmf/]
|
||
|
||
If you want to use OVMF, there are several things to consider:
|
||
|
||
In order to save things like the *boot order*, there needs to be an EFI Disk.
|
||
This disk will be included in backups and snapshots, and there can only be one.
|
||
|
||
You can create such a disk with the following command:
|
||
|
||
qm set <vmid> -efidisk0 <storage>:1,format=<format>
|
||
|
||
Where *<storage>* is the storage where you want to have the disk, and
|
||
*<format>* is a format which the storage supports. Alternatively, you can
|
||
create such a disk through the web interface with 'Add' -> 'EFI Disk' in the
|
||
hardware section of a VM.
|
||
|
||
When using OVMF with a virtual display (without VGA passthrough),
|
||
you need to set the client resolution in the OVMF menu (which you can reach
|
||
with a press of the ESC button during boot), or you have to choose
|
||
SPICE as the display type.
|
||
|
||
[[qm_startup_and_shutdown]]
|
||
Automatic Start and Shutdown of Virtual Machines
|
||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||
|
||
After creating your VMs, you probably want them to start automatically
|
||
when the host system boots. For this you need to select the option 'Start at
|
||
boot' from the 'Options' Tab of your VM in the web interface, or set it with
|
||
the following command:
|
||
|
||
qm set <vmid> -onboot 1
|
||
|
||
.Start and Shutdown Order
|
||
|
||
[thumbnail="gui-qemu-edit-start-order.png"]
|
||
|
||
In some cases you want to be able to fine tune the boot order of your
VMs, for instance if one of your VMs is providing firewalling or DHCP
|
||
to other guest systems. For this you can use the following
|
||
parameters:
|
||
|
||
* *Start/Shutdown order*: Defines the start order priority. E.g. set it to 1 if
|
||
you want the VM to be the first to be started. (We use the reverse startup
|
||
order for shutdown, so a machine with a start order of 1 would be the last to
|
||
be shut down). If multiple VMs have the same order defined on a host, they will
|
||
additionally be ordered by 'VMID' in ascending order.
|
||
* *Startup delay*: Defines the interval between the start of this VM and the
start of subsequent VMs. E.g. set it to 240 if you want to wait 240 seconds
before starting other VMs.
|
||
* *Shutdown timeout*: Defines the duration in seconds {pve} should wait
|
||
for the VM to be offline after issuing a shutdown command.
|
||
By default this value is set to 180, which means that {pve} will issue a
|
||
shutdown request and wait 180 seconds for the machine to be offline. If
|
||
the machine is still online after the timeout it will be stopped forcefully.
|
||
|
||
NOTE: VMs managed by the HA stack do not follow the 'start on boot' and
|
||
'boot order' options currently. Those VMs will be skipped by the startup and
|
||
shutdown algorithm as the HA manager itself ensures that VMs get started and
|
||
stopped.
|
||
|
||
Please note that machines without a Start/Shutdown order parameter will always
|
||
start after those where the parameter is set. Further, this parameter can only
|
||
be enforced between virtual machines running on the same host, not
|
||
cluster-wide.
|
||
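
The ordering rules above can be illustrated with a short Python sketch. It is
only a model of the documented behaviour, not the code {pve} runs: the `VM`
class and both function names are invented for this example, and sorting VMs
without an order by 'VMID' among themselves is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VM:
    vmid: int
    order: Optional[int] = None  # Start/Shutdown order; None means "not set"


def startup_sequence(vms):
    """VMs with an order come first (ascending), ties are broken by VMID.

    VMs without an order start last; sorting those by VMID is an assumption.
    """
    return sorted(vms, key=lambda vm: (vm.order is None, vm.order or 0, vm.vmid))


def shutdown_sequence(vms):
    """Shutdown uses the reverse of the startup sequence."""
    return list(reversed(startup_sequence(vms)))


vms = [VM(105), VM(101, order=2), VM(103, order=1), VM(102, order=1)]
print([vm.vmid for vm in startup_sequence(vms)])   # [102, 103, 101, 105]
print([vm.vmid for vm in shutdown_sequence(vms)])  # [105, 101, 103, 102]
```

VMs 102 and 103 share order 1, so the lower 'VMID' starts first; VM 105 has
no order and therefore starts last and is shut down first.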
|
||
|
||
[[qm_migration]]
|
||
Migration
|
||
---------
|
||
|
||
[thumbnail="gui-qemu-migrate.png"]
|
||
|
||
If you have a cluster, you can migrate your VM to another host with
|
||
|
||
qm migrate <vmid> <target>
|
||
|
||
There are generally two mechanisms for this:
|
||
|
||
* Online Migration (aka Live Migration)
|
||
* Offline Migration
|
||
|
||
Online Migration
|
||
~~~~~~~~~~~~~~~~
|
||
|
||
When your VM is running and it has no local resources defined (such as disks
|
||
on local storage, passed through devices, etc.) you can initiate a live
|
||
migration with the -online flag.
|
||
|
||
How it works
|
||
^^^^^^^^^^^^
|
||
|
||
This starts a Qemu Process on the target host with the 'incoming' flag, which
|
||
means that the process starts and waits for the memory data and device states
|
||
from the source Virtual Machine (since all other resources, e.g. disks,
|
||
are shared, the memory content and device state are the only things left
|
||
to transmit).
|
||
|
||
Once this connection is established, the source begins to send the memory
content asynchronously to the target. If the memory on the source changes,
those sections are marked dirty and there will be another pass of sending data.
This repeats until the amount of data left to send is so small that the source
can pause the VM, send the remaining data to the target and start
the VM on the target in under a second.
|
||
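
The iterative copy described above can be modelled in a few lines of Python.
This is a simplified simulation and not what Qemu actually executes: the
function name, the constant dirty rate, the page-based units and the
`max_rounds` cut-off are all assumptions made for illustration.

```python
def precopy_rounds(total_pages, dirty_rate, bandwidth, downtime_limit, max_rounds=30):
    """Simulate iterative pre-copy of guest memory.

    total_pages    -- guest memory size in pages
    dirty_rate     -- pages the guest dirties per second (assumed constant)
    bandwidth      -- pages the link can transfer per second
    downtime_limit -- longest acceptable pause of the VM, in seconds
    Returns the pages sent in each pass; the last entry is the final pass
    that is sent while the VM is paused.
    """
    rounds = []
    remaining = total_pages
    for _ in range(max_rounds):
        # Stop iterating once the rest fits into the allowed pause.
        if remaining / bandwidth <= downtime_limit:
            break
        rounds.append(remaining)
        send_time = remaining / bandwidth
        # Pages dirtied while this pass was in flight must be sent again.
        remaining = min(total_pages, int(dirty_rate * send_time))
    rounds.append(remaining)  # VM is paused and the remainder is sent
    return rounds


print(precopy_rounds(1_000_000, 10_000, 100_000, 0.5))
# [1000000, 100000, 10000]
```

In this model each pass shrinks by the ratio of dirty rate to bandwidth, so a
guest that dirties memory as fast as the link can transfer it never
converges. The sketch simply gives up after `max_rounds` passes in that case.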
|
||
Requirements
|
||
^^^^^^^^^^^^
|
||
|
||
Live migration has the following requirements:
|
||
|
||
* The VM has no local resources (e.g. passed through devices, local disks, etc.)
|
||
* The hosts are in the same {pve} cluster.
|
||
* The hosts have a working (and reliable) network connection.
|
||
* The target host must have the same or higher versions of the
{pve} packages. (It *might* work the other way, but this is never guaranteed.)
|
||
|
||
Offline Migration
|
||
~~~~~~~~~~~~~~~~~
|
||
|
||
If you have local resources, you can still migrate your VMs offline,
as long as all disks are on storages that are defined on both hosts.
The migration will then copy the disks over the network to the target host.
|
||
|
||
[[qm_copy_and_clone]]
|
||
Copies and Clones
|
||
-----------------
|
||
|
||
[thumbnail="gui-qemu-full-clone.png"]
|
||
|
||
VM installation is usually done using an installation medium (CD-ROM)
from the operating system vendor. Depending on the OS, this can be a
time-consuming task one might want to avoid.
|
||
|
||
An easy way to deploy many VMs of the same type is to copy an existing
|
||
VM. We use the term 'clone' for such copies, and distinguish between
|
||
'linked' and 'full' clones.
|
||
|
||
Full Clone::
|
||
|
||
The result of such a copy is an independent VM. The
new VM does not share any storage resources with the original.
|
||
+
|
||
|
||
It is possible to select a *Target Storage*, so one can use this to
|
||
migrate a VM to a totally different storage. You can also change the
|
||
disk image *Format* if the storage driver supports several formats.
|
||
+
|
||
|
||
NOTE: A full clone needs to read and copy all VM image data. This is
usually much slower than creating a linked clone.
|
||
+
|
||
|
||
Some storage types allow copying a specific *Snapshot*, which
defaults to the 'current' VM data. This also means that the final copy
never includes any additional snapshots from the original VM.
|
||
|
||
|
||
Linked Clone::
|
||
|
||
Modern storage drivers support a way to generate fast linked
clones. Such a clone is a writable copy whose initial contents are the
same as the original data. Creating a linked clone is nearly
instantaneous, and initially consumes no additional space.
|
||
+
|
||
|
||
They are called 'linked' because the new image still refers to the
original. Unmodified data blocks are read from the original image, but
modifications are written to (and afterwards read from) a new
location. This technique is called 'Copy-on-write'.
|
||
+
|
||
|
||
This requires that the original volume is read-only. With {pve} one
can convert any VM into a read-only <<qm_templates, Template>>. Such
templates can later be used to create linked clones efficiently.
|
||
+
|
||
|
||
NOTE: You cannot delete the original template while linked clones
exist.
|
||
+
|
||
|
||
It is not possible to change the *Target storage* for linked clones,
because this is a storage-internal feature.
|
||
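
The copy-on-write behaviour of linked clones can be illustrated with a small
Python model. This is a conceptual sketch only: the `LinkedClone` class and
its block-indexed interface are invented for this example and do not reflect
how any {pve} storage driver is implemented.

```python
class LinkedClone:
    """Writable view on top of a read-only template image."""

    def __init__(self, base_blocks):
        self._base = base_blocks  # shared, read-only template blocks
        self._delta = {}          # blocks written by this clone only

    def read(self, index):
        # Modified blocks come from the clone, everything else from the template.
        return self._delta.get(index, self._base[index])

    def write(self, index, data):
        # The template is never touched; the change goes to a new location.
        self._delta[index] = data

    def used_blocks(self):
        return len(self._delta)


template = (b"boot", b"root", b"data")  # a tuple, so it cannot be modified
clone_a = LinkedClone(template)
clone_b = LinkedClone(template)

clone_a.write(1, b"root-modified")
print(clone_a.read(1))        # b'root-modified'
print(clone_b.read(1))        # b'root'
print(clone_a.used_blocks())  # 1
```

A fresh clone stores nothing of its own, which is why creation is nearly
instantaneous, and every read of an unmodified block depends on the template
still being present.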
|
||
|
||
The *Target node* option allows you to create the new VM on a
|
||
different node. The only restriction is that the VM is on shared
|
||
storage, and that storage is also available on the target node.
|
||
|
||
To avoid resource conflicts, all network interface MAC addresses get
randomized, and we generate a new 'UUID' for the VM BIOS (smbios1)
setting.
|
||
|
||
|
||
[[qm_templates]]
|
||
Virtual Machine Templates
|
||
-------------------------
|
||
|
||
One can convert a VM into a Template. Such templates are read-only,
|
||
and you can use them to create linked clones.
|
||
|
||
NOTE: It is not possible to start templates, because this would modify
|
||
the disk images. If you want to change the template, create a linked
|
||
clone and modify that.
|
||
|
||
Importing Virtual Machines and disk images
|
||
------------------------------------------
|
||
|
||
A VM export from a foreign hypervisor usually takes the form of one or more disk
images, with a configuration file describing the settings of the VM (RAM,
number of cores). +
The disk images can be in the vmdk format, if the disks come from
VMware or VirtualBox, or qcow2 if the disks come from a KVM hypervisor.
The most popular configuration format for VM exports is the OVF standard, but in
practice interoperation is limited because many settings are not implemented in
the standard itself, and hypervisors export the supplementary information
in non-standard extensions.
|
||
|
||
Besides the problem of format, importing disk images from other hypervisors
may fail if the emulated hardware changes too much from one hypervisor to
another. Windows VMs are particularly affected by this, as the OS is very
picky about any changes of hardware. This problem may be solved by
installing the MergeIDE.zip utility available from the Internet before exporting
and choosing a hard disk type of *IDE* before booting the imported Windows VM.
|
||
|
||
Finally there is the question of paravirtualized drivers, which improve the
|
||
speed of the emulated system and are specific to the hypervisor.
|
||
GNU/Linux and other free Unix OSes have all the necessary drivers installed by
|
||
default and you can switch to the paravirtualized drivers right after importing
|
||
the VM. For Windows VMs, you need to install the Windows paravirtualized
drivers yourself.
|
||
|
||
GNU/Linux and other free Unix OSes can usually be imported without hassle. Note
that we cannot guarantee a successful import/export of Windows VMs in all
cases due to the problems above.
|
||
|
||
Step-by-step example of a Windows OVF import
|
||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||
|
||
Microsoft provides
https://developer.microsoft.com/en-us/windows/downloads/virtual-machines/[Virtual Machines downloads]
to get started with Windows development. We are going to use one of these
to demonstrate the OVF import feature.
|
||
|
||
Download the Virtual Machine zip
|
||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||
|
||
After being informed about the user agreement, choose the _Windows 10
Enterprise (Evaluation - Build)_ for the VMware platform, and download the zip.
|
||
|
||
Extract the disk image from the zip
|
||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||
|
||
Using the `unzip` utility or any archiver of your choice, unpack the zip,
|
||
and copy via ssh/scp the ovf and vmdk files to your {pve} host.
|
||
|
||
Import the Virtual Machine
|
||
^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||
|
||
This will create a new virtual machine, using cores, memory and
|
||
VM name as read from the OVF manifest, and import the disks to the +local-lvm+
|
||
storage. You have to configure the network manually.
|
||
|
||
qm importovf 999 WinDev1709Eval.ovf local-lvm
|
||
|
||
The VM is ready to be started.
|
||
|
||
Adding an external disk image to a Virtual Machine
|
||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||
|
||
You can also add an existing disk image to a VM, either coming from a
|
||
foreign hypervisor, or one that you created yourself.
|
||
|
||
Suppose you created a Debian/Ubuntu disk image with the 'vmdebootstrap' tool:
|
||
|
||
 vmdebootstrap --verbose \
  --size 10GiB --serial-console \
  --grub --no-extlinux \
  --package openssh-server \
  --package avahi-daemon \
  --package qemu-guest-agent \
  --hostname vm600 --enable-dhcp \
  --customize=./copy_pub_ssh.sh \
  --sparse --image vm600.raw
|
||
|
||
You can now create a new target VM for this image.
|
||
|
||
 qm create 600 --net0 virtio,bridge=vmbr0 --name vm600 --serial0 socket \
   --bootdisk scsi0 --scsihw virtio-scsi-pci --ostype l26
|
||
|
||
Add the disk image as +unused0+ to the VM, using the storage +pvedir+:
|
||
|
||
qm importdisk 600 vm600.raw pvedir
|
||
|
||
Finally attach the unused disk to the SCSI controller of the VM:
|
||
|
||
qm set 600 --scsi0 pvedir:600/vm-600-disk-1.raw
|
||
|
||
The VM is ready to be started.
|
||
|
||
|
||
ifndef::wiki[]
|
||
include::qm-cloud-init.adoc[]
|
||
endif::wiki[]
|
||
|
||
|
||
|
||
Managing Virtual Machines with `qm`
|
||
------------------------------------
|
||
|
||
`qm` is the tool to manage Qemu/KVM virtual machines on {pve}. You can
create and destroy virtual machines, and control execution
(start/stop/suspend/resume). Besides that, you can use `qm` to set
parameters in the associated config file. It is also possible to
create and delete virtual disks.
|
||
|
||
CLI Usage Examples
|
||
~~~~~~~~~~~~~~~~~~
|
||
|
||
Using an iso file uploaded on the 'local' storage, create a VM
|
||
with a 4 GB IDE disk on the 'local-lvm' storage
|
||
|
||
qm create 300 -ide0 local-lvm:4 -net0 e1000 -cdrom local:iso/proxmox-mailgateway_2.1.iso
|
||
|
||
Start the new VM
|
||
|
||
qm start 300
|
||
|
||
Send a shutdown request, then wait until the VM is stopped.
|
||
|
||
qm shutdown 300 && qm wait 300
|
||
|
||
Same as above, but only wait for 40 seconds.
|
||
|
||
qm shutdown 300 && qm wait 300 -timeout 40
|
||
|
||
|
||
[[qm_configuration]]
|
||
Configuration
|
||
-------------
|
||
|
||
VM configuration files are stored inside the Proxmox cluster file
|
||
system, and can be accessed at `/etc/pve/qemu-server/<VMID>.conf`.
|
||
Like other files stored inside `/etc/pve/`, they get automatically
|
||
replicated to all other cluster nodes.
|
||
|
||
NOTE: VMIDs < 100 are reserved for internal purposes, and VMIDs need to be
unique cluster-wide.
|
||
|
||
.Example VM Configuration
|
||
----
cores: 1
sockets: 1
memory: 512
name: webmail
ostype: l26
bootdisk: virtio0
net0: e1000=EE:D2:28:5F:B6:3E,bridge=vmbr0
virtio0: local:vm-100-disk-1,size=32G
----
|
||
|
||
Those configuration files are simple text files, and you can edit them
|
||
using a normal text editor (`vi`, `nano`, ...). This is sometimes
|
||
useful to do small corrections, but keep in mind that you need to
|
||
restart the VM to apply such changes.
|
||
|
||
For that reason, it is usually better to use the `qm` command to
generate and modify those files, or do the whole thing using the GUI.
Our toolkit is smart enough to instantaneously apply most changes to
a running VM. This feature is called "hot plug", and there is no
need to restart the VM in that case.
|
||
|
||
|
||
File Format
|
||
~~~~~~~~~~~
|
||
|
||
VM configuration files use a simple colon separated key/value
|
||
format. Each line has the following format:
|
||
|
||
-----
# this is a comment
OPTION: value
-----
|
||
|
||
Blank lines in those files are ignored, and lines starting with a `#`
|
||
character are treated as comments and are also ignored.
|
||
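
To make the format concrete, here is a minimal Python sketch that reads such
a file into a dictionary. It is not the parser {pve} uses; the function name
`parse_vm_config` is hypothetical and the sketch only applies the rules
stated above (colon-separated key/value pairs, blank lines and `#` comments
ignored).

```python
def parse_vm_config(text):
    """Parse 'OPTION: value' lines into a dict, skipping blanks and comments."""
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only: values may contain colons themselves.
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a key/value line: {raw!r}")
        config[key.strip()] = value.strip()
    return config


example = """\
# this is a comment
cores: 1
memory: 512

net0: e1000=EE:D2:28:5F:B6:3E,bridge=vmbr0
"""

config = parse_vm_config(example)
print(config["memory"])  # 512
print(config["net0"])    # e1000=EE:D2:28:5F:B6:3E,bridge=vmbr0
```

Splitting on the first colon only matters for values such as MAC addresses
or `storage:volume` references, which contain colons of their own.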
|
||
|
||
[[qm_snapshots]]
|
||
Snapshots
|
||
~~~~~~~~~
|
||
|
||
When you create a snapshot, `qm` stores the configuration at snapshot
|
||
time into a separate snapshot section within the same configuration
|
||
file. For example, after creating a snapshot called ``testsnapshot'',
|
||
your configuration file will look like this:
|
||
|
||
.VM configuration with snapshot
|
||
----
memory: 512
swap: 512
parent: testsnapshot
...

[testsnapshot]
memory: 512
swap: 512
snaptime: 1457170803
...
----
|
||
|
||
There are a few snapshot related properties like `parent` and
|
||
`snaptime`. The `parent` property is used to store the parent/child
|
||
relationship between snapshots. `snaptime` is the snapshot creation
|
||
time stamp (Unix epoch).
|
||
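
The following Python sketch shows how the `parent` property links the
running configuration to its snapshots. It is an illustration under stated
assumptions, not {pve} code: both function names are hypothetical, the key
`current` for the part of the file before the first section is an arbitrary
choice, and the snapshot names and the second `snaptime` value in the sample
are made up.

```python
def parse_sections(text):
    """Split a VM config into the running config and named snapshot sections."""
    sections = {"current": {}}  # "current" is an arbitrary key for this sketch
    name = "current"
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            name = line[1:-1]
            sections[name] = {}
            continue
        key, sep, value = line.partition(":")
        if sep:
            sections[name][key.strip()] = value.strip()
    return sections


def snapshot_chain(sections, start="current"):
    """Follow 'parent' links from the running config back to the oldest snapshot."""
    chain, seen = [], set()
    parent = sections[start].get("parent")
    while parent is not None and parent in sections and parent not in seen:
        chain.append(parent)
        seen.add(parent)
        parent = sections[parent].get("parent")
    return chain


sample = """\
memory: 512
parent: second

[first]
memory: 256
snaptime: 1457170803

[second]
memory: 512
parent: first
snaptime: 1457170900
"""

sections = parse_sections(sample)
print(snapshot_chain(sections))  # ['second', 'first']
```

The `seen` set only guards the sketch against a malformed file in which the
`parent` entries form a loop.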
|
||
|
||
[[qm_options]]
|
||
Options
|
||
~~~~~~~
|
||
|
||
include::qm.conf.5-opts.adoc[]
|
||
|
||
|
||
Locks
|
||
-----
|
||
|
||
Online migrations, snapshots and backups (`vzdump`) set a lock to
|
||
prevent incompatible concurrent actions on the affected VMs. Sometimes
|
||
you need to remove such a lock manually (e.g., after a power failure).
|
||
|
||
qm unlock <vmid>
|
||
|
||
CAUTION: Only do that if you are sure the action which set the lock is
|
||
no longer running.
|
||
|
||
|
||
ifdef::wiki[]
|
||
|
||
See Also
|
||
~~~~~~~~
|
||
|
||
* link:/wiki/Cloud-Init_Support[Cloud-Init Support]
|
||
|
||
endif::wiki[]
|
||
|
||
|
||
ifdef::manvolnum[]
|
||
|
||
Files
|
||
------
|
||
|
||
`/etc/pve/qemu-server/<VMID>.conf`::
|
||
|
||
Configuration file for the VM '<VMID>'.
|
||
|
||
|
||
include::pve-copyright.adoc[]
|
||
endif::manvolnum[]
|