mirror of
git://git.proxmox.com/git/pve-docs.git
synced 2025-01-10 01:17:51 +03:00
554 lines
17 KiB
Plaintext
554 lines
17 KiB
Plaintext
ifdef::manvolnum[]
|
|
PVE({manvolnum})
|
|
================
|
|
include::attributes.txt[]
|
|
|
|
NAME
|
|
----
|
|
|
|
pct - Tool to manage Linux Containers (LXC) on Proxmox VE
|
|
|
|
|
|
SYNOPSYS
|
|
--------
|
|
|
|
include::pct.1-synopsis.adoc[]
|
|
|
|
DESCRIPTION
|
|
-----------
|
|
endif::manvolnum[]
|
|
|
|
ifndef::manvolnum[]
|
|
Proxmox Container Toolkit
|
|
=========================
|
|
include::attributes.txt[]
|
|
endif::manvolnum[]
|
|
|
|
|
|
Containers are a lightweight alternative to fully virtualized
|
|
VMs. Instead of emulating a complete Operating System (OS), containers
|
|
simply use the OS of the host they run on. This implies that all
|
|
containers use the same kernel, and that they can access resources
|
|
from the host directly.
|
|
|
|
This is great because containers do not waste CPU power nor memory due
|
|
to kernel emulation. Container run-time costs are close to zero and
|
|
usually negligible. But there are also some drawbacks you need to
|
|
consider:
|
|
|
|
* You can only run Linux based OS inside containers, i.e. it is not
|
|
possible to run FreeBSD or MS Windows inside.
|
|
|
|
* For security reasons, access to host resources needs to be
|
|
restricted. This is done with AppArmor, SecComp filters and other
|
|
kernel features. Be prepared that some syscalls are not allowed
|
|
inside containers.
|
|
|
|
{pve} uses https://linuxcontainers.org/[LXC] as underlying container
|
|
technology. We consider LXC as low-level library, which provides
|
|
countless options. It would be too difficult to use those tools
|
|
directly. Instead, we provide a small wrapper called `pct`, the
|
|
"Proxmox Container Toolkit".
|
|
|
|
The toolkit is tightly coupled with {pve}. That means that it is aware
|
|
of the cluster setup, and it can use the same network and storage
|
|
resources as fully virtualized VMs. You can even use the {pve}
|
|
firewall, or manage containers using the HA framework.
|
|
|
|
Our primary goal is to offer an environment as one would get from a
|
|
VM, but without the additional overhead. We call this "System
|
|
Containers".
|
|
|
|
NOTE: If you want to run micro-containers (with docker, rct, ...), it
|
|
is best to run them inside a VM.
|
|
|
|
|
|
Security Considerations
|
|
-----------------------
|
|
|
|
Containers use the same kernel as the host, so there is a big attack
|
|
surface for malicious users. You should consider this fact if you
|
|
provide containers to totally untrusted people. In general, fully
|
|
virtualized VMs provide better isolation.
|
|
|
|
The good news is that LXC uses many kernel security features like
|
|
AppArmor, CGroups and PID and user namespaces, which makes containers
|
|
usage quite secure. We distinguish two types of containers:
|
|
|
|
Privileged containers
|
|
~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
Security is done by dropping capabilities, using mandatory access
|
|
control (AppArmor), SecComp filters and namespaces. The LXC team
|
|
considers this kind of container as unsafe, and they will not consider
|
|
new container escape exploits to be security issues worthy of a CVE
|
|
and quick fix. So you should use this kind of containers only inside a
|
|
trusted environment, or when no untrusted task is running as root in
|
|
the container.
|
|
|
|
Unprivileged containers
|
|
~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
This kind of containers use a new kernel feature called user
|
|
namespaces. The root uid 0 inside the container is mapped to an
|
|
unprivileged user outside the container. This means that most security
|
|
issues (container escape, resource abuse, ...) in those containers
|
|
will affect a random unprivileged user, and so would be a generic
|
|
kernel security bug rather than an LXC issue. The LXC team thinks
|
|
unprivileged containers are safe by design.
|
|
|
|
|
|
Configuration
|
|
-------------
|
|
|
|
The '/etc/pve/lxc/<CTID>.conf' file stores container configuration,
|
|
where '<CTID>' is the numeric ID of the given container. Like all
|
|
other files stored inside '/etc/pve/', they get automatically
|
|
replicated to all other cluster nodes.
|
|
|
|
NOTE: CTIDs < 100 are reserved for internal purposes, and CTIDs need to be
|
|
unique cluster wide.
|
|
|
|
.Example Container Configuration
|
|
----
|
|
ostype: debian
|
|
arch: amd64
|
|
hostname: www
|
|
memory: 512
|
|
swap: 512
|
|
net0: bridge=vmbr0,hwaddr=66:64:66:64:64:36,ip=dhcp,name=eth0,type=veth
|
|
rootfs: local:107/vm-107-disk-1.raw,size=7G
|
|
----
|
|
|
|
Those configuration files are simple text files, and you can edit them
|
|
using a normal text editor ('vi', 'nano', ...). This is sometimes
|
|
useful to do small corrections, but keep in mind that you need to
|
|
restart the container to apply such changes.
|
|
|
|
For that reason, it is usually better to use the 'pct' command to
|
|
generate and modify those files, or do the whole thing using the GUI.
|
|
Our toolkit is smart enough to instantaneously apply most changes to
|
|
running containers. This feature is called "hot plug", and there is no
|
|
need to restart the container in that case.
|
|
|
|
File Format
|
|
~~~~~~~~~~~
|
|
|
|
Container configuration files use a simple colon separated key/value
|
|
format. Each line has the following format:
|
|
|
|
# this is a comment
|
|
OPTION: value
|
|
|
|
Blank lines in those files are ignored, and lines starting with a '#'
|
|
character are treated as comments and are also ignored.
|
|
|
|
It is possible to add low-level, LXC style configuration directly, for
|
|
example:
|
|
|
|
lxc.init_cmd: /sbin/my_own_init
|
|
|
|
or
|
|
|
|
lxc.init_cmd = /sbin/my_own_init
|
|
|
|
Those settings are directly passed to the LXC low-level tools.
|
|
|
|
Snapshots
|
|
~~~~~~~~~
|
|
|
|
When you create a snapshot, 'pct' stores the configuration at snapshot
|
|
time into a separate snapshot section within the same configuration
|
|
file. For example, after creating a snapshot called 'testsnapshot',
|
|
your configuration file will look like this:
|
|
|
|
.Container Configuration with Snapshot
|
|
----
|
|
memory: 512
|
|
swap: 512
|
|
parent: testsnaphot
|
|
...
|
|
|
|
[testsnaphot]
|
|
memory: 512
|
|
swap: 512
|
|
snaptime: 1457170803
|
|
...
|
|
----
|
|
|
|
There are a few snapshot related properties like 'parent' and
|
|
'snaptime'. The 'parent' property is used to store the parent/child
|
|
relationship between snapshots. 'snaptime' is the snapshot creation
|
|
time stamp (unix epoch).
|
|
|
|
Guest Operating System Configuration
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
We normally try to detect the operating system type inside the
|
|
container, and then modify some files inside the container to make
|
|
them work as expected. Here is a short list of things we do at
|
|
container startup:
|
|
|
|
set /etc/hostname:: to set the container name
|
|
|
|
modify /etc/hosts:: to allow lookup of the local hostname
|
|
|
|
network setup:: pass the complete network setup to the container
|
|
|
|
configure DNS:: pass information about DNS servers
|
|
|
|
adapt the init system:: for example, fix the number of spawned getty processes
|
|
|
|
set the root password:: when creating a new container
|
|
|
|
rewrite ssh_host_keys:: so that each container has unique keys
|
|
|
|
randomize crontab:: so that cron does not start at the same time on all containers
|
|
|
|
Changes made by {PVE} are enclosed by comment markers:
|
|
|
|
----
|
|
# --- BEGIN PVE ---
|
|
<data>
|
|
# --- END PVE ---
|
|
----
|
|
|
|
Those markers will be inserted at a reasonable location in the
|
|
file. If such a section already exists, it will be updated in place
|
|
and will not be moved.
|
|
|
|
Modification of a file can be prevented by adding a `.pve-ignore.`
|
|
file for it. For instance, if the file `/etc/.pve-ignore.hosts`
|
|
exists then the `/etc/hosts` file will not be touched. This can be a
|
|
simple empty file creatd via:
|
|
|
|
# touch /etc/.pve-ignore.hosts
|
|
|
|
Most modifications are OS dependent, so they differ between different
|
|
distributions and versions. You can completely disable modifications
|
|
by manually setting the 'ostype' to 'unmanaged'.
|
|
|
|
OS type detection is done by testing for certain files inside the
|
|
container:
|
|
|
|
Ubuntu:: inspect /etc/lsb-release ('DISTRIB_ID=Ubuntu')
|
|
|
|
Debian:: test /etc/debian_version
|
|
|
|
Fedora:: test /etc/fedora-release
|
|
|
|
RedHat or CentOS:: test /etc/redhat-release
|
|
|
|
ArchLinux:: test /etc/arch-release
|
|
|
|
Alpine:: test /etc/alpine-release
|
|
|
|
Gentoo:: test /etc/gentoo-release
|
|
|
|
NOTE: Container start fails if the configured 'ostype' differs from the auto
|
|
detected type.
|
|
|
|
Options
|
|
~~~~~~~
|
|
|
|
include::pct.conf.5-opts.adoc[]
|
|
|
|
|
|
Container Images
|
|
----------------
|
|
|
|
Container Images, sometimes also referred to as "templates" or
|
|
"appliances", are 'tar' archives which contain everything to run a
|
|
container. You can think of it as a tidy container backup. Like most
|
|
modern container toolkits, 'pct' uses those images when you create a
|
|
new container, for example:
|
|
|
|
pct create 999 local:vztmpl/debian-8.0-standard_8.0-1_amd64.tar.gz
|
|
|
|
Proxmox itself ships a set of basic templates for most common
|
|
operating systems, and you can download them using the 'pveam' (short
|
|
for {pve} Appliance Manager) command line utility. You can also
|
|
download https://www.turnkeylinux.org/[TurnKey Linux] containers using
|
|
that tool (or the graphical user interface).
|
|
|
|
Our image repositories contain a list of available images, and there
|
|
is a cron job run each day to download that list. You can trigger that
|
|
update manually with:
|
|
|
|
pveam update
|
|
|
|
After that you can view the list of available images using:
|
|
|
|
pveam available
|
|
|
|
You can restrict this large list by specifying the 'section' you are
|
|
interested in, for example basic 'system' images:
|
|
|
|
.List available system images
|
|
----
|
|
# pveam available --section system
|
|
system archlinux-base_2015-24-29-1_x86_64.tar.gz
|
|
system centos-7-default_20160205_amd64.tar.xz
|
|
system debian-6.0-standard_6.0-7_amd64.tar.gz
|
|
system debian-7.0-standard_7.0-3_amd64.tar.gz
|
|
system debian-8.0-standard_8.0-1_amd64.tar.gz
|
|
system ubuntu-12.04-standard_12.04-1_amd64.tar.gz
|
|
system ubuntu-14.04-standard_14.04-1_amd64.tar.gz
|
|
system ubuntu-15.04-standard_15.04-1_amd64.tar.gz
|
|
system ubuntu-15.10-standard_15.10-1_amd64.tar.gz
|
|
----
|
|
|
|
Before you can use such a template, you need to download them into one
|
|
of your storages. You can simply use storage 'local' for that
|
|
purpose. For clustered installations, it is preferred to use a shared
|
|
storage so that all nodes can access those images.
|
|
|
|
pveam download local debian-8.0-standard_8.0-1_amd64.tar.gz
|
|
|
|
You are now ready to create containers using that image, and you can
|
|
list all downloaded images on storage 'local' with:
|
|
|
|
----
|
|
# pveam list local
|
|
local:vztmpl/debian-8.0-standard_8.0-1_amd64.tar.gz 190.20MB
|
|
----
|
|
|
|
The above command shows you the full {pve} volume identifiers. They include
|
|
the storage name, and most other {pve} commands can use them. For
|
|
examply you can delete that image later with:
|
|
|
|
pveam remove local:vztmpl/debian-8.0-standard_8.0-1_amd64.tar.gz
|
|
|
|
|
|
Container Storage
|
|
-----------------
|
|
|
|
Traditional containers use a very simple storage model, only allowing
|
|
a single mount point, the root file system. This was further
|
|
restricted to specific file system types like 'ext4' and 'nfs'.
|
|
Additional mounts are often done by user provided scripts. This turend
|
|
out to be complex and error prone, so we try to avoid that now.
|
|
|
|
Our new LXC based container model is more flexible regarding
|
|
storage. First, you can have more than a single mount point. This
|
|
allows you to choose a suitable storage for each application. For
|
|
example, you can use a relatively slow (and thus cheap) storage for
|
|
the container root file system. Then you can use a second mount point
|
|
to mount a very fast, distributed storage for your database
|
|
application.
|
|
|
|
The second big improvement is that you can use any storage type
|
|
supported by the {pve} storage library. That means that you can store
|
|
your containers on local 'lvmthin' or 'zfs', shared 'iSCSI' storage,
|
|
or even on distributed storage systems like 'ceph'. It also enables us
|
|
to use advanced storage features like snapshots and clones. 'vzdump'
|
|
can also use the snapshot feature to provide consistent container
|
|
backups.
|
|
|
|
Last but not least, you can also mount local devices directly, or
|
|
mount local directories using bind mounts. That way you can access
|
|
local storage inside containers with zero overhead. Such bind mounts
|
|
also provide an easy way to share data between different containers.
|
|
|
|
|
|
Mount Points
|
|
~~~~~~~~~~~~
|
|
|
|
Beside the root directory the container can also have additional mount points.
|
|
Currently there are basically three types of mount points: storage backed
|
|
mount points, bind mounts and device mounts.
|
|
|
|
Storage backed mount points are managed by the {pve} storage subsystem and come
|
|
in three different flavors:
|
|
|
|
- Image based: These are raw images containing a single ext4 formatted file
|
|
system.
|
|
- ZFS Subvolumes: These are technically bind mounts, but with managed storage,
|
|
and thus allow resizing and snapshotting.
|
|
- Directories: passing `size=0` triggers a special case where instead of a raw
|
|
image a directory is created.
|
|
|
|
Bind mounts are considered to not be managed by the storage subsystem, so you
|
|
cannot make snapshots or deal with quotas from inside the container, and with
|
|
unprivileged containers you might run into permission problems caused by the
|
|
user mapping, and cannot use ACLs from inside an unprivileged container.
|
|
|
|
Similarly device mounts are not managed by the storage, but for these the
|
|
`quota` and `acl` options will be honored.
|
|
|
|
WARNING: Because of existing issues in the Linux kernel's freezer
|
|
subsystem the usage of FUSE mounts inside a container is strongly
|
|
advised against, as containers need to be frozen for suspend or
|
|
snapshot mode backups. If FUSE mounts cannot be replaced by other
|
|
mounting mechanisms or storage technologies, it is possible to
|
|
establish the FUSE mount on the Proxmox host and use a bind
|
|
mount point to make it accessible inside the container.
|
|
|
|
WARNING: For security reasons, bind mounts should only be established
|
|
using source directories especially reserved for this purpose, e.g., a
|
|
directory hierarchy under `/mnt/bindmounts`. Never bind mount system
|
|
directories like `/`, `/var` or `/etc` into a container - this poses a
|
|
great security risk. The bind mount source path must not contain any symlinks.
|
|
|
|
The root mount point is configured with the 'rootfs' property, and you can
|
|
configure up to 10 additional mount points. The corresponding options
|
|
are called 'mp0' to 'mp9', and they can contain the following setting:
|
|
|
|
include::pct-mountpoint-opts.adoc[]
|
|
|
|
.Typical Container 'rootfs' configuration
|
|
----
|
|
rootfs: thin1:base-100-disk-1,size=8G
|
|
----
|
|
|
|
Using quotas inside containers
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
Quotas allow to set limits inside a container for the amount of disk
|
|
space that each user can use. This only works on ext4 image based
|
|
storage types and currently does not work with unprivileged
|
|
containers.
|
|
|
|
Activating the `quota` option causes the following mount options to be
|
|
used for a mount point:
|
|
`usrjquota=aquota.user,grpjquota=aquota.group,jqfmt=vfsv0`
|
|
|
|
This allows quotas to be used like you would on any other system. You
|
|
can initialize the `/aquota.user` and `/aquota.group` files by running
|
|
|
|
----
|
|
quotacheck -cmug /
|
|
quotaon /
|
|
----
|
|
|
|
and edit the quotas via the `edquota` command. Refer to the documentation
|
|
of the distribution running inside the container for details.
|
|
|
|
NOTE: You need to run the above commands for every mount point by passing
|
|
the mount point's path instead of just `/`.
|
|
|
|
|
|
Using ACLs inside containers
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
The standard Posix Access Control Lists are also available inside containers.
|
|
ACLs allow you to set more detailed file ownership than the traditional user/
|
|
group/others model.
|
|
|
|
|
|
Container Network
|
|
-----------------
|
|
|
|
You can configure up to 10 network interfaces for a single
|
|
container. The corresponding options are called 'net0' to 'net9', and
|
|
they can contain the following setting:
|
|
|
|
include::pct-network-opts.adoc[]
|
|
|
|
|
|
Managing Containers with 'pct'
|
|
------------------------------
|
|
|
|
'pct' is the tool to manage Linux Containers on {pve}. You can create
|
|
and destroy containers, and control execution (start, stop, migrate,
|
|
...). You can use pct to set parameters in the associated config file,
|
|
like network configuration or memory limits.
|
|
|
|
CLI Usage Examples
|
|
~~~~~~~~~~~~~~~~~~
|
|
|
|
Create a container based on a Debian template (provided you have
|
|
already downloaded the template via the webgui)
|
|
|
|
pct create 100 /var/lib/vz/template/cache/debian-8.0-standard_8.0-1_amd64.tar.gz
|
|
|
|
Start container 100
|
|
|
|
pct start 100
|
|
|
|
Start a login session via getty
|
|
|
|
pct console 100
|
|
|
|
Enter the LXC namespace and run a shell as root user
|
|
|
|
pct enter 100
|
|
|
|
Display the configuration
|
|
|
|
pct config 100
|
|
|
|
Add a network interface called eth0, bridged to the host bridge vmbr0,
|
|
set the address and gateway, while it's running
|
|
|
|
pct set 100 -net0 name=eth0,bridge=vmbr0,ip=192.168.15.147/24,gw=192.168.15.1
|
|
|
|
Reduce the memory of the container to 512MB
|
|
|
|
pct set 100 -memory 512
|
|
|
|
|
|
Files
|
|
------
|
|
|
|
'/etc/pve/lxc/<CTID>.conf'::
|
|
|
|
Configuration file for the container '<CTID>'.
|
|
|
|
|
|
Container Advantages
|
|
--------------------
|
|
|
|
- Simple, and fully integrated into {pve}. Setup looks similar to a normal
|
|
VM setup.
|
|
|
|
* Storage (ZFS, LVM, NFS, Ceph, ...)
|
|
|
|
* Network
|
|
|
|
* Authentification
|
|
|
|
* Cluster
|
|
|
|
- Fast: minimal overhead, as fast as bare metal
|
|
|
|
- High density (perfect for idle workloads)
|
|
|
|
- REST API
|
|
|
|
- Direct hardware access
|
|
|
|
|
|
Technology Overview
|
|
-------------------
|
|
|
|
- Integrated into {pve} graphical user interface (GUI)
|
|
|
|
- LXC (https://linuxcontainers.org/)
|
|
|
|
- cgmanager for cgroup management
|
|
|
|
- lxcfs to provive containerized /proc file system
|
|
|
|
- apparmor
|
|
|
|
- CRIU: for live migration (planned)
|
|
|
|
- We use latest available kernels (4.4.X)
|
|
|
|
- Image based deployment (templates)
|
|
|
|
- Container setup from host (Network, DNS, Storage, ...)
|
|
|
|
|
|
ifdef::manvolnum[]
|
|
include::pve-copyright.adoc[]
|
|
endif::manvolnum[]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|