[[chapter_pmxcfs]]
ifdef::manvolnum[]
pmxcfs(8)
=========
:pve-toplevel:

NAME
----

pmxcfs - Proxmox Cluster File System

SYNOPSIS
--------

include::pmxcfs.8-synopsis.adoc[]

DESCRIPTION
-----------
endif::manvolnum[]

ifndef::manvolnum[]
Proxmox Cluster File System (pmxcfs)
====================================
:pve-toplevel:
endif::manvolnum[]

The Proxmox Cluster file system (``pmxcfs'') is a database-driven file
system for storing configuration files, replicated in real time to all
cluster nodes using `corosync`. We use this to store all PVE-related
configuration files.

Although the file system stores all data inside a persistent database
on disk, a copy of the data resides in RAM. This limits the maximum
size, which is currently 30 MB. This is still enough to store the
configuration of several thousand virtual machines.

This system provides the following advantages:

* Seamless replication of all configuration to all nodes in real time
* Strong consistency checks to avoid duplicate VM IDs
* Read-only mode when a node loses quorum
* Automatic updates of the corosync cluster configuration on all nodes
* A distributed locking mechanism

POSIX Compatibility
-------------------
The file system is based on FUSE, so the behavior is POSIX-like. However,
some features are simply not implemented, because we do not need them:

* You can only create regular files and directories, but no symbolic
  links, ...
* You can't rename non-empty directories (because this makes it easier
  to guarantee that VMIDs are unique).
* You can't change file permissions (permissions are based on paths)
* `O_EXCL` creates are not atomic (like old NFS)
* `O_TRUNC` creates are not atomic (FUSE restriction)
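
Because truncating writes are not atomic here, a writer that wants readers never to see a half-written file can write to a temporary file and rename it into place. The following is a minimal sketch of that general pattern, not something this document prescribes. The function name `write_file_via_rename` is hypothetical, and the sketch assumes that renaming a regular file within one directory is atomic on the target file system.

```shell
#!/bin/sh
# Hypothetical helper (not part of Proxmox VE): replace a file's contents
# by writing a temporary file next to it and renaming it into place.
# Assumption: rename of a regular file is atomic on the target file system.
write_file_via_rename() {
    wf_target=$1
    wf_tmp="$wf_target.tmp.$$"
    # Read the new contents from stdin into the temporary file.
    cat >"$wf_tmp" || { rm -f "$wf_tmp"; return 1; }
    # Move the finished file over the target; clean up on failure.
    mv -f "$wf_tmp" "$wf_target" || { rm -f "$wf_tmp"; return 1; }
}
```

A call could look like `printf 'keyboard: en-us\n' | write_file_via_rename ./datacenter.cfg.test`, where the file name is only an illustration.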

File Access Rights
------------------

All files and directories are owned by user `root` and have group
`www-data`. Only root has write permissions, but group `www-data` can
read most files. Files below the following paths are only accessible by root:

 /etc/pve/priv/
 /etc/pve/nodes/${NAME}/priv/
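
To inspect these access rights on a running system, something like the following sketch may help. The helper names `mode_of`, `owner_group_of` and `is_root_www_data` are hypothetical, and the sketch assumes GNU `stat` (as shipped with Debian) is available.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Proxmox VE), assuming GNU stat.

# Print the octal permission bits of a path, for example 640.
mode_of() {
    stat -c '%a' "$1"
}

# Print "owner:group" of a path.
owner_group_of() {
    stat -c '%U:%G' "$1"
}

# Succeed only if the path is owned by user root and group www-data,
# which is the ownership described in the text above.
is_root_www_data() {
    [ "$(owner_group_of "$1")" = "root:www-data" ]
}
```

For example, `is_root_www_data /etc/pve/storage.cfg && echo ok` checks a single file, and `mode_of /etc/pve/priv` shows the mode of the root-only directory.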

Technology
----------

We use the https://www.corosync.org[Corosync Cluster Engine] for
cluster communication, and https://www.sqlite.org[SQLite] for the
database file. The file system is implemented in user space using
https://github.com/libfuse/libfuse[FUSE].

File System Layout
------------------
The file system is mounted at:

 /etc/pve

Files
~~~~~
[width="100%",cols="m,d"]
|=======
|`authkey.pub` | Public key used by the ticket system
|`ceph.conf` | Ceph configuration file (note: /etc/ceph/ceph.conf is a symbolic link to this)
|`corosync.conf` | Corosync cluster configuration file (prior to {pve} 4.x, this file was called cluster.conf)
|`datacenter.cfg` | {pve} data center-wide configuration (keyboard layout, proxy, ...)
|`domains.cfg` | {pve} authentication domains
|`firewall/cluster.fw` | Firewall configuration applied to all nodes
|`firewall/<NAME>.fw` | Firewall configuration for individual nodes
|`firewall/<VMID>.fw` | Firewall configuration for VMs and containers
|`ha/crm_commands` | Displays HA operations that are currently being carried out by the CRM
|`ha/manager_status` | JSON-formatted information regarding HA services on the cluster
|`ha/resources.cfg` | Resources managed by high availability, and their current state
|`nodes/<NAME>/config` | Node-specific configuration
|`nodes/<NAME>/lxc/<VMID>.conf` | VM configuration data for LXC containers
|`nodes/<NAME>/openvz/` | Prior to PVE 4.0, used for container configuration data (deprecated, removed soon)
|`nodes/<NAME>/pve-ssl.key` | Private SSL key for `pve-ssl.pem`
|`nodes/<NAME>/pve-ssl.pem` | Public SSL certificate for web server (signed by cluster CA)
|`nodes/<NAME>/pveproxy-ssl.key` | Private SSL key for `pveproxy-ssl.pem` (optional)
|`nodes/<NAME>/pveproxy-ssl.pem` | Public SSL certificate (chain) for web server (optional override for `pve-ssl.pem`)
|`nodes/<NAME>/qemu-server/<VMID>.conf` | VM configuration data for KVM VMs
|`priv/authkey.key` | Private key used by ticket system
|`priv/authorized_keys` | SSH keys of cluster members for authentication
|`priv/ceph*` | Ceph authentication keys and associated capabilities
|`priv/known_hosts` | SSH keys of the cluster members for verification
|`priv/lock/*` | Lock files used by various services to ensure safe cluster-wide operations
|`priv/pve-root-ca.key` | Private key of cluster CA
|`priv/shadow.cfg` | Shadow password file for PVE Realm users
|`priv/storage/<STORAGE-ID>.pw` | Contains the password of a storage in plain text
|`priv/tfa.cfg` | Base64-encoded two-factor authentication configuration
|`priv/token.cfg` | API token secrets of all tokens
|`pve-root-ca.pem` | Public certificate of cluster CA
|`pve-www.key` | Private key used for generating CSRF tokens
|`sdn/*` | Shared configuration files for Software Defined Networking (SDN)
|`status.cfg` | {pve} external metrics server configuration
|`storage.cfg` | {pve} storage configuration
|`user.cfg` | {pve} access control configuration (users/groups/...)
|`virtual-guest/cpu-models.conf` | For storing custom CPU models
|`vzdump.cron` | Cluster-wide vzdump backup-job schedule
|=======

Symbolic links
~~~~~~~~~~~~~~

Certain directories within the cluster file system use symbolic links, in order
to point to a node's own configuration files. Thus, the files pointed to in the
table below refer to different files on each node of the cluster.

[width="100%",cols="m,m"]
|=======
|`local` | `nodes/<LOCAL_HOST_NAME>`
|`lxc` | `nodes/<LOCAL_HOST_NAME>/lxc/`
|`openvz` | `nodes/<LOCAL_HOST_NAME>/openvz/` (deprecated, removed soon)
|`qemu-server` | `nodes/<LOCAL_HOST_NAME>/qemu-server/`
|=======

Special status files for debugging (JSON)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[width="100%",cols="m,d"]
|=======
|`.version` |File versions (to detect file modifications)
|`.members` |Info about cluster members
|`.vmlist` |List of all VMs
|`.clusterlog` |Cluster log (last 50 entries)
|`.rrd` |RRD data (most recent entries)
|=======

Enable/Disable debugging
~~~~~~~~~~~~~~~~~~~~~~~~

You can enable verbose syslog messages with:

 echo "1" >/etc/pve/.debug

And disable verbose syslog messages with:

 echo "0" >/etc/pve/.debug
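
If you toggle debugging often, the two commands can be wrapped in a small function. This is only a convenience sketch: the name `set_pmxcfs_debug` and its optional second argument (an alternative path, useful for trying the function without touching `/etc/pve`) are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper around the two echo commands shown above.
# Usage: set_pmxcfs_debug on|off [debug-file]
# The debug file defaults to /etc/pve/.debug.
set_pmxcfs_debug() {
    dbg_state=$1
    dbg_file=${2:-/etc/pve/.debug}
    case $dbg_state in
        on)  echo "1" >"$dbg_file" ;;
        off) echo "0" >"$dbg_file" ;;
        *)
            echo "usage: set_pmxcfs_debug on|off [debug-file]" >&2
            return 2
            ;;
    esac
}
```

Run as root, `set_pmxcfs_debug on` enables verbose messages and `set_pmxcfs_debug off` disables them again.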

Recovery
--------

If you have major problems with your {pve} host, for example hardware
issues, it could be helpful to copy the pmxcfs database file
`/var/lib/pve-cluster/config.db`, and move it to a new {pve}
host. On the new host (with nothing running), you need to stop the
`pve-cluster` service and replace the `config.db` file (required permissions
`0600`). Following this, adapt `/etc/hostname` and `/etc/hosts` according to the
lost {pve} host, then reboot and check the result. The database only holds
configuration, so don't forget to recover your VM/CT data separately.
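
The file replacement step can be sketched as follows. The function name `install_config_db` and the example source path `/root/config.db` are hypothetical; the sketch assumes the coreutils `install` command, which copies a file and sets its mode in one step.

```shell
#!/bin/sh
# Hypothetical helper (not part of Proxmox VE): copy a saved pmxcfs
# database into place with permissions 0600.
# Usage: install_config_db <saved-copy> <destination>
install_config_db() {
    db_src=$1
    db_dst=$2
    [ -f "$db_src" ] || { echo "missing source: $db_src" >&2; return 1; }
    install -m 0600 "$db_src" "$db_dst"
}

# Sketch of the sequence on the new host, run as root. It is left
# commented out so that sourcing this file changes nothing:
#   systemctl stop pve-cluster
#   install_config_db /root/config.db /var/lib/pve-cluster/config.db
#   # adapt /etc/hostname and /etc/hosts to the lost host, then:
#   reboot
```

Keeping the copy in a function makes it possible to try the step on scratch files before touching `/var/lib/pve-cluster/`.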

Remove Cluster Configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The recommended way is to reinstall the node after you remove it from
your cluster. This ensures that all secret cluster/SSH keys and any
shared configuration data are destroyed.

In some cases, you might prefer to put a node back to local mode without
reinstalling, which is described in
<<pvecm_separate_node_without_reinstall,Separate A Node Without Reinstalling>>.

Recovering/Moving Guests from Failed Nodes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
For the guest configuration files in `nodes/<NAME>/qemu-server/` (VMs) and
`nodes/<NAME>/lxc/` (containers), {pve} sees the containing node `<NAME>` as the
owner of the respective guest. This concept enables the usage of local locks
instead of expensive cluster-wide locks for preventing concurrent guest
configuration changes.

As a consequence, if the owning node of a guest fails (for example, due to a
power outage or fencing event), a regular migration is not possible (even if all
the disks are located on shared storage), because such a local lock on the
(offline) owning node is unobtainable. This is not a problem for HA-managed
guests, as {pve}'s High Availability stack includes the necessary
(cluster-wide) locking and watchdog functionality to ensure correct and
automatic recovery of guests from fenced nodes.

If a non-HA-managed guest has only shared disks (and no other local resources
which are only available on the failed node), a manual recovery
is possible by simply moving the guest configuration file from the failed
node's directory in `/etc/pve/` to an online node's directory (which changes the
logical owner or location of the guest).

For example, recovering the VM with ID `100` from an offline `node1` to another
node `node2` works by running the following command as root on any member node
of the cluster:

 mv /etc/pve/nodes/node1/qemu-server/100.conf /etc/pve/nodes/node2/qemu-server/

WARNING: Before manually recovering a guest like this, make absolutely sure
that the failed source node is really powered off/fenced. Otherwise {pve}'s
locking principles are violated by the `mv` command, which can have unexpected
consequences.

WARNING: Guests with local disks (or other local resources which are only
available on the offline node) are not recoverable like this. Either wait for the
failed node to rejoin the cluster or restore such guests from backups.
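
A few sanity checks can be placed around the manual move. The sketch below is an illustration only: the function name `recover_vm_config` and its optional fourth argument (an alternative root directory, defaulting to `/etc/pve`) are hypothetical, and it moves the file into the target node's `qemu-server/` directory, following the layout table above. It cannot verify that the failed node is really powered off or fenced, so that check remains your responsibility.

```shell
#!/bin/sh
# Hypothetical helper (not part of Proxmox VE): move a VM configuration
# file from a failed node's directory to an online node's directory.
# Usage: recover_vm_config <vmid> <failed-node> <target-node> [root-dir]
# NOTE: this does NOT check that the failed node is powered off/fenced.
recover_vm_config() {
    rv_vmid=$1
    rv_from=$2
    rv_to=$3
    rv_root=${4:-/etc/pve}
    rv_src="$rv_root/nodes/$rv_from/qemu-server/$rv_vmid.conf"
    rv_dst_dir="$rv_root/nodes/$rv_to/qemu-server"

    [ -f "$rv_src" ] || { echo "no such config: $rv_src" >&2; return 1; }
    [ -d "$rv_dst_dir" ] || { echo "no such directory: $rv_dst_dir" >&2; return 1; }
    # Refuse to overwrite an existing configuration on the target node.
    [ ! -e "$rv_dst_dir/$rv_vmid.conf" ] || { echo "target already exists" >&2; return 1; }

    mv "$rv_src" "$rv_dst_dir/"
}
```

With the values from the example above, the call would be `recover_vm_config 100 node1 node2`, run as root on any member node of the cluster.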
ifdef::manvolnum[]
include::pve-copyright.adoc[]
endif::manvolnum[]