mirror of
git://git.proxmox.com/git/pve-docs.git
synced 2025-01-21 18:03:45 +03:00
[[chapter_pmxcfs]]
ifdef::manvolnum[]
pmxcfs(8)
=========
:pve-toplevel:

NAME
----

pmxcfs - Proxmox Cluster File System

SYNOPSIS
--------

include::pmxcfs.8-synopsis.adoc[]

DESCRIPTION
-----------
endif::manvolnum[]

ifndef::manvolnum[]
Proxmox Cluster File System (pmxcfs)
====================================
:pve-toplevel:
endif::manvolnum[]

The Proxmox Cluster file system (``pmxcfs'') is a database-driven file
system for storing configuration files, replicated in real time to all
cluster nodes using `corosync`. We use this to store all PVE-related
configuration files.

Although the file system stores all data inside a persistent database
on disk, a copy of the data resides in RAM. This imposes a restriction
on the maximum size, which is currently 30MB. This is still enough to
store the configuration of several thousand virtual machines.

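As a rough way to keep an eye on the size restriction, the following sketch compares the on-disk size of the database file with a limit. The helper name `check_db_size` is hypothetical and not part of {pve}. We assume here that 30MB means 30 * 1024 * 1024 bytes, and that the size of the database file is only an approximation of the data held in RAM.

```shell
#!/bin/sh
# Sketch only: check_db_size is a hypothetical helper, not a Proxmox VE tool.
# Arguments: [database file] [limit in MB]
# Assumption: the on-disk file size is a rough proxy for the in-RAM data size.
check_db_size() {
    db=${1:-/var/lib/pve-cluster/config.db}
    limit_mb=${2:-30}

    if [ ! -f "$db" ]; then
        echo "no such file: $db" >&2
        return 2
    fi

    # Arithmetic expansion strips any padding that wc may print.
    size=$(($(wc -c < "$db")))
    limit=$((limit_mb * 1024 * 1024))

    if [ "$size" -gt "$limit" ]; then
        echo "over limit: $size bytes > $limit bytes"
        return 1
    fi
    echo "ok: $size bytes of $limit bytes"
}
```

Called without arguments, the function checks `/var/lib/pve-cluster/config.db` against 30 MB. It returns `0` when the file is within the limit, `1` when it is larger, and `2` when the file does not exist.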
This system provides the following advantages:

* seamless replication of all configuration to all nodes in real time
* strong consistency checks to avoid duplicate VM IDs
* read-only mode when a node loses quorum
* automatic updates of the corosync cluster configuration to all nodes
* a distributed locking mechanism


POSIX Compatibility
-------------------

The file system is based on FUSE, so the behavior is POSIX-like. But
some features are simply not implemented, because we do not need them:

* you can only create regular files and directories, but no symbolic
  links, ...

* you can't rename non-empty directories (because this makes it easier
  to guarantee that VMIDs are unique).

* you can't change file permissions (permissions are based on path)

* `O_EXCL` creates are not atomic (like old NFS)

* `O_TRUNC` creates are not atomic (FUSE restriction)


File Access Rights
------------------

All files and directories are owned by user `root` and have group
`www-data`. Only root has write permissions, but group `www-data` can
read most files. Files below the following paths:

 /etc/pve/priv/
 /etc/pve/nodes/${NAME}/priv/

are only accessible by root.


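Because access rights depend only on the path, the rule can be sketched as a small classification function. The helper name `pve_path_access` and its output labels are hypothetical; the function only restates the two path rules given above and does not query the file system.

```shell
#!/bin/sh
# Sketch only: pve_path_access is a hypothetical helper, not a Proxmox VE tool.
# It restates the path-based access rules; it does not inspect real files.
pve_path_access() {
    path=$1
    case "$path" in
        /etc/pve/priv/*)
            echo "root-only"
            return 0 ;;
        /etc/pve/nodes/*/*)
            # Strip the prefix and the node name, then look at the remainder.
            rest=${path#/etc/pve/nodes/}
            rest=${rest#*/}
            case "$rest" in
                priv/*)
                    echo "root-only"
                    return 0 ;;
            esac ;;
    esac
    case "$path" in
        /etc/pve/*) echo "root-write, group-read" ;;
        *)          echo "outside /etc/pve" ;;
    esac
}
```

For example, `pve_path_access /etc/pve/nodes/node1/priv/example.key` reports `root-only`, while `pve_path_access /etc/pve/storage.cfg` reports `root-write, group-read`. The node name and file name in the first call are placeholders.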
Technology
----------

We use the http://www.corosync.org[Corosync Cluster Engine] for
cluster communication, and http://www.sqlite.org[SQLite] for the
database file. The file system is implemented in user space using
http://fuse.sourceforge.net[FUSE].

File System Layout
------------------

The file system is mounted at:

 /etc/pve

Files
~~~~~

[width="100%",cols="m,d"]
|=======
|`corosync.conf` | Corosync cluster configuration file (prior to {pve} 4.x, this file was called cluster.conf)
|`storage.cfg` | {pve} storage configuration
|`datacenter.cfg` | {pve} datacenter-wide configuration (keyboard layout, proxy, ...)
|`user.cfg` | {pve} access control configuration (users/groups/...)
|`domains.cfg` | {pve} authentication domains
|`status.cfg` | {pve} external metrics server configuration
|`authkey.pub` | Public key used by ticket system
|`pve-root-ca.pem` | Public certificate of cluster CA
|`priv/shadow.cfg` | Shadow password file
|`priv/authkey.key` | Private key used by ticket system
|`priv/pve-root-ca.key` | Private key of cluster CA
|`nodes/<NAME>/pve-ssl.pem` | Public SSL certificate for web server (signed by cluster CA)
|`nodes/<NAME>/pve-ssl.key` | Private SSL key for `pve-ssl.pem`
|`nodes/<NAME>/pveproxy-ssl.pem` | Public SSL certificate (chain) for web server (optional override for `pve-ssl.pem`)
|`nodes/<NAME>/pveproxy-ssl.key` | Private SSL key for `pveproxy-ssl.pem` (optional)
|`nodes/<NAME>/qemu-server/<VMID>.conf` | VM configuration data for KVM VMs
|`nodes/<NAME>/lxc/<VMID>.conf` | Configuration data for LXC containers
|`firewall/cluster.fw` | Firewall configuration applied to all nodes
|`firewall/<NAME>.fw` | Firewall configuration for individual nodes
|`firewall/<VMID>.fw` | Firewall configuration for VMs and Containers
|=======


Symbolic links
~~~~~~~~~~~~~~

[width="100%",cols="m,m"]
|=======
|`local` | `nodes/<LOCAL_HOST_NAME>`
|`qemu-server` | `nodes/<LOCAL_HOST_NAME>/qemu-server/`
|`lxc` | `nodes/<LOCAL_HOST_NAME>/lxc/`
|=======


Special status files for debugging (JSON)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

[width="100%",cols="m,d"]
|=======
|`.version` |File versions (to detect file modifications)
|`.members` |Info about cluster members
|`.vmlist` |List of all VMs
|`.clusterlog` |Cluster log (last 50 entries)
|`.rrd` |RRD data (most recent entries)
|=======


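The status files can be read like any other file below the mount point. The following sketch wraps this in a hypothetical helper named `pve_status`, which only accepts the five names from the table above. The optional second argument for the mount point is an assumption added so that the function can be tried outside a cluster node.

```shell
#!/bin/sh
# Sketch only: pve_status is a hypothetical helper, not a Proxmox VE tool.
# Arguments: <name without leading dot> [mount point, default /etc/pve]
pve_status() {
    name=$1
    mount=${2:-/etc/pve}
    case "$name" in
        version|members|vmlist|clusterlog|rrd)
            # The special files are hidden files directly below the mount point.
            cat "$mount/.$name" ;;
        *)
            echo "unknown status file: $name" >&2
            return 1 ;;
    esac
}
```

For example, `pve_status members` prints the content of `/etc/pve/.members`.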
Enable/Disable debugging
~~~~~~~~~~~~~~~~~~~~~~~~

You can enable verbose syslog messages with:

 echo "1" >/etc/pve/.debug

And disable verbose syslog messages with:

 echo "0" >/etc/pve/.debug


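If you toggle debugging often, the two commands can be wrapped in one function. This is only a convenience sketch: `pmxcfs_debug` is a hypothetical name, and the optional second argument (the file to write to) is an assumption added so that the function can be tried without a mounted `/etc/pve`.

```shell
#!/bin/sh
# Sketch only: pmxcfs_debug is a hypothetical wrapper around the two
# echo commands from the text above.
# Arguments: on|off [debug file, default /etc/pve/.debug]
pmxcfs_debug() {
    debug_file=${2:-/etc/pve/.debug}
    case "$1" in
        on)  echo "1" > "$debug_file" ;;
        off) echo "0" > "$debug_file" ;;
        *)
            echo "usage: pmxcfs_debug on|off [file]" >&2
            return 1 ;;
    esac
}
```

Run `pmxcfs_debug on` as root to enable verbose messages, and `pmxcfs_debug off` to disable them again.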
Recovery
--------

If you have major problems with your Proxmox VE host, e.g. hardware
issues, it could be helpful to just copy the pmxcfs database file
`/var/lib/pve-cluster/config.db` and move it to a new Proxmox VE
host. On the new host (with nothing running), first stop the
`pve-cluster` service and replace the `config.db` file (required permissions:
`0600`). Second, adapt `/etc/hostname` and `/etc/hosts` according to the
lost Proxmox VE host, then reboot and check. (And don't forget your
VM/CT data.)


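The file replacement step can be sketched as follows. The helper name `restore_config_db` is hypothetical, and the function covers only the copy with permissions `0600`. We assume the `pve-cluster` service has already been stopped, for example with `systemctl stop pve-cluster`, and that `/etc/hostname` and `/etc/hosts` are adapted by hand afterwards.

```shell
#!/bin/sh
# Sketch only: restore_config_db is a hypothetical helper, not a Proxmox VE tool.
# Assumption: the pve-cluster service is already stopped on this host.
# Arguments: <saved config.db> [target directory, default /var/lib/pve-cluster]
restore_config_db() {
    src=$1
    dest_dir=${2:-/var/lib/pve-cluster}

    if [ ! -f "$src" ]; then
        echo "backup not found: $src" >&2
        return 1
    fi

    # install copies the file and sets mode 0600 in a single step.
    install -m 0600 "$src" "$dest_dir/config.db"
}
```

On the new host, this would be called as root with the path of the saved database file as the only argument.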
Remove Cluster configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The recommended way is to reinstall the node after you have removed it from
your cluster. This makes sure that all secret cluster/ssh keys and any
shared configuration data are destroyed.

In some cases, you might prefer to put a node back to local mode without
reinstalling, which is described in
<<pvecm_separate_node_without_reinstall,Separate A Node Without Reinstalling>>.


Recovering/Moving Guests from Failed Nodes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For the guest configuration files in `nodes/<NAME>/qemu-server/` (VMs) and
`nodes/<NAME>/lxc/` (containers), {pve} sees the containing node `<NAME>` as
owner of the respective guest. This concept enables the usage of local locks
instead of expensive cluster-wide locks for preventing concurrent guest
configuration changes.

As a consequence, if the owning node of a guest fails (e.g., because of a power
outage, fencing event, ...), a regular migration is not possible (even if all
the disks are located on shared storage) because such a local lock on the
(dead) owning node is unobtainable. This is not a problem for HA-managed
guests, as {pve}'s High Availability stack includes the necessary
(cluster-wide) locking and watchdog functionality to ensure correct and
automatic recovery of guests from fenced nodes.

If a non-HA-managed guest has only shared disks (and no other local resources
which are only available on the failed node are configured), a manual recovery
is possible by simply moving the guest configuration file from the failed
node's directory in `/etc/pve/` to an alive node's directory (which changes the
logical owner or location of the guest).

For example, recovering the VM with ID `100` from a dead `node1` to another
node `node2` works with the following command executed when logged in as root
on any member node of the cluster:

 mv /etc/pve/nodes/node1/qemu-server/100.conf /etc/pve/nodes/node2/qemu-server/

WARNING: Before manually recovering a guest like this, make absolutely sure
that the failed source node is really powered off/fenced. Otherwise {pve}'s
locking principles are violated by the `mv` command, which can have unexpected
consequences.

WARNING: Guests with local disks (or other local resources which are only
available on the dead node) are not recoverable like this. Either wait for the
failed node to rejoin the cluster or restore such guests from backups.

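The manual recovery can be wrapped in a function that adds a few sanity checks before moving the file. This is a sketch under assumptions: `recover_guest_conf` is a hypothetical name, and the configuration file is assumed to belong in the target node's `qemu-server/` or `lxc/` directory, matching the file system layout described earlier. The function cannot verify that the failed node is really powered off or fenced, so the warnings above still apply.

```shell
#!/bin/sh
# Sketch only: recover_guest_conf is a hypothetical helper, not a Proxmox VE tool.
# It does NOT check that the source node is powered off/fenced.
# Arguments: <vmid> <failed node> <target node> [qemu-server|lxc] [base dir]
recover_guest_conf() {
    vmid=$1
    from=$2
    to=$3
    kind=${4:-qemu-server}
    base=${5:-/etc/pve/nodes}

    src="$base/$from/$kind/$vmid.conf"
    dst_dir="$base/$to/$kind"

    if [ ! -f "$src" ]; then
        echo "no such guest config: $src" >&2
        return 1
    fi
    if [ ! -d "$dst_dir" ]; then
        echo "no such target directory: $dst_dir" >&2
        return 1
    fi

    # Moving the file changes the logical owner of the guest.
    mv "$src" "$dst_dir/"
}
```

With the names from the example above, the call would be `recover_guest_conf 100 node1 node2`, executed as root on any member node of the cluster.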
ifdef::manvolnum[]
include::pve-copyright.adoc[]
endif::manvolnum[]