mirror of git://git.proxmox.com/git/pve-docs.git synced 2025-05-28 13:05:37 +03:00

initial documentation for qdevice

Authored by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Co-Authored by: Oguz Bektas <o.bektas@proxmox.com>
Signed-off-by: Oguz Bektas <o.bektas@proxmox.com>
Oguz Bektas 2019-03-04 14:15:13 +01:00 committed by Thomas Lamprecht
parent b05a12f81d
commit c21d2cbe57


@ -753,6 +753,156 @@ If you cannot reboot the whole cluster ensure no High Availability services are
configured and then stop the corosync service on all nodes. After corosync is
stopped on all nodes start it one after the other again.
Corosync External Vote Support
------------------------------
This section describes a way to deploy an external voter in a {pve} cluster.
When configured, the cluster can sustain more node failures without
violating safety properties of the cluster communication.
For this to work there are two services involved:
* a so-called qdevice daemon which runs on each {pve} node
* an external vote daemon which runs on an independent server.
As a result you can achieve higher availability even in smaller setups (for
example 2+1 nodes).
QDevice Technical Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~
The Corosync Quorum Device (QDevice) is a daemon which runs on each cluster
node. It provides a configured number of votes to the cluster's quorum
subsystem, based on the decision of an externally running third-party arbitrator.
Its primary use is to allow a cluster to sustain more node failures than
standard quorum rules allow. This can be done safely as the external device
can see all nodes and thus choose only one set of nodes to give its vote.
This will only be done if said set of nodes can quorate (again) when
receiving the third-party vote.
Currently only 'QDevice Net' is supported as a third-party arbitrator. It is
a daemon which provides a vote to a cluster partition if it can reach the
partition members over the network. It will only give votes to one partition
of a cluster at any time.
It's designed to support multiple clusters and is almost configuration- and
state-free. New clusters are handled dynamically and no configuration file
is needed on the host running a QDevice.
The only requirements for the external host are that it has network access to
the cluster and that a corosync-qnetd package is available. We provide such a
package for Debian-based hosts; other Linux distributions should also have a
package available through their respective package manager.
NOTE: In contrast to corosync itself, a QDevice connects to the cluster over
TCP/IP and thus does not need a multicast capable network between itself and
the cluster. In fact the daemon may run outside of the LAN and can have
longer latencies than 2 ms.
Supported Setups
~~~~~~~~~~~~~~~~
We support QDevices for clusters with an even number of nodes and recommend
them for 2-node clusters, if they should provide higher availability.
For clusters with an odd node count we currently discourage the use of
QDevices. The reason for this is the difference in the votes the QDevice
provides for each cluster type. Even-numbered clusters get a single additional
vote. With this we can only increase availability, i.e. if the QDevice
itself fails we are in the same situation as with no QDevice at all.
With an odd-numbered cluster size, on the other hand, the QDevice provides
'(N-1)' votes, where 'N' corresponds to the cluster node count. This difference
makes sense: if we had only one additional vote, the cluster could get into a
split-brain situation.
This algorithm allows all nodes but one to fail, as long as the QDevice itself
stays available.
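
The vote arithmetic described above can be sketched as follows. This is only an
illustration, assuming that every node carries exactly one vote and that quorum
is a strict majority of all expected votes. The function names are made up for
this example and are not part of any {pve} or corosync tool.

```python
def qdevice_votes(nodes: int) -> int:
    """Votes the QDevice contributes, following the rule described above."""
    # Even node count: one extra vote. Odd node count: N-1 votes.
    return 1 if nodes % 2 == 0 else nodes - 1


def quorum(expected_votes: int) -> int:
    """Strict majority of the expected votes (assumed quorum rule)."""
    return expected_votes // 2 + 1


def tolerated_node_failures(nodes: int) -> int:
    """Node failures the cluster survives while the QDevice keeps voting."""
    qvotes = qdevice_votes(nodes)
    needed = quorum(nodes + qvotes)
    # The surviving nodes plus the QDevice votes must still reach quorum,
    # and at least one node has to stay alive to form a partition at all.
    min_nodes_needed = max(needed - qvotes, 1)
    return nodes - min_nodes_needed


if __name__ == "__main__":
    for n in (2, 3, 4, 15):
        print(n, qdevice_votes(n), quorum(n + qdevice_votes(n)),
              tolerated_node_failures(n))
```

Under these assumptions a 2-node cluster survives the loss of one node, while
an odd-numbered cluster survives the loss of all nodes but one.
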
There are two drawbacks with this:
* If the QNet daemon itself fails, no other node may fail or the cluster
immediately loses quorum. For example, in a cluster with 15 nodes, 7
could fail before the cluster becomes inquorate. But if a QDevice is
configured here and that QDevice itself fails, **no single node** of
the 15 may fail. The QDevice acts almost as a single point of failure in
this case.
* The fact that all but one node plus the QDevice may fail sounds promising at
first, but this may result in a mass recovery of HA services that would
overload the single node left. Also, a Ceph server will stop providing
services as soon as only '((N-1)/2)' nodes or fewer are online.
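
The first drawback can be checked with a little arithmetic. The following
sketch assumes one vote per node and a strict-majority quorum rule. The helper
names are hypothetical and exist only for this illustration.

```python
def quorum(expected_votes: int) -> int:
    """Strict majority of the expected votes (assumed quorum rule)."""
    return expected_votes // 2 + 1


def failures_without_qdevice(nodes: int) -> int:
    """Node failures a plain cluster (no QDevice configured) survives."""
    return nodes - quorum(nodes)


def failures_with_qdevice_down(nodes: int) -> int:
    """Node failures survivable when a configured QDevice has failed."""
    # The QDevice votes still count towards the expected votes ...
    qvotes = 1 if nodes % 2 == 0 else nodes - 1
    needed = quorum(nodes + qvotes)
    # ... but are not delivered, so the nodes alone must reach quorum.
    return nodes - needed


if __name__ == "__main__":
    for n in (2, 4, 15):
        print(n, failures_without_qdevice(n), failures_with_qdevice_down(n))
```

For 15 nodes this gives 7 tolerated failures without a QDevice and none once a
configured QDevice is down, matching the example above. For even node counts
both values are equal, so a failed QDevice leaves the cluster no worse off than
having none.
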
If you understand the drawbacks and implications you can decide yourself if
you should use this technology in an odd numbered cluster setup.
QDevice-Net Setup
~~~~~~~~~~~~~~~~~
We recommend running any daemon which provides votes to corosync-qdevice as an
unprivileged user. {pve} and Debian Stretch provide a package which is
already configured to do so.
The traffic between the daemon and the cluster must be encrypted to ensure a
safe and secure QDevice integration in {pve}.
First install the 'corosync-qnetd' package on your external server and
the 'corosync-qdevice' package on all cluster nodes.
After that, ensure that all your nodes on the cluster are online.
You can now easily set up your QDevice by running the following command on one
of the {pve} nodes:
----
pve# pvecm qdevice setup <QDEVICE-IP>
----
The SSH key from the cluster will be automatically copied to the QDevice. You
might need to enter an SSH password during this step.
After you enter the password and all the steps are successfully completed, you
will see "Done". You can check the status now:
----
pve# pvecm status
...
Votequorum information
~~~~~~~~~~~~~~~~~~~~~~
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2
Flags:            Quorate Qdevice

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes    Qdevice Name
0x00000001          1    A,V,NMW 192.168.22.180 (local)
0x00000002          1    A,V,NMW 192.168.22.181
0x00000000          1            Qdevice
----
The 'Qdevice' flag and the additional 'Qdevice' entry in the membership list
mean that the QDevice is set up.
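
If you want to check this from a script, the 'Flags' line can be inspected
programmatically. The following is a minimal sketch that assumes the status
text looks like the sample above. The function name `qdevice_active` is
hypothetical, and the text would have to be captured from `pvecm status`
yourself.

```python
def qdevice_active(status_output: str) -> bool:
    """Return True if the 'Flags:' line of the given text lists 'Qdevice'."""
    for line in status_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Flags":
            # Flags are separated by whitespace, e.g. "Quorate Qdevice".
            return "Qdevice" in value.split()
    return False


# Excerpt of the sample output shown above.
SAMPLE = """\
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2
Flags:            Quorate Qdevice
"""

if __name__ == "__main__":
    print(qdevice_active(SAMPLE))
```
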
Frequently Asked Questions
~~~~~~~~~~~~~~~~~~~~~~~~~~
Tie Breaking
^^^^^^^^^^^^
In case of a tie, where two same-sized cluster partitions cannot see each
other but can both see the QDevice, the QDevice randomly chooses one of those
partitions and provides a vote to it.
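
The behaviour described above can be modelled with a toy example. This is not
the algorithm used by corosync-qnetd, only a sketch for an even-numbered
cluster with one vote per node and a single QDevice vote. The function
`choose_partition` and its parameters are invented for this illustration.

```python
import random
from typing import Optional, Sequence, Set


def choose_partition(partitions: Sequence[Set[int]],
                     total_nodes: int,
                     rng=random) -> Optional[Set[int]]:
    """Pick the partition that receives the single QDevice vote, or None.

    `partitions` holds the disjoint sets of node IDs that can currently
    reach the arbitrator; `total_nodes` is the configured node count.
    """
    # Expected votes are all nodes plus one QDevice vote (assumed model).
    quorum = (total_nodes + 1) // 2 + 1
    # Only partitions that become quorate with the extra vote qualify.
    candidates = [p for p in partitions if len(p) + 1 >= quorum]
    if not candidates:
        return None
    # With an even node count, several candidates can only occur in a tie
    # between two same-sized halves; one of them is picked at random.
    return rng.choice(candidates)


if __name__ == "__main__":
    print(choose_partition([{1, 2}, {3, 4}], total_nodes=4))
```

In the 2+2 split above either half may be returned, but never both, so only one
side of the cluster can become quorate.
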
Still TODO
^^^^^^^^^^
There is still stuff to add here
Corosync Configuration
----------------------