mirror of
git://git.proxmox.com/git/pve-docs.git
synced 2025-05-28 13:05:37 +03:00
initial documentation for qdevice
Authored by: Thomas Lamprecht <t.lamprecht@proxmox.com> Co-Authored by: Oguz Bektas <o.bektas@proxmox.com> Signed-off-by: Oguz Bektas <o.bektas@proxmox.com>
parent b05a12f81d
commit c21d2cbe57

pvecm.adoc

If you cannot reboot the whole cluster, ensure no High Availability services are
configured and then stop the corosync service on all nodes. After corosync is
stopped on all nodes, start it again on one node after the other.

Corosync External Vote Support
------------------------------

This section describes a way to deploy an external voter in a {pve} cluster.
When configured, the cluster can sustain more node failures without
violating safety properties of the cluster communication.

For this to work, two services are involved:

* a so-called QDevice daemon, which runs on each {pve} node

* an external vote daemon, which runs on an independent server

As a result, you can achieve higher availability even in smaller setups (for
example 2+1 nodes).

QDevice Technical Overview
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The Corosync Quorum Device (QDevice) is a daemon which runs on each cluster
node. It provides a configured number of votes to the cluster's quorum
subsystem, based on the decision of an externally running third-party
arbitrator. Its primary use is to allow a cluster to sustain more node
failures than standard quorum rules allow. This can be done safely, as the
external device can see all nodes and thus choose only one set of nodes to
give its vote to. This will only be done if said set of nodes can become
quorate (again) after receiving the third-party vote.

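The arbitration rule described above can be modelled in a few lines of Python.
This is a simplified sketch, not the actual corosync-qnetd algorithm: it
assumes that quorum is a strict majority of the expected votes
(`expected_votes // 2 + 1`, as corosync's votequorum computes it), and the
helpers `quorum` and `arbitrate` are hypothetical names. When more than one
partition qualifies, the sketch simply picks the one with the most votes.

[source,python]
----
from typing import Dict, List, Optional


def quorum(expected_votes: int) -> int:
    """Votes needed for quorum: a strict majority of the expected votes."""
    return expected_votes // 2 + 1


def arbitrate(partitions: List[Dict[str, int]], qdevice_votes: int,
              expected_votes: int) -> Optional[int]:
    """Return the index of the partition that gets the QDevice votes, or None.

    Each partition maps node names to their votes. Hypothetical model of the
    rule described in the text, not corosync-qnetd's implementation.
    """
    needed = quorum(expected_votes)
    # Only partitions that become quorate with the extra votes qualify.
    candidates = [i for i, part in enumerate(partitions)
                  if sum(part.values()) + qdevice_votes >= needed]
    if not candidates:
        return None
    # Exactly one partition gets the votes; here the one with the most votes
    # (on a tie, max() keeps the first candidate).
    return max(candidates, key=lambda i: sum(partitions[i].values()))


if __name__ == "__main__":
    # Four nodes plus a one-vote QDevice: 5 expected votes, quorum is 3.
    split = [{"node1": 1, "node2": 1, "node3": 1}, {"node4": 1}]
    print(arbitrate(split, qdevice_votes=1, expected_votes=5))  # 0
----

In the example only the three-node partition can reach quorum with the extra
vote, so it prints `0`. Picking the first candidate on a tie is only a
placeholder for the real tie-breaking logic.
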
Currently, only 'QDevice Net' is supported as a third-party arbitrator. It is
a daemon which provides a vote to a cluster partition if it can reach the
partition members over the network. It will only give votes to one partition
of a cluster at any time.
It's designed to support multiple clusters and is almost configuration- and
state-free. New clusters are handled dynamically and no configuration file
is needed on the external host running the QNet daemon.

The only requirements for the external host are network access to the
cluster and an available corosync-qnetd package. We provide such a package
for Debian-based hosts; other Linux distributions should also have a package
available through their respective package manager.

NOTE: In contrast to corosync itself, a QDevice connects to the cluster over
TCP/IP and thus does not need a multicast-capable network between itself and
the cluster. In fact, the daemon may run outside of the LAN and can have
latencies higher than 2 ms.


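Since the QDevice reaches the external daemon over plain TCP/IP, one way to
verify connectivity from a cluster node is to open a TCP connection to the
external host. The following Python sketch assumes that corosync-qnetd listens
on its default TCP port 5403; the helper name `qnetd_reachable` is
hypothetical.

[source,python]
----
import socket

# Assumption: corosync-qnetd listens on its default TCP port 5403.
QNETD_DEFAULT_PORT = 5403


def qnetd_reachable(host: str, port: int = QNETD_DEFAULT_PORT,
                    timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection() resolves the name and tries each address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts and resolution errors all end here.
        return False
----

If `qnetd_reachable("<QDEVICE-IP>")` returns `False` on a cluster node, the
port cannot be reached from there, for example because of a firewall or
because corosync-qnetd is not running.
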
Supported Setups
~~~~~~~~~~~~~~~~

We support QDevices for clusters with an even number of nodes and recommend
them for 2-node clusters that should provide higher availability.
For clusters with an odd node count, we currently discourage the use of
QDevices. The reason for this is the difference in the number of votes the
QDevice provides for each cluster type. Even-numbered clusters get a single
additional vote, which can only increase availability: if the QDevice
itself fails, we are in the same situation as with no QDevice at all.

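The claim for even-numbered clusters can be checked with a short calculation.
The sketch assumes one vote per node, the majority rule `expected // 2 + 1`,
and that the configured QDevice votes stay part of the expected votes even
while the QDevice is down; `quorum` and `tolerated_failures` are hypothetical
helper names.

[source,python]
----
def quorum(expected_votes: int) -> int:
    """Votes needed for quorum: a strict majority of the expected votes."""
    return expected_votes // 2 + 1


def tolerated_failures(nodes: int, qdevice_votes: int, qdevice_up: bool) -> int:
    """Number of nodes (one vote each) that may fail while staying quorate.

    The configured QDevice votes always count towards the expected votes,
    even while the QDevice itself is down.
    """
    expected = nodes + qdevice_votes
    available = nodes + (qdevice_votes if qdevice_up else 0)
    return max(0, available - quorum(expected))


if __name__ == "__main__":
    for nodes in (2, 4):
        # An even-numbered cluster gets a single QDevice vote.
        print(nodes, "nodes:",
              "no QDevice", tolerated_failures(nodes, 0, False),
              "| QDevice up", tolerated_failures(nodes, 1, True),
              "| QDevice down", tolerated_failures(nodes, 1, False))
----

For two nodes this yields 0, 1 and 0 tolerated node failures, and for four
nodes 1, 2 and 1: a failed QDevice leaves an even-numbered cluster exactly
where it would be without one.
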
With an odd-numbered cluster size, the QDevice provides '(N-1)' votes,
where 'N' corresponds to the cluster node count. This difference makes
sense: if there were only one additional vote, the cluster could get into a
split-brain situation.
This algorithm allows all nodes but one to fail, as long as the QDevice
itself stays online.
There are two drawbacks to this:

* If the QNet daemon itself fails, no other node may fail or the cluster
immediately loses quorum. For example, in a cluster with 15 nodes, 7
could fail before the cluster becomes inquorate. But if a QDevice is
configured here and then fails itself, **no single node** of
the 15 may fail. The QDevice acts almost as a single point of failure in
this case.

* The fact that all but one node plus QDevice may fail sounds promising at
first, but this may result in a mass recovery of HA services that would
overload the single remaining node. Also, Ceph servers will stop providing
services once only '((N-1)/2)' nodes are online.

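Both drawbacks follow from the same arithmetic. This sketch assumes one vote
per node and the majority rule `expected // 2 + 1`, and gives the QDevice
'N-1' votes in an odd-numbered cluster as described above; `qdevice_votes` and
`tolerated_failures` are hypothetical helper names.

[source,python]
----
def quorum(expected_votes: int) -> int:
    """Votes needed for quorum: a strict majority of the expected votes."""
    return expected_votes // 2 + 1


def qdevice_votes(nodes: int) -> int:
    """QDevice votes as described above: 1 for even, N-1 for odd node counts."""
    return 1 if nodes % 2 == 0 else nodes - 1


def tolerated_failures(nodes: int, qd_votes: int, qdevice_up: bool) -> int:
    """Number of nodes (one vote each) that may fail while staying quorate."""
    expected = nodes + qd_votes  # configured QDevice votes always count here
    available = nodes + (qd_votes if qdevice_up else 0)
    return max(0, available - quorum(expected))


if __name__ == "__main__":
    n = 15
    qd = qdevice_votes(n)  # 14 votes for the QDevice
    print("without QDevice:", tolerated_failures(n, 0, False))
    print("QDevice up:", tolerated_failures(n, qd, True))
    print("QDevice down:", tolerated_failures(n, qd, False))
----

For the 15-node example this yields 7, 14 and 0: with a working QDevice all
but one node may fail, but once the QDevice itself is down, not a single node
may.
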
If you understand the drawbacks and implications, you can decide yourself
whether you should use this technology in an odd-numbered cluster setup.


QDevice-Net Setup
~~~~~~~~~~~~~~~~~

We recommend running any daemon which provides votes to corosync-qdevice as an
unprivileged user. {pve} and Debian Stretch provide a package which is
already configured to do so.
The traffic between the daemon and the cluster must be encrypted to ensure a
safe and secure QDevice integration in {pve}.

First, install the 'corosync-qnetd' package on your external server and
the 'corosync-qdevice' package on all cluster nodes.

After that, ensure that all nodes in the cluster are online.

You can now set up your QDevice by running the following command on one
of the {pve} nodes:

----
pve# pvecm qdevice setup <QDEVICE-IP>
----

The SSH key from the cluster will be automatically copied to the QDevice host.
You might need to enter an SSH password during this step.

After you enter the password and all the steps are successfully completed, you
will see "Done". You can now check the status:

----
pve# pvecm status

...

Votequorum information
~~~~~~~~~~~~~~~~~~~~~
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2
Flags:            Quorate Qdevice

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes    Qdevice Name
0x00000001          1    A,V,NMW 192.168.22.180 (local)
0x00000002          1    A,V,NMW 192.168.22.181
0x00000000          1            Qdevice

----

The 'Qdevice' flag and the additional 'Qdevice' entry in the membership list
show that the QDevice is set up.


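To check the result from a script, the 'Flags' line of the status output can
be parsed. This Python sketch assumes the output format shown above, which may
differ between versions; `qdevice_active` and `check_local_node` are
hypothetical helper names.

[source,python]
----
import subprocess


def qdevice_active(status_output: str) -> bool:
    """Return True if the 'Flags:' line lists both Quorate and Qdevice."""
    for line in status_output.splitlines():
        if line.strip().startswith("Flags:"):
            flags = line.split(":", 1)[1].split()
            return "Quorate" in flags and "Qdevice" in flags
    # No 'Flags:' line at all: treat the QDevice as not active.
    return False


def check_local_node() -> bool:
    """Run 'pvecm status' on this cluster node and evaluate its output."""
    result = subprocess.run(["pvecm", "status"], capture_output=True,
                            text=True, check=True)
    return qdevice_active(result.stdout)
----

Call `check_local_node()` on a cluster node; because of `check=True`, it raises
`subprocess.CalledProcessError` if `pvecm status` exits with an error.
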
Frequently Asked Questions
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Tie Breaking
^^^^^^^^^^^^

In case of a tie, where two same-sized cluster partitions cannot see each
other but can both see the QDevice, the QDevice randomly chooses one of those
partitions and provides a vote to it.

Still TODO
^^^^^^^^^^

There is still more to add here.


Corosync Configuration
----------------------
