mirror of
git://git.proxmox.com/git/pve-docs.git
synced 2025-05-28 13:05:37 +03:00
initial documentation for qdevice
Authored by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Co-Authored by: Oguz Bektas <o.bektas@proxmox.com>
Signed-off-by: Oguz Bektas <o.bektas@proxmox.com>
parent
b05a12f81d
commit
c21d2cbe57
150
pvecm.adoc
@ -753,6 +753,156 @@ If you cannot reboot the whole cluster ensure no High Availability services are
configured and then stop the corosync service on all nodes. After corosync is
stopped on all nodes, start it again one node after the other.

Corosync External Vote Support
------------------------------

This section describes a way to deploy an external voter in a {pve} cluster.
When configured, the cluster can sustain more node failures without
violating safety properties of the cluster communication.

For this to work, two services are involved:

* a so-called QDevice daemon which runs on each {pve} node

* an external vote daemon which runs on an independent server.

As a result, you can achieve higher availability even in smaller setups (for
example 2+1 nodes).

QDevice Technical Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~

The Corosync Quorum Device (QDevice) is a daemon which runs on each cluster
node. It provides a configured number of votes to the cluster's quorum
subsystem, based on the decision of an externally running third-party
arbitrator.
Its primary use is to allow a cluster to sustain more node failures than
standard quorum rules allow. This can be done safely, as the external device
can see all nodes and thus choose only one set of nodes to give its vote to.
This will only be done if said set of nodes can have quorum (again) after
receiving the third-party vote.

Currently, only 'QDevice Net' is supported as a third-party arbitrator. It is
a daemon which provides a vote to a cluster partition if it can reach the
partition members over the network. It will only give votes to one partition
of a cluster at any time.
It is designed to support multiple clusters and is almost configuration and
state free. New clusters are handled dynamically and no configuration file
is needed on the host running a QDevice.

The only requirements for the external host are that it has network access to
the cluster and that a corosync-qnetd package is available for it. We provide
such a package for Debian based hosts; other Linux distributions should also
have a package available through their respective package manager.

NOTE: In contrast to corosync itself, a QDevice connects to the cluster over
TCP/IP and thus does not need a multicast capable network between itself and
the cluster. In fact, the daemon may run outside of the LAN and can have
longer latencies than 2 ms.

Supported Setups
~~~~~~~~~~~~~~~~

We support QDevices for clusters with an even number of nodes and recommend
it for 2 node clusters, if they should provide higher availability.
For clusters with an odd node count, we currently discourage the use of
QDevices. The reason for this is the difference in the votes which the QDevice
provides for each cluster type. Even numbered clusters get a single additional
vote, which can only increase availability: if the QDevice itself fails, we
are in the same situation as with no QDevice at all.

With an odd numbered cluster size, on the other hand, the QDevice provides
'(N-1)' votes, where 'N' corresponds to the cluster node count. This
difference makes sense: if we had only one additional vote, the cluster could
get into a split-brain situation.
This algorithm allows all nodes but one (and naturally the QDevice itself) to
fail.
There are two drawbacks with this:

* If the QNet daemon itself fails, no other node may fail or the cluster
  immediately loses quorum. For example, in a cluster with 15 nodes, 7
  could fail before the cluster becomes inquorate. But if a QDevice is
  configured here and said QDevice itself fails, **no single node** of
  the 15 may fail. The QDevice acts almost as a single point of failure in
  this case.

* The fact that all but one node plus QDevice may fail sounds promising at
  first, but this may result in a mass recovery of HA services that would
  overload the single node left. Also, a Ceph server will stop providing
  services if only '((N-1)/2)' nodes or fewer remain online.

If you understand the drawbacks and implications, you can decide yourself if
you want to use this technology in an odd numbered cluster setup.
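
The vote arithmetic described in this section can be checked with a small
shell sketch. It assumes the usual majority rule, where quorum is more than
half of the expected votes, and one vote per node. The function names
'quorum', 'qdevice_votes' and 'tolerated_failures' are made up for this
illustration and are not part of any {pve} or corosync tool.

```shell
#!/bin/sh
# Illustrative helpers only; none of these names exist in pvecm or corosync.

# Assumed majority rule: quorum is more than half of the expected votes.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# Votes contributed by the QDevice, as described in this section:
# 1 for an even node count N, N-1 for an odd node count N.
qdevice_votes() {
    if [ $(( $1 % 2 )) -eq 0 ]; then
        echo 1
    else
        echo $(( $1 - 1 ))
    fi
}

# Number of node failures the cluster survives while staying quorate.
#   $1 = node count (one vote per node, assumed)
#   $2 = votes configured for the QDevice (0 if there is none)
#   $3 = votes the QDevice currently delivers (0 if it has failed)
tolerated_failures() {
    expected=$(( $1 + $2 ))
    q=$(quorum "$expected")
    needed=$(( q - $3 ))          # node votes still required for quorum
    if [ "$needed" -lt 1 ]; then  # at least one node has to survive
        needed=1
    fi
    echo $(( $1 - needed ))
}

n=15
qv=$(qdevice_votes "$n")
echo "no QDevice:     $(tolerated_failures "$n" 0 0)"
echo "QDevice online: $(tolerated_failures "$n" "$qv" "$qv")"
echo "QDevice failed: $(tolerated_failures "$n" "$qv" 0)"
```

For 15 nodes this prints 7, 14 and 0, which matches the example given in the
first drawback. For a 2 node cluster ('tolerated_failures 2 1 1') the result
is 1, compared to 0 without a QDevice.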

QDevice-Net Setup
~~~~~~~~~~~~~~~~~

We recommend running any daemon which provides votes to corosync-qdevice as an
unprivileged user. {pve} and Debian Stretch provide a package which is
already configured to do so.
The traffic between the daemon and the cluster must be encrypted to ensure a
safe and secure QDevice integration in {pve}.

First, install the 'corosync-qnetd' package on your external server and
the 'corosync-qdevice' package on all cluster nodes.

After that, ensure that all nodes in the cluster are online.

You can now set up your QDevice by running the following command on one
of the {pve} nodes:

----
pve# pvecm qdevice setup <QDEVICE-IP>
----

The SSH key from the cluster will be automatically copied to the QDevice. You
might need to enter an SSH password during this step.

After you have entered the password and all steps have completed successfully,
you will see "Done". You can check the status now:

----
pve# pvecm status

...

Votequorum information
~~~~~~~~~~~~~~~~~~~~~
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2
Flags:            Quorate Qdevice

Membership information
~~~~~~~~~~~~~~~~~~~~~~
    Nodeid      Votes    Qdevice Name
    0x00000001      1    A,V,NMW 192.168.22.180 (local)
    0x00000002      1    A,V,NMW 192.168.22.181
    0x00000000      1            Qdevice

----

The 'Qdevice' flag and the additional 'Qdevice' entry in the membership list
mean that the QDevice is set up.
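
For scripted checks, the same information can be read from the status output.
The following is a minimal sketch that relies only on the 'Flags:' line shown
in the output above. The function name 'has_qdevice_flag' is hypothetical, and
the exact output layout may differ between versions.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the "Flags:" line of `pvecm status`
# output (read from stdin) contains both "Quorate" and "Qdevice".
has_qdevice_flag() {
    flags=$(grep '^Flags:' || true)
    case "$flags" in
        *Quorate*) ;;
        *) return 1 ;;
    esac
    case "$flags" in
        *Qdevice*) ;;
        *) return 1 ;;
    esac
    return 0
}

# Sample lines taken from the status output above. On a cluster node you
# would pipe the output of `pvecm status` into the function instead.
sample='Expected votes:   3
Total votes:      3
Quorum:           2
Flags:            Quorate Qdevice'

if printf '%s\n' "$sample" | has_qdevice_flag; then
    echo "QDevice is set up and the cluster is quorate"
else
    echo "QDevice flag missing or cluster not quorate"
fi
```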

Frequently Asked Questions
~~~~~~~~~~~~~~~~~~~~~~~~~~

Tie Breaking
^^^^^^^^^^^^

In case of a tie, where two same-sized cluster partitions cannot see each
other but can both see the QDevice, the QDevice randomly chooses one of those
partitions and provides a vote to it.
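
As a worked example of such a tie, consider a 4 node cluster that splits into
two partitions of 2 nodes each. The sketch below assumes one vote per node,
one QDevice vote and the usual majority rule; 'partition_quorate' is a name
invented for this illustration and not an existing command.

```shell
#!/bin/sh
# Illustration only: decide whether a partition is quorate.
#   $1 = nodes in the partition (one vote each, assumed)
#   $2 = QDevice votes granted to this partition (0 or 1 here)
#   $3 = expected votes of the whole cluster, QDevice included
partition_quorate() {
    have=$(( $1 + $2 ))
    need=$(( $3 / 2 + 1 ))   # assumed majority rule
    [ "$have" -ge "$need" ]
}

expected=5   # 4 nodes + 1 QDevice vote

# Partition A receives the QDevice vote, partition B does not.
if partition_quorate 2 1 "$expected"; then
    echo "A: quorate"
else
    echo "A: not quorate"
fi
if partition_quorate 2 0 "$expected"; then
    echo "B: quorate"
else
    echo "B: not quorate"
fi
```

Only the partition that receives the vote reaches the required 3 of 5 votes,
so at most one side of the split stays quorate.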

Still TODO
^^^^^^^^^^

There is still stuff to add here.

Corosync Configuration
----------------------