125 lines
5.1 KiB
Plaintext
125 lines
5.1 KiB
Plaintext
|
I. Fence_xvm - the Xen virtual machine fencing agent
|
||
|
|
||
|
Fence_xvm is an agent which establishes a communications link between
|
||
|
a cluster of virtual machines (VC) and a cluster of domain0/physical
|
||
|
nodes which are hosting the virtual cluster. Its operations are
|
||
|
fairly simple.
|
||
|
|
||
|
(a) Start a listener service.
|
||
|
(b) Send a multicast packet requesting that a VM be fenced.
|
||
|
(c) Authenticate client.
|
||
|
(e) Read response.
|
||
|
(f) Exit with success/failure, depending on the response received.
|
||
|
|
||
|
If any of the above steps fail, the fencing agent exits with a failure
|
||
|
code and fencing is retried by the virtual cluster at a later time.
|
||
|
Because of the simplicty of fence_xvm, it is not necessary that
|
||
|
fence_xvm be run from within a virtualized guest - all it needs is
|
||
|
libnspr and libnss and a shared private key (for authentication; we
|
||
|
would hate to receive a false positive response from a node not in the
|
||
|
cluster!).
|
||
|
|
||
|
|
||
|
II. Fence_xvmd - The Xen virtual machine fencing host
|
||
|
|
||
|
Fence_xvmd is a daemon which runs on physical hosts (e.g. in domain0)
|
||
|
of the cluster hosting the Xen virtual cluster. It listens on a port
|
||
|
for multicast traffic from Xen virtual cluster(s), and takes actions.
|
||
|
Multiple disjoint virtual clusters can coexist on a single physical
|
||
|
host cluster, but this requires multiple instances of fence_xvmd.
|
||
|
|
||
|
NOTE: fence_xvmd *MUST* be run on ALL nodes in a given cluster which
|
||
|
will be hosting virtual machines if fence_xvm is to be used for
|
||
|
fencing!
|
||
|
|
||
|
There are a couple of ways the multicast packet is handled,
|
||
|
depending on the state of the host OS. It might be hosting the VM,
|
||
|
or it might not. Furthermore, the VM might "reside" on a host which
|
||
|
has failed.
|
||
|
|
||
|
In order to be able to guarantee safe fencing of a VM even if the
|
||
|
last- known host is down, we must store the last-known locations of
|
||
|
each virtual machine in some sort of cluster-wide way. For this, we
|
||
|
use the AIS Checkpointing API, which is provided by OpenAIS. Every
|
||
|
few seconds, fence_xvmd queries the Xen Hypervisor via libvirt and
|
||
|
stores any local VM states in a checkpoint. In the event of a
|
||
|
physical node failure (which consequently causes the failure of one
|
||
|
or more Xen guests), we can then read the checkpoint section
|
||
|
corresponding to the guest we need to fence to find out the previous
|
||
|
owner. With that information, we can then check with CMAN to see if
|
||
|
the last-known host node has been fenced. If so, then the VM is
|
||
|
clean as well. The physical cluster must, therefore, have fencing
|
||
|
in order for fence_xvmd to work.
|
||
|
|
||
|
Operation of a node hosting a VM which needs to be fenced:
|
||
|
|
||
|
(a) Receive multicast packet
|
||
|
(b) Authenticate multicast packet
|
||
|
(c) Open connection to host contained within multicast
|
||
|
packet.
|
||
|
(d) Authenticate server.
|
||
|
(e) Carry out fencing operation (e.g. call libvirt to destroy or
|
||
|
reboot the VM; there is no "on" method at this point).
|
||
|
(f) If operation succeeds, send success response.
|
||
|
|
||
|
Operation of high-node-ID:
|
||
|
|
||
|
(a) Receive multicast packet
|
||
|
(b) Authenticate multicast packet
|
||
|
(c) Read VM state from checkpoint
|
||
|
(d) Check liveliness of nodeID hosting VM (if alive, do nothing)
|
||
|
(e) Open connection to host contained within multicast
|
||
|
packet.
|
||
|
(f) Check with CMAN to see if last-known host has been fenced.
|
||
|
(g) If last-known host has been fenced, send success response.
|
||
|
(h) Authenticate server & send response.
|
||
|
|
||
|
NOTE: There is always a possibility that a VM is started again
|
||
|
before the fencing operation and checkpoint update for that VM
|
||
|
occurs. If the VM has booted and rejoined the cluster, fencing will
|
||
|
not be necessary. If it is in the process of booting, but has not
|
||
|
yet joined the cluster, fencing will also not be necessary - because
|
||
|
it will not be using cluster resources yet.
|
||
|
|
||
|
|
||
|
III. Security considerations
|
||
|
|
||
|
While fencing is generally expected to run on a more or less trusted
|
||
|
network, there are cases where it may not be.
|
||
|
|
||
|
* The multicast packet is subject to replay attacks, but because no
|
||
|
fencing action is taken based solely on the information contained
|
||
|
within the packet, this should not allow an attacker to maliciously
|
||
|
fence a VM from outside the cluster, though it may be possible to
|
||
|
cause a DoS of fence_xvmd if enough multicast packets are sent.
|
||
|
|
||
|
* The only currently supported authentication mechanisms are simple
|
||
|
challenge-response based on a shared private key and pseudorandom
|
||
|
number generation.
|
||
|
|
||
|
* An attacker with access to the shared key(s) can easily fence any
|
||
|
known VM, even if they are not on a cluster node.
|
||
|
|
||
|
* Different shared keys should be used for different virtual
|
||
|
clusters on the same subnet (whether in the same physical cluster
|
||
|
or not). Additionally, multiple fence_xvmd instances must be run
|
||
|
(each listening on a different multicast IP + port combination).
|
||
|
|
||
|
IV. Configuration
|
||
|
|
||
|
Generate a random key file. An example of how to generate it is:
|
||
|
|
||
|
dd if=/dev/urandom of=/etc/cluster/fence_xvm.key bs=4096 count=1
|
||
|
|
||
|
Distribute the generated key file to all domUs in a cluster as well
|
||
|
as all dom0s which will be hosting that particular cluster of domUs.
|
||
|
The key should not be placed on shared file systems (because shared
|
||
|
file systems require the cluster, which requires fencing...).
|
||
|
|
||
|
Start fence_xvmd on all dom0s
|
||
|
|
||
|
Configure fence_xvm on the domU cluster...
|
||
|
|
||
|
rest...tbd
|
||
|
|