mirror of
https://github.com/samba-team/samba.git
synced 2024-12-24 21:34:56 +03:00
f667ff6485
Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> Autobuild-User(master): Amitay Isaacs <amitay@samba.org> Autobuild-Date(master): Thu Apr 28 13:18:07 CEST 2016 on sn-devel-144
80 lines
2.7 KiB
Plaintext
80 lines
2.7 KiB
Plaintext
Writing CTDB cluster mutex helpers
|
|
==================================
|
|
|
|
CTDB uses cluster-wide mutexes to protect against a "split brain",
|
|
which could occur if the cluster becomes partitioned due to network
|
|
failure or similar.
|
|
|
|
CTDB uses a cluster-wide mutex for its "recovery lock", which is used
|
|
to ensure that only one database recovery can happen at a time. For
|
|
an overview of recovery lock configuration see the RECOVERY LOCK
|
|
section in ctdb(7). CTDB tries to ensure correct operation of the
|
|
recovery lock by attempting to take the recovery lock when CTDB knows
|
|
that it should already be held.
|
|
|
|
By default, CTDB uses a supplied mutex helper that uses a fcntl(2)
|
|
lock on a specified file in the cluster filesystem.
|
|
|
|
However, a user supplied mutex helper can be used as an alternative.
|
|
The rest of this document describes the API for mutex helpers.
|
|
|
|
A mutex helper is an external executable
|
|
----------------------------------------
|
|
|
|
A mutex helper is an external executable that can be run by CTDB.
|
|
There are no CTDB-specific compilation dependencies. This means that
|
|
a helper could easily be scripted around existing commands. Mutex
|
|
helpers are run relatively rarely and are not time critical.
|
|
Therefore, reliability is preferred over high performance.
|
|
|
|
Taking a mutex with a helper
|
|
----------------------------
|
|
|
|
1. Helper is executed with helper-specific arguments
|
|
|
|
2. Helper attempts to take mutex
|
|
|
|
3. On success, the helper writes ASCII 0 to standard output
|
|
|
|
4. Helper stays running, holding mutex, awaiting termination by CTDB
|
|
|
|
5. When a helper receives SIGTERM it must release any mutex it is
|
|
holding and then exit.
|
|
|
|
Status codes
|
|
------------
|
|
|
|
CTDB ignores the exit code of a helper. Instead, CTDB reacts to a
|
|
single ASCII character that is sent to it via a helper's standard
|
|
output.
|
|
|
|
Valid status codes are:
|
|
|
|
0 - The helper took the mutex and is holding it, awaiting termination.
|
|
|
|
1 - The helper was unable to take the mutex due to contention.
|
|
|
|
2 - The helper took too long to take the mutex.
|
|
|
|
Helpers do not need to implement this status code. CTDB
|
|
already implements any required timeout handling.
|
|
|
|
3 - An unexpected error occurred.
|
|
|
|
If a 0 status code is sent then it the helper should periodically
|
|
check if the (original) parent processes still exists while awaiting
|
|
termination. If the parent process disappears then the helper should
|
|
release the mutex and exit. This avoids stale mutexes.
|
|
|
|
If a non-0 status code is sent then the helper can exit immediately.
|
|
However, if the helper does not exit then it must terminate if it
|
|
receives SIGTERM.
|
|
|
|
Logging
|
|
-------
|
|
|
|
Anything written to standard error by a helper is incorporated into
|
|
CTDB's logs. A helper should generally only output to stderr for
|
|
unexpected errors and avoid output to stderr on success or on mutex
|
|
contention.
|