BUG: https://bugzilla.samba.org/show_bug.cgi?id=12113 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Writing CTDB cluster mutex helpers
==================================

CTDB uses cluster-wide mutexes to protect against a "split brain",
which could occur if the cluster becomes partitioned due to network
failure or similar.

CTDB uses a cluster-wide mutex for its "recovery lock", which is used
to ensure that only one database recovery can happen at a time. For
an overview of recovery lock configuration, see the RECOVERY LOCK
section in ctdb(7). CTDB checks that the recovery lock is working
correctly by attempting to take it when CTDB knows that it should
already be held; such an attempt is expected to fail due to
contention.

By default, CTDB uses a supplied mutex helper that takes an fcntl(2)
lock on a specified file in the cluster filesystem.

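A minimal Python sketch of the kind of lock the default helper takes, written for illustration rather than copied from CTDB's own helper: one non-blocking attempt to take an exclusive fcntl(2) lock on a file. `fcntl.lockf()` is the Python standard library's wrapper around fcntl(2) record locking; the function name `try_fcntl_lock` and the choice to lock the whole file are assumptions.

```python
import errno
import fcntl
import os


def try_fcntl_lock(path):
    """Try once, without blocking, to take an exclusive fcntl(2) lock.

    Returns the open file descriptor on success (the lock is held while
    it stays open) or None if another process already holds the lock.
    Any other error propagates as OSError.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # LOCK_NB makes the call fail at once instead of waiting.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as e:
        os.close(fd)
        if e.errno in (errno.EACCES, errno.EAGAIN):
            return None  # contention: another process holds the lock
        raise
    return fd
```

fcntl(2) locks are owned by the process: a second attempt from the same process succeeds, closing any descriptor for the file in that process drops the lock, and the lock is released automatically when the process exits. Contention is therefore only visible between processes.
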
However, a user-supplied mutex helper can be used as an alternative.
The rest of this document describes the API for mutex helpers.

A mutex helper is an external executable
----------------------------------------

A mutex helper is an external executable that can be run by CTDB.
There are no CTDB-specific compilation dependencies. This means that
a helper could easily be scripted around existing commands. Mutex
helpers are run relatively rarely and are not time-critical.
Therefore, reliability is preferred over high performance.

Taking a mutex with a helper
----------------------------

1. Helper is executed with helper-specific arguments

2. Helper attempts to take mutex

3. On success, the helper writes the ASCII character "0" to standard
   output

4. Helper stays running, holding mutex, awaiting termination by CTDB

5. When a helper receives SIGTERM it must release any mutex it is
   holding and then exit.

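The steps above can be sketched as a small Python function. This is a minimal sketch under stated assumptions, not CTDB's implementation: `run_helper`, `acquire`, `release` and `Terminated` are hypothetical names, the mutex itself is left abstract, and SIGTERM is turned into an exception so that the `finally` clause always releases a held mutex.

```python
import signal
import sys
import time


class Terminated(Exception):
    """Raised by the SIGTERM handler to break out of the wait loop."""


def _on_sigterm(signum, frame):
    raise Terminated()


def run_helper(acquire, release, out=None, poll_interval=1.0):
    """Take the mutex, report the result, hold it until SIGTERM.

    acquire() returns True if the mutex was taken, False on contention;
    release() gives the mutex up again. Both are supplied by the caller.
    """
    out = out if out is not None else sys.stdout
    signal.signal(signal.SIGTERM, _on_sigterm)
    held = False
    try:
        held = acquire()                 # step 2: try to take the mutex
        # Step 3: report with one ASCII character ("1" on contention).
        out.write("0" if held else "1")
        out.flush()                      # CTDB must see it now, not at exit
        # Step 4: keep holding until SIGTERM; with nothing held, return now.
        while held:
            time.sleep(poll_interval)
    except Terminated:
        pass                             # step 5: fall through to release
    finally:
        if held:
            release()
```

A helper would call `run_helper()` from its main program after parsing its helper-specific arguments, passing functions that take and drop whatever lock it implements. If SIGTERM arrives while `acquire()` is still running, `held` is never set and nothing is released explicitly; for an fcntl(2) lock that is harmless, because the lock disappears when the process exits.
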
Status codes
------------

CTDB ignores the exit code of a helper. Instead, CTDB reacts to a
single ASCII character that is sent to it via a helper's standard
output.

Valid status codes are:

0 - The helper took the mutex and is holding it, awaiting
    termination.

1 - The helper was unable to take the mutex due to contention.

2 - The helper took too long to take the mutex.

    Helpers do not need to implement this status code. CTDB
    already implements any required timeout handling.

3 - An unexpected error occurred.

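To exercise a helper by hand, a small test harness can play CTDB's side of the protocol. This Python sketch is hypothetical (`probe_helper` and `STATUS_MEANINGS` are not part of CTDB): it starts the helper, reads exactly one status character from its standard output, sends SIGTERM, and ignores the exit code, as described above. Unlike CTDB, it applies no timeout, so it waits for as long as the helper takes to report.

```python
import signal
import subprocess

# Meanings of the single-character status codes listed above.
STATUS_MEANINGS = {
    "0": "took the mutex and is holding it",
    "1": "contention",
    "2": "took too long",
    "3": "unexpected error",
}


def probe_helper(argv):
    """Run a mutex helper, then terminate it; return (status, meaning).

    The exit code is collected only to reap the process; like CTDB,
    this harness does not interpret it.
    """
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE)
    try:
        # Exactly one byte; b"" means the helper exited without reporting.
        status = proc.stdout.read(1).decode("ascii", "replace")
    finally:
        proc.send_signal(signal.SIGTERM)  # asks a holding helper to release
        proc.wait()
        proc.stdout.close()
    return status, STATUS_MEANINGS.get(status, "invalid or missing status")
```

For example, `probe_helper(["/path/to/helper", "arg"])` with a helper that finds the mutex already held should return `("1", "contention")`, whatever exit code the helper uses.
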
If a 0 status code is sent then the helper should periodically
check whether the (original) parent process still exists while
awaiting termination. If the parent process disappears then the
helper should release the mutex and exit. This avoids stale mutexes.
Note that a helper should never wait for parent process ID 1!

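One way to implement the parent check, sketched in Python under the assumption that polling is acceptable: remember the parent PID when the helper starts and poll `os.getppid()`. Once the original parent exits, the helper is re-parented (typically to PID 1 or to a subreaper), so the value changes. The function name `wait_for_parent_exit` and the polling interval are assumptions; refusing to wait on PID 1 follows the note above.

```python
import os
import time


def wait_for_parent_exit(original_ppid, poll_interval=1.0):
    """Return once the process that started this helper has gone away."""
    if original_ppid == 1:
        # Never wait for PID 1: it normally never exits, so a helper
        # waiting on it would hold the mutex indefinitely.
        raise ValueError("refusing to wait for parent PID 1")
    # getppid() reports our actual parent, so a reused PID belonging to
    # some unrelated process cannot make the old parent look alive.
    while os.getppid() == original_ppid:
        time.sleep(poll_interval)
```

In a helper, `original_ppid` would be captured with `os.getppid()` as early as possible, and this wait would replace a plain sleep loop while the mutex is held, with SIGTERM still handled as in step 5.
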
If a non-0 status code is sent then the helper can exit immediately.
However, if the helper does not exit then it must terminate if it
receives SIGTERM.

Logging
-------

Anything written to standard error by a helper is incorporated into
CTDB's logs. A helper should generally write to stderr only for
unexpected errors, and should avoid writing to stderr on success or
on mutex contention.

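A minimal Python sketch of that policy: a hypothetical `report_status()` function that always sends the status character on standard output but writes to standard error only for status 3. The `mutex helper:` message prefix is an arbitrary choice for illustration.

```python
import sys


def report_status(status, error=None, out=None, err=None):
    """Send one status character to CTDB, logging only unexpected errors.

    Success ("0") and contention ("1") are reported silently, because
    anything written to stderr ends up in CTDB's logs.
    """
    out = out if out is not None else sys.stdout
    err = err if err is not None else sys.stderr
    if status == "3":
        # Unexpected error: one line on stderr explaining what went wrong.
        print(f"mutex helper: unexpected error: {error}", file=err)
    out.write(status)
    out.flush()
```
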