mirror of https://github.com/samba-team/samba.git synced 2024-12-23 17:34:34 +03:00

recoverd: Stabilise the recovery master role

On rare occasions a node that has been inactive will trigger
an election when it becomes active again.  If that node has been up
for the longest then it will win the election and the recovery master
role will spuriously move.

While a node remains inactive we reset the priority time to discourage
it from winning elections.  The priority time will now reflect roughly
how long the node has been active rather than how long it has been up.
That means the most stable node is more likely to win elections.
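
The effect on elections can be sketched as below. This is a simplified
illustration, not the ctdb election code: struct node_state,
tv_compare(), refresh_priority() and wins_election() are hypothetical
names, and the rule that the earlier priority time wins (with the node
number as a tie-breaker) is an assumption made for the example.

```c
#include <stdbool.h>
#include <sys/time.h>

/* Hypothetical, simplified per-node election state (not ctdb's own struct). */
struct node_state {
	int pnn;                      /* node number, used only as a tie-breaker */
	bool inactive;                /* node is stopped or banned */
	struct timeval priority_time; /* roughly when the node last became active */
};

/* Return <0, 0 or >0 if a is earlier than, equal to or later than b. */
static int tv_compare(const struct timeval *a, const struct timeval *b)
{
	if (a->tv_sec != b->tv_sec) {
		return (a->tv_sec < b->tv_sec) ? -1 : 1;
	}
	if (a->tv_usec != b->tv_usec) {
		return (a->tv_usec < b->tv_usec) ? -1 : 1;
	}
	return 0;
}

/* While a node is inactive, keep moving its priority time forward to now. */
static void refresh_priority(struct node_state *n, struct timeval now)
{
	if (n->inactive) {
		n->priority_time = now;
	}
}

/* Assumed rule: an active node with the earlier priority time wins. */
static bool wins_election(const struct node_state *me,
			  const struct node_state *other)
{
	int cmp;

	if (me->inactive) {
		return false;
	}
	cmp = tv_compare(&me->priority_time, &other->priority_time);
	if (cmp != 0) {
		return cmp < 0;
	}
	return me->pnn < other->pnn;
}
```

In this sketch, a node that came up first but then spent time stopped or
banned ends up with a later priority time than a node that stayed
active, so it loses the comparison once it rejoins.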

Having a stable recovery master means that disabling takeover runs
while reloading IPs is more likely to succeed.  It also improves the
chances of being able to cache information in the recovery master -
for example, between takeover runs.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit f0f48f22f45e4c82eba2582efae307e25385de81)
Martin Schwenke 2013-09-17 12:00:26 +10:00 committed by Amitay Isaacs
parent 630196423a
commit 30a50c6e1e


@@ -3442,6 +3442,14 @@ static void main_loop(struct ctdb_context *ctdb, struct ctdb_recoverd *rec,
	   also frozen and that the recmode is set to active.
	*/
	if (rec->node_flags & (NODE_FLAGS_STOPPED | NODE_FLAGS_BANNED)) {
		/* If this node has become inactive then we want to
		 * reduce the chances of it taking over the recovery
		 * master role when it becomes active again.  This
		 * helps to stabilise the recovery master role so that
		 * it stays on the most stable node.
		 */
		rec->priority_time = timeval_current();
		ret = ctdb_ctrl_getrecmode(ctdb, mem_ctx, CONTROL_TIMEOUT(), CTDB_CURRENT_NODE, &ctdb->recovery_mode);
		if (ret != 0) {
			DEBUG(DEBUG_ERR,(__location__ " Failed to read recmode from local node\n"));