mirror of https://github.com/samba-team/samba.git synced 2025-01-12 09:18:10 +03:00
samba-mirror/ctdb/server
Martin Schwenke 705e4174c9 ctdb-recoverd: Gently abort recovery when election is underway
Sometimes the recovery daemon fails to get the recovery lock on one
node so that node is banned.  This seems to always happen during an
election.  The recovery is triggered because other nodes are found to
have recovery mode enabled.  They have recovery mode enabled because
an election has been forced.

The recovery daemon's main_loop() only does an initial check for an
election.  After that, a node can force an election and, in the
process, set itself to be the current winner.  In this situation,
verify_recmode() will always return MONITOR_RECOVERY_NEEDED so
do_recovery() is called.  If the previous recovery master hasn't
admitted defeat and released the recovery lock, then do_recovery()
will rightly fail.  However, it would be better if it failed a little
more gracefully, since this case is not that unusual.

Instead of trying to take the recovery lock, return early with an
error if there is an election in progress.  Note that the race is
still there but it is now much narrower.

There are probably more subtle ways of avoiding this issue, including
something like this in main_loop():

-	if (pnn != rec->recmaster) {
+	if (pnn != rec->recmaster || rec->election_timeout) {
 		return;
 	}

However, this check is done earlier so it leaves the race window open
a little wider.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Mon Jul 21 06:57:07 CEST 2014 on sn-devel-104
2014-07-21 06:57:07 +02:00
ctdb_banning.c ctdb-recoverd: Set recovery mode before freezing databases 2014-07-07 13:29:49 +02:00
ctdb_call.c ctdb-readonly: Do not use hard-coded value for readonly revoke timeout 2014-03-31 07:20:48 +02:00
ctdb_control.c ctdb-daemon: Do not thaw databases if recovery is active 2014-07-07 13:29:50 +02:00
ctdb_daemon.c ctdb-daemon: Remove ctdbd_pid global variable 2014-07-05 06:51:13 +02:00
ctdb_event_helper.c ctdb-daemon: Reset scheduler policy for helper processes 2014-06-12 08:10:36 +02:00
ctdb_freeze.c ctdb-daemon: Do not thaw databases if recovery is active 2014-07-07 13:29:50 +02:00
ctdb_keepalive.c Remove explicit include of lib/tevent/tevent.h. 2012-04-13 17:28:14 +10:00
ctdb_lock_helper.c ctdb-daemon: Reset scheduler policy for helper processes 2014-06-12 08:10:36 +02:00
ctdb_lock.c ctdb: Fix a comment typo 2014-04-30 21:05:09 +02:00
ctdb_logging.c ctdb-logging: Move controls handling functions from common to server 2014-06-12 05:40:10 +02:00
ctdb_ltdb_server.c ctdb-daemon: Support per-node robust mutex feature 2014-07-09 06:45:17 +02:00
ctdb_monitor.c ctdb/daemon: Untangle serialisation of 1st recovery -> startup -> monitor 2014-01-17 17:59:41 +11:00
ctdb_persistent.c ctdbd: Remove transaction code related to TRANS2 commits 2013-10-04 15:20:25 +10:00
ctdb_recover.c ctdb-daemon: Do not thaw databases if recovery is active 2014-07-07 13:29:50 +02:00
ctdb_recoverd.c ctdb-recoverd: Gently abort recovery when election is underway 2014-07-21 06:57:07 +02:00
ctdb_server.c ctdbd: When a node is connected, log at DEBUG NOTICE not DEBUG_INFO 2013-10-29 17:14:56 +11:00
ctdb_serverids.c RB_TREE: Add mechanism to abort a traverse 2011-11-08 13:40:28 +11:00
ctdb_statistics.c Remove explicit include of lib/tevent/tevent.h. 2012-04-13 17:28:14 +10:00
ctdb_takeover.c ctdb-daemon: Debugging for tickle updates 2014-06-20 02:07:48 +02:00
ctdb_traverse.c traverse: Send traverse end record from traverse child process 2013-09-25 14:59:45 +10:00
ctdb_tunables.c ctdb-daemon: Support per-node robust mutex feature 2014-07-09 06:45:17 +02:00
ctdb_update_record.c ctdbd: Set process names for child processes 2013-07-10 14:33:19 +10:00
ctdb_uptime.c Remove explicit include of lib/tevent/tevent.h. 2012-04-13 17:28:14 +10:00
ctdb_vacuum.c ctdb:vacuum: always run freelist_size again 2014-06-17 09:33:10 +02:00
ctdbd.c ctdb-build: Use CTDB_ETCDIR instead of ETCDIR/ctdb 2014-06-24 07:23:13 +02:00
eventscript.c ctdb-build: Use CTDB_ETCDIR instead of ETCDIR/ctdb 2014-06-24 07:23:13 +02:00