1
0
mirror of https://github.com/samba-team/samba.git synced 2025-01-06 13:18:07 +03:00
Commit Graph

587 Commits

Author SHA1 Message Date
Martin Schwenke
a53b264aee ctdb-recoverd: Use talloc() to allocate recovery lock handle
At the moment this is still local and is freed after the mutex is
successfully taken.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-09-17 22:58:20 +02:00
Martin Schwenke
af22f03dbe ctdb-recoverd: Rename hold_reclock_state to ctdb_recovery_lock_handle
This will be a longer lived structure.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-09-17 22:58:20 +02:00
Martin Schwenke
c516e58ce9 ctdb-recoverd: Re-check master on failure to take recovery lock
If the master changed while trying to take the lock then fail gracefully.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-09-17 22:58:20 +02:00
Martin Schwenke
59fc01646c ctdb-recoverd: Clean up taking of recovery lock
No functional changes, just coding style cleanups and debug message
tweaks.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-09-17 22:58:20 +02:00
Martin Schwenke
929634126a ctdb-config: Switch tunable DisableIPFailover to a config option
Use the "failover:disabled" option instead.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13589

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-08-24 10:59:21 +02:00
Martin Schwenke
914e9f22d8 ctdb-daemon: Pass DisableIPFailover tunable via environment variable
Preparation for obsoleting this tunable.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13589

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-08-24 10:59:21 +02:00
Martin Schwenke
b318cf22ba ctdb-recoverd: Set the process name correctly
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-07-02 08:51:22 +02:00
Martin Schwenke
57834c64be ctdb-common: Rename system utility files
system_socket.[ch] will contain all the raw socket code and other
functions that use ctdb_sock_addr.  system.[ch] will contain other
platform dependent functions.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-07-02 08:51:20 +02:00
Amitay Isaacs
6e588913dd ctdb-recoverd: Abort recovery/takeover if recmaster changes
Recovery and takeover are run via helper from recovery daemon.  While the
helpers are running, it's possible for the current node to lose election.
If that happens, abort the currently running recovery/takeover helper.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2017-09-12 12:23:19 +02:00
Amitay Isaacs
1f7f112317 ctdb-client: Fix ctdb_attach() to use database flags
BUG: https://bugzilla.samba.org/show_bug.cgi?id=12978

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>

Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Fri Aug 25 13:32:58 CEST 2017 on sn-devel-144
2017-08-25 13:32:58 +02:00
Amitay Isaacs
9987fe7209 ctdb-client: Optionally return database id from ctdb_ctrl_createdb()
BUG: https://bugzilla.samba.org/show_bug.cgi?id=12978

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2017-08-25 09:41:26 +02:00
Amitay Isaacs
4bd0a20a75 ctdb-client: Fix ctdb_ctrl_createdb() to use database flags
BUG: https://bugzilla.samba.org/show_bug.cgi?id=12978

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2017-08-25 09:41:25 +02:00
Amitay Isaacs
ea91967b0d ctdb-client: Drop tdb_flags argument to ctdb_attach()
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2017-06-26 15:47:24 +02:00
Amitay Isaacs
ea46699b27 ctdb-recovery: Do not run local ip verification when in recovery
BUG: https://bugzilla.samba.org/show_bug.cgi?id=12857

If we drop public IPs because CTDB is in recovery for too long, then
avoid spamming logs "Trigger takeoverrun" every second.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2017-06-24 10:28:21 +02:00
Amitay Isaacs
2fd2ccd4c8 ctdb-recovery: Get recmode unconditionally in the main_loop
BUG: https://bugzilla.samba.org/show_bug.cgi?id=12857

This can be used later in the main_loop to avoid the local ip check.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2017-06-24 10:28:21 +02:00
Chris Lamb
f7dc9f1e12 Correct "supressed" typo.
Signed-off-by: Chris Lamb <chris@chris-lamb.co.uk>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
Reviewed-by: Garming Sam <garming@catalyst.net.nz>
2017-02-22 08:26:21 +01:00
Martin Schwenke
f2485d3ab9 ctdb-recoverd: Integrate takeover helper
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-12-19 04:07:08 +01:00
Martin Schwenke
5b60414265 ctdb-recoverd: Generalise helper state, handler and launching
These can also be used for takeover handler.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-12-19 04:07:08 +01:00
Amitay Isaacs
41c964fdbc ctdb-recovery: Start recovery helper with ctdb_vfork_exec
The recovery helper does it's own logging, so there is no need to
pass logfd.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>

Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Mon Dec  5 11:59:42 CET 2016 on sn-devel-144
2016-12-05 11:59:42 +01:00
Amitay Isaacs
d53dbd0dcc ctdb-daemon: Initialize logging in recovery daemon
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2016-12-05 08:09:22 +01:00
Amitay Isaacs
74ccc7280a ctdb-recoverd: Log a message when terminating
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2016-12-05 08:09:22 +01:00
Amitay Isaacs
3d6860b275 ctdb-daemon: Remove setting of debug_extra from switch_from_server_to_client()
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2016-12-05 08:09:22 +01:00
Martin Schwenke
bdc049dfce ctdb-common: Drop CTDB's copy of sys_read() and sys_write()
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Nov 29 11:22:40 CET 2016 on sn-devel-144
2016-11-29 11:22:40 +01:00
Amitay Isaacs
2a9584dc0a ctdb-daemon: Remove unused code cmdline.[ch]
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2016-11-25 04:19:23 +01:00
Amitay Isaacs
67351e61ee ctdb-recoverd: Drop code to freeze databases from set_recovery_mode()
This function is called only once from force_election() and does not
require freezing of databases.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2016-09-14 08:39:28 +02:00
Martin Schwenke
abe5445c24 ctdb-recoverd: Don't directly release rogue IP addresses
This is inconsistent with the rest of the local IP verification.  It
should notice problems but not try to fix them directly.  Like other
cases, it should use an IP takeover run to try to fix the problem.  In
this case the address might have just been added and an out-of-band
RELEASE_IP might cause conflicts (i.e. "another change is in flight")
with a scheduled IP takeover run.

This effectively reverts commit
694c1b269e.  Not sure why this was
needed after c7e648c2d1.  More recently
commit 6471541d6d moves responsibility
for determining interface/netmask to 10.interface so this should
continue to work just fine.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-08-17 23:00:26 +02:00
Amitay Isaacs
6693fa59dc ctdb-recoverd: Remove code that updates database priorities during recovery
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2016-07-25 21:29:42 +02:00
Amitay Isaacs
9338443a92 ctdb-recovery: Remove serial database recovery code
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2016-07-25 21:29:42 +02:00
Martin Schwenke
a26d39e5ce ctdb-recoverd: Drop code to change the IP assignment tree
The tree is no longer used in verification.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-07-04 15:42:24 +02:00
Martin Schwenke
35644d0d82 ctdb-ipalloc: Drop remote IP verification
It is only run during a takeover run and only logs errors.  It doesn't
actually do anything to fix potential errors.  The takeover run should
fix any inconsistencies anyway.

Instead, leave a comment in the recovery daemon's monitoring loop to
add proper remote IP verification later.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-07-04 15:42:24 +02:00
Amitay Isaacs
ecb74721e7 ctdb-recoverd: Avoid duplicate recoverd event in parallel recovery
BUG: https://bugzilla.samba.org/show_bug.cgi?id=11956

In do_recovery, after the recovery and takeover is complete, recoverd
event is triggered.  When the parallel database recovery was separated,
ctdb_recovery_helper implemented sending END_RECOVERY control which
causes recoverd event to be triggered.  So when there is parallel database
recovery, recoverd event is triggered twice.

Instead move the call to run_recovered_eventscript() explicitly in
the serial recovery code path.  This avoids the duplication trigger of
recoverd event.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2016-06-08 10:33:19 +02:00
Martin Schwenke
174449c1e0 ctdb-recoverd: Release recovery lock on exit
The recovery lock helper must exit when it notices its parent is gone.
However, that can take a few seconds.

The usual way of terminating the recovery daemon is for the main ctdbd
to send it a SIGTERM.  Installing a handler is nice and simple.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-06-08 00:51:29 +02:00
Martin Schwenke
75717ac667 ctdb-recoverd: Add handler for lost recovery lock
If the process holding the recovery lock terminates unexpectedly then
the recovery daemon needs to know that the lock is no longer held.

While here, rename hold_reclock_handler() to take_reclock_handler() so
there is a clear difference between the two handler names.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-06-08 00:51:29 +02:00
Martin Schwenke
95a7920d22 ctdb-cluster-mutex: Register an extra handler for when mutex is lost
Pass NULL if not needed.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-06-08 00:51:29 +02:00
Martin Schwenke
4f0ca0107c ctdb-cluster-mutex: ctdb_cluster_mutex() registers handler and private data
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-06-08 00:51:29 +02:00
Martin Schwenke
145ddcbe37 ctdb-cluster-mutex: Drop cluster_mutex_handler() ctdb and handle arguments
This makes the API more general.  If they are needed in a handler then
they can be in the private data.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-06-08 00:51:29 +02:00
Martin Schwenke
a192364a12 ctdb-recoverd: Simplify reclock handler
Do the interesting work outside the handler.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-06-08 00:51:29 +02:00
Martin Schwenke
197264dfe7 ctdb-recoverd: Recovery lock handle should be in recovery deamon context
This shouldn't be in the CTDB context.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-06-08 00:51:29 +02:00
Martin Schwenke
5c4744e69d ctdb-cluster-mutex: Pass a talloc context to allocate the handle off
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-06-08 00:51:28 +02:00
Martin Schwenke
58be187de0 ctdb-recoverd: No need to reset reclock handler
It won't be called more than once by the cluster mutex code.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-06-08 00:51:28 +02:00
Martin Schwenke
630f169653 ctdb-recoverd: Fix buggy function return on memory allocation failure
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-06-08 00:51:28 +02:00
Martin Schwenke
dbd4e67aee ctdb-recoverd: Don't expose internal cluster mutex status
Just expose whether the lock was taken.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-06-08 00:51:28 +02:00
Martin Schwenke
fdd214ce6a ctdb-daemon: Rename recovery lock file to just recovery lock
It isn't necessarily a file.

Don't bother changing the control, since it doesn't pervade the code.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-06-08 00:51:28 +02:00
Martin Schwenke
1127f3ae1e ctdb-recovery: Don't update recovery lock from daemon
It can't change after startup.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-06-08 00:51:28 +02:00
Martin Schwenke
23823f128f ctdb-recovery: Don't sync recovery lock across cluster
Support for updating the recovery lock is being removed because it
isn't possible to recover from failure.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-06-08 00:51:28 +02:00
Amitay Isaacs
f8141e91a6 ctdb-recoverd: Freeze databases whenever the node is INACTIVE
If the node becomes stopped or banned after recovery is marked
active, then it will never freeze the databases, and hence the
node will keep banning itself indefinitely, until ctdbd is restarted.

This is a regression from 4.3, introduced with

b4357a79d9

and

d8f3b490bb

BUG: https://bugzilla.samba.org/show_bug.cgi?id=11945

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>

Autobuild-User(master): Michael Adam <obnox@samba.org>
Autobuild-Date(master): Wed Jun  1 17:36:12 CEST 2016 on sn-devel-144
2016-06-01 17:36:12 +02:00
Martin Schwenke
f9d4cb4c29 ctdb-recoverd: Unify takeover run triggering code in main loop
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri May 13 17:15:57 CEST 2016 on sn-devel-144
2016-05-13 17:15:57 +02:00
Martin Schwenke
e3e4f37c41 ctdb-recoverd: Add early return in srvid_requests_reply()
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-05-13 13:47:17 +02:00
Martin Schwenke
ebbeab74ed ctdb-recoverd: Drop an unnecessary log message
do_takeover_run() will logs something at NOTICE level anyway.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-05-13 13:47:17 +02:00
Martin Schwenke
2a93b8423b ctdb-recoverd: Move takeover run checks after recover checks
If a recovery is going to be done then this will be followed by a
takeover run anyway.  So, there's no use doing the takeover run
checks, potentially doing a takeover run and then doing a recovery.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-05-13 13:47:17 +02:00
Martin Schwenke
662f06de9f ctdb-recoverd: Drop explicit check to flag takeover run needed
The recovery daemon should be less involved in the service monitoring
logic.

The cases handled here are already handled elsewhere:

* When a node becomes unhealthy/healthy the monitoring code will
  trigger a takeover run

* When a node is disabled/enabled the ctdb CLI tool will trigger a
  takeover run

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-05-13 13:47:17 +02:00
Martin Schwenke
9dc3b117e2 ctdb-takeover: Recovery daemon no longer passes fail callback
Banning is now handled by the takeover code sending banning credit
messages.

This commit makes a change in behaviour quite obvious.  Takeover runs
were initiated from several locations in the code but banning was only
done from one of these locations.  Now banning can be done from any
failed takeover run.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-05-13 13:47:17 +02:00
Martin Schwenke
866ca591d4 ctdb-recoverd: Fold IP allocation house-keeping into IP verification
Now all the IP takeover code for non-master node is in this function.
The function can always be renamed to something more suitable.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Fri May  6 15:10:59 CEST 2016 on sn-devel-144
2016-05-06 15:10:59 +02:00
Martin Schwenke
4947789b2a ctdb-recoverd: Clean up local IP verification
Update log levels and messages, comments and wrapping of long lines.
No functional changes.

Note that interfaces_have_changed() already does adequate logging.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-05-06 11:39:09 +02:00
Martin Schwenke
bdcc796f3c ctdb-recoverd: Skip known IP address checking when it is disabled
When public IP checking is disabled, verify_local_ip_allocation()
still retrieves known IP addresses and runs through a loop that does
nothing.

Instead, completely skip the retrieval and checking loop.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-05-06 11:39:09 +02:00
Martin Schwenke
fc4cbf5528 ctdb-recoverd: Check that IP failover is active in IP verification
This makes verify_local_ip_allocation() self-contained and simplifies
main_loop().

Due to indentation changes, this commit is most easily read when
ignoring whitespace.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-05-06 11:39:09 +02:00
Martin Schwenke
ff28cbb73d ctdb-recoverd: Call election when necessary in recovery master validation
There is no need to return one of several states and then trigger an
election for one of those return states.  Have the recovery master
validation trigger the election directly and just return whether
monitoring should continue.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-05-06 11:39:09 +02:00
Martin Schwenke
e8c33aa24a ctdb-recoverd: Simplify return values when updating local flags
Change this to return just 0 or -1.  It isn't monitoring anything.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-05-06 11:39:09 +02:00
Martin Schwenke
0a9401ff0e ctdb-recoverd: Drop unreachable code
update_local_flags() never returns MONITOR_ELECTION_NEEDED, so drop
this entire if-statement.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-05-06 11:39:09 +02:00
Martin Schwenke
721f64511c ctdb-recovery: Move recovery lock latency updating to handler
The cluster mutex code already passes the latency and expects the
handler to update the statistics.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-04-28 09:39:17 +02:00
Martin Schwenke
bcb838ba1e ctdb-recovery: Move recovery lock functions to recovery daemon code
ctdb_recovery_have_lock(), ctdb_recovery_lock(),
ctdb_recovery_unlock() are only used by recovery daemon, so move them
there.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-04-28 09:39:17 +02:00
Amitay Isaacs
ae366fb932 ctdb-recoverd: Add message handler to assigning banning credits
This will be called from recovery helper to assign banning credits to
misbehaving node.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2016-03-25 03:26:16 +01:00
Martin Schwenke
c9e69a4b2e ctdb-recoverd: Drop use of DeferredRebalanceOnNodeAdd tunable
If set, this was used to setup an IP takeover run on a timer after
certain updates to the public IP address configuration (e.g. "ctdb
addip").

However, "ctdb reloadips" completely manages public IP reconfiguration
and avoids the anomalies that DeferredRebalanceOnNodeAdd was
introduced to work around.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-03-10 03:34:19 +01:00
Amitay Isaacs
19a411f839 ctdb-recovery: Create recovery databases in state dir
This matches the behaviour during serial database recovery.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>

Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Thu Feb 11 08:01:14 CET 2016 on sn-devel-144
2016-02-11 08:01:14 +01:00
Martin Schwenke
56ce230de7 ctdb-recoverd: Fix some uninitialised memory issues
The first element of these structures is a 32-bit PNN.  On 64-bit
systems this field can be followed by 32-bits of padding.  When the
structures are copied this can cause uninitialised memory to be
copied.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
2016-01-12 19:16:17 +01:00
Martin Schwenke
bd7c94d5ac ctdb-recoverd: Drop function unban_all_nodes()
It hasn't worked since commit cda5f02c7c
in 2009, which reworked the banning code.  Since then
ctdb_control_modflags() has contained a comment saying:

  /* we don't let other nodes modify our BANNED status */

Unbanning all nodes originally occurred here when the recovery master
role moved to a new node.  The logic could have been meant for the
case when the old recovery master was malfunctioning, so got banned.
If any other nodes had been banned by this recovery master then they
would be unbanned.  However, this would also unban the old recovery
master, which is probably suboptimal.  The logic would also trigger if
a node was banned for a good reason and then the recovery master was
stopped.  So, apart from doing nothing, the logic is too simplistic so
might as well be removed.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-12-04 09:17:17 +01:00
Christof Schmitt
03b27bd139 ctdb: Use prctl_set_comment from lib/util
Signed-off-by: Christof Schmitt <cs@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-11-18 04:05:13 +01:00
Martin Schwenke
44bf7c2a12 ctdb-recoverd: Factor out recovery master validation
Starting to untangle cluster management, database recovery and public
IP allocation.  This is a non-trivial subset of the cluster management
code that runs in the recovery daemon on all nodes.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Mon Nov 16 11:47:45 CET 2015 on sn-devel-104
2015-11-16 11:47:44 +01:00
Martin Schwenke
e44957fc8b ctdb-recmaster: Update capabilities before calling first election
Capabilities are used when computing an election result so having them
up-to-date seems like a good idea.

Also update several instances of an ambiguous comment.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-11-16 08:42:12 +01:00
Martin Schwenke
c5e50a474b ctdb-recoverd: Move VNN map retrieval to where it is needed
The VNN map is only needed on the recovery master, so no need for all
recovery daemons to retrieve it.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-11-16 08:42:12 +01:00
Martin Schwenke
d1f996a50f ctdb-recoverd: Drop explicit check for recovery lock
This is already handled in update_recovery_lock(), which is called
immediately before.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-11-16 08:42:12 +01:00
Martin Schwenke
1499f3e301 ctdb-recoverd: Simplify using TALLOC_FREE()
The only non-obvious part here is dropping the setting of the nodemap
local variable to NULL.  If the following control succeeds then it is
set, otherwise return and it doesn't matter.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-11-16 08:42:12 +01:00
Martin Schwenke
050e64b647 ctdb-recoverd: Clarify that recmaster is being set on the current node
That is, using CTDB_CURRENT_NODE makes this more obvious.

Also fix incorrect error messages.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-11-16 08:42:12 +01:00
Martin Schwenke
0833e478c3 ctdb-recoverd: Do not sanity check recovery master with local daemon
Each recovery daemon knows who the recmaster is and is in sync with
its local daemon.  The recovery master is running this check so do not
bother checking with its local daemon - both agree that it is the
recovery master.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-11-16 08:42:12 +01:00
Martin Schwenke
d8decd0b1d ctdb-recoverd: Don't retrieve recovery master from local daemon
The recovery daemon already knows which node is the master.  This
relies on rec->recmaster being correctly initialised and correctly set
during elections.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-11-16 08:42:12 +01:00
Martin Schwenke
e90cab7073 ctdb-recoverd: Explicitly set initial recovery master to unknown
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-11-16 08:42:11 +01:00
Martin Schwenke
018077f3b0 ctdb-recoverd: Do not set recovery master during recovery
Recovery should not do cluster management functions.  Setting the
recovery master should only be done via an election.

Main loop will determine if recovery master is inconsistent across the
cluster and force an election if necessary.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-11-16 08:42:11 +01:00
Martin Schwenke
4b37cc7cf6 ctdb-recoverd: Have recovery daemon remember election result
The recovery daemon pushes knowledge of recovery master election
progress/result to local daemon.  It then retrieves that information
again.

Instead, have the recovery daemon reliably track election
progress/result in rec->recmaster so it doesn't need to be retrieved.
Be careful to maintain consistency by only doing this when the local
daemon has been updated.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-11-16 08:42:11 +01:00
Martin Schwenke
6f8837528f ctdb-recoverd: Clarify recovery master validation logic
There can be no holes in the nodemap.  Even if a node has been deleted
it will take a slot in the nodemap.  The only exception is that the
nodemap shrinks if nodes are deleted from the end.  That should never
include the master because a node should be shutdown before being
deleted, and an election should already have take place.

To avoid walking off the end of the nodemap nodes array just confirm
that the master node's PNN is a valid index into the array.  No need
to walk through the nodemap.

After this, in this section of the code j is now invalid.  So use the
master's PNN to index into the nodemap.  This is safe.

In the process, clean up some log messages to avoid saying "Force
reelection".  It's just an "election".

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-11-16 08:42:11 +01:00
Amitay Isaacs
f50db5cba5 ctdb-server: Replace ctdb_logging.h with common/logging.h
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
2015-11-16 00:46:15 +01:00
Martin Schwenke
0886637a5e ctdb-recoverd: Reload remote IPs as part of takeover run
This is currently done before each IP takeover run, so just factor it
in.

ctdb_reload_remote_public_ips() becomes static.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Thu Nov 12 09:28:45 CET 2015 on sn-devel-104
2015-11-12 09:28:45 +01:00
Martin Schwenke
8cdae3ade6 ctdb-recoverd: Move ctdb_reload_remote_public_ips() to ctdb_takeover.c
This will help to untangle known and available public IP lists from
the CTDB context.

verify_remote_ip_allocation() needs a forward declaration.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-11-12 06:24:15 +01:00
Martin Schwenke
c37e3c05b0 ctdb-recoverd: Remote IP validation can't cause a takeover run
Remote IP validation is only called when a takeover run is about to
happen anyway, so don't bother flagging one.  Given that a takeover
run isn't being triggered, also drop the test that checks if takeover
runs are disabled.  These are the only uses of the rec argument, so
drop it.

One possible further simplification would be to remove this function
because it doesn't accomplish anything.  However, it is worth leaving
it as a reminder that remote IP validation should be done properly at
some time in the future.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-11-12 06:24:15 +01:00
Martin Schwenke
8b7b153cf6 ctdb-recoverd: Drop culprit argument from ctdb_reload_remote_public_ips()
It is only used by the caller to print a message that includes the
culprit.  However, ctdb_reload_remote_public_ips() already prints
perfectly good messages and they include the culprit.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-11-12 06:24:15 +01:00
Martin Schwenke
1d7d5abd31 ctdb-recoverd: Trigger takeover run after rebalance timeout
No need to do it immediately.  It will happen in less than a second.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-11-12 06:24:15 +01:00
Martin Schwenke
a7e8687b6d ctdb-recoverd: Remove unnecessary assignments of need_takeover_run
do_takeover_run() unsets this if it succeeds.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-11-12 06:24:15 +01:00
Martin Schwenke
d608862c5a ctdb-recoverd: Do not run recovery-related events around IP takeover
This is not a recovery, so do not run "startrecovery and "recovered"
events.  There are other IP takeover runs where these are not run.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-11-12 06:24:15 +01:00
Martin Schwenke
bd0befa529 ctdb-recoverd: Drop some sanity checking in local IP verification
The recovery start/end times used in the checks at the top of
verify_local_ip_allocation() are set by the START_RECOVERY and
END_RECOVERY controls.  A couple of takeover runs escape the checks
because they were added later and are not surrounded by these
controls.

Recovery and IP allocation need to be untangled from each other, so
recovery-related events should not be relied on for IP allocation.
This means the solution is not to add these where they are "missing".

The concern that the checks are addressing is to avoid local IP
verification when IP addresses are in a state of flux.  Takeover runs
on non-master nodes are already disabled while a takeover run is in
progress, so local IP verification is already skipped in that case.
The other case is the master node, which will be busy with the
takeover run, rather than running main_loop().

The other issue is races.  verify_local_ip_allocation() takes a
non-zero amount of time to fetch IP addresses from the local CTDB
daemon and during this time a recovery or takeover run can start, but
a takeover run can still be triggered.  The current tests do not stop
this.

Apart from all of this, with most reasonable public IP address
configurations, an extra takeover run will be a no-op so is not a
cause for concern.

It is safe to drop these checks.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-11-12 06:24:15 +01:00
Martin Schwenke
ceb0988e14 ctdb-recoverd: Simplify using TALLOC_FREE()
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-11-12 06:24:15 +01:00
Mathieu Parent
c315fce17e Fix various spelling errors
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
Reviewed-by: Michael Adam <obnox@samba.org>

Autobuild-User(master): Andrew Bartlett <abartlet@samba.org>
Autobuild-Date(master): Fri Nov  6 13:43:45 CET 2015 on sn-devel-104
2015-11-06 13:43:45 +01:00
Amitay Isaacs
38d92788d6 ctdb-include: Use new protocol definitions
This gets rid of the duplicate definitions from ctdb_protocol.h.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-11-04 00:47:16 +01:00
Amitay Isaacs
44e611ddcf ctdb-daemon: Rename struct ctdb_control_get_ifaces to ctdb_iface_list_old
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-11-04 00:47:16 +01:00
Amitay Isaacs
c4e9d616ae ctdb-daemon: Rename struct ctdb_control_iface_info to ctdb_iface
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-11-04 00:47:15 +01:00
Amitay Isaacs
417077c8a7 ctdb-daemon: Rename struct ctdb_control_transdb to ctdb_transdb
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-11-04 00:47:15 +01:00
Amitay Isaacs
e34afd8516 ctdb-daemon: Rename struct srvid_request_data to ctdb_disable_message
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-11-04 00:47:15 +01:00
Amitay Isaacs
cf1ac77b3a ctdb-daemon: Rename struct srvid_request to ctdb_srvid_message
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-11-04 00:47:15 +01:00
Amitay Isaacs
d4de4527b0 ctdb-daemon: Rename struct ctdb_ban_time to ctdb_ban_state
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-11-04 00:47:15 +01:00
Amitay Isaacs
645cd43200 ctdb-daemon: Rename struct ctdb_dbid_map to ctdb_dbid_map_old
Match struct ctdb_dbid as per protocol/protocol.h

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-11-04 00:47:15 +01:00
Amitay Isaacs
b99436e425 ctdb-daemon: Rename struct ctdb_rec_data to ctdb_rec_data_old
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-11-04 00:47:14 +01:00
Amitay Isaacs
04eaa077aa ctdb-daemon: Rename struct ctdb_all_public_ips to ctdb_public_ip_list_old
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-11-04 00:47:14 +01:00
Amitay Isaacs
afc5d8a442 ctdb-daemon: Rename struct ctdb_node_map to ctdb_node_map_old
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-11-04 00:47:14 +01:00
Amitay Isaacs
4647787773 ctdb-daemon: Separate prototypes for common client/server functions
This groups function prototypes for common client/server functions in
common/common.h and removes them from ctdb_private.h.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-10-30 02:00:27 +01:00
Amitay Isaacs
01c6c90e98 ctdb-daemon: Remove dependency on includes.h
Instead of includes.h, include the required header files explicitly.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-10-30 02:00:27 +01:00
Amitay Isaacs
2fdb332fad ctdb-daemon: Stop using tevent compatibility definitions
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-10-30 02:00:27 +01:00
Amitay Isaacs
7084cb92e2 ctdb-include: Move include/internal/cmdline.h to common/
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-10-30 02:00:27 +01:00
Amitay Isaacs
b900adc55c ctdb-daemon: Separate prototypes for system specific functions
This groups function prototypes for system specific functions in
common/system.h and removes them from ctdb_private.h.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-10-30 02:00:27 +01:00
Volker Lendecke
d527ab1094 ctdbd: Fix a typo
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
2015-10-14 02:19:14 +02:00
Amitay Isaacs
0101748287 ctdb-recoverd: Always check for recmaster before doing recovery
Recovery daemon checks if it is the recovery master before performing
certain checks.  During those checks it's possible that re-election can
change the recmaster.  In such a case, the recovery daemon should never
do a database recovery.

This is not complete fix since the recovery master can still change
while the recovery is going on.  The correct fix is to abort recovery
if the recovery master changes.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Wed Oct  7 17:55:05 CEST 2015 on sn-devel-104
2015-10-07 17:55:05 +02:00
Amitay Isaacs
3cf93d9136 ctdb-recoverd: Get rid of connected-ness comparison in election
The reason for favouring more connected node is to create a larger
cluster in case of a split brain.  In split brain condition, the nodes
are not communicating across partitions and each partition will run its
own election.  Among all the partitions, the node which holds the recovery
lock will eventually "win".  All the other nodes which won election but
could not grab recovery lock will end up banning themselves.

This also prevents the recovery master role from bouncing between nodes
during startup when the entire cluster is restarted.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-10-07 14:53:29 +02:00
Amitay Isaacs
fbd9c9fd2f ctdb-recoverd: Do not freeze databases for election
If election occurs during SMB activity, then trying to freeze all the
databases can cause samba/ctdb deadlock which parallel database recovery
is trying to avoid.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-10-07 14:53:29 +02:00
Amitay Isaacs
e6ff36506c ctdb-recoverd: Add code for parallel database recovery
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-10-07 14:53:29 +02:00
Amitay Isaacs
4b39a7706f ctdb-recoverd: Update flags on all nodes before database recovery
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-10-07 14:53:29 +02:00
Amitay Isaacs
9843363629 ctdb-recoverd: Update capabilities before the database recovery
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-10-07 14:53:29 +02:00
Amitay Isaacs
14cacd2925 ctdb-recovery: Factor out existing database recovery code
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-10-07 14:53:29 +02:00
Amitay Isaacs
62f1e2579a ctdb-daemon: Replace ctdb_message with srvid abstraction
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-10-07 14:53:28 +02:00
Amitay Isaacs
1df2594386 ctdb-daemon: Introduce per database generation
The database generation for each database is updated only during recovery.
After recovery is complete the database generation would be the same as
the global generation.

The database generation is required for parallel database recovery.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-10-07 14:53:27 +02:00
Amitay Isaacs
4f155e77a8 ctdb-daemon: Rename ctdb_control_wipe_database to ctdb_control_transdb
The same structure is required in new controls for database transactions.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-10-07 14:53:27 +02:00
Martin Schwenke
b234ae0a90 ctdb-recoverd: Clear IP assignment tree on election loss
If a node was previously recovery master (say, 20 years ago) and it
becomes recovery master again then, if IP assignments have changed,
verify_remote_ip_allocation() can produce messages like the following
when called during recovery:

  ctdbd: recoverd:Inconsistent IP allocation - node 0 thinks 10.1.1.1 is held by node 0 while it is assigned to node 1

When a node loses an election it should clear all data specific to it
being the recovery master.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-07-01 04:18:28 +02:00
Amitay Isaacs
941669ae36 ctdb-recovered: Drop unused variable
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
2015-06-05 11:28:23 +02:00
Amitay Isaacs
2e2dba8d13 ctdb-recoverd/vacuum: Remove vacuum_info structure
For all the records listed in VACUUM_FETCH, migration requests are sent
immediately without waiting.  This means there can only be a single
VACUUM_FETCH processing active at a time.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
2015-06-05 11:28:23 +02:00
Michael Adam
92d1486b87 ctdb-recoverd/vacuum: move fetch loop back into fetch handler.
With the processing of one element factored out,
it is more natural to have the actual loop inside the
handler function. This also makes the talloc/free
bracked around the loop more obvious.

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-06-05 11:28:23 +02:00
Michael Adam
4103463ad2 ctdb-recoverd/vacuum: slightly reorder the vacuum fetch loop
Reads more naturally this way, imho.

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-06-05 11:28:23 +02:00
Michael Adam
a1c941be6f ctdb-recoverd/vacuum: add common exit point to vacuum_fetch_handler
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-06-05 11:28:23 +02:00
Michael Adam
9092617888 ctdb-recoverd/vacuum: factor vacuum_fetch_process_one out of vacuum_fetch_loop
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-06-05 11:28:23 +02:00
Michael Adam
84ab6d003a ctdb-recoverd/vacuum: move two variables into scope.
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-06-05 11:28:23 +02:00
Michael Adam
9e5cf6fd5c ctdb-recoverd/vacuum: remove unneeded prototype.
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-06-05 11:28:23 +02:00
Martin Schwenke
91f99ddfb3 ctdb-recoverd: Remove redundant condition when checking recovery lock
It isn't possible to hold the recovery lock without having a lock file
set.

This is part of a goal to generalise the recovery lock mechanism to
just use a helper program, which may use a lock file or may use
something else.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-05-10 03:22:13 +02:00
Martin Schwenke
a45ab7d1fe ctdb-recoverd: Simplify using TALLOC_FREE()
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-05-10 03:22:13 +02:00
Martin Schwenke
2c72c9de48 ctdb-recoverd: Drop redundant condition in election handler
Election packets from the current node are ignored at the beginning of
the function, so this does not need to be checked.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-05-10 03:22:13 +02:00
Martin Schwenke
c75fdf208f ctdb-recoverd: Remove unused memory context variable
It is set, memory is allocated but it is never used.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-05-10 03:22:13 +02:00
Martin Schwenke
4b4ba77f4a ctdb-recoverd: Replace unnecessary use of ctdb->recovery_master
Databases are only pulled by the recovery master, so it can compare
with current node PNN.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-05-10 03:22:13 +02:00
Martin Schwenke
6415edfa26 ctdb-recoverd: Rename some local variables to avoid conflict with convention
rec is always a (struct ctdb_recoverd *)

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-05-10 03:22:13 +02:00
Martin Schwenke
36fc620898 ctdb_recoverd: Move num_lmasters calculation to near where it is used
Unless this node is the recovery master then this is not needed.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-05-10 03:22:13 +02:00
Martin Schwenke
1fd2d3886c ctdb-recoverd: Make num_lmasters a local variable
It isn't used anywhere else and is always re-initialised to 0.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-05-10 03:22:13 +02:00
Martin Schwenke
385e9326ea ctdb-recoverd: Remove unused struct members num_active and num_connected
They are initialised and updated but the values are never used.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-05-10 03:22:13 +02:00
Martin Schwenke
c3d6678dbc ctdb-recoverd: Use capabilities API
Simplify update_capabilities() using the capabilities API and store
the capabilities in new field rec->caps rather than scattered around
ctdb->nodes.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-05-10 03:22:13 +02:00
Martin Schwenke
85bd9a33eb ctdb-recoverd: Avoid nodemap-related checks when recoveries are disabled
The potential resulting recovery won't run anyway.  Also recoveries
may have been disabled by "reloadnodes" and if the nodemaps are
inconsistent between nodes then avoid triggering an unnecessary
recovery.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-04-07 07:43:13 +02:00
Martin Schwenke
ee9619c28b ctdb-recoverd: New message ID CTDB_SRVID_DISABLE_RECOVERIES
Also add test stub support.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-04-07 07:43:13 +02:00
Martin Schwenke
2ca484cd50 ctdb-recoverd: Simplify disable_ip_check_handler() using ctdb_op_disable()
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-04-07 07:43:13 +02:00
Martin Schwenke
108db3396f ctdb-recoverd: Add slightly more abstraction for disabling takeover runs
Factor out new function srvid_disable_and_reply(), which can be
re-used.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-04-07 07:43:13 +02:00
Martin Schwenke
ec32d9bea8 ctdb-recoverd: Reimplement ReRecoveryTimeout using ctdb_op_disable()
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-04-07 07:43:12 +02:00
Martin Schwenke
281f7e8152 ctdb-recoverd: Use a goto for do_recovery() failures
This will allow extra things to be done on failure.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-04-07 07:43:12 +02:00
Martin Schwenke
a2044c65bc ctdb-recoverd: Reimplement disabling takeover runs using ctdb_op_disable()
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-04-07 07:43:12 +02:00
Martin Schwenke
55b246195b ctdb-recoverd: Add a new abstraction ctdb_op_disable()
This can be used to disable and re-enable an operation, and do all the
relevant sanity checking.

Most of this is from existing functions
disable_takeover_runs_handler(), clear_takeover_runs_disable() and
reenable_takeover_runs().

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-04-07 07:43:12 +02:00
Martin Schwenke
48c91407ab ctdb-recoverd: Don't release and re-take the recovery lock
Just continue to hold it, otherwise a broken node might win an
election and grab the lock.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-02-13 07:19:07 +01:00
Martin Schwenke
1d6ed91f55 ctdb-recoverd: Simplify ctdb_recovery_lock()
Have it just silently take or fail to take the lock, except on an
unexpected failure (where it should log an error).

This means that when it is called we need to keep the old behaviour
and explicitly release the lock.  In do_recovery() the lock is
released and a message is printed before attempting to take the lock.
In the daemon sanity check the lock must be released in the error path
if it is actually taken.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-02-13 07:19:07 +01:00
Martin Schwenke
be19a17faf ctdb-recoverd: Remove check_recovery_lock()
This has not done anything useful since commit
b9d8bb23af.  Instead, just check that
the lock is held.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-02-13 07:19:07 +01:00
Martin Schwenke
668ed53662 ctdb-recoverd: Improve logging when recovery lock file is changed
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-02-13 07:19:07 +01:00
Martin Schwenke
db32a2bce5 ctdb-recoverd: New function ctdb_recovery_unlock()
Unlock the recovery lock file.  This way knowledge of the file
descriptor isn't sprinkled throughout the code.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-02-13 07:19:07 +01:00
Martin Schwenke
72701be663 ctdb-recoverd: New function ctdb_recovery_have_lock()
True if this recovery daemon holds the lock.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-02-13 07:19:07 +01:00