samba-mirror

mirror of https://github.com/samba-team/samba.git synced 2025-01-06 13:18:07 +03:00

Author	SHA1	Message	Date
Martin Schwenke	a53b264aee	ctdb-recoverd: Use talloc() to allocate recovery lock handle At the moment this is still local and is freed after the mutex is successfully taken. BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2018-09-17 22:58:20 +02:00
Martin Schwenke	af22f03dbe	ctdb-recoverd: Rename hold_reclock_state to ctdb_recovery_lock_handle This will be a longer lived structure. BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2018-09-17 22:58:20 +02:00
Martin Schwenke	c516e58ce9	ctdb-recoverd: Re-check master on failure to take recovery lock If the master changed while trying to take the lock then fail gracefully. BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2018-09-17 22:58:20 +02:00
Martin Schwenke	59fc01646c	ctdb-recoverd: Clean up taking of recovery lock No functional changes, just coding style cleanups and debug message tweaks. BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2018-09-17 22:58:20 +02:00
Martin Schwenke	929634126a	ctdb-config: Switch tunable DisableIPFailover to a config option Use the "failover:disabled" option instead. BUG: https://bugzilla.samba.org/show_bug.cgi?id=13589 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2018-08-24 10:59:21 +02:00
Martin Schwenke	914e9f22d8	ctdb-daemon: Pass DisableIPFailover tunable via environment variable Preparation for obsoleting this tunable. BUG: https://bugzilla.samba.org/show_bug.cgi?id=13589 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2018-08-24 10:59:21 +02:00
Martin Schwenke	b318cf22ba	ctdb-recoverd: Set the process name correctly Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2018-07-02 08:51:22 +02:00
Martin Schwenke	57834c64be	ctdb-common: Rename system utility files system_socket.[ch] will contain all the raw socket code and other functions that use ctdb_sock_addr. system.[ch] will contain other platform dependent functions. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2018-07-02 08:51:20 +02:00
Amitay Isaacs	6e588913dd	ctdb-recoverd: Abort recovery/takeover if recmaster changes Recovery and takeover are run via helper from recovery daemon. While the helpers are running, it's possible for the current node to lose election. If that happens, abort the currently running recovery/takeover helper. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2017-09-12 12:23:19 +02:00
Amitay Isaacs	1f7f112317	ctdb-client: Fix ctdb_attach() to use database flags BUG: https://bugzilla.samba.org/show_bug.cgi?id=12978 Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> Autobuild-User(master): Martin Schwenke <martins@samba.org> Autobuild-Date(master): Fri Aug 25 13:32:58 CEST 2017 on sn-devel-144	2017-08-25 13:32:58 +02:00
Amitay Isaacs	9987fe7209	ctdb-client: Optionally return database id from ctdb_ctrl_createdb() BUG: https://bugzilla.samba.org/show_bug.cgi?id=12978 Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2017-08-25 09:41:26 +02:00
Amitay Isaacs	4bd0a20a75	ctdb-client: Fix ctdb_ctrl_createdb() to use database flags BUG: https://bugzilla.samba.org/show_bug.cgi?id=12978 Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2017-08-25 09:41:25 +02:00
Amitay Isaacs	ea91967b0d	ctdb-client: Drop tdb_flags argument to ctdb_attach() Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2017-06-26 15:47:24 +02:00
Amitay Isaacs	ea46699b27	ctdb-recovery: Do not run local ip verification when in recovery BUG: https://bugzilla.samba.org/show_bug.cgi?id=12857 If we drop public IPs because CTDB is in recovery for too long, then avoid spamming logs "Trigger takeoverrun" every second. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2017-06-24 10:28:21 +02:00
Amitay Isaacs	2fd2ccd4c8	ctdb-recovery: Get recmode unconditionally in the main_loop BUG: https://bugzilla.samba.org/show_bug.cgi?id=12857 This can be used later in the main_loop to avoid the local ip check. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2017-06-24 10:28:21 +02:00
Chris Lamb	f7dc9f1e12	Correct "supressed" typo. Signed-off-by: Chris Lamb <chris@chris-lamb.co.uk> Reviewed-by: Andrew Bartlett <abartlet@samba.org> Reviewed-by: Garming Sam <garming@catalyst.net.nz>	2017-02-22 08:26:21 +01:00
Martin Schwenke	f2485d3ab9	ctdb-recoverd: Integrate takeover helper Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2016-12-19 04:07:08 +01:00
Martin Schwenke	5b60414265	ctdb-recoverd: Generalise helper state, handler and launching These can also be used for takeover handler. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2016-12-19 04:07:08 +01:00
Amitay Isaacs	41c964fdbc	ctdb-recovery: Start recovery helper with ctdb_vfork_exec The recovery helper does its own logging, so there is no need to pass logfd. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> Autobuild-User(master): Martin Schwenke <martins@samba.org> Autobuild-Date(master): Mon Dec 5 11:59:42 CET 2016 on sn-devel-144	2016-12-05 11:59:42 +01:00
Amitay Isaacs	d53dbd0dcc	ctdb-daemon: Initialize logging in recovery daemon Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2016-12-05 08:09:22 +01:00
Amitay Isaacs	74ccc7280a	ctdb-recoverd: Log a message when terminating Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2016-12-05 08:09:22 +01:00
Amitay Isaacs	3d6860b275	ctdb-daemon: Remove setting of debug_extra from switch_from_server_to_client() Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2016-12-05 08:09:22 +01:00
Martin Schwenke	bdc049dfce	ctdb-common: Drop CTDB's copy of sys_read() and sys_write() Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> Autobuild-User(master): Amitay Isaacs <amitay@samba.org> Autobuild-Date(master): Tue Nov 29 11:22:40 CET 2016 on sn-devel-144	2016-11-29 11:22:40 +01:00
Amitay Isaacs	2a9584dc0a	ctdb-daemon: Remove unused code cmdline.[ch] Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2016-11-25 04:19:23 +01:00
Amitay Isaacs	67351e61ee	ctdb-recoverd: Drop code to freeze databases from set_recovery_mode() This function is called only once from force_election() and does not require freezing of databases. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2016-09-14 08:39:28 +02:00
Martin Schwenke	abe5445c24	ctdb-recoverd: Don't directly release rogue IP addresses This is inconsistent with the rest of the local IP verification. It should notice problems but not try to fix them directly. Like other cases, it should use an IP takeover run to try to fix the problem. In this case the address might have just been added and an out-of-band RELEASE_IP might cause conflicts (i.e. "another change is in flight") with a scheduled IP takeover run. This effectively reverts commit `694c1b269e`. Not sure why this was needed after `c7e648c2d1`. More recently commit `6471541d6d` moves responsibility for determining interface/netmask to 10.interface so this should continue to work just fine. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2016-08-17 23:00:26 +02:00
Amitay Isaacs	6693fa59dc	ctdb-recoverd: Remove code that updates database priorities during recovery Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2016-07-25 21:29:42 +02:00
Amitay Isaacs	9338443a92	ctdb-recovery: Remove serial database recovery code Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2016-07-25 21:29:42 +02:00
Martin Schwenke	a26d39e5ce	ctdb-recoverd: Drop code to change the IP assignment tree The tree is no longer used in verification. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2016-07-04 15:42:24 +02:00
Martin Schwenke	35644d0d82	ctdb-ipalloc: Drop remote IP verification It is only run during a takeover run and only logs errors. It doesn't actually do anything to fix potential errors. The takeover run should fix any inconsistencies anyway. Instead, leave a comment in the recovery daemon's monitoring loop to add proper remote IP verification later. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2016-07-04 15:42:24 +02:00
Amitay Isaacs	ecb74721e7	ctdb-recoverd: Avoid duplicate recovered event in parallel recovery BUG: https://bugzilla.samba.org/show_bug.cgi?id=11956 In do_recovery, after the recovery and takeover are complete, the recovered event is triggered. When the parallel database recovery was separated, ctdb_recovery_helper implemented sending the END_RECOVERY control, which also causes the recovered event to be triggered. So when there is parallel database recovery, the recovered event is triggered twice. Instead, move the call to run_recovered_eventscript() explicitly into the serial recovery code path. This avoids triggering the recovered event twice. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2016-06-08 10:33:19 +02:00
Martin Schwenke	174449c1e0	ctdb-recoverd: Release recovery lock on exit The recovery lock helper must exit when it notices its parent is gone. However, that can take a few seconds. The usual way of terminating the recovery daemon is for the main ctdbd to send it a SIGTERM. Installing a handler is nice and simple. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2016-06-08 00:51:29 +02:00
Martin Schwenke	75717ac667	ctdb-recoverd: Add handler for lost recovery lock If the process holding the recovery lock terminates unexpectedly then the recovery daemon needs to know that the lock is no longer held. While here, rename hold_reclock_handler() to take_reclock_handler() so there is a clear difference between the two handler names. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2016-06-08 00:51:29 +02:00
Martin Schwenke	95a7920d22	ctdb-cluster-mutex: Register an extra handler for when mutex is lost Pass NULL if not needed. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2016-06-08 00:51:29 +02:00
Martin Schwenke	4f0ca0107c	ctdb-cluster-mutex: ctdb_cluster_mutex() registers handler and private data Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2016-06-08 00:51:29 +02:00
Martin Schwenke	145ddcbe37	ctdb-cluster-mutex: Drop cluster_mutex_handler() ctdb and handle arguments This makes the API more general. If they are needed in a handler then they can be in the private data. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2016-06-08 00:51:29 +02:00
Martin Schwenke	a192364a12	ctdb-recoverd: Simplify reclock handler Do the interesting work outside the handler. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2016-06-08 00:51:29 +02:00
Martin Schwenke	197264dfe7	ctdb-recoverd: Recovery lock handle should be in recovery daemon context This shouldn't be in the CTDB context. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2016-06-08 00:51:29 +02:00
Martin Schwenke	5c4744e69d	ctdb-cluster-mutex: Pass a talloc context to allocate the handle off Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2016-06-08 00:51:28 +02:00
Martin Schwenke	58be187de0	ctdb-recoverd: No need to reset reclock handler It won't be called more than once by the cluster mutex code. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2016-06-08 00:51:28 +02:00
Martin Schwenke	630f169653	ctdb-recoverd: Fix buggy function return on memory allocation failure Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2016-06-08 00:51:28 +02:00
Martin Schwenke	dbd4e67aee	ctdb-recoverd: Don't expose internal cluster mutex status Just expose whether the lock was taken. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2016-06-08 00:51:28 +02:00
Martin Schwenke	fdd214ce6a	ctdb-daemon: Rename recovery lock file to just recovery lock It isn't necessarily a file. Don't bother changing the control, since it doesn't pervade the code. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2016-06-08 00:51:28 +02:00
Martin Schwenke	1127f3ae1e	ctdb-recovery: Don't update recovery lock from daemon It can't change after startup. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2016-06-08 00:51:28 +02:00
Martin Schwenke	23823f128f	ctdb-recovery: Don't sync recovery lock across cluster Support for updating the recovery lock is being removed because it isn't possible to recover from failure. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2016-06-08 00:51:28 +02:00
Amitay Isaacs	f8141e91a6	ctdb-recoverd: Freeze databases whenever the node is INACTIVE If the node becomes stopped or banned after recovery is marked active, then it will never freeze the databases, and hence the node will keep banning itself indefinitely, until ctdbd is restarted. This is a regression from 4.3, introduced with `b4357a79d9` and `d8f3b490bb` BUG: https://bugzilla.samba.org/show_bug.cgi?id=11945 Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Michael Adam <obnox@samba.org> Reviewed-by: Martin Schwenke <martin@meltin.net> Autobuild-User(master): Michael Adam <obnox@samba.org> Autobuild-Date(master): Wed Jun 1 17:36:12 CEST 2016 on sn-devel-144	2016-06-01 17:36:12 +02:00
Martin Schwenke	f9d4cb4c29	ctdb-recoverd: Unify takeover run triggering code in main loop Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> Autobuild-User(master): Amitay Isaacs <amitay@samba.org> Autobuild-Date(master): Fri May 13 17:15:57 CEST 2016 on sn-devel-144	2016-05-13 17:15:57 +02:00
Martin Schwenke	e3e4f37c41	ctdb-recoverd: Add early return in srvid_requests_reply() Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2016-05-13 13:47:17 +02:00
Martin Schwenke	ebbeab74ed	ctdb-recoverd: Drop an unnecessary log message do_takeover_run() will log something at NOTICE level anyway. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2016-05-13 13:47:17 +02:00
Martin Schwenke	2a93b8423b	ctdb-recoverd: Move takeover run checks after recover checks If a recovery is going to be done then this will be followed by a takeover run anyway. So, there's no use doing the takeover run checks, potentially doing a takeover run and then doing a recovery. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2016-05-13 13:47:17 +02:00
Martin Schwenke	662f06de9f	ctdb-recoverd: Drop explicit check to flag takeover run needed The recovery daemon should be less involved in the service monitoring logic. The cases handled here are already handled elsewhere: * When a node becomes unhealthy/healthy the monitoring code will trigger a takeover run * When a node is disabled/enabled the ctdb CLI tool will trigger a takeover run Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2016-05-13 13:47:17 +02:00
Martin Schwenke	9dc3b117e2	ctdb-takeover: Recovery daemon no longer passes fail callback Banning is now handled by the takeover code sending banning credit messages. This commit makes a change in behaviour quite obvious. Takeover runs were initiated from several locations in the code but banning was only done from one of these locations. Now banning can be done from any failed takeover run. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2016-05-13 13:47:17 +02:00
Martin Schwenke	866ca591d4	ctdb-recoverd: Fold IP allocation house-keeping into IP verification Now all the IP takeover code for non-master node is in this function. The function can always be renamed to something more suitable. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> Autobuild-User(master): Martin Schwenke <martins@samba.org> Autobuild-Date(master): Fri May 6 15:10:59 CEST 2016 on sn-devel-144	2016-05-06 15:10:59 +02:00
Martin Schwenke	4947789b2a	ctdb-recoverd: Clean up local IP verification Update log levels and messages, comments and wrapping of long lines. No functional changes. Note that interfaces_have_changed() already does adequate logging. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2016-05-06 11:39:09 +02:00
Martin Schwenke	bdcc796f3c	ctdb-recoverd: Skip known IP address checking when it is disabled When public IP checking is disabled, verify_local_ip_allocation() still retrieves known IP addresses and runs through a loop that does nothing. Instead, completely skip the retrieval and checking loop. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2016-05-06 11:39:09 +02:00
Martin Schwenke	fc4cbf5528	ctdb-recoverd: Check that IP failover is active in IP verification This makes verify_local_ip_allocation() self-contained and simplifies main_loop(). Due to indentation changes, this commit is most easily read when ignoring whitespace. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2016-05-06 11:39:09 +02:00
Martin Schwenke	ff28cbb73d	ctdb-recoverd: Call election when necessary in recovery master validation There is no need to return one of several states and then trigger an election for one of those return states. Have the recovery master validation trigger the election directly and just return whether monitoring should continue. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2016-05-06 11:39:09 +02:00
Martin Schwenke	e8c33aa24a	ctdb-recoverd: Simplify return values when updating local flags Change this to return just 0 or -1. It isn't monitoring anything. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2016-05-06 11:39:09 +02:00
Martin Schwenke	0a9401ff0e	ctdb-recoverd: Drop unreachable code update_local_flags() never returns MONITOR_ELECTION_NEEDED, so drop this entire if-statement. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2016-05-06 11:39:09 +02:00
Martin Schwenke	721f64511c	ctdb-recovery: Move recovery lock latency updating to handler The cluster mutex code already passes the latency and expects the handler to update the statistics. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2016-04-28 09:39:17 +02:00
Martin Schwenke	bcb838ba1e	ctdb-recovery: Move recovery lock functions to recovery daemon code ctdb_recovery_have_lock(), ctdb_recovery_lock(), ctdb_recovery_unlock() are only used by recovery daemon, so move them there. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2016-04-28 09:39:17 +02:00
Amitay Isaacs	ae366fb932	ctdb-recoverd: Add message handler to assign banning credits This will be called from the recovery helper to assign banning credits to a misbehaving node. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2016-03-25 03:26:16 +01:00
Martin Schwenke	c9e69a4b2e	ctdb-recoverd: Drop use of DeferredRebalanceOnNodeAdd tunable If set, this was used to set up an IP takeover run on a timer after certain updates to the public IP address configuration (e.g. "ctdb addip"). However, "ctdb reloadips" completely manages public IP reconfiguration and avoids the anomalies that DeferredRebalanceOnNodeAdd was introduced to work around. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2016-03-10 03:34:19 +01:00
Amitay Isaacs	19a411f839	ctdb-recovery: Create recovery databases in state dir This matches the behaviour during serial database recovery. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> Autobuild-User(master): Martin Schwenke <martins@samba.org> Autobuild-Date(master): Thu Feb 11 08:01:14 CET 2016 on sn-devel-144	2016-02-11 08:01:14 +01:00
Martin Schwenke	56ce230de7	ctdb-recoverd: Fix some uninitialised memory issues The first element of these structures is a 32-bit PNN. On 64-bit systems this field can be followed by 32-bits of padding. When the structures are copied this can cause uninitialised memory to be copied. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Michael Adam <obnox@samba.org>	2016-01-12 19:16:17 +01:00
Martin Schwenke	bd7c94d5ac	ctdb-recoverd: Drop function unban_all_nodes() It hasn't worked since commit `cda5f02c7c` in 2009, which reworked the banning code. Since then ctdb_control_modflags() has contained a comment saying: /* we don't let other nodes modify our BANNED status */ Unbanning all nodes originally occurred here when the recovery master role moved to a new node. The logic could have been meant for the case when the old recovery master was malfunctioning, so got banned. If any other nodes had been banned by this recovery master then they would be unbanned. However, this would also unban the old recovery master, which is probably suboptimal. The logic would also trigger if a node was banned for a good reason and then the recovery master was stopped. So, apart from doing nothing, the logic is too simplistic so might as well be removed. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-12-04 09:17:17 +01:00
Christof Schmitt	03b27bd139	ctdb: Use prctl_set_comment from lib/util Signed-off-by: Christof Schmitt <cs@samba.org> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-11-18 04:05:13 +01:00
Martin Schwenke	44bf7c2a12	ctdb-recoverd: Factor out recovery master validation Starting to untangle cluster management, database recovery and public IP allocation. This is a non-trivial subset of the cluster management code that runs in the recovery daemon on all nodes. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> Autobuild-User(master): Amitay Isaacs <amitay@samba.org> Autobuild-Date(master): Mon Nov 16 11:47:45 CET 2015 on sn-devel-104	2015-11-16 11:47:44 +01:00
Martin Schwenke	e44957fc8b	ctdb-recmaster: Update capabilities before calling first election Capabilities are used when computing an election result so having them up-to-date seems like a good idea. Also update several instances of an ambiguous comment. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-11-16 08:42:12 +01:00
Martin Schwenke	c5e50a474b	ctdb-recoverd: Move VNN map retrieval to where it is needed The VNN map is only needed on the recovery master, so no need for all recovery daemons to retrieve it. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-11-16 08:42:12 +01:00
Martin Schwenke	d1f996a50f	ctdb-recoverd: Drop explicit check for recovery lock This is already handled in update_recovery_lock(), which is called immediately before. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-11-16 08:42:12 +01:00
Martin Schwenke	1499f3e301	ctdb-recoverd: Simplify using TALLOC_FREE() The only non-obvious part here is dropping the setting of the nodemap local variable to NULL. If the following control succeeds then it is set, otherwise return and it doesn't matter. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-11-16 08:42:12 +01:00
Martin Schwenke	050e64b647	ctdb-recoverd: Clarify that recmaster is being set on the current node That is, using CTDB_CURRENT_NODE makes this more obvious. Also fix incorrect error messages. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-11-16 08:42:12 +01:00
Martin Schwenke	0833e478c3	ctdb-recoverd: Do not sanity check recovery master with local daemon Each recovery daemon knows who the recmaster is and is in sync with its local daemon. The recovery master is running this check so do not bother checking with its local daemon - both agree that it is the recovery master. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-11-16 08:42:12 +01:00
Martin Schwenke	d8decd0b1d	ctdb-recoverd: Don't retrieve recovery master from local daemon The recovery daemon already knows which node is the master. This relies on rec->recmaster being correctly initialised and correctly set during elections. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-11-16 08:42:12 +01:00
Martin Schwenke	e90cab7073	ctdb-recoverd: Explicitly set initial recovery master to unknown Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-11-16 08:42:11 +01:00
Martin Schwenke	018077f3b0	ctdb-recoverd: Do not set recovery master during recovery Recovery should not do cluster management functions. Setting the recovery master should only be done via an election. Main loop will determine if recovery master is inconsistent across the cluster and force an election if necessary. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-11-16 08:42:11 +01:00
Martin Schwenke	4b37cc7cf6	ctdb-recoverd: Have recovery daemon remember election result The recovery daemon pushes knowledge of recovery master election progress/result to local daemon. It then retrieves that information again. Instead, have the recovery daemon reliably track election progress/result in rec->recmaster so it doesn't need to be retrieved. Be careful to maintain consistency by only doing this when the local daemon has been updated. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-11-16 08:42:11 +01:00
Martin Schwenke	6f8837528f	ctdb-recoverd: Clarify recovery master validation logic There can be no holes in the nodemap. Even if a node has been deleted it will take a slot in the nodemap. The only exception is that the nodemap shrinks if nodes are deleted from the end. That should never include the master because a node should be shut down before being deleted, and an election should already have taken place. To avoid walking off the end of the nodemap nodes array just confirm that the master node's PNN is a valid index into the array. No need to walk through the nodemap. After this, in this section of the code j is now invalid. So use the master's PNN to index into the nodemap. This is safe. In the process, clean up some log messages to avoid saying "Force reelection". It's just an "election". Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-11-16 08:42:11 +01:00
Amitay Isaacs	f50db5cba5	ctdb-server: Replace ctdb_logging.h with common/logging.h Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Michael Adam <obnox@samba.org>	2015-11-16 00:46:15 +01:00
Martin Schwenke	0886637a5e	ctdb-recoverd: Reload remote IPs as part of takeover run This is currently done before each IP takeover run, so just factor it in. ctdb_reload_remote_public_ips() becomes static. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> Autobuild-User(master): Amitay Isaacs <amitay@samba.org> Autobuild-Date(master): Thu Nov 12 09:28:45 CET 2015 on sn-devel-104	2015-11-12 09:28:45 +01:00
Martin Schwenke	8cdae3ade6	ctdb-recoverd: Move ctdb_reload_remote_public_ips() to ctdb_takeover.c This will help to untangle known and available public IP lists from the CTDB context. verify_remote_ip_allocation() needs a forward declaration. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-11-12 06:24:15 +01:00
Martin Schwenke	c37e3c05b0	ctdb-recoverd: Remote IP validation can't cause a takeover run Remote IP validation is only called when a takeover run is about to happen anyway, so don't bother flagging one. Given that a takeover run isn't being triggered, also drop the test that checks if takeover runs are disabled. These are the only uses of the rec argument, so drop it. One possible further simplification would be to remove this function because it doesn't accomplish anything. However, it is worth leaving it as a reminder that remote IP validation should be done properly at some time in the future. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-11-12 06:24:15 +01:00
Martin Schwenke	8b7b153cf6	ctdb-recoverd: Drop culprit argument from ctdb_reload_remote_public_ips() It is only used by the caller to print a message that includes the culprit. However, ctdb_reload_remote_public_ips() already prints perfectly good messages and they include the culprit. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-11-12 06:24:15 +01:00
Martin Schwenke	1d7d5abd31	ctdb-recoverd: Trigger takeover run after rebalance timeout No need to do it immediately. It will happen in less than a second. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-11-12 06:24:15 +01:00
Martin Schwenke	a7e8687b6d	ctdb-recoverd: Remove unnecessary assignments of need_takeover_run do_takeover_run() unsets this if it succeeds. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-11-12 06:24:15 +01:00
Martin Schwenke	d608862c5a	ctdb-recoverd: Do not run recovery-related events around IP takeover This is not a recovery, so do not run "startrecovery" and "recovered" events. There are other IP takeover runs where these are not run. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-11-12 06:24:15 +01:00
Martin Schwenke	bd0befa529	ctdb-recoverd: Drop some sanity checking in local IP verification The recovery start/end times used in the checks at the top of verify_local_ip_allocation() are set by the START_RECOVERY and END_RECOVERY controls. A couple of takeover runs escape the checks because they were added later and are not surrounded by these controls. Recovery and IP allocation need to be untangled from each other, so recovery-related events should not be relied on for IP allocation. This means the solution is not to add these where they are "missing". The concern that the checks are addressing is to avoid local IP verification when IP addresses are in a state of flux. Takeover runs on non-master nodes are already disabled while a takeover run is in progress, so local IP verification is already skipped in that case. The other case is the master node, which will be busy with the takeover run, rather than running main_loop(). The other issue is races. verify_local_ip_allocation() takes a non-zero amount of time to fetch IP addresses from the local CTDB daemon and during this time a recovery or takeover run can start, but a takeover run can still be triggered. The current tests do not stop this. Apart from all of this, with most reasonable public IP address configurations, an extra takeover run will be a no-op so is not a cause for concern. It is safe to drop these checks. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-11-12 06:24:15 +01:00
Martin Schwenke	ceb0988e14	ctdb-recoverd: Simplify using TALLOC_FREE() Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-11-12 06:24:15 +01:00
Mathieu Parent	c315fce17e	Fix various spelling errors Reviewed-by: Andrew Bartlett <abartlet@samba.org> Reviewed-by: Michael Adam <obnox@samba.org> Autobuild-User(master): Andrew Bartlett <abartlet@samba.org> Autobuild-Date(master): Fri Nov 6 13:43:45 CET 2015 on sn-devel-104	2015-11-06 13:43:45 +01:00
Amitay Isaacs	38d92788d6	ctdb-include: Use new protocol definitions This gets rid of the duplicate definitions from ctdb_protocol.h. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2015-11-04 00:47:16 +01:00
Amitay Isaacs	44e611ddcf	ctdb-daemon: Rename struct ctdb_control_get_ifaces to ctdb_iface_list_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2015-11-04 00:47:16 +01:00
Amitay Isaacs	c4e9d616ae	ctdb-daemon: Rename struct ctdb_control_iface_info to ctdb_iface Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2015-11-04 00:47:15 +01:00
Amitay Isaacs	417077c8a7	ctdb-daemon: Rename struct ctdb_control_transdb to ctdb_transdb Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2015-11-04 00:47:15 +01:00
Amitay Isaacs	e34afd8516	ctdb-daemon: Rename struct srvid_request_data to ctdb_disable_message Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2015-11-04 00:47:15 +01:00
Amitay Isaacs	cf1ac77b3a	ctdb-daemon: Rename struct srvid_request to ctdb_srvid_message Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2015-11-04 00:47:15 +01:00
Amitay Isaacs	d4de4527b0	ctdb-daemon: Rename struct ctdb_ban_time to ctdb_ban_state Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2015-11-04 00:47:15 +01:00
Amitay Isaacs	645cd43200	ctdb-daemon: Rename struct ctdb_dbid_map to ctdb_dbid_map_old Match struct ctdb_dbid as per protocol/protocol.h Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2015-11-04 00:47:15 +01:00
Amitay Isaacs	b99436e425	ctdb-daemon: Rename struct ctdb_rec_data to ctdb_rec_data_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2015-11-04 00:47:14 +01:00
Amitay Isaacs	04eaa077aa	ctdb-daemon: Rename struct ctdb_all_public_ips to ctdb_public_ip_list_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2015-11-04 00:47:14 +01:00
Amitay Isaacs	afc5d8a442	ctdb-daemon: Rename struct ctdb_node_map to ctdb_node_map_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2015-11-04 00:47:14 +01:00
Amitay Isaacs	4647787773	ctdb-daemon: Separate prototypes for common client/server functions This groups function prototypes for common client/server functions in common/common.h and removes them from ctdb_private.h. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2015-10-30 02:00:27 +01:00
Amitay Isaacs	01c6c90e98	ctdb-daemon: Remove dependency on includes.h Instead of includes.h, include the required header files explicitly. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2015-10-30 02:00:27 +01:00
Amitay Isaacs	2fdb332fad	ctdb-daemon: Stop using tevent compatibility definitions Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2015-10-30 02:00:27 +01:00
Amitay Isaacs	7084cb92e2	ctdb-include: Move include/internal/cmdline.h to common/ Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2015-10-30 02:00:27 +01:00
Amitay Isaacs	b900adc55c	ctdb-daemon: Separate prototypes for system specific functions This groups function prototypes for system specific functions in common/system.h and removes them from ctdb_private.h. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2015-10-30 02:00:27 +01:00
Volker Lendecke	d527ab1094	ctdbd: Fix a typo Signed-off-by: Volker Lendecke <vl@samba.org> Reviewed-by: Jeremy Allison <jra@samba.org>	2015-10-14 02:19:14 +02:00
Amitay Isaacs	0101748287	ctdb-recoverd: Always check for recmaster before doing recovery Recovery daemon checks if it is the recovery master before performing certain checks. During those checks it's possible that re-election can change the recmaster. In such a case, the recovery daemon should never do a database recovery. This is not a complete fix since the recovery master can still change while the recovery is going on. The correct fix is to abort recovery if the recovery master changes. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> Autobuild-User(master): Amitay Isaacs <amitay@samba.org> Autobuild-Date(master): Wed Oct 7 17:55:05 CEST 2015 on sn-devel-104	2015-10-07 17:55:05 +02:00
Amitay Isaacs	3cf93d9136	ctdb-recoverd: Get rid of connected-ness comparison in election The reason for favouring a more connected node is to create a larger cluster in case of a split brain. In a split brain condition, the nodes are not communicating across partitions and each partition will run its own election. Among all the partitions, the node which holds the recovery lock will eventually "win". All the other nodes which won an election but could not grab the recovery lock will end up banning themselves. This also prevents the recovery master role from bouncing between nodes during startup when the entire cluster is restarted. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2015-10-07 14:53:29 +02:00
Amitay Isaacs	fbd9c9fd2f	ctdb-recoverd: Do not freeze databases for election If election occurs during SMB activity, then trying to freeze all the databases can cause samba/ctdb deadlock which parallel database recovery is trying to avoid. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2015-10-07 14:53:29 +02:00
Amitay Isaacs	e6ff36506c	ctdb-recoverd: Add code for parallel database recovery Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2015-10-07 14:53:29 +02:00
Amitay Isaacs	4b39a7706f	ctdb-recoverd: Update flags on all nodes before database recovery Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2015-10-07 14:53:29 +02:00
Amitay Isaacs	9843363629	ctdb-recoverd: Update capabilities before the database recovery Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2015-10-07 14:53:29 +02:00
Amitay Isaacs	14cacd2925	ctdb-recovery: Factor out existing database recovery code Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2015-10-07 14:53:29 +02:00
Amitay Isaacs	62f1e2579a	ctdb-daemon: Replace ctdb_message with srvid abstraction Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2015-10-07 14:53:28 +02:00
Amitay Isaacs	1df2594386	ctdb-daemon: Introduce per database generation The database generation for each database is updated only during recovery. After recovery is complete the database generation would be the same as the global generation. The database generation is required for parallel database recovery. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2015-10-07 14:53:27 +02:00
Amitay Isaacs	4f155e77a8	ctdb-daemon: Rename ctdb_control_wipe_database to ctdb_control_transdb The same structure is required in new controls for database transactions. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2015-10-07 14:53:27 +02:00
Martin Schwenke	b234ae0a90	ctdb-recoverd: Clear IP assignment tree on election loss If a node was previously recovery master (say, 20 years ago) and it becomes recovery master again then, if IP assignments have changed, verify_remote_ip_allocation() can produce messages like the following when called during recovery: ctdbd: recoverd:Inconsistent IP allocation - node 0 thinks 10.1.1.1 is held by node 0 while it is assigned to node 1 When a node loses an election it should clear all data specific to it being the recovery master. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-07-01 04:18:28 +02:00
Amitay Isaacs	941669ae36	ctdb-recovered: Drop unused variable Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Michael Adam <obnox@samba.org>	2015-06-05 11:28:23 +02:00
Amitay Isaacs	2e2dba8d13	ctdb-recoverd/vacuum: Remove vacuum_info structure For all the records listed in VACUUM_FETCH, migration requests are sent immediately without waiting. This means there can only be a single VACUUM_FETCH processing active at a time. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Michael Adam <obnox@samba.org>	2015-06-05 11:28:23 +02:00
Michael Adam	92d1486b87	ctdb-recoverd/vacuum: move fetch loop back into fetch handler. With the processing of one element factored out, it is more natural to have the actual loop inside the handler function. This also makes the talloc/free bracket around the loop more obvious. Signed-off-by: Michael Adam <obnox@samba.org> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-06-05 11:28:23 +02:00
Michael Adam	4103463ad2	ctdb-recoverd/vacuum: slightly reorder the vacuum fetch loop Reads more naturally this way, imho. Signed-off-by: Michael Adam <obnox@samba.org> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-06-05 11:28:23 +02:00
Michael Adam	a1c941be6f	ctdb-recoverd/vacuum: add common exit point to vacuum_fetch_handler Signed-off-by: Michael Adam <obnox@samba.org> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-06-05 11:28:23 +02:00
Michael Adam	9092617888	ctdb-recoverd/vacuum: factor vacuum_fetch_process_one out of vacuum_fetch_loop Signed-off-by: Michael Adam <obnox@samba.org> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-06-05 11:28:23 +02:00
Michael Adam	84ab6d003a	ctdb-recoverd/vacuum: move two variables into scope. Signed-off-by: Michael Adam <obnox@samba.org> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-06-05 11:28:23 +02:00
Michael Adam	9e5cf6fd5c	ctdb-recoverd/vacuum: remove unneeded prototype. Signed-off-by: Michael Adam <obnox@samba.org> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-06-05 11:28:23 +02:00
Martin Schwenke	91f99ddfb3	ctdb-recoverd: Remove redundant condition when checking recovery lock It isn't possible to hold the recovery lock without having a lock file set. This is part of a goal to generalise the recovery lock mechanism to just use a helper program, which may use a lock file or may use something else. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-05-10 03:22:13 +02:00
Martin Schwenke	a45ab7d1fe	ctdb-recoverd: Simplify using TALLOC_FREE() Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-05-10 03:22:13 +02:00
Martin Schwenke	2c72c9de48	ctdb-recoverd: Drop redundant condition in election handler Election packets from the current node are ignored at the beginning of the function, so this does not need to be checked. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-05-10 03:22:13 +02:00
Martin Schwenke	c75fdf208f	ctdb-recoverd: Remove unused memory context variable It is set, memory is allocated but it is never used. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-05-10 03:22:13 +02:00
Martin Schwenke	4b4ba77f4a	ctdb-recoverd: Replace unnecessary use of ctdb->recovery_master Databases are only pulled by the recovery master, so it can compare with current node PNN. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-05-10 03:22:13 +02:00
Martin Schwenke	6415edfa26	ctdb-recoverd: Rename some local variables to avoid conflict with convention rec is always a (struct ctdb_recoverd *) Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-05-10 03:22:13 +02:00
Martin Schwenke	36fc620898	ctdb_recoverd: Move num_lmasters calculation to near where it is used Unless this node is the recovery master then this is not needed. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-05-10 03:22:13 +02:00
Martin Schwenke	1fd2d3886c	ctdb-recoverd: Make num_lmasters a local variable It isn't used anywhere else and is always re-initialised to 0. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-05-10 03:22:13 +02:00
Martin Schwenke	385e9326ea	ctdb-recoverd: Remove unused struct members num_active and num_connected They are initialised and updated but the values are never used. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-05-10 03:22:13 +02:00
Martin Schwenke	c3d6678dbc	ctdb-recoverd: Use capabilities API Simplify update_capabilities() using the capabilities API and store the capabilities in new field rec->caps rather than scattered around ctdb->nodes. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-05-10 03:22:13 +02:00
Martin Schwenke	85bd9a33eb	ctdb-recoverd: Avoid nodemap-related checks when recoveries are disabled The potential resulting recovery won't run anyway. Also recoveries may have been disabled by "reloadnodes" and if the nodemaps are inconsistent between nodes then avoid triggering an unnecessary recovery. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-04-07 07:43:13 +02:00
Martin Schwenke	ee9619c28b	ctdb-recoverd: New message ID CTDB_SRVID_DISABLE_RECOVERIES Also add test stub support. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-04-07 07:43:13 +02:00
Martin Schwenke	2ca484cd50	ctdb-recoverd: Simplify disable_ip_check_handler() using ctdb_op_disable() Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-04-07 07:43:13 +02:00
Martin Schwenke	108db3396f	ctdb-recoverd: Add slightly more abstraction for disabling takeover runs Factor out new function srvid_disable_and_reply(), which can be re-used. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-04-07 07:43:13 +02:00
Martin Schwenke	ec32d9bea8	ctdb-recoverd: Reimplement ReRecoveryTimeout using ctdb_op_disable() Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-04-07 07:43:12 +02:00
Martin Schwenke	281f7e8152	ctdb-recoverd: Use a goto for do_recovery() failures This will allow extra things to be done on failure. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-04-07 07:43:12 +02:00
Martin Schwenke	a2044c65bc	ctdb-recoverd: Reimplement disabling takeover runs using ctdb_op_disable() Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-04-07 07:43:12 +02:00
Martin Schwenke	55b246195b	ctdb-recoverd: Add a new abstraction ctdb_op_disable() This can be used to disable and re-enable an operation, and do all the relevant sanity checking. Most of this is from existing functions disable_takeover_runs_handler(), clear_takeover_runs_disable() and reenable_takeover_runs(). Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-04-07 07:43:12 +02:00
Martin Schwenke	48c91407ab	ctdb-recoverd: Don't release and re-take the recovery lock Just continue to hold it, otherwise a broken node might win an election and grab the lock. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-02-13 07:19:07 +01:00
Martin Schwenke	1d6ed91f55	ctdb-recoverd: Simplify ctdb_recovery_lock() Have it just silently take or fail to take the lock, except on an unexpected failure (where it should log an error). This means that when it is called we need to keep the old behaviour and explicitly release the lock. In do_recovery() the lock is released and a message is printed before attempting to take the lock. In the daemon sanity check the lock must be released in the error path if it is actually taken. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-02-13 07:19:07 +01:00
Martin Schwenke	be19a17faf	ctdb-recoverd: Remove check_recovery_lock() This has not done anything useful since commit `b9d8bb23af`. Instead, just check that the lock is held. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-02-13 07:19:07 +01:00
Martin Schwenke	668ed53662	ctdb-recoverd: Improve logging when recovery lock file is changed Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-02-13 07:19:07 +01:00
Martin Schwenke	db32a2bce5	ctdb-recoverd: New function ctdb_recovery_unlock() Unlock the recovery lock file. This way knowledge of the file descriptor isn't sprinkled throughout the code. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-02-13 07:19:07 +01:00
Martin Schwenke	72701be663	ctdb-recoverd: New function ctdb_recovery_have_lock() True if this recovery daemon holds the lock. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-02-13 07:19:07 +01:00

587 Commits