samba-mirror

mirror of https://github.com/samba-team/samba.git synced 2024-12-24 21:34:56 +03:00

Author	SHA1	Message	Date
Martin Schwenke	5d655ac6f2	ctdb-recoverd: Only check for LMASTER nodes in the VNN map BUG: https://bugzilla.samba.org/show_bug.cgi?id=14085 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2019-08-21 11:50:30 +00:00
Martin Schwenke	6fe963c3f7	ctdb-recoverd: Periodically log recovery master of incomplete cluster Only do this if the recovery lock is unset. Log every minute for the first 10 minutes, then every 10 minutes, then every hour. This is useful for determining whether a split brain occurred. It is particularly useful if logging failed or was throttled at startup, so there is no evidence of the split brain when it began. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2019-07-26 03:34:16 +00:00
Martin Schwenke	f2559ef8ce	ctdb-recoverd: Log the master at the end of elections Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2019-07-26 03:34:16 +00:00
Martin Schwenke	35368d871d	ctdb-recovery: Avoid -1 as a PNN, use CTDB_UNKNOWN_PNN instead Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2019-06-05 10:25:50 +00:00
Martin Schwenke	978c7dbd55	ctdb-recovery: Fix signed/unsigned comparison by casting Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2019-06-05 10:25:50 +00:00
Martin Schwenke	fa7bd35b6a	ctdb-recovery: Fix signed/unsigned comparisons by declaring as unsigned Simple cases where variables need to be declared as an unsigned type instead of an int. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2019-06-05 10:25:50 +00:00
Martin Schwenke	6a2941e2a9	ctdb-recoverd: Fix memory leak state is always freed before exiting this function, so allocate fde off it instead of long-lived ctdb context. BUG: https://bugzilla.samba.org/show_bug.cgi?id=13943 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2019-05-14 07:25:37 +00:00
Martin Schwenke	13a1a48089	ctdb-recoverd: Time out attempt to take recovery lock after 120s Currently this will wait forever. It really needs a timeout in case the cluster filesystem (or other lock mechanism) is completely wedged. BUG: https://bugzilla.samba.org/show_bug.cgi?id=13800 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2019-02-25 02:12:17 +01:00
Martin Schwenke	45a77d65b2	ctdb-recoverd: Ban node on unknown error when taking recovery lock We really shouldn't see unknown errors. They probably represent a misconfigured recovery lock or similar. BUG: https://bugzilla.samba.org/show_bug.cgi?id=13800 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2019-02-25 02:12:17 +01:00
Martin Schwenke	c0fb62ed39	ctdb-recoverd: Make recoverd context available in recovery lock handle BUG: https://bugzilla.samba.org/show_bug.cgi?id=13800 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2019-02-25 02:12:16 +01:00
Martin Schwenke	7e4aae6943	ctdb-recoverd: Clean up logging on failure to take recovery lock Add an explicit case for a timeout and clean up the other messages. BUG: https://bugzilla.samba.org/show_bug.cgi?id=13800 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2019-02-25 02:12:16 +01:00
Martin Schwenke	621658cbed	ctdb-recoverd: Free cluster mutex handler on failure to take lock If nested events occur while the file descriptor handler is still active then chaos can ensue. For example, if a node is banned and the lock is explicitly cancelled (e.g. due to election loss) then double-talloc-free()s abound. BUG: https://bugzilla.samba.org/show_bug.cgi?id=13800 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2019-02-25 02:12:16 +01:00
Martin Schwenke	da8aaf2aee	ctdb-recoverd: Call an election when the recovery lock is lost The lock may have been lost due to a failure in the underlying locking mechanism. This could be due to quorum loss or similar. It is best to call an election to confirm that this node should still be master. At worst, the node will reelect itself, fail to take the lock and then ban itself. This is a suitable outcome for a node that has been partitioned from others in the cluster. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2018-12-18 02:02:03 +01:00
Andreas Schneider	2d512b278e	debug: Use debuglevel_(get|set) function Signed-off-by: Andreas Schneider <asn@samba.org> Reviewed-by: Jeremy Allison <jra@samba.org> Autobuild-User(master): Andreas Schneider <asn@cryptomilk.org> Autobuild-Date(master): Thu Nov 8 11:03:11 CET 2018 on sn-devel-144	2018-11-08 11:03:11 +01:00
Martin Schwenke	486022ef8f	ctdb-recoverd: Set recovery lock handle at start of attempt This allows the attempt to be cancelled if an election is lost and an unlock is done before the attempt is completed. BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> Autobuild-User(master): Martin Schwenke <martins@samba.org> Autobuild-Date(master): Tue Sep 18 02:18:30 CEST 2018 on sn-devel-144	2018-09-18 02:18:30 +02:00
Martin Schwenke	b1dc568784	ctdb-recoverd: Handle cancellation when releasing recovery lock If the recovery lock is in the process of being taken then free the cluster mutex handle but leave the recovery lock handle in place. This allows ctdb_recovery_lock() to fail. Note that this isn't yet live because rec->recovery_lock_handle is still only set at the completion of the attempt to take the lock. BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2018-09-17 22:58:20 +02:00
Martin Schwenke	a755d060c1	ctdb-recoverd: Return early when the recovery lock is not held This makes upcoming changes simpler. Update to modern debug macro while touching relevant line. BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2018-09-17 22:58:20 +02:00
Martin Schwenke	c52216740b	ctdb-recoverd: Store recovery lock handle ... not just cluster mutex handle. This makes the recovery lock handle long-lived and will allow the releasing code to cancel an in-progress attempt to take the recovery lock. The cluster mutex handle is now allocated off the recovery lock handle. BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2018-09-17 22:58:20 +02:00
Martin Schwenke	a53b264aee	ctdb-recoverd: Use talloc() to allocate recovery lock handle At the moment this is still local and is freed after the mutex is successfully taken. BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2018-09-17 22:58:20 +02:00
Martin Schwenke	af22f03dbe	ctdb-recoverd: Rename hold_reclock_state to ctdb_recovery_lock_handle This will be a longer lived structure. BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2018-09-17 22:58:20 +02:00
Martin Schwenke	c516e58ce9	ctdb-recoverd: Re-check master on failure to take recovery lock If the master changed while trying to take the lock then fail gracefully. BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2018-09-17 22:58:20 +02:00
Martin Schwenke	59fc01646c	ctdb-recoverd: Clean up taking of recovery lock No functional changes, just coding style cleanups and debug message tweaks. BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2018-09-17 22:58:20 +02:00
Martin Schwenke	929634126a	ctdb-config: Switch tunable DisableIPFailover to a config option Use the "failover:disabled" option instead. BUG: https://bugzilla.samba.org/show_bug.cgi?id=13589 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2018-08-24 10:59:21 +02:00
Martin Schwenke	914e9f22d8	ctdb-daemon: Pass DisableIPFailover tunable via environment variable Preparation for obsoleting this tunable. BUG: https://bugzilla.samba.org/show_bug.cgi?id=13589 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2018-08-24 10:59:21 +02:00
Martin Schwenke	b318cf22ba	ctdb-recoverd: Set the process name correctly Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2018-07-02 08:51:22 +02:00
Martin Schwenke	57834c64be	ctdb-common: Rename system utility files system_socket.[ch] will contain all the raw socket code and other functions that use ctdb_sock_addr. system.[ch] will contain other platform dependent functions. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2018-07-02 08:51:20 +02:00
Amitay Isaacs	6e588913dd	ctdb-recoverd: Abort recovery/takeover if recmaster changes Recovery and takeover are run via helper from recovery daemon. While the helpers are running, it's possible for the current node to lose election. If that happens, abort the currently running recovery/takeover helper. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2017-09-12 12:23:19 +02:00
Amitay Isaacs	1f7f112317	ctdb-client: Fix ctdb_attach() to use database flags BUG: https://bugzilla.samba.org/show_bug.cgi?id=12978 Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> Autobuild-User(master): Martin Schwenke <martins@samba.org> Autobuild-Date(master): Fri Aug 25 13:32:58 CEST 2017 on sn-devel-144	2017-08-25 13:32:58 +02:00
Amitay Isaacs	9987fe7209	ctdb-client: Optionally return database id from ctdb_ctrl_createdb() BUG: https://bugzilla.samba.org/show_bug.cgi?id=12978 Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2017-08-25 09:41:26 +02:00
Amitay Isaacs	4bd0a20a75	ctdb-client: Fix ctdb_ctrl_createdb() to use database flags BUG: https://bugzilla.samba.org/show_bug.cgi?id=12978 Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2017-08-25 09:41:25 +02:00
Amitay Isaacs	ea91967b0d	ctdb-client: Drop tdb_flags argument to ctdb_attach() Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2017-06-26 15:47:24 +02:00
Amitay Isaacs	ea46699b27	ctdb-recovery: Do not run local ip verification when in recovery BUG: https://bugzilla.samba.org/show_bug.cgi?id=12857 If we drop public IPs because CTDB is in recovery for too long, then avoid spamming logs "Trigger takeoverrun" every second. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2017-06-24 10:28:21 +02:00
Amitay Isaacs	2fd2ccd4c8	ctdb-recovery: Get recmode unconditionally in the main_loop BUG: https://bugzilla.samba.org/show_bug.cgi?id=12857 This can be used later in the main_loop to avoid the local ip check. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2017-06-24 10:28:21 +02:00
Chris Lamb	f7dc9f1e12	Correct "supressed" typo. Signed-off-by: Chris Lamb <chris@chris-lamb.co.uk> Reviewed-by: Andrew Bartlett <abartlet@samba.org> Reviewed-by: Garming Sam <garming@catalyst.net.nz>	2017-02-22 08:26:21 +01:00
Martin Schwenke	f2485d3ab9	ctdb-recoverd: Integrate takeover helper Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2016-12-19 04:07:08 +01:00
Martin Schwenke	5b60414265	ctdb-recoverd: Generalise helper state, handler and launching These can also be used for takeover handler. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2016-12-19 04:07:08 +01:00
Amitay Isaacs	41c964fdbc	ctdb-recovery: Start recovery helper with ctdb_vfork_exec The recovery helper does its own logging, so there is no need to pass logfd. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> Autobuild-User(master): Martin Schwenke <martins@samba.org> Autobuild-Date(master): Mon Dec 5 11:59:42 CET 2016 on sn-devel-144	2016-12-05 11:59:42 +01:00
Amitay Isaacs	d53dbd0dcc	ctdb-daemon: Initialize logging in recovery daemon Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2016-12-05 08:09:22 +01:00
Amitay Isaacs	74ccc7280a	ctdb-recoverd: Log a message when terminating Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2016-12-05 08:09:22 +01:00
Amitay Isaacs	3d6860b275	ctdb-daemon: Remove setting of debug_extra from switch_from_server_to_client() Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2016-12-05 08:09:22 +01:00
Martin Schwenke	bdc049dfce	ctdb-common: Drop CTDB's copy of sys_read() and sys_write() Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> Autobuild-User(master): Amitay Isaacs <amitay@samba.org> Autobuild-Date(master): Tue Nov 29 11:22:40 CET 2016 on sn-devel-144	2016-11-29 11:22:40 +01:00
Amitay Isaacs	2a9584dc0a	ctdb-daemon: Remove unused code cmdline.[ch] Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2016-11-25 04:19:23 +01:00
Amitay Isaacs	67351e61ee	ctdb-recoverd: Drop code to freeze databases from set_recovery_mode() This function is called only once from force_election() and does not require freezing of databases. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2016-09-14 08:39:28 +02:00
Martin Schwenke	abe5445c24	ctdb-recoverd: Don't directly release rogue IP addresses This is inconsistent with the rest of the local IP verification. It should notice problems but not try to fix them directly. Like other cases, it should use an IP takeover run to try to fix the problem. In this case the address might have just been added and an out-of-band RELEASE_IP might cause conflicts (i.e. "another change is in flight") with a scheduled IP takeover run. This effectively reverts commit `694c1b269e`. Not sure why this was needed after `c7e648c2d1`. More recently commit `6471541d6d` moves responsibility for determining interface/netmask to 10.interface so this should continue to work just fine. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2016-08-17 23:00:26 +02:00
Amitay Isaacs	6693fa59dc	ctdb-recoverd: Remove code that updates database priorities during recovery Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2016-07-25 21:29:42 +02:00
Amitay Isaacs	9338443a92	ctdb-recovery: Remove serial database recovery code Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2016-07-25 21:29:42 +02:00
Martin Schwenke	a26d39e5ce	ctdb-recoverd: Drop code to change the IP assignment tree The tree is no longer used in verification. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2016-07-04 15:42:24 +02:00
Martin Schwenke	35644d0d82	ctdb-ipalloc: Drop remote IP verification It is only run during a takeover run and only logs errors. It doesn't actually do anything to fix potential errors. The takeover run should fix any inconsistencies anyway. Instead, leave a comment in the recovery daemon's monitoring loop to add proper remote IP verification later. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2016-07-04 15:42:24 +02:00
Amitay Isaacs	ecb74721e7	ctdb-recoverd: Avoid duplicate recoverd event in parallel recovery BUG: https://bugzilla.samba.org/show_bug.cgi?id=11956 In do_recovery, after the recovery and takeover is complete, recoverd event is triggered. When the parallel database recovery was separated, ctdb_recovery_helper implemented sending END_RECOVERY control which causes recoverd event to be triggered. So when there is parallel database recovery, recoverd event is triggered twice. Instead move the call to run_recovered_eventscript() explicitly in the serial recovery code path. This avoids the duplication trigger of recoverd event. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2016-06-08 10:33:19 +02:00
Martin Schwenke	174449c1e0	ctdb-recoverd: Release recovery lock on exit The recovery lock helper must exit when it notices its parent is gone. However, that can take a few seconds. The usual way of terminating the recovery daemon is for the main ctdbd to send it a SIGTERM. Installing a handler is nice and simple. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2016-06-08 00:51:29 +02:00
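
The logging schedule described in commit 6fe963c3f7 (every minute for the first 10 minutes, then every 10 minutes, then every hour) could be sketched in C roughly as follows. This is an illustrative sketch only, not the code from that commit: the function names log_interval_secs() and should_log() are hypothetical, and the one-hour boundary between the 10-minute phase and the hourly phase is an assumption, since the commit message does not say when the hourly phase begins.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the ctdb sources.
 * Returns the logging interval (in seconds) that applies once
 * elapsed_secs seconds have passed since the cluster was first
 * seen to be incomplete.
 */
static inline uint32_t log_interval_secs(uint32_t elapsed_secs)
{
	if (elapsed_secs < 10 * 60) {
		return 60;		/* first 10 minutes: every minute */
	}
	if (elapsed_secs < 60 * 60) {
		return 10 * 60;		/* ASSUMPTION: every 10 minutes until 1 hour */
	}
	return 60 * 60;			/* afterwards: every hour */
}

/*
 * Hypothetical helper: decide whether a message is due, given the
 * elapsed time now and the elapsed time at the previous log message.
 * Expects last_logged_secs <= elapsed_secs.
 */
static inline bool should_log(uint32_t elapsed_secs, uint32_t last_logged_secs)
{
	return elapsed_secs - last_logged_secs >= log_interval_secs(elapsed_secs);
}
```

A caller in a periodic monitoring loop would record the elapsed time whenever should_log() returns true and use it as last_logged_secs on the next pass.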


505 Commits