samba-mirror

mirror of https://github.com/samba-team/samba.git synced 2025-01-05 09:18:06 +03:00

Author	SHA1	Message	Date
Martin Schwenke	958746f947	ctdb-recoverd: Simplify some stopped/banned checks to inactive checks Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	358c59f51a	ctdb-recoverd: No longer take cluster lock during recovery Confirm instead that it is already held. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	36ffaaa691	ctdb-recoverd: Add and use function cluster_lock_enabled() Now all references to ctdb->recovery_lock are encapsulated in the cluster lock code. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	5ee664ee17	ctdb-recoverd: Terminology change: recovery lock -> cluster lock No functional changes, just name changes for clarity. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	0f2250f4f9	ctdb-recoverd: Take cluster lock when election completes It is no longer just a recovery lock but is always held by the cluster leader. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	011e880002	ctdb-recoverd: Factor out function cluster_lock_take() Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	b029ca4d51	ctdb-recoverd: Drop leader validation The introduction of the leader broadcast timeout provides an alternative to the current leader validation. Using the leader broadcast may not be as fast but it is more correct. When the leader node is stopped or banned, the only way of triggering an election is currently to fetch the leader's node map to check whether the it is still active. This is because the leader will no longer push the node map to other nodes. However, having all nodes fetch the node map from an inactive leader may be unreliable. Most of the other cases are also handled more reliably by the leader broadcast timeout. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:32 +00:00
Martin Schwenke	7e53fab0a3	ctdb-recoverd: Drop special case for elected-before-connected This no longer occurs at startup due to the leader broadcast timeout. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:32 +00:00
Martin Schwenke	ef4b8c13c0	ctdb-recoverd: Handle leader broadcast timeout If no leader broadcasts have been received from the leader for more than 5s then trigger an election. Apart from being sane behaviour, this avoids elected-before-connected bugs at startup, where a node elects itself leader before it is connected to other nodes. When a node processes a leader broadcast timeout it sends an unknown leader broadcast to all nodes. That causes cancellation of the leader broadcast timeout across the cluster. This is particular important at startup, since nodes may be started in a staggered fashion. Without this cluster-wide cancellation, a node might notice the lack of leader, win an election and complete a recovery before other nodes notice the lack of leader. When the leader broadcast timeout finally occurs on the other nodes then they'll put the cluster back into an unnecessary recovery. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:32 +00:00
Martin Schwenke	5c7f6da0f0	ctdb-recoverd: Send leader broadcasts These are triggered on 1 second timer, but are only sent if the node is the current leader and there is no election underway. If this node can not be the leader then ensure it releases the recovery lock. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:32 +00:00
Martin Schwenke	789a75abfa	ctdb-recoverd: Process leader broadcasts Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:32 +00:00
Martin Schwenke	c2cfd9c21a	ctdb-recoverd: Add an explicit flag for election in progress An alternate election method will be added that doesn't use the election timeout, so this provides a common way for recognising when an election is in progress. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:32 +00:00
Martin Schwenke	ac5a3ca063	ctdb-recoverd: Only start election if node can be leader Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:32 +00:00
Martin Schwenke	7baadfe27e	ctdb-recoverd: Add and use function this_node_can_be_leader() This makes the code self-documenting. In ctdb_election_data() there is a slight behaviour change. An inactive node will now try to lose an election. This case should not happen because: * An inactive node can't win an election round and then send a reply. * Any inactive node should never start an election. There are currently places where this happens and they will be fixed later. There is an instance where this could be used in validate_recovery_master() but this involves a more serious logic change. Overhaul this function later. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:32 +00:00
Martin Schwenke	94b546c268	ctdb-recoverd: Logging/comments: recovery master -> leader There are some remaining instances in this file but they will be removed in subsequent commits. Modernise debug macros as appropriate. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:32 +00:00
Martin Schwenke	dd79e9bd14	ctdb-recoverd: Rename recmaster field to leader Recovery master is being renamed to leader. This follows clustering best practice (e.g. RAFT). Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:32 +00:00
Martin Schwenke	2ee6763c7d	ctdb-recoverd: Use rec->pnn everywhere This is currently referenced in a number of inconsistent ways, including: * pnn * rec->ctdb->pnn * ctdb->pnn * ctdb_get_pnn(ctdb) * ctdb_get_pnn(rec->ctdb) The first of these always requires some thought about the context - is this the node PNN or some other PNN (e.g. argument to function)? rec->pnn is now always used when referring to the recovery daemon's PNN. Doing this also reduces reliance on struct ctdb_context internals. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:32 +00:00
Martin Schwenke	4af3b10a37	ctdb-recoverd: Change argument to srvid_disable_and_reply() Reduce dependency on struct ctdb_context internals, enable a subsequent change. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:32 +00:00
Martin Schwenke	b7c138ca99	ctdb-recoverd: Simplify arguments to ctdb_ban_node() ban_time argument is always ctdb->tunable.recovery_ban_period, so build this in and make the calling code more readable. ctdb_ban_node() already logs how long a node is banned for, so don't repeatedly log this. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:32 +00:00
Martin Schwenke	a5e0ddac62	ctdb-recoverd: Simplify arguments to verify_local_ip_allocation() All other arguments are available via rec, so simplify. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:32 +00:00
Martin Schwenke	67b5191640	ctdb-recoverd: Simplify arguments to do_recovery() pnn and nodemap are both available via the rec context, so simplify. vnnmap is unused. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:32 +00:00
Martin Schwenke	57882beb16	ctdb-recoverd: Simplify arguments to some election functions The pnn and nodemap arguments to force_election() and send_election_request() are always effectively rec->pnn and rec->nodemap, so simplify. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:32 +00:00
Martin Schwenke	9dbe7cc85e	ctdb-recoverd: Add PNN to recovery daemon context This is currently referenced in a number of inconsistent ways, including: * pnn * rec->ctdb->pnn * ctdb->pnn * ctdb_get_pnn(ctdb) * ctdb_get_pnn(rec->ctdb) The first of these always requires some thought about the context - is this the node PNN or some other PNN (e.g. argument to function)? The intention is to always use rec->pnn when referring to the recovery daemon's PNN. Doing this also reduces reliance on struct ctdb_context internals. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:32 +00:00
Martin Schwenke	ff0140e470	ctdb-recoverd: Use this_node_is_leader() in an extra context This is arguably clearer. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:32 +00:00
Martin Schwenke	c8721d01c6	ctdb-recoverd: Factor out and use function this_node_is_leader() Make the code self-documenting. This preempts an upcoming change to terminology but doing it now saves a lot of churn. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:32 +00:00
Martin Schwenke	57a32cebdd	ctdb-recoverd: Pass SIGHUP to running helper The recovery and takeover helpers can run for a while and generate non-trivial logs, so have them reopen their logs to support log rotation. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> Autobuild-User(master): Amitay Isaacs <amitay@samba.org> Autobuild-Date(master): Mon Jan 17 04:36:30 UTC 2022 on sn-devel-184	2022-01-17 04:36:30 +00:00
Martin Schwenke	8e949a6082	ctdb-recoverd: Record helper PID in recovery daemon context Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 03:43:30 +00:00
Martin Schwenke	97a45f6f25	ctdb-recoverd: Add log reopening on SIGHUP to helpers Recovery and takeover helpers can run for a while and generate non-trivial logs. They should support log reopening. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 03:43:30 +00:00
Martin Schwenke	51f0380e83	ctdb-daemon: Enable log reopening for event daemon Add and call hook to pass on SIGHUP to eventd. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 03:43:30 +00:00
Martin Schwenke	c554a325fe	ctdb-daemon: Enable log reopening for recovery daemon Pass on a SIGHUP to the recovery daemon, which will then reopen its logs. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 03:43:30 +00:00
Martin Schwenke	4acfefed61	ctdb-recoverd: Add basic log reopening Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 03:43:30 +00:00
Martin Schwenke	4ed37de82b	ctdb-daemon: Add basic top-level log reopening Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 03:43:30 +00:00
Martin Schwenke	9e7d2d9794	ctdb-daemon: Don't mark a node as unhealthy when connecting to it Remote nodes are already initialised as UNHEALTHY when the node list is initialised at startup (ctdb_load_nodes_file() calls convert_node_map_to_list()) and when disconnected (ctdb_node_dead()). So, drop this code. BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> Autobuild-User(master): Amitay Isaacs <amitay@samba.org> Autobuild-Date(master): Thu Sep 9 02:38:34 UTC 2021 on sn-devel-184	2021-09-09 02:38:34 +00:00
Martin Schwenke	7f697b1938	ctdb-daemon: Ignore flag changes for disconnected nodes If this node is not connected to a node then we shouldn't know anything about it. The state will be pushed later by the recovery master. BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784 Signed-off-by: Martin Schwenke <martin@meltin.net> Signed-off-by: Amitay Isaacs <amitay@gmail.com>	2021-09-09 01:46:49 +00:00
Martin Schwenke	ae10a8a4b7	ctdb-daemon: Simplify ctdb_control_modflags() Now that there are separate disable/enable controls used by the ctdb tool this control can ignore any flag updates for the current nodes. These only come from the recovery master, which depends on being able to fetch flags for all nodes. BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2021-09-09 01:46:49 +00:00
Martin Schwenke	916c5ee131	ctdb-recoverd: Mark CTDB_SRVID_SET_NODE_FLAGS obsolete CTDB_SRVID_SET_NODE_FLAGS is no longer sent so drop monitor_handler() and replace with srvid_not_implemented(). Mark the SRVID obsolete in its comment. BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2021-09-09 01:46:49 +00:00
Martin Schwenke	e75256767f	ctdb-daemon: Don't bother sending CTDB_SRVID_SET_NODE_FLAGS The code that handles this message is ctdb_recoverd.c:monitor_handler(). Although it appears to do something potentially useful, it only logs the flags changes. All changes made are to local structures - there are no actual side-effects. It used to trigger a takeover run when the DISABLED flag changed. This was dropped back in commit `662f06de9f`. BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2021-09-09 01:46:49 +00:00
Martin Schwenke	0132bd5a22	ctdb-daemon: Modernise remaining debug macro in this function BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2021-09-09 01:46:49 +00:00
Martin Schwenke	b6d25d079e	ctdb-daemon: Update logging for flag changes When flags change, promote the message to NOTICE level and switch the message to the style that is currently generated by ctdb-recoverd.c:monitor_handler(). This will allow monitor_handler() to go away in future. Drop logging when flags do not change. The recovery master now logs when it pushes flags for a node, so the lack of a corresponding "changed flags" message here indicates that no update was required. BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2021-09-09 01:46:49 +00:00
Martin Schwenke	eec44e2862	ctdb-daemon: Correct the condition for logging unchanged flags Don't trust the old flags from the recovery master. Surrounding code will change in future comments, including the use of old-style debug macros, so just make this change clear. BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2021-09-09 01:46:49 +00:00
Martin Schwenke	15a6489c28	ctdb_daemon: Implement controls DISABLE_NODE/ENABLE_NODE BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2021-09-09 01:46:49 +00:00
Martin Schwenke	60c1ef1465	ctdb-daemon: Start as disabled means PERMANENTLY_DISABLED DISABLED is UNHEALTHY \| PERMANENTLY_DISABLED, which is not what is intended here. Luckily, it doesn't do any harm because nodes are marked unhealthy at startup anyway. BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2021-09-09 01:46:49 +00:00
Martin Schwenke	1ac7bc7532	ctdb-daemon: Factor out a function to get node structure from PNN BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2021-09-09 01:46:49 +00:00
Martin Schwenke	e0a7b5a9e8	ctdb-daemon: Add a helper variable Simplifies a subsequent change. BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2021-09-09 01:46:49 +00:00
Martin Schwenke	8305f6a7f1	ctdb-recoverd: Push flags for a node if any remote node disagrees This will usually happen if flags on the node in question change, so keeping the code simple and pushing to all nodes won't hurt. When all nodes come up there might be differences in connected nodes, causing such "fix ups". Receiving nodes will ignore no-op pushes. BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2021-09-09 01:46:49 +00:00
Martin Schwenke	620d078714	ctdb-recoverd: Update the local node map before pushing out flags The resulting code structure looks a little weird. However, there is another condition that requires the flags to be pushed that will be inserted before the continue statement in a subsequent commit.. BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2021-09-09 01:46:49 +00:00
Martin Schwenke	82a075d4d7	ctdb-recoverd: Add a helper variable Improves readability and simplifies subsequent changes. BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2021-09-09 01:46:49 +00:00
Martin Schwenke	e40d452722	ctdb-daemon: Close server socket when switching to client The socket is set close-on-exec but that doesn't help for processes that do not exec(). This should be done for all child processes. This has been seen in testing where "ctdb shutdown" waits for the socket to close before succeeding. It appears that lingering vacuuming processes have not closed the socket when becoming clients so they cause "ctdb shutdown" to hang even though the main daemon process has exited. The cause of the lingering vacuuming processes has been previously examined but still isn't understood. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2021-06-25 09:16:31 +00:00
Amitay Isaacs	d07875330a	ctdb-locking: Pass additional arguments to debug locks script 1. PID of lock helper waiting for lock 2. Scope of lock: "record" or "db" 3. Path to database that lock helper is trying to lock 4. Whether the database uses mutexes: "mutex" or "fcntl" Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2021-05-28 06:46:29 +00:00
Volker Lendecke	06b740e2fb	ctdb: Fix a typo Signed-off-by: Volker Lendecke <vl@samba.org> Reviewed-by: Jeremy Allison <jra@samba.org>	2021-03-09 22:36:28 +00:00

1 2 3 4 5 ...

2564 Commits