samba-mirror

mirror of https://github.com/samba-team/samba.git synced 2024-12-22 13:34:15 +03:00

Author	SHA1	Message	Date
Martin Schwenke	5af2f09a1f	ctdb-server: Optimise local IP verification It is more efficient calling ctdb_sys_local_ip_check() inside a loop compared to calling ctdb_sys_have_ip(). There is a chance that this is premature optimisation... but it sure is easy. Fall back to checking with bind(). I think these checks really exist because of the weirdness fixed by commit `4b4e4d8870`. However, we might as well do what we can. Signed-off-by: Martin Schwenke <mschwenke@ddn.com> Reviewed-by: John Mulligan <jmulligan@redhat.com> Reviewed-by: Anoop C S <anoopcs@samba.org>	2024-10-07 15:58:38 +00:00
Martin Schwenke	a15167aafe	ctdb-server: Add some local variables Improve readability by not repeating the complex expression now assigned to addr. ctdb_sys_have_ip() is called in both arms of the if/else, so call it once when declaring the new variable. Modernise debug macros while touching lines. Signed-off-by: Martin Schwenke <mschwenke@ddn.com> Reviewed-by: John Mulligan <jmulligan@redhat.com> Reviewed-by: Anoop C S <anoopcs@samba.org>	2024-10-07 15:58:38 +00:00
John Mulligan	a0e8304ccf	ctdb-server: rename ctdb_load_nodes_file to ctdb_load_nodes Rename ctdb_load_nodes_file to ctdb_load_nodes as it can now load nodes from more than a regular file. Signed-off-by: John Mulligan <jmulligan@redhat.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2024-08-06 00:43:36 +00:00
Martin Schwenke	52e5e92693	ctdb-conf: Move all conf files to new conf/ subdirectory Leave common/conf.[ch] where they are to make this commit comprehensible. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Guenther Deschner <gd@samba.org> Reviewed-by: Anoop C S <anoopcs@samba.org>	2024-06-28 18:43:52 +05:30
Volker Lendecke	7e621b1b53	ctdb: Modernize a few DEBUGs Signed-off-by: Volker Lendecke <vl@samba.org> Reviewed-by: Martin Schwenke <mschwenke@ddn.com> Autobuild-User(master): Martin Schwenke <martins@samba.org> Autobuild-Date(master): Wed Apr 17 00:54:55 UTC 2024 on atb-devel-224	2024-04-17 00:54:55 +00:00
Joseph Sutton	a8085b3dd5	ctdb: Add missing newlines to logging messages Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz> Reviewed-by: Andrew Bartlett <abartlet@samba.org>	2023-08-08 04:39:36 +00:00
Martin Schwenke	2e2d81b92a	ctdb-recoverd: CID 1509028 - Use of 32-bit time_t (Y2K38_SAFETY) usecs is going to be passed as a uint32_t. There is no need to calculate it as a time_t. Signed-off-by: Martin Schwenke <mschwenke@ddn.com> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2023-07-19 09:01:33 +00:00
Andreas Schneider	7749df4992	ctdb:server: Fix code spelling Best reviewed with: `git show --word-diff` Signed-off-by: Andreas Schneider <asn@samba.org> Reviewed-by: Martin Schwenke <mschwenke@ddn.com>	2023-03-24 07:01:31 +00:00
Michael Tokarev	96154a26fe	spelling fixes for 4.18 (errror implemenation proces Controler) One of changes is somewhat interesting, it is "tfork waiter proces" process title in tfork.c. I wonder why no one noticed this before. There's another similar process title in there, "tfork waiter process(%d)". Hopefully no one does grep for "proces$" (and there's no reason to). Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> Reviewed-by: Volker Lendecke <vl@samba.org> Reviewed-by: Rowland Penny <rpenny@samba.org> Autobuild-User(master): Jeremy Allison <jra@samba.org> Autobuild-Date(master): Thu Jan 26 20:46:11 UTC 2023 on atb-devel-224	2023-01-26 20:46:11 +00:00
Martin Schwenke	9898e7c555	ctdb-recoverd: Clean up banning culprit code Make this fully self-contained in the recovery daemon and avoid indexing by PNN. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-07-22 16:09:31 +00:00
Martin Schwenke	19fbc2da38	ctdb-recoverd: Add pnn field to banning state structure This structure is now standalone, so indexing by PNN can be avoided via a subsequent commit. Index by culprit here to make this commit simple. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-07-22 16:09:31 +00:00
Martin Schwenke	0b5dd07604	ctdb-recoverd: Add function node_flags() and use it in elections Indexing a node map by PNN is suboptimal. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-07-22 16:09:31 +00:00
Martin Schwenke	90a96f06a9	ctdb-recoverd: Do not ban on unknown error when taking cluster lock If the cluster filesystem is unavailable then I/O errors may occur. This is no worse than contention, so don't ban. This avoids having services unavailable for longer than necessary. Update the associated test to simply confirm that this results in a leaderless cluster, and leadership is restored when the lock can once again be taken. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-05-31 05:06:29 +00:00
Martin Schwenke	0e74e03c9c	ctdb-recoverd: Consistently log start of election Elections should now be quite rare, so always log when one begins. BUG: https://bugzilla.samba.org/show_bug.cgi?id=14958 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-02-14 01:47:31 +00:00
Martin Schwenke	bf55a0117d	ctdb-recoverd: Always send unknown leader broadcast when starting election This is currently missed when the cluster lock is lost. BUG: https://bugzilla.samba.org/show_bug.cgi?id=14958 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-02-14 01:47:31 +00:00
Martin Schwenke	9b3fab052b	ctdb-recoverd: Consistently have caller set election-in-progress The problem here is that election-in-progress must be set to potentially avoid restarting the election broadcast timeout in main_loop(), so this is already done by leader_handler(). Have force_election() set election-in-progress for all election types and do not bother setting it in cluster_lock_election(). BUG: https://bugzilla.samba.org/show_bug.cgi?id=14958 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-02-14 01:47:31 +00:00
Martin Schwenke	188a902156	ctdb-recoverd: Always cancel election in progress Election-in-progress is set by unknown leader broadcast, so needs to be cleared in all cases when election completes. This was seen in a case where the leader node stalled, so didn't send leader broadcasts for some time. The node continued to hold the cluster lock, so another node could not become leader. However, after the node returned to normal it still did not send leader broadcasts because election-in-progress was never cleared. BUG: https://bugzilla.samba.org/show_bug.cgi?id=14958 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-02-14 01:47:31 +00:00
Martin Schwenke	34d2ca0ae6	ctdb-config: Add configuration option [cluster] leader timeout Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	73555e8248	ctdb-recoverd: Use race for cluster lock as election when lock is enabled If the cluster is partitioned then nodes in one partition can not take the lock anyway, so election is pointless. It just introduces unnecessary corner cases. Instead just race for the lock. When a node notices a lack of leader and notifies other nodes of an election via an unknown leader broadcast, the cluster lock election is hooked into this broadcast. The test needs to be updated because losing the cluster lock can now result in a leadership change. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	c68267b2a6	ctdb-recoverd: Drop calls to ctdb_ctrl_setrecmaster() Nothing fetches this value anymore. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	58d7fcdf7c	ctdb-recoverd: Drop recovery master verification This doesn't make sense if leader broadcasts are used. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	958746f947	ctdb-recoverd: Simplify some stopped/banned checks to inactive checks Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	358c59f51a	ctdb-recoverd: No longer take cluster lock during recovery Confirm instead that it is already held. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	36ffaaa691	ctdb-recoverd: Add and use function cluster_lock_enabled() Now all references to ctdb->recovery_lock are encapsulated in the cluster lock code. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	5ee664ee17	ctdb-recoverd: Terminology change: recovery lock -> cluster lock No functional changes, just name changes for clarity. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	0f2250f4f9	ctdb-recoverd: Take cluster lock when election completes It is no longer just a recovery lock but is always held by the cluster leader. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	011e880002	ctdb-recoverd: Factor out function cluster_lock_take() Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	b029ca4d51	ctdb-recoverd: Drop leader validation The introduction of the leader broadcast timeout provides an alternative to the current leader validation. Using the leader broadcast may not be as fast but it is more correct. When the leader node is stopped or banned, the only way of triggering an election is currently to fetch the leader's node map to check whether the it is still active. This is because the leader will no longer push the node map to other nodes. However, having all nodes fetch the node map from an inactive leader may be unreliable. Most of the other cases are also handled more reliably by the leader broadcast timeout. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:32 +00:00
Martin Schwenke	7e53fab0a3	ctdb-recoverd: Drop special case for elected-before-connected This no longer occurs at startup due to the leader broadcast timeout. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:32 +00:00
Martin Schwenke	ef4b8c13c0	ctdb-recoverd: Handle leader broadcast timeout If no leader broadcasts have been received from the leader for more than 5s then trigger an election. Apart from being sane behaviour, this avoids elected-before-connected bugs at startup, where a node elects itself leader before it is connected to other nodes. When a node processes a leader broadcast timeout it sends an unknown leader broadcast to all nodes. That causes cancellation of the leader broadcast timeout across the cluster. This is particular important at startup, since nodes may be started in a staggered fashion. Without this cluster-wide cancellation, a node might notice the lack of leader, win an election and complete a recovery before other nodes notice the lack of leader. When the leader broadcast timeout finally occurs on the other nodes then they'll put the cluster back into an unnecessary recovery. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:32 +00:00
Martin Schwenke	5c7f6da0f0	ctdb-recoverd: Send leader broadcasts These are triggered on 1 second timer, but are only sent if the node is the current leader and there is no election underway. If this node can not be the leader then ensure it releases the recovery lock. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:32 +00:00
Martin Schwenke	789a75abfa	ctdb-recoverd: Process leader broadcasts Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:32 +00:00
Martin Schwenke	c2cfd9c21a	ctdb-recoverd: Add an explicit flag for election in progress An alternate election method will be added that doesn't use the election timeout, so this provides a common way for recognising when an election is in progress. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:32 +00:00
Martin Schwenke	ac5a3ca063	ctdb-recoverd: Only start election if node can be leader Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:32 +00:00
Martin Schwenke	7baadfe27e	ctdb-recoverd: Add and use function this_node_can_be_leader() This makes the code self-documenting. In ctdb_election_data() there is a slight behaviour change. An inactive node will now try to lose an election. This case should not happen because: * An inactive node can't win an election round and then send a reply. * Any inactive node should never start an election. There are currently places where this happens and they will be fixed later. There is an instance where this could be used in validate_recovery_master() but this involves a more serious logic change. Overhaul this function later. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:32 +00:00
Martin Schwenke	94b546c268	ctdb-recoverd: Logging/comments: recovery master -> leader There are some remaining instances in this file but they will be removed in subsequent commits. Modernise debug macros as appropriate. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:32 +00:00
Martin Schwenke	dd79e9bd14	ctdb-recoverd: Rename recmaster field to leader Recovery master is being renamed to leader. This follows clustering best practice (e.g. RAFT). Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:32 +00:00
Martin Schwenke	2ee6763c7d	ctdb-recoverd: Use rec->pnn everywhere This is currently referenced in a number of inconsistent ways, including: * pnn * rec->ctdb->pnn * ctdb->pnn * ctdb_get_pnn(ctdb) * ctdb_get_pnn(rec->ctdb) The first of these always requires some thought about the context - is this the node PNN or some other PNN (e.g. argument to function)? rec->pnn is now always used when referring to the recovery daemon's PNN. Doing this also reduces reliance on struct ctdb_context internals. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:32 +00:00
Martin Schwenke	4af3b10a37	ctdb-recoverd: Change argument to srvid_disable_and_reply() Reduce dependency on struct ctdb_context internals, enable a subsequent change. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:32 +00:00
Martin Schwenke	b7c138ca99	ctdb-recoverd: Simplify arguments to ctdb_ban_node() ban_time argument is always ctdb->tunable.recovery_ban_period, so build this in and make the calling code more readable. ctdb_ban_node() already logs how long a node is banned for, so don't repeatedly log this. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:32 +00:00
Martin Schwenke	a5e0ddac62	ctdb-recoverd: Simplify arguments to verify_local_ip_allocation() All other arguments are available via rec, so simplify. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:32 +00:00
Martin Schwenke	67b5191640	ctdb-recoverd: Simplify arguments to do_recovery() pnn and nodemap are both available via the rec context, so simplify. vnnmap is unused. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:32 +00:00
Martin Schwenke	57882beb16	ctdb-recoverd: Simplify arguments to some election functions The pnn and nodemap arguments to force_election() and send_election_request() are always effectively rec->pnn and rec->nodemap, so simplify. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:32 +00:00
Martin Schwenke	9dbe7cc85e	ctdb-recoverd: Add PNN to recovery daemon context This is currently referenced in a number of inconsistent ways, including: * pnn * rec->ctdb->pnn * ctdb->pnn * ctdb_get_pnn(ctdb) * ctdb_get_pnn(rec->ctdb) The first of these always requires some thought about the context - is this the node PNN or some other PNN (e.g. argument to function)? The intention is to always use rec->pnn when referring to the recovery daemon's PNN. Doing this also reduces reliance on struct ctdb_context internals. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:32 +00:00
Martin Schwenke	ff0140e470	ctdb-recoverd: Use this_node_is_leader() in an extra context This is arguably clearer. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:32 +00:00
Martin Schwenke	c8721d01c6	ctdb-recoverd: Factor out and use function this_node_is_leader() Make the code self-documenting. This preempts an upcoming change to terminology but doing it now saves a lot of churn. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:32 +00:00
Martin Schwenke	57a32cebdd	ctdb-recoverd: Pass SIGHUP to running helper The recovery and takeover helpers can run for a while and generate non-trivial logs, so have them reopen their logs to support log rotation. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> Autobuild-User(master): Amitay Isaacs <amitay@samba.org> Autobuild-Date(master): Mon Jan 17 04:36:30 UTC 2022 on sn-devel-184	2022-01-17 04:36:30 +00:00
Martin Schwenke	8e949a6082	ctdb-recoverd: Record helper PID in recovery daemon context Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 03:43:30 +00:00
Martin Schwenke	4acfefed61	ctdb-recoverd: Add basic log reopening Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 03:43:30 +00:00
Martin Schwenke	916c5ee131	ctdb-recoverd: Mark CTDB_SRVID_SET_NODE_FLAGS obsolete CTDB_SRVID_SET_NODE_FLAGS is no longer sent so drop monitor_handler() and replace with srvid_not_implemented(). Mark the SRVID obsolete in its comment. BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2021-09-09 01:46:49 +00:00

1 2 3 4 5 ...

590 Commits