samba-mirror

mirror of https://github.com/samba-team/samba.git synced 2024-12-22 13:34:15 +03:00

Author	SHA1	Message	Date
Martin Schwenke	061315cc79	ctdb-mutex: Test the lock by locking a 2nd byte range Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-07-28 10:09:34 +00:00
Martin Schwenke	97a1714ee9	ctdb-mutex: open() and fstat() when testing lock file This makes a file descriptor available for other I/O. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-07-28 10:09:34 +00:00
Martin Schwenke	c07e81abf0	ctdb-mutex: Factor out function fcntl_lock_fd() Allows blocking mode and start offset to be specified. Always locks a 1-byte range. Make the lock structure static to avoid initialising the whole structure each time. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-07-28 10:09:34 +00:00
Martin Schwenke	9daf22a5c9	ctdb-mutex: Handle pings from lock checking child to parent The ping timeout is specified by passing an extra argument to the mutex helper, representing the ping timeout in seconds. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-07-28 10:09:34 +00:00
Martin Schwenke	b5db286791	ctdb-mutex: Do inode checks in a child process In future this will allow extra I/O tests and a timeout in the parent to (hopefully) release the lock if the child gets wedged. For simplicity, use tmon only to detect when either parent or child goes away. Plumbing a timeout for pings from child to parent will be done later. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-07-28 10:09:34 +00:00
Martin Schwenke	2ecdbcb22c	ctdb-mutex: Rename wait_for_lost to lock_io_check This will be generalised to do more I/O-based checks. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-07-28 10:09:34 +00:00
Martin Schwenke	7ab2e8f127	ctdb-mutex: Rename recheck_time to recheck_interval There will be more timeouts so clarify the intent of this one. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-07-28 10:09:34 +00:00
Martin Schwenke	c396b61504	ctdb-mutex: Consistently use progname in error messages To avoid error messages having ridiculously long paths, set progname to basename(argv[0]). Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-07-28 10:09:34 +00:00
Martin Schwenke	3efa56aa61	ctdb-daemon: Fix printing of tickle ACKs Commit `f5a2037734` arguably got this back-to-front: 2022-07-27T09:50:01.985857+10:00 testn1 ctdbd[17820]: ../../ctdb/server/ctdb_takeover.c:514 sending TAKE_IP for '10.0.1.173' 2022-07-27T09:50:01.990601+10:00 testn1 ctdbd[17820]: Send TCP tickle ACK: 10.0.1.77:33004 -> 10.0.1.173:2049 2022-07-27T09:50:01.991323+10:00 testn1 ctdb-takeover[19758]: TAKEOVER_IP 10.0.1.173 succeeded on node 0 Unfortunately there is an inconsistency somewhere in the connection tracking code used for tickle ACKs, making this less than obvious. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> Autobuild-User(master): Amitay Isaacs <amitay@samba.org> Autobuild-Date(master): Thu Jul 28 09:02:08 UTC 2022 on sn-devel-184	2022-07-28 09:02:08 +00:00
Martin Schwenke	c77a4fde7a	ctdb-daemon: Modernise debug in ctdb_add_public_address() Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-07-22 16:09:31 +00:00
Martin Schwenke	d62fcba7dc	ctdb-daemon: Avoid spurious error sending ARPs for released IP A public IP address can be released in between (and probably before) attempts to send ARPs. One situation when this can occur is when a cluster is shutting down: node A shuts down first, public IPs from node A are taken over by node B, node B is shutdown. Notice this when it occurs and cancel further attempts to send ARPs. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-07-22 16:09:31 +00:00
Martin Schwenke	f5a2037734	ctdb-daemon: Modernise debug in ctdb_control_send_arp() For the tickle ACK logging, render the connection in a buffer. This produces more complete information. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-07-22 16:09:31 +00:00
Martin Schwenke	9898e7c555	ctdb-recoverd: Clean up banning culprit code Make this fully self-contained in the recovery daemon and avoid indexing by PNN. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-07-22 16:09:31 +00:00
Martin Schwenke	19fbc2da38	ctdb-recoverd: Add pnn field to banning state structure This structure is now standalone, so indexing by PNN can be avoided via a subsequent commit. Index by culprit here to make this commit simple. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-07-22 16:09:31 +00:00
Martin Schwenke	0b5dd07604	ctdb-recoverd: Add function node_flags() and use it in elections Indexing a node map by PNN is suboptimal. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-07-22 16:09:31 +00:00
Martin Schwenke	e752f841e6	ctdb-daemon: Use DEBUG() macro for child logging Directly using dbgtext() with file logging results in a log entry with no header, which is wrong. This is a regression, introduced in commit `10d15c9e5d`. Prior to this, CTDB's callback for file logging would always add a header. Use DEBUG() instead dbgtext(). Note that DEBUG() effectively compares the passed script_log_level with DEBUGLEVEL, so an explicit check is no longer necessary. BUG: https://bugzilla.samba.org/show_bug.cgi?id=15090 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Volker Lendecke <vl@samba.org> Autobuild-User(master): Volker Lendecke <vl@samba.org> Autobuild-Date(master): Thu Jun 16 13:33:10 UTC 2022 on sn-devel-184	2022-06-16 13:33:10 +00:00
Martin Schwenke	88f35cf862	ctdb-daemon: Drop unused prefix, logfn, logfn_private These aren't set anywhere in the code. Drop the log argument because it is also no longer used. BUG: https://bugzilla.samba.org/show_bug.cgi?id=15090 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Volker Lendecke <vl@samba.org>	2022-06-16 12:42:35 +00:00
Martin Schwenke	90a96f06a9	ctdb-recoverd: Do not ban on unknown error when taking cluster lock If the cluster filesystem is unavailable then I/O errors may occur. This is no worse than contention, so don't ban. This avoids having services unavailable for longer than necessary. Update the associated test to simply confirm that this results in a leaderless cluster, and leadership is restored when the lock can once again be taken. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-05-31 05:06:29 +00:00
Martin Schwenke	da9decfc5e	ctdb-daemon: Remove unused #includes of rb_tree.h ctdb_takeover.c and eventscript.c no longer use this. ipalloc_common.c has never used it. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-05-31 05:06:29 +00:00
Martin Schwenke	80de84d36e	ctdb-daemon: Log per-database summary of resent calls After a recovery that takes a significant amount of time the logs are flooded with messages about every resent call. Log a summary instead and demote per-call messages to INFO level. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-05-31 05:06:29 +00:00
Pavel Filipenský	8cb6565011	ctdb: Covscan: unchecked return value for trbt_traversearray32() Signed-off-by: Pavel Filipenský <pfilipen@redhat.com> Reviewed-by: Jeremy Allison <jra@samba.org> Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>	2022-05-14 03:49:32 +00:00
Martin Schwenke	d52b497d11	ctdb-locking: Don't pass NULL to tevent_req_is_unix_error() If there is an error then this pointer is unconditionally dereferenced. However, the only possible error appears to be ENOMEM, where a crash caused by dereferencing a NULL pointer isn't a terrible outcome. In the absence of a security issue this is probably not worth backporting. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-05-03 09:19:31 +00:00
Martin Schwenke	490e5f4d4c	ctdb-mutex: Don't pass NULL to tevent_req_is_unix_error() If there is an error then this pointer is unconditionally dereferenced. However, the only possible error appears to be ENOMEM, where a crash caused by dereferencing a NULL pointer isn't a terrible outcome. In the absence of a security issue this is probably not worth backporting. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-05-03 09:19:31 +00:00
Martin Schwenke	6fb08a6580	ctdb-daemon: Don't release all public IPs during shutdown sequence This further untangles public IP handling from the main daemon. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-04-06 06:34:37 +00:00
Martin Schwenke	f49446cb1e	ctdb-daemon: Load tunables from ctdb.tunables Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-04-06 06:34:37 +00:00
Martin Schwenke	a509ee059e	ctdb-daemon: New function ctdb_tunables_load() Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-04-06 06:34:37 +00:00
Martin Schwenke	0e74e03c9c	ctdb-recoverd: Consistently log start of election Elections should now be quite rare, so always log when one begins. BUG: https://bugzilla.samba.org/show_bug.cgi?id=14958 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-02-14 01:47:31 +00:00
Martin Schwenke	bf55a0117d	ctdb-recoverd: Always send unknown leader broadcast when starting election This is currently missed when the cluster lock is lost. BUG: https://bugzilla.samba.org/show_bug.cgi?id=14958 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-02-14 01:47:31 +00:00
Martin Schwenke	9b3fab052b	ctdb-recoverd: Consistently have caller set election-in-progress The problem here is that election-in-progress must be set to potentially avoid restarting the election broadcast timeout in main_loop(), so this is already done by leader_handler(). Have force_election() set election-in-progress for all election types and do not bother setting it in cluster_lock_election(). BUG: https://bugzilla.samba.org/show_bug.cgi?id=14958 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-02-14 01:47:31 +00:00
Martin Schwenke	188a902156	ctdb-recoverd: Always cancel election in progress Election-in-progress is set by unknown leader broadcast, so needs to be cleared in all cases when election completes. This was seen in a case where the leader node stalled, so didn't send leader broadcasts for some time. The node continued to hold the cluster lock, so another node could not become leader. However, after the node returned to normal it still did not send leader broadcasts because election-in-progress was never cleared. BUG: https://bugzilla.samba.org/show_bug.cgi?id=14958 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-02-14 01:47:31 +00:00
Martin Schwenke	34d2ca0ae6	ctdb-config: Add configuration option [cluster] leader timeout Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	1dfb266038	ctdb-config: [legacy] recmaster capability -> [cluster] leader capability Rename this configuration item and move it into the [cluster] configuration section. Update documentation to match. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	f5a39058f0	ctdb-config: [cluster] recovery lock -> [cluster] cluster lock Retain "recovery lock" and mark as deprecated for backward compatibility. Some documentation is still inconsistent. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	73555e8248	ctdb-recoverd: Use race for cluster lock as election when lock is enabled If the cluster is partitioned then nodes in one partition can not take the lock anyway, so election is pointless. It just introduces unnecessary corner cases. Instead just race for the lock. When a node notices a lack of leader and notifies other nodes of an election via an unknown leader broadcast, the cluster lock election is hooked into this broadcast. The test needs to be updated because losing the cluster lock can now result in a leadership change. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	a76374070d	ctdb-daemon: Drop implementation of {GET,SET}_RECMASTER controls Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	16efbca003	ctdb-daemon: Drop unused old client recmaster functions Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	c68267b2a6	ctdb-recoverd: Drop calls to ctdb_ctrl_setrecmaster() Nothing fetches this value anymore. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	58d7fcdf7c	ctdb-recoverd: Drop recovery master verification This doesn't make sense if leader broadcasts are used. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	958746f947	ctdb-recoverd: Simplify some stopped/banned checks to inactive checks Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	358c59f51a	ctdb-recoverd: No longer take cluster lock during recovery Confirm instead that it is already held. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	36ffaaa691	ctdb-recoverd: Add and use function cluster_lock_enabled() Now all references to ctdb->recovery_lock are encapsulated in the cluster lock code. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	5ee664ee17	ctdb-recoverd: Terminology change: recovery lock -> cluster lock No functional changes, just name changes for clarity. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	0f2250f4f9	ctdb-recoverd: Take cluster lock when election completes It is no longer just a recovery lock but is always held by the cluster leader. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	011e880002	ctdb-recoverd: Factor out function cluster_lock_take() Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	b029ca4d51	ctdb-recoverd: Drop leader validation The introduction of the leader broadcast timeout provides an alternative to the current leader validation. Using the leader broadcast may not be as fast but it is more correct. When the leader node is stopped or banned, the only way of triggering an election is currently to fetch the leader's node map to check whether the it is still active. This is because the leader will no longer push the node map to other nodes. However, having all nodes fetch the node map from an inactive leader may be unreliable. Most of the other cases are also handled more reliably by the leader broadcast timeout. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:32 +00:00
Martin Schwenke	7e53fab0a3	ctdb-recoverd: Drop special case for elected-before-connected This no longer occurs at startup due to the leader broadcast timeout. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:32 +00:00
Martin Schwenke	ef4b8c13c0	ctdb-recoverd: Handle leader broadcast timeout If no leader broadcasts have been received from the leader for more than 5s then trigger an election. Apart from being sane behaviour, this avoids elected-before-connected bugs at startup, where a node elects itself leader before it is connected to other nodes. When a node processes a leader broadcast timeout it sends an unknown leader broadcast to all nodes. That causes cancellation of the leader broadcast timeout across the cluster. This is particular important at startup, since nodes may be started in a staggered fashion. Without this cluster-wide cancellation, a node might notice the lack of leader, win an election and complete a recovery before other nodes notice the lack of leader. When the leader broadcast timeout finally occurs on the other nodes then they'll put the cluster back into an unnecessary recovery. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:32 +00:00
Martin Schwenke	5c7f6da0f0	ctdb-recoverd: Send leader broadcasts These are triggered on 1 second timer, but are only sent if the node is the current leader and there is no election underway. If this node can not be the leader then ensure it releases the recovery lock. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:32 +00:00
Martin Schwenke	789a75abfa	ctdb-recoverd: Process leader broadcasts Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:32 +00:00
Martin Schwenke	c2cfd9c21a	ctdb-recoverd: Add an explicit flag for election in progress An alternate election method will be added that doesn't use the election timeout, so this provides a common way for recognising when an election is in progress. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:32 +00:00

1 2 3 4 5 ...

2602 Commits