samba-mirror

mirror of https://github.com/samba-team/samba.git synced 2025-01-05 09:18:06 +03:00

Author	SHA1	Message	Date
Volker Lendecke	104fcaa89f	ctdb: Fix a use-after-free in run_proc If you happen to talloc_free(run_ctx) before all the tevent_req's hanging off it, you run into the following: ==495196== Invalid read of size 8 ==495196== at 0x10D757: run_proc_state_destructor (run_proc.c:413) ==495196== by 0x488F736: _tc_free_internal (talloc.c:1158) ==495196== by 0x488FBDD: _talloc_free_internal (talloc.c:1248) ==495196== by 0x4890F41: _talloc_free (talloc.c:1792) ==495196== by 0x48538B1: tevent_req_received (tevent_req.c:293) ==495196== by 0x4853429: tevent_req_destructor (tevent_req.c:129) ==495196== by 0x488F736: _tc_free_internal (talloc.c:1158) ==495196== by 0x4890AF6: _tc_free_children_internal (talloc.c:1669) ==495196== by 0x488F967: _tc_free_internal (talloc.c:1184) ==495196== by 0x488FBDD: _talloc_free_internal (talloc.c:1248) ==495196== by 0x4890F41: _talloc_free (talloc.c:1792) ==495196== by 0x10DE62: main (run_proc_test.c:86) ==495196== Address 0x55b77f8 is 152 bytes inside a block of size 160 free'd ==495196== at 0x48399AB: free (vg_replace_malloc.c:538) ==495196== by 0x488FB25: _tc_free_internal (talloc.c:1222) ==495196== by 0x488FBDD: _talloc_free_internal (talloc.c:1248) ==495196== by 0x4890F41: _talloc_free (talloc.c:1792) ==495196== by 0x10D315: run_proc_context_destructor (run_proc.c:329) ==495196== by 0x488F736: _tc_free_internal (talloc.c:1158) ==495196== by 0x488FBDD: _talloc_free_internal (talloc.c:1248) ==495196== by 0x4890F41: _talloc_free (talloc.c:1792) ==495196== by 0x10DE62: main (run_proc_test.c:86) ==495196== Block was alloc'd at ==495196== at 0x483877F: malloc (vg_replace_malloc.c:307) ==495196== by 0x488EAD9: __talloc_with_prefix (talloc.c:783) ==495196== by 0x488EC73: __talloc (talloc.c:825) ==495196== by 0x488F0FC: _talloc_named_const (talloc.c:982) ==495196== by 0x48925B1: _talloc_zero (talloc.c:2421) ==495196== by 0x10C8F2: proc_new (run_proc.c:61) ==495196== by 0x10D4C9: run_proc_send (run_proc.c:381) ==495196== by 0x10DDF6: main (run_proc_test.c:79) This happens because run_proc_context_destructor() directly does a talloc_free() on the struct proc_context's and not the enclosing tevent_req's. run_proc_kill() makes sure that we don't follow proc->req, but it forgets the "state->proc", which is free()'ed, but later dereferenced in run_proc_state_destructor(). This is an attempt at a quick fix, I believe we should convert run_proc_context->plist into an array of tevent_req's, so that we can properly TALLOC_FREE() according to the "natural" hierarchy and not just pull an arbitrary thread out of that heap. BUG: https://bugzilla.samba.org/show_bug.cgi?id=15269 Signed-off-by: Volker Lendecke <vl@samba.org> Reviewed-by: Martin Schwenke <martin@meltin.net> Autobuild-User(master): Volker Lendecke <vl@samba.org> Autobuild-Date(master): Thu Oct 6 15:10:20 UTC 2022 on sn-devel-184 (cherry picked from commit `688be0177b`)	2023-01-03 18:21:10 +00:00
Martin Schwenke	959d37e72c	ctdb-daemon: Use DEBUG() macro for child logging Directly using dbgtext() with file logging results in a log entry with no header, which is wrong. This is a regression, introduced in commit `10d15c9e5d`. Prior to this, CTDB's callback for file logging would always add a header. Use DEBUG() instead of dbgtext(). Note that DEBUG() effectively compares the passed script_log_level with DEBUGLEVEL, so an explicit check is no longer necessary. BUG: https://bugzilla.samba.org/show_bug.cgi?id=15090 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Volker Lendecke <vl@samba.org> Autobuild-User(master): Volker Lendecke <vl@samba.org> Autobuild-Date(master): Thu Jun 16 13:33:10 UTC 2022 on sn-devel-184 (cherry picked from commit `e752f841e6`)	2022-06-18 08:47:17 +00:00
Martin Schwenke	c4e176e46c	ctdb-daemon: Drop unused prefix, logfn, logfn_private These aren't set anywhere in the code. Drop the log argument because it is also no longer used. BUG: https://bugzilla.samba.org/show_bug.cgi?id=15090 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Volker Lendecke <vl@samba.org> (cherry picked from commit `88f35cf862`)	2022-06-18 08:47:17 +00:00
Martin Schwenke	7970676503	ctdb-common: Tell file logging not to redirect stderr This allows ctdb_set_child_logging() to work. BUG: https://bugzilla.samba.org/show_bug.cgi?id=15090 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Volker Lendecke <vl@samba.org> (cherry picked from commit `1596a3e84b`)	2022-06-18 08:47:17 +00:00
Martin Schwenke	79b42f0f2b	ctdb-tests: Add a test for stalled node triggering election A stalled node probably continues to hold the cluster lock, so confirm elections work in this case. BUG: https://bugzilla.samba.org/show_bug.cgi?id=14958 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> Autobuild-User(master): Amitay Isaacs <amitay@samba.org> Autobuild-Date(master): Mon Feb 14 02:46:01 UTC 2022 on sn-devel-184 (cherry picked from commit `331c435ce5`) Autobuild-User(v4-16-test): Jule Anger <janger@samba.org> Autobuild-Date(v4-16-test): Tue Feb 15 09:55:38 UTC 2022 on sn-devel-184	2022-02-15 09:55:38 +00:00
Martin Schwenke	f3047e90a8	ctdb-tests: Factor out functions to detect when generation changes BUG: https://bugzilla.samba.org/show_bug.cgi?id=14958 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> (cherry picked from commit `265e44abc4`)	2022-02-15 09:01:14 +00:00
Martin Schwenke	d0133dd3a5	ctdb-recoverd: Consistently log start of election Elections should now be quite rare, so always log when one begins. BUG: https://bugzilla.samba.org/show_bug.cgi?id=14958 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> (cherry picked from commit `0e74e03c9c`)	2022-02-15 09:01:14 +00:00
Martin Schwenke	ddda97dc14	ctdb-recoverd: Always send unknown leader broadcast when starting election This is currently missed when the cluster lock is lost. BUG: https://bugzilla.samba.org/show_bug.cgi?id=14958 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> (cherry picked from commit `bf55a0117d`)	2022-02-15 09:01:14 +00:00
Martin Schwenke	758e953ee0	ctdb-recoverd: Consistently have caller set election-in-progress The problem here is that election-in-progress must be set to potentially avoid restarting the election broadcast timeout in main_loop(), so this is already done by leader_handler(). Have force_election() set election-in-progress for all election types and do not bother setting it in cluster_lock_election(). BUG: https://bugzilla.samba.org/show_bug.cgi?id=14958 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> (cherry picked from commit `9b3fab052b`)	2022-02-15 09:01:14 +00:00
Martin Schwenke	07540a8cf4	ctdb-recoverd: Always cancel election in progress Election-in-progress is set by unknown leader broadcast, so needs to be cleared in all cases when election completes. This was seen in a case where the leader node stalled, so didn't send leader broadcasts for some time. The node continued to hold the cluster lock, so another node could not become leader. However, after the node returned to normal it still did not send leader broadcasts because election-in-progress was never cleared. BUG: https://bugzilla.samba.org/show_bug.cgi?id=14958 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> (cherry picked from commit `188a902156`)	2022-02-15 09:01:14 +00:00
Martin Schwenke	f7de2132bb	ctdb-doc: Remove documentation for recovery process This is many years out of date and recent changes make it worse. It is unlikely that anyone has the time to fix this in the near future, so remove it because it is misleading. Database recovery steps are well documented in comments in the recovery helper. Cluster monitoring documentation can be re-added when things stop changing. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	a940ad9370	ctdb-doc: Update example configuration migration script Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	01313ea243	ctdb-tests: Improve test coverage for leader role yield and elections Rename test, clean up node selection. Duplicate for banning and removing leader capability cases. Repeat all 3 tests without cluster lock. All of the standard election triggers are now tested, with and without cluster lock. Due to test cluster configuration limitations, the tests without cluster lock are skipped on a real cluster. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	5d31778149	ctdb-tests: Support commenting out local daemons configuration options Can be used to disable default options, such as cluster lock. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	34d2ca0ae6	ctdb-config: Add configuration option [cluster] leader timeout Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	1dfb266038	ctdb-config: [legacy] recmaster capability -> [cluster] leader capability Rename this configuration item and move it into the [cluster] configuration section. Update documentation to match. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	f5a39058f0	ctdb-config: [cluster] recovery lock -> [cluster] cluster lock Retain "recovery lock" and mark as deprecated for backward compatibility. Some documentation is still inconsistent. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	d752a92e11	ctdb-doc: Update documentation for leader and cluster lock Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	73555e8248	ctdb-recoverd: Use race for cluster lock as election when lock is enabled If the cluster is partitioned then nodes in one partition can not take the lock anyway, so election is pointless. It just introduces unnecessary corner cases. Instead just race for the lock. When a node notices a lack of leader and notifies other nodes of an election via an unknown leader broadcast, the cluster lock election is hooked into this broadcast. The test needs to be updated because losing the cluster lock can now result in a leadership change. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	938d64c8ff	ctdb-protocol: Mark {GET,SET}_RECMASTER controls obsolete Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	03ae158cff	ctdb-protocol: Drop marshalling for {GET,SET}_RECMASTER controls Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	a76374070d	ctdb-daemon: Drop implementation of {GET,SET}_RECMASTER controls Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	193b624d26	ctdb-protocol: Drop protocol client functions for recmaster controls Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	cda673ff6d	ctdb-client: Drop unused recmaster functions Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	16efbca003	ctdb-daemon: Drop unused old client recmaster functions Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	c68267b2a6	ctdb-recoverd: Drop calls to ctdb_ctrl_setrecmaster() Nothing fetches this value anymore. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	58d7fcdf7c	ctdb-recoverd: Drop recovery master verification This doesn't make sense if leader broadcasts are used. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	f02e097485	ctdb-tools: recovery master -> leader The following command names are changed: recmaster -> leader setrecmasterrole -> setleaderrole Command output changed for the following commands: status getcapabilities Documentation and tests are updated to reflect these changes. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	e60581d5b5	ctdb-tools: Use leader broadcast in get_leader() Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	92fb68e9b8	ctdb-tools: Factor out get_leader() This seems pointless but it localises a subsequent change and also starts a terminology change in the tool code. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	17ba15ccd8	ctdb-tools: Handle leader broadcasts in ctdb tool Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	ec90f36cc6	ctdb-tools: Print "UNKNOWN" when leader PNN is unknown Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	01a8d1a4a4	ctdb-client: Factor out function ctdb_client_wait_func_timeout() Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	403db5b528	ctdb-tests: Factor out getting leader and waiting for leader change Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	4786982cc8	ctdb-tests: Add leader broadcasts to fake_ctdbd Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Amitay Isaacs	756dfdfed9	ctdb-tests: Implement srvid_handler for dispatching messages Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2022-01-17 10:21:33 +00:00
Martin Schwenke	958746f947	ctdb-recoverd: Simplify some stopped/banned checks to inactive checks Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	358c59f51a	ctdb-recoverd: No longer take cluster lock during recovery Confirm instead that it is already held. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	36ffaaa691	ctdb-recoverd: Add and use function cluster_lock_enabled() Now all references to ctdb->recovery_lock are encapsulated in the cluster lock code. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	5ee664ee17	ctdb-recoverd: Terminology change: recovery lock -> cluster lock No functional changes, just name changes for clarity. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	0f2250f4f9	ctdb-recoverd: Take cluster lock when election completes It is no longer just a recovery lock but is always held by the cluster leader. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	011e880002	ctdb-recoverd: Factor out function cluster_lock_take() Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	037abf8620	ctdb-tests: Avoid a race See the comment in the code for details. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:32 +00:00
Martin Schwenke	ef7e3265f7	ctdb-tests: Setup cluster with expected arguments ctdb_test_init() doesn't actually pass arguments to local_daemons.sh. This needs to be done using ctdb_nodes_start_custom(). Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:32 +00:00
Martin Schwenke	b029ca4d51	ctdb-recoverd: Drop leader validation The introduction of the leader broadcast timeout provides an alternative to the current leader validation. Using the leader broadcast may not be as fast but it is more correct. When the leader node is stopped or banned, the only way of triggering an election is currently to fetch the leader's node map to check whether it is still active. This is because the leader will no longer push the node map to other nodes. However, having all nodes fetch the node map from an inactive leader may be unreliable. Most of the other cases are also handled more reliably by the leader broadcast timeout. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:32 +00:00
Martin Schwenke	7e53fab0a3	ctdb-recoverd: Drop special case for elected-before-connected This no longer occurs at startup due to the leader broadcast timeout. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:32 +00:00
Martin Schwenke	ef4b8c13c0	ctdb-recoverd: Handle leader broadcast timeout If no leader broadcasts have been received from the leader for more than 5s then trigger an election. Apart from being sane behaviour, this avoids elected-before-connected bugs at startup, where a node elects itself leader before it is connected to other nodes. When a node processes a leader broadcast timeout it sends an unknown leader broadcast to all nodes. That causes cancellation of the leader broadcast timeout across the cluster. This is particularly important at startup, since nodes may be started in a staggered fashion. Without this cluster-wide cancellation, a node might notice the lack of leader, win an election and complete a recovery before other nodes notice the lack of leader. When the leader broadcast timeout finally occurs on the other nodes then they'll put the cluster back into an unnecessary recovery. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:32 +00:00
Martin Schwenke	5c7f6da0f0	ctdb-recoverd: Send leader broadcasts These are triggered on a 1 second timer, but are only sent if the node is the current leader and there is no election underway. If this node can not be the leader then ensure it releases the recovery lock. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:32 +00:00
Martin Schwenke	789a75abfa	ctdb-recoverd: Process leader broadcasts Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:32 +00:00
Martin Schwenke	3d3767a259	ctdb-protocol: Add CTDB_SRVID_LEADER CTDB_SRVID_LEADER will be regularly broadcast to all connected nodes by the leader. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:32 +00:00
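
The leader broadcast commits above (`3d3767a259`, `5c7f6da0f0`, `ef4b8c13c0`, `07540a8cf4`) describe a small protocol: the leader broadcasts on a 1 second timer, a node that hears nothing for more than 5s sends an unknown leader broadcast, and that broadcast cancels the timeout on every other node until the election completes. The following is a rough, illustrative model of that behaviour only. It is not CTDB code: the `Node` class, its method names, and the use of `None` for "unknown leader" are all hypothetical, and the real logic lives in C in the recovery daemon.

```python
from dataclasses import dataclass
from typing import Optional

# Values taken from the commit messages; names are hypothetical.
LEADER_BROADCAST_INTERVAL = 1.0  # leader broadcasts are triggered on a 1 second timer
LEADER_BROADCAST_TIMEOUT = 5.0   # no broadcast for more than 5s triggers an election


@dataclass
class Node:
    """Toy model of one node's view of the leader (not the real CTDB state)."""
    pnn: int
    leader: Optional[int] = None          # None models "leader unknown"
    election_in_progress: bool = False
    last_leader_broadcast: float = 0.0

    def should_send_leader_broadcast(self) -> bool:
        # Only the current leader sends, and only when no election is underway.
        return self.leader == self.pnn and not self.election_in_progress

    def on_leader_broadcast(self, leader: Optional[int], now: float) -> None:
        if leader is None:
            # Unknown leader broadcast: an election is starting somewhere,
            # so mark it in progress, which suppresses the local timeout.
            self.leader = None
            self.election_in_progress = True
        else:
            # Regular broadcast from the leader: record it and reset the clock.
            self.leader = leader
            self.last_leader_broadcast = now

    def timed_out(self, now: float) -> bool:
        # The timeout is cancelled while an election is in progress.
        if self.election_in_progress:
            return False
        return now - self.last_leader_broadcast > LEADER_BROADCAST_TIMEOUT

    def on_election_complete(self, winner: int, now: float) -> None:
        # Election-in-progress must be cleared in all cases when the election
        # completes, otherwise the leader would never resume broadcasting.
        self.leader = winner
        self.election_in_progress = False
        self.last_leader_broadcast = now


if __name__ == "__main__":
    nodes = [Node(pnn=i) for i in range(3)]
    now = 6.0
    if nodes[0].timed_out(now):
        # Node 0 notices the lack of leader and tells everyone.
        for n in nodes:
            n.on_leader_broadcast(None, now)
    for n in nodes:
        n.on_election_complete(winner=0, now=7.0)
    print([n.should_send_leader_broadcast() for n in nodes])
```

In this model, forgetting to clear `election_in_progress` in `on_election_complete()` reproduces the stall described in commit `07540a8cf4`: the elected node would keep returning `False` from `should_send_leader_broadcast()`.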

8883 Commits