IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
If you happen to talloc_free(run_ctx) before all the tevent_req's
hanging off it, you run into the following:
==495196== Invalid read of size 8
==495196== at 0x10D757: run_proc_state_destructor (run_proc.c:413)
==495196== by 0x488F736: _tc_free_internal (talloc.c:1158)
==495196== by 0x488FBDD: _talloc_free_internal (talloc.c:1248)
==495196== by 0x4890F41: _talloc_free (talloc.c:1792)
==495196== by 0x48538B1: tevent_req_received (tevent_req.c:293)
==495196== by 0x4853429: tevent_req_destructor (tevent_req.c:129)
==495196== by 0x488F736: _tc_free_internal (talloc.c:1158)
==495196== by 0x4890AF6: _tc_free_children_internal (talloc.c:1669)
==495196== by 0x488F967: _tc_free_internal (talloc.c:1184)
==495196== by 0x488FBDD: _talloc_free_internal (talloc.c:1248)
==495196== by 0x4890F41: _talloc_free (talloc.c:1792)
==495196== by 0x10DE62: main (run_proc_test.c:86)
==495196== Address 0x55b77f8 is 152 bytes inside a block of size 160 free'd
==495196== at 0x48399AB: free (vg_replace_malloc.c:538)
==495196== by 0x488FB25: _tc_free_internal (talloc.c:1222)
==495196== by 0x488FBDD: _talloc_free_internal (talloc.c:1248)
==495196== by 0x4890F41: _talloc_free (talloc.c:1792)
==495196== by 0x10D315: run_proc_context_destructor (run_proc.c:329)
==495196== by 0x488F736: _tc_free_internal (talloc.c:1158)
==495196== by 0x488FBDD: _talloc_free_internal (talloc.c:1248)
==495196== by 0x4890F41: _talloc_free (talloc.c:1792)
==495196== by 0x10DE62: main (run_proc_test.c:86)
==495196== Block was alloc'd at
==495196== at 0x483877F: malloc (vg_replace_malloc.c:307)
==495196== by 0x488EAD9: __talloc_with_prefix (talloc.c:783)
==495196== by 0x488EC73: __talloc (talloc.c:825)
==495196== by 0x488F0FC: _talloc_named_const (talloc.c:982)
==495196== by 0x48925B1: _talloc_zero (talloc.c:2421)
==495196== by 0x10C8F2: proc_new (run_proc.c:61)
==495196== by 0x10D4C9: run_proc_send (run_proc.c:381)
==495196== by 0x10DDF6: main (run_proc_test.c:79)
This happens because run_proc_context_destructor() directly does a
talloc_free() on the struct proc_context's and not the enclosing
tevent_req's. run_proc_kill() makes sure that we don't follow
proc->req, but it forgets the "state->proc", which is free()'ed, but
later dereferenced in run_proc_state_destructor().
This is an attempt at a quick fix, I believe we should convert
run_proc_context->plist into an array of tevent_req's, so that we can
properly TALLOC_FREE() according to the "natural" hierarchy and not
just pull an arbitrary thread out of that heap.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=15269
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Volker Lendecke <vl@samba.org>
Autobuild-Date(master): Thu Oct 6 15:10:20 UTC 2022 on sn-devel-184
(cherry picked from commit 688be0177b)
Directly using dbgtext() with file logging results in a log entry with
no header, which is wrong. This is a regression, introduced in commit
10d15c9e5d. Prior to this, CTDB's
callback for file logging would always add a header.
Use DEBUG() instead dbgtext(). Note that DEBUG() effectively compares
the passed script_log_level with DEBUGLEVEL, so an explicit check is
no longer necessary.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=15090
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Volker Lendecke <vl@samba.org>
Autobuild-User(master): Volker Lendecke <vl@samba.org>
Autobuild-Date(master): Thu Jun 16 13:33:10 UTC 2022 on sn-devel-184
(cherry picked from commit e752f841e6)
These aren't set anywhere in the code.
Drop the log argument because it is also no longer used.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=15090
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Volker Lendecke <vl@samba.org>
(cherry picked from commit 88f35cf862)
This allows ctdb_set_child_logging() to work.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=15090
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Volker Lendecke <vl@samba.org>
(cherry picked from commit 1596a3e84b)
A stalled node probably continues to hold the cluster lock, so confirm
elections work in this case.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14958
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Mon Feb 14 02:46:01 UTC 2022 on sn-devel-184
(cherry picked from commit 331c435ce5)
Autobuild-User(v4-16-test): Jule Anger <janger@samba.org>
Autobuild-Date(v4-16-test): Tue Feb 15 09:55:38 UTC 2022 on sn-devel-184
Elections should now be quite rare, so always log when one begins.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14958
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(cherry picked from commit 0e74e03c9c)
This is currently missed when the cluster lock is lost.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14958
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(cherry picked from commit bf55a0117d)
The problem here is that election-in-progress must be set to
potentially avoid restarting the election broadcast timeout in
main_loop(), so this is already done by leader_handler().
Have force_election() set election-in-progress for all election types
and do not bother setting it in cluster_lock_election().
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14958
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(cherry picked from commit 9b3fab052b)
Election-in-progress is set by unknown leader broadcast, so needs to
be cleared in all cases when election completes.
This was seen in a case where the leader node stalled, so didn't send
leader broadcasts for some time. The node continued to hold the
cluster lock, so another node could not become leader. However, after
the node returned to normal it still did not send leader broadcasts
because election-in-progress was never cleared.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14958
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(cherry picked from commit 188a902156)
This is many years out of date and recent changes make it worse. It
is unlikely that anyone has the time to fix this in the near future,
so remove it because it is misleading.
Database recovery steps are well documented in comments in the
recovery helper. Cluster monitoring documentation can be re-added
when things stop changing.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Rename test, clean up node selection. Duplicate for for banning and
removing leader capability cases. Repeat all 3 tests without cluster
lock.
All of the standard election triggers are now tested, with and without
cluster lock. Due to test cluster configuration limitations, the
tests without cluster lock are skipped on a real cluster.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Can be used to disable default options, such as cluster lock.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Rename this configuration item and move it into the [cluster]
configuration section.
Update documentation to match.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Retain "recovery lock" and mark as deprecated for backward
compatibility.
Some documentation is still inconsistent.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
If the cluster is partitioned then nodes in one partition can not take
the lock anyway, so election is pointless. It just introduces
unnecessary corner cases.
Instead just race for the lock.
When a node notices a lack of leader and notifies other nodes of an
election via an unknown leader broadcast, the cluster lock election is
hooked into this broadcast.
The test needs to be updated because losing the cluster lock can now
result in a leadership change.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This doesn't make sense if leader broadcasts are used.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The following command names are changed:
recmaster -> leader
setrecmasterrole -> setleaderrole
Command output changed for the following commands:
status
getcapabilities
Documentation and tests are updated to reflect these changes.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This seems pointless but it localises a subsequent change and also
starts a terminology change in the tool code.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Now all references to ctdb->recovery_lock are encapsulated in the
cluster lock code.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
It is no longer just a recovery lock but is always held by the cluster
leader.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
ctdb_test_init() doesn't actually pass arguments to local_daemons.sh.
This needs to be done using ctdb_nodes_start_custom().
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The introduction of the leader broadcast timeout provides an
alternative to the current leader validation. Using the leader
broadcast may not be as fast but it is more correct.
When the leader node is stopped or banned, the only way of triggering
an election is currently to fetch the leader's node map to check
whether the it is still active. This is because the leader will no
longer push the node map to other nodes. However, having all nodes
fetch the node map from an inactive leader may be unreliable.
Most of the other cases are also handled more reliably by the leader
broadcast timeout.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This no longer occurs at startup due to the leader broadcast timeout.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
If no leader broadcasts have been received from the leader for more
than 5s then trigger an election.
Apart from being sane behaviour, this avoids elected-before-connected
bugs at startup, where a node elects itself leader before it is
connected to other nodes.
When a node processes a leader broadcast timeout it sends an unknown
leader broadcast to all nodes. That causes cancellation of the leader
broadcast timeout across the cluster. This is particular important at
startup, since nodes may be started in a staggered fashion. Without
this cluster-wide cancellation, a node might notice the lack of
leader, win an election and complete a recovery before other nodes
notice the lack of leader. When the leader broadcast timeout finally
occurs on the other nodes then they'll put the cluster back into an
unnecessary recovery.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
These are triggered on 1 second timer, but are only sent if the node
is the current leader and there is no election underway.
If this node can not be the leader then ensure it releases the
recovery lock.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
CTDB_SRVID_LEADER will be regularly broadcast to all connected nodes
by the leader.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>