1
0
mirror of https://github.com/samba-team/samba.git synced 2024-12-22 13:34:15 +03:00
Commit Graph

2668 Commits

Author SHA1 Message Date
Martin Schwenke
a76374070d ctdb-daemon: Drop implementation of {GET,SET}_RECMASTER controls
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
16efbca003 ctdb-daemon: Drop unused old client recmaster functions
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
c68267b2a6 ctdb-recoverd: Drop calls to ctdb_ctrl_setrecmaster()
Nothing fetches this value anymore.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
58d7fcdf7c ctdb-recoverd: Drop recovery master verification
This doesn't make sense if leader broadcasts are used.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
958746f947 ctdb-recoverd: Simplify some stopped/banned checks to inactive checks
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
358c59f51a ctdb-recoverd: No longer take cluster lock during recovery
Confirm instead that it is already held.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
36ffaaa691 ctdb-recoverd: Add and use function cluster_lock_enabled()
Now all references to ctdb->recovery_lock are encapsulated in the
cluster lock code.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
5ee664ee17 ctdb-recoverd: Terminology change: recovery lock -> cluster lock
No functional changes, just name changes for clarity.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
0f2250f4f9 ctdb-recoverd: Take cluster lock when election completes
It is no longer just a recovery lock but is always held by the cluster
leader.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
011e880002 ctdb-recoverd: Factor out function cluster_lock_take()
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
b029ca4d51 ctdb-recoverd: Drop leader validation
The introduction of the leader broadcast timeout provides an
alternative to the current leader validation.  Using the leader
broadcast may not be as fast but it is more correct.

When the leader node is stopped or banned, the only way of triggering
an election is currently to fetch the leader's node map to check
whether the it is still active.  This is because the leader will no
longer push the node map to other nodes.  However, having all nodes
fetch the node map from an inactive leader may be unreliable.

Most of the other cases are also handled more reliably by the leader
broadcast timeout.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
7e53fab0a3 ctdb-recoverd: Drop special case for elected-before-connected
This no longer occurs at startup due to the leader broadcast timeout.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
ef4b8c13c0 ctdb-recoverd: Handle leader broadcast timeout
If no leader broadcasts have been received from the leader for more
than 5s then trigger an election.

Apart from being sane behaviour, this avoids elected-before-connected
bugs at startup, where a node elects itself leader before it is
connected to other nodes.

When a node processes a leader broadcast timeout it sends an unknown
leader broadcast to all nodes.  That causes cancellation of the leader
broadcast timeout across the cluster.  This is particular important at
startup, since nodes may be started in a staggered fashion.  Without
this cluster-wide cancellation, a node might notice the lack of
leader, win an election and complete a recovery before other nodes
notice the lack of leader.  When the leader broadcast timeout finally
occurs on the other nodes then they'll put the cluster back into an
unnecessary recovery.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
5c7f6da0f0 ctdb-recoverd: Send leader broadcasts
These are triggered on 1 second timer, but are only sent if the node
is the current leader and there is no election underway.

If this node can not be the leader then ensure it releases the
recovery lock.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
789a75abfa ctdb-recoverd: Process leader broadcasts
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
c2cfd9c21a ctdb-recoverd: Add an explicit flag for election in progress
An alternate election method will be added that doesn't use the
election timeout, so this provides a common way for recognising when
an election is in progress.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
ac5a3ca063 ctdb-recoverd: Only start election if node can be leader
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
7baadfe27e ctdb-recoverd: Add and use function this_node_can_be_leader()
This makes the code self-documenting.

In ctdb_election_data() there is a slight behaviour change.  An
inactive node will now try to lose an election.  This case should not happen
because:

* An inactive node can't win an election round and then send a reply.

* Any inactive node should never start an election.  There are
  currently places where this happens and they will be fixed later.

There is an instance where this could be used in
validate_recovery_master() but this involves a more serious logic
change.  Overhaul this function later.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
94b546c268 ctdb-recoverd: Logging/comments: recovery master -> leader
There are some remaining instances in this file but they will be
removed in subsequent commits.

Modernise debug macros as appropriate.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
dd79e9bd14 ctdb-recoverd: Rename recmaster field to leader
Recovery master is being renamed to leader.  This follows clustering
best practice (e.g. RAFT).

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
2ee6763c7d ctdb-recoverd: Use rec->pnn everywhere
This is currently referenced in a number of inconsistent
ways, including:

* pnn
* rec->ctdb->pnn
* ctdb->pnn
* ctdb_get_pnn(ctdb)
* ctdb_get_pnn(rec->ctdb)

The first of these always requires some thought about the context - is
this the node PNN or some other PNN (e.g. argument to function)?

rec->pnn is now always used when referring to the recovery daemon's
PNN.

Doing this also reduces reliance on struct ctdb_context internals.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
4af3b10a37 ctdb-recoverd: Change argument to srvid_disable_and_reply()
Reduce dependency on struct ctdb_context internals, enable a
subsequent change.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
b7c138ca99 ctdb-recoverd: Simplify arguments to ctdb_ban_node()
ban_time argument is always ctdb->tunable.recovery_ban_period, so
build this in and make the calling code more readable.

ctdb_ban_node() already logs how long a node is banned for, so don't
repeatedly log this.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
a5e0ddac62 ctdb-recoverd: Simplify arguments to verify_local_ip_allocation()
All other arguments are available via rec, so simplify.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
67b5191640 ctdb-recoverd: Simplify arguments to do_recovery()
pnn and nodemap are both available via the rec context, so simplify.
vnnmap is unused.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
57882beb16 ctdb-recoverd: Simplify arguments to some election functions
The pnn and nodemap arguments to force_election() and
send_election_request() are always effectively rec->pnn and
rec->nodemap, so simplify.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
9dbe7cc85e ctdb-recoverd: Add PNN to recovery daemon context
This is currently referenced in a number of inconsistent
ways, including:

* pnn
* rec->ctdb->pnn
* ctdb->pnn
* ctdb_get_pnn(ctdb)
* ctdb_get_pnn(rec->ctdb)

The first of these always requires some thought about the context - is
this the node PNN or some other PNN (e.g. argument to function)?

The intention is to always use rec->pnn when referring to the recovery
daemon's PNN.

Doing this also reduces reliance on struct ctdb_context internals.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
ff0140e470 ctdb-recoverd: Use this_node_is_leader() in an extra context
This is arguably clearer.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
c8721d01c6 ctdb-recoverd: Factor out and use function this_node_is_leader()
Make the code self-documenting.

This preempts an upcoming change to terminology but doing it now saves
a lot of churn.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
57a32cebdd ctdb-recoverd: Pass SIGHUP to running helper
The recovery and takeover helpers can run for a while and generate
non-trivial logs, so have them reopen their logs to support log
rotation.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Mon Jan 17 04:36:30 UTC 2022 on sn-devel-184
2022-01-17 04:36:30 +00:00
Martin Schwenke
8e949a6082 ctdb-recoverd: Record helper PID in recovery daemon context
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 03:43:30 +00:00
Martin Schwenke
97a45f6f25 ctdb-recoverd: Add log reopening on SIGHUP to helpers
Recovery and takeover helpers can run for a while and generate
non-trivial logs.  They should support log reopening.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 03:43:30 +00:00
Martin Schwenke
51f0380e83 ctdb-daemon: Enable log reopening for event daemon
Add and call hook to pass on SIGHUP to eventd.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 03:43:30 +00:00
Martin Schwenke
c554a325fe ctdb-daemon: Enable log reopening for recovery daemon
Pass on a SIGHUP to the recovery daemon, which will then reopen its
logs.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 03:43:30 +00:00
Martin Schwenke
4acfefed61 ctdb-recoverd: Add basic log reopening
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 03:43:30 +00:00
Martin Schwenke
4ed37de82b ctdb-daemon: Add basic top-level log reopening
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 03:43:30 +00:00
Martin Schwenke
9e7d2d9794 ctdb-daemon: Don't mark a node as unhealthy when connecting to it
Remote nodes are already initialised as UNHEALTHY when the node list
is initialised at startup (ctdb_load_nodes_file() calls
convert_node_map_to_list()) and when disconnected (ctdb_node_dead()).
So, drop this code.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Thu Sep  9 02:38:34 UTC 2021 on sn-devel-184
2021-09-09 02:38:34 +00:00
Martin Schwenke
7f697b1938 ctdb-daemon: Ignore flag changes for disconnected nodes
If this node is not connected to a node then we shouldn't know
anything about it.  The state will be pushed later by the recovery
master.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
2021-09-09 01:46:49 +00:00
Martin Schwenke
ae10a8a4b7 ctdb-daemon: Simplify ctdb_control_modflags()
Now that there are separate disable/enable controls used by the ctdb
tool this control can ignore any flag updates for the current nodes.
These only come from the recovery master, which depends on being able
to fetch flags for all nodes.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-09-09 01:46:49 +00:00
Martin Schwenke
916c5ee131 ctdb-recoverd: Mark CTDB_SRVID_SET_NODE_FLAGS obsolete
CTDB_SRVID_SET_NODE_FLAGS is no longer sent so drop monitor_handler()
and replace with srvid_not_implemented().  Mark the SRVID obsolete in
its comment.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-09-09 01:46:49 +00:00
Martin Schwenke
e75256767f ctdb-daemon: Don't bother sending CTDB_SRVID_SET_NODE_FLAGS
The code that handles this message is
ctdb_recoverd.c:monitor_handler().  Although it appears to do
something potentially useful, it only logs the flags changes.  All
changes made are to local structures - there are no actual
side-effects.

It used to trigger a takeover run when the DISABLED flag changed.
This was dropped back in commit
662f06de9f.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-09-09 01:46:49 +00:00
Martin Schwenke
0132bd5a22 ctdb-daemon: Modernise remaining debug macro in this function
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-09-09 01:46:49 +00:00
Martin Schwenke
b6d25d079e ctdb-daemon: Update logging for flag changes
When flags change, promote the message to NOTICE level and switch the
message to the style that is currently generated by
ctdb-recoverd.c:monitor_handler().  This will allow monitor_handler()
to go away in future.

Drop logging when flags do not change.  The recovery master now logs
when it pushes flags for a node, so the lack of a corresponding
"changed flags" message here indicates that no update was required.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-09-09 01:46:49 +00:00
Martin Schwenke
eec44e2862 ctdb-daemon: Correct the condition for logging unchanged flags
Don't trust the old flags from the recovery master.

Surrounding code will change in future comments, including the use of
old-style debug macros, so just make this change clear.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-09-09 01:46:49 +00:00
Martin Schwenke
15a6489c28 ctdb_daemon: Implement controls DISABLE_NODE/ENABLE_NODE
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-09-09 01:46:49 +00:00
Martin Schwenke
60c1ef1465 ctdb-daemon: Start as disabled means PERMANENTLY_DISABLED
DISABLED is UNHEALTHY | PERMANENTLY_DISABLED, which is not what is
intended here.  Luckily, it doesn't do any harm because nodes are
marked unhealthy at startup anyway.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-09-09 01:46:49 +00:00
Martin Schwenke
1ac7bc7532 ctdb-daemon: Factor out a function to get node structure from PNN
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-09-09 01:46:49 +00:00
Martin Schwenke
e0a7b5a9e8 ctdb-daemon: Add a helper variable
Simplifies a subsequent change.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-09-09 01:46:49 +00:00
Martin Schwenke
8305f6a7f1 ctdb-recoverd: Push flags for a node if any remote node disagrees
This will usually happen if flags on the node in question change, so
keeping the code simple and pushing to all nodes won't hurt.  When all
nodes come up there might be differences in connected nodes, causing
such "fix ups".  Receiving nodes will ignore no-op pushes.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-09-09 01:46:49 +00:00
Martin Schwenke
620d078714 ctdb-recoverd: Update the local node map before pushing out flags
The resulting code structure looks a little weird.  However, there is
another condition that requires the flags to be pushed that will be
inserted before the continue statement in a subsequent commit..

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-09-09 01:46:49 +00:00
Martin Schwenke
82a075d4d7 ctdb-recoverd: Add a helper variable
Improves readability and simplifies subsequent changes.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-09-09 01:46:49 +00:00
Martin Schwenke
e40d452722 ctdb-daemon: Close server socket when switching to client
The socket is set close-on-exec but that doesn't help for processes
that do not exec().  This should be done for all child processes.

This has been seen in testing where "ctdb shutdown" waits for the
socket to close before succeeding.  It appears that lingering
vacuuming processes have not closed the socket when becoming clients
so they cause "ctdb shutdown" to hang even though the main daemon
process has exited.  The cause of the lingering vacuuming processes
has been previously examined but still isn't understood.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-06-25 09:16:31 +00:00
Amitay Isaacs
d07875330a ctdb-locking: Pass additional arguments to debug locks script
1. PID of lock helper waiting for lock
2. Scope of lock: "record" or "db"
3. Path to database that lock helper is trying to lock
4. Whether the database uses mutexes: "mutex" or "fcntl"

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2021-05-28 06:46:29 +00:00
Volker Lendecke
06b740e2fb ctdb: Fix a typo
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
2021-03-09 22:36:28 +00:00
Martin Schwenke
65ab8cb014 ctdb-daemon: Do not attempt to chown Unix domain socket in test mode
If run with UID wrapper and UID_WRAPPER_ROOT=1 then securing the
socket will fail.

Test mode means that local daemons are in use, so securing the socket
is not important.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
2020-11-02 08:58:31 +00:00
Martin Schwenke
78c3b5b6a8 ctdb-daemon: Clean up call to bind socket
Variable res is only used once and ret is re-used many times.  Drop
res, use ret, which doesn't need to be initialised.  Modernise debug
macro.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
2020-11-02 08:58:31 +00:00
Martin Schwenke
9404f8631e ctdb-daemon: Clean up socket bind/secure/listen
Obey the coding style, modernise debug macros, clean up whitespace.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
2020-11-02 08:58:31 +00:00
Martin Schwenke
4b01f54041 ctdb-recoverd: Drop unnecessary and broken code
update_flags() has already updated the recovery master's canonical
node map, based on the flags from each remote node, and pushed out
these flags to all nodes.

If i == j then the node map has already been updated from this remote
node's flags, so simply drop this case.

Although update_flags() has updated flags for all nodes, it did not
update each node map in remote_nodemaps[] to reflect this.  This means
that remote_nodemaps[] may contain inconsistent flags for some nodes
so it should not be used to check consistency when i != j.

Further, a meaningful difference in flags can only really occur if
update_flags() failed.  In that case this code is never reached.

These observations combine to imply that this whole loop should be
dropped.

This leaves potential sub-second inconsistencies due to out-of-band
healthy/unhealthy flag changes pushed via CTDB_SRVID_PUSH_NODE_FLAGS.
These updates could be dropped (takeover run asks each node for
available IPs rather than making centralised decisions based on node
flags) but for now they will be fixed in the next iteration of
main_loop().

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14513
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-10-06 03:12:35 +00:00
Martin Schwenke
3ab52b5286 ctdb-recoverd: Drop unnecessary code
This has already been done in update_flags().

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14513
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-10-06 03:12:35 +00:00
Martin Schwenke
d98f68f918 ctdb-daemon: Drop implementation of old-style database pull/push controls
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri Sep 11 06:29:32 UTC 2020 on sn-devel-184
2020-09-11 06:29:32 +00:00
Martin Schwenke
2efce7d477 ctdb-recovery: Simplify database push function names
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-09-11 05:06:42 +00:00
Martin Schwenke
f4e2206e88 ctdb-recovery: Drop unnecessary database push wrapper
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-09-11 05:06:42 +00:00
Martin Schwenke
225a699633 ctdb-recovery: Drop passing of capabilities into database pull
This is no longer necessary because the capability new style database
pull is assumed to always be available.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-09-11 05:06:42 +00:00
Martin Schwenke
595c1a7c0f ctdb-recovery: Simplify database pull function names
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-09-11 05:06:42 +00:00
Martin Schwenke
f968576642 ctdb-recovery: Remove use of old pull and push controls
Removes use of the old controls without cleaning up the code.  Clean
up can be done later.

After this change the CTDB_CAP_FRAGMENTED_CONTROLS capability is no
longer checked.  This capability can be removed along with the
controls.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-09-11 05:06:42 +00:00
Martin Schwenke
8bb6a6607d ctdb-recoverd: Broadcast takeover run message when verifying IPs
This makes it consistent with the monitoring code.  If the master has
changed then this means the master will always get the message.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Aug 18 06:24:11 UTC 2020 on sn-devel-184
2020-08-18 06:24:11 +00:00
Martin Schwenke
4aa8e72d60 ctdb-recoverd: Rename update_local_flags() -> update_flags()
This also updates remote flags so the name is misleading.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-08-18 05:02:25 +00:00
Martin Schwenke
702c7c4934 ctdb-recoverd: Change update_local_flags() to use already retrieved nodemaps
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-08-18 05:02:25 +00:00
Martin Schwenke
910a0b3b74 ctdb-recoverd: Get remote nodemaps earlier
update_local_flags() will be changed to use these nodemaps.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-08-18 05:02:25 +00:00
Martin Schwenke
d50919b0cb ctdb-recoverd: Do not fetch the nodemap from the recovery master
The nodemap has already been fetched from the local node and is
actually passed to this function.  Care must be taken to avoid
referencing the "remote" nodemap for the recovery master.  It also
isn't useful to do so, since it would be the same nodemap.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-08-18 05:02:25 +00:00
Martin Schwenke
762d1d8a96 ctdb-recoverd: Change get_remote_nodemaps() to use connected nodes
The plan here is to use the nodemaps retrieved by get_remote_nodes()
in update_local_flags().  This will improve efficiency, since
get_remote_nodes() fetches flags from nodes in parallel.  It also
means that get_remote_nodes() can be used exactly once early on in
main_loop() to retrieve remote nodemaps.  Retrieving nodemaps multiple
times is unnecessary and racy - a single monitoring iteration should
not fetch flags multiple times and compare them.

This introduces a temporary behaviour change but it will be of no
consequence when the above changes are made.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-08-18 05:02:25 +00:00
Martin Schwenke
368c83bfe3 ctdb-recoverd: Fix node_pnn check and assignment of nodemap into array
This array is indexed by the same index as nodemap, not the PNN.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-08-18 05:02:25 +00:00
Martin Schwenke
10ce0dbf1c ctdb-recoverd: Add fail callback to assign banning credits
Also drop error handling in main_loop() that is replaced by this
change.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-08-18 05:02:25 +00:00
Martin Schwenke
a079ee3169 ctdb-recoverd: Add an intermediate state struct for nodemap fetching
This will allow an error callback to be added.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-08-18 05:02:25 +00:00
Martin Schwenke
2eaa0af616 ctdb-recoverd: Move memory allocation into get_remote_nodemaps()
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-08-18 05:02:25 +00:00
Martin Schwenke
3324dd272c ctdb-recoverd: Change signature of get_remote_nodemaps()
Change 1st argument to a rec context, since this will be needed later.
Drop the nodemap argument and access it via rec->nodemap instead.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-08-18 05:02:25 +00:00
Martin Schwenke
d2d90f2502 ctdb-recoverd: Fix a local memory leak
The memory is allocated off the memory context used by the current
iteration of main loop.  It is freed when main loop completes the fix
doesn't require backporting to stable branches.  However, it is sloppy
so it is worth fixing.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-08-18 05:02:25 +00:00
Martin Schwenke
52f520d39c ctdb-recoverd: Basic cleanups for get_remote_nodemaps()
Don't log an error on failure - let the caller can do this.  Apart
from this: fix up coding style and modernise the remaining error
message.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-08-18 05:02:25 +00:00
Ralph Boehme
2327471756 lib: relicense smb_strtoul(l) under LGPLv3
Signed-off-by: Ralph Boehme <slow@samba.org>
Reviewed-by: Swen Schillig <swen@linux.ibm.com>
Reviewed-by: Volker Lendecke <vl@samba.org>

Autobuild-User(master): Jeremy Allison <jra@samba.org>
Autobuild-Date(master): Mon Aug  3 22:21:04 UTC 2020 on sn-devel-184
2020-08-03 22:21:02 +00:00
Martin Schwenke
5ce6133a75 ctdb-recoverd: Simplify calculation of new flags
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Fri Jul 24 06:03:23 UTC 2020 on sn-devel-184
2020-07-24 06:03:23 +00:00
Martin Schwenke
3654e41677 ctdb-recoverd: Correctly find nodemap entry for pnn
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-07-24 04:41:25 +00:00
Martin Schwenke
9475ab0441 ctdb-recoverd: Do not retrieve nodemap from recovery master
It is already in rec->nodemap.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-07-24 04:41:25 +00:00
Martin Schwenke
0c6a7db3ba ctdb-recoverd: Flatten update_flags_on_all_nodes()
The logic currently in ctdb_ctrl_modflags() will be optimised so that
it no longer matches the pattern for a control function.  So, remove
this function and squash its functionality into the only caller.

Although there are some superficial changes, the behaviour is
unchanged.

Flattening the 2 functions produces some seriously weird logic for
setting the new flags, to the point where using ctdb_ctrl_modflags()
for this purpose now looks very strange.  The weirdness will be
cleaned up in a subsequent commit.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-07-24 04:41:25 +00:00
Martin Schwenke
a88c10c5a9 ctdb-recoverd: Move ctdb_ctrl_modflags() to ctdb_recoverd.c
This file is the only user of this function.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-07-24 04:41:25 +00:00
Martin Schwenke
b1e631ff92 ctdb-recoverd: Improve a call to update_flags_on_all_nodes()
This should take a PNN, not an array index.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-07-24 04:41:25 +00:00
Martin Schwenke
915d24ac12 ctdb-recoverd: Use update_flags_on_all_nodes()
This is clearer than using the MODFLAGS control directly.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-07-24 04:41:25 +00:00
Martin Schwenke
f681c0e947 ctdb-recoverd: Introduce some local variables to improve readability
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-07-24 04:41:25 +00:00
Martin Schwenke
cb3a3147b7 ctdb-recoverd: Change update_flags_on_all_nodes() to take rec argument
This makes fields such as recmaster and nodemap easily available if
required.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-07-24 04:41:25 +00:00
Martin Schwenke
6982fcb3e6 ctdb-recoverd: Drop unused nodemap argument from update_flags_on_all_nodes()
An unused argument needlessly extends the length of function calls.  A
subsequent change will allow rec->nodemap to be used if necessary.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-07-24 04:41:25 +00:00
Martin Schwenke
53b73b9b0f ctdb-daemon: Fix sorting of hot keys
The current code only ever swaps with slot 0.  This will only ever
happen with slots 0 and 1, so probably never sorts.

Replace with qsort().

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-05-22 06:41:45 +00:00
Martin Schwenke
5c8dfbbf9b ctdb-daemon: Add extra logging of hot keys
ctdbd currently only logs when a new hot key is added.  If a key gets
hotter then nothing new is logged.

Log hot key updates when the number of migrations has doubled since
the last time that key was logged.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-05-22 06:41:45 +00:00
Martin Schwenke
baf058dcf7 ctdb-daemon: Update hot key logging
This message indicates that a hot key was added, so say that.  After
all the hot key slots have been filled the id will always be 0, so
don't bother logging it.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-05-22 06:41:44 +00:00
Martin Schwenke
1ab39b3270 ctdb-daemon: Fix bug in slot 0 comparison optimisation
This is only valid if all slots are in use.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-05-22 06:41:44 +00:00
Martin Schwenke
f9f60c2a60 ctdb-daemon: Switch some variables to unsigned
These should be unsigned but luck is currently on our side.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-05-22 06:41:44 +00:00
Martin Schwenke
21b9844bcb ctdb-daemon: Add separate hot keys array for database statistics
There are 2 reasons for this.  Sorting of hot keys is broken and will
be changed to an implementation that needs a named (i.e. not
anonymous) structure.  Also, at least one non-protocol field will be
added to facilitate more useful logging.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-05-22 06:41:44 +00:00
Volker Lendecke
d9ccd853c3 ctdb: Implement CTDB_CONTROL_ECHO_DATA
Testing control: 4 bytes msec delay plus a blob, return the request after the
delay. This is an enhanced "ping" which can be used to test asynchronous
clients.

Doesn't have the full protocol implementation yet

Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2020-04-28 09:08:39 +00:00
Volker Lendecke
ad4b53f2d9 ctdb: Fix a memleak
Bug: https://bugzilla.samba.org/show_bug.cgi?id=14348
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>

Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Fri Apr 17 08:32:35 UTC 2020 on sn-devel-184
2020-04-17 08:32:35 +00:00
Martin Schwenke
f8f3d7954d ctdb-vacuum: Reschedule vacuum event if VacuumInterval has increased
The vacuuming integration tests set VacuumInterval to a very high
number to avoid vacuuming collisions.  This is done after the cluster
is healthy, so Samba will have already been started and vacuuming will
already be scheduled *at the default interval* for databases attached
by Samba.  This means that vacuuming controls used by vacuuming tests
can still collide with the scheduled vacuuming events.

Add some logic to reschedule a vacuuming event that has fired but
where VacuumInterval has increased since it was originally scheduled.
The increase in VacuumInterval is used as the time offset for
rescheduling the event.

Although this changes production behaviour for the convenience of
testing, the new behaviour is completely reasonable and obeys the
principle of least surprise.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Apr  7 03:04:57 UTC 2020 on sn-devel-184
2020-04-07 03:04:57 +00:00
Martin Schwenke
5d03a3c86e ctdb-vacuum: Store value of VacuumInterval in ctdb_vacuum_handle
No behaviour change.  This is final staging to make the next change
completely obvious.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-04-07 01:26:41 +00:00
Martin Schwenke
7ad7c0b932 ctdb-vacuum: Use vacuum_handle local variables
No behaviour change.  This just makes future changes clearer by
avoiding reformatting (or introducing local variables).

Clean up error handling while touching a relevant line.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-04-07 01:26:41 +00:00
Martin Schwenke
716f52f68b ctdb-recoverd: Avoid dereferencing NULL rec->nodemap
Inside the nested event loop in ctdb_ctrl_getnodemap(), various
asynchronous handlers may dereference rec->nodemap, which will be
NULL.

One example is lost_reclock_handler(), which causes rec->nodemap to be
unconditionally dereferenced in list_of_nodes() via this call chain:

  list_of_nodes()
  list_of_active_nodes()
  set_recovery_mode()
  force_election()
  lost_reclock_handler()

Instead of attempting to trace all of the cases, just avoid leaving
rec->nodemap set to NULL.  Attempting to use an old value is generally
harmless, especially since it will be the same as the new value in
most cases.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14324

Reported-by: Volker Lendecke <vl@samba.org>
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Tue Mar 24 01:22:45 UTC 2020 on sn-devel-184
2020-03-24 01:22:45 +00:00
Martin Schwenke
147afe77de ctdb-daemon: Don't allow attach from recovery if recovery is not active
Neither the recovery daemon nor the recovery helper should attach
databases outside of the recovery process.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-03-23 23:45:38 +00:00
Martin Schwenke
052f1bdb9c ctdb-daemon: Remove more unused old client database functions
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-03-23 23:45:38 +00:00
Martin Schwenke
3a66d181b6 ctdb-recovery: Remove old code for creating missing databases
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-03-23 23:45:38 +00:00
Martin Schwenke
76a8174279 ctdb-recovery: Create database on nodes where it is missing
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-03-23 23:45:38 +00:00
Martin Schwenke
e6e63f8fb8 ctdb-recovery: Fetch database name from all nodes where it is attached
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-03-23 23:45:38 +00:00
Martin Schwenke
1bdfeb3fdc ctdb-recovery: Pass db structure for each database recovery
Instead of db_id and db_flags.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-03-23 23:45:38 +00:00
Martin Schwenke
c6f74e590f ctdb-recovery: GET_DBMAP from all nodes
This builds a complete list of databases across the cluster so it can
be used to create databases on the nodes where they are missing.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-03-23 23:45:38 +00:00
Martin Schwenke
4c0b9c3605 ctdb-recovery: Replace use of ctdb_dbid_map with local db_list
This will be used to build a merged list of databases from all nodes,
allowing the recovery helper to create missing databases.

It would be possible to also include the db_name field in this
structure but that would cause a lot of churn.  This field is used
locally in the recovery of each database so can continue to live in
the relevant state structure(s).

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-03-23 23:45:38 +00:00
Martin Schwenke
7e5a8a4884 ctdb-daemon: Respect CTDB_CTRL_FLAG_ATTACH_RECOVERY when attaching databases
This is currently only set by the recovery daemon when it attaches
missing databases, so there is no obvious behaviour change.  However,
attaching missing databases can now be moved to the recovery helper as
long as it sets this flag.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-03-23 23:45:38 +00:00
Martin Schwenke
98e3d0db2b ctdb-recovery: Use CTDB_CTRL_FLAG_ATTACH_RECOVERY to attach during recovery
ctdb_ctrl_createdb() is only called by the recovery daemon, so this is
a safe, temporary change.  This is temporary because
ctdb_ctrl_createdb(), create_missing_remote_databases() and
create_missing_local_databases() will all go away soon.

Note that this doesn't cause a change in behaviour.  The main daemon
will still only defer attaches from non-recoverd processes during
recovery.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-03-23 23:45:38 +00:00
Martin Schwenke
fc23cd1b9c ctdb-daemon: Remove unused old client database functions
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-03-23 23:45:37 +00:00
Martin Schwenke
c6c89495fb ctdb-daemon: Fix database attach deferral logic
Commit 3cc230b5ee says:

  Dont allow clients to connect to databases untile we are well past
  and through the initial recovery phase

It is unclear what this commit was attempting to do.  The commit
message implies that more attaches should be deferred but the code
change adds a conjunction that causes less attaches to be deferred.
In particular, no attaches will be deferred after startup is complete.
This seems wrong.

To implement what seems to be stated in the commit message an "or"
needs to be used so that non-recovery daemon attaches are deferred
either when in recovery or before startup is complete.  Making this
change highlights that attaches need to be allowed during the
"startup" event because this is when smbd is started.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-03-23 23:45:37 +00:00
Amitay Isaacs
1c56d6413f ctdb-recovery: Refactor banning a node into separate computation
If a node is marked for banning, confirm that it's not become inactive
during the recovery.  If yes, then don't ban the node.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2020-03-23 23:45:37 +00:00
Amitay Isaacs
c6a0ff1bed ctdb-recovery: Don't trust nodemap obtained from local node
It's possible to have a node stopped, but recovery master not yet
updated flags on the local ctdb daemon when recovery is started.  So do
not trust the list of active nodes obtained from the local node.  Query
the connected nodes to calculate the list of active nodes.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2020-03-23 23:45:37 +00:00
Amitay Isaacs
6e2f8756f1 ctdb-recovery: Consolidate node state
This avoids passing multiple arguments to async computation.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2020-03-23 23:45:37 +00:00
Amitay Isaacs
072ff4d12b ctdb-recovery: Fetched vnnmap is never used, so don't fetch it
New vnnmap is constructed using the information from all the connected
nodes.  So there is no need to fetch the vnnmap from recovery master.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2020-03-23 23:45:37 +00:00
Martin Schwenke
15762a3455 ctdb-daemon: more logical whitespace, debug modernisation
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14295

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Ralph Boehme <slow@samba.org>
2020-03-12 03:47:30 +00:00
Ralph Boehme
6a4fa0785f ctdb-daemon: ensure restart() callback is called in half-connected state
If NODE_FLAGS_DISCONNECTED is set the node can be in half-connected state. With
this change we ensure to restart the transport for this case.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14295

Signed-off-by: Ralph Boehme <slow@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2020-03-12 03:47:30 +00:00
Martin Schwenke
c9405aec70 ctdb-daemon: Check for lock count underflow
This is a programming error.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-02-18 02:56:38 +00:00
Martin Schwenke
1a0e1f8924 ctdb-daemon: Fork when not interactive and test mode is enabled
There is no sane way of keeping stdin open when using the shell to
background ctdbd in local_daemons.sh.  Instead, have ctdbd fork when
not interactive and when test mode is enabled.  become_daemon() can't
be used for this: if it forks then it also closes stdin.

For the interactive case, become_daemon() wasn't doing anything
special, so do nothing instead.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-02-10 04:07:39 +00:00
Martin Schwenke
a220e9454a ctdb-daemon: Make some conditions more explicit
These don't need to depend on do_fork.  Child logging should be set up
whenever the daemon is not interactive.  The stdin handler should be
setup whenever test mode is enabled.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-02-10 04:07:39 +00:00
Martin Schwenke
cefb3327c6 ctdb-daemon: Pass more information to ctdb_start_daemon()
No functional changes.

This is staging for a change that makes ctdbd fork when test mode is
enabled but interactive is not set.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-02-10 04:07:38 +00:00
Martin Schwenke
cf460bd9c4 ctdb-daemon: Shut down if interactive and stdin is closed
This allows a test environment to simply close its end of a pipe to
cleanly shutdown ctdbd.  Like in smbd, this is only done if stdin is a
pipe or a socket.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-01-28 09:57:32 +00:00
Martin Schwenke
d79e2dcfc8 ctdb-daemon: Only stop monitoring if it has been initialised
This avoids a crash if ctdb_shutdown_sequence() is called before
monitoring is initialised.

Switch to using TALLOC_FREE() while touching this function.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-01-28 09:57:32 +00:00
Martin Schwenke
aa2977e151 ctdb-mutex: Change default re-check time for fcntl helper to 5s
Testing against a commonly used cluster filesystem has shown no
performance impact, as expected.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-01-21 11:39:40 +00:00
Volker Lendecke
42a3e2e503 ctdbd: Use struct initialization
2 lines less

Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
2020-01-19 18:29:39 +00:00
Björn Jacke
f3754b6487 ctdb/server/ctdb_daemon.c: typo fixes
Signed-off-by: Bjoern Jacke <bjacke@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2019-10-31 00:43:39 +00:00
Björn Jacke
5d2a257c2e ctdb/server/ctdb_client.c: typo fixes
Signed-off-by: Bjoern Jacke <bjacke@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2019-10-31 00:43:39 +00:00
Björn Jacke
7722bd80fc ctdb/server/ctdb_call.c: typo fixes
Signed-off-by: Bjoern Jacke <bjacke@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2019-10-31 00:43:38 +00:00
Martin Schwenke
41a41d5f3e ctdb-daemon: Implement DB_VACUUM control
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2019-10-24 04:06:43 +00:00
Martin Schwenke
d462d64cdf ctdb-vacuum: Only schedule next vacuum event if vacuuuming is scheduled
At the moment vacuuming is always scheduled.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2019-10-24 04:06:43 +00:00
Martin Schwenke
13cedaf019 ctdb-daemon: Factor out code to create vacuuming child
This changes the behaviour for some failures from exiting to simply
attempting to schedule the next run.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2019-10-24 04:06:43 +00:00
Martin Schwenke
5539edfdbe ctdb-vacuum: Simplify recording of in-progress vacuuming child
There can only be one, so simplify the logic.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2019-10-24 04:06:43 +00:00
Amitay Isaacs
d0cc9edc05 ctdb-vacuum: Avoid processing any more packets
All the vacuum operations if required have an event loop to ensure
completion of pending operations.  Once all the steps are complete,
there is no reason to process any more packets.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2019-10-24 04:06:43 +00:00
Amitay Isaacs
680df07630 ctdb-daemon: Avoid memory leak when packet is deferred
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2019-10-24 04:06:43 +00:00
Amitay Isaacs
c6427dddf5 ctdb-recoverd: No need for database detach handler
The only reason for recoverd attaching to databases was to migrate
records to the local node as part of vacuuming.  Recovery daemon does
not take part in database vacuuming any more.

The actual database recovery is handled via the recovery_helper and
recovery daemon should not need to attach to the databases any more.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2019-10-24 04:06:43 +00:00
Amitay Isaacs
fc81729dd2 ctdb-recoverd: Drop VACUUM_FETCH message handling
This is now implemented in the ctdb daemon using VACUMM_FETCH control.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2019-10-24 04:06:43 +00:00
Amitay Isaacs
498932c0e8 ctdb-vacuum: Replace VACUUM_FETCH message with control
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2019-10-24 04:06:42 +00:00
Amitay Isaacs
86521837b6 ctdb-vacuum: Add processing of fetch queue
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2019-10-24 04:06:42 +00:00
Amitay Isaacs
da617f90d9 ctdb-daemon: Add implementation of VACUUM_FETCH control
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2019-10-24 04:06:42 +00:00
Martin Schwenke
815ae64400 ctdb-vacuum: Drop debug level of repacking message to NOTICE
This occurs rarely but can adversely impact performance, so it is
worth logging it more frequently.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2019-10-04 05:47:35 +00:00
Amitay Isaacs
33f1c9d965 ctdb-vacuum: Process all records not deleted on a remote node
This currently skips the last record.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14147
RN: Avoid potential data loss during recovery after vacuuming error

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2019-10-04 05:47:34 +00:00
Mathieu Parent
7cb0ca4171 Spelling fixes s/ dont / don't /
Excluding examples/tridge/smb.conf

Signed-off-by: Mathieu Parent <math.parent@gmail.com>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
Reviewed-by: Gary Lockyer <gary@catalyst.net.nz>
2019-09-01 22:21:27 +00:00
Mathieu Parent
736bb924f7 Spelling fixes s/ ot / to /
Signed-off-by: Mathieu Parent <math.parent@gmail.com>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
Reviewed-by: Gary Lockyer <gary@catalyst.net.nz>
2019-09-01 22:21:27 +00:00
Martin Schwenke
8190993d99 ctdb-recoverd: Fix typo in previous fix
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14085

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Aug 27 15:29:11 UTC 2019 on sn-devel-184
2019-08-27 15:29:11 +00:00
Martin Schwenke
5d655ac6f2 ctdb-recoverd: Only check for LMASTER nodes in the VNN map
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14085

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2019-08-21 11:50:30 +00:00
Martin Schwenke
e9f2e205ee ctdb-daemon: Make node inactive in the NODE_STOP control
Currently some of this is supported by a periodic check in the
recovery daemon's main_loop(), which notices the flag change, sets
recovery mode active and freezes databases.  If STOP_NODE returns
immediately then the associated recovery can complete and the node can
be continued before databases are actually frozen.

Instead, immediately do all of the things that make a node inactive.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14087
RN: Stop "ctdb stop" from completing before freezing databases

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Aug 20 08:32:27 UTC 2019 on sn-devel-184
2019-08-20 08:32:27 +00:00
Martin Schwenke
91ac4c13d8 ctdb-daemon: Drop unused function ctdb_local_node_got_banned()
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14087

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2019-08-20 07:15:41 +00:00
Martin Schwenke
0f5f7b7cf4 ctdb-daemon: Switch banning code to use ctdb_node_become_inactive()
There's no reason to avoid immediately setting recovery mode to active
and initiating freeze of databases.

This effectively reverts the following commits:

  d8f3b490bb
  b4357a79d9

The latter is now implemented using a control, resulting in looser
coupling.

See also the following commit:

  f8141e91a6

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14087

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2019-08-20 07:15:41 +00:00