mirror of https://github.com/samba-team/samba.git synced 2024-12-28 07:21:54 +03:00

128611 Commits

Author SHA1 Message Date
Martin Schwenke
5d31778149 ctdb-tests: Support commenting out local daemons configuration options
Can be used to disable default options, such as cluster lock.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
34d2ca0ae6 ctdb-config: Add configuration option [cluster] leader timeout
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
1dfb266038 ctdb-config: [legacy] recmaster capability -> [cluster] leader capability
Rename this configuration item and move it into the [cluster]
configuration section.

Update documentation to match.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
f5a39058f0 ctdb-config: [cluster] recovery lock -> [cluster] cluster lock
Retain "recovery lock" and mark as deprecated for backward
compatibility.

Some documentation is still inconsistent.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
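The backward-compatibility idea in f5a39058f0 can be illustrated with a small option-alias lookup. The old name keeps working but triggers a warning. This is a minimal sketch and not CTDB's configuration code. The alias table and `resolve_cluster_option()` are hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical alias table for this sketch: deprecated name -> new name. */
struct option_alias {
	const char *deprecated;
	const char *current;
};

static const struct option_alias cluster_aliases[] = {
	{ "recovery lock", "cluster lock" },
};

/*
 * Map a [cluster] option name to its current spelling.  A deprecated
 * name still works but triggers a warning; other names pass through.
 */
static const char *resolve_cluster_option(const char *name)
{
	size_t n = sizeof(cluster_aliases) / sizeof(cluster_aliases[0]);

	for (size_t i = 0; i < n; i++) {
		if (strcmp(name, cluster_aliases[i].deprecated) == 0) {
			fprintf(stderr,
				"warning: \"%s\" is deprecated, use \"%s\"\n",
				cluster_aliases[i].deprecated,
				cluster_aliases[i].current);
			return cluster_aliases[i].current;
		}
	}
	return name;
}
```

Keeping the aliases in a table means a later rename only adds a row. The lookup logic does not change.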
Martin Schwenke
d752a92e11 ctdb-doc: Update documentation for leader and cluster lock
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
73555e8248 ctdb-recoverd: Use race for cluster lock as election when lock is enabled
If the cluster is partitioned then nodes in one partition cannot take
the lock anyway, so election is pointless.  It just introduces
unnecessary corner cases.

Instead just race for the lock.

When a node notices a lack of leader and notifies other nodes of an
election via an unknown leader broadcast, the cluster lock election is
hooked into this broadcast.

The test needs to be updated because losing the cluster lock can now
result in a leadership change.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
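The "race for the lock" election described in 73555e8248 can be modelled in memory. Every node that saw the unknown leader broadcast tries a non-blocking take, and whoever gets the lock becomes leader. This is a simulation under stated assumptions. The real cluster lock is a shared-filesystem lock or a helper, not a struct. `cluster_lock_sim`, `lock_try_take()` and `race_for_leader()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical marker meaning "nobody holds the lock". */
#define LOCK_FREE UINT32_MAX

/* In-memory stand-in for the cluster lock: records the holder's PNN. */
struct cluster_lock_sim {
	uint32_t holder;
};

/* Non-blocking attempt: succeeds if the lock is free or already ours. */
static bool lock_try_take(struct cluster_lock_sim *lock, uint32_t pnn)
{
	if (lock->holder == LOCK_FREE) {
		lock->holder = pnn;
		return true;
	}
	return lock->holder == pnn;
}

/*
 * Each node that saw the unknown leader broadcast tries the lock; the
 * array order stands in for arrival order.  Nodes that cannot reach the
 * lock (e.g. on the wrong side of a partition) never win.  Returns the
 * winner's PNN, or LOCK_FREE if none of these nodes got the lock.
 */
static uint32_t race_for_leader(struct cluster_lock_sim *lock,
				const uint32_t *pnns,
				const bool *can_reach_lock,
				int count)
{
	uint32_t leader = LOCK_FREE;

	for (int i = 0; i < count; i++) {
		if (can_reach_lock[i] && lock_try_take(lock, pnns[i])) {
			leader = pnns[i];
		}
	}
	return leader;
}
```

A partition that cannot reach the lock never produces a leader, so no vote is needed to rule it out. That is the point of the commit.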
Martin Schwenke
938d64c8ff ctdb-protocol: Mark {GET,SET}_RECMASTER controls obsolete
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
03ae158cff ctdb-protocol: Drop marshalling for {GET,SET}_RECMASTER controls
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
a76374070d ctdb-daemon: Drop implementation of {GET,SET}_RECMASTER controls
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
193b624d26 ctdb-protocol: Drop protocol client functions for recmaster controls
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
cda673ff6d ctdb-client: Drop unused recmaster functions
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
16efbca003 ctdb-daemon: Drop unused old client recmaster functions
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
c68267b2a6 ctdb-recoverd: Drop calls to ctdb_ctrl_setrecmaster()
Nothing fetches this value anymore.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
58d7fcdf7c ctdb-recoverd: Drop recovery master verification
This doesn't make sense if leader broadcasts are used.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
f02e097485 ctdb-tools: recovery master -> leader
The following command names are changed:

  recmaster -> leader
  setrecmasterrole -> setleaderrole

Command output changed for the following commands:

  status
  getcapabilities

Documentation and tests are updated to reflect these changes.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
e60581d5b5 ctdb-tools: Use leader broadcast in get_leader()
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
92fb68e9b8 ctdb-tools: Factor out get_leader()
This seems pointless but it localises a subsequent change and also
starts a terminology change in the tool code.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
17ba15ccd8 ctdb-tools: Handle leader broadcasts in ctdb tool
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
ec90f36cc6 ctdb-tools: Print "UNKNOWN" when leader PNN is unknown
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
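The display rule from ec90f36cc6 amounts to a single branch on a sentinel PNN. CTDB has its own "unknown PNN" constant. This sketch defines a stand-in, `SKETCH_UNKNOWN_PNN`, instead of relying on the real header. `leader_to_string()` is a hypothetical helper name.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Stand-in for CTDB's "unknown PNN" sentinel; this sketch defines its own
 * value rather than relying on the real header.
 */
#define SKETCH_UNKNOWN_PNN UINT32_MAX

/* Render a leader PNN for display, using "UNKNOWN" for the sentinel. */
static const char *leader_to_string(uint32_t pnn, char *buf, size_t len)
{
	if (pnn == SKETCH_UNKNOWN_PNN) {
		snprintf(buf, len, "UNKNOWN");
	} else {
		snprintf(buf, len, "%" PRIu32, pnn);
	}
	return buf;
}
```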
Martin Schwenke
01a8d1a4a4 ctdb-client: Factor out function ctdb_client_wait_func_timeout()
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
403db5b528 ctdb-tests: Factor out getting leader and waiting for leader change
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
4786982cc8 ctdb-tests: Add leader broadcasts to fake_ctdbd
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Amitay Isaacs
756dfdfed9 ctdb-tests: Implement srvid_handler for dispatching messages
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2022-01-17 10:21:33 +00:00
Martin Schwenke
958746f947 ctdb-recoverd: Simplify some stopped/banned checks to inactive checks
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
358c59f51a ctdb-recoverd: No longer take cluster lock during recovery
Confirm instead that it is already held.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
36ffaaa691 ctdb-recoverd: Add and use function cluster_lock_enabled()
Now all references to ctdb->recovery_lock are encapsulated in the
cluster lock code.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
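358c59f51a and 36ffaaa691 together change recovery's relationship with the lock. Recovery no longer takes it, and the "is a lock configured?" test lives in one function. A sketch under simplifying assumptions follows. The function name `cluster_lock_enabled()` comes from the commit. Its body, the `recd_sketch` struct and `recovery_may_start()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified recovery daemon state. */
struct recd_sketch {
	const char *cluster_lock; /* configured lock, or NULL if disabled */
	bool lock_held;           /* true once this node has taken the lock */
};

/* Single place that knows how "cluster lock configured" is represented. */
static bool cluster_lock_enabled(const struct recd_sketch *rec)
{
	return rec->cluster_lock != NULL;
}

/*
 * Recovery does not take the lock itself any more; the leader took it
 * when the election completed.  Recovery only checks that it is held.
 */
static bool recovery_may_start(const struct recd_sketch *rec)
{
	if (!cluster_lock_enabled(rec)) {
		return true;
	}
	return rec->lock_held;
}
```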
Martin Schwenke
5ee664ee17 ctdb-recoverd: Terminology change: recovery lock -> cluster lock
No functional changes, just name changes for clarity.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
0f2250f4f9 ctdb-recoverd: Take cluster lock when election completes
It is no longer just a recovery lock but is always held by the cluster
leader.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
011e880002 ctdb-recoverd: Factor out function cluster_lock_take()
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
037abf8620 ctdb-tests: Avoid a race
See the comment in the code for details.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
ef7e3265f7 ctdb-tests: Setup cluster with expected arguments
ctdb_test_init() doesn't actually pass arguments to local_daemons.sh.
This needs to be done using ctdb_nodes_start_custom().

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
b029ca4d51 ctdb-recoverd: Drop leader validation
The introduction of the leader broadcast timeout provides an
alternative to the current leader validation.  Using the leader
broadcast may not be as fast but it is more correct.

When the leader node is stopped or banned, the only way of triggering
an election is currently to fetch the leader's node map to check
whether it is still active.  This is because the leader will no
longer push the node map to other nodes.  However, having all nodes
fetch the node map from an inactive leader may be unreliable.

Most of the other cases are also handled more reliably by the leader
broadcast timeout.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
7e53fab0a3 ctdb-recoverd: Drop special case for elected-before-connected
This no longer occurs at startup due to the leader broadcast timeout.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
ef4b8c13c0 ctdb-recoverd: Handle leader broadcast timeout
If no leader broadcasts have been received from the leader for more
than 5s then trigger an election.

Apart from being sane behaviour, this avoids elected-before-connected
bugs at startup, where a node elects itself leader before it is
connected to other nodes.

When a node processes a leader broadcast timeout it sends an unknown
leader broadcast to all nodes.  That causes cancellation of the leader
broadcast timeout across the cluster.  This is particularly important at
startup, since nodes may be started in a staggered fashion.  Without
this cluster-wide cancellation, a node might notice the lack of
leader, win an election and complete a recovery before other nodes
notice the lack of leader.  When the leader broadcast timeout finally
occurs on the other nodes then they'll put the cluster back into an
unnecessary recovery.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
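The timeout logic in ef4b8c13c0 can be sketched as per-node state with three events. A leader broadcast restarts the timer. An unknown leader broadcast cancels it. A check reports when more than 5s have passed. The real daemon presumably uses event-loop timers, but this sketch polls with whole-second timestamps. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Timeout from the commit message: more than 5s without a broadcast. */
#define LEADER_BROADCAST_TIMEOUT_S 5

/* Hypothetical per-node state; times are whole seconds. */
struct leader_watch {
	long last_broadcast;  /* when the last leader broadcast arrived */
	bool armed;           /* false once an unknown leader broadcast arrives */
};

/* A leader broadcast arrived: restart the timeout. */
static void leader_broadcast_received(struct leader_watch *w, long now)
{
	w->last_broadcast = now;
	w->armed = true;
}

/*
 * Some node has announced an unknown leader, so an election is already
 * being triggered: cancel the local timeout so this node does not force
 * a second, unnecessary recovery later.
 */
static void unknown_leader_received(struct leader_watch *w)
{
	w->armed = false;
}

/* Polled check; true means this node should trigger an election now. */
static bool leader_broadcast_timed_out(const struct leader_watch *w, long now)
{
	return w->armed &&
	       now - w->last_broadcast > LEADER_BROADCAST_TIMEOUT_S;
}
```

At startup, calling `leader_broadcast_received()` with the start time arms the timer. A node that never hears from a leader then triggers an election after 5s, which is the behaviour that avoids the elected-before-connected case.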
Martin Schwenke
5c7f6da0f0 ctdb-recoverd: Send leader broadcasts
These are triggered on a 1 second timer, but are only sent if the node
is the current leader and there is no election underway.

If this node cannot be the leader then ensure it releases the
recovery lock.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
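The decision made on each tick of the 1 second timer in 5c7f6da0f0 can be written as a small predicate. The election flag corresponds to the one added in c2cfd9c21a. Actually sending the CTDB_SRVID_LEADER message is left out. This is a sketch, and `recd_state` and `leader_broadcast_tick()` are hypothetical. The order of the checks is an assumption, not taken from the CTDB source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of recovery daemon state for this sketch. */
struct recd_state {
	uint32_t pnn;
	uint32_t leader;
	bool election_in_progress;
	bool can_be_leader;
	bool holds_lock;
};

/*
 * Body of a once-per-second timer.  Returns true when a leader broadcast
 * should be sent on this tick.
 */
static bool leader_broadcast_tick(struct recd_state *rec)
{
	if (!rec->can_be_leader) {
		/* A node that cannot lead must not keep the lock. */
		rec->holds_lock = false;
		return false;
	}
	return rec->leader == rec->pnn && !rec->election_in_progress;
}
```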
Martin Schwenke
789a75abfa ctdb-recoverd: Process leader broadcasts
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
3d3767a259 ctdb-protocol: Add CTDB_SRVID_LEADER
CTDB_SRVID_LEADER will be regularly broadcast to all connected nodes
by the leader.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
c2cfd9c21a ctdb-recoverd: Add an explicit flag for election in progress
An alternate election method will be added that doesn't use the
election timeout, so this provides a common way for recognising when
an election is in progress.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
ac5a3ca063 ctdb-recoverd: Only start election if node can be leader
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
7baadfe27e ctdb-recoverd: Add and use function this_node_can_be_leader()
This makes the code self-documenting.

In ctdb_election_data() there is a slight behaviour change.  An
inactive node will now try to lose an election.  This case should not happen
because:

* An inactive node can't win an election round and then send a reply.

* Any inactive node should never start an election.  There are
  currently places where this happens and they will be fixed later.

There is an instance where this could be used in
validate_recovery_master() but this involves a more serious logic
change.  Overhaul this function later.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
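The check named by this_node_can_be_leader() in 7baadfe27e might reduce to two tests. The first is whether the node has the leader capability (see 1dfb266038). The second is whether the node is inactive, i.e. stopped or banned (compare 958746f947). ac5a3ca063 then uses the check to gate starting an election. In this sketch the flags and capabilities are passed directly rather than via the recovery daemon context. The bit values are hypothetical, and CTDB's own inactive mask may cover more states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag and capability bits, defined only for this sketch. */
#define SK_FLAG_BANNED    0x1u
#define SK_FLAG_STOPPED   0x2u
#define SK_FLAGS_INACTIVE (SK_FLAG_BANNED | SK_FLAG_STOPPED)
#define SK_CAP_LEADER     0x1u

/* A node may lead only if it has the leader capability and is active. */
static bool this_node_can_be_leader(uint32_t node_flags, uint32_t capabilities)
{
	if ((capabilities & SK_CAP_LEADER) == 0) {
		return false;
	}
	return (node_flags & SK_FLAGS_INACTIVE) == 0;
}
```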
Martin Schwenke
94b546c268 ctdb-recoverd: Logging/comments: recovery master -> leader
There are some remaining instances in this file but they will be
removed in subsequent commits.

Modernise debug macros as appropriate.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
dd79e9bd14 ctdb-recoverd: Rename recmaster field to leader
Recovery master is being renamed to leader.  This follows clustering
best practice (e.g. Raft).

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
2ee6763c7d ctdb-recoverd: Use rec->pnn everywhere
The recovery daemon's PNN is currently referenced in a number of inconsistent
ways, including:

* pnn
* rec->ctdb->pnn
* ctdb->pnn
* ctdb_get_pnn(ctdb)
* ctdb_get_pnn(rec->ctdb)

The first of these always requires some thought about the context - is
this the node PNN or some other PNN (e.g. argument to function)?

rec->pnn is now always used when referring to the recovery daemon's
PNN.

Doing this also reduces reliance on struct ctdb_context internals.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
4af3b10a37 ctdb-recoverd: Change argument to srvid_disable_and_reply()
Reduce dependency on struct ctdb_context internals, enable a
subsequent change.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
b7c138ca99 ctdb-recoverd: Simplify arguments to ctdb_ban_node()
ban_time argument is always ctdb->tunable.recovery_ban_period, so
build this in and make the calling code more readable.

ctdb_ban_node() already logs how long a node is banned for, so don't
repeatedly log this.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
a5e0ddac62 ctdb-recoverd: Simplify arguments to verify_local_ip_allocation()
All other arguments are available via rec, so simplify.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
67b5191640 ctdb-recoverd: Simplify arguments to do_recovery()
pnn and nodemap are both available via the rec context, so simplify.
vnnmap is unused.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
57882beb16 ctdb-recoverd: Simplify arguments to some election functions
The pnn and nodemap arguments to force_election() and
send_election_request() are always effectively rec->pnn and
rec->nodemap, so simplify.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
9dbe7cc85e ctdb-recoverd: Add PNN to recovery daemon context
This is currently referenced in a number of inconsistent
ways, including:

* pnn
* rec->ctdb->pnn
* ctdb->pnn
* ctdb_get_pnn(ctdb)
* ctdb_get_pnn(rec->ctdb)

The first of these always requires some thought about the context - is
this the node PNN or some other PNN (e.g. argument to function)?

The intention is to always use rec->pnn when referring to the recovery
daemon's PNN.

Doing this also reduces reliance on struct ctdb_context internals.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
ff0140e470 ctdb-recoverd: Use this_node_is_leader() in an extra context
This is arguably clearer.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00