1
0
mirror of https://github.com/samba-team/samba.git synced 2025-01-20 14:03:59 +03:00

549 Commits

Author SHA1 Message Date
Martin Schwenke
67b5191640 ctdb-recoverd: Simplify arguments to do_recovery()
pnn and nodemap are both available via the rec context, so simplify.
vnnmap is unused.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
57882beb16 ctdb-recoverd: Simplify arguments to some election functions
The pnn and nodemap arguments to force_election() and
send_election_request() are always effectively rec->pnn and
rec->nodemap, so simplify.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
9dbe7cc85e ctdb-recoverd: Add PNN to recovery daemon context
This is currently referenced in a number of inconsistent
ways, including:

* pnn
* rec->ctdb->pnn
* ctdb->pnn
* ctdb_get_pnn(ctdb)
* ctdb_get_pnn(rec->ctdb)

The first of these always requires some thought about the context - is
this the node PNN or some other PNN (e.g. argument to function)?

The intention is to always use rec->pnn when referring to the recovery
daemon's PNN.

Doing this also reduces reliance on struct ctdb_context internals.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
ff0140e470 ctdb-recoverd: Use this_node_is_leader() in an extra context
This is arguably clearer.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
c8721d01c6 ctdb-recoverd: Factor out and use function this_node_is_leader()
Make the code self-documenting.

This preempts an upcoming change to terminology but doing it now saves
a lot of churn.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
57a32cebdd ctdb-recoverd: Pass SIGHUP to running helper
The recovery and takeover helpers can run for a while and generate
non-trivial logs, so have them reopen their logs to support log
rotation.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Mon Jan 17 04:36:30 UTC 2022 on sn-devel-184
2022-01-17 04:36:30 +00:00
Martin Schwenke
8e949a6082 ctdb-recoverd: Record helper PID in recovery daemon context
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 03:43:30 +00:00
Martin Schwenke
4acfefed61 ctdb-recoverd: Add basic log reopening
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 03:43:30 +00:00
Martin Schwenke
916c5ee131 ctdb-recoverd: Mark CTDB_SRVID_SET_NODE_FLAGS obsolete
CTDB_SRVID_SET_NODE_FLAGS is no longer sent so drop monitor_handler()
and replace with srvid_not_implemented().  Mark the SRVID obsolete in
its comment.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-09-09 01:46:49 +00:00
Martin Schwenke
8305f6a7f1 ctdb-recoverd: Push flags for a node if any remote node disagrees
This will usually happen if flags on the node in question change, so
keeping the code simple and pushing to all nodes won't hurt.  When all
nodes come up there might be differences in connected nodes, causing
such "fix ups".  Receiving nodes will ignore no-op pushes.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-09-09 01:46:49 +00:00
Martin Schwenke
620d078714 ctdb-recoverd: Update the local node map before pushing out flags
The resulting code structure looks a little weird.  However, there is
another condition that requires the flags to be pushed that will be
inserted before the continue statement in a subsequent commit..

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-09-09 01:46:49 +00:00
Martin Schwenke
82a075d4d7 ctdb-recoverd: Add a helper variable
Improves readability and simplifies subsequent changes.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-09-09 01:46:49 +00:00
Martin Schwenke
4b01f54041 ctdb-recoverd: Drop unnecessary and broken code
update_flags() has already updated the recovery master's canonical
node map, based on the flags from each remote node, and pushed out
these flags to all nodes.

If i == j then the node map has already been updated from this remote
node's flags, so simply drop this case.

Although update_flags() has updated flags for all nodes, it did not
update each node map in remote_nodemaps[] to reflect this.  This means
that remote_nodemaps[] may contain inconsistent flags for some nodes
so it should not be used to check consistency when i != j.

Further, a meaningful difference in flags can only really occur if
update_flags() failed.  In that case this code is never reached.

These observations combine to imply that this whole loop should be
dropped.

This leaves potential sub-second inconsistencies due to out-of-band
healthy/unhealthy flag changes pushed via CTDB_SRVID_PUSH_NODE_FLAGS.
These updates could be dropped (takeover run asks each node for
available IPs rather than making centralised decisions based on node
flags) but for now they will be fixed in the next iteration of
main_loop().

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14513
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-10-06 03:12:35 +00:00
Martin Schwenke
3ab52b5286 ctdb-recoverd: Drop unnecessary code
This has already been done in update_flags().

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14513
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-10-06 03:12:35 +00:00
Martin Schwenke
8bb6a6607d ctdb-recoverd: Broadcast takeover run message when verifying IPs
This makes it consistent with the monitoring code.  If the master has
changed then this means the master will always get the message.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Aug 18 06:24:11 UTC 2020 on sn-devel-184
2020-08-18 06:24:11 +00:00
Martin Schwenke
4aa8e72d60 ctdb-recoverd: Rename update_local_flags() -> update_flags()
This also updates remote flags so the name is misleading.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-08-18 05:02:25 +00:00
Martin Schwenke
702c7c4934 ctdb-recoverd: Change update_local_flags() to use already retrieved nodemaps
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-08-18 05:02:25 +00:00
Martin Schwenke
910a0b3b74 ctdb-recoverd: Get remote nodemaps earlier
update_local_flags() will be changed to use these nodemaps.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-08-18 05:02:25 +00:00
Martin Schwenke
d50919b0cb ctdb-recoverd: Do not fetch the nodemap from the recovery master
The nodemap has already been fetched from the local node and is
actually passed to this function.  Care must be taken to avoid
referencing the "remote" nodemap for the recovery master.  It also
isn't useful to do so, since it would be the same nodemap.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-08-18 05:02:25 +00:00
Martin Schwenke
762d1d8a96 ctdb-recoverd: Change get_remote_nodemaps() to use connected nodes
The plan here is to use the nodemaps retrieved by get_remote_nodes()
in update_local_flags().  This will improve efficiency, since
get_remote_nodes() fetches flags from nodes in parallel.  It also
means that get_remote_nodes() can be used exactly once early on in
main_loop() to retrieve remote nodemaps.  Retrieving nodemaps multiple
times is unnecessary and racy - a single monitoring iteration should
not fetch flags multiple times and compare them.

This introduces a temporary behaviour change but it will be of no
consequence when the above changes are made.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-08-18 05:02:25 +00:00
Martin Schwenke
368c83bfe3 ctdb-recoverd: Fix node_pnn check and assignment of nodemap into array
This array is indexed by the same index as nodemap, not the PNN.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-08-18 05:02:25 +00:00
Martin Schwenke
10ce0dbf1c ctdb-recoverd: Add fail callback to assign banning credits
Also drop error handling in main_loop() that is replaced by this
change.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-08-18 05:02:25 +00:00
Martin Schwenke
a079ee3169 ctdb-recoverd: Add an intermediate state struct for nodemap fetching
This will allow an error callback to be added.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-08-18 05:02:25 +00:00
Martin Schwenke
2eaa0af616 ctdb-recoverd: Move memory allocation into get_remote_nodemaps()
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-08-18 05:02:25 +00:00
Martin Schwenke
3324dd272c ctdb-recoverd: Change signature of get_remote_nodemaps()
Change 1st argument to a rec context, since this will be needed later.
Drop the nodemap argument and access it via rec->nodemap instead.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-08-18 05:02:25 +00:00
Martin Schwenke
d2d90f2502 ctdb-recoverd: Fix a local memory leak
The memory is allocated off the memory context used by the current
iteration of main loop.  It is freed when main loop completes the fix
doesn't require backporting to stable branches.  However, it is sloppy
so it is worth fixing.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-08-18 05:02:25 +00:00
Martin Schwenke
52f520d39c ctdb-recoverd: Basic cleanups for get_remote_nodemaps()
Don't log an error on failure - let the caller can do this.  Apart
from this: fix up coding style and modernise the remaining error
message.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-08-18 05:02:25 +00:00
Martin Schwenke
5ce6133a75 ctdb-recoverd: Simplify calculation of new flags
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Fri Jul 24 06:03:23 UTC 2020 on sn-devel-184
2020-07-24 06:03:23 +00:00
Martin Schwenke
3654e41677 ctdb-recoverd: Correctly find nodemap entry for pnn
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-07-24 04:41:25 +00:00
Martin Schwenke
9475ab0441 ctdb-recoverd: Do not retrieve nodemap from recovery master
It is already in rec->nodemap.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-07-24 04:41:25 +00:00
Martin Schwenke
0c6a7db3ba ctdb-recoverd: Flatten update_flags_on_all_nodes()
The logic currently in ctdb_ctrl_modflags() will be optimised so that
it no longer matches the pattern for a control function.  So, remove
this function and squash its functionality into the only caller.

Although there are some superficial changes, the behaviour is
unchanged.

Flattening the 2 functions produces some seriously weird logic for
setting the new flags, to the point where using ctdb_ctrl_modflags()
for this purpose now looks very strange.  The weirdness will be
cleaned up in a subsequent commit.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-07-24 04:41:25 +00:00
Martin Schwenke
a88c10c5a9 ctdb-recoverd: Move ctdb_ctrl_modflags() to ctdb_recoverd.c
This file is the only user of this function.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-07-24 04:41:25 +00:00
Martin Schwenke
b1e631ff92 ctdb-recoverd: Improve a call to update_flags_on_all_nodes()
This should take a PNN, not an array index.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-07-24 04:41:25 +00:00
Martin Schwenke
915d24ac12 ctdb-recoverd: Use update_flags_on_all_nodes()
This is clearer than using the MODFLAGS control directly.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-07-24 04:41:25 +00:00
Martin Schwenke
f681c0e947 ctdb-recoverd: Introduce some local variables to improve readability
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-07-24 04:41:25 +00:00
Martin Schwenke
cb3a3147b7 ctdb-recoverd: Change update_flags_on_all_nodes() to take rec argument
This makes fields such as recmaster and nodemap easily available if
required.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-07-24 04:41:25 +00:00
Martin Schwenke
6982fcb3e6 ctdb-recoverd: Drop unused nodemap argument from update_flags_on_all_nodes()
An unused argument needlessly extends the length of function calls.  A
subsequent change will allow rec->nodemap to be used if necessary.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-07-24 04:41:25 +00:00
Volker Lendecke
ad4b53f2d9 ctdb: Fix a memleak
Bug: https://bugzilla.samba.org/show_bug.cgi?id=14348
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>

Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Fri Apr 17 08:32:35 UTC 2020 on sn-devel-184
2020-04-17 08:32:35 +00:00
Martin Schwenke
716f52f68b ctdb-recoverd: Avoid dereferencing NULL rec->nodemap
Inside the nested event loop in ctdb_ctrl_getnodemap(), various
asynchronous handlers may dereference rec->nodemap, which will be
NULL.

One example is lost_reclock_handler(), which causes rec->nodemap to be
unconditionally dereferenced in list_of_nodes() via this call chain:

  list_of_nodes()
  list_of_active_nodes()
  set_recovery_mode()
  force_election()
  lost_reclock_handler()

Instead of attempting to trace all of the cases, just avoid leaving
rec->nodemap set to NULL.  Attempting to use an old value is generally
harmless, especially since it will be the same as the new value in
most cases.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14324

Reported-by: Volker Lendecke <vl@samba.org>
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Tue Mar 24 01:22:45 UTC 2020 on sn-devel-184
2020-03-24 01:22:45 +00:00
Martin Schwenke
3a66d181b6 ctdb-recovery: Remove old code for creating missing databases
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-03-23 23:45:38 +00:00
Amitay Isaacs
c6427dddf5 ctdb-recoverd: No need for database detach handler
The only reason for recoverd attaching to databases was to migrate
records to the local node as part of vacuuming.  Recovery daemon does
not take part in database vacuuming any more.

The actual database recovery is handled via the recovery_helper and
recovery daemon should not need to attach to the databases any more.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2019-10-24 04:06:43 +00:00
Amitay Isaacs
fc81729dd2 ctdb-recoverd: Drop VACUUM_FETCH message handling
This is now implemented in the ctdb daemon using VACUMM_FETCH control.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2019-10-24 04:06:43 +00:00
Mathieu Parent
736bb924f7 Spelling fixes s/ ot / to /
Signed-off-by: Mathieu Parent <math.parent@gmail.com>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
Reviewed-by: Gary Lockyer <gary@catalyst.net.nz>
2019-09-01 22:21:27 +00:00
Martin Schwenke
8190993d99 ctdb-recoverd: Fix typo in previous fix
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14085

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Aug 27 15:29:11 UTC 2019 on sn-devel-184
2019-08-27 15:29:11 +00:00
Martin Schwenke
5d655ac6f2 ctdb-recoverd: Only check for LMASTER nodes in the VNN map
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14085

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2019-08-21 11:50:30 +00:00
Martin Schwenke
6fe963c3f7 ctdb-recoverd: Periodically log recovery master of incomplete cluster
Only do this if the recovery lock is unset.  Log every minute for the
first 10 minutes, then every 10 minutes, then every hour.

This is useful for determining whether a split brain occurred.  It is
particularly useful if logging failed or was throttled at startup, so
there is no evidence of the split brain when it began.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2019-07-26 03:34:16 +00:00
Martin Schwenke
f2559ef8ce ctdb-recoverd: Log the master at the end of elections
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2019-07-26 03:34:16 +00:00
Martin Schwenke
35368d871d ctdb-recovery: Avoid -1 as a PNN, use CTDB_UNKNOWN_PNN instead
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2019-06-05 10:25:50 +00:00
Martin Schwenke
978c7dbd55 ctdb-recovery: Fix signed/unsigned comparison by casting
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2019-06-05 10:25:50 +00:00
Martin Schwenke
fa7bd35b6a ctdb-recovery: Fix signed/unsigned comparisons by declaring as unsigned
Simple cases where variables need to be declared as an unsigned type
instead of an int.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2019-06-05 10:25:50 +00:00