1
0
mirror of https://github.com/samba-team/samba.git synced 2024-12-23 17:34:34 +03:00
Commit Graph

518 Commits

Author SHA1 Message Date
Martin Schwenke
a88c10c5a9 ctdb-recoverd: Move ctdb_ctrl_modflags() to ctdb_recoverd.c
This file is the only user of this function.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-07-24 04:41:25 +00:00
Martin Schwenke
b1e631ff92 ctdb-recoverd: Improve a call to update_flags_on_all_nodes()
This should take a PNN, not an array index.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-07-24 04:41:25 +00:00
Martin Schwenke
915d24ac12 ctdb-recoverd: Use update_flags_on_all_nodes()
This is clearer than using the MODFLAGS control directly.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-07-24 04:41:25 +00:00
Martin Schwenke
f681c0e947 ctdb-recoverd: Introduce some local variables to improve readability
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-07-24 04:41:25 +00:00
Martin Schwenke
cb3a3147b7 ctdb-recoverd: Change update_flags_on_all_nodes() to take rec argument
This makes fields such as recmaster and nodemap easily available if
required.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-07-24 04:41:25 +00:00
Martin Schwenke
6982fcb3e6 ctdb-recoverd: Drop unused nodemap argument from update_flags_on_all_nodes()
An unused argument needlessly extends the length of function calls.  A
subsequent change will allow rec->nodemap to be used if necessary.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-07-24 04:41:25 +00:00
Volker Lendecke
ad4b53f2d9 ctdb: Fix a memleak
Bug: https://bugzilla.samba.org/show_bug.cgi?id=14348
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>

Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Fri Apr 17 08:32:35 UTC 2020 on sn-devel-184
2020-04-17 08:32:35 +00:00
Martin Schwenke
716f52f68b ctdb-recoverd: Avoid dereferencing NULL rec->nodemap
Inside the nested event loop in ctdb_ctrl_getnodemap(), various
asynchronous handlers may dereference rec->nodemap, which will be
NULL.

One example is lost_reclock_handler(), which causes rec->nodemap to be
unconditionally dereferenced in list_of_nodes() via this call chain:

  list_of_nodes()
  list_of_active_nodes()
  set_recovery_mode()
  force_election()
  lost_reclock_handler()

Instead of attempting to trace all of the cases, just avoid leaving
rec->nodemap set to NULL.  Attempting to use an old value is generally
harmless, especially since it will be the same as the new value in
most cases.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14324

Reported-by: Volker Lendecke <vl@samba.org>
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Tue Mar 24 01:22:45 UTC 2020 on sn-devel-184
2020-03-24 01:22:45 +00:00
Martin Schwenke
3a66d181b6 ctdb-recovery: Remove old code for creating missing databases
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-03-23 23:45:38 +00:00
Amitay Isaacs
c6427dddf5 ctdb-recoverd: No need for database detach handler
The only reason for recoverd attaching to databases was to migrate
records to the local node as part of vacuuming.  Recovery daemon does
not take part in database vacuuming any more.

The actual database recovery is handled via the recovery_helper and
recovery daemon should not need to attach to the databases any more.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2019-10-24 04:06:43 +00:00
Amitay Isaacs
fc81729dd2 ctdb-recoverd: Drop VACUUM_FETCH message handling
This is now implemented in the ctdb daemon using VACUMM_FETCH control.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2019-10-24 04:06:43 +00:00
Mathieu Parent
736bb924f7 Spelling fixes s/ ot / to /
Signed-off-by: Mathieu Parent <math.parent@gmail.com>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
Reviewed-by: Gary Lockyer <gary@catalyst.net.nz>
2019-09-01 22:21:27 +00:00
Martin Schwenke
8190993d99 ctdb-recoverd: Fix typo in previous fix
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14085

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Aug 27 15:29:11 UTC 2019 on sn-devel-184
2019-08-27 15:29:11 +00:00
Martin Schwenke
5d655ac6f2 ctdb-recoverd: Only check for LMASTER nodes in the VNN map
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14085

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2019-08-21 11:50:30 +00:00
Martin Schwenke
6fe963c3f7 ctdb-recoverd: Periodically log recovery master of incomplete cluster
Only do this if the recovery lock is unset.  Log every minute for the
first 10 minutes, then every 10 minutes, then every hour.

This is useful for determining whether a split brain occurred.  It is
particularly useful if logging failed or was throttled at startup, so
there is no evidence of the split brain when it began.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2019-07-26 03:34:16 +00:00
Martin Schwenke
f2559ef8ce ctdb-recoverd: Log the master at the end of elections
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2019-07-26 03:34:16 +00:00
Martin Schwenke
35368d871d ctdb-recovery: Avoid -1 as a PNN, use CTDB_UNKNOWN_PNN instead
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2019-06-05 10:25:50 +00:00
Martin Schwenke
978c7dbd55 ctdb-recovery: Fix signed/unsigned comparison by casting
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2019-06-05 10:25:50 +00:00
Martin Schwenke
fa7bd35b6a ctdb-recovery: Fix signed/unsigned comparisons by declaring as unsigned
Simple cases where variables need to be declared as an unsigned type
instead of an int.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2019-06-05 10:25:50 +00:00
Martin Schwenke
6a2941e2a9 ctdb-recoverd: Fix memory leak
state is always freed before exiting this function, so allocate fde
off it instead of long-lived ctdb context.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13943

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2019-05-14 07:25:37 +00:00
Martin Schwenke
13a1a48089 ctdb-recoverd: Time out attempt to take recovery lock after 120s
Currently this will wait forever.  It really needs a timeout in case
the cluster filesystem (or other lock mechanism) is completely wedged.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13800

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2019-02-25 02:12:17 +01:00
Martin Schwenke
45a77d65b2 ctdb-recoverd: Ban node on unknown error when taking recovery lock
We really shouldn't see unknown errors.  They probably represent a
misconfigured recovery lock or similar.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13800

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2019-02-25 02:12:17 +01:00
Martin Schwenke
c0fb62ed39 ctdb-recoverd: Make recoverd context available in recovery lock handle
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13800

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2019-02-25 02:12:16 +01:00
Martin Schwenke
7e4aae6943 ctdb-recoverd: Clean up logging on failure to take recovery lock
Add an explicit case for a timeout and clean up the other messages.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13800

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2019-02-25 02:12:16 +01:00
Martin Schwenke
621658cbed ctdb-recoverd: Free cluster mutex handler on failure to take lock
If nested events occur while the file descriptor handler is still
active then chaos can ensue.  For example, if a node is banned and the
lock is explicitly cancelled (e.g. due to election loss) then
double-talloc-free()s abound.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13800

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2019-02-25 02:12:16 +01:00
Martin Schwenke
da8aaf2aee ctdb-recoverd: Call an election when the recovery lock is lost
The lock may have been lost due to a failure in the underlying locking
mechanism.  This could be due to quorum loss or similar.  It is best
to call an election to confirm that this node should still be master.
At worst, the node will reelect itself, fail to take the lock and then
ban itself.  This is a suitable outcome for a node that has been
partitioned from others in the cluster.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-12-18 02:02:03 +01:00
Andreas Schneider
2d512b278e debug: Use debuglevel_(get|set) function
Signed-off-by: Andreas Schneider <asn@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>

Autobuild-User(master): Andreas Schneider <asn@cryptomilk.org>
Autobuild-Date(master): Thu Nov  8 11:03:11 CET 2018 on sn-devel-144
2018-11-08 11:03:11 +01:00
Martin Schwenke
486022ef8f ctdb-recoverd: Set recovery lock handle at start of attempt
This allows the attempt to be cancelled if an election is lost and an
unlock is done before the attempt is completed.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Tue Sep 18 02:18:30 CEST 2018 on sn-devel-144
2018-09-18 02:18:30 +02:00
Martin Schwenke
b1dc568784 ctdb-recoverd: Handle cancellation when releasing recovery lock
If the recovery lock is in the process of being taken then free the
cluster mutex handle but leave the recovery lock handle in place.
This allows ctdb_recovery_lock() to fail.

Note that this isn't yet live because rec->recovery_lock_handle is
still only set at the completion of the attempt to take the lock.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-09-17 22:58:20 +02:00
Martin Schwenke
a755d060c1 ctdb-recoverd: Return early when the recovery lock is not held
This makes upcoming changes simpler.

Update to modern debug macro while touching relevant line.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-09-17 22:58:20 +02:00
Martin Schwenke
c52216740b ctdb-recoverd: Store recovery lock handle
... not just cluster mutex handle.

This makes the recovery lock handle long-lived and with allow the
releasing code to cancel an in-progress attempt to take the recovery
lock.

The cluster mutex handle is now allocated off the recovery lock
handle.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-09-17 22:58:20 +02:00
Martin Schwenke
a53b264aee ctdb-recoverd: Use talloc() to allocate recovery lock handle
At the moment this is still local and is freed after the mutex is
successfully taken.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-09-17 22:58:20 +02:00
Martin Schwenke
af22f03dbe ctdb-recoverd: Rename hold_reclock_state to ctdb_recovery_lock_handle
This will be a longer lived structure.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-09-17 22:58:20 +02:00
Martin Schwenke
c516e58ce9 ctdb-recoverd: Re-check master on failure to take recovery lock
If the master changed while trying to take the lock then fail gracefully.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-09-17 22:58:20 +02:00
Martin Schwenke
59fc01646c ctdb-recoverd: Clean up taking of recovery lock
No functional changes, just coding style cleanups and debug message
tweaks.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-09-17 22:58:20 +02:00
Martin Schwenke
929634126a ctdb-config: Switch tunable DisableIPFailover to a config option
Use the "failover:disabled" option instead.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13589

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-08-24 10:59:21 +02:00
Martin Schwenke
914e9f22d8 ctdb-daemon: Pass DisableIPFailover tunable via environment variable
Preparation for obsoleting this tunable.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13589

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-08-24 10:59:21 +02:00
Martin Schwenke
b318cf22ba ctdb-recoverd: Set the process name correctly
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-07-02 08:51:22 +02:00
Martin Schwenke
57834c64be ctdb-common: Rename system utility files
system_socket.[ch] will contain all the raw socket code and other
functions that use ctdb_sock_addr.  system.[ch] will contain other
platform dependent functions.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-07-02 08:51:20 +02:00
Amitay Isaacs
6e588913dd ctdb-recoverd: Abort recovery/takeover if recmaster changes
Recovery and takeover are run via helper from recovery daemon.  While the
helpers are running, it's possible for the current node to lose election.
If that happens, abort the currently running recovery/takeover helper.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2017-09-12 12:23:19 +02:00
Amitay Isaacs
1f7f112317 ctdb-client: Fix ctdb_attach() to use database flags
BUG: https://bugzilla.samba.org/show_bug.cgi?id=12978

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>

Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Fri Aug 25 13:32:58 CEST 2017 on sn-devel-144
2017-08-25 13:32:58 +02:00
Amitay Isaacs
9987fe7209 ctdb-client: Optionally return database id from ctdb_ctrl_createdb()
BUG: https://bugzilla.samba.org/show_bug.cgi?id=12978

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2017-08-25 09:41:26 +02:00
Amitay Isaacs
4bd0a20a75 ctdb-client: Fix ctdb_ctrl_createdb() to use database flags
BUG: https://bugzilla.samba.org/show_bug.cgi?id=12978

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2017-08-25 09:41:25 +02:00
Amitay Isaacs
ea91967b0d ctdb-client: Drop tdb_flags argument to ctdb_attach()
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2017-06-26 15:47:24 +02:00
Amitay Isaacs
ea46699b27 ctdb-recovery: Do not run local ip verification when in recovery
BUG: https://bugzilla.samba.org/show_bug.cgi?id=12857

If we drop public IPs because CTDB is in recovery for too long, then
avoid spamming logs "Trigger takeoverrun" every second.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2017-06-24 10:28:21 +02:00
Amitay Isaacs
2fd2ccd4c8 ctdb-recovery: Get recmode unconditionally in the main_loop
BUG: https://bugzilla.samba.org/show_bug.cgi?id=12857

This can be used later in the main_loop to avoid the local ip check.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2017-06-24 10:28:21 +02:00
Chris Lamb
f7dc9f1e12 Correct "supressed" typo.
Signed-off-by: Chris Lamb <chris@chris-lamb.co.uk>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
Reviewed-by: Garming Sam <garming@catalyst.net.nz>
2017-02-22 08:26:21 +01:00
Martin Schwenke
f2485d3ab9 ctdb-recoverd: Integrate takeover helper
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-12-19 04:07:08 +01:00
Martin Schwenke
5b60414265 ctdb-recoverd: Generalise helper state, handler and launching
These can also be used for takeover handler.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-12-19 04:07:08 +01:00
Amitay Isaacs
41c964fdbc ctdb-recovery: Start recovery helper with ctdb_vfork_exec
The recovery helper does it's own logging, so there is no need to
pass logfd.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>

Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Mon Dec  5 11:59:42 CET 2016 on sn-devel-144
2016-12-05 11:59:42 +01:00