1
0
mirror of https://github.com/samba-team/samba.git synced 2024-12-22 13:34:15 +03:00
Commit Graph

8986 Commits

Author SHA1 Message Date
Vinit Agnihotri
794f125802 ctdb-tool: Add UNKNOWN pseudo state
When a node is starting, CTDB reports remote nodes as unhealthy by
default.  This can be misleading.

To hide this, report an "UNKNOWN" pseudo state when a remote node is
not disconnected and the runstate is less than or equal to
"FIRST_RECOVERY".

Signed-off-by: Vinit Agnihotri <vagnihotri@ddn.com>
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-06-28 09:24:31 +00:00
Vinit Agnihotri
428bc71f98 ctdb-tests: Add runstate handling to fake ctdbd
Signed-off-by: Vinit Agnihotri <vagnihotri@ddn.com>
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-06-28 09:24:31 +00:00
Martin Schwenke
05601cebc9 ctdb-tests: Return error on empty fake ctdbd configuration blocks
These would be unintended errors.  The block should be omitted to keep
the default value.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-06-28 09:24:31 +00:00
Martin Schwenke
80ba66013e ctdb-scripts: Drop use of eval in CTDB callout handling
eval is not required and causes the follow ShellCheck warning:

  SC2294 (warning): eval negates the benefit of arrays. Drop eval to
  preserve whitespace/symbols (or eval as string).

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri Jun 24 10:40:50 UTC 2022 on sn-devel-184
2022-06-24 10:40:50 +00:00
Martin Schwenke
4cbb0b13ba ctdb-tests: Do not require eval tricks for faking NFS callout
The current code requires the use of eval in the NFS callout handling
to facilitate testing.  Improve the code to remove this need.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-06-24 09:49:33 +00:00
Martin Schwenke
0247fd8a02 ctdb-scripts: Avoid ShellCheck warning SC2162
SC2162 read without -r will mangle backslashes

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-06-24 09:49:33 +00:00
Martin Schwenke
7f799a8d6f ctdb-tests: Fix faking of program stack traces
The current code works in all current cases but is lazy and wrong.
Fix it to avoid breaking on code changes involving different thread
setups.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-06-24 09:49:33 +00:00
Martin Schwenke
0b728a4e8f ctdb-tests: Improve Debian-style event script unit testing
Tests can be run by hand using different distro styles, such as:

  CTDB_NFS_DISTRO_STYLE=systemd-debian \
    ./tests/run_tests.sh ./tests/UNIT/eventscripts/{06,60}.nfs.*

This fixes known problems for Debian styles, so the tests now pass for
the following values of CTDB_NFS_DISTRO_STYLE:

  systemd-redhat
  sysvinit-redhat
  systemd-debian
  sysvinit-debian

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-06-24 09:49:33 +00:00
Martin Schwenke
7f3a0c7e9c ctdb-scripts: Parameterise /etc directory to aid testing
At the moment test results can be influenced by real system
configuration files.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-06-24 09:49:32 +00:00
Martin Schwenke
337ef7c1b4 ctdb-scripts: Set NFS services to "AUTO" if started by another service
For example, in Sys-V init "rquotad" is started by the main "nfs"
service.  At the moment the call-out can't distinguish between this
case and "should never be run".  Services set to "AUTO" are
hand-stopped/started via service_stop()/service_start() on failure via
restart_after.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-06-24 09:49:32 +00:00
Martin Schwenke
8b8660d883 ctdb-scripts: Refactor the manual RPC service start/stop
This logic needs improving, so factor the decision making into new
functions service_or_manual_stop() and service_or_manual_start().

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-06-24 09:49:32 +00:00
Martin Schwenke
cd018d0ff5 ctdb-scripts: Simplify and rename basic_stop() and basic_start()
Drop the argument.  These now just stop/start the overall NFS service,
so rename them appropriately.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-06-24 09:49:32 +00:00
Martin Schwenke
09fd1e5579 ctdb-scripts: Move nfslock out of basic_stop() and basic_start()
These are only called in one place and should be done inline, since
that is less confusing.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-06-24 09:49:32 +00:00
Martin Schwenke
a43a1ebe51 ctdb-tests: Reformat script
Samba is reformatting shell scripts using

  shfmt -w -p -i 0 -fn

so update this one before editing.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-06-24 09:49:32 +00:00
Martin Schwenke
e752f841e6 ctdb-daemon: Use DEBUG() macro for child logging
Directly using dbgtext() with file logging results in a log entry with
no header, which is wrong.  This is a regression, introduced in commit
10d15c9e5d.  Prior to this, CTDB's
callback for file logging would always add a header.

Use DEBUG() instead dbgtext().  Note that DEBUG() effectively compares
the passed script_log_level with DEBUGLEVEL, so an explicit check is
no longer necessary.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=15090

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Volker Lendecke <vl@samba.org>

Autobuild-User(master): Volker Lendecke <vl@samba.org>
Autobuild-Date(master): Thu Jun 16 13:33:10 UTC 2022 on sn-devel-184
2022-06-16 13:33:10 +00:00
Martin Schwenke
88f35cf862 ctdb-daemon: Drop unused prefix, logfn, logfn_private
These aren't set anywhere in the code.

Drop the log argument because it is also no longer used.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=15090

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Volker Lendecke <vl@samba.org>
2022-06-16 12:42:35 +00:00
Martin Schwenke
1596a3e84b ctdb-common: Tell file logging not to redirect stderr
This allows ctdb_set_child_logging() to work.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=15090

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Volker Lendecke <vl@samba.org>
2022-06-16 12:42:35 +00:00
Martin Schwenke
b20ee18031 ctdb-tests: Fix a cut and paste error in a comment
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue May 31 05:56:43 UTC 2022 on sn-devel-184
2022-05-31 05:56:43 +00:00
Martin Schwenke
90a96f06a9 ctdb-recoverd: Do not ban on unknown error when taking cluster lock
If the cluster filesystem is unavailable then I/O errors may occur.
This is no worse than contention, so don't ban.  This avoids having
services unavailable for longer than necessary.

Update the associated test to simply confirm that this results in a
leaderless cluster, and leadership is restored when the lock can once
again be taken.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-05-31 05:06:29 +00:00
Martin Schwenke
a400f4e7cc ctdb-doc: Fix typos in the policy routing documentation
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-05-31 05:06:29 +00:00
Martin Schwenke
da9decfc5e ctdb-daemon: Remove unused #includes of rb_tree.h
ctdb_takeover.c and eventscript.c no longer use this.
ipalloc_common.c has never used it.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-05-31 05:06:29 +00:00
Martin Schwenke
80de84d36e ctdb-daemon: Log per-database summary of resent calls
After a recovery that takes a significant amount of time the logs are
flooded with messages about every resent call.

Log a summary instead and demote per-call messages to INFO level.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-05-31 05:06:29 +00:00
Pavel Filipenský
8cb6565011 ctdb: Covscan: unchecked return value for trbt_traversearray32()
Signed-off-by: Pavel Filipenský <pfilipen@redhat.com>
Reviewed-by: Jeremy Allison <jra@samba.org>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
2022-05-14 03:49:32 +00:00
Pavel Filipenský
91d1d0e4c8 ctdb: Fix trailing whitespace in rb_tree.c
Signed-off-by: Pavel Filipenský <pfilipen@redhat.com>
Reviewed-by: Jeremy Allison <jra@samba.org>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
2022-05-14 03:49:32 +00:00
Martin Schwenke
64275fc1a2 ctdb-tests: Add backtrace on abort to some tests
These are easier to debug with a backtrace.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue May  3 10:13:23 UTC 2022 on sn-devel-184
2022-05-03 10:13:23 +00:00
Martin Schwenke
d39377d6fc ctdb-tests: Provide a method to dump the stack on abort
Some tests make generous use of assert() and it can be difficult to
guess the cause of failures without resorting to GDB.  This provides
some help.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-05-03 09:19:31 +00:00
Martin Schwenke
73b27def7b build: Add missing ctdb-client dependencies
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-05-03 09:19:31 +00:00
Martin Schwenke
d57d624a77 ctdb-build: Drop unnecessary uses of include/ sub-directory
None of these include any files from the include/ sub-directory.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-05-03 09:19:31 +00:00
Martin Schwenke
6d3c9e64d9 ctdb-tests: Use test_case() to help document test cases
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-05-03 09:19:31 +00:00
Martin Schwenke
d52b497d11 ctdb-locking: Don't pass NULL to tevent_req_is_unix_error()
If there is an error then this pointer is unconditionally
dereferenced.

However, the only possible error appears to be ENOMEM, where a crash
caused by dereferencing a NULL pointer isn't a terrible outcome.  In
the absence of a security issue this is probably not worth
backporting.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-05-03 09:19:31 +00:00
Martin Schwenke
490e5f4d4c ctdb-mutex: Don't pass NULL to tevent_req_is_unix_error()
If there is an error then this pointer is unconditionally
dereferenced.

However, the only possible error appears to be ENOMEM, where a crash
caused by dereferencing a NULL pointer isn't a terrible outcome.  In
the absence of a security issue this is probably not worth
backporting.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-05-03 09:19:31 +00:00
Martin Schwenke
8deec3bc67 ctdb-scripts: Drop unused ctdbd_wrapper
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-04-06 06:34:37 +00:00
Martin Schwenke
a1e78cc372 ctdb-scripts: Drop uses of ctdbd_wrapper
The only value this now provides is use of a notification script to
log when start/stop are called.  This was used for debugging strange
start/stop failures, which have not been recently seen.  Also, systemd
does a good job of logging start/stop.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-04-06 06:34:37 +00:00
Martin Schwenke
aca5972233 ctdb-scripts: Remove failsafe that drops all IPs on failed shutdown
IPs are dropped in the shutdown event.

If a watchdog is necessary to ensure public IPs aren't on interfaces
when CTDB isn't running, then see ctdb-crash-cleanup.sh.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-04-06 06:34:37 +00:00
Martin Schwenke
6fb08a6580 ctdb-daemon: Don't release all public IPs during shutdown sequence
This further untangles public IP handling from the main daemon.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-04-06 06:34:37 +00:00
Martin Schwenke
cb438ecfd4 ctdb-scripts: Drop all public IPs in the "shutdown" event
This is functionally the same as ctdb_release_all_ips().

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-04-06 06:34:37 +00:00
Martin Schwenke
3caddaafa0 ctdb-config: Drop CTDB_STARTUP_TIMEOUT
This was added to be able to notice startup failures when unknown
tunables were present in the configuration.  Tunables are now set by
the daemon, so this is no longer necessary.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-04-06 06:34:37 +00:00
Martin Schwenke
208034ecfe ctdb-doc: Update documentation for tunables configuration
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-04-06 06:34:37 +00:00
Martin Schwenke
0902553d15 ctdb-scripts: No longer load tunables via 00.ctdb.script setup event
Drop related tests.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-04-06 06:34:37 +00:00
Martin Schwenke
f49446cb1e ctdb-daemon: Load tunables from ctdb.tunables
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-04-06 06:34:37 +00:00
Martin Schwenke
a509ee059e ctdb-daemon: New function ctdb_tunables_load()
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-04-06 06:34:37 +00:00
Martin Schwenke
b14f2a205d ctdb-tests: Add unit tests for tunables code
This aims to test ctdb_tunable_load_file() but also exercises
ctdb_tunable_names() and ctdb_tunable_get_value().
ctdb_tunable_set_value() is indirectly exercised via
ctdb_tunable_load_file().

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-04-06 06:34:37 +00:00
Martin Schwenke
381134939b ctdb-tests: Add function test_case(), tweak unit test header format
Instead of documenting test cases with a comment, this allows them to
be documented via an argument to a function that is printed when the
test case is run.  This makes it easier locate test case failures when
commands used by test cases look similar,

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-04-06 06:34:37 +00:00
Martin Schwenke
c413838f79 ctdb-tests: Strip trailing newlines from expected result output
This allows the provided output to be specified a little more
carelessly.  As per the comment, trailing newlines can't be matched
anyway, so this is notionally a bug fix.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-04-06 06:34:37 +00:00
Martin Schwenke
5fa0c86b61 ctdb-tests: Reformat script
Samba is reformatting shell scripts using

  shfmt -w -p -i 0 -fn

so update this one before editing.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-04-06 06:34:37 +00:00
Martin Schwenke
bcd66e17ee ctdb-common: Add function ctdb_tunable_load_file()
Allows direct loading of tunables from a file.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-04-06 06:34:37 +00:00
Vinit Agnihotri
93824b8c33 packaging: move CTDB service file to top-level
Signed-off-by: Vinit Agnihotri <vagnihotri@ddn.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-04-06 06:34:37 +00:00
Martin Schwenke
2f6b31788b ctdb-packaging: Move RPM spec file to examples directory
We used to use this for building test packages for standalone CTDB.
However, our testing has now changed to use binary tarballs.  We
believe we were the only users of this spec file and expect CTDB to
only be installed as part of a top-level Samba build, especially in
RPM form.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-04-06 06:34:37 +00:00
Stefan Metzmacher
aa02cf3c44 ctdb/packaging/RPM: don't use waf directly
./configure && make && make install is will always work.

Signed-off-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2022-03-29 22:32:32 +00:00
Stefan Metzmacher
22c46d9f41 configure/Makefile: export PYTHONHASHSEED=1 in all 'configure/Makefile' scripts
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2022-03-29 22:32:32 +00:00
Archana
7debfe7a23 ctdb-tools: Remove deprecated networking commands and replace with new commands
The changes are made to replace the deprecated network commands
(ifconfig,netstat) with the new commands
(ip addr,ss) respectively

Signed-off-by: Archana Chidirala <archana.chidirala.chidirala@ibm.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>

Autobuild-User(master): Volker Lendecke <vl@samba.org>
Autobuild-Date(master): Tue Mar  8 12:30:53 UTC 2022 on sn-devel-184
2022-03-08 12:30:53 +00:00
Archana
e16cd0316f ctdb-packaging: Remove deprecated networking command netstat and replace with "ss" command
Signed-off-by: Archana Chidirala <archana.chidirala.chidirala@ibm.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2022-03-08 11:32:36 +00:00
Martin Schwenke
0d8084ed62 ctdb-protocol: CID 1499395: Uninitialized variables (UNINIT)
Issue is reported here:

853     	case CTDB_CONTROL_DB_VACUUM: {
854     		struct ctdb_db_vacuum db_vacuum;
855
>>>     CID 1499395:  Uninitialized variables  (UNINIT)
>>>     Using uninitialized value "db_vacuum.full_vacuum_run" when calling "ctdb_db_vacuum_len".
856     		CHECK_CONTROL_DATA_SIZE(ctdb_db_vacuum_len(&db_vacuum));
857     		return ctdb_control_db_vacuum(ctdb, c, indata, async_reply);
858     	}

The problem is that ctdb_bool_len() unnecessarily dereferences its
argument, which in this case is &db_vacuum.full_vacuum_run.  Not a
security issue because the value copied by dereferencing is not used.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Wed Feb 23 02:02:06 UTC 2022 on sn-devel-184
2022-02-23 02:02:06 +00:00
Martin Schwenke
0f373443ef ctdb-tests: Fix missing #include for sigaction(2)
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-02-23 01:08:37 +00:00
Martin Schwenke
ef9017a150 ctdb-tests: Dump a stack trace on abort
Debugging a test failure here without GDB is not possible.  Dumping a
stack trace gives a good hint.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-02-23 01:08:37 +00:00
Martin Schwenke
17d792e9aa ctdb-tests: Iterate protocol tests internally
Instead of repeatedly running a test binary.

Run time for these tests reduces from ~90s to ~75s.

When run under valgrind, the run time for protocol_test_001.sh reduces
from ~390s to <1s.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Mon Feb 14 04:32:29 UTC 2022 on sn-devel-184
2022-02-14 04:32:29 +00:00
Martin Schwenke
2329305019 ctdb-tests: Add iteration support for protocol tests
The current method of repeatedly running a binary has huge overhead,
especially with valgrind.

protocol_test_iterate_tag() allows output that is usually used for
hinting where a test failure occurred to be replaced with a tag
stored in a buffer, which is printed on test failure.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-02-14 03:36:38 +00:00
Martin Schwenke
331c435ce5 ctdb-tests: Add a test for stalled node triggering election
A stalled node probably continues to hold the cluster lock, so confirm
elections work in this case.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14958

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Mon Feb 14 02:46:01 UTC 2022 on sn-devel-184
2022-02-14 02:46:01 +00:00
Martin Schwenke
265e44abc4 ctdb-tests: Factor out functions to detect when generation changes
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14958

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-02-14 01:47:31 +00:00
Martin Schwenke
0e74e03c9c ctdb-recoverd: Consistently log start of election
Elections should now be quite rare, so always log when one begins.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14958

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-02-14 01:47:31 +00:00
Martin Schwenke
bf55a0117d ctdb-recoverd: Always send unknown leader broadcast when starting election
This is currently missed when the cluster lock is lost.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14958

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-02-14 01:47:31 +00:00
Martin Schwenke
9b3fab052b ctdb-recoverd: Consistently have caller set election-in-progress
The problem here is that election-in-progress must be set to
potentially avoid restarting the election broadcast timeout in
main_loop(), so this is already done by leader_handler().

Have force_election() set election-in-progress for all election types
and do not bother setting it in cluster_lock_election().

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14958

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-02-14 01:47:31 +00:00
Martin Schwenke
188a902156 ctdb-recoverd: Always cancel election in progress
Election-in-progress is set by unknown leader broadcast, so needs to
be cleared in all cases when election completes.

This was seen in a case where the leader node stalled, so didn't send
leader broadcasts for some time.  The node continued to hold the
cluster lock, so another node could not become leader.  However, after
the node returned to normal it still did not send leader broadcasts
because election-in-progress was never cleared.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14958

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-02-14 01:47:31 +00:00
Martin Schwenke
f7de2132bb ctdb-doc: Remove documentation for recovery process
This is many years out of date and recent changes make it worse.  It
is unlikely that anyone has the time to fix this in the near future,
so remove it because it is misleading.

Database recovery steps are well documented in comments in the
recovery helper.  Cluster monitoring documentation can be re-added
when things stop changing.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
a940ad9370 ctdb-doc: Update example configuration migration script
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
01313ea243 ctdb-tests: Improve test coverage for leader role yield and elections
Rename test, clean up node selection.  Duplicate for for banning and
removing leader capability cases.  Repeat all 3 tests without cluster
lock.

All of the standard election triggers are now tested, with and without
cluster lock.  Due to test cluster configuration limitations, the
tests without cluster lock are skipped on a real cluster.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
5d31778149 ctdb-tests: Support commenting out local daemons configuration options
Can be used to disable default options, such as cluster lock.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
34d2ca0ae6 ctdb-config: Add configuration option [cluster] leader timeout
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
1dfb266038 ctdb-config: [legacy] recmaster capability -> [cluster] leader capability
Rename this configuration item and move it into the [cluster]
configuration section.

Update documentation to match.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
f5a39058f0 ctdb-config: [cluster] recovery lock -> [cluster] cluster lock
Retain "recovery lock" and mark as deprecated for backward
compatibility.

Some documentation is still inconsistent.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
d752a92e11 ctdb-doc: Update documentation for leader and cluster lock
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
73555e8248 ctdb-recoverd: Use race for cluster lock as election when lock is enabled
If the cluster is partitioned then nodes in one partition can not take
the lock anyway, so election is pointless.  It just introduces
unnecessary corner cases.

Instead just race for the lock.

When a node notices a lack of leader and notifies other nodes of an
election via an unknown leader broadcast, the cluster lock election is
hooked into this broadcast.

The test needs to be updated because losing the cluster lock can now
result in a leadership change.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
938d64c8ff ctdb-protocol: Mark {GET,SET}_RECMASTER controls obsolete
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
03ae158cff ctdb-protocol: Drop marshalling for {GET,SET}_RECMASTER controls
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
a76374070d ctdb-daemon: Drop implementation of {GET,SET}_RECMASTER controls
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
193b624d26 ctdb-protocol: Drop protocol client functions for recmaster controls
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
cda673ff6d ctdb-client: Drop unused recmaster functions
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
16efbca003 ctdb-daemon: Drop unused old client recmaster functions
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
c68267b2a6 ctdb-recoverd: Drop calls to ctdb_ctrl_setrecmaster()
Nothing fetches this value anymore.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
58d7fcdf7c ctdb-recoverd: Drop recovery master verification
This doesn't make sense if leader broadcasts are used.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
f02e097485 ctdb-tools: recovery master -> leader
The following command names are changed:

  recmaster -> leader
  setrecmasterrole -> setleaderrole

Command output changed for the following commands:

  status
  getcapabilities

Documentation and tests are updated to reflect these changes.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
e60581d5b5 ctdb-tools: Use leader broadcast in get_leader()
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
92fb68e9b8 ctdb-tools: Factor out get_leader()
This seems pointless but it localises a subsequent change and also
starts a terminology change in the tool code.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
17ba15ccd8 ctdb-tools: Handle leader broadcasts in ctdb tool
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
ec90f36cc6 ctdb-tools: Print "UNKNOWN" when leader PNN is unknown
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
01a8d1a4a4 ctdb-client: Factor out function ctdb_client_wait_func_timeout()
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
403db5b528 ctdb-tests: Factor out getting leader and waiting for leader change
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
4786982cc8 ctdb-tests: Add leader broadcasts to fake_ctdbd
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Amitay Isaacs
756dfdfed9 ctdb-tests: Implement srvid_handler for dispatching messages
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2022-01-17 10:21:33 +00:00
Martin Schwenke
958746f947 ctdb-recoverd: Simplify some stopped/banned checks to inactive checks
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
358c59f51a ctdb-recoverd: No longer take cluster lock during recovery
Confirm instead that it is already held.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
36ffaaa691 ctdb-recoverd: Add and use function cluster_lock_enabled()
Now all references to ctdb->recovery_lock are encapsulated in the
cluster lock code.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
5ee664ee17 ctdb-recoverd: Terminology change: recovery lock -> cluster lock
No functional changes, just name changes for clarity.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
0f2250f4f9 ctdb-recoverd: Take cluster lock when election completes
It is no longer just a recovery lock but is always held by the cluster
leader.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
011e880002 ctdb-recoverd: Factor out function cluster_lock_take()
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
037abf8620 ctdb-tests: Avoid a race
See the comment in the code for details.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
ef7e3265f7 ctdb-tests: Setup cluster with expected arguments
ctdb_test_init() doesn't actually pass arguments to local_daemons.sh.
This needs to be done using ctdb_nodes_start_custom().

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
b029ca4d51 ctdb-recoverd: Drop leader validation
The introduction of the leader broadcast timeout provides an
alternative to the current leader validation.  Using the leader
broadcast may not be as fast but it is more correct.

When the leader node is stopped or banned, the only way of triggering
an election is currently to fetch the leader's node map to check
whether the it is still active.  This is because the leader will no
longer push the node map to other nodes.  However, having all nodes
fetch the node map from an inactive leader may be unreliable.

Most of the other cases are also handled more reliably by the leader
broadcast timeout.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
7e53fab0a3 ctdb-recoverd: Drop special case for elected-before-connected
This no longer occurs at startup due to the leader broadcast timeout.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
ef4b8c13c0 ctdb-recoverd: Handle leader broadcast timeout
If no leader broadcasts have been received from the leader for more
than 5s then trigger an election.

Apart from being sane behaviour, this avoids elected-before-connected
bugs at startup, where a node elects itself leader before it is
connected to other nodes.

When a node processes a leader broadcast timeout it sends an unknown
leader broadcast to all nodes.  That causes cancellation of the leader
broadcast timeout across the cluster.  This is particular important at
startup, since nodes may be started in a staggered fashion.  Without
this cluster-wide cancellation, a node might notice the lack of
leader, win an election and complete a recovery before other nodes
notice the lack of leader.  When the leader broadcast timeout finally
occurs on the other nodes then they'll put the cluster back into an
unnecessary recovery.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00