samba-mirror

mirror of https://github.com/samba-team/samba.git synced 2024-12-22 13:34:15 +03:00

Author	SHA1	Message	Date
Vinit Agnihotri	794f125802	ctdb-tool: Add UNKNOWN pseudo state When a node is starting, CTDB reports remote nodes as unhealthy by default. This can be misleading. To hide this, report an "UNKNOWN" pseudo state when a remote node is not disconnected and the runstate is less than or equal to "FIRST_RECOVERY". Signed-off-by: Vinit Agnihotri <vagnihotri@ddn.com> Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-06-28 09:24:31 +00:00
Vinit Agnihotri	428bc71f98	ctdb-tests: Add runstate handling to fake ctdbd Signed-off-by: Vinit Agnihotri <vagnihotri@ddn.com> Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-06-28 09:24:31 +00:00
Martin Schwenke	05601cebc9	ctdb-tests: Return error on empty fake ctdbd configuration blocks These would be unintended errors. The block should be omitted to keep the default value. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-06-28 09:24:31 +00:00
Martin Schwenke	80ba66013e	ctdb-scripts: Drop use of eval in CTDB callout handling eval is not required and causes the follow ShellCheck warning: SC2294 (warning): eval negates the benefit of arrays. Drop eval to preserve whitespace/symbols (or eval as string). Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> Autobuild-User(master): Amitay Isaacs <amitay@samba.org> Autobuild-Date(master): Fri Jun 24 10:40:50 UTC 2022 on sn-devel-184	2022-06-24 10:40:50 +00:00
Martin Schwenke	4cbb0b13ba	ctdb-tests: Do not require eval tricks for faking NFS callout The current code requires the use of eval in the NFS callout handling to facilitate testing. Improve the code to remove this need. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-06-24 09:49:33 +00:00
Martin Schwenke	0247fd8a02	ctdb-scripts: Avoid ShellCheck warning SC2162 SC2162 read without -r will mangle backslashes Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-06-24 09:49:33 +00:00
Martin Schwenke	7f799a8d6f	ctdb-tests: Fix faking of program stack traces The current code works in all current cases but is lazy and wrong. Fix it to avoid breaking on code changes involving different thread setups. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-06-24 09:49:33 +00:00
Martin Schwenke	0b728a4e8f	ctdb-tests: Improve Debian-style event script unit testing Tests can be run by hand using different distro styles, such as: CTDB_NFS_DISTRO_STYLE=systemd-debian \ ./tests/run_tests.sh ./tests/UNIT/eventscripts/{06,60}.nfs.* This fixes known problems for Debian styles, so the tests now pass for the following values of CTDB_NFS_DISTRO_STYLE: systemd-redhat sysvinit-redhat systemd-debian sysvinit-debian Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-06-24 09:49:33 +00:00
Martin Schwenke	7f3a0c7e9c	ctdb-scripts: Parameterise /etc directory to aid testing At the moment test results can be influenced by real system configuration files. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-06-24 09:49:32 +00:00
Martin Schwenke	337ef7c1b4	ctdb-scripts: Set NFS services to "AUTO" if started by another service For example, in Sys-V init "rquotad" is started by the main "nfs" service. At the moment the call-out can't distinguish between this case and "should never be run". Services set to "AUTO" are hand-stopped/started via service_stop()/service_start() on failure via restart_after. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-06-24 09:49:32 +00:00
Martin Schwenke	8b8660d883	ctdb-scripts: Refactor the manual RPC service start/stop This logic needs improving, so factor the decision making into new functions service_or_manual_stop() and service_or_manual_start(). Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-06-24 09:49:32 +00:00
Martin Schwenke	cd018d0ff5	ctdb-scripts: Simplify and rename basic_stop() and basic_start() Drop the argument. These now just stop/start the overall NFS service, so rename them appropriately. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-06-24 09:49:32 +00:00
Martin Schwenke	09fd1e5579	ctdb-scripts: Move nfslock out of basic_stop() and basic_start() These are only called in one place and should be done inline, since that is less confusing. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-06-24 09:49:32 +00:00
Martin Schwenke	a43a1ebe51	ctdb-tests: Reformat script Samba is reformatting shell scripts using shfmt -w -p -i 0 -fn so update this one before editing. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-06-24 09:49:32 +00:00
Martin Schwenke	e752f841e6	ctdb-daemon: Use DEBUG() macro for child logging Directly using dbgtext() with file logging results in a log entry with no header, which is wrong. This is a regression, introduced in commit `10d15c9e5d`. Prior to this, CTDB's callback for file logging would always add a header. Use DEBUG() instead dbgtext(). Note that DEBUG() effectively compares the passed script_log_level with DEBUGLEVEL, so an explicit check is no longer necessary. BUG: https://bugzilla.samba.org/show_bug.cgi?id=15090 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Volker Lendecke <vl@samba.org> Autobuild-User(master): Volker Lendecke <vl@samba.org> Autobuild-Date(master): Thu Jun 16 13:33:10 UTC 2022 on sn-devel-184	2022-06-16 13:33:10 +00:00
Martin Schwenke	88f35cf862	ctdb-daemon: Drop unused prefix, logfn, logfn_private These aren't set anywhere in the code. Drop the log argument because it is also no longer used. BUG: https://bugzilla.samba.org/show_bug.cgi?id=15090 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Volker Lendecke <vl@samba.org>	2022-06-16 12:42:35 +00:00
Martin Schwenke	1596a3e84b	ctdb-common: Tell file logging not to redirect stderr This allows ctdb_set_child_logging() to work. BUG: https://bugzilla.samba.org/show_bug.cgi?id=15090 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Volker Lendecke <vl@samba.org>	2022-06-16 12:42:35 +00:00
Martin Schwenke	b20ee18031	ctdb-tests: Fix a cut and paste error in a comment Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> Autobuild-User(master): Amitay Isaacs <amitay@samba.org> Autobuild-Date(master): Tue May 31 05:56:43 UTC 2022 on sn-devel-184	2022-05-31 05:56:43 +00:00
Martin Schwenke	90a96f06a9	ctdb-recoverd: Do not ban on unknown error when taking cluster lock If the cluster filesystem is unavailable then I/O errors may occur. This is no worse than contention, so don't ban. This avoids having services unavailable for longer than necessary. Update the associated test to simply confirm that this results in a leaderless cluster, and leadership is restored when the lock can once again be taken. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-05-31 05:06:29 +00:00
Martin Schwenke	a400f4e7cc	ctdb-doc: Fix typos in the policy routing documentation Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-05-31 05:06:29 +00:00
Martin Schwenke	da9decfc5e	ctdb-daemon: Remove unused #includes of rb_tree.h ctdb_takeover.c and eventscript.c no longer use this. ipalloc_common.c has never used it. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-05-31 05:06:29 +00:00
Martin Schwenke	80de84d36e	ctdb-daemon: Log per-database summary of resent calls After a recovery that takes a significant amount of time the logs are flooded with messages about every resent call. Log a summary instead and demote per-call messages to INFO level. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-05-31 05:06:29 +00:00
Pavel Filipenský	8cb6565011	ctdb: Covscan: unchecked return value for trbt_traversearray32() Signed-off-by: Pavel Filipenský <pfilipen@redhat.com> Reviewed-by: Jeremy Allison <jra@samba.org> Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>	2022-05-14 03:49:32 +00:00
Pavel Filipenský	91d1d0e4c8	ctdb: Fix trailing whitespace in rb_tree.c Signed-off-by: Pavel Filipenský <pfilipen@redhat.com> Reviewed-by: Jeremy Allison <jra@samba.org> Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>	2022-05-14 03:49:32 +00:00
Martin Schwenke	64275fc1a2	ctdb-tests: Add backtrace on abort to some tests These are easier to debug with a backtrace. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> Autobuild-User(master): Amitay Isaacs <amitay@samba.org> Autobuild-Date(master): Tue May 3 10:13:23 UTC 2022 on sn-devel-184	2022-05-03 10:13:23 +00:00
Martin Schwenke	d39377d6fc	ctdb-tests: Provide a method to dump the stack on abort Some tests make generous use of assert() and it can be difficult to guess the cause of failures without resorting to GDB. This provides some help. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-05-03 09:19:31 +00:00
Martin Schwenke	73b27def7b	build: Add missing ctdb-client dependencies Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-05-03 09:19:31 +00:00
Martin Schwenke	d57d624a77	ctdb-build: Drop unnecessary uses of include/ sub-directory None of these include any files from the include/ sub-directory. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-05-03 09:19:31 +00:00
Martin Schwenke	6d3c9e64d9	ctdb-tests: Use test_case() to help document test cases Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-05-03 09:19:31 +00:00
Martin Schwenke	d52b497d11	ctdb-locking: Don't pass NULL to tevent_req_is_unix_error() If there is an error then this pointer is unconditionally dereferenced. However, the only possible error appears to be ENOMEM, where a crash caused by dereferencing a NULL pointer isn't a terrible outcome. In the absence of a security issue this is probably not worth backporting. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-05-03 09:19:31 +00:00
Martin Schwenke	490e5f4d4c	ctdb-mutex: Don't pass NULL to tevent_req_is_unix_error() If there is an error then this pointer is unconditionally dereferenced. However, the only possible error appears to be ENOMEM, where a crash caused by dereferencing a NULL pointer isn't a terrible outcome. In the absence of a security issue this is probably not worth backporting. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-05-03 09:19:31 +00:00
Martin Schwenke	8deec3bc67	ctdb-scripts: Drop unused ctdbd_wrapper Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-04-06 06:34:37 +00:00
Martin Schwenke	a1e78cc372	ctdb-scripts: Drop uses of ctdbd_wrapper The only value this now provides is use of a notification script to log when start/stop are called. This was used for debugging strange start/stop failures, which have not been recently seen. Also, systemd does a good job of logging start/stop. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-04-06 06:34:37 +00:00
Martin Schwenke	aca5972233	ctdb-scripts: Remove failsafe that drops all IPs on failed shutdown IPs are dropped in the shutdown event. If a watchdog is necessary to ensure public IPs aren't on interfaces when CTDB isn't running, then see ctdb-crash-cleanup.sh. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-04-06 06:34:37 +00:00
Martin Schwenke	6fb08a6580	ctdb-daemon: Don't release all public IPs during shutdown sequence This further untangles public IP handling from the main daemon. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-04-06 06:34:37 +00:00
Martin Schwenke	cb438ecfd4	ctdb-scripts: Drop all public IPs in the "shutdown" event This is functionally the same as ctdb_release_all_ips(). Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-04-06 06:34:37 +00:00
Martin Schwenke	3caddaafa0	ctdb-config: Drop CTDB_STARTUP_TIMEOUT This was added to be able to notice startup failures when unknown tunables were present in the configuration. Tunables are now set by the daemon, so this is no longer necessary. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-04-06 06:34:37 +00:00
Martin Schwenke	208034ecfe	ctdb-doc: Update documentation for tunables configuration Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-04-06 06:34:37 +00:00
Martin Schwenke	0902553d15	ctdb-scripts: No longer load tunables via 00.ctdb.script setup event Drop related tests. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-04-06 06:34:37 +00:00
Martin Schwenke	f49446cb1e	ctdb-daemon: Load tunables from ctdb.tunables Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-04-06 06:34:37 +00:00
Martin Schwenke	a509ee059e	ctdb-daemon: New function ctdb_tunables_load() Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-04-06 06:34:37 +00:00
Martin Schwenke	b14f2a205d	ctdb-tests: Add unit tests for tunables code This aims to test ctdb_tunable_load_file() but also exercises ctdb_tunable_names() and ctdb_tunable_get_value(). ctdb_tunable_set_value() is indirectly exercised via ctdb_tunable_load_file(). Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-04-06 06:34:37 +00:00
Martin Schwenke	381134939b	ctdb-tests: Add function test_case(), tweak unit test header format Instead of documenting test cases with a comment, this allows them to be documented via an argument to a function that is printed when the test case is run. This makes it easier locate test case failures when commands used by test cases look similar, Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-04-06 06:34:37 +00:00
Martin Schwenke	c413838f79	ctdb-tests: Strip trailing newlines from expected result output This allows the provided output to be specified a little more carelessly. As per the comment, trailing newlines can't be matched anyway, so this is notionally a bug fix. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-04-06 06:34:37 +00:00
Martin Schwenke	5fa0c86b61	ctdb-tests: Reformat script Samba is reformatting shell scripts using shfmt -w -p -i 0 -fn so update this one before editing. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-04-06 06:34:37 +00:00
Martin Schwenke	bcd66e17ee	ctdb-common: Add function ctdb_tunable_load_file() Allows direct loading of tunables from a file. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-04-06 06:34:37 +00:00
Vinit Agnihotri	93824b8c33	packaging: move CTDB service file to top-level Signed-off-by: Vinit Agnihotri <vagnihotri@ddn.com> Reviewed-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-04-06 06:34:37 +00:00
Martin Schwenke	2f6b31788b	ctdb-packaging: Move RPM spec file to examples directory We used to use this for building test packages for standalone CTDB. However, our testing has now changed to use binary tarballs. We believe we were the only users of this spec file and expect CTDB to only be installed as part of a top-level Samba build, especially in RPM form. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-04-06 06:34:37 +00:00
Stefan Metzmacher	aa02cf3c44	ctdb/packaging/RPM: don't use waf directly ./configure && make && make install is will always work. Signed-off-by: Stefan Metzmacher <metze@samba.org> Reviewed-by: Andrew Bartlett <abartlet@samba.org>	2022-03-29 22:32:32 +00:00
Stefan Metzmacher	22c46d9f41	configure/Makefile: export PYTHONHASHSEED=1 in all 'configure/Makefile' scripts Signed-off-by: Stefan Metzmacher <metze@samba.org> Reviewed-by: Andrew Bartlett <abartlet@samba.org>	2022-03-29 22:32:32 +00:00
Archana	7debfe7a23	ctdb-tools: Remove deprecated networking commands and replace with new commands The changes are made to replace the deprecated network commands (ifconfig,netstat) with the new commands (ip addr,ss) respectively Signed-off-by: Archana Chidirala <archana.chidirala.chidirala@ibm.com> Reviewed-by: Volker Lendecke <vl@samba.org> Reviewed-by: Martin Schwenke <martin@meltin.net> Autobuild-User(master): Volker Lendecke <vl@samba.org> Autobuild-Date(master): Tue Mar 8 12:30:53 UTC 2022 on sn-devel-184	2022-03-08 12:30:53 +00:00
Archana	e16cd0316f	ctdb-packaging: Remove deprecated networking command netstat and replace with "ss" command Signed-off-by: Archana Chidirala <archana.chidirala.chidirala@ibm.com> Reviewed-by: Volker Lendecke <vl@samba.org> Reviewed-by: Martin Schwenke <martin@meltin.net>	2022-03-08 11:32:36 +00:00
Martin Schwenke	0d8084ed62	ctdb-protocol: CID 1499395: Uninitialized variables (UNINIT) Issue is reported here: 853 case CTDB_CONTROL_DB_VACUUM: { 854 struct ctdb_db_vacuum db_vacuum; 855 >>> CID 1499395: Uninitialized variables (UNINIT) >>> Using uninitialized value "db_vacuum.full_vacuum_run" when calling "ctdb_db_vacuum_len". 856 CHECK_CONTROL_DATA_SIZE(ctdb_db_vacuum_len(&db_vacuum)); 857 return ctdb_control_db_vacuum(ctdb, c, indata, async_reply); 858 } The problem is that ctdb_bool_len() unnecessarily dereferences its argument, which in this case is &db_vacuum.full_vacuum_run. Not a security issue because the value copied by dereferencing is not used. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> Autobuild-User(master): Amitay Isaacs <amitay@samba.org> Autobuild-Date(master): Wed Feb 23 02:02:06 UTC 2022 on sn-devel-184	2022-02-23 02:02:06 +00:00
Martin Schwenke	0f373443ef	ctdb-tests: Fix missing #include for sigaction(2) Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-02-23 01:08:37 +00:00
Martin Schwenke	ef9017a150	ctdb-tests: Dump a stack trace on abort Debugging a test failure here without GDB is not possible. Dumping a stack trace gives a good hint. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-02-23 01:08:37 +00:00
Martin Schwenke	17d792e9aa	ctdb-tests: Iterate protocol tests internally Instead of repeatedly running a test binary. Run time for these tests reduces from ~90s to ~75s. When run under valgrind, the run time for protocol_test_001.sh reduces from ~390s to <1s. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> Autobuild-User(master): Amitay Isaacs <amitay@samba.org> Autobuild-Date(master): Mon Feb 14 04:32:29 UTC 2022 on sn-devel-184	2022-02-14 04:32:29 +00:00
Martin Schwenke	2329305019	ctdb-tests: Add iteration support for protocol tests The current method of repeatedly running a binary has huge overhead, especially with valgrind. protocol_test_iterate_tag() allows output that is usually used for hinting where a test failure occurred to be replaced with a tag stored in a buffer, which is printed on test failure. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-02-14 03:36:38 +00:00
Martin Schwenke	331c435ce5	ctdb-tests: Add a test for stalled node triggering election A stalled node probably continues to hold the cluster lock, so confirm elections work in this case. BUG: https://bugzilla.samba.org/show_bug.cgi?id=14958 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> Autobuild-User(master): Amitay Isaacs <amitay@samba.org> Autobuild-Date(master): Mon Feb 14 02:46:01 UTC 2022 on sn-devel-184	2022-02-14 02:46:01 +00:00
Martin Schwenke	265e44abc4	ctdb-tests: Factor out functions to detect when generation changes BUG: https://bugzilla.samba.org/show_bug.cgi?id=14958 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-02-14 01:47:31 +00:00
Martin Schwenke	0e74e03c9c	ctdb-recoverd: Consistently log start of election Elections should now be quite rare, so always log when one begins. BUG: https://bugzilla.samba.org/show_bug.cgi?id=14958 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-02-14 01:47:31 +00:00
Martin Schwenke	bf55a0117d	ctdb-recoverd: Always send unknown leader broadcast when starting election This is currently missed when the cluster lock is lost. BUG: https://bugzilla.samba.org/show_bug.cgi?id=14958 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-02-14 01:47:31 +00:00
Martin Schwenke	9b3fab052b	ctdb-recoverd: Consistently have caller set election-in-progress The problem here is that election-in-progress must be set to potentially avoid restarting the election broadcast timeout in main_loop(), so this is already done by leader_handler(). Have force_election() set election-in-progress for all election types and do not bother setting it in cluster_lock_election(). BUG: https://bugzilla.samba.org/show_bug.cgi?id=14958 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-02-14 01:47:31 +00:00
Martin Schwenke	188a902156	ctdb-recoverd: Always cancel election in progress Election-in-progress is set by unknown leader broadcast, so needs to be cleared in all cases when election completes. This was seen in a case where the leader node stalled, so didn't send leader broadcasts for some time. The node continued to hold the cluster lock, so another node could not become leader. However, after the node returned to normal it still did not send leader broadcasts because election-in-progress was never cleared. BUG: https://bugzilla.samba.org/show_bug.cgi?id=14958 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-02-14 01:47:31 +00:00
Martin Schwenke	f7de2132bb	ctdb-doc: Remove documentation for recovery process This is many years out of date and recent changes make it worse. It is unlikely that anyone has the time to fix this in the near future, so remove it because it is misleading. Database recovery steps are well documented in comments in the recovery helper. Cluster monitoring documentation can be re-added when things stop changing. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	a940ad9370	ctdb-doc: Update example configuration migration script Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	01313ea243	ctdb-tests: Improve test coverage for leader role yield and elections Rename test, clean up node selection. Duplicate for for banning and removing leader capability cases. Repeat all 3 tests without cluster lock. All of the standard election triggers are now tested, with and without cluster lock. Due to test cluster configuration limitations, the tests without cluster lock are skipped on a real cluster. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	5d31778149	ctdb-tests: Support commenting out local daemons configuration options Can be used to disable default options, such as cluster lock. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	34d2ca0ae6	ctdb-config: Add configuration option [cluster] leader timeout Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	1dfb266038	ctdb-config: [legacy] recmaster capability -> [cluster] leader capability Rename this configuration item and move it into the [cluster] configuration section. Update documentation to match. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	f5a39058f0	ctdb-config: [cluster] recovery lock -> [cluster] cluster lock Retain "recovery lock" and mark as deprecated for backward compatibility. Some documentation is still inconsistent. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	d752a92e11	ctdb-doc: Update documentation for leader and cluster lock Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	73555e8248	ctdb-recoverd: Use race for cluster lock as election when lock is enabled If the cluster is partitioned then nodes in one partition can not take the lock anyway, so election is pointless. It just introduces unnecessary corner cases. Instead just race for the lock. When a node notices a lack of leader and notifies other nodes of an election via an unknown leader broadcast, the cluster lock election is hooked into this broadcast. The test needs to be updated because losing the cluster lock can now result in a leadership change. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	938d64c8ff	ctdb-protocol: Mark {GET,SET}_RECMASTER controls obsolete Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	03ae158cff	ctdb-protocol: Drop marshalling for {GET,SET}_RECMASTER controls Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	a76374070d	ctdb-daemon: Drop implementation of {GET,SET}_RECMASTER controls Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	193b624d26	ctdb-protocol: Drop protocol client functions for recmaster controls Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	cda673ff6d	ctdb-client: Drop unused recmaster functions Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	16efbca003	ctdb-daemon: Drop unused old client recmaster functions Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	c68267b2a6	ctdb-recoverd: Drop calls to ctdb_ctrl_setrecmaster() Nothing fetches this value anymore. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	58d7fcdf7c	ctdb-recoverd: Drop recovery master verification This doesn't make sense if leader broadcasts are used. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	f02e097485	ctdb-tools: recovery master -> leader The following command names are changed: recmaster -> leader setrecmasterrole -> setleaderrole Command output changed for the following commands: status getcapabilities Documentation and tests are updated to reflect these changes. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	e60581d5b5	ctdb-tools: Use leader broadcast in get_leader() Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	92fb68e9b8	ctdb-tools: Factor out get_leader() This seems pointless but it localises a subsequent change and also starts a terminology change in the tool code. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	17ba15ccd8	ctdb-tools: Handle leader broadcasts in ctdb tool Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	ec90f36cc6	ctdb-tools: Print "UNKNOWN" when leader PNN is unknown Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	01a8d1a4a4	ctdb-client: Factor out function ctdb_client_wait_func_timeout() Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	403db5b528	ctdb-tests: Factor out getting leader and waiting for leader change Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	4786982cc8	ctdb-tests: Add leader broadcasts to fake_ctdbd Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Amitay Isaacs	756dfdfed9	ctdb-tests: Implement srvid_handler for dispatching messages Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2022-01-17 10:21:33 +00:00
Martin Schwenke	958746f947	ctdb-recoverd: Simplify some stopped/banned checks to inactive checks Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	358c59f51a	ctdb-recoverd: No longer take cluster lock during recovery Confirm instead that it is already held. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	36ffaaa691	ctdb-recoverd: Add and use function cluster_lock_enabled() Now all references to ctdb->recovery_lock are encapsulated in the cluster lock code. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	5ee664ee17	ctdb-recoverd: Terminology change: recovery lock -> cluster lock No functional changes, just name changes for clarity. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	0f2250f4f9	ctdb-recoverd: Take cluster lock when election completes It is no longer just a recovery lock but is always held by the cluster leader. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	011e880002	ctdb-recoverd: Factor out function cluster_lock_take() Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:33 +00:00
Martin Schwenke	037abf8620	ctdb-tests: Avoid a race See the comment in the code for details. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:32 +00:00
Martin Schwenke	ef7e3265f7	ctdb-tests: Setup cluster with expected arguments ctdb_test_init() doesn't actually pass arguments to local_daemons.sh. This needs to be done using ctdb_nodes_start_custom(). Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:32 +00:00
Martin Schwenke	b029ca4d51	ctdb-recoverd: Drop leader validation The introduction of the leader broadcast timeout provides an alternative to the current leader validation. Using the leader broadcast may not be as fast but it is more correct. When the leader node is stopped or banned, the only way of triggering an election is currently to fetch the leader's node map to check whether the it is still active. This is because the leader will no longer push the node map to other nodes. However, having all nodes fetch the node map from an inactive leader may be unreliable. Most of the other cases are also handled more reliably by the leader broadcast timeout. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:32 +00:00
Martin Schwenke	7e53fab0a3	ctdb-recoverd: Drop special case for elected-before-connected This no longer occurs at startup due to the leader broadcast timeout. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:32 +00:00
Martin Schwenke	ef4b8c13c0	ctdb-recoverd: Handle leader broadcast timeout If no leader broadcasts have been received from the leader for more than 5s then trigger an election. Apart from being sane behaviour, this avoids elected-before-connected bugs at startup, where a node elects itself leader before it is connected to other nodes. When a node processes a leader broadcast timeout it sends an unknown leader broadcast to all nodes. That causes cancellation of the leader broadcast timeout across the cluster. This is particular important at startup, since nodes may be started in a staggered fashion. Without this cluster-wide cancellation, a node might notice the lack of leader, win an election and complete a recovery before other nodes notice the lack of leader. When the leader broadcast timeout finally occurs on the other nodes then they'll put the cluster back into an unnecessary recovery. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 10:21:32 +00:00

1 2 3 4 5 ...

8986 Commits