1
0
mirror of https://github.com/samba-team/samba.git synced 2024-12-22 13:34:15 +03:00
Commit Graph

9112 Commits

Author SHA1 Message Date
Martin Schwenke
8d04235f46 ctdb-common: Add trivial FD monitoring abstraction
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-07-28 10:09:34 +00:00
Martin Schwenke
f9467cdf3b ctdb-build: Link in backtrace support for ctdb_util_tests
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-07-28 10:09:34 +00:00
Martin Schwenke
7a1c43fc74 ctdb-build: Separate test backtrace support into separate subsystem
A convention when testing members of ctdb-util is to include the .c
file so that static functions can potentially be tested.  This means
that such tests can't be linked against ctdb-util or duplicate symbols
will be encountered.

ctdb-tests-common depends on ctdb-client, which depends in turn on
ctdb-util, so this can't be used to pull in backtrace support.
Instead, make ctdb-tests-backtrace its own subsystem.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-07-28 10:09:34 +00:00
Martin Schwenke
b195e8c0d0 ctdb-build: Sort sources in ctdb-util and ctdb_unit_tests
Also, rename ctdb_unit_tests to ctdb_util_tests.  The sorting makes
it clear that only items from ctdb-util are tested here.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-07-28 10:09:34 +00:00
Martin Schwenke
3efa56aa61 ctdb-daemon: Fix printing of tickle ACKs
Commit f5a2037734 arguably got this
back-to-front:

  2022-07-27T09:50:01.985857+10:00 testn1 ctdbd[17820]: ../../ctdb/server/ctdb_takeover.c:514 sending TAKE_IP for '10.0.1.173'
  2022-07-27T09:50:01.990601+10:00 testn1 ctdbd[17820]: Send TCP tickle ACK: 10.0.1.77:33004 -> 10.0.1.173:2049
  2022-07-27T09:50:01.991323+10:00 testn1 ctdb-takeover[19758]: TAKEOVER_IP 10.0.1.173 succeeded on node 0

Unfortunately there is an inconsistency somewhere in the connection
tracking code used for tickle ACKs, making this less than obvious.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Thu Jul 28 09:02:08 UTC 2022 on sn-devel-184
2022-07-28 09:02:08 +00:00
Martin Schwenke
30c40046ef ctdb-build: Add missing dependency on talloc
The include isn't strictly necessary, since it is included via
common/reqid.c anyway.  However, it is a useful hint.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri Jul 22 17:01:00 UTC 2022 on sn-devel-184
2022-07-22 17:01:00 +00:00
Martin Schwenke
e831af7b25 ctdb-tests: Work around unreadable file test failure when root
root can read files for which the mode prohibits reading, so this test
case fails when run as root.  Work around this when running as root.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-07-22 16:09:31 +00:00
Martin Schwenke
b20ccaa36d ctdb-scripts: Use "git config" as last resort to parse nfs.conf
Some versions of nfs-utils (e.g. recent CentOS 7) use /etc/nfs.conf
but do not include the nfsconf utility to extract values from the
file.  However, git has an excellent conf file parser, so use it as a
last resort.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-07-22 16:09:31 +00:00
Martin Schwenke
db37043bc5 ctdb-scripts: Avoid ShellCheck warning SC2295
For example:

In /home/martins/samba/samba/ctdb/tools/onnode line 304:
    [ "$nodes" != "${nodes%[ ${nl}]*}" ] && verbose=true
                             ^---^ SC2295 (info): Expansions inside ${..} need to be quoted separately, otherwise they match as patterns.

Did you mean:
    [ "$nodes" != "${nodes%[ "${nl}"]*}" ] && verbose=true

For more information:
  https://www.shellcheck.net/wiki/SC2295 -- Expansions inside ${..} need to b...

Who knew?  Thanks ShellCheck!

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-07-22 16:09:31 +00:00
Martin Schwenke
00f1d6d947 ctdb-common: Use POSIX if_nameindex() to check interface existence
This works as an unprivileged user, so avoids unnecessary errors when
running in test mode (and not as root):

  2022-02-18T12:21:12.436491+11:00 node.0 ctdbd[6958]: ctdb_sys_check_iface_exists: Failed to open raw socket
  2022-02-18T12:21:12.436534+11:00 node.0 ctdbd[6958]: ctdb_sys_check_iface_exists: Failed to open raw socket
  2022-02-18T12:21:12.436557+11:00 node.0 ctdbd[6958]: ctdb_sys_check_iface_exists: Failed to open raw socket
  2022-02-18T12:21:12.436577+11:00 node.0 ctdbd[6958]: ctdb_sys_check_iface_exists: Failed to open raw socket

The corresponding porting test would now become pointless because it
would just confirm that "fake" does not exist.  Attempt to make it
useful by using a less likely name than "fake" and attempting to
detect the loopback interface.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-07-22 16:09:31 +00:00
Martin Schwenke
c77a4fde7a ctdb-daemon: Modernise debug in ctdb_add_public_address()
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-07-22 16:09:31 +00:00
Martin Schwenke
d62fcba7dc ctdb-daemon: Avoid spurious error sending ARPs for released IP
A public IP address can be released in between (and probably before)
attempts to send ARPs.  One situation when this can occur is when a
cluster is shutting down: node A shuts down first, public IPs from
node A are taken over by node B, node B is shutdown.

Notice this when it occurs and cancel further attempts to send ARPs.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-07-22 16:09:31 +00:00
Martin Schwenke
f5a2037734 ctdb-daemon: Modernise debug in ctdb_control_send_arp()
For the tickle ACK logging, render the connection in a buffer.  This
produces more complete information.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-07-22 16:09:31 +00:00
Martin Schwenke
ec5f6425b7 ctdb-protocol: Add separator argument to ctdb_connection_to_buf()
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-07-22 16:09:31 +00:00
Martin Schwenke
440bd86a99 ctdb-daemon: Drop unused ban_state element from CTDB node structure
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-07-22 16:09:31 +00:00
Martin Schwenke
9898e7c555 ctdb-recoverd: Clean up banning culprit code
Make this fully self-contained in the recovery daemon and avoid
indexing by PNN.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-07-22 16:09:31 +00:00
Martin Schwenke
19fbc2da38 ctdb-recoverd: Add pnn field to banning state structure
This structure is now standalone, so indexing by PNN can be avoided
via a subsequent commit.  Index by culprit here to make this commit
simple.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-07-22 16:09:31 +00:00
Martin Schwenke
0b5dd07604 ctdb-recoverd: Add function node_flags() and use it in elections
Indexing a node map by PNN is suboptimal.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-07-22 16:09:31 +00:00
Martin Schwenke
e396eb9fbc ctdb-scripts: Only run unhealthy call-out when passing threshold
For memory usage, no need to dump all of this data on every failed
monitor event.  The first call will be enough to diagnose the problem.
The node will then go unhealthy, drop clients and memory usage should
then drop.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri Jul 22 07:32:54 UTC 2022 on sn-devel-184
2022-07-22 07:32:54 +00:00
Martin Schwenke
36bd6fd01f ctdb-scripts: Always check memory usage
If filesystem usage exceeds the unhealthy threshold then checking
memory usage checking is not done.  Always do them both.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-07-22 06:38:32 +00:00
Martin Schwenke
5e7bbcb069 ctdb-scripts: Avoid ShellCheck info SC2162
SC2162 (info): read without -r will mangle backslashes.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-07-22 06:38:32 +00:00
Martin Schwenke
dc7aaca889 ctdb-scripts: Reduce length of very long lines
Use printf to allow easier line breaks and use some early returns.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-07-22 06:38:32 +00:00
Martin Schwenke
fc485feae8 ctdb-scripts: De-clutter validate_percentage()
It always takes 2 arguments.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-07-22 06:38:32 +00:00
Martin Schwenke
a832c8e273 ctdb-scripts: Reformat using shfmt -w -p -i 0 -fn
About to modify this file, so reformat first as per recent Samba
convention.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-07-22 06:38:32 +00:00
Martin Schwenke
3df39aa7fb ctdb-scripts: Avoid ShellCheck warning SC2164
SC2164 (warning): Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

A problem can only occur if /etc/ctdb/ or an important subdirectory is
removed, which means the script itself would not be found.  Use && to
silence ShellCheck.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-07-22 06:38:32 +00:00
Martin Schwenke
be293a125f ctdb-tests: Add new tool unit tests to cover UNKNOWN state
Signed-off-by: Vinit Agnihotri <vagnihotri@ddn.com>
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Jun 28 10:16:59 UTC 2022 on sn-devel-184
2022-06-28 10:16:59 +00:00
Vinit Agnihotri
794f125802 ctdb-tool: Add UNKNOWN pseudo state
When a node is starting, CTDB reports remote nodes as unhealthy by
default.  This can be misleading.

To hide this, report an "UNKNOWN" pseudo state when a remote node is
not disconnected and the runstate is less than or equal to
"FIRST_RECOVERY".

Signed-off-by: Vinit Agnihotri <vagnihotri@ddn.com>
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-06-28 09:24:31 +00:00
Vinit Agnihotri
428bc71f98 ctdb-tests: Add runstate handling to fake ctdbd
Signed-off-by: Vinit Agnihotri <vagnihotri@ddn.com>
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-06-28 09:24:31 +00:00
Martin Schwenke
05601cebc9 ctdb-tests: Return error on empty fake ctdbd configuration blocks
These would be unintended errors.  The block should be omitted to keep
the default value.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-06-28 09:24:31 +00:00
Martin Schwenke
80ba66013e ctdb-scripts: Drop use of eval in CTDB callout handling
eval is not required and causes the follow ShellCheck warning:

  SC2294 (warning): eval negates the benefit of arrays. Drop eval to
  preserve whitespace/symbols (or eval as string).

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri Jun 24 10:40:50 UTC 2022 on sn-devel-184
2022-06-24 10:40:50 +00:00
Martin Schwenke
4cbb0b13ba ctdb-tests: Do not require eval tricks for faking NFS callout
The current code requires the use of eval in the NFS callout handling
to facilitate testing.  Improve the code to remove this need.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-06-24 09:49:33 +00:00
Martin Schwenke
0247fd8a02 ctdb-scripts: Avoid ShellCheck warning SC2162
SC2162 read without -r will mangle backslashes

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-06-24 09:49:33 +00:00
Martin Schwenke
7f799a8d6f ctdb-tests: Fix faking of program stack traces
The current code works in all current cases but is lazy and wrong.
Fix it to avoid breaking on code changes involving different thread
setups.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-06-24 09:49:33 +00:00
Martin Schwenke
0b728a4e8f ctdb-tests: Improve Debian-style event script unit testing
Tests can be run by hand using different distro styles, such as:

  CTDB_NFS_DISTRO_STYLE=systemd-debian \
    ./tests/run_tests.sh ./tests/UNIT/eventscripts/{06,60}.nfs.*

This fixes known problems for Debian styles, so the tests now pass for
the following values of CTDB_NFS_DISTRO_STYLE:

  systemd-redhat
  sysvinit-redhat
  systemd-debian
  sysvinit-debian

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-06-24 09:49:33 +00:00
Martin Schwenke
7f3a0c7e9c ctdb-scripts: Parameterise /etc directory to aid testing
At the moment test results can be influenced by real system
configuration files.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-06-24 09:49:32 +00:00
Martin Schwenke
337ef7c1b4 ctdb-scripts: Set NFS services to "AUTO" if started by another service
For example, in Sys-V init "rquotad" is started by the main "nfs"
service.  At the moment the call-out can't distinguish between this
case and "should never be run".  Services set to "AUTO" are
hand-stopped/started via service_stop()/service_start() on failure via
restart_after.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-06-24 09:49:32 +00:00
Martin Schwenke
8b8660d883 ctdb-scripts: Refactor the manual RPC service start/stop
This logic needs improving, so factor the decision making into new
functions service_or_manual_stop() and service_or_manual_start().

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-06-24 09:49:32 +00:00
Martin Schwenke
cd018d0ff5 ctdb-scripts: Simplify and rename basic_stop() and basic_start()
Drop the argument.  These now just stop/start the overall NFS service,
so rename them appropriately.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-06-24 09:49:32 +00:00
Martin Schwenke
09fd1e5579 ctdb-scripts: Move nfslock out of basic_stop() and basic_start()
These are only called in one place and should be done inline, since
that is less confusing.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-06-24 09:49:32 +00:00
Martin Schwenke
a43a1ebe51 ctdb-tests: Reformat script
Samba is reformatting shell scripts using

  shfmt -w -p -i 0 -fn

so update this one before editing.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-06-24 09:49:32 +00:00
Martin Schwenke
e752f841e6 ctdb-daemon: Use DEBUG() macro for child logging
Directly using dbgtext() with file logging results in a log entry with
no header, which is wrong.  This is a regression, introduced in commit
10d15c9e5d.  Prior to this, CTDB's
callback for file logging would always add a header.

Use DEBUG() instead dbgtext().  Note that DEBUG() effectively compares
the passed script_log_level with DEBUGLEVEL, so an explicit check is
no longer necessary.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=15090

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Volker Lendecke <vl@samba.org>

Autobuild-User(master): Volker Lendecke <vl@samba.org>
Autobuild-Date(master): Thu Jun 16 13:33:10 UTC 2022 on sn-devel-184
2022-06-16 13:33:10 +00:00
Martin Schwenke
88f35cf862 ctdb-daemon: Drop unused prefix, logfn, logfn_private
These aren't set anywhere in the code.

Drop the log argument because it is also no longer used.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=15090

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Volker Lendecke <vl@samba.org>
2022-06-16 12:42:35 +00:00
Martin Schwenke
1596a3e84b ctdb-common: Tell file logging not to redirect stderr
This allows ctdb_set_child_logging() to work.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=15090

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Volker Lendecke <vl@samba.org>
2022-06-16 12:42:35 +00:00
Martin Schwenke
b20ee18031 ctdb-tests: Fix a cut and paste error in a comment
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue May 31 05:56:43 UTC 2022 on sn-devel-184
2022-05-31 05:56:43 +00:00
Martin Schwenke
90a96f06a9 ctdb-recoverd: Do not ban on unknown error when taking cluster lock
If the cluster filesystem is unavailable then I/O errors may occur.
This is no worse than contention, so don't ban.  This avoids having
services unavailable for longer than necessary.

Update the associated test to simply confirm that this results in a
leaderless cluster, and leadership is restored when the lock can once
again be taken.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-05-31 05:06:29 +00:00
Martin Schwenke
a400f4e7cc ctdb-doc: Fix typos in the policy routing documentation
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-05-31 05:06:29 +00:00
Martin Schwenke
da9decfc5e ctdb-daemon: Remove unused #includes of rb_tree.h
ctdb_takeover.c and eventscript.c no longer use this.
ipalloc_common.c has never used it.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-05-31 05:06:29 +00:00
Martin Schwenke
80de84d36e ctdb-daemon: Log per-database summary of resent calls
After a recovery that takes a significant amount of time the logs are
flooded with messages about every resent call.

Log a summary instead and demote per-call messages to INFO level.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-05-31 05:06:29 +00:00
Pavel Filipenský
8cb6565011 ctdb: Covscan: unchecked return value for trbt_traversearray32()
Signed-off-by: Pavel Filipenský <pfilipen@redhat.com>
Reviewed-by: Jeremy Allison <jra@samba.org>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
2022-05-14 03:49:32 +00:00
Pavel Filipenský
91d1d0e4c8 ctdb: Fix trailing whitespace in rb_tree.c
Signed-off-by: Pavel Filipenský <pfilipen@redhat.com>
Reviewed-by: Jeremy Allison <jra@samba.org>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
2022-05-14 03:49:32 +00:00
Martin Schwenke
64275fc1a2 ctdb-tests: Add backtrace on abort to some tests
These are easier to debug with a backtrace.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue May  3 10:13:23 UTC 2022 on sn-devel-184
2022-05-03 10:13:23 +00:00
Martin Schwenke
d39377d6fc ctdb-tests: Provide a method to dump the stack on abort
Some tests make generous use of assert() and it can be difficult to
guess the cause of failures without resorting to GDB.  This provides
some help.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-05-03 09:19:31 +00:00
Martin Schwenke
73b27def7b build: Add missing ctdb-client dependencies
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-05-03 09:19:31 +00:00
Martin Schwenke
d57d624a77 ctdb-build: Drop unnecessary uses of include/ sub-directory
None of these include any files from the include/ sub-directory.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-05-03 09:19:31 +00:00
Martin Schwenke
6d3c9e64d9 ctdb-tests: Use test_case() to help document test cases
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-05-03 09:19:31 +00:00
Martin Schwenke
d52b497d11 ctdb-locking: Don't pass NULL to tevent_req_is_unix_error()
If there is an error then this pointer is unconditionally
dereferenced.

However, the only possible error appears to be ENOMEM, where a crash
caused by dereferencing a NULL pointer isn't a terrible outcome.  In
the absence of a security issue this is probably not worth
backporting.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-05-03 09:19:31 +00:00
Martin Schwenke
490e5f4d4c ctdb-mutex: Don't pass NULL to tevent_req_is_unix_error()
If there is an error then this pointer is unconditionally
dereferenced.

However, the only possible error appears to be ENOMEM, where a crash
caused by dereferencing a NULL pointer isn't a terrible outcome.  In
the absence of a security issue this is probably not worth
backporting.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-05-03 09:19:31 +00:00
Martin Schwenke
8deec3bc67 ctdb-scripts: Drop unused ctdbd_wrapper
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-04-06 06:34:37 +00:00
Martin Schwenke
a1e78cc372 ctdb-scripts: Drop uses of ctdbd_wrapper
The only value this now provides is use of a notification script to
log when start/stop are called.  This was used for debugging strange
start/stop failures, which have not been recently seen.  Also, systemd
does a good job of logging start/stop.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-04-06 06:34:37 +00:00
Martin Schwenke
aca5972233 ctdb-scripts: Remove failsafe that drops all IPs on failed shutdown
IPs are dropped in the shutdown event.

If a watchdog is necessary to ensure public IPs aren't on interfaces
when CTDB isn't running, then see ctdb-crash-cleanup.sh.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-04-06 06:34:37 +00:00
Martin Schwenke
6fb08a6580 ctdb-daemon: Don't release all public IPs during shutdown sequence
This further untangles public IP handling from the main daemon.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-04-06 06:34:37 +00:00
Martin Schwenke
cb438ecfd4 ctdb-scripts: Drop all public IPs in the "shutdown" event
This is functionally the same as ctdb_release_all_ips().

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-04-06 06:34:37 +00:00
Martin Schwenke
3caddaafa0 ctdb-config: Drop CTDB_STARTUP_TIMEOUT
This was added to be able to notice startup failures when unknown
tunables were present in the configuration.  Tunables are now set by
the daemon, so this is no longer necessary.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-04-06 06:34:37 +00:00
Martin Schwenke
208034ecfe ctdb-doc: Update documentation for tunables configuration
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-04-06 06:34:37 +00:00
Martin Schwenke
0902553d15 ctdb-scripts: No longer load tunables via 00.ctdb.script setup event
Drop related tests.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-04-06 06:34:37 +00:00
Martin Schwenke
f49446cb1e ctdb-daemon: Load tunables from ctdb.tunables
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-04-06 06:34:37 +00:00
Martin Schwenke
a509ee059e ctdb-daemon: New function ctdb_tunables_load()
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-04-06 06:34:37 +00:00
Martin Schwenke
b14f2a205d ctdb-tests: Add unit tests for tunables code
This aims to test ctdb_tunable_load_file() but also exercises
ctdb_tunable_names() and ctdb_tunable_get_value().
ctdb_tunable_set_value() is indirectly exercised via
ctdb_tunable_load_file().

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-04-06 06:34:37 +00:00
Martin Schwenke
381134939b ctdb-tests: Add function test_case(), tweak unit test header format
Instead of documenting test cases with a comment, this allows them to
be documented via an argument to a function that is printed when the
test case is run.  This makes it easier locate test case failures when
commands used by test cases look similar,

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-04-06 06:34:37 +00:00
Martin Schwenke
c413838f79 ctdb-tests: Strip trailing newlines from expected result output
This allows the provided output to be specified a little more
carelessly.  As per the comment, trailing newlines can't be matched
anyway, so this is notionally a bug fix.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-04-06 06:34:37 +00:00
Martin Schwenke
5fa0c86b61 ctdb-tests: Reformat script
Samba is reformatting shell scripts using

  shfmt -w -p -i 0 -fn

so update this one before editing.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-04-06 06:34:37 +00:00
Martin Schwenke
bcd66e17ee ctdb-common: Add function ctdb_tunable_load_file()
Allows direct loading of tunables from a file.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-04-06 06:34:37 +00:00
Vinit Agnihotri
93824b8c33 packaging: move CTDB service file to top-level
Signed-off-by: Vinit Agnihotri <vagnihotri@ddn.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-04-06 06:34:37 +00:00
Martin Schwenke
2f6b31788b ctdb-packaging: Move RPM spec file to examples directory
We used to use this for building test packages for standalone CTDB.
However, our testing has now changed to use binary tarballs.  We
believe we were the only users of this spec file and expect CTDB to
only be installed as part of a top-level Samba build, especially in
RPM form.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-04-06 06:34:37 +00:00
Stefan Metzmacher
aa02cf3c44 ctdb/packaging/RPM: don't use waf directly
./configure && make && make install is will always work.

Signed-off-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2022-03-29 22:32:32 +00:00
Stefan Metzmacher
22c46d9f41 configure/Makefile: export PYTHONHASHSEED=1 in all 'configure/Makefile' scripts
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2022-03-29 22:32:32 +00:00
Archana
7debfe7a23 ctdb-tools: Remove deprecated networking commands and replace with new commands
The changes are made to replace the deprecated network commands
(ifconfig,netstat) with the new commands
(ip addr,ss) respectively

Signed-off-by: Archana Chidirala <archana.chidirala.chidirala@ibm.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>

Autobuild-User(master): Volker Lendecke <vl@samba.org>
Autobuild-Date(master): Tue Mar  8 12:30:53 UTC 2022 on sn-devel-184
2022-03-08 12:30:53 +00:00
Archana
e16cd0316f ctdb-packaging: Remove deprecated networking command netstat and replace with "ss" command
Signed-off-by: Archana Chidirala <archana.chidirala.chidirala@ibm.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2022-03-08 11:32:36 +00:00
Martin Schwenke
0d8084ed62 ctdb-protocol: CID 1499395: Uninitialized variables (UNINIT)
Issue is reported here:

853     	case CTDB_CONTROL_DB_VACUUM: {
854     		struct ctdb_db_vacuum db_vacuum;
855
>>>     CID 1499395:  Uninitialized variables  (UNINIT)
>>>     Using uninitialized value "db_vacuum.full_vacuum_run" when calling "ctdb_db_vacuum_len".
856     		CHECK_CONTROL_DATA_SIZE(ctdb_db_vacuum_len(&db_vacuum));
857     		return ctdb_control_db_vacuum(ctdb, c, indata, async_reply);
858     	}

The problem is that ctdb_bool_len() unnecessarily dereferences its
argument, which in this case is &db_vacuum.full_vacuum_run.  Not a
security issue because the value copied by dereferencing is not used.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Wed Feb 23 02:02:06 UTC 2022 on sn-devel-184
2022-02-23 02:02:06 +00:00
Martin Schwenke
0f373443ef ctdb-tests: Fix missing #include for sigaction(2)
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-02-23 01:08:37 +00:00
Martin Schwenke
ef9017a150 ctdb-tests: Dump a stack trace on abort
Debugging a test failure here without GDB is not possible.  Dumping a
stack trace gives a good hint.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-02-23 01:08:37 +00:00
Martin Schwenke
17d792e9aa ctdb-tests: Iterate protocol tests internally
Instead of repeatedly running a test binary.

Run time for these tests reduces from ~90s to ~75s.

When run under valgrind, the run time for protocol_test_001.sh reduces
from ~390s to <1s.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Mon Feb 14 04:32:29 UTC 2022 on sn-devel-184
2022-02-14 04:32:29 +00:00
Martin Schwenke
2329305019 ctdb-tests: Add iteration support for protocol tests
The current method of repeatedly running a binary has huge overhead,
especially with valgrind.

protocol_test_iterate_tag() allows output that is usually used for
hinting where a test failure occurred to be replaced with a tag
stored in a buffer, which is printed on test failure.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-02-14 03:36:38 +00:00
Martin Schwenke
331c435ce5 ctdb-tests: Add a test for stalled node triggering election
A stalled node probably continues to hold the cluster lock, so confirm
elections work in this case.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14958

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Mon Feb 14 02:46:01 UTC 2022 on sn-devel-184
2022-02-14 02:46:01 +00:00
Martin Schwenke
265e44abc4 ctdb-tests: Factor out functions to detect when generation changes
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14958

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-02-14 01:47:31 +00:00
Martin Schwenke
0e74e03c9c ctdb-recoverd: Consistently log start of election
Elections should now be quite rare, so always log when one begins.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14958

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-02-14 01:47:31 +00:00
Martin Schwenke
bf55a0117d ctdb-recoverd: Always send unknown leader broadcast when starting election
This is currently missed when the cluster lock is lost.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14958

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-02-14 01:47:31 +00:00
Martin Schwenke
9b3fab052b ctdb-recoverd: Consistently have caller set election-in-progress
The problem here is that election-in-progress must be set to
potentially avoid restarting the election broadcast timeout in
main_loop(), so this is already done by leader_handler().

Have force_election() set election-in-progress for all election types
and do not bother setting it in cluster_lock_election().

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14958

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-02-14 01:47:31 +00:00
Martin Schwenke
188a902156 ctdb-recoverd: Always cancel election in progress
Election-in-progress is set by unknown leader broadcast, so needs to
be cleared in all cases when election completes.

This was seen in a case where the leader node stalled, so didn't send
leader broadcasts for some time.  The node continued to hold the
cluster lock, so another node could not become leader.  However, after
the node returned to normal it still did not send leader broadcasts
because election-in-progress was never cleared.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14958

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-02-14 01:47:31 +00:00
Martin Schwenke
f7de2132bb ctdb-doc: Remove documentation for recovery process
This is many years out of date and recent changes make it worse.  It
is unlikely that anyone has the time to fix this in the near future,
so remove it because it is misleading.

Database recovery steps are well documented in comments in the
recovery helper.  Cluster monitoring documentation can be re-added
when things stop changing.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
a940ad9370 ctdb-doc: Update example configuration migration script
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
01313ea243 ctdb-tests: Improve test coverage for leader role yield and elections
Rename test, clean up node selection.  Duplicate for for banning and
removing leader capability cases.  Repeat all 3 tests without cluster
lock.

All of the standard election triggers are now tested, with and without
cluster lock.  Due to test cluster configuration limitations, the
tests without cluster lock are skipped on a real cluster.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
5d31778149 ctdb-tests: Support commenting out local daemons configuration options
Can be used to disable default options, such as cluster lock.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
34d2ca0ae6 ctdb-config: Add configuration option [cluster] leader timeout
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
1dfb266038 ctdb-config: [legacy] recmaster capability -> [cluster] leader capability
Rename this configuration item and move it into the [cluster]
configuration section.

Update documentation to match.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
f5a39058f0 ctdb-config: [cluster] recovery lock -> [cluster] cluster lock
Retain "recovery lock" and mark as deprecated for backward
compatibility.

Some documentation is still inconsistent.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
d752a92e11 ctdb-doc: Update documentation for leader and cluster lock
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
73555e8248 ctdb-recoverd: Use race for cluster lock as election when lock is enabled
If the cluster is partitioned then nodes in one partition can not take
the lock anyway, so election is pointless.  It just introduces
unnecessary corner cases.

Instead just race for the lock.

When a node notices a lack of leader and notifies other nodes of an
election via an unknown leader broadcast, the cluster lock election is
hooked into this broadcast.

The test needs to be updated because losing the cluster lock can now
result in a leadership change.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
938d64c8ff ctdb-protocol: Mark {GET,SET}_RECMASTER controls obsolete
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
03ae158cff ctdb-protocol: Drop marshalling for {GET,SET}_RECMASTER controls
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
a76374070d ctdb-daemon: Drop implementation of {GET,SET}_RECMASTER controls
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
193b624d26 ctdb-protocol: Drop protocol client functions for recmaster controls
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
cda673ff6d ctdb-client: Drop unused recmaster functions
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
16efbca003 ctdb-daemon: Drop unused old client recmaster functions
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
c68267b2a6 ctdb-recoverd: Drop calls to ctdb_ctrl_setrecmaster()
Nothing fetches this value anymore.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
58d7fcdf7c ctdb-recoverd: Drop recovery master verification
This doesn't make sense if leader broadcasts are used.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
f02e097485 ctdb-tools: recovery master -> leader
The following command names are changed:

  recmaster -> leader
  setrecmasterrole -> setleaderrole

Command output changed for the following commands:

  status
  getcapabilities

Documentation and tests are updated to reflect these changes.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
e60581d5b5 ctdb-tools: Use leader broadcast in get_leader()
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
92fb68e9b8 ctdb-tools: Factor out get_leader()
This seems pointless but it localises a subsequent change and also
starts a terminology change in the tool code.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
17ba15ccd8 ctdb-tools: Handle leader broadcasts in ctdb tool
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
ec90f36cc6 ctdb-tools: Print "UNKNOWN" when leader PNN is unknown
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
01a8d1a4a4 ctdb-client: Factor out function ctdb_client_wait_func_timeout()
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
403db5b528 ctdb-tests: Factor out getting leader and waiting for leader change
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
4786982cc8 ctdb-tests: Add leader broadcasts to fake_ctdbd
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Amitay Isaacs
756dfdfed9 ctdb-tests: Implement srvid_handler for dispatching messages
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2022-01-17 10:21:33 +00:00
Martin Schwenke
958746f947 ctdb-recoverd: Simplify some stopped/banned checks to inactive checks
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
358c59f51a ctdb-recoverd: No longer take cluster lock during recovery
Confirm instead that it is already held.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
36ffaaa691 ctdb-recoverd: Add and use function cluster_lock_enabled()
Now all references to ctdb->recovery_lock are encapsulated in the
cluster lock code.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
5ee664ee17 ctdb-recoverd: Terminology change: recovery lock -> cluster lock
No functional changes, just name changes for clarity.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
0f2250f4f9 ctdb-recoverd: Take cluster lock when election completes
It is no longer just a recovery lock but is always held by the cluster
leader.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
011e880002 ctdb-recoverd: Factor out function cluster_lock_take()
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:33 +00:00
Martin Schwenke
037abf8620 ctdb-tests: Avoid a race
See the comment in the code for details.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
ef7e3265f7 ctdb-tests: Setup cluster with expected arguments
ctdb_test_init() doesn't actually pass arguments to local_daemons.sh.
This needs to be done using ctdb_nodes_start_custom().

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
b029ca4d51 ctdb-recoverd: Drop leader validation
The introduction of the leader broadcast timeout provides an
alternative to the current leader validation.  Using the leader
broadcast may not be as fast but it is more correct.

When the leader node is stopped or banned, the only way of triggering
an election is currently to fetch the leader's node map to check
whether the it is still active.  This is because the leader will no
longer push the node map to other nodes.  However, having all nodes
fetch the node map from an inactive leader may be unreliable.

Most of the other cases are also handled more reliably by the leader
broadcast timeout.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
7e53fab0a3 ctdb-recoverd: Drop special case for elected-before-connected
This no longer occurs at startup due to the leader broadcast timeout.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
ef4b8c13c0 ctdb-recoverd: Handle leader broadcast timeout
If no leader broadcasts have been received from the leader for more
than 5s then trigger an election.

Apart from being sane behaviour, this avoids elected-before-connected
bugs at startup, where a node elects itself leader before it is
connected to other nodes.

When a node processes a leader broadcast timeout it sends an unknown
leader broadcast to all nodes.  That causes cancellation of the leader
broadcast timeout across the cluster.  This is particular important at
startup, since nodes may be started in a staggered fashion.  Without
this cluster-wide cancellation, a node might notice the lack of
leader, win an election and complete a recovery before other nodes
notice the lack of leader.  When the leader broadcast timeout finally
occurs on the other nodes then they'll put the cluster back into an
unnecessary recovery.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
5c7f6da0f0 ctdb-recoverd: Send leader broadcasts
These are triggered on 1 second timer, but are only sent if the node
is the current leader and there is no election underway.

If this node can not be the leader then ensure it releases the
recovery lock.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
789a75abfa ctdb-recoverd: Process leader broadcasts
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
3d3767a259 ctdb-protocol: Add CTDB_SRVID_LEADER
CTDB_SRVID_LEADER will be regularly broadcast to all connected nodes
by the leader.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
c2cfd9c21a ctdb-recoverd: Add an explicit flag for election in progress
An alternate election method will be added that doesn't use the
election timeout, so this provides a common way for recognising when
an election is in progress.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
ac5a3ca063 ctdb-recoverd: Only start election if node can be leader
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
7baadfe27e ctdb-recoverd: Add and use function this_node_can_be_leader()
This makes the code self-documenting.

In ctdb_election_data() there is a slight behaviour change.  An
inactive node will now try to lose an election.  This case should not happen
because:

* An inactive node can't win an election round and then send a reply.

* Any inactive node should never start an election.  There are
  currently places where this happens and they will be fixed later.

There is an instance where this could be used in
validate_recovery_master() but this involves a more serious logic
change.  Overhaul this function later.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
94b546c268 ctdb-recoverd: Logging/comments: recovery master -> leader
There are some remaining instances in this file but they will be
removed in subsequent commits.

Modernise debug macros as appropriate.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
dd79e9bd14 ctdb-recoverd: Rename recmaster field to leader
Recovery master is being renamed to leader.  This follows clustering
best practice (e.g. RAFT).

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
2ee6763c7d ctdb-recoverd: Use rec->pnn everywhere
This is currently referenced in a number of inconsistent
ways, including:

* pnn
* rec->ctdb->pnn
* ctdb->pnn
* ctdb_get_pnn(ctdb)
* ctdb_get_pnn(rec->ctdb)

The first of these always requires some thought about the context - is
this the node PNN or some other PNN (e.g. argument to function)?

rec->pnn is now always used when referring to the recovery daemon's
PNN.

Doing this also reduces reliance on struct ctdb_context internals.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
4af3b10a37 ctdb-recoverd: Change argument to srvid_disable_and_reply()
Reduce dependency on struct ctdb_context internals, enable a
subsequent change.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
b7c138ca99 ctdb-recoverd: Simplify arguments to ctdb_ban_node()
ban_time argument is always ctdb->tunable.recovery_ban_period, so
build this in and make the calling code more readable.

ctdb_ban_node() already logs how long a node is banned for, so don't
repeatedly log this.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
a5e0ddac62 ctdb-recoverd: Simplify arguments to verify_local_ip_allocation()
All other arguments are available via rec, so simplify.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
67b5191640 ctdb-recoverd: Simplify arguments to do_recovery()
pnn and nodemap are both available via the rec context, so simplify.
vnnmap is unused.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
57882beb16 ctdb-recoverd: Simplify arguments to some election functions
The pnn and nodemap arguments to force_election() and
send_election_request() are always effectively rec->pnn and
rec->nodemap, so simplify.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
9dbe7cc85e ctdb-recoverd: Add PNN to recovery daemon context
This is currently referenced in a number of inconsistent
ways, including:

* pnn
* rec->ctdb->pnn
* ctdb->pnn
* ctdb_get_pnn(ctdb)
* ctdb_get_pnn(rec->ctdb)

The first of these always requires some thought about the context - is
this the node PNN or some other PNN (e.g. argument to function)?

The intention is to always use rec->pnn when referring to the recovery
daemon's PNN.

Doing this also reduces reliance on struct ctdb_context internals.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
ff0140e470 ctdb-recoverd: Use this_node_is_leader() in an extra context
This is arguably clearer.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
c8721d01c6 ctdb-recoverd: Factor out and use function this_node_is_leader()
Make the code self-documenting.

This preempts an upcoming change to terminology but doing it now saves
a lot of churn.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 10:21:32 +00:00
Martin Schwenke
57a32cebdd ctdb-recoverd: Pass SIGHUP to running helper
The recovery and takeover helpers can run for a while and generate
non-trivial logs, so have them reopen their logs to support log
rotation.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Mon Jan 17 04:36:30 UTC 2022 on sn-devel-184
2022-01-17 04:36:30 +00:00
Martin Schwenke
8e949a6082 ctdb-recoverd: Record helper PID in recovery daemon context
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 03:43:30 +00:00
Martin Schwenke
97a45f6f25 ctdb-recoverd: Add log reopening on SIGHUP to helpers
Recovery and takeover helpers can run for a while and generate
non-trivial logs.  They should support log reopening.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 03:43:30 +00:00
Martin Schwenke
51f0380e83 ctdb-daemon: Enable log reopening for event daemon
Add and call hook to pass on SIGHUP to eventd.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 03:43:30 +00:00
Martin Schwenke
4f14d7c0b9 ctdb-event: Reopen logs on SIGHUP
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 03:43:30 +00:00
Martin Schwenke
c554a325fe ctdb-daemon: Enable log reopening for recovery daemon
Pass on a SIGHUP to the recovery daemon, which will then reopen its
logs.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 03:43:30 +00:00
Martin Schwenke
4acfefed61 ctdb-recoverd: Add basic log reopening
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 03:43:30 +00:00
Martin Schwenke
4ed37de82b ctdb-daemon: Add basic top-level log reopening
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 03:43:30 +00:00
Martin Schwenke
7277385390 ctdb-common: Add support for reopening logs
Now that CTDB uses Samba's file logging it is possible to reopen the
logs, so that log rotation can be supported.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 03:43:30 +00:00
Martin Schwenke
d0a19778cd ctdb-common: Separate sock_daemon's SIGHUP and SIGUSR1 handling
SIGHUP is for reopening logs, SIGUSR1 is for reconfigure.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 03:43:30 +00:00
Martin Schwenke
10d15c9e5d ctdb-common: Use Samba's DEBUG_FILE logging
This has support for log rotation (or re-opening).

The log format is updated to use an RFC5424 timestamp and to include a
hostname.  The addition of the hostname allows trivial merging of log
files from multiple cluster nodes.

The hostname is faked from the CTDB_BASE environment variable during
testing, as per the comment in the code.  It is currently faked in a
similar manner in local_daemons.sh when printing logs, so drop this.

Unit tests need updating because stderr logging no longer produces a
"PROGNAME[PID]: " header.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 03:43:30 +00:00
Martin Schwenke
666a048707 ctdb-common: Switch initial debug type to DEBUG_DEFAULT_STDERR
This can be overridden by DEBUG_FILE, whereas DEBUG_STDERR can not.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2022-01-17 03:43:30 +00:00
Martin Schwenke
7163846a49 ctdb-protocol: Print IPv6 sockets with RFC5952 "[2001:db8::1]:80" notation
RFC5952 says the existing style is not recommended and the [] style
should be employed.

There are more optimised ways of adding the square brackets but they
tend to be uglier.

Parsing IPv6 sockets without [] is now tested indirectly by parsing
examples in both styles and comparing the results.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Signed-off-by: Volker Lendecke <vl@samba.org>

Autobuild-User(master): Volker Lendecke <vl@samba.org>
Autobuild-Date(master): Thu Jan 13 17:02:21 UTC 2022 on sn-devel-184
2022-01-13 17:02:21 +00:00
Martin Schwenke
255fe69c90 ctdb-tests: Add extra IPv6 socket parsing tests
Add tests to confirm that square brackets are handled and that
IPv4-mapped IPv6 addresses are parsed as expected.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Volker Lendecke <vl@samba.org>
2022-01-13 16:13:38 +00:00
Volker Lendecke
224e99804e ctdb-protocol: Allow rfc5952 "[2001:db8::1]:80" ipv6 notation
Bug: https://bugzilla.samba.org/show_bug.cgi?id=14934
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2022-01-13 16:13:38 +00:00
Volker Lendecke
820b0a63cc ctdb-protocol: Save 50 bytes .text segment
Having this as a small static .text is simpler than having to create
this on the stack.

Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2022-01-13 16:13:38 +00:00
Volker Lendecke
baaedd69b3 ctdb-protocol: rindex->strrchr
According to "man rindex" on debian bullseye rindex() was deprecated
in Posix.1-2001 and removed from Posix.1-2008.

Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2022-01-13 16:13:38 +00:00
Pavel Filipenský
5ac8762256 ctdb:utils: Improve error handling of hex_decode()
This has been found by covscan and make analyzers happy.

Pair-programmed-with: Andreas Schneider <asn@samba.org>

Signed-off-by: Pavel Filipenský <pfilipen@redhat.com>
Signed-off-by: Andreas Schneider <asn@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
2022-01-10 23:31:33 +00:00
Andreas Schneider
90fd7674f8 ctdb:client: Initialize structs and pointers in ctdb_ctrl_(en|dis)able_node()
Found by covscan.

Signed-off-by: Andreas Schneider <asn@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
2021-12-15 19:32:30 +00:00
Martin Schwenke
1719ef7893 ctdb-tests: Drop unused function ctdb_get_all_public_addresses()
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Ralph Boehme <slow@samba.org>

Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Tue Oct 12 23:24:18 UTC 2021 on sn-devel-184
2021-10-12 23:24:18 +00:00
Ralph Boehme
4e3676cb3c ctdb-tests: add a comment to the generated public_addresses file used by eventscript UNIT tests
test stub code has been updated to handle this, so now let's put it
to work.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14826
RN: Correctly ignore comments in CTDB public addresses file

Signed-off-by: Ralph Boehme <slow@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2021-10-12 22:38:32 +00:00
Martin Schwenke
5426c104f5 ctdb-tests: Fix typo in ctdb stub comment matching
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14826

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Ralph Boehme <slow@samba.org>
2021-10-12 22:38:32 +00:00
Ralph Boehme
530e8d4b9e ctdb-scripts: filter out comments in public_addresses file
Note that order of sed expressions matters: the expression to delete
comment lines must come first as the second expression would transform

  # comment

to

  comment

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14826

Signed-off-by: Ralph Boehme <slow@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2021-10-12 22:38:32 +00:00
Martin Schwenke
9e7d2d9794 ctdb-daemon: Don't mark a node as unhealthy when connecting to it
Remote nodes are already initialised as UNHEALTHY when the node list
is initialised at startup (ctdb_load_nodes_file() calls
convert_node_map_to_list()) and when disconnected (ctdb_node_dead()).
So, drop this code.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Thu Sep  9 02:38:34 UTC 2021 on sn-devel-184
2021-09-09 02:38:34 +00:00
Martin Schwenke
7f697b1938 ctdb-daemon: Ignore flag changes for disconnected nodes
If this node is not connected to a node then we shouldn't know
anything about it.  The state will be pushed later by the recovery
master.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
2021-09-09 01:46:49 +00:00
Martin Schwenke
ae10a8a4b7 ctdb-daemon: Simplify ctdb_control_modflags()
Now that there are separate disable/enable controls used by the ctdb
tool this control can ignore any flag updates for the current nodes.
These only come from the recovery master, which depends on being able
to fetch flags for all nodes.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-09-09 01:46:49 +00:00
Martin Schwenke
916c5ee131 ctdb-recoverd: Mark CTDB_SRVID_SET_NODE_FLAGS obsolete
CTDB_SRVID_SET_NODE_FLAGS is no longer sent so drop monitor_handler()
and replace with srvid_not_implemented().  Mark the SRVID obsolete in
its comment.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-09-09 01:46:49 +00:00
Martin Schwenke
e75256767f ctdb-daemon: Don't bother sending CTDB_SRVID_SET_NODE_FLAGS
The code that handles this message is
ctdb_recoverd.c:monitor_handler().  Although it appears to do
something potentially useful, it only logs the flags changes.  All
changes made are to local structures - there are no actual
side-effects.

It used to trigger a takeover run when the DISABLED flag changed.
This was dropped back in commit
662f06de9f.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-09-09 01:46:49 +00:00
Martin Schwenke
0132bd5a22 ctdb-daemon: Modernise remaining debug macro in this function
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-09-09 01:46:49 +00:00
Martin Schwenke
b6d25d079e ctdb-daemon: Update logging for flag changes
When flags change, promote the message to NOTICE level and switch the
message to the style that is currently generated by
ctdb-recoverd.c:monitor_handler().  This will allow monitor_handler()
to go away in future.

Drop logging when flags do not change.  The recovery master now logs
when it pushes flags for a node, so the lack of a corresponding
"changed flags" message here indicates that no update was required.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-09-09 01:46:49 +00:00
Martin Schwenke
eec44e2862 ctdb-daemon: Correct the condition for logging unchanged flags
Don't trust the old flags from the recovery master.

Surrounding code will change in future comments, including the use of
old-style debug macros, so just make this change clear.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-09-09 01:46:49 +00:00
Martin Schwenke
5914054698 ctdb-tools: Use disable and enable controls in tool
Note that there a change from broadcast to a directed control here.
This is OK because the recovery master will push flags if any nodes
disagree with the canonical flags fetched from a node.

Static function ctdb_ctrl_modflags() is no longer used to drop it.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-09-09 01:46:49 +00:00
Martin Schwenke
6fe6a54e7f ctdb-client: Add client code for disable/enable controls
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-09-09 01:46:49 +00:00
Martin Schwenke
15a6489c28 ctdb_daemon: Implement controls DISABLE_NODE/ENABLE_NODE
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-09-09 01:46:49 +00:00
Martin Schwenke
60c1ef1465 ctdb-daemon: Start as disabled means PERMANENTLY_DISABLED
DISABLED is UNHEALTHY | PERMANENTLY_DISABLED, which is not what is
intended here.  Luckily, it doesn't do any harm because nodes are
marked unhealthy at startup anyway.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-09-09 01:46:49 +00:00
Martin Schwenke
1ac7bc7532 ctdb-daemon: Factor out a function to get node structure from PNN
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-09-09 01:46:49 +00:00
Martin Schwenke
e0a7b5a9e8 ctdb-daemon: Add a helper variable
Simplifies a subsequent change.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-09-09 01:46:49 +00:00
Martin Schwenke
6845dca87e ctdb-protocol: Add marshalling for controls DISABLE_NODE/ENABLE_NODE
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-09-09 01:46:49 +00:00
Martin Schwenke
49dc5d8cd2 ctdb-protocol: Add new controls to disable and enable nodes
These are CTDB_CONTROL_DISABLE_NODE and CTDB_CONTROL_ENABLE_NODE.

For consistency these match CTDB_CONTROL_STOP_NODE and
CTDB_CONTROL_CONTINUE_NODE.  It would be possible to add a single
control but it would need to take data.

The aim is to finally fix races in flag handling.  Previous fixes have
improved the situation but they have only narrowed the race window.
The problem is that the recovery daemon on the master node pushes
flags to nodes the same way that disable and enable are implemented.
So the following sequence is still racy:

1. Node A is disabled
2. Recovery master pulls flags from all nodes including A
3. Node A is enabled
4. Recovery master notices A is disabled and pushes a flag update to
   all nodes including node A
5. Node A is erroneously marked disabled

Node A can not tell if the MODIFY_FLAGS control is from a "ctdb
disable" command or a flag update from the recovery master.

The solution is to use a different mechanism for disable/enable and
for a node to ignore MODIFY_FLAGS controls for their own flags.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-09-09 01:46:49 +00:00
Martin Schwenke
8305f6a7f1 ctdb-recoverd: Push flags for a node if any remote node disagrees
This will usually happen if flags on the node in question change, so
keeping the code simple and pushing to all nodes won't hurt.  When all
nodes come up there might be differences in connected nodes, causing
such "fix ups".  Receiving nodes will ignore no-op pushes.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-09-09 01:46:49 +00:00
Martin Schwenke
620d078714 ctdb-recoverd: Update the local node map before pushing out flags
The resulting code structure looks a little weird.  However, there is
another condition that requires the flags to be pushed that will be
inserted before the continue statement in a subsequent commit..

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-09-09 01:46:49 +00:00
Martin Schwenke
82a075d4d7 ctdb-recoverd: Add a helper variable
Improves readability and simplifies subsequent changes.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-09-09 01:46:49 +00:00
Martin Schwenke
b724c1e6a6 utils: Avoid pylint warning
pylint warns:

  Use lazy % formatting in logging functions

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: David Disseldorp <ddiss@samba.org>
Reviewed-by: Jose A. Rivera <jarrpa@samba.org>

Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Tue Jul 20 05:29:18 UTC 2021 on sn-devel-184
2021-07-20 05:29:18 +00:00
Martin Schwenke
319e27343d utils: Reformat lines that are longer than 80 columns
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: David Disseldorp <ddiss@samba.org>
Reviewed-by: Jose A. Rivera <jarrpa@samba.org>
2021-07-20 04:43:37 +00:00
Martin Schwenke
98c7a38b71 utils: Tweak exception handling to stop flake8 complaining
Don't bother with "as e" to avoid warning about unused variable.
Don't use bare "except:" (though pylint still complains about this
version).

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: David Disseldorp <ddiss@samba.org>
Reviewed-by: Jose A. Rivera <jarrpa@samba.org>
2021-07-20 04:43:37 +00:00
Martin Schwenke
12d3e215a6 utils: Simplify log level logic, drop global variable
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: David Disseldorp <ddiss@samba.org>
Reviewed-by: Jose A. Rivera <jarrpa@samba.org>
2021-07-20 04:43:37 +00:00
Martin Schwenke
e323d16a9d utils: Inline defaults and help strings
Removes an unnecessary level of indirection: defaults and help strings
are now where they are expected.  Also removes some global variables.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: David Disseldorp <ddiss@samba.org>
Reviewed-by: Jose A. Rivera <jarrpa@samba.org>
2021-07-20 04:43:37 +00:00
Martin Schwenke
af5aecced1 utils: Move argument processing into function and call from main()
Removes the need for the global variables currently associated with
this processing.  Also removes unnecessarily double-handling the
defaults, which are assigned to the global variables and set via
add_argument().

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: David Disseldorp <ddiss@samba.org>
Reviewed-by: Jose A. Rivera <jarrpa@samba.org>
2021-07-20 04:43:37 +00:00
Martin Schwenke
e66637a079 utils: Reorder imports so that standard imports are first
Avoids numerous pylint warnings.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: David Disseldorp <ddiss@samba.org>
Reviewed-by: Jose A. Rivera <jarrpa@samba.org>
2021-07-20 04:43:37 +00:00
Martin Schwenke
bd0b2bb6ee utils: Clean up ctdb_etcd_lock using autopep8
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: David Disseldorp <ddiss@samba.org>
Reviewed-by: Jose A. Rivera <jarrpa@samba.org>
2021-07-20 04:43:37 +00:00
Martin Schwenke
939aed0498 utils: Use Python 3
Due to the number of flake8 and pylint warnings it is unclear if the
source has Python 3 incompatibilities.  These will be cleaned up in
subsequent commits.

Signed-off-by: "L.P.H. van Belle" <belle@bazuin.nl>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: David Disseldorp <ddiss@samba.org>
Reviewed-by: Jose A. Rivera <jarrpa@samba.org>
2021-07-20 04:43:37 +00:00
Martin Schwenke
466aa8b6f5 ctdb-scripts: Ignore ShellCheck SC3013 for test -nt
In ShellCheck 0.7.2, POSIX compatibility warnings got their own SC3xxx
error codes, so now both the old and new codes need to be ignored.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri Jun 25 10:06:48 UTC 2021 on sn-devel-184
2021-06-25 10:06:48 +00:00
Martin Schwenke
fc0da6b0f8 ctdb-tests: Force stub version of service in eventscript tests
Fedora 34 now has a shell function for the which command, which causes
these uses of which to return the enclosing function definition rather
than the executable file as expected.

The event script unit tests always expect the stub service command to
be used, so the conditional in these functions is unnecessary.
$CTDB_HELPER_BINDIR already conveniently points to the stub directory,
so use it here.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
2021-06-25 09:16:31 +00:00
Martin Schwenke
23b2fab2c8 ctdb-common: Drop unused include of mkdir_p.h
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-06-25 09:16:31 +00:00
Martin Schwenke
e40d452722 ctdb-daemon: Close server socket when switching to client
The socket is set close-on-exec but that doesn't help for processes
that do not exec().  This should be done for all child processes.

This has been seen in testing where "ctdb shutdown" waits for the
socket to close before succeeding.  It appears that lingering
vacuuming processes have not closed the socket when becoming clients
so they cause "ctdb shutdown" to hang even though the main daemon
process has exited.  The cause of the lingering vacuuming processes
has been previously examined but still isn't understood.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-06-25 09:16:31 +00:00
Martin Schwenke
f7cf8132b0 ctdb-tests: Add debug_locks.sh tests for mutexes
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri May 28 07:34:23 UTC 2021 on sn-devel-184
2021-05-28 07:34:23 +00:00
Amitay Isaacs
99c3b49260 ctdb-scripts: Add lock debugging for tdb mutex locks
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Signed-off-by: Martin Schwenke <martin@meltin.net>
2021-05-28 06:46:29 +00:00