mirror of https://github.com/samba-team/samba.git synced 2025-01-03 01:18:10 +03:00
Commit Graph

9240 Commits

Author SHA1 Message Date
Martin Schwenke
b3e2c69ad9 ctdb-scripts: update_tickles() should use the public IPs cache
This avoids duplicating logic.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jerry Heyman <jheyman@ddn.com>
2024-11-06 23:03:42 +00:00
Martin Schwenke
1a4a6c46f1 ctdb-scripts: Don't list connections when not hosting IPs
With an empty IP filter, all incoming connections to port 2049 will be
listed, not just those to public IP addresses.  This causes error
messages like the following to be logged:

  ctdb-eventd[...]: 60.nfs: Failed to add 1 tickles

since the connection being added seems to be for a random NFS mount
that doesn't use a public IP address.

This has been a problem for a long time (probably since commit
04fe9e2074 in 2015).  It isn't currently
a huge deal because it only affects NFS connections.  However, this
code will soon be used to track connections to public IP addresses on
all ports.  This would result in a constant stream of log messages,
since there will always be some active connections.

The theory behind the fix is that if a node hosts no public IPs then
it should have no relevant connections and has no business changing
the list of registered tickles.
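
The idea can be sketched in a few lines of Python. This is only an illustration of the reasoning above, not the event script's code: `connections_to_track()` and the `(source, destination IP)` tuples are hypothetical, and the addresses are documentation examples.

```python
def connections_to_track(connections, public_ips):
    """Return only connections whose destination is a hosted public IP.

    Hypothetical helper: each connection is a (source, destination_ip) tuple.
    """
    hosted = set(public_ips)
    if not hosted:
        # No public IPs hosted: no connection is relevant.  Returning early
        # avoids treating an empty filter as "match everything".
        return []
    return [conn for conn in connections if conn[1] in hosted]


conns = [
    ("198.51.100.7:800", "192.0.2.10"),   # to a public IP
    ("198.51.100.8:801", "203.0.113.5"),  # to some unrelated local address
]
print(connections_to_track(conns, ["192.0.2.10"]))
print(connections_to_track(conns, []))
```

With an empty list of public IPs the function returns nothing, so the caller has no tickles to add or remove.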

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jerry Heyman <jheyman@ddn.com>
2024-11-06 23:03:42 +00:00
Martin Schwenke
3410eddd93 ctdb-scripts: Reformat with "shfmt -w -p -i 0 -fn"
Massage a couple of lines manually so they're formatted sanely given
the new indentation.  Re-run shfmt to ensure no further changes.

Best reviewed with "git show -w".

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jerry Heyman <jheyman@ddn.com>
2024-11-06 23:03:42 +00:00
Martin Schwenke
025bd34dfc ctdb-doc: Improve 10.interface documentation and comments
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jerry Heyman <jheyman@ddn.com>
2024-11-06 23:03:42 +00:00
Martin Schwenke
60067e2a74 ctdb-tests: Fix ss -a not supported
This is currently just a series of typos.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jerry Heyman <jheyman@ddn.com>
2024-11-06 23:03:42 +00:00
Martin Schwenke
4817e32c1d ctdb-tests: Drop unsupported long options from ss stub usage
These have not been supported since commit
896c77df1c in 2018.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jerry Heyman <jheyman@ddn.com>
2024-11-06 23:03:42 +00:00
Martin Schwenke
557b034200 ctdb-tests: Ensure ss stub handles square brackets around addresses
It isn't unreasonable for unit test cases to use square brackets in
their input.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jerry Heyman <jheyman@ddn.com>
2024-11-06 23:03:42 +00:00
Douglas Bagnall
076c284d6f ctdb:tests: s/the the\b/the/ in comments
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Volker Lendecke <vl@samba.org>
2024-11-06 10:57:34 +00:00
Martin Schwenke
919510d86b ctdb-scripts: Don't set arp_filter=1 by default in 10.interface
That is, no longer set sysctl net.ipv4.conf.all.arp_filter=1 in
10.interface.  Only do this in 13.per_ip_routing.

This effectively reverts commit
0ebd7beb4b by Ronnie Sahlberg from 2007.
I have discussed this with Ronnie.  This setting was originally added
to force incoming traffic to the interface hosting each IP.  This
would spread the load across multiple interfaces hosting the same
subnet.  Without the setting, incoming traffic would go to the first
interface to answer an ARP request, so could be unbalanced if one
interface tended to answer more quickly.

However, networks are now faster and interface bonding/teaming works
well in Linux, so it is less likely that multiple interfaces will be
used in this way.

Also, problems are occurring in exactly the case this is meant to
help: when multiple interfaces host the same subnet.

The Linux kernel documentation for this option says:

  arp_filter - BOOLEAN
        - 1 - Allows you to have multiple network interfaces on the same
          subnet, and have the ARPs for each interface be answered
          based on whether or not the kernel would route a packet from
          the ARP'd IP out that interface (therefore you must use source
          based routing for this to work). In other words it allows control
          of which cards (usually 1) will respond to an arp request.

        - 0 - (default) The kernel can respond to arp requests with addresses
          from other interfaces. This may seem wrong but it usually makes
          sense, because it increases the chance of successful communication.
          IP addresses are owned by the complete host on Linux, not by
          particular interfaces. Only for more complex setups like load-
          balancing, does this behaviour cause problems.

        arp_filter for the interface will be enabled if at least one of
        conf/{all,interface}/arp_filter is set to TRUE,
        it will be disabled otherwise

Note the part for arp_filter=1 that says "you must use source based
routing for this to work".  The problems are probably due to a lack of
source-based routing when this is only used with 10.interface.  In
this case, outbound packets can come from a different
interface (corresponding to the first matching route), with a
different MAC address.  There is clearly some infrastructure or packet
filtering out there that objects to such asymmetric packet flows.

So, drop this setting from 10.interface because it isn't working as
intended.  Continue to enable it in 13.per_ip_routing, which exists to
set up the required source-based routing.

This change may affect balancing of packet flows when public IP
addresses can be hosted by multiple interfaces, but does not stop that
feature from working.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Anoop C S <anoopcs@samba.org>

Autobuild-User(master): Anoop C S <anoopcs@samba.org>
Autobuild-Date(master): Thu Oct 17 18:53:32 UTC 2024 on atb-devel-224
2024-10-17 18:53:32 +00:00
Anoop C S
9ad287ed9c ctdb-build: Add missing ctdb-tcp dependency
Since 02c9e7a63f, common/path.h is
included within ctdb/tcp/tcp_connect.c. Therefore add ctdb-util
as a dependency for ctdb-tcp.

Signed-off-by: Anoop C S <anoopcs@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>

Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Tue Oct  8 12:27:17 UTC 2024 on atb-devel-224
2024-10-08 12:27:17 +00:00
Martin Schwenke
9a2249c990 ctdb-server: Use find_public_ip_vnn() in a couple of extra places
Reorder code to use early returns, modernise debug.

Best reviewed with "git show -w".

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Anoop C S <anoopcs@samba.org>

Autobuild-User(master): Anoop C S <anoopcs@samba.org>
Autobuild-Date(master): Tue Oct  8 06:42:04 UTC 2024 on atb-devel-224
2024-10-08 06:42:04 +00:00
Martin Schwenke
2167261b0e ctdb-server: Clean up find_public_ip_vnn()
Fix the comment (NULL versus -1), apply some README.Coding.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Anoop C S <anoopcs@samba.org>
2024-10-08 05:34:30 +00:00
Martin Schwenke
d268c605c1 ctdb-daemon: Ensure CTDB_BASE is set, don't fetch it
Uses of CTDB_BASE in the subsequent code are now handled by the path
module, so there is no point getting the value of CTDB_BASE.  Instead,
check that the attempt to set it worked, noting that:

  [...] if overwrite is zero, then the value of name is not
  changed (and setenv() returns a success status).

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Anoop C S <anoopcs@samba.org>
2024-10-08 05:34:30 +00:00
Martin Schwenke
b6151faf45 ctdb-daemon: Use path_etcdir_append() to construct some paths
No need to use CTDB_BASE directly.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Anoop C S <anoopcs@samba.org>
2024-10-08 05:34:30 +00:00
Martin Schwenke
3b613a085b ctdb-daemon: Replace remaining uses of CTDB_NO_MEMORY() in this file
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Anoop C S <anoopcs@samba.org>
2024-10-08 05:34:30 +00:00
Martin Schwenke
01cc3f0784 ctdb-daemon: Clean up error handling and debug
Add some missing error handling and error messages.

Remove a use of CTDB_NO_MEMORY(), which then renders the caller's use
of ctdb_errstr() pointless, so remove that too.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Anoop C S <anoopcs@samba.org>
2024-10-08 05:34:30 +00:00
Martin Schwenke
3429ba764c ctdb-daemon: Use ctdb_vnn_address_string() in old-style debugging
Modernise the debug macros along the way.

These are done separately because they will require a little more
patience to review.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Anoop C S <anoopcs@samba.org>
2024-10-08 05:34:30 +00:00
Martin Schwenke
c17e629a8a ctdb-daemon: Add ctdb_vnn_address_string() and use in trivial places
Define a static function to return the string.  This clearly doesn't
need a ctdb_ prefix, but it matches ctdb_vnn_iface_string(), so
doesn't look out of place.

Use it in the places where review is trivial.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Anoop C S <anoopcs@samba.org>
2024-10-08 05:34:30 +00:00
Martin Schwenke
766e6d35c4 ctdb-daemon: Store public address string in VNN
These are currently converted to strings constantly in log messages
and other places.  This clutters the code and probably has a minor
performance impact.

Add a new string field to the VNN structure.  Populate it when a
public address is added and the VNN structure is allocated.  This is
consistent with how node addresses are handled.

Don't use it yet, or this commit becomes huge.

A short-term goal is that each VNN public address will be converted to
a string only once.  A longer-term goal is to reduce use of
ctdb_addr_to_str().

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Anoop C S <anoopcs@samba.org>
2024-10-08 05:34:30 +00:00
Martin Schwenke
ef3d9c227f ctdb-daemon: Fix a comment
The word "no" was accidentally dropped in commit
1e47a1b3f6.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Anoop C S <anoopcs@samba.org>
2024-10-08 05:34:30 +00:00
Martin Schwenke
7b4447b4d3 ctdb-daemon: Drop unused arguments
Unused since commit a10545ab6b.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Anoop C S <anoopcs@samba.org>
2024-10-08 05:34:30 +00:00
Martin Schwenke
46f6b50f7a ctdb-daemon: Improve error handling when releasing all IPs
Currently, event failures are completely ignored in favour of checking
if the IP is on an interface.  This misses the case where event
scripts up to and including 10.interface succeed, but something later
fails.  When that occurs, count is incremented, so the failure is
counted as a success in the summary that is logged.

Fail when releaseip fails even though 10.interface succeeded in
releasing the IP.  This may result in the IP address coming back, but
that's a different problem.

Underlying this is a design question about when releaseip is
successful.  Should releaseip be a distinct operation, with subsequent
reconfigurations considered separately?

Update logging to clearly identify each of the 3 possible errors.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Anoop C S <anoopcs@samba.org>
2024-10-08 05:34:30 +00:00
Martin Schwenke
d1cb6dca72 ctdb-tcp: Modernise a DEBUG
This is the last old-style one in this file.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: John Mulligan <jmulligan@redhat.com>
Reviewed-by: Anoop C S <anoopcs@samba.org>

Autobuild-User(master): Anoop C S <anoopcs@samba.org>
Autobuild-Date(master): Mon Oct  7 17:12:18 UTC 2024 on atb-devel-224
2024-10-07 17:12:18 +00:00
Martin Schwenke
939e5bdfd2 ctdb-tcp: Only attempt to automatically bind to local IPs
Automatic node address selection in the TCP transport does not work if
net.ipv4.ip_nonlocal_bind=1 because all nodes will be able to bind()
to the first address in the nodes list.

Before getting to the bind() step, add a check to see if an address is
local (i.e. on an interface).  If not, it is not considered.

This is defensively coded so that this step is skipped if local
addresses can not be retrieved.
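
A rough Python sketch of the selection logic described above, under stated assumptions: `pick_node_address()` and `try_bind` are hypothetical names, and `local_addrs` is `None` when the local addresses could not be retrieved. It is not the TCP transport's actual code.

```python
def pick_node_address(node_addrs, local_addrs, try_bind):
    """Return the first node address that is local and can be bound.

    local_addrs is a set of addresses on local interfaces, or None if
    they could not be retrieved (the locality check is then skipped).
    try_bind is a callable standing in for the bind() attempt.
    """
    for addr in node_addrs:
        if local_addrs is not None and addr not in local_addrs:
            # Not on any local interface: do not even try to bind.
            continue
        if try_bind(addr):
            return addr
    return None


nodes = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]
always_binds = lambda addr: True  # simulates net.ipv4.ip_nonlocal_bind=1

print(pick_node_address(nodes, {"192.0.2.2"}, always_binds))
print(pick_node_address(nodes, None, always_binds))
```

When every bind succeeds, only the locality check distinguishes this node's address from the first entry in the nodes list.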

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: John Mulligan <jmulligan@redhat.com>
Reviewed-by: Anoop C S <anoopcs@samba.org>
2024-10-07 15:58:38 +00:00
Martin Schwenke
5af2f09a1f ctdb-server: Optimise local IP verification
It is more efficient calling ctdb_sys_local_ip_check() inside a loop
compared to calling ctdb_sys_have_ip().  There is a chance that this
is premature optimisation... but it sure is easy.  Fall back to
checking with bind().

I think these checks really exist because of the weirdness fixed by
commit 4b4e4d8870.  However, we might as
well do what we can.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: John Mulligan <jmulligan@redhat.com>
Reviewed-by: Anoop C S <anoopcs@samba.org>
2024-10-07 15:58:38 +00:00
Martin Schwenke
0536d7a98b ctdb-common: Reimplement ctdb_sys_have_ip() using new infrastructure
It can now be used when net.ipv4.ip_nonlocal_bind=1.

This makes the recovery daemon's local IP verification inefficient.
It can be optimised in a subsequent commit.

Fall back to bind() if unable to fetch IPs.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: John Mulligan <jmulligan@redhat.com>
Reviewed-by: Anoop C S <anoopcs@samba.org>
2024-10-07 15:58:38 +00:00
Martin Schwenke
a489b6699d ctdb-common: Make the argument to ctdb_sys_have_ip() const
Arguably, this would have made sense back in commit
bf86562144.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: John Mulligan <jmulligan@redhat.com>
Reviewed-by: Anoop C S <anoopcs@samba.org>
2024-10-07 15:58:38 +00:00
Martin Schwenke
a15167aafe ctdb-server: Add some local variables
Improve readability by not repeating the complex expression now
assigned to addr.  ctdb_sys_have_ip() is called in both arms of the
if/else, so call it once when declaring the new variable.

Modernise debug macros while touching lines.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: John Mulligan <jmulligan@redhat.com>
Reviewed-by: Anoop C S <anoopcs@samba.org>
2024-10-07 15:58:38 +00:00
Martin Schwenke
33e28deeef ctdb-tests: Add test code for ctdb_sys_have_ip()
Do not add any automated test cases because they will always be racy.
This allows manual testing of the function.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: John Mulligan <jmulligan@redhat.com>
Reviewed-by: Anoop C S <anoopcs@samba.org>
2024-10-07 15:58:38 +00:00
Martin Schwenke
cc99d0047d ctdb-common: Add functions for local IP address checking
This is a wrapper around getifaddrs(2), which is in libreplace, so
should always be available.

Some users want to set net.ipv4.ip_nonlocal_bind = 1.  So, CTDB needs
a way of testing if public IPs are present, without using bind(2).

Doing all of this unconditionally in ctdb_sys_have_ip() will be
inefficient in the recovery daemon's local IP verification if there
are a lot of IP addresses.  Split it this way so the interface
information can be retrieved once and used multiple times.
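
The split can be illustrated with a small Python sketch. The names are hypothetical: `fetch_addrs` stands in for whatever retrieves the interface addresses (getifaddrs(2) in the C code), and `make_local_ip_checker()` is not a CTDB function.

```python
import ipaddress


def make_local_ip_checker(fetch_addrs):
    """Fetch interface addresses once; return a cheap membership test."""
    local = {ipaddress.ip_address(a) for a in fetch_addrs()}

    def is_local(addr):
        # Parsing both sides means differently written forms of the
        # same address compare equal.
        return ipaddress.ip_address(addr) in local

    return is_local


calls = []


def fake_fetch():
    # Hypothetical stand-in for the expensive interface query.
    calls.append(1)
    return ["192.0.2.10", "2001:db8::1"]


is_local = make_local_ip_checker(fake_fetch)
for ip in ["192.0.2.10", "192.0.2.11", "2001:0db8:0:0:0:0:0:1"]:
    print(ip, is_local(ip))
print("fetches:", len(calls))
```

The expensive query runs once, however many public IPs are then checked against the result.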

This doesn't appear to need IP canonicalisation for IPv4-mapped IPv6
addresses.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: John Mulligan <jmulligan@redhat.com>
Reviewed-by: Anoop C S <anoopcs@samba.org>
2024-10-07 15:58:38 +00:00
Martin Schwenke
aab763d659 ctdb-protocol: Add function ctdb_sock_addr_from_sockaddr()
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: John Mulligan <jmulligan@redhat.com>
Reviewed-by: Anoop C S <anoopcs@samba.org>
2024-10-07 15:58:38 +00:00
Martin Schwenke
250947611c ctdb-tests: Fix test failure when tests are installed
This currently works when tests are run in-tree.

However, when installed, use of an incorrect variable means it fails
to find statd_callout in the tests/ subdirectory.  Switch to using the
correct variable.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Anoop C S <anoopcs@samba.org>

Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Sun Oct  6 11:07:05 UTC 2024 on atb-devel-224
2024-10-06 11:07:05 +00:00
Martin Schwenke
18b0ea3e9a ctdb-tests: Add missing quotes in test output
These should have caused test failure since commit
ef921bdbdb.  However, the test failure
occurred in a sub-shell, which covered the failure.  So, add an error
exit if the sub-shell fails.

While here, add an error exit for another potential uncaught
sub-shell-related failure in a related test.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Anoop C S <anoopcs@samba.org>
2024-10-06 09:52:35 +00:00
Volker Lendecke
25a222225d ctdb: Use str_list_add_printf() in lock_helper_args()
Saves lines; str_list_add_printf() takes care of NULL checks.

Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>

Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Sun Sep 22 10:44:59 UTC 2024 on atb-devel-224
2024-09-22 10:44:59 +00:00
Volker Lendecke
83716809a8 ctdb: Change the ctdb_vfork_exec prototype to const char*const*
I could not find out how to cast a char ** to const char ** without
warning. This transfers fine to the execv call as well.

Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2024-09-22 09:36:36 +00:00
Volker Lendecke
53750d9deb ctdb: Fix a typo
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Noel Power <noel.power@suse.com>
2024-09-20 17:13:37 +00:00
Volker Lendecke
65b3081f4b ctdb: Use str_list_add_printf() in debug_locks_args()
Saves lines; str_list_add_printf() takes care of NULL checks.

Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Noel Power <noel.power@suse.com>
2024-09-20 17:13:37 +00:00
Volker Lendecke
2fa0eabe64 ctdb: Make ctdb_lock_timeout_handler() easier to understand
Don't hide the real action inside an if-branch

Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Noel Power <noel.power@suse.com>
2024-09-20 17:13:37 +00:00
Martin Schwenke
574f2c3ed8 ctdb-tests: Add persistent TDB backup tests
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Fri Aug 30 00:08:41 UTC 2024 on atb-devel-224
2024-08-30 00:08:41 +00:00
Martin Schwenke
05da9001b9 ctdb-scripts: Add support for backing up persistent TDBs
Signed-off-by: Vinit Agnihotri <vagnihotri@ddn.com>
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2024-08-29 22:48:33 +00:00
Martin Schwenke
82250f3629 ctdb-scripts: Move database handling to its own event script
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2024-08-29 22:48:33 +00:00
Martin Schwenke
9c354e358e ctdb-scripts: Reformat with "shfmt -w -p -i 0 -fn"
Best reviewed with "git show -w".

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2024-08-29 22:48:33 +00:00
Martin Schwenke
b4c7a4f7f0 ctdb-scripts: Remove unused variable NFS_HOSTNAME
This was passed to CTDB's old smnotify.  This has been replaced by use
of nfs-utils' sm-notify, which doesn't need this.

In test, a fake NFS_HOSTNAME is still needed.  Real sm-notify will get
it from a reverse host lookup of the IP address.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2024-08-29 22:48:33 +00:00
Martin Schwenke
ece6153038 ctdb-scripts: Use nfs-utils' sm-notify instead of CTDB's smnotify
CTDB's smnotify does not support IPv6 and is difficult to maintain.

So, create directories of files and pass them to nfs-utils' sm-notify.

There is an implied change here, because nfs-utils' sm-notify stopped
sending IP addresses as mon_name back in 2010:

  http://git.linux-nfs.org/?p=steved/nfs-utils.git;a=commitdiff;h=900df0e7c0b9006d72d8459b30dc2cd69ce495a5

This will change advice given in the wiki to use a hostname for the
cluster with round-robin DNS, since this is what is best supported.

Another behavioural change is that sm-notify only sends "up"
notifications with an odd state.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2024-08-29 22:48:33 +00:00
Martin Schwenke
d89506449f ctdb-failover: Add ctdb_smnotify_helper
statd callout will shortly be updated to use NFS utils' sm-notify.
This tiny helper will be used to create on-disk state files used by
sm-notify.  These state files contain endian-specific fields, so
better to write a simple C implementation than to do crazy things in a
shell script (or call out to Python).

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2024-08-29 22:48:33 +00:00
Volker Lendecke
3cc3329420 ctdb: Add a NULL check to convert_node_map_to_list()
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jennifer Sutton <jsutton@samba.org>
2024-08-27 07:19:32 +00:00
Martin Schwenke
578dfa5765 ctdb-scripts: Avoid flapping NFS services at startup
If an NFS service check is set to, say, unhealthy_after=2 then it will
always switch from the (default startup) unhealthy state to healthy,
even if there is a fatal problem.  If all services/scripts appear OK
then the node will become healthy.  When the counter hits the limit it
will return to unhealthy.  This is misleading.

Instead, never use the counter at startup, until the service becomes
healthy.  This stops services flapping unhealthy-healthy-unhealthy.

A side-effect is that a service that starts in a broken state will
never be restarted to try to fix the problem.  This makes sense.  The
counting and restarting really exist to deal with problems that might
occur under load.  The first monitor events occur before public IPs
are hosted, so there can be no load.  If a service doesn't start
reliably the first time then the admin probably wants to know about
it.
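
The intended behaviour can be modelled with a short Python sketch. This is an assumption-laden illustration, not the event script logic: `ServiceMonitor` is a hypothetical class, and `unhealthy_after` is taken to mean "unhealthy after that many consecutive failures".

```python
class ServiceMonitor:
    """Toy model of failure counting that is disabled at startup."""

    def __init__(self, unhealthy_after):
        self.unhealthy_after = unhealthy_after
        self.failures = None  # None: the service has never been healthy

    def check(self, ok):
        """Record one monitor result; return True if considered healthy."""
        if ok:
            self.failures = 0  # healthy: counting is now enabled
            return True
        if self.failures is None:
            # Startup: never been healthy, so stay unhealthy instead of
            # counting up from zero and briefly appearing healthy.
            return False
        self.failures += 1
        return self.failures < self.unhealthy_after


broken = ServiceMonitor(unhealthy_after=2)
print([broken.check(False) for _ in range(3)])

loaded = ServiceMonitor(unhealthy_after=2)
print([loaded.check(r) for r in (True, False, False)])
```

A service that is broken from the start never reports healthy, while one that has been healthy still tolerates a single failure before being marked unhealthy.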

nfs_iterate_test() is updated to run an initial monitor event to mark
the services as healthy.  This initialises the counter so it can be
used for the important part of the test.  Passing the -i option avoids
running the extra monitor event, so the first iteration will be the
initial monitor event.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2024-08-20 22:50:34 +00:00
Martin Schwenke
18a29ed367 ctdb-scripts: Make initial statistics output empty
This makes initial failure to retrieve statistics less likely to
result in a statistics change.  To help with this, statistics
retrieval stderr now goes to the log - only stdout goes to the file.

This means that the test code for checking statistics changes needs to
be redone to actually run the statistics command and check.  As with
rpcinfo output, this output needs to behave as deterministically in
the test code as it does in the event script.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2024-08-20 22:50:34 +00:00
Martin Schwenke
032b7b49c9 ctdb-scripts: Only consider statistics on timeout
Checking statistics is only really relevant to timeouts.  That is, if
an rpcinfo times out it is worth checking if the service is making
progress.  If the RPC service is not registered then the statistics
don't need to be checked because they shouldn't be changing.
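
As a rough Python sketch of this decision (hypothetical names throughout; the result strings and `service_failed()` are illustrative and do not reflect the script's real interface):

```python
def service_failed(rpc_result, stats_before, stats_after):
    """Decide whether one RPC service check counts as a failure.

    rpc_result is "ok", "timeout" or "unregistered" (illustrative values).
    Statistics are only consulted when the check timed out.
    """
    if rpc_result == "ok":
        return False
    if rpc_result == "timeout":
        # Changed statistics mean the service is still making progress,
        # so a timeout alone is not treated as a failure.
        return stats_before == stats_after
    # For example, service not registered: statistics are irrelevant.
    return True


print(service_failed("timeout", "ops 10", "ops 25"))
print(service_failed("timeout", "ops 10", "ops 10"))
print(service_failed("unregistered", "ops 10", "ops 25"))
```

Only the timeout branch looks at the statistics; every other non-OK result fails regardless of them.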

The 2 tests previously added to check statistics progress now
behave identically and fail on all iterations.  To support testing
with "timeouts", an optional TIMEOUT flag can now be added to the RPC
service passed to nfs_iterate_test().  2 new tests are added to
exercise the new behaviour.

The 2 new "if" statements in nfs_iterate_test() could be combined.
However, a subsequent commit would split them and would be more
difficult to read.

Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2024-08-20 22:50:34 +00:00
Martin Schwenke
f7a96deafa ctdb-tests: Make _rpc_service_up() and _rpc_services_down() internal
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2024-08-20 22:50:34 +00:00