IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
If an NFS service check is set to, say, unhealthy_after=2 then it will
always switch from the (default startup) unhealthy state to healthy,
even if there is a fatal problem. If all services/scripts appear OK
then the node will become healthy. When the counter hits the limit it
will return to unhealthy. This is misleading.
Instead, never use the counter at startup, until the service becomes
healthy. This stops services flapping unhealthy-healthy-unhealthy.
A side-effect is that a service that starts in a broken state will
never be restarted to try to fix the problem. This makes sense. The
counting and restarting really exist to deal with problems that might
occur under load. The first monitor events occur before public IPs
are hosted, so there can be no load. If a service doesn't start
reliably the first time then the admin probably wants to know about
it.
nfs_iterate_test() is updated to run an initial monitor event to mark
the services as healthy. This initialises the counter so it can be
used for the important part of the test. Passing the -i option avoids
running the extra monitor event, so the first iteration will be the
initial monitor event.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This makes initial failure to retrieve statistics less likely to
result in a statistics change. To help with this, statistics
retrieval stderr now goes to the log - only stdout goes to the file.
This means that the test code for checking statistics changes needs to
be redone to actually run the statistics command and check. As with
rpcinfo output, this output needs to behave as deterministically in
the test code as it done in the event script.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Checking statistics is only really relevant to timeouts. That is, if
an rpcinfo times out it is worth checking if the service making
progress. If the RPC service is not registered then the statistics
don't need to be checked because they shouldn't be changing.
The 2 previously added tests added to check statistics progress now
behave identically and fail on all iterations. To support testing
with "timeouts", an optional TIMEOUT flag can now be added to the RPC
service passed to nfs_iterate_test(). 2 new tests are added to
exercise the new behaviour.
The 2 new "if" statements in nfs_iterate_test() could be combined.
However, a subsequent commit would split them and would be more
difficult to read.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
rpc.statd is single-threaded and runs its HA callout synchronously. If
it is too slow then latency accumulates and rpc.statd's backlog grows.
Running a pair of add-client/del-client events with the current code
averages ~0.030s in my test environment. This mean that 1000 clients
reclaiming locks after failover can easily cause 10s of latency. This
could cause rpc.statd to become unresponsive, resulting in a time out
for an rpcinfo-based health check of the status service.
Split the add-client/del-client events out to a standalone
statd_callout executable, written in C, to be used as the HA callout
for rpc.statd. All other functions move to statd_callout_helper.
Now, running a pair of add-client/del-client events in my test
environment averages only ~0.002s. This seems less likely to cause
latency problems.
The standalone statd_callout executable needs to read a configuration
file, which is generated by statd_callout_helper from the "startup"
event. It also needs access to a list of currently assigned public
IPs.
For backward compatibility, during installation a symlink is created
from $CTDB_BASE/statd-callout to the new statd_callout, which is
installed in the helper directory.
Testing this as part of the eventscript unit tests starts to become
even more of a hack than it used to be. However, the dependency on
stubs and the corresponding setup of fake state makes it hard to move
this elsewhere.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Tue Jun 25 04:24:57 UTC 2024 on atb-devel-224
When monitoring an RPC service, the rpcinfo command might time out
even though the service is making progress. In this case, it is just
slow, so counting the timeout as a failure and potentially restarting
the service will not help. The problem is determining if a service is
making progress.
Add a new NFS checks service_stats_command. This command is intended
to run a statistics command. The output is naively compared using
cmp(1). If the output changes then rpcinfo failures are converted to
successes.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This is way more complicated than I would like but, as per the
comment, this is due to complexities in the way public IPs work. The
main consumer will be statd-callout, which will then be able to run as
a non-root user.
Also generate the cache file in test code, whenever the PNN is set.
However, this can cause "ctdb ip" to generate a fake IP layout before
public IPs are setup. So, have the "ctdb ip" stub generate the IP
layout every time it is run to avoid it being stale.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
All of the other uses of ctdb.tdb are in statd-callout.
New variable statd_callout_db makes it easy to change the database
name in future, perhaps even allowing it to be configurable.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
Tweak some lines to avoid overflowing 80 columns.
Best viewed with "git show -w".
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
Using xargs with sort -u to de-duplicate this list was my idea and
causes a couple of things to go wrong. The use of xargs causes
double-quotes to be lost. The resulting $public_ifaces value also
contains newlines. The newlines could be removed with an additional
xargs at the end of the pipeline... but that would add an extra level
of quote stripping.
I have unsuccessfully tried to find an alternative, but still elegant,
command pipeline that de-duplicates the list, while maintaining
quoting.
So, just drop the de-duplication.
This might make interface_ifindex_exists_with_options() slightly less
efficient. However, that function walks the whole list, only
terminating early when a match is found on both interface and options,
so at least it will be correct.
Include an extra testcase.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Thu Apr 18 09:08:34 UTC 2024 on atb-devel-224
get_all_interfaces() functions gets all names for all public interfaces.
However name is misleading. Thus renamed it to get_public_ifaces() and
moved it under functions.
Signed-off-by: Vinit Agnihotri <vagnihotri@ddn.com>
Reviewed-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
If someone wants to enable the witness service
samba-dcerpcd needs to be started as standalone service
BUG: https://bugzilla.samba.org/show_bug.cgi?id=15577
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Guenther Deschner <gd@samba.org>
We can easily monitor if the service is running at all,
that better than no monitoring at all...
BUG: https://bugzilla.samba.org/show_bug.cgi?id=15577
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Guenther Deschner <gd@samba.org>
This effectively provides simple testing for the threshold-based
approach.
Add new script option CTDB_VSFTPD_MONITOR_THRESHOLDS.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Oct 3 04:53:38 UTC 2023 on atb-devel-224
Loading tunables is now done in ctdbd, so find another example for the
"setup" event.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
New in ShellCheck 0.9.0:
SC2317 (info): Command appears to be unreachable. Check usage (or ignore if invoked indirectly).
Also:
SC2086 (info): Double quote to prevent globbing and word splitting.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Andreas Schneider <asn@samba.org>
When testparm processes the output of "testparm -v" (which includes
default values) it appears to do global checks (or some other sort of
initialisation logic) for all specified values. This includes a DNS
lookup for the node's hostname, as a side-effect of a libldap
ldap_set_option() call when processing "ldap debug level". If DNS
servers are down then this can induce timeouts, possibly resulting in
monitor timeouts.
Avoid this by using sed to extract configuration values from the
testparm cache file.
This is already shown to work when retrieving share paths, where
testparm is basically used as cat. Update the sed pattern to avoid
matching empty values on the right-hand side of the equals ('=') -
this avoids the default empty path value (and "smb ports" never has an
empty value).
Corresponding test changes:
* 50.samba.monitor.111.sh no longer expects a failure from being
unable to set smb ports, since testparm is no longer used in that
code path.
* smb ports needs to be set in fake smb.conf so it is in the default
output and can be extracted using sed.
* Although testparm --parameter-name is no longer used in
50.samba.script, update the stub implementation (in case it is ever
used again) to extract from fake smb.conf, since "smb ports" is now
set there. The change from $parameter to $param allows a long line
to stay below 80 columns.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
Autobuild-User(master): Volker Lendecke <vl@samba.org>
Autobuild-Date(master): Tue Feb 14 08:43:53 UTC 2023 on atb-devel-224
The list changed back to space-separated in commit
93448f4be92d4e018aaf2f9705f0351360b2ed0f, so simplify the code a
little.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
For example:
In /home/martins/samba/samba/ctdb/tools/onnode line 304:
[ "$nodes" != "${nodes%[ ${nl}]*}" ] && verbose=true
^---^ SC2295 (info): Expansions inside ${..} need to be quoted separately, otherwise they match as patterns.
Did you mean:
[ "$nodes" != "${nodes%[ "${nl}"]*}" ] && verbose=true
For more information:
https://www.shellcheck.net/wiki/SC2295 -- Expansions inside ${..} need to b...
Who knew? Thanks ShellCheck!
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
For memory usage, no need to dump all of this data on every failed
monitor event. The first call will be enough to diagnose the problem.
The node will then go unhealthy, drop clients and memory usage should
then drop.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri Jul 22 07:32:54 UTC 2022 on sn-devel-184
If filesystem usage exceeds the unhealthy threshold then checking
memory usage checking is not done. Always do them both.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Use printf to allow easier line breaks and use some early returns.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
About to modify this file, so reformat first as per recent Samba
convention.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
SC2164 (warning): Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
A problem can only occur if /etc/ctdb/ or an important subdirectory is
removed, which means the script itself would not be found. Use && to
silence ShellCheck.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This is functionally the same as ctdb_release_all_ips().
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Note that order of sed expressions matters: the expression to delete
comment lines must come first as the second expression would transform
# comment
to
comment
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14826
Signed-off-by: Ralph Boehme <slow@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Instead of master/slave.
Nearly all of these are simple textual substitutions, which preserve
the case of the original. A couple of minor cleanups were made in the
documentation (such as "LVSMASTER" -> "LVS leader").
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Instead of master/slave.
Nearly all of these are simple textual substitutions, which preserve
the case of the original.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Till now 50.samba script was based on RHEL versions <=6 where we didn't
have separate start up script for nmb and smbd used to start nmbd when
required. Now that nmbd has its own start up script named "nmb" it is
reasonable to have "nmb" as default value for CTDB_SERVICE_NMB inside
new 48.netbios ctdb script.
Signed-off-by: Anoop C S <anoopcs@redhat.com>
Reviewed-by: Guenther Deschner <gd@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
This change basically moves out nmbd references from 50.samba script to
a new 48.netbios script. Accordingly ctdb test scripts are tweaked to
cope with newly added script.
Signed-off-by: Guenther Deschner <gd@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
This covers the following:
SC2166: Prefer [ p ] && [ q ] as [ p -a q ] is not well defined.
SC2166: Prefer [ p ] || [ q ] as [ p -o q ] is not well defined.
POSIX agrees that -a and -o should not be used:
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/test.html
Fixing these doesn't cause much churn.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Commit ea7708d8c7fa674111ccea58b3cd0757765c702a simplified
01.reclock.script and removed include of functions file which is
required for setting CTDB_HELPER_BINDIR and for die() function.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
The "init" event is only run once so don't bother caching the
configured value of the recovery lock. Add some extra error checking.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Fri Jul 26 04:52:04 UTC 2019 on sn-devel-184
CTDB's system memory monitoring in 05.system.script monitors both main
memory and swap. The swap monitoring was originally based on
the (possibly incorrect, see below) idea that swap space stacks on top
of main memory, so that when a system starts filling swap space then
this is supposed to be a good sign that the system is running out of
memory. Additionally, performance on a Linux system tends to be
destroyed by the I/O associated with a lot of swapping to spinning
disks.
However, some platforms default to creating only 4GB of swap space
even when there is 128GB of main memory. With such a small swap to
main memory ratio, memory pressure can force swap to be nearly full
even when a significant amount of main memory is still available and
the system is performing well. This suggests that checking swap
utilisation might be less than useful in many circumstances.
So, remove the separate swap space checking and change the memory
check to cover the total of main memory and swap space.
Test function set_mem_usage() still takes an argument for each of main
memory and swap space utilisation. For simplicity, the same number is
now passed twice to make the intended results comprehensible. This
could be changed later.
A couple of tests are cleaned up to no longer use hard-coded
/proc/meminfo and ps output.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>