Fix the comment (NULL versus -1), apply some README.Coding.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Anoop C S <anoopcs@samba.org>
Uses of CTDB_BASE in the subsequent code are now handled by the path
module, so there is no point getting the value of CTDB_BASE. Instead,
check that the attempt to set it worked, noting that:
[...] if overwrite is zero, then the value of name is not
changed (and setenv() returns a success status).
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Anoop C S <anoopcs@samba.org>
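The setenv() behaviour quoted above can be illustrated with a minimal sketch. The helper name set_default_base() and its error message are hypothetical and are not taken from the CTDB source; only setenv(3) and its overwrite argument are standard.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not CTDB code): give CTDB_BASE a default value
 * without clobbering a value already present in the environment.
 * Returns 0 on success, or an errno value if setenv() fails.
 */
static int set_default_base(const char *default_base)
{
	int ret;

	assert(default_base != NULL);

	/*
	 * overwrite == 0: an existing value is left unchanged and
	 * setenv() still returns success, so only the return value
	 * needs checking.
	 */
	ret = setenv("CTDB_BASE", default_base, 0);
	if (ret != 0) {
		ret = errno;
		fprintf(stderr,
			"Unable to set CTDB_BASE (%s)\n",
			strerror(ret));
		return ret;
	}

	return 0;
}
```

With this shape the caller never needs to read CTDB_BASE back: a zero return means the variable is set, either to the pre-existing value or to the default.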
Add some missing error handling and error messages.
Remove a use of CTDB_NO_MEMORY(), which then renders the caller's use
of ctdb_errstr() pointless, so remove that too.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Anoop C S <anoopcs@samba.org>
Modernise the debug macros along the way.
These are done separately because they will require a little more
patience to review.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Anoop C S <anoopcs@samba.org>
Define a static function to return the string. This clearly doesn't
need a ctdb_ prefix, but it matches ctdb_vnn_iface_string(), so
doesn't look out of place.
Use it in the places where review is trivial.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Anoop C S <anoopcs@samba.org>
These are currently converted to strings constantly in log messages
and other places. This clutters the code and probably has a minor
performance impact.
Add a new string field to the VNN structure. Populate it when a
public address is added and the VNN structure is allocated. This is
consistent with how node addresses are handled.
Don't use it yet, or this commit becomes huge.
A short-term goal is that each VNN public address will be converted to
a string only once. A longer-term goal is to reduce use of
ctdb_addr_to_str().
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Anoop C S <anoopcs@samba.org>
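As a rough illustration of the idea described above (convert the address to a string once, at allocation time, and return the cached string afterwards), here is a simplified sketch. The structure and function names are hypothetical, it handles IPv4 only, and it uses calloc() rather than talloc, so it should not be read as the actual VNN code.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a VNN (IPv4 only) */
struct example_vnn {
	struct in_addr addr;
	char name[INET_ADDRSTRLEN]; /* string form, filled in once */
};

/* Allocate the structure and populate the string field immediately */
static struct example_vnn *example_vnn_new(const char *ip)
{
	struct example_vnn *vnn;

	assert(ip != NULL);

	vnn = calloc(1, sizeof(*vnn));
	if (vnn == NULL) {
		return NULL;
	}

	if (inet_pton(AF_INET, ip, &vnn->addr) != 1) {
		free(vnn);
		return NULL;
	}

	/* The only address-to-string conversion for this structure */
	if (inet_ntop(AF_INET, &vnn->addr, vnn->name, sizeof(vnn->name)) ==
	    NULL) {
		free(vnn);
		return NULL;
	}

	return vnn;
}

/* Accessor used by log messages: no conversion, just the cached string */
static const char *example_vnn_address_string(const struct example_vnn *vnn)
{
	return vnn->name;
}
```

Log statements can then call the accessor as often as they like without repeating the conversion.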
The word "no" was accidentally dropped in commit
1e47a1b3f6ab1e2ad9d86dfb28c3e086c99a97e5.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Anoop C S <anoopcs@samba.org>
Unused since commit a10545ab6bd8a1b9ca87b0fdba8381cb8af0e284.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Anoop C S <anoopcs@samba.org>
Currently, event failures are completely ignored in favour of checking
if the IP is on an interface. This misses the case where event
scripts up to and including 10.interface succeed, but something later
fails. When that occurs, count is incremented, so the failure is
counted as a success in the summary that is logged.
Fail when releaseip fails even though 10.interface succeeded in
releasing the IP. This may result in the IP address coming back, but
that's a different problem.
Underlying this is a design question about when releaseip is
successful. Should releaseip be a distinct operation, with subsequent
reconfigurations considered separately?
Update logging to clearly identify each of the 3 possible errors.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Anoop C S <anoopcs@samba.org>
This is the last old-style one in this file.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: John Mulligan <jmulligan@redhat.com>
Reviewed-by: Anoop C S <anoopcs@samba.org>
Autobuild-User(master): Anoop C S <anoopcs@samba.org>
Autobuild-Date(master): Mon Oct 7 17:12:18 UTC 2024 on atb-devel-224
Automatic node address selection in the TCP transport does not work if
net.ipv4.ip_nonlocal_bind=1 because all nodes will be able to bind()
to the first address in the nodes list.
Before getting to the bind() step, add a check to see if an address is
local (i.e. on an interface). If not, it is not considered.
This is defensively coded so that this step is skipped if local
addresses cannot be retrieved.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: John Mulligan <jmulligan@redhat.com>
Reviewed-by: Anoop C S <anoopcs@samba.org>
It is more efficient calling ctdb_sys_local_ip_check() inside a loop
compared to calling ctdb_sys_have_ip(). There is a chance that this
is premature optimisation... but it sure is easy. Fall back to
checking with bind().
I think these checks really exist because of the weirdness fixed by
commit 4b4e4d8870475d994fe42a7b2c57dc69842d91f6. However, we might as
well do what we can.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: John Mulligan <jmulligan@redhat.com>
Reviewed-by: Anoop C S <anoopcs@samba.org>
It can now be used when net.ipv4.ip_nonlocal_bind=1.
This makes the recovery daemon's local IP verification inefficient.
It can be optimised in a subsequent commit.
Fall back to bind() if unable to fetch IPs.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: John Mulligan <jmulligan@redhat.com>
Reviewed-by: Anoop C S <anoopcs@samba.org>
Arguably, this would have made sense back in commit
bf86562144fe4e9541bd993519aca958c2bdb794.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: John Mulligan <jmulligan@redhat.com>
Reviewed-by: Anoop C S <anoopcs@samba.org>
Improve readability by not repeating the complex expression now
assigned to addr. ctdb_sys_have_ip() is called in both arms of the
if/else, so call it once when declaring the new variable.
Modernise debug macros while touching lines.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: John Mulligan <jmulligan@redhat.com>
Reviewed-by: Anoop C S <anoopcs@samba.org>
Do not add any automated test cases because they will always be racy.
This allows manual testing of the function.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: John Mulligan <jmulligan@redhat.com>
Reviewed-by: Anoop C S <anoopcs@samba.org>
This is a wrapper around getifaddrs(2), which is in libreplace, so
should always be available.
Some users want to set net.ipv4.ip_nonlocal_bind = 1. So, CTDB needs
a way of testing if public IPs are present, without using bind(2).
Doing all of this unconditionally in ctdb_sys_have_ip() will be
inefficient in the recovery daemon's local IP verification if there
are a lot of IP addresses. Split it this way so the interface
information can be retrieved once and used multiple times.
This doesn't appear to need IP canonicalisation for IPv4-mapped IPv6
addresses.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: John Mulligan <jmulligan@redhat.com>
Reviewed-by: Anoop C S <anoopcs@samba.org>
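The split described above (fetch interface information once, then test many addresses against it) might look roughly like the following. This is a sketch under assumptions: the function names are hypothetical, only IPv4 is handled, and the real ctdb_sys_local_ip_check() and ctdb_sys_have_ip() signatures are not reproduced here. Only getifaddrs(3) and freeifaddrs(3) are standard calls.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <ifaddrs.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper: is IPv4 address "ip" assigned to any interface
 * in "list"?  The list is supplied by the caller, so it can be fetched
 * once and reused for many addresses.
 */
static bool example_ip_in_list(const struct ifaddrs *list,
			       const struct in_addr *ip)
{
	const struct ifaddrs *ifa;

	assert(ip != NULL);

	for (ifa = list; ifa != NULL; ifa = ifa->ifa_next) {
		const struct sockaddr_in *sin;

		/* Interfaces without an address have ifa_addr == NULL */
		if (ifa->ifa_addr == NULL ||
		    ifa->ifa_addr->sa_family != AF_INET) {
			continue;
		}

		sin = (const struct sockaddr_in *)ifa->ifa_addr;
		if (sin->sin_addr.s_addr == ip->s_addr) {
			return true;
		}
	}

	return false;
}

/*
 * Hypothetical one-shot variant: fetch, check, free.  Returns 1 if the
 * address is local, 0 if not, -1 if interface information could not be
 * retrieved (the caller can then fall back to another method).
 */
static int example_have_ip(const struct in_addr *ip)
{
	struct ifaddrs *list = NULL;
	bool found;

	if (getifaddrs(&list) != 0) {
		return -1;
	}

	found = example_ip_in_list(list, ip);
	freeifaddrs(list);

	return found ? 1 : 0;
}
```

A caller verifying many public IPs would call getifaddrs() itself, run example_ip_in_list() for each address, and call freeifaddrs() once at the end, which avoids rescanning the interfaces for every address.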
This currently works when tests are run in-tree.
However, when installed, use of an incorrect variable means it fails
to find statd_callout in the tests/ subdirectory. Switch to using the
correct variable.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Anoop C S <anoopcs@samba.org>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Sun Oct 6 11:07:05 UTC 2024 on atb-devel-224
These should have caused test failure since commit
ef921bdbdbacecf39ee2a1851f16dbba62175fcc. However, the test failure
occurred in a sub-shell, which covered the failure. So, add an error
exit if the sub-shell fails.
While here, add an error exit for another potential uncaught
sub-shell-related failure in a related test.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Anoop C S <anoopcs@samba.org>
Saves lines; str_list_add_printf() takes care of NULL checks.
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Sun Sep 22 10:44:59 UTC 2024 on atb-devel-224
I could not find out how to cast a char ** to const char ** without
warning. This transfers fine to the execv call as well.
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Fri Aug 30 00:08:41 UTC 2024 on atb-devel-224
This was passed to CTDB's old smnotify. This has been replaced by use
of nfs-utils' sm-notify, which doesn't need this.
In tests, a fake NFS_HOSTNAME is still needed. Real sm-notify will get
it from a reverse host lookup of the IP address.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
CTDB's smnotify does not support IPv6 and is difficult to maintain.
So, create directories of files and pass them to NFS utils' sm-notify.
There is an implied change here, because NFS utils' sm-notify stopped
sending IP addresses as mon_name back in 2010:
http://git.linux-nfs.org/?p=steved/nfs-utils.git;a=commitdiff;h=900df0e7c0b9006d72d8459b30dc2cd69ce495a5
This will change advice given in the wiki to use a hostname for the
cluster with round-robin DNS, since this is what is best supported.
Another behavioural change is that sm-notify only sends "up"
notifications with an odd state.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
statd callout will shortly be updated to use NFS utils' sm-notify.
This tiny helper will be used to create on-disk state files used by
sm-notify. These state files contain endian-specific fields, so
better to write a simple C implementation than to do crazy things in a
shell script (or call out to Python).
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
If an NFS service check is set to, say, unhealthy_after=2 then it will
always switch from the (default startup) unhealthy state to healthy,
even if there is a fatal problem. If all services/scripts appear OK
then the node will become healthy. When the counter hits the limit it
will return to unhealthy. This is misleading.
Instead, never use the counter at startup, until the service becomes
healthy. This stops services flapping unhealthy-healthy-unhealthy.
A side-effect is that a service that starts in a broken state will
never be restarted to try to fix the problem. This makes sense. The
counting and restarting really exist to deal with problems that might
occur under load. The first monitor events occur before public IPs
are hosted, so there can be no load. If a service doesn't start
reliably the first time then the admin probably wants to know about
it.
nfs_iterate_test() is updated to run an initial monitor event to mark
the services as healthy. This initialises the counter so it can be
used for the important part of the test. Passing the -i option avoids
running the extra monitor event, so the first iteration will be the
initial monitor event.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This makes initial failure to retrieve statistics less likely to
result in a statistics change. To help with this, statistics
retrieval stderr now goes to the log - only stdout goes to the file.
This means that the test code for checking statistics changes needs to
be redone to actually run the statistics command and check. As with
rpcinfo output, this output needs to behave as deterministically in
the test code as it does in the event script.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Checking statistics is only really relevant to timeouts. That is, if
an rpcinfo times out it is worth checking if the service is making
progress. If the RPC service is not registered then the statistics
don't need to be checked because they shouldn't be changing.
The 2 tests previously added to check statistics progress now
behave identically and fail on all iterations. To support testing
with "timeouts", an optional TIMEOUT flag can now be added to the RPC
service passed to nfs_iterate_test(). 2 new tests are added to
exercise the new behaviour.
The 2 new "if" statements in nfs_iterate_test() could be combined.
However, a subsequent commit would split them and would be more
difficult to read.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Update the remaining RPC monitoring tests to use nfs_iterate_test(),
depending on it to set results. This makes all RPC monitoring tests
consistent, so they will all benefit from future improvements.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Doing this in a previous commit would have made it more difficult to
read that commit.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The early exits from the sub-shell make the obvious successes much
more obvious, and slightly simplify the code that follows.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Handling this across two different functions led to insanity, so
simplify.
The handling of unhealthy_after when $_numfails = 0 implicitly causes
the node to be healthy. This is how the "rpcinfo succeeds" case
works. Doing it this way for statistics makes this patch easier to
read. The implicit behaviour will go away in the next patch.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The current structure here is wrong and repetitive. Checking rpcinfo
result and determining output should be in the same place.
Failure counting is now contained in
rpc_set_service_failure_response(), but needs a file to survive the
sub-shell.
Don't attempt to combine and simplify code yet. That would make this
commit harder to review.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The output file is initialised, so doesn't need to be created on
success. Treat the return code file the same way.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Nothing more complex is ever done, so we might as well simplify and
reduce coupling.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
If an RPC service is given, it is automatically marked down. This
avoids repetition in test cases and loosens coupling.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This one is in a rarely used error path, so call a function that
talloc()s the string instead.
Again, this will also print the port, which might be useful if we ever
add the ability to also specify ports in the nodes list.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
Autobuild-User(master): Volker Lendecke <vl@samba.org>
Autobuild-Date(master): Tue Aug 20 14:24:14 UTC 2024 on atb-devel-224
Same thing several times, so change to common failure code.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Volker Lendecke <vl@samba.org>