IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
DISABLED is UNHEALTHY | PERMANENTLY_DISABLED, which is not what is
intended here. Luckily, it doesn't do any harm because nodes are
marked unhealthy at startup anyway.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
These are CTDB_CONTROL_DISABLE_NODE and CTDB_CONTROL_ENABLE_NODE.
For consistency these match CTDB_CONTROL_STOP_NODE and
CTDB_CONTROL_CONTINUE_NODE. It would be possible to add a single
control but it would need to take data.
The aim is to finally fix races in flag handling. Previous fixes have
improved the situation but they have only narrowed the race window.
The problem is that the recovery daemon on the master node pushes
flags to nodes the same way that disable and enable are implemented.
So the following sequence is still racy:
1. Node A is disabled
2. Recovery master pulls flags from all nodes including A
3. Node A is enabled
4. Recovery master notices A is disabled and pushes a flag update to
all nodes including node A
5. Node A is erroneously marked disabled
Node A can not tell if the MODIFY_FLAGS control is from a "ctdb
disable" command or a flag update from the recovery master.
The solution is to use a different mechanism for disable/enable and
for a node to ignore MODIFY_FLAGS controls for their own flags.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This will usually happen if flags on the node in question change, so
keeping the code simple and pushing to all nodes won't hurt. When all
nodes come up there might be differences in connected nodes, causing
such "fix ups". Receiving nodes will ignore no-op pushes.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The resulting code structure looks a little weird. However, there is
another condition that requires the flags to be pushed that will be
inserted before the continue statement in a subsequent commit..
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
pylint warns:
Use lazy % formatting in logging functions
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: David Disseldorp <ddiss@samba.org>
Reviewed-by: Jose A. Rivera <jarrpa@samba.org>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Tue Jul 20 05:29:18 UTC 2021 on sn-devel-184
Don't bother with "as e" to avoid warning about unused variable.
Don't use bare "except:" (though pylint still complains about this
version).
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: David Disseldorp <ddiss@samba.org>
Reviewed-by: Jose A. Rivera <jarrpa@samba.org>
Removes an unnecessary level of indirection: defaults and help strings
are now where they are expected. Also removes some global variables.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: David Disseldorp <ddiss@samba.org>
Reviewed-by: Jose A. Rivera <jarrpa@samba.org>
Removes the need for the global variables currently associated with
this processing. Also removes unnecessarily double-handling the
defaults, which are assigned to the global variables and set via
add_argument().
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: David Disseldorp <ddiss@samba.org>
Reviewed-by: Jose A. Rivera <jarrpa@samba.org>
Avoids numerous pylint warnings.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: David Disseldorp <ddiss@samba.org>
Reviewed-by: Jose A. Rivera <jarrpa@samba.org>
Due to the number of flake8 and pylint warnings it is unclear if the
source has Python 3 incompatibilities. These will be cleaned up in
subsequent commits.
Signed-off-by: "L.P.H. van Belle" <belle@bazuin.nl>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: David Disseldorp <ddiss@samba.org>
Reviewed-by: Jose A. Rivera <jarrpa@samba.org>
In ShellCheck 0.7.2, POSIX compatibility warnings got their own SC3xxx
error codes, so now both the old and new codes need to be ignored.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri Jun 25 10:06:48 UTC 2021 on sn-devel-184
Fedora 34 now has a shell function for the which command, which causes
these uses of which to return the enclosing function definition rather
than the executable file as expected.
The event script unit tests always expect the stub service command to
be used, so the conditional in these functions is unnecessary.
$CTDB_HELPER_BINDIR already conveniently points to the stub directory,
so use it here.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
The socket is set close-on-exec but that doesn't help for processes
that do not exec(). This should be done for all child processes.
This has been seen in testing where "ctdb shutdown" waits for the
socket to close before succeeding. It appears that lingering
vacuuming processes have not closed the socket when becoming clients
so they cause "ctdb shutdown" to hang even though the main daemon
process has exited. The cause of the lingering vacuuming processes
has been previously examined but still isn't understood.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The path of the TDB is known, so calculate the file ID (device number
+ inode number) from it and use this to directly filter /proc/locks to
find processes holding locks.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Don't use the arguments yet. They will be used in a simplified
version of the code.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
1. PID of lock helper waiting for lock
2. Scope of lock: "record" or "db"
3. Path to database that lock helper is trying to lock
4. Whether the database uses mutexes: "mutex" or "fcntl"
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
These were fine (though still lazy) when these tests were the only
user of this stub. However, the ps stub is about to be enhanced, so
fix these uses of it to represent the intended usage.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The main reason for this is to facilitate testing.
Avoid some /proc accesses entirely by using ps(1) (which can be
replaced by a stub when testing) because this script might as well be
more portable in case anyone wants to add lock debugging for a
non-Linux platform. While the "state" format specification isn't
POSIX-compliant, it works on both Linux and FreeBSD so it is a
reasonable improvement.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
If a script times out the caller can talloc_free() the script_list
output of run_event_recv, which talloc_free's proc->output from
run_proc.c as well. If the script generates further output after the
timeout and then exits after a while, the SIGCHLD handler in the
eventd tries to read into proc->output, which was already free'ed.
Fix this by not doing just a talloc_steal but a talloc_move. This way
proc_read_handler() called from run_proc_signal_handler() does not try
to realloc the stale reference to proc->output but gets a NULL
reference.
I don't really know how to do a knownfail in ctdb, so this commit
actually activates catching the signal by waiting long enough for
22.bar to exit and generate the SIGCHLD.
Bug: https://bugzilla.samba.org/show_bug.cgi?id=14475
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Ralph Boehme <slow@samba.org>
This will lead to a crash in run_event_test.c soon
Bug: https://bugzilla.samba.org/show_bug.cgi?id=14475
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Ralph Boehme <slow@samba.org>
Triggers a different code path in run_event_* and aligns it more what
the ctdb eventd really does.
Bug: https://bugzilla.samba.org/show_bug.cgi?id=14475
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Ralph Boehme <slow@samba.org>
Use F_GETLK to get the lock holder PID, this is more accurate than
reading the file contents: A conflicting process might not have
written its PID yet. Also, F_GETLK easily allows to do a retry if the
lock holder just died.
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
This test has been failing with:
Wait until record is migrated to lmaster node 0
<30|BAD: node 0 is not dmaster
dmaster: 1
rsn: 8
flags: 0x00010000 MIGRATED_WITH_DATA
data(6) = "value1"
*** TEST COMPLETED (RC=1) AT 2021-02-02 06:18:48, CLEANING UP...
This should never happen. If this really fails then the wait should
time out.
The problem is that wait_until() does:
"$@" || _rc=$?
and vacuum_test_key_dmaster() currently calls ctdb_test_fail() on
failure, which causes the shell to exit. Instead, pass a variant to
wait_until() that simply returns the correct status instead of
exiting.
An alternative would be to change the statement in wait_until() to do:
("$@") || _rc=$?
so it captures the exit. However, this is a global change and
requires more thought.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Jeremy Allison <jra@samba.org>
This is helpful if you are in a listening loop with the same receiver
for many sockets doing the same thing.
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
If run with UID wrapper and UID_WRAPPER_ROOT=1 then securing the
socket will fail.
Test mode means that local daemons are in use, so securing the socket
is not important.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
Variable res is only used once and ret is re-used many times. Drop
res, use ret, which doesn't need to be initialised. Modernise debug
macro.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
When compiling with GCC 10.x and -O3 optimization, the IP checksum
calculation code generates wrong checksum. The function uint16_checksum
gets inlined during optimization and ip4pkt->tcp data gets wrongly
aliased.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14537
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Wed Oct 21 05:52:28 UTC 2020 on sn-devel-184
Check that the desired state is set on all nodes instead of just the
test node. This ensures that node flags have correctly propagated
across the cluster.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14513
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Oct 6 04:32:06 UTC 2020 on sn-devel-184
update_flags() has already updated the recovery master's canonical
node map, based on the flags from each remote node, and pushed out
these flags to all nodes.
If i == j then the node map has already been updated from this remote
node's flags, so simply drop this case.
Although update_flags() has updated flags for all nodes, it did not
update each node map in remote_nodemaps[] to reflect this. This means
that remote_nodemaps[] may contain inconsistent flags for some nodes
so it should not be used to check consistency when i != j.
Further, a meaningful difference in flags can only really occur if
update_flags() failed. In that case this code is never reached.
These observations combine to imply that this whole loop should be
dropped.
This leaves potential sub-second inconsistencies due to out-of-band
healthy/unhealthy flag changes pushed via CTDB_SRVID_PUSH_NODE_FLAGS.
These updates could be dropped (takeover run asks each node for
available IPs rather than making centralised decisions based on node
flags) but for now they will be fixed in the next iteration of
main_loop().
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14513
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This has already been done in update_flags().
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14513
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Signed-off-by: David Disseldorp <ddiss@samba.org>
Reviewed-by: Samuel Cabrero <scabrero@samba.org>
Autobuild-User(master): David Disseldorp <ddiss@samba.org>
Autobuild-Date(master): Thu Sep 24 00:52:42 UTC 2020 on sn-devel-184
The Ceph Manager's service map is useful for tracking the status of
Ceph related services. By registering the CTDB recovery lock holder,
Ceph storage administrators can more easily identify where and when a
CTDB cluster is up and running.
Signed-off-by: David Disseldorp <ddiss@samba.org>
Reviewed-by: Samuel Cabrero <scabrero@samba.org>
This is no longer necessary because the capability new style database
pull is assumed to always be available.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Removes use of the old controls without cleaning up the code. Clean
up can be done later.
After this change the CTDB_CAP_FRAGMENTED_CONTROLS capability is no
longer checked. This capability can be removed along with the
controls.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The new waf detects a duplicate instance of
ctdb_mutex_ceph_rados_helper.7.xml, which is due
to manpages_extra being a pointer to
manpages_misc, therefore each call to build()
added duplicate entries to the manpages_misc
global entry.
Signed-off-by: David Mulder <dmulder@suse.com>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
This makes it consistent with the monitoring code. If the master has
changed then this means the master will always get the message.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Aug 18 06:24:11 UTC 2020 on sn-devel-184
This also updates remote flags so the name is misleading.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
update_local_flags() will be changed to use these nodemaps.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The nodemap has already been fetched from the local node and is
actually passed to this function. Care must be taken to avoid
referencing the "remote" nodemap for the recovery master. It also
isn't useful to do so, since it would be the same nodemap.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The plan here is to use the nodemaps retrieved by get_remote_nodes()
in update_local_flags(). This will improve efficiency, since
get_remote_nodes() fetches flags from nodes in parallel. It also
means that get_remote_nodes() can be used exactly once early on in
main_loop() to retrieve remote nodemaps. Retrieving nodemaps multiple
times is unnecessary and racy - a single monitoring iteration should
not fetch flags multiple times and compare them.
This introduces a temporary behaviour change but it will be of no
consequence when the above changes are made.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This array is indexed by the same index as nodemap, not the PNN.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Also drop error handling in main_loop() that is replaced by this
change.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This will allow an error callback to be added.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Change 1st argument to a rec context, since this will be needed later.
Drop the nodemap argument and access it via rec->nodemap instead.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The memory is allocated off the memory context used by the current
iteration of main loop. It is freed when main loop completes the fix
doesn't require backporting to stable branches. However, it is sloppy
so it is worth fixing.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Don't log an error on failure - let the caller can do this. Apart
from this: fix up coding style and modernise the remaining error
message.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This isn't used anywhere and can easily be checked via "ctdb pnn" and
"ctdb recmaster" commands.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Signed-off-by: Ralph Boehme <slow@samba.org>
Reviewed-by: Swen Schillig <swen@linux.ibm.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
Autobuild-User(master): Jeremy Allison <jra@samba.org>
Autobuild-Date(master): Mon Aug 3 22:21:04 UTC 2020 on sn-devel-184
If nfsconf exists then use it as last resort to attempt to extract
[nfsd]:threads from /etc/nfs.conf.
Invocation of nfsconf requires "|| true" because this script uses "set
-e". Add a stub that always fails to at least test this much.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14444
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Mon Jul 27 07:06:58 UTC 2020 on sn-devel-184
If nfsconf exists then use it as last resort to attempt to extract
[statd]:name from /etc/nfs.conf.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14444
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Instead of master/slave.
Nearly all of these are simple textual substitutions, which preserve
the case of the original. A couple of minor cleanups were made in the
documentation (such as "LVSMASTER" -> "LVS leader").
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Instead of master/slave.
Nearly all of these are simple textual substitutions, which preserve
the case of the original.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The logic currently in ctdb_ctrl_modflags() will be optimised so that
it no longer matches the pattern for a control function. So, remove
this function and squash its functionality into the only caller.
Although there are some superficial changes, the behaviour is
unchanged.
Flattening the 2 functions produces some seriously weird logic for
setting the new flags, to the point where using ctdb_ctrl_modflags()
for this purpose now looks very strange. The weirdness will be
cleaned up in a subsequent commit.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This is clearer than using the MODFLAGS control directly.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This makes fields such as recmaster and nodemap easily available if
required.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
An unused argument needlessly extends the length of function calls. A
subsequent change will allow rec->nodemap to be used if necessary.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Avoid use of non-portable md5sum by constructing database names using
index. Improve indentation, use more modern commands, code
improvements (shellcheck).
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Wed Jul 22 09:14:35 UTC 2020 on sn-devel-184
Simplify code, use more modern commands, code improvements (shellcheck).
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
"wc -l" on some platforms (e.g. FreeBSD) contains leading spaces, so
strip them.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Select test node with IPs instead of using a fixed node. Remove
unnecessary code, use more modern commands, code
improvements (shellcheck).
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
"wc -l" on some platforms (e.g. FreeBSD) contains leading spaces and
stops "$num from being a number. Create a more portable solution and
put it in a function instead of repeating the logic.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
It would be nice to get rid of "onnode any". There's no use making
tests nondeterministic. If covering different cases matters then they
should be explicitly handled.
In most places "any" is replaced by "$test_node". In some cases,
where $test_node is not set, a fixed node that is already used
elsewhere can be reused.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Separate cluster startup from test initialisation for tests that start
the cluster with customised configuration. In these cases the result
of the cluster startup is actually the point of the test.
Additionally, pubips.013.failover_noop.sh claims to have completed
test initialisation twice, which just seems wrong.
The result is:
* ctdb_test_init() takes one option (-n) to indicate when it should
not configure/start the cluster
* New function ctdb_nodes_start_custom() accepts options for special
cluster configuration, only operates on local daemons and triggers a
test failure rather than a test error on failure.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The only caller calls ctdb_test_error() on failure and nesting this
calls can be confusing. A future change will make this even more
confusing.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Mostly avoidance of quoting warnings.
Silencing warnings about unquoted $CTDB_TEST_CAT_RESULTS_OPTS is
handled by passing '-' to cat when that variable's value is empty.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Apart from the non-constant sourcing of include files.
Mostly avoidance of quoting warnings.
One subtle change is to simply pass "120" to wait_until_ready() to
stop warnings that it expects arguments but none are passed (both
SC2119 and SC2120). There seems no way to indicate to structure
function argument handling so that shellcheck realises arguments are
optional. In later shellcheck versions, disabling SC2120 for a
function also silences complaints about its callers... but not all of
our testing uses "later" shellcheck versions.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
* Use "#!/usr/bin/env bash" for improved portability
* Drop test_info() definition and replace it with a comment
The use of test_info() is pointless.
* Drop call to cluster_is_healthy()
This is a holdover from when the previous test would restart daemons
to get things ready for a test. There was also a bug where going
into recovery during the restart would sometimes cause the cluster
to become unhealthy. If we really need something like this then we
can add it to ctdb_test_init().
* Make order of preamble consistent
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Avoid:
.../UNIT/shellcheck/scripts/local.sh: line 14: type: shellcheck: not found
The "type" command in dash prints the "not found" message to stdout
but the bash version prints to stderr, so redirect stderr too.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The output in a test failure appears to contain no pstree output
because "00\.test\.script,.*" does not match. However, this is just a
guess because the output is not shown.
Showing the output makes it easier to understand test failures.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This will allow local daemons to be used in more contexts, especially
in tests run by Jenkins where the directory names for some targets can
be very long.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The standalone build still includes tests, as does the top-level build
when --enable-selftest is used. The latter is consistent with the use
of --enable-selftest in the rest of the tree.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
In certain circumstance, which aren't obvious, cat(1) can fail when
attempting to write a lot of data. This is due to something (probably
write(2)) returning EAGAIN.
Given that the -v option should only really be used for test
debugging, ignore the failure instead of spending time debugging it.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14446
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
In certain circumstance, which aren't obvious, cat(1) can fail when
attempting to write a lot of data. This is due to something (probably
write(2)) returning EAGAIN.
Given that the -v option should only really be used for test
debugging, ignore the failure instead of spending time debugging it.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14446
Signed-off-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Wed Jul 22 04:10:47 UTC 2020 on sn-devel-184
Avoid use of non-portable md5sum by constructing database names using
index. Improve indentation, use more modern commands, code
improvements (shellcheck).
Signed-off-by: Martin Schwenke <martin@meltin.net>
Select test node with IPs instead of using a fixed node. Remove
unnecessary code, use more modern commands, code
improvements (shellcheck).
Signed-off-by: Martin Schwenke <martin@meltin.net>
"wc -l" on some platforms (e.g. FreeBSD) contains leading spaces and
stops "$num from being a number. Create a more portable solution and
put it in a function instead of repeating the logic.
Signed-off-by: Martin Schwenke <martin@meltin.net>
It would be nice to get rid of "onnode any". There's no use making
tests nondeterministic. If covering different cases matters then they
should be explicitly handled.
In most places "any" is replaced by "$test_node". In some cases,
where $test_node is not set, a fixed node that is already used
elsewhere can be reused.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Separate cluster startup from test initialisation for tests that start
the cluster with customised configuration. In these cases the result
of the cluster startup is actually the point of the test.
Additionally, pubips.013.failover_noop.sh claims to have completed
test initialisation twice, which just seems wrong.
The result is:
* ctdb_test_init() takes one option (-n) to indicate when it should
not configure/start the cluster
* New function ctdb_nodes_start_custom() accepts options for special
cluster configuration, only operates on local daemons and triggers a
test failure rather than a test error on failure.
Signed-off-by: Martin Schwenke <martin@meltin.net>
The only caller calls ctdb_test_error() on failure and nesting this
calls can be confusing. A future change will make this even more
confusing.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Mostly avoidance of quoting warnings.
Silencing warnings about unquoted $CTDB_TEST_CAT_RESULTS_OPTS is
handled by passing '-' to cat when that variable's value is empty.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Apart from the non-constant sourcing of include files.
Mostly avoidance of quoting warnings.
One subtle change is to simply pass "120" to wait_until_ready() to
stop warnings that it expects arguments but none are passed (both
SC2119 and SC2120). There seems no way to indicate to structure
function argument handling so that shellcheck realises arguments are
optional. In later shellcheck versions, disabling SC2120 for a
function also silences complaints about its callers... but not all of
our testing uses "later" shellcheck versions.
Signed-off-by: Martin Schwenke <martin@meltin.net>
* Use "#!/usr/bin/env bash" for improved portability
* Drop test_info() definition and replace it with a comment
The use of test_info() is pointless.
* Drop call to cluster_is_healthy()
This is a holdover from when the previous test would restart daemons
to get things ready for a test. There was also a bug where going
into recovery during the restart would sometimes cause the cluster
to become unhealthy. If we really need something like this then we
can add it to ctdb_test_init().
* Make order of preamble consistent
Signed-off-by: Martin Schwenke <martin@meltin.net>
Avoid:
.../UNIT/shellcheck/scripts/local.sh: line 14: type: shellcheck: not found
The "type" command in dash prints the "not found" message to stdout
but the bash version prints to stderr, so redirect stderr too.
Signed-off-by: Martin Schwenke <martin@meltin.net>
The output in a test failure appears to contain no pstree output
because "00\.test\.script,.*" does not match. However, this is just a
guess because the output is not shown.
Showing the output makes it easier to understand test failures.
Signed-off-by: Martin Schwenke <martin@meltin.net>
This will allow local daemons to be used in more contexts, especially
in tests run by Jenkins where the directory names for some targets can
be very long.
Signed-off-by: Martin Schwenke <martin@meltin.net>
The standalone build still includes tests, as does the top-level build
when --enable-selftest is used. The latter is consistent with the use
of --enable-selftest in the rest of the tree.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Drop some unnecessary whitespace and re-indent push().
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Options can be set in ONNODE_SSH, so this variable is unnecessary.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The current code only ever swaps with slot 0. This will only ever
happen with slots 0 and 1, so probably never sorts.
Replace with qsort().
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
ctdbd currently only logs when a new hot key is added. If a key gets
hotter then nothing new is logged.
Log hot key updates when the number of migrations has doubled since
the last time that key was logged.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This message indicates that a hot key was added, so say that. After
all the hot key slots have been filled the id will always be 0, so
don't bother logging it.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
These should be unsigned but luck is currently on our side.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
There are 2 reasons for this. Sorting of hot keys is broken and will
be changed to an implementation that needs a named (i.e. not
anonymous) structure. Also, at least one non-protocol field will be
added to facilitate more useful logging.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Testing control: 4 bytes msec delay plus a blob, return the request after the
delay. This is an enhanced "ping" which can be used to test asynchronous
clients.
Doesn't have the full protocol implementation yet
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
On debian buster, this variable doesn't exist anymore. Look at this PR
as a reference:
https://github.com/gluster/storhaug/pull/30
Signed-off-by: Renaud Fortier <renaud.fortier@fsaa.ulaval.ca>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Thu Apr 23 08:07:51 UTC 2020 on sn-devel-184
The vacuuming integration tests set VacuumInterval to a very high
number to avoid vacuuming collisions. This is done after the cluster
is healthy, so Samba will have already been started and vacuuming will
already be scheduled *at the default interval* for databases attached
by Samba. This means that vacuuming controls used by vacuuming tests
can still collide with the scheduled vacuuming events.
Add some logic to reschedule a vacuuming event that has fired but
where VacuumInterval has increased since it was originally scheduled.
The increase in VacuumInterval is used as the time offset for
rescheduling the event.
Although this changes production behaviour for the convenience of
testing, the new behaviour is completely reasonable and obeys the
principle of least surprise.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Apr 7 03:04:57 UTC 2020 on sn-devel-184
No behaviour change. This is final staging to make the next change
completely obvious.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
No behaviour change. This just makes future changes clearer by
avoiding reformatting (or introducing local variables).
Clean up error handling while touching a relevant line.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Inside the nested event loop in ctdb_ctrl_getnodemap(), various
asynchronous handlers may dereference rec->nodemap, which will be
NULL.
One example is lost_reclock_handler(), which causes rec->nodemap to be
unconditionally dereferenced in list_of_nodes() via this call chain:
list_of_nodes()
list_of_active_nodes()
set_recovery_mode()
force_election()
lost_reclock_handler()
Instead of attempting to trace all of the cases, just avoid leaving
rec->nodemap set to NULL. Attempting to use an old value is generally
harmless, especially since it will be the same as the new value in
most cases.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14324
Reported-by: Volker Lendecke <vl@samba.org>
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Tue Mar 24 01:22:45 UTC 2020 on sn-devel-184
Neither the recovery daemon nor the recovery helper should attach
databases outside of the recovery process.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This builds a complete list of databases across the cluster so it can
be used to create databases on the nodes where they are missing.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This will be used to build a merged list of databases from all nodes,
allowing the recovery helper to create missing databases.
It would be possible to also include the db_name field in this
structure but that would cause a lot of churn. This field is used
locally in the recovery of each database so can continue to live in
the relevant state structure(s).
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This is currently only set by the recovery daemon when it attaches
missing databases, so there is no obvious behaviour change. However,
attaching missing databases can now be moved to the recovery helper as
long as it sets this flag.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
ctdb_ctrl_createdb() is only called by the recovery daemon, so this is
a safe, temporary change. This is temporary because
ctdb_ctrl_createdb(), create_missing_remote_databases() and
create_missing_local_databases() will all go away soon.
Note that this doesn't cause a change in behaviour. The main daemon
will still only defer attaches from non-recoverd processes during
recovery.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Commit 3cc230b5ee says:
Dont allow clients to connect to databases untile we are well past
and through the initial recovery phase
It is unclear what this commit was attempting to do. The commit
message implies that more attaches should be deferred but the code
change adds a conjunction that causes less attaches to be deferred.
In particular, no attaches will be deferred after startup is complete.
This seems wrong.
To implement what seems to be stated in the commit message an "or"
needs to be used so that non-recovery daemon attaches are deferred
either when in recovery or before startup is complete. Making this
change highlights that attaches need to be allowed during the
"startup" event because this is when smbd is started.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
If a node is marked for banning, confirm that it's not become inactive
during the recovery. If yes, then don't ban the node.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
It's possible to have a node stopped, but recovery master not yet
updated flags on the local ctdb daemon when recovery is started. So do
not trust the list of active nodes obtained from the local node. Query
the connected nodes to calculate the list of active nodes.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
New vnnmap is constructed using the information from all the connected
nodes. So there is no need to fetch the vnnmap from recovery master.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
The only place the outgoing connection needs to be stopped is when
there is a timeout when waiting for the connection to become writable.
Add a new function ctdb_tcp_node_connect_timeout() to handle this
case.
All of the other cases are attempts to establish a new outgoing
connection (initial attempt, retry after an error or disconnect, ...)
so drop stopping the connection in those cases.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14295
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Signed-off-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Thu Mar 12 05:29:20 UTC 2020 on sn-devel-184
No change in behaviour. This makes the code self-documenting.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14295
Signed-off-by: Ralph Boehme <slow@samba.org>
Signed-off-by: Martin Schwenke <martin@meltin.net>
No change in behaviour. This makes the code self-documenting.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14295
Signed-off-by: Ralph Boehme <slow@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
The node dead upcall has already restarted the outgoing connection.
There's no need to repeat it.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14295
Signed-off-by: Ralph Boehme <slow@samba.org>
Signed-off-by: Martin Schwenke <martin@meltin.net>
ctdb_tcp_tnode_cb() is called when we receive data on the outgoing connection.
This can happen when we get an EOF on the connection because the other side as
closed. In this case data will be NULL.
It would also be called if we received data from the peer. In this case data
will not be NULL.
The latter case is a fatal error though and we already call
ctdb_tcp_stop_connection() for this case as well, which means even though the
node is not fully connected anymore, by not calling the node_dead() upcall
NODE_FLAGS_DISCONNECTED will not be set.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14295
Signed-off-by: Ralph Boehme <slow@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Since commit 77deaadca8, a nodeA which
had previously accepted a connection from nodeB (where nodeB dies
e.g. as as result of fencing) when nodeB attempts to connect again
after restarting is always rejected with
ctdb_listen_event: Incoming queue active, rejecting connection from w.x.y.z
messages.
Consolidate dead node handling in the TCP restart handling.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14295
Signed-off-by: Noel Power <noel.power@suse.com>
Reviewed-by: Ralph Boehme <slow@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
If NODE_FLAGS_DISCONNECTED is set the node can be in half-connected state. With
this change we ensure to restart the transport for this case.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14295
Signed-off-by: Ralph Boehme <slow@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Better compatibility, since od output isn't consistent on FreeBSD.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Mar 10 09:17:12 UTC 2020 on sn-devel-184
When the tests are run interactively this will make it more noticeable
that shellcheck is not installed because the test summary will
indicate missing tests.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Skipped tests return a status that indicates failure. In combination
with the -e option this results in an exit with failure on the first
skipped test.
Convert skipped test status to success. The skip has already been
counted.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Christof Schmitt <cs@samba.org>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Mon Mar 9 02:27:18 UTC 2020 on sn-devel-184
When "ctdb shutdown" is run with -n <N> it does not wait for the node
<N>'s ctdbd to go down but exits immediately. This means that the
local_daemons.sh shutdown command can find the PID file still present
and then attempt the shutdown, but the daemon can have exited between
the check and the shutdown. Although the test waits until the node is
disconnected, the transport is taken down just before the exit, so
this does not guarantee the daemon has exited.
A local shutdown command (no -n <N>) waits until the socket
disconnects and this happens *after* the PID file is gone, so this is
safe to use with the local_daemons.sh shutdown command.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Mon Mar 2 10:39:28 UTC 2020 on sn-devel-184
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Ralph Boehme <slow@samba.org>
Autobuild-User(master): Ralph Böhme <slow@samba.org>
Autobuild-Date(master): Sat Feb 29 11:53:42 UTC 2020 on sn-devel-184
Signed-off-by: Ralph Boehme <slow@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Sat Feb 29 08:02:50 UTC 2020 on sn-devel-184
Signed-off-by: Anoop C S <anoopcs@redhat.com>
Reviewed-by: Guenther Deschner <gd@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Thu Feb 27 07:34:53 UTC 2020 on sn-devel-184
Till now 50.samba script was based on RHEL versions <=6 where we didn't
have separate start up script for nmb and smbd used to start nmbd when
required. Now that nmbd has its own start up script named "nmb" it is
reasonable to have "nmb" as default value for CTDB_SERVICE_NMB inside
new 48.netbios ctdb script.
Signed-off-by: Anoop C S <anoopcs@redhat.com>
Reviewed-by: Guenther Deschner <gd@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
This change basically moves out nmbd references from 50.samba script to
a new 48.netbios script. Accordingly ctdb test scripts are tweaked to
cope with newly added script.
Signed-off-by: Guenther Deschner <gd@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
The example output doesn't tell a user what command generated it.
Adding the command makes the examples much more useful.
Reported-by: Stefan Kania <stefan@kania-online.de>
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Feb 18 04:22:56 UTC 2020 on sn-devel-184
This depends on a small amount of internal knowledge but is the
cleanest way of avoiding errors for nodes that have already been shut
down.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Separate functions are not needed for stopping/starting/restarting
individual nodes. The stop and start functions essentially just use
onnode, though for local daemons this is embedded in local_daemons.sh.
So, just provide one stop and one start function that takes an
optional nodespec, defaulting to all nodes.
Restarting becomes common.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
csbuild says:
ctdb/server/ctdb_lock.c: scope_hint: In function ‘ctdb_find_lock_context’
ctdb/server/ctdb_lock.c:671:33: warning: comparison of integer expressions of different signedness: ‘int’ and ‘uint32_t’ {aka ‘unsigned int’} [-Wsign-compare]
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
* Typo in variable name db_map_pattern
* Variable num_db_init used before set
* dbmap_pattern does not cover database flags
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Wed Feb 12 04:38:47 UTC 2020 on sn-devel-184
Shellcheck noticed that pnn was never referenced. Not sure this ever
worked or whether it got broken somewhere along the way.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
select_test_node_and_ips() is not required in these cases.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
If we can't bind the local end of an outgoing connection then
something has gone wrong. Retrying is better than failing into a
zombie state. The interface might come back up and/or the address my
be reconfigured.
While here, do the same thing for the other (potentially transient)
failures.
The unknown address family failure is special but just handle it via a
retry. Technically it can't happen because the node address parsing
can only return values with address family AF_INET or AF_INET6.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14274
Reported-by: 耿纪超 <gengjichao@jd.com>
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
If ctdbd hangs when shutting down in post-test clean-up then killing
the process group can kill the test. When in test mode, create a
process group but only in the top-level ctdb tool - the natgw and lvs
helpers also run the ctdb tool.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
When running tests from a top-level build, a stale build in ctdb/bin/
will be preferred and may cause confusing results.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This is used to set several variables so it might as well be cd-proof.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
There is no sane way of keeping stdin open when using the shell to
background ctdbd in local_daemons.sh. Instead, have ctdbd fork when
not interactive and when test mode is enabled. become_daemon() can't
be used for this: if it forks then it also closes stdin.
For the interactive case, become_daemon() wasn't doing anything
special, so do nothing instead.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
These don't need to depend on do_fork. Child logging should be set up
whenever the daemon is not interactive. The stdin handler should be
setup whenever test mode is enabled.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
No functional changes.
This is staging for a change that makes ctdbd fork when test mode is
enabled but interactive is not set.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
A subsequent file descriptor allocation may return 0 and unexpected
things may then happen.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Otherwise, if the test is run via ssh it will "unexpectedly" find
itself at the other end of a pipe.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This doesn't work when stdin is not a tty.
This reverts commit ea754bfdec.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Andreas Schneider <asn@samba.org>
Autobuild-User(master): Andreas Schneider <asn@cryptomilk.org>
Autobuild-Date(master): Thu Jan 30 13:53:22 UTC 2020 on sn-devel-184
POSIX says:
If job control is disabled (see set, -m), the standard input for an
asynchronous list, before any explicit redirections are performed,
shall be considered to be assigned to a file that has the same
properties as /dev/null. This shall not happen if job control is
enabled. In all cases, explicit redirection of standard input shall
override this activity.
ctdbd is backgrounded at startup, so the above causes stdin to be
redirected from /dev/null. Enable job control to work around this.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Tue Jan 28 11:24:35 UTC 2020 on sn-devel-184
This allows a test environment to simply close its end of a pipe to
cleanly shutdown ctdbd. Like in smbd, this is only done if stdin is a
pipe or a socket.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This avoids a crash if ctdb_shutdown_sequence() is called before
monitoring is initialised.
Switch to using TALLOC_FREE() while touching this function.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Testing against a commonly used cluster filesystem has shown no
performance impact, as expected.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This makes it more like the way it works with a cluster filesystem.
It also allows the subdirectory to be manipulated in tests.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
See the comments added to the tests.
It may be possible to rewrite these so they do something sane for
IPv6... some other time.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14227
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri Jan 3 00:00:55 UTC 2020 on sn-devel-184
ss added square brackets around IPv6 addresses in versions > 4.12.0
via commit aba9c23a6e1cb134840c998df14888dca469a485. CentOS 7 added
this feature somewhere mid-release. So, backward compatibility is
obviously needed.
As per the comment protocol/protocol_util.c should probably print and
parse such square brackets. However, for backward compatibility the
brackets would have to be stripped in both places in
update_tickles()... or added to the ss output when missing. Best to
leave this until we have a connection tracking daemon.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14227
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Thu Nov 14 12:03:46 UTC 2019 on sn-devel-184
If any of the option parsing or command parsing fails, generate usage
message.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
The type-checking is superfluous and gets in the way of readability.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Thu Nov 14 03:45:44 UTC 2019 on sn-devel-184
Commit c68b6f96f2 changed the talloc hierarchy such that outgoing TCP sockets
while sitting in the async connect() syscall are not freed via
ctdb_tcp_shutdown() anymore, they are hanging off a longer-running structure.
Free this structure as well.
If an outgoing TCP socket leaks into a long-running child process (possibly the
recovery daemon), this connection will never be closed as seen by the
destination node. Because with recent changes incoming connections will not be
accepted as long as any incoming connection is alive, with that socket leak
into the recovery daemon we will never again be able to successfully connect to
the node that is affected by this leak. Further attempts to connect will be
discarded by the destination as long as the recovery daemon keeps this socket
alive.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14175
RN: Avoid communication breakdown on node reconnect
Signed-off-by: Martin Schwenke <martin@meltin.net>
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Wed Nov 13 13:31:10 UTC 2019 on sn-devel-184
We have a macro for NULLing out the pointer
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri Nov 8 01:35:11 UTC 2019 on sn-devel-184
This can fail as follows:
--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--
Running test ./tests/UNIT/tool/ctdb.process-exists.003.sh (02:26:30)
--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--
ctdb.process-exists.003 - ctdbd process with multiple connections on node 0
Setting up fake ctdbd
<10||0|
OK
<10|PID 26107 exists
|0|
OK
==================================================
Running "ctdb -d NOTICE process-exists 26107 0x1234567812345678"
PASSED
==================================================
Running "ctdb -d NOTICE process-exists 26107 0xaebbccdd12345678"
Registered SRVID 0xaebbccdd12345678
--------------------------------------------------
Output (Exit status: 1):
--------------------------------------------------
PID 26107 with SRVID 0xaebbccdd12345678 does not exist
--------------------------------------------------
Required output (Exit status: 0):
--------------------------------------------------
PID 26107 with SRVID 0xaebbccdd12345678 exists
FAILED
connection to daemon closed, exiting
==========================================================================
TEST FAILED: ./tests/UNIT/tool/ctdb.process-exists.003.sh (status 1) (duration: 0s)
==========================================================================
This happens when dummy_client has not registered the SRVID (for its
10th connection) before the 2nd simple_test.
Change the initial wait to ensure that the SRVID is registered.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Wed Nov 6 02:46:24 UTC 2019 on sn-devel-184
Improve quoting and indentation. Print a clear error if the cluster
goes back into recovery and doesn't come back out.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Retrying like this hides bugs. The cluster should come up first time,
every time.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This file descriptor is owned by the incoming queue. It will be
closed when the queue is torn down.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14175
RN: Avoid communication breakdown on node reconnect
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
CTDB's incoming queue handling does not check whether an existing
queue exists, so can overwrite the pointer to the queue. This used to
be harmless until commit c68b6f96f2
changed the read callback to use a parent structure as the callback
data. Instead of cleaning up an orphaned queue on disconnect, as
before, this will now free the new queue.
At first glance it doesn't seem possible that 2 incoming connections
from the same node could be processed before the intervening
disconnect. However, the incoming connections and disconnect occur on
different file descriptors. The queue can become orphaned on node A
when the following sequence occurs:
1. Node A comes up
2. Node A accepts an incoming connection from node B
3. Node B processes a timeout before noticing that outgoing the queue is writable
4. Node B tears down the outgoing connection to node A
5. Node B initiates a new connection to node A
6. Node A accepts an incoming connection from node B
Node A processes then the disconnect of the old incoming connection
from (2) but tears down the new incoming connection from (6). This
then occurs until the originally affected node is restarted.
However, due to the number of outgoing connection attempts and
associated teardowns, this induces the same behaviour on the
corresponding incoming queue on all nodes that node A attempts to
connect to. Therefore, other nodes become affected and need to be
restarted too.
As a result, the whole cluster probably needs to be restarted to
recover from this situation.
The problem can occur any time CTDB is started on a node.
The fix is to avoid accepting new incoming connections when a queue
for incoming connections is already present. The connecting node will
simply retry establishing its outgoing connection.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14175
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This makes it consistent with the reverse case. Also, in_fd will soon
be removed.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14175
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This changes the behaviour for some failures from exiting to simply
attempting to schedule the next run.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
All the vacuum operations if required have an event loop to ensure
completion of pending operations. Once all the steps are complete,
there is no reason to process any more packets.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
The only reason for recoverd attaching to databases was to migrate
records to the local node as part of vacuuming. Recovery daemon does
not take part in database vacuuming any more.
The actual database recovery is handled via the recovery_helper and
recovery daemon should not need to attach to the databases any more.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
This is now implemented in the ctdb daemon using VACUMM_FETCH control.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
This is the only place where setting an environment variable by hand
is recommended, so remove the anomaly.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Ralph Böhme <slow@samba.org>
Autobuild-Date(master): Tue Oct 22 21:02:11 UTC 2019 on sn-devel-184
Now they are clearly all part of CTDB.
TEST_SOCKET_WRAPPER_SO_PATH gets too long in
integration_local_daemons.bash, so change it to
CTDB_TEST_SWRAP_SO_PATH instead of just prefixing.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This is the default and deciding this should be left to run_tests.sh.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Implement this for local daemons integration tests, dumping last 100
lines of logs. This makes it possible to debug some failures in
automated tests where the logs are unavailable for analysis.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
When run from integration tests $CTDB already includes $VALGRIND, if
set. So only add $VALGRIND if $CTDB is not set.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
There is no good reason why the code needs to be this way. The
intervening code was removed years ago leaving a more complex version
of something very simple.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
That's all they do now.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri Oct 4 10:58:10 UTC 2019 on sn-devel-184
It makes sense to do this in one place in case other headers/footers
are added.
Reindent ctdb_test_begin() while touching this function.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
It makes sense to do this in one place in case other headers/footers
are added.
Simplify ctdb_test_end() accordingly, reindenting because nearly all
lines are modified.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This allows more variables to be set in this function because they are
no longer in a sub-shell.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Only the test file name is ever passed.
Reindent while touching many existing lines.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
These are not used outside this script so they do not need to be
environment variables.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
* ctdb_onnode()
* testprog_onnode()
* function_onnode()
These encapsulate familiar patterns found when running
try_command_on_node(). The new function names are more concise and
encourage more readable tests. Test writers can do less thinking
about the subtleties of running different types of commands on nodes.
For example, these functions ensure that $CTDB and $VALGRIND are used
in the correct contexts.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
There is no point folding this down to 1. Tests should be able to see
the original value, if required.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The main change is to source cluster.bash instead of integration.bash.
While touching the preamble, the following additional changes are also
made:
* Drop test_info() definition and replace it with a comment
The use of test_info() is pointless.
* Drop call to ctdb_test_check_real_cluster()
cluster.bash now does this.
* Drop call to cluster_is_healthy()
This is a holdover from when the previous test would restart daemons
to get things ready for a test. There was also a bug where going
into recovery during the restart would sometimes cause the cluster
to become unhealthy. If we really need something like this then we
can add it to ctdb_test_init().
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This centralises this logic. Use it in a subset of tests - there are
other cases but these will be cleaned up soon.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This allows standard exit codes for failed and skipped tests, and test
errors.
Skipped tests currently just succeed and a test error is the same as a
failure. These can be easily changed later when run_tests.sh is ready
to handle them.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Use of this tunable was dropped over 5 years ago in commit
16837bc309.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri Oct 4 07:07:21 UTC 2019 on sn-devel-184
This occurs rarely but can adversely impact performance, so it is
worth logging it more frequently.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Otherwise it is uninitialised, so...
==22889== Conditional jump or move depends on uninitialised value(s)
==22889== at 0x12257B: ctdb_req_control_data_len (protocol_control.c:39)
==22889== by 0x1228E9: ctdb_req_control_len (protocol_control.c:1786)
==22889== by 0x12A51C: ctdb_client_control_send (client_control.c:101)
==22889== by 0x138BE1: ctdb_tunnel_setup_send (client_tunnel.c:100)
==22889== by 0x10EE4F: tunnel_test_send (tunnel_test.c:135)
==22889== by 0x10EE4F: main (tunnel_test.c:463)
and similar.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This currently skips the last record.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14147
RN: Avoid potential data loss during recovery after vacuuming error
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Bring this up to date. Drop descriptions of command-line options
because these tend to bit-rot - refer to "run_tests.sh -h" instead.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Thu Sep 26 06:01:33 UTC 2019 on sn-devel-184
Split some tests out into database/ and failover/ subdirectories.
Rename the remaining tests in simple/.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Do the same with the alternative code for real clusters.
Both of these can now be used by other test suites.
Fix some basic shellcheck warnings (e.g. avoid word-splitting by
quoting) while moving code and add the new files to the shellcheck
test.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>