IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Note that the 60.ganesha RPC checks need to be identical to those in
the nfs-checks.d/ directory. This is because the NFS unit test
infrastructure checks output against what should be produced by the
checks in nfs-checks.d/. This is a minor issue, since one of the aims
of this work is to remove the need for a separate 60.ganesha.
In most cases configuration variable CTDB_NFS_DUMP_STUCK_THREADS is
now ignored. This is now handled by passing the desired number of
threads to the command specified in the service_debug_cmd variable in
a .check file.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Change status, nlockmgr, mountd, rquotad to be unhealthy after 6
rpcinfo check failures and do a verbose restart after every 2
failures. Change 60.ganesha for consistency, since 60.ganesha tests
are broken and depend on the consistency.
Apart from the consistency aspect, the check infrastructure will soon
be simplified so that it only allows the equivalent of "unhealthy" and
"verbose restart:b" actions.
Update tests to have a corresponding numbers of iterations. Run 1
extra iteration in most tests to check there are no unexpected
behaviour changes after the designated number of iterations completes.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
They are contrived and hard to read. Better to just enumerate the few
sub-tests in these testcases.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
A useful baseline test to ensure that certain things (e.g. rpcinfo)
aren't consistently broken.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This means that required result will not be calculated on each
iteration. This is useful in baseline tests where, say, all
iterations should succeed and produce no output. This is useful for
confirming that the eventscript and test infrastructure is working
correctly.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Much clearer than using iterate_test() for this purpose. This also
does failover counting by calling rpcinfo in each iteration.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
It needs to have a default for the standalone case, when it is not run
in a loop inside "ip addr show".
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
60.nfs backgrounds it so it persists in the background causing
problems. In particular, it causes the "ctdb ip" command stub to be
run in parallel, which produces inconstent results.
Better not to run it at all in the NFS tests.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Thu Jul 9 09:27:02 CEST 2015 on sn-devel-104
To do any cleanup before exiting the test, register hooks with
test_cleanup(). Multiple hooks can be registered. All the hooks will
be called before exiting from the test.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
This code was copied from onnode unit tests, but not used.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
This is now achieved by defining functions extra_header() and
extra_footer().
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
On IPv4-only or IPv6-only systems one of these files will not exist.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This works around cases where ctdb_transaction gets stuck - this still
needs to be debugged. However, this change will at least cause
individual tests to fail rather than having whole test runs time out.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
00.ctdb should not know about public IP addresses.
Move related tests to operate on 10.interface.
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
ctdb_get_pnn() incorrectly caches to the same file regardless of what
node is selected via FAKE_CTDB_PNN.
Instead, set the PNN using new function ctdb_get_pnn(), which also
makes CTDB_VARDIR point to a node-specific subdirectory. This means
that ctdb_get_pnn() will correctly cache to the node-specific
directory.
Fake tickle and TDB files/directories used by the ctdb stub need to be
the same across all PNNs, so change these to use
$EVENTSCRIPTS_TESTS_VAR_DIR instead of node-specific $CTDB_VARDIR.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Drop copy of old ctdb_control_nodemap().
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Apr 7 10:20:41 CEST 2015 on sn-devel-104
Use -T tcp instead of deprecated options -u and -t. Also, check for
localhost.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Fri Mar 27 09:16:50 CET 2015 on sn-devel-104
There is no reason to serialise these or even handle remote nodes
first. Using a broadcast is more efficient and is less code.
Update expected test results to reflect changed order of messages.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Mon Mar 23 15:04:00 CET 2015 on sn-devel-104
"ctdb reloadnodes" currently does no sanity checking of the nodes
file. This can cause chaos if a line is deleted from the nodes file
rather than commented out. It also repeatedly produces a spurious
warning for each deleted node, even if the node was deleted a long
time ago.
Instead compare the nodemap with the contents of the local nodes file
to sanity check before attempting any reloads. Note that this is
still imperfect if the nodes files are inconsistent across nodes but
it is better. Also ensure that any nodes that are to be deleted are
already disconnected. Avoid trying to talk to deleted nodes.
The current implementation is a bit unfortunate when it comes to
deleting nodes. The most obvious alternative to the above complexity
would be to reloadnodes on the specified node first, then fetch the
node map (in which newly deleted nodes would be marked as such) and
then handle the remote nodes. However, the implementation of
reloadnodes is asynchronous and it only actions the reload after 1
second. This is presumably to avoid the recovery master noticing the
inconsistency between nodemaps and triggering a recovery before all
nodes have had their nodemaps updated.
Note that this recovery can still occur if the check is done at an
inconvenient time. A better long term approach might be to quiesce
the recovery master checks while reloadnodes is in progress.
Update a unit test to reflect the change.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
A basic test and some for cross-node consistency checking.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Every time a nodemap is contructed the node IP addresses all need to
be parsed. This isn't very productive use of CPU.
Instead, parse each string once when the nodes file is loaded. This
results in much simpler code.
This code also removes the use of ctdb_address. Duplicating the port
is pointless without an abstraction layer around ctdb_address. If
CTDB gets an incompatible transport in the future then add an
abstraction layer.
Note that the infiniband code is not updated. Compilation of the
infiniband code is already broken. Fixing it will be a separate,
properly tested effort.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
It should be -1 even without a failure callback registered.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
These can be unset if a NODEMAP, IFACES or VNNMAP section is missing.
Affected functions would then dereference a NULL pointer and the test
program would crash. Adding some helpful messages makes the problem
easier to diagnose when writing tests.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
When executing a shell script code "foo | bar", if "bar" terminates early,
then "foo" can get I/O error when writing to stdout.
The tdbtool stub did not wait to read anything from stdin when it is
expected to. This would cause tests to fail randomly under load when
tdbtool process exited early.
Similarly, debug function read from stdin only under certain conditions
(higher debug and when not reading from tty). Otherwise, exited early.
Thanks to Andrew Bartlett for noticing the problem and Catalyst Cloud
(http://catalyst.net.nz/cloud) for providing resources to test fixes.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-Programmed-With: Martin Schwenke <martin@meltin.net>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
Autobuild-User(master): Michael Adam <obnox@samba.org>
Autobuild-Date(master): Fri Mar 20 16:26:37 CET 2015 on sn-devel-104
Although much of the test infrastructure in recent commits is actually
targeted for "reloadnodes", it is worthwhile adding some tests for
"reloadips" and "recover". This allows most of the test
infrastructure to be tried out against known good code.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Mon Mar 16 09:18:55 CET 2015 on sn-devel-104
With support for CTDB_CONTROL_RELOAD_PUBLIC_IPS and
CTDB_CONTROL_RELOAD_NODES_FILE for now.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
ctdb_ctrl_reload_nodes_file_stub() does nothing except print a helpful
message. That's enough to help test the tool. It could update the
nodemap but that would not be incredibly useful.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Stub for ctdb_client_send_message() only implements
CTDB_SRVID_TAKEOVER_RUN and CTDB_SRVID_DISABLE_TAKEOVER_RUNS. It
assumes srvid_broadcast() is in use and just calls handler to fake
appropriate replies.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Initialise ctdb->ev in ctdb_cmdline_client_stub().
Add a comment to tevent_context_init_stub() explaining why the ctdb
context is initialised there instead of ctdb_cmdline_client_stub().
This information is in the git log but that doesn't help someone who
is reading the code.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The daemon uses an IP address of "0.0.0.0" when handling deleted
nodes. Do the same in the tests when loading a fake nodemap.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
If recovery mode is set to active then it updates the generation and
immediately sets recovery mode back to normal.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
IP addresses and routes are only changed if either the NAT gateway
configuration or the NAT gateway master node has changed. If running
"ip monitor" this will minimise the amount of noise seen. It should
also be more lightweight at the expense of managing a couple of state
files.
Add a test to check that configuration changes behave correctly.
Tweak the static route result generation code so that the required
output is sorted.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Scripts in eventscript unit tests are run under an explicitly
specified shell so they do not need to be executable. Checking that
the script is executable breaks on scripts that are installed without
the execute bit set, such as disabled eventscripts.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri Mar 6 04:40:07 CET 2015 on sn-devel-104
Some eventscript unit test failures get lost because _passed=false is
set in the tail of a pipe. Add a new function test_fail() and call it
when necessary to ensure the value of _passed is set correctly.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-programmed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Thu Mar 5 07:16:54 CET 2015 on sn-devel-104
statd-callout tries to remove global files from /var/lib/nfs/statd and
this causes errors in tests. Add an rm stub that ignores attempts to
remove these files but invokes /bin/rm for anything else.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-programmed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Updating ctdb.tdb on each add-client, del-client and each delete
during notify was too ambitious. Persistent transactions do not
perform well enough to do this.
Revert to having add-client and del-client create touch files. Each
monitor event calls "statd-callout update" to convert touch files into
ctdb.tdb records.
Update testcases to do the "update" and add an extra test.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
With improvements to unit test infrastructure to support. This
includes linking the real statd-callout into etc-ctdb/ in place of the
placeholder script.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
There's so much infrastructure here that it would be a shame not to
use it for testing things like statd-callout.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Drops the iptables() and ip6tables() functions and, hence, the
hardcoding of paths /sbin/iptables and /sbin/ip6tables. The latter
avoids problems on openSUSE where (for example) /usr/sbin/iptables is
used instead.
This means that locking around ip*tables commands is only done when
iptables_wrapper is called directly. This is fine because the only
conflict is when "releaseip" or "takeip"/"updateip" events are run in
parallel. The other uses in 11.natgw and 70.iscsi are in events where
there will be no collisions.
Making 11.natgw support IPv6 is unnecessary. Just put a static IPv6
address on each interface - they're plentiful.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Wed Jan 28 08:29:55 CET 2015 on sn-devel-104
This is a gawk extension and can't be used reliably if just running
"awk". It is simple enough to switch to using the standard sub() and
gsub() functions.
The alternative is to switch to explicitly running "gawk". However,
although the eventscripts aren't exactly portable, it is probably
better to move closer to portability than further away.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
tcptickle_sniff_start() assumes that if $dst contains a ': then it
should use the IPv6 sniffing code. However, $dst is a socket, so has
a trailing ":<port>".
Strip the trailing ":<port>" before checking for ':' as a marker for
an IPv6 address.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
These tests simulate a dead node rather than a CTDB failure, so drop
IP addresses when killing a "node" to avoid problems with duplicates.
To cope with a CTDB failure a watchdog would be needed to ensure that
the public IPs are dropped when CTDB dies. Let's not do that now.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Fri Dec 5 23:29:39 CET 2014 on sn-devel-104
Extend select_test_node_and_ips() to set $test_prefix in addition to
$test_ip.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
If CTDB_USE_IPV6 is set then use IPv6 addresses for nodes and public
IPs. This can be useful for some simple tests. However, the node
address actually needs to be on lo so that ctdbd can bind to the port
on that address, so they actually need to be added as root before
running tests, like this:
for i in $(seq 1 10) ; do ip addr add "fc00:10::${i}/64" dev lo ; done
IPv4 127.0.0.0/8 addresses are somehow magic and only one needs to be
on lo so that many can be bound to.
Also change the IPv4 node addresses to be (slightly) more exotic.
For both IPv4 and IPv6, choose addresses that are compatible with
socket wrapper.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Signed-off-by: Amitay Isaacs <amitay@gmail.com> (socket wrapper fixes)
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net> (socket wrapper fixes)
Add checking to "releaseip" and "updateip" to ensure that the given IP
address is really on the given interface with the given netmask. If
reality doesn't match the given arguments then believe reality.
Use new function iptables_wrapper() instead of calling iptables()
directly.
Use new function flush_route_cache() instead of doing IPv4-specific
/proc magic.
Remove setting of otherwise unused variable "failed".
Fix a test for which the error message has changed.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Also update associated eventscript unit tests and ctdb stub.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
There are parentheses missing that stop the default pattern from
matching commands with trailing garbage (e.g. "exportfs.orig").
A careful check of POSIX (and running GNU sed with --posix) suggests
that "\|" isn't a supported way of specifying alternation in a regular
expression. Therefore, it is clearer to switch to extended regular
expressions so that this has a chance of being portable (even though
the point is to print /proc/<pid>/stack, which only works on Linux).
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Nov 18 06:37:45 CET 2014 on sn-devel-104
Also add and update tests for statd stack dumps. Update the existing
60.ganesha statd test to do more iterations. Duplicate the result as
a new test for 60.nfs.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
In the process, fix a bug where an extra trace would be printed.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Remove --logfile and --syslog daemon options and replace with
--logging.
Modularise and clean up logging initialisation code. The
initialisation API includes an app_name argument that is currently
unused - this will be used in extensions to the syslog backend.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This makes the code cleaner and allows the syslog backend to be easily
modified without affecting other code. Also do some extra clean-up,
including whitespace fixups.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Internally map them to DEBUG_ERR to limit code churn.
This reduces the unwieldy number of debug levels used by CTDB. ALERT
and CRIT aren't of much use as separate errors, since everything from
ERR up should always be logged. In future just ERR can be used.
This also improves compatibility with Samba's debug.c system priority
mapping.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
It isn't used and shouldn't be. CTDB can't make the system unusable.
Update associated test to ensure that EMERG isn't attempted. Actually
test all remaining debug levels and modernise the test a bit.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Use a variable to allow easy change of this string in case future
logging changes modify the timestamp format or do not support
timestamping.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Volker Lendecke <vl@samba.org>
As far as we know, nobody uses this and it just complicates the
logging subsystem.
Remove all ringbuffer code and documentation. Update the local
daemons startup code correspondingly.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Volker Lendecke <vl@samba.org>
Some of this implements logic that exists in functions. Some of it is
overly complicated and potentially failure-prone.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The log ringbuffer will probably be removed. The test can be
implemented just as reliably by checking IP assignments using "ctdb
ip".
Update wait_until_ips_are_on_node() to print a more useful log
message.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The "-n all" is wrong.
Simplify the implementation and tighten up some uses of this function.
_select_test_node_and_ips() can't use this function anymore.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The glob functionality is unsed so simplify the code by removing it.
Rename this function to wait_until_ips_are_on_node(). Update all
calls.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Local daemons are started mainly for testing and usually not as root.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
This makes it consistent with Samba, to ease transition.
Update unit test code to link to with tdb_wrap instead of including
db_wrap.c.
There are some potential whitespace fixes in this commit that have
been ignored. CTDB's lib/tdb_wrap will be deleted after the
transition to Samba's lib/tdb_wrap, so there's no point polishing it
too much.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Some declarations get lost because they basically get #define-d away,
so they need to be repeated after the #undef-s. Also, some functions
are introduced due the #define-s.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
To avoid warnings when using --enable-developer, which uses
-Wmissing-prototypes.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This event was introduced to handle misconfiguration. For example,
where all nodes where configured as NAT gateway slaves.
However, this event can fail when there are performance issues and
capabilities can't be retrieved from a remote node. The problem is
most likely with the remote node, so marking the local node UNHEALTHY
is probably a mistake.
Having a NAT gateway master node only matters in "ipreallocated", so
leave it to do the checking. Given that a node will run
"ipreallocated" as part of the first recovery, this should cause
misconfigurations to be detected nice and early.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Debugging can still be running when a monitor event times out and
scriptstatus output changes.
When debugging a hung script to a log file, write to a temporary file
and move the temporary file over the log file when done. The test
then waits for the log file to appear.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Thu Jul 3 08:19:23 CEST 2014 on sn-devel-104
About a year ago a check was added to _cluster_is_healthy() to make
sure that node 0 isn't in recovery. This was to avoid unexpected
recoveries causing tests to fail. However, it was misguided because
each test initially calls cluster_is_healthy() and will now fail if an
unexpected recovery occurs.
Instead, have cluster_is_healthy() warn if the cluster is in recovery.
Also:
* Rename wait_until_healthy() to wait_until_ready() because it waits
until both healthy and out of recovery.
* Change the post-recovery sleep in restart_ctdb() to 2 seconds and
add a loop to wait (for 2 seconds at a time) if the cluster is back
in recovery. The logic here is that the re-recovery timeout has
been set to 1 second, so sleeping for just 1 second might race
against the next recovery.
* Use reverse logic in node_has_status() so that it works for "all".
* Tweak wait_until() so that it can handle timeouts with a
recheck-interval specified.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Routines in system_common and system_<os> are supposed to be ctdb
functions with OS specific implementations.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Recent changes have caused these commands to attempt to get
capabilities from all nodes before doing further filtering. This
means that capabilities are unnecessarily fetched from nodes that are
unlikely to be the master. If such a node does not answer the control
then many nodes can fail to calculate the master node. In the case of
natgwlist this will cause "monitor" events to fail resulting in
unhealthy nodes.
Restore the behaviour where capabilities are only fetched for a node
that will be the master if it has the desired flags.
Although this masks a problem where a connected node is not replying,
it can help to avoid an outage in some cases.
Add supporting tests and infrastructure. Infrastructure just lets a
timeout be faked - just for ctdb_ctrl_getcapabilities_stub() so far.
First test checks that this infrastructure works if the first node
times out in natgwlist. Second test checks the case worked around by
the above fix - that is, no failure when a node with PNN beyond the
NATGW master can time out.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Thu May 29 05:59:37 CEST 2014 on sn-devel-104
The range
CTDB_PER_IP_ROUTING_TABLE_ID_LOW..CTDB_PER_IP_ROUTING_TABLE_ID_HIGH
should not include 253-255. Otherwise policy routing may overwrite
the default system routing tables.
Add some corresponding tests.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Commit 4ee4925d41 forgot about
CTDB_NATGW_SLAVE_ONLY so it introduces an incorrect failure when this
is set, and CTDB_NATGW_PUBLIC_IFACE or CTDB_NATGW_PUBLIC_IP is unset.
Relax the sanity check to see if CTDB_NATGW_SLAVE_ONLY is set.
Update the documentation to explicitly state that
CTDB_NATGW_PUBLIC_IFACE and CTDB_NATGW_PUBLIC_IP are optional and
unused if CTDB_NATGW_SLAVE_ONLY is set. It would be possible to
insist that CTDB_NATGW_PUBLIC_IFACE and CTDB_NATGW_PUBLIC_IFACE should
be unset in that case. However, it is more reasonable to allow
consistent configuration across nodes except with some nodes
configured slave-only.
Add tests, update infrastructure and fix a thinko in the stub's
"natgwlist" implementation.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Mon Apr 14 06:06:49 CEST 2014 on sn-devel-104
Commit ba69742ccd missed the point of
filtering disconnected nodes while limiting the nodemap to those in
the NAT gateway group. It was really to avoid trying to fetch
capabilities from disconnected nodes. This should be explicitly done
in filter_nodemap_by_capabilities(), otherwise "ctdb natgwlist" simply
fails when there is a disconnected node.
Note that the alternate solution where filter_nodemap_by_flags() is
called before filter_nodemap_by_capabilities() would not be not
correct. Filtering on flags first can produce a "healthier" set of
nodes where none of them have the NAT gateway capability.
Also extend stub for ctdb_ctrl_getcapabilities() to fail when trying
to get capabilities from a disconnected node and add a corresponding
test to confirm that "ctdb natgwlist" is no longer broken.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This will test that ctdb_fetch_lock correctly revokes readonly
delegations.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
This test currently counts the number of read-only-enabled databases
and expects there to only be 1. It fails when there are existing
databases with read-only already enabled. Instead, check just the
test database.
Clean up the test by adding some functions to check for precisely the
read-only flags that should be set on a node after each operation.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This one ensures that a newly started node gets an up-to-date tickle
list. Tweak some of the integration test functions to accommodate
this.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Signed-off-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Wed Mar 26 06:24:01 CET 2014 on sn-devel-104
This includes adding support for:
* Configuring fake NATGW state in the eventscript unit tests
* "natgwlist" and "setnatgwstate" in ctdb command stub
* ip command stub to default to "main table" when no table specified,
allow routes to be added without "dev" option (just add a default
dev), support "metric" option
Signed-off-by: Martin Schwenke <martin@meltin.net>
It is hard to diagnose failures in the NFS tickle test because there's
no way of telling if the test node doesn't have the tickle or if it
didn't get propagated.
Factor out check_tickles() into local.bash and give it some
parameters.
Have the NFS test call it first to ensure the tickle has been
registered. Then use new function check_tickles_all() to ensure the
tickle has been propagated to all nodes. Give this a bit of extra
time (double the timeout) just in case we're racing with the update.
Add a useful comment to the CIFS test so that I stop asking myself how
the test could ever have worked reliably. :-)
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Tests for xpnn need to implement a stub for ctdb_sys_have_ip(). The
cheapest way of doing this is to read a fake nodemap using the
existing code and check if the IP of the "current" node is the one
being asked about. However, the fake state initialisation isn't
currently available to without_daemon commands because it is meant to
represent daemon state. However, it can be made available by moving
the relevant code into a new stub for tevent_context_init(). The stub
still needs to initialise a tevent context - this can be done by
calling a lower level function.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This looks to have got left behind a long time ago when things got
moved around...
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
* Should stop on 1st error
* Fix up value of CTDB_TESTS_ARE_INSTALLED
* Improve fixing of broken symlinks in INSTALL
This is all of the links in tests/eventscript/etc-ctdb/ so no need
to list them. Just find and fix them.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
* Add stack dumps for "interesting" processes that sometimes get
stuck, so try to print stack traces for them if they appear in the
pstree output.
* Add new configuration variables CTDB_DEBUG_HUNG_SCRIPT_LOGFILE and
CTDB_DEBUG_HUNG_SCRIPT_STACKPAT. These are primarily for testing
but the latter may be useful for live debugging.
* Load CTDB configuration so that above configuration variables can be
set/changed without restarting ctdbd.
Add a test that tries to ensure that all of this is working.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
In the first case, reconfiguration can longer happen in a monitor
event, so this is no longer a problem. Drop it.
Running a monitor event by hand no longer cancels the existing monitor
event. Instead the hand-run event fails. So do this differently and
just wait for a monitor event before continuing.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Thu Feb 13 04:05:57 CET 2014 on sn-devel-104
srcimbl gets changed on every iteration of the loop. The value that
should be stored for the new imbalance of the source node is
minsrcimbl.
To help diagnose this, added some extra debug that can be left in.
The extra debug changes the output of a couple of tests. Note that
the resulting IP allocations in those tests is unchanged - only the
debug output is changed.
Also add some new tests that illustrates the bug.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This adds a lot of IPs (currently 100) in a new network and deletes
them in a few steps. First the primary is deleted and then a check is
done to ensure that the remaining IPs are all correct. Then about 1/2
of the IPs and deleted and remaining IPs are checked. Then the
remaining IPs are deleted and a check is done to ensure they are all
gone.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Just enable this behaviour by default in the ip command stub, since
10.interface assumes/sets it. The rc.local replacement for set_proc()
doesn't do anything...
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
ctdb_get_my_public_addresses() attempts to echo things and this causes
an error if head has taken the first line and the pipe is closed.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri Jan 31 05:30:38 CET 2014 on sn-devel-104