Note that the 60.ganesha RPC checks need to be identical to those in
the nfs-checks.d/ directory. This is because the NFS unit test
infrastructure checks output against what should be produced by the
checks in nfs-checks.d/. This is a minor issue, since one of the aims
of this work is to remove the need for a separate 60.ganesha.
In most cases the configuration variable CTDB_NFS_DUMP_STUCK_THREADS is
now ignored. This is now handled by passing the desired number of
threads to the command specified in the service_debug_cmd variable in
a .check file.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Change status, nlockmgr, mountd, rquotad to be unhealthy after 6
rpcinfo check failures and do a verbose restart after every 2
failures. Change 60.ganesha for consistency, since 60.ganesha tests
are broken and depend on the consistency.
Apart from the consistency aspect, the check infrastructure will soon
be simplified so that it only allows the equivalent of "unhealthy" and
"verbose restart:b" actions.
Update tests to have a corresponding number of iterations. Run 1
extra iteration in most tests to check there are no unexpected
behaviour changes after the designated number of iterations completes.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
They are contrived and hard to read. Better to just enumerate the few
sub-tests in these testcases.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
A useful baseline test to ensure that certain things (e.g. rpcinfo)
aren't consistently broken.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This means that the required result will not be calculated on each
iteration. This is useful in baseline tests where, say, all
iterations should succeed and produce no output. Such tests help
confirm that the eventscript and test infrastructure are working
correctly.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Much clearer than using iterate_test() for this purpose. This also
does failover counting by calling rpcinfo in each iteration.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
It needs to have a default for the standalone case, when it is not run
in a loop inside "ip addr show".
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
60.nfs backgrounds it so it persists in the background causing
problems. In particular, it causes the "ctdb ip" command stub to be
run in parallel, which produces inconsistent results.
Better not to run it at all in the NFS tests.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Thu Jul 9 09:27:02 CEST 2015 on sn-devel-104
To do any cleanup before exiting the test, register hooks with
test_cleanup(). Multiple hooks can be registered. All the hooks will
be called before exiting from the test.
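The hook registration described above could look something like the
following sketch. This is not the CTDB implementation: the variable
_cleanup_hooks and the function run_cleanup_hooks() are hypothetical
names used only for illustration.

```shell
# Hypothetical sketch of cleanup hook registration; not the CTDB code.
_cleanup_hooks=""

# Register a command to be run when the test finishes.  Multiple
# hooks are joined with ";" so they run in registration order.
test_cleanup ()
{
    _cleanup_hooks="${_cleanup_hooks}${_cleanup_hooks:+ ; }$*"
}

# Run every registered hook.  A test would typically arrange for this
# to be called on exit, for example with: trap run_cleanup_hooks 0
run_cleanup_hooks ()
{
    eval "$_cleanup_hooks"
}
```

A test would call test_cleanup once per resource it creates, for
example test_cleanup "rm -f /some/temporary/file".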
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
This code was copied from onnode unit tests, but not used.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
This is now achieved by defining functions extra_header() and
extra_footer().
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
On IPv4-only or IPv6-only systems one of these files will not exist.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This works around cases where ctdb_transaction gets stuck - this still
needs to be debugged. However, this change will at least cause
individual tests to fail rather than having whole test runs time out.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
00.ctdb should not know about public IP addresses.
Move related tests to operate on 10.interface.
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
ctdb_get_pnn() incorrectly caches to the same file regardless of what
node is selected via FAKE_CTDB_PNN.
Instead, set the PNN using new function ctdb_set_pnn(), which also
makes CTDB_VARDIR point to a node-specific subdirectory. This means
that ctdb_get_pnn() will correctly cache to the node-specific
directory.
Fake tickle and TDB files/directories used by the ctdb stub need to be
the same across all PNNs, so change these to use
$EVENTSCRIPTS_TESTS_VAR_DIR instead of node-specific $CTDB_VARDIR.
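As a rough illustration of the idea, a helper that selects the fake
PNN and redirects CTDB_VARDIR might look like the sketch below. The
function name set_fake_pnn() and the directory layout are assumptions
made for the example, not taken from the test code.

```shell
# Hypothetical sketch: select a fake PNN and point CTDB_VARDIR at a
# per-node subdirectory, so cached state cannot leak between nodes.
# Assumes EVENTSCRIPTS_TESTS_VAR_DIR is already set by the test setup.
set_fake_pnn ()
{
    export FAKE_CTDB_PNN="$1"
    export CTDB_VARDIR="${EVENTSCRIPTS_TESTS_VAR_DIR}/ctdb/${FAKE_CTDB_PNN}"
    mkdir -p "$CTDB_VARDIR"
}
```

State that must be shared across all PNNs would stay directly under
$EVENTSCRIPTS_TESTS_VAR_DIR, as the message above describes.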
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Drop copy of old ctdb_control_nodemap().
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Apr 7 10:20:41 CEST 2015 on sn-devel-104
Use -T tcp instead of deprecated options -u and -t. Also, check for
localhost.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Fri Mar 27 09:16:50 CET 2015 on sn-devel-104
There is no reason to serialise these or even handle remote nodes
first. Using a broadcast is more efficient and is less code.
Update expected test results to reflect changed order of messages.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Mon Mar 23 15:04:00 CET 2015 on sn-devel-104
"ctdb reloadnodes" currently does no sanity checking of the nodes
file. This can cause chaos if a line is deleted from the nodes file
rather than commented out. It also repeatedly produces a spurious
warning for each deleted node, even if the node was deleted a long
time ago.
Instead compare the nodemap with the contents of the local nodes file
to sanity check before attempting any reloads. Note that this is
still imperfect if the nodes files are inconsistent across nodes but
it is better. Also ensure that any nodes that are to be deleted are
already disconnected. Avoid trying to talk to deleted nodes.
The current implementation is a bit unfortunate when it comes to
deleting nodes. The most obvious alternative to the above complexity
would be to reloadnodes on the specified node first, then fetch the
node map (in which newly deleted nodes would be marked as such) and
then handle the remote nodes. However, the implementation of
reloadnodes is asynchronous and it only actions the reload after 1
second. This is presumably to avoid the recovery master noticing the
inconsistency between nodemaps and triggering a recovery before all
nodes have had their nodemaps updated.
Note that this recovery can still occur if the check is done at an
inconvenient time. A better long term approach might be to quiesce
the recovery master checks while reloadnodes is in progress.
Update a unit test to reflect the change.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
A basic test and some for cross-node consistency checking.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Every time a nodemap is constructed the node IP addresses all need to
be parsed. This isn't a very productive use of CPU.
Instead, parse each string once when the nodes file is loaded. This
results in much simpler code.
This code also removes the use of ctdb_address. Duplicating the port
is pointless without an abstraction layer around ctdb_address. If
CTDB gets an incompatible transport in the future then add an
abstraction layer.
Note that the infiniband code is not updated. Compilation of the
infiniband code is already broken. Fixing it will be a separate,
properly tested effort.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
It should be -1 even without a failure callback registered.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
These can be unset if a NODEMAP, IFACES or VNNMAP section is missing.
Affected functions would then dereference a NULL pointer and the test
program would crash. Adding some helpful messages makes the problem
easier to diagnose when writing tests.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
When executing shell code such as "foo | bar", if "bar" terminates
early then "foo" can get an I/O error when writing to stdout.
The tdbtool stub did not wait to read anything from stdin when it was
expected to. This would cause tests to fail randomly under load when
the tdbtool process exited early.
Similarly, the debug function read from stdin only under certain
conditions (higher debug level and when not reading from a tty).
Otherwise, it exited early.
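A minimal sketch of the fix pattern, assuming a stub that does not
care about its input: drain stdin completely before exiting, so the
writer on the left-hand side of the pipe never writes to a closed
pipe. The function name tdbtool_stub() and its "OK" output are made up
for the example.

```shell
# Hypothetical stub: consume all of stdin before producing output, so
# the process feeding the pipe can always finish writing.
tdbtool_stub ()
{
    cat >/dev/null
    echo "OK"
}
```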
Thanks to Andrew Bartlett for noticing the problem and Catalyst Cloud
(http://catalyst.net.nz/cloud) for providing resources to test fixes.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-Programmed-With: Martin Schwenke <martin@meltin.net>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
Autobuild-User(master): Michael Adam <obnox@samba.org>
Autobuild-Date(master): Fri Mar 20 16:26:37 CET 2015 on sn-devel-104
Although much of the test infrastructure in recent commits is actually
targeted for "reloadnodes", it is worthwhile adding some tests for
"reloadips" and "recover". This allows most of the test
infrastructure to be tried out against known good code.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Mon Mar 16 09:18:55 CET 2015 on sn-devel-104
With support for CTDB_CONTROL_RELOAD_PUBLIC_IPS and
CTDB_CONTROL_RELOAD_NODES_FILE for now.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
ctdb_ctrl_reload_nodes_file_stub() does nothing except print a helpful
message. That's enough to help test the tool. It could update the
nodemap but that would not be incredibly useful.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Stub for ctdb_client_send_message() only implements
CTDB_SRVID_TAKEOVER_RUN and CTDB_SRVID_DISABLE_TAKEOVER_RUNS. It
assumes srvid_broadcast() is in use and just calls handler to fake
appropriate replies.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Initialise ctdb->ev in ctdb_cmdline_client_stub().
Add a comment to tevent_context_init_stub() explaining why the ctdb
context is initialised there instead of ctdb_cmdline_client_stub().
This information is in the git log but that doesn't help someone who
is reading the code.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The daemon uses an IP address of "0.0.0.0" when handling deleted
nodes. Do the same in the tests when loading a fake nodemap.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
If recovery mode is set to active then it updates the generation and
immediately sets recovery mode back to normal.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
IP addresses and routes are only changed if either the NAT gateway
configuration or the NAT gateway master node has changed. If running
"ip monitor" this will minimise the amount of noise seen. It should
also be more lightweight at the expense of managing a couple of state
files.
Add a test to check that configuration changes behave correctly.
Tweak the static route result generation code so that the required
output is sorted.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Scripts in eventscript unit tests are run under an explicitly
specified shell so they do not need to be executable. Checking that
the script is executable breaks on scripts that are installed without
the execute bit set, such as disabled eventscripts.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri Mar 6 04:40:07 CET 2015 on sn-devel-104
Some eventscript unit test failures get lost because _passed=false is
set in the tail of a pipe. Add a new function test_fail() and call it
when necessary to ensure the value of _passed is set correctly.
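The underlying shell behaviour can be sketched as follows. The names
check_lossy() and check_safe() are hypothetical, and test_fail() here
is a simplified stand-in rather than the function added by this
commit.

```shell
_passed=true

# Simplified stand-in: record a failure in the current shell.
test_fail ()
{
    _passed=false
}

# Lossy pattern: the assignment happens in the tail of a pipe, which
# most shells run in a subshell, so the caller may never see it.
check_lossy ()
{
    echo "$1" | if ! grep -q '^OK' ; then _passed=false ; fi
}

# Safer pattern: let the exit status of the pipeline cross the
# subshell boundary, then record the failure in the calling shell.
check_safe ()
{
    if ! echo "$1" | grep -q '^OK' ; then
        test_fail
    fi
}
```

Whether check_lossy() loses the assignment depends on the shell, which
is exactly why relying on it is fragile.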
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-programmed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Thu Mar 5 07:16:54 CET 2015 on sn-devel-104
statd-callout tries to remove global files from /var/lib/nfs/statd and
this causes errors in tests. Add an rm stub that ignores attempts to
remove these files but invokes /bin/rm for anything else.
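A stub along these lines might look like the sketch below. It is
written as a function named rm_stub() for illustration only. The real
stub would be a script found first in the test PATH.

```shell
# Hypothetical rm stub: ignore any request that touches the global
# statd directory and hand everything else to the real rm.
rm_stub ()
{
    for _arg ; do
        case "$_arg" in
        /var/lib/nfs/statd/*) return 0 ;;
        esac
    done
    /bin/rm "$@"
}
```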
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-programmed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Updating ctdb.tdb on each add-client, del-client and each delete
during notify was too ambitious. Persistent transactions do not
perform well enough to do this.
Revert to having add-client and del-client create touch files. Each
monitor event calls "statd-callout update" to convert touch files into
ctdb.tdb records.
Update testcases to do the "update" and add an extra test.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
With improvements to unit test infrastructure to support this. This
includes linking the real statd-callout into etc-ctdb/ in place of the
placeholder script.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
There's so much infrastructure here that it would be a shame not to
use it for testing things like statd-callout.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Drops the iptables() and ip6tables() functions and, hence, the
hardcoding of paths /sbin/iptables and /sbin/ip6tables. The latter
avoids problems on openSUSE where (for example) /usr/sbin/iptables is
used instead.
This means that locking around ip*tables commands is only done when
iptables_wrapper is called directly. This is fine because the only
conflict is when "releaseip" or "takeip"/"updateip" events are run in
parallel. The other uses in 11.natgw and 70.iscsi are in events where
there will be no collisions.
Making 11.natgw support IPv6 is unnecessary. Just put a static IPv6
address on each interface - they're plentiful.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Wed Jan 28 08:29:55 CET 2015 on sn-devel-104
This is a gawk extension and can't be used reliably if just running
"awk". It is simple enough to switch to using the standard sub() and
gsub() functions.
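For illustration, assuming the gawk extension in question is gensub()
(the text above does not name it), a conversion might look like the
sketch below. The function name strip_port_awk() and the pattern are
made up for the example.

```shell
# gawk-only form (assumed): print gensub(/:[0-9]+$/, "", 1, $1)
# Portable form: copy the field, then edit the copy with sub().
strip_port_awk ()
{
    awk '{ addr = $1 ; sub(/:[0-9]+$/, "", addr) ; print addr }'
}
```

The main difference is that sub() and gsub() modify their target in
place and return a count, so a temporary variable is needed where
gensub() would have returned the new string.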
The alternative is to switch to explicitly running "gawk". However,
although the eventscripts aren't exactly portable, it is probably
better to move closer to portability than further away.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
tcptickle_sniff_start() assumes that if $dst contains a ':' then it
should use the IPv6 sniffing code. However, $dst is a socket, so has
a trailing ":<port>".
Strip the trailing ":<port>" before checking for ':' as a marker for
an IPv6 address.
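The check can be sketched with POSIX parameter expansion. The helper
name socket_is_ipv6() is hypothetical.

```shell
# Hypothetical helper: decide whether a socket of the form
# "<address>:<port>" refers to an IPv6 address.
socket_is_ipv6 ()
{
    _addr="${1%:*}"    # drop the trailing ":<port>"
    case "$_addr" in
    *:*) return 0 ;;   # a remaining ':' means an IPv6 address
    *)   return 1 ;;
    esac
}
```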
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
These tests simulate a dead node rather than a CTDB failure, so drop
IP addresses when killing a "node" to avoid problems with duplicates.
To cope with a CTDB failure a watchdog would be needed to ensure that
the public IPs are dropped when CTDB dies. Let's not do that now.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Fri Dec 5 23:29:39 CET 2014 on sn-devel-104
Extend select_test_node_and_ips() to set $test_prefix in addition to
$test_ip.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
If CTDB_USE_IPV6 is set then use IPv6 addresses for nodes and public
IPs. This can be useful for some simple tests. However, the node
address actually needs to be on lo so that ctdbd can bind to the port
on that address, so they actually need to be added as root before
running tests, like this:
for i in $(seq 1 10) ; do ip addr add "fc00:10::${i}/64" dev lo ; done
IPv4 127.0.0.0/8 addresses are somehow magic and only one needs to be
on lo so that many can be bound to.
Also change the IPv4 node addresses to be (slightly) more exotic.
For both IPv4 and IPv6, choose addresses that are compatible with
socket wrapper.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Signed-off-by: Amitay Isaacs <amitay@gmail.com> (socket wrapper fixes)
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net> (socket wrapper fixes)
Add checking to "releaseip" and "updateip" to ensure that the given IP
address is really on the given interface with the given netmask. If
reality doesn't match the given arguments then believe reality.
Use new function iptables_wrapper() instead of calling iptables()
directly.
Use new function flush_route_cache() instead of doing IPv4-specific
/proc magic.
Remove setting of otherwise unused variable "failed".
Fix a test for which the error message has changed.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Also update associated eventscript unit tests and ctdb stub.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
There are parentheses missing that stop the default pattern from
matching commands with trailing garbage (e.g. "exportfs.orig").
A careful check of POSIX (and running GNU sed with --posix) suggests
that "\|" isn't a supported way of specifying alternation in a regular
expression. Therefore, it is clearer to switch to extended regular
expressions so that this has a chance of being portable (even though
the point is to print /proc/<pid>/stack, which only works on Linux).
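The effect of the missing parentheses can be sketched with an extended
regular expression. The pattern and the helper name wants_stack_dump()
are illustrative assumptions, not the actual default pattern.

```shell
# Hypothetical pattern.  Without parentheses, '^exportfs|rpcinfo$'
# anchors each alternative separately, so "exportfs.orig" matches
# '^exportfs'.  Grouping anchors the whole alternation.
stack_pattern='^(exportfs|rpcinfo)$'

wants_stack_dump ()
{
    echo "$1" | grep -Eq "$stack_pattern"
}
```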
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Nov 18 06:37:45 CET 2014 on sn-devel-104
Also add and update tests for statd stack dumps. Update the existing
60.ganesha statd test to do more iterations. Duplicate the result as
a new test for 60.nfs.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
In the process, fix a bug where an extra trace would be printed.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Remove --logfile and --syslog daemon options and replace with
--logging.
Modularise and clean up logging initialisation code. The
initialisation API includes an app_name argument that is currently
unused - this will be used in extensions to the syslog backend.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This makes the code cleaner and allows the syslog backend to be easily
modified without affecting other code. Also do some extra clean-up,
including whitespace fixups.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Internally map them to DEBUG_ERR to limit code churn.
This reduces the unwieldy number of debug levels used by CTDB. ALERT
and CRIT aren't of much use as separate errors, since everything from
ERR up should always be logged. In future just ERR can be used.
This also improves compatibility with Samba's debug.c system priority
mapping.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
It isn't used and shouldn't be. CTDB can't make the system unusable.
Update associated test to ensure that EMERG isn't attempted. Actually
test all remaining debug levels and modernise the test a bit.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Use a variable to allow easy change of this string in case future
logging changes modify the timestamp format or do not support
timestamping.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Volker Lendecke <vl@samba.org>
As far as we know, nobody uses this and it just complicates the
logging subsystem.
Remove all ringbuffer code and documentation. Update the local
daemons startup code correspondingly.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Volker Lendecke <vl@samba.org>
Some of this implements logic that exists in functions. Some of it is
overly complicated and potentially failure-prone.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The log ringbuffer will probably be removed. The test can be
implemented just as reliably by checking IP assignments using "ctdb
ip".
Update wait_until_ips_are_on_node() to print a more useful log
message.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The "-n all" is wrong.
Simplify the implementation and tighten up some uses of this function.
_select_test_node_and_ips() can't use this function anymore.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The glob functionality is unused so simplify the code by removing it.
Rename this function to wait_until_ips_are_on_node(). Update all
calls.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Local daemons are started mainly for testing and usually not as root.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
This makes it consistent with Samba, to ease transition.
Update unit test code to link with tdb_wrap instead of including
db_wrap.c.
There are some potential whitespace fixes in this commit that have
been ignored. CTDB's lib/tdb_wrap will be deleted after the
transition to Samba's lib/tdb_wrap, so there's no point polishing it
too much.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Some declarations get lost because they basically get #define-d away,
so they need to be repeated after the #undef-s. Also, some functions
are introduced due to the #define-s.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
To avoid warnings when using --enable-developer, which uses
-Wmissing-prototypes.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This event was introduced to handle misconfiguration. For example,
where all nodes were configured as NAT gateway slaves.
However, this event can fail when there are performance issues and
capabilities can't be retrieved from a remote node. The problem is
most likely with the remote node, so marking the local node UNHEALTHY
is probably a mistake.
Having a NAT gateway master node only matters in "ipreallocated", so
leave it to do the checking. Given that a node will run
"ipreallocated" as part of the first recovery, this should cause
misconfigurations to be detected nice and early.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Debugging can still be running when a monitor event times out and
scriptstatus output changes.
When debugging a hung script to a log file, write to a temporary file
and move the temporary file over the log file when done. The test
then waits for the log file to appear.
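The write-then-rename pattern can be sketched as follows. The helper
name write_log_atomically() and the temporary file naming are
assumptions made for the example.

```shell
# Hypothetical helper: run a command, collect its output in a
# temporary file next to the log file, then rename it into place.
# A rename within one filesystem is atomic, so a reader waiting for
# the log file never sees a partially written one.
write_log_atomically ()
{
    _logfile="$1" ; shift
    _tmp="${_logfile}.tmp.$$"
    "$@" >"$_tmp" 2>&1
    mv "$_tmp" "$_logfile"
}
```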
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Thu Jul 3 08:19:23 CEST 2014 on sn-devel-104
About a year ago a check was added to _cluster_is_healthy() to make
sure that node 0 isn't in recovery. This was to avoid unexpected
recoveries causing tests to fail. However, it was misguided because
each test initially calls cluster_is_healthy() and will now fail if an
unexpected recovery occurs.
Instead, have cluster_is_healthy() warn if the cluster is in recovery.
Also:
* Rename wait_until_healthy() to wait_until_ready() because it waits
until both healthy and out of recovery.
* Change the post-recovery sleep in restart_ctdb() to 2 seconds and
add a loop to wait (for 2 seconds at a time) if the cluster is back
in recovery. The logic here is that the re-recovery timeout has
been set to 1 second, so sleeping for just 1 second might race
against the next recovery.
* Use reverse logic in node_has_status() so that it works for "all".
* Tweak wait_until() so that it can handle timeouts with a
recheck-interval specified.
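As an illustration of the last point, a wait loop with a separate
recheck interval might look like the sketch below. The function name
wait_until_sketch() and its argument order are hypothetical and do not
reflect the actual wait_until() calling convention.

```shell
# Hypothetical sketch: retry a command until it succeeds or the
# timeout (in seconds) expires, sleeping for the given interval
# between attempts.
wait_until_sketch ()
{
    _timeout="$1" ; _interval="$2" ; shift 2
    _elapsed=0
    while [ "$_elapsed" -lt "$_timeout" ] ; do
        if "$@" ; then
            return 0
        fi
        sleep "$_interval"
        _elapsed=$((_elapsed + _interval))
    done
    return 1
}
```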
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Routines in system_common and system_<os> are supposed to be ctdb
functions with OS specific implementations.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Recent changes have caused these commands to attempt to get
capabilities from all nodes before doing further filtering. This
means that capabilities are unnecessarily fetched from nodes that are
unlikely to be the master. If such a node does not answer the control
then many nodes can fail to calculate the master node. In the case of
natgwlist this will cause "monitor" events to fail resulting in
unhealthy nodes.
Restore the behaviour where capabilities are only fetched for a node
that will be the master if it has the desired flags.
Although this masks a problem where a connected node is not replying,
it can help to avoid an outage in some cases.
Add supporting tests and infrastructure. Infrastructure just lets a
timeout be faked - just for ctdb_ctrl_getcapabilities_stub() so far.
First test checks that this infrastructure works if the first node
times out in natgwlist. Second test checks the case worked around by
the above fix - that is, no failure when a node with PNN beyond the
NATGW master can time out.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Thu May 29 05:59:37 CEST 2014 on sn-devel-104
The range
CTDB_PER_IP_ROUTING_TABLE_ID_LOW..CTDB_PER_IP_ROUTING_TABLE_ID_HIGH
should not include 253-255. Otherwise policy routing may overwrite
the default system routing tables.
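A range check of this kind can be sketched as below. The function name
table_id_range_ok() is hypothetical. On Linux, IDs 253, 254 and 255
are the default, main and local routing tables.

```shell
# Hypothetical check: succeed only if the range low..high is valid
# and avoids the reserved routing table IDs 253-255.
table_id_range_ok ()
{
    _low="$1" ; _high="$2"
    [ "$_low" -le "$_high" ] || return 1
    # The range overlaps 253..255 unless it lies entirely to one side.
    [ "$_high" -lt 253 ] || [ "$_low" -gt 255 ]
}
```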
Add some corresponding tests.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Commit 4ee4925d41 forgot about
CTDB_NATGW_SLAVE_ONLY so it introduces an incorrect failure when this
is set, and CTDB_NATGW_PUBLIC_IFACE or CTDB_NATGW_PUBLIC_IP is unset.
Relax the sanity check to see if CTDB_NATGW_SLAVE_ONLY is set.
Update the documentation to explicitly state that
CTDB_NATGW_PUBLIC_IFACE and CTDB_NATGW_PUBLIC_IP are optional and
unused if CTDB_NATGW_SLAVE_ONLY is set. It would be possible to
insist that CTDB_NATGW_PUBLIC_IFACE and CTDB_NATGW_PUBLIC_IP should
be unset in that case. However, it is more reasonable to allow
consistent configuration across nodes except with some nodes
configured slave-only.
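The relaxed check can be sketched as follows. The function name
natgw_check_config() and the use of "yes" as the enabling value are
assumptions made for the example.

```shell
# Hypothetical sanity check: the public interface and IP are only
# required on nodes that are able to become the NAT gateway master.
natgw_check_config ()
{
    if [ "${CTDB_NATGW_SLAVE_ONLY:-}" = "yes" ] ; then
        return 0
    fi
    [ -n "${CTDB_NATGW_PUBLIC_IFACE:-}" ] && \
        [ -n "${CTDB_NATGW_PUBLIC_IP:-}" ]
}
```

With this shape, a slave-only node passes whether or not the public
interface and IP are set, which allows the same configuration file to
be shared across nodes.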
Add tests, update infrastructure and fix a thinko in the stub's
"natgwlist" implementation.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Mon Apr 14 06:06:49 CEST 2014 on sn-devel-104
Commit ba69742ccd missed the point of
filtering disconnected nodes while limiting the nodemap to those in
the NAT gateway group. It was really to avoid trying to fetch
capabilities from disconnected nodes. This should be explicitly done
in filter_nodemap_by_capabilities(), otherwise "ctdb natgwlist" simply
fails when there is a disconnected node.
Note that the alternate solution where filter_nodemap_by_flags() is
called before filter_nodemap_by_capabilities() would not be
correct. Filtering on flags first can produce a "healthier" set of
nodes where none of them have the NAT gateway capability.
Also extend stub for ctdb_ctrl_getcapabilities() to fail when trying
to get capabilities from a disconnected node and add a corresponding
test to confirm that "ctdb natgwlist" is no longer broken.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This will test that ctdb_fetch_lock correctly revokes readonly
delegations.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
This test currently counts the number of read-only-enabled databases
and expects there to only be 1. It fails when there are existing
databases with read-only already enabled. Instead, check just the
test database.
Clean up the test by adding some functions to check for precisely the
read-only flags that should be set on a node after each operation.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This one ensures that a newly started node gets an up-to-date tickle
list. Tweak some of the integration test functions to accommodate
this.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Signed-off-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Wed Mar 26 06:24:01 CET 2014 on sn-devel-104
This includes adding support for:
* Configuring fake NATGW state in the eventscript unit tests
* "natgwlist" and "setnatgwstate" in ctdb command stub
* ip command stub to default to "main table" when no table specified,
allow routes to be added without "dev" option (just add a default
dev), support "metric" option
Signed-off-by: Martin Schwenke <martin@meltin.net>
It is hard to diagnose failures in the NFS tickle test because there's
no way of telling if the test node doesn't have the tickle or if it
didn't get propagated.
Factor out check_tickles() into local.bash and give it some
parameters.
Have the NFS test call it first to ensure the tickle has been
registered. Then use new function check_tickles_all() to ensure the
tickle has been propagated to all nodes. Give this a bit of extra
time (double the timeout) just in case we're racing with the update.
Add a useful comment to the CIFS test so that I stop asking myself how
the test could ever have worked reliably. :-)
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Tests for xpnn need to implement a stub for ctdb_sys_have_ip(). The
cheapest way of doing this is to read a fake nodemap using the
existing code and check if the IP of the "current" node is the one
being asked about. However, the fake state initialisation isn't
currently available to without_daemon commands because it is meant to
represent daemon state. However, it can be made available by moving
the relevant code into a new stub for tevent_context_init(). The stub
still needs to initialise a tevent context - this can be done by
calling a lower level function.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This looks to have got left behind a long time ago when things got
moved around...
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
* Should stop on 1st error
* Fix up value of CTDB_TESTS_ARE_INSTALLED
* Improve fixing of broken symlinks in INSTALL
This is all of the links in tests/eventscript/etc-ctdb/ so no need
to list them. Just find and fix them.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
* Add stack dumps for "interesting" processes. These sometimes get
stuck, so try to print stack traces for them if they appear in the
pstree output.
* Add new configuration variables CTDB_DEBUG_HUNG_SCRIPT_LOGFILE and
CTDB_DEBUG_HUNG_SCRIPT_STACKPAT. These are primarily for testing
but the latter may be useful for live debugging.
* Load CTDB configuration so that above configuration variables can be
set/changed without restarting ctdbd.
Add a test that tries to ensure that all of this is working.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
In the first case, reconfiguration can no longer happen in a monitor
event, so this is no longer a problem. Drop it.
Running a monitor event by hand no longer cancels the existing monitor
event. Instead the hand-run event fails. So do this differently and
just wait for a monitor event before continuing.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Thu Feb 13 04:05:57 CET 2014 on sn-devel-104
srcimbl gets changed on every iteration of the loop. The value that
should be stored for the new imbalance of the source node is
minsrcimbl.
To help diagnose this, added some extra debug that can be left in.
The extra debug changes the output of a couple of tests. Note that
the resulting IP allocations in those tests are unchanged - only the
debug output is changed.
Also add some new tests that illustrate the bug.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This adds a lot of IPs (currently 100) in a new network and deletes
them in a few steps. First the primary is deleted and then a check is
done to ensure that the remaining IPs are all correct. Then about 1/2
of the IPs are deleted and the remaining IPs are checked. Then the
remaining IPs are deleted and a check is done to ensure they are all
gone.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Just enable this behaviour by default in the ip command stub, since
10.interface assumes/sets it. The rc.local replacement for set_proc()
doesn't do anything...
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
ctdb_get_my_public_addresses() attempts to echo things and this causes
an error if head has taken the first line and the pipe is closed.
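
A minimal sketch of one way to avoid this class of error, assuming the
consumer only needs the first line. The function names and addresses
are hypothetical and are not taken from the CTDB test code. Unlike
head, "sed -n 1p" keeps reading until end of input, so the writer never
writes to a closed pipe:

```sh
# Hypothetical stand-in for a function that echoes several lines.
list_addresses ()
{
    echo "10.0.0.1"
    echo "10.0.0.2"
    echo "10.0.0.3"
}

# Print only the first line, but consume all of the input so that the
# writer on the left of the pipe is never cut off.
first_address ()
{
    list_addresses | sed -n '1p'
}
```

The trade-off is that all of the input is read, which is fine for short
lists like this one.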
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri Jan 31 05:30:38 CET 2014 on sn-devel-104
It should support primary and secondaries per network instead of per
interface.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Currently the lock is held until the corresponding eventscript
completes, since the process still exists. If the regular part of an
eventscript hangs then the lock might unnecessarily be held for a long
time. The pathological case is when a monitor event gets stuck in
D-wait state and the script times out but can't be killed so the lock
is still held. This can cause an unwanted monitor replay.
Change this so that the lock is released immediately after the
reconfiguration is complete.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
"monitor" events can be cancelled. If a reconfigure action does a
service restart then the "monitor" event can be cancelled at the
inconvenient moment after the service is stopped. In this case the
service stays down and the node may become unhealthy (depending on
whether there are any repair actions in the monitor event).
A long time ago we did service reconfiguration in "monitor" events
following failovers. Service reconfiguration was then moved to the
"ipreallocated" event. However, reconfiguration in "monitor" events
has been kept as a last resort in case an "ipreallocated" event does
not occur. The only important case that this covers is "ctdb
deleteip", where "releaseip" events are generated without a
corresponding "ipreallocated". Therefore, IPs can be deleted without
running the required service reconfiguration.
The supported way of removing IP addresses is now via "ctdb
reloadips", which always causes a takeover run with a corresponding
"ipreallocated" event.
This means that service reconfiguration in "monitor" events is no
longer required and should be removed because it is unsafe.
Also update the associated tests. Make the first confirm that the
monitor event no longer does reconfiguration. Change the others to
test that monitor status is correctly replayed when something else is
doing a reconfigure and currently holds the reconfigure lock.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Dec 17 06:32:35 CET 2013 on sn-devel-104
Explain how to run individual tests and test collections and remove mention of
tests/scripts/run_tests which does not exist any more.
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
At the moment run_tests.sh has quite fragile argument processing. It
needs that annoying "--" between options and tests. The random
default (mktemp -d) for TEST_VAR_DIR is wrong and is worked around in
various places.
Instead:
* Change the default behaviour to print a summary, add new option -N
to turn off summary, and remove old -s option.
* Change the default behaviour to run integration tests with local
daemons, add new option -c to run on a cluster, remove old -l
option.
* Make $testdir/var the default if the tests are not installed, and
$(mktemp -d ) the default if tests are installed.
* Move the default tests for local/cluster into scripts/run_tests.
run_tests.sh (and the run_cluster_tests.sh symlink) should behave as
before but with slightly more reasonable defaults.
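
The TEST_VAR_DIR default described above could be sketched roughly as
follows. This is an illustration only: the function name and its
arguments are hypothetical, and the real script may structure the
decision differently.

```sh
# Print the default TEST_VAR_DIR.
# $1: "true" if the tests are installed, anything else otherwise
# $2: the test directory (only used when the tests are not installed)
default_test_var_dir ()
{
    if [ "$1" = "true" ] ; then
        # Installed tests: use a fresh temporary directory
        mktemp -d
    else
        # Tests run from the source tree: keep state under the test directory
        echo "${2}/var"
    fi
}
```

For example, "default_test_var_dir false /path/to/tests" prints
/path/to/tests/var.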
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
Don't scatter the TEST_LOCAL_DAEMONS logic around the code. Limit it
to the local daemons file.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
setup_ctdb() doesn't need to do anything on a cluster. To avoid a
conditional, just override it for local daemons.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
This was the start of some refactorisation that was never completed.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
This hasn't been required for a long time and is probably broken. If
it is needed in future then we know where to find it.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
This is just a straight move. The clever stuff will follow. :-)
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
This currently requires an eventscript to be dynamically installed.
This eventscript is only used to help determine when a monitor event
has occurred. This code is horrible and fragile.
A better way is to just monitor the output of "ctdb scriptstatus".
When it changes, a monitor event has occurred.
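
A minimal sketch of this idea: poll a command until its output differs
from what it printed at the start. The helper name and the one-second
polling interval are assumptions, not the actual test code.

```sh
# Wait until the output of a command differs from its initial output.
# $1: timeout in seconds, remaining arguments: the command to run
wait_for_change ()
{
    _timeout="$1" ; shift
    _before=$("$@")
    _i=0
    while [ "$_i" -lt "$_timeout" ] ; do
        if [ "$("$@")" != "$_before" ] ; then
            return 0
        fi
        sleep 1
        _i=$((_i + 1))
    done
    # Output never changed within the timeout
    return 1
}
```

In a test this might be called as "wait_for_change 30 ctdb scriptstatus".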
Also remove the old code that checks for tickle information in shared
storage. CTDB hasn't done things this way for a long time.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
This case of "ip link show" does not break autobuild with
"Broken pipe" messages, but let's be consistent.
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Thu Nov 28 09:23:03 CET 2013 on sn-devel-104
This removes the requirement to create a temporary file
and hence makes this test runnable against local daemons
and against a real cluster without further changes.
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This fixes running "make autotest" from autobuild, since
it prevents irritating error output in delete_ip_from_iface()
when calling ip addr list ... | grep -Fq "inet ..." .
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This makes scripts called in the unit tests behave like
when called from ctdbd which ignores SIGPIPE.
This also makes the scripts behave the same when
called from "make autotest" directly and via autobuild (python).
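
One way to get this behaviour in a test wrapper is sketched below. The
function name is hypothetical. It relies on the fact that an ignored
signal stays ignored in commands started from the shell:

```sh
# Run a command with SIGPIPE ignored. The subshell keeps the changed
# signal disposition from leaking into the caller.
run_ignoring_sigpipe ()
{
    (
        trap '' PIPE
        "$@"
    )
}
```

A script run this way gets a write error when it writes to a closed
pipe instead of being killed by the signal.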
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Otherwise this should use mktemp, something should look at the output
and the file should be removed. :-)
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
Autobuild-User(master): Michael Adam <obnox@samba.org>
Autobuild-Date(master): Wed Nov 27 20:39:00 CET 2013 on sn-devel-104
Using a variable is too fragile, so use a function instead.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
Also match $TEST_VAR_DIR in the socket name. This means that we'll
only ever kill ctdbd processes belonging to our own test run.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
* Low level DB checks should ignore the sequence number record.
* A restart is needed after messing with the RecoverPDBBySeqNum
tunable.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
Also add test.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
Currently it only passes the last (non -v) option seen. It should
pass them all.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
$CTDB_TEST_WRAPPER is required only to run test functions or test binaries
on remote nodes. For running ctdb command, $CTDB is sufficient.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
Autobuild-User(master): Michael Adam <obnox@samba.org>
Autobuild-Date(master): Tue Nov 19 19:06:51 CET 2013 on sn-devel-104
This reverts commit ed7d999214ee009e480c26410a04fa105028cb8e.
This is not necessary since ctdb_transaction_start() now will return NULL
only when there is a failure and not when another transaction is currently
active.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 46615c8e0e63291605d76a6d35f1a93180718c36)
This is a needlessly complex way of testing the same thing as the
eventscripts unit tests 60.nfs.monitor.161.sh and
60.nfs.monitor.162.sh.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit d1674aad224f8f0c9a03c3cd38a647318ba0f03e)
This is adequately covered by eventscripts unit tests
50.samba.monitor.105.sh and 50.samba.monitor.106.sh.
This test is broken if CTDB_SAMBA_CHECK_PORTS is not specified in the
CTDB configuration. Fixing it is hard and involves adding a more
complex stub for testparm. We already have that in the eventscript
unit tests above.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 81b94fbb7495ac3204f1a84c673c8babf04663bc)
The background update is never guaranteed to complete before the cache
is used, so don't bother trying it at the beginning. Instead, put a
timeout on a foreground update.
If the foreground update fails:
* If there's no available cache file then die.
* If there is a previous cache file then use it and log a warning.
* Do a background update at the end of the monitor event.
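
The foreground update with fallback could be sketched like this. It is
an illustration under stated assumptions: the function name and
arguments are hypothetical, the timeout is implemented with the GNU
coreutils "timeout" command, and the final background update is left
out.

```sh
# Refresh a cache file from a command, with a timeout.
# $1: cache file, $2: timeout in seconds, remaining: command to run
update_cache ()
{
    _cache="$1" ; _timeout="$2" ; shift 2
    _tmp="${_cache}.new"
    if timeout "$_timeout" "$@" >"$_tmp" 2>/dev/null ; then
        # Foreground update succeeded: replace the cache
        mv "$_tmp" "$_cache"
        return 0
    fi
    rm -f "$_tmp"
    if [ -f "$_cache" ] ; then
        # Update failed but an old cache exists: use it and warn
        echo "WARNING: update failed, using previous cache" >&2
        return 0
    fi
    # Update failed and there is nothing to fall back to
    echo "ERROR: update failed and no cache is available" >&2
    return 1
}
```

Because "timeout" runs an external program, the command passed in must
not be a shell function.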
Also remove commas in the "smb ports" list before use, since (newer?)
testparm seems to insert commas into the default value. Update the
associated test to add a comma.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 8c6f511254ecb0381a609b37e3a0ee6e5ec5d562)
This bit-rotted a long time ago when the "ThisNode" column was added
to "ctdb -Y status" output. The fake "ctdb -Y status" output in the
test was never updated to reflect this change.
Instead of making sure that all columns are "0", just check that
they're not "1". This implicitly ignores "Y" and "N" in this
"ThisNode" column without having to do anything else clever.
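
A minimal sketch of the "not 1" check. It assumes colon-delimited
output with one header line, where the second and third fields are the
node number and IP address and every later field is a flag or the
"ThisNode" value. The function name is hypothetical.

```sh
# Read "ctdb -Y status"-style output on stdin.
# Succeed only if no flag column on any node line is "1".
all_flags_clear ()
{
    awk -F: '
        BEGIN { bad = 0 }
        NR == 1 { next }    # skip the header line
        {
            # Fields 1-3 are empty, node number and IP; flags start at 4
            for (i = 4; i <= NF; i++) {
                if ($i == "1") {
                    bad = 1
                }
            }
        }
        END { exit (bad ? 1 : 0) }
    '
}
```

Values such as "Y" and "N" are never equal to "1", so they pass without
any special handling.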
Also update associated tests. The main "ctdb ok" test had a duplicate
opening line for a here document, which was tickled by this change.
This fixes samba bz#8122.
Signed-off-by: Martin Schwenke <martin@meltin.net>
onnode test fixup
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 01a46205c3a3d6609dc0b0324319b89667dffa32)
Use /var/run/ctdb/ctdbd.socket because there might be other daemons
that need sockets in the future.
The local daemons test code to create a link for the default
convenience socket has to be removed because the link can't be created
as a regular user in the new location. This should be OK since all
calls to the ctdb tool in the test code should be wrapped in onnode.
When debugging tests, a developer will have to set CTDB_SOCKET by
hand.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-programmed-with: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit dc67a4e24af9d07aead2a1710eeaf5d6cc409201)
Behaves like mkdir -p.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit afe2145d91725daf1399f0a24f1cddcf65f0ec31)
This allows ctdb_load_nodes_file() to move to ctdb_server.c and
ctdb_set_nlist() to become static.
Setting ctdb->nodes_file needs to be done early, before the nodes file
is loaded. It is now set from CTDB_BASE instead of ETCDIR, so setting
CTDB_BASE also needs to be done earlier.
Unhack ctdbd_test.c - it no longer needs to define
ctdb_load_nodes_file().
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 20e705e63bd3b20837cc3ac92fdcf2a9650ccfc8)
"ctdb checktcpport" is no longer experimental so the other checkers
are no longer required.
Remove tests related to the removed checkers.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 50e330d0679614bee2e7bab028436e929f74ca50)
Some scripts are disabled by default so are not executable. Explicitly
running them under sh allows them to be run without having to mess
around and make them executable or similar.
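
As an illustration, a wrapper along these lines runs a script whether
or not its execute bit is set. The function name is hypothetical.

```sh
# Run a script through an explicit interpreter, so the file itself
# does not need to be executable.
# $1: path to the script, remaining arguments: passed to the script
run_script ()
{
    _script="$1" ; shift
    sh "$_script" "$@"
}
```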
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 9437d4809bfbbb5c6a32a610665333d2f641881d)
Allowing people to put random options in CTDB_OPTIONS complicates some
logic (particularly around use of syslog). If we're going to have
variables for options then let's make sure we have a variable for each
option and make people use them.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit e55f3a1577eff0182802b0341d865d961aeae1c7)
All CTDB configuration variables should start with CTDB_.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit f12658aff125996ae45eea23241d8c3d0567b893)
Otherwise we end up with lots of useless temporary directories.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 63924ff372b066cd878b79e71f06de4c24c814a2)
* --public-interface is not needed
* Add --sloppy-start to speed up restarts
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit d0dec5b8e60316701fdd02150c4dd8f01aacbfda)
With the new persistent transaction code, sequence numbers will be
automatically updated whenever a record is updated.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 961dd5d0acbb971756944ea9f69992020ea7d9fc)
Main changes are:
libctdb_test.c -> ctdb_test_stubs.c
ctdb_tool_libctdb.c -> ctdb_functest.c
ctdb_tool_stubby.c is gone, replaced with existing ctdb_test.c.
Functions starting with "libctdb_test_" now start with
"ctdb_test_stubs_".
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 6182bd0c19f215a997efe5272e633b1b1bd0c882)
Instead, override controls using preprocessor magic.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 10aac42f30cc0d56dca42ece17d04ccbc321056d)
Specifying nodes to reload no longer uses -n.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit d921b2756d5f1c4ad7a35fe120f6fda9f5bf5686)
The current implementation has a few flaws:
* A takeover run is called unconditionally when the timer goes off even if
the recovery master role has moved. This means a node other than
the recovery master can incorrectly do a takeover run.
* The rebalancing target nodes are cleared in the setup for a takeover
run, regardless of whether the takeover run succeeds.
* The timer to force a rebalance isn't cleared if another takeover run
occurs before the deadline. Any forced rebalancing will happen in
the first takeover run and when the timer expires some time later
then an unnecessary takeover run will occur.
* If the recovery master role moves then the rebalancing data will
stay on the original node and affect the next takeover run to occur
if the recovery master role should come back to the original node.
Instead, store an array of rebalance target nodes in the recovery
master context. This is passed as an extra argument to
ctdb_takeover_run() each time it is called and is cleared when a
takeover run succeeds. The timer hangs off the array of rebalance
target nodes, which is cleared if the node isn't the recovery master.
This means that it is possible to lose rebalance data if the recovery
master role moves. However, that's a difficult problem to solve. The
best way of approaching it is probably to try to stop the recovery
master role from jumping around unnecessarily when inactive nodes join
the cluster.
The long term solution is to avoid this nonsense completely. The IP
allocation algorithm needs to cache state between runs so that it
knows which nodes have just become healthy. This also needs recovery
master stability.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit c51c1efe5fc7fa668597f2acd435dee16e410fc9)
... plus updates to test infrastructure to support.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 4a388fc6bf54636b7e1f6da8e6aa451cddd574f7)
A monitor event following a "ctdb delip" might reconfigure services.
If the monitor event is cancelled then a service might be stopped but
not yet restarted and this could result in the subsequent monitor
events failing.
This obviously needs to be fixed in CTDB itself. This will happen by
making "ctdb reloadips" the supported way of reconfiguring IPs.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 618ea3660e36e7bd92b686e1ca8728cf63c3c068)
Anecdotal evidence suggests that most nfsd RPC check failures are due
to cluster filesystem or storage problems. Apparently these are rarely
helped by attempting to restart the NFS service because the restart
tends to hang.
Fail after 2 nfsd RPC check failures, instead of waiting for 6
failures. Restart on every 10th failure to try to bring the node back
to good health.
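
The new thresholds can be illustrated with a small function that maps a
failure count to the resulting actions. This is only a sketch of the
arithmetic. The function name and the output words are hypothetical and
do not reflect how CTDB configures these checks.

```sh
# Print the action(s) taken for a given number of consecutive nfsd RPC
# check failures: "none", "unhealthy", or "restart unhealthy".
# $1: failure count
nfsd_failure_action ()
{
    _count="$1"
    _action=""
    # Restart on every 10th failure
    if [ "$_count" -gt 0 ] && [ $((_count % 10)) -eq 0 ] ; then
        _action="restart"
    fi
    # Unhealthy from the 2nd failure onwards
    if [ "$_count" -ge 2 ] ; then
        _action="${_action}${_action:+ }unhealthy"
    fi
    echo "${_action:-none}"
}
```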
Update unit tests to match.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit e9ef93f7b6dad59eabaa32124df81f3e74c651ef)
This makes the gaps in the logs more obvious.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 11fbf4789d783dd0bac22754b374dd9ea4b03bad)
Also add it to the corresponding eventscript unit test infrastructure.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit f4ef83a256f59eeb00b9a5bc10c28347e1ad1031)
While doing this:
* Explicitly assign RPC program and version information in
_nfs_check_rpc_common(). This is more lines of code but is easier
to read.
* Don't print the options when starting a service. Trying to print it
makes the code messy for little benefit.
Update the eventscript unit testing code and a Ganesha test to
reflect this.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit e8b531405665885196c95fe1608db33a255bf761)
That is, output that goes through background_with_logging() just gets
"&" prepended to each line. This is cleaner than having the tests
grovel through logs.
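
A test stub with this behaviour might look roughly like the following.
The function name and body are assumptions for illustration, not the
actual stub.

```sh
# Hypothetical test replacement for background_with_logging(): run the
# command in the foreground and prepend "&" to each line of its output.
fake_background_with_logging ()
{
    # In a sed replacement, "\&" is a literal ampersand
    "$@" 2>&1 </dev/null | sed -e 's/^/\&/'
}
```

With this, a test can include lines such as "&some output" directly in
its expected output.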
Update some 49.winbind/50.samba tests to deal with this.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 3ba933d806106d12bc48b83b22d0f314d9d1e5e5)
They're hard to maintain and provide very little benefit.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 1a1be43f8466d46913dcdfe6dcedb94316cd28ad)
This should minimise the chances of a control timing out.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 63be516673c5d9c0d543617bf1bb8bca919956a8)
Update the missing IP test to wait until restarts are complete.
Otherwise a service restart can collide with the following monitor
event and cause chaos.
Also, do not disable 10.interface until it matters. Disabling it too
early can cause even more chaos if something goes wrong with the
monitor step.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 4e3bd06916bd3adac213fb18c7c2a24854b02d45)
This avoids issuing multiple "ctdb killtcp" commands to terminate tcp
connections, one per connection. This will considerably reduce the
time when there is a large number of tcp connections. This also makes
it possible to avoid calling "ctdb killtcp" when there are no connections.
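
The batching idea can be sketched as below, assuming the kill command
accepts the whole connection list on standard input. The function name
is hypothetical.

```sh
# Read a list of connections on stdin and pass them all to a single
# invocation of the given command. Do nothing if the list is empty.
# Arguments: the command to run
kill_connections ()
{
    _conns=$(cat)
    if [ -z "$_conns" ] ; then
        # No connections: avoid running the command at all
        return 0
    fi
    printf '%s\n' "$_conns" | "$@"
}
```

A caller would pipe its connection list into something like
"kill_connections ctdb killtcp".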
Add a couple of unit tests for killtcp and update eventscript unit
test infrastructure to support.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit a20d94717d2e4ab866d8a002cdf39c0669b74c6a)
Regardless of whether a summary is being printed!
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit a69e03a5e4671e998d45b4fef8611a421bbdb3e1)
Refactor the NFS test setup/cleanup code into new common functions.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 29e98017221326bdc9b1c4f7c05b3b495c1de29b)
Change the command from "true" to "hostname" since the former won't
produce any output when used in combination with "onnode -p". This
could just be changed to "echo" but the hostname might actually be
useful.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit ae3c03d80264e997b7da9f3279d7810e18b8a1df)
This fixes a segmentation fault if any of the test code fails to
connect to the CTDB daemon.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit d48eecd748830598f4f080952f2bf05d6f92738c)
Also check that we're not in recovery mode.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit b7aaa28b3a6a2de923417f3d143f8d516447711e)
No need for 2 recoveries after a restart.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit b953524185632d7f96a76d8f3bbed7ac1d143d40)
These test dropping of IPs and TDB checking.
New stubs for date, tdbdump, tdbtool.
Enhance ip stub to handle "ip addr show to ..."
Tweak some infrastructure.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit aabf0bf41cb8ec344f06b69492fb6c2a27f9e900)
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit c3e7a6e10d486ba0dbafdf110db540675b2317bc)
Includes minor test infrastructure updates.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit cd4358b01c6c3d413b431f5760029d2b163b9c03)
... and delete a bogus comment.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 0e2b5a8f89440a53f996482ac0c98b31a4f2cad3)
Includes minor test infrastructure updates.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit ce2ef2be8aa22c0baf868daac8d4cf27246baa14)