Stub for ctdb_client_send_message() only implements
CTDB_SRVID_TAKEOVER_RUN and CTDB_SRVID_DISABLE_TAKEOVER_RUNS. It
assumes srvid_broadcast() is in use and just calls handler to fake
appropriate replies.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Initialise ctdb->ev in ctdb_cmdline_client_stub().
Add a comment to tevent_context_init_stub() explaining why the ctdb
context is initialised there instead of ctdb_cmdline_client_stub().
This information is in the git log but that doesn't help someone who
is reading the code.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The daemon uses an IP address of "0.0.0.0" when handling deleted
nodes. Do the same in the tests when loading a fake nodemap.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
If recovery mode is set to active then it updates the generation and
immediately sets recovery mode back to normal.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
IP addresses and routes are only changed if either the NAT gateway
configuration or the NAT gateway master node has changed. If running
"ip monitor" this will minimise the amount of noise seen. It should
also be more lightweight at the expense of managing a couple of state
files.
Add a test to check that configuration changes behave correctly.
Tweak the static route result generation code so that the required
output is sorted.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Scripts in eventscript unit tests are run under an explicitly
specified shell so they do not need to be executable. Checking that
the script is executable breaks on scripts that are installed without
the execute bit set, such as disabled eventscripts.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri Mar 6 04:40:07 CET 2015 on sn-devel-104
Some eventscript unit test failures get lost because _passed=false is
set in the tail of a pipe. Add a new function test_fail() and call it
when necessary to ensure the value of _passed is set correctly.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-programmed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Thu Mar 5 07:16:54 CET 2015 on sn-devel-104
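The lost assignment described in the commit above can be reproduced in isolation. This is a minimal sketch, not the actual test_fail() implementation: the function names check_broken and check_fixed are hypothetical, and the here-document is only one possible way to keep the loop in the current shell. It assumes a shell such as bash or dash, where the tail of a pipe runs in a subshell.

```shell
# Broken: the while loop is the tail of a pipe, so in shells such as
# bash and dash it runs in a subshell and its assignment is lost.
check_broken () {
    _passed=true
    printf 'ok\nFAIL\n' | while read -r _line ; do
        [ "$_line" = "ok" ] || _passed=false
    done
    echo "$_passed"
}

# One possible fix: feed the loop from a here-document so that it runs
# in the current shell and the assignment survives.
check_fixed () {
    _passed=true
    while read -r _line ; do
        [ "$_line" = "ok" ] || _passed=false
    done <<EOF
ok
FAIL
EOF
    echo "$_passed"
}
```

In the broken variant the failure is detected inside the loop but the caller never sees it, which is how a failing test can be reported as passing.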
statd-callout tries to remove global files from /var/lib/nfs/statd and
this causes errors in tests. Add an rm stub that ignores attempts to
remove these files but invokes /bin/rm for anything else.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-programmed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Martin Schwenke <martin@meltin.net>
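A stub of the kind described in the commit above might look like the following. This is a sketch under assumptions: the name rm_stub is hypothetical, and ignoring the whole invocation when any argument lies under the statd directory is an illustrative choice, not necessarily what the real stub does.

```shell
# Directory named in the commit message.
STATD_DIR="/var/lib/nfs/statd"

# Hypothetical rm stub: silently ignore any invocation that names a
# file under the statd directory, pass everything else to /bin/rm.
rm_stub () {
    for _arg in "$@" ; do
        case "$_arg" in
            "$STATD_DIR"/*)
                # Tests must not touch global state, so pretend success.
                return 0
                ;;
        esac
    done
    /bin/rm "$@"
}
```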
Updating ctdb.tdb on each add-client, del-client and each delete
during notify was too ambitious. Persistent transactions do not
perform well enough to do this.
Revert to having add-client and del-client create touch files. Each
monitor event calls "statd-callout update" to convert touch files into
ctdb.tdb records.
Update testcases to do the "update" and add an extra test.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
With improvements to unit test infrastructure to support. This
includes linking the real statd-callout into etc-ctdb/ in place of the
placeholder script.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
There's so much infrastructure here that it would be a shame not to
use it for testing things like statd-callout.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Drops the iptables() and ip6tables() functions and, hence, the
hardcoding of paths /sbin/iptables and /sbin/ip6tables. The latter
avoids problems on openSUSE where (for example) /usr/sbin/iptables is
used instead.
This means that locking around ip*tables commands is only done when
iptables_wrapper is called directly. This is fine because the only
conflict is when "releaseip" or "takeip"/"updateip" events are run in
parallel. The other uses in 11.natgw and 70.iscsi are in events where
there will be no collisions.
Making 11.natgw support IPv6 is unnecessary. Just put a static IPv6
address on each interface - they're plentiful.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Wed Jan 28 08:29:55 CET 2015 on sn-devel-104
This is a gawk extension and can't be used reliably if just running
"awk". It is simple enough to switch to using the standard sub() and
gsub() functions.
The alternative is to switch to explicitly running "gawk". However,
although the eventscripts aren't exactly portable, it is probably
better to move closer to portability than further away.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
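The commit above does not name the extension. Assuming for illustration that it is gensub(), which is specific to gawk, a portable rewrite with the standard functions could look like this. The function names are hypothetical.

```shell
# gawk-only form (assumed for illustration), shown as a comment:
#   awk '{ print gensub(/:/, "-", "g") }'
# Portable equivalent: gsub() edits the record in place and returns
# the number of substitutions rather than the new string.
colons_to_dashes () {
    awk '{ gsub(/:/, "-") ; print }'
}

# sub() replaces only the first match.
first_colon_to_dash () {
    awk '{ sub(/:/, "-") ; print }'
}
```

The main difference to keep in mind is that sub() and gsub() modify their target, so code that relied on getting a new string back needs a temporary variable.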
tcptickle_sniff_start() assumes that if $dst contains a ':' then it
should use the IPv6 sniffing code. However, $dst is a socket, so has
a trailing ":<port>".
Strip the trailing ":<port>" before checking for ':' as a marker for
an IPv6 address.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
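The check described in the commit above can be sketched as follows. The helper name is_ipv6_socket is hypothetical; only the idea of stripping the trailing ":<port>" before looking for ':' comes from the commit message.

```shell
# Return success if the address part of a socket ("<ip>:<port>")
# looks like an IPv6 address.
is_ipv6_socket () {
    _addr="${1%:*}"          # drop the trailing ":<port>"
    case "$_addr" in
        *:*) return 0 ;;     # a ':' remains, so treat as IPv6
        *)   return 1 ;;
    esac
}
```

Without the stripping step an IPv4 socket such as 10.0.0.1:2049 would contain a ':' and be wrongly routed to the IPv6 code.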
These tests simulate a dead node rather than a CTDB failure, so drop
IP addresses when killing a "node" to avoid problems with duplicates.
To cope with a CTDB failure a watchdog would be needed to ensure that
the public IPs are dropped when CTDB dies. Let's not do that now.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Fri Dec 5 23:29:39 CET 2014 on sn-devel-104
Extend select_test_node_and_ips() to set $test_prefix in addition to
$test_ip.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
If CTDB_USE_IPV6 is set then use IPv6 addresses for nodes and public
IPs. This can be useful for some simple tests. However, the node
address actually needs to be on lo so that ctdbd can bind to the port
on that address, so they actually need to be added as root before
running tests, like this:
for i in $(seq 1 10) ; do ip addr add "fc00:10::${i}/64" dev lo ; done
IPv4 127.0.0.0/8 addresses are somehow magic and only one needs to be
on lo so that many can be bound to.
Also change the IPv4 node addresses to be (slightly) more exotic.
For both IPv4 and IPv6, choose addresses that are compatible with
socket wrapper.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Signed-off-by: Amitay Isaacs <amitay@gmail.com> (socket wrapper fixes)
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net> (socket wrapper fixes)
Add checking to "releaseip" and "updateip" to ensure that the given IP
address is really on the given interface with the given netmask. If
reality doesn't match the given arguments then believe reality.
Use new function iptables_wrapper() instead of calling iptables()
directly.
Use new function flush_route_cache() instead of doing IPv4-specific
/proc magic.
Remove setting of otherwise unused variable "failed".
Fix a test for which the error message has changed.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Also update associated eventscript unit tests and ctdb stub.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The default pattern is missing the parentheses that stop it from
matching commands with trailing garbage (e.g. "exportfs.orig").
A careful check of POSIX (and running GNU sed with --posix) suggests
that "\|" isn't a supported way of specifying alternation in a regular
expression. Therefore, it is clearer to switch to extended regular
expressions so that this has a chance of being portable (even though
the point is to print /proc/<pid>/stack, which only works on Linux).
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Nov 18 06:37:45 CET 2014 on sn-devel-104
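The effect of the missing parentheses in the commit above can be illustrated with extended regular expressions. This is a sketch only: the pattern is not the real default, "rpcinfo" is an assumed second alternative, and grep -E stands in for whatever tool the script actually uses.

```shell
# Without grouping, '^' applies only to the first alternative and '$'
# only to the last, so "exportfs.orig" still matches.
matches_ungrouped () {
    printf '%s\n' "$1" | grep -Eq '^exportfs|rpcinfo$'
}

# With grouping, both anchors apply to every alternative.
matches_grouped () {
    printf '%s\n' "$1" | grep -Eq '^(exportfs|rpcinfo)$'
}
```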
Also add and update tests for statd stack dumps. Update the existing
60.ganesha statd test to do more iterations. Duplicate the result as
a new test for 60.nfs.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
In the process, fix a bug where an extra trace would be printed.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Remove --logfile and --syslog daemon options and replace with
--logging.
Modularise and clean up logging initialisation code. The
initialisation API includes an app_name argument that is currently
unused - this will be used in extensions to the syslog backend.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This makes the code cleaner and allows the syslog backend to be easily
modified without affecting other code. Also do some extra clean-up,
including whitespace fixups.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Internally map them to DEBUG_ERR to limit code churn.
This reduces the unwieldy number of debug levels used by CTDB. ALERT
and CRIT aren't of much use as separate errors, since everything from
ERR up should always be logged. In future just ERR can be used.
This also improves compatibility with Samba's debug.c system priority
mapping.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
It isn't used and shouldn't be. CTDB can't make the system unusable.
Update associated test to ensure that EMERG isn't attempted. Actually
test all remaining debug levels and modernise the test a bit.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Use a variable to allow easy change of this string in case future
logging changes modify the timestamp format or do not support
timestamping.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Volker Lendecke <vl@samba.org>
As far as we know, nobody uses this and it just complicates the
logging subsystem.
Remove all ringbuffer code and documentation. Update the local
daemons startup code correspondingly.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Volker Lendecke <vl@samba.org>
Some of this implements logic that exists in functions. Some of it is
overly complicated and potentially failure-prone.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The log ringbuffer will probably be removed. The test can be
implemented just as reliably by checking IP assignments using "ctdb
ip".
Update wait_until_ips_are_on_node() to print a more useful log
message.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The "-n all" is wrong.
Simplify the implementation and tighten up some uses of this function.
_select_test_node_and_ips() can't use this function anymore.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The glob functionality is unused so simplify the code by removing it.
Rename this function to wait_until_ips_are_on_node(). Update all
calls.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Local daemons are started mainly for testing and usually not as root.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
This makes it consistent with Samba, to ease transition.
Update unit test code to link with tdb_wrap instead of including
db_wrap.c.
There are some potential whitespace fixes in this commit that have
been ignored. CTDB's lib/tdb_wrap will be deleted after the
transition to Samba's lib/tdb_wrap, so there's no point polishing it
too much.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Some declarations get lost because they basically get #define-d away,
so they need to be repeated after the #undef-s. Also, some functions
are introduced due to the #define-s.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
To avoid warnings when using --enable-developer, which uses
-Wmissing-prototypes.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This event was introduced to handle misconfiguration. For example,
where all nodes were configured as NAT gateway slaves.
However, this event can fail when there are performance issues and
capabilities can't be retrieved from a remote node. The problem is
most likely with the remote node, so marking the local node UNHEALTHY
is probably a mistake.
Having a NAT gateway master node only matters in "ipreallocated", so
leave it to do the checking. Given that a node will run
"ipreallocated" as part of the first recovery, this should cause
misconfigurations to be detected nice and early.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Debugging can still be running when a monitor event times out and
scriptstatus output changes.
When debugging a hung script to a log file, write to a temporary file
and move the temporary file over the log file when done. The test
then waits for the log file to appear.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Thu Jul 3 08:19:23 CEST 2014 on sn-devel-104
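The write-then-rename approach from the commit above can be sketched like this. The helper name write_log_atomically is hypothetical and the real debug script may structure this differently.

```shell
# write_log_atomically LOGFILE COMMAND [ARGS...]
# Run COMMAND with its output going to a temporary file next to
# LOGFILE, then rename the temporary file into place in one step.
write_log_atomically () {
    _log="$1" ; shift
    _tmp=$(mktemp "${_log}.XXXXXX") || return 1
    "$@" >"$_tmp" 2>&1
    mv "$_tmp" "$_log"
}
```

Because the rename happens only after the command has finished, a test that waits for the log file to appear never sees partial output.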
About a year ago a check was added to _cluster_is_healthy() to make
sure that node 0 isn't in recovery. This was to avoid unexpected
recoveries causing tests to fail. However, it was misguided because
each test initially calls cluster_is_healthy() and will now fail if an
unexpected recovery occurs.
Instead, have cluster_is_healthy() warn if the cluster is in recovery.
Also:
* Rename wait_until_healthy() to wait_until_ready() because it waits
until both healthy and out of recovery.
* Change the post-recovery sleep in restart_ctdb() to 2 seconds and
add a loop to wait (for 2 seconds at a time) if the cluster is back
in recovery. The logic here is that the re-recovery timeout has
been set to 1 second, so sleeping for just 1 second might race
against the next recovery.
* Use reverse logic in node_has_status() so that it works for "all".
* Tweak wait_until() so that it can handle timeouts with a
recheck-interval specified.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
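A polling loop with a timeout and a recheck interval, as mentioned in the commit above, might be sketched as follows. The name wait_until_sketch and its argument order are assumptions for illustration and do not reflect the real wait_until() interface.

```shell
# wait_until_sketch TIMEOUT INTERVAL COMMAND [ARGS...]
# Re-run COMMAND every INTERVAL seconds until it succeeds or roughly
# TIMEOUT seconds have elapsed.
wait_until_sketch () {
    _timeout="$1" ; _interval="$2" ; shift 2
    _elapsed=0
    while [ "$_elapsed" -lt "$_timeout" ] ; do
        "$@" && return 0
        sleep "$_interval"
        _elapsed=$((_elapsed + _interval))
    done
    return 1
}
```

With a 2 second interval, as used for the post-recovery wait, a call could look like: wait_until_sketch 30 2 some_check, where some_check is a placeholder.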
Routines in system_common and system_<os> are supposed to be ctdb
functions with OS specific implementations.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Recent changes have caused these commands to attempt to get
capabilities from all nodes before doing further filtering. This
means that capabilities are unnecessarily fetched from nodes that are
unlikely to be the master. If such a node does not answer the control
then many nodes can fail to calculate the master node. In the case of
natgwlist this will cause "monitor" events to fail resulting in
unhealthy nodes.
Restore the behaviour where capabilities are only fetched for a node
that will be the master if it has the desired flags.
Although this masks a problem where a connected node is not replying,
it can help to avoid an outage in some cases.
Add supporting tests and infrastructure. Infrastructure just lets a
timeout be faked - just for ctdb_ctrl_getcapabilities_stub() so far.
First test checks that this infrastructure works if the first node
times out in natgwlist. Second test checks the case worked around by
the above fix - that is, no failure when a node with PNN beyond the
NATGW master times out.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Thu May 29 05:59:37 CEST 2014 on sn-devel-104
The range
CTDB_PER_IP_ROUTING_TABLE_ID_LOW..CTDB_PER_IP_ROUTING_TABLE_ID_HIGH
should not include 253-255. Otherwise policy routing may overwrite
the default system routing tables.
Add some corresponding tests.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
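The range restriction in the commit above can be expressed as a small check. On Linux, table IDs 253, 254 and 255 are the default, main and local routing tables. The function name table_range_ok is hypothetical and the real eventscript may report errors differently.

```shell
# table_range_ok LOW HIGH
# Succeed only if LOW..HIGH is a valid range that does not include
# the reserved routing table IDs 253-255.
table_range_ok () {
    _low="$1" ; _high="$2"
    [ "$_low" -le "$_high" ] || return 1
    # The range must lie entirely below 253 or entirely above 255.
    [ "$_high" -lt 253 ] || [ "$_low" -gt 255 ]
}
```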
Commit 4ee4925d41 forgot about
CTDB_NATGW_SLAVE_ONLY so it introduces an incorrect failure when this
is set, and CTDB_NATGW_PUBLIC_IFACE or CTDB_NATGW_PUBLIC_IP is unset.
Relax the sanity check to see if CTDB_NATGW_SLAVE_ONLY is set.
Update the documentation to explicitly state that
CTDB_NATGW_PUBLIC_IFACE and CTDB_NATGW_PUBLIC_IP are optional and
unused if CTDB_NATGW_SLAVE_ONLY is set. It would be possible to
insist that CTDB_NATGW_PUBLIC_IFACE and CTDB_NATGW_PUBLIC_IP should
be unset in that case. However, it is more reasonable to allow
consistent configuration across nodes except with some nodes
configured slave-only.
Add tests, update infrastructure and fix a thinko in the stub's
"natgwlist" implementation.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Mon Apr 14 06:06:49 CEST 2014 on sn-devel-104
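The relaxed sanity check from the commit above might look roughly like this. It is a sketch: the function name natgw_config_ok is hypothetical, and treating any non-empty value of CTDB_NATGW_SLAVE_ONLY as "set" is an assumption.

```shell
# Succeed if the NAT gateway configuration is usable on this node.
natgw_config_ok () {
    # Slave-only nodes never host the public address, so the public
    # interface and IP are optional for them.
    [ -n "${CTDB_NATGW_SLAVE_ONLY:-}" ] && return 0
    [ -n "${CTDB_NATGW_PUBLIC_IFACE:-}" ] && \
        [ -n "${CTDB_NATGW_PUBLIC_IP:-}" ]
}
```

This allows the same configuration file to be used across nodes, with only CTDB_NATGW_SLAVE_ONLY differing on the slave-only ones.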
Commit ba69742ccd missed the point of
filtering disconnected nodes while limiting the nodemap to those in
the NAT gateway group. It was really to avoid trying to fetch
capabilities from disconnected nodes. This should be explicitly done
in filter_nodemap_by_capabilities(), otherwise "ctdb natgwlist" simply
fails when there is a disconnected node.
Note that the alternate solution where filter_nodemap_by_flags() is
called before filter_nodemap_by_capabilities() would not be
correct. Filtering on flags first can produce a "healthier" set of
nodes where none of them have the NAT gateway capability.
Also extend stub for ctdb_ctrl_getcapabilities() to fail when trying
to get capabilities from a disconnected node and add a corresponding
test to confirm that "ctdb natgwlist" is no longer broken.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This will test that ctdb_fetch_lock correctly revokes readonly
delegations.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
This test currently counts the number of read-only-enabled databases
and expects there to only be 1. It fails when there are existing
databases with read-only already enabled. Instead, check just the
test database.
Clean up the test by adding some functions to check for precisely the
read-only flags that should be set on a node after each operation.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This one ensures that a newly started node gets an up-to-date tickle
list. Tweak some of the integration test functions to accommodate
this.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Signed-off-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Wed Mar 26 06:24:01 CET 2014 on sn-devel-104
This includes adding support for:
* Configuring fake NATGW state in the eventscript unit tests
* "natgwlist" and "setnatgwstate" in ctdb command stub
* ip command stub to default to "main table" when no table specified,
allow routes to be added without "dev" option (just add a default
dev), support "metric" option
Signed-off-by: Martin Schwenke <martin@meltin.net>
It is hard to diagnose failures in the NFS tickle test because there's
no way of telling if the test node doesn't have the tickle or if it
didn't get propagated.
Factor out check_tickles() into local.bash and give it some
parameters.
Have the NFS test call it first to ensure the tickle has been
registered. Then use new function check_tickles_all() to ensure the
tickle has been propagated to all nodes. Give this a bit of extra
time (double the timeout) just in case we're racing with the update.
Add a useful comment to the CIFS test so that I stop asking myself how
the test could ever have worked reliably. :-)
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Tests for xpnn need to implement a stub for ctdb_sys_have_ip(). The
cheapest way of doing this is to read a fake nodemap using the
existing code and check if the IP of the "current" node is the one
being asked about. However, the fake state initialisation isn't
currently available to without_daemon commands because it is meant to
represent daemon state. However, it can be made available by moving
the relevant code into a new stub for tevent_context_init(). The stub
still needs to initialise a tevent context - this can be done by
calling a lower level function.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This looks to have got left behind a long time ago when things got
moved around...
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
* Should stop on 1st error
* Fix up value of CTDB_TESTS_ARE_INSTALLED
* Improve fixing of broken symlinks in INSTALL
This is all of the links in tests/eventscript/etc-ctdb/ so no need
to list them. Just find and fix them.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
* Add stack dumps for "interesting" processes that sometimes get
stuck, so try to print stack traces for them if they appear in the
pstree output.
* Add new configuration variables CTDB_DEBUG_HUNG_SCRIPT_LOGFILE and
CTDB_DEBUG_HUNG_SCRIPT_STACKPAT. These are primarily for testing
but the latter may be useful for live debugging.
* Load CTDB configuration so that above configuration variables can be
set/changed without restarting ctdbd.
Add a test that tries to ensure that all of this is working.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
In the first case, reconfiguration can no longer happen in a monitor
event, so this is no longer a problem. Drop it.
Running a monitor event by hand no longer cancels the existing monitor
event. Instead the hand-run event fails. So do this differently and
just wait for a monitor event before continuing.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Thu Feb 13 04:05:57 CET 2014 on sn-devel-104
srcimbl gets changed on every iteration of the loop. The value that
should be stored for the new imbalance of the source node is
minsrcimbl.
To help diagnose this, added some extra debug that can be left in.
The extra debug changes the output of a couple of tests. Note that
the resulting IP allocations in those tests are unchanged - only the
debug output is changed.
Also add some new tests that illustrate the bug.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This adds a lot of IPs (currently 100) in a new network and deletes
them in a few steps. First the primary is deleted and then a check is
done to ensure that the remaining IPs are all correct. Then about 1/2
of the IPs are deleted and the remaining IPs are checked. Then the
remaining IPs are deleted and a check is done to ensure they are all
gone.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Just enable this behaviour by default in the ip command stub, since
10.interface assumes/sets it. The rc.local replacement for set_proc()
doesn't do anything...
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
ctdb_get_my_public_addresses() attempts to echo things and this causes
an error if head has taken the first line and the pipe is closed.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri Jan 31 05:30:38 CET 2014 on sn-devel-104
It should support primary and secondaries per network instead of per
interface.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Currently the lock is held until the corresponding eventscript
completes, since the process still exists. If the regular part of an
eventscript hangs then the lock might unnecessarily be held for a long
time. The pathological case is when a monitor event gets stuck in
D-wait state and the script times out but can't be killed so the lock
is still held. This can cause an unwanted monitor replay.
Change this so that the lock is released immediately after the
reconfiguration is complete.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
"monitor" events can be cancelled. If a reconfigure action does a
service restart then the "monitor" event can be cancelled at the
inconvenient moment after the service is stopped. In this case the
service stays down and the node may become unhealthy (depending on
whether there are any repair actions in the monitor event).
A long time ago we did service reconfiguration in "monitor" events
following failovers. Service reconfiguration was then moved to the
"ipreallocated" event. However, reconfiguration in "monitor" events
has been kept as a last resort in case an "ipreallocated" event does
not occur. The only important case that this covers is "ctdb
deleteip", where "releaseip" events are generated without a
corresponding "ipreallocated". Therefore, IPs can be deleted without
running the required service reconfiguration.
The supported way of removing IP addresses is now via "ctdb
reloadips", which always causes a takeover run with a corresponding
"ipreallocated" event.
This means that service reconfiguration in "monitor" events is no
longer required and should be removed because it is unsafe.
Also update the associated tests. Make the first confirm that the
monitor event no longer does reconfiguration. Change the others to
test that monitor status is correctly replayed when something else is
doing a reconfigure and currently holds the reconfigure lock.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Dec 17 06:32:35 CET 2013 on sn-devel-104
Explain how to run individual tests and test collections and remove mention of
tests/scripts/run_tests which does not exist any more.
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
At the moment run_tests.sh has quite fragile argument processing. It
needs that annoying "--" between options and tests. The random
default (mktemp -d) for TEST_VAR_DIR is wrong and is worked around in
various places.
Instead:
* Change the default behaviour to print a summary, add new option -N
to turn off summary, and remove old -s option.
* Change the default behaviour to run integration tests with local
daemons, add new option -c to run on a cluster, remove old -l
option.
* Make $testdir/var the default if the tests are not installed, and
$(mktemp -d) the default if tests are installed.
* Move the default tests for local/cluster into scripts/run_tests.
run_tests.sh (and the run_cluster_tests.sh symlink) should behave as
before but with slightly more reasonable defaults.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
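The new default for TEST_VAR_DIR described in the commit above could be chosen along these lines. This is only a sketch: the function name default_test_var_dir is hypothetical, and comparing CTDB_TESTS_ARE_INSTALLED against the string "true" is an assumption about how that variable is represented.

```shell
# default_test_var_dir TESTDIR
# Print the directory to use for test state: a fresh temporary
# directory for installed tests, TESTDIR/var otherwise.
default_test_var_dir () {
    if [ "${CTDB_TESTS_ARE_INSTALLED:-false}" = "true" ] ; then
        mktemp -d
    else
        echo "$1/var"
    fi
}
```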
Don't scatter the TEST_LOCAL_DAEMONS logic around the code. Limit it
to the local daemons file.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
setup_ctdb() doesn't need to do anything on a cluster. To avoid a
conditional, just override it for local daemons.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
This was the start of some refactorisation that was never completed.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
This hasn't been required for a long time and is probably broken. If
it is needed in future then we know where to find it.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
This is just a straight move. The clever stuff will follow. :-)
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
This currently requires an eventscript to be dynamically installed.
This eventscript is only used to help determine when a monitor event
has occurred. This code is horrible and fragile.
A better way is to just monitor the output of "ctdb scriptstatus".
When it changes then a monitor event has occurred.
Also remove the old code that checks for tickle information in shared
storage. CTDB hasn't done things this way for a long time.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
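Detecting a monitor event by watching for a change in command output, as the commit above proposes for "ctdb scriptstatus", can be sketched generically. The helper name wait_for_output_change is hypothetical, and the command to watch is passed in rather than hard-coded so that the sketch does not depend on a running ctdbd.

```shell
# wait_for_output_change TIMEOUT COMMAND [ARGS...]
# Record the output of COMMAND, then poll once per second until the
# output differs or TIMEOUT seconds have elapsed.
wait_for_output_change () {
    _timeout="$1" ; shift
    _before=$("$@")
    _n=0
    while [ "$_n" -lt "$_timeout" ] ; do
        [ "$("$@")" != "$_before" ] && return 0
        sleep 1
        _n=$((_n + 1))
    done
    return 1
}
```

In a test this might be invoked as: wait_for_output_change 60 ctdb scriptstatus. The 60 second timeout is an arbitrary example value.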
This case of "ip link show" does not break autobuild with
"Broken pipe" messages, but let's be consistent.
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Thu Nov 28 09:23:03 CET 2013 on sn-devel-104
This removes the requirement to create a temporary file
and hence makes this test runnable against local daemons
and against a real cluster without further changes.
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This fixes running "make autotest" from autobuild, since
it prevents irritating error output in delete_ip_from_iface()
when calling ip addr list ... | grep -Fq "inet ..." .
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This makes scripts called in the unit tests behave like
when called from ctdbd which ignores SIGPIPE.
This also makes the scripts behave the same when
called from "make autotest" directly and via autobuild (python).
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Otherwise this should use mktemp, something should look at the output
and the file should be removed. :-)
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
Autobuild-User(master): Michael Adam <obnox@samba.org>
Autobuild-Date(master): Wed Nov 27 20:39:00 CET 2013 on sn-devel-104
Using a variable is too fragile, so use a function instead.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
Also match $TEST_VAR_DIR in the socket name. This means that we'll
only ever kill ctdbd processes belonging to our own test run.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
* Low level DB checks should ignore the sequence number record.
* A restart is needed after messing with the RecoverPDBBySeqNum
tunable.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
Also add test.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
Currently it only passes the last (non -v) option seen. It should
pass them all.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
$CTDB_TEST_WRAPPER is required only to run test functions or test binaries
on remote nodes. For running the ctdb command, $CTDB is sufficient.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
Autobuild-User(master): Michael Adam <obnox@samba.org>
Autobuild-Date(master): Tue Nov 19 19:06:51 CET 2013 on sn-devel-104
This reverts commit ed7d999214ee009e480c26410a04fa105028cb8e.
This is not necessary since ctdb_transaction_start() now will return NULL
only when there is a failure and not when another transaction is currently
active.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 46615c8e0e63291605d76a6d35f1a93180718c36)
This is a needlessly complex way of testing the same thing as the
eventscripts unit tests 60.nfs.monitor.161.sh and
60.nfs.monitor.162.sh.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit d1674aad224f8f0c9a03c3cd38a647318ba0f03e)
This is adequately covered by eventscripts unit tests
50.samba.monitor.105.sh and 50.samba.monitor.106.sh.
This test is broken if CTDB_SAMBA_CHECK_PORTS is not specified in the
CTDB configuration. Fixing it is hard and involves adding a more
complex stub for testparm. We already have that in the eventscript
unit tests above.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 81b94fbb7495ac3204f1a84c673c8babf04663bc)
The background update is never guaranteed to complete before the cache
is used, so don't bother trying it at the beginning. Instead, put a
timeout on a foreground update.
If the foreground update fails:
* If there's no available cache file then die.
* If there is a previous cache file then use it and log a warning.
* Do a background update at the end of the monitor event.
Also remove commas in the "smb ports" list before use, since (newer?)
testparm seems to insert commas into the default value. Update the
associated test to add a comma.
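The fallback order described above can be sketched in shell. This is a minimal illustration, not the 50.samba code: the helper name update_cache, the 10 second limit and the use of the coreutils timeout command are all assumptions, and the background update at the end of the monitor event is left out.

```shell
# Hypothetical helper: refresh a cache file in the foreground, with a timeout.
# Usage: update_cache <cache-file> <command> [args...]
update_cache ()
{
    _cache="$1" ; shift
    _tmp="${_cache}.new"

    if timeout 10 "$@" >"$_tmp" 2>/dev/null ; then
        # Foreground update succeeded: install the new cache file
        mv "$_tmp" "$_cache"
    elif [ -f "$_cache" ] ; then
        # Update failed but a previous cache exists: keep it and warn
        rm -f "$_tmp"
        echo "WARNING: cache update failed, using previous cache" >&2
    else
        # Update failed and there is nothing to fall back to
        rm -f "$_tmp"
        echo "ERROR: cache update failed and no previous cache" >&2
        return 1
    fi
}
```

A caller would treat a non-zero return as fatal, for example: update_cache "$cache_file" testparm -s || exit 1 (the variable name is illustrative).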
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 8c6f511254ecb0381a609b37e3a0ee6e5ec5d562)
This bit-rotted a long time ago when the "ThisNode" column was added
to "ctdb -Y status" output. The fake "ctdb -Y status" output in the
test was never updated to reflect this change.
Instead of making sure that all columns are "0", just check that
they're not "1". This implicitly ignores "Y" and "N" in this
"ThisNode" column without having to do anything else clever.
Also update associated tests. The main "ctdb ok" test had a duplicate
opening line for a here document, which was tickled by this change.
This fixes samba bz#8122.
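The "not 1" check can be illustrated with a small shell sketch. The helper name node_is_ok is hypothetical, and the line layout (leading colon, node number, IP address, then flag columns ending with the "ThisNode" column) is an assumption made for the example rather than a statement of the exact "ctdb -Y status" format.

```shell
# Hypothetical helper: succeed if no flag column in a status line is "1".
node_is_ok ()
{
    # Drop the empty leading field, the node number and the IP address,
    # so that node number 1 is not mistaken for a set flag
    _flags=$(echo "$1" | cut -d: -f4-)

    case ":${_flags}" in
        *:1:*) return 1 ;;   # some flag is set
    esac
    # "Y" or "N" in the last column never matches "1", so it is ignored
    return 0
}
```

With this approach a line such as ":0:10.0.0.1:0:0:0:0:0:0:0:Y:" passes without any special handling of the final column.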
Signed-off-by: Martin Schwenke <martin@meltin.net>
onnode test fixup
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 01a46205c3a3d6609dc0b0324319b89667dffa32)
Use /var/run/ctdb/ctdbd.socket because there might be other daemons
that need sockets in the future.
The local daemons test code to create a link for the default
convenience socket has to be removed because the link can't be created
as a regular user in the new location. This should be OK since all
calls to the ctdb tool in the test code should be wrapped in onnode.
When debugging tests, a developer will have to set CTDB_SOCKET by
hand.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-programmed-with: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit dc67a4e24af9d07aead2a1710eeaf5d6cc409201)
Behaves like mkdir -p.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit afe2145d91725daf1399f0a24f1cddcf65f0ec31)
This allows ctdb_load_nodes_file() to move to ctdb_server.c and
ctdb_set_nlist() to become static.
Setting ctdb->nodes_file needs to be done early, before the nodes file
is loaded. It is now set from CTDB_BASE instead of ETCDIR, so setting
CTDB_BASE also needs to be done earlier.
Unhack ctdbd_test.c - it no longer needs to define
ctdb_load_nodes_file().
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 20e705e63bd3b20837cc3ac92fdcf2a9650ccfc8)
"ctdb checktcpport" is no longer experimental so the other checkers
are no longer required.
Remove tests related to the removed checkers.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 50e330d0679614bee2e7bab028436e929f74ca50)
Some scripts are disabled by default so are not executable. Explicitly
running them under sh allows them to be run without having to mess
around and make them executable or similar.
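As a small illustration of the idea, a wrapper like the following runs a script whether or not its executable bit is set. The name run_event_script is hypothetical and not taken from the test code.

```shell
# Hypothetical wrapper: run a script under sh regardless of its mode bits.
# Usage: run_event_script <script> [event] [args...]
run_event_script ()
{
    _script="$1" ; shift
    # Invoking the interpreter explicitly only requires read permission
    sh "$_script" "$@"
}
```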
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 9437d4809bfbbb5c6a32a610665333d2f641881d)
Allowing people to put random options in CTDB_OPTIONS complicates some
logic (particularly around use of syslog). If we're going to have
variables for options then let's make sure we have a variable for each
option and make people use them.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit e55f3a1577eff0182802b0341d865d961aeae1c7)
All CTDB configuration variables should start with CTDB_.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit f12658aff125996ae45eea23241d8c3d0567b893)
Otherwise we end up with lots of useless temporary directories.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 63924ff372b066cd878b79e71f06de4c24c814a2)
* --public-interface is not needed
* Add --sloppy-start to speed up restarts
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit d0dec5b8e60316701fdd02150c4dd8f01aacbfda)
With the new persistent transaction code, sequence numbers will be
automatically updated whenever a record is updated.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 961dd5d0acbb971756944ea9f69992020ea7d9fc)
Main changes are:
libctdb_test.c -> ctdb_test_stubs.c
ctdb_tool_libctdb.c -> ctdb_functest.c
ctdb_tool_stubby.c is gone, replaced with existing ctdb_test.c.
Functions starting with "libctdb_test_" now start with
"ctdb_test_stubs_".
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 6182bd0c19f215a997efe5272e633b1b1bd0c882)
Instead, override controls using preprocessor magic.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 10aac42f30cc0d56dca42ece17d04ccbc321056d)
Specifying nodes to reload no longer uses -n.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit d921b2756d5f1c4ad7a35fe120f6fda9f5bf5686)
The current implementation has a few flaws:
* A takeover run is called unconditionally when the timer goes off, even if
the recovery master role has moved. This means a node other than
the recovery master can incorrectly do a takeover run.
* The rebalancing target nodes are cleared in the setup for a takeover
run, regardless of whether the takeover run succeeds.
* The timer to force a rebalance isn't cleared if another takeover run
occurs before the deadline. Any forced rebalancing will happen in
the first takeover run and when the timer expires some time later
then an unnecessary takeover run will occur.
* If the recovery master role moves then the rebalancing data will
stay on the original node and affect the next takeover run to occur
if the recovery master role should come back to the original node.
Instead, store an array of rebalance target nodes in the recovery
master context. This is passed as an extra argument to
ctdb_takeover_run() each time it is called and is cleared when a
takeover run succeeds. The timer hangs off the array of rebalance
target nodes, which is cleared if the node isn't the recovery master.
This means that it is possible to lose rebalance data if the recovery
master role moves. However, that's a difficult problem to solve. The
best way of approaching it is probably to try to stop the recovery
master role from jumping around unnecessarily when inactive nodes join
the cluster.
The long term solution is to avoid this nonsense completely. The IP
allocation algorithm needs to cache state between runs so that it
knows which nodes have just become healthy. This also needs recovery
master stability.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit c51c1efe5fc7fa668597f2acd435dee16e410fc9)
... plus updates to test infrastructure to support.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 4a388fc6bf54636b7e1f6da8e6aa451cddd574f7)
A monitor event following a "ctdb delip" might reconfigure services.
If the monitor event is cancelled then a service might be stopped but
not yet restarted and this could result in the subsequent monitor
events failing.
This obviously needs to be fixed in CTDB itself. This will happen by
making "ctdb reloadips" the supported way of reconfiguring IPs.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 618ea3660e36e7bd92b686e1ca8728cf63c3c068)
Anecdotal evidence suggests that most nfsd RPC check failures are due
to cluster filesystem or storage problem. Apparently these are rarely
helped by attempting to restart the NFS service because the restart
tends to hang.
Fail after 2 nfsd RPC check failures, instead of waiting for 6
failures. Restart on every 10th failure to try to bring the node back
to good health.
Update unit tests to match.
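The thresholds above can be expressed as a function of the consecutive failure count. This is only a sketch: the function name and the action words it prints are hypothetical, and the real rules live in the 60.nfs eventscript.

```shell
# Hypothetical helper: map a consecutive nfsd RPC failure count to actions.
nfsd_failure_action ()
{
    _count="$1"
    _action=""

    # Attempt a restart on every 10th failure (10, 20, 30, ...)
    if [ "$_count" -gt 0 ] && [ $((_count % 10)) -eq 0 ] ; then
        _action="restart"
    fi

    # Mark the node unhealthy from the 2nd failure onwards
    if [ "$_count" -ge 2 ] ; then
        _action="${_action:+${_action} }unhealthy"
    fi

    echo "${_action:-ok}"
}
```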
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit e9ef93f7b6dad59eabaa32124df81f3e74c651ef)
This makes the gaps in the logs more obvious.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 11fbf4789d783dd0bac22754b374dd9ea4b03bad)
Also add it to the corresponding eventscript unit test infrastructure.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit f4ef83a256f59eeb00b9a5bc10c28347e1ad1031)
While doing this:
* Explicitly assign RPC program and version information in
_nfs_check_rpc_common(). This is more lines of code but is easier
to read.
* Don't print the options when starting a service. Trying to print it
makes the code messy for little benefit.
Update the eventscript unit testing code and a Ganesha test to
reflect this.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit e8b531405665885196c95fe1608db33a255bf761)
That is, output that goes through background_with_logging() just gets
"&" prepended to each line. This is cleaner than having the tests
grovel through logs.
Update some 49.winbind/50.samba tests to deal with this.
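A test stub with this behaviour might look roughly like the following. The function name is hypothetical and this is an assumed shape for the stub, not the code from the test infrastructure.

```shell
# Hypothetical test stub: run the command in the foreground and mark each
# line of its output with a leading "&" instead of sending it to a log.
background_with_logging_stub ()
{
    # In a sed replacement "\&" is a literal ampersand
    "$@" 2>&1 | sed -e 's/^/\&/'
}
```

Expected test output can then include the "&"-prefixed lines directly.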
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 3ba933d806106d12bc48b83b22d0f314d9d1e5e5)
They're hard to maintain and provide very little benefit.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 1a1be43f8466d46913dcdfe6dcedb94316cd28ad)
This should minimise the chances of a control timing out.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 63be516673c5d9c0d543617bf1bb8bca919956a8)
Update the missing IP test to wait until restarts are complete.
Otherwise a service restart can collide with the following monitor
event and cause chaos.
Also, do not disable 10.interface until it matters. Disabling it too
early can cause even more chaos if something goes wrong with the
monitor step.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 4e3bd06916bd3adac213fb18c7c2a24854b02d45)
This avoids issuing multiple "ctdb killtcp" commands to terminate tcp
connections, one per connection. This will considerably reduce the
time when there is a large number of tcp connections. This also makes
it possible to avoid calling "ctdb killtcp" when there are no connections.
Add a couple of unit tests for killtcp and update eventscript unit
test infrastructure to support.
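The batching idea can be sketched as below. The helper name is hypothetical, and the sketch assumes that "ctdb killtcp" accepts a list of connections, one per line, on standard input; treat that as an assumption of the example. $CTDB is used so that tests can substitute a stub.

```shell
# Hypothetical helper: read connections from stdin and pass them all to a
# single "ctdb killtcp" invocation, or do nothing if there are none.
kill_tcp_connections ()
{
    _conns=$(cat)

    # No connections: avoid calling the tool at all
    [ -n "$_conns" ] || return 0

    # One invocation for the whole list instead of one per connection
    echo "$_conns" | ${CTDB:-ctdb} killtcp
}
```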
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit a20d94717d2e4ab866d8a002cdf39c0669b74c6a)
Regardless of whether a summary is being printed!
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit a69e03a5e4671e998d45b4fef8611a421bbdb3e1)
Refactor the NFS test setup/cleanup code into new common functions.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 29e98017221326bdc9b1c4f7c05b3b495c1de29b)
Change the command from "true" to "hostname" since the former won't
produce any output when used in combination with "onnode -p". This
could just be changed to "echo" but the hostname might actually be
useful.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit ae3c03d80264e997b7da9f3279d7810e18b8a1df)
This fixes the segmentation error if any of the test code fails to
connect to the CTDB daemon.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit d48eecd748830598f4f080952f2bf05d6f92738c)
Also check that we're not in recovery mode.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit b7aaa28b3a6a2de923417f3d143f8d516447711e)
No need for 2 recoveries after a restart.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit b953524185632d7f96a76d8f3bbed7ac1d143d40)
These test dropping of IPs and TDB checking.
New stubs for date, tdbdump, tdbtool.
Enhance ip stub to handle "ip addr show to ..."
Tweak some infrastructure.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit aabf0bf41cb8ec344f06b69492fb6c2a27f9e900)
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit c3e7a6e10d486ba0dbafdf110db540675b2317bc)
Includes minor test infrastructure updates.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit cd4358b01c6c3d413b431f5760029d2b163b9c03)
... and delete a bogus comment.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 0e2b5a8f89440a53f996482ac0c98b31a4f2cad3)
Includes minor test infrastructure updates.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit ce2ef2be8aa22c0baf868daac8d4cf27246baa14)
This is needed for AIX and possibly others.
Also provide a cheaper mktemp function, which is needed in the run_tests
script.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit b2b572e9049c7138bd223226475bef8fe3e01f10)
2 tests to show a bad result and a 3rd test for the fix.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit ef35c8889d90220929e48e66eb62da9ea2025ede)
Currently the order of the first IP allocation, including the first
"ipreallocated" event, and the "startup" event is undefined. Both of
these events can (re)start services.
This stops IPs being hosted before the "startup" event has completed.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit f15dd562fd8c08cafd957ce9509102db7eb49668)
Modifying the node flags with IP-allocation-only flags is not
necessary. It causes breakage if the flags are not cleared after use.
ctdb_takeover_run() no longer needs the general node flags - it only
needs the IP flags.
Instead of modifying the node flags in nodemap, construct a custom IP
flags list and have takeover_run_core() use that instead of node
flags. As well as being safer, this makes the IP allocation code more
self contained and a little bit clearer.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 14bd0b6961ef1294e9cba74ce875386b7dfbf446)
This has been replaced by set_ipflags() and associated functionality.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit d0a3822573db296e73cc897835f783c8abc084b3)
This is a no-op and is in a separate commit to make the previous
commit less cumbersome.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 107e656bbe24f9d21fbaf886a3e9417da4effe5a)
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 7cf63722873a6a7baafd77aa3d8a1989b221dee9)
This really needs to be per-node. The rename is because nodes with
this tunable switched on should drop IPs if they become unhealthy (or
disabled in some other way).
* Add new flag NODE_FLAGS_NOIPHOST, only used in recovery daemon.
* Enhance set_ipflags_internal() and set_ipflags() to setup
NODE_FLAGS_NOIPHOST depending on setting of NoIPHostOnAllDisabled
and/or whether nodes are disabled/inactive.
* Replace can_node_servce_ip() with functions can_node_host_ip() and
can_node_takeover_ip(). These functions are the only ones that need
to look at NODE_FLAGS_NOIPTAKEOVER and NODE_FLAGS_NOIPHOST. They
can make the decision without looking at any other flags due to
previous setup.
* Remove explicit flag checking in IP allocation functions (including
unassign_unsuitable_ips()) and just call can_node_host_ip() and
can_node_takeover_ip() as appropriate.
* Update test code to handle CTDB_SET_NoIPHostOnAllDisabled.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 1308a51f73f2e29ba4dbebb6111d9309a89732cc)
Implemented for CTDB_SET_NoIPTakeover.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit a1addd89fd9c0390912604097acd028cc24d3483)
* New directory nfs-rpc-checks.d/ replaces hardcoded rules in 60.nfs
* Installation and packaging additions to handle nfs-rpc-checks.d/
* Unit test updates, including deleting 1 test that sanity checked
test infrastructure
* Test infrastructure changes to use nfs-rpc-checks.d/
Note that this removes support for $CTDB_NFS_SKIP_KNFSD_ALIVE_CHECK in
60.nfs. To get the equivalent behaviour, edit 20.nfsd.check and
remove/comment all lines.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 7e792d6768d9ca420ce3713cb122e63afd594b15)
* Command is now multiple arguments, preserving quoting
* $service_name no longer printed, no longer an argument
* Debug output from failed command
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 9e25fb261447a196de05937052779b36e75e7215)
Complicated argument handling was introduced to deal with multiple
services per eventscript. This was a failure and we split 50.samba.
This simplifies several functions to use global $service_name
unconditionally instead of having an optional argument.
$service_name is no longer automatically set in the functions file.
This means it needs to be explicitly set in 13.per_ip_routing because
this script uses ctdb_service_check_reconfigure().
Eventscript unit test infrastructure needs to set $service_name during
fake service setup, and policy routing tests need to be updated
accordingly.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 27aab8783898a50da8c4bc887b512d8f0c0d842c)
The current logic is horrible and creates an unnecessary file. Let's
make the script debug level independent of ctdb's debug level.
* Have debug() use $CTDB_SCRIPT_DEBUGLEVEL directly
* Remove ctdb_set_current_debuglevel()
* Remove the "getdebug" command from ctdb stub in eventscript unit
tests
* Update relevant eventscript unit tests to use
$CTDB_SCRIPT_DEBUGLEVEL
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 85efa446c7f5c5af1c3a960001aa777775ae562f)
The comment explains that we use "ctdb stop" and "ctdb continue"
but we should use "ctdb setrecmasterrole off".
Signed-off-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit 06ac62f890299021220214327f1b611c3cf00145)
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit b1577a11d548479ff1a05702d106af9465921ad4)
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit 2438f3a4944f7adbcae4cc1b9d5452714244afe7)
Curiously test_ctdb_sys_check_iface_exists fails on Linux
Signed-off-by: Mathieu Parent <math.parent@gmail.com>
(This used to be ctdb commit 109f428aa34f8f4cc0329880d2f4a5593a6cc6f3)
Ensure that RSN based recovery and __db_sequence_number__ based recovery
methods for persistent databases work correctly. They should not cause
corruption of the database.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 45d439a1ab093b420c27b1502ef109021833c7af)
Also update ips_are_on_nodeglob() to handle negation.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 13a5944f8a27d43006acfffba76958693cae7702)
The default IP allocation algorithm used by ctdb_takeover_tests
changed from "non-deterministic IPs" to "LCP2". The latter generates
a lot more debug output. ctdb_takeover_tests is used by the ctdb tool
stub to calculate IP address changes for failovers. This resulted in
unexpected debug output that caused tests to fail. Since eventscript
tests don't care how IP allocations are arrived at, the best solution
is to turn down the debug level.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 3cc596d2b459d834f9785b3a98027e46431ff2b9)
The retry loop is currently in ctdb_takeover_run_core(). Pushing it
into each function will make it possible to put each algorithm into a
separate top-level function. This will make the code much clearer and
more maintainable.
Also keep associated test code compatible.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit f6ce18d011dd9043b04256690d826deb2640cd89)
Neither basic_failback() nor lcp2_failback() unassign IPs anymore, so
there's no point looping back that far.
Also fix a unit test that now fails because looping back to handle
unassigned IPs is no longer logged.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit c09aeaecad7d3232b1c07bab826b96818756f5e0)
3 tests should assign IPs to all nodes.
3 tests set NoIPTakeoverOnDisabled=1 and should drop all IPs.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit edda58a45915494027785608126b5da7c98fee85)
Via $CTDB_SET_NoIPTakeoverOnDisabled.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit d357d52dbd533444a4af6151d04ba119a1533068)
Default to LCP2, like ctdbd. Also support "det" for deterministic
IPs.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 20631f5f29859920844dd8f410e24917aabd3dfd)
It looks like this restart was accidentally reintroduced in commit
fc0678d351187cfa4c71123f97c0f493aacd5d16 when $service_reconfigure
became unset so the default action of restarting the service would
occur. From there cleanups have explicitly reintroduced it and
carried it through the code.
Also update the unit tests affected by this change.
The restart was originally removed in commit
bc481c3f1a44c50648488c4f8a7f15ec395d446f.
The default reconfigure action of restarting a service is clearly
suboptimal and will be addressed in a separate patch.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 2629de72e1f37b5e46772c2ef8d8d0012fc4ed37)
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 3221fce9ee2f6fdd3bb17a5e1629ad52a32f90d6)
The release suffix added by RPM is to track packaging changes. The core CTDB
version does not include the release suffix.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit aad1584da8a8425bc6f5163c95810e9d2390dc91)
This reverts commit 88f88d86b0d08240f749fb721b8c401c2eeb1099.
This is dangerous and, on reflection, I can't see it being useful.
There are often permanent IPs on interfaces that CTDB shares with its
public IPs.
(This used to be ctdb commit 16aba4eb620844626a1c71c58b51658caf44dea6)
Less copying and pasting is a good thing...
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 7d4b8cce96f33fff647a0c9d259c121dfc8403e9)
If $CTDB_SERVICE_AUTOSTARTSTOP="yes" then service start/stop is done
in the background with logging.
Fix some unit tests for samba and winbind.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 3a3dae4cb5ec8b4b8381a4013adda25b87641f3a)
winbind and samba can be separately managed. This makes the service
starting and stopping code way too complicated, and even adds a small
amount of complexity to the monitoring code. The sensible option is
to split this eventscript in two.
There are two potentially backward incompatible changes here:
* Functionality has been removed that allowed 50.samba to manage
winbind when CTDB_MANAGES_WINBIND was unset but the smb.conf
"security" parameter was set to "ADS" or "DOMAIN".
Maintaining this functionality would have required moving the
testparm-related code to the functions file, deciding where the
cache file should go, and then calling it from both 49.winbind and
50.samba. This feature wasn't of great value and asking
administrators to set an extra variable in exchange for code
simplicity seems like a reasonable deal.
* External code will need to be changed if it calls 50.samba directly
with winbind-related expectations. This is fairly obvious!
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 34535ae64420926b9a3bf7d453fed4e6f4c90115)
Some code (e.g. NAT gateway code) modifies the returned result so was
modifying the original.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit a3f15d2828325bbfba5bc5c0a30429e2ce572a44)
This involves refactoring ip_route_check_table() into a new function
ip_check_table() which takes the operation type (i.e. rule/route) as
an argument.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit acdaa04079a9827885f32a7bc078d3365c89b474)
Test the startup and monitor events too.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit c29a943f9bbcfecb861e71d007c7698a53dc8773)
It currently needs the real testparm command installed even though it
only uses limited features. It is easy enough to fake up the
functionality that 50.samba uses.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 7ef9916bd95ff2472359a412eac5489f1aad2dce)
The correct variable is $test_node_ips, not $ips.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 8d17dacee415dd0b4268805a366a86f83e33f27c)
Sometimes "ctdb sync" doesn't do its job, so we end up with unassigned
IPs.
If $test_node isn't set then this is bad. However, try a few times to
ensure it is set.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 2fd0157382b42aa5c5212b8e743c6f589edc6662)
Note the old $CTDB_TEST_REAL_CLUSTER - it doesn't exist anymore...
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 47180dc75d15f3d61470705603565b718491c9f8)
There's no point recalculating this value.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 6e7bd9685406ae024d413a5d9d8c6e0d89b15567)
Instead of selecting the 1st pnn found, select the 1st one that isn't -1.
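The selection rule amounts to skipping unassigned entries. A minimal sketch, assuming the PNNs are already available as a list of arguments (the function name is hypothetical):

```shell
# Hypothetical helper: print the first PNN in the argument list that is
# not -1 (unassigned); fail if every entry is -1.
first_valid_pnn ()
{
    for _pnn in "$@" ; do
        if [ "$_pnn" != "-1" ] ; then
            echo "$_pnn"
            return 0
        fi
    done
    return 1
}
```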
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit f02e501342112aab67aee95f253e29a670b29273)
If the record does not exist in persistent DB, RSN for that record is
considered 0. To write a record, RSN for that record should be set to 1,
otherwise the RSN check would fail.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit ac89da4eea98fa686408c5671a6c44c0fd1d7a58)
There were two issues with this test:
1. Since the messages are sent from one node to the next, if a node
does not register for messages before CTDB on that node receives
the message, it will never be seen by ctdb_fetch and it would
block on receive and would not send any messages to the next node.
The crude solution is to sleep just before the messages are sent,
so that ctdb_fetch on all nodes has registered for the messages.
2. If ctdb_fetch stops sending messages after timelimit expiry, the
next node will keep waiting to receive messages in event_loop_once().
The default timeout is 30 seconds for event_loop_once(). Adding a
timed event will always set the timeout value to the time remaining
for the timed event to expire.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit bc55e09fdac9f743d6428bfe0be77840ad0fd1ba)
Commit 13acd58c41fba1a33894fbd654fed69ea0eac322 made this test fail,
since lockd:b and lockd:bs were incorrectly producing the same output.
(This used to be ctdb commit fd3b73d7e634f16cbb99d7d5a548e12f00d1aadb)
Tickle tests fail if run from a node involved in the test.
The condition is actually weaker than this: the test can't be run from
a CTDB node that is hosting public addresses that may be used by the
test.
Rework ctdb_test_check_real_cluster() to support checking this.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 14012781c3751a514055df29ea70adfb12ecb2d9)
This is made possible by separation of public addresses files for
local daemons and the addition of get_ctdbd_command_line_option().
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 2bcd58b30d7cf6dd48ad7f019810c6965a44c85a)
This allows, for example, the public addresses file used by a
particular daemon to be known.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit f4b7d14f2e3c7345e7a09abb27c32923fb78cbc4)
This allows a node's public addresses file to be hacked for testing.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit c7d6e4557d00de674737e2c8d6cbebaa2461c303)
If the -V option is given and no tests are supplied, the "cd" command
in run_tests.sh causes scripts/run_tests to interpret the argument to
-V incorrectly. Therefore, the wrapper scripts can't use "cd" because
they don't know what the options are doing!
Instead scripts/run_tests searches for each test relative to the
current directory and, if not previously found, then searches relative
to the top-level tests directory. This is a much better way of doing
things.
Given that run_tests.sh and run_cluster_tests.sh were starting to
contain duplicate complex logic, remove run_cluster_tests.sh and
replace it with a symlink to run_tests.sh. run_tests.sh checks $0 to
see what options/defaults to use. Update INSTALL to deal with this.
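The two-step test lookup described above could be sketched like this. The function name find_test and the variable TESTS_DIR (standing for the top-level tests directory) are hypothetical names chosen for the example, not identifiers from scripts/run_tests.

```shell
# Hypothetical helper: locate a test, first relative to the current
# directory and then relative to the top-level tests directory.
find_test ()
{
    _t="$1"

    if [ -e "$_t" ] ; then
        echo "$_t"
    elif [ -e "${TESTS_DIR}/${_t}" ] ; then
        echo "${TESTS_DIR}/${_t}"
    else
        echo "Unable to find test: ${_t}" >&2
        return 1
    fi
}
```

Because nothing changes directory, option arguments keep their meaning relative to where the user invoked the wrapper.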
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit ed2db1f4e8d2b222d7f912a4a007ce48a23e83b0)
However, options must be followed by "--".
This also fixes:
* a bug where specifying tests caused local daemons to be used; and
* an incorrect comment.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 6b8507d4d3062e709409b3790117d87311b3460d)
However, options must be followed by "--".
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit db8cf8f5e644a0b21a6040287887fee40f38d4db)
The previous commit 55006ea8999ab3721fcde81b92692661065f0688
highlighted an error in this test.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 9f20fbf91706db94f65f62dbd6a4e087890c1da9)
The policy routing tests write the configuration file into $CTDB_BASE,
as per recommended practice. Unless this is in $TEST_VAR_DIR this
won't work sensibly when the tests are installed.
Things are done slightly differently than for /etc. Here we use
symlinks and we want them to be dereferenced.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 61c80f58a8cfbaca7e669ef8cd95b4f6b5dc66c7)
Wrap all creation of child processes inside ctdb_fork() which is used to track all processes we have spawned.
Capture SIGCHLD to also track which child processes have terminated.
Wrap kill() inside ctdb_kill() and make sure that we never send a !0 signal to a child process pid that has already terminated (and might have been replaced with a
(This used to be ctdb commit f73a4b1495830bcdd094a93732a89dd53b3c2f78)
The policy routing tests modify /etc/iproute2/rt_tables, so this
directory should not be in the installation area.
Instead the contents of tests/eventscripts/etc are copied into a place
under $TEST_VAR_DIR where the directory can be modified with gay
abandon.
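
A rough sketch of such a copy step, assuming a helper named
setup_scratch_etc (hypothetical) that takes the pristine source
directory and the scratch directory as arguments. The real test code
may differ.

```shell
# Hypothetical helper: copy a pristine "etc" tree into a scratch area
# so tests can modify it without touching the installation area.
setup_scratch_etc ()
{
    _src="$1"        # e.g. tests/eventscripts/etc
    _dst="$2/etc"    # e.g. a directory under $TEST_VAR_DIR

    # Start from a clean copy each time.
    rm -rf "$_dst"
    mkdir -p "$_dst"

    # "src/." copies the directory contents, including dotfiles.
    cp -R "$_src/." "$_dst/"
}
```

A test can then append to the copied iproute2/rt_tables while the
installed original stays unchanged.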
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit a0afb4195caab39891a304b8b4eadd94cab7c4a7)
The link is hard to manage and has no real advantage.
The canned config is 2 of the 3 currently non-comment/whitespace lines
in config/ctdb.sysconfig.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 66d0b913301c3b1037278bcaa0452ecbe07248df)