IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
"ctdb reloadips" use of ipreallocate() can result in a spurious
takeover runs. This can cause a subsequent "ctdb reloadips" to fail
to disable takeover runs (due to there being one already in progress).
There are various possible improvements but a proper fix probably
requires a protocol change. That would mean receiving an ACK for a
takeover run request to indicate that the request will be processes
and then a broadcast to indicate a completed takeover run.
There are various other partial fixes (e.g. de-duping queued takeover
run requests against those in the in-progess queue) and workarounds
(e.g. always do a double ipreallocate() in the tool, which should
absorb the spurious takeover run).
However, this is unlikely to be a real-world problem. Real use cases
should not involve repeatedly reloading the IP configuration.
Instead, work around the problem of flaky tests by manually adding
"ctdb sync" commands to cause extra no-op takeover runs. These should
not add spurious takeover runs and will create synchronisation points
to help avoid the issue.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
These commands are now replaced with ctdb event ...
ctdb scriptstatus is maintained for backward compatibility.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Thu Aug 18 02:50:16 CEST 2016 on sn-devel-144
This drastically simplifies the code. "ctdb reloadips" behaves the
same, since it causes a takeover run immediately after IPs are
deleted. "ctdb delip" now needs to be followed with an explicit "ctdb
ipreallocate".
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This test sets TakeoverTimeout=90 to avoid banning during takeover.
However, the setting is done on the test node instead of the recovery
master node. During "ctdb reloadips", the recovery master will used
the default value of TakeoverTimeout.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Current NFS and CIFS tickle tests do not test the killtcp
functionality on the releasing node. 2-way killing is done for NFS,
so this test explicitly looks for packets from the releasing node.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
tcpdump does not support filtering on MAC address when reading from a
file. Therefore, this is implemented by conditionally using grep to
filter the output of tcpdump.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
There's a tiny chance that the connection information may not be
transferred to other nodes quickly enough, so add an explicit wait.
Also clean up the description and recognise that it is the takeover
node that does the tickling.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This is a gawk extension and can't be used reliably if just running
"awk". It is simple enough to switch to using the standard sub() and
gsub() functions.
The alternative is to switch to explicitly running "gawk". However,
although the eventscripts aren't exactly portable, it is probably
better to move closer to portability than further away.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
tcptickle_sniff_start() assumes that if $dst contains a ': then it
should use the IPv6 sniffing code. However, $dst is a socket, so has
a trailing ":<port>".
Strip the trailing ":<port>" before checking for ':' as a marker for
an IPv6 address.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
These tests simulate a dead node rather than a CTDB failure, so drop
IP addresses when killing a "node" to avoid problems with duplicates.
To cope with a CTDB failure a watchdog would be needed to ensure that
the public IPs are dropped when CTDB dies. Let's not do that now.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Fri Dec 5 23:29:39 CET 2014 on sn-devel-104
Extend select_test_node_and_ips() to set $test_prefix in addition to
$test_ip.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
There are parentheses missing that stop the default pattern from
matching commands with trailing garbage (e.g. "exportfs.orig").
A careful check of POSIX (and running GNU sed with --posix) suggests
that "\|" isn't a supported way of specifying alternation in a regular
expression. Therefore, it is clearer to switch to extended regular
expressions so that this has a chance of being portable (even though
the point is to print /proc/<pid>/stack, which only works on Linux).
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Nov 18 06:37:45 CET 2014 on sn-devel-104
Some of this implements logic that exists in functions. Some of it is
overly complicated and potentially failure-prone.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Debugging can still be running when a monitor event times out and
scriptstatus output changes.
When debugging a hung script to a log file, write to a temporary file
and move the temporary file over the log file when done. The test
then waits for the log file to appear.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Thu Jul 3 08:19:23 CEST 2014 on sn-devel-104
About a year ago a check was added to _cluster_is_healthy() to make
sure that node 0 isn't in recovery. This was to avoid unexpected
recoveries causing tests to fail. However, it was misguided because
each test initially calls cluster_is_healthy() and will now fail if an
unexpected recovery occurs.
Instead, have cluster_is_healthy() warn if the cluster is in recovery.
Also:
* Rename wait_until_healthy() to wait_until_ready() because it waits
until both healthy and out of recovery.
* Change the post-recovery sleep in restart_ctdb() to 2 seconds and
add a loop to wait (for 2 seconds at a time) if the cluster is back
in recovery. The logic here is that the re-recovery timeout has
been set to 1 second, so sleeping for just 1 second might race
against the next recovery.
* Use reverse logic in node_has_status() so that it works for "all".
* Tweak wait_until() so that it can handle timeouts with a
recheck-interval specified.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This one ensures that a newly started node gets an up-to-date tickle
list. Tweak some of the integration test functions to accommodate
this.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
It is hard to diagnose failures in the NFS tickle test because there's
no way of telling if the test node doesn't have the tickle or if it
didn't get propagated.
Factor out check_tickles() into local.bash and give it some
parameters.
Have the NFS test call it first to ensure the tickle has been
registered. Then use new function check_tickles_all() to ensure the
tickle has been propagated to all nodes. Give this a bit of extra
time (double the timeout) just in case we're racing with the update.
Add a useful comment to the CIFS test so that I stop asking myself how
the test could ever have worked reliably. :-)
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
* Add stack dumps for "interesting" processes that sometimes get
stuck, so try to print stack traces for them if they appear in the
pstree output.
* Add new configuration variables CTDB_DEBUG_HUNG_SCRIPT_LOGFILE and
CTDB_DEBUG_HUNG_SCRIPT_STACKPAT. These are primarily for testing
but the latter may be useful for live debugging.
* Load CTDB configuration so that above configuration variables can be
set/changed without restarting ctdbd.
Add a test that tries to ensure that all of this is working.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This adds a lot of IPs (currently 100) in a new network and deletes
them in a few steps. First the primary is deleted and then a check is
done to ensure that the remaining IPs are all correct. Then about 1/2
of the IPs and deleted and remaining IPs are checked. Then the
remaining IPs are deleted and a check is done to ensure they are all
gone.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This currently requires an eventscript to be dynamically installed.
This eventscript is only used to help determine when a monitor event
has occurred. This code is horrible and fragile.
A better way is to just monitor the output of "ctdb scriptstatus".
When changes it changes then a monitor event has occurred.
Also remove the old code that checks for tickle information in shared
storage. CTDB hasn't done things this way for a long time.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
This is a needlessly complex way of testing the same thing as the
eventscripts unit tests 60.nfs.monitor.161.sh and
60.nfs.monitor.162.sh.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit d1674aad224f8f0c9a03c3cd38a647318ba0f03e)
This is adequately covered by eventscripts unit tests
50.samba.monitor.105.sh and 50.samba.monitor.106.sh.
This test is broken if CTDB_SAMBA_CHECK_PORTS is not specified in the
CTDB configuration. Fixing it is hard and involves adding a more
complex stub for testparm. We already have that in the eventscript
unit tests above.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 81b94fbb7495ac3204f1a84c673c8babf04663bc)
Refactor the NFS test setup/cleanup code into new common functions.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 29e98017221326bdc9b1c4f7c05b3b495c1de29b)
Tickle tests fail if run from a node involved in the test.
The condition is actually weaker than this: the test can't be run from
a CTDB node that is hosting public addresses that may be used by the
test.
Rework ctdb_test_check_real_cluster() to support checking this.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 14012781c3751a514055df29ea70adfb12ecb2d9)
This is made possible by separation of public addresses files for
local daemons and the addition of get_ctdbd_command_line_option().
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 2bcd58b30d7cf6dd48ad7f019810c6965a44c85a)
Testing with local daemons is the current default but this is not the
most common use case. Therefore, we make local daemons optional by
using the -l switch with run_tests or by setting TEST_LOCAL_DAEMONS to
the number of daemons to be used (-l sets this to 3).
TEST_LOCAL_DAEMONS replaces CTDB_TEST_NUM_DAEMONS and
CTDB_TEST_REAL_CLUSTER is removed.
Most relevant logic is moved from ctdb_test_env to integration.bash.
ctdb_test_check_real_cluster() is moved from integration.bash to
complex/scripts/local.bash.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 72ecae61c43b318ec94b527a12cbb0a382e8c3db)
* run_tests no longer includes common.sh, which is only to be included
by test cases. Therefore, it defines its own die() function.
* TEST_SUBDIR is now set in common.sh
* Move complex-only functions to complex/scripts/local.bash
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit bfa1d6638d3e116640eb4e3bb71b21ba6ef8cae5)