mirror of https://github.com/samba-team/samba.git synced 2025-01-15 23:24:37 +03:00

3683 Commits

Author SHA1 Message Date
Martin Schwenke
5f4ab05766 Eventscripts: new functions set_proc() and get_proc().
These provide a thin layer around writing and reading files in /proc.
They can be easily replaced by stubs for unit testing.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 637f9d8af517b73c72ed8f3cc2a2661f11eb2126)
2011-08-03 17:04:58 +10:00
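A minimal sketch of the idea behind commit 5f4ab05766, for illustration only. The real set_proc() and get_proc() are shell functions in the eventscript functions file; this Python analogue only shows why a thin wrapper around /proc access is easy to stub in unit tests. The `root` parameter and the `PROC_ROOT` environment variable are hypothetical names, not part of CTDB.

```python
import os

# Hypothetical override so tests can point at a scratch directory.
PROC_ROOT = os.environ.get("PROC_ROOT", "/proc")


def set_proc(name, value, root=None):
    """Write a value to a file below the proc root (assumed layout)."""
    path = os.path.join(root or PROC_ROOT, name)
    with open(path, "w") as f:
        f.write(f"{value}\n")


def get_proc(name, root=None):
    """Read a value back from a file below the proc root."""
    path = os.path.join(root or PROC_ROOT, name)
    with open(path) as f:
        return f.read().strip()
```

Because every read and write goes through these two functions, a test can pass a temporary directory as `root` (or replace the functions outright) without touching the real /proc.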
Martin Schwenke
571e55ac0d Eventscripts: remove ctdb_wait_command() and ctdb_wait_tcp_ports() functions.
These haven't been used for a long time.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit f5fd361cadb3ea18d29e2d7215a7853718e48d00)
2011-08-03 17:02:41 +10:00
Ronnie Sahlberg
c6c3d83477 Merge remote branch 'martins/test_suite'
(This used to be ctdb commit 113c763f15ab1db3810f40504b60bab5d3f2f212)
2011-08-03 16:56:26 +10:00
Martin Schwenke
e3a9991e46 Eventscripts: iptables() should put lock in $CTDB_VARDIR.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 3f04793f391c63b78ffb9c9851ab3f0daf3ed50a)
2011-08-03 16:55:43 +10:00
Martin Schwenke
3bbfdfcdd3 Make Emacs recognise that the eventscript functions file is a shell script.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit a6dfb76cfa759f6f9409f24368111c4f85ca0fbf)
2011-08-03 16:49:38 +10:00
Martin Schwenke
3380c6ce1d Eventscript functions: add $CTDB_ETCDIR and hook service() functions.
* $CTDB_ETCDIR defaults to /etc but can be changed for testing.  All
  hard-coded instances of /etc have been changed to $CTDB_ETCDIR.
  This includes references to /etc/init.d and /etc/sysconfig.

* service() and nice_service() functions now call new function
  _service().  This makes it easier to override these functions (say,
  in rc.local) for testing and call most of the existing functionality
  using _service().

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit f43c9a7604b779bb6257ddb2bf3cbe266d496a63)
2011-08-03 16:45:54 +10:00
Martin Schwenke
d31fbcab4b Set $CTDB_VARDIR in the functions file.
This will be needed when eventscripts that use it are called
externally.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit ebd53b66b0cc66d9d04830781886234167fc2164)
2011-08-03 16:44:49 +10:00
Ronnie Sahlberg
8f03eed985 Merge remote branch 'martins/onnode_tests'
(This used to be ctdb commit 0384f1902bb64d6683b689de226fff4e54331c24)
2011-08-03 16:26:43 +10:00
Ronnie Sahlberg
8a0e932008 Merge remote branch 'martins/lcp2_sim'
(This used to be ctdb commit cc9a09b2cbe300ad5848932b9273270ad50ea6b0)
2011-08-03 16:25:46 +10:00
Martin Schwenke
000fbb607e Test suite: when the cluster flip-flops (un)healthy, using "ctdb status -Y".
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit d3dc9410501767c07d9b0106bb73c979d869c127)
2011-08-03 16:06:40 +10:00
Martin Schwenke
7e48ba58c6 Test suite: Print debug info from cluster nodes when time jumps occur.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 21cdc7ed6942238faeb42983c862d4abc3f54ffb)
2011-08-03 16:06:40 +10:00
Martin Schwenke
372f0a1bff Test suite: Add debug for cluster (un)healthy flip-flop after restart.
We're seeing the cluster become healthy after a restart and then
revert to being unhealthy.  It looks like there's a race and the
cluster shouldn't have been healthy, given that we seem to see that
the monitor cycle hasn't yet been run.

This collects some state debug info from all nodes after the cluster
becomes healthy.  This is printed if the cluster is then unexpectedly
unhealthy a short time later.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit c2efb5897e4258df649149f9904d7ac47322e1b4)
2011-08-03 16:06:40 +10:00
Martin Schwenke
659f54e61a Test suite: add more debug to time jump post mortem.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit fed3c2b80b8add8d1cf33abdd5dd8d8001af44d4)
2011-08-03 16:06:40 +10:00
Martin Schwenke
e05b902f99 Test suite: add automated checking of time logs.
This depends on the format of onnode output and also depends on
simple/00_ctdb_onnode.sh having been run.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 93b53b186df55942bf4d9e90cae329f47889af72)
2011-08-03 16:06:40 +10:00
Martin Schwenke
7e5549a54e Test suite: make time log use seconds since epoch.
Easier to implement automatic checking.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 18db530880849b59445d7aa508bf218bdd77ea1c)
2011-08-03 16:06:40 +10:00
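To illustrate why a seconds-since-epoch time log (commit 7e5549a54e) is easier to check automatically, here is a small sketch. It is not the test suite's checker, which depends on the onnode output format; the function name, the expected logging interval and the tolerance are all assumptions.

```python
def find_time_jumps(timestamps, interval=1, tolerance=2):
    """Return (previous, current) pairs where the clock appears to jump.

    timestamps: integers, seconds since the epoch, in logged order.
    interval:   assumed number of seconds between log entries.
    tolerance:  assumed slack before a gap is reported as a jump.
    """
    jumps = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        # With epoch seconds the gap is plain integer subtraction;
        # forward and backward jumps are both caught by abs().
        if abs((cur - prev) - interval) > tolerance:
            jumps.append((prev, cur))
    return jumps
```

With formatted date strings each entry would need parsing before any comparison; with integers the check is a single pass over adjacent pairs.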
Martin Schwenke
88fc88caf5 Test suite: CTDB_SAMBA_SKIP_SHARE_CHECK test now uses _loadconfig().
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 651e6703b6dc4d11ba7d6d0b44d3be1f485a0f75)
2011-08-03 16:06:40 +10:00
Martin Schwenke
3a18451cef Test suite: CTDB_NFS_SKIP_SHARE_CHECK test now uses _loadconfig().
The manual replacement of loadconfig() had bit rotted and no longer
worked.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit bf23e7166385d305c6860b37c120f70a9aa33aa5)
2011-08-03 16:06:40 +10:00
Martin Schwenke
4f4cf7b100 Test suite: make time logging only happen on a real cluster, not local daemons.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit a6b3a7b7db9aa5fc971aae11b9b012e72c7d240c)
2011-08-03 16:06:40 +10:00
Martin Schwenke
bb32a6cf70 Test suite: add time logging.
We're seeing some weirdness with CTDB controls timing out.  We're
wondering if time is jumping forward, so this creates a time log on
each node that we can examine later if tests fail weirdly.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 5d82d89ee99f10bead101aebda645a80435ba246)
2011-08-03 16:06:40 +10:00
Martin Schwenke
7cec7807e1 Tests: eventscripts and onnode tests use stubs/ subdirectory instead of bin/.
This sets up a more useful convention and avoids future .gitignore
problems.

Resolved conflict while cherry-picking this:

  Don't take the eventscripts files for this branch.  We'll put them
  elsewhere.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit a9879e37d4e3bb714ef6c0c4144c6949daec0b53)
2011-08-03 15:55:35 +10:00
Martin Schwenke
8006aec7b1 Tests: run_tests script no longer prints filename in summary descriptions.
If filenames should be printed in descriptions in the summary then the
descriptions should include the filename.  A better option is to
include something more human-readable that makes the test just as
easily identifiable.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 0efdbd61bdc2343e5459959b300bccc9986b1d78)
2011-08-03 15:51:44 +10:00
Martin Schwenke
3ee6a63e47 Tests: onnode tests changed to use a simple define_test() function.
This makes global changes easier.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 3af086398fecb5f7c501190f9620b9c7b201f0ca)
2011-08-03 15:51:44 +10:00
Martin Schwenke
51ef4b4e55 Tests: add initial onnode tests
Add some simple tests for the onnode command.  These use fake ssh and
ctdb commands that are added to $PATH.  The infrastructure used is
quite flexible and would allow more complex tests to be written.

As-is, these tests expose some bugs in an older version of onnode
that is included so it can be used to validate some of the tests.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit f7f9d0943474cb2de7832d7ca95210ea9e9c772b)
2011-08-03 15:51:44 +10:00
Martin Schwenke
8d2c726deb Tests: change output format of run_tests script and add -q option
Putting PASSED/FAILED on the left makes it easier to scan the results
and simplifies the code.  Also put stars around the word "*FAILED*"
to make it more obvious.

Also add a -q option to throw away test output and only display the
summary (if -s is also specified).

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit c44b632b010b7d57007f3c8f294271c7e0217e0d)
2011-08-03 15:51:44 +10:00
Martin Schwenke
eae91c959e Test suite: add a -d option to the run_tests script.
This causes summary lines (when used with -s) to be pretty printed and
include the test description.  This is the 4th line of the test output
- that is, immediately after the header.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 0e5cc2a58b0d38e10a2ef9e81dc887c20f3fbdcb)
2011-08-03 15:51:44 +10:00
Martin Schwenke
07d2ecfbcc ctdb natgwlist should return non-zero when there is no natgw.
This makes it 2, since this error corresponds loosely to ENOENT.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 1bf289abdd3067a40e9a67091aba78222d13eddf)
2011-08-03 15:39:33 +10:00
Ronnie Sahlberg
82d59bbc8e make test: add two missing events to the special test eventscript
(This used to be ctdb commit 771b1e9c2e694ccc8825fb8088174c122532e74d)
2011-08-02 19:25:14 +10:00
Martin Schwenke
652bf326e1 Eventscripts - 10.interfaces should not check orphaned interfaces.
If the last IP address on an interface is removed then that
interface should no longer be checked by 10.interfaces.  However,
"ctdb ifaces" still lists such interfaces so they are currently
checked.

The problem really needs to be addressed in ctdbd but a neat quick
eventscript fix will be minimally invasive...

This changes the code to use "ctdb -Y ip -v" instead of "ctdb -Y
ifaces".  The former includes details of all public addresses and
associated interfaces, so when an address is removed there is no
output for it.  This prevents orphaned interfaces from being listed.

The logic is also slightly improved so that $IFACES includes just a
(non-uniquified) list of interfaces, allowing an existing loop to be
removed.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 49b2d1bd9554461ed8edbfc21e777c0eca9e1443)
2011-08-02 16:53:14 +10:00
Martin Schwenke
7fcfea6141 IP allocation simulation - Pad IPv4 addresses in LCP2 algorithm.
This makes IPv4 addresses comparable with IPv6 but reduces the overall
effectiveness of the algorithm.  The alternative would be to treat
these addresses separately while trying to keep all the IPs in overall
balance...  which is basically the problem that LCP2 solves.  :-)

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 3a7624f9d468b99714a7b6a45313f9e7f66011ed)
2011-07-29 14:32:07 +10:00
Martin Schwenke
0a96e936f2 IP allocation simulation - make stats label for LCP2 imbalance more meaningful.
This time in the stats summary.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit aabb2507dacc63ae026e6c99704a2fb79950e82c)
2011-07-29 14:32:07 +10:00
Martin Schwenke
581375d56d IP allocation simulation - make stats label for LCP2 imbalance more meaningful.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 721a06e28bacf9e03fd8eb4aff53dd17c363ffa1)
2011-07-29 14:32:07 +10:00
Martin Schwenke
adf8dbe8c0 IP allocation simulation - add examples.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 8a1ae0c5a3aa788ed0f29c264249ba7bc5d226a7)
2011-07-29 14:32:07 +10:00
Martin Schwenke
85c0b10a38 IP allocation simulation - tighten up termination condition for -x.
When there are IP groups, do not terminate when the overall cluster
goes out of balance.

Also make explicit that grat_ip_moves is an integer not a boolean, so
only terminate if it is greater than 0.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 0899f14b1483682d73d1ee2d2419db54ffeadc4b)
2011-07-29 14:32:07 +10:00
Martin Schwenke
d2ec92ba71 IP allocation simulation - fix documentation for diff() function.
It had out-of-date information and a typo.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 5d0d2b8b528414c859da0e6fd5959321db33608b)
2011-07-29 14:32:07 +10:00
Martin Schwenke
c84310e512 IP allocation simulation - add mean imbalance statistics.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit b541194d6075e5db72fb691fb79dc81659771cb1)
2011-07-29 14:32:07 +10:00
Martin Schwenke
9f8a781ff1 IP allocation simulation - add -A/--aggressive option.
This is likely to cause many more state changes for nodes.  In this
mode the odds of a failover are applied to determine whether a state
change occurs for each node.  If no state change occurs then the
process is repeated.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit b7c42bff9457ec8294b04245af8e3b6010707d1a)
2011-07-29 14:32:07 +10:00
Martin Schwenke
0829e1a22b IP allocation simulation - add LCP2 imbalance metric to node state output.
Print the LCP imbalance metric after the list of IPs.

To make this more sensible, put most of the printing logic into the
Node class.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 2e680e6b421d72cf2d217d3c3c1564da0bb19633)
2011-07-29 14:32:07 +10:00
Martin Schwenke
04196c78c7 IP allocation simulation - add analysis of IP groups.
The public addresses passed to the node constructor can be nested 2
levels.  Each sub-list is an IP group for which separate balance
analysis is done.  However, the public address list is flattened and
the actual IP assignment algorithm doesn't know about IP groups.

This allows extra statistics to be printed and an extra termination
condition to be added for unbalanced IP groups.

Most code from calculate_imbalance() is factored out to a new
function imbalance_for_ips(), which calculates imbalance for the given
IPs.  calculate_imbalance() now returns the overall imbalance and a
list containing imbalances for each IP group.  To support this
node_ip_coverage() now takes an optional list of IPs to check coverage
within.

This also adds extra output to show statistics for the LCP2 imbalance
metric.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 701395087156b2a5c7be1564897b796df35b69ec)
2011-07-29 14:32:07 +10:00
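The IP group analysis described in commit 04196c78c7 can be sketched roughly as follows. The function names calculate_imbalance() and imbalance_for_ips() come from the commit message, but the bodies, the argument shapes and the imbalance definition (most loaded node minus least loaded node) are assumptions made for illustration, not the simulation's actual code.

```python
def imbalance_for_ips(assignment, ips):
    """Assumed metric: spread between the most and least loaded node,
    counting only the addresses listed in `ips`."""
    wanted = set(ips)
    counts = [len(wanted & set(hosted)) for hosted in assignment.values()]
    return max(counts) - min(counts) if counts else 0


def calculate_imbalance(assignment, ip_groups):
    """Return (overall imbalance, list of per-group imbalances).

    assignment: dict mapping node id -> list of IPs it currently hosts.
    ip_groups:  public addresses nested 2 levels, one sub-list per group.
    """
    # The allocation itself sees only the flattened list; the groups
    # are used purely for the extra per-group analysis.
    all_ips = [ip for group in ip_groups for ip in group]
    overall = imbalance_for_ips(assignment, all_ips)
    per_group = [imbalance_for_ips(assignment, g) for g in ip_groups]
    return overall, per_group
```

A cluster can look balanced overall while each group is badly placed, for example when one node hosts all of group A and another hosts all of group B. That is the case the extra termination condition is meant to catch.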
Martin Schwenke
43e9a7b6c4 IP allocation simulation - add -H/-S options for hard/soft imbalance limit.
An imbalance exceeding the hard limit, as specified by -H (and
defaulting to 1), now causes termination when -x is specified.

Imbalances exceeding the soft limit, as specified by -S (and
defaulting to 1), are counted and printed in the statistics summary.

A side-effect is that imbalances less than 2 are no longer rounded
down to 0, since we want to see them in the stats.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit b5e9a4c50eedb8cc786c52af06352788ca25f51e)
2011-07-29 14:32:07 +10:00
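A rough sketch of the hard/soft limit behaviour described in commit 43e9a7b6c4. The class and function names here are hypothetical; only the defaults of 1 for both limits and the link between the hard limit and -x are taken from the commit message.

```python
from dataclasses import dataclass


@dataclass
class ImbalanceStats:
    """Hypothetical counter for the statistics summary."""
    soft_exceeded: int = 0


def record_imbalance(imbalance, stats, hard_limit=1, soft_limit=1,
                     exit_on_hard=False):
    """Count soft-limit breaches and report whether to terminate.

    exit_on_hard stands in for the -x option: the hard limit only
    causes termination when it is set.
    """
    if imbalance > soft_limit:
        stats.soft_exceeded += 1
    return exit_on_hard and imbalance > hard_limit
```

Keeping the two limits separate lets a long run collect statistics on mild imbalances while still stopping early on a severe one.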
Martin Schwenke
2acf892e6e IP allocation simulation - add LCP2 algorithm.
Add -L/--lcp2 option and implement LCP2 algorithm as an alternative to
the basic non-deterministic algorithm.

Existing examples will break if used with LCP2 since it needs real IP
addresses.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 22b14e1a887f0479cc75ed9027af5cc24797f217)
2011-07-29 14:32:07 +10:00
Martin Schwenke
c3bdf4a0a1 IP allocation simulation - options.exit is boolean, so don't compare with 0.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 68a49739763b7125382186504b9cb9b770cfde0e)
2011-07-29 14:32:07 +10:00
Martin Schwenke
cec93a2ffe IP allocation simulation - remove unused function find_least_loaded_node().
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 4ff3b73b1ccb795fac98b26e038f41f5e32f0d6b)
2011-07-29 14:32:07 +10:00
Martin Schwenke
fb385a2043 IP reallocation simulation - remove --hack option.
The hacks were attempts at improving the deterministic IPs algorithm
but they didn't work.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 6034de0e24438e012f9f1d2065531b1ce467ac52)
2011-07-29 14:32:07 +10:00
Martin Schwenke
540c2cbcfd IP allocation simulation - add debug output using -vv.
-v can now be provided more than once to increase verbosity.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit ce4fb56c9972a854bd139429b6f4a26e8d5c3956)
2011-07-29 14:32:07 +10:00
Ronnie Sahlberg
a17ae8a8be Merge branch 'master' of 10.1.1.27:/shared/ctdb/ctdb-master
(This used to be ctdb commit 518945e59e2e48f07fcc0955f3aa81cd0d946aea)
2011-07-29 09:04:01 +10:00
Martin Schwenke
5ac67504ca Tests: Initial test code for LCP2 IP allocation algorithm.
Move struct ctdb_public_ip_list to ctdb_private.h and put some
definitions for some functions from ctdb_takeover.c there.  This
allows those functions to be called from unit tests.

Add ctdb_takeover_tests.c and the Makefile support to build it.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 9d34be0233edf3bc022345c0494c4b2a4d7f8480)
2011-07-29 09:01:36 +10:00
Martin Schwenke
ff1a81c872 IP allocation - add LCP2 algorithm.
The current non-deterministic IP allocation algorithm balances IPs
across the whole cluster.  It does not consider different
interfaces/VLANs/subnets, so these different groups of IPs aren't
generally well balanced.

This adds the LCP2 algorithm for IP allocation and allows it to be
enabled by setting the "LCP2PublicIPs" tunable to 1.

The LCP2 algorithm calculates the imbalance of a node by totalling the
squares of the distances between each IP on the node.  The IP distance
is defined as the length of the longest common prefix (LCP) of bits that
is found when comparing 2 IPs.  The imbalance of a cluster is the maximum
imbalance for any node.  At each step the algorithm selects the IP/node
combination whose allocation best reduces the imbalance of the cluster.

The implementation splits out the IP allocation part of
ctdb_takeover_run() into new function ctdb_takeover_run_core(), and
then extracts out the basic IP assignment code into new functions
basic_allocate_unassigned() and basic_failback().  3 new functions
lcp2_init(), lcp2_allocate_unassigned() and lcp2_failback() implement
the LCP2 algorithm, and are hooked into ctdb_takeover_run_core().

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 61fc7fbd0235469df22deb6581c6bd47e30bc0be)
2011-07-29 09:01:17 +10:00
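The LCP2 metric described in commit ff1a81c872 can be sketched in a few lines. This is an illustration under stated assumptions, not the C implementation in ctdb_takeover.c: the function names are hypothetical, each pair of IPs on a node is counted once, only addresses of the same family are compared, and the greedy step simply picks the IP/node combination that leaves the lowest cluster imbalance.

```python
import ipaddress
from itertools import combinations


def lcp_bits(a, b):
    """Length in bits of the longest common prefix of two IPs."""
    x, y = ipaddress.ip_address(a), ipaddress.ip_address(b)
    if x.version != y.version:
        raise ValueError("sketch compares same-family addresses only")
    # XOR leaves a 1 at the first differing bit; everything above it
    # is the common prefix.
    return x.max_prefixlen - (int(x) ^ int(y)).bit_length()


def node_imbalance(ips):
    """Sum of squared IP distances over each pair of IPs on a node."""
    return sum(lcp_bits(a, b) ** 2 for a, b in combinations(ips, 2))


def cluster_imbalance(alloc):
    """Maximum imbalance of any node; alloc maps node -> list of IPs."""
    return max((node_imbalance(ips) for ips in alloc.values()), default=0)


def allocate_one(alloc, unassigned):
    """Greedy step: place the IP/node pair that minimises the
    resulting cluster imbalance. Mutates alloc and unassigned."""
    best = None
    for ip in unassigned:
        for node in alloc:
            trial = dict(alloc)
            trial[node] = alloc[node] + [ip]
            score = cluster_imbalance(trial)
            if best is None or score < best[0]:
                best = (score, ip, node)
    _, ip, node = best
    alloc[node].append(ip)
    unassigned.remove(ip)
    return ip, node
```

Because addresses on the same subnet share a long prefix, stacking them on one node raises that node's imbalance sharply, so the greedy step tends to spread each subnet's addresses across nodes.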
Ronnie Sahlberg
a5cd8a3270 Merge branch 'master' of 10.1.1.27:/shared/ctdb/ctdb-master
(This used to be ctdb commit 0e60a738f9a6275ed45abc3d933f872d93132d92)
2011-07-29 08:53:43 +10:00
Ronnie Sahlberg
e707f23596 Update the delip command
Don't talloc_free(vnn) immediately but postpone it until later when
the eventscript callback has completed.

CQ S1026664

(This used to be ctdb commit 0a99e8742a261b1d3a2c8830f5c19ea6c2c47cad)
2011-07-29 08:50:48 +10:00
Rusty Russell
87ea4818bf eventscript: fix callback after free
ctdb_event_script_callback() takes a mem_ctx arg which it doesn't use, but
the implication is pretty clear, that when that mem_ctx is freed, the callback
shouldn't happen.  Indeed, Ronnie reproduced a case where that callback
refers to freed memory, in the ip reallocation code under stress.

So attach the callback to the mem_ctx they give us, and remove it from the
script state structure when that's freed.  It's a bit weird, but it works.

CQ: S1026179
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

(This used to be ctdb commit 6fcd867cc835ef1ffc1c50964f135c346503d40c)
2011-07-29 08:50:39 +10:00