mirror of https://github.com/samba-team/samba.git synced 2024-12-22 13:34:15 +03:00

237 Commits

Author SHA1 Message Date
Martin Schwenke
950e23f664 ctdbd: Make ctdb_reloadips_child send controls asynchronously
Deleting IPs can take a while because IPs are released and connections
are killed, so do these operations in parallel.  In fact, since the
sets of IPs being added and deleted will be disjoint, send all the
adds/deletes at the same time and then wait.
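The disjointness argument can be illustrated with a minimal sketch. It assumes IPs are plain strings and uses a hypothetical `plan_reload()` helper (not CTDB code) to split the work into deletes (currently configured, absent from the new list) and adds (absent now, present in the new list). It does not model the asynchronous controls themselves.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: is ip present in list[0..n)? */
static bool contains(const char *const *list, size_t n, const char *ip)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(list[i], ip) == 0) {
			return true;
		}
	}
	return false;
}

/*
 * Hypothetical planning step for a reload.  An IP lands in del[] only if
 * it is in cur[] and not in want[]; it lands in add[] only if it is in
 * want[] and not in cur[].  No IP can be in both, so all controls can be
 * sent at once and then waited on together.
 */
static void plan_reload(const char *const *cur, size_t ncur,
			const char *const *want, size_t nwant,
			const char **del, size_t *ndel,
			const char **add, size_t *nadd)
{
	*ndel = 0;
	*nadd = 0;
	for (size_t i = 0; i < ncur; i++) {
		if (!contains(want, nwant, cur[i])) {
			del[(*ndel)++] = cur[i];
		}
	}
	for (size_t i = 0; i < nwant; i++) {
		if (!contains(cur, ncur, want[i])) {
			add[(*nadd)++] = want[i];
		}
	}
}
```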

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 85a5b544ec032173e98c9cc3b5402a76b961aa3b)
2013-09-19 12:54:31 +10:00
Martin Schwenke
b33ee7a2a4 recoverd: Fix the implementation of CTDB_SRVID_REBALANCE_NODE
The current implementation has a few flaws:

* A takeover run is called unconditionally when the timer expires, even if
  the recovery master role has moved.  This means a node other than
  the recovery master can incorrectly do a takeover run.

* The rebalancing target nodes are cleared in the setup for a takeover
  run, regardless of whether the takeover run succeeds.

* The timer to force a rebalance isn't cleared if another takeover run
  occurs before the deadline.  Any forced rebalancing will happen in
  the first takeover run and when the timer expires some time later
  then an unnecessary takeover run will occur.

* If the recovery master role moves then the rebalancing data will
  stay on the original node and affect the next takeover run to occur
  if the recovery master role should come back to the original node.

Instead, store an array of rebalance target nodes in the recovery
master context.  This is passed as an extra argument to
ctdb_takeover_run() each time it is called and is cleared when a
takeover run succeeds.  The timer hangs off the array of rebalance
target nodes, which is cleared if the node isn't the recovery master.
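The lifecycle of the rebalance target array can be sketched as follows. The struct, the fixed capacity and the function names are hypothetical stand-ins for the recovery master context; the timer hanging off the array is not modelled. The sketch only shows that targets survive a failed takeover run and are dropped on success or when the node is no longer recovery master.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TARGETS 16  /* hypothetical fixed capacity for the sketch */

/* Hypothetical stand-in for the rebalance data in the recmaster context. */
struct rebalance_state {
	uint32_t targets[MAX_TARGETS];
	size_t count;
};

/* Queue a node (by PNN) for forced rebalancing, ignoring duplicates. */
static void rebalance_add(struct rebalance_state *rs, uint32_t pnn)
{
	for (size_t i = 0; i < rs->count; i++) {
		if (rs->targets[i] == pnn) {
			return;
		}
	}
	if (rs->count < MAX_TARGETS) {
		rs->targets[rs->count++] = pnn;
	}
}

static void rebalance_clear(struct rebalance_state *rs)
{
	rs->count = 0;
}

/* Called when a takeover run finishes: targets are only dropped on success. */
static bool finish_takeover_run(struct rebalance_state *rs, bool succeeded)
{
	if (succeeded) {
		rebalance_clear(rs);
	}
	return succeeded;
}

/* A node that is no longer recovery master discards its rebalance data. */
static void check_recmaster(struct rebalance_state *rs, bool is_recmaster)
{
	if (!is_recmaster) {
		rebalance_clear(rs);
	}
}
```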

This means that it is possible to lose rebalance data if the recovery
master role moves.  However, that's a difficult problem to solve.  The
best way of approaching it is probably to try to stop the recovery
master role from jumping around unnecessarily when inactive nodes join
the cluster.

The long term solution is to avoid this nonsense completely.  The IP
allocation algorithm needs to cache state between runs so that it
knows which nodes have just become healthy.  This also needs recovery
master stability.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit c51c1efe5fc7fa668597f2acd435dee16e410fc9)
2013-09-19 12:54:31 +10:00
Martin Schwenke
c503997746 recoverd: Move disabling of IP checks into do_takeover_run()
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 48b603fbf16311daa47b01e7a33d477ed51da56d)
2013-09-19 12:54:30 +10:00
Martin Schwenke
701c450e90 recoverd: Fail takeover run if "ipreallocated" fails
Previously, flagging a failure was probably avoided because of attempts
to run "ipreallocated" events on stopped and banned nodes, which would
fail because they are in recovery.  Given the change to a new control,
and that the fallback only retries the old method on active nodes, this
should never fail in reasonable circumstances.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 53722430ad35f80935aabd12fa07654126443b8b)
2013-09-19 12:54:30 +10:00
Martin Schwenke
630196423a recoverd: Banned nodes should not be told to run "ipreallocated" event
They will reject it because they are in recovery.  This can result in
extra banning credits being applied to banned nodes.

This corresponds to commit 9132e6814ed927fa317f333f03dedb18f75d0e5b
from the 1.2.40 branch.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 403938804caf1322f9773d63197e4303a7b2a788)
2013-09-18 17:16:35 +10:00
Martin Schwenke
8d11da3546 recoverd: Remove an orphaned comment
This should have been removed with the associated code in commit
14bd0b6961ef1294e9cba74ce875386b7dfbf446.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 36de63843de10a1f2a9ccdbbee24cc1d08542984)
2013-09-11 15:35:16 +10:00
Martin Schwenke
4e62553fcb recoverd: Update a comment to use current terminology
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit ea5576071b22e1877903ec0921d375626a23e13b)
2013-09-11 15:35:10 +10:00
Martin Schwenke
1ae731198a recoverd: Move struct ctdb_public_ip_list back into ctdb_takeover.c
This is an internal structure.  It was moved into ctdb_private.h a
long time ago to allow unit testing.  Unit test compilation was
changed shortly afterwards to make this unnecessary.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit db57261d7dc264e161659a8c547f44fbd9e88eeb)
2013-08-22 17:00:20 +10:00
Martin Schwenke
a5cb72cac3 ctdbd: Kill client process without checking for tracked child
Commit f73a4b1495830bcdd094a93732a89dd53b3c2f78 added a safety check
to ensure that CTDB never kills unrelated processes.  However, client
processes are unrelated.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 782814288bb560099ee44b607bf35f3eddf37f82)
2013-07-29 15:58:51 +10:00
Martin Schwenke
f46ab595d1 recoverd: Call takeover fail callback only once per node
Currently the fail callback is called once per failed (takeip/releaseip)
control.  This is overkill and can get a node banned much too quickly.

Instead, keep track of control failures per node and only call the fail
callback once per failed node.
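Per-node failure tracking can be sketched like this. The struct, `MAX_NODES` and the callback type are hypothetical; the idea shown is that repeated control failures for the same node only set a flag, and the fail callback (which in CTDB would lead to banning credits) runs once per flagged node after all controls have completed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NODES 8  /* hypothetical cluster size for the sketch */

/* Hypothetical record of which nodes had at least one failed control. */
struct takeover_failures {
	bool failed[MAX_NODES];
};

typedef void (*fail_fn)(uint32_t pnn, void *private_data);

/* Record a failed takeip/releaseip control; repeats for a node are no-ops. */
static void control_failed(struct takeover_failures *tf, uint32_t pnn)
{
	if (pnn < MAX_NODES) {
		tf->failed[pnn] = true;
	}
}

/* After all controls complete, call fn once per failed node. */
static unsigned report_failed_nodes(const struct takeover_failures *tf,
				    fail_fn fn, void *private_data)
{
	unsigned n = 0;

	for (uint32_t pnn = 0; pnn < MAX_NODES; pnn++) {
		if (tf->failed[pnn]) {
			if (fn != NULL) {
				fn(pnn, private_data);
			}
			n++;
		}
	}
	return n;
}
```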

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit bf4a7c1ad87e0e848296d15d63eb8cd901ca5335)
2013-07-29 15:48:48 +10:00
Amitay Isaacs
1c21f37e57 ctdbd: Set process names for child processes
This helps distinguish processes in the process list in top, perf, etc.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 2493f57ce268d6fe7e4c40a87852c347fd60d29e)
2013-07-10 14:33:19 +10:00
Amitay Isaacs
bcb64aa55f recoverd: Fix buffer overflow error in reloadips
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-Programmed-With: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 41182623891d74a7e9e9c453183411a161201e67)
2013-07-05 15:52:34 +10:00
Martin Schwenke
dcdae86dc7 ctdbd: Log something when releasing all IPs
At the moment this is silent and it can be confusing to see IPs just
disappear.

Also, this message:

  Been in recovery mode for too long. Dropping all IPS

can cause anxiety when all IPs should already have been dropped.
Adding a comforting message saying that 0 IPs were dropped relieves
such anxiety.  :-)

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 4d0f26b306fc465d551d340b0e7dce4412eae3fd)
2013-07-05 15:52:33 +10:00
Martin Schwenke
7290798a41 recoverd: Clean up log messages in remote IP verification
The log messages in verify_remote_ip_allocation() are confusing
because they don't include the PNN of the problem node, which is not
known in this function.

Add the PNN of the node being verified as a function argument and then
shuffle the log messages around to make them clearer.

Also fold 3 nested if statements into just one.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit f0942fa01cd422133fc9398f56b4855397d7bc86)
2013-07-05 15:52:33 +10:00
Martin Schwenke
26b161156a ctdbd: Release IP callback should fail if the IP is still hosted
At the moment there are (at least) 2 bugs that cause rogue IPs:

* A race where release_ip_callback() runs after a "subsequent" take IP
  has completed.  The IP is back on an interface but we unset
  vnn->iface in the callback.

* A "releaseip" eventscript times out.  We ignore the timeout and call
  it success, deleting the VNN even if the IP is still hosted.

  We could decide not to ignore the timeout and ban the node, but
  killing TCP connections can take a long time and that might result
  in a lot of banning.  We probably won't reinstate banning on
  "releaseip" until killing TCP connections has been optimised.

In both cases, a rogue IP can be avoided by leaving vnn->iface set and
simply failing the control.
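The fix can be sketched with a hypothetical, cut-down VNN record. The `still_hosted` argument stands in for however the daemon determines that the IP is still on an interface (for example after a racing "takeip", or after a "releaseip" timeout). The point shown is that the callback fails and leaves `iface` set instead of clearing it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of a VNN: the interface the IP is on, or NULL. */
struct vnn_sketch {
	const char *iface;
};

/*
 * Hypothetical release callback.  Returns 0 on success, -1 on failure.
 * If the IP is still hosted, iface is left set so that the IP does not
 * become a rogue IP that CTDB has forgotten about.
 */
static int release_ip_done(struct vnn_sketch *vnn, bool still_hosted)
{
	if (still_hosted) {
		return -1;
	}
	vnn->iface = NULL;
	return 0;
}
```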

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit c5797f2942e83da24df548ea07196fbbac0eab20)
2013-07-05 15:52:32 +10:00
Martin Schwenke
793233f6b6 ctdbd: Log warnings in release IP when unexpected interface is encountered
Previous code changes work around potential problems but do not
provide useful information when a problem occurs.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit f1f1b0c24b9b6cd24b83a4e4da16e179287ec6ac)
2013-07-05 15:52:32 +10:00
Amitay Isaacs
6391f61fbc build: Fix compiler warnings for uninitialized variables
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 5408c5c4050539e5aa06a5e82ceb63a6cb5cef0c)
2013-07-04 20:43:52 +10:00
Mathieu Parent
d82b9ae410 build: Fix tdb.h path to enable building with system TDB library
(This used to be ctdb commit f8bf99de3a5f56be67aaa67ed836458b1cf73e86)
2013-06-14 16:45:27 +10:00
Martin Schwenke
1ab2bbb349 recoverd: Backward compatibility for nodes without IPREALLOCATED control
Consider the case of upgrading a cluster node by node, where some
nodes are still running older versions of CTDB without the
IPREALLOCATED control.  If a "new" node takes over as recovery master
and a failover occurs, then it will attempt to send IPREALLOCATED
controls to all nodes.  The "old" nodes will fail in a fairly
nondescript way (result == -1).

To try to handle this situation, fall back to the EVENTSCRIPT control
to handle "ipreallocated".  Only do this on the failed nodes.
However, do not do this on nodes that timed out (they've probably
implemented the control and we should call the regular fail_callback
to get those nodes banned) or for stopped nodes (since they can't
actually run the "ipreallocated" event via the EVENTSCRIPT control).
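The fallback decision can be sketched as a small classification function. The enums and names are hypothetical. What happens to a failed stopped node beyond "no EVENTSCRIPT retry" is not described above, so the sketch simply reports that the retry is skipped.

```c
#include <stdbool.h>

/* Hypothetical outcome of an IPREALLOCATED control on one node. */
enum ipreallocated_result {
	RESULT_OK,
	RESULT_TIMEOUT,
	RESULT_FAILED,  /* e.g. result == -1 from a node without the control */
};

enum next_action {
	ACTION_NONE,               /* nothing more to do */
	ACTION_RETRY_EVENTSCRIPT,  /* fall back to the EVENTSCRIPT control */
	ACTION_FAIL_CALLBACK,      /* treat as a real failure (banning) */
	ACTION_SKIP,               /* not retried; further handling not modelled */
};

static enum next_action after_ipreallocated(enum ipreallocated_result r,
					    bool node_stopped)
{
	switch (r) {
	case RESULT_OK:
		return ACTION_NONE;
	case RESULT_TIMEOUT:
		/* The node probably implements the control, so a timeout is real. */
		return ACTION_FAIL_CALLBACK;
	case RESULT_FAILED:
	default:
		/* Stopped nodes can't run the event via EVENTSCRIPT. */
		return node_stopped ? ACTION_SKIP : ACTION_RETRY_EVENTSCRIPT;
	}
}
```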

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit b2654853ce9b7c18c5874b080bc94d3118078a5d)
2013-05-27 15:15:25 +10:00
Martin Schwenke
f35e9bba9b recoverd: Nodes can only takeover IPs if they are in runstate RUNNING
Currently the order of the first IP allocation, including the first
"ipreallocated" event, and the "startup" event is undefined.  Both of
these events can (re)start services.

This stops IPs from being hosted before the "startup" event has completed.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit f15dd562fd8c08cafd957ce9509102db7eb49668)
2013-05-24 16:27:55 +10:00
Martin Schwenke
7f03618ae4 recoverd: Handle errors carefully when fetching tunables
If a tunable is not implemented on a remote node then this should not
be fatal.  In this case the takeover run can continue using benign
defaults for the tunables.

However, timeouts and any unexpected errors should be fatal.  These
should abort the takeover run because they can lead to unexpected IP
movements.
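The error policy can be sketched as a per-node decision. The status enum and `tunable_for_node()` are hypothetical; `default_value` plays the role of the benign default (made explicit by the next commit in this log). A missing tunable falls back to the default, while timeouts and unexpected errors abort the takeover run.

```c
#include <stdint.h>

/* Hypothetical outcome of fetching one tunable from one node. */
enum fetch_status {
	FETCH_OK,
	FETCH_NOT_IMPLEMENTED,
	FETCH_TIMEOUT,
	FETCH_OTHER_ERROR,
};

/*
 * Returns 0 and sets *out on success, or -1 to abort the takeover run.
 * *out is left untouched on failure.
 */
static int tunable_for_node(enum fetch_status st, uint32_t fetched,
			    uint32_t default_value, uint32_t *out)
{
	switch (st) {
	case FETCH_OK:
		*out = fetched;
		return 0;
	case FETCH_NOT_IMPLEMENTED:
		/* Older node: continue with a benign, explicit default. */
		*out = default_value;
		return 0;
	case FETCH_TIMEOUT:
	case FETCH_OTHER_ERROR:
	default:
		/* Continuing could cause unexpected IP movements. */
		return -1;
	}
}
```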

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit c0c27762ea728ed86405b29c642ba9e43200f4ae)
2013-05-24 16:27:55 +10:00
Martin Schwenke
116f62a7b3 recoverd: Set explicit default value when getting tunable from nodes
Both of the current defaults are implicitly 0.  It is better to make
the defaults obvious.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 1190bb0d9c14dc5889c2df56f6c8986db23d81a1)
2013-05-24 16:04:57 +10:00
Martin Schwenke
e78b064dcc recoverd: Whitespace improvements
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 473cfcb019f0cb4a094bf10397f7414f7923ee57)
2013-05-24 15:55:11 +10:00
Martin Schwenke
1a181a4284 recoverd: Use talloc_array_length() for simpler code
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit f6792f478197774d2f3b2258c969b67c83e017ab)
2013-05-24 15:55:10 +10:00
Martin Schwenke
63577c96db ctdbd: Replace ctdb->done_startup with ctdb->runstate
This allows states, including startup and shutdown states, to be
clearly tracked.  This doesn't include regular runtime "states", which
are handled by node flags.

Introduce new functions ctdb_set_runstate(), runstate_to_string() and
runstate_from_string().
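A table-driven conversion is one plausible shape for the string helpers. The enum below is a hypothetical subset of runstates, and the signatures of `runstate_to_string()` and `runstate_from_string()` here are assumptions rather than CTDB's actual declarations. The hypothetical `runstate_allows_takeover()` mirrors the rule from commit f35e9bba9b that only RUNNING nodes take over IPs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical subset of runstates; CTDB's real enum may differ. */
enum runstate {
	RUNSTATE_UNKNOWN,
	RUNSTATE_INIT,
	RUNSTATE_SETUP,
	RUNSTATE_STARTUP,
	RUNSTATE_RUNNING,
	RUNSTATE_SHUTDOWN,
};

static const struct {
	enum runstate state;
	const char *name;
} runstate_map[] = {
	{ RUNSTATE_UNKNOWN, "UNKNOWN" },
	{ RUNSTATE_INIT, "INIT" },
	{ RUNSTATE_SETUP, "SETUP" },
	{ RUNSTATE_STARTUP, "STARTUP" },
	{ RUNSTATE_RUNNING, "RUNNING" },
	{ RUNSTATE_SHUTDOWN, "SHUTDOWN" },
};

#define N_RUNSTATES (sizeof(runstate_map) / sizeof(runstate_map[0]))

static const char *runstate_to_string(enum runstate s)
{
	for (size_t i = 0; i < N_RUNSTATES; i++) {
		if (runstate_map[i].state == s) {
			return runstate_map[i].name;
		}
	}
	return "UNKNOWN";
}

static enum runstate runstate_from_string(const char *s)
{
	for (size_t i = 0; i < N_RUNSTATES; i++) {
		if (strcmp(runstate_map[i].name, s) == 0) {
			return runstate_map[i].state;
		}
	}
	return RUNSTATE_UNKNOWN;
}

/* Hypothetical: only nodes in RUNNING may take over IPs. */
static bool runstate_allows_takeover(enum runstate s)
{
	return s == RUNSTATE_RUNNING;
}
```

Keeping both directions in one table means a new runstate only has to be added in one place.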

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 8076773a9924dcf8aff16f7d96b2b9ac383ecc28)
2013-05-24 14:08:06 +10:00
Martin Schwenke
5fdf71b898 recoverd: takeover_run_core() should not use modified node flags
Modifying the node flags with IP-allocation-only flags is not
necessary.  It causes breakage if the flags are not cleared after use.
ctdb_takeover_run() no longer needs the general node flags - it only
needs the IP flags.

Instead of modifying the node flags in nodemap, construct a custom IP
flags list and have takeover_run_core() use that instead of node
flags.  As well as being safer, this makes the IP allocation code more
self-contained and a little bit clearer.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 14bd0b6961ef1294e9cba74ce875386b7dfbf446)
2013-05-23 16:18:23 +10:00
Martin Schwenke
e769f8575a ctdbd: Log add and delete of IPs
At the moment, when someone deletes all the IPs on a node, all we see
are the release IP messages and we have to guess why.

Some would argue that add/delete are more significant than
take/release so they should be logged.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 3c3df1d6afec7e3e721f9bcd4e8b8e008fd6e50b)
2013-05-22 14:24:22 +10:00
Martin Schwenke
0baefba368 ctdbd: Removed bogus comment in ctdb_find_iface()
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 4a8d90d0812a3242f58a2a0e2aa0f528f60f7013)
2013-05-22 14:24:21 +10:00
Martin Schwenke
54e91df60d recoverd: Move IP flags into ctdb_takeover.c
These should never be seen outside the IP allocation code.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit e143abd16ccde2e0edfe103673d31a5fb06b6aef)
2013-05-09 12:55:42 +10:00
Martin Schwenke
50f19b5bd4 recoverd: Clear IP flags after IP allocation algorithm has run
If these flags are left set they will confuse other recovery daemon
code.

Factor the clearing code into new function clear_ipflags().

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 45c776958017ea7001f061842c9e0f60e4a25f23)
2013-05-09 12:55:42 +10:00
Martin Schwenke
530020d83b recoverd: Remove unused mask argument and initial mask calculation
This has been replaced by set_ipflags() and associated functionality.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit d0a3822573db296e73cc897835f783c8abc084b3)
2013-05-07 16:20:47 +10:00
Martin Schwenke
ee7357de51 recoverd: When calculating rebalance candidates don't consider flags
This is really a check to see if a node is already hosting IPs.  If
so, we assume it was previously healthy so it isn't considered as a
rebalance candidate.  There's no need to limit this to healthy nodes,
since this is checked elsewhere.

Due to this the variable newly_healthy is renamed everywhere to
rebalance_candidates.
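The flag-independent candidate check can be sketched as below. The `ip_pnn[]` representation (the PNN hosting each IP, or -1 if unassigned) and the function name are assumptions. A node is a candidate exactly when it hosts no IPs; node health is deliberately not consulted.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical: mark every node as a rebalance candidate, then clear the
 * mark for any node that currently hosts at least one IP.
 */
static void find_rebalance_candidates(const int32_t *ip_pnn, size_t nips,
				      bool *rebalance_candidates, size_t nnodes)
{
	for (size_t i = 0; i < nnodes; i++) {
		rebalance_candidates[i] = true;
	}
	for (size_t j = 0; j < nips; j++) {
		if (ip_pnn[j] >= 0 && (size_t)ip_pnn[j] < nnodes) {
			rebalance_candidates[ip_pnn[j]] = false;
		}
	}
}
```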

The mask argument is now completely unused.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 65e0ea6c2c0629e19349ba4b9affa221fde2b070)
2013-05-07 16:20:47 +10:00
Martin Schwenke
c9056b4f88 recoverd: Remove unused mask argument from IP allocation functions
This is a no-op and is in a separate commit to make the previous
commit less cumbersome.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 107e656bbe24f9d21fbaf886a3e9417da4effe5a)
2013-05-07 16:20:47 +10:00
Martin Schwenke
0445c988e2 recoverd: Fix tunable NoIPTakeoverOnDisabled, rename to NoIPHostOnAllDisabled
This really needs to be per-node.  The rename is because nodes with
this tunable switched on should drop IPs if they become unhealthy (or
disabled in some other way).

* Add new flag NODE_FLAGS_NOIPHOST, only used in recovery daemon.

* Enhance set_ipflags_internal() and set_ipflags() to setup
  NODE_FLAGS_NOIPHOST depending on setting of NoIPHostOnAllDisabled
  and/or whether nodes are disabled/inactive.

* Replace can_node_serve_ip() with functions can_node_host_ip() and
  can_node_takeover_ip().  These functions are the only ones that need
  to look at NODE_FLAGS_NOIPTAKEOVER and NODE_FLAGS_NOIPHOST.  They
  can make the decision without looking at any other flags due to
  previous setup.

* Remove explicit flag checking in IP allocation functions (including
  unassign_unsuitable_ips()) and just call can_node_host_ip() and
  can_node_takeover_ip() as appropriate.

* Update test code to handle CTDB_SET_NoIPHostOnAllDisabled.
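The flag setup described in the list above can be sketched roughly as follows. The flag bit values, the struct and the function signatures are hypothetical and only loosely named after the functions mentioned; the sketch also ignores which particular IPs a node is configured for. Inactive nodes never host IPs. If some node is fully enabled, disabled nodes host nothing; if all nodes are disabled, only nodes with NoIPHostOnAllDisabled set drop their IPs. Node flags are only read, and the results go into a separate array.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node flag bits; the real CTDB values differ. */
#define NODE_INACTIVE 0x1u
#define NODE_DISABLED 0x2u

/* Hypothetical per-node IP allocation flags. */
struct ipflags_sketch {
	bool noiptakeover;  /* may keep IPs but must not take new ones */
	bool noiphost;      /* must not host any IPs */
};

static bool all_nodes_are_disabled(const uint32_t *flags, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if ((flags[i] & (NODE_INACTIVE | NODE_DISABLED)) == 0) {
			return false;  /* found a fully enabled node */
		}
	}
	return true;
}

static void set_ipflags(const uint32_t *flags, const bool *tval_noiptakeover,
			const bool *tval_noiphostonalldisabled, size_t n,
			struct ipflags_sketch *out)
{
	bool all_disabled = all_nodes_are_disabled(flags, n);

	for (size_t i = 0; i < n; i++) {
		out[i].noiptakeover = tval_noiptakeover[i];
		out[i].noiphost = (flags[i] & NODE_INACTIVE) != 0;
		if (all_disabled) {
			if (tval_noiphostonalldisabled[i]) {
				out[i].noiphost = true;
			}
		} else if (flags[i] & NODE_DISABLED) {
			out[i].noiphost = true;
		}
	}
}

static bool can_node_host_ip(const struct ipflags_sketch *f)
{
	return !f->noiphost;
}

static bool can_node_takeover_ip(const struct ipflags_sketch *f)
{
	return !f->noiptakeover && can_node_host_ip(f);
}
```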

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 1308a51f73f2e29ba4dbebb6111d9309a89732cc)
2013-05-07 16:20:46 +10:00
Martin Schwenke
ac80824709 recoverd: Factor out new function all_nodes_are_disabled()
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 12aef10e9889760d98f58c8d916f19d069fa381a)
2013-05-07 16:20:46 +10:00
Martin Schwenke
657162fb34 recoverd: Refactor code to get NoIPTakeover tunable from all nodes
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 1fb5352d2b6918fcc6f630db49275d25a3eebe8d)
2013-05-07 16:20:46 +10:00
Martin Schwenke
17521b31b2 recoverd: Add debug message when dropping IPs in IP allocation
Update tests accordingly.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 91405282ba4abad4ad8e8c5f7ee4c83c75f38280)
2013-05-07 16:20:46 +10:00
Martin Schwenke
745c6bc363 recoverd: ctdb_takeover_run() uses CTDB_CONTROL_IPREALLOCATED
This means "ipreallocated" is now run on stopped nodes.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 83b61f7414b1f7a3424497ac987ca0724fba9eaa)
2013-05-06 13:38:21 +10:00
Martin Schwenke
2e59cd5428 ctdbd: New control CTDB_CONTROL_IPREALLOCATED
This is an alternative to using ctdb_run_eventscripts() that can be
used when in recovery.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 27a44685f0d7a88804b61a1542bb42adc8f88cb1)
2013-05-06 13:38:21 +10:00
Amitay Isaacs
77a29b3733 recoverd/takeover: Use IP->node mapping info from nodes hosting that IP
When collating IP information for IP layout, only trust the nodes that are
hosting an IP to have correct information about that IP.  Ignore what all the
other nodes think.
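The collation rule can be sketched with a hypothetical report record. Each report is one node's claim about which node hosts a given IP. The sketch accepts a claim only when the reporting node says it hosts the IP itself, and ignores every other node's opinion.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one node's view of one public IP. */
struct ip_report {
	uint32_t reporter;  /* PNN of the node sending the report */
	size_t ip_index;    /* which public IP the report is about */
	int32_t pnn;        /* node the reporter thinks hosts the IP, or -1 */
};

/* Fill ip_owner[] (-1 = unknown), trusting only self-reported hosting. */
static void collate_ip_owners(const struct ip_report *r, size_t nreports,
			      int32_t *ip_owner, size_t nips)
{
	for (size_t i = 0; i < nips; i++) {
		ip_owner[i] = -1;
	}
	for (size_t i = 0; i < nreports; i++) {
		if (r[i].ip_index >= nips) {
			continue;
		}
		if (r[i].pnn >= 0 && (uint32_t)r[i].pnn == r[i].reporter) {
			ip_owner[r[i].ip_index] = r[i].pnn;
		}
	}
}
```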

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 1c7adbccc69ac276d2b957ad16c3802fdb8868ca)
2013-04-08 11:14:32 +10:00
Martin Schwenke
53bd183683 recoverd: Separate each IP allocation algorithm into its own function
This makes the code much more readable and maintainable.

As a side effect, fix a memory leak in LCP2.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 6a1d88a17321f7e1dc84b4823d5e7588516a6904)
2013-01-08 10:16:11 +11:00
Martin Schwenke
2e8df43561 recoverd: New function unassign_unsuitable_ips()
Move the code into a new function so it can be called from a number of
places.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 8adb255e62dbe60d1e983047acd7b9c941231d11)
2013-01-08 10:16:11 +11:00
Martin Schwenke
bcefb76884 recoverd: Move failback retry loop into basic_failback() and lcp2_failback()
The retry loop is currently in ctdb_takeover_run_core().  Pushing it
into each function will make it possible to put each algorithm into a
separate top-level function.  This will make the code much clearer and
more maintainable.

Also keep associated test code compatible.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit f6ce18d011dd9043b04256690d826deb2640cd89)
2013-01-08 10:16:11 +11:00
Martin Schwenke
443fbb9e01 recoverd: Trying to failback more IPs no longer allocates unassigned IPs
Neither basic_failback() nor lcp2_failback() unassign IPs anymore, so
there's no point looping back that far.

Also fix a unit test that now fails because looping back to handle
unassigned IPs is no longer logged.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit c09aeaecad7d3232b1c07bab826b96818756f5e0)
2013-01-08 10:16:11 +11:00
Martin Schwenke
dfa7ce7b73 recoverd: basic_failback() can call find_takeover_node() directly
This avoids unassigning, looping back and depending on
basic_allocate_unassigned.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 4dc08e37dec464c8785a2ddae15c7c69d3c81ac3)
2013-01-08 10:16:11 +11:00
Martin Schwenke
326328d520 recoverd: Don't do failback at all when deterministic IPs are in use
This seems to be the right thing to do instead of calling into the
failback code and continually skipping the release of an IP.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 4c87e7cb3fa2cf2e034fa8454364e0a7fe0c8f81)
2013-01-08 10:16:11 +11:00
Martin Schwenke
ef403f70f2 recoverd: Move the test for both 'DeterministicIPs' and 'NoIPFailback' set
If this is done earlier then some other logic can be improved.  Also,
this should be a warning since no error condition is set.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit e06476e07197b7327b8bdac9c0b2e7281798ffec)
2013-01-08 10:16:11 +11:00
Martin Schwenke
a3911ed7bf recoverd: Fix a memory leak in IP allocation
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit bcd5f587aff3ba536cb0b5ef00d2d802352bae25)
2013-01-08 10:16:11 +11:00
Martin Schwenke
4f0d68cba6 ctdbd: Clean up orphaned interfaces when an IP is deleted
Add a new function ctdb_remove_orphaned_ifaces() and call it in
ctdb_control_del_public_address().

ctdb_remove_orphaned_ifaces() uses a naive implementation that does
things in a very obvious way.  There are many ways to improve the
performance - some are mentioned in a comment in the code.  However, I
doubt that this will be a bottleneck even with a large number of
public IPs.  Running the eventscript is likely to outweigh the cost of
this cleanup.
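A naive cleanup in the spirit described above can be sketched as a nested scan. The representation is an assumption: `iface_refs[]` is a flat list of every interface name referenced by any remaining public IP (an IP with several interfaces contributes several entries). Interfaces not found in that list are dropped and the array is compacted in place, at O(interfaces × references) cost.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical: keep only interfaces that some public IP still refers to.
 * Returns the new number of interfaces in ifaces[].
 */
static size_t remove_orphaned_ifaces(const char **ifaces, size_t nifaces,
				     const char *const *iface_refs, size_t nrefs)
{
	size_t kept = 0;

	for (size_t i = 0; i < nifaces; i++) {
		bool used = false;

		for (size_t j = 0; j < nrefs && !used; j++) {
			used = strcmp(ifaces[i], iface_refs[j]) == 0;
		}
		if (used) {
			ifaces[kept++] = ifaces[i];
		}
	}
	return kept;
}
```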

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit cc1a3ae911d3fee8b87fda5de5ab6d9499d7510a)
2013-01-07 12:19:33 +11:00
Martin Schwenke
0f1bcebc80 ctdbd: Make the link status of new interfaces more flexible
Neither up nor down is a good default value for the link status of a
new interface.  Up means that IPs can be assigned to interfaces before
the true state is known and they can move away quickly if the interface
is actually down.  Down means that IPs can't be assigned to an interface
for a variable amount of time - until a monitor cycle occurs - and this
can result in imbalanced IPs.

This is a neat compromise.  Before the startup event completes, IPs
can't be assigned to interfaces because all interfaces begin in a down
state.  As soon as the startup event completes, IPs can be allocated
to any interface that has been marked up by the eventscript.  Later,
during normal operation, newly added IPs can be assigned to new
interfaces immediately.  The IPs will still move away if an interface
is noticed to be down in the next monitor cycle, but that is the
exception rather than the rule.
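The compromise can be sketched with a hypothetical interface record. A new interface's initial link state simply follows whether the "startup" event has completed; later, an eventscript or monitor cycle reports the real state, which is what makes IPs move away from an interface that turns out to be down.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of an interface record. */
struct iface_sketch {
	const char *name;
	bool link_up;
};

/* New interfaces start up only once the "startup" event has completed. */
static struct iface_sketch add_iface(const char *name, bool startup_completed)
{
	struct iface_sketch i = { name, startup_completed };
	return i;
}

/* An eventscript or monitor cycle reports the observed link state. */
static void set_link_state(struct iface_sketch *i, bool observed_up)
{
	i->link_up = observed_up;
}

/* Only interfaces whose link is up may be assigned IPs. */
static bool iface_usable(const struct iface_sketch *i)
{
	return i->link_up;
}
```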

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 9275a69a414482f1053ae14528d5972575b9214e)
2012-11-19 15:53:13 +11:00