1
0
mirror of https://github.com/samba-team/samba.git synced 2024-12-24 21:34:56 +03:00
Commit Graph

149 Commits

Author SHA1 Message Date
Ronnie Sahlberg
c93a968619 When trying to re-balance the ip assignment and shuffle ips from
nodes with many addresses to nodes with few addresses,
loop up to num_ips+5 times instead of only 5 times.

When we have very many public ips per node, we might need to loop more than
5 times or else we will exit without reaching optimal balance.

(This used to be ctdb commit aa8114a625a637277561a66c80bdece3c27e9e20)
2011-07-06 13:14:13 +10:00
Ronnie Sahlberg
f84bd3b5f1 Dont call the UPDATE event if both old and new interface is the same.
CQ S1018175

(This used to be ctdb commit 6a74515f0a1e24d97cee3ba05d89133aac7ad2b7)
2011-05-04 13:29:29 +10:00
Ronnie Sahlberg
c04505724a IFACE handling. Assume links are always good on nstartup (they almost always
Simplify the handling of setting the links in the 10.interface eventscript
and remove the optimization to only call setifacelink on state change
to make the code simpler to read.

If a take ip event fails, flag the node as unhealthy.

Add a check to the interface script to check if the interface exists
or if it has been deleted.
So that we can capture and become UNHELTHY if someone deletes an interface
we are using to host public addresses.

(This used to be ctdb commit 4ab63d2a7262aff30d5eced184c294c9c9dd4974)
2011-04-11 07:40:05 +10:00
Ronnie Sahlberg
f82936402f IP reallocation. If a public address is already hosted on the node when we startup, log a warning message but do not cause the recovery to fail.
CQ S1022356

Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit 89f8169c24da96c1fdd0ac19b8a1e0e1df01a72a)
2011-03-14 13:35:53 +01:00
Ronnie Sahlberg
93bea39391 IPALLOCATION : If the node is held pinned down in "init" state
by external services failing to start, or blocking CTDBD from finishing the startup phase,
we can encounter a situation where we have not yet fully initialized, but a
remote recovery master tries to release a certain ip clusterwide.

In this situation the node that is pinned down in init/startup phase
would fail to perform the release of the ip address since we are not yet fully operational and not yet host any valid interfaces.

In this situation, we just need to remain unhealthy, there is on need to
also ban the node.

Remove the autobanning for this condition and just let the node remain in
unhealthy mode.
Banning is overkill in this situation when the system is broken and just
draws attention to ctdbd instead of the root cause.

(This used to be ctdb commit d8af74e4c4961deb94c18dde8ba7fc07e944729c)
2011-01-13 09:42:01 +11:00
Ronnie Sahlberg
a9a6ae064d When assigning the single-public-ip during startup,
flag the interface as initially being "link ok"
so that we can add it and startup.

The eventscript can later drop the flag if required

(This used to be ctdb commit 720849b756c825fb8b285f09972a8c39f1888a99)
2010-12-13 14:24:04 +11:00
Ronnie Sahlberg
c2c53db49d during ip allocation, there are failure modes where a node might hold a ip address
but thinks it is still unassigned (-1).

add code to the recovery daemon to detect this case and trigger a reallocation
so that the ip gets covered

and change the takeip code to allow for this condition, taking on an ip address that is
already hosted.

cq s1021073

(This used to be ctdb commit 9020baf27cab7821c9094cda185206fb7af0fee7)
2010-12-03 13:30:39 +11:00
Ronnie Sahlberg
dbcf0de18c Dont exit the update ip function if the old and new interfaces are the same
since if they are the same for whatever reason this triggers the system
to go into an infinite loop and is unrobust

The scriptds have been changed instead to be able to cope with this
situation for enhanced robustness

During takeover_run and when merging all ip allocations across the cluster
try to kepe track of when and which node currently hosts an ip address
so that we avoid extra ip failovers between nodes

(This used to be ctdb commit cf778b5aaf6356401e3985acccc7df9e08ab6930)
2010-11-10 14:55:25 +11:00
Ronnie Sahlberg
6fa8e1fddb when we load the public address file, at the same time check if we are already hosting the public address, if so, set ourselves up as the pnn for that address
(This used to be ctdb commit 0f2a2dac91a61be188c3578c8bb89d47cbf9a0f8)
2010-11-10 14:55:24 +11:00
Ronnie Sahlberg
5f76f3c0e2 Add a new tunable : DisableIPFailover that when set to non 0
will stopp any ip reallocations at all from happening.

(This used to be ctdb commit d8d37493478a26c5f1809a5f3df89ffd6e149281)
2010-11-10 14:55:24 +11:00
Ronnie Sahlberg
87a0ece976 when creating/adding a public ip, set the initial interface to be the first interface specified
(This used to be ctdb commit 4308935ba48ac7a29e7523315acf580019715f0f)
2010-11-10 14:55:23 +11:00
Ronnie Sahlberg
d8d8b9e1d7 add a new serverid to send a message everytime an ip address is taken on the local node
(This used to be ctdb commit 1261f3d9702800a4e59550c881350daf479f00ef)
2010-09-13 15:43:19 +10:00
Ronnie Sahlberg
19211f99c8 remove an unused variable
(This used to be ctdb commit e07fdbaf12bbe84370bc47a1979fe198a06a6cc8)
2010-09-13 13:13:12 +10:00
Ronnie Sahlberg
c95f4258d8 Add a new event "ipreallocated"
This is called everytime a reallocation is performed.

    While STARTRECOVERY/RECOVERED events are only called when
    we do ipreallocation as part of a full database/cluster recovery,
    this new event can be used to trigger on when we just do a light
    failover due to a node becomming unhealthy.

    I.e. situations where we do a failover but we do not perform a full
    cluster recovery.

    Use this to trigger for natgw so we select a new natgw master node
    when failover happens and not just when cluster rebuilds happen.

(This used to be ctdb commit 7f4c591388adae20e98984001385cba26598ec67)
2010-08-30 18:09:30 +10:00
Ronnie Sahlberg
2e8aac6689 Merge commit 'rusty/ports-from-1.0.112' into foo
(This used to be ctdb commit 13e58d92f5f1723e850a82ae030d0ca57e89b1ee)
2010-08-19 13:17:56 +10:00
Ronnie Sahlberg
5aa5f3e7bf Remove the structure ctdb_control_tcp_vnn since this is identical to the structure ctdb_tcp_connection.
Add a new "ctdb deltickle" command to delete tickles from the database.
This can ONLY be used for tickles created by "ctdb addtickle".

Push any "addtickle/deltickle" updates to other nodes every TickleUpdateInterval seconds'

(This used to be ctdb commit acded034e2f0dcae4c2c9e54e16a001caf23caec)
2010-08-18 12:36:03 +10:00
Rusty Russell
1a009aff73 takeover: prevent crash by avoiding free in traverse on RST timeout
After 5 attempts to send a RST to a client without any response, we free
"con"; this is done during a traverse.  This frees the node we are walking
through (the node is made a child of "con" down in rb_tree.c's
trbt_create_node() (Valgrind would catch this, as Martin confirmed).

So, we create a temporary parent and reparent onto that; then we free
that parent after the traverse, thus deleting the unwanted nodes.

CQ:S1019041
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

(This used to be ctdb commit 08f7f85477610a4916c1ec866aa467b28f1bbec3)
2010-08-18 11:40:17 +09:30
Rusty Russell
f93440c4b7 event: Update events to latest Samba version 0.9.8
In Samba this is now called "tevent", and while we use the backwards
compatibility wrappers they don't offer EVENT_FD_AUTOCLOSE: that is now
a separate tevent_fd_set_auto_close() function.

This is based on Samba version 7f29f817fa.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 85e5e760cc91eb3157d3a88996ce474491646726)
2010-08-18 09:16:31 +09:30
Ronnie Sahlberg
4136f27145 When adding an ip at runtime, it might not yet have an iface assigned to it, so ensure that the next takover_ip call will fall through to accept the ip and add it.
(This used to be ctdb commit 2d60f96680d16c2992e2a35517822f88c12538b7)
2010-06-01 16:22:48 +10:00
Ronnie Sahlberg
92340e4d6f check if vnn is a valid pointer before dereferencing it
based on rustys patch for bz62783

(This used to be ctdb commit bdd250b9afdd1060cfd1e2b0f0a5a567150bb380)
2010-05-26 13:43:28 +10:00
Ronnie Sahlberg
4a43428440 The recent change to the recovery daemon to keep track of and
verify that all nodes agree on the most recent ip address assignments
broke "ctdb moveip ..." since that call would never trigger
a full takeover run and thus would immediately trigger an inconsistency.

Add a new message to the recovery daemon where we can tell the recovery daemon to update its assignments.

BZ62782

(This used to be ctdb commit e7069082e5f0380dcddee247db8754218ce18cab)
2010-05-03 15:47:17 +10:00
Ronnie Sahlberg
c3c7aa934f Make create_merged_ip_list() a static function since
it is not called from outside of ctdb_takeover.c

(This used to be ctdb commit 880896a27adfdd5173b2810b6b2f3889802046f0)
2010-05-03 15:47:06 +10:00
Ronnie Sahlberg
79fac9771d In the log message when we have found an inconsistent ip address allocation,
add extra log information about what the inconsistency is.

(This used to be ctdb commit d2e4a9912c4bd13eb4f12681adebe7e59a6d1fb2)
2010-05-03 15:46:36 +10:00
Ronnie Sahlberg
06885ea9a7 In the recovery daemon, keep track of which node we have assigned public ip
addresses and verify that the remote nodes have/keep a consistent view of
assigned addresses.

If a remote node has an inconsistent view of addresses visavi the recovery
master this will trigger a full ip reallocation.

(This used to be ctdb commit f3bf2ab61f8dbbc806ec23a68a87aaedd458e712)
2010-04-08 14:25:26 +10:00
Ronnie Sahlberg
7f2f7364ad lower the loglevel for a debug message for redundant releases of public ips
(This used to be ctdb commit cfc1a4f878b61c85063af649d2339431e799647d)
2010-02-16 11:01:09 +11:00
Stefan Metzmacher
76cb4ce34c server: ban ourself if the ctdb and kernel knowledge of a public ip differs
metze

(This used to be ctdb commit 48e0af91113d6cead6cae3f28d8d8f610cacaa71)
2010-01-20 11:11:04 +01:00
Stefan Metzmacher
405368eeb0 server: give an error if we're getting an takeover_ip event with a wrong pnn
metze

(This used to be ctdb commit 2f44d6f3d290cc1b37b19ec34edfbad12cc0c0a7)
2010-01-20 11:11:04 +01:00
Stefan Metzmacher
a5ba5c129a server: return an error if we get an takeover ip event and we cannot serve the ip
metze

(This used to be ctdb commit f5c221e6abc118aefa489aa7e07755af952fd2bb)
2010-01-20 11:11:03 +01:00
Stefan Metzmacher
55d824bd77 server: print node number as signed integer on release ip event
metze

(This used to be ctdb commit 6c456face30606641f6b8beaad3121c9b05ca763)
2010-01-20 11:11:03 +01:00
Stefan Metzmacher
c5e579b56a server: debug redundant takeover ip events with level INFO
metze

(This used to be ctdb commit 7bc9969c4c28f2c4a4848bd730db3c63bb9204fe)
2010-01-20 11:11:03 +01:00
Stefan Metzmacher
ffdf32dedf server: be less verbose on redundant release_ip events
metze

(This used to be ctdb commit 72ef5f891f85ce51f5ca7e0c03d0c7cc955be110)
2010-01-20 11:11:03 +01:00
Stefan Metzmacher
58d7c44b1c server: add a ctdb_do_updateip()
metze

(This used to be ctdb commit eded224368dded2264e53546c196b1b485cb2094)
2010-01-20 11:11:02 +01:00
Stefan Metzmacher
aa485b17bb server: split out a ctdb_do_takeover_ip() function
metze

(This used to be ctdb commit 8fd6f4aab0c173b4c9c4c02c546e7d2ec1a98423)
2010-01-20 11:11:02 +01:00
Stefan Metzmacher
da59e0b162 server: split out a ctdb_announce_vnn_iface() function
metze

(This used to be ctdb commit ec87a51660cfa8a6851923f757fed31f7ffc7153)
2010-01-20 11:11:02 +01:00
Stefan Metzmacher
179c098e86 server: start with disabled interfaces and let the event scripts enable the interfaces explicit
This makes sure that we don't get public addresses assigned during the
initial recovery and remove them again in the startup event.

metze

(This used to be ctdb commit f872e8c63a2f8979e6a0d088630575bdd4d7b4f1)
2010-01-20 11:11:01 +01:00
Stefan Metzmacher
f4f72024fe server: implement ctdb_control_set_iface_link()
This only marks the interface status and doesn't
generate any directly triggered action.

The actions is later taken by the recovery process
in verify_ip_allocation.

metze

(This used to be ctdb commit cff58b27c970e9252d131125941c372019fd6660)
2010-01-20 11:10:59 +01:00
Stefan Metzmacher
0dd7e1bfa1 server: implement ctdb_control_get_ifaces()
metze

(This used to be ctdb commit 0e982a416a126d9856145c19baef320cd0e71d66)
2010-01-20 11:10:59 +01:00
Stefan Metzmacher
80e3ab04de server: implement ctdb_control_get_public_ip_info()
metze

(This used to be ctdb commit 486fbd15f4cc4f45a4c110b2ddbba48bade22c9f)
2010-01-20 11:10:59 +01:00
Stefan Metzmacher
32d00d0a0d controls: add stups for GET_PUBLIC_IP_INFO, GET_IFACES and SET_IFACE_LINK_STATE
metze

(This used to be ctdb commit a2c9e4578e149eccb2c6183f64a6b657eb95c5e1)
2010-01-20 11:10:59 +01:00
Stefan Metzmacher
37880b0d0a server: use CTDB_PUBLIC_IP_FLAGS_ONLY_AVAILABLE during a takeover run
We know ask for the known and available interfaces.
This means a node gets a RELEASE_IP event for all interfaces
it "knows", but doesn't serve and a node only gets a TAKE_IP event
for "available" interfaces.

metze

(This used to be ctdb commit a695a38e49e7c3e15a9706392dc920eeab1f11ba)
2010-01-20 11:10:59 +01:00
Stefan Metzmacher
d89604afab server: implement CTDB_PUBLIC_IP_FLAGS_ONLY_AVAILABLE behavior
metze

(This used to be ctdb commit 09a5c59bc8d1301edf60d7ae77504dc6d11a7da2)
2010-01-20 11:10:58 +01:00
Stefan Metzmacher
bea53c60b8 server: keep the interface information in a list of ctdb_iface structures
metze

(This used to be ctdb commit ff5291778f0752e176539397e9530dcf0e546bea)
2010-01-20 11:10:58 +01:00
Stefan Metzmacher
539ebdc94c server: we don't need to copy strings we pass as talloc_asprintf() arguments
metze

(This used to be ctdb commit 080ba5ac2195fb73ef6f18740abdde57a7b97151)
2010-01-20 11:10:58 +01:00
Stefan Metzmacher
a1da4e05b5 server: allow multiple interfaces comma separated in public_addresses
metze

(This used to be ctdb commit 33a00ef7233051acdbc66410130ec5d876a8422f)
2010-01-20 11:10:58 +01:00
Stefan Metzmacher
8d50eda2b1 server: add a ctdb_vnn_iface_string() helper function to access vnn->iface
metze

(This used to be ctdb commit 9e5532e215892b2e0aadd9b106a730727f92c62e)
2010-01-20 11:10:58 +01:00
Stefan Metzmacher
bec35e6441 server: add a ctdb_set_single_public_ip() helper function
metze

(This used to be ctdb commit 400b4806c4a9686a2ee6398b5d7c3e0ca0793fd1)
2010-01-20 11:10:57 +01:00
Rusty Russell
928b8dcb31 eventscript: handle banning within the callbacks
Currently the timeout handler in eventscript.c does the banning if a
timeout happens.  However, because monitor events are different, it has
to special case them.

As we call the callback anyway in this case, we should make that handle
-ETIME as it sees fit: for everyone but the monitor event, we simply ban
ourselves.  The more complicated monitor event banning logic is now in
ctdb_monitor.c where it belongs.

Note: I wrapped the other bans in "if (status == -ETIME)", though they
should probably ban themselves on any error.  This change should be a
noop.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 9ecee127e19a9e7cae114a66f3514ee7a75276c5)
2009-12-07 23:48:57 +10:30
Ronnie Sahlberg
569001afd0 Merge commit 'martins/status-test-2'
Conflicts:

	server/eventscript.c

(This used to be ctdb commit e9b3477a5b9a2eff18f727e7d59338bfb5214793)
2009-12-01 10:53:18 +11:00
Martin Schwenke
a64ccf07c1 Add flag to ctdb_event_script_callback indicating when called by client.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit a1d654a982ca56fade82552f4e6b5586236d3233)
2009-11-26 15:49:49 +11:00
Ronnie Sahlberg
926261aafc use a binary tree and sort all ipv4/v6 addresses before we assign them out on nodes.
(This used to be ctdb commit 862526e558099fad4c8259cb88da9b776aa7f80d)
2009-11-25 11:54:40 +11:00