1
0
mirror of https://github.com/samba-team/samba.git synced 2025-01-12 09:18:10 +03:00
Commit Graph

73 Commits

Author SHA1 Message Date
Ronnie Sahlberg
d8433cacb2 first cut to convert takeover_callback_state{}
to use ctdb_sock_addr instead of sockaddr_in

(This used to be ctdb commit 5444ebd0815e335a75ef4857546e23f490a22338)
2008-06-04 17:12:57 +10:00
Ronnie Sahlberg
598fba7fad fix a comment
note that we dont actually send the ipv6 "gratious arp" on the wire just yet.
(since ipv6 doesnt use arp)
but all the infrastructure is there when we implement sending raw neig.disc. packets

(This used to be ctdb commit b87fab857bc9b3537527be93b7f68484502d6b84)
2008-06-04 15:23:06 +10:00
Ronnie Sahlberg
7d39ac131b convert handling of gratious arps and their controls and helpers to
use the ctdb_sock_addr structure so tehy work for both ipv4 and ipv6

(This used to be ctdb commit 86d6f53512d358ff68b58dac737ffa7576c3cce6)
2008-06-04 15:13:00 +10:00
Ronnie Sahlberg
92a0c0fc13 lowe the loglevel for the warning that releaseip was called for a non-public address.
the address might be a public address on a different node so no need to fiull up the logs with thoise messages

(This used to be ctdb commit c8181476748395fe6ec5284c49e9d37b882d15ea)
2008-05-21 11:50:41 +10:00
Ronnie Sahlberg
9c23bf7776 lower the loglevel for when we have "tickles" for an ip address that is not a public address on the local node (it may be a public address on other nodes)
(This used to be ctdb commit 1360c2f08a463f288b344d02025e84113743026d)
2008-05-21 11:44:50 +10:00
Ronnie Sahlberg
f4fd4d0af8 dont disable/enable monitoring for each eventscript, instead
just disable the monitoring during the "startrecovery" event and enable it again once recovery has completed

(This used to be ctdb commit 68029894f80804c9f31fc90ed0c1b58f75812c3d)
2008-05-16 08:20:40 +10:00
Ronnie Sahlberg
909ff219e0 Start implementing support for ipv6.
This enhances the framework for sending tcp tickles to be able to send ipv6 tickles as well.

Since we can not use one single RAW socket to send both handcrafted ipv4 and ipv6 packets, instead of always opening TWO sockets, one ipv4 and one ipv6 we get rid of the helper ctdb_sys_open_sending_socket() and just open (and close)  a raw socket of the appropriate type inside ctdb_sys_send_tcp().
We know which type of socket v4/v6 to use based on the sin_family of the destination address.

Since ctdb_sys_send_tcp() opens its own socket  we no longer nede to pass a socket
descriptor as a parameter.  Get rid of this redundant parameter and fixup all callers.

(This used to be ctdb commit 406a2a1e364cf71eb15e5aeec3b87c62f825da92)
2008-05-14 15:47:47 +10:00
Ronnie Sahlberg
0d7b34c9e5 Add two new controls to add/delete public ip address from a node at runtime.
The controls only modify the runtime setting of which public addresses a node
can server and does not modify /etc/ctdb/public_addresses.
To make the change permanent you also need to edit /etc/ctdb/public_addresses
manually.

After ip addresses have been added/deleted you need to invoke a recovery
for the ip addresses to be redistributed.

(This used to be ctdb commit f8294d103fdd8a720d0b0c337d3973c7fdf76b5c)
2008-03-27 09:23:27 +11:00
Ronnie Sahlberg
e19264ea26 change the log level for the message when someone connects to a non-public ip
(This used to be ctdb commit bc9c4f0d52e9b06aceb08cea99ed3fd20b44616c)
2008-03-13 07:54:55 +11:00
Ronnie Sahlberg
a89ed0fdc2 add a new tunable 'NoIPFailback'
when this tunable is set, ip addresses will only be failed over when a node
fails. And only those ip addresses held by the failed node will be reallocated
in the cluster.

When a node becomes active again, this will not lead to any failback of ip addresses.

This can reduce the number of "ip address movements" in the cluster since we dont automatically fail an ip address back, but can also lead to an unbalanced cluster since we no longer attempt to spread the ip addresses out evenly across the active nodes.

This tuneable can NOT be active at the same time as DeterministicIPs are used.

(This used to be ctdb commit d3b8a461b15bc584fa1785eb5922de6d49d8f6c4)
2008-03-03 12:52:16 +11:00
Ronnie Sahlberg
e08519b74d when we reallocate the ip addresses for nodes, we must make sure that
a node that has been allocated to server an ip actually CAN serve that ip
(if we use differing public_addresses files on each node)

(This used to be ctdb commit fdaf7cb2d7682507fbf4c6c2b833b327c93fac08)
2008-03-03 10:53:23 +11:00
Andrew Tridgell
f6e53f433b merge from ronnie
(This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c)
2008-02-04 20:07:15 +11:00
Andrew Tridgell
9d6ac0cf55 added debug constants to allow for better mapping to syslog levels
(This used to be ctdb commit 7ba8f1dde318eab03f4257e5a89fd23e7281e502)
2008-02-04 17:44:24 +11:00
Andrew Tridgell
146d4b0db7 merge async recovery changes from Ronnie
(This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2)
2008-01-29 13:59:28 +11:00
Andrew Tridgell
e9987cf236 fixed a warning
(This used to be ctdb commit f34d0f9351c1cda3327efb14e173f249f7854570)
2008-01-05 09:30:49 +11:00
Andrew Tridgell
7edb41692e merge from ronnie
(This used to be ctdb commit 6653a0b67381310236e548e5fc0a9e27209b44e0)
2007-12-03 10:19:24 +11:00
Ronnie Sahlberg
50573c5391 add ctdb_disable/enable_monitoring() that only modifies the monitoring
flag.
change calling of the recovered/takeip/releaseip event scripts to use 
these enable/disable functions instead of stopping/starting monitoring.

when we disable monitoring we want all events to still be running
in particular the events to monitor for dead nodes  and we only want to 
supress running the monitor event scripts

(This used to be ctdb commit a006dcc4f75aba950dd701ad7d1a84e89df285e8)
2007-11-30 10:09:54 +11:00
Ronnie Sahlberg
8ac8cce487 dont manipulate ctdb->monitoring_mode directly from the SET_MON_MODE
control, instead call ctdb_start/stop_monitoring()

ctdb_stop_monitoring() dont allocate a new monitoring context, leave it 
NULL. Also set the monitoring_mode in this function so that 
ctdb_stop/start_monitoring() and ->monitoring_mode are kept in sync.
Add a debug message to log that we have stopped monitoring.

ctdb_start_monitoring()  check whether monitoring is already active and 
make the function idempotent.
Create the monitoring context when monitoring is started.
Update ->monitoring_mode once the monitoring has been started.
Add a debug message to log that we have started monitoring.

When we temporarily stop monitoring while running an event script,
restart monitoring after the event script wrapper returns instead of in 
the event script callback.

Let monitoring_mode start out as DISABLED and let it be enabled once we call ctdb_start_monitoring.

dont check for MONITORING_DISABLED in check_fore_dead_nodes(). If 
monitoring is disabled, this event handler will not be called.

(This used to be ctdb commit 3a93ae8bdcffb1adbd6243844f3058fc742f76aa)
2007-11-30 08:44:34 +11:00
Ronnie Sahlberg
3c1f9882a8 revert 773
(This used to be ctdb commit 5a1c8f458ddc9b0ff532afda6007e32db10a71c8)
2007-11-12 10:23:35 +11:00
Ronnie Sahlberg
df5dd43e7c add a new tunable "CheckNodesFile" that when set to 0 will disable the
check in the recovery daemon that all nodes are using the same 
/etc/ctdb/nodes file.

Also add some more missing checks that the pnn used is a valid pnn 
before using it to dereferencing the ctdb->nodes array


This is useful since it allows us to add more physical nodes to a an 
existing cluster without having to bring down the entire cluster.

The to add an additional node to an existing cluster would then be
1, on all nodes set CheckNodesFile=0 using 'ctdb setvar'
2, on all nodes add CTDB_SET_CheckNodesFile=0 to /etc/sysconfig/ctdb
For each each node, one at a time :
3, use 'ctdb disable' to stop the hosted services
4, service ctdb stop
5, service ctdb start
Once all nodes have been restarted 
6, on all nodes remove CTDB_SET_CheckNodesFile=0 from 
/etc/sysconfig/ctdb
7, on all nodes set CheckNodesFile=0 using 'ctdb setvar'

8, configure and start up the new node

During this procedure, only one node at a time was brought 
down/restarted and was so only for a short period.

(This used to be ctdb commit 462501a32143e943ce350bd904a47c0955414a51)
2007-11-05 13:36:11 +11:00
Ronnie Sahlberg
056aac6e0c add a new tunable : DeterministicIPs that makes the allocation of
public addresses to nodes deterministic.

Activate it by adding CTDB_SET_DeterministicIPs=1 in /etc/sysconfig/ctdb

When this is set,    the first entry in /etc/ctdb/public_addresses will 
always be hosted by node 0, when that node is available, the second 
entry by node1 and so on.

This tunable allows the allocation of addresses to become very 
unbalanced and is only for debugging/testing use.
Beware, this feature requires that /etc/ctdb/public_addresses are 
identical on all the nodes in the cluster.

(This used to be ctdb commit f0ca221f235731542090d8a6c86f2b7cd2ce2f96)
2007-10-16 12:15:02 +10:00
Ronnie Sahlberg
bdd67bba1e add a --single-public-ip argument to ctdbd to specify the ip address
used in single public ip address mode.
when using this argument, --public-interface must also be used.

add a vnn structure to the ctdb context to describe the single public ip 
address


update the killtcp control in the daemon that if a socketpair that is to 
be killed does not match a normal public address it checks if the 
destination address maches the single public ip address and if so uses 
that vnn structure from the ctdb context


this allows killtcp to kill also connections to the single public ip 
instead of only normal public addresses

(This used to be ctdb commit 5661ba17b91f62821dec1c76056c78b99752a90b)
2007-10-10 09:42:32 +10:00
Ronnie Sahlberg
7735957693 remove some debug outputs
(This used to be ctdb commit f29c0b52df1f455909ba133e3ad3bc462dc32929)
2007-10-09 13:45:42 +10:00
Ronnie Sahlberg
80cd82f8e4 add a control to send gratious arps from the ctdb daemon
(This used to be ctdb commit 563819dd1acb344f95aabb4bad990b36f7ea4520)
2007-10-09 11:56:09 +10:00
Andrew Tridgell
30de14fe79 force recovery if unable to tell a node to release an IP
(This used to be ctdb commit 6895788d2499344a03357e5c1103cb8383e9eaf7)
2007-09-13 11:19:49 +10:00
Andrew Tridgell
3c0f61cb92 we don't need the is_loopback logic in ctdb any more
(This used to be ctdb commit 4ecf29ade0099c7180932288191de9840c8d90a9)
2007-09-13 10:45:06 +10:00
Andrew Tridgell
67bd64ef35 - don't allow the registration of clients with IPs we don't hold
- change some debug levels to make tracking of IP release problems easier
(This used to be ctdb commit 5f9aed62adaf87750f953412c55b29c58e4bb6c0)
2007-09-12 13:22:31 +10:00
Andrew Tridgell
5b65a6c7f0 get interface right
(This used to be ctdb commit e0edc38d7e897f7de2850eb2cfd17fea75c16fcc)
2007-09-10 20:45:27 +10:00
Andrew Tridgell
f3ae1cdb02 - use struct sockaddr_in more consistently instead of string addresses
- allow for public_address lines with a defaulting interface

(This used to be ctdb commit 29cb760f76e639a0f2ce1d553645a9dc26ee09e5)
2007-09-10 14:27:29 +10:00
Andrew Tridgell
42168177ef merge from ronnie
(This used to be ctdb commit 1f21d4d563232926c35d03c4d69eb69190823dc6)
2007-09-10 13:21:11 +10:00
Ronnie Sahlberg
4ac749bfa4 change the signature to ctdb_sys_have_ip() to also return:
a bool that specifies whether the ip was held by a loopback adaptor or 
not
 the name of the interface where the ip was held

when we release an ip address from an interface, move the ip address 
over to the loopback interface

when we release an ip address  after we have move it onto loopback, 
use 60.nfs to kill off the server side (the local part) of the tcp 
connection   so that the tcp connections dont survive a 
failover/failback

61.nfstickle,   since we kill hte tcp connections when we release an ip 
address   we no longer need to restart the nfs service in 61.nfstickle

update ctdb_takeover to use the new signature for ctdb_sys_have_ip

when we add a tcp connection to kill in ctdb_killtcp_add_connection()
check if either the srouce or destination address match a known public 
address

(This used to be ctdb commit f9fd2a4719c50f6b8e01d0a1b3a74b76b52ecaf3)
2007-09-10 07:20:44 +10:00
Ronnie Sahlberg
e4eeceaf3a dont dereference vnn before we have assigned it a pointer value
(This used to be ctdb commit 2a8fc69aea8527b22a3fe57427677e4caff57338)
2007-09-05 14:29:44 +10:00
Ronnie Sahlberg
77ec4d5248 allow different nodes in the cluster to use different public_addresses
files
so that we can partition the cluster into different subsets of nodes 
which each serve a different subset of the public addresses

(This used to be ctdb commit 889e0fe69e4c88c6166282b12843b8d9727552d6)
2007-09-04 23:15:23 +10:00
Ronnie Sahlberg
8f819c6a0e get rid of the ctdb_vnn_list structure and just use a single list of
ctdb_vnn

(This used to be ctdb commit 7b9fd06321af17043136b1420b57284450ae7ba5)
2007-09-04 18:20:29 +10:00
Ronnie Sahlberg
cf45c5096c we cant have takeover_ctx hanging off ctdb since it is freed/recreated
everytime we release an ip.
this context is used to hold all resources needed when sending out 
gratious arps and tcp tickles during ip takeover.

we hang it off the vnn structure that manages that particular ip address 
instead   so that we can have multiple ones going in parallell

this bug (or the same bug in different shape) has probably been in ctdb 
for very very long   but is likely to be hard to trigger

(This used to be ctdb commit c58db1cadaba253b2659573673b28c235ef7db76)
2007-09-04 14:36:52 +10:00
Ronnie Sahlberg
3e6be59f61 fix typo in debug output
(This used to be ctdb commit 011a777c6e538ca79f104c7884a4f0e222997382)
2007-09-04 14:21:35 +10:00
Ronnie Sahlberg
784eac9079 dont just always return 0 from the killtcp control.
return 0 or -1 so that the ctdb tool knows whether the control succeeded 
or not

(This used to be ctdb commit cace8b40090be5529ec6b463d3839d0e22f4039d)
2007-09-04 14:19:18 +10:00
Ronnie Sahlberg
eb4cf6a686 change ctdb->vnn to ctdb->pnn
(This used to be ctdb commit 8c776e5707e503ec6586aae39ac6b3ea5a2fd2bc)
2007-09-04 10:06:36 +10:00
Ronnie Sahlberg
12ebb74838 change how we do public addresses and takeover so that we can have
multiple public addresses spread across multiple interfaces on each 
node.

this is a massive patch since we have previously made the assumtion that 
we only have one public address per node.

get rid of the public_interface argument.  the public addresses file 
now explicitely lists which interface the address belongs to

(This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8)
2007-09-04 09:50:07 +10:00
Andrew Tridgell
7f630b67f6 fixed segv when no public interface is set
(This used to be ctdb commit 55b415f87bd3cba13c73ccd2fe661720754a6af7)
2007-08-27 11:49:42 +10:00
Ronnie Sahlberg
7e1f840c8d if a public address has already been taken over by a node, then let that
public address remain at that node until either the node becomes 
unhealthy or the original/primary node for that address becomes healthy 
again.


Othervise what will happen is 
1, if we ban a node,   the banning code immediately does a 
takeover_run() and reassigns the public address to a different node in 
the cluster.
2, a few seconds later (at most) the recovery daemon will detect that 
the number of nodes has shrunk and will initiate a recovery.
During the recovery  the public address would again be assigned to a 
node, this time a different node.

(This used to be ctdb commit 30a6b7a648e22873d8ce6289a3d6dc42c4b9e3b3)
2007-08-20 14:16:58 +10:00
Ronnie Sahlberg
3b9d50f3ee change the now rather small /etc/ctdb/events script into a service
specific script /etc/ctdb/events.d/00.ctdb

get rid of CTDB_EVENTS_SCRIPT and --event-script

(This used to be ctdb commit 81ccfaf838e5772d4a58eb6a70224b7b39aba9f3)
2007-08-15 15:01:31 +10:00
Ronnie Sahlberg
4023576e50 call the service specific event scripts directly from the forked child
instead for from /etc/ctdb/events so that we can get better debugging 
output in the logs when something fails in the scripts

(This used to be ctdb commit 4ed96b768aea1611e8002f7095d3c4d12ccf77a3)
2007-08-15 14:44:03 +10:00
Ronnie Sahlberg
56d5ef27b6 add a wrapper function to create the key used to insert/lookup a certain
tcp connection in the tree that stores the tcp connections to kill by 
sending an RST

add a define that specified the keylength instead of hardcoding it as 4

(This used to be ctdb commit 6a8322cbae10f2c78b2e286c75aeb25ece12ea7f)
2007-08-15 10:01:00 +10:00
Ronnie Sahlberg
adb49f02f0 change the mem hierarchy for trees. let the node be owned by the data
we store in the tree and use a node destructor so that when the data is 
talloc_free()d we also remove the node from the tree.

(This used to be ctdb commit b8dabd1811ebd85ee031563e95085f720a2fa04d)
2007-08-09 14:08:59 +10:00
Ronnie Sahlberg
9c216d0d76 when we want to kill a tcp connection we stored the connection
description (src + dst sockaddr_in) in a linked list.
everytime we receive a captured packet from the network we had to walk 
this list in linear time to see if the packet matched a connection we 
wanted to RST.
which wouldnt scale very well.


replace the linked list with a redblack tree that is indexed by
src address, src port,  dst address,   dst port
to make checking whether the packet belongs to a connection we want to 
RST very fast and scalable


the reason we need to capture packets when we want to kill a TCP 
connection is because we must wait for an ACK coming back from the 
remote host  so that we can learn which sequence number to use in the 
RST.
Most tcp today will ingore any and all RST segments unless the 
sequencenumber lies exactly on the right edge of the window to make 
spoofing RST a little bit more difficult.

(This used to be ctdb commit ced18caea8582af042287beb6333dd1f8ba3344d)
2007-08-08 15:09:19 +10:00
Ronnie Sahlberg
203306400e add helpers to traverse a tree where the key is an array of uint32
(This used to be ctdb commit d328c66827cafff6356e96df2a782930274fe139)
2007-08-08 13:50:18 +10:00
Ronnie Sahlberg
dd14afe6aa after we have checked dest address that it is a public address
update addr to the source address so the rpintout in the log matches
the client that attached to samba

(This used to be ctdb commit 72098b71c79469c86769ca82bbd484c81902d27c)
2007-07-30 16:10:14 +10:00
Ronnie Sahlberg
e666808f60 no need to have a separate assignment of the tcparray pointer followed
by a talloc_steal()
use the returned pointer in talloc_steal as the value to assign

(This used to be ctdb commit 5c6375ad3bbecfa725ec3b1477f259e5a8191866)
2007-07-25 08:03:58 +10:00
Ronnie Sahlberg
81294825e7 when we build the arp structure for sending gratious arp (and tcp
tickles) just talloc_steal the enture tcp_array into the arp 
structure instead of copying each of the entries into a linked list
and then releasing the tcparray.

(This used to be ctdb commit 468e237740cf37a65872ef700bbb1284ede8352a)
2007-07-24 07:46:51 +10:00