1
0
mirror of https://github.com/samba-team/samba.git synced 2025-01-12 09:18:10 +03:00
Commit Graph

1491 Commits

Author SHA1 Message Date
Ronnie Sahlberg
fe7b5b4d85 add a stub restart method for IB
(This used to be ctdb commit d318504ad5a49dbdfa307be39ae88df839e6308d)
2007-10-19 09:04:52 +10:00
Ronnie Sahlberg
d1ba047b7f add a new transport method so that when a node is marked as dead, we
shut down and restart the transport

othervise, if we use the tcp transport the tcp connection might try to 
retransmit the queued data during the time the node is unavailable.
this together with the exponential backoff for tcp means that the tcp 
connection quickly reaches the maximum backoff rto which is often 60 or 
120 seconds.   this would mean that it could take up to 60/120 seconds 
before the tcp layer detects that the connection is dead and it has to 
be reestablished.

(This used to be ctdb commit 0256db470879ce556b0f00070f7ebeaf37e529ab)
2007-10-19 08:58:30 +10:00
Ronnie Sahlberg
755511d28d set the flags explicitely isnstead of masking them in
(This used to be ctdb commit 27a5f9dead44890683f9dbc4f07cda11264aa03b)
2007-10-18 16:54:00 +10:00
Andrew Tridgell
b814462c38 added some debug lines to help track down a problem
(This used to be ctdb commit 2ca31e9de179f76e392a26cc8305e2473357c760)
2007-10-18 16:27:36 +10:00
Ronnie Sahlberg
06cdb1ff31 merge from tridge
(This used to be ctdb commit ad03e63906270c9c076ffdb1f62f912bb414ea10)
2007-10-18 15:53:50 +10:00
Andrew Tridgell
5e3d5b1314 merge from ronnie
(This used to be ctdb commit a6b094fdede0ae850e87877fad0b9dd1f3a26869)
2007-10-18 15:51:15 +10:00
Andrew Tridgell
d939a2901b merge from ronnie
(This used to be ctdb commit 75d4b386293e186a6bb8532515585ab72670d663)
2007-10-18 15:44:02 +10:00
Ronnie Sahlberg
e4ec6e9d6b flush the route cache when we have added the single public ip to the
node

cleanup and remove everything when we do a shutdown event

(This used to be ctdb commit 221432f45073bc7624803058c8bbf18838e7ceeb)
2007-10-18 14:13:48 +10:00
Ronnie Sahlberg
537841fadb use NF_DROP instead of NF_STOLEN when we tell the kernel to not worry
about this packet any more and just forget it ever saw it

(This used to be ctdb commit 42a2a777cbc15a8cbbea7ecf2fb1c6dafa242d0c)
2007-10-17 15:03:58 +10:00
Ronnie Sahlberg
9a93f4b8df reverse the order in which public ips are listed so it matches the order
of the public_addresses file

(This used to be ctdb commit ce987661edd9160982e65866fb773445d296e5c7)
2007-10-17 13:42:42 +10:00
Ronnie Sahlberg
805ba22d65 merge from tridge
(This used to be ctdb commit 87760a95ec0a9e3cb2c415c569235a1ff58318cb)
2007-10-17 10:10:52 +10:00
Andrew Tridgell
85f91b9d5c increase release number
(This used to be ctdb commit 69fe7ce1d7874ce51d79de29adc53c207cb8869f)
2007-10-16 20:14:04 +10:00
Andrew Tridgell
6b9d73a96d more detail on multipath config
(This used to be ctdb commit 78c44f2267cbef5fbc57d56dfd5ff40972733a1f)
2007-10-16 20:13:28 +10:00
Ronnie Sahlberg
ce7a054d20 add back the test inside the daemon that if someone asks us to drop
recovery mode back to NORMAL that we can not lock the reclock file   
since at this stage it MUST be locked by the recovery daemon.

in order to avoid a non-blocking fnctl() lock from blocking and cause 
"issues"  we move the 'test that we can not lock reclock file' into a 
child process.

(This used to be ctdb commit 3af994641ec2234e37da1fa1f693441586471a7e)
2007-10-16 15:27:07 +10:00
Ronnie Sahlberg
056aac6e0c add a new tunable : DeterministicIPs that makes the allocation of
public addresses to nodes deterministic.

Activate it by adding CTDB_SET_DeterministicIPs=1 in /etc/sysconfig/ctdb

When this is set,    the first entry in /etc/ctdb/public_addresses will 
always be hosted by node 0, when that node is available, the second 
entry by node1 and so on.

This tunable allows the allocation of addresses to become very 
unbalanced and is only for debugging/testing use.
Beware, this feature requires that /etc/ctdb/public_addresses are 
identical on all the nodes in the cluster.

(This used to be ctdb commit f0ca221f235731542090d8a6c86f2b7cd2ce2f96)
2007-10-16 12:15:02 +10:00
Ronnie Sahlberg
25d3a031d0 include system/network.h so we get the prototype for inet_aton()
(This used to be ctdb commit 7145764b2d217f88a723dcb0ffd4e5a1567d64cf)
2007-10-16 11:29:33 +10:00
Ronnie Sahlberg
7e2e1b14fb merge from tridge
(This used to be ctdb commit 9e6bc12c9be2dabcfb9c6aeef257ef4737287fab)
2007-10-16 11:26:22 +10:00
Ronnie Sahlberg
b3ff7d904d dont try to lock the file from inside the ctdb daemon.
eventhough we dont want a blocking lock it does appear that the fcntl()
call can block for a while if gpfs is in the process of rebuilding 
itself after a node arriving/leaving the cluster

(This used to be ctdb commit 6c0d206dea7116db71bccb4802a93dd7283249f6)
2007-10-16 09:50:31 +10:00
Andrew Tridgell
d7f6b63f0a only link to -lipq if needed
(This used to be ctdb commit 7c378d881e37db0f14e07ccba19fde1f9f4f0831)
2007-10-15 14:44:06 +10:00
Andrew Tridgell
574db736f2 improved handling of systems without libipq.h
(This used to be ctdb commit cfa8ddd3ca53c0160558137cccfc7e73e46ec36c)
2007-10-15 14:37:54 +10:00
Andrew Tridgell
9570939337 disable ipmux code until we have a configure test
(This used to be ctdb commit fd83f0f3eb233f22ce9b5b4afbc4f26e3c865b3c)
2007-10-15 14:29:47 +10:00
Andrew Tridgell
99bc0aca93 sync flags between nodes in monitor loop in recmaster
(This used to be ctdb commit 6eef86e06388fc53a1212f1e2783ae174c6cd210)
2007-10-15 14:28:51 +10:00
Andrew Tridgell
0e855c0772 merge from ronnie
(This used to be ctdb commit d18712caba11855010be52f90bac656683076676)
2007-10-15 14:17:49 +10:00
Andrew Tridgell
0a5bfdd315 disable optimisation for now, until we find a occasional segv
(This used to be ctdb commit d09570c70551aa40390ce9ceffe7bc234e1afafe)
2007-10-15 13:31:09 +10:00
Andrew Tridgell
174879621e add config option for disabling bans
(This used to be ctdb commit 153b911f7f957d4c564b04f5aa878033a02da9e4)
2007-10-15 13:22:58 +10:00
Ronnie Sahlberg
ebe772b1b2 use $CTDB_BASE in 90.ipmux instead of hardcoding it to /etc/ctdb
(This used to be ctdb commit 6abb46b010851f5719f12273b4a3d46ec986f0c7)
2007-10-11 07:51:57 +10:00
Ronnie Sahlberg
870a57a55b use kill_tcp_connections() to kill off all tcp connections to the
"single public ip" address when we do a recovery

(This used to be ctdb commit 19b52a2d5db31efa9e7c77037097ff8539986ac3)
2007-10-11 07:30:10 +10:00
Ronnie Sahlberg
fa5d51c238 move the kill_tcp_connections() function from 10.interfaces to functions
(This used to be ctdb commit 055948530fb16bf49c42fc4489f29a21665156c0)
2007-10-11 07:27:38 +10:00
Ronnie Sahlberg
1a4999076b first check that recovery master is connected (we know this from our own
flags)

then pull the flags off recovery master before checking if it is banned

(This used to be ctdb commit 94c1d234e57a40eda2d8b892dd9fbe1ffc4b3433)
2007-10-11 07:10:17 +10:00
Ronnie Sahlberg
167e100d4b simplify election handling
make sure we read and update the flags from all remote nodes before we 
reach the first codepath that can call do_recovery()
since during do_recovery() we need to know what the flags are.

(This used to be ctdb commit e85f3806483ea420559d449e0e4d81bec996740f)
2007-10-11 06:16:36 +10:00
Ronnie Sahlberg
33a6aa3c3f merge from tridge
(This used to be ctdb commit 4690a205fe4325b03ab044bdb5fbc9aa3e94db6e)
2007-10-10 10:49:55 +10:00
Andrew Tridgell
011a205b86 make sure reconnected nodes start off as unhealthy so they don't get a public IP
(This used to be ctdb commit c733ec6760cae01ce277f491caf1355e46de5cf7)
2007-10-10 10:45:22 +10:00
Ronnie Sahlberg
bdd67bba1e add a --single-public-ip argument to ctdbd to specify the ip address
used in single public ip address mode.
when using this argument, --public-interface must also be used.

add a vnn structure to the ctdb context to describe the single public ip 
address


update the killtcp control in the daemon that if a socketpair that is to 
be killed does not match a normal public address it checks if the 
destination address maches the single public ip address and if so uses 
that vnn structure from the ctdb context


this allows killtcp to kill also connections to the single public ip 
instead of only normal public addresses

(This used to be ctdb commit 5661ba17b91f62821dec1c76056c78b99752a90b)
2007-10-10 09:42:32 +10:00
Ronnie Sahlberg
7735957693 remove some debug outputs
(This used to be ctdb commit f29c0b52df1f455909ba133e3ad3bc462dc32929)
2007-10-09 13:45:42 +10:00
Ronnie Sahlberg
03e0277e03 send out gratious arps when we are starting up serving the "single
public ip" but before we start the ipmux tool

(This used to be ctdb commit dad1a80f39763314825939095f7656c13dcdbdc3)
2007-10-09 12:00:12 +10:00
Ronnie Sahlberg
80cd82f8e4 add a control to send gratious arps from the ctdb daemon
(This used to be ctdb commit 563819dd1acb344f95aabb4bad990b36f7ea4520)
2007-10-09 11:56:09 +10:00
Ronnie Sahlberg
292e9d9109 add an initial test version of an ip multiplex tool that allows us
to have one single public ip address for the entire cluster.

this ip address is attached to lo on all nodes but only the recmaster 
will respond to arp requests for this address.
the recmaster then runs an ipmux process that will pass any incoming 
packets to this ip address onto the other node sin the cluster based on 
the ip address of the client host


to use this feature one must
1, have one fixed ip address in the customers network attached 
permanently attached to an interface
2, set CTDB_PUBLI_INTERFACE=
   to specify on which interface the clients attach to the node
3, CTDB_SINGLE_PUBLI_IP=ip-address
   to specify which ipaddress should be the "single public ip address"




to test with only one single client,   attach several ip addresses to 
the client and ping the public address from the client with different -I 
options.   look in network trace to see to which node the packet is 
passed onto.

(This used to be ctdb commit 50d648c95e4e6d7c2867a034c2b550086d853320)
2007-10-08 14:05:22 +10:00
Ronnie Sahlberg
ab5d098bf6 add a function in the ctdb tool to determine whether the local node is
the recmaster or not.

return 0 if the node is the recmaster and 1 (true) if it is not or if 
we could not communicate with the ctdb daemon.


call it 'isnotrecmaster' to cope with that if the tool could not bind to 
the socket to tyalk to the daemon, the tool will automatically return an 
error and exit code 1
thus the tool will only return 0 if it could talk successfully to the 
local daemon and if the local daemon confirms this node is the recmaster

(This used to be ctdb commit ae5fcb790b6c3985f514fa8a96bc00c2619f2a28)
2007-10-08 09:47:20 +10:00
Ronnie Sahlberg
de6c5ed14d merge from tridge
(This used to be ctdb commit 02cda01c032804cb1c53593ceb98685c827e2d58)
2007-10-06 08:11:24 +10:00
Andrew Tridgell
50770008df fixed several places where we set the recovery culprit incorrectly
(This used to be ctdb commit d9da73395fa443801fc68ec53a42b548e832d58a)
2007-10-05 13:51:31 +10:00
Andrew Tridgell
4115492992 - catch ESTALE in the recovery lock by trying a read()
- priortise nodes that are unbanned and healthy in the election

(This used to be ctdb commit 929feb475dfdf7283f0e99b50b179e1c91d3a39f)
2007-10-05 13:28:21 +10:00
Andrew Tridgell
fb48f2d5a2 we are the culprit if we can't get the reclock
(This used to be ctdb commit 1d320e113c6134ff6822b985a47131d8204af35a)
2007-10-05 12:01:40 +10:00
Ronnie Sahlberg
72379ee3eb change async.private to async.private_data since private is a reserved
work in c++

(This used to be ctdb commit 79eb28f6cd5dcc30b04966d202a050eaf98a2552)
2007-09-26 14:25:32 +10:00
Ronnie Sahlberg
0ee8424615 merge from tridge
(This used to be ctdb commit 5655fab1284dce8f4a09ad426d53f5151c88968b)
2007-09-25 11:43:42 +10:00
Andrew Tridgell
ef995062b2 upped version number
(This used to be ctdb commit 4312e20e047ddb0f825c5e0c51d85dfa6a1b7df8)
2007-09-24 15:27:01 +10:00
Andrew Tridgell
e9a47bb5ec merge from ronnie
(This used to be ctdb commit c67f516f01f8033e3fbd0f338eaa3a8afb862495)
2007-09-24 13:52:35 +10:00
Ronnie Sahlberg
359448ff00 when we have a public ip address mismatch (i.e. we hold addresses we
shouldnt   or we are not holding addresses wqe should)
we must first freeze the local node before we set the recovery mode

(This used to be ctdb commit a77a77e8b5180f6a4a1f3d7d4ff03811f3b71b56)
2007-09-24 10:52:26 +10:00
Ronnie Sahlberg
fa90c8e26e merge from tridge
(This used to be ctdb commit 7f9242747543ea1a2cc05f5c8afc51ab26e7d4bb)
2007-09-24 10:27:48 +10:00
Andrew Tridgell
e3d0ec8797 fixed a fd leak on the recovery lock
(This used to be ctdb commit 186f35c42ed4fcc9ed44390b0dd036ece475d45e)
2007-09-24 10:19:07 +10:00
Andrew Tridgell
80100c3573 run monitoring more quickly when unhealthy and at startup
(This used to be ctdb commit ff1c205928e3ef5bcc6bf4e4b2122a19fa38d8f4)
2007-09-24 10:12:18 +10:00