1
0
mirror of https://github.com/samba-team/samba.git synced 2024-12-23 17:34:34 +03:00
Commit Graph

1322 Commits

Author SHA1 Message Date
Andrew Tridgell
3427793f01 don't do the first startup event until we are out of recovery
(This used to be ctdb commit 689940eb6e23f16ee063331caf3986613a8963ea)
2007-11-12 13:10:15 +11:00
Andrew Tridgell
bde886988b prevent a deadly embrace between smbd and ctdbd by moving the calling
of the startup event scripts after the point where recovery has
started and the node is in normal operation

This makes the 'startup' script just a special type of the 'monitor'
script which is called first

(This used to be ctdb commit 7424c30a5fd04aea0137c466b4318c3f185280d8)
2007-11-12 10:53:11 +11:00
Andrew Tridgell
82bd652749 patch from michael adam
(This used to be ctdb commit a7a3bef90f033bab5cb110a6ef77a8bef48f2588)
2007-11-02 13:20:29 +11:00
Andrew Tridgell
29e48fe54a increase release number
(This used to be ctdb commit dc648b1bb6becc52dcf900add97418a5634367eb)
2007-10-30 10:19:43 +11:00
Andrew Tridgell
684282f7a1 added bonding info to ctdb_diagnostics
(This used to be ctdb commit 71b5fc434bc5d88eb0669ee29aa932ba12737e07)
2007-10-30 10:18:52 +11:00
root
2a70ac8801 the while loop in the startup event runs as a subshell so we need an extra || exit 1 at the end
to propagate the error code back to the caller of the script

(This used to be ctdb commit c30d5c328784059949f5e82a07008e9632234f20)
2007-10-29 12:34:45 +11:00
Ronnie Sahlberg
8599f2008d if bond* interfaces are used as public interfaces we can not rely on ethtool but
have to check /proc for the status instead

(This used to be ctdb commit 4ed7747267aea265b7a71c651abf6d5db4f4718b)
2007-10-29 10:51:16 +11:00
Ronnie Sahlberg
ba6f9ae4a7 merge from tridge
(This used to be ctdb commit c7777b966f6a6e0f4126c03300338fdc822ac6c9)
2007-10-29 08:50:51 +11:00
Ronnie Sahlberg
bd73497a18 merge from tridge
(This used to be ctdb commit 919ba610c61cfaf5ecc1ab64ad8be34a80d928f4)
2007-10-29 08:40:46 +11:00
Andrew Tridgell
6d75f0703e added monitoring of ftp ports
(This used to be ctdb commit 4780e078fb55d69053f78a4bbc7c67e569bb5dae)
2007-10-26 14:53:09 +10:00
Ronnie Sahlberg
533a530177 since service nfs stop/start sometimes fail to bring up the mount daemon on rhel5
check if mountd is running during monitoring and if it is not, try to restart it

(This used to be ctdb commit 3d4b74669164b519398aeeacd59714f1e3884eff)
2007-10-23 12:35:43 +10:00
Andrew Tridgell
1d6b4f418d update release number
(This used to be ctdb commit fe6766940b2cf8a84ed51824158c956362a5806d)
2007-10-23 11:56:52 +10:00
Andrew Tridgell
2cea351f45 merge from ronnie
(This used to be ctdb commit cc70a2cc5f5400d6480cb609e1fa203236917976)
2007-10-23 11:45:36 +10:00
Ronnie Sahlberg
44ab81763d merge from tridge
(This used to be ctdb commit 938e375a80ce2f1827117c38554f576f73a5c71e)
2007-10-23 06:42:45 +10:00
Andrew Tridgell
6e6de1e4b7 fixed a problem with backgrounding onnnode
(This used to be ctdb commit 4e23630224bb219cfbbf129c4562da5a4c2d601a)
2007-10-22 21:11:02 +10:00
Andrew Tridgell
8e22bca5ca fixed a double close of a socket, leading to an EPOLL error
(This used to be ctdb commit bbe8ad842bdfedd37ef14a6be07ad939113fe9b1)
2007-10-22 16:41:11 +10:00
Ronnie Sahlberg
6a32af60b8 nfs may take a while to stop so do it in hte background
(This used to be ctdb commit 2ccaeaf6a65731c17173a4945e3e00e230e67d35)
2007-10-22 15:14:49 +10:00
Andrew Tridgell
2d8afd85d5 another place where we need to mark connect_fde as freed
(This used to be ctdb commit d047fbeafebe4b150602f9a91802795659058b16)
2007-10-22 15:13:32 +10:00
Andrew Tridgell
2931ed5d17 fixed a valgrind uninitialised memory error due to pad bytes
(This used to be ctdb commit aea9b0c8d467fe19815c046969e9c1049a3a20ac)
2007-10-22 15:13:08 +10:00
Andrew Tridgell
f09537e7f1 prevent a double free
(This used to be ctdb commit 5a1b923abb36c6deb99ae178fdd54f12235dc309)
2007-10-22 14:07:35 +10:00
Ronnie Sahlberg
1d6a74f943 when shutting down, we should stop monitoring
(This used to be ctdb commit 325683ef8f326f0565a827ff2c493adcab6e0d64)
2007-10-22 12:34:51 +10:00
Ronnie Sahlberg
4a97876fb7 when we are shutting down, we should first shut down the recovery daemon
(This used to be ctdb commit 39ade6b329adcd3234124d6a8daaa6181abf739b)
2007-10-22 12:34:08 +10:00
Ronnie Sahlberg
f022df1d40 dont set parameters in statd-callout if they should be set they
bshould be set from 10.interfaces

(This used to be ctdb commit 0c7c2dae0a976922de58793d576855bc37cd38e1)
2007-10-22 10:18:38 +10:00
Ronnie Sahlberg
caad5dc38d dont set some of the sysctl variables in statd-callout. these are
mainly useful for avoiding ack-storms when doing very rapid 
failover/failback during testing   but should not be required in 
real-world.

this gets rid of a lof of annoying messages from the messages file

(This used to be ctdb commit 50d289dcce2caa7c7be9b6faa3b38b69c2237038)
2007-10-21 06:42:33 +10:00
Andrew Tridgell
1a8338e443 increase release number
(This used to be ctdb commit 747ff96f1d93c52ba7548d0540266b0277d88ac1)
2007-10-19 12:22:24 +10:00
Andrew Tridgell
f47f758fe8 merge from ronnie
(This used to be ctdb commit d444fdc7782496abe4b27003b647ac49fb52e6be)
2007-10-19 09:39:07 +10:00
Andrew Tridgell
623e216dcf remove a incorrectly added file
(This used to be ctdb commit ff01a32db81b6c04d42634f5660181c270988264)
2007-10-19 09:30:55 +10:00
Ronnie Sahlberg
e81e008a36 add missing ) in the IB transport (which i dont compile for)
(This used to be ctdb commit 7f7a184bae87d46bd589d11068b6443b007366b4)
2007-10-19 09:05:37 +10:00
Ronnie Sahlberg
fe7b5b4d85 add a stub restart method for IB
(This used to be ctdb commit d318504ad5a49dbdfa307be39ae88df839e6308d)
2007-10-19 09:04:52 +10:00
Ronnie Sahlberg
d1ba047b7f add a new transport method so that when a node is marked as dead, we
shut down and restart the transport

othervise, if we use the tcp transport the tcp connection might try to 
retransmit the queued data during the time the node is unavailable.
this together with the exponential backoff for tcp means that the tcp 
connection quickly reaches the maximum backoff rto which is often 60 or 
120 seconds.   this would mean that it could take up to 60/120 seconds 
before the tcp layer detects that the connection is dead and it has to 
be reestablished.

(This used to be ctdb commit 0256db470879ce556b0f00070f7ebeaf37e529ab)
2007-10-19 08:58:30 +10:00
Ronnie Sahlberg
755511d28d set the flags explicitely isnstead of masking them in
(This used to be ctdb commit 27a5f9dead44890683f9dbc4f07cda11264aa03b)
2007-10-18 16:54:00 +10:00
Andrew Tridgell
b814462c38 added some debug lines to help track down a problem
(This used to be ctdb commit 2ca31e9de179f76e392a26cc8305e2473357c760)
2007-10-18 16:27:36 +10:00
Andrew Tridgell
5e3d5b1314 merge from ronnie
(This used to be ctdb commit a6b094fdede0ae850e87877fad0b9dd1f3a26869)
2007-10-18 15:51:15 +10:00
Andrew Tridgell
d939a2901b merge from ronnie
(This used to be ctdb commit 75d4b386293e186a6bb8532515585ab72670d663)
2007-10-18 15:44:02 +10:00
Ronnie Sahlberg
e4ec6e9d6b flush the route cache when we have added the single public ip to the
node

cleanup and remove everything when we do a shutdown event

(This used to be ctdb commit 221432f45073bc7624803058c8bbf18838e7ceeb)
2007-10-18 14:13:48 +10:00
Ronnie Sahlberg
537841fadb use NF_DROP instead of NF_STOLEN when we tell the kernel to not worry
about this packet any more and just forget it ever saw it

(This used to be ctdb commit 42a2a777cbc15a8cbbea7ecf2fb1c6dafa242d0c)
2007-10-17 15:03:58 +10:00
Ronnie Sahlberg
9a93f4b8df reverse the order in which public ips are listed so it matches the order
of the public_addresses file

(This used to be ctdb commit ce987661edd9160982e65866fb773445d296e5c7)
2007-10-17 13:42:42 +10:00
Ronnie Sahlberg
805ba22d65 merge from tridge
(This used to be ctdb commit 87760a95ec0a9e3cb2c415c569235a1ff58318cb)
2007-10-17 10:10:52 +10:00
Andrew Tridgell
85f91b9d5c increase release number
(This used to be ctdb commit 69fe7ce1d7874ce51d79de29adc53c207cb8869f)
2007-10-16 20:14:04 +10:00
Andrew Tridgell
6b9d73a96d more detail on multipath config
(This used to be ctdb commit 78c44f2267cbef5fbc57d56dfd5ff40972733a1f)
2007-10-16 20:13:28 +10:00
Ronnie Sahlberg
ce7a054d20 add back the test inside the daemon that if someone asks us to drop
recovery mode back to NORMAL that we can not lock the reclock file   
since at this stage it MUST be locked by the recovery daemon.

in order to avoid a non-blocking fnctl() lock from blocking and cause 
"issues"  we move the 'test that we can not lock reclock file' into a 
child process.

(This used to be ctdb commit 3af994641ec2234e37da1fa1f693441586471a7e)
2007-10-16 15:27:07 +10:00
Ronnie Sahlberg
056aac6e0c add a new tunable : DeterministicIPs that makes the allocation of
public addresses to nodes deterministic.

Activate it by adding CTDB_SET_DeterministicIPs=1 in /etc/sysconfig/ctdb

When this is set,    the first entry in /etc/ctdb/public_addresses will 
always be hosted by node 0, when that node is available, the second 
entry by node1 and so on.

This tunable allows the allocation of addresses to become very 
unbalanced and is only for debugging/testing use.
Beware, this feature requires that /etc/ctdb/public_addresses are 
identical on all the nodes in the cluster.

(This used to be ctdb commit f0ca221f235731542090d8a6c86f2b7cd2ce2f96)
2007-10-16 12:15:02 +10:00
Ronnie Sahlberg
25d3a031d0 include system/network.h so we get the prototype for inet_aton()
(This used to be ctdb commit 7145764b2d217f88a723dcb0ffd4e5a1567d64cf)
2007-10-16 11:29:33 +10:00
Ronnie Sahlberg
7e2e1b14fb merge from tridge
(This used to be ctdb commit 9e6bc12c9be2dabcfb9c6aeef257ef4737287fab)
2007-10-16 11:26:22 +10:00
Ronnie Sahlberg
b3ff7d904d dont try to lock the file from inside the ctdb daemon.
eventhough we dont want a blocking lock it does appear that the fcntl()
call can block for a while if gpfs is in the process of rebuilding 
itself after a node arriving/leaving the cluster

(This used to be ctdb commit 6c0d206dea7116db71bccb4802a93dd7283249f6)
2007-10-16 09:50:31 +10:00
Andrew Tridgell
d7f6b63f0a only link to -lipq if needed
(This used to be ctdb commit 7c378d881e37db0f14e07ccba19fde1f9f4f0831)
2007-10-15 14:44:06 +10:00
Andrew Tridgell
574db736f2 improved handling of systems without libipq.h
(This used to be ctdb commit cfa8ddd3ca53c0160558137cccfc7e73e46ec36c)
2007-10-15 14:37:54 +10:00
Andrew Tridgell
9570939337 disable ipmux code until we have a configure test
(This used to be ctdb commit fd83f0f3eb233f22ce9b5b4afbc4f26e3c865b3c)
2007-10-15 14:29:47 +10:00
Andrew Tridgell
99bc0aca93 sync flags between nodes in monitor loop in recmaster
(This used to be ctdb commit 6eef86e06388fc53a1212f1e2783ae174c6cd210)
2007-10-15 14:28:51 +10:00
Andrew Tridgell
0e855c0772 merge from ronnie
(This used to be ctdb commit d18712caba11855010be52f90bac656683076676)
2007-10-15 14:17:49 +10:00
Andrew Tridgell
0a5bfdd315 disable optimisation for now, until we find a occasional segv
(This used to be ctdb commit d09570c70551aa40390ce9ceffe7bc234e1afafe)
2007-10-15 13:31:09 +10:00
Andrew Tridgell
174879621e add config option for disabling bans
(This used to be ctdb commit 153b911f7f957d4c564b04f5aa878033a02da9e4)
2007-10-15 13:22:58 +10:00
Ronnie Sahlberg
ebe772b1b2 use $CTDB_BASE in 90.ipmux instead of hardcoding it to /etc/ctdb
(This used to be ctdb commit 6abb46b010851f5719f12273b4a3d46ec986f0c7)
2007-10-11 07:51:57 +10:00
Ronnie Sahlberg
870a57a55b use kill_tcp_connections() to kill off all tcp connections to the
"single public ip" address when we do a recovery

(This used to be ctdb commit 19b52a2d5db31efa9e7c77037097ff8539986ac3)
2007-10-11 07:30:10 +10:00
Ronnie Sahlberg
fa5d51c238 move the kill_tcp_connections() function from 10.interfaces to functions
(This used to be ctdb commit 055948530fb16bf49c42fc4489f29a21665156c0)
2007-10-11 07:27:38 +10:00
Ronnie Sahlberg
1a4999076b first check that recovery master is connected (we know this from our own
flags)

then pull the flags off recovery master before checking if it is banned

(This used to be ctdb commit 94c1d234e57a40eda2d8b892dd9fbe1ffc4b3433)
2007-10-11 07:10:17 +10:00
Ronnie Sahlberg
167e100d4b simplify election handling
make sure we read and update the flags from all remote nodes before we 
reach the first codepath that can call do_recovery()
since during do_recovery() we need to know what the flags are.

(This used to be ctdb commit e85f3806483ea420559d449e0e4d81bec996740f)
2007-10-11 06:16:36 +10:00
Ronnie Sahlberg
33a6aa3c3f merge from tridge
(This used to be ctdb commit 4690a205fe4325b03ab044bdb5fbc9aa3e94db6e)
2007-10-10 10:49:55 +10:00
Andrew Tridgell
011a205b86 make sure reconnected nodes start off as unhealthy so they don't get a public IP
(This used to be ctdb commit c733ec6760cae01ce277f491caf1355e46de5cf7)
2007-10-10 10:45:22 +10:00
Ronnie Sahlberg
bdd67bba1e add a --single-public-ip argument to ctdbd to specify the ip address
used in single public ip address mode.
when using this argument, --public-interface must also be used.

add a vnn structure to the ctdb context to describe the single public ip 
address


update the killtcp control in the daemon that if a socketpair that is to 
be killed does not match a normal public address it checks if the 
destination address maches the single public ip address and if so uses 
that vnn structure from the ctdb context


this allows killtcp to kill also connections to the single public ip 
instead of only normal public addresses

(This used to be ctdb commit 5661ba17b91f62821dec1c76056c78b99752a90b)
2007-10-10 09:42:32 +10:00
Ronnie Sahlberg
7735957693 remove some debug outputs
(This used to be ctdb commit f29c0b52df1f455909ba133e3ad3bc462dc32929)
2007-10-09 13:45:42 +10:00
Ronnie Sahlberg
03e0277e03 send out gratious arps when we are starting up serving the "single
public ip" but before we start the ipmux tool

(This used to be ctdb commit dad1a80f39763314825939095f7656c13dcdbdc3)
2007-10-09 12:00:12 +10:00
Ronnie Sahlberg
80cd82f8e4 add a control to send gratious arps from the ctdb daemon
(This used to be ctdb commit 563819dd1acb344f95aabb4bad990b36f7ea4520)
2007-10-09 11:56:09 +10:00
Ronnie Sahlberg
292e9d9109 add an initial test version of an ip multiplex tool that allows us
to have one single public ip address for the entire cluster.

this ip address is attached to lo on all nodes but only the recmaster 
will respond to arp requests for this address.
the recmaster then runs an ipmux process that will pass any incoming 
packets to this ip address onto the other node sin the cluster based on 
the ip address of the client host


to use this feature one must
1, have one fixed ip address in the customers network attached 
permanently attached to an interface
2, set CTDB_PUBLI_INTERFACE=
   to specify on which interface the clients attach to the node
3, CTDB_SINGLE_PUBLI_IP=ip-address
   to specify which ipaddress should be the "single public ip address"




to test with only one single client,   attach several ip addresses to 
the client and ping the public address from the client with different -I 
options.   look in network trace to see to which node the packet is 
passed onto.

(This used to be ctdb commit 50d648c95e4e6d7c2867a034c2b550086d853320)
2007-10-08 14:05:22 +10:00
Ronnie Sahlberg
ab5d098bf6 add a function in the ctdb tool to determine whether the local node is
the recmaster or not.

return 0 if the node is the recmaster and 1 (true) if it is not or if 
we could not communicate with the ctdb daemon.


call it 'isnotrecmaster' to cope with that if the tool could not bind to 
the socket to tyalk to the daemon, the tool will automatically return an 
error and exit code 1
thus the tool will only return 0 if it could talk successfully to the 
local daemon and if the local daemon confirms this node is the recmaster

(This used to be ctdb commit ae5fcb790b6c3985f514fa8a96bc00c2619f2a28)
2007-10-08 09:47:20 +10:00
Ronnie Sahlberg
de6c5ed14d merge from tridge
(This used to be ctdb commit 02cda01c032804cb1c53593ceb98685c827e2d58)
2007-10-06 08:11:24 +10:00
Andrew Tridgell
50770008df fixed several places where we set the recovery culprit incorrectly
(This used to be ctdb commit d9da73395fa443801fc68ec53a42b548e832d58a)
2007-10-05 13:51:31 +10:00
Andrew Tridgell
4115492992 - catch ESTALE in the recovery lock by trying a read()
- priortise nodes that are unbanned and healthy in the election

(This used to be ctdb commit 929feb475dfdf7283f0e99b50b179e1c91d3a39f)
2007-10-05 13:28:21 +10:00
Andrew Tridgell
fb48f2d5a2 we are the culprit if we can't get the reclock
(This used to be ctdb commit 1d320e113c6134ff6822b985a47131d8204af35a)
2007-10-05 12:01:40 +10:00
Ronnie Sahlberg
72379ee3eb change async.private to async.private_data since private is a reserved
work in c++

(This used to be ctdb commit 79eb28f6cd5dcc30b04966d202a050eaf98a2552)
2007-09-26 14:25:32 +10:00
Andrew Tridgell
ef995062b2 upped version number
(This used to be ctdb commit 4312e20e047ddb0f825c5e0c51d85dfa6a1b7df8)
2007-09-24 15:27:01 +10:00
Ronnie Sahlberg
359448ff00 when we have a public ip address mismatch (i.e. we hold addresses we
shouldnt   or we are not holding addresses wqe should)
we must first freeze the local node before we set the recovery mode

(This used to be ctdb commit a77a77e8b5180f6a4a1f3d7d4ff03811f3b71b56)
2007-09-24 10:52:26 +10:00
Andrew Tridgell
e3d0ec8797 fixed a fd leak on the recovery lock
(This used to be ctdb commit 186f35c42ed4fcc9ed44390b0dd036ece475d45e)
2007-09-24 10:19:07 +10:00
Andrew Tridgell
80100c3573 run monitoring more quickly when unhealthy and at startup
(This used to be ctdb commit ff1c205928e3ef5bcc6bf4e4b2122a19fa38d8f4)
2007-09-24 10:12:18 +10:00
Andrew Tridgell
b87ddd9148 no longer wait at startup for services to become available, instead
set the node initially unhealthy and let the status monitoring bring the node online.
This fixes a problem with winbindd, where it refused to start because secrets.tdb was not populated
but we could not populate ctdbd, because the net command would not run while ctdbd was still doing startup
and thus frozen
(This used to be ctdb commit 3a001b793dd76fb96addf1e2ccb74da326fbcfbc)
2007-09-24 10:00:14 +10:00
Andrew Tridgell
4178cb98a1 fixed a valgrind error, and some warnings
(This used to be ctdb commit c0f52dbb385fa0748680adb7c40755c92e577551)
2007-09-24 09:57:14 +10:00
Andrew Tridgell
416c0cec6e make the persistent dbdir configurable
(This used to be ctdb commit 2587b887dcfce26b12c66fcb5d34e92da42a1776)
2007-09-21 16:12:04 +10:00
Andrew Tridgell
2607c222fc avoid using connected nodes that aren't in the vnn map yet
(This used to be ctdb commit 2b5ae133f5f6fa9ad1a8896fe4b4c542d4ca462d)
2007-09-21 15:44:13 +10:00
Ronnie Sahlberg
51d912063c in ctdb_control_persistent_store() we must talloc_steal() the pointer to
c   to prevent it from being immediately freed (and our persistent store 
state with it) if we need to wait asynchronously for other nodes before 
we can reply back to the client

(This used to be ctdb commit fa5915280933e4d2e7d4d07199829c9c2b87a335)
2007-09-21 15:19:33 +10:00
Ronnie Sahlberg
61e885d0b9 when ctdb attaches to a database it broadcasts the attach to all other
nodes so that the db is created on them as well

when we send this broadcast   we must use the correct control and not 
assume all databases created are of the temporary kind 

(This used to be ctdb commit 106f816d4a0814ca4418de051289d9fc62df7dd2)
2007-09-21 13:47:40 +10:00
Ronnie Sahlberg
e9f45419da merge from tridge
(This used to be ctdb commit bb283ee8ebaea848366e9c3b3d3244da459a7967)
2007-09-21 13:20:29 +10:00
Andrew Tridgell
c60988325d added support for persistent databases in ctdbd
(This used to be ctdb commit 3115090a0d882beca9d70761130b74bb0821f201)
2007-09-21 12:24:02 +10:00
Ronnie Sahlberg
577fffdc5d merge from tridge
(This used to be ctdb commit c4262a3f980a9fabe0985fc3281e9f5de1cdf35e)
2007-09-19 11:54:45 +10:00
Ronnie Sahlberg
dc0bd93789 one more command to run to enable winbind for vsftpd
(This used to be ctdb commit 1f6d13a364cde58b66a4bc52e909cc68b8c807d7)
2007-09-19 11:53:48 +10:00
Andrew Tridgell
81bfa58d58 make sure we set close on exec on any possibly inherited fds
(This used to be ctdb commit d9dec82076f14a348e7b67b4350180681ff86f32)
2007-09-19 11:46:37 +10:00
Andrew Tridgell
0438b07b53 separate out the various fs display ops
(This used to be ctdb commit dc89e1a428da5d5ca2a9c4988c05de3ea65f00f4)
2007-09-19 11:46:11 +10:00
Andrew Tridgell
bd7eeebe16 expanded ctdb_diagnostics a bit
(This used to be ctdb commit 70a4bb3dc7e624ad778949dbc874c2617fd532e6)
2007-09-17 15:31:33 +10:00
Ronnie Sahlberg
d9f936fefe add documantation of additional requirements for FTP so that users can
log in and access files using the AD username/password

(This used to be ctdb commit 679e125770247fc24dfb14b5781d44f639457ecd)
2007-09-17 13:01:16 +10:00
Andrew Tridgell
c6abc4e1a8 increase release number
(This used to be ctdb commit b213f4f1bf5bcfb40b9ce176df22216e3ebbe964)
2007-09-14 19:27:11 +10:00
Andrew Tridgell
ed75f988d5 merge from ronnie
(This used to be ctdb commit 913c33a7d2f67570548fecc568dba874e5f72dd2)
2007-09-14 15:23:23 +10:00
Ronnie Sahlberg
2d0261afeb let ctdb ip only print the ip addresses known to the specified node
and not the entire cluster

(This used to be ctdb commit eb1f67a56d752c9f42a9a26a6697a7ab8e668b3a)
2007-09-14 15:19:44 +10:00
Ronnie Sahlberg
90a37c4fb4 update vnn -> pnn in documentation
(This used to be ctdb commit bb62b4df514255b95d3871b254bec9c440bc4a06)
2007-09-14 14:24:53 +10:00
Ronnie Sahlberg
05e1f67381 documentation updates
it is --event-script-dir      not --event-script

add explanation of the public_addresses file

(This used to be ctdb commit 21325b23e786ac1c2abc07ea75b0814e9c725a9e)
2007-09-14 14:19:12 +10:00
Andrew Tridgell
c62490569b cope with non-standard install dirs in event scripts
(This used to be ctdb commit 52fff5345873690a9cc86495f414343eaa3bd540)
2007-09-14 14:14:03 +10:00
Andrew Tridgell
305f432e50 fix pkill args
(This used to be ctdb commit 9690de97b4746f4a79830465e3a1679e9fbda671)
2007-09-14 11:59:04 +10:00
Andrew Tridgell
955d4d8615 make sure all public IPs are removed at startup
(This used to be ctdb commit b16f33787f2a9471285037f4a6d470e826536570)
2007-09-14 11:56:40 +10:00
Ronnie Sahlberg
8edcd3f83f during startup make sure to delete any public addresses from any
interface

(This used to be ctdb commit 18d80ea6db39e61f60e4c01de164d58bcbd8ab10)
2007-09-14 10:37:10 +10:00
Ronnie Sahlberg
6052078b53 let each node verify that they have a correct assignment of public ip
addresses (i.e. htey hold those they should hold   and they dont hold 
any of those they shouldnt hold)

if an inconsistency is found, mark the local node as recovery mode 
active
and wait for the recovery master to trigger a full blown recovery

(This used to be ctdb commit 55a5bfc8244c5b9cdda3f11992f384f00566b5dc)
2007-09-14 10:16:36 +10:00
Andrew Tridgell
42fc00bda9 - merge from ronnie
- add a flag to check that recovery completed correctly. If not, re-trigger it in monitoring

(This used to be ctdb commit d5ed941d9bab4af30d8b5f9b77bdf43d9218d69b)
2007-09-14 09:49:12 +10:00
Andrew Tridgell
3b159e4e60 wait for ctdbd to finish cleanup before considering "service ctdb stop" to be done
(This used to be ctdb commit 216eb4be7ec481cfe9aaeeada257b77cb394d2e4)
2007-09-14 09:25:11 +10:00
Andrew Tridgell
9cf96a5e4c nicer use of testparm
(This used to be ctdb commit a611ea930fb9dae6e56f6a74b2bdc9e08066d4d1)
2007-09-14 09:24:34 +10:00
Ronnie Sahlberg
4c20141659 update the section about event scripts
(This used to be ctdb commit a0744480c85a4e8648bd0ae7600f90d311b931ea)
2007-09-14 08:56:27 +10:00
Ronnie Sahlberg
528b8af87f disable nfsv4 in etc/sysconfig/nfs
(This used to be ctdb commit b71e11f0e27bb3ff908ad171aa5b1f724609ad05)
2007-09-14 08:15:24 +10:00
Ronnie Sahlberg
4186d8eaba when a ctdb_takeover_run has failed we must make sure that
need_takeover_run is set to true  or else we might forget to rerun it 
again during the next recovery


othervise,  need_takeover_run is only set to true IFF the node flags for 
a remote node and the local nodes differ.
It is possible that a takeover run fails  and thus the reassignment of 
ip addresses is incomplete  but before we get back to the test in    
monitor_cluster()  that all the node flags of all nodes have converged 
and they now match each others again.   and thus causing 
monitor_cluster() to fail to realize that a takeover run is needed.

(This used to be ctdb commit ae7e866787cebd14394983ce1834387c959d1022)
2007-09-13 14:51:37 +10:00
Andrew Tridgell
2f86c3f827 ensure smbd and winbindd do die in 50.samba
(This used to be ctdb commit 6f23affedb626fc7a5ca86c4763f3045a5586231)
2007-09-13 14:36:23 +10:00
Ronnie Sahlberg
ab1c8c074e merge from tridge
(This used to be ctdb commit eda3caa77be352967a41ff9bddda5296c94797a9)
2007-09-13 14:28:18 +10:00
Andrew Tridgell
9d50595b8a prevent recursion in the calling of ctdb_takeover_run
(This used to be ctdb commit 0fbdeb7c91b965d9bc5ecc7b24e31070378d8f1d)
2007-09-13 14:08:18 +10:00
Andrew Tridgell
6fa6101b1a more shell scripting fixes in 10.interface
(This used to be ctdb commit 4ee2230b3f2ae7437a9d0cf973eb4645d276accd)
2007-09-13 11:57:42 +10:00
Andrew Tridgell
30de14fe79 force recovery if unable to tell a node to release an IP
(This used to be ctdb commit 6895788d2499344a03357e5c1103cb8383e9eaf7)
2007-09-13 11:19:49 +10:00
Andrew Tridgell
25940014c0 fixed script errors in 10.interface
(This used to be ctdb commit 0c759614d27758cef3eba5942b2cccad54193cbb)
2007-09-13 11:19:30 +10:00
Andrew Tridgell
3c0f61cb92 we don't need the is_loopback logic in ctdb any more
(This used to be ctdb commit 4ecf29ade0099c7180932288191de9840c8d90a9)
2007-09-13 10:45:06 +10:00
Andrew Tridgell
4f261ae191 remove more cruft from the logs
(This used to be ctdb commit b67f35c483b6cbb5facaa6380c7794709f44213a)
2007-09-13 10:39:05 +10:00
Andrew Tridgell
023b885793 new approach for killing TCP connections on IP release
(This used to be ctdb commit c33a0db29b5604966f582b1f8c5fd66760c72197)
2007-09-13 10:24:48 +10:00
Andrew Tridgell
1b53ecc445 remove clutter from ctdb log file
(This used to be ctdb commit 54d5dcaaee0498f40bbee5059cc72d0ca75d33b7)
2007-09-13 10:03:18 +10:00
Andrew Tridgell
a919f6927a fixed return code
(This used to be ctdb commit 30165b5a19f9bd9d1f62c9c222df0711c1c6a927)
2007-09-13 10:02:56 +10:00
Andrew Tridgell
96c54c6188 handle hung or slow ctdb daemons on shutdown
(This used to be ctdb commit a3089211782ab12387c1b04efa28914c94d89b30)
2007-09-12 13:26:24 +10:00
Andrew Tridgell
6c77184d96 - set arp_ignore to prevent replying to arp requests for addresses on loopback
- put removed IPs on loopback with scope host
- check for nul strings in ethtool call
;

(This used to be ctdb commit e2df1d6d08e67a36ff05a590a34c56e900741287)
2007-09-12 13:23:36 +10:00
Andrew Tridgell
67bd64ef35 - don't allow the registration of clients with IPs we don't hold
- change some debug levels to make tracking of IP release problems easier
(This used to be ctdb commit 5f9aed62adaf87750f953412c55b29c58e4bb6c0)
2007-09-12 13:22:31 +10:00
Andrew Tridgell
a478c78f03 changed some debug levels
(This used to be ctdb commit ed764533e1c2f8982e1577ca5e7f5f4482a15345)
2007-09-12 13:21:19 +10:00
Ronnie Sahlberg
536d393452 use the public addresses variable instead of hardcoding the path
(This used to be ctdb commit 8e23f173cda8a76bbc243863bfc49fe8c7b907f4)
2007-09-12 07:28:24 +10:00
Ronnie Sahlberg
98f968d8d3 move all ip addresses onto loopback when we startup ctdb
(This used to be ctdb commit 5d7500f7d93f0d36ffbf3c966c5b38f82f0376c7)
2007-09-12 07:26:30 +10:00
Andrew Tridgell
a6728e0520 fixed location of arp_filter
(This used to be ctdb commit ea239c82fca2b9a648d21e5c603e632011958452)
2007-09-11 16:38:32 +10:00
Andrew Tridgell
5b65a6c7f0 get interface right
(This used to be ctdb commit e0edc38d7e897f7de2850eb2cfd17fea75c16fcc)
2007-09-10 20:45:27 +10:00
Ronnie Sahlberg
a9a8ad07b4 grab the interface name from tok and not from the uninitialized array
(This used to be ctdb commit 23a47ca2331a163b5fde03bd2f6f1d478633aede)
2007-09-10 16:34:11 +10:00
Ronnie Sahlberg
9c1b2f4856 merged patch from tridge
(This used to be ctdb commit 90ab044093f67b656e21861ce12d6fee5794d21f)
2007-09-10 16:23:06 +10:00
Andrew Tridgell
8cd7ca149e fixed a pointer cast warning
(This used to be ctdb commit df0e7a4aa13112d613702d8ea0fb0e18510d293c)
2007-09-10 15:16:17 +10:00
Andrew Tridgell
57d8102cf8 added back --public-interface to startup script
(This used to be ctdb commit 9e9cb3c0da7251f522c655366ef0868037577a9c)
2007-09-10 15:09:28 +10:00
Andrew Tridgell
f3ae1cdb02 - use struct sockaddr_in more consistently instead of string addresses
- allow for public_address lines with a defaulting interface

(This used to be ctdb commit 29cb760f76e639a0f2ce1d553645a9dc26ee09e5)
2007-09-10 14:27:29 +10:00
Andrew Tridgell
70ec39b1b1 add back in --public-interface as a default
(This used to be ctdb commit cdf56daf69b2c8381ee673943e982ad20f19affd)
2007-09-10 14:26:35 +10:00
Andrew Tridgell
42168177ef merge from ronnie
(This used to be ctdb commit 1f21d4d563232926c35d03c4d69eb69190823dc6)
2007-09-10 13:21:11 +10:00
Andrew Tridgell
f3927719c9 add crontab and sysctl output
(This used to be ctdb commit b1b59f3294ee7a5ed6d685f373bf19d3152170fa)
2007-09-10 11:27:07 +10:00
Ronnie Sahlberg
50381480eb update a comment
(This used to be ctdb commit e7d3ef4443686529299e8f293398cc0522235627)
2007-09-10 07:45:57 +10:00
Ronnie Sahlberg
4ac749bfa4 change the signature to ctdb_sys_have_ip() to also return:
a bool that specifies whether the ip was held by a loopback adaptor or 
not
 the name of the interface where the ip was held

when we release an ip address from an interface, move the ip address 
over to the loopback interface

when we release an ip address  after we have move it onto loopback, 
use 60.nfs to kill off the server side (the local part) of the tcp 
connection   so that the tcp connections dont survive a 
failover/failback

61.nfstickle,   since we kill hte tcp connections when we release an ip 
address   we no longer need to restart the nfs service in 61.nfstickle

update ctdb_takeover to use the new signature for ctdb_sys_have_ip

when we add a tcp connection to kill in ctdb_killtcp_add_connection()
check if either the srouce or destination address match a known public 
address

(This used to be ctdb commit f9fd2a4719c50f6b8e01d0a1b3a74b76b52ecaf3)
2007-09-10 07:20:44 +10:00
Ronnie Sahlberg
0ebd7beb4b set /proc/sys/net/ipv4/conf/all/arp_filter to 1 by default when
10.interfaces startsup

this setting makes the system only respond to APR requests from the NIC 
where the ip address is tied to and adds to the 
"principle of least surprise" when using multihoming servers

(This used to be ctdb commit 39ddf347dc45f599964a4c17e67e71faed00e544)
2007-09-08 08:09:02 +10:00
Ronnie Sahlberg
d91b28f8b7 ctdb ip must loop over all connected nodes to pull hte public ip list
and merge into a big list   since with the deassociation between a node 
and a public ipaddress    the /etc/ctdb/public_addresses files can 
differ between nodes and no node know about all public addresses that a 
cluster can use

(This used to be ctdb commit e208294fed183977cacc44b2cd1195c11d967c18)
2007-09-07 16:45:19 +10:00
Ronnie Sahlberg
3cad21d6be remove the ctdb publicip command
this command no longer makes sense when there is no on-to-one mapping 
between a node and its default public ip

(This used to be ctdb commit 91280db7f6dd3d659edd86fae21ba347d6f9da9e)
2007-09-07 15:39:26 +10:00
Ronnie Sahlberg
d0dd8df752 update web nfs with the new NFS_HOSTNAME variable we need to be able to
stat notify using the correct hostname

(This used to be ctdb commit 1498e33e48a4654e02b74a00ef7473fed3225d69)
2007-09-07 12:20:48 +10:00
Ronnie Sahlberg
eb7a15730e add a short delay after stopping nfslock to make it less likely that
"weird" things happen

(This used to be ctdb commit 4934c083cbcc19714094e08a0b7da1fb6fdc8a5a)
2007-09-07 12:14:53 +10:00
Ronnie Sahlberg
68c37f9b41 merge from tridge
(This used to be ctdb commit 58c918b1bfe09c31049769dee266129cbad4cb20)
2007-09-07 09:21:40 +10:00
Ronnie Sahlberg
fa872de664 60.nfs:
we must always restart the lockmanager when the cluster has been 
reconfigured and ip addresses has changed. This is to make sure we get a 
clusterwide grace period for nfs locking.
if we dont do this and only restart locking on the nodes that were 
direclty affected, a different client can take out a conflicting lock 
from a different node before affected clients has had a chance to
reclaim all the locks lost during reconfigure.
grace period on rhel5 kernel has bene increased to 90 seconds!

statd-callout:
we must restart lockmanager to ensure a clusterwide grace period for 
nfs. this makes locking "more correct" for nfs clients and prevents
other clients/nodes from taking out a conflicting lock while a different
client/node tries to reclaim lost locks.
This makes it "almost consistent" for NFS clients   but there is still 
the possibility that a cifs client can take out a conflicting lock 
before an nfs client has had a chance to reclaim an existing lock.
This can not be solved with anything less than making the kernel nfs 
lock manager "samba aware" and making samba aware of the internal state 
of the kernel lock manager so that they can cooperate.

we can not just stop/start the lockmanager back to back in rhel5 since 
if they are stopped/started too close to eachother then when the new 
lockmanager upon starting up sends out statd notifications two things 
can happen:
1, new lockmanager sends out notification BEFORE it has registered with 
portmapper leading to 
  lockmanager starts
  lockmanager sends notification to the client
  client tries to recover the lock and tries to portmap the lockmanager
  port on the server.
  server is not (yet) registered with portmapper and server responds
  "no such program" to hte clients request to discover where lockmanager
   is.
  client then just completely gives up reclaiming the lock and doesnt 
  even reattempt the portmapper call after some timeout.
  ==> lock reclaim failed.
2, if they are started back to back, and a client tries to reclaim the
   lock  the lockmanager sometimes sends two responses back to back
   to the client.   one with status NLM_GRANTED (==you got the lock 
reclaimed) and one with status NLM_DENIED (==you could not get the lock 
reclaimed)
   This confuses the client and leads to the server thinking that the 
client does have the lock   and the client thinking it has not got the 
lock    and orphaned locks result.


We also send out additional notification messages of different formats
to allow more legacy clients to interoperate with locking.

(This used to be ctdb commit 13208c1aab2942e28dff87e38e6794bf0c026033)
2007-09-07 08:52:56 +10:00
Ronnie Sahlberg
82984577f1 we dont need the rpc.statd on shared directory neither do we need
PUBLIC_IP anymore

(This used to be ctdb commit fd571ac87f65928e92dde6977745083bf381df1a)
2007-09-06 11:32:18 +10:00
Ronnie Sahlberg
00453a375a improve the handling of hosts to notify with statd
(This used to be ctdb commit cc87bda7e344bc777b9620a6211e62de4dce4e3b)
2007-09-06 11:30:49 +10:00
Ronnie Sahlberg
19546fb007 specify the additional ports for nfs
(This used to be ctdb commit 1934163f0b393738615a05854082a7d488003e1c)
2007-09-06 10:26:44 +10:00
Ronnie Sahlberg
f7d193e9ce the event scripts for nfs are called 60.nfs and 61.nfstickle
(This used to be ctdb commit b15f1c25560320993b93aa3d943985dab4e47947)
2007-09-06 10:18:13 +10:00
Ronnie Sahlberg
0781616ef9 document NFS_TICKLE_SHARED_DIRECTORY on our web page
(This used to be ctdb commit 40ec29f602897e9b01a6747806f502ab38423d54)
2007-09-06 08:21:11 +10:00
Ronnie Sahlberg
46eecfea27 we dont use 'sendip' any more so dont check for it and exit from the
61.nfstickles script if it is missing from the host

(This used to be ctdb commit 8eac441e24f4ef33b55f9eaa4856b5c1e1c15213)
2007-09-05 15:39:51 +10:00
Ronnie Sahlberg
a9c8456ed6 we should always get data back from getnodemap
(This used to be ctdb commit ff999a4b56f714c58c81baa454a2d39d04944136)
2007-09-05 14:59:29 +10:00
Ronnie Sahlberg
e4eeceaf3a dont dereference vnn before we have assigned it a pointer value
(This used to be ctdb commit 2a8fc69aea8527b22a3fe57427677e4caff57338)
2007-09-05 14:29:44 +10:00
Andrew Tridgell
c572d3c226 added a diagnostics tool for ctdb
(This used to be ctdb commit 032a2238caf688656b00e06bf363182368e037e1)
2007-09-05 14:20:34 +10:00
Ronnie Sahlberg
77ec4d5248 allow different nodes in the cluster to use different public_addresses
files
so that we can partition the cluster into different subsets of nodes 
which each serve a different subset of the public addresses

(This used to be ctdb commit 889e0fe69e4c88c6166282b12843b8d9727552d6)
2007-09-04 23:15:23 +10:00
Ronnie Sahlberg
8f819c6a0e get rid of the ctdb_vnn_list structure and just use a single list of
ctdb_vnn

(This used to be ctdb commit 7b9fd06321af17043136b1420b57284450ae7ba5)
2007-09-04 18:20:29 +10:00
Ronnie Sahlberg
cf45c5096c we cant have takeover_ctx hanging off ctdb since it is freed/recreated
everytime we release an ip.
this context is used to hold all resources needed when sending out 
gratious arps and tcp tickles during ip takeover.

we hang it off the vnn structure that manages that particular ip address 
instead   so that we can have multiple ones going in parallell

this bug (or the same bug in different shape) has probably been in ctdb 
for very very long   but is likely to be hard to trigger

(This used to be ctdb commit c58db1cadaba253b2659573673b28c235ef7db76)
2007-09-04 14:36:52 +10:00
Ronnie Sahlberg
3e6be59f61 fix typo in debug output
(This used to be ctdb commit 011a777c6e538ca79f104c7884a4f0e222997382)
2007-09-04 14:21:35 +10:00
Ronnie Sahlberg
784eac9079 dont just always return 0 from the killtcp control.
return 0 or -1 so that the ctdb tool knows whether the control succeeded 
or not

(This used to be ctdb commit cace8b40090be5529ec6b463d3839d0e22f4039d)
2007-09-04 14:19:18 +10:00
Ronnie Sahlberg
a50e83448c change vnn to pnn in the traverse structure
(This used to be ctdb commit d56ae0963b420edea6a2d5eeb408a9811af3f3f6)
2007-09-04 10:49:21 +10:00
Ronnie Sahlberg
f69321edc8 change debug output from vnn to pnn
(This used to be ctdb commit 93a7cf759ae3f9af6671b9f8589e1399a669b46f)
2007-09-04 10:47:02 +10:00
Ronnie Sahlberg
d66d9cdd22 change debug output from vnn to pnn
change ctdb_daemon_send_message to take pnn as parameter isntead of vnn

(This used to be ctdb commit e352a2bbf9bb9a0b2c4f8329e8a529cf02414097)
2007-09-04 10:45:41 +10:00
Ronnie Sahlberg
0c91261340 change ctdb_send_message to take pnn as parameter instead of vnn
(This used to be ctdb commit 93dd4fba2e0fa6a011d15406652836785a974880)
2007-09-04 10:42:20 +10:00
Ronnie Sahlberg
157be530dd change ctdb_ctrl_getvnn to ctdb_ctrl_getpnn
(This used to be ctdb commit ef47cc4cd416065c69382e4d9e76c30a0a34e42f)
2007-09-04 10:38:48 +10:00
Ronnie Sahlberg
211b497818 change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn
change ctdb_ban_info.vnn to ctdb_ban_info.pnn

(This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a)
2007-09-04 10:33:10 +10:00
Ronnie Sahlberg
6f693bbcbd change server_id.vnn to server_id.pnn
(This used to be ctdb commit 26f2ee2b754a9271454412f05111a19b3013c6eb)
2007-09-04 10:21:51 +10:00
Ronnie Sahlberg
583b6e6ba6 change ctdb_get_vnn to ctdb_get_pnn
(This used to be ctdb commit 1e19930198c2bcc7ccb755e0ee51555fb823029a)
2007-09-04 10:18:44 +10:00
Ronnie Sahlberg
4ba9990143 change vnn to pnn in the ctdb tool
(This used to be ctdb commit 822556a4d4ba23459be3a25cbd3f48d1f64ba95f)
2007-09-04 10:14:41 +10:00
Ronnie Sahlberg
fc9d39c3a6 change ctdb_validate_vnn to ctdb_validate_pnn
(This used to be ctdb commit a4a1f41b69475b9dc16d8fd7f8965c32e96c32f0)
2007-09-04 10:09:58 +10:00
Ronnie Sahlberg
eb4cf6a686 change ctdb->vnn to ctdb->pnn
(This used to be ctdb commit 8c776e5707e503ec6586aae39ac6b3ea5a2fd2bc)
2007-09-04 10:06:36 +10:00
Ronnie Sahlberg
12ebb74838 change how we do public addresses and takeover so that we can have
multiple public addresses spread across multiple interfaces on each 
node.

this is a massive patch since we have previously made the assumtion that 
we only have one public address per node.

get rid of the public_interface argument.  the public addresses file 
now explicitely lists which interface the address belongs to

(This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8)
2007-09-04 09:50:07 +10:00
Andrew Tridgell
7423bcaabe up the release number
(This used to be ctdb commit 71a6213c92a12bf794c17c30ae4987149b68fe1b)
2007-08-30 17:51:05 +10:00
Ronnie Sahlberg
4e61e05f49 when we start 60.nfs we must make sure that the shared storage
nfs-state directory actually exists (by creating it)
or else the lock manager will not start 

(This used to be ctdb commit f2d15d04df842538c8d8331796a3c6fbe23463f2)
2007-08-30 15:27:45 +10:00
Andrew Tridgell
8c94d4dc87 merge from ronnie
(This used to be ctdb commit ab11fd70cf4d2165a5b55930cbad6fddf5397f54)
2007-08-27 18:04:53 +10:00
Ronnie Sahlberg
794fb10634 add an extra debug statement when we send a SIGTERM to a process
(This used to be ctdb commit a9c1be9cf9efdc69bfc95657b70e9f8b8230cda8)
2007-08-27 17:33:46 +10:00
Ronnie Sahlberg
2c0c94782a make the ctdb shutdown command use the async _send() function to send
the shutdown command
and return success to the caller if the _send() was successful

(This used to be ctdb commit 6bacaf8c7a96044708a6eda10cc8576adb7f5f79)
2007-08-27 15:03:52 +10:00
Andrew Tridgell
7f630b67f6 fixed segv when no public interface is set
(This used to be ctdb commit 55b415f87bd3cba13c73ccd2fe661720754a6af7)
2007-08-27 11:49:42 +10:00
Ronnie Sahlberg
7f02e16143 add async versions of the freeze node control and freeze all nodes in
parallell 

(This used to be ctdb commit f34e89f54d9f4380e76eb1b5b2385a4d8500b505)
2007-08-27 10:31:22 +10:00
Ronnie Sahlberg
a9c45b2562 change the monitoring of recmode in the recovery daemon to use a fully
async eventdriven api for controls

(This used to be ctdb commit 8d0e43428c507967a0d96e6a4c5c821ac269c546)
2007-08-27 09:40:10 +10:00
Ronnie Sahlberg
801bdbdc80 add a control to pull the server id list off a node
(This used to be ctdb commit 38aa759aa88a042c31b401551f6a713fb7bbe84e)
2007-08-26 10:57:02 +10:00
Ronnie Sahlberg
6681da31df add an initial implementation of a service_id structure and three
controls to  register/unregister/check a server id.

a server id consists of TYPE:VNN:ID    where type is specific to the 
application.  VNN is the node where the serverid was registered and ID 
might be a node unique identifier such as a pid or similar.


Clients can register a server id for themself at the local ctdb daemon.
When a client dissappears   or when the domain socket connection for the 
client drops  then any and all server ids registered across that domain 
socket will also be automatically removed from the store.

clients can register as many server_ids as they want at the same time    
but each TYPE:VNN:ID must be globally unique.

Clients have the option of explicitely unregister a server id by using 
the UNREGISTER control.


Registration and unregistration can only be done by clients to the local 
daemon. clients can not register their server id to a remote node.


clients can check if a server id does exist on any ctdb node in the 
network by using the check control

(This used to be ctdb commit d44798feec26147c5cc05922cb2186f0ef0307be)
2007-08-24 15:53:41 +10:00
Ronnie Sahlberg
de23937368 cleanup invoke_control_callback. we dont need to pass some of these
parameters to _recv() since they are already set

(This used to be ctdb commit 2034dbebb26d7a2d51241943f6ccbe15bb6a5169)
2007-08-24 10:54:34 +10:00
Ronnie Sahlberg
495a6403da change the api for managing callbacks to controls so that isntead of
passing it as a parameter we set the callback function explicitely from 
the caller if the ..._send() function returned a valid state pointer.

(This used to be ctdb commit aa939570662786455f63299b62c99882cff29d42)
2007-08-24 10:42:06 +10:00
Ronnie Sahlberg
1da9c03b1f comment why we do a talloc_steal
(This used to be ctdb commit aba7972728307e0ae52ccf8c0dd5808110fb92d7)
2007-08-24 09:34:04 +10:00
Ronnie Sahlberg
62a03ef9d5 get rid of the explicit global timeout used in the previous example and
try this time by relying on the timeouts for the individual controls

(This used to be ctdb commit 448a0eb4fd896dc545aa0b4bb2ba4628491578be)
2007-08-23 19:38:54 +10:00
Ronnie Sahlberg
f854b5f876 try out a slightly different api for controls where you provide a
callback function which is called upon completion (or timeout) of the 
control.

modify scanning of recmaster in the monitoring_cluster code to try the 
api out

(This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c)
2007-08-23 19:27:09 +10:00
Ronnie Sahlberg
4c13bf0c5f break checking that the recoverymode on all nodes are ok out into its
own function

(This used to be ctdb commit 813cf9a252af96da24122b80f24aabeed2911939)
2007-08-23 13:48:39 +10:00
Ronnie Sahlberg
8fd3df2553 hang the ctdb_req_control structure off the ctdb_client_control_state
struct  so that if we timeout a control we can print debug info such as 
what opcode failed and to which node

we dont need the *status parameter to ctdb_client_control_state

create async versions of the getrecmaster control

pass a memory context to getrecmaster

(This used to be ctdb commit 558b680c82f830fba82c283c78c2de8a0b150b75)
2007-08-23 13:00:10 +10:00
Ronnie Sahlberg
20120c2331 in ctdb_call_recv() we must check that state is non-NULL since
ctdb_call() may pass a null pointer to _recv() and this would cause a 
segfault.
fortunately there appears there are no critical users for this codepath 
right now so the risk was more theoretical IF clients start using this 
call it coult segfault.


change ctdb_control() to become fully async so we later can make 
recovery daemon do the expensive controls to nodes in parallell instead 
of in sequence

(This used to be ctdb commit 379789cda6ef049f389f10136aaa1b37a4d063a9)
2007-08-23 11:58:09 +10:00
Ronnie Sahlberg
277cdbe3d1 create an enum to describe the state of a control in flight instead of
using the enum that is for calls

(This used to be ctdb commit f9cf7076151af983a1c4ea56fbeb6d94ea508a34)
2007-08-23 09:53:10 +10:00
Andrew Tridgell
d95476fa38 merge from ronnie
(This used to be ctdb commit e0f1c1acb1188500674626d631e1a1b8726e72ad)
2007-08-22 17:31:29 +10:00
Andrew Tridgell
df9ec77b6b merge from volker
(This used to be ctdb commit a5587b3c065f7115ad5e55429c2c9d9923d3b4dc)
2007-08-22 17:18:55 +10:00
Andrew Tridgell
95f6328678 merge from volker
(This used to be ctdb commit 7007e4f2292aa96287b899d6b9e82c7b597ef58f)
2007-08-22 17:16:01 +10:00
Ronnie Sahlberg
50c09b7465 when we receive a packet from the network, check explicitely that the
node is not banned it the call is for a database record. i.e a REQ/REPLY 
CALL/DMASTER

if we get such a call while banned, ignore the packet and write an entry 
in the logfile

(This used to be ctdb commit 79eb0863609fbb12e28ebf734101b1d3f359b330)
2007-08-22 12:53:24 +10:00
Ronnie Sahlberg
f6e0336b23 create a define to represent the 'invalid' generation id we used in two
places.

create a new helper function to generate new generation id values that 
know about the invalid id and avoids generating it.

update the ctdb status tool to know about the invalid generation id and 
print the string INVALID instead

(This used to be ctdb commit 4fbcd189543cb8a92227fdcd3d158472e558ccda)
2007-08-22 12:38:31 +10:00
Ronnie Sahlberg
e3b6d1e511 if the node is inactive i.e. banned or disconnected then that node is
not participating in the cluster

if a client tries to attach to a database while the node is inactive,  
return an error back to the client and fail the attach

(This used to be ctdb commit b26949f3c8e54f3bc60da04d7b4ac69f301068fc)
2007-08-22 11:34:48 +10:00
Ronnie Sahlberg
b47384d57a when a node becomes banned its databases are no longer part of ctdb
and it should thus no longer serve any database access calls until it 
has been reintroduced into the cluster.

when becoming banned,   reset the local generation id to 1   to prevent 
any further database access calls from other nodes from being processed.

(This used to be ctdb commit b531021db43ebaa5f5d0ace28c59913d359bd8a8)
2007-08-22 10:38:35 +10:00
Ronnie Sahlberg
5fef81a6f1 if lockwait takes an excessive time to complete. log the time it took to
complete and also the name of the database

(This used to be ctdb commit 221ef0348fd8113a017d229d8c2c7aa5c4dfb5c2)
2007-08-22 09:46:48 +10:00
Ronnie Sahlberg
8b06fc7284 change the structure used for node flag change messages so that we can
see both the old flags as well as the new flags (so we can tell which 
flags changed)

send the CTDB_SRVID_RECONFIGURE messages to connected nodes only, not to 
every node, connected or not, in the cluster.


in the handler inside the recovery daemon which is invoked for node flag 
change messages, only do a takeover_run() and redistribute the ip addresses IF it was the 
disabled or the unhealthy flags that changed. Also send out the cluster 
reconfigured message in this case.
If any of the other flags changed we dont need to do the takeover_run(0 
here since that will be done during recovery.

(This used to be ctdb commit 5549b2058e2c148a8ca9d419123acf3247bb8829)
2007-08-21 17:25:15 +10:00
Ronnie Sahlberg
4e4dd6b886 when we shutdown the service due to receiving a 'ctdb shutdown' command
from the administrator, log this as 'Received SHUTDOWN command. Stopping 
CTDB daemon.'   so that the administrator will know when looking at the 
log 'why' the ctdb service was terminated.

Previously the only thing logged was 'shutting down' which is not 
detailed enough.

(This used to be ctdb commit 5b818c1b72b6594a8d6e45e1865026e3ce33ae63)
2007-08-21 09:46:27 +10:00
Ronnie Sahlberg
5228abef64 add an atexit() that will print "CTDB daemon shutting down" in the log
when the main daemon exits

(This used to be ctdb commit f7422397be2e319bfbee5bf0670583c353eda86d)
2007-08-21 09:43:53 +10:00
Ronnie Sahlberg
a03c8d4954 setup the logfile much earlier in the startup procedure for ctdbd
change initial errors that cause ctdb to fail to start from printf to 
DEBUG(0

add a DEBUG(0 to log that the ctdb service is starting

(This used to be ctdb commit 680b4fbb283dd68567a62a83345f11a6cc1dd0e5)
2007-08-21 09:33:03 +10:00
Ronnie Sahlberg
b582e13cae make sure that the event script is executable and just ignore it
othervise

(This used to be ctdb commit 65eb7845c70489d654acaaf99cd2c8eac7df11dc)
2007-08-21 09:22:14 +10:00
Ronnie Sahlberg
aed2c58c64 dont pollute the log with 'Registered PID XXX for client YYY' at log
level 0.

change the log level to 3 for this information message

(This used to be ctdb commit f28d713d9cacd2312932b51175aa8402c96ef76b)
2007-08-21 08:42:42 +10:00
Ronnie Sahlberg
7e1f840c8d if a public address has already been taken over by a node, then let that
public address remain at that node until either the node becomes 
unhealthy or the original/primary node for that address becomes healthy 
again.


Othervise what will happen is 
1, if we ban a node,   the banning code immediately does a 
takeover_run() and reassigns the public address to a different node in 
the cluster.
2, a few seconds later (at most) the recovery daemon will detect that 
the number of nodes has shrunk and will initiate a recovery.
During the recovery  the public address would again be assigned to a 
node, this time a different node.

(This used to be ctdb commit 30a6b7a648e22873d8ce6289a3d6dc42c4b9e3b3)
2007-08-20 14:16:58 +10:00