samba-mirror

mirror of https://github.com/samba-team/samba.git synced 2024-12-23 17:34:34 +03:00

Author	SHA1	Message	Date
Andrew Tridgell	3427793f01	don't do the first startup event until we are out of recovery (This used to be ctdb commit 689940eb6e23f16ee063331caf3986613a8963ea)	2007-11-12 13:10:15 +11:00
Andrew Tridgell	bde886988b	prevent a deadly embrace between smbd and ctdbd by moving the calling of the startup event scripts after the point where recovery has started and the node is in normal operation This makes the 'startup' script just a special type of the 'monitor' script which is called first (This used to be ctdb commit 7424c30a5fd04aea0137c466b4318c3f185280d8)	2007-11-12 10:53:11 +11:00
Andrew Tridgell	82bd652749	patch from michael adam (This used to be ctdb commit a7a3bef90f033bab5cb110a6ef77a8bef48f2588)	2007-11-02 13:20:29 +11:00
Andrew Tridgell	29e48fe54a	increase release number (This used to be ctdb commit dc648b1bb6becc52dcf900add97418a5634367eb)	2007-10-30 10:19:43 +11:00
Andrew Tridgell	684282f7a1	added bonding info to ctdb_diagnostics (This used to be ctdb commit 71b5fc434bc5d88eb0669ee29aa932ba12737e07)	2007-10-30 10:18:52 +11:00
root	2a70ac8801	the while loop in the startup event runs as a subshell so we need an extra \|\| exit 1 at the end to propagate the error code back to the caller of the script (This used to be ctdb commit c30d5c328784059949f5e82a07008e9632234f20)	2007-10-29 12:34:45 +11:00
Ronnie Sahlberg	8599f2008d	if bond* interfaces are used as public interfaces we can not rely on ethtool but have to check /proc for the status instead (This used to be ctdb commit 4ed7747267aea265b7a71c651abf6d5db4f4718b)	2007-10-29 10:51:16 +11:00
Ronnie Sahlberg	ba6f9ae4a7	merge from tridge (This used to be ctdb commit c7777b966f6a6e0f4126c03300338fdc822ac6c9)	2007-10-29 08:50:51 +11:00
Ronnie Sahlberg	bd73497a18	merge from tridge (This used to be ctdb commit 919ba610c61cfaf5ecc1ab64ad8be34a80d928f4)	2007-10-29 08:40:46 +11:00
Andrew Tridgell	6d75f0703e	added monitoring of ftp ports (This used to be ctdb commit 4780e078fb55d69053f78a4bbc7c67e569bb5dae)	2007-10-26 14:53:09 +10:00
Ronnie Sahlberg	533a530177	since service nfs stop/start sometimes fail to bring up the mount daemon on rhel5 check if mountd is running during monitoring and if it is not, try to restart it (This used to be ctdb commit 3d4b74669164b519398aeeacd59714f1e3884eff)	2007-10-23 12:35:43 +10:00
Andrew Tridgell	1d6b4f418d	update release number (This used to be ctdb commit fe6766940b2cf8a84ed51824158c956362a5806d)	2007-10-23 11:56:52 +10:00
Andrew Tridgell	2cea351f45	merge from ronnie (This used to be ctdb commit cc70a2cc5f5400d6480cb609e1fa203236917976)	2007-10-23 11:45:36 +10:00
Ronnie Sahlberg	44ab81763d	merge from tridge (This used to be ctdb commit 938e375a80ce2f1827117c38554f576f73a5c71e)	2007-10-23 06:42:45 +10:00
Andrew Tridgell	6e6de1e4b7	fixed a problem with backgrounding onnode (This used to be ctdb commit 4e23630224bb219cfbbf129c4562da5a4c2d601a)	2007-10-22 21:11:02 +10:00
Andrew Tridgell	8e22bca5ca	fixed a double close of a socket, leading to an EPOLL error (This used to be ctdb commit bbe8ad842bdfedd37ef14a6be07ad939113fe9b1)	2007-10-22 16:41:11 +10:00
Ronnie Sahlberg	6a32af60b8	nfs may take a while to stop so do it in the background (This used to be ctdb commit 2ccaeaf6a65731c17173a4945e3e00e230e67d35)	2007-10-22 15:14:49 +10:00
Andrew Tridgell	2d8afd85d5	another place where we need to mark connect_fde as freed (This used to be ctdb commit d047fbeafebe4b150602f9a91802795659058b16)	2007-10-22 15:13:32 +10:00
Andrew Tridgell	2931ed5d17	fixed a valgrind uninitialised memory error due to pad bytes (This used to be ctdb commit aea9b0c8d467fe19815c046969e9c1049a3a20ac)	2007-10-22 15:13:08 +10:00
Andrew Tridgell	f09537e7f1	prevent a double free (This used to be ctdb commit 5a1b923abb36c6deb99ae178fdd54f12235dc309)	2007-10-22 14:07:35 +10:00
Ronnie Sahlberg	1d6a74f943	when shutting down, we should stop monitoring (This used to be ctdb commit 325683ef8f326f0565a827ff2c493adcab6e0d64)	2007-10-22 12:34:51 +10:00
Ronnie Sahlberg	4a97876fb7	when we are shutting down, we should first shut down the recovery daemon (This used to be ctdb commit 39ade6b329adcd3234124d6a8daaa6181abf739b)	2007-10-22 12:34:08 +10:00
Ronnie Sahlberg	f022df1d40	dont set parameters in statd-callout. if they should be set they should be set from 10.interfaces (This used to be ctdb commit 0c7c2dae0a976922de58793d576855bc37cd38e1)	2007-10-22 10:18:38 +10:00
Ronnie Sahlberg	caad5dc38d	dont set some of the sysctl variables in statd-callout. these are mainly useful for avoiding ack-storms when doing very rapid failover/failback during testing but should not be required in real-world. this gets rid of a lot of annoying messages from the messages file (This used to be ctdb commit 50d289dcce2caa7c7be9b6faa3b38b69c2237038)	2007-10-21 06:42:33 +10:00
Andrew Tridgell	1a8338e443	increase release number (This used to be ctdb commit 747ff96f1d93c52ba7548d0540266b0277d88ac1)	2007-10-19 12:22:24 +10:00
Andrew Tridgell	f47f758fe8	merge from ronnie (This used to be ctdb commit d444fdc7782496abe4b27003b647ac49fb52e6be)	2007-10-19 09:39:07 +10:00
Andrew Tridgell	623e216dcf	remove an incorrectly added file (This used to be ctdb commit ff01a32db81b6c04d42634f5660181c270988264)	2007-10-19 09:30:55 +10:00
Ronnie Sahlberg	e81e008a36	add missing ) in the IB transport (which i dont compile for) (This used to be ctdb commit 7f7a184bae87d46bd589d11068b6443b007366b4)	2007-10-19 09:05:37 +10:00
Ronnie Sahlberg	fe7b5b4d85	add a stub restart method for IB (This used to be ctdb commit d318504ad5a49dbdfa307be39ae88df839e6308d)	2007-10-19 09:04:52 +10:00
Ronnie Sahlberg	d1ba047b7f	add a new transport method so that when a node is marked as dead, we shut down and restart the transport. otherwise, if we use the tcp transport the tcp connection might try to retransmit the queued data during the time the node is unavailable. this together with the exponential backoff for tcp means that the tcp connection quickly reaches the maximum backoff rto which is often 60 or 120 seconds. this would mean that it could take up to 60/120 seconds before the tcp layer detects that the connection is dead and it has to be reestablished. (This used to be ctdb commit 0256db470879ce556b0f00070f7ebeaf37e529ab)	2007-10-19 08:58:30 +10:00
Ronnie Sahlberg	755511d28d	set the flags explicitly instead of masking them in (This used to be ctdb commit 27a5f9dead44890683f9dbc4f07cda11264aa03b)	2007-10-18 16:54:00 +10:00
Andrew Tridgell	b814462c38	added some debug lines to help track down a problem (This used to be ctdb commit 2ca31e9de179f76e392a26cc8305e2473357c760)	2007-10-18 16:27:36 +10:00
Andrew Tridgell	5e3d5b1314	merge from ronnie (This used to be ctdb commit a6b094fdede0ae850e87877fad0b9dd1f3a26869)	2007-10-18 15:51:15 +10:00
Andrew Tridgell	d939a2901b	merge from ronnie (This used to be ctdb commit 75d4b386293e186a6bb8532515585ab72670d663)	2007-10-18 15:44:02 +10:00
Ronnie Sahlberg	e4ec6e9d6b	flush the route cache when we have added the single public ip to the node cleanup and remove everything when we do a shutdown event (This used to be ctdb commit 221432f45073bc7624803058c8bbf18838e7ceeb)	2007-10-18 14:13:48 +10:00
Ronnie Sahlberg	537841fadb	use NF_DROP instead of NF_STOLEN when we tell the kernel to not worry about this packet any more and just forget it ever saw it (This used to be ctdb commit 42a2a777cbc15a8cbbea7ecf2fb1c6dafa242d0c)	2007-10-17 15:03:58 +10:00
Ronnie Sahlberg	9a93f4b8df	reverse the order in which public ips are listed so it matches the order of the public_addresses file (This used to be ctdb commit ce987661edd9160982e65866fb773445d296e5c7)	2007-10-17 13:42:42 +10:00
Ronnie Sahlberg	805ba22d65	merge from tridge (This used to be ctdb commit 87760a95ec0a9e3cb2c415c569235a1ff58318cb)	2007-10-17 10:10:52 +10:00
Andrew Tridgell	85f91b9d5c	increase release number (This used to be ctdb commit 69fe7ce1d7874ce51d79de29adc53c207cb8869f)	2007-10-16 20:14:04 +10:00
Andrew Tridgell	6b9d73a96d	more detail on multipath config (This used to be ctdb commit 78c44f2267cbef5fbc57d56dfd5ff40972733a1f)	2007-10-16 20:13:28 +10:00
Ronnie Sahlberg	ce7a054d20	add back the test inside the daemon that if someone asks us to drop recovery mode back to NORMAL that we can not lock the reclock file since at this stage it MUST be locked by the recovery daemon. in order to prevent a non-blocking fcntl() lock from blocking and causing "issues" we move the 'test that we can not lock reclock file' into a child process. (This used to be ctdb commit 3af994641ec2234e37da1fa1f693441586471a7e)	2007-10-16 15:27:07 +10:00
Ronnie Sahlberg	056aac6e0c	add a new tunable : DeterministicIPs that makes the allocation of public addresses to nodes deterministic. Activate it by adding CTDB_SET_DeterministicIPs=1 in /etc/sysconfig/ctdb When this is set, the first entry in /etc/ctdb/public_addresses will always be hosted by node 0, when that node is available, the second entry by node1 and so on. This tunable allows the allocation of addresses to become very unbalanced and is only for debugging/testing use. Beware, this feature requires that /etc/ctdb/public_addresses are identical on all the nodes in the cluster. (This used to be ctdb commit f0ca221f235731542090d8a6c86f2b7cd2ce2f96)	2007-10-16 12:15:02 +10:00
Ronnie Sahlberg	25d3a031d0	include system/network.h so we get the prototype for inet_aton() (This used to be ctdb commit 7145764b2d217f88a723dcb0ffd4e5a1567d64cf)	2007-10-16 11:29:33 +10:00
Ronnie Sahlberg	7e2e1b14fb	merge from tridge (This used to be ctdb commit 9e6bc12c9be2dabcfb9c6aeef257ef4737287fab)	2007-10-16 11:26:22 +10:00
Ronnie Sahlberg	b3ff7d904d	dont try to lock the file from inside the ctdb daemon. even though we dont want a blocking lock it does appear that the fcntl() call can block for a while if gpfs is in the process of rebuilding itself after a node arriving/leaving the cluster (This used to be ctdb commit 6c0d206dea7116db71bccb4802a93dd7283249f6)	2007-10-16 09:50:31 +10:00
Andrew Tridgell	d7f6b63f0a	only link to -lipq if needed (This used to be ctdb commit 7c378d881e37db0f14e07ccba19fde1f9f4f0831)	2007-10-15 14:44:06 +10:00
Andrew Tridgell	574db736f2	improved handling of systems without libipq.h (This used to be ctdb commit cfa8ddd3ca53c0160558137cccfc7e73e46ec36c)	2007-10-15 14:37:54 +10:00
Andrew Tridgell	9570939337	disable ipmux code until we have a configure test (This used to be ctdb commit fd83f0f3eb233f22ce9b5b4afbc4f26e3c865b3c)	2007-10-15 14:29:47 +10:00
Andrew Tridgell	99bc0aca93	sync flags between nodes in monitor loop in recmaster (This used to be ctdb commit 6eef86e06388fc53a1212f1e2783ae174c6cd210)	2007-10-15 14:28:51 +10:00
Andrew Tridgell	0e855c0772	merge from ronnie (This used to be ctdb commit d18712caba11855010be52f90bac656683076676)	2007-10-15 14:17:49 +10:00
Andrew Tridgell	0a5bfdd315	disable optimisation for now, until we find an occasional segv (This used to be ctdb commit d09570c70551aa40390ce9ceffe7bc234e1afafe)	2007-10-15 13:31:09 +10:00
Andrew Tridgell	174879621e	add config option for disabling bans (This used to be ctdb commit 153b911f7f957d4c564b04f5aa878033a02da9e4)	2007-10-15 13:22:58 +10:00
Ronnie Sahlberg	ebe772b1b2	use $CTDB_BASE in 90.ipmux instead of hardcoding it to /etc/ctdb (This used to be ctdb commit 6abb46b010851f5719f12273b4a3d46ec986f0c7)	2007-10-11 07:51:57 +10:00
Ronnie Sahlberg	870a57a55b	use kill_tcp_connections() to kill off all tcp connections to the "single public ip" address when we do a recovery (This used to be ctdb commit 19b52a2d5db31efa9e7c77037097ff8539986ac3)	2007-10-11 07:30:10 +10:00
Ronnie Sahlberg	fa5d51c238	move the kill_tcp_connections() function from 10.interfaces to functions (This used to be ctdb commit 055948530fb16bf49c42fc4489f29a21665156c0)	2007-10-11 07:27:38 +10:00
Ronnie Sahlberg	1a4999076b	first check that recovery master is connected (we know this from our own flags) then pull the flags off recovery master before checking if it is banned (This used to be ctdb commit 94c1d234e57a40eda2d8b892dd9fbe1ffc4b3433)	2007-10-11 07:10:17 +10:00
Ronnie Sahlberg	167e100d4b	simplify election handling make sure we read and update the flags from all remote nodes before we reach the first codepath that can call do_recovery() since during do_recovery() we need to know what the flags are. (This used to be ctdb commit e85f3806483ea420559d449e0e4d81bec996740f)	2007-10-11 06:16:36 +10:00
Ronnie Sahlberg	33a6aa3c3f	merge from tridge (This used to be ctdb commit 4690a205fe4325b03ab044bdb5fbc9aa3e94db6e)	2007-10-10 10:49:55 +10:00
Andrew Tridgell	011a205b86	make sure reconnected nodes start off as unhealthy so they don't get a public IP (This used to be ctdb commit c733ec6760cae01ce277f491caf1355e46de5cf7)	2007-10-10 10:45:22 +10:00
Ronnie Sahlberg	bdd67bba1e	add a --single-public-ip argument to ctdbd to specify the ip address used in single public ip address mode. when using this argument, --public-interface must also be used. add a vnn structure to the ctdb context to describe the single public ip address. update the killtcp control in the daemon so that if a socketpair that is to be killed does not match a normal public address it checks if the destination address matches the single public ip address and if so uses that vnn structure from the ctdb context. this allows killtcp to kill also connections to the single public ip instead of only normal public addresses (This used to be ctdb commit 5661ba17b91f62821dec1c76056c78b99752a90b)	2007-10-10 09:42:32 +10:00
Ronnie Sahlberg	7735957693	remove some debug outputs (This used to be ctdb commit f29c0b52df1f455909ba133e3ad3bc462dc32929)	2007-10-09 13:45:42 +10:00
Ronnie Sahlberg	03e0277e03	send out gratuitous arps when we are starting up serving the "single public ip" but before we start the ipmux tool (This used to be ctdb commit dad1a80f39763314825939095f7656c13dcdbdc3)	2007-10-09 12:00:12 +10:00
Ronnie Sahlberg	80cd82f8e4	add a control to send gratuitous arps from the ctdb daemon (This used to be ctdb commit 563819dd1acb344f95aabb4bad990b36f7ea4520)	2007-10-09 11:56:09 +10:00
Ronnie Sahlberg	292e9d9109	add an initial test version of an ip multiplex tool that allows us to have one single public ip address for the entire cluster. this ip address is attached to lo on all nodes but only the recmaster will respond to arp requests for this address. the recmaster then runs an ipmux process that will pass any incoming packets to this ip address onto the other nodes in the cluster based on the ip address of the client host. to use this feature one must 1, have one fixed ip address in the customers network permanently attached to an interface 2, set CTDB_PUBLI_INTERFACE= to specify on which interface the clients attach to the node 3, CTDB_SINGLE_PUBLI_IP=ip-address to specify which ipaddress should be the "single public ip address". to test with only one single client, attach several ip addresses to the client and ping the public address from the client with different -I options. look in network trace to see to which node the packet is passed onto. (This used to be ctdb commit 50d648c95e4e6d7c2867a034c2b550086d853320)	2007-10-08 14:05:22 +10:00
Ronnie Sahlberg	ab5d098bf6	add a function in the ctdb tool to determine whether the local node is the recmaster or not. return 0 if the node is the recmaster and 1 (true) if it is not or if we could not communicate with the ctdb daemon. call it 'isnotrecmaster' to cope with the fact that if the tool could not bind to the socket to talk to the daemon, the tool will automatically return an error and exit code 1. thus the tool will only return 0 if it could talk successfully to the local daemon and if the local daemon confirms this node is the recmaster (This used to be ctdb commit ae5fcb790b6c3985f514fa8a96bc00c2619f2a28)	2007-10-08 09:47:20 +10:00
Ronnie Sahlberg	de6c5ed14d	merge from tridge (This used to be ctdb commit 02cda01c032804cb1c53593ceb98685c827e2d58)	2007-10-06 08:11:24 +10:00
Andrew Tridgell	50770008df	fixed several places where we set the recovery culprit incorrectly (This used to be ctdb commit d9da73395fa443801fc68ec53a42b548e832d58a)	2007-10-05 13:51:31 +10:00
Andrew Tridgell	4115492992	- catch ESTALE in the recovery lock by trying a read() - prioritise nodes that are unbanned and healthy in the election (This used to be ctdb commit 929feb475dfdf7283f0e99b50b179e1c91d3a39f)	2007-10-05 13:28:21 +10:00
Andrew Tridgell	fb48f2d5a2	we are the culprit if we can't get the reclock (This used to be ctdb commit 1d320e113c6134ff6822b985a47131d8204af35a)	2007-10-05 12:01:40 +10:00
Ronnie Sahlberg	72379ee3eb	change async.private to async.private_data since private is a reserved word in c++ (This used to be ctdb commit 79eb28f6cd5dcc30b04966d202a050eaf98a2552)	2007-09-26 14:25:32 +10:00
Andrew Tridgell	ef995062b2	upped version number (This used to be ctdb commit 4312e20e047ddb0f825c5e0c51d85dfa6a1b7df8)	2007-09-24 15:27:01 +10:00
Ronnie Sahlberg	359448ff00	when we have a public ip address mismatch (i.e. we hold addresses we shouldnt or we are not holding addresses we should) we must first freeze the local node before we set the recovery mode (This used to be ctdb commit a77a77e8b5180f6a4a1f3d7d4ff03811f3b71b56)	2007-09-24 10:52:26 +10:00
Andrew Tridgell	e3d0ec8797	fixed a fd leak on the recovery lock (This used to be ctdb commit 186f35c42ed4fcc9ed44390b0dd036ece475d45e)	2007-09-24 10:19:07 +10:00
Andrew Tridgell	80100c3573	run monitoring more quickly when unhealthy and at startup (This used to be ctdb commit ff1c205928e3ef5bcc6bf4e4b2122a19fa38d8f4)	2007-09-24 10:12:18 +10:00
Andrew Tridgell	b87ddd9148	no longer wait at startup for services to become available, instead set the node initially unhealthy and let the status monitoring bring the node online. This fixes a problem with winbindd, where it refused to start because secrets.tdb was not populated but we could not populate ctdbd, because the net command would not run while ctdbd was still doing startup and thus frozen (This used to be ctdb commit 3a001b793dd76fb96addf1e2ccb74da326fbcfbc)	2007-09-24 10:00:14 +10:00
Andrew Tridgell	4178cb98a1	fixed a valgrind error, and some warnings (This used to be ctdb commit c0f52dbb385fa0748680adb7c40755c92e577551)	2007-09-24 09:57:14 +10:00
Andrew Tridgell	416c0cec6e	make the persistent dbdir configurable (This used to be ctdb commit 2587b887dcfce26b12c66fcb5d34e92da42a1776)	2007-09-21 16:12:04 +10:00
Andrew Tridgell	2607c222fc	avoid using connected nodes that aren't in the vnn map yet (This used to be ctdb commit 2b5ae133f5f6fa9ad1a8896fe4b4c542d4ca462d)	2007-09-21 15:44:13 +10:00
Ronnie Sahlberg	51d912063c	in ctdb_control_persistent_store() we must talloc_steal() the pointer to c to prevent it from being immediately freed (and our persistent store state with it) if we need to wait asynchronously for other nodes before we can reply back to the client (This used to be ctdb commit fa5915280933e4d2e7d4d07199829c9c2b87a335)	2007-09-21 15:19:33 +10:00
Ronnie Sahlberg	61e885d0b9	when ctdb attaches to a database it broadcasts the attach to all other nodes so that the db is created on them as well when we send this broadcast we must use the correct control and not assume all databases created are of the temporary kind (This used to be ctdb commit 106f816d4a0814ca4418de051289d9fc62df7dd2)	2007-09-21 13:47:40 +10:00
Ronnie Sahlberg	e9f45419da	merge from tridge (This used to be ctdb commit bb283ee8ebaea848366e9c3b3d3244da459a7967)	2007-09-21 13:20:29 +10:00
Andrew Tridgell	c60988325d	added support for persistent databases in ctdbd (This used to be ctdb commit 3115090a0d882beca9d70761130b74bb0821f201)	2007-09-21 12:24:02 +10:00
Ronnie Sahlberg	577fffdc5d	merge from tridge (This used to be ctdb commit c4262a3f980a9fabe0985fc3281e9f5de1cdf35e)	2007-09-19 11:54:45 +10:00
Ronnie Sahlberg	dc0bd93789	one more command to run to enable winbind for vsftpd (This used to be ctdb commit 1f6d13a364cde58b66a4bc52e909cc68b8c807d7)	2007-09-19 11:53:48 +10:00
Andrew Tridgell	81bfa58d58	make sure we set close on exec on any possibly inherited fds (This used to be ctdb commit d9dec82076f14a348e7b67b4350180681ff86f32)	2007-09-19 11:46:37 +10:00
Andrew Tridgell	0438b07b53	separate out the various fs display ops (This used to be ctdb commit dc89e1a428da5d5ca2a9c4988c05de3ea65f00f4)	2007-09-19 11:46:11 +10:00
Andrew Tridgell	bd7eeebe16	expanded ctdb_diagnostics a bit (This used to be ctdb commit 70a4bb3dc7e624ad778949dbc874c2617fd532e6)	2007-09-17 15:31:33 +10:00
Ronnie Sahlberg	d9f936fefe	add documentation of additional requirements for FTP so that users can log in and access files using the AD username/password (This used to be ctdb commit 679e125770247fc24dfb14b5781d44f639457ecd)	2007-09-17 13:01:16 +10:00
Andrew Tridgell	c6abc4e1a8	increase release number (This used to be ctdb commit b213f4f1bf5bcfb40b9ce176df22216e3ebbe964)	2007-09-14 19:27:11 +10:00
Andrew Tridgell	ed75f988d5	merge from ronnie (This used to be ctdb commit 913c33a7d2f67570548fecc568dba874e5f72dd2)	2007-09-14 15:23:23 +10:00
Ronnie Sahlberg	2d0261afeb	let ctdb ip only print the ip addresses known to the specified node and not the entire cluster (This used to be ctdb commit eb1f67a56d752c9f42a9a26a6697a7ab8e668b3a)	2007-09-14 15:19:44 +10:00
Ronnie Sahlberg	90a37c4fb4	update vnn -> pnn in documentation (This used to be ctdb commit bb62b4df514255b95d3871b254bec9c440bc4a06)	2007-09-14 14:24:53 +10:00
Ronnie Sahlberg	05e1f67381	documentation updates it is --event-script-dir not --event-script add explanation of the public_addresses file (This used to be ctdb commit 21325b23e786ac1c2abc07ea75b0814e9c725a9e)	2007-09-14 14:19:12 +10:00
Andrew Tridgell	c62490569b	cope with non-standard install dirs in event scripts (This used to be ctdb commit 52fff5345873690a9cc86495f414343eaa3bd540)	2007-09-14 14:14:03 +10:00
Andrew Tridgell	305f432e50	fix pkill args (This used to be ctdb commit 9690de97b4746f4a79830465e3a1679e9fbda671)	2007-09-14 11:59:04 +10:00
Andrew Tridgell	955d4d8615	make sure all public IPs are removed at startup (This used to be ctdb commit b16f33787f2a9471285037f4a6d470e826536570)	2007-09-14 11:56:40 +10:00
Ronnie Sahlberg	8edcd3f83f	during startup make sure to delete any public addresses from any interface (This used to be ctdb commit 18d80ea6db39e61f60e4c01de164d58bcbd8ab10)	2007-09-14 10:37:10 +10:00
Ronnie Sahlberg	6052078b53	let each node verify that they have a correct assignment of public ip addresses (i.e. they hold those they should hold and they dont hold any of those they shouldnt hold). if an inconsistency is found, mark the local node as recovery mode active and wait for the recovery master to trigger a full blown recovery (This used to be ctdb commit 55a5bfc8244c5b9cdda3f11992f384f00566b5dc)	2007-09-14 10:16:36 +10:00
Andrew Tridgell	42fc00bda9	- merge from ronnie - add a flag to check that recovery completed correctly. If not, re-trigger it in monitoring (This used to be ctdb commit d5ed941d9bab4af30d8b5f9b77bdf43d9218d69b)	2007-09-14 09:49:12 +10:00
Andrew Tridgell	3b159e4e60	wait for ctdbd to finish cleanup before considering "service ctdb stop" to be done (This used to be ctdb commit 216eb4be7ec481cfe9aaeeada257b77cb394d2e4)	2007-09-14 09:25:11 +10:00
Andrew Tridgell	9cf96a5e4c	nicer use of testparm (This used to be ctdb commit a611ea930fb9dae6e56f6a74b2bdc9e08066d4d1)	2007-09-14 09:24:34 +10:00
Ronnie Sahlberg	4c20141659	update the section about event scripts (This used to be ctdb commit a0744480c85a4e8648bd0ae7600f90d311b931ea)	2007-09-14 08:56:27 +10:00
Ronnie Sahlberg	528b8af87f	disable nfsv4 in etc/sysconfig/nfs (This used to be ctdb commit b71e11f0e27bb3ff908ad171aa5b1f724609ad05)	2007-09-14 08:15:24 +10:00
Ronnie Sahlberg	4186d8eaba	when a ctdb_takeover_run has failed we must make sure that need_takeover_run is set to true or else we might forget to rerun it again during the next recovery. otherwise, need_takeover_run is only set to true IFF the node flags for a remote node and the local nodes differ. It is possible that a takeover run fails and thus the reassignment of ip addresses is incomplete, but before we get back to the test in monitor_cluster() all the node flags of all nodes have converged and they now match each other again, thus causing monitor_cluster() to fail to realize that a takeover run is needed. (This used to be ctdb commit ae7e866787cebd14394983ce1834387c959d1022)	2007-09-13 14:51:37 +10:00
Andrew Tridgell	2f86c3f827	ensure smbd and winbindd do die in 50.samba (This used to be ctdb commit 6f23affedb626fc7a5ca86c4763f3045a5586231)	2007-09-13 14:36:23 +10:00
Ronnie Sahlberg	ab1c8c074e	merge from tridge (This used to be ctdb commit eda3caa77be352967a41ff9bddda5296c94797a9)	2007-09-13 14:28:18 +10:00
Andrew Tridgell	9d50595b8a	prevent recursion in the calling of ctdb_takeover_run (This used to be ctdb commit 0fbdeb7c91b965d9bc5ecc7b24e31070378d8f1d)	2007-09-13 14:08:18 +10:00
Andrew Tridgell	6fa6101b1a	more shell scripting fixes in 10.interface (This used to be ctdb commit 4ee2230b3f2ae7437a9d0cf973eb4645d276accd)	2007-09-13 11:57:42 +10:00
Andrew Tridgell	30de14fe79	force recovery if unable to tell a node to release an IP (This used to be ctdb commit 6895788d2499344a03357e5c1103cb8383e9eaf7)	2007-09-13 11:19:49 +10:00
Andrew Tridgell	25940014c0	fixed script errors in 10.interface (This used to be ctdb commit 0c759614d27758cef3eba5942b2cccad54193cbb)	2007-09-13 11:19:30 +10:00
Andrew Tridgell	3c0f61cb92	we don't need the is_loopback logic in ctdb any more (This used to be ctdb commit 4ecf29ade0099c7180932288191de9840c8d90a9)	2007-09-13 10:45:06 +10:00
Andrew Tridgell	4f261ae191	remove more cruft from the logs (This used to be ctdb commit b67f35c483b6cbb5facaa6380c7794709f44213a)	2007-09-13 10:39:05 +10:00
Andrew Tridgell	023b885793	new approach for killing TCP connections on IP release (This used to be ctdb commit c33a0db29b5604966f582b1f8c5fd66760c72197)	2007-09-13 10:24:48 +10:00
Andrew Tridgell	1b53ecc445	remove clutter from ctdb log file (This used to be ctdb commit 54d5dcaaee0498f40bbee5059cc72d0ca75d33b7)	2007-09-13 10:03:18 +10:00
Andrew Tridgell	a919f6927a	fixed return code (This used to be ctdb commit 30165b5a19f9bd9d1f62c9c222df0711c1c6a927)	2007-09-13 10:02:56 +10:00
Andrew Tridgell	96c54c6188	handle hung or slow ctdb daemons on shutdown (This used to be ctdb commit a3089211782ab12387c1b04efa28914c94d89b30)	2007-09-12 13:26:24 +10:00
Andrew Tridgell	6c77184d96	- set arp_ignore to prevent replying to arp requests for addresses on loopback - put removed IPs on loopback with scope host - check for nul strings in ethtool call ; (This used to be ctdb commit e2df1d6d08e67a36ff05a590a34c56e900741287)	2007-09-12 13:23:36 +10:00
Andrew Tridgell	67bd64ef35	- don't allow the registration of clients with IPs we don't hold - change some debug levels to make tracking of IP release problems easier (This used to be ctdb commit 5f9aed62adaf87750f953412c55b29c58e4bb6c0)	2007-09-12 13:22:31 +10:00
Andrew Tridgell	a478c78f03	changed some debug levels (This used to be ctdb commit ed764533e1c2f8982e1577ca5e7f5f4482a15345)	2007-09-12 13:21:19 +10:00
Ronnie Sahlberg	536d393452	use the public addresses variable instead of hardcoding the path (This used to be ctdb commit 8e23f173cda8a76bbc243863bfc49fe8c7b907f4)	2007-09-12 07:28:24 +10:00
Ronnie Sahlberg	98f968d8d3	move all ip addresses onto loopback when we startup ctdb (This used to be ctdb commit 5d7500f7d93f0d36ffbf3c966c5b38f82f0376c7)	2007-09-12 07:26:30 +10:00
Andrew Tridgell	a6728e0520	fixed location of arp_filter (This used to be ctdb commit ea239c82fca2b9a648d21e5c603e632011958452)	2007-09-11 16:38:32 +10:00
Andrew Tridgell	5b65a6c7f0	get interface right (This used to be ctdb commit e0edc38d7e897f7de2850eb2cfd17fea75c16fcc)	2007-09-10 20:45:27 +10:00
Ronnie Sahlberg	a9a8ad07b4	grab the interface name from tok and not from the uninitialized array (This used to be ctdb commit 23a47ca2331a163b5fde03bd2f6f1d478633aede)	2007-09-10 16:34:11 +10:00
Ronnie Sahlberg	9c1b2f4856	merged patch from tridge (This used to be ctdb commit 90ab044093f67b656e21861ce12d6fee5794d21f)	2007-09-10 16:23:06 +10:00
Andrew Tridgell	8cd7ca149e	fixed a pointer cast warning (This used to be ctdb commit df0e7a4aa13112d613702d8ea0fb0e18510d293c)	2007-09-10 15:16:17 +10:00
Andrew Tridgell	57d8102cf8	added back --public-interface to startup script (This used to be ctdb commit 9e9cb3c0da7251f522c655366ef0868037577a9c)	2007-09-10 15:09:28 +10:00
Andrew Tridgell	f3ae1cdb02	- use struct sockaddr_in more consistently instead of string addresses - allow for public_address lines with a defaulting interface (This used to be ctdb commit 29cb760f76e639a0f2ce1d553645a9dc26ee09e5)	2007-09-10 14:27:29 +10:00
Andrew Tridgell	70ec39b1b1	add back in --public-interface as a default (This used to be ctdb commit cdf56daf69b2c8381ee673943e982ad20f19affd)	2007-09-10 14:26:35 +10:00
Andrew Tridgell	42168177ef	merge from ronnie (This used to be ctdb commit 1f21d4d563232926c35d03c4d69eb69190823dc6)	2007-09-10 13:21:11 +10:00
Andrew Tridgell	f3927719c9	add crontab and sysctl output (This used to be ctdb commit b1b59f3294ee7a5ed6d685f373bf19d3152170fa)	2007-09-10 11:27:07 +10:00
Ronnie Sahlberg	50381480eb	update a comment (This used to be ctdb commit e7d3ef4443686529299e8f293398cc0522235627)	2007-09-10 07:45:57 +10:00
Ronnie Sahlberg	4ac749bfa4	change the signature to ctdb_sys_have_ip() to also return: a bool that specifies whether the ip was held by a loopback adaptor or not, and the name of the interface where the ip was held. when we release an ip address from an interface, move the ip address over to the loopback interface. when we release an ip address, after we have moved it onto loopback, use 60.nfs to kill off the server side (the local part) of the tcp connection so that the tcp connections dont survive a failover/failback. 61.nfstickle: since we kill the tcp connections when we release an ip address we no longer need to restart the nfs service in 61.nfstickle. update ctdb_takeover to use the new signature for ctdb_sys_have_ip. when we add a tcp connection to kill in ctdb_killtcp_add_connection() check if either the source or destination address matches a known public address (This used to be ctdb commit f9fd2a4719c50f6b8e01d0a1b3a74b76b52ecaf3)	2007-09-10 07:20:44 +10:00
Ronnie Sahlberg	0ebd7beb4b	set /proc/sys/net/ipv4/conf/all/arp_filter to 1 by default when 10.interfaces starts up. this setting makes the system only respond to ARP requests from the NIC where the ip address is tied to and adds to the "principle of least surprise" when using multihoming servers (This used to be ctdb commit 39ddf347dc45f599964a4c17e67e71faed00e544)	2007-09-08 08:09:02 +10:00
Ronnie Sahlberg	d91b28f8b7	ctdb ip must loop over all connected nodes to pull the public ip list and merge into a big list since with the deassociation between a node and a public ipaddress the /etc/ctdb/public_addresses files can differ between nodes and no node knows about all public addresses that a cluster can use (This used to be ctdb commit e208294fed183977cacc44b2cd1195c11d967c18)	2007-09-07 16:45:19 +10:00
Ronnie Sahlberg	3cad21d6be	remove the ctdb publicip command. this command no longer makes sense when there is no one-to-one mapping between a node and its default public ip (This used to be ctdb commit 91280db7f6dd3d659edd86fae21ba347d6f9da9e)	2007-09-07 15:39:26 +10:00
Ronnie Sahlberg	d0dd8df752	update web nfs with the new NFS_HOSTNAME variable we need to be able to stat notify using the correct hostname (This used to be ctdb commit 1498e33e48a4654e02b74a00ef7473fed3225d69)	2007-09-07 12:20:48 +10:00
Ronnie Sahlberg	eb7a15730e	add a short delay after stopping nfslock to make it less likely that "weird" things happen (This used to be ctdb commit 4934c083cbcc19714094e08a0b7da1fb6fdc8a5a)	2007-09-07 12:14:53 +10:00
Ronnie Sahlberg	68c37f9b41	merge from tridge (This used to be ctdb commit 58c918b1bfe09c31049769dee266129cbad4cb20)	2007-09-07 09:21:40 +10:00
Ronnie Sahlberg	fa872de664	60.nfs: we must always restart the lockmanager when the cluster has been reconfigured and ip addresses have changed. This is to make sure we get a clusterwide grace period for nfs locking. if we dont do this and only restart locking on the nodes that were directly affected, a different client can take out a conflicting lock from a different node before affected clients have had a chance to reclaim all the locks lost during reconfigure. grace period on rhel5 kernel has been increased to 90 seconds! statd-callout: we must restart lockmanager to ensure a clusterwide grace period for nfs. this makes locking "more correct" for nfs clients and prevents other clients/nodes from taking out a conflicting lock while a different client/node tries to reclaim lost locks. This makes it "almost consistent" for NFS clients but there is still the possibility that a cifs client can take out a conflicting lock before an nfs client has had a chance to reclaim an existing lock. This can not be solved with anything less than making the kernel nfs lock manager "samba aware" and making samba aware of the internal state of the kernel lock manager so that they can cooperate. we can not just stop/start the lockmanager back to back in rhel5 since if they are stopped/started too close to each other then when the new lockmanager upon starting up sends out statd notifications two things can happen: 1, new lockmanager sends out notification BEFORE it has registered with portmapper, leading to: lockmanager starts; lockmanager sends notification to the client; client tries to recover the lock and tries to portmap the lockmanager port on the server. server is not (yet) registered with portmapper and server responds "no such program" to the clients request to discover where lockmanager is. client then just completely gives up reclaiming the lock and doesnt even reattempt the portmapper call after some timeout. ==> lock reclaim failed. 2, if they are started back to back, and a client tries to reclaim the lock the lockmanager sometimes sends two responses back to back to the client. one with status NLM_GRANTED (==you got the lock reclaimed) and one with status NLM_DENIED (==you could not get the lock reclaimed) This confuses the client and leads to the server thinking that the client does have the lock and the client thinking it has not got the lock and orphaned locks result. We also send out additional notification messages of different formats to allow more legacy clients to interoperate with locking. (This used to be ctdb commit 13208c1aab2942e28dff87e38e6794bf0c026033)	2007-09-07 08:52:56 +10:00
Ronnie Sahlberg	82984577f1	we dont need the rpc.statd on shared directory neither do we need PUBLIC_IP anymore (This used to be ctdb commit fd571ac87f65928e92dde6977745083bf381df1a)	2007-09-06 11:32:18 +10:00
Ronnie Sahlberg	00453a375a	improve the handling of hosts to notify with statd (This used to be ctdb commit cc87bda7e344bc777b9620a6211e62de4dce4e3b)	2007-09-06 11:30:49 +10:00
Ronnie Sahlberg	19546fb007	specify the additional ports for nfs (This used to be ctdb commit 1934163f0b393738615a05854082a7d488003e1c)	2007-09-06 10:26:44 +10:00
Ronnie Sahlberg	f7d193e9ce	the event scripts for nfs are called 60.nfs and 61.nfstickle (This used to be ctdb commit b15f1c25560320993b93aa3d943985dab4e47947)	2007-09-06 10:18:13 +10:00
Ronnie Sahlberg	0781616ef9	document NFS_TICKLE_SHARED_DIRECTORY on our web page (This used to be ctdb commit 40ec29f602897e9b01a6747806f502ab38423d54)	2007-09-06 08:21:11 +10:00
Ronnie Sahlberg	46eecfea27	we dont use 'sendip' any more so dont check for it and exit from the 61.nfstickles script if it is missing from the host (This used to be ctdb commit 8eac441e24f4ef33b55f9eaa4856b5c1e1c15213)	2007-09-05 15:39:51 +10:00
Ronnie Sahlberg	a9c8456ed6	we should always get data back from getnodemap (This used to be ctdb commit ff999a4b56f714c58c81baa454a2d39d04944136)	2007-09-05 14:59:29 +10:00
Ronnie Sahlberg	e4eeceaf3a	dont dereference vnn before we have assigned it a pointer value (This used to be ctdb commit 2a8fc69aea8527b22a3fe57427677e4caff57338)	2007-09-05 14:29:44 +10:00
Andrew Tridgell	c572d3c226	added a diagnostics tool for ctdb (This used to be ctdb commit 032a2238caf688656b00e06bf363182368e037e1)	2007-09-05 14:20:34 +10:00
Ronnie Sahlberg	77ec4d5248	allow different nodes in the cluster to use different public_addresses files so that we can partition the cluster into different subsets of nodes which each serve a different subset of the public addresses (This used to be ctdb commit 889e0fe69e4c88c6166282b12843b8d9727552d6)	2007-09-04 23:15:23 +10:00
Ronnie Sahlberg	8f819c6a0e	get rid of the ctdb_vnn_list structure and just use a single list of ctdb_vnn (This used to be ctdb commit 7b9fd06321af17043136b1420b57284450ae7ba5)	2007-09-04 18:20:29 +10:00
Ronnie Sahlberg	cf45c5096c	we cant have takeover_ctx hanging off ctdb since it is freed/recreated every time we release an ip. this context is used to hold all resources needed when sending out gratuitous arps and tcp tickles during ip takeover. we hang it off the vnn structure that manages that particular ip address instead so that we can have multiple ones going in parallel. this bug (or the same bug in different shape) has probably been in ctdb for very very long but is likely to be hard to trigger (This used to be ctdb commit c58db1cadaba253b2659573673b28c235ef7db76)	2007-09-04 14:36:52 +10:00
Ronnie Sahlberg	3e6be59f61	fix typo in debug output (This used to be ctdb commit 011a777c6e538ca79f104c7884a4f0e222997382)	2007-09-04 14:21:35 +10:00
Ronnie Sahlberg	784eac9079	dont just always return 0 from the killtcp control. return 0 or -1 so that the ctdb tool knows whether the control succeeded or not (This used to be ctdb commit cace8b40090be5529ec6b463d3839d0e22f4039d)	2007-09-04 14:19:18 +10:00
Ronnie Sahlberg	a50e83448c	change vnn to pnn in the traverse structure (This used to be ctdb commit d56ae0963b420edea6a2d5eeb408a9811af3f3f6)	2007-09-04 10:49:21 +10:00
Ronnie Sahlberg	f69321edc8	change debug output from vnn to pnn (This used to be ctdb commit 93a7cf759ae3f9af6671b9f8589e1399a669b46f)	2007-09-04 10:47:02 +10:00
Ronnie Sahlberg	d66d9cdd22	change debug output from vnn to pnn. change ctdb_daemon_send_message to take pnn as parameter instead of vnn (This used to be ctdb commit e352a2bbf9bb9a0b2c4f8329e8a529cf02414097)	2007-09-04 10:45:41 +10:00
Ronnie Sahlberg	0c91261340	change ctdb_send_message to take pnn as parameter instead of vnn (This used to be ctdb commit 93dd4fba2e0fa6a011d15406652836785a974880)	2007-09-04 10:42:20 +10:00
Ronnie Sahlberg	157be530dd	change ctdb_ctrl_getvnn to ctdb_ctrl_getpnn (This used to be ctdb commit ef47cc4cd416065c69382e4d9e76c30a0a34e42f)	2007-09-04 10:38:48 +10:00
Ronnie Sahlberg	211b497818	change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a)	2007-09-04 10:33:10 +10:00
Ronnie Sahlberg	6f693bbcbd	change server_id.vnn to server_id.pnn (This used to be ctdb commit 26f2ee2b754a9271454412f05111a19b3013c6eb)	2007-09-04 10:21:51 +10:00
Ronnie Sahlberg	583b6e6ba6	change ctdb_get_vnn to ctdb_get_pnn (This used to be ctdb commit 1e19930198c2bcc7ccb755e0ee51555fb823029a)	2007-09-04 10:18:44 +10:00
Ronnie Sahlberg	4ba9990143	change vnn to pnn in the ctdb tool (This used to be ctdb commit 822556a4d4ba23459be3a25cbd3f48d1f64ba95f)	2007-09-04 10:14:41 +10:00
Ronnie Sahlberg	fc9d39c3a6	change ctdb_validate_vnn to ctdb_validate_pnn (This used to be ctdb commit a4a1f41b69475b9dc16d8fd7f8965c32e96c32f0)	2007-09-04 10:09:58 +10:00
Ronnie Sahlberg	eb4cf6a686	change ctdb->vnn to ctdb->pnn (This used to be ctdb commit 8c776e5707e503ec6586aae39ac6b3ea5a2fd2bc)	2007-09-04 10:06:36 +10:00
Ronnie Sahlberg	12ebb74838	change how we do public addresses and takeover so that we can have multiple public addresses spread across multiple interfaces on each node. this is a massive patch since we have previously made the assumption that we only have one public address per node. get rid of the public_interface argument. the public addresses file now explicitly lists which interface the address belongs to (This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8)	2007-09-04 09:50:07 +10:00
Andrew Tridgell	7423bcaabe	up the release number (This used to be ctdb commit 71a6213c92a12bf794c17c30ae4987149b68fe1b)	2007-08-30 17:51:05 +10:00
Ronnie Sahlberg	4e61e05f49	when we start 60.nfs we must make sure that the shared storage nfs-state directory actually exists (by creating it) or else the lock manager will not start (This used to be ctdb commit f2d15d04df842538c8d8331796a3c6fbe23463f2)	2007-08-30 15:27:45 +10:00
Andrew Tridgell	8c94d4dc87	merge from ronnie (This used to be ctdb commit ab11fd70cf4d2165a5b55930cbad6fddf5397f54)	2007-08-27 18:04:53 +10:00
Ronnie Sahlberg	794fb10634	add an extra debug statement when we send a SIGTERM to a process (This used to be ctdb commit a9c1be9cf9efdc69bfc95657b70e9f8b8230cda8)	2007-08-27 17:33:46 +10:00
Ronnie Sahlberg	2c0c94782a	make the ctdb shutdown command use the async _send() function to send the shutdown command and return success to the caller if the _send() was successful (This used to be ctdb commit 6bacaf8c7a96044708a6eda10cc8576adb7f5f79)	2007-08-27 15:03:52 +10:00
Andrew Tridgell	7f630b67f6	fixed segv when no public interface is set (This used to be ctdb commit 55b415f87bd3cba13c73ccd2fe661720754a6af7)	2007-08-27 11:49:42 +10:00
Ronnie Sahlberg	7f02e16143	add async versions of the freeze node control and freeze all nodes in parallel (This used to be ctdb commit f34e89f54d9f4380e76eb1b5b2385a4d8500b505)	2007-08-27 10:31:22 +10:00
Ronnie Sahlberg	a9c45b2562	change the monitoring of recmode in the recovery daemon to use a fully async eventdriven api for controls (This used to be ctdb commit 8d0e43428c507967a0d96e6a4c5c821ac269c546)	2007-08-27 09:40:10 +10:00
Ronnie Sahlberg	801bdbdc80	add a control to pull the server id list off a node (This used to be ctdb commit 38aa759aa88a042c31b401551f6a713fb7bbe84e)	2007-08-26 10:57:02 +10:00
Ronnie Sahlberg	6681da31df	add an initial implementation of a service_id structure and three controls to register/unregister/check a server id. a server id consists of TYPE:VNN:ID where type is specific to the application. VNN is the node where the serverid was registered and ID might be a node unique identifier such as a pid or similar. Clients can register a server id for themselves at the local ctdb daemon. When a client disappears or when the domain socket connection for the client drops then any and all server ids registered across that domain socket will also be automatically removed from the store. clients can register as many server_ids as they want at the same time but each TYPE:VNN:ID must be globally unique. Clients have the option of explicitly unregistering a server id by using the UNREGISTER control. Registration and unregistration can only be done by clients to the local daemon. clients can not register their server id to a remote node. clients can check if a server id does exist on any ctdb node in the network by using the check control (This used to be ctdb commit d44798feec26147c5cc05922cb2186f0ef0307be)	2007-08-24 15:53:41 +10:00
Ronnie Sahlberg	de23937368	cleanup invoke_control_callback. we dont need to pass some of these parameters to _recv() since they are already set (This used to be ctdb commit 2034dbebb26d7a2d51241943f6ccbe15bb6a5169)	2007-08-24 10:54:34 +10:00
Ronnie Sahlberg	495a6403da	change the api for managing callbacks to controls so that instead of passing it as a parameter we set the callback function explicitly from the caller if the ..._send() function returned a valid state pointer. (This used to be ctdb commit aa939570662786455f63299b62c99882cff29d42)	2007-08-24 10:42:06 +10:00
Ronnie Sahlberg	1da9c03b1f	comment why we do a talloc_steal (This used to be ctdb commit aba7972728307e0ae52ccf8c0dd5808110fb92d7)	2007-08-24 09:34:04 +10:00
Ronnie Sahlberg	62a03ef9d5	get rid of the explicit global timeout used in the previous example and try this time by relying on the timeouts for the individual controls (This used to be ctdb commit 448a0eb4fd896dc545aa0b4bb2ba4628491578be)	2007-08-23 19:38:54 +10:00
Ronnie Sahlberg	f854b5f876	try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c)	2007-08-23 19:27:09 +10:00
Ronnie Sahlberg	4c13bf0c5f	break checking that the recoverymode on all nodes are ok out into its own function (This used to be ctdb commit 813cf9a252af96da24122b80f24aabeed2911939)	2007-08-23 13:48:39 +10:00
Ronnie Sahlberg	8fd3df2553	hang the ctdb_req_control structure off the ctdb_client_control_state struct so that if we timeout a control we can print debug info such as what opcode failed and to which node. we dont need the *status parameter to ctdb_client_control_state. create async versions of the getrecmaster control. pass a memory context to getrecmaster (This used to be ctdb commit 558b680c82f830fba82c283c78c2de8a0b150b75)	2007-08-23 13:00:10 +10:00
Ronnie Sahlberg	20120c2331	in ctdb_call_recv() we must check that state is non-NULL since ctdb_call() may pass a null pointer to _recv() and this would cause a segfault. fortunately there appear to be no critical users for this codepath right now so the risk was more theoretical. IF clients start using this call it could segfault. change ctdb_control() to become fully async so we later can make recovery daemon do the expensive controls to nodes in parallel instead of in sequence (This used to be ctdb commit 379789cda6ef049f389f10136aaa1b37a4d063a9)	2007-08-23 11:58:09 +10:00
Ronnie Sahlberg	277cdbe3d1	create an enum to describe the state of a control in flight instead of using the enum that is for calls (This used to be ctdb commit f9cf7076151af983a1c4ea56fbeb6d94ea508a34)	2007-08-23 09:53:10 +10:00
Andrew Tridgell	d95476fa38	merge from ronnie (This used to be ctdb commit e0f1c1acb1188500674626d631e1a1b8726e72ad)	2007-08-22 17:31:29 +10:00
Andrew Tridgell	df9ec77b6b	merge from volker (This used to be ctdb commit a5587b3c065f7115ad5e55429c2c9d9923d3b4dc)	2007-08-22 17:18:55 +10:00
Andrew Tridgell	95f6328678	merge from volker (This used to be ctdb commit 7007e4f2292aa96287b899d6b9e82c7b597ef58f)	2007-08-22 17:16:01 +10:00
Ronnie Sahlberg	50c09b7465	when we receive a packet from the network, check explicitly that the node is not banned if the call is for a database record, i.e. a REQ/REPLY CALL/DMASTER. if we get such a call while banned, ignore the packet and write an entry in the logfile (This used to be ctdb commit 79eb0863609fbb12e28ebf734101b1d3f359b330)	2007-08-22 12:53:24 +10:00
Ronnie Sahlberg	f6e0336b23	create a define to represent the 'invalid' generation id we used in two places. create a new helper function to generate new generation id values that know about the invalid id and avoids generating it. update the ctdb status tool to know about the invalid generation id and print the string INVALID instead (This used to be ctdb commit 4fbcd189543cb8a92227fdcd3d158472e558ccda)	2007-08-22 12:38:31 +10:00
Ronnie Sahlberg	e3b6d1e511	if the node is inactive i.e. banned or disconnected then that node is not participating in the cluster if a client tries to attach to a database while the node is inactive, return an error back to the client and fail the attach (This used to be ctdb commit b26949f3c8e54f3bc60da04d7b4ac69f301068fc)	2007-08-22 11:34:48 +10:00
Ronnie Sahlberg	b47384d57a	when a node becomes banned its databases are no longer part of ctdb and it should thus no longer serve any database access calls until it has been reintroduced into the cluster. when becoming banned, reset the local generation id to 1 to prevent any further database access calls from other nodes from being processed. (This used to be ctdb commit b531021db43ebaa5f5d0ace28c59913d359bd8a8)	2007-08-22 10:38:35 +10:00
Ronnie Sahlberg	5fef81a6f1	if lockwait takes an excessive time to complete. log the time it took to complete and also the name of the database (This used to be ctdb commit 221ef0348fd8113a017d229d8c2c7aa5c4dfb5c2)	2007-08-22 09:46:48 +10:00
Ronnie Sahlberg	8b06fc7284	change the structure used for node flag change messages so that we can see both the old flags as well as the new flags (so we can tell which flags changed). send the CTDB_SRVID_RECONFIGURE messages to connected nodes only, not to every node, connected or not, in the cluster. in the handler inside the recovery daemon which is invoked for node flag change messages, only do a takeover_run() and redistribute the ip addresses IF it was the disabled or the unhealthy flags that changed. Also send out the cluster reconfigured message in this case. If any of the other flags changed we dont need to do the takeover_run() here since that will be done during recovery. (This used to be ctdb commit 5549b2058e2c148a8ca9d419123acf3247bb8829)	2007-08-21 17:25:15 +10:00
Ronnie Sahlberg	4e4dd6b886	when we shutdown the service due to receiving a 'ctdb shutdown' command from the administrator, log this as 'Received SHUTDOWN command. Stopping CTDB daemon.' so that the administrator will know when looking at the log 'why' the ctdb service was terminated. Previously the only thing logged was 'shutting down' which is not detailed enough. (This used to be ctdb commit 5b818c1b72b6594a8d6e45e1865026e3ce33ae63)	2007-08-21 09:46:27 +10:00
Ronnie Sahlberg	5228abef64	add an atexit() that will print "CTDB daemon shutting down" in the log when the main daemon exits (This used to be ctdb commit f7422397be2e319bfbee5bf0670583c353eda86d)	2007-08-21 09:43:53 +10:00
Ronnie Sahlberg	a03c8d4954	setup the logfile much earlier in the startup procedure for ctdbd change initial errors that cause ctdb to fail to start from printf to DEBUG(0 add a DEBUG(0 to log that the ctdb service is starting (This used to be ctdb commit 680b4fbb283dd68567a62a83345f11a6cc1dd0e5)	2007-08-21 09:33:03 +10:00
Ronnie Sahlberg	b582e13cae	make sure that the event script is executable and just ignore it otherwise (This used to be ctdb commit 65eb7845c70489d654acaaf99cd2c8eac7df11dc)	2007-08-21 09:22:14 +10:00
Ronnie Sahlberg	aed2c58c64	dont pollute the log with 'Registered PID XXX for client YYY' at log level 0. change the log level to 3 for this information message (This used to be ctdb commit f28d713d9cacd2312932b51175aa8402c96ef76b)	2007-08-21 08:42:42 +10:00
Ronnie Sahlberg	7e1f840c8d	if a public address has already been taken over by a node, then let that public address remain at that node until either the node becomes unhealthy or the original/primary node for that address becomes healthy again. Otherwise what will happen is 1, if we ban a node, the banning code immediately does a takeover_run() and reassigns the public address to a different node in the cluster. 2, a few seconds later (at most) the recovery daemon will detect that the number of nodes has shrunk and will initiate a recovery. During the recovery the public address would again be assigned to a node, this time a different node. (This used to be ctdb commit 30a6b7a648e22873d8ce6289a3d6dc42c4b9e3b3)	2007-08-20 14:16:58 +10:00

1322 Commits