samba-mirror

mirror of https://github.com/samba-team/samba.git synced 2024-12-25 23:21:54 +03:00

Author	SHA1	Message	Date
Ronnie Sahlberg	6b284e5905	add log output for when ctdb_ban_node() and ctdb_unban_node() are called. When these functions are called to ban or unban a node, make sure we update the CTDB_NODE_BANNED flag in rec->node_flags, since this field and flag are checked during the election process (This used to be ctdb commit 740c632ae96a2d34327d1b575780aaf079d93f4f)	2007-11-23 12:36:14 +11:00
Ronnie Sahlberg	b5e79fb06f	If update_local_flags() finds that a node has changed its BANNED status so that it differs from what the local ctdb daemon on the recovery master thinks it should be, we should call for a re-election (This used to be ctdb commit 21ad6039c31ef5cc0e40a35a41220f91943947cb)	2007-11-23 11:53:06 +11:00
Ronnie Sahlberg	b2a81fb6b1	when we, as the recovery daemon on the recovery master, detect that the flags differ between the local ctdb daemon and the remote node, we can force a flags update on all nodes and not just the local daemon (This used to be ctdb commit a924eb89c966ecbae029ca137e06cffd40cc70fd)	2007-11-23 11:31:42 +11:00
Ronnie Sahlberg	af5bc9b915	add an extra log if we get a modflags control but it doesn't change any flags. In update_local_flags() (this is only called if we are, or we believe we are, the recmaster), when we detect that the flags of a remote node are different from what our local node thinks the flags should be for that remote node, we should send a node-flag-changed message to the local daemon so that it updates the flags for that node. (This used to be ctdb commit 36225e4e271f7a4065398253747fb20054f99a53)	2007-11-23 10:52:29 +11:00
Ronnie Sahlberg	c36ce05d08	if we get a modflag control but the flags remain unchanged, log this (This used to be ctdb commit 5a0cd9b37b21665054bd35facd87f0a6ff4dcd55)	2007-11-23 10:31:51 +11:00
Ronnie Sahlberg	e95a4b5cdb	when we print "Remote node had flags xx local had flags xx" we swapped the flags when printing them to the log (This used to be ctdb commit 9fc8831a7fcd34763567227d61cd525ec441ebf2)	2007-11-23 09:54:38 +11:00
Andrew Tridgell	45f0fdfc20	make election handling much more scalable (This used to be ctdb commit 05938d462b92bd9ecb8e35f53651bded47c48675)	2007-11-13 10:27:44 +11:00
Andrew Tridgell	3427793f01	don't do the first startup event until we are out of recovery (This used to be ctdb commit 689940eb6e23f16ee063331caf3986613a8963ea)	2007-11-12 13:10:15 +11:00
Andrew Tridgell	bde886988b	prevent a deadly embrace between smbd and ctdbd by moving the calling of the startup event scripts after the point where recovery has started and the node is in normal operation. This makes the 'startup' script just a special type of the 'monitor' script which is called first (This used to be ctdb commit 7424c30a5fd04aea0137c466b4318c3f185280d8)	2007-11-12 10:53:11 +11:00
Ronnie Sahlberg	1d6a74f943	when shutting down, we should stop monitoring (This used to be ctdb commit 325683ef8f326f0565a827ff2c493adcab6e0d64)	2007-10-22 12:34:51 +10:00
Ronnie Sahlberg	4a97876fb7	when we are shutting down, we should first shut down the recovery daemon (This used to be ctdb commit 39ade6b329adcd3234124d6a8daaa6181abf739b)	2007-10-22 12:34:08 +10:00
Andrew Tridgell	f47f758fe8	merge from ronnie (This used to be ctdb commit d444fdc7782496abe4b27003b647ac49fb52e6be)	2007-10-19 09:39:07 +10:00
Ronnie Sahlberg	d1ba047b7f	add a new transport method so that when a node is marked as dead, we shut down and restart the transport. Otherwise, if we use the tcp transport, the tcp connection might try to retransmit the queued data during the time the node is unavailable. This, together with the exponential backoff for tcp, means that the tcp connection quickly reaches the maximum backoff RTO, which is often 60 or 120 seconds. This would mean that it could take up to 60/120 seconds before the tcp layer detects that the connection is dead and has to be reestablished. (This used to be ctdb commit 0256db470879ce556b0f00070f7ebeaf37e529ab)	2007-10-19 08:58:30 +10:00
Ronnie Sahlberg	755511d28d	set the flags explicitly instead of masking them in (This used to be ctdb commit 27a5f9dead44890683f9dbc4f07cda11264aa03b)	2007-10-18 16:54:00 +10:00
Andrew Tridgell	b814462c38	added some debug lines to help track down a problem (This used to be ctdb commit 2ca31e9de179f76e392a26cc8305e2473357c760)	2007-10-18 16:27:36 +10:00
Andrew Tridgell	d939a2901b	merge from ronnie (This used to be ctdb commit 75d4b386293e186a6bb8532515585ab72670d663)	2007-10-18 15:44:02 +10:00
Ronnie Sahlberg	ce7a054d20	add back the test inside the daemon that, if someone asks us to drop recovery mode back to NORMAL, we can not lock the reclock file, since at this stage it MUST be locked by the recovery daemon. To keep a non-blocking fcntl() lock from blocking and causing "issues", we move the 'test that we can not lock reclock file' into a child process. (This used to be ctdb commit 3af994641ec2234e37da1fa1f693441586471a7e)	2007-10-16 15:27:07 +10:00
Ronnie Sahlberg	056aac6e0c	add a new tunable: DeterministicIPs, which makes the allocation of public addresses to nodes deterministic. Activate it by adding CTDB_SET_DeterministicIPs=1 in /etc/sysconfig/ctdb. When this is set, the first entry in /etc/ctdb/public_addresses will always be hosted by node 0 when that node is available, the second entry by node 1, and so on. This tunable allows the allocation of addresses to become very unbalanced and is only for debugging/testing use. Beware: this feature requires that /etc/ctdb/public_addresses is identical on all the nodes in the cluster. (This used to be ctdb commit f0ca221f235731542090d8a6c86f2b7cd2ce2f96)	2007-10-16 12:15:02 +10:00
Ronnie Sahlberg	25d3a031d0	include system/network.h so we get the prototype for inet_aton() (This used to be ctdb commit 7145764b2d217f88a723dcb0ffd4e5a1567d64cf)	2007-10-16 11:29:33 +10:00
Ronnie Sahlberg	7e2e1b14fb	merge from tridge (This used to be ctdb commit 9e6bc12c9be2dabcfb9c6aeef257ef4737287fab)	2007-10-16 11:26:22 +10:00
Ronnie Sahlberg	b3ff7d904d	don't try to lock the file from inside the ctdb daemon. Even though we don't want a blocking lock, it does appear that the fcntl() call can block for a while if gpfs is in the process of rebuilding itself after a node arriving/leaving the cluster (This used to be ctdb commit 6c0d206dea7116db71bccb4802a93dd7283249f6)	2007-10-16 09:50:31 +10:00
Andrew Tridgell	99bc0aca93	sync flags between nodes in monitor loop in recmaster (This used to be ctdb commit 6eef86e06388fc53a1212f1e2783ae174c6cd210)	2007-10-15 14:28:51 +10:00
Andrew Tridgell	0e855c0772	merge from ronnie (This used to be ctdb commit d18712caba11855010be52f90bac656683076676)	2007-10-15 14:17:49 +10:00
Andrew Tridgell	174879621e	add config option for disabling bans (This used to be ctdb commit 153b911f7f957d4c564b04f5aa878033a02da9e4)	2007-10-15 13:22:58 +10:00
Ronnie Sahlberg	1a4999076b	first check that recovery master is connected (we know this from our own flags) then pull the flags off recovery master before checking if it is banned (This used to be ctdb commit 94c1d234e57a40eda2d8b892dd9fbe1ffc4b3433)	2007-10-11 07:10:17 +10:00
Ronnie Sahlberg	167e100d4b	simplify election handling. Make sure we read and update the flags from all remote nodes before we reach the first codepath that can call do_recovery(), since during do_recovery() we need to know what the flags are. (This used to be ctdb commit e85f3806483ea420559d449e0e4d81bec996740f)	2007-10-11 06:16:36 +10:00
Ronnie Sahlberg	33a6aa3c3f	merge from tridge (This used to be ctdb commit 4690a205fe4325b03ab044bdb5fbc9aa3e94db6e)	2007-10-10 10:49:55 +10:00
Andrew Tridgell	011a205b86	make sure reconnected nodes start off as unhealthy so they don't get a public IP (This used to be ctdb commit c733ec6760cae01ce277f491caf1355e46de5cf7)	2007-10-10 10:45:22 +10:00
Ronnie Sahlberg	bdd67bba1e	add a --single-public-ip argument to ctdbd to specify the ip address used in single public ip address mode. When using this argument, --public-interface must also be used. Add a vnn structure to the ctdb context to describe the single public ip address. Update the killtcp control in the daemon so that if a socketpair that is to be killed does not match a normal public address, it checks if the destination address matches the single public ip address and, if so, uses that vnn structure from the ctdb context. This allows killtcp to also kill connections to the single public ip instead of only normal public addresses (This used to be ctdb commit 5661ba17b91f62821dec1c76056c78b99752a90b)	2007-10-10 09:42:32 +10:00
Ronnie Sahlberg	7735957693	remove some debug outputs (This used to be ctdb commit f29c0b52df1f455909ba133e3ad3bc462dc32929)	2007-10-09 13:45:42 +10:00
Ronnie Sahlberg	80cd82f8e4	add a control to send gratuitous arps from the ctdb daemon (This used to be ctdb commit 563819dd1acb344f95aabb4bad990b36f7ea4520)	2007-10-09 11:56:09 +10:00
Ronnie Sahlberg	de6c5ed14d	merge from tridge (This used to be ctdb commit 02cda01c032804cb1c53593ceb98685c827e2d58)	2007-10-06 08:11:24 +10:00
Andrew Tridgell	50770008df	fixed several places where we set the recovery culprit incorrectly (This used to be ctdb commit d9da73395fa443801fc68ec53a42b548e832d58a)	2007-10-05 13:51:31 +10:00
Andrew Tridgell	4115492992	- catch ESTALE in the recovery lock by trying a read() - prioritise nodes that are unbanned and healthy in the election (This used to be ctdb commit 929feb475dfdf7283f0e99b50b179e1c91d3a39f)	2007-10-05 13:28:21 +10:00
Andrew Tridgell	fb48f2d5a2	we are the culprit if we can't get the reclock (This used to be ctdb commit 1d320e113c6134ff6822b985a47131d8204af35a)	2007-10-05 12:01:40 +10:00
Ronnie Sahlberg	72379ee3eb	change async.private to async.private_data since private is a reserved word in c++ (This used to be ctdb commit 79eb28f6cd5dcc30b04966d202a050eaf98a2552)	2007-09-26 14:25:32 +10:00
Ronnie Sahlberg	359448ff00	when we have a public ip address mismatch (i.e. we hold addresses we shouldn't or we are not holding addresses we should) we must first freeze the local node before we set the recovery mode (This used to be ctdb commit a77a77e8b5180f6a4a1f3d7d4ff03811f3b71b56)	2007-09-24 10:52:26 +10:00
Andrew Tridgell	e3d0ec8797	fixed a fd leak on the recovery lock (This used to be ctdb commit 186f35c42ed4fcc9ed44390b0dd036ece475d45e)	2007-09-24 10:19:07 +10:00
Andrew Tridgell	80100c3573	run monitoring more quickly when unhealthy and at startup (This used to be ctdb commit ff1c205928e3ef5bcc6bf4e4b2122a19fa38d8f4)	2007-09-24 10:12:18 +10:00
Andrew Tridgell	b87ddd9148	no longer wait at startup for services to become available, instead set the node initially unhealthy and let the status monitoring bring the node online. This fixes a problem with winbindd, where it refused to start because secrets.tdb was not populated but we could not populate ctdbd, because the net command would not run while ctdbd was still doing startup and thus frozen (This used to be ctdb commit 3a001b793dd76fb96addf1e2ccb74da326fbcfbc)	2007-09-24 10:00:14 +10:00
Andrew Tridgell	4178cb98a1	fixed a valgrind error, and some warnings (This used to be ctdb commit c0f52dbb385fa0748680adb7c40755c92e577551)	2007-09-24 09:57:14 +10:00
Andrew Tridgell	2607c222fc	avoid using connected nodes that aren't in the vnn map yet (This used to be ctdb commit 2b5ae133f5f6fa9ad1a8896fe4b4c542d4ca462d)	2007-09-21 15:44:13 +10:00
Ronnie Sahlberg	51d912063c	in ctdb_control_persistent_store() we must talloc_steal() the pointer to c to prevent it from being immediately freed (and our persistent store state with it) if we need to wait asynchronously for other nodes before we can reply back to the client (This used to be ctdb commit fa5915280933e4d2e7d4d07199829c9c2b87a335)	2007-09-21 15:19:33 +10:00
Ronnie Sahlberg	61e885d0b9	when ctdb attaches to a database it broadcasts the attach to all other nodes so that the db is created on them as well. When we send this broadcast we must use the correct control and not assume all databases created are of the temporary kind (This used to be ctdb commit 106f816d4a0814ca4418de051289d9fc62df7dd2)	2007-09-21 13:47:40 +10:00
Andrew Tridgell	c60988325d	added support for persistent databases in ctdbd (This used to be ctdb commit 3115090a0d882beca9d70761130b74bb0821f201)	2007-09-21 12:24:02 +10:00
Andrew Tridgell	81bfa58d58	make sure we set close on exec on any possibly inherited fds (This used to be ctdb commit d9dec82076f14a348e7b67b4350180681ff86f32)	2007-09-19 11:46:37 +10:00
Andrew Tridgell	c62490569b	cope with non-standard install dirs in event scripts (This used to be ctdb commit 52fff5345873690a9cc86495f414343eaa3bd540)	2007-09-14 14:14:03 +10:00
Andrew Tridgell	955d4d8615	make sure all public IPs are removed at startup (This used to be ctdb commit b16f33787f2a9471285037f4a6d470e826536570)	2007-09-14 11:56:40 +10:00
Ronnie Sahlberg	6052078b53	let each node verify that they have a correct assignment of public ip addresses (i.e. they hold those they should hold and they don't hold any of those they shouldn't hold). If an inconsistency is found, mark the local node as recovery mode active and wait for the recovery master to trigger a full blown recovery (This used to be ctdb commit 55a5bfc8244c5b9cdda3f11992f384f00566b5dc)	2007-09-14 10:16:36 +10:00
Andrew Tridgell	42fc00bda9	- merge from ronnie - add a flag to check that recovery completed correctly. If not, re-trigger it in monitoring (This used to be ctdb commit d5ed941d9bab4af30d8b5f9b77bdf43d9218d69b)	2007-09-14 09:49:12 +10:00

168 Commits