samba-mirror

mirror of https://github.com/samba-team/samba.git synced 2025-01-13 13:18:06 +03:00

Author	SHA1	Message	Date
Andrew Tridgell	25bb60f112	show start/stop time of recovery on all nodes (This used to be ctdb commit 9f7662279c367eb3e8a58e6f4aeca521e6f1f1d0)	2008-01-08 09:30:11 +11:00
Andrew Tridgell	37861932ce	merge from ronnie (This used to be ctdb commit 0aa6e04438aa5ec727815689baa19544df042cf7)	2008-01-07 16:17:22 +11:00
Andrew Tridgell	d38fbaa38b	nicer onnode output (This used to be ctdb commit ac5c1e090d007bc2e3965589731620b87c0217fb)	2008-01-07 14:31:13 +11:00
Andrew Tridgell	4258098e98	catch internal traversal errors (This used to be ctdb commit 8caa85ad71be5d20a8d6f0cb3d52aff6905657a4)	2008-01-07 14:08:25 +11:00
Andrew Tridgell	528e4d7a2b	more efficient traversal in pulldb control (This used to be ctdb commit fe614b10868e63b70e081b5bbfb74bf16fdf5716)	2008-01-07 14:07:01 +11:00
Andrew Tridgell	748843a3c6	added paranoid transaction ids (This used to be ctdb commit afc1da53873cdbd31fcc8c6b22fae262e344cf6e)	2008-01-06 13:24:55 +11:00
Andrew Tridgell	c08f2616cd	new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3)	2008-01-06 12:38:01 +11:00
Andrew Tridgell	4f5b717aa3	change default tunables to cope with larger dbs (This used to be ctdb commit d91a2d43d1f0562cc3a12e6e1e2767f75d888f72)	2008-01-06 12:36:58 +11:00
Andrew Tridgell	108aafcdb2	non-persistent databases don't need sync transactions (This used to be ctdb commit 52fd86addd23e4d6e0af2c716bd83d19675b1f5a)	2008-01-06 12:36:30 +11:00
Andrew Tridgell	9311f7fb7e	fixed the bug that made "onnode N service ctdb start" hang (This used to be ctdb commit b50dcb16f30a60abce42f491f9b0aae7948b8206)	2008-01-05 12:09:29 +11:00
Andrew Tridgell	e4aefbc66d	a new tunable DatabaseMaxDead that enables the tdb max dead cache logic (This used to be ctdb commit 01c519c3658a8fcb9545b507b597e723658e4c4e)	2008-01-05 09:36:53 +11:00
Andrew Tridgell	023a230d9c	a useful hack for checking correct behaviour of recovery (This used to be ctdb commit d88b95a5407b53ead47ca0638ee60653ea3d3d07)	2008-01-05 09:36:21 +11:00
Andrew Tridgell	f79dfd04c0	convert much of the recovery logic to be async and parallel across all nodes (This used to be ctdb commit 8b72a02bf1045d8befb342a4111ca1316889262e)	2008-01-05 09:35:43 +11:00
Andrew Tridgell	9a625534c1	this fixes the non-dmaster bug that has plagued us for months (This used to be ctdb commit 2acf6c6201862debfca054a09262f75c066d2deb)	2008-01-05 09:34:47 +11:00
Andrew Tridgell	fc21f78231	make some specific cases of the non-dmaster bug non-fatal (This used to be ctdb commit 7b516ab06c7ba7ffe9ecf3f76720df5360176b2c)	2008-01-05 09:32:29 +11:00
Andrew Tridgell	e9987cf236	fixed a warning (This used to be ctdb commit f34d0f9351c1cda3327efb14e173f249f7854570)	2008-01-05 09:30:49 +11:00
Andrew Tridgell	afc7275c16	fixed a warning (This used to be ctdb commit d6255438d63943736b24a7a6da190b6933379a61)	2008-01-04 12:42:10 +11:00
Andrew Tridgell	2509821503	prevent a re-ban loop for single node clusters (This used to be ctdb commit b20a3369655bcba274c99091157ba7466994e848)	2008-01-04 12:11:29 +11:00
Andrew Tridgell	41fb8e283b	add randrec to Makefile (This used to be ctdb commit ded1f7903e8a6525ab1888e8c4f50c71fa23cc19)	2008-01-04 09:19:06 +11:00
Andrew Tridgell	bb06e831a0	more optimisations to recovery (This used to be ctdb commit 9a41ad0a842cd4f3792d6e84b5c809b7ff6f342e)	2008-01-02 22:44:46 +11:00
Andrew Tridgell	2a2f1e3d91	fixed segv on failed ctdb_ctrl_getnodemap (This used to be ctdb commit 5daf9a72f0e60a9af7cf32ae6d759be7d94857ec)	2007-12-27 10:07:01 +11:00
Andrew Tridgell	6ef3bff4ed	merge from ronnie (This used to be ctdb commit 072ef744951d3aa59dd8be70578b99b18c37d988)	2007-12-04 15:20:40 +11:00
Andrew Tridgell	a55c3709ea	make DeterministicIPs the default (This used to be ctdb commit e7d077e98a40a62dbd6bfd174f29afba7b5529ef)	2007-12-04 15:18:27 +11:00
Ronnie Sahlberg	7cef33b40a	rework banning/unbanning nodes ctdb_recoverd.c Always handle banning/unbanning locally on the node that is being banned/unbanned instead of on the recovery master. This means that if a ban request comes in to the recovery master for a remote node, we pass the request on to the remote node instead of setting up the ban and ban timeouts locally. ctdb.c send ban/unban requests to the node being banned/unbanned instead of to the recmaster (This used to be ctdb commit 880dd9f5fd0b91e450da93e195cc5c62cb1dcd6e)	2007-12-03 15:45:53 +11:00
Ronnie Sahlberg	64008e28bb	for the banned status, we should allocate this structure as a child of the banned_nodes array and not the rec structure so that ban_state is destroyed when the banned_nodes array gets destroyed (and so that when this struct is destroyed, any pending ctdb_ban_timeout events are also destroyed.) otherwise we may end up with multiple ban_timeout timed events going in parallel since we destroy/recreate the banned_nodes structure during election but we never destroy/recreate the rec structure. (This used to be ctdb commit fbd663d56a2a4421a5c0e541962c87e2e9c7cd82)	2007-12-03 11:39:17 +11:00
Andrew Tridgell	7edb41692e	merge from ronnie (This used to be ctdb commit 6653a0b67381310236e548e5fc0a9e27209b44e0)	2007-12-03 10:19:24 +11:00
Ronnie Sahlberg	2f1baf34d3	up the loglevel for the enable/disable monitoring to level 1 (This used to be ctdb commit 5043a0afeedbd30c7f64c2733c8ae5bf75479a98)	2007-12-01 10:06:42 +11:00
Ronnie Sahlberg	07dd0f6ff0	log that monitoring has been "disabled" not that it has been "stopped" when monitoring is disabled (This used to be ctdb commit e7c92f661a523deae9544b679d412ae79cc0ede7)	2007-11-30 10:53:35 +11:00
Ronnie Sahlberg	975fbc8e22	always set up a new monitoring event regardless of whether monitoring is enabled or not (This used to be ctdb commit c3035f46d1a65d2d97c8be7e679d59e471c092c2)	2007-11-30 10:14:43 +11:00
Ronnie Sahlberg	50573c5391	add ctdb_disable/enable_monitoring() that only modifies the monitoring flag. change calling of the recovered/takeip/releaseip event scripts to use these enable/disable functions instead of stopping/starting monitoring. when we disable monitoring we want all events to still be running in particular the events to monitor for dead nodes and we only want to suppress running the monitor event scripts (This used to be ctdb commit a006dcc4f75aba950dd701ad7d1a84e89df285e8)	2007-11-30 10:09:54 +11:00
Ronnie Sahlberg	0eb6c04dc1	get rid of the control to set the monitoring mode. monitoring should always be enabled (though a node may want to temporarily disable running the "monitor" event scripts but can do so internally without the need for this control) (This used to be ctdb commit e3a33618026823e6af845fd8513cddb08e6b5584)	2007-11-30 10:00:04 +11:00
Ronnie Sahlberg	192ba82b73	->monitor_context is NULL when monitoring is disabled. Check whether monitoring is enabled or not before creating new events and log why the event is not set up otherwise (This used to be ctdb commit 2f352b2606c04a65ce461fc2e99e6d6251ac4f20)	2007-11-30 09:02:37 +11:00
Ronnie Sahlberg	8ac8cce487	don't manipulate ctdb->monitoring_mode directly from the SET_MON_MODE control, instead call ctdb_start/stop_monitoring() ctdb_stop_monitoring() don't allocate a new monitoring context, leave it NULL. Also set the monitoring_mode in this function so that ctdb_stop/start_monitoring() and ->monitoring_mode are kept in sync. Add a debug message to log that we have stopped monitoring. ctdb_start_monitoring() check whether monitoring is already active and make the function idempotent. Create the monitoring context when monitoring is started. Update ->monitoring_mode once the monitoring has been started. Add a debug message to log that we have started monitoring. When we temporarily stop monitoring while running an event script, restart monitoring after the event script wrapper returns instead of in the event script callback. Let monitoring_mode start out as DISABLED and let it be enabled once we call ctdb_start_monitoring. don't check for MONITORING_DISABLED in check_fore_dead_nodes(). If monitoring is disabled, this event handler will not be called. (This used to be ctdb commit 3a93ae8bdcffb1adbd6243844f3058fc742f76aa)	2007-11-30 08:44:34 +11:00
Ronnie Sahlberg	5c3a270991	move ctdb_set_culprit higher up in the file when we are the recmaster and we update the local flags for all the nodes, if one of the nodes fail to respond and give us his flags, set that node as a "culprit" as one of the first things to do in the monitor_cluster loop, check if the current culprit has caused too many (20) failures and if so ban that node. this is for the situation where a remote node may still be CONNECTED but it fails to respond to the getnodemap control causing the recovery master to loop in monitor_cluster aborting the monitoring when the node fails to respond but before anything will trigger a call to do_recovery(). If one or more of the databases or nodes are frozen at this stage, this would lead to smbd being blocked for potentially a longish time. (This used to be ctdb commit 83b0261f2cb453195b86f547d360400103a8b795)	2007-11-28 15:04:20 +11:00
Ronnie Sahlberg	9e73dc87cc	Add a --node-ip argument so that one can specify which ip address a specific instance of ctdbd should bind to. This helps when running a "virtual" cluster on a single machine where all instances bind to different alias interfaces. If --node-ip is specified, then we will only try to bind to this ip address. Otherwise we fall back to the original method trying the ip addresses in /etc/ctdb/nodes one by one until we find one we can bind to. No variable in /etc/sysconfig/ctdb added since this parameter only makes sense in a virtual test/debug cluster. (This used to be ctdb commit d96cb02c2c24f9eabbc53d3d38e90dea49cff3e0)	2007-11-26 10:52:55 +11:00
Ronnie Sahlberg	0597be3386	when monitoring the node from the recovery daemon, check that the recovery daemon and the ctdb daemon both agree on whether the node is banned or not and if they disagree then reban the node again after logging an error to the debug log (This used to be ctdb commit 6cd6e534493066edd4bb2c6ae5be0e9a9d495aa0)	2007-11-23 12:41:29 +11:00
Ronnie Sahlberg	a260145f9f	check for recursive bans in ctdb_ban_node() and remove the previous ban if this is an attempt to ban an already banned node (This used to be ctdb commit 214f2d7b04d0a491d466fc85c8d016efde416f9e)	2007-11-23 12:38:37 +11:00
Ronnie Sahlberg	6b284e5905	add log output for when ctdb_ban_node() and ctdb_unban_node() are called when these functions are called to ban or unban a node make sure we update the CTDB_NODE_BANNED flag in rec->node_flags since this field and flag are checked during the election process (This used to be ctdb commit 740c632ae96a2d34327d1b575780aaf079d93f4f)	2007-11-23 12:36:14 +11:00
Ronnie Sahlberg	b5e79fb06f	If update_local_flags() finds that a node has changed its BANNED status so it differs from what the local ctdb daemon on the recovery master thinks it should be we should call for a re-election (This used to be ctdb commit 21ad6039c31ef5cc0e40a35a41220f91943947cb)	2007-11-23 11:53:06 +11:00
Ronnie Sahlberg	b2a81fb6b1	when we as the recovery daemon on the recovery master detects that the flags differ between the local ctdb daemon and the remote node we can force a flags update on all nodes and not just the local daemon (This used to be ctdb commit a924eb89c966ecbae029ca137e06cffd40cc70fd)	2007-11-23 11:31:42 +11:00
Ronnie Sahlberg	af5bc9b915	add an extra log if we get a modflags control but it doesn't change any flags in update_local_flags() (this is only called if we are or we believe we are the recmaster) when we detect that the flags of a remote node are different from what our local node thinks the flags should be for that remote node we should send a node-flag-changed message to the local daemon so that it updates the flags for that node. (This used to be ctdb commit 36225e4e271f7a4065398253747fb20054f99a53)	2007-11-23 10:52:29 +11:00
Ronnie Sahlberg	c36ce05d08	if we get a modflag control but the flags remain unchanged, log this (This used to be ctdb commit 5a0cd9b37b21665054bd35facd87f0a6ff4dcd55)	2007-11-23 10:31:51 +11:00
Ronnie Sahlberg	e95a4b5cdb	when we print "Remote node had flags xx local had flags xx" we swapped the flags when printing them to the log (This used to be ctdb commit 9fc8831a7fcd34763567227d61cd525ec441ebf2)	2007-11-23 09:54:38 +11:00
Andrew Tridgell	45f0fdfc20	make election handling much more scalable (This used to be ctdb commit 05938d462b92bd9ecb8e35f53651bded47c48675)	2007-11-13 10:27:44 +11:00
Andrew Tridgell	3427793f01	don't do the first startup event until we are out of recovery (This used to be ctdb commit 689940eb6e23f16ee063331caf3986613a8963ea)	2007-11-12 13:10:15 +11:00
Andrew Tridgell	bde886988b	prevent a deadly embrace between smbd and ctdbd by moving the calling of the startup event scripts after the point where recovery has started and the node is in normal operation This makes the 'startup' script just a special type of the 'monitor' script which is called first (This used to be ctdb commit 7424c30a5fd04aea0137c466b4318c3f185280d8)	2007-11-12 10:53:11 +11:00
Ronnie Sahlberg	1d6a74f943	when shutting down, we should stop monitoring (This used to be ctdb commit 325683ef8f326f0565a827ff2c493adcab6e0d64)	2007-10-22 12:34:51 +10:00
Ronnie Sahlberg	4a97876fb7	when we are shutting down, we should first shut down the recovery daemon (This used to be ctdb commit 39ade6b329adcd3234124d6a8daaa6181abf739b)	2007-10-22 12:34:08 +10:00
Andrew Tridgell	f47f758fe8	merge from ronnie (This used to be ctdb commit d444fdc7782496abe4b27003b647ac49fb52e6be)	2007-10-19 09:39:07 +10:00
Ronnie Sahlberg	d1ba047b7f	add a new transport method so that when a node is marked as dead, we shut down and restart the transport otherwise, if we use the tcp transport the tcp connection might try to retransmit the queued data during the time the node is unavailable. this together with the exponential backoff for tcp means that the tcp connection quickly reaches the maximum backoff rto which is often 60 or 120 seconds. this would mean that it could take up to 60/120 seconds before the tcp layer detects that the connection is dead and it has to be reestablished. (This used to be ctdb commit 0256db470879ce556b0f00070f7ebeaf37e529ab)	2007-10-19 08:58:30 +10:00

205 Commits