samba-mirror

mirror of https://github.com/samba-team/samba.git synced 2025-01-26 10:04:02 +03:00

Author	SHA1	Message	Date
root	08492a524b	change the talloc hierarchy for the main transaction_start context and the individual transaction_all handles (This used to be ctdb commit 919b29850671b59bcf748aec25658ea09d8b4f1c)	2009-05-06 07:33:07 +10:00
root	af25fa38f3	fixed a problem with clients disconnecting during a traverse When a client (such as smbstatus) is killed, it may have outstanding traverse children on remote nodes. We need to catch the client disconnect in ctdbd and send a control to all nodes telling them to kill those outstanding traverse children. (This used to be ctdb commit f2fb2df4619a14f7f6c11f9132ee7d793028042c)	2009-05-06 07:32:25 +10:00
root	bfea570af4	when tracking the ctdb statistics, only decrement num_clients and pending_calls IFF the counter is >0. Otherwise there is a chance that we reset the statistics to zero after the counter has been incremented (client connects), and when the client disconnects we decrement it to a negative number. This is a purely cosmetic patch with no operational impact on ctdb (This used to be ctdb commit 72f1c696ee77899f7973878f2568a60d199d4fea)	2009-05-01 12:30:26 +10:00
root	6793f077a8	Add a new variable VerifyRecoveryLock which can be used to disable the test that the recovery daemon holds the lock properly when performing a recovery (This used to be ctdb commit 329df9e47e6ca8ab5143985a999e68f37c6d88a5)	2009-05-01 01:17:59 +10:00
Ronnie Sahlberg	3a6ace330e	we only need to have transaction nesting disabled when we start the new transaction for the recovery (This used to be ctdb commit bf8dae63d10498e6b6179bbacdd72f1ff0fc60be)	2009-04-26 08:48:15 +10:00
Ronnie Sahlberg	d20bb2498d	set the TDB_NO_NESTING flag for the tdb before we start a transaction from within recovery (This used to be ctdb commit 1b2029dbb055ff07367ebc1f307f5241320227b2)	2009-04-26 08:42:54 +10:00
Ronnie Sahlberg	38ea6708dd	add a tuneable RecoveryDropAllIPs so it is possible to control how long a node that has been stuck in recovery will wait until it yields all public addresses. This now defaults to 60 seconds. This is useful if a split brain occurs due to network partitioning since it will make sure that the "other half" of the cluster that does not contain the recovery master will eventually release all ips, thus avoiding a duplicate ip situation for the public addresses (This used to be ctdb commit 70f21428c9eec96bcc787be191e7478ad68956dc)	2009-04-24 18:28:08 +10:00
Ronnie Sahlberg	ce3283f7cb	increase the loglevel for the message we print when we automatically release all ips when we have been in recovery for too long (This used to be ctdb commit 7af060ded5113a49832f6a08a942523a202586b3)	2009-04-24 18:11:10 +10:00
Ronnie Sahlberg	3363480da4	tweak some timeouts so that we do trigger a banning even if the control hangs/times out (This used to be ctdb commit 1860a365e6ba8212e15c33016c80a2adcf8d10f4)	2009-04-24 14:45:07 +10:00
Ronnie Sahlberg	e5532b6f26	If we can not pull a database from a node during recovery, mark this node as a "culprit" so that it will eventually become banned. (This used to be ctdb commit 69dc3bf60b86d8df6dc5c7c6ebf303e847fb2ba9)	2009-04-24 14:44:57 +10:00
Ronnie Sahlberg	a87e6f56ae	we only need to switch into client mode from the eventscript child if we are running the monitor event (This used to be ctdb commit 13e2c9044950f21918e4610726e73ed3d8f76920)	2009-04-06 14:03:09 +10:00
Ronnie Sahlberg	e5e2f6f8f7	increase the listen queue. Now that the eventscripts may become clients and connect back to the server we do get a lot more concurrent connection attempts (takeip/releaseip are performed in parallel) (This used to be ctdb commit 018f8b0b1823ef59b46f1a671aec5309d10628f4)	2009-04-06 14:00:41 +10:00
Ronnie Sahlberg	1f87ee85bc	use _exit() and not exit() when we terminate a failed eventscript child process (This used to be ctdb commit 33b296cee177adc61edc911caec8c24b3efa8441)	2009-04-06 13:16:36 +10:00
Ronnie Sahlberg	2e1208e648	We dont need to verify the nodemap on remote nodes that are banned (This used to be ctdb commit 7f8f9385deee6eff2b7303147bc6412bbdc122df)	2009-04-06 12:00:22 +10:00
Ronnie Sahlberg	2393df3989	if we cant pull the remote nodemap off a node we should mark it as a culprit so it eventually becomes banned. (This used to be ctdb commit 0889ae3c237bdb3bd72d45f2f64f5e5d8420870c)	2009-04-02 14:50:43 +11:00
Ronnie Sahlberg	d94917ec49	Change the (dodgy) seqnumfrequency variable to have ms resolution instead of second resolution. Rename the variable to SeqnumInterval because 1, it is an interval and not a 1/interval unit, and 2, so that we catch when people use the old variable and can update the sysconfig file instead of silently changing the semantics of this variable. This is a real dodgy variable (This used to be ctdb commit 68eac459e5d2b6b534f72821036675ffe5d7a350)	2009-04-01 17:21:38 +11:00
Ronnie Sahlberg	ad40ee25f9	add a mechanism where the ctdb daemon will run a usercontrolled script when the node status changes to/from UNHEALTHY state. This would allow a sysadmin to set up ctdb to send an email/snmptrap/... when the status of the node changes. (This used to be ctdb commit ce534a83a05dbd40238e4eee0669d60ff396f935)	2009-03-31 14:23:31 +11:00
Ronnie Sahlberg	7265c713db	we need to set the port properly in the parse_ip helper (This used to be ctdb commit 43fe18d86995744ba61c7a6405b70edcb265930a)	2009-03-24 13:45:11 +11:00
root	629d5ee1fa	add a new command "ctdb scriptstatus". This command shows which eventscripts were executed during the last monitoring cycle and the status from each eventscript. If an eventscript timed out or returned an error we also show the output from the eventscript. Example : [root@rcn1 ctdb-git]# ./bin/ctdb scriptstatus 6 scripts were executed last monitoring cycle 00.ctdb Status:OK Duration:0.021 Mon Mar 23 19:04:32 2009 10.interface Status:OK Duration:0.048 Mon Mar 23 19:04:32 2009 20.multipathd Status:OK Duration:0.011 Mon Mar 23 19:04:33 2009 40.vsftpd Status:OK Duration:0.011 Mon Mar 23 19:04:33 2009 41.httpd Status:OK Duration:0.011 Mon Mar 23 19:04:33 2009 50.samba Status:ERROR Duration:0.057 Mon Mar 23 19:04:33 2009 OUTPUT:ERROR: Samba tcp port 445 is not responding Add a new helper function "switch_from_server_to_client()" which the recovery daemon can use, as can the child process we start for running the actual eventscripts. Create several new controls, both for the eventscript child process to inform the master daemon of the current status of the scripts as well as for the ctdb tool to extract this information from the running daemon. (This used to be ctdb commit c98f90ad61c9b1e679116fbed948ddca4111968d)	2009-03-23 19:07:45 +11:00
root	dc05c1b80c	create a helper function that converts a ctdb instance in daemon mode to become a ctdb client instance. use this from the recovery daemon child process to switch to client mode and connect back to the main daemon (This used to be ctdb commit 16f31786a031255ab5b3099a0a3c745de973347a)	2009-03-23 12:37:30 +11:00
Mathieu PARENT	f0d585217e	build: Make log-directory configurable independently of VARDIR This adds a new configure option "--with-logdir". logdir defaults to "${localstatedir}/log" . It is important to have logdir configurable for debian systems, where localstatedir is set to "/var/lib" and not "/var". Signed-off-by: Michael Adam <obnox@samba.org> (This used to be ctdb commit b0c6854d1e886456fabdc8f1c3bd21c89311c601)	2009-02-04 00:19:22 +01:00
Michael Adam	3cca0f75e4	Fix treatment of link local ipv6 addresses: set the scope id. metze / Michael Signed-off-by: Michael Adam <obnox@samba.org> (This used to be ctdb commit 9d12de1ca6107801dada927729e755c0949d73bf)	2009-01-19 22:50:53 +01:00
Stefan Metzmacher	23b550d6fc	Fix segfault in ip takeover fallback code. metze Signed-off-by: Michael Adam <obnox@samba.org> (This used to be ctdb commit 3b88f3dec5227e8579672974f7028fb356ee1d94)	2009-01-16 07:22:59 +11:00
root	321866dbba	finish the ipv6 support. allow clients to register either ipv4 or ipv6 client connections to the tickles list (This used to be ctdb commit d9b44d7c3255b0fd7359b9afeb613e6ff4c4eaac)	2009-01-13 16:17:20 +11:00
Ronnie Sahlberg	28bbe2f407	don't call ctdb_fatal() just because we are asked to restart a connection to a remote node and ctdb->methods is NULL. This can happen when we are in the middle of a normal shutdown of the daemon and we have already shut down the transport layer (thus setting ctdb->methods == NULL in the transport layer destructor) and there is some unprocessed data related to a remote node. This prevents an ugly race condition where ctdb might sometimes (rare) cause a core dump during "ctdb shutdown". (This used to be ctdb commit fc4e8b5a5d3699221620a8d76701c8589f2b4ff1)	2008-12-17 12:04:41 +11:00
root	8241d3f9cf	update to the flags handling make sure to abort the monitoring and restart if we failed to get the nodemap from a remote node (This used to be ctdb commit 4eac0214e732e6c2f867d66ec71d4406680dbb94)	2008-12-09 10:45:14 +11:00
root	e54347fa4e	redo and update how we synchronize flags across the cluster. this simplifies the code and should close a race condition between the local recovery daemon and a remote node when flags are changing. (This used to be ctdb commit 32d460b8469eb53145f04161a5d01166f9b5f09e)	2008-12-05 16:32:30 +11:00
Ronnie Sahlberg	edb7241c05	redesign how reloadnodes is implemented. modify the transport methods to allow to restart individual connections and set up destructors properly. only tear down/set-up tcp connections to nodes removed from the cluster or nodes added to the cluster. Leave tcp connections to unchanged nodes connected. make "ctdb reloadnodes" explicitly cause a recovery of the cluster once the files have been reloaded (This used to be ctdb commit d1057ed6de7de9f2a64d8fa012c52647e89b515b)	2008-12-02 13:26:30 +11:00
Ronnie Sahlberg	a782bdbacd	new version 1.0.66 (This used to be ctdb commit 499a01fece2a5f24f1b2943cf3dc6e9a3a8ca3b5)	2008-11-24 19:06:02 +11:00
Ronnie Sahlberg	1e2831898c	allow to change the recmaster even the database is not frozen (This used to be ctdb commit 03e2e436db5cfd29a56d13f5d2101e42389bfc94)	2008-11-21 16:24:12 +11:00
Andrew Tridgell	59b6a9a9e6	fixed problem with looping ctdb recoveries. After a node failure, GPFS can get into a state where non-blocking fcntl() locks can take a long time. This leads to the ctdb set_recmode test timing out, which leads to a recovery failure, and a new recovery. The recovery loop can last a long time. The fix is to consider a fcntl timeout as a success of this test. The test is to see that we can't lock the shared reclock file, so a timeout is fine for a success. (This used to be ctdb commit 6579a6a2a7161214adedf0f67dce62f4a4ad1afe)	2008-11-21 10:24:13 +11:00
Ronnie Sahlberg	331b9bdb5f	dont override/change CTDB_BASE if it is already set by the shell (This used to be ctdb commit 0a6f9326cb99f14b5c9edd0d8854d8229df49910)	2008-11-20 16:39:56 +11:00
Ronnie Sahlberg	a2a5904f66	Keepalive packets were only sent every KeepaliveInterval if the socket had been completely idle during that interval. If we had been sending other packets such as Messages, Calls or Controls there wouldnt be any need for an explicit keepalive and thus we didnt send one. This does make it somewhat awkward when analyzing traces since it is non-intuitive when keepalives are sent and when they are not sent. Change the keepalive logic to always send a keepalive regardless of whether the link is idle or not. (This used to be ctdb commit 7a18f33ec7512100dd067c65f0470889ff8fd591)	2008-11-20 13:35:08 +11:00
Ronnie Sahlberg	94a56ea410	rewrite the handling of flag updates across the cluster to eliminate a race between the ctdb tool and the recovery daemon both at once trying to push flag changes across the cluster. (This used to be ctdb commit a9a1156ea4e10483a4bf4265b8e9203f0af033aa)	2008-11-20 12:43:18 +11:00
Ronnie Sahlberg	06728fdac9	we actually need a ctdb_db variable (This used to be ctdb commit aba984f1b85f5a2d370b093061cf15843ee53758)	2008-11-03 21:54:52 +11:00
Ronnie Sahlberg	d7007793ea	latency is measured in us, not ms use an explicit ctdb_db variable instead of dereferencing state (This used to be ctdb commit 8c6a02fb423a8cbcbfc706767e3d353cd48073c3)	2008-10-30 13:34:10 +11:00
Ronnie Sahlberg	e1b0cea427	add control and logging of very high latencies. log the type of operation and the database name for all latencies higher than a threshold (This used to be ctdb commit 1d581dcd507e8e13d7ae085ff4d6a9f3e2aaeba5)	2008-10-30 12:49:53 +11:00
Ronnie Sahlberg	b9bd20ce55	add a context and a timed event so that once we have been in recovery mode for too long we drop all public ip addresses (This used to be ctdb commit 403c68f96e1380dd07217c688de2730464f77ea0)	2008-10-22 11:04:41 +11:00
Ronnie Sahlberg	beed899c4f	null out the pointer before we reload the nodes file (This used to be ctdb commit 4b0f32047e8bece0a052bdbe2209afe91b7e8ce3)	2008-10-17 21:38:42 +11:00
Ronnie Sahlberg	a924ef78b6	when we reload the nodes file, we may need to reload the nodes file inside the recovery daemon as well. (This used to be ctdb commit 82fd2b6b5cd8e988c38fa6b74121a048757bdeef)	2008-10-17 21:18:06 +11:00
Ronnie Sahlberg	ce66008e08	specify a "script log level" on the commandline to set under which log level any/all output from eventscripts will be logged as (This used to be ctdb commit cdc79d4f22f1a6aec5c34115969421f93663932a)	2008-10-17 07:56:12 +11:00
Ronnie Sahlberg	5808a7be96	allow multiple eventscripts using the same prefix. this eases the pain for users that use out of tree eventscripts (This used to be ctdb commit 8313dfb6fc5404cd2d065af6620412f8664ada11)	2008-10-16 17:57:50 +11:00
Ronnie Sahlberg	233b0e5cbb	lower the loglevel for the informational message that a TCP_ADD operation described an ip address not known to be a public address. This could happen if someone for genuine reasons accesses a share through a static ip address. It can also happen if non homogenous public address configurations are used and when a tcp description is pushed out to a different node that does not serve/know the specific ip address. (This used to be ctdb commit 9b1d089c99413f3681440f3cf33c293d118c9108)	2008-10-15 03:02:09 +11:00
Ronnie Sahlberg	41d19e650c	Revert "from Mathieu Parent <math.parent@gmail.com>" This reverts commit dc9cd4779db4a89697731e4cf415be51067a07c1. Conflicts: (This used to be ctdb commit d13da2e8fe2fab619540525d98a5502a23ab7d20)	2008-10-15 01:08:29 +11:00
Ronnie Sahlberg	cb300382b0	update TAKEIP/RELEASEIP/GETPUBLICIP/GETNODEMAP controls so we retain an older ipv4-only version of these controls. We need this so that we are backwardcompatible with old versions of ctdb and so that we can interoperate with a ipv4-only recmaster during a rolling upgrade. (This used to be ctdb commit 6b76c520f97127099bd9fbaa0fa7af1c61947fb7)	2008-10-14 10:40:29 +11:00
Ronnie Sahlberg	e5a3a73e64	from Mathieu Parent <math.parent@gmail.com> Hi, I have attached a patch necessary as debian log dir (/var/log) is not a subdir of VARDIR (/var/lib on rpm systems, /var/lib/ctdb on debian). As I don't know much about autotools and friends, this patch may be hacky. This is part of the process to minimize diff between distributions. (This used to be ctdb commit dc9cd4779db4a89697731e4cf415be51067a07c1)	2008-10-13 08:27:33 +11:00
Ronnie Sahlberg	3411e98e14	skip empty lines in the public addresses file, not skip all non-empty lines (This used to be ctdb commit dc108adada33bb713f71a2859eda3b439ed0cd1a)	2008-10-07 19:34:34 +11:00
Ronnie Sahlberg	374906860c	from Michael Adam : allow #-style comments in the nodes and public addresses file (This used to be ctdb commit 5f96b33a379c80ed8a39de1ee41f254cf48733f9)	2008-10-07 19:25:10 +11:00
Ronnie Sahlberg	46187433ca	remove an unused variable (This used to be ctdb commit 4237bd3753dcb024c17461e974414bef1b609416)	2008-10-07 18:14:44 +11:00
Ronnie Sahlberg	1778280d50	When we reload the nodes file instead of shutting down/restarting the entire tcp layer just bounce all outgoing connections and reconnect (This used to be ctdb commit e701a531868149f16561011e65794a4a46ee6596)	2008-10-07 18:12:54 +11:00

410 Commits