samba-mirror

mirror of https://github.com/samba-team/samba.git synced 2025-03-09 08:58:35 +03:00

Author	SHA1	Message	Date
Ronnie Sahlberg	7f8d98ebb0	update the recovery daemon to read the recovery lock file off the main daemon and handle when the file is changed/enabled/disabled (This used to be ctdb commit 31acc11a6389d4dd9f7b71b7cfa2f2450076f1f7)	2009-06-25 12:55:43 +10:00
Ronnie Sahlberg	2b253c094c	add a control to read the current reclock file from a node (This used to be ctdb commit ed6a4cbcdcbb4e0df83bec8be67c30288bf9bd41)	2009-06-25 12:17:19 +10:00
Ronnie Sahlberg	77ef745394	Allow setting the recovery lock file as "", which means that we do not use a file and that we implicitly also disable the recovery lock checking. Update the init script to allow starting without a reclock file. (This used to be ctdb commit 07855ff5eba71e7d607d52e234a42553d9b93605)	2009-06-25 11:50:45 +10:00
Ronnie Sahlberg	180a576f7b	Dont access the reclock file at all if VerifyRecoveryLock is zero and also make sure the reclock file is closed if the variable is cleared at runtime (This used to be ctdb commit a25f4888689a0725971606163d87c39a41669292)	2009-06-25 11:41:18 +10:00
Ronnie Sahlberg	de1402d471	dont log an error if waitpid returns -1 and errno is ECHILD (This used to be ctdb commit fdf50f3e774e3980af81c0b6f4ff81d085f4f697)	2009-06-19 15:55:13 +10:00
Ronnie Sahlberg	baead0fdcc	dont leak file descriptors when set recmode times out (This used to be ctdb commit fc8a364eb095ec11ca01246a583bf1dc53510141)	2009-06-19 14:58:06 +10:00
Ronnie Sahlberg	d3c5fb4bd1	dont leak file descriptors (This used to be ctdb commit 268c3e4b269a92741a02280c84384178e73de10e)	2009-06-19 14:54:22 +10:00
Ronnie Sahlberg	d72b14e86c	in the recovery daemon, check that the recovery master can access the recovery lock file and verify it is not stale from a child process. This allows us to timeout the operation if the underlying filesystem has become temporarily unresponsive without causing a new recovery. (This used to be ctdb commit d177b08f1dc79534491f27726b05405d47e12e20)	2009-06-19 14:44:26 +10:00
Ronnie Sahlberg	1183b364f1	reduce the timeout we wait for the reclock child process to finish to 5 seconds before we log an error and abort (This used to be ctdb commit 6d1e4321b63973c2e53c63d386e8cc0bd9605cae)	2009-06-19 13:09:11 +10:00
Ronnie Sahlberg	0ddf79a3bc	increase the timeout before we shutdown when the recovery daemon is hung (This used to be ctdb commit facddcacb4a961cddb117818fa38a3e97770b2fa)	2009-06-18 09:20:18 +10:00
Ronnie Sahlberg	d1c40424f6	When we ban a node, only drop the IPs on the node being banned, not on every node (This used to be ctdb commit 46e8c3737e6ff54fc80de8e962e922924c27bc35)	2009-06-10 10:35:20 +10:00
Ronnie Sahlberg	b046f5e3aa	when adding an ip, try manually adding and taking over the ip instead of triggering a full recovery to do the same thing (This used to be ctdb commit 4d5d22e64270cfb31be6acd71f4f97ec43df5b2c)	2009-06-05 17:00:47 +10:00
Ronnie Sahlberg	5371e3a793	lower the loglevel when we log that we skip an eventscript because it is not executable (This used to be ctdb commit c265df3c7950aab51b8b6ef17040229b97345c35)	2009-06-01 15:29:36 +10:00
Ronnie Sahlberg	6c0c3577f8	dont try to queue packets for sending to (recently) deleted nodes since these nodes do not have a queue. (This used to be ctdb commit 1b7c88ae7643f9bcc52b1d33095f97de88fc2316)	2009-06-01 14:56:19 +10:00
Ronnie Sahlberg	8a0880c843	when building the initial vnnmap, make sure to skip any deleted nodes (This used to be ctdb commit 0cd66c744cd9533ce8d4c4374bcee3bf49b66dae)	2009-06-01 14:44:15 +10:00
Ronnie Sahlberg	dc5e4906cc	use num_nodes and the nodes array instead of walking the vnnmap when counting the number of active nodes (This used to be ctdb commit df20cd9b05ad9ca72e32ccc42354eafc12b68c04)	2009-06-01 14:39:34 +10:00
Ronnie Sahlberg	e6170b5389	add a new node state : DELETED. This is used to mark nodes as being DELETED internally in ctdb so that nodes are not renumbered if / when they are removed from the nodes file. This is used to be able to do "ctdb reloadnodes" at runtime without causing nodes to be renumbered. To do this, instead of deleting a node from the nodes file, just comment it out like 1.0.0.1 #1.0.0.2 1.0.0.3 After removing 1.0.0.2 from the cluster, the remaining nodes retain their pnn's from prior to the deletion, namely 0 and 2 Any line in the nodes file that is commented out represents a DELETED pnn (This used to be ctdb commit 6a5e4fd7fa391206b463bb4e976502f3ac5bd343)	2009-06-01 14:18:34 +10:00
Ronnie Sahlberg	4259156050	dont remove the socket when the daemon stops. This can race if the service is immediately restarted (This used to be ctdb commit b18356764cd49d934eab901e596bb75c6e3ecdf8)	2009-05-29 18:16:13 +10:00
Ronnie Sahlberg	96340bd166	Revert "we only need to have transaction nesting disabled when we start the new transaction for the recovery" This reverts commit bf8dae63d10498e6b6179bbacdd72f1ff0fc60be. (This used to be ctdb commit 87292029cb444ffab130ff7dae47a629c2d15787)	2009-05-25 16:55:27 +10:00
Ronnie Sahlberg	270907faec	Revert "set the TDB_NO_NESTING flag for the tdb before we start a transaction from within recovery" This reverts commit 1b2029dbb055ff07367ebc1f307f5241320227b2. (This used to be ctdb commit 9762a3408f10409b629637d237ec513a825a6059)	2009-05-25 16:55:02 +10:00
Ronnie Sahlberg	26e1486db7	Whitespace changes and using the CTDB_NO_MEMORY() macro changes to the previous patch. (This used to be ctdb commit d623ea7c04daa6349b42d50862843c9f86115488)	2009-05-21 11:49:16 +10:00
Sumit Bose	2fcedf6dac	add missing checks on so far ignored return values Most of these were found during a review by Jim Meyering <meyering@redhat.com> (This used to be ctdb commit 3aee5ee1deb4a19be3bd3a4ce3abbe09de763344)	2009-05-21 11:22:21 +10:00
Sumit Bose	11988fc77a	structure member node_list_file is not used anywhere (This used to be ctdb commit 0e84ea23d1d998d4d4ac7d8a858b3d8294f056cb)	2009-05-21 11:16:43 +10:00
Sumit Bose	9171a7784c	structure member logfile is not used anywhere (This used to be ctdb commit 4f86c991812c2d0bddbe3de9a9906cf5df118cd4)	2009-05-21 11:15:43 +10:00
Ronnie Sahlberg	9a3e19658d	Change the loglevel of "registered tcp client for ..." to INFO instead of ERR (This used to be ctdb commit 92b5580c38c23b99c1692708540983b0c0fcd6cf)	2009-05-19 08:55:42 +10:00
Ronnie Sahlberg	98a54c4675	Track how long it takes to take out the recovery lock from both the main daemon and also from the recovery daemon. Log this in "ctdb statistics". Also add a variable "RecLockLatencyMs" that will log an error every time it takes longer than this to access the reclock file. (This used to be ctdb commit 042377ed803bb8f7ca9d6ea1a387427b7b8ba45a)	2009-05-14 10:33:25 +10:00
Ronnie Sahlberg	42891227a4	add extra debug statements to the log to make it easier to see when a recovery daemon has hung due to the underlying filesystem hanging. (This used to be ctdb commit 5b0067a4e335cbbf6e606646e612d4bfcfdb7441)	2009-05-12 18:39:34 +10:00
root	08492a524b	change the talloc hierarchy for the main transaction_start context and the individual transaction_all handles (This used to be ctdb commit 919b29850671b59bcf748aec25658ea09d8b4f1c)	2009-05-06 07:33:07 +10:00
root	af25fa38f3	fixed a problem with clients disconnecting during a traverse When a client (such as smbstatus) is killed, it may have outstanding traverse children on remote nodes. We need to catch the client disconnect in ctdbd and send a control to all nodes telling them to kill those outstanding traverse children. (This used to be ctdb commit f2fb2df4619a14f7f6c11f9132ee7d793028042c)	2009-05-06 07:32:25 +10:00
root	bfea570af4	when tracking the ctdb statistics, only decrement num_clients and pending_calls IFF the counter is >0 Otherwise there is the chance that we will reset the statistics after the counter has been incremented (client connects) to zero and when the client disconnects we decrement it to a negative number. this is a pure cosmetic patch with no operational impact to ctdb (This used to be ctdb commit 72f1c696ee77899f7973878f2568a60d199d4fea)	2009-05-01 12:30:26 +10:00
root	6793f077a8	Add a new variable VerifyRecoveryLock which can be used to disable the test that the recovery daemon holds the lock properly when performing a recovery (This used to be ctdb commit 329df9e47e6ca8ab5143985a999e68f37c6d88a5)	2009-05-01 01:17:59 +10:00
Ronnie Sahlberg	3a6ace330e	we only need to have transaction nesting disabled when we start the new transaction for the recovery (This used to be ctdb commit bf8dae63d10498e6b6179bbacdd72f1ff0fc60be)	2009-04-26 08:48:15 +10:00
Ronnie Sahlberg	d20bb2498d	set the TDB_NO_NESTING flag for the tdb before we start a transaction from within recovery (This used to be ctdb commit 1b2029dbb055ff07367ebc1f307f5241320227b2)	2009-04-26 08:42:54 +10:00
Ronnie Sahlberg	38ea6708dd	add a tuneable RecoveryDropAllIPs so it is possible to control after how long a node that has been stuck in recovery will wait until it will yield all public addresses. this now defaults to 60 seconds This is useful if a split brain occurs due to network partitioning since it will make sure that the "other half" of the cluster that does not contain the recovery master will eventually release all ips and thus avoiding a duplicate ip situation for the public addresses (This used to be ctdb commit 70f21428c9eec96bcc787be191e7478ad68956dc)	2009-04-24 18:28:08 +10:00
Ronnie Sahlberg	ce3283f7cb	increase the loglevel for the message we print when we automatically release all ips when we have been in recovery for too long (This used to be ctdb commit 7af060ded5113a49832f6a08a942523a202586b3)	2009-04-24 18:11:10 +10:00
Ronnie Sahlberg	3363480da4	tweak some timeouts so that we do trigger a banning even if the control hangs/times out (This used to be ctdb commit 1860a365e6ba8212e15c33016c80a2adcf8d10f4)	2009-04-24 14:45:07 +10:00
Ronnie Sahlberg	e5532b6f26	If we can not pull a database from a node during recovery, mark this node as a "culprit" so that it will eventually become banned. (This used to be ctdb commit 69dc3bf60b86d8df6dc5c7c6ebf303e847fb2ba9)	2009-04-24 14:44:57 +10:00
Ronnie Sahlberg	a87e6f56ae	we only need to switch into client mode from the eventscript child if we are running the monitor event (This used to be ctdb commit 13e2c9044950f21918e4610726e73ed3d8f76920)	2009-04-06 14:03:09 +10:00
Ronnie Sahlberg	e5e2f6f8f7	increase the listen queue. Now that the eventscripts may become clients and connect back to the server we do get a lot more concurrent connection attempts (takeip/releaseip are performed in parallel) (This used to be ctdb commit 018f8b0b1823ef59b46f1a671aec5309d10628f4)	2009-04-06 14:00:41 +10:00
Ronnie Sahlberg	1f87ee85bc	use _exit() and not exit() when we terminate a failed eventscript child process (This used to be ctdb commit 33b296cee177adc61edc911caec8c24b3efa8441)	2009-04-06 13:16:36 +10:00
Ronnie Sahlberg	2e1208e648	We dont need to verify the nodemap on remote nodes that are banned (This used to be ctdb commit 7f8f9385deee6eff2b7303147bc6412bbdc122df)	2009-04-06 12:00:22 +10:00
Ronnie Sahlberg	2393df3989	if we cant pull the remote nodemap off a node we should mark it as a culprit so it eventually becomes banned. (This used to be ctdb commit 0889ae3c237bdb3bd72d45f2f64f5e5d8420870c)	2009-04-02 14:50:43 +11:00
Ronnie Sahlberg	d94917ec49	Change the (dodgy) seqnumfrequency variable to have ms resolution instead of second resolution. Rename the variable to SeqnumInterval for 1, it is an interval and not a 1/interval unit 2, so that we catch when people use this old variable and can update the sysconfig file instead of silently changing semantics of this variable this is a real dodgy variable (This used to be ctdb commit 68eac459e5d2b6b534f72821036675ffe5d7a350)	2009-04-01 17:21:38 +11:00
Ronnie Sahlberg	ad40ee25f9	add a mechanism where the ctdb daemon will run a usercontrolled script when the node status changes to/from UNHEALTHY state. This would allow a sysadmin to set up ctdb to send an email/snmptrap/... when the status of the node changes. (This used to be ctdb commit ce534a83a05dbd40238e4eee0669d60ff396f935)	2009-03-31 14:23:31 +11:00
Ronnie Sahlberg	7265c713db	we need to set the port properly in the parse_ip helper (This used to be ctdb commit 43fe18d86995744ba61c7a6405b70edcb265930a)	2009-03-24 13:45:11 +11:00
root	629d5ee1fa	add a new command "ctdb scriptstatus" this command shows which eventscripts were executed during the last monitoring cycle and the status from each eventscript. If an eventscript timed out or returned an error we also show the output from the eventscript. Example : [root@rcn1 ctdb-git]# ./bin/ctdb scriptstatus 6 scripts were executed last monitoring cycle 00.ctdb Status:OK Duration:0.021 Mon Mar 23 19:04:32 2009 10.interface Status:OK Duration:0.048 Mon Mar 23 19:04:32 2009 20.multipathd Status:OK Duration:0.011 Mon Mar 23 19:04:33 2009 40.vsftpd Status:OK Duration:0.011 Mon Mar 23 19:04:33 2009 41.httpd Status:OK Duration:0.011 Mon Mar 23 19:04:33 2009 50.samba Status:ERROR Duration:0.057 Mon Mar 23 19:04:33 2009 OUTPUT:ERROR: Samba tcp port 445 is not responding Add a new helper function "switch_from_server_to_client()" which both the recovery daemon can use as well as in the child process we start for running the actual eventscripts. Create several new controls, both for the eventscript child process to inform the master daemon of the current status of the scripts as well as for the ctdb tool to extract this information from the running daemon. (This used to be ctdb commit c98f90ad61c9b1e679116fbed948ddca4111968d)	2009-03-23 19:07:45 +11:00
root	dc05c1b80c	create a helper function that converts a ctdb instance in daemon mode to become a ctdb client instance. use this from the recovery daemon child process to switch to client mode and connect back to the main daemon (This used to be ctdb commit 16f31786a031255ab5b3099a0a3c745de973347a)	2009-03-23 12:37:30 +11:00
Mathieu PARENT	f0d585217e	build: Make log-directory configurable independently of VARDIR This adds a new configure option "--with-logdir". logdir defaults to "${localstatedir}/log" . It is important to have logdir configurable for debian systems, where localstatedir is set to "/var/lib" and not "/var". Signed-off-by: Michael Adam <obnox@samba.org> (This used to be ctdb commit b0c6854d1e886456fabdc8f1c3bd21c89311c601)	2009-02-04 00:19:22 +01:00
Michael Adam	3cca0f75e4	Fix treatment of link local ipv6 addresses: set the scope id. metze / Michael Signed-off-by: Michael Adam <obnox@samba.org> (This used to be ctdb commit 9d12de1ca6107801dada927729e755c0949d73bf)	2009-01-19 22:50:53 +01:00
Stefan Metzmacher	23b550d6fc	Fix segfault in ip takeover fallback code. metze Signed-off-by: Michael Adam <obnox@samba.org> (This used to be ctdb commit 3b88f3dec5227e8579672974f7028fb356ee1d94)	2009-01-16 07:22:59 +11:00
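
Commit e6170b5389 above describes how commenting out a line in the nodes file marks that pnn as DELETED, so the remaining nodes keep their numbers. The following is a minimal sketch of that numbering rule in Python, not the ctdb implementation (which is written in C). The names `Node`, `parse_nodes_file` and `active_pnns` are hypothetical, and the handling of blank lines is an assumption that the commit message does not cover.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    pnn: int        # position-derived node number
    address: str    # address as written in the nodes file
    deleted: bool   # True if the line was commented out


def parse_nodes_file(text: str) -> List[Node]:
    """Assign pnns by line position; commented-out lines still consume a pnn."""
    nodes: List[Node] = []
    for line in text.splitlines():
        entry = line.strip()
        if not entry:
            # Assumption: blank lines are skipped and do not consume a pnn.
            continue
        deleted = entry.startswith("#")
        address = entry.lstrip("#").strip()
        nodes.append(Node(pnn=len(nodes), address=address, deleted=deleted))
    return nodes


def active_pnns(nodes: List[Node]) -> List[int]:
    """Return the pnns of nodes that are not marked DELETED."""
    return [n.pnn for n in nodes if not n.deleted]


# The example from the commit message: 1.0.0.2 has been commented out.
NODES_FILE = "1.0.0.1\n#1.0.0.2\n1.0.0.3\n"
nodes = parse_nodes_file(NODES_FILE)
print(active_pnns(nodes))
```

With the three-line example from the commit message, the surviving nodes keep pnns 0 and 2, which matches the behaviour the message describes for "ctdb reloadnodes".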


637 Commits