samba-mirror

mirror of https://github.com/samba-team/samba.git synced 2024-12-24 21:34:56 +03:00

Author	SHA1	Message	Date
Ronnie Sahlberg	366d413f2b	new version 1.0.87 (This used to be ctdb commit d187eb8507f35a650ff3ffc50fa49110eebca0bd)	2009-07-17 13:01:11 +10:00
Ronnie Sahlberg	4a405e564e	Merge commit 'martins/master' (This used to be ctdb commit febf3d6d3f2bdf187c042f560aefc54b8ac72454)	2009-07-17 12:45:08 +10:00
Ronnie Sahlberg	6db0f01532	document the new stopped event (This used to be ctdb commit 70603d9a79c80379bf65d9d703c399a65c109c52)	2009-07-17 12:30:05 +10:00
Ronnie Sahlberg	e5e9fc48b1	create a new event : stopped. This event is called when a node is stopped and is used by eventscripts that need to do certain cleanup and removal of configuration or ip addresses or routing ... Note that a STOPPED node is considered "inactive" and as such will not be running the "recovered" event when the rest of the cluster has recovered. (This used to be ctdb commit 65e9309564611bf937ded3c74a79abff895d7c59)	2009-07-17 12:26:16 +10:00
Ronnie Sahlberg	df00979158	When we create new election data to send during elections, we must re-read the node flags from the main daemon to catch when the STOPPED flag is changed. (This used to be ctdb commit ca4982c40d81db528fe915d5ecc01fcf7df0b522)	2009-07-17 11:37:03 +10:00
Ronnie Sahlberg	9c6aa4e420	update the eventscript to ensure that stopped nodes can not become the natgw master also verify that we actually do have a natgw master available if this is configured and make the node unhealthy if not. (This used to be ctdb commit 7f273ee769d671d8c8be87c9187302fb77e814f3)	2009-07-17 09:45:05 +10:00
Ronnie Sahlberg	5ce69e2fa3	if all nodes are STOPPED, pick one of the STOPPED nodes as natgw master (This used to be ctdb commit 8bbd96cfbbe98f3fc19e432797cbf4478f753a0b)	2009-07-17 09:36:22 +10:00
Ronnie Sahlberg	bf9ad9c934	Do not allow STOPPED or DELETED nodes to become the NATGW master (This used to be ctdb commit 4505ea15408ad40dd8deb4041fd75a65a0ad9336)	2009-07-17 09:29:58 +10:00
Martin Schwenke	d846eb78db	Test suite: Fix debug code for unexpectedly unhealthy cluster. The debug code should run "ctdb status" on a cluster node, not on the test client. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 34e6f8a04b12f8879eb42d417f9741502ccccf0f)	2009-07-16 14:04:06 +10:00
Ronnie Sahlberg	0c5f5ae58d	stopped nodes can not win a recmaster election stopped nodes must yield the recmaster role (This used to be ctdb commit b75ac1185481060ab71bd743e1e48d333d716eba)	2009-07-09 14:44:03 +10:00
Ronnie Sahlberg	b57811bee6	change the infolevel when logging stop/continue commands (This used to be ctdb commit 1e007c833098b03dd81797c081da1ae1b10c971c)	2009-07-09 14:34:12 +10:00
Ronnie Sahlberg	82c1be95ed	recovery daemon needs to monitor when the local ctdb daemon is stopped and ensure that the databases gets frozen and the node enters recovery mode (This used to be ctdb commit 99f239f8b96c8c0a06ac8ca8b8083be96265865a)	2009-07-09 14:19:32 +10:00
Ronnie Sahlberg	9d0941bf83	document the new commands ctdb stop/continue (This used to be ctdb commit d6ddea4167ccdad05e88378ee3f22b6125969562)	2009-07-09 13:07:15 +10:00
Ronnie Sahlberg	41a519191e	dont let other nodes modify the STOPPED flag for the local process when pushing out flags changes (This used to be ctdb commit 501a2747d839ca291b70c761098549cf6d47a158)	2009-07-09 13:20:14 +10:00
Ronnie Sahlberg	88f3c40d9c	add two new controls, STOP_NODE and CONTINUE_NODE that are used to stop/continue a node instead of using modflags messages (This used to be ctdb commit 54b4a02053a0f98f8c424e7f658890254023d39a)	2009-07-09 12:22:46 +10:00
Ronnie Sahlberg	66c8d4fb3d	make it possible to start the daemon in STOPPED mode (This used to be ctdb commit 866aa995dc029db6e510060e9e95a8ca149094ac)	2009-07-09 11:57:20 +10:00
Ronnie Sahlberg	d6a5fd5c9d	remove the header printed for the machinereadable output for natgwlist (This used to be ctdb commit 049271c83a09afb8d6c3e5212cf9ca782956b0c6)	2009-07-09 11:43:37 +10:00
Ronnie Sahlberg	9f0dc4b93b	Add a new node flag : STOPPED. This node flag means the node is DISABLED and that all its public ip addresses are failed over, but also that it has been removed from the VNNmap. A STOPPED node should be in recovery mode active until restarted using the continue command. Adding two new commands "ctdb stop" and "ctdb continue" (This used to be ctdb commit d47dab1026deba0554f21282a59bd172209ea066)	2009-07-09 11:38:18 +10:00
Martin Schwenke	d6862832ed	Merge branch 'ronnie_merge' (This used to be ctdb commit 2ff6ee042080ba1c2bea76bbef3742997d84c9a8)	2009-07-08 14:21:36 +10:00
Martin Schwenke	168ec02adf	Test suite: new tests and code factoring. * 2 new tests for NFS failover. * Factor repeated code from tests into new functions select_test_node_and_ips(), gratarp_sniff_start() and gratarp_sniff_wait_show(). Use these new functions in existing and new tests. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit de0b58e18fcc0f90075fca74077ab62ae8dab5da)	2009-07-08 13:37:52 +10:00
Martin Schwenke	dae498a1e7	Test suite: better debug info when the cluster is unexpectedly unhealthy. cluster_is_healthy() is now run locally in tests and internally causes _cluster_is_healthy() to be run on node 0. When it detects that the cluster is unhealthy and $ctdb_test_restart_scheduled is not true, debug information is printed. This replaces the previous use of $CTDB_TEST_CLEANING_UP. To avoid spurious debug on expected restarts, added scheduled restarts to several tests. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit b67946a6f6b185a7920bf1e560988417c8c4d87d)	2009-07-08 09:45:35 +10:00
Martin Schwenke	7e1cdac0ab	Make ctdbd restarts in tests more reliable. This works around potential race conditions in the init script where the restart operation is not necessarily reliable. It just wraps the actual restart in a loop and tries for a successful restart up to 5 times. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 3f7a4afa0fcc5825beb89267973939df8cde4999)	2009-07-08 09:43:55 +10:00
Martin Schwenke	4bd8e0d87a	When testing make the time taken for some operations more obvious. If wait_until() does not timeout, print the time taken for the command to succeed. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 8d12fe61eb59a4a611dd5950506d14bd4891075d)	2009-07-08 09:43:45 +10:00
Martin Schwenke	21a891cb79	New tests for different aspects of failover. 3 separate tests: * Check that gratuitous ARPs are received and take effect. * Check that ping still works after failover. * Check, via SSH, that the hostname changes after failover. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit aa9f79e4b3e077b48a8a16903d2236c284617e49)	2009-07-08 09:43:29 +10:00
Martin Schwenke	55a04d757f	Updates to TCP tickle tests and supporting functions. * Removed a race from tcpdump_start(). It seems impossible to tell when tcpdump is actually ready to capture packets. So this function now generates some dummy ping packets and waits until it sees them in the output file. * tcpdump_start() sets $tcpdump_filter. This is the default filter for tcpdump_wait() and tcpdump_show(), but other filters may be passed to those functions. * New functions tcptickle_sniff_start() and tcptickle_sniff_wait_show() handle capturing TCP tickle packets. These are used by complex/31_nfs_tickle.sh and complex/32_cifs_tickle.sh. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 52e1cd7e9217cfa521850a9a9a9daddcce011f27)	2009-07-08 09:43:01 +10:00
Martin Schwenke	4edbb2e5f2	Add an extra ctdb recovery to test function restart_ctdb(). There are still very rare cases where IPs haven't been reallocated before the beginning of the next test, so this adds a sleep and an extra call to "ctdb recover" to restart_ctdb(). Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 7c27c493a6de92544754e42f2a8f227b3d663c73)	2009-07-08 09:42:10 +10:00
Martin Schwenke	74acb6f97e	Fix the run_tests script so that the number of columns is never 0. Sometimes "stty size" reports 0, for example when running in a shell under Emacs. In this case, we just change it to 80. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit da87914ab47fe5786b620587464b58853e98dd7e)	2009-07-08 09:41:06 +10:00
Martin Schwenke	5824f3aca9	Separate test cleanup code in output and clean up ctdb restart code. * ctdb_restart_when_done() now schedules a restart by setting an explicit variable that is respected in ctdb_test_exit(), rather than adding a restart to $ctdb_test_exit_hook. This means that restarts are all done in one place. * ctdb_test_exit() turns off "set -e" to make sure that all cleanup happens. * ctdb_test_exit() now prints a clear message indicating where the test ends and the cleanup begins. This message also includes the return code of the test. * Add debug in cluster_is_healthy to try to capture information about unexpected unhealthiness when a test starts. * Simplify simple/07_ctdb_process_exists.sh so that the exit code is generated more obviously. * Remove redundant calls to ctdb_test_exit at the end of tests, since they're done automatically via a trap. Also remove any preceding warnings of restarts or final hints about test success/failure. * Allow multi-digit debug levels in simple/12_ctdb_getdebug.sh and simple/13_ctdb_setdebug.sh. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 56ece515e047a54f33e8b07726e52ba21a1d67e1)	2009-07-08 09:40:11 +10:00
Ronnie Sahlberg	2708b305ca	Initscript cleanups. * Move building of CTDB_OPTIONS to new function build_ctdb_options() and have it use a helper function for readability. * New functions check_persistent_databases() and set_ctdb_variables(). * Remove valgrind-specific stop code, since the general pkill should kill ctdbd when running under valgrind. * Remove some bash-isms (e.g. >& /dev/null) since the script is /bin/sh. * Make indentation consistent. * Minor clean-ups. Signed-off-by: Martin Schwenke <martin@meltin.net> Conflicts: config/ctdb.init (This used to be ctdb commit bebb21f18e3026cb78a306104e92ee005d1077b2)	2009-07-07 13:45:19 +10:00
Ronnie Sahlberg	021c09a842	Merge root@10.1.1.27:/shared/ctdb/ctdb-git (This used to be ctdb commit 5e3b590e384bacfbebab1dd85e89cd87b63c620e)	2009-07-07 11:19:44 +10:00
Ronnie Sahlberg	1593e67399	send ARPs with an interval of 1.1 seconds during ip takeover. this is to better handle linux clients which often default to ignore grat arps that arrive within 1 second of each other. (This used to be ctdb commit 5664da36943b4901a807a9594b0f45e859aafbf3)	2009-07-07 11:40:01 +10:00
Martin Schwenke	96b3517356	Test suite: better debug info when the cluster is unexpectedly unhealthy. cluster_is_healthy() is now run locally in tests and internally causes _cluster_is_healthy() to be run on node 0. When it detects that the cluster is unhealthy and $ctdb_test_restart_scheduled is not true, debug information is printed. This replaces the previous use of $CTDB_TEST_CLEANING_UP. To avoid spurious debug on expected restarts, added scheduled restarts to several tests. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit ee7caae3a55a64fb50cd28fa2fd4663c5dd83b4f)	2009-07-06 17:52:11 +10:00
Martin Schwenke	d90d54ea3e	Make ctdbd restarts in tests more reliable. This works around potential race conditions in the init script where the restart operation is not necessarily reliable. It just wraps the actual restart in a loop and tries for a successful restart up to 5 times. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 1cac8a0ad429f29d1508158c7f7c42a2f1a22945)	2009-07-06 16:40:31 +10:00
Martin Schwenke	35f998346e	When testing make the time taken for some operations more obvious. If wait_until() does not timeout, print the time taken for the command to succeed. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit bdb856ee22816ae1f6b8d15856555f488054f489)	2009-07-06 16:39:08 +10:00
Ronnie Sahlberg	20887a15ad	Perform an ipreallocate after each enable/disable. This will force a wait until the ip addresses have been reallocated after a disable/enable command and will make scripting of enable/disable more predictable. This will cause the command enable/disable to wait until the ip reallocation that normally follows shortly after an enable/disable to finish before the command returns to the prompt. (This used to be ctdb commit 6e1f60d8d780c1240aaabb78ecc8550d0480cd7e)	2009-07-06 11:49:55 +10:00
Ronnie Sahlberg	8c1bf5abb0	Merge root@10.1.1.27:/shared/ctdb/ctdb-git (This used to be ctdb commit 49e7584679c7467a367888c5b14529c8e338f032)	2009-07-06 11:28:10 +10:00
Martin Schwenke	5d67aa2332	New tests for different aspects of failover. 3 separate tests: * Check that gratuitous ARPs are received and take effect. * Check that ping still works after failover. * Check, via SSH, that the hostname changes after failover. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 92011cc05bbdb517ec6a4573f5cb9f6f21c3059e)	2009-07-03 20:55:02 +10:00
Martin Schwenke	613341d150	Updates to TCP tickle tests and supporting functions. * Removed a race from tcpdump_start(). It seems impossible to tell when tcpdump is actually ready to capture packets. So this function now generates some dummy ping packets and waits until it sees them in the output file. * tcpdump_start() sets $tcpdump_filter. This is the default filter for tcpdump_wait() and tcpdump_show(), but other filters may be passed to those functions. * New functions tcptickle_sniff_start() and tcptickle_sniff_wait_show() handle capturing TCP tickle packets. These are used by complex/31_nfs_tickle.sh and complex/32_cifs_tickle.sh. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 8e2a89935a969340bfead8ed040d74703947cb81)	2009-07-03 20:44:55 +10:00
Martin Schwenke	7b3abce684	Add an extra ctdb recovery to test function restart_ctdb(). There are still very rare cases where IPs haven't been reallocated before the beginning of the next test, so this adds a sleep and an extra call to "ctdb recover" to restart_ctdb(). Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit c2bdb77d91761c003e2f0e6918a27c54150f6030)	2009-07-03 18:01:29 +10:00
Martin Schwenke	dba6c1ca77	Fix the run_tests script so that the number of columns is never 0. Sometimes "stty size" reports 0, for example when running in a shell under Emacs. In this case, we just change it to 80. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit e309cb3f95efcf6cff7d7c19713d7b161a138383)	2009-07-03 17:58:38 +10:00
Martin Schwenke	0d425a07d4	Separate test cleanup code in output and clean up ctdb restart code. * ctdb_restart_when_done() now schedules a restart by setting an explicit variable that is respected in ctdb_test_exit(), rather than adding a restart to $ctdb_test_exit_hook. This means that restarts are all done in one place. * ctdb_test_exit() turns off "set -e" to make sure that all cleanup happens. * ctdb_test_exit() now prints a clear message indicating where the test ends and the cleanup begins. This message also includes the return code of the test. * Add debug in cluster_is_healthy to try to capture information about unexpected unhealthiness when a test starts. * Simplify simple/07_ctdb_process_exists.sh so that the exit code is generated more obviously. * Remove redundant calls to ctdb_test_exit at the end of tests, since they're done automatically via a trap. Also remove any preceding warnings of restarts or final hints about test success/failure. * Allow multi-digit debug levels in simple/12_ctdb_getdebug.sh and simple/13_ctdb_setdebug.sh. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit b6fa044a1364cbb3008085041453ee4885f7ced1)	2009-07-03 17:40:16 +10:00
Ronnie Sahlberg	289c58e9b6	add a new command "ctdb ipreallocate", this command will force the recovery master to perform a full ip reallocation process. the ctdb command will block until the ip reallocation has completed (This used to be ctdb commit abad7b97fe0c066b33f6e75d0953bbed892a3216)	2009-07-02 13:00:26 +10:00
Ronnie Sahlberg	ff104c6f5a	When we dispatch a message to a handler, pass the data as a real talloc object so that the handler can talloc_steal() the message content. (This used to be ctdb commit c69f5fe1db5b6ed4a009f0c10ab82c6f32b2e0bc)	2009-07-02 12:58:49 +10:00
Ronnie Sahlberg	e40dad890c	document the ipreallocate command (This used to be ctdb commit 6baaf5bec3ba0094c71d83315170acb5dc729711)	2009-07-02 12:45:14 +10:00
Ronnie Sahlberg	8e435c0605	update enable/disable (This used to be ctdb commit b99afc98bedf1a51d315e311f27c3fc55fd940e7)	2009-07-01 09:33:08 +10:00
Ronnie Sahlberg	3c1351eabd	update the sysconfig to show setting the debuglevel using a string literal instead of a numeric value (This used to be ctdb commit 964530d70ba2ca949380d30a0e3d622963a6206c)	2009-07-01 09:23:52 +10:00
Ronnie Sahlberg	2770cb4397	show the valid debuglevels that can be used in the error text when an invalid level was specified to ctdb setdebug (This used to be ctdb commit 421c0566094b91221fab2ea68f2c9bd35d5dfbcb)	2009-07-01 09:21:07 +10:00
Ronnie Sahlberg	93026f4cbf	update the handling of debug levels so that we always can use a literal instead of a numeric value. validate the input values used and refuse setting the debug level to an unknown value (This used to be ctdb commit daec49cea1790bcc64599959faf2159dec2c5929)	2009-07-01 09:17:13 +10:00
Ronnie Sahlberg	9802a0c2f6	when no debuglevel is specified, make 'ctdb setdebug' show the available options (This used to be ctdb commit f4b0825d9da34578b9f90dc9bd7f99fcc2519ddf)	2009-07-01 08:26:00 +10:00
Ronnie Sahlberg	e6e1ff32a5	dont try sending a keepalive if the transport is down (This used to be ctdb commit 5cdc04669db8c2ddbbff5af82307a16e8d807b83)	2009-06-30 12:17:05 +10:00
Ronnie Sahlberg	6450ae533a	Dont even try allocating and sending a CALL packet if the transport is down (This used to be ctdb commit cb8dd896914d4e44ad7b8bb000176a7c78f394ae)	2009-06-30 12:16:13 +10:00
Ronnie Sahlberg	127754e192	failing a dmaster send due to the transport being down is fatal (This used to be ctdb commit c17dafc79bec25bbb796478c33f503503d382a20)	2009-06-30 12:14:58 +10:00
Ronnie Sahlberg	757ba01ddc	if we fail a dmaster migration due to the transport being down, then that is a fatal condition. (This used to be ctdb commit 75dea671f68ac6649095357c36b3697a927721e9)	2009-06-30 12:13:15 +10:00
Ronnie Sahlberg	dd1774cd85	dont try to send error packets if the transport is down (This used to be ctdb commit 65b94d280731df3245b26d69f39acfaf5bccf0d8)	2009-06-30 12:10:27 +10:00
Ronnie Sahlberg	d4b30b34aa	dont even try to send a message from the main daemon if the transport is down (This used to be ctdb commit 9a2c4c3ed09ac9ea781d06999d11e5c3b5b4a97a)	2009-06-30 12:09:28 +10:00
Ronnie Sahlberg	9e5064dcea	Dont try to allocate and send packets if the transport is down (This used to be ctdb commit 945f04f06a425fd3940a2e4b832c63223a3f26b3)	2009-06-30 12:03:12 +10:00
Ronnie Sahlberg	22fb69d337	dont even try to allocate a packet if the transport is down since it will fail (This used to be ctdb commit a73f316cb9cec877dc0bc3f7baa21be1b1454273)	2009-06-30 11:55:42 +10:00
Ronnie Sahlberg	243bb51f02	New version 1.0.86 (This used to be ctdb commit 841a2d9635341baa1a6dd9ec558fc7cadb4e3af4)	2009-06-30 09:09:06 +10:00
Ronnie Sahlberg	ce54b6dc8b	update the man pages with the "getreclock" and "setreclock" commands. (This used to be ctdb commit 3db8b1d7425ed5bd41e58b43c55fdac517d71baf)	2009-06-25 14:45:57 +10:00
Ronnie Sahlberg	816db4be38	Do not allow the "VerifyRecoveryLock" tunable to be changed if there is no reclock file (This used to be ctdb commit 5334e40978350b6b597ee020bac52e37c8f9a8ba)	2009-06-25 14:45:17 +10:00
Ronnie Sahlberg	969cb64056	disable VerifyRecoveryLock when the user modifies the filename (This used to be ctdb commit d973cb6e83b2f7cc37bd39c1219dcfbd4911a8ee)	2009-06-25 14:34:21 +10:00
Ronnie Sahlberg	5b235c3999	add a control to set the reclock file (This used to be ctdb commit 36cc2e586f03fa497ee9b06f3e6afc80219c4aaa)	2009-06-25 14:25:18 +10:00
Ronnie Sahlberg	7f8d98ebb0	update the recovery daemon to read the recovery lock file off the main daemon and handle when the file is changed/enabled/disabled (This used to be ctdb commit 31acc11a6389d4dd9f7b71b7cfa2f2450076f1f7)	2009-06-25 12:55:43 +10:00
Ronnie Sahlberg	10db6a41df	return NULL and not a "" when there is no reclock file returned from the server (This used to be ctdb commit 6755f89f81aba63bfe00ee16d44a0201cbfa90ca)	2009-06-25 12:26:14 +10:00
Ronnie Sahlberg	2b253c094c	add a control to read the current reclock file from a node (This used to be ctdb commit ed6a4cbcdcbb4e0df83bec8be67c30288bf9bd41)	2009-06-25 12:17:19 +10:00
Ronnie Sahlberg	4a1a3652fe	Document that you can run ctdb without a reclock file in the sysconfig file (This used to be ctdb commit 33895d217ee096b356f02b5292ba27a840c4f559)	2009-06-25 11:59:21 +10:00
Ronnie Sahlberg	77ef745394	Allow setting the recovery lock file as "", which means that we do not use a file and that we implicitly also disable the recovery lock checking. Update the init script to allow starting without a reclock file. (This used to be ctdb commit 07855ff5eba71e7d607d52e234a42553d9b93605)	2009-06-25 11:50:45 +10:00
Ronnie Sahlberg	180a576f7b	Dont access the reclock file at all if VerifyRecoveryLock is zero and also make sure the reclock file is closed if the variable is cleared at runtime (This used to be ctdb commit a25f4888689a0725971606163d87c39a41669292)	2009-06-25 11:41:18 +10:00
Ronnie Sahlberg	52861523f6	new version 1.0.85 (This used to be ctdb commit a4b682e3b2657abeca3e387d96949f83bdbd7b2f)	2009-06-23 11:30:25 +10:00
Ronnie Sahlberg	5f680fa2b4	rename 99.routing to 11.routing so that it is executed before the service scripts (This used to be ctdb commit 9bc8e7eec7ffa8969f0f170a77b13cd0033790f1)	2009-06-23 11:29:26 +10:00
Martin Schwenke	566314ca97	Fix minor problem in previous initscript commit. The valgrind start case should not use daemon, since this is specific to Red Hat. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 867f57d166395c92949e480ca725249b0ca8950b)	2009-06-19 18:08:54 +10:00
Martin Schwenke	3dad79b88e	Initscript fixes, mostly for "stop" action. Use a local variable $ctdbd so that we always run ctdbd from the same place and so that we know what to kill. This variable respects the $CTDBD environment variable, which may be used to specify an alternative location for the daemon. In the important cases use "pkill -0 -f" to check if ctdbd is running. Also, remove the special case for killing ctdbd when running under valgrind. The regular case will handle this just fine. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 070305adfe636c2580776e6bf24bb8be06622b86)	2009-06-19 18:08:31 +10:00
Martin Schwenke	7bfc19d635	Clean up the handling of CTDB restarts in testcases. Glitches during restarts of the CTDB cluster have been causing some tests to fail. This is because restarts are initiated in the body of many tests. This adds a simple function ctdb_restart_when_done, which schedules a restart using an existing hook in the test exit code. This function is now used in tests that need to restart CTDB. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit fc69b6a66282d5be6edeb286bf72aeafb252e6dd)	2009-06-19 18:03:14 +10:00
Martin Schwenke	635da189dc	Fix minor onnode bugs relating to local daemons. Commit a0f5148ac749758e2dfbd6099e829c5bf1d900e6 caused a subtle regression. Due to the subtlety, this description is much longer than the 1 line patch that fixes it! The regression, where a process that invokes onnode is unexpectedly blocked, is only apparent if the following conditions are met: 1. $CTDB_NODES_SOCKETS is set; 2. The command passed to onnode attempts to background a process; and 3. onnode is run in certain types of subshell (e.g. foo=$(onnode ...)). In particular, when testing against local daemons (i.e. condition (1) is met), tests/simple/07_ctdb_process_exists.sh would fail (because it does both (2), (3)). The problem is caused by the use of file descriptor 3 in the code that allows separate filtering of stdout and stderr. A backgrounded process will have this descriptor open and the $(...) construct appears to wait for all file descriptors to be closed. This only happens with local daemons because SSH is replaced by a shell and file descriptor 3 leaks into that shell. It does not occur when SSH is used because the file descriptor does not leak into the remote shell where the process is backgrounded. The fix is simply to redirect file descriptor 3 to /dev/null in the fakessh function, which is used when $CTDB_NODES_SOCKETS is set. Also fixed is another minor bug when the -o option and $CTDB_NODES_SOCKETS are used in combination. The code uses the node name as a suffix for the output filename(s). Usually this is an IP address. However, when $CTDB_NODES_SOCKETS is in use the node name is the socket name, which might be a path several directories deep. Each output file is created via a simple redirection and this would fail if unexpected directories appear in the filename. 3 possible fixes were considered: 1. Replace all '/'s in the node name by '_'s. Nice and simple. 2. Use the basename of the node name. 
However, sockets may be in different directories but have the same basename. 3. Create all required directories before redirecting. This is a little more complex and probably doesn't meet the user's expectations. Option (1) is implemented here. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 5d320099025b6835eda3a1e431708f7e0a6b0ba6)	2009-06-19 18:02:17 +10:00
Ronnie Sahlberg	de1402d471	dont log an error if waitpid returns -1 and errno is ECHILD (This used to be ctdb commit fdf50f3e774e3980af81c0b6f4ff81d085f4f697)	2009-06-19 15:55:13 +10:00
Ronnie Sahlberg	baead0fdcc	dont leak file descriptors when set recmode times out (This used to be ctdb commit fc8a364eb095ec11ca01246a583bf1dc53510141)	2009-06-19 14:58:06 +10:00
Ronnie Sahlberg	d3c5fb4bd1	dont leak file descriptors (This used to be ctdb commit 268c3e4b269a92741a02280c84384178e73de10e)	2009-06-19 14:54:22 +10:00
Ronnie Sahlberg	d72b14e86c	in the recovery daemon, check that the recovery master can access the recovery lock file and verify it is not stale from a child process. This allows us to timeout the operation if the underlying filesystem has become temporarily unresponsive without causing a new recovery. (This used to be ctdb commit d177b08f1dc79534491f27726b05405d47e12e20)	2009-06-19 14:44:26 +10:00
Ronnie Sahlberg	1183b364f1	reduce the timeout we wait for the reclock child process to finish to 5 seconds before we log an error and abort (This used to be ctdb commit 6d1e4321b63973c2e53c63d386e8cc0bd9605cae)	2009-06-19 13:09:11 +10:00
Martin Schwenke	4697829e7c	Fix minor onnode bugs relating to local daemons. Commit a0f5148ac749758e2dfbd6099e829c5bf1d900e6 caused a subtle regression. Due to the subtlety, this description is much longer than the 1 line patch that fixes it! The regression, where a process that invokes onnode is unexpectedly blocked, is only apparent if the following conditions are met: 1. $CTDB_NODES_SOCKETS is set; 2. The command passed to onnode attempts to background a process; and 3. onnode is run in certain types of subshell (e.g. foo=$(onnode ...)). In particular, when testing against local daemons (i.e. condition (1) is met), tests/simple/07_ctdb_process_exists.sh would fail (because it does both (2), (3)). The problem is caused by the use of file descriptor 3 in the code that allows separate filtering of stdout and stderr. A backgrounded process will have this descriptor open and the $(...) construct appears to wait for all file descriptors to be closed. This only happens with local daemons because SSH is replaced by a shell and file descriptor 3 leaks into that shell. It does not occur when SSH is used because the file descriptor does not leak into the remote shell where the process is backgrounded. The fix is simply to redirect file descriptor 3 to /dev/null in the fakessh function, which is used when $CTDB_NODES_SOCKETS is set. Also fixed is another minor bug when the -o option and $CTDB_NODES_SOCKETS are used in combination. The code uses the node name as a suffix for the output filename(s). Usually this is an IP address. However, when $CTDB_NODES_SOCKETS is in use the node name is the socket name, which might be a path several directories deep. Each output file is created via a simple redirection and this would fail if unexpected directories appear in the filename. 3 possible fixes were considered: 1. Replace all '/'s in the node name by '_'s. Nice and simple. 2. Use the basename of the node name. 
However, sockets may be in different directories but have the same basename. 3. Create all required directories before redirecting. This is a little more complex and probably doesn't meet the user's expectations. Option (1) is implemented here. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit c97d56d93d9c1007a4e85affb19ed0c2d0e11b6d)	2009-06-19 12:12:39 +10:00
Martin Schwenke	62871fbcd5	Clean up the handling of CTDB restarts in testcases. Glitches during restarts of the CTDB cluster have been causing some tests to fail. This is because restarts are initiated in the body of many tests. This adds a simple function ctdb_restart_when_done, which schedules a restart using an existing hook in the test exit code. This function is now used in tests that need to restart CTDB. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit d440e83bb4f0c19c085915d0f0e87cc0dabbc569)	2009-06-19 11:40:09 +10:00
Ronnie Sahlberg	0ddf79a3bc	increase the timeout before we shutdown when the recovery daemon is hung (This used to be ctdb commit facddcacb4a961cddb117818fa38a3e97770b2fa)	2009-06-18 09:20:18 +10:00
Ronnie Sahlberg	34fbfb8b89	rename 99.routing to 11.routing so it is executed before any of the service scripts (This used to be ctdb commit 1205673499618f90f413fad9e96a88733b5ce359)	2009-06-18 09:11:46 +10:00
Martin Schwenke	b0fd8fffcf	New tests for NFS and CIFS tickles. New tests/complex/ subdirectory contains 2 new tests to ensure that NFS and CIFS connections are tracked by CTDB and that tickle resets are sent when a node is disabled. Changes to ctdb_test_functions.bash to support these tests. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 5d188af387a2a1d68d66f47edb7a9ca546ed357c)	2009-06-18 09:04:43 +10:00
Martin Schwenke	133826f5da	Increase threshold in 51_ctdb_bench from 2% to 5%. The threshold for the difference in the number of messages sent in either direction around the ring of nodes was set to 2%. Something environmental is causing this difference to sometimes be as high as 3%. We're confident it isn't a CTDB issue so we're increasing the threshold to 5%. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit be3e23c9fcb9c716e492af102830a4f6ad8bda7b)	2009-06-18 09:02:21 +10:00
Martin Schwenke	1f3a602b88	Merge commit 'origin/master' (This used to be ctdb commit 8ddd5165f573fc6beaae589b86a6afa4bc17f32a)	2009-06-16 12:56:55 +10:00
Martin Schwenke	ffff61c13b	New tests for NFS and CIFS tickles. New tests/complex/ subdirectory contains 2 new tests to ensure that NFS and CIFS connections are tracked by CTDB and that tickle resets are sent when a node is disabled. Changes to ctdb_test_functions.bash to support these tests. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 31cc46eb157ca1301312f14879e4fb4da7d81088)	2009-06-16 12:47:59 +10:00
Martin Schwenke	ad3c89095e	Make 51_ctdb_bench.sh more tolerant. Limit the allowable difference in message counts in either direction around the ring to 5% (up from 2%). There is something environmental making this blow out to 3% very occasionally when there's no obvious problem with ctdb. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit d6e6909ac629212b3028e13b958e1a17c64bee8c)	2009-06-10 16:15:09 +10:00
Ronnie Sahlberg	d1c40424f6	When we ban a node, only drop the IPs on the node being banned, not on every node (This used to be ctdb commit 46e8c3737e6ff54fc80de8e962e922924c27bc35)	2009-06-10 10:35:20 +10:00
Ronnie Sahlberg	2bb687c4cd	remove unused variable (This used to be ctdb commit 2a52336ec021dfe8d56ba72726feb7b2dbd41f68)	2009-06-09 10:58:46 +10:00
Ronnie Sahlberg	ac931b1371	dont require particular values for NoIPFailback and DeterministicIPs when using ctdb moveip (This used to be ctdb commit d350c631850377c09968d2978ef57d2bd0d50116)	2009-06-09 10:57:46 +10:00
Ronnie Sahlberg	f135684766	improve ctdb moveip so that it does not always trigger a recovery. (This used to be ctdb commit 0ca28d7336463ecd2ff65620d8dbcbb496991531)	2009-06-09 10:56:50 +10:00
Ronnie Sahlberg	f6ccf96898	try avoiding to cause a recovery when deleting a public ip from a node (This used to be ctdb commit 6318ea13464e2fe630084c40802d8e697c2cb999)	2009-06-05 17:57:14 +10:00
Ronnie Sahlberg	b046f5e3aa	when adding an ip, try manually adding and takingover the ip instead of triggering a full recovery to do the same thing (This used to be ctdb commit 4d5d22e64270cfb31be6acd71f4f97ec43df5b2c)	2009-06-05 17:00:47 +10:00
Ronnie Sahlberg	79eef7f2b5	dont list DELETED nodes in the ctdb listnodes output (This used to be ctdb commit 7eb137aa4c24c69bd93b98fb3c7108e5f3288ebd)	2009-06-04 13:25:58 +10:00
Ronnie Sahlberg	f691b96d84	make it possible to run 'ctdb listnodes' also if the daemon is not running. in this case, read the nodes file directly instead of asking the local daemon for the list. add an option -Y to provide machinereadable output to listnodes (This used to be ctdb commit 4a55cacc4f5526abd2124460b669e633deeda408)	2009-06-04 13:21:25 +10:00
Ronnie Sahlberg	85d67197fe	From William Jojo <w.jojo[AT]hvcc.edu> AIX dont have getopt.h by default. Dont try including this file when building on AIX (This used to be ctdb commit 06b33a826e71e1dd2f9e02ad614be55535d42045)	2009-06-04 09:41:05 +10:00
Martin Schwenke	0219d12fd4	Merge branch 'init_rewrite' Conflicts: config/ctdb.init Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 92be87b5bfed7882b48f4034c82dfdb031f3afdc)	2009-06-02 16:40:01 +10:00
Martin Schwenke	1c2e7871eb	Merge commit 'origin/master' (This used to be ctdb commit 135b72828fc76856fa8f6d7f9c820120de05596b)	2009-06-02 16:29:25 +10:00
Martin Schwenke	b1b1cbb274	Initscript cleanups. * Move building of CTDB_OPTIONS to new function build_ctdb_options() and have it use a helper function for readability. * New functions check_persistent_databases() and set_ctdb_variables(). * Remove valgrind-specific stop code, since the general pkill should kill ctdbd when running under valgrind. * Remove some bash-isms (e.g. >& /dev/null) since the script is /bin/sh. * Make indentation consistent. * Minor clean-ups. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 951dbcb29fd53cf51a08958efe185db4954d24f3)	2009-06-02 16:07:08 +10:00
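
Several of the test-suite commits listed above describe wrapping the ctdbd restart in a loop that tries for a successful restart up to 5 times. The following is a minimal sketch of that idea in POSIX sh, not the code from the ctdb test suite: the function name `retry` and the variable `RETRY_DELAY` are hypothetical, introduced only for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not from the ctdb sources): run a command until it
# succeeds, giving up after a fixed number of attempts.
#
# Usage: retry MAX_ATTEMPTS COMMAND [ARGS...]
# RETRY_DELAY (assumed name) is the pause in seconds between attempts.
retry () {
    _retry_max=$1
    shift
    _retry_count=1
    while [ "$_retry_count" -le "$_retry_max" ]; do
        # Run the command in the current shell; stop on the first success.
        if "$@"; then
            return 0
        fi
        _retry_count=$((_retry_count + 1))
        # Pause before the next attempt so a slow daemon has time to settle.
        sleep "${RETRY_DELAY:-1}"
    done
    # Every attempt failed.
    return 1
}
```

A caller would invoke it as `retry 5 some_restart_command`, where `some_restart_command` stands for whatever actually restarts the daemon. Returning non-zero after the last attempt lets the calling test decide how to report the failure.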

2133 Commits