samba-mirror

mirror of https://github.com/samba-team/samba.git synced 2025-01-13 13:18:06 +03:00

Author	SHA1	Message	Date
Rusty Russell	e1217b7bdb	There is one signedness issue in tdb which prevents traverses of TDB records over the 2G offset on systems which support 64 bit file offsets. This fixes that case. On systems with 32 bit offsets, expansion and fcntl locking on these records will fail anyway. SAMBA already does '#define _FILE_OFFSET_BITS 64' in config.h (on my 32-bit x86 Linux system at least) to get 64 bit file offsets. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (cherry picked from samba commit `252f7da702`) Signed-off-by: Stefan Metzmacher <metze@samba.org> (This used to be ctdb commit 2d768f664e6db65b3b7e0c732f33ee2b806892f9)	2009-12-16 08:03:42 +01:00
Ronnie Sahlberg	640c48c844	Revert "cleanup: remove a tunable we no longer use in the eventscripts any more :" This reverts commit 401f421fa003d9515df15e759b50b56e0c67d69c. Conflicts: include/ctdb_private.h server/ctdb_tunables.c (This used to be ctdb commit b883d19a495a41a22db37f9c2cf6250fee529de0)	2009-12-16 09:51:17 +11:00
Ronnie Sahlberg	fcd16342f6	Merge branch 'trans3' (This used to be ctdb commit b765e12a5fb87a6121e49b349017b6a961929346)	2009-12-15 21:00:22 +11:00
Ronnie Sahlberg	b3104bd1d0	Author: Rusty Russell <rusty@rustcorp.com.au> Date: Tue Dec 15 15:53:30 2009 +1030 eventscript: hack to avoid overloading valgrind Now we fork one child per script, when running under valgrind the load gets quite high. This is because valgrind does a lot of work after exit, and we don't wait for the children to finish; we start the next one when the child reports status via the pipe. This fix is ugly, but simple. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 6ed34d5320c39d8a55f2a36ad4c1ab574e0b0796)	2009-12-15 20:56:16 +11:00
Ronnie Sahlberg	842aa60d52	This is a dodgy patch. I saw once where the master ctdbd logging structure was talloc freed which caused issues. So only free the structure if it is NOT the master structure. This needs to be looked into in more detail. (This used to be ctdb commit bcf494b81f4277dc75f05faccf0c446bd15f6e2b)	2009-12-15 19:04:52 +11:00
Ronnie Sahlberg	0982299bed	Revert "Make fetch_locked more scalable" This reverts commit 5736e17c139c9a8049e235429aeae0c6c9d0e93d. (This used to be ctdb commit 3d2d877d877146ca09a28a3a44f4840eb36fd377)	2009-12-15 14:26:28 +11:00
Ronnie Sahlberg	5a7e9900df	Merge commit 'obnox/ctdb-wip-trans3' into trans3 (This used to be ctdb commit ac06a0e042e7d024060d6e87a49bda9ccc072c52)	2009-12-15 14:25:55 +11:00
Ronnie Sahlberg	3b53c02e34	add a new test tool that just locks and releases the same record over and over (This used to be ctdb commit 24767be2eb9aed29704c2a4097bab5466cb6728f)	2009-12-15 12:14:49 +11:00
Ronnie Sahlberg	244bc5cc8f	ctdb_fetch requires the number of nodes to be specified. Have it log an error and terminate if this parameter was omitted (This used to be ctdb commit 340be0179f55acfff77f8c3c8be958679227bde1)	2009-12-15 11:29:16 +11:00
Ronnie Sahlberg	e2e30df2e9	When setting up the logging, set the event to trigger a read of a log message from a child process as a child of the "log" structure and not the ctdb structure, or else we can crash if we receive log messages from a child but the log structure has been freed() (This used to be ctdb commit ea9e39369379939abf6a4076fa2014c10c1a9ad0)	2009-12-15 10:45:18 +11:00
Ronnie Sahlberg	db0d2a1b8f	From rusty: Subject: eventscript: fix spinning at 100% cpu when child exits. ctdbd was spinning reading 0 from a pipe, as soon as the first eventscript finishes. This was caused by the intersection between a78b8ea7168e "Run only one event for each epoll_wait/select call" and 32cfdc3aec34 "eventscript: ctdb_fork_with_logging()". Unavoidable mid-air collision, since both worked fine and both were developed simultaneously. When the script exits, we have two pipes open to it: one for any stdout/stderr for logging (ctdb_log_handler), and one for the result (ctdb_event_script_handler). The latter frees everything, including the log fd and event structure. We used to get one callback to ctdb_log_handler, which got a harmless 0-length read, then one to ctdb_event_script_handler which cleaned up. Now we only do one callback per poll, we need the logging function to clean itself up so we can make progress. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 211ea7907e8e96041aa6f7d086551d64d065a8a3)	2009-12-15 10:23:58 +11:00
Ronnie Sahlberg	649ba2631d	Rename the tunable EventScriptBanCount to EventScriptTimeoutCount since we no longer ban nodes when dodgy scripts continue to hang. We now only mark nodes as unhealthy if monitor events fail or timeout. Never ban. (This used to be ctdb commit 5c8e56fc7a518e115bceac257867739283cf6a1e)	2009-12-14 15:53:23 +11:00
Ronnie Sahlberg	ed6b5a8c68	cleanup: remove a tunable we no longer use in the eventscripts any more : EventScriptUnhealthyOnTimeout (This used to be ctdb commit 401f421fa003d9515df15e759b50b56e0c67d69c)	2009-12-14 15:48:47 +11:00
Rusty Russell	cab8da8dc4	ctdb: don't print OUTPUT: for DISABLED scripts In other news, did you know ctime() returns a \n-terminated string? Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 1b4e7bb548976b99f122142b040494b6f9911962)	2009-12-14 15:46:49 +11:00
Rusty Russell	784fa9fd8a	eventscript: fix monitoring when killed by another script command Commit c1ba1392fe "eventscript: get rid of ctdb_control_event_script_finished altogether" was wrong: there is one case where we want to free the script without transferring their status to last_status. This happens because we always kill a running monitor command when we run any other command. This still isn't quite right (and never was): the callback will be called with status value 0, which might flip us to HEALTHY if we were unhealthy. This is conveniently fixed in my next set of patches :) Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 0ea0e27d93398df997d3df9d8bf112358af3a4a5)	2009-12-14 15:46:14 +11:00
Ronnie Sahlberg	e76561f544	remove the variable "disable when unhealthy" there is no rational need for a setting where we permanently mark nodes as disabled every time an eventscript fails (This used to be ctdb commit 68a8ee99b128a5ec883600735626bdb3bbc9c503)	2009-12-14 15:40:54 +11:00
Michael Adam	b41d9a2bcc	Revert "recovery: add special pull-logic for persistent databases" This reverts commit 8aef46d2aab3efb322dda51eaa202653cefd5222. This special recovery logic is wrong now with the transaction rewrite. The treatment of persistent databases will later be rewritten to use the database sequence number. Michael (This used to be ctdb commit c5a0aef668a63f927d6184612b13ce316eb4a0be)	2009-12-12 00:45:40 +01:00
Volker Lendecke	f6ea3e6bcf	Make fetch_locked more scalable This patch improves the handling of the fetch_lock operation on non-persistent databases that ctdb clients have to do very frequently. The normal flow how this goes is the following: 1. Client does a local fetch_lock on the database 2. Client looks if the local node is dmaster. If yes, everything is fine If no, continue here 3. Client unlocks the local record 4. Client issues a "get me the record" call to ctdbd 5. ctdbd goes out and fetches the dmaster role 6. ctdbd tells the client to retry 7. Client starts over again The problem is between step 6 and 7: Before the client has had the chance to retry (i.e. catch the record with a fetch_locked), another node might have come asking ctdbd to migrate away the record again. This is a real problem, I've seen >20 loops of this kind in real workloads. This patch does the following: Whenever ctdb receives a record as result of step 5, it puts the key on a "holdback list". As long as a key is on this list, a request to migrate away the dmaster is put on hold. It is the client's duty to issue the "CTDB_CONTROL_GOTIT" control when it has successfully done step 2 after having asked ctdb to fetch the record. This will release the key from the "holdback list" and re-issue all dmaster migration requests. As a safeguard against malicious clients, once a second (default 1000msecs, tunable "HoldbackCleanupInterval" in milliseconds) ctdbd goes over the list of held back keys, deletes them and releases all held back migration requests. (This used to be ctdb commit 5736e17c139c9a8049e235429aeae0c6c9d0e93d)	2009-12-12 00:45:39 +01:00
Volker Lendecke	b664a86bc2	Import "talloc_array_length" from upstream talloc (This used to be ctdb commit 844aa6300ee4d87561e698001ebc15ac1e455528)	2009-12-12 00:45:39 +01:00
Michael Adam	aea324336c	tests: temporarily disable the transaction test tool. Make it return success for make test. This is temporarily disabled until the rewrite of the transaction code (in samba and the daemon) using the global lock feature has been ported to the ctdb client code. Michael (This used to be ctdb commit 78ca29352aa39f4ef4e41096b92d55cb2e0d348a)	2009-12-12 00:45:39 +01:00
Michael Adam	46de365e78	Add a new control CTDB_GET_DB_SEQNUM - fetch a persistent db's sequence number. Michael (This used to be ctdb commit a7e3b5fac6b3f5d74473f26eb86c067b35647996)	2009-12-12 00:45:39 +01:00
Michael Adam	8dedde81cd	define CTDB_DB_SEQNUM_KEY - used with the new implementation of transactions. Michael (This used to be ctdb commit 4b1dbcf0853bdc4832d39a477823ae34f216da52)	2009-12-12 00:45:38 +01:00
Volker Lendecke	9f16f655fa	Tiny simplification of ctdb_queue_packet() (This used to be ctdb commit 1640da1cab7e8b545367824204c82931f3346848)	2009-12-12 00:45:38 +01:00
Volker Lendecke	24d04a3e89	Rename a struct member for clarity (This used to be ctdb commit 6af5e74a21546d723008d69d6752ebebf898c947)	2009-12-12 00:45:37 +01:00
Michael Adam	faacd5ca79	server: add a new control CTDB_CONTROL_TRANS3_COMMIT This is a simplified version of the trans2 commit control: It just rolls out the marshall buffer to all active nodes. It is the main ctdbd part of the re-implementation of the persistent transactions. The client code is changed to take a global lock to start a transactions and store into the marshal buffer instead of writing to the local tdb under a local transaction. The old transaction implementation is going to be removed in a later commit. Michael (This used to be ctdb commit f66428f9d2013080a414404c1ba6117888352fd6)	2009-12-12 00:43:26 +01:00
Ronnie Sahlberg	a8549ef700	From: Volker Lendecke <vl@samba.org> Date: Wed, 9 Dec 2009 22:45:12 +0100 Subject: [PATCH] Revert an accidental commit (This used to be ctdb commit af6656f2844d8fd72204a70358c9d589dbe1bd34)	2009-12-10 08:53:55 +11:00
Michael Adam	54b9a49e2e	tests: remove the no_trans mode from ctdb_transaction. Writes without transaction are not possible any more on persistent databases. Michael (This used to be ctdb commit 59f46d7261dfdbdef900bf95dd9eb28ad22a46b2)	2009-12-09 22:04:48 +01:00
Michael Adam	332017925f	tests: remove the persistent_unsafe writes test. This is useless now that persistent write operations without transaction are forbidden. Michael (This used to be ctdb commit b022863d44026c19d5aae54aa485b670bea0540e)	2009-12-09 21:57:00 +01:00
Michael Adam	aa6e42a4ba	tests: remove persistent_safe write test. This is useless now that persistent writes without transactions are forbidden. Michael (This used to be ctdb commit 9ac82311d796e1fab31f8de62b8ccc754445093c)	2009-12-09 21:56:59 +01:00
Michael Adam	c32ff2bbb0	test: add test 54_ctdb_transaction_recovery.sh This is like the 53_ctdb_transaction test, but it additionally runs a loop with recoveries while the transactions are running. When called like this, the transaction loops run for 10 minutes: CTDB_TEST_TIMELIMIT=600 tests/scripts/run_tests tests/simple/54_ctdb_transaction_recovery.sh The default timelimit is 30 seconds. Michael (This used to be ctdb commit 2ff2679e8f3d50ebf735f2c420898a84268bdc95)	2009-12-09 21:56:59 +01:00
Michael Adam	edfc6a8c12	test: get value for --timelimit from environment var CTDB_TEST_TIMELIMIT in transaction test Michael (This used to be ctdb commit c13077ca64f6e6569c30ef7fcb044e5711dce1a3)	2009-12-09 21:56:59 +01:00
Michael Adam	c2c9a04cf2	client: lower level of commit retry message WARNING->DEBUG This can happen frequently when recoveries intercept transactions. Michael (This used to be ctdb commit c46adb210e47530488503e20d682d4d182c0fb79)	2009-12-09 21:56:59 +01:00
Michael Adam	97d780bc20	client: lower debug level of transaction-active-retry message to DEBUG This reduces some noise. Michael (This used to be ctdb commit 54d227811753f4a87f1a2c9dc0b1389f5ca2a12f)	2009-12-09 21:56:59 +01:00
Michael Adam	ea65e80223	call: lower the debug message "refusing migration while transaction" to lvl INFO This gets just too noisy on a busy system. And it is purely informational anyways... Michael (This used to be ctdb commit 7f64a00c76203fdf6673c3f862a4bfd17fb848d7)	2009-12-09 21:56:59 +01:00
Volker Lendecke	a0d9bd3c13	Run only one event for each epoll_wait/select call This might be a bit less efficient, but experience in winbind has shown that event callbacks can trigger changes in the socket state in very hard to diagnose ways. (This used to be ctdb commit a78b8ea7168e5fdb2d62379ad3112008b2748576)	2009-12-10 07:52:16 +11:00
Christian Ambach	47f8c380d2	reduce vacuuming lognoise syslog.h says: LOG_NOTICE 5 normal but significant condition LOG_INFO 6 informational several vacuuming related logs logged at NOTICE level although I don't see any real significance, these are just informational messages for me Signed-off-by: Christian Ambach <christian.ambach@de.ibm.com> (This used to be ctdb commit 142111983c103e90ccccbe26fd580c4eb28e949f)	2009-12-10 07:33:59 +11:00
Christian Ambach	4269d37ce8	improve time jump logging add the __location__ macro to the logs to get a better idea in which loop the problem occurred Signed-off-by: Christian Ambach <christian.ambach@de.ibm.com> (This used to be ctdb commit dccb549fd6a6e338063699544e52f2a1a6a966b5)	2009-12-10 07:31:04 +11:00
Ronnie Sahlberg	839670253a	Merge commit 'rusty/script-report' (This used to be ctdb commit 6e8b279ed307eccac08386e98510361ba3ab3d36)	2009-12-09 14:26:42 +11:00
Ronnie Sahlberg	50820f9e18	Bond devices can have any name the user configures, so when checking link status for an interface, first check if this interface is in fact a bond device (by the presence of a /proc/net/bonding/IFACE file) and use that file for checking status. Otherwise assume ib* is an infiniband interface which we don't know how to check, or otherwise it is an ethernet interface and ethtool should hopefully work. (This used to be ctdb commit 8cc6c5de3d7abb0b72eaa6e769e70963b02d84cb)	2009-12-09 11:33:04 +11:00
Ronnie Sahlberg	3ca3f4c771	make sure to also check that interfaces used for NATGW are ok and have a link. if not the node should become unhealthy (This used to be ctdb commit 03b5bbaae1b53830a4cd20d3079ab8f45ffce923)	2009-12-09 11:13:29 +11:00
Stefan Metzmacher	af170d1a8a	events/50.samba: only use wbinfo --ping-dc if available metze (This used to be ctdb commit 7b73834ba3ac197cc8a3020c111f9bb2c567e70b)	2009-12-08 07:38:00 +11:00
Rusty Russell	a46c3b4f2a	ctdb: scriptstatus can now query non-monitor events We also no longer return an error before scripts have been run; a special zero-length data means we have never run the scripts. "ctdb scriptstatus all" returns all event script results. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 9b90d671581e390e2892d3a68f3ca98d58bef4df)	2009-12-08 01:50:55 +10:30
Rusty Russell	5d99a1a47c	eventscript: expose call names and enum We're going to need this so ctdb can query non-monitor status. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 53bc5ca23ca55a3ac63a440051f16716944a2a51)	2009-12-08 01:47:13 +10:30
Rusty Russell	0dbe76f88f	eventscript: lock logging on timeout. Ronnie suggested this; seems like a very good idea. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 93153bca68926401dc9ae7fd77ed3f17be923344)	2009-12-08 01:32:36 +10:30
Rusty Russell	9e87377e7a	ctdb: support --machinereadable (-Y) for scriptstatus Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 47ffe75848f216568ce3db0a60ca88cfe3d6903a)	2009-12-08 01:31:53 +10:30
Rusty Russell	b29067b02f	eventscript: get rid of ctdb_control_event_script_finished altogether We always have to call it before freeing the state; we should just do this work in the destructor itself. Unfortunately, the script state would already be freed by the time the state destructor is called, so we make the script state a child of ctdb, and talloc_free() it manually on the one path which doesn't use the destructor. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit c1ba1392fe52762960e896ace0aca0ee4faa94d5)	2009-12-08 12:29:10 +10:30
Rusty Russell	d3593c2f83	eventscript: save state for all script invocations Rather than only transferring to last_status for monitor events, do it for every event (ctdb->last_status is now an array). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit c73ea56275d4be76f7ed983d7565b20237dbdce3)	2009-12-08 12:27:48 +10:30
Rusty Russell	6960fa96eb	eventscript: cleanup finished to take state arg We only need ctdb->current_monitor so we can kill it when we want to run something else; we don't need to use it here as we always know what script we are running. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 4cf1b7c32bcf7e4b65aec1fa7ee1a4b162cac889)	2009-12-08 12:24:56 +10:30
Rusty Russell	e548a335bd	eventscript: use wire format internally for script status. The only difference between the exposed and internal structures now is that the name and output fields were pointers. Switch to using ctdb_scripts_wire/ctdb_script_wire internally as well so marshalling is a noop. We now reject scripts which are too long and truncate logging to the 511 characters we have space for (the entire output will be in the normal ctdbd log). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit fd2f04554e604bc421806be96b987e601473a9b8)	2009-12-08 12:48:17 +10:30
Rusty Russell	9753b7e793	eventscript: rename ctdb_monitoring_wire to ctdb_scripts_wire We're going to allow fetching status of all script runs, so this name is no longer appropriate. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit f5cb41ecf3fa986b8af243e8546eb3b985cd902a)	2009-12-08 00:51:24 +10:30
Rusty Russell	3ff8bf8138	eventscript: get_current_script() helper This neatens the code slightly. We also use the name 'current' in ctdb_event_script_handler() for uniformity. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit e9661b383e0c50b9e3d114b7434dfe601aff5744)	2009-12-08 12:47:24 +10:30
Rusty Russell	cc678d572f	eventscript: use an array rather than a linked list of scripts This brings us closer to the wire format, by using a simple array and a 'current' iterator. The downside is that a 'struct ctdb_script' is no longer a talloc object: the state must be passed to our log fn, and the current script extracted with &state->scripts->scripts[state->current]. The wackiness of marshalling is simplified, and as a bonus, we can distinguish between an empty event directory (state->scripts->num_scripts == 0) and an error (state->scripts == NULL). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 76e8bdc11b953398ce8850de57aa51f30cb46bff)	2009-12-08 12:47:05 +10:30
Rusty Russell	1eda08ea29	eventscript: record script status for all events This unifies almost everything: the state->current pointer points to the struct ctdb_script where we record start, finish, status and output. We still only marshall up the monitor events; the rest disappear when the state structure is freed. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit c476c81f3e3d8fc62f2e53d82fce5774044ee9ce)	2009-12-08 12:46:18 +10:30
Rusty Russell	9b50f7ee67	eventscript: use scripts array directly, rather than separate list We rename ctdb_monitor_script_status to ctdb_script, and instead of allocating them as the scripts are executed, we allocate them up front and keep a "current" iterator. This slightly simplifies the code, though it means we only marshall up to the last successfully run script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit b2a300768536d10bd867a987ad4cf1c5268c44bc)	2009-12-08 12:45:17 +10:30
Rusty Russell	23e24c503c	eventscript: ctdb_fork_with_logging() A new helper function which sets up an event attached to the child's stdout/stderr which gets routed to the logging callback after being placed in the normal logs. This is a generalization of the previous code which was hardcoded to call ctdb_log_event_script_output. The only subtlety is that we hang the child fds off the output buffer; the destructor for that will flush, which means it has to be destroyed before the output buffer is. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 32cfdc3aec34272612f43a3588e4cabed9c85b68)	2009-12-08 12:44:30 +10:30
Rusty Russell	e84d2f7edb	eventscript: pass struct ctdb_log_state directly to ctdb_log_handler(). The current logging logic assumes that any stdout/stderr belongs to the currently running monitor script output. This isn't quite right anyway, and we'd like to capture stderr output of other script invocations. So we move towards multiple struct ctdb_log_state by handing it directly to ctdb_log_handler to use, rather than having it assume ctdb->log. We need a ctdb pointer inside the log struct now though. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 497766cf186442de00fb324343150442457be858)	2009-12-08 00:31:29 +10:30
Rusty Russell	c309d22f9a	eventscript: remove unused ctdb_ctrl_event_script* The child no longer uses ctdb_ctrl_event_script_init or ctdb_ctrl_event_script_finished, and the others are redundant: it doesn't need to tell us it's starting a script when it only runs one. We move start and stop calls to the parent, and eliminate the RPC infrastructure altogether. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 391926a87a7af73840f10bb314c0a2f951a0854c)	2009-12-08 00:27:40 +10:30
Rusty Russell	69c30c6ba0	eventscript: refactor forking code into fork_child_for_script() We do the same thing in two places: fire off a child from the initial ctdb_event_script_callback_v() and also from the ctdb_event_script_handler() when it's done. Unify this logic into fork_child_for_script(). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 814704a3286756d40c2a6c508c1c0b77fa711891)	2009-12-08 00:22:55 +10:30
Rusty Russell	dd53eee7a2	eventscript: fork() a child for each script. We rename child_run_scripts() to child_run_script(), because it now runs a single script rather than walking the list. When it's finished, we fork the next child from the ctdb_event_script_handler() callback. ctdb_control_event_script_init() and ctdb_control_event_script_finished() are now called directly by the parent process; the child still calls ctdb_ctrl_event_script_start() and ctdb_ctrl_event_script_stop() before and after the script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 0fafdcb8d3532a05846abaa5805b2e2f3cee8f47)	2009-12-08 00:21:25 +10:30
Rusty Russell	640b22ff61	eventscript: store from_user and script_list inside state structure This means all the state about running the scripts is in that structure, which helps in the next patch. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 020fd21e0905e7f11400f6537988645987f2bb32)	2009-12-08 00:15:18 +10:30
Rusty Russell	b8e347ec9c	eventscript: use direct script state pointer for current monitor We put a "scripts" member in ctdb_event_script_state, rather than using a special struct for monitor events. This will fit better as we further unify the different events, and holds the reports from the child process running each monitor script. Rather than making the monitor state a child of current_monitor_status_ctx, we just point current_monitor directly at it. This means we need to reset that pointer in the destructor for ctdb_event_script_state. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 9a2b4f6b17e54685f878d75bad27aa5090b4571f)	2009-12-08 00:14:01 +10:30
Rusty Russell	a4c2a98ba9	eventscript: make current_monitor_status_ctx serve as monitor_event_script_ctx We have monitor_event_script_ctx and other_event_script_ctx, and current_monitor_status_ctx in struct ctdb_context. This seems more complex than it needs to be. We use a single "event_script_ctx" as parent for all event script state structures. Then we explicitly reparent monitor events under current_monitor_status_ctx: this is freed every script invocation to kill off any running scripts anyway. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 0d925e6f2767691fa561f15bbb857a2aec531143)	2009-12-08 00:09:20 +10:30
Rusty Russell	68e224d9a4	eventscript: split ctdb_run_event_script into multiple parts Simple refactoring in preparation for switching to one-child-per-script. We also call the functions run by the child process "child_". Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit bfee777faff75e9bed4aedc1558957483616a6d3)	2009-12-07 23:55:03 +10:30
Rusty Russell	9a0c171fa7	eventscript: hoist work out of child process, into parent This is the start of a move towards finer-grained reporting, with one child per script. Simple code motion to do sanity check and get the list of scripts before fork(). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 816b9177f51ae5b21b92ff4a404f548fe9723c96)	2009-12-07 23:53:35 +10:30
Rusty Russell	9914d3f561	eventscript: don't make ourselves healthy if we're under ban_count If we've timed out, but we've not timed out more than ctdb->tunable.script_ban_count, we pretend we haven't. There's a logic bug in the way this is done: if we were unhealthy before, this would set us to "healthy" again (status == 0). I don't think this would happen in real life, but it's a little surprising. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit e6488c0e05bab5c4c2c0a6370930b0b27e5ed56e)	2009-12-07 23:52:01 +10:30
Rusty Russell	928b8dcb31	eventscript: handle banning within the callbacks Currently the timeout handler in eventscript.c does the banning if a timeout happens. However, because monitor events are different, it has to special case them. As we call the callback anyway in this case, we should make that handle -ETIME as it sees fit: for everyone but the monitor event, we simply ban ourselves. The more complicated monitor event banning logic is now in ctdb_monitor.c where it belongs. Note: I wrapped the other bans in "if (status == -ETIME)", though they should probably ban themselves on any error. This change should be a noop. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 9ecee127e19a9e7cae114a66f3514ee7a75276c5)	2009-12-07 23:48:57 +10:30
Rusty Russell	5190932507	eventscript: expose ctdb_ban_self() eventscript.c uses this now, but our next patch makes others use it. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit a305cb7743c24386e464f6b2efab7e2108bb1e7e)	2009-12-07 23:18:40 +10:30
Rusty Russell	0dd46797d6	eventscript: handle v. unlikely timeout race If we time out just as the child exits, we currently will report an uninitialized cb_status field. Set it to -ETIME as expected. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 024386931bda9757079f206238ae09bae4de6ea2)	2009-12-07 23:17:23 +10:30
Rusty Russell	d5d88ecaaf	eventscript: replace other -1 returns with -errno This completes our "problem with script" reporting; we never set cb_status to -1 on error. Real errnos are used where the failure is a system call (eg. read, setpgid), otherwise -EIO is used if we couldn't communicate with the parent. The latter case is a bit useless, since the parent probably won't see the error anyway, but it's neater. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 1269458547795c90d544371332ba1de68df29548)	2009-12-07 23:15:56 +10:30
Rusty Russell	672e06f438	eventscript: simplify ctdb_run_event_script loop If we break, we avoid cut & paste code inside the loop. Need to initialize ret to 0 for the "no scripts" case. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit ec36ced9446da7e3bf866466d265ee8e18f606c1)	2009-12-07 23:13:12 +10:30
Rusty Russell	c70afe0cd4	eventscript: handle and report generic stat/execution errors Rather than ignoring deleted event scripts (or pretending that they were "OK"), and discarding other stat errors, we save the errno and turn it into a negative status. This gives us a bit more information if we can't execute a script (eg. too many symlinks or other weird errors). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 5d894e1ae5228df6bbe4fc305ccba19803fa3798)	2009-12-07 23:12:19 +10:30
Rusty Russell	b9b75bd065	eventscript: use -ENOEXEC for disabled status value This unifies code paths and simplifies things: we just hand -ENOEXEC to ctdb_ctrl_event_script_stop(). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit eadf5e44ef97d7703a7d3bce0e7ea0f21cb11f14)	2009-12-07 23:11:47 +10:30
Rusty Russell	ce378014c7	eventscript: enhance script delete race check We currently assume 127 == script removed. The script can also return 127; best to re-check the execution status in this case (and for 126, which will happen if the script is non-executable). If the script is no longer executable/not present, we ignore it. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 0a53d6b5ac81daf0efa32f35e7758ede2a5bdb63)	2009-12-07 23:09:02 +10:30
Rusty Russell	8993d6f523	eventscript: check_executable() to centralize stat/perm checks This is used later in the "script vanished" check. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 8ddb97040842375daf378cbb5816d0c2b031fa65)	2009-12-07 23:09:39 +10:30
Rusty Russell	949803528d	talloc: save errno over talloc_free As we start to use errno more, it's a huge pain if talloc_free() can blatt it (esp. destructors). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 76a0ca77feba14e1e1162c195ffbdf516e62aa4d)	2009-12-07 23:05:58 +10:30
Rusty Russell	066a791770	eventscript: use -ETIME for timeout status value This starts the move toward more expressive encoding of return values: positive values mean the script ran, negative means we had a problem with the script (and the value is the errno). This does timeout, but changes the ctdb tool to recognize it. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 0eb1d0aa14e68b598d9e281c8a02b8f94a042fd9)	2009-12-07 23:09:42 +10:30
Rusty Russell	85a6f4a4dd	eventscript: marshall onto last_status immediately This simplifies the code a little: last_status is now ready to go (it's only used by the scriptstatus command at the moment). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 6be931266a4e41fd0253f760936ad9707dd97c47)	2009-12-07 23:09:40 +10:30
Ronnie Sahlberg	2c80c91c87	version 1.0.108 (This used to be ctdb commit fff280878e670e93a818c0071f3172056214e8c4)	2009-12-07 19:04:41 +11:00
Ronnie Sahlberg	cdabe16777	Use wbinfo --ping-dc instead of wbinfo -p since this is a more reliable way to determine if winbindd is in a useful state. (This used to be ctdb commit 7c95e56ba871a4e0cb893a5cb5d821e7ff6e6dd6)	2009-12-07 18:27:46 +11:00
Michael Adam	3420278b3a	packaging: package tests/bin/ctdb_transaction under /usr/share/doc/tests/bin For testing/diagnostic purposes. Michael (This used to be ctdb commit b796d736946856abfbe53de95dfcd73072ee8ccd)	2009-12-04 23:18:12 +01:00
Michael Adam	98c108fa33	client: improve two error messages in ctdb_transaction_commit(). Michael (This used to be ctdb commit d971b2ca84c0451dc7e5acbf4a5ade06270a2044)	2009-12-04 15:06:54 +01:00
Michael Adam	c1039fba0e	server:trans2_commit: move the check for active recovery down. This needs to be done after the control-dispatcher: In the TRANS2_COMMIT control, the client->db_id needs to be set before bailing out, since otherwise the next TRANS2_COMMIT_RETRY will fail... Michael (This used to be ctdb commit 59faf3f923a5989b5ee94ef02a12827412775bae)	2009-12-04 15:03:21 +01:00
Michael Adam	cc7438d87d	client: increase the number of commit retries 10-->100 To cope with timeouts when recoveries and transactions collide. Maybe 100 is too high. Michael (This used to be ctdb commit c23d804165e84bdf95ba960c953c736d361011d7)	2009-12-04 15:03:16 +01:00
Michael Adam	b3fd495522	client: untangle checks and produce more detailed error messages in ctdb_transaction_fetch_start Michael (This used to be ctdb commit 428914377851a98b3fc893798783fbfebffc1c0d)	2009-12-04 15:03:16 +01:00
Michael Adam	7afefed6ae	client: increase the rsn of the __transaction_lock__ when storing So that it is correctly handled by recoveries. Also explicitly set the dmaster field to the current node's pnn. Michael (This used to be ctdb commit 03a5bb727b9db1ba952632f08ceb5355f0df842d)	2009-12-04 15:02:41 +01:00
Michael Adam	ffe62722cb	recovery: add special pull-logic for persistent databases The decision mechanism for which records of a persistent db are to be pulled into the recdb during recovery is now as follows: * Usually a record with a higher rsn than that already stored is taken. (Just as for normal tdbs.) * If a transaction is running on some node, then that node's copies of all records are taken and are not overwritten later by other nodes' copies. In order to keep track of whether a record's copy was obtained from a node with a transaction running, the recovery mechanism misuses the ctdb tdb header field 'lacount' in the recdb. It is cleared later when pushing out the recdb database to the other nodes. This way, an incomplete transaction is not spoiled when a recovery interrupts and the replay should usually succeed (possibly after a few retries). Michael (This used to be ctdb commit 8aef46d2aab3efb322dda51eaa202653cefd5222)	2009-12-04 15:00:21 +01:00
Michael Adam	0635f8b98f	make ctdb_ctrl_transaction_active public. Michael (This used to be ctdb commit e5496a83ef4a01604195b27c4b97f50d4979510e)	2009-12-04 11:30:22 +01:00
Michael Adam	9a8134e862	recovery: for persistent db's don't set the dmaster to the recmaster node number It is important to keep track of the dmaster (i.e. the node that last committed a transaction containing changes to this node). Michael (This used to be ctdb commit fe68972eb9cf3aa1f16ba1aacf57ade5d66e647c)	2009-12-04 11:30:21 +01:00
Michael Adam	f96e8166de	recovery: pass the persistent flag to recover_database() and further down to pull_remote_database(), pull_one_remote_database(), and push_recdb_database(). This is in preparation of special handling of persistent databases during recoveries. Michael (This used to be ctdb commit 90abc4ac7c16e854cf6e8f96b60a77bc92e35e07)	2009-12-04 11:30:21 +01:00
Michael Adam	814e3c501f	tests:ctdb_transaction: print extra counters when a commit fails Michael (This used to be ctdb commit 4113385865f53a57b18ea752a7dad8a08bed588e)	2009-12-04 11:30:21 +01:00
Michael Adam	27dc0adfb5	client: in catdb, print the keyname first, and separate records by a blank line Michael (This used to be ctdb commit b9882710e12f28c96a0af298e419160f00578241)	2009-12-04 11:30:21 +01:00
Michael Adam	f09090f9ba	packaging: remove the lib/popt from the tarball in debian mode Debian CTDB packaging fails when this is included. Michael (This used to be ctdb commit 574702f8d701fe3e493b31948420b2981eb36f93)	2009-12-04 11:30:21 +01:00
Michael Adam	522c60182e	packaging: rework maketarball.sh to accept an arbitrary githash to pack The githash can be specified through the environment variable "GITHASH" that can contain a commit hash or a tag name, e.g. The call syntax is now [GITHASH=xyz] [USE_GITHASH=yes/no] [DEBIAN_MODE=yes/no] maketarball.sh Michael (This used to be ctdb commit 41aa9bdfa2934f564bdc14374362437dfad0045f)	2009-12-04 11:30:20 +01:00
Michael Adam	92c5d9eefc	ctdb: add command "ctdb wipedb" to wipe the contents of an attached tdb Michael (This used to be ctdb commit 5a7c1e7f15693522bbf1c39a53be2304ece9a134)	2009-12-04 11:30:20 +01:00
Michael Adam	0213cb4d0b	tests: turn printfs into DEBUG statements in the ctdb_transaction test Michael (This used to be ctdb commit 0e130d79ab71cf3aa65c40af91866823246a0283)	2009-12-04 11:30:20 +01:00
Martin Schwenke	7b6072b63d	Merge branch 'status-test-2' (This used to be ctdb commit 5fc297a6bd49d9366703eef3edb9bdf0fe8505cc)	2009-12-04 14:44:46 +11:00
Ronnie Sahlberg	e28c652cca	Dont store debug level DEBUG_DEBUG in the in-memory ringbuffer. It is unlikely we will need something this verbose for normal troubleshooting. This allows us to keep a significantly longer time interval of log messages in the 500k slots available in the ringbuffer. (This used to be ctdb commit cc99c05c0c6484ad574039a454e6133852cb41fa)	2009-12-04 11:45:37 +11:00
Ronnie Sahlberg	8f442f1c0c	Use statically allocated ringbuffer to store the last 500k log entries in memory instead of dynamically allocated ones so that we reduce the pressure on malloc/free. (This used to be ctdb commit c5cbb95512f034abeec515579983bf7ac55eadd9)	2009-12-04 11:36:27 +11:00
Ronnie Sahlberg	daae501d91	Document the procedure to remove/change the NATGW configuration at runtime without restarting the ctdb service (This used to be ctdb commit 0a0526e03ef995b6b6634f5b75c7a17cb7b5df8f)	2009-12-04 08:33:56 +11:00
Rusty Russell	774bf144c1	eventscript: reduce code duplication for ending a script, and fix bug Commit 50c2caed57c0 removed a gratuitous talloc_steal from the code in ctdb_control_event_script_finished(), but not ctdb_event_script_timeout(). Easiest to call ctdb_control_event_script_finished() at the bottom of the timeout routine. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 17fa252d0d6981fbae8083a818f26d5ce9c5102e)	2009-12-02 16:15:57 +10:30
Ronnie Sahlberg	e56c5b2a67	lower the loglevel for the message that a client has attached to a persistent database (This used to be ctdb commit 2027cf3881ba890648c543bacbfd5b06464efc10)	2009-12-02 14:53:21 +11:00
Ronnie Sahlberg	fab11acc65	lower the loglevel for the message that a client has attached through a domain socket (This used to be ctdb commit de9e5236b20d70eac5ed29991703d6d25a103963)	2009-12-02 14:51:57 +11:00
Ronnie Sahlberg	6bad4a4836	Add a proper function to process a process-exist control in the daemon. This control is only used by samba when samba wants to check if a subrecord held by a <node-id>:<smbd-pid> is still valid or if it can be reclaimed. If the node is banned or stopped, we kill the smbd process and return that the process does not exist to the caller. This allows us to recover subrecords from stopped/banned nodes where smbd is hung waiting for the databases to thaw. bz58185 (This used to be ctdb commit 157807af72ed4f7314afbc9c19756f9787b92c15)	2009-12-02 13:58:27 +11:00
Ronnie Sahlberg	1c7de7a2ed	Add a double linked list to the ctdb_context to store a mapping between client pids and client structures. Add the mapping to the list every time we accept() a new client connection and set it up to remove in the destructor when the client structure is freed. (This used to be ctdb commit f75d379377f5d4abbff2576ddc5d58d91dc53bf4)	2009-12-02 13:41:04 +11:00
Ronnie Sahlberg	bf27dc2d53	Use the PID we pick up from the domain socket when a client connects and store this in the client structure. There is no need to rely on the hack that samba sends some special message handle registrations that encodes the pid in the srvid any more. This might not work on AIX since I recall some issues to get the pid in this way on that platform. (This used to be ctdb commit b4a7efa7e53e060a91dea0e8e57b116e2aeacebf)	2009-12-02 13:17:12 +11:00
Ronnie Sahlberg	2b4fbe5c41	version 1.0.107 (This used to be ctdb commit 22f00368b4cb3a6bfb92033a7dbe693d31b41a54)	2009-12-02 11:28:42 +11:00
Rusty Russell	9e84872ecd	ctdb_io: fix use-after-free on invalid packets Wolfgang saw a talloc complaint about using freed memory in ctdb_tcp_read_cb. His fix was to remove the talloc_free() in that function, which causes loops when a socket is closed (as it does not get removed from the event system), eg: netcat 192.168.1.2 4379 < /dev/null The real bug is that when we have more than one pending packet in the queue, we loop calling the callback without any safeguards should that callback free the queue (as it tends to do on invalid packets). This can be reproduced by sending more than one bogus packet at once: # Length word at start: 4 == empty packet (assumed little endian) /usr/bin/printf \\4\\0\\0\\0\\4\\0\\0\\0 > /tmp/pkt netcat 192.168.1.2 4379 < /tmp/pkt Using a destructor we can check if the callback frees us, and exit immediately. Elsewhere, we return after the callback anyway. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 4d0523dd94fb07e860b3e8118691f93d1ef8d0fa)	2009-12-02 11:27:23 +11:00
Ronnie Sahlberg	6f045cad29	version 1.0.106 (This used to be ctdb commit b5a21fd39269a6e2a9d1c8182dd42a1773ccbb3f)	2009-12-02 11:26:51 +11:00
Martin Schwenke	b17bf38c64	Eventscripts: Fix syntax error in 00.ctdb. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 9ea261f791ab919eb1ce5b37073b4f1d30694bb8)	2009-12-01 18:08:57 +11:00
Michael Adam	016d092169	packaging:maketarball.sh: add a DEBIAN_MODE to the tarball creation It is triggered by setting DEBIAN_MODE=yes in the environment. This creates a tarball suitable for use in debian packages. The differences from the standard tarball are these: * The tar ball file is called ctdb_VERSION.orig.tar.gz * The base directory in the tar ball is ctdb-VERSION.orig/ Michael (This used to be ctdb commit 83e7c161efa93cd7acdfc803142b4fb3bfde7538)	2009-12-01 18:02:20 +11:00
Michael Adam	15bd5fb8e7	configure:maketarball.sh: call autogen.sh and include configure in the tarball Michael (This used to be ctdb commit bc8aee079e09164e06533a1474f5e9d899795933)	2009-12-01 18:02:05 +11:00
Michael Adam	7430da3839	packaging:maketarball.sh: create the specfile from the ctdb.spec.in Michael (This used to be ctdb commit bb8d02abd88899d259085b9b23fa52accb222be9)	2009-12-01 18:01:46 +11:00
Martin Schwenke	50a26cf75e	Eventscripts: Remove executable bit accidentally set on some scripts. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 4c6e68ae942c05224c5f8b683fbc2dc1adced8ee)	2009-12-01 17:54:45 +11:00
Martin Schwenke	db25ca69e5	Eventscript argument cleanups and introduction of ctdb_standard_event_handler. The functions file no longer causes a side-effect by doing a shift. It also doesn't set a convenience variable for $1. All eventscripts now explicitly use "$1" in their case statement, as does the initscript. The absence of a shift means that the takeip/releaseip events now explicitly reference $2-$4 rather than $1-$3. New function ctdb_standard_event_handler handles the status and setstatus events, and exits for either of those events. It is called via a default case in each eventscript, replacing an explicit status case where applicable. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 3d55408cbbb3bb71670b80f3dad5639ea0be5b5b)	2009-12-01 17:43:47 +11:00
Ronnie Sahlberg	2000711cb1	when we detect a ip-allocation mismatch, just force a new ip reassignment instead of a full blown recovery (This used to be ctdb commit 4f50aa8bb8be544058523f2f544109a26c2b3b51)	2009-12-01 16:06:59 +11:00
Ronnie Sahlberg	698a0e4e9a	When starting up ctdbd, wait until all initial recoveries have finished and until we have gone through a full re-recovery timeout without triggering any pending recoveries before we start up the services and start monitoring the node. (This used to be ctdb commit 821333afb458358f90446062b0242790695e5060)	2009-12-01 13:19:58 +11:00
Ronnie Sahlberg	569001afd0	Merge commit 'martins/status-test-2' Conflicts: server/eventscript.c (This used to be ctdb commit e9b3477a5b9a2eff18f727e7d59338bfb5214793)	2009-12-01 10:53:18 +11:00
Martin Schwenke	ad431c3520	Event scripts: functions file now intercepts status and setstatus. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit a1f37fdc5217e57d2d643d77a811afca747685e0)	2009-11-27 15:57:33 +11:00
Ronnie Sahlberg	3bc643b46b	remove a stray ) so we compile (This used to be ctdb commit 16db4882635d84b8410a77e2ea8b08d0a257b0ab)	2009-11-27 13:35:39 +11:00
Ronnie Sahlberg	266a163c89	dont use talloc_steal() on a object that is already a child of ctdb. (This used to be ctdb commit 50c2caed57c0520f506eaaeeb0bba2c272da6ef6)	2009-11-27 13:28:31 +11:00
Ronnie Sahlberg	eaa6218def	Merge commit 'martins/status-test' into status-test-2 (This used to be ctdb commit 937823cc73eb098230acff4b1583f6d01f26c21a)	2009-11-27 12:50:45 +11:00
Martin Schwenke	dc2c8dfde1	Merge commit 'martins-svart/status-test-2' into status-test (This used to be ctdb commit 0e6c06ac38fd82adf124d111717502055501974a)	2009-11-27 12:49:31 +11:00
Martin Schwenke	ce06d3de46	Event script infrastructure: add reload event to check_options(). Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit c278c798d41a35f58ca81f8f0e08e4dab51eba9b)	2009-11-27 12:04:02 +11:00
Ronnie Sahlberg	09b9bb2f9f	Merge commit 'martins/status-test' into status-test-2 (This used to be ctdb commit 28d0648725e7de4e4d0e8569e3fbfb0fa1d7f934)	2009-11-26 16:26:25 +11:00
Martin Schwenke	88cd194d6a	Merge commit 'martins-svart/status-test-2' into status-test (This used to be ctdb commit 143f1fa3cc4588505e3992c601153ea08be8432d)	2009-11-26 16:25:15 +11:00
Martin Schwenke	a64ccf07c1	Add flag to ctdb_event_script_callback indicating when called by client. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit a1d654a982ca56fade82552f4e6b5586236d3233)	2009-11-26 15:49:49 +11:00
Ronnie Sahlberg	ed4f3ea3cc	resolve some conflicts from merging from martins branch (This used to be ctdb commit d3e7407dc9854ec358d081777c5450ec68b17862)	2009-11-26 13:42:12 +11:00
Ronnie Sahlberg	e17fa0fdee	change the lock wait child handling to use a pipe instead of a socketpair. Remove a stray alarm(30) that caused databases to be unlocked after 30 seconds. (This used to be ctdb commit 12b187f971d857353403393a9850503e0e558672)	2009-11-26 12:08:35 +11:00
Martin Schwenke	8029db6a91	Merge commit 'martins-svart/status-test-2' into status-test Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit a2830594ebeb54eb51ff90999cb12370aeec6e8b)	2009-11-26 10:49:47 +11:00
Martin Schwenke	ece15620c0	Event scripts: use $script_name rather than $service name for status. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 517e9d9b188b18dffc712a8fecddb41540d27b8d)	2009-11-25 16:42:14 +11:00
Martin Schwenke	ee10ea202b	Event scripts: Respect CTDB_MANAGES_NFS and add function log_status_cat. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 5d97c07be13a8209a81dfc8f73e49371949e4dc3)	2009-11-25 16:34:49 +11:00
Martin Schwenke	1edcb89948	More eventscript cleanups. Initial smoke testing seems OK. Apart from lots of cleanup work, this also fixes a bug where the share checks didn't used to cope with directory names containing spaces. The previous commit also loaded the config incorrectly. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 3c93336ab92c2e4829ff4dc360045bfa6df21d50)	2009-11-25 16:30:47 +11:00
Ronnie Sahlberg	926261aafc	use a binary tree and sort all ipv4/v6 addresses before we assign them out on nodes. (This used to be ctdb commit 862526e558099fad4c8259cb88da9b776aa7f80d)	2009-11-25 11:54:40 +11:00
Rusty Russell	3188df4a88	eventscript: check that ctdb forced script events correct Now we're doing checking, we might as well make sure the commands from "ctdb eventscripts" are valid. This gets rid of the "UNKNOWN" event type. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 1d24a3869fe89fc9a109fd9e9b69df5fc665a5f6)	2009-11-25 11:02:29 +10:30
Ronnie Sahlberg	cd44c8b4e5	It is better to plainly disallow clients from connecting here if the node is BANNED. Don't even let them attach at all to the database. Revert "temporarily try allowing clients to attach to databases even if the node is banned/stopped or inactive in any other way." This reverts commit 227fe99f105bdc3a4f1000f238cbe3adeb3f22f0. (This used to be ctdb commit 10a3680fb3917ecafc824e73872eace321026172)	2009-11-25 08:03:42 +11:00
Martin Schwenke	1c7445d547	Merge commit 'origin/status-test' into status-test (This used to be ctdb commit 2e60749de3714239224cc04170a9aeeee158153f)	2009-11-24 16:14:54 +11:00
Rusty Russell	ff59bb34af	eventscript: check that ctdb forced script events correct Now we're doing checking, we might as well make sure the commands from "ctdb eventscripts" are valid. This gets rid of the "UNKNOWN" event type. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 66b22980b14601f29fe8cc64bd8f29883c7ca1c0)	2009-11-24 11:24:22 +10:30
Rusty Russell	0b4b83aea0	eventscript: check that internal script events are being invoked correctly This is not as good as a compile-time check, but at least we check that the number of arguments is correct. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 83b7b233cb4707e826f6ba260bd630c8bc8f1e76)	2009-11-24 11:23:13 +10:30
Rusty Russell	187efa08ab	eventscript: check that internal script events are being invoked correctly This is not as good as a compile-time check, but at least we check that the number of arguments is correct. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit a6d353519932eee48f9241ad8887b692882906c9)	2009-11-24 11:23:13 +10:30
Rusty Russell	534c709cba	eventscript: remove call name from state->options Finally, we remove the call name (eg. "monitor" or "start") from the options field of the struct: it now contains only extra options. This is clearer, and mainly involves adding some %s to debug statements. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 33fb0e7ba047ca73969b59bccf70a04a17c25a0a)	2009-11-24 11:22:46 +10:30
Rusty Russell	0ef91a4e1f	eventscript: remove call name from state->options Finally, we remove the call name (eg. "monitor" or "start") from the options field of the struct: it now contains only extra options. This is clearer, and mainly involves adding some %s to debug statements. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit b0648c7f08eba87ec3c9714e2525c9b621bfb4ef)	2009-11-24 11:22:46 +10:30
Rusty Russell	461f52736d	eventscript: put call type into state struct. This means we can get rid of more strcmp; they can simply use the state->call value instead. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 6c79fa33e26cc4f0873577f8e122b1495b4c427e)	2009-11-24 11:19:58 +10:30
Rusty Russell	205011cb61	eventscript: put call type into state struct. This means we can get rid of more strcmp; they can simply use the state->call value instead. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 834c93b3e1b8f4151b8a2cd82c2dd8bacc17f66c)	2009-11-24 11:19:58 +10:30
Rusty Russell	2d9254404d	eventscript: introduce enum for different event script calls. Rather than doing strcmp everywhere, pass an explicit enum around. This also subtly documents what options are available. The "options" arg is now used for extra arguments only. Unfortunately, gcc complains on empty format strings, so we make ctdb_event_script() take no varargs, and add ctdb_event_script_args(). We leave ctdb_event_script_callback() taking varargs, which means callers have to do "%s", "". For the moment, we have CTDB_EVENT_UNKNOWN for handling forced scripts from the ctdb tool. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 8001488be4f2beb25e943fe01b2afc2e8779930d)	2009-11-24 11:16:49 +10:30
Rusty Russell	e0c6e2f489	eventscript: introduce enum for different event script calls. Rather than doing strcmp everywhere, pass an explicit enum around. This also subtly documents what options are available. The "options" arg is now used for extra arguments only. Unfortunately, gcc complains on empty format strings, so we make ctdb_event_script() take no varargs, and add ctdb_event_script_args(). We leave ctdb_event_script_callback() taking varargs, which means callers have to do "%s", "". For the moment, we have CTDB_EVENT_UNKNOWN for handling forced scripts from the ctdb tool. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 470822b329f9d3ca9bef518b56e9ce28d5fedda2)	2009-11-24 11:16:49 +10:30
Rusty Russell	2763df22de	eventscript: put timeout inside ctdb_event_script_callback_v Everyone uses the same timeout value, so just remove it from the API. If we ever need variable timeouts, that might as well be central too. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 533c3e053293941d2a9484b495e78d45f478bb08)	2009-11-24 11:09:46 +10:30
Rusty Russell	5dee5769d3	eventscript: put timeout inside ctdb_event_script_callback_v Everyone uses the same timeout value, so just remove it from the API. If we ever need variable timeouts, that might as well be central too. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit fe8027309c1f7b987cd368fa98f9b28741baa786)	2009-11-24 11:09:46 +10:30
Rusty Russell	3845c6e5b8	eventscript: cleanup ctdb_event_script_v ctdb_event_script_v doesn't take varargs. ctdb_run_event_script is a better name, and fix comment. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 466beafadb37011fe273de8810ab0012e92a1fd8)	2009-11-24 11:09:01 +10:30
Rusty Russell	1d68bb35b2	eventscript: typo cleanups 1) ctdb_event_script_v doesn't take varargs. ctdb_run_event_script is a better name, and fix comment. 2) Fix indentation on allowed_scripts. 3) Comment on run_eventscripts_callback is wrong; it's the callback for any ctdb forced event. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit e7d57d7ae678b24dab3364a348838c6a3398942c)	2009-11-24 11:08:39 +10:30
Rusty Russell	ab675516cc	eventscript: fix bug in timeouts on forced eventscripts. Again. In 15bc66ae801b0c69, Ronnie fixed a double-free race. The problem was that ctdb_run_eventscripts() hands a context to ctdb_event_script_callback() to hang its data off, which gets freed in the callback. This particularly hurt in ctdb_event_script_timeout. There's nothing wrong with this, but obviously we should make the callback call last of all. At the time, ctdb_event_script_timeout() carefully extracted everything from the struct ctdb_event_script_state before calling ->callback. This was cleaned up in 64da4402c6ad485f (Ronnie again), and now state was referred to after the callback again. But the same change introduced a direct use-after-free bug which caused an occasional oops. So in our last episode (eda052101728cf92) Volker fixed this, and Michael committed it. But we still have the double free bug which 15bc66ae801b0c69 was supposed to fix! Let's try to fix this in a more permanent way, but always doing the callback from the destructor. This means we need to hold the status, and don't send the KILL signal if ->child is set to 0. Finally, add a comment about freeing ourselves in run_eventscripts_callback and the structure definition. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit b90bdb07c1f6913ddbf11bde9684bdc8af61c549)	2009-11-24 11:06:53 +10:30
Rusty Russell	0339a83897	eventscript: fix bug in timeouts on forced eventscripts. Again. In 15bc66ae801b0c69, Ronnie fixed a double-free race. The problem was that ctdb_run_eventscripts() hands a context to ctdb_event_script_callback() to hang its data off, which gets freed in the callback. This particularly hurt in ctdb_event_script_timeout. There's nothing wrong with this, but obviously we should make the callback call last of all. At the time, ctdb_event_script_timeout() carefully extracted everything from the struct ctdb_event_script_state before calling ->callback. This was cleaned up in 64da4402c6ad485f (Ronnie again), and now state was referred to after the callback again. But the same change introduced a direct use-after-free bug which caused an occasional oops. So in our last episode (eda052101728cf92) Volker fixed this, and Michael committed it. But we still have the double free bug which 15bc66ae801b0c69 was supposed to fix! Let's try to fix this in a more permanent way, but always doing the callback from the destructor. This means we need to hold the status, and don't send the KILL signal if ->child is set to 0. Finally, add a comment about freeing ourselves in run_eventscripts_callback and the structure definition. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 20b15de068d042b292725945927ceda1b01d07c0)	2009-11-24 11:06:53 +10:30
Rusty Russell	8723045c61	eventscript: clean up forked handler event code Write the whole int through the pipe, rather than quietly cutting it off. Also, use -2 as the result if the read fails; -1 comes from many paths if the child fails before running the script. Add a comment about why we don't need to check the write. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 6804f880436645b52c09a78fa300377fa8058d0e)	2009-11-24 11:00:13 +10:30
Ronnie Sahlberg	e6b69fa760	rework and simplify the eventscript handling This version has no trailing whitespace, and fixed (This used to be ctdb commit defbe318152fc479e8076ad70433cdb4971951af)	2009-11-25 11:00:11 +10:30
Rusty Russell	b320d434b2	eventscript: clean up forked handler event code Write the whole int through the pipe, rather than quietly cutting it off. Also, use -2 as the result if the read fails; -1 comes from many paths if the child fails before running the script. Add a comment about why we don't need to check the write. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit c715746c2f40eb9b21dbf011d16f1f1b0b53fdf9)	2009-11-24 11:00:13 +10:30
Ronnie Sahlberg	a3d072049e	reduce the log level for three vacuuming related log messages (This used to be ctdb commit fbc453733d53359b9eba34a7ca9123237a7ecca5)	2009-11-24 09:27:22 +11:00
Ronnie Sahlberg	eb3b787394	rework and simplify the eventscript handling (This used to be ctdb commit c5f798116bf3b7954e23c7267b056ee1f5560f45)	2009-11-24 07:40:51 +11:00
Martin Schwenke	d595f41f38	More eventscript cleanups. Initial smoke testing seems OK. Apart from lots of cleanup work, this also fixes a bug where the share checks didn't used to cope with directory names containing spaces. The previous commit also loaded the config incorrectly. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 35a60a63a9b5c7d98dde514ae552239506b691c9)	2009-11-20 16:45:36 +11:00
Martin Schwenke	a4a048b5cd	Now vaguely tested initscript updates. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit f1e350f9edb74cc44b6c5be4c062fd93e98ba8c4)	2009-11-19 16:48:19 +11:00
Martin Schwenke	ee513c1ba2	More untested eventscript factorisation. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit ac655b0a65b32d809d47fec9821f7f31bb2fe2a7)	2009-11-19 15:00:17 +11:00
Martin Schwenke	4ea6069de4	Test suite: Make the CIFS tickle test wait until it sees the required tickle. The test depended on the exit code of "ctdb gettickles", which always succeeds. This change wraps the command in a function that checks whether the tickle we're interested in is registered. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit c4b05a731e1bee8f5b46529773a4f5389b2b6064)	2009-11-19 14:54:05 +11:00
Ronnie Sahlberg	894a2f9c0b	new version 1.0.105 (This used to be ctdb commit 5fdf842db09cd806248cdbdce2270f39ed213872)	2009-11-19 11:08:14 +11:00
Ronnie Sahlberg	ae209c74c8	dont reset the event script context every time we start a new "ctdb eventscript ..." command. Use the existing context used for non-monitor events. Multiple concurrent uses of "ctdb eventscript ..." could otherwise lead to a SEGV (This used to be ctdb commit 80a8d728e9680040e00d24361dfc9367dd372a56)	2009-11-19 11:03:51 +11:00
Ronnie Sahlberg	cc2d81a77c	make the ringbuffer logging more efficient and marshall the data by writing to a tmpfile instead of continuously talloc resizing a blob (This used to be ctdb commit 6427f0b68d60b556a023f64e15e156000ba6f943)	2009-11-18 19:10:50 +11:00
Ronnie Sahlberg	bc2675119d	add an in memory ringbuffer where we store the last 500000 log entries regardless of log level. add command to extract this in memory buffer and to clear it (This used to be ctdb commit 29d2ee8d9c6c6f36b2334480f646d6db209f370e)	2009-11-18 12:44:18 +11:00
Ronnie Sahlberg	24c593d21f	create a new event context for the syslog daemon (This used to be ctdb commit 354c0edacf2d6cec5b295e139d4fec618bad1b06)	2009-11-17 12:07:10 +11:00
Ronnie Sahlberg	61de178e0a	set up a pipe between the main daemon and the child we use for syslogging so that we can clean up the child process when we stop ctdbd (This used to be ctdb commit cb8df973ccd446d87fbdd9a27843e54841ba5d89)	2009-11-16 15:17:32 +11:00
Martin Schwenke	73cb65bf1a	Eventscripts: Untested factorisations and introduction of status event. This is the first stage of an experimental change to eventscripts. Ronnie and I did a few hours of factorisation of 40.vsftpd and applied many of the changes to 41.httpd. Other eventscripts were also modified. At this stage this is completely untested. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 364e70b763f0ccd7714d15723ad3ea4d7e2968a1)	2009-11-13 18:28:25 +11:00
Ronnie Sahlberg	93d902e8f7	test of a change to make ctdbd use "status" event instead of the "monitor" event. This allows running the actual monitoring asynchronously from ctdbd and only using "status" to pick up the actual results. (This used to be ctdb commit 1908bac812650ca25151051f5d86815e0b8ed319)	2009-11-13 12:37:55 +11:00
Ronnie Sahlberg	2861bbdd5a	Merge commit 'martins/master' (This used to be ctdb commit b6bde176af69354ccfb00e6a3169f6b355a59d15)	2009-11-13 12:25:31 +11:00
Martin Schwenke	386d23757b	Test suite: Fix the NFS and CIFS tickle tests. The NFS test sleeps for MonitorInterval to give CTDB time to record an NFS tickle. However, this isn't always long enough. This changes the test to wait until a monitor event has actually occurred. The CIFS test assumes that Samba is able to register a tickle with CTDB before it notices that netstat has registered the tickle and can use onnode to ask CTDB about it. That is an incorrect assumption - sometimes we can get to the point of asking CTDB about the tickle before Samba and CTDB have processed it. This adds a timeout loop that makes the CIFS test wait until the tickle has been registered or fail after 10 seconds. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 20a9d35933d89dc7eb710075f360686a49d78609)	2009-11-13 09:44:34 +11:00
Martin Schwenke	9dabb86f3f	Merge commit 'origin/master' (This used to be ctdb commit ffb911896704ddf6bd5a66e43ba2ae8c382e68de)	2009-11-11 12:16:30 +11:00
Mathieu Parent	2a66b7dae4	Fix bashism in events.d/11.natgw Signed-off-by: Michael Adam <obnox@samba.org> (This used to be ctdb commit 6ccb495d1110157c06596763c7e252f3182c251e)	2009-11-10 12:07:30 +01:00
Ronnie Sahlberg	14a6592511	version 1.0.104 (This used to be ctdb commit 5e13a25df5ccf184bd48595c99765a592bbc5969)	2009-11-06 11:16:05 +11:00
Ronnie Sahlberg	3cbaf935af	suggestion from metze, use killtcp and kill both directions of the nfs connections. we used to kill only one direction since the other direction was unkillable but recent kernels allow us to kill both (This used to be ctdb commit 8001ae580bcc28d45f6026b529d7ffc247cbba34)	2009-11-06 09:54:03 +11:00
Ronnie Sahlberg	f88fbb5f1e	suggestion from Christian, dont allow UNHEALTHY nodes to become natgw master, unless all nodes are unhealthy (This used to be ctdb commit e8e7129ff1371065fbd75e1aea844d6d04a96fa9)	2009-11-06 08:19:32 +11:00
Volker Lendecke	1fa1830f81	Fix a segfault in the eventscript timeout handler. The state was freed too early. Signed-off-by: Michael Adam <obnox@samba.org> (This used to be ctdb commit eda052101728cf922ce892e3c53b4f37e7ceac42)	2009-11-05 11:13:53 +01:00
Michael Adam	85a4d9a943	ctdb.sysconfig: add a comment section about CTDB_RUN_TIMEOUT_MONITOR Michael (This used to be ctdb commit b7dc1e0720991cc65353e07cf87608acea21ba27)	2009-11-05 11:13:53 +01:00
Michael Adam	95333e0ee7	Add a 99.timeout event script to trigger monitor timeouts. This just sleeps for twice the value of EventScriptTimeout in the monitor action. It is not run by default, but can be activated by setting CTDB_RUN_TIMEOUT_MONITOR in /etc/sysconfig/ctdb . Michael (This used to be ctdb commit 1a3ecdee85b82bb3234a92ae6bcdeb92238eb7ee)	2009-11-05 11:13:47 +01:00
Ronnie Sahlberg	d8f7fd88ac	dont use the pointer after it has been talloc_free()d. (This used to be ctdb commit 1cbf06a126621b3e932925cdad2ef9c009f93d4e)	2009-11-05 16:07:23 +11:00
Ronnie Sahlberg	0d3bff5fa6	From Rusty It's much nicer for post-mortem debugging to have a body to examine. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 058e21d96c3c02759833fd5ddfe7b43e6a5f5740)	2009-11-05 15:57:46 +11:00
Ronnie Sahlberg	c915f2e5d5	add an extra test for the bond devices and check that there is an active slave. this to handle the case where all links do have a physical layer, but where all slaves have been disabled using ifdown (This used to be ctdb commit bf50709630df000583f2b0ef0edc177c01d60eaf)	2009-11-05 12:12:06 +11:00
Ronnie Sahlberg	2501638e15	dont verify winbindd is running properly at startup (This used to be ctdb commit 9e1b99221c8f257129641f6eda2795537b7ce9de)	2009-11-04 07:50:26 +11:00
Ronnie Sahlberg	666d1d019b	new version 1.0.103 (This used to be ctdb commit 020e2e30e56b9675f345ee62d6bf585396208059)	2009-11-03 11:46:37 +11:00
Ronnie Sahlberg	4bf4e15379	move the check to skip vacuuming on persistent database to the ctdb_vacuuming_init() function (This used to be ctdb commit fb83dba255fc91413a475b273e374e0c4d538137)	2009-11-03 10:48:27 +11:00
Michael Adam	e38dda00e7	packaging: use githash in rpm release by default. setting USE_GITHASH=no in the environment makes makerpms.sh omit the git hash Michael (This used to be ctdb commit 209ff041596e39688186c99995863ed3e816b8e4)	2009-11-03 00:16:28 +01:00
Michael Adam	fe9929165f	server: disable vacuuming for persistent tdbs. The vacuum process treats persistent databases the same as non-persistent and thus ignores the extra state for transactions. This way, it breaks the api-level transactions. Michael (This used to be ctdb commit f98fefbc566eefbfcc660646af6e25256ab82b13)	2009-11-03 00:16:28 +01:00
Michael Adam	c532347a45	client: randomize the transaction_start retry loop: instead of sleeping 1 second, sleep between 1 and 100 milliseconds Michael (This used to be ctdb commit a5d90d8ed8b44355c4ffb9c32ded772025fcc174)	2009-10-30 22:02:21 +11:00
Michael Adam	de875c7eec	Revert "dont exit on a commit failure" This reverts commit 4e9a3a5dc232bac12ab387ea0cf4f1b279bed5c1. Transaction commit should not be allowed to fail. This is a real error. Michael (This used to be ctdb commit 825c506da76d7afd0714b75b8c8727874183a618)	2009-10-30 22:01:53 +11:00
Michael Adam	118185670d	client: fix a race in the local race condition fix in transaction_start The gap that remained is between checking whether a transaction commit is in progress and taking the lock. Now we first take the lock and then check whether a transaction commit is in progress. If so, we release the lock, wait for one second and retry. Michael (This used to be ctdb commit b95524c08bf12914120cb6c818ecc1c99738fe37)	2009-10-30 22:01:16 +11:00
Michael Adam	c2855a11a8	client: add a debug message when a transaction_commit needs to be retried Michael (This used to be ctdb commit 9e4902c7d3ad1329c296f4196fcb1396f2a7a6a0)	2009-10-30 22:00:42 +11:00
Michael Adam	5fa3a2c96a	packaging(RPM): don't touch the run levels in ctdb install/update. We should really leave it up to the administrator to decide whether ctdb should be started automatically at boot-time. Michael (This used to be ctdb commit c1d8496f9fd5e8046f3d990264258dfb054f3b32)	2009-10-30 21:42:34 +11:00
Ronnie Sahlberg	e33722a569	start the syslog child a little later, after we have forked and detached from the local shell (This used to be ctdb commit 9ffd54b73c0d64b67e8e736d7cb54490e77ffa78)	2009-10-30 19:39:11 +11:00
Ronnie Sahlberg	5d73f19418	create a child process to write to syslog. use a udp socket on the ctdbd port to send messages to the syslog child process for logging. we need this when syslog becomes "slow", like very slow, and on boxes where syslog is limited to 100 lines per second and starts to block after that (This used to be ctdb commit 1446f4c247310e2ff2d522055bd8927d1a78d017)	2009-10-30 18:53:17 +11:00
Michael Adam	673a8588b1	server: fix debug message in trans2_commit (refusing persistent store during transaction) log the right db_id also log the client_id Michael (This used to be ctdb commit 48ac5c77698ab7a28d24629cc8a6985011c5d14d)	2009-10-30 09:29:25 +11:00
Michael Adam	45c17515c3	client: log db_id as 8-digit hex in ctdb_transaction_fetch_start() Michael (This used to be ctdb commit d7b9babda2f7c7f7b95ee19ec75c37200816c6ef)	2009-10-30 09:28:49 +11:00
Michael Adam	1de0c6f807	server: uniformly log db and client ids as 8-digit hex numbers in trans2_commit Michael (This used to be ctdb commit 2febdd23f754a2d4699bed36b941442ab362a376)	2009-10-30 09:28:06 +11:00
Michael Adam	7384dfe4a9	server: line-wrap a debug statement in trans2_commit Michael (This used to be ctdb commit 3be446434adb0f3095ac0ef4b7c4a6258780b863)	2009-10-30 09:27:33 +11:00
Michael Adam	7bfa959a86	server: output client_id in some debug messages in trans2_commit Michael (This used to be ctdb commit 11fefd02e6c9531ffb28b9e6acaf42ba39757d87)	2009-10-30 09:26:51 +11:00
Michael Adam	4d073bd779	server: fix a debug message in trans2_commit - log the correct db_id Michael (This used to be ctdb commit ab9657b5a66d5665e6c5fd1bf8eb4074a3bffeec)	2009-10-30 09:26:16 +11:00
Michael Adam	dca16d5f64	server: extend a debug message in ctdb_control_trans2_error() Michael (This used to be ctdb commit 0fb9573d1c838b436ab9be83e197b68f35f94acb)	2009-10-30 09:24:17 +11:00
Michael Adam	2187e6c379	server: add positive debug statements to trans2_commit and trans2_finished When the operation completed / started successfully. Michael (This used to be ctdb commit 0df012d58eb83195ea0365be19e0566dbc394a66)	2009-10-30 09:23:29 +11:00
Michael Adam	361aec199e	client: improve "control timed out" debug message * add __location__ * wrap overly long line * print unsigned ints as unsigned (reqid, opcode, destnode) Michael (This used to be ctdb commit 6b47ea111867c845974aa2687a658ebca2854816)	2009-10-30 09:22:52 +11:00
Michael Adam	0113744fec	server: trans2_active: don't report a transaction active on the node that performs the transaction Otherwise a node can lock itself out, e.g. when a commit control times out... Michael (This used to be ctdb commit cb432e30351d5e5a41e98da3c7b1c2a4d400a3a2)	2009-10-30 09:22:18 +11:00
Ronnie Sahlberg	784a89ec62	new version 1.0.102 (This used to be ctdb commit 4892222ffb255dccd8ced1cb047f199386bb3e98)	2009-10-29 13:49:27 +11:00
Wolfgang Mueller-Friedt	9713b8ea9a	ensure tdb names end with .tdb. and any number of digits (This used to be ctdb commit 8ab1349feb64a91cb500c130ea299e2182491f06)	2009-10-29 13:46:37 +11:00
Wolfgang Mueller-Friedt	2c137b7030	vacuuming needed additional check before getting rid of the record; there is a gap between selecting the records and deleting them, therefore we have to check if the records still can be deleted when we actually are about to delete them (This used to be ctdb commit a6fbc65aca35c41c428a82d7402e43c6eaac1d6e)	2009-10-29 13:45:17 +11:00
Ronnie Sahlberg	f5e90ec3b5	Revert "From Wolfgang M." This reverts commit 5b70fa8cfd5916d3c212823ad5cc1b251ae175ed. (This used to be ctdb commit 363e7e939ad46b3f75c83c30d4163d63876c2456)	2009-10-29 13:44:12 +11:00
Ronnie Sahlberg	9e235af3a2	make the error logged when winbindd fails to access the dc during startup more scary and easier to spot in the logs (This used to be ctdb commit 0c9b0466fd87b3f1e5d53f867c863217802ac43b)	2009-10-29 11:54:24 +11:00
Ronnie Sahlberg	fcd2ebc32b	update the uptime command to indicate that time since last is either from last recovery or from last failover (This used to be ctdb commit 467da12a785ba3367ed9cbdf79440394e9703289)	2009-10-29 10:58:14 +11:00
Ronnie Sahlberg	023d09cd38	Revert "update the "uptime" command to indicate the "time since last" is the time since the last recovery OR failover." This reverts commit 3b0d44497800a16400d05a30bdaf6e6c285d4b36. (This used to be ctdb commit cb36bbb5418290e8e5b770d2d836285b15da2a6f)	2009-10-29 10:49:00 +11:00
Ronnie Sahlberg	a4b8a17b26	update the manpage for "update" to indicate the "time since last" indicates the time since the last recovery OR failover (This used to be ctdb commit 22712c577f64ec84851b4addcf4a46c7e99e0662)	2009-10-29 10:32:28 +11:00
Ronnie Sahlberg	279b7ca564	update the "uptime" command to indicate the "time since last" is the time since the last recovery OR failover. (This used to be ctdb commit 3b0d44497800a16400d05a30bdaf6e6c285d4b36)	2009-10-29 10:37:10 +11:00
Michael Adam	2419eab0d9	ctdb_client: reformat a comment slightly to enhance clearness. Michael (This used to be ctdb commit 9560f8b7fe0f7ee0386a87c2653333071050fe4b)	2009-10-29 10:15:54 +11:00
Michael Adam	5d579cf665	client: fix race condition with concurrent transactions on the same node. In ctdb_transaction_commit(), when the trans2_commit control fails, there is a race condition in the 1 second sleep between the local transaction_cancel and the call to ctdb_replay_transaction(): The database is not locked, and neither is the transaction_lock record. So another client can start and possibly complete a new transaction in this gap, but only on the same node: The locking of the transaction_lock record on a different node which involves migration of the record to the other node has been disabled by introduction of the transaction_active flag on the db which closes precisely this gap from the start of the commit until the call to TRANS2_FINISH or TRANS2_ERROR. But this mechanism does not cover the case where a process on the same node tries to start a transaction: There is no obstacle to locking the transaction_lock record because the record does not need to be migrated. This commit closes this race condition in ctdb_transaction_fetch_start() by using the new ctdb_ctrl_transaction_active() call to ask the local ctdb daemon whether it has a transaction running on the database. If so, the check is repeated until the running transaction is done. This does introduce an additional call to the local ctdbd when starting transactions, but it does close the (hopefully) last race condition. Michael (This used to be ctdb commit 02ee9dfd3c6b09f5c5172a7e38738c20b7f0aecd)	2009-10-29 10:15:21 +11:00
Michael Adam	953ccee5c5	client: add ctdb_ctrl_transaction_active() which calls out to CTDB_TRANS2_ACTIVE Michael (This used to be ctdb commit 813cfd7c625ac8af4ef169cc92fb6d69f66004c9)	2009-10-29 10:15:00 +11:00
Michael Adam	abac42ca34	server: add a new ctdb control CTDB_TRANS2_ACTIVE This asks the daemon whether a transaction is currently active on a given DB on that node. More precisely this asks for the transaction_active flag in the ctdb_db_context that is set in the CTDB_TRANS2_COMMIT control and cleared in the CTDB_TRANS2_ERROR or CTDB_TRANS2_FINISHED controls. This will be useful for fixing race conditions in the transaction code. Michael (This used to be ctdb commit 8d430ae6968dfe566614379436fc3c56003fcd88)	2009-10-29 10:14:30 +11:00
Ronnie Sahlberg	019f3c930e	version 1.0.101 (This used to be ctdb commit 47b67077bdfa64938bb0fa6d1ca8f56fbd5c960e)	2009-10-28 17:42:01 +11:00
Ronnie Sahlberg	d379b30182	create a separate context for non-monitor eventscripts so they dont collide (This used to be ctdb commit 325de818f88f339a16dc4544e899a2d735933c44)	2009-10-28 17:35:15 +11:00
Ronnie Sahlberg	f8a8c0d6e4	return 0 in the event script callback if it was aborted by a different script (This used to be ctdb commit 8d5cb2586a1d5a0255cc18295430927b914d4527)	2009-10-28 16:40:31 +11:00
Ronnie Sahlberg	d82fdcb56f	new version 1.0.100 (This used to be ctdb commit fa34e8a5d588026029dca949151697817fe7f127)	2009-10-28 16:18:28 +11:00
Ronnie Sahlberg	e07ca41886	change the eventscript handling to allow EventScriptTimeout for each individual script instead of for the entire set of scripts restructure the talloc hierarchy to allow this (This used to be ctdb commit 64da4402c6ad485f1d0a604878a7b0c01a0ea5f0)	2009-10-28 16:11:54 +11:00
Martin Schwenke	8767c894a0	Test suite: Regression fix - wait_until should not run command in sub-shell. Commit 25e82a8a667a54c6921ef076c63fdd738dd75d19 changed wait_until() to protect the command it runs from "set -e" by running it in a subshell. This breaks uses where the command is expected to set global variables. For example, wait_until_get_src_socket lost the value of $out from its call to get_src_socket(). The fix is to not be lazy and use a sub-shell! Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 39642e745254d93d74dde907787503854fe6ca4a)	2009-10-28 13:02:18 +11:00
Ronnie Sahlberg	3526bc830d	Enhance the logging from eventscripts. When a single script is finished, also log the name of the script, the duration it took and the return status. In the loop where we signal back to the main daemon that the script finished, do this once every 100ms instead of once every 1 second (This used to be ctdb commit 6a1f7a7b1b3a0b8f89998db8fdad83bbb4e9b5a5)	2009-10-28 09:07:43 +11:00
Ronnie Sahlberg	0588b5f9c5	add a check that winbind can actually talk to the dc during the startup event and refuse to start up if it can not (This used to be ctdb commit 4037b6e73a819a8e2463dfe0959b42875e05e106)	2009-10-27 15:45:03 +11:00
Ronnie Sahlberg	d1bf89a617	temporarily try allowing clients to attach to databases even if the node is banned/stopped or inactive in any other way. (This used to be ctdb commit 227fe99f105bdc3a4f1000f238cbe3adeb3f22f0)	2009-10-27 15:17:45 +11:00
Ronnie Sahlberg	1d7681709b	dont run the monitor event so frequently after a event has failed. use _exit() instead of exit() when terminating an eventscript. (This used to be ctdb commit cc30ee2f4f33cb75b2be980c2d4dff6c7c23852f)	2009-10-27 13:51:45 +11:00
Ronnie Sahlberg	4d40b86805	for debugging add a global variable holding the pid of the main daemon. change the tracking of time() in the event loop to only check/warn when called from the main daemon (This used to be ctdb commit a10fc51f4c30e85ada6d4b7347b0f9a8ebc76637)	2009-10-27 13:18:52 +11:00
Stefan Metzmacher	3d713d9e53	ctdb_diagnostics: don't use hardcoded path to iptables All event scripts use only the relative path, so we should here. Also PATH includes /sbin and /usr/sbin... metze (This used to be ctdb commit 20678e1506db1f96b58c326ee91339e797c07c22)	2009-10-26 14:23:09 +11:00
Stefan Metzmacher	1c6829f3c2	ctdb_client: fix DEBUG statement in ctdb_ctrl_modflags() metze (This used to be ctdb commit a244b75ee49556b0ff51e254cc812594ee3b23a7)	2009-10-26 14:22:07 +11:00
Stefan Metzmacher	198866d82d	server: if takeover runs when the recovery master becomes unhealthy The problem was this: When the monitor event fails, the node->flags get updated, and an update (containing the old and new flags) is sent to the recovery master. If the recovery master sends the update to itself (the same process), it was comparing the node->flags variable with the received new flags. This check always found both flag values to be equal and never set the rec->need_takeover_run variable to true. There were two problems: first, the push_flags_handler() function didn't pass the received old flags. Second, the ctdb_control_modflags() function ignored the received old flags. metze (This used to be ctdb commit 8ec633b64a05a2d903c2b9639909f15f6375548f)	2009-10-26 14:21:45 +11:00
Stefan Metzmacher	7a616a0d7b	server: print out the full 64-bit srvid on 32-bit hosts metze (This used to be ctdb commit 440e870d61267054b24404bcb69e599226353949)	2009-10-26 14:20:52 +11:00
Stefan Metzmacher	ee97e2676d	tcp: don't log an error when we successfully bind to the desired address metze (This used to be ctdb commit 752a9c81de97be509de7e7feddde749cc5ee22a8)	2009-10-26 14:20:23 +11:00
Ronnie Sahlberg	299b027b8c	patch the event loop so we read the current time every iteration. log an error if the clock jumps backwards also log an error if the clock jumps >5 seconds forward (we assume here we will get at least one event every 5 seconds) (This used to be ctdb commit 11193e1e192bee6f579bdf1303153571a82711d7)	2009-10-26 13:20:35 +11:00
Ronnie Sahlberg	8aacfa348d	Suggestion from Volker, make ctdb_queue_length() cheaper by using a counter variable instead of counting the number of packets each time. (This used to be ctdb commit 331c6e3afd96d8b5e191153a631efdbdabb6ea33)	2009-10-26 12:20:52 +11:00
Ronnie Sahlberg	c36fa583f3	disable the multipath eventscript by default (This used to be ctdb commit e79c3bcead7bd4bfb74d0aec81908da71551c107)	2009-10-26 10:22:00 +11:00
Ronnie Sahlberg	9db2a5ca05	update the manpage for ctdb setreclock (This used to be ctdb commit ab4a6a58fb002ec29c19d167800e47987b023fe4)	2009-10-26 10:11:00 +11:00
Ronnie Sahlberg	2d06e9d252	automatically re-activate the reclock file check if we set the reclock file to something (This used to be ctdb commit db250cad7c92c1cc0a690725a4e39531a2e1b7fd)	2009-10-26 10:13:20 +11:00
Ronnie Sahlberg	5aaa15fdb2	lower the log level of a debug message (This used to be ctdb commit 496dc2e80b714811c6e69dc928deaad61cf603b1)	2009-10-26 09:35:18 +11:00
Ronnie Sahlberg	86d1b4c465	Add a mechanism where we can register notifications to be sent out to a SRVID when the client disconnects. The way to use this is from a client to : 1, first create a message handle and bind it to a SRVID A special prefix for the srvid space has been set aside for samba : Only samba is allowed to use srvid's with the top 32 bits set like this. The lower 32 bits are for samba to use internally. 2, register a "notification" using the new control : CTDB_CONTROL_REGISTER_NOTIFY = 114, This control takes as indata a structure like this : struct ctdb_client_notify_register { uint64_t srvid; uint32_t len; uint8_t notify_data[1]; }; srvid is the srvid used in the space set aside above. len and notify_data is an arbitrary blob. When notifications are later sent out to all clients, this is the payload of that notification message. If a client has registered with control 114 and then disconnects from ctdbd, ctdbd will broadcast a message to that srvid to all nodes/listeners in the cluster. A client can register itself with as many different srvid's as it wants, but this is handled through a linked list from the client structure so it is mainly designed for "few notifications per client". 3, a client that no longer wants to have a notification set up can deregister using control CTDB_CONTROL_DEREGISTER_NOTIFY = 115, which takes this as arguments : struct ctdb_client_notify_deregister { uint64_t srvid; }; When a client deregisters, there will no longer be sent a message to all other clients when this client disconnects from ctdbd. (This used to be ctdb commit f1b6ee4a55cdca60f93d992f0431d91bf301af2c)	2009-10-23 15:24:51 +11:00
Ronnie Sahlberg	c61c655769	when scripts timeout, log pstree to a file in /tmp and just log the filename in the messages file (This used to be ctdb commit 0785afba8e5cd501b9e0ecb4a6a44edf43b57ab0)	2009-10-23 13:55:21 +11:00
Ronnie Sahlberg	3c9b43531a	set the eventscripts to timeout after 20 seconds change the ban count to 10 failures before we ban by default (This used to be ctdb commit 38d7487bc68c8cf85980004aceeef24ae32d6f36)	2009-10-23 13:54:45 +11:00
Ronnie Sahlberg	65757fe1d6	Merge commit 'martins/master' (This used to be ctdb commit 514a60c57557042e463efeff53dd11b9fec40561)	2009-10-23 10:43:13 +11:00
Ronnie Sahlberg	42718a8842	new version 1.0.99 (This used to be ctdb commit 14fca8383b6b1da49278a9181a975543b956161b)	2009-10-22 18:16:33 +11:00
Martin Schwenke	69cca03851	Merge commit 'origin/master' (This used to be ctdb commit f3e09f2cfd33e79e69fc8c84ce4781a31a7a0437)	2009-10-22 17:48:09 +11:00
Martin Schwenke	a128b7e3bb	Document onnode -n and -f options. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 431f79f7c9038ebd95d27c2465207ca40b8f4f23)	2009-10-22 17:47:10 +11:00
Ronnie Sahlberg	e627fae600	if a lock wait child died/finished, we could have released the lockwait handle and set it to NULL before we call the destructors for releasing the waiters. The waiters reference the lockwait handle in order to remove itself from the linked list which caused a SEGV. We dont actually need to remove ourselves from this list here since if the parent freeze_handle holding the list is freed, then all waiters are released as well, and the only place we actually need to relink the waiter is in ctdb_freeze_lock_handler, where we want to respond back to the clients and release the waiters but we still want to keep the freeze_handle hanging around. (This used to be ctdb commit e01ab46bafad09a5e320d420734db129d35863bc)	2009-10-22 13:41:28 +11:00
Ronnie Sahlberg	902c476c03	From Volker L Fix some warnings and an incorrect check for a talloc failure (This used to be ctdb commit 27296a47b3d057a6729287acf128b2b67775ecde)	2009-10-22 12:19:40 +11:00
Ronnie Sahlberg	831f9e05a6	From Wolfgang M. With the new vacuuming code, dont treat an invalid dmaster as fatal. Let it update to the new value instead. (This used to be ctdb commit 5b70fa8cfd5916d3c212823ad5cc1b251ae175ed)	2009-10-22 07:58:44 +11:00
Martin Schwenke	8b2101bc61	Merge commit 'origin/master' (This used to be ctdb commit 61282d4a9be9e544aaa86f3cffc5b58e417f5ab1)	2009-10-21 21:48:15 +11:00
Martin Schwenke	12798118a1	Test suite: Remove the disable/enable monitor tests - they are useless. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 8264c42969d4be7fc6c5b4d56f8b5ef7c62b3bfb)	2009-10-21 21:47:06 +11:00
Martin Schwenke	f2a9ba6976	Test suite: Fix the timeouts on the skip share check tests. The timeout for waiting for state changes isn't very predictable. It is "about" MonitorInterval seconds... but can be longer given the duration of eventscript runs and other things. So, we change the timeout to MonitorInterval + EventScriptTimeout, hoping it never takes that long. Move the eventscript installation/removal from the old fake-tests into a function in the functions file. Implement supporting functions to create/remove/check-for various files that it handles. Also add a function that uses all of this that waits for the next monitor event (but only if all other monitor events pass). The final check in the skip share check tests uses the above and waits for a monitor event, and then checks that the node is still healthy. Also enhance the wait_until function to handle a command starting with '!' (as a separate word) to make it easy to wait for a file not to exist. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 25e82a8a667a54c6921ef076c63fdd738dd75d19)	2009-10-21 21:36:39 +11:00
Ronnie Sahlberg	d5fd4fc0ce	During tests it is common to add/delete test eventscripts at runtime. This can race with the eventscript handling that does a : list all scripts, sort them, then execute them so trap status code 127 which means the script could not be executed (or /bin/sh does not exist) and treat it as not to cause the node to become unhealthy (This used to be ctdb commit befabc917edb036ca81f5216f65a6d62b26ee83e)	2009-10-21 16:50:39 +11:00
Ronnie Sahlberg	a92ba7f729	lower the debug levels for the "create FD messages" so we dont fill up the logs. (This used to be ctdb commit 87146db2769c2ec494813685bf9cec0d2a6336c3)	2009-10-21 15:26:24 +11:00
Ronnie Sahlberg	9b8c72c446	When clients have blocked, perhaps because the node is banned or stopped and the client is blocked trying to tdb_fetch() a record, make sure we dont queue up too many REQ_MESSAGES. Add a new tunable to control the maximum queue size we allow to a blocked client before we start discarding REQ_MESSAGES instead of queueing them for delivery. This avoids having queued up very very large number of MESSAGES that samba sends between each other to nodes that are blocked/banned/stopped for extended periods. (This used to be ctdb commit f76d6fed8f9630450263b9fa4b5fdf3493fb1e11)	2009-10-21 15:20:55 +11:00
Ronnie Sahlberg	149ea4e577	dont restart ctdb when installing the rpm (This used to be ctdb commit ead97cabeb1e0b73bff9d45f8aec8b226769ee9f)	2009-10-21 13:54:02 +11:00
Michael Adam	769a36c048	In ctdb_ltdb_store(), add a missing transaction_cancel when local store failed. Spotted by Volker. Michael (This used to be ctdb commit 0a4d409baabf242a87c06293789d589c896b104c)	2009-10-21 12:49:59 +11:00
Ronnie Sahlberg	14b14a2efb	improve the log message when we skip the ip allocation check from the recovery daemon. we also skip this check if we are already in the process of performing an ip reallocation and not only when we are performing a full recovery. (This used to be ctdb commit 1a09b02767f3928d3c5db0e0afc59bb938e4a445)	2009-10-21 11:51:30 +11:00
Ronnie Sahlberg	ff8363697d	treat interfaces with the name ethX* as bond devices (This used to be ctdb commit 3997d7e5471810e9a2f145ce2e795073dfc5eded)	2009-10-21 11:34:17 +11:00
Martin Schwenke	7b1e9267f2	Test suite: A timeout of MonitorInterval seconds sometimes isn't enough. Monitor events sometimes happen a little bit more than MonitorInterval seconds apart. This changes some timeouts to MonitorInterval + 1 seconds. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 6ef4364b3349145b2fec23e0431cd6df6dcadd41)	2009-10-20 17:11:01 +11:00
Martin Schwenke	cd0424cde1	Merge commit 'origin/master' (This used to be ctdb commit a4aac7312947aa3b26bc26993f04b586c64f18cb)	2009-10-20 16:53:04 +11:00
Martin Schwenke	b84c2d3a6e	Test suite: New tests for validating SKIP_SHARE_CHECK options. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit f50d64a8ac91415ca297216d2103ff940076f02b)	2009-10-20 16:52:22 +11:00
Martin Schwenke	43780f5f57	Test suite: Update 99_ctdb_uninstall_eventscript.sh to use ctdb_init(). Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 2b478b0f5f09dd06626592573f053706ac637edd)	2009-10-20 16:51:06 +11:00
Martin Schwenke	d79f7647e7	Test suite: Fix bug in node_has_status(). This function has been broken since it was updated to work with the "stopped" state (probably commit 67c5bfb5f02c9d45a32d976021ede4fb2174dfe9). Although ${var#::0} removes the shortest matching prefix of $var, '' can match substrings that include ':' if '0' isn't where you expect. So we were making unexpected matches and incorrectly returning true for some cases. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 11137bc2d492a62a26ec9f9f62ff362e81643f66)	2009-10-20 16:45:29 +11:00
Martin Schwenke	469ee69363	Test suite: add -x option to ctdb_init() function. This facilitates tracing of tests. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 1f906bd3476e7cebf217e35b5477d6a7bb615a0c)	2009-10-20 16:44:44 +11:00
Ronnie Sahlberg	6dd7a8bcfa	version 1.0.98 (This used to be ctdb commit 02862c086d045497f49f3c060700419815d607e7)	2009-10-20 15:36:35 +11:00
Ronnie Sahlberg	28f277acd4	From Wolfgang Mueller make sure to always create the vactun database and get rid of some annoying log messages (This used to be ctdb commit 54f9c314a0354f1039208fe6ac7dc159b6db8750)	2009-10-20 13:01:15 +11:00
Ronnie Sahlberg	d788dd3627	From Wolfgang Mueller Add a tuneable so that when scripts start to hang/timeout, we can make the node unhealthy instead of banned (This used to be ctdb commit 2e9fc6f0609833c6d8146196011ef780669d615d)	2009-10-20 12:59:48 +11:00
Martin Schwenke	b77094e897	Merge commit 'origin/master' (This used to be ctdb commit b3ae2b753261443dca317803752a9d61285a3270)	2009-10-19 16:46:45 +11:00
Ronnie Sahlberg	58780f4137	add a directory where multiple local scripts can be added to run when executing eventscripts (This used to be ctdb commit 27d152a918680a59c7412aec7e1772f25b72d469)	2009-10-19 16:22:15 +11:00
Ronnie Sahlberg	cdc77af3ab	wait a bit longer before shutting down when the reclock file is missing print the filename of the missing file when we turn unhealthy and also a 'df' (This used to be ctdb commit 97ded8a629ec762f71bad28515e4fbc810790b1d)	2009-10-19 15:33:20 +11:00
Ronnie Sahlberg	1e91fd0a25	Revert "dont shutdown a node when the reclock file is temporarily unavailable." This reverts commit f5e9f3007c10a937158bc8cdfabf33c984cf9c50. (This used to be ctdb commit 02f68dc60e0b7bf26d631850b12834d5c71a88f2)	2009-10-19 15:30:44 +11:00
Martin Schwenke	aca9d7f104	Merge branch 'onnode_options' (This used to be ctdb commit 454125ccfda04aa6b4e14f5c05164d29f41a0ead)	2009-10-16 16:39:46 +11:00
Martin Schwenke	b20d680070	Merge commit 'origin/master' (This used to be ctdb commit 5ad283458e59ea8232e01f34be007901c10c8a2e)	2009-10-16 16:36:48 +11:00
Martin Schwenke	0bff3b4289	initscript: when stopping on Red Hat use the success/failure functions. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit bf5402b41282da94fee1ab3e4546ec089ff12f37)	2009-10-16 16:35:56 +11:00
Ronnie Sahlberg	598419e57b	Dont run eventscript monitor when the databases are frozen. The databases can become frozen a while before we do the actual recovery since we have the re-recovery timeout. There is no point in doing much monitoring if we are waiting for a recovery, or if we are banned. This will eliminate some annoying log entries where certain tests will fail if the databases are locked. (This used to be ctdb commit ff824676fab94168707aada7423ae766bc0f711c)	2009-10-15 16:03:43 +11:00
Ronnie Sahlberg	d258616984	dont shutdown a node when the reclock file is temporarily unavailable. Leave the node as UNHEALTHY this stops clients from accessing the node until the reclock file can be accessed again (This used to be ctdb commit f5e9f3007c10a937158bc8cdfabf33c984cf9c50)	2009-10-15 13:19:10 +11:00
Ronnie Sahlberg	9de3652380	add logging every time we create a filedescriptor in the main ctdb daemon so we can spot if there are leaks. plug two leaks for filedescriptors related to when sending ARP fail and one leak when we can not parse the local address during tcp connection establish (This used to be ctdb commit ddd089810a14efe4be6e1ff3eccaa604e4913c9e)	2009-10-15 11:24:54 +11:00
Ronnie Sahlberg	6152a7060b	new version 1.0.97 (This used to be ctdb commit ef992a64d2376b621d4d2973ae22e567158aee12)	2009-10-15 07:41:56 +11:00
Ronnie Sahlberg	d08e3c628d	Merge commit 'martins/onnode_options' (This used to be ctdb commit 82fad66123c1b8c5d4ed3b19c39acf6f367b3f37)	2009-10-14 15:51:57 +11:00
Ronnie Sahlberg	53900b99ea	version 1.0.96 (This used to be ctdb commit 536229fd120bc3fdc2419e22d3bd6ab243dd6667)	2009-10-14 14:52:24 +11:00
Ronnie Sahlberg	c58a6b39a6	add more debugging output to eventscripts and when a script has timed out, print a full "pstree -p" to the log. Example : |-ctdbd(29826)-+-ctdbd(29862) | `-ctdbd(31897)-+-00.ctdb(31898)---sleep(31908) change the default timeout to 60 seconds for eventscripts (This used to be ctdb commit a3406c10d70f89d332eab25d481083142dff987d)	2009-10-14 14:14:28 +11:00
Martin Schwenke	f0dd32e412	Merge commit 'origin/master' into onnode_options (This used to be ctdb commit e62928f56ce8927b1d8686db2c31538c86462d1a)	2009-10-14 13:49:30 +11:00
Martin Schwenke	787a6e44c6	New onnode options: -f to specify nodes file, -n to allow use of hostnames. The -f option allows an alternate nodes file to be specified, overriding the CTDB_NODES_FILE environment variable. The -n option allows hostnames to be used instead of node numbers. Using a range of hostnames is invalid, so hostnames can't contain hyphens ('-') - sorry! You can use this option without a nodes file by specifying "-f /dev/null". Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 46474e5f21fd97dd765c616647ff46055a9970e7)	2009-10-14 13:44:57 +11:00
Ronnie Sahlberg	30d9fbfbec	move the logging of the warning "No reclock file used" to the startup case so we only print this warning on "service ctdb start" and not for "service ctdb *" (This used to be ctdb commit eb854f65f978f24583e221138eb4f9b917b89285)	2009-10-14 12:12:04 +11:00
Ronnie Sahlberg	80be59d35e	when we change state between healthy/unhealthy, make sure we ask the recovery master to perform an explicit ip reallocation. This is more reliable and faster than having the recovery daemon track these changes, and since we now have an explicit method to ask the recovery daemon to perform an explicit ip reallocation, we should use this. (This used to be ctdb commit 3807681e74f4bfe92befdae6ed616ff5f1a99880)	2009-10-14 11:59:16 +11:00
Ronnie Sahlberg	4b7a208b16	allow a pre .95 version of a recovery master to freeze databases on a post .95 node by remapping priority numbers and log this to log.ctdb (This used to be ctdb commit 343c005367789e108c0320e95d7a264535d68dd8)	2009-10-14 10:14:03 +11:00
Ronnie Sahlberg	070f781e39	always create the nfs state directories during the monitor event. this allows us to configure and enable nfs at runtime without having to restart ctdbd (This used to be ctdb commit f6e39d35713475defaa08a623e194f3f2f8f7d53)	2009-10-14 09:15:24 +11:00
Ronnie Sahlberg	3ac5a52969	Port Volker's deadlock avoidance patch to HEAD. This patch ensures that we lock all non-notify related databases first and then the notify databases to avoid a deadlock where samba needs to lock records on two databases at once (and notify being the second database). Newer versions of samba would instead use the set-db-prio control to set this explicitly on a database per database basis instead of relying on hardcoded database names. This patch will be reverted in the future when all updated versions of samba have been pushed out. (This used to be ctdb commit 70e7781df1f118a0e2632a9c634f3fd388fa6c8c)	2009-10-14 08:17:49 +11:00
Ronnie Sahlberg	98b5caf003	we must break the loop as soon as we find a suitable recmaster does exist otherwise "tdb ipreallocate" will silently fail to update the addresses. (This used to be ctdb commit 346fa055f4106497b87df97da5ebd6e51fa1ef8c)	2009-10-13 09:49:05 +11:00
Ronnie Sahlberg	2cb9580464	new version 1.0.95 (This used to be ctdb commit 3501d6b70bd905d6fdc4e74fe2cedc3ba77e4b86)	2009-10-12 18:53:20 +11:00
Ronnie Sahlberg	d66c77d960	use the correct expected size for the _cancel control (This used to be ctdb commit 5974b5f7998ef96aeadb7377f32ef1ab85bb5943)	2009-10-12 18:41:57 +11:00
Ronnie Sahlberg	44f1d1fea7	add a dispatch to the recovery transaction cancel call (This used to be ctdb commit c1d7c11978d27d2ee41a2129b31d9ab61a43f8da)	2009-10-12 18:31:59 +11:00
Ronnie Sahlberg	df0dba1862	Merge commit 'martins/master' (This used to be ctdb commit 5f14874c5c705dd637f88a77f30c930fea1201d2)	2009-10-12 16:51:36 +11:00
Ronnie Sahlberg	122c423b82	add a new control for explicitly cancelling recovery transactions, i.e. the transactions we start across all tdb databases during the recovery. this allows us to properly clean up and delete these tdb transactions on a recovery failure. (This used to be ctdb commit b2ce8b900a7d00944c84e0574fea5b371064a06d)	2009-10-12 16:48:05 +11:00
Martin Schwenke	ab98c1b0f1	Clean up ctdb_check_directories* eventscript functions. There are 2 problems with this code: * The loop in ctdb_check_directories_probe() breaks on filenames containing whitespace. The fix to protect them is to pass "$@" to this function and have it operate on "$@". Note that there's still a problem with whitespace in filenames in the 50.samba eventscript. To fix this ctdb_check_directories_probe should read the filenames from stdin. Another time... * The check for '%' in filenames in ctdb_check_directories_probe() ends up involving several forks. On a modern machine this can cost a couple of minutes when checking a large number of directories. The fix is to use a case statement. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit eb1fecaef9aa5cb85dff7d4f7af8a9878deabed8)	2009-10-12 16:32:49 +11:00
Martin Schwenke	d8e2ddc5a8	40.vsftpd: reset the fail counter in the "recovered" event. Each recovery that involves IP reassignments results in a restart of vsftpd in the "recovered" event. Currently, we can have several recoveries in quick succession and the "monitor" event following each can fail because vsftpd isn't ready yet. This results in cumulative failures, so the node is marked unhealthy, even though vsftpd has never had a proper opportunity to become ready. This resets the fail count after each recovery. While we're here, also move the delete of the restart flag file into the body of the conditional. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 318abeb4b913a8d846e7eaf4cf5c2a67b61ce974)	2009-10-12 16:17:37 +11:00
Ronnie Sahlberg	771802b212	allow setting the recmode even when not completely frozen. we sometimes have to do this when we want to trigger a recovery (This used to be ctdb commit 46194e87e189521375b39b4ef33da2b493429fd8)	2009-10-12 13:06:16 +11:00
Ronnie Sahlberg	73c0adb029	initial attempt at freezing databases in priority order (This used to be ctdb commit e8d692590da1070c87a4144031e3306d190ebed2)	2009-10-12 12:08:39 +11:00
Ronnie Sahlberg	d4c98516a2	update the freeze/thaw commands to be able to send the requested database priority to freeze/thaw to the daemon. this is encoded in the srvid field of the request header (This used to be ctdb commit 0cb3d33caa42ed783e03bc825b181dde4cf63616)	2009-10-12 09:22:17 +11:00
Ronnie Sahlberg	ae57e54566	during recovery, update all remote nodes so they use the same priorities for the databases as this node. (This used to be ctdb commit 465dc95fef0ff6651ff49fa94e4cf2ebd1036ac4)	2009-10-10 16:28:20 +11:00
Ronnie Sahlberg	3219f81710	add a control to read the db priority from a database (This used to be ctdb commit ca6d045e419f308f57e74d4c978907afb05ddb85)	2009-10-10 15:04:18 +11:00
Ronnie Sahlberg	6cf7d8e131	add a control to set a database priority. Let newly created databases default to priority 1. database priorities will be used to control in which order databases are locked during recovery in. (This used to be ctdb commit 67741c0ee01916d94cace8e9462ef02507e06078)	2009-10-10 14:26:09 +11:00
Ronnie Sahlberg	e8e2f35985	verify the DISABLED flag and compare with the previous flag we have registered for that node and not what the node says is the difference. this prevents a situation where the remove node may cause spurious ip reallocations. (This used to be ctdb commit dd122351efaeef5475cdec111eb900110d83ec35)	2009-10-10 13:55:11 +11:00
Ronnie Sahlberg	05137e4718	Fix bug spotted by Metze, the argument to ctdb_control_event_Script_disabled() is a string not a uint32 (This used to be ctdb commit 687535b51622d1fac7ccb38fa640bf1febd69fd8)	2009-10-09 22:22:11 +11:00
Ronnie Sahlberg	eb9a77c887	version 1.0.94 (This used to be ctdb commit 5cb4d63bf6887d15aba37fafc3f6b6ba38027f13)	2009-10-08 19:17:57 +11:00
Ronnie Sahlberg	342148628f	if a node fails to become frozen during recovery, mark it up with as a culprit so it will soon get banned (This used to be ctdb commit f72d33ac73ebb1af802bacdfb30279df3cd8b8f9)	2009-10-08 16:45:25 +11:00
Ronnie Sahlberg	d29c4b5c4d	version 1.0.93 (This used to be ctdb commit e77bf5708df6782b4516f698b9981a1d27e2f10b)	2009-10-06 17:05:14 +11:00
Ronnie Sahlberg	42193cbff8	update natgw eventscript to allow you to force it to update and / or to remove the configuration at runtime (This used to be ctdb commit deed52b7e4aac94b4d11a8d89d08739e1dfd4ed7)	2009-10-06 16:09:24 +11:00
Martin Schwenke	2fa921ba92	Merge commit 'origin/master' (This used to be ctdb commit 7d91de8a837a12082c343980428153720dcad741)	2009-10-06 13:39:31 +11:00
Martin Schwenke	47f5347963	Document CTDB_NODES_FILE environment variable used by onnode. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 22f0065cd6b66fa0f623f465aaca98883955ac79)	2009-10-06 13:38:00 +11:00
Ronnie Sahlberg	134ed842fa	always send the release/take ip controls to make sure all nodes are updated (This used to be ctdb commit 789703ea684717781c176fd3a2a24d96abde220b)	2009-10-06 12:25:44 +11:00
Ronnie Sahlberg	166b1c97b4	add a new message to ask the recovery daemon to temporarily disable checking ip address consistency. This is useful when we are moving addresses using moveip in the cluster since otherwise if we collide with the recovery daemons own check we could cause a recovery (This used to be ctdb commit 9c63858c0b22c81eaccb9865a414af0bbb2833d4)	2009-10-06 12:11:32 +11:00
Ronnie Sahlberg	617e393f6b	update addip/moveip/delip to make it less likely to trigger an accidental recovery (This used to be ctdb commit 3befe5526e147d49451fddc930aaafc3dbe2e9c1)	2009-10-06 11:41:18 +11:00
Ronnie Sahlberg	50712d48d3	change some loglevels and also print the pnn of the ip for takeip/releaseip logging (This used to be ctdb commit 9d95dfbd12898975ba0d8560d95a974210d3de7c)	2009-10-06 11:40:38 +11:00
Ronnie Sahlberg	71e4259150	add a new function to collect a list of all active nodes EXCEPT a certain node (This used to be ctdb commit be52954d921e7d443304cf49fbd488c619a9c4ec)	2009-10-06 10:52:31 +11:00
Ronnie Sahlberg	3133dadd8f	allocate takeoverip state as a child of vnn and also make the takeoverip context a child of vnn (This used to be ctdb commit 804e5905be51f43c8a338bfbe216fd8d5718850f)	2009-10-06 09:35:15 +11:00
Ronnie Sahlberg	709fc77878	When adding a public ip to a node, make sure to push the assignment of ip addresses out to all nodes so all nodes become aware who currently holds the ip. (This used to be ctdb commit e8df6fc301fb7faf72c72eb39ea68d44d1526b00)	2009-10-06 08:19:25 +11:00
Ronnie Sahlberg	1d60064139	version 1.0.92 (This used to be ctdb commit 9ffb0d08d34cbafed0e49350a3a72b15d92c8ea7)	2009-10-02 14:38:16 +10:00
Ronnie Sahlberg	f8334e2f68	we should close this file on exec (This used to be ctdb commit c1c0ebb8da9a6c29ee83868a311f07f30cb4ed16)	2009-10-02 13:41:54 +10:00
Ronnie Sahlberg	2ab8f6a368	Merge commit 'martins/master' (This used to be ctdb commit 9b206d96da3341836cc25aee5693f551f6f3a80e)	2009-10-01 15:46:01 +10:00
Martin Schwenke	3edf5532d5	Test suite: The ctdb ping test should allow time to go backwards. Time can actually go backwards during this test if ntpd happens to adjust it little bit. So we should cope... Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 23ae9e9863ea90c6fb3f105403fd098041fa73f4)	2009-10-01 15:39:09 +10:00
Ronnie Sahlberg	dfc2500a1f	dont exit on a commit failure (This used to be ctdb commit 4e9a3a5dc232bac12ab387ea0cf4f1b279bed5c1)	2009-10-01 14:53:35 +10:00
Ronnie Sahlberg	63278ad040	Revert "Revert "allow the transaction commit to fail"" This reverts commit 74e416108df6934f45ca646d709785dd76ab3c35. (This used to be ctdb commit d1d370033d5007ad1c2c34cd9eeac53001f4b13e)	2009-10-01 14:51:32 +10:00
Ronnie Sahlberg	32286b08ac	document how to use the notification script (This used to be ctdb commit b77e4698e7f83443243965f93b84237f2903cd46)	2009-10-01 14:31:55 +10:00
Ronnie Sahlberg	e90dd8015f	add a new notification to trigger on when ctdb has started (This used to be ctdb commit b1fe04f2e9447f762a0b805763deb29296585ff8)	2009-10-01 14:05:30 +10:00
Martin Schwenke	b27600253d	Minor fixes to 01.reclock eventscript. test -z really needs its argument to be quoted. Simplified a status test. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit fe26da7780545b1ecc0a7da5bc1cf8beaeea94cc)	2009-09-30 21:21:56 +10:00
Martin Schwenke	78b7043411	40.vsftpd monitor event only fails after 2 failures to connect to port 21. Change the monitor event in 40.vsftpd so it only fails if there are 2 successive failures connecting to port 21. This reduces the likelihood of unhealthy nodes due to vsftpd being restarted for reconfiguration due to node failover or system reconfiguration. New eventscript functions ctdb_counter_init, ctdb_counter_incr, ctdb_counter_limit. These are used to count arbitrary things in eventscripts, depending on the eventscript name and a tag that is passed, and determine if a specified limit has been hit. They're good for counting failures! These functions are used in 40.vsftpd and also in 01.reclock - the latter used to do the counting without these functions. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit cfe63636a163730ae9ad3554b78519b3c07d8896)	2009-09-30 21:05:16 +10:00
Martin Schwenke	e169ba85f3	Merge commit 'origin/master' (This used to be ctdb commit 803cfb4cd2f6d139f466053a6d7e104fcb772ef5)	2009-09-30 19:22:59 +10:00
Ronnie Sahlberg	11c56dfd56	New version 1.0.91 (This used to be ctdb commit d1332f4d5d3d3e4b4e0cd362a6903d09e0d5fcbb)	2009-09-29 13:31:41 +10:00
Ronnie Sahlberg	c971d934a9	From Wolfgang Mueller-Friedt Remove the explicit vacuum/repack commands from the 00.ctdb eventscript and implement this in the ctdb daemon. Combine vacuuming and repacking into one cheap read traverse to enumerate all candidate records and one write traverse that both repacks the database and also deletes the record locally where we are lmaster and where the records have already been deleted remotely. this code also adds initial autotuning heuristics for the vacuum intervals and how many records to delete in each iteration. minor stylish changes made by ronnie s (This used to be ctdb commit 95a3ee551241aa164967991fe5efe078e1714bde)	2009-09-29 13:27:19 +10:00
Martin Schwenke	e976209996	Merge commit 'origin/master' (This used to be ctdb commit 096cdc0c12d22d99f8405bee5cb9f05c616c8492)	2009-09-29 12:59:10 +10:00
Ronnie Sahlberg	9bac6f2e2c	change the reclock fail count to 19 monitor intervals before we shut down ctdbd (This used to be ctdb commit 6e35feb06ec036b9036c5d1cdd94f7cef140d8a6)	2009-09-28 14:12:59 +10:00
Ronnie Sahlberg	4f0f2cc196	add a new eventscript 01.reclock if the reclock file has been set, then this script will test that the reclock file can actually be accessed. if the file does not exist, or if the attempts to stat the file hangs, the node will be marked unhealthy after the third failed monitoring event and after the tenth failure, ctdb itself will shutdown. (This used to be ctdb commit 2cb04747887674def299e574fccb827c1c3194e7)	2009-09-28 14:06:40 +10:00
Ronnie Sahlberg	22dde50be3	add machinereadable output for the ctdb getreclock command (This used to be ctdb commit 5e7dc36f1649824db2f9dab34bede8b388502a57)	2009-09-28 13:39:54 +10:00
Martin Schwenke	4948051bf4	Test suite: Print debug info on node status timeouts. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit a083a1976d621c76121f1fa2c2f484cfa47267bd)	2009-09-25 18:00:17 +10:00
Ronnie Sahlberg	9add8cdc5a	Merge commit 'obnox/master-rebase' (This used to be ctdb commit edb58a417bfeb094cbbbf96caec8e2918256dad9)	2009-09-25 17:34:59 +10:00
Ronnie Sahlberg	a74ca1a1bb	Merge root@10.1.1.27:/shared/ctdb/ctdb-git (This used to be ctdb commit db7195d762f69577c4e28f0b0e0ded0ac7f91f0b)	2009-09-25 13:18:18 +10:00
Ronnie Sahlberg	a82b9cfbfd	with the new banning logic with one struct for each node we no longer "forget" the other culprits as often as we used to do, which means that things like "ctdb recover" can now actually lead to a node becoming banned if we perform too many recoveries too frequently. change this to provide absolution to all nodes once they have participated in a recovery session. (This used to be ctdb commit f66d17fb2e81a35d5adb3754e1cc902f76b4590a)	2009-09-25 13:14:53 +10:00
Michael Adam	d0289c650e	Revert "dont check if commit failed, we do allow the commit to fail sometimes" This reverts commit affa6f47432507e84b7e76b88a2c27fff8e6e2e4. Transaction commit should not be allowed to fail. This is a fatal error. Michael (This used to be ctdb commit 4364419a486c1995bea56dab603cc4960e7c8e7a)	2009-09-21 11:16:18 +02:00
Michael Adam	fcaca26ec4	Revert "allow the transaction commit to fail" This reverts commit 7a6134e684c9ac4763bf198ef1410867b6082c94. Transaction commit should not be allowed to fail. This is a fatal error. Michael (This used to be ctdb commit 74e416108df6934f45ca646d709785dd76ab3c35)	2009-09-21 11:16:18 +02:00
Michael Adam	3cb4bcd211	ctdb_client: fix race in starting concurrent transactions on a single node There are two races in concurrent transactions on a single node. One in starting a transaction, and one with committing (replaying). This commit closes the first race by storing the pid in the transaction-lock record and comparing the own pid against it as a measure to prevent starting a second transaction when a second node has come inbetween and changed the pid in the lock record. Michael (This used to be ctdb commit 84e5a55a900b01903b80e23045edfc726d8d77a1)	2009-09-21 11:16:18 +02:00
Ronnie Sahlberg	eb305efdb0	Merge commit 'martins/master' (This used to be ctdb commit 0e6a52ee66830e7742eaa392cd3dd9caeb808fb3)	2009-09-18 14:23:37 +10:00
Ronnie Sahlberg	4b7f6c8a29	dont mark the recovery daemon as a ban culprit just because a node in the cluster was set to recovery mode == ACTIVE. This happens normally when someone explicitly triggers a recovery using "ctdb recover" (This used to be ctdb commit 3085170be8460e59996a3eee4e29fec9ddbcf0f8)	2009-09-18 12:58:30 +10:00
Ronnie Sahlberg	4a05b2dfd8	try restarting statd indefinitely not just once (This used to be ctdb commit 03b0d913ae009284e2fadda1b9246ec77d19db29)	2009-09-15 19:33:53 +10:00
Ronnie Sahlberg	029fd6b00f	Revert "try to restart statd everytime it fails, not just the first time" This reverts commit 4f7b39a4871af28df1c4545ec37db179fa47a7da. (This used to be ctdb commit db7b96304e4725f29b12398b7582e385daed63ed)	2009-09-15 19:33:35 +10:00
Ronnie Sahlberg	59cacded72	try to restart statd everytime it fails, not just the first time (This used to be ctdb commit 4f7b39a4871af28df1c4545ec37db179fa47a7da)	2009-09-15 13:35:58 +10:00
Ronnie Sahlberg	c3556c3d88	Merge commit 'obnox/master-rebase' (This used to be ctdb commit 1ae3a40705e14efcc24f558cd4d677932765c4fd)	2009-09-15 08:05:33 +10:00
Ronnie Sahlberg	ee9fe64029	Merge root@10.1.1.27:/shared/ctdb/ctdb-git (This used to be ctdb commit b5410e7be0525e6e5cd49ccebc7bbc57086f3cb2)	2009-09-12 07:05:21 +10:00
Ronnie Sahlberg	6e793bec7c	new version 1.0.90 (This used to be ctdb commit 5624da65d3fad1905c9f93a9e41a90b98ad692d2)	2009-09-12 07:30:18 +10:00
Martin Schwenke	3d8fa9e9e3	Test suite: Update "complex" tests for wait_until_node_has_status() change. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 31216fd48117526c943e42d137ce24ef89fa0009)	2009-09-11 16:15:31 +10:00

2822 Commits