samba-mirror

mirror of https://github.com/samba-team/samba.git synced 2025-01-13 13:18:06 +03:00

Author	SHA1	Message	Date
Stefan Metzmacher	fd06167caa	server: add "init" event This is needed because the "startup" event runs after the initial recovery, but we need to do some actions before the initial recovery. metze (This used to be ctdb commit e953808449c102258abb6cba6f4abf486dda3b82)	2010-01-20 09:44:36 +01:00
Stefan Metzmacher	8456a1b0ef	server: setup fault handler to get the built-in backtrace support The panic action feature will be added later. metze (This used to be ctdb commit 37d11895e96ba8bc8c9ba159083970c45f76d9bb)	2010-01-20 09:44:36 +01:00
Stefan Metzmacher	2f36e78d88	server: add missing goto again after do_recovery() metze (This used to be ctdb commit 898894d3acbcc0add2ab0706a3172a446622f687)	2010-01-20 09:44:35 +01:00
Ronnie Sahlberg	a1d60b1511	Make the size of the in memory ringbuffer for keeping the recent log messages configurable using --log-ringbuf-size=<num-entries>. Add an entry in the sysconfig file to set this persistently. (This used to be ctdb commit c79c2da69bc352f509e7fca4b9172a4b7f23c0f8)	2010-01-15 15:38:56 +11:00
Stefan Metzmacher	1eb8015ee0	server: call event_add_fd at the end of ctdb_set_child_logging() metze (This used to be ctdb commit 608e0765130aa9bca0aa77db5a888c413867a3fd)	2010-01-12 12:20:15 +01:00
Stefan Metzmacher	1b0f9c3db7	ctdb_logging: simplify ctdb_fork_with_logging a lot and reduce the syscall usage metze (This used to be ctdb commit acb98c36a3d56fa6b34747015b913ada3eaa133f)	2010-01-12 12:20:15 +01:00
Rusty Russell	565b2cda11	eventscript: fix bug when script is aborted Another corner case when we terminate running monitor scripts to run something else: logging can flush the output and we write to a NULL pointer. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit eb22c34bccc8a04fcf63efa2bc48d9788709382e)	2009-12-18 14:48:41 +11:00
Rusty Russell	e757b7c4bf	eventscript: remove cb_status, fix uninitialized bug when monitoring aborted (Reapplied with merge after accidental revert) Previously we updated cb_status as each script finished. Since we're storing the status anyway, we can calculate it by iterating the scripts array itself, providing clear and uniform behavior on all code paths. In particular, this fixes a longstanding bug when we abort monitor scripts to run some other script: the cb_status was uninitialized. In this case, we need to hand something to the callback; 0 might make us go healthy when we shouldn't. So we use the last status (normally, this will be the just-saved current status). In addition, we make the case of failing the first fork for the script and failing other script forks the same: the error is returned via the callback and saved for viewing through 'ctdb scriptstatus'. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 2c84fe393ff2b961abf77d58a371c24db5ecb93b)	2009-12-18 14:48:35 +11:00
Rusty Russell	4dce0690de	eventscript: fix cleanup path when setting up script list We shouldn't set ctdb->current_monitor until we set destructor: that's what cleans it up. Also, free state->scripts on no-scripts exit path: it's not a child of state because we need it in the destructor. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 843a2ed5ef85f628788b0caf7417c6b61b5c6d3f)	2009-12-18 12:31:34 +11:00
Stefan Metzmacher	77c4a86351	server: add set_close_on_exec() on more fds metze (This used to be ctdb commit 7101ae80bf4e530f48e31e4c58707aa45a9fd3d5)	2009-12-17 14:41:07 +01:00
Stefan Metzmacher	bbfa4402e4	server: fix fd leaks in the new logging code metze (This used to be ctdb commit 140070dd81b39545fe2d56f70e9b9c96bfdae07f)	2009-12-17 13:05:39 +01:00
Ronnie Sahlberg	9b507abd6e	version 1.0.109 (This used to be ctdb commit 99894a70fe2ebfe43daae7e88ff0fc9cab33e0fb)	2009-12-17 15:49:01 +11:00
Rusty Russell	8aec7e5656	eventscript: remove cb_status, fix uninitialized bug when monitoring aborted Previously we updated cb_status as each script finished. Since we're storing the status anyway, we can calculate it by iterating the scripts array itself, providing clear and uniform behavior on all code paths. In particular, this fixes a longstanding bug when we abort monitor scripts to run some other script: the cb_status was uninitialized. In this case, we need to hand something to the callback; 0 might make us go healthy when we shouldn't. So we use the last status (normally, this will be the just-saved current status). In addition, we make the case of failing the first fork for the script and failing other script forks the same: the error is returned via the callback and saved for viewing through 'ctdb scriptstatus'. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 5d50f0e16948d18009f6623f132113f7273efc7f)	2009-12-17 15:39:46 +11:00
Ronnie Sahlberg	4c722fe34c	fix a conflict in the merge from rusty Merge commit 'rusty/ctdb-no-setsched' Conflicts: server/ctdb_vacuum.c (This used to be ctdb commit b4365045797f520a7914afdb69ebd1a8dacfa0d9)	2009-12-17 08:18:04 +11:00
Rusty Russell	af2613e16f	ctdb: use mlockall, cautiously We don't want ctdb stalling due to paging; this can be far worse than scheduling delays. But if we simply do mlockall(MCL_FUTURE), it increases the risk that mmap (ie. tdb open) or malloc will fail, causing us to abort. This patch is a compromise: we mlock all current pages (including 10k of future stack for expansion) and then relock when a client asks us to open a TDB. We warn, but don't exit, if it fails. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 82f778e85440bc713d3f87c08ddc955d3cfce926)	2009-12-16 20:57:20 +10:30
Rusty Russell	c488ba440a	Remove RT priority, use niceness. 1) It's buggy. Code needs to be carefully written (ie. no busy loops) to handle running with it, and we fork and run scripts.[1] 2) It makes debugging harder. If ctdbd loops (as has happened recently) it can be extremely hard to get in and see what's happening. We've already seen the valgrind hacks. 3) We have seen recent scheduler problems. Perhaps they are unrelated, but removing this very unusual setup is unlikely to hurt. 4) It doesn't make anything faster. Under all but the most perverse of circumstances, 99% of the cpu gives the same performance as 100%, and we will always preempt normal processes anyway. [1] I made this worse in 0fafdcb8d353 "eventscript: fork() a child for each script" by removing the switch_from_server_to_client() which restored it, but even that was only for monitor scripts. Others were run with RT priority. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 482c302d46e2162d0cf552f8456bc49573ae729d)	2009-12-16 19:26:22 +10:30
Rusty Russell	f148735928	Add --valgringing flag instead of --nosetsched The do_setsched was being tested for whether to mmap tdbs: let's make it explicit. We can also happily move the kill-child eventscript hack under this flag. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 2ee86cc1f311d7b7504c7b14d142b9c4f6f4b469)	2009-12-16 20:59:15 +10:30
Stefan Metzmacher	f1f0af2b67	server: add CTDB_CONTROL_DB_SET_HEALTHY and CTDB_CONTROL_DB_GET_HEALTH metze (This used to be ctdb commit 7332d900538f0cbcd953a723417a0fe31dc9807c)	2009-12-16 08:08:29 +01:00
Stefan Metzmacher	94bc40307a	server: Use tdb_check to verify persistent tdbs on startup Depending on --max-persistent-check-errors we allow ctdb to start with unhealthy persistent databases. The default is 0 which means to reject a startup with unhealthy dbs. The health of the persistent databases is checked after each recovery. Node monitoring and the "startup" is deferred until all persistent databases are healthy. Databases can become healthy automatically by a completely HEALTHY node joining the cluster. Or by an administrator with "ctdb backupdb/restoredb" or "ctdb wipedb". metze (This used to be ctdb commit 15f133d5150ed1badb4fef7d644f10cd08a25cb5)	2009-12-16 08:06:10 +01:00
Stefan Metzmacher	9069d3a7fb	server: move error handling to a 'fail' label in ctdb_control_transaction_commit() metze (This used to be ctdb commit d874463235fa299e83fe562291c688aca3b85cf3)	2009-12-16 08:03:56 +01:00
Stefan Metzmacher	8fbb5b7915	server/recovery: update flags on nodes before syncing dbs metze (This used to be ctdb commit 49d2dca9ad837e1b397294fb0e966bf0b77f751c)	2009-12-16 08:03:56 +01:00
Stefan Metzmacher	b74918b465	server: open /var/ctdb/state/persistent_health.tdb.X on startup This node internal tdb will store the HEALTH state of persistent tdbs. metze (This used to be ctdb commit cbda4666be88c11a810a192a70667b57f773ace1)	2009-12-16 08:03:56 +01:00
Stefan Metzmacher	7f05a423e2	server: create vactune.tdb.X with 0600 permissions metze (This used to be ctdb commit 21677ed6fb8c589f348321533c608cad58c4ec93)	2009-12-16 08:03:56 +01:00
Stefan Metzmacher	473f02ed48	server: create vactun.tdb.X under /var/ctdb/state metze (This used to be ctdb commit 1db17f312558fe59983a3465680e56c9f0c19e36)	2009-12-16 08:03:56 +01:00
Stefan Metzmacher	77d43d01aa	server: create recdb.tdb.X in /var/ctdb/state/ metze (This used to be ctdb commit 92e05282d6c4f16e55d914cc3bde3738ea2d44ad)	2009-12-16 08:03:56 +01:00
Stefan Metzmacher	9a96ae0c97	server: only do the mkdir() calls for db_directory* once at the start metze (This used to be ctdb commit f30f33685db50860b6cd6fd1b6bdc3066620a78f)	2009-12-16 08:03:56 +01:00
Stefan Metzmacher	b48228e7f9	server: add db_directory_state to ctdb_context metze (This used to be ctdb commit 656a6ec5ed81ccfbb86144156a3158e48f105ee4)	2009-12-16 08:03:55 +01:00
Stefan Metzmacher	cda5884854	server: create tdbs with 0600 permissions in ctdb_local_attach() metze (This used to be ctdb commit 6529a1328b9ec304ad306674651b2a67e4426e23)	2009-12-16 08:03:55 +01:00
Stefan Metzmacher	003985acfd	ctdb: pass TDB_DISALLOW_NESTING to all tdb_open/tdb_wrap_open calls metze Signed-off-by: Stefan Metzmacher <metze@samba.org> (This used to be ctdb commit 1635e931b909c66eb3b1f5357e3a549b1a0da70d)	2009-12-16 08:03:55 +01:00
Ronnie Sahlberg	640c48c844	Revert "cleanup: remove a tunable we no longer use in the eventscripts any more :" This reverts commit 401f421fa003d9515df15e759b50b56e0c67d69c. Conflicts: include/ctdb_private.h server/ctdb_tunables.c (This used to be ctdb commit b883d19a495a41a22db37f9c2cf6250fee529de0)	2009-12-16 09:51:17 +11:00
Ronnie Sahlberg	fcd16342f6	Merge branch 'trans3' (This used to be ctdb commit b765e12a5fb87a6121e49b349017b6a961929346)	2009-12-15 21:00:22 +11:00
Ronnie Sahlberg	b3104bd1d0	Author: Rusty Russell <rusty@rustcorp.com.au> Date: Tue Dec 15 15:53:30 2009 +1030 eventscript: hack to avoid overloading valgrind Now we fork one child per script, when running under valgrind the load gets quite high. This is because valgrind does a lot of work after exit, and we don't wait for the children to finish; we start the next one when the child reports status via the pipe. This fix is ugly, but simple. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 6ed34d5320c39d8a55f2a36ad4c1ab574e0b0796)	2009-12-15 20:56:16 +11:00
Ronnie Sahlberg	842aa60d52	This is a dodgy patch. I saw once where the master ctdbd logging structure was talloc freed which caused issues. So only free the structure if it is NOT the master structure. This needs to be looked into in more detail. (This used to be ctdb commit bcf494b81f4277dc75f05faccf0c446bd15f6e2b)	2009-12-15 19:04:52 +11:00
Ronnie Sahlberg	0982299bed	Revert "Make fetch_locked more scalable" This reverts commit 5736e17c139c9a8049e235429aeae0c6c9d0e93d. (This used to be ctdb commit 3d2d877d877146ca09a28a3a44f4840eb36fd377)	2009-12-15 14:26:28 +11:00
Ronnie Sahlberg	5a7e9900df	Merge commit 'obnox/ctdb-wip-trans3' into trans3 (This used to be ctdb commit ac06a0e042e7d024060d6e87a49bda9ccc072c52)	2009-12-15 14:25:55 +11:00
Ronnie Sahlberg	e2e30df2e9	When setting up the logging, set the event to trigger a read of a log message from a child process as a child of the "log" structure and not the ctdb structure, or else we can crash if we receive log messages from a child but the log structure has been freed() (This used to be ctdb commit ea9e39369379939abf6a4076fa2014c10c1a9ad0)	2009-12-15 10:45:18 +11:00
Ronnie Sahlberg	db0d2a1b8f	From rusty: Subject: eventscript: fix spinning at 100% cpu when child exits. ctdbd was spinning reading 0 from a pipe, as soon as the first eventscript finishes. This was caused by the intersection between a78b8ea7168e "Run only one event for each epoll_wait/select call" and 32cfdc3aec34 "eventscript: ctdb_fork_with_logging()". Unavoidable mid-air collision, since both worked fine and both were developed simultaneously. When the script exits, we have two pipes open to it: one for any stdout/stderr for logging (ctdb_log_handler), and one for the result (ctdb_event_script_handler). The latter frees everything, including the log fd and event structure. We used to get one callback to ctdb_log_handler, which got a harmless 0-length read, then one to ctdb_event_script_handler which cleaned up. Now we only do one callback per poll, we need the logging function to clean itself up so we can make progress. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 211ea7907e8e96041aa6f7d086551d64d065a8a3)	2009-12-15 10:23:58 +11:00
Ronnie Sahlberg	649ba2631d	Rename the tunable EventScriptBanCount to EventScriptTimeoutCount since we no longer ban nodes when dodgy scripts continue to hang. We now only mark nodes as unhealthy if monitor events fail or timeout. Never ban. (This used to be ctdb commit 5c8e56fc7a518e115bceac257867739283cf6a1e)	2009-12-14 15:53:23 +11:00
Ronnie Sahlberg	ed6b5a8c68	cleanup: remove a tunable we no longer use in the eventscripts any more : EventScriptUnhealthyOnTimeout (This used to be ctdb commit 401f421fa003d9515df15e759b50b56e0c67d69c)	2009-12-14 15:48:47 +11:00
Rusty Russell	784fa9fd8a	eventscript: fix monitoring when killed by another script command Commit c1ba1392fe "eventscript: get rid of ctdb_control_event_script_finished altogether" was wrong: there is one case where we want to free the script without transferring their status to last_status. This happens because we always kill a running monitor command when we run any other command. This still isn't quite right (and never was): the callback will be called with status value 0, which might flip us to HEALTHY if we were unhealthy. This is conveniently fixed in my next set of patches :) Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 0ea0e27d93398df997d3df9d8bf112358af3a4a5)	2009-12-14 15:46:14 +11:00
Ronnie Sahlberg	e76561f544	remove the variable "disable when unhealthy" there is no rational need for a setting where we permanently mark nodes as disabled every time an eventscript fails (This used to be ctdb commit 68a8ee99b128a5ec883600735626bdb3bbc9c503)	2009-12-14 15:40:54 +11:00
Michael Adam	b41d9a2bcc	Revert "recovery: add special pull-logic for persistent databases" This reverts commit 8aef46d2aab3efb322dda51eaa202653cefd5222. This special recovery logic is wrong now with the transaction rewrite. The treatment of persistent databases will later be rewritten to use the database sequence number. Michael (This used to be ctdb commit c5a0aef668a63f927d6184612b13ce316eb4a0be)	2009-12-12 00:45:40 +01:00
Volker Lendecke	f6ea3e6bcf	Make fetch_locked more scalable This patch improves the handling of the fetch_lock operation on non-persistent databases that ctdb clients have to do very frequently. The normal flow how this goes is the following: 1. Client does a local fetch_lock on the database 2. Client looks if the local node is dmaster. If yes, everything is fine If no, continue here 3. Client unlocks the local record 4. Client issues a "get me the record" call to ctdbd 5. ctdbd goes out and fetches the dmaster role 6. ctdbd tells the client to retry 7. Client starts over again The problem is between step 6 and 7: Before the client has had the chance to retry (i.e. catch the record with a fetch_locked), another node might have come asking ctdbd to migrate away the record again. This is a real problem, I've seen >20 loops of this kind in real workloads. This patch does the following: Whenever ctdb receives a record as result of step 5, it puts the key on a "holdback list". As long as a key is on this list, a request to migrate away the dmaster is put on hold. It is the client's duty to issue the "CTDB_CONTROL_GOTIT" control when it has successfully done step 2 after having asked ctdb to fetch the record. This will release the key from the "holdback list" and re-issue all dmaster migration requests. As a safeguard against malicious clients, once a second (default 1000msecs, tunable "HoldbackCleanupInterval" in milliseconds) ctdbd goes over the list of held back keys, deletes them and releases all held back migration requests. (This used to be ctdb commit 5736e17c139c9a8049e235429aeae0c6c9d0e93d)	2009-12-12 00:45:39 +01:00
Michael Adam	46de365e78	Add a new control CTDB_GET_DB_SEQNUM - fetch a persistent db's sequence number. Michael (This used to be ctdb commit a7e3b5fac6b3f5d74473f26eb86c067b35647996)	2009-12-12 00:45:39 +01:00
Volker Lendecke	9f16f655fa	Tiny simplification of ctdb_queue_packet() (This used to be ctdb commit 1640da1cab7e8b545367824204c82931f3346848)	2009-12-12 00:45:38 +01:00
Volker Lendecke	24d04a3e89	Rename a struct member for clarity (This used to be ctdb commit 6af5e74a21546d723008d69d6752ebebf898c947)	2009-12-12 00:45:37 +01:00
Michael Adam	faacd5ca79	server: add a new control CTDB_CONTROL_TRANS3_COMMIT This is a simplified version of the trans2 commit control: It just rolls out the marshal buffer to all active nodes. It is the main ctdbd part of the re-implementation of the persistent transactions. The client code is changed to take a global lock to start a transaction and store into the marshal buffer instead of writing to the local tdb under a local transaction. The old transaction implementation is going to be removed in a later commit. Michael (This used to be ctdb commit f66428f9d2013080a414404c1ba6117888352fd6)	2009-12-12 00:43:26 +01:00
Michael Adam	ea65e80223	call: lower the debug message "refusing migration while transction" to lvl INFO This gets just too noisy on a busy system. And it is purely informational anyways... Michael (This used to be ctdb commit 7f64a00c76203fdf6673c3f862a4bfd17fb848d7)	2009-12-09 21:56:59 +01:00
Volker Lendecke	a0d9bd3c13	Run only one event for each epoll_wait/select call This might be a bit less efficient, but experience in winbind has shown that event callbacks can trigger changes in the socket state in very hard to diagnose ways. (This used to be ctdb commit a78b8ea7168e5fdb2d62379ad3112008b2748576)	2009-12-10 07:52:16 +11:00
Christian Ambach	47f8c380d2	reduce vacuuming lognoise syslog.h says: LOG_NOTICE 5 normal but significant condition LOG_INFO 6 informational several vacuuming related logs logged at NOTICE level although I don't see any real significance, these are just informational messages for me Signed-off-by: Christian Ambach <christian.ambach@de.ibm.com> (This used to be ctdb commit 142111983c103e90ccccbe26fd580c4eb28e949f)	2009-12-10 07:33:59 +11:00

682 Commits