samba-mirror

mirror of https://github.com/samba-team/samba.git synced 2024-12-25 23:21:54 +03:00

Author	SHA1	Message	Date
Stefan Metzmacher	98ee69c66d	server: add updateip event metze (This used to be ctdb commit 712ed0c4c0bff1be9e96a54b62512787a4aa6259)	2010-01-20 11:11:01 +01:00
Stefan Metzmacher	32d00d0a0d	controls: add stubs for GET_PUBLIC_IP_INFO, GET_IFACES and SET_IFACE_LINK_STATE metze (This used to be ctdb commit a2c9e4578e149eccb2c6183f64a6b657eb95c5e1)	2010-01-20 11:10:59 +01:00
Stefan Metzmacher	37880b0d0a	server: use CTDB_PUBLIC_IP_FLAGS_ONLY_AVAILABLE during a takeover run We now ask for the known and available interfaces. This means a node gets a RELEASE_IP event for all interfaces it "knows" but doesn't serve, and a node only gets a TAKE_IP event for "available" interfaces. metze (This used to be ctdb commit a695a38e49e7c3e15a9706392dc920eeab1f11ba)	2010-01-20 11:10:59 +01:00
Stefan Metzmacher	b9f6afe4b0	client: add CTDB_PUBLIC_IP_FLAGS_ONLY_AVAILABLE ctdb_ctrl_get_public_ips_flags() metze (This used to be ctdb commit 6bd780510058e5589f2f7c3722d37acbba4935ab)	2010-01-20 11:10:58 +01:00
Stefan Metzmacher	15616d3271	reserve upper bits in ctdb_control->flags for opcode specific flags metze (This used to be ctdb commit 91122c322fbec08138b92c528d9a946f6727b4fd)	2010-01-20 11:10:58 +01:00
Stefan Metzmacher	bea53c60b8	server: keep the interface information in a list of ctdb_iface structures metze (This used to be ctdb commit ff5291778f0752e176539397e9530dcf0e546bea)	2010-01-20 11:10:58 +01:00
Stefan Metzmacher	a1da4e05b5	server: allow multiple interfaces comma separated in public_addresses metze (This used to be ctdb commit 33a00ef7233051acdbc66410130ec5d876a8422f)	2010-01-20 11:10:58 +01:00
Stefan Metzmacher	bec35e6441	server: add a ctdb_set_single_public_ip() helper function metze (This used to be ctdb commit 400b4806c4a9686a2ee6398b5d7c3e0ca0793fd1)	2010-01-20 11:10:57 +01:00
Stefan Metzmacher	fd06167caa	server: add "init" event This is needed because the "startup" event runs after the initial recovery, but we need to do some actions before the initial recovery. metze (This used to be ctdb commit e953808449c102258abb6cba6f4abf486dda3b82)	2010-01-20 09:44:36 +01:00
Stefan Metzmacher	9cba540514	lib/util: import fault/backtrace handling from samba. metze (This used to be ctdb commit 8171d66f0061fe23ed6dfef87ffe63bfc19596eb)	2010-01-20 09:44:36 +01:00
Stefan Metzmacher	a309287947	move DEBUG* macros to one place metze (This used to be ctdb commit 4b4dd5d7f81bf226e05c7f3d40087043da1517a2)	2010-01-20 09:44:36 +01:00
Ronnie Sahlberg	a1d60b1511	Make the size of the in-memory ringbuffer for keeping the recent log messages configurable using --log-ringbuf-size=<num-entries>. Add an entry in the sysconfig file to set this persistently. (This used to be ctdb commit c79c2da69bc352f509e7fca4b9172a4b7f23c0f8)	2010-01-15 15:38:56 +11:00
Ronnie Sahlberg	4c722fe34c	fix a conflict in the merge from rusty Merge commit 'rusty/ctdb-no-setsched' Conflicts: server/ctdb_vacuum.c (This used to be ctdb commit b4365045797f520a7914afdb69ebd1a8dacfa0d9)	2009-12-17 08:18:04 +11:00
Rusty Russell	af2613e16f	ctdb: use mlockall, cautiously We don't want ctdb stalling due to paging; this can be far worse than scheduling delays. But if we simply do mlockall(MCL_FUTURE), it increases the risk that mmap (ie. tdb open) or malloc will fail, causing us to abort. This patch is a compromise: we mlock all current pages (including 10k of future stack for expansion) and then relock when a client asks us to open a TDB. We warn, but don't exit, if it fails. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 82f778e85440bc713d3f87c08ddc955d3cfce926)	2009-12-16 20:57:20 +10:30
Rusty Russell	c488ba440a	Remove RT priority, use niceness. 1) It's buggy. Code needs to be carefully written (ie. no busy loops) to handle running with it, and we fork and run scripts.[1] 2) It makes debugging harder. If ctdbd loops (as has happened recently) it can be extremely hard to get in and see what's happening. We've already seen the valgrind hacks. 3) We have seen recent scheduler problems. Perhaps they are unrelated, but removing this very unusual setup is unlikely to hurt. 4) It doesn't make anything faster. Under all but the most perverse of circumstances, 99% of the cpu gives the same performance as 100%, and we will always preempt normal processes anyway. [1] I made this worse in 0fafdcb8d353 "eventscript: fork() a child for each script" by removing the switch_from_server_to_client() which restored it, but even that was only for monitor scripts. Others were run with RT priority. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 482c302d46e2162d0cf552f8456bc49573ae729d)	2009-12-16 19:26:22 +10:30
Rusty Russell	f148735928	Add --valgringing flag instead of --nosetsched The do_setsched was being tested for whether to mmap tdbs: let's make it explicit. We can also happily move the kill-child eventscript hack under this flag. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 2ee86cc1f311d7b7504c7b14d142b9c4f6f4b469)	2009-12-16 20:59:15 +10:30
Stefan Metzmacher	aa658b6777	client: make ctdb_dumpdb_record() public metze (This used to be ctdb commit 1cdc8dbb9cb971cf6dd6cd22b1adaf70ddc77e65)	2009-12-16 08:08:32 +01:00
Stefan Metzmacher	0e436b46c6	client: add ctdb_ctrl_getdbhealth() metze (This used to be ctdb commit 5abe44d0113839d3a45c9a31d30856aa70c2ea1f)	2009-12-16 08:08:32 +01:00
Stefan Metzmacher	f1f0af2b67	server: add CTDB_CONTROL_DB_SET_HEALTHY and CTDB_CONTROL_DB_GET_HEALTH metze (This used to be ctdb commit 7332d900538f0cbcd953a723417a0fe31dc9807c)	2009-12-16 08:08:29 +01:00
Stefan Metzmacher	94bc40307a	server: Use tdb_check to verify persistent tdbs on startup Depending on --max-persistent-check-errors we allow ctdb to start with unhealthy persistent databases. The default is 0, which means to reject a startup with unhealthy dbs. The health of the persistent databases is checked after each recovery. Node monitoring and the "startup" event are deferred until all persistent databases are healthy. Databases can become healthy automatically by a completely HEALTHY node joining the cluster, or by an administrator with "ctdb backupdb/restoredb" or "ctdb wipedb". metze (This used to be ctdb commit 15f133d5150ed1badb4fef7d644f10cd08a25cb5)	2009-12-16 08:06:10 +01:00
Stefan Metzmacher	b74918b465	server: open /var/ctdb/state/persistent_health.tdb.X on startup This node internal tdb will store the HEALTH state of persistent tdbs. metze (This used to be ctdb commit cbda4666be88c11a810a192a70667b57f773ace1)	2009-12-16 08:03:56 +01:00
Stefan Metzmacher	9a96ae0c97	server: only do the mkdir() calls for db_directory* once at the start metze (This used to be ctdb commit f30f33685db50860b6cd6fd1b6bdc3066620a78f)	2009-12-16 08:03:56 +01:00
Stefan Metzmacher	b48228e7f9	server: add db_directory_state to ctdb_context metze (This used to be ctdb commit 656a6ec5ed81ccfbb86144156a3158e48f105ee4)	2009-12-16 08:03:55 +01:00
Ronnie Sahlberg	640c48c844	Revert "cleanup: remove a tunable we no longer use in the eventscripts any more :" This reverts commit 401f421fa003d9515df15e759b50b56e0c67d69c. Conflicts: include/ctdb_private.h server/ctdb_tunables.c (This used to be ctdb commit b883d19a495a41a22db37f9c2cf6250fee529de0)	2009-12-16 09:51:17 +11:00
Ronnie Sahlberg	0982299bed	Revert "Make fetch_locked more scalable" This reverts commit 5736e17c139c9a8049e235429aeae0c6c9d0e93d. (This used to be ctdb commit 3d2d877d877146ca09a28a3a44f4840eb36fd377)	2009-12-15 14:26:28 +11:00
Ronnie Sahlberg	5a7e9900df	Merge commit 'obnox/ctdb-wip-trans3' into trans3 (This used to be ctdb commit ac06a0e042e7d024060d6e87a49bda9ccc072c52)	2009-12-15 14:25:55 +11:00
Ronnie Sahlberg	649ba2631d	Rename the tunable EventScriptBanCount to EventScriptTimeoutCount since we no longer ban nodes when dodgy scripts continue to hang. We now only mark nodes as unhealthy if monitor events fail or timeout. Never ban. (This used to be ctdb commit 5c8e56fc7a518e115bceac257867739283cf6a1e)	2009-12-14 15:53:23 +11:00
Ronnie Sahlberg	ed6b5a8c68	cleanup: remove a tunable we no longer use in the eventscripts any more : EventScriptUnhealthyOnTimeout (This used to be ctdb commit 401f421fa003d9515df15e759b50b56e0c67d69c)	2009-12-14 15:48:47 +11:00
Ronnie Sahlberg	e76561f544	remove the variable "disable when unhealthy" there is no rational need for a setting where we permanently mark nodes as disabled every time an eventscript fails (This used to be ctdb commit 68a8ee99b128a5ec883600735626bdb3bbc9c503)	2009-12-14 15:40:54 +11:00
Volker Lendecke	f6ea3e6bcf	Make fetch_locked more scalable This patch improves the handling of the fetch_lock operation on non-persistent databases that ctdb clients have to do very frequently. The normal flow how this goes is the following: 1. Client does a local fetch_lock on the database 2. Client looks if the local node is dmaster. If yes, everything is fine If no, continue here 3. Client unlocks the local record 4. Client issues a "get me the record" call to ctdbd 5. ctdbd goes out and fetches the dmaster role 6. ctdbd tells the client to retry 7. Client starts over again The problem is between step 6 and 7: Before the client has had the chance to retry (i.e. catch the record with a fetch_locked), another node might have come asking ctdbd to migrate away the record again. This is a real problem, I've seen >20 loops of this kind in real workloads. This patch does the following: Whenever ctdb receives a record as result of step 5, it puts the key on a "holdback list". As long as a key is on this list, a request to migrate away the dmaster is put on hold. It is the client's duty to issue the "CTDB_CONTROL_GOTIT" control when it has successfully done step 2 after having asked ctdb to fetch the record. This will release the key from the "holdback list" and re-issue all dmaster migration requests. As a safeguard against malicious clients, once a second (default 1000msecs, tunable "HoldbackCleanupInterval" in milliseconds) ctdbd goes over the list of held back keys, deletes them and releases all held back migration requests. (This used to be ctdb commit 5736e17c139c9a8049e235429aeae0c6c9d0e93d)	2009-12-12 00:45:39 +01:00
Michael Adam	46de365e78	Add a new control CTDB_GET_DB_SEQNUM - fetch a persistent db's sequence number. Michael (This used to be ctdb commit a7e3b5fac6b3f5d74473f26eb86c067b35647996)	2009-12-12 00:45:39 +01:00
Michael Adam	8dedde81cd	define CTDB_DB_SEQNUM_KEY - used with the new implementation of transactions. Michael (This used to be ctdb commit 4b1dbcf0853bdc4832d39a477823ae34f216da52)	2009-12-12 00:45:38 +01:00
Volker Lendecke	24d04a3e89	Rename a struct member for clarity (This used to be ctdb commit 6af5e74a21546d723008d69d6752ebebf898c947)	2009-12-12 00:45:37 +01:00
Michael Adam	faacd5ca79	server: add a new control CTDB_CONTROL_TRANS3_COMMIT This is a simplified version of the trans2 commit control: It just rolls out the marshall buffer to all active nodes. It is the main ctdbd part of the re-implementation of the persistent transactions. The client code is changed to take a global lock to start a transactions and store into the marshal buffer instead of writing to the local tdb under a local transaction. The old transaction implementation is going to be removed in a later commit. Michael (This used to be ctdb commit f66428f9d2013080a414404c1ba6117888352fd6)	2009-12-12 00:43:26 +01:00
Ronnie Sahlberg	a8549ef700	From: Volker Lendecke <vl@samba.org> Date: Wed, 9 Dec 2009 22:45:12 +0100 Subject: [PATCH] Revert an accidental commit (This used to be ctdb commit af6656f2844d8fd72204a70358c9d589dbe1bd34)	2009-12-10 08:53:55 +11:00
Volker Lendecke	a0d9bd3c13	Run only one event for each epoll_wait/select call This might be a bit less efficient, but experience in winbind has shown that event callbacks can trigger changes in the socket state in very hard to diagnose ways. (This used to be ctdb commit a78b8ea7168e5fdb2d62379ad3112008b2748576)	2009-12-10 07:52:16 +11:00
Rusty Russell	a46c3b4f2a	ctdb: scriptstatus can now query non-monitor events We also no longer return an error before scripts have been run; a special zero-length data means we have never run the scripts. "ctdb scriptstatus all" returns all event script results. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 9b90d671581e390e2892d3a68f3ca98d58bef4df)	2009-12-08 01:50:55 +10:30
Rusty Russell	5d99a1a47c	eventscript: expose call names and enum We're going to need this so ctdb can query non-monitor status. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 53bc5ca23ca55a3ac63a440051f16716944a2a51)	2009-12-08 01:47:13 +10:30
Rusty Russell	d3593c2f83	eventscript: save state for all script invocations Rather than only transferring to last_status for monitor events, do it for every event (ctdb->last_status is now an array). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit c73ea56275d4be76f7ed983d7565b20237dbdce3)	2009-12-08 12:27:48 +10:30
Rusty Russell	9753b7e793	eventscript: rename ctdb_monitoring_wire to ctdb_scripts_wire We're going to allow fetching status of all script runs, so this name is no longer appropriate. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit f5cb41ecf3fa986b8af243e8546eb3b985cd902a)	2009-12-08 00:51:24 +10:30
Rusty Russell	23e24c503c	eventscript: ctdb_fork_with_logging() A new helper function which sets up an event attached to the child's stdout/stderr which gets routed to the logging callback after being placed in the normal logs. This is a generalization of the previous code which was hardcoded to call ctdb_log_event_script_output. The only subtlety is that we hang the child fds off the output buffer; the destructor for that will flush, which means it has to be destroyed before the output buffer is. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 32cfdc3aec34272612f43a3588e4cabed9c85b68)	2009-12-08 12:44:30 +10:30
Rusty Russell	c309d22f9a	eventscript: remove unused ctbd_ctrl_event_script* The child no longer uses ctdb_ctrl_event_script_init or ctdb_ctrl_event_script_finished, and the others are redundant: it doesn't need to tell us it's starting a script when it only runs one. We move start and stop calls to the parent, and eliminate the RPC infrastructure altogether. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 391926a87a7af73840f10bb314c0a2f951a0854c)	2009-12-08 00:27:40 +10:30
Rusty Russell	b8e347ec9c	eventscript: use direct script state pointer for current monitor We put a "scripts" member in ctdb_event_script_state, rather than using a special struct for monitor events. This will fit better as we further unify the different events, and holds the reports from the child process running each monitor script. Rather than making the monitor state a child of current_monitor_status_ctx, we just point current_monitor directly at it. This means we need to reset that pointer in the destructor for ctdb_event_script_state. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 9a2b4f6b17e54685f878d75bad27aa5090b4571f)	2009-12-08 00:14:01 +10:30
Rusty Russell	a4c2a98ba9	eventscript: make current_monitor_status_ctx serve as monitor_event_script_ctx We have monitor_event_script_ctx and other_event_script_ctx, and current_monitor_status_ctx in struct ctdb_context. This seems more complex than it needs to be. We use a single "event_script_ctx" as parent for all event script state structures. Then we explicitly reparent monitor events under current_monitor_status_ctx: this is freed every script invocation to kill off any running scripts anyway. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 0d925e6f2767691fa561f15bbb857a2aec531143)	2009-12-08 00:09:20 +10:30
Rusty Russell	5190932507	eventscript: expose ctdb_ban_self() eventscript.c uses this now, but our next patch makes others use it. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit a305cb7743c24386e464f6b2efab7e2108bb1e7e)	2009-12-07 23:18:40 +10:30
Rusty Russell	b9b75bd065	eventscript: use -ENOEXEC for disabled status value This unifies code paths and simplifies things: we just hand -ENOEXEC to ctdb_ctrl_event_script_stop(). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit eadf5e44ef97d7703a7d3bce0e7ea0f21cb11f14)	2009-12-07 23:11:47 +10:30
Rusty Russell	066a791770	eventscript: use -ETIME for timeout status value This starts the move toward more expressive encoding of return values: positive values mean the script ran, negative means we had a problem with the script (and the value is the errno). This does timeout, but changes the ctdb tool to recognize it. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 0eb1d0aa14e68b598d9e281c8a02b8f94a042fd9)	2009-12-07 23:09:42 +10:30
Rusty Russell	85a6f4a4dd	eventscript: marshall onto last_status immediately This simplifies the code a little: last_status is now ready to go (it's only used by the scriptstatus command at the moment). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 6be931266a4e41fd0253f760936ad9707dd97c47)	2009-12-07 23:09:40 +10:30
Michael Adam	0635f8b98f	make ctdb_ctrl_transaction_active public. Michael (This used to be ctdb commit e5496a83ef4a01604195b27c4b97f50d4979510e)	2009-12-04 11:30:22 +01:00
Ronnie Sahlberg	e28c652cca	Don't store debug level DEBUG_DEBUG in the in-memory ringbuffer. It is unlikely we will need something this verbose for normal troubleshooting. This allows us to keep a significantly longer time interval of log messages in the 500k slots available in the ringbuffer. (This used to be ctdb commit cc99c05c0c6484ad574039a454e6133852cb41fa)	2009-12-04 11:45:37 +11:00
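
Commits 066a791770 and b9b75bd065 above describe an encoding for event script status values: non-negative values mean the script ran, negative values mean there was a problem with the script and carry a negated errno, with -ETIME for a timeout and -ENOEXEC for a disabled script. The following is only a minimal sketch of how a caller might decode such a value. It is not ctdb code: the function name `describe_script_status` and the label strings are hypothetical, and treating 0 as success is an assumption based on the usual exit-code convention.

```python
import errno


def describe_script_status(status: int) -> str:
    """Map an event script status value to a short label.

    Hypothetical helper, not part of ctdb. Follows the convention in the
    commit messages: negative values are negated errno codes, non-negative
    values are the script's own exit code.
    """
    if status == -errno.ETIME:
        # The script did not finish within its time limit.
        return "TIMEDOUT"
    if status == -errno.ENOEXEC:
        # The script is disabled and was therefore not executed.
        return "DISABLED"
    if status < 0:
        # Some other problem running the script; the value is -errno.
        return f"FAILED TO RUN (errno {-status})"
    if status == 0:
        # Assumption: exit code 0 means the script succeeded.
        return "OK"
    return f"ERROR (exit code {status})"
```

Keeping "could not run" (negative) apart from "ran and failed" (positive) lets a status tool report timeouts and disabled scripts without reserving special exit codes for them.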

503 Commits