samba-mirror

mirror of https://github.com/samba-team/samba.git synced 2025-01-27 14:04:05 +03:00

Author	SHA1	Message	Date
Martin Schwenke	697fcfd15a	Test suite: handle change to disconnected node error message. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 20ea31e4ed893eb58cb2efa0b6fb13bcf4031918)	2010-07-30 16:49:51 +10:00
Ronnie Sahlberg	ddb1c74066	Add a code-style document. Shamelessly sto^H^H^Hborrowed from samba3. (This used to be ctdb commit 8024d9e2d589bfe4dee1cb9a79bec663738cb7fa)	2010-07-30 16:37:22 +10:00
Stefan Metzmacher	794230775c	events/10.interface: we need to mark interfaces as "up" if we don't know how to monitor them metze (This used to be ctdb commit 1e08d1578d1960fcfc5fdd85492fbd6d194e5e94)	2010-07-30 16:33:27 +10:00
Ronnie Sahlberg	c5de7cfb8c	Merge commit 'rusty/master' (This used to be ctdb commit b4391c00476cde74101736986dfcd2be6c959edc)	2010-07-30 16:25:40 +10:00
Evan Kinney	0557c418e3	ctdb: Fixed use of reserved word "private" in typedefs In include/ctdb.h, ctdb_callback_t and ctdb_rrl_callback_t were defined with a void private variable. The variable name was changed to void private_data to avoid issues encountered in the Samba autoconf script. Evan Kinney <evan.kinney@sas.com> (This used to be ctdb commit 1f453aa4b5e749468c7788afac09c6f0900ea18f)	2010-07-29 17:16:36 +10:00
Stefan Metzmacher	7b1345d446	config/interface_modify.sh: do the echo before running the script metze Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit bb1d2bd31073304fc203868517144f61d12b7fc2)	2010-07-15 15:06:51 +09:30
Stefan Metzmacher	3b9eeb1049	config/interface_modify.sh: before calling a script check if it exists and is executable For non bash shells $_s_script might end with '/*'. We do the workaround this way, because it makes sense to check that a script is executable, before trying to execute it. metze [ This actually applies to any shell -- Rusty Russell ] Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit e665cfde03fc9ec2264e99512ed5470872a2fd04)	2010-07-15 15:06:39 +09:30
Rusty Russell	34ce8a4f02	config: wrap iptables in flock to avoid concurrency. When doing a releaseip event, we do them in parallel for all the separate IPs. This creates a problem for iptables, which isn't reentrant, giving the strange message: iptables encountered unknown error "18446744073709551615" while initializing table "filter" The worst possible symptom of this is that releaseip won't remove the rule which prevents us listening to clients during releaseip, and the node will be healthy but non-responsive. The simple workaround is to flock-wrap iptables. Better would be to rework the code so we didn't need to use iptables in these paths. CQ:S1018353 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 72d6914ee913272312d7b68f1be5ad05ad06587d)	2010-07-15 10:45:24 +09:30
Rusty Russell	61d3e09632	ctdb: fix crash on "ctdb scriptstatus --events=releaseip" Martin accidentally typed this instead of "ctdb scriptstatus releaseip" and it crashes. CQ:S1018859 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 70877b2e7f8fd0d46899bbeca2c6caad6e6e6820)	2010-07-12 16:08:37 +09:30
Rusty Russell	145b09c9a8	version: generate RPM version from git This unifies our RPM version handling, based on tags. 1) Tags are of form ctdb-<version>. 2) The first <version> starts with .1. 3) Devel versions end with .0.<patchnum>.<checksum>.devel to reliably identify them. This means that devel versions will correctly supersede releases and earlier devels, but new releases will correctly supersede older devel RPMs. Making a new release is as simple as creating a new git tag. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 44009e02a661d4a1e14246f650974fc4ed7a07c9)	2010-07-02 13:22:20 +10:00
Rusty Russell	7061ceffd8	Report client for queue errors. We've been seeing "Invalid packet of length 0" errors, but we don't know what is sending them. Add a name for each queue, and print nread. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit e6cf0e8f14f4263fbd8b995418909199924827e9)	2010-07-01 23:08:49 +10:00
Rusty Russell	1bbd6e2b13	tdb: improve logging When tdb throws an error, we didn't report the name of the tdb; we should. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit cfea357c9b2142c8cd8cac1ee712d40b188793e1)	2010-07-01 18:33:18 +10:00
Rusty Russell	70082cd669	ctdb_freeze: extend db priority hack to cover serverid.tdb deadlock. We discovered that recent smbd locks the serverid tdb while holding a lock on another tdb (locking.tdb): 7: POSIX ADVISORY WRITE smbd-2224318 locking.tdb.0 10600 10600 22: -> POSIX ADVISORY READ smbd-2224318 serverid.tdb.0 26580 26580 The result is a deadlock against the ctdb_freeze code called for recovery. We extend the "notify" workaround to this case, too. BZ:65158 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit dfdaa446cf256854ff6d267dceeb86fbee8bb188)	2010-07-01 21:46:55 +10:00
Rusty Russell	8f8959a145	speed startup: with --sloppy-start, cut initial election timeout to 1/2 second. Seconds between ctdbd first log message and node healthy: BEFORE: 4.03 AFTER: 2.02 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 8f17731dea4287d4f9b21dc58c1bdf26c8a0e628)	2010-06-22 22:55:20 +09:30
Rusty Russell	8946028a07	speed startup: add --sloppy-start. The extra recovery interval wait was introduced in 821333afb458 but no explanation was provided in that message. Nonetheless, if starting the entire cluster for the first time, it should be safe to skip this. We use the commandline arg --sloppy-start which should discourage people from using it outside testing. Seconds between ctdbd first log message and node healthy: BEFORE: 16.10 AFTER: 4.03 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 509e2e89ae233a0e91998d95267bf62f296a73cd)	2010-06-22 22:52:34 +09:30
Rusty Russell	ed31caffab	speed startup: run startup immediately after recovery finished. Seconds between ctdbd first log message and node healthy: BEFORE: 17.08 AFTER: 16.10 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 372201d418f041d69646793105f6898ab12a7d91)	2010-06-22 22:50:45 +09:30
Rusty Russell	fabeea6197	speed startup: don't wait a full recovery interval if we've already waited We currently sleep for one second, whether or not we've already slept. Change this to sleep for the remainder of the second, if at all. Seconds between ctdbd first log message and node healthy: BEFORE: 18.09 AFTER: 17.08 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit fde760b5f39c77172308a583da4c2443b71541c9)	2010-06-22 22:50:35 +09:30
Rusty Russell	eb61b11497	speed startup: immediately run first monitor event after startup. Once we've done a startup, we need to run a monitor event successfully to be marked as healthy. Rather than wait the usual 5 seconds, run it immediately (which will then reset next_interval to 5 seconds). Seconds between ctdbd first log message and node healthy: BEFORE: 23.58 AFTER: 18.09 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit c8651494febcb1c9e558b2002e2a72c2bf547c06)	2010-06-22 22:50:07 +09:30
Rusty Russell	f7efc1f8e8	speed startup: alter recovery loop We do a recovery on startup. But the code does: Sleep for ctdb->tunable.recover_interval. Check for recovery. We want to do it in the other order. This is best done by extracting the loop into a separate "main_loop" function. Seconds between ctdbd first log message and node healthy: BEFORE: 24.09 AFTER: 23.58 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 097046025176b9fcb670839d1a9f100f890e7ed2)	2010-06-22 22:50:23 +09:30
Ronnie Sahlberg	1bd7982602	Wrap the IDR early, but not too early. We dont want it to wrap almost immediately so that basically all "ctdb ..." commands log the "Reqid wrap" warning. (This used to be ctdb commit f26b59d8b96a70baa80ab1bad406ee6a21330b68)	2010-06-10 14:30:38 +10:00
Ronnie sahlberg	eee814ab47	Merge commit 'rusty/idtree' (This used to be ctdb commit 069db55ea6fa6b8dd278b880c1a325e259f3e172)	2010-06-10 13:33:14 +10:00
Rusty Russell	5f9e4b60ae	Delay reusing ids to make protocol more robust Ronnie and I tracked down a bug which seems to be caused by a node running so slowly that we timed out the request and reused the request id before it responded. The result was that we unlocked the wrong record, leading to the following: ctdbd: tdb_unlock: count is 0 ctdbd: tdb_chainunlock failed smbd[1630912]: [2010/06/08 15:32:28.251716, 0] lib/util_sock.c:1491(get_peer_addr_internal) ctdbd: Could not find idr:43 ctdbd: server/ctdb_call.c:492 reqid 43 not found This exact problem is now detected, but in general we want to delay id reuse as long as possible to make our system more robust. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 9eb9c53ef29f4871ae2fe62fc5cb6145fca89eed)	2010-06-10 08:58:55 +09:30
Rusty Russell	b31b0d79c3	idtree: fix handling of large ids (eg INT_MAX) Since idtree assigns sequentially, it rarely reaches high numbers. But such numbers can be forced with idr_get_new_above(), and that reveals two bugs: 1) Crash in sub_remove() caused by pa array being too short. 2) Shift by more than 32 in _idr_find(), which is undefined, causing the "outside the current tree" optimization to misfire and return NULL. Signed-off-by: Rusty Russell <rusty@rustorp.com.au> (This used to be ctdb commit 32c04e11ebbcf8239e47016302c6ce802a8b0a6f)	2010-06-10 08:55:56 +09:30
Ronnie Sahlberg	7730facc62	fix a debug message (This used to be ctdb commit 856bd6de6218d9b70baed0e6443be4253ea31afe)	2010-06-09 16:22:44 +10:00
Ronnie Sahlberg	d9a3e1d0c0	idr can timeout and wrap/be reused quite quickly. If a noremote node hangs for an extended period, it is possible that we might have a DMASTER request in flight for record A to that node. Eventually we will reuse the idr, and may reuse it for a DMASTER request to a different node for a different record B. If while the request for B is in flight, the first tnode un-hangs and responds back we would receive a dmaster reply for the wrong record. This would cause a record to become perpetually locked, since inside the daemon we would tdb_chainlock(dmaster_reply->pdu->key) but once the migration would complete we would chainunlock idr->state->call->key Adding code to verify that when we receive a dmaster reply packet that it does in fact match the exact same key that the state variable we have for the idr in flight. (This used to be ctdb commit 2f6a870d7ff02ceb61fde242f752dccbfcb4cb37)	2010-06-09 16:19:29 +10:00
Ronnie Sahlberg	641da4c691	We can not be holding a chainlock at this stage, so the tdb_chainunlock() call is bogus ( a child process might be holding the lock, but not the main daemon) (This used to be ctdb commit 9b4a83e49c5df80df8498b7384c5f53f390c1d9d)	2010-06-09 15:13:22 +10:00
Ronnie Sahlberg	75f3ef154c	add extra logging for failed ctdb_ltdb_unlock() for a few more places it is called from (This used to be ctdb commit 5c0fea90c6474a51992a9c4aeb6af7dfeb213ee0)	2010-06-09 14:37:24 +10:00
Ronnie Sahlberg	fa618aa66a	add additional logging when tdb_chainunlock() fails so we can see where it was called from when it fails (This used to be ctdb commit 0c091b3db6bdefd371787d87bc749593ea8e3c76)	2010-06-09 14:37:16 +10:00
Ronnie Sahlberg	f6446adde3	print the db name when a chainunlock fails too (This used to be ctdb commit 7932156d7f25870e6937faca08bf75d3cdbad2e5)	2010-06-09 14:37:08 +10:00
Ronnie Sahlberg	64f2d69e4b	when tdb_chainunlock() fails, print the tdb error that occurred (This used to be ctdb commit dcdd2010905b9007fbf7ab71f576cfbd48acce8a)	2010-06-09 14:36:59 +10:00
Ronnie Sahlberg	5699091e9a	Some "ctdb ..." commands can be run without having the main daemon running. In that case, when the main daemon is not running the ctdb context will be initialized to NULL, since we can not connect. Move the calls to read the ctdb socketname and connecting via libctdb to only happen when we are executing a "ctdb ..." command that requires that we talk to the actual daemon. Otherwise we will get an ugly SEGV for the "ctdb ..." commandline tool when trying to run a command that is supposed to work also when the daemon is down. (This used to be ctdb commit 18168da84a6aa8d69465e43402444c7ec979604a)	2010-06-09 09:17:35 +10:00
Rusty Russell	e169d689b9	libctdb: connect TDB logging to our logging A simple connector function, made a bit more complex because TDB adds a '\n' and we don't. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit ae5b89dca00ca080c70868430fa54ba07bd6f5f4)	2010-06-08 18:09:42 +09:30
Rusty Russell	662807b807	libctdb: always check header hasn't changed on local tdb The code on which this is based could alter the header: a normal client can't. If we use this differently later we can change this. For the moment it's a nice extra check. We optimize out the record write altogether when the record hasn't changed, rather than just suppressing the seqnum update. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 2638dbae7bf1a35ed37802e35e179e435a5d622a)	2010-06-08 18:10:36 +09:30
Rusty Russell	7589b58138	libctdb: more bool conversion, and accompany lock by ctdb_db in API I missed some int->bool conversions previously, particularly the return of ctdb_writerecord(). By always handing functions ctdb_connection or ctdb_db, we keep it consistent with the rest of the API and can do extra lock consistency checks. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 3f939956ddd693cba6ea5c655288f4f5ca95f768)	2010-06-08 17:11:40 +09:30
Rusty Russell	866cca9637	libctdb: clarify logging levels Now we have more messages, it seems to make sense to document their usage and make them consistent. In particular, LOG_CRIT for internal libctdb problems, LOG_ALERT for API misuse. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit a6fed3f577c7ec51df38ed15ecb9db6ea2ae7c8f)	2010-06-08 16:53:17 +09:30
Rusty Russell	7aa68b02b2	libctdb: use magic to detect free/invalid locks Rather than using a binary, we use a magic value for locking. We also split out the "dont have the lock yet" from the "do have the lock" paths for clarity and extra checking. This should detect a superset of the previous case, even if they free (and reuse) the lock memory. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit dc081d40051b9204bb38e4de7dfe8d78656593d0)	2010-06-08 16:52:23 +09:30
Ronnie Sahlberg	a4daf81a7c	Additional log messages when tdb databases can no longer be chainlocked or chainunlocked BZ64688 (This used to be ctdb commit b977901a49a9fed45cc8a2fe880eb749f58278f6)	2010-06-08 12:21:20 +10:00
Ronnie Sahlberg	a530344b84	In ctdb_writerecord() Verify that the lock is still held and refuse the write otherwise. We have to guarantee that we dont write to an unlocked record. If we write to a record after it has been released, the record may have already migrated off the node, in which case we get a DMASTER split brain for this record. (These application bugs are incredibly hard to track down) (This used to be ctdb commit f62c7e44dc303f274bbc1dd59fad2167e72a2af0)	2010-06-05 15:43:01 +10:00
Ronnie Sahlberg	b9e5c8a47b	Split ctdb_release_lock() into a function to release the lock and another function to free the data structures. This allows us to keep the data structure valid after the lock has been released by the application and we can trap and warn when the application is accessing the lock after it has been released. I.e. application bugs. (This used to be ctdb commit 463a266205f145cd9c4c36b9c59d3747eeef0e2e)	2010-06-05 15:38:11 +10:00
Ronnie Sahlberg	6e0d612750	update "ctdb pnn" to use the new return value for _recv() where bool false means failure and true means success. (This used to be ctdb commit 8fec60cb92d26886d853c918b8bc7931fec46469)	2010-06-05 14:38:01 +10:00
Ronnie Sahlberg	2e2211575a	Must initialize ctdb->locks or else bad things happen (This used to be ctdb commit 9ec0b9bb148327a40e439d9c643c9d2ff93ce598)	2010-06-05 14:27:46 +10:00
Ronnie Sahlberg	433bc560fb	Update the ctdb tool to use the new signature for ctdb_connect() (This used to be ctdb commit ced3bc40f841d353bc86a6ee9dd1868473223f52)	2010-06-05 14:21:42 +10:00
Rusty Russell	3510980049	libctdb: documentation Full documentation for all the functions. This looks longer than it is, because it sorts them into async and sync parts, and also renames some formal parameters. Added TODO to libctdb directory to track our plans. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 108e9c2450876a9f8821aa7efd5be971eee5afd3)	2010-06-04 20:30:08 +09:30
Rusty Russell	c5b4768816	libctdb: use values from ctdb_protocol.h, don't re-declare We're best off including ctdb_protocol.h to get these, even if we document the important ones in ctdb.h. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit cdc19dc73032470d57f38bf825d8113b3a0c8cd1)	2010-06-04 20:22:03 +09:30
Rusty Russell	3a569c14bc	libctdb: use bool in API Return bool instead of -1/0; that's what the young kids are doing these days! Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit e285b5d5a9d4fbc4f75dbb237d2fcdbd84f2d605)	2010-06-04 20:19:25 +09:30
Rusty Russell	62df8f9a91	libctdb: track lock for each ctdb_db, complain if they hold too long. In particular, this stops them grabbing two (with wrappers so we can enhance this logic once we support threads), and warns them if they re-enter ctdb_service() holding a lock (you are not supposed to block!). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit c620cfbad3b5f0d6330ef47f572d4ade08e169e8)	2010-06-04 19:41:42 +09:30
Rusty Russell	d77a1e9399	patch libctdb-use-logging.patch (This used to be ctdb commit fecb8a19e97f6e453066461b234acdb0946bbadd)	2010-06-04 20:27:06 +09:30
Rusty Russell	379fd4e606	libctdb: add logging infrastructure This is based on Ronnie's work, merged with mine. That means errors are all my fault. Differences from Ronnie's: 1) use syslog's LOG_ levels directly. 2) typesafe arg to log function, and use it (eg stderr) in helper function. 3) store fn in ctdb context, and expose ctdb_log_level directly thru API. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 86259aa395555aaf7b2fae7326caa2ea62961092)	2010-06-04 20:27:03 +09:30
Rusty Russell	cc8435852c	libctdb: add ctdb arg to more functions. This is going to help for logging, since we want it there. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 0786152472bc43efae4c896f7c6c07c6e080b9b2)	2010-06-04 16:54:08 +09:30
Ronnie Sahlberg	e8fb04de01	Readrecordlock changes: Make the use of ctdb_release_lock() mandatory from the callback. Split ctdb_release_lock() in two, release the tdb lock in the ctdb_release_lock() function and move the freeing of the lock structure to ctdb_free_lock() which is private to libctdb. When the callback returns, verify that the callback has actually released the lock and warn (FIXME) if not. Update ctdb_writerecord to warn and fail (FIXME) if writing while the lock is not held. (This used to be ctdb commit 87dc18a3a051da04685f14529c53c428d37c2912)	2010-06-04 14:47:06 +10:00

2940 Commits