samba-mirror

mirror of https://github.com/samba-team/samba.git synced 2024-12-23 17:34:34 +03:00

Author	SHA1	Message	Date
Ronnie Sahlberg	c6e20a06c7	set up a handler to catch and log debug messages from the tevent layer (This used to be ctdb commit fdb4c02f595fa207310a9a48da3fefd653fa9e4b)	2010-09-28 08:30:26 +10:00
Stefan Metzmacher	0b5bd411ca	server/banning: also release all ips if we're banning ourself metze (This used to be ctdb commit c386f2c62f06f1c60047b7d4b1ec7a9eec11873c)	2010-09-14 15:50:31 +10:00
Stefan Metzmacher	5e46150490	server/recoverd: if we can't get the recovery lock, ban ourself metze (This used to be ctdb commit 80b8889267339b870868841ff077e850bc5b52e2)	2010-09-14 15:49:01 +10:00
Stefan Metzmacher	ff77985f38	server/recoverd: do takeover_run after verifying the reclock file metze (This used to be ctdb commit 93df096773c89f21f77b3bcf9aa90bf28881b852)	2010-09-14 15:48:37 +10:00
Stefan Metzmacher	96ddf2f607	server/monitor: ask for a takeoverrun after propagating our new flags metze (This used to be ctdb commit 942f44123350d4d0c4ad7f3fcd5ff2d0d175739b)	2010-09-14 15:48:10 +10:00
Ronnie Sahlberg	d8d8b9e1d7	add a new serverid to send a message every time an ip address is taken on the local node (This used to be ctdb commit 1261f3d9702800a4e59550c881350daf479f00ef)	2010-09-13 15:43:19 +10:00
Ronnie Sahlberg	19211f99c8	remove an unused variable (This used to be ctdb commit e07fdbaf12bbe84370bc47a1979fe198a06a6cc8)	2010-09-13 13:13:12 +10:00
Ronnie Sahlberg	7c682dda59	When memory allocations for recovery fail, don't dereference a null pointer while trying to print the log message for the failure. Also shut down ctdb with ctdb_fatal() (This used to be ctdb commit f8642d0438c6bbb34a72c25d6a904b626e247410)	2010-09-03 12:00:48 +10:00
Ronnie Sahlberg	c95f4258d8	Add a new event "ipreallocated". This is called every time a reallocation is performed. While STARTRECOVERY/RECOVERED events are only called when we do ipreallocation as part of a full database/cluster recovery, this new event can be used to trigger on when we just do a light failover due to a node becoming unhealthy, i.e. situations where we do a failover but we do not perform a full cluster recovery. Use this to trigger for natgw so we select a new natgw master node when failover happens and not just when cluster rebuilds happen. (This used to be ctdb commit 7f4c591388adae20e98984001385cba26598ec67)	2010-08-30 18:09:30 +10:00
Ronnie Sahlberg	ac335e3e5d	run the "init" event before we freeze the databases so that we can read from databases during this event (This used to be ctdb commit 6c93bf5a1219617bfb39b093aee3200c74c2c61a)	2010-08-25 08:35:24 +10:00
Ronnie Sahlberg	e040a966af	Don't set next_interval to 0. This can cause ctdbd to spin at 100% in the eventsystem, creating a timed event that will immediately trigger again and again. On uniprocessors this causes the eventscript we are actually waiting for to basically become CPU starved and never complete. (This used to be ctdb commit 92c8408fba957a8ded13f7e285da290502735234)	2010-08-20 15:00:45 +10:00
Ronnie Sahlberg	2e8aac6689	Merge commit 'rusty/ports-from-1.0.112' into foo (This used to be ctdb commit 13e58d92f5f1723e850a82ae030d0ca57e89b1ee)	2010-08-19 13:17:56 +10:00
Ronnie Sahlberg	4c05f1900c	Merge commit 'rusty/vacuum-fix-master' (This used to be ctdb commit dc301b324d2c14a2425a965c076113c4fe97903e)	2010-08-19 13:16:35 +10:00
Ronnie Sahlberg	5aa5f3e7bf	Remove the structure ctdb_control_tcp_vnn since this is identical to the structure ctdb_tcp_connection. Add a new "ctdb deltickle" command to delete tickles from the database. This can ONLY be used for tickles created by "ctdb addtickle". Push any "addtickle/deltickle" updates to other nodes every TickleUpdateInterval seconds. (This used to be ctdb commit acded034e2f0dcae4c2c9e54e16a001caf23caec)	2010-08-18 12:36:03 +10:00
Rusty Russell	9fbb191b78	logging: give a unique logging name to each forked child. This means we can distinguish which child is logging, esp. via syslog where we have no pid. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 68b3761a0874429b90731741f0531f76dcfbb081)	2010-08-18 11:46:32 +09:30
Rusty Russell	1a009aff73	takeover: prevent crash by avoiding free in traverse on RST timeout After 5 attempts to send a RST to a client without any response, we free "con"; this is done during a traverse. This frees the node we are walking through (the node is made a child of "con" down in rb_tree.c's trbt_create_node() (Valgrind would catch this, as Martin confirmed). So, we create a temporary parent and reparent onto that; then we free that parent after the traverse, thus deleting the unwanted nodes. CQ:S1019041 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 08f7f85477610a4916c1ec866aa467b28f1bbec3)	2010-08-18 11:40:17 +09:30
Rusty Russell	5f2d43157d	vacuum: disabling vacuuming during a freeze We shouldn't even think about vacuuming when we've frozen the database (which is earlier than when we set CTDB_RECOVERY_ACTIVE) CQ:S1018154 & S1018349 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit d8df6835a931082af232c4b94f1dede6f16169f9)	2010-08-18 11:01:52 +09:30
Rusty Russell	0b07f91d36	vacuum: fix crash on vacuum abort Martin Schwenke discovered that 517f05e42f17766b1e8db8f1f4789cbad968e304 ("freeze: abort vacuuming when we're going to freeze.") used ctdb_db for a logging message which is in fact uninitialized, causing a crash (even if it wasn't actually logged). Initialize it properly. Also fix incorrect format in another logging message introduced in that same change. CQ:S1019093 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 8e518950ba281502318d6300f7a5ec6cdf6b5674)	2010-08-18 11:00:11 +09:30
Rusty Russell	af55c910a4	freeze: abort vacuuming when we're going to freeze. There are some reports of freeze timeouts, and it looks like vacuuming might be the culprit. So we add code to tell them to abort when a freeze is going on. (This is based on the 1.0.112 branch version 517f05e42f, but far simpler since tdb is now robust against processes being killed during transaction commit) CQ:S1018154 & S1018349 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit f5d7dc679501e607c2c83a248a89d3cada9df146)	2010-08-18 10:54:28 +09:30
Ronnie Sahlberg	e8ffb0d8a4	We use eventloop nesting in a couple of places, notably the sync parts of the recovery daemon. Initialize all event contexts to allow nesting (This used to be ctdb commit 5bf6bd5e7f33aabbeb7b9707716ef99cf471e590)	2010-08-18 10:11:59 +10:00
Rusty Russell	f93440c4b7	event: Update events to latest Samba version 0.9.8 In Samba this is now called "tevent", and while we use the backwards compatibility wrappers they don't offer EVENT_FD_AUTOCLOSE: that is now a separate tevent_fd_set_auto_close() function. This is based on Samba version `7f29f817fa`. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 85e5e760cc91eb3157d3a88996ce474491646726)	2010-08-18 09:16:31 +09:30
Rusty Russell	7061ceffd8	Report client for queue errors. We've been seeing "Invalid packet of length 0" errors, but we don't know what is sending them. Add a name for each queue, and print nread. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit e6cf0e8f14f4263fbd8b995418909199924827e9)	2010-07-01 23:08:49 +10:00
Rusty Russell	70082cd669	ctdb_freeze: extend db priority hack to cover serverid.tdb deadlock. We discovered that recent smbd locks the serverid tdb while holding a lock on another tdb (locking.tdb): 7: POSIX ADVISORY WRITE smbd-2224318 locking.tdb.0 10600 10600 22: -> POSIX ADVISORY READ smbd-2224318 serverid.tdb.0 26580 26580 The result is a deadlock against the ctdb_freeze code called for recovery. We extend the "notify" workaround to this case, too. BZ:65158 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit dfdaa446cf256854ff6d267dceeb86fbee8bb188)	2010-07-01 21:46:55 +10:00
Rusty Russell	8f8959a145	speed startup: with --sloppy-start, cut initial election timeout to 1/2 second. Seconds between ctdbd first log message and node healthy: BEFORE: 4.03 AFTER: 2.02 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 8f17731dea4287d4f9b21dc58c1bdf26c8a0e628)	2010-06-22 22:55:20 +09:30
Rusty Russell	8946028a07	speed startup: add --sloppy-start. The extra recovery interval wait was introduced in 821333afb458 but no explanation was provided in that message. Nonetheless, if starting the entire cluster for the first time, it should be safe to skip this. We use the commandline arg --sloppy-start which should discourage people from using it outside testing. Seconds between ctdbd first log message and node healthy: BEFORE: 16.10 AFTER: 4.03 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 509e2e89ae233a0e91998d95267bf62f296a73cd)	2010-06-22 22:52:34 +09:30
Rusty Russell	ed31caffab	speed startup: run startup immediately after recovery finished. Seconds between ctdbd first log message and node healthy: BEFORE: 17.08 AFTER: 16.10 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 372201d418f041d69646793105f6898ab12a7d91)	2010-06-22 22:50:45 +09:30
Rusty Russell	fabeea6197	speed startup: don't wait a full recovery interval if we've already waited We currently sleep for one second, whether or not we've already slept. Change this to sleep for the remainder of the second, if at all. Seconds between ctdbd first log message and node healthy: BEFORE: 18.09 AFTER: 17.08 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit fde760b5f39c77172308a583da4c2443b71541c9)	2010-06-22 22:50:35 +09:30
Rusty Russell	eb61b11497	speed startup: immediately run first monitor event after startup. Once we've done a startup, we need to run a monitor event successfully to be marked as healthy. Rather than wait the usual 5 seconds, run it immediately (which will then reset next_interval to 5 seconds). Seconds between ctdbd first log message and node healthy: BEFORE: 23.58 AFTER: 18.09 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit c8651494febcb1c9e558b2002e2a72c2bf547c06)	2010-06-22 22:50:07 +09:30
Rusty Russell	f7efc1f8e8	speed startup: alter recovery loop We do a recovery on startup. But the code does: Sleep for ctdb->tunable.recover_interval. Check for recovery. We want to do it in the other order. This is best done by extracting the loop into a separate "main_loop" function. Seconds between ctdbd first log message and node healthy: BEFORE: 24.09 AFTER: 23.58 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 097046025176b9fcb670839d1a9f100f890e7ed2)	2010-06-22 22:50:23 +09:30
Ronnie Sahlberg	7730facc62	fix a debug message (This used to be ctdb commit 856bd6de6218d9b70baed0e6443be4253ea31afe)	2010-06-09 16:22:44 +10:00
Ronnie Sahlberg	d9a3e1d0c0	idr can timeout and wrap/be reused quite quickly. If a noremote node hangs for an extended period, it is possible that we might have a DMASTER request in flight for record A to that node. Eventually we will reuse the idr, and may reuse it for a DMASTER request to a different node for a different record B. If, while the request for B is in flight, the first node un-hangs and responds back, we would receive a dmaster reply for the wrong record. This would cause a record to become perpetually locked, since inside the daemon we would tdb_chainlock(dmaster_reply->pdu->key) but once the migration would complete we would chainunlock idr->state->call->key. Add code to verify that when we receive a dmaster reply packet, it does in fact match the exact same key as the state variable we have for the idr in flight. (This used to be ctdb commit 2f6a870d7ff02ceb61fde242f752dccbfcb4cb37)	2010-06-09 16:19:29 +10:00
Ronnie Sahlberg	641da4c691	We can not be holding a chainlock at this stage, so the tdb_chainunlock() call is bogus (a child process might be holding the lock, but not the main daemon) (This used to be ctdb commit 9b4a83e49c5df80df8498b7384c5f53f390c1d9d)	2010-06-09 15:13:22 +10:00
Ronnie Sahlberg	75f3ef154c	add extra logging for failed ctdb_ltdb_unlock() for a few more places it is called from (This used to be ctdb commit 5c0fea90c6474a51992a9c4aeb6af7dfeb213ee0)	2010-06-09 14:37:24 +10:00
Ronnie Sahlberg	fa618aa66a	add additional logging when tdb_chainunlock() fails so we can see where it was called from when it fails (This used to be ctdb commit 0c091b3db6bdefd371787d87bc749593ea8e3c76)	2010-06-09 14:37:16 +10:00
Ronnie Sahlberg	a4daf81a7c	Additional log messages when tdb databases can no longer be chainlocked or chainunlocked BZ64688 (This used to be ctdb commit b977901a49a9fed45cc8a2fe880eb749f58278f6)	2010-06-08 12:21:20 +10:00
Ronnie Sahlberg	53ea238c6c	Add a variable for start/current time to ctdb statistics and print the time the statistics were taken and for how long the statistics have been collected to the "ctdb statistics" output. (This used to be ctdb commit 1bdfe0cd3370a335b960ce1ef97eade93b0cd2fa)	2010-06-02 13:14:53 +10:00
Ronnie Sahlberg	bc208bc916	rename ctdb_set_message_handler to ctdb_client_set_message_handler to avoid a collision with the function of the same name in libctdb (This used to be ctdb commit 41dbdd4fc0ab560420fb0e24a3179ff7c94c5bb7)	2010-06-02 09:51:47 +10:00
Ronnie Sahlberg	761a075de9	rename ctdb_send_message to ctdb_client_send_message to resolve collision with the function of the same name in libctdb (This used to be ctdb commit ac3292c12832484a22715f1d46aa23f3b7c8a6f6)	2010-06-02 09:45:21 +10:00
Ronnie Sahlberg	4136f27145	When adding an ip at runtime, it might not yet have an iface assigned to it, so ensure that the next takover_ip call will fall through to accept the ip and add it. (This used to be ctdb commit 2d60f96680d16c2992e2a35517822f88c12538b7)	2010-06-01 16:22:48 +10:00
Ronnie Sahlberg	92340e4d6f	check if vnn is a valid pointer before dereferencing it based on rustys patch for bz62783 (This used to be ctdb commit bdd250b9afdd1060cfd1e2b0f0a5a567150bb380)	2010-05-26 13:43:28 +10:00
Ronnie Sahlberg	0d46488f6e	Merge commit 'rusty/libctdb2' (This used to be ctdb commit d41b802250ddc0a89581eb6285edfd66bdc7a78a)	2010-05-25 12:48:49 +10:00
Ronnie Sahlberg	6578a97bd9	It was possible for ->recovery_mode to get out of sync with the new three db priorities in such a way that ->recovery_mode was set to normal but database priorities level 2 or 3 were still set to frozen, causing the recovery daemon to fail to detect that a recovery was needed to recover access to the database. BZ63951 (This used to be ctdb commit 7411b2b577a16f85ad6913e1bfccce7ea260a613)	2010-05-25 12:45:54 +10:00
Rusty Russell	d5f6026a22	libctdb: reorganize headers: remove ctdb.h, add ctdb_client.h and ctdb_protocol.h ctdb_client.h is the existing internal client interface (which was mainly in ctdb.h), and ctdb_protocol.h is the information needed for the wire protocol only. ctdb.h will be the new, shiny, libctdb API. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 4bba6b8cd47b352f98d41f9f06258d5ac3c9adef)	2010-05-20 15:18:30 +09:30
Ronnie Sahlberg	6f1221e9e1	Add the number of performed recoveries to the "ctdb statistics" output. (This used to be ctdb commit fa045733cb81412f0d02ab52d74eabc7efca8b3d)	2010-05-11 09:44:53 +10:00
Ronnie Sahlberg	7a62592fc5	when performing a recovery, ensure that all nodes use the same reclock file setting as the recovery master (This used to be ctdb commit 26793ad42b77c2328a00ac9a12bca813c7425245)	2010-05-06 09:33:08 +10:00
Ronnie sahlberg	46f00a2478	Merge commit 'rusty/signal-fix' (This used to be ctdb commit 221a9bb41c3a7af0cc65cda78365010893ca1430)	2010-05-03 15:57:41 +10:00
Ronnie Sahlberg	62742bd337	Don't check ip assignment across the cluster while ip-verification checks are disabled (This used to be ctdb commit 189f4a5af1053271b0834522e35c336df959aa03)	2010-05-03 15:52:02 +10:00
Ronnie Sahlberg	4a43428440	The recent change to the recovery daemon to keep track of and verify that all nodes agree on the most recent ip address assignments broke "ctdb moveip ..." since that call would never trigger a full takeover run and thus would immediately trigger an inconsistency. Add a new message to the recovery daemon where we can tell the recovery daemon to update its assignments. BZ62782 (This used to be ctdb commit e7069082e5f0380dcddee247db8754218ce18cab)	2010-05-03 15:47:17 +10:00
Ronnie Sahlberg	c3c7aa934f	Make create_merged_ip_list() a static function since it is not called from outside of ctdb_takeover.c (This used to be ctdb commit 880896a27adfdd5173b2810b6b2f3889802046f0)	2010-05-03 15:47:06 +10:00
Ronnie Sahlberg	79fac9771d	In the log message when we have found an inconsistent ip address allocation, add extra log information about what the inconsistency is. (This used to be ctdb commit d2e4a9912c4bd13eb4f12681adebe7e59a6d1fb2)	2010-05-03 15:46:36 +10:00

780 Commits