samba-mirror

mirror of https://github.com/samba-team/samba.git synced 2024-12-24 21:34:56 +03:00

Author	SHA1	Message	Date
Ronnie Sahlberg	0dc5584101	Merge branch 'master-readonly-records' into foo Conflicts: Makefile.in tools/ctdb.c (This used to be ctdb commit 0fedef0ffba4178126eee9544c5e2db52f5db893)	2011-09-12 09:34:34 +10:00
Ronnie Sahlberg	1c05db2c9c	Merge remote branch 'ddiss/master_pmda_and_client_timeouts' (This used to be ctdb commit 7bebfc7bad8f36e54003b8e25372fdaf54836e21)	2011-09-08 11:22:53 +10:00
David Disseldorp	0628d1c0e6	client: add req timeout argument to ctdb_cmdline_client Following connection to the local ctdbd, ctdb_cmdline_client() currently issues a CTDB_CONTROL_GET_PNN request with a fixed 3 second timeout. The ctdb cmd line client accepts a --timelimit argument for specifying a per request timeout, pass this value through to ctdb_cmdline_client() for use as a CTDB_CONTROL_GET_PNN request timeout. (This used to be ctdb commit 0634d0305f42f17048b6830733767e8dc300e11c)	2011-09-06 13:56:54 +02:00
Ronnie Sahlberg	64378fea58	Check interfaces: when reading the public addresses file to create the vnn list check that the actual interface exist, print error and fail startup if the interface does not exist. (This used to be ctdb commit cd33bbe6454b7b0316bdfffbd06c67b29779e873)	2011-09-06 16:11:00 +10:00
Ronnie Sahlberg	1bbd4cbf35	ReadOnly: Add a ctdb_ltdb_fetch_readonly() helper function (This used to be ctdb commit 8551420fb331dd2a897f4619278a981fcefb96e8)	2011-08-23 10:33:17 +10:00
Ronnie Sahlberg	f924b3f40e	ReadOnly: Add helper functions to manipulate a TDB_DATA as a bitmap for nodes that we are tracking as having a readonly delegation (This used to be ctdb commit d10084e62d37674bb8d9e31d457fd23e050545be)	2011-08-23 10:09:42 +10:00
Volker Lendecke	fff653d126	Remove an unused variable Signed-off-by: Michael Adam <obnox@samba.org> (This used to be ctdb commit 04c3d9c7c9ffa8bb95b0bf1513fd79f6c1096a2f)	2011-08-22 17:11:07 +02:00
David Disseldorp	e097b7f8ff	io: Make queue_io_read() safe for reentry queue_io_read() may be reentered via the queue callback, recoverd is particularly guilty of this. queue_io_read() is not safe for reentry if more than one packet is received and partial chunks follow - data read off the pipe on re-entry is assumed to be the start-of-packet four byte length. This leads to a wrongly aligned stream and the notorious "Invalid packet of length 0" errors. This change fixes queue_io_read() to be safe under reentry, only a single packet is processed per call. https://bugzilla.samba.org/show_bug.cgi?id=8319 (This used to be ctdb commit 9ea41d2fab612772f861270c8a59c01c43bd3a4c)	2011-08-05 14:27:18 +10:00
Michael Adam	9c91e16955	ltdb: add the CTDB_REC_FLAG_AUTOMATIC to the initial header in ctdb_ltdb_fetch() Signals that this record was not created by a client level store. (This used to be ctdb commit 69d34983a37b0324ff7610b8dfdcd8d13bf81c54)	2011-03-14 13:35:51 +01:00
Michael Adam	9e8d6b82b5	server: Use the ctdb_ltdb_store_server() in the ctdb daemon for non-persistent dbs This is realized by adding a ctdb_ltdb_store_fn function pointer to the db context and filling it in the attach procedure for non-persistent dbs. (This used to be ctdb commit df49ec44de80affa5ccc637dec12a20a26e8706e)	2011-03-14 13:35:50 +01:00
Ronnie Sahlberg	b57bd0f896	Remove LACOUNT and LACCESSOR and migrate the records immediately. This concept didnt work out and it is really just as expensive as a full migration anyway, without the benefit of caching the data for subsequence accesses. Now, migrate the records immediately on first access. This will be combined with a "cheap vacuum-lite" for special empty records to prevent growth of databases. Later extensions to mimic read-only behaviour of records will include proper shared read-only locking of database records, making the laccessor/lacount read-only access to the data obsolete anyway. By removing this special case and handling of lacount laccessor makes the codapath where shared read-only locking will be be implemented simpler, and frees up space in the ctdb_ltdb header for use by vacuuming flags as well as read-only locking flags. (This used to be ctdb commit 155dd1f4885fe142c6f8bd09430f65daf8a17e51)	2011-02-18 10:08:32 +11:00
Ronnie Sahlberg	0aa2282c9c	change the hash function to use the much better Jenkins hash from the tdb library cq S1020233 (This used to be ctdb commit b86feb6fe463dfdb67b2798491df18a4c434a430)	2011-02-18 10:05:09 +11:00
Ronnie Sahlberg	c4006ce844	Add ctdb_fork(0 which will fork a child process and drop the real-time scheduler for the child. Use ctdb_fork() from callers where we dont want the child to be running at real-time privilege. (This used to be ctdb commit 58795a4c9e0624e20fa3e0023b65127053edd103)	2011-01-11 07:40:41 +11:00
Ronnie Sahlberg	ea0df6d882	Revert scheduling back to use real-time processes Revert this patch: commit 482c302d46e2162d0cf552f8456bc49573ae729d We may need to use real-time processes for the main daemon and the recovery daemon to handle the cases where systems come under very high loads. (This used to be ctdb commit 08bef9dcab6e4da15fc783f8624e5ed09aa060b5)	2011-01-11 07:40:35 +11:00
Ronnie Sahlberg	c69ada0090	add a new ctdb_ltdb function to delete a record in a normal database (This used to be ctdb commit fe9070ec9be69e6a6fcbf9899e7ced24541c9c3a)	2010-12-07 15:32:30 +11:00
Ronnie Sahlberg	1638a63dbe	Drop the loglevel of the "reqid wrap" developer debug message to DEBUG so that we dont spam the logs with this normal benign message. (This used to be ctdb commit dc57df549854e329b453ef14cff5cd352632ef73)	2010-10-28 13:33:30 +11:00
Ronnie Sahlberg	90445abbab	Revert "change the hash function to use the much better Jenkins hash" This reverts commit f7e91ae905cd61249028e15f2cb509ea69f10b9e. This may require a change to the ctdb protocol, or a mechanism to negotiate/verify that we dont run with different hash fucntions across the cluster. Reverting the change until we decide how to solve this in the master version. (This used to be ctdb commit 2a2a7a201c90462295544ca23c8a3e215f140622)	2010-10-11 07:05:41 +11:00
Ronnie Sahlberg	6a7ecb7f42	change the hash function to use the much better Jenkins hash from the tdb library cq S1020233 (This used to be ctdb commit f7e91ae905cd61249028e15f2cb509ea69f10b9e)	2010-10-08 13:18:18 +11:00
Ronnie Sahlberg	39c367a68f	Create macros to update the statistics counters and use these macros everywhere instead of manipulating the coutenrs directly. (This used to be ctdb commit 2e648df890e5713bc575965d87937827b068d0d7)	2010-09-29 12:14:24 +10:00
Harald Klatte	f3078b1c7f	AIX bind wants the correct addrsize (This used to be ctdb commit b5169e037fe113a5b62f510646b8fefc055c053b)	2010-09-03 11:49:19 +10:00
Ronnie Sahlberg	8d12313d6b	ouch, the ordering of the constants and the strings must be kept in sync manually and ther eis no check for errors. should fix this later (This used to be ctdb commit e824af1a41f8ceec1edf6b3d1d6e1758fa00deb2)	2010-08-30 19:43:35 +10:00
Ronnie Sahlberg	c95f4258d8	Add a new event "ipreallocated" This is called everytime a reallocation is performed. While STARTRECOVERY/RECOVERED events are only called when we do ipreallocation as part of a full database/cluster recovery, this new event can be used to trigger on when we just do a light failover due to a node becomming unhealthy. I.e. situations where we do a failover but we do not perform a full cluster recovery. Use this to trigger for natgw so we select a new natgw master node when failover happens and not just when cluster rebuilds happen. (This used to be ctdb commit 7f4c591388adae20e98984001385cba26598ec67)	2010-08-30 18:09:30 +10:00
Ronnie Sahlberg	2e8aac6689	Merge commit 'rusty/ports-from-1.0.112' into foo (This used to be ctdb commit 13e58d92f5f1723e850a82ae030d0ca57e89b1ee)	2010-08-19 13:17:56 +10:00
Rusty Russell	9fbb191b78	logging: give a unique logging name to each forked child. This means we can distinguish which child is logging, esp. via syslog where we have no pid. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 68b3761a0874429b90731741f0531f76dcfbb081)	2010-08-18 11:46:32 +09:30
Rusty Russell	f93440c4b7	event: Update events to latest Samba version 0.9.8 In Samba this is now called "tevent", and while we use the backwards compatibility wrappers they don't offer EVENT_FD_AUTOCLOSE: that is now a separate tevent_fd_set_auto_close() function. This is based on Samba version `7f29f817fa`. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 85e5e760cc91eb3157d3a88996ce474491646726)	2010-08-18 09:16:31 +09:30
Rusty Russell	7061ceffd8	Report client for queue errors. We've been seeing "Invalid packet of length 0" errors, but we don't know what is sending them. Add a name for each queue, and print nread. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit e6cf0e8f14f4263fbd8b995418909199924827e9)	2010-07-01 23:08:49 +10:00
Ronnie sahlberg	eee814ab47	Merge commit 'rusty/idtree' (This used to be ctdb commit 069db55ea6fa6b8dd278b880c1a325e259f3e172)	2010-06-10 13:33:14 +10:00
Rusty Russell	5f9e4b60ae	Delay reusing ids to make protocol more robust Ronnie and I tracked down a bug which seems to be caused by a node running so slowly that we timed out the request and reused the request id before it responded. The result was that we unlocked the wrong record, leading to the following: ctdbd: tdb_unlock: count is 0 ctdbd: tdb_chainunlock failed smbd[1630912]: [2010/06/08 15:32:28.251716, 0] lib/util_sock.c:1491(get_peer_addr_internal) ctdbd: Could not find idr:43 ctdbd: server/ctdb_call.c:492 reqid 43 not found This exact problem is now detected, but in general we want to delay id reuse as long as possible to make our system more robust. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 9eb9c53ef29f4871ae2fe62fc5cb6145fca89eed)	2010-06-10 08:58:55 +09:30
Ronnie Sahlberg	f6446adde3	print the db name qwhen a chainunlock fails too (This used to be ctdb commit 7932156d7f25870e6937faca08bf75d3cdbad2e5)	2010-06-09 14:37:08 +10:00
Ronnie Sahlberg	64f2d69e4b	when tdb_chainunlock() fails, print the tdb error that occured (This used to be ctdb commit dcdd2010905b9007fbf7ab71f576cfbd48acce8a)	2010-06-09 14:36:59 +10:00
Ronnie Sahlberg	a4daf81a7c	Additional log messages when tdb databases can no longer be chainlocked or chainunlocked BZ64688 (This used to be ctdb commit b977901a49a9fed45cc8a2fe880eb749f58278f6)	2010-06-08 12:21:20 +10:00
Ronnie Sahlberg	f1b8bd94bb	rename ctdb_message_fn_t to ctdb_msg_fn_t to avoid a conflict with the type of the same name used in libctdb (This used to be ctdb commit 49e23f8329649e4d9eefab47c9b158fcc7210d07)	2010-06-02 10:00:58 +10:00
Ronnie Sahlberg	761a075de9	rename ctdb_send_message to ctdb_client_send_message to resolve colission with the function of the same name in libctdb (This used to be ctdb commit ac3292c12832484a22715f1d46aa23f3b7c8a6f6)	2010-06-02 09:45:21 +10:00
Rusty Russell	d5f6026a22	libctdb: reorganize headers: remove ctdb.h, add ctdb_client.h and ctdb_protocol.h ctdb_client.h is the existing internal client interface (which was mainly in ctdb.h), and ctdb_protocol.h is the information needed for the wire protocol only. ctdb.h will be the new, shiny, libctdb API. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 4bba6b8cd47b352f98d41f9f06258d5ac3c9adef)	2010-05-20 15:18:30 +09:30
Rusty Russell	72c275dd70	ctdb: use full range of IDR This resolves a problem with huge numbers of requests which could overflow 16 bits. Fortunately, the IDR should scale reasonably well, so we can simply hold all the requests. Although noone checks for failure, I added a constant for that. BZ: 60540 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 72efc4122e37798227c3420a65ed1f706ca9ebe7)	2010-05-11 09:44:43 +10:00
Rusty Russell	e1b59b6a47	eventscript: don't do debugging system() from inside signal handler In the case of a timeout, we dump a log of what's happening to a file in /tmp. We do it from the signal handler, which is an unreliable hack (BZ58365). Instead, create another (lower-priority) child to do the dump, then kill the timedout script. Note that this doesn't quite work as intended (the dump is often run after the script has been killed), so the next patch resolves this. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 7ee5ecc8d53e78e2dec21197b74a74cc4ae1834c)	2010-04-08 15:13:29 +09:30
Ronnie Sahlberg	06885ea9a7	In the recovery daemon, keep track of which node we have assigned public ip addresses and verify that the remote nodes have/keep a consistent view of assigned addresses. If a remote node has an inconsistent view of addresses visavi the recovery master this will trigger a full ip reallocation. (This used to be ctdb commit f3bf2ab61f8dbbc806ec23a68a87aaedd458e712)	2010-04-08 14:25:26 +10:00
Stefan Metzmacher	3419e9c4dd	server: add "setup" event This is needed because the "init" event can't use 'ctdb' commands. metze (This used to be ctdb commit 1493436b6b24eb05a23b7a339071ad85f70de8f4)	2010-02-23 10:38:49 +01:00
Rusty Russell	435fb78d13	Leave sequence number alone when merely migrating records. (Based on earlier version from Ronnie which modified tdb; this one is standalone). When storing records in a tdb that has "automatic seqnum updates" also check if the actual data for the record has changed or not. If it has not changed at all, except for possibly the header, this is likely just a dmaster migration operation in which case we want to write the record to the tdb but we do not want the tdb sequence number to be increased. This resolves the problem of notify.tdb being thrashed under load: the heuristic in smbd to only reread this when the sequence number increases (rarely) breaks down. Before, running nbench --num-progs=512 across 4 nodes, we saw numbers like: 512 1496 118.33 MB/sec execute 60 sec latency 0.00 msec And turning on latency tracking, this was typical in the logs: ctdbd: High latency 9380914.000000s for operation lockwait on database notify.tdb After this commit: 512 2451 143.85 MB/sec execute 60 sec latency 0.00 msec And no more latency messages... Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 9ed2f8b2fcb7e3f0d795eef22cfa317066490709)	2010-02-16 11:02:25 +11:00
Andrew Tridgell	c137725af8	fixed printing of high latency (This used to be ctdb commit 88aacab30a36d66fe03d120bbf655edfe791ec32)	2010-02-16 10:58:24 +11:00
Andrew Tridgell	2406733ed2	ctdb: migrate to new dlinklist.h from Samba (This used to be ctdb commit f63c091f12f8d582e9518673365c7c52479c470c)	2010-02-09 09:20:55 +11:00
Andrew Tridgell	3eb9735be5	ctdb: move ctdb_io.c to use TLIST_*() macros This will make large packet queues much more efficient (This used to be ctdb commit e3f198056230073135ea6354bbef30c5bb022f8f)	2010-02-04 15:37:53 +11:00
Ronnie Sahlberg	a2857b1504	We only queued up to 1000 packets per queue before we start dropping packets, to avoid the queue to grow excessively if smbd has blocked. This could cause traverse packets to become discarded in case the main smbd daemon does a traverse of a database while there is a recovery (sending a erconfigured message to smbd, causing an avalanche of unlock messages to be sent across the cluster.) This avalance of messages could cause also the tranversal message to be discarded causing the main smbd process to hang indefinitely waiting for the traversal message that will never arrive. Bump the maximum queue length before starting to discard messages from 1000 to 1000000 and at the same time rework the queueing slightly so we can append messages cheaply to the queue instead of walking the list from head to tail every time. (This used to be ctdb commit 59ba5d7f80e0465e5076533374fb9ee862ed7bb6)	2010-02-04 09:54:06 +11:00
Ronnie Sahlberg	d7c00d8d7e	Drop the debug level for logging fd creation to DEBUG_DEBUG (This used to be ctdb commit eae1d4f9e52e73b4d8769868fffdafa590d03784)	2010-02-04 06:37:41 +11:00
Stefan Metzmacher	98ee69c66d	server: add updateip event metze (This used to be ctdb commit 712ed0c4c0bff1be9e96a54b62512787a4aa6259)	2010-01-20 11:11:01 +01:00
Stefan Metzmacher	a1da4e05b5	server: allow multiple interfaces comma separated in public_addresses metze (This used to be ctdb commit 33a00ef7233051acdbc66410130ec5d876a8422f)	2010-01-20 11:10:58 +01:00
Stefan Metzmacher	fd06167caa	server: add "init" event This is needed because the "startup" event runs after the initial recovery, but we need to do some actions before the initial recovery. metze (This used to be ctdb commit e953808449c102258abb6cba6f4abf486dda3b82)	2010-01-20 09:44:36 +01:00
Ronnie Sahlberg	a1d60b1511	Make the size of the in memory ringbuffer for keeping the recent log messages configureable using --log-ringbuf-size=<num-entries>. Add an entry in the sysconfig file to set this persistently. (This used to be ctdb commit c79c2da69bc352f509e7fca4b9172a4b7f23c0f8)	2010-01-15 15:38:56 +11:00
Rusty Russell	af2613e16f	ctdb: use mlockall, cautiously We don't want ctdb stalling due to paging; this can be far worse than scheduling delays. But if we simply do mlockall(MCL_FUTURE), it increases the risk that mmap (ie. tdb open) or malloc will fail, causing us to abort. This patch is a compromise: we mlock all current pages (including 10k of future stack for expansion) and then relock when a client asks us to open a TDB. We warn, but don't exit, if it fails. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 82f778e85440bc713d3f87c08ddc955d3cfce926)	2009-12-16 20:57:20 +10:30
Rusty Russell	c488ba440a	Remove RT priority, use niceness. 1) It's buggy. Code needs to be carefully written (ie. no busy loops) to handle running with it, and we fork and run scripts.[1] 2) It makes debugging harder. If ctdbd loops (as has happened recently) it can be extremely hard to get in and see what's happening. We've already seen the valgrind hacks. 3) We have seen recent scheduler problems. Perhaps they are unrelated, but removing this very unusual setup is unlikely to hurt. 4) It doesn't make anything faster. Under all but the most perverse of circumstances, 99% of the cpu gives the same performance as 100%, and we will always preempt normal processes anyway. [1] I made this worse in 0fafdcb8d353 "eventscript: fork() a child for each script" by removing the switch_from_server_to_client() which restored it, but even that was only for monitor scripts. Others were run with RT priority. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 482c302d46e2162d0cf552f8456bc49573ae729d)	2009-12-16 19:26:22 +10:30

1 2 3 4 5 ...

545 Commits