samba-mirror

mirror of https://github.com/samba-team/samba.git synced 2024-12-24 21:34:56 +03:00

Author	SHA1	Message	Date
Ronnie Sahlberg	c95f4258d8	Add a new event "ipreallocated". This is called every time a reallocation is performed. While STARTRECOVERY/RECOVERED events are only called when we do ipreallocation as part of a full database/cluster recovery, this new event can be used to trigger when we just do a light failover due to a node becoming unhealthy, i.e. situations where we do a failover but do not perform a full cluster recovery. Use this to trigger for natgw so we select a new natgw master node when failover happens and not just when cluster rebuilds happen. (This used to be ctdb commit 7f4c591388adae20e98984001385cba26598ec67)	2010-08-30 18:09:30 +10:00
Ronnie Sahlberg	2e8aac6689	Merge commit 'rusty/ports-from-1.0.112' into foo (This used to be ctdb commit 13e58d92f5f1723e850a82ae030d0ca57e89b1ee)	2010-08-19 13:17:56 +10:00
Rusty Russell	9fbb191b78	logging: give a unique logging name to each forked child. This means we can distinguish which child is logging, esp. via syslog where we have no pid. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 68b3761a0874429b90731741f0531f76dcfbb081)	2010-08-18 11:46:32 +09:30
Rusty Russell	f93440c4b7	event: Update events to latest Samba version 0.9.8 In Samba this is now called "tevent", and while we use the backwards compatibility wrappers they don't offer EVENT_FD_AUTOCLOSE: that is now a separate tevent_fd_set_auto_close() function. This is based on Samba version `7f29f817fa`. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 85e5e760cc91eb3157d3a88996ce474491646726)	2010-08-18 09:16:31 +09:30
Rusty Russell	7061ceffd8	Report client for queue errors. We've been seeing "Invalid packet of length 0" errors, but we don't know what is sending them. Add a name for each queue, and print nread. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit e6cf0e8f14f4263fbd8b995418909199924827e9)	2010-07-01 23:08:49 +10:00
Ronnie sahlberg	eee814ab47	Merge commit 'rusty/idtree' (This used to be ctdb commit 069db55ea6fa6b8dd278b880c1a325e259f3e172)	2010-06-10 13:33:14 +10:00
Rusty Russell	5f9e4b60ae	Delay reusing ids to make protocol more robust Ronnie and I tracked down a bug which seems to be caused by a node running so slowly that we timed out the request and reused the request id before it responded. The result was that we unlocked the wrong record, leading to the following: ctdbd: tdb_unlock: count is 0 ctdbd: tdb_chainunlock failed smbd[1630912]: [2010/06/08 15:32:28.251716, 0] lib/util_sock.c:1491(get_peer_addr_internal) ctdbd: Could not find idr:43 ctdbd: server/ctdb_call.c:492 reqid 43 not found This exact problem is now detected, but in general we want to delay id reuse as long as possible to make our system more robust. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 9eb9c53ef29f4871ae2fe62fc5cb6145fca89eed)	2010-06-10 08:58:55 +09:30
Ronnie Sahlberg	f6446adde3	print the db name when a chainunlock fails too (This used to be ctdb commit 7932156d7f25870e6937faca08bf75d3cdbad2e5)	2010-06-09 14:37:08 +10:00
Ronnie Sahlberg	64f2d69e4b	when tdb_chainunlock() fails, print the tdb error that occurred (This used to be ctdb commit dcdd2010905b9007fbf7ab71f576cfbd48acce8a)	2010-06-09 14:36:59 +10:00
Ronnie Sahlberg	a4daf81a7c	Additional log messages when tdb databases can no longer be chainlocked or chainunlocked BZ64688 (This used to be ctdb commit b977901a49a9fed45cc8a2fe880eb749f58278f6)	2010-06-08 12:21:20 +10:00
Ronnie Sahlberg	f1b8bd94bb	rename ctdb_message_fn_t to ctdb_msg_fn_t to avoid a conflict with the type of the same name used in libctdb (This used to be ctdb commit 49e23f8329649e4d9eefab47c9b158fcc7210d07)	2010-06-02 10:00:58 +10:00
Ronnie Sahlberg	761a075de9	rename ctdb_send_message to ctdb_client_send_message to resolve a collision with the function of the same name in libctdb (This used to be ctdb commit ac3292c12832484a22715f1d46aa23f3b7c8a6f6)	2010-06-02 09:45:21 +10:00
Rusty Russell	d5f6026a22	libctdb: reorganize headers: remove ctdb.h, add ctdb_client.h and ctdb_protocol.h ctdb_client.h is the existing internal client interface (which was mainly in ctdb.h), and ctdb_protocol.h is the information needed for the wire protocol only. ctdb.h will be the new, shiny, libctdb API. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 4bba6b8cd47b352f98d41f9f06258d5ac3c9adef)	2010-05-20 15:18:30 +09:30
Rusty Russell	72c275dd70	ctdb: use full range of IDR This resolves a problem with huge numbers of requests which could overflow 16 bits. Fortunately, the IDR should scale reasonably well, so we can simply hold all the requests. Although no one checks for failure, I added a constant for that. BZ: 60540 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 72efc4122e37798227c3420a65ed1f706ca9ebe7)	2010-05-11 09:44:43 +10:00
Rusty Russell	e1b59b6a47	eventscript: don't do debugging system() from inside signal handler In the case of a timeout, we dump a log of what's happening to a file in /tmp. We do it from the signal handler, which is an unreliable hack (BZ58365). Instead, create another (lower-priority) child to do the dump, then kill the timed-out script. Note that this doesn't quite work as intended (the dump is often run after the script has been killed), so the next patch resolves this. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 7ee5ecc8d53e78e2dec21197b74a74cc4ae1834c)	2010-04-08 15:13:29 +09:30
Ronnie Sahlberg	06885ea9a7	In the recovery daemon, keep track of which node we have assigned each public IP address to, and verify that the remote nodes have/keep a consistent view of assigned addresses. If a remote node has an inconsistent view of addresses vis-à-vis the recovery master, this will trigger a full IP reallocation. (This used to be ctdb commit f3bf2ab61f8dbbc806ec23a68a87aaedd458e712)	2010-04-08 14:25:26 +10:00
Stefan Metzmacher	3419e9c4dd	server: add "setup" event This is needed because the "init" event can't use 'ctdb' commands. metze (This used to be ctdb commit 1493436b6b24eb05a23b7a339071ad85f70de8f4)	2010-02-23 10:38:49 +01:00
Rusty Russell	435fb78d13	Leave sequence number alone when merely migrating records. (Based on earlier version from Ronnie which modified tdb; this one is standalone). When storing records in a tdb that has "automatic seqnum updates" also check if the actual data for the record has changed or not. If it has not changed at all, except for possibly the header, this is likely just a dmaster migration operation in which case we want to write the record to the tdb but we do not want the tdb sequence number to be increased. This resolves the problem of notify.tdb being thrashed under load: the heuristic in smbd to only reread this when the sequence number increases (rarely) breaks down. Before, running nbench --num-progs=512 across 4 nodes, we saw numbers like: 512 1496 118.33 MB/sec execute 60 sec latency 0.00 msec And turning on latency tracking, this was typical in the logs: ctdbd: High latency 9380914.000000s for operation lockwait on database notify.tdb After this commit: 512 2451 143.85 MB/sec execute 60 sec latency 0.00 msec And no more latency messages... Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 9ed2f8b2fcb7e3f0d795eef22cfa317066490709)	2010-02-16 11:02:25 +11:00
Andrew Tridgell	c137725af8	fixed printing of high latency (This used to be ctdb commit 88aacab30a36d66fe03d120bbf655edfe791ec32)	2010-02-16 10:58:24 +11:00
Andrew Tridgell	2406733ed2	ctdb: migrate to new dlinklist.h from Samba (This used to be ctdb commit f63c091f12f8d582e9518673365c7c52479c470c)	2010-02-09 09:20:55 +11:00
Andrew Tridgell	3eb9735be5	ctdb: move ctdb_io.c to use TLIST_*() macros This will make large packet queues much more efficient (This used to be ctdb commit e3f198056230073135ea6354bbef30c5bb022f8f)	2010-02-04 15:37:53 +11:00
Ronnie Sahlberg	a2857b1504	We only queued up to 1000 packets per queue before we started dropping packets, to avoid the queue growing excessively if smbd has blocked. This could cause traverse packets to be discarded if the main smbd daemon does a traverse of a database while there is a recovery (sending a reconfigured message to smbd, causing an avalanche of unlock messages to be sent across the cluster). This avalanche of messages could also cause the traversal message to be discarded, causing the main smbd process to hang indefinitely waiting for a traversal message that will never arrive. Bump the maximum queue length before starting to discard messages from 1000 to 1000000, and at the same time rework the queueing slightly so we can append messages cheaply to the queue instead of walking the list from head to tail every time. (This used to be ctdb commit 59ba5d7f80e0465e5076533374fb9ee862ed7bb6)	2010-02-04 09:54:06 +11:00
Ronnie Sahlberg	d7c00d8d7e	Drop the debug level for logging fd creation to DEBUG_DEBUG (This used to be ctdb commit eae1d4f9e52e73b4d8769868fffdafa590d03784)	2010-02-04 06:37:41 +11:00
Stefan Metzmacher	98ee69c66d	server: add updateip event metze (This used to be ctdb commit 712ed0c4c0bff1be9e96a54b62512787a4aa6259)	2010-01-20 11:11:01 +01:00
Stefan Metzmacher	a1da4e05b5	server: allow multiple interfaces comma separated in public_addresses metze (This used to be ctdb commit 33a00ef7233051acdbc66410130ec5d876a8422f)	2010-01-20 11:10:58 +01:00
Stefan Metzmacher	fd06167caa	server: add "init" event This is needed because the "startup" event runs after the initial recovery, but we need to do some actions before the initial recovery. metze (This used to be ctdb commit e953808449c102258abb6cba6f4abf486dda3b82)	2010-01-20 09:44:36 +01:00
Ronnie Sahlberg	a1d60b1511	Make the size of the in-memory ringbuffer for keeping the recent log messages configurable using --log-ringbuf-size=<num-entries>. Add an entry in the sysconfig file to set this persistently. (This used to be ctdb commit c79c2da69bc352f509e7fca4b9172a4b7f23c0f8)	2010-01-15 15:38:56 +11:00
Rusty Russell	af2613e16f	ctdb: use mlockall, cautiously We don't want ctdb stalling due to paging; this can be far worse than scheduling delays. But if we simply do mlockall(MCL_FUTURE), it increases the risk that mmap (ie. tdb open) or malloc will fail, causing us to abort. This patch is a compromise: we mlock all current pages (including 10k of future stack for expansion) and then relock when a client asks us to open a TDB. We warn, but don't exit, if it fails. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 82f778e85440bc713d3f87c08ddc955d3cfce926)	2009-12-16 20:57:20 +10:30
Rusty Russell	c488ba440a	Remove RT priority, use niceness. 1) It's buggy. Code needs to be carefully written (ie. no busy loops) to handle running with it, and we fork and run scripts.[1] 2) It makes debugging harder. If ctdbd loops (as has happened recently) it can be extremely hard to get in and see what's happening. We've already seen the valgrind hacks. 3) We have seen recent scheduler problems. Perhaps they are unrelated, but removing this very unusual setup is unlikely to hurt. 4) It doesn't make anything faster. Under all but the most perverse of circumstances, 99% of the cpu gives the same performance as 100%, and we will always preempt normal processes anyway. [1] I made this worse in 0fafdcb8d353 "eventscript: fork() a child for each script" by removing the switch_from_server_to_client() which restored it, but even that was only for monitor scripts. Others were run with RT priority. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 482c302d46e2162d0cf552f8456bc49573ae729d)	2009-12-16 19:26:22 +10:30
Rusty Russell	5d99a1a47c	eventscript: expose call names and enum We're going to need this so ctdb can query non-monitor status. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 53bc5ca23ca55a3ac63a440051f16716944a2a51)	2009-12-08 01:47:13 +10:30
Ronnie Sahlberg	8f442f1c0c	Use statically allocated ringbuffer to store the last 500k log entries in memory instead of dynamically allocated ones so that we reduce the pressure on malloc/free. (This used to be ctdb commit c5cbb95512f034abeec515579983bf7ac55eadd9)	2009-12-04 11:36:27 +11:00
Rusty Russell	9e84872ecd	ctdb_io: fix use-after-free on invalid packets Wolfgang saw a talloc complaint about using freed memory in ctdb_tcp_read_cb. His fix was to remove the talloc_free() in that function, which causes loops when a socket is closed (as it does not get removed from the event system), eg: netcat 192.168.1.2 4379 < /dev/null The real bug is that when we have more than one pending packet in the queue, we loop calling the callback without any safeguards should that callback free the queue (as it tends to do on invalid packets). This can be reproduced by sending more than one bogus packet at once: # Length word at start: 4 == empty packet (assumed little endian) /usr/bin/printf \\4\\0\\0\\0\\4\\0\\0\\0 > /tmp/pkt netcat 192.168.1.2 4379 < /tmp/pkt Using a destructor we can check if the callback frees us, and exit immediately. Elsewhere, we return after the callback anyway. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 4d0523dd94fb07e860b3e8118691f93d1ef8d0fa)	2009-12-02 11:27:23 +11:00
Ronnie Sahlberg	cc2d81a77c	make the ringbuffer logging more efficient and marshall the data by writing to a tmpfile instead of continuously talloc resizing a blob (This used to be ctdb commit 6427f0b68d60b556a023f64e15e156000ba6f943)	2009-11-18 19:10:50 +11:00
Ronnie Sahlberg	bc2675119d	add an in-memory ringbuffer where we store the last 500000 log entries regardless of log level. add commands to extract this in-memory buffer and to clear it (This used to be ctdb commit 29d2ee8d9c6c6f36b2334480f646d6db209f370e)	2009-11-18 12:44:18 +11:00
Ronnie Sahlberg	8aacfa348d	Suggestion from Volker, make ctdb_queue_length() cheaper by using a counter variable instead of counting the number of packets each time. (This used to be ctdb commit 331c6e3afd96d8b5e191153a631efdbdabb6ea33)	2009-10-26 12:20:52 +11:00
Ronnie Sahlberg	a92ba7f729	lower the debug levels for the "create FD messages" so we don't fill up the logs. (This used to be ctdb commit 87146db2769c2ec494813685bf9cec0d2a6336c3)	2009-10-21 15:26:24 +11:00
Ronnie Sahlberg	9b8c72c446	When clients have blocked, perhaps because the node is banned or stopped and the client is blocked trying to tdb_fetch() a record, make sure we don't queue up too many REQ_MESSAGES. Add a new tunable to control the maximum queue size we allow to a blocked client before we start discarding REQ_MESSAGES instead of queueing them for delivery. This avoids queueing up a very large number of MESSAGES that Samba processes send to each other to nodes that are blocked/banned/stopped for extended periods. (This used to be ctdb commit f76d6fed8f9630450263b9fa4b5fdf3493fb1e11)	2009-10-21 15:20:55 +11:00
Ronnie Sahlberg	9de3652380	add logging every time we create a file descriptor in the main ctdb daemon so we can spot if there are leaks. plug two file descriptor leaks that occur when sending an ARP fails and one leak when we cannot parse the local address during TCP connection establishment (This used to be ctdb commit ddd089810a14efe4be6e1ff3eccaa604e4913c9e)	2009-10-15 11:24:54 +11:00
Ronnie Sahlberg	ff104c6f5a	When we dispatch a message to a handler, pass the data as a real talloc object so that the handler can talloc_steal() the message content. (This used to be ctdb commit c69f5fe1db5b6ed4a009f0c10ab82c6f32b2e0bc)	2009-07-02 12:58:49 +10:00
Ronnie Sahlberg	93026f4cbf	update the handling of debug levels so that we always can use a literal instead of a numeric value. validate the input values used and refuse setting the debug level to an unknown value (This used to be ctdb commit daec49cea1790bcc64599959faf2159dec2c5929)	2009-07-01 09:17:13 +10:00
Ronnie Sahlberg	9921e1ec21	change the socket we use for sending gratuitous ARPs from AF_INET/SOCK_PACKET to AF_PACKET/SOCK_RAW (This used to be ctdb commit 2c4c20d7803f4449f8d463314c40d4734ec80e2f)	2009-05-21 14:10:45 +10:00
Ronnie Sahlberg	26e1486db7	Whitespace changes and using the CTDB_NO_MEMORY() macro changes to the previous patch. (This used to be ctdb commit d623ea7c04daa6349b42d50862843c9f86115488)	2009-05-21 11:49:16 +10:00
Sumit Bose	2fcedf6dac	add missing checks on so far ignored return values Most of these were found during a review by Jim Meyering <meyering@redhat.com> (This used to be ctdb commit 3aee5ee1deb4a19be3bd3a4ce3abbe09de763344)	2009-05-21 11:22:21 +10:00
Ronnie Sahlberg	98a54c4675	Track how long it takes to take out the recovery lock from both the main daemon and the recovery daemon. Log this in "ctdb statistics". Also add a variable "RecLockLatencyMs" that will log an error every time it takes longer than this to access the reclock file. (This used to be ctdb commit 042377ed803bb8f7ca9d6ea1a387427b7b8ba45a)	2009-05-14 10:33:25 +10:00
Ronnie Sahlberg	689f76f0b0	Merge branch 'obnox' (This used to be ctdb commit 972036a5d510fb9b399f1ee34a8861dee4221267)	2009-03-24 17:49:55 +11:00
Ronnie Sahlberg	7265c713db	we need to set the port properly in the parse_ip helper (This used to be ctdb commit 43fe18d86995744ba61c7a6405b70edcb265930a)	2009-03-24 13:45:11 +11:00
Michael Adam	839dec1b12	move common code of system_linux.c and system_aix.c into new system_common.c Michael (This used to be ctdb commit 124874847e5e03ce2a44bddfe778f01dfb0a7a03)	2009-02-28 03:08:31 +01:00
Michael Adam	3cca0f75e4	Fix treatment of link local ipv6 addresses: set the scope id. metze / Michael Signed-off-by: Michael Adam <obnox@samba.org> (This used to be ctdb commit 9d12de1ca6107801dada927729e755c0949d73bf)	2009-01-19 22:50:53 +01:00
Michael Adam	b6828ab22f	ctdb_util: use the parse_ip() function - avoid code duplication Michael Signed-off-by: Michael Adam <obnox@samba.org> (This used to be ctdb commit 1461b78c47810073f8637bc4592cacaadcdaf14b)	2009-01-19 22:49:13 +01:00
Michael Adam	8ec92c92e2	ctdb_sys_have_ip: fix ipv6 support for aix, too. Michael Signed-off-by: Michael Adam <obnox@samba.org> (This used to be ctdb commit 8b5f1e80e3e2e9ca2198e1baee8af36aa5d6c5b5)	2009-01-19 22:49:12 +01:00
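Commit 5f9e4b60ae delays request-id reuse so that a late reply from a slow node cannot be matched to a newer request that happens to hold the same id. The following is a minimal C sketch of that policy. It assumes a hypothetical fixed-size table: id_table, id_alloc, id_free and id_find are illustrative names, not the ctdb or Samba IDR API. New ids are handed out cyclically, starting after the most recently allocated one, instead of reusing the lowest free id.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed-size id table (not the Samba IDR used by ctdb). */
#define ID_TABLE_SIZE 8

struct id_table {
    void *slots[ID_TABLE_SIZE]; /* NULL marks a free id */
    int last_id;                /* most recently allocated id, -1 initially */
};

static void id_table_init(struct id_table *t)
{
    for (size_t i = 0; i < ID_TABLE_SIZE; i++)
        t->slots[i] = NULL;
    t->last_id = -1;
}

/* Return the first free id after last_id, wrapping around, so a freed id
 * is reused only after the rest of the table has been cycled through.
 * Returns -1 if every slot is in use. */
static int id_alloc(struct id_table *t, void *ptr)
{
    assert(ptr != NULL); /* NULL is reserved to mark a free slot */
    for (int step = 1; step <= ID_TABLE_SIZE; step++) {
        int id = (t->last_id + step) % ID_TABLE_SIZE;
        if (t->slots[id] == NULL) {
            t->slots[id] = ptr;
            t->last_id = id;
            return id;
        }
    }
    return -1;
}

static void id_free(struct id_table *t, int id)
{
    if (id >= 0 && id < ID_TABLE_SIZE)
        t->slots[id] = NULL;
}

/* A stale id whose request was already freed yields NULL. */
static void *id_find(const struct id_table *t, int id)
{
    if (id < 0 || id >= ID_TABLE_SIZE)
        return NULL;
    return t->slots[id];
}
```

With this policy, after id_free(t, 0) the next id_alloc() returns 1 rather than 0. A straggling reply that still carries id 0 therefore finds no entry instead of unlocking the wrong record. A larger id space, such as the full IDR range adopted in 72c275dd70, presumably lengthens the reuse delay in proportion.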
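Commits a2857b1504, 8aacfa348d and 9b8c72c446 describe a packet queue with three properties: cheap appends, a length query backed by a counter, and a maximum length beyond which new packets are discarded. The following is a minimal C sketch of that shape. It assumes a plain singly linked list with a tail pointer, and pkt_queue and the pq_* functions are hypothetical names. ctdb's actual queue code is talloc-based and uses the list macros mentioned in 3eb9735be5.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical packet: payload stored in a flexible array member. */
struct pkt {
    struct pkt *next;
    size_t len;
    unsigned char data[];
};

struct pkt_queue {
    struct pkt *head, *tail; /* tail pointer makes append O(1) */
    size_t count;            /* running counter makes length O(1) */
    size_t max_len;          /* discard threshold */
};

static void pq_init(struct pkt_queue *q, size_t max_len)
{
    q->head = q->tail = NULL;
    q->count = 0;
    q->max_len = max_len;
}

/* Returns true if the packet was queued, false if it was dropped
 * because the queue is full or memory allocation failed. */
static bool pq_append(struct pkt_queue *q, const void *data, size_t len)
{
    if (q->count >= q->max_len)
        return false;
    struct pkt *p = malloc(sizeof(*p) + len);
    if (p == NULL)
        return false;
    p->next = NULL;
    p->len = len;
    memcpy(p->data, data, len);
    if (q->tail != NULL)
        q->tail->next = p; /* no walk from head to tail */
    else
        q->head = p;
    q->tail = p;
    q->count++;
    return true;
}

static size_t pq_length(const struct pkt_queue *q)
{
    return q->count;
}

/* Remove and return the oldest packet (caller frees it), or NULL. */
static struct pkt *pq_pop(struct pkt_queue *q)
{
    struct pkt *p = q->head;
    if (p == NULL)
        return NULL;
    assert(q->count > 0);
    q->head = p->next;
    if (q->head == NULL)
        q->tail = NULL;
    q->count--;
    return p;
}
```

Keeping the counter in step with every append and pop is what makes pq_length() constant-time. The tail pointer removes the head-to-tail walk that a2857b1504 mentions.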


524 Commits