samba-mirror

mirror of https://github.com/samba-team/samba.git synced 2024-12-27 03:21:53 +03:00

Author	SHA1	Message	Date
Ronnie Sahlberg	06885ea9a7	In the recovery daemon, keep track of which node we have assigned public ip addresses and verify that the remote nodes have/keep a consistent view of assigned addresses. If a remote node has an inconsistent view of addresses vis-à-vis the recovery master this will trigger a full ip reallocation. (This used to be ctdb commit f3bf2ab61f8dbbc806ec23a68a87aaedd458e712)	2010-04-08 14:25:26 +10:00
Stefan Metzmacher	3419e9c4dd	server: add "setup" event This is needed because the "init" event can't use 'ctdb' commands. metze (This used to be ctdb commit 1493436b6b24eb05a23b7a339071ad85f70de8f4)	2010-02-23 10:38:49 +01:00
Rusty Russell	435fb78d13	Leave sequence number alone when merely migrating records. (Based on earlier version from Ronnie which modified tdb; this one is standalone). When storing records in a tdb that has "automatic seqnum updates" also check if the actual data for the record has changed or not. If it has not changed at all, except for possibly the header, this is likely just a dmaster migration operation in which case we want to write the record to the tdb but we do not want the tdb sequence number to be increased. This resolves the problem of notify.tdb being thrashed under load: the heuristic in smbd to only reread this when the sequence number increases (rarely) breaks down. Before, running nbench --num-progs=512 across 4 nodes, we saw numbers like: 512 1496 118.33 MB/sec execute 60 sec latency 0.00 msec And turning on latency tracking, this was typical in the logs: ctdbd: High latency 9380914.000000s for operation lockwait on database notify.tdb After this commit: 512 2451 143.85 MB/sec execute 60 sec latency 0.00 msec And no more latency messages... Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 9ed2f8b2fcb7e3f0d795eef22cfa317066490709)	2010-02-16 11:02:25 +11:00
Andrew Tridgell	c137725af8	fixed printing of high latency (This used to be ctdb commit 88aacab30a36d66fe03d120bbf655edfe791ec32)	2010-02-16 10:58:24 +11:00
Andrew Tridgell	2406733ed2	ctdb: migrate to new dlinklist.h from Samba (This used to be ctdb commit f63c091f12f8d582e9518673365c7c52479c470c)	2010-02-09 09:20:55 +11:00
Andrew Tridgell	3eb9735be5	ctdb: move ctdb_io.c to use TLIST_*() macros This will make large packet queues much more efficient (This used to be ctdb commit e3f198056230073135ea6354bbef30c5bb022f8f)	2010-02-04 15:37:53 +11:00
Ronnie Sahlberg	a2857b1504	We only queued up to 1000 packets per queue before we started dropping packets, to keep the queue from growing excessively if smbd has blocked. This could cause traverse packets to be discarded if the main smbd daemon does a traverse of a database while there is a recovery (which sends a reconfigured message to smbd, causing an avalanche of unlock messages to be sent across the cluster). This avalanche of messages could also cause the traversal message to be discarded, causing the main smbd process to hang indefinitely waiting for a traversal message that will never arrive. Bump the maximum queue length before starting to discard messages from 1000 to 1000000 and at the same time rework the queueing slightly so we can append messages cheaply to the queue instead of walking the list from head to tail every time. (This used to be ctdb commit 59ba5d7f80e0465e5076533374fb9ee862ed7bb6)	2010-02-04 09:54:06 +11:00
Ronnie Sahlberg	d7c00d8d7e	Drop the debug level for logging fd creation to DEBUG_DEBUG (This used to be ctdb commit eae1d4f9e52e73b4d8769868fffdafa590d03784)	2010-02-04 06:37:41 +11:00
Stefan Metzmacher	98ee69c66d	server: add updateip event metze (This used to be ctdb commit 712ed0c4c0bff1be9e96a54b62512787a4aa6259)	2010-01-20 11:11:01 +01:00
Stefan Metzmacher	a1da4e05b5	server: allow multiple interfaces comma separated in public_addresses metze (This used to be ctdb commit 33a00ef7233051acdbc66410130ec5d876a8422f)	2010-01-20 11:10:58 +01:00
Stefan Metzmacher	fd06167caa	server: add "init" event This is needed because the "startup" event runs after the initial recovery, but we need to do some actions before the initial recovery. metze (This used to be ctdb commit e953808449c102258abb6cba6f4abf486dda3b82)	2010-01-20 09:44:36 +01:00
Ronnie Sahlberg	a1d60b1511	Make the size of the in memory ringbuffer for keeping the recent log messages configurable using --log-ringbuf-size=<num-entries>. Add an entry in the sysconfig file to set this persistently. (This used to be ctdb commit c79c2da69bc352f509e7fca4b9172a4b7f23c0f8)	2010-01-15 15:38:56 +11:00
Rusty Russell	af2613e16f	ctdb: use mlockall, cautiously We don't want ctdb stalling due to paging; this can be far worse than scheduling delays. But if we simply do mlockall(MCL_FUTURE), it increases the risk that mmap (ie. tdb open) or malloc will fail, causing us to abort. This patch is a compromise: we mlock all current pages (including 10k of future stack for expansion) and then relock when a client asks us to open a TDB. We warn, but don't exit, if it fails. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 82f778e85440bc713d3f87c08ddc955d3cfce926)	2009-12-16 20:57:20 +10:30
Rusty Russell	c488ba440a	Remove RT priority, use niceness. 1) It's buggy. Code needs to be carefully written (ie. no busy loops) to handle running with it, and we fork and run scripts.[1] 2) It makes debugging harder. If ctdbd loops (as has happened recently) it can be extremely hard to get in and see what's happening. We've already seen the valgrind hacks. 3) We have seen recent scheduler problems. Perhaps they are unrelated, but removing this very unusual setup is unlikely to hurt. 4) It doesn't make anything faster. Under all but the most perverse of circumstances, 99% of the cpu gives the same performance as 100%, and we will always preempt normal processes anyway. [1] I made this worse in 0fafdcb8d353 "eventscript: fork() a child for each script" by removing the switch_from_server_to_client() which restored it, but even that was only for monitor scripts. Others were run with RT priority. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 482c302d46e2162d0cf552f8456bc49573ae729d)	2009-12-16 19:26:22 +10:30
Rusty Russell	5d99a1a47c	eventscript: expose call names and enum We're going to need this so ctdb can query non-monitor status. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 53bc5ca23ca55a3ac63a440051f16716944a2a51)	2009-12-08 01:47:13 +10:30
Ronnie Sahlberg	8f442f1c0c	Use statically allocated ringbuffer to store the last 500k log entries in memory instead of dynamically allocated ones so that we reduce the pressure on malloc/free. (This used to be ctdb commit c5cbb95512f034abeec515579983bf7ac55eadd9)	2009-12-04 11:36:27 +11:00
Rusty Russell	9e84872ecd	ctdb_io: fix use-after-free on invalid packets Wolfgang saw a talloc complaint about using freed memory in ctdb_tcp_read_cb. His fix was to remove the talloc_free() in that function, which causes loops when a socket is closed (as it does not get removed from the event system), eg: netcat 192.168.1.2 4379 < /dev/null The real bug is that when we have more than one pending packet in the queue, we loop calling the callback without any safeguards should that callback free the queue (as it tends to do on invalid packets). This can be reproduced by sending more than one bogus packet at once: # Length word at start: 4 == empty packet (assumed little endian) /usr/bin/printf \\4\\0\\0\\0\\4\\0\\0\\0 > /tmp/pkt netcat 192.168.1.2 4379 < /tmp/pkt Using a destructor we can check if the callback frees us, and exit immediately. Elsewhere, we return after the callback anyway. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 4d0523dd94fb07e860b3e8118691f93d1ef8d0fa)	2009-12-02 11:27:23 +11:00
Ronnie Sahlberg	cc2d81a77c	make the ringbuffer logging more efficient and marshall the data by writing to a tmpfile instead of continuously talloc resizing a blob (This used to be ctdb commit 6427f0b68d60b556a023f64e15e156000ba6f943)	2009-11-18 19:10:50 +11:00
Ronnie Sahlberg	bc2675119d	add an in memory ringbuffer where we store the last 500000 log entries regardless of log level. add commands to extract this in memory buffer and to clear it (This used to be ctdb commit 29d2ee8d9c6c6f36b2334480f646d6db209f370e)	2009-11-18 12:44:18 +11:00
Ronnie Sahlberg	8aacfa348d	Suggestion from Volker, make ctdb_queue_length() cheaper by using a counter variable instead of counting the number of packets each time. (This used to be ctdb commit 331c6e3afd96d8b5e191153a631efdbdabb6ea33)	2009-10-26 12:20:52 +11:00
Ronnie Sahlberg	a92ba7f729	lower the debug levels for the "create FD messages" so we don't fill up the logs. (This used to be ctdb commit 87146db2769c2ec494813685bf9cec0d2a6336c3)	2009-10-21 15:26:24 +11:00
Ronnie Sahlberg	9b8c72c446	When clients have blocked, perhaps because the node is banned or stopped and the client is blocked trying to tdb_fetch() a record, make sure we don't queue up too many REQ_MESSAGES. Add a new tunable to control the maximum queue size we allow to a blocked client before we start discarding REQ_MESSAGES instead of queueing them for delivery. This avoids having queued up a very large number of MESSAGES that samba sends between each other to nodes that are blocked/banned/stopped for extended periods. (This used to be ctdb commit f76d6fed8f9630450263b9fa4b5fdf3493fb1e11)	2009-10-21 15:20:55 +11:00
Ronnie Sahlberg	9de3652380	add logging every time we create a file descriptor in the main ctdb daemon so we can spot if there are leaks. plug two leaks for file descriptors related to when sending ARP fails and one leak when we can not parse the local address during tcp connection establish (This used to be ctdb commit ddd089810a14efe4be6e1ff3eccaa604e4913c9e)	2009-10-15 11:24:54 +11:00
Ronnie Sahlberg	ff104c6f5a	When we dispatch a message to a handler, pass the data as a real talloc object so that the handler can talloc_steal() the message content. (This used to be ctdb commit c69f5fe1db5b6ed4a009f0c10ab82c6f32b2e0bc)	2009-07-02 12:58:49 +10:00
Ronnie Sahlberg	93026f4cbf	update the handling of debug levels so that we always can use a literal instead of a numeric value. validate the input values used and refuse setting the debug level to an unknown value (This used to be ctdb commit daec49cea1790bcc64599959faf2159dec2c5929)	2009-07-01 09:17:13 +10:00
Ronnie Sahlberg	9921e1ec21	change the socket we use for sending gratuitous ARPs from AF_INET/SOCK_PACKET to AF_PACKET/SOCK_RAW (This used to be ctdb commit 2c4c20d7803f4449f8d463314c40d4734ec80e2f)	2009-05-21 14:10:45 +10:00
Ronnie Sahlberg	26e1486db7	Whitespace changes and using the CTDB_NO_MEMORY() macro changes to the previous patch. (This used to be ctdb commit d623ea7c04daa6349b42d50862843c9f86115488)	2009-05-21 11:49:16 +10:00
Sumit Bose	2fcedf6dac	add missing checks on so far ignored return values Most of these were found during a review by Jim Meyering <meyering@redhat.com> (This used to be ctdb commit 3aee5ee1deb4a19be3bd3a4ce3abbe09de763344)	2009-05-21 11:22:21 +10:00
Ronnie Sahlberg	98a54c4675	Track how long it takes to take out the recovery lock from both the main daemon and also from the recovery daemon. Log this in "ctdb statistics". Also add a variable "RecLockLatencyMs" that will log an error every time it takes longer than this to access the reclock file. (This used to be ctdb commit 042377ed803bb8f7ca9d6ea1a387427b7b8ba45a)	2009-05-14 10:33:25 +10:00
Ronnie Sahlberg	689f76f0b0	Merge branch 'obnox' (This used to be ctdb commit 972036a5d510fb9b399f1ee34a8861dee4221267)	2009-03-24 17:49:55 +11:00
Ronnie Sahlberg	7265c713db	we need to set the port properly in the parse_ip helper (This used to be ctdb commit 43fe18d86995744ba61c7a6405b70edcb265930a)	2009-03-24 13:45:11 +11:00
Michael Adam	839dec1b12	move common code of system_linux.c and system_aix.c into new system_common.c Michael (This used to be ctdb commit 124874847e5e03ce2a44bddfe778f01dfb0a7a03)	2009-02-28 03:08:31 +01:00
Michael Adam	3cca0f75e4	Fix treatment of link local ipv6 addresses: set the scope id. metze / Michael Signed-off-by: Michael Adam <obnox@samba.org> (This used to be ctdb commit 9d12de1ca6107801dada927729e755c0949d73bf)	2009-01-19 22:50:53 +01:00
Michael Adam	b6828ab22f	ctdb_util: use the parse_ip() function - avoid code duplication Michael Signed-off-by: Michael Adam <obnox@samba.org> (This used to be ctdb commit 1461b78c47810073f8637bc4592cacaadcdaf14b)	2009-01-19 22:49:13 +01:00
Michael Adam	8ec92c92e2	ctdb_sys_have_ip: fix ipv6 support for aix, too. Michael Signed-off-by: Michael Adam <obnox@samba.org> (This used to be ctdb commit 8b5f1e80e3e2e9ca2198e1baee8af36aa5d6c5b5)	2009-01-19 22:49:12 +01:00
Stefan Metzmacher	bf86562144	ctdb_sys_have_ip: don't overwrite input data (setting port to 0) metze Signed-off-by: Michael Adam <obnox@samba.org> (This used to be ctdb commit de71ce2195bb4f6a96b12437a2d4d1424fd1c59c)	2009-01-19 22:49:12 +01:00
Michael Adam	3dea35263c	Fix verification of IP allocation with ipv6 addresses on Linux. Set sin_port or sin6_port to 0, depending on sa_family. Michael Signed-off-by: Michael Adam <obnox@samba.org> (This used to be ctdb commit e0c70110e241b065c42c1c07f32c3657bac5d98b)	2009-01-19 22:49:11 +01:00
root	5a2aad1af4	If ctdbd was started with the --socket option then we also set the CTDB_SOCKET variable so that the eventscripts can pick up the name properly (This used to be ctdb commit 7b41b518c3ffebf1712445a8c6242509dc798003)	2008-12-08 17:29:17 +11:00
Ronnie Sahlberg	07d35c754f	add a CTDB_SOCKET variable that can be used to override the default /tmp/ctdb.socket (This used to be ctdb commit b75e2263c565c21ecbbd98fbd2c10787e467bf5c)	2008-11-11 14:49:30 +11:00
Ronnie Sahlberg	d7007793ea	latency is measured in us, not ms use an explicit ctdb_db variable instead of dereferencing state (This used to be ctdb commit 8c6a02fb423a8cbcbfc706767e3d353cd48073c3)	2008-10-30 13:34:10 +11:00
Ronnie Sahlberg	e1b0cea427	add control and logging of very high latencies. log the type of operation and the database name for all latencies higher than a threshold (This used to be ctdb commit 1d581dcd507e8e13d7ae085ff4d6a9f3e2aaeba5)	2008-10-30 12:49:53 +11:00
Ronnie Sahlberg	cb300382b0	update TAKEIP/RELEASEIP/GETPUBLICIP/GETNODEMAP controls so we retain an older ipv4-only version of these controls. We need this so that we are backward compatible with old versions of ctdb and so that we can interoperate with an ipv4-only recmaster during a rolling upgrade. (This used to be ctdb commit 6b76c520f97127099bd9fbaa0fa7af1c61947fb7)	2008-10-14 10:40:29 +11:00
Ronnie Sahlberg	348cad7bc1	lower the debuglevel when logging unknown idr in responses (This used to be ctdb commit a72f5b7d1560e427e18b1c55a2932a7fb037f4c7)	2008-09-09 13:59:48 +10:00
Ronnie Sahlberg	7a78a78a1c	From C Cowan. Patch to make AIX compile with the new ipv6 additions. (This used to be ctdb commit e26ce5140ed005725f8b7ac8ba23a180fd7d5337)	2008-09-08 08:57:42 +10:00
Ronnie Sahlberg	5193caec6d	make the function to canonicalize a sockaddr structure public (This used to be ctdb commit 1157d61a0bc557d8ffc453c518dfc48473492bfd)	2008-08-20 11:58:27 +10:00
Ronnie Sahlberg	da1c17bf46	when we compare ip addresses in ctdb_same_ip we must first canonicalize the addresses so that we realize that 127.0.0.1:22 is really the same thing as ::ffff:127.0.0.1:22 Downgrade all AF_INET6 ::ffff:xxxx:xxxx sockaddresses into AF_INET ones (This used to be ctdb commit b0fe4c45fc5ba1ecf62ebb921092c8a34e28a2bd)	2008-08-20 11:52:36 +10:00
Ronnie Sahlberg	8e17e75eac	fix a bug in the tcp socketkiller for ipv6 (This used to be ctdb commit 83735951352a243da185031e4853e7e40c43a0fb)	2008-08-20 09:23:31 +10:00
Ronnie Sahlberg	37234887d9	fix the ipv6 checksum calculation for pseudoheader so that it actually works add support to send ipv6 "gratuitous arp" aka neighbor solicitation packets from ctdb Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com> (This used to be ctdb commit 0a38ea11af9237501f2951fee698a59b46f8750d)	2008-08-19 18:24:08 +10:00
Ronnie Sahlberg	ef997d344f	initial ipv6 patch Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com> (This used to be ctdb commit 1f131f21386f428bbbbb29098d56c2f64596583b)	2008-08-19 14:58:29 +10:00
Andrew Tridgell	b8e93a9233	added marshalling helper functions (This used to be ctdb commit 12087e7d751a8756076662cd8db5dcf35316c0c5)	2008-07-30 19:58:17 +10:00
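
Commit da1c17bf46 in the list above describes canonicalizing addresses before comparing them, so that 127.0.0.1:22 and ::ffff:127.0.0.1:22 are treated as the same endpoint. The sketch below illustrates that idea in Python using the standard ipaddress module. It is not the ctdb C code: the function names canonicalize and same_ip are hypothetical, and comparing the port along with the host is an assumption made for this illustration.

```python
import ipaddress
from typing import Tuple

Endpoint = Tuple[str, int]  # (host, port); a hypothetical type for this sketch


def canonicalize(host: str, port: int) -> Endpoint:
    """Downgrade an IPv4-mapped IPv6 host (::ffff:a.b.c.d) to plain IPv4."""
    addr = ipaddress.ip_address(host)
    # ipv4_mapped is None unless the address lies in ::ffff:0:0/96.
    if isinstance(addr, ipaddress.IPv6Address) and addr.ipv4_mapped is not None:
        addr = addr.ipv4_mapped
    return str(addr), port


def same_ip(a: Endpoint, b: Endpoint) -> bool:
    """Compare two endpoints after canonicalizing both (assumption: port included)."""
    return canonicalize(*a) == canonicalize(*b)
```

For instance, same_ip(("127.0.0.1", 22), ("::ffff:127.0.0.1", 22)) evaluates to True, while a native IPv6 address such as ::1 is left untouched and never matches an IPv4 address.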

509 Commits