mirror of https://github.com/samba-team/samba.git synced 2024-12-25 23:21:54 +03:00
790 Commits

Author SHA1 Message Date
Amitay Isaacs
5145b9bcec logging: Fix a bug in ringbuffer
When ringbuffer is full, it does not return any entries.  Simplify
ringbuffer logic by keeping track of number of log entries rather than
last entry.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-Programmed-With: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 939d12b96a0cbebbe6269fa2b14f584058dd6174)
2013-05-23 16:18:23 +10:00
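The idea in 5145b9bcec (tracking the number of entries instead of the last entry) can be sketched as follows. This is a minimal illustration, not the ctdb code: `struct ring`, `ring_add()`, `ring_collect()` and the capacity `RING_SIZE` are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 4   /* hypothetical capacity, for illustration only */

struct ring {
    int entries[RING_SIZE];
    size_t first;     /* index of the oldest entry */
    size_t count;     /* number of valid entries, 0..RING_SIZE */
};

/* Append a value, overwriting the oldest entry once the buffer is full. */
static void ring_add(struct ring *r, int value)
{
    size_t slot = (r->first + r->count) % RING_SIZE;

    assert(r->count <= RING_SIZE);
    r->entries[slot] = value;
    if (r->count < RING_SIZE) {
        r->count++;
    } else {
        /* Full: the slot just written held the oldest entry. */
        r->first = (r->first + 1) % RING_SIZE;
    }
}

/* Copy entries oldest-first into out; returns the number copied. */
static size_t ring_collect(const struct ring *r, int *out, size_t max)
{
    size_t n = r->count < max ? r->count : max;

    for (size_t i = 0; i < n; i++) {
        out[i] = r->entries[(r->first + i) % RING_SIZE];
    }
    return n;
}
```

With an explicit count, "full" and "empty" are distinct states, so a full buffer still returns all of its entries.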
Martin Schwenke
7aa0a49cbd util: ctdb_fork() should call ctdb_set_child_info()
For now we pass NULL as the child name.  Later we'll give ctdb_fork()
and friends an extra argument and pass that through.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit ba8866d40125bab06391a17d48ff06a4a9f9da89)
2013-04-18 13:18:29 +10:00
Martin Schwenke
4ede763f3b util: New functions ctdb_set_child_info() and ctdb_is_child_process()
Must be called by all child processes.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit 59b019a97aad9a731f9080ea5be14d0dbdfe03d6)
2013-04-18 13:18:29 +10:00
Volker Lendecke
d82336f1f3 common/messaging: Use the jenkins hash in ctdb_message
This gives a better hash distribution.

(This used to be ctdb commit f7f8bde2376f8180a0dca6d7b8d7d2a4a12f4bd8)
2013-04-05 13:13:08 +11:00
Volker Lendecke
a37033bfc9 common/messaging: use tdb_parse_record in message_list_db_fetch
This avoids malloc/free in a hot code path.

(This used to be ctdb commit c137531fae8f7f6392746ce1b9ac6f219775fc29)
2013-04-05 13:12:58 +11:00
Amitay Isaacs
9937adf0ca common/messaging: Abstract db related operations inside db functions
This simplifies the use of message indexdb API and abstracts tdb related code
inside the API.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit bf7296ce9b98563bcb8426cd035dbeab6d884f59)
2013-04-05 13:00:43 +11:00
Amitay Isaacs
8788e6318c common/messaging: Don't forget to free the result returned by tdb_fetch()
This fixes a memory leak in the messaging code.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 20be1f991dd75c2333c9ec9db226432a819f57ba)
2013-04-05 13:00:16 +11:00
Amitay Isaacs
96ad89f438 common/messaging: Free message list header if all message handlers are freed
This makes sure that even if the srvids are not deregistered, the header
structure is freed when the last message handler has been freed as a result of
client going away.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 4e1ec7412866f2d31c41de1bec0fbf788c03051b)
2013-04-05 12:59:25 +11:00
Amitay Isaacs
d4407a6516 common/io: For scheduling immediate events use tevent_schedule_immediate
tevent_schedule_immediate() is much more efficient at handling events that need
to be processed immediately rather than creating timed events with
timeval_zero().

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 11734be353a1e246163eda631d35dfe55d1d6fb1)
2013-03-06 15:32:37 +11:00
Amitay Isaacs
5d7efb4cf1 ctdbd: Add an index db for message list for faster searches
When CTDB is busy with lots of smbd, CTDB was spending too much time in
daemon_check_srvids() which searches a list of srvids in the registered
message handlers.  Using a hash based index significantly improves the
performance of search in a linked list.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 3e09f25d419635f6dd679b48fa65370f7860be7d)
2013-03-06 15:32:33 +11:00
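Why a hash-based index helps in 5d7efb4cf1 can be shown with a toy example. The sketch below assumes a fixed-size in-memory bucket array; the real commit uses an index database, and `struct srvid_index`, `index_add()` and `index_has()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BUCKETS 64    /* hypothetical bucket count */

struct handler {
    struct handler *next;   /* next handler in the same bucket */
    uint64_t srvid;
};

struct srvid_index {
    struct handler *bucket[BUCKETS];
};

/* Fold the 64-bit srvid into a bucket number. */
static size_t bucket_of(uint64_t srvid)
{
    return (size_t)((srvid ^ (srvid >> 32)) % BUCKETS);
}

/* Register a handler: O(1) insertion at the head of its bucket. */
static void index_add(struct srvid_index *idx, struct handler *h)
{
    size_t b = bucket_of(h->srvid);

    assert(b < BUCKETS);
    h->next = idx->bucket[b];
    idx->bucket[b] = h;
}

/* Look up a srvid: only one short bucket chain is walked. */
static int index_has(const struct srvid_index *idx, uint64_t srvid)
{
    const struct handler *h;

    for (h = idx->bucket[bucket_of(srvid)]; h != NULL; h = h->next) {
        if (h->srvid == srvid) {
            return 1;
        }
    }
    return 0;
}
```

A lookup now inspects one bucket chain rather than every registered handler, which is the effect the commit message describes for daemon_check_srvids().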
Amitay Isaacs
a2abdc1353 common/io: Rewrite socket handling code to read all available data
This improves the processing of packets considerably.  It has been
observed that there can be as many as 10 packets in the socket buffer and
the current code of reading a single packet from a socket at a time is
not very optimal.  This change reads all the bytes from socket buffer and
then parses to extract multiple packets.  If there are multiple packets,
set up a timed event to process next packet.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit d788bc8f7212b7dc1587ae592242dc8c876f4053)
2013-02-19 17:18:21 +11:00
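The "read everything, then split into packets" approach of a2abdc1353 can be sketched like this. It assumes each packet starts with a 4-byte host-order length that counts the whole packet, including the length field itself; `extract_packets()`, `packet_fn` and `count_packet()` are hypothetical names, not ctdb functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Callback invoked once per complete packet (hypothetical signature). */
typedef void (*packet_fn)(const uint8_t *pkt, uint32_t len, void *priv);

/*
 * Split buf into packets.  Returns the number of bytes consumed; the
 * caller keeps the remainder (a partial packet) until more data arrives.
 */
static size_t extract_packets(const uint8_t *buf, size_t avail,
                              packet_fn fn, void *priv)
{
    size_t used = 0;

    assert(fn != NULL);
    while (avail - used >= sizeof(uint32_t)) {
        uint32_t len;

        memcpy(&len, buf + used, sizeof(len));  /* avoid unaligned reads */
        if (len < sizeof(uint32_t)) {
            break;  /* malformed length: a real caller would treat this as an error */
        }
        if (len > avail - used) {
            break;  /* packet not fully received yet */
        }
        fn(buf + used, len, priv);
        used += len;
    }
    return used;
}

/* Example callback: count the packets seen. */
static void count_packet(const uint8_t *pkt, uint32_t len, void *priv)
{
    (void)pkt;
    (void)len;
    (*(int *)priv)++;
}
```

One read() that returns several packets is then handled in a single pass over the buffer instead of one system call per packet.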
Martin Schwenke
689384a7b4 Logging: Fix breakage when freeing the log ringbuffer
Commit a82d3ec12f0fda16d6bfa8442a07595de897c10e broke fetching from
the log ringbuffer.  The solution there is still generally good: there
is no need to keep the ringbuffer in children created by
ctdb_fork()... except for those special children that are created to
fetch data from the ringbuffer!

Introduce a new function ctdb_fork_no_free_ringbuffer() that does
everything ctdb_fork() needs to do except free the ringbuffer (i.e. it
is the old ctdb_fork() function).  The new ctdb_fork() function just
calls that function and then frees the ringbuffer in the child.

This means all callers of ctdb_fork() have the convenience of having
the ringbuffer freed.  There are 3 special cases:

* Forking the recovery daemon.  We want to be able to fetch from the
  ringbuffer there.

* The ringbuffer fetching code.  Change the 2 calls in this code (main
  daemon, recovery daemon) to call ctdb_fork_no_free_ringbuffer()
  instead.

While we're here, clear the log ringbuffer when the recovery daemon is
forked, since it will contain a copy of the messages from the main
daemon.

Note to self: always test... even the most obvious patches...  ;-)

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 00db5fa00474f8a83f1aa3b603fd756cc9b49ff4)
2013-02-07 11:26:29 +11:00
Martin Schwenke
8dc3219e9b Logging: Free the ringbuffer in child processes created with ctdb_fork()
At the moment the log ringbuffer is duplicated in every child process.
Although it is copy-on-write we want to see if it is contributing to
out-of-memory situations when there are a lot of children.

The ringbuffer isn't accessible from any of the children anyway...

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit a82d3ec12f0fda16d6bfa8442a07595de897c10e)
2013-02-05 12:40:30 +11:00
Martin Schwenke
f2ba0e8a65 Logging: New function ctdb_log_ringbuffer_free()
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit a4f622e85168f59417c11705f1734e0352e1d44a)
2013-02-05 12:40:30 +11:00
Mathieu Parent
264f847631 common: Don't lie on unimplemented gratuitous arp
Signed-off-by: Mathieu Parent <math.parent@gmail.com>

(This used to be ctdb commit b054193d1d19a8eef998fa690899501f79badb8a)
2013-01-22 18:04:00 +11:00
Mathieu Parent
fd8d3cfeba common: FreeBSD+kFreeBSD: Implement get_process_name (same as in Linux)
Signed-off-by: Mathieu Parent <math.parent@gmail.com>

(This used to be ctdb commit 258092aaf6b7a9bdc14f0fb35e8bd7f7dc742b3f)
2013-01-22 18:03:44 +11:00
Mathieu Parent
384b9b2a7b common: Detailed platform-specific FIXME
Signed-off-by: Mathieu Parent <math.parent@gmail.com>

(This used to be ctdb commit d202b2fdd4fd70172e5e44583627b57a1b7ad2ed)
2013-01-22 18:03:41 +11:00
Martin Schwenke
db5dfe891c recoverd: Add CTDB_SRVID_GETLOG and CTDB_SRVID_CLEARLOG
These support getting and clearing logs from the ring-buffer in the
recovery daemon.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit cbca233d1e03b2410e0bb63b936328d4a8b3c7b4)
2012-10-22 11:15:36 +11:00
Amitay Isaacs
1011d10a51 common: Add routines to get process and lock information
Currently these functions are implemented only for Linux.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit be4051326b0c6a0fd301561af10fd15a0e90023b)
2012-10-20 02:48:44 +11:00
Martin Schwenke
8d7562f3f8 common: Debug ctdb_addr_to_str() using new function ctdb_external_trace()
We've seen this function report "Unknown family, 0" and then CTDB
disappeared without a trace.  If we can reproduce it then this might
help us to debug it.

The idea is that you do something like the following in /etc/sysconfig/ctdb:

  export CTDB_EXTERNAL_TRACE="/etc/ctdb/config/gcore_trace.sh"

When we hit this error then we call out to gcore to get a core file so
we can do forensics.  This might block CTDB for a few seconds.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 7895bc003f087ab2f3181df3c464386f59bfcc39)
2012-10-18 20:05:42 +11:00
Martin Schwenke
75347b8668 util: ctdb_fork() closes all sockets opened by the main daemon
Do some other housekeeping including stopping tevent.

Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 212298279557a2833ef0f81809b4a5cdac72ca02)
2012-10-05 11:56:12 +10:00
Amitay Isaacs
c44a97dfa1 util: Do not lock down memory when running with local daemons
Thanks to Ronnie for highlighting the issue of memory lockdown on AIX.
Fix typo, use getuid and not getpid.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 21a5cbf9518fafc610939f14874371a52b1dc8b3)
2012-07-26 22:01:50 +10:00
Amitay Isaacs
51c57e87e5 util: Do not try to lockdown memory when running in local daemons mode
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 25f84797a64a683c303b04057aa8113e9fc47c49)
2012-07-16 12:12:05 +10:00
Ronnie Sahlberg
d21337a0fb Add new command to find which interface is located on
(This used to be ctdb commit f07376309e70f5ccdb7de8453caacc71b451ab48)
2012-06-20 15:11:49 +10:00
Amitay Isaacs
7631830152 server: Replace BOOL datatype with bool, True/False with true/false
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 6e5cbe8fff71985e5a2fc16b7e9f2b868011ff5d)
2012-05-28 11:22:25 +10:00
Ronnie Sahlberg
a57eba2bb4 Track all child processes so we never send a signal to an unrelated process (our child died and the kernel wrapped the pid-space and reused the pid for a different process)
Wrap all creation of child processes inside ctdb_fork() which is used to track all processes we have spawned.
Capture SIGCHLD to track also which child processes have terminated.

Wrap kill() inside ctdb_kill() and make sure that we never send a !0 signal to a child process pid that has already terminated (and might have been replaced with a

(This used to be ctdb commit f73a4b1495830bcdd094a93732a89dd53b3c2f78)
2012-05-03 14:03:26 +10:00
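The child tracking described in a57eba2bb4 can be sketched as below. This is a simplified illustration under stated assumptions: a fixed-size table, no locking, and hypothetical names (`child_track()`, `child_forget()`, `checked_kill()`) standing in for what ctdb_fork() and ctdb_kill() do.

```c
#define _POSIX_C_SOURCE 200809L   /* for kill() under strict -std modes */
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>

#define MAX_CHILDREN 64           /* hypothetical limit for the sketch */

static pid_t children[MAX_CHILDREN];
static size_t num_children;

/* Record a pid returned by fork(); returns 0 on success, -1 if full. */
static int child_track(pid_t pid)
{
    assert(pid > 0);
    if (num_children == MAX_CHILDREN) {
        return -1;
    }
    children[num_children++] = pid;
    return 0;
}

static int child_is_tracked(pid_t pid)
{
    for (size_t i = 0; i < num_children; i++) {
        if (children[i] == pid) {
            return 1;
        }
    }
    return 0;
}

/* Forget a pid once SIGCHLD handling has reaped it. */
static void child_forget(pid_t pid)
{
    for (size_t i = 0; i < num_children; i++) {
        if (children[i] == pid) {
            children[i] = children[--num_children];
            return;
        }
    }
}

/* Send a non-zero signal only to a pid that is still one of our children. */
static int checked_kill(pid_t pid, int sig)
{
    if (sig != 0 && !child_is_tracked(pid)) {
        errno = ESRCH;
        return -1;
    }
    return kill(pid, sig);
}
```

Once a child has been reaped and forgotten, a later signal aimed at the same pid number is refused instead of reaching whatever process has since reused it.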
Amitay Isaacs
4392591555 Remove explicit include of lib/tevent/tevent.h.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 0681014ca5ed2a9b56f63fdace7f894beccf8a9a)
2012-04-13 17:28:14 +10:00
Ronnie Sahlberg
e7e51ddb64 LACOUNT: Add back lacount mechanism to defer migrating a fetched/read copy until after default of 20 consecutive requests from the same node
This can improve performance slightly on certain workloads where smbds frequently read from the same record

(This used to be ctdb commit 035c0d981bde8c0eee8b3f24ba8e2dc817e5b504)
2012-03-20 12:26:22 +11:00
Ronnie Sahlberg
574b47e23f Merge branch 'master' of 10.1.1.27:/shared/ctdb/ctdb-master
(This used to be ctdb commit 9b85aa1aa14091dc1de470a587f7c054b9e40078)
2012-02-21 07:12:50 +11:00
Ronnie Sahlberg
42e477b14e READONLY: only send a control to schedule fast-vacuuming from child context iff we have a connection open to the main daemon
There are some child processes where we do not create a connection to the main daemon (switch_from_server_to_client()) because it is expensive to set up and we normally might not need to talk to the daemon at all via a domain socket.
But we might still want to call ctdb_ltdb_store() from such child processes.

(This used to be ctdb commit 9e372a08c40087e6b5335aa298e94d88273566a5)
2012-02-21 07:03:44 +11:00
Volker Lendecke
3d71e9f7f0 Add common/system_freebsd.c
Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit 89067e12b868974f9909b447ab5e202d612ac44f)
2012-02-13 16:32:35 +01:00
Volker Lendecke
5e3b13a32a FreeBSD does not define s6_addr32, only s6_addr
Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit d657af4fb68ce3f7c462856f2934f6bf169e120b)
2012-02-13 16:20:12 +01:00
Mathieu Parent
33b7470e82 Add kFreeBSD support
(This used to be ctdb commit c75e4ad9b566e47dec66d25988da4cee861c2357)
2012-01-31 14:14:21 +11:00
Ronnie Sahlberg
b5b4c1a2ea explain why we use FIONREAD
(This used to be ctdb commit d0f85478c37828eb8a24315d4326eb4eaedb9afc)
2012-01-04 21:41:12 +11:00
Michael Adam
3dab0c9b0b rb_tree: fix possible access-after-free-error in trbt_traversearray32_node
When the traverse callback frees the current node, the traverse of the
rbtree can fail (the next node->right fails since node is not there any more...).
This is fixed by introducing variables to store the right (and left)
pointers before the callback is called.

(This used to be ctdb commit 8b0caaeed154d26c67a73659d3bbbdd63b21be11)
2011-12-23 17:39:00 +01:00
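The fix in 3dab0c9b0b (read the child pointers before invoking the callback) can be illustrated on a plain binary tree. This sketch is not the trbt code: there is no red-black rebalancing, and `traverse()`, `node_new()` and `sum_and_free()` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

struct node {
    struct node *left, *right;
    int key;
};

typedef void (*visit_fn)(struct node *n, void *priv);

static struct node *node_new(int key, struct node *left, struct node *right)
{
    struct node *n = malloc(sizeof(*n));

    assert(n != NULL);
    n->key = key;
    n->left = left;
    n->right = right;
    return n;
}

/* In-order traverse that stays valid even if the callback frees n. */
static void traverse(struct node *n, visit_fn fn, void *priv)
{
    struct node *left, *right;

    if (n == NULL) {
        return;
    }
    /* Save both children first: n must not be touched after fn(). */
    left = n->left;
    right = n->right;

    traverse(left, fn, priv);
    fn(n, priv);                 /* may free n */
    traverse(right, fn, priv);   /* uses the saved pointer, not n->right */
}

/* Example callback that frees the node it is given. */
static void sum_and_free(struct node *n, void *priv)
{
    *(int *)priv += n->key;
    free(n);
}
```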
Ronnie Sahlberg
c3ee62439f Return the peer_pid properly to the caller
(This used to be ctdb commit 0f15a2c65db8f8b4ac0d5ad2755b9aa3c2a8b279)
2011-12-06 13:16:15 +11:00
Mathieu Parent
1ed5288c38 GNU/Hurd support
CTDB has the following limitations on GNU Hurd:

- The pid of a peer is not obtained from the socket [1]. As a consequence, the peer
  process is not killed when releasing IP [2].

- Gratuitous ARP is not yet supported [3]

- Network interfaces are always considered present [4]

[1]: ctdb_get_peer_pid() in common/system_gnu.c
[2]: release_kill_clients() in server/ctdb_takeover.c
[3]: ctdb_sys_send_arp() in common/system_gnu.c
[4]: ctdb_sys_check_iface_exists() in common/system_gnu.c

(This used to be ctdb commit 00212e5c7dd229e7f8975a165d5ab8875d4917cc)
2011-12-06 11:58:14 +11:00
Mathieu Parent
bb3d6698e9 Move platform-specific code to common/system_*
This removes #ifdef AIX and eases the addition of new platforms.

(This used to be ctdb commit 2fd1067a075fe0e4b2a36d4ea18af139d03f17bf)
2011-12-06 11:57:11 +11:00
Michael Adam
5d94dff27e system_linux: correctly cast sockaddr_in6 to sockaddr for sendto() in ctdb_sys_send_tcp()
(This used to be ctdb commit 11bebd5367102fcd02b17c44ac87bf50d4c68785)
2011-11-26 00:34:54 +01:00
Michael Adam
d9516a8bf9 system_linux: correctly cast sockaddr_in to sockaddr in ctdb_sys_send_tcp()
(This used to be ctdb commit cc60df5a3edebfdf50fcd22ebfaad35736f90379)
2011-11-26 00:34:54 +01:00
Martin Schwenke
f186dd90b6 Move some common functions to common/ctdb_ltdb.c
Move identical copies of ctdb_null_func(), ctdb_fetch_func(),
ctdb_fetch_with_header_func() from ctdb_client.c and
ctdb_ltdb_server.c to somewhere common.

This is in the context of wanting to run CCAN-style tests where most
of the ctdbd code is just included in the test program.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 126cb0d369b2b1aed63801dc4ba0554399e8b7e4)
2011-11-11 14:31:50 +11:00
Martin Schwenke
52ff485958 Added some #ifndefs to stop files being included multiple times.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit fdca12c25e6fce6206135b994dedf44265e4eb09)
2011-11-11 14:31:50 +11:00
Martin Schwenke
c8286b8dc7 Clean up warnings: remove unused function dump_packet()
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit c22e201be15e7d5b788c2f5f7916b553e0faaa2a)
2011-11-09 15:47:30 +11:00
Ronnie Sahlberg
0f92fa224c RB_TREE: Add mechanism to abort a traverse
This patch changes the callback signature for traversal
functions to allow a client to abort a traverse before it finishes.
Updates to all callers and examples as well as rb-test tool.

(This used to be ctdb commit 8ab0c63ad36cfbbb1e5fed46a1f4c47b1fdb581f)
2011-11-08 13:40:28 +11:00
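An abortable traverse of the kind 0f92fa224c describes can be sketched with a callback that returns a status. The convention used here (0 continues, non-zero aborts) is an assumption for the example, and `tree_traverse()` and `find_first_at_least()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

struct tnode {
    struct tnode *left, *right;
    int key;
};

/* Return 0 to continue, non-zero to abort the traverse. */
typedef int (*traverse_fn)(struct tnode *n, void *priv);

/* In-order traverse; propagates the first non-zero callback result. */
static int tree_traverse(struct tnode *n, traverse_fn fn, void *priv)
{
    int ret;

    assert(fn != NULL);
    if (n == NULL) {
        return 0;
    }
    ret = tree_traverse(n->left, fn, priv);
    if (ret != 0) {
        return ret;
    }
    ret = fn(n, priv);
    if (ret != 0) {
        return ret;
    }
    return tree_traverse(n->right, fn, priv);
}

struct find_state {
    int limit;     /* stop at the first key >= limit */
    int visited;   /* number of nodes the callback saw */
    int found;     /* key that stopped the traverse */
};

static int find_first_at_least(struct tnode *n, void *priv)
{
    struct find_state *st = priv;

    st->visited++;
    if (n->key >= st->limit) {
        st->found = n->key;
        return 1;
    }
    return 0;
}
```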
Ronnie Sahlberg
8e4bfba75c ReadOnly: Rename the function ctdb_ltdb_fetch_readonly() to ctdb_ltdb_fetch_with_header() since this is what it actually does.
(This used to be ctdb commit 94a5ce4e08e7891f07dbfe4c822ca4be5ab10965)
2011-09-13 18:38:20 +10:00
Ronnie Sahlberg
0dc5584101 Merge branch 'master-readonly-records' into foo
Conflicts:

	Makefile.in
	tools/ctdb.c

(This used to be ctdb commit 0fedef0ffba4178126eee9544c5e2db52f5db893)
2011-09-12 09:34:34 +10:00
Ronnie Sahlberg
1c05db2c9c Merge remote branch 'ddiss/master_pmda_and_client_timeouts'
(This used to be ctdb commit 7bebfc7bad8f36e54003b8e25372fdaf54836e21)
2011-09-08 11:22:53 +10:00
David Disseldorp
0628d1c0e6 client: add req timeout argument to ctdb_cmdline_client
Following connection to the local ctdbd, ctdb_cmdline_client() currently
issues a CTDB_CONTROL_GET_PNN request with a fixed 3 second timeout.

The ctdb cmd line client accepts a --timelimit argument for specifying
a per request timeout, pass this value through to ctdb_cmdline_client()
for use as a CTDB_CONTROL_GET_PNN request timeout.

(This used to be ctdb commit 0634d0305f42f17048b6830733767e8dc300e11c)
2011-09-06 13:56:54 +02:00
Ronnie Sahlberg
64378fea58 Check interfaces: when reading the public addresses file to create the vnn list
check that the actual interface exists; print an error and fail startup if the interface does not exist.

(This used to be ctdb commit cd33bbe6454b7b0316bdfffbd06c67b29779e873)
2011-09-06 16:11:00 +10:00
Ronnie Sahlberg
1bbd4cbf35 ReadOnly: Add a ctdb_ltdb_fetch_readonly() helper function
(This used to be ctdb commit 8551420fb331dd2a897f4619278a981fcefb96e8)
2011-08-23 10:33:17 +10:00
Ronnie Sahlberg
f924b3f40e ReadOnly: Add helper functions to manipulate a TDB_DATA as a bitmap for nodes that we are tracking as having a readonly delegation
(This used to be ctdb commit d10084e62d37674bb8d9e31d457fd23e050545be)
2011-08-23 10:09:42 +10:00
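Treating a data blob as a per-node bitmap, as f924b3f40e describes, might look like the sketch below. `struct blob` is a stand-in with the same pointer-plus-size shape as TDB_DATA; the helper names and the bit layout (node N is bit N % 8 of byte N / 8) are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for TDB_DATA: a pointer and a size. */
struct blob {
    uint8_t *dptr;
    size_t dsize;
};

/* Mark a node in the bitmap, growing the blob as needed. */
static int bitmap_set(struct blob *b, uint32_t node)
{
    size_t byte = node / 8;

    assert(b != NULL);
    if (byte >= b->dsize) {
        uint8_t *p = realloc(b->dptr, byte + 1);

        if (p == NULL) {
            return -1;
        }
        memset(p + b->dsize, 0, byte + 1 - b->dsize);  /* zero new bytes */
        b->dptr = p;
        b->dsize = byte + 1;
    }
    b->dptr[byte] |= (uint8_t)(1u << (node % 8));
    return 0;
}

/* Returns 1 if the node is marked, 0 otherwise (including out of range). */
static int bitmap_test(const struct blob *b, uint32_t node)
{
    size_t byte = node / 8;

    if (byte >= b->dsize) {
        return 0;
    }
    return (b->dptr[byte] >> (node % 8)) & 1;
}
```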
Volker Lendecke
fff653d126 Remove an unused variable
Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit 04c3d9c7c9ffa8bb95b0bf1513fd79f6c1096a2f)
2011-08-22 17:11:07 +02:00
David Disseldorp
e097b7f8ff io: Make queue_io_read() safe for reentry
queue_io_read() may be reentered via the queue callback, recoverd is
particularly guilty of this.

queue_io_read() is not safe for reentry if more than one packet is
received and partial chunks follow - data read off the pipe on re-entry
is assumed to be the start-of-packet four byte length. This leads to a
wrongly aligned stream and the notorious "Invalid packet of length 0"
errors.

This change fixes queue_io_read() to be safe under reentry, only a
single packet is processed per call.

https://bugzilla.samba.org/show_bug.cgi?id=8319

(This used to be ctdb commit 9ea41d2fab612772f861270c8a59c01c43bd3a4c)
2011-08-05 14:27:18 +10:00
Michael Adam
9c91e16955 ltdb: add the CTDB_REC_FLAG_AUTOMATIC to the initial header in ctdb_ltdb_fetch()
Signals that this record was not created by a client level store.

(This used to be ctdb commit 69d34983a37b0324ff7610b8dfdcd8d13bf81c54)
2011-03-14 13:35:51 +01:00
Michael Adam
9e8d6b82b5 server: Use the ctdb_ltdb_store_server() in the ctdb daemon for non-persistent dbs
This is realized by adding a ctdb_ltdb_store_fn function pointer to the db
context and filling it in the attach procedure for non-persistent dbs.

(This used to be ctdb commit df49ec44de80affa5ccc637dec12a20a26e8706e)
2011-03-14 13:35:50 +01:00
Ronnie Sahlberg
b57bd0f896 Remove LACOUNT and LACCESSOR and migrate the records immediately.
This concept didn't work out and it is really just as expensive as a full migration
anyway, without the benefit of caching the data for subsequent accesses.

Now, migrate the records immediately on first access.
This will be combined with a "cheap vacuum-lite" for special empty records to
prevent growth of databases.

Later extensions to mimic read-only behaviour of records will include proper shared read-only locking of database records, making the laccessor/lacount read-only access to the data obsolete anyway.

Removing this special case and the handling of lacount/laccessor makes the code path where shared read-only locking will be implemented simpler, and frees up space in the ctdb_ltdb header for use by vacuuming flags as well as read-only locking flags.

(This used to be ctdb commit 155dd1f4885fe142c6f8bd09430f65daf8a17e51)
2011-02-18 10:08:32 +11:00
Ronnie Sahlberg
0aa2282c9c change the hash function to use the much better Jenkins hash
from the tdb library

cq S1020233

(This used to be ctdb commit b86feb6fe463dfdb67b2798491df18a4c434a430)
2011-02-18 10:05:09 +11:00
Ronnie Sahlberg
c4006ce844 Add ctdb_fork() which will fork a child process and drop the real-time
scheduler for the child.

Use ctdb_fork() from callers where we don't want the child to be running
at real-time privilege.

(This used to be ctdb commit 58795a4c9e0624e20fa3e0023b65127053edd103)
2011-01-11 07:40:41 +11:00
Ronnie Sahlberg
ea0df6d882 Revert scheduling back to use real-time processes
Revert this patch:
commit 482c302d46e2162d0cf552f8456bc49573ae729d

We may need to use real-time processes for the main daemon and the recovery daemon to handle the cases where systems come under very high loads.

(This used to be ctdb commit 08bef9dcab6e4da15fc783f8624e5ed09aa060b5)
2011-01-11 07:40:35 +11:00
Ronnie Sahlberg
c69ada0090 add a new ctdb_ltdb function to delete a record in a normal database
(This used to be ctdb commit fe9070ec9be69e6a6fcbf9899e7ced24541c9c3a)
2010-12-07 15:32:30 +11:00
Ronnie Sahlberg
1638a63dbe Drop the loglevel of the "reqid wrap" developer debug message to DEBUG
so that we don't spam the logs with this normal benign message.

(This used to be ctdb commit dc57df549854e329b453ef14cff5cd352632ef73)
2010-10-28 13:33:30 +11:00
Ronnie Sahlberg
90445abbab Revert "change the hash function to use the much better Jenkins hash"
This reverts commit f7e91ae905cd61249028e15f2cb509ea69f10b9e.

This may require a change to the ctdb protocol, or a mechanism
to negotiate/verify that we don't run with different hash functions
across the cluster.

Reverting the change until we decide how to solve this in the master
version.

(This used to be ctdb commit 2a2a7a201c90462295544ca23c8a3e215f140622)
2010-10-11 07:05:41 +11:00
Ronnie Sahlberg
6a7ecb7f42 change the hash function to use the much better Jenkins hash
from the tdb library

cq S1020233

(This used to be ctdb commit f7e91ae905cd61249028e15f2cb509ea69f10b9e)
2010-10-08 13:18:18 +11:00
Ronnie Sahlberg
39c367a68f Create macros to update the statistics counters and use these macros
everywhere instead of manipulating the counters directly.

(This used to be ctdb commit 2e648df890e5713bc575965d87937827b068d0d7)
2010-09-29 12:14:24 +10:00
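Wrapping counter updates in macros, as 39c367a68f describes, can be sketched as follows. The macro and structure names here (`DEMO_STAT_INCR`, `struct demo_ctx`, and so on) are hypothetical and are not the names used in ctdb.

```c
#include <assert.h>
#include <stdint.h>

struct demo_statistics {
    uint32_t total_calls;
    uint32_t pending_calls;
};

struct demo_ctx {
    struct demo_statistics statistics;
};

/* Single place that knows how a counter is stored and updated. */
#define DEMO_STAT_INCR(ctx, counter) \
    do { (ctx)->statistics.counter++; } while (0)

#define DEMO_STAT_DECR(ctx, counter) \
    do { \
        assert((ctx)->statistics.counter > 0); \
        (ctx)->statistics.counter--; \
    } while (0)

/* Callers use the macros rather than touching the fields directly. */
static void demo_call_start(struct demo_ctx *ctx)
{
    DEMO_STAT_INCR(ctx, total_calls);
    DEMO_STAT_INCR(ctx, pending_calls);
}

static void demo_call_done(struct demo_ctx *ctx)
{
    DEMO_STAT_DECR(ctx, pending_calls);
}
```

Routing every update through one macro means a later change, such as keeping a second copy of the statistics, only has to be made in one place.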
Harald Klatte
f3078b1c7f AIX bind wants the correct addrsize
(This used to be ctdb commit b5169e037fe113a5b62f510646b8fefc055c053b)
2010-09-03 11:49:19 +10:00
Ronnie Sahlberg
8d12313d6b ouch, the ordering of the constants and the strings must be kept in sync
manually and there is no check for errors. Should fix this later

(This used to be ctdb commit e824af1a41f8ceec1edf6b3d1d6e1758fa00deb2)
2010-08-30 19:43:35 +10:00
Ronnie Sahlberg
c95f4258d8 Add a new event "ipreallocated"
This is called every time a reallocation is performed.

    While STARTRECOVERY/RECOVERED events are only called when
    we do ipreallocation as part of a full database/cluster recovery,
    this new event can be used to trigger on when we just do a light
    failover due to a node becoming unhealthy.

    I.e. situations where we do a failover but we do not perform a full
    cluster recovery.

    Use this to trigger for natgw so we select a new natgw master node
    when failover happens and not just when cluster rebuilds happen.

(This used to be ctdb commit 7f4c591388adae20e98984001385cba26598ec67)
2010-08-30 18:09:30 +10:00
Ronnie Sahlberg
2e8aac6689 Merge commit 'rusty/ports-from-1.0.112' into foo
(This used to be ctdb commit 13e58d92f5f1723e850a82ae030d0ca57e89b1ee)
2010-08-19 13:17:56 +10:00
Rusty Russell
9fbb191b78 logging: give a unique logging name to each forked child.
This means we can distinguish which child is logging, esp. via syslog where we have no pid.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

(This used to be ctdb commit 68b3761a0874429b90731741f0531f76dcfbb081)
2010-08-18 11:46:32 +09:30
Rusty Russell
f93440c4b7 event: Update events to latest Samba version 0.9.8
In Samba this is now called "tevent", and while we use the backwards
compatibility wrappers they don't offer EVENT_FD_AUTOCLOSE: that is now
a separate tevent_fd_set_auto_close() function.

This is based on Samba version 7f29f817fa.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 85e5e760cc91eb3157d3a88996ce474491646726)
2010-08-18 09:16:31 +09:30
Rusty Russell
7061ceffd8 Report client for queue errors.
We've been seeing "Invalid packet of length 0" errors, but we don't know
what is sending them.  Add a name for each queue, and print nread.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit e6cf0e8f14f4263fbd8b995418909199924827e9)
2010-07-01 23:08:49 +10:00
Ronnie sahlberg
eee814ab47 Merge commit 'rusty/idtree'
(This used to be ctdb commit 069db55ea6fa6b8dd278b880c1a325e259f3e172)
2010-06-10 13:33:14 +10:00
Rusty Russell
5f9e4b60ae Delay reusing ids to make protocol more robust
Ronnie and I tracked down a bug which seems to be caused by a node
running so slowly that we timed out the request and reused the request
id before it responded.

The result was that we unlocked the wrong record, leading to the
following:

	ctdbd: tdb_unlock: count is 0
	ctdbd: tdb_chainunlock failed
	smbd[1630912]: [2010/06/08 15:32:28.251716,  0] lib/util_sock.c:1491(get_peer_addr_internal)
	ctdbd: Could not find idr:43
	ctdbd: server/ctdb_call.c:492 reqid 43 not found

This exact problem is now detected, but in general we want to delay
id reuse as long as possible to make our system more robust.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 9eb9c53ef29f4871ae2fe62fc5cb6145fca89eed)
2010-06-10 08:58:55 +09:30
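Delaying id reuse, as 5f9e4b60ae describes, can be sketched with an allocator that always searches upward from the most recently issued id and only wraps once the top of the id space is reached. This toy uses a flat table instead of ctdb's IDR tree; `struct id_pool`, `id_alloc()`, `id_free()` and the tiny `ID_SPACE` are assumptions for the example.

```c
#include <assert.h>
#include <string.h>

#define ID_SPACE 8   /* deliberately tiny so wrap-around is easy to see */

struct id_pool {
    unsigned char in_use[ID_SPACE];
    int last_id;     /* most recently issued id, -1 if none yet */
};

static void id_pool_init(struct id_pool *p)
{
    memset(p->in_use, 0, sizeof(p->in_use));
    p->last_id = -1;
}

/* Issue the first free id above last_id, wrapping only when necessary. */
static int id_alloc(struct id_pool *p)
{
    for (int step = 1; step <= ID_SPACE; step++) {
        int id = (p->last_id + step) % ID_SPACE;

        if (!p->in_use[id]) {
            p->in_use[id] = 1;
            p->last_id = id;
            return id;
        }
    }
    return -1;       /* every id is in use */
}

static void id_free(struct id_pool *p, int id)
{
    assert(id >= 0 && id < ID_SPACE && p->in_use[id]);
    p->in_use[id] = 0;
}
```

A freed id is not handed out again until the allocator has cycled through the rest of the space, so a late reply carrying an old id is far less likely to match a new request.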
Ronnie Sahlberg
f6446adde3 print the db name when a chainunlock fails too
(This used to be ctdb commit 7932156d7f25870e6937faca08bf75d3cdbad2e5)
2010-06-09 14:37:08 +10:00
Ronnie Sahlberg
64f2d69e4b when tdb_chainunlock() fails, print the tdb error that occurred
(This used to be ctdb commit dcdd2010905b9007fbf7ab71f576cfbd48acce8a)
2010-06-09 14:36:59 +10:00
Ronnie Sahlberg
a4daf81a7c Additional log messages when tdb databases can no longer be chainlocked or chainunlocked
BZ64688

(This used to be ctdb commit b977901a49a9fed45cc8a2fe880eb749f58278f6)
2010-06-08 12:21:20 +10:00
Ronnie Sahlberg
f1b8bd94bb rename ctdb_message_fn_t to ctdb_msg_fn_t to avoid a conflict with the type of the same name used in libctdb
(This used to be ctdb commit 49e23f8329649e4d9eefab47c9b158fcc7210d07)
2010-06-02 10:00:58 +10:00
Ronnie Sahlberg
761a075de9 rename ctdb_send_message to ctdb_client_send_message to resolve collision with the function of the same name in libctdb
(This used to be ctdb commit ac3292c12832484a22715f1d46aa23f3b7c8a6f6)
2010-06-02 09:45:21 +10:00
Rusty Russell
d5f6026a22 libctdb: reorganize headers: remove ctdb.h, add ctdb_client.h and ctdb_protocol.h
ctdb_client.h is the existing internal client interface (which was mainly
in ctdb.h), and ctdb_protocol.h is the information needed for the wire
protocol only.

ctdb.h will be the new, shiny, libctdb API.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

(This used to be ctdb commit 4bba6b8cd47b352f98d41f9f06258d5ac3c9adef)
2010-05-20 15:18:30 +09:30
Rusty Russell
72c275dd70 ctdb: use full range of IDR
This resolves a problem with huge numbers of requests which could overflow
16 bits.  Fortunately, the IDR should scale reasonably well, so we can simply
hold all the requests.

Although no one checks for failure, I added a constant for that.

BZ: 60540
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

(This used to be ctdb commit 72efc4122e37798227c3420a65ed1f706ca9ebe7)
2010-05-11 09:44:43 +10:00
Rusty Russell
e1b59b6a47 eventscript: don't do debugging system() from inside signal handler
In the case of a timeout, we dump a log of what's happening to a file
in /tmp.  We do it from the signal handler, which is an unreliable hack
(BZ58365).

Instead, create another (lower-priority) child to do the dump, then
kill the timedout script.

Note that this doesn't quite work as intended (the dump is often run
after the script has been killed), so the next patch resolves this.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 7ee5ecc8d53e78e2dec21197b74a74cc4ae1834c)
2010-04-08 15:13:29 +09:30
Ronnie Sahlberg
06885ea9a7 In the recovery daemon, keep track of which node we have assigned public ip
addresses and verify that the remote nodes have/keep a consistent view of
assigned addresses.

If a remote node has an inconsistent view of addresses vis-à-vis the recovery
master this will trigger a full ip reallocation.

(This used to be ctdb commit f3bf2ab61f8dbbc806ec23a68a87aaedd458e712)
2010-04-08 14:25:26 +10:00
Stefan Metzmacher
3419e9c4dd server: add "setup" event
This is needed because the "init" event can't use 'ctdb' commands.

metze

(This used to be ctdb commit 1493436b6b24eb05a23b7a339071ad85f70de8f4)
2010-02-23 10:38:49 +01:00
Rusty Russell
435fb78d13 Leave sequence number alone when merely migrating records.
(Based on earlier version from Ronnie which modified tdb; this one
is standalone).

When storing records in a tdb that has "automatic seqnum updates"
also check if the actual data for the record has changed or not.

If it has not changed at all, except for possibly the header,
this is likely just a dmaster migration operation in which case
we want to write the record to the tdb but we do not want the tdb
sequence number to be increased.

This resolves the problem of notify.tdb being thrashed under load:
the heuristic in smbd to only reread this when the sequence number
increases (rarely) breaks down.

Before, running nbench --num-progs=512 across 4 nodes, we saw numbers like:
 512      1496  118.33 MB/sec  execute 60 sec  latency 0.00 msec
And turning on latency tracking, this was typical in the logs:
 ctdbd: High latency 9380914.000000s for operation lockwait on database notify.tdb

After this commit:
  512      2451  143.85 MB/sec  execute 60 sec  latency 0.00 msec
And no more latency messages...

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

(This used to be ctdb commit 9ed2f8b2fcb7e3f0d795eef22cfa317066490709)
2010-02-16 11:02:25 +11:00
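The rule in 435fb78d13 (write the record, but only bump the sequence number when the data really changed) can be sketched on a single-record toy store. `struct toy_db` and `toy_store()` are hypothetical; the real code operates on tdb records with a ctdb header.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define REC_MAX 64   /* hypothetical maximum record size */

struct toy_db {
    uint8_t data[REC_MAX];
    size_t len;
    uint32_t dmaster;    /* header field: which node owns the record */
    uint64_t seqnum;     /* bumped only on real data changes */
};

/* Store header and data; returns 0 on success, -1 if data is too large. */
static int toy_store(struct toy_db *db, uint32_t dmaster,
                     const void *data, size_t len)
{
    int changed;

    assert(db != NULL);
    if (len > REC_MAX) {
        return -1;
    }
    /* Compare payloads only: a header-only change is a migration. */
    changed = (len != db->len) || memcmp(db->data, data, len) != 0;

    db->dmaster = dmaster;
    memcpy(db->data, data, len);
    db->len = len;
    if (changed) {
        db->seqnum++;
    }
    return 0;
}
```

Readers that poll the sequence number, as the commit message says smbd does for notify.tdb, then see no change when a record merely moves between nodes.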
Andrew Tridgell
c137725af8 fixed printing of high latency
(This used to be ctdb commit 88aacab30a36d66fe03d120bbf655edfe791ec32)
2010-02-16 10:58:24 +11:00
Andrew Tridgell
2406733ed2 ctdb: migrate to new dlinklist.h from Samba
(This used to be ctdb commit f63c091f12f8d582e9518673365c7c52479c470c)
2010-02-09 09:20:55 +11:00
Andrew Tridgell
3eb9735be5 ctdb: move ctdb_io.c to use TLIST_*() macros
This will make large packet queues much more efficient

(This used to be ctdb commit e3f198056230073135ea6354bbef30c5bb022f8f)
2010-02-04 15:37:53 +11:00
Ronnie Sahlberg
a2857b1504 We only queued up to 1000 packets per queue before we started dropping
packets, to prevent the queue from growing excessively if smbd has blocked.

This could cause traverse packets to be discarded if the main
smbd daemon does a traverse of a database while there is a recovery
(sending a reconfigured message to smbd, causing an avalanche of unlock
messages to be sent across the cluster).

This avalanche of messages could also cause the traversal message to be
discarded, causing the main smbd process to hang indefinitely waiting
for the traversal message that will never arrive.
for the traversal message that will never arrive.

Bump the maximum queue length before starting to discard messages from
1000 to 1000000 and at the same time rework the queueing slightly so we
can append messages cheaply to the queue instead of walking the list
from head to tail every time.

(This used to be ctdb commit 59ba5d7f80e0465e5076533374fb9ee862ed7bb6)
2010-02-04 09:54:06 +11:00
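The queueing rework mentioned in a2857b1504 (appending cheaply instead of walking the list from head to tail) amounts to keeping a tail pointer. A minimal sketch, with hypothetical names `struct pkt_queue`, `queue_append()` and `queue_pop()`:

```c
#include <assert.h>
#include <stddef.h>

struct pkt {
    struct pkt *next;
    int id;
};

struct pkt_queue {
    struct pkt *head;    /* next packet to send */
    struct pkt *tail;    /* last packet queued; NULL when empty */
    size_t len;
    size_t max_len;      /* drop new packets beyond this length */
};

/* O(1) append; returns -1 (packet dropped) if the queue is at its limit. */
static int queue_append(struct pkt_queue *q, struct pkt *p)
{
    if (q->len >= q->max_len) {
        return -1;
    }
    p->next = NULL;
    if (q->tail == NULL) {
        q->head = p;
    } else {
        q->tail->next = p;
    }
    q->tail = p;
    q->len++;
    return 0;
}

/* Remove and return the oldest packet, or NULL if the queue is empty. */
static struct pkt *queue_pop(struct pkt_queue *q)
{
    struct pkt *p = q->head;

    if (p == NULL) {
        return NULL;
    }
    assert(q->len > 0);
    q->head = p->next;
    if (q->head == NULL) {
        q->tail = NULL;
    }
    q->len--;
    p->next = NULL;
    return p;
}
```

With a limit as large as the 1000000 chosen in the commit, walking the list on every append would be costly, which is why the tail pointer matters.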
Ronnie Sahlberg
d7c00d8d7e Drop the debug level for logging fd creation to DEBUG_DEBUG
(This used to be ctdb commit eae1d4f9e52e73b4d8769868fffdafa590d03784)
2010-02-04 06:37:41 +11:00
Stefan Metzmacher
98ee69c66d server: add updateip event
metze

(This used to be ctdb commit 712ed0c4c0bff1be9e96a54b62512787a4aa6259)
2010-01-20 11:11:01 +01:00
Stefan Metzmacher
a1da4e05b5 server: allow multiple interfaces comma separated in public_addresses
metze

(This used to be ctdb commit 33a00ef7233051acdbc66410130ec5d876a8422f)
2010-01-20 11:10:58 +01:00
Stefan Metzmacher
fd06167caa server: add "init" event
This is needed because the "startup" event runs after the initial recovery,
but we need to do some actions before the initial recovery.

metze

(This used to be ctdb commit e953808449c102258abb6cba6f4abf486dda3b82)
2010-01-20 09:44:36 +01:00
Ronnie Sahlberg
a1d60b1511 Make the size of the in memory ringbuffer for keeping the recent log messages
configurable using --log-ringbuf-size=<num-entries>.

Add an entry in the sysconfig file to set this persistently.

(This used to be ctdb commit c79c2da69bc352f509e7fca4b9172a4b7f23c0f8)
2010-01-15 15:38:56 +11:00
Rusty Russell
af2613e16f ctdb: use mlockall, cautiously
We don't want ctdb stalling due to paging; this can be far worse than
scheduling delays.  But if we simply do mlockall(MCL_FUTURE), it
increases the risk that mmap (ie. tdb open) or malloc will fail,
causing us to abort.

This patch is a compromise: we mlock all current pages (including
10k of future stack for expansion) and then relock when a client
asks us to open a TDB.  We warn, but don't exit, if it fails.
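
A minimal sketch of the "lock what is mapped now, warn on failure" compromise, assuming a POSIX system that provides mlockall(). The helper name `lock_current_memory` and the way the stack is touched are assumptions made for illustration; the patch itself may reserve the stack differently.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define STACK_RESERVE 10000  /* roughly the 10k of stack mentioned above */

/* Touch some stack so those pages are mapped, then lock every page that
 * is currently mapped.  MCL_FUTURE is deliberately not used, so later
 * mmap/malloc calls are not at risk of failing because of the lock.
 * Returns 0 on success, -1 on failure; failure only produces a warning. */
static int lock_current_memory(void)
{
	volatile char reserve[STACK_RESERVE];
	size_t i;

	for (i = 0; i < sizeof(reserve); i++) {
		reserve[i] = 0;
	}

	if (mlockall(MCL_CURRENT) != 0) {
		fprintf(stderr, "warning: mlockall(MCL_CURRENT) failed: %s\n",
			strerror(errno));
		return -1;
	}
	return 0;
}
```

A caller would invoke this once at startup and again after each new TDB is mapped, ignoring a -1 return apart from the warning.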

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 82f778e85440bc713d3f87c08ddc955d3cfce926)
2009-12-16 20:57:20 +10:30
Rusty Russell
c488ba440a Remove RT priority, use niceness.
1) It's buggy.  Code needs to be carefully written (ie. no busy
   loops) to handle running with it, and we fork and run scripts.[1]

2) It makes debugging harder.  If ctdbd loops (as has happened recently)
   it can be extremely hard to get in and see what's happening.  We've already
   seen the valgrind hacks.

3) We have seen recent scheduler problems.  Perhaps they are unrelated,
   but removing this very unusual setup is unlikely to hurt.

4) It doesn't make anything faster.  Under all but the most perverse of
   circumstances, 99% of the cpu gives the same performance as 100%, and
   we will always preempt normal processes anyway.

[1] I made this worse in 0fafdcb8d353 "eventscript: fork() a child for
    each script" by removing the switch_from_server_to_client() which
    restored it, but even that was only for monitor scripts.  Others were
    run with RT priority.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 482c302d46e2162d0cf552f8456bc49573ae729d)
2009-12-16 19:26:22 +10:30
Rusty Russell
5d99a1a47c eventscript: expose call names and enum
We're going to need this so ctdb can query non-monitor status.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 53bc5ca23ca55a3ac63a440051f16716944a2a51)
2009-12-08 01:47:13 +10:30
Ronnie Sahlberg
8f442f1c0c Use statically allocated ringbuffer to store the last 500k log entries
in memory instead of dynamically allocated ones so that we reduce the pressure
on malloc/free.
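
As a rough illustration of a statically allocated log ringbuffer, the sketch below keeps a fixed array plus the index of the oldest entry and a count. The names (`ring_log`, `ring_get`), the entry layout and the tiny `RING_SIZE` are assumptions chosen for readability; they are not taken from the ctdb source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define RING_SIZE 4   /* tiny for illustration; the commit keeps 500k entries */
#define MSG_MAX   64

struct log_entry {
	int level;
	char msg[MSG_MAX];
};

/* Static storage: no malloc/free per log message. */
static struct log_entry ring[RING_SIZE];
static unsigned ring_first;  /* index of the oldest entry */
static unsigned ring_count;  /* number of valid entries */

/* Store a message, overwriting the oldest entry once the ring is full. */
static void ring_log(int level, const char *msg)
{
	unsigned slot = (ring_first + ring_count) % RING_SIZE;

	if (ring_count == RING_SIZE) {
		/* Full: slot is the oldest entry, so advance the start. */
		ring_first = (ring_first + 1) % RING_SIZE;
	} else {
		ring_count++;
	}
	ring[slot].level = level;
	snprintf(ring[slot].msg, sizeof(ring[slot].msg), "%s", msg);
}

/* Return the i-th oldest entry, or NULL if i is out of range. */
static const struct log_entry *ring_get(unsigned i)
{
	if (i >= ring_count) {
		return NULL;
	}
	return &ring[(ring_first + i) % RING_SIZE];
}
```

Tracking a count rather than a "last" index makes the full and empty states unambiguous.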

(This used to be ctdb commit c5cbb95512f034abeec515579983bf7ac55eadd9)
2009-12-04 11:36:27 +11:00
Rusty Russell
9e84872ecd ctdb_io: fix use-after-free on invalid packets
Wolfgang saw a talloc complaint about using freed memory in ctdb_tcp_read_cb.
His fix was to remove the talloc_free() in that function, which causes
loops when a socket is closed (as it does not get removed from the event
system), eg:
	netcat 192.168.1.2 4379 < /dev/null

The real bug is that when we have more than one pending packet in the
queue, we loop calling the callback without any safeguards should that
callback free the queue (as it tends to do on invalid packets).  This
can be reproduced by sending more than one bogus packet at once:
	# Length word at start: 4 == empty packet (assumed little endian)
	/usr/bin/printf \\4\\0\\0\\0\\4\\0\\0\\0 > /tmp/pkt
	netcat 192.168.1.2 4379 < /tmp/pkt

Using a destructor we can check if the callback frees us, and exit
immediately.  Elsewhere, we return after the callback anyway.
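
The guard can be sketched without talloc as follows. Here an explicit `queue_free()` stands in for a talloc destructor: it sets a flag that lives on the dispatcher's stack, so the dispatch loop can tell that the callback freed the queue and stop touching it. All names are hypothetical and the queue layout is simplified.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct queue;
typedef void (*queue_cb)(struct queue *q, int pkt);

struct queue {
	int pending[8];
	int npending;
	queue_cb callback;
	bool *destroyed;  /* points at a flag on the dispatcher's stack, or NULL */
};

/* Stand-in for a talloc destructor: note the destruction, then free. */
static void queue_free(struct queue *q)
{
	if (q->destroyed != NULL) {
		*q->destroyed = true;
	}
	free(q);
}

/* Deliver all pending packets; returns how many callbacks were made.
 * If a callback frees the queue, return immediately without touching q. */
static int queue_dispatch(struct queue *q)
{
	bool destroyed = false;
	int handled = 0;
	int n = q->npending;
	int i;

	q->destroyed = &destroyed;
	for (i = 0; i < n; i++) {
		int pkt = q->pending[i];

		q->callback(q, pkt);
		handled++;
		if (destroyed) {
			return handled;  /* q is gone */
		}
	}
	q->npending = 0;
	q->destroyed = NULL;
	return handled;
}

/* Example callback: frees the queue on an invalid (negative) packet. */
static void drop_on_invalid(struct queue *q, int pkt)
{
	if (pkt < 0) {
		queue_free(q);
	}
}
```

With two bogus packets queued at once, only the first callback runs; without the flag the loop would read the freed queue on the second iteration.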

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

(This used to be ctdb commit 4d0523dd94fb07e860b3e8118691f93d1ef8d0fa)
2009-12-02 11:27:23 +11:00
Ronnie Sahlberg
cc2d81a77c make the ringbuffer logging more efficient and marshall the data by writing to a tmpfile instead of continuously talloc resizing a blob
(This used to be ctdb commit 6427f0b68d60b556a023f64e15e156000ba6f943)
2009-11-18 19:10:50 +11:00
Ronnie Sahlberg
bc2675119d add an in memory ringbuffer where we store the last 500000 log entries regardless of log level.
add commands to extract this in memory buffer and to clear it

(This used to be ctdb commit 29d2ee8d9c6c6f36b2334480f646d6db209f370e)
2009-11-18 12:44:18 +11:00
Ronnie Sahlberg
8aacfa348d Suggestion from Volker,
make ctdb_queue_length() cheaper by using a counter variable instead of counting the number of packets each time.

(This used to be ctdb commit 331c6e3afd96d8b5e191153a631efdbdabb6ea33)
2009-10-26 12:20:52 +11:00
Ronnie Sahlberg
a92ba7f729 lower the debug levels for the "create FD messages" so we don't fill up the logs.
(This used to be ctdb commit 87146db2769c2ec494813685bf9cec0d2a6336c3)
2009-10-21 15:26:24 +11:00
Ronnie Sahlberg
9b8c72c446 When clients have blocked, perhaps because the node is banned or stopped and the client is blocked trying to tdb_fetch() a record, make sure we don't queue up too many REQ_MESSAGES.
Add a new tunable to control the maximum queue size we allow to a blocked client before we start discarding REQ_MESSAGES instead of queueing them for delivery.

This avoids queueing up a very large number of the MESSAGES that samba daemons
send to each other for nodes that are blocked/banned/stopped for extended periods.

(This used to be ctdb commit f76d6fed8f9630450263b9fa4b5fdf3493fb1e11)
2009-10-21 15:20:55 +11:00
Ronnie Sahlberg
9de3652380 add logging every time we create a file descriptor in the main ctdb daemon
so we can spot if there are leaks.

plug two file descriptor leaks that occur when sending an ARP fails
and one leak when we cannot parse the local address during tcp connection establishment

(This used to be ctdb commit ddd089810a14efe4be6e1ff3eccaa604e4913c9e)
2009-10-15 11:24:54 +11:00
Ronnie Sahlberg
ff104c6f5a When we dispatch a message to a handler, pass the data as a real talloc object so that the handler can talloc_steal() the message content.
(This used to be ctdb commit c69f5fe1db5b6ed4a009f0c10ab82c6f32b2e0bc)
2009-07-02 12:58:49 +10:00
Ronnie Sahlberg
93026f4cbf update the handling of debug levels so that we always can use a literal instead of a numeric value.
validate the input values used and refuse setting the debug level to an unknown value

(This used to be ctdb commit daec49cea1790bcc64599959faf2159dec2c5929)
2009-07-01 09:17:13 +10:00
Ronnie Sahlberg
9921e1ec21 change the socket we use for sending gratuitous ARPs from AF_INET/SOCK_PACKET to AF_PACKET/SOCK_RAW
(This used to be ctdb commit 2c4c20d7803f4449f8d463314c40d4734ec80e2f)
2009-05-21 14:10:45 +10:00
Ronnie Sahlberg
26e1486db7 Whitespace changes and using the CTDB_NO_MEMORY() macro changes to
the previous patch.

(This used to be ctdb commit d623ea7c04daa6349b42d50862843c9f86115488)
2009-05-21 11:49:16 +10:00
Sumit Bose
2fcedf6dac add missing checks on so far ignored return values
Most of these were found during a review by Jim Meyering <meyering@redhat.com>

(This used to be ctdb commit 3aee5ee1deb4a19be3bd3a4ce3abbe09de763344)
2009-05-21 11:22:21 +10:00
Ronnie Sahlberg
98a54c4675 Track how long it takes to take out the recovery lock from both the main daemon and also from the recovery daemon.
Log this in "ctdb statistics".

Also add a variable "RecLockLatencyMs" that will log an error every time it takes longer than this to access the reclock file.

(This used to be ctdb commit 042377ed803bb8f7ca9d6ea1a387427b7b8ba45a)
2009-05-14 10:33:25 +10:00
Ronnie Sahlberg
689f76f0b0 Merge branch 'obnox'
(This used to be ctdb commit 972036a5d510fb9b399f1ee34a8861dee4221267)
2009-03-24 17:49:55 +11:00
Ronnie Sahlberg
7265c713db we need to set the port properly in the parse_ip helper
(This used to be ctdb commit 43fe18d86995744ba61c7a6405b70edcb265930a)
2009-03-24 13:45:11 +11:00
Michael Adam
839dec1b12 move common code of system_linux.c and system_aix.c into new system_common.c
Michael

(This used to be ctdb commit 124874847e5e03ce2a44bddfe778f01dfb0a7a03)
2009-02-28 03:08:31 +01:00
Michael Adam
3cca0f75e4 Fix treatment of link local ipv6 addresses: set the scope id.
metze / Michael

Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit 9d12de1ca6107801dada927729e755c0949d73bf)
2009-01-19 22:50:53 +01:00
Michael Adam
b6828ab22f ctdb_util: use the parse_ip() function - avoid code duplication
Michael

Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit 1461b78c47810073f8637bc4592cacaadcdaf14b)
2009-01-19 22:49:13 +01:00
Michael Adam
8ec92c92e2 ctdb_sys_have_ip: fix ipv6 support for aix, too.
Michael

Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit 8b5f1e80e3e2e9ca2198e1baee8af36aa5d6c5b5)
2009-01-19 22:49:12 +01:00
Stefan Metzmacher
bf86562144 ctdb_sys_have_ip: don't overwrite input data (setting port to 0)
metze

Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit de71ce2195bb4f6a96b12437a2d4d1424fd1c59c)
2009-01-19 22:49:12 +01:00
Michael Adam
3dea35263c Fix verification of IP allocation with ipv6 addresses on Linux.
Set sin_port or sin6_port to 0, depending on sa_family.

Michael

Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit e0c70110e241b065c42c1c07f32c3657bac5d98b)
2009-01-19 22:49:11 +01:00
root
5a2aad1af4 If ctdbd was started with the --socket option then we also set the CTDB_SOCKET variable so that the eventscripts can pick up the name proper
(This used to be ctdb commit 7b41b518c3ffebf1712445a8c6242509dc798003)
2008-12-08 17:29:17 +11:00
Ronnie Sahlberg
07d35c754f add a CTDB_SOCKET variable that can be used to override the default
/tmp/ctdb.socket

(This used to be ctdb commit b75e2263c565c21ecbbd98fbd2c10787e467bf5c)
2008-11-11 14:49:30 +11:00
Ronnie Sahlberg
d7007793ea latency is measured in us, not ms
use an explicit ctdb_db variable instead of dereferencing state

(This used to be ctdb commit 8c6a02fb423a8cbcbfc706767e3d353cd48073c3)
2008-10-30 13:34:10 +11:00
Ronnie Sahlberg
e1b0cea427 add control and logging of very high latencies.
log the type of operation and the database name for all latencies higher
than a threshold

(This used to be ctdb commit 1d581dcd507e8e13d7ae085ff4d6a9f3e2aaeba5)
2008-10-30 12:49:53 +11:00
Ronnie Sahlberg
cb300382b0 update TAKEIP/RELEASEIP/GETPUBLICIP/GETNODEMAP controls so we retain an
older ipv4-only version of these controls.

We need this so that we are backward compatible with old versions of ctdb
and so that we can interoperate with an ipv4-only recmaster during a
rolling upgrade.

(This used to be ctdb commit 6b76c520f97127099bd9fbaa0fa7af1c61947fb7)
2008-10-14 10:40:29 +11:00
Ronnie Sahlberg
348cad7bc1 lower the debuglevel when logging unknown idr in responses
(This used to be ctdb commit a72f5b7d1560e427e18b1c55a2932a7fb037f4c7)
2008-09-09 13:59:48 +10:00
Ronnie Sahlberg
7a78a78a1c From C Cowan.
Patch to make AIX compile with the new ipv6 additions.

(This used to be ctdb commit e26ce5140ed005725f8b7ac8ba23a180fd7d5337)
2008-09-08 08:57:42 +10:00
Ronnie Sahlberg
5193caec6d make the function to canonicalize a sockaddr structure public
(This used to be ctdb commit 1157d61a0bc557d8ffc453c518dfc48473492bfd)
2008-08-20 11:58:27 +10:00
Ronnie Sahlberg
da1c17bf46 when we compare ip addresses in ctdb_same_ip we must first canonicalize the addresses so that we realize that 127.0.0.1:22 is really the same thing as ::ffff:127.0.0.1:22
Downgrade all AF_INET6 ::ffff:xxxx:xxxx sockaddresses into AF_INET ones
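
The canonicalization step can be sketched with the standard `IN6_IS_ADDR_V4MAPPED()` macro. The union `sock_addr` and the helpers `canonicalize()` and `same_ip_port()` below are hypothetical stand-ins written for this illustration; they are not the ctdb functions, whose signatures are not shown here.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical address union covering both families. */
union sock_addr {
	struct sockaddr sa;
	struct sockaddr_in ip;
	struct sockaddr_in6 ip6;
};

/* Copy in to out, downgrading ::ffff:a.b.c.d to a plain AF_INET address. */
static void canonicalize(const union sock_addr *in, union sock_addr *out)
{
	memset(out, 0, sizeof(*out));

	if (in->sa.sa_family == AF_INET6 &&
	    IN6_IS_ADDR_V4MAPPED(&in->ip6.sin6_addr)) {
		out->ip.sin_family = AF_INET;
		out->ip.sin_port = in->ip6.sin6_port;
		/* The IPv4 address is the last 4 bytes of the mapped address. */
		memcpy(&out->ip.sin_addr, &in->ip6.sin6_addr.s6_addr[12], 4);
	} else if (in->sa.sa_family == AF_INET6) {
		out->ip6.sin6_family = AF_INET6;
		out->ip6.sin6_port = in->ip6.sin6_port;
		out->ip6.sin6_addr = in->ip6.sin6_addr;
		out->ip6.sin6_scope_id = in->ip6.sin6_scope_id;
	} else if (in->sa.sa_family == AF_INET) {
		out->ip.sin_family = AF_INET;
		out->ip.sin_port = in->ip.sin_port;
		out->ip.sin_addr = in->ip.sin_addr;
	}
}

/* Compare address and port after canonicalizing both sides. */
static int same_ip_port(const union sock_addr *a, const union sock_addr *b)
{
	union sock_addr ca, cb;

	canonicalize(a, &ca);
	canonicalize(b, &cb);
	if (ca.sa.sa_family != cb.sa.sa_family) {
		return 0;
	}
	if (ca.sa.sa_family == AF_INET) {
		return ca.ip.sin_port == cb.ip.sin_port &&
		       ca.ip.sin_addr.s_addr == cb.ip.sin_addr.s_addr;
	}
	if (ca.sa.sa_family == AF_INET6) {
		return ca.ip6.sin6_port == cb.ip6.sin6_port &&
		       memcmp(&ca.ip6.sin6_addr, &cb.ip6.sin6_addr,
			      sizeof(ca.ip6.sin6_addr)) == 0;
	}
	return 0;
}
```

Comparing individual fields instead of using memcmp() on the whole structure avoids false mismatches caused by padding bytes.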

(This used to be ctdb commit b0fe4c45fc5ba1ecf62ebb921092c8a34e28a2bd)
2008-08-20 11:52:36 +10:00
Ronnie Sahlberg
8e17e75eac fix a bug in the tcp socketkiller for ipv6
(This used to be ctdb commit 83735951352a243da185031e4853e7e40c43a0fb)
2008-08-20 09:23:31 +10:00
Ronnie Sahlberg
37234887d9 fix the ipv6 checksum calculation for pseudoheader so that it actually works
add support to send ipv6 "gratuitous arp" aka neighbor solicitation packets from ctdb

Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>

(This used to be ctdb commit 0a38ea11af9237501f2951fee698a59b46f8750d)
2008-08-19 18:24:08 +10:00
Ronnie Sahlberg
ef997d344f initial ipv6 patch
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>

(This used to be ctdb commit 1f131f21386f428bbbbb29098d56c2f64596583b)
2008-08-19 14:58:29 +10:00
Andrew Tridgell
b8e93a9233 added marshalling helper functions
(This used to be ctdb commit 12087e7d751a8756076662cd8db5dcf35316c0c5)
2008-07-30 19:58:17 +10:00
Andrew Tridgell
5672c421d1 we don't need ctdb_ltdb_persistent_store() any more
(This used to be ctdb commit 2bc7f3aef4668bd1680db87ef215c349280a84f2)
2008-07-30 19:58:03 +10:00
Ronnie Sahlberg
52f03be2d9 From Chris Cowan, patch to make aix compile again
(This used to be ctdb commit 77255bb5523b8d132770a0a7d4ba29ec9e5043cc)
2008-07-09 10:17:39 +10:00
Ronnie Sahlberg
811493a0b6 zero out the sockaddr_in structure before we store the ipv4 data in it to make sure that all data is initialized. Otherwise valgrind will complain about uninitialized data when we write this structure out on the wire
(This used to be ctdb commit 80e249512f93bca2445d40590db38d31be2aafd7)
2008-07-07 08:53:22 +10:00
Andrew Tridgell
30f8411eb9 ensure pad bytes in the ltdb_header are initialised
(This used to be ctdb commit 00b1a635e3d61ca7c5487d65ac54f3eb6ea7355e)
2008-07-04 17:40:25 +10:00
Ronnie Sahlberg
d8433cacb2 first cut to convert takeover_callback_state{}
to use ctdb_sock_addr instead of sockaddr_in

(This used to be ctdb commit 5444ebd0815e335a75ef4857546e23f490a22338)
2008-06-04 17:12:57 +10:00
Ronnie Sahlberg
7d39ac131b convert handling of gratuitous arps and their controls and helpers to
use the ctdb_sock_addr structure so they work for both ipv4 and ipv6

(This used to be ctdb commit 86d6f53512d358ff68b58dac737ffa7576c3cce6)
2008-06-04 15:13:00 +10:00
Ronnie Sahlberg
ceaf488f05 do persistent writes in a child process
(This used to be ctdb commit 2da3d1f876f5d654f849af8a3e588f5a61300c3d)
2008-05-28 13:04:25 +10:00
Ronnie Sahlberg
52e44b37e2 disable transactions for now, there are more situations where there are conflicting locks and the "net" command is not prepared that the persistent store can fail.
(This used to be ctdb commit e07abdfbdb7963309898f0bf5382b8e951409f0a)
2008-05-22 18:33:54 +10:00
Ronnie Sahlberg
d895f43504 cleanup of the previous patch.
With these patches, ctdbd will enforce and (by default) always use
tdb_transactions when updating/writing records to a persistent database.

This might come with a small performance degradation since transactions
are slower than no transactions at all.

If a client, such as samba wants to use a persistent database but does NOT
want to pay the performance penalty, it can specify TDB_NOSYNC  as the
srvid parameter in the ctdb_control() for CTDB_CONTROL_DB_ATTACH_PERSISTENT.

In this case CTDBD will remember that "this database is not that important"
so I can use unsafe (no transaction) tdb_stores to write the updates.
It will be faster than the default (always use transaction) but less crash safe.

(This used to be ctdb commit 3d85d2cf669686f89cacdc481eaa97aef1ba62c0)
2008-05-22 13:12:53 +10:00
Ronnie Sahlberg
ed2cf0291d second try for safe transaction stores into persistent tdb databases
for stores into persistent databases, ALWAYS use a lockwait child to take out the lock for the record and never the daemon itself.

(This used to be ctdb commit 7fb6cf549de1b5e9ac5a3e4483c7591850ea2464)
2008-05-22 12:47:33 +10:00
Ronnie Sahlberg
909ff219e0 Start implementing support for ipv6.
This enhances the framework for sending tcp tickles to be able to send ipv6 tickles as well.

Since we can not use one single RAW socket to send both handcrafted ipv4 and ipv6 packets, instead of always opening TWO sockets, one ipv4 and one ipv6 we get rid of the helper ctdb_sys_open_sending_socket() and just open (and close)  a raw socket of the appropriate type inside ctdb_sys_send_tcp().
We know which type of socket v4/v6 to use based on the sin_family of the destination address.

Since ctdb_sys_send_tcp() opens its own socket we no longer need to pass a socket
descriptor as a parameter.  Get rid of this redundant parameter and fix up all callers.

(This used to be ctdb commit 406a2a1e364cf71eb15e5aeec3b87c62f825da92)
2008-05-14 15:47:47 +10:00
Ronnie Sahlberg
7d04ca2fc4 Add a missing include
(This used to be ctdb commit 6131f4b4fc7b65f83f3d57927b23393c84bd2a2b)
2008-05-14 15:37:20 +10:00
Ronnie Sahlberg
7178dfb656 add a checksum routine for tcp over ipv6
(This used to be ctdb commit b712762a1b8a3028625085e32136df4458b292c0)
2008-05-14 12:25:55 +10:00
Ronnie Sahlberg
b8eb5925cf Try to use tdb transactions when updating a record and record header inside the ctdb daemon.
If a transaction could be started, do safe transaction store when updating the record inside the daemon.
If the transaction could not be started (maybe another samba process has a lock on the database?) then just do a normal store instead (instead of blocking the ctdb daemon).

The client can "signal" ctdb that updates to this database should, if possible, be done using safe transactions by specifying the TDB_NOSYNC flag when attaching to the database.
The TDB flags are passed to ctdb in the "srvid" field of the control header when attaching using the CTDB_CONTROL_DB_ATTACH_PERSISTENT.

Currently, samba3.2 does not yet tell ctdbd to handle any persistent databases using safe transactions.

If samba3.2 wants a particular persistent database to be handled using
safe transactions inside the ctdbd daemon, it should pass
TDB_NOSYNC as the flags to the call to attach to a persistent database
in ctdbd_db_attach()     it currently specifies 0 as the srvid

(This used to be ctdb commit 8d6ecf47318188448d934ab76e40da7e4cece67d)
2008-05-12 13:37:31 +10:00
Ronnie Sahlberg
0e1a20b603 Revert "Revert "Revert "- accept an optional set of tdb_flags from clients on open a database,"""
remove the transaction stuff and push   so that the git tree will work

This reverts commit 539bbdd9b0d0346b42e66ef2fcfb16f39bbe098b.

(This used to be ctdb commit 876d3aca18c27c2239116c8feb6582b3a68c6571)
2008-04-10 15:59:51 +10:00
Ronnie Sahlberg
39f119b42c Revert "Revert "- accept an optional set of tdb_flags from clients on open a database,""
This reverts commit 171d1d71ef9f2373620bd7da3adaecb405338603.

(This used to be ctdb commit 539bbdd9b0d0346b42e66ef2fcfb16f39bbe098b)
2008-04-10 14:57:41 +10:00
Ronnie Sahlberg
9684befa16 Revert "- accept an optional set of tdb_flags from clients on open a database,"
This reverts commit 49330f97c78ca0669615297ac3d8498651831214.

(This used to be ctdb commit 171d1d71ef9f2373620bd7da3adaecb405338603)
2008-04-10 14:45:45 +10:00
Andrew Tridgell
dc15a9c1f6 - accept an optional set of tdb_flags from clients on open a database,
thus allowing the client to pass through the TDB_NOSYNC flag

- ensure that tdb_store() operations on persistent databases that don't
  have TDB_NOSYNC set happen inside a transaction wrapper, thus making
  them crash safe

(This used to be ctdb commit 49330f97c78ca0669615297ac3d8498651831214)
2008-04-10 15:25:48 +10:00
Ronnie Sahlberg
27a7f854f5 add improvements to tracking memory usage in ctdbd and the recovery daemon
and a ctdb command to pull the talloc memory map from a recovery daemon
ctdb rddumpmemory

(This used to be ctdb commit d23950be7406cf288f48b660c0f57a9b8d7bdd05)
2008-04-01 15:34:54 +11:00
Ronnie Sahlberg
78081de82a from tridge: decorate dumpmemory output so that packets that are queued show up with a little more information to make memory leak debugging easier
(This used to be ctdb commit 890832ba37d92c7996b38735451f93592c37ff79)
2008-04-01 11:31:42 +11:00
Andrew Tridgell
f6e53f433b merge from ronnie
(This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c)
2008-02-04 20:07:15 +11:00
Andrew Tridgell
9d6ac0cf55 added debug constants to allow for better mapping to syslog levels
(This used to be ctdb commit 7ba8f1dde318eab03f4257e5a89fd23e7281e502)
2008-02-04 17:44:24 +11:00
Andrew Tridgell
3b3fceacbe block alarm signals during critical sections of vacuum
(This used to be ctdb commit cfb14ae76f00f10d27b56c034b2247ab12d63065)
2008-01-10 09:43:14 +11:00
Andrew Tridgell
c60988325d added support for persistent databases in ctdbd
(This used to be ctdb commit 3115090a0d882beca9d70761130b74bb0821f201)
2007-09-21 12:24:02 +10:00
Ronnie Sahlberg
ab1c8c074e merge from tridge
(This used to be ctdb commit eda3caa77be352967a41ff9bddda5296c94797a9)
2007-09-13 14:28:18 +10:00
Andrew Tridgell
3c0f61cb92 we don't need the is_loopback logic in ctdb any more
(This used to be ctdb commit 4ecf29ade0099c7180932288191de9840c8d90a9)
2007-09-13 10:45:06 +10:00
Ronnie Sahlberg
9c1b2f4856 merged patch from tridge
(This used to be ctdb commit 90ab044093f67b656e21861ce12d6fee5794d21f)
2007-09-10 16:23:06 +10:00
Andrew Tridgell
f3ae1cdb02 - use struct sockaddr_in more consistently instead of string addresses
- allow for public_address lines with a defaulting interface

(This used to be ctdb commit 29cb760f76e639a0f2ce1d553645a9dc26ee09e5)
2007-09-10 14:27:29 +10:00
Ronnie Sahlberg
4ac749bfa4 change the signature to ctdb_sys_have_ip() to also return:
- a bool that specifies whether the ip was held by a loopback adaptor or not
- the name of the interface where the ip was held

when we release an ip address from an interface, move the ip address 
over to the loopback interface

when we release an ip address after we have moved it onto loopback,
use 60.nfs to kill off the server side (the local part) of the tcp
connection so that the tcp connections don't survive a
failover/failback

61.nfstickle: since we kill the tcp connections when we release an ip
address, we no longer need to restart the nfs service in 61.nfstickle

update ctdb_takeover to use the new signature for ctdb_sys_have_ip

when we add a tcp connection to kill in ctdb_killtcp_add_connection()
check if either the source or destination address match a known public
address

(This used to be ctdb commit f9fd2a4719c50f6b8e01d0a1b3a74b76b52ecaf3)
2007-09-10 07:20:44 +10:00
Ronnie Sahlberg
157be530dd change ctdb_ctrl_getvnn to ctdb_ctrl_getpnn
(This used to be ctdb commit ef47cc4cd416065c69382e4d9e76c30a0a34e42f)
2007-09-04 10:38:48 +10:00
Ronnie Sahlberg
eb4cf6a686 change ctdb->vnn to ctdb->pnn
(This used to be ctdb commit 8c776e5707e503ec6586aae39ac6b3ea5a2fd2bc)
2007-09-04 10:06:36 +10:00
Ronnie Sahlberg
6681da31df add an initial implementation of a service_id structure and three
controls to  register/unregister/check a server id.

a server id consists of TYPE:VNN:ID    where type is specific to the 
application.  VNN is the node where the serverid was registered and ID 
might be a node unique identifier such as a pid or similar.


Clients can register a server id for themselves at the local ctdb daemon.
When a client disappears or when the domain socket connection for the
client drops, any and all server ids registered across that domain
socket will also be automatically removed from the store.

clients can register as many server_ids as they want at the same time    
but each TYPE:VNN:ID must be globally unique.

Clients have the option of explicitly unregistering a server id by using
the UNREGISTER control.


Registration and unregistration can only be done by clients to the local 
daemon. clients can not register their server id to a remote node.


clients can check if a server id does exist on any ctdb node in the 
network by using the check control
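
To make the TYPE:VNN:ID idea concrete, here is a small sketch of a per-daemon registry with register, unregister and check operations. The field widths, the fixed-size array and every name (`struct server_id`, `struct registry`, `registry_*`) are assumptions for illustration only; the commit does not show the real layout, and this sketch only covers the local store, not the cross-node check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout of a TYPE:VNN:ID triple. */
struct server_id {
	uint32_t type;  /* application specific */
	uint32_t vnn;   /* node where the id was registered */
	uint32_t id;    /* node-unique identifier, e.g. a pid */
};

#define MAX_IDS 16

struct registry {
	struct server_id ids[MAX_IDS];
	size_t count;
};

static int sid_equal(const struct server_id *a, const struct server_id *b)
{
	return a->type == b->type && a->vnn == b->vnn && a->id == b->id;
}

/* Returns 1 if the id is registered, 0 otherwise. */
static int registry_check(const struct registry *r, const struct server_id *sid)
{
	size_t i;

	for (i = 0; i < r->count; i++) {
		if (sid_equal(&r->ids[i], sid)) {
			return 1;
		}
	}
	return 0;
}

/* Returns 0 on success, -1 if the id already exists or the store is full. */
static int registry_register(struct registry *r, const struct server_id *sid)
{
	if (r->count == MAX_IDS || registry_check(r, sid)) {
		return -1;
	}
	r->ids[r->count] = *sid;
	r->count++;
	return 0;
}

/* Returns 0 on success, -1 if the id was not registered. */
static int registry_unregister(struct registry *r, const struct server_id *sid)
{
	size_t i;

	for (i = 0; i < r->count; i++) {
		if (sid_equal(&r->ids[i], sid)) {
			/* Fill the hole with the last entry. */
			r->count--;
			r->ids[i] = r->ids[r->count];
			return 0;
		}
	}
	return -1;
}
```

Rejecting a duplicate triple at registration time is what keeps each TYPE:VNN:ID unique within the store.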

(This used to be ctdb commit d44798feec26147c5cc05922cb2186f0ef0307be)
2007-08-24 15:53:41 +10:00
Andrew Tridgell
df9ec77b6b merge from volker
(This used to be ctdb commit a5587b3c065f7115ad5e55429c2c9d9923d3b4dc)
2007-08-22 17:18:55 +10:00
Ronnie Sahlberg
6fc0653b97 zero out the sa struct to suppress a valgrind error
(This used to be ctdb commit b17ff60ad4c5fac76d3f77dacb10c30ae564bf09)
2007-08-15 12:34:41 +10:00
Ronnie Sahlberg
e0957ba4a4 add a function to return the first entry that is stored in a tree where
the key is an array of uint32_t

(This used to be ctdb commit 99553397aade4f1c4d17ef14dad406934958c80a)
2007-08-15 10:57:21 +10:00
Ronnie Sahlberg
adb49f02f0 change the mem hierarchy for trees. let the node be owned by the data
we store in the tree and use a node destructor so that when the data is 
talloc_free()d we also remove the node from the tree.

(This used to be ctdb commit b8dabd1811ebd85ee031563e95085f720a2fa04d)
2007-08-09 14:08:59 +10:00
Ronnie Sahlberg
18deb7e015 remove an unused function
(This used to be ctdb commit 38a26d1f3709fbce551bc3a7af8bacd0ff465bca)
2007-08-09 07:59:50 +10:00
Ronnie Sahlberg
203306400e add helpers to traverse a tree where the key is an array of uint32
(This used to be ctdb commit d328c66827cafff6356e96df2a782930274fe139)
2007-08-08 13:50:18 +10:00
Ronnie Sahlberg
9525b010aa add helpers to add/lookup/delete nodes in a tree where the key is an
array of uint32

(This used to be ctdb commit b7e0996e7735c8629d07453b9d335990c2dbc3db)
2007-08-08 12:30:12 +10:00
Ronnie Sahlberg
c1bfda5772 add a tree insert function that takes a callback function to populate
the data of the tree.
this callback makes it more convenient to manage cases where one might 
want to insert multiple entries into the tree with the same key

rename the tree->tree pointer to tree->root  since this is supposed to 
point to the root of the tree

add a small test utility

(This used to be ctdb commit f6313bed9c53e0d1c36c9e08ac707e88e2a4fcd5)
2007-08-08 11:21:18 +10:00
Ronnie Sahlberg
49f0317b21 when inserting data in the tree, if there was already a node with the
same key then replace the data in the node with the new data and return 
the pointer to the previous data held in the node.

this allows a caller to avoid having to first check if a node already 
exists before inserting a possibly duplicate/colliding entry and lets 
the caller do whatever it needs to do after the fact.
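
The replace-and-return-previous behaviour can be illustrated with a plain unbalanced binary search tree. This is deliberately not a red-black tree and does not use talloc; the names (`struct tree`, `tree_insert`, `tree_lookup`) are hypothetical, and only the insert semantics described above are being shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct node {
	uint32_t key;
	void *data;
	struct node *left;
	struct node *right;
};

struct tree {
	struct node *root;
};

/* Insert data under key.  If the key already exists, replace the stored
 * data and return the previous pointer; otherwise return NULL.
 * (This sketch also returns NULL on allocation failure.) */
static void *tree_insert(struct tree *t, uint32_t key, void *data)
{
	struct node **link = &t->root;

	while (*link != NULL) {
		if (key == (*link)->key) {
			void *old = (*link)->data;

			(*link)->data = data;
			return old;
		}
		link = (key < (*link)->key) ? &(*link)->left : &(*link)->right;
	}

	*link = calloc(1, sizeof(**link));
	if (*link == NULL) {
		return NULL;
	}
	(*link)->key = key;
	(*link)->data = data;
	return NULL;
}

/* Return the data stored under key, or NULL if the key is absent. */
static void *tree_lookup(const struct tree *t, uint32_t key)
{
	const struct node *n = t->root;

	while (n != NULL) {
		if (key == n->key) {
			return n->data;
		}
		n = (key < n->key) ? n->left : n->right;
	}
	return NULL;
}
```

A caller can then insert unconditionally and, if the return value is non-NULL, free or merge the old entry afterwards.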

(This used to be ctdb commit 6634cabb910c26400780d51727ff2d1ba5e16e36)
2007-08-08 08:20:46 +10:00
Ronnie Sahlberg
24d1ee09ec don't wait indefinitely for the initial getvnn to complete
(This used to be ctdb commit ef38725ad8c5f1792feacb14b8888f246187da15)
2007-08-08 07:35:53 +10:00
Ronnie Sahlberg
d69055b789 change error output in ctdb and in ctdb_cmdline_client to print to
stderr instead of stdout

(This used to be ctdb commit 6e6e165c2d8f0963ce37567c23aaa012fc3e89d9)
2007-08-07 12:51:25 +10:00
Ronnie Sahlberg
c76f323f73 fix the remaining bugs with tree delete that testing found.
the binary tree should work reasonably well now for delete.
insert always worked fine.

(This used to be ctdb commit 452cda26b206549504480b77483308b44cfa8b01)
2007-07-30 09:09:34 +10:00
Ronnie Sahlberg
8df376b3f0 remove dead code
(This used to be ctdb commit 97ffcda5e56f04aed2f4f8b889b4eb6311f69c4d)
2007-07-26 07:22:36 +10:00
Ronnie Sahlberg
84851b674c fix some remaining bugs with deleting nodes
(This used to be ctdb commit 8aec0e0bef794afce1d2abf762bfadee4ab7e619)
2007-07-26 07:21:32 +10:00
Ronnie Sahlberg
8e0a12463b there were situations where we were not guaranteed that a sibling had 2
child nodes which would cause a segv when trying to dereference those
two child nodes in order to read their color

(This used to be ctdb commit 56f5fb8f8f3e667f5bc13f09fb5de01f5f2e0fae)
2007-07-25 17:53:55 +10:00
Ronnie Sahlberg
904e5ba55e if sibling is NULL it is a leaf node and thus black.
(This used to be ctdb commit 400488472ba64514fa6534d5de90edba6c5e27c6)
2007-07-25 17:22:04 +10:00
Ronnie Sahlberg
13e56a81a9 initial version of talloc based red-black trees
very initial version

(This used to be ctdb commit 121f5c9dfc8e441313e42d94bed9c9f13ec91398)
2007-07-24 18:51:13 +10:00
Ronnie Sahlberg
217142d1e9 add some support for controlling Linux or AIX in the makefile
this should really be done by configure

(This used to be ctdb commit 5a855599288995659e81f1bdbed157bdb207f94a)
2007-07-14 10:58:51 +10:00
Ronnie Sahlberg
a8211f9d1f add an initial system_aix.c to manage raw sockets under aix
(This used to be ctdb commit 277527befedd6f5dfde1c51698245197afd83d99)
2007-07-14 10:27:34 +10:00
Ronnie Sahlberg
6b143c4c5e update the comment at the top of file to reflect the purpose of the file
(This used to be ctdb commit 4d7670102f44ff0f99dafeb050843be38cb258b0)
2007-07-13 17:10:09 +10:00
Ronnie Sahlberg
f09566a81a add a private_data field to the killtcp structure and let the system
specific routines populate it as they see fit when creating a
capture socket.
pass this structure to read_tcp and close capture socket as parameter

(This used to be ctdb commit 79bbfcfb2223889126fe307d5bbfd24917da07ee)
2007-07-13 17:07:10 +10:00
Andrew Tridgell
1e14ecd176 - merge from ronnie
- cleaner handling of system capture socket

(This used to be ctdb commit d194a41a71b8466d0726dcbae3970a86386fcb3c)
2007-07-13 11:31:18 +10:00
Andrew Tridgell
d2a5af7eb8 fully save/restore scheduler parameters
(This used to be ctdb commit 59408eabe7515d49a6eef3b6fb2590a1cd1df956)
2007-07-13 09:35:46 +10:00
Ronnie Sahlberg
850e634166 netinet/if_ether.h is more portable than net/ethernet.h
(This used to be ctdb commit ee84ea17529a27e22c1a0503d07aaeec1ef731e2)
2007-07-12 11:43:30 +10:00
Ronnie Sahlberg
9cde594006 the posix.4 name for the priority field is sched_priority
not __sched_priority

(This used to be ctdb commit c08c5a36b3f1dd2cb72278058cb5664816e1d339)
2007-07-12 11:31:20 +10:00
Ronnie Sahlberg
a650497680 as an optimization for when we want to send multiple tickles at a time
let the caller create the sending socket and use a single socket instead 
of one new one for each tickle.
pass a sending socket to ctdb_sys_send_tcp()

ctdb_sys_kill_tcp is no longer used so remove it

set the socketflags for close on exec and nonblocking in the helper that 
creates the sockets instead of in the caller

add a helper to create a sending socket to send tickles from

(This used to be ctdb commit 469f3fb238a0674a2b48fdf1a7e657e32428178a)
2007-07-12 09:22:06 +10:00
Ronnie Sahlberg
823b7d4a5f rename killtcp->fd to killtcp->capture_fd
we might want to have two sockets attached to the killtcp structure
one for capturing and a second one for sending, so we don't have to
create a new socket for each tickle we want to send

(This used to be ctdb commit b3e82ec38047bbec1edfd88ade264077d4cbd2ee)
2007-07-12 08:52:24 +10:00
Ronnie Sahlberg
e4db03f7e6 add a ctdb_ prefix to two public functions
(This used to be ctdb commit 32adee5426aa75ddcd4d648ef326ed03d5ff5c46)
2007-07-11 18:13:03 +10:00
Ronnie Sahlberg
aa080f66d9 first cut at a better and more scalable socketkiller
that can kill multiple connections asynchronously using one listening 
socket

(This used to be ctdb commit 22bb44f3d745aa354becd75d30774992f6c40b3a)
2007-07-11 17:43:51 +10:00
Andrew Tridgell
32de198fd3 update lib/replace from samba4
(This used to be ctdb commit f0555484105668c01c21f56322992e752e831109)
2007-07-10 15:29:31 +10:00
Andrew Tridgell
bdf01ed7c0 - neaten up the command line for killtcp
- split out the event script code into a separate module
- get rid of the separate takeover directory

(This used to be ctdb commit 8ea2c923a3e2464200ff79bf2c3f1f89e6a93ad4)
2007-07-04 16:51:13 +10:00
Andrew Tridgell
14c788f3cb move more util code to lib/util
(This used to be ctdb commit de5ab0584c978a6be4afeacd80c84015b206a3c6)
2007-06-07 22:30:29 +10:00
Andrew Tridgell
ae3d54094b start splitting the code into separate client and server pieces
(This used to be ctdb commit 603cd77988c181525946cd5eb0f4d0d646b58059)
2007-06-07 22:06:19 +10:00
Andrew Tridgell
3d75c9a51d later times are a lower priority, not a higher priority
(This used to be ctdb commit e96424e7d366df29767c4eeaccdcc0cc975cb8ae)
2007-06-07 19:21:55 +10:00
Andrew Tridgell
dbb803e6af choose the most connected node first
(This used to be ctdb commit c7c17a79fa4f28509e34b6f635fa62517dc458c2)
2007-06-07 19:17:27 +10:00
Andrew Tridgell
df6439d796 formatting fixes
(This used to be ctdb commit ed63a2057698aed3931762605b2ea2368681af2b)
2007-06-07 18:39:37 +10:00
Andrew Tridgell
d774192737 use a priority time for the election data, not just the vnn
(This used to be ctdb commit a691f9c5cd77194005f0d98483da94b07a48d57d)
2007-06-07 18:37:27 +10:00
Andrew Tridgell
c42ddcda23 validate vnn on node flags change
(This used to be ctdb commit 5628ebbcc2aa61b63c761783c70fe4d8a0070607)
2007-06-07 18:13:14 +10:00
Andrew Tridgell
96861466b7 there are now far too many controls for the controls statistics fields to be useful
(This used to be ctdb commit f5e188fc7e13b55b6b4081dcc74ea9614a76f9bb)
2007-06-07 18:07:38 +10:00
Andrew Tridgell
3e4d7bef23 get all the tunables at once in recovery daemon
(This used to be ctdb commit 8e60be6c22aab145e68b16ede5f32f4430c2af93)
2007-06-07 18:05:25 +10:00
Andrew Tridgell
cb4c33cc68 handle CTDB_CURRENT_NODE in ban commands
(This used to be ctdb commit fefb53f1d22c5458a1e107f8352818aee87983de)
2007-06-07 16:48:31 +10:00
Andrew Tridgell
23bf62fe30 added admin commands to ban/unban nodes
(This used to be ctdb commit 4dad04172e7e4955b5bf6444a85b19901c9683ad)
2007-06-07 16:34:33 +10:00
Andrew Tridgell
2ed57a9ae1 implement a scheme where nodes are banned if they continuously caused the cluster
to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes)

(This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c)
2007-06-07 15:18:55 +10:00
Andrew Tridgell
9754d16d48 merged admin enable/disable change from ronnie
(This used to be ctdb commit df17b69dfd83a98f9c711994c7dd51ad2cc0ab8a)
2007-06-07 11:15:22 +10:00
Ronnie Sahlberg
9ff733c784 add a control to permanently enable/disable a node
(This used to be ctdb commit d66fdba16ca22f62ddac6882a17614879b08a798)
2007-06-07 09:16:17 +10:00
Andrew Tridgell
8fbca613d4 get parent's idea of recmode and recmaster when deciding if we should do a takeover run
(This used to be ctdb commit 0e8124acd2f1a9b34292c1ee13c7e4cd6fe49876)
2007-06-06 21:56:54 +10:00
Andrew Tridgell
4a7f116746 update flags in parent daemon too
(This used to be ctdb commit 8995246d95e670753ab8c61d724d284cac2b414d)
2007-06-06 21:34:36 +10:00
Andrew Tridgell
ae56096b0b ensure all nodes display disabled nodes correctly
(This used to be ctdb commit 959f82cfe926994658f5826007caccb0409003e1)
2007-06-06 21:27:09 +10:00
Andrew Tridgell
81fad8636f added timeouts in all event scripts
(This used to be ctdb commit d986c91a607ed7c7d4869ea786b5cdf80e7862f1)
2007-06-06 13:45:12 +10:00
Andrew Tridgell
76b7361c7e - added monitoring of rpc ports for nfs, and of Samba ports and directories
- added monitoring of the ethernet link state

When monitoring detects an error, the node loses its public IP address

(This used to be ctdb commit 0af57aead8c983511d25774b4ffe09fa5ff26501)
2007-06-06 12:08:42 +10:00
Andrew Tridgell
cafddf76dc - fixed flags display in logs
- added monitor handler to test event script

(This used to be ctdb commit a4c18dddee169df49e5d77d9a94ce9329f169319)
2007-06-06 11:13:24 +10:00
Andrew Tridgell
eaf701fbda send the right sort of message on monitoring failure
(This used to be ctdb commit 9db537d9b11d48a36346db721ed8936ff5ecacb2)
2007-06-06 11:12:45 +10:00
Andrew Tridgell
af8834dd02 added health monitoring logic to ctdb, so a node loses its public IP address if one of the subsystem event scripts reports a problem
(This used to be ctdb commit c7a089256d86cec21097453bce5acbccee87413f)
2007-06-06 10:25:46 +10:00
Andrew Tridgell
be3a00bd73 clean out some more cruft
(This used to be ctdb commit ad16c5fe2748b48a6f6c79976359d56d9bed33f4)
2007-06-05 17:57:07 +10:00
Andrew Tridgell
ac55bc4166 first step in health monitoring of cluster nodes. When not healthy they will be marked disabled
(This used to be ctdb commit d3dbd9fc4db21632075b56fc52cf95435c63374a)
2007-06-05 17:43:19 +10:00
Andrew Tridgell
a3048a8942 more unused code
(This used to be ctdb commit b01f226949965942c1d64ff3b4ecc0b835d4fecc)
2007-06-05 15:17:53 +10:00
Andrew Tridgell
efcacd76b7 remove an unused function
(This used to be ctdb commit 9a36d0e0c110c66fe72dce530318b9bc0ac1ce0b)
2007-06-05 15:17:24 +10:00
Andrew Tridgell
ee546dec81 merge from ronnie
(This used to be ctdb commit 531d7ea7aca3116e78a4502a1c8b75a3fb764a4f)
2007-06-04 22:13:59 +10:00
Ronnie Sahlberg
4be9a44ba7 add a control that lists all public ip addresses and which node
currently serves each of them

(This used to be ctdb commit db9b89dc423b31079e5502323e5fd2bbaf82e1e9)
2007-06-04 21:11:51 +10:00
Andrew Tridgell
39ced972ae make recovery daemon values tunable
(This used to be ctdb commit ec29dbf2f5110428df8b97801443ba7addf61353)
2007-06-04 20:22:44 +10:00
Ronnie Sahlberg
1ee8989bd4 merge from tridge
(This used to be ctdb commit 3bfede5d46dba5a3654dad9205534391bc339461)
2007-06-04 20:10:53 +10:00
Ronnie Sahlberg
79b54a624e change the takeoverip/releaseip controls to pass a structure containing
both the nodenumber and the id of the node that has taken over that
address in addition to the public address itself, so that all nodes
can learn which node is currently hosting each of the public addresses

(This used to be ctdb commit 53e9ff790387b85a36fa9c3c44cd4c95cbdf35da)
2007-06-04 20:07:37 +10:00
Andrew Tridgell
dbb2ec43dd added tunables settable using ctdb command line tool
(This used to be ctdb commit 73d440f8cb19373cfad7a2f0f0ca4f963c57ff29)
2007-06-04 19:53:19 +10:00
Andrew Tridgell
f1d81386e6 - start moving tunable variables into their own structure
- fixed the test scripts to use a separate dbdir

(This used to be ctdb commit 396752e8908c48373564e915e2d49cfc9ff61eba)
2007-06-04 17:46:37 +10:00
Andrew Tridgell
a57991c0eb remove some cruft that's not needed any more
(This used to be ctdb commit c4308805b997740b77e058c1a14b84cb400a7c30)
2007-06-04 17:23:55 +10:00
Ronnie Sahlberg
a3e4e204dc add the ip address to the nodemap structure we pull from a server and
display the physical address of a node when we do a ctdb status

(This used to be ctdb commit 660bf30db713f0680acd3f74275ad603b35a0c24)
2007-06-04 13:26:07 +10:00
Ronnie Sahlberg
8175804757 print an error message to stdout if we failed to open the logfile for
the daemon

(This used to be ctdb commit fca953b1a3f3d6bf18264ecda1c75c68b60e2008)
2007-06-03 18:59:27 +10:00
Andrew Tridgell
518d410075 fixed a race condition in the handling of the recovery lock
(This used to be ctdb commit 3b98c5ad23662259b0eed399ab0c8037cf9b2b0b)
2007-06-03 10:29:14 +10:00
Andrew Tridgell
68963d865a first step towards fixing "make test" with the new daemon system
(This used to be ctdb commit f95f7e4c93dea482e6cf0614b5415229a7c9f3fb)
2007-06-02 13:16:11 +10:00
Andrew Tridgell
ebf12646cf - make specification of a recovery lock file compulsory
- die if someone other than the recmaster can get the recovery lock

(This used to be ctdb commit a827d0d0e430ca8ad5d521367e45097185492869)
2007-06-02 11:36:42 +10:00
Andrew Tridgell
4f72a202d9 - moved cmdline options that are only relevant to ctdbd into ctdbd.c
- fixed a valgrind error on failing to send a control

- don't mark node dead when already disconnected

- moved node list lock code into common code

(This used to be ctdb commit bcc0432d0fea7ef223f82ccee81cf35c18144b1b)
2007-06-02 10:03:28 +10:00
Andrew Tridgell
27b0e323e6 disable realtime scheduler in event scripts
(This used to be ctdb commit 56225ac6fdfe754289bc7d5e0fc8d21c81a7aa8e)
2007-06-02 08:46:49 +10:00
Andrew Tridgell
5e5701a7b8 - make calling of recovered event script async
- shutdown sockets before calling shutdown script

(This used to be ctdb commit c5e099feef94a014a77742b6cc1d0afe78ef9da9)
2007-06-02 08:41:19 +10:00
Andrew Tridgell
7db1d04d5c make the running of the takeover and release event scripts async, to prevent outages due to slow scripts
(This used to be ctdb commit 4189be97eee7ab2a50335c860f2fcd9566667d01)
2007-06-01 19:05:41 +10:00
Andrew Tridgell
bf3b740a1b ctdb is GPL not LGPL
(This used to be ctdb commit 8624378010d1c2a1438e1e701339dfba7276f960)
2007-05-31 13:50:53 +10:00
Andrew Tridgell
1e72af9c51 close sockets when we exec scripts
(This used to be ctdb commit 0fac2164db4279db2d7d376a34be05b890304087)
2007-05-30 15:43:25 +10:00
Andrew Tridgell
c833b06a35 we need to listen at transport initialise stage to find our own node number
(This used to be ctdb commit 4a9455dfbe95e53884b46ad26dba0c33e3432ba9)
2007-05-30 14:46:14 +10:00
Andrew Tridgell
3c062bb5ae - use a CTDB_BROADCAST_ALL for the attach message so it goes to currently disconnected nodes
- start node monitoring only after transport starts
- check if a node is already disconnected in the node dead function

(This used to be ctdb commit b81ab6d507797282237768380c6f0e5a4c6519a5)
2007-05-30 14:35:22 +10:00
Andrew Tridgell
8ed48aac51 don't start the transport connecting to the other nodes until after the startup event script has run
(This used to be ctdb commit afca3cc74211aa2e18b1f74d36b2add8dffcfdc7)
2007-05-30 13:26:50 +10:00
Andrew Tridgell
b382fac817 wait for local tcp services like smbd to come up before allowing ctdb to start talking to other nodes
(This used to be ctdb commit 04eea084ebf1710ea66ccb03ac661e3b2f58d96f)
2007-05-30 12:27:58 +10:00
Andrew Tridgell
3b146e7616 don't block SIGCHLD, or we lose return values from system() !
nicer log messages from events script

(This used to be ctdb commit 5ed2b496675a6a47d7ad87519a97bc4f293e6730)
2007-05-29 17:23:29 +10:00
Andrew Tridgell
a7a0f99d98 fixed broadcast controls from the command line
(This used to be ctdb commit 54464e0b5123265780013a0a46c8b94709d227dc)
2007-05-29 16:34:50 +10:00
Andrew Tridgell
bf3f0f4b2f - ignore blank lines at end of lists
- rpm tweaks
(This used to be ctdb commit 3506464fa914c5aad10fe22283563d021ca45ca0)
2007-05-29 16:23:47 +10:00
Andrew Tridgell
5a4c3b0b24 default log file to reasonable location
(This used to be ctdb commit 36b0a43c5d58d8171c1340603486e64051d696ac)
2007-05-29 15:26:38 +10:00
Andrew Tridgell
873c3a5934 use autoconf for more paths
(This used to be ctdb commit b765a391632621dfe3b129b85782e87f586ae2eb)
2007-05-29 15:20:41 +10:00
Andrew Tridgell
2d9e0ad56a use /etc/services for ctdb
(This used to be ctdb commit 64bf6964ff33320c5351337c7f8ed4da5bd71275)
2007-05-29 15:15:00 +10:00
Andrew Tridgell
1140d5a20a fixed more warnings on 64 bit boxes
(This used to be ctdb commit 2f6eae476203f8a8b28e083553204c01f224c8a5)
2007-05-29 13:58:41 +10:00