1
0
mirror of https://github.com/samba-team/samba.git synced 2024-12-23 17:34:34 +03:00
Commit Graph

740 Commits

Author SHA1 Message Date
Ronnie Sahlberg
c5de7cfb8c Merge commit 'rusty/master'
(This used to be ctdb commit b4391c00476cde74101736986dfcd2be6c959edc)
2010-07-30 16:25:40 +10:00
Evan Kinney
0557c418e3 ctdb: Fixed use of reserved word "private" in typedefs
In include/ctdb.h, ctdb_callback_t and ctdb_rrl_callback_t were
defined with a void *private variable. The variable name was
changed to void *private_data to avoid issues encountered in
the Samba autoconf script.

Evan Kinney <evan.kinney@sas.com>

(This used to be ctdb commit 1f453aa4b5e749468c7788afac09c6f0900ea18f)
2010-07-29 17:16:36 +10:00
Rusty Russell
7061ceffd8 Report client for queue errors.
We've been seeing "Invalid packet of length 0" errors, but we don't know
what is sending them.  Add a name for each queue, and print nread.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit e6cf0e8f14f4263fbd8b995418909199924827e9)
2010-07-01 23:08:49 +10:00
Rusty Russell
8946028a07 speed startup: add --sloppy-start.
The extra recovery interval wait was introduced in 821333afb458 but no
explanation was provided in that message.  Nonetheless, if starting
the entire cluster for the first time, it should be safe to skip this.

We use the commandline arg --sloppy-start which should discourage
people from using it outside testing.

Seconds between ctdbd first log message and node healthy:
BEFORE:	16.10
AFTER: 4.03

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 509e2e89ae233a0e91998d95267bf62f296a73cd)
2010-06-22 22:52:34 +09:30
Rusty Russell
cfe0edc0b9 libctdb: implement synchronous readrecordlock interface.
Because this doesn't use a generic callback, it's not quite as trivial
as the other sync wrappers.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 1f20b938d46d4fcd50d2b473c1ab8dc31d178d2d)
2010-06-21 14:47:34 +09:30
Rusty Russell
b93e65eaf7 libctdb: implement ctdb_disconnect and ctdb_detachdb
These are important for testing, since we can easily tell if we
leak memory if there are outstanding allocations after calling
these.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 18a212aa40d0ff9ff59775c6fcf9dc973e991460)
2010-06-18 15:35:52 +09:30
Rusty Russell
5f9e4b60ae Delay reusing ids to make protocol more robust
Ronnie and I tracked down a bug which seems to be caused by a node
running so slowly that we timed out the request and reused the request
id before it responded.

The result was that we unlocked the wrong record, leading to the
following:

	ctdbd: tdb_unlock: count is 0
	ctdbd: tdb_chainunlock failed
	smbd[1630912]: [2010/06/08 15:32:28.251716,  0] lib/util_sock.c:1491(get_peer_addr_internal)
	ctdbd: Could not find idr:43
	ctdbd: server/ctdb_call.c:492 reqid 43 not found

This exact problem is now detected, but in general we want to delay
id reuse as long as possible to make our system more robust.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 9eb9c53ef29f4871ae2fe62fc5cb6145fca89eed)
2010-06-10 08:58:55 +09:30
Rusty Russell
7589b58138 libctdb: more bool conversion, and accompany lock by ctdb_db in API
I missed some int->bool conversions previously, particularly the
return of ctdb_writerecord().

By always handing functions ctdb_connection or ctdb_db, we keep it
consistent with the rest of the API and can do extra lock consistency
checks.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 3f939956ddd693cba6ea5c655288f4f5ca95f768)
2010-06-08 17:11:40 +09:30
Rusty Russell
866cca9637 libctdb: clarify logging levels
Now we have more messages, it seems to make sense to document their usage
and make them consistent.

In particular, LOG_CRIT for internal libctdb problems, LOG_ALERT for
API misuse.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit a6fed3f577c7ec51df38ed15ecb9db6ea2ae7c8f)
2010-06-08 16:53:17 +09:30
Ronnie Sahlberg
b9e5c8a47b Split ctdb_release_lock() into a function to release the locvk and another function to free the data structures.
This allows us to keep the datastructure valid after the lock has been released by the application and we can trap and warn when the application is accessing the lock after it has been released. I.e. application bugs.

(This used to be ctdb commit 463a266205f145cd9c4c36b9c59d3747eeef0e2e)
2010-06-05 15:38:11 +10:00
Rusty Russell
3510980049 libctdb: documentation
Full documentation for all the functions.

This looks longer than it is, because it sorts them into async and
sync parts, and also renames some formal parameters.

Added TODO to libctdb directory to track our plans.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 108e9c2450876a9f8821aa7efd5be971eee5afd3)
2010-06-04 20:30:08 +09:30
Rusty Russell
c5b4768816 libctdb: use values from ctdb_protocol.h, don't re-declare
We're best off including ctdb_protocol.h to get these, even if we
document the important ones in ctdb.h.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit cdc19dc73032470d57f38bf825d8113b3a0c8cd1)
2010-06-04 20:22:03 +09:30
Rusty Russell
3a569c14bc libctdb: use bool in API
Return bool instead of -1/0; that's what the young kids are doing
these days!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit e285b5d5a9d4fbc4f75dbb237d2fcdbd84f2d605)
2010-06-04 20:19:25 +09:30
Rusty Russell
379fd4e606 libctdb: add logging infrastructure
This is based on Ronnie's work, merged with mine.  That means
errors are all my fault.

Differences from Ronnie's:
1) use syslog's LOG_ levels directly.
2) typesafe arg to log function, and use it (eg stderr) in helper function.
3) store fn in ctdb context, and expose ctdb_log_level directly thru API.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

(This used to be ctdb commit 86259aa395555aaf7b2fae7326caa2ea62961092)
2010-06-04 20:27:03 +09:30
Rusty Russell
cc8435852c libctdb: add ctdb arg to more functions.
This is going to help for logging, since we want it there.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

(This used to be ctdb commit 0786152472bc43efae4c896f7c6c07c6e080b9b2)
2010-06-04 16:54:08 +09:30
Rusty Russell
94df6f322d libctdb: change callback for ctdb_readrecordlock.
After discussion with Ronnie, we decided to revisit this interface.  We use
the name ctdb_readrecordlock_async, as it is *not* always a send, and we
use a specific callback to avoid the "fake request" creation on the fast
path.

The request itself is never exposed: this means it can't be cancelled,
but we can revisit that later if need be.

This makes both use and implementation simpler.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

(This used to be ctdb commit 03b5546ae45a60ab41eb4f7159a45bfdbf959888)
2010-06-04 13:33:08 +09:30
Ronnie Sahlberg
2d4b98381f ctdb_req_control contains 4 padding bytes. Create an explicit pad variable here and set it to 0 when creating a control to keep valgrind happy.
PDUs are padded to 8 byte boundary. If padding is used, memset it to 0
to keep valgrind happy.

(This used to be ctdb commit 8818d5c483558c0faa6a3923ed5e675fdcfc13af)
2010-06-02 16:49:05 +10:00
Ronnie Sahlberg
f94291c37d Make the call to free the request explicit in the callback
instead of implicit

(This used to be ctdb commit 573e4e2d2bd09dd9579150cce926de774a0b609c)
2010-06-02 13:49:34 +10:00
Ronnie Sahlberg
53ea238c6c Add a variable for start/current time to ctdb statistics
and print the time startistics was taken and for how long the statistics have been collected to the "ctdb statistics" output.

(This used to be ctdb commit 1bdfe0cd3370a335b960ce1ef97eade93b0cd2fa)
2010-06-02 13:14:53 +10:00
Ronnie Sahlberg
8666094e92 add a function to read the current socketname from the ctdb structure
(This used to be ctdb commit 112d252b2ab614eeac38e4a1658cd1e85f6eb829)
2010-06-02 10:25:31 +10:00
Ronnie Sahlberg
3c7350b8c6 rename ctdb_remove_message_handler to ctdb_client_remove_message_handler
to avoid conflict with the function of the same name in libctdb

(This used to be ctdb commit 636ed76d04c8c499a911eb0d72d54b71b0a73d31)
2010-06-02 10:05:58 +10:00
Ronnie Sahlberg
f1b8bd94bb rename ctdb_message_fn_t to ctdb_msg_fn_t to avoid a conflict with the type of the same name used in libctdb
(This used to be ctdb commit 49e23f8329649e4d9eefab47c9b158fcc7210d07)
2010-06-02 10:00:58 +10:00
Ronnie Sahlberg
bc208bc916 rename ctdb_set_message_handler to ctdb_client_set_message_handler
to avoid a colission with the function of the same name in libctdb

(This used to be ctdb commit 41dbdd4fc0ab560420fb0e24a3179ff7c94c5bb7)
2010-06-02 09:51:47 +10:00
Ronnie Sahlberg
761a075de9 rename ctdb_send_message to ctdb_client_send_message to resolve colission with the function of the same name in libctdb
(This used to be ctdb commit ac3292c12832484a22715f1d46aa23f3b7c8a6f6)
2010-06-02 09:45:21 +10:00
Ronnie Sahlberg
bdbf7077e8 rename ccan/typesafe_cb.h to ctdb_typesafe_cb.h and
add this file to the install/rpm

(This used to be ctdb commit 96f186240a17386de1e02eb3af392d97bb55a1ae)
2010-06-02 09:18:48 +10:00
Rusty Russell
bd8d302589 libctdb: tweak interface for readrecordlock
Previously we could hang in poll with the callback pending (since we
fake it): explicitly call it immediately.

Note: I experienced corruption using DLIST_ADD_END (ctdb->pnn was blatted
when adding to the message_handler list).  I switched them all to DLIST_ADD,
but maybe I'm using it wrong?

(This used to be ctdb commit 3727165f0d206999d2cfc2800ff8868640868c7c)
2010-05-24 13:52:17 +09:30
Rusty Russell
30f4d01df1 libctdb: uniform callbacks, _recv functions to pull out data.
This is a bit tricky for those cases where we need to do multiple or
zero I/Os (eg. attachdb and readrecordlock), but works well for the
simple cases.

(This used to be ctdb commit ebe4dd724338c156423cfdcc10a75b68c2084cde)
2010-05-24 13:17:36 +09:30
Rusty Russell
7046a1ad0a libctdb: API changes from Ronnie's version
These simplifications mostly came up due to the implementation.

o Rename ctdb_context to ctdb_connection.
   We already have a ctdb_context internally in ctdbd; don't confuse them!
o Rename ctdb_handle to struct ctdb_request.
   From the user POV it's a request, and it's also useful internally to
   avoid implicit cast to/from void *.
o Rename ctdb_db_context to ctdb_db.
o Introduce ctdb_lock.
   This provides an explicit "lock object" you get from readrecordlock
   and have to hand to those functions which need you to hold a lock.
o status args are "int" not int32_t.
   Should this be a bool?
o Remove last traces on generic callback.
   Without semi-sync API, this doesn't help anything and loses type safety.
o Remove the semi-async API.
   We can add this later, but I think a sync and async API is enough for
   our poor users for the moment :)
o Registering a message handler also takes a callback.
   This way you can tell if it failed.  Not sure if this is overkill, but it's
   consistent.
o ctdb_service() takes an revents arg
   Strictly not necessary for a nonblocking fd, but nice to know if a
   read or write is possible.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

(This used to be ctdb commit 86e1f93df856f9627182ed0e18bfcff6866c0954)
2010-05-20 16:07:30 +09:30
Rusty Russell
bbfb992f55 libctdb: ctdb.h and tst.c from Ronnie
This imports ctdb.h and tst.c from Ronnie's work: it's a separate commit
for now to make the changes obvious.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

(This used to be ctdb commit 09f05cbfc883e5aac33d3781b163cde178ece4cf)
2010-05-20 16:01:28 +09:30
Rusty Russell
d5f6026a22 libctdb: reorganize headers: remove ctdb.h, add ctdb_client.h and ctdb_protocol.h
ctdb_client.h is the existing internal client interface (which was mainly
in ctdb.h), and ctdb_protocol.h is the information needed for the wire
protocol only.

ctdb.h will be the new, shiny, libctdb API.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

(This used to be ctdb commit 4bba6b8cd47b352f98d41f9f06258d5ac3c9adef)
2010-05-20 15:18:30 +09:30
Ronnie Sahlberg
6f1221e9e1 Add the number of performed recoveries to the "ctdb statistics" output.
(This used to be ctdb commit fa045733cb81412f0d02ab52d74eabc7efca8b3d)
2010-05-11 09:44:53 +10:00
Rusty Russell
72c275dd70 ctdb: use full range of IDR
This resolves a problem with huge numbers of requests which could overflow
16 bits.  Fortunately, the IDR should scale reasonably well, so we can simply
hold all the requests.

Although noone checks for failure, I added a constant for that.

BZ: 60540
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

(This used to be ctdb commit 72efc4122e37798227c3420a65ed1f706ca9ebe7)
2010-05-11 09:44:43 +10:00
Ronnie sahlberg
46f00a2478 Merge commit 'rusty/signal-fix'
(This used to be ctdb commit 221a9bb41c3a7af0cc65cda78365010893ca1430)
2010-05-03 15:57:41 +10:00
Ronnie Sahlberg
4a43428440 The recent change to the recovery daemon to keep track of and
verify that all nodes agree on the most recent ip address assignments
broke "ctdb moveip ..." since that call would never trigger
a full takeover run and thus would immediately trigger an inconsistency.

Add a new message to the recovery daemon where we can tell the recovery daemon to update its assignments.

BZ62782

(This used to be ctdb commit e7069082e5f0380dcddee247db8754218ce18cab)
2010-05-03 15:47:17 +10:00
Rusty Russell
e1b59b6a47 eventscript: don't do debugging system() from inside signal handler
In the case of a timeout, we dump a log of what's happening to a file
in /tmp.  We do it from the signal handler, which is an unreliable hack
(BZ58365).

Instead, create another (lower-priority) child to do the dump, then
kill the timedout script.

Note that this doesn't quite work as intended (the dump is often run
after the script has been killed), so the next patch resolves this.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 7ee5ecc8d53e78e2dec21197b74a74cc4ae1834c)
2010-04-08 15:13:29 +09:30
Ronnie Sahlberg
06885ea9a7 In the recovery daemon, keep track of which node we have assigned public ip
addresses and verify that the remote nodes have/keep a consistent view of
assigned addresses.

If a remote node has an inconsistent view of addresses visavi the recovery
master this will trigger a full ip reallocation.

(This used to be ctdb commit f3bf2ab61f8dbbc806ec23a68a87aaedd458e712)
2010-04-08 14:25:26 +10:00
Stefan Metzmacher
3419e9c4dd server: add "setup" event
This is needed because the "init" event can't use 'ctdb' commands.

metze

(This used to be ctdb commit 1493436b6b24eb05a23b7a339071ad85f70de8f4)
2010-02-23 10:38:49 +01:00
Stefan Metzmacher
98ee69c66d server: add updateip event
metze

(This used to be ctdb commit 712ed0c4c0bff1be9e96a54b62512787a4aa6259)
2010-01-20 11:11:01 +01:00
Stefan Metzmacher
32d00d0a0d controls: add stups for GET_PUBLIC_IP_INFO, GET_IFACES and SET_IFACE_LINK_STATE
metze

(This used to be ctdb commit a2c9e4578e149eccb2c6183f64a6b657eb95c5e1)
2010-01-20 11:10:59 +01:00
Stefan Metzmacher
37880b0d0a server: use CTDB_PUBLIC_IP_FLAGS_ONLY_AVAILABLE during a takeover run
We know ask for the known and available interfaces.
This means a node gets a RELEASE_IP event for all interfaces
it "knows", but doesn't serve and a node only gets a TAKE_IP event
for "available" interfaces.

metze

(This used to be ctdb commit a695a38e49e7c3e15a9706392dc920eeab1f11ba)
2010-01-20 11:10:59 +01:00
Stefan Metzmacher
b9f6afe4b0 client: add CTDB_PUBLIC_IP_FLAGS_ONLY_AVAILABLE ctdb_ctrl_get_public_ips_flags()
metze

(This used to be ctdb commit 6bd780510058e5589f2f7c3722d37acbba4935ab)
2010-01-20 11:10:58 +01:00
Stefan Metzmacher
15616d3271 reserve upper bits in ctdb_control->flags for opcode specific flags
metze

(This used to be ctdb commit 91122c322fbec08138b92c528d9a946f6727b4fd)
2010-01-20 11:10:58 +01:00
Stefan Metzmacher
bea53c60b8 server: keep the interface information in a list of ctdb_iface structures
metze

(This used to be ctdb commit ff5291778f0752e176539397e9530dcf0e546bea)
2010-01-20 11:10:58 +01:00
Stefan Metzmacher
a1da4e05b5 server: allow multiple interfaces comma separated in public_addresses
metze

(This used to be ctdb commit 33a00ef7233051acdbc66410130ec5d876a8422f)
2010-01-20 11:10:58 +01:00
Stefan Metzmacher
bec35e6441 server: add a ctdb_set_single_public_ip() helper function
metze

(This used to be ctdb commit 400b4806c4a9686a2ee6398b5d7c3e0ca0793fd1)
2010-01-20 11:10:57 +01:00
Stefan Metzmacher
fd06167caa server: add "init" event
This is needed because the "startup" event runs after the initial recovery,
but we need to do some actions before the initial recovery.

metze

(This used to be ctdb commit e953808449c102258abb6cba6f4abf486dda3b82)
2010-01-20 09:44:36 +01:00
Stefan Metzmacher
9cba540514 lib/util: import fault/backtrace handling from samba.
metze

(This used to be ctdb commit 8171d66f0061fe23ed6dfef87ffe63bfc19596eb)
2010-01-20 09:44:36 +01:00
Stefan Metzmacher
a309287947 move DEBUG* macros to one place
metze

(This used to be ctdb commit 4b4dd5d7f81bf226e05c7f3d40087043da1517a2)
2010-01-20 09:44:36 +01:00
Ronnie Sahlberg
a1d60b1511 Make the size of the in memory ringbuffer for keeping the recent log messages
configureable using --log-ringbuf-size=<num-entries>.

Add an entry in the sysconfig file to set this persistently.

(This used to be ctdb commit c79c2da69bc352f509e7fca4b9172a4b7f23c0f8)
2010-01-15 15:38:56 +11:00
Ronnie Sahlberg
4c722fe34c fix a conflict in the merge from rusty
Merge commit 'rusty/ctdb-no-setsched'

Conflicts:

	server/ctdb_vacuum.c

(This used to be ctdb commit b4365045797f520a7914afdb69ebd1a8dacfa0d9)
2009-12-17 08:18:04 +11:00
Rusty Russell
af2613e16f ctdb: use mlockall, cautiously
We don't want ctdb stalling due to paging; this can be far worse than
scheduling delays.  But if we simply do mlockall(MCL_FUTURE), it
increases the risk that mmap (ie. tdb open) or malloc will fail,
causing us to abort.

This patch is a compromise: we mlock all current pages (including
10k of future stack for expansion) and then relock when a client
asks us to open a TDB.  We warn, but don't exit, if it fails.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 82f778e85440bc713d3f87c08ddc955d3cfce926)
2009-12-16 20:57:20 +10:30
Rusty Russell
c488ba440a Remove RT priority, use niceness.
1) It's buggy.  Code needs to be carefully written (ie. no busy
   loops) to handle running with it, and we fork and run scripts.[1]

2) It makes debugging harder.  If ctdbd loops (as has happened recently)
   it can be extremely hard to get in and see what's happening.  We've already
   seen the valgrind hacks.

3) We have seen recent scheduler problems.  Perhaps they are unrelated,
   but removing this very unusual setup is unlikely to hurt.

4) It doesn't make anything faster.  Under all but the most perverse of
   circumstances, 99% of the cpu gives the same performance as 100%, and
   we will always preempt normal processes anyway.

[1] I made this worse in 0fafdcb8d353 "eventscript: fork() a child for
    each script" by removing the switch_from_server_to_client() which
    restored it, but even that was only for monitor scripts.  Others were
    run with RT priority.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 482c302d46e2162d0cf552f8456bc49573ae729d)
2009-12-16 19:26:22 +10:30
Rusty Russell
f148735928 Add --valgringing flag instead of --nosetsched
The do_setsched was being tested for whether to mmap tdbs: let's make it
explicit.  We can also happily move the kill-child eventscript hack under
this flag.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> 


(This used to be ctdb commit 2ee86cc1f311d7b7504c7b14d142b9c4f6f4b469)
2009-12-16 20:59:15 +10:30
Stefan Metzmacher
aa658b6777 client: make ctdb_dumpdb_record() public
metze

(This used to be ctdb commit 1cdc8dbb9cb971cf6dd6cd22b1adaf70ddc77e65)
2009-12-16 08:08:32 +01:00
Stefan Metzmacher
0e436b46c6 client: add ctdb_ctrl_getdbhealth()
metze

(This used to be ctdb commit 5abe44d0113839d3a45c9a31d30856aa70c2ea1f)
2009-12-16 08:08:32 +01:00
Stefan Metzmacher
f1f0af2b67 server: add CTDB_CONTROL_DB_SET_HEALTHY and CTDB_CONTROL_DB_GET_HEALTH
metze

(This used to be ctdb commit 7332d900538f0cbcd953a723417a0fe31dc9807c)
2009-12-16 08:08:29 +01:00
Stefan Metzmacher
94bc40307a server: Use tdb_check to verify persistent tdbs on startup
Depending on --max-persistent-check-errors we allow ctdb
to start with unhealthy persistent databases.

The default is 0 which means to reject a startup with
unhealthy dbs.

The health of the persistent databases is checked after each
recovery. Node monitoring and the "startup" is deferred
until all persistent databases are healthy.

Databases can become healthy automaticly by a completely
HEALTHY node joining the cluster. Or by an administrator
with "ctdb backupdb/restoredb" or "ctdb wipedb".

metze

(This used to be ctdb commit 15f133d5150ed1badb4fef7d644f10cd08a25cb5)
2009-12-16 08:06:10 +01:00
Stefan Metzmacher
b74918b465 server: open /var/ctdb/state/persistent_health.tdb.X on startup
This node internal tdb will store the HEALTH state of persistent
tdbs.

metze

(This used to be ctdb commit cbda4666be88c11a810a192a70667b57f773ace1)
2009-12-16 08:03:56 +01:00
Stefan Metzmacher
9a96ae0c97 server: only do the mkdir() calls for db_directory* once at the start
metze

(This used to be ctdb commit f30f33685db50860b6cd6fd1b6bdc3066620a78f)
2009-12-16 08:03:56 +01:00
Stefan Metzmacher
b48228e7f9 server: add db_directory_state to ctdb_context
metze

(This used to be ctdb commit 656a6ec5ed81ccfbb86144156a3158e48f105ee4)
2009-12-16 08:03:55 +01:00
Ronnie Sahlberg
640c48c844 Revert "cleanup: remove a tunable we no longer use in the eventscripts any more :"
This reverts commit 401f421fa003d9515df15e759b50b56e0c67d69c.

Conflicts:

	include/ctdb_private.h
	server/ctdb_tunables.c

(This used to be ctdb commit b883d19a495a41a22db37f9c2cf6250fee529de0)
2009-12-16 09:51:17 +11:00
Ronnie Sahlberg
0982299bed Revert "Make fetch_locked more scalable"
This reverts commit 5736e17c139c9a8049e235429aeae0c6c9d0e93d.

(This used to be ctdb commit 3d2d877d877146ca09a28a3a44f4840eb36fd377)
2009-12-15 14:26:28 +11:00
Ronnie Sahlberg
5a7e9900df Merge commit 'obnox/ctdb-wip-trans3' into trans3
(This used to be ctdb commit ac06a0e042e7d024060d6e87a49bda9ccc072c52)
2009-12-15 14:25:55 +11:00
Ronnie Sahlberg
649ba2631d Rename the tunable EventScriptBanCount to EventScriptTimeoutCount
since we no longer ban nodes when dodgy scripts continue to hang.

We now only mark nodes as unhealthy if monitor events fail or timeout. Never ban.

(This used to be ctdb commit 5c8e56fc7a518e115bceac257867739283cf6a1e)
2009-12-14 15:53:23 +11:00
Ronnie Sahlberg
ed6b5a8c68 cleanup: remove a tunable we no longer use in the eventscripts any more :
EventScriptUnhealthyOnTimeout

(This used to be ctdb commit 401f421fa003d9515df15e759b50b56e0c67d69c)
2009-12-14 15:48:47 +11:00
Ronnie Sahlberg
e76561f544 remove the variable "disable when unhealthy"
there is no rational need for a setting where we permanently mark nodes as disabled everytime an eventscript fails

(This used to be ctdb commit 68a8ee99b128a5ec883600735626bdb3bbc9c503)
2009-12-14 15:40:54 +11:00
Volker Lendecke
f6ea3e6bcf Make fetch_locked more scalable
This patch improves the handling of the fetch_lock operation on non-persistent
databases that ctdb clients have to do very frequently.

The normal flow how this goes is the following:

1. Client does a local fetch_lock on the database

2. Client looks if the local node is dmaster.
   If yes, everything is fine
   If no, continue here

3. Client unlocks the local record

4. Client issues a "get me the record" call to ctdbd

5. ctdbd goes out and fetches the dmaster role

6. ctdbd tells the client to retry

7. Client starts over again

The problem is between step 6 and 7: Before the client has had the chance to
retry (i.e. catch the record with a fetch_locked), another node might have come
asking ctdbd to migrate away the record again. This is a real problem, I've
seen >20 loops of this kind in real workloads.

This patch does the following: Whenever ctdb receives a record as result of
step 5, it puts the key on a "holdback list". As long as a key is on this list,
a request to migrate away the dmaster is put on hold. It is the client's duty
to issue the "CTDB_CONTROL_GOTIT" control when it has successfully done step 2
after having asked ctdb to fetch the record. This will release the key from the
"holdback list" and re-issue all dmaster migration requests.

As a safeguard against malicious clients, once a second (default 1000msecs,
tunable "HoldbackCleanupInterval" in milliseconds) ctdbd goes over the list of
held back keys, deletes them and releases all held back migration requests.

(This used to be ctdb commit 5736e17c139c9a8049e235429aeae0c6c9d0e93d)
2009-12-12 00:45:39 +01:00
Michael Adam
46de365e78 Add a new control CTDB_GET_DB_SEQNUM - fetch a persistent db's sequence number.
Michael

(This used to be ctdb commit a7e3b5fac6b3f5d74473f26eb86c067b35647996)
2009-12-12 00:45:39 +01:00
Michael Adam
8dedde81cd define CTDB_DB_SEQNUM_KEY - used with the new implementation of transactions.
Michael

(This used to be ctdb commit 4b1dbcf0853bdc4832d39a477823ae34f216da52)
2009-12-12 00:45:38 +01:00
Volker Lendecke
24d04a3e89 Rename a struct member for clarity
(This used to be ctdb commit 6af5e74a21546d723008d69d6752ebebf898c947)
2009-12-12 00:45:37 +01:00
Michael Adam
faacd5ca79 server: add a new control CTDB_CONTROL_TRANS3_COMMIT
This is a simplified version of the trans2 commit control:
It just rolls out the marshall buffer to all active nodes.

It is the main ctdbd part of the re-implementation of the
persistent transactions. The client code is changed to
take a global lock to start a transactions and store into
the marshal buffer instead of writing to the local tdb
under a local transaction.

The old transaction implementation is going to be
removed in a later commit.

Michael

(This used to be ctdb commit f66428f9d2013080a414404c1ba6117888352fd6)
2009-12-12 00:43:26 +01:00
Ronnie Sahlberg
a8549ef700 From: Volker Lendecke <vl@samba.org>
Date: Wed, 9 Dec 2009 22:45:12 +0100
Subject: [PATCH] Revert an accidential commit

(This used to be ctdb commit af6656f2844d8fd72204a70358c9d589dbe1bd34)
2009-12-10 08:53:55 +11:00
Volker Lendecke
a0d9bd3c13 Run only one event for each epoll_wait/select call
This might be a bit less efficient, but experience in winbind has shown that
event callbacks can trigger changes in the socket state in very hard to
diagnose ways.

(This used to be ctdb commit a78b8ea7168e5fdb2d62379ad3112008b2748576)
2009-12-10 07:52:16 +11:00
Rusty Russell
a46c3b4f2a ctdb: scriptstatus can now query non-monitor events
We also no longer return an error before scripts have been run; a special
zero-length data means we have never run the scripts.

"ctdb scriptstatus all" returns all event script results.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 9b90d671581e390e2892d3a68f3ca98d58bef4df)
2009-12-08 01:50:55 +10:30
Rusty Russell
5d99a1a47c eventscript: expost call names and enum
We're going to need this so ctdb can query non-monitor status.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 53bc5ca23ca55a3ac63a440051f16716944a2a51)
2009-12-08 01:47:13 +10:30
Rusty Russell
d3593c2f83 eventscript: save state for all script invocations
Rather than only tranferring to last_status for monitor events, do
it for every event (ctdb->last_status is now an array). 

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit c73ea56275d4be76f7ed983d7565b20237dbdce3)
2009-12-08 12:27:48 +10:30
Rusty Russell
9753b7e793 eventscript: rename ctdb_monitoring_wire to ctdb_scripts_wire
We're going to allow fetching status of all script runs, so this
name is no longer appropriate.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit f5cb41ecf3fa986b8af243e8546eb3b985cd902a)
2009-12-08 00:51:24 +10:30
Rusty Russell
23e24c503c eventscript: ctdb_fork_with_logging()
A new helper functions which sets up an event attached to the child's
stdout/stderr which gets routed to the logging callback after being
placed in the normal logs.

This is a generalization of the previous code which was hardcoded to
call ctdb_log_event_script_output.

The only subtlety is that we hang the child fds off the output buffer;
the destructor for that will flush, which means it has to be destroyed
before the output buffer is.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 32cfdc3aec34272612f43a3588e4cabed9c85b68)
2009-12-08 12:44:30 +10:30
Rusty Russell
c309d22f9a eventscript: remove unused ctbd_ctrl_event_script*
The child no longer uses ctdb_ctrl_event_script_init or
ctdb_ctrl_event_script_finished, and the others are redundant: it
doesn't need to tell us it's starting a script when it only runs one.

We move start and stop calls to the parent, and eliminate the RPC
infrastructure altogether.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 391926a87a7af73840f10bb314c0a2f951a0854c)
2009-12-08 00:27:40 +10:30
Rusty Russell
b8e347ec9c eventscript: use direct script state pointer for current monitor
We put a "scripts" member in ctdb_event_script_state, rather than using
a special struct for monitor events.  This will fit better as we further
unify the different events, and holds the reports from the child process
running each monitor script.

Rather than making the monitor state a child of current_monitor_status_ctx,
we just point current_monitor directly at it.  This means we need to reset
that pointer in the destructor for ctdb_event_script_state.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 9a2b4f6b17e54685f878d75bad27aa5090b4571f)
2009-12-08 00:14:01 +10:30
Rusty Russell
a4c2a98ba9 eventscript: make current_monitor_status_ctx serve as monitor_event_script_ctx
We have monitor_event_script_ctx and other_event_script_ctx, and
current_monitor_status_ctx in struct ctdb_context.  This seems more
complex than it needs to be.

We use a single "event_script_ctx" as parent for all event script
state structures.  Then we explicitly reparent monitor events under
current_monitor_status_ctx: this is freed every script invocation to
kill off any running scripts anyway.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 0d925e6f2767691fa561f15bbb857a2aec531143)
2009-12-08 00:09:20 +10:30
Rusty Russell
5190932507 eventscript: expost ctdb_ban_self()
eventscript.c uses this now, but our next patch makes others use it.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit a305cb7743c24386e464f6b2efab7e2108bb1e7e)
2009-12-07 23:18:40 +10:30
Rusty Russell
b9b75bd065 eventscript: use -ENOEXEC for disabled status value
This unifies code paths and simplifies things: we just hand -ENOEXEC to
ctdb_ctrl_event_script_stop().

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit eadf5e44ef97d7703a7d3bce0e7ea0f21cb11f14)
2009-12-07 23:11:47 +10:30
Rusty Russell
066a791770 eventscript: use -ETIME for timeout status value
This starts the move toward more expressive encoding of return values:
positive values mean the script ran, negative means we had a problem with
the script (and the value is the errno).

This does timeout, but changes the ctdb tool to recognize it.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 0eb1d0aa14e68b598d9e281c8a02b8f94a042fd9)
2009-12-07 23:09:42 +10:30
Rusty Russell
85a6f4a4dd eventscript: marshall onto last_status immediately
This simplifies the code a little: last_status is now read to go
(it's only used by the scriptstatus command at the moment).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 6be931266a4e41fd0253f760936ad9707dd97c47)
2009-12-07 23:09:40 +10:30
Michael Adam
0635f8b98f make ctdb_ctrl_transaction_active public.
Michael

(This used to be ctdb commit e5496a83ef4a01604195b27c4b97f50d4979510e)
2009-12-04 11:30:22 +01:00
Ronnie Sahlberg
e28c652cca Dont store debug level DEBUG_DEBUG in the in-memory ringbuffer.
It is unlikely we will need something this verbose for normal troubleshooting.
This allows us to keep a significantly longer time interval of log messages
in the 500k slots available in the ringbuffer.

(This used to be ctdb commit cc99c05c0c6484ad574039a454e6133852cb41fa)
2009-12-04 11:45:37 +11:00
Ronnie Sahlberg
6bad4a4836 Add a proper function to process a process-exist control in the daemon.
This controls is only used by samba when samba wants to check if a subrecord held by a <node-id>:<smbd-pid> is still valid or if it can be reclaimed.

If the node is banned or stopped, we kill the smbd process and return that the process does not exist to the caller. This allows us to recover subrecords from stopped/banned nodes where smbd is hung waiting for the databases to thaw.

bz58185

(This used to be ctdb commit 157807af72ed4f7314afbc9c19756f9787b92c15)
2009-12-02 13:58:27 +11:00
Ronnie Sahlberg
1c7de7a2ed Add a double linked list to the ctdb_context to store a mapping between client pids and client structures.
Add the mapping to the list everytime we accept() a new client connection
and set it up to remove in the destructor when the client structure is freed.

(This used to be ctdb commit f75d379377f5d4abbff2576ddc5d58d91dc53bf4)
2009-12-02 13:41:04 +11:00
Ronnie Sahlberg
569001afd0 Merge commit 'martins/status-test-2'
Conflicts:

	server/eventscript.c

(This used to be ctdb commit e9b3477a5b9a2eff18f727e7d59338bfb5214793)
2009-12-01 10:53:18 +11:00
Martin Schwenke
a64ccf07c1 Add flag to ctdb_event_script_callback indicating when called by client.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit a1d654a982ca56fade82552f4e6b5586236d3233)
2009-11-26 15:49:49 +11:00
Rusty Russell
3188df4a88 eventscript: check that ctdb forced script events correct
Now we're doing checking, we might as well make sure the commands from
"ctdb eventscripts" are valid.

This gets rid of the "UNKNOWN" event type.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 1d24a3869fe89fc9a109fd9e9b69df5fc665a5f6)
2009-11-25 11:02:29 +10:30
Rusty Russell
2d9254404d eventscript: introduce enum for different event script calls.
Rather than doing strcmp everywhere, pass an explicit enum around.  This
also subtly documents what options are available.  The "options" arg
is now used for extra arguments only.

Unfortunately, gcc complains on empty format strings, so we make
ctdb_event_script() take no varargs, and add ctdb_event_script_args().  We
leave ctdb_event_script_callback() taking varargs, which means callers
have to do "%s", "".

For the moment, we have CTDB_EVENT_UNKNOWN for handling forced scripts
from the ctdb tool.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 8001488be4f2beb25e943fe01b2afc2e8779930d)
2009-11-24 11:16:49 +10:30
Rusty Russell
2763df22de eventscript: put timeout inside ctdb_event_script_callback_v
Everyone uses the same timeout value, so just remove it from the API.
If we ever need variable timeouts, that might as well be central too.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 533c3e053293941d2a9484b495e78d45f478bb08)
2009-11-24 11:09:46 +10:30
Ronnie Sahlberg
e6b69fa760 rework and simplify the eventscript handling
This version has no trailing whitespace, and fixed 


(This used to be ctdb commit defbe318152fc479e8076ad70433cdb4971951af)
2009-11-25 11:00:11 +10:30
Ronnie Sahlberg
ae209c74c8 dont reset the event script context everytime we start a new "ctdb eventscript ..."
command.
Use the existing context used for non-monitor events

Multiple concurrent uses of "ctdb eventscript ..." could otherwise lead to a SEGV

(This used to be ctdb commit 80a8d728e9680040e00d24361dfc9367dd372a56)
2009-11-19 11:03:51 +11:00
Ronnie Sahlberg
cc2d81a77c make the ringbuffer logging more efficient and marshall the data by writing to a tmpfile instead of continously talloc resizing a blob
(This used to be ctdb commit 6427f0b68d60b556a023f64e15e156000ba6f943)
2009-11-18 19:10:50 +11:00
Ronnie Sahlberg
bc2675119d add an in memory ringbuffer where we store the last 500000 log entries regardless of log level.
add commandt to extract this in memory buffer and to clear it

(This used to be ctdb commit 29d2ee8d9c6c6f36b2334480f646d6db209f370e)
2009-11-18 12:44:18 +11:00
Ronnie Sahlberg
61de178e0a set up a pipe betweent he main daemon and the child we use for syslogling so that we can clean up the childprocess when we stop ctdbd
(This used to be ctdb commit cb8df973ccd446d87fbdd9a27843e54841ba5d89)
2009-11-16 15:17:32 +11:00
Ronnie Sahlberg
93d902e8f7 test of a change to make ctdbd use "status" event instead of the "monitor" event.
This allows running the actual monitoring asynchronously from ctdbd
and only using "status" to pick up the actual results.

(This used to be ctdb commit 1908bac812650ca25151051f5d86815e0b8ed319)
2009-11-13 12:37:55 +11:00
Ronnie Sahlberg
e33722a569 start the syslog child a little later, after we have forked and detached from the local shell
(This used to be ctdb commit 9ffd54b73c0d64b67e8e736d7cb54490e77ffa78)
2009-10-30 19:39:11 +11:00
Ronnie Sahlberg
5d73f19418 create a child process to write to syslog.
use a udp socket on the ctdbd port to send messages to teh syslog child process for loggign.

we need this when syslog becomes "slow",   like very slow, and on boxes where syslog is limited to 100 lines per second and starts to block after that

(This used to be ctdb commit 1446f4c247310e2ff2d522055bd8927d1a78d017)
2009-10-30 18:53:17 +11:00
Michael Adam
0113744fec server: trans2_active: don't report a transaction active on the node that performs the transaction
Otherwise a node can lock itself out, e.g. when a commit control times out...

Michael

(This used to be ctdb commit cb432e30351d5e5a41e98da3c7b1c2a4d400a3a2)
2009-10-30 09:22:18 +11:00
Ronnie Sahlberg
023d09cd38 Revert "update the "uptime" command to indicate the "time since last" is the time since the last recovery OR failover."
This reverts commit 3b0d44497800a16400d05a30bdaf6e6c285d4b36.

(This used to be ctdb commit cb36bbb5418290e8e5b770d2d836285b15da2a6f)
2009-10-29 10:49:00 +11:00
Ronnie Sahlberg
279b7ca564 update the "uptime" command to indicate the "time since last" is the time since the last recovery OR failover.
(This used to be ctdb commit 3b0d44497800a16400d05a30bdaf6e6c285d4b36)
2009-10-29 10:37:10 +11:00
Michael Adam
abac42ca34 server: add a new ctdb control CTDB_TRANS2_ACTIVE
This aske the daemon wheter a transaction is currently active on a
given DB on that node. More precisely this asks for the transaction_active
flag in the ctdb_db_context that is set in the CTDB_TRANS2_COMMIT
control and cleared in the CTDB_TRANS2_ERROR or CTDB_TRANS2_FINISHED controls.

This will be useful for fixing race conditions in the transaction code.

Michael

(This used to be ctdb commit 8d430ae6968dfe566614379436fc3c56003fcd88)
2009-10-29 10:14:30 +11:00
Ronnie Sahlberg
d379b30182 create a separate context for non-monitor eventscripts so they dont collide
(This used to be ctdb commit 325de818f88f339a16dc4544e899a2d735933c44)
2009-10-28 17:35:15 +11:00
Ronnie Sahlberg
e07ca41886 change the eventscript handling to allow EventScriptTimeout for each individual script isntead of for the entire set of scripts
restructure the talloc hierarchy to allow this

(This used to be ctdb commit 64da4402c6ad485f1d0a604878a7b0c01a0ea5f0)
2009-10-28 16:11:54 +11:00
Ronnie Sahlberg
4d40b86805 for debugging
add a global variable holding the pid of the main daemon.
change the tracking of time() in the event loop to only check/warn when called from the main daemon

(This used to be ctdb commit a10fc51f4c30e85ada6d4b7347b0f9a8ebc76637)
2009-10-27 13:18:52 +11:00
Ronnie Sahlberg
86d1b4c465 Add a mechanism where we can register notifications to be sent out to a SRVID when the client disconnects.
The way to use this is from a client to :
1, first create a message handle and bind it to a SRVID
   A special prefix for the srvid space has been set aside for samba :
   Only samba is allowed to use srvid's with the top 32 bits set like this.
   The lower 32 bits are for samba to use internally.

2, register a "notification" using the new control :
                    CTDB_CONTROL_REGISTER_NOTIFY         = 114,
   This control takes as indata a structure like this :
struct ctdb_client_notify_register {
        uint64_t srvid;
        uint32_t len;
        uint8_t notify_data[1];
};

srvid is the srvid used in the space set aside above.
len and notify_data is an arbitrary blob.
When notifications are later sent out to all clients, this is the payload of that notification message.

If a client has registered with control 114 and then disconnects from ctdbd, ctdbd will broadcast a message to that srvid to all nodes/listeners in the cluster.

A client can resister itself with as many different srvid's it want, but this is handled through a linked list from the client structure so it mainly designed for "few notifications per client".

3, a client that no longer wants to have a notification set up can deregister using control
                    CTDB_CONTROL_DEREGISTER_NOTIFY       = 115,
which takes this as arguments :
struct ctdb_client_notify_deregister {
        uint64_t srvid;
};

When a client deregisters, there will no longer be sent a message to all other clients when this client disconnects from ctdbd.

(This used to be ctdb commit f1b6ee4a55cdca60f93d992f0431d91bf301af2c)
2009-10-23 15:24:51 +11:00
Ronnie Sahlberg
9b8c72c446 When clients have blocked, perhaps because the node is banned or stopped and the client is blocked trying to tdb_fetch() a record, make sure we dont queue up too many REQ_MESSAGES.
Add a new tunable to control the maximum queue size we allow to a blocked client before we start discarding REQ_MESSAGES instead of queueing them for delivery.

    This avoids having queued up very very large number of MESSAGES that samba semds
     between eachother to nodes that are blocked/banned/stopped for extended periods
    .

(This used to be ctdb commit f76d6fed8f9630450263b9fa4b5fdf3493fb1e11)
2009-10-21 15:20:55 +11:00
Ronnie Sahlberg
d788dd3627 From wolfgang Mueller
Add a tuneable so that when scripts starts to hang/timeout, we can make the node unhealthy instead of banned

(This used to be ctdb commit 2e9fc6f0609833c6d8146196011ef780669d615d)
2009-10-20 12:59:48 +11:00
Ronnie Sahlberg
80be59d35e when we change state between healthy/unhealthy, make sure we ask the recovery
master to perform an explicit ip reallocation.

This is more reliable and faster than having the recovery dameon track these
changes, and since we now have an explicit method to ask the recovery daemon
to perform an explicit ip reallocation, we should use this.

(This used to be ctdb commit 3807681e74f4bfe92befdae6ed616ff5f1a99880)
2009-10-14 11:59:16 +11:00
Ronnie Sahlberg
122c423b82 add a new control for explicitely cancelling recovery transactions, i.e. the
transactions we start across all tdb databased during the recovery.

this allows us to properly clean up and delete these tdb transactions on a
recovery failure.

(This used to be ctdb commit b2ce8b900a7d00944c84e0574fea5b371064a06d)
2009-10-12 16:48:05 +11:00
Ronnie Sahlberg
73c0adb029 initial attempt at freezing databases in priority order
(This used to be ctdb commit e8d692590da1070c87a4144031e3306d190ebed2)
2009-10-12 12:08:39 +11:00
Ronnie Sahlberg
d4c98516a2 uptade the freeze/thaw commands to be able to send the requested database priority to freeze/thaw to the daemon.
this is encoded in the srvid field of the request header

(This used to be ctdb commit 0cb3d33caa42ed783e03bc825b181dde4cf63616)
2009-10-12 09:22:17 +11:00
Ronnie Sahlberg
3219f81710 add a control to read the db priority from a database
(This used to be ctdb commit ca6d045e419f308f57e74d4c978907afb05ddb85)
2009-10-10 15:04:18 +11:00
Ronnie Sahlberg
6cf7d8e131 add a control to set a database priority. Let newly created databases default to priority 1.
database priorities will be used to control in which order databases are locked during recovery in.

(This used to be ctdb commit 67741c0ee01916d94cace8e9462ef02507e06078)
2009-10-10 14:26:09 +11:00
Ronnie Sahlberg
166b1c97b4 add a new message to ask the recovery daemon to temporarily disable checking ip address consistency.
This is useful when we are moving addresses using moveip in the cluster since otherwise if we collide with the recovery daemons own check we could cause a recovery

(This used to be ctdb commit 9c63858c0b22c81eaccb9865a414af0bbb2833d4)
2009-10-06 12:11:32 +11:00
Ronnie Sahlberg
71e4259150 add a new function to collect a list of all active nodes EXCEPT a certain node
(This used to be ctdb commit be52954d921e7d443304cf49fbd488c619a9c4ec)
2009-10-06 10:52:31 +11:00
Ronnie Sahlberg
c971d934a9 From Wolfgang Mueller-Friedt
Remove the explicit vacuum/repack commands from the 00.ctdb eventscript
and implement this in the ctdb daemon.

Combine vacuuming and repacking into one
cheap read traverse to enumerate all candidate records
and one write traverse that both repacks the database and also deletes the record locally where we are lmaster and where the records have already been deleted remotely.

this code also adds initial autotuning heuristics for the vacuum intervals and how many records to delete in each iteration.

minor stylish changes made by ronnie s

(This used to be ctdb commit 95a3ee551241aa164967991fe5efe078e1714bde)
2009-09-29 13:27:19 +10:00
Ronnie Sahlberg
cda5f02c7c new prototype banning code
(This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a)
2009-09-04 02:20:39 +10:00
Ronnie Sahlberg
1cc79905ad add new controls to make it possible to enable/disable individual eventscripts
update scriptstatus output so it lists disabled scripts

(This used to be ctdb commit 7e799b7523c9699bd65a8a8207f7e03d668b0b81)
2009-08-13 13:04:08 +10:00
Wolfgang Mueller-Friedt
16af87bf25 repack limit tunable
Signed-off-by: Wolfgang Mueller-Friedt <wolfmuel@de.ibm.com>

(This used to be ctdb commit a2768b0732f2ab2e3fafda55587bd2e99eedf0fa)
2009-07-29 13:30:39 +10:00
Ronnie Sahlberg
fb63d27e4e initial part of new vacuuming patch.
create some new fields for ctdb_db and tunables

(This used to be ctdb commit 3a8e7d36cc42aedf4b7665364224140dcbfb3efa)
2009-07-29 13:25:43 +10:00
Michael Adam
4cd06a330e Fix persistent transaction commit race condition.
In ctdb_client.c:ctdb_transaction_commit(), after a failed
TRANS2_COMMIT control call (for instance due to the 1-second
being exceeded waiting for a busy node's reply), there is a
1-second gap between the transaction_cancel() and
replay_transaction() calls in which there is no lock on the
persistent db. And due to the lack of global state
indicating that a transaction is in progress in ctdbd, other nodes
may succeed to start transactions on the db in this gap and
even worse work on top of the possibly already pushed changes.
So the data diverges on the several nodes.

This change fixes this by introducing global state for a transaction
commit being active in the ctdb_db_context struct and in a db_id field
in the client so that a client keeps track of _which_ tdb it as
transaction commit running on. These data are set by ctdb upon
entering the trans2_commit control and they are cleared in the
trans2_error or trans2_finished controls. This makes it impossible
to start a nother transaction or migrate a record to a different
node while a transaction is active on a persistent tdb, including
the retry loop.

This approach is dead lock free and still allows recovery process
to be started in the retry-gap between cancel and replay.
Also note, that this solution does not require any change in the
client side.

This was debugged and developed together with
Stefan Metzmacher <metze@samba.org> - thanks!

Michael

(This used to be ctdb commit f88103516e5ad723062fb95fcb07a128f1069d69)
2009-07-29 11:12:39 +10:00
Ronnie Sahlberg
37d68c58b8 add two commands : setlmasterrole and setrecmasterrole to enable/disable these capabilities at runtime
(This used to be ctdb commit 51aaed0e9e42e901451292e8dd545297ab725a62)
2009-07-28 13:45:13 +10:00
Ronnie Sahlberg
72e2380e92 add a command "setnatgwstate {on|off}" that can be used to indicate if this node is using natgw functionality or not.
(This used to be ctdb commit 89a9bb29a60a6fb1fba55987e6cf0a4baa695e50)
2009-07-28 09:58:11 +10:00
Ronnie Sahlberg
e5e9fc48b1 create a new event : stopped.
This event is called when a node is stopped and is used by eventscripts that need to do certain cleanup and removal of configuration or ip addresses or routing ...

Note that a STOPPED node is considered "inactive" and as such will not be running the "recovered" event when the rest of the cluster has recovered.

(This used to be ctdb commit 65e9309564611bf937ded3c74a79abff895d7c59)
2009-07-17 12:26:16 +10:00
Ronnie Sahlberg
88f3c40d9c add two new controls, CTOP_NODE and CONTINUE_NODE
that are used to stop/continue a node instead of using modflags messages

(This used to be ctdb commit 54b4a02053a0f98f8c424e7f658890254023d39a)
2009-07-09 12:22:46 +10:00
Ronnie Sahlberg
66c8d4fb3d make it possible to start the daemon in STOPPED mode
(This used to be ctdb commit 866aa995dc029db6e510060e9e95a8ca149094ac)
2009-07-09 11:57:20 +10:00
Ronnie Sahlberg
9f0dc4b93b Add a new node flag : STOPPED
This node flag means the node is DISABLED and that all its public ip addresses
are failed over, but also that it has been removed from the VNNmap.

A STOPPED node should be in recovery mode active untill restarted using the continue command.

Adding two new commands "ctdb stop" "ctdb continue"

(This used to be ctdb commit d47dab1026deba0554f21282a59bd172209ea066)
2009-07-09 11:38:18 +10:00
Ronnie Sahlberg
289c58e9b6 add a new command "ctdb ipreallocate", this command will force the recovery master to perform a full ip reallocation process.
the ctdb command will block until the ip reallocation has comleted

(This used to be ctdb commit abad7b97fe0c066b33f6e75d0953bbed892a3216)
2009-07-02 13:00:26 +10:00
Ronnie Sahlberg
93026f4cbf update the handling of debug levels so that we always can use a literal instead of a numeric value.
validate the input values used and refuse setting the debug level to an unknown value

(This used to be ctdb commit daec49cea1790bcc64599959faf2159dec2c5929)
2009-07-01 09:17:13 +10:00
Ronnie Sahlberg
5b235c3999 add a control to set the reclock file
(This used to be ctdb commit 36cc2e586f03fa497ee9b06f3e6afc80219c4aaa)
2009-06-25 14:25:18 +10:00
Ronnie Sahlberg
2b253c094c add a control to read the current reclock file from a node
(This used to be ctdb commit ed6a4cbcdcbb4e0df83bec8be67c30288bf9bd41)
2009-06-25 12:17:19 +10:00
Ronnie Sahlberg
e6170b5389 add a new node state : DELETED.
This is used to mark nodes as being DELETED internally in ctdb
so that nodes are not renumbered if / when they are removed from the nodes file.

This is used to be able to do "ctdb reloadnodes" at runtime without
causing nodes to be renumbered.
To do this, instead of deleting a node from the nodes file, just comment it out like

   1.0.0.1
   #1.0.0.2
   1.0.0.3

After removing 1.0.0.2 from the cluster,  the remaining nodes retain their
pnn's from prior to the deletion, namely 0 and 2

Any line in the nodes file that is commented out represents a DELETED pnn

(This used to be ctdb commit 6a5e4fd7fa391206b463bb4e976502f3ac5bd343)
2009-06-01 14:18:34 +10:00
Sumit Bose
11988fc77a structure member node_list_file is not used anywhere
(This used to be ctdb commit 0e84ea23d1d998d4d4ac7d8a858b3d8294f056cb)
2009-05-21 11:16:43 +10:00
Sumit Bose
9171a7784c structure member logfile is not used anywhere
(This used to be ctdb commit 4f86c991812c2d0bddbe3de9a9906cf5df118cd4)
2009-05-21 11:15:43 +10:00
Ronnie Sahlberg
98a54c4675 Track how long it takes to take out the recovery lock from both the main dameon and also from the recovery daemon.
Log this in "ctdb statistics".

Also add a varaible "RecLockLatencyMs" that will log an error everytime it takes longer than this to access the reclock file.

(This used to be ctdb commit 042377ed803bb8f7ca9d6ea1a387427b7b8ba45a)
2009-05-14 10:33:25 +10:00
root
af25fa38f3 fixed a problem with clients disconnecting during a traverse
When a client (such as smbstatus) is killed, it may have outstanding
traverse children on remote nodes. We need to catch the client
disconnect in ctdbd and send a control to all nodes telling them to
kill those outstanding traverse children.

(This used to be ctdb commit f2fb2df4619a14f7f6c11f9132ee7d793028042c)
2009-05-06 07:32:25 +10:00
root
6793f077a8 Add a new variable VerifyRecoveryLock which can be used to disable the test that the recovery daemon holds the lock properly when performing a recovery
(This used to be ctdb commit 329df9e47e6ca8ab5143985a999e68f37c6d88a5)
2009-05-01 01:17:59 +10:00
Ronnie Sahlberg
38ea6708dd add a tuneable RecoveryDropAllIPs so it is possible to control after how long a node that has been stuck in recovery will wait until it will yield all public addresses.
this now defaults to 60 seconds

This is useful if a split brain occurs due to network partitioning since it will make sure that the "other half" of the cluster that does not contain the recovery master will eventually release all ips and thus avoiding a duplicate ip situation for the public addresses

(This used to be ctdb commit 70f21428c9eec96bcc787be191e7478ad68956dc)
2009-04-24 18:28:08 +10:00
Ronnie Sahlberg
d94917ec49 Change the (dodgy) seqnumfrequency variable to have ms resolution instead of second resolution.
Rename the variable to SeqnumInterval for
1, it is an interval and not a 1/interval unit
2, so that we catch when people use this old variable and can update the sysconfig file instead of silently changin semantics of this variable

this is a real dodgy variable

(This used to be ctdb commit 68eac459e5d2b6b534f72821036675ffe5d7a350)
2009-04-01 17:21:38 +11:00
Ronnie Sahlberg
297ab50173 remove a prototype for a function no longer used
(This used to be ctdb commit 9ac9745ba9296d01e3b18148ae8c3240e51cf090)
2009-04-01 17:13:48 +11:00
Ronnie Sahlberg
ad40ee25f9 add a mechanism where the ctdb daemon will run a usercontrolled script when the node status changes to/from UNHEALTHY state.
This would allow a sysadmin to set up ctdb to send an email/snmptrap/... when the status of the node changes.

(This used to be ctdb commit ce534a83a05dbd40238e4eee0669d60ff396f935)
2009-03-31 14:23:31 +11:00
Ronnie Sahlberg
689f76f0b0 Merge branch 'obnox'
(This used to be ctdb commit 972036a5d510fb9b399f1ee34a8861dee4221267)
2009-03-24 17:49:55 +11:00
Ronnie Sahlberg
7265c713db we need to set the port properly in the parse_ip helper
(This used to be ctdb commit 43fe18d86995744ba61c7a6405b70edcb265930a)
2009-03-24 13:45:11 +11:00
Michael Adam
a83ed1d743 Merge commit 'ctdb-ronnie/master'
(This used to be ctdb commit 39a972b0d6d0d70282c25c54a124b67431467e77)
2009-03-23 10:07:44 +01:00
root
629d5ee1fa add a new command "ctdb scriptstatus"
this command shows which eventscripts were executed during the last monitoring cycle and the status from each eventscript.

If an eventscript timedout or returned an error we also
show the output from the eventscript.

Example :
[root@rcn1 ctdb-git]# ./bin/ctdb scriptstatus
6 scripts were executed last monitoring cycle
00.ctdb              Status:OK    Duration:0.021 Mon Mar 23 19:04:32 2009
10.interface         Status:OK    Duration:0.048 Mon Mar 23 19:04:32 2009
20.multipathd        Status:OK    Duration:0.011 Mon Mar 23 19:04:33 2009
40.vsftpd            Status:OK    Duration:0.011 Mon Mar 23 19:04:33 2009
41.httpd             Status:OK    Duration:0.011 Mon Mar 23 19:04:33 2009
50.samba             Status:ERROR    Duration:0.057 Mon Mar 23 19:04:33 2009
   OUTPUT:ERROR: Samba tcp port 445 is not responding

Add a new helper function "switch_from_server_to_client()" which both
the recovery daemon can use as well as in the child process we start for running the actual eventscripts.

Create several new controls, both for the eventscript child process to inform the master daemon of the current status of the scripts as well as for the ctdb tool to extract this information from the runninc daemon.

(This used to be ctdb commit c98f90ad61c9b1e679116fbed948ddca4111968d)
2009-03-23 19:07:45 +11:00
root
dc05c1b80c create a helper function that converts a ctdb instance in daemon mode to become
a ctdb client instance.

use this from the recovery daemon child process to switch to client mode
and connect back to the main daemon

(This used to be ctdb commit 16f31786a031255ab5b3099a0a3c745de973347a)
2009-03-23 12:37:30 +11:00
Michael Adam
839dec1b12 move common code of system_linux.c and system_aix.c into new system_common.c
Michael

(This used to be ctdb commit 124874847e5e03ce2a44bddfe778f01dfb0a7a03)
2009-02-28 03:08:31 +01:00
Michael Adam
1821d5619b remove include <netinet/in.h> from public ctdb.h
This is not portable.
The ctdb build includes the necessary headers from includes.h.
And users of ctdb should cope with including the necessary
prerequisite headers themselves.

Michael

(This used to be ctdb commit fedc6983f5dee39152e6f400f89a3e07eab57f0c)
2009-01-29 13:22:02 +01:00
Michael Adam
70aa6445d6 Fix the build on AIX: sys/socket.h needs to be included before ctdb.h
(for struct sockaddr to be defined)

Thanks to William Jojo <w.jojo@hvcc.edu> for reporting.

Michael

(This used to be ctdb commit 7558bca1e99884c02747adb7cbea799d04ee24d5)
2009-01-29 10:24:58 +01:00
Michael Adam
3cca0f75e4 Fix treatment of link local ipv6 addresses: set the scope id.
metze / Michael

Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit 9d12de1ca6107801dada927729e755c0949d73bf)
2009-01-19 22:50:53 +01:00
root
321866dbba finish the ipv6 support.
allow clients to register either ipv4 or ipv6 client connections to the tickles list

(This used to be ctdb commit d9b44d7c3255b0fd7359b9afeb613e6ff4c4eaac)
2009-01-13 16:17:20 +11:00
Ronnie Sahlberg
edb7241c05 redesign how reloadnodes is implemented.
modify the transport methods to allow to restart individual connections
and set up destructors properly.

only tear down/set-up tcp connections to nodes removed from the cluster
or nodes added to the cluster.
Leave tcp connections to unchanged nodes connected.

make "ctdb reloadnodes" explicitely cause a recovery of the cluster once
the files have been realoaded

(This used to be ctdb commit d1057ed6de7de9f2a64d8fa012c52647e89b515b)
2008-12-02 13:26:30 +11:00
Ronnie Sahlberg
a782bdbacd inew version 1.0.66
ddwq

(This used to be ctdb commit 499a01fece2a5f24f1b2943cf3dc6e9a3a8ca3b5)
2008-11-24 19:06:02 +11:00
Ronnie Sahlberg
94a56ea410 reqrite the handling of flag updates across the cluster to eliminate a
race between the ctdb tool and the recovery daemon both at once
trying to push flag changes across the cluster.

(This used to be ctdb commit a9a1156ea4e10483a4bf4265b8e9203f0af033aa)
2008-11-20 12:43:18 +11:00
Ronnie Sahlberg
e1b0cea427 add control and logging of very high latencies.
log the type of operation and the database name for all latencies higher
than a treshold

(This used to be ctdb commit 1d581dcd507e8e13d7ae085ff4d6a9f3e2aaeba5)
2008-10-30 12:49:53 +11:00
Ronnie Sahlberg
b9bd20ce55 add a context and a timed event so that once we have been in recovery
mode for too long we drop all public ip addresses

(This used to be ctdb commit 403c68f96e1380dd07217c688de2730464f77ea0)
2008-10-22 11:04:41 +11:00
Ronnie Sahlberg
ce66008e08 specify a "script log level" on the commandline to set under which log
level any/all output from eventscripts will be logged as

(This used to be ctdb commit cdc79d4f22f1a6aec5c34115969421f93663932a)
2008-10-17 07:56:12 +11:00
Ronnie Sahlberg
260718e017 update the client side of getnodemap and getpublicips controls to
fallback to the old-style ipv4-only controls if the new-style ipv4/ipv6
control fails.

this allows a 1.0.59+ (ipv4/ipv6) ctdb daemon being recmaster  to be
compatible with
pre-1.0.59  versions of ctdb that are ipv4 only.

(This used to be ctdb commit 8e912abc2c68f5fe7b06c600ba6fec1a6900127c)
2008-10-15 00:24:44 +11:00
Ronnie Sahlberg
cb300382b0 update TAKEIP/RELEASEIP/GETPUBLICIP/GETNODEMAP controls so we retain an
older ipv4-only version of these controls.

We need this so that we are backwardcompatible with old versions of ctdb
and so that we can interoperate with a ipv4-only recmaster during a
rolling upgrade.

(This used to be ctdb commit 6b76c520f97127099bd9fbaa0fa7af1c61947fb7)
2008-10-14 10:40:29 +11:00
Ronnie Sahlberg
a3bbe238c9 The ctdb daemon keeps track of whether the recovery process is running
correctly by measuring how long it was since the last successful
communication with the recovery daemon was recorded.

After a certain timeout the ctdb daemon would deem the recovery daemon
as inoperable and shut down.

If the system clock is suddenly changed forward by many (60 or more)
seconds this could cause the timeout to trigger prematurely/immediately
where ctdb would incorrectly think that more than 60 seconds had passed
since last successful communications and thus abort.

Instead of cehcking for one timeout occuring, only deem the recovery
daemon to be "down" and trigger a shutdown if communications have
timedout for three intervals in a row.

(This used to be ctdb commit 196968c552e6ebcb57389d769a4b25f42fa8bc5d)
2008-09-17 14:17:41 +10:00
Ronnie Sahlberg
6474f3278d additional monitoring between the two daemons.
we currently only monitor that the dameons are running by kill(0, pid)
and verifying the the domain socket between them is ok.

this is not sufficient since we can have a situation where the recovery
daemon is hung.

this new code monitors that the recovery daemon is operating.
if the recovery hangs, we log this and shut down the main daemon

(This used to be ctdb commit cd69d292292eaab3aac0e9d9fc57cb621597c63c)
2008-09-09 13:44:46 +10:00
Ronnie Sahlberg
a35fa0aa8f rename ctdb_tcp_client back to the original name ctdb_control_tcp
(This used to be ctdb commit 4d1c0418cfe6170bc081684dbe45908a5d285f0b)
2008-08-27 10:24:35 +10:00
Ronnie Sahlberg
5193caec6d make the function to canonicalize a sockaddr structure public
(This used to be ctdb commit 1157d61a0bc557d8ffc453c518dfc48473492bfd)
2008-08-20 11:58:27 +10:00
Ronnie Sahlberg
ef997d344f initial ipv6 patch
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>

(This used to be ctdb commit 1f131f21386f428bbbbb29098d56c2f64596583b)
2008-08-19 14:58:29 +10:00
Andrew Tridgell
aa1bc0abba added a new control CTDB_CONTROL_TRANS2_COMMIT_RETRY so we can tell
the difference between a initial commit attempt and a retry, which
allows us to get the persistent updates counter right for retries

(This used to be ctdb commit 7f29c50ccbc7789bfbc20bcb4b65758af9ebe6c5)
2008-08-08 13:11:28 +10:00
Andrew Tridgell
5a0249d34c return a more detailed error code from a trans2 commit error
(This used to be ctdb commit 6915661a460cd589b441ac7cd8695f35c4e83113)
2008-08-08 09:58:49 +10:00
Ronnie Sahlberg
b9d8bb23af remove the reclock file we store pnn counts in.
This file creates additional locking stress on the backend filesystem and we may not need it anyway.

(This used to be ctdb commit 84236e03e40bcf46fa634d106903277c149a734f)
2008-08-06 11:52:26 +10:00
Andrew Tridgell
237e2f5409 new prototypes
(This used to be ctdb commit 71d9d24abae62f70acbd7c1ded8af0b817607c2a)
2008-07-30 19:58:27 +10:00
Andrew Tridgell
98502135e7 added new multi-record transaction commit code
(This used to be ctdb commit 9ff3380099fe6f4d39de126db0826971a10ee692)
2008-07-30 19:57:00 +10:00
Andrew Tridgell
abe0232818 rename the structure we use for marshalling multiple records
(This used to be ctdb commit 4d205476d286570a6e1f52b59af42858ce051106)
2008-07-30 14:24:56 +10:00
Ronnie Sahlberg
1bfcca524d From Michael Adams,
change one element from private to private_data

Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>

(This used to be ctdb commit 0de79352c9b36c118e36905f08ebbe38ecbb957e)
2008-07-22 09:07:42 +10:00
Ronnie Sahlberg
6eb4e46fe1 Add two new controls to start and cancel a persistent update.
This allows ctdb to automatically start a new full blown recovery
if a client has started updating the local tdb for a persistent database
but is kill -9ed before it has ensured the update is distributed clusterwide.

(This used to be ctdb commit 1ffccb3e0b3b5bd376c5302304029af393709518)
2008-07-17 13:50:55 +10:00
Ronnie Sahlberg
ab8535eaa5 make LVS a capability so that we can see which nodes are configured with
LVS and which are not using LVS.

"ctdb getcapabilities"

(This used to be ctdb commit 172d01fb34f032e098b1c77a7b0f17bf11301640)
2008-07-10 10:37:22 +10:00
Andrew Tridgell
9999f18369 an extraordinarily ugly patch!
This is a hack to allow backtraces under valgrind to show what opcode
is getting uninitialised bytes

(This used to be ctdb commit 67bb12c8f0af5914efb44b76bc6ddbb11fc0fcdf)
2008-07-04 18:00:24 +10:00
Andrew Tridgell
8be67e0e09 CTDB_NO_MEMORY_VOID() needs to return on error
(This used to be ctdb commit 6d21fd57bedffce2298ce7fe4c7d889c858ba7fa)
2008-07-04 16:58:29 +10:00
Ronnie Sahlberg
ef769e7237 track both when we last started and ended a recovery.
make ctdb uptime print how long the recovery took

in the recovery daemon when we check that the public ip address
allocation on the local node is correct (we have the ips we should have
and we dont have any we shouldnt have) use ctdb uptime and check the
recovery start/stop times and make sure we dont check for ip allocation
inconsistencies during a recovery  where the ip address allocation is in flux.

(This used to be ctdb commit f86551580349b7f662f9a07e4eb0c1189e38e429)
2008-07-02 13:55:59 +10:00
Ronnie Sahlberg
05b50ebe0a print the opcode when an async callback detects an error
(This used to be ctdb commit 423934629704683d3a3042570577fb4e04b17a6d)
2008-07-02 12:21:53 +10:00
Ronnie Sahlberg
779468ab3f if the event scripts hangs EventScriptsBanCount consecutive times in a row
the node will ban itself for the default recovery ban period

(This used to be ctdb commit 7239d7ecd54037b11eddf47328a3129d281e7d4a)
2008-06-13 13:18:06 +10:00
Ronnie Sahlberg
4b6b094860 add a callback for failed nodes to the async control helper.
this callback is called for every node where the control failed (or timed out)

when we issue the start recovery control from recovery master,
set any node that fails as a culprit   so it will eventually be banned

(This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2)
2008-06-12 16:53:36 +10:00
Ronnie Sahlberg
d8433cacb2 first cut to convert takeover_callback_state{}
to use ctdb_sock_addr instead of sockaddr_in

(This used to be ctdb commit 5444ebd0815e335a75ef4857546e23f490a22338)
2008-06-04 17:12:57 +10:00
Ronnie Sahlberg
7d39ac131b convert handling of gratious arps and their controls and helpers to
use the ctdb_sock_addr structure so tehy work for both ipv4 and ipv6

(This used to be ctdb commit 86d6f53512d358ff68b58dac737ffa7576c3cce6)
2008-06-04 15:13:00 +10:00
Ronnie Sahlberg
1c88f422d5 add a parameter for the tdb-flags to the client function
ctdb_attach()   so that we can pass TDB_NOSYNC when we attach to
a persistent database and want fast unsafe writes instead of
slow but safe tdb_transaction writes.

enhance the ctdb_persistent test suite to test both safe and unsafe writes

(This used to be ctdb commit 4948574f5a290434f3edd0c052cf13f3645deec4)
2008-06-04 10:46:20 +10:00
Ronnie Sahlberg
ceaf488f05 do persistent writes in a child process
(This used to be ctdb commit 2da3d1f876f5d654f849af8a3e588f5a61300c3d)
2008-05-28 13:04:25 +10:00
Ronnie Sahlberg
ed2cf0291d second try for safe transaction stores into persistend tdb databases
for stores into persistent databases, ALWAYS use a lockwait child take out the lock for the record and never the daemon itself.

(This used to be ctdb commit 7fb6cf549de1b5e9ac5a3e4483c7591850ea2464)
2008-05-22 12:47:33 +10:00
Ronnie Sahlberg
909ff219e0 Start implementing support for ipv6.
This enhances the framework for sending tcp tickles to be able to send ipv6 tickles as well.

Since we can not use one single RAW socket to send both handcrafted ipv4 and ipv6 packets, instead of always opening TWO sockets, one ipv4 and one ipv6 we get rid of the helper ctdb_sys_open_sending_socket() and just open (and close)  a raw socket of the appropriate type inside ctdb_sys_send_tcp().
We know which type of socket v4/v6 to use based on the sin_family of the destination address.

Since ctdb_sys_send_tcp() opens its own socket  we no longer nede to pass a socket
descriptor as a parameter.  Get rid of this redundant parameter and fixup all callers.

(This used to be ctdb commit 406a2a1e364cf71eb15e5aeec3b87c62f825da92)
2008-05-14 15:47:47 +10:00
Ronnie Sahlberg
2bc0e5a69f add a new container to hold a socketaddr for either ipv4 or ipv6
(This used to be ctdb commit 93b98838824fae5f47e4ed6b95ae9e4e7597bec3)
2008-05-14 15:40:44 +10:00
Ronnie Sahlberg
b8eb5925cf Try to use tdb transactions when updating a record and record header inside the ctdb daemon.
If a transaction could be started, do safe transaction store when updating the record inside the daemon.
If the transaction could not be started (maybe another samba process has a lock on the database?) then just do a normal store instead (instead of blocking the ctdb daemon).

The client can "signal" ctdb that updates to this database should, if possible, be done using safe transactions by specifying the TDB_NOSYNC flag when attaching to the database.
The TDB flags are passed to ctdb in the "srvid" field of the control header when attaching using the CTDB_CONTROL_DB_ATTACH_PERSISTENT.

Currently, samba3.2 does not yet tell ctdbd to handle any persistent databases using safe transactions.

If samba3.2 wants a particular persistent database to be handled using
safe transactions inside the ctdbd daemon, it should pass
TDB_NOSYNC as the flags to the call to attach to a persistent database
in ctdbd_db_attach()     it currently specifies 0 as the srvid

(This used to be ctdb commit 8d6ecf47318188448d934ab76e40da7e4cece67d)
2008-05-12 13:37:31 +10:00
Ronnie Sahlberg
92b61cd7d5 Expand the client async framework so that it can take a callback function.
This allows us to use the async framework also for controls that return
outdata.

Add a "capabilities" field to the ctdb_node structure. This field is
only initialized and kept valid inside the recovery daemon context and not
inside the main ctdb daemon.

change the GET_CAPABILITIES control to return the capabilities in outdata instead of in the res return variable.

When performing a recovery inside the recovery daemon, read the capabilities from all connected nodes and update the ctdb->nodes list of nodes.
when building the new vnnmap after the database rebuild in recovery, do not include any nodes which lack the LMASTER capability in the new vnnmap.
Unless there are no available connected node that sports the LMASTER capability in which case we let the local node (recmaster) take on the lmaster role temporarily (i.e. become a member of the vnnmap list)

(This used to be ctdb commit 0f1883c69c689b28b0c04148774840b2c4081df6)
2008-05-06 15:42:59 +10:00
Ronnie Sahlberg
a9c45f9513 Add a capabilities field to the ctdb structure
Define two capabilities :
can be recmaster
can be lmaster
Default both capabilities to YES

Update the ctdb tool to read capabilities off a node

(This used to be ctdb commit 50f1255ea9ed15bb8fa11cf838b29afa77e857fd)
2008-05-06 10:02:27 +10:00
Ronnie Sahlberg
0e1a20b603 Revert "Revert "Revert "- accept an optional set of tdb_flags from clients on open a database,"""
remove the transaction stuff and push   so that the git tree will work

This reverts commit 539bbdd9b0d0346b42e66ef2fcfb16f39bbe098b.

(This used to be ctdb commit 876d3aca18c27c2239116c8feb6582b3a68c6571)
2008-04-10 15:59:51 +10:00
Ronnie Sahlberg
39f119b42c Revert "Revert "- accept an optional set of tdb_flags from clients on open a database,""
This reverts commit 171d1d71ef9f2373620bd7da3adaecb405338603.

(This used to be ctdb commit 539bbdd9b0d0346b42e66ef2fcfb16f39bbe098b)
2008-04-10 14:57:41 +10:00
Ronnie Sahlberg
9684befa16 Revert "- accept an optional set of tdb_flags from clients on open a database,"
This reverts commit 49330f97c78ca0669615297ac3d8498651831214.

(This used to be ctdb commit 171d1d71ef9f2373620bd7da3adaecb405338603)
2008-04-10 14:45:45 +10:00
Andrew Tridgell
dc15a9c1f6 - accept an optional set of tdb_flags from clients on open a database,
thus allowing the client to pass through the TDB_NOSYNC flag

- ensure that tdb_store() operations on persistent databases that don't
  have TDB_NOSYNC set happen inside a transaction wrapper, thus making
  them crash safe

(This used to be ctdb commit 49330f97c78ca0669615297ac3d8498651831214)
2008-04-10 15:25:48 +10:00
Ronnie Sahlberg
e8e67ef576 add a mechanism to force a node to run the eventscripts with arbitrary arguments
ctdb eventscript "command argument argument ..."

(This used to be ctdb commit 118a16e763d8332c6ce4d8b8e194775fb874c8c8)
2008-04-02 11:13:30 +11:00
Ronnie Sahlberg
27a7f854f5 add improvements to tracking memory usage in ctdbd adn the recovery daemon
and a ctdb command to pull the talloc memory map from a recovery daemon
ctdb rddumpmemory

(This used to be ctdb commit d23950be7406cf288f48b660c0f57a9b8d7bdd05)
2008-04-01 15:34:54 +11:00
Ronnie Sahlberg
0d7b34c9e5 Add two new controls to add/delete public ip address from a node at runtime.
The controls only modify the runtime setting of which public addresses a node
can server and does not modify /etc/ctdb/public_addresses.
To make the change permanent you also need to edit /etc/ctdb/public_addresses
manually.

After ip addresses have been added/deleted you need to invoke a recovery
for the ip addresses to be redistributed.

(This used to be ctdb commit f8294d103fdd8a720d0b0c337d3973c7fdf76b5c)
2008-03-27 09:23:27 +11:00
Ronnie Sahlberg
2863d2cfd1 From M Dietz,
Add back the controls to enable/disable monitoring we used to have for debugging but removed a while ago

(This used to be ctdb commit 8477f6a079e2beb8c09c19702733c4e17f5032fe)
2008-03-25 08:27:38 +11:00
Ronnie Sahlberg
d53424731f in ctdb_call_local() we can not talloc_steal() the returned data and hang it off ctdb.
This can cause a memory leak if the call is terminated before we have managed to respond to the client.
(and the call is talloc_free()d but the data is still hanging off ctdb)

instead we must talloc_steal() the data and hang it off the call structure to avoid the memory leak.

In order to do this we must also change the call structure that is passed into ctdb_call_local() to be allocated through talloc().

This structure was previously either a static variable, or an element of a larger talloc()ed structure (ctdb_call_state or ctdb_client_call_state) so
we must change all creations of a ctdb_call into explicitely creating it through talloc()

(This used to be ctdb commit 4becf32aea088a25686e8bc330eb47d85ae0ef8f)
2008-03-19 13:54:17 +11:00
Ronnie Sahlberg
74d57f8d51 Redo the vacukming process to mkake it scalable.
Vacumming used to delete one record at a time on all nodes, that was
m*n behaviour and would require a huge storm of ctdb->ctdb controls and just wouldnt scale at all.

The new vacuming process collects all records to be deleted locally and then only sends 1 control to the other nodes. This control contains a list of all records to be deleted.

(This used to be ctdb commit 9e625ece19a91f362c9539fa73b6b2108f0d9c53)
2008-03-13 07:53:29 +11:00
Ronnie Sahlberg
a89ed0fdc2 add a new tunable 'NoIPFailback'
when this tunable is set, ip addresses will only be failed over when a node
fails. And only those ip addresses held by the failed node will be reallocated
in the cluster.

When a node becomes active again, this will not lead to any failback of ip addresses.

This can reduce the number of "ip address movements" in the cluster since we dont automatically fail an ip address back, but can also lead to an unbalanced cluster since we no longer attempt to spread the ip addresses out evenly across the active nodes.

This tuneable can NOT be active at the same time as DeterministicIPs are used.

(This used to be ctdb commit d3b8a461b15bc584fa1785eb5922de6d49d8f6c4)
2008-03-03 12:52:16 +11:00
Ronnie Sahlberg
f6f7f54bd6 add a new tunable : reclockpingperiod
once every such interval :
* the recovery master on each node will uppdate the "connected" count in the
reclock count file (ctdb getreclock)
* if the node thinks it is a recovery master but it detects another node
  that is DISCONNECTED but which still holds a lock to the reclock count file
  this may mean that we have a split cluster.
  if that other node that is DISCONNECTED but still holds the lock on hte reclock
  pnn count file, is MORE connected than the local node,
  yield the recmaster role and let the other half of the lcuster take over

this add a second, last chance mechanism to detect split clusters.
IF the cluster is split but GPFS is not yet split, this mechanism makes
the largest half of the cluster become the active half.

(This used to be ctdb commit 07af425f444531942cce8abff112c1524228d287)
2008-03-03 09:19:30 +11:00
Ronnie Sahlberg
e0036942bc add a new file <reclock>.pnn where each recovery daemon can lock that byte at offset==pnn to offer an alternative way to detect which nodes are active instead of relying on CONNECTED being accurate.
(This used to be ctdb commit 21d3319eaf463e2a00637d440ee2d4d15f53bf09)
2008-02-29 12:37:42 +11:00
Ronnie Sahlberg
4adeafef11 add a control to get the name of the reclock file from the daemon
(This used to be ctdb commit 9effb22cc1616d684352d7ebabb359e69adb0f52)
2008-02-29 10:03:39 +11:00
Ronnie Sahlberg
7bc8007f93 add a new tunable DisableWhenUnhealthy which when set will cause a node to automatically become DISABLED anytime monitoring fails and the node becomes UNHEALTHY.
Use with caution.

(This used to be ctdb commit c20293360db67f9876b0c84e5e9e12a5868964cb)
2008-02-22 10:33:09 +11:00
Ronnie Sahlberg
39539f6044 Add a new parameter to /etc/sysconfig/ctdb
CTDB_START_AS_DISABLED="yes"

and command line argument
--start-as-disabled

When set, this makes the ctdb node to always start in DISABLED mode and will thus not host any public ip addresses.
The administrator must manually "ctdb enable" the node after it has started when the administrator wants the node to start hosting public ip addresses.

Using this option it is possible to start ctdb on a node without causing any reallocation of ip addresses when it is starting. The node will still merge with the cluster and there will still be a recovery phase but the ip address allocations will not change in the cluster.

(This used to be ctdb commit b93d29f43f5306c244c887b54a77bca8a061daf2)
2008-02-22 09:42:52 +11:00
Ronnie Sahlberg
9f99b44fd1 to make it easier/less disruptive to add nodes to a running cluster
add a new control that causes the node to drop the current nodes list
and reread it from the nodes file.
During this operation, the node will also drop the tcp layer and restart it.

When we drop the tcp layer, by talloc_free()ing the ctcp structure
add a destructor to ctcp so that we also can clean up and remove the references in the ctdb structure to the transport layer

add two new commands for the ctdb tool.
one to list all nodes in the nodesfile and the second a command to trigger a node to drop the transport and reinitialize it with the nde nodes file

(This used to be ctdb commit 4bc20ac73e9fa94ffd43cccb6eeb438eeff9963c)
2008-02-19 14:44:48 +11:00
Ronnie Sahlberg
3f56526037 Specify and print debuglevels by name and not by number
(This used to be ctdb commit 79ad830294b8b677fbd0c5ad7ed6fbde71f74f8d)
2008-02-05 10:26:23 +11:00
Andrew Tridgell
9d6ac0cf55 added debug constants to allow for better mapping to syslog levels
(This used to be ctdb commit 7ba8f1dde318eab03f4257e5a89fd23e7281e502)
2008-02-04 17:44:24 +11:00
Andrew Tridgell
146d4b0db7 merge async recovery changes from Ronnie
(This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2)
2008-01-29 13:59:28 +11:00
Ronnie Sahlberg
9055978b46 add a ctdb uptime command that prints when ctdb was started and when the
last recovery occured

(This used to be ctdb commit b86e8ccbdac044bb949c4fc2ebb27635126272a9)
2008-01-17 11:33:23 +11:00
Andrew Tridgell
b62b7fcde8 added syslog support, and use a pipe to catch logging from child processes to the ctdbd logging functions
(This used to be ctdb commit 1306b04cd01e996fd1aa1159a9521f2ff7b06165)
2008-01-16 22:03:01 +11:00
Ronnie Sahlberg
5b7838d768 ctdb_control_send() does not need to take an outdata parameter
remove the outdata parameter from the function and all callers

(This used to be ctdb commit e3951337f8df2ae19cce61c954036590c7a03582)
2008-01-16 10:23:26 +11:00
Ronnie Sahlberg
ba31feaec0 split node health monitoring and checking for connected/disconnected
nodes into two separate files.

move the monitoring of keepalives for detecting connected/disconnected 
remote nodes into ctdb_keepalive.c

(This used to be ctdb commit 23a57b20c314d5f11a433cf251eb9d9de743849a)
2008-01-15 08:42:12 +11:00
Andrew Tridgell
b866a147d2 get rid of monitor_retry as well
(This used to be ctdb commit c957cf9c1d99d5d3f4ca726f7a867c829660a2b7)
2008-01-10 14:49:43 +11:00
Andrew Tridgell
538f519dba exponential backoff in health monitoring for faster startup
(This used to be ctdb commit 1b04a1f675f73b48366ba98803a58c3d8df1b6e1)
2008-01-10 14:40:56 +11:00
Andrew Tridgell
3b3fceacbe block alarm signals during critical sections of vacuum
(This used to be ctdb commit cfb14ae76f00f10d27b56c034b2247ab12d63065)
2008-01-10 09:43:14 +11:00
Andrew Tridgell
1c91398aef ensure the recovery daemon is not clagged up by vacuum calls
(This used to be ctdb commit ff7e80e247bf5a86adda0ef850d901478449675b)
2008-01-08 21:28:42 +11:00
Andrew Tridgell
96100fcae6 added two new ctdb commands:
ctdb vacuum   : vacuums all the databases, deleting any zero length
                 ctdb records

 ctdb repack   : repacks all the databases, resulting in a perfectly
                 packed database with no freelist entries

(This used to be ctdb commit 3532119c84ab3247051ed6ba21ba3243ae2f6bf4)
2008-01-08 17:23:27 +11:00
Andrew Tridgell
37861932ce merge from ronnie
(This used to be ctdb commit 0aa6e04438aa5ec727815689baa19544df042cf7)
2008-01-07 16:17:22 +11:00
Andrew Tridgell
748843a3c6 added paranoid transaction ids
(This used to be ctdb commit afc1da53873cdbd31fcc8c6b22fae262e344cf6e)
2008-01-06 13:24:55 +11:00
Andrew Tridgell
c08f2616cd new simpler and much faster recovery code based on tdb transactions
(This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3)
2008-01-06 12:38:01 +11:00
Andrew Tridgell
43aa27c9ee this is needed with merged tdb
(This used to be ctdb commit 3dc07f2bf98ab445ab960ef14173bc6924e3b658)
2008-01-05 17:42:01 +11:00
Andrew Tridgell
e4aefbc66d a new tunable DatabaseMaxDead that enables the tdb max dead cache logic
(This used to be ctdb commit 01c519c3658a8fcb9545b507b597e723658e4c4e)
2008-01-05 09:36:53 +11:00
Ronnie Sahlberg
50573c5391 add ctdb_disable/enable_monitoring() that only modifies the monitoring
flag.
change calling of the recovered/takeip/releaseip event scripts to use 
these enable/disable functions instead of stopping/starting monitoring.

when we disable monitoring we want all events to still be running
in particular the events to monitor for dead nodes  and we only want to 
supress running the monitor event scripts

(This used to be ctdb commit a006dcc4f75aba950dd701ad7d1a84e89df285e8)
2007-11-30 10:09:54 +11:00
Ronnie Sahlberg
0eb6c04dc1 get rid of the control to set the monitoring mode.
monitoring should always be enabled
(though a node may want to temporarily disable running the "monitor"
event scripts but can do so internally without the need for this 
control)

(This used to be ctdb commit e3a33618026823e6af845fd8513cddb08e6b5584)
2007-11-30 10:00:04 +11:00
Ronnie Sahlberg
9e73dc87cc Add a --node-ip argument so that one can specify which ip address a
specific instance of ctdbd should bind to. This helps when running a
"virtual" cluster on a single machine where all instcances bind to 
different alias interfaces.

If --node-ip is specified, then we will only try to bind to this ip 
address only. Othervise we fall back to the original method trying the
ip addresses in /etc/ctdb/nodes one by one until we find one we can bind 
to.

No variable in /etc/sysconfig/ctdb added since this parameter only makes 
sense in a virtual test/debug cluster.

(This used to be ctdb commit d96cb02c2c24f9eabbc53d3d38e90dea49cff3e0)
2007-11-26 10:52:55 +11:00
Andrew Tridgell
bde886988b prevent a deadly embrace between smbd and ctdbd by moving the calling
of the startup event scripts after the point where recovery has
started and the node is in normal operation

This makes the 'startup' script just a special type of the 'monitor'
script which is called first

(This used to be ctdb commit 7424c30a5fd04aea0137c466b4318c3f185280d8)
2007-11-12 10:53:11 +11:00
Ronnie Sahlberg
4a97876fb7 when we are shutting down, we should first shut down the recovery daemon
(This used to be ctdb commit 39ade6b329adcd3234124d6a8daaa6181abf739b)
2007-10-22 12:34:08 +10:00
Ronnie Sahlberg
d1ba047b7f add a new transport method so that when a node is marked as dead, we
shut down and restart the transport

othervise, if we use the tcp transport the tcp connection might try to 
retransmit the queued data during the time the node is unavailable.
this together with the exponential backoff for tcp means that the tcp 
connection quickly reaches the maximum backoff rto which is often 60 or 
120 seconds.   this would mean that it could take up to 60/120 seconds 
before the tcp layer detects that the connection is dead and it has to 
be reestablished.

(This used to be ctdb commit 0256db470879ce556b0f00070f7ebeaf37e529ab)
2007-10-19 08:58:30 +10:00
Ronnie Sahlberg
056aac6e0c add a new tunable : DeterministicIPs that makes the allocation of
public addresses to nodes deterministic.

Activate it by adding CTDB_SET_DeterministicIPs=1 in /etc/sysconfig/ctdb

When this is set,    the first entry in /etc/ctdb/public_addresses will 
always be hosted by node 0, when that node is available, the second 
entry by node1 and so on.

This tunable allows the allocation of addresses to become very 
unbalanced and is only for debugging/testing use.
Beware, this feature requires that /etc/ctdb/public_addresses are 
identical on all the nodes in the cluster.

(This used to be ctdb commit f0ca221f235731542090d8a6c86f2b7cd2ce2f96)
2007-10-16 12:15:02 +10:00
Andrew Tridgell
0e855c0772 merge from ronnie
(This used to be ctdb commit d18712caba11855010be52f90bac656683076676)
2007-10-15 14:17:49 +10:00
Andrew Tridgell
174879621e add config option for disabling bans
(This used to be ctdb commit 153b911f7f957d4c564b04f5aa878033a02da9e4)
2007-10-15 13:22:58 +10:00
Ronnie Sahlberg
bdd67bba1e add a --single-public-ip argument to ctdbd to specify the ip address
used in single public ip address mode.
when using this argument, --public-interface must also be used.

add a vnn structure to the ctdb context to describe the single public ip 
address


update the killtcp control in the daemon that if a socketpair that is to 
be killed does not match a normal public address it checks if the 
destination address maches the single public ip address and if so uses 
that vnn structure from the ctdb context


this allows killtcp to kill also connections to the single public ip 
instead of only normal public addresses

(This used to be ctdb commit 5661ba17b91f62821dec1c76056c78b99752a90b)
2007-10-10 09:42:32 +10:00
Ronnie Sahlberg
80cd82f8e4 add a control to send gratious arps from the ctdb daemon
(This used to be ctdb commit 563819dd1acb344f95aabb4bad990b36f7ea4520)
2007-10-09 11:56:09 +10:00
Ronnie Sahlberg
72379ee3eb change async.private to async.private_data since private is a reserved
work in c++

(This used to be ctdb commit 79eb28f6cd5dcc30b04966d202a050eaf98a2552)
2007-09-26 14:25:32 +10:00
Andrew Tridgell
80100c3573 run monitoring more quickly when unhealthy and at startup
(This used to be ctdb commit ff1c205928e3ef5bcc6bf4e4b2122a19fa38d8f4)
2007-09-24 10:12:18 +10:00
Andrew Tridgell
c60988325d added support for persistent databases in ctdbd
(This used to be ctdb commit 3115090a0d882beca9d70761130b74bb0821f201)
2007-09-21 12:24:02 +10:00
Andrew Tridgell
3c0f61cb92 we don't need the is_loopback logic in ctdb any more
(This used to be ctdb commit 4ecf29ade0099c7180932288191de9840c8d90a9)
2007-09-13 10:45:06 +10:00
Andrew Tridgell
f3ae1cdb02 - use struct sockaddr_in more consistently instead of string addresses
- allow for public_address lines with a defaulting interface

(This used to be ctdb commit 29cb760f76e639a0f2ce1d553645a9dc26ee09e5)
2007-09-10 14:27:29 +10:00
Ronnie Sahlberg
4ac749bfa4 change the signature to ctdb_sys_have_ip() to also return:
a bool that specifies whether the ip was held by a loopback adaptor or 
not
 the name of the interface where the ip was held

when we release an ip address from an interface, move the ip address 
over to the loopback interface

when we release an ip address  after we have move it onto loopback, 
use 60.nfs to kill off the server side (the local part) of the tcp 
connection   so that the tcp connections dont survive a 
failover/failback

61.nfstickle,   since we kill hte tcp connections when we release an ip 
address   we no longer need to restart the nfs service in 61.nfstickle

update ctdb_takeover to use the new signature for ctdb_sys_have_ip

when we add a tcp connection to kill in ctdb_killtcp_add_connection()
check if either the srouce or destination address match a known public 
address

(This used to be ctdb commit f9fd2a4719c50f6b8e01d0a1b3a74b76b52ecaf3)
2007-09-10 07:20:44 +10:00
Ronnie Sahlberg
77ec4d5248 allow different nodes in the cluster to use different public_addresses
files
so that we can partition the cluster into different subsets of nodes 
which each serve a different subset of the public addresses

(This used to be ctdb commit 889e0fe69e4c88c6166282b12843b8d9727552d6)
2007-09-04 23:15:23 +10:00
Ronnie Sahlberg
8f819c6a0e get rid of the ctdb_vnn_list structure and just use a single list of
ctdb_vnn

(This used to be ctdb commit 7b9fd06321af17043136b1420b57284450ae7ba5)
2007-09-04 18:20:29 +10:00
Ronnie Sahlberg
cf45c5096c we cant have takeover_ctx hanging off ctdb since it is freed/recreated
everytime we release an ip.
this context is used to hold all resources needed when sending out 
gratious arps and tcp tickles during ip takeover.

we hang it off the vnn structure that manages that particular ip address 
instead   so that we can have multiple ones going in parallell

this bug (or the same bug in different shape) has probably been in ctdb 
for very very long   but is likely to be hard to trigger

(This used to be ctdb commit c58db1cadaba253b2659573673b28c235ef7db76)
2007-09-04 14:36:52 +10:00
Ronnie Sahlberg
d66d9cdd22 change debug output from vnn to pnn
change ctdb_daemon_send_message to take pnn as parameter isntead of vnn

(This used to be ctdb commit e352a2bbf9bb9a0b2c4f8329e8a529cf02414097)
2007-09-04 10:45:41 +10:00
Ronnie Sahlberg
0c91261340 change ctdb_send_message to take pnn as parameter instead of vnn
(This used to be ctdb commit 93dd4fba2e0fa6a011d15406652836785a974880)
2007-09-04 10:42:20 +10:00