1
0
mirror of https://github.com/samba-team/samba.git synced 2025-01-06 13:18:07 +03:00
Commit Graph

258 Commits

Author SHA1 Message Date
Martin Schwenke
6808b0aa6a ctdb-daemon: Drop interface monitoring
This is done by 10.interace where the monitor event fails when there
is a missing interface.  The in-daemon interface checking adds no
value.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-05-10 03:22:14 +02:00
Martin Schwenke
81e526965c ctdb-daemon: New control CTDB_CONTROL_GET_NODES_FILE
This is like CTDB_CONTROL_GET_NODEMAP but it loads from the nodes file
instead of the daemon.

Also new client function ctdb_ctrl_getnodesfile()

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-03-23 12:23:12 +01:00
Martin Schwenke
a5be2c245d ctdb-daemon: Store node addresses as ctdb_sock_addr rather than strings
Every time a nodemap is contructed the node IP addresses all need to
be parsed.  This isn't very productive use of CPU.

Instead, parse each string once when the nodes file is loaded.  This
results in much simpler code.

This code also removes the use of ctdb_address.  Duplicating the port
is pointless without an abstraction layer around ctdb_address.  If
CTDB gets an incompatible transport in the future then add an
abstraction layer.

Note that the infiniband code is not updated.  Compilation of the
infiniband code is already broken.  Fixing it will be a separate,
properly tested effort.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
2015-03-23 12:23:12 +01:00
Martin Schwenke
876529054a ctdb-daemon: Set node PNN in one place
This is currently set in 2 places.  One of them makes the node loading
code difficult to refactor.  Also, when the surrounding code in either
place is touched then it might get broken.

This only needs to be done once at startup, not on every reload.  So
do it once in a very obvious way, sacrificing a few CPU cycles for
some added clarity.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-03-23 12:23:12 +01:00
Martin Schwenke
db6385afe9 ctdb-daemon: Move VNN map initialisation out of node loading
Each node reload unnecessarily and incorrectly resets the VNN map,
causing a potentially unnecessary recovery.  When nodes are reloaded
any newly deleted nodes should already be disconnected and any newly
added nodes should also be disconnected.  This means that reloading
the nodes file should not cause a change in the VNN map.

The current implementation also leaks memory every time the nodes are
reloaded.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-03-23 12:23:12 +01:00
Amitay Isaacs
a54db687ac ctdb: Rename CTDB_VERSION to CTDB_PROTOCOL
CTDB_VERSION really is the ctdb protocol version.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
2014-10-28 05:42:05 +01:00
Martin Schwenke
2974554356 ctdb-logging: Replace logd code with a basic syslog(3) implementation
It is much simpler for most cases to have a syslog backend that
doesn't need a separate CTDB-specific logging daemon.  This loses the
lossy, non-blocking mode provided by logd.  However, a corresponding
feature with a completely different implemention (not requiring an
extra daemon) will be re-added into the syslog backend.  In an ideal
world the new implementation would be added first but unfortunately
that is hard to do because the logd code is hooked in at more than one
place.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2014-10-28 05:42:04 +01:00
Amitay Isaacs
f5f11e1a05 ctdb-daemon: Decrement pending calls statistics when calls are deferred
Deferred calls should not be treated as pending calls since they are
re-processed from the beginning.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2014-09-12 08:46:14 +02:00
Amitay Isaacs
d410b20601 ctdb-daemon: Make sure ctdb runs with real-time priority
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2014-09-12 08:46:14 +02:00
Amitay Isaacs
e6127a9ece ctdb-daemon: Increment pending calls statistics correctly
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2014-09-11 09:04:11 +02:00
Martin Schwenke
acf26089f1 ctdb-util: Rename db_wrap to tdb_wrap and make it a build subsystem
This makes it consistent with Samba, to ease transition.

Update unit test code to link to with tdb_wrap instead of including
db_wrap.c.

There are some potential whitespace fixes in this commit that have
been ignored.  CTDB's lib/tdb_wrap will be deleted after the
transition to Samba's lib/tdb_wrap, so there's no point polishing it
too much.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2014-09-10 01:36:15 +02:00
Martin Schwenke
a81dccf7ad ctdb-daemon: Move some inline declarations to header file
To avoid warnings when using --enable-developer, which uses
-Wmissing-prototypes.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2014-09-10 01:36:14 +02:00
Amitay Isaacs
deb7bb89b3 ctdb-daemon: Remove duplicate code with refactored function
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2014-09-05 07:05:10 +02:00
Martin Schwenke
1677dd499c ctdb-daemon: Remove ctdbd_pid global variable
This duplicates ctdb->ctdbd_pid.

Thanks to Sumit Bose <sbose@redhat.com> for the suggestion.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2014-07-05 06:51:13 +02:00
Martin Schwenke
e454e5ac9c ctdb-daemon: Check PID in ctdb_remove_pidfile(), not unreliable flag
If something unexpectedly uses fork() then an exiting child will
remove the PID file while the main daemon is still running.  The real
test is whether the current process has the PID of the main CTDB
daemon, which is the process that calls setsid().

This could be done using getpgrp() instead.  At the moment the
eventscript handler harmlessly calls setpgid() - harmless because the
atexit() handlers are cleared upon exec().  However, it is possible
that process groups will be used more in future so it is probably
better to rely on the session ID.

Thanks to Sumit Bose <sbose@redhat.com> for the idea.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2014-07-05 06:51:13 +02:00
Martin Schwenke
c7b3be97d9 ctdb-daemon: Exit if setting the session ID fails
Currently ctdbd_wrapper depends on the session ID.  Very soon PID file
removal will too.  :-)

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2014-07-05 06:51:13 +02:00
Amitay Isaacs
8c8ef5640e ctdb-daemon: Rename ctdb_lockdown_memory to lockdown_memory
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2014-06-12 05:40:10 +02:00
Amitay Isaacs
22f71579a4 ctdb-daemon: Instead of passing ctdb context, pass valgrinding boolean
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2014-06-12 05:40:10 +02:00
Amitay Isaacs
d09f8134c1 ctdb-daemon: Rename block_signal to ignore_signal
This function does not block signals, but ignores them.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2014-06-12 05:40:10 +02:00
Amitay Isaacs
3a9d375328 ctdb-common: Drop ctdb prefix from utility functions independent of ctdb
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2014-06-12 05:40:10 +02:00
Amitay Isaacs
5b580e5d65 ctdb-common: Changing scheduler policy does not require ctdb context
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2014-06-12 05:40:10 +02:00
Martin Schwenke
e6304d1e1a ctdb/daemon: Untangle serialisation of 1st recovery -> startup -> monitor
At the moment ctdb_check_healthy() is overloaded to wait until the
first recovery is complete, handle the "startup" event and also
actually handle monitoring.  This is untidy and hard to follow.

Instead, have the daemon explicitly wait for 1st recovery after the
"setup" event.  When first recovery is complete, schedule a function
to handle the "startup" event.  When the "startup" event succeeds then
explicitly enable monitoring.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2014-01-17 17:59:41 +11:00
Amitay Isaacs
7aa20ccb5c ctdb-daemon: No need to call event scripts with CTDB_CALLED_BY_USER
This was added to support external monitoring using CTDB event scripts.
However, it was never used.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2014-01-16 11:41:12 +11:00
Amitay Isaacs
6d1b74f052 ctdb-server: Coverity fixes
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
2013-11-19 17:13:03 +01:00
Martin Schwenke
e782b61732 ctdbd: Pass the public address file location in ctdb context
No need to pass it as an extra argument to ctdb_start_daemon.

Also ensure options.public_address_list gets a nice static default.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit a3d63a9db89d08bb284b3b3a6db773422f21b477)
2013-10-22 15:37:54 +11:00
Amitay Isaacs
be33efa3e4 ctdbd: Remove transaction code related to TRANS2 commits
This removes data types and structure elements related to TRANS2
persistent transaction code.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 22a253b7ccf1ff854cddf0b67969dc84d7d6a654)
2013-10-04 15:20:25 +10:00
Martin Schwenke
88ba32b787 ctdbd: Sleep at exit to allow time for log messages to flush
Register print_exit_message() earlier so that it covers most of the
early exits.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 90d792cf28d6a823141e4c417b6978f02a9cf596)
2013-07-19 15:40:59 +10:00
Martin Schwenke
84f5528d9b ctdbd: Exit if something is already listening on CTDB socket
Don't blindly remove the socket.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 3dd5b925dcf0e9a5b877638e471c5ecf36b46c58)
2013-07-19 15:40:43 +10:00
Sumit Bose
157f1cfefd Fixes for various issues found by Coverity
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 05bfdbbd0d4abdfbcf28e3930086723508b35952)
2013-07-11 15:16:55 +10:00
Martin Schwenke
9c8cc863f7 ctdbd: Use ctdb_die() on "setup" event failure
This is slightly easier to read because it all fits on 1 line.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 035bf3eecf99337c84d4ad16cdbf297b1fa037db)
2013-07-05 15:52:33 +10:00
Martin Schwenke
c327c91490 ctdbd: Avoid a core dump when "init" event fails
The "init" event only really fails in the scripts, which should log
something useful on failure.  Therefore, a core dump isn't terribly
useful and sometimes attracts unwanted attention.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 3af2d833b63af9931792106db71797f3692669a8)
2013-07-05 15:52:33 +10:00
Martin Schwenke
44e885e98e ctdbd: Fix panic on overlapping shutdowns
The runstate can't be set to SHUTDOWN twice, so the current naive code
causes a panic on the 2nd shutdown.  This regression was introduced in
commit 8076773a9924dcf8aff16f7d96b2b9ac383ecc28.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit f1b7ca8dc3f34a59c7b3e55748f974ac9ed8f458)
2013-06-22 15:51:16 +10:00
Martin Schwenke
6a52a87028 ctdbd: Refactor shutdown sequence
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit b32fd04bfbf33062d45365b37a7247e272a76ceb)
2013-06-22 15:51:02 +10:00
Martin Schwenke
26d0746b5d ctdbd: "init" event should run earlier in daemon initialisation
It should run before:

* the transport is started;
* databases are attached; and
* processing configuration files (e.g. nodes, public_addresses).

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 0a0c8543f167e11b75a622513367b083e42cbd3f)
2013-06-20 13:01:09 +10:00
Mathieu Parent
d82b9ae410 build: Fix tdb.h path to enable building with system TDB library
(This used to be ctdb commit f8bf99de3a5f56be67aaa67ed836458b1cf73e86)
2013-06-14 16:45:27 +10:00
Martin Schwenke
94b0e8dfeb ctdbd: When the "setup" event fails log an error and exit, don't abort
The "setup" event can fail when one of the eventscripts fails to run
its "setup" event.  If this occurs then the eventscript should log an
error.  The stack trace and core file generated when we abort provides
no useful information.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit c50eca6fbf49a6c7bf50905334704f8d2d3237d7)
2013-05-24 14:08:07 +10:00
Martin Schwenke
6d9667f01c ctdbd: Add new runstate CTDB_RUNSTATE_FIRST_RECOVERY
This adds more serialisation to the startup, ensuring that the
"startup" event runs after everything to do with the first recovery
(including the "recovered" event).

Given that it now takes longer to get to the "startup" state, the
initscript needs to wait until ctdbd gets to "first_recovery".

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit ed6814ff0a59ddbb1c1b3128b505380f60d7aeb7)
2013-05-24 14:08:07 +10:00
Martin Schwenke
147f6bb4b8 ctdbd: Start logging process earlier
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit f43fe3a560d5915c1a9893256f4e7bfe3d7e290a)
2013-05-24 14:08:07 +10:00
Martin Schwenke
0e678a73b8 ctdbd: Only start recovery daemon and timed events after setup event
This deconstructs ctdb_start_transport(), which did much more than
starting the transport.

This removes a very unlikely race and adds some clarity.  The setup
event is supposed to set the tunables before the first recovery.
However, there was nothing stopping the first recovery from starting
before the setup event had completed.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit c31feb27dcdb748b5333321c85fe54852dfa1bcf)
2013-05-24 14:08:06 +10:00
Martin Schwenke
63577c96db ctdbd: Replace ctdb->done_startup with ctdb->runstate
This allows states, including startup and shutdown states, to be
clearly tracked.  This doesn't include regular runtime "states", which
are handled by node flags.

Introduce new functions ctdb_set_runstate(), runstate_to_string() and
runstate_from_string().

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 8076773a9924dcf8aff16f7d96b2b9ac383ecc28)
2013-05-24 14:08:06 +10:00
Amitay Isaacs
7ee9e22a09 ctdbd: Print version string in the daemon startup
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 9d4524d13cbba21bfaf61bd35667984359b379b3)
2013-05-23 16:18:23 +10:00
Martin Schwenke
3769368a99 ctdbd: Log CTDB startup before creating the PID file
Otherwise the messages are in a stupid order...  :-)

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reported-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit cd87ba85fc6c375758c7d3dfa8dbd4d8a02074b0)
2013-05-06 15:40:30 +10:00
Michael Adam
666985bc3a ctdb_daemon: use CTDB_REC_RO_FLAGS where appropriate
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-By: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit c7eab97c7a939710b73aae2d75b404b235a998f5)
2013-04-24 18:49:03 +10:00
Amitay Isaacs
016522fe29 ctdbd: Set num_clients statistic from ctdb->num_clients
This fixes the problem of "ctdb statisticsreset" clearing the number of
clients even when there are active clients.

Values returned in statistics for frozen, recovering, memory_used are based on
the current state of CTDB and are not maintained as statistics.  This should
include num_clients as well.

Currently ctdb->num_clients is unused. So use that to track the number of
clients and fill in statistics field only when requested.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit dc4ca816630ed44b419108da53421331243fb8c7)
2013-04-22 14:00:51 +10:00
Martin Schwenke
3471807875 ctdbd: Log PID file creation and removal at NOTICE level
Unexpected removal of this file can have serious consequences, so it
is best if this is logged at the default level.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit bfed6a8d1771db3401d12b819204736c33acb312)
2013-04-22 13:58:36 +10:00
Martin Schwenke
dcf1ac34ab ctdbd: Add --pidfile option
Default is not to create a pid file.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit 996e74d3db0c50f91b320af8ab7c43ea6b1136af)
2013-04-18 13:21:59 +10:00
Amitay Isaacs
5d7efb4cf1 ctdbd: Add an index db for message list for faster searches
When CTDB is busy with lots of smbd, CTDB was spending too much time in
daemon_check_srvids() which searches a list of srvids in the registered
message handlers.  Using a hash based index significantly improves the
performance of search in a linked list.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 3e09f25d419635f6dd679b48fa65370f7860be7d)
2013-03-06 15:32:33 +11:00
Martin Schwenke
a0c88ec816 ctdbd: Message logged at exit should be different for different processes
Some subprocesses print "CTDB daemon shutting down" when they exit and
this can be confusing.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit f1ffe1112b7e342d7f1228ca816a8e5918f893cf)
2013-02-05 16:03:41 +11:00
Amitay Isaacs
9eeb94c5c0 daemon: Update the comment and remove redundant check in ctdb_start_transport()
ctdb_start_transport() is called just before "setup" event, when CTDB
is ready to process the requests. "startup" event happens much later
after a successful recovery.

Transport method ctdb->methods is successfully initialized before
ctdb_start_transport() is called.  No need to check again.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 9a70a4d23d00f6cb996c061ba3dfb7c47b4f6a4f)
2013-01-09 13:18:33 +11:00
Martin Schwenke
6fbd3ea2c2 ctdbd: Initialise the node flags in just one place
Currently flags are initialised in 2 places.  One of them is in
ctdb_tcp_listen_automatic(), which just seems wrong.  This makes the
code easier to follow by just doing it in ctdb_start_daemon().

This means that the flags are now initialised later than previously.
However, it is still done before the transport is started and before
clients can connect.

In future it might make sense to do a similar thing with setting the
PNN.  However, the current optimisation is reasonably obvious...

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 2bbee8ac23ad5b7adf7122d8c91d5f0d54582507)
2013-01-07 10:35:39 +11:00
Amitay Isaacs
c4236ec8fb ctdbd: Return explicit boolean values for function returning bool
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 3de2830ae68241ee95bcc14dc1bb896ff18d86ce)
2012-07-16 12:12:05 +10:00
Martin Schwenke
55be3c1239 Reimplement logging of long running events
Reimplement 5aba53e6adcfcd7edbdac9e30aa5fcba176aca00 using tevent
trace points.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 98e1b46adba11b9549b5c5976e1f561fe732fa6e)
2012-06-12 16:10:01 +10:00
Amitay Isaacs
7631830152 server: Replace BOOL datatype with bool, True/False with true/false
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 6e5cbe8fff71985e5a2fc16b7e9f2b868011ff5d)
2012-05-28 11:22:25 +10:00
Ronnie Sahlberg
a57eba2bb4 Track all child process so we never send a signal to an unrelated process (our child died and kernel wrapped the pid-space and reused the pid for a different process
Wrap all creation of child processes inside ctdb_fork() which is used to track all processes we have spawned.
Capture SIGCHLD to track also which child processes have terminated.

Wrap kill() inside ctdb_kill() and make sure that we never send a !0 signal to a child process pid that has already terminated (and might have been replaced with a

(This used to be ctdb commit f73a4b1495830bcdd094a93732a89dd53b3c2f78)
2012-05-03 14:03:26 +10:00
Ronnie Sahlberg
a367fa6138 RELOADIPS: simplify the reloadips code a bit
and also update the "read public address file" to not check if the address exists already locally when we read if from the child process, to stop it
from spamming the logs with "We already host ..."
messages

(This used to be ctdb commit 334ea830f1bf33419f4a1e78f23afd41a852d0f4)
2012-05-01 15:34:26 +10:00
Ronnie Sahlberg
7a1aa560e7 Add new control to reload the public ip address file on a node
Also add a method to use the recovery master/daemon to reload the public ips on all nodes in the cluster.
Reloading the public ips on all node sin the cluster is only suported if all nodes in the cluster are available and healthy.

(This used to be ctdb commit 05603e914f8c12618d7e06943c0f7df207f645b0)
2012-05-01 10:48:08 +10:00
Amitay Isaacs
4392591555 Remove explicit include of lib/tevent/tevent.h.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 0681014ca5ed2a9b56f63fdace7f894beccf8a9a)
2012-04-13 17:28:14 +10:00
Ronnie Sahlberg
c051f67d67 FETCH COLLAPSE : Change the fetch-lock collapse to collapse ALL fetches, including fetch-locks into a single command in flight per record. Also add a tunable to enable/disable this optimization for hot records
(This used to be ctdb commit eafd7bbaaa5931546a96c8beae3cf9a39a49c925)
2012-03-20 11:39:00 +11:00
Volker Lendecke
cb44ebbc95 Make CTDB_CURRENT_NODE work with CTDB_REQ_MESSAGE
Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit be8a153346ca7d40f09a6d03aad655aaa5c4a903)
2012-02-13 15:50:56 +01:00
Ronnie Sahlberg
73f8be16c6 ReadOnly: add per-database statistics to view how much delegations/revokes we have
(This used to be ctdb commit 751ed46197661eb841042ab6a02855a51dd0b17c)
2012-02-08 15:29:27 +11:00
Ronnie Sahlberg
1eafa68f0f STATISTICS: add total counts for number of delegations and number of revokes
Everytime we give a delegation to another node we count this as one delegation.
If the same record is delegated to several nodes we count one for each node.

Everytime a record has all its delegations revoked we count this as one revoke.

(This used to be ctdb commit b098bcf8007be63889aaed640a951b0eeaa9d191)
2012-02-08 13:42:30 +11:00
Ronnie Sahlberg
65e1e1d3ef Niceify the readonlyrecord API. Dont force clients to be exposed to the featch_with_header function
We dont strictly need to force clients to use CTDB_FETCH_WITH_HEADER instead of CTDB_FETCH when they ask for readonly records.
Have ctdbd internally remap this internally to FETCH_WITH_HEADER and map the reply back to CTDB_FETCH_FUNC or CTDB_FETCH_WITH_HEADER_FUNC based on what the client initially asked for.

This removes the need for the client to know about the CTDB_FETCH_WITH_HEADER_FUNC function and simplifies the client code.
Clients that do not care what the header after the request is can just continue using the old CTDB_FETCH_FUNC call and ctdbd will do all the difficult stuff.

(This used to be ctdb commit 444a7bac4e9a854b06c1ad4cb36c2b58a72001fa)
2012-01-31 17:20:35 +11:00
Mathieu Parent
bb3d6698e9 Move platform-specific code to common/system_*
This removes #ifdef AIX and ease the addition of new platforms.

(This used to be ctdb commit 2fd1067a075fe0e4b2a36d4ea18af139d03f17bf)
2011-12-06 11:57:11 +11:00
Mathieu Parent
4ce585cd0e Remove zero-length gnu_printf format string in ctdb_daemon.c (gcc warning)
server/ctdbd.c: In function ‘main’:
server/ctdb_daemon.c:943:7: warning: zero-length gnu_printf format string [-Wformat-zero-length]

(This used to be ctdb commit e6d1dd3ec4a078e5f32bc52a4a9e4b7d9a2e2d16)
2011-12-06 11:56:18 +11:00
Volker Lendecke
5a1da0ac55 Add CTDB_CONTROL_CHECK_SRVID
(This used to be ctdb commit ad64ef2c40a2a12b37dbf39142e95c6781c2fc3b)
2011-11-30 09:02:26 +11:00
Ronnie Sahlberg
0e79b2d1e8 Record Fetch Collapse: Collapse multiple fetch request into one single request.
When multiple clients fetch the same record concurrently, send only one single
fetch across the network and deferr all other fetches locally.
This improves performance for hot records and reduces cpu load on ctdb.

(This used to be ctdb commit 82d6946ad8b3348e8b9d3d971f24925ade02d1be)
2011-11-08 16:08:28 +11:00
Ronnie Sahlberg
9729d3e339 ReadOnly: Check the readonly flag instead of whether the tdb pointer is NULL or not
(This used to be ctdb commit 01314c2cb3a480917d6a632b83c39f0a48bba0e7)
2011-08-23 10:41:52 +10:00
Ronnie Sahlberg
de7c3de0a2 ReadOnly: clear out the tracking record once a revoke is completed
(This used to be ctdb commit 7af255551f058d1f6bfdd38ca603e7a19d1bb7ba)
2011-08-23 10:35:56 +10:00
Ronnie Sahlberg
38e8964910 ReadOnly: Add handlign of readonly requests readwrite requests, delegations and revoking of delegation to the processing loop for CALL requests coming in from a local client via domain socket
(This used to be ctdb commit e7cbf5b5d03cc26a73a92066a651f8eab73624b8)
2011-08-23 10:32:42 +10:00
Michael Adam
4cca8876e2 daemon: fill ctdb->ctdbd_pid early
(This used to be ctdb commit 3da1e2e30bf34622f08e6ecd5b8fe55684e5007a)
2011-03-14 13:35:50 +01:00
Michael Adam
ee44c23cd5 daemon: correctly end a running trans3_commit if the client disconnects.
(This used to be ctdb commit 9e0898db6df52d9bc799dd87bfea8c72d5f70ba0)
2011-02-24 10:35:25 +01:00
Ronnie Sahlberg
d903473d82 We can not always rely on the recovery daemon pinging us in a timely manner
so we need a "ticker" in the main ctdbd daemon too to ensure we get at least one event to process every second.

This will improve the accuracy of "Time jumped" messages and remove false positives when the recovery daemon is "slow".

(This used to be ctdb commit 70154e5e19e219de086b2995d41e8f6e069ee20d)
2011-01-14 09:47:44 +11:00
Ronnie Sahlberg
ea0df6d882 Revert scheduling back to use real-time processes
Revert this patch:
commit 482c302d46e2162d0cf552f8456bc49573ae729d

We may need to use real-time processes for the main daemon and the recovery daemon to handle the cases where systems come under very high loads.

(This used to be ctdb commit 08bef9dcab6e4da15fc783f8624e5ed09aa060b5)
2011-01-11 07:40:35 +11:00
Ronnie Sahlberg
83e68b62dd delay loading the public ip address file until after we have started the transport and discovered ouw own pnn number
(This used to be ctdb commit 1b57fc866fc836b5dbd3ef7b646e5a0f4280e81e)
2010-11-10 14:55:24 +11:00
Ronnie Sahlberg
5f76f3c0e2 Add a new tunable : DisableIPFailover that when set to non 0
will stopp any ip reallocations at all from happening.

(This used to be ctdb commit d8d37493478a26c5f1809a5f3df89ffd6e149281)
2010-11-10 14:55:24 +11:00
Ronnie Sahlberg
db8cb31d8b during shutdown there is a window after we have stopped TCP and disconnected from all other nodes but before we have stopped all processing.
During this window we may still hit asynchronous events that will fail because we can not send/receive packets from other nodes.

These messages are logged as ... Transport is DOWN. To help indicate that they are benign messages related to the process of shutting down.

These messages spam the syslog during normal shutdown, so this patch will drop the loglevel of these messages to DEBUG, so that they will not appear in or spam the syslog.

(This used to be ctdb commit 8275d265d2ae19b765e30ecf18f6b6319b6e6453)
2010-10-28 13:41:08 +11:00
Ronnie Sahlberg
5ef29f9f25 Update latency countes to show min/max and average
(This used to be ctdb commit 1919e949af4641ffe919123e44b02fb87c13ab9f)
2010-10-11 15:12:24 +11:00
Ronnie Sahlberg
9f66a93f12 Add rolling statistics that are collected across 10 second intervals.
Add a new command "ctdb stats [num]" that prints the [num] most recent statistics intervals collected.

(This used to be ctdb commit e6e16fcd5a45ebd3739a8160c8fb5f44494edb9e)
2010-09-29 12:14:45 +10:00
Ronnie Sahlberg
39c367a68f Create macros to update the statistics counters and use these macros
everywhere instead of manipulating the coutenrs directly.

(This used to be ctdb commit 2e648df890e5713bc575965d87937827b068d0d7)
2010-09-29 12:14:24 +10:00
Ronnie Sahlberg
c6e20a06c7 set up a handler to catch and log debug messages from the tevent layer
(This used to be ctdb commit fdb4c02f595fa207310a9a48da3fefd653fa9e4b)
2010-09-28 08:30:26 +10:00
Ronnie Sahlberg
ac335e3e5d run the "init" event before we freeze the databases
so that we can read from databases during this event

(This used to be ctdb commit 6c93bf5a1219617bfb39b093aee3200c74c2c61a)
2010-08-25 08:35:24 +10:00
Ronnie Sahlberg
e8ffb0d8a4 We use eventloop nesting in a couple of places, notably the sync
parts of the recovery daemon.

Initialize all event contexts to allow nesting

(This used to be ctdb commit 5bf6bd5e7f33aabbeb7b9707716ef99cf471e590)
2010-08-18 10:11:59 +10:00
Rusty Russell
f93440c4b7 event: Update events to latest Samba version 0.9.8
In Samba this is now called "tevent", and while we use the backwards
compatibility wrappers they don't offer EVENT_FD_AUTOCLOSE: that is now
a separate tevent_fd_set_auto_close() function.

This is based on Samba version 7f29f817fa.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 85e5e760cc91eb3157d3a88996ce474491646726)
2010-08-18 09:16:31 +09:30
Rusty Russell
7061ceffd8 Report client for queue errors.
We've been seeing "Invalid packet of length 0" errors, but we don't know
what is sending them.  Add a name for each queue, and print nread.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit e6cf0e8f14f4263fbd8b995418909199924827e9)
2010-07-01 23:08:49 +10:00
Ronnie Sahlberg
fa618aa66a add additional logging when tdb_chainunlock() fails
so we can see where it was called from when it fails

(This used to be ctdb commit 0c091b3db6bdefd371787d87bc749593ea8e3c76)
2010-06-09 14:37:16 +10:00
Rusty Russell
d5f6026a22 libctdb: reorganize headers: remove ctdb.h, add ctdb_client.h and ctdb_protocol.h
ctdb_client.h is the existing internal client interface (which was mainly
in ctdb.h), and ctdb_protocol.h is the information needed for the wire
protocol only.

ctdb.h will be the new, shiny, libctdb API.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

(This used to be ctdb commit 4bba6b8cd47b352f98d41f9f06258d5ac3c9adef)
2010-05-20 15:18:30 +09:30
Ronnie Sahlberg
eeeb89e3e2 Reduce the loglevel for two log messages for Registering and Deregistering server ids.
BZ61890

(This used to be ctdb commit 944434eb6420774e42e58984c6ddaa326a6853bd)
2010-03-30 11:57:25 +11:00
Stefan Metzmacher
3419e9c4dd server: add "setup" event
This is needed because the "init" event can't use 'ctdb' commands.

metze

(This used to be ctdb commit 1493436b6b24eb05a23b7a339071ad85f70de8f4)
2010-02-23 10:38:49 +01:00
Andrew Tridgell
f23b82b58c ctdb: when we fill the client packet queue we need to drop the client
We can't just drop packets to the list, as those packets could be part
of the core protocol the client is using. This happens (for example)
when Samba is doing a traverse. If we drop a traverse packet then
Samba hangs indefinately. We are better off dropping the ctdb socket
to Samba.

(This used to be ctdb commit a7a86dafa4d88a6bbc6a71b77ed79a178fd802a6)
2010-02-04 15:37:59 +11:00
Stefan Metzmacher
fd06167caa server: add "init" event
This is needed because the "startup" event runs after the initial recovery,
but we need to do some actions before the initial recovery.

metze

(This used to be ctdb commit e953808449c102258abb6cba6f4abf486dda3b82)
2010-01-20 09:44:36 +01:00
Ronnie Sahlberg
4c722fe34c fix a conflict in the merge from rusty
Merge commit 'rusty/ctdb-no-setsched'

Conflicts:

	server/ctdb_vacuum.c

(This used to be ctdb commit b4365045797f520a7914afdb69ebd1a8dacfa0d9)
2009-12-17 08:18:04 +11:00
Rusty Russell
af2613e16f ctdb: use mlockall, cautiously
We don't want ctdb stalling due to paging; this can be far worse than
scheduling delays.  But if we simply do mlockall(MCL_FUTURE), it
increases the risk that mmap (ie. tdb open) or malloc will fail,
causing us to abort.

This patch is a compromise: we mlock all current pages (including
10k of future stack for expansion) and then relock when a client
asks us to open a TDB.  We warn, but don't exit, if it fails.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 82f778e85440bc713d3f87c08ddc955d3cfce926)
2009-12-16 20:57:20 +10:30
Rusty Russell
c488ba440a Remove RT priority, use niceness.
1) It's buggy.  Code needs to be carefully written (ie. no busy
   loops) to handle running with it, and we fork and run scripts.[1]

2) It makes debugging harder.  If ctdbd loops (as has happened recently)
   it can be extremely hard to get in and see what's happening.  We've already
   seen the valgrind hacks.

3) We have seen recent scheduler problems.  Perhaps they are unrelated,
   but removing this very unusual setup is unlikely to hurt.

4) It doesn't make anything faster.  Under all but the most perverse of
   circumstances, 99% of the cpu gives the same performance as 100%, and
   we will always preempt normal processes anyway.

[1] I made this worse in 0fafdcb8d353 "eventscript: fork() a child for
    each script" by removing the switch_from_server_to_client() which
    restored it, but even that was only for monitor scripts.  Others were
    run with RT priority.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 482c302d46e2162d0cf552f8456bc49573ae729d)
2009-12-16 19:26:22 +10:30
Stefan Metzmacher
94bc40307a server: Use tdb_check to verify persistent tdbs on startup
Depending on --max-persistent-check-errors we allow ctdb
to start with unhealthy persistent databases.

The default is 0 which means to reject a startup with
unhealthy dbs.

The health of the persistent databases is checked after each
recovery. Node monitoring and the "startup" is deferred
until all persistent databases are healthy.

Databases can become healthy automaticly by a completely
HEALTHY node joining the cluster. Or by an administrator
with "ctdb backupdb/restoredb" or "ctdb wipedb".

metze

(This used to be ctdb commit 15f133d5150ed1badb4fef7d644f10cd08a25cb5)
2009-12-16 08:06:10 +01:00
Stefan Metzmacher
9a96ae0c97 server: only do the mkdir() calls for db_directory* once at the start
metze

(This used to be ctdb commit f30f33685db50860b6cd6fd1b6bdc3066620a78f)
2009-12-16 08:03:56 +01:00
Volker Lendecke
a0d9bd3c13 Run only one event for each epoll_wait/select call
This might be a bit less efficient, but experience in winbind has shown that
event callbacks can trigger changes in the socket state in very hard to
diagnose ways.

(This used to be ctdb commit a78b8ea7168e5fdb2d62379ad3112008b2748576)
2009-12-10 07:52:16 +11:00
Ronnie Sahlberg
fab11acc65 lower the loglevel for the message that a client has attached through a domian socket
(This used to be ctdb commit de9e5236b20d70eac5ed29991703d6d25a103963)
2009-12-02 14:51:57 +11:00
Ronnie Sahlberg
6bad4a4836 Add a proper function to process a process-exist control in the daemon.
This controls is only used by samba when samba wants to check if a subrecord held by a <node-id>:<smbd-pid> is still valid or if it can be reclaimed.

If the node is banned or stopped, we kill the smbd process and return that the process does not exist to the caller. This allows us to recover subrecords from stopped/banned nodes where smbd is hung waiting for the databases to thaw.

bz58185

(This used to be ctdb commit 157807af72ed4f7314afbc9c19756f9787b92c15)
2009-12-02 13:58:27 +11:00
Ronnie Sahlberg
1c7de7a2ed Add a double linked list to the ctdb_context to store a mapping between client pids and client structures.
Add the mapping to the list everytime we accept() a new client connection
and set it up to remove in the destructor when the client structure is freed.

(This used to be ctdb commit f75d379377f5d4abbff2576ddc5d58d91dc53bf4)
2009-12-02 13:41:04 +11:00
Ronnie Sahlberg
bf27dc2d53 Use the PID we pick up from the domain socket when a client connects
and store this in the client structure.

There is no need to rely on the hack that samba sends some special message
handle registrations that encodes the pid in the srvid any more.

This might not work on AIX since I recall some issues to get the pid in
this way on that platform.

(This used to be ctdb commit b4a7efa7e53e060a91dea0e8e57b116e2aeacebf)
2009-12-02 13:17:12 +11:00
Ronnie Sahlberg
e33722a569 start the syslog child a little later, after we have forked and detached from the local shell
(This used to be ctdb commit 9ffd54b73c0d64b67e8e736d7cb54490e77ffa78)
2009-10-30 19:39:11 +11:00
Ronnie Sahlberg
4d40b86805 for debugging
add a global variable holding the pid of the main daemon.
change the tracking of time() in the event loop to only check/warn when called from the main daemon

(This used to be ctdb commit a10fc51f4c30e85ada6d4b7347b0f9a8ebc76637)
2009-10-27 13:18:52 +11:00
Ronnie Sahlberg
86d1b4c465 Add a mechanism where we can register notifications to be sent out to a SRVID when the client disconnects.
The way to use this is from a client to :
1, first create a message handle and bind it to a SRVID
   A special prefix for the srvid space has been set aside for samba :
   Only samba is allowed to use srvid's with the top 32 bits set like this.
   The lower 32 bits are for samba to use internally.

2, register a "notification" using the new control :
                    CTDB_CONTROL_REGISTER_NOTIFY         = 114,
   This control takes as indata a structure like this :
struct ctdb_client_notify_register {
        uint64_t srvid;
        uint32_t len;
        uint8_t notify_data[1];
};

srvid is the srvid used in the space set aside above.
len and notify_data is an arbitrary blob.
When notifications are later sent out to all clients, this is the payload of that notification message.

If a client has registered with control 114 and then disconnects from ctdbd, ctdbd will broadcast a message to that srvid to all nodes/listeners in the cluster.

A client can resister itself with as many different srvid's it want, but this is handled through a linked list from the client structure so it mainly designed for "few notifications per client".

3, a client that no longer wants to have a notification set up can deregister using control
                    CTDB_CONTROL_DEREGISTER_NOTIFY       = 115,
which takes this as arguments :
struct ctdb_client_notify_deregister {
        uint64_t srvid;
};

When a client deregisters, there will no longer be sent a message to all other clients when this client disconnects from ctdbd.

(This used to be ctdb commit f1b6ee4a55cdca60f93d992f0431d91bf301af2c)
2009-10-23 15:24:51 +11:00
Ronnie Sahlberg
a92ba7f729 lower the debug levels for the "create FD messages" so we dont fill up the logs.
(This used to be ctdb commit 87146db2769c2ec494813685bf9cec0d2a6336c3)
2009-10-21 15:26:24 +11:00
Ronnie Sahlberg
9b8c72c446 When clients have blocked, perhaps because the node is banned or stopped and the client is blocked trying to tdb_fetch() a record, make sure we dont queue up too many REQ_MESSAGES.
Add a new tunable to control the maximum queue size we allow to a blocked client before we start discarding REQ_MESSAGES instead of queueing them for delivery.

    This avoids having queued up very very large number of MESSAGES that samba semds
     between eachother to nodes that are blocked/banned/stopped for extended periods
    .

(This used to be ctdb commit f76d6fed8f9630450263b9fa4b5fdf3493fb1e11)
2009-10-21 15:20:55 +11:00
Ronnie Sahlberg
9de3652380 add logging everytime we create a filedescriptor in the main ctdb daemon
so we can spot if there are leaks.

plug two leaks for filedescriptors related to when sending ARP fail
and one leak when we can not parse the local address during tcp connection establish

(This used to be ctdb commit ddd089810a14efe4be6e1ff3eccaa604e4913c9e)
2009-10-15 11:24:54 +11:00
Ronnie Sahlberg
73c0adb029 initial attempt at freezing databases in priority order
(This used to be ctdb commit e8d692590da1070c87a4144031e3306d190ebed2)
2009-10-12 12:08:39 +11:00
Michael Adam
4cd06a330e Fix persistent transaction commit race condition.
In ctdb_client.c:ctdb_transaction_commit(), after a failed
TRANS2_COMMIT control call (for instance due to the 1-second
being exceeded waiting for a busy node's reply), there is a
1-second gap between the transaction_cancel() and
replay_transaction() calls in which there is no lock on the
persistent db. And due to the lack of global state
indicating that a transaction is in progress in ctdbd, other nodes
may succeed to start transactions on the db in this gap and
even worse work on top of the possibly already pushed changes.
So the data diverges on the several nodes.

This change fixes this by introducing global state for a transaction
commit being active in the ctdb_db_context struct and in a db_id field
in the client so that a client keeps track of _which_ tdb it as
transaction commit running on. These data are set by ctdb upon
entering the trans2_commit control and they are cleared in the
trans2_error or trans2_finished controls. This makes it impossible
to start a nother transaction or migrate a record to a different
node while a transaction is active on a persistent tdb, including
the retry loop.

This approach is dead lock free and still allows recovery process
to be started in the retry-gap between cancel and replay.
Also note, that this solution does not require any change in the
client side.

This was debugged and developed together with
Stefan Metzmacher <metze@samba.org> - thanks!

Michael

(This used to be ctdb commit f88103516e5ad723062fb95fcb07a128f1069d69)
2009-07-29 11:12:39 +10:00
Ronnie Sahlberg
d4b30b34aa dont even try to send a message from the main daemon if the transport is down
(This used to be ctdb commit 9a2c4c3ed09ac9ea781d06999d11e5c3b5b4a97a)
2009-06-30 12:09:28 +10:00
Ronnie Sahlberg
4259156050 dont remove the socket when the dameon stops. This can race if the
service is immediately restarted

(This used to be ctdb commit b18356764cd49d934eab901e596bb75c6e3ecdf8)
2009-05-29 18:16:13 +10:00
Sumit Bose
2fcedf6dac add missing checks on so far ignored return values
Most of these were found during a review by Jim Meyering <meyering@redhat.com>

(This used to be ctdb commit 3aee5ee1deb4a19be3bd3a4ce3abbe09de763344)
2009-05-21 11:22:21 +10:00
root
bfea570af4 when tracking the ctdb statistics, only decrement num_clients and pending_calls IFF the counter is >0
Otherwise there is the chance that we will reset the statistics after the counter has been incremented (client connects) to zero   and when the client disconnects we decrement it to a negative number.

this is a pure cosmetic patch with no operational impact to ctdb

(This used to be ctdb commit 72f1c696ee77899f7973878f2568a60d199d4fea)
2009-05-01 12:30:26 +10:00
Ronnie Sahlberg
e5e2f6f8f7 increase the listen queue. Now that the eventscripts may become clients and connect back to the server we do get a lot more concurrent connection attempts (takepip/teleaseip are performed in parallell)
(This used to be ctdb commit 018f8b0b1823ef59b46f1a671aec5309d10628f4)
2009-04-06 14:00:41 +10:00
Ronnie Sahlberg
94a56ea410 reqrite the handling of flag updates across the cluster to eliminate a
race between the ctdb tool and the recovery daemon both at once
trying to push flag changes across the cluster.

(This used to be ctdb commit a9a1156ea4e10483a4bf4265b8e9203f0af033aa)
2008-11-20 12:43:18 +11:00
Ronnie Sahlberg
06728fdac9 we actually need a ctdb_db variable
(This used to be ctdb commit aba984f1b85f5a2d370b093061cf15843ee53758)
2008-11-03 21:54:52 +11:00
Ronnie Sahlberg
d7007793ea latency is measured in us, not ms
use an explicit ctdb_db variable instead of dereferencing state

(This used to be ctdb commit 8c6a02fb423a8cbcbfc706767e3d353cd48073c3)
2008-10-30 13:34:10 +11:00
Ronnie Sahlberg
e1b0cea427 add control and logging of very high latencies.
log the type of operation and the database name for all latencies higher
than a treshold

(This used to be ctdb commit 1d581dcd507e8e13d7ae085ff4d6a9f3e2aaeba5)
2008-10-30 12:49:53 +11:00
Ronnie Sahlberg
6474f3278d additional monitoring between the two daemons.
we currently only monitor that the dameons are running by kill(0, pid)
and verifying the the domain socket between them is ok.

this is not sufficient since we can have a situation where the recovery
daemon is hung.

this new code monitors that the recovery daemon is operating.
if the recovery hangs, we log this and shut down the main daemon

(This used to be ctdb commit cd69d292292eaab3aac0e9d9fc57cb621597c63c)
2008-09-09 13:44:46 +10:00
Ronnie Sahlberg
ef997d344f initial ipv6 patch
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>

(This used to be ctdb commit 1f131f21386f428bbbbb29098d56c2f64596583b)
2008-08-19 14:58:29 +10:00
Ronnie Sahlberg
8b520bcb5f lower a debug message
(This used to be ctdb commit 554dcf16d37c8b9e4704df11d21fb272f30f5cec)
2008-07-18 10:38:51 +10:00
Ronnie Sahlberg
6eb4e46fe1 Add two new controls to start and cancel a persistent update.
This allows ctdb to automatically start a new full blown recovery
if a client has started updating the local tdb for a persistent database
but is kill -9ed before it has ensured the update is distributed clusterwide.

(This used to be ctdb commit 1ffccb3e0b3b5bd376c5302304029af393709518)
2008-07-17 13:50:55 +10:00
Ronnie Sahlberg
334db8ccba proper waitpid() fix.
remove all waitpid() calls and use the event system to trap sigchld

(This used to be ctdb commit 77458b2b6b51b2970c12b0e5b097088d3fb9d358)
2008-07-09 14:02:54 +10:00
Ronnie Sahlberg
522830dea8 Revert "waitpid() can block if it takes a long time before the child terminates"
This reverts commit bfba5c7249eff8a10a43b53c1b89dd44b625fd10.

revert the waitpid changes.   we need to waitpid for some childredn so should
refactor the approach completely

(This used to be ctdb commit 702ced6c2fe569c01fe96c60d0f35a7e61506a96)
2008-07-08 17:41:31 +10:00
Ronnie Sahlberg
79425ddec5 Revert "set sigchild to SIG_IGN instead of SIG_DFL"
This reverts commit b1f1e80d3ad50280a300f2ed021513cf0a6f3a76.

(This used to be ctdb commit 2030e9ff2ca044181b72c3b87d513bf27057b5a2)
2008-07-08 17:40:53 +10:00
Ronnie Sahlberg
71d2315eee set sigchild to SIG_IGN instead of SIG_DFL
(This used to be ctdb commit b1f1e80d3ad50280a300f2ed021513cf0a6f3a76)
2008-07-08 16:31:23 +10:00
Ronnie Sahlberg
d67de4a7d2 waitpid() can block if it takes a long time before the child terminates
so we should not call it from the main daemon.

1, set SIGCHLD to SIG_DFL to make sure we ignore this signal

2, get rid of all waitpid() calls

3, change reporting of event script status code from _exit()/waitpid()   to write()/read() one byte across the pipe.

(This used to be ctdb commit bfba5c7249eff8a10a43b53c1b89dd44b625fd10)
2008-07-08 03:48:11 +10:00
Ronnie Sahlberg
adf40341a7 ctdb->methods becomes NULL when we shutdown the transport.
If we shutdown the transport   and CTDB later decides to send a command out
for queueing, the call to ctdb->methods->allocate_pkt() will SEGV.

This could trigger for example when we are in the process of shuttind down CTDBD and have already shutdown the transport but we are still waiting for the
"shutdown" eventscripts to finish.
If the event scripts now take much much longer to execute for some reason, this
race condition becomes much more probable.

Decorate all dereferencing of ctdb->methods->    with a check that ctdb->menthods is non-NULL

(This used to be ctdb commit c4c2c53918da6fb566d6e9cbd6b02e61ae2921e7)
2008-05-11 14:28:33 +10:00
Ronnie Sahlberg
cd1858d126 fix compiler warning during a fatal error failing to lock down the socket
(This used to be ctdb commit 0ad22de1a614dc2d1926546027be5f5eea3381ed)
2008-04-10 09:56:49 +10:00
Ronnie Sahlberg
2da3fe1b17 From Chris Cowan
secure the domain socket and set permissions properly

(This used to be ctdb commit ac6a362fc2fc4a56b4c310478a96eb12daace176)
2008-04-10 06:51:53 +10:00
Ronnie Sahlberg
6b797f148c From Chris Cowan
Add support in AIX to track the PID of a client that connects to the unix domain socket

(This used to be ctdb commit 4c006c675d577d4a45f4db2929af6d50bc28dd9e)
2008-04-03 10:58:51 +11:00
Ronnie Sahlberg
03d30f405d decorate the memdump output with a nice field for ctdb_client structures to show the pid of the client that attached
(This used to be ctdb commit 0d9314302d0b988b6ab5d533deef40c5b343c249)
2008-04-01 17:17:21 +11:00
Ronnie Sahlberg
27a7f854f5 add improvements to tracking memory usage in ctdbd adn the recovery daemon
and a ctdb command to pull the talloc memory map from a recovery daemon
ctdb rddumpmemory

(This used to be ctdb commit d23950be7406cf288f48b660c0f57a9b8d7bdd05)
2008-04-01 15:34:54 +11:00
Andrew Tridgell
f6e53f433b merge from ronnie
(This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c)
2008-02-04 20:07:15 +11:00
Andrew Tridgell
9d6ac0cf55 added debug constants to allow for better mapping to syslog levels
(This used to be ctdb commit 7ba8f1dde318eab03f4257e5a89fd23e7281e502)
2008-02-04 17:44:24 +11:00
Andrew Tridgell
b62b7fcde8 added syslog support, and use a pipe to catch logging from child processes to the ctdbd logging functions
(This used to be ctdb commit 1306b04cd01e996fd1aa1159a9521f2ff7b06165)
2008-01-16 22:03:01 +11:00
Andrew Tridgell
bf9e33d4cf - catch a case where the client disconnects during a call
- track all talloc memory, using NULL context

(This used to be ctdb commit bf89c56002f5311520e91cb367753bc46e5dddc9)
2008-01-16 09:44:48 +11:00
Ronnie Sahlberg
ba31feaec0 split node health monitoring and checking for connected/disconnected
nodes into two separate files.

move the monitoring of keepalives for detecting connected/disconnected 
remote nodes into ctdb_keepalive.c

(This used to be ctdb commit 23a57b20c314d5f11a433cf251eb9d9de743849a)
2008-01-15 08:42:12 +11:00
Andrew Tridgell
9311f7fb7e fixed the bug that make "onnode N service ctdb start" hang
(This used to be ctdb commit b50dcb16f30a60abce42f491f9b0aae7948b8206)
2008-01-05 12:09:29 +11:00
Andrew Tridgell
bde886988b prevent a deadly embrace between smbd and ctdbd by moving the calling
of the startup event scripts after the point where recovery has
started and the node is in normal operation

This makes the 'startup' script just a special type of the 'monitor'
script which is called first

(This used to be ctdb commit 7424c30a5fd04aea0137c466b4318c3f185280d8)
2007-11-12 10:53:11 +11:00
Andrew Tridgell
b87ddd9148 no longer wait at startup for services to become available, instead
set the node initially unhealthy and let the status monitoring bring the node online.
This fixes a problem with winbindd, where it refused to start because secrets.tdb was not populated
but we could not populate ctdbd, because the net command would not run while ctdbd was still doing startup
and thus frozen
(This used to be ctdb commit 3a001b793dd76fb96addf1e2ccb74da326fbcfbc)
2007-09-24 10:00:14 +10:00
Andrew Tridgell
c60988325d added support for persistent databases in ctdbd
(This used to be ctdb commit 3115090a0d882beca9d70761130b74bb0821f201)
2007-09-21 12:24:02 +10:00
Andrew Tridgell
a478c78f03 changed some debug levels
(This used to be ctdb commit ed764533e1c2f8982e1577ca5e7f5f4482a15345)
2007-09-12 13:21:19 +10:00
Ronnie Sahlberg
d66d9cdd22 change debug output from vnn to pnn
change ctdb_daemon_send_message to take pnn as parameter isntead of vnn

(This used to be ctdb commit e352a2bbf9bb9a0b2c4f8329e8a529cf02414097)
2007-09-04 10:45:41 +10:00
Ronnie Sahlberg
211b497818 change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn
change ctdb_ban_info.vnn to ctdb_ban_info.pnn

(This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a)
2007-09-04 10:33:10 +10:00
Ronnie Sahlberg
583b6e6ba6 change ctdb_get_vnn to ctdb_get_pnn
(This used to be ctdb commit 1e19930198c2bcc7ccb755e0ee51555fb823029a)
2007-09-04 10:18:44 +10:00
Ronnie Sahlberg
fc9d39c3a6 change ctdb_validate_vnn to ctdb_validate_pnn
(This used to be ctdb commit a4a1f41b69475b9dc16d8fd7f8965c32e96c32f0)
2007-09-04 10:09:58 +10:00
Ronnie Sahlberg
eb4cf6a686 change ctdb->vnn to ctdb->pnn
(This used to be ctdb commit 8c776e5707e503ec6586aae39ac6b3ea5a2fd2bc)
2007-09-04 10:06:36 +10:00
Ronnie Sahlberg
8b06fc7284 change the structure used for node flag change messages so that we can
see both the old flags as well as the new flags (so we can tell which 
flags changed)

send the CTDB_SRVID_RECONFIGURE messages to connected nodes only, not to 
every node, connected or not, in the cluster.


in the handler inside the recovery daemon which is invoked for node flag 
change messages, only do a takeover_run() and redistribute the ip addresses IF it was the 
disabled or the unhealthy flags that changed. Also send out the cluster 
reconfigured message in this case.
If any of the other flags changed we dont need to do the takeover_run(0 
here since that will be done during recovery.

(This used to be ctdb commit 5549b2058e2c148a8ca9d419123acf3247bb8829)
2007-08-21 17:25:15 +10:00
Ronnie Sahlberg
5228abef64 add an atexit() that will print "CTDB daemon shutting down" in the log
when the main daemon exits

(This used to be ctdb commit f7422397be2e319bfbee5bf0670583c353eda86d)
2007-08-21 09:43:53 +10:00
Ronnie Sahlberg
aed2c58c64 dont pollute the log with 'Registered PID XXX for client YYY' at log
level 0.

change the log level to 3 for this information message

(This used to be ctdb commit f28d713d9cacd2312932b51175aa8402c96ef76b)
2007-08-21 08:42:42 +10:00