This is done by 10.interface, where the monitor event fails when there
is a missing interface. The in-daemon interface checking adds no
value.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This is like CTDB_CONTROL_GET_NODEMAP but it loads from the nodes file
instead of the daemon.
Also add a new client function ctdb_ctrl_getnodesfile().
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Every time a nodemap is constructed the node IP addresses all need to
be parsed. This is not a productive use of CPU.
Instead, parse each string once when the nodes file is loaded. This
results in much simpler code.
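As a rough illustration of the idea (a hedged sketch only: the names node_entry and node_entry_init are made up, and the real CTDB code also handles IPv6), the address string is parsed into a sockaddr once when the nodes file line is read, so building a nodemap later just copies the already-parsed value:

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>

struct node_entry {
        char addr_str[64];         /* as read from the nodes file */
        struct sockaddr_in addr;   /* parsed once, reused afterwards */
};

/* called once per line when the nodes file is loaded */
static int node_entry_init(struct node_entry *node, const char *line)
{
        memset(node, 0, sizeof(*node));
        strncpy(node->addr_str, line, sizeof(node->addr_str) - 1);
        node->addr.sin_family = AF_INET;
        if (inet_pton(AF_INET, line, &node->addr.sin_addr) != 1) {
                fprintf(stderr, "invalid node address: %s\n", line);
                return -1;
        }
        return 0;
}

/* constructing a nodemap later only copies node->addr; nothing is re-parsed */
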
This code also removes the use of ctdb_address. Duplicating the port
is pointless without an abstraction layer around ctdb_address. If
CTDB gets an incompatible transport in the future then add an
abstraction layer.
Note that the infiniband code is not updated. Compilation of the
infiniband code is already broken. Fixing it will be a separate,
properly tested effort.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
This is currently set in 2 places. One of them makes the node loading
code difficult to refactor. Also, when the surrounding code in either
place is touched, it might get broken.
This only needs to be done once at startup, not on every reload. So
do it once in a very obvious way, sacrificing a few CPU cycles for
some added clarity.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Each node reload unnecessarily and incorrectly resets the VNN map,
causing a potentially unnecessary recovery. When nodes are reloaded
any newly deleted nodes should already be disconnected and any newly
added nodes should also be disconnected. This means that reloading
the nodes file should not cause a change in the VNN map.
The current implementation also leaks memory every time the nodes are
reloaded.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
It is much simpler for most cases to have a syslog backend that
doesn't need a separate CTDB-specific logging daemon. This loses the
lossy, non-blocking mode provided by logd. However, a corresponding
feature with a completely different implementation (not requiring an
extra daemon) will be re-added into the syslog backend. In an ideal
world the new implementation would be added first but unfortunately
that is hard to do because the logd code is hooked in at more than one
place.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Deferred calls should not be treated as pending calls since they are
re-processed from the beginning.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
This makes it consistent with Samba, to ease transition.
Update unit test code to link with tdb_wrap instead of including
db_wrap.c.
There are some potential whitespace fixes in this commit that have
been ignored. CTDB's lib/tdb_wrap will be deleted after the
transition to Samba's lib/tdb_wrap, so there's no point polishing it
too much.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
To avoid warnings when using --enable-developer, which uses
-Wmissing-prototypes.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This duplicates ctdb->ctdbd_pid.
Thanks to Sumit Bose <sbose@redhat.com> for the suggestion.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
If something unexpectedly uses fork() then an exiting child will
remove the PID file while the main daemon is still running. The real
test is whether the current process has the PID of the main CTDB
daemon, which is the process that calls setsid().
This could be done using getpgrp() instead. At the moment the
eventscript handler harmlessly calls setpgid() - harmless because the
atexit() handlers are cleared upon exec(). However, it is possible
that process groups will be used more in future so it is probably
better to rely on the session ID.
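One way to express that check is to compare the current process against the session leader created by setsid(). This is only an illustrative sketch, not the actual implementation; pidfile_path and remove_pidfile are made-up names and the path is an assumption:

#include <stdlib.h>
#include <unistd.h>

static const char *pidfile_path = "/var/run/ctdb/ctdbd.pid";

/* atexit() handler: only the session leader (the main daemon, which
 * called setsid()) removes the PID file; forked children skip it */
static void remove_pidfile(void)
{
        if (getsid(0) == getpid()) {
                unlink(pidfile_path);
        }
}

int main(void)
{
        setsid();                  /* in a real daemon this runs after fork() */
        atexit(remove_pidfile);
        /* ... daemon main loop ... */
        return 0;
}
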
Thanks to Sumit Bose <sbose@redhat.com> for the idea.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Currently ctdbd_wrapper depends on the session ID. Very soon PID file
removal will too. :-)
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This function does not block signals, but ignores them.
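For clarity, the distinction captured by the rename is roughly this (a generic POSIX sketch, not CTDB's code; the function names are illustrative):

#include <signal.h>
#include <string.h>

/* ignoring: the signal is discarded entirely */
static void ignore_signal(int signum)
{
        struct sigaction sa;

        memset(&sa, 0, sizeof(sa));
        sa.sa_handler = SIG_IGN;
        sigemptyset(&sa.sa_mask);
        sigaction(signum, &sa, NULL);
}

/* blocking: delivery is merely deferred until the signal is unblocked */
static void block_signal(int signum)
{
        sigset_t set;

        sigemptyset(&set);
        sigaddset(&set, signum);
        sigprocmask(SIG_BLOCK, &set, NULL);
}
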
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
At the moment ctdb_check_healthy() is overloaded to wait until the
first recovery is complete, handle the "startup" event and also
actually handle monitoring. This is untidy and hard to follow.
Instead, have the daemon explicitly wait for 1st recovery after the
"setup" event. When first recovery is complete, schedule a function
to handle the "startup" event. When the "startup" event succeeds then
explicitly enable monitoring.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This was added to support external monitoring using CTDB event scripts.
However, it was never used.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
No need to pass it as an extra argument to ctdb_start_daemon.
Also ensure options.public_address_list gets a nice static default.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit a3d63a9db89d08bb284b3b3a6db773422f21b477)
This removes data types and structure elements related to TRANS2
persistent transaction code.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 22a253b7ccf1ff854cddf0b67969dc84d7d6a654)
Register print_exit_message() earlier so that it covers most of the
early exits.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 90d792cf28d6a823141e4c417b6978f02a9cf596)
Don't blindly remove the socket.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 3dd5b925dcf0e9a5b877638e471c5ecf36b46c58)
This is slightly easier to read because it all fits on 1 line.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 035bf3eecf99337c84d4ad16cdbf297b1fa037db)
The "init" event only really fails in the scripts, which should log
something useful on failure. Therefore, a core dump isn't terribly
useful and sometimes attracts unwanted attention.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 3af2d833b63af9931792106db71797f3692669a8)
The runstate can't be set to SHUTDOWN twice, so the current naive code
causes a panic on the 2nd shutdown. This regression was introduced in
commit 8076773a9924dcf8aff16f7d96b2b9ac383ecc28.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit f1b7ca8dc3f34a59c7b3e55748f974ac9ed8f458)
It should run before:
* the transport is started;
* databases are attached; and
* configuration files (e.g. nodes, public_addresses) are processed.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 0a0c8543f167e11b75a622513367b083e42cbd3f)
The "setup" event can fail when one of the eventscripts fails to run
its "setup" event. If this occurs then the eventscript should log an
error. The stack trace and core file generated when we abort provide
no useful information.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit c50eca6fbf49a6c7bf50905334704f8d2d3237d7)
This adds more serialisation to the startup, ensuring that the
"startup" event runs after everything to do with the first recovery
(including the "recovered" event).
Given that it now takes longer to get to the "startup" state, the
initscript needs to wait until ctdbd gets to "first_recovery".
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit ed6814ff0a59ddbb1c1b3128b505380f60d7aeb7)
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit f43fe3a560d5915c1a9893256f4e7bfe3d7e290a)
This deconstructs ctdb_start_transport(), which did much more than
starting the transport.
This removes a very unlikely race and adds some clarity. The setup
event is supposed to set the tunables before the first recovery.
However, there was nothing stopping the first recovery from starting
before the setup event had completed.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit c31feb27dcdb748b5333321c85fe54852dfa1bcf)
This allows states, including startup and shutdown states, to be
clearly tracked. This doesn't include regular runtime "states", which
are handled by node flags.
Introduce new functions ctdb_set_runstate(), runstate_to_string() and
runstate_from_string().
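A minimal sketch of the pattern (the actual runstate names and values in CTDB may differ; this is illustrative only):

#include <stddef.h>
#include <strings.h>

enum runstate {
        RUNSTATE_UNKNOWN = 0,
        RUNSTATE_INIT,
        RUNSTATE_SETUP,
        RUNSTATE_FIRST_RECOVERY,
        RUNSTATE_STARTUP,
        RUNSTATE_RUNNING,
        RUNSTATE_SHUTDOWN,
};

static const char *runstate_names[] = {
        "UNKNOWN", "INIT", "SETUP", "FIRST_RECOVERY",
        "STARTUP", "RUNNING", "SHUTDOWN",
};

static const char *runstate_to_string(enum runstate r)
{
        if ((size_t)r >= sizeof(runstate_names) / sizeof(runstate_names[0])) {
                return "UNKNOWN";
        }
        return runstate_names[r];
}

static enum runstate runstate_from_string(const char *s)
{
        size_t i;

        for (i = 0; i < sizeof(runstate_names) / sizeof(runstate_names[0]); i++) {
                if (strcasecmp(s, runstate_names[i]) == 0) {
                        return (enum runstate)i;
                }
        }
        return RUNSTATE_UNKNOWN;
}
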
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 8076773a9924dcf8aff16f7d96b2b9ac383ecc28)
Otherwise the messages are in a stupid order... :-)
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reported-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit cd87ba85fc6c375758c7d3dfa8dbd4d8a02074b0)
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-By: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit c7eab97c7a939710b73aae2d75b404b235a998f5)
This fixes the problem of "ctdb statisticsreset" clearing the number of
clients even when there are active clients.
Values returned in statistics for frozen, recovering, memory_used are based on
the current state of CTDB and are not maintained as statistics. This should
include num_clients as well.
Currently ctdb->num_clients is unused, so use it to track the number of
clients and fill in the statistics field only when requested.
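The idea, roughly (a hedged sketch with made-up structure names, not CTDB's actual definitions):

/* live counter maintained on connect/disconnect */
struct daemon_state { unsigned int num_clients; };
/* snapshot handed back by "ctdb statistics" */
struct statistics   { unsigned int num_clients; /* ... other counters ... */ };

static void client_connected(struct daemon_state *d)    { d->num_clients++; }
static void client_disconnected(struct daemon_state *d) { d->num_clients--; }

/* derived from current state when requested, so a statistics reset
 * cannot push the client count negative */
static void fill_statistics(const struct daemon_state *d, struct statistics *s)
{
        s->num_clients = d->num_clients;
}
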
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit dc4ca816630ed44b419108da53421331243fb8c7)
Unexpected removal of this file can have serious consequences, so it
is best if this is logged at the default level.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit bfed6a8d1771db3401d12b819204736c33acb312)
Default is not to create a pid file.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit 996e74d3db0c50f91b320af8ab7c43ea6b1136af)
When CTDB is busy with lots of smbd clients, too much time was spent in
daemon_check_srvids(), which searches the list of srvids registered for
message handlers. Using a hash-based index instead of a linear search of
the linked list significantly improves performance.
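The shape of the change is roughly the following (illustrative only; CTDB's real data structures and hash function differ):

#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define SRVID_HASH_SIZE 512

struct srvid_entry {
        uint64_t srvid;
        struct srvid_entry *next;
};

struct srvid_index {
        struct srvid_entry *buckets[SRVID_HASH_SIZE];
};

/* fold the 64-bit srvid down to a bucket number */
static unsigned int srvid_hash(uint64_t srvid)
{
        return (unsigned int)((srvid ^ (srvid >> 32)) % SRVID_HASH_SIZE);
}

/* lookup is now a short chain walk instead of a scan of every handler */
static bool srvid_registered(const struct srvid_index *idx, uint64_t srvid)
{
        const struct srvid_entry *e;

        for (e = idx->buckets[srvid_hash(srvid)]; e != NULL; e = e->next) {
                if (e->srvid == srvid) {
                        return true;
                }
        }
        return false;
}

static int srvid_register(struct srvid_index *idx, uint64_t srvid)
{
        struct srvid_entry *e = malloc(sizeof(*e));

        if (e == NULL) {
                return -1;
        }
        e->srvid = srvid;
        e->next = idx->buckets[srvid_hash(srvid)];
        idx->buckets[srvid_hash(srvid)] = e;
        return 0;
}
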
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 3e09f25d419635f6dd679b48fa65370f7860be7d)
Some subprocesses print "CTDB daemon shutting down" when they exit and
this can be confusing.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit f1ffe1112b7e342d7f1228ca816a8e5918f893cf)
ctdb_start_transport() is called just before the "setup" event, when CTDB
is ready to process requests. The "startup" event happens much later,
after a successful recovery.
The transport method ctdb->methods is successfully initialized before
ctdb_start_transport() is called, so there is no need to check it again.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 9a70a4d23d00f6cb996c061ba3dfb7c47b4f6a4f)
Currently flags are initialised in 2 places. One of them is in
ctdb_tcp_listen_automatic(), which just seems wrong. This makes the
code easier to follow by just doing it in ctdb_start_daemon().
This means that the flags are now initialised later than previously.
However, it is still done before the transport is started and before
clients can connect.
In future it might make sense to do a similar thing with setting the
PNN. However, the current optimisation is reasonably obvious...
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 2bbee8ac23ad5b7adf7122d8c91d5f0d54582507)
Reimplement 5aba53e6adcfcd7edbdac9e30aa5fcba176aca00 using tevent
trace points.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 98e1b46adba11b9549b5c5976e1f561fe732fa6e)
Wrap all creation of child processes inside ctdb_fork(), which is used to track all processes we have spawned.
Capture SIGCHLD to also track which child processes have terminated.
Wrap kill() inside ctdb_kill() and make sure that we never send a non-zero signal to a child process pid that has already terminated (and might have been replaced with an unrelated process reusing the same pid).
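A rough sketch of the tracking idea (simplified and illustrative; the real ctdb_fork()/ctdb_kill() implementation differs and the fixed-size table here is an assumption):

#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_TRACKED 1024
static pid_t tracked[MAX_TRACKED];

/* fork() wrapper: the parent remembers every child it spawns */
static pid_t tracked_fork(void)
{
        pid_t pid = fork();

        if (pid > 0) {
                for (int i = 0; i < MAX_TRACKED; i++) {
                        if (tracked[i] == 0) {
                                tracked[i] = pid;
                                break;
                        }
                }
        }
        return pid;
}

/* SIGCHLD handler (installed with sigaction): forget children as they exit */
static void sigchld_handler(int sig)
{
        pid_t pid;

        (void)sig;
        while ((pid = waitpid(-1, NULL, WNOHANG)) > 0) {
                for (int i = 0; i < MAX_TRACKED; i++) {
                        if (tracked[i] == pid) {
                                tracked[i] = 0;
                                break;
                        }
                }
        }
}

/* kill() wrapper: never send a real signal to a pid we no longer track */
static int tracked_kill(pid_t pid, int signum)
{
        if (signum != 0) {
                bool known = false;

                for (int i = 0; i < MAX_TRACKED; i++) {
                        if (tracked[i] == pid) {
                                known = true;
                                break;
                        }
                }
                if (!known) {
                        return 0;       /* the child is already gone */
                }
        }
        return kill(pid, signum);
}
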
(This used to be ctdb commit f73a4b1495830bcdd094a93732a89dd53b3c2f78)
and also update the "read public address file" code to not check whether the address already exists locally when we read it from the child process, to stop it
from spamming the logs with "We already host ..."
messages.
(This used to be ctdb commit 334ea830f1bf33419f4a1e78f23afd41a852d0f4)
Also add a method to use the recovery master/daemon to reload the public ips on all nodes in the cluster.
Reloading the public ips on all nodes in the cluster is only supported if all nodes in the cluster are available and healthy.
(This used to be ctdb commit 05603e914f8c12618d7e06943c0f7df207f645b0)
Every time we give a delegation to another node we count this as one delegation.
If the same record is delegated to several nodes we count one for each node.
Every time a record has all its delegations revoked we count this as one revoke.
(This used to be ctdb commit b098bcf8007be63889aaed640a951b0eeaa9d191)
We don't strictly need to force clients to use CTDB_FETCH_WITH_HEADER instead of CTDB_FETCH when they ask for readonly records.
Have ctdbd remap this internally to FETCH_WITH_HEADER and map the reply back to CTDB_FETCH_FUNC or CTDB_FETCH_WITH_HEADER_FUNC based on what the client initially asked for.
This removes the need for the client to know about the CTDB_FETCH_WITH_HEADER_FUNC function and simplifies the client code.
Clients that do not care about the header returned with the reply can just continue using the old CTDB_FETCH_FUNC call and ctdbd will do all the difficult work.
(This used to be ctdb commit 444a7bac4e9a854b06c1ad4cb36c2b58a72001fa)
server/ctdbd.c: In function ‘main’:
server/ctdb_daemon.c:943:7: warning: zero-length gnu_printf format string [-Wformat-zero-length]
(This used to be ctdb commit e6d1dd3ec4a078e5f32bc52a4a9e4b7d9a2e2d16)
When multiple clients fetch the same record concurrently, send only a single
fetch across the network and defer all other fetches locally.
This improves performance for hot records and reduces CPU load on ctdb.
(This used to be ctdb commit 82d6946ad8b3348e8b9d3d971f24925ade02d1be)
so we need a "ticker" in the main ctdbd daemon too, to ensure we get at least one event to process every second.
This will improve the accuracy of "Time jumped" messages and remove false positives when the recovery daemon is "slow".
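A minimal sketch of such a ticker using the tevent timer API (CTDB's code of that era used the event_* compatibility wrappers; the handler name and private data are illustrative):

#include <sys/time.h>
#include <talloc.h>
#include <tevent.h>

/* fires once per second and re-arms itself, guaranteeing the main loop
 * processes at least one event per second */
static void daemon_tick(struct tevent_context *ev, struct tevent_timer *te,
                        struct timeval current_time, void *private_data)
{
        /* per-second housekeeping (e.g. time-jump detection) goes here */
        tevent_add_timer(ev, ev, tevent_timeval_current_ofs(1, 0),
                         daemon_tick, private_data);
}

/* at daemon startup:
 *   tevent_add_timer(ev, ev, tevent_timeval_current_ofs(1, 0),
 *                    daemon_tick, ctdb);
 */
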
(This used to be ctdb commit 70154e5e19e219de086b2995d41e8f6e069ee20d)
Revert this patch:
commit 482c302d46e2162d0cf552f8456bc49573ae729d
We may need to use real-time processes for the main daemon and the recovery daemon to handle the cases where systems come under very high loads.
(This used to be ctdb commit 08bef9dcab6e4da15fc783f8624e5ed09aa060b5)
During this window we may still hit asynchronous events that will fail because we cannot send/receive packets from other nodes.
These failures are logged as "... Transport is DOWN" to help indicate that they are benign messages related to the process of shutting down.
These messages spam the syslog during normal shutdown, so this patch drops their loglevel to DEBUG, so that they will no longer appear in or spam the syslog.
(This used to be ctdb commit 8275d265d2ae19b765e30ecf18f6b6319b6e6453)
Add a new command "ctdb stats [num]" that prints the [num] most recent statistics intervals collected.
(This used to be ctdb commit e6e16fcd5a45ebd3739a8160c8fb5f44494edb9e)
In Samba this is now called "tevent", and while we use the backwards
compatibility wrappers they don't offer EVENT_FD_AUTOCLOSE: that is now
a separate tevent_fd_set_auto_close() function.
This is based on Samba version 7f29f817fa.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 85e5e760cc91eb3157d3a88996ce474491646726)
We've been seeing "Invalid packet of length 0" errors, but we don't know
what is sending them. Add a name for each queue, and print nread.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit e6cf0e8f14f4263fbd8b995418909199924827e9)
ctdb_client.h is the existing internal client interface (which was mainly
in ctdb.h), and ctdb_protocol.h is the information needed for the wire
protocol only.
ctdb.h will be the new, shiny, libctdb API.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 4bba6b8cd47b352f98d41f9f06258d5ac3c9adef)
We can't just drop packets to the list, as those packets could be part
of the core protocol the client is using. This happens (for example)
when Samba is doing a traverse. If we drop a traverse packet then
Samba hangs indefinitely. We are better off dropping the ctdb socket
to Samba.
(This used to be ctdb commit a7a86dafa4d88a6bbc6a71b77ed79a178fd802a6)
This is needed because the "startup" event runs after the initial recovery,
but we need to do some actions before the initial recovery.
metze
(This used to be ctdb commit e953808449c102258abb6cba6f4abf486dda3b82)
We don't want ctdb stalling due to paging; this can be far worse than
scheduling delays. But if we simply do mlockall(MCL_FUTURE), it
increases the risk that mmap (ie. tdb open) or malloc will fail,
causing us to abort.
This patch is a compromise: we mlock all current pages (including
10k of future stack for expansion) and then relock when a client
asks us to open a TDB. We warn, but don't exit, if it fails.
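The core of that compromise looks roughly like this (hedged sketch; locking the extra future stack pages mentioned above is omitted here):

#include <stdio.h>
#include <sys/mman.h>

/* called at startup and again when a client asks us to open a TDB */
static void lock_current_memory(void)
{
        /* lock only what is mapped now (no MCL_FUTURE), so a later
         * mmap/tdb-open/malloc cannot fail due to mlock limits */
        if (mlockall(MCL_CURRENT) != 0) {
                perror("mlockall");   /* warn, but keep running */
        }
}
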
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 82f778e85440bc713d3f87c08ddc955d3cfce926)
1) It's buggy. Code needs to be carefully written (ie. no busy
loops) to handle running with it, and we fork and run scripts.[1]
2) It makes debugging harder. If ctdbd loops (as has happened recently)
it can be extremely hard to get in and see what's happening. We've already
seen the valgrind hacks.
3) We have seen recent scheduler problems. Perhaps they are unrelated,
but removing this very unusual setup is unlikely to hurt.
4) It doesn't make anything faster. Under all but the most perverse of
circumstances, 99% of the cpu gives the same performance as 100%, and
we will always preempt normal processes anyway.
[1] I made this worse in 0fafdcb8d353 "eventscript: fork() a child for
each script" by removing the switch_from_server_to_client() which
restored it, but even that was only for monitor scripts. Others were
run with RT priority.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 482c302d46e2162d0cf552f8456bc49573ae729d)
Depending on --max-persistent-check-errors we allow ctdb
to start with unhealthy persistent databases.
The default is 0 which means to reject a startup with
unhealthy dbs.
The health of the persistent databases is checked after each
recovery. Node monitoring and the "startup" event are deferred
until all persistent databases are healthy.
Databases can become healthy automatically by a completely
HEALTHY node joining the cluster, or by an administrator
with "ctdb backupdb/restoredb" or "ctdb wipedb".
metze
(This used to be ctdb commit 15f133d5150ed1badb4fef7d644f10cd08a25cb5)
This might be a bit less efficient, but experience in winbind has shown that
event callbacks can trigger changes in the socket state in very hard to
diagnose ways.
(This used to be ctdb commit a78b8ea7168e5fdb2d62379ad3112008b2748576)
This control is only used by samba when samba wants to check if a subrecord held by a <node-id>:<smbd-pid> is still valid or if it can be reclaimed.
If the node is banned or stopped, we kill the smbd process and report to the caller that the process does not exist. This allows us to recover subrecords from stopped/banned nodes where smbd is hung waiting for the databases to thaw.
bz58185
(This used to be ctdb commit 157807af72ed4f7314afbc9c19756f9787b92c15)
Add the mapping to the list every time we accept() a new client connection
and set it up to be removed in the destructor when the client structure is freed.
(This used to be ctdb commit f75d379377f5d4abbff2576ddc5d58d91dc53bf4)
and store this in the client structure.
There is no longer any need to rely on the hack where samba sends some special
message handle registrations that encode the pid in the srvid.
This might not work on AIX, since I recall some issues with getting the pid
this way on that platform.
(This used to be ctdb commit b4a7efa7e53e060a91dea0e8e57b116e2aeacebf)
add a global variable holding the pid of the main daemon.
change the tracking of time() in the event loop to only check/warn when called from the main daemon
(This used to be ctdb commit a10fc51f4c30e85ada6d4b7347b0f9a8ebc76637)
The way to use this from a client is :
1, first create a message handle and bind it to a SRVID
A special prefix for the srvid space has been set aside for samba :
Only samba is allowed to use srvid's with the top 32 bits set like this.
The lower 32 bits are for samba to use internally.
2, register a "notification" using the new control :
CTDB_CONTROL_REGISTER_NOTIFY = 114,
This control takes as indata a structure like this :
struct ctdb_client_notify_register {
        uint64_t srvid;
        uint32_t len;
        uint8_t notify_data[1];
};
srvid is the srvid used in the space set aside above.
len and notify_data are an arbitrary blob.
When notifications are later sent out to all clients, this is the payload of that notification message.
If a client has registered with control 114 and then disconnects from ctdbd, ctdbd will broadcast a message to that srvid to all nodes/listeners in the cluster.
A client can register itself with as many different srvids as it wants, but this is handled through a linked list from the client structure, so it is mainly designed for "few notifications per client".
3, a client that no longer wants to have a notification set up can deregister using control
CTDB_CONTROL_DEREGISTER_NOTIFY = 115,
which takes this as arguments :
struct ctdb_client_notify_deregister {
        uint64_t srvid;
};
When a client deregisters, a message will no longer be sent to all other clients when this client disconnects from ctdbd.
(This used to be ctdb commit f1b6ee4a55cdca60f93d992f0431d91bf301af2c)
Add a new tunable to control the maximum queue size we allow to a blocked client before we start discarding REQ_MESSAGES instead of queueing them for delivery.
This avoids queueing up a very large number of MESSAGES that the samba instances send
between each other to nodes that are blocked/banned/stopped for extended periods.
(This used to be ctdb commit f76d6fed8f9630450263b9fa4b5fdf3493fb1e11)
so we can spot if there are leaks.
plug two file descriptor leaks related to failures when sending ARP,
and one leak when we cannot parse the local address during TCP connection establishment
(This used to be ctdb commit ddd089810a14efe4be6e1ff3eccaa604e4913c9e)
In ctdb_client.c:ctdb_transaction_commit(), after a failed
TRANS2_COMMIT control call (for instance due to the 1-second
timeout being exceeded while waiting for a busy node's reply), there is a
1-second gap between the transaction_cancel() and
replay_transaction() calls in which there is no lock on the
persistent db. And because there is no global state in ctdbd
indicating that a transaction is in progress, other nodes
may succeed in starting transactions on the db in this gap and,
even worse, work on top of the possibly already pushed changes.
So the data diverges on the several nodes.
This change fixes this by introducing global state for a transaction
commit being active in the ctdb_db_context struct and in a db_id field
in the client, so that a client keeps track of _which_ tdb it has a
transaction commit running on. These data are set by ctdb upon
entering the trans2_commit control and they are cleared in the
trans2_error or trans2_finished controls. This makes it impossible
to start another transaction or migrate a record to a different
node while a transaction is active on a persistent tdb, including
the retry loop.
This approach is deadlock-free and still allows the recovery process
to be started in the retry gap between cancel and replay.
Also note that this solution does not require any change on the
client side.
This was debugged and developed together with
Stefan Metzmacher <metze@samba.org> - thanks!
Michael
(This used to be ctdb commit f88103516e5ad723062fb95fcb07a128f1069d69)
Otherwise there is the chance that we will reset the statistics to zero after the counter has been incremented (client connects), and when the client disconnects we decrement it to a negative number.
This is a purely cosmetic patch with no operational impact on ctdb.
(This used to be ctdb commit 72f1c696ee77899f7973878f2568a60d199d4fea)
Fix a race between the ctdb tool and the recovery daemon, both trying to
push flag changes across the cluster at the same time.
(This used to be ctdb commit a9a1156ea4e10483a4bf4265b8e9203f0af033aa)
log the type of operation and the database name for all latencies higher
than a threshold
(This used to be ctdb commit 1d581dcd507e8e13d7ae085ff4d6a9f3e2aaeba5)
we currently only monitor that the daemons are running by kill(0, pid)
and by verifying that the domain socket between them is ok.
this is not sufficient since we can have a situation where the recovery
daemon is hung.
this new code monitors that the recovery daemon is operating.
if the recovery daemon hangs, we log this and shut down the main daemon
(This used to be ctdb commit cd69d292292eaab3aac0e9d9fc57cb621597c63c)
This allows ctdb to automatically start a new full blown recovery
if a client has started updating the local tdb for a persistent database
but is kill -9ed before it has ensured the update is distributed clusterwide.
(This used to be ctdb commit 1ffccb3e0b3b5bd376c5302304029af393709518)
This reverts commit bfba5c7249eff8a10a43b53c1b89dd44b625fd10.
revert the waitpid changes. we need to waitpid for some children, so we should
refactor the approach completely
(This used to be ctdb commit 702ced6c2fe569c01fe96c60d0f35a7e61506a96)
so we should not call it from the main daemon.
1, set SIGCHLD to SIG_DFL to make sure we ignore this signal
2, get rid of all waitpid() calls
3, change reporting of event script status code from _exit()/waitpid() to write()/read() one byte across the pipe.
(This used to be ctdb commit bfba5c7249eff8a10a43b53c1b89dd44b625fd10)
If we shutdown the transport and CTDB later decides to send a command out
for queueing, the call to ctdb->methods->allocate_pkt() will SEGV.
This could trigger for example when we are in the process of shutting down CTDBD and have already shutdown the transport but we are still waiting for the
"shutdown" eventscripts to finish.
If the event scripts now take much much longer to execute for some reason, this
race condition becomes much more probable.
Decorate all dereferencing of ctdb->methods-> with a check that ctdb->methods is non-NULL
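The guard looks roughly like this (illustrative types and names, not CTDB's actual definitions):

#include <stdio.h>
#include <stddef.h>

struct transport_methods {
        int (*queue_pkt)(void *transport_data, const char *buf, size_t len);
};

struct daemon_ctx {
        struct transport_methods *methods;   /* NULL once the transport is shut down */
        void *transport_data;
};

static int send_pkt(struct daemon_ctx *ctdb, const char *buf, size_t len)
{
        if (ctdb->methods == NULL) {
                /* benign during shutdown: the transport is already down */
                fprintf(stderr, "Transport is DOWN - dropping packet\n");
                return -1;
        }
        return ctdb->methods->queue_pkt(ctdb->transport_data, buf, len);
}
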
(This used to be ctdb commit c4c2c53918da6fb566d6e9cbd6b02e61ae2921e7)
Add support in AIX to track the PID of a client that connects to the unix domain socket
(This used to be ctdb commit 4c006c675d577d4a45f4db2929af6d50bc28dd9e)
and a ctdb command to pull the talloc memory map from the recovery daemon:
ctdb rddumpmemory
(This used to be ctdb commit d23950be7406cf288f48b660c0f57a9b8d7bdd05)
nodes into two separate files.
move the monitoring of keepalives for detecting connected/disconnected
remote nodes into ctdb_keepalive.c
(This used to be ctdb commit 23a57b20c314d5f11a433cf251eb9d9de743849a)
of the startup event scripts after the point where recovery has
started and the node is in normal operation
This makes the 'startup' script just a special type of the 'monitor'
script which is called first
(This used to be ctdb commit 7424c30a5fd04aea0137c466b4318c3f185280d8)
set the node initially unhealthy and let the status monitoring bring the node online.
This fixes a problem with winbindd, where it refused to start because secrets.tdb was not populated,
but we could not populate it, because the net command would not run while ctdbd was still doing startup
and was thus frozen
(This used to be ctdb commit 3a001b793dd76fb96addf1e2ccb74da326fbcfbc)
see both the old flags as well as the new flags (so we can tell which
flags changed)
send the CTDB_SRVID_RECONFIGURE messages to connected nodes only, not to
every node, connected or not, in the cluster.
in the handler inside the recovery daemon which is invoked for node flag
change messages, only do a takeover_run() and redistribute the ip addresses IF it was the
disabled or the unhealthy flags that changed. Also send out the cluster
reconfigured message in this case.
If any of the other flags changed we don't need to do the takeover_run()
here since that will be done during recovery.
(This used to be ctdb commit 5549b2058e2c148a8ca9d419123acf3247bb8829)