samba-mirror

mirror of https://github.com/samba-team/samba.git synced 2025-01-11 05:18:09 +03:00

Author	SHA1	Message	Date
Amitay Isaacs	1c21f37e57	ctdbd: Set process names for child processes This helps distinguish processes in process list in top, perf, etc. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 2493f57ce268d6fe7e4c40a87852c347fd60d29e)	2013-07-10 14:33:19 +10:00
Amitay Isaacs	c6914e3891	banning: Make ctdb_local_node_got_banned() a void function When this function is called, we are already committed to banning and there is no point in failing this function. In case, freezing of databases fails, it will be fixed from recovery daemon. (This used to be ctdb commit bb178338658b4ae32382a1f62f7c21cee1d4878f)	2013-07-02 12:59:08 +10:00
Martin Schwenke	44e885e98e	ctdbd: Fix panic on overlapping shutdowns The runstate can't be set to SHUTDOWN twice, so the current naive code causes a panic on the 2nd shutdown. This regression was introduced in commit 8076773a9924dcf8aff16f7d96b2b9ac383ecc28. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit f1b7ca8dc3f34a59c7b3e55748f974ac9ed8f458)	2013-06-22 15:51:16 +10:00
Martin Schwenke	6a52a87028	ctdbd: Refactor shutdown sequence Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit b32fd04bfbf33062d45365b37a7247e272a76ceb)	2013-06-22 15:51:02 +10:00
Martin Schwenke	6d9667f01c	ctdbd: Add new runstate CTDB_RUNSTATE_FIRST_RECOVERY This adds more serialisation to the startup, ensuring that the "startup" event runs after everything to do with the first recovery (including the "recovered" event). Given that it now takes longer to get to the "startup" state, the initscript needs to wait until ctdbd gets to "first_recovery". Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit ed6814ff0a59ddbb1c1b3128b505380f60d7aeb7)	2013-05-24 14:08:07 +10:00
Martin Schwenke	63577c96db	ctdbd: Replace ctdb->done_startup with ctdb->runstate This allows states, including startup and shutdown states, to be clearly tracked. This doesn't include regular runtime "states", which are handled by node flags. Introduce new functions ctdb_set_runstate(), runstate_to_string() and runstate_from_string(). Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 8076773a9924dcf8aff16f7d96b2b9ac383ecc28)	2013-05-24 14:08:06 +10:00
Amitay Isaacs	4a6fa39ff9	daemon: Protect against double free of callback state while shutting down When CTDB is shut down and monitoring has been stopped, monitor_context gets freed and all the callback states hanging off it. This includes callback state for current_monitor, if the current monitor event has not yet finished. As a result, when the shutdown event is called, current_monitor->callback state is not NULL, but it's actually freed and it's a dangling reference. So before executing callback function and freeing callback state check if ctdb->monitor->monitor_context is not NULL. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 7d8546ee4353851f0543d0ca2c4c67cb0cc75aea)	2013-01-09 14:39:23 +11:00
Amitay Isaacs	4392591555	Remove explicit include of lib/tevent/tevent.h. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 0681014ca5ed2a9b56f63fdace7f894beccf8a9a)	2012-04-13 17:28:14 +10:00
Ronnie Sahlberg	0581fd85e6	Eventscripts: Add special -ECANCELED status for monitor events that are cancelled When a monitor event is canceled by a higher priority script, make sure we return status -ECANCELED to the callback in ctdB_monitor.c Also treat -ECANCELED as a simple "try monitor event again" and skip modifying any HEALTHY/UNHEALTHY flags when this happens (This used to be ctdb commit a15ec57c26d1bc82af85f74eebae0bd8abde3233)	2011-11-18 12:22:22 +11:00
Ronnie Sahlberg	a9eba762d7	remove a non-error logmessage about persistent databases being healthy, as expected S1026492 (This used to be ctdb commit da9e02085523e27fa29e35c60034f6a8aaaa81e8)	2011-08-04 13:49:48 +10:00
Ronnie Sahlberg	629f4da55a	remove a log message we dont need about "allow clients to attach to databases" S1026492 (This used to be ctdb commit 42c3e4c5216000c370814441e38c7a8180047aaf)	2011-08-04 13:49:38 +10:00
Ronnie Sahlberg	ae35e9e5b2	Cleanup of logging messages/spamming Reduce an infomational message about not performing ip reallocation from NOTICE(the default) to INFO. These messages are normal during startup or when stopped/banned when we will be in recovery mode for a while. Remove a messager in the loop waiting for initial startup to complete about the generation being invalid. It is always invalid at this stage before we have finished initial recovery. Rate-limit the informational messages for CTDB_WAIT_UNTIL_RECOVERED so that we only print them once per second for the first 60 seconds and after that only once per 10 minutes. These messages are normal during startup, but we should not be logging them every second for cases where we will remain in recovery mode during startup for an extended period of time. Such as if suspended or permabanned. CQ S1023302 (This used to be ctdb commit 3a0af8780dc595acbed880f288fcbc4f62c862fb)	2011-05-04 10:42:32 +10:00
Ronnie Sahlberg	3cc230b5ee	Dont allow clients to connect to databases untile we are well past and through the initial recovery phase CQ S1022412 Signed-off-by: Michael Adam <obnox@samba.org> (This used to be ctdb commit e02bbd915b7151c615ff64f09ad9abc9720bef7d)	2011-03-14 13:35:53 +01:00
Ronnie Sahlberg	40bd94bd5e	If the node is stopped, put a log entry in /var/log/* to indicate this is why we never become ready (This used to be ctdb commit ef1de8211f83259ea37dcd57562139a3b63d9631)	2011-02-02 14:09:56 +11:00
Ronnie Sahlberg	c4006ce844	Add ctdb_fork(0 which will fork a child process and drop the real-time scheduler for the child. Use ctdb_fork() from callers where we dont want the child to be running at real-time privilege. (This used to be ctdb commit 58795a4c9e0624e20fa3e0023b65127053edd103)	2011-01-11 07:40:41 +11:00
Stefan Metzmacher	0b5bd411ca	server/banning: also release all ips if we're banning ourself metze (This used to be ctdb commit c386f2c62f06f1c60047b7d4b1ec7a9eec11873c)	2010-09-14 15:50:31 +10:00
Stefan Metzmacher	96ddf2f607	server/monitor: ask for a takeoverrun after propagating our new flags metze (This used to be ctdb commit 942f44123350d4d0c4ad7f3fcd5ff2d0d175739b)	2010-09-14 15:48:10 +10:00
Ronnie Sahlberg	e040a966af	Dont set next_interval to 0. This can cause ctdbd to spin at 100% in the eventsystem, creating a timed event that will immediately trigger again and again. On uniprocessors this cause the eventscript we are actually waiting for to basically become cpu starved and never complete. (This used to be ctdb commit 92c8408fba957a8ded13f7e285da290502735234)	2010-08-20 15:00:45 +10:00
Ronnie Sahlberg	2e8aac6689	Merge commit 'rusty/ports-from-1.0.112' into foo (This used to be ctdb commit 13e58d92f5f1723e850a82ae030d0ca57e89b1ee)	2010-08-19 13:17:56 +10:00
Rusty Russell	9fbb191b78	logging: give a unique logging name to each forked child. This means we can distinguish which child is logging, esp. via syslog where we have no pid. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 68b3761a0874429b90731741f0531f76dcfbb081)	2010-08-18 11:46:32 +09:30
Rusty Russell	f93440c4b7	event: Update events to latest Samba version 0.9.8 In Samba this is now called "tevent", and while we use the backwards compatibility wrappers they don't offer EVENT_FD_AUTOCLOSE: that is now a separate tevent_fd_set_auto_close() function. This is based on Samba version `7f29f817fa`. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 85e5e760cc91eb3157d3a88996ce474491646726)	2010-08-18 09:16:31 +09:30
Rusty Russell	8946028a07	speed startup: add --sloppy-start. The extra recovery interval wait was introduced in 821333afb458 but no explanation was provided in that message. Nonetheless, if starting the entire cluster for the first time, it should be safe to skip this. We use the commandline arg --sloppy-start which should discourage people from using it outside testing. Seconds between ctdbd first log message and node healthy: BEFORE: 16.10 AFTER: 4.03 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 509e2e89ae233a0e91998d95267bf62f296a73cd)	2010-06-22 22:52:34 +09:30
Rusty Russell	ed31caffab	speed startup: run startup immediately after recovery finished. Seconds between ctdbd first log message and node healthy: BEFORE: 17.08 AFTER: 16.10 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 372201d418f041d69646793105f6898ab12a7d91)	2010-06-22 22:50:45 +09:30
Rusty Russell	eb61b11497	speed startup: immediately run first monitor event after startup. Once we've done a startup, we need to run a monitor event successfully to be marked as healthy. Rather than wait the usual 5 seconds, run it immediately (which will then reset next_interval to 5 seconds). Seconds between ctdbd first log message and node healthy: BEFORE: 23.58 AFTER: 18.09 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit c8651494febcb1c9e558b2002e2a72c2bf547c06)	2010-06-22 22:50:07 +09:30
Stefan Metzmacher	fd06167caa	server: add "init" event This is needed because the "startup" event runs after the initial recovery, but we need to do some actions before the initial recovery. metze (This used to be ctdb commit e953808449c102258abb6cba6f4abf486dda3b82)	2010-01-20 09:44:36 +01:00
Stefan Metzmacher	94bc40307a	server: Use tdb_check to verify persistent tdbs on startup Depending on --max-persistent-check-errors we allow ctdb to start with unhealthy persistent databases. The default is 0 which means to reject a startup with unhealthy dbs. The health of the persistent databases is checked after each recovery. Node monitoring and the "startup" is deferred until all persistent databases are healthy. Databases can become healthy automaticly by a completely HEALTHY node joining the cluster. Or by an administrator with "ctdb backupdb/restoredb" or "ctdb wipedb". metze (This used to be ctdb commit 15f133d5150ed1badb4fef7d644f10cd08a25cb5)	2009-12-16 08:06:10 +01:00
Ronnie Sahlberg	649ba2631d	Rename the tunable EventScriptBanCount to EventScriptTimeoutCount since we no longer ban nodes when dodgy scripts continue to hang. We now only mark nodes as unhealthy if monitor events fail or timeout. Never ban. (This used to be ctdb commit 5c8e56fc7a518e115bceac257867739283cf6a1e)	2009-12-14 15:53:23 +11:00
Ronnie Sahlberg	e76561f544	remove the variable "disable when unhealthy" there is no rational need for a setting where we permanently mark nodes as disabled everytime an eventscript fails (This used to be ctdb commit 68a8ee99b128a5ec883600735626bdb3bbc9c503)	2009-12-14 15:40:54 +11:00
Rusty Russell	9914d3f561	eventscript: don't make ourselves healthy if we're under ban_count If we've timed out, but we've not timed out more than ctdb->tunable.script_ban_count, we pretend we haven't. There's a logic bug in the way this is done: if we were unhealthy before, this would set us to "healthy" again (status == 0). I don't think this would happen in real life, but it's a little surprising. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit e6488c0e05bab5c4c2c0a6370930b0b27e5ed56e)	2009-12-07 23:52:01 +10:30
Rusty Russell	928b8dcb31	eventscript: handle banning within the callbacks Currently the timeout handler in eventscript.c does the banning if a timeout happens. However, because monitor events are different, it has to special case them. As we call the callback anyway in this case, we should make that handle -ETIME as it sees fit: for everyone but the monitor event, we simply ban ourselves. The more complicated monitor event banning logic is now in ctdb_monitor.c where it belongs. Note: I wrapped the other bans in "if (status == -ETIME)", though they should probably ban themselves on any error. This change should be a noop. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 9ecee127e19a9e7cae114a66f3514ee7a75276c5)	2009-12-07 23:48:57 +10:30
Ronnie Sahlberg	698a0e4e9a	When starting up ctdbd, wait until all initial recoveries have finished and until we have gone through a full re-recovery timeout without triggering any pending recoveries before we start up the services and start monitoring the node. (This used to be ctdb commit 821333afb458358f90446062b0242790695e5060)	2009-12-01 13:19:58 +11:00
Martin Schwenke	a64ccf07c1	Add flag to ctdb_event_script_callback indicating when called by client. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit a1d654a982ca56fade82552f4e6b5586236d3233)	2009-11-26 15:49:49 +11:00
Rusty Russell	2d9254404d	eventscript: introduce enum for different event script calls. Rather than doing strcmp everywhere, pass an explicit enum around. This also subtly documents what options are available. The "options" arg is now used for extra arguments only. Unfortunately, gcc complains on empty format strings, so we make ctdb_event_script() take no varargs, and add ctdb_event_script_args(). We leave ctdb_event_script_callback() taking varargs, which means callers have to do "%s", "". For the moment, we have CTDB_EVENT_UNKNOWN for handling forced scripts from the ctdb tool. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 8001488be4f2beb25e943fe01b2afc2e8779930d)	2009-11-24 11:16:49 +10:30
Rusty Russell	2763df22de	eventscript: put timeout inside ctdb_event_script_callback_v Everyone uses the same timeout value, so just remove it from the API. If we ever need variable timeouts, that might as well be central too. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 533c3e053293941d2a9484b495e78d45f478bb08)	2009-11-24 11:09:46 +10:30
Ronnie Sahlberg	e07ca41886	change the eventscript handling to allow EventScriptTimeout for each individual script isntead of for the entire set of scripts restructure the talloc hierarchy to allow this (This used to be ctdb commit 64da4402c6ad485f1d0a604878a7b0c01a0ea5f0)	2009-10-28 16:11:54 +11:00
Ronnie Sahlberg	1d7681709b	dont run the monitor event so frequently after a event has failed. use _exit() instead of exit() when terminating an eventscript. (This used to be ctdb commit cc30ee2f4f33cb75b2be980c2d4dff6c7c23852f)	2009-10-27 13:51:45 +11:00
Stefan Metzmacher	198866d82d	server: if takeover runs when the recovery master becomes unhealthy The problem was this: When the monitor event fails, the node->flags get updated, and an update (containing the old and new flags) is sent to the recovery master. If the recovery master sends the update to itself (the same process), it was compairing the node->flags variable with the received new flags. This check always found both flag values to be equal and never sets the rec->need_takeover_run variable to true. There were two problem, first the push_flags_handler() function didn't pass the received old flags. And the ctdb_control_modflags() function ignored the received old flags. metze (This used to be ctdb commit 8ec633b64a05a2d903c2b9639909f15f6375548f)	2009-10-26 14:21:45 +11:00
Ronnie Sahlberg	e627fae600	if a lock wait child died/finished, we could have released the lockwait handle and set it to NULL before we call the destructors for releaseing the waiters. The waiters reference the locakwait handle in order to remove itself from the li nked list which caused a SEGV. We dont actually need to remove ourselves from this list here since if the parent freeze_handle holding the list is freed, then all waiters are rele ased as well, and the only place we actually need to relink the waiter is in ctd b_freeze_lock_handler, where we want to respond back to the clients and release the waiters but we still want to keep the freeze_handle hanging around. (This used to be ctdb commit e01ab46bafad09a5e320d420734db129d35863bc)	2009-10-22 13:41:28 +11:00
Ronnie Sahlberg	598419e57b	Dont run eventscript monitor when the databases are frozen. The databases can become frozen a while before we do the actual recovery since we have the re-recovery timeout. There is no point in doing much monitoring if we are waiting for a recovery, or if we are banned. This will eliminate some annoying log entries where certain tests will fail if the databases are locked. (This used to be ctdb commit ff824676fab94168707aada7423ae766bc0f711c)	2009-10-15 16:03:43 +11:00
Ronnie Sahlberg	80be59d35e	when we change state between healthy/unhealthy, make sure we ask the recovery master to perform an explicit ip reallocation. This is more reliable and faster than having the recovery dameon track these changes, and since we now have an explicit method to ask the recovery daemon to perform an explicit ip reallocation, we should use this. (This used to be ctdb commit 3807681e74f4bfe92befdae6ed616ff5f1a99880)	2009-10-14 11:59:16 +11:00
Ronnie Sahlberg	73c0adb029	initial attempt at freezing databases in priority order (This used to be ctdb commit e8d692590da1070c87a4144031e3306d190ebed2)	2009-10-12 12:08:39 +11:00
Ronnie Sahlberg	e90dd8015f	add a new notification to trigger on when ctdb has started (This used to be ctdb commit b1fe04f2e9447f762a0b805763deb29296585ff8)	2009-10-01 14:05:30 +10:00
Ronnie Sahlberg	cda5f02c7c	new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a)	2009-09-04 02:20:39 +10:00
Ronnie Sahlberg	41a519191e	dont let other nodes modify the STOPPED flag for the local process when pushing out flags changes (This used to be ctdb commit 501a2747d839ca291b70c761098549cf6d47a158)	2009-07-09 13:20:14 +10:00
Ronnie Sahlberg	d1c40424f6	When we ban a node, only drop the IPs on the node being banned, not on every node (This used to be ctdb commit 46e8c3737e6ff54fc80de8e962e922924c27bc35)	2009-06-10 10:35:20 +10:00
Ronnie Sahlberg	ad40ee25f9	add a mechanism where the ctdb daemon will run a usercontrolled script when the node status changes to/from UNHEALTHY state. This would allow a sysadmin to set up ctdb to send an email/snmptrap/... when the status of the node changes. (This used to be ctdb commit ce534a83a05dbd40238e4eee0669d60ff396f935)	2009-03-31 14:23:31 +11:00
Ronnie Sahlberg	94a56ea410	reqrite the handling of flag updates across the cluster to eliminate a race between the ctdb tool and the recovery daemon both at once trying to push flag changes across the cluster. (This used to be ctdb commit a9a1156ea4e10483a4bf4265b8e9203f0af033aa)	2008-11-20 12:43:18 +11:00
Ronnie Sahlberg	f4fd4d0af8	dont disable/enable monitoring for each eventscript, instead just disable the monitoring during the "startrecovery" event and enable it again once recovery has completed (This used to be ctdb commit 68029894f80804c9f31fc90ed0c1b58f75812c3d)	2008-05-16 08:20:40 +10:00
Ronnie Sahlberg	e8e67ef576	add a mechanism to force a node to run the eventscripts with arbitrary arguments ctdb eventscript "command argument argument ..." (This used to be ctdb commit 118a16e763d8332c6ce4d8b8e194775fb874c8c8)	2008-04-02 11:13:30 +11:00
Ronnie Sahlberg	7bc8007f93	add a new tunable DisableWhenUnhealthy which when set will cause a node to automatically become DISABLED anytime monitoring fails and the node becomes UNHEALTHY. Use with caution. (This used to be ctdb commit c20293360db67f9876b0c84e5e9e12a5868964cb)	2008-02-22 10:33:09 +11:00

1 2

76 Commits