samba-mirror

mirror of https://github.com/samba-team/samba.git synced 2025-01-11 05:18:09 +03:00

Author	SHA1	Message	Date
Amitay Isaacs	d0c858f211	ctdbd: Make sure we don't kill init process by mistake If getpgrp() fails, it will return -1 and that will send KILL signal to init process (PID 1). This does not happen on RHEL, but does on AIX. Reported-by: Chris Cowan <cc@us.ibm.com> Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit edb2a3556d03e248b42f63dd2c62382b723bc98f)	2013-06-14 16:39:48 +10:00
Martin Schwenke	fa16cccf02	ctdbd: Remove the "stopped" event It isn't used, superceded by "ipreallocated". Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit c2bb8596a8af6406ef50e53953884df9d6246a96)	2013-05-06 13:38:21 +10:00
Martin Schwenke	2e59cd5428	ctdbd: New control CTDB_CONTROL_IPREALLOCATED This is an alternative to using ctdb_run_eventscripts() that can be used when in recovery. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 27a44685f0d7a88804b61a1542bb42adc8f88cb1)	2013-05-06 13:38:21 +10:00
Martin Schwenke	f6e48639cd	ctdbd: Avoid freeing non-monitor event callback when monitoring is disabled When running a non-monitor event, check is made for any active monitor events. If there is an active monitor event, then the active monitor event is cancelled. This is done by freeing state->callback which is allocated from monitor_context. When CTDB is stopped or shutdown, monitoring is disabled by freeing monitor_context, which frees callback and then stopped or shutdown event is run. This creates a new callback structure which is allocated at the exact same memory location as the monitor callback which was freed. So in the check for active monitor events, it frees the new callback for non-monitor event. Since the callback function flags successful completion of that event, it is never marked complete and CTDB is stuck in a loop waiting for completion. Move the monitor cancellation to the top of the function so that this can't happen. Follow log snippest highlights the problem. 2013/04/30 16:54:10.673807 [21505]: Received SHUTDOWN command. Stopping CTDB daemon. 2013/04/30 16:54:10.673814 [21505]: Shutting down recovery daemon 2013/04/30 16:54:10.673852 [21505]: server/eventscript.c:696 in remove_callback 0x1c6d5c0 2013/04/30 16:54:10.673858 [21505]: Monitoring has been stopped 2013/04/30 16:54:10.673899 [21505]: server/eventscript.c:594 Sending SIGTERM to child pid:23847 2013/04/30 16:54:10.673913 [21505]: server/eventscript.c:629 searching for callback 0x1c6d5c0 2013/04/30 16:54:10.673932 [21505]: server/eventscript.c:641 running callback 2013/04/30 16:54:10.673939 [21505]: server/eventscript.c:866 in event_script_callback 2013/04/30 16:54:10.673946 [21505]: server/eventscript.c:696 in remove_callback 0x1c6d5c0 Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 05f785b51cfd8b22b3ae35bf034127fbc07005be)	2013-05-06 13:00:07 +10:00
Martin Schwenke	37632efde0	ctdbd: Don't use a fixed length buffer for the hung script command The amount of data to write into the buffer wasn't constrained anywhere... Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 9b0d56b16775aa16f33bdfdf831256e085fa3339)	2013-02-05 16:05:13 +11:00
Martin Schwenke	e883720461	ctdbd: Complain loudly if CTDB_DEBUG_HUNG_SCRIPT script isn't executable This is quite easy to misconfigure by failing to set the execute bit on the script. Better to complain loudly. This is a debugging facilty rather than core CTDB functionality, so it doesn't need a subtle mechanism to disable it at run-time. To disable the designated script at run-time either edit it to put an "exit 0" at the top or move it aside and symlink to /bin/true. This is implemented by actually removing the code that checks that the file exists and is executable. The output from the shell when the system() function fails is just as useful. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 3400b2ed34b6eb9496eb55f1aab6f89d2952060d)	2013-02-05 16:05:13 +11:00
Martin Schwenke	bc5f0a2b65	ctdbd: Remove command-line option --debug-hung-script Use an environment variable instead. This just means that the initscript exports CTDB_DEBUG_HUNG_SCRIPT and the code checks for the environment variable. The justification for this simplification is that more debug options will be arriving soon and we want to handle them consistently without needing to add a command-line option for each. So, the convention will be to use an environment variable for each debug option. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 0581f9a84e58764d194f4e04064c2c5b393c348b)	2013-02-05 16:05:13 +11:00
Martin Schwenke	f2428cadd8	ctdbd: Remove debug_hung_script_ctx The only allocation against this context is by ctdb_fork_with_logging(). This memory is freed by ctdb_log_handler() anyway. There should be no memory leak. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 501461cc3e132d4adee9e91b5d4513a26bae2846)	2013-02-05 16:05:13 +11:00
Amitay Isaacs	4a6fa39ff9	daemon: Protect against double free of callback state while shutting down When CTDB is shut down and monitoring has been stopped, monitor_context gets freed and all the callback states hanging off it. This includes callback state for current_monitor, if the current monitor event has not yet finished. As a result, when the shutdown event is called, current_monitor->callback state is not NULL, but it's actually freed and it's a dangling reference. So before executing callback function and freeing callback state check if ctdb->monitor->monitor_context is not NULL. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 7d8546ee4353851f0543d0ca2c4c67cb0cc75aea)	2013-01-09 14:39:23 +11:00
Martin Schwenke	199b971f57	ctdbd: Remove references to forcing running of eventscripts from log messages Running of eventscripts can be initiated from many places, including the recovery daemon. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 440892d75ef73c0aca22f47c0c01712be00cf5b7)	2012-10-18 20:05:43 +11:00
Martin Schwenke	65725d30d4	ctdbd: Remove the worked "Forced" from message about running eventscripts The eventscripts are run after a takeover run and in this case they're not forced. The messages seems to imply that somone has run "ctdb eventscript" when that is not necessarily the case. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 3880589db4d563e438126cf5080261fa06b9e242)	2012-07-26 22:10:54 +10:00
Ronnie Sahlberg	dce5969d12	Debug: When scripts hang, we may need to collect additional data in order to debug why the script hung. Break this debug and datacollection out into an external script to make it easier to modify what data we need to collect. For now we only collect a pstree so we can see what part of the script we hung in. S1037271 (This used to be ctdb commit 6e68797af67bee36f2bad045f94806e7e98f27e9)	2012-05-17 10:29:03 +10:00
Ronnie Sahlberg	a57eba2bb4	Track all child process so we never send a signal to an unrelated process (our child died and kernel wrapped the pid-space and reused the pid for a different process Wrap all creation of child processes inside ctdb_fork() which is used to track all processes we have spawned. Capture SIGCHLD to track also which child processes have terminated. Wrap kill() inside ctdb_kill() and make sure that we never send a !0 signal to a child process pid that has already terminated (and might have been replaced with a (This used to be ctdb commit f73a4b1495830bcdd094a93732a89dd53b3c2f78)	2012-05-03 14:03:26 +10:00
Amitay Isaacs	4392591555	Remove explicit include of lib/tevent/tevent.h. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 0681014ca5ed2a9b56f63fdace7f894beccf8a9a)	2012-04-13 17:28:14 +10:00
Ronnie Sahlberg	93ec9c589c	Eventscripts: remove the horrible horrible circular reference between state and callback since these two structures do not even share the same parent talloc context. Instead, tie them together via referencing a permanent linked list hung off the ctdb structure. (This used to be ctdb commit a95c02da6c67dc4bd8716b75318a4188301df6f9)	2012-02-23 06:49:47 +11:00
Ronnie Sahlberg	0581fd85e6	Eventscripts: Add special -ECANCELED status for monitor events that are cancelled When a monitor event is canceled by a higher priority script, make sure we return status -ECANCELED to the callback in ctdB_monitor.c Also treat -ECANCELED as a simple "try monitor event again" and skip modifying any HEALTHY/UNHEALTHY flags when this happens (This used to be ctdb commit a15ec57c26d1bc82af85f74eebae0bd8abde3233)	2011-11-18 12:22:22 +11:00
Ronnie Sahlberg	2902203900	Logging: when we log stdout/stderr messages from eventscripts to the system log, prefix every line of output with the name of the eventscript. CQ S1028412 (This used to be ctdb commit 392363c04185f47a826fc6ed95038342be2150bf)	2011-08-26 09:39:25 +10:00
Rusty Russell	87ea4818bf	eventscript: fix callback after free ctdb_event_script_callback() takes a mem_ctx arg which it doesn't use, but the implication is pretty clear, that when that mem_ctx is freed, the callback shouldn't happen. Indeed, Ronnie reproduced a case where that callback refers to freed memory, in the ip reallocation code under stress. So attach the callback to the mem_ctx they give us, and remove it from the script state structure when that's freed. It's a bit weird, but it works. CQ: S1026179 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 6fcd867cc835ef1ffc1c50964f135c346503d40c)	2011-07-29 08:50:39 +10:00
Ronnie Sahlberg	2f1395ce03	If the eventscript is finished but state->ctdb is NULL, log an error and return. (Need to find root cause for this is soo too.) (This used to be ctdb commit 2e80d53b73fcba58ed5a72bab66c051691ccf719)	2011-04-12 06:36:42 +10:00
Ronnie Sahlberg	c4006ce844	Add ctdb_fork(0 which will fork a child process and drop the real-time scheduler for the child. Use ctdb_fork() from callers where we dont want the child to be running at real-time privilege. (This used to be ctdb commit 58795a4c9e0624e20fa3e0023b65127053edd103)	2011-01-11 07:40:41 +11:00
Ronnie Sahlberg	c95f4258d8	Add a new event "ipreallocated" This is called everytime a reallocation is performed. While STARTRECOVERY/RECOVERED events are only called when we do ipreallocation as part of a full database/cluster recovery, this new event can be used to trigger on when we just do a light failover due to a node becomming unhealthy. I.e. situations where we do a failover but we do not perform a full cluster recovery. Use this to trigger for natgw so we select a new natgw master node when failover happens and not just when cluster rebuilds happen. (This used to be ctdb commit 7f4c591388adae20e98984001385cba26598ec67)	2010-08-30 18:09:30 +10:00
Ronnie Sahlberg	2e8aac6689	Merge commit 'rusty/ports-from-1.0.112' into foo (This used to be ctdb commit 13e58d92f5f1723e850a82ae030d0ca57e89b1ee)	2010-08-19 13:17:56 +10:00
Rusty Russell	9fbb191b78	logging: give a unique logging name to each forked child. This means we can distinguish which child is logging, esp. via syslog where we have no pid. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 68b3761a0874429b90731741f0531f76dcfbb081)	2010-08-18 11:46:32 +09:30
Rusty Russell	f93440c4b7	event: Update events to latest Samba version 0.9.8 In Samba this is now called "tevent", and while we use the backwards compatibility wrappers they don't offer EVENT_FD_AUTOCLOSE: that is now a separate tevent_fd_set_auto_close() function. This is based on Samba version `7f29f817fa`. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 85e5e760cc91eb3157d3a88996ce474491646726)	2010-08-18 09:16:31 +09:30
Rusty Russell	6da848f31c	eventscript: simplify script timeout handling Now the script child signal handler doesn't do anything, we can unify the "timeout" and "abort" cases introduced in 9dd25cb751919799. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 439f049c7024d69aa4b87dc811e1772981ad29cb)	2010-04-08 15:11:05 +09:30
Rusty Russell	a9c8b9e89b	eventscript: wait for debugging dump before killing timedout script Fairly simple: prevent the destructor from killing the script, and do it explicitly from the debugging child. We can remove the extra "already dead" test, since this will be detected in the destructor anyway. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit f8aa83788e3cc10ab7655a90d7b7b17ddbe48685)	2010-04-08 15:09:08 +09:30
Rusty Russell	e1b59b6a47	eventscript: don't do debugging system() from inside signal handler In the case of a timeout, we dump a log of what's happening to a file in /tmp. We do it from the signal handler, which is an unreliable hack (BZ58365). Instead, create another (lower-priority) child to do the dump, then kill the timedout script. Note that this doesn't quite work as intended (the dump is often run after the script has been killed), so the next patch resolves this. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 7ee5ecc8d53e78e2dec21197b74a74cc4ae1834c)	2010-04-08 15:13:29 +09:30
Rusty Russell	037dfbb8ad	eventscript: fix case where we fail to create child for some reason Initialize the child pid to 0 so destructor doesn't try to kill it: server/eventscript.c:565 Sending SIGTERM to child pid:139742328 Failed to kill child process for eventscript, errno No such process(3) Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit fcc63e04beb427c1f48deae6d3d98c78a2a67949)	2010-04-08 10:35:04 +09:30
Ronnie Sahlberg	e179910136	When we forcefully abort a running eventscript, dont log this as is the script timedout. Instead send a different signal (SIGABRT) to the child process to silently kill the process group for the script and its children without logging anything. We abort any running "monitor" script anytime any other event is generated either by ctdbd itself or by "ctdb eventscript ..." BZ61043 (This used to be ctdb commit 9dd25cb751919799af9d8a23a0725343a8400e58)	2010-03-30 12:47:54 +11:00
Stefan Metzmacher	3419e9c4dd	server: add "setup" event This is needed because the "init" event can't use 'ctdb' commands. metze (This used to be ctdb commit 1493436b6b24eb05a23b7a339071ad85f70de8f4)	2010-02-23 10:38:49 +01:00
Ronnie Sahlberg	68decc38ca	Ignore any scripts that timesout for most events, except startup. Threat hung scripts always (except startup) as success. (This used to be ctdb commit b6d939c9758c7d2e39206838492f2f644dd61db7)	2010-02-16 11:21:27 +11:00
Ronnie Sahlberg	96a61ca907	Reduce loglevel for two eventscript related debug messages (This used to be ctdb commit f8994790e65baebb81bbfad646cdda6234b6d29a)	2010-02-16 11:02:11 +11:00
Stefan Metzmacher	98ee69c66d	server: add updateip event metze (This used to be ctdb commit 712ed0c4c0bff1be9e96a54b62512787a4aa6259)	2010-01-20 11:11:01 +01:00
Stefan Metzmacher	fd06167caa	server: add "init" event This is needed because the "startup" event runs after the initial recovery, but we need to do some actions before the initial recovery. metze (This used to be ctdb commit e953808449c102258abb6cba6f4abf486dda3b82)	2010-01-20 09:44:36 +01:00
Rusty Russell	565b2cda11	eventscript: fix bug when script is aborted Another corner case when we terminate running monitor scripts to run something else: logging can flush the output and we write to a NULL pointer. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit eb22c34bccc8a04fcf63efa2bc48d9788709382e)	2009-12-18 14:48:41 +11:00
Rusty Russell	e757b7c4bf	eventscript: remove cb_status, fix uninitialized bug when monitoring aborted (Reapplied with merge after accidental revert) Previously we updated cb_status a each script finished. Since we're storing the status anyway, we can calculate it by iterating the scripts array itself, providing clear and uniform behavior on all code paths. In particular, this fixes a longstanding bug when we abort monitor scripts to run some other script: the cb_status was uninitialized. In this case, we need to hand something to the callback; 0 might make us go healthy when we shouldn't. So we use the last status (normally, this will be the just-saved current status). In addition, we make the case of failing the first fork for the script and failing other script forks the same: the error is returned via the callback and saved for viewing through 'ctdb scriptstatus'. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 2c84fe393ff2b961abf77d58a371c24db5ecb93b)	2009-12-18 14:48:35 +11:00
Rusty Russell	4dce0690de	eventscript: fix cleanup path when setting up script list We shouldn't set ctdb->current_monitor until we set destructor: that's what cleans it up. Also, free state->scripts on no-scripts exit path: it's not a child of state because we need it in the destructor. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 843a2ed5ef85f628788b0caf7417c6b61b5c6d3f)	2009-12-18 12:31:34 +11:00
Ronnie Sahlberg	9b507abd6e	version 1.0.109 (This used to be ctdb commit 99894a70fe2ebfe43daae7e88ff0fc9cab33e0fb)	2009-12-17 15:49:01 +11:00
Rusty Russell	8aec7e5656	eventscript: remove cb_status, fix uninitialized bug when monitoring aborted Previously we updated cb_status a each script finished. Since we're storing the status anyway, we can calculate it by iterating the scripts array itself, providing clear and uniform behavior on all code paths. In particular, this fixes a longstanding bug when we abort monitor scripts to run some other script: the cb_status was uninitialized. In this case, we need to hand something to the callback; 0 might make us go healthy when we shouldn't. So we use the last status (normally, this will be the just-saved current status). In addition, we make the case of failing the first fork for the script and failing other script forks the same: the error is returned via the callback and saved for viewing through 'ctdb scriptstatus'. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 5d50f0e16948d18009f6623f132113f7273efc7f)	2009-12-17 15:39:46 +11:00
Rusty Russell	f148735928	Add --valgringing flag instead of --nosetsched The do_setsched was being tested for whether to mmap tdbs: let's make it explicit. We can also happily move the kill-child eventscript hack under this flag. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 2ee86cc1f311d7b7504c7b14d142b9c4f6f4b469)	2009-12-16 20:59:15 +10:30
Ronnie Sahlberg	b3104bd1d0	Author: Rusty Russell <rusty@rustcorp.com.au> Date: Tue Dec 15 15:53:30 2009 +1030 eventscript: hack to avoid overloading valgrind Now we fork one child per script, when running under valgrind the load gets quite high. This is because valgrind does a lot of work after exit, and we don't wait for the children to finish; we start the next one when the child reports status via the pipe. This fix is ugly, but simple. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 6ed34d5320c39d8a55f2a36ad4c1ab574e0b0796)	2009-12-15 20:56:16 +11:00
Rusty Russell	784fa9fd8a	eventscript: fix monitoring when killed by another script command Commit c1ba1392fe "eventscript: get rid of ctdb_control_event_script_finished altogether" was wrong: there is one case where we want to free the script without transferring their status to last_status. This happens because we always kill an running monitor command when we run any other command. This still isn't quite right (and never was): the callback will be called with status value 0, which might flip us to HEALTHY if we were unhealthy. This is conveniently fixed in my next set of patches :) Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 0ea0e27d93398df997d3df9d8bf112358af3a4a5)	2009-12-14 15:46:14 +11:00
Rusty Russell	a46c3b4f2a	ctdb: scriptstatus can now query non-monitor events We also no longer return an error before scripts have been run; a special zero-length data means we have never run the scripts. "ctdb scriptstatus all" returns all event script results. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 9b90d671581e390e2892d3a68f3ca98d58bef4df)	2009-12-08 01:50:55 +10:30
Rusty Russell	5d99a1a47c	eventscript: expost call names and enum We're going to need this so ctdb can query non-monitor status. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 53bc5ca23ca55a3ac63a440051f16716944a2a51)	2009-12-08 01:47:13 +10:30
Rusty Russell	0dbe76f88f	eventscript: lock logging on timeout. Ronnie suggested this; seems like a very good idea. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 93153bca68926401dc9ae7fd77ed3f17be923344)	2009-12-08 01:32:36 +10:30
Rusty Russell	b29067b02f	eventscript: get rid of ctdb_control_event_script_finished altogether We always have to call it before freeing the state; we should just do this work in the destructor itself. Unfortunately, the script state would already be freed by the time the state destructor is called, so we make the script state a child of ctdb, and talloc_free() it manually on the one path which doesn't use the destructor. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit c1ba1392fe52762960e896ace0aca0ee4faa94d5)	2009-12-08 12:29:10 +10:30
Rusty Russell	d3593c2f83	eventscript: save state for all script invocations Rather than only tranferring to last_status for monitor events, do it for every event (ctdb->last_status is now an array). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit c73ea56275d4be76f7ed983d7565b20237dbdce3)	2009-12-08 12:27:48 +10:30
Rusty Russell	6960fa96eb	eventscript: cleanup finished to take state arg We only need ctdb->current_monitor so we can kill it when we want to run something else; we don't need to use it here as we always know what script we are running. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 4cf1b7c32bcf7e4b65aec1fa7ee1a4b162cac889)	2009-12-08 12:24:56 +10:30
Rusty Russell	e548a335bd	eventscript: use wire format internally for script status. The only difference between the exposed an internal structure now is that the name and output fields were pointers. Switch to using ctdb_scripts_wire/ctdb_script_wire internally as well so marshalling is a noop. We now reject scripts which are too long and truncate logging to the 511 characters we have space for (the entire output will be in the normal ctdbd log). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit fd2f04554e604bc421806be96b987e601473a9b8)	2009-12-08 12:48:17 +10:30
Rusty Russell	9753b7e793	eventscript: rename ctdb_monitoring_wire to ctdb_scripts_wire We're going to allow fetching status of all script runs, so this name is no longer appropriate. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit f5cb41ecf3fa986b8af243e8546eb3b985cd902a)	2009-12-08 00:51:24 +10:30

1 2 3 4

162 Commits