1
0
mirror of https://github.com/samba-team/samba.git synced 2025-01-11 05:18:09 +03:00
Commit Graph

193 Commits

Author SHA1 Message Date
Ronnie Sahlberg
c4006ce844 Add ctdb_fork(0 which will fork a child process and drop the real-time
scheduler for the child.

Use ctdb_fork() from callers where we dont want the child to be running
at real-time privilege.

(This used to be ctdb commit 58795a4c9e0624e20fa3e0023b65127053edd103)
2011-01-11 07:40:41 +11:00
Ronnie Sahlberg
c95f4258d8 Add a new event "ipreallocated"
This is called everytime a reallocation is performed.

    While STARTRECOVERY/RECOVERED events are only called when
    we do ipreallocation as part of a full database/cluster recovery,
    this new event can be used to trigger on when we just do a light
    failover due to a node becomming unhealthy.

    I.e. situations where we do a failover but we do not perform a full
    cluster recovery.

    Use this to trigger for natgw so we select a new natgw master node
    when failover happens and not just when cluster rebuilds happen.

(This used to be ctdb commit 7f4c591388adae20e98984001385cba26598ec67)
2010-08-30 18:09:30 +10:00
Ronnie Sahlberg
2e8aac6689 Merge commit 'rusty/ports-from-1.0.112' into foo
(This used to be ctdb commit 13e58d92f5f1723e850a82ae030d0ca57e89b1ee)
2010-08-19 13:17:56 +10:00
Rusty Russell
9fbb191b78 logging: give a unique logging name to each forked child.
This means we can distinguish which child is logging, esp. via syslog where we have no pid.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

(This used to be ctdb commit 68b3761a0874429b90731741f0531f76dcfbb081)
2010-08-18 11:46:32 +09:30
Rusty Russell
f93440c4b7 event: Update events to latest Samba version 0.9.8
In Samba this is now called "tevent", and while we use the backwards
compatibility wrappers they don't offer EVENT_FD_AUTOCLOSE: that is now
a separate tevent_fd_set_auto_close() function.

This is based on Samba version 7f29f817fa.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 85e5e760cc91eb3157d3a88996ce474491646726)
2010-08-18 09:16:31 +09:30
Rusty Russell
6da848f31c eventscript: simplify script timeout handling
Now the script child signal handler doesn't do anything, we can unify the
"timeout" and "abort" cases introduced in 9dd25cb751919799.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 439f049c7024d69aa4b87dc811e1772981ad29cb)
2010-04-08 15:11:05 +09:30
Rusty Russell
a9c8b9e89b eventscript: wait for debugging dump before killing timedout script
Fairly simple: prevent the destructor from killing the script, and do it
explicitly from the debugging child.

We can remove the extra "already dead" test, since this will be detected
in the destructor anyway.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit f8aa83788e3cc10ab7655a90d7b7b17ddbe48685)
2010-04-08 15:09:08 +09:30
Rusty Russell
e1b59b6a47 eventscript: don't do debugging system() from inside signal handler
In the case of a timeout, we dump a log of what's happening to a file
in /tmp.  We do it from the signal handler, which is an unreliable hack
(BZ58365).

Instead, create another (lower-priority) child to do the dump, then
kill the timedout script.

Note that this doesn't quite work as intended (the dump is often run
after the script has been killed), so the next patch resolves this.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 7ee5ecc8d53e78e2dec21197b74a74cc4ae1834c)
2010-04-08 15:13:29 +09:30
Rusty Russell
037dfbb8ad eventscript: fix case where we fail to create child for some reason
Initialize the child pid to 0 so destructor doesn't try to kill it:

	server/eventscript.c:565 Sending SIGTERM to child pid:139742328
	Failed to kill child process for eventscript, errno No such process(3)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit fcc63e04beb427c1f48deae6d3d98c78a2a67949)
2010-04-08 10:35:04 +09:30
Ronnie Sahlberg
e179910136 When we forcefully abort a running eventscript, dont log this as is
the script timedout.

Instead send a different signal (SIGABRT) to the child process to silently
kill the process group for the script and its children without logging
anything.

We abort any running "monitor" script anytime any other event is generated
either by ctdbd itself or by "ctdb eventscript ..."

BZ61043

(This used to be ctdb commit 9dd25cb751919799af9d8a23a0725343a8400e58)
2010-03-30 12:47:54 +11:00
Stefan Metzmacher
3419e9c4dd server: add "setup" event
This is needed because the "init" event can't use 'ctdb' commands.

metze

(This used to be ctdb commit 1493436b6b24eb05a23b7a339071ad85f70de8f4)
2010-02-23 10:38:49 +01:00
Ronnie Sahlberg
68decc38ca Ignore any scripts that timesout for most events, except startup.
Threat hung scripts always (except startup) as success.

(This used to be ctdb commit b6d939c9758c7d2e39206838492f2f644dd61db7)
2010-02-16 11:21:27 +11:00
Ronnie Sahlberg
96a61ca907 Reduce loglevel for two eventscript related debug messages
(This used to be ctdb commit f8994790e65baebb81bbfad646cdda6234b6d29a)
2010-02-16 11:02:11 +11:00
Stefan Metzmacher
98ee69c66d server: add updateip event
metze

(This used to be ctdb commit 712ed0c4c0bff1be9e96a54b62512787a4aa6259)
2010-01-20 11:11:01 +01:00
Stefan Metzmacher
fd06167caa server: add "init" event
This is needed because the "startup" event runs after the initial recovery,
but we need to do some actions before the initial recovery.

metze

(This used to be ctdb commit e953808449c102258abb6cba6f4abf486dda3b82)
2010-01-20 09:44:36 +01:00
Rusty Russell
565b2cda11 eventscript: fix bug when script is aborted
Another corner case when we terminate running monitor scripts to run
something else: logging can flush the output and we write to a NULL
pointer.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

(This used to be ctdb commit eb22c34bccc8a04fcf63efa2bc48d9788709382e)
2009-12-18 14:48:41 +11:00
Rusty Russell
e757b7c4bf eventscript: remove cb_status, fix uninitialized bug when monitoring aborted
(Reapplied with merge after accidental revert)

Previously we updated cb_status a each script finished.  Since we're storing
the status anyway, we can calculate it by iterating the scripts array
itself, providing clear and uniform behavior on all code paths.

In particular, this fixes a longstanding bug when we abort monitor
scripts to run some other script: the cb_status was uninitialized.  In
this case, we need to hand *something* to the callback; 0 might make
us go healthy when we shouldn't.  So we use the last status (normally,
this will be the just-saved current status).

In addition, we make the case of failing the first fork for the script
and failing other script forks the same: the error is returned via the
callback and saved for viewing through 'ctdb scriptstatus'.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

(This used to be ctdb commit 2c84fe393ff2b961abf77d58a371c24db5ecb93b)
2009-12-18 14:48:35 +11:00
Rusty Russell
4dce0690de eventscript: fix cleanup path when setting up script list
We shouldn't set ctdb->current_monitor until we set destructor: that's
what cleans it up.

Also, free state->scripts on no-scripts exit path: it's not a child of
state because we need it in the destructor.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

(This used to be ctdb commit 843a2ed5ef85f628788b0caf7417c6b61b5c6d3f)
2009-12-18 12:31:34 +11:00
Ronnie Sahlberg
9b507abd6e version 1.0.109
(This used to be ctdb commit 99894a70fe2ebfe43daae7e88ff0fc9cab33e0fb)
2009-12-17 15:49:01 +11:00
Rusty Russell
8aec7e5656 eventscript: remove cb_status, fix uninitialized bug when monitoring aborted
Previously we updated cb_status a each script finished.  Since we're storing
the status anyway, we can calculate it by iterating the scripts array
itself, providing clear and uniform behavior on all code paths.

In particular, this fixes a longstanding bug when we abort monitor
scripts to run some other script: the cb_status was uninitialized.  In
this case, we need to hand *something* to the callback; 0 might make
us go healthy when we shouldn't.  So we use the last status (normally,
this will be the just-saved current status).

In addition, we make the case of failing the first fork for the script
and failing other script forks the same: the error is returned via the
callback and saved for viewing through 'ctdb scriptstatus'.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

(This used to be ctdb commit 5d50f0e16948d18009f6623f132113f7273efc7f)
2009-12-17 15:39:46 +11:00
Rusty Russell
f148735928 Add --valgringing flag instead of --nosetsched
The do_setsched was being tested for whether to mmap tdbs: let's make it
explicit.  We can also happily move the kill-child eventscript hack under
this flag.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> 


(This used to be ctdb commit 2ee86cc1f311d7b7504c7b14d142b9c4f6f4b469)
2009-12-16 20:59:15 +10:30
Ronnie Sahlberg
b3104bd1d0 Author: Rusty Russell <rusty@rustcorp.com.au>
Date:   Tue Dec 15 15:53:30 2009 +1030

    eventscript: hack to avoid overloading valgrind

    Now we fork one child per script, when running under valgrind the
load
    gets quite high.  This is because valgrind does a lot of work after
exit,
    and we don't wait for the children to finish; we start the next one
when
    the child reports status via the pipe.

    This fix is ugly, but simple.

    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

(This used to be ctdb commit 6ed34d5320c39d8a55f2a36ad4c1ab574e0b0796)
2009-12-15 20:56:16 +11:00
Rusty Russell
784fa9fd8a eventscript: fix monitoring when killed by another script command
Commit c1ba1392fe "eventscript: get rid of ctdb_control_event_script_finished
altogether" was wrong: there is one case where we want to free the script
without transferring their status to last_status.  This happens because we
always kill an running monitor command when we run any other command.

This still isn't quite right (and never was): the callback will be called
with status value 0, which might flip us to HEALTHY if we were unhealthy.
This is conveniently fixed in my next set of patches :)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

(This used to be ctdb commit 0ea0e27d93398df997d3df9d8bf112358af3a4a5)
2009-12-14 15:46:14 +11:00
Rusty Russell
a46c3b4f2a ctdb: scriptstatus can now query non-monitor events
We also no longer return an error before scripts have been run; a special
zero-length data means we have never run the scripts.

"ctdb scriptstatus all" returns all event script results.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 9b90d671581e390e2892d3a68f3ca98d58bef4df)
2009-12-08 01:50:55 +10:30
Rusty Russell
5d99a1a47c eventscript: expost call names and enum
We're going to need this so ctdb can query non-monitor status.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 53bc5ca23ca55a3ac63a440051f16716944a2a51)
2009-12-08 01:47:13 +10:30
Rusty Russell
0dbe76f88f eventscript: lock logging on timeout.
Ronnie suggested this; seems like a very good idea.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 93153bca68926401dc9ae7fd77ed3f17be923344)
2009-12-08 01:32:36 +10:30
Rusty Russell
b29067b02f eventscript: get rid of ctdb_control_event_script_finished altogether
We always have to call it before freeing the state; we should just do
this work in the destructor itself.

Unfortunately, the script state would already be freed by the time
the state destructor is called, so we make the script state a child of
ctdb, and talloc_free() it manually on the one path which doesn't use
the destructor.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit c1ba1392fe52762960e896ace0aca0ee4faa94d5)
2009-12-08 12:29:10 +10:30
Rusty Russell
d3593c2f83 eventscript: save state for all script invocations
Rather than only tranferring to last_status for monitor events, do
it for every event (ctdb->last_status is now an array). 

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit c73ea56275d4be76f7ed983d7565b20237dbdce3)
2009-12-08 12:27:48 +10:30
Rusty Russell
6960fa96eb eventscript: cleanup finished to take state arg
We only need ctdb->current_monitor so we can kill it when we want to run
something else; we don't need to use it here as we always know what script
we are running.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 4cf1b7c32bcf7e4b65aec1fa7ee1a4b162cac889)
2009-12-08 12:24:56 +10:30
Rusty Russell
e548a335bd eventscript: use wire format internally for script status.
The only difference between the exposed an internal structure now is
that the name and output fields were pointers.  Switch to using
ctdb_scripts_wire/ctdb_script_wire internally as well so marshalling
is a noop.

We now reject scripts which are too long and truncate logging to the
511 characters we have space for (the entire output will be in the
normal ctdbd log).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit fd2f04554e604bc421806be96b987e601473a9b8)
2009-12-08 12:48:17 +10:30
Rusty Russell
9753b7e793 eventscript: rename ctdb_monitoring_wire to ctdb_scripts_wire
We're going to allow fetching status of all script runs, so this
name is no longer appropriate.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit f5cb41ecf3fa986b8af243e8546eb3b985cd902a)
2009-12-08 00:51:24 +10:30
Rusty Russell
3ff8bf8138 eventscript: get_current_script() helper
This neatens the code slightly.  We also use the name 'current' in
ctdb_event_script_handler() for uniformity.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit e9661b383e0c50b9e3d114b7434dfe601aff5744)
2009-12-08 12:47:24 +10:30
Rusty Russell
cc678d572f eventscript: use an array rather than a linked list of scripts
This brings us closer to the wire format, by using a simple array
and a 'current' iterator.

The downside is that a 'struct ctdb_script' is no longer a talloc
object: the state must be passed to our log fn, and the current
script extracted with &state->scripts->scripts[state->current].

The wackiness of marshalling is simplified, and as a bonus, we can
distinguish between an empty event directory
(state->scripts->num_scripts == 0) and and error (state->scripts ==
NULL).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 76e8bdc11b953398ce8850de57aa51f30cb46bff)
2009-12-08 12:47:05 +10:30
Rusty Russell
1eda08ea29 eventscript: record script status for all events
This unifies almost everything: the state->current pointer points to
the struct ctdb_script where we record start, finish, status and
output.

We still only marshall up the monitor events; the rest disappear when
the state structure is freed.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit c476c81f3e3d8fc62f2e53d82fce5774044ee9ce)
2009-12-08 12:46:18 +10:30
Rusty Russell
9b50f7ee67 eventscript: use scripts array directly, rather than separate list
We rename ctdb_monitor_script_status to ctdb_script, and instead of
allocating them as the scripts are executed, we allocate them up front
and keep a "current" interator.

This slightly simplifies the code, though it means we only marshall up
to the last successfully run script.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit b2a300768536d10bd867a987ad4cf1c5268c44bc)
2009-12-08 12:45:17 +10:30
Rusty Russell
23e24c503c eventscript: ctdb_fork_with_logging()
A new helper functions which sets up an event attached to the child's
stdout/stderr which gets routed to the logging callback after being
placed in the normal logs.

This is a generalization of the previous code which was hardcoded to
call ctdb_log_event_script_output.

The only subtlety is that we hang the child fds off the output buffer;
the destructor for that will flush, which means it has to be destroyed
before the output buffer is.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 32cfdc3aec34272612f43a3588e4cabed9c85b68)
2009-12-08 12:44:30 +10:30
Rusty Russell
c309d22f9a eventscript: remove unused ctbd_ctrl_event_script*
The child no longer uses ctdb_ctrl_event_script_init or
ctdb_ctrl_event_script_finished, and the others are redundant: it
doesn't need to tell us it's starting a script when it only runs one.

We move start and stop calls to the parent, and eliminate the RPC
infrastructure altogether.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 391926a87a7af73840f10bb314c0a2f951a0854c)
2009-12-08 00:27:40 +10:30
Rusty Russell
69c30c6ba0 eventscript: refactor forking code into fork_child_for_script()
We do the same thing in two places: fire off a child from the initial
ctdb_event_script_callback_v() and also from the ctdb_event_script_handler()
when it's done.

Unify this logic into fork_child_for_script().

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 814704a3286756d40c2a6c508c1c0b77fa711891)
2009-12-08 00:22:55 +10:30
Rusty Russell
dd53eee7a2 eventscript: fork() a child for each script.
We rename child_run_scripts() to child_run_script(), because it now
runs a single script rather than walking the list.  When it's
finished, we fork the next child from the ctdb_event_script_handler()
callback.

ctdb_control_event_script_init() and ctdb_control_event_script_finished()
are now called directly by the parent process; the child still calls
ctdb_ctrl_event_script_start() and ctdb_ctrl_event_script_stop() before
and after the script.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 0fafdcb8d3532a05846abaa5805b2e2f3cee8f47)
2009-12-08 00:21:25 +10:30
Rusty Russell
640b22ff61 eventscript: store from_user and script_list inside state structure
This means all the state about running the scripts is in that structure,
which helps in the next patch.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 020fd21e0905e7f11400f6537988645987f2bb32)
2009-12-08 00:15:18 +10:30
Rusty Russell
b8e347ec9c eventscript: use direct script state pointer for current monitor
We put a "scripts" member in ctdb_event_script_state, rather than using
a special struct for monitor events.  This will fit better as we further
unify the different events, and holds the reports from the child process
running each monitor script.

Rather than making the monitor state a child of current_monitor_status_ctx,
we just point current_monitor directly at it.  This means we need to reset
that pointer in the destructor for ctdb_event_script_state.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 9a2b4f6b17e54685f878d75bad27aa5090b4571f)
2009-12-08 00:14:01 +10:30
Rusty Russell
a4c2a98ba9 eventscript: make current_monitor_status_ctx serve as monitor_event_script_ctx
We have monitor_event_script_ctx and other_event_script_ctx, and
current_monitor_status_ctx in struct ctdb_context.  This seems more
complex than it needs to be.

We use a single "event_script_ctx" as parent for all event script
state structures.  Then we explicitly reparent monitor events under
current_monitor_status_ctx: this is freed every script invocation to
kill off any running scripts anyway.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 0d925e6f2767691fa561f15bbb857a2aec531143)
2009-12-08 00:09:20 +10:30
Rusty Russell
68e224d9a4 eventscript: split ctdb_run_event_script into multiple parts
Simple refactoring in preparation for switching to one-child-per-script.
We also call the functions run by the child process "child_".

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit bfee777faff75e9bed4aedc1558957483616a6d3)
2009-12-07 23:55:03 +10:30
Rusty Russell
9a0c171fa7 eventscript: hoist work out of child process, into parent
This is the start of a move towards finer-grained reporting, with one
child per script.  Simple code motion to do sanity check and get the
list of scripts before fork().

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 816b9177f51ae5b21b92ff4a404f548fe9723c96)
2009-12-07 23:53:35 +10:30
Rusty Russell
928b8dcb31 eventscript: handle banning within the callbacks
Currently the timeout handler in eventscript.c does the banning if a
timeout happens.  However, because monitor events are different, it has
to special case them.

As we call the callback anyway in this case, we should make that handle
-ETIME as it sees fit: for everyone but the monitor event, we simply ban
ourselves.  The more complicated monitor event banning logic is now in
ctdb_monitor.c where it belongs.

Note: I wrapped the other bans in "if (status == -ETIME)", though they
should probably ban themselves on any error.  This change should be a
noop.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 9ecee127e19a9e7cae114a66f3514ee7a75276c5)
2009-12-07 23:48:57 +10:30
Rusty Russell
5190932507 eventscript: expost ctdb_ban_self()
eventscript.c uses this now, but our next patch makes others use it.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit a305cb7743c24386e464f6b2efab7e2108bb1e7e)
2009-12-07 23:18:40 +10:30
Rusty Russell
0dd46797d6 eventscript: handle v. unlikely timeout race
If we time out just as the child exits, we currently will report an
uninitialized cb_status field.  Set it to -ETIME as expected.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 024386931bda9757079f206238ae09bae4de6ea2)
2009-12-07 23:17:23 +10:30
Rusty Russell
d5d88ecaaf eventscript: replace other -1 returns with -errno
This completes our "problem with script" reporting; we never set cb_status
to -1 on error.  Real errnos are used where the failure is a system call
(eg. read, setpgid), otherwise -EIO is used if we couldn't communicate with
the parent.

The latter case is a bit useless, since the parent probably won't see
the error anyway, but it's neater.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 1269458547795c90d544371332ba1de68df29548)
2009-12-07 23:15:56 +10:30
Rusty Russell
672e06f438 eventscript: simplify ctdb_run_event_script loop
If we break, we avoid cut & paste code inside the loop.  Need to initialize
ret to 0 for the "no scripts" case.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit ec36ced9446da7e3bf866466d265ee8e18f606c1)
2009-12-07 23:13:12 +10:30
Rusty Russell
c70afe0cd4 eventscript: handle and report generic stat/execution errors
Rather than ignoring deleted event scripts (or pretending that they were "OK"),
and discarding other stat errors, we save the errno and turn it into a negative
status.

This gives us a bit more information if we can't execute a script (eg.
too many symlinks or other weird errors).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 5d894e1ae5228df6bbe4fc305ccba19803fa3798)
2009-12-07 23:12:19 +10:30
Rusty Russell
b9b75bd065 eventscript: use -ENOEXEC for disabled status value
This unifies code paths and simplifies things: we just hand -ENOEXEC to
ctdb_ctrl_event_script_stop().

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit eadf5e44ef97d7703a7d3bce0e7ea0f21cb11f14)
2009-12-07 23:11:47 +10:30
Rusty Russell
ce378014c7 eventscript: enhance script delete race check
We currently assume 127 == script removed.  The script can also return 127;
best to re-check the execution status in this case (and for 126, which will
happen if the script is non-executable).

If the script is no longer executable/not present, we ignore it.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 0a53d6b5ac81daf0efa32f35e7758ede2a5bdb63)
2009-12-07 23:09:02 +10:30
Rusty Russell
8993d6f523 eventscript: check_executable() to centralize stat/perm checks
This is used later in the "script vanished" check.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 8ddb97040842375daf378cbb5816d0c2b031fa65)
2009-12-07 23:09:39 +10:30
Rusty Russell
066a791770 eventscript: use -ETIME for timeout status value
This starts the move toward more expressive encoding of return values:
positive values mean the script ran, negative means we had a problem with
the script (and the value is the errno).

This does timeout, but changes the ctdb tool to recognize it.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 0eb1d0aa14e68b598d9e281c8a02b8f94a042fd9)
2009-12-07 23:09:42 +10:30
Rusty Russell
85a6f4a4dd eventscript: marshall onto last_status immediately
This simplifies the code a little: last_status is now read to go
(it's only used by the scriptstatus command at the moment).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 6be931266a4e41fd0253f760936ad9707dd97c47)
2009-12-07 23:09:40 +10:30
Rusty Russell
774bf144c1 eventscript: reduce code duplication for ending a script, and fix bug
Commit 50c2caed57c0 removed a gratuitous talloc_steal from the code in
ctdb_control_event_script_finished(), but not ctdb_event_script_timeout().

Easiest to call ctdb_control_event_script_finished() at the bottom of the
timeout routine.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 17fa252d0d6981fbae8083a818f26d5ce9c5102e)
2009-12-02 16:15:57 +10:30
Ronnie Sahlberg
569001afd0 Merge commit 'martins/status-test-2'
Conflicts:

	server/eventscript.c

(This used to be ctdb commit e9b3477a5b9a2eff18f727e7d59338bfb5214793)
2009-12-01 10:53:18 +11:00
Ronnie Sahlberg
3bc643b46b remove a stray ) so we compile
(This used to be ctdb commit 16db4882635d84b8410a77e2ea8b08d0a257b0ab)
2009-11-27 13:35:39 +11:00
Ronnie Sahlberg
266a163c89 dont use talloc_steal() on a object that is already a child of ctdb.
(This used to be ctdb commit 50c2caed57c0520f506eaaeeb0bba2c272da6ef6)
2009-11-27 13:28:31 +11:00
Ronnie Sahlberg
eaa6218def Merge commit 'martins/status-test' into status-test-2
(This used to be ctdb commit 937823cc73eb098230acff4b1583f6d01f26c21a)
2009-11-27 12:50:45 +11:00
Martin Schwenke
dc2c8dfde1 Merge commit 'martins-svart/status-test-2' into status-test
(This used to be ctdb commit 0e6c06ac38fd82adf124d111717502055501974a)
2009-11-27 12:49:31 +11:00
Martin Schwenke
ce06d3de46 Event script infrastructure: add reload event to check_options().
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit c278c798d41a35f58ca81f8f0e08e4dab51eba9b)
2009-11-27 12:04:02 +11:00
Ronnie Sahlberg
09b9bb2f9f Merge commit 'martins/status-test' into status-test-2
(This used to be ctdb commit 28d0648725e7de4e4d0e8569e3fbfb0fa1d7f934)
2009-11-26 16:26:25 +11:00
Martin Schwenke
88cd194d6a Merge commit 'martins-svart/status-test-2' into status-test
(This used to be ctdb commit 143f1fa3cc4588505e3992c601153ea08be8432d)
2009-11-26 16:25:15 +11:00
Martin Schwenke
a64ccf07c1 Add flag to ctdb_event_script_callback indicating when called by client.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit a1d654a982ca56fade82552f4e6b5586236d3233)
2009-11-26 15:49:49 +11:00
Ronnie Sahlberg
ed4f3ea3cc resolve some conflicts from merging from martins branch
(This used to be ctdb commit d3e7407dc9854ec358d081777c5450ec68b17862)
2009-11-26 13:42:12 +11:00
Martin Schwenke
8029db6a91 Merge commit 'martins-svart/status-test-2' into status-test
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit a2830594ebeb54eb51ff90999cb12370aeec6e8b)
2009-11-26 10:49:47 +11:00
Rusty Russell
3188df4a88 eventscript: check that ctdb forced script events correct
Now we're doing checking, we might as well make sure the commands from
"ctdb eventscripts" are valid.

This gets rid of the "UNKNOWN" event type.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 1d24a3869fe89fc9a109fd9e9b69df5fc665a5f6)
2009-11-25 11:02:29 +10:30
Rusty Russell
ff59bb34af eventscript: check that ctdb forced script events correct
Now we're doing checking, we might as well make sure the commands from
"ctdb eventscripts" are valid.

This gets rid of the "UNKNOWN" event type.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 66b22980b14601f29fe8cc64bd8f29883c7ca1c0)
2009-11-24 11:24:22 +10:30
Rusty Russell
0b4b83aea0 eventscript: check that internal script events are being invoked correctly
This is not as good as a compile-time check, but at least we count the
number of arguments are correct.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 83b7b233cb4707e826f6ba260bd630c8bc8f1e76)
2009-11-24 11:23:13 +10:30
Rusty Russell
187efa08ab eventscript: check that internal script events are being invoked correctly
This is not as good as a compile-time check, but at least we count the
number of arguments are correct.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit a6d353519932eee48f9241ad8887b692882906c9)
2009-11-24 11:23:13 +10:30
Rusty Russell
534c709cba eventscript: remove call name from state->options
Finally, we remove the call name (eg. "monitor" or "start") from the
options field of the struct: it now contains only extra options.

This is clearer, and mainly involves adding some %s to debug statements.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 33fb0e7ba047ca73969b59bccf70a04a17c25a0a)
2009-11-24 11:22:46 +10:30
Rusty Russell
0ef91a4e1f eventscript: remove call name from state->options
Finally, we remove the call name (eg. "monitor" or "start") from the
options field of the struct: it now contains only extra options.

This is clearer, and mainly involves adding some %s to debug statements.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit b0648c7f08eba87ec3c9714e2525c9b621bfb4ef)
2009-11-24 11:22:46 +10:30
Rusty Russell
461f52736d eventscript: put call type into state struct.
This means we can get rid of more strcmp; they can simply use the
state->call value instead.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 6c79fa33e26cc4f0873577f8e122b1495b4c427e)
2009-11-24 11:19:58 +10:30
Rusty Russell
205011cb61 eventscript: put call type into state struct.
This means we can get rid of more strcmp; they can simply use the
state->call value instead.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 834c93b3e1b8f4151b8a2cd82c2dd8bacc17f66c)
2009-11-24 11:19:58 +10:30
Rusty Russell
2d9254404d eventscript: introduce enum for different event script calls.
Rather than doing strcmp everywhere, pass an explicit enum around.  This
also subtly documents what options are available.  The "options" arg
is now used for extra arguments only.

Unfortunately, gcc complains on empty format strings, so we make
ctdb_event_script() take no varargs, and add ctdb_event_script_args().  We
leave ctdb_event_script_callback() taking varargs, which means callers
have to do "%s", "".

For the moment, we have CTDB_EVENT_UNKNOWN for handling forced scripts
from the ctdb tool.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 8001488be4f2beb25e943fe01b2afc2e8779930d)
2009-11-24 11:16:49 +10:30
Rusty Russell
e0c6e2f489 eventscript: introduce enum for different event script calls.
Rather than doing strcmp everywhere, pass an explicit enum around.  This
also subtly documents what options are available.  The "options" arg
is now used for extra arguments only.

Unfortunately, gcc complains on empty format strings, so we make
ctdb_event_script() take no varargs, and add ctdb_event_script_args().  We
leave ctdb_event_script_callback() taking varargs, which means callers
have to do "%s", "".

For the moment, we have CTDB_EVENT_UNKNOWN for handling forced scripts
from the ctdb tool.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 470822b329f9d3ca9bef518b56e9ce28d5fedda2)
2009-11-24 11:16:49 +10:30
Rusty Russell
2763df22de eventscript: put timeout inside ctdb_event_script_callback_v
Everyone uses the same timeout value, so just remove it from the API.
If we ever need variable timeouts, that might as well be central too.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 533c3e053293941d2a9484b495e78d45f478bb08)
2009-11-24 11:09:46 +10:30
Rusty Russell
5dee5769d3 eventscript: put timeout inside ctdb_event_script_callback_v
Everyone uses the same timeout value, so just remove it from the API.
If we ever need variable timeouts, that might as well be central too.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit fe8027309c1f7b987cd368fa98f9b28741baa786)
2009-11-24 11:09:46 +10:30
Rusty Russell
3845c6e5b8 eventscript: cleanup ctdb_event_script_v
ctdb_event_script_v doesn't take varargs.  ctdb_run_event_script is
a better name, and fix comment.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 466beafadb37011fe273de8810ab0012e92a1fd8)
2009-11-24 11:09:01 +10:30
Rusty Russell
1d68bb35b2 eventscript: typo cleanups
1) ctdb_event_script_v doesn't take varargs.  ctdb_run_event_script is
   a better name, and fix comment.
2) Fix indentation on allowed_scripts.
3) Comment on run_eventscripts_callback is wrong; it's the callback
   for any ctdb forced event.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit e7d57d7ae678b24dab3364a348838c6a3398942c)
2009-11-24 11:08:39 +10:30
Rusty Russell
ab675516cc eventscript: fix bug in timeouts on forced eventscripts. Again.
In 15bc66ae801b0c69, Ronnie fixed a double-free race.  The problem was that
ctdb_run_eventscripts() hands a context to ctdb_event_script_callback() to
hang its data off, which gets freed in the callback.  This particularly
hurt in ctdb_event_script_timeout.

There's nothing wrong with this, but obviously we should make the callback
call last of all.  At the time, ctdb_event_script_timeout() carefully
extracted everything from the struct ctdb_event_script_state before
calling ->callback.

This was cleaned up in 64da4402c6ad485f (Ronnie again), and now state
was referred to after the callback again.  But the same change introduced
a direct use-after-free bug which caused an occasional oops.

So in our last episode (eda052101728cf92) Volker fixed this, and Michael
committed it.

But we still have the double free bug which 15bc66ae801b0c69 was supposed
to fix!  Let's try to fix this in a more permanent way, but always doing
the callback from the destructor.  This means we need to hold the status,
and don't send the KILL signal if ->child is set to 0.

Finally, add a comment about freeing ourselves in run_eventscripts_callback
and the structure definition.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit b90bdb07c1f6913ddbf11bde9684bdc8af61c549)
2009-11-24 11:06:53 +10:30
Rusty Russell
0339a83897 eventscript: fix bug in timeouts on forced eventscripts. Again.
In 15bc66ae801b0c69, Ronnie fixed a double-free race.  The problem was that
ctdb_run_eventscripts() hands a context to ctdb_event_script_callback() to
hang its data off, which gets freed in the callback.  This particularly
hurt in ctdb_event_script_timeout.

There's nothing wrong with this, but obviously we should make the callback
call last of all.  At the time, ctdb_event_script_timeout() carefully
extracted everything from the struct ctdb_event_script_state before
calling ->callback.

This was cleaned up in 64da4402c6ad485f (Ronnie again), and now state
was referred to after the callback again.  But the same change introduced
a direct use-after-free bug which caused an occasional oops.

So in our last episode (eda052101728cf92) Volker fixed this, and Michael
committed it.

But we still have the double free bug which 15bc66ae801b0c69 was supposed
to fix!  Let's try to fix this in a more permanent way, but always doing
the callback from the destructor.  This means we need to hold the status,
and don't send the KILL signal if ->child is set to 0.

Finally, add a comment about freeing ourselves in run_eventscripts_callback
and the structure definition.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 20b15de068d042b292725945927ceda1b01d07c0)
2009-11-24 11:06:53 +10:30
Rusty Russell
8723045c61 eventscript: clean up forked handler event code
Write the whole int through the pipe, rather than quietly cutting it
off.  Also, use -2 as the result if the read fails; -1 comes from many
paths if the child fails before running the script.

Add a comment about why we don't need to check the write.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 6804f880436645b52c09a78fa300377fa8058d0e)
2009-11-24 11:00:13 +10:30
Ronnie Sahlberg
e6b69fa760 rework and simplify the eventscript handling
This version has no trailing whitespace, and fixed 


(This used to be ctdb commit defbe318152fc479e8076ad70433cdb4971951af)
2009-11-25 11:00:11 +10:30
Rusty Russell
b320d434b2 eventscript: clean up forked handler event code
Write the whole int through the pipe, rather than quietly cutting it
off.  Also, use -2 as the result if the read fails; -1 comes from many
paths if the child fails before running the script.

Add a comment about why we don't need to check the write.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit c715746c2f40eb9b21dbf011d16f1f1b0b53fdf9)
2009-11-24 11:00:13 +10:30
Ronnie Sahlberg
eb3b787394 rework and simplify the eventscript handling
(This used to be ctdb commit c5f798116bf3b7954e23c7267b056ee1f5560f45)
2009-11-24 07:40:51 +11:00
Ronnie Sahlberg
ae209c74c8 dont reset the event script context everytime we start a new "ctdb eventscript ..."
command.
Use the existing context used for non-monitor events

Multiple concurrent uses of "ctdb eventscript ..." could otherwise lead to a SEGV

(This used to be ctdb commit 80a8d728e9680040e00d24361dfc9367dd372a56)
2009-11-19 11:03:51 +11:00
Ronnie Sahlberg
93d902e8f7 test of a change to make ctdbd use "status" event instead of the "monitor" event.
This allows running the actual monitoring asynchronously from ctdbd
and only using "status" to pick up the actual results.

(This used to be ctdb commit 1908bac812650ca25151051f5d86815e0b8ed319)
2009-11-13 12:37:55 +11:00
Volker Lendecke
1fa1830f81 Fix a segfault in the eventscript timeout handler.
The state was freed too early.

Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit eda052101728cf922ce892e3c53b4f37e7ceac42)
2009-11-05 11:13:53 +01:00
Ronnie Sahlberg
d379b30182 create a separate context for non-monitor eventscripts so they dont collide
(This used to be ctdb commit 325de818f88f339a16dc4544e899a2d735933c44)
2009-10-28 17:35:15 +11:00
Ronnie Sahlberg
f8a8c0d6e4 return 0 in the event script callback if it was aborted by a different script
(This used to be ctdb commit 8d5cb2586a1d5a0255cc18295430927b914d4527)
2009-10-28 16:40:31 +11:00
Ronnie Sahlberg
e07ca41886 change the eventscript handling to allow EventScriptTimeout for each individual script isntead of for the entire set of scripts
restructure the talloc hierarchy to allow this

(This used to be ctdb commit 64da4402c6ad485f1d0a604878a7b0c01a0ea5f0)
2009-10-28 16:11:54 +11:00
Ronnie Sahlberg
3526bc830d Enhance the logging fromeventscripts.
When a single script is finished, also log the name of the script, the duration it took and the return status.

In the loop where we signal back to the main daemon that the script finished, do this once every 100ms instead of once every 1 second

(This used to be ctdb commit 6a1f7a7b1b3a0b8f89998db8fdad83bbb4e9b5a5)
2009-10-28 09:07:43 +11:00
Ronnie Sahlberg
1d7681709b dont run the monitor event so frequently after a event has failed.
use _exit() instead of exit() when terminating an eventscript.

(This used to be ctdb commit cc30ee2f4f33cb75b2be980c2d4dff6c7c23852f)
2009-10-27 13:51:45 +11:00
Ronnie Sahlberg
c61c655769 when scripts timeout, log pstree to a file in /tmp and just log the filename in the messages file
(This used to be ctdb commit 0785afba8e5cd501b9e0ecb4a6a44edf43b57ab0)
2009-10-23 13:55:21 +11:00
Ronnie Sahlberg
902c476c03 From Volker L
Fix some warnings  and an incorrect check for a talloc failure

(This used to be ctdb commit 27296a47b3d057a6729287acf128b2b67775ecde)
2009-10-22 12:19:40 +11:00
Ronnie Sahlberg
d5fd4fc0ce During tests it is common to add/delete test eventscripts at runtime.
This can race with teh eventascript handling that does a :

list all scripts,   sort them,  then execute them

so trap status code 127 which means the script could not be executed (or /bin/sh does not exist) and treat it as not to cause the node to become unhealthy

(This used to be ctdb commit befabc917edb036ca81f5216f65a6d62b26ee83e)
2009-10-21 16:50:39 +11:00
Ronnie Sahlberg
a92ba7f729 lower the debug levels for the "create FD messages" so we dont fill up the logs.
(This used to be ctdb commit 87146db2769c2ec494813685bf9cec0d2a6336c3)
2009-10-21 15:26:24 +11:00
Ronnie Sahlberg
d788dd3627 From wolfgang Mueller
Add a tuneable so that when scripts starts to hang/timeout, we can make the node unhealthy instead of banned

(This used to be ctdb commit 2e9fc6f0609833c6d8146196011ef780669d615d)
2009-10-20 12:59:48 +11:00