The only allocation against this context is by
ctdb_fork_with_logging(). This memory is freed by ctdb_log_handler()
anyway. There should be no memory leak.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 501461cc3e132d4adee9e91b5d4513a26bae2846)
When CTDB is shut down and monitoring has been stopped, monitor_context
gets freed and all the callback states hanging off it. This includes
callback state for current_monitor, if the current monitor event has
not yet finished. As a result, when the shutdown event is called,
current_monitor->callback state is not NULL, but it's actually freed
and it's a dangling reference.
So, before executing the callback function and freeing the callback state,
check that ctdb->monitor->monitor_context is not NULL.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 7d8546ee4353851f0543d0ca2c4c67cb0cc75aea)
Running of eventscripts can be initiated from many places, including
the recovery daemon.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 440892d75ef73c0aca22f47c0c01712be00cf5b7)
The eventscripts are run after a takeover run and in this case they're
not forced. The messages seem to imply that someone has run "ctdb
eventscript" when that is not necessarily the case.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 3880589db4d563e438126cf5080261fa06b9e242)
Break this debug and data collection out into an external script to make it easier to modify what data we need to collect.
For now we only collect a pstree so we can see which part of the script we hung in.
S1037271
(This used to be ctdb commit 6e68797af67bee36f2bad045f94806e7e98f27e9)
Wrap all creation of child processes inside ctdb_fork() which is used to track all processes we have spawned.
Capture SIGCHLD to track also which child processes have terminated.
Wrap kill() inside ctdb_kill() and make sure that we never send a !0 signal to a child process pid that has already terminated (and might have been replaced with a
(This used to be ctdb commit f73a4b1495830bcdd094a93732a89dd53b3c2f78)
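The child-tracking idea described above can be sketched roughly as follows. This is a minimal illustration, not the ctdb implementation: the names tracked_fork(), tracked_kill(), track_child(), untrack_child() and the fixed-size table are all hypothetical. The point it shows is that a real (non-zero) signal is only delivered to a pid we spawned and have not yet reaped, so a recycled pid is never signalled by mistake.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_TRACKED 64

/* Hypothetical registry of children we spawned and have not yet reaped. */
static pid_t tracked[MAX_TRACKED];
static size_t num_tracked;

/* Remember a child pid; returns 0 on success, -1 if the table is full. */
int track_child(pid_t pid)
{
	if (num_tracked == MAX_TRACKED) {
		return -1;
	}
	tracked[num_tracked++] = pid;
	return 0;
}

/* Forget a pid, e.g. once waitpid() has reported it after SIGCHLD. */
void untrack_child(pid_t pid)
{
	for (size_t i = 0; i < num_tracked; i++) {
		if (tracked[i] == pid) {
			tracked[i] = tracked[--num_tracked];
			return;
		}
	}
}

/* Returns 1 if the pid is a live child we know about, 0 otherwise. */
int is_tracked(pid_t pid)
{
	for (size_t i = 0; i < num_tracked; i++) {
		if (tracked[i] == pid) {
			return 1;
		}
	}
	return 0;
}

/* fork() wrapper: the parent records the child. */
pid_t tracked_fork(void)
{
	pid_t pid = fork();
	if (pid == 0) {
		/* The child starts with no children of its own. */
		num_tracked = 0;
	} else if (pid > 0) {
		/* If the table is full the child stays untracked, and
		 * tracked_kill() will refuse to signal it later. */
		(void)track_child(pid);
	}
	return pid;
}

/* kill() wrapper: signal 0 (an existence probe) always passes through;
 * any real signal is refused unless the pid is still tracked. */
int tracked_kill(pid_t pid, int signum)
{
	if (signum != 0 && !is_tracked(pid)) {
		errno = ESRCH;
		return -1;
	}
	return kill(pid, signum);
}
```

In this sketch a SIGCHLD handler (or the event loop reacting to it) would call waitpid() and then untrack_child() for each reaped pid.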
Instead, tie them together via referencing a permanent linked list hung off the ctdb structure.
(This used to be ctdb commit a95c02da6c67dc4bd8716b75318a4188301df6f9)
When a monitor event is canceled by a higher priority script, make sure we return
status -ECANCELED to the callback in ctdb_monitor.c.
Also treat -ECANCELED as a simple "try monitor event again" and skip modifying any HEALTHY/UNHEALTHY flags when this happens.
(This used to be ctdb commit a15ec57c26d1bc82af85f74eebae0bd8abde3233)
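A rough sketch of the -ECANCELED handling described above, under the assumption of a much simplified node state. The struct node_health and the function monitor_event_done() are hypothetical names used only for illustration; the real ctdb structures and callback signature differ.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified view of a node's monitoring state. */
struct node_health {
	bool unhealthy;      /* stands in for the HEALTHY/UNHEALTHY flag */
	bool retry_monitor;  /* ask for the monitor event to be run again */
};

/* Called with the final status of a monitor event. */
void monitor_event_done(struct node_health *node, int status)
{
	if (status == -ECANCELED) {
		/* A higher priority event pre-empted the monitor run, so we
		 * learned nothing about health: leave the flag alone and
		 * simply try again. */
		node->retry_monitor = true;
		return;
	}

	node->retry_monitor = false;
	node->unhealthy = (status != 0);
}
```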
ctdb_event_script_callback() takes a mem_ctx arg which it doesn't use, but
the implication is pretty clear, that when that mem_ctx is freed, the callback
shouldn't happen. Indeed, Ronnie reproduced a case where that callback
refers to freed memory, in the ip reallocation code under stress.
So attach the callback to the mem_ctx they give us, and remove it from the
script state structure when that's freed. It's a bit weird, but it works.
CQ: S1026179
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 6fcd867cc835ef1ffc1c50964f135c346503d40c)
scheduler for the child.
Use ctdb_fork() from callers where we don't want the child to be running
at real-time privilege.
(This used to be ctdb commit 58795a4c9e0624e20fa3e0023b65127053edd103)
This is called every time a reallocation is performed.
While STARTRECOVERY/RECOVERED events are only called when
we do ipreallocation as part of a full database/cluster recovery,
this new event can be used to trigger when we just do a light
failover due to a node becoming unhealthy.
I.e. situations where we do a failover but we do not perform a full
cluster recovery.
Use this to trigger for natgw so we select a new natgw master node
when failover happens and not just when cluster rebuilds happen.
(This used to be ctdb commit 7f4c591388adae20e98984001385cba26598ec67)
This means we can distinguish which child is logging, esp. via syslog where we have no pid.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 68b3761a0874429b90731741f0531f76dcfbb081)
In Samba this is now called "tevent", and while we use the backwards
compatibility wrappers they don't offer EVENT_FD_AUTOCLOSE: that is now
a separate tevent_fd_set_auto_close() function.
This is based on Samba version 7f29f817fa.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 85e5e760cc91eb3157d3a88996ce474491646726)
Now the script child signal handler doesn't do anything, we can unify the
"timeout" and "abort" cases introduced in 9dd25cb751919799.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 439f049c7024d69aa4b87dc811e1772981ad29cb)
Fairly simple: prevent the destructor from killing the script, and do it
explicitly from the debugging child.
We can remove the extra "already dead" test, since this will be detected
in the destructor anyway.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit f8aa83788e3cc10ab7655a90d7b7b17ddbe48685)
In the case of a timeout, we dump a log of what's happening to a file
in /tmp. We do it from the signal handler, which is an unreliable hack
(BZ58365).
Instead, create another (lower-priority) child to do the dump, then
kill the timed-out script.
Note that this doesn't quite work as intended (the dump is often run
after the script has been killed), so the next patch resolves this.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 7ee5ecc8d53e78e2dec21197b74a74cc4ae1834c)
Initialize the child pid to 0 so destructor doesn't try to kill it:
server/eventscript.c:565 Sending SIGTERM to child pid:139742328
Failed to kill child process for eventscript, errno No such process(3)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit fcc63e04beb427c1f48deae6d3d98c78a2a67949)
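The fix above amounts to making "no child yet" an explicit, checkable value. A minimal sketch, assuming a hypothetical struct script_state and plain init/destroy functions rather than ctdb's talloc destructor:

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical per-run state; only the field relevant here is shown. */
struct script_state {
	pid_t child;  /* 0 means no child has been forked yet */
};

/* Zero the state so that child starts out as 0 rather than garbage. */
void script_state_init(struct script_state *state)
{
	memset(state, 0, sizeof(*state));
}

/* Cleanup: returns 0 if there was nothing to kill, otherwise the result
 * of kill(). Without the check, an uninitialized pid would be signalled;
 * note that kill(0, ...) would signal our own process group. */
int script_state_destroy(struct script_state *state)
{
	if (state->child == 0) {
		return 0;
	}
	return kill(state->child, SIGTERM);
}
```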
the script timed out.
Instead send a different signal (SIGABRT) to the child process to silently
kill the process group for the script and its children without logging
anything.
We abort any running "monitor" script anytime any other event is generated
either by ctdbd itself or by "ctdb eventscript ..."
BZ61043
(This used to be ctdb commit 9dd25cb751919799af9d8a23a0725343a8400e58)
This is needed because the "startup" event runs after the initial recovery,
but we need to do some actions before the initial recovery.
metze
(This used to be ctdb commit e953808449c102258abb6cba6f4abf486dda3b82)
Another corner case when we terminate running monitor scripts to run
something else: logging can flush the output and we write to a NULL
pointer.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit eb22c34bccc8a04fcf63efa2bc48d9788709382e)
(Reapplied with merge after accidental revert)
Previously we updated cb_status as each script finished. Since we're storing
the status anyway, we can calculate it by iterating the scripts array
itself, providing clear and uniform behavior on all code paths.
In particular, this fixes a longstanding bug when we abort monitor
scripts to run some other script: the cb_status was uninitialized. In
this case, we need to hand *something* to the callback; 0 might make
us go healthy when we shouldn't. So we use the last status (normally,
this will be the just-saved current status).
In addition, we make the case of failing the first fork for the script
and failing other script forks the same: the error is returned via the
callback and saved for viewing through 'ctdb scriptstatus'.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 2c84fe393ff2b961abf77d58a371c24db5ecb93b)
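Deriving the callback status from the stored per-script results, as described above, might look something like the following. This is a sketch under stated assumptions: struct script_result, event_status() and the explicit aborted flag are hypothetical, and the rule "first non-zero status wins" is an assumption about how the array is reduced to one value.

```c
#include <stddef.h>

/* Hypothetical per-script record; the real structure holds more. */
struct script_result {
	int status;
};

/* Compute the status handed to the callback.
 *
 * aborted:     non-zero if the run was cut short to run another event
 * last_status: result of the previous completed run of this event
 *
 * On abort there is no meaningful new result, and returning 0 could
 * wrongly mark the node healthy, so the previous status is reused. */
int event_status(const struct script_result *scripts, size_t num_scripts,
		 int aborted, int last_status)
{
	if (aborted) {
		return last_status;
	}

	for (size_t i = 0; i < num_scripts; i++) {
		if (scripts[i].status != 0) {
			return scripts[i].status;
		}
	}
	return 0;
}
```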
We shouldn't set ctdb->current_monitor until we set destructor: that's
what cleans it up.
Also, free state->scripts on no-scripts exit path: it's not a child of
state because we need it in the destructor.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 843a2ed5ef85f628788b0caf7417c6b61b5c6d3f)
Previously we updated cb_status as each script finished. Since we're storing
the status anyway, we can calculate it by iterating the scripts array
itself, providing clear and uniform behavior on all code paths.
In particular, this fixes a longstanding bug when we abort monitor
scripts to run some other script: the cb_status was uninitialized. In
this case, we need to hand *something* to the callback; 0 might make
us go healthy when we shouldn't. So we use the last status (normally,
this will be the just-saved current status).
In addition, we make the case of failing the first fork for the script
and failing other script forks the same: the error is returned via the
callback and saved for viewing through 'ctdb scriptstatus'.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 5d50f0e16948d18009f6623f132113f7273efc7f)
The do_setsched was being tested for whether to mmap tdbs: let's make it
explicit. We can also happily move the kill-child eventscript hack under
this flag.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 2ee86cc1f311d7b7504c7b14d142b9c4f6f4b469)
Date: Tue Dec 15 15:53:30 2009 +1030
eventscript: hack to avoid overloading valgrind
Now we fork one child per script, when running under valgrind the load
gets quite high. This is because valgrind does a lot of work after exit,
and we don't wait for the children to finish; we start the next one when
the child reports status via the pipe.
This fix is ugly, but simple.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 6ed34d5320c39d8a55f2a36ad4c1ab574e0b0796)
Commit c1ba1392fe "eventscript: get rid of ctdb_control_event_script_finished
altogether" was wrong: there is one case where we want to free the script
without transferring their status to last_status. This happens because we
always kill a running monitor command when we run any other command.
This still isn't quite right (and never was): the callback will be called
with status value 0, which might flip us to HEALTHY if we were unhealthy.
This is conveniently fixed in my next set of patches :)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 0ea0e27d93398df997d3df9d8bf112358af3a4a5)
We also no longer return an error before scripts have been run; a special
zero-length data means we have never run the scripts.
"ctdb scriptstatus all" returns all event script results.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 9b90d671581e390e2892d3a68f3ca98d58bef4df)
We're going to need this so ctdb can query non-monitor status.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 53bc5ca23ca55a3ac63a440051f16716944a2a51)
Ronnie suggested this; seems like a very good idea.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 93153bca68926401dc9ae7fd77ed3f17be923344)
We always have to call it before freeing the state; we should just do
this work in the destructor itself.
Unfortunately, the script state would already be freed by the time
the state destructor is called, so we make the script state a child of
ctdb, and talloc_free() it manually on the one path which doesn't use
the destructor.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit c1ba1392fe52762960e896ace0aca0ee4faa94d5)
Rather than only transferring to last_status for monitor events, do
it for every event (ctdb->last_status is now an array).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit c73ea56275d4be76f7ed983d7565b20237dbdce3)
We only need ctdb->current_monitor so we can kill it when we want to run
something else; we don't need to use it here as we always know what script
we are running.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 4cf1b7c32bcf7e4b65aec1fa7ee1a4b162cac889)
The only difference between the exposed and internal structure now is
that the name and output fields were pointers. Switch to using
ctdb_scripts_wire/ctdb_script_wire internally as well so marshalling
is a noop.
We now reject scripts which are too long and truncate logging to the
511 characters we have space for (the entire output will be in the
normal ctdbd log).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit fd2f04554e604bc421806be96b987e601473a9b8)
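A fixed-size record of the kind described above can be sketched as follows. The 511-character output limit comes from the text; the 31-character name limit, the struct script_wire layout and both helper names are assumptions made for illustration and are not taken from the ctdb headers.

```c
#include <stddef.h>
#include <string.h>

#define SCRIPT_NAME_MAX   31   /* assumed limit, for illustration */
#define SCRIPT_OUTPUT_MAX 511  /* limit mentioned in the text */

/* Hypothetical flat record: no pointers, so it can be copied onto the
 * wire as-is. The caller is expected to zero it before use. */
struct script_wire {
	char name[SCRIPT_NAME_MAX + 1];
	int status;
	char output[SCRIPT_OUTPUT_MAX + 1];
};

/* Reject names that do not fit; returns 0 on success, -1 if too long. */
int script_wire_set_name(struct script_wire *s, const char *name)
{
	if (strlen(name) > SCRIPT_NAME_MAX) {
		return -1;
	}
	strcpy(s->name, name);
	return 0;
}

/* Append script output, silently dropping whatever does not fit. */
void script_wire_append_output(struct script_wire *s, const char *text)
{
	size_t used = strlen(s->output);
	size_t room = SCRIPT_OUTPUT_MAX - used;

	/* strncat copies at most room bytes and then adds the terminator,
	 * which the +1 in the buffer size leaves space for. */
	strncat(s->output, text, room);
}
```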
We're going to allow fetching status of all script runs, so this
name is no longer appropriate.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit f5cb41ecf3fa986b8af243e8546eb3b985cd902a)
This neatens the code slightly. We also use the name 'current' in
ctdb_event_script_handler() for uniformity.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit e9661b383e0c50b9e3d114b7434dfe601aff5744)
This brings us closer to the wire format, by using a simple array
and a 'current' iterator.
The downside is that a 'struct ctdb_script' is no longer a talloc
object: the state must be passed to our log fn, and the current
script extracted with &state->scripts->scripts[state->current].
The wackiness of marshalling is simplified, and as a bonus, we can
distinguish between an empty event directory
(state->scripts->num_scripts == 0) and an error (state->scripts ==
NULL).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 76e8bdc11b953398ce8850de57aa51f30cb46bff)
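The array-plus-iterator layout described above can be illustrated with a small sketch. The field names scripts, num_scripts and current follow the text; the struct definitions, the fixed array size and the helper functions are hypothetical simplifications.

```c
#include <stddef.h>

/* Hypothetical simplified versions of the structures named in the text. */
struct script_entry {
	const char *name;
	int status;
};

struct script_list {
	size_t num_scripts;
	struct script_entry scripts[8];
};

struct run_state {
	struct script_list *scripts;  /* NULL means listing the directory failed */
	size_t current;               /* index of the script being run */
};

enum list_kind { LIST_ERROR, LIST_EMPTY, LIST_HAS_SCRIPTS };

/* Tell an error apart from an event directory with no scripts. */
enum list_kind classify_scripts(const struct run_state *state)
{
	if (state->scripts == NULL) {
		return LIST_ERROR;
	}
	if (state->scripts->num_scripts == 0) {
		return LIST_EMPTY;
	}
	return LIST_HAS_SCRIPTS;
}

/* Equivalent of &state->scripts->scripts[state->current], with bounds
 * checks; returns NULL when there is no current script. */
struct script_entry *current_script(struct run_state *state)
{
	if (state->scripts == NULL ||
	    state->current >= state->scripts->num_scripts) {
		return NULL;
	}
	return &state->scripts->scripts[state->current];
}
```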
This unifies almost everything: the state->current pointer points to
the struct ctdb_script where we record start, finish, status and
output.
We still only marshall up the monitor events; the rest disappear when
the state structure is freed.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit c476c81f3e3d8fc62f2e53d82fce5774044ee9ce)
We rename ctdb_monitor_script_status to ctdb_script, and instead of
allocating them as the scripts are executed, we allocate them up front
and keep a "current" iterator.
This slightly simplifies the code, though it means we only marshall up
to the last successfully run script.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit b2a300768536d10bd867a987ad4cf1c5268c44bc)
A new helper function which sets up an event attached to the child's
stdout/stderr which gets routed to the logging callback after being
placed in the normal logs.
This is a generalization of the previous code which was hardcoded to
call ctdb_log_event_script_output.
The only subtlety is that we hang the child fds off the output buffer;
the destructor for that will flush, which means it has to be destroyed
before the output buffer is.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 32cfdc3aec34272612f43a3588e4cabed9c85b68)
The child no longer uses ctdb_ctrl_event_script_init or
ctdb_ctrl_event_script_finished, and the others are redundant: it
doesn't need to tell us it's starting a script when it only runs one.
We move start and stop calls to the parent, and eliminate the RPC
infrastructure altogether.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 391926a87a7af73840f10bb314c0a2f951a0854c)
We do the same thing in two places: fire off a child from the initial
ctdb_event_script_callback_v() and also from the ctdb_event_script_handler()
when it's done.
Unify this logic into fork_child_for_script().
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 814704a3286756d40c2a6c508c1c0b77fa711891)