IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
We only need ctdb->current_monitor so we can kill it when we want to run
something else; we don't need to use it here as we always know what script
we are running.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 4cf1b7c32bcf7e4b65aec1fa7ee1a4b162cac889)
The only difference between the exposed an internal structure now is
that the name and output fields were pointers. Switch to using
ctdb_scripts_wire/ctdb_script_wire internally as well so marshalling
is a noop.
We now reject scripts which are too long and truncate logging to the
511 characters we have space for (the entire output will be in the
normal ctdbd log).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit fd2f04554e604bc421806be96b987e601473a9b8)
We're going to allow fetching status of all script runs, so this
name is no longer appropriate.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit f5cb41ecf3fa986b8af243e8546eb3b985cd902a)
This neatens the code slightly. We also use the name 'current' in
ctdb_event_script_handler() for uniformity.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit e9661b383e0c50b9e3d114b7434dfe601aff5744)
This brings us closer to the wire format, by using a simple array
and a 'current' iterator.
The downside is that a 'struct ctdb_script' is no longer a talloc
object: the state must be passed to our log fn, and the current
script extracted with &state->scripts->scripts[state->current].
The wackiness of marshalling is simplified, and as a bonus, we can
distinguish between an empty event directory
(state->scripts->num_scripts == 0) and and error (state->scripts ==
NULL).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 76e8bdc11b953398ce8850de57aa51f30cb46bff)
This unifies almost everything: the state->current pointer points to
the struct ctdb_script where we record start, finish, status and
output.
We still only marshall up the monitor events; the rest disappear when
the state structure is freed.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit c476c81f3e3d8fc62f2e53d82fce5774044ee9ce)
We rename ctdb_monitor_script_status to ctdb_script, and instead of
allocating them as the scripts are executed, we allocate them up front
and keep a "current" interator.
This slightly simplifies the code, though it means we only marshall up
to the last successfully run script.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit b2a300768536d10bd867a987ad4cf1c5268c44bc)
A new helper functions which sets up an event attached to the child's
stdout/stderr which gets routed to the logging callback after being
placed in the normal logs.
This is a generalization of the previous code which was hardcoded to
call ctdb_log_event_script_output.
The only subtlety is that we hang the child fds off the output buffer;
the destructor for that will flush, which means it has to be destroyed
before the output buffer is.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 32cfdc3aec34272612f43a3588e4cabed9c85b68)
The child no longer uses ctdb_ctrl_event_script_init or
ctdb_ctrl_event_script_finished, and the others are redundant: it
doesn't need to tell us it's starting a script when it only runs one.
We move start and stop calls to the parent, and eliminate the RPC
infrastructure altogether.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 391926a87a7af73840f10bb314c0a2f951a0854c)
We do the same thing in two places: fire off a child from the initial
ctdb_event_script_callback_v() and also from the ctdb_event_script_handler()
when it's done.
Unify this logic into fork_child_for_script().
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 814704a3286756d40c2a6c508c1c0b77fa711891)
We rename child_run_scripts() to child_run_script(), because it now
runs a single script rather than walking the list. When it's
finished, we fork the next child from the ctdb_event_script_handler()
callback.
ctdb_control_event_script_init() and ctdb_control_event_script_finished()
are now called directly by the parent process; the child still calls
ctdb_ctrl_event_script_start() and ctdb_ctrl_event_script_stop() before
and after the script.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 0fafdcb8d3532a05846abaa5805b2e2f3cee8f47)
This means all the state about running the scripts is in that structure,
which helps in the next patch.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 020fd21e0905e7f11400f6537988645987f2bb32)
We put a "scripts" member in ctdb_event_script_state, rather than using
a special struct for monitor events. This will fit better as we further
unify the different events, and holds the reports from the child process
running each monitor script.
Rather than making the monitor state a child of current_monitor_status_ctx,
we just point current_monitor directly at it. This means we need to reset
that pointer in the destructor for ctdb_event_script_state.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 9a2b4f6b17e54685f878d75bad27aa5090b4571f)
We have monitor_event_script_ctx and other_event_script_ctx, and
current_monitor_status_ctx in struct ctdb_context. This seems more
complex than it needs to be.
We use a single "event_script_ctx" as parent for all event script
state structures. Then we explicitly reparent monitor events under
current_monitor_status_ctx: this is freed every script invocation to
kill off any running scripts anyway.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 0d925e6f2767691fa561f15bbb857a2aec531143)
Simple refactoring in preparation for switching to one-child-per-script.
We also call the functions run by the child process "child_".
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit bfee777faff75e9bed4aedc1558957483616a6d3)
This is the start of a move towards finer-grained reporting, with one
child per script. Simple code motion to do sanity check and get the
list of scripts before fork().
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 816b9177f51ae5b21b92ff4a404f548fe9723c96)
Currently the timeout handler in eventscript.c does the banning if a
timeout happens. However, because monitor events are different, it has
to special case them.
As we call the callback anyway in this case, we should make that handle
-ETIME as it sees fit: for everyone but the monitor event, we simply ban
ourselves. The more complicated monitor event banning logic is now in
ctdb_monitor.c where it belongs.
Note: I wrapped the other bans in "if (status == -ETIME)", though they
should probably ban themselves on any error. This change should be a
noop.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 9ecee127e19a9e7cae114a66f3514ee7a75276c5)
eventscript.c uses this now, but our next patch makes others use it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit a305cb7743c24386e464f6b2efab7e2108bb1e7e)
If we time out just as the child exits, we currently will report an
uninitialized cb_status field. Set it to -ETIME as expected.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 024386931bda9757079f206238ae09bae4de6ea2)
This completes our "problem with script" reporting; we never set cb_status
to -1 on error. Real errnos are used where the failure is a system call
(eg. read, setpgid), otherwise -EIO is used if we couldn't communicate with
the parent.
The latter case is a bit useless, since the parent probably won't see
the error anyway, but it's neater.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 1269458547795c90d544371332ba1de68df29548)
If we break, we avoid cut & paste code inside the loop. Need to initialize
ret to 0 for the "no scripts" case.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit ec36ced9446da7e3bf866466d265ee8e18f606c1)
Rather than ignoring deleted event scripts (or pretending that they were "OK"),
and discarding other stat errors, we save the errno and turn it into a negative
status.
This gives us a bit more information if we can't execute a script (eg.
too many symlinks or other weird errors).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 5d894e1ae5228df6bbe4fc305ccba19803fa3798)
This unifies code paths and simplifies things: we just hand -ENOEXEC to
ctdb_ctrl_event_script_stop().
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit eadf5e44ef97d7703a7d3bce0e7ea0f21cb11f14)
We currently assume 127 == script removed. The script can also return 127;
best to re-check the execution status in this case (and for 126, which will
happen if the script is non-executable).
If the script is no longer executable/not present, we ignore it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 0a53d6b5ac81daf0efa32f35e7758ede2a5bdb63)
This is used later in the "script vanished" check.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 8ddb97040842375daf378cbb5816d0c2b031fa65)
This starts the move toward more expressive encoding of return values:
positive values mean the script ran, negative means we had a problem with
the script (and the value is the errno).
This does timeout, but changes the ctdb tool to recognize it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 0eb1d0aa14e68b598d9e281c8a02b8f94a042fd9)
This simplifies the code a little: last_status is now read to go
(it's only used by the scriptstatus command at the moment).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 6be931266a4e41fd0253f760936ad9707dd97c47)
Commit 50c2caed57c0 removed a gratuitous talloc_steal from the code in
ctdb_control_event_script_finished(), but not ctdb_event_script_timeout().
Easiest to call ctdb_control_event_script_finished() at the bottom of the
timeout routine.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 17fa252d0d6981fbae8083a818f26d5ce9c5102e)
Now we're doing checking, we might as well make sure the commands from
"ctdb eventscripts" are valid.
This gets rid of the "UNKNOWN" event type.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 1d24a3869fe89fc9a109fd9e9b69df5fc665a5f6)
Now we're doing checking, we might as well make sure the commands from
"ctdb eventscripts" are valid.
This gets rid of the "UNKNOWN" event type.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 66b22980b14601f29fe8cc64bd8f29883c7ca1c0)
This is not as good as a compile-time check, but at least we count the
number of arguments are correct.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 83b7b233cb4707e826f6ba260bd630c8bc8f1e76)
This is not as good as a compile-time check, but at least we count the
number of arguments are correct.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit a6d353519932eee48f9241ad8887b692882906c9)
Finally, we remove the call name (eg. "monitor" or "start") from the
options field of the struct: it now contains only extra options.
This is clearer, and mainly involves adding some %s to debug statements.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 33fb0e7ba047ca73969b59bccf70a04a17c25a0a)
Finally, we remove the call name (eg. "monitor" or "start") from the
options field of the struct: it now contains only extra options.
This is clearer, and mainly involves adding some %s to debug statements.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit b0648c7f08eba87ec3c9714e2525c9b621bfb4ef)
This means we can get rid of more strcmp; they can simply use the
state->call value instead.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 6c79fa33e26cc4f0873577f8e122b1495b4c427e)
This means we can get rid of more strcmp; they can simply use the
state->call value instead.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 834c93b3e1b8f4151b8a2cd82c2dd8bacc17f66c)
Rather than doing strcmp everywhere, pass an explicit enum around. This
also subtly documents what options are available. The "options" arg
is now used for extra arguments only.
Unfortunately, gcc complains on empty format strings, so we make
ctdb_event_script() take no varargs, and add ctdb_event_script_args(). We
leave ctdb_event_script_callback() taking varargs, which means callers
have to do "%s", "".
For the moment, we have CTDB_EVENT_UNKNOWN for handling forced scripts
from the ctdb tool.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 8001488be4f2beb25e943fe01b2afc2e8779930d)
Rather than doing strcmp everywhere, pass an explicit enum around. This
also subtly documents what options are available. The "options" arg
is now used for extra arguments only.
Unfortunately, gcc complains on empty format strings, so we make
ctdb_event_script() take no varargs, and add ctdb_event_script_args(). We
leave ctdb_event_script_callback() taking varargs, which means callers
have to do "%s", "".
For the moment, we have CTDB_EVENT_UNKNOWN for handling forced scripts
from the ctdb tool.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 470822b329f9d3ca9bef518b56e9ce28d5fedda2)
Everyone uses the same timeout value, so just remove it from the API.
If we ever need variable timeouts, that might as well be central too.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 533c3e053293941d2a9484b495e78d45f478bb08)
Everyone uses the same timeout value, so just remove it from the API.
If we ever need variable timeouts, that might as well be central too.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit fe8027309c1f7b987cd368fa98f9b28741baa786)
ctdb_event_script_v doesn't take varargs. ctdb_run_event_script is
a better name, and fix comment.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 466beafadb37011fe273de8810ab0012e92a1fd8)
1) ctdb_event_script_v doesn't take varargs. ctdb_run_event_script is
a better name, and fix comment.
2) Fix indentation on allowed_scripts.
3) Comment on run_eventscripts_callback is wrong; it's the callback
for any ctdb forced event.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit e7d57d7ae678b24dab3364a348838c6a3398942c)
In 15bc66ae801b0c69, Ronnie fixed a double-free race. The problem was that
ctdb_run_eventscripts() hands a context to ctdb_event_script_callback() to
hang its data off, which gets freed in the callback. This particularly
hurt in ctdb_event_script_timeout.
There's nothing wrong with this, but obviously we should make the callback
call last of all. At the time, ctdb_event_script_timeout() carefully
extracted everything from the struct ctdb_event_script_state before
calling ->callback.
This was cleaned up in 64da4402c6ad485f (Ronnie again), and now state
was referred to after the callback again. But the same change introduced
a direct use-after-free bug which caused an occasional oops.
So in our last episode (eda052101728cf92) Volker fixed this, and Michael
committed it.
But we still have the double free bug which 15bc66ae801b0c69 was supposed
to fix! Let's try to fix this in a more permanent way, but always doing
the callback from the destructor. This means we need to hold the status,
and don't send the KILL signal if ->child is set to 0.
Finally, add a comment about freeing ourselves in run_eventscripts_callback
and the structure definition.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit b90bdb07c1f6913ddbf11bde9684bdc8af61c549)
In 15bc66ae801b0c69, Ronnie fixed a double-free race. The problem was that
ctdb_run_eventscripts() hands a context to ctdb_event_script_callback() to
hang its data off, which gets freed in the callback. This particularly
hurt in ctdb_event_script_timeout.
There's nothing wrong with this, but obviously we should make the callback
call last of all. At the time, ctdb_event_script_timeout() carefully
extracted everything from the struct ctdb_event_script_state before
calling ->callback.
This was cleaned up in 64da4402c6ad485f (Ronnie again), and now state
was referred to after the callback again. But the same change introduced
a direct use-after-free bug which caused an occasional oops.
So in our last episode (eda052101728cf92) Volker fixed this, and Michael
committed it.
But we still have the double free bug which 15bc66ae801b0c69 was supposed
to fix! Let's try to fix this in a more permanent way, but always doing
the callback from the destructor. This means we need to hold the status,
and don't send the KILL signal if ->child is set to 0.
Finally, add a comment about freeing ourselves in run_eventscripts_callback
and the structure definition.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 20b15de068d042b292725945927ceda1b01d07c0)
Write the whole int through the pipe, rather than quietly cutting it
off. Also, use -2 as the result if the read fails; -1 comes from many
paths if the child fails before running the script.
Add a comment about why we don't need to check the write.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 6804f880436645b52c09a78fa300377fa8058d0e)
Write the whole int through the pipe, rather than quietly cutting it
off. Also, use -2 as the result if the read fails; -1 comes from many
paths if the child fails before running the script.
Add a comment about why we don't need to check the write.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit c715746c2f40eb9b21dbf011d16f1f1b0b53fdf9)
command.
Use the existing context used for non-monitor events
Multiple concurrent uses of "ctdb eventscript ..." could otherwise lead to a SEGV
(This used to be ctdb commit 80a8d728e9680040e00d24361dfc9367dd372a56)
This allows running the actual monitoring asynchronously from ctdbd
and only using "status" to pick up the actual results.
(This used to be ctdb commit 1908bac812650ca25151051f5d86815e0b8ed319)
When a single script is finished, also log the name of the script, the duration it took and the return status.
In the loop where we signal back to the main daemon that the script finished, do this once every 100ms instead of once every 1 second
(This used to be ctdb commit 6a1f7a7b1b3a0b8f89998db8fdad83bbb4e9b5a5)
This can race with teh eventascript handling that does a :
list all scripts, sort them, then execute them
so trap status code 127 which means the script could not be executed (or /bin/sh does not exist) and treat it as not to cause the node to become unhealthy
(This used to be ctdb commit befabc917edb036ca81f5216f65a6d62b26ee83e)
Add a tuneable so that when scripts starts to hang/timeout, we can make the node unhealthy instead of banned
(This used to be ctdb commit 2e9fc6f0609833c6d8146196011ef780669d615d)
so we can spot if there are leaks.
plug two leaks for filedescriptors related to when sending ARP fail
and one leak when we can not parse the local address during tcp connection establish
(This used to be ctdb commit ddd089810a14efe4be6e1ff3eccaa604e4913c9e)
print a full "pstree -p" to the log.
Example :
|-ctdbd(29826)-+-ctdbd(29862)
| `-ctdbd(31897)-+-00.ctdb(31898)---sleep(31908)
change the default timeout to 60 seconds for eventscripts
(This used to be ctdb commit a3406c10d70f89d332eab25d481083142dff987d)
This event is called when a node is stopped and is used by eventscripts that need to do certain cleanup and removal of configuration or ip addresses or routing ...
Note that a STOPPED node is considered "inactive" and as such will not be running the "recovered" event when the rest of the cluster has recovered.
(This used to be ctdb commit 65e9309564611bf937ded3c74a79abff895d7c59)
this command shows which eventscripts were executed during the last monitoring cycle and the status from each eventscript.
If an eventscript timedout or returned an error we also
show the output from the eventscript.
Example :
[root@rcn1 ctdb-git]# ./bin/ctdb scriptstatus
6 scripts were executed last monitoring cycle
00.ctdb Status:OK Duration:0.021 Mon Mar 23 19:04:32 2009
10.interface Status:OK Duration:0.048 Mon Mar 23 19:04:32 2009
20.multipathd Status:OK Duration:0.011 Mon Mar 23 19:04:33 2009
40.vsftpd Status:OK Duration:0.011 Mon Mar 23 19:04:33 2009
41.httpd Status:OK Duration:0.011 Mon Mar 23 19:04:33 2009
50.samba Status:ERROR Duration:0.057 Mon Mar 23 19:04:33 2009
OUTPUT:ERROR: Samba tcp port 445 is not responding
Add a new helper function "switch_from_server_to_client()" which both
the recovery daemon can use as well as in the child process we start for running the actual eventscripts.
Create several new controls, both for the eventscript child process to inform the master daemon of the current status of the scripts as well as for the ctdb tool to extract this information from the runninc daemon.
(This used to be ctdb commit c98f90ad61c9b1e679116fbed948ddca4111968d)
memory if ctdb_run_eventscript() would be called
during processing of ctdb_event_script_timeout() for
user unvoked eventscripts. (eventsccripts invoked by "ctdb eventscript ...")
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
(This used to be ctdb commit 15bc66ae801b0c69a65a7a2acf5df151e76edc2a)
This reverts commit bfba5c7249eff8a10a43b53c1b89dd44b625fd10.
revert the waitpid changes. we need to waitpid for some childredn so should
refactor the approach completely
(This used to be ctdb commit 702ced6c2fe569c01fe96c60d0f35a7e61506a96)
so we should not call it from the main daemon.
1, set SIGCHLD to SIG_DFL to make sure we ignore this signal
2, get rid of all waitpid() calls
3, change reporting of event script status code from _exit()/waitpid() to write()/read() one byte across the pipe.
(This used to be ctdb commit bfba5c7249eff8a10a43b53c1b89dd44b625fd10)
If the event script that timed out was for the "monitor" event, then
even if it timed out we still return SUCCESS back to the guy invoking the eventscript.
Only consider the eventscript for "monitor" to have failed with an error
IFF it actually terminated with an error, or if it timed out 5 times in a row and hung.
(This used to be ctdb commit 60f3c04bd8b20ecbe937ffed08875cdc6898b422)
This attempts to fix the problem of ctdb event scripts blocking due to
attempted access to the ctdb databases during recovery. The changes are:
- now only the 'shutdown' and 'startrecovery' events can be called
with the databases locked in recovery. The event scripts must ensure
that for these two events no database access is attempted
- the recovered, takeip and releaseip events could previously be called
inside a recovery. The code now ensures that this doesn't happen, delaying
the events till after recovery has finished
- the 50.samba event script now avoids using testparm unless it is really
needed
This needs extensive testing.
(This used to be ctdb commit e3cdb8f2be6a44ec877efcd75c7297edb008a80b)
multiple public addresses spread across multiple interfaces on each
node.
this is a massive patch since we have previously made the assumtion that
we only have one public address per node.
get rid of the public_interface argument. the public addresses file
now explicitely lists which interface the address belongs to
(This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8)
specific script /etc/ctdb/events.d/00.ctdb
get rid of CTDB_EVENTS_SCRIPT and --event-script
(This used to be ctdb commit 81ccfaf838e5772d4a58eb6a70224b7b39aba9f3)
instead for from /etc/ctdb/events so that we can get better debugging
output in the logs when something fails in the scripts
(This used to be ctdb commit 4ed96b768aea1611e8002f7095d3c4d12ccf77a3)