1
0
mirror of https://github.com/samba-team/samba.git synced 2024-12-22 13:34:15 +03:00
Commit Graph

177 Commits

Author SHA1 Message Date
Martin Schwenke
ecafbce1b1 ctdb-daemon: Do not disable monitoring when running eventscripts
This is racy and cbffbb7c2f makes it
unnecessary.

The eventscript code still knows that monitor events are special
compared to other events.  However, the general concept of monitoring
is no longer tangled up with running scripts.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2014-03-23 04:20:14 +01:00
Amitay Isaacs
276b233c00 ctdb-daemon: Consult CTDB_DEBUG_HUNG_SCRIPT variable before running debug script
If CTDB_DEUB_HUNG_SCRIPT is set, use that instead of the default
debug script.  This code was dropped by mistake in commit
18c1f43210.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>

Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Wed Feb 12 08:47:47 CET 2014 on sn-devel-104
2014-02-12 08:47:47 +01:00
Amitay Isaacs
eee450fec2 ctdb-daemon: Simplify listing event scripts using scandir
Instead of using RB tree for sorting the script names (incorrectly since
it's only using the leading numbers in the script name), use scandir
with alphasort.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>

Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Tue Jan 21 06:41:25 CET 2014 on sn-devel-104
2014-01-21 06:41:25 +01:00
Amitay Isaacs
cbffbb7c2f ctdb-daemon: Do not run monitor event if any other event is already running
Any currently running monitor events are cancelled if any other events
are scheduled.  However, this does not stop monitor events to be run
when other events are already running.

Keep track of the number of active events and schedule monitor event
only if there are no active events.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2014-01-21 11:30:41 +11:00
Amitay Isaacs
97575e1ba0 ctdb-daemon: Remove unused code to run eventscripts
Eventscripts are now executed using a helper.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2014-01-16 12:11:38 +11:00
Amitay Isaacs
18c1f43210 ctdb-daemon: Replace ctdb_fork_with_logging with ctdb_vfork_with_logging (part 2)
Use ctdb_event_helper to run debug-hung-script.sh.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2014-01-16 12:11:38 +11:00
Amitay Isaacs
d86662a925 ctdb-daemon: Replace ctdb_fork_with_logging with ctdb_vfork_with_logging (part 1)
Use ctdb_event_helper to run eventscripts.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2014-01-16 12:11:37 +11:00
Amitay Isaacs
7aa20ccb5c ctdb-daemon: No need to call event scripts with CTDB_CALLED_BY_USER
This was added to support external monitoring using CTDB event scripts.
However, it was never used.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2014-01-16 11:41:12 +11:00
Amitay Isaacs
bafa467021 ctdb-daemon: Deprecate RELOAD and STATUS events
These events have never been used.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2014-01-16 11:41:12 +11:00
Amitay Isaacs
e850a6d2ca ctdbd: Finish eventscript callback processing before debugging hung script
This ensures that the result of eventscripts is updated and callback is
processed before debugging hung script.  So "ctdb scriptstatus" output
will be useful from debug hung script.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-Programmed-With: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 4ed2efb838d2ac97746666f614ebef5fdf3cdd5e)
2013-08-22 17:00:19 +10:00
Amitay Isaacs
a030b938ca eventscript: Wait for debug hung script to finish or timeout before continuing
Currently if the debug hung script takes long time to finish, the subsequent
monitor event can collide with the previous event which is not yet finished.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 9e99e0eb072e2b845914ee3896acbc66b96138d7)
2013-08-09 11:04:55 +10:00
Martin Schwenke
6cbcc4a8d9 ctdbd: Pass event name to hung script debugger
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit e0f3fa1020e13b84bdd672538168d148f1847d57)
2013-07-23 11:28:07 +10:00
Sumit Bose
157f1cfefd Fixes for various issues found by Coverity
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 05bfdbbd0d4abdfbcf28e3930086723508b35952)
2013-07-11 15:16:55 +10:00
Amitay Isaacs
1c21f37e57 ctdbd: Set process names for child processes
This helps distinguish processes in process list in top, perf, etc.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 2493f57ce268d6fe7e4c40a87852c347fd60d29e)
2013-07-10 14:33:19 +10:00
Amitay Isaacs
c944a589ca ctdbd: Don't ban self if init or shutdown event fails
There is no point in banning the node if init or shutdown event times
out since it's going to quit anyway.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit ef1c4e99ca66e7a990bc557f34abb624c315e6ba)
2013-07-02 12:59:09 +10:00
Amitay Isaacs
d0c858f211 ctdbd: Make sure we don't kill init process by mistake
If getpgrp() fails, it will return -1 and that will send KILL signal to init
process (PID 1).  This does not happen on RHEL, but does on AIX.

Reported-by: Chris Cowan <cc@us.ibm.com>

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit edb2a3556d03e248b42f63dd2c62382b723bc98f)
2013-06-14 16:39:48 +10:00
Martin Schwenke
fa16cccf02 ctdbd: Remove the "stopped" event
It isn't used, superceded by "ipreallocated".

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit c2bb8596a8af6406ef50e53953884df9d6246a96)
2013-05-06 13:38:21 +10:00
Martin Schwenke
2e59cd5428 ctdbd: New control CTDB_CONTROL_IPREALLOCATED
This is an alternative to using ctdb_run_eventscripts() that can be
used when in recovery.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 27a44685f0d7a88804b61a1542bb42adc8f88cb1)
2013-05-06 13:38:21 +10:00
Martin Schwenke
f6e48639cd ctdbd: Avoid freeing non-monitor event callback when monitoring is disabled
When running a non-monitor event, check is made for any active monitor
events.  If there is an active monitor event, then the active monitor
event is cancelled.  This is done by freeing state->callback which is
allocated from monitor_context.

When CTDB is stopped or shutdown, monitoring is disabled by freeing
monitor_context, which frees callback and then stopped or shutdown event
is run.  This creates a new callback structure which is allocated at
the exact same memory location as the monitor callback which was freed.
So in the check for active monitor events, it frees the new callback
for non-monitor event.  Since the callback function flags successful
completion of that event, it is never marked complete and CTDB is stuck
in a loop waiting for completion.

Move the monitor cancellation to the top of the function so that this
can't happen.

Follow log snippest highlights the problem.

2013/04/30 16:54:10.673807 [21505]: Received SHUTDOWN command. Stopping CTDB daemon.
2013/04/30 16:54:10.673814 [21505]: Shutting down recovery daemon
2013/04/30 16:54:10.673852 [21505]: server/eventscript.c:696 in remove_callback 0x1c6d5c0
2013/04/30 16:54:10.673858 [21505]: Monitoring has been stopped
2013/04/30 16:54:10.673899 [21505]: server/eventscript.c:594 Sending SIGTERM to child pid:23847
2013/04/30 16:54:10.673913 [21505]: server/eventscript.c:629 searching for callback 0x1c6d5c0
2013/04/30 16:54:10.673932 [21505]: server/eventscript.c:641 running callback
2013/04/30 16:54:10.673939 [21505]: server/eventscript.c:866 in event_script_callback
2013/04/30 16:54:10.673946 [21505]: server/eventscript.c:696 in remove_callback 0x1c6d5c0

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 05f785b51cfd8b22b3ae35bf034127fbc07005be)
2013-05-06 13:00:07 +10:00
Martin Schwenke
37632efde0 ctdbd: Don't use a fixed length buffer for the hung script command
The amount of data to write into the buffer wasn't constrained
anywhere...

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 9b0d56b16775aa16f33bdfdf831256e085fa3339)
2013-02-05 16:05:13 +11:00
Martin Schwenke
e883720461 ctdbd: Complain loudly if CTDB_DEBUG_HUNG_SCRIPT script isn't executable
This is quite easy to misconfigure by failing to set the execute bit
on the script.  Better to complain loudly.

This is a debugging facilty rather than core CTDB functionality, so it
doesn't need a subtle mechanism to disable it at run-time.  To disable
the designated script at run-time either edit it to put an "exit 0" at
the top or move it aside and symlink to /bin/true.

This is implemented by actually removing the code that checks that the
file exists and is executable.  The output from the shell when the
system() function fails is just as useful.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 3400b2ed34b6eb9496eb55f1aab6f89d2952060d)
2013-02-05 16:05:13 +11:00
Martin Schwenke
bc5f0a2b65 ctdbd: Remove command-line option --debug-hung-script
Use an environment variable instead.  This just means that the
initscript exports CTDB_DEBUG_HUNG_SCRIPT and the code checks for the
environment variable.

The justification for this simplification is that more debug options
will be arriving soon and we want to handle them consistently without
needing to add a command-line option for each.  So, the convention
will be to use an environment variable for each debug option.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 0581f9a84e58764d194f4e04064c2c5b393c348b)
2013-02-05 16:05:13 +11:00
Martin Schwenke
f2428cadd8 ctdbd: Remove debug_hung_script_ctx
The only allocation against this context is by
ctdb_fork_with_logging().  This memory is freed by ctdb_log_handler()
anyway.  There should be no memory leak.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 501461cc3e132d4adee9e91b5d4513a26bae2846)
2013-02-05 16:05:13 +11:00
Amitay Isaacs
4a6fa39ff9 daemon: Protect against double free of callback state while shutting down
When CTDB is shut down and monitoring has been stopped, monitor_context
gets freed and all the callback states hanging off it.  This includes
callback state for current_monitor, if the current monitor event has
not yet finished.  As a result, when the shutdown event is called,
current_monitor->callback state is not NULL, but it's actually freed
and it's a dangling reference.

So before executing callback function and freeing callback state check
if ctdb->monitor->monitor_context is not NULL.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 7d8546ee4353851f0543d0ca2c4c67cb0cc75aea)
2013-01-09 14:39:23 +11:00
Martin Schwenke
199b971f57 ctdbd: Remove references to forcing running of eventscripts from log messages
Running of eventscripts can be initiated from many places, including
the recovery daemon.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 440892d75ef73c0aca22f47c0c01712be00cf5b7)
2012-10-18 20:05:43 +11:00
Martin Schwenke
65725d30d4 ctdbd: Remove the worked "Forced" from message about running eventscripts
The eventscripts are run after a takeover run and in this case they're
not forced.  The messages seems to imply that somone has run "ctdb
eventscript" when that is not necessarily the case.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 3880589db4d563e438126cf5080261fa06b9e242)
2012-07-26 22:10:54 +10:00
Ronnie Sahlberg
dce5969d12 Debug: When scripts hang, we may need to collect additional data in order to debug why the script hung.
Break this debug and datacollection out into an external script to make it easier to modify what data we need to collect.
For now we only collect a pstree so we can see what part of the script we hung in.

S1037271

(This used to be ctdb commit 6e68797af67bee36f2bad045f94806e7e98f27e9)
2012-05-17 10:29:03 +10:00
Ronnie Sahlberg
a57eba2bb4 Track all child process so we never send a signal to an unrelated process (our child died and kernel wrapped the pid-space and reused the pid for a different process
Wrap all creation of child processes inside ctdb_fork() which is used to track all processes we have spawned.
Capture SIGCHLD to track also which child processes have terminated.

Wrap kill() inside ctdb_kill() and make sure that we never send a !0 signal to a child process pid that has already terminated (and might have been replaced with a

(This used to be ctdb commit f73a4b1495830bcdd094a93732a89dd53b3c2f78)
2012-05-03 14:03:26 +10:00
Amitay Isaacs
4392591555 Remove explicit include of lib/tevent/tevent.h.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 0681014ca5ed2a9b56f63fdace7f894beccf8a9a)
2012-04-13 17:28:14 +10:00
Ronnie Sahlberg
93ec9c589c Eventscripts: remove the horrible horrible circular reference between state and callback since these two structures do not even share the same parent talloc context.
Instead, tie them together via referencing a permanent linked list hung off the ctdb structure.

(This used to be ctdb commit a95c02da6c67dc4bd8716b75318a4188301df6f9)
2012-02-23 06:49:47 +11:00
Ronnie Sahlberg
0581fd85e6 Eventscripts: Add special -ECANCELED status for monitor events that are cancelled
When a monitor event is canceled by a higher priority script, make sure we return
status -ECANCELED to the callback in ctdB_monitor.c
Also treat -ECANCELED as a simple "try monitor event again" and skip modifying any HEALTHY/UNHEALTHY flags when this happens

(This used to be ctdb commit a15ec57c26d1bc82af85f74eebae0bd8abde3233)
2011-11-18 12:22:22 +11:00
Ronnie Sahlberg
2902203900 Logging: when we log stdout/stderr messages from eventscripts to the system log, prefix every line of output with the name of the eventscript.
CQ S1028412

(This used to be ctdb commit 392363c04185f47a826fc6ed95038342be2150bf)
2011-08-26 09:39:25 +10:00
Rusty Russell
87ea4818bf eventscript: fix callback after free
ctdb_event_script_callback() takes a mem_ctx arg which it doesn't use, but
the implication is pretty clear, that when that mem_ctx is freed, the callback
shouldn't happen.  Indeed, Ronnie reproduced a case where that callback
refers to freed memory, in the ip reallocation code under stress.

So attach the callback to the mem_ctx they give us, and remove it from the
script state structure when that's freed.  It's a bit weird, but it works.

CQ: S1026179
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

(This used to be ctdb commit 6fcd867cc835ef1ffc1c50964f135c346503d40c)
2011-07-29 08:50:39 +10:00
Ronnie Sahlberg
2f1395ce03 If the eventscript is finished but state->ctdb is NULL,
log an error and return.

(Need to find root cause for this is soo too.)

(This used to be ctdb commit 2e80d53b73fcba58ed5a72bab66c051691ccf719)
2011-04-12 06:36:42 +10:00
Ronnie Sahlberg
c4006ce844 Add ctdb_fork(0 which will fork a child process and drop the real-time
scheduler for the child.

Use ctdb_fork() from callers where we dont want the child to be running
at real-time privilege.

(This used to be ctdb commit 58795a4c9e0624e20fa3e0023b65127053edd103)
2011-01-11 07:40:41 +11:00
Ronnie Sahlberg
c95f4258d8 Add a new event "ipreallocated"
This is called everytime a reallocation is performed.

    While STARTRECOVERY/RECOVERED events are only called when
    we do ipreallocation as part of a full database/cluster recovery,
    this new event can be used to trigger on when we just do a light
    failover due to a node becomming unhealthy.

    I.e. situations where we do a failover but we do not perform a full
    cluster recovery.

    Use this to trigger for natgw so we select a new natgw master node
    when failover happens and not just when cluster rebuilds happen.

(This used to be ctdb commit 7f4c591388adae20e98984001385cba26598ec67)
2010-08-30 18:09:30 +10:00
Ronnie Sahlberg
2e8aac6689 Merge commit 'rusty/ports-from-1.0.112' into foo
(This used to be ctdb commit 13e58d92f5f1723e850a82ae030d0ca57e89b1ee)
2010-08-19 13:17:56 +10:00
Rusty Russell
9fbb191b78 logging: give a unique logging name to each forked child.
This means we can distinguish which child is logging, esp. via syslog where we have no pid.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

(This used to be ctdb commit 68b3761a0874429b90731741f0531f76dcfbb081)
2010-08-18 11:46:32 +09:30
Rusty Russell
f93440c4b7 event: Update events to latest Samba version 0.9.8
In Samba this is now called "tevent", and while we use the backwards
compatibility wrappers they don't offer EVENT_FD_AUTOCLOSE: that is now
a separate tevent_fd_set_auto_close() function.

This is based on Samba version 7f29f817fa.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 85e5e760cc91eb3157d3a88996ce474491646726)
2010-08-18 09:16:31 +09:30
Rusty Russell
6da848f31c eventscript: simplify script timeout handling
Now the script child signal handler doesn't do anything, we can unify the
"timeout" and "abort" cases introduced in 9dd25cb751919799.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 439f049c7024d69aa4b87dc811e1772981ad29cb)
2010-04-08 15:11:05 +09:30
Rusty Russell
a9c8b9e89b eventscript: wait for debugging dump before killing timedout script
Fairly simple: prevent the destructor from killing the script, and do it
explicitly from the debugging child.

We can remove the extra "already dead" test, since this will be detected
in the destructor anyway.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit f8aa83788e3cc10ab7655a90d7b7b17ddbe48685)
2010-04-08 15:09:08 +09:30
Rusty Russell
e1b59b6a47 eventscript: don't do debugging system() from inside signal handler
In the case of a timeout, we dump a log of what's happening to a file
in /tmp.  We do it from the signal handler, which is an unreliable hack
(BZ58365).

Instead, create another (lower-priority) child to do the dump, then
kill the timedout script.

Note that this doesn't quite work as intended (the dump is often run
after the script has been killed), so the next patch resolves this.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 7ee5ecc8d53e78e2dec21197b74a74cc4ae1834c)
2010-04-08 15:13:29 +09:30
Rusty Russell
037dfbb8ad eventscript: fix case where we fail to create child for some reason
Initialize the child pid to 0 so destructor doesn't try to kill it:

	server/eventscript.c:565 Sending SIGTERM to child pid:139742328
	Failed to kill child process for eventscript, errno No such process(3)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit fcc63e04beb427c1f48deae6d3d98c78a2a67949)
2010-04-08 10:35:04 +09:30
Ronnie Sahlberg
e179910136 When we forcefully abort a running eventscript, dont log this as is
the script timedout.

Instead send a different signal (SIGABRT) to the child process to silently
kill the process group for the script and its children without logging
anything.

We abort any running "monitor" script anytime any other event is generated
either by ctdbd itself or by "ctdb eventscript ..."

BZ61043

(This used to be ctdb commit 9dd25cb751919799af9d8a23a0725343a8400e58)
2010-03-30 12:47:54 +11:00
Stefan Metzmacher
3419e9c4dd server: add "setup" event
This is needed because the "init" event can't use 'ctdb' commands.

metze

(This used to be ctdb commit 1493436b6b24eb05a23b7a339071ad85f70de8f4)
2010-02-23 10:38:49 +01:00
Ronnie Sahlberg
68decc38ca Ignore any scripts that timesout for most events, except startup.
Threat hung scripts always (except startup) as success.

(This used to be ctdb commit b6d939c9758c7d2e39206838492f2f644dd61db7)
2010-02-16 11:21:27 +11:00
Ronnie Sahlberg
96a61ca907 Reduce loglevel for two eventscript related debug messages
(This used to be ctdb commit f8994790e65baebb81bbfad646cdda6234b6d29a)
2010-02-16 11:02:11 +11:00
Stefan Metzmacher
98ee69c66d server: add updateip event
metze

(This used to be ctdb commit 712ed0c4c0bff1be9e96a54b62512787a4aa6259)
2010-01-20 11:11:01 +01:00
Stefan Metzmacher
fd06167caa server: add "init" event
This is needed because the "startup" event runs after the initial recovery,
but we need to do some actions before the initial recovery.

metze

(This used to be ctdb commit e953808449c102258abb6cba6f4abf486dda3b82)
2010-01-20 09:44:36 +01:00
Rusty Russell
565b2cda11 eventscript: fix bug when script is aborted
Another corner case when we terminate running monitor scripts to run
something else: logging can flush the output and we write to a NULL
pointer.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

(This used to be ctdb commit eb22c34bccc8a04fcf63efa2bc48d9788709382e)
2009-12-18 14:48:41 +11:00