mirror of https://github.com/samba-team/samba.git synced 2024-12-24 21:34:56 +03:00
Commit Graph

319 Commits

Author SHA1 Message Date
Amitay Isaacs
1f523a628a ctdb-tests: Avoid early exits in scripts that appear on tail of a pipe
When executing shell code such as "foo | bar", if "bar" terminates early,
then "foo" can get an I/O error when writing to stdout.

The tdbtool stub did not wait to read anything from stdin when it was
expected to.  This would cause tests to fail randomly under load when
the tdbtool process exited early.

Similarly, the debug function read from stdin only under certain
conditions (higher debug level and when not reading from a tty).
Otherwise, it exited early.

Thanks to Andrew Bartlett for noticing the problem and Catalyst Cloud
(http://catalyst.net.nz/cloud) for providing resources to test fixes.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-Programmed-With: Martin Schwenke <martin@meltin.net>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>

Autobuild-User(master): Michael Adam <obnox@samba.org>
Autobuild-Date(master): Fri Mar 20 16:26:37 CET 2015 on sn-devel-104
2015-03-20 16:26:36 +01:00
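The fix described in 1f523a628a can be illustrated with a minimal sketch. The function name stub_tail is hypothetical and is not the real tdbtool stub from the test suite; it only shows the idea of draining stdin before exiting so that the writer on the left of the pipe never sees a closed pipe.

```shell
# Hypothetical stand-in for a stub that sits on the tail of a pipe.
stub_tail() {
    # Read and discard all of stdin before doing anything else, so the
    # producer can finish writing without hitting an I/O error.
    cat >/dev/null
    echo "stub done"
}

# The producer writes many lines; the stub consumes them all, then reports.
awk 'BEGIN { for (i = 1; i <= 10000; i++) print i }' | stub_tail
```

A stub that printed its canned answer and returned without the cat would work most of the time, which is presumably why the failures only showed up randomly under load.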
Martin Schwenke
50ddc2c356 ctdb-scripts: Remove unused function nfs_statd_update()
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-03-04 10:42:27 +01:00
Martin Schwenke
500c6e194b ctdb-scripts: Change statd-callout to be more scalable
Updating ctdb.tdb on each add-client, del-client and each delete
during notify was too ambitious.  Persistent transactions do not
perform well enough to do this.

Revert to having add-client and del-client create touch files.  Each
monitor event calls "statd-callout update" to convert touch files into
ctdb.tdb records.

Update testcases to do the "update" and add an extra test.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-03-04 10:42:27 +01:00
Martin Schwenke
ab51f283e7 ctdb-scripts: Call iptables/ip6tables directly from iptables_wrapper
Drops the iptables() and ip6tables() functions and, hence, the
hardcoding of paths /sbin/iptables and /sbin/ip6tables.  The latter
avoids problems on openSUSE where (for example) /usr/sbin/iptables is
used instead.

This means that locking around ip*tables commands is only done when
iptables_wrapper is called directly.  This is fine because the only
conflict is when "releaseip" or "takeip"/"updateip" events are run in
parallel.  The other uses in 11.natgw and 70.iscsi are in events where
there will be no collisions.

Making 11.natgw support IPv6 is unnecessary.  Just put a static IPv6
address on each interface - they're plentiful.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Wed Jan 28 08:29:55 CET 2015 on sn-devel-104
2015-01-28 08:29:55 +01:00
Martin Schwenke
4638010abb ctdb-scripts: Don't use the GNU awk gensub() function
This is a gawk extension and can't be used reliably if just running
"awk".  It is simple enough to switch to using the standard sub() and
gsub() functions.

The alternative is to switch to explicitly running "gawk".  However,
although the eventscripts aren't exactly portable, it is probably
better to move closer to portability than further away.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
2015-01-09 02:03:40 +01:00
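As a rough illustration of the switch described in 4638010abb: gensub() returns a modified copy, whereas the standard sub() and gsub() modify their target in place, so the portable form copies the field first. The function name strip_maskbits and the sample input are assumptions made for this sketch, not code from the eventscripts.

```shell
# Strip a trailing "/maskbits" from each address using only standard awk.
# With gawk one might instead write: print gensub(/\/[0-9]+$/, "", 1, $1)
strip_maskbits() {
    awk '{
        addr = $1                    # work on a copy so $1 is untouched
        sub("/[0-9]+$", "", addr)    # sub() edits addr in place
        print addr
    }'
}

printf '10.0.0.1/24\nfe80::1/64\n' | strip_maskbits
```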
Martin Schwenke
a5c5eee7d1 ctdb-scripts: Try to deal with Ubuntu having /usr/sbin/service
Falling back to running the initscript doesn't work because it detects
that upstart is being used and fails.  This was observed when trying
to start winbind on Ubuntu 11.04.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
2015-01-09 02:03:40 +01:00
Martin Schwenke
d0b2375c3d ctdb-scripts: Wait until IPv6 addresses are not "tentative"
There are a few potential failure modes when adding an IPv6 address.
It takes a little while for duplicate address detection to complete, so
wait for a while.  After a timeout, also need to check to see if
duplicate address detection failed - if it did then actually drop the
IP address.

This really needs some careful thinking.  If CTDB disappears on a node
but the node's IP addresses are still on interfaces then the above
failure mode could cause the takeover nodes to become banned.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2014-12-05 21:02:40 +01:00
Amitay Isaacs
d4212bd6a5 ctdb-eventscripts: Specify broadcast optionally to ip addr add
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2014-12-05 21:02:40 +01:00
Martin Schwenke
6471541d6d ctdb-scripts: Make 10.interface IPv6-safe
Add checking to "releaseip" and "updateip" to ensure that the given IP
address is really on the given interface with the given netmask.  If
reality doesn't match the given arguments then believe reality.

Use new function iptables_wrapper() instead of calling iptables()
directly.

Use new function flush_route_cache() instead of doing IPv4-specific
/proc magic.

Remove setting of otherwise unused variable "failed".

Fix a test for which the error message has changed.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2014-12-05 21:02:40 +01:00
Martin Schwenke
c314ae0b2a ctdb-scripts: New functions ip6tables() and iptables_wrapper()
ip6tables() uses the same lock as iptables().  This is done on
suspicion.

iptables_wrapper() takes 1st argument "inet" or "inet6", and the rest
is passed to the correct iptables variant.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2014-12-05 21:02:40 +01:00
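The dispatch described in c314ae0b2a might look roughly like the sketch below. It omits the locking mentioned in the commit message, and the IPTABLES and IP6TABLES variables are hypothetical override hooks added here so the sketch can be exercised without root; they are not claimed to exist in the real functions file.

```shell
# First argument selects the address family; the rest is passed through.
iptables_wrapper() {
    _family="$1"
    shift
    case "$_family" in
    inet)  ${IPTABLES:-iptables} "$@" ;;
    inet6) ${IP6TABLES:-ip6tables} "$@" ;;
    *)
        echo "iptables_wrapper: unknown family \"$_family\"" >&2
        return 1
        ;;
    esac
}

# Dry run: substitute echo for the real commands (assumption for the demo).
IPTABLES="echo iptables"
IP6TABLES="echo ip6tables"
iptables_wrapper inet6 -L -n
```

Letting the caller pass the family as a word keeps call sites free of IPv4 versus IPv6 conditionals, which appears to be the point of the wrapper.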
Martin Schwenke
ed029ae0a1 ctdb-scripts: Add IPv6 addresses support in ip_maskbits_iface()
It also prints a third word, the address family.  This is either
"inet" or "inet6".

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2014-12-05 21:02:40 +01:00
Martin Schwenke
4940f191d3 ctdb-scripts: Update eventscripts to use ctdb -X instead of ctdb -Y
Also update associated eventscript unit tests and ctdb stub.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2014-12-05 21:02:39 +01:00
Martin Schwenke
f51672f514 ctdb-scripts: Add rpc.statd stack dumping to Ganesha restart
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2014-11-18 04:17:10 +01:00
Martin Schwenke
968401ccdc ctdb-scripts: Dump stack traces for hung mountd, rquotad, statd processes
Add a corresponding new unit test for statd.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2014-11-18 04:17:10 +01:00
Martin Schwenke
1f49e1ab5b ctdb-scripts: Add optional program name argument to nfs_dump_some_threads()
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2014-11-18 04:17:10 +01:00
Martin Schwenke
2ebc305be6 ctdb-scripts: Factor out new function program_stack_traces()
In the process, fix a bug where an extra trace would be printed.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2014-11-18 04:17:10 +01:00
Martin Schwenke
8ed3ff456c ctdb-logging: Add logging via UDP to 127.0.0.1:514 to syslog backend
This has most of the advantages of the old logd with none of the
complexity of the extra process.  There are several good syslog
implementations that can listen on the UDP port.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2014-10-28 05:42:04 +01:00
Martin Schwenke
1d1cd04cb9 ctdb-logging: New option CTDB_LOGGING, remove CTDB_LOGFILE, CTDB_SYSLOG
Remove --logfile and --syslog daemon options and replace with
--logging.

Modularise and clean up logging initialisation code.  The
initialisation API includes an app_name argument that is currently
unused - this will be used in extensions to the syslog backend.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2014-10-28 05:42:04 +01:00
Martin Schwenke
61b1fdec2f ctdb-scripts: Support NFS on RHEL7 with systemd
Need to be able to recognise a RHEL system.  Still use "system" to
start and stop service, since that still works and yields the smallest
change.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2014-07-07 10:59:56 +02:00
Martin Schwenke
058e14cdb0 ctdb-eventscripts: Fix regression in IP add/delete functions
Commit 176ae6c704 caused these functions
to exit on failure.  This is incorrect and broke NAT gateway.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2014-03-23 04:20:14 +01:00
Martin Schwenke
fcf846a795 ctdb-eventscripts: Switch on dumping of stuck nfsd threads
This feature was added quite a while ago but was not enabled by
default.  It is a useful feature so enable it to dump stack traces of
up to 5 stuck processes by default.

This can be disabled by setting:

  CTDB_NFS_DUMP_STUCK_THREADS=0

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Feb 25 04:06:45 CET 2014 on sn-devel-104
2014-02-25 04:06:45 +01:00
Martin Schwenke
c743fc4345 ctdb-scripts: Update a misleading comment
This comment was true when 50.samba was spaghetti because it tried to
automatically manage both smbd (and nmbd) and winbind.  It isn't true
anymore.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Wed Feb 19 04:07:12 CET 2014 on sn-devel-104
2014-02-19 04:07:12 +01:00
Martin Schwenke
176ae6c704 ctdb-eventscripts: Deleting IPs should use the promote_secondaries option
If a primary IP address is being deleted from an interface, the
secondaries are remembered and added back after the primary is
deleted.  This is done under a lock shared by the add/del script code.
It is necessary because, by default, Linux deletes secondaries when
the corresponding primary is deleted.

There is a race here between ctdbd and the scripts, since ctdbd
doesn't know about the lock.  If ctdbd receives a release IP control
and the IP address is not on an interface then it is regarded as a
"Redundant release of IP" so no "releaseip" event is generated.  This
can occur if the IP address in question is a secondary that has been
temporarily dropped.  It is more likely if the number of secondaries
is large.

Since Linux 2.6.12 (i.e. 2005) Linux has supported a
promote_secondaries option on interfaces.  This option is currently
undocumented but that will change in Linux 3.14.  With
promote_secondaries enabled the kernel will not drop secondaries but
will promote a corresponding secondary instead.  The kernel does all
necessary locking.

Use promote_secondaries to simplify the code, avoid re-adding
secondaries, avoid re-adding routes and provide improved performance.

This could be done conditionally, with a fallback to legacy
secondary-re-adding code, but no supported Linux distribution is
running a pre-2.6.12 kernel so this is unnecessary.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2014-02-13 02:03:24 +01:00
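For reference, the promote_secondaries option discussed in 176ae6c704 is exposed per interface under /proc/sys/net/ipv4/conf. The following is only a sketch of how a script could switch it on: the function name and the PROC_SYS_BASE override are hypothetical, introduced so that the example can be run against a scratch directory rather than the real /proc/sys.

```shell
# Enable promote_secondaries for one interface.
# PROC_SYS_BASE is a hypothetical hook for dry runs; it defaults to /proc/sys.
enable_promote_secondaries() {
    _iface="$1"
    _knob="${PROC_SYS_BASE:-/proc/sys}/net/ipv4/conf/${_iface}/promote_secondaries"
    if [ ! -w "$_knob" ] ; then
        echo "cannot write $_knob" >&2
        return 1
    fi
    echo 1 >"$_knob"
}

# Dry run against a scratch directory that mimics the /proc/sys layout.
PROC_SYS_BASE=$(mktemp -d)
mkdir -p "$PROC_SYS_BASE/net/ipv4/conf/eth0"
echo 0 >"$PROC_SYS_BASE/net/ipv4/conf/eth0/promote_secondaries"
enable_promote_secondaries eth0
cat "$PROC_SYS_BASE/net/ipv4/conf/eth0/promote_secondaries"
rm -rf "$PROC_SYS_BASE"
```

On a live system the function would be called without PROC_SYS_BASE set and would need sufficient privilege to write to /proc/sys.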
Martin Schwenke
b7bfe46636 ctdb/eventscripts: Move all eventscript state under $CTDB_VARDIR/state
Services can be flagged for reconfigure when they release IPs at
shutdown.  The flag is never removed and the service is prematurely
reconfigured during the first "ipreallocated" event, before any IPs
are hosted and before the "startup" event has actually started the
services.

$CTDB_VARDIR/state directly contained the service state subdirectories
and is already removed in the "init" event.  Just push the service
state subdirectories down a level and put everything else in a
subdirectory.

This way all the eventscript state gets cleaned up every time CTDB
starts up.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri Jan 17 09:58:26 CET 2014 on sn-devel-104
2014-01-17 09:58:26 +01:00
Martin Schwenke
50e00b3e52 ctdb/eventscripts: Print a count if killing TCP connections times out
Also update related test

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2014-01-17 17:59:34 +11:00
Martin Schwenke
8eb20c2347 ctdb/eventscripts: Reconfigure lock should be released quickly
Currently the lock is held until the corresponding eventscript
completes, since the process still exists.  If the regular part of an
eventscript hangs then the lock might unnecessarily be held for a long
time.  The pathological case is when a monitor event gets stuck in
D-wait state and the script times out but can't be killed so the lock
is still held.  This can cause an unwanted monitor replay.

Change this so that the lock is released immediately after the
reconfiguration is complete.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2014-01-17 17:59:26 +11:00
Martin Schwenke
fdccaab2a9 ctdb/eventscripts: Do not reconfigure in "monitor" events
"monitor" events can be cancelled.  If a reconfigure action does a
service restart then the "monitor" event can be cancelled at the
inconvenient moment after the service is stopped.  In this case the
service stays down and the node may become unhealthy (depending on
whether there are any repair actions in the monitor event).

A long time ago we did service reconfiguration in "monitor" events
following failovers.  Service reconfiguration was then moved to the
"ipreallocated" event.  However, reconfiguration in "monitor" events
has been kept as a last resort in case an "ipreallocated" event does
not occur.  The only important case that this covers is "ctdb
deleteip", where "releaseip" events are generated without a
corresponding "ipreallocated".  Therefore, IPs can be deleted without
running the required service reconfiguration.

The supported way of removing IP addresses is now via "ctdb
reloadips", which always causes a takeover run with a corresponding
"ipreallocated" event.

This means that service reconfiguration in "monitor" events is no
longer required and should be removed because it is unsafe.

Also update the associated tests.  Make the first confirm that the
monitor event no longer does reconfiguration.  Change the others to
test that monitor status is correctly replayed when something else is
doing a reconfigure and currently holds the reconfigure lock.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Dec 17 06:32:35 CET 2013 on sn-devel-104
2013-12-17 06:32:35 +01:00
Martin Schwenke
37aea37269 scripts: Make detect_init_style() more readable
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 516cdea0e73cf3f63b3303e22809834c8cbc64e4)
2013-10-22 14:34:05 +11:00
Martin Schwenke
49d0153b10 eventscripts: Fold ctdb_check_tcp_ports_ctdb() into ctdb_check_tcp_ports()
A generic framework is no longer needed now that the "ctdb" checker is
the only one left.  Simplify the code.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 044d302b41a2040642355401e3236fcecc3a620a)
2013-10-22 14:34:04 +11:00
Martin Schwenke
0e9c939c0c eventscripts: Remove TCP port checks other than the built-in CTDB one
"ctdb checktcpport" is no longer experimental so the other checkers
are no longer required.

Remove tests related to the removed checkers.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 50e330d0679614bee2e7bab028436e929f74ca50)
2013-10-22 14:34:04 +11:00
Martin Schwenke
d02a645691 scripts: Remove setting of PATH from functions file
The current setting is inconsistent with settings on most systems,
putting /bin before /sbin.  Use of /usr/local/bin, which may be
required on some systems, is also overridden.  This can make it
difficult to do interactive debugging of script problems.

Rely on the system PATH instead.

If system-specific changes need to be made then this can be done in a
configuration file.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit cfbff39e22e42f3997f637290748290833525714)
2013-10-22 14:34:04 +11:00
Martin Schwenke
cd4041760b scripts: Simplify script_log() to just look at CTDB_SYSLOG variable
The old logic was actually wrong.  If CTDB_LOGFILE is unset then a
default is used, not syslog.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 79e2029f9bc078126e865aa715100a3870c7604b)
2013-10-22 14:34:04 +11:00
Martin Schwenke
4526fdbbca scripts: Remove support for CTDB_OPTIONS configuration variable
Allowing people to put random options in CTDB_OPTIONS complicates some
logic (particularly around use of syslog).  If we're going to have
variables for options then let's make sure we have a variable for each
option and make people use them.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit e55f3a1577eff0182802b0341d865d961aeae1c7)
2013-10-22 14:34:04 +11:00
Martin Schwenke
1043b53d12 scripts: Remove unused configuration variable CTDB_MANAGES_SCP
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit bda0da41aaf629a252cc361b73ebc5328f26ed04)
2013-10-22 14:34:03 +11:00
Martin Schwenke
ace6c1ee62 eventscripts: Fix comment - CTDB_TCP_PORT_CHECKS -> CTDB_TCP_PORT_CHECKERS
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 0a79ba2f1277a776347e2c3f04ce8419e0be62de)
2013-10-22 13:07:13 +11:00
Martin Schwenke
5818771192 scripts: Add support for optional ctdbd.conf configuration file
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 8f660d0dd52013e5876806be908e8e603aa6e968)
2013-09-25 14:35:46 +10:00
Martin Schwenke
e6ce2f55ef eventscripts: Improve message logged when a counter hits a limit
It should print the actual number of consecutive failures rather than
the limit.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit ff5f0d1e29af2b293e30cdc54bed03a644be7038)
2013-08-14 15:57:04 +10:00
Martin Schwenke
35d9631eda eventscripts: Print a message when waiting for TCP connections to be killed
This makes the gaps in the logs more obvious.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 11fbf4789d783dd0bac22754b374dd9ea4b03bad)
2013-08-14 15:57:04 +10:00
Martin Schwenke
b1f7337d2b eventscripts: New configuration variable $CTDB_RPCINFO_LOCALHOST
Passing "localhost" to the rpcinfo command causes overheads, like
reading /etc/services multiple times.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 1d61988af9e4fa3621a3e2d06a859bcb53df2d67)
2013-08-14 15:57:04 +10:00
Martin Schwenke
0ca046577f eventscripts: Add modulo (%) operator to ctdb_check_counter()
Also add it to the corresponding eventscript unit test infrastructure.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit f4ef83a256f59eeb00b9a5bc10c28347e1ad1031)
2013-08-14 15:57:03 +10:00
Martin Schwenke
bdbe37b24f eventscripts: Separate out RPC service restart code
While doing this:

* Explicitly assign RPC program and version information in
  _nfs_check_rpc_common().  This is more lines of code but is easier
  to read.

* Don't print the options when starting a service.  Trying to print it
  makes the code messy for little benefit.

  Update the eventscript unit testing code and a Ganesha test to
  reflect this.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit e8b531405665885196c95fe1608db33a255bf761)
2013-08-14 15:57:03 +10:00
Martin Schwenke
5459cdc8a6 eventscripts: When restarting the nfslock service only show output of start
That is, /dev/null the "stop" output.  This is consistent with the way
CTDB generally deals with the output when stopping a service.

It also makes updating the eventscript unit tests easier.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit c7332526b1b488abefeb4be78a7cd3f2f9abc451)
2013-08-14 15:57:03 +10:00
Martin Schwenke
a8dd716146 eventscripts: kill_tcp_connections() should send connections to stdin
This avoids issuing multiple "ctdb killtcp" commands to terminate tcp
connections, one per connection.  This will considerably reduce the
time when there is a large number of tcp connections.  This also makes
it possible to avoid calling "ctdb killtcp" when there are no connections.

Add a couple of unit tests for killtcp and update eventscript unit
test infrastructure to support.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit a20d94717d2e4ab866d8a002cdf39c0669b74c6a)
2013-07-29 15:53:06 +10:00
Martin Schwenke
4e07c6c433 eventscripts: When replaying monitor status, don't log empty output
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit ce04f1c107b4392ca955d9f29b93aaaae62439ce)
2013-07-05 15:52:33 +10:00
Martin Schwenke
bee02e06e6 scripts: drop_ip() should use delete_ip_from_iface()
Otherwise secondary addresses that aren't owned by CTDB could be
dropped.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 5ffce65a1ad659b198ddf647622b899bdde45c72)
2013-06-20 13:01:09 +10:00
Martin Schwenke
a1eb516f0a scripts: drop_all_public_ips() now prints messages to stdout, not log
Change all callers to maintain current behaviour.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 0b67397ef5419c781a35916575151da7b7e7cc27)
2013-06-20 13:01:09 +10:00
Martin Schwenke
45878d4363 eventscripts: New configuration variable $CTDB_NFS_DUMP_STUCK_THREADS
If some nfsd threads are still alive after a shutdown during a restart
then this indicates the maximum number of threads for which a stack
trace should be dumped.  This can be useful for trying to determine
why nfsd is stuck.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 2503245db10d567af708a04edd3a3b488c24f401)
2013-06-14 15:15:06 +10:00
Martin Schwenke
2e515f2306 eventscripts: Fix statd-callout update handling
60.nfs and 60.ganesha touch $statd_update_trigger every time they're
run.  This stops the statd-callout updates from ever being called.

Make this logic self-contained and move it to new function
nfs_statd_update() in the functions file.  Call this in 60.nfs and
60.ganesha with the appropriate update period as the only argument.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reported-by: Poornima Gupte <poornima.gupte@in.ibm.com>

(This used to be ctdb commit 1b5968f6be084590667f4f15ff3bef13ed9a2973)
2013-05-28 16:11:47 +10:00
Martin Schwenke
66019e3287 scripts: Provide mktemp function for platforms without mktemp command
This is needed for AIX and possibly others.

Also provide the cheaper mktemp function that is needed in the run_tests
script.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit b2b572e9049c7138bd223226475bef8fe3e01f10)
2013-05-27 15:14:33 +10:00
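The commit message for 66019e3287 does not show the fallback itself, so the following is only one possible sketch of a mktemp replacement for platforms that lack the command. The name mktemp_fallback and the naming scheme for the temporary paths are assumptions. Creation is made atomic with mkdir for directories and with the shell's noclobber option for files, but the names are predictable, so this is weaker than a real mktemp.

```shell
# Create a temporary file (or directory with -d) and print its path.
mktemp_fallback() {
    _dir=false
    if [ "$1" = "-d" ] ; then
        _dir=true
    fi
    _n=0
    while [ "$_n" -lt 100 ] ; do
        _path="${TMPDIR:-/tmp}/tmp.$$.$_n"
        if $_dir ; then
            # mkdir fails if the path already exists
            if (umask 077 && mkdir "$_path") 2>/dev/null ; then
                echo "$_path"
                return 0
            fi
        else
            # With noclobber set, ">" fails if the file already exists
            if (umask 077 && set -C && : >"$_path") 2>/dev/null ; then
                echo "$_path"
                return 0
            fi
        fi
        _n=$((_n + 1))
    done
    return 1
}

# Only shadow mktemp when the platform does not provide it.
if ! command -v mktemp >/dev/null 2>&1 ; then
    mktemp() { mktemp_fallback "$@" ; }
fi

tmp=$(mktemp_fallback) && echo "created $tmp" && rm -f "$tmp"
```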
Martin Schwenke
51dbaecb54 eventscripts: Fix regression in _loadconfig()
fff88940f71058e4eefd65f50a6701389c005c17 introduced a regression.
Without $service_name set by default, the CTDB configuration is no
longer loaded when loadconfig() is called without any arguments.
That's bad.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit f1619a36c1beba11533052dc5728fa3adaa08870)
2013-05-22 14:24:21 +10:00
Martin Schwenke
de84c1fd3c eventscripts: NFS RPC checks no longer support "knfsd"
No longer used, support removed from test infrastructure.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 0eb351ff4c7ee096de7c5e0a59561067091fa32e)
2013-05-07 12:55:09 +10:00
Martin Schwenke
05b2edeec2 eventscripts: NFS RPC checks allow "nfsd" in addition to "knfsd"
Want nfs_check_rpc_services() to support filenames without the 'k'.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit d9775fcbd6e30eef8382bea68e2f9bad2309f2c1)
2013-05-06 20:40:58 +10:00
Martin Schwenke
c52183c055 eventscripts: New function nfs_check_rpc_services()
This is intended to replace nfs_check_rpc_service(), which builds
configuration into eventscripts.

nfs_check_rpc_services() uses a directory of configuration checks that
can be edited by an administrator.  The files have one limit check and
a set of actions per line.  The program name is extracted from the
file name.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 9bc8fbee6550ed2814fb35c70d57fab21ef1b8fd)
2013-05-06 20:40:58 +10:00
Martin Schwenke
167acd1cd5 eventscripts: nfs_check_rpc_action() should be _nfs_check_rpc_action()
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 5a717fd495ba5a2bfd481d69f38b68fa4576716f)
2013-05-06 20:40:58 +10:00
Martin Schwenke
bdab9d1ea6 eventscripts: Factor out common code from nfs_check_rpc_service()
This creates new function _nfs_check_rpc_common().

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit cc3bb42e48bbdabd19187c231846b98589b4f4f3)
2013-05-06 20:40:58 +10:00
Martin Schwenke
910e138cb3 eventscripts: Remove ganesha support from nfs_check_rpc_service()
This is unused so doesn't need to be maintained.  An attempt to use it
now will explicitly fail rather than implicitly fail via bitrot.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 887733dd7be53158bfe07b30ef31b611d0f8122f)
2013-05-06 20:40:58 +10:00
Martin Schwenke
944d063a3e Revert "Eventscript functions: add optional version to nfs_check_rpc_service()"
This reverts commit 92f74fd589467b46c758e116e97417edfe8773d7.

This change is unused and is just complicating the function.

Conflicts:
	config/functions

(This used to be ctdb commit 77302dbfd85754e02559eccb2dd6c090db0b6b9f)
2013-05-06 20:40:58 +10:00
Martin Schwenke
577a3cae5d eventscripts: Move rpc.statd existence check into nfs_check_rpc_service()
The code in 60.nfs is going to be genericised, so make all the checks
look the same.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 15b0f78cbf8d6ba481b7eba9e4fe3f4270214c72)
2013-05-06 20:40:58 +10:00
Martin Schwenke
6c347a5294 eventscripts: Factor NFS RPC check action code into nfs_check_rpc_action()
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 4b4e7d8f0e8dcbab987e374d06ffaa21c06da0d3)
2013-05-06 20:40:58 +10:00
Martin Schwenke
2bc807f974 eventscripts: Remove unused function ctdb_check_counter_limit()
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit a8ef00608e48a551a334aded206146807aeb4c5a)
2013-05-06 16:24:59 +10:00
Martin Schwenke
29a3823e40 eventscripts: Minor cleanups for killtcp/tickle functions
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 25ef4f655f1efc833deb5e244f9fff461e92f439)
2013-05-06 16:24:50 +10:00
Martin Schwenke
189a5c003c eventscripts: Tweak the timeout check in kill_tcp_connections()
This has 2 advantages:

1. It uses get_tcp_connections_for_ip() to check for leftover
   connections, instead of custom code.

2. It checks for the timeout condition before sleeping.  The current
   code sleeps and then checks, so wastes a second.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 60a08eb96e1d97aab31e9bd4af01683c650541c2)
2013-05-06 16:22:15 +10:00
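The second point in 189a5c003c, checking for the timeout before sleeping, can be sketched as a generic polling loop. The names wait_for_zero, always_two and none_left are hypothetical; in the eventscripts the check would presumably count the output of get_tcp_connections_for_ip(), which is not reproduced here.

```shell
# Poll a command until it reports zero leftover items or a limit is hit.
wait_for_zero() {
    _check="$1"     # command that prints the number of leftover items
    _max="$2"       # maximum number of checks before giving up
    _tries=0
    while : ; do
        _left=$($_check)
        [ "$_left" -eq 0 ] && return 0
        _tries=$((_tries + 1))
        # Test the limit before sleeping, so that the final iteration
        # does not waste a second.
        if [ "$_tries" -ge "$_max" ] ; then
            echo "timed out with $_left remaining"
            return 1
        fi
        sleep 1
    done
}

# Stand-in checks for the demo.
always_two() { echo 2 ; }
none_left() { echo 0 ; }

wait_for_zero none_left 5 && echo "all gone"
wait_for_zero always_two 1 || echo "gave up"
```

Printing the leftover count on timeout mirrors the later change in 50e00b3e52, which reports a count when killing TCP connections times out.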
Martin Schwenke
8f84a2bec7 eventscripts: In killtcp/tickle functions, $_failed should be boolean
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 319c1b68d5aa78f82a68febcad233a7c78afc887)
2013-05-06 16:22:07 +10:00
Martin Schwenke
ed59deaee3 eventscripts: Remove unused $_killcount from tickle_tcp_connections()
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 8514ca56830b30e7f0eb5018632640daaf8ff65d)
2013-05-06 16:16:56 +10:00
Martin Schwenke
975ea7fb7a eventscripts: Refactor connection listing in killtcp and tickle functions
Uses new function get_tcp_connections_for_ip().  This avoids using a
temporary file and running netstat twice.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit a621622903c7ef17764b15293d6ea8df5a53c7e1)
2013-05-06 16:16:50 +10:00
Martin Schwenke
a320e1f7f1 eventscripts: Reimplement kill_tcp_connections_local_only()
... using kill_tcp_connections()

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 10e4db8f796d1e3259733180494db3b4bbad291a)
2013-05-06 15:45:11 +10:00
Martin Schwenke
5e828b48fe eventscripts: Change handling of one-way kills in kill_tcp_connections()
This change is a no-op.  However, in a subsequent commit we'll merge
kill_tcp_connections_local_only() with this function.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 23c0f5f48e3e5a0c1a3254c582299f7893cf0d33)
2013-05-06 15:45:10 +10:00
Martin Schwenke
d98d931af3 eventscripts: Remove unnecessary variables from killtcp/tickle functions
Setting these variables spawns lots of unnecessary processes, which
would surely slow down these functions on a busy system.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 3eae161472e6352f7f656851c73dc056f95113eb)
2013-05-06 15:45:10 +10:00
Martin Schwenke
6e2863a4f9 eventscripts: Clean up ctdb_check_command()
* Command is now multiple arguments, preserving quoting
* $service_name no longer printed, no longer an argument
* Debug output from failed command

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 9e25fb261447a196de05937052779b36e75e7215)
2013-05-06 15:45:10 +10:00
Martin Schwenke
30addb886a eventscripts: Clean up ctdb_check_directories()
The documentation comments are wrong... and remove optional
$service_name argument.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit d9e6cb945c5edac9ca6405c9228bf647fab814f5)
2013-05-06 15:45:10 +10:00
Martin Schwenke
0ad8f46db3 eventscripts: Assert that $service_name is set in a few key places
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 3d0a7d83ddc824961d876fc9afba829c90aef3e7)
2013-05-06 15:45:10 +10:00
Martin Schwenke
5dd9e52e46 eventscripts: counters default to $script_name if $service_name not set
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit fff88940f71058e4eefd65f50a6701389c005c17)
2013-05-06 15:45:10 +10:00
Martin Schwenke
e9abc9c070 eventscripts: Simplify handling of $service_name in "managed" functions
Complicated argument handling was introduced to deal with multiple
services per eventscript.  This was a failure and we split 50.samba.

This simplifies several functions to use global $service_name
unconditionally instead of having an optional argument.

$service_name is no longer automatically set in the functions file.
This means it needs to be explicitly set in 13.per_ip_routing because
this script uses ctdb_service_check_reconfigure().

Eventscript unit test infrastructure needs to set $service_name during
fake service setup, and policy routing tests need to be updated
accordingly.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 27aab8783898a50da8c4bc887b512d8f0c0d842c)
2013-05-06 15:45:10 +10:00
Martin Schwenke
c56acf7127 eventscripts: Simplify handling of $service name in start/stop functions
Complicated argument handling was introduced to deal with multiple
services per eventscript.  This was a failure and we split 50.samba.

This simplifies several functions to use global $service_name
unconditionally instead of having an optional argument.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit b5802c4735e1c719a5cf9ce69489d5947bd5e8c5)
2013-05-06 15:45:10 +10:00
Martin Schwenke
8065366b33 eventscripts: Simplify handling of $service name in service_management
Complicated argument handling was introduced to deal with multiple
services per eventscript.  This was a failure and we split 50.samba.

This simplifies several functions to use global $service_name
unconditionally instead of having an optional argument.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit e24baac0d2952e86d5ff31235901f06e2f2b2449)
2013-05-06 15:45:10 +10:00
Martin Schwenke
4c9438b2a3 eventscripts: Simplify handling of $service name in reconfigure functions
Complicated argument handling was introduced to deal with multiple
services per eventscript.  This was a failure and we split 50.samba.

This simplifies several functions to use global $service_name
unconditionally instead of having an optional argument.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit c2ea72ff565222f9edab408638bd45dbba6e8ff7)
2013-05-06 15:45:10 +10:00
Martin Schwenke
642848b916 eventscripts: Remove unused function ctdb_check_counter_equal()
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit fd536a26b310b5bf9628da62cca0b425f4a54030)
2013-05-06 15:45:10 +10:00
Martin Schwenke
bbd0ed0e29 scripts: Fix script_log() regression
5940a2494e9e43a83f2bca098bd04dfc1a8f2e93 makes script_log() always
pass a message to logger, so script_log() can no longer log stdin.

Put all the tag fu in the actual tag so the message argument is empty
if no message was passed.
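
The idea can be sketched as follows. This is an assumption-laden illustration rather than the committed code: the argument layout of script_log() is guessed, and only the "ctdbd:" tag prefix comes from the related commit. Because the tag carries all the fixed text, an empty argument list reaches logger unchanged and logger falls back to reading stdin:

```sh
# Hypothetical sketch: first argument is a tag, the rest is an
# optional message.
script_log ()
{
    _tag="$1" ; shift
    # With no message arguments "$@" expands to nothing, so logger
    # reads the message from stdin instead.
    logger -t "ctdbd: ${_tag}" "$@"
}
```

Both "script_log mytag some message" and "some_command | script_log mytag" would then work.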

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 9dee4c84273633b9ad82e94dabbf0e6f86edbcef)
2013-05-06 15:43:16 +10:00
Martin Schwenke
823edbf6fe scripts: Ensure even external scripts get tagged in logs as "ctdbd"
Our practice is to search logs for "ctdbd:".  We want to make sure we
find everything.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 5940a2494e9e43a83f2bca098bd04dfc1a8f2e93)
2013-04-22 13:58:36 +10:00
Martin Schwenke
fb8be43d6d eventscripts: Ensure directories are created
Previous commits stopped the top level of the script from creating
certain directories but some functions assume that required
directories exist.

Create those directories instead.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 0076cfc4666e5a96eb2c8affb59585b090840e00)
2013-04-22 13:58:36 +10:00
Martin Schwenke
903f4c394c scripts: Clean up update_tickles() and handling of associated directory
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 700cf95a1f29b4b88460a00a55d57a9e397011e0)
2013-04-19 13:13:36 +10:00
Martin Schwenke
100a0eed90 scripts: Use $CTDB_SCRIPT_DEBUGLEVEL instead of something more complex
The current logic is horrible and creates an unnecessary file.  Let's
make the script debug level independent of ctdbd's debug level.

* Have debug() use $CTDB_SCRIPT_DEBUGLEVEL directly

* Remove ctdb_set_current_debuglevel()

* Remove the "getdebug" command from ctdb stub in eventscript unit
  tests

* Update relevant eventscript unit tests to use
  $CTDB_SCRIPT_DEBUGLEVEL

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 85efa446c7f5c5af1c3a960001aa777775ae562f)
2013-04-19 13:13:36 +10:00
Srikrishan Malik
28cbe527d4 Changes for unobtrusive recovery and new method for health check.
Unobtrusive recovery: Ganesha will not be restarted on failovers.

Ganesha health: Use the counters in /var/lib/nfs/ganesha_local to track progress
instead of the null call which can timeout if the server is too busy.

Signed-off-by: Srikrishan Malik <srimalik@in.ibm.com>
Signed-off-by: Lance Russell <lancerus@us.ibm.com>

(This used to be ctdb commit 0e651e9da0f1f3c836b4474612ab13d0ccd272d9)
2013-01-11 17:16:46 +11:00
Martin Schwenke
4f622fe9fb scripts: Make script_log() use supplied message, stop logger from hanging
When using syslog any provided message arguments are ignored and not
passed to logger.  This means that logger blocks waiting on stdin.
That's bad.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 50abf597cefe6f8ea2a2ff7694bf84641344a9b1)
2013-01-08 15:18:47 +11:00
Martin Schwenke
d801b02681 scripts: Make drop_all_public_ips() more robust
Incorporate some of the logic from ctdb-crash-cleanup.sh that ensures
IPs are deleted even if they have the wrong netmask or are on the
wrong interface.

Factoring out some of the code will allow it to be used elsewhere.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 03356fd5ae7a3ac35fde0289cbea7c71ecf07367)
2013-01-08 15:18:47 +11:00
Martin Schwenke
0eb757329e scripts: Move drop_all_public_ips() to the functions file
... so it can be improved and used elsewhere.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit b23c30253cc9eb274b895cac0f8c65245ba0a200)
2013-01-08 15:18:46 +11:00
Martin Schwenke
217ad07b72 Eventscripts: Change the default reconfigure action to do nothing
A default action of restarting the service doesn't obey the principle
of least surprise.  It causes the NFS service to be implicitly
reintroduced.

This allows no-op functions to be removed from some eventscripts and
service restart functions to be added to others.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit c75b5e5b4d000f5c7dab403df8238ceed390c1c0)
2013-01-07 10:35:39 +11:00
Martin Schwenke
9f6b30a517 scripts: Refactor logging code in initscript and functions file
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 5ee242c949a98bb7397e0f7368b20d44c06fe772)
2012-10-18 20:05:43 +11:00
Michael Adam
6372592982 config/functions: fix a comment
ctdb_check_counter_limits does not fail but succeed if count >= limit

Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit af540ef728303b4a0a188b17c695e9aefab34489)
2012-10-17 21:56:58 +02:00
Martin Schwenke
d33b12a1c5 Eventscripts: Add service-start and service-stop pseudo-events
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit be4ad110ede9981b181ac28f31ffd855a879d5df)
2012-10-10 14:54:53 +11:00
Martin Schwenke
2d719e5c84 eventscripts: Auto-start/stop services in background
If $CTDB_SERVICE_AUTOSTARTSTOP="yes" then service start/stop is done
in the background with logging.

Fix some unit tests for samba and winbind.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 3a3dae4cb5ec8b4b8381a4013adda25b87641f3a)
2012-10-03 08:48:23 +10:00
Martin Schwenke
835e0b6d49 Eventscripts: Modernise 60.ganesha to match 60.nfs
Originally from Srikrishan Malik <srikrishan.malik@in.ibm.com> with
some style changes by me.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 637cab6304dae66b85668506028c76ea1ee88980)
2012-05-16 17:24:21 +10:00
Martin Schwenke
92eb004162 Eventscript functions: add optional version to nfs_check_rpc_service()
This can be optional because the 1st item of each action-triple is a
test comparison that starts with '-'.
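
A minimal sketch of why this is unambiguous (the function name and output format here are hypothetical, chosen only for illustration): the argument after the service name is a version only if it does not start with '-'.

```sh
# Hypothetical sketch: detect an optional version argument that may
# precede the action-triples.
rpc_service_args ()
{
    _prog="$1" ; shift
    _version=""
    case "${1:-}" in
        -*|"") : ;;                     # a triple starts here, or no more arguments
        *) _version="$1" ; shift ;;     # anything else is taken as the version
    esac
    echo "prog=${_prog} version=${_version:-any} triple_args=$#"
}
```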

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 92f74fd589467b46c758e116e97417edfe8773d7)
2012-05-16 17:05:05 +10:00
Martin Schwenke
476cf45049 Eventscript functions - no longer require interface_modify.sh
Make add_ip_to_iface() and delete_ip_from_iface() do their own locking
so the external script is no longer required.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 93f90caf91246074d9359bf31a39b26212cccc42)
2012-03-22 15:30:27 +11:00
Martin Schwenke
0b2c3d7d24 Eventscript functions - remove now-unused route/IP re-add script logic
This is no longer used by 13.per_ip_routing or anything else.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 2a2ea6c61a05af2d0765e964abcc7ef04047431e)
2012-03-22 15:30:26 +11:00
Martin Schwenke
940efdb8e9 Eventscript functions - remove functions only used by 13.per_ip_routing
The relevant functions are now in that script.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 45c3476d12bf0f52966b72d286f101fce1382cd2)
2012-03-22 15:30:26 +11:00
Martin Schwenke
0d67779c67 Eventscript functions - add new function die()
Args:

1. Error message to be printed.

2. Optional exit code (default 1)
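
A function with this interface could look roughly like the sketch below. Writing the message to stderr is an assumption; the commit message only says that the message is printed.

```sh
# Sketch: print an error message and exit with the given code.
die ()
{
    echo "$1" >&2        # assumption: errors go to stderr
    exit "${2:-1}"       # exit code defaults to 1
}
```

Typical use would be something like: [ -n "$service_name" ] || die "service_name is not set"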

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 97b0c138cb97e30db27c40b4ee1481109ae90c78)
2012-03-22 15:30:26 +11:00
Mathieu Parent
91431262be config/functions: CTDB_VARDIR is /var/lib/ctdb on Debian-like systems
(This used to be ctdb commit 56160eccb62178f645b017b1257677a1e854b2bc)
2011-11-08 16:31:03 +11:00
Mathieu Parent
a1919fd316 apache's service name is not always httpd
Solution 2 of <https://bugzilla.samba.org/show_bug.cgi?id=8317>

(This used to be ctdb commit 8b9ac5cd8d867ff4866ac464c570d9293d03a91e)
2011-10-12 20:07:45 +11:00
Martin Schwenke
205c7c7663 Eventscripts - enhance ctdb_replay_monitor_status()
Print useful output and return a suitable exit code.

The DISABLED and TIMEDOUT statuses use fake negative return codes, and
these can't be faked from the shell.  So we map DISABLED to OK and
TIMEDOUT to ERROR - this should avoid nearly all surprises.  When we
do this we add a note to the beginning of the output.  The alternative
is to "fix" ctdbd to use only codes that can actually be returned by
shell scripts.  However, the reason for using negative codes is
probably to distinguish them from real ones...
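
The mapping described above might be sketched like this. The function name, its arguments and the wording of the notes are hypothetical; only the DISABLED to OK and TIMEDOUT to ERROR mapping comes from the text.

```sh
# Hypothetical sketch: replay a previous status name and its output,
# mapping statuses that have no shell exit code.
replay_status ()
{
    _status="$1" ; _output="${2:-}"
    case "$_status" in
        OK)       _rc=0 ;;
        ERROR)    _rc=1 ;;
        DISABLED) _rc=0 ; echo "[Replay of DISABLED status, reported as OK]" ;;
        TIMEDOUT) _rc=1 ; echo "[Replay of TIMEDOUT status, reported as ERROR]" ;;
        *)        _rc=1 ; echo "[Unknown status: ${_status}]" ;;
    esac
    # The note, if any, comes first; the original output follows.
    [ -z "$_output" ] || echo "$_output"
    return "$_rc"
}
```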

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit dda44d026e0c1b02feb02185b8c200a542be341a)
2011-08-31 15:34:43 +10:00
Martin Schwenke
aa64622137 Eventscripts - use ctdb scriptstatus -Y when replaying status
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 5be904fb1fbd546618d25509b41ab836db62a70a)
2011-08-30 16:34:43 +10:00
Martin Schwenke
b97625acb6 Eventscripts: add a synchronous synthetic reconfigure event.
In the current code services can only be reconfigured asynchronously.
This means that configuration file changes can be made, an asynchronous
reconfigure event can be triggered, and it always succeeds.  Some time
later when a service is actually reconfigured then a failure may be
seen.

This adds a synthetic reconfigure event that reconfigures a service
synchronously so that any failure is reported on exit.

ctdb_service_check_reconfigure() is essentially reimplemented.

If a reconfigure event is in flight and an ipreallocated or monitor
event occurs then any scheduled asynchronous reconfigure is deferred
until the next monitor cycle.  This is to avoid reconfigures trampling
on each other.  In this case a monitor event will also replay the
previous status to try to avoid exposing any temporary instability.

If a reconfigure event collides with another reconfigure event it will
exit with status 2, indicating that the reconfigure should be retried.

The reconfigure event is implemented using a subprocess to control the
exit from the synthetic event.

As before, if a monitor event causes a scheduled synchronous
reconfigure to occur then it will replay the previous status for the
service, given that a reconfigure can cause temporary instability.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 220578bfd3507152b29ba4c28942f9d5e8733886)
2011-08-30 14:29:48 +10:00
Martin Schwenke
7980a4cb44 Eventscripts - new function ctdb_check_args()
Pass this "$@" to do common eventscript argument checking.

For regular use putting this in 00.ctdb would be enough.  However, for
developer testing it can be useful to call this in other eventscripts.
For example, 10.interfaces and 13.per_ip_routing currently check these
by hand.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 36de7e7fd6dfeed61ef9977b8d5b568f90a9707b)
2011-08-30 09:33:47 +10:00
Martin Schwenke
63729fc35d Eventscripts - ctdb_check_tcp_ports() bug fix.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit e8d9c0b251c84d6fdf6ea7d972e5f7d1d0222f9b)
2011-08-30 09:33:47 +10:00
Martin Schwenke
194de8faf8 Eventscripts - fix debugging buglet in ctdb_check_tcp_ports_ctdb()
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 61000e38d6016e58f67e292393756d0bd5262ae5)
2011-08-30 09:33:47 +10:00
Martin Schwenke
9257b57f2c Eventscripts: New configuration variable CTDB_SERVICE_AUTOSTARTSTOP.
Some of the current auto-start/stop logic is broken, particularly for
Samba.  Fixing it is non-trivial.

If $CTDB_SERVICE_AUTOSTARTSTOP is "yes" then auto-start/stop services
when told to newly manage or no longer manage them.  This defaults to
"yes".

However, if using a canned configuration file that doesn't set
$CTDB_SERVICE_AUTOSTARTSTOP then this stops the auto-start-stop logic
from working.  Therefore, this works around CQ S1026685 - on the
system in question another daemon controls service auto-start/stop and
CTDB just gets in the way.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit ef71b8290ae49117d7bcc7166598b77cb64cc8a0)
2011-08-30 09:33:47 +10:00
Martin Schwenke
6e7dbf0543 Eventscripts - new default TCP port checker using "ctdb checktcpport"
New function ctdb_check_tcp_ports_ctdb().  This should be fast... and
is now the default checker.  If it fails in an unexpected way we fall
back to the nmap and netstat checkers.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit a1e16a707ce204817531a61455000361f972080a)
2011-08-17 14:02:45 +10:00
Martin Schwenke
1374327f6e Eventscripts - generalise TCP port checking plus new nmap-based checker
Split the netstat-specific parts of ctdb_check_tcp_ports() into new
function ctdb_check_tcp_ports_netstat().

Implement new ctdb_check_tcp_ports_nmap() function that uses
"nmap -PS" to check if the desired ports are listening.

ctdb_check_tcp_ports() now uses new configuration variable
CTDB_TCP_PORT_CHECKERS to decide which port checkers to try.  Default
value is currently "nmap netstat".  If nmap is not found then this
will fall back to netstat - if logging is at debug level this will
also fill the logs with messages saying the nmap checker failed.  This
indicates that either nmap should be installed or the default value of
CTDB_TCP_PORT_CHECKERS should be changed (in a configuration file) to
avoid trying to use nmap.
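
The fallback behaviour can be sketched as below. This is not the committed code: the function names are shortened, the message is printed unconditionally rather than only at debug level, and the return-code convention (0 = listening, 1 = not listening, anything else = checker could not run) is an assumption made for the example.

```sh
# Hypothetical sketch: try each configured checker in turn.
check_tcp_ports ()
{
    for _checker in ${CTDB_TCP_PORT_CHECKERS:-nmap netstat} ; do
        _rc=0
        "check_tcp_ports_${_checker}" "$@" || _rc=$?
        case "$_rc" in
            0|1) return "$_rc" ;;   # checker ran and gave an answer
            *) echo "DEBUG: ${_checker} checker failed, trying the next one" ;;
        esac
    done
    echo "ERROR: no usable TCP port checker"
    return 1
}
```

One consequence of this convention is that a checker function that is not defined at all yields the shell's "command not found" status (127) and is simply skipped.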

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit d9651175b40b9454e7d4e98291955fcf1445085e)
2011-08-17 12:12:20 +10:00
Martin Schwenke
62f654d3d2 Eventscripts - ctdb_check_tcp_ports() only prints netstat output if debugging
Use the new debug function to conditionally print the netstat output.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 44c14aeeb11080980fe07c7396d06843a4870747)
2011-08-17 10:39:54 +10:00
Martin Schwenke
86792724a2 Eventscripts - weaken TCP port check message if CTDB has just been started.
Sometimes smbd and other services can take a while to start,
especially when there is a lot of activity after ctdbd has just
started.  The TCP port check can then pollute the logs with lots of
"ERROR" messages and possibly extra debug.

This creates a flag file when a service is started (but not restarted)
and this flag is removed the first time that TCP port checks succeed
for that service.  When a port check fails and the flag file still
exists, a less extreme "INFO" message is printed rather than the usual
"ERROR" message.  This means that until the node actually becomes
healthy we see more friendly messages.

The subtext is that we're hearing false positive reports ("recreates")
of CQ S1024874 (samba stopped responding on port 445) quite often when
ctdbd is started.  This reduces the chances of people reporting such
false recreates...

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 571865eb6ef847857129d0b1e2ba5fa7254bfe8c)
2011-08-17 10:39:53 +10:00
Martin Schwenke
5c9fbb55ce Eventscript functions: optimise ctdb_check_tcp_ports() and add debug.
ctdb_check_tcp_ports() runs "netstat -a -t -n" in a loop for each
port.  There are 2 problems with this:

* Netstat is run on each loop iteration when it need only be run once.

* The -a option is used to list all connections but the function only
  cares about the listening ports.  There may be many thousands of
  non-listening ports to grep through.

This changes ctdb_check_tcp_ports() to run netstat with the -l option
instead of the -a option.  It also only runs netstat once before the
main loop.

When a port is found to not be listening the output of the netstat
command is now dumped to help with debugging.
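
The shape of the optimised loop is roughly as follows. This is a sketch under assumptions, not the committed function: the name, the grep pattern and the messages are illustrative.

```sh
# Hypothetical sketch: run netstat once, then check every port
# against the saved output.
check_tcp_ports_netstat ()
{
    # -l lists only listening sockets, which keeps the output small.
    _ns=$(netstat -l -t -n 2>/dev/null) || _ns=""
    for _port ; do
        if ! echo "$_ns" | grep -q ":${_port}[[:space:]]" ; then
            echo "ERROR: TCP port ${_port} is not listening"
            echo "$_ns"      # dump netstat output to help debugging
            return 1
        fi
    done
    return 0
}
```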

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 830355a8b18c53cfcc3ad1e3009bbb1a7a681fa0)
2011-08-17 10:39:53 +10:00
Martin Schwenke
f0f9271301 Eventscripts: add a debug() function and call ctdb_set_current_debuglevel()
The debug function passes its arguments to echo if
$CTDB_CURRENT_DEBUGLEVEL is >= 4 (i.e. DEBUG).  If no args are given
then use stdin - this allows the function to be used with here
documents.

To ensure $CTDB_CURRENT_DEBUGLEVEL is set,
ctdb_set_current_debuglevel() is called near the end of the functions
file.
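
A minimal sketch of a debug() function with the behaviour described above (treating an unset level as 0 is an assumption of this sketch):

```sh
# Sketch: print arguments, or stdin if there are none, but only at
# debug level 4 (DEBUG) or above.
debug ()
{
    if [ "${CTDB_CURRENT_DEBUGLEVEL:-0}" -ge 4 ] ; then
        if [ $# -gt 0 ] ; then
            echo "$@"
        else
            cat          # no arguments: copy stdin, e.g. a here document
        fi
    fi
}
```

It could be called either as: debug "checking port 445", or with a here document attached to a bare "debug".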

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 6143483d9f87322578c00f12081e381f425226ca)
2011-08-17 10:39:35 +10:00
Martin Schwenke
171bef3d68 Eventscripts - new function ctdb_set_current_debuglevel()
This function ensures that CTDB_CURRENT_DEBUGLEVEL is set.  It works
like this:

1. If it is already set then do nothing, since it might have been set
   some other way.

   The recommended "other way" would be to add a file in rc.local.d/.

2. If it is not set then set it by sourcing
   /var/ctdb/eventscript_debuglevel.

3. If this file does not exist then create it using output from "ctdb
   getdebug".

If the optional 1st argument is set to "create" then don't source an
existing file but create a new one instead - this is useful for
creating the file just once in each event run in, say, 00.ctdb.

If there's a problem getting the debug level from ctdb then it is
silently set to 0 - no use spamming logs if our debug code is
broken...

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 93910921c8a25f2b029733cd938069ff7c7bdab7)
2011-08-17 09:00:46 +10:00
Martin Schwenke
32fe247e37 Eventscripts: In 60.nfs don't restart NFS when restarting rpc.lockd.
This effectively reverts 953dbfbddad656a64e30a6aca115cb1479d11573 and
is a policy decision.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 380c9263eb37db5a250264316e250c2160908263)
2011-08-12 16:28:09 +10:00
Martin Schwenke
398116ff29 Eventscripts: clean up 60.nfs monitor event.
This adds a helper function called nfs_check_rpc_service() and uses it
to make the monitor event much more readable.  An example of usage is
as follows:

  nfs_check_rpc_service "mountd" \
    -ge 10 "verbose restart:b unhealthy" \
    -eq 5 "restart:b"

The first argument to nfs_check_rpc_service() is the name of the RPC
service to be checked.  The RPC service corresponding to this command
is checked for availability using the rpcinfo command.  If the service
is available then the function succeeds and subsequent arguments are
ignored.

If the rpcinfo check fails then a failure counter for that particular
RPC service is incremented and subsequent arguments are processed in
groups of 3:

1. An integer comparison operator supported by test.
2. An integer failure limit.
3. An action string.

The value of the failure counter is checked using (1) and (2) above.
The first check that succeeds has its action string processed - note
that this explains the somewhat curious reverse ordering of checks.

In the example above:

* If the counter is >= 10 then a verbose message is printed
  describing the failure, the service is restarted in the background
  and the node is marked as unhealthy (via an "exit 1" from the
  function).

* If the counter is == 5 then the service is restarted in the
  background.

For more action options please see the code.
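
To make the triple processing concrete, here is a hypothetical helper (not part of the commit) that takes a failure count followed by triples and prints the action string of the first check that succeeds:

```sh
# Hypothetical sketch: select the first action whose comparison holds.
first_matching_action ()
{
    _count="$1" ; shift
    while [ $# -ge 3 ] ; do
        # $1 is a test operator such as -ge, $2 is the limit.
        if [ "$_count" "$1" "$2" ] ; then
            echo "$3"
            return 0
        fi
        shift 3
    done
    return 1             # no check matched, so there is nothing to do
}
```

With the triples from the example, a count of 5 selects "restart:b", while a count of 10 or more selects "verbose restart:b unhealthy" because that check is listed first.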

This also changes the ctdb_check_rpc() function so that it no longer
takes a program number to check.  It now just takes a real RPC program
name that rpcinfo can resolve via /etc/rpc.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 9b66057964756a6245bafb436eb6106fb6a2866e)
2011-08-12 14:16:14 +10:00
Martin Schwenke
3a760b09ed Eventscripts: improvements to ctdb_service_check_reconfigure().
* Make this function applicable to "ipreallocated" event too.

* Monitor event should not always succeed just because we reconfigure.

  If the service was unhealthy before the reconfigure and we end the
  reconfigure with "exit 0" then we can cause the node's health status
  to flip-flop.

  To avoid this we return the status of the service from the previous
  monitor event.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 21dfcbbdccd906fcd6ab7bba81418ce565bf63aa)
2011-08-11 10:46:57 +10:00
Martin Schwenke
2a14f91722 Eventscript functions: new function ctdb_check_counter().
This should eventually be able to replace ctdb_check_counter_limit()
and ctdb_check_counter_equal(), although it doesn't issue warnings
like the former.

It takes 4 optional arguments:

1. _msg - If "error" then over limit causes an error message and an
   exit 1.  Anything else fails silently but the function returns 1.
   Default is "error".

2. _op - An integer operator supported by test (e.g. -eq, -ge, -gt).
   Default is -ge.

3. _limit - Limit for the counter to be used in comparison.  Default is
   $service_fail_limit.

4. _service_name - Used to identify the counter.  Default is
   $service_name.

For example:

  ctdb_check_counter error -ge 5 foo

will print a message and exit 1 if the counter for foo is >= 5,
whereas

  ctdb_check_counter check -ge 5 foo

will just return 1 if the counter for foo is >= 5, and

  ctdb_check_counter

will print a message and exit 1 if the counter for $service_name is >=
$service_fail_limit.
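
The argument defaulting can be sketched as follows. This is an illustration only: the function name, the $COUNTER_DIR variable and the storage of the count as a number in a file are all assumptions, and the real counter storage may differ.

```sh
# Hypothetical sketch of the four optional arguments and their defaults.
check_counter ()
{
    _msg="${1:-error}"
    _op="${2:--ge}"
    _limit="${3:-${service_fail_limit:-1}}"
    _name="${4:-${service_name:-}}"
    # Assumption: the count is stored as a number in a per-service file.
    _count=$(cat "${COUNTER_DIR}/${_name}" 2>/dev/null) || _count=0
    if [ "${_count:-0}" "$_op" "$_limit" ] ; then
        if [ "$_msg" = "error" ] ; then
            echo "ERROR: ${_count} consecutive failures for ${_name}"
            exit 1
        fi
        return 1         # any other _msg value: fail silently
    fi
}
```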

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 5b01b7233515669e995e037205796e265643b176)
2011-08-11 10:46:56 +10:00
Martin Schwenke
219c6fd55b Eventscripts: remove unused remove_ip() function.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 881af7c1417962b9b3ade6565b3e8eb9f9df7a97)
2011-08-11 10:46:56 +10:00
Martin Schwenke
5c948528b5 Eventscripts: startstop_nfs stop no longer redirects output to /dev/null.
When stopping (as opposed to restarting) it is useful to see this
information.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit a9ab1937239761dc32b143c9d225447bc6f090b4)
2011-08-11 10:46:56 +10:00
Martin Schwenke
caee6f1508 Eventscripts: fix typo in _ctdb_counter_common().
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit f57d1722b6aa082f3f826171acc57d7d796ea95c)
2011-08-11 10:46:56 +10:00
Martin Schwenke
ab693dbcc0 Eventscripts: improve log messages in ctdb_start_stop_service().
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 6da7095192fb172a06b434cfb02f4bfa6221b343)
2011-08-11 10:46:56 +10:00
Martin Schwenke
1b956b2b0a Eventscript functions: fix counter regression.
d362be7d32079ac1390d67056ce107bfbca2c937 wasn't well thought out.
Subsequent commits depend on ctdb_counter_init() taking an argument,
so this makes those cases work.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 05a8fcfbac3da2b5843b31e0fe258255cc761190)
2011-08-11 10:46:56 +10:00
Martin Schwenke
217edfa1c8 Eventscript functions: ctdb_service_check_reconfigure() acts only on monitor.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit beabf506a5eb68fc50fdbf8772c1d2bb0f7951e3)
2011-08-11 10:46:56 +10:00
Martin Schwenke
820d9b30ea Eventscripts: rejig the reconfigure infrastructure.
* Add an optional service name argument to existing reconfigure
  functions.

* Use function service_reconfigure() instead of variable
  $service_reconfigure to specify how a service is reconfigured.

* New function ctdb_service_check_reconfigure() reconfigures a service
  if it is flagged for reconfigure.

* Remove $service_reconfigure settings from 40.vsftpd and 41.httpd -
  they're the defaults.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 15d4111d0761d82f57d5d4f0b1227812d14e4d7c)
2011-08-11 10:46:20 +10:00
Martin Schwenke
5b5bd3d27b Eventscript functions: move flagging of managed services.
Move flagging of managed or unmanaged services into
ctdb_service_start() and ctdb_service_stop().  That way services will
be correctly flagged if they are started from the startup and shutdown
events.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 8675744cbd90b5a5095ed6fff7b36ae82004a457)
2011-08-11 10:46:20 +10:00
Martin Schwenke
428e32d647 Eventscript function: change service_start into a function.
service_start is currently a variable.  This makes passing arguments
hard.  We change it to be a function and put default definitions into
the functions file.

We use a convention that if a service name argument is passed to a
redefined version of service_start() or service_stop() then it will
act unconditionally.  If no argument is passed then it can use
internal logic to decide if services should really be started.  This
is useful when a single eventscript handles multiple services.

This is a cherry-pick of ae38895 that needed to be reset mid-stream.
There is still some breakage following this commit.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 86e4aefed9fd1028660c98e3ea758c2b75ffc1d8)
2011-08-11 10:46:20 +10:00
Martin Schwenke
f60802c776 Eventscript functions: add optional event name argument to fail count functions.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit b14f18649f42aab80ce0336c15ab6159f241c9af)
2011-08-11 10:46:20 +10:00
Martin Schwenke
ea6a53e2b3 Eventscript functions - optimise is_ctdb_managed_service().
This function generates a lot of trace when running under "set -x".
This is due to the backward compatibility code.

This adds 3 optimisations:

1. Before invoking the backward compatibility code,
   is_ctdb_managed_service() returns early if the service is listed in
   $CTDB_MANAGED_SERVICES.

2. ctdb_compat_managed_service() actually now updates
   $CTDB_MANAGED_SERVICES instead of temporary variable $t.

   This means that a subsequent call to is_ctdb_managed_service() will
   short circuit due to optimisation (1).

3. ctdb_compat_managed_service() only adds a service to
   $CTDB_MANAGED_SERVICES if it is the service being checked by
   is_ctdb_managed_service().

   This stops irrelevant services being added to
   $CTDB_MANAGED_SERVICES multiple times by multiple calls to
   is_ctdb_managed_service().

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 758f4667c60089e09a0439c1eb74f5e426ca5e2e)
2011-08-11 10:46:20 +10:00
Martin Schwenke
ee38b9a159 Eventscript functions: new function ctdb_setup_service_state_dir().
To be used by eventscripts to create a per-service directory for their
own state data.  $service_state_dir is set to point to the new
directory.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit a273554791c2a5281aee28f8e2be0c514e14c91e)
2011-08-09 16:35:07 +10:00
Martin Schwenke
ec33c04283 Eventscript functions: new functions to remember/check if service managed.
This was done ad hoc and was badly named.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 9a084a121f629b2c1bcefc1e4c4a4a5cacf53987)
2011-08-09 16:20:08 +10:00
Martin Schwenke
72362e7b56 Eventscripts: source a file specified by $CTDB_RC_LOCAL in functions file.
Another unit testing hook.  This is easier than dropping files into
rc.local.d/ and then removing them.

The file has to be executable.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit b13ac3bdaf326a6cdfd87da9195eb9630806c418)
2011-08-08 13:51:32 +10:00
Martin Schwenke
394bbe8454 Eventscript functions - use $CTDB_VARDIR instead of local $ctdb_spool_dir.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit d0c6d9b19f0dd8946f9504b0d1cf50dd21f7a592)
2011-08-08 13:21:23 +10:00
Martin Schwenke
cfdccc5cac Eventscripts: use set_proc() in startstop_nfs().
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 5a3d5c6b1ca3682bb45104e50061871dec6e9b1d)
2011-08-03 19:57:40 +10:00
Martin Schwenke
75bbc93c0b Eventscripts: remove unnecessary absolute paths from external commands.
For eventscript unit testing it will be necessary to override external
commands to allow stub implementations to be used.  If absolute paths
aren't used then this can be done using either a fake bin/
subdirectory or by using shell functions.

This removes all of the simple cases of absolute paths.

Signed-off-by: Martin Schwenke <martin@meltin.net>

Conflicts:

	config/ctdb.init
	config/events.d/50.samba

        Keep old code but remove absolute paths.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 05851d50b0078de8bf4691442d718825adca6fe8)
2011-08-03 17:19:15 +10:00
Martin Schwenke
5f4ab05766 Eventscripts: new functions set_proc() and get_proc().
These provide a thin layer around writing and reading files in /proc.
They can be easily replaced by stubs for unit testing.
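
Such wrappers might look like the sketch below. The $PROC_ROOT override is a hypothetical addition for this example so that it can be exercised outside /proc; the commit itself relies on replacing the functions with stubs.

```sh
# Sketch: thin wrappers around files under /proc.
# PROC_ROOT is a hypothetical override, not part of the commit.
set_proc ()
{
    echo "$2" >"${PROC_ROOT:-/proc}/$1"
}

get_proc ()
{
    cat "${PROC_ROOT:-/proc}/$1"
}
```

For example, "get_proc sys/net/ipv4/ip_forward" reads a sysctl value, and a unit test can redefine both functions to read and write a scratch directory instead.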

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 637f9d8af517b73c72ed8f3cc2a2661f11eb2126)
2011-08-03 17:04:58 +10:00
Martin Schwenke
571e55ac0d Eventscripts: remove ctdb_wait_command() and ctdb_wait_tcp_ports() functions.
These haven't been used for a long time.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit f5fd361cadb3ea18d29e2d7215a7853718e48d00)
2011-08-03 17:02:41 +10:00
Martin Schwenke
e3a9991e46 Eventscripts: iptables() should put lock in $CTDB_VARDIR.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 3f04793f391c63b78ffb9c9851ab3f0daf3ed50a)
2011-08-03 16:55:43 +10:00
Martin Schwenke
3bbfdfcdd3 Make Emacs recognise that the eventscript functions file is a shell script.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit a6dfb76cfa759f6f9409f24368111c4f85ca0fbf)
2011-08-03 16:49:38 +10:00
Martin Schwenke
3380c6ce1d Eventscript functions: add $CTDB_ETCDIR and hook service() functions.
* $CTDB_ETCDIR defaults to /etc but can be changed for testing.  All
  hard-coded instances of /etc have been changed to $CTDB_ETCDIR.
  This includes references to /etc/init.d and /etc/sysconfig.

* service() and nice_service() functions now call new function
  _service().  This makes it easier to override these functions (say,
  in rc.local) for testing and call most of the existing functionality
  using _service().
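
The layering described in the second point could be sketched like this. The body of _service() is an assumption (only an init.d lookup under $CTDB_ETCDIR is shown, and the $_nice variable is a guess at how the two wrappers might differ):

```sh
# Hypothetical sketch: public wrappers delegate to one private helper.
_service ()
{
    _script="${CTDB_ETCDIR:-/etc}/init.d/$1"
    if [ -x "$_script" ] ; then
        # $_nice is deliberately unquoted so that an empty value vanishes.
        ${_nice:-} "$_script" "$2"
    else
        echo "ERROR: no init script for $1"
        return 1
    fi
}

service ()      { _nice=""     ; _service "$@" ; }
nice_service () { _nice="nice" ; _service "$@" ; }
```

A test could then redefine service() on its own, for instance to log the call, and still reach the real behaviour through _service().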

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit f43c9a7604b779bb6257ddb2bf3cbe266d496a63)
2011-08-03 16:45:54 +10:00
Martin Schwenke
d31fbcab4b Set $CTDB_VARDIR in the functions file.
This will be needed when eventscripts that use it are called
externally.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit ebd53b66b0cc66d9d04830781886234167fc2164)
2011-08-03 16:44:49 +10:00
Martin Schwenke
3efd5ef77c Eventscripts: only autostart during a monitor event.
Otherwise we might short-circuit events that are run only once and
actually need to do something.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit c4f9e8a43540bc049b2771e0a2d76d37b9d17331)
2011-01-11 16:48:50 +11:00
Martin Schwenke
fb8f199651 Eventscripts: print a message when reconfiguring a service.
Otherwise there can be strange error messages from services
stopping/starting, without any context.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 8bcf7ab164429ddc0ae530133e114f186a8146dd)
2011-01-11 16:48:17 +11:00
Martin Schwenke
934ae76d38 Eventscripts: work around NFS restart failure under load.
"service nfs restart" can fail.  To stop nfsd it sends a SIGINT and
nfsd might take a while to process it if the system is loaded.
Starting nfsd may then fail because resources are still in use.

This does some /proc magic to tell nfsd to do no more processing.  It
then runs service stop, kills nfsd with SIGKILL, and then runs service
start.  This is much less likely to fail.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit a9bf4f82852975b0b627f61ceb2d23401f630805)
2011-01-11 16:47:43 +11:00
Ronnie Sahlberg
8147d29598 add a missing part of the import of the previous ganesha patch
(This used to be ctdb commit 171b8855bb2feae7f7dd6a079571f3113dedd6f4)
2010-12-06 11:50:15 +11:00
Ronnie Sahlberg
ebcc866ae0 update autostart/stop to work for samba
(This used to be ctdb commit 37ab57e2adaecc3f7996ea20af45a5df0cd8be76)
2010-11-22 20:42:26 +11:00
Martin Schwenke
a2af87482b Eventscript functions - catch failures in ctdb_service_start().
ctdb_service_start() currently succeeds if ctdb_counter_init()
succeeds.

This changes it to fail when a service start fails.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit ddb73962d72d933bf0edc28be0dbb45bea7e5ef4)
2010-11-18 12:15:05 +11:00
Martin Schwenke
3ab768e8d4 50.samba eventscript should stop/start services when they become (un)managed.
When the value of $CTDB_MANAGES_SAMBA or $CTDB_MANAGES_WINBIND changes
(or corresponding changes are made to $CTDB_MANAGED_VERSIONS), the
associated service should be started or stopped as necessary.

This adds calls to ctdb_start_stop_service() to manage
starting/stopping samba and winbind.

An associated cleanup is made to the initial checks that one of
$CTDB_MANAGES_SAMBA or $CTDB_MANAGES_WINBIND is set, replacing them
with calls to is_ctdb_managed_service().

To handle the winbind cases ctdb_start_stop_service() and
is_ctdb_managed_service() are updated to take an optional service name
parameter.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit d98f175e8420d921a123ae9c0ce00945350b1537)
2010-11-18 12:12:30 +11:00
Ronnie Sahlberg
4fe85e5be5 add a new support function ctdb_check_counter_equal()
update nfs to try to restart the service after 10 consecutive failures
and to flag the node unhealthy after 15

add similar function to mountd

(This used to be ctdb commit 1569a54bb82fc433895ed68f816cf48399ad9d40)
2010-11-17 13:54:57 +11:00
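The thresholds in the commit above (restart after 10 consecutive failures, unhealthy after 15) suggest an equality check on a failure counter, so that each action fires exactly once. The sketch below illustrates that idea only: the function names, the COUNTER_FILE variable and the file-based storage are hypothetical and are not the ctdb implementation of ctdb_check_counter_equal().

```shell
# Hypothetical counter kept as a file holding one byte per recorded failure.
COUNTER_FILE="${COUNTER_FILE:-${TMPDIR:-/tmp}/example-failcount.$$}"

counter_init () { : >"$COUNTER_FILE"; }          # reset after a success
counter_incr () { printf 'x' >>"$COUNTER_FILE"; } # record one more failure

counter_get ()
{
    if [ -f "$COUNTER_FILE" ] ; then
        wc -c <"$COUNTER_FILE" | tr -d ' '
    else
        echo 0
    fi
}

# True only when the count is exactly $1, so the caller acts once per threshold.
counter_equal () { [ "$(counter_get)" -eq "$1" ]; }

# Called each time a health check fails.
on_check_failed ()
{
    counter_incr
    if counter_equal 10 ; then
        echo "restart"
    elif counter_equal 15 ; then
        echo "unhealthy"
    fi
}
```

Comparing for equality rather than "at least" means the restart is attempted once, at the tenth failure, instead of on every failure from the tenth onwards.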
Martin Schwenke
8fe1ec3754 Eventscripts: make loadconfig() function hookable by the test suite.
Rename loadconfig() to _loadconfig().  Add a new loadconfig() that
simply calls _loadconfig().

This makes it easy for the test suite to override loadconfig().

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 1d77a3adfff893b3c01b87f791e72c0d3148425c)
2010-11-17 11:46:48 +11:00
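The rename-and-wrap pattern from the commit above looks roughly like this. The names loadconfig() and _loadconfig() come from the commit message; the body of _loadconfig() is a simplified assumption (source the named file if it is readable) and not the real ctdb code.

```shell
# Assumed, simplified body: source a config file if it exists.
_loadconfig ()
{
    if [ -r "$1" ] ; then
        . "$1"
    fi
}

# The public name is now a one-line wrapper around the real implementation.
loadconfig ()
{
    _loadconfig "$@"
}
```

A test suite can then define its own loadconfig(), for instance one that sets test-only variables before or after calling _loadconfig(), without editing the functions file.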
Martin Schwenke
6ab5ae2c9b 60.nfs only fails or warns after 10 consecutive nfsd/statd failures.
These failures are sometimes the result of slow restarts so we want to
avoid dirtying the logs or marking a node unhealthy because of them,
unless they are excessive.

For these 2 cases we use the existing fail counting code but hack a
temporary service_name in a subshell to allow separate fail counts.

We also update ctdb_check_rpc() so that it captures the error output
from rpcinfo and we add a message including the service name to the
beginning.  The error is printed to stdout but is also stored in
ctdb_check_rpc_out to allow it to be conditionally used by the caller.
This function also now returns non-zero rather than exiting on
failure.

Other direct rpcinfo calls are replaced by calls to ctdb_check_rpc()
for consistency.

Option handling code for service restarts is cleaned up so that it fits
in 80 columns.  A more informative restart message is now used in all
cases, printing the exact command being used to start a service.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 79c25fe241cf5d8f92e23d3736823ebaf4e1769d)
2010-11-17 11:43:09 +11:00
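The ctdb_check_rpc() behaviour described in the commit above (capture the rpcinfo error, prefix it with the service name, print it, keep it in ctdb_check_rpc_out, and return non-zero instead of exiting) could be sketched as below. The function and variable names are from the commit message; the argument order, the message wording and the use of "rpcinfo -u localhost" are assumptions for illustration.

```shell
# ctdb_check_rpc <service-name> <rpc-program-number> <version>
# Sketch only: argument order and message text are assumed.
ctdb_check_rpc ()
{
    _name="$1"
    _prog="$2"
    _vers="$3"

    # Capture stdout and stderr so the caller can decide whether to show them.
    if ctdb_check_rpc_out=$(rpcinfo -u localhost "$_prog" "$_vers" 2>&1) ; then
        return 0
    fi

    # Prefix the captured error with the service name, print it, and
    # report failure to the caller rather than exiting the script.
    ctdb_check_rpc_out="ERROR: $_name failed RPC check:
$ctdb_check_rpc_out"
    echo "$ctdb_check_rpc_out"
    return 1
}
```

Because the function returns rather than exits, a caller can discard the printed message, count the failure, and only report "$ctdb_check_rpc_out" once the failures become excessive.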