1
0
mirror of https://github.com/samba-team/samba.git synced 2025-01-08 21:18:16 +03:00
Commit Graph

1163 Commits

Author SHA1 Message Date
Martin Schwenke
8f713c87c1 ctdb-scripts: Don't fail monitoring if sanity checks fail
Just log some warnings.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-08-29 17:06:25 +02:00
Martin Schwenke
6b4a46e574 ctdb-scripts: Move filesystem monitoring into a function, clean it up
Drop obvious comments.  Use die() for less lines of code.  Use a case
statement to avoid forking unnecessary processes for each filesystem
being checked.  Drop parentheses around percentages in messages.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-08-29 17:06:25 +02:00
Martin Schwenke
47f7d1b1c8 ctdb-scripts: Rename 40.fs_use to 05.system
Will put all the system monitoring in here, simplifying 00.ctdb.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-08-29 17:06:25 +02:00
Martin Schwenke
7d04778c82 ctdb-scripts: Improve error handling for 50.samba testparm failure
Also add tests.  Update testparm stub to fake error and timeout.  Add
timeout stub.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-08-07 05:33:29 +02:00
Martin Schwenke
b0bc4d2cab ctdb-scripts: Move 60.nfs Ganesha callout to doc/examples/
We don't expect to maintain an up-to-date copy.  NFS Ganesha team
might provide patches.

Also move the Ganesha .check file

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-07-21 07:54:09 +02:00
Martin Schwenke
dd88c2ab8c ctdb-scripts: Support RPC checks for tcp6 and udp6
This adds new configuration variable CTDB_RPCINFO_LOCALHOST6.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-07-21 07:54:09 +02:00
Martin Schwenke
b6a3c1decd ctdb-scripts: Implement registration in nfs-linux-kernel-callout
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-07-14 09:57:18 +02:00
Martin Schwenke
fa6f22d7ae ctdb-scripts: Add registration for CTDB_NFS_CALLOUT operations
This is an optimisation to avoid forking the callout for operations
that are not implemented.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-07-14 09:57:18 +02:00
Martin Schwenke
bb7093ab09 ctdb-scripts: Add portmapper NFS .check file
Unhealthy after 1 failed attempt to contact the portmapper.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-07-14 09:57:18 +02:00
Martin Schwenke
a02bdb97f9 ctdb-scripts: Move NFS support functions to 60.nfs
Now that there is only a single NFS eventscript, other eventscripts no
longer need to load all of this.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-07-14 09:57:18 +02:00
Martin Schwenke
a3a443dcf4 ctdb-scripts: Drop configuration variable CTDB_NFS_DUMP_STUCK_THREADS
This is now handled by passing the desired number of threads to the
command specified in the dump_stuck_threads variable in .check files.

Remove unused function nfs_dump_some_threads().

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-07-14 09:57:18 +02:00
Martin Schwenke
f3a4c4f10b ctdb-scripts: Remove unused function startstop_ganesha()
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-07-14 09:57:18 +02:00
Martin Schwenke
6586651508 ctdb-scripts: Remove 60.ganesha, replace with callout for 60.nfs
This isn't a straightforward move of code from 60.ganesha to the
callout.  Simplifications have been made to allow better
interoperation with the new NFS checking logic.

The following configuration variables have been removed:

  CTDB_GANESHA_REC_SUBDIR

    Edit NFS ganesha callout to change this location

  CTDB_NFS_SERVER_MODE, NFS_SERVER_MODE

    Use CTDB_NFS_CALLOUT instead

  CTDB_NFS_SKIP_KNFSD_ALIVE_CHECK, CTDB_SKIP_GANESHA_NFSD_CHECK

    Disable the corresponding .check file instead

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-07-14 09:57:18 +02:00
Martin Schwenke
713ec21750 ctdb-scripts: Extend NFS .check files with service_check_cmd variable
$service_check_cmd specifies a command to run instead of the regular
rpcinfo-based check.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-07-14 09:57:18 +02:00
Martin Schwenke
d332013123 ctdb-scripts: Remove functions startstop_nfs() and startstop_nfslock()
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-07-14 09:57:18 +02:00
Martin Schwenke
9c87d1dd29 ctdb-scripts: Parameterise 60.nfs with $CTDB_NFS_CALLOUT
The goal is to have a single NFS eventscript.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-07-14 09:57:18 +02:00
Martin Schwenke
49c4d1900c ctdb-scripts: Remove old NFS checking code
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-07-14 09:57:18 +02:00
Martin Schwenke
74428e5c14 ctdb-scripts: Switch NFS checks to new style
Note that the 60.ganesha RPC checks need to be identical to those in
the nfs-checks.d/ directory.  This is because the NFS unit test
infrastructure checks output against what should be produced by the
checks in nfs-checks.d/.  This is a minor issue, since one of the aims
of this work is to remove the need for a separate 60.ganesha.

In most cases configuration variable CTDB_NFS_DUMP_STUCK_THREADS is
now ignored.  This is now handled by passing the desired number of
threads to the command specified in the service_debug_cmd variable in
a .check file.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-07-14 09:57:18 +02:00
Martin Schwenke
3161d611bb ctdb-scripts: Add new NFS service checking infrastructure
Provides a new extensible format for .check files, using simple
variables instead of the unwieldy extended test(1) syntax now used.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-07-14 09:57:18 +02:00
Martin Schwenke
dfeb5b84fd ctdb-scripts: Factor out new function ctdb_counter_get()
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-07-14 09:57:18 +02:00
Martin Schwenke
9f4f1c51fc ctdb-scripts: Move "ERROR:" prefix out of ctdb_check_rpc()
There will be warnings in addition to errors.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-07-14 09:57:18 +02:00
Martin Schwenke
1a9687f948 ctdb-scripts: Clean up ctdb_check_rpc()
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-07-14 09:57:17 +02:00
Martin Schwenke
6f3ee81d17 ctdb-scripts: NFS RPC checks should be simple and consistent
Change status, nlockmgr, mountd, rquotad to be unhealthy after 6
rpcinfo check failures and do a verbose restart after every 2
failures.  Change 60.ganesha for consistency, since 60.ganesha tests
are broken and depend on the consistency.

Apart from the consistency aspect, the check infrastructure will soon
be simplified so that it only allows the equivalent of "unhealthy" and
"verbose restart:b" actions.

Update tests to have a corresponding numbers of iterations.  Run 1
extra iteration in most tests to check there are no unexpected
behaviour changes after the designated number of iterations completes.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-07-14 09:57:17 +02:00
Martin Schwenke
bc71251433 ctdb-scripts: Support monitoring of interestingly named VLANs on bonds
VLAN interfaces on bonds with a name other than <iface>.<id>@<iface>
are not currently supported.  That is, where the VLAN name isn't based
on the underlying bond name.  Such VLAN interfaces can be created with
the "ip link" command, as opposed to the "vconfig" command, or by
renaming a VLAN interface.

This is improved by determining the underlying interface name for a
VLAN from the output of "ip link".

No serious attempt is made to support VLANs with '@' in their name,
although this seems to be legal.  Why would you do that?

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-07-14 09:57:16 +02:00
Martin Schwenke
87c5c96b76 ctdb-scripts: Fix regression in VLAN interface support
Commit 6471541d6d broke support for VLAN
interfaces.  Releasing a public IP address depends on
ip_maskbits_iface() and for a VLAN interface this will return an
interface of the form <vlan>@<iface>, which can't be fed back into
"ip" commands.

Update ip_maskbits_iface() to drop the '@' and everything after it.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Reported-by: Jan Schwaratzki <jschwaratzki@ddn.com>
2015-07-14 09:57:16 +02:00
Martin Schwenke
0a65013b9d ctdb-scripts: Use an "if" statement instead of "&&"
If statd-callout is unwanted, so is removed, then this code fails.
Change to an "if" so that it succeeds as intended.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-07-09 06:23:21 +02:00
Martin Schwenke
0c609c9505 ctdb-scripts: Only write to /proc route flush files if they exist
On IPv4-only or IPv6-only systems one of these files will not exist.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-07-01 04:18:29 +02:00
Martin Schwenke
27674c413d ctdb-scripts: Create the directory containing the recovery lock
This will handle the most obvious cases.  It won't handle the case
where the directory is missing and the recovery lock location is
updated at run-time.  However, this is a good improvement.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-07-01 04:18:28 +02:00
Jose A. Rivera
7266c68856 ctdb: Change use of 'which' to 'type' in scripts.
While 'which' is a very common tool, on many distros it is not a requirement
that it be installed. 'type' is a shell built-in specified by the Open Group,
and is found in shells like bash, dash, and ksh across multiple OSes.

Signed-off-by: Jose A. Rivera <jarrpa@samba.org>
Reviewed-by: Michael Adam <obnox@samba.org>

Autobuild-User(master): Michael Adam <obnox@samba.org>
Autobuild-Date(master): Fri Jun  5 20:39:47 CEST 2015 on sn-devel-104
2015-06-05 20:39:47 +02:00
Martin Schwenke
3b25face87 ctdb-scripts: New eventscript 10.external
This is an alternative to 10.interface and is installed as disabled by
default.  It should only be used with DisableIPFailover=yes and when
IP failover is being handled externally.  In this mode CTDB can be
informed of public IP address movements using "ctdb moveip".

During the "startup" event, this eventscript currently finds any
public IP addresses configured in $CTDB_PUBLIC_ADDRESSES and tells
CTDB which node they are on using "ctdb moveip".  This allows CTDB to
send ARPs and tickle-ACKs.

Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-05-13 06:42:13 +02:00
Martin Schwenke
0d0512cb65 ctdb-scripts: Drop all public IP addresses from 10.interface
00.ctdb should not know about public IP addresses.

Move related tests to operate on 10.interface.

Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-05-13 06:42:13 +02:00
Martin Schwenke
c927ec928c ctdb-scripts: Drop update of public address configuration from config.tdb
This isn't used or documented anywhere.

2 differing points of view:

* This is a very good idea but it should probably be generalised to
  cover more configuration items.  This would end up like the Samba
  registry configuration and would use a tool to support setting
  configuration values.

* If people really want to update configuration while a node is down
  then they should fix the configuration before bringing up that node.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-05-10 03:22:13 +02:00
Martin Schwenke
e359d826a4 ctdb-scripts: Add alternative network family monitoring for NFS
For example, adding a file called nfs-rpc-checks.d/20.nfsd@udp.check
will cause NFS to be checked on UDP as well, using a separate counter.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Thu Apr 30 09:24:12 CEST 2015 on sn-devel-104
2015-04-30 09:24:12 +02:00
Amitay Isaacs
f6af2d96c2 ctdb-scripts: Run tdb checker under timeout command
If tdb database file size grows beyond 4GB, tdbtool/tdbdump can hang
indefinitely.  This will prevent CTDB from starting up.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-04-30 06:45:26 +02:00
Amitay Isaacs
83f3a35645 ctdb-scripts: Add new configuration variable CTDB_MAX_OPEN_FILES
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-04-30 06:45:26 +02:00
Martin Schwenke
0621f07eb4 ctdb-scripts: New configuration variable CTDB_NODE_ADDRESS
Required when automatic address detection can not be used.  This can
be the case when running multiple ctdbd daemons/nodes on the same
physical host (usually for testing), using InfiniBand for the private
network or on Linux when sysctl net.ipv4.ip_nonlocal_bind=1.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Mon Apr 27 06:10:08 CEST 2015 on sn-devel-104
2015-04-27 06:10:08 +02:00
Martin Schwenke
0ae57588eb ctdb-scripts: Simplify a command pipeline
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-04-27 03:32:11 +02:00
Martin Schwenke
1092f9755f ctdb-scripts: Replace uses of "ctdb pnn" with ctdb_get_pnn()
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-04-27 03:32:10 +02:00
Martin Schwenke
09b5e4978a ctdb-scripts: Changed uses of "ctdb xpnn" to ctdb_get_pnn()
"ctdb xpnn" does not work when sysctl net.ipv4.ip_nonlocal_bind=1,
since it determines the node by attempting to bind to each addres in
the nodes file.  The solution is to not use "ctdb xpnn".  After the
initial call, ctdb_get_pnn() will be more efficient that "ctdb xpnn".

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-04-27 03:32:10 +02:00
Martin Schwenke
579dda6858 ctdb-scripts: New function ctdb_get_pnn() does cached retrieval of PNN
This avoids the expense of establishing a client connection to the
daemon just to get the PNN of the current node.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-04-27 03:32:10 +02:00
Amitay Isaacs
14886ed00c ctdb-scripts: Use tcp connection for checking RPC services
It's possible for a RPC service to register only for UDP and not TCP.
Since we assume all the NFS operations are over TCP, always check RPC
services over TCP.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-03-27 06:40:08 +01:00
Martin Schwenke
130202d635 ctdb-scripts: Respect $RPCMOUNTDOPTS when restarting rpc.mountd
$RPCMOUNTDOPTS is ignored when restarting rpc.statd due to the service
being unresponsive.  This variable can be used to increase the number
of rpc.mountd threads when there are a lot of clients reattaching so
ignoring it can mean that only a single rpc.mount thread is started.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-03-27 06:40:08 +01:00
Amitay Isaacs
1f523a628a ctdb-tests: Avoid early exits in scripts that appear on tail of a pipe
When executing a shell script code "foo | bar", if "bar" terminates early,
then "foo" can get I/O error when writing to stdout.

The tdbtool stub did not wait to read anything from stdin when it is
expected to.  This would cause tests to fail randomly under load when
tdbtool process exited early.

Similarly, debug function read from stdin only under certain conditions
(higher debug and when not reading from tty).  Otherwise, exited early.

Thanks to Andrew Bartlett for noticing the problem and Catalyst Cloud
(http://catalyst.net.nz/cloud) for providing resources to test fixes.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-Programmed-With: Martin Schwenke <martin@meltin.net>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>

Autobuild-User(master): Michael Adam <obnox@samba.org>
Autobuild-Date(master): Fri Mar 20 16:26:37 CET 2015 on sn-devel-104
2015-03-20 16:26:36 +01:00
Amitay Isaacs
4f82ef4b38 ctdb-scripts: Simplify 00.ctdb event script
Avoid extra which commands.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2015-03-20 13:49:26 +01:00
Martin Schwenke
38279d7ec1 ctdb-eventscripts: Make 11.natgw stateful
IP addresses and routes are only changed if either the NAT gateway
configuration or the NAT gateway master node has changed.  If running
"ip monitor" this will minimise the amount of noise seen.  It should
also be more lightweight at the expense of managing a couple of state
files.

Add a test to check that configuration changes behave correctly.
Tweak the static route result generation code so that the required
output is sorted.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-03-16 06:41:06 +01:00
Martin Schwenke
50ddc2c356 ctdb-scripts: Remove unused function nfs_statd_update()
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-03-04 10:42:27 +01:00
Martin Schwenke
500c6e194b ctdb-scripts: Change statd-callout to be more scalable
Updating ctdb.tdb on each add-client, del-client and each delete
during notify was too ambitious.  Persistent transactions do not
perform well enough to do this.

Revert to having add-client and del-client create touch files.  Each
monitor event calls "statd-callout update" to convert touch files into
ctdb.tdb records.

Update testcases to do the "update" and add an extra test.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-03-04 10:42:27 +01:00
Martin Schwenke
032441d9a2 ctdb-scripts: Fix a regression in statd-callout
Commit 4638010abb changed from using
gensub() to gsub() in awk.  However, it didn't halve the number of
backslashes in the target strings.  This is necessary because
backslash is used in gensub() target strings to allow substitution of
text matching parenthesised subexpressions.  This is not the case with
gsub().

So, halve the number of backslashes in the target string where gsub()
is used in statd-callout.  This is the only target string broken by
changes made by the above commit

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-03-04 10:42:27 +01:00
Martin Schwenke
dc32f11b87 ctdb-scripts: Improve messages about invalid tunables during "setup"
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Wed Feb 18 08:03:33 CET 2015 on sn-devel-104
2015-02-18 08:03:33 +01:00
Martin Schwenke
39686f4505 ctdb-scripts: Fix tunable setup code by making it shell-agnostic
All tunables set in configuration are currently set to 0 on system
where /bin/sh is dash (and perhaps other non-bash shells).  dash puts
single quotes around all values in the output of the "set" builtin
command, whereas bash only puts them around values when something
needs to be quoted.  Tunables always have a simple integer value so
dash will quote them and bash won't.  The setup code currently passes
the raw value, including any quotes to "ctdb setvar ...".  This
command does no error checking on the input, so "'1'" is converted to
0.

Change the code so that the value is determined from the shell
variable and is independent of the "set" output.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-02-18 05:34:06 +01:00
Martin Schwenke
664d62b611 ctdb: Change default debug level to NOTICE (2)
This was true for the daemon until commit
b4589b954e.

Defaulting to ERR in the ctdb CLI tool encourages logging notices at
ERR level, so default to NOTICE instead.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-02-18 05:34:06 +01:00
Martin Schwenke
ab51f283e7 ctdb-scripts: Call iptables/ip6tables directly from iptables_wrapper
Drops the iptables() and ip6tables() functions and, hence, the
hardcoding of paths /sbin/iptables and /sbin/ip6tables.  The latter
avoids problems on openSUSE where (for example) /usr/sbin/iptables is
used instead.

This means that locking around ip*tables commands is only done when
iptables_wrapper is called directly.  This is fine because the only
conflict is when "releaseip" or "takeip"/"updateip" events are run in
parallel.  The other uses in 11.natgw and 70.iscsi are in events where
there will be no collisions.

Making 11.natgw support IPv6 is unnecessary.  Just put a static IPv6
address on each interface - they're plentiful.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Wed Jan 28 08:29:55 CET 2015 on sn-devel-104
2015-01-28 08:29:55 +01:00
Martin Schwenke
9b67c1fa37 ctdb-scripts: Error message, comment and whitespace cleanups
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-01-28 06:01:09 +01:00
Martin Schwenke
1a5414b6d2 ctdb-scripts: iSCSI eventscript should fail when PNN can't be determined
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-01-28 06:01:08 +01:00
Martin Schwenke
d1bd26e5eb ctdb-scripts: Make 70.iscsi IPv6-aware
Block iSCSI port for families of all address the node is configured to
host.

Could just unconditional add blocking using ip6tables instead.
However, this would produce errors when no IPv6 public addresses are
configured and ip6tables is not installed.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-01-28 06:01:08 +01:00
Martin Schwenke
4638010abb ctdb-scripts: Don't use the GNU awk gensub() function
This is a gawk extension and can't be used reliably if just running
"awk".  It is simple enough to switch to using the standard sub() and
gsub() functions.

The alternative is to switch to explicitly running "gawk".  However,
although the eventscripts aren't exactly portable, it is probably
better to move closer to portability than further away.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
2015-01-09 02:03:40 +01:00
Martin Schwenke
a5c5eee7d1 ctdb-scripts: Try to deal with Ubuntu having /usr/sbin/service
Falling back to running the initscript doesn't work because it detects
that upstart is being used and fails.  This was observed when trying
to start winbind on Ubuntu 11.04.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
2015-01-09 02:03:40 +01:00
Led
2c3672f424 ctdb-scripts: Fix bashism in ctdbd_wrapper script
BUG: https://bugzilla.samba.org/show_bug.cgi?id=11007

Signed-off-by: Oleksandr Chumachenko <ledest@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
2015-01-09 02:03:40 +01:00
Martin Schwenke
d0b2375c3d ctdb-scripts: Wait until IPv6 addresses are not "tentative"
There are a few potential failure modes when adding an IPv6 address.
It takes a little while of duplicate address detection to complete, so
wait for a while.  After a timeout, also need to check to see if
duplicate address detection failed - if it did then actually drop the
IP address.

This really needs some careful thinking.  If CTDB disappears on a node
but the node's IP addresses are still on interfaces then the above
failure mode could cause the takeover nodes to become banned.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2014-12-05 21:02:40 +01:00
Amitay Isaacs
d4212bd6a5 ctdb-eventscripts: Specify broadcast optionally to ip addr add
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2014-12-05 21:02:40 +01:00
Martin Schwenke
6471541d6d ctdb-scripts: Make 10.interface IPv6-safe
Add checking to "releaseip" and "updateip" to ensure that the given IP
address is really on the given interface with the given netmask.  If
reality doesn't match the given arguments then believe reality.

Use new function iptables_wrapper() instead of calling iptables()
directly.

Use new function flush_route_cache() instead of doing IPv4-specific
/proc magic.

Remove setting of otherwise unused variable "failed".

Fix a test for which the error message has changed.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2014-12-05 21:02:40 +01:00
Martin Schwenke
c314ae0b2a ctdb-scripts: New functions ip6tables() and iptables_wrapper()
ip6tables() uses the same lock as iptables().  This is done on
suspicion.

iptables_wrapper() takes 1st argument "inet" or "inet6", and the rest
is passed to the correct iptables variant.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2014-12-05 21:02:40 +01:00
Martin Schwenke
ed029ae0a1 ctdb-scripts: Add IPv6 addresses support in ip_maskbits_iface()
It also prints a third word, the address family.  This is either
"inet" or "inet6".

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2014-12-05 21:02:40 +01:00
Martin Schwenke
4940f191d3 ctdb-scripts: Update eventscripts to use ctdb -X instead of ctdb -Y
Also update associated eventscript unit tests and ctdb stub.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2014-12-05 21:02:39 +01:00
Martin Schwenke
7f377cf26e ctdb-scripts: Fix stack dumping when debugging hung scripts
There are parentheses missing that stop the default pattern from
matching commands with trailing garbage (e.g. "exportfs.orig").

A careful check of POSIX (and running GNU sed with --posix) suggests
that "\|" isn't a supported way of specifying alternation in a regular
expression.  Therefore, it is clearer to switch to extended regular
expressions so that this has a chance of being portable (even though
the point is to print /proc/<pid>/stack, which only works on Linux).

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Nov 18 06:37:45 CET 2014 on sn-devel-104
2014-11-18 06:37:44 +01:00
Martin Schwenke
4cd5be87da ctdb-scripts: Try to restart statd after every 10 failures
Also add and update tests for statd stack dumps.  Update the existing
60.ganesha statd test to do more iterations.  Duplicate the result as
a new test for 60.nfs.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2014-11-18 04:17:10 +01:00
Martin Schwenke
f51672f514 ctdb-scripts: Add rpc.statd stack dumping to Ganesha restart
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2014-11-18 04:17:10 +01:00
Martin Schwenke
968401ccdc ctdb-scripts: Dump stack traces for hung mountd, rquotad, statd processes
Add a corresponding new unit test for statd.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2014-11-18 04:17:10 +01:00
Martin Schwenke
1f49e1ab5b ctdb-scripts: Add optional program name argument to nfs_dump_some_threads()
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2014-11-18 04:17:10 +01:00
Martin Schwenke
2ebc305be6 ctdb-scripts: Factor out new function program_stack_traces()
In the process, fix a bug where an extra trace would be printed.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2014-11-18 04:17:10 +01:00
Martin Schwenke
1a8d431936 ctdb-logging: Add logging via UDP logging using RFC5424
Some implementations may not understand RC3164 format messages on the
UDP socket, so add support for RFC5424 message format.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2014-10-28 05:42:04 +01:00
Martin Schwenke
8ed3ff456c ctdb-logging: Add logging via UDP to 127.0.0.1:514 to syslog backend
This has most of the advantages of the old logd with none of the
complexity of the extra process.  There are several good syslog
implementations that can listen on the UDP port.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2014-10-28 05:42:04 +01:00
Martin Schwenke
a6e770ec28 ctdb-logging: Add non-blocking Unix domain logging to syslog backend
Format messages as per RFC3164.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2014-10-28 05:42:04 +01:00
Martin Schwenke
1d1cd04cb9 ctdb-logging: New option CTDB_LOGGING, remove CTDB_LOGFILE, CTDB_SYSLOG
Remove --logfile and --syslog daemon options and replace with
--logging.

Modularise and clean up logging initialisation code.  The
initialisation API includes an app_name argument that is currently
unused - this will be used in extensions to the syslog backend.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2014-10-28 05:42:04 +01:00
Martin Schwenke
b544073653 ctdb-logging: Remove log ringbuffer
As far as we know, nobody uses this and it just complicates the
logging subsystem.

Remove all ringbuffer code and documentation.  Update the local
daemons startup code correspondingly.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Volker Lendecke <vl@samba.org>
2014-10-06 12:34:32 +02:00
Amitay Isaacs
f1e281cd47 ctdb-scripts: Fix the regular expresssion for parsing /proc/locks
The major and minor device numbers are hexadecimal not decimal.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>

Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Thu Sep 25 07:19:59 CEST 2014 on sn-devel-104
2014-09-25 07:19:59 +02:00
Amitay Isaacs
22257dd4b6 ctdb-scripts: Do not export variables if they are not set
Variables that are not set but exported, may return an empty string
for getenv().  Tested on freebsd.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>

Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Wed Sep 17 09:55:47 CEST 2014 on sn-devel-104
2014-09-17 09:55:47 +02:00
Amitay Isaacs
8509bffdeb ctdb-scripts: Fix a typo
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2014-09-17 07:29:10 +02:00
Martin Schwenke
bc59e508d3 ctdb-eventscripts: Remove special case for virtio_net
The current check is incorrect in 2 ways:

* Commit be71a84565 contained a thinko
  that stops virtio_net interfaces from simply being marked up

* virtio_net interfaces can actually be down

virtio_net has supported ethtool since Linux 2.6.29, so just remove
the special case.  This means that testing CTDB on very old virtual
machines is not supported.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Thu Jul 31 13:08:47 CEST 2014 on sn-devel-104
2014-07-31 13:08:47 +02:00
Martin Schwenke
7c2c6748e3 ctdb-eventscripts: Remove unused argument to natgw_ensure_master()
This was used to limit damage in the "recovered" event.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Jul 29 10:03:16 CEST 2014 on sn-devel-104
2014-07-29 10:03:16 +02:00
Martin Schwenke
cb94eba157 ctdb-eventscripts: Remove NAT gateway "monitor" event
This event was introduced to handle misconfiguration.  For example,
where all nodes where configured as NAT gateway slaves.

However, this event can fail when there are performance issues and
capabilities can't be retrieved from a remote node.  The problem is
most likely with the remote node, so marking the local node UNHEALTHY
is probably a mistake.

Having a NAT gateway master node only matters in "ipreallocated", so
leave it to do the checking.  Given that a node will run
"ipreallocated" as part of the first recovery, this should cause
misconfigurations to be detected nice and early.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2014-07-29 07:38:13 +02:00
Martin Schwenke
61b1fdec2f ctdb-scripts: Support NFS on RHEL7 with systemd
Need to be able to recognise a RHEL system.  Still use "system" to
start and stop service, since that still works and yields the smallest
change.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2014-07-07 10:59:56 +02:00
Martin Schwenke
a7c5500765 ctdb-tests: Fix racy test for debugging hung scripts
Debugging can still be running when a monitor event times out and
scriptstatus output changes.

When debugging a hung script to a log file, write to a temporary file
and move the temporary file over the log file when done.  The test
then waits for the log file to appear.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Thu Jul  3 08:19:23 CEST 2014 on sn-devel-104
2014-07-03 08:19:22 +02:00
Martin Schwenke
b0c191e5de ctdb-scripts: Always print footer when debugging hung script
There shouldn't be an early exit for the "init" event.  Just make the
"ctdb scriptstatus" call conditional.

While here, move the comment about only running a single instance to
be near locking code.  The comment is more useful there.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2014-07-03 05:55:13 +02:00
Martin Schwenke
aac607d727 ctdb-eventscripts: Ensure $GANRECDIR points to configured subdirectory
Check that the $GANRECDIR symlink points to the location specified by
$CTDB_GANESHA_REC_SUBDIR and replace it if incorrect.  This handles
reconfiguration and filesystem changes.

While touching this code:

* Create the $GANRECDIR link as a separate step if it doesn't exist.
  This means there is only 1 place where the link is created.

* Change some variables names to the style used for local function
  variables.

* Remove some "ln failed" error messages.  ln failures will be logged
  anyway.

* Add -v to various mkdir/rm/ln commands so that these actions are
  logged when they actually do something.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri Jun 20 05:40:16 CEST 2014 on sn-devel-104
2014-06-20 05:40:16 +02:00
Martin Schwenke
6da8126a11 ctdb-eventscripts: New configuration variable CTDB_GANESHA_REC_SUBDIR
Backup and restore of the cluster filesystem can upset the operation
of 60.ganesha by changing the contents of this subdirectory.

Allow this subdirectory to be configured to a subdirectory that is
ignored by backup and restore processes.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Wed Jun 11 09:29:22 CEST 2014 on sn-devel-104
2014-06-11 09:29:22 +02:00
Martin Schwenke
151b02cd9e ctdb-eventscripts: Add check for invalid policy routing configuration
The range
CTDB_PER_IP_ROUTING_TABLE_ID_LOW..CTDB_PER_IP_ROUTING_TABLE_ID_HIGH
should not include 253-255.  Otherwise policy routing may overwrite
the default system routing tables.

Add some corresponding tests.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2014-05-05 03:55:08 +02:00
Martin Schwenke
e09147b6a3 ctdb-eventscripts: Update comment in 11.routing
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2014-05-05 03:55:08 +02:00
Martin Schwenke
be71a84565 ctdb-eventscripts: Don't check if $iface is empty
This is the loop variable.  It can't be empty, especially given the
way the list is built.  This must have survived from an earlier
version of the script.

Given that there are whitespace changes associated with the above,
clean-up the "virtio_net" avoidance check so that it reads less like
line-noise.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2014-05-05 03:55:08 +02:00
Martin Schwenke
2f2421bae1 ctdb-eventscripts: CTDB_NATGW_PUBLIC_* optional on slave-only nodes
Commit 4ee4925d41 forgot about
CTDB_NATGW_SLAVE_ONLY so it introduces an incorrect failure when this
is set, and CTDB_NATGW_PUBLIC_IFACE or CTDB_NATGW_PUBLIC_IP is unset.

Relax the sanity check to see if CTDB_NATGW_SLAVE_ONLY is set.

Update the documentation to explicitly state that
CTDB_NATGW_PUBLIC_IFACE and CTDB_NATGW_PUBLIC_IP are optional and
unused if CTDB_NATGW_SLAVE_ONLY is set.  It would be possible to
insist that CTDB_NATGW_PUBLIC_IFACE and CTDB_NATGW_PUBLIC_IFACE should
be unset in that case.  However, it is more reasonable to allow
consistent configuration across nodes except with some nodes
configured slave-only.

Add tests, update infrastructure and fix a thinko in the stub's
"natgwlist" implementation.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Mon Apr 14 06:06:49 CEST 2014 on sn-devel-104
2014-04-14 06:06:49 +02:00
Martin Schwenke
70bbbbe448 ctdb-eventscripts: CTDB_NATGW_STATIC_ROUTES can specify gateways
Extend CTDB_NATGW_STATIC_ROUTES so that each network can have an
optional gateway that overrides CTDB_NATGW_DEFAULT_GATEWAY.

Signed-off-by: Martin Schwenke <martin@meltin.net>
2014-03-26 04:21:42 +01:00
Martin Schwenke
34682affe9 ctdb-eventscripts: New configuration variable CTDB_NATGW_STATIC_ROUTES
This can be used to create more specific NATGW routes than the usual
NATGW default route.

Signed-off-by: Martin Schwenke <martin@meltin.net>
2014-03-26 04:21:42 +01:00
Martin Schwenke
7705efc355 ctdb-eventscripts: Clarify that CTDB_NATGW_DEFAULT_GATEWAY is optional
This has been implied since the command to add the route has had
errors redirected to /dev/null.  If infrastucture (e.g. ADS, DNS) is
on the same network as CTDB_NATGW_PUBLIC_IP then no route is
necessary.

Signed-off-by: Martin Schwenke <martin@meltin.net>
2014-03-26 04:21:42 +01:00
Martin Schwenke
8a3be1f1a9 ctdb-eventscripts: Improve check in NATGW "startup" event
Although the dots in $CTDB_NATGW_PUBLIC_IP could probably only help
match an invalid public IP address, this is only executed once so do
as exact a check as possible.

Use CTDB_BASE instead of hardcoding /etc/ctdb.

Make the error message less redundant.

Signed-off-by: Martin Schwenke <martin@meltin.net>
2014-03-26 04:21:42 +01:00
Martin Schwenke
e22a22b1f7 ctdb-eventscripts: Reformat natgw_clear()
Signed-off-by: Martin Schwenke <martin@meltin.net>
2014-03-26 04:21:42 +01:00
Martin Schwenke
3c839c60d1 ctdb-eventscripts: Rename some NAT gateway functions
delete_all() really needed renaming for clarity.  While doing this,
might as well rename some of the others that don't start with
"natgw_".

Signed-off-by: Martin Schwenke <martin@meltin.net>
2014-03-26 04:21:42 +01:00
Martin Schwenke
4ee4925d41 ctdb-eventscripts: Sanity check NAT gateway configuration
NAT gateway really can't operate unless most of the configuration
variables are set.

A check in delete_all() can be removed - strange that this isn't also
done in the add case.

Signed-off-by: Martin Schwenke <martin@meltin.net>
2014-03-26 04:21:41 +01:00
Martin Schwenke
0953f5799c ctdb-eventscripts: Improve readability of NAT gateway update code
Put the code into a couple of usefully named functions.

Signed-off-by: Martin Schwenke <martin@meltin.net>
2014-03-26 04:21:41 +01:00
Martin Schwenke
feeb9843bf ctdb-eventscripts: Use set_proc() to update /proc
In case we want to write some unit tests in the future.

Signed-off-by: Martin Schwenke <martin@meltin.net>
2014-03-26 04:21:41 +01:00
Martin Schwenke
058e14cdb0 ctdb-eventscripts: Fix regression in IP add/delete functions
Commit 176ae6c704 caused these functions
to exit on failure.  This is incorrect and broke NAT gateway.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2014-03-23 04:20:14 +01:00
Martin Schwenke
87d58fd07b ctdb-eventscripts: Attach to persistent ctdb.tdb in "startup" event
"statd-callout notify" currently complains until an add-client or
del-client is done.

Given that we might use ctdb.tdb for something else in the future it
makes sense attach to it in the "startup" event.  This could be done
in the background but it should be so lightweight that a timeout will
indicate serious problems.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2014-03-23 04:20:14 +01:00
Martin Schwenke
fcf846a795 ctdb-eventscripts: Switch on dumping of stuck nfsd threads
This feature was added quite a while ago but was not enabled by
default.  It is a useful feature so enable it to dump stack traces of
up to 5 stuck processes by default.

This can be disabled by setting:

  CTDB_NFS_DUMP_STUCK_THREADS=0

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Feb 25 04:06:45 CET 2014 on sn-devel-104
2014-02-25 04:06:45 +01:00
Martin Schwenke
c743fc4345 ctdb-scripts: Update a misleading comment
This comment was true when 50.samba was spaghetti because it tried to
automatically manage both smbd (and nmbd) and winbind.  It isn't true
anymore.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Wed Feb 19 04:07:12 CET 2014 on sn-devel-104
2014-02-19 04:07:12 +01:00
Martin Schwenke
2532149f8f ctdb-scripts: Enhancements to hung script debugging
* Add stack dumps for "interesting" processes that sometimes get
  stuck, so try to print stack traces for them if they appear in the
  pstree output.

* Add new configuration variables CTDB_DEBUG_HUNG_SCRIPT_LOGFILE and
  CTDB_DEBUG_HUNG_SCRIPT_STACKPAT.  These are primarily for testing
  but the latter may be useful for live debugging.

* Load CTDB configuration so that above configuration variables can be
  set/changed without restarting ctdbd.

Add a test that tries to ensure that all of this is working.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2014-02-19 12:04:47 +11:00
Martin Schwenke
176ae6c704 ctdb-eventscripts: Deleting IPs should use the promote_secondaries option
If a primary IP address is being deleted from an interface, the
secondaries are remembered and added back after the primary is
deleted.  This is done under a lock shared by the add/del script code.
It is necessary because, by default, Linux deletes secondaries when
the corresponding primary is deleted.

There is a race here between ctdbd and the scripts, since ctdbd
doesn't know about the lock.  If ctdbd receives a release IP control
and the IP address is not on an interface then it is regarded as a
"Redundant release of IP" so no "releaseip" event is generated.  This
can occur if the IP address in question is a secondary that has been
temporarily dropped.  It is more likely if the number of secondaries
is large.

Since Linux 2.6.12 (i.e. 2005) Linux has supported a
promote_secondaries option on interfaces.  This option is currently
undocumented but that will change in Linux 3.14.  With
promote_secondaries enabled the kernel will not drop secondaries but
will promote a corresponding secondary instead.  The kernel does all
necessary locking.

Use promote_secondaries to simplify the code, avoid re-adding
secondaries, avoid re-adding routes and provide improved performance.

This could be done conditionally, with a fallback to legacy
secondary-re-adding code, but no supported Linux distribution is
running a pre-2.6.12 kernel so this is unnecessary.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2014-02-13 02:03:24 +01:00
Srikrishan Malik
9a2a5a2f7c ctdb-eventscripts: Create extra files for ganesha recovery
This adds new files for Ganesha's recovery.  myreleaseip_* are used by
the recovery thread on the node where IP is released. The releaseip_*
and tekeip_* files are used by recovery thread where IP is taken over.

Signed-off-by: Srikrishan Malik <srimalik@in.ibm.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2014-02-12 06:50:08 +01:00
Srikrishan Malik
6b378f2f76 ctdb-eventscripts: Run mmlsconfig only once and use cached results
Signed-off-by: Srikrishan Malik <srimalik@in.ibm.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2014-02-12 06:50:08 +01:00
Srikrishan Malik
164ee000df ctdb-eventscripts: Do not mark node unhealthy if no fs is available
Signed-off-by: Srikrishan Malik <srimalik@in.ibm.com>
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>

Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Thu Jan 30 11:18:19 CET 2014 on sn-devel-104
2014-01-30 11:18:19 +01:00
Martin Schwenke
b7bfe46636 ctdb/eventscripts: Move all eventscript state under $CTDB_VARDIR/state
Services can be flagged for reconfigure when they release IPs at
shutdown.  The flag is never removed and the service is prematurely
reconfigured during the first "ipreallocated" event, before any IPs
are hosted and before the "startup" event has actually started the
services.

$CTDB_VARDIR/state directly contained the service state subdirectories
and is already removed in the "init" event.  Just push the service
state subdirectories down a level and put everything else in a
subdirectory.

This way all the eventscript state gets cleaned up every time CTDB
starts up.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri Jan 17 09:58:26 CET 2014 on sn-devel-104
2014-01-17 09:58:26 +01:00
Martin Schwenke
50e00b3e52 ctdb/eventscripts: Print a count if killing TCP connections times out
Also update related test

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2014-01-17 17:59:34 +11:00
Martin Schwenke
8eb20c2347 ctdb/eventscripts: Reconfigure lock should be released quickly
Currently the lock is held until the corresponding eventscript
completes, since the process still exists.  If the regular part of an
eventscript hangs then the lock might unnecessarily be held for a long
time.  The pathological case is when a monitor event gets stuck in
D-wait state and the script times out but can't be killed so the lock
is still held.  This can cause an unwanted monitor replay.

Change this so that the lock is released immediately after the
reconfiguration is complete.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2014-01-17 17:59:26 +11:00
Martin Schwenke
fdccaab2a9 ctdb/eventscripts: Do not reconfigure in "monitor" events
"monitor" events can be cancelled.  If a reconfigure action does a
service restart then the "monitor" event can be cancelled at the
inconvenient moment after the service is stopped.  In this case the
service stays down and the node may become unhealthy (depending on
whether there are any repair actions in the monitor event).

A long time ago we did service reconfiguration in "monitor" events
following failovers.  Service reconfiguration was then moved to the
"ipreallocated" event.  However, reconfiguration in "monitor" events
has been kept as a last resort in case an "ipreallocate" event does
not occur.  The only important case that this covers is "ctdb
deleteip", where "releaseip" events are generated without a
corresponding "ipreallocated".  Therefore, IPs can be deleted without
running the required service reconfiguration.

The supported way of removing IP addresses is now via "ctdb
reloadips", which always causes a takeover run with a corresponding
"ipreallocate" event.

This means that service reconfiguration in "monitor" events is no
longer required and should be removed because it is unsafe.

Also update the associated tests.  Make the first confirm that the
monitor event no longer does reconfiguration.  Change the others to
test that monitor status is correctly replayed when something else is
doing a reconfigure and currently holds the reconfigure lock.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Dec 17 06:32:35 CET 2013 on sn-devel-104
2013-12-17 06:32:35 +01:00
Amitay Isaacs
c18f3eeffb ctdb-scripts: Be careful when generating unique pids for stack traces
sort expects the data to be line based, so make it so.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
2013-11-27 18:46:17 +01:00
Amitay Isaacs
21ef3b1cc0 ctdb-config: Simplify the default CTDB configuration file
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-programmed-with: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
2013-11-27 18:46:17 +01:00
Amitay Isaacs
b3efb7ea51 ctdb-scripts: Replace hard-coded /var/ctdb with CTDB_VARDIR
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
2013-11-27 18:46:17 +01:00
Amitay Isaacs
7f20b760ec ctdb-scripts: Set defaults for CTDB_DBDIR and CTDB_DBDIR_PERSISTENT
If these configuration variables are not defined, then there should
a default fallback.  This is a workaround till CTDB compile time
configuration can be accessed at runtime.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
2013-11-27 18:46:17 +01:00
Amitay Isaacs
7a174985ff ctdb-eventscripts: Perform share check before NFS RPC checks in 60.ganesha
If NFS RPC checks do restart Ganesha, then it's possible that share
check can fail prematurely.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
2013-11-27 18:46:17 +01:00
Martin Schwenke
a6dbe126f5 ctdb-scripts: Add an early exit to statd-callout's notify case
If $statd_state is empty then the loop will run once and print
spurious errors.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
2013-11-27 18:46:16 +01:00
Martin Schwenke
f279a97ca4 ctdb-eventscripts: Remove the nfs_statd_update() call from 60.ganesha
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
2013-11-27 18:46:16 +01:00
Amitay Isaacs
86802b05f6 ctdb-scripts: Run a single instance of debug_locks.sh at a give time
This prevents spamming of logs if multiple lock requests are waiting
and keep timing out.

Also, improve the logging format with separators.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
2013-11-27 18:46:16 +01:00
Martin Schwenke
1dcf01f4a6 ctdb-scripts: Rewrite statd-callout to avoid 10 minute lag
This is naive and assumes no performance problems when updating
persistent DBs.  It also does no error handling.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
2013-11-27 18:46:16 +01:00
Martin Schwenke
4ab58a12a1 ctdb-scripts: debug_locks.sh should use configuration to find TDB location
That is, don't use fixed paths.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
2013-11-27 18:46:16 +01:00
Srikrishan Malik
ab59087775 eventscript: Fix link creation failure if the link already exist but the target path is missing
Signed-off-by: Srikrishan Malik <srimalik@in.ibm.com>

(This used to be ctdb commit 370022e1ff654db99d0c3ce0c49914c249e57289)
2013-11-01 13:09:05 +11:00
Martin Schwenke
edda442b36 eventscripts: Rewrite the smb.conf cache file handling
The background update is never guaranteed to complete before the cache
is used, so don't bother trying it at the beginning.  Instead, put a
timeout on a foreground update.

If the foreground update fails:

* If there's no available cache file then die.

* If there is a previous cache file then use it and log a warning.

* Do a background update at the end of the monitor event.

Also remove commas in the "smb ports" list before use, since (newer?)
testparm seem to insert commas into the default value.  Update the
associated test to add a comma.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 8c6f511254ecb0381a609b37e3a0ee6e5ec5d562)
2013-10-29 17:14:55 +11:00
Martin Schwenke
ab1b91caa4 initscript: Update systemd configuration to put PID file in /run/ctdb
Elsewhere we're moving the socket to /var/run/ctdb.  We might end up
with PID files and sockets for other daemons later, so let's call the
directory "ctdb" instead of "ctdbd".

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit b63f6fd2d295c8e18cbf3420ab05fce07b727f31)
2013-10-25 12:06:07 +11:00
Amitay Isaacs
7eb680a95f build: Move the default CTDB socket from /tmp to /var/run/ctdb
Use /var/run/ctdb/ctdbd.socket because there might be other daemons
that need sockets in the future.

The local daemons test code to create a link for the default
convenience socket has to be removed because the link can't be created
as a regular user in the new location.  This should be OK since all
calls to the ctdb tool in the test code should be wrapped in onnode.
When debugging tests, a developer will have to set CTDB_SOCKET by
hand.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-programmed-with: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit dc67a4e24af9d07aead2a1710eeaf5d6cc409201)
2013-10-25 12:06:07 +11:00
Mathieu Parent
cdf507c4b5 Add missing $remote_fs LSB dependency
(This used to be ctdb commit a0b965bb73777dde7a4abf80c5c4742581bce520)
2013-10-24 16:54:08 +11:00
Amitay Isaacs
17f8295460 eventscripts: Instead of listing all tunables, query EventScriptTimeout
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 58ca2c3e7e3a27023ad86660f01a2052e2a19635)
2013-10-24 16:54:07 +11:00
Martin Schwenke
4fbf3e5bdf initscript: New configuration variable CTDB_DBDIR_STATE
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 30d9b634b16c3cc740e5e453ea5c21012b1fde88)
2013-10-22 14:34:05 +11:00
Martin Schwenke
37aea37269 scripts: Make detect_init_style() more readable
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 516cdea0e73cf3f63b3303e22809834c8cbc64e4)
2013-10-22 14:34:05 +11:00
Martin Schwenke
0b69785eb2 eventscripts: Rework the iSCSI eventscript
* It should run on "ipreallocated" instead of "recovered"
* Variable name NODE -> ip since that's what it is
* Simplify some logic

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 45e2bc66abf9fcfeadcc279a656ed7fd1838920a)
2013-10-22 14:34:05 +11:00
Martin Schwenke
04c31bf50d eventscripts: Don't update static routes on "recovered" event
Routes only need to be updated when IPs have moved.  IP takeover runs
will generate "ipreallocated", which is enough.  "recovered" always
follows "ipreallocated" anyway, so avoid the redundancy.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 1152215fc69217e4292762e28d193b7ea0e06ee3)
2013-10-22 14:34:05 +11:00
Martin Schwenke
3132550a88 eventscripts: NAT gateway script doesn't need to handle "recovered" event
Any time a node changes flags in any significant way there will be a
takeover run, which will generate an "ipreallocated" event.  The
"recovered" event always happens straight after a takeover run so we
update the NAT gateway twice.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 542c70d6281d636ecd51502fbbf219f418bfac66)
2013-10-22 14:34:05 +11:00
Martin Schwenke
5369f711dc eventscripts: Delete placeholder "recovered" and "shutdown" events
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 00736a21fc268c10b6a718731e56b3dbb7e60554)
2013-10-22 14:34:04 +11:00
Martin Schwenke
2e819aa00f eventscripts: Clean up comment at the top of 00.ctdb
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 2ea9d3acfe7e8665685f54294f5edc9b8ffc2f3f)
2013-10-22 14:34:04 +11:00
Martin Schwenke
cf04ff178c eventscripts: Remove reconfigure check from samba and winbind eventscripts
There is no reconfigure code for these scripts so no need to check for
reconfiguration.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 41df1637c1d8a7b2f5a9974408db71b1f74cb2f2)
2013-10-22 14:34:04 +11:00
Martin Schwenke
a45aae410c eventscripts: Remove reconfigure code from httpd eventscript
Nothing ever (or has ever) set the "needs reconfigure" flag, so this
code is unnecessary.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 5b77fd95bda5f1960aca952e1b759231890b56f3)
2013-10-22 14:34:04 +11:00
Martin Schwenke
49d0153b10 eventscripts: Fold ctdb_check_tcp_ports_ctdb() into ctdb_check_tcp_ports()
A generic framework is no longer needed now that the "ctdb" checker is
the only one left.  Simplify the code.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 044d302b41a2040642355401e3236fcecc3a620a)
2013-10-22 14:34:04 +11:00
Martin Schwenke
0e9c939c0c eventscripts: Remove TCP port checks other than the built-in CTDB one
"ctdb checktcpport" is no longer experimental so the other checkers
are no longer required.

Remove tests related to the removed checkers.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 50e330d0679614bee2e7bab028436e929f74ca50)
2013-10-22 14:34:04 +11:00
Martin Schwenke
d02a645691 scripts: Remove setting of PATH from functions file
The current setting is inconsistent with settings on most systems,
putting /bin before /sbin.  Use of /usr/local/bin, which may be
required on some systems, is also overridden.  This can make it
difficult to do interactive debugging of script problems.

Rely on the system PATH instead.

If system-specific changes need to be made then this can be done in a
configuration file.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit cfbff39e22e42f3997f637290748290833525714)
2013-10-22 14:34:04 +11:00
Martin Schwenke
1ede20925f eventscripts: Clean up 20.multipathd
Reduce the complexity, including the depth of background processes.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 49f077c475b078889ff0492fe7d567a64d6cb87c)
2013-10-22 14:34:04 +11:00
Martin Schwenke
1e4c965f52 eventscripts: NAT gateway script should export CTDB_NATGW_NODES
Otherwise calls to "ctdb natgwlist" will not behave as expected if a
non-standard file is used, since that command will use the default
file location.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit e574b30257126679704b088c4334a8e7a53a9c3f)
2013-10-22 14:34:04 +11:00
Martin Schwenke
cd4041760b scripts: Simplify script_log() to just look at CTDB_SYSLOG variable
The old logic was actually wrong.  If CTDB_LOGFILE is unset then a
default is used, not syslog.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 79e2029f9bc078126e865aa715100a3870c7604b)
2013-10-22 14:34:04 +11:00
Martin Schwenke
4526fdbbca scripts: Remove support for CTDB_OPTIONS configuration variable
Allowing people to put random options in CTDB_OPTIONS complicates some
logic (particularly around use of syslog).  If we're going to have
variables for options then let's make sure we have a variable for each
option and make people use them.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit e55f3a1577eff0182802b0341d865d961aeae1c7)
2013-10-22 14:34:04 +11:00
Martin Schwenke
1043b53d12 scripts: Remove unused configuration variable CTDB_MANAGES_SCP
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit bda0da41aaf629a252cc361b73ebc5328f26ed04)
2013-10-22 14:34:03 +11:00
Martin Schwenke
04f67b1066 eventscripts: Deprecate NFS_SERVER_MODE, use CTDB_NFS_SERVER_MODE instead
All CTDB configuration variables should start with CTDB_.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit f12658aff125996ae45eea23241d8c3d0567b893)
2013-10-22 14:34:03 +11:00
Martin Schwenke
ace6c1ee62 eventscripts: Fix comment - CTDB_TCP_PORT_CHECKS -> CTDB_TCP_PORT_CHECKERS
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 0a79ba2f1277a776347e2c3f04ce8419e0be62de)
2013-10-22 13:07:13 +11:00
Martin Schwenke
5818771192 scripts: Add support for optional ctdbd.conf configuration file
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 8f660d0dd52013e5876806be908e8e603aa6e968)
2013-09-25 14:35:46 +10:00
Amitay Isaacs
4c4bfcbd6f eventscripts: Load CTDB configuration settings in 70.iscsi
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit ff41ce5ef202f8f6342e285d195bb5df61d848ce)
2013-09-23 18:38:28 +10:00
Martin Schwenke
b88bf1275c eventscripts: Clean up monitoring of system memory in 00.ctdb
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 16fcff0d1993b7a0479341862ea44d10bd5c6d6d)
2013-09-11 15:34:30 +10:00
Martin Schwenke
cc74417341 eventscripts: Avoid using a temporary file in 62.cnfs
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 81833052d7ee8f76b1e98376a0273448640cfa8e)
2013-08-22 17:00:20 +10:00
Martin Schwenke
bb974f150b scripts: Remove gdb_backtrace
This uses potentially insecure temporary files and is not referenced
anywhere else.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 4b914d7e217202f3d11a8e95f9f74bc17869475b)
2013-08-22 17:00:20 +10:00
Martin Schwenke
fec69034ee eventscripts: Become unhealthy faster on nfsd failure
Anecdotal evidence suggests that most nfsd RPC check failures are due
to cluster filesystem or storage problem.  Apparently these are rarely
helped by attempting to restart the NFS service because the restart
tends to hang.

Fail after 2 nfsd RPC check failures, instead of waiting for 6
failures.  Restart on every 10th failure to try to bring the node back
to good health.

Update unit tests to match.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit e9ef93f7b6dad59eabaa32124df81f3e74c651ef)
2013-08-14 16:10:30 +10:00
Martin Schwenke
e6ce2f55ef eventscripts: Improve message logged when a counter hits a limit
It should print the actual number of consecutive failures rather than
the limit.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit ff5f0d1e29af2b293e30cdc54bed03a644be7038)
2013-08-14 15:57:04 +10:00
Martin Schwenke
35d9631eda eventscripts: Print a message when waiting for TCP connections to be killed
This makes the gaps in the logs more obvious.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 11fbf4789d783dd0bac22754b374dd9ea4b03bad)
2013-08-14 15:57:04 +10:00
Martin Schwenke
b1f7337d2b eventscripts: New configuration variable $CTDB_RPCINFO_LOCALHOST
Passing "localhost" to the rpcinfo command causes overheads, like
reading /etc/services multiple times.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 1d61988af9e4fa3621a3e2d06a859bcb53df2d67)
2013-08-14 15:57:04 +10:00
Martin Schwenke
0ca046577f eventscripts: Add modulo (%) operator to ctdb_check_counter()
Also add it to the corresponding eventscript unit test infrastructure.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit f4ef83a256f59eeb00b9a5bc10c28347e1ad1031)
2013-08-14 15:57:03 +10:00
Martin Schwenke
bdbe37b24f eventscripts: Separate out RPC service restart code
While doing this:

* Explicitly assign RPC program and version information in
  _nfs_check_rpc_common().  This is more lines of code but is easier
  to read.

* Don't print the options when starting a service.  Trying to print it
  makes the code messy for little benefit.

  Update the eventscript unit testing code and a Ganesha test to
  reflect this.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit e8b531405665885196c95fe1608db33a255bf761)
2013-08-14 15:57:03 +10:00
Martin Schwenke
df539a66cb eventscripts: Remove support for RPC service 'q' and 's' restart flags
They're hard to maintain and provide very little benefit.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 1a1be43f8466d46913dcdfe6dcedb94316cd28ad)
2013-08-14 15:57:03 +10:00
Martin Schwenke
5459cdc8a6 eventscripts: When restarting the nfslock service only show output of start
That is, /dev/null the "stop" output.  This is consistent with the way
CTDB generally deals with the output when stopping a service.

It also makes updating the eventscript unit tests easier.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit c7332526b1b488abefeb4be78a7cd3f2f9abc451)
2013-08-14 15:57:03 +10:00
Martin Schwenke
98163e01a9 scripts: Do not run ctdb tool commands when debugging hung "init" event
CTDB daemon is not ready to accept clients in INIT runstate (init event).
CTDB daemon will start accepting connections in SETUP runstate (setup event)
and later.

Also, minor log formatting changes.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 81d7ce03b28d592a1337639e14d9ea141e20bfff)
2013-08-09 11:04:55 +10:00
Amitay Isaacs
f5ddb49e62 eventscripts: Use configured RECLOCK file instead of asking CTDB
On cluster where recovery lock file is not being used, asking CTDB daemon
is unnecessary overhead.  And if CTDB is using recovery file, then changing
configuration without restarting is *stupid*.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-Programmed-With: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 44eb86e6042adb6efe75d2a5528b82a0f21d496d)
2013-08-09 11:04:55 +10:00
Martin Schwenke
3c73949317 initscript: The wrapper script should export CTDB_SOCKET
This ensures that any invocation of the ctdb tool (within the wrapper)
gets the desired value.  This at least ensures that ctdbd will be
started.

If a non-standard value is set for CTDB_SOCKET then command-line users
will still need the variable in their environment.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 37ccc7c6cc43a80aaa92291aea7a438f4225488a)
2013-07-29 15:58:51 +10:00
Martin Schwenke
a8dd716146 eventscripts: kill_tcp_connections() should send connections to stdin
This avoids issuing multiple "ctdb killtcp" commands to terminate tcp
connections, one per connection.  This will considerably reduce the
time when there is a large number of tcp connections.  This also makes
it possible to avoid calling "ctdb killtcp" when there are no connections.

Add a couple of unit tests for killtcp and update eventscript unit
test infrastructure to support.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit a20d94717d2e4ab866d8a002cdf39c0669b74c6a)
2013-07-29 15:53:06 +10:00
Martin Schwenke
67b22b6e94 scripts: Run scriptstatus for hung event
The timeout information printed by ctdbd is less than useful because
it refers to the cumulative time taken by the eventscripts run so far.
Adding scriptstatus output indicates where time was actually spent.

Since there is now quite a bit of output, serialise the calls to this
script using flock.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 1b016b2dfc5d7d3f2a42ce4dfe569608e90eb714)
2013-07-29 14:02:13 +10:00
Martin Schwenke
1da757d91a eventscripts: A missing interface should cause monitoring to fail
A missing interface is at least as bad as an interface with a link
that is down so should have a similar effect.

This couldn't be done previously because orphaned interfaces used to
be listed for monitoring.  This was worked around in 10.interface in
commit 49b2d1bd9554461ed8edbfc21e777c0eca9e1443 and fixed in ctdbd in
commit cc1a3ae911d3fee8b87fda5de5ab6d9499d7510a.

If $CTDB_PARTIALLY_ONLINE_INTERFACES="yes" then monitoring won't
actually fail but the interface is still marked as down.

While we're touching this code, use "ip link" instead of "ip addr".
It is marginally cheaper but not enough for a separate patch.  ;-)

This effectively reverts d67955b42f7627be9dae995230c8fcbb8a948ec2.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 501f19b16fd6d67fbb754248868c38ee5bcf79ef)
2013-07-19 15:35:41 +10:00
Martin Schwenke
4b5c9c7991 eventscripts: Get list of configured interfaces using "ctdb ifaces"
This was previosuly changed because ctdbd didn't garbage collect
orphaned interfaces.  This was fixed in commit
cc1a3ae911d3fee8b87fda5de5ab6d9499d7510a.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit c6ab0f9405d5fa5b0b1693bc92e59da0d555a9d7)
2013-07-19 15:35:41 +10:00
Martin Schwenke
7610b6c009 scripts: ctdbd_wrapper logs a message to syslog if syslog is not being used
It can be very disconcerting when logging to syslog is expected but
nothing is being logged there.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 412bc0e20bef694d4e911dc9c984fd7716231f1f)
2013-07-11 15:18:06 +10:00
Martin Schwenke
e4d99cc899 packaging: Add systemd support
Based on an original patch by Sumit Bose <sbose@redhat.com>.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit e43a4b7b69a21c4cec2453dcac436b64bf5d7f06)
2013-07-10 18:14:33 +10:00
Martin Schwenke
adbee6ae4e initscript: Simpify initscript and control CTDB via new ctdbd_wrapper
Currently the initscript is very complex.  This makes it hard to read
and hard to add support for new init systems, such as systemd.

Create a wrapper called ctdbd_wrapper to be installed alongside ctdbd.
This is called by the initscript to start and stop ctdbd.  It does the
ctdbd option construct and waits until ctdbd is properly initialised
before it exits.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit e3abc7eebab5cceddc4ce7817890dd5db9be3450)
2013-07-10 15:19:27 +10:00
Amitay Isaacs
ae0afad8ee initscript: Export CTDB_DEBUG_LOCKS variable
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit a415a1986900135f889efc25ecaf2761b1dae81a)
2013-07-10 14:33:18 +10:00
Amitay Isaacs
f46d0e783c scripts: Add an example debug_locks.sh script to debug locking issue
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit c711ff4702c5f95b75e4bf030665fc2afffc2f9e)
2013-07-10 14:33:18 +10:00
Martin Schwenke
d6d1fb1f46 eventscripts: New configuration variable $CTDB_SKIP_GANESHA_NFSD_CHECK
This allows 60.ganesha to be unit tested, except for the core Ganesha
monitoring code.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit f606df4f2db754592e6d1a16c26e155cacb2beef)
2013-07-05 15:52:33 +10:00
Martin Schwenke
7f6169b207 eventscript: Move Ganesha nfsd monitoring to a function
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit ceb5b2d37f7ab4894908ec26f3812b3bed991525)
2013-07-05 15:52:33 +10:00
Martin Schwenke
c3e83d4532 eventscripts: Drop RPC service version from nfs_check_rpc_service() calls
Support for this was removed in commit
77302dbfd85754e02559eccb2dd6c090db0b6b9f and I overlooked its use in
60.ganesha.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 520914e7ee1b879c1080e5857fda18ed5b973fd6)
2013-07-05 15:52:33 +10:00
Martin Schwenke
4e07c6c433 eventscripts: When replaying monitor status, don't log empty output
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit ce04f1c107b4392ca955d9f29b93aaaae62439ce)
2013-07-05 15:52:33 +10:00
Martin Schwenke
01d879806b eventscripts: "setup" event doesn't need to wait for SETUP runstate
The "setup" event isn't called until ctdbd is in CTDB_RUNSTATE_SETUP
anyway...

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 9ea57af557028b1d2e5c560e7bcf4d014b9a8b1e)
2013-06-20 13:01:10 +10:00
Martin Schwenke
4eed91b54a eventscripts: 13.per_ip_routing should not try hard to find public_addresses
This essentially reverts d4621277240721e6d130a930b0100506b64467ea.
This was added for testing but the test code was actually broken.
CTDB itself will only process public IPs if $CTDB_PUBLIC_ADDRESSES is
set, so no code should try to be more flexible than that!

The test code has been fixed instead.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 3b11b27f3e22e99947bc2d6c49c4427bd7a0e332)
2013-06-20 13:01:10 +10:00
Martin Schwenke
6317285c4f scripts: Move TDB checking from initscript to "init" event
It makes sense to do this in the "init" event and make the initscript
less complicated.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 3bc93f312b8464fbfa2b2c44fffedc591fe5a3e0)
2013-06-20 13:01:10 +10:00
Martin Schwenke
961468146e scripts: Move dropping of all IPs from initscript to "init" event
It makes sense to do this in the "init" event and make the initscript
less complicated.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 0b77cceb49a30a181063adc7868d42d2851318e8)
2013-06-20 13:01:09 +10:00
Martin Schwenke
bee02e06e6 scripts: drop_ip() should use delete_ip_from_iface()
Otherwise secondary addresses that aren't owned by CTDB could be
dropped.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 5ffce65a1ad659b198ddf647622b899bdde45c72)
2013-06-20 13:01:09 +10:00
Martin Schwenke
a1eb516f0a scripts: drop_all_public_ips() now prints messages to stdout, not log
Change all callers to maintain current behaviour.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 0b67397ef5419c781a35916575151da7b7e7cc27)
2013-06-20 13:01:09 +10:00
Martin Schwenke
45878d4363 eventscripts: New configuration varable $CTDB_NFS_DUMP_STUCK_THREADS
If some nfsd threads are still alive after a shutdown during a restart
then this indicates the maximum number of threads for which a stack
trace should be dumped.  This can be useful for trying to determine
why nfsd is stuck.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 2503245db10d567af708a04edd3a3b488c24f401)
2013-06-14 15:15:06 +10:00
Martin Schwenke
f408caea2a eventscripts: Add new option $CTDB_MONITOR_NFS_THREAD_COUNT
Consider the following example:

1. There are 256 nfsd threads configured.
2. 200 threads are "stuck" in system calls, perhaps waiting for the
   underlying filesystem when an attempt is made to restart NFS.
3. 56 threads exit when NFS is stopped.
4. 56 new threads are started when NFS is started.
5. 200 "stuck" threads exit leaving only 56 threads running.

Setting this option to "yes" makes the 60.nfs monitor event look for
this situation and try to correct it.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 99b0d8b8ecc36dfc493775b9ebced54539c182d2)
2013-06-13 20:01:22 +10:00
Martin Schwenke
2e515f2306 eventscripts: Fix statd-callout update handling
60.nfs and 60.ganesha touch $statd_update_trigger every time they're
run.  This stops the statd-callout updates from ever being called.

Make this logic self-contained and move it to new function
nfs_statd_update() in the functions file.  Call this in 60.nfs and
60.ganesha with the appropriate update period as the only argument.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reported-by: Poornima Gupte <poornima.gupte@in.ibm.com>

(This used to be ctdb commit 1b5968f6be084590667f4f15ff3bef13ed9a2973)
2013-05-28 16:11:47 +10:00
Martin Schwenke
1eab9c898c eventscripts: Stop NAT gateway's delete_all() from polluting the log
Every time a node that wasn't the NAT gateway master gets reconfigured
something like this appears in the log:

  ctdbd: 11.natgw: Failed to del 10.0.1.139 on dev eth1

Since this usually fails it is better to mute the error than to have
it pollute the log.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 0ca7a98ffef50cbd06849cfbf65fb4a3d668b7bd)
2013-05-27 15:15:25 +10:00
Martin Schwenke
66019e3287 scripts: Provide mktemp function for platforms without mktemp command
This is needed for AIX and possibly others.

Also provide a cheaper mktemp function is needed in the run_tests
script.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit b2b572e9049c7138bd223226475bef8fe3e01f10)
2013-05-27 15:14:33 +10:00
Martin Schwenke
a989a299d1 eventscripts: 11.natgw should not call ctdb tool in "init" event
The current code calls "ctdb setnatgwstate ..." on every event.
However, calling the ctdb tool in the "init" event is not permitted.

Instead, update the capability when it is needed and at regular
intervals via the "monitor" event.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 39a43feae7c7de07ddaf2d6cb962f923d47d0c19)
2013-05-24 14:08:07 +10:00
Martin Schwenke
6d9667f01c ctdbd: Add new runstate CTDB_RUNSTATE_FIRST_RECOVERY
This adds more serialisation to the startup, ensuring that the
"startup" event runs after everything to do with the first recovery
(including the "recovered" event).

Given that it now takes longer to get to the "startup" state, the
initscript needs to wait until ctdbd gets to "first_recovery".

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit ed6814ff0a59ddbb1c1b3128b505380f60d7aeb7)
2013-05-24 14:08:07 +10:00
Martin Schwenke
b5ebff6931 tools/ctdb: "ctdb runstate" now accepts optional expected run state arguments
If one or more run states are specified then "ctdb runstate" succeeds
only if ctdbd is in one of those run states.

At the moment, if the "setup" event fails then the initscript succeeds
but ctdbd exits almost immediately.  This behaviour isn't very
friendly.

The initscript now waits until ctdbd is in "startup" or "running" run
state via the use of "ctdb runstate startup running", meaning that ctdbd
has successfully passed the "setup" event.

The "setup" event code in 00.ctdb now waits until ctdbd is in the
"setup" run state before proceeding via the use of "ctdb runstate setup".

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 4a2effcc455be67ff4a779a59ca81ba584312cd6)
2013-05-24 14:08:07 +10:00
Martin Schwenke
bb39f0a186 scripts: Rework notify.sh to use notify.d/ directory
This makes it easier to add notification handlers.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit d29e9a420b133088bf23a847c8d1dbce56c25eb0)
2013-05-23 16:18:23 +10:00
Martin Schwenke
51dbaecb54 eventscripts: Fix regression in _loadconfig()
fff88940f71058e4eefd65f50a6701389c005c17 introduced a regression.
Without $service_name set by default, the CTDB configuration is no
longer loaded when loadconfig() is called without any arguments.
That's bad.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit f1619a36c1beba11533052dc5728fa3adaa08870)
2013-05-22 14:24:21 +10:00
Martin Schwenke
ff9831f5b1 initscript: If CTDB doesn't become ready, print a message before killing
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit e6b6b793f61556c21e8daf34abf89ee7b388ecfb)
2013-05-22 14:24:21 +10:00
Amitay Isaacs
84bcb95952 eventscripts: Do not use bashism for string comparison
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit b0cae7d5a00ef3764bae187affc8e9a252f4b329)
2013-05-20 19:47:10 +10:00
Martin Schwenke
de84c1fd3c eventscripts: NFS RPC checks no longer support "knfsd"
No longer used, support removed from test infrastructure.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 0eb351ff4c7ee096de7c5e0a59561067091fa32e)
2013-05-07 12:55:09 +10:00
Martin Schwenke
434f9e8594 eventscripts: 60.nfs uses nfs_check_rpc_services() to check NFS RPC services
* New directory nfs-rpc-checks.d/ replaces hardcoded rules in 60.nfs

* Installation and packaging additions to handle nfs-rpc-checks.d/

* Unit test updates, including deleting 1 test that sanity checked
  test infrastructure

* Test infrastructure changes to use nfs-rpc-checks.d/

Note that this removes support for $CTDB_NFS_SKIP_KNFSD_ALIVE_CHECK in
60.nfs.  To get the equivalent behaviour, edit 20.nfsd.check and
remove/comment all lines.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 7e792d6768d9ca420ce3713cb122e63afd594b15)
2013-05-07 12:55:09 +10:00
Martin Schwenke
05b2edeec2 eventscripts: NFS RPC checks allows "nfsd" in addition to "knfsd"
Want nfs_check_rpc_services() to support filenames without the 'k'.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit d9775fcbd6e30eef8382bea68e2f9bad2309f2c1)
2013-05-06 20:40:58 +10:00
Martin Schwenke
c52183c055 eventscripts: New function nfs_check_rpc_services()
This is intended to replace nfs_check_rpc_service(), which builds
configuration into eventscripts.

nfs_check_rpc_services() uses a directory of configuration checks that
can be edited by an administrator.  The files have one limit check and
a set of actions per line.  The program name is extracted from the
file name.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 9bc8fbee6550ed2814fb35c70d57fab21ef1b8fd)
2013-05-06 20:40:58 +10:00
Martin Schwenke
167acd1cd5 eventscripts: nfs_check_rpc_action() should be _nfs_check_rpc_action()
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 5a717fd495ba5a2bfd481d69f38b68fa4576716f)
2013-05-06 20:40:58 +10:00
Martin Schwenke
bdab9d1ea6 eventscripts: Factor out common code from nfs_check_rpc_service()
This creates new function _nfs_check_rpc_common().

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit cc3bb42e48bbdabd19187c231846b98589b4f4f3)
2013-05-06 20:40:58 +10:00
Martin Schwenke
910e138cb3 eventscripts: Remove ganesha support from nfs_check_rpc_service()
This is unused so doesn't need to be maintained.  An attempt to use it
now will explicitly fail rather than implicitly fail via bitrot.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 887733dd7be53158bfe07b30ef31b611d0f8122f)
2013-05-06 20:40:58 +10:00
Martin Schwenke
944d063a3e Revert "Eventscript functions: add optional version to nfs_check_rpc_service()"
This reverts commit 92f74fd589467b46c758e116e97417edfe8773d7.

This change is unused and is just complicating the function.

Conflicts:
	config/functions

(This used to be ctdb commit 77302dbfd85754e02559eccb2dd6c090db0b6b9f)
2013-05-06 20:40:58 +10:00
Martin Schwenke
577a3cae5d eventscripts: Move rpc.statd existence check into nfs_check_rpc_service ()
The code in 60.nfs is going to be genericised, so make all the checks
look the same.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 15b0f78cbf8d6ba481b7eba9e4fe3f4270214c72)
2013-05-06 20:40:58 +10:00
Martin Schwenke
6c347a5294 eventscripts: Factor NFS RPC check action code into nfs_check_rpc_action()
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 4b4e7d8f0e8dcbab987e374d06ffaa21c06da0d3)
2013-05-06 20:40:58 +10:00
Martin Schwenke
2bc807f974 eventscripts: Remove unused function ctdb_check_counter_limit()
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit a8ef00608e48a551a334aded206146807aeb4c5a)
2013-05-06 16:24:59 +10:00
Martin Schwenke
460d0651b6 eventscripts: Use ctdb_check_counter() instead of ctdb_check_counter_limit()
ctdb_check_counter_limit() can soon be removed...

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit bb2cdff77e8ec79e7d319159b9c9848ecfaaa0f1)
2013-05-06 16:24:59 +10:00
Martin Schwenke
8373226251 eventscripts: Might as well try to stat the reclock file first
It is in the background but it still might cause the counter to be
reset before it is checked.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit ef2cf75e95ff382c65524a4d77eb00ab8411d2fc)
2013-05-06 16:24:58 +10:00
Martin Schwenke
31c3edcadf eventscripts: Make the early exit in 01.reclock earlier
That way we don't even check the counter...

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 136abd4604dc68f7c696704bac708bae53cf1940)
2013-05-06 16:24:58 +10:00
Martin Schwenke
29a3823e40 eventscripts: Minor cleanups for killtcp/tickle functions
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 25ef4f655f1efc833deb5e244f9fff461e92f439)
2013-05-06 16:24:50 +10:00
Martin Schwenke
189a5c003c eventscripts: Tweak the timeout check in kill_tcp_connections()
This has 2 advantages:

1. It uses get_tcp_connections_for_ip() to check for leftover
   connections, instead of custom code.

2. It checks for the timeout condition before sleeping.  The current
   code sleeps and then checks, so wastes a second.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 60a08eb96e1d97aab31e9bd4af01683c650541c2)
2013-05-06 16:22:15 +10:00
Martin Schwenke
8f84a2bec7 eventscripts: In killtcp/tickle functions, $_failed should be boolean
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 319c1b68d5aa78f82a68febcad233a7c78afc887)
2013-05-06 16:22:07 +10:00
Martin Schwenke
ed59deaee3 eventscripts: Remove unused $_killcount from tickle_tcp_connections()
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 8514ca56830b30e7f0eb5018632640daaf8ff65d)
2013-05-06 16:16:56 +10:00
Martin Schwenke
975ea7fb7a eventscripts: Refactor connection listing in killtcp and tickle functions
Uses new function get_tcp_connections_for_ip().  This avoids using a
temporary file and running netstat twice.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit a621622903c7ef17764b15293d6ea8df5a53c7e1)
2013-05-06 16:16:50 +10:00
Martin Schwenke
a320e1f7f1 eventscripts: Reimplement kill_tcp_connections_local_only()
... using kill_tcp_connections()

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 10e4db8f796d1e3259733180494db3b4bbad291a)
2013-05-06 15:45:11 +10:00
Martin Schwenke
5e828b48fe eventscripts: Change handling of one-way kills in kill_tcp_connections()
This change is a no-op.  However, In a subsequent commit we'll merge
kill_tcp_connections_local_only() with this function.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 23c0f5f48e3e5a0c1a3254c582299f7893cf0d33)
2013-05-06 15:45:10 +10:00
Martin Schwenke
d98d931af3 eventscripts: Remove unnecessary variables from killtcp/tickle functions
Setting these variables spawns lots of unnecessary processes, which
would surely slow down these functions on a busy system.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 3eae161472e6352f7f656851c73dc056f95113eb)
2013-05-06 15:45:10 +10:00
Martin Schwenke
6e2863a4f9 eventscripts: Clean up ctdb_check_command()
* Command is now multiple arguments, preserving quoting
* $service_name no longer printed, no longer an argument
* Debug output from failed command

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 9e25fb261447a196de05937052779b36e75e7215)
2013-05-06 15:45:10 +10:00
Martin Schwenke
30addb886a eventscripts; Cleanup up ctdb_check_directories()
The documentation comments are wrong... and remove option
$service_name argument.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit d9e6cb945c5edac9ca6405c9228bf647fab814f5)
2013-05-06 15:45:10 +10:00
Martin Schwenke
0ad8f46db3 eventscripts: Assert that $service_name is set in a few key places
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 3d0a7d83ddc824961d876fc9afba829c90aef3e7)
2013-05-06 15:45:10 +10:00
Martin Schwenke
5dd9e52e46 eventscripts: counters default to $script_name if $service_name not set
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit fff88940f71058e4eefd65f50a6701389c005c17)
2013-05-06 15:45:10 +10:00
Martin Schwenke
e9abc9c070 eventscripts: Simplify handling of $service name in "managed" functions
Complicated argument handling was introduced to deal with multiple
services per eventscript.  This was a failure and we split 50.samba.

This simplifies several functions to use global $service_name
unconditionally instead of having an optional argument.

$service_name is no automatically longer set in the functions file.
This means it needs to be explicitly set in 13.per_ip_routing because
this script uses ctdb_service_check_reconfigure().

Eventscript unit test infrastructure needs to set $service_name during
fake service setup, and policy routing tests need to be updated
accordingly.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 27aab8783898a50da8c4bc887b512d8f0c0d842c)
2013-05-06 15:45:10 +10:00
Martin Schwenke
c56acf7127 eventscripts: Simplify handling of $service name in start/stop functions
Complicated argument handling was introduced to deal with multiple
services per eventscript.  This was a failure and we split 50.samba.

This simplifies several functions to use global $service_name
unconditionally instead of having an optional argument.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit b5802c4735e1c719a5cf9ce69489d5947bd5e8c5)
2013-05-06 15:45:10 +10:00
Martin Schwenke
8065366b33 eventscripts: Simplify handling of $service name in service_management
Complicated argument handling was introduced to deal with multiple
services per eventscript.  This was a failure and we split 50.samba.

This simplifies several functions to use global $service_name
unconditionally instead of having an optional argument.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit e24baac0d2952e86d5ff31235901f06e2f2b2449)
2013-05-06 15:45:10 +10:00
Martin Schwenke
4c9438b2a3 eventscripts: Simplify handling of $service name in reconfigure functions
Complicated argument handling was introduced to deal with multiple
services per eventscript.  This was a failure and we split 50.samba.

This simplifies several functions to use global $service_name
unconditionally instead of having an optional argument.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit c2ea72ff565222f9edab408638bd45dbba6e8ff7)
2013-05-06 15:45:10 +10:00
Martin Schwenke
642848b916 eventscripts: Remove unused function ctdb_check_counter_equal()
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit fd536a26b310b5bf9628da62cca0b425f4a54030)
2013-05-06 15:45:10 +10:00
Martin Schwenke
bbd0ed0e29 scripts: Fix script_log() regression
5940a2494e9e43a83f2bca098bd04dfc1a8f2e93 makes script_log() always
pass a message to logger, so script_log() can no longer log stdin.

Put all the tag fu in the actual tag so the message argument is empty
if no message was passed.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 9dee4c84273633b9ad82e94dabbf0e6f86edbcef)
2013-05-06 15:43:16 +10:00
Martin Schwenke
27a5b78c8e initscript: Look for tdbtool/tdbdump using which, not in fixed locations
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit c74cc0442eb90d859eae270b59456d28605817c4)
2013-05-06 15:40:30 +10:00
Martin Schwenke
fa16cccf02 ctdbd: Remove the "stopped" event
It isn't used, superceded by "ipreallocated".

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit c2bb8596a8af6406ef50e53953884df9d6246a96)
2013-05-06 13:38:21 +10:00
Martin Schwenke
fb028a208c eventscripts: Remove use of "stopped" event
Use "ipreallocated" instead.  The "stopped" event pre-dates the
"ipreallocated" event.  The only way of stopping a node is via the
ctdb tool, which explicitly causes a takeover run to occur after the
node is stopped.  The takeover run will generate an "ipreallocated"
event.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 978d4a0d6d8c9877b23f72e3a7b78c1245d16908)
2013-05-06 13:38:21 +10:00
Martin Schwenke
823edbf6fe scripts: Ensure even external scripts get tagged in logs as "ctdbd"
Our practice is to search logs for "ctdbd:".  We want to make sure we
find everything.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 5940a2494e9e43a83f2bca098bd04dfc1a8f2e93)
2013-04-22 13:58:36 +10:00
Martin Schwenke
fb8be43d6d eventscripts: Ensure directories are created
Previous commits stopped the top level of the script from creating
certain directories but some functions assume that required
directories exist.

Create those directories instead.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 0076cfc4666e5a96eb2c8affb59585b090840e00)
2013-04-22 13:58:36 +10:00
Martin Schwenke
903f4c394c scripts: Clean up update_tickles() and handling of associated directory
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 700cf95a1f29b4b88460a00a55d57a9e397011e0)
2013-04-19 13:13:36 +10:00
Martin Schwenke
100a0eed90 scripts: Use $CTDB_SCRIPT_DEBUGLEVEL instead of something more complex
The current logic is horrible and creates an unnecessary file.  Let's
make the script debug level independent of ctddb's debug level.

* Have debug() use $CTDB_SCRIPT_DEBUGLEVEL directly

* Remove ctdb_set_current_debuglevel()

* Remove the "getdebug" command from ctdb stub in eventscript unit
  tests

* Update relevant eventscript unit tests to use
  $CTDB_SCRIPT_DEBUGLEVEL

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 85efa446c7f5c5af1c3a960001aa777775ae562f)
2013-04-19 13:13:36 +10:00
Martin Schwenke
f54dab03d5 scripts: Ensure service command is in $PATH in ctdb-crash-cleanup.sh
Move the use of the service command below inclusion of functions file,
which sets $PATH.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit d254d03f69cbdc3e473202b759af6e1392cbb59c)
2013-04-19 13:12:36 +10:00
Martin Schwenke
d24077922f initscript: Remove duplicate setting of $ctdbd
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit e7a4b7e35a1e4b826846e2494a3803abb57065ee)
2013-04-18 13:22:12 +10:00
Martin Schwenke
1f5bfde553 scripts: ctdb-crash-cleanup.sh uses initscript to see if ctdbd is running
"ctdb ping" can time out.  How many times should we try?

Instead, depend on the initscript to implement something sane.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit 90cb337e5ccf397b69a64298559a428ff508f196)
2013-04-18 13:22:12 +10:00
Martin Schwenke
38366b6b53 initscript: Use a PID file to implement the "status" option
Using "ctdb ping" and "ctdb status" is fraught with danger.  These
commands can timeout when ctdbd is running, leading callers to believe
that ctdbd is not running.  Timeouts could be increased but we would
still have to handle potential timeouts.

Everything else in the world implements the "status" option by
checking if the relevant process is running.  This change makes CTDB
do the same thing and uses standard distro functions.

This change is backward compatible in sense that a missing
/var/run/ctdb/ directory means that we don't do a PID file check but
just depend on the distro's checking method.  Therefore, if CTDB was
started with an older version of this script then "service ctdb
status" will still work.

This script does not support changing the value of CTDB_VALGRIND
between calls.  If you start with CTDB_VALGRIND=yes then you need to
check status with the same setting.  CTDB_VALGRIND is a debug
variable, so this is acceptable.

This also adds sourcing of /lib/lsb/init-functions to make the Debian
function status_of_proc() available.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit 687e2eace4f48400cf5029914f62b6ddabb85378)
2013-04-18 13:22:12 +10:00
Amitay Isaacs
d931e73fb8 statd-callout: Make sure statd callout script always runs as root
In RHEL 6+, rpc.statd runs as "rpcuser" instead of root as on RHEL 5. This
prevents CTDB tool commands talking to daemon since "rpcuser" cannot access
CTDB socket.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-Programmed-With: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit fe8c4880b371492a38554868d4ca10918c54e412)
2013-04-08 11:14:28 +10:00
Amitay Isaacs
6e650b6ee5 eventscripts: Remove calls to "smbstatus -np" for samba cleanup
This is an artifact from older versions of Samba. In the newer versions of
Samba, "smbstatus -np" command does not do anything useful, but causes a
traverse in CTDB which is expensive and causes CPU utilization to shoot up.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 053b89c6dbce47001505524606889334559d2ec4)
2013-02-11 11:25:49 +11:00
Martin Schwenke
8c9eedbce3 initscript: export CTDB_EXTERNAL_TRACE
This means it can be set like any other configuration option in the
configuration file, without needing to export it there.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit a0ef73e197dc9147f7718e0813fe803ff0b3d54d)
2013-02-05 16:05:13 +11:00
Martin Schwenke
bc5f0a2b65 ctdbd: Remove command-line option --debug-hung-script
Use an environment variable instead.  This just means that the
initscript exports CTDB_DEBUG_HUNG_SCRIPT and the code checks for the
environment variable.

The justification for this simplification is that more debug options
will be arriving soon and we want to handle them consistently without
needing to add a command-line option for each.  So, the convention
will be to use an environment variable for each debug option.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 0581f9a84e58764d194f4e04064c2c5b393c348b)
2013-02-05 16:05:13 +11:00
Mathieu Parent
69afd9abc5 doc: allows to -> allows one to
Signed-off-by: Mathieu Parent <math.parent@gmail.com>

(This used to be ctdb commit 95fc493a7d4145f976cb3fe928d9e92faec4dd71)
2013-01-22 18:03:35 +11:00
Srikrishan Malik
28cbe527d4 Changes for unobtrusive recovery and new method for health check.
Unobtrusive recovery: Ganesha will not be restarted on failovers.

Ganesha health: Use the counters in /var/lib/nfs/ganesha_local to track progress
instead of the null call which can timeout if the server is too busy.

Signed-off-by: Srikrishan Malik <srimalik@in.ibm.com>
Signed-off-by: Lance Russell <lancerus@us.ibm.com>

(This used to be ctdb commit 0e651e9da0f1f3c836b4474612ab13d0ccd272d9)
2013-01-11 17:16:46 +11:00
Martin Schwenke
aca9299669 eventscripts: Fail the setup event if CTDB does not become ready
Currently it silently continues without attempting to set tunables.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 735ec99b99c7bb579851ce8293011aaf1dcc552a)
2013-01-09 12:45:59 +11:00
Martin Schwenke
4f622fe9fb scripts: Make script_log() use supplied message, stop logger from hanging
When using syslog any provided message arguments are ignored and not
passed to logger.  This means that logger blocks waiting on stdin.
That's bad.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 50abf597cefe6f8ea2a2ff7694bf84641344a9b1)
2013-01-08 15:18:47 +11:00
Martin Schwenke
095fac9491 scripts: Rework ctdb-crash-cleanup.sh so that it uses existing functions
This improves maintainability.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit e2aaa64925cca359c71520e01a18fc9461b0da4d)
2013-01-08 15:18:47 +11:00
Martin Schwenke
d801b02681 scripts: Make drop_all_public_ips() more robust
Incorporate some of the logic from ctdb-crash-cleanup.sh that ensures
IPs are deleted even if they have the wrong netmask or are on the
wrong interface.

Factoring out some of the code will allow it to be used elsewhere.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 03356fd5ae7a3ac35fde0289cbea7c71ecf07367)
2013-01-08 15:18:47 +11:00
Martin Schwenke
4157efdcbb scripts: debug-hung-script.sh doesn't need functions/loadconfig
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 8507303b525d20c74e8ec4e7c4f5f275945cd3b6)
2013-01-08 15:18:47 +11:00
Martin Schwenke
f5226c9a75 scripts: statd-callout should calculate CTDB_BASE if it is not set
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 376015ba5ad6b7703ae9949a1d40a0c72dfaba0c)
2013-01-08 15:18:46 +11:00
Martin Schwenke
297b98d5b6 eventscripts: Each script should set CTDB_BASE if it is not set
This makes it easier to run the scripts externally.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 740ea8ea5084149c8b552a01ee1c98c558b12384)
2013-01-08 15:18:46 +11:00
Martin Schwenke
0eb757329e scripts: Move drop_all_public_ips() to the functions file
... so it can be improved and used elsewhere.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit b23c30253cc9eb274b895cac0f8c65245ba0a200)
2013-01-08 15:18:46 +11:00
Martin Schwenke
217ad07b72 Eventscripts: Change the default reconfigure action to do nothing
A default action of restarting the service doesn't obey the principle
of least surprise.  It cause the NFS service to be implicitly
reintroduced.

This allows no-op functions to be removed from some eventscripts and
service restart functions to be added to others.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit c75b5e5b4d000f5c7dab403df8238ceed390c1c0)
2013-01-07 10:35:39 +11:00
Martin Schwenke
3d408ca1e1 Eventscripts: Do not restart NFS on reconfigure
It looks like this restart was accidentally reintroduced in commit
fc0678d351187cfa4c71123f97c0f493aacd5d16 when $service_reconfigure
became unset so the default action of restarting the service would
occur.  From there cleanups have explicitly reintroduced it and
carried it through the code.

Also update the unit tests affected by this change.

The restart was originally removed in commit
bc481c3f1a44c50648488c4f8a7f15ec395d446f.

The default reconfigure action of restarting a service is clearly
suboptimal and will be addressed in a separate patch.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 2629de72e1f37b5e46772c2ef8d8d0012fc4ed37)
2013-01-07 10:35:39 +11:00
Martin Schwenke
df7152fe87 Initscript: when checking status, print output of "ctdb ping" if it fails
At the moment the caller has no idea why it thinks CTDB isn't running
and we can't debug failures...

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 776590bf84d221092298346a28d7fc0552a67c9d)
2013-01-07 10:35:38 +11:00
Michael Adam
b64e237f9b events/50.samba: fix testparm background update
creating the smb.conf cache with "-v" results in a cache file
that fails to load with "testparm -s ..." later on due to
"copy = " not being processable. (Copying the empty service name fails).

Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit 81788cfabe960497b050c5ee4e4e487ee061012a)
2013-01-05 01:15:19 +01:00
Martin Schwenke
8fad7670f1 Eventscripts: 10.interface should list configured interfaces
The current code lists available interfaces.  If IPs are configured in
some other way than the public addresses file (e.g. ctdb addip) and their
interfaces default to being marked down then, since down interfaces are
not available, these interfaces can never be marked up.

The configured interfaces should be listed instead.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit d8f010355b715e49709836e057a5d0f110919897)
2012-11-19 15:54:50 +11:00
Martin Schwenke
f082f4006f Eventscripts: 10.interface startup event should only process interfaces once
Provided that monitor_interfaces() sets the state of each interface,
there's no need to mark all interfaces as up before running
monitor_interfaces() in the startup event.  monitor_interfaces() will
set the true status of each interface anyway.  The duplication is
unnecessary and may cause extra action in the recovery daemon because
the state of some interfaces is changed an extra time.

Instead, add a comment at the top of the loop in monitor_interfaces()
to warn against early loop exits.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit f243a916ee71013f7402b9c396c2ead88eb3aab0)
2012-11-14 10:57:48 +11:00
Volker Lendecke
295dfa771a Avoid a bashism in 60.ganesha
This file is #!/bin/sh. On sn-devel at least, with this /bin/sh the
shell does not like == for string equality.

(This used to be ctdb commit e2213db479129ce9c2b2fb88ec8c53cbd33d54b3)
2012-10-24 18:31:16 +11:00
Martin Schwenke
9f6b30a517 scripts: Refactor logging code in initscript and functions file
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 5ee242c949a98bb7397e0f7368b20d44c06fe772)
2012-10-18 20:05:43 +11:00
Martin Schwenke
ad8eb45fe2 initscript: Check that rc.ctdb is executable before running it
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 59a47c0674bacfebc17a1b44f0244727bf2fa7a4)
2012-10-18 20:05:43 +11:00
Martin Schwenke
66d0aba85b Revert "Eventscripts - add facility to 10.interface to delete unmanaged IPs"
This reverts commit 88f88d86b0d08240f749fb721b8c401c2eeb1099.

This is dangerous and, on reflection, I can't see it being useful.
There are often permanent IPs on interfaces that CTDB shares with its
public IPs.

(This used to be ctdb commit 16aba4eb620844626a1c71c58b51658caf44dea6)
2012-10-18 20:05:42 +11:00
Martin Schwenke
34a6c07e99 Eventscripts: "recovered" event should not fail on NATGW failure
The recovery process has no protection against the "recovered" event
failing, so this can cause a recovery loop.

Instead of failing the "recovered" event, add a "monitor" event and
fail that instead.  In this case the failure semantics are well
defined.

A separate patch should ban nodes if the "recovered" event fails for
an unknown reason.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit eaa7c165f58abd7e259c37d76b7dd37c91e13d9f)
2012-10-18 20:05:42 +11:00
Martin Schwenke
8d7562f3f8 common: Debug ctdb_addr_to_str() using new function ctdb_external_trace()
We've seen this function report "Unknown family, 0" and then CTDB
disappeared without a trace.  If we can reproduce it then this might
help us to debug it.

The idea is that you do something like the following in /etc/sysconfig/ctdb:

  export CTDB_EXTERNAL_TRACE="/etc/ctdb/config/gcore_trace.sh"

When we hit this error than we call out to gcore to get a core file so
we can do forensics.  This might block CTDB for a few seconds.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 7895bc003f087ab2f3181df3c464386f59bfcc39)
2012-10-18 20:05:42 +11:00
Michael Adam
6372592982 config/functions: fix a comment
ctdb_check_counter_limits does not fail but succeed if count >= limit

Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit af540ef728303b4a0a188b17c695e9aefab34489)
2012-10-17 21:56:58 +02:00
Amitay Isaacs
cc763c455d doc: Add info about execute permissions on event scripts
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 25d886060b138bc5e78fe93d7bebe3990264f29d)
2012-10-17 11:39:39 +11:00
Amitay Isaacs
efe77d0e35 doc: Fix documentation for setup event
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 36d25e96a2f8ae1461c5a708a2922f0475a39900)
2012-10-17 11:39:39 +11:00
Amitay Isaacs
ce210f6978 scripts: Remove duplicate code from init script to set tunables
The tunable variables defined in CTDB configuration file are currently
set up from init script as well as part of "setup" event in 00.ctdb
eventscript.  Remove the duplication of this code and set tunable
variables only from setup event.  During the "setup" event, it's possible
that ctdb tool commands can timeout if CTDB daemon is not ready.  To guard
against such eventuality, wait till "ctdb ping" command succeeds before
executing any other ctdb tool commands.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 632c1b9c1cc2e242376358ce49fd2022b3f27aa2)
2012-10-17 11:32:41 +11:00
Martin Schwenke
74843dadad Eventscripts: Add support for "reconfigure" pseudo-event for policy routing
This rebuilds all policy routes and can be used if the configuration
changes.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit c185ffd2822fcee26d07398464c59b66c61f53fa)
2012-10-11 12:10:45 +11:00
Martin Schwenke
d33b12a1c5 Eventscripts: Add service-start and service-stop pseudo-events
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit be4ad110ede9981b181ac28f31ffd855a879d5df)
2012-10-10 14:54:53 +11:00
Martin Schwenke
2d719e5c84 eventscripts: Auto-start/stop services in background
If $CTDB_SERVICE_AUTOSTARTSTOP="yes" then service start/stop is done
in the background with logging.

Fix some unit tests for samba and winbind.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 3a3dae4cb5ec8b4b8381a4013adda25b87641f3a)
2012-10-03 08:48:23 +10:00
Martin Schwenke
f3ae31e741 Eventscripts: split 50.samba into 49.winbind and 50.samba
winbind and samba can be separately managed.  This makes the service
starting and stopping code way too complicated, and even adds a small
amount of complexity to the monitoring code.  The sensible option is
to split this eventscript in two.

There are two potentially backward incompatible changes here:

* Functionality has been removed that allowed 50.samba to manage
  winbind when CTDB_MANAGES_WINBIND was unset but the smb.conf
  "security" parameter was set to "ADS" or "DOMAIN".

  Maintaining this functionality would have required moving the
  testparm-related code to the functions file, deciding where the
  cache file should go, and then calling it from both 49.winbind and
  50.samba.  This feature wasn't of great value and asking
  administrators to set an extra variable in exchange for code
  simplicity seems like a reasonable deal.

* External code will need to be changed if it calls 50.samba directly
  with winbind-related expectations.  This is fairly obvious!

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 34535ae64420926b9a3bf7d453fed4e6f4c90115)
2012-10-03 08:46:32 +10:00
Martin Schwenke
e2d4250731 Initscript: Kill any existing ctdbd processes if the ping succeeds
Initialising a new ctdbd will destroy the Unix domain socket so
existing processes will be useless anyway.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 043ef77086797a703aec436a26a05c56a1bcbf2b)
2012-10-02 17:37:53 +10:00
Martin Schwenke
530415b671 Eventscripts: Indent error when a route delete fails in 11.per_ip_routing
This puts it under the umbrella of the previous warning that should
also have been printed.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 5c3be8f26dcde0b1b3d86928953e74d4a8b35958)
2012-09-11 12:52:22 +10:00
Martin Schwenke
0d35a8c439 eventscripts: 13.per_ip_routing should remove bogus routes on ipreallocated
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit d0d0a6f19960f233224970b8d5d19b0e37222616)
2012-09-11 12:52:22 +10:00
Martin Schwenke
e1348221d6 eventscripts: Print a warning on failure to delete a routing rule
del_routing_for_ip() currently fails silently, which could hide real
errors.

In add_routing_for_ip() we don't want to see any error when calling
del_routing_for_ip(), since we don't expect the rule to be there.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 30d69defa7e97ab5e3ba0492a27868dde2616494)
2012-09-11 12:52:22 +10:00
Martin Schwenke
5bbf4b6e30 Eventscripts: 13.per_ip_routing should always fail if config is missing
Currently, if the configuration file is specified by
$CTDB_PER_IP_ROUTING_CONF but is missing, takeip fails but (the
absent) monitor event "succeeds", so the state of a node will
flip-flop.

Instead of this, if the configuration file is missing then fail early
on for all events.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit c64c6c77c3f6aa2898e5a575547b587bea868c76)
2012-07-30 15:57:56 +10:00
Martin Schwenke
ff0830037e Revert "Eventscripts - make 13.per_ip_routing fail gracefully if config is missing"
When the configuration file is missing this causes the node to
flip-flop betwen unhealthy (when takeip fails) and healthy (no monitor
event here).

Will reimplement this properly.

This reverts commit 351ca413eec460330571ca8b01ad269728fe15df.

(This used to be ctdb commit 5277d749c9111716fd723647d5421907476422bf)
2012-07-30 15:57:56 +10:00
Martin Schwenke
35de9f2583 Eventscripts: Clean up 11.routing
The loops can all be done without cat or grep.

The pair of loops in updateip is combined into a single loop.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 96fdda124f5511fb76190e7c7a7f0b98e6b01a31)
2012-07-30 11:24:59 +10:00
Martin Schwenke
748c3d7eb6 Initscript: clean up drop_all_public_ips()
This makes the case implicit where $CTDB_PUBLIC_ADDRESSES is unset.
This is OK because that's not an interesting code path.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 5b2725d1ae052e848c2487cb10c5393a877d118c)
2012-07-26 22:05:43 +10:00
Martin Schwenke
4d4768ef26 statd-callout: Fix a bug in the calculations of $STATE
It is just meant to be even, so divided *and* multiplied by 2.  Use
$(( )) to make it more readable.

While touching this code, make the related calculation a bit more
readable too.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 25d45e69f4ffc2b26061ac13038d52a353e79e61)
2012-07-26 21:24:15 +10:00
Martin Schwenke
6717698cba Eventscripts: Default route on NAT gateway should have a metric of 10
At the moment routes from 11.routing can fail to be added because they
conflict with the default route added by 11.natgw.

NAT gateway is meant to be a last resort, so routes from 11.routing
should override it.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 624f4677e99ed1710a0ace76201150349b1a0335)
2012-07-26 21:14:58 +10:00
Martin Schwenke
31bdf91933 Eventscripts: Update/remove stale comments in 11.natgw
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 5d713d5e5be67f5914a661694c15d938bd67dea3)
2012-07-26 21:14:58 +10:00
Martin Schwenke
05359689f6 Eventscripts: Retrieve and build NAT gateway details better in 11.natgw
* "ctdb natgw" is run twice when it doesn't need to be.

* Tweak the parsing of "ctdb natgw" output so that it is done by the
  shell instead of a bunch of external processes.

* Make default NAT gateway be -1, even on error.  If the process
  failed entirely then it could previously be empty.

* Streamline the error handling using die() for when there is no NAT
  gateway.

* Downcase script-local variable names.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 630cfe6451ba23d959fa4907fbba42702337ed3b)
2012-07-26 21:14:58 +10:00
Martin Schwenke
e7325ebcd5 Eventscripts: Optimise building the host address in 11.natgw
It can be build without forking unnecessary processes.

Also downcase variable name because it is local to script.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 34f58a0773618c4508a55ad75fc4602dad5a5f4c)
2012-07-26 21:14:58 +10:00
Martin Schwenke
9a7a199132 Eventscripts: Clean up startup sanity check in 11.natgw
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit f6e421e8bf935cae790a6dc2b861eb9c7f8610b4)
2012-07-26 21:14:57 +10:00
Martin Schwenke
573fb0497a Eventscripts: remove redundant firewall rules from 11.natgw
aeb70c7e7822854eb87873a5c7783e27e6e72318 said it moved these but it
redundantly duplicated them instead.  That commit also fixed the
problem because it moved the rules after delete_all() not out of the
startup event as claimed.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 07149edaecb3caa672163e5a3b89715557d5205a)
2012-07-26 21:14:57 +10:00
Martin Schwenke
c0b7fbf2a4 Eventscripts: 11.natgw $CTDB_NATGW_PUBLIC_IP splitting optimisation
$CTDB_NATGW_PUBLIC_IP can be split into $_ip and $_maskbits without
forking lots of processes.

Also "local" isn't supported by POSIX.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit e20fdb974158061f4627d6f360c168d764690e6f)
2012-07-26 21:14:57 +10:00
Martin Schwenke
1ba9fa2e48 Eventscripts: Fix deprecated iptables ! usage
This currently causes warning in the logs.

This change is not SLES10-compatible but we already have some other
non-SLES10-compatible changes.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 7640352c6697f9d4e0d13afbc8523afc64e7d462)
2012-05-25 15:26:07 +10:00
Ronnie Sahlberg
383711ac82 Merge remote branch 'martins/ganesha'
(This used to be ctdb commit f23b5a160184db8c92f8c69307dc4a64adae839d)
2012-05-17 11:48:07 +10:00
Ronnie Sahlberg
dce5969d12 Debug: When scripts hang, we may need to collect additional data in order to debug why the script hung.
Break this debug and datacollection out into an external script to make it easier to modify what data we need to collect.
For now we only collect a pstree so we can see what part of the script we hung in.

S1037271

(This used to be ctdb commit 6e68797af67bee36f2bad045f94806e7e98f27e9)
2012-05-17 10:29:03 +10:00
Martin Schwenke
835e0b6d49 Eventscripts: Modernise 60.ganesha to match 60.nfs
Originally from Srikrishan Malik <srikrishan.malik@in.ibm.com> with
some style changes by me.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 637cab6304dae66b85668506028c76ea1ee88980)
2012-05-16 17:24:21 +10:00
Martin Schwenke
ffbe59bd44 Eventscripts: restart lockd in the background when going unhealthy
Sometimes the restart can hang when there are I/O problems.  Then the
eventscript times out and gets killed so the node never marked as
unhealthy.

Restarting in the background avoids this.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 13acd58c41fba1a33894fbd654fed69ea0eac322)
2012-05-16 17:19:55 +10:00
Martin Schwenke
92eb004162 Eventscript functions: add optional version to nfs_check_rpc_service()
This can be optional because the 1st item of each action-triple is a
test comparison that starts with '-'.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 92f74fd589467b46c758e116e97417edfe8773d7)
2012-05-16 17:05:05 +10:00
Martin Schwenke
fd048a1771 Eventscript functions: add optional version to nfs_check_rpc_service()
This can be optional because the 1st item of each action-triple is a
test comparison that starts with '-'.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 1957d53b78f101cd0cd37d9705a225deef5174a2)
2012-05-11 10:33:27 +10:00
Martin Schwenke
0c8f785628 Eventscripts: fix basename -> dirname typo
I fixed one of these previously but didn't notice this one...  :-(

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 0c674efd19368d41d9cc28909d2b16c1af54c86c)
2012-04-27 15:42:42 +10:00
Martin Schwenke
012015b32c Eventscripts - Fix typo in 13.per_ip_routing support for __auto_link_local__
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 9542e770a9780740b49122f1f52f08b32eca4b35)
2012-04-27 15:40:43 +10:00
Martin Schwenke
a3ee4a900f Initscript - add backup of corrupt non-persistent databases
Corrupt non-persistent databases never get analysed because ctdbd
zeroes them at startup.

Modify the initscript so that corrupt non-persistent databases are
moved aside to a backup.  If the number of backups for a particular
database exceeds $CTDB_MAX_CORRUPT_DB_BACKUPS (default 10) then the
oldest excess backups are garbage collected.

Abstracts from and cleans up the code for checking persistent
databases.

Logging of related messages is done to syslog or a log file as
specified.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 00cd75595685dae829758abf1a4cb644af7ed50e)
2012-03-28 15:02:07 +11:00
Martin Schwenke
2f5cb56017 Eventscripts - make 13.per_ip_routing fail gracefully if config is missing
Currently it spews out random messages about the file being missing.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 351ca413eec460330571ca8b01ad269728fe15df)
2012-03-22 15:30:27 +11:00
Martin Schwenke
ac973b34df Eventscripts - make 13.per_ip_routing try harder to find public_addresses
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit d4621277240721e6d130a930b0100506b64467ea)
2012-03-22 15:30:27 +11:00
Martin Schwenke
020c8190c5 Eventscripts - use set_proc() rather than accessing /proc directly
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit bdb4cdaf2aed79c8de6a8db8c01685b242808310)
2012-03-22 15:30:27 +11:00
Martin Schwenke
4f65737809 Eventscripts - 13.per_ip_routing should use dirname not basename for mkdir
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit d034845ecea66b47004bc73f2554914a397b1c9d)
2012-03-22 15:30:27 +11:00
Martin Schwenke
56d90e930d Eventscript support - Remove unused interface_modify.sh
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 994492f79275fe84155d842f6bc288c1858217dd)
2012-03-22 15:30:27 +11:00
Martin Schwenke
476cf45049 Eventscript functions - no longer require interface_modify.sh
Make add_ip_to_iface() and delete_ip_from_iface() do their own locking
so the external script is no longer required.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 93f90caf91246074d9359bf31a39b26212cccc42)
2012-03-22 15:30:27 +11:00
Martin Schwenke
0b2c3d7d24 Eventscript functions - remove now-unused route/IP re-add script logic
This is no longer used by 13.per_ip_routing or anything else.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 2a2ea6c61a05af2d0765e964abcc7ef04047431e)
2012-03-22 15:30:26 +11:00
Martin Schwenke
940efdb8e9 Eventscript functions - remove functions only used by 13.per_ip_routing
The relevant functions are now in that script.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 45c3476d12bf0f52966b72d286f101fce1382cd2)
2012-03-22 15:30:26 +11:00
Martin Schwenke
95e10b20cb Eventscripts - redesign and rewrite 13.per_ip_routing
The current version is quite difficult to read.  This one is hopefully
clearer.

Major changes:

* The configuration file has a more forgiving syntax.  Items can be
  separated by arbitrary whitespace.

* Mappings between IP addresses and table IDs are no longer stored in
  files in a state directory.  Instead they are stored in
  /etc/iproute2/rt_tables as mappings between table IDs and labels, as
  allowed by the ip command.  The current structure of the labels is
  ctdb.<source-ip>.  This means that once the labels are setup the
  routing tables can be referenced by just knowing the source IP.  As
  with the old state directory, mappings in this file owned by CTDB
  are deleted when CTDB shuts down.

* There are no release or re-add scripts.

  - Release scripts are not necessary as an optimisation because of
    the previous improvement (i.e. use of rt_tables).  No lookup is
    necessary to delete rules or flush tables.

  - Re-add scripts are no longer used.  Routes can still go missing
    when removal of a primary IP from an interfaces (or similar)
    causes removal of all other addresses (i.e. secondaries) and also
    all associated routes.  However, any missing routes are now
    re-added in the "ipreallocated" event.  This happens shortly after
    takeip/releaseip/updateip and means that the routes will only be
    re-added once.  The window for missing routes is slightly bigger
    but is not expected to be significant.

* The magic "__auto_link_local__" configuration value no longer causes
  a dynamic configuration file to be maintained in a state directory.
  The link local configuration is now generated when needed from the
  public_addresses file.  This greatly simplifies the code.  This
  approach is slightly less efficient but should not be significant.

The above changes mean that, apart from maintaining mappings in the
rt_tables file, there are no state files kept anymore.

Some utility functions only used by this script have been rewritten
and moved into this script.  They will be removed from the functions
file by a future commit.

The route re-add code will also be removed from interface_modify.sh by
a future commit.  It is currently harmless.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 0f7cbbb55f26cf3c953e98fe5e7eaa12f59fbf78)
2012-03-22 15:30:26 +11:00
Martin Schwenke
0d67779c67 Eventscript functions - add new function die()
Args:

1. Error message to be printed.

2. Option exit code (default 1)

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 97b0c138cb97e30db27c40b4ee1481109ae90c78)
2012-03-22 15:30:26 +11:00
Ronnie Sahlberg
81fb334cff when shutting down ctdb, allow it 30 seconds instead of 10 before will -9 the daemon
(This used to be ctdb commit d8b400d76665f37ffd9de302eedcff9f23807225)
2012-02-21 19:02:36 +11:00
Ronnie Sahlberg
e3b85bba3f Add a hoook to the ctdb initscript that we can call out to for applications that want to
track and produce audit logs when someone runs "service ctdb <something>"

S1033891

(This used to be ctdb commit 4f4fbd4080a3a7226d3b82637f803c4b71217d39)
2012-02-06 12:07:08 +11:00
Mathieu Parent
956f06f3ae Fix ctdb-crash-cleanup sysconfig handling
(This used to be ctdb commit 667b174d605646b53f4855e9aaf5f8ce4fdde532)
2011-12-06 11:55:46 +11:00
Martin Schwenke
162ac70f9e Eventscripts - add facility to 10.interface to delete unmanaged IPs
For a number of reasons (delip failure, admin stupidity, ...) an
interface that hosts public addresses can also contain spurious,
unmanaged addresses.

Add functionality to 10.interfaces, controlled by new configuration
variable CTDB_DELETE_UNEXPECTED_IPS, to delete these addresses when
encountered as part of a monitor event.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 88f88d86b0d08240f749fb721b8c401c2eeb1099)
2011-11-17 16:47:00 +11:00
Martin Schwenke
ba5e5f51cf Eventscripts - remove $0 from error messages in 40.fs_use
The script name is now prepended to output by ctdbd.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit bfa0fe70db195413a6d7a98f46f7a1270aba678c)
2011-11-16 16:26:49 +11:00
Martin Schwenke
9187db869e Eventscripts: Make 40.fs_use use less processes and arguably clearer.
* $fs can be parsed using shell prefix and suffix removal.

* df output can be parsed with a single call to sed.

  Failure is indicated by empty output from sed, so we check for that
  as the error condition, changing the associated message
  appropriately.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit c5ef0d1440f1d952784cc67946c414d149722d01)
2011-11-16 16:26:45 +11:00
Mathieu Parent
4617a0e9cf config can be in /etc/default/ instead of /etc/sysconfig/
... on Debian system and derivated.

(ctdb_diagnostics still hardcodes /etc/sysconfig/)

(This used to be ctdb commit 1341329f6125d491b82c873f793af819e677f714)
2011-11-08 16:31:15 +11:00
Mathieu Parent
91431262be config/functions: CTDB_VARDIR is /var/lib/ctdb on Debian-like systems
(This used to be ctdb commit 56160eccb62178f645b017b1257677a1e854b2bc)
2011-11-08 16:31:03 +11:00
Mathieu Parent
0250b72a65 Fix bashism in 40.fs_use
Also, add -P to df, to avoid multiline on Linux when device name is long (this is the case with LVM)

(This used to be ctdb commit f4d5a5810f1a840a41c3541a3b822fce44d41e9a)
2011-10-12 20:08:40 +11:00
Mathieu Parent
a1919fd316 apache's service name is not always httpd
Solution 2 of <https://bugzilla.samba.org/show_bug.cgi?id=8317>

(This used to be ctdb commit 8b9ac5cd8d867ff4866ac464c570d9293d03a91e)
2011-10-12 20:07:45 +11:00
Mathieu Parent
7f1ff4dbd8 Less verbosity when there is no public addresses file
This partialy reverts 81eff51, but still avoid spam.

(This used to be ctdb commit e646142f4d28b5401235cd5edee325f7a29f8193)
2011-10-12 20:07:03 +11:00
Martin Schwenke
205c7c7663 Eventscripts - enhance ctdb_replay_monitor_status()
Print useful output and return a suitable exit code.

The DISABLED and TIMEDOUT statuses use fake negative return codes, and
these can't be faked from the shell.  So we map DISABLED to OK and
TIMEDOUT to ERROR - this should avoid nearly all surprises.  When we
do this we add a note to the beginning of the output.  The alternative
is to "fix" ctdbd to use only codes that can actually be returned by
shell scripts.  However, the reason for using negative codes is
probably to distinguish them from real ones...

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit dda44d026e0c1b02feb02185b8c200a542be341a)
2011-08-31 15:34:43 +10:00
Martin Schwenke
aa64622137 Eventscripts - use ctdb scriptstatus -Y when replaying status
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 5be904fb1fbd546618d25509b41ab836db62a70a)
2011-08-30 16:34:43 +10:00
Martin Schwenke
b97625acb6 Eventscripts: add a synchronous synthetic reconfigure event.
In the current code services can only be reconfigured asynchronously.
This means that configuration file changes can be made, an asychronous
reconfigure event can be triggered, and it always succeeds.  Some time
later when a service is actually reconfigured then a failure may be
seen

This adds a synthetic reconfigure event that reconfigures a service
synchronously so that any failure is reported on exit.

ctdb_service_check_reconfigure() is essentially reimplemented.

If a reconfigure event is in flight and an ipreallocated or monitor
event occurs then any scheduled asynchronous reconfigure is deferred
until the next monitor cycle.  This is to avoid reconfigures trampling
on each other.  In this case a monitor event will also replay the
previous status to try to avoid exposing any temporary instability.

If a reconfigure event collides with another reconfigure event it will
exit with status 2, indicating that the reconfigure should be retried.

The reconfigure event is implemented using a subprocess to control the
exit from the synthetic event.

As before, if a monitor event causes a scheduled synchronous
reconfigure to occure then it will replay the previous status for the
service, given that a reconfigure can cause temporary instability.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 220578bfd3507152b29ba4c28942f9d5e8733886)
2011-08-30 14:29:48 +10:00
Martin Schwenke
94c3429567 Eventscripts - call ctdb_check_args() in 00.ctdb
This is the first eventscript.  Sanity check as early as possible and
everyone benefits.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 0564717fcc1e21688ae5dacbd437fd493bcb8853)
2011-08-30 09:33:47 +10:00
Martin Schwenke
bc4e62be85 Eventscripts - call ctdb_check_args() instead of doing hand checking
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit cc5bc1948dcbe8b8b25185260927b94a4b529174)
2011-08-30 09:33:47 +10:00
Martin Schwenke
7980a4cb44 Eventscripts - new function ctdb_check_args()
Pass this "$@" to do common eventscript argument checking.

For regular use putting this in 00.ctdb would be enough.  However, for
developer testing it can be useful to call this in other eventscripts.
For example, 10.interfaces and 13.per_ip_routing currently check these
by hand.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 36de7e7fd6dfeed61ef9977b8d5b568f90a9707b)
2011-08-30 09:33:47 +10:00
Martin Schwenke
63729fc35d Eventscripts - ctdb_check_tcp_ports() bug fix.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit e8d9c0b251c84d6fdf6ea7d972e5f7d1d0222f9b)
2011-08-30 09:33:47 +10:00
Martin Schwenke
194de8faf8 Eventscripts - fix debugging buglet in ctdb_check_tcp_ports_ctdb()
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 61000e38d6016e58f67e292393756d0bd5262ae5)
2011-08-30 09:33:47 +10:00
Martin Schwenke
9257b57f2c Eventscripts: New configuration variable CTDB_SERVICE_AUTOSTARTSTOP.
Some of the current auto-start/stop logic is broken, particularly for
Samba.  Fixing it is non-trivial.

If $CTDB_SERVICE_AUTOSTARTSTOP is "yes" then auto-start/stop services
when told to newly manage or no longer manage them.  This defaults to
"yes".

However, if using a canned configuration file that doesn't set
$CTDB_SERVICE_AUTOSTARTSTOP then this stops the auto-start-stop logic
from working.  Therefore, this works around CQ S1026685 - on the
system in question another daemon controls service auto-start/stop and
CTDB just gets in the way.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit ef71b8290ae49117d7bcc7166598b77cb64cc8a0)
2011-08-30 09:33:47 +10:00
Martin Schwenke
54402cdff4 Eventscripts - in 60.nfs uniquify the share check directory list
There are sites that have multiple entries for the same export.  This
optimises the share check in this case.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 1ccdae79b64b236fc27f4653606429d73c9c3595)
2011-08-30 09:33:47 +10:00
Ronnie Sahlberg
02ebd35398 Merge remote branch 'martins/eventscripts'
(This used to be ctdb commit bb008c01989ebb173a3f095ebd2f90ab54f9da91)
2011-08-17 14:10:04 +10:00
Martin Schwenke
6e7dbf0543 Eventscripts - new default TCP port checker using "ctdb checktcpport"
New function ctdb_check_tcp_ports_ctdb().  This should be fast... and
is now the default checker.  If it fails in an unexpected way we fall
back to the nmap and netstat checkers.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit a1e16a707ce204817531a61455000361f972080a)
2011-08-17 14:02:45 +10:00
Martin Schwenke
1374327f6e Eventscripts - generalise TCP port checking plus new nmap-based checker
Split the netstat-specific parts of ctdb_check_tcp_ports() into new
function ctdb_check_tcp_ports_netstat().

Implement new ctdb_check_tcp_ports_nmap() function that uses
"nmap -PS" to check if the desired ports are listening.

ctdb_check_ctdb_ports() now uses new configuration variable
CTDB_TCP_PORT_CHECKERS to decide which port checkers to try.  Default
value is currently "nmap netstat".  If nmap is not found then this
will fall back to netstat - if logging is at debug level this will
also fill the logs with message saying the nmap checker failed.  This
indicates that either nmap should be installed or the default value of
CTDB_TCP_PORT_CHECKERS should be changed (in a configuration file) to
avoid trying to use nmap.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit d9651175b40b9454e7d4e98291955fcf1445085e)
2011-08-17 12:12:20 +10:00
Martin Schwenke
62f654d3d2 Eventscripts - ctdb_check_tcp_ports() only prints netstat output if debugging
Use the new debug function to conditionally print the netstat output.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 44c14aeeb11080980fe07c7396d06843a4870747)
2011-08-17 10:39:54 +10:00
Martin Schwenke
86792724a2 Eventscripts - weaken TCP port check message if CTDB has just been started.
Sometimes smbd and other services can take a while to start,
especially when there is a lot of activity after ctdbd has just
started.  The TCP port check can then pollute the logs with lots of
"ERROR" messages and possibly extra debug.

This creates a flag file when a service is started (but not restarted)
and this flag is removed the first time that TCP port checks succeed
for that service.  When a port check fails and the flag file still
exists, a less extreme "INFO" message is printed rather than the usual
"ERROR" message.  This means that until the node actually becomes
healthy we see more friendly messages.

The subtext is that we're hearing false positive reports "recreates"
of CQ S1024874 (samba stopped responding on port 445) quite often when
ctdbd is started.  This reduces the chances of people reporting such
false recreates...

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 571865eb6ef847857129d0b1e2ba5fa7254bfe8c)
2011-08-17 10:39:53 +10:00
Martin Schwenke
5c9fbb55ce Eventscript functions: optimise ctdb_check_tcp_ports() and add debug.
ctdb_check_tcp_ports() runs "netstat -a -t -n" in a loop for each
port.  There are 2 problems with this:

* Netstat is run on each loop iteration when it need only be run once.

* The -a option is used to list all connections but the function only
  cares about the listening ports.  There may be many thousands of
  non-listening ports to grep through.

This changes ctdb_check_tcp_ports() to run netstat with the -l option
instead of the -a option.  It also only runs netstat once before the
main loop.

When a port is found to not be listening the output of the netstat
command is now dumped to help with debugging.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 830355a8b18c53cfcc3ad1e3009bbb1a7a681fa0)
2011-08-17 10:39:53 +10:00
Martin Schwenke
f0f9271301 Eventscripts: add a debug() function and call ctdb_set_current_debuglevel()
The debug function passes its arguments to echo if
$CTDB_CURRENT_DEBUGLEVEL is >= 4 (i.e. DEBUG).  If no args are given
then use stdin - this allows the function to be used with here
documents.

To ensure $CTDB_CURRENT_DEBUGLEVEL is set,
ctdb_set_current_debuglevel() is called near the end of the functions
file.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 6143483d9f87322578c00f12081e381f425226ca)
2011-08-17 10:39:35 +10:00
Ronnie Sahlberg
ce4555b7a6 dont use a too big persistence timeout value
(This used to be ctdb commit 82628e32c431d66b806399ffb9657c3a031f6428)
2011-08-17 10:00:06 +10:00
Martin Schwenke
3e1a0528b8 Eventscripts - conditionally inherit ctdbd debug level in each monitor event
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit a7eebc06f81a7b0a3fba93759bcbdeabc8c2e86e)
2011-08-17 09:14:23 +10:00
Martin Schwenke
171bef3d68 Eventscripts - new function ctdb_set_current_debuglevel()
This function ensures that CTDB_CURRENT_DEBUGLEVEL is set.  It works
like this:

1. If it is already set then do nothing, since it might have been set
   some other way.

   The recommended "other way" would be to add a file in rc.local.d/.

2. If it is not set then set it by sourcing
   /var/ctdb/eventscript_debuglevel.

3. If this file does not exist then create it using output from "ctdb
   getdebug".

If the optional 1st argument is set to "create" then don't source an
existing file but create a new one instead - this is useful for
creating the file just once in each event run in, say, 00.ctdb.

If there's a problem getting the debug level from ctdb then it is
silently set to 0 - no use spamming logs if our debug code is
broken...

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 93910921c8a25f2b029733cd938069ff7c7bdab7)
2011-08-17 09:00:46 +10:00
Martin Schwenke
430ca2f606 Eventscripts - ensure the statd update-trigger file always exists.
See the comment in the code for details.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 8ee9856996a8ec738e9d3ea7f1561605da526b8c)
2011-08-16 13:28:40 +10:00
Martin Schwenke
1452b63d27 Eventscripts: remove "return 0" from 50.samba service_stop().
This potentially masks errors and was basically included by accident.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit e7e4a1b4f31118027fd13a6223192f9957cf2e74)
2011-08-16 13:18:40 +10:00
Ronnie Sahlberg
81292ac0e6 Change the errors for 10.interface to clearly state ERROR: for error messages
Update the tests system to catch the new error strings generated by this change

(This used to be ctdb commit a2c30d88348da47d1a733a16e4c7d83c3becb6df)
2011-08-15 15:53:04 +10:00
Ronnie Sahlberg
1fb577f4b2 Merge remote branch 'martins/eventscript.10.interface'
(This used to be ctdb commit 0d17daab38d4086f922a8006d4c545133adca191)
2011-08-15 15:27:50 +10:00
Ronnie Sahlberg
bc00292cfe Merge remote branch 'martins/60_nfs_regression'
(This used to be ctdb commit 845fb0ba24cf9118470c58fae7103ab8322ce079)
2011-08-15 15:22:20 +10:00
Martin Schwenke
c9d168bbe4 Eventscripts: 10.interfaces - make startup event actually mark interfaces up!
The startup event intends to mark interfaces up.  However, it doesn't
actually do that because $INTERFACES is empty.

This uses the function get_all_interfaces() to list the
interfaces... and then mark them up.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit fc62bf0975c6059ee467285565d0dc3b4daaf238)
2011-08-12 16:34:34 +10:00
Martin Schwenke
5ab955a73d Eventscripts: 10.interfaces - startup comment says assume all interfaces good.
Interfaces are currently marked down.  Mark them up instead, as per
the comment... and discussion with Ronnie.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 35942841229cc72ce363a7236aec708f1a33136b)
2011-08-12 16:34:34 +10:00
Martin Schwenke
e7963d8a65 Eventscripts: 10.interfaces - new function get_all_interfaces().
Move existing interface listing code to new function in preparation
for using it in startup event.

While we're here change the "sort | uniq" into "sort -u" and save some
complexity.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit cd1442531ad079b11c60f46ee9d34f5104bef219)
2011-08-12 16:34:34 +10:00
Martin Schwenke
9bdcdb76be Eventscripts: 10.interface clean-ups - minor tweaks and new comments.
* sed can read files, it doesn't need a file piped to it
* use $() subshells instead of `` - they seem to quote better in dash
* tweak the uniquifying code so that it is easier to read
* add comments
* remove some extraneous semicolons at ends of lines

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 5f49537889a92c3cb68d9203912188bedf00ecd4)
2011-08-12 16:34:13 +10:00
Martin Schwenke
32fe247e37 Eventscripts: In 60.nfs don't restart NFS when restarting rpc.lockd.
This effectively reverts 953dbfbddad656a64e30a6aca115cb1479d11573 and
is a policy decision.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 380c9263eb37db5a250264316e250c2160908263)
2011-08-12 16:28:09 +10:00
Martin Schwenke
7c33fb1711 Eventscripts: 10.interface clean-ups - variable name fix-ups.
Change most of the uppercase variable names to lowercase for
consistency with other variables, readability and so they can be
easily distinguished from environment/configuration variables.  Change
the name of 2 of the variabless to add some clarity.  Changes are as
follows:

  INTERFACES   -> all_interfaces
  IFACES       -> ctdb_interfaces
  IFACE        -> iface
  I            -> i
  REALIFACE    -> realiface

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 7b201c1087b1433cfbc95de76cb4205e484ccd6f)
2011-08-12 15:57:34 +10:00
Martin Schwenke
6fa27bdf18 Eventscripts: 10.interfaces clean-ups - push logic into monitor_interfaces().
The logic in the monitor event itself is very complex.  Nearly all of
it can go away by adding a single check of
$CTDB_PARTIALLY_ONLINE_INTERFACES to the return logic of
monitor_interfaces() and reversing the sense of the corresponding
check.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit fa93177442c65c2a4eb2d5d5dba0a0da1c486969)
2011-08-12 15:00:03 +10:00