mirror of https://github.com/samba-team/samba.git synced 2025-01-12 09:18:10 +03:00

785 Commits

Author SHA1 Message Date
Martin Schwenke
4fbf3e5bdf initscript: New configuration variable CTDB_DBDIR_STATE
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 30d9b634b16c3cc740e5e453ea5c21012b1fde88)
2013-10-22 14:34:05 +11:00
Martin Schwenke
37aea37269 scripts: Make detect_init_style() more readable
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 516cdea0e73cf3f63b3303e22809834c8cbc64e4)
2013-10-22 14:34:05 +11:00
Martin Schwenke
0b69785eb2 eventscripts: Rework the iSCSI eventscript
* It should run on "ipreallocated" instead of "recovered"
* Variable name NODE -> ip since that's what it is
* Simplify some logic

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 45e2bc66abf9fcfeadcc279a656ed7fd1838920a)
2013-10-22 14:34:05 +11:00
Martin Schwenke
04c31bf50d eventscripts: Don't update static routes on "recovered" event
Routes only need to be updated when IPs have moved.  IP takeover runs
will generate "ipreallocated", which is enough.  "recovered" always
follows "ipreallocated" anyway, so avoid the redundancy.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 1152215fc69217e4292762e28d193b7ea0e06ee3)
2013-10-22 14:34:05 +11:00
Martin Schwenke
3132550a88 eventscripts: NAT gateway script doesn't need to handle "recovered" event
Any time a node changes flags in any significant way there will be a
takeover run, which will generate an "ipreallocated" event.  The
"recovered" event always happens straight after a takeover run so we
update the NAT gateway twice.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 542c70d6281d636ecd51502fbbf219f418bfac66)
2013-10-22 14:34:05 +11:00
Martin Schwenke
5369f711dc eventscripts: Delete placeholder "recovered" and "shutdown" events
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 00736a21fc268c10b6a718731e56b3dbb7e60554)
2013-10-22 14:34:04 +11:00
Martin Schwenke
2e819aa00f eventscripts: Clean up comment at the top of 00.ctdb
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 2ea9d3acfe7e8665685f54294f5edc9b8ffc2f3f)
2013-10-22 14:34:04 +11:00
Martin Schwenke
cf04ff178c eventscripts: Remove reconfigure check from samba and winbind eventscripts
There is no reconfigure code for these scripts so no need to check for
reconfiguration.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 41df1637c1d8a7b2f5a9974408db71b1f74cb2f2)
2013-10-22 14:34:04 +11:00
Martin Schwenke
a45aae410c eventscripts: Remove reconfigure code from httpd eventscript
Nothing sets (or has ever set) the "needs reconfigure" flag, so this
code is unnecessary.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 5b77fd95bda5f1960aca952e1b759231890b56f3)
2013-10-22 14:34:04 +11:00
Martin Schwenke
49d0153b10 eventscripts: Fold ctdb_check_tcp_ports_ctdb() into ctdb_check_tcp_ports()
A generic framework is no longer needed now that the "ctdb" checker is
the only one left.  Simplify the code.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 044d302b41a2040642355401e3236fcecc3a620a)
2013-10-22 14:34:04 +11:00
Martin Schwenke
0e9c939c0c eventscripts: Remove TCP port checks other than the built-in CTDB one
"ctdb checktcpport" is no longer experimental so the other checkers
are no longer required.

Remove tests related to the removed checkers.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 50e330d0679614bee2e7bab028436e929f74ca50)
2013-10-22 14:34:04 +11:00
Martin Schwenke
d02a645691 scripts: Remove setting of PATH from functions file
The current setting is inconsistent with settings on most systems,
putting /bin before /sbin.  Use of /usr/local/bin, which may be
required on some systems, is also overridden.  This can make it
difficult to do interactive debugging of script problems.

Rely on the system PATH instead.

If system-specific changes need to be made then this can be done in a
configuration file.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit cfbff39e22e42f3997f637290748290833525714)
2013-10-22 14:34:04 +11:00
Martin Schwenke
1ede20925f eventscripts: Clean up 20.multipathd
Reduce the complexity, including the depth of background processes.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 49f077c475b078889ff0492fe7d567a64d6cb87c)
2013-10-22 14:34:04 +11:00
Martin Schwenke
1e4c965f52 eventscripts: NAT gateway script should export CTDB_NATGW_NODES
Otherwise calls to "ctdb natgwlist" will not behave as expected if a
non-standard file is used, since that command will use the default
file location.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit e574b30257126679704b088c4334a8e7a53a9c3f)
2013-10-22 14:34:04 +11:00
Martin Schwenke
cd4041760b scripts: Simplify script_log() to just look at CTDB_SYSLOG variable
The old logic was actually wrong.  If CTDB_LOGFILE is unset then a
default is used, not syslog.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 79e2029f9bc078126e865aa715100a3870c7604b)
2013-10-22 14:34:04 +11:00
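Note on cd4041760b above: the simplified decision can be sketched as a single test of CTDB_SYSLOG. This is a minimal sketch, not the real script_log() from the functions file. The name script_log_sketch, the "yes" convention and the default log file path are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical sketch, not the real script_log() from CTDB's functions file.
# Assumption: CTDB_SYSLOG=yes selects syslog; anything else appends to
# CTDB_LOGFILE, and the default path used here is only an assumption.
script_log_sketch ()
{
    _tag="$1" ; shift

    case "${CTDB_SYSLOG:-no}" in
    [Yy][Ee][Ss])
        logger -t "$_tag" "$*"
        ;;
    *)
        echo "${_tag}: $*" >>"${CTDB_LOGFILE:-/var/log/log.ctdb}"
        ;;
    esac
}

# Usage: log to a scratch file so the example does not touch /var/log
CTDB_SYSLOG=no
CTDB_LOGFILE="${TMPDIR:-/tmp}/script-log-demo.$$"
script_log_sketch "60.nfs" "restarting service"
cat "$CTDB_LOGFILE"
rm -f "$CTDB_LOGFILE"
```

The point of the sketch is that an unset CTDB_LOGFILE falls through to a default file rather than to syslog, which is the behaviour the commit message describes.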
Martin Schwenke
4526fdbbca scripts: Remove support for CTDB_OPTIONS configuration variable
Allowing people to put random options in CTDB_OPTIONS complicates some
logic (particularly around use of syslog).  If we're going to have
variables for options then let's make sure we have a variable for each
option and make people use them.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit e55f3a1577eff0182802b0341d865d961aeae1c7)
2013-10-22 14:34:04 +11:00
Martin Schwenke
1043b53d12 scripts: Remove unused configuration variable CTDB_MANAGES_SCP
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit bda0da41aaf629a252cc361b73ebc5328f26ed04)
2013-10-22 14:34:03 +11:00
Martin Schwenke
04f67b1066 eventscripts: Deprecate NFS_SERVER_MODE, use CTDB_NFS_SERVER_MODE instead
All CTDB configuration variables should start with CTDB_.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit f12658aff125996ae45eea23241d8c3d0567b893)
2013-10-22 14:34:03 +11:00
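Note on 04f67b1066 above: a deprecation like this is commonly handled with a small compatibility shim that prefers the new name and falls back to the old one. The sketch below is one way to do that, not the code from the commit. The function name, the warning text and the "kernel" default are assumptions.

```shell
#!/bin/sh
# Hypothetical compatibility shim, not taken from the CTDB source.
# Assumption: "kernel" is used as the default mode when neither is set.
resolve_nfs_server_mode ()
{
    if [ -z "${CTDB_NFS_SERVER_MODE:-}" ] && [ -n "${NFS_SERVER_MODE:-}" ] ; then
        echo "WARNING: NFS_SERVER_MODE is deprecated," \
             "use CTDB_NFS_SERVER_MODE instead" >&2
        CTDB_NFS_SERVER_MODE="$NFS_SERVER_MODE"
    fi

    echo "${CTDB_NFS_SERVER_MODE:-kernel}"
}

# Usage: only the old variable is set, so it is honoured with a warning
unset CTDB_NFS_SERVER_MODE
NFS_SERVER_MODE="ganesha"
echo "mode: $(resolve_nfs_server_mode)"
```

When both variables are set, the new name wins silently, which keeps updated configurations free of warnings.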
Martin Schwenke
ace6c1ee62 eventscripts: Fix comment - CTDB_TCP_PORT_CHECKS -> CTDB_TCP_PORT_CHECKERS
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 0a79ba2f1277a776347e2c3f04ce8419e0be62de)
2013-10-22 13:07:13 +11:00
Martin Schwenke
5818771192 scripts: Add support for optional ctdbd.conf configuration file
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 8f660d0dd52013e5876806be908e8e603aa6e968)
2013-09-25 14:35:46 +10:00
Amitay Isaacs
4c4bfcbd6f eventscripts: Load CTDB configuration settings in 70.iscsi
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit ff41ce5ef202f8f6342e285d195bb5df61d848ce)
2013-09-23 18:38:28 +10:00
Martin Schwenke
b88bf1275c eventscripts: Clean up monitoring of system memory in 00.ctdb
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 16fcff0d1993b7a0479341862ea44d10bd5c6d6d)
2013-09-11 15:34:30 +10:00
Martin Schwenke
cc74417341 eventscripts: Avoid using a temporary file in 62.cnfs
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 81833052d7ee8f76b1e98376a0273448640cfa8e)
2013-08-22 17:00:20 +10:00
Martin Schwenke
bb974f150b scripts: Remove gdb_backtrace
This uses potentially insecure temporary files and is not referenced
anywhere else.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 4b914d7e217202f3d11a8e95f9f74bc17869475b)
2013-08-22 17:00:20 +10:00
Martin Schwenke
fec69034ee eventscripts: Become unhealthy faster on nfsd failure
Anecdotal evidence suggests that most nfsd RPC check failures are due
to cluster filesystem or storage problems.  Apparently these are rarely
helped by attempting to restart the NFS service because the restart
tends to hang.

Fail after 2 nfsd RPC check failures, instead of waiting for 6
failures.  Restart on every 10th failure to try to bring the node back
to good health.

Update unit tests to match.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit e9ef93f7b6dad59eabaa32124df81f3e74c651ef)
2013-08-14 16:10:30 +10:00
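Note on fec69034ee above: the thresholds can be read as a function of the consecutive failure count. The sketch below illustrates that reading only. The helper name nfsd_failure_action is hypothetical, and the real logic lives in the eventscripts and their counter helpers.

```shell
#!/bin/sh
# Hypothetical helper: map a consecutive nfsd RPC failure count to actions.
# Assumption: "unhealthy" from the 2nd failure on, "restart" on every 10th.
nfsd_failure_action ()
{
    _count="$1"
    _actions=""

    if [ "$_count" -ge 2 ] ; then
        _actions="unhealthy"
    fi
    # The modulo test fires on failures 10, 20, 30, ...
    if [ "$_count" -gt 0 ] && [ $((_count % 10)) -eq 0 ] ; then
        _actions="${_actions:+${_actions} }restart"
    fi

    echo "${_actions:-none}"
}

for _n in 1 2 10 ; do
    echo "failure ${_n}: $(nfsd_failure_action "$_n")"
done
```

Under these assumptions a node is marked unhealthy quickly, while the potentially hanging restart is attempted only occasionally.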
Martin Schwenke
e6ce2f55ef eventscripts: Improve message logged when a counter hits a limit
It should print the actual number of consecutive failures rather than
the limit.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit ff5f0d1e29af2b293e30cdc54bed03a644be7038)
2013-08-14 15:57:04 +10:00
Martin Schwenke
35d9631eda eventscripts: Print a message when waiting for TCP connections to be killed
This makes the gaps in the logs more obvious.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 11fbf4789d783dd0bac22754b374dd9ea4b03bad)
2013-08-14 15:57:04 +10:00
Martin Schwenke
b1f7337d2b eventscripts: New configuration variable $CTDB_RPCINFO_LOCALHOST
Passing "localhost" to the rpcinfo command causes overheads, like
reading /etc/services multiple times.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 1d61988af9e4fa3621a3e2d06a859bcb53df2d67)
2013-08-14 15:57:04 +10:00
Martin Schwenke
0ca046577f eventscripts: Add modulo (%) operator to ctdb_check_counter()
Also add it to the corresponding eventscript unit test infrastructure.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit f4ef83a256f59eeb00b9a5bc10c28347e1ad1031)
2013-08-14 15:57:03 +10:00
Martin Schwenke
bdbe37b24f eventscripts: Separate out RPC service restart code
While doing this:

* Explicitly assign RPC program and version information in
  _nfs_check_rpc_common().  This is more lines of code but is easier
  to read.

* Don't print the options when starting a service.  Trying to print it
  makes the code messy for little benefit.

  Update the eventscript unit testing code and a Ganesha test to
  reflect this.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit e8b531405665885196c95fe1608db33a255bf761)
2013-08-14 15:57:03 +10:00
Martin Schwenke
df539a66cb eventscripts: Remove support for RPC service 'q' and 's' restart flags
They're hard to maintain and provide very little benefit.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 1a1be43f8466d46913dcdfe6dcedb94316cd28ad)
2013-08-14 15:57:03 +10:00
Martin Schwenke
5459cdc8a6 eventscripts: When restarting the nfslock service only show output of start
That is, /dev/null the "stop" output.  This is consistent with the way
CTDB generally deals with the output when stopping a service.

It also makes updating the eventscript unit tests easier.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit c7332526b1b488abefeb4be78a7cd3f2f9abc451)
2013-08-14 15:57:03 +10:00
Martin Schwenke
98163e01a9 scripts: Do not run ctdb tool commands when debugging hung "init" event
The CTDB daemon is not ready to accept clients in the INIT runstate
("init" event).  It starts accepting connections in the SETUP runstate
("setup" event) and later.

Also, minor log formatting changes.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 81d7ce03b28d592a1337639e14d9ea141e20bfff)
2013-08-09 11:04:55 +10:00
Amitay Isaacs
f5ddb49e62 eventscripts: Use configured RECLOCK file instead of asking CTDB
On a cluster where the recovery lock file is not being used, asking the
CTDB daemon is unnecessary overhead.  And if CTDB is using a recovery lock
file, then changing the configuration without restarting is *stupid*.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-Programmed-With: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 44eb86e6042adb6efe75d2a5528b82a0f21d496d)
2013-08-09 11:04:55 +10:00
Martin Schwenke
3c73949317 initscript: The wrapper script should export CTDB_SOCKET
This ensures that any invocation of the ctdb tool (within the wrapper)
gets the desired value.  This at least ensures that ctdbd will be
started.

If a non-standard value is set for CTDB_SOCKET then command-line users
will still need the variable in their environment.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 37ccc7c6cc43a80aaa92291aea7a438f4225488a)
2013-07-29 15:58:51 +10:00
Martin Schwenke
a8dd716146 eventscripts: kill_tcp_connections() should send connections to stdin
This avoids issuing multiple "ctdb killtcp" commands to terminate tcp
connections, one per connection.  This will considerably reduce the
time when there is a large number of tcp connections.  This also makes
it possible to avoid calling "ctdb killtcp" when there are no connections.

Add a couple of unit tests for killtcp and update the eventscript unit
test infrastructure to support them.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit a20d94717d2e4ab866d8a002cdf39c0669b74c6a)
2013-07-29 15:53:06 +10:00
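Note on a8dd716146 above: the batching idea is to feed all connections to one process on stdin instead of spawning one process per connection, and to skip the call when the list is empty. The sketch below uses a stub in place of the real tool. Both function names are hypothetical, and the "source destination" line format is an assumption.

```shell
#!/bin/sh
# Stand-in for a tool that reads "src dst" pairs from stdin.
# Assumption: this is NOT the real "ctdb killtcp"; it only counts its input.
kill_connections_stub ()
{
    _n=0
    while read -r _src _dst ; do
        _n=$((_n + 1))
    done
    echo "killed ${_n} connections in one invocation"
}

# Hypothetical wrapper: skip the call entirely when there is nothing to kill
kill_connections_batched ()
{
    _conns="$1"

    if [ -z "$_conns" ] ; then
        echo "no connections to kill"
        return 0
    fi

    echo "$_conns" | kill_connections_stub
}

kill_connections_batched "10.0.0.1:445 10.0.0.50:50000
10.0.0.1:445 10.0.0.51:50001"
kill_connections_batched ""
```

With many connections, the saving comes from paying the process start-up cost once rather than once per connection.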
Martin Schwenke
67b22b6e94 scripts: Run scriptstatus for hung event
The timeout information printed by ctdbd is less than useful because
it refers to the cumulative time taken by the eventscripts run so far.
Adding scriptstatus output indicates where time was actually spent.

Since there is now quite a bit of output, serialise the calls to this
script using flock.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 1b016b2dfc5d7d3f2a42ce4dfe569608e90eb714)
2013-07-29 14:02:13 +10:00
Martin Schwenke
1da757d91a eventscripts: A missing interface should cause monitoring to fail
A missing interface is at least as bad as an interface with a link
that is down so should have a similar effect.

This couldn't be done previously because orphaned interfaces used to
be listed for monitoring.  This was worked around in 10.interface in
commit 49b2d1bd9554461ed8edbfc21e777c0eca9e1443 and fixed in ctdbd in
commit cc1a3ae911d3fee8b87fda5de5ab6d9499d7510a.

If $CTDB_PARTIALLY_ONLINE_INTERFACES="yes" then monitoring won't
actually fail but the interface is still marked as down.

While we're touching this code, use "ip link" instead of "ip addr".
It is marginally cheaper but not enough for a separate patch.  ;-)

This effectively reverts d67955b42f7627be9dae995230c8fcbb8a948ec2.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 501f19b16fd6d67fbb754248868c38ee5bcf79ef)
2013-07-19 15:35:41 +10:00
Martin Schwenke
4b5c9c7991 eventscripts: Get list of configured interfaces using "ctdb ifaces"
This was previously changed because ctdbd didn't garbage collect
orphaned interfaces.  This was fixed in commit
cc1a3ae911d3fee8b87fda5de5ab6d9499d7510a.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit c6ab0f9405d5fa5b0b1693bc92e59da0d555a9d7)
2013-07-19 15:35:41 +10:00
Martin Schwenke
7610b6c009 scripts: ctdbd_wrapper logs a message to syslog if syslog is not being used
It can be very disconcerting when logging to syslog is expected but
nothing is being logged there.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 412bc0e20bef694d4e911dc9c984fd7716231f1f)
2013-07-11 15:18:06 +10:00
Martin Schwenke
e4d99cc899 packaging: Add systemd support
Based on an original patch by Sumit Bose <sbose@redhat.com>.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit e43a4b7b69a21c4cec2453dcac436b64bf5d7f06)
2013-07-10 18:14:33 +10:00
Martin Schwenke
adbee6ae4e initscript: Simplify initscript and control CTDB via new ctdbd_wrapper
Currently the initscript is very complex.  This makes it hard to read
and hard to add support for new init systems, such as systemd.

Create a wrapper called ctdbd_wrapper to be installed alongside ctdbd.
This is called by the initscript to start and stop ctdbd.  It
constructs the ctdbd options and waits until ctdbd is properly
initialised before it exits.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit e3abc7eebab5cceddc4ce7817890dd5db9be3450)
2013-07-10 15:19:27 +10:00
Amitay Isaacs
ae0afad8ee initscript: Export CTDB_DEBUG_LOCKS variable
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit a415a1986900135f889efc25ecaf2761b1dae81a)
2013-07-10 14:33:18 +10:00
Amitay Isaacs
f46d0e783c scripts: Add an example debug_locks.sh script to debug locking issue
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit c711ff4702c5f95b75e4bf030665fc2afffc2f9e)
2013-07-10 14:33:18 +10:00
Martin Schwenke
d6d1fb1f46 eventscripts: New configuration variable $CTDB_SKIP_GANESHA_NFSD_CHECK
This allows 60.ganesha to be unit tested, except for the core Ganesha
monitoring code.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit f606df4f2db754592e6d1a16c26e155cacb2beef)
2013-07-05 15:52:33 +10:00
Martin Schwenke
7f6169b207 eventscript: Move Ganesha nfsd monitoring to a function
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit ceb5b2d37f7ab4894908ec26f3812b3bed991525)
2013-07-05 15:52:33 +10:00
Martin Schwenke
c3e83d4532 eventscripts: Drop RPC service version from nfs_check_rpc_service() calls
Support for this was removed in commit
77302dbfd85754e02559eccb2dd6c090db0b6b9f and I overlooked its use in
60.ganesha.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 520914e7ee1b879c1080e5857fda18ed5b973fd6)
2013-07-05 15:52:33 +10:00
Martin Schwenke
4e07c6c433 eventscripts: When replaying monitor status, don't log empty output
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit ce04f1c107b4392ca955d9f29b93aaaae62439ce)
2013-07-05 15:52:33 +10:00
Martin Schwenke
01d879806b eventscripts: "setup" event doesn't need to wait for SETUP runstate
The "setup" event isn't called until ctdbd is in CTDB_RUNSTATE_SETUP
anyway...

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 9ea57af557028b1d2e5c560e7bcf4d014b9a8b1e)
2013-06-20 13:01:10 +10:00
Martin Schwenke
4eed91b54a eventscripts: 13.per_ip_routing should not try hard to find public_addresses
This essentially reverts d4621277240721e6d130a930b0100506b64467ea.
This was added for testing but the test code was actually broken.
CTDB itself will only process public IPs if $CTDB_PUBLIC_ADDRESSES is
set, so no code should try to be more flexible than that!

The test code has been fixed instead.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 3b11b27f3e22e99947bc2d6c49c4427bd7a0e332)
2013-06-20 13:01:10 +10:00
Martin Schwenke
6317285c4f scripts: Move TDB checking from initscript to "init" event
It makes sense to do this in the "init" event and make the initscript
less complicated.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 3bc93f312b8464fbfa2b2c44fffedc591fe5a3e0)
2013-06-20 13:01:10 +10:00
Martin Schwenke
961468146e scripts: Move dropping of all IPs from initscript to "init" event
It makes sense to do this in the "init" event and make the initscript
less complicated.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 0b77cceb49a30a181063adc7868d42d2851318e8)
2013-06-20 13:01:09 +10:00
Martin Schwenke
bee02e06e6 scripts: drop_ip() should use delete_ip_from_iface()
Otherwise secondary addresses that aren't owned by CTDB could be
dropped.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 5ffce65a1ad659b198ddf647622b899bdde45c72)
2013-06-20 13:01:09 +10:00
Martin Schwenke
a1eb516f0a scripts: drop_all_public_ips() now prints messages to stdout, not log
Change all callers to maintain current behaviour.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 0b67397ef5419c781a35916575151da7b7e7cc27)
2013-06-20 13:01:09 +10:00
Martin Schwenke
45878d4363 eventscripts: New configuration variable $CTDB_NFS_DUMP_STUCK_THREADS
If some nfsd threads are still alive after a shutdown during a restart
then this indicates the maximum number of threads for which a stack
trace should be dumped.  This can be useful for trying to determine
why nfsd is stuck.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 2503245db10d567af708a04edd3a3b488c24f401)
2013-06-14 15:15:06 +10:00
Martin Schwenke
f408caea2a eventscripts: Add new option $CTDB_MONITOR_NFS_THREAD_COUNT
Consider the following example:

1. There are 256 nfsd threads configured.
2. 200 threads are "stuck" in system calls, perhaps waiting for the
   underlying filesystem when an attempt is made to restart NFS.
3. 56 threads exit when NFS is stopped.
4. 56 new threads are started when NFS is started.
5. 200 "stuck" threads exit leaving only 56 threads running.

Setting this option to "yes" makes the 60.nfs monitor event look for
this situation and try to correct it.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 99b0d8b8ecc36dfc493775b9ebced54539c182d2)
2013-06-13 20:01:22 +10:00
Martin Schwenke
2e515f2306 eventscripts: Fix statd-callout update handling
60.nfs and 60.ganesha touch $statd_update_trigger every time they're
run.  This stops the statd-callout updates from ever being called.

Make this logic self-contained and move it to new function
nfs_statd_update() in the functions file.  Call this in 60.nfs and
60.ganesha with the appropriate update period as the only argument.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reported-by: Poornima Gupte <poornima.gupte@in.ibm.com>

(This used to be ctdb commit 1b5968f6be084590667f4f15ff3bef13ed9a2973)
2013-05-28 16:11:47 +10:00
Martin Schwenke
1eab9c898c eventscripts: Stop NAT gateway's delete_all() from polluting the log
Every time a node that wasn't the NAT gateway master gets reconfigured
something like this appears in the log:

  ctdbd: 11.natgw: Failed to del 10.0.1.139 on dev eth1

Since this usually fails it is better to mute the error than to have
it pollute the log.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 0ca7a98ffef50cbd06849cfbf65fb4a3d668b7bd)
2013-05-27 15:15:25 +10:00
Martin Schwenke
66019e3287 scripts: Provide mktemp function for platforms without mktemp command
This is needed for AIX and possibly others.

Also provide the cheaper mktemp function needed in the run_tests
script.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit b2b572e9049c7138bd223226475bef8fe3e01f10)
2013-05-27 15:14:33 +10:00
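Note on 66019e3287 above: a fallback for platforms without a mktemp command might look roughly like the sketch below. It is not the function from the commit. The name fallback_mktemp, the naming scheme and the limit of 100 attempts are assumptions, and the predictable names make it weaker than a real mktemp.

```shell
#!/bin/sh
# Hypothetical fallback for platforms without a mktemp command.
# Assumptions: only "fallback_mktemp" and "fallback_mktemp -d" are supported,
# and names are predictable, so this is weaker than a real mktemp.
fallback_mktemp ()
{
    _make_dir=false
    if [ "${1:-}" = "-d" ] ; then
        _make_dir=true
    fi

    _i=0
    while [ "$_i" -lt 100 ] ; do
        _t="${TMPDIR:-/tmp}/tmp.$$.${_i}"
        # Create atomically in a subshell: mkdir fails if the name exists,
        # and "set -C" (noclobber) makes the redirection fail likewise.
        if (
            umask 077
            if $_make_dir ; then
                mkdir "$_t"
            else
                set -C
                : >"$_t"
            fi
        ) 2>/dev/null ; then
            echo "$_t"
            return 0
        fi
        _i=$((_i + 1))
    done

    return 1
}

# Use the real command when present, otherwise fall back to the sketch
if ! command -v mktemp >/dev/null 2>&1 ; then
    mktemp () { fallback_mktemp "$@" ; }
fi

_f=$(fallback_mktemp) && echo "created file ${_f}"
rm -f "$_f"
```

Defining the wrapper only when the command is missing keeps the stronger system mktemp in use wherever it exists.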
Martin Schwenke
a989a299d1 eventscripts: 11.natgw should not call ctdb tool in "init" event
The current code calls "ctdb setnatgwstate ..." on every event.
However, calling the ctdb tool in the "init" event is not permitted.

Instead, update the capability when it is needed and at regular
intervals via the "monitor" event.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 39a43feae7c7de07ddaf2d6cb962f923d47d0c19)
2013-05-24 14:08:07 +10:00
Martin Schwenke
6d9667f01c ctdbd: Add new runstate CTDB_RUNSTATE_FIRST_RECOVERY
This adds more serialisation to the startup, ensuring that the
"startup" event runs after everything to do with the first recovery
(including the "recovered" event).

Given that it now takes longer to get to the "startup" state, the
initscript needs to wait until ctdbd gets to "first_recovery".

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit ed6814ff0a59ddbb1c1b3128b505380f60d7aeb7)
2013-05-24 14:08:07 +10:00
Martin Schwenke
b5ebff6931 tools/ctdb: "ctdb runstate" now accepts optional expected run state arguments
If one or more run states are specified then "ctdb runstate" succeeds
only if ctdbd is in one of those run states.

At the moment, if the "setup" event fails then the initscript succeeds
but ctdbd exits almost immediately.  This behaviour isn't very
friendly.

The initscript now waits until ctdbd is in "startup" or "running" run
state via the use of "ctdb runstate startup running", meaning that ctdbd
has successfully passed the "setup" event.

The "setup" event code in 00.ctdb now waits until ctdbd is in the
"setup" run state before proceeding via the use of "ctdb runstate setup".

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 4a2effcc455be67ff4a779a59ca81ba584312cd6)
2013-05-24 14:08:07 +10:00
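Note on b5ebff6931 above: the waiting behaviour amounts to polling a check command until it succeeds or a timeout expires. The sketch below shows such a loop with a stub in place of the real check. In the scenario described, the check would be "ctdb runstate startup running". The names wait_until and fake_runstate_check and the one-second poll interval are assumptions.

```shell
#!/bin/sh
# Hypothetical wait loop.  "$@" is the check command; in the scenario
# described above it would be: ctdb runstate startup running
wait_until ()
{
    _timeout="$1" ; shift

    _count=0
    while [ "$_count" -lt "$_timeout" ] ; do
        if "$@" >/dev/null 2>&1 ; then
            return 0
        fi
        _count=$((_count + 1))
        sleep 1
    done

    return 1
}

# Stub standing in for the real check: succeeds from the 2nd poll onwards
_polls=0
fake_runstate_check ()
{
    _polls=$((_polls + 1))
    [ "$_polls" -ge 2 ]
}

if wait_until 5 fake_runstate_check ; then
    echo "ready after ${_polls} polls"
else
    echo "timed out"
fi
```

A caller such as an initscript can then report failure when the loop times out, instead of succeeding while the daemon exits shortly afterwards.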
Martin Schwenke
bb39f0a186 scripts: Rework notify.sh to use notify.d/ directory
This makes it easier to add notification handlers.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit d29e9a420b133088bf23a847c8d1dbce56c25eb0)
2013-05-23 16:18:23 +10:00
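Note on bb39f0a186 above: a directory-based dispatcher usually runs every executable file in the directory and passes the event through, so adding a handler means dropping in a file. The sketch below illustrates that pattern only. The function name, the handler file names and the argument convention are assumptions, not the contents of notify.sh.

```shell
#!/bin/sh
# Hypothetical dispatcher: run every executable file in a handlers
# directory, passing the remaining arguments (e.g. an event name) through.
run_notify_handlers ()
{
    _dir="$1" ; shift

    for _h in "$_dir"/* ; do
        if [ -f "$_h" ] && [ -x "$_h" ] ; then
            "$_h" "$@"
        fi
    done
}

# Usage: build a scratch handlers directory with one handler in it
demo_dir="${TMPDIR:-/tmp}/notify-demo.$$"
mkdir -p "$demo_dir"
printf '#!/bin/sh\necho "handler saw: $1"\n' >"$demo_dir/10.log"
chmod +x "$demo_dir/10.log"
run_notify_handlers "$demo_dir" unhealthy
rm -rf "$demo_dir"
```

Files without the execute bit are skipped, which gives an administrator a simple way to disable a handler without deleting it.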
Martin Schwenke
51dbaecb54 eventscripts: Fix regression in _loadconfig()
fff88940f71058e4eefd65f50a6701389c005c17 introduced a regression.
Without $service_name set by default, the CTDB configuration is no
longer loaded when loadconfig() is called without any arguments.
That's bad.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit f1619a36c1beba11533052dc5728fa3adaa08870)
2013-05-22 14:24:21 +10:00
Martin Schwenke
ff9831f5b1 initscript: If CTDB doesn't become ready, print a message before killing
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit e6b6b793f61556c21e8daf34abf89ee7b388ecfb)
2013-05-22 14:24:21 +10:00
Amitay Isaacs
84bcb95952 eventscripts: Do not use bashism for string comparison
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit b0cae7d5a00ef3764bae187affc8e9a252f4b329)
2013-05-20 19:47:10 +10:00
Martin Schwenke
de84c1fd3c eventscripts: NFS RPC checks no longer support "knfsd"
No longer used, support removed from test infrastructure.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 0eb351ff4c7ee096de7c5e0a59561067091fa32e)
2013-05-07 12:55:09 +10:00
Martin Schwenke
434f9e8594 eventscripts: 60.nfs uses nfs_check_rpc_services() to check NFS RPC services
* New directory nfs-rpc-checks.d/ replaces hardcoded rules in 60.nfs

* Installation and packaging additions to handle nfs-rpc-checks.d/

* Unit test updates, including deleting 1 test that sanity checked
  test infrastructure

* Test infrastructure changes to use nfs-rpc-checks.d/

Note that this removes support for $CTDB_NFS_SKIP_KNFSD_ALIVE_CHECK in
60.nfs.  To get the equivalent behaviour, edit 20.nfsd.check and
remove/comment all lines.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 7e792d6768d9ca420ce3713cb122e63afd594b15)
2013-05-07 12:55:09 +10:00
Martin Schwenke
05b2edeec2 eventscripts: NFS RPC checks allows "nfsd" in addition to "knfsd"
Want nfs_check_rpc_services() to support filenames without the 'k'.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit d9775fcbd6e30eef8382bea68e2f9bad2309f2c1)
2013-05-06 20:40:58 +10:00
Martin Schwenke
c52183c055 eventscripts: New function nfs_check_rpc_services()
This is intended to replace nfs_check_rpc_service(), which builds
configuration into eventscripts.

nfs_check_rpc_services() uses a directory of configuration checks that
can be edited by an administrator.  The files have one limit check and
a set of actions per line.  The program name is extracted from the
file name.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 9bc8fbee6550ed2814fb35c70d57fab21ef1b8fd)
2013-05-06 20:40:58 +10:00
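Note on c52183c055 above: the description (one limit check plus actions per line, program name taken from the file name) can be sketched as below. The line format "operator limit actions..." and both function names are assumptions made for illustration, and the real file syntax may differ. Only the 20.nfsd.check file name appears elsewhere in this log.

```shell
#!/bin/sh
# Hypothetical: derive the RPC program name from a check file name,
# e.g. "nfs-rpc-checks.d/20.nfsd.check" -> "nfsd"
rpc_check_program_name ()
{
    _f="${1##*/}"              # strip the directory
    _f="${_f#[0-9][0-9].}"     # strip the leading "NN."
    echo "${_f%.check}"        # strip the trailing ".check"
}

# Hypothetical: print the actions of the first line whose limit check
# matches the failure count.  Assumed line format: <test-op> <limit> <actions>
rpc_check_actions ()
{
    _file="$1"
    _failcount="$2"

    while read -r _op _limit _actions ; do
        case "$_op" in
        ""|\#*) continue ;;    # skip blank lines and comments
        esac
        if [ "$_failcount" "$_op" "$_limit" ] ; then
            echo "$_actions"
            return 0
        fi
    done <"$_file"

    return 1
}

_checks="${TMPDIR:-/tmp}/rpc-check-demo.$$"
printf '# limit checks\n-ge 6 restart unhealthy\n-ge 2 verbose\n' >"$_checks"
echo "program: $(rpc_check_program_name nfs-rpc-checks.d/20.nfsd.check)"
echo "3 failures: $(rpc_check_actions "$_checks" 3)"
rm -f "$_checks"
```

Keeping the rules in editable files rather than in the eventscript is what lets an administrator change limits, or comment them all out, without touching code.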
Martin Schwenke
167acd1cd5 eventscripts: nfs_check_rpc_action() should be _nfs_check_rpc_action()
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 5a717fd495ba5a2bfd481d69f38b68fa4576716f)
2013-05-06 20:40:58 +10:00
Martin Schwenke
bdab9d1ea6 eventscripts: Factor out common code from nfs_check_rpc_service()
This creates new function _nfs_check_rpc_common().

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit cc3bb42e48bbdabd19187c231846b98589b4f4f3)
2013-05-06 20:40:58 +10:00
Martin Schwenke
910e138cb3 eventscripts: Remove ganesha support from nfs_check_rpc_service()
This is unused so doesn't need to be maintained.  An attempt to use it
now will explicitly fail rather than implicitly fail via bitrot.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 887733dd7be53158bfe07b30ef31b611d0f8122f)
2013-05-06 20:40:58 +10:00
Martin Schwenke
944d063a3e Revert "Eventscript functions: add optional version to nfs_check_rpc_service()"
This reverts commit 92f74fd589467b46c758e116e97417edfe8773d7.

This change is unused and is just complicating the function.

Conflicts:
	config/functions

(This used to be ctdb commit 77302dbfd85754e02559eccb2dd6c090db0b6b9f)
2013-05-06 20:40:58 +10:00
Martin Schwenke
577a3cae5d eventscripts: Move rpc.statd existence check into nfs_check_rpc_service ()
The code in 60.nfs is going to be genericised, so make all the checks
look the same.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 15b0f78cbf8d6ba481b7eba9e4fe3f4270214c72)
2013-05-06 20:40:58 +10:00
Martin Schwenke
6c347a5294 eventscripts: Factor NFS RPC check action code into nfs_check_rpc_action()
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 4b4e7d8f0e8dcbab987e374d06ffaa21c06da0d3)
2013-05-06 20:40:58 +10:00
Martin Schwenke
2bc807f974 eventscripts: Remove unused function ctdb_check_counter_limit()
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit a8ef00608e48a551a334aded206146807aeb4c5a)
2013-05-06 16:24:59 +10:00
Martin Schwenke
460d0651b6 eventscripts: Use ctdb_check_counter() instead of ctdb_check_counter_limit()
ctdb_check_counter_limit() can soon be removed...

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit bb2cdff77e8ec79e7d319159b9c9848ecfaaa0f1)
2013-05-06 16:24:59 +10:00
Martin Schwenke
8373226251 eventscripts: Might as well try to stat the reclock file first
It is in the background but it still might cause the counter to be
reset before it is checked.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit ef2cf75e95ff382c65524a4d77eb00ab8411d2fc)
2013-05-06 16:24:58 +10:00
Martin Schwenke
31c3edcadf eventscripts: Make the early exit in 01.reclock earlier
That way we don't even check the counter...

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 136abd4604dc68f7c696704bac708bae53cf1940)
2013-05-06 16:24:58 +10:00
Martin Schwenke
29a3823e40 eventscripts: Minor cleanups for killtcp/tickle functions
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 25ef4f655f1efc833deb5e244f9fff461e92f439)
2013-05-06 16:24:50 +10:00
Martin Schwenke
189a5c003c eventscripts: Tweak the timeout check in kill_tcp_connections()
This has 2 advantages:

1. It uses get_tcp_connections_for_ip() to check for leftover
   connections, instead of custom code.

2. It checks for the timeout condition before sleeping.  The current
   code sleeps and then checks, so wastes a second.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 60a08eb96e1d97aab31e9bd4af01683c650541c2)
2013-05-06 16:22:15 +10:00
Martin Schwenke
8f84a2bec7 eventscripts: In killtcp/tickle functions, $_failed should be boolean
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 319c1b68d5aa78f82a68febcad233a7c78afc887)
2013-05-06 16:22:07 +10:00
Martin Schwenke
ed59deaee3 eventscripts: Remove unused $_killcount from tickle_tcp_connections()
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 8514ca56830b30e7f0eb5018632640daaf8ff65d)
2013-05-06 16:16:56 +10:00
Martin Schwenke
975ea7fb7a eventscripts: Refactor connection listing in killtcp and tickle functions
Uses new function get_tcp_connections_for_ip().  This avoids using a
temporary file and running netstat twice.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit a621622903c7ef17764b15293d6ea8df5a53c7e1)
2013-05-06 16:16:50 +10:00
Martin Schwenke
a320e1f7f1 eventscripts: Reimplement kill_tcp_connections_local_only()
... using kill_tcp_connections()

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 10e4db8f796d1e3259733180494db3b4bbad291a)
2013-05-06 15:45:11 +10:00
Martin Schwenke
5e828b48fe eventscripts: Change handling of one-way kills in kill_tcp_connections()
This change is a no-op.  However, in a subsequent commit we'll merge
kill_tcp_connections_local_only() with this function.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 23c0f5f48e3e5a0c1a3254c582299f7893cf0d33)
2013-05-06 15:45:10 +10:00
Martin Schwenke
d98d931af3 eventscripts: Remove unnecessary variables from killtcp/tickle functions
Setting these variables spawns lots of unnecessary processes, which
would surely slow down these functions on a busy system.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 3eae161472e6352f7f656851c73dc056f95113eb)
2013-05-06 15:45:10 +10:00
Martin Schwenke
6e2863a4f9 eventscripts: Clean up ctdb_check_command()
* Command is now multiple arguments, preserving quoting
* $service_name no longer printed, no longer an argument
* Debug output from failed command
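
A rough sketch of what passing the command as multiple arguments can look like is given below. It is not the actual ctdb_check_command(); the name check_command and the wording of the error message are assumptions made for illustration:

```shell
# check_command CMD [ARG...]
# Hypothetical helper (not the real ctdb_check_command()): runs the
# command given as separate arguments and reports its output on failure.
check_command ()
{
    # "$@" expands to the original words one by one, so an argument
    # containing spaces reaches the command as a single argument.
    _out=$("$@" 2>&1) || {
        echo "ERROR: $* returned error"
        # Debug output from the failed command
        echo "$_out"
        return 1
    }
}
```

A caller would write, for example, `check_command test -d "/some dir"` instead of passing one pre-quoted string that has to be split again with eval.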

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 9e25fb261447a196de05937052779b36e75e7215)
2013-05-06 15:45:10 +10:00
Martin Schwenke
30addb886a eventscripts: Clean up ctdb_check_directories()
Fix the documentation comments, which are wrong, and remove the
optional $service_name argument.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit d9e6cb945c5edac9ca6405c9228bf647fab814f5)
2013-05-06 15:45:10 +10:00
Martin Schwenke
0ad8f46db3 eventscripts: Assert that $service_name is set in a few key places
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 3d0a7d83ddc824961d876fc9afba829c90aef3e7)
2013-05-06 15:45:10 +10:00
Martin Schwenke
5dd9e52e46 eventscripts: counters default to $script_name if $service_name not set
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit fff88940f71058e4eefd65f50a6701389c005c17)
2013-05-06 15:45:10 +10:00
Martin Schwenke
e9abc9c070 eventscripts: Simplify handling of $service_name in "managed" functions
Complicated argument handling was introduced to deal with multiple
services per eventscript.  This was a failure and we split 50.samba.

This simplifies several functions to use global $service_name
unconditionally instead of having an optional argument.

$service_name is no longer automatically set in the functions file.
This means it needs to be explicitly set in 13.per_ip_routing because
this script uses ctdb_service_check_reconfigure().

Eventscript unit test infrastructure needs to set $service_name during
fake service setup, and policy routing tests need to be updated
accordingly.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 27aab8783898a50da8c4bc887b512d8f0c0d842c)
2013-05-06 15:45:10 +10:00
Martin Schwenke
c56acf7127 eventscripts: Simplify handling of $service_name in start/stop functions
Complicated argument handling was introduced to deal with multiple
services per eventscript.  This was a failure and we split 50.samba.

This simplifies several functions to use global $service_name
unconditionally instead of having an optional argument.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit b5802c4735e1c719a5cf9ce69489d5947bd5e8c5)
2013-05-06 15:45:10 +10:00
Martin Schwenke
8065366b33 eventscripts: Simplify handling of $service_name in service_management
Complicated argument handling was introduced to deal with multiple
services per eventscript.  This was a failure and we split 50.samba.

This simplifies several functions to use global $service_name
unconditionally instead of having an optional argument.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit e24baac0d2952e86d5ff31235901f06e2f2b2449)
2013-05-06 15:45:10 +10:00
Martin Schwenke
4c9438b2a3 eventscripts: Simplify handling of $service_name in reconfigure functions
Complicated argument handling was introduced to deal with multiple
services per eventscript.  This was a failure and we split 50.samba.

This simplifies several functions to use global $service_name
unconditionally instead of having an optional argument.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit c2ea72ff565222f9edab408638bd45dbba6e8ff7)
2013-05-06 15:45:10 +10:00
Martin Schwenke
642848b916 eventscripts: Remove unused function ctdb_check_counter_equal()
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit fd536a26b310b5bf9628da62cca0b425f4a54030)
2013-05-06 15:45:10 +10:00
Martin Schwenke
bbd0ed0e29 scripts: Fix script_log() regression
5940a2494e9e43a83f2bca098bd04dfc1a8f2e93 makes script_log() always
pass a message to logger, so script_log() can no longer log stdin.

Put all the tag fu in the actual tag so the message argument is empty
if no message was passed.
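
The idea can be sketched as follows. This is not the actual script_log(); the function name script_log_sketch and the LOGGER override are assumptions added for illustration, and how the real function builds its tag is not shown here. The sketch relies only on the documented behaviour of logger, which logs standard input when no message operand is given:

```shell
# script_log_sketch TAG [MESSAGE...]
# Hypothetical helper (not the real script_log()).
script_log_sketch ()
{
    _tag="$1" ; shift
    # All prefix information lives in the tag.  When no message was
    # passed, "$@" expands to nothing, so logger gets no message
    # operand and logs standard input instead.
    # LOGGER is an assumed override, useful for testing.
    ${LOGGER:-logger} -t "$_tag" "$@"
}
```

Both `script_log_sketch mytag "some message"` and `some_command | script_log_sketch mytag` then behave as expected.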

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 9dee4c84273633b9ad82e94dabbf0e6f86edbcef)
2013-05-06 15:43:16 +10:00
Martin Schwenke
27a5b78c8e initscript: Look for tdbtool/tdbdump using which, not in fixed locations
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit c74cc0442eb90d859eae270b59456d28605817c4)
2013-05-06 15:40:30 +10:00
Martin Schwenke
fa16cccf02 ctdbd: Remove the "stopped" event
It isn't used; it has been superseded by "ipreallocated".

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit c2bb8596a8af6406ef50e53953884df9d6246a96)
2013-05-06 13:38:21 +10:00