mirror of https://github.com/samba-team/samba.git synced 2024-12-25 23:21:54 +03:00
Commit Graph

3600 Commits

Author SHA1 Message Date
Martin Schwenke
54402cdff4 Eventscripts - in 60.nfs uniquify the share check directory list
There are sites that have multiple entries for the same export.  This
optimises the share check in this case.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 1ccdae79b64b236fc27f4653606429d73c9c3595)
2011-08-30 09:33:47 +10:00
Ronnie Sahlberg
2902203900 Logging: when we log stdout/stderr messages from eventscripts to the system log, prefix every line of output with the name of the eventscript.
CQ S1028412

(This used to be ctdb commit 392363c04185f47a826fc6ed95038342be2150bf)
2011-08-26 09:39:25 +10:00
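A minimal shell sketch of the per-line prefixing described in the commit above. The real change lives in the daemon's logging code; the helper name `prefix_lines` and the exact "NAME: " separator are assumptions made for illustration only.

```sh
# prefix_lines NAME: copy stdin to stdout, putting "NAME: " before every line.
# (Hypothetical helper; the separator format is an assumption.)
prefix_lines ()
{
    _name="$1"
    # The "|| [ -n ... ]" part also handles a final line without a newline.
    while IFS= read -r _line || [ -n "$_line" ] ; do
        printf '%s: %s\n' "$_name" "$_line"
    done
}

printf 'first line\nsecond line\n' | prefix_lines "60.nfs"
```

With a prefix on every line, multi-line script output stays attributable to its eventscript even when log lines from several scripts interleave.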
Ronnie Sahlberg
6692512299 LibCTDB : update the ctdb tool to use libctdb to read the recovery mode
(This used to be ctdb commit 750a31cf95c356a0ee071967537eb615dce35845)
2011-08-23 16:35:08 +10:00
Ronnie Sahlberg
75afbee956 LibCTDB : update the ctdb tool to use libctdb to query for the recmaster
(This used to be ctdb commit 81c14c8625a6d5670b8795a655d7a3f3318009e5)
2011-08-23 16:32:38 +10:00
Ronnie Sahlberg
7e29817f61 LibCTDB : initialize ctdb->pnn to -1 when we create a new context
but before we learn the pnn of the local node

(This used to be ctdb commit 2cc48be3219b887b85649a14db311af0549041cf)
2011-08-23 16:16:48 +10:00
Ronnie Sahlberg
c0f724e0f9 LibCTDB : change the ctdb_fetch_lock_once test tool to use libctdb instead of the old client
(This used to be ctdb commit cd1080726d7787b335ab4bfb64a7991237ab92f5)
2011-08-23 15:32:27 +10:00
Ronnie Sahlberg
b00b0e9d2e LibCTDB : add support for getrecmode
(This used to be ctdb commit b663f286ea8edd64c0405a1ab45b6ef1da501bf5)
2011-08-23 15:32:14 +10:00
Ronnie Sahlberg
af19b5acff LibCTDB: add commands that let an application query how many commands are active,
i.e. have been sent but have not yet received a reply.
Applications may use this to check whether it is "safe" to stop the event system and sleep,
or whether they should first wait for all activity to the ctdb daemons to cease.

(This used to be ctdb commit 8d89bfdfd1f55dfeb22890b8bb0f08f31d1fa91a)
2011-08-23 12:43:16 +10:00
Volker Lendecke
1cf1670f0a Fix a const warning
Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit e25559087c9752502580875f7e33f3c416c05f84)
2011-08-22 17:11:07 +02:00
Volker Lendecke
fff653d126 Remove an unused variable
Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit 04c3d9c7c9ffa8bb95b0bf1513fd79f6c1096a2f)
2011-08-22 17:11:07 +02:00
Volker Lendecke
85bc1ccb7e libctdb: "unpack_reply_control" does not need the ctdb_connection parameter
Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit bb8f68f0256c43fe0671fe45023d1c88e340ad96)
2011-08-22 17:11:07 +02:00
Volker Lendecke
da528a65b1 libctdb: "unpack_reply_call" does not need the ctdb_connection parameter
Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit 77ae553249ef1e1d467d792ac033f2aaa4e337e6)
2011-08-22 17:11:07 +02:00
Volker Lendecke
21bb8abc93 libctdb: "ctdb_request_free" does not need the ctdb_connection parameter
Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit 5a5ed2a43b76bec69494b6cdc6451527f5c472e5)
2011-08-22 17:11:07 +02:00
Volker Lendecke
b0706be89e libctdb: Make sure ctdb_request->ctdb is filled correctly
Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit 06433d20a43d41f05b96a9dda6dc5931539feaa3)
2011-08-22 17:11:06 +02:00
Volker Lendecke
8638b5f5d2 libctdb: Ensure 0-termination of sun_path
Rusty, please check!

Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit 955f67a02026b157440d2ae87ead193773331e75)
2011-08-22 17:11:06 +02:00
Volker Lendecke
b4fd8024b5 libctdb: Fix a few format warnings
Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit fa6564c24429e084be728dbe6eea1dec13e58709)
2011-08-22 17:11:06 +02:00
Volker Lendecke
ef64060898 libctdb: Add license header to messages.c
Rusty, please check!

Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit 4bceba374be82e76ba5d9d923172e85e9365b990)
2011-08-22 17:11:06 +02:00
Volker Lendecke
d55e6cf53c libctdb: Reorder attachdb
No code change, this makes the sequence of what happens easier to read

Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit 883b9b185dce03a6bf233fbf2cfabad9883519e5)
2011-08-22 17:11:06 +02:00
Volker Lendecke
a31d7516f5 libctdb: Reorder set_message_handler
No code change, this is for better readability

Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit e0f93459e05eef33024096654b4aaf1eb3c6d7c4)
2011-08-22 17:11:06 +02:00
Volker Lendecke
19f31f86ac libctdb: Correct 4bfdfda, stddef.h is needed by libctdb_private.h
Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit 683caa7bbf45d5b6791e53e2f3ee6d0ac3b08f28)
2011-08-22 17:11:06 +02:00
Volker Lendecke
452f6504a0 Add missing #include to libctdb/ctdb.c
We need that to have the "offsetof" macro, thus we don't need to redeclare it
in libctdb_private.h

Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit 4bfdfdae4f8ab23f14bc6ab4c06b66c07714ec17)
2011-08-17 15:12:46 +02:00
Ronnie Sahlberg
02ebd35398 Merge remote branch 'martins/eventscripts'
(This used to be ctdb commit bb008c01989ebb173a3f095ebd2f90ab54f9da91)
2011-08-17 14:10:04 +10:00
Martin Schwenke
6e7dbf0543 Eventscripts - new default TCP port checker using "ctdb checktcpport"
New function ctdb_check_tcp_ports_ctdb().  This should be fast... and
is now the default checker.  If it fails in an unexpected way we fall
back to the nmap and netstat checkers.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit a1e16a707ce204817531a61455000361f972080a)
2011-08-17 14:02:45 +10:00
Martin Schwenke
1374327f6e Eventscripts - generalise TCP port checking plus new nmap-based checker
Split the netstat-specific parts of ctdb_check_tcp_ports() into new
function ctdb_check_tcp_ports_netstat().

Implement new ctdb_check_tcp_ports_nmap() function that uses
"nmap -PS" to check if the desired ports are listening.

ctdb_check_ctdb_ports() now uses new configuration variable
CTDB_TCP_PORT_CHECKERS to decide which port checkers to try.  Default
value is currently "nmap netstat".  If nmap is not found then this
will fall back to netstat - if logging is at debug level this will
also fill the logs with messages saying the nmap checker failed.  This
indicates that either nmap should be installed or the default value of
CTDB_TCP_PORT_CHECKERS should be changed (in a configuration file) to
avoid trying to use nmap.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit d9651175b40b9454e7d4e98291955fcf1445085e)
2011-08-17 12:12:20 +10:00
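The checker fallback described in the commit above can be sketched as follows. The stand-in functions `check_nmap` and `check_netstat`, the wrapper name `check_ports`, and the convention that status 127 means "checker unavailable" are all assumptions for illustration; only the `CTDB_TCP_PORT_CHECKERS` variable and its "nmap netstat" default come from the commit message.

```sh
# Stand-in checkers (hypothetical).  Assumed convention:
#   0 = all ports listening, 1 = a port is not listening,
#   anything else = this checker could not run.
check_nmap ()    { return 127 ; }   # pretend nmap is not installed
check_netstat () { return 0 ; }     # pretend netstat says all is well

CTDB_TCP_PORT_CHECKERS="nmap netstat"

check_ports ()
{
    for _c in $CTDB_TCP_PORT_CHECKERS ; do
        _rc=0
        "check_${_c}" "$@" || _rc=$?
        case "$_rc" in
            0|1) echo "checker=${_c} status=${_rc}" ; return "$_rc" ;;
            *)   echo "checker ${_c} unavailable, trying next" ;;
        esac
    done
    return 127
}

check_ports 445 139
```

Setting `CTDB_TCP_PORT_CHECKERS="netstat"` in this sketch skips the nmap attempt entirely, which mirrors the commit's advice for sites without nmap.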
Martin Schwenke
62f654d3d2 Eventscripts - ctdb_check_tcp_ports() only prints netstat output if debugging
Use the new debug function to conditionally print the netstat output.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 44c14aeeb11080980fe07c7396d06843a4870747)
2011-08-17 10:39:54 +10:00
Martin Schwenke
86792724a2 Eventscripts - weaken TCP port check message if CTDB has just been started.
Sometimes smbd and other services can take a while to start,
especially when there is a lot of activity after ctdbd has just
started.  The TCP port check can then pollute the logs with lots of
"ERROR" messages and possibly extra debug.

This creates a flag file when a service is started (but not restarted)
and this flag is removed the first time that TCP port checks succeed
for that service.  When a port check fails and the flag file still
exists, a less extreme "INFO" message is printed rather than the usual
"ERROR" message.  This means that until the node actually becomes
healthy we see more friendly messages.

The subtext is that we're hearing false positive reports "recreates"
of CQ S1024874 (samba stopped responding on port 445) quite often when
ctdbd is started.  This reduces the chances of people reporting such
false recreates...

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 571865eb6ef847857129d0b1e2ba5fa7254bfe8c)
2011-08-17 10:39:53 +10:00
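A rough sketch of the flag-file idea from the commit above, under stated assumptions: the flag location, the function names and the message texts are hypothetical and are not taken from the eventscript code.

```sh
# Hypothetical flag file; a temporary directory stands in for the
# service's real state directory.
state_dir=$(mktemp -d)
flag="$state_dir/just-started"

# Called when the service is started (not restarted).
service_started () { touch "$flag" ; }

# report_port_check STATUS: STATUS is 0 if the TCP port check succeeded.
report_port_check ()
{
    if [ "$1" -eq 0 ] ; then
        rm -f "$flag"              # first success clears the flag
        echo "OK"
    elif [ -e "$flag" ] ; then
        echo "INFO: port not listening yet"
    else
        echo "ERROR: port not listening"
    fi
}

service_started
report_port_check 1    # flag present: softer INFO message
report_port_check 0    # success: flag removed
report_port_check 1    # flag gone: usual ERROR message
```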
Martin Schwenke
5c9fbb55ce Eventscript functions: optimise ctdb_check_tcp_ports() and add debug.
ctdb_check_tcp_ports() runs "netstat -a -t -n" in a loop for each
port.  There are 2 problems with this:

* Netstat is run on each loop iteration when it need only be run once.

* The -a option is used to list all connections but the function only
  cares about the listening ports.  There may be many thousands of
  non-listening ports to grep through.

This changes ctdb_check_tcp_ports() to run netstat with the -l option
instead of the -a option.  It also only runs netstat once before the
main loop.

When a port is found to not be listening the output of the netstat
command is now dumped to help with debugging.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 830355a8b18c53cfcc3ad1e3009bbb1a7a681fa0)
2011-08-17 10:39:53 +10:00
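The "run netstat once, then check every port" change in the commit above might look roughly like this. The sample output and the function name `ports_listening` are made up for illustration; in a real script the variable would be filled once with `netstat -l -t -n`, per the options named in the commit message.

```sh
# Captured once, before the loop.  Real use would be:
#   netstat_out=$(netstat -l -t -n)
# The lines below are hypothetical sample output.
netstat_out='tcp   0   0 0.0.0.0:445   0.0.0.0:*   LISTEN
tcp   0   0 0.0.0.0:139   0.0.0.0:*   LISTEN'

# ports_listening PORT...: succeed only if every port appears as a
# local listening port in the captured output.
ports_listening ()
{
    for _p ; do
        if ! printf '%s\n' "$netstat_out" | grep -q ":${_p}[[:space:]]" ; then
            echo "port ${_p} is not listening"
            return 1
        fi
    done
    return 0
}

ports_listening 445 139 && echo "all ports listening"
ports_listening 2049 || echo "check failed"
```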
Martin Schwenke
f0f9271301 Eventscripts: add a debug() function and call ctdb_set_current_debuglevel()
The debug function passes its arguments to echo if
$CTDB_CURRENT_DEBUGLEVEL is >= 4 (i.e. DEBUG).  If no args are given
then use stdin - this allows the function to be used with here
documents.

To ensure $CTDB_CURRENT_DEBUGLEVEL is set,
ctdb_set_current_debuglevel() is called near the end of the functions
file.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 6143483d9f87322578c00f12081e381f425226ca)
2011-08-17 10:39:35 +10:00
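A minimal sketch of a `debug()` function with the behaviour described in the commit above: echo the arguments at level 4 or higher, and fall back to stdin when no arguments are given. This is written from the commit message alone and may differ from the function in the actual functions file; treating an unset level as 0 is an assumption.

```sh
debug ()
{
    # Only produce output at DEBUG level (>= 4); unset is treated as 0.
    if [ "${CTDB_CURRENT_DEBUGLEVEL:-0}" -ge 4 ] ; then
        if [ $# -gt 0 ] ; then
            echo "$@"
        else
            cat        # no arguments: pass stdin through (here documents)
        fi
    fi
}

CTDB_CURRENT_DEBUGLEVEL=4
debug "one-line message"
debug <<EOF
multi-line message
from a here document
EOF
```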
Ronnie Sahlberg
9d0a5b167c Add a new command 'ctdb checktcpport <port>'
that tries to bind to the specified port on INADDR_ANY.

This can be used for testing if a service is listening to that port or not.

Errors are printed to stdout. The returned status code is either 0, if we managed to bind to the port (in which case the service is NOT listening on that port), or the value of errno that stopped us from binding to the port.

errno for EADDRINUSE is 98 so a script using this command should check the status code against the value 98.
If this command returns 98 it means the service is listening to the specified port.

(This used to be ctdb commit 04cbb490c5a075080923fde58af7082572c55c43)
2011-08-17 10:20:19 +10:00
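A script could interpret the exit status of `ctdb checktcpport` along these lines. The helper name `interpret_checktcpport` is hypothetical, and the value 98 for EADDRINUSE is taken from the commit message; it may differ on other platforms.

```sh
# interpret_checktcpport STATUS: map the exit status of
# "ctdb checktcpport <port>" to a human-readable verdict.
interpret_checktcpport ()
{
    case "$1" in
        0)  echo "not listening" ;;   # bind succeeded, so nobody holds the port
        98) echo "listening" ;;       # EADDRINUSE (value per the commit message)
        *)  echo "unexpected errno $1" ;;
    esac
}

interpret_checktcpport 0
interpret_checktcpport 98
```

In an eventscript the status would come from the command itself, for example by running `ctdb checktcpport 445` and passing `$?` to the helper.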
Ronnie Sahlberg
ce4555b7a6 Don't use too large a persistence timeout value
(This used to be ctdb commit 82628e32c431d66b806399ffb9657c3a031f6428)
2011-08-17 10:00:06 +10:00
Martin Schwenke
3e1a0528b8 Eventscripts - conditionally inherit ctdbd debug level in each monitor event
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit a7eebc06f81a7b0a3fba93759bcbdeabc8c2e86e)
2011-08-17 09:14:23 +10:00
Martin Schwenke
171bef3d68 Eventscripts - new function ctdb_set_current_debuglevel()
This function ensures that CTDB_CURRENT_DEBUGLEVEL is set.  It works
like this:

1. If it is already set then do nothing, since it might have been set
   some other way.

   The recommended "other way" would be to add a file in rc.local.d/.

2. If it is not set then set it by sourcing
   /var/ctdb/eventscript_debuglevel.

3. If this file does not exist then create it using output from "ctdb
   getdebug".

If the optional 1st argument is set to "create" then don't source an
existing file but create a new one instead - this is useful for
creating the file just once in each event run in, say, 00.ctdb.

If there's a problem getting the debug level from ctdb then it is
silently set to 0 - no use spamming logs if our debug code is
broken...

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 93910921c8a25f2b029733cd938069ff7c7bdab7)
2011-08-17 09:00:46 +10:00
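The three steps in the commit above can be sketched as below. Several things are assumptions: a temporary file stands in for /var/ctdb/eventscript_debuglevel, `get_ctdb_debuglevel` is a hypothetical stand-in for parsing the output of "ctdb getdebug", the function is renamed `set_current_debuglevel` to make clear it is not the real one, and "create" is assumed to have no effect when the variable is already set.

```sh
# Stand-in for /var/ctdb/eventscript_debuglevel.
debuglevel_file="${TMPDIR:-/tmp}/eventscript_debuglevel.$$"

# Hypothetical stand-in for extracting a numeric level from "ctdb getdebug".
get_ctdb_debuglevel () { echo 4 ; }

set_current_debuglevel ()
{
    # 1. Already set some other way: do nothing.
    [ -z "${CTDB_CURRENT_DEBUGLEVEL:-}" ] || return 0

    # 3. File missing, or "create" requested: (re)create the file.
    #    Any failure silently yields level 0.
    if [ "${1:-}" = "create" ] || [ ! -f "$debuglevel_file" ] ; then
        _l=$(get_ctdb_debuglevel 2>/dev/null) || _l=0
        echo "export CTDB_CURRENT_DEBUGLEVEL=${_l:-0}" >"$debuglevel_file"
    fi

    # 2. Set the variable by sourcing the file.
    . "$debuglevel_file"
}

unset CTDB_CURRENT_DEBUGLEVEL
set_current_debuglevel
echo "debug level: $CTDB_CURRENT_DEBUGLEVEL"
rm -f "$debuglevel_file"
```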
Martin Schwenke
430ca2f606 Eventscripts - ensure the statd update-trigger file always exists.
See the comment in the code for details.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 8ee9856996a8ec738e9d3ea7f1561605da526b8c)
2011-08-16 13:28:40 +10:00
Martin Schwenke
1452b63d27 Eventscripts: remove "return 0" from 50.samba service_stop().
This potentially masks errors and was basically included by accident.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit e7e4a1b4f31118027fd13a6223192f9957cf2e74)
2011-08-16 13:18:40 +10:00
Ronnie Sahlberg
81292ac0e6 Change the errors for 10.interface to clearly state ERROR: for error messages
Update the tests system to catch the new error strings generated by this change

(This used to be ctdb commit a2c30d88348da47d1a733a16e4c7d83c3becb6df)
2011-08-15 15:53:04 +10:00
Ronnie Sahlberg
569ea5c4e1 Merge remote branch 'martins/eventscript_tests'
(This used to be ctdb commit 4e670d9bc1bdeb2abd7e846bc36e02f0aa0d7309)
2011-08-15 15:43:15 +10:00
Martin Schwenke
65ff8b4b7b Tests - exportfs stub needs to print out export options.
This is needed due to bd39b91ad12fd05271a7fced0e6f9d8c4eba92e6.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 92f8e31f6995836b1668639a4dac2862efee269a)
2011-08-15 15:42:20 +10:00
Ronnie Sahlberg
1fb577f4b2 Merge remote branch 'martins/eventscript.10.interface'
(This used to be ctdb commit 0d17daab38d4086f922a8006d4c545133adca191)
2011-08-15 15:27:50 +10:00
Ronnie Sahlberg
bc00292cfe Merge remote branch 'martins/60_nfs_regression'
(This used to be ctdb commit 845fb0ba24cf9118470c58fae7103ab8322ce079)
2011-08-15 15:22:20 +10:00
Ronnie Sahlberg
2c5f1d7ccc Merge remote branch 'martins/eventscript.60.nfs.rpc'
(This used to be ctdb commit 2e30a2bb4371a846c7a768affa15883211642d5c)
2011-08-15 15:20:18 +10:00
Ronnie Sahlberg
775e188cb7 Merge remote branch 'martins/test_suite'
(This used to be ctdb commit f9899b1b96056d23628356589c855cf2262e5152)
2011-08-15 15:16:06 +10:00
Ronnie Sahlberg
846d1c77d6 Merge remote branch 'martins/eventscript_tests'
(This used to be ctdb commit 06b322ad6eff8d4e691f8e014b7d85983b261147)
2011-08-15 15:15:12 +10:00
Martin Schwenke
facd0ce624 Tests - ctdb listvars test should allow alphanumericals in tunable names.
This matches the new "LCP2PublicIPs" tunable.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 0680437bf5f02aeaed6387370e58bbdba2c04f28)
2011-08-15 13:56:26 +10:00
Ronnie Sahlberg
d1f5177374 Change the default for ip failover to be LCP2 and not DeterministicIPs
(This used to be ctdb commit 038916248a73d6a250108c9235c0c4f76dba8e0c)
2011-08-15 10:43:42 +10:00
Martin Schwenke
c9d168bbe4 Eventscripts: 10.interfaces - make startup event actually mark interfaces up!
The startup event intends to mark interfaces up.  However, it doesn't
actually do that because $INTERFACES is empty.

This uses the function get_all_interfaces() to list the
interfaces... and then mark them up.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit fc62bf0975c6059ee467285565d0dc3b4daaf238)
2011-08-12 16:34:34 +10:00
Martin Schwenke
5ab955a73d Eventscripts: 10.interfaces - startup comment says assume all interfaces good.
Interfaces are currently marked down.  Mark them up instead, as per
the comment... and discussion with Ronnie.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 35942841229cc72ce363a7236aec708f1a33136b)
2011-08-12 16:34:34 +10:00
Martin Schwenke
e7963d8a65 Eventscripts: 10.interfaces - new function get_all_interfaces().
Move existing interface listing code to new function in preparation
for using it in startup event.

While we're here change the "sort | uniq" into "sort -u" and save some
complexity.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit cd1442531ad079b11c60f46ee9d34f5104bef219)
2011-08-12 16:34:34 +10:00
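The "sort | uniq" to "sort -u" simplification mentioned in the commit above, shown on a hypothetical interface list (the interface names are sample data, not taken from the script):

```sh
# Hypothetical interface list with a duplicate entry.
ifaces='eth1
eth0
eth1'

two_step=$(printf '%s\n' "$ifaces" | sort | uniq)
one_step=$(printf '%s\n' "$ifaces" | sort -u)

printf '%s\n' "$one_step"
[ "$two_step" = "$one_step" ] && echo "both forms give the same list"
```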
Martin Schwenke
9bdcdb76be Eventscripts: 10.interface clean-ups - minor tweaks and new comments.
* sed can read files, it doesn't need a file piped to it
* use $() subshells instead of `` - they seem to quote better in dash
* tweak the uniquifying code so that it is easier to read
* add comments
* remove some extraneous semicolons at ends of lines

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 5f49537889a92c3cb68d9203912188bedf00ecd4)
2011-08-12 16:34:13 +10:00
Martin Schwenke
3b43805a31 Tests: re-enable the NFS eventscript tests - they work again.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 3e145ab1bb61ed2087ec5ce6183ee24802686ed3)
2011-08-12 16:30:54 +10:00
Martin Schwenke
32fe247e37 Eventscripts: In 60.nfs don't restart NFS when restarting rpc.lockd.
This effectively reverts 953dbfbddad656a64e30a6aca115cb1479d11573 and
is a policy decision.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 380c9263eb37db5a250264316e250c2160908263)
2011-08-12 16:28:09 +10:00
Martin Schwenke
7c33fb1711 Eventscripts: 10.interface clean-ups - variable name fix-ups.
Change most of the uppercase variable names to lowercase for
consistency with other variables, readability and so they can be
easily distinguished from environment/configuration variables.  Change
the names of 2 of the variables to add some clarity.  Changes are as
follows:

  INTERFACES   -> all_interfaces
  IFACES       -> ctdb_interfaces
  IFACE        -> iface
  I            -> i
  REALIFACE    -> realiface

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 7b201c1087b1433cfbc95de76cb4205e484ccd6f)
2011-08-12 15:57:34 +10:00
Martin Schwenke
6fa27bdf18 Eventscripts: 10.interfaces clean-ups - push logic into monitor_interfaces().
The logic in the monitor event itself is very complex.  Nearly all of
it can go away by adding a single check of
$CTDB_PARTIALLY_ONLINE_INTERFACES to the return logic of
monitor_interfaces() and reversing the sense of the corresponding
check.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit fa93177442c65c2a4eb2d5d5dba0a0da1c486969)
2011-08-12 15:00:03 +10:00
Martin Schwenke
00c4cc6d22 Eventscripts: 10.interfaces clean-up - use more descriptive variable names.
The name of variable $ok gives no clue to its meaning/use so this
changes that variable to be named $up_interfaces_found.

The return logic relating to $ok and $fail is difficult to read, so
these variables are given true/false values, allowing the return logic
to be simplified.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 3402930319d462eab5525410f6a676952e120182)
2011-08-12 14:49:27 +10:00
Martin Schwenke
bb5db84021 Eventscripts: 10.interfaces cleanup - new functions mark_up(), mark_down().
The same few lines of logic are used every time an interface is marked up or down.

This encapsulates those few lines in 2 new functions.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit ab443c4d7d282f282792abc6a6ac224ab06abe30)
2011-08-12 14:43:15 +10:00
Martin Schwenke
1d71dd08e3 Eventscripts: change failure counts and behaviour for statd and nfsd.
We reduce the number of failures before attempting a restart.
However, after 6 failures we mark the cluster unhealthy and no longer
try to restart.  If the previous 2 attempts didn't work then there
isn't any use in bogging the system down with an attempted restart on
every monitor event.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit f654739080b40b7ac1b7f998cacc689d3d4e3193)
2011-08-12 14:16:17 +10:00
Martin Schwenke
398116ff29 Eventscripts: clean up 60.nfs monitor event.
This adds a helper function called nfs_check_rpc_service() and uses it
to make the monitor event much more readable.  An example of usage is
as follows:

  nfs_check_rpc_service "mountd" \
    -ge 10 "verbose restart:b unhealthy" \
    -eq 5 "restart:b"

The first argument to nfs_check_rpc_service() is the name of the RPC
service to be checked.  The RPC service corresponding to this command
is checked for availability using the rpcinfo command.  If the service
is available then the function succeeds and subsequent arguments are
ignored.

If the rpcinfo check fails then a failure counter for that particular
RPC service is incremented and subsequent arguments are processed in
groups of 3:

1. An integer comparison operator supported by test.
2. An integer failure limit.
3. An action string.

The value of the failure counter is checked using (1) and (2) above.
The first check that succeeds has its action string processed - note
that this explains the somewhat curious reverse ordering of checks.

In the example above:

* If the counter is >= 10 then a verbose message is printed
  describing the failure, the service is restarted in the background
  and the node is marked as unhealthy (via an "exit 1" from the
  function).

* If the counter is == 5 then the service is restarted in the
  background.

For more action options please see the code.

This also changes the ctdb_check_rpc() function so that it no longer
takes a program number to check.  It now just takes a real RPC program
name that rpcinfo can resolve via /etc/rpc.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 9b66057964756a6245bafb436eb6106fb6a2866e)
2011-08-12 14:16:14 +10:00
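The "groups of 3" argument processing described in the commit above can be illustrated with a small sketch. The function `pick_action` is hypothetical: it only selects the action string for a given failure count and does not run rpcinfo, maintain counters or restart anything, all of which nfs_check_rpc_service() is described as doing.

```sh
# pick_action COUNT [OP LIMIT ACTION]...
# Print the ACTION of the first (OP, LIMIT) pair that matches COUNT.
pick_action ()
{
    _count="$1" ; shift
    while [ $# -ge 3 ] ; do
        # e.g. [ 12 -ge 10 ]; OP is any integer operator test supports
        if [ "$_count" "$1" "$2" ] ; then
            echo "$3"
            return 0
        fi
        shift 3
    done
    return 1        # no check matched: nothing to do
}

pick_action 5  -ge 10 "verbose restart:b unhealthy" -eq 5 "restart:b"
pick_action 12 -ge 10 "verbose restart:b unhealthy" -eq 5 "restart:b"
```

Because the first matching check wins, the stricter `-ge 10` check has to come before `-eq 5`, which is the reverse ordering the commit message remarks on.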
Martin Schwenke
881054a0ad Tests: Re-enable the Samba eventscript tests.
They work again.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 2036764bfd1a4571fcfcca22099c2b9a95a02c57)
2011-08-11 15:33:46 +10:00
Martin Schwenke
9f98ec85d9 Revert "Tests: tweak some samba tests to cope with debug from ctdb_check_tcp_ports()."
This reverts commit 557ac30e60516742da10b83bfbbbb41430c977a2.

(This used to be ctdb commit 9600cc7a6b7b854fac1a5b080129e3df8fcbd84e)
2011-08-11 15:32:28 +10:00
Martin Schwenke
1971336200 Eventscripts: fix regression in 60.nfs export checking.
Commit 35a60a63a9b5c7d98dde514ae552239506b691c9 introduced a
regression, reported by "Jonathan Buzzard" <J.Buzzard@dundee.ac.uk>,
as follows:

  Basically the use of sed in the following code snippet does not work
  for long exports where exportfs wraps the host or network onto the
  next line.

         exportfs | grep -v '^#' | grep '^/' |
         sed -e 's/[[:space:]]*[^[:space:]]*$//' |
         ctdb_check_directories

  The result is that you get lots of blank lines being sent to
  ctdb_check_directories which causes the host to be marked as
  unhealthy and then thrashing sets in of the managed IP's making the
  whole cluster unusable.

This tightens up the sed expression so that it is less likely to
produce a spurious empty line.  It also removes an unnecessary "grep -v".

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit bd39b91ad12fd05271a7fced0e6f9d8c4eba92e6)
2011-08-11 15:01:39 +10:00
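The regression in the commit above can be reproduced on made-up input. The sample below imitates exportfs output where a long path pushes the client onto the next line; the "tightened" expression is only one possible fix (requiring at least one whitespace character before the last field) and is an assumption, not necessarily the expression the commit uses.

```sh
# Hypothetical exportfs-style output: the first export is wrapped.
sample=$(printf '/very/long/export/path\n\t\t192.168.0.0/24\n/short\t<world>\n')

# Original expression: on the wrapped line the whole path matches
# "[^[:space:]]*$", so the line becomes empty.
old=$(printf '%s\n' "$sample" | grep '^/' |
      sed -e 's/[[:space:]]*[^[:space:]]*$//')

# Assumed tighter variant: only strip a field preceded by whitespace.
new=$(printf '%s\n' "$sample" | grep '^/' |
      sed -e 's/[[:space:]][[:space:]]*[^[:space:]]*$//')

echo "old expression, blank lines: $(printf '%s\n' "$old" | grep -c '^$')"
printf '%s\n' "$new"
```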
Ronnie Sahlberg
f9e58b502f Merge remote branch 'martins/eventscript.10.interface'
(This used to be ctdb commit 84ac667af408816e5508719b9fdb7c5e25408640)
2011-08-11 14:15:22 +10:00
Ronnie Sahlberg
b77a78d809 Merge remote branch 'martins/eventscript_infrastructure'
(This used to be ctdb commit 20864822372b6d574c545287002a429b273c4bcc)
2011-08-11 14:01:02 +10:00
Martin Schwenke
088620b026 Eventscripts: in 60.nfs move statd-notify code to service_reconfigure().
This means that it now occurs on every reconfigure event.  As a result
the ipreallocated event is removed.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit c45a89418ba733ff91d48340d72bdb6d2ef80051)
2011-08-11 13:56:25 +10:00
Martin Schwenke
eef89f83b2 Eventscripts - 60.nfs should define service_reconfigure().
Not $service_reconfigure.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 642292d7ba7a95567964b4160c7ee31a4f8985d1)
2011-08-11 13:55:02 +10:00
Ronnie Sahlberg
53b956fee7 When starting and stopping ctdb through the init-script, make sure we clear all public IPs before we start the daemon, in case they are still hanging around after a previous kill -9, and also make sure we drop them after we have stopped the daemon when shutting down
CQ S1027550

(This used to be ctdb commit 8de5513b3ad89711da845c7588d35b32e2f2acb6)
2011-08-11 11:48:04 +10:00
Martin Schwenke
3a760b09ed Eventscripts: improvements to ctdb_service_check_reconfigure().
* Make this function applicable to "ipreallocated" event too.

* Monitor event should not always succeed just because we reconfigure.

  If the service was unhealthy before the reconfigure and we end the
  reconfigure with "exit 0" then we can cause the node's health status
  to flip-flop.

  To avoid this we return the status of the service from the previous
  monitor event.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 21dfcbbdccd906fcd6ab7bba81418ce565bf63aa)
2011-08-11 10:46:57 +10:00
Martin Schwenke
e66a1af9b3 Eventscripts: 50.samba - only start/stop nmbd if $CTDB_SERVICE_NMB set.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit defaec99df8c279d8e315d5010f9146e013afda2)
2011-08-11 10:46:57 +10:00
Martin Schwenke
8fb04d451e Eventscripts: 50.samba needs null service_reconfigure() function.
Samba doesn't need to do anything for configuration changes.  It will
notice configuration changes and reload automatically.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit de13350c17261032a7468c2cf4d2cf4a8d66a840)
2011-08-11 10:46:57 +10:00
Martin Schwenke
b01d99a8fa Eventscripts: 40.vsftpd service_stop() no longer /dev/null's output.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit f928c201b6d0e1cd3e5568ae65186e3cee7c4988)
2011-08-11 10:46:57 +10:00
Martin Schwenke
1ea3616dcc Eventscripts: improvements to 41.httpd.
* Reduce the failure counts so that restart attempts happen sooner.

* Use service_start() and service_stop() for the restart.
  ctdb_service_start() resets the failure count, which isn't very
  useful in this context.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 01776b9f29af9ad5c8534649ece1bd100e450434)
2011-08-11 10:46:56 +10:00
Martin Schwenke
2a14f91722 Eventscript functions: new function ctdb_check_counter().
This should eventually be able to replace ctdb_check_counter_limit()
and ctdb_check_counter_equal(), although it doesn't issue warnings
like the former.

It takes 4 optional arguments:

1. _msg - If "error" then over limit causes an error message and an
   exit 1.  Anything else fails silently but the function returns 1.
   Default is "error".

2. _op - An integer operator supported by test (e.g. -eq, -ge, -gt).
   Default is -ge.

3. _limit - Limit for the counter to be used in comparison.  Default is
   $service_fail_limit.

4. _service_name - Used to identify the counter.  Default is
   $service_name.

For example:

  ctdb_check_counter error -ge 5 foo

will print a message and exit 1 if the counter for foo is >= 5,
whereas

  ctdb_check_counter check -ge 5 foo

will just return 1 if the counter for foo is >= 5, and

  ctdb_check_counter

will print a message and exit 1 if the counter for $service_name is >=
$service_fail_limit.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 5b01b7233515669e995e037205796e265643b176)
2011-08-11 10:46:56 +10:00
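A simplified sketch of the four optional arguments and their defaults, as described in the commit above. The fixed `counter_value` variable is a hypothetical stand-in for the stored per-service failure counter, the function is named `check_counter` to make clear it is not the real ctdb_check_counter(), and the message text is invented.

```sh
service_name="foo"
service_fail_limit=3
counter_value=5        # hypothetical stand-in for the stored counter

check_counter ()
{
    _msg="${1:-error}"
    _op="${2:--ge}"                       # default operator is -ge
    _limit="${3:-$service_fail_limit}"
    _service_name="${4:-$service_name}"

    if [ "$counter_value" "$_op" "$_limit" ] ; then
        if [ "$_msg" = "error" ] ; then
            echo "ERROR: $counter_value failures for $_service_name"
            exit 1
        fi
        return 1                          # silent failure
    fi
    return 0
}

# "error" exits, so run it in a subshell to keep this example going.
( check_counter error -ge 5 foo ) || echo "exited with status 1"
check_counter check -ge 5 foo || echo "returned 1 silently"
```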
Martin Schwenke
219c6fd55b Eventscripts: remove unused remove_ip() function.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 881af7c1417962b9b3ade6565b3e8eb9f9df7a97)
2011-08-11 10:46:56 +10:00
Martin Schwenke
5c948528b5 Eventscripts: startstop_nfs stop no longer redirects output to /dev/null.
When stopping (as opposed to restarting) it is useful to see this
information.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit a9ab1937239761dc32b143c9d225447bc6f090b4)
2011-08-11 10:46:56 +10:00
Martin Schwenke
caee6f1508 Eventscripts: fix typo in _ctdb_counter_common().
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit f57d1722b6aa082f3f826171acc57d7d796ea95c)
2011-08-11 10:46:56 +10:00
Martin Schwenke
ab693dbcc0 Eventscripts: improve log messages in ctdb_start_stop_service().
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 6da7095192fb172a06b434cfb02f4bfa6221b343)
2011-08-11 10:46:56 +10:00
Martin Schwenke
1b956b2b0a Eventscript functions: fix counter regression.
d362be7d32079ac1390d67056ce107bfbca2c937 wasn't well thought out.
Subsequent commits depend on ctdb_counter_init() taking an argument,
so this makes those cases work.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 05a8fcfbac3da2b5843b31e0fe258255cc761190)
2011-08-11 10:46:56 +10:00
Martin Schwenke
217edfa1c8 Eventscript functions: ctdb_service_check_reconfigure() acts only on monitor.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit beabf506a5eb68fc50fdbf8772c1d2bb0f7951e3)
2011-08-11 10:46:56 +10:00
Martin Schwenke
cd4074d2f8 Eventscripts: make 50.samba use $service_state_dir.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 0f003f05e28037eefdce3a686fcb52cd2289af9d)
2011-08-11 10:46:56 +10:00
Martin Schwenke
3d1f0100be Eventscripts: update 60.nfs to use ctdb_service_check_reconfigure.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 7c070b0bc86b3b9a91a9dc263b72c0567934535c)
2011-08-11 10:46:56 +10:00
Martin Schwenke
a35138a001 Eventscripts: update 60.nfs to use ctdb_setup_service_state_dir.
The state directory basename becomes "nfs" rather than "statd".  One
line of code is moved from the "startup" event to service_start().

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit cc4c5c19af7efe01c48f73bb5ec5e607ed79db4c)
2011-08-11 10:46:20 +10:00
Martin Schwenke
d6c5fcfbae Eventscripts: update 40.vsftpd to use ctdb_service_check_reconfigure.
To simplify we also remove the reconfigure from the recovered event
because the monitor event will handle this very quickly anyway.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit da3aedd1a472b430b75989d3c157efedd382e327)
2011-08-11 10:46:20 +10:00
Martin Schwenke
4daf8bb1c8 Eventscripts: update 41.httpd to use ctdb_service_check_reconfigure.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 51c45b1c4751af41e5f9fd252763e0025f8cce3a)
2011-08-11 10:46:20 +10:00
Martin Schwenke
820d9b30ea Eventscripts: rejig the reconfigure infrastructure.
* Add an optional service name argument to existing reconfigure
  functions.

* Use function service_reconfigure() instead of variable
  $service_reconfigure to specify how a service is reconfigured.

* New function ctdb_service_check_reconfigure() reconfigures a service
  if it is flagged for reconfigure.

* Remove $service_reconfigure settings from 40.vsftpd and 41.httpd -
  they're the defaults.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 15d4111d0761d82f57d5d4f0b1227812d14e4d7c)
2011-08-11 10:46:20 +10:00
Martin Schwenke
5b5bd3d27b Eventscript functions: move flagging of managed services.
Move flagging of managed or unmanaged services into
ctdb_service_start() and ctdb_service_stop().  That way services will
be correctly flagged if they are started from the startup and shutdown
events.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 8675744cbd90b5a5095ed6fff7b36ae82004a457)
2011-08-11 10:46:20 +10:00
Martin Schwenke
428e32d647 Eventscript function: change service_start into a function.
service_start is currently a variable.  This makes passing arguments
hard.  We change it to be a function and put default definitions into
the functions file.

We use a convention that if a service name argument is passed to a
redefined version of service_start() or service_stop() then it will
act unconditionally.  If no argument is passed then it can use
internal logic to decide if services should really be started.  This
is useful when a single eventscript handles multiple services.
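
A rough sketch of that convention, assuming an eventscript that handles two services. The service names and the MANAGE_A and MANAGE_B variables are hypothetical, and the echo commands stand in for real start commands.

```shell
#!/bin/sh
# Hypothetical sketch of the calling convention; not the CTDB code.

# Default definition, as it might appear in a shared functions file.
service_start ()
{
    echo "starting ${1:-$service_name}"
}

# Redefinition in an eventscript that handles two services.
service_start ()
{
    if [ -n "$1" ] ; then
        # A service name was passed: act unconditionally.
        echo "starting $1"
        return 0
    fi

    # No argument: use internal logic to decide what to start.
    if [ "$MANAGE_A" = "yes" ] ; then
        echo "starting service_a"
    fi
    if [ "$MANAGE_B" = "yes" ] ; then
        echo "starting service_b"
    fi
    return 0
}

MANAGE_A="yes"
MANAGE_B="no"
service_start              # internal logic: only service_a is started
service_start service_b    # explicit name: started regardless of MANAGE_B
```

Because service_start is a function rather than a variable, the explicit-name case needs no extra plumbing: the argument simply arrives as $1.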

This is a cherry-pick of ae38895 that needed to be reset mid-stream.
There is still some breakage following this commit.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 86e4aefed9fd1028660c98e3ea758c2b75ffc1d8)
2011-08-11 10:46:20 +10:00
Martin Schwenke
f60802c776 Eventscript functions: add optional event name argument to fail count functions.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit b14f18649f42aab80ce0336c15ab6159f241c9af)
2011-08-11 10:46:20 +10:00
Martin Schwenke
ea6a53e2b3 Eventscript functions - optimise is_ctdb_managed_service().
This function generates a lot of trace when running under "set -x".
This is due to the backward compatibility code.

This adds 3 optimisations:

1. Before invoking the backward compatibility code,
   is_ctdb_managed_service() returns early if the service is listed in
   $CTDB_MANAGED_SERVICES.

2. ctdb_compat_managed_service() actually now updates
   $CTDB_MANAGED_SERVICES instead of temporary variable $t.

   This means that a subsequent call to is_ctdb_managed_service() will
   short circuit due to optimisation (1).

3. ctdb_compat_managed_service() only adds a service to
   $CTDB_MANAGED_SERVICES if it is the service being checked by
   is_ctdb_managed_service().

   This stops irrelevant services being added to
   $CTDB_MANAGED_SERVICES multiple times by multiple calls to
   is_ctdb_managed_service().
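
The three optimisations can be sketched together as below. This is an illustration under stated assumptions rather than the real implementation: is_managed(), compat_managed(), MANAGED_SERVICES and the LEGACY_MANAGES_* variables are hypothetical stand-ins for the functions and variables named above.

```shell
#!/bin/sh
# Hypothetical sketch; names are stand-ins, not the CTDB functions.
MANAGED_SERVICES="samba"
LEGACY_MANAGES_WINBIND="yes"
LEGACY_MANAGES_NFS="yes"

# Backward compatibility path: translate legacy per-service settings
# into the list. It only looks at the service being checked (3) and
# updates the list itself rather than a temporary variable (2).
compat_managed ()
{
    case "$1" in
        winbind) _flag="$LEGACY_MANAGES_WINBIND" ;;
        nfs)     _flag="$LEGACY_MANAGES_NFS" ;;
        *)       _flag="" ;;
    esac
    if [ "$_flag" = "yes" ] ; then
        MANAGED_SERVICES="$MANAGED_SERVICES $1"
    fi
}

is_managed ()
{
    # (1) Return early if the service is already listed.
    case " $MANAGED_SERVICES " in
        *" $1 "*) return 0 ;;
    esac

    compat_managed "$1"

    case " $MANAGED_SERVICES " in
        *" $1 "*) return 0 ;;
    esac
    return 1
}

is_managed winbind && echo "winbind is managed"
is_managed winbind && echo "second check took the early return"
echo "list: $MANAGED_SERVICES"
```

After the two calls the list holds winbind once and does not hold nfs, since nfs was never the service being checked.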

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 758f4667c60089e09a0439c1eb74f5e426ca5e2e)
2011-08-11 10:46:20 +10:00
Martin Schwenke
6ec2cfc7da 50.samba eventscript should use is_ctdb_managed_service "winbind".
Currently it checks $CTDB_MANAGES_WINBIND directly in several places.
This doesn't work when someone sets $CTDB_MANAGED_SERVICES directly.

This modifies check_ctdb_manages_winbind() so that it returns a
condition rather than modifying $CTDB_MANAGES_WINBIND.  This makes
some code more readable.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 538902fbc1e74134a03987b36b3733ad641f8971)
2011-08-11 10:46:20 +10:00
Martin Schwenke
e96e655430 50.samba eventscript should use is_ctdb_managed_service "samba".
Currently it checks $CTDB_MANAGES_SAMBA directly.  This doesn't work
when someone sets $CTDB_MANAGED_SERVICES directly.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit d8f0f8948abd340088720718fef7dc858661ba23)
2011-08-11 10:46:20 +10:00
Martin Schwenke
45bcf843ec 50.samba eventscript should stop/start services when they become (un)managed.
When the value of $CTDB_MANAGES_SAMBA or $CTDB_MANAGES_WINBIND changes
(or corresponding changes are made to $CTDB_MANAGED_SERVICES), the
associated service should be started or stopped as necessary.

This adds calls to ctdb_start_stop_service() to manage
starting/stopping samba and winbind.

An associated cleanup is made to the initial checks that one of
$CTDB_MANAGES_SAMBA or $CTDB_MANAGES_WINBIND is set, replacing them
with calls to is_ctdb_managed_service().

To handle the winbind cases ctdb_start_stop_service() and
is_ctdb_managed_service() are updated to take an optional service name
parameter.

Signed-off-by: Martin Schwenke <martin@meltin.net>

Conflicts:

	config/events.d/50.samba

	Most of this merged elsewhere.  This just removes a check that
	this is the monitor event.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 257a2e350280c0b76ed2fac588cad167381fda52)
2011-08-11 10:46:20 +10:00
Ronnie Sahlberg
21226ee738 Add documentation for the new filesystem use monitoring
(This used to be ctdb commit 9f10c5d48a08ffb3417f880c801aed2aa2dc1355)
2011-08-11 10:07:50 +10:00
Ronnie Sahlberg
ee96db07d5 Add new eventscript 40.fs_use that can be used to monitor file system use and flag a node unhealthy when a file system becomes full
(This used to be ctdb commit 2fd1babf8135ad5d53f3b25ba823d840ebc66460)
2011-08-11 10:04:40 +10:00
Ronnie Sahlberg
c8a18e8f9a make the persistent even longer for lvs to make people even happier
(This used to be ctdb commit 8158077624eb763ba40c6a7b4b7faf3867b205d7)
2011-08-11 09:12:38 +10:00
Ronnie Sahlberg
543701293f increase the persistent timeout to make people happier
(This used to be ctdb commit 68ea19cb02017e93769df7f6312d5e0bef55e605)
2011-08-11 07:14:57 +10:00
Ronnie Sahlberg
f9156adef5 check the shares if they are available before we decide to try to restart nfs
CQ S1027529

(This used to be ctdb commit b6c6a4588ccf6ef78fabfd76d228f56b4eb65165)
2011-08-11 07:14:16 +10:00
Martin Schwenke
4e60075228 Eventscripts - fix 10.interface bash incompatibility.
In dash, this fails gracefully with nothing to stderr:

  t=$(cat /does_not_exist) 2>/dev/null

In bash the error from cat is still printed due to different order of
evaluation.

This works everywhere:

  t=$(cat /does_not_exist 2>/dev/null)
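
A small sketch of how the portable form might be used in a script. The fallback assignment and the message are illustrative additions, not part of the 10.interface change.

```shell
#!/bin/sh
# The redirection sits inside the command substitution, so it applies
# to cat itself and does not depend on the order in which the shell
# evaluates substitutions and redirections.
t=$(cat /does_not_exist 2>/dev/null) || t=""

if [ -z "$t" ] ; then
    echo "no data read"
fi
```

The "|| t=\"\"" fallback also keeps the assignment from aborting a script that runs under "set -e".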

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit a6e61867c7a58d5a77cd8641d8df0b105cddff77)
2011-08-10 16:06:26 +10:00
Martin Schwenke
06f1004da4 Merge branch 'eventscript.20.multipathd' into eventscript.00.ctdb
(This used to be ctdb commit 8723b88b0b2bbeece38c74c77c50e8d8b3e2d5ca)
2011-08-10 15:32:58 +10:00
Martin Schwenke
383b203096 Merge branch 'eventscript.62.cnfs' into eventscript.20.multipathd
(This used to be ctdb commit fb87fa9273db4f82e801a331b5d95059d64dfb8e)
2011-08-10 15:32:11 +10:00
Martin Schwenke
7eae4aafca Merge branch 'eventscript.13.per_ip_routing' into eventscript.62.cnfs
(This used to be ctdb commit cfa4102ec0d97e1d1d3c1ce6407ffacdb85c2e10)
2011-08-10 15:31:13 +10:00
Martin Schwenke
098da255fa Eventscripts: update 61.cnfs to use ctdb_setup_service_state_dir.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit afafeb1fb12384bddff470d38b534f513a1f3b07)
2011-08-10 12:27:41 +10:00
Martin Schwenke
061b7adad6 Eventscripts: update 13.per_ip_routing to use ctdb_setup_service_state_dir.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 18e0236754507a9475653f04bb239c5d46ba51de)
2011-08-09 17:35:37 +10:00