mirror of https://github.com/samba-team/samba.git synced 2025-01-10 01:18:15 +03:00

1163 Commits

Author SHA1 Message Date
Martin Schwenke
910e138cb3 eventscripts: Remove ganesha support from nfs_check_rpc_service()
This is unused so doesn't need to be maintained.  An attempt to use it
now will explicitly fail rather than implicitly fail via bitrot.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 887733dd7be53158bfe07b30ef31b611d0f8122f)
2013-05-06 20:40:58 +10:00
Martin Schwenke
944d063a3e Revert "Eventscript functions: add optional version to nfs_check_rpc_service()"
This reverts commit 92f74fd589467b46c758e116e97417edfe8773d7.

This change is unused and is just complicating the function.

Conflicts:
	config/functions

(This used to be ctdb commit 77302dbfd85754e02559eccb2dd6c090db0b6b9f)
2013-05-06 20:40:58 +10:00
Martin Schwenke
577a3cae5d eventscripts: Move rpc.statd existence check into nfs_check_rpc_service ()
The code in 60.nfs is going to be genericised, so make all the checks
look the same.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 15b0f78cbf8d6ba481b7eba9e4fe3f4270214c72)
2013-05-06 20:40:58 +10:00
Martin Schwenke
6c347a5294 eventscripts: Factor NFS RPC check action code into nfs_check_rpc_action()
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 4b4e7d8f0e8dcbab987e374d06ffaa21c06da0d3)
2013-05-06 20:40:58 +10:00
Martin Schwenke
2bc807f974 eventscripts: Remove unused function ctdb_check_counter_limit()
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit a8ef00608e48a551a334aded206146807aeb4c5a)
2013-05-06 16:24:59 +10:00
Martin Schwenke
460d0651b6 eventscripts: Use ctdb_check_counter() instead of ctdb_check_counter_limit()
ctdb_check_counter_limit() can soon be removed...

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit bb2cdff77e8ec79e7d319159b9c9848ecfaaa0f1)
2013-05-06 16:24:59 +10:00
Martin Schwenke
8373226251 eventscripts: Might as well try to stat the reclock file first
It is in the background but it still might cause the counter to be
reset before it is checked.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit ef2cf75e95ff382c65524a4d77eb00ab8411d2fc)
2013-05-06 16:24:58 +10:00
Martin Schwenke
31c3edcadf eventscripts: Make the early exit in 01.reclock earlier
That way we don't even check the counter...

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 136abd4604dc68f7c696704bac708bae53cf1940)
2013-05-06 16:24:58 +10:00
Martin Schwenke
29a3823e40 eventscripts: Minor cleanups for killtcp/tickle functions
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 25ef4f655f1efc833deb5e244f9fff461e92f439)
2013-05-06 16:24:50 +10:00
Martin Schwenke
189a5c003c eventscripts: Tweak the timeout check in kill_tcp_connections()
This has 2 advantages:

1. It uses get_tcp_connections_for_ip() to check for leftover
   connections, instead of custom code.

2. It checks for the timeout condition before sleeping.  The current
   code sleeps and then checks, so wastes a second.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 60a08eb96e1d97aab31e9bd4af01683c650541c2)
2013-05-06 16:22:15 +10:00
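The second advantage listed in 189a5c003c can be illustrated with a small sketch. The function name `wait_until_gone` and its arguments are hypothetical and are not taken from the functions file; the point is only that the timeout is tested before sleeping, so the last iteration does not waste a second.

```sh
# Hypothetical helper: poll "$@" (a command that succeeds while work
# remains) up to $1 times, one second apart.
# Returns 0 once the command fails (nothing left), 1 on timeout.
wait_until_gone ()
{
    _max="$1" ; shift
    _count=0
    while "$@" ; do
        _count=$((_count + 1))
        # Check for the timeout *before* sleeping
        if [ "$_count" -ge "$_max" ] ; then
            return 1
        fi
        sleep 1
    done
    return 0
}
```

With the check placed after the sleep instead, a call that is already out of tries would still pause for one second before giving up.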
Martin Schwenke
8f84a2bec7 eventscripts: In killtcp/tickle functions, $_failed should be boolean
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 319c1b68d5aa78f82a68febcad233a7c78afc887)
2013-05-06 16:22:07 +10:00
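One common way to keep a boolean flag in POSIX shell, as 8f84a2bec7 suggests for $_failed, is to store the command name `true` or `false` and execute it. The sketch below is an assumption about the general idiom, not the actual killtcp/tickle code; `check_all` and the "bad" marker are made up for illustration.

```sh
# Hypothetical loop: remember whether any item failed, using the
# commands "true" and "false" as the flag value.
check_all ()
{
    _failed=false
    for _i in "$@" ; do
        # An item counts as failed if it is the literal word "bad"
        if [ "$_i" = "bad" ] ; then
            _failed=true
        fi
    done

    # The flag is executed as a command, so no string comparison is needed
    if $_failed ; then
        return 1
    fi
    return 0
}
```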
Martin Schwenke
ed59deaee3 eventscripts: Remove unused $_killcount from tickle_tcp_connections()
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 8514ca56830b30e7f0eb5018632640daaf8ff65d)
2013-05-06 16:16:56 +10:00
Martin Schwenke
975ea7fb7a eventscripts: Refactor connection listing in killtcp and tickle functions
Uses new function get_tcp_connections_for_ip().  This avoids using a
temporary file and running netstat twice.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit a621622903c7ef17764b15293d6ea8df5a53c7e1)
2013-05-06 16:16:50 +10:00
Martin Schwenke
a320e1f7f1 eventscripts: Reimplement kill_tcp_connections_local_only()
... using kill_tcp_connections()

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 10e4db8f796d1e3259733180494db3b4bbad291a)
2013-05-06 15:45:11 +10:00
Martin Schwenke
5e828b48fe eventscripts: Change handling of one-way kills in kill_tcp_connections()
This change is a no-op.  However, in a subsequent commit we'll merge
kill_tcp_connections_local_only() with this function.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 23c0f5f48e3e5a0c1a3254c582299f7893cf0d33)
2013-05-06 15:45:10 +10:00
Martin Schwenke
d98d931af3 eventscripts: Remove unnecessary variables from killtcp/tickle functions
Setting these variables spawns lots of unnecessary processes, which
would surely slow down these functions on a busy system.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 3eae161472e6352f7f656851c73dc056f95113eb)
2013-05-06 15:45:10 +10:00
Martin Schwenke
6e2863a4f9 eventscripts: Clean up ctdb_check_command()
* Command is now multiple arguments, preserving quoting
* $service_name no longer printed, no longer an argument
* Debug output from failed command

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 9e25fb261447a196de05937052779b36e75e7215)
2013-05-06 15:45:10 +10:00
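The first and third bullets of 6e2863a4f9 describe a general shell pattern that might look like the following. This is a minimal sketch under assumptions: `check_command` is a hypothetical name and the error text is invented; it is not the body of ctdb_check_command().

```sh
# Hypothetical checker: the command is passed as separate arguments and
# run via "$@", so an argument containing spaces stays a single word.
check_command ()
{
    _out=$("$@" 2>&1) || {
        # Show what the failed command printed, for debugging
        echo "ERROR: $* failed"
        [ -z "$_out" ] || echo "$_out"
        return 1
    }
}
```

Passing the command as a single string and running it with eval would lose this quoting unless the caller escaped every argument by hand.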
Martin Schwenke
30addb886a eventscripts: Clean up ctdb_check_directories()
The documentation comments are wrong... and remove the optional
$service_name argument.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit d9e6cb945c5edac9ca6405c9228bf647fab814f5)
2013-05-06 15:45:10 +10:00
Martin Schwenke
0ad8f46db3 eventscripts: Assert that $service_name is set in a few key places
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 3d0a7d83ddc824961d876fc9afba829c90aef3e7)
2013-05-06 15:45:10 +10:00
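An assertion of the kind 0ad8f46db3 describes can be written with the POSIX `${var:?message}` expansion. The sketch below assumes that idiom; the commit does not say how the check is implemented, and `assert_service_name` is a hypothetical name.

```sh
# Hypothetical guard: abort with a message if $service_name is unset or
# empty.  In a non-interactive shell, ${var:?msg} prints msg to stderr
# and exits the shell.
assert_service_name ()
{
    : "${service_name:?must be set before calling service functions}"
}
```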
Martin Schwenke
5dd9e52e46 eventscripts: counters default to $script_name if $service_name not set
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit fff88940f71058e4eefd65f50a6701389c005c17)
2013-05-06 15:45:10 +10:00
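The fallback in 5dd9e52e46 corresponds to the POSIX default-value expansion. A minimal sketch, assuming that expansion is used (`counter_name` is a hypothetical helper, not a function from the functions file):

```sh
# Hypothetical helper: pick the name used for a counter.
# ${service_name:-$script_name} expands to $script_name when
# $service_name is unset or empty.
counter_name ()
{
    echo "${service_name:-$script_name}"
}
```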
Martin Schwenke
e9abc9c070 eventscripts: Simplify handling of $service name in "managed" functions
Complicated argument handling was introduced to deal with multiple
services per eventscript.  This was a failure and we split 50.samba.

This simplifies several functions to use global $service_name
unconditionally instead of having an optional argument.

$service_name is no longer automatically set in the functions file.
This means it needs to be explicitly set in 13.per_ip_routing because
this script uses ctdb_service_check_reconfigure().

Eventscript unit test infrastructure needs to set $service_name during
fake service setup, and policy routing tests need to be updated
accordingly.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 27aab8783898a50da8c4bc887b512d8f0c0d842c)
2013-05-06 15:45:10 +10:00
Martin Schwenke
c56acf7127 eventscripts: Simplify handling of $service name in start/stop functions
Complicated argument handling was introduced to deal with multiple
services per eventscript.  This was a failure and we split 50.samba.

This simplifies several functions to use global $service_name
unconditionally instead of having an optional argument.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit b5802c4735e1c719a5cf9ce69489d5947bd5e8c5)
2013-05-06 15:45:10 +10:00
Martin Schwenke
8065366b33 eventscripts: Simplify handling of $service name in service_management
Complicated argument handling was introduced to deal with multiple
services per eventscript.  This was a failure and we split 50.samba.

This simplifies several functions to use global $service_name
unconditionally instead of having an optional argument.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit e24baac0d2952e86d5ff31235901f06e2f2b2449)
2013-05-06 15:45:10 +10:00
Martin Schwenke
4c9438b2a3 eventscripts: Simplify handling of $service name in reconfigure functions
Complicated argument handling was introduced to deal with multiple
services per eventscript.  This was a failure and we split 50.samba.

This simplifies several functions to use global $service_name
unconditionally instead of having an optional argument.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit c2ea72ff565222f9edab408638bd45dbba6e8ff7)
2013-05-06 15:45:10 +10:00
Martin Schwenke
642848b916 eventscripts: Remove unused function ctdb_check_counter_equal()
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit fd536a26b310b5bf9628da62cca0b425f4a54030)
2013-05-06 15:45:10 +10:00
Martin Schwenke
bbd0ed0e29 scripts: Fix script_log() regression
5940a2494e9e43a83f2bca098bd04dfc1a8f2e93 makes script_log() always
pass a message to logger, so script_log() can no longer log stdin.

Put all the tag fu in the actual tag so the message argument is empty
if no message was passed.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 9dee4c84273633b9ad82e94dabbf0e6f86edbcef)
2013-05-06 15:43:16 +10:00
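The behaviour bbd0ed0e29 restores relies on logger reading standard input when it is given no message argument. A sketch of that idea follows; `script_log_sketch` and the `LOGGER` override are hypothetical and exist only so the wrapper can be exercised without writing to syslog.

```sh
# Hypothetical wrapper.  $LOGGER defaults to the real logger(1) but can
# be replaced by a stub for testing.
script_log_sketch ()
{
    _tag="$1" ; shift
    # With a message, "$@" passes it on; with none, "$@" expands to
    # nothing and logger falls back to reading stdin.
    ${LOGGER:-logger} -t "$_tag" "$@"
}
```

Because everything that identifies the source lives in the tag, a call with no message leaves the argument list empty, so piped input such as `some_command 2>&1 | script_log_sketch ctdbd` is logged.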
Martin Schwenke
27a5b78c8e initscript: Look for tdbtool/tdbdump using which, not in fixed locations
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit c74cc0442eb90d859eae270b59456d28605817c4)
2013-05-06 15:40:30 +10:00
Martin Schwenke
fa16cccf02 ctdbd: Remove the "stopped" event
It isn't used, superseded by "ipreallocated".

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit c2bb8596a8af6406ef50e53953884df9d6246a96)
2013-05-06 13:38:21 +10:00
Martin Schwenke
fb028a208c eventscripts: Remove use of "stopped" event
Use "ipreallocated" instead.  The "stopped" event pre-dates the
"ipreallocated" event.  The only way of stopping a node is via the
ctdb tool, which explicitly causes a takeover run to occur after the
node is stopped.  The takeover run will generate an "ipreallocated"
event.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 978d4a0d6d8c9877b23f72e3a7b78c1245d16908)
2013-05-06 13:38:21 +10:00
Martin Schwenke
823edbf6fe scripts: Ensure even external scripts get tagged in logs as "ctdbd"
Our practice is to search logs for "ctdbd:".  We want to make sure we
find everything.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 5940a2494e9e43a83f2bca098bd04dfc1a8f2e93)
2013-04-22 13:58:36 +10:00
Martin Schwenke
fb8be43d6d eventscripts: Ensure directories are created
Previous commits stopped the top level of the script from creating
certain directories but some functions assume that required
directories exist.

Create those directories instead.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 0076cfc4666e5a96eb2c8affb59585b090840e00)
2013-04-22 13:58:36 +10:00
Martin Schwenke
903f4c394c scripts: Clean up update_tickles() and handling of associated directory
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 700cf95a1f29b4b88460a00a55d57a9e397011e0)
2013-04-19 13:13:36 +10:00
Martin Schwenke
100a0eed90 scripts: Use $CTDB_SCRIPT_DEBUGLEVEL instead of something more complex
The current logic is horrible and creates an unnecessary file.  Let's
make the script debug level independent of ctdb's debug level.

* Have debug() use $CTDB_SCRIPT_DEBUGLEVEL directly

* Remove ctdb_set_current_debuglevel()

* Remove the "getdebug" command from ctdb stub in eventscript unit
  tests

* Update relevant eventscript unit tests to use
  $CTDB_SCRIPT_DEBUGLEVEL

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 85efa446c7f5c5af1c3a960001aa777775ae562f)
2013-04-19 13:13:36 +10:00
Martin Schwenke
f54dab03d5 scripts: Ensure service command is in $PATH in ctdb-crash-cleanup.sh
Move the use of the service command below inclusion of functions file,
which sets $PATH.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit d254d03f69cbdc3e473202b759af6e1392cbb59c)
2013-04-19 13:12:36 +10:00
Martin Schwenke
d24077922f initscript: Remove duplicate setting of $ctdbd
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit e7a4b7e35a1e4b826846e2494a3803abb57065ee)
2013-04-18 13:22:12 +10:00
Martin Schwenke
1f5bfde553 scripts: ctdb-crash-cleanup.sh uses initscript to see if ctdbd is running
"ctdb ping" can time out.  How many times should we try?

Instead, depend on the initscript to implement something sane.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit 90cb337e5ccf397b69a64298559a428ff508f196)
2013-04-18 13:22:12 +10:00
Martin Schwenke
38366b6b53 initscript: Use a PID file to implement the "status" option
Using "ctdb ping" and "ctdb status" is fraught with danger.  These
commands can time out when ctdbd is running, leading callers to believe
that ctdbd is not running.  Timeouts could be increased but we would
still have to handle potential timeouts.

Everything else in the world implements the "status" option by
checking if the relevant process is running.  This change makes CTDB
do the same thing and uses standard distro functions.

This change is backward compatible in the sense that a missing
/var/run/ctdb/ directory means that we don't do a PID file check but
just depend on the distro's checking method.  Therefore, if CTDB was
started with an older version of this script then "service ctdb
status" will still work.

This script does not support changing the value of CTDB_VALGRIND
between calls.  If you start with CTDB_VALGRIND=yes then you need to
check status with the same setting.  CTDB_VALGRIND is a debug
variable, so this is acceptable.

This also adds sourcing of /lib/lsb/init-functions to make the Debian
function status_of_proc() available.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit 687e2eace4f48400cf5029914f62b6ddabb85378)
2013-04-18 13:22:12 +10:00
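The process-based check described in 38366b6b53 can be sketched in plain shell. The real initscript uses distro helper functions (for example status_of_proc() on Debian), so the version below is only an illustration of the idea; `pidfile_status` is a hypothetical name and the PID file path is whatever the caller passes.

```sh
# Hypothetical status check: succeed if the PID recorded in the file
# belongs to a running process.
pidfile_status ()
{
    _pidfile="$1"
    [ -r "$_pidfile" ] || return 1
    read _pid < "$_pidfile" || return 1
    # kill -0 sends no signal; it only tests that the process exists
    # and that we are allowed to signal it
    kill -0 "$_pid" 2>/dev/null
}
```

Unlike "ctdb ping", this never talks to the daemon, so it cannot time out while ctdbd is busy.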
Amitay Isaacs
d931e73fb8 statd-callout: Make sure statd callout script always runs as root
In RHEL 6+, rpc.statd runs as "rpcuser" instead of root as on RHEL 5. This
prevents CTDB tool commands from talking to the daemon, since "rpcuser" cannot
access the CTDB socket.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-Programmed-With: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit fe8c4880b371492a38554868d4ca10918c54e412)
2013-04-08 11:14:28 +10:00
Amitay Isaacs
6e650b6ee5 eventscripts: Remove calls to "smbstatus -np" for samba cleanup
This is an artifact from older versions of Samba. In the newer versions of
Samba, "smbstatus -np" command does not do anything useful, but causes a
traverse in CTDB which is expensive and causes CPU utilization to shoot up.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 053b89c6dbce47001505524606889334559d2ec4)
2013-02-11 11:25:49 +11:00
Martin Schwenke
8c9eedbce3 initscript: export CTDB_EXTERNAL_TRACE
This means it can be set like any other configuration option in the
configuration file, without needing to export it there.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit a0ef73e197dc9147f7718e0813fe803ff0b3d54d)
2013-02-05 16:05:13 +11:00
Martin Schwenke
bc5f0a2b65 ctdbd: Remove command-line option --debug-hung-script
Use an environment variable instead.  This just means that the
initscript exports CTDB_DEBUG_HUNG_SCRIPT and the code checks for the
environment variable.

The justification for this simplification is that more debug options
will be arriving soon and we want to handle them consistently without
needing to add a command-line option for each.  So, the convention
will be to use an environment variable for each debug option.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 0581f9a84e58764d194f4e04064c2c5b393c348b)
2013-02-05 16:05:13 +11:00
Mathieu Parent
69afd9abc5 doc: allows to -> allows one to
Signed-off-by: Mathieu Parent <math.parent@gmail.com>

(This used to be ctdb commit 95fc493a7d4145f976cb3fe928d9e92faec4dd71)
2013-01-22 18:03:35 +11:00
Srikrishan Malik
28cbe527d4 Changes for unobtrusive recovery and new method for health check.
Unobtrusive recovery: Ganesha will not be restarted on failovers.

Ganesha health: Use the counters in /var/lib/nfs/ganesha_local to track progress
instead of the null call which can timeout if the server is too busy.

Signed-off-by: Srikrishan Malik <srimalik@in.ibm.com>
Signed-off-by: Lance Russell <lancerus@us.ibm.com>

(This used to be ctdb commit 0e651e9da0f1f3c836b4474612ab13d0ccd272d9)
2013-01-11 17:16:46 +11:00
Martin Schwenke
aca9299669 eventscripts: Fail the setup event if CTDB does not become ready
Currently it silently continues without attempting to set tunables.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 735ec99b99c7bb579851ce8293011aaf1dcc552a)
2013-01-09 12:45:59 +11:00
Martin Schwenke
4f622fe9fb scripts: Make script_log() use supplied message, stop logger from hanging
When using syslog any provided message arguments are ignored and not
passed to logger.  This means that logger blocks waiting on stdin.
That's bad.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 50abf597cefe6f8ea2a2ff7694bf84641344a9b1)
2013-01-08 15:18:47 +11:00
Martin Schwenke
095fac9491 scripts: Rework ctdb-crash-cleanup.sh so that it uses existing functions
This improves maintainability.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit e2aaa64925cca359c71520e01a18fc9461b0da4d)
2013-01-08 15:18:47 +11:00
Martin Schwenke
d801b02681 scripts: Make drop_all_public_ips() more robust
Incorporate some of the logic from ctdb-crash-cleanup.sh that ensures
IPs are deleted even if they have the wrong netmask or are on the
wrong interface.

Factoring out some of the code will allow it to be used elsewhere.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 03356fd5ae7a3ac35fde0289cbea7c71ecf07367)
2013-01-08 15:18:47 +11:00
Martin Schwenke
4157efdcbb scripts: debug-hung-script.sh doesn't need functions/loadconfig
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 8507303b525d20c74e8ec4e7c4f5f275945cd3b6)
2013-01-08 15:18:47 +11:00
Martin Schwenke
f5226c9a75 scripts: statd-callout should calculate CTDB_BASE if it is not set
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 376015ba5ad6b7703ae9949a1d40a0c72dfaba0c)
2013-01-08 15:18:46 +11:00
Martin Schwenke
297b98d5b6 eventscripts: Each script should set CTDB_BASE if it is not set
This makes it easier to run the scripts externally.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 740ea8ea5084149c8b552a01ee1c98c558b12384)
2013-01-08 15:18:46 +11:00
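One way a script could compute CTDB_BASE for itself, as 297b98d5b6 describes, is from its own location. The sketch assumes the script lives in a subdirectory directly under CTDB_BASE; that layout, and the name `set_ctdb_base`, are assumptions for illustration rather than details given in the commit.

```sh
# Hypothetical helper: if CTDB_BASE is unset or empty, derive it as the
# parent of the directory containing the given script path.
set_ctdb_base ()
{
    [ -n "$CTDB_BASE" ] || \
        CTDB_BASE=$(cd -P "$(dirname "$1")" && dirname "$PWD")
    export CTDB_BASE
}
```

A script would call it as `set_ctdb_base "$0"` before sourcing anything relative to $CTDB_BASE; a value already present in the environment is left alone.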
Martin Schwenke
0eb757329e scripts: Move drop_all_public_ips() to the functions file
... so it can be improved and used elsewhere.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit b23c30253cc9eb274b895cac0f8c65245ba0a200)
2013-01-08 15:18:46 +11:00
Martin Schwenke
217ad07b72 Eventscripts: Change the default reconfigure action to do nothing
A default action of restarting the service doesn't obey the principle
of least surprise.  It caused the NFS service restart to be implicitly
reintroduced.

This allows no-op functions to be removed from some eventscripts and
service restart functions to be added to others.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit c75b5e5b4d000f5c7dab403df8238ceed390c1c0)
2013-01-07 10:35:39 +11:00
Martin Schwenke
3d408ca1e1 Eventscripts: Do not restart NFS on reconfigure
It looks like this restart was accidentally reintroduced in commit
fc0678d351187cfa4c71123f97c0f493aacd5d16 when $service_reconfigure
became unset so the default action of restarting the service would
occur.  From there cleanups have explicitly reintroduced it and
carried it through the code.

Also update the unit tests affected by this change.

The restart was originally removed in commit
bc481c3f1a44c50648488c4f8a7f15ec395d446f.

The default reconfigure action of restarting a service is clearly
suboptimal and will be addressed in a separate patch.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 2629de72e1f37b5e46772c2ef8d8d0012fc4ed37)
2013-01-07 10:35:39 +11:00
Martin Schwenke
df7152fe87 Initscript: when checking status, print output of "ctdb ping" if it fails
At the moment the caller has no idea why it thinks CTDB isn't running
and we can't debug failures...

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 776590bf84d221092298346a28d7fc0552a67c9d)
2013-01-07 10:35:38 +11:00
Michael Adam
b64e237f9b events/50.samba: fix testparm background update
creating the smb.conf cache with "-v" results in a cache file
that fails to load with "testparm -s ..." later on due to
"copy = " not being processable. (Copying the empty service name fails).

Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit 81788cfabe960497b050c5ee4e4e487ee061012a)
2013-01-05 01:15:19 +01:00
Martin Schwenke
8fad7670f1 Eventscripts: 10.interface should list configured interfaces
The current code lists available interfaces.  If IPs are configured in
some other way than the public addresses file (e.g. ctdb addip) and their
interfaces default to being marked down then, since down interfaces are
not available, these interfaces can never be marked up.

The configured interfaces should be listed instead.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit d8f010355b715e49709836e057a5d0f110919897)
2012-11-19 15:54:50 +11:00
Martin Schwenke
f082f4006f Eventscripts: 10.interface startup event should only process interfaces once
Provided that monitor_interfaces() sets the state of each interface,
there's no need to mark all interfaces as up before running
monitor_interfaces() in the startup event.  monitor_interfaces() will
set the true status of each interface anyway.  The duplication is
unnecessary and may cause extra action in the recovery daemon because
the state of some interfaces is changed an extra time.

Instead, add a comment at the top of the loop in monitor_interfaces()
to warn against early loop exits.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit f243a916ee71013f7402b9c396c2ead88eb3aab0)
2012-11-14 10:57:48 +11:00
Volker Lendecke
295dfa771a Avoid a bashism in 60.ganesha
This file is #!/bin/sh. On sn-devel at least, with this /bin/sh the
shell does not like == for string equality.

(This used to be ctdb commit e2213db479129ce9c2b2fb88ec8c53cbd33d54b3)
2012-10-24 18:31:16 +11:00
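The portable form of the comparison mentioned in 295dfa771a uses a single `=`, which is the string-equality operator POSIX specifies for test. A short sketch (`is_yes` is a hypothetical function, not code from 60.ganesha):

```sh
# "=" is the POSIX string-equality operator for test/[ ; "==" is an
# extension that some /bin/sh implementations reject.
is_yes ()
{
    [ "$1" = "yes" ]
}
```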
Martin Schwenke
9f6b30a517 scripts: Refactor logging code in initscript and functions file
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 5ee242c949a98bb7397e0f7368b20d44c06fe772)
2012-10-18 20:05:43 +11:00
Martin Schwenke
ad8eb45fe2 initscript: Check that rc.ctdb is executable before running it
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 59a47c0674bacfebc17a1b44f0244727bf2fa7a4)
2012-10-18 20:05:43 +11:00
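A guard like the one ad8eb45fe2 adds is usually a `-x` test in front of the call. The following is a sketch of that pattern with a hypothetical helper name; the actual path to rc.ctdb and the arguments passed to it are not shown in the commit message and are not assumed here.

```sh
# Hypothetical helper: run a hook only if it exists and is executable;
# otherwise do nothing and report success.
run_if_executable ()
{
    _hook="$1" ; shift
    if [ -x "$_hook" ] ; then
        "$_hook" "$@"
    fi
}
```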
Martin Schwenke
66d0aba85b Revert "Eventscripts - add facility to 10.interface to delete unmanaged IPs"
This reverts commit 88f88d86b0d08240f749fb721b8c401c2eeb1099.

This is dangerous and, on reflection, I can't see it being useful.
There are often permanent IPs on interfaces that CTDB shares with its
public IPs.

(This used to be ctdb commit 16aba4eb620844626a1c71c58b51658caf44dea6)
2012-10-18 20:05:42 +11:00
Martin Schwenke
34a6c07e99 Eventscripts: "recovered" event should not fail on NATGW failure
The recovery process has no protection against the "recovered" event
failing, so this can cause a recovery loop.

Instead of failing the "recovered" event, add a "monitor" event and
fail that instead.  In this case the failure semantics are well
defined.

A separate patch should ban nodes if the "recovered" event fails for
an unknown reason.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit eaa7c165f58abd7e259c37d76b7dd37c91e13d9f)
2012-10-18 20:05:42 +11:00
Martin Schwenke
8d7562f3f8 common: Debug ctdb_addr_to_str() using new function ctdb_external_trace()
We've seen this function report "Unknown family, 0" and then CTDB
disappeared without a trace.  If we can reproduce it then this might
help us to debug it.

The idea is that you do something like the following in /etc/sysconfig/ctdb:

  export CTDB_EXTERNAL_TRACE="/etc/ctdb/config/gcore_trace.sh"

When we hit this error then we call out to gcore to get a core file so
we can do forensics.  This might block CTDB for a few seconds.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 7895bc003f087ab2f3181df3c464386f59bfcc39)
2012-10-18 20:05:42 +11:00
Michael Adam
6372592982 config/functions: fix a comment
ctdb_check_counter_limit() does not fail but succeeds if count >= limit

Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit af540ef728303b4a0a188b17c695e9aefab34489)
2012-10-17 21:56:58 +02:00
Amitay Isaacs
cc763c455d doc: Add info about execute permissions on event scripts
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 25d886060b138bc5e78fe93d7bebe3990264f29d)
2012-10-17 11:39:39 +11:00
Amitay Isaacs
efe77d0e35 doc: Fix documentation for setup event
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 36d25e96a2f8ae1461c5a708a2922f0475a39900)
2012-10-17 11:39:39 +11:00
Amitay Isaacs
ce210f6978 scripts: Remove duplicate code from init script to set tunables
The tunable variables defined in CTDB configuration file are currently
set up from init script as well as part of "setup" event in 00.ctdb
eventscript.  Remove the duplication of this code and set tunable
variables only from setup event.  During the "setup" event, it's possible
that ctdb tool commands can time out if the CTDB daemon is not ready.  To
guard against this, wait until the "ctdb ping" command succeeds before
executing any other ctdb tool commands.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 632c1b9c1cc2e242376358ce49fd2022b3f27aa2)
2012-10-17 11:32:41 +11:00
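Waiting for the daemon as ce210f6978 describes amounts to a bounded retry loop. The sketch below is generic: `wait_for` is a hypothetical helper, and the retry count in the usage comment is an arbitrary value chosen for illustration, not one taken from 00.ctdb.

```sh
# Hypothetical helper: retry "$@" until it succeeds, at most $1 times,
# pausing one second between attempts.  Returns 1 if it never succeeds.
wait_for ()
{
    _tries="$1" ; shift
    _n=0
    while [ "$_n" -lt "$_tries" ] ; do
        if "$@" >/dev/null 2>&1 ; then
            return 0
        fi
        _n=$((_n + 1))
        [ "$_n" -lt "$_tries" ] && sleep 1
    done
    return 1
}

# Usage in a "setup" handler might look like (count is arbitrary):
#   wait_for 60 ctdb ping || exit 1
```

Returning failure when the limit is reached lets the caller decide whether to fail the event or carry on.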
Martin Schwenke
74843dadad Eventscripts: Add support for "reconfigure" pseudo-event for policy routing
This rebuilds all policy routes and can be used if the configuration
changes.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit c185ffd2822fcee26d07398464c59b66c61f53fa)
2012-10-11 12:10:45 +11:00
Martin Schwenke
d33b12a1c5 Eventscripts: Add service-start and service-stop pseudo-events
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit be4ad110ede9981b181ac28f31ffd855a879d5df)
2012-10-10 14:54:53 +11:00
Martin Schwenke
2d719e5c84 eventscripts: Auto-start/stop services in background
If $CTDB_SERVICE_AUTOSTARTSTOP="yes" then service start/stop is done
in the background with logging.

Fix some unit tests for samba and winbind.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 3a3dae4cb5ec8b4b8381a4013adda25b87641f3a)
2012-10-03 08:48:23 +10:00
Martin Schwenke
f3ae31e741 Eventscripts: split 50.samba into 49.winbind and 50.samba
winbind and samba can be separately managed.  This makes the service
starting and stopping code way too complicated, and even adds a small
amount of complexity to the monitoring code.  The sensible option is
to split this eventscript in two.

There are two potentially backward incompatible changes here:

* Functionality has been removed that allowed 50.samba to manage
  winbind when CTDB_MANAGES_WINBIND was unset but the smb.conf
  "security" parameter was set to "ADS" or "DOMAIN".

  Maintaining this functionality would have required moving the
  testparm-related code to the functions file, deciding where the
  cache file should go, and then calling it from both 49.winbind and
  50.samba.  This feature wasn't of great value and asking
  administrators to set an extra variable in exchange for code
  simplicity seems like a reasonable deal.

* External code will need to be changed if it calls 50.samba directly
  with winbind-related expectations.  This is fairly obvious!

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 34535ae64420926b9a3bf7d453fed4e6f4c90115)
2012-10-03 08:46:32 +10:00
Martin Schwenke
e2d4250731 Initscript: Kill any existing ctdbd processes if the ping succeeds
Initialising a new ctdbd will destroy the Unix domain socket so
existing processes will be useless anyway.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 043ef77086797a703aec436a26a05c56a1bcbf2b)
2012-10-02 17:37:53 +10:00
Martin Schwenke
530415b671 Eventscripts: Indent error when a route delete fails in 11.per_ip_routing
This puts it under the umbrella of the previous warning that should
also have been printed.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 5c3be8f26dcde0b1b3d86928953e74d4a8b35958)
2012-09-11 12:52:22 +10:00
Martin Schwenke
0d35a8c439 eventscripts: 13.per_ip_routing should remove bogus routes on ipreallocated
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit d0d0a6f19960f233224970b8d5d19b0e37222616)
2012-09-11 12:52:22 +10:00
Martin Schwenke
e1348221d6 eventscripts: Print a warning on failure to delete a routing rule
del_routing_for_ip() currently fails silently, which could hide real
errors.

In add_routing_for_ip() we don't want to see any error when calling
del_routing_for_ip(), since we don't expect the rule to be there.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 30d69defa7e97ab5e3ba0492a27868dde2616494)
2012-09-11 12:52:22 +10:00
Martin Schwenke
5bbf4b6e30 Eventscripts: 13.per_ip_routing should always fail if config is missing
Currently, if the configuration file is specified by
$CTDB_PER_IP_ROUTING_CONF but is missing, takeip fails but (the
absent) monitor event "succeeds", so the state of a node will
flip-flop.

Instead of this, if the configuration file is missing then fail early
on for all events.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit c64c6c77c3f6aa2898e5a575547b587bea868c76)
2012-07-30 15:57:56 +10:00
Martin Schwenke
ff0830037e Revert "Eventscripts - make 13.per_ip_routing fail gracefully if config is missing"
When the configuration file is missing this causes the node to
flip-flop between unhealthy (when takeip fails) and healthy (no monitor
event here).

Will reimplement this properly.

This reverts commit 351ca413eec460330571ca8b01ad269728fe15df.

(This used to be ctdb commit 5277d749c9111716fd723647d5421907476422bf)
2012-07-30 15:57:56 +10:00
Martin Schwenke
35de9f2583 Eventscripts: Clean up 11.routing
The loops can all be done without cat or grep.

The pair of loops in updateip is combined into a single loop.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 96fdda124f5511fb76190e7c7a7f0b98e6b01a31)
2012-07-30 11:24:59 +10:00
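Loops of the kind cleaned up in 35de9f2583 can read a file with a redirection and filter lines with `case`, so neither cat nor grep is forked. A minimal sketch, with a hypothetical function name and a filter (skip comments and blank lines) chosen only as an example:

```sh
# Hypothetical reader: print the non-comment, non-blank lines of a file.
# The redirection replaces "cat", and the case pattern replaces "grep".
list_entries ()
{
    while read -r _line ; do
        case "$_line" in
            "#"*|"") continue ;;
        esac
        echo "$_line"
    done < "$1"
}
```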
Martin Schwenke
748c3d7eb6 Initscript: clean up drop_all_public_ips()
This makes the case implicit where $CTDB_PUBLIC_ADDRESSES is unset.
This is OK because that's not an interesting code path.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 5b2725d1ae052e848c2487cb10c5393a877d118c)
2012-07-26 22:05:43 +10:00
Martin Schwenke
4d4768ef26 statd-callout: Fix a bug in the calculations of $STATE
It is just meant to be even, so divided *and* multiplied by 2.  Use
$(( )) to make it more readable.

While touching this code, make the related calculation a bit more
readable too.
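
A minimal sketch of the "divide and multiply by 2" rounding described above, assuming only that the input is a non-negative integer. The helper name round_down_to_even is hypothetical and is not taken from statd-callout:

```shell
# Hypothetical helper: round a non-negative integer down to the nearest
# even number using $(( )) arithmetic (integer division, then multiply).
round_down_to_even() {
    echo $(( $1 / 2 * 2 ))
}

round_down_to_even 7    # prints 6
```

Integer division discards the remainder, so multiplying back by 2 can only yield an even value.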

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 25d45e69f4ffc2b26061ac13038d52a353e79e61)
2012-07-26 21:24:15 +10:00
Martin Schwenke
6717698cba Eventscripts: Default route on NAT gateway should have a metric of 10
At the moment routes from 11.routing can fail to be added because they
conflict with the default route added by 11.natgw.

NAT gateway is meant to be a last resort, so routes from 11.routing
should override it.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 624f4677e99ed1710a0ace76201150349b1a0335)
2012-07-26 21:14:58 +10:00
Martin Schwenke
31bdf91933 Eventscripts: Update/remove stale comments in 11.natgw
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 5d713d5e5be67f5914a661694c15d938bd67dea3)
2012-07-26 21:14:58 +10:00
Martin Schwenke
05359689f6 Eventscripts: Retrieve and build NAT gateway details better in 11.natgw
* "ctdb natgw" is run twice when it doesn't need to be.

* Tweak the parsing of "ctdb natgw" output so that it is done by the
  shell instead of a bunch of external processes.

* Make default NAT gateway be -1, even on error.  If the process
  failed entirely then it could previously be empty.

* Streamline the error handling using die() for when there is no NAT
  gateway.

* Downcase script-local variable names.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 630cfe6451ba23d959fa4907fbba42702337ed3b)
2012-07-26 21:14:58 +10:00
Martin Schwenke
e7325ebcd5 Eventscripts: Optimise building the host address in 11.natgw
It can be built without forking unnecessary processes.

Also downcase variable name because it is local to script.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 34f58a0773618c4508a55ad75fc4602dad5a5f4c)
2012-07-26 21:14:58 +10:00
Martin Schwenke
9a7a199132 Eventscripts: Clean up startup sanity check in 11.natgw
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit f6e421e8bf935cae790a6dc2b861eb9c7f8610b4)
2012-07-26 21:14:57 +10:00
Martin Schwenke
573fb0497a Eventscripts: remove redundant firewall rules from 11.natgw
aeb70c7e7822854eb87873a5c7783e27e6e72318 said it moved these but it
redundantly duplicated them instead.  That commit also fixed the
problem because it moved the rules after delete_all() not out of the
startup event as claimed.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 07149edaecb3caa672163e5a3b89715557d5205a)
2012-07-26 21:14:57 +10:00
Martin Schwenke
c0b7fbf2a4 Eventscripts: 11.natgw $CTDB_NATGW_PUBLIC_IP splitting optimisation
$CTDB_NATGW_PUBLIC_IP can be split into $_ip and $_maskbits without
forking lots of processes.
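
The kind of fork-free split meant here can be sketched with POSIX parameter expansion. The function name and the sample address are illustrative assumptions, not the code from 11.natgw:

```shell
# Split "ADDRESS/MASKBITS" without running any external command.
# Sets $_ip and $_maskbits as plain globals (no "local", to stay POSIX).
split_ip_maskbits() {
    _ip="${1%/*}"        # strip the shortest suffix matching "/*"
    _maskbits="${1#*/}"  # strip the shortest prefix matching "*/"
}

split_ip_maskbits "10.0.0.227/24"
echo "$_ip $_maskbits"   # prints: 10.0.0.227 24
```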

Also "local" isn't supported by POSIX.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit e20fdb974158061f4627d6f360c168d764690e6f)
2012-07-26 21:14:57 +10:00
Martin Schwenke
1ba9fa2e48 Eventscripts: Fix deprecated iptables ! usage
This currently causes warnings in the logs.

This change is not SLES10-compatible but we already have some other
non-SLES10-compatible changes.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 7640352c6697f9d4e0d13afbc8523afc64e7d462)
2012-05-25 15:26:07 +10:00
Ronnie Sahlberg
383711ac82 Merge remote branch 'martins/ganesha'
(This used to be ctdb commit f23b5a160184db8c92f8c69307dc4a64adae839d)
2012-05-17 11:48:07 +10:00
Ronnie Sahlberg
dce5969d12 Debug: When scripts hang, we may need to collect additional data in order to debug why the script hung.
Break this debug and data collection out into an external script to make it easier to modify what data we need to collect.
For now we only collect a pstree so we can see what part of the script we hung in.

S1037271

(This used to be ctdb commit 6e68797af67bee36f2bad045f94806e7e98f27e9)
2012-05-17 10:29:03 +10:00
Martin Schwenke
835e0b6d49 Eventscripts: Modernise 60.ganesha to match 60.nfs
Originally from Srikrishan Malik <srikrishan.malik@in.ibm.com> with
some style changes by me.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 637cab6304dae66b85668506028c76ea1ee88980)
2012-05-16 17:24:21 +10:00
Martin Schwenke
ffbe59bd44 Eventscripts: restart lockd in the background when going unhealthy
Sometimes the restart can hang when there are I/O problems.  Then the
eventscript times out and gets killed so the node is never marked as
unhealthy.

Restarting in the background avoids this.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 13acd58c41fba1a33894fbd654fed69ea0eac322)
2012-05-16 17:19:55 +10:00
Martin Schwenke
92eb004162 Eventscript functions: add optional version to nfs_check_rpc_service()
This can be optional because the 1st item of each action-triple is a
test comparison that starts with '-'.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 92f74fd589467b46c758e116e97417edfe8773d7)
2012-05-16 17:05:05 +10:00
Martin Schwenke
fd048a1771 Eventscript functions: add optional version to nfs_check_rpc_service()
This can be optional because the 1st item of each action-triple is a
test comparison that starts with '-'.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 1957d53b78f101cd0cd37d9705a225deef5174a2)
2012-05-11 10:33:27 +10:00
Martin Schwenke
0c8f785628 Eventscripts: fix basename -> dirname typo
I fixed one of these previously but didn't notice this one...  :-(

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 0c674efd19368d41d9cc28909d2b16c1af54c86c)
2012-04-27 15:42:42 +10:00
Martin Schwenke
012015b32c Eventscripts - Fix typo in 13.per_ip_routing support for __auto_link_local__
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 9542e770a9780740b49122f1f52f08b32eca4b35)
2012-04-27 15:40:43 +10:00
Martin Schwenke
a3ee4a900f Initscript - add backup of corrupt non-persistent databases
Corrupt non-persistent databases never get analysed because ctdbd
zeroes them at startup.

Modify the initscript so that corrupt non-persistent databases are
moved aside to a backup.  If the number of backups for a particular
database exceeds $CTDB_MAX_CORRUPT_DB_BACKUPS (default 10) then the
oldest excess backups are garbage collected.

Abstracts from and cleans up the code for checking persistent
databases.

Logging of related messages is done to syslog or a log file as
specified.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 00cd75595685dae829758abf1a4cb644af7ed50e)
2012-03-28 15:02:07 +11:00
Martin Schwenke
2f5cb56017 Eventscripts - make 13.per_ip_routing fail gracefully if config is missing
Currently it spews out random messages about the file being missing.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 351ca413eec460330571ca8b01ad269728fe15df)
2012-03-22 15:30:27 +11:00
Martin Schwenke
ac973b34df Eventscripts - make 13.per_ip_routing try harder to find public_addresses
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit d4621277240721e6d130a930b0100506b64467ea)
2012-03-22 15:30:27 +11:00
Martin Schwenke
020c8190c5 Eventscripts - use set_proc() rather than accessing /proc directly
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit bdb4cdaf2aed79c8de6a8db8c01685b242808310)
2012-03-22 15:30:27 +11:00
Martin Schwenke
4f65737809 Eventscripts - 13.per_ip_routing should use dirname not basename for mkdir
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit d034845ecea66b47004bc73f2554914a397b1c9d)
2012-03-22 15:30:27 +11:00
Martin Schwenke
56d90e930d Eventscript support - Remove unused interface_modify.sh
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 994492f79275fe84155d842f6bc288c1858217dd)
2012-03-22 15:30:27 +11:00
Martin Schwenke
476cf45049 Eventscript functions - no longer require interface_modify.sh
Make add_ip_to_iface() and delete_ip_from_iface() do their own locking
so the external script is no longer required.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 93f90caf91246074d9359bf31a39b26212cccc42)
2012-03-22 15:30:27 +11:00
Martin Schwenke
0b2c3d7d24 Eventscript functions - remove now-unused route/IP re-add script logic
This is no longer used by 13.per_ip_routing or anything else.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 2a2ea6c61a05af2d0765e964abcc7ef04047431e)
2012-03-22 15:30:26 +11:00
Martin Schwenke
940efdb8e9 Eventscript functions - remove functions only used by 13.per_ip_routing
The relevant functions are now in that script.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 45c3476d12bf0f52966b72d286f101fce1382cd2)
2012-03-22 15:30:26 +11:00
Martin Schwenke
95e10b20cb Eventscripts - redesign and rewrite 13.per_ip_routing
The current version is quite difficult to read.  This one is hopefully
clearer.

Major changes:

* The configuration file has a more forgiving syntax.  Items can be
  separated by arbitrary whitespace.

* Mappings between IP addresses and table IDs are no longer stored in
  files in a state directory.  Instead they are stored in
  /etc/iproute2/rt_tables as mappings between table IDs and labels, as
  allowed by the ip command.  The current structure of the labels is
  ctdb.<source-ip>.  This means that once the labels are set up the
  routing tables can be referenced by just knowing the source IP.  As
  with the old state directory, mappings in this file owned by CTDB
  are deleted when CTDB shuts down.

* There are no release or re-add scripts.

  - Release scripts are not necessary as an optimisation because of
    the previous improvement (i.e. use of rt_tables).  No lookup is
    necessary to delete rules or flush tables.

  - Re-add scripts are no longer used.  Routes can still go missing
    when removal of a primary IP from an interface (or similar)
    causes removal of all other addresses (i.e. secondaries) and also
    all associated routes.  However, any missing routes are now
    re-added in the "ipreallocated" event.  This happens shortly after
    takeip/releaseip/updateip and means that the routes will only be
    re-added once.  The window for missing routes is slightly bigger
    but is not expected to be significant.

* The magic "__auto_link_local__" configuration value no longer causes
  a dynamic configuration file to be maintained in a state directory.
  The link local configuration is now generated when needed from the
  public_addresses file.  This greatly simplifies the code.  This
  approach is slightly less efficient but should not be significant.

The above changes mean that, apart from maintaining mappings in the
rt_tables file, there are no state files kept anymore.
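
As a rough illustration of why the ctdb.<source-ip> labels make lookups simple, the sketch below resolves a source IP to its table ID from an rt_tables-style file. The function name is hypothetical and this is not the code from 13.per_ip_routing:

```shell
# Hypothetical lookup: print the table ID whose label is ctdb.<source-ip>.
# $1 = source IP, $2 = rt_tables-style file ("ID label" on each line).
table_id_for_ip() {
    while read -r _id _label _rest; do
        if [ "$_label" = "ctdb.$1" ]; then
            echo "$_id"
            return 0
        fi
    done <"$2"
    return 1
}

# Usage (file path as named in the commit message):
#   table_id_for_ip 10.0.0.227 /etc/iproute2/rt_tables
```

Since the ip command accepts table labels wherever it accepts numeric IDs, a script can often use the label directly and skip even this lookup.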

Some utility functions only used by this script have been rewritten
and moved into this script.  They will be removed from the functions
file by a future commit.

The route re-add code will also be removed from interface_modify.sh by
a future commit.  It is currently harmless.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 0f7cbbb55f26cf3c953e98fe5e7eaa12f59fbf78)
2012-03-22 15:30:26 +11:00
Martin Schwenke
0d67779c67 Eventscript functions - add new function die()
Args:

1. Error message to be printed.

2. Optional exit code (default 1)
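
A function with this argument convention could look roughly like the sketch below. Printing the message to stderr is an assumption; the commit message only says the message is printed:

```shell
# Sketch of a die() helper: print the message and exit with the given
# code, defaulting to 1.  Writing to stderr is an assumption.
die() {
    echo "$1" >&2
    exit "${2:-1}"
}

# Run in a subshell so the demonstration does not end the calling shell.
( die "configuration file missing" 2 ) || echo "exit status: $?"
```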

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 97b0c138cb97e30db27c40b4ee1481109ae90c78)
2012-03-22 15:30:26 +11:00
Ronnie Sahlberg
81fb334cff when shutting down ctdb, allow it 30 seconds instead of 10 before we kill -9 the daemon
(This used to be ctdb commit d8b400d76665f37ffd9de302eedcff9f23807225)
2012-02-21 19:02:36 +11:00
Ronnie Sahlberg
e3b85bba3f Add a hook to the ctdb initscript that we can call out to for applications that want to
track and produce audit logs when someone runs "service ctdb <something>"

S1033891

(This used to be ctdb commit 4f4fbd4080a3a7226d3b82637f803c4b71217d39)
2012-02-06 12:07:08 +11:00
Mathieu Parent
956f06f3ae Fix ctdb-crash-cleanup sysconfig handling
(This used to be ctdb commit 667b174d605646b53f4855e9aaf5f8ce4fdde532)
2011-12-06 11:55:46 +11:00
Martin Schwenke
162ac70f9e Eventscripts - add facility to 10.interface to delete unmanaged IPs
For a number of reasons (delip failure, admin stupidity, ...) an
interface that hosts public addresses can also contain spurious,
unmanaged addresses.

Add functionality to 10.interfaces, controlled by new configuration
variable CTDB_DELETE_UNEXPECTED_IPS, to delete these addresses when
encountered as part of a monitor event.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 88f88d86b0d08240f749fb721b8c401c2eeb1099)
2011-11-17 16:47:00 +11:00
Martin Schwenke
ba5e5f51cf Eventscripts - remove $0 from error messages in 40.fs_use
The script name is now prepended to output by ctdbd.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit bfa0fe70db195413a6d7a98f46f7a1270aba678c)
2011-11-16 16:26:49 +11:00
Martin Schwenke
9187db869e Eventscripts: Make 40.fs_use use fewer processes and arguably clearer.
* $fs can be parsed using shell prefix and suffix removal.

* df output can be parsed with a single call to sed.

  Failure is indicated by empty output from sed, so we check for that
  as the error condition, changing the associated message
  appropriately.
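
The single-sed approach can be sketched as follows. The sed expression and the function name are illustrative assumptions rather than the ones used in 40.fs_use; the point is that empty output doubles as the error indicator:

```shell
# Hypothetical parser: read "df -P" output on stdin and print the usage
# percentage from the first data line (line 2).  Prints nothing if that
# line does not look like df output, which the caller treats as an error.
parse_df_percent() {
    sed -n -e '2s/.* \([0-9][0-9]*\)% .*/\1/p'
}

# Example with canned input instead of a live "df -P /" call.
printf '%s\n' \
    'Filesystem 1024-blocks Used Available Capacity Mounted on' \
    '/dev/sda1 1000 400 600 40% /' | parse_df_percent    # prints 40
```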

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit c5ef0d1440f1d952784cc67946c414d149722d01)
2011-11-16 16:26:45 +11:00
Mathieu Parent
4617a0e9cf config can be in /etc/default/ instead of /etc/sysconfig/
... on Debian systems and derivatives.

(ctdb_diagnostics still hardcodes /etc/sysconfig/)

(This used to be ctdb commit 1341329f6125d491b82c873f793af819e677f714)
2011-11-08 16:31:15 +11:00
Mathieu Parent
91431262be config/functions: CTDB_VARDIR is /var/lib/ctdb on Debian-like systems
(This used to be ctdb commit 56160eccb62178f645b017b1257677a1e854b2bc)
2011-11-08 16:31:03 +11:00
Mathieu Parent
0250b72a65 Fix bashism in 40.fs_use
Also, add -P to df, to avoid multi-line output on Linux when the device name is long (this is the case with LVM)

(This used to be ctdb commit f4d5a5810f1a840a41c3541a3b822fce44d41e9a)
2011-10-12 20:08:40 +11:00
Mathieu Parent
a1919fd316 apache's service name is not always httpd
Solution 2 of <https://bugzilla.samba.org/show_bug.cgi?id=8317>

(This used to be ctdb commit 8b9ac5cd8d867ff4866ac464c570d9293d03a91e)
2011-10-12 20:07:45 +11:00
Mathieu Parent
7f1ff4dbd8 Less verbosity when there is no public addresses file
This partially reverts 81eff51, but still avoids spam.

(This used to be ctdb commit e646142f4d28b5401235cd5edee325f7a29f8193)
2011-10-12 20:07:03 +11:00
Martin Schwenke
205c7c7663 Eventscripts - enhance ctdb_replay_monitor_status()
Print useful output and return a suitable exit code.

The DISABLED and TIMEDOUT statuses use fake negative return codes, and
these can't be faked from the shell.  So we map DISABLED to OK and
TIMEDOUT to ERROR - this should avoid nearly all surprises.  When we
do this we add a note to the beginning of the output.  The alternative
is to "fix" ctdbd to use only codes that can actually be returned by
shell scripts.  However, the reason for using negative codes is
probably to distinguish them from real ones...

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit dda44d026e0c1b02feb02185b8c200a542be341a)
2011-08-31 15:34:43 +10:00
Martin Schwenke
aa64622137 Eventscripts - use ctdb scriptstatus -Y when replaying status
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 5be904fb1fbd546618d25509b41ab836db62a70a)
2011-08-30 16:34:43 +10:00
Martin Schwenke
b97625acb6 Eventscripts: add a synchronous synthetic reconfigure event.
In the current code services can only be reconfigured asynchronously.
This means that configuration file changes can be made, an asynchronous
reconfigure event can be triggered, and it always succeeds.  Some time
later when a service is actually reconfigured then a failure may be
seen.

This adds a synthetic reconfigure event that reconfigures a service
synchronously so that any failure is reported on exit.

ctdb_service_check_reconfigure() is essentially reimplemented.

If a reconfigure event is in flight and an ipreallocated or monitor
event occurs then any scheduled asynchronous reconfigure is deferred
until the next monitor cycle.  This is to avoid reconfigures trampling
on each other.  In this case a monitor event will also replay the
previous status to try to avoid exposing any temporary instability.

If a reconfigure event collides with another reconfigure event it will
exit with status 2, indicating that the reconfigure should be retried.

The reconfigure event is implemented using a subprocess to control the
exit from the synthetic event.

As before, if a monitor event causes a scheduled synchronous
reconfigure to occur then it will replay the previous status for the
service, given that a reconfigure can cause temporary instability.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 220578bfd3507152b29ba4c28942f9d5e8733886)
2011-08-30 14:29:48 +10:00
Martin Schwenke
94c3429567 Eventscripts - call ctdb_check_args() in 00.ctdb
This is the first eventscript.  Sanity check as early as possible and
everyone benefits.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 0564717fcc1e21688ae5dacbd437fd493bcb8853)
2011-08-30 09:33:47 +10:00
Martin Schwenke
bc4e62be85 Eventscripts - call ctdb_check_args() instead of doing hand checking
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit cc5bc1948dcbe8b8b25185260927b94a4b529174)
2011-08-30 09:33:47 +10:00
Martin Schwenke
7980a4cb44 Eventscripts - new function ctdb_check_args()
Pass this "$@" to do common eventscript argument checking.

For regular use putting this in 00.ctdb would be enough.  However, for
developer testing it can be useful to call this in other eventscripts.
For example, 10.interfaces and 13.per_ip_routing currently check these
by hand.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 36de7e7fd6dfeed61ef9977b8d5b568f90a9707b)
2011-08-30 09:33:47 +10:00
Martin Schwenke
63729fc35d Eventscripts - ctdb_check_tcp_ports() bug fix.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit e8d9c0b251c84d6fdf6ea7d972e5f7d1d0222f9b)
2011-08-30 09:33:47 +10:00
Martin Schwenke
194de8faf8 Eventscripts - fix debugging buglet in ctdb_check_tcp_ports_ctdb()
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 61000e38d6016e58f67e292393756d0bd5262ae5)
2011-08-30 09:33:47 +10:00
Martin Schwenke
9257b57f2c Eventscripts: New configuration variable CTDB_SERVICE_AUTOSTARTSTOP.
Some of the current auto-start/stop logic is broken, particularly for
Samba.  Fixing it is non-trivial.

If $CTDB_SERVICE_AUTOSTARTSTOP is "yes" then auto-start/stop services
when told to newly manage or no longer manage them.  This defaults to
"yes".

However, if using a canned configuration file that doesn't set
$CTDB_SERVICE_AUTOSTARTSTOP then this stops the auto-start-stop logic
from working.  Therefore, this works around CQ S1026685 - on the
system in question another daemon controls service auto-start/stop and
CTDB just gets in the way.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit ef71b8290ae49117d7bcc7166598b77cb64cc8a0)
2011-08-30 09:33:47 +10:00
Martin Schwenke
54402cdff4 Eventscripts - in 60.nfs uniquify the share check directory list
There are sites that have multiple entries for the same export.  This
optimises the share check in this case.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 1ccdae79b64b236fc27f4653606429d73c9c3595)
2011-08-30 09:33:47 +10:00
Ronnie Sahlberg
02ebd35398 Merge remote branch 'martins/eventscripts'
(This used to be ctdb commit bb008c01989ebb173a3f095ebd2f90ab54f9da91)
2011-08-17 14:10:04 +10:00
Martin Schwenke
6e7dbf0543 Eventscripts - new default TCP port checker using "ctdb checktcpport"
New function ctdb_check_tcp_ports_ctdb().  This should be fast... and
is now the default checker.  If it fails in an unexpected way we fall
back to the nmap and netstat checkers.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit a1e16a707ce204817531a61455000361f972080a)
2011-08-17 14:02:45 +10:00
Martin Schwenke
1374327f6e Eventscripts - generalise TCP port checking plus new nmap-based checker
Split the netstat-specific parts of ctdb_check_tcp_ports() into new
function ctdb_check_tcp_ports_netstat().

Implement new ctdb_check_tcp_ports_nmap() function that uses
"nmap -PS" to check if the desired ports are listening.

ctdb_check_tcp_ports() now uses new configuration variable
CTDB_TCP_PORT_CHECKERS to decide which port checkers to try.  Default
value is currently "nmap netstat".  If nmap is not found then this
will fall back to netstat - if logging is at debug level this will
also fill the logs with messages saying the nmap checker failed.  This
indicates that either nmap should be installed or the default value of
CTDB_TCP_PORT_CHECKERS should be changed (in a configuration file) to
avoid trying to use nmap.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit d9651175b40b9454e7d4e98291955fcf1445085e)
2011-08-17 12:12:20 +10:00
Martin Schwenke
62f654d3d2 Eventscripts - ctdb_check_tcp_ports() only prints netstat output if debugging
Use the new debug function to conditionally print the netstat output.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 44c14aeeb11080980fe07c7396d06843a4870747)
2011-08-17 10:39:54 +10:00
Martin Schwenke
86792724a2 Eventscripts - weaken TCP port check message if CTDB has just been started.
Sometimes smbd and other services can take a while to start,
especially when there is a lot of activity after ctdbd has just
started.  The TCP port check can then pollute the logs with lots of
"ERROR" messages and possibly extra debug.

This creates a flag file when a service is started (but not restarted)
and this flag is removed the first time that TCP port checks succeed
for that service.  When a port check fails and the flag file still
exists, a less extreme "INFO" message is printed rather than the usual
"ERROR" message.  This means that until the node actually becomes
healthy we see more friendly messages.

The subtext is that we're hearing false positive reports "recreates"
of CQ S1024874 (samba stopped responding on port 445) quite often when
ctdbd is started.  This reduces the chances of people reporting such
false recreates...

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 571865eb6ef847857129d0b1e2ba5fa7254bfe8c)
2011-08-17 10:39:53 +10:00
Martin Schwenke
5c9fbb55ce Eventscript functions: optimise ctdb_check_tcp_ports() and add debug.
ctdb_check_tcp_ports() runs "netstat -a -t -n" in a loop for each
port.  There are 2 problems with this:

* Netstat is run on each loop iteration when it need only be run once.

* The -a option is used to list all connections but the function only
  cares about the listening ports.  There may be many thousands of
  non-listening ports to grep through.

This changes ctdb_check_tcp_ports() to run netstat with the -l option
instead of the -a option.  It also only runs netstat once before the
main loop.

When a port is found to not be listening the output of the netstat
command is now dumped to help with debugging.
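
The "run netstat once, then loop" structure can be sketched as below. The helper name and the grep pattern are assumptions made for illustration and are not the code from the functions file; the output is passed in as an argument so that the same captured text is reused for every port:

```shell
# Hypothetical helper: given captured "netstat -l -t -n" output and a list
# of ports, print each port that has no listening TCP socket.
ports_not_listening() {
    _netstat_out="$1"
    shift
    for _port in "$@"; do
        if ! echo "$_netstat_out" |
                grep -q "^tcp.*:${_port}[[:space:]].*LISTEN"; then
            echo "$_port"
        fi
    done
}

# Usage: run netstat once, then check every port against the same output.
#   ports_not_listening "$(netstat -l -t -n)" 445 139
```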

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 830355a8b18c53cfcc3ad1e3009bbb1a7a681fa0)
2011-08-17 10:39:53 +10:00
Martin Schwenke
f0f9271301 Eventscripts: add a debug() function and call ctdb_set_current_debuglevel()
The debug function passes its arguments to echo if
$CTDB_CURRENT_DEBUGLEVEL is >= 4 (i.e. DEBUG).  If no args are given
then use stdin - this allows the function to be used with here
documents.
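
A sketch of a function with this behaviour, assuming that an unset $CTDB_CURRENT_DEBUGLEVEL is treated as 0 and that stdin is drained when debugging is off (neither detail is stated in the commit message):

```shell
# Sketch of a debug() helper.  Level 4 as DEBUG comes from the commit
# message; defaulting an unset level to 0 is an assumption.
debug() {
    if [ "${CTDB_CURRENT_DEBUGLEVEL:-0}" -ge 4 ]; then
        if [ $# -gt 0 ]; then
            echo "$@"
        else
            cat            # no arguments: copy stdin, e.g. a here document
        fi
    elif [ $# -eq 0 ]; then
        cat >/dev/null     # still consume stdin so nothing is left unread
    fi
}

CTDB_CURRENT_DEBUGLEVEL=4
debug "checking port 445"
debug <<EOF
multi-line
debug output
EOF
```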

To ensure $CTDB_CURRENT_DEBUGLEVEL is set,
ctdb_set_current_debuglevel() is called near the end of the functions
file.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 6143483d9f87322578c00f12081e381f425226ca)
2011-08-17 10:39:35 +10:00
Ronnie Sahlberg
ce4555b7a6 don't use too big a persistence timeout value
(This used to be ctdb commit 82628e32c431d66b806399ffb9657c3a031f6428)
2011-08-17 10:00:06 +10:00
Martin Schwenke
3e1a0528b8 Eventscripts - conditionally inherit ctdbd debug level in each monitor event
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit a7eebc06f81a7b0a3fba93759bcbdeabc8c2e86e)
2011-08-17 09:14:23 +10:00
Martin Schwenke
171bef3d68 Eventscripts - new function ctdb_set_current_debuglevel()
This function ensures that CTDB_CURRENT_DEBUGLEVEL is set.  It works
like this:

1. If it is already set then do nothing, since it might have been set
   some other way.

   The recommended "other way" would be to add a file in rc.local.d/.

2. If it is not set then set it by sourcing
   /var/ctdb/eventscript_debuglevel.

3. If this file does not exist then create it using output from "ctdb
   getdebug".

If the optional 1st argument is set to "create" then don't source an
existing file but create a new one instead - this is useful for
creating the file just once in each event run in, say, 00.ctdb.

If there's a problem getting the debug level from ctdb then it is
silently set to 0 - no use spamming logs if our debug code is
broken...

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 93910921c8a25f2b029733cd938069ff7c7bdab7)
2011-08-17 09:00:46 +10:00
Martin Schwenke
430ca2f606 Eventscripts - ensure the statd update-trigger file always exists.
See the comment in the code for details.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 8ee9856996a8ec738e9d3ea7f1561605da526b8c)
2011-08-16 13:28:40 +10:00
Martin Schwenke
1452b63d27 Eventscripts: remove "return 0" from 50.samba service_stop().
This potentially masks errors and was basically included by accident.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit e7e4a1b4f31118027fd13a6223192f9957cf2e74)
2011-08-16 13:18:40 +10:00
Ronnie Sahlberg
81292ac0e6 Change the errors for 10.interface to clearly state ERROR: for error messages
Update the test system to catch the new error strings generated by this change

(This used to be ctdb commit a2c30d88348da47d1a733a16e4c7d83c3becb6df)
2011-08-15 15:53:04 +10:00
Ronnie Sahlberg
1fb577f4b2 Merge remote branch 'martins/eventscript.10.interface'
(This used to be ctdb commit 0d17daab38d4086f922a8006d4c545133adca191)
2011-08-15 15:27:50 +10:00
Ronnie Sahlberg
bc00292cfe Merge remote branch 'martins/60_nfs_regression'
(This used to be ctdb commit 845fb0ba24cf9118470c58fae7103ab8322ce079)
2011-08-15 15:22:20 +10:00
Martin Schwenke
c9d168bbe4 Eventscripts: 10.interfaces - make startup event actually mark interfaces up!
The startup event intends to mark interfaces up.  However, it doesn't
actually do that because $INTERFACES is empty.

This uses the function get_all_interfaces() to list the
interfaces... and then mark them up.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit fc62bf0975c6059ee467285565d0dc3b4daaf238)
2011-08-12 16:34:34 +10:00
Martin Schwenke
5ab955a73d Eventscripts: 10.interfaces - startup comment says assume all interfaces good.
Interfaces are currently marked down.  Mark them up instead, as per
the comment... and discussion with Ronnie.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 35942841229cc72ce363a7236aec708f1a33136b)
2011-08-12 16:34:34 +10:00
Martin Schwenke
e7963d8a65 Eventscripts: 10.interfaces - new function get_all_interfaces().
Move existing interface listing code to new function in preparation
for using it in startup event.

While we're here change the "sort | uniq" into "sort -u" and save some
complexity.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit cd1442531ad079b11c60f46ee9d34f5104bef219)
2011-08-12 16:34:34 +10:00
Martin Schwenke
9bdcdb76be Eventscripts: 10.interface clean-ups - minor tweaks and new comments.
* sed can read files, it doesn't need a file piped to it
* use $() subshells instead of `` - they seem to quote better in dash
* tweak the uniquifying code so that it is easier to read
* add comments
* remove some extraneous semicolons at ends of lines

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 5f49537889a92c3cb68d9203912188bedf00ecd4)
2011-08-12 16:34:13 +10:00
Martin Schwenke
32fe247e37 Eventscripts: In 60.nfs don't restart NFS when restarting rpc.lockd.
This effectively reverts 953dbfbddad656a64e30a6aca115cb1479d11573 and
is a policy decision.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 380c9263eb37db5a250264316e250c2160908263)
2011-08-12 16:28:09 +10:00
Martin Schwenke
7c33fb1711 Eventscripts: 10.interface clean-ups - variable name fix-ups.
Change most of the uppercase variable names to lowercase for
consistency with other variables, readability and so they can be
easily distinguished from environment/configuration variables.  Change
the name of 2 of the variables to add some clarity.  Changes are as
follows:

  INTERFACES   -> all_interfaces
  IFACES       -> ctdb_interfaces
  IFACE        -> iface
  I            -> i
  REALIFACE    -> realiface

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 7b201c1087b1433cfbc95de76cb4205e484ccd6f)
2011-08-12 15:57:34 +10:00
Martin Schwenke
6fa27bdf18 Eventscripts: 10.interfaces clean-ups - push logic into monitor_interfaces().
The logic in the monitor event itself is very complex.  Nearly all of
it can go away by adding a single check of
$CTDB_PARTIALLY_ONLINE_INTERFACES to the return logic of
monitor_interfaces() and reversing the sense of the corresponding
check.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit fa93177442c65c2a4eb2d5d5dba0a0da1c486969)
2011-08-12 15:00:03 +10:00
Martin Schwenke
00c4cc6d22 Eventscripts: 10.interfaces clean-up - use more descriptive variable names.
The name of variable $ok gives no clue to its meaning/use so this
changes that variable to be named $up_interfaces_found.

The return logic relating to $ok and $fail is difficult to read, so
these variables are given true/false values, allowing the return logic
to be simplified.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 3402930319d462eab5525410f6a676952e120182)
2011-08-12 14:49:27 +10:00
Martin Schwenke
bb5db84021 Eventscripts: 10.interfaces cleanup - new functions mark_up(), mark_down().
The same few lines of logic are used every time an interface is marked up or down.

This encapsulates those few lines in 2 new functions.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit ab443c4d7d282f282792abc6a6ac224ab06abe30)
2011-08-12 14:43:15 +10:00
Martin Schwenke
1d71dd08e3 Eventscripts: change failure counts and behaviour for statd and nfsd.
We reduce the number of failures before attempting a restart.
However, after 6 failures we mark the cluster unhealthy and no longer
try to restart.  If the previous 2 attempts didn't work then there
isn't any use in bogging the system down with an attempted restart on
every monitor event.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit f654739080b40b7ac1b7f998cacc689d3d4e3193)
2011-08-12 14:16:17 +10:00
Martin Schwenke
398116ff29 Eventscripts: clean up 60.nfs monitor event.
This adds a helper function called nfs_check_rpc_service() and uses it
to make the monitor event much more readable.  An example of usage is
as follows:

  nfs_check_rpc_service "mountd" \
    -ge 10 "verbose restart:b unhealthy" \
    -eq 5 "restart:b"

The first argument to nfs_check_rpc_service() is the name of the RPC
service to be checked.  The RPC service corresponding to this name
is checked for availability using the rpcinfo command.  If the service
is available then the function succeeds and subsequent arguments are
ignored.

If the rpcinfo check fails then a failure counter for that particular
RPC service is incremented and subsequent arguments are processed in
groups of 3:

1. An integer comparison operator supported by test.
2. An integer failure limit.
3. An action string.

The value of the failure counter is checked using (1) and (2) above.
The first check that succeeds has its action string processed - note
that this explains the somewhat curious reverse ordering of checks.

In the example above:

* If the counter is >= 10 then a verbose message is printed
  describing the failure, the service is restarted in the background
  and the node is marked as unhealthy (via an "exit 1" from the
  function).

* If the counter is == 5 then the service is restarted in the
  background.

For more action options please see the code.

This also changes the ctdb_check_rpc() function so that it no longer
takes a program number to check.  It now just takes a real RPC program
name that rpcinfo can resolve via /etc/rpc.
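
The triple processing described above can be sketched roughly as
follows.  This is not the code from the functions file:
check_fail_actions() is a hypothetical name and it only prints the
selected action string instead of acting on it.

```shell
#!/bin/sh
# Hypothetical sketch: walk (operator, limit, action) triples and print
# the action of the first comparison that succeeds.
# Usage: check_fail_actions COUNT [OP LIMIT ACTION]...
check_fail_actions ()
{
    _count="$1" ; shift
    while [ $# -ge 3 ] ; do
        _op="$1" ; _limit="$2" ; _action="$3"
        shift 3
        # test(1) does the integer comparison, e.g. [ 12 -ge 10 ]
        if [ "$_count" "$_op" "$_limit" ] ; then
            echo "$_action"
            return 0
        fi
    done
    return 1    # no check matched, so there is nothing to do
}

# A count of 12 matches the first (largest) limit.
check_fail_actions 12 -ge 10 "verbose restart:b unhealthy" -eq 5 "restart:b"
# A count of 5 falls through to the second check.
check_fail_actions 5 -ge 10 "verbose restart:b unhealthy" -eq 5 "restart:b"
```

Because the first match wins, listing the larger limit first ensures
that the more severe action is chosen once the counter passes it.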

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 9b66057964756a6245bafb436eb6106fb6a2866e)
2011-08-12 14:16:14 +10:00
Martin Schwenke
1971336200 Eventscripts: fix regression in 60.nfs export checking.
Commit 35a60a63a9b5c7d98dde514ae552239506b691c9 introduced a
regression, reported by "Jonathan Buzzard" <J.Buzzard@dundee.ac.uk>,
as follows:

  Basically the use of sed in the following code snippet does not work
  for long exports where exportfs wraps the host or network onto the
  next line.

         exportfs | grep -v '^#' | grep '^/' |
         sed -e 's/[[:space:]]*[^[:space:]]*$//' |
         ctdb_check_directories

  The result is that the you get lots of blank lines being sent to
  ctdb_check_directories which causes the host to be marked as
  unhealthy and then thrashing sets in of the managed IP's making the
  whole cluster unusable.

This tightens up the sed expression so that it is less likely to
produce a spurious empty line.  It also removes an unnecessary "grep -v".
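
A minimal sketch of the failure mode, using made-up sample output in
place of a real exportfs.  The tightened expression shown here is an
assumption for illustration and may differ from the one actually
committed.

```shell
#!/bin/sh
# Sample exportfs-style output (invented data): the first export is long,
# so its client is wrapped onto the following line.
sample_exports ()
{
    printf '%s\n' \
        '/a/very/long/export/path/that/wraps' \
        '                192.168.0.0/24' \
        '/short          client.example.com'
}

# Requiring at least one whitespace character before the trailing word
# means a line holding only a path is left alone instead of being
# reduced to an empty string.
sample_exports | grep '^/' |
    sed -e 's/[[:space:]][[:space:]]*[^[:space:]]*$//'
```

With the original expression the leading whitespace is optional, so a
line containing only a path matches in full and is emptied.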

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit bd39b91ad12fd05271a7fced0e6f9d8c4eba92e6)
2011-08-11 15:01:39 +10:00
Ronnie Sahlberg
f9e58b502f Merge remote branch 'martins/eventscript.10.interface'
(This used to be ctdb commit 84ac667af408816e5508719b9fdb7c5e25408640)
2011-08-11 14:15:22 +10:00
Ronnie Sahlberg
b77a78d809 Merge remote branch 'martins/eventscript_infrastructure'
(This used to be ctdb commit 20864822372b6d574c545287002a429b273c4bcc)
2011-08-11 14:01:02 +10:00
Martin Schwenke
088620b026 Eventscripts: in 60.nfs move statd-notify code to service_reconfigure().
This means that it now occurs on every reconfigure event.  As a result
the ipreallocated event is removed.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit c45a89418ba733ff91d48340d72bdb6d2ef80051)
2011-08-11 13:56:25 +10:00
Martin Schwenke
eef89f83b2 Eventscripts - 60.nfs should define service_reconfigure().
Not $service_reconfigure.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 642292d7ba7a95567964b4160c7ee31a4f8985d1)
2011-08-11 13:55:02 +10:00
Ronnie Sahlberg
53b956fee7 When starting and stopping ctdb through the init-script, make sure we first clear all public ips before we start the daemon, in case they are still hanging around since a previous kill -9 and also make sure we drop them after we have stopped the daemon when shutting down
CQ S1027550

(This used to be ctdb commit 8de5513b3ad89711da845c7588d35b32e2f2acb6)
2011-08-11 11:48:04 +10:00
Martin Schwenke
3a760b09ed Eventscripts: improvements to ctdb_service_check_reconfigure().
* Make this function applicable to "ipreallocated" event too.

* Monitor event should not always succeed just because we reconfigure.

  If the service was unhealthy before the reconfigure and we end the
  reconfigure with "exit 0" then we can cause the node's health status
  to flip-flop.

  To avoid this we return the status of the service from the previous
  monitor event.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 21dfcbbdccd906fcd6ab7bba81418ce565bf63aa)
2011-08-11 10:46:57 +10:00
Martin Schwenke
e66a1af9b3 Eventscripts: 50.samba - only start/stop nmbd if $CTDB_SERVICE_NMB set.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit defaec99df8c279d8e315d5010f9146e013afda2)
2011-08-11 10:46:57 +10:00
Martin Schwenke
8fb04d451e Eventscripts: 50.samba needs null service_reconfigure() function.
Samba doesn't need to do anything for configuration changes.  It will
notice configuration changes and reload automatically.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit de13350c17261032a7468c2cf4d2cf4a8d66a840)
2011-08-11 10:46:57 +10:00
Martin Schwenke
b01d99a8fa Eventscripts: 40.vsftpd service_stop() no longer /dev/null's output.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit f928c201b6d0e1cd3e5568ae65186e3cee7c4988)
2011-08-11 10:46:57 +10:00
Martin Schwenke
1ea3616dcc Eventscripts: improvements to 41.httpd.
* Reduce the failure counts so that restart attempts happen sooner.

* Use service_start() and service_stop() for the restart.
  ctdb_service_start() resets the failure count, which isn't very
  useful in this context.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 01776b9f29af9ad5c8534649ece1bd100e450434)
2011-08-11 10:46:56 +10:00
Martin Schwenke
2a14f91722 Eventscript functions: new function ctdb_check_counter().
This should eventually be able to replace ctdb_check_counter_limit()
and ctdb_check_counter_equal(), although it doesn't issue warnings
like the former.

It takes 4 optional arguments:

1. _msg - If "error" then over limit causes an error message and an
   exit 1.  Anything else fails silently but the function returns 1.
   Default is "error".

2. _op - An integer operator supported by test (e.g. -eq, -ge, -gt).
   Default is -ge.

3. _limit - Limit for the counter to be used in comparison.  Default is
   $service_fail_limit.

4. _service_name - Used to identify the counter.  Default is
   $service_name.

For example:

  ctdb_check_counter error -ge 5 foo

will print a message and exit 1 if the counter for foo is >= 5,
whereas

  ctdb_check_counter check -ge 5 foo

will just return 1 if the counter for foo is >= 5, and

  ctdb_check_counter

will print a message and exit 1 if the counter for $service_name is >=
$service_fail_limit.
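
The argument defaults described above might be handled along these
lines.  This is only a sketch: sketch_check_counter(), get_counter()
and $counter_dir are hypothetical stand-ins, and the real function
keeps its counters elsewhere.

```shell
#!/bin/sh
# Hypothetical stand-ins so that the sketch runs on its own.
service_name="demo"
service_fail_limit=3
counter_dir=$(mktemp -d)
get_counter () { cat "$counter_dir/$1" 2>/dev/null || echo 0 ; }

sketch_check_counter ()
{
    _msg="${1:-error}"                   # "error" => message and exit 1
    _op="${2:--ge}"                      # any integer operator of test(1)
    _limit="${3:-$service_fail_limit}"
    _service_name="${4:-$service_name}"

    _count=$(get_counter "$_service_name")
    if [ "$_count" "$_op" "$_limit" ] ; then
        if [ "$_msg" = "error" ] ; then
            echo "ERROR: failure count $_count for $_service_name"
            exit 1
        fi
        return 1                         # fail silently
    fi
    return 0
}

echo 5 >"$counter_dir/foo"
sketch_check_counter check -ge 5 foo || echo "foo is at or over the limit"
```

The "error" form ends the calling script, so a caller that needs to
continue should use another value for the first argument or call the
function in a subshell.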

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 5b01b7233515669e995e037205796e265643b176)
2011-08-11 10:46:56 +10:00
Martin Schwenke
219c6fd55b Eventscripts: remove unused remove_ip() function.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 881af7c1417962b9b3ade6565b3e8eb9f9df7a97)
2011-08-11 10:46:56 +10:00
Martin Schwenke
5c948528b5 Eventscripts: startstop_nfs stop no longer redirects output to /dev/null.
When stopping (as opposed to restarting) it is useful to see this
information.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit a9ab1937239761dc32b143c9d225447bc6f090b4)
2011-08-11 10:46:56 +10:00
Martin Schwenke
caee6f1508 Eventscripts: fix typo in _ctdb_counter_common().
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit f57d1722b6aa082f3f826171acc57d7d796ea95c)
2011-08-11 10:46:56 +10:00
Martin Schwenke
ab693dbcc0 Eventscripts: improve log messages in ctdb_start_stop_service().
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 6da7095192fb172a06b434cfb02f4bfa6221b343)
2011-08-11 10:46:56 +10:00
Martin Schwenke
1b956b2b0a Eventscript functions: fix counter regression.
d362be7d32079ac1390d67056ce107bfbca2c937 wasn't well thought out.
Subsequent commits depend on ctdb_counter_init() taking an argument,
so this makes those cases work.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 05a8fcfbac3da2b5843b31e0fe258255cc761190)
2011-08-11 10:46:56 +10:00
Martin Schwenke
217edfa1c8 Eventscript functions: ctdb_service_check_reconfigure() acts only on monitor.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit beabf506a5eb68fc50fdbf8772c1d2bb0f7951e3)
2011-08-11 10:46:56 +10:00
Martin Schwenke
cd4074d2f8 Eventscripts: make 50.samba use $service_state_dir.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 0f003f05e28037eefdce3a686fcb52cd2289af9d)
2011-08-11 10:46:56 +10:00
Martin Schwenke
3d1f0100be Eventscripts: update 60.nfs to use ctdb_service_check_reconfigure.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 7c070b0bc86b3b9a91a9dc263b72c0567934535c)
2011-08-11 10:46:56 +10:00
Martin Schwenke
a35138a001 Eventscripts: update 60.nfs to use ctdb_setup_service_state_dir.
The state directory basename becomes "nfs" rather than "statd".  One
line of code is moved from the "startup" event to service_start().

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit cc4c5c19af7efe01c48f73bb5ec5e607ed79db4c)
2011-08-11 10:46:20 +10:00
Martin Schwenke
d6c5fcfbae Eventscripts: update 40.vsftpd to use ctdb_service_check_reconfigure.
To simplify we also remove the reconfigure from the recovered event
because the monitor event will handle this very quickly anyway.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit da3aedd1a472b430b75989d3c157efedd382e327)
2011-08-11 10:46:20 +10:00
Martin Schwenke
4daf8bb1c8 Eventscripts: update 41.httpd to use ctdb_service_check_reconfigure.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 51c45b1c4751af41e5f9fd252763e0025f8cce3a)
2011-08-11 10:46:20 +10:00
Martin Schwenke
820d9b30ea Eventscripts: rejig the reconfigure infrastructure.
* Add an optional service name argument to existing reconfigure
  functions.

* Use function service_reconfigure() instead of variable
  $service_reconfigure to specify how a service is reconfigured.

* New function ctdb_service_check_reconfigure() reconfigures a service
  if it is flagged for reconfigure.

* Remove $service_reconfigure settings from 40.vsftpd and 41.httpd -
  they're the defaults.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 15d4111d0761d82f57d5d4f0b1227812d14e4d7c)
2011-08-11 10:46:20 +10:00
Martin Schwenke
5b5bd3d27b Eventscript functions: move flagging of managed services.
Move flagging of managed or unmanaged services into
ctdb_service_start() and ctdb_service_stop().  That way services will
be correctly flagged if they are started from the startup and shutdown
events.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 8675744cbd90b5a5095ed6fff7b36ae82004a457)
2011-08-11 10:46:20 +10:00
Martin Schwenke
428e32d647 Eventscript function: change service_start into a function.
service_start is currently a variable.  This makes passing arguments
hard.  We change it to be a function and put default definitions into
the functions file.

We use a convention that if a service name argument is passed to a
redefined version of service_start() or service_stop() then it will
act unconditionally.  If no argument is passed then it can use
internal logic to decide if services should really be started.  This
is useful when a single eventscript handles multiple services.
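
The convention might look something like this in an eventscript that
handles two services.  All names other than service_start() are
hypothetical, and starting a service is reduced to an echo.

```shell
#!/bin/sh
# Hypothetical helpers for the sketch.
MANAGED="samba"
start_one () { echo "starting $1" ; }
is_managed ()
{
    case " $MANAGED " in
        *" $1 "*) return 0 ;;
    esac
    return 1
}

service_start ()
{
    if [ -n "$1" ] ; then
        # A service name was passed: act unconditionally.
        start_one "$1"
        return
    fi
    # No argument: use internal logic to decide what to start.
    for _s in samba winbind ; do
        if is_managed "$_s" ; then
            start_one "$_s"
        fi
    done
}

service_start            # only starts what is managed
service_start winbind    # starts winbind regardless
```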

This is a cherry-pick of ae38895 that needed to be reset mid-stream.
There is still some breakage following this commit.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 86e4aefed9fd1028660c98e3ea758c2b75ffc1d8)
2011-08-11 10:46:20 +10:00
Martin Schwenke
f60802c776 Eventscript functions: add optional event name argument to fail count functions.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit b14f18649f42aab80ce0336c15ab6159f241c9af)
2011-08-11 10:46:20 +10:00
Martin Schwenke
ea6a53e2b3 Eventscript functions - optimise is_ctdb_managed_service().
This function generates a lot of trace when running under "set -x".
This is due to the backward compatibility code.

This adds 3 optimisations:

1. Before invoking the backward compatibility code,
   is_ctdb_managed_service() returns early if the service is listed in
   $CTDB_MANAGED_SERVICES.

2. ctdb_compat_managed_service() actually now updates
   $CTDB_MANAGED_SERVICES instead of temporary variable $t.

   This means that a subsequent call to is_ctdb_managed_service() will
   short circuit due to optimisation (1).

3. ctdb_compat_managed_service() only adds a service to
   $CTDB_MANAGED_SERVICES if it is the service being checked by
   is_ctdb_managed_service().

   This stops irrelevant services being added to
   $CTDB_MANAGED_SERVICES multiple times by multiple calls to
   is_ctdb_managed_service().
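
A rough sketch of optimisations (1) to (3), not the actual code.  The
function names are shortened and the compatibility check is reduced
to a single hypothetical case for winbind.

```shell
#!/bin/sh
CTDB_MANAGED_SERVICES="nfs samba"
CTDB_MANAGES_WINBIND="yes"

# Hypothetical compatibility code: it only considers the service being
# asked about and records the result in $CTDB_MANAGED_SERVICES.
compat_managed_service ()
{
    if [ "$1" = "winbind" ] && [ "$CTDB_MANAGES_WINBIND" = "yes" ] ; then
        CTDB_MANAGED_SERVICES="$CTDB_MANAGED_SERVICES winbind"
        return 0
    fi
    return 1
}

is_managed_service ()
{
    case " $CTDB_MANAGED_SERVICES " in
        *" $1 "*) return 0 ;;    # already listed: return early
    esac
    compat_managed_service "$1"
}

is_managed_service winbind && echo "winbind is managed"
# The second call short circuits because winbind is now listed.
is_managed_service winbind && echo "managed: $CTDB_MANAGED_SERVICES"
```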

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 758f4667c60089e09a0439c1eb74f5e426ca5e2e)
2011-08-11 10:46:20 +10:00
Martin Schwenke
6ec2cfc7da 50.samba eventscript should use is_ctdb_managed_service "winbind".
Currently it checks $CTDB_MANAGES_WINBIND directly in several places.
This doesn't work when someone sets $CTDB_MANAGED_SERVICES directly.

This modifies check_ctdb_manages_winbind() so that it returns a
condition rather than modifying $CTDB_MANAGES_WINBIND.  This makes
some code more readable.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 538902fbc1e74134a03987b36b3733ad641f8971)
2011-08-11 10:46:20 +10:00
Martin Schwenke
e96e655430 50.samba eventscript should use is_ctdb_managed_service "samba".
Currently it checks $CTDB_MANAGES_SAMBA directly.  This doesn't work
when someone sets $CTDB_MANAGED_SERVICES directly.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit d8f0f8948abd340088720718fef7dc858661ba23)
2011-08-11 10:46:20 +10:00
Martin Schwenke
45bcf843ec 50.samba eventscript should stop/start services when they become (un)managed.
When the value of $CTDB_MANAGES_SAMBA or $CTDB_MANAGES_WINBIND changes
(or corresponding changes are made to $CTDB_MANAGED_SERVICES), the
associated service should be started or stopped as necessary.

This adds calls to ctdb_start_stop_service() to manage
starting/stopping samba and winbind.

An associated cleanup is made to the initial checks that one of
$CTDB_MANAGES_SAMBA or $CTDB_MANAGES_WINBIND is set, replacing them
with calls to is_ctdb_managed_service().

To handle the winbind cases ctdb_start_stop_service() and
is_ctdb_managed_service() are updated to take an optional service name
parameter.

Signed-off-by: Martin Schwenke <martin@meltin.net>

Conflicts:

	config/events.d/50.samba

	Most of this merged elsewhere.  This just removes a check that
	this is the monitor event.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 257a2e350280c0b76ed2fac588cad167381fda52)
2011-08-11 10:46:20 +10:00
Ronnie Sahlberg
21226ee738 Add documentation for the new filesystem use monitoring
(This used to be ctdb commit 9f10c5d48a08ffb3417f880c801aed2aa2dc1355)
2011-08-11 10:07:50 +10:00
Ronnie Sahlberg
ee96db07d5 Add new eventscript 40.fs_use that can be used to monitor file system use and flag a node unhealthy when they become full
(This used to be ctdb commit 2fd1babf8135ad5d53f3b25ba823d840ebc66460)
2011-08-11 10:04:40 +10:00
Ronnie Sahlberg
c8a18e8f9a make the persistent even longer for lvs to make people even happier
(This used to be ctdb commit 8158077624eb763ba40c6a7b4b7faf3867b205d7)
2011-08-11 09:12:38 +10:00
Ronnie Sahlberg
543701293f increase the persistent timeout to make people happier
(This used to be ctdb commit 68ea19cb02017e93769df7f6312d5e0bef55e605)
2011-08-11 07:14:57 +10:00
Ronnie Sahlberg
f9156adef5 check the shares if they are available before we decide to try to restart nfs
CQ S1027529

(This used to be ctdb commit b6c6a4588ccf6ef78fabfd76d228f56b4eb65165)
2011-08-11 07:14:16 +10:00
Martin Schwenke
4e60075228 Eventscripts - fix 10.interface bash incompatibility.
In dash, this fails gracefully with nothing to stderr:

  t=$(cat /does_not_exist) 2>/dev/null

In bash the error from cat is still printed due to different order of
evaluation.

This works everywhere:

  t=$(cat /does_not_exist 2>/dev/null)

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit a6e61867c7a58d5a77cd8641d8df0b105cddff77)
2011-08-10 16:06:26 +10:00
Martin Schwenke
06f1004da4 Merge branch 'eventscript.20.multipathd' into eventscript.00.ctdb
(This used to be ctdb commit 8723b88b0b2bbeece38c74c77c50e8d8b3e2d5ca)
2011-08-10 15:32:58 +10:00
Martin Schwenke
383b203096 Merge branch 'eventscript.62.cnfs' into eventscript.20.multipathd
(This used to be ctdb commit fb87fa9273db4f82e801a331b5d95059d64dfb8e)
2011-08-10 15:32:11 +10:00
Martin Schwenke
7eae4aafca Merge branch 'eventscript.13.per_ip_routing' into eventscript.62.cnfs
(This used to be ctdb commit cfa4102ec0d97e1d1d3c1ce6407ffacdb85c2e10)
2011-08-10 15:31:13 +10:00
Martin Schwenke
098da255fa Eventscripts: update 61.cnfs to use ctdb_setup_service_state_dir.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit afafeb1fb12384bddff470d38b534f513a1f3b07)
2011-08-10 12:27:41 +10:00
Martin Schwenke
061b7adad6 Eventscripts: update 13.per_ip_routing to use ctdb_setup_service_state_dir.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 18e0236754507a9475653f04bb239c5d46ba51de)
2011-08-09 17:35:37 +10:00
Martin Schwenke
609a1e5c77 Eventscripts: update 20.multipathd to use ctdb_setup_service_state_dir.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 797ca65bdd59b14325ffd32b4d4140e9b01dbe71)
2011-08-09 17:28:09 +10:00
Martin Schwenke
f36bae1cbf Eventscripts: fix dangerous rm -rf in 00.ctdb init event.
Also remove some unnecessary absolute paths for commands, which were
making the code slightly difficult to read.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 1b3f2dd62efb240f8486016fe0f8dfb73d6ccc66)
2011-08-09 16:48:57 +10:00
Martin Schwenke
dd56cde3ff Eventscripts: 00.ctdb uses $service_state_dir, neaten update_config_from_tdb().
This also fixes a bug where update_config_from_tdb() used an incorrect
filename in one place.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit a5ce2adaa39f077f56582072a97bb64d0eba4b4d)
2011-08-09 16:45:50 +10:00
Martin Schwenke
cbf030a72e 00.ctdb eventscript removes all files from $ctdb_active_dir.
Without this you can get into a situation where ctdbd can not start.
If the active file for a service exists but the service is not
running, then trying to stop the service may fail, causing the
eventscript to exit from ctdb_start_stop_service().

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 28379ca0f747c5952d690a451834ce7421adfd34)
2011-08-09 16:42:27 +10:00
Martin Schwenke
71e9016ec2 Scripts: add note about not using absolute command paths to README.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 87e6a4a23a6ae6c276e9628ce513663f47b4ee77)
2011-08-09 16:36:37 +10:00
Martin Schwenke
d81c1319e9 Add a README to the config/ subdirectory.
This includes a comment about using POSIX Bourne shell, including a
suggestion not to use "local" variables.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 5ae002c7513b1b2aa5136437a1a19f8cd179b869)
2011-08-09 16:36:37 +10:00
Martin Schwenke
ee38b9a159 Eventscript functions: new function ctdb_setup_service_state_dir().
To be used by eventscripts to create a per-service directory for their
own state data.  $service_state_dir is set to point to the new
directory.
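
In outline, such a function could look like the sketch below.  The
"state" subdirectory, the optional name argument and the temporary
default for $CTDB_VARDIR are assumptions made for illustration.

```shell
#!/bin/sh
# Assumed defaults so that the sketch runs without a ctdb installation.
CTDB_VARDIR="${CTDB_VARDIR:-$(mktemp -d)}"
service_name="demo"

setup_service_state_dir ()
{
    # Directory layout is hypothetical: one subdirectory per service.
    service_state_dir="$CTDB_VARDIR/state/${1:-$service_name}"
    mkdir -p "$service_state_dir" || {
        echo "Error creating state dir \"$service_state_dir\""
        exit 1
    }
}

setup_service_state_dir "nfs"
echo "state for nfs lives in $service_state_dir"
```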

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit a273554791c2a5281aee28f8e2be0c514e14c91e)
2011-08-09 16:35:07 +10:00
Martin Schwenke
ec33c04283 Eventscript functions: new functions to remember/check if service managed.
This was done ad hoc and was badly named.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 9a084a121f629b2c1bcefc1e4c4a4a5cacf53987)
2011-08-09 16:20:08 +10:00
Martin Schwenke
50dc5b01a4 Scripts: remove absolute paths from interface_modify.sh.
The "ip" command is currently run as "/sbin/ip".  This makes it
impossible to replace with a stub in unit testing.  The functions file
controls $PATH, so we don't need absolute paths.

This replaces the absolute paths...

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 5b4c712aab3edc0059f2e5a6730b7fdcf7e5f4ec)
2011-08-08 15:50:10 +10:00
Martin Schwenke
eec654314a Eventscripts - Remove local variable usage in 10.interfaces.
POSIX sh doesn't have local variables.  Debian's dash doesn't behave
the same way as bash on this construct:

  local var=`command that produces multiple words`

It only assigns the 1st word and may print an error.

Just remove the use of the "local" keyword in monitor_interfaces() to
solve this.  It isn't actually limiting the scope of any variables
that are used outside the function.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 95d9a1e19655461288a2c7e52abf9d01ab23e05a)
2011-08-08 15:44:30 +10:00
Martin Schwenke
72362e7b56 Eventscripts: source a file specified by $CTDB_RC_LOCAL in functions file.
Another unit testing hook.  This is easier than dropping files into
rc.local.d/ and then removing them.

The file has to be executable.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit b13ac3bdaf326a6cdfd87da9195eb9630806c418)
2011-08-08 13:51:32 +10:00
Martin Schwenke
394bbe8454 Eventscript functions - use $CTDB_VARDIR instead of local $ctdb_spool_dir.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit d0c6d9b19f0dd8946f9504b0d1cf50dd21f7a592)
2011-08-08 13:21:23 +10:00
Martin Schwenke
b0e7237653 Eventscripts - remove some more absolute paths to commands.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit f5b7cb03aaf19fb4b12fc3f0c14d98ee2d7b0798)
2011-08-04 17:14:11 +10:00
Martin Schwenke
8026b3ce5a Eventscripts - Rework the use of get_proc() for the bonding checks.
Call get_proc(), put the output into a variable and then use it.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 2dfdc997f432d522034922b43cb6f8f878d11ba7)
2011-08-03 20:12:48 +10:00
Martin Schwenke
6fd94af5cc Eventscripts: update 60.nfs service() start to use set_proc().
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 70ebb30b90956bb1212287d267ccb72ea83740ca)
2011-08-03 20:01:38 +10:00
Martin Schwenke
4b516600a2 Eventscripts: update 10.interface to use set_proc() and get_proc().
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 61b7f0172ba5c83c847c29fac3582c25c7754b68)
2011-08-03 19:58:25 +10:00
Martin Schwenke
cfdccc5cac Eventscripts: use set_proc() in startstop_nfs().
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 5a3d5c6b1ca3682bb45104e50061871dec6e9b1d)
2011-08-03 19:57:40 +10:00
Martin Schwenke
75bbc93c0b Eventscripts: remove unnecessary absolute paths from external commands.
For eventscript unit testing it will be necessary to override external
commands to allow stub implementations to be used.  If absolute paths
aren't used then this can be done using either a fake bin/
subdirectory or by using shell functions.
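
As a small illustration of the shell function approach: once the code
under test calls a command by name, a function of the same name takes
precedence over anything in $PATH.  list_addresses() is a hypothetical
example, not taken from the eventscripts.

```shell
#!/bin/sh
# Code under test: calls "ip" by name rather than "/sbin/ip".
list_addresses ()
{
    ip addr show dev "$1"
}

# Test stub: the shell finds this function before searching $PATH.
ip ()
{
    echo "stub-ip $*"
}

list_addresses eth0
```

Had list_addresses() used /sbin/ip, the stub would be bypassed and the
real command would run.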

This removes all of the simple cases of absolute paths.

Signed-off-by: Martin Schwenke <martin@meltin.net>

Conflicts:

	config/ctdb.init
	config/events.d/50.samba

        Keep old code but remove absolute paths.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 05851d50b0078de8bf4691442d718825adca6fe8)
2011-08-03 17:19:15 +10:00
Martin Schwenke
5f4ab05766 Eventscripts: new functions set_proc() and get_proc().
These provide a thin layer around writing and reading files in /proc.
They can be easily replaced by stubs for unit testing.
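
A minimal sketch of what such wrappers could look like.  The
$PROC_ROOT override is an assumption added so that the example can run
against a scratch directory; the real functions may simply be
redefined by the test harness instead.

```shell
#!/bin/sh
# Hypothetical override: point the wrappers at a scratch directory.
PROC_ROOT=$(mktemp -d)
mkdir -p "$PROC_ROOT/sys/net/ipv4"

# Thin wrappers: paths are given relative to /proc.
set_proc ()
{
    echo "$2" >"$PROC_ROOT/$1"
}

get_proc ()
{
    cat "$PROC_ROOT/$1"
}

set_proc sys/net/ipv4/ip_forward 1
get_proc sys/net/ipv4/ip_forward
```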

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 637f9d8af517b73c72ed8f3cc2a2661f11eb2126)
2011-08-03 17:04:58 +10:00
Martin Schwenke
571e55ac0d Eventscripts: remove ctdb_wait_command() and ctdb_wait_tcp_ports() functions.
These haven't been used for a long time.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit f5fd361cadb3ea18d29e2d7215a7853718e48d00)
2011-08-03 17:02:41 +10:00
Martin Schwenke
e3a9991e46 Eventscripts: iptables() should put lock in $CTDB_VARDIR.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 3f04793f391c63b78ffb9c9851ab3f0daf3ed50a)
2011-08-03 16:55:43 +10:00
Martin Schwenke
3bbfdfcdd3 Make Emacs recognise that the eventscript functions file is a shell script.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit a6dfb76cfa759f6f9409f24368111c4f85ca0fbf)
2011-08-03 16:49:38 +10:00
Martin Schwenke
3380c6ce1d Eventscript functions: add $CTDB_ETCDIR and hook service() functions.
* $CTDB_ETCDIR defaults to /etc but can be changed for testing.  All
  hard-coded instances of /etc have been changed to $CTDB_ETCDIR.
  This includes references to /etc/init.d and /etc/sysconfig.

* service() and nice_service() functions now call new function
  _service().  This makes it easier to override these functions (say,
  in rc.local) for testing and call most of the existing functionality
  using _service().

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit f43c9a7604b779bb6257ddb2bf3cbe266d496a63)
2011-08-03 16:45:54 +10:00
Martin Schwenke
d31fbcab4b Set $CTDB_VARDIR in the functions file.
This will be needed when eventscripts that use it are called
externally.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit ebd53b66b0cc66d9d04830781886234167fc2164)
2011-08-03 16:44:49 +10:00
Martin Schwenke
652bf326e1 Eventscripts - 10.interfaces should not check orphaned interfaces.
If the last IP address on an interface is removed then that
interface should no longer be checked by 10.interfaces.  However,
"ctdb ifaces" still lists such interfaces so they are currently
checked.

The problem really needs to be addressed in ctdbd but a neat quick
eventscript fix will be minimally invasive...

This changes the code to use "ctdb -Y ip -v" instead of "ctdb -Y
ifaces".  The former includes details of all public addresses and
associated interfaces, so when an address is removed there is no
output for it.  This prevents orphaned interfaces from being listed.

The logic is also slightly improved so that $IFACES includes just a
(non-uniquified) list of interfaces, allowing an existing loop to be
removed.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 49b2d1bd9554461ed8edbfc21e777c0eca9e1443)
2011-08-02 16:53:14 +10:00
Ronnie Sahlberg
18af72f08f change the name for the key for the record where we store the public address config from public-addresses... to public_addresses...
CQ1019030

(This used to be ctdb commit 114d5034ff4880848588caf493382a537a1469ae)
2011-06-28 15:40:46 +10:00
Mathieu Parent
c262fe6a8f Fix bashism
... again ;-)

Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit 2266586c1839af032622be54dc7f71e39d2bd9ef)
2011-05-14 22:30:25 +02:00
Ronnie Sahlberg
d020b2c950 When using multiple VLANs, some funky stuff can sometimes happen when
adding/removing IP addresses, causing routes to be dropped by the system.

The easiest workaround for this is to unconditionally try to reapply
all static routes for all interfaces once ipreallocation has finished,
not just adding them back on the affected interface.

This works around a funky issue in
CQ S1023538

(This used to be ctdb commit 84600d1f53632d5fe76c308727f31f61b5ec1010)
2011-05-12 12:06:45 +10:00
Ronnie Sahlberg
d1edf44e4f If samba fails to start for some reason, make this cause the startup event to fail too, so that ctdbd will re-try the startup event later.
Or else this will leave samba not running.

CQ S1023394

(This used to be ctdb commit f90485b08d32cbe56050718a3b28ca0fe1d64e0f)
2011-05-10 09:59:38 +10:00
Ronnie Sahlberg
ee9e137759 Don't exit from checking interfaces once we have found one interface that is not
in use by public addresses.  This can happen when we have removed existing interfaces/IP addresses, and it prevents us from verifying the status of other interfaces.

(This used to be ctdb commit d67955b42f7627be9dae995230c8fcbb8a948ec2)
2011-05-10 07:53:43 +10:00
Ronnie Sahlberg
2e2e37fdd6 Remove logging of spam/errors from the 10.interface
script if/when we have for example NATGW configured but no public addresses defined on that interface

CQ S1023378

(This used to be ctdb commit 8837daa424732aeb5a20814b1709c345a97a0e09)
2011-05-09 08:10:49 +10:00
Ronnie Sahlberg
d97e42183e bonding mode 4 monitoring:
We cannot just check if MII Status is up for bonding mode 4, since the kernel will always report the bond device as UP
even if all cables are disconnected.

For mode 4, ignore the status of the bond device and instead check if at least one slave interface is up
when determining if the device is good or bad.

(This used to be ctdb commit a6930cec6d9503dba18b9d4839d87a1c1a8ddba2)
2011-04-13 09:05:58 +10:00
Ronnie Sahlberg
c04505724a IFACE handling. Assume links are always good on startup (they almost always
Simplify the handling of setting the links in the 10.interface eventscript
and remove the optimization to only call setifacelink on state change
to make the code simpler to read.

If a take ip event fails, flag the node as unhealthy.

Add a check to the interface script to check if the interface exists
or if it has been deleted.
So that we can capture and become UNHEALTHY if someone deletes an interface
we are using to host public addresses.

(This used to be ctdb commit 4ab63d2a7262aff30d5eced184c294c9c9dd4974)
2011-04-11 07:40:05 +10:00
Ronnie Sahlberg
55853a4683 NATGW: don't set arp_ignore in 11.natgw anymore since we no longer
need this for the natgw functionality

(This used to be ctdb commit bf3bf2967e3781c918e33b3a210e68e0ccca0c51)
2011-04-06 11:33:11 +10:00
Michael Adam
c9dc10292e ctdb.init: print a warning when tdbdump is found but tdbtool or "tdbtool check" is not available
(This used to be ctdb commit afb26e38b617b85cdac14a7cd6dd3c85b8fddbc4)
2011-04-05 13:50:00 +02:00
Michael Adam
faa6d8d7e2 ctdb.init: check for availability of "tdbtool check" and "tdbdump"
Print a warning if neither is available.

(This used to be ctdb commit 4137d2a7d31cdce22847cebfc0239cfe2d8e937c)
2011-04-05 13:43:56 +02:00
Mathieu Parent
a5a6140b7e Correction of spelling errors
* continous -> continuous
* activete  -> activate

(thanks to lintian)

See https://bugzilla.samba.org/show_bug.cgi?id=6935

Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit fb6987c2f747d6dbf9bb3899a480124d1c242a90)
2011-03-23 00:35:23 +01:00
Ronnie Sahlberg
a453e79050 50.samba : Tell winbind about every time we add/remove an ip from the node
CQ S1021636

(This used to be ctdb commit 87b279027616cffbcedfd534ac0032cd51238dfe)
2011-02-18 11:29:35 +11:00
Ronnie Sahlberg
d32a4dd501 remove checking for filesystems and filesystem health from the cnfs script.
remove the gpfsmount and gpfsumount entry points

(This used to be ctdb commit 7db5a4832a9555be53c301f198f72b9e075a8ae7)
2011-02-18 10:11:56 +11:00
Ronnie Sahlberg
ef0ab7eee1 60.nfs
Dont update the statd settings that often.
When we have very many nodes and very many ips, this would generate
a lot of unnecessary load on the system

(This used to be ctdb commit 0c030c9384500f340d8382c20e1e91b11aa377e9)
2011-02-18 10:10:34 +11:00
Martin Schwenke
59c5a9f279 Eventscripts: lower the fail/restart limits for nfsd.
We were potentially leaving a node unable to serve requests for too
long.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 5be8610ffa33db49e33949560d0ef2fa5f3c0c73)
2011-01-11 16:49:46 +11:00
Martin Schwenke
96378d6dc8 Eventscripts: use "startstop_nfs restart" to reconfigure NFS.
This was defaulting to just "service nfs restart", which doesn't have
the workarounds we need.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 0f462e9e9fe12b595f3c7452123db8e69548abd6)
2011-01-11 16:49:14 +11:00
Martin Schwenke
3efd5ef77c Eventscripts: only autostart during a monitor event.
Otherwise we might short-circuit events that are run only once and
actually need to do something.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit c4f9e8a43540bc049b2771e0a2d76d37b9d17331)
2011-01-11 16:48:50 +11:00
Martin Schwenke
fb8f199651 Eventscripts: print a message when reconfiguring a service.
Otherwise there can be strange error messages from services
stopping/starting, without any context.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 8bcf7ab164429ddc0ae530133e114f186a8146dd)
2011-01-11 16:48:17 +11:00
Martin Schwenke
934ae76d38 Eventscripts: work around NFS restart failure under load.
"service nfs restart" can fail.  To stop nfsd it sends a SIGINT and
nfsd might take a while to process it if the system is loaded.
Starting nfsd may then fail because resources are still in use.

This does some /proc magic to tell nfsd to do no more processing.  It
then runs service stop, kills nfsd with SIGKILL, and then runs service
start.  This is much less likely to fail.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit a9bf4f82852975b0b627f61ceb2d23401f630805)
2011-01-11 16:47:43 +11:00
Ronnie Sahlberg
47aad74673 TYPO
(This used to be ctdb commit 38dc1ac2e87416a22c9356596286b773d601e71c)
2011-01-11 16:17:33 +11:00
Ronnie Sahlberg
2a3442d972 STATD is 100027 not 1000247
(This used to be ctdb commit f4cf15a2b06ffefde0cba803603b48040ad0fa05)
2011-01-11 16:16:28 +11:00
Ronnie Sahlberg
7e747aab8d 60.nfs Check if we have rpc.statd and if not, skip checking for statd
availability at all (since we cant restart it, there is no point checking
if it is alive)

(This used to be ctdb commit 6075e85ba6c0f58fd1ab2ce3b09dd3d6ff491365)
2011-01-06 15:49:15 +11:00
Ronnie Sahlberg
ded7c23122 41.HTTPD
Httpd can be very slow to start on some platforms,
wait 5 monitor intervals before we try to restart it if
it has not bound to port 80 yet.
After 10 failed intervals, flag the node as unhealthy.

(This used to be ctdb commit 6ec1993aa5f2778b8227ce5f6eca0d19e4ae9788)
2010-12-22 10:31:41 +11:00
Ronnie Sahlberg
e9ff38be7d 60.nfs
Try to restart LOCKD after 10 failures and
flag the node as unhealthy after 15 failures

(This used to be ctdb commit 5a67889c9166835aef3443051812d14af07dfca5)
2010-12-22 10:31:31 +11:00
Ronnie Sahlberg
57e74f6d8a Dont run net serverid wipe in the background
(This used to be ctdb commit 76c515f9f05f4fb5683b5ff65cf136c168fd882f)
2010-12-22 10:31:26 +11:00
Ronnie Sahlberg
97a6eccaf7 50.samba
Net serverid wipe can take a bit of time sometimes so background it.

Only perform auto start/stop of the managed service on the monitor event

(This used to be ctdb commit deba5cbbf7703a1a24ce88a06c73fca056e05521)
2010-12-14 21:19:28 +11:00
Ronnie Sahlberg
1e41ab5fa3 LVS
update lvs configuration on ipreallocated events too

(This used to be ctdb commit a4e98073d955676fdcbb91affae1de1a733d0bc2)
2010-12-13 14:24:16 +11:00
Ronnie Sahlberg
c26c6a01cf only run "serverid wipe" if we are actually running samba.
we dont need to run this on systems where we do run winbind but not samba

(This used to be ctdb commit fcb9e8d1e1c78439ea42adb8b05ad84fbca7f724)
2010-12-10 13:42:12 +11:00
Ronnie Sahlberg
8147d29598 add a missing part of the import of the previous ganesha patch
(This used to be ctdb commit 171b8855bb2feae7f7dd6a079571f3113dedd6f4)
2010-12-06 11:50:15 +11:00
Chandra Seetharaman
5e485d5ca0 make changes to ctdb event scripts to support NFS-Ganesha.
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>

(This used to be ctdb commit 7298588ed54492f106954c893dd86b0a36783470)
2010-12-06 11:50:12 +11:00
Ronnie Sahlberg
8959c8e850 dont try starting samba through the "init" event
(This used to be ctdb commit e314a449606418a4c4eac6eb319bfcdf1c398cd3)
2010-12-03 11:40:38 +11:00
Ronnie Sahlberg
6ed0009125 When we are no longer the natgw master, dont put the natgw ip on loopback.
We put the ip on loopback just to make sure we would still interoperate with
non-standard configurations on unix-KDC, that are configured to verify the optional
HostAddresses field.
This is not required for AD, since AD does not use this field, and is replaced in
unix land with other/better mechanisms than this "dodgy" check.

This makes it "easier" for applications that have bound to the natgw address
to detect a socket problem and try to reconnect/recover if the ip address
is completely missing from the system.

At the same time, use the winbind specific hook that exists to explicitly tell winbindd : this address is gone, so if you have bound to it, this is a good time to close and rebind your socket.

cq 1020333

(This used to be ctdb commit 0da94869d2912b2a412ba3fbd2137d88ce4e4389)
2010-11-29 12:45:59 +11:00
Ronnie Sahlberg
ebcc866ae0 update autostart/stop to work for samba
(This used to be ctdb commit 37ab57e2adaecc3f7996ea20af45a5df0cd8be76)
2010-11-22 20:42:26 +11:00
Ronnie Sahlberg
a3e7dfadca add an explicit _is_managed_service to iscsi eventscript
(This used to be ctdb commit 44f683a1ba15944d3306a0effd572de3280ff975)
2010-11-18 14:15:56 +11:00
Ronnie Sahlberg
193d9d50d1 Dont pollute the logs with a "file not found" message
CQ S1020745

(This used to be ctdb commit ea8bb7b26bb879a895c267d49672433182390d0d)
2010-11-18 13:54:15 +11:00
Martin Schwenke
c00db6f271 60.nfs eventscript should do nothing if NFS isn't managed by CTDB.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 582e5cd077501e8d4131a9c7981781471308edfd)
2010-11-18 13:36:40 +11:00
Martin Schwenke
a2af87482b Eventscript functions - catch failures in ctdb_service_start().
ctdb_service_start() currently succeeds if ctdb_counter_init()
succeeds.

This changes it to fail when a service start fails.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit ddb73962d72d933bf0edc28be0dbb45bea7e5ef4)
2010-11-18 12:15:05 +11:00
Martin Schwenke
3ab768e8d4 50.samba eventscript should stop/start services when they become (un)managed.
When the value of $CTDB_MANAGES_SAMBA or $CTDB_MANAGES_WINBIND changes (or
corresponding changes are made to $CTDB_MANAGED_VERSIONS), the
associated service should be started or stopped as necessary.

This adds calls to ctdb_start_stop_service() to manage
starting/stopping samba and winbind.

An associated cleanup is made to the initial checks that one of
$CTDB_MANAGES_SAMBA or $CTDB_MANAGES_WINBIND is set, replacing them
with calls to is_ctdb_managed_service().

To handle the winbind cases ctdb_start_stop_service() and
is_ctdb_managed_service() are updated to take an optional service name
parameter.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit d98f175e8420d921a123ae9c0ce00945350b1537)
2010-11-18 12:12:30 +11:00
Ronnie Sahlberg
4fe85e5be5 add a new support function ctdb_check_counter_equal()
update nfs to try to restart the service after 10 consecutive failures
and to flag the node unhealthy after 15

add similar function to mountd

(This used to be ctdb commit 1569a54bb82fc433895ed68f816cf48399ad9d40)
2010-11-17 13:54:57 +11:00
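The thresholds in 4fe85e5be5 (restart after 10 consecutive failures, unhealthy after 15) could be tracked with a per-service failure counter. The sketch below is only an illustration: the helper names, the file-based counter and COUNTER_DIR are assumptions, not the actual ctdb_check_counter_equal() implementation.

```shell
# Hypothetical helpers; each counter is a number stored in a file.
COUNTER_DIR="${COUNTER_DIR:-${TMPDIR:-/tmp}/failcount.$$}"

counter_get () {
    cat "$COUNTER_DIR/$1" 2>/dev/null || echo 0
}

counter_incr () {
    mkdir -p "$COUNTER_DIR"
    _n=$(counter_get "$1")
    echo $((_n + 1)) > "$COUNTER_DIR/$1"
}

counter_reset () {
    rm -f "$COUNTER_DIR/$1"
}

# check_service NAME COMMAND [ARGS...]
# Run COMMAND as the health check and print the action a monitor event
# would take: ok, ignore, restart (10th failure) or unhealthy (15 or more).
check_service () {
    _name="$1"
    shift
    if "$@" >/dev/null 2>&1; then
        counter_reset "$_name"
        echo ok
        return 0
    fi
    counter_incr "$_name"
    _fails=$(counter_get "$_name")
    if [ "$_fails" -ge 15 ]; then
        echo unhealthy
    elif [ "$_fails" -eq 10 ]; then
        echo restart
    else
        echo ignore
    fi
}
```

A successful check resets the counter, so only consecutive failures count towards either threshold.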
Martin Schwenke
8fe1ec3754 Eventscripts: make loadconfig() function hookable by the test suite.
Rename loadconfig() to _loadconfig().  Add a new loadconfig() that
simply calls _loadconfig().

This makes it easy for the test suite to override loadconfig().

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 1d77a3adfff893b3c01b87f791e72c0d3148425c)
2010-11-17 11:46:48 +11:00
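The rename described in 8fe1ec3754 can be sketched as follows. The body of _loadconfig() and the CONFIG_LOADED variable are placeholders assumed for illustration; the real function reads the configuration files.

```shell
# Sketch only: the real _loadconfig sources configuration files.
# Here it just sets a marker (CONFIG_LOADED is a hypothetical variable).
_loadconfig () {
    CONFIG_LOADED="real"
}

# Thin wrapper that event scripts call. Because it does nothing but
# delegate, a test suite can redefine it without touching _loadconfig.
loadconfig () {
    _loadconfig "$@"
}

# A test harness could then override the wrapper, for example:
#   loadconfig () { CONFIG_LOADED="stub"; }
```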
Martin Schwenke
e23ca7dba5 Make a time comparison in 60.nfs eventscript more readable.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 26077e6c8eb126584af587e7416154ea4858aea2)
2010-11-17 11:44:26 +11:00
Martin Schwenke
6ab5ae2c9b 60.nfs only fails or warns after 10 consecutive nfsd/statd failures.
These failures are sometimes the result of slow restarts so we want to
avoid dirtying the logs or marking a node unhealthy because of them,
unless they are excessive.

For these 2 cases we use the existing fail counting code but hack a
temporary service_name in a subshell to allow separate fail counts.

We also update ctdb_check_rpc() so that it captures the error output
from rpcinfo and we add a message including the service name to the
beginning.  The error is printed to stdout but is also stored in
ctdb_check_rpc_out to allow it to be conditionally used by the caller.
This function also now returns non-zero rather than exiting on
failure.

Other direct rpcinfo calls are replaced by calls to ctdb_check_rpc()
for consistency.

Option handling code for service restarts is cleaned up so that it fits
in 80 columns.  A more informative restart message is now used in all
cases, printing the exact command being used to start a service.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 79c25fe241cf5d8f92e23d3736823ebaf4e1769d)
2010-11-17 11:43:09 +11:00
Ronnie Sahlberg
055eafb790 this stuff is just so fragile that it will enter infinite recovery and fail loops
on any kind of tiny unexpected error

unconditionally try to remove ip addresses from both old and new interface
before trying to add it to the new interface to make it less
fragile

(This used to be ctdb commit 80acca2c91c9053c799365bae918db7ed8bdc56f)
2010-11-10 14:55:25 +11:00
Ronnie Sahlberg
ebed26d755 delete from old interface before adding to new interface
this stops the script from failing with an error if
both interfaces are specified as the same, which otherwise breaks and leads to an infinite recovery loop

(This used to be ctdb commit 565de03a784ed441490f8cd0b137b5cec8716d55)
2010-11-10 14:55:25 +11:00
Ronnie Sahlberg
76578b9533 dont delete all ips from the system during the initial "init" event
leave any ips as they are and let the recovery daemon remove them as required

(This used to be ctdb commit 8ab311719857847b4cf327507b0af1793551e73c)
2010-11-10 14:55:23 +11:00
Ronnie Sahlberg
a1cfa23d60 Both nfs and nfslock scripts can fail under redhat in very rare situations.
Ctdb can also be configured to ignore checking for knfsd and if it is alive.
In that situation, no attempt will be made to restart nfs, and since nfs is not running, lockd can not be restarted either.

To work around this, every time we try to restart the lockmanager, also try to restart nfsd

(This used to be ctdb commit 953dbfbddad656a64e30a6aca115cb1479d11573)
2010-10-28 13:45:40 +11:00
Ronnie Sahlberg
0d75856bb7 When shutting down, we always unconditionally try to remove the natgw address
even if we are not currently the natgw master.
This adds extra reliability in case we have stopped previously without removing it properly,
but does add spam messages to syslog every time we shut down.

Stop these spam messages from polluting the syslog upon normal shutdown

(This used to be ctdb commit cd84da6f247ee46bbab8318298d1cd3cfc87aba9)
2010-10-28 13:38:07 +11:00
Ronnie Sahlberg
14c8228292 Redirect the output from 00.ctdb pfetch to stdout.
Normally, the config.tdb database would not exist, so we do not need
to spam syslog with a "config.tdb does not exist" message every time we start ctdb

(This used to be ctdb commit 5792809b72e534161c5ca9ef5c9897abcb3b899c)
2010-10-28 13:35:55 +11:00
Stefan Metzmacher
ab6beb6b7f events.d/11.routing: handle "updateip" event
metze

(This used to be ctdb commit 034635418c7e5274d6bdf4cccc7a10e3b631e2d4)
2010-10-21 11:09:46 +11:00
Ronnie Sahlberg
b4e3a95039 try to restart NFS LOCKD if it failed to start
(This used to be ctdb commit 2913cc93a9a172caf9e0d6675cfa4de4cc957b13)
2010-10-14 08:13:09 +11:00
Ronnie Sahlberg
0de79c12ba Make sure the statd directory exists before trying to access the
"update trigger" file.

CQ 1020344

(This used to be ctdb commit 171f98f6f7ce7d01f47c44043ad599702711b12d)
2010-10-12 08:02:18 +11:00
Ronnie Sahlberg
842d9aab4e move extracting the config from config.tdb for public addresses
into its own function

(This used to be ctdb commit 2d478a39ed8303b0371112d61630660d12b7db2c)
2010-10-12 02:57:53 +11:00
Ronnie Sahlberg
f7febd28af dont stop checking interfaces after the first bond device
continue the loop to process all other interfaces too

(This used to be ctdb commit 500ade4e6a58ea786a665f6be7cf30f43c882570)
2010-10-09 10:55:43 +11:00
Ronnie Sahlberg
51a38dc4a4 Spotted by rusty.
Add a missing $
so we delete $_ip   and not _ip

(This used to be ctdb commit e9d04c5f419eaa0338a3beefba32c52be00242a8)
2010-10-08 15:53:36 +11:00
Ronnie Sahlberg
f5c0539dc6 Change how NATGW is configured to allow special nodes that do not have
network connectivity outside of the cluster to still be able to
participate in a natgw group.
These nodes can not become natgw master since they lack external network
connectivity.

These nodes are configured just the same way as for any other node with
NATGW, with the following two exceptions :
* we do NOT set CTDB_NATGW_PUBLIC_IFACE at all on these nodes.
  since these nodes lack external network we should not check the interface
  for link.
* we must set CTDB_NATGW_SLAVE_ONLY=yes to flag that this is a node that
  can not become natgw master.

(This used to be ctdb commit ab7b00a37e55beffc074be95b55d8a5c7cb9eef2)
2010-09-08 09:20:16 +10:00
Ronnie Sahlberg
dc2f87737d Dont store temporary runtime data in $CTDB_BASE/state
since that will usually be /etc/ctdb/state and storing this under /etc is just
wrong.

Add a new variable CTDB_VARDIR that defaults to /var/ctdb and store the data there instead.

(This used to be ctdb commit 516423c25afa9861d9988096efa8a4a2b12b31b1)
2010-09-03 12:43:28 +10:00
Ronnie Sahlberg
c7df27e32d make sure all statd state directories exist before we try to reference them
or else tar and friends will throw an error in the log

(This used to be ctdb commit 96cbd2c0aa9a4641a42b3c33374675fa732ed1e5)
2010-09-01 15:49:57 +10:00
Ronnie Sahlberg
8be5bf1567 dont print a lot of log information about shutting down vsftpd
(This used to be ctdb commit 1a41cd7332703629001201eea8ae9b94f1341c9d)
2010-09-01 13:29:38 +10:00
Ronnie Sahlberg
9ef21f1c07 ouch, remove a dummy debug printout that snuck in there somehow
(This used to be ctdb commit 14c4d99513b4bdb94f60c3e9c4823e04b0833e60)
2010-08-30 19:48:41 +10:00
Ronnie Sahlberg
2b4d9170c2 Merge commit 'martins/master'
(This used to be ctdb commit cc8c851e2e0b46f00b18a6dc61fd2774e97850dd)
2010-08-30 18:22:05 +10:00
Ronnie Sahlberg
12cc826231 Remove the dependency on the underlying cluster filesystem for handling
the clusterwide persistent data associated with the lock manager and
statd notifications.

Use persistent databases to store this data instead of a shared directory.

(This used to be ctdb commit fc0678d351187cfa4c71123f97c0f493aacd5d16)
2010-08-30 18:14:41 +10:00
Ronnie Sahlberg
c95f4258d8 Add a new event "ipreallocated"
This is called every time a reallocation is performed.

    While STARTRECOVERY/RECOVERED events are only called when
    we do ipreallocation as part of a full database/cluster recovery,
    this new event can be used to trigger on when we just do a light
    failover due to a node becoming unhealthy.

    I.e. situations where we do a failover but we do not perform a full
    cluster recovery.

    Use this to trigger for natgw so we select a new natgw master node
    when failover happens and not just when cluster rebuilds happen.

(This used to be ctdb commit 7f4c591388adae20e98984001385cba26598ec67)
2010-08-30 18:09:30 +10:00
Martin Schwenke
a104d1d823 NFS tickles: use addtickle/deltickle instead of shared tickle directory.
This adds a new function update_tickles() that tracks tickles for a
given port using the new ctdb addtickle/deltickle commands.  This
function is used in events.d/60.nfs to handle NFS tickles.

events.d/61.nfstickle is removed.  The
/proc/sys/net/ipv4/tcp_tw_recycle setup is also moved to
events.d/60.nfs.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit dca4c4ebf3c35f8db3ae208efb7a83abbf726ed6)
2010-08-26 14:59:59 +10:00
Ronnie Sahlberg
3edec07807 Add a configuration database, implemented as a persistent database.
This database can be used, as an option, to store
the public address assignment instead of editing the /etc/ctdb/public-addresses file manually.

This configuration is stored in one record per key, with a key-name of
public-addresses:node#<pnn>
where <pnn> is the node number.

The content of this record is the same syntax as the /etc/ctdb/public-addresses file.

When ctdbd starts, if this key exists and contains data, it is extracted from the database and compared with the normal file /etc/ctdb/public-addresses.

If the content differs, the config database "wins" and is used to overwrite/update the /etc/ctdb/public-addresses file, after which ctdbd is restarted.

The main benefit with this option is that it can be used to update the public address configuration for nodes that are offline/unreachable by updating their configuration in the persistent database.
Once the offline node is available again, it will resync its databases with the rest of the cluster, find out that the config has changed, apply the changes and restart ctdbd automatically.

The command to store the public address configuration for a node into the persistent database is :

ctdb pstore config.tdb public-addresses:node#<pnn> <filename>

where <pnn> is the node# we wish to update the config for, and <filename> is a file containing the new content for that node's public address configuration.

(This used to be ctdb commit 292d7435a360efd7f15a7a99f658a605e07c0a81)
2010-08-25 11:49:56 +10:00
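The startup comparison described in 3edec07807 might look roughly like the sketch below. The function name and its file-based interface are assumptions; how the record is fetched from config.tdb and how ctdbd is restarted are left to the caller.

```shell
# sync_public_addresses DB_COPY FILE   (hypothetical helper)
# DB_COPY: a file holding the record fetched from the config database
# FILE:    the local public addresses file
# Returns 0 if FILE was overwritten (the caller should then restart
# ctdbd), 1 if nothing changed.
sync_public_addresses () {
    _db="$1"
    _file="$2"
    # A missing or empty record means the option is not in use.
    [ -s "$_db" ] || return 1
    # Identical content: nothing to do.
    if cmp -s "$_db" "$_file"; then
        return 1
    fi
    # Content differs: the config database "wins".
    cp "$_db" "$_file"
    return 0
}
```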
Ronnie Sahlberg
2e8aac6689 Merge commit 'rusty/ports-from-1.0.112' into foo
(This used to be ctdb commit 13e58d92f5f1723e850a82ae030d0ca57e89b1ee)
2010-08-19 13:17:56 +10:00
Ronnie Sahlberg
729f1ddea0 On RHEL, "service nfs stop;service nfs start" and "service nfs restart"
sometimes (very rarely) fail to restart the service.

    Add a function to restart NFSd on SLES and RHEL-like systems.

    If we detect the system is unhealthy due to kNFSd not running,
    try to restart the service again "service nfs restart" and
    hope for the best.

CQ1019372

(This used to be ctdb commit 25c4ce7e919f13226219f036bcffd2be76b2f06c)
2010-08-19 07:18:22 +10:00
Martin Schwenke
6ce1501aa1 Move NAT gateway firewall rules to recovered|updatenatgw events.
The existing code wasn't working as designed in the start event.  It
should work here.

BZ: 62613
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit aeb70c7e7822854eb87873a5c7783e27e6e72318)
2010-08-18 11:40:07 +09:30
Martin Schwenke
b930c885b3 initscript: wait until we can ping ctdbd before setting tunables.
Currently we do a "sleep 1" after starting and before running
set_ctdb_variables to set the tunables.  This is too arbitrary and
might fail if the system is heavily loaded.  This, for example, could
result in some nodes running with DeterministicIPs and some without,
in which case a different IP allocation algorithm would run depending
on who is the recmaster!

This makes the start function wait until "ctdb ping" succeeds (with 10
second timeout) before trying to run set_ctdb_variables.  If a timeout
occurs then the start function attempts to kill ctdbd before exiting
with a failure.

It also cleans up the status reporting code for Red Hat and SUSE so
that the final status code is reported.  Currently there are cases
where a correct status is prematurely reported before a failure
occurs.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit cdcd05662a30b51caaeeab4ac44138cac2474e0a)
2010-08-05 15:29:40 +10:00
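Replacing the fixed "sleep 1" with a bounded wait, as b930c885b3 describes, can be sketched with a small retry helper. The name wait_until and the one-second polling interval are assumptions, not taken from the init script.

```shell
# wait_until TIMEOUT COMMAND [ARGS...]   (hypothetical helper)
# Retry COMMAND once per second until it succeeds or TIMEOUT attempts
# have failed. Returns 0 on success, 1 on timeout.
wait_until () {
    _timeout="$1"
    shift
    _count=0
    while [ "$_count" -lt "$_timeout" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        _count=$((_count + 1))
        sleep 1
    done
    return 1
}

# Intended use in a start function (not executed here):
#   if ! wait_until 10 ctdb ping; then
#       echo "Timed out waiting for ctdbd"
#       # kill ctdbd and exit with a failure status
#   fi
```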
Martin Schwenke
fe64a8f87a Optimise 61.nfstickle to write the tickles more efficiently.
Currently the file for each IP address is reopened to append the
details of each source socket.

This optimisation puts all the logic into awk, including the matching
of output lines from netstat.  The source sockets for each
destination IP are written into an array entry and then each array
entry is written to the corresponding file in a single operation.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 6549e9b01538998d51a5f72bfc569776d232b024)
2010-07-30 16:50:18 +10:00
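The single-write approach in fe64a8f87a can be illustrated as below. This is a sketch under assumptions: the function name, the "netstat -tn"-style input on stdin and the one-socket-per-line file format are chosen for illustration and may differ from what 61.nfstickle actually wrote.

```shell
# write_tickles DIR PORT   (hypothetical helper)
# Read "netstat -tn"-style lines on stdin. For each local IP that has
# established connections on PORT, write one file DIR/IP listing the
# remote sockets, one per line.
write_tickles () {
    _dir="$1"
    _port="$2"
    mkdir -p "$_dir"
    awk -v dir="$_dir" -v port="$_port" '
        $1 == "tcp" && $6 == "ESTABLISHED" {
            n = split($4, loc, ":")
            if (n == 2 && loc[2] == port) {
                # Accumulate the sources per destination IP in memory.
                srcs[loc[1]] = srcs[loc[1]] $5 "\n"
            }
        }
        END {
            # One write per destination IP instead of one append per socket.
            for (ip in srcs) {
                file = dir "/" ip
                printf "%s", srcs[ip] > file
                close(file)
            }
        }'
}
```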
Stefan Metzmacher
794230775c events/10.interface: we need to mark interfaces as "up" if we don't know how to monitor them
metze

(This used to be ctdb commit 1e08d1578d1960fcfc5fdd85492fbd6d194e5e94)
2010-07-30 16:33:27 +10:00
Stefan Metzmacher
7b1345d446 config/interface_modify.sh: do the echo before running the script
metze
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

(This used to be ctdb commit bb1d2bd31073304fc203868517144f61d12b7fc2)
2010-07-15 15:06:51 +09:30
Stefan Metzmacher
3b9eeb1049 config/interface_modify.sh: before calling a script check if it exists and is executable
For non bash shells $_s_script might end with '/*'.

We do the workaround this way, because it makes sense to check
that a script is executable, before trying to execute it.

metze

[ This actually applies to any shell -- Rusty Russell ]
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

(This used to be ctdb commit e665cfde03fc9ec2264e99512ed5470872a2fd04)
2010-07-15 15:06:39 +09:30
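The check described in 3b9eeb1049 guards against a glob that matches nothing: the shell then leaves the pattern as a literal string ending in '/*'. A minimal sketch, with a hypothetical function name and argument layout:

```shell
# run_scripts DIR ARG   (hypothetical helper)
# Run every regular, executable file in DIR with ARG.
run_scripts () {
    _dir="$1"
    _arg="$2"
    for _s in "$_dir"/*; do
        # With no matches, $_s is the literal string "DIR/*", which is
        # neither a regular file nor executable, so it is skipped here.
        [ -f "$_s" ] || continue
        [ -x "$_s" ] || continue
        "$_s" "$_arg"
    done
}
```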
Rusty Russell
34ce8a4f02 config: wrap iptables in flock to avoid concurrency.
When doing a releaseip event, we do them in parallel for all the separate
IPs.  This creates a problem for iptables, which isn't reentrant, giving
the strange message:
	iptables encountered unknown error "18446744073709551615" while initializing table "filter"

The worst possible symptom of this is that releaseip won't remove the rule
which prevents us listening to clients during releaseip, and the node will be
healthy but non-responsive.

The simple workaround is to flock-wrap iptables.  Better would be to rework
the code so we didn't need to use iptables in these paths.

CQ:S1018353
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

(This used to be ctdb commit 72d6914ee913272312d7b68f1be5ad05ad06587d)
2010-07-15 10:45:24 +09:30
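The flock wrapping from 34ce8a4f02 could be sketched as follows, assuming the flock(1) utility from util-linux is available. The function names, the lock file path and the 30 second wait are illustrative assumptions, not the values used in the functions file.

```shell
# with_lock LOCKFILE COMMAND [ARGS...]   (hypothetical helper)
# Run COMMAND while holding an exclusive lock on LOCKFILE, so that
# concurrent callers execute one at a time. Waits up to 30 seconds
# for the lock.
with_lock () {
    _lockfile="$1"
    shift
    flock -w 30 "$_lockfile" "$@"
}

# Hypothetical wrapper: parallel releaseip events would call this
# instead of invoking iptables directly.
locked_iptables () {
    with_lock "${TMPDIR:-/tmp}/iptables.flock" iptables "$@"
}
```

The wrapped command's exit status is passed through, so callers can still check whether the iptables call itself succeeded.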
Ronnie Sahlberg
004b849feb Dont check linkstatus for loopback. This interface never has
issues with the physical layer

(This used to be ctdb commit d938b80a1c409a9ec4b554ddca5b0d949be53d9e)
2010-06-01 14:51:09 +10:00
Ronnie Sahlberg
db9e00eec8 Prevent clients from connecting to the natgw address.
This address is dedicated for outgoing connections.

BZ62613

(This used to be ctdb commit f0e48dd833a4408449083148c172c2136b934e5b)
2010-06-01 12:43:32 +10:00
Ronnie Sahlberg
ad2b7c28b6 Add monitoring of quorum and make the node UNHEALTHY when quorum is lost
(This used to be ctdb commit d58b575e15015c5ef9493ab3ad3e8657c5787e2c)
2010-05-25 12:46:28 +10:00
Ronnie Sahlberg
03b112cb33 in 62.cnfs, lines in /etc/exports can have the exports quoted,
so strip off any initial " on the exports line

(This used to be ctdb commit dce2244e8ac6617c335cfcd721c3795071b9f2b2)
2010-05-25 12:46:08 +10:00
Michael Adam
b40fa22239 functions: when checking for a directory also check whether it can be accessed.
Thanks to "waKKu" on irc for this improvement.

Michael

(This used to be ctdb commit 81e1483dd0ce2cd091721e456c0c194cc58442f3)
2010-05-11 11:29:45 +02:00
Ronnie Sahlberg
1cb2b0b2d0 Add a new eventscript 62.cnfs to integrate better with gpfs/cnfs
(This used to be ctdb commit 4a679422dc231aa98605b9cc322e4ab442f7bde4)
2010-05-04 13:56:55 +10:00
Ronnie Sahlberg
d6ae1c4173 If the admin makes a configuration mistake and configures NATGW to use the
same ip address as a normal public-address,
check for this in the natgw script and warn the user.

Also prevent ctdb from starting up since this configuration will not work.

BZ60933

(This used to be ctdb commit 480af69b63b9162c85d8e04461ca9e4a083c04a4)
2010-04-28 08:51:06 +10:00
Ronnie Sahlberg
2d9fee4f85 Add a setting where CTDB will monitor and warn for low memory conditions.
CTDB_MONITOR_FREE_MEMORY_WARN

BZ 59747

(This used to be ctdb commit 83446b2e7e28e3ed6627c1950053018b8799984a)
2010-04-23 09:08:38 +10:00
Ronnie Sahlberg
8ef5db522a In the example script to remove all ip addresses after a ctdb crash,
add the NATGW address as one to be removed in addition to the
public addresses.

(This used to be ctdb commit 234b86fb19aae7a43f1dd2c0f69b03164fe5aaca)
2010-04-23 09:08:26 +10:00
Ronnie Sahlberg
4f191982ca add an example script that can be called from crontab to cleanup
and release public ip addresses if ctdbd is no longer running

(This used to be ctdb commit 1cdaaa0a3f53d1b075340a33dfdc42b534e99187)
2010-04-22 14:23:02 +10:00
Ronnie Sahlberg
40434a7c98 add a missing ||
to make the 10.interface script not fail with a syntax error

(This used to be ctdb commit a9831070344a6dcf46c55250f9d74a5870f37dfe)
2010-04-22 14:22:46 +10:00
Martin Schwenke
f765f0ceca Fix a thinko in 2ea0a9f1a93781a0d036feb9fcc0d120b182922f.
If the driver is virtio_net then we assume that the link is up rather
than ignoring the check altogether.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 3044d07da2a58260fa06bf489890b279bcf3ec39)
2010-04-20 10:52:31 +10:00
Ralph Wuerthner
d2f7bf804c ethtool does not support virtio_net devices.
Skip link test for this type of devices

Signed-off-by: Ralph Wuerthner <ralph.wuerthner@de.ibm.com>
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 2ea0a9f1a93781a0d036feb9fcc0d120b182922f)
2010-04-15 16:38:19 +10:00
Michael Adam
df77489477 events:50.samba: wipe the local part of the serverid db before starting winbind/smbd/nmbd
This is necessary for the new serverid approach.

Michael

(This used to be ctdb commit 8956f32e571093db7f285b83e4dd32960f8afc7c)
2010-03-29 17:05:06 +11:00
Stefan Metzmacher
940e58bf3f config: let 13.per_ip_routing use a flock for generate_auto_link_local()
metze

(This used to be ctdb commit dc2d0d0e559308ad2676f9ad973746c147d65eb9)
2010-03-18 11:57:16 +01:00
Ronnie Sahlberg
d4f7a59960 Merge root@10.1.1.27:/shared/ctdb/ctdb-git
(This used to be ctdb commit e59310132d8126ee3afc191b5db56e80a32986e8)
2010-03-11 18:15:41 +11:00
Wolfgang Mueller-Friedt
e26a26fd7a ctdb_setstatus in /etc/ctdb/functions was not working correctly because it was called with a wrong parameter list
(This used to be ctdb commit e1e285d9f7fa3237dbbacca52a4eb2b264fa5986)
2010-03-11 17:52:42 +11:00
Mathieu Parent
c57c06df8c Fix some more bashisms
(This used to be ctdb commit 3d82ca5b1b8ba2770c739493aa0cdd34bb4827d8)
2010-03-10 17:41:40 +11:00
Mathieu Parent
e7bca0dcfc Correct nice_service()
nice takes a binary as argument and not a function or builtin command

(This used to be ctdb commit e21b40db64b314a24caa2bc611cb48b93decb5aa)
2010-03-10 17:39:56 +11:00
Michael Adam
ff48fc3933 fix bug #7152: check NFS-Shares, fails with too long path-names
Thanks to Thomas Sesselmann <t.sesselmann@dkfz.de> .

Michael

(This used to be ctdb commit da5fc07baa9aa806c3cba52c00fb10cf8b7f2dc5)
2010-02-23 21:08:23 +11:00
Stefan Metzmacher
e44c2396a7 config/13.per_ip_routing: fix typo in error message
metze

(This used to be ctdb commit 4b06665b77cb24d488f4ef03cc9ad5fd5d0feb0e)
2010-02-23 10:38:50 +01:00
Stefan Metzmacher
d79a70bca3 config/13.per_ip_routing: use better names for release_script and setup_script
As the basename of the script will be used for the readd script
from setup_iface_ip_readd_script, it's now easier to identify
what script is called by delete_ip_from_iface() while readding
ips to the interface.

metze

(This used to be ctdb commit 3ee225b0b6ed37c22478bd145ced56b1b9b86842)
2010-02-23 10:38:50 +01:00
Stefan Metzmacher
08d69d2cec config/13.per_ip_routing: register the setup script with setup_iface_ip_readd_script()
This is needed because we need to resetup the routing table when
the delete_ip_from_iface() function readds the ip to the interface.

metze

(This used to be ctdb commit ea87185ec9977006ef72d5a68c875154e4c84099)
2010-02-23 10:38:50 +01:00
Stefan Metzmacher
3a0d830e4c config/13.per_ip_routing: add a setup_per_ip_routing() function
This combines the logic into a shell function which can be used by the
"takeip" and "updateip" hooks.

We check the return values of the "ip" commands now
instead of ignoring them.

We now create a setup_script.sh similar to the release_script.sh
which makes it easier to analyze problems.

metze

(This used to be ctdb commit 624e8878851b4957cc7c02e922ec86926d6927ee)
2010-02-23 10:38:49 +01:00
Stefan Metzmacher
3419e9c4dd server: add "setup" event
This is needed because the "init" event can't use 'ctdb' commands.

metze

(This used to be ctdb commit 1493436b6b24eb05a23b7a339071ad85f70de8f4)
2010-02-23 10:38:49 +01:00
Stefan Metzmacher
061c2a7182 config/10.interface: use delete_ip_from_iface also in the "init" event
metze

(This used to be ctdb commit e2bc5c25116747c58505fe1cb3e2d164257377d1)
2010-02-23 10:38:49 +01:00
Stefan Metzmacher
90769bf4eb config/11.natgw: use delete_ip_from_iface() instead of remove_ip()
This also initializes the variables correctly for the
shutdown|removenatgw code path to delete_all.

metze

(This used to be ctdb commit 2c2cbed4fcbc868a990fa6b32fc96126ffc61bb5)
2010-02-23 10:38:48 +01:00
Stefan Metzmacher
d71c40cad7 config: make remove_ip() a wrapper of delete_ip_from_iface()
metze

(This used to be ctdb commit e66d6636b80e3614f183366ec92fc3c6d5c323da)
2010-02-23 10:38:48 +01:00
Stefan Metzmacher
3bd1910428 config: interface_modify states in a $CTDB_BASE/state/interface_modify directory
metze

(This used to be ctdb commit 756c8b953fef7132dae74b5b244baeb3108dec54)
2010-02-23 10:38:48 +01:00
Stefan Metzmacher
d8ab328ee1 config: add setup_iface_ip_readd_script() helper function
This adds a generic infrastructure to register scripts which will
be called when the delete_ip_from_iface() function needs to readd
secondary ips to an interface.

metze

(This used to be ctdb commit ac97d65f44e8dc8bf2ec8f68e4db3448521755a2)
2010-02-23 10:38:47 +01:00
Stefan Metzmacher
feebd033eb config: readd ips with a broadcast address in delete_ip_from_iface()
metze

(This used to be ctdb commit e7a6f64cf5bce5abdc47f5db96b286c5a8d66aff)
2010-02-23 10:38:47 +01:00
Ronnie Sahlberg
af79d2c08b Make sure that the natgw eventscript also triggers on the "stopped" event
to remove the natgw configuration and ip assignments used.

BZ61036

(This used to be ctdb commit 344b1f95b126ecabeb4576330038b08bf88e8cb8)
2010-02-23 10:16:17 +11:00
Ronnie Sahlberg
6091dce975 From Sumit Bose <sbose@redhat.com>
Fixes for init script to meet guidelines

(This used to be ctdb commit 9f484404030211df85a215fd2280568a2ec020fb)
2010-02-22 14:06:52 +11:00
Ronnie Sahlberg
5439401dd2 try to restart rpc-rquotad if it is not running
bz60317

(This used to be ctdb commit 2263cd74d511247debadd0f6602bc6396b46ac5e)
2010-02-16 11:02:37 +11:00
Ronnie Sahlberg
70c1e39e64 Add a variable CTDB_CHECK_SWAP_IS_NOT_USED="yes"
to control whether or not to check if we are swapping, and produce
useful output into the logfile if we are.

For production systems with dedicated nas-heads we should never swap.
But for developer/test systems we often use smaller nondedicated systems where
we can no longer guarantee that we will not be using swap.

(This used to be ctdb commit db87849bf3380914a63a626412bec209dbea7d20)
2010-02-16 11:01:39 +11:00
Ronnie Sahlberg
64111bb02b Add a new variable : CTDB_NFS_SKIP_KNFSD_ALIVE_CHECK
when set to "yes" this will skip checking if knfsd has hung or not.

bz59626

(This used to be ctdb commit b0bf3794753c5bb898295b5109707953cc3dcec5)
2010-02-16 10:59:53 +11:00
Martin Schwenke
d25ab9eca0 Merge commit 'origin/master'
(This used to be ctdb commit 19523fbb12db1ec1e5ee38de1b2d3b99a74c6ca4)
2010-02-10 20:24:28 +11:00
Rusty Russell
34b8b98078 event scripts: add logging for low memory conditions
We should never enter swap; if we do, show the memory state of the machine and the process list.  This will help us diagnose what caused the condition before it's too late and the box starts OOM-killing processes.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

(This used to be ctdb commit 627a6d67a0e9e61f8713e62695b3518c51909230)
2010-02-09 12:46:35 +10:30
Martin Schwenke
56b178e1a2 eventscripts: stop loadconfig function from loading ctdb config file twice.
If "$1" was empty then loadconfig would load the ctdb config twice.
This stops that from happening.
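
A minimal sketch of the kind of guard this describes. It is not the actual loadconfig code: `load_one` is a hypothetical stub standing in for sourcing a sysconfig file, and the exact condition used in config/functions is an assumption.

```shell
#!/bin/sh
# Hypothetical stub standing in for sourcing a sysconfig file.
load_one ()
{
    echo "loading $1"
}

# Always load the ctdb config; load a second config only when a
# different name was actually given.
loadconfig ()
{
    _name="$1"
    load_one ctdb
    if [ -n "$_name" ] && [ "$_name" != "ctdb" ] ; then
        load_one "$_name"
    fi
}

loadconfig        # no argument: ctdb config is loaded once
loadconfig nfs    # ctdb config first, then nfs
```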

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 0406d406da70aaee7ad6aac236114905c5d03ed2)
2010-01-22 17:19:12 +11:00
Martin Schwenke
407a8f7205 eventscript: Use of $NFS_TICKLE_SHARED_DIRECTORY must be after loadconfig.
Proper fix for 085d1bea78fabf754ef6dd6d323f74a1d361e45c's workaround.
$NFS_TICKLE_SHARED_DIRECTORY was being used before it is set via
loadconfig.

Ronnie actually spotted this one.  :-)

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit ee8b2e298351d05197a2e1494f3331433644c1e6)
2010-01-22 17:14:50 +11:00
Martin Schwenke
02e68340e8 initscript: Remove bash-ism.
Also, change the order of the comparison so it is consistent with
others in the script.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 44696e15cdb23e7656d3bb0ead54f509495738a7)
2010-01-22 17:13:17 +11:00
Martin Schwenke
d6b0578cfb initscript: handle spaces in option values inserted into $CTDB_OPTIONS.
This puts single quotes around everything and uses eval on the
command-lines that actually start ctdbd.  The eval causes the single
quotes to be interpreted.

The "redhat" init style no longer uses the Red Hat daemon function.
It loses the quoting and re-splits on spaces.  Instead we add an extra
line that uses the success/failure functions to keep things pretty.
Note that this means that we don't respect daemon's
$DAEMON_COREFILE_LIMIT variable but we do our own core file handling
with $CTDB_SUPPRESS_COREFILE anyway.  daemon's core file handling was
probably overriding what we were doing anyway, so this can be regarded
as a bug fix.
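
The quoting approach described above can be sketched as follows. This is an illustration only, not the initscript's code: the helper `add_option`, the stand-in `show_args` (used in place of ctdbd) and the option values are assumptions, and the sketch assumes values contain no single quotes themselves.

```shell
#!/bin/sh
CTDB_OPTIONS=""

# Hypothetical helper: append --name='value' to $CTDB_OPTIONS.
add_option ()
{
    CTDB_OPTIONS="${CTDB_OPTIONS}${CTDB_OPTIONS:+ }$1='$2'"
}

add_option --reclock "/clusterfs/my lock file"
add_option --logfile "/var/log/log.ctdb"

# Stand-in for ctdbd: print each argument on its own line.
show_args ()
{
    for arg ; do
        echo "arg: $arg"
    done
}

# Without eval the single quotes would be passed through literally and
# the value would be re-split on spaces; eval makes the shell interpret
# the quotes, so each option arrives as one argument.
eval show_args "$CTDB_OPTIONS"
```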

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 522fbb012524fe41a67dbe43589a282dda6bcbe2)
2010-01-22 15:34:21 +11:00
Stefan Metzmacher
12c8dd215c config: 10.interface: search "ethtool" in $PATH instead of using a hardcoded path
This is very useful for testing, I use such a script:

cat ~/bin/ethtool
 #!/bin/sh

 IFACE=$1

 case "$IFACE" in
        Neth2)
                ;;
        Neth3)
                ;;
        Neth4)
                ;;
        Neth5)
                ;;
        *)
                exec /usr/sbin/ethtool $@
                ;;
 esac

 ip link set down $IFACE

 exec /usr/sbin/ethtool $@

metze

(This used to be ctdb commit 3bab985cf615720eded4d47b4f9f37a9c28840aa)
2010-01-20 11:11:04 +01:00
Stefan Metzmacher
ea5843075c events: add updateip event to 13.per_ip_routing
metze

(This used to be ctdb commit 829150e814a5e6c85d0f21421f46f41e81d74c53)
2010-01-20 11:11:02 +01:00
Stefan Metzmacher
6a818e66ae events: 10.interface handle updateip event
metze

(This used to be ctdb commit a5cdf1277387f8c6292153c37fa9ceb64707d04f)
2010-01-20 11:11:02 +01:00
Stefan Metzmacher
98ee69c66d server: add updateip event
metze

(This used to be ctdb commit 712ed0c4c0bff1be9e96a54b62512787a4aa6259)
2010-01-20 11:11:01 +01:00
Stefan Metzmacher
50bff8c886 config: add CTDB_PARTIALLY_ONLINE_INTERFACES to ctdb.sysconfig
With this option set to "yes", we don't become unhealthy
as long as at least one interface is still available.

metze

(This used to be ctdb commit d054eb33c6ae92560cddb40732e5dcf622591a3c)
2010-01-20 11:11:01 +01:00
Stefan Metzmacher
5d2c3ef656 config: 10.interfaces call monitor_interfaces on startup
metze

(This used to be ctdb commit 615dec051c26aac628f120e96bf12fb39fc6d28a)
2010-01-20 11:11:01 +01:00
Stefan Metzmacher
94e7101070 config: 10.interfaces call ctdb ifaces and ctdb setifacelink for monitoring
metze

(This used to be ctdb commit c465f63585c419ba59a6b04cbbf78ae615a7259d)
2010-01-20 11:11:01 +01:00
Stefan Metzmacher
9c89dd9210 events: splitout a monitor_interfaces function in 10.interface
metze

(This used to be ctdb commit b5ba56dea57db97d6c6ba3e7582e74fe0e3041fc)
2010-01-20 11:11:01 +01:00
Stefan Metzmacher
9a43f5e42b events: 10.interfaces allow multiple interfaces per public address
metze

(This used to be ctdb commit f9837f8b6f887d28f29aeb3eeffe8cfb423b40b4)
2010-01-20 11:10:58 +01:00
Stefan Metzmacher
628ac65709 config: add 13.per_ip_routing event script
With this script it's possible to generate routing tables
per public ip address.

metze

(This used to be ctdb commit ff5678fbec2daef461143acf00cef3f94d7655fc)
2010-01-20 11:10:57 +01:00
Stefan Metzmacher
2ecf8053f9 config: add some ipv4 helper shell functions
Many thanks to Michael Adam <obnox@samba.org>
for the basic work.

metze

(This used to be ctdb commit ff9c641763702ae99632bbf4d0825d578440c074)
2010-01-20 11:10:57 +01:00
Stefan Metzmacher
4493ba6ffa config: add interface_modify.sh and call it under flock to make modification on interfaces atomic
When two releaseip events run in parallel it's possible that the 2nd script
readds a secondary ip that was removed by the 1st script.
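
A minimal sketch of serializing interface changes with flock(1), assuming the util-linux `flock` command is installed. The wrapper name `with_iface_lock` and the lock file path are hypothetical and are not taken from interface_modify.sh.

```shell
#!/bin/sh
# Hypothetical lock file; the real script may use a different path.
lock_file="${LOCK_FILE:-/tmp/interface_modify.lock}"

# Run the given command while holding an exclusive lock, so that two
# concurrent releaseip handlers cannot interleave their changes.
# The exit status of the command is passed through by flock.
with_iface_lock ()
{
    flock "$lock_file" "$@"
}

with_iface_lock echo "interface change would run here"
```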

metze

(This used to be ctdb commit e02417b2a55c45ac2c125b1b3463c9c39e7bc07a)
2010-01-20 11:10:48 +01:00
Stefan Metzmacher
c251ac20fa events/10.interfaces: move some parts to helper functions
metze

(This used to be ctdb commit 24cd42769d8f32b90a8876a6a08a36ab23076cd1)
2010-01-20 09:44:37 +01:00
Stefan Metzmacher
d01870f138 config/functions: add tickle_tcp_connections()
metze

(This used to be ctdb commit 2397f13d7b5ca3847ef148187c6b179d06f6a47a)
2010-01-20 09:44:37 +01:00
Stefan Metzmacher
fd06167caa server: add "init" event
This is needed because the "startup" event runs after the initial recovery,
but we need to do some actions before the initial recovery.

metze

(This used to be ctdb commit e953808449c102258abb6cba6f4abf486dda3b82)
2010-01-20 09:44:36 +01:00
Stefan Metzmacher
9cba540514 lib/util: import fault/backtrace handling from samba.
metze

(This used to be ctdb commit 8171d66f0061fe23ed6dfef87ffe63bfc19596eb)
2010-01-20 09:44:36 +01:00
Ronnie Sahlberg
21e5b44673 source the nfs sysconfig file from the 61.nfstickles script
(This used to be ctdb commit 085d1bea78fabf754ef6dd6d323f74a1d361e45c)
2010-01-20 10:35:02 +11:00
Ronnie Sahlberg
a1d60b1511 Make the size of the in memory ringbuffer for keeping the recent log messages
configurable using --log-ringbuf-size=<num-entries>.

Add an entry in the sysconfig file to set this persistently.

(This used to be ctdb commit c79c2da69bc352f509e7fca4b9172a4b7f23c0f8)
2010-01-15 15:38:56 +11:00
Martin Schwenke
b65a44a4ec Revert "Use wbinfo --ping-dc instead of wbinfo -p since this is a more reliable way to determine if winbindd is in a useful state."
This reverts commit 7c95e56ba871a4e0cb893a5cb5d821e7ff6e6dd6.

wbinfo --ping-dc is proving too unreliable.

(This used to be ctdb commit b70021856e76df1ba407c83cfc19bf332fbfc869)
2010-01-12 21:02:44 +11:00
Martin Schwenke
96066d8816 Revert "events/50.samba: only use wbinfo --ping-dc if available"
This reverts commit 7b73834ba3ac197cc8a3020c111f9bb2c567e70b.

wbinfo --ping-dc is proving too unreliable.

(This used to be ctdb commit 178f429a7b6d1008d35e857b6ca1df6adb60d255)
2010-01-12 21:02:11 +11:00
Ronnie Sahlberg
4c722fe34c fix a conflict in the merge from rusty
Merge commit 'rusty/ctdb-no-setsched'

Conflicts:

	server/ctdb_vacuum.c

(This used to be ctdb commit b4365045797f520a7914afdb69ebd1a8dacfa0d9)
2009-12-17 08:18:04 +11:00
Rusty Russell
f148735928 Add --valgrinding flag instead of --nosetsched
The do_setsched was being tested for whether to mmap tdbs: let's make it
explicit.  We can also happily move the kill-child eventscript hack under
this flag.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> 


(This used to be ctdb commit 2ee86cc1f311d7b7504c7b14d142b9c4f6f4b469)
2009-12-16 20:59:15 +10:30
Stefan Metzmacher
96977cc5c4 config: add CTDB_MAX_PERSISTENT_CHECK_ERRORS option
metze

(This used to be ctdb commit fc5f556d488488040303438aefecb5ae2a8e54bc)
2009-12-16 08:08:33 +01:00
Stefan Metzmacher
0c735f03d4 config: try to use tdbtool <tdb> check instead of tdbdump for persistent db checks
metze

(This used to be ctdb commit 52e6d81f4d8a4035272d9256d01bafb8ed593027)
2009-12-16 08:08:33 +01:00
Stefan Metzmacher
0c907f4965 config: load 'ctdb' config before 'nfs' config in statd-callout
All other scripts do 'loadconfig ctdb' before any other 'loadconfig foo'
call. I think we should do the same in statd-callout.

Otherwise it's very confusing, if you have configured some Options
in /etc/sysconfig/ctdb, but /etc/ctdb/statd-callout doesn't notice
them.

metze

(This used to be ctdb commit 10d95581fb90bfdf58ec32345c4e36c27acf4f37)
2009-12-16 08:03:55 +01:00
Ronnie Sahlberg
50820f9e18 Bond devices can have any name the user configures, so
when checking link status for an interface, first
check if this interface is in fact a bond device
(by the presence of a /proc/net/bonding/IFACE file)
and use that file for checking status.

Otherwise assume ib* is an infiniband interface which we don't know how
to check, or otherwise it is an ethernet interface and ethtool should
hopefully work.
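
The decision order described above could look roughly like this. It is a sketch, not the eventscript's code: the function name `iface_type` is hypothetical, and the bonding directory is a parameter only so the logic can be exercised without real bond devices.

```shell
#!/bin/sh
# Hypothetical helper: classify an interface so that the matching link
# check can be chosen.  $2 optionally overrides /proc/net/bonding.
iface_type ()
{
    _iface="$1"
    _bonding_dir="${2:-/proc/net/bonding}"

    if [ -f "${_bonding_dir}/${_iface}" ] ; then
        # Bond devices can have any name, so check for the file first.
        echo "bond"
    else
        case "$_iface" in
            ib*) echo "infiniband" ;;   # no link check available
            *)   echo "ethernet" ;;     # ethtool should work
        esac
    fi
}

iface_type ib0
```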

(This used to be ctdb commit 8cc6c5de3d7abb0b72eaa6e769e70963b02d84cb)
2009-12-09 11:33:04 +11:00
Ronnie Sahlberg
3ca3f4c771 make sure to also check that interfaces used for NATGW are ok
and have a link.
If not, the node should become unhealthy.

(This used to be ctdb commit 03b5bbaae1b53830a4cd20d3079ab8f45ffce923)
2009-12-09 11:13:29 +11:00
Stefan Metzmacher
af170d1a8a events/50.samba: only use wbinfo --ping-dc if available
metze

(This used to be ctdb commit 7b73834ba3ac197cc8a3020c111f9bb2c567e70b)
2009-12-08 07:38:00 +11:00
Ronnie Sahlberg
cdabe16777 Use wbinfo --ping-dc instead of wbinfo -p since this is a more reliable way to determine if winbindd is in a useful state.
(This used to be ctdb commit 7c95e56ba871a4e0cb893a5cb5d821e7ff6e6dd6)
2009-12-07 18:27:46 +11:00
Martin Schwenke
b17bf38c64 Eventscripts: Fix syntax error in 00.ctdb.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 9ea261f791ab919eb1ce5b37073b4f1d30694bb8)
2009-12-01 18:08:57 +11:00
Martin Schwenke
50a26cf75e Eventscripts: Remove executable bit accidentally set on some scripts.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 4c6e68ae942c05224c5f8b683fbc2dc1adced8ee)
2009-12-01 17:54:45 +11:00
Martin Schwenke
db25ca69e5 Eventscript argument cleanups and introduction of ctdb_standard_event_handler.
The functions file no longer causes a side-effect by doing a shift.
It also doesn't set a convenience variable for $1.

All eventscripts now explicitly use "$1" in their case statement, as
does the initscript.  The absence of a shift means that the
takeip/releaseip events now explicitly reference $2-$4 rather than
$1-$3.

New function ctdb_standard_event_handler handles the status and
setstatus events, and exits for either of those events.  It is called
via a default case in each eventscript, replacing an explicit status
case where applicable.
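
A sketch of the dispatch pattern described above, under stated assumptions: the bodies are placeholders, `handle_event` is a hypothetical wrapper standing in for an eventscript's top-level case statement, and the takeip argument names (interface, IP, mask bits) are assumed for illustration.

```shell
#!/bin/sh
# Placeholder body: handle the events common to all eventscripts and
# exit, as the commit message describes.
ctdb_standard_event_handler ()
{
    case "$1" in
        status|setstatus)
            echo "handled: $1"
            exit 0
            ;;
    esac
}

# Hypothetical stand-in for an eventscript's main case statement.
handle_event ()
{
    case "$1" in
        takeip)
            # No shift has happened, so the arguments that follow the
            # event name are $2-$4 rather than $1-$3.
            echo "takeip iface=$2 ip=$3 maskbits=$4"
            ;;
        *)
            ctdb_standard_event_handler "$@"
            ;;
    esac
}

handle_event takeip eth0 10.0.0.1 24
```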

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 3d55408cbbb3bb71670b80f3dad5639ea0be5b5b)
2009-12-01 17:43:47 +11:00
Martin Schwenke
ad431c3520 Event scripts: functions file now intercepts status and setstatus.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit a1f37fdc5217e57d2d643d77a811afca747685e0)
2009-11-27 15:57:33 +11:00
Martin Schwenke
ece15620c0 Event scripts: use $script_name rather than $service name for status.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 517e9d9b188b18dffc712a8fecddb41540d27b8d)
2009-11-25 16:42:14 +11:00
Martin Schwenke
ee10ea202b Event scripts: Respect CTDB_MANAGES_NFS and add function log_status_cat.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 5d97c07be13a8209a81dfc8f73e49371949e4dc3)
2009-11-25 16:34:49 +11:00
Martin Schwenke
1edcb89948 More eventscript cleanups. Initial smoke testing seems OK.
Apart from lots of cleanup work, this also fixes a bug where the share
checks did not cope with directory names containing spaces.
The previous commit also loaded the config incorrectly.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 3c93336ab92c2e4829ff4dc360045bfa6df21d50)
2009-11-25 16:30:47 +11:00
Martin Schwenke
a4a048b5cd Now vaguely tested initscript updates.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit f1e350f9edb74cc44b6c5be4c062fd93e98ba8c4)
2009-11-19 16:48:19 +11:00
Martin Schwenke
ee513c1ba2 More untested eventscript factorisation.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit ac655b0a65b32d809d47fec9821f7f31bb2fe2a7)
2009-11-19 15:00:17 +11:00
Martin Schwenke
73cb65bf1a Eventscripts: Untested factorisations and introduction of status event.
This is the first stage of an experimental change to eventscripts.
Ronnie and I did a few hours of factorisation of 40.vsftpd and applied
many of the changes to 41.httpd.  Other eventscripts were also
modified.

At this stage this is completely untested.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 364e70b763f0ccd7714d15723ad3ea4d7e2968a1)
2009-11-13 18:28:25 +11:00
Mathieu Parent
2a66b7dae4 Fix bashism in events.d/11.natgw
Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit 6ccb495d1110157c06596763c7e252f3182c251e)
2009-11-10 12:07:30 +01:00
Ronnie Sahlberg
3cbaf935af suggestion from metze,
use killtcp and kill both directions of the nfs connections.
we used to kill only one direction since the other direction was unkillable
but recent kernels allow us to kill both

(This used to be ctdb commit 8001ae580bcc28d45f6026b529d7ffc247cbba34)
2009-11-06 09:54:03 +11:00
Michael Adam
85a4d9a943 ctdb.sysconfig: add a comment section about CTDB_RUN_TIMEOUT_MONITOR
Michael

(This used to be ctdb commit b7dc1e0720991cc65353e07cf87608acea21ba27)
2009-11-05 11:13:53 +01:00
Michael Adam
95333e0ee7 Add a 99.timeout event script to trigger monitor timeouts.
This just sleeps for twice the value of EventScriptTimeout
in the monitor action. It is not run by default, but
can be activated by setting CTDB_RUN_TIMEOUT_MONITOR
in /etc/sysconfig/ctdb .

Michael

(This used to be ctdb commit 1a3ecdee85b82bb3234a92ae6bcdeb92238eb7ee)
2009-11-05 11:13:47 +01:00
Ronnie Sahlberg
0d3bff5fa6 From Rusty
It's much nicer for post-mortem debugging to have a body to examine.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

(This used to be ctdb commit 058e21d96c3c02759833fd5ddfe7b43e6a5f5740)
2009-11-05 15:57:46 +11:00
Ronnie Sahlberg
c915f2e5d5 add an extra test for the bond devices and check that there is an active slave.
This is to handle the case where all links do have a physical layer, but where all slaves have been disabled using ifdown.

(This used to be ctdb commit bf50709630df000583f2b0ef0edc177c01d60eaf)
2009-11-05 12:12:06 +11:00
Ronnie Sahlberg
2501638e15 don't verify winbindd is running properly at startup
(This used to be ctdb commit 9e1b99221c8f257129641f6eda2795537b7ce9de)
2009-11-04 07:50:26 +11:00
Ronnie Sahlberg
9e235af3a2 make the error logged when winbindd fails to access the dc during startup more scary and easier to spot in the logs
(This used to be ctdb commit 0c9b0466fd87b3f1e5d53f867c863217802ac43b)
2009-10-29 11:54:24 +11:00
Ronnie Sahlberg
023d09cd38 Revert "update the "uptime" command to indicate the "time since last" is the time since the last recovery OR failover."
This reverts commit 3b0d44497800a16400d05a30bdaf6e6c285d4b36.

(This used to be ctdb commit cb36bbb5418290e8e5b770d2d836285b15da2a6f)
2009-10-29 10:49:00 +11:00
Ronnie Sahlberg
279b7ca564 update the "uptime" command to indicate the "time since last" is the time since the last recovery OR failover.
(This used to be ctdb commit 3b0d44497800a16400d05a30bdaf6e6c285d4b36)
2009-10-29 10:37:10 +11:00
Ronnie Sahlberg
0588b5f9c5 add a check that winbind can actually talk to the DC during the startup event
and refuse to start up if it can not

(This used to be ctdb commit 4037b6e73a819a8e2463dfe0959b42875e05e106)
2009-10-27 15:45:03 +11:00
Martin Schwenke
8b2101bc61 Merge commit 'origin/master'
(This used to be ctdb commit 61282d4a9be9e544aaa86f3cffc5b58e417f5ab1)
2009-10-21 21:48:15 +11:00
Ronnie Sahlberg
ff8363697d treat interfaces with the name ethX* as bond devices
(This used to be ctdb commit 3997d7e5471810e9a2f145ce2e795073dfc5eded)
2009-10-21 11:34:17 +11:00
Martin Schwenke
b77094e897 Merge commit 'origin/master'
(This used to be ctdb commit b3ae2b753261443dca317803752a9d61285a3270)
2009-10-19 16:46:45 +11:00
Ronnie Sahlberg
58780f4137 add a directory where multiple local scripts can be added to run when executing eventscripts
(This used to be ctdb commit 27d152a918680a59c7412aec7e1772f25b72d469)
2009-10-19 16:22:15 +11:00
Ronnie Sahlberg
cdc77af3ab wait a bit longer before shutting down when the reclock file is missing
print the filename of the missing file when we turn unhealthy and also
a 'df'

(This used to be ctdb commit 97ded8a629ec762f71bad28515e4fbc810790b1d)
2009-10-19 15:33:20 +11:00
Ronnie Sahlberg
1e91fd0a25 Revert "dont shutdown a node when the reclock file is temporarily unavailable."
This reverts commit f5e9f3007c10a937158bc8cdfabf33c984cf9c50.

(This used to be ctdb commit 02f68dc60e0b7bf26d631850b12834d5c71a88f2)
2009-10-19 15:30:44 +11:00
Martin Schwenke
b20d680070 Merge commit 'origin/master'
(This used to be ctdb commit 5ad283458e59ea8232e01f34be007901c10c8a2e)
2009-10-16 16:36:48 +11:00
Martin Schwenke
0bff3b4289 initscript: when stopping on Red Hat use the success/failure functions.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit bf5402b41282da94fee1ab3e4546ec089ff12f37)
2009-10-16 16:35:56 +11:00
Ronnie Sahlberg
d258616984 dont shutdown a node when the reclock file is temporarily unavailable.
Leave the node as UNHEALTHY; this stops clients from accessing the node until
the reclock file can be accessed again

(This used to be ctdb commit f5e9f3007c10a937158bc8cdfabf33c984cf9c50)
2009-10-15 13:19:10 +11:00
Ronnie Sahlberg
30d9fbfbec move the logging of the warning "No reclock file used" to the startup case so we only print this warning on "service ctdb start" and not for "service ctdb *"
(This used to be ctdb commit eb854f65f978f24583e221138eb4f9b917b89285)
2009-10-14 12:12:04 +11:00
Ronnie Sahlberg
070f781e39 always create the nfs state directories during the monitor event.
this allows us to configure and enable nfs at runtime without having to restart ctdbd

(This used to be ctdb commit f6e39d35713475defaa08a623e194f3f2f8f7d53)
2009-10-14 09:15:24 +11:00
Ronnie Sahlberg
df0dba1862 Merge commit 'martins/master'
(This used to be ctdb commit 5f14874c5c705dd637f88a77f30c930fea1201d2)
2009-10-12 16:51:36 +11:00
Martin Schwenke
ab98c1b0f1 Clean up ctdb_check_directories* eventscript functions.
There are 2 problems with this code:

* The loop in ctdb_check_directories_probe() breaks on filenames
  containing whitespace.

  The fix to protect them is to pass "$@" to this function and have it
  operate on "$@".

  Note that there's still a problem with whitespace in filenames in
  the 50.samba eventscript.  To fix this ctdb_check_directories_probe
  should read the filenames from stdin.  Another time...

* The check for '%' in filenames in ctdb_check_directories_probe()
  ends up involving several forks.  On a modern machine this can cost
  a couple of minutes when checking a large number of directories.

  The fix is to use a case statement.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit eb1fecaef9aa5cb85dff7d4f7af8a9878deabed8)
2009-10-12 16:32:49 +11:00
Martin Schwenke
d8e2ddc5a8 40.vsftpd: reset the fail counter in the "recovered" event.
Each recovery that involves IP reassignments results in a restart of
vsftpd in the "recovered" event.  Currently, we can have several
recoveries in quick succession and the "monitor" event following each
can fail because vsftpd isn't ready yet.  This results in cumulative
failures, so the node is marked unhealthy, even though vsftpd has
never had a proper opportunity to become ready.

This resets the fail count after each recovery.

While we're here, also move the delete of the restart flag file into
the body of the conditional.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 318abeb4b913a8d846e7eaf4cf5c2a67b61ce974)
2009-10-12 16:17:37 +11:00
Ronnie Sahlberg
42193cbff8 update natgw eventscript to allow you to force it to update and / or to remove the configuration at runtime
(This used to be ctdb commit deed52b7e4aac94b4d11a8d89d08739e1dfd4ed7)
2009-10-06 16:09:24 +11:00
Ronnie Sahlberg
e90dd8015f add a new notification to trigger on when ctdb has started
(This used to be ctdb commit b1fe04f2e9447f762a0b805763deb29296585ff8)
2009-10-01 14:05:30 +10:00
Martin Schwenke
b27600253d Minor fixes to 01.reclock eventscript.
test -z really needs its argument to be quoted.  Simplified a status
test.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit fe26da7780545b1ecc0a7da5bc1cf8beaeea94cc)
2009-09-30 21:21:56 +10:00
Martin Schwenke
78b7043411 40.vsftpd monitor event only fails after 2 failures to connect to port 21.
Change the monitor event in 40.vsftpd so it only fails if there are 2
successive failures connecting to port 21.  This reduces the
likelihood of unhealthy nodes due to vsftpd being restarted for
reconfiguration due to node failover or system reconfiguration.

New eventscript functions ctdb_counter_init, ctdb_counter_incr,
ctdb_counter_limit.  These are used to count arbitrary things in
eventscripts, depending on the eventscript name and a tag that is
passed, and determine if a specified limit has been hit.  They're good
for counting failures!

These functions are used in 40.vsftpd and also in 01.reclock - the
latter used to do the counting without these functions.
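
A rough sketch of how such counters might work. Only the three function names come from the commit message; the argument order, the one-file-per-counter storage, the `counter_dir` location and the `_counter_file` helper are all assumptions made for illustration.

```shell
#!/bin/sh
# Assumed layout: one counter file per eventscript name and tag.
counter_dir="${COUNTER_DIR:-/tmp/ctdb-counters}"
script_name="${script_name:-40.vsftpd}"

# Hypothetical helper: path of the counter file for tag $1.
_counter_file ()
{
    echo "${counter_dir}/${script_name}.${1:-default}"
}

# Reset the counter for tag $1 to zero.
ctdb_counter_init ()
{
    mkdir -p "$counter_dir"
    echo 0 >"$(_counter_file "$1")"
}

# Add one to the counter for tag $1.
ctdb_counter_incr ()
{
    mkdir -p "$counter_dir"
    _f=$(_counter_file "$1")
    _n=$(cat "$_f" 2>/dev/null || echo 0)
    echo $(( ${_n:-0} + 1 )) >"$_f"
}

# Succeed once the counter for tag $1 has reached the limit $2.
ctdb_counter_limit ()
{
    _n=$(cat "$(_counter_file "$1")" 2>/dev/null || echo 0)
    [ "${_n:-0}" -ge "$2" ]
}
```

A monitor event could then increment the counter on each failed connection, fail only once `ctdb_counter_limit` succeeds, and call `ctdb_counter_init` again after a successful check.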

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit cfe63636a163730ae9ad3554b78519b3c07d8896)
2009-09-30 21:05:16 +10:00
Ronnie Sahlberg
c971d934a9 From Wolfgang Mueller-Friedt
Remove the explicit vacuum/repack commands from the 00.ctdb eventscript
and implement this in the ctdb daemon.

Combine vacuuming and repacking into one
cheap read traverse to enumerate all candidate records
and one write traverse that both repacks the database and also deletes the record locally where we are lmaster and where the records have already been deleted remotely.

this code also adds initial autotuning heuristics for the vacuum intervals and how many records to delete in each iteration.

minor stylistic changes made by ronnie s

(This used to be ctdb commit 95a3ee551241aa164967991fe5efe078e1714bde)
2009-09-29 13:27:19 +10:00
Ronnie Sahlberg
9bac6f2e2c change the reclock fail count to 19 monitor intervals before we shut down ctdbd
(This used to be ctdb commit 6e35feb06ec036b9036c5d1cdd94f7cef140d8a6)
2009-09-28 14:12:59 +10:00
Ronnie Sahlberg
4f0f2cc196 add a new eventscript 01.reclock
if the reclock file has been set, then this script will test that the
    reclock file can actually be accessed.
    if the file does not exist, or if the attempts to stat the file hangs,
    the node will be marked unhealthy after the third failed monitoring event
    and after the tenth failure, ctdb itself will shutdown.

(This used to be ctdb commit 2cb04747887674def299e574fccb827c1c3194e7)
2009-09-28 14:06:40 +10:00
Ronnie Sahlberg
4a05b2dfd8 try restarting statd indefinitely, not just once
(This used to be ctdb commit 03b0d913ae009284e2fadda1b9246ec77d19db29)
2009-09-15 19:33:53 +10:00
Ronnie Sahlberg
029fd6b00f Revert "try to restart statd everytime it fails, not just the first time"
This reverts commit 4f7b39a4871af28df1c4545ec37db179fa47a7da.

(This used to be ctdb commit db7b96304e4725f29b12398b7582e385daed63ed)
2009-09-15 19:33:35 +10:00
Ronnie Sahlberg
59cacded72 try to restart statd everytime it fails, not just the first time
(This used to be ctdb commit 4f7b39a4871af28df1c4545ec37db179fa47a7da)
2009-09-15 13:35:58 +10:00
Michael Adam
e80a7001ff Introduce sysconfig variable CTDB_SYSLOG=yes/no (default "no").
This allows for controlling start of ctdbd with or without the option "--syslog"
from the sysconfig/ctdb file.

Michael

(This used to be ctdb commit 7bf9fff9139a4270496bddb97f9433bab87824bf)
2009-09-09 09:52:14 +02:00
Michael Adam
d8f9dad26b Rename the CTDB_INIT_STYLE "ubuntu" to "debian" - this is where it comes from.
Michael

(This used to be ctdb commit b060911683d8ac201806d35a505867fe3ba9519f)
2009-09-09 09:52:13 +02:00
Mathieu Parent
70294f3136 Fix bashism in nfstickle event script.
Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit f7a326b560b12f8b46c01d98cdd460e5510c67fb)
2009-09-09 09:52:13 +02:00
Mathieu Parent
e12faf771c Fix bashisms in samba event script.
Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit 0310a6b17d6167c46482a07c6cd96bcabda6ffbc)
2009-09-09 09:52:13 +02:00
Mathieu Parent
28319e4760 Fix bashisms in multipathd event script.
Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit 13b81b6c8e01aa52a31756ecffa797a4761115db)
2009-09-09 09:52:13 +02:00
Mathieu Parent
e160925f86 Fix bashism in natgw eventscript.
Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit 4fad47c1af8503385b090be281ffbd284021279c)
2009-09-09 09:52:12 +02:00
Ronnie Sahlberg
001c0f0c7e make it possible to have ctdb manage (start/stop/monitor) winbind without having samba
(This used to be ctdb commit 77574b7d7fe11c8e73957a80845481f3b2a64219)
2009-09-04 02:59:24 +10:00
Ronnie Sahlberg
d5329b13e9 overwrite the state file, don't append to it.
don't log errors when trying to delete a nonexistent state file

this eliminates some annoying log entries in the ctdb log

(This used to be ctdb commit 7a95257a5ec19f232f661bc7f797051bf08ab776)
2009-09-02 04:39:17 +10:00
Ronnie Sahlberg
f3fd4bb659 redirect stderr to /dev/null since the rule might not exist when we try to unconditionally delete it
(This used to be ctdb commit e1d709f32196e19d4041ee2958e143791762e08f)
2009-09-02 03:12:27 +10:00
Michael Adam
34d2bb1f6c set broadcast addresses in the takeip event.
Michael

(This used to be ctdb commit e26d9d32e68e7db1cf4f96c47c0126e9e0b213be)
2009-08-28 06:50:53 +10:00
Ronnie Sahlberg
e893393ef2 remove a check for the reclock file that we don't need
(This used to be ctdb commit 54c047c48902a15e5d2925bfa86e012a11188796)
2009-08-28 05:19:44 +10:00
Wolfgang Mueller-Friedt
345df3c714 remove repack from eventscript
Signed-off-by: Wolfgang Mueller-Friedt <wolfmuel@de.ibm.com>

(This used to be ctdb commit dd334caa98882fc59765b7c84eca8e86de785487)
2009-07-29 13:29:38 +10:00
Ronnie Sahlberg
4d5823ba7c update the natgw eventscript to set the NATGW capability when this feature is used
This does not modify any behaviour of the daemon itself other than showing this flag as ON in the ctdb getcapabilities output

(This used to be ctdb commit fb337c151bd16ad5ad0c99431224451979d8c651)
2009-07-28 10:00:33 +10:00
Ronnie Sahlberg
6db0f01532 document the new stopped event
(This used to be ctdb commit 70603d9a79c80379bf65d9d703c399a65c109c52)
2009-07-17 12:30:05 +10:00
Ronnie Sahlberg
e5e9fc48b1 create a new event : stopped.
This event is called when a node is stopped and is used by eventscripts that need to do certain cleanup and removal of configuration or ip addresses or routing ...

Note that a STOPPED node is considered "inactive" and as such will not be running the "recovered" event when the rest of the cluster has recovered.

(This used to be ctdb commit 65e9309564611bf937ded3c74a79abff895d7c59)
2009-07-17 12:26:16 +10:00
Ronnie Sahlberg
9c6aa4e420 update the eventscript to ensure that stopped nodes can not become the natgw master
also verify that we actually do have a natgw master available if this is configured and make the node unhealthy if not.

(This used to be ctdb commit 7f273ee769d671d8c8be87c9187302fb77e814f3)
2009-07-17 09:45:05 +10:00
Ronnie Sahlberg
66c8d4fb3d make it possible to start the daemon in STOPPED mode
(This used to be ctdb commit 866aa995dc029db6e510060e9e95a8ca149094ac)
2009-07-09 11:57:20 +10:00
Ronnie Sahlberg
2708b305ca Initscript cleanups.
* Move building of CTDB_OPTIONS to new function build_ctdb_options()
  and have it use a helper function for readability.

* New functions check_persistent_databases() and set_ctdb_variables().

* Remove valgrind-specific stop code, since the general pkill should
  kill ctdbd when running under valgrind.

* Remove some bash-isms (e.g. >& /dev/null) since the script is /bin/sh.

* Make indentation consistent.

* Minor clean-ups.
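
As an illustration of the bash-ism mentioned in the list above: `>& /dev/null` is not portable to every /bin/sh, while the two-step redirection below is POSIX. The wrapper name `quiet` is hypothetical and is not part of the initscript.

```shell
#!/bin/sh
# Run a command with both stdout and stderr discarded, preserving its
# exit status.  The bash-only spelling would be: "$@" >& /dev/null
quiet ()
{
    "$@" >/dev/null 2>&1
}

if quiet ls / ; then
    echo "command succeeded silently"
fi
```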

Signed-off-by: Martin Schwenke <martin@meltin.net>

Conflicts:

	config/ctdb.init

(This used to be ctdb commit bebb21f18e3026cb78a306104e92ee005d1077b2)
2009-07-07 13:45:19 +10:00
Ronnie Sahlberg
3c1351eabd update the sysconfig to show setting the debuglevel using a string literal instead of a numeric value
(This used to be ctdb commit 964530d70ba2ca949380d30a0e3d622963a6206c)
2009-07-01 09:23:52 +10:00
Ronnie Sahlberg
4a1a3652fe Document that you can run ctdb without a reclock file in the sysconfig file
(This used to be ctdb commit 33895d217ee096b356f02b5292ba27a840c4f559)
2009-06-25 11:59:21 +10:00
Ronnie Sahlberg
77ef745394 Allow setting the recovery lock file as "", which means that we do not use a file and that we implicitly also disable the recovery lock checking.
Update the init script to allow starting without a reclock file.

(This used to be ctdb commit 07855ff5eba71e7d607d52e234a42553d9b93605)
2009-06-25 11:50:45 +10:00
Ronnie Sahlberg
d3dde37934 rename 99.routing to 11.routing so the eventscript is processed before
NFS and LVS

(This used to be ctdb commit 16ec9ca56a9f5b88d7a5ed4f89a28a53f5c9c081)
2009-06-23 11:01:04 +10:00
Martin Schwenke
566314ca97 Fix minor problem in previous initscript commit.
The valgrind start case should not use daemon, since this is specific
to Red Hat.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 867f57d166395c92949e480ca725249b0ca8950b)
2009-06-19 18:08:54 +10:00
Martin Schwenke
3dad79b88e Initscript fixes, mostly for "stop" action.
Use a local variable $ctdbd so that we always run ctdbd from the
same place and so that we know what to kill.  This variable respects
the $CTDBD environment variable, which may be used to specify an
alternative location for the daemon.

In the important cases use "pkill -0 -f" to check if ctdbd is
running.  Also, remove the special case for killing ctdbd when running
under valgrind.  The regular case will handle this just fine.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 070305adfe636c2580776e6bf24bb8be06622b86)
2009-06-19 18:08:31 +10:00
Ronnie Sahlberg
0ddf79a3bc increase the timeout before we shutdown when the recovery daemon is hung
(This used to be ctdb commit facddcacb4a961cddb117818fa38a3e97770b2fa)
2009-06-18 09:20:18 +10:00
Ronnie Sahlberg
34fbfb8b89 rename 99.routing to 11.routing
so it is executed before any of the service scripts

(This used to be ctdb commit 1205673499618f90f413fad9e96a88733b5ce359)
2009-06-18 09:11:46 +10:00
Ronnie Sahlberg
caf0e863a4 remove the obsolete ipmux component.
this was replaced by LVS a long time ago

(This used to be ctdb commit dca41ec04788922ce5f4c52d346872b3e35f8cbb)
2009-05-25 12:33:52 +10:00
Ronnie Sahlberg
e999ade7bb From Flavio Carmo Junior <carmo.flavio@gmail.com>
Add an eventscript to manage ClamAV

(This used to be ctdb commit bb4ef6c4d2bc3578bdf4432517e98f85ec94e3b6)
2009-05-25 12:10:29 +10:00
Ronnie Sahlberg
934d8a6b5f From : Flavio Carmo Junior <carmo.flavio@gmail.com>
Add a helper function that checks whether a unix domain socket exists
and there is a daemon LISTENING to it  similar to the existing function
to check for a daemon LISTENING to a tcp/ip socket.

(This used to be ctdb commit 025a836ab3be3c078fccd8c10b10dfffbfdd94d0)
2009-05-19 08:47:19 +10:00
Ronnie Sahlberg
be7137faa9 use scope host when adding the interface to loopback so we dont respond to ARPs for this ip
(This used to be ctdb commit fcd6226a6c00cf657532aa76804bfe029df21ba6)
2009-05-14 08:55:05 +10:00
Ronnie Sahlberg
016b37f1e2 change the prefix NATGW_ to CTDB_NATGW_
(This used to be ctdb commit b7ed7fd4a5fbd344d41caa1afa100b1f24506173)
2009-05-14 08:12:48 +10:00
Ronnie Sahlberg
12400298c1 assign the natgw address to loopback and not the private network so that natgw will still work even when public and private networks are one and the same
(This used to be ctdb commit 2bd796b8a098074502fe20e3ab69098b2109c133)
2009-05-12 18:42:13 +10:00
Martin Schwenke
86ad711c37 41.httpd event script workaround for RHEL5-ism.
RHEL5 can SIGKILL httpd when stopping it, causing it to leak
semaphores.  This means that eventually a node runs out of semaphores
and httpd can't be started.  So, before we attempt to start httpd we
clean up any semaphores owned by apache.  We also try to restart httpd
in the monitor event if httpd has gone away.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 2d3fbbbb63f443686f9fec42c0bc2058d115806e)
2009-05-12 08:53:32 +10:00
Andrew Tridgell
4f4f03f84a use less intrusive smbstatus call in periodic connections cleanup
(This used to be ctdb commit a152fdc79e3360049aee66c3e628237a91df181f)
2009-05-06 08:20:55 +10:00
Ronnie Sahlberg
2e3542b5e5 dont unconditionally kill/restart ctdb when given "service ctdb start"; only start ctdb if it is not already running, and print an error message otherwise
(This used to be ctdb commit 94343309992929a592348c936e09a7b4f8b512c1)
2009-04-30 17:38:30 +10:00
Andrew Tridgell
37e2417c59 change shutdown level for ctdb to be 01
We want ctdb to shutdown first, as it manages many other
services. With the old level of 32 the NFS service would shutdown
first, and that would trigger ctdb to do a recovery. Then ctdb itself
would be shutdown a few seconds later, which causes a lot of error
messages in the other nodes logs

(This used to be ctdb commit 2f952af1a12e81a652ec9a4794db96f9593f2676)
2009-04-23 11:35:42 +10:00
Ronnie Sahlberg
4be3e86405 create a function "remote_ip" which can be used from scripts to remove a single ip from an interface.
use this function from the natgw eventscript

(This used to be ctdb commit feab5f30b2d6cebf4dd28abc5a81f93424a4c852)
2009-04-08 12:49:28 +10:00
Ronnie Sahlberg
53d6626503 install a default /etc/ctdb/notify.sh script as example on how to use
snmptrap/email to notify that a node has changed health status

(This used to be ctdb commit ee52c0866e2b26c396fe60946159c559d47199eb)
2009-03-31 14:38:52 +11:00
Ronnie Sahlberg
ad40ee25f9 add a mechanism where the ctdb daemon will run a usercontrolled script when the node status changes to/from UNHEALTHY state.
This would allow a sysadmin to set up ctdb to send an email/snmptrap/... when the status of the node changes.

(This used to be ctdb commit ce534a83a05dbd40238e4eee0669d60ff396f935)
2009-03-31 14:23:31 +11:00
Ronnie Sahlberg
b9e6e15cd4 we must also try to set the routes when we release an ip since during the release/10.interfaces there can actually be a window where the kernel decides to remove all addresses (before we manually add them back in 10.interfaces) during which the kernel may also decide to delete all routes since there are no gateways reachable through this interface anymore.
(This used to be ctdb commit 34633223a46caaa079da233663f9c6dcc1803f87)
2009-03-31 11:33:28 +11:00
Ronnie Sahlberg
d7ff332896 update how the NATGW configuration works.
allow the cluster to be partitioned into multiple disjoint natgw subsets

(This used to be ctdb commit 1046885cd22b5001e0251de2e536b5f6793459be)
2009-03-25 13:37:57 +11:00
Ronnie Sahlberg
689f76f0b0 Merge branch 'obnox'
(This used to be ctdb commit 972036a5d510fb9b399f1ee34a8861dee4221267)
2009-03-24 17:49:55 +11:00
Ronnie Sahlberg
36ec47d610 create a variant of kill_tcp_connections that only kills off the local side of a connection
(This used to be ctdb commit dc2f28f7c988364b5d45f3048be4db3e5ff113b3)
2009-03-24 14:05:31 +11:00
Ronnie Sahlberg
686adea3fe set --single-public-ip when lvs is used
(This used to be ctdb commit 292fff6eace39141591871e12f9a64e3441237be)
2009-03-24 13:51:32 +11:00
Michael Adam
a83ed1d743 Merge commit 'ctdb-ronnie/master'
(This used to be ctdb commit 39a972b0d6d0d70282c25c54a124b67431467e77)
2009-03-23 10:07:44 +01:00
Ronnie Sahlberg
293a3f1158 update the natgw eventscript and documentation
(This used to be ctdb commit 95d8ddbc2dd0b159e8df003502c3c336668d2c41)
2009-03-19 10:17:44 +11:00
root
9bf792d704 redo how the natgw is done. just use a default route with a high metric instead of fancy policyrouting
(This used to be ctdb commit f03bd2b3d906dac9fb876dca54535d22e9cf1b9e)
2009-03-18 19:19:49 +11:00
root
f037e881a2 change the NATGW_ example in sysconfig to make it more realistic
(This used to be ctdb commit 742283a8f8da7c614ee3a30d48c430e3a3bceeb9)
2009-03-18 09:33:58 +11:00
root
32391ec844 NAT-GW updates. Describe the functionality in the sysconfig file
(This used to be ctdb commit 4c598ab6f8e9b826d437b9ab869c4490f7c4faba)
2009-03-17 07:35:53 +11:00
Michael Adam
fd71213717 ctdb.sysconfig: add CTDB_MANAGES_HTTPD comment section
Michael

(This used to be ctdb commit ccaf9ebe062127124cf23e69dcd2ac2edda40020)
2009-03-10 00:21:04 +01:00
Michael Adam
eac9425820 events.d/50.samba: allow CTDB_SERVICE_{SMB,NMB,WINBIND} to be overridden from sysconfig
Michael

(This used to be ctdb commit b1aba6651143ae1c85b24d78b67c760795ff5bff)
2009-03-09 00:20:30 +01:00
Michael Adam
78294c4f3e ctdb.sysconfig: add CTDB_INIT_STYLE with explanation
Michael

(This used to be ctdb commit 8518c9e0ffec44677d45f60e63936a831d62ab20)
2009-03-09 00:08:26 +01:00
root
798553a9dc Add a variable CTDB_NFS_SKIP_SHARE_CHECK to sysconfig that can disable the check that all shares are accessible.
This check can take very long if there are very many shares; in that case it is better implemented in a separate cronjob than in a ctdb eventscript

(This used to be ctdb commit 432604a1435cd2b5a7178fb5aedf1d4b61bffeb9)
2009-03-04 07:21:55 +11:00
root
c72c15c19a make it possible to disable checking all samba shares.
this is a time-consuming process and might not be feasible to perform if there are many thousands of shares

(This used to be ctdb commit 051ae5f3c13892b860818eac803d348f09845dc6)
2009-02-20 10:58:34 +11:00
Michael Adam
d6c5f65572 Merge commit 'ctdb-ronnie/master'
(This used to be ctdb commit e1c90b12290c682c2cba90e9afa3a09be014e20e)
2009-02-10 00:28:08 +01:00
root
e7de72a1ac use netstat to check first and only fall back to netcat if netstat is unavailable
(This used to be ctdb commit dfb16ce9ed65048d30109851737a9075d071ecdb)
2009-02-05 14:44:46 +11:00
Michael Adam
0405ec036d events 41.httpd: support suse and ubuntu/debian systems for managing apache
The httpd service on suse and ubuntu/debian systems is usually
called "apache2" nowadays.

Note: There are older installs with Apache 1.3 out there, in which case
the service is called "apache". An extra check for these installs could
be useful as a sequel to this patch...

Michael

(This used to be ctdb commit b9e50e3416fecef6a881be3f1b91be977299293f)
2009-02-04 00:42:33 +01:00
Michael Adam
62f27d0cb3 events.d/41.httpd: fix a typo in the fix of the comment typo
This is embarrassing...

Michael

(This used to be ctdb commit dbd90f6210617b23d5695c4c868392363c75d23b)
2009-02-04 00:01:15 +01:00
Michael Adam
77bd2b6c91 ctdb_check_tcp_ports: correctly detect listeners on ipv6 :::<port> w/out netcat
The netstat test only grepped for the ipv4 wildcard address.
Now the ipv6 wildcard listener is correctly detected as well.

Michael

(This used to be ctdb commit 78e7928797e239e71f96eb001460a0dbf943e18f)
2009-01-30 22:45:52 +01:00
Michael Adam
bbf36eebb9 ctdb_check_tcp_ports: fail the check if neither netstat nor netcat/nc is found
Michael

(This used to be ctdb commit 25d04bbe9528fafc68751f7beb22daeee3163d34)
2009-01-30 22:45:52 +01:00
Michael Adam
ba6612ec12 ctdb_check_tcp_ports: cope with multiple locations of netcat or nc
This fixes tcp port monitor events on systems, where netcat or nc
is not found in /usr/bin/, Debian, for instance.

The patch also separates the process of finding the binaries and
calling them, moving the detection outside of the loop over the
ports list.

Michael

(This used to be ctdb commit 3adf100e7f0c04aaf2da9ae4c6984cdb708c3b57)
2009-01-30 22:45:39 +01:00
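A minimal sketch of the idea described in the commit above: find the tool once, before looping over the ports. The helper name find_first_command is hypothetical and this is not the code from config/functions; it assumes a POSIX shell where "command -v" is available.

```shell
# Hypothetical helper (not from ctdb): print the first of the named
# commands that can be found in $PATH; return non-zero if none exists.
find_first_command ()
{
    for c in "$@" ; do
        if command -v "$c" >/dev/null 2>&1 ; then
            echo "$c"
            return 0
        fi
    done
    return 1
}
```

A caller would run something like checker=$(find_first_command netcat nc) a single time and then reuse $checker for every port, instead of probing fixed paths such as /usr/bin inside the loop.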
Michael Adam
5137fd5673 events.d/41.httpd: fix a comment typo
Michael

(This used to be ctdb commit c9a0330989421afc138db6d195acf93f5eeda9fb)
2009-01-27 17:17:58 +01:00
Michael Adam
5e76a9bf65 events 50.samba: fix control of nmbd without separate nmb service script.
protect all potentially empty $CTDB_SERVICE_* script names

Michael

(This used to be ctdb commit df0afcbf9a0308fcd6ddcce1ac9366f785576f44)
2009-01-19 21:22:58 +01:00
Michael Adam
69ef570f0c ctdb.init: fix typo
Michael

(This used to be ctdb commit 145b85c948603cf977a5c5b53d9d9f63fbdba221)
2009-01-16 14:01:37 +01:00
Michael Adam
4c9db19c9a events 50.samba: also support suse and ubuntu/debian systems
for managing samba and winbind

This uses CTDB_INIT_STYLE as exported by ctdb.init.

suse systems usually have separate init scripts for
smb for smbd and nmb for nmbd, and the ubuntu/debian
start script for smbd and nmbd is called samba instead
of smb (on redhat).

Michael

(This used to be ctdb commit 5fe84f96f3f79baba1f44ba57ce217f501b3c1f8)
2009-01-16 13:33:13 +01:00
Michael Adam
a2d6abdb34 functions: make (nice_)service a noop for empty service name
Michael

(This used to be ctdb commit 4cac2a16b70be772e4f1520020762f63c0bf3efe)
2009-01-16 13:31:02 +01:00
Michael Adam
7c4ce58ba6 ctdb.init: use detect_init_style() in the init script
and export CTDB_INIT_STYLE, so that event scripts
as called by ctdbd can use it.

Michael

(This used to be ctdb commit 56a10594ea9e44e3f034ac11161fd06e5ae46544)
2009-01-16 13:28:19 +01:00
Michael Adam
a6ea1b20e5 functions: add detect_init_style().
Michael

(This used to be ctdb commit ab34a9480b59c649a4fc73a466c8ca0975453ed9)
2009-01-16 13:26:57 +01:00
Michael Adam
2536a0c898 ctdb.init: add $network to RequiredStop to match RequiredStart.
This is to make rpm checks (e.g. for SuSE systems) survive.

Michael

(This used to be ctdb commit 22cafa88f59ebe50c11f5b65a414800db79405a9)
2009-01-16 20:49:52 +11:00
Michael Adam
f844ca744a skip directories containing macros (%) in ctdb_check_directories_probe
This prevents the monitor action of 50.samba from failing
on e.g. a typical [homes] service with "path = /home/%S" .

Michael

(This used to be ctdb commit 023d6c2e3017d323b5a70f987f3b4e0b8b8f0f7b)
2008-12-16 09:51:36 +11:00
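The skip described in the commit above can be sketched roughly as follows. This is an illustration under assumptions, not the actual ctdb_check_directories_probe code: the function name probe_directories and its stdin/stdout interface are hypothetical.

```shell
# Hypothetical helper (not from ctdb): read one directory path per line
# on stdin and print the paths that do not exist. Paths that contain a
# '%' macro (for example /home/%S) are skipped, because they only make
# sense after smbd has substituted the macro.
probe_directories ()
{
    while IFS= read -r d ; do
        case "$d" in
            *%*) continue ;;
        esac
        [ -d "$d" ] || echo "$d"
    done
}
```

With this shape, a [homes] style path never reaches the -d test, so it cannot make the monitor event fail.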
Michael Adam
c50a7bbf39 ctdb.init: add Default-Start to init script to enable autostart.
Michael

(This used to be ctdb commit a1a0fa6eb37b5432cc2b176e252856d37fcc4fc8)
2008-12-16 09:51:30 +11:00
Michael Adam
9d36bcb379 ctdb.init: check availability of ctdb (with ping) before calling ctdb status
Michael

(This used to be ctdb commit 0f7444966d8147cf5a742320f51fbb9909d6d42d)
2008-12-16 09:51:24 +11:00
Michael Adam
4e91103791 ctdb.init: behave correctly when calling "service ctdb stop" on stopped service
When "service ctdb stop" is called and the ctdbd is not running,
don't print the "Failed to connect to daemon" error messages.
But print a warning and exit with status success instead.

Michael

(This used to be ctdb commit fac9ad26b2239818e6fc371fbfaa894fa64045be)
2008-12-16 09:51:00 +11:00
Michael Adam
7e0fb89710 ctdb.init: fix return code of "service ctdb stop" on non-redhat systems
Michael

(This used to be ctdb commit f3cb1386e7ea99adba78350bb50bf34d6bdcfe1d)
2008-12-16 09:50:53 +11:00
Michael Adam
759ae998a4 ctdb.init: fix status message of "service ctdb stop" on suse systems
Michael

(This used to be ctdb commit 7834d9b79bf4e4d3c6ce63dd4c3a1e40b9d909e4)
2008-12-16 09:50:47 +11:00
Michael Adam
1cf23b1bd7 Improve the monitor event test for ethernet interfaces (link detection).
On some systems, the ethtool link detection is not successful when a
cable is plugged but the interface has not been brought up previously.
This improves the test by bringing the interface up (without checking
for success here) and trying the ethtool test again afterwards.

Michael

(This used to be ctdb commit 0c2a7bf18c65452ca1c2f0539bf692507d91e3c6)
2008-12-12 09:19:23 +11:00
Michael Adam
54da843031 Use "grep -q" instead of "grep ... > /dev/null" in events.d/10.interfaces
This enhances readability.

Michael

(This used to be ctdb commit 9c6816e040d42d293eaf9ce41eff639135e8b2f5)
2008-12-12 09:18:30 +11:00
Ronnie Sahlberg
69932283ac remove two variables no longer used from the example sysconfig file
(This used to be ctdb commit dab594caf0bfc23c75c8cd2aa75479c7d2e79f1c)
2008-11-21 11:30:32 +11:00
Ronnie Sahlberg
090e5fdf5e new version 1.0.65
update the example sysconfig file. the default log level is 2, not 0

(This used to be ctdb commit 1f25958dc739677a487fa496fbeffcda7a0f2204)
2008-11-13 10:55:20 +11:00
Ronnie Sahlberg
d265e62ee7 dont log "running periodic cleanup" ...
(This used to be ctdb commit e25ea88ea4f270ba65ed5fdacd693f1248f343c0)
2008-10-20 09:45:15 +11:00
Ronnie Sahlberg
5318ca64b6 make it possible to set the script log level in CTDB sysconfig
(This used to be ctdb commit 06097b88709ced09d1f9f869eed9a54e6d2fedbf)
2008-10-17 13:36:52 +11:00
Ronnie Sahlberg
60b98f600e add an eventscript to monitor that the multipath devices are healthy
(This used to be ctdb commit f9779d3a237db59d7fdad92185ac7e42715466e6)
2008-10-15 16:27:33 +11:00
Ronnie Sahlberg
3902855275 change ip route add to route add -net since this works more reliably
update the makefile and rpm to install 99.routing

(This used to be ctdb commit c0b3bd8a3fa580dca5afa97c8012fccb25231373)
2008-10-15 01:49:19 +11:00
Ronnie Sahlberg
bad2949b65 add a new eventscript : 99.routing that is used to add static routes to
interfaces when they are activated (an ip address is added during
takeip)

(This used to be ctdb commit d9779c310e98c9d4eab71a8d1705849ac90deb10)
2008-10-07 11:03:30 +11:00
Ronnie Sahlberg
18b10d400d From Abhijith Das <adas@redhat.com>:
Fixup the initscript so it passes rpm-lint

(This used to be ctdb commit f84d0a9a8c7e9589e8833f21e1f977a0adab356b)
2008-08-25 10:13:18 +10:00
Ronnie Sahlberg
b99a88f0b3 Add a "reload" option to the initscript.
(This used to be ctdb commit 2a8bf5e7dc7364a8280d96db0f9579d2582a8524)
2008-08-25 10:03:16 +10:00
Ronnie Sahlberg
ddf2de2154 Do not fail the takeip event if the "ip addr add ..." command failed.
Let the event complete successfully. The local recovery daemon will check that we have the address and reissue takeip otherwise.

There are several reasons why "ip addr add" can fail. One is a misconfiguration;
another is that for ipv6 the stack is a lot more picky than for ipv4. For example, this WILL fail in ipv6 if there is a duplicate ip address on the network.

Thus this check could cause rolling recoveries, which is why it has to go.

(This used to be ctdb commit 12bc85c90a640a72ff538c003eb81da9dd1f2e3f)
2008-08-22 09:25:47 +10:00
Ronnie Sahlberg
9ce657b044 When we harvest all tcp connections to kill off after a takeip/releaseip event we must also harvest the ipv4 connections which may be presented in ::ff:xxxx:xxxx form by netstat
(This used to be ctdb commit 293d12a40501320a21efaf592b8f20e8590a5197)
2008-08-20 12:50:50 +10:00
Ronnie Sahlberg
43536648c5 update the socketkiller in the eventscripts to be able to handle ipv6
(This used to be ctdb commit 6da7b36b7ccc4ee9b809867ea32036f09a801bb3)
2008-08-20 09:47:00 +10:00
Andrew Tridgell
8d76f55bfc we need an additional gratuitous arp before the NFS tickles
(This used to be ctdb commit f7a70a5f9043b1d7293a515abf5b5228365693da)
2008-08-01 14:23:15 +10:00
Andrew Tridgell
d47fe5f83b ensure we use killtcp on non-NFS/non-CIFS ports for faster failover of
other protocols

(This used to be ctdb commit aefcb1f817581ac8cd67712d07159fc802f96623)
2008-08-01 14:17:50 +10:00
Ronnie Sahlberg
78beb27966 From Alexander Saupp.
If we use vlan tagging and bonding we must strip the vlan part off the name
so we can check the main bond device for status.

I.e. check bond0  instead of bond0.<VLANTAG>

(This used to be ctdb commit 795c190b004d404b84dda053593139ed51d345e5)
2008-07-28 17:07:44 +10:00
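One way to strip the VLAN part, as the commit above describes, is plain POSIX parameter expansion. The helper name strip_vlan_tag is hypothetical and the actual eventscript may do this differently (for example with sed).

```shell
# Hypothetical helper (not from ctdb): strip a VLAN suffix from an
# interface name, e.g. bond0.100 -> bond0. Names without a dot are
# printed unchanged.
strip_vlan_tag ()
{
    echo "${1%%.*}"
}
```

The result can then be used to look up the status of the underlying bond device rather than the tagged interface.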
Andrew Tridgell
71f0641dda run the testparm commands in 50.samba in the background, only running
in the foreground if something fails

(This used to be ctdb commit b1fed105ad780e89a128a611ef0bd659818eeebf)
2008-07-23 15:36:23 +10:00
Andrew Tridgell
4eac51341c allow for probing of directories without raising an error
(This used to be ctdb commit 8fed021d11160b137f4140ea02947347250e2959)
2008-07-23 15:35:46 +10:00
Ronnie Sahlberg
9a9b506d23 Add two new options
CTDB_SAMBA_SKIP_CONF_CHECK and CTDB_SAMBA_CHECK_PORTS.
The first is used to tell ctdb to no longer monitor whether the smb.conf file is consistent or not.

The second specifies which ports to check that smb is listening on
instead of using testparm to figure this out.

Since the net, testparm and smbstatus may block indefinitely in some configurations
we must have a way to configure ctdb to NOT use any of these three commands
in the scripts. These commands should thus never be used in scripts.

Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>

(This used to be ctdb commit 2fe52c7979ecd28250ec4ac195d3c3999916e573)
2008-07-15 11:03:35 +10:00
Ronnie Sahlberg
0934f40c2a remove a debugging echo statement
(This used to be ctdb commit 495a6293c284a1e74b9c5e0c112e6ed5feead107)
2008-07-14 11:22:41 +10:00
Ronnie Sahlberg
aa0cab2aaa Update to the LVS eventscript.
Do not assume all nodes are members of LVS; always choosing the recmaster as the lvsmaster won't work.

Instead,
Create the set of active LVS nodes as those nodes that are LVS capable and
also HEALTHY.
Except if ALL LVS capable nodes are unhealthy in which case we allow the unhealthy
nodes to be part of the active set.

In the active set, pick one of the active nodes as being the lvsmaster
which will receive all incoming traffic and distribute it across
the active lvs nodes in the cluster.

(This used to be ctdb commit b2ccb891b81b041e2186e038b67bb4354b7892aa)
2008-07-10 11:42:37 +10:00
Ronnie Sahlberg
ab8535eaa5 make LVS a capability so that we can see which nodes are configured with
LVS and which are not using LVS.

"ctdb getcapabilities"

(This used to be ctdb commit 172d01fb34f032e098b1c77a7b0f17bf11301640)
2008-07-10 10:37:22 +10:00
Ronnie Sahlberg
3c0d725e0b add an option to skip checking that all the samba shares are ok
when monitoring the node health.
this might be useful to skip for environments with thousands of shares

(This used to be ctdb commit dd900d4ed8f07003c4f1db2d441cfc2ef2c89ef5)
2008-07-10 08:56:33 +10:00
Ronnie Sahlberg
3523899c2e remove the attempts to restart NFS.
nfs should never stop spontaneously so trying to restart it is
just counterproductive and at best a workaround to
hide real bugs.

(This used to be ctdb commit 90ab48bb8e17f59fcb27ddbff51de546c4447b64)
2008-07-10 08:05:34 +10:00
Ronnie Sahlberg
31967abf5c if we have enabled LVS but we dont have all the required packages
just log it to the messages
dont stop ctdb from starting

(This used to be ctdb commit 3c3d3ac5f7dec258589aaaf0633cab3b3af65cf3)
2008-07-09 15:17:27 +10:00
Ronnie Sahlberg
6bf597d061 mark /etc/ctdb/functions as a config file to keep rpmlint happy
(This used to be ctdb commit 8f6cd88e74de24af8dde2b6cabb2348c4f914b99)
2008-07-09 10:24:19 +10:00
Ronnie Sahlberg
2d644b3fbe Replace \s with [[:space:]] in our regexps we use for egrep.
Kevin Collins noticed that RHEL5 grep-2.5.1-54.2.el5 built for
x86 does not handle \s    while the exact same RHEL5 package for amd64
does!

[[:space:]] is more portable.  Even across the same package version ( different architecture ) from the same vendor :-)

(This used to be ctdb commit fd7bb21c4f9289fc34a57f9d8cb7c13a02d06096)
2008-07-09 10:03:21 +10:00
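As an illustration of the portability point in the commit above, a pattern written with the POSIX character class looks like this. The function name port_is_listening and the netstat-style input are assumptions made for the example, not code taken from the eventscripts.

```shell
# Hypothetical helper (not from ctdb): succeed if any netstat-style line
# on stdin shows a tcp socket in LISTEN state on port $1.
# [[:space:]] is a POSIX bracket expression; \s is a GNU extension that
# is not handled by every grep build.
port_is_listening ()
{
    grep -E -q "^tcp[[:space:]]+.*:$1[[:space:]]+.*LISTEN"
}
```

The pattern is deliberately loose and only meant to show the character class in use.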
Ronnie Sahlberg
5ab7eaa553 update the monitor event for nfs to track how many times in a row it has failed
to "ping" the local nfs daemon.

Once it has failed more than 3 times in a row it will attempt to restart the nfs service.

(This used to be ctdb commit a4e89f57a8d733ea74df7b0de31eb977d6d37388)
2008-07-08 09:58:10 +10:00
Andrew Tridgell
75b8cd1096 added option to start ctdb under valgrind
Just add CTDB_VALGRIND=yes in /etc/sysconfig/ctdb, and look at the
logs in /var/log/ctdb_valgrind.*

(This used to be ctdb commit 9acd577c97059e8924582ac52e9ce5785903f120)
2008-07-04 16:58:14 +10:00
Ronnie Sahlberg
03cbb27a79 make /etc/ctdb/functions executable and add a hashbang to it so
rpmlint wont complain

(This used to be ctdb commit 9b8179ad043a80e0e18eeba427a7b7b15690d039)
2008-06-27 09:29:38 +10:00
Ronnie Sahlberg
46220fc467 read the samba sysconfig from the samba eventscript
(This used to be ctdb commit fb9870916ce0798695b09d33208a19d5de1cfd29)
2008-05-27 08:21:18 +10:00
Ronnie Sahlberg
5836576237 move CTDB_MANAGES_NFS from /etc/sysconfig/nfs to /etc/sysconfig/ctdb
(This used to be ctdb commit 92be23dbd6a5bf32d4d1af4a41437fbcd7d8eaf2)
2008-05-22 06:08:38 +10:00
Ronnie Sahlberg
e9664e5a4c move the CTDB_MANAGES_ISCSI setting from /etc/sysconfig/iscsi to /etc/sysconfig/ctdb
(This used to be ctdb commit a953a0fb450955b62d747bdc82c5b968fe0ed378)
2008-05-22 06:04:36 +10:00
Ronnie Sahlberg
f4ed8efa05 move the config option CTDB_MANAGES_VSFTPD from /etc/sysconfig/vsftpd to /etc/sysconfig/ctdb
(This used to be ctdb commit 1ad0295f86370979d0537f7290f5e9c7d1ff6e94)
2008-05-22 06:01:17 +10:00
Ronnie Sahlberg
e50dfb07bf When ctdb has just been installed on a node, there wont be any persistent databases
stored yet.

Fix a cosmetic and annoying warning message when running "service ctdb start" and suppress printing out that "warning your ls command to find the persistent databases didnt find any" ...

(This used to be ctdb commit d32b16a4e5ecc31563c6f2767e7d483f3d980284)
2008-05-16 15:14:17 +10:00
Andrew Tridgell
e465110f95 Fix the chicken and egg problem with ctdb/samba and a registry smb.conf
This attempts to fix the problem of ctdb event scripts blocking due to
attempted access to the ctdb databases during recovery. The changes are:

  - now only the 'shutdown' and 'startrecovery' events can be called
    with the databases locked in recovery. The event scripts must ensure
    that for these two events no database access is attempted

  - the recovered, takeip and releaseip events could previously be called
    inside a recovery. The code now ensures that this doesn't happen, delaying
    the events till after recovery has finished

  - the 50.samba event script now avoids using testparm unless it is really
    needed

This needs extensive testing.

(This used to be ctdb commit e3cdb8f2be6a44ec877efcd75c7297edb008a80b)
2008-05-14 20:57:04 +10:00
Ronnie Sahlberg
d3e24f744a When we run the init script to start the ctdb service
Use tdbdump to verify that all persistent database files are good
before we start the daemon.

(This used to be ctdb commit 13d3eb9a8bc7fad14fcd3e7e023c1336657424d6)
2008-05-12 16:44:33 +10:00
Ronnie Sahlberg
49e38d9f96 when pulling the nfs directories to check during 60.nfs monitor
grep for lines starting with a '/' character since exportfs will sometimes
split a single export line into two lines of output    like this :

[root@fscc-hs21-13 ~]# exportfs
/NFS4exports/tmp
                <world>
/NFS4exports    <world>

(This used to be ctdb commit 7c569720beb626617d800211faaf9029f0deb4cf)
2008-05-11 14:30:43 +10:00
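The filtering described in the commit above can be sketched like this. The function name exported_directories is hypothetical and this is not the actual 60.nfs code; it reads exportfs-style output on stdin so it can be tried without a running NFS server.

```shell
# Hypothetical helper (not from ctdb): print one exported directory per
# line. Only lines starting with '/' are kept, which drops the
# continuation line that exportfs emits when it splits a long entry.
# Everything after the first whitespace (the client list) is removed.
exported_directories ()
{
    grep '^/' | sed -e 's/[[:space:]].*$//'
}
```

Fed the three lines of output quoted in the commit message, this prints /NFS4exports/tmp and /NFS4exports. Directory names containing whitespace are not handled by this sketch.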
Ronnie Sahlberg
41e762a836 From Mathias Dietz
Make the 60.nfs eventscript more forgiving when using non-us/english
characters in sharenames

(This used to be ctdb commit f4385712134ea783a0c79a687c5d4e6faa1cc4a7)
2008-05-08 06:52:53 +10:00
Ronnie Sahlberg
2c23959616 make sure we lose all elections for recmaster role if we do not have the recmaster capability.
(unless there are no other node at all available with this capability)

(This used to be ctdb commit 8556e9dc897c6b9b9be0b52f391effb1f72fbd80)
2008-05-06 13:56:56 +10:00
Ronnie Sahlberg
d86e48d5ff Add ability to disable recmaster and lmaster roles through sysconfig file and
command line arguments

(This used to be ctdb commit 34b952e4adc53ee82345275a0e28231fa1b2533e)
2008-05-06 10:41:22 +10:00
Ronnie Sahlberg
ea86c31da6 shell scripts need extra spaces sometime
(This used to be ctdb commit f6409b19972fa94257af9aa51def539f639bc226)
2008-04-10 07:01:22 +10:00
Ronnie Sahlberg
b902e09350 add possibility to provide site local modifications to the event system
through a /etc/ctdb/rc.local script that is sourced by /etc/ctdb/functions

(This used to be ctdb commit a5b7dd97e3faf0c4f289240307d0e22a67cf2353)
2008-04-10 06:50:12 +10:00
Ronnie Sahlberg
0de4f37c91 return 0 if iscsi is disabled
(This used to be ctdb commit b76400e282cab60ac6b6039dbb33d93bb1350199)
2008-03-31 12:58:20 +11:00
Ronnie Sahlberg
d03bb15eb3 update the iscsi support under RHEL5 to allow one iscsi target to be defined for each public address in the cluster.
update the documentation for iscsi

(This used to be ctdb commit c1130e58296e63be3787ec59690941b2677a3378)
2008-03-31 11:00:08 +11:00
Ronnie Sahlberg
39539f6044 Add a new parameter to /etc/sysconfig/ctdb
CTDB_START_AS_DISABLED="yes"

and command line argument
--start-as-disabled

When set, this makes the ctdb node always start in DISABLED mode, so it will not host any public ip addresses.
The administrator must manually "ctdb enable" the node after it has started when the administrator wants the node to start hosting public ip addresses.

Using this option it is possible to start ctdb on a node without causing any reallocation of ip addresses when it is starting. The node will still merge with the cluster and there will still be a recovery phase but the ip address allocations will not change in the cluster.

(This used to be ctdb commit b93d29f43f5306c244c887b54a77bca8a061daf2)
2008-02-22 09:42:52 +11:00
Ronnie Sahlberg
c8503e06cd monitor the amount of free memory and if this threshold is crossed, monitoring will log an OOM memory in the ctdb log and shut down ctdb on the node.
by default ctdb does not monitor for OOM.
to enable this you need to uncomment the CTDB_MONITOR_FREE_MEMORY line in /etc/sysconfig/ctdb and specify the amount in MByte free that will trigger OOM and cause ctdb to shutdown the node

(This used to be ctdb commit 35627c7450a03f36a353c3dd7cce31ce3433a7ff)
2008-02-21 13:29:28 +11:00
Ronnie Sahlberg
8da0e15a07 from Mathieu PARENT <math.parent@gmail.com>
Simulate "nice service" on systems that do not have "service"

(This used to be ctdb commit d0e6dcbadaf41745d423640e5ff5bafd9f68eb88)
2008-02-13 08:20:20 +11:00
Ronnie Sahlberg
42702fa770 add helpers to stop/start nfs lockmanager on different platforms
(This used to be ctdb commit 3b797d851bd4bdb8ec2b3981061c668d2cf0f97c)
2008-02-11 09:52:09 +11:00
Ronnie Sahlberg
0e31eaed57 create a startstop_nfs function that can start/stop the nfs service of different platforms
(This used to be ctdb commit f6cc6bd1f62138fbf812d1917f7341e2fa2323da)
2008-02-11 09:35:37 +11:00
Ronnie Sahlberg
81232a9e29 dont use absolute pathnames for the netstat tool
it can be either in /bin or /usr/bin

(This used to be ctdb commit 4ab09e90a8a81b26d2e2af168cfce3c49a98c0e5)
2008-02-07 15:41:48 +11:00
Ronnie Sahlberg
071021b67f dont use an absolute pathname for the touch command
(This used to be ctdb commit dbfa5cb7f91b5c3c7a2dcf337f60b5c4c188a688)
2008-02-07 15:38:59 +11:00
Ronnie Sahlberg
6820f4ea15 dont use an absolute pathname for the iptables tool
(This used to be ctdb commit 8f87385c09b16c0e32d797c4b442865d8185d9ee)
2008-02-07 15:36:26 +11:00
Ronnie Sahlberg
f992455ce3 dont use an absolute path for the basename command
(This used to be ctdb commit 2519d30162fa3e9d5d81efd374543a2e4dfce545)
2008-02-07 15:33:52 +11:00
Ronnie Sahlberg
35ee7d4999 in the 91.lvs event script
IF lvs has been configured, check that the ipvsadm package has also
been installed since we depend on it.
If not, log an error and return 1

(This used to be ctdb commit 506174bbc47f1176122be2e55099149e3db27d57)
2008-02-07 09:42:35 +11:00
Ronnie Sahlberg
a8ea67203f change the IF interface is a BOND THEN xxx ELSE assume everything is ethernet
into a case and add an arm for ib*) (infiniband interfaces)

Dont try using ethtool on ib devices
(mii_tool doesnt work either)

IB does have a command ibv_devinfo   which can tell whether a physical port
is up or not   but it seems nontrivial to map this into an interface name such as ib0

(This used to be ctdb commit ab6bd25542946a732b4378f5476edfb466d6c000)
2008-02-07 09:35:46 +11:00
Ronnie Sahlberg
2a0e73bff0 add monitoring of iscsi to the eventscript
(This used to be ctdb commit e190c4d71c0b54f4c6615258986770eba15f335d)
2008-02-06 14:26:35 +11:00
Ronnie Sahlberg
64b6df09a0 update ctdb version
change flags for 41.httpd

(This used to be ctdb commit 88527a4a5423014f9911fa6061632215e153eb7e)
2008-02-06 14:00:04 +11:00
Ronnie Sahlberg
55efef3237 add an eventscript to start/stop iscsi
(This used to be ctdb commit 1aecd8c9dc2855c40c9182f30e4e71bdae5705e3)
2008-02-06 12:41:00 +11:00
Andrew Tridgell
146d4b0db7 merge async recovery changes from Ronnie
(This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2)
2008-01-29 13:59:28 +11:00
Andrew Tridgell
3777346629 partial merge from ronnie
(This used to be ctdb commit fd316deb8a9e0545c8efa1bfc8ad83962b310405)
2008-01-29 11:39:06 +11:00
Andrew Tridgell
6b50533c22 fixed egrep pattern to use more compatible expression for spaces
(This used to be ctdb commit 2da3871417bb05da8802093ceeb02e89102d99ad)
2008-01-28 17:27:16 +11:00
Andrew Tridgell
3c5bf1fa01 merged 60.nfs changes from ronnie
(This used to be ctdb commit aa7996d4555883360082d9017185464b3551ae08)
2008-01-21 12:46:11 +11:00