mirror of https://github.com/samba-team/samba.git synced 2024-12-27 03:21:53 +03:00

541 Commits

Author SHA1 Message Date
Ronnie Sahlberg
0de79c12ba Make sure the statd directory exists before trying to access the
"update trigger" file.

CQ 1020344

(This used to be ctdb commit 171f98f6f7ce7d01f47c44043ad599702711b12d)
2010-10-12 08:02:18 +11:00
Ronnie Sahlberg
842d9aab4e move extracting the config from config.tdb for public addresses
into its own function

(This used to be ctdb commit 2d478a39ed8303b0371112d61630660d12b7db2c)
2010-10-12 02:57:53 +11:00
Ronnie Sahlberg
f7febd28af Don't stop checking interfaces after the first bond device;
continue the loop to process all other interfaces too.

(This used to be ctdb commit 500ade4e6a58ea786a665f6be7cf30f43c882570)
2010-10-09 10:55:43 +11:00
Ronnie Sahlberg
51a38dc4a4 Spotted by Rusty.
Add a missing $
so we delete $_ip and not _ip.

(This used to be ctdb commit e9d04c5f419eaa0338a3beefba32c52be00242a8)
2010-10-08 15:53:36 +11:00
Ronnie Sahlberg
f5c0539dc6 Change how NATGW is configured to allow special nodes that do not have
network connectivity outside of the cluster to still be able to
participate in a natgw group.
These nodes cannot become natgw master since they lack external network
connectivity.

These nodes are configured the same way as any other node with
NATGW, with the following two exceptions:
* we do NOT set CTDB_NATGW_PUBLIC_IFACE at all on these nodes.
  Since these nodes lack external network connectivity, we should not check
  the interface for link.
* we must set CTDB_NATGW_SLAVE_ONLY=yes to flag that this is a node that
  cannot become natgw master.

(This used to be ctdb commit ab7b00a37e55beffc074be95b55d8a5c7cb9eef2)
2010-09-08 09:20:16 +10:00
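The eligibility rule from commit f5c0539dc6 above can be expressed as a small shell predicate. This is a minimal sketch, not ctdb code: the function name `natgw_may_be_master` is hypothetical, and treating a missing CTDB_NATGW_PUBLIC_IFACE as "not eligible" is an assumption drawn from the commit's description of slave-only nodes.

```sh
#!/bin/sh
# Hypothetical helper (not part of ctdb): decide whether this node may
# become NATGW master, using only the two settings named in the commit.
natgw_may_be_master ()
{
    # Slave-only nodes lack external connectivity: never master.
    if [ "$CTDB_NATGW_SLAVE_ONLY" = "yes" ] ; then
        return 1
    fi
    # Assumption: with no public interface configured there is nothing
    # to route outgoing traffic through, so the node is not eligible.
    [ -n "$CTDB_NATGW_PUBLIC_IFACE" ]
}
```

On a slave-only node the configuration would therefore set CTDB_NATGW_SLAVE_ONLY=yes and leave CTDB_NATGW_PUBLIC_IFACE unset, and the predicate returns non-zero.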
Ronnie Sahlberg
dc2f87737d Don't store temporary runtime data in $CTDB_BASE/state
since that will usually be /etc/ctdb/state and storing this under /etc is just
wrong.

Add a new variable CTDB_VARDIR that defaults to /var/ctdb and store the data there instead.

(This used to be ctdb commit 516423c25afa9861d9988096efa8a4a2b12b31b1)
2010-09-03 12:43:28 +10:00
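The "defaults to /var/ctdb" behaviour in commit dc2f87737d above maps naturally onto POSIX default parameter expansion. A minimal sketch; `ctdb_vardir_path` is a hypothetical helper added only to show paths being built from the variable.

```sh
#!/bin/sh
# Default CTDB_VARDIR to /var/ctdb (the default named in the commit)
# when it is unset or empty, so runtime state never lands under /etc.
CTDB_VARDIR="${CTDB_VARDIR:-/var/ctdb}"

# Hypothetical helper: join a relative name onto the runtime directory.
ctdb_vardir_path ()
{
    echo "$CTDB_VARDIR/$1"
}
```

Using `:-` rather than `-` means an explicitly empty CTDB_VARDIR also falls back to the default, which avoids accidentally writing to paths relative to `/`.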
Ronnie Sahlberg
c7df27e32d make sure all statd state directories exist before we try to reference them
or else tar and friends will throw an error in the log

(This used to be ctdb commit 96cbd2c0aa9a4641a42b3c33374675fa732ed1e5)
2010-09-01 15:49:57 +10:00
Ronnie Sahlberg
8be5bf1567 don't print so much log output when shutting down vsftpd
(This used to be ctdb commit 1a41cd7332703629001201eea8ae9b94f1341c9d)
2010-09-01 13:29:38 +10:00
Ronnie Sahlberg
9ef21f1c07 ouch, remove a dummy debug printout that snuck in there somehow
(This used to be ctdb commit 14c4d99513b4bdb94f60c3e9c4823e04b0833e60)
2010-08-30 19:48:41 +10:00
Ronnie Sahlberg
2b4d9170c2 Merge commit 'martins/master'
(This used to be ctdb commit cc8c851e2e0b46f00b18a6dc61fd2774e97850dd)
2010-08-30 18:22:05 +10:00
Ronnie Sahlberg
12cc826231 Remove the dependency on the underlying cluster filesystem for handling
the clusterwide persistent data associated with the lock manager and
statd notifications.

Use persistent databases to store this data instead of a shared directory.

(This used to be ctdb commit fc0678d351187cfa4c71123f97c0f493aacd5d16)
2010-08-30 18:14:41 +10:00
Ronnie Sahlberg
c95f4258d8 Add a new event "ipreallocated"
This is called every time a reallocation is performed.

    While STARTRECOVERY/RECOVERED events are only called when
    we do ipreallocation as part of a full database/cluster recovery,
    this new event is also triggered when we only do a light
    failover because a node became unhealthy.

    I.e. situations where we do a failover but we do not perform a full
    cluster recovery.

    Use this to trigger for natgw so we select a new natgw master node
    when failover happens and not just when cluster rebuilds happen.

(This used to be ctdb commit 7f4c591388adae20e98984001385cba26598ec67)
2010-08-30 18:09:30 +10:00
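Commit c95f4258d8 above makes NATGW master selection react to light failovers as well as full recoveries. A minimal sketch of the dispatch, assuming a hypothetical `pick_natgw_master` function that stands in for the real selection logic in the NATGW eventscript:

```sh
#!/bin/sh
# Hypothetical stand-in for the script's real master-selection code.
pick_natgw_master ()
{
    echo "re-electing NATGW master after $1"
}

natgw_event ()
{
    case "$1" in
    recovered|ipreallocated)
        # "recovered" only fires after a full cluster recovery;
        # "ipreallocated" fires on every IP reallocation.
        pick_natgw_master "$1"
        ;;
    *)
        : # other events are ignored in this sketch
        ;;
    esac
}
```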
Martin Schwenke
a104d1d823 NFS tickles: use addtickle/deltickle instead of shared tickle directory.
This adds a new function update_tickles() that tracks tickles for a
given port using the new ctdb addtickle/deltickle commands.  This
function is used in events.d/60.nfs to handle NFS tickles.

events.d/61.nfstickle is removed.  The
/proc/sys/net/ipv4/tcp_tw_recycle setup is also moved to
events.d/60.nfs.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit dca4c4ebf3c35f8db3ae208efb7a83abbf726ed6)
2010-08-26 14:59:59 +10:00
Ronnie Sahlberg
3edec07807 Add a configuration database, implemented as a persistent database.
This database can be used, as an option, to store
the public address assignment instead of editing the /etc/ctdb/public-addresses file manually.

This configuration is stored in one record per key, with a key-name of
public-addresses:node#<pnn>
where <pnn> is the node number.

The content of this record is the same syntax as the /etc/ctdb/public-addresses file.

When ctdbd starts, if this key exists and contains data, it is extracted from the database and compared with the normal file /etc/ctdb/public-addresses.

If the content differs, the config database "wins" and is used to overwrite/update the /etc/ctdb/public-addresses file, after which ctdbd is restarted.

The main benefit with this option is that it can be used to update the public address configuration for nodes that are offline/unreachable by updating their configuration in the persistent database.
Once the offline node is available again, it will resync its databases with the rest of the cluster, find out that the config has changed, apply the changes and restart ctdbd automatically.

The command to store the public address configuration for a node into the persistent database is :

ctdb pstore config.tdb public-addresses:node#<pnn> <filename>

where <pnn> is the number of the node we wish to update the config for, and <filename> is a file containing the new public address configuration for that node.

(This used to be ctdb commit 292d7435a360efd7f15a7a99f658a605e07c0a81)
2010-08-25 11:49:56 +10:00
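The "config database wins" rule in commit 3edec07807 above is essentially a compare-then-overwrite step. A minimal sketch, assuming the record has already been extracted from config.tdb into a file; the function name and its return codes are hypothetical, and the ctdbd restart is left to the caller.

```sh
#!/bin/sh
# Hypothetical helper. $1: content extracted from config.tdb,
# $2: the public addresses file.
# Returns 0 if the file was replaced (ctdbd would then be restarted),
# 1 if nothing changed, 2 if the copy failed.
apply_db_public_addresses ()
{
    _db_copy="$1"
    _file="$2"

    # A missing or empty record means "no override": keep the file.
    [ -s "$_db_copy" ] || return 1

    # Identical content: nothing to do.
    if cmp -s "$_db_copy" "$_file" ; then
        return 1
    fi

    # Content differs (or the file is missing): the database wins.
    cp "$_db_copy" "$_file" || return 2
    return 0
}
```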
Ronnie Sahlberg
2e8aac6689 Merge commit 'rusty/ports-from-1.0.112' into foo
(This used to be ctdb commit 13e58d92f5f1723e850a82ae030d0ca57e89b1ee)
2010-08-19 13:17:56 +10:00
Ronnie Sahlberg
729f1ddea0 On RHEL, "service nfs stop;service nfs start" and "service nfs restart"
sometimes (very rarely) fails to restart the service.

    Add a function to restart NFSd on SLES and RHEL-like systems.

    If we detect the system is unhealthy due to kNFSd not running,
    try to restart the service again with "service nfs restart" and
    hope for the best.

CQ1019372

(This used to be ctdb commit 25c4ce7e919f13226219f036bcffd2be76b2f06c)
2010-08-19 07:18:22 +10:00
Martin Schwenke
6ce1501aa1 Move NAT gateway firewall rules to recovered|updatenatgw events.
The existing code wasn't working as designed in the start event.  It
should work here.

BZ: 62613
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit aeb70c7e7822854eb87873a5c7783e27e6e72318)
2010-08-18 11:40:07 +09:30
Martin Schwenke
b930c885b3 initscript: wait until we can ping ctdbd before setting tunables.
Currently we do a "sleep 1" after starting and before running
set_ctdb_variables to set the tunables.  This is too arbitrary and
might fail if the system is heavily loaded.  This, for example, could
result in some nodes running with DeterministicIPs and some without,
in which case a different IP allocation algorithm would run depending
on who is the recmaster!

This makes the start function wait until "ctdb ping" succeeds (with 10
second timeout) before trying to run set_ctdb_variables.  If a timeout
occurs then the start function attempts to kill ctdbd before exiting
with a failure.

It also cleans up the status reporting code for Red Hat and SUSE so
that the final status code is reported.  Currently there are cases
where a correct status is prematurely reported before a failure
occurs.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit cdcd05662a30b51caaeeab4ac44138cac2474e0a)
2010-08-05 15:29:40 +10:00
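The wait described in commit b930c885b3 above replaces a fixed `sleep 1` with polling until the daemon answers. A minimal sketch: the function name is hypothetical and the probe command is a parameter, so in the initscript it would be called with `ctdb ping` and a timeout of 10.

```sh
#!/bin/sh
# Hypothetical helper: run a probe once per second until it succeeds
# or $1 seconds have elapsed. Returns 0 on success, 1 on timeout.
wait_until_ready ()
{
    _timeout="$1"
    shift
    _count=0
    while [ "$_count" -lt "$_timeout" ] ; do
        # Success of the probe means the daemon is answering.
        if "$@" >/dev/null 2>&1 ; then
            return 0
        fi
        sleep 1
        _count=$((_count + 1))
    done
    return 1
}
```

On timeout the caller is responsible for the cleanup the commit describes: attempting to kill ctdbd and then reporting failure.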
Martin Schwenke
fe64a8f87a Optimise 61.nfstickle to write the tickles more efficiently.
Currently the file for each IP address is reopened to append the
details of each source socket.

This optimisation puts all the logic into awk, including the matching
of output lines from netstat.  The source sockets for each
destination IP are written into an array entry and then each array
entry is written to the corresponding file in a single operation.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 6549e9b01538998d51a5f72bfc569776d232b024)
2010-07-30 16:50:18 +10:00
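The optimisation in commit fe64a8f87a above collects sockets per destination IP in an awk array and writes each file once. A minimal sketch under stated assumptions: `netstat -tn`-style lines on stdin, NFS on port 2049, IPv4 only, and one output file per server IP in a directory given as `$1`. The function name is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: group client sockets by server IP in one awk
# pass, then write each group to "$1/<ip>" in a single operation.
write_tickles ()
{
    awk -v dir="$1" '
        $1 == "tcp" && $6 == "ESTABLISHED" {
            # $4 is the local (server) socket, $5 the client socket.
            n = split($4, lsock, ":")
            if (lsock[n] != "2049") next
            sockets[lsock[1]] = sockets[lsock[1]] $5 "\n"
        }
        END {
            for (ip in sockets) {
                f = dir "/" ip
                printf("%s", sockets[ip]) > f
                close(f)
            }
        }'
}
```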
Stefan Metzmacher
794230775c events/10.interface: we need to mark interfaces as "up" if we don't know how to monitor them
metze

(This used to be ctdb commit 1e08d1578d1960fcfc5fdd85492fbd6d194e5e94)
2010-07-30 16:33:27 +10:00
Stefan Metzmacher
7b1345d446 config/interface_modify.sh: do the echo before running the script
metze
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

(This used to be ctdb commit bb1d2bd31073304fc203868517144f61d12b7fc2)
2010-07-15 15:06:51 +09:30
Stefan Metzmacher
3b9eeb1049 config/interface_modify.sh: before calling a script check if it exists and is executable
For non-bash shells $_s_script might end with '/*'.

We do the workaround this way, because it makes sense to check
that a script is executable, before trying to execute it.

metze

[ This actually applies to any shell -- Rusty Russell ]
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

(This used to be ctdb commit e665cfde03fc9ec2264e99512ed5470872a2fd04)
2010-07-15 15:06:39 +09:30
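Commit 3b9eeb1049 above relies on the fact that a glob such as `"$dir"/*` that matches nothing is left unexpanded, so the loop variable literally ends in `/*`. A minimal sketch of the check-before-run pattern; `run_scripts` is a hypothetical name.

```sh
#!/bin/sh
# Hypothetical helper: run every executable regular file in $1,
# passing the remaining arguments through.
run_scripts ()
{
    _dir="$1"
    shift
    for _s_script in "$_dir"/* ; do
        # Skips both the unexpanded "/*" pattern (empty directory)
        # and any entry that is not an executable regular file.
        [ -f "$_s_script" ] && [ -x "$_s_script" ] || continue
        "$_s_script" "$@"
    done
}
```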
Rusty Russell
34ce8a4f02 config: wrap iptables in flock to avoid concurrency.
When doing a releaseip event, we do them in parallel for all the separate
IPs.  This creates a problem for iptables, which isn't reentrant, giving
the strange message:
	iptables encountered unknown error "18446744073709551615" while initializing table "filter"

The worst possible symptom of this is that releaseip won't remove the rule
which prevents us listening to clients during releaseip, and the node will be
healthy but non-responsive.

The simple workaround is to flock-wrap iptables.  Better would be to rework
the code so we didn't need to use iptables in these paths.

CQ:S1018353
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

(This used to be ctdb commit 72d6914ee913272312d7b68f1be5ad05ad06587d)
2010-07-15 10:45:24 +09:30
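The workaround in commit 34ce8a4f02 above serialises iptables behind a lock. A minimal sketch using flock(1) from util-linux; the lock path, the 30-second wait and the function names are illustrative assumptions, not the values used in the commit.

```sh
#!/bin/sh
# Assumed lock location, for illustration only.
IPTABLES_LOCK="${IPTABLES_LOCK:-/tmp/ctdb-iptables.flock}"

# Hypothetical helper: run a command while holding an exclusive lock
# on $1 (flock creates the lock file if needed).
run_locked ()
{
    _lock="$1"
    shift
    # -w 30: give up if the lock cannot be taken within 30 seconds.
    flock -w 30 "$_lock" "$@"
}

# A wrapper in this style shadows the real binary inside eventscripts,
# so parallel releaseip runs cannot enter iptables concurrently.
iptables ()
{
    run_locked "$IPTABLES_LOCK" /sbin/iptables "$@"
}
```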
Ronnie Sahlberg
004b849feb Don't check link status for loopback. This interface never has
issues with the physical layer.

(This used to be ctdb commit d938b80a1c409a9ec4b554ddca5b0d949be53d9e)
2010-06-01 14:51:09 +10:00
Ronnie Sahlberg
db9e00eec8 Prevent clients from connecting to the natgw address.
This address is dedicated for outgoing connections.

BZ62613

(This used to be ctdb commit f0e48dd833a4408449083148c172c2136b934e5b)
2010-06-01 12:43:32 +10:00
Ronnie Sahlberg
ad2b7c28b6 Add monitoring of quorum and make the node UNHEALTHY when quorum is lost
(This used to be ctdb commit d58b575e15015c5ef9493ab3ad3e8657c5787e2c)
2010-05-25 12:46:28 +10:00
Ronnie Sahlberg
03b112cb33 in 62.cnfs, lines in /etc/exports can have the exports quoted,
so strip off any initial " on the exports line

(This used to be ctdb commit dce2244e8ac6617c335cfcd721c3795071b9f2b2)
2010-05-25 12:46:08 +10:00
Michael Adam
b40fa22239 functions: when checking for a directory also check whether it can be accessed.
Thanks to "waKKu" on irc for this improvement.

Michael

(This used to be ctdb commit 81e1483dd0ce2cd091721e456c0c194cc58442f3)
2010-05-11 11:29:45 +02:00
Ronnie Sahlberg
1cb2b0b2d0 Add a new eventscript 62.cnfs to integrate better with gpfs/cnfs
(This used to be ctdb commit 4a679422dc231aa98605b9cc322e4ab442f7bde4)
2010-05-04 13:56:55 +10:00
Ronnie Sahlberg
d6ae1c4173 If the admin makes a configuration mistake and configures NATGW to use the
same ip address as a normal public-address,
check for this in the natgw script and warn the user.

Also prevent ctdb from starting up since this configuration will not work.

BZ60933

(This used to be ctdb commit 480af69b63b9162c85d8e04461ca9e4a083c04a4)
2010-04-28 08:51:06 +10:00
Ronnie Sahlberg
2d9fee4f85 Add a setting where CTDB will monitor and warn for low memory conditions.
CTDB_MONITOR_FREE_MEMORY_WARN

BZ 59747

(This used to be ctdb commit 83446b2e7e28e3ed6627c1950053018b8799984a)
2010-04-23 09:08:38 +10:00
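The CTDB_MONITOR_FREE_MEMORY_WARN setting in commit 2d9fee4f85 above implies a threshold comparison against current free memory. A minimal sketch under assumptions not stated in the commit: the threshold is in megabytes, "free" means MemFree + Buffers + Cached from a /proc/meminfo-style file, and the function name is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper. $1: threshold in MB (assumed unit); empty means
# the check is disabled. $2: meminfo file (defaults to /proc/meminfo).
check_free_memory ()
{
    _threshold_mb="$1"
    _meminfo="${2:-/proc/meminfo}"
    [ -n "$_threshold_mb" ] || return 0

    # Sum the kB values of the assumed "free" fields, convert to MB.
    _free_mb=$(awk '$1 == "MemFree:" || $1 == "Buffers:" || $1 == "Cached:" { kb += $2 }
                    END { print int(kb / 1024) }' "$_meminfo")

    if [ "$_free_mb" -lt "$_threshold_mb" ] ; then
        echo "WARNING: free memory is low - ${_free_mb}MB free"
        return 1
    fi
    return 0
}
```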
Ronnie Sahlberg
8ef5db522a In the example script to remove all ip addresses after a ctdb crash,
add the NATGW address as one to be removed in addition to the
public addresses.

(This used to be ctdb commit 234b86fb19aae7a43f1dd2c0f69b03164fe5aaca)
2010-04-23 09:08:26 +10:00
Ronnie Sahlberg
4f191982ca add an example script that can be called from crontab to cleanup
and release public ip addresses if ctdbd is no longer running

(This used to be ctdb commit 1cdaaa0a3f53d1b075340a33dfdc42b534e99187)
2010-04-22 14:23:02 +10:00
Ronnie Sahlberg
40434a7c98 add a missing ||
to make the 10.interface script not fail with a syntax error

(This used to be ctdb commit a9831070344a6dcf46c55250f9d74a5870f37dfe)
2010-04-22 14:22:46 +10:00
Martin Schwenke
f765f0ceca Fix a thinko in 2ea0a9f1a93781a0d036feb9fcc0d120b182922f.
If the driver is virtio_net then we assume that the link is up rather
than ignoring the check altogether.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 3044d07da2a58260fa06bf489890b279bcf3ec39)
2010-04-20 10:52:31 +10:00
Ralph Wuerthner
d2f7bf804c ethtool does not support virtio_net devices.
Skip the link test for this type of device.

Signed-off-by: Ralph Wuerthner <ralph.wuerthner@de.ibm.com>
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 2ea0a9f1a93781a0d036feb9fcc0d120b182922f)
2010-04-15 16:38:19 +10:00
Michael Adam
df77489477 events:50.samba: wipe the local part of the serverid db before starting winbind/smbd/nmbd
This is necessary for the new serverid approach.

Michael

(This used to be ctdb commit 8956f32e571093db7f285b83e4dd32960f8afc7c)
2010-03-29 17:05:06 +11:00
Stefan Metzmacher
940e58bf3f config: let 13.per_ip_routing use a flock for generate_auto_link_local()
metze

(This used to be ctdb commit dc2d0d0e559308ad2676f9ad973746c147d65eb9)
2010-03-18 11:57:16 +01:00
Ronnie Sahlberg
d4f7a59960 Merge root@10.1.1.27:/shared/ctdb/ctdb-git
(This used to be ctdb commit e59310132d8126ee3afc191b5db56e80a32986e8)
2010-03-11 18:15:41 +11:00
Wolfgang Mueller-Friedt
e26a26fd7a ctdb_setstatus in /etc/ctdb/functions was not working correctly because it was called with a wrong parameter list
(This used to be ctdb commit e1e285d9f7fa3237dbbacca52a4eb2b264fa5986)
2010-03-11 17:52:42 +11:00
Mathieu Parent
c57c06df8c Fix some more bashisms
(This used to be ctdb commit 3d82ca5b1b8ba2770c739493aa0cdd34bb4827d8)
2010-03-10 17:41:40 +11:00
Mathieu Parent
e7bca0dcfc Correct nice_service()
nice takes a binary as argument and not a function or builtin command

(This used to be ctdb commit e21b40db64b314a24caa2bc611cb48b93decb5aa)
2010-03-10 17:39:56 +11:00
Michael Adam
ff48fc3933 fix bug #7152: check NFS-Shares, fails with too long path-names
Thanks to Thomas Sesselmann <t.sesselmann@dkfz.de> .

Michael

(This used to be ctdb commit da5fc07baa9aa806c3cba52c00fb10cf8b7f2dc5)
2010-02-23 21:08:23 +11:00
Stefan Metzmacher
e44c2396a7 config/13.per_ip_routing: fix typo in error message
metze

(This used to be ctdb commit 4b06665b77cb24d488f4ef03cc9ad5fd5d0feb0e)
2010-02-23 10:38:50 +01:00
Stefan Metzmacher
d79a70bca3 config/13.per_ip_routing: use better names for release_script and setup_script
As the basename of the script will be used for the readd script
from setup_iface_ip_readd_script, it's now easier to identify
what script is called by delete_ip_from_iface() while readding
ips to the interface.

metze

(This used to be ctdb commit 3ee225b0b6ed37c22478bd145ced56b1b9b86842)
2010-02-23 10:38:50 +01:00
Stefan Metzmacher
08d69d2cec config/13.per_ip_routing: register the setup script with setup_iface_ip_readd_script()
This is needed because we need to set up the routing table again when
the delete_ip_from_iface() function readds the ip to the interface.

metze

(This used to be ctdb commit ea87185ec9977006ef72d5a68c875154e4c84099)
2010-02-23 10:38:50 +01:00
Stefan Metzmacher
3a0d830e4c config/13.per_ip_routing: add a setup_per_ip_routing() function
This combines the logic into a shell function which can be used by the
"takeip" and "updateip" hooks.

We check the return values of the "ip" commands now
instead of ignoring them.

We now create a setup_script.sh similar to the release_script.sh
which makes it easier to analyze problems.

metze

(This used to be ctdb commit 624e8878851b4957cc7c02e922ec86926d6927ee)
2010-02-23 10:38:49 +01:00
Stefan Metzmacher
3419e9c4dd server: add "setup" event
This is needed because the "init" event can't use 'ctdb' commands.

metze

(This used to be ctdb commit 1493436b6b24eb05a23b7a339071ad85f70de8f4)
2010-02-23 10:38:49 +01:00
Stefan Metzmacher
061c2a7182 config/10.interface: use delete_ip_from_iface also in the "init" event
metze

(This used to be ctdb commit e2bc5c25116747c58505fe1cb3e2d164257377d1)
2010-02-23 10:38:49 +01:00
Stefan Metzmacher
90769bf4eb config/11.natgw: use delete_ip_from_iface() instead of remove_ip()
This also initializes the variables correctly for the
shutdown|removenatgw code path to delete_all.

metze

(This used to be ctdb commit 2c2cbed4fcbc868a990fa6b32fc96126ffc61bb5)
2010-02-23 10:38:48 +01:00
Stefan Metzmacher
d71c40cad7 config: make remove_ip() a wrapper of delete_ip_from_iface()
metze

(This used to be ctdb commit e66d6636b80e3614f183366ec92fc3c6d5c323da)
2010-02-23 10:38:48 +01:00
Stefan Metzmacher
3bd1910428 config: store interface_modify state in a $CTDB_BASE/state/interface_modify directory
metze

(This used to be ctdb commit 756c8b953fef7132dae74b5b244baeb3108dec54)
2010-02-23 10:38:48 +01:00
Stefan Metzmacher
d8ab328ee1 config: add setup_iface_ip_readd_script() helper function
This adds a generic infrastructure to register scripts which will
be called when the delete_ip_from_iface() function needs to readd
secondary ips to an interface.

metze

(This used to be ctdb commit ac97d65f44e8dc8bf2ec8f68e4db3448521755a2)
2010-02-23 10:38:47 +01:00
Stefan Metzmacher
feebd033eb config: readd ips with a broadcast address in delete_ip_from_iface()
metze

(This used to be ctdb commit e7a6f64cf5bce5abdc47f5db96b286c5a8d66aff)
2010-02-23 10:38:47 +01:00
Ronnie Sahlberg
af79d2c08b Make sure that the natgw eventscript also triggers on the "stopped" event
to remove the natgw configuration and ip assignments used.

BZ61036

(This used to be ctdb commit 344b1f95b126ecabeb4576330038b08bf88e8cb8)
2010-02-23 10:16:17 +11:00
Ronnie Sahlberg
6091dce975 From Sumit Bose <sbose@redhat.com>
Fixes for init script to meet guidelines

(This used to be ctdb commit 9f484404030211df85a215fd2280568a2ec020fb)
2010-02-22 14:06:52 +11:00
Ronnie Sahlberg
5439401dd2 try to restart rpc-rquotad if it is not running
bz60317

(This used to be ctdb commit 2263cd74d511247debadd0f6602bc6396b46ac5e)
2010-02-16 11:02:37 +11:00
Ronnie Sahlberg
70c1e39e64 Add a variable CTDB_CHECK_SWAP_IS_NOT_USED="yes"
to control whether or not to check if we are swapping, and produce
useful output into the logfile if we are.

For production systems with dedicated NAS heads we should never swap.
But for developer/test systems we often use smaller non-dedicated systems where
we can no longer guarantee that we will not be using swap.

(This used to be ctdb commit db87849bf3380914a63a626412bec209dbea7d20)
2010-02-16 11:01:39 +11:00
Ronnie Sahlberg
64111bb02b Add a new variable : CTDB_NFS_SKIP_KNFSD_ALIVE_CHECK
when set to "yes" this will skip checking if knfsd has hung or not.

bz59626

(This used to be ctdb commit b0bf3794753c5bb898295b5109707953cc3dcec5)
2010-02-16 10:59:53 +11:00
Martin Schwenke
d25ab9eca0 Merge commit 'origin/master'
(This used to be ctdb commit 19523fbb12db1ec1e5ee38de1b2d3b99a74c6ca4)
2010-02-10 20:24:28 +11:00
Rusty Russell
34b8b98078 event scripts: add logging for low memory conditions
We should never enter swap; if we do, show the memory state of the machine and the process list.  This will help us diagnose what caused the condition before it's too late and the box starts OOM-killing processes.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

(This used to be ctdb commit 627a6d67a0e9e61f8713e62695b3518c51909230)
2010-02-09 12:46:35 +10:30
Martin Schwenke
56b178e1a2 eventscripts: stop loadconfig function from loading ctdb config file twice.
If "$1" was empty then loadconfig would load the ctdb config twice.
This stops that from happening.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 0406d406da70aaee7ad6aac236114905c5d03ed2)
2010-01-22 17:19:12 +11:00
Martin Schwenke
407a8f7205 eventscript: Use of $NFS_TICKLE_SHARED_DIRECTORY must be after loadconfig.
Proper fix for 085d1bea78fabf754ef6dd6d323f74a1d361e45c's workaround.
$NFS_TICKLE_SHARED_DIRECTORY was being used before it is set via
loadconfig.

Ronnie actually spotted this one.  :-)

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit ee8b2e298351d05197a2e1494f3331433644c1e6)
2010-01-22 17:14:50 +11:00
Martin Schwenke
02e68340e8 initscript: Remove bash-ism.
Also, change the order of the comparison so it is consistent with
others in the script.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 44696e15cdb23e7656d3bb0ead54f509495738a7)
2010-01-22 17:13:17 +11:00
Martin Schwenke
d6b0578cfb initscript: handle spaces in option values inserted into $CTDB_OPTIONS.
This puts single quotes around everything and uses eval on the
command-lines that actually start ctdbd.  The eval causes the single
quotes to be interpreted.

The "redhat" init style no longer uses the Red Hat daemon function.
It loses the quoting and re-splits on spaces.  Instead we add an extra
line that uses the success/failure functions to keep things pretty.
Note that this means that we don't respect daemon's
$DAEMON_COREFILE_LIMIT variable but we do our own core file handling
with $CTDB_SUPPRESS_COREFILE anyway.  daemon's core file handling was
probably overriding what we were doing anyway, so this can be regarded
as a bug fix.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 522fbb012524fe41a67dbe43589a282dda6bcbe2)
2010-01-22 15:34:21 +11:00
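The quoting technique in commit d6b0578cfb above can be demonstrated in isolation: single-quote each value while building the option string, then `eval` the command line so the shell strips those quotes instead of re-splitting on spaces. A minimal sketch; `add_option` and `count_args` are hypothetical helpers, and values that themselves contain a single quote are not handled.

```sh
#!/bin/sh
CTDB_OPTIONS=""

# Hypothetical helper: append "--name='value'" to $CTDB_OPTIONS.
add_option ()
{
    CTDB_OPTIONS="$CTDB_OPTIONS $1='$2'"
}

# Prints how many arguments it received, to show the effect of eval.
count_args ()
{
    echo "$#"
}
```

After `add_option --dbdir "/var/my db"` and `add_option --logfile "/tmp/a b"`, `eval count_args $CTDB_OPTIONS` sees two arguments, while `count_args $CTDB_OPTIONS` without `eval` sees four fragments split at the spaces. The initscript would use the same pattern with `eval ctdbd $CTDB_OPTIONS`.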
Stefan Metzmacher
12c8dd215c config: 10.interface: search "ethtool" in $PATH instead of using a hardcoded path
This is very useful for testing, I use such a script:

cat ~/bin/ethtool
 #!/bin/sh
 # Test wrapper: for the test interfaces Neth2-Neth5, take the link
 # down before reporting; pass all other interfaces straight through.

 IFACE="$1"

 case "$IFACE" in
 Neth2|Neth3|Neth4|Neth5)
         ip link set down "$IFACE"
         ;;
 esac

 exec /usr/sbin/ethtool "$@"

metze

(This used to be ctdb commit 3bab985cf615720eded4d47b4f9f37a9c28840aa)
2010-01-20 11:11:04 +01:00
Stefan Metzmacher
ea5843075c events: add updateip event to 13.per_ip_routing
metze

(This used to be ctdb commit 829150e814a5e6c85d0f21421f46f41e81d74c53)
2010-01-20 11:11:02 +01:00
Stefan Metzmacher
6a818e66ae events: 10.interface handle updateip event
metze

(This used to be ctdb commit a5cdf1277387f8c6292153c37fa9ceb64707d04f)
2010-01-20 11:11:02 +01:00
Stefan Metzmacher
98ee69c66d server: add updateip event
metze

(This used to be ctdb commit 712ed0c4c0bff1be9e96a54b62512787a4aa6259)
2010-01-20 11:11:01 +01:00
Stefan Metzmacher
50bff8c886 config: add CTDB_PARTIALLY_ONLINE_INTERFACES to ctdb.sysconfig
With this option set to "yes", we don't become unhealthy
as long as at least one interface is still available.

metze

(This used to be ctdb commit d054eb33c6ae92560cddb40732e5dcf622591a3c)
2010-01-20 11:11:01 +01:00
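Commit 50bff8c886 above relaxes the health rule for nodes with several interfaces. A minimal sketch, assuming (as the commit implies) that without the option any failed interface makes the node unhealthy; the function name and its "up down" count arguments are hypothetical.

```sh
#!/bin/sh
# Hypothetical helper. $1: number of interfaces up, $2: number down.
# Returns 0 if the node should stay healthy.
interfaces_healthy ()
{
    _up="$1"
    _down="$2"

    # Nothing down: healthy in either mode.
    [ "$_down" -eq 0 ] && return 0

    if [ "$CTDB_PARTIALLY_ONLINE_INTERFACES" = "yes" ] ; then
        # Partially online is acceptable while one interface is up.
        [ "$_up" -gt 0 ]
        return $?
    fi
    return 1
}
```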
Stefan Metzmacher
5d2c3ef656 config: 10.interfaces call monitor_interfaces on startup
metze

(This used to be ctdb commit 615dec051c26aac628f120e96bf12fb39fc6d28a)
2010-01-20 11:11:01 +01:00
Stefan Metzmacher
94e7101070 config: 10.interfaces call ctdb ifaces and ctdb setifacelink for monitoring
metze

(This used to be ctdb commit c465f63585c419ba59a6b04cbbf78ae615a7259d)
2010-01-20 11:11:01 +01:00
Stefan Metzmacher
9c89dd9210 events: split out a monitor_interfaces function in 10.interface
metze

(This used to be ctdb commit b5ba56dea57db97d6c6ba3e7582e74fe0e3041fc)
2010-01-20 11:11:01 +01:00
Stefan Metzmacher
9a43f5e42b events: 10.interfaces allow multiple interfaces per public address
metze

(This used to be ctdb commit f9837f8b6f887d28f29aeb3eeffe8cfb423b40b4)
2010-01-20 11:10:58 +01:00
Stefan Metzmacher
628ac65709 config: add 13.per_ip_routing event script
With this script it's possible to generate routing tables
per public ip address.

metze

(This used to be ctdb commit ff5678fbec2daef461143acf00cef3f94d7655fc)
2010-01-20 11:10:57 +01:00
Stefan Metzmacher
2ecf8053f9 config: add some ipv4 helper shell functions
Many thanks to Michael Adam <obnox@samba.org>
for the basic work.

metze

(This used to be ctdb commit ff9c641763702ae99632bbf4d0825d578440c074)
2010-01-20 11:10:57 +01:00
Stefan Metzmacher
4493ba6ffa config: add interface_modify.sh and call it under flock to make modification on interfaces atomic
When two releaseip events run in parallel it's possible that the 2nd script
readds a secondary ip that was removed by the 1st script.

metze

(This used to be ctdb commit e02417b2a55c45ac2c125b1b3463c9c39e7bc07a)
2010-01-20 11:10:48 +01:00
Stefan Metzmacher
c251ac20fa events/10.interfaces: move some parts to helper functions
metze

(This used to be ctdb commit 24cd42769d8f32b90a8876a6a08a36ab23076cd1)
2010-01-20 09:44:37 +01:00
Stefan Metzmacher
d01870f138 config/functions: add tickle_tcp_connections()
metze

(This used to be ctdb commit 2397f13d7b5ca3847ef148187c6b179d06f6a47a)
2010-01-20 09:44:37 +01:00
Stefan Metzmacher
fd06167caa server: add "init" event
This is needed because the "startup" event runs after the initial recovery,
but we need to do some actions before the initial recovery.

metze

(This used to be ctdb commit e953808449c102258abb6cba6f4abf486dda3b82)
2010-01-20 09:44:36 +01:00
Stefan Metzmacher
9cba540514 lib/util: import fault/backtrace handling from samba.
metze

(This used to be ctdb commit 8171d66f0061fe23ed6dfef87ffe63bfc19596eb)
2010-01-20 09:44:36 +01:00
Ronnie Sahlberg
21e5b44673 source the nfs sysconfig file from the 61.nfstickles script
(This used to be ctdb commit 085d1bea78fabf754ef6dd6d323f74a1d361e45c)
2010-01-20 10:35:02 +11:00
Ronnie Sahlberg
a1d60b1511 Make the size of the in memory ringbuffer for keeping the recent log messages
configurable using --log-ringbuf-size=<num-entries>.

Add an entry in the sysconfig file to set this persistently.

(This used to be ctdb commit c79c2da69bc352f509e7fca4b9172a4b7f23c0f8)
2010-01-15 15:38:56 +11:00
Martin Schwenke
b65a44a4ec Revert "Use wbinfo --ping-dc instead of wbinfo -p since this is a more reliable way to determine if winbindd is in a useful state."
This reverts commit 7c95e56ba871a4e0cb893a5cb5d821e7ff6e6dd6.

wbinfo --ping-dc is proving too unreliable.

(This used to be ctdb commit b70021856e76df1ba407c83cfc19bf332fbfc869)
2010-01-12 21:02:44 +11:00
Martin Schwenke
96066d8816 Revert "events/50.samba: only use wbinfo --ping-dc if available"
This reverts commit 7b73834ba3ac197cc8a3020c111f9bb2c567e70b.

wbinfo --ping-dc is proving too unreliable.

(This used to be ctdb commit 178f429a7b6d1008d35e857b6ca1df6adb60d255)
2010-01-12 21:02:11 +11:00
Ronnie Sahlberg
4c722fe34c fix a conflict in the merge from rusty
Merge commit 'rusty/ctdb-no-setsched'

Conflicts:

	server/ctdb_vacuum.c

(This used to be ctdb commit b4365045797f520a7914afdb69ebd1a8dacfa0d9)
2009-12-17 08:18:04 +11:00
Rusty Russell
f148735928 Add --valgrinding flag instead of --nosetsched
The do_setsched was being tested for whether to mmap tdbs: let's make it
explicit.  We can also happily move the kill-child eventscript hack under
this flag.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> 


(This used to be ctdb commit 2ee86cc1f311d7b7504c7b14d142b9c4f6f4b469)
2009-12-16 20:59:15 +10:30
Stefan Metzmacher
96977cc5c4 config: add CTDB_MAX_PERSISTENT_CHECK_ERRORS option
metze

(This used to be ctdb commit fc5f556d488488040303438aefecb5ae2a8e54bc)
2009-12-16 08:08:33 +01:00
Stefan Metzmacher
0c735f03d4 config: try to use tdbtool <tdb> check instead of tdbdump for persistent db checks
metze

(This used to be ctdb commit 52e6d81f4d8a4035272d9256d01bafb8ed593027)
2009-12-16 08:08:33 +01:00
Stefan Metzmacher
0c907f4965 config: load 'ctdb' config before 'nfs' config in statd-callout
All other scripts do 'loadconfig ctdb' before any other 'loadconfig foo'
call. I think we should do the same in statd-callout.

Otherwise it's very confusing when you have configured some options
in /etc/sysconfig/ctdb, but /etc/ctdb/statd-callout doesn't notice
them.

metze

(This used to be ctdb commit 10d95581fb90bfdf58ec32345c4e36c27acf4f37)
2009-12-16 08:03:55 +01:00
Ronnie Sahlberg
50820f9e18 Bond devices can have any name the user configures, so
when checking link status for an interface, first
check if this interface is in fact a bond device
(by the presence of a /proc/net/bonding/IFACE file)
and use that file for checking status.

Otherwise assume ib* is an InfiniBand interface, which we don't know how
to check; anything else is an Ethernet interface and ethtool should
hopefully work.

(This used to be ctdb commit 8cc6c5de3d7abb0b72eaa6e769e70963b02d84cb)
2009-12-09 11:33:04 +11:00
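The link-check order in commit 50820f9e18 above (bond file first, then ib*, then ethtool) can be sketched as one function. Assumptions: the first "MII Status:" line of a bonding file reflects the bond as a whole, the bonding directory is a parameter so the logic can be exercised without real bonds, and the function name is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: return 0 if interface $1 appears to have link.
# $2 overrides the bonding status directory (default /proc/net/bonding).
check_iface_link ()
{
    _iface="$1"
    _bonding_dir="${2:-/proc/net/bonding}"

    # 1. Any name can be a bond: detect it by its bonding status file.
    if [ -r "$_bonding_dir/$_iface" ] ; then
        _st=$(sed -n 's/^MII Status: //p' "$_bonding_dir/$_iface" | head -n 1)
        [ "$_st" = "up" ]
        return $?
    fi

    case "$_iface" in
    ib*)
        # 2. InfiniBand: no known way to check, so treat as up.
        return 0
        ;;
    *)
        # 3. Otherwise assume Ethernet and ask ethtool.
        ethtool "$_iface" 2>/dev/null | grep -q "Link detected: yes"
        ;;
    esac
}
```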
Ronnie Sahlberg
3ca3f4c771 Make sure to also check that interfaces used for NATGW are OK
and have a link.
If not, the node should become unhealthy.

(This used to be ctdb commit 03b5bbaae1b53830a4cd20d3079ab8f45ffce923)
2009-12-09 11:13:29 +11:00
Stefan Metzmacher
af170d1a8a events/50.samba: only use wbinfo --ping-dc if available
metze

(This used to be ctdb commit 7b73834ba3ac197cc8a3020c111f9bb2c567e70b)
2009-12-08 07:38:00 +11:00
Ronnie Sahlberg
cdabe16777 Use wbinfo --ping-dc instead of wbinfo -p since this is a more reliable way to determine if winbindd is in a useful state.
(This used to be ctdb commit 7c95e56ba871a4e0cb893a5cb5d821e7ff6e6dd6)
2009-12-07 18:27:46 +11:00
Martin Schwenke
b17bf38c64 Eventscripts: Fix syntax error in 00.ctdb.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 9ea261f791ab919eb1ce5b37073b4f1d30694bb8)
2009-12-01 18:08:57 +11:00
Martin Schwenke
50a26cf75e Eventscripts: Remove executable bit accidentally set on some scripts.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 4c6e68ae942c05224c5f8b683fbc2dc1adced8ee)
2009-12-01 17:54:45 +11:00
Martin Schwenke
db25ca69e5 Eventscript argument cleanups and introduction of ctdb_standard_event_handler.
The functions file no longer causes a side-effect by doing a shift.
It also doesn't set a convenience variable for $1.

All eventscripts now explicitly use "$1" in their case statement, as
does the initscript.  The absence of a shift means that the
takeip/releaseip events now explicitly reference $2-$4 rather than
$1-$3.

New function ctdb_standard_event_handler handles the status and
setstatus events, and exits for either of those events.  It is called
via a default case in each eventscript, replacing an explicit status
case where applicable.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 3d55408cbbb3bb71670b80f3dad5639ea0be5b5b)
2009-12-01 17:43:47 +11:00
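Commit db25ca69e5 above describes eventscripts that dispatch on an explicit "$1" and send everything else to a shared default handler. A simplified sketch of that shape: `example_event_handler` is a hypothetical stand-in for ctdb_standard_event_handler (whose real body is not reproduced here), and the status file location is an assumption.

```sh
#!/bin/sh
STATUS_FILE="${STATUS_FILE:-/tmp/ctdb-example.status}"

# Hypothetical stand-in for the shared handler: it handles status and
# setstatus, then exits, as the commit describes.
example_event_handler ()
{
    case "$1" in
    status)
        [ -r "$STATUS_FILE" ] && cat "$STATUS_FILE"
        exit 0
        ;;
    setstatus)
        shift
        echo "$*" > "$STATUS_FILE"
        exit 0
        ;;
    esac
}

# Body of a hypothetical eventscript: explicit "$1", no shift.
eventscript_main ()
{
    case "$1" in
    takeip)
        # takeip arguments arrive as $2-$4 (interface, IP, mask bits).
        echo "taking $3/$4 on $2"
        ;;
    *)
        example_event_handler "$@"
        ;;
    esac
}
```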
Martin Schwenke
ad431c3520 Event scripts: functions file now intercepts status and setstatus.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit a1f37fdc5217e57d2d643d77a811afca747685e0)
2009-11-27 15:57:33 +11:00
Martin Schwenke
ece15620c0 Event scripts: use $script_name rather than $service name for status.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 517e9d9b188b18dffc712a8fecddb41540d27b8d)
2009-11-25 16:42:14 +11:00
Martin Schwenke
ee10ea202b Event scripts: Respect CTDB_MANAGES_NFS and add function log_status_cat.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 5d97c07be13a8209a81dfc8f73e49371949e4dc3)
2009-11-25 16:34:49 +11:00
Martin Schwenke
1edcb89948 More eventscript cleanups. Initial smoke testing seems OK.
Apart from lots of cleanup work, this also fixes a bug where the share
checks couldn't cope with directory names containing spaces.
The previous commit also loaded the config incorrectly.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 3c93336ab92c2e4829ff4dc360045bfa6df21d50)
2009-11-25 16:30:47 +11:00
Martin Schwenke
a4a048b5cd Now vaguely tested initscript updates.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit f1e350f9edb74cc44b6c5be4c062fd93e98ba8c4)
2009-11-19 16:48:19 +11:00
Martin Schwenke
ee513c1ba2 More untested eventscript factorisation.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit ac655b0a65b32d809d47fec9821f7f31bb2fe2a7)
2009-11-19 15:00:17 +11:00
Martin Schwenke
73cb65bf1a Eventscripts: Untested factorisations and introduction of status event.
This is the first stage of an experimental change to eventscripts.
Ronnie and I did a few hours of factorisation of 40.vsftpd and applied
many of the changes to 41.httpd.  Other eventscripts were also
modified.

At this stage this is completely untested.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 364e70b763f0ccd7714d15723ad3ea4d7e2968a1)
2009-11-13 18:28:25 +11:00
Mathieu Parent
2a66b7dae4 Fix bashism in events.d/11.natgw
Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit 6ccb495d1110157c06596763c7e252f3182c251e)
2009-11-10 12:07:30 +01:00
Ronnie Sahlberg
3cbaf935af suggestion from metze:
use killtcp and kill both directions of the nfs connections.
We used to kill only one direction since the other direction was unkillable,
but recent kernels allow us to kill both.

(This used to be ctdb commit 8001ae580bcc28d45f6026b529d7ffc247cbba34)
2009-11-06 09:54:03 +11:00
Michael Adam
85a4d9a943 ctdb.sysconfig: add a comment section about CTDB_RUN_TIMEOUT_MONITOR
Michael

(This used to be ctdb commit b7dc1e0720991cc65353e07cf87608acea21ba27)
2009-11-05 11:13:53 +01:00
Michael Adam
95333e0ee7 Add a 99.timeout event script to trigger monitor timeouts.
This just sleeps for twice the value of EventScriptTimeout
in the monitor action. It is not run by default, but
can be activated by setting CTDB_RUN_TIMEOUT_MONITOR
in /etc/sysconfig/ctdb.

Michael

(This used to be ctdb commit 1a3ecdee85b82bb3234a92ae6bcdeb92238eb7ee)
2009-11-05 11:13:47 +01:00
Ronnie Sahlberg
0d3bff5fa6 From Rusty
It's much nicer for post-mortem debugging to have a body to examine.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

(This used to be ctdb commit 058e21d96c3c02759833fd5ddfe7b43e6a5f5740)
2009-11-05 15:57:46 +11:00
Ronnie Sahlberg
c915f2e5d5 add an extra test for the bond devices and check that there is an active slave.
This is to handle the case where all links do have a physical layer, but all slaves have been disabled using ifdown.

(This used to be ctdb commit bf50709630df000583f2b0ef0edc177c01d60eaf)
2009-11-05 12:12:06 +11:00
Ronnie Sahlberg
2501638e15 don't verify that winbindd is running properly at startup
(This used to be ctdb commit 9e1b99221c8f257129641f6eda2795537b7ce9de)
2009-11-04 07:50:26 +11:00
Ronnie Sahlberg
9e235af3a2 make the error logged when winbindd fails to access the dc during startup more scary and easier to spot in the logs
(This used to be ctdb commit 0c9b0466fd87b3f1e5d53f867c863217802ac43b)
2009-10-29 11:54:24 +11:00
Ronnie Sahlberg
023d09cd38 Revert "update the "uptime" command to indicate the "time since last" is the time since the last recovery OR failover."
This reverts commit 3b0d44497800a16400d05a30bdaf6e6c285d4b36.

(This used to be ctdb commit cb36bbb5418290e8e5b770d2d836285b15da2a6f)
2009-10-29 10:49:00 +11:00
Ronnie Sahlberg
279b7ca564 update the "uptime" command to indicate the "time since last" is the time since the last recovery OR failover.
(This used to be ctdb commit 3b0d44497800a16400d05a30bdaf6e6c285d4b36)
2009-10-29 10:37:10 +11:00
Ronnie Sahlberg
0588b5f9c5 add a check that winbind can actually talk to the DC during the startup event
and refuse to start up if it cannot

(This used to be ctdb commit 4037b6e73a819a8e2463dfe0959b42875e05e106)
2009-10-27 15:45:03 +11:00
Martin Schwenke
8b2101bc61 Merge commit 'origin/master'
(This used to be ctdb commit 61282d4a9be9e544aaa86f3cffc5b58e417f5ab1)
2009-10-21 21:48:15 +11:00
Ronnie Sahlberg
ff8363697d treat interfaces with the name ethX* as bond devices
(This used to be ctdb commit 3997d7e5471810e9a2f145ce2e795073dfc5eded)
2009-10-21 11:34:17 +11:00
Martin Schwenke
b77094e897 Merge commit 'origin/master'
(This used to be ctdb commit b3ae2b753261443dca317803752a9d61285a3270)
2009-10-19 16:46:45 +11:00
Ronnie Sahlberg
58780f4137 add a directory where multiple local scripts can be added to run when executing eventscripts
(This used to be ctdb commit 27d152a918680a59c7412aec7e1772f25b72d469)
2009-10-19 16:22:15 +11:00
Ronnie Sahlberg
cdc77af3ab wait a bit longer before shutting down when the reclock file is missing.
Print the filename of the missing file, and the output of 'df', when we turn unhealthy.

(This used to be ctdb commit 97ded8a629ec762f71bad28515e4fbc810790b1d)
2009-10-19 15:33:20 +11:00
Ronnie Sahlberg
1e91fd0a25 Revert "dont shutdown a node when the reclock file is temporarily unavailable."
This reverts commit f5e9f3007c10a937158bc8cdfabf33c984cf9c50.

(This used to be ctdb commit 02f68dc60e0b7bf26d631850b12834d5c71a88f2)
2009-10-19 15:30:44 +11:00
Martin Schwenke
b20d680070 Merge commit 'origin/master'
(This used to be ctdb commit 5ad283458e59ea8232e01f34be007901c10c8a2e)
2009-10-16 16:36:48 +11:00
Martin Schwenke
0bff3b4289 initscript: when stopping on Red Hat use the success/failure functions.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit bf5402b41282da94fee1ab3e4546ec089ff12f37)
2009-10-16 16:35:56 +11:00
Ronnie Sahlberg
d258616984 dont shutdown a node when the reclock file is temporarily unavailable.
Leave the node as UNHEALTHY; this stops clients from accessing the node until
the reclock file can be accessed again.

(This used to be ctdb commit f5e9f3007c10a937158bc8cdfabf33c984cf9c50)
2009-10-15 13:19:10 +11:00
Ronnie Sahlberg
30d9fbfbec move the logging of the warning "No reclock file used" to the startup case so we only print this warning on "service ctdb start" and not for "service ctdb *"
(This used to be ctdb commit eb854f65f978f24583e221138eb4f9b917b89285)
2009-10-14 12:12:04 +11:00
Ronnie Sahlberg
070f781e39 always create the nfs state directories during the monitor event.
this allows us to configure and enable nfs at runtime without having to restart ctdbd

(This used to be ctdb commit f6e39d35713475defaa08a623e194f3f2f8f7d53)
2009-10-14 09:15:24 +11:00
Ronnie Sahlberg
df0dba1862 Merge commit 'martins/master'
(This used to be ctdb commit 5f14874c5c705dd637f88a77f30c930fea1201d2)
2009-10-12 16:51:36 +11:00
Martin Schwenke
ab98c1b0f1 Clean up ctdb_check_directories* eventscript functions.
There are 2 problems with this code:

* The loop in ctdb_check_directories_probe() breaks on filenames
  containing whitespace.

  The fix to protect them is to pass "$@" to this function and have it
  operate on "$@".

  Note that there's still a problem with whitespace in filenames in
  the 50.samba eventscript.  To fix this ctdb_check_directories_probe
  should read the filenames from stdin.  Another time...

* The check for '%' in filenames in ctdb_check_directories_probe()
  ends up involving several forks.  On a modern machine this can cost
  a couple of minutes when checking a large number of directories.

  The fix is to use a case statement.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit eb1fecaef9aa5cb85dff7d4f7af8a9878deabed8)
2009-10-12 16:32:49 +11:00
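The two fixes in commit ab98c1b0f1 can be illustrated with a small stand-alone function. `check_dirs_sketch` is hypothetical and is not the real ctdb_check_directories_probe(): it iterates over "$@" so each name stays a single word even when it contains spaces, and it detects '%' with a case pattern, which runs inside the shell without forking. Skipping names that contain '%' (on the assumption that they hold Samba substitution variables and cannot be checked literally) and the error message are assumptions made for this sketch.

```shell
# Hypothetical sketch, not the actual ctdb_check_directories_probe().
# Returns 0 if every checkable directory exists, 1 otherwise.
check_dirs_sketch ()
{
    # "$@" keeps each argument intact, including embedded whitespace.
    for _d in "$@" ; do
        # case matching needs no external process, unlike piping each
        # name through grep or similar.
        case "$_d" in
        *%*)
            # Assumption: names with '%' contain substitution variables.
            continue
            ;;
        esac
        if [ ! -d "$_d" ] ; then
            echo "ERROR: directory \"$_d\" not available" >&2
            return 1
        fi
    done
    return 0
}
```

A caller passes the names as separate, quoted arguments, for example `check_dirs_sketch "/srv/share one" "/srv/share two"`.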
Martin Schwenke
d8e2ddc5a8 40.vsftpd: reset the fail counter in the "recovered" event.
Each recovery that involves IP reassignments results in a restart of
vsftpd in the "recovered" event.  Currently, we can have several
recoveries in quick succession and the "monitor" event following each
can fail because vsftpd isn't ready yet.  This results in cumulative
failures, so the node is marked unhealthy, even though vsftpd has
never had a proper opportunity to become ready.

This resets the fail count after each recovery.

While we're here, also move the delete of the restart flag file into
the body of the conditional.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 318abeb4b913a8d846e7eaf4cf5c2a67b61ce974)
2009-10-12 16:17:37 +11:00
Ronnie Sahlberg
42193cbff8 update natgw eventscript to allow you to force it to update and/or to remove the configuration at runtime
(This used to be ctdb commit deed52b7e4aac94b4d11a8d89d08739e1dfd4ed7)
2009-10-06 16:09:24 +11:00
Ronnie Sahlberg
e90dd8015f add a new notification that triggers when ctdb has started
(This used to be ctdb commit b1fe04f2e9447f762a0b805763deb29296585ff8)
2009-10-01 14:05:30 +10:00
Martin Schwenke
b27600253d Minor fixes to 01.reclock eventscript.
test -z really needs its argument to be quoted.  Simplified a status
test.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit fe26da7780545b1ecc0a7da5bc1cf8beaeea94cc)
2009-09-30 21:21:56 +10:00
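To illustrate why `test -z` wants a quoted argument (commit b27600253d): unquoted, a value containing whitespace is split into several words, so `test -z $v` with v="a b" becomes `test -z a b`, which is a syntax error rather than a false result. The helper name below is hypothetical.

```shell
# Hypothetical helper: succeeds only when its argument is the empty string.
# Quoting "$1" guarantees test sees exactly one operand after -z, whatever
# the value contains. Unquoted, "a b" would expand to two operands.
is_empty_sketch ()
{
    test -z "$1"
}
```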
Martin Schwenke
78b7043411 40.vsftpd monitor event only fails after 2 failures to connect to port 21.
Change the monitor event in 40.vsftpd so it only fails if there are 2
successive failures connecting to port 21.  This reduces the
likelihood of unhealthy nodes due to vsftpd being restarted for
reconfiguration due to node failover or system reconfiguration.

New eventscript functions ctdb_counter_init, ctdb_counter_incr,
ctdb_counter_limit.  These are used to count arbitrary things in
eventscripts, depending on the eventscript name and a tag that is
passed, and determine if a specified limit has been hit.  They're good
for counting failures!

These functions are used in 40.vsftpd and also in 01.reclock - the
latter used to do the counting without these functions.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit cfe63636a163730ae9ad3554b78519b3c07d8896)
2009-09-30 21:05:16 +10:00
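The counting idea behind the helpers introduced in commit 78b7043411 can be sketched as below. This is an illustration under stated assumptions, not the ctdb functions file: the names counter_init, counter_incr and counter_limit, the COUNTER_DIR location, and keeping a decimal count in one file per script name and tag are all hypothetical. A monitor event would reset the counter on success, and on failure increment it and fail only once the limit (2 for vsftpd) is reached.

```shell
# Hypothetical failure-counting helpers. Variables are prefixed with "_"
# because POSIX sh has no "local".
COUNTER_DIR="${COUNTER_DIR:-/tmp/ctdb-counter-sketch}"

# One counter file per (script name, tag) pair, e.g. 40.vsftpd-port21.
_counter_file ()
{
    mkdir -p "$COUNTER_DIR"
    echo "$COUNTER_DIR/${script_name:-unknown}-${1:-default}"
}

# Reset the counter for tag $1 to zero.
counter_init ()
{
    echo 0 > "$(_counter_file "$1")"
}

# Add one to the counter for tag $1 (a missing file counts as zero).
counter_incr ()
{
    _f=$(_counter_file "$1")
    _n=$(cat "$_f" 2>/dev/null || echo 0)
    echo $((_n + 1)) > "$_f"
}

# Succeed once the counter for tag $2 has reached the limit $1.
counter_limit ()
{
    _n=$(cat "$(_counter_file "$2")" 2>/dev/null || echo 0)
    [ "$_n" -ge "$1" ]
}
```

In a monitor event this might read: on a successful connect, `counter_init port21`; otherwise `counter_incr port21` and, if `counter_limit 2 port21` succeeds, report the failure.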
Ronnie Sahlberg
c971d934a9 From Wolfgang Mueller-Friedt
Remove the explicit vacuum/repack commands from the 00.ctdb eventscript
and implement this in the ctdb daemon.

Combine vacuuming and repacking into one
cheap read traverse to enumerate all candidate records
and one write traverse that both repacks the database and deletes, locally, those records for which we are the lmaster and which have already been deleted remotely.

This code also adds initial autotuning heuristics for the vacuum intervals and how many records to delete in each iteration.

minor style changes made by ronnie s

(This used to be ctdb commit 95a3ee551241aa164967991fe5efe078e1714bde)
2009-09-29 13:27:19 +10:00
Ronnie Sahlberg
9bac6f2e2c change the reclock fail count to 19 monitor intervals before we shut down ctdbd
(This used to be ctdb commit 6e35feb06ec036b9036c5d1cdd94f7cef140d8a6)
2009-09-28 14:12:59 +10:00
Ronnie Sahlberg
4f0f2cc196 add a new eventscript 01.reclock
If the reclock file has been set, then this script will test that the
reclock file can actually be accessed.
If the file does not exist, or if the attempts to stat the file hang,
the node will be marked unhealthy after the third failed monitoring event,
and after the tenth failure, ctdb itself will shut down.

(This used to be ctdb commit 2cb04747887674def299e574fccb827c1c3194e7)
2009-09-28 14:06:40 +10:00
Ronnie Sahlberg
4a05b2dfd8 try restarting statd indefinitely, not just once
(This used to be ctdb commit 03b0d913ae009284e2fadda1b9246ec77d19db29)
2009-09-15 19:33:53 +10:00
Ronnie Sahlberg
029fd6b00f Revert "try to restart statd everytime it fails, not just the first time"
This reverts commit 4f7b39a4871af28df1c4545ec37db179fa47a7da.

(This used to be ctdb commit db7b96304e4725f29b12398b7582e385daed63ed)
2009-09-15 19:33:35 +10:00
Ronnie Sahlberg
59cacded72 try to restart statd everytime it fails, not just the first time
(This used to be ctdb commit 4f7b39a4871af28df1c4545ec37db179fa47a7da)
2009-09-15 13:35:58 +10:00
Michael Adam
e80a7001ff Introduce sysconfig variable CTDB_SYSLOG=yes/no (default "no").
This allows for controlling start of ctdbd with or without the option "--syslog"
from the sysconfig/ctdb file.

Michael

(This used to be ctdb commit 7bf9fff9139a4270496bddb97f9433bab87824bf)
2009-09-09 09:52:14 +02:00
Michael Adam
d8f9dad26b Rename the CTDB_INIT_STYLE "ubuntu" to "debian" - this is where it comes from.
Michael

(This used to be ctdb commit b060911683d8ac201806d35a505867fe3ba9519f)
2009-09-09 09:52:13 +02:00
Mathieu Parent
70294f3136 Fix bashism in nfstickle event script.
Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit f7a326b560b12f8b46c01d98cdd460e5510c67fb)
2009-09-09 09:52:13 +02:00
Mathieu Parent
e12faf771c Fix bashisms in samba event script.
Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit 0310a6b17d6167c46482a07c6cd96bcabda6ffbc)
2009-09-09 09:52:13 +02:00
Mathieu Parent
28319e4760 Fix bashisms in multipathd event script.
Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit 13b81b6c8e01aa52a31756ecffa797a4761115db)
2009-09-09 09:52:13 +02:00
Mathieu Parent
e160925f86 Fix bashism in natgw eventscript.
Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit 4fad47c1af8503385b090be281ffbd284021279c)
2009-09-09 09:52:12 +02:00
Ronnie Sahlberg
001c0f0c7e make it possible to have ctdb manage (start/stop/monitor) winbind without having samba
(This used to be ctdb commit 77574b7d7fe11c8e73957a80845481f3b2a64219)
2009-09-04 02:59:24 +10:00
Ronnie Sahlberg
d5329b13e9 overwrite the state file, don't append to it.
Don't log errors when trying to delete a nonexistent state file

this eliminates some annoying log entries in the ctdb log

(This used to be ctdb commit 7a95257a5ec19f232f661bc7f797051bf08ab776)
2009-09-02 04:39:17 +10:00
Ronnie Sahlberg
f3fd4bb659 redirect stderr to /dev/null since the rule might not exist when we try to unconditionally delete it
(This used to be ctdb commit e1d709f32196e19d4041ee2958e143791762e08f)
2009-09-02 03:12:27 +10:00
Michael Adam
34d2bb1f6c set broadcast addresses in the takeip event.
Michael

(This used to be ctdb commit e26d9d32e68e7db1cf4f96c47c0126e9e0b213be)
2009-08-28 06:50:53 +10:00
Ronnie Sahlberg
e893393ef2 remove a check for the reclock file we don't need
(This used to be ctdb commit 54c047c48902a15e5d2925bfa86e012a11188796)
2009-08-28 05:19:44 +10:00
Wolfgang Mueller-Friedt
345df3c714 remove repack from eventscript
Signed-off-by: Wolfgang Mueller-Friedt <wolfmuel@de.ibm.com>

(This used to be ctdb commit dd334caa98882fc59765b7c84eca8e86de785487)
2009-07-29 13:29:38 +10:00
Ronnie Sahlberg
4d5823ba7c update the natgw eventscript to set the NATGW capability when this feature is used
This does not modify any behaviour of the daemon itself other than showing this flag as ON in the ctdb getcapabilities output

(This used to be ctdb commit fb337c151bd16ad5ad0c99431224451979d8c651)
2009-07-28 10:00:33 +10:00
Ronnie Sahlberg
6db0f01532 document the new stopped event
(This used to be ctdb commit 70603d9a79c80379bf65d9d703c399a65c109c52)
2009-07-17 12:30:05 +10:00
Ronnie Sahlberg
e5e9fc48b1 create a new event : stopped.
This event is called when a node is stopped and is used by eventscripts that need to do certain cleanup and removal of configuration or ip addresses or routing ...

Note that a STOPPED node is considered "inactive" and as such will not be running the "recovered" event when the rest of the cluster has recovered.

(This used to be ctdb commit 65e9309564611bf937ded3c74a79abff895d7c59)
2009-07-17 12:26:16 +10:00
Ronnie Sahlberg
9c6aa4e420 update the eventscript to ensure that stopped nodes cannot become the natgw master
Also verify that we actually do have a natgw master available if this is configured, and make the node unhealthy if not.

(This used to be ctdb commit 7f273ee769d671d8c8be87c9187302fb77e814f3)
2009-07-17 09:45:05 +10:00
Ronnie Sahlberg
66c8d4fb3d make it possible to start the daemon in STOPPED mode
(This used to be ctdb commit 866aa995dc029db6e510060e9e95a8ca149094ac)
2009-07-09 11:57:20 +10:00
Ronnie Sahlberg
2708b305ca Initscript cleanups.
* Move building of CTDB_OPTIONS to new function build_ctdb_options()
  and have it use a helper function for readability.

* New functions check_persistent_databases() and set_ctdb_variables().

* Remove valgrind-specific stop code, since the general pkill should
  kill ctdbd when running under valgrind.

* Remove some bash-isms (e.g. >& /dev/null) since the script is /bin/sh.

* Make indentation consistent.

* Minor clean-ups.

Signed-off-by: Martin Schwenke <martin@meltin.net>

Conflicts:

	config/ctdb.init

(This used to be ctdb commit bebb21f18e3026cb78a306104e92ee005d1077b2)
2009-07-07 13:45:19 +10:00
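One of the bash-isms removed in commit 2708b305ca is `>& /dev/null`, which in bash discards both stdout and stderr but is rejected by a strict POSIX /bin/sh such as dash. A minimal sketch of the portable spelling, wrapped in a hypothetical helper `run_quietly_sketch`:

```shell
# Hypothetical helper: run a command with stdout and stderr discarded,
# preserving its exit status.
#   bash-only:  "$@" >& /dev/null
#   POSIX sh:   send stdout to /dev/null, then point stderr at the same place.
run_quietly_sketch ()
{
    "$@" > /dev/null 2>&1
}
```

The order of the redirections matters: `2>&1 > /dev/null` would leave stderr attached to the original stdout.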
Ronnie Sahlberg
3c1351eabd update the sysconfig to show setting the debuglevel using a string literal instead of a numeric value
(This used to be ctdb commit 964530d70ba2ca949380d30a0e3d622963a6206c)
2009-07-01 09:23:52 +10:00
Ronnie Sahlberg
4a1a3652fe Document that you can run ctdb without a reclock file in the sysconfig file
(This used to be ctdb commit 33895d217ee096b356f02b5292ba27a840c4f559)
2009-06-25 11:59:21 +10:00
Ronnie Sahlberg
77ef745394 Allow setting the recovery lock file as "", which means that we do not use a file and that we implicitly also disable the recovery lock checking.
Update the init script to allow starting without a reclock file.

(This used to be ctdb commit 07855ff5eba71e7d607d52e234a42553d9b93605)
2009-06-25 11:50:45 +10:00
Ronnie Sahlberg
d3dde37934 rename 99.routing to 11.routing so the eventscript is processed before
NFS and LVS

(This used to be ctdb commit 16ec9ca56a9f5b88d7a5ed4f89a28a53f5c9c081)
2009-06-23 11:01:04 +10:00
Martin Schwenke
566314ca97 Fix minor problem in previous initscript commit.
The valgrind start case should not use daemon, since this is specific
to Red Hat.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 867f57d166395c92949e480ca725249b0ca8950b)
2009-06-19 18:08:54 +10:00
Martin Schwenke
3dad79b88e Initscript fixes, mostly for "stop" action.
Use a local variable $ctdbd so that we always run ctdbd from the
same place and so that we know what to kill.  This variable respects
the $CTDBD environment variable, which may be used to specify an
alternative location for the daemon.

In the important cases use "pkill -0 -f" to check if ctdbd is
running.  Also, remove the special case for killing ctdbd when running
under valgrind.  The regular case will handle this just fine.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 070305adfe636c2580776e6bf24bb8be06622b86)
2009-06-19 18:08:31 +10:00
Ronnie Sahlberg
0ddf79a3bc increase the timeout before we shut down when the recovery daemon is hung
(This used to be ctdb commit facddcacb4a961cddb117818fa38a3e97770b2fa)
2009-06-18 09:20:18 +10:00
Ronnie Sahlberg
34fbfb8b89 rename 99.routing to 11.routing
so it is executed before any of the service scripts

(This used to be ctdb commit 1205673499618f90f413fad9e96a88733b5ce359)
2009-06-18 09:11:46 +10:00
Ronnie Sahlberg
caf0e863a4 remove the obsolete ipmux component.
It was replaced by LVS a long time ago.

(This used to be ctdb commit dca41ec04788922ce5f4c52d346872b3e35f8cbb)
2009-05-25 12:33:52 +10:00
Ronnie Sahlberg
e999ade7bb From Flavio Carmo Junior <carmo.flavio@gmail.com>
Add an eventscript to manage ClamAV

(This used to be ctdb commit bb4ef6c4d2bc3578bdf4432517e98f85ec94e3b6)
2009-05-25 12:10:29 +10:00
Ronnie Sahlberg
934d8a6b5f From : Flavio Carmo Junior <carmo.flavio@gmail.com>
Add a helper function that checks whether a unix domain socket exists
and there is a daemon LISTENING to it, similar to the existing function
to check for a daemon LISTENING to a tcp/ip socket.

(This used to be ctdb commit 025a836ab3be3c078fccd8c10b10dfffbfdd94d0)
2009-05-19 08:47:19 +10:00
Ronnie Sahlberg
be7137faa9 use scope host when adding the interface to loopback so we don't respond to ARPs for this IP
(This used to be ctdb commit fcd6226a6c00cf657532aa76804bfe029df21ba6)
2009-05-14 08:55:05 +10:00
Ronnie Sahlberg
016b37f1e2 change the prefix NATGW_ to CTDB_NATGW_
(This used to be ctdb commit b7ed7fd4a5fbd344d41caa1afa100b1f24506173)
2009-05-14 08:12:48 +10:00
Ronnie Sahlberg
12400298c1 assign the natgw address to loopback and not the private network so that natgw will still work even when public and private networks are one and the same
(This used to be ctdb commit 2bd796b8a098074502fe20e3ab69098b2109c133)
2009-05-12 18:42:13 +10:00
Martin Schwenke
86ad711c37 41.httpd event script workaround for RHEL5-ism.
RHEL5 can SIGKILL httpd when stopping it, causing it to leak
semaphores.  This means that eventually a node runs out of semaphores
and httpd can't be started.  So, before we attempt to start httpd we
clean up any semaphores owned by apache.  We also try to restart httpd
in the monitor event if httpd has gone away.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 2d3fbbbb63f443686f9fec42c0bc2058d115806e)
2009-05-12 08:53:32 +10:00
Andrew Tridgell
4f4f03f84a use less intrusive smbstatus call in periodic connections cleanup
(This used to be ctdb commit a152fdc79e3360049aee66c3e628237a91df181f)
2009-05-06 08:20:55 +10:00
Ronnie Sahlberg
2e3542b5e5 don't unconditionally kill/restart ctdb when given "service ctdb start"; only start ctdb if it is not already running, and print an error message otherwise
(This used to be ctdb commit 94343309992929a592348c936e09a7b4f8b512c1)
2009-04-30 17:38:30 +10:00
Andrew Tridgell
37e2417c59 change shutdown level for ctdb to be 01
We want ctdb to shut down first, as it manages many other
services. With the old level of 32 the NFS service would shut down
first, and that would trigger ctdb to do a recovery. Then ctdb itself
would be shut down a few seconds later, which causes a lot of error
messages in the other nodes' logs

(This used to be ctdb commit 2f952af1a12e81a652ec9a4794db96f9593f2676)
2009-04-23 11:35:42 +10:00
Ronnie Sahlberg
4be3e86405 create a function "remote_ip" which can be used from scripts to remove a single ip from an interface.
use this function from the natgw eventscript

(This used to be ctdb commit feab5f30b2d6cebf4dd28abc5a81f93424a4c852)
2009-04-08 12:49:28 +10:00
Ronnie Sahlberg
53d6626503 install a default /etc/ctdb/notify.sh script as example on how to use
snmptrap/email to notify that a node has changed health status

(This used to be ctdb commit ee52c0866e2b26c396fe60946159c559d47199eb)
2009-03-31 14:38:52 +11:00
Ronnie Sahlberg
ad40ee25f9 add a mechanism where the ctdb daemon will run a user-controlled script when the node status changes to/from UNHEALTHY state.
This would allow a sysadmin to set up ctdb to send an email/snmptrap/... when the status of the node changes.

(This used to be ctdb commit ce534a83a05dbd40238e4eee0669d60ff396f935)
2009-03-31 14:23:31 +11:00
Ronnie Sahlberg
b9e6e15cd4 we must also try to set the routes when we release an IP. During the release in 10.interfaces there can be a window where the kernel removes all addresses (before we manually add them back in 10.interfaces), and during that window the kernel may also delete all routes since no gateways are reachable through this interface anymore.
(This used to be ctdb commit 34633223a46caaa079da233663f9c6dcc1803f87)
2009-03-31 11:33:28 +11:00
Ronnie Sahlberg
d7ff332896 update how the NATGW configuration works.
allow the cluster to be partitioned into multiple disjoint natgw subsets

(This used to be ctdb commit 1046885cd22b5001e0251de2e536b5f6793459be)
2009-03-25 13:37:57 +11:00
Ronnie Sahlberg
689f76f0b0 Merge branch 'obnox'
(This used to be ctdb commit 972036a5d510fb9b399f1ee34a8861dee4221267)
2009-03-24 17:49:55 +11:00
Ronnie Sahlberg
36ec47d610 create a variant of kill_tcp_connections that only kills off the local side of a connection
(This used to be ctdb commit dc2f28f7c988364b5d45f3048be4db3e5ff113b3)
2009-03-24 14:05:31 +11:00
Ronnie Sahlberg
686adea3fe set --single-public-ip when lvs is used
(This used to be ctdb commit 292fff6eace39141591871e12f9a64e3441237be)
2009-03-24 13:51:32 +11:00
Michael Adam
a83ed1d743 Merge commit 'ctdb-ronnie/master'
(This used to be ctdb commit 39a972b0d6d0d70282c25c54a124b67431467e77)
2009-03-23 10:07:44 +01:00
Ronnie Sahlberg
293a3f1158 update the natgw eventscript and documentation
(This used to be ctdb commit 95d8ddbc2dd0b159e8df003502c3c336668d2c41)
2009-03-19 10:17:44 +11:00
root
9bf792d704 redo how the natgw is done. Just use a default route with a high metric instead of fancy policy routing
(This used to be ctdb commit f03bd2b3d906dac9fb876dca54535d22e9cf1b9e)
2009-03-18 19:19:49 +11:00
root
f037e881a2 change the NATGW_ example in sysconfig to make it more realistic
(This used to be ctdb commit 742283a8f8da7c614ee3a30d48c430e3a3bceeb9)
2009-03-18 09:33:58 +11:00
root
32391ec844 NAT-GW updates. Describe the functionality in the sysconfig file
(This used to be ctdb commit 4c598ab6f8e9b826d437b9ab869c4490f7c4faba)
2009-03-17 07:35:53 +11:00
Michael Adam
fd71213717 ctdb.sysconfig: add CTDB_MANAGES_HTTPD comment section
Michael

(This used to be ctdb commit ccaf9ebe062127124cf23e69dcd2ac2edda40020)
2009-03-10 00:21:04 +01:00
Michael Adam
eac9425820 events.d/50.samba: allow CTDB_SERVICE_{SMB,NMB,WINBIND} to be overridden from sysconfig
Michael

(This used to be ctdb commit b1aba6651143ae1c85b24d78b67c760795ff5bff)
2009-03-09 00:20:30 +01:00
Michael Adam
78294c4f3e ctdb.sysconfig: add CTDB_INIT_STYLE with explanation
Michael

(This used to be ctdb commit 8518c9e0ffec44677d45f60e63936a831d62ab20)
2009-03-09 00:08:26 +01:00
root
798553a9dc Add a variable CTDB_NFS_SKIP_SHARE_CHECK to sysconfig that can disable the check that all shares are accessible.
This check can take a very long time when there are very many shares, and in that case it is better implemented as a separate cron job than in a ctdb eventscript

(This used to be ctdb commit 432604a1435cd2b5a7178fb5aedf1d4b61bffeb9)
2009-03-04 07:21:55 +11:00
root
c72c15c19a make it possible to disable checking all samba shares.
This is a time-consuming process and might not be feasible if there are many thousands of shares

(This used to be ctdb commit 051ae5f3c13892b860818eac803d348f09845dc6)
2009-02-20 10:58:34 +11:00
Michael Adam
d6c5f65572 Merge commit 'ctdb-ronnie/master'
(This used to be ctdb commit e1c90b12290c682c2cba90e9afa3a09be014e20e)
2009-02-10 00:28:08 +01:00
root
e7de72a1ac use netstat to check first and only fall back to netcat if netstat is unavailable
(This used to be ctdb commit dfb16ce9ed65048d30109851737a9075d071ecdb)
2009-02-05 14:44:46 +11:00
Michael Adam
0405ec036d events 41.httpd: support suse and ubuntu/debian systems for managing apache
The httpd service on suse and ubuntu/debian systems is usually
called "apache2" nowadays.

Note: There are older installs with Apache 1.3 out there, in which case
the service is called "apache". An extra check for these installs could
be useful as a sequel to this patch...

Michael

(This used to be ctdb commit b9e50e3416fecef6a881be3f1b91be977299293f)
2009-02-04 00:42:33 +01:00
Michael Adam
62f27d0cb3 events.d/41.httpd: fix a typo in the fix of the comment typo
This is embarrassing...

Michael

(This used to be ctdb commit dbd90f6210617b23d5695c4c868392363c75d23b)
2009-02-04 00:01:15 +01:00
Michael Adam
77bd2b6c91 ctdb_check_tcp_ports: correctly detect listeners on ipv6 :::<port> w/out netcat
The netstat test only grepped for the ipv4 wildcard address.
Now the ipv6 wildcard listener is correctly detected as well.

Michael

(This used to be ctdb commit 78e7928797e239e71f96eb001460a0dbf943e18f)
2009-01-30 22:45:52 +01:00
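The detection described in commit 77bd2b6c91 can be approximated as follows. This is a sketch, not the ctdb_check_tcp_ports code: the function name is hypothetical, and reading `netstat -a -t -n` style output from stdin (rather than running netstat itself) is an assumption made to keep it testable. A LISTEN line counts if its local address is either the IPv4 wildcard `0.0.0.0:PORT` or the IPv6 wildcard `:::PORT`.

```shell
# Hypothetical sketch: succeed if stdin (netstat -a -t -n style output)
# shows a listener on the IPv4 or IPv6 wildcard address for port $1.
port_has_wildcard_listener_sketch ()
{
    _port="$1"
    # "::" followed by ":PORT" matches the IPv6 form ":::PORT".
    # The trailing [[:space:]] stops port 445 from matching 4450.
    grep -E "[[:space:]](0\.0\.0\.0|::):${_port}[[:space:]].*LISTEN" > /dev/null
}
```

Usage would be along the lines of `netstat -a -t -n | port_has_wildcard_listener_sketch 445`.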
Michael Adam
bbf36eebb9 ctdb_check_tcp_ports: fail the check if neither netstat nor netcat/nc is found
Michael

(This used to be ctdb commit 25d04bbe9528fafc68751f7beb22daeee3163d34)
2009-01-30 22:45:52 +01:00
Michael Adam
ba6612ec12 ctdb_check_tcp_ports: cope with multiple locations of netcat or nc
This fixes tcp port monitor events on systems where netcat or nc
is not found in /usr/bin/ (Debian, for instance).

The patch also separates the process of finding the binaries and
calling them, moving the detection outside of the loop over the
ports list.

Michael

(This used to be ctdb commit 3adf100e7f0c04aaf2da9ae4c6984cdb708c3b57)
2009-01-30 22:45:39 +01:00
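The change in commit ba6612ec12 (finding netcat or nc once, outside the loop over ports, instead of assuming /usr/bin/) might look roughly like this. `find_netcat_sketch` is a hypothetical name; it relies on the POSIX `command -v` builtin to search $PATH and prints the first match. A caller would run it once before iterating over the ports and treat a non-zero return as "no netcat available".

```shell
# Hypothetical helper: print the path of the first netcat-style binary
# found in $PATH, trying "netcat" before "nc". Returns 1 if neither exists.
find_netcat_sketch ()
{
    for _n in netcat nc ; do
        if command -v "$_n" > /dev/null 2>&1 ; then
            command -v "$_n"
            return 0
        fi
    done
    return 1
}
```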