samba-mirror

mirror of https://github.com/samba-team/samba.git synced 2024-12-27 03:21:53 +03:00

Martin Schwenke 1ea3616dcc Eventscripts: improvements to 41.httpd. * Reduce the failure counts so that restart attempts happen sooner. * Use service_start() and service_stop() for the restart. ctdb_service_start() resets the failure count, which isn't very useful in this context. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 01776b9f29af9ad5c8534649ece1bd100e450434)		2011-08-11 10:46:56 +10:00
00.ctdb	Eventscripts: fix dangerous rm -rf in 00.ctdb init event.	2011-08-09 16:48:57 +10:00
01.reclock	server: add "init" event	2010-01-20 09:44:36 +01:00
10.interface	Eventscripts - Remove local variable usage in 10.interfaces.	2011-08-08 15:44:30 +10:00
11.natgw	NATGW: dont set arp_ignore in 11.natgw anymore since we no longer	2011-04-06 11:33:11 +10:00
11.routing	When using multiple VLANs, some funky stuff can sometimes happen when	2011-05-12 12:06:45 +10:00
13.per_ip_routing	Evenscripts: update 13.per_ip_routing to use ctdb_setup_service_state_dir.	2011-08-09 17:35:37 +10:00
20.multipathd	Evenscripts: update 20.multipathd to use ctdb_setup_service_state_dir.	2011-08-09 17:28:09 +10:00
31.clamd	Eventscript function: change service_start into a function.	2011-08-11 10:46:20 +10:00
40.fs_use	Add new eventscript 40.fs_use that can be used to monitor file system use and flag a node unhealthy when they become full	2011-08-11 10:04:40 +10:00
40.vsftpd	Evenscripts: update 40.vsftpd to use ctdb_service_check_reconfigure.	2011-08-11 10:46:20 +10:00
41.httpd	Eventscripts: improvements to 41.httpd.	2011-08-11 10:46:56 +10:00
50.samba	Eventscripts: make 50.samba use $service_state_dir.	2011-08-11 10:46:56 +10:00
60.ganesha	Eventscripts: remove unnecessary absolute paths from external commands.	2011-08-03 17:19:15 +10:00
60.nfs	Evenscripts: update 60.nfs to use ctdb_service_check_reconfigure.	2011-08-11 10:46:56 +10:00
62.cnfs	Evenscripts: update 61.cnfs to use ctdb_setup_service_state_dir.	2011-08-10 12:27:41 +10:00
70.iscsi	add an explicit _is_managed_service to iscsi eventscript	2010-11-18 14:15:56 +11:00
91.lvs	make the persistent even longer for lvs to make people even happier	2011-08-11 09:12:38 +10:00
99.timeout	Eventscript argument cleanups and introduction of ctdb_standard_event_handler.	2009-12-01 17:43:47 +11:00
README	Correction of spelling errors	2011-03-23 00:35:23 +01:00

README

This directory is where you should put any local or application
specific event scripts for ctdb to call.

All event scripts start with the prefix 'NN.' where N is a digit.
The event scripts are run in sequence based on NN.
Thus 10.interface will be run before 60.nfs.

Each NN must be unique; duplicates will cause undefined behaviour.
For example, having both 10.interface and 10.otherstuff is not allowed.


As a special case, any eventscript whose name ends with a '~' character will be
ignored, since this is a common suffix that some editors append to
backup copies of a file.
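
The naming rules above can be illustrated with a small shell sketch. It is
not how ctdb itself selects scripts; list_eventscripts is a hypothetical
helper that prints the scripts in a directory in NN order and skips names
ending in '~'. It relies on the shell sorting glob matches, which orders
two-digit prefixes correctly.

```shell
#!/bin/sh
# Sketch only: print the event scripts in a directory in the order they
# would run, skipping editor backup files that end in '~'.
# list_eventscripts is a hypothetical helper, not part of ctdb.
list_eventscripts() {
    dir=$1
    # Glob matches are returned sorted, so 10.* comes before 60.*
    for path in "$dir"/[0-9][0-9].*; do
        [ -f "$path" ] || continue      # also covers an unmatched glob
        case "$path" in
            *~) continue ;;             # e.g. 60.nfs~ left by an editor
        esac
        basename "$path"
    done
}
```

Calling list_eventscripts with the directory that holds the scripts prints
one script name per line, in run order.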


The eventscripts are called with a varying number of arguments.
The first argument is the "event" and the rest of the arguments depend
on which event was triggered.
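
A minimal sketch of how a script might dispatch on its first argument is
shown here. The function name handle_event and the echo placeholders are
hypothetical; a real script would start, stop or check its service instead.
The takeip and releaseip branch assumes the three additional arguments
described later in this file.

```shell
#!/bin/sh
# Sketch of an event script skeleton. handle_event and the echo
# placeholders are hypothetical, not ctdb functions.
handle_event() {
    [ $# -gt 0 ] || return 1    # the event name is mandatory
    event=$1
    shift                       # "$@" now holds the event-specific arguments
    case "$event" in
        startup)  echo "starting service" ;;
        shutdown) echo "stopping service" ;;
        monitor)  echo "checking service" ;;
        takeip|releaseip)
            # three additional arguments: interface, ipaddress, netmask
            echo "$event: $2 on $1 (netmask $3)" ;;
        *)        : ;;          # events this script does not care about
    esac
}
```

A script built this way could end with the line handle_event "$@" so that
the status of the handler becomes the exit status of the script.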

All of the events except the 'shutdown' and 'startrecovery' events will be
called with the ctdb daemon in NORMAL mode (i.e. not in recovery).

The events currently implemented are:
init
	This event does not take any additional arguments.
	This event is only invoked once, when ctdb is starting up.
	This event is used to do some cleanup work from earlier runs
	and prepare the basic setup.
	At this stage 'ctdb' commands won't work.

	Example: 00.ctdb cleans up $CTDB_VARDIR/state

setup
	This event does not take any additional arguments.
	This event is only invoked once, when ctdb is starting up.
	This event is used to do some cleanup work from earlier runs
	and prepare the basic setup.

	Example: 00.ctdb cleans up $CTDB_VARDIR/state

startup
	This event does not take any additional arguments.
	This event is only invoked once, when ctdb has finished
	the initial recoveries. This event is used to wait for
	the service to start and for all resources for the service
	to become available.

	This is used to prevent ctdb from starting up and advertising its
	services until all dependent services have become available.

	All services that are managed by ctdb should implement this
	event and use it to start the service.

	Example: 50.samba uses this event to start the samba daemon
	and then wait until samba and all its associated services have
	become available. It then also proceeds to wait until all
	shares have become available.
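
The wait described for the startup event can be sketched as a polling loop
with a timeout. wait_until is a hypothetical helper, not a ctdb function,
and the check command passed to it is up to the script author.

```shell
#!/bin/sh
# Sketch only: poll a check command until it succeeds or a timeout expires.
# wait_until is a hypothetical helper, not a ctdb function.
wait_until() {
    timeout=$1                  # maximum number of seconds to wait
    shift                       # "$@" is now the check command
    elapsed=0
    until "$@"; do
        [ "$elapsed" -lt "$timeout" ] || return 1
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```

In a startup handler this might be used as
wait_until 30 service_is_up || exit 1, where service_is_up stands for
whatever check suits the service. Failing the event when the timeout
expires keeps the node from advertising a service that is not ready.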

shutdown
	This event is called when the ctdb service is shutting down.
	
	All services that are managed by ctdb should implement this event
	and use it to perform a controlled shutdown of the service.

	Example: 60.nfs uses this event to shut down nfs and all associated
	services and stop exporting any shares when this event is invoked.

monitor
	This event is invoked at a regular interval.
	The interval can be configured using the MonitorInterval tunable
	and defaults to 15 seconds.

	This event is triggered by ctdb to continuously monitor that all
	managed services are healthy.
	When invoked, the event script should check that the service is healthy
	and return 0 if so. If the service is not healthy the event script
	should return nonzero.

	If a script returns nonzero for this event, ctdb will
	consider the node status as UNHEALTHY and will fail over the public
	addresses and all associated services to a different
	node in the cluster.

	All managed services should implement this event.

	Example: 10.interface, which checks that the public interface (if used)
	is healthy, i.e. it has a physical link established.
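
A link check of this kind can be sketched as below. This is not necessarily
how 10.interface performs the check; it assumes a Linux host that exposes
the carrier attribute under /sys/class/net. link_is_up is a hypothetical
helper, and its optional second argument exists only so the check can be
pointed at a test directory.

```shell
#!/bin/sh
# Sketch only: report an interface as healthy when the kernel says it has
# carrier. link_is_up is a hypothetical helper, not a ctdb function.
link_is_up() {
    iface=$1
    sysfs=${2:-/sys/class/net}  # assumption: Linux sysfs layout
    carrier_file=$sysfs/$iface/carrier
    [ -r "$carrier_file" ] || return 1
    # Reading carrier can fail when the interface is administratively down
    carrier=$(cat "$carrier_file" 2>/dev/null) || return 1
    [ "$carrier" = "1" ]
}
```

A monitor handler could then run link_is_up eth0 || exit 1, with eth0
replaced by the public interface, so that a lost link marks the node
UNHEALTHY.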

takeip
	This event is triggered every time the node takes over a public ip
	address during recovery.
	This event takes three additional arguments:
	'interface', 'ipaddress' and 'netmask'

	Before this event there will always be a 'startrecovery' event.

	This event will always be followed by a 'recovered' event once
	all ipaddresses have been reassigned to new nodes and the ctdb database
	has been recovered.
	If multiple ip addresses are reassigned during recovery it is
	possible to get several 'takeip' events followed by a single 
	'recovered' event.

	Since taking over an ip address may involve substantial work for the
	service, and since multiple ip addresses might be taken
	over in a single recovery, it is often best to only mark which addresses
	are being taken over in this event and defer the actual work to
	reconfigure or restart the services until the 'recovered' event.

	Example: 60.nfs, which just records which ip addresses are being taken
	over into a local state directory and defers the actual
	restart of the services until the 'recovered' event.
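
The record-now, act-later pattern can be sketched as follows. This is an
illustration, not the code used by 60.nfs. STATE_DIR, its default path,
note_ip_change and ips_changed are all hypothetical names.

```shell
#!/bin/sh
# Sketch only: note address changes cheaply, act once on 'recovered'.
# STATE_DIR, note_ip_change and ips_changed are hypothetical names.
STATE_DIR=${STATE_DIR:-/tmp/example-service-state}

note_ip_change() {
    # $1 = event (takeip or releaseip), $2 = ip address
    mkdir -p "$STATE_DIR/changed" || return 1
    echo "$1" > "$STATE_DIR/changed/$2"
}

ips_changed() {
    # Succeeds if any address was recorded since the last call, and
    # clears the records so the next recovery starts clean.
    dir=$STATE_DIR/changed
    [ -d "$dir" ] || return 1
    found=1
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        found=0
        rm -f "$f"
    done
    return "$found"
}
```

The takeip and releaseip handlers would call note_ip_change with the event
name and address, and the recovered handler would reconfigure or restart
the service only when ips_changed succeeds. Several address changes in one
recovery then cost a single restart.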


releaseip
	This event is triggered every time the node releases a public ip
	address during recovery.
	This event takes three additional arguments:
	'interface', 'ipaddress' and 'netmask'

	In all other regards this event is analogous to the 'takeip' event above.

	Example: 60.nfs

updateip
	This event is triggered every time the node moves a public ip
	address between interfaces.
	This event takes four additional arguments:
	'old-interface', 'new-interface', 'ipaddress' and 'netmask'

	Example: 10.interface
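
As a rough illustration of the four arguments, the sketch below prints the
iproute2 commands that would move an address from one interface to another
without running them. It is not taken from 10.interface. plan_updateip is a
hypothetical helper, and the netmask is assumed to be a prefix length such
as 24.

```shell
#!/bin/sh
# Sketch only: print (do not run) the commands that would move an address.
# plan_updateip is hypothetical; the netmask is assumed to be a prefix
# length, e.g. 24.
plan_updateip() {
    if [ $# -ne 4 ]; then
        echo "usage: plan_updateip old-if new-if ipaddress maskbits" >&2
        return 1
    fi
    old_iface=$1
    new_iface=$2
    ip=$3
    maskbits=$4
    echo "ip addr del $ip/$maskbits dev $old_iface"
    echo "ip addr add $ip/$maskbits dev $new_iface"
}
```

Printing the commands first makes it easy to inspect what a handler would
do before letting it change the address configuration of a node.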

startrecovery
	This event is triggered every time we start a recovery process
	or before we start changing ip address allocations.

recovered
	This event is triggered every time we have finished a full recovery
	and also after we have finished reallocating the public ip addresses
	across the cluster.

	Example: 60.nfs. If the ip address configuration has changed
	during the recovery (i.e. if addresses have been taken over or
	released), it will kill off any tcp connections that exist for that
	service and also send out statd notifications to all registered
	clients.
	
stopped
	This event is called when a node is STOPPED and can be used to
	perform additional cleanup that is required.
	Note that a stopped node is considered inactive, so it will not
	be issuing the recovered event once the cluster has recovered.
	See 91.lvs for a use of this event.

Additional note for takeip, releaseip, recovered:

ALL services that depend on the ip address configuration of the node must 
implement all three of these events.

ALL services that use TCP should also implement these events and at least
kill off any tcp connections to the service if the ip address config has
changed, in a similar fashion to how 60.nfs does it.
The reason is that ESTABLISHED tcp connections may survive
when an ip address is released and removed from the host, and persist until
the ip address is taken over again.
Any tcp connection that survives a releaseip/takeip sequence can
cause the client and server to get out of sync on sequence and
ack numbers, resulting in a disruptive ack storm.