#!/bin/sh
# this script needs to be installed so that statd points to it with the -H
# command line argument. The easiest way to do that is to put something like this in
# /etc/sysconfig/nfs:
# STATD_HOSTNAME="myhostname -H /etc/ctdb/statd-callout"
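#
# rpc.statd is then expected to invoke this script roughly as
#   statd-callout add-client <client-hostname-or-address>
#   statd-callout del-client <client-hostname-or-address>
# while the updatelocal, updateremote and notify modes below are assumed
# to be driven from the ctdb NFS event script (e.g. 60.nfs).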
[ -z "$CTDB_BASE" ] && {
export CTDB_BASE="/etc/ctdb"
}
. $CTDB_BASE/functions
loadconfig ctdb
loadconfig nfs
[ -z "$NFS_HOSTNAME" ] && {
echo "NFS_HOSTNAME is not configured. statd-callout failed."
exit 0
}
case "$1" in
add-client)
# the callout does not tell us which ip address the client connected to,
# so we must add it to all the ips that we serve
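# The machine-readable "ctdb ip -Y" output parsed below is assumed to look
# roughly like this (colon-separated, first line being a header):
#   :Public IP:Node:
#   :10.0.0.1:0:
#   :10.0.0.2:1:
# so field 2 is the public address and field 3 is the node currently
# serving it, while "ctdb xpnn" prints something like "PNN:0" for the
# local node.  The same parsing is repeated in the other cases below.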
PNN=`ctdb xpnn | sed -e "s/.*://"`
ctdb ip -Y | while read LINE; do
NODE=`echo $LINE | cut -f3 -d:`
[ "$NODE" = "$PNN" ] || {
# not us
continue
}
IP=`echo $LINE | cut -f2 -d:`
mkdir -p $CTDB_VARDIR/state/statd/ip/$IP
touch $CTDB_VARDIR/state/statd/ip/$IP/$2
done
;;
del-client)
# the callout does not tell us which ip address the client was connected to,
# so we must remove it from all the ips that we serve
PNN=`ctdb xpnn | sed -e "s/.*://"`
ctdb ip -Y | while read LINE; do
NODE=`echo $LINE | cut -f3 -d:`
[ "$NODE" = "$PNN" ] || {
# not us
continue
}
IP=`echo $LINE | cut -f2 -d:`
mkdir -p $CTDB_VARDIR/state/statd/ip/$IP
rm -f $CTDB_VARDIR/state/statd/ip/$IP/$2
done
;;
updatelocal)
# For all IPs we serve, collect info and push to the config database
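# Roughly: the per-ip directory of client entries maintained by
# add-client/del-client is packed into a tar archive and stored in
# ctdb.tdb under the key "statd-state:<ip>", so that whichever node later
# takes over the address can recreate the statd state (see updateremote).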
PNN=`ctdb xpnn | sed -e "s/.*://"`
ctdb ip -Y | tail -n +2 | while read LINE; do
NODE=`echo $LINE | cut -f3 -d:`
[ "$NODE" = "$PNN" ] || {
continue
}
IP=`echo $LINE | cut -f2 -d:`
mkdir -p $CTDB_VARDIR/state/statd/ip/$IP
rm -f $CTDB_VARDIR/state/statd/ip/$IP.tar
tar cfP $CTDB_VARDIR/state/statd/ip/$IP.tar $CTDB_VARDIR/state/statd/ip/$IP
rm -f $CTDB_VARDIR/state/statd/ip/$IP.rec
ctdb pfetch ctdb.tdb statd-state:$IP $CTDB_VARDIR/state/statd/ip/$IP.rec 2>/dev/null
[ "$?" = "0" ] || {
# no record stored yet (or the fetch failed), so store the current state
echo "No record. Store STATD state data for $IP"
ctdb pstore ctdb.tdb statd-state:$IP $CTDB_VARDIR/state/statd/ip/$IP.tar 2>/dev/null
continue
}
cmp $CTDB_VARDIR/state/statd/ip/$IP.tar $CTDB_VARDIR/state/statd/ip/$IP.rec >/dev/null 2>/dev/null
[ "$?" = "0" ] || {
# the stored record differs from the current state, so update it
echo "Updated record. Store STATD state data for $IP"
ctdb pstore ctdb.tdb statd-state:$IP $CTDB_VARDIR/state/statd/ip/$IP.tar 2>/dev/null
continue
}
done
;;
updateremote)
# For all IPs we don't serve, pull the state from the database
PNN=`ctdb xpnn | sed -e "s/.*://"`
ctdb ip -Y | tail -n +2 | while read LINE; do
NODE=`echo $LINE | cut -f3 -d:`
[ "$NODE" = "$PNN" ] && {
continue
}
IP=`echo $LINE | cut -f2 -d:`
mkdir -p $CTDB_VARDIR/state/statd/ip/$IP
rm -f $CTDB_VARDIR/state/statd/ip/$IP.rec
ctdb pfetch ctdb.tdb statd-state:$IP $CTDB_VARDIR/state/statd/ip/$IP.rec 2>/dev/null
[ "$?" = "0" ] || {
continue
}
rm -f $CTDB_VARDIR/state/statd/ip/$IP/*
tar xfP $CTDB_VARDIR/state/statd/ip/$IP.rec
done
;;
notify)
# we must restart the lockmanager (on all nodes) so that we get
# a clusterwide grace period (so other clients don't take out
# conflicting locks through other nodes before all locks have been
# reclaimed)
# we need these settings to make sure that no tcp connections survive
# across a very fast failover/failback
#echo 10 > /proc/sys/net/ipv4/tcp_fin_timeout
#echo 0 > /proc/sys/net/ipv4/tcp_max_tw_buckets
#echo 0 > /proc/sys/net/ipv4/tcp_max_orphans
# Delete the notification list for statd; we don't want it to
# ping any clients
rm -f /var/lib/nfs/statd/sm/*
rm -f /var/lib/nfs/statd/sm.bak/*
# we must keep a monotonically increasing state variable for the entire
# cluster so state always increases when ip addresses fail from one
# node to another
# We use epoch and hope the nodes are close enough in clock.
# Even numbers mean service is shut down, odd numbers mean
# service is started.
STATE=`date +"%s"`
STATE=`expr "$STATE" "/" "2"`
STATE=`expr "$STATE" "*" "2"`  # round down so the initial state is even
# we must also let some time pass between stopping and restarting the
# lockmanager since otherwise there is a window where the lockmanager
# will respond "strangely" immediately after restarting it, which
# causes clients to fail to reclaim the locks.
#
startstop_nfslock stop > /dev/null 2>&1
sleep 2
# now start lockmanager again with the new state directory.
startstop_nfslock start > /dev/null 2>&1
# we now need to send out additional statd notifications to ensure
# that clients understand that the lockmanager has restarted.
# we have three cases:
# 1, clients that ignore the ip address the statd notification came from
# and ONLY care about the 'name' in the notify packet.
# these clients ONLY work with lock failover IFF that name
# can be resolved into an ip address that matches the one used
# to mount the share. (==linux clients)
# This is handled when starting lockmanager above, but those
# packets are sent from the "wrong" ip address, something linux
# clients are ok with but other clients will barf at.
# 2, Some clients only accept statd packets IFF they come from the
# 'correct' ip address.
# 2a, Send out the notification using the 'correct' ip address and also
# specify the 'correct' hostname in the statd packet.
# Some clients require both the correct source address and also the
# correct name. (these clients also ONLY work if the ip addresses
# used to mount the share can be resolved into the name returned in
# the notify packet.)
# 2b, Other clients require that the source ip address of the notify
# packet matches the ip address used to take out the lock.
# I.e. that the correct source address is used.
# These clients also require that the statd notify packet contains,
# as the name, the ip address that was used when the lock was taken out.
#
# Both 2a and 2b are commonly used in lockmanagers since they maximize
# the probability that the client will accept the statd notify packet and
# not just ignore it.
# For all IPs we serve, send statd notifications to all recorded clients
PNN=`ctdb xpnn | sed -e "s/.*://"`
ctdb ip -Y | tail -n +2 | while read LINE; do
NODE=`echo $LINE | cut -f3 -d:`
[ "$NODE" = "$PNN" ] || {
continue
}
IP=`echo $LINE | cut -f2 -d:`
ls $CTDB_VARDIR/state/statd/ip/$IP | while read CLIENT; do
rm $CTDB_VARDIR/state/statd/ip/$IP/$CLIENT
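# Notify this client twice for each of two consecutive state values:
# once with the public address as the server name and once with
# NFS_HOSTNAME, to cover the client variants described above.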
smnotify --client=$CLIENT --ip=$IP --server=$IP --stateval=$STATE
smnotify --client=$CLIENT --ip=$IP --server=$NFS_HOSTNAME --stateval=$STATE
STATE=`expr "$STATE" "+" "1"`
smnotify --client=$CLIENT --ip=$IP --server=$IP --stateval=$STATE
smnotify --client=$CLIENT --ip=$IP --server=$NFS_HOSTNAME --stateval=$STATE
done
done
;;
esac