1
0
mirror of https://github.com/samba-team/samba.git synced 2025-01-10 01:18:15 +03:00
samba-mirror/ctdb/config/statd-callout

145 lines
5.7 KiB
Plaintext
Raw Normal View History

#!/bin/sh
# this script needs to be installed so that statd points to it with the -H
# command line argument. The easiest way to do that is to put something like this in
# /etc/sysconfig/nfs:
# STATD_HOSTNAME="myhostname -H /etc/ctdb/statd-callout"
. /etc/ctdb/functions
loadconfig nfs
60.nfs: we must always restart the lockmanager when the cluster has been reconfigured and ip addresses has changed. This is to make sure we get a clusterwide grace period for nfs locking. if we dont do this and only restart locking on the nodes that were direclty affected, a different client can take out a conflicting lock from a different node before affected clients has had a chance to reclaim all the locks lost during reconfigure. grace period on rhel5 kernel has bene increased to 90 seconds! statd-callout: we must restart lockmanager to ensure a clusterwide grace period for nfs. this makes locking "more correct" for nfs clients and prevents other clients/nodes from taking out a conflicting lock while a different client/node tries to reclaim lost locks. This makes it "almost consistent" for NFS clients but there is still the possibility that a cifs client can take out a conflicting lock before an nfs client has had a chance to reclaim an existing lock. This can not be solved with anything less than making the kernel nfs lock manager "samba aware" and making samba aware of the internal state of the kernel lock manager so that they can cooperate. we can not just stop/start the lockmanager back to back in rhel5 since if they are stopped/started too close to eachother then when the new lockmanager upon starting up sends out statd notifications two things can happen: 1, new lockmanager sends out notification BEFORE it has registered with portmapper leading to lockmanager starts lockmanager sends notification to the client client tries to recover the lock and tries to portmap the lockmanager port on the server. server is not (yet) registered with portmapper and server responds "no such program" to hte clients request to discover where lockmanager is. client then just completely gives up reclaiming the lock and doesnt even reattempt the portmapper call after some timeout. ==> lock reclaim failed. 2, if they are started back to back, and a client tries to reclaim the lock the lockmanager sometimes sends two responses back to back to the client. one with status NLM_GRANTED (==you got the lock reclaimed) and one with status NLM_DENIED (==you could not get the lock reclaimed) This confuses the client and leads to the server thinking that the client does have the lock and the client thinking it has not got the lock and orphaned locks result. We also send out additional notification messages of different formats to allow more legacy clients to interoperate with locking. (This used to be ctdb commit 13208c1aab2942e28dff87e38e6794bf0c026033)
2007-09-07 02:52:56 +04:00
[ -z "$STATD_SHARED_DIRECTORY" ] && {
echo STATD_SHARED_DIRECTORY not configured. statd-callout failed.
exit 0
}
[ -d $STATD_SHARED_DIRECTORY ] || exit 0
60.nfs: we must always restart the lockmanager when the cluster has been reconfigured and ip addresses has changed. This is to make sure we get a clusterwide grace period for nfs locking. if we dont do this and only restart locking on the nodes that were direclty affected, a different client can take out a conflicting lock from a different node before affected clients has had a chance to reclaim all the locks lost during reconfigure. grace period on rhel5 kernel has bene increased to 90 seconds! statd-callout: we must restart lockmanager to ensure a clusterwide grace period for nfs. this makes locking "more correct" for nfs clients and prevents other clients/nodes from taking out a conflicting lock while a different client/node tries to reclaim lost locks. This makes it "almost consistent" for NFS clients but there is still the possibility that a cifs client can take out a conflicting lock before an nfs client has had a chance to reclaim an existing lock. This can not be solved with anything less than making the kernel nfs lock manager "samba aware" and making samba aware of the internal state of the kernel lock manager so that they can cooperate. we can not just stop/start the lockmanager back to back in rhel5 since if they are stopped/started too close to eachother then when the new lockmanager upon starting up sends out statd notifications two things can happen: 1, new lockmanager sends out notification BEFORE it has registered with portmapper leading to lockmanager starts lockmanager sends notification to the client client tries to recover the lock and tries to portmap the lockmanager port on the server. server is not (yet) registered with portmapper and server responds "no such program" to hte clients request to discover where lockmanager is. client then just completely gives up reclaiming the lock and doesnt even reattempt the portmapper call after some timeout. ==> lock reclaim failed. 2, if they are started back to back, and a client tries to reclaim the lock the lockmanager sometimes sends two responses back to back to the client. one with status NLM_GRANTED (==you got the lock reclaimed) and one with status NLM_DENIED (==you could not get the lock reclaimed) This confuses the client and leads to the server thinking that the client does have the lock and the client thinking it has not got the lock and orphaned locks result. We also send out additional notification messages of different formats to allow more legacy clients to interoperate with locking. (This used to be ctdb commit 13208c1aab2942e28dff87e38e6794bf0c026033)
2007-09-07 02:52:56 +04:00
[ -z $NFS_HOSTNAME ] && {
echo NFS_HOSTNAME is not configured. statd-callout failed.
exit 0
}
case "$1" in
add-client)
# the callout does not tell us to which ip the client connected
# so we must add it to all the ips that we serve
for f in `/bin/ls /etc/ctdb/state/statd/ip/*`; do
ip=`/bin/basename $f`
[ -d $STATD_SHARED_DIRECTORY/$ip ] || /bin/mkdir $STATD_SHARED_DIRECTORY/$ip
/bin/touch $STATD_SHARED_DIRECTORY/$ip/$2
done
;;
del-client)
# the callout does not tell us to which ip the client connected
# so we must add it to all the ips that we serve
for f in `/bin/ls /etc/ctdb/state/statd/ip/*`; do
ip=`/bin/basename $f`
/bin/rm -f $STATD_SHARED_DIRECTORY/$ip/$2
done
;;
notify)
60.nfs: we must always restart the lockmanager when the cluster has been reconfigured and ip addresses has changed. This is to make sure we get a clusterwide grace period for nfs locking. if we dont do this and only restart locking on the nodes that were direclty affected, a different client can take out a conflicting lock from a different node before affected clients has had a chance to reclaim all the locks lost during reconfigure. grace period on rhel5 kernel has bene increased to 90 seconds! statd-callout: we must restart lockmanager to ensure a clusterwide grace period for nfs. this makes locking "more correct" for nfs clients and prevents other clients/nodes from taking out a conflicting lock while a different client/node tries to reclaim lost locks. This makes it "almost consistent" for NFS clients but there is still the possibility that a cifs client can take out a conflicting lock before an nfs client has had a chance to reclaim an existing lock. This can not be solved with anything less than making the kernel nfs lock manager "samba aware" and making samba aware of the internal state of the kernel lock manager so that they can cooperate. we can not just stop/start the lockmanager back to back in rhel5 since if they are stopped/started too close to eachother then when the new lockmanager upon starting up sends out statd notifications two things can happen: 1, new lockmanager sends out notification BEFORE it has registered with portmapper leading to lockmanager starts lockmanager sends notification to the client client tries to recover the lock and tries to portmap the lockmanager port on the server. server is not (yet) registered with portmapper and server responds "no such program" to hte clients request to discover where lockmanager is. client then just completely gives up reclaiming the lock and doesnt even reattempt the portmapper call after some timeout. ==> lock reclaim failed. 2, if they are started back to back, and a client tries to reclaim the lock the lockmanager sometimes sends two responses back to back to the client. one with status NLM_GRANTED (==you got the lock reclaimed) and one with status NLM_DENIED (==you could not get the lock reclaimed) This confuses the client and leads to the server thinking that the client does have the lock and the client thinking it has not got the lock and orphaned locks result. We also send out additional notification messages of different formats to allow more legacy clients to interoperate with locking. (This used to be ctdb commit 13208c1aab2942e28dff87e38e6794bf0c026033)
2007-09-07 02:52:56 +04:00
# we must restart the lockmanager (on all nodes) so that we get
# a clusterwide grace period (so other clients dont take out
# conflicting locks through other nodes before all locks have been
# reclaimed)
# we need these settings to make sure that no tcp connections survive
# across a very fast failover/failback
echo 0 > /proc/sys/net/ipv4/tcp_max_tw_buckets
echo 0 > /proc/sys/net/ipv4/tcp_fin_timeout
echo 0 > /proc/sys/net/ipv4/tcp_max_orphans
# rebuild the state directory for the local statd to use the correct
# state value and to initally send notifications to all clients
rm -f /var/lib/nfs/statd/sm/*
rm -f /var/lib/nfs/statd/sm.bak/*
cat $STATD_SHARED_DIRECTORY/state >/var/lib/nfs/statd/state
60.nfs: we must always restart the lockmanager when the cluster has been reconfigured and ip addresses has changed. This is to make sure we get a clusterwide grace period for nfs locking. if we dont do this and only restart locking on the nodes that were direclty affected, a different client can take out a conflicting lock from a different node before affected clients has had a chance to reclaim all the locks lost during reconfigure. grace period on rhel5 kernel has bene increased to 90 seconds! statd-callout: we must restart lockmanager to ensure a clusterwide grace period for nfs. this makes locking "more correct" for nfs clients and prevents other clients/nodes from taking out a conflicting lock while a different client/node tries to reclaim lost locks. This makes it "almost consistent" for NFS clients but there is still the possibility that a cifs client can take out a conflicting lock before an nfs client has had a chance to reclaim an existing lock. This can not be solved with anything less than making the kernel nfs lock manager "samba aware" and making samba aware of the internal state of the kernel lock manager so that they can cooperate. we can not just stop/start the lockmanager back to back in rhel5 since if they are stopped/started too close to eachother then when the new lockmanager upon starting up sends out statd notifications two things can happen: 1, new lockmanager sends out notification BEFORE it has registered with portmapper leading to lockmanager starts lockmanager sends notification to the client client tries to recover the lock and tries to portmap the lockmanager port on the server. server is not (yet) registered with portmapper and server responds "no such program" to hte clients request to discover where lockmanager is. client then just completely gives up reclaiming the lock and doesnt even reattempt the portmapper call after some timeout. ==> lock reclaim failed. 2, if they are started back to back, and a client tries to reclaim the lock the lockmanager sometimes sends two responses back to back to the client. one with status NLM_GRANTED (==you got the lock reclaimed) and one with status NLM_DENIED (==you could not get the lock reclaimed) This confuses the client and leads to the server thinking that the client does have the lock and the client thinking it has not got the lock and orphaned locks result. We also send out additional notification messages of different formats to allow more legacy clients to interoperate with locking. (This used to be ctdb commit 13208c1aab2942e28dff87e38e6794bf0c026033)
2007-09-07 02:52:56 +04:00
# we must keep a monotonically increasing state variable for the entire
# cluster so state always increases when ip addresses fail from one
# node to another
[ ! -f $STATD_SHARED_DIRECTORY/state ] && {
echo 1 | awk '{printf("%c%c%c%c", $0, $0/256, $0/256/256, $0/256/256/256);}' >$STATD_SHARED_DIRECTORY/state
}
# read current state
STATE=`od -t d4 $STATD_SHARED_DIRECTORY/state | head -1 | sed -e "s/^[0-9]*[^0-9]*//"`
# write current state+2 back to the state file
# the /2 *2 are to ensure that state is odd. state must be odd.
STATE=`expr $STATE "/" 2 "*" 2 "+" 3`
echo $STATE | awk '{printf("%c%c%c%c", $0, $0/256, $0/256/256, $0/256/256/256);}' >$STATD_SHARED_DIRECTORY/state
# we must also let some time pass between stopping and restarting the
# lockmanager since othervise there is a window where the lockmanager
# will respond "strangely" immediately after restarting it, which
# causes clients to fail to reclaim the locks.
#
service nfslock stop > /dev/null 2>&1
sleep 2
60.nfs: we must always restart the lockmanager when the cluster has been reconfigured and ip addresses has changed. This is to make sure we get a clusterwide grace period for nfs locking. if we dont do this and only restart locking on the nodes that were direclty affected, a different client can take out a conflicting lock from a different node before affected clients has had a chance to reclaim all the locks lost during reconfigure. grace period on rhel5 kernel has bene increased to 90 seconds! statd-callout: we must restart lockmanager to ensure a clusterwide grace period for nfs. this makes locking "more correct" for nfs clients and prevents other clients/nodes from taking out a conflicting lock while a different client/node tries to reclaim lost locks. This makes it "almost consistent" for NFS clients but there is still the possibility that a cifs client can take out a conflicting lock before an nfs client has had a chance to reclaim an existing lock. This can not be solved with anything less than making the kernel nfs lock manager "samba aware" and making samba aware of the internal state of the kernel lock manager so that they can cooperate. we can not just stop/start the lockmanager back to back in rhel5 since if they are stopped/started too close to eachother then when the new lockmanager upon starting up sends out statd notifications two things can happen: 1, new lockmanager sends out notification BEFORE it has registered with portmapper leading to lockmanager starts lockmanager sends notification to the client client tries to recover the lock and tries to portmap the lockmanager port on the server. server is not (yet) registered with portmapper and server responds "no such program" to hte clients request to discover where lockmanager is. client then just completely gives up reclaiming the lock and doesnt even reattempt the portmapper call after some timeout. ==> lock reclaim failed. 2, if they are started back to back, and a client tries to reclaim the lock the lockmanager sometimes sends two responses back to back to the client. one with status NLM_GRANTED (==you got the lock reclaimed) and one with status NLM_DENIED (==you could not get the lock reclaimed) This confuses the client and leads to the server thinking that the client does have the lock and the client thinking it has not got the lock and orphaned locks result. We also send out additional notification messages of different formats to allow more legacy clients to interoperate with locking. (This used to be ctdb commit 13208c1aab2942e28dff87e38e6794bf0c026033)
2007-09-07 02:52:56 +04:00
# copy all monitored clients on this node to the local lockmanager
for f in `/bin/ls /etc/ctdb/state/statd/ip/* 2>/dev/null`; do
ip=`/bin/basename $f`
[ -d $STATD_SHARED_DIRECTORY/$ip ] && [ -x /usr/bin/smnotify ] && {
for g in `/bin/ls $STATD_SHARED_DIRECTORY/$ip/* 2>/dev/null`; do
client=`/bin/basename $g`
touch /var/lib/nfs/statd/sm/$client
done
}
done
# now start lockmanager again with the new state directory.
service nfslock start > /dev/null 2>&1
# we now need to send out additional statd notifications to ensure
# that clients understand that the lockmanager has restarted.
# we have three cases:
# 1, clients that ignore the ip address the stat notification came from
# and ONLY care about the 'name' in the notify packet.
# these clients ONLY work with lock failover IFF that name
# can be resolved into an ipaddress that matches the one used
# to mount the share. (==linux clients)
# This is handled when starting lockmanager above, but those
# packets are sent from the "wrong" ip address, something linux
# clients are ok with, buth other clients will barf at.
# 2, Some clients only accept statd packets IFF they come from the
# 'correct' ip address.
# 2a,Send out the notification using the 'correct' ip address and also
# specify the 'correct' hostname in the statd packet.
# Some clients require both the correct source address and also the
# correct name. (these clients also ONLY work if the ip addresses
# used to map the share can be resolved into the name returned in
# the notify packet.)
# 2b,Other clients require that the source ip address of the notify
# packet matches the ip address used to take out the lock.
# I.e. that the correct source address is used.
# These clients also require that the statd notify packet contains
# the name as the ip address used when the lock was taken out.
#
# Both 2a and 2b are commonly used in lockmanagers since they maximize
# probability that the client will accept the statd notify packet and
# not just ignore it.
for f in `/bin/ls /etc/ctdb/state/statd/ip/* 2>/dev/null`; do
ip=`/bin/basename $f`
[ -d $STATD_SHARED_DIRECTORY/$ip ] && [ -x /usr/bin/smnotify ] && {
60.nfs: we must always restart the lockmanager when the cluster has been reconfigured and ip addresses has changed. This is to make sure we get a clusterwide grace period for nfs locking. if we dont do this and only restart locking on the nodes that were direclty affected, a different client can take out a conflicting lock from a different node before affected clients has had a chance to reclaim all the locks lost during reconfigure. grace period on rhel5 kernel has bene increased to 90 seconds! statd-callout: we must restart lockmanager to ensure a clusterwide grace period for nfs. this makes locking "more correct" for nfs clients and prevents other clients/nodes from taking out a conflicting lock while a different client/node tries to reclaim lost locks. This makes it "almost consistent" for NFS clients but there is still the possibility that a cifs client can take out a conflicting lock before an nfs client has had a chance to reclaim an existing lock. This can not be solved with anything less than making the kernel nfs lock manager "samba aware" and making samba aware of the internal state of the kernel lock manager so that they can cooperate. we can not just stop/start the lockmanager back to back in rhel5 since if they are stopped/started too close to eachother then when the new lockmanager upon starting up sends out statd notifications two things can happen: 1, new lockmanager sends out notification BEFORE it has registered with portmapper leading to lockmanager starts lockmanager sends notification to the client client tries to recover the lock and tries to portmap the lockmanager port on the server. server is not (yet) registered with portmapper and server responds "no such program" to hte clients request to discover where lockmanager is. client then just completely gives up reclaiming the lock and doesnt even reattempt the portmapper call after some timeout. ==> lock reclaim failed. 2, if they are started back to back, and a client tries to reclaim the lock the lockmanager sometimes sends two responses back to back to the client. one with status NLM_GRANTED (==you got the lock reclaimed) and one with status NLM_DENIED (==you could not get the lock reclaimed) This confuses the client and leads to the server thinking that the client does have the lock and the client thinking it has not got the lock and orphaned locks result. We also send out additional notification messages of different formats to allow more legacy clients to interoperate with locking. (This used to be ctdb commit 13208c1aab2942e28dff87e38e6794bf0c026033)
2007-09-07 02:52:56 +04:00
for g in `/bin/ls $STATD_SHARED_DIRECTORY/$ip/* 2>/dev/null`; do
client=`/bin/basename $g`
60.nfs: we must always restart the lockmanager when the cluster has been reconfigured and ip addresses has changed. This is to make sure we get a clusterwide grace period for nfs locking. if we dont do this and only restart locking on the nodes that were direclty affected, a different client can take out a conflicting lock from a different node before affected clients has had a chance to reclaim all the locks lost during reconfigure. grace period on rhel5 kernel has bene increased to 90 seconds! statd-callout: we must restart lockmanager to ensure a clusterwide grace period for nfs. this makes locking "more correct" for nfs clients and prevents other clients/nodes from taking out a conflicting lock while a different client/node tries to reclaim lost locks. This makes it "almost consistent" for NFS clients but there is still the possibility that a cifs client can take out a conflicting lock before an nfs client has had a chance to reclaim an existing lock. This can not be solved with anything less than making the kernel nfs lock manager "samba aware" and making samba aware of the internal state of the kernel lock manager so that they can cooperate. we can not just stop/start the lockmanager back to back in rhel5 since if they are stopped/started too close to eachother then when the new lockmanager upon starting up sends out statd notifications two things can happen: 1, new lockmanager sends out notification BEFORE it has registered with portmapper leading to lockmanager starts lockmanager sends notification to the client client tries to recover the lock and tries to portmap the lockmanager port on the server. server is not (yet) registered with portmapper and server responds "no such program" to hte clients request to discover where lockmanager is. client then just completely gives up reclaiming the lock and doesnt even reattempt the portmapper call after some timeout. ==> lock reclaim failed. 2, if they are started back to back, and a client tries to reclaim the lock the lockmanager sometimes sends two responses back to back to the client. one with status NLM_GRANTED (==you got the lock reclaimed) and one with status NLM_DENIED (==you could not get the lock reclaimed) This confuses the client and leads to the server thinking that the client does have the lock and the client thinking it has not got the lock and orphaned locks result. We also send out additional notification messages of different formats to allow more legacy clients to interoperate with locking. (This used to be ctdb commit 13208c1aab2942e28dff87e38e6794bf0c026033)
2007-09-07 02:52:56 +04:00
# /bin/rm -f $g
# send out notifications from the "correct" address
# (the same addresse as where the lock was taken out
# on) some clients require that the source address
# matches where the lock was taken out.
# also send it both as a name that the client
# hopefully can resolve into the server ip and
# and also by specifying the raw ip address as name.
/usr/bin/smnotify --client=$client --ip=$ip --server=$ip --stateval=$STATE
/usr/bin/smnotify --client=$client --ip=$ip --server=$NFS_HOSTNAME --stateval=$STATE
done
}
done
;;
esac