1
0
mirror of https://github.com/samba-team/samba.git synced 2024-12-23 17:34:34 +03:00
Commit Graph

86 Commits

Author SHA1 Message Date
Andrew Tridgell
b87ddd9148 no longer wait at startup for services to become available, instead
set the node initially unhealthy and let the status monitoring bring the node online.
This fixes a problem with winbindd, where it refused to start because secrets.tdb was not populated
but we could not populate ctdbd, because the net command would not run while ctdbd was still doing startup
and thus frozen
(This used to be ctdb commit 3a001b793dd76fb96addf1e2ccb74da326fbcfbc)
2007-09-24 10:00:14 +10:00
Andrew Tridgell
416c0cec6e make the persistent dbdir configurable
(This used to be ctdb commit 2587b887dcfce26b12c66fcb5d34e92da42a1776)
2007-09-21 16:12:04 +10:00
Andrew Tridgell
c62490569b cope with non-standard install dirs in event scripts
(This used to be ctdb commit 52fff5345873690a9cc86495f414343eaa3bd540)
2007-09-14 14:14:03 +10:00
Andrew Tridgell
305f432e50 fix pkill args
(This used to be ctdb commit 9690de97b4746f4a79830465e3a1679e9fbda671)
2007-09-14 11:59:04 +10:00
Andrew Tridgell
955d4d8615 make sure all public IPs are removed at startup
(This used to be ctdb commit b16f33787f2a9471285037f4a6d470e826536570)
2007-09-14 11:56:40 +10:00
Ronnie Sahlberg
8edcd3f83f during startup make sure to delete any public addresses from any
interface

(This used to be ctdb commit 18d80ea6db39e61f60e4c01de164d58bcbd8ab10)
2007-09-14 10:37:10 +10:00
Andrew Tridgell
3b159e4e60 wait for ctdbd to finish cleanup before considering "service ctdb stop" to be done
(This used to be ctdb commit 216eb4be7ec481cfe9aaeeada257b77cb394d2e4)
2007-09-14 09:25:11 +10:00
Andrew Tridgell
9cf96a5e4c nicer use of testparm
(This used to be ctdb commit a611ea930fb9dae6e56f6a74b2bdc9e08066d4d1)
2007-09-14 09:24:34 +10:00
Andrew Tridgell
2f86c3f827 ensure smbd and winbindd do die in 50.samba
(This used to be ctdb commit 6f23affedb626fc7a5ca86c4763f3045a5586231)
2007-09-13 14:36:23 +10:00
Andrew Tridgell
6fa6101b1a more shell scripting fixes in 10.interface
(This used to be ctdb commit 4ee2230b3f2ae7437a9d0cf973eb4645d276accd)
2007-09-13 11:57:42 +10:00
Andrew Tridgell
25940014c0 fixed script errors in 10.interface
(This used to be ctdb commit 0c759614d27758cef3eba5942b2cccad54193cbb)
2007-09-13 11:19:30 +10:00
Andrew Tridgell
4f261ae191 remove more cruft from the logs
(This used to be ctdb commit b67f35c483b6cbb5facaa6380c7794709f44213a)
2007-09-13 10:39:05 +10:00
Andrew Tridgell
023b885793 new approach for killing TCP connections on IP release
(This used to be ctdb commit c33a0db29b5604966f582b1f8c5fd66760c72197)
2007-09-13 10:24:48 +10:00
Andrew Tridgell
1b53ecc445 remove clutter from ctdb log file
(This used to be ctdb commit 54d5dcaaee0498f40bbee5059cc72d0ca75d33b7)
2007-09-13 10:03:18 +10:00
Andrew Tridgell
96c54c6188 handle hung or slow ctdb daemons on shutdown
(This used to be ctdb commit a3089211782ab12387c1b04efa28914c94d89b30)
2007-09-12 13:26:24 +10:00
Andrew Tridgell
6c77184d96 - set arp_ignore to prevent replying to arp requests for addresses on loopback
- put removed IPs on loopback with scope host
- check for nul strings in ethtool call
;

(This used to be ctdb commit e2df1d6d08e67a36ff05a590a34c56e900741287)
2007-09-12 13:23:36 +10:00
Andrew Tridgell
a6728e0520 fixed location of arp_filter
(This used to be ctdb commit ea239c82fca2b9a648d21e5c603e632011958452)
2007-09-11 16:38:32 +10:00
Andrew Tridgell
57d8102cf8 added back --public-interface to startup script
(This used to be ctdb commit 9e9cb3c0da7251f522c655366ef0868037577a9c)
2007-09-10 15:09:28 +10:00
Ronnie Sahlberg
50381480eb update a comment
(This used to be ctdb commit e7d3ef4443686529299e8f293398cc0522235627)
2007-09-10 07:45:57 +10:00
Ronnie Sahlberg
4ac749bfa4 change the signature to ctdb_sys_have_ip() to also return:
a bool that specifies whether the ip was held by a loopback adaptor or 
not
 the name of the interface where the ip was held

when we release an ip address from an interface, move the ip address 
over to the loopback interface

when we release an ip address  after we have move it onto loopback, 
use 60.nfs to kill off the server side (the local part) of the tcp 
connection   so that the tcp connections dont survive a 
failover/failback

61.nfstickle,   since we kill hte tcp connections when we release an ip 
address   we no longer need to restart the nfs service in 61.nfstickle

update ctdb_takeover to use the new signature for ctdb_sys_have_ip

when we add a tcp connection to kill in ctdb_killtcp_add_connection()
check if either the srouce or destination address match a known public 
address

(This used to be ctdb commit f9fd2a4719c50f6b8e01d0a1b3a74b76b52ecaf3)
2007-09-10 07:20:44 +10:00
Ronnie Sahlberg
0ebd7beb4b set /proc/sys/net/ipv4/conf/all/arp_filter to 1 by default when
10.interfaces startsup

this setting makes the system only respond to APR requests from the NIC 
where the ip address is tied to and adds to the 
"principle of least surprise" when using multihoming servers

(This used to be ctdb commit 39ddf347dc45f599964a4c17e67e71faed00e544)
2007-09-08 08:09:02 +10:00
Ronnie Sahlberg
eb7a15730e add a short delay after stopping nfslock to make it less likely that
"weird" things happen

(This used to be ctdb commit 4934c083cbcc19714094e08a0b7da1fb6fdc8a5a)
2007-09-07 12:14:53 +10:00
Ronnie Sahlberg
fa872de664 60.nfs:
we must always restart the lockmanager when the cluster has been 
reconfigured and ip addresses has changed. This is to make sure we get a 
clusterwide grace period for nfs locking.
if we dont do this and only restart locking on the nodes that were 
direclty affected, a different client can take out a conflicting lock 
from a different node before affected clients has had a chance to
reclaim all the locks lost during reconfigure.
grace period on rhel5 kernel has bene increased to 90 seconds!

statd-callout:
we must restart lockmanager to ensure a clusterwide grace period for 
nfs. this makes locking "more correct" for nfs clients and prevents
other clients/nodes from taking out a conflicting lock while a different
client/node tries to reclaim lost locks.
This makes it "almost consistent" for NFS clients   but there is still 
the possibility that a cifs client can take out a conflicting lock 
before an nfs client has had a chance to reclaim an existing lock.
This can not be solved with anything less than making the kernel nfs 
lock manager "samba aware" and making samba aware of the internal state 
of the kernel lock manager so that they can cooperate.

we can not just stop/start the lockmanager back to back in rhel5 since 
if they are stopped/started too close to eachother then when the new 
lockmanager upon starting up sends out statd notifications two things 
can happen:
1, new lockmanager sends out notification BEFORE it has registered with 
portmapper leading to 
  lockmanager starts
  lockmanager sends notification to the client
  client tries to recover the lock and tries to portmap the lockmanager
  port on the server.
  server is not (yet) registered with portmapper and server responds
  "no such program" to hte clients request to discover where lockmanager
   is.
  client then just completely gives up reclaiming the lock and doesnt 
  even reattempt the portmapper call after some timeout.
  ==> lock reclaim failed.
2, if they are started back to back, and a client tries to reclaim the
   lock  the lockmanager sometimes sends two responses back to back
   to the client.   one with status NLM_GRANTED (==you got the lock 
reclaimed) and one with status NLM_DENIED (==you could not get the lock 
reclaimed)
   This confuses the client and leads to the server thinking that the 
client does have the lock   and the client thinking it has not got the 
lock    and orphaned locks result.


We also send out additional notification messages of different formats
to allow more legacy clients to interoperate with locking.

(This used to be ctdb commit 13208c1aab2942e28dff87e38e6794bf0c026033)
2007-09-07 08:52:56 +10:00
Ronnie Sahlberg
00453a375a improve the handling of hosts to notify with statd
(This used to be ctdb commit cc87bda7e344bc777b9620a6211e62de4dce4e3b)
2007-09-06 11:30:49 +10:00
Ronnie Sahlberg
46eecfea27 we dont use 'sendip' any more so dont check for it and exit from the
61.nfstickles script if it is missing from the host

(This used to be ctdb commit 8eac441e24f4ef33b55f9eaa4856b5c1e1c15213)
2007-09-05 15:39:51 +10:00
Ronnie Sahlberg
12ebb74838 change how we do public addresses and takeover so that we can have
multiple public addresses spread across multiple interfaces on each 
node.

this is a massive patch since we have previously made the assumtion that 
we only have one public address per node.

get rid of the public_interface argument.  the public addresses file 
now explicitely lists which interface the address belongs to

(This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8)
2007-09-04 09:50:07 +10:00
Ronnie Sahlberg
4e61e05f49 when we start 60.nfs we must make sure that the shared storage
nfs-state directory actually exists (by creating it)
or else the lock manager will not start 

(This used to be ctdb commit f2d15d04df842538c8d8331796a3c6fbe23463f2)
2007-08-30 15:27:45 +10:00
Ronnie Sahlberg
1ee8c79db7 start winbind before smbd
(This used to be ctdb commit d6a2e22a6d688cfcf5631c8de68fc8ef721635d6)
2007-08-16 11:34:35 +10:00
Ronnie Sahlberg
ce91401724 we should start winbindd before we start smb
(This used to be ctdb commit 03aad3ea55c4816a3790ac9336026b4872a65310)
2007-08-16 11:18:16 +10:00
Ronnie Sahlberg
3b9d50f3ee change the now rather small /etc/ctdb/events script into a service
specific script /etc/ctdb/events.d/00.ctdb

get rid of CTDB_EVENTS_SCRIPT and --event-script

(This used to be ctdb commit 81ccfaf838e5772d4a58eb6a70224b7b39aba9f3)
2007-08-15 15:01:31 +10:00
Ronnie Sahlberg
4023576e50 call the service specific event scripts directly from the forked child
instead for from /etc/ctdb/events so that we can get better debugging 
output in the logs when something fails in the scripts

(This used to be ctdb commit 4ed96b768aea1611e8002f7095d3c4d12ccf77a3)
2007-08-15 14:44:03 +10:00
Ronnie Sahlberg
1fa787e667 fix typo
(This used to be ctdb commit c7a8e7b506f98240c0e9f705fe1f504a6a56a332)
2007-08-15 11:38:27 +10:00
Ronnie Sahlberg
83dbfecad7 add a description on how the event scripts works to the README and make
sure it is installed in /etc/ctdb/events.d

(This used to be ctdb commit adec62a924af5bb023f346e705515b09dbe64f21)
2007-08-15 11:36:01 +10:00
Ronnie Sahlberg
8b58fe2489 do not restart lockd/statd when we takeover an ip address this is
overkill since
1, we now kill the tcpconnections for lockd in 60.nfs
2, rpc.statd on linux sends out the notifications using the wrong 
interface anyway  which breaks a lot of clients  including linux !



use our own smnotify tool instead of sm-notify

(This used to be ctdb commit 0163ad0ec01be6189a98ea91e5cec40f6750218f)
2007-08-04 11:23:04 +10:00
Andrew Tridgell
fb22d3bd2c merged from ronnie
(This used to be ctdb commit 765b07fa5d1af07c8c7212d19d8e9574060b3039)
2007-07-18 20:13:57 +10:00
Ronnie Sahlberg
7e532f8f83 we dont do nfstickles unless ctdb manages nfs
(This used to be ctdb commit 0622b4a969abdc8bd11f200ed5ae1c7b1d188db7)
2007-07-15 11:43:11 +10:00
Ronnie Sahlberg
643b87fbae fix bug introduced in previous commit
(This used to be ctdb commit 8396a7500225c90165ebcfbdc2c65673740e6b25)
2007-07-15 11:37:22 +10:00
Ronnie Sahlberg
e96f733052 there is no point in doing anything in 10.interfaces unless we have a
public interface

(This used to be ctdb commit c0335ee92b16a1e2dfcb37a39872b66a35b0ab94)
2007-07-15 11:28:53 +10:00
Ronnie Sahlberg
8e89b27098 try netstat as a last attempt to check a tcp port in
ctdb_check_tcp_ports() as well

(This used to be ctdb commit ad0292726f9cfc8afe3733b30ac2d5621e9a48f1)
2007-07-15 09:29:08 +10:00
Ronnie Sahlberg
4c276ded1f if we dont have nc or netcat, try using netstat as a final attempt to
check for tcp ports

(the check for these tools should not really use hardcoded paths)

(This used to be ctdb commit 56d77082c07a519dd3804cc24cc7ba889b8469ff)
2007-07-15 09:26:54 +10:00
Ronnie Sahlberg
3890fde07f if we dont have /etc/sysconfig and we dont have /etc/default
check /etc/ctdb/sysconfig as a last option

(This used to be ctdb commit 1043929ceb0cd04ab6466e9a5d7d52f9af1cb8e8)
2007-07-15 09:13:50 +10:00
Ronnie Sahlberg
82824e0680 when we have found that /etc/rc.d/init.d/SERVICE exists, then run that
script and not /etc/rc.d/SERVICE

(This used to be ctdb commit 7f0c3a02ef11fd19c8cd5116fd451ebd10ba5d1b)
2007-07-15 08:54:48 +10:00
Andrew Tridgell
1e14ecd176 - merge from ronnie
- cleaner handling of system capture socket

(This used to be ctdb commit d194a41a71b8466d0726dcbae3970a86386fcb3c)
2007-07-13 11:31:18 +10:00
Andrew Tridgell
0becf46deb allow extra option override in /etc/sysconfig/ctdb
(This used to be ctdb commit f46fae64263ea4776e4bbf9cf14dff17b5b68ddb)
2007-07-13 09:14:15 +10:00
Andrew Tridgell
fc73bc5c24 added --nosetsched option to ctdbd
(This used to be ctdb commit 4cbbb88c1735c7d112e751e22da1c1c69e09bf4a)
2007-07-13 08:47:02 +10:00
Ronnie Sahlberg
4b6d9485ab ctdb killtcp no longer takes a <numrst> argument to control how many
times to try the reset.

the reset retry attempt is now handled inside the daemon

update the 60.nfs script and remove this parameter that is no longer 
used

(This used to be ctdb commit 30fb09b8b9a989e5cfe86b6daf2dcd2487013344)
2007-07-12 08:31:56 +10:00
Ronnie Sahlberg
ed1a52b293 use the socketkiller to kill off all lock manager sessions as well
(This used to be ctdb commit 980b090001ed3a77001e2a3bfc1b03833498f434)
2007-07-10 13:09:35 +10:00
Ronnie Sahlberg
d81bca2072 make it possible to specify how many times ctdb killtcp will try to RST
the tcp connection

change the 60.nfs script to run ctdb killtcp in the foreground so we 
dont get lots of these running in parallel when there are a lot of tcp 
connections to rst

(This used to be ctdb commit d81616214752882242f2886e94681972a790db80)
2007-07-10 10:24:20 +10:00
Ronnie Sahlberg
1c32f65ee0 run the ctdb killtcp in the background
(This used to be ctdb commit d6a514c2b3d427099ed669eef104146608378fa8)
2007-07-10 10:07:26 +10:00
Ronnie Sahlberg
dbc66d054b dont restart the tcp service after a ip takeover, it is more efficient
to just kill off the tcp connections

(This used to be ctdb commit bc481c3f1a44c50648488c4f8a7f15ec395d446f)
2007-07-10 09:45:14 +10:00