Andrew Tridgell
3e4d7bef23
get all the tunables at once in recovery daemon
(This used to be ctdb commit 8e60be6c22aab145e68b16ede5f32f4430c2af93)
2007-06-07 18:05:25 +10:00
Andrew Tridgell
23bf62fe30
added admin commands to ban/unban nodes
(This used to be ctdb commit 4dad04172e7e4955b5bf6444a85b19901c9683ad)
2007-06-07 16:34:33 +10:00
Andrew Tridgell
2ed57a9ae1
implement a scheme where nodes are banned if they continuously cause the cluster
to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes)
(This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c)
2007-06-07 15:18:55 +10:00
Andrew Tridgell
9754d16d48
merged admin enable/disable change from ronnie
(This used to be ctdb commit df17b69dfd83a98f9c711994c7dd51ad2cc0ab8a)
2007-06-07 11:15:22 +10:00
Ronnie Sahlberg
9ff733c784
add a control to permanently enable/disable a node
(This used to be ctdb commit d66fdba16ca22f62ddac6882a17614879b08a798)
2007-06-07 09:16:17 +10:00
Andrew Tridgell
81fad8636f
added timeouts in all event scripts
(This used to be ctdb commit d986c91a607ed7c7d4869ea786b5cdf80e7862f1)
2007-06-06 13:45:12 +10:00
Andrew Tridgell
af8834dd02
added health monitoring logic to ctdb, so a node loses its public IP address if one of the subsystem event scripts reports a problem
(This used to be ctdb commit c7a089256d86cec21097453bce5acbccee87413f)
2007-06-06 10:25:46 +10:00
Andrew Tridgell
be3a00bd73
clean out some more cruft
(This used to be ctdb commit ad16c5fe2748b48a6f6c79976359d56d9bed33f4)
2007-06-05 17:57:07 +10:00
Andrew Tridgell
ac55bc4166
first step in health monitoring of cluster nodes. When not healthy they will be marked disabled
(This used to be ctdb commit d3dbd9fc4db21632075b56fc52cf95435c63374a)
2007-06-05 17:43:19 +10:00
Andrew Tridgell
ee546dec81
merge from ronnie
(This used to be ctdb commit 531d7ea7aca3116e78a4502a1c8b75a3fb764a4f)
2007-06-04 22:13:59 +10:00
Ronnie Sahlberg
4be9a44ba7
add a control that lists all public ip addresses and which node
currently serves each
(This used to be ctdb commit db9b89dc423b31079e5502323e5fd2bbaf82e1e9)
2007-06-04 21:11:51 +10:00
Andrew Tridgell
39ced972ae
make recovery daemon values tunable
(This used to be ctdb commit ec29dbf2f5110428df8b97801443ba7addf61353)
2007-06-04 20:22:44 +10:00
Ronnie Sahlberg
1ee8989bd4
merge from tridge
(This used to be ctdb commit 3bfede5d46dba5a3654dad9205534391bc339461)
2007-06-04 20:10:53 +10:00
Ronnie Sahlberg
79b54a624e
change the takeoverip/releaseip controls to pass a structure containing
both the nodenumber and the id of the node that has taken over that
address in addition to the public address itself so that all nodes
can learn which node is currently hosting each of the public addresses
(This used to be ctdb commit 53e9ff790387b85a36fa9c3c44cd4c95cbdf35da)
2007-06-04 20:07:37 +10:00
Andrew Tridgell
dbb2ec43dd
added tunables settable using ctdb command line tool
(This used to be ctdb commit 73d440f8cb19373cfad7a2f0f0ca4f963c57ff29)
2007-06-04 19:53:19 +10:00
Andrew Tridgell
f1d81386e6
- start moving tunable variables into their own structure
- fixed the test scripts to use a separate dbdir
(This used to be ctdb commit 396752e8908c48373564e915e2d49cfc9ff61eba)
2007-06-04 17:46:37 +10:00
Andrew Tridgell
a57991c0eb
remove some cruft that's not needed any more
(This used to be ctdb commit c4308805b997740b77e058c1a14b84cb400a7c30)
2007-06-04 17:23:55 +10:00
Ronnie Sahlberg
a3e4e204dc
add the ip address to the nodemap structure we pull from a server and
display the physical address of a node when we do a ctdb status
(This used to be ctdb commit 660bf30db713f0680acd3f74275ad603b35a0c24)
2007-06-04 13:26:07 +10:00
Andrew Tridgell
c5e4ce360a
make test now works again
(This used to be ctdb commit 439d87bbb9840f82937e51aff4fe2b80160878c6)
2007-06-02 13:31:36 +10:00
Andrew Tridgell
ebf12646cf
- make specification of a recovery lock file compulsory
- die if someone other than the recmaster can get the recovery lock
(This used to be ctdb commit a827d0d0e430ca8ad5d521367e45097185492869)
2007-06-02 11:36:42 +10:00
Andrew Tridgell
4f72a202d9
- moved cmdline options that are only relevant to ctdbd into ctdbd.c
- fixed a valgrind error on failing to send a control
- don't mark node dead when already disconnected
- moved node list lock code into common code
(This used to be ctdb commit bcc0432d0fea7ef223f82ccee81cf35c18144b1b)
2007-06-02 10:03:28 +10:00
Andrew Tridgell
27b0e323e6
disable realtime scheduler in event scripts
(This used to be ctdb commit 56225ac6fdfe754289bc7d5e0fc8d21c81a7aa8e)
2007-06-02 08:46:49 +10:00
Andrew Tridgell
5e5701a7b8
- make calling of recovered event script async
- shutdown sockets before calling shutdown script
(This used to be ctdb commit c5e099feef94a014a77742b6cc1d0afe78ef9da9)
2007-06-02 08:41:19 +10:00
Andrew Tridgell
7db1d04d5c
make the running of the takeover and release event scripts async, to prevent outages due to slow scripts
(This used to be ctdb commit 4189be97eee7ab2a50335c860f2fcd9566667d01)
2007-06-01 19:05:41 +10:00
Andrew Tridgell
bf3b740a1b
ctdb is GPL not LGPL
(This used to be ctdb commit 8624378010d1c2a1438e1e701339dfba7276f960)
2007-05-31 13:50:53 +10:00
Andrew Tridgell
1e72af9c51
close sockets when we exec scripts
(This used to be ctdb commit 0fac2164db4279db2d7d376a34be05b890304087)
2007-05-30 15:43:25 +10:00
Andrew Tridgell
8ed48aac51
don't start the transport connecting to the other nodes until after the startup event script has run
(This used to be ctdb commit afca3cc74211aa2e18b1f74d36b2add8dffcfdc7)
2007-05-30 13:26:50 +10:00
Andrew Tridgell
2d9e0ad56a
use /etc/services for ctdb
(This used to be ctdb commit 64bf6964ff33320c5351337c7f8ed4da5bd71275)
2007-05-29 15:15:00 +10:00
Andrew Tridgell
1140d5a20a
fixed more warnings on 64 bit boxes
(This used to be ctdb commit 2f6eae476203f8a8b28e083553204c01f224c8a5)
2007-05-29 13:58:41 +10:00
Andrew Tridgell
bc891232b6
fixed some debug messages
(This used to be ctdb commit 037f0149c0c0e65af0a1669b9a52586129e4b48f)
2007-05-29 13:48:30 +10:00
Andrew Tridgell
edcaa0d6a0
clean shutdown in ctdb - release all our IPs
(This used to be ctdb commit 2f196cb6a86eb85205d7de1c4cadd4e1e701c06f)
2007-05-29 13:33:59 +10:00
Andrew Tridgell
ead091449b
call the event script on recovery too
(This used to be ctdb commit 8c43a91cbd6e502c93bd6cc51df1272eae426709)
2007-05-29 12:55:24 +10:00
Andrew Tridgell
dfadb60318
- moved ctdbd specific options to ctdbd.c from cmdline.c
- allow an event script to be specified that will take IPs, release
IPs, and handle recovery in system specific ways
- redirect stderr in subcommands to the log
(This used to be ctdb commit de0fc9ba370db781f9c46406ed180c8211946c7a)
2007-05-29 12:49:25 +10:00
Andrew Tridgell
ccf4d78e04
- renamed ctdb_control utility to ctdb
- use -n to specify node number in ctdb utility
- change 'ctdb status' to 'ctdb statistics'
- added 'ctdb status' which shows status
- added netmask to public IPs, so you don't try a takeover on a
foreign network
- cleaned up tools/ctdb_control.c a lot
- generate usage message at runtime
(This used to be ctdb commit 28de71c03ace7d32a9fd9882fabbd5d668b97656)
2007-05-29 12:16:59 +10:00
Andrew Tridgell
9cc3ce8554
automatic cleanup of tcp tickle records
(This used to be ctdb commit ede79b571bf89b89f1b8394f262ca0689f8c65f3)
2007-05-28 00:34:40 +10:00
Andrew Tridgell
d41290fbae
added code to ctdb to send a tcp 'tickle' ack when we take over an
IP. A raw tcp ack is sent for each tcp connection held by clients
before the IP takeover.
These acks have a deliberately incorrect sequence number, and should
cause the windows client to send its own ack which will in turn cause
a tcp reset and thus cause windows clients to much more quickly
reconnect to the new node.
(This used to be ctdb commit eef38bfe8461b47489d169c61895d6bb8a8f79a1)
2007-05-27 15:26:29 +10:00
Andrew Tridgell
647540253e
tweak timeouts
(This used to be ctdb commit 54a90797469f56d796efd82e9294efff3c5dabcc)
2007-05-27 09:43:25 +10:00
Andrew Tridgell
cc4d8102cd
moved system specific ip code to system.c
(This used to be ctdb commit 9de9e4ccda9665108baac12a8716b189d26340b1)
2007-05-26 14:01:08 +10:00
Andrew Tridgell
31053286c5
keep sending ARPs for 2 minutes, every 5 seconds
(This used to be ctdb commit d5223f2eed4a762b93a101c720286568578ce7ed)
2007-05-25 21:27:26 +10:00
Andrew Tridgell
7a9e40b288
consider a node dead after 6 seconds, not 15
(This used to be ctdb commit b055907f0bd2fa0e83bd84e49039fa868905b941)
2007-05-25 20:00:06 +10:00
Andrew Tridgell
56e3eed3d1
added IP takeover logic for public IPs to ctdb
(This used to be ctdb commit 374adb729472670f35cef41269b8719f49c0de0e)
2007-05-25 17:04:13 +10:00
Ronnie Sahlberg
2b6c39a0af
add controls to take over and release an ip address
add sending of grat arp both normal grat arp (request) and also
unsolicited grat arp replies
(This used to be ctdb commit 7305c00c21c30bdbafc3722a018513378bd307e6)
2007-05-25 13:05:25 +10:00
Andrew Tridgell
7596347844
make ctdbd realtime if possible
(This used to be ctdb commit 8852f6cca52b64a5239c83ab7c6a99ae4edb2597)
2007-05-24 14:52:10 +10:00
Andrew Tridgell
70912e2b0c
added automatic vacuuming of empty records during recovery
(This used to be ctdb commit f9181a784ac7009df5e9c996f4e0c3e99098b59a)
2007-05-23 17:21:14 +10:00
Andrew Tridgell
74bf76ca10
merge from ronnie
(This used to be ctdb commit 267481b67152bc5885884d223085aa9ef5fe73bd)
2007-05-23 14:50:41 +10:00
Andrew Tridgell
76b2822340
- startup frozen, and do an initial recovery
- fixed a bug in traverse
- get a lock on the node list file in the recmaster recovery daemon
(This used to be ctdb commit 162a5647535ad1cb3e8e5d4042a2784365fb1913)
2007-05-23 14:35:19 +10:00
Andrew Tridgell
9f7a70657f
start ctdb frozen, and let the election sort things out. This prevents a race on startup
(This used to be ctdb commit b788ed3fa64e31e517b4e602e8bd3ae7201ecddd)
2007-05-23 12:23:07 +10:00
Ronnie Sahlberg
e989a1bac8
add controls to enable/disable the monitoring of dead nodes
(This used to be ctdb commit 79d29c39bb81feb069db3fc6d3d392c1e75a4d13)
2007-05-21 09:24:34 +10:00
Andrew Tridgell
d549f1e1a3
merge from ronnie
(This used to be ctdb commit 985d718e03510398b9a5cfdf6a4d559a90738a11)
2007-05-19 17:21:58 +10:00
Ronnie Sahlberg
02a9f1b0a0
use ctdb_dead_node() instead of reimplementing the same code again
this leaves only one single function where a node is marked as dead
instead of two places
(This used to be ctdb commit aa764ea26cc26d5c1ae188105236da603576f45b)
2007-05-19 16:59:10 +10:00