Andrew Tridgell
cb81a2eca8
watch for the freeze child exiting
...
(This used to be ctdb commit 7f350eca8598022ebd198b2476d1f2c2a8f03a8d)
2007-05-12 15:44:35 +10:00
Andrew Tridgell
f7e3004f0a
more robust freeze/thaw logic
...
(This used to be ctdb commit 51c1e51aeb7dfac1683584df7ef1bef98c092f76)
2007-05-12 15:29:06 +10:00
Andrew Tridgell
9cf77dd23f
separate out the freeze/thaw handling from recovery
...
(This used to be ctdb commit 0b0640bd8b8334961f240e0cf276ac112cd6e616)
2007-05-12 15:15:27 +10:00
Andrew Tridgell
74a799a83b
added lockwait child code for entering recovery mode. A child process holds lockall locks for the entire recovery process
...
(This used to be ctdb commit f892f30def75b0d964c35eae38c4cf675597dd28)
2007-05-12 14:34:21 +10:00
Andrew Tridgell
ae55e4181d
added _mark calls for tdb_lockall
...
(This used to be ctdb commit e59134fd2af67c746b907c23fdcde2eccbbe17cf)
2007-05-12 14:33:10 +10:00
Andrew Tridgell
85aff64ed8
fixed debug message
...
(This used to be ctdb commit 9802bf1ef9104b31977020e803b0f81da71c7169)
2007-05-11 17:29:21 +10:00
Ronnie Sahlberg
0c9bb4bb44
we have to get a NEW generation id after completing recovery
...
to solve a race condition with the logic to retransmit in
ctdb_call.c/ctdb_call_timeout()
(This used to be ctdb commit 1044ddca9ff5c434816de35d3f659aa182704e97)
2007-05-11 12:03:19 +10:00
Ronnie Sahlberg
7769a2d45e
merge from tridge
...
(This used to be ctdb commit 826058b547b8e836f0a7066e9479e481ad9c472e)
2007-05-11 10:37:42 +10:00
Ronnie Sahlberg
9ec3024287
add a control to bump the rsn number for all records in a database
...
use this control from the recovery daemon to ensure that the recmaster
always has a higher rsn than any other node for the records after
recovery completes
(This used to be ctdb commit 6fb6a8b981a804bfcc460c4481c51c7c647230f6)
2007-05-11 10:36:47 +10:00
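A minimal sketch of the idea behind the rsn-bump control described in the commit above, assuming records are held in a plain in-memory array. The `db_record` struct and `bump_all_rsn` function are hypothetical names used only for illustration; they are not the ctdb control or its wire format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record header; the real ctdb header has a different layout. */
struct db_record {
    uint64_t rsn;      /* record sequence number */
    uint32_t dmaster;  /* node currently acting as data master */
};

/*
 * Increment the rsn of every record by one. If the caller already holds
 * the highest rsn seen for each record (for example after a merge), the
 * bump makes its copies strictly newer than any copy on another node.
 */
static void bump_all_rsn(struct db_record *recs, size_t count)
{
    assert(recs != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        recs[i].rsn++;
    }
}
```

The sketch only shows the ordering argument; how the recovery daemon invokes the control on the recmaster is not covered here.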
Andrew Tridgell
63acf8ab95
- merge from ronnie
...
- increment rsn only in become_dmaster
- add torture check for rsn regression in ctdb_ltdb_store
(This used to be ctdb commit 8047506a08bb53ee01aa64f25c9f72839e1e2d68)
2007-05-11 10:33:43 +10:00
Ronnie Sahlberg
9eeb4f1a51
we must bump the rsn every time we do a REQ_DMASTER or a REPLY_DMASTER
...
to make sure that the "merge records based on rsn during recovery" will
merge correctly.
this is extra important since samba3 never bumps the record when it
writes new data to it !
(This used to be ctdb commit 857e67204065603592c2dbbadbd8667ebba9ccdb)
2007-05-11 06:08:17 +10:00
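The commit above relies on merging records by rsn during recovery. The following is a rough sketch of that rule, assuming a simplified record header: `rec_header`, `pick_newer`, and `migrate_dmaster` are hypothetical names, and keeping the local copy on a tie is an assumption rather than something the log states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record header; not the real ctdb structure. */
struct rec_header {
    uint64_t rsn;      /* record sequence number */
    uint32_t dmaster;  /* node currently acting as data master */
};

/*
 * Merge rule: keep whichever copy has the higher rsn.
 * On a tie the local copy is kept (an assumption for this sketch).
 */
static const struct rec_header *pick_newer(const struct rec_header *local,
                                           const struct rec_header *remote)
{
    assert(local != NULL && remote != NULL);
    return (remote->rsn > local->rsn) ? remote : local;
}

/*
 * Bump the rsn whenever the data master changes, so that the copy held
 * by the new dmaster wins a later rsn-based merge even if the writer
 * never increments the rsn itself.
 */
static void migrate_dmaster(struct rec_header *rec, uint32_t new_dmaster)
{
    assert(rec != NULL);
    rec->dmaster = new_dmaster;
    rec->rsn++;
}
```

Without the bump in `migrate_dmaster`, two nodes could hold different data under the same rsn, and the merge would have no way to tell which copy is newer.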
Ronnie Sahlberg
325713dfeb
make ctdb_control catdb work again
...
(This used to be ctdb commit 40a8fb68c71be0b9f54ae88bf8aa39a4c71f3b5a)
2007-05-11 05:40:11 +10:00
Andrew Tridgell
f8765b19bf
- got rid of the complex hand marshalling in the recovery controls
...
- fixed the re-send of ctdb calls after a generation change
- fixed a reqid idr leak in controls
- removed the write_record test code
- use the new nonblock lockall code to prevent ctdbd from ever doing a
blocking lock that could deadlock with smbd
- moved more of the recovery controls into ctdb_recover.c
(This used to be ctdb commit 565a21aa4f1e842309986ab97d6244801153deec)
2007-05-10 17:43:45 +10:00
Andrew Tridgell
698d2a6af4
added nonblocking variants of the two lockall functions to tdb
...
(This used to be ctdb commit 2e99fa41ce01fa282bc0f3244ca42a78173743ed)
2007-05-10 17:43:08 +10:00
Andrew Tridgell
15bc97cdaa
better timeout handling for calls, controls and traverses
...
(This used to be ctdb commit 63346a6c59d4821b4c443939b5d88db8cd20f5fe)
2007-05-10 14:06:48 +10:00
Andrew Tridgell
31cd92dc7e
merge from ronnie
...
(This used to be ctdb commit 92b7a849565730744c75a7fb776173554e9f57bf)
2007-05-10 13:15:58 +10:00
Andrew Tridgell
50390bcb18
setup the random number generator a bit better
...
(This used to be ctdb commit 708585eb0ed31b0df6543a1d7a20b82e751877c2)
2007-05-10 13:10:23 +10:00
Ronnie Sahlberg
a54390197a
create a correct vnnmap structure to prevent a segv
...
(This used to be ctdb commit 17777bb5e6208e97a82a171243c6c406f53ee02e)
2007-05-10 10:10:58 +10:00
Ronnie Sahlberg
82e37a9886
update ctdb_control to create a correct ctdb_vnn_map->map array
...
(This used to be ctdb commit e510cc89068557881688d6cada38915b3e51f8cd)
2007-05-10 10:03:21 +10:00
Ronnie Sahlberg
a56a2501ac
when starting a new election, also force all nodes into recovery mode so
...
there is no internode traffic to interfere with our election
(This used to be ctdb commit ccfb67a076c72a0e7f2b6dc5fce9c19f652ba2ad)
2007-05-10 09:48:14 +10:00
Ronnie Sahlberg
4370dc1e75
when starting recovery repoint dmaster to an invalid node and not the
...
current vnn
(This used to be ctdb commit 3c2dcc7448b335cf42e8f7edffba21229dccbd79)
2007-05-10 09:46:10 +10:00
Ronnie Sahlberg
325f321409
merge from tridge
...
(This used to be ctdb commit 8c5e6836280499243c0cd247093844a891f00da3)
2007-05-10 09:44:28 +10:00
Ronnie Sahlberg
639e4374e5
actually check the remote nodes and not just the local node
...
(This used to be ctdb commit 09df21be6361743d320fafc120718211eece85c3)
2007-05-10 09:43:01 +10:00
Andrew Tridgell
1e38ae491f
remove old s3 recovery code
...
fixed vnnmap wire format in recover daemon
(This used to be ctdb commit e03fab7bfe0cf43f40c49a3d63e75dc44001d8d8)
2007-05-10 08:49:57 +10:00
Andrew Tridgell
2a82665532
fixed setvnnmap to use wire structures too
...
(This used to be ctdb commit 1208e4219d220b80e2f74974cac8ed2b8956d3ef)
2007-05-10 08:22:26 +10:00
Andrew Tridgell
682df74d59
separate the wire format and internal format for the vnn_map
...
(This used to be ctdb commit 9a71718d87c5162f1423d85c2e86a01f6771925e)
2007-05-10 08:13:19 +10:00
Andrew Tridgell
a8f83423f4
moved the vnn_map initialisation out of the cmdline code
...
(This used to be ctdb commit 81492b840d608dc724d5a25ddef6eb0ce12b95fb)
2007-05-10 07:55:46 +10:00
Andrew Tridgell
ba47b43c6b
merged ronnie's code to delay client requests when in recovery mode
...
(This used to be ctdb commit dfca37076d642f3407c63dfe3b685287d27c8f8d)
2007-05-10 07:43:18 +10:00
Ronnie Sahlberg
cbb6f99f41
merge from tridge
...
(This used to be ctdb commit 190cca8488dff982062ae7b1a82cb33cc1cdfaf7)
2007-05-10 06:55:28 +10:00
Ronnie Sahlberg
bbaaf2bbf4
hang the event from the retry structure instead of the hdr structure
...
(This used to be ctdb commit 8536c8c3a30a986ba4945d02aef82b47495ce3f8)
2007-05-09 14:08:11 +10:00
Ronnie Sahlberg
c938c1b5de
when we are in recovery mode and we get a REQ_CALL from a client,
...
defer it for one second and try again
(This used to be ctdb commit 606fb6414b97d1813056982cda7c0fe84d746e67)
2007-05-09 14:06:47 +10:00
Andrew Tridgell
d2a90cc5a5
merge from ronnie
...
(This used to be ctdb commit f67a4842e7b1efb2ad61c41e4895c7698e564bf3)
2007-05-09 11:54:37 +10:00
Ronnie Sahlberg
6929739b7f
add a command line flag to ctdbd to start a recovery daemon.
...
update the recovery test script to start all ctdb daemons with a
recovery daemon
(This used to be ctdb commit 47794e16df285cacefc30208d892d931a6e46b96)
2007-05-09 09:59:23 +10:00
Ronnie Sahlberg
92333fce03
change the name of the recovery daemon to ctdb_recoverd
...
(This used to be ctdb commit b0cf919e4f38961e5cf4e1e79a0cfe4bb4a96d76)
2007-05-09 09:31:53 +10:00
Ronnie Sahlberg
2befe18e29
add a small tool to monitor recovery
...
(This used to be ctdb commit b45936828713c31ee670e2106b49c2351234f310)
2007-05-09 08:05:53 +10:00
Andrew Tridgell
fdb8144e62
fixed a problem with the number of timed events growing without bound with the new seqnum code
...
(This used to be ctdb commit 6109ae3dae8d93c93a2dc76cc561ea6e21458aa6)
2007-05-08 21:16:29 +10:00
Ronnie Sahlberg
5efa3d88c5
we must repoint dmaster to an invalid node during recovery to stop the
...
shortcut from working
(This used to be ctdb commit 5e18930be8c0efb87aa9e2780d9457634b24e156)
2007-05-08 14:51:55 +10:00
Ronnie Sahlberg
e11eebd070
fix alignment bug for pulldb
...
(This used to be ctdb commit f1188289c18805c2c5f8bae61d73df3fc762faee)
2007-05-08 14:42:00 +10:00
Ronnie Sahlberg
a1866c6eeb
hang the timeout event off state so that we don't need to explicitly
...
free it, and we also won't accidentally return from the function without
killing the event first
(This used to be ctdb commit e3d72d024ef7342a808e5c488fd646a39e5fac78)
2007-05-07 07:54:17 +10:00
Ronnie Sahlberg
6bfb5f61ca
it now works to talloc_free() the timed event if we no longer want it to
...
trigger
this must have been a side effect of a different bug in the recoverd.c
code that has now been fixed
(This used to be ctdb commit 676446fd1083c371ad0ff72dd8c636ec8e6d1423)
2007-05-07 07:47:16 +10:00
Ronnie Sahlberg
39d81cffb1
recovery daemon with recovery master election
...
election is primitive, it elects the lowest vnn as the recovery master
two new controls, to get/set recovery master for a node
to use recovery daemon, start one
./bin/recoverd --socket=ctdb.socket*
for each ctdb daemon
it has been briefly tested by deleting and adding nodes to a 4 node
cluster but needs more testing
(This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3)
2007-05-07 06:51:58 +10:00
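The "lowest vnn wins" election from the commit above can be sketched as follows. This is an illustration only: `node_info`, `INVALID_VNN`, and `elect_recmaster` are hypothetical names, and restricting the election to active nodes is an assumption, not something the log confirms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical marker for "no recovery master"; not a ctdb constant. */
#define INVALID_VNN UINT32_MAX

/* Hypothetical view of a cluster node; not the real ctdb structure. */
struct node_info {
    uint32_t vnn;   /* virtual node number */
    int active;     /* non-zero if the node is currently reachable */
};

/*
 * Return the lowest vnn among the active nodes, or INVALID_VNN when no
 * node is active. A node whose vnn equals INVALID_VNN can never be elected.
 */
static uint32_t elect_recmaster(const struct node_info *nodes, size_t count)
{
    uint32_t best = INVALID_VNN;

    assert(nodes != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (nodes[i].active && nodes[i].vnn < best) {
            best = nodes[i].vnn;
        }
    }
    return best;
}
```

Because every node applies the same deterministic rule to the same node list, all nodes that agree on which nodes are active arrive at the same recovery master without further negotiation.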
Ronnie Sahlberg
a9657f6aa5
add new controls to get and set the recovery master node of a daemon
...
i.e. which node is "elected" to check for and drive recovery
(This used to be ctdb commit d577093eb4b619392c71ab5ce81e8c02565d93f0)
2007-05-07 05:02:48 +10:00
Ronnie Sahlberg
97bc457321
add a test to the function that checks whether the cluster needs
...
recovery, verifying that all active nodes are in normal mode.
If we discover that some node is still in recovery mode, it may indicate
that a previous recovery ended prematurely and thus we should start a
new recovery
(This used to be ctdb commit c15517872e6c98c8c425a8d47d2b348ecb0620b0)
2007-05-07 04:41:12 +10:00
Ronnie Sahlberg
1c438a7256
update a comment to be more descriptive
...
(This used to be ctdb commit 96082c54d830974bf9a4d5bad33ad60379a85798)
2007-05-06 12:46:56 +10:00
Ronnie Sahlberg
1fa2bf831a
change a lot of printf into debug statements
...
(This used to be ctdb commit 6edb9149c7eb36da47e4e6a9dd3ede22263ce3f9)
2007-05-06 10:51:25 +10:00
Ronnie Sahlberg
8a12672992
break out the code to update all nodes to the new vnnmap into a helper
...
function
(This used to be ctdb commit 81d39177949b54715710907d14ddc888dc09b064)
2007-05-06 10:42:18 +10:00
Ronnie Sahlberg
ee83202da6
create a helper function for recovery to push all local databases out
...
onto the remote nodes
(This used to be ctdb commit 1ba76d374652cfa29e56fb77c7190349e42d3bcc)
2007-05-06 10:38:44 +10:00
Ronnie Sahlberg
5fb41f4c3b
add an extra blank line
...
(This used to be ctdb commit 75096dde58df6532abbf5b9ebd771e8810156483)
2007-05-06 10:30:18 +10:00
Ronnie Sahlberg
9281cb192c
break the code that repoints dmaster for all local and remote records
...
into a separate helper function
(This used to be ctdb commit d5ab30d0ac21e736eb34eaa19bccfee5f0ce7cfb)
2007-05-06 10:22:13 +10:00
Ronnie Sahlberg
d51a19f2ba
create a helper function for recovery that pulls and merges all remote
...
databases onto the local node
(This used to be ctdb commit 5cecc47449c369f91e83389a94b987ac32b1e3f4)
2007-05-06 10:16:48 +10:00