mirror of https://github.com/samba-team/samba.git synced 2025-01-11 05:18:09 +03:00
Commit Graph

540 Commits

Author SHA1 Message Date
Andrew Tridgell
698d2a6af4 added nonblocking variants of the two lockall functions to tdb
(This used to be ctdb commit 2e99fa41ce01fa282bc0f3244ca42a78173743ed)
2007-05-10 17:43:08 +10:00
Andrew Tridgell
15bc97cdaa better timeout handling for calls, controls and traverses
(This used to be ctdb commit 63346a6c59d4821b4c443939b5d88db8cd20f5fe)
2007-05-10 14:06:48 +10:00
Andrew Tridgell
31cd92dc7e merge from ronnie
(This used to be ctdb commit 92b7a849565730744c75a7fb776173554e9f57bf)
2007-05-10 13:15:58 +10:00
Andrew Tridgell
50390bcb18 setup the random number generator a bit better
(This used to be ctdb commit 708585eb0ed31b0df6543a1d7a20b82e751877c2)
2007-05-10 13:10:23 +10:00
Ronnie Sahlberg
a54390197a create a correct vnnmap structure to prevent a segv
(This used to be ctdb commit 17777bb5e6208e97a82a171243c6c406f53ee02e)
2007-05-10 10:10:58 +10:00
Ronnie Sahlberg
82e37a9886 update ctdb_control to create a correct ctdb_vnn_map->map array
(This used to be ctdb commit e510cc89068557881688d6cada38915b3e51f8cd)
2007-05-10 10:03:21 +10:00
Ronnie Sahlberg
a56a2501ac when starting a new election, also force all nodes into recovery mode so
there is no internode traffic to interfere with our election

(This used to be ctdb commit ccfb67a076c72a0e7f2b6dc5fce9c19f652ba2ad)
2007-05-10 09:48:14 +10:00
Ronnie Sahlberg
4370dc1e75 when starting recovery repoint dmaster to an invalid node and not the
current vnn

(This used to be ctdb commit 3c2dcc7448b335cf42e8f7edffba21229dccbd79)
2007-05-10 09:46:10 +10:00
Ronnie Sahlberg
325f321409 merge from tridge
(This used to be ctdb commit 8c5e6836280499243c0cd247093844a891f00da3)
2007-05-10 09:44:28 +10:00
Ronnie Sahlberg
639e4374e5 actually check the remote nodes and not just the local node
(This used to be ctdb commit 09df21be6361743d320fafc120718211eece85c3)
2007-05-10 09:43:01 +10:00
Andrew Tridgell
1e38ae491f remove old s3 recovery code
fixed vnnmap wire format in recover daemon

(This used to be ctdb commit e03fab7bfe0cf43f40c49a3d63e75dc44001d8d8)
2007-05-10 08:49:57 +10:00
Andrew Tridgell
2a82665532 fixed setvnnmap to use wire structures too
(This used to be ctdb commit 1208e4219d220b80e2f74974cac8ed2b8956d3ef)
2007-05-10 08:22:26 +10:00
Andrew Tridgell
682df74d59 separate the wire format and internal format for the vnn_map
(This used to be ctdb commit 9a71718d87c5162f1423d85c2e86a01f6771925e)
2007-05-10 08:13:19 +10:00
Andrew Tridgell
a8f83423f4 moved the vnn_map initialisation out of the cmdline code
(This used to be ctdb commit 81492b840d608dc724d5a25ddef6eb0ce12b95fb)
2007-05-10 07:55:46 +10:00
Andrew Tridgell
ba47b43c6b merged ronnie's code to delay client requests when in recovery mode
(This used to be ctdb commit dfca37076d642f3407c63dfe3b685287d27c8f8d)
2007-05-10 07:43:18 +10:00
Ronnie Sahlberg
cbb6f99f41 merge from tridge
(This used to be ctdb commit 190cca8488dff982062ae7b1a82cb33cc1cdfaf7)
2007-05-10 06:55:28 +10:00
Ronnie Sahlberg
bbaaf2bbf4 hang the event from the retry structure instead of the hdr structure
(This used to be ctdb commit 8536c8c3a30a986ba4945d02aef82b47495ce3f8)
2007-05-09 14:08:11 +10:00
Ronnie Sahlberg
c938c1b5de when we are in recovery mode and we get a REQ_CALL from a client,
defer it for one second and try again   

(This used to be ctdb commit 606fb6414b97d1813056982cda7c0fe84d746e67)
2007-05-09 14:06:47 +10:00
Andrew Tridgell
d2a90cc5a5 merge from ronnie
(This used to be ctdb commit f67a4842e7b1efb2ad61c41e4895c7698e564bf3)
2007-05-09 11:54:37 +10:00
Ronnie Sahlberg
6929739b7f add a command line flag to ctdbd to start a recovery daemon.
update the recovery test script to start all ctdb daemons with a 
recovery daemon

(This used to be ctdb commit 47794e16df285cacefc30208d892d931a6e46b96)
2007-05-09 09:59:23 +10:00
Ronnie Sahlberg
92333fce03 change the name of the recovery daemon to ctdb_recoverd
(This used to be ctdb commit b0cf919e4f38961e5cf4e1e79a0cfe4bb4a96d76)
2007-05-09 09:31:53 +10:00
Ronnie Sahlberg
2befe18e29 add a small tool to monitor recovery
(This used to be ctdb commit b45936828713c31ee670e2106b49c2351234f310)
2007-05-09 08:05:53 +10:00
Andrew Tridgell
fdb8144e62 fixed a problem with the number of timed events growing without bound with the new seqnum code
(This used to be ctdb commit 6109ae3dae8d93c93a2dc76cc561ea6e21458aa6)
2007-05-08 21:16:29 +10:00
Ronnie Sahlberg
5efa3d88c5 we must repoint dmaster to an invalid node during recovery to stop the
shortcut from working

(This used to be ctdb commit 5e18930be8c0efb87aa9e2780d9457634b24e156)
2007-05-08 14:51:55 +10:00
Ronnie Sahlberg
e11eebd070 fix alignment bug for pulldb
(This used to be ctdb commit f1188289c18805c2c5f8bae61d73df3fc762faee)
2007-05-08 14:42:00 +10:00
Ronnie Sahlberg
54d2acec40 merge from tridge
(This used to be ctdb commit da8636707547e77c76dc7e368ddfae35b8a21402)
2007-05-07 08:07:26 +10:00
Andrew Tridgell
d98d8d4de3 merged from ronnie
(This used to be ctdb commit 49aad9fb09ca2c787e6f82ba03cb229cc51844f0)
2007-05-07 07:56:38 +10:00
Ronnie Sahlberg
a1866c6eeb hang the timeout event off state and thus we don't need to explicitly
free it, and also we won't accidentally return from the function without
killing the event first

(This used to be ctdb commit e3d72d024ef7342a808e5c488fd646a39e5fac78)
2007-05-07 07:54:17 +10:00
Ronnie Sahlberg
6bfb5f61ca it now works to talloc_free() the timed event if we no longer want it to
trigger

this must have been a side effect of a different bug in the recoverd.c
code that has now been fixed

(This used to be ctdb commit 676446fd1083c371ad0ff72dd8c636ec8e6d1423)
2007-05-07 07:47:16 +10:00
Ronnie Sahlberg
39d81cffb1 recovery daemon with recovery master election
election is primitive, it elects the lowest vnn as the recovery master

two new controls, to get/set recovery master for a node

to use the recovery daemon, start one
./bin/recoverd --socket=ctdb.socket*
for each ctdb daemon

it has been briefly tested by deleting and adding nodes to a 4-node
cluster but needs more testing

(This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3)
2007-05-07 06:51:58 +10:00
Ronnie Sahlberg
a9657f6aa5 add new controls to get and set the recovery master node of a daemon
i.e. which node is "elected" to check for and drive recovery

(This used to be ctdb commit d577093eb4b619392c71ab5ce81e8c02565d93f0)
2007-05-07 05:02:48 +10:00
Ronnie Sahlberg
97bc457321 in the function that checks whether the cluster needs recovery,
add a test that all active nodes are in normal mode.
If we discover that some node is still in recovery mode, it may indicate
that a previous recovery ended prematurely and thus we should start a
new recovery

(This used to be ctdb commit c15517872e6c98c8c425a8d47d2b348ecb0620b0)
2007-05-07 04:41:12 +10:00
Ronnie Sahlberg
1c438a7256 update a comment to be more descriptive
(This used to be ctdb commit 96082c54d830974bf9a4d5bad33ad60379a85798)
2007-05-06 12:46:56 +10:00
Ronnie Sahlberg
1fa2bf831a change a lot of printf into debug statements
(This used to be ctdb commit 6edb9149c7eb36da47e4e6a9dd3ede22263ce3f9)
2007-05-06 10:51:25 +10:00
Ronnie Sahlberg
8a12672992 break out the code to update all nodes to the new vnnmap into a helper
function

(This used to be ctdb commit 81d39177949b54715710907d14ddc888dc09b064)
2007-05-06 10:42:18 +10:00
Ronnie Sahlberg
ee83202da6 create a helper function for recovery to push all local databases out
onto the remote nodes

(This used to be ctdb commit 1ba76d374652cfa29e56fb77c7190349e42d3bcc)
2007-05-06 10:38:44 +10:00
Ronnie Sahlberg
5fb41f4c3b add an extra blank line
(This used to be ctdb commit 75096dde58df6532abbf5b9ebd771e8810156483)
2007-05-06 10:30:18 +10:00
Ronnie Sahlberg
9281cb192c break the code that repoints dmaster for all local and remote records
into a separate helper function

(This used to be ctdb commit d5ab30d0ac21e736eb34eaa19bccfee5f0ce7cfb)
2007-05-06 10:22:13 +10:00
Ronnie Sahlberg
d51a19f2ba create a helper function for recovery that pulls and merges all remote
databases onto the local node

(This used to be ctdb commit 5cecc47449c369f91e83389a94b987ac32b1e3f4)
2007-05-06 10:16:48 +10:00
Ronnie Sahlberg
d6ce023c68 create a helper function to make sure the local node that does recovery
has all the databases that exist on any other remote node

(This used to be ctdb commit 0f436e3d40fea6e6a146019b0c664e80e81e88b4)
2007-05-06 10:12:42 +10:00
Ronnie Sahlberg
0e436f5058 add a helper function to create all missing remote databases detected
during recovery

(This used to be ctdb commit 04758c6f7d8f61260be6d2472380cb7904984427)
2007-05-06 10:04:37 +10:00
Ronnie Sahlberg
cadfb24b41 break out the setting/clearing of recovery mode into a dedicated helper
function

(This used to be ctdb commit dba4e4f8aa4f2fde1e9f8d93bdf3a33f7de8ce18)
2007-05-06 09:53:12 +10:00
Ronnie Sahlberg
c9aafae5ce don't allocate arrays where we can just return a single integer
(This used to be ctdb commit 07bc338e490e0f7018808a2450bc54863eb88c94)
2007-05-06 08:05:22 +10:00
Ronnie Sahlberg
dceab7ff3e don't use arrays where a uint32_t works just as well
(This used to be ctdb commit 843e974b29c93df891ae7cf13323ee960a334f60)
2007-05-06 07:52:20 +10:00
Ronnie Sahlberg
ad41dff7bf add an ifdeffed-out block to the call.
we really should kill the event in case the call completed before the
timeout, so that we can also make timed_out non-static

(This used to be ctdb commit f297eed589b1d4e188f77f195683365cf91d0e62)
2007-05-06 07:32:16 +10:00
Ronnie Sahlberg
4f2cdc2d8b the timed_out variable needs to be static and cannot be on the stack,
since if the command times out and we return from ctdb_control we may
have events that can trigger later which will overwrite data that is no
longer in our stackframe

(This used to be ctdb commit 93942543092be618c0bd8ef68b470b0789bad7ad)
2007-05-06 07:07:47 +10:00
Ronnie Sahlberg
c6bd23ee11 update to the recovery daemon
if ctdb_ctrl_ calls are timed out due to nodes arriving or leaving the
cluster, it crashes the recovery daemon afterwards with a SEGV but no
useful stack backtrace

(This used to be ctdb commit cd3abc7349e86555ccd87cd47a1dcc2adad2f46c)
2007-05-06 06:58:01 +10:00
Ronnie Sahlberg
60d4b0e8b4 in the recover test
start the daemons with explicit socket names and explicit ip address/port

remove all --socket= from all ctdb_control calls since they are not
needed anymore

(This used to be ctdb commit 593a959d428f5b4a913117a9b5c8fe65a3eb950e)
2007-05-06 06:06:39 +10:00
Ronnie Sahlberg
7bbcc964f2 add support in catdb to dump the content of a specific node's tdb instead
of traversing the full cluster.
this makes it easier to debug recovery

update the test script for recovery to reflect the newish signatures to
ctdb_control

the catdb control does still segfault, however, when there are missing
nodes in the cluster, as there are toward the end of the recovery test

(This used to be ctdb commit 8de2a97c14a444f817ceb36461314f10c9601ecc)
2007-05-06 05:53:15 +10:00
Ronnie Sahlberg
0f6d9c73d8 merge from tridge
(This used to be ctdb commit 08173e3ab77178b9841db0081a51b93291d9e8dc)
2007-05-06 04:38:41 +10:00