1
0
mirror of https://github.com/samba-team/samba.git synced 2024-12-24 21:34:56 +03:00
Commit Graph

193 Commits

Author SHA1 Message Date
Ronnie Sahlberg
9f7b9faf64 add a node->tx_cnt counter
only send keepalive packets if the count is zero

(This used to be ctdb commit 2cbd424231caccf0a531cf6501761115efe68f3e)
2007-05-19 10:20:19 +10:00
Andrew Tridgell
28f2fc669b a better way to resend calls after recovery
(This used to be ctdb commit 444f52e134fc22aaf254d05c86d8b357ded876f4)
2007-05-19 00:56:49 +10:00
Andrew Tridgell
049e1504ee timeout pending controls immediately when a node becomes disconnected
(This used to be ctdb commit 93c4b16f4efef383ba8db83953019ef4821613e0)
2007-05-18 23:48:29 +10:00
Andrew Tridgell
346dfc1bef - up rx_cnt on all packet types
- notice when a node becomes available again

(This used to be ctdb commit e05110dd6112e81f224937dfd7370d963ce9531a)
2007-05-18 23:23:36 +10:00
Ronnie Sahlberg
db4c479568 add dead node detection so that if a node does not generate any
keepalive traffic for x seconds   it is deemed dead


this triggers a recovery after a while if a ctdbd has been STOPPED    
but it doesnt recover automatically when the node reappears

(This used to be ctdb commit d6324afe0d13b5e21d06e347caca433c6b36a32a)
2007-05-18 19:19:35 +10:00
Andrew Tridgell
49fe66713f - don't try to send controls to dead nodes
- use only connected nodes in a traverse

(This used to be ctdb commit 9a676dd5d331022d946a56c52c42fc6985b93dbc)
2007-05-17 23:23:41 +10:00
Andrew Tridgell
874fd5c2f7 removed the CTDB_CTRL_FLAG_NOREQUEUE flag
(This used to be ctdb commit 366e849f6f350eda78d79cf1ea55c2637e605c86)
2007-05-17 14:10:38 +10:00
Ronnie Sahlberg
f4738f9c41 we no longer pass lmaster across during pulldb so dont print it from
catdb either

(This used to be ctdb commit b57d60f4789ea7f0dd69c93f6629d8742e182576)
2007-05-17 12:07:29 +10:00
Ronnie Sahlberg
cc760cf13a add a control to shutdown/kill a node
(This used to be ctdb commit 3802f7304fd59d56062c855987e2561753e85a69)
2007-05-17 10:45:31 +10:00
Ronnie Sahlberg
d6ed77468d merge from tridge
(This used to be ctdb commit 0c6dc471e33e80db00a2b006262c4107f39fa023)
2007-05-16 18:44:51 +10:00
Andrew Tridgell
c105f6d789 - merge from ronnie
- fixed a memory leak found by dmitry

(This used to be ctdb commit ae87bf0005666b50850161c3843d6bc7cb5c8971)
2007-05-16 18:10:26 +10:00
Ronnie Sahlberg
f4056d2e28 remove a prototype we no longer need
(This used to be ctdb commit 4a11373ec5e8196cf430f18f6171915f790f794b)
2007-05-16 14:45:43 +10:00
Ronnie Sahlberg
a4ebb6d5ef if a caller specifies a timeout when calling a control, it makes no
sense to have the daemon requeue the packets if they timeout or fail to 
deliver to the remote node

(This used to be ctdb commit 9fb753046787190970654aeb937e96685ac53184)
2007-05-16 12:34:30 +10:00
Ronnie Sahlberg
4b8ddfccad merge from tridge
(This used to be ctdb commit 8d424b41d6cf2973b28a749d1b8e6a028dad9ffe)
2007-05-16 11:12:28 +10:00
Andrew Tridgell
a5198559c9 moved the recovery daemon into the main ctdbd and enable it by default
(This used to be ctdb commit 2a7d42124731f43d013cb76a798525eab4cc1ee0)
2007-05-15 15:13:36 +10:00
Ronnie Sahlberg
0d71b6d1e6 merge from tridge
(This used to be ctdb commit 0697f59a044deeab126a39bff97bcd5c1101298e)
2007-05-15 10:28:41 +10:00
Andrew Tridgell
c6afe22b92 added a control to get the local vnn
(This used to be ctdb commit 0b109f574b710f290372512d0694290ea7cd4368)
2007-05-15 10:17:16 +10:00
Andrew Tridgell
cf1056df94 added a -i switch to run ctdbd without forking
(This used to be ctdb commit 327df14ecd58f405fbe8b38afa2ee54a8dd0a2e4)
2007-05-15 09:44:33 +10:00
Ronnie Sahlberg
ed466e20b6 remove the control to bump the rsn since we dont need it anymore
(This used to be ctdb commit a646b6d77bd8adf6c986259c534a05400c4bde11)
2007-05-14 08:03:48 +10:00
Ronnie Sahlberg
4f7fc688f7 merge from tridge
(This used to be ctdb commit 7bca79ad6357149fd7c6b28ce4b05de3d223a7de)
2007-05-14 06:25:15 +10:00
Andrew Tridgell
81826da2df added error messages in ctdb_control replies
(This used to be ctdb commit bd848f5b760e6b2a73ebfc67fd8adb3c31479fb5)
2007-05-12 21:25:26 +10:00
Andrew Tridgell
36ccc10389 make sure we ignore requeued ctdb_call packets of older generations except for packets from the client
(This used to be ctdb commit facab105fbd7fe50f96bdd763ae50ddc54fbdacc)
2007-05-12 18:08:50 +10:00
Andrew Tridgell
2c90d9e794 show total frozen/recoving in status
(This used to be ctdb commit 0d0eb66a63fe6912edb85bf7387ac76acb70babd)
2007-05-12 15:51:08 +10:00
Andrew Tridgell
9cf77dd23f separate out the freeze/thaw handling from recovery
(This used to be ctdb commit 0b0640bd8b8334961f240e0cf276ac112cd6e616)
2007-05-12 15:15:27 +10:00
Andrew Tridgell
74a799a83b added lockwait child code for entering recovery mode. A child processes holds lockall locks for the entire recovery process
(This used to be ctdb commit f892f30def75b0d964c35eae38c4cf675597dd28)
2007-05-12 14:34:21 +10:00
Ronnie Sahlberg
9ec3024287 add a control to bump the rsn number for all records in a database
use this control from the recovery daemon to ensure that the recmaster 
always have a higher rsn than andy other node for the records after 
recovery completes

(This used to be ctdb commit 6fb6a8b981a804bfcc460c4481c51c7c647230f6)
2007-05-11 10:36:47 +10:00
Andrew Tridgell
f8765b19bf - got rid of the complex hand marshalling in the recovery controls
- fixed the re-send of ctdb calls after a generation change

- fixed a reqid idr leak in controls

- removed the write_record test code

- use the new nonblock lockall code to prevent ctdbd from ever doing a
  blocking lock that could deadlock with smbd

- moved more of the recovery controls into ctdb_recover.c

(This used to be ctdb commit 565a21aa4f1e842309986ab97d6244801153deec)
2007-05-10 17:43:45 +10:00
Andrew Tridgell
15bc97cdaa better timeout handling for calls, controls and traverses
(This used to be ctdb commit 63346a6c59d4821b4c443939b5d88db8cd20f5fe)
2007-05-10 14:06:48 +10:00
Andrew Tridgell
31cd92dc7e merge from ronnie
(This used to be ctdb commit 92b7a849565730744c75a7fb776173554e9f57bf)
2007-05-10 13:15:58 +10:00
Andrew Tridgell
682df74d59 separate the wire format and internal format for the vnn_map
(This used to be ctdb commit 9a71718d87c5162f1423d85c2e86a01f6771925e)
2007-05-10 08:13:19 +10:00
Andrew Tridgell
d2a90cc5a5 merge from ronnie
(This used to be ctdb commit f67a4842e7b1efb2ad61c41e4895c7698e564bf3)
2007-05-09 11:54:37 +10:00
Ronnie Sahlberg
6929739b7f add a command line flag to ctdbd to start a recovery daemon.
update the recovery test script to start all ctdb daemons with a 
recovery daemon

(This used to be ctdb commit 47794e16df285cacefc30208d892d931a6e46b96)
2007-05-09 09:59:23 +10:00
Andrew Tridgell
fdb8144e62 fixed a problem with the number of timed events growing without bound with the new seqnum code
(This used to be ctdb commit 6109ae3dae8d93c93a2dc76cc561ea6e21458aa6)
2007-05-08 21:16:29 +10:00
Ronnie Sahlberg
39d81cffb1 recovery daemon with recovery master election
election is primitive, it elects the lowest vnn as the recovery master

two new controls, to get/set recovery master for a node



to use recovery daemon,   start one  
./bin/recoverd --socket=ctdb.socket*
for each ctdb daemon


it has been briefly tested by deleting and adding nodes to a 4 node 
cluster but needs more testing

(This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3)
2007-05-07 06:51:58 +10:00
Ronnie Sahlberg
a9657f6aa5 add new controls to get and set the recovery master node of a daemon
i.e. which node is "elected" to check for and drive recovery

(This used to be ctdb commit d577093eb4b619392c71ab5ce81e8c02565d93f0)
2007-05-07 05:02:48 +10:00
Ronnie Sahlberg
25edbc9a50 add a control to get the pid of a daemon.
this makes it possible to kill a specific daemon in the recover test 
script

(This used to be ctdb commit 2fa394b4c80988cb1a6d04b236ec64cc9d9e8a40)
2007-05-06 04:31:22 +10:00
Ronnie Sahlberg
2e64727079 merge from tridge
(This used to be ctdb commit 8648104f8d76d22427c14422b126f7e979cc2d95)
2007-05-05 16:51:34 +10:00
Andrew Tridgell
9636c97c5a show number of connected clients in status output
(This used to be ctdb commit 99765bbe327bfe9c43415f4943281458f25be51b)
2007-05-05 14:09:46 +10:00
Ronnie Sahlberg
5cb817f031 split the vnn broadcast address into two
one broadcast address for all nodes
and one broadcast address for all nodes in the current vnnmap

update all useage of the old flag to now only broadcast to the vnnmap
except for tools/ctdb_control where it makes more sense to broadcast to 
all nodes

(This used to be ctdb commit dfb65b88cf67ad9d61268c4b47a6d8ae346f47df)
2007-05-05 13:17:26 +10:00
Andrew Tridgell
410d41480a added a dumpmemory control, used to find memory leaks
(This used to be ctdb commit 44fdafaf421e3e906796d529aed2f7c5df201b94)
2007-05-05 11:03:10 +10:00
Andrew Tridgell
adc64aed0a - fixed a crash bug after client disconnect in ctdb_control
- added total memory used to ctdb_control status output

(This used to be ctdb commit a99ffe4372edc63d83d8c8ebf9a60b3413301f5a)
2007-05-05 08:33:35 +10:00
Andrew Tridgell
d8f4e6b209 - added counters for controls in ctdb_control status
(This used to be ctdb commit 858061372fc9902837a1a5b8bcfc0ada58eec193)
2007-05-05 08:11:54 +10:00
Ronnie Sahlberg
1725fcf294 merge from tridge
(This used to be ctdb commit 62574808ef4dcb76760f1dd2496fbe8e34197c23)
2007-05-05 01:22:30 +10:00
Andrew Tridgell
fccc585f5a added seqnum propogation code to ctdb
(This used to be ctdb commit be2572b1b09eaaa1ea6a726d60f16996f9407d13)
2007-05-04 22:18:00 +10:00
Ronnie Sahlberg
508cafd17e merge from tridge
(This used to be ctdb commit 6c8b90cedc67daa89d54db5268fde18bfc20abaf)
2007-05-04 17:05:28 +10:00
Andrew Tridgell
ed3e847785 added a ctdb control for enabling the tdb seqnum
(This used to be ctdb commit c66920d9fb08a4a33418e2c1dcf1fc320fba3761)
2007-05-04 15:33:28 +10:00
Ronnie Sahlberg
7dfdab1b9d recovery daemon
this program is a client to the local ctdb daemon

every second it pulls all vnnmap and nodemaps from all nodes that are 
available and checks if a recovery is required

a recovery is required if :
* all nodes do NOT have an identical vnnmap and generation
* all nodes do NOT have an identical nodemap
* there are active nodes that are NOT in the nodemap
* there are nodes in the nodemap that are NOT active

During recovery,  the recovery tool will also make sure that all nodes 
know about and have created all databases.

(This used to be ctdb commit 2f2650467bac7e8954de7c17cb34f46b0bdbcd26)
2007-05-04 15:21:40 +10:00
Andrew Tridgell
f2fd53056d nicer interface to ctdb traverse
(This used to be ctdb commit e5ce866dcc5037b5069e42bf1e168b646f007b01)
2007-05-04 12:18:39 +10:00
Andrew Tridgell
e752f3bd97 - changed the REQ_REGISTER PDU to be a control
- allow controls to know which client invoked them

- added a client_id to clients, so they can be identified remotely

- added the ability to remove registered srvids

- in the list_keys code, register a temp srvid, then remove it afterwards

(This used to be ctdb commit 29603c51cc6d81362532cd8e50f75c8360c5f5ef)
2007-05-04 11:41:29 +10:00
Ronnie Sahlberg
2b1714a521 update getvnnmap control to take a timeout parameter
dont explicitely free the vnnmap pointer in the getvnnmap control  this 
is freed by the mem_ctx instead

add code to the recoverd to detect when/if recovery is required
veiry that the number of active nodes, the nodemap and the vnn map is 
consistent across the entire cluster and if not   trigger a recovery 
(which right now just prints "we need to do recovery" to the screen.

(This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5)
2007-05-04 09:45:53 +10:00