1
0
mirror of https://github.com/samba-team/samba.git synced 2025-01-03 01:18:10 +03:00
Commit Graph

2672 Commits

Author SHA1 Message Date
Ronnie Sahlberg
8b06fc7284 change the structure used for node flag change messages so that we can
see both the old flags as well as the new flags (so we can tell which 
flags changed)

send the CTDB_SRVID_RECONFIGURE messages to connected nodes only, not to 
every node, connected or not, in the cluster.


in the handler inside the recovery daemon which is invoked for node flag 
change messages, only do a takeover_run() and redistribute the ip addresses IF it was the 
disabled or the unhealthy flags that changed. Also send out the cluster 
reconfigured message in this case.
If any of the other flags changed we dont need to do the takeover_run(0 
here since that will be done during recovery.

(This used to be ctdb commit 5549b2058e2c148a8ca9d419123acf3247bb8829)
2007-08-21 17:25:15 +10:00
Ronnie Sahlberg
4e4dd6b886 when we shutdown the service due to receiving a 'ctdb shutdown' command
from the administrator, log this as 'Received SHUTDOWN command. Stopping 
CTDB daemon.'   so that the administrator will know when looking at the 
log 'why' the ctdb service was terminated.

Previously the only thing logged was 'shutting down' which is not 
detailed enough.

(This used to be ctdb commit 5b818c1b72b6594a8d6e45e1865026e3ce33ae63)
2007-08-21 09:46:27 +10:00
Ronnie Sahlberg
5228abef64 add an atexit() that will print "CTDB daemon shutting down" in the log
when the main daemon exits

(This used to be ctdb commit f7422397be2e319bfbee5bf0670583c353eda86d)
2007-08-21 09:43:53 +10:00
Ronnie Sahlberg
a03c8d4954 setup the logfile much earlier in the startup procedure for ctdbd
change initial errors that cause ctdb to fail to start from printf to 
DEBUG(0

add a DEBUG(0 to log that the ctdb service is starting

(This used to be ctdb commit 680b4fbb283dd68567a62a83345f11a6cc1dd0e5)
2007-08-21 09:33:03 +10:00
Ronnie Sahlberg
b582e13cae make sure that the event script is executable and just ignore it
othervise

(This used to be ctdb commit 65eb7845c70489d654acaaf99cd2c8eac7df11dc)
2007-08-21 09:22:14 +10:00
Ronnie Sahlberg
aed2c58c64 dont pollute the log with 'Registered PID XXX for client YYY' at log
level 0.

change the log level to 3 for this information message

(This used to be ctdb commit f28d713d9cacd2312932b51175aa8402c96ef76b)
2007-08-21 08:42:42 +10:00
Ronnie Sahlberg
7e1f840c8d if a public address has already been taken over by a node, then let that
public address remain at that node until either the node becomes 
unhealthy or the original/primary node for that address becomes healthy 
again.


Othervise what will happen is 
1, if we ban a node,   the banning code immediately does a 
takeover_run() and reassigns the public address to a different node in 
the cluster.
2, a few seconds later (at most) the recovery daemon will detect that 
the number of nodes has shrunk and will initiate a recovery.
During the recovery  the public address would again be assigned to a 
node, this time a different node.

(This used to be ctdb commit 30a6b7a648e22873d8ce6289a3d6dc42c4b9e3b3)
2007-08-20 14:16:58 +10:00
Andrew Tridgell
405e123ffb removed redundent debug message
(This used to be ctdb commit 9ee742b7cc43be7da6b568308912a3f2cfe4f4d3)
2007-08-20 11:13:38 +10:00
Andrew Tridgell
46639ac19e merged new event script calling code from ronnnie
(This used to be ctdb commit bbacad61b3eee4276ffe44ed2a23949aca8152cf)
2007-08-20 11:10:30 +10:00
Ronnie Sahlberg
7322e82bcb add text to the event script timeout log on how to find out which script
timed out

(This used to be ctdb commit bd6db995fb00ed45c5f0a50bbe6cf5d0fe22a194)
2007-08-15 15:08:42 +10:00
Ronnie Sahlberg
3b9d50f3ee change the now rather small /etc/ctdb/events script into a service
specific script /etc/ctdb/events.d/00.ctdb

get rid of CTDB_EVENTS_SCRIPT and --event-script

(This used to be ctdb commit 81ccfaf838e5772d4a58eb6a70224b7b39aba9f3)
2007-08-15 15:01:31 +10:00
Ronnie Sahlberg
ff58f7c7ea add a comment that the talloc_free also removes the script from the tree
(This used to be ctdb commit ce71f6e9cf983cc4fe66935ad6c18d55dfed03a5)
2007-08-15 14:46:06 +10:00
Ronnie Sahlberg
4023576e50 call the service specific event scripts directly from the forked child
instead for from /etc/ctdb/events so that we can get better debugging 
output in the logs when something fails in the scripts

(This used to be ctdb commit 4ed96b768aea1611e8002f7095d3c4d12ccf77a3)
2007-08-15 14:44:03 +10:00
Ronnie Sahlberg
5a02262a06 comment that ctdb_event_script_v() is called from a forked childs
context and thus can make blocking calls

(This used to be ctdb commit b31d98281f15995ad340d2510e08e04ed46e271a)
2007-08-15 10:48:10 +10:00
Ronnie Sahlberg
56d5ef27b6 add a wrapper function to create the key used to insert/lookup a certain
tcp connection in the tree that stores the tcp connections to kill by 
sending an RST

add a define that specified the keylength instead of hardcoding it as 4

(This used to be ctdb commit 6a8322cbae10f2c78b2e286c75aeb25ece12ea7f)
2007-08-15 10:01:00 +10:00
Ronnie Sahlberg
adb49f02f0 change the mem hierarchy for trees. let the node be owned by the data
we store in the tree and use a node destructor so that when the data is 
talloc_free()d we also remove the node from the tree.

(This used to be ctdb commit b8dabd1811ebd85ee031563e95085f720a2fa04d)
2007-08-09 14:08:59 +10:00
Ronnie Sahlberg
9c216d0d76 when we want to kill a tcp connection we stored the connection
description (src + dst sockaddr_in) in a linked list.
everytime we receive a captured packet from the network we had to walk 
this list in linear time to see if the packet matched a connection we 
wanted to RST.
which wouldnt scale very well.


replace the linked list with a redblack tree that is indexed by
src address, src port,  dst address,   dst port
to make checking whether the packet belongs to a connection we want to 
RST very fast and scalable


the reason we need to capture packets when we want to kill a TCP 
connection is because we must wait for an ACK coming back from the 
remote host  so that we can learn which sequence number to use in the 
RST.
Most tcp today will ingore any and all RST segments unless the 
sequencenumber lies exactly on the right edge of the window to make 
spoofing RST a little bit more difficult.

(This used to be ctdb commit ced18caea8582af042287beb6333dd1f8ba3344d)
2007-08-08 15:09:19 +10:00
Ronnie Sahlberg
203306400e add helpers to traverse a tree where the key is an array of uint32
(This used to be ctdb commit d328c66827cafff6356e96df2a782930274fe139)
2007-08-08 13:50:18 +10:00
Ronnie Sahlberg
dd14afe6aa after we have checked dest address that it is a public address
update addr to the source address so the rpintout in the log matches
the client that attached to samba

(This used to be ctdb commit 72098b71c79469c86769ca82bbd484c81902d27c)
2007-07-30 16:10:14 +10:00
Ronnie Sahlberg
e666808f60 no need to have a separate assignment of the tcparray pointer followed
by a talloc_steal()
use the returned pointer in talloc_steal as the value to assign

(This used to be ctdb commit 5c6375ad3bbecfa725ec3b1477f259e5a8191866)
2007-07-25 08:03:58 +10:00
Ronnie Sahlberg
81294825e7 when we build the arp structure for sending gratious arp (and tcp
tickles) just talloc_steal the enture tcp_array into the arp 
structure instead of copying each of the entries into a linked list
and then releasing the tcparray.

(This used to be ctdb commit 468e237740cf37a65872ef700bbb1284ede8352a)
2007-07-24 07:46:51 +10:00
Ronnie Sahlberg
ea56d1d20e set the tcp tickle update flag to true once we have done a takeover and
tickled all connections
othervise the other nodes will still remember this list until next time 
we have had a connection/client closing.

(This used to be ctdb commit cb8e5d4bbee2f14f498735489f673ff3679dfd9d)
2007-07-20 19:11:45 +10:00
Ronnie Sahlberg
81767b2a7b when a client connects with TCP_CLIENT we should look at the
destination address to find the public address   not the source address

(This used to be ctdb commit d6d4a7f38a52c1c2579a54d14cb7a6981fb42f5b)
2007-07-20 17:04:08 +10:00
Ronnie Sahlberg
fca90ce3c3 updated ctdb tickle management
there is an array for each node/public address that contains tcp tickles

we send a TCP_ADD as a broadcast to all nodes when a client is added

if tcp tickles are removed, they are only removed immediately from the 
local node.
once every 20 seconds a node will push/broadcast out the tickle list for 
all public addresses it manages.   this will remove any deleted tickles 
from the remote nodes

(This used to be ctdb commit e3c432a915222e1392d91835bc7a73a96ab61ac9)
2007-07-20 15:05:55 +10:00
Ronnie Sahlberg
7b17afdfcd change the tickle list from one global list into an array per public
ip/node

once we have started sending all tickles for a specific ip   delete the 
entire array   so that the tickles dont remain forever in the ctdb 
server

add a control to send the full list of every tickle that is registered 
for a particular public ip/node

(This used to be ctdb commit d0eee33e44d3f8e26debbec21d41e2cbdbb520e6)
2007-07-20 10:06:41 +10:00
Andrew Tridgell
394190d3cc - log registering of tcp clients
- don\'t remove a tcp entry if we do not own the ip
(This used to be ctdb commit 400aa284b9785ce6409e7600df429f5849e3867d)
2007-07-19 15:04:54 +10:00
Andrew Tridgell
689195b455 make sure we still run events when waiting for ctdb_event_script()
(This used to be ctdb commit 05efbfe9ff9691c1d7441e7b9855aed25791faf0)
2007-07-19 13:36:00 +10:00
Andrew Tridgell
fb22d3bd2c merged from ronnie
(This used to be ctdb commit 765b07fa5d1af07c8c7212d19d8e9574060b3039)
2007-07-18 20:13:57 +10:00
Ronnie Sahlberg
4d1f3acc94 add a check if start_node is beyond the end of the nodemap and reset it
back to 0 if it is to prevent an infinite loop.

this could happen if in the future we add a mechanism to add/remove 
nodes to a cluster at runtime

(This used to be ctdb commit 217e80a468713fec86ccb0608460e3401046bb98)
2007-07-16 08:36:09 +10:00
Ronnie Sahlberg
49f98e79fd change the way we pick/find a new node to takeover for a failed node
to keep a static that controls at which noide to start searching the 
list for takeover candidates next time we need to find a node.

each time we find a node to takeover, reset the start variable to point 
to the next node in the list

this makes the distribution of takeover nodes much more even

(This used to be ctdb commit e9800df5a21079ea478d16f7dd2fd4707de85650)
2007-07-16 08:28:44 +10:00
Ronnie Sahlberg
f09566a81a add a private_data field to the killtcp structure and let the system
specific routines populate it as it see fit when creating a 
capture socket.
pass this structure to read_tcp and close capture socket as parameter

(This used to be ctdb commit 79bbfcfb2223889126fe307d5bbfd24917da07ee)
2007-07-13 17:07:10 +10:00
Andrew Tridgell
8f637e6317 ensure killtcp structure is initialised
(This used to be ctdb commit 2fe7d1ce87e55e125411e7406a9e00b8f55e3cb7)
2007-07-13 11:55:58 +10:00
Andrew Tridgell
1e14ecd176 - merge from ronnie
- cleaner handling of system capture socket

(This used to be ctdb commit d194a41a71b8466d0726dcbae3970a86386fcb3c)
2007-07-13 11:31:18 +10:00
Andrew Tridgell
d2a5af7eb8 fully save/restore scheduler parameters
(This used to be ctdb commit 59408eabe7515d49a6eef3b6fb2590a1cd1df956)
2007-07-13 09:35:46 +10:00
Andrew Tridgell
698a8bc909 fixed the sense of do_setsched
(This used to be ctdb commit 68bca2454ff43ce6d8aab2f87d669d33f5f2a10c)
2007-07-13 09:14:31 +10:00
Andrew Tridgell
fc73bc5c24 added --nosetsched option to ctdbd
(This used to be ctdb commit 4cbbb88c1735c7d112e751e22da1c1c69e09bf4a)
2007-07-13 08:47:02 +10:00
Ronnie Sahlberg
a650497680 as an optimization for when we want to send multiple tickles at a time
let the caller create the sending socket and use a single socket instead 
of one new one for each tickle.
pass a sending socket to ctdb_sys_send_tcp()

ctdb_sys_kill_tcp is not longer used so remove it

set the socketflags for close on exec and nonblocking in the helper that 
creates the sockets instead of in the caller

add a helper to create a sending socket to send tickles from

(This used to be ctdb commit 469f3fb238a0674a2b48fdf1a7e657e32428178a)
2007-07-12 09:22:06 +10:00
Ronnie Sahlberg
823b7d4a5f rename killtcp->fd to killtcp->capture_fd
we might want to have two sockets attached to the killtcp structure
one for capturing and a second one for sending  so we dont have to 
create a new socket for each tickle we want to send

(This used to be ctdb commit b3e82ec38047bbec1edfd88ade264077d4cbd2ee)
2007-07-12 08:52:24 +10:00
Ronnie Sahlberg
76ab80104a make the ctdb tool use the killtcp control in the daemon instead of
calling killtcp directly

(This used to be ctdb commit d21e3e9cf11bdcba6234302e033d6549c557dd69)
2007-07-12 08:30:04 +10:00
Ronnie Sahlberg
1ed0c3a9f7 add daemon code for the new kill_tcp control
(This used to be ctdb commit 8fe4ae62255ecb2db36bea736ff17409ba6614c5)
2007-07-11 18:24:25 +10:00
Ronnie Sahlberg
e4db03f7e6 add a ctdb_ prefix to two public functions
(This used to be ctdb commit 32adee5426aa75ddcd4d648ef326ed03d5ff5c46)
2007-07-11 18:13:03 +10:00
Ronnie Sahlberg
aa080f66d9 first cut at a better and more scalable socketkiller
that can kill multiple connections asynchronously using one listening 
socket

(This used to be ctdb commit 22bb44f3d745aa354becd75d30774992f6c40b3a)
2007-07-11 17:43:51 +10:00
Ronnie Sahlberg
0c44e0ad46 add a ctdb_kill_tcp_callback() that will perform a kill tcp using a
background process

(This used to be ctdb commit dcfcaacff56347d94c244512eb72219b05ef9c3d)
2007-07-11 12:33:14 +10:00
Ronnie Sahlberg
135a964220 pass the header to ctdb_become_dmaster instead of just the reqid
this allows us to print from which node Invalid or Dropped orphan become 
dmaster packets came from

(This used to be ctdb commit 88efd1bf4c796cd2b184156b72296587bc38bb40)
2007-07-11 09:44:52 +10:00
Ronnie Sahlberg
2eef287fab print the operation code in the debug message when we discard a packet
due to incorrect generation number

(This used to be ctdb commit 3151e3b2607291572fc6e7380fd60ef7ce438307)
2007-07-11 08:41:29 +10:00
Andrew Tridgell
32de198fd3 update lib/replace from samba4
(This used to be ctdb commit f0555484105668c01c21f56322992e752e831109)
2007-07-10 15:29:31 +10:00
Ronnie Sahlberg
a859723912 nicer handling of DISCONNECTED flag when we update the node flags from
a remote message

(This used to be ctdb commit 9a50ad22be61a09761ffda89de91ef3221917c84)
2007-07-09 17:40:15 +10:00
Ronnie Sahlberg
69f3a09e6f when a remote node has sent us a message to update the flags for a node,
dont let those messages modify the DISCONNECTED flag.

the DISCONNECTED flag must be managed locally since it describes whether 
the local node can communicate with the remote node or not

(This used to be ctdb commit 5650673205d335a32d4f27f66847ea66752a00f0)
2007-07-09 13:21:17 +10:00
Ronnie Sahlberg
b871c3e365 a better way to fix the DISCONNECT|BANNED vs DISCONNECT bug
(This used to be ctdb commit 5c638d7731c5a268de02d3a37828ac7aec9a12de)
2007-07-09 12:55:15 +10:00
Ronnie Sahlberg
3499c8c673 when checking the nodemap flags for consitency while monitoring the
cluster,   we cant check that both the BANNED and the DISCONNECTED flags 
are both set at the same time   since if a node becomes banned just 
before it is DISCONNECTED   there is no guarantee that all other nodes 
will have seen the BANNED flag.

So we must first check the DISCONNECTED flag only   and only if the 
DISCONNECTED flag is not set should we check the BANNED flag.


othervise this can cause a recovery loop while some nodes thing the 
disconnected node is DISCONNECTED|BANNED and other think it is just 
DISCONNECTED

(This used to be ctdb commit 0967b2fff376ead631d98e78b3a97253fc109c69)
2007-07-09 12:33:00 +10:00
Andrew Tridgell
f1db15ffe1 fixed sense of inet_aton test
(This used to be ctdb commit ed5cf9b43c49312d3736e85077863d23990acce8)
2007-07-08 21:09:09 +10:00
Andrew Tridgell
056d3c35a4 call kill_clients when releasing all IPs, as well as for individual IPs
(This used to be ctdb commit ad68904720eb69757601589b06726190321685ac)
2007-07-08 20:45:12 +10:00
Andrew Tridgell
af5ee9981e we do tell banned nodes to release IPs
(This used to be ctdb commit 381dc0421d4d825398c03dcff4e79e3f76c3c981)
2007-07-08 20:24:03 +10:00
Andrew Tridgell
a55c03b31b log the generation numbers to give a hint about this bug
(This used to be ctdb commit 12018494baa33c5f6c52e6eae94ac77a56d3e5a0)
2007-07-08 19:36:55 +10:00
Andrew Tridgell
006227e80a forgot to add this
(This used to be ctdb commit 30fc56b7489e42633532964096e53faee1319dde)
2007-07-04 17:45:46 +10:00
Andrew Tridgell
bdf01ed7c0 - neaten up the command line for killtcp
- split out the event script code into a separate module
- get rid of the separate takeover directory

(This used to be ctdb commit 8ea2c923a3e2464200ff79bf2c3f1f89e6a93ad4)
2007-07-04 16:51:13 +10:00
Ronnie Sahlberg
1cd8bc0c64 add a tuneable to control how long we wait after a successful recovery
before we alow another recovery to be initiated

(This used to be ctdb commit f3b43519423b7a73e6a2dd986bdf11203b8653cf)
2007-07-04 08:36:59 +10:00
Andrew Tridgell
6399cf9542 added code to kill registered clients on a IP release
(This used to be ctdb commit ca0243b544987ce0618a99ac87b4abf598991e93)
2007-06-19 03:54:06 +10:00
Andrew Tridgell
732353de5f - merged ctdb_store test from ronnie
- added DatabaseHashSize tunable
- added logging of events inside recovery (for timing)

(This used to be ctdb commit 3593cdb928b91e217faf1b3c537fa28dc82cdace)
2007-06-17 23:31:44 +10:00
Andrew Tridgell
97d5bea2eb on startup release all IPs, in case we have any left over from a previous run
(This used to be ctdb commit 5eb2f8f5f70f567c264d6929e95899b70f0e4ec0)
2007-06-12 19:44:54 +10:00
Andrew Tridgell
91362083a1 make sure we start the freeze process quickly on all nodes when we are going to do recovery - this prevents serialisation of freeze, which can take a long time
(This used to be ctdb commit 52675c19e420d83d9556a3e73d9a4b490078aa9c)
2007-06-11 23:03:23 +10:00
Andrew Tridgell
031e205832 raise the default keepalive limit
(This used to be ctdb commit 4776a187a183bd129ded70e9c018c197b3d618be)
2007-06-11 22:27:23 +10:00
Andrew Tridgell
a31ece536c more detail in recovery message
(This used to be ctdb commit bc18a39efcf1fa5edfadc4c2f842f7cf035e4fbd)
2007-06-11 21:37:09 +10:00
Andrew Tridgell
044a2e04c4 - send tcp info to all connected nodes, not just vnnmap nodes
- use a non-blocking freeze when banned
- release all IPs when banned

(This used to be ctdb commit 070e85e532b33b792f85c3e72eee205d906aaf85)
2007-06-10 08:46:33 +10:00
Andrew Tridgell
18ae6e56f0 propogate flag changes to all connected nodes
(This used to be ctdb commit 711d1f7e20f1e98caaf08a57df0b1825ff6e97a0)
2007-06-09 21:58:50 +10:00
Ronnie Sahlberg
40585aed37 should be sufficient to unban nodes when we unbecome recmaster
(This used to be ctdb commit 8a6c4e675b4b877a9d0a7a3701973573ff0b71e8)
2007-06-09 20:13:25 +10:00
Ronnie Sahlberg
5458196b3f unban all nodes when we release recmaster role or when we win an
election

(This used to be ctdb commit 48fb7483b3fe391e2d0b78718af29f69a641525e)
2007-06-09 20:11:51 +10:00
Ronnie Sahlberg
c873c7d4da remove rht unban code from when we take recmaster role. we can not
send control broadcasts yet

(This used to be ctdb commit 39a05dc1d74d49685e6daf929df169d936585208)
2007-06-09 19:49:28 +10:00
Ronnie Sahlberg
9a0d7a688f add code to unban when we become/unbecome recmaster
(This used to be ctdb commit a22cf9b8a6fd46128faca958f33a75cb3fc1ee12)
2007-06-09 19:42:41 +10:00
Andrew Tridgell
06a71762a4 some #include cleanups
(This used to be ctdb commit 1a07d87122d51a40cd8ad5fe13533298c26857cb)
2007-06-07 22:26:27 +10:00
Andrew Tridgell
b50096c835 more code rearrangement
(This used to be ctdb commit 2bcf3b16163041f03add2e5bf9f1f5fb3599ec24)
2007-06-07 22:16:48 +10:00
Andrew Tridgell
ae3d54094b start splitting the code into separate client and server pieces
(This used to be ctdb commit 603cd77988c181525946cd5eb0f4d0d646b58059)
2007-06-07 22:06:19 +10:00