mirror of https://github.com/samba-team/samba.git synced 2025-01-26 10:04:02 +03:00

133 Commits

Author SHA1 Message Date
Ronnie Sahlberg
495a6403da change the api for managing callbacks to controls so that instead of
passing it as a parameter we set the callback function explicitly from
the caller if the ..._send() function returned a valid state pointer.

(This used to be ctdb commit aa939570662786455f63299b62c99882cff29d42)
2007-08-24 10:42:06 +10:00
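A minimal sketch of the calling pattern this commit describes, not the actual ctdb code: the caller starts the control, and only if a valid state pointer comes back does it attach its callback. All names here (control_state, example_control_send, example_control_complete, count_done) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the real control state; field names are assumptions. */
struct control_state;
typedef void (*control_callback_fn)(struct control_state *state);

struct control_state {
    int status;                   /* result, filled in on completion or timeout */
    control_callback_fn callback; /* set by the caller after the send succeeds */
    void *private_data;           /* caller context handed to the callback */
};

/* Start an async control. Returns NULL if the request could not be queued. */
struct control_state *example_control_send(int can_queue)
{
    if (!can_queue) {
        return NULL;
    }
    return calloc(1, sizeof(struct control_state));
}

/* What an event loop would call when the reply arrives or the control times out. */
void example_control_complete(struct control_state *state, int status)
{
    assert(state != NULL);
    state->status = status;
    if (state->callback != NULL) {
        state->callback(state);
    }
}

/* Example callback: counts completed controls through private_data. */
void count_done(struct control_state *state)
{
    int *done = state->private_data;
    (*done)++;
}
```

A caller would check the pointer returned by example_control_send() for NULL before assigning state->callback and state->private_data.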
Ronnie Sahlberg
62a03ef9d5 get rid of the explicit global timeout used in the previous example and
try this time by relying on the timeouts for the individual controls

(This used to be ctdb commit 448a0eb4fd896dc545aa0b4bb2ba4628491578be)
2007-08-23 19:38:54 +10:00
Ronnie Sahlberg
f854b5f876 try out a slightly different api for controls where you provide a
callback function which is called upon completion (or timeout) of the 
control.

modify scanning of recmaster in the monitoring_cluster code to try the 
api out

(This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c)
2007-08-23 19:27:09 +10:00
Ronnie Sahlberg
4c13bf0c5f break checking that the recoverymode on all nodes are ok out into its
own function

(This used to be ctdb commit 813cf9a252af96da24122b80f24aabeed2911939)
2007-08-23 13:48:39 +10:00
Ronnie Sahlberg
8fd3df2553 hang the ctdb_req_control structure off the ctdb_client_control_state
struct  so that if we timeout a control we can print debug info such as 
what opcode failed and to which node

we don't need the *status parameter to ctdb_client_control_state

create async versions of the getrecmaster control

pass a memory context to getrecmaster

(This used to be ctdb commit 558b680c82f830fba82c283c78c2de8a0b150b75)
2007-08-23 13:00:10 +10:00
Andrew Tridgell
d95476fa38 merge from ronnie
(This used to be ctdb commit e0f1c1acb1188500674626d631e1a1b8726e72ad)
2007-08-22 17:31:29 +10:00
Ronnie Sahlberg
50c09b7465 when we receive a packet from the network, check explicitly that the
node is not banned if the call is for a database record, i.e. a REQ/REPLY
CALL/DMASTER

if we get such a call while banned, ignore the packet and write an entry 
in the logfile

(This used to be ctdb commit 79eb0863609fbb12e28ebf734101b1d3f359b330)
2007-08-22 12:53:24 +10:00
Ronnie Sahlberg
f6e0336b23 create a define to represent the 'invalid' generation id we used in two
places.

create a new helper function to generate new generation id values that 
know about the invalid id and avoids generating it.

update the ctdb status tool to know about the invalid generation id and 
print the string INVALID instead

(This used to be ctdb commit 4fbcd189543cb8a92227fdcd3d158472e558ccda)
2007-08-22 12:38:31 +10:00
Ronnie Sahlberg
e3b6d1e511 if the node is inactive i.e. banned or disconnected then that node is
not participating in the cluster

if a client tries to attach to a database while the node is inactive,
return an error back to the client and fail the attach

(This used to be ctdb commit b26949f3c8e54f3bc60da04d7b4ac69f301068fc)
2007-08-22 11:34:48 +10:00
Ronnie Sahlberg
b47384d57a when a node becomes banned its databases are no longer part of ctdb
and it should thus no longer serve any database access calls until it 
has been reintroduced into the cluster.

when becoming banned, reset the local generation id to 1 to prevent
any further database access calls from other nodes from being processed.

(This used to be ctdb commit b531021db43ebaa5f5d0ace28c59913d359bd8a8)
2007-08-22 10:38:35 +10:00
Ronnie Sahlberg
5fef81a6f1 if lockwait takes an excessive time to complete, log the time it took to
complete and also the name of the database

(This used to be ctdb commit 221ef0348fd8113a017d229d8c2c7aa5c4dfb5c2)
2007-08-22 09:46:48 +10:00
Ronnie Sahlberg
8b06fc7284 change the structure used for node flag change messages so that we can
see both the old flags as well as the new flags (so we can tell which 
flags changed)

send the CTDB_SRVID_RECONFIGURE messages to connected nodes only, not to 
every node, connected or not, in the cluster.

in the handler inside the recovery daemon which is invoked for node flag
change messages, only do a takeover_run() and redistribute the ip addresses IF it was the
disabled or the unhealthy flags that changed. Also send out the cluster
reconfigured message in this case.
If any of the other flags changed we don't need to do the takeover_run()
here since that will be done during recovery.

(This used to be ctdb commit 5549b2058e2c148a8ca9d419123acf3247bb8829)
2007-08-21 17:25:15 +10:00
Ronnie Sahlberg
4e4dd6b886 when we shutdown the service due to receiving a 'ctdb shutdown' command
from the administrator, log this as 'Received SHUTDOWN command. Stopping
CTDB daemon.' so that the administrator will know when looking at the
log 'why' the ctdb service was terminated.

Previously the only thing logged was 'shutting down' which is not 
detailed enough.

(This used to be ctdb commit 5b818c1b72b6594a8d6e45e1865026e3ce33ae63)
2007-08-21 09:46:27 +10:00
Ronnie Sahlberg
5228abef64 add an atexit() that will print "CTDB daemon shutting down" in the log
when the main daemon exits

(This used to be ctdb commit f7422397be2e319bfbee5bf0670583c353eda86d)
2007-08-21 09:43:53 +10:00
Ronnie Sahlberg
a03c8d4954 setup the logfile much earlier in the startup procedure for ctdbd
change initial errors that cause ctdb to fail to start from printf to 
DEBUG(0

add a DEBUG(0 to log that the ctdb service is starting

(This used to be ctdb commit 680b4fbb283dd68567a62a83345f11a6cc1dd0e5)
2007-08-21 09:33:03 +10:00
Ronnie Sahlberg
b582e13cae make sure that the event script is executable and just ignore it
otherwise

(This used to be ctdb commit 65eb7845c70489d654acaaf99cd2c8eac7df11dc)
2007-08-21 09:22:14 +10:00
Ronnie Sahlberg
aed2c58c64 don't pollute the log with 'Registered PID XXX for client YYY' at log
level 0.

change the log level to 3 for this information message

(This used to be ctdb commit f28d713d9cacd2312932b51175aa8402c96ef76b)
2007-08-21 08:42:42 +10:00
Ronnie Sahlberg
7e1f840c8d if a public address has already been taken over by a node, then let that
public address remain at that node until either the node becomes 
unhealthy or the original/primary node for that address becomes healthy 
again.

Otherwise what will happen is
1, if we ban a node, the banning code immediately does a
takeover_run() and reassigns the public address to a different node in
the cluster.
2, a few seconds later (at most) the recovery daemon will detect that
the number of nodes has shrunk and will initiate a recovery.
During the recovery the public address would again be assigned to a
node, this time a different node.

(This used to be ctdb commit 30a6b7a648e22873d8ce6289a3d6dc42c4b9e3b3)
2007-08-20 14:16:58 +10:00
Andrew Tridgell
405e123ffb removed redundant debug message
(This used to be ctdb commit 9ee742b7cc43be7da6b568308912a3f2cfe4f4d3)
2007-08-20 11:13:38 +10:00
Andrew Tridgell
46639ac19e merged new event script calling code from ronnie
(This used to be ctdb commit bbacad61b3eee4276ffe44ed2a23949aca8152cf)
2007-08-20 11:10:30 +10:00
Ronnie Sahlberg
7322e82bcb add text to the event script timeout log on how to find out which script
timed out

(This used to be ctdb commit bd6db995fb00ed45c5f0a50bbe6cf5d0fe22a194)
2007-08-15 15:08:42 +10:00
Ronnie Sahlberg
3b9d50f3ee change the now rather small /etc/ctdb/events script into a service
specific script /etc/ctdb/events.d/00.ctdb

get rid of CTDB_EVENTS_SCRIPT and --event-script

(This used to be ctdb commit 81ccfaf838e5772d4a58eb6a70224b7b39aba9f3)
2007-08-15 15:01:31 +10:00
Ronnie Sahlberg
ff58f7c7ea add a comment that the talloc_free also removes the script from the tree
(This used to be ctdb commit ce71f6e9cf983cc4fe66935ad6c18d55dfed03a5)
2007-08-15 14:46:06 +10:00
Ronnie Sahlberg
4023576e50 call the service specific event scripts directly from the forked child
instead of from /etc/ctdb/events so that we can get better debugging
output in the logs when something fails in the scripts

(This used to be ctdb commit 4ed96b768aea1611e8002f7095d3c4d12ccf77a3)
2007-08-15 14:44:03 +10:00
Ronnie Sahlberg
5a02262a06 comment that ctdb_event_script_v() is called from a forked child's
context and thus can make blocking calls

(This used to be ctdb commit b31d98281f15995ad340d2510e08e04ed46e271a)
2007-08-15 10:48:10 +10:00
Ronnie Sahlberg
56d5ef27b6 add a wrapper function to create the key used to insert/lookup a certain
tcp connection in the tree that stores the tcp connections to kill by 
sending an RST

add a define that specifies the keylength instead of hardcoding it as 4

(This used to be ctdb commit 6a8322cbae10f2c78b2e286c75aeb25ece12ea7f)
2007-08-15 10:01:00 +10:00
Ronnie Sahlberg
adb49f02f0 change the mem hierarchy for trees. let the node be owned by the data
we store in the tree and use a node destructor so that when the data is 
talloc_free()d we also remove the node from the tree.

(This used to be ctdb commit b8dabd1811ebd85ee031563e95085f720a2fa04d)
2007-08-09 14:08:59 +10:00
Ronnie Sahlberg
9c216d0d76 when we want to kill a tcp connection we stored the connection
description (src + dst sockaddr_in) in a linked list.
every time we receive a captured packet from the network we had to walk
this list in linear time to see if the packet matched a connection we
wanted to RST.
which wouldn't scale very well.


replace the linked list with a redblack tree that is indexed by
src address, src port, dst address, dst port
to make checking whether the packet belongs to a connection we want to
RST very fast and scalable


the reason we need to capture packets when we want to kill a TCP
connection is because we must wait for an ACK coming back from the
remote host so that we can learn which sequence number to use in the
RST.
Most tcp stacks today will ignore any and all RST segments unless the
sequence number lies exactly on the right edge of the window to make
spoofing RST a little bit more difficult.

(This used to be ctdb commit ced18caea8582af042287beb6333dd1f8ba3344d)
2007-08-08 15:09:19 +10:00
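Any ordered tree indexed by a 4-word key needs a total order on those keys. The comparison below is a generic lexicographic sketch to illustrate why lookup becomes logarithmic instead of a linear list walk; key_compare is a hypothetical name, and the ctdb tree may organise multi-word keys differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Lexicographic comparison of two keys of keylen 32-bit words.
 * Returns -1, 0 or 1, suitable for steering a binary tree descent. */
int key_compare(const uint32_t *a, const uint32_t *b, size_t keylen)
{
    size_t i;

    assert(a != NULL && b != NULL);
    for (i = 0; i < keylen; i++) {
        if (a[i] < b[i]) {
            return -1;
        }
        if (a[i] > b[i]) {
            return 1;
        }
    }
    return 0;
}
```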
Ronnie Sahlberg
203306400e add helpers to traverse a tree where the key is an array of uint32
(This used to be ctdb commit d328c66827cafff6356e96df2a782930274fe139)
2007-08-08 13:50:18 +10:00
Ronnie Sahlberg
dd14afe6aa after we have checked dest address that it is a public address
update addr to the source address so the printout in the log matches
the client that attached to samba

(This used to be ctdb commit 72098b71c79469c86769ca82bbd484c81902d27c)
2007-07-30 16:10:14 +10:00
Ronnie Sahlberg
e666808f60 no need to have a separate assignment of the tcparray pointer followed
by a talloc_steal()
use the returned pointer in talloc_steal as the value to assign

(This used to be ctdb commit 5c6375ad3bbecfa725ec3b1477f259e5a8191866)
2007-07-25 08:03:58 +10:00
Ronnie Sahlberg
81294825e7 when we build the arp structure for sending gratuitous arp (and tcp
tickles) just talloc_steal the entire tcp_array into the arp
structure instead of copying each of the entries into a linked list
and then releasing the tcparray.

(This used to be ctdb commit 468e237740cf37a65872ef700bbb1284ede8352a)
2007-07-24 07:46:51 +10:00
Ronnie Sahlberg
ea56d1d20e set the tcp tickle update flag to true once we have done a takeover and
tickled all connections
otherwise the other nodes will still remember this list until next time
we have had a connection/client closing.

(This used to be ctdb commit cb8e5d4bbee2f14f498735489f673ff3679dfd9d)
2007-07-20 19:11:45 +10:00
Ronnie Sahlberg
81767b2a7b when a client connects with TCP_CLIENT we should look at the
destination address to find the public address, not the source address

(This used to be ctdb commit d6d4a7f38a52c1c2579a54d14cb7a6981fb42f5b)
2007-07-20 17:04:08 +10:00
Ronnie Sahlberg
fca90ce3c3 updated ctdb tickle management
there is an array for each node/public address that contains tcp tickles

we send a TCP_ADD as a broadcast to all nodes when a client is added

if tcp tickles are removed, they are only removed immediately from the 
local node.
once every 20 seconds a node will push/broadcast out the tickle list for
all public addresses it manages. this will remove any deleted tickles
from the remote nodes

(This used to be ctdb commit e3c432a915222e1392d91835bc7a73a96ab61ac9)
2007-07-20 15:05:55 +10:00
Ronnie Sahlberg
7b17afdfcd change the tickle list from one global list into an array per public
ip/node

once we have started sending all tickles for a specific ip, delete the
entire array so that the tickles don't remain forever in the ctdb
server

add a control to send the full list of every tickle that is registered 
for a particular public ip/node

(This used to be ctdb commit d0eee33e44d3f8e26debbec21d41e2cbdbb520e6)
2007-07-20 10:06:41 +10:00
Andrew Tridgell
394190d3cc - log registering of tcp clients
- don't remove a tcp entry if we do not own the ip
(This used to be ctdb commit 400aa284b9785ce6409e7600df429f5849e3867d)
2007-07-19 15:04:54 +10:00
Andrew Tridgell
689195b455 make sure we still run events when waiting for ctdb_event_script()
(This used to be ctdb commit 05efbfe9ff9691c1d7441e7b9855aed25791faf0)
2007-07-19 13:36:00 +10:00
Andrew Tridgell
fb22d3bd2c merged from ronnie
(This used to be ctdb commit 765b07fa5d1af07c8c7212d19d8e9574060b3039)
2007-07-18 20:13:57 +10:00
Ronnie Sahlberg
4d1f3acc94 add a check if start_node is beyond the end of the nodemap and reset it
back to 0 if it is to prevent an infinite loop.

this could happen if in the future we add a mechanism to add/remove 
nodes to a cluster at runtime

(This used to be ctdb commit 217e80a468713fec86ccb0608460e3401046bb98)
2007-07-16 08:36:09 +10:00
Ronnie Sahlberg
49f98e79fd change the way we pick/find a new node to takeover for a failed node
to keep a static that controls at which node to start searching the
list for takeover candidates next time we need to find a node.

each time we find a node to takeover, reset the start variable to point 
to the next node in the list

this makes the distribution of takeover nodes much more even

(This used to be ctdb commit e9800df5a21079ea478d16f7dd2fd4707de85650)
2007-07-16 08:28:44 +10:00
Ronnie Sahlberg
f09566a81a add a private_data field to the killtcp structure and let the system
specific routines populate it as they see fit when creating a
capture socket.
pass this structure to read_tcp and close capture socket as parameter

(This used to be ctdb commit 79bbfcfb2223889126fe307d5bbfd24917da07ee)
2007-07-13 17:07:10 +10:00
Andrew Tridgell
8f637e6317 ensure killtcp structure is initialised
(This used to be ctdb commit 2fe7d1ce87e55e125411e7406a9e00b8f55e3cb7)
2007-07-13 11:55:58 +10:00
Andrew Tridgell
1e14ecd176 - merge from ronnie
- cleaner handling of system capture socket

(This used to be ctdb commit d194a41a71b8466d0726dcbae3970a86386fcb3c)
2007-07-13 11:31:18 +10:00
Andrew Tridgell
d2a5af7eb8 fully save/restore scheduler parameters
(This used to be ctdb commit 59408eabe7515d49a6eef3b6fb2590a1cd1df956)
2007-07-13 09:35:46 +10:00
Andrew Tridgell
698a8bc909 fixed the sense of do_setsched
(This used to be ctdb commit 68bca2454ff43ce6d8aab2f87d669d33f5f2a10c)
2007-07-13 09:14:31 +10:00
Andrew Tridgell
fc73bc5c24 added --nosetsched option to ctdbd
(This used to be ctdb commit 4cbbb88c1735c7d112e751e22da1c1c69e09bf4a)
2007-07-13 08:47:02 +10:00
Ronnie Sahlberg
a650497680 as an optimization for when we want to send multiple tickles at a time
let the caller create the sending socket and use a single socket instead 
of one new one for each tickle.
pass a sending socket to ctdb_sys_send_tcp()

ctdb_sys_kill_tcp is no longer used so remove it

set the socketflags for close on exec and nonblocking in the helper that 
creates the sockets instead of in the caller

add a helper to create a sending socket to send tickles from

(This used to be ctdb commit 469f3fb238a0674a2b48fdf1a7e657e32428178a)
2007-07-12 09:22:06 +10:00
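Setting both socket flags in one helper, as this commit describes, can be done with the POSIX fcntl() calls shown below. The helper name is hypothetical; the fcntl commands and flags are standard.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Mark fd as non-blocking and close-on-exec.
 * Returns 0 on success, -1 if any fcntl() call fails. */
int set_nonblocking_cloexec(int fd)
{
    int flags;

    assert(fd >= 0);

    /* O_NONBLOCK is a file status flag (F_GETFL / F_SETFL). */
    flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1) {
        return -1;
    }
    if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1) {
        return -1;
    }

    /* FD_CLOEXEC is a file descriptor flag (F_GETFD / F_SETFD). */
    flags = fcntl(fd, F_GETFD, 0);
    if (flags == -1) {
        return -1;
    }
    if (fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1) {
        return -1;
    }
    return 0;
}
```

Doing this in the socket-creating helper means every caller gets the same flags without repeating the fcntl() sequence.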
Ronnie Sahlberg
823b7d4a5f rename killtcp->fd to killtcp->capture_fd
we might want to have two sockets attached to the killtcp structure
one for capturing and a second one for sending so we don't have to
create a new socket for each tickle we want to send

(This used to be ctdb commit b3e82ec38047bbec1edfd88ade264077d4cbd2ee)
2007-07-12 08:52:24 +10:00
Ronnie Sahlberg
76ab80104a make the ctdb tool use the killtcp control in the daemon instead of
calling killtcp directly

(This used to be ctdb commit d21e3e9cf11bdcba6234302e033d6549c557dd69)
2007-07-12 08:30:04 +10:00