mirror of https://github.com/samba-team/samba.git synced 2024-12-24 21:34:56 +03:00
Commit Graph

1146 Commits

Author SHA1 Message Date
Ronnie Sahlberg
eb4cf6a686 change ctdb->vnn to ctdb->pnn
(This used to be ctdb commit 8c776e5707e503ec6586aae39ac6b3ea5a2fd2bc)
2007-09-04 10:06:36 +10:00
Ronnie Sahlberg
12ebb74838 change how we do public addresses and takeover so that we can have
multiple public addresses spread across multiple interfaces on each 
node.

this is a massive patch since we have previously made the assumption that 
we only have one public address per node.

get rid of the public_interface argument.  the public addresses file 
now explicitly lists which interface the address belongs to

(This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8)
2007-09-04 09:50:07 +10:00
Ronnie Sahlberg
4495dbacec merge from tridge
(This used to be ctdb commit 5e2a9333363d76378d27f93231f217999a0c30e5)
2007-09-03 09:29:30 +10:00
Andrew Tridgell
7423bcaabe up the release number
(This used to be ctdb commit 71a6213c92a12bf794c17c30ae4987149b68fe1b)
2007-08-30 17:51:05 +10:00
Andrew Tridgell
6333634e77 merge from ronnie
(This used to be ctdb commit e8138d9375fc34ae0cb31cc0e6ca042baf83eff8)
2007-08-30 17:16:23 +10:00
Ronnie Sahlberg
4e61e05f49 when we start 60.nfs we must make sure that the shared storage
nfs-state directory actually exists (by creating it)
or else the lock manager will not start 

(This used to be ctdb commit f2d15d04df842538c8d8331796a3c6fbe23463f2)
2007-08-30 15:27:45 +10:00
Andrew Tridgell
8c94d4dc87 merge from ronnie
(This used to be ctdb commit ab11fd70cf4d2165a5b55930cbad6fddf5397f54)
2007-08-27 18:04:53 +10:00
Ronnie Sahlberg
6b3cb21065 merge from tridge
(This used to be ctdb commit 7cb17a0752c683f9b244e6f61fa45a770593c68d)
2007-08-27 18:04:17 +10:00
Ronnie Sahlberg
794fb10634 add an extra debug statement when we send a SIGTERM to a process
(This used to be ctdb commit a9c1be9cf9efdc69bfc95657b70e9f8b8230cda8)
2007-08-27 17:33:46 +10:00
Ronnie Sahlberg
2c0c94782a make the ctdb shutdown command use the async _send() function to send
the shutdown command
and return success to the caller if the _send() was successful

(This used to be ctdb commit 6bacaf8c7a96044708a6eda10cc8576adb7f5f79)
2007-08-27 15:03:52 +10:00
Andrew Tridgell
7f630b67f6 fixed segv when no public interface is set
(This used to be ctdb commit 55b415f87bd3cba13c73ccd2fe661720754a6af7)
2007-08-27 11:49:42 +10:00
Ronnie Sahlberg
7f02e16143 add async versions of the freeze node control and freeze all nodes in
parallel

(This used to be ctdb commit f34e89f54d9f4380e76eb1b5b2385a4d8500b505)
2007-08-27 10:31:22 +10:00
Ronnie Sahlberg
a9c45b2562 change the monitoring of recmode in the recovery daemon to use a fully
async event-driven api for controls

(This used to be ctdb commit 8d0e43428c507967a0d96e6a4c5c821ac269c546)
2007-08-27 09:40:10 +10:00
Ronnie Sahlberg
801bdbdc80 add a control to pull the server id list off a node
(This used to be ctdb commit 38aa759aa88a042c31b401551f6a713fb7bbe84e)
2007-08-26 10:57:02 +10:00
Ronnie Sahlberg
6681da31df add an initial implementation of a service_id structure and three
controls to  register/unregister/check a server id.

a server id consists of TYPE:VNN:ID where TYPE is specific to the 
application, VNN is the node where the server id was registered and ID 
might be a node-unique identifier such as a pid or similar.

Clients can register a server id for themselves at the local ctdb daemon.
When a client disappears, or when the domain socket connection for the 
client drops, any and all server ids registered across that domain 
socket will also be automatically removed from the store.

clients can register as many server_ids as they want at the same time, 
but each TYPE:VNN:ID must be globally unique.

Clients have the option of explicitly unregistering a server id by using 
the UNREGISTER control.

Registration and unregistration can only be done by clients to the local 
daemon. clients cannot register their server id to a remote node.

clients can check if a server id exists on any ctdb node in the 
network by using the check control

(This used to be ctdb commit d44798feec26147c5cc05922cb2186f0ef0307be)
2007-08-24 15:53:41 +10:00
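The registration rules in the commit message above can be sketched as a small in-memory store. This is only an illustration in Python, not the ctdb C code: the names `ServerId`, `ServerIdStore` and `connection_closed` are hypothetical, and the sketch models a single daemon, so `check` here only looks at the local store rather than asking other nodes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerId:
    """TYPE:VNN:ID triple as described in the commit message."""
    type: int
    vnn: int
    id: int


class ServerIdStore:
    """Hypothetical per-daemon store; maps each server id to its client connection."""

    def __init__(self, local_vnn):
        self.local_vnn = local_vnn
        self._owners = {}

    def register(self, conn, sid):
        # Clients may only register ids for the local node.
        if sid.vnn != self.local_vnn:
            return False
        # Each TYPE:VNN:ID must be unique.
        if sid in self._owners:
            return False
        self._owners[sid] = conn
        return True

    def unregister(self, conn, sid):
        # Explicit removal, only by the connection that registered the id.
        if self._owners.get(sid) != conn:
            return False
        del self._owners[sid]
        return True

    def check(self, sid):
        return sid in self._owners

    def connection_closed(self, conn):
        # Drop every id that was registered across this connection.
        for sid in [s for s, c in self._owners.items() if c == conn]:
            del self._owners[sid]
```

A client that registers two ids and then disconnects would have both removed by `connection_closed`, which mirrors the automatic cleanup the message describes.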
Ronnie Sahlberg
de23937368 cleanup invoke_control_callback. we don't need to pass some of these
parameters to _recv() since they are already set

(This used to be ctdb commit 2034dbebb26d7a2d51241943f6ccbe15bb6a5169)
2007-08-24 10:54:34 +10:00
Ronnie Sahlberg
495a6403da change the api for managing callbacks to controls so that instead of
passing it as a parameter we set the callback function explicitly from 
the caller if the ..._send() function returned a valid state pointer.

(This used to be ctdb commit aa939570662786455f63299b62c99882cff29d42)
2007-08-24 10:42:06 +10:00
Ronnie Sahlberg
1da9c03b1f comment why we do a talloc_steal
(This used to be ctdb commit aba7972728307e0ae52ccf8c0dd5808110fb92d7)
2007-08-24 09:34:04 +10:00
Ronnie Sahlberg
62a03ef9d5 get rid of the explicit global timeout used in the previous example and
try this time by relying on the timeouts for the individual controls

(This used to be ctdb commit 448a0eb4fd896dc545aa0b4bb2ba4628491578be)
2007-08-23 19:38:54 +10:00
Ronnie Sahlberg
f854b5f876 try out a slightly different api for controls where you provide a
callback function which is called upon completion (or timeout) of the 
control.

modify scanning of recmaster in the monitoring_cluster code to try the 
api out

(This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c)
2007-08-23 19:27:09 +10:00
Ronnie Sahlberg
4c13bf0c5f break checking that the recovery mode on all nodes is ok out into its
own function

(This used to be ctdb commit 813cf9a252af96da24122b80f24aabeed2911939)
2007-08-23 13:48:39 +10:00
Ronnie Sahlberg
8fd3df2553 hang the ctdb_req_control structure off the ctdb_client_control_state
struct so that if we time out a control we can print debug info such as 
what opcode failed and to which node

we don't need the *status parameter to ctdb_client_control_state

create async versions of the getrecmaster control

pass a memory context to getrecmaster

(This used to be ctdb commit 558b680c82f830fba82c283c78c2de8a0b150b75)
2007-08-23 13:00:10 +10:00
Ronnie Sahlberg
20120c2331 in ctdb_call_recv() we must check that state is non-NULL since
ctdb_call() may pass a null pointer to _recv() and this would cause a 
segfault.
fortunately there appear to be no critical users of this codepath 
right now, so the risk was more theoretical: IF clients start using this 
call it could segfault.

change ctdb_control() to become fully async so we later can make the 
recovery daemon do the expensive controls to nodes in parallel instead 
of in sequence

(This used to be ctdb commit 379789cda6ef049f389f10136aaa1b37a4d063a9)
2007-08-23 11:58:09 +10:00
Ronnie Sahlberg
277cdbe3d1 create an enum to describe the state of a control in flight instead of
using the enum that is for calls

(This used to be ctdb commit f9cf7076151af983a1c4ea56fbeb6d94ea508a34)
2007-08-23 09:53:10 +10:00
Ronnie Sahlberg
a4ede6da6f merge from tridge
(This used to be ctdb commit 3e17a62e7d9f2867d6f697d5dc5cdddf9fdc3497)
2007-08-22 19:28:03 +10:00
Andrew Tridgell
d95476fa38 merge from ronnie
(This used to be ctdb commit e0f1c1acb1188500674626d631e1a1b8726e72ad)
2007-08-22 17:31:29 +10:00
Andrew Tridgell
df9ec77b6b merge from volker
(This used to be ctdb commit a5587b3c065f7115ad5e55429c2c9d9923d3b4dc)
2007-08-22 17:18:55 +10:00
Andrew Tridgell
95f6328678 merge from volker
(This used to be ctdb commit 7007e4f2292aa96287b899d6b9e82c7b597ef58f)
2007-08-22 17:16:01 +10:00
Ronnie Sahlberg
50c09b7465 when we receive a packet from the network, check explicitly that the
node is not banned if the call is for a database record, i.e. a REQ/REPLY 
CALL/DMASTER

if we get such a call while banned, ignore the packet and write an entry 
in the logfile

(This used to be ctdb commit 79eb0863609fbb12e28ebf734101b1d3f359b330)
2007-08-22 12:53:24 +10:00
Ronnie Sahlberg
f6e0336b23 create a define to represent the 'invalid' generation id we used in two
places.

create a new helper function to generate new generation id values that 
knows about the invalid id and avoids generating it.

update the ctdb status tool to know about the invalid generation id and 
print the string INVALID instead

(This used to be ctdb commit 4fbcd189543cb8a92227fdcd3d158472e558ccda)
2007-08-22 12:38:31 +10:00
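The helper described in the commit message above amounts to "draw a new value, retry if it is the reserved one". A minimal Python sketch follows; it is not the ctdb implementation. The value 1 for the invalid id is an assumption taken from the neighbouring commit that resets a banned node's generation id to 1, and the 32-bit random draw and the names `new_generation` and `format_generation` are hypothetical.

```python
import random

# Assumption: the reserved 'invalid' generation id is 1.
INVALID_GENERATION = 1


def new_generation(rng=random):
    """Return a fresh generation id that is never the invalid id."""
    while True:
        generation = rng.getrandbits(32)
        if generation != INVALID_GENERATION:
            return generation


def format_generation(generation):
    """Render a generation id the way a status tool might print it."""
    if generation == INVALID_GENERATION:
        return "INVALID"
    return str(generation)
```

Keeping the reserved value behind one named constant means the generator and the status output cannot disagree about which id is invalid.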
Ronnie Sahlberg
e3b6d1e511 if the node is inactive i.e. banned or disconnected then that node is
not participating in the cluster

if a client tries to attach to a database while the node is inactive,  
return an error back to the client and fail the attach

(This used to be ctdb commit b26949f3c8e54f3bc60da04d7b4ac69f301068fc)
2007-08-22 11:34:48 +10:00
Ronnie Sahlberg
b47384d57a when a node becomes banned its databases are no longer part of ctdb
and it should thus no longer serve any database access calls until it 
has been reintroduced into the cluster.

when becoming banned, reset the local generation id to 1 to prevent 
any further database access calls from other nodes from being processed.

(This used to be ctdb commit b531021db43ebaa5f5d0ace28c59913d359bd8a8)
2007-08-22 10:38:35 +10:00
Ronnie Sahlberg
5fef81a6f1 if lockwait takes an excessive time to complete, log the time it took to
complete and also the name of the database

(This used to be ctdb commit 221ef0348fd8113a017d229d8c2c7aa5c4dfb5c2)
2007-08-22 09:46:48 +10:00
Ronnie Sahlberg
8b06fc7284 change the structure used for node flag change messages so that we can
see both the old flags as well as the new flags (so we can tell which 
flags changed)

send the CTDB_SRVID_RECONFIGURE messages to connected nodes only, not to 
every node, connected or not, in the cluster.


in the handler inside the recovery daemon which is invoked for node flag 
change messages, only do a takeover_run() and redistribute the ip 
addresses IF it was the disabled or the unhealthy flags that changed. 
Also send out the cluster reconfigured message in this case.
If any of the other flags changed we don't need to do the takeover_run() 
here since that will be done during recovery.

(This used to be ctdb commit 5549b2058e2c148a8ca9d419123acf3247bb8829)
2007-08-21 17:25:15 +10:00
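Carrying both the old and the new flags in the message lets the handler compute exactly which bits changed. The Python sketch below illustrates that decision only; the flag names and bit values are hypothetical placeholders, not definitions taken from the ctdb headers.

```python
# Hypothetical flag bits, for illustration only.
NODE_FLAGS_DISCONNECTED = 0x1
NODE_FLAGS_UNHEALTHY = 0x2
NODE_FLAGS_DISABLED = 0x4
NODE_FLAGS_BANNED = 0x8


def changed_flags(old_flags, new_flags):
    """Bits that differ between the old and the new flag word."""
    return old_flags ^ new_flags


def needs_takeover_run(old_flags, new_flags):
    """True only if the disabled or unhealthy flag changed.

    Other flag changes are left to the recovery process, as the
    commit message describes.
    """
    relevant = NODE_FLAGS_UNHEALTHY | NODE_FLAGS_DISABLED
    return bool(changed_flags(old_flags, new_flags) & relevant)
```

For example, a node going from healthy to unhealthy triggers a takeover run, while a node that only becomes banned does not.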
Ronnie Sahlberg
4e4dd6b886 when we shut down the service due to receiving a 'ctdb shutdown' command
from the administrator, log this as 'Received SHUTDOWN command. Stopping 
CTDB daemon.' so that the administrator will know when looking at the 
log 'why' the ctdb service was terminated.

Previously the only thing logged was 'shutting down' which is not 
detailed enough.

(This used to be ctdb commit 5b818c1b72b6594a8d6e45e1865026e3ce33ae63)
2007-08-21 09:46:27 +10:00
Ronnie Sahlberg
5228abef64 add an atexit() that will print "CTDB daemon shutting down" in the log
when the main daemon exits

(This used to be ctdb commit f7422397be2e319bfbee5bf0670583c353eda86d)
2007-08-21 09:43:53 +10:00
Ronnie Sahlberg
a03c8d4954 setup the logfile much earlier in the startup procedure for ctdbd
change initial errors that cause ctdb to fail to start from printf to 
DEBUG(0

add a DEBUG(0 to log that the ctdb service is starting

(This used to be ctdb commit 680b4fbb283dd68567a62a83345f11a6cc1dd0e5)
2007-08-21 09:33:03 +10:00
Ronnie Sahlberg
b582e13cae make sure that the event script is executable and just ignore it
otherwise

(This used to be ctdb commit 65eb7845c70489d654acaaf99cd2c8eac7df11dc)
2007-08-21 09:22:14 +10:00
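The check described in the commit message above can be illustrated with a short Python sketch: list a script directory and keep only regular files that are executable. The function name `runnable_event_scripts` is hypothetical, and running the scripts in sorted name order is an assumption suggested by names such as 00.ctdb and 60.nfs, not something this commit states.

```python
import os


def runnable_event_scripts(directory):
    """Return paths of executable regular files in directory, sorted by name.

    Files that are not executable are silently skipped.
    """
    scripts = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and os.access(path, os.X_OK):
            scripts.append(path)
    return scripts
```

Skipping rather than failing means an administrator can disable a script by clearing its execute bit.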
Ronnie Sahlberg
aed2c58c64 don't pollute the log with 'Registered PID XXX for client YYY' at log
level 0.

change the log level to 3 for this information message

(This used to be ctdb commit f28d713d9cacd2312932b51175aa8402c96ef76b)
2007-08-21 08:42:42 +10:00
Ronnie Sahlberg
7e1f840c8d if a public address has already been taken over by a node, then let that
public address remain at that node until either the node becomes 
unhealthy or the original/primary node for that address becomes healthy 
again.


Otherwise what will happen is 
1, if we ban a node, the banning code immediately does a 
takeover_run() and reassigns the public address to a different node in 
the cluster.
2, a few seconds later (at most) the recovery daemon will detect that 
the number of nodes has shrunk and will initiate a recovery.
During the recovery the public address would again be assigned to a 
node, this time a different node.

(This used to be ctdb commit 30a6b7a648e22873d8ce6289a3d6dc42c4b9e3b3)
2007-08-20 14:16:58 +10:00
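The "sticky" assignment rule in the commit message above can be written as a small decision function. This Python sketch is an illustration only, not the ctdb takeover code: `assign_address` is a hypothetical name, nodes are plain integers, and picking the lowest-numbered healthy node as a fallback is an assumption made to keep the example deterministic.

```python
def assign_address(current, home, healthy):
    """Pick the node that should hold a public address.

    current: node holding the address now (or None)
    home:    the original/primary node for the address
    healthy: set of nodes currently healthy
    """
    # The address goes back once its primary node is healthy again.
    if home in healthy:
        return home
    # Otherwise it stays where it is while that node remains healthy.
    if current in healthy:
        return current
    # Assumed fallback: lowest-numbered healthy node, or None if there is none.
    return min(healthy) if healthy else None
```

With this rule, a second evaluation during recovery returns the same node as the first one, so the address does not move twice in a few seconds.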
Ronnie Sahlberg
d823685ee2 merge from tridge
(This used to be ctdb commit 42f38e787eaa3d8534ce24fca4f29d9ff5bdb9e6)
2007-08-20 13:29:27 +10:00
Andrew Tridgell
405e123ffb removed redundant debug message
(This used to be ctdb commit 9ee742b7cc43be7da6b568308912a3f2cfe4f4d3)
2007-08-20 11:13:38 +10:00
Andrew Tridgell
46639ac19e merged new event script calling code from ronnie
(This used to be ctdb commit bbacad61b3eee4276ffe44ed2a23949aca8152cf)
2007-08-20 11:10:30 +10:00
Ronnie Sahlberg
1ee8c79db7 start winbind before smbd
(This used to be ctdb commit d6a2e22a6d688cfcf5631c8de68fc8ef721635d6)
2007-08-16 11:34:35 +10:00
Ronnie Sahlberg
ce91401724 we should start winbindd before we start smb
(This used to be ctdb commit 03aad3ea55c4816a3790ac9336026b4872a65310)
2007-08-16 11:18:16 +10:00
Ronnie Sahlberg
7322e82bcb add text to the event script timeout log on how to find out which script
timed out

(This used to be ctdb commit bd6db995fb00ed45c5f0a50bbe6cf5d0fe22a194)
2007-08-15 15:08:42 +10:00
Ronnie Sahlberg
3b9d50f3ee change the now rather small /etc/ctdb/events script into a service
specific script /etc/ctdb/events.d/00.ctdb

get rid of CTDB_EVENTS_SCRIPT and --event-script

(This used to be ctdb commit 81ccfaf838e5772d4a58eb6a70224b7b39aba9f3)
2007-08-15 15:01:31 +10:00
Ronnie Sahlberg
ff58f7c7ea add a comment that the talloc_free also removes the script from the tree
(This used to be ctdb commit ce71f6e9cf983cc4fe66935ad6c18d55dfed03a5)
2007-08-15 14:46:06 +10:00
Ronnie Sahlberg
4023576e50 call the service specific event scripts directly from the forked child
instead of from /etc/ctdb/events so that we can get better debugging 
output in the logs when something fails in the scripts

(This used to be ctdb commit 4ed96b768aea1611e8002f7095d3c4d12ccf77a3)
2007-08-15 14:44:03 +10:00
Ronnie Sahlberg
6fc0653b97 zero out the sa struct to suppress a valgrind error
(This used to be ctdb commit b17ff60ad4c5fac76d3f77dacb10c30ae564bf09)
2007-08-15 12:34:41 +10:00