samba-mirror

mirror of https://github.com/samba-team/samba.git synced 2024-12-24 21:34:56 +03:00

Author	SHA1	Message	Date
Ronnie Sahlberg	1c88f422d5	add a parameter for the tdb-flags to the client function ctdb_attach() so that we can pass TDB_NOSYNC when we attach to a persistent database and want fast unsafe writes instead of slow but safe tdb_transaction writes. enhance the ctdb_persistent test suite to test both safe and unsafe writes (This used to be ctdb commit 4948574f5a290434f3edd0c052cf13f3645deec4)	2008-06-04 10:46:20 +10:00
Ronnie Sahlberg	ceaf488f05	do persistent writes in a child process (This used to be ctdb commit 2da3d1f876f5d654f849af8a3e588f5a61300c3d)	2008-05-28 13:04:25 +10:00
Ronnie Sahlberg	ed2cf0291d	second try for safe transaction stores into persistent tdb databases. for stores into persistent databases, ALWAYS have a lockwait child take out the lock for the record and never the daemon itself. (This used to be ctdb commit 7fb6cf549de1b5e9ac5a3e4483c7591850ea2464)	2008-05-22 12:47:33 +10:00
Ronnie Sahlberg	909ff219e0	Start implementing support for ipv6. This enhances the framework for sending tcp tickles to be able to send ipv6 tickles as well. Since we can not use one single RAW socket to send both handcrafted ipv4 and ipv6 packets, instead of always opening TWO sockets, one ipv4 and one ipv6 we get rid of the helper ctdb_sys_open_sending_socket() and just open (and close) a raw socket of the appropriate type inside ctdb_sys_send_tcp(). We know which type of socket v4/v6 to use based on the sin_family of the destination address. Since ctdb_sys_send_tcp() opens its own socket we no longer need to pass a socket descriptor as a parameter. Get rid of this redundant parameter and fixup all callers. (This used to be ctdb commit 406a2a1e364cf71eb15e5aeec3b87c62f825da92)	2008-05-14 15:47:47 +10:00
Ronnie Sahlberg	2bc0e5a69f	add a new container to hold a socketaddr for either ipv4 or ipv6 (This used to be ctdb commit 93b98838824fae5f47e4ed6b95ae9e4e7597bec3)	2008-05-14 15:40:44 +10:00
Ronnie Sahlberg	b8eb5925cf	Try to use tdb transactions when updating a record and record header inside the ctdb daemon. If a transaction could be started, do safe transaction store when updating the record inside the daemon. If the transaction could not be started (maybe another samba process has a lock on the database?) then just do a normal store instead (instead of blocking the ctdb daemon). The client can "signal" ctdb that updates to this database should, if possible, be done using safe transactions by specifying the TDB_NOSYNC flag when attaching to the database. The TDB flags are passed to ctdb in the "srvid" field of the control header when attaching using the CTDB_CONTROL_DB_ATTACH_PERSISTENT. Currently, samba3.2 does not yet tell ctdbd to handle any persistent databases using safe transactions. If samba3.2 wants a particular persistent database to be handled using safe transactions inside the ctdbd daemon, it should pass TDB_NOSYNC as the flags to the call to attach to a persistent database in ctdbd_db_attach() it currently specifies 0 as the srvid (This used to be ctdb commit 8d6ecf47318188448d934ab76e40da7e4cece67d)	2008-05-12 13:37:31 +10:00
Ronnie Sahlberg	92b61cd7d5	Expand the client async framework so that it can take a callback function. This allows us to use the async framework also for controls that return outdata. Add a "capabilities" field to the ctdb_node structure. This field is only initialized and kept valid inside the recovery daemon context and not inside the main ctdb daemon. change the GET_CAPABILITIES control to return the capabilities in outdata instead of in the res return variable. When performing a recovery inside the recovery daemon, read the capabilities from all connected nodes and update the ctdb->nodes list of nodes. when building the new vnnmap after the database rebuild in recovery, do not include any nodes which lack the LMASTER capability in the new vnnmap. Unless there are no available connected node that sports the LMASTER capability in which case we let the local node (recmaster) take on the lmaster role temporarily (i.e. become a member of the vnnmap list) (This used to be ctdb commit 0f1883c69c689b28b0c04148774840b2c4081df6)	2008-05-06 15:42:59 +10:00
Ronnie Sahlberg	a9c45f9513	Add a capabilities field to the ctdb structure Define two capabilities : can be recmaster can be lmaster Default both capabilities to YES Update the ctdb tool to read capabilities off a node (This used to be ctdb commit 50f1255ea9ed15bb8fa11cf838b29afa77e857fd)	2008-05-06 10:02:27 +10:00
Ronnie Sahlberg	0e1a20b603	Revert "Revert "Revert "- accept an optional set of tdb_flags from clients on open a database,""" remove the transaction stuff and push so that the git tree will work This reverts commit 539bbdd9b0d0346b42e66ef2fcfb16f39bbe098b. (This used to be ctdb commit 876d3aca18c27c2239116c8feb6582b3a68c6571)	2008-04-10 15:59:51 +10:00
Ronnie Sahlberg	39f119b42c	Revert "Revert "- accept an optional set of tdb_flags from clients on open a database,"" This reverts commit 171d1d71ef9f2373620bd7da3adaecb405338603. (This used to be ctdb commit 539bbdd9b0d0346b42e66ef2fcfb16f39bbe098b)	2008-04-10 14:57:41 +10:00
Ronnie Sahlberg	9684befa16	Revert "- accept an optional set of tdb_flags from clients on open a database," This reverts commit 49330f97c78ca0669615297ac3d8498651831214. (This used to be ctdb commit 171d1d71ef9f2373620bd7da3adaecb405338603)	2008-04-10 14:45:45 +10:00
Andrew Tridgell	dc15a9c1f6	- accept an optional set of tdb_flags from clients on open a database, thus allowing the client to pass through the TDB_NOSYNC flag - ensure that tdb_store() operations on persistent databases that don't have TDB_NOSYNC set happen inside a transaction wrapper, thus making them crash safe (This used to be ctdb commit 49330f97c78ca0669615297ac3d8498651831214)	2008-04-10 15:25:48 +10:00
Ronnie Sahlberg	e8e67ef576	add a mechanism to force a node to run the eventscripts with arbitrary arguments ctdb eventscript "command argument argument ..." (This used to be ctdb commit 118a16e763d8332c6ce4d8b8e194775fb874c8c8)	2008-04-02 11:13:30 +11:00
Ronnie Sahlberg	27a7f854f5	add improvements to tracking memory usage in ctdbd and the recovery daemon and a ctdb command to pull the talloc memory map from a recovery daemon ctdb rddumpmemory (This used to be ctdb commit d23950be7406cf288f48b660c0f57a9b8d7bdd05)	2008-04-01 15:34:54 +11:00
Ronnie Sahlberg	0d7b34c9e5	Add two new controls to add/delete public ip address from a node at runtime. The controls only modify the runtime setting of which public addresses a node can serve and do not modify /etc/ctdb/public_addresses. To make the change permanent you also need to edit /etc/ctdb/public_addresses manually. After ip addresses have been added/deleted you need to invoke a recovery for the ip addresses to be redistributed. (This used to be ctdb commit f8294d103fdd8a720d0b0c337d3973c7fdf76b5c)	2008-03-27 09:23:27 +11:00
Ronnie Sahlberg	2863d2cfd1	From M Dietz, Add back the controls to enable/disable monitoring we used to have for debugging but removed a while ago (This used to be ctdb commit 8477f6a079e2beb8c09c19702733c4e17f5032fe)	2008-03-25 08:27:38 +11:00
Ronnie Sahlberg	d53424731f	in ctdb_call_local() we can not talloc_steal() the returned data and hang it off ctdb. This can cause a memory leak if the call is terminated before we have managed to respond to the client. (and the call is talloc_free()d but the data is still hanging off ctdb) instead we must talloc_steal() the data and hang it off the call structure to avoid the memory leak. In order to do this we must also change the call structure that is passed into ctdb_call_local() to be allocated through talloc(). This structure was previously either a static variable, or an element of a larger talloc()ed structure (ctdb_call_state or ctdb_client_call_state) so we must change all creations of a ctdb_call into explicitly creating it through talloc() (This used to be ctdb commit 4becf32aea088a25686e8bc330eb47d85ae0ef8f)	2008-03-19 13:54:17 +11:00
Ronnie Sahlberg	74d57f8d51	Redo the vacuuming process to make it scalable. Vacuuming used to delete one record at a time on all nodes, that was m*n behaviour and would require a huge storm of ctdb->ctdb controls and just wouldn't scale at all. The new vacuuming process collects all records to be deleted locally and then only sends 1 control to the other nodes. This control contains a list of all records to be deleted. (This used to be ctdb commit 9e625ece19a91f362c9539fa73b6b2108f0d9c53)	2008-03-13 07:53:29 +11:00
Ronnie Sahlberg	a89ed0fdc2	add a new tunable 'NoIPFailback' when this tunable is set, ip addresses will only be failed over when a node fails. And only those ip addresses held by the failed node will be reallocated in the cluster. When a node becomes active again, this will not lead to any failback of ip addresses. This can reduce the number of "ip address movements" in the cluster since we dont automatically fail an ip address back, but can also lead to an unbalanced cluster since we no longer attempt to spread the ip addresses out evenly across the active nodes. This tuneable can NOT be active at the same time as DeterministicIPs are used. (This used to be ctdb commit d3b8a461b15bc584fa1785eb5922de6d49d8f6c4)	2008-03-03 12:52:16 +11:00
Ronnie Sahlberg	f6f7f54bd6	add a new tunable : reclockpingperiod once every such interval : * the recovery master on each node will update the "connected" count in the reclock count file (ctdb getreclock) * if the node thinks it is a recovery master but it detects another node that is DISCONNECTED but which still holds a lock to the reclock count file this may mean that we have a split cluster. if that other node that is DISCONNECTED but still holds the lock on the reclock pnn count file, is MORE connected than the local node, yield the recmaster role and let the other half of the cluster take over this adds a second, last chance mechanism to detect split clusters. IF the cluster is split but GPFS is not yet split, this mechanism makes the largest half of the cluster become the active half. (This used to be ctdb commit 07af425f444531942cce8abff112c1524228d287)	2008-03-03 09:19:30 +11:00
Ronnie Sahlberg	e0036942bc	add a new file <reclock>.pnn where each recovery daemon can lock that byte at offset==pnn to offer an alternative way to detect which nodes are active instead of relying on CONNECTED being accurate. (This used to be ctdb commit 21d3319eaf463e2a00637d440ee2d4d15f53bf09)	2008-02-29 12:37:42 +11:00
Ronnie Sahlberg	4adeafef11	add a control to get the name of the reclock file from the daemon (This used to be ctdb commit 9effb22cc1616d684352d7ebabb359e69adb0f52)	2008-02-29 10:03:39 +11:00
Ronnie Sahlberg	7bc8007f93	add a new tunable DisableWhenUnhealthy which when set will cause a node to automatically become DISABLED anytime monitoring fails and the node becomes UNHEALTHY. Use with caution. (This used to be ctdb commit c20293360db67f9876b0c84e5e9e12a5868964cb)	2008-02-22 10:33:09 +11:00
Ronnie Sahlberg	39539f6044	Add a new parameter to /etc/sysconfig/ctdb CTDB_START_AS_DISABLED="yes" and command line argument --start-as-disabled When set, this makes the ctdb node to always start in DISABLED mode and will thus not host any public ip addresses. The administrator must manually "ctdb enable" the node after it has started when the administrator wants the node to start hosting public ip addresses. Using this option it is possible to start ctdb on a node without causing any reallocation of ip addresses when it is starting. The node will still merge with the cluster and there will still be a recovery phase but the ip address allocations will not change in the cluster. (This used to be ctdb commit b93d29f43f5306c244c887b54a77bca8a061daf2)	2008-02-22 09:42:52 +11:00
Ronnie Sahlberg	9f99b44fd1	to make it easier/less disruptive to add nodes to a running cluster add a new control that causes the node to drop the current nodes list and reread it from the nodes file. During this operation, the node will also drop the tcp layer and restart it. When we drop the tcp layer, by talloc_free()ing the ctcp structure add a destructor to ctcp so that we also can clean up and remove the references in the ctdb structure to the transport layer add two new commands for the ctdb tool. one to list all nodes in the nodesfile and the second a command to trigger a node to drop the transport and reinitialize it with the new nodes file (This used to be ctdb commit 4bc20ac73e9fa94ffd43cccb6eeb438eeff9963c)	2008-02-19 14:44:48 +11:00
Ronnie Sahlberg	3f56526037	Specify and print debuglevels by name and not by number (This used to be ctdb commit 79ad830294b8b677fbd0c5ad7ed6fbde71f74f8d)	2008-02-05 10:26:23 +11:00
Andrew Tridgell	9d6ac0cf55	added debug constants to allow for better mapping to syslog levels (This used to be ctdb commit 7ba8f1dde318eab03f4257e5a89fd23e7281e502)	2008-02-04 17:44:24 +11:00
Andrew Tridgell	146d4b0db7	merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2)	2008-01-29 13:59:28 +11:00
Ronnie Sahlberg	9055978b46	add a ctdb uptime command that prints when ctdb was started and when the last recovery occurred (This used to be ctdb commit b86e8ccbdac044bb949c4fc2ebb27635126272a9)	2008-01-17 11:33:23 +11:00
Andrew Tridgell	b62b7fcde8	added syslog support, and use a pipe to catch logging from child processes to the ctdbd logging functions (This used to be ctdb commit 1306b04cd01e996fd1aa1159a9521f2ff7b06165)	2008-01-16 22:03:01 +11:00
Ronnie Sahlberg	5b7838d768	ctdb_control_send() does not need to take an outdata parameter remove the outdata parameter from the function and all callers (This used to be ctdb commit e3951337f8df2ae19cce61c954036590c7a03582)	2008-01-16 10:23:26 +11:00
Ronnie Sahlberg	ba31feaec0	split node health monitoring and checking for connected/disconnected nodes into two separate files. move the monitoring of keepalives for detecting connected/disconnected remote nodes into ctdb_keepalive.c (This used to be ctdb commit 23a57b20c314d5f11a433cf251eb9d9de743849a)	2008-01-15 08:42:12 +11:00
Andrew Tridgell	b866a147d2	get rid of monitor_retry as well (This used to be ctdb commit c957cf9c1d99d5d3f4ca726f7a867c829660a2b7)	2008-01-10 14:49:43 +11:00
Andrew Tridgell	538f519dba	exponential backoff in health monitoring for faster startup (This used to be ctdb commit 1b04a1f675f73b48366ba98803a58c3d8df1b6e1)	2008-01-10 14:40:56 +11:00
Andrew Tridgell	3b3fceacbe	block alarm signals during critical sections of vacuum (This used to be ctdb commit cfb14ae76f00f10d27b56c034b2247ab12d63065)	2008-01-10 09:43:14 +11:00
Andrew Tridgell	1c91398aef	ensure the recovery daemon is not clagged up by vacuum calls (This used to be ctdb commit ff7e80e247bf5a86adda0ef850d901478449675b)	2008-01-08 21:28:42 +11:00
Andrew Tridgell	96100fcae6	added two new ctdb commands: ctdb vacuum : vacuums all the databases, deleting any zero length ctdb records ctdb repack : repacks all the databases, resulting in a perfectly packed database with no freelist entries (This used to be ctdb commit 3532119c84ab3247051ed6ba21ba3243ae2f6bf4)	2008-01-08 17:23:27 +11:00
Andrew Tridgell	37861932ce	merge from ronnie (This used to be ctdb commit 0aa6e04438aa5ec727815689baa19544df042cf7)	2008-01-07 16:17:22 +11:00
Andrew Tridgell	748843a3c6	added paranoid transaction ids (This used to be ctdb commit afc1da53873cdbd31fcc8c6b22fae262e344cf6e)	2008-01-06 13:24:55 +11:00
Andrew Tridgell	c08f2616cd	new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3)	2008-01-06 12:38:01 +11:00
Andrew Tridgell	43aa27c9ee	this is needed with merged tdb (This used to be ctdb commit 3dc07f2bf98ab445ab960ef14173bc6924e3b658)	2008-01-05 17:42:01 +11:00
Andrew Tridgell	e4aefbc66d	a new tunable DatabaseMaxDead that enables the tdb max dead cache logic (This used to be ctdb commit 01c519c3658a8fcb9545b507b597e723658e4c4e)	2008-01-05 09:36:53 +11:00
Ronnie Sahlberg	50573c5391	add ctdb_disable/enable_monitoring() that only modifies the monitoring flag. change calling of the recovered/takeip/releaseip event scripts to use these enable/disable functions instead of stopping/starting monitoring. when we disable monitoring we want all events to still be running in particular the events to monitor for dead nodes and we only want to suppress running the monitor event scripts (This used to be ctdb commit a006dcc4f75aba950dd701ad7d1a84e89df285e8)	2007-11-30 10:09:54 +11:00
Ronnie Sahlberg	0eb6c04dc1	get rid of the control to set the monitoring mode. monitoring should always be enabled (though a node may want to temporarily disable running the "monitor" event scripts but can do so internally without the need for this control) (This used to be ctdb commit e3a33618026823e6af845fd8513cddb08e6b5584)	2007-11-30 10:00:04 +11:00
Ronnie Sahlberg	9e73dc87cc	Add a --node-ip argument so that one can specify which ip address a specific instance of ctdbd should bind to. This helps when running a "virtual" cluster on a single machine where all instances bind to different alias interfaces. If --node-ip is specified, then we will only try to bind to this ip address. Otherwise we fall back to the original method trying the ip addresses in /etc/ctdb/nodes one by one until we find one we can bind to. No variable in /etc/sysconfig/ctdb added since this parameter only makes sense in a virtual test/debug cluster. (This used to be ctdb commit d96cb02c2c24f9eabbc53d3d38e90dea49cff3e0)	2007-11-26 10:52:55 +11:00
Andrew Tridgell	bde886988b	prevent a deadly embrace between smbd and ctdbd by moving the calling of the startup event scripts after the point where recovery has started and the node is in normal operation This makes the 'startup' script just a special type of the 'monitor' script which is called first (This used to be ctdb commit 7424c30a5fd04aea0137c466b4318c3f185280d8)	2007-11-12 10:53:11 +11:00
Ronnie Sahlberg	4a97876fb7	when we are shutting down, we should first shut down the recovery daemon (This used to be ctdb commit 39ade6b329adcd3234124d6a8daaa6181abf739b)	2007-10-22 12:34:08 +10:00
Ronnie Sahlberg	d1ba047b7f	add a new transport method so that when a node is marked as dead, we shut down and restart the transport otherwise, if we use the tcp transport the tcp connection might try to retransmit the queued data during the time the node is unavailable. this together with the exponential backoff for tcp means that the tcp connection quickly reaches the maximum backoff rto which is often 60 or 120 seconds. this would mean that it could take up to 60/120 seconds before the tcp layer detects that the connection is dead and it has to be reestablished. (This used to be ctdb commit 0256db470879ce556b0f00070f7ebeaf37e529ab)	2007-10-19 08:58:30 +10:00
Ronnie Sahlberg	056aac6e0c	add a new tunable : DeterministicIPs that makes the allocation of public addresses to nodes deterministic. Activate it by adding CTDB_SET_DeterministicIPs=1 in /etc/sysconfig/ctdb When this is set, the first entry in /etc/ctdb/public_addresses will always be hosted by node 0, when that node is available, the second entry by node1 and so on. This tunable allows the allocation of addresses to become very unbalanced and is only for debugging/testing use. Beware, this feature requires that /etc/ctdb/public_addresses are identical on all the nodes in the cluster. (This used to be ctdb commit f0ca221f235731542090d8a6c86f2b7cd2ce2f96)	2007-10-16 12:15:02 +10:00
Andrew Tridgell	0e855c0772	merge from ronnie (This used to be ctdb commit d18712caba11855010be52f90bac656683076676)	2007-10-15 14:17:49 +10:00
Andrew Tridgell	174879621e	add config option for disabling bans (This used to be ctdb commit 153b911f7f957d4c564b04f5aa878033a02da9e4)	2007-10-15 13:22:58 +10:00
Ronnie Sahlberg	bdd67bba1e	add a --single-public-ip argument to ctdbd to specify the ip address used in single public ip address mode. when using this argument, --public-interface must also be used. add a vnn structure to the ctdb context to describe the single public ip address update the killtcp control in the daemon that if a socketpair that is to be killed does not match a normal public address it checks if the destination address matches the single public ip address and if so uses that vnn structure from the ctdb context this allows killtcp to kill also connections to the single public ip instead of only normal public addresses (This used to be ctdb commit 5661ba17b91f62821dec1c76056c78b99752a90b)	2007-10-10 09:42:32 +10:00
Ronnie Sahlberg	80cd82f8e4	add a control to send gratuitous arps from the ctdb daemon (This used to be ctdb commit 563819dd1acb344f95aabb4bad990b36f7ea4520)	2007-10-09 11:56:09 +10:00
Ronnie Sahlberg	72379ee3eb	change async.private to async.private_data since private is a reserved word in c++ (This used to be ctdb commit 79eb28f6cd5dcc30b04966d202a050eaf98a2552)	2007-09-26 14:25:32 +10:00
Andrew Tridgell	80100c3573	run monitoring more quickly when unhealthy and at startup (This used to be ctdb commit ff1c205928e3ef5bcc6bf4e4b2122a19fa38d8f4)	2007-09-24 10:12:18 +10:00
Andrew Tridgell	c60988325d	added support for persistent databases in ctdbd (This used to be ctdb commit 3115090a0d882beca9d70761130b74bb0821f201)	2007-09-21 12:24:02 +10:00
Andrew Tridgell	3c0f61cb92	we don't need the is_loopback logic in ctdb any more (This used to be ctdb commit 4ecf29ade0099c7180932288191de9840c8d90a9)	2007-09-13 10:45:06 +10:00
Andrew Tridgell	f3ae1cdb02	- use struct sockaddr_in more consistently instead of string addresses - allow for public_address lines with a defaulting interface (This used to be ctdb commit 29cb760f76e639a0f2ce1d553645a9dc26ee09e5)	2007-09-10 14:27:29 +10:00
Ronnie Sahlberg	4ac749bfa4	change the signature to ctdb_sys_have_ip() to also return: a bool that specifies whether the ip was held by a loopback adaptor or not the name of the interface where the ip was held when we release an ip address from an interface, move the ip address over to the loopback interface when we release an ip address after we have moved it onto loopback, use 60.nfs to kill off the server side (the local part) of the tcp connection so that the tcp connections dont survive a failover/failback 61.nfstickle, since we kill the tcp connections when we release an ip address we no longer need to restart the nfs service in 61.nfstickle update ctdb_takeover to use the new signature for ctdb_sys_have_ip when we add a tcp connection to kill in ctdb_killtcp_add_connection() check if either the source or destination address match a known public address (This used to be ctdb commit f9fd2a4719c50f6b8e01d0a1b3a74b76b52ecaf3)	2007-09-10 07:20:44 +10:00
Ronnie Sahlberg	77ec4d5248	allow different nodes in the cluster to use different public_addresses files so that we can partition the cluster into different subsets of nodes which each serve a different subset of the public addresses (This used to be ctdb commit 889e0fe69e4c88c6166282b12843b8d9727552d6)	2007-09-04 23:15:23 +10:00
Ronnie Sahlberg	8f819c6a0e	get rid of the ctdb_vnn_list structure and just use a single list of ctdb_vnn (This used to be ctdb commit 7b9fd06321af17043136b1420b57284450ae7ba5)	2007-09-04 18:20:29 +10:00
Ronnie Sahlberg	cf45c5096c	we cant have takeover_ctx hanging off ctdb since it is freed/recreated every time we release an ip. this context is used to hold all resources needed when sending out gratuitous arps and tcp tickles during ip takeover. we hang it off the vnn structure that manages that particular ip address instead so that we can have multiple ones going in parallel this bug (or the same bug in different shape) has probably been in ctdb for very very long but is likely to be hard to trigger (This used to be ctdb commit c58db1cadaba253b2659573673b28c235ef7db76)	2007-09-04 14:36:52 +10:00
Ronnie Sahlberg	d66d9cdd22	change debug output from vnn to pnn change ctdb_daemon_send_message to take pnn as parameter instead of vnn (This used to be ctdb commit e352a2bbf9bb9a0b2c4f8329e8a529cf02414097)	2007-09-04 10:45:41 +10:00
Ronnie Sahlberg	0c91261340	change ctdb_send_message to take pnn as parameter instead of vnn (This used to be ctdb commit 93dd4fba2e0fa6a011d15406652836785a974880)	2007-09-04 10:42:20 +10:00
Ronnie Sahlberg	157be530dd	change ctdb_ctrl_getvnn to ctdb_ctrl_getpnn (This used to be ctdb commit ef47cc4cd416065c69382e4d9e76c30a0a34e42f)	2007-09-04 10:38:48 +10:00
Ronnie Sahlberg	211b497818	change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a)	2007-09-04 10:33:10 +10:00
Ronnie Sahlberg	6f693bbcbd	change server_id.vnn to server_id.pnn (This used to be ctdb commit 26f2ee2b754a9271454412f05111a19b3013c6eb)	2007-09-04 10:21:51 +10:00
Ronnie Sahlberg	583b6e6ba6	change ctdb_get_vnn to ctdb_get_pnn (This used to be ctdb commit 1e19930198c2bcc7ccb755e0ee51555fb823029a)	2007-09-04 10:18:44 +10:00
Ronnie Sahlberg	fc9d39c3a6	change ctdb_validate_vnn to ctdb_validate_pnn (This used to be ctdb commit a4a1f41b69475b9dc16d8fd7f8965c32e96c32f0)	2007-09-04 10:09:58 +10:00
Ronnie Sahlberg	eb4cf6a686	change ctdb->vnn to ctdb->pnn (This used to be ctdb commit 8c776e5707e503ec6586aae39ac6b3ea5a2fd2bc)	2007-09-04 10:06:36 +10:00
Ronnie Sahlberg	12ebb74838	change how we do public addresses and takeover so that we can have multiple public addresses spread across multiple interfaces on each node. this is a massive patch since we have previously made the assumption that we only have one public address per node. get rid of the public_interface argument. the public addresses file now explicitly lists which interface the address belongs to (This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8)	2007-09-04 09:50:07 +10:00
Ronnie Sahlberg	7f02e16143	add async versions of the freeze node control and freeze all nodes in parallel (This used to be ctdb commit f34e89f54d9f4380e76eb1b5b2385a4d8500b505)	2007-08-27 10:31:22 +10:00
Ronnie Sahlberg	801bdbdc80	add a control to pull the server id list off a node (This used to be ctdb commit 38aa759aa88a042c31b401551f6a713fb7bbe84e)	2007-08-26 10:57:02 +10:00
Ronnie Sahlberg	6681da31df	add an initial implementation of a service_id structure and three controls to register/unregister/check a server id. a server id consists of TYPE:VNN:ID where type is specific to the application. VNN is the node where the serverid was registered and ID might be a node unique identifier such as a pid or similar. Clients can register a server id for themselves at the local ctdb daemon. When a client disappears or when the domain socket connection for the client drops then any and all server ids registered across that domain socket will also be automatically removed from the store. clients can register as many server_ids as they want at the same time but each TYPE:VNN:ID must be globally unique. Clients have the option of explicitly unregistering a server id by using the UNREGISTER control. Registration and unregistration can only be done by clients to the local daemon. clients can not register their server id to a remote node. clients can check if a server id does exist on any ctdb node in the network by using the check control (This used to be ctdb commit d44798feec26147c5cc05922cb2186f0ef0307be)	2007-08-24 15:53:41 +10:00
Ronnie Sahlberg	495a6403da	change the api for managing callbacks to controls so that instead of passing it as a parameter we set the callback function explicitly from the caller if the ..._send() function returned a valid state pointer. (This used to be ctdb commit aa939570662786455f63299b62c99882cff29d42)	2007-08-24 10:42:06 +10:00
Ronnie Sahlberg	f854b5f876	try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c)	2007-08-23 19:27:09 +10:00
Ronnie Sahlberg	8fd3df2553	hang the ctdb_req_control structure off the ctdb_client_control_state struct so that if we timeout a control we can print debug info such as what opcode failed and to which node we dont need the *status parameter to ctdb_client_control_state create async versions of the getrecmaster control pass a memory context to getrecmaster (This used to be ctdb commit 558b680c82f830fba82c283c78c2de8a0b150b75)	2007-08-23 13:00:10 +10:00
Ronnie Sahlberg	f6e0336b23	create a define to represent the 'invalid' generation id we used in two places. create a new helper function to generate new generation id values that know about the invalid id and avoids generating it. update the ctdb status tool to know about the invalid generation id and print the string INVALID instead (This used to be ctdb commit 4fbcd189543cb8a92227fdcd3d158472e558ccda)	2007-08-22 12:38:31 +10:00
Ronnie Sahlberg	8b06fc7284	change the structure used for node flag change messages so that we can see both the old flags as well as the new flags (so we can tell which flags changed) send the CTDB_SRVID_RECONFIGURE messages to connected nodes only, not to every node, connected or not, in the cluster. in the handler inside the recovery daemon which is invoked for node flag change messages, only do a takeover_run() and redistribute the ip addresses IF it was the disabled or the unhealthy flags that changed. Also send out the cluster reconfigured message in this case. If any of the other flags changed we dont need to do the takeover_run() here since that will be done during recovery. (This used to be ctdb commit 5549b2058e2c148a8ca9d419123acf3247bb8829)	2007-08-21 17:25:15 +10:00
Andrew Tridgell	46639ac19e	merged new event script calling code from ronnie (This used to be ctdb commit bbacad61b3eee4276ffe44ed2a23949aca8152cf)	2007-08-20 11:10:30 +10:00
Ronnie Sahlberg	3b9d50f3ee	change the now rather small /etc/ctdb/events script into a service specific script /etc/ctdb/events.d/00.ctdb get rid of CTDB_EVENTS_SCRIPT and --event-script (This used to be ctdb commit 81ccfaf838e5772d4a58eb6a70224b7b39aba9f3)	2007-08-15 15:01:31 +10:00
Ronnie Sahlberg	4023576e50	call the service specific event scripts directly from the forked child instead of from /etc/ctdb/events so that we can get better debugging output in the logs when something fails in the scripts (This used to be ctdb commit 4ed96b768aea1611e8002f7095d3c4d12ccf77a3)	2007-08-15 14:44:03 +10:00
Andrew Tridgell	f03defff70	merge changes needed for samba4 (This used to be ctdb commit a7f80f78cd62401b3516da3640bf24d6362db872)	2007-08-15 09:03:58 +10:00
Ronnie Sahlberg	fca90ce3c3	updated ctdb tickle management there is an array for each node/public address that contains tcp tickles we send a TCP_ADD as a broadcast to all nodes when a client is added if tcp tickles are removed, they are only removed immediately from the local node. once every 20 seconds a node will push/broadcast out the tickle list for all public addresses it manages. this will remove any deleted tickles from the remote nodes (This used to be ctdb commit e3c432a915222e1392d91835bc7a73a96ab61ac9)	2007-07-20 15:05:55 +10:00
Ronnie Sahlberg	7b17afdfcd	change the tickle list from one global list into an array per public ip/node once we have started sending all tickles for a specific ip delete the entire array so that the tickles dont remain forever in the ctdb server add a control to send the full list of every tickle that is registered for a particular public ip/node (This used to be ctdb commit d0eee33e44d3f8e26debbec21d41e2cbdbb520e6)	2007-07-20 10:06:41 +10:00
Ronnie Sahlberg	f09566a81a	add a private_data field to the killtcp structure and let the system specific routines populate it as they see fit when creating a capture socket. pass this structure to read_tcp and close capture socket as parameter (This used to be ctdb commit 79bbfcfb2223889126fe307d5bbfd24917da07ee)	2007-07-13 17:07:10 +10:00
Andrew Tridgell	1e14ecd176	- merge from ronnie - cleaner handling of system capture socket (This used to be ctdb commit d194a41a71b8466d0726dcbae3970a86386fcb3c)	2007-07-13 11:31:18 +10:00
Andrew Tridgell	d2a5af7eb8	fully save/restore scheduler parameters (This used to be ctdb commit 59408eabe7515d49a6eef3b6fb2590a1cd1df956)	2007-07-13 09:35:46 +10:00
Andrew Tridgell	fc73bc5c24	added --nosetsched option to ctdbd (This used to be ctdb commit 4cbbb88c1735c7d112e751e22da1c1c69e09bf4a)	2007-07-13 08:47:02 +10:00
Ronnie Sahlberg	a650497680	as an optimization for when we want to send multiple tickles at a time let the caller create the sending socket and use a single socket instead of one new one for each tickle. pass a sending socket to ctdb_sys_send_tcp() ctdb_sys_kill_tcp is no longer used so remove it set the socketflags for close on exec and nonblocking in the helper that creates the sockets instead of in the caller add a helper to create a sending socket to send tickles from (This used to be ctdb commit 469f3fb238a0674a2b48fdf1a7e657e32428178a)	2007-07-12 09:22:06 +10:00
Ronnie Sahlberg	823b7d4a5f	rename killtcp->fd to killtcp->capture_fd we might want to have two sockets attached to the killtcp structure one for capturing and a second one for sending so we dont have to create a new socket for each tickle we want to send (This used to be ctdb commit b3e82ec38047bbec1edfd88ade264077d4cbd2ee)	2007-07-12 08:52:24 +10:00
Ronnie Sahlberg	76ab80104a	make the ctdb tool use the killtcp control in the daemon instead of calling killtcp directly (This used to be ctdb commit d21e3e9cf11bdcba6234302e033d6549c557dd69)	2007-07-12 08:30:04 +10:00
Ronnie Sahlberg	1ed0c3a9f7	add daemon code for the new kill_tcp control (This used to be ctdb commit 8fe4ae62255ecb2db36bea736ff17409ba6614c5)	2007-07-11 18:24:25 +10:00
Ronnie Sahlberg	e4db03f7e6	add a ctdb_ prefix to two public functions (This used to be ctdb commit 32adee5426aa75ddcd4d648ef326ed03d5ff5c46)	2007-07-11 18:13:03 +10:00
Ronnie Sahlberg	aa080f66d9	first cut at a better and more scalable socketkiller that can kill multiple connections asynchronously using one listening socket (This used to be ctdb commit 22bb44f3d745aa354becd75d30774992f6c40b3a)	2007-07-11 17:43:51 +10:00
Ronnie Sahlberg	0c44e0ad46	add a ctdb_kill_tcp_callback() that will perform a kill tcp using a background process (This used to be ctdb commit dcfcaacff56347d94c244512eb72219b05ef9c3d)	2007-07-11 12:33:14 +10:00
Andrew Tridgell	f97f2946d2	minor back-merge from samba4 (This used to be ctdb commit c591f9b2d2847f440702e7264c7da2fd6d69f4be)	2007-07-10 18:13:47 +10:00
Andrew Tridgell	32de198fd3	update lib/replace from samba4 (This used to be ctdb commit f0555484105668c01c21f56322992e752e831109)	2007-07-10 15:29:31 +10:00
Ronnie Sahlberg	e1f774a95b	use the official iana number for ctdb and not 9001 (This used to be ctdb commit f72aeb5eadb0bda97d882b5a27562bfa1bb5f5a2)	2007-07-06 15:29:03 +10:00
Andrew Tridgell	bdf01ed7c0	- neaten up the command line for killtcp - split out the event script code into a separate module - get rid of the separate takeover directory (This used to be ctdb commit 8ea2c923a3e2464200ff79bf2c3f1f89e6a93ad4)	2007-07-04 16:51:13 +10:00
Ronnie Sahlberg	a52f6760f3	add a new ctdb_sys_kill_tcp() function that kills (RST) the specified connection (This used to be ctdb commit 11a972f37d4ca7daf052b3b502620af05699bec4)	2007-07-04 13:53:22 +10:00
Ronnie Sahlberg	8f0a00b72b	change the signature for ctdb_sys_send_ack() to ctdb_sys_send_tcp() to make it possible to provide which seq/ack numbers to use and also whether the RST flag should be set. update all callers to the new signature (This used to be ctdb commit b694d7d4a6f3865a18bea8f484ba690e4ae7546c)	2007-07-04 13:32:38 +10:00
Ronnie Sahlberg	1cd8bc0c64	add a tuneable to control how long we wait after a successful recovery before we allow another recovery to be initiated (This used to be ctdb commit f3b43519423b7a73e6a2dd986bdf11203b8653cf)	2007-07-04 08:36:59 +10:00
Andrew Tridgell	6399cf9542	added code to kill registered clients on an IP release (This used to be ctdb commit ca0243b544987ce0618a99ac87b4abf598991e93)	2007-06-19 03:54:06 +10:00
Andrew Tridgell	732353de5f	- merged ctdb_store test from ronnie - added DatabaseHashSize tunable - added logging of events inside recovery (for timing) (This used to be ctdb commit 3593cdb928b91e217faf1b3c537fa28dc82cdace)	2007-06-17 23:31:44 +10:00
Andrew Tridgell	044a2e04c4	- send tcp info to all connected nodes, not just vnnmap nodes - use a non-blocking freeze when banned - release all IPs when banned (This used to be ctdb commit 070e85e532b33b792f85c3e72eee205d906aaf85)	2007-06-10 08:46:33 +10:00
Andrew Tridgell	18ae6e56f0	propagate flag changes to all connected nodes (This used to be ctdb commit 711d1f7e20f1e98caaf08a57df0b1825ff6e97a0)	2007-06-09 21:58:50 +10:00
Andrew Tridgell	06a71762a4	some #include cleanups (This used to be ctdb commit 1a07d87122d51a40cd8ad5fe13533298c26857cb)	2007-06-07 22:26:27 +10:00
Andrew Tridgell	96861466b7	there are now far too many controls for the controls statistics fields to be useful (This used to be ctdb commit f5e188fc7e13b55b6b4081dcc74ea9614a76f9bb)	2007-06-07 18:07:38 +10:00
Andrew Tridgell	3e4d7bef23	get all the tunables at once in recovery daemon (This used to be ctdb commit 8e60be6c22aab145e68b16ede5f32f4430c2af93)	2007-06-07 18:05:25 +10:00
Andrew Tridgell	23bf62fe30	added admin commands to ban/unban nodes (This used to be ctdb commit 4dad04172e7e4955b5bf6444a85b19901c9683ad)	2007-06-07 16:34:33 +10:00
Andrew Tridgell	2ed57a9ae1	implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c)	2007-06-07 15:18:55 +10:00
Andrew Tridgell	9754d16d48	merged admin enable/disable change from ronnie (This used to be ctdb commit df17b69dfd83a98f9c711994c7dd51ad2cc0ab8a)	2007-06-07 11:15:22 +10:00
Ronnie Sahlberg	9ff733c784	add a control to permanently enable/disable a node (This used to be ctdb commit d66fdba16ca22f62ddac6882a17614879b08a798)	2007-06-07 09:16:17 +10:00
Andrew Tridgell	81fad8636f	added timeouts in all event scripts (This used to be ctdb commit d986c91a607ed7c7d4869ea786b5cdf80e7862f1)	2007-06-06 13:45:12 +10:00
Andrew Tridgell	af8834dd02	added health monitoring logic to ctdb, so a node loses its public IP address if one of the subsystem event scripts reports a problem (This used to be ctdb commit c7a089256d86cec21097453bce5acbccee87413f)	2007-06-06 10:25:46 +10:00
Andrew Tridgell	be3a00bd73	clean out some more cruft (This used to be ctdb commit ad16c5fe2748b48a6f6c79976359d56d9bed33f4)	2007-06-05 17:57:07 +10:00
Andrew Tridgell	ac55bc4166	first step in health monitoring of cluster nodes. When not healthy they will be marked disabled (This used to be ctdb commit d3dbd9fc4db21632075b56fc52cf95435c63374a)	2007-06-05 17:43:19 +10:00
Andrew Tridgell	ee546dec81	merge from ronnie (This used to be ctdb commit 531d7ea7aca3116e78a4502a1c8b75a3fb764a4f)	2007-06-04 22:13:59 +10:00
Ronnie Sahlberg	4be9a44ba7	add a control that lists all public ip addresses and which node that currently serves it (This used to be ctdb commit db9b89dc423b31079e5502323e5fd2bbaf82e1e9)	2007-06-04 21:11:51 +10:00
Andrew Tridgell	39ced972ae	make recovery daemon values tunable (This used to be ctdb commit ec29dbf2f5110428df8b97801443ba7addf61353)	2007-06-04 20:22:44 +10:00
Ronnie Sahlberg	1ee8989bd4	merge from tridge (This used to be ctdb commit 3bfede5d46dba5a3654dad9205534391bc339461)	2007-06-04 20:10:53 +10:00
Ronnie Sahlberg	79b54a624e	change the takoverip/releaseip controls to pass a structure containing both the nodenumber and the id of the node that has taken over that address in addition to the public address itself so that all nodes can learn which node is currently hosting each of the public addresses (This used to be ctdb commit 53e9ff790387b85a36fa9c3c44cd4c95cbdf35da)	2007-06-04 20:07:37 +10:00
Andrew Tridgell	dbb2ec43dd	added tunables settable using ctdb command line tool (This used to be ctdb commit 73d440f8cb19373cfad7a2f0f0ca4f963c57ff29)	2007-06-04 19:53:19 +10:00
Andrew Tridgell	f1d81386e6	- start moving tunable variables into their own structure - fixed the test scripts to use a separate dbdir (This used to be ctdb commit 396752e8908c48373564e915e2d49cfc9ff61eba)	2007-06-04 17:46:37 +10:00
Andrew Tridgell	a57991c0eb	remove some cruft that's not needed any more (This used to be ctdb commit c4308805b997740b77e058c1a14b84cb400a7c30)	2007-06-04 17:23:55 +10:00
Ronnie Sahlberg	a3e4e204dc	add the ip address to the nodemap structure we pull from a server and display the physical address of a node when we do a ctdb status (This used to be ctdb commit 660bf30db713f0680acd3f74275ad603b35a0c24)	2007-06-04 13:26:07 +10:00
Andrew Tridgell	c5e4ce360a	make test now works again (This used to be ctdb commit 439d87bbb9840f82937e51aff4fe2b80160878c6)	2007-06-02 13:31:36 +10:00
Andrew Tridgell	ebf12646cf	- make specification of a recovery lock file compulsory - die if someone other than the recmaster can get the recovery lock (This used to be ctdb commit a827d0d0e430ca8ad5d521367e45097185492869)	2007-06-02 11:36:42 +10:00
Andrew Tridgell	4f72a202d9	- moved cmdline options that are only relevant to ctdbd into ctdbd.c - fixed a valgrind error on failing to send a control - don't mark node dead when already disconnected - moved node list lock code into common code (This used to be ctdb commit bcc0432d0fea7ef223f82ccee81cf35c18144b1b)	2007-06-02 10:03:28 +10:00
Andrew Tridgell	27b0e323e6	disable realtime scheduler in event scripts (This used to be ctdb commit 56225ac6fdfe754289bc7d5e0fc8d21c81a7aa8e)	2007-06-02 08:46:49 +10:00
Andrew Tridgell	5e5701a7b8	- make calling of recovered event script async - shutdown sockets before calling shutdown script (This used to be ctdb commit c5e099feef94a014a77742b6cc1d0afe78ef9da9)	2007-06-02 08:41:19 +10:00
Andrew Tridgell	7db1d04d5c	make the running of the takeover and release event scripts async, to prevent outages due to slow scripts (This used to be ctdb commit 4189be97eee7ab2a50335c860f2fcd9566667d01)	2007-06-01 19:05:41 +10:00
Andrew Tridgell	bf3b740a1b	ctdb is GPL not LGPL (This used to be ctdb commit 8624378010d1c2a1438e1e701339dfba7276f960)	2007-05-31 13:50:53 +10:00
Andrew Tridgell	1e72af9c51	close sockets when we exec scripts (This used to be ctdb commit 0fac2164db4279db2d7d376a34be05b890304087)	2007-05-30 15:43:25 +10:00
Andrew Tridgell	8ed48aac51	don't start the transport connecting to the other nodes until after the startup event script has run (This used to be ctdb commit afca3cc74211aa2e18b1f74d36b2add8dffcfdc7)	2007-05-30 13:26:50 +10:00
Andrew Tridgell	2d9e0ad56a	use /etc/services for ctdb (This used to be ctdb commit 64bf6964ff33320c5351337c7f8ed4da5bd71275)	2007-05-29 15:15:00 +10:00
Andrew Tridgell	1140d5a20a	fixed more warnings on 64 bit boxes (This used to be ctdb commit 2f6eae476203f8a8b28e083553204c01f224c8a5)	2007-05-29 13:58:41 +10:00
Andrew Tridgell	bc891232b6	fixed some debug messages (This used to be ctdb commit 037f0149c0c0e65af0a1669b9a52586129e4b48f)	2007-05-29 13:48:30 +10:00
Andrew Tridgell	edcaa0d6a0	clean shutdown in ctdb - release all our IPs (This used to be ctdb commit 2f196cb6a86eb85205d7de1c4cadd4e1e701c06f)	2007-05-29 13:33:59 +10:00
Andrew Tridgell	ead091449b	call the event script on recovery too (This used to be ctdb commit 8c43a91cbd6e502c93bd6cc51df1272eae426709)	2007-05-29 12:55:24 +10:00
Andrew Tridgell	dfadb60318	- moved ctdbd specific options to ctdbd.c from cmdline.c - allow an event script to be specified that will take IPs, release IPs, and handle recovery in system specific ways - redirect stderr in subcommands to the log (This used to be ctdb commit de0fc9ba370db781f9c46406ed180c8211946c7a)	2007-05-29 12:49:25 +10:00
Andrew Tridgell	ccf4d78e04	- renamed ctdb_control utility to ctdb - use -n to specify node number in ctdb utility - change 'ctdb status' to 'ctdb statistics' - added 'ctdb status' which shows status - added netmask to public IPs, so you don't try a takeover on a foreign network - cleaned up tools/ctdb_control.c a lot - generate usage message at runtime (This used to be ctdb commit 28de71c03ace7d32a9fd9882fabbd5d668b97656)	2007-05-29 12:16:59 +10:00
Andrew Tridgell	9cc3ce8554	automatic cleanup of tcp tickle records (This used to be ctdb commit ede79b571bf89b89f1b8394f262ca0689f8c65f3)	2007-05-28 00:34:40 +10:00
Andrew Tridgell	d41290fbae	added code to ctdb to send a tcp 'tickle' ack when we takeover an IP. A raw tcp ack is sent for each tcp connection held by clients before the IP takeover. These acks have a deliberately incorrect sequence number, and should cause the windows client to send its own ack which will in turn cause a tcp reset and thus cause windows clients to much more quickly reconnect to the new node. (This used to be ctdb commit eef38bfe8461b47489d169c61895d6bb8a8f79a1)	2007-05-27 15:26:29 +10:00
Andrew Tridgell	647540253e	tweak timeouts (This used to be ctdb commit 54a90797469f56d796efd82e9294efff3c5dabcc)	2007-05-27 09:43:25 +10:00
Andrew Tridgell	cc4d8102cd	moved system specific ip code to system.c (This used to be ctdb commit 9de9e4ccda9665108baac12a8716b189d26340b1)	2007-05-26 14:01:08 +10:00
Andrew Tridgell	9e61a5bd77	send a message to clients when an IP has been released (This used to be ctdb commit 8b7ab0b00253462593d368052c2cb10a385b4e63)	2007-05-26 00:05:30 +10:00
Andrew Tridgell	31053286c5	keep sending ARPs for 2 minutes, every 5 seconds (This used to be ctdb commit d5223f2eed4a762b93a101c720286568578ce7ed)	2007-05-25 21:27:26 +10:00
Andrew Tridgell	7a9e40b288	consider a node dead after 6 seconds, not 15 (This used to be ctdb commit b055907f0bd2fa0e83bd84e49039fa868905b941)	2007-05-25 20:00:06 +10:00

454 Commits