samba-mirror

mirror of https://github.com/samba-team/samba.git synced 2024-12-24 21:34:56 +03:00

Author	SHA1	Message	Date
Ronnie Sahlberg	e6170b5389	add a new node state : DELETED. This is used to mark nodes as being DELETED internally in ctdb so that nodes are not renumbered if / when they are removed from the nodes file. This is used to be able to do "ctdb reloadnodes" at runtime without causing nodes to be renumbered. To do this, instead of deleting a node from the nodes file, just comment it out like 1.0.0.1 #1.0.0.2 1.0.0.3 After removing 1.0.0.2 from the cluster, the remaining nodes retain their pnn's from prior to the deletion, namely 0 and 2 Any line in the nodes file that is commented out represents a DELETED pnn (This used to be ctdb commit 6a5e4fd7fa391206b463bb4e976502f3ac5bd343)	2009-06-01 14:18:34 +10:00
Sumit Bose	11988fc77a	structure member node_list_file is not used anywhere (This used to be ctdb commit 0e84ea23d1d998d4d4ac7d8a858b3d8294f056cb)	2009-05-21 11:16:43 +10:00
Sumit Bose	9171a7784c	structure member logfile is not used anywhere (This used to be ctdb commit 4f86c991812c2d0bddbe3de9a9906cf5df118cd4)	2009-05-21 11:15:43 +10:00
Ronnie Sahlberg	98a54c4675	Track how long it takes to take out the recovery lock from both the main daemon and also from the recovery daemon. Log this in "ctdb statistics". Also add a variable "RecLockLatencyMs" that will log an error every time it takes longer than this to access the reclock file. (This used to be ctdb commit 042377ed803bb8f7ca9d6ea1a387427b7b8ba45a)	2009-05-14 10:33:25 +10:00
root	af25fa38f3	fixed a problem with clients disconnecting during a traverse When a client (such as smbstatus) is killed, it may have outstanding traverse children on remote nodes. We need to catch the client disconnect in ctdbd and send a control to all nodes telling them to kill those outstanding traverse children. (This used to be ctdb commit f2fb2df4619a14f7f6c11f9132ee7d793028042c)	2009-05-06 07:32:25 +10:00
root	6793f077a8	Add a new variable VerifyRecoveryLock which can be used to disable the test that the recovery daemon holds the lock properly when performing a recovery (This used to be ctdb commit 329df9e47e6ca8ab5143985a999e68f37c6d88a5)	2009-05-01 01:17:59 +10:00
Ronnie Sahlberg	38ea6708dd	add a tuneable RecoveryDropAllIPs so it is possible to control after how long a node that has been stuck in recovery will wait until it will yield all public addresses. this now defaults to 60 seconds This is useful if a split brain occurs due to network partitioning since it will make sure that the "other half" of the cluster that does not contain the recovery master will eventually release all ips and thus avoiding a duplicate ip situation for the public addresses (This used to be ctdb commit 70f21428c9eec96bcc787be191e7478ad68956dc)	2009-04-24 18:28:08 +10:00
Ronnie Sahlberg	d94917ec49	Change the (dodgy) seqnumfrequency variable to have ms resolution instead of second resolution. Rename the variable to SeqnumInterval because 1, it is an interval and not a 1/interval unit, and 2, so that we catch when people use this old variable and can update the sysconfig file instead of silently changing the semantics of this variable. This is a real dodgy variable (This used to be ctdb commit 68eac459e5d2b6b534f72821036675ffe5d7a350)	2009-04-01 17:21:38 +11:00
Ronnie Sahlberg	297ab50173	remove a prototype for a function no longer used (This used to be ctdb commit 9ac9745ba9296d01e3b18148ae8c3240e51cf090)	2009-04-01 17:13:48 +11:00
Ronnie Sahlberg	ad40ee25f9	add a mechanism where the ctdb daemon will run a usercontrolled script when the node status changes to/from UNHEALTHY state. This would allow a sysadmin to set up ctdb to send an email/snmptrap/... when the status of the node changes. (This used to be ctdb commit ce534a83a05dbd40238e4eee0669d60ff396f935)	2009-03-31 14:23:31 +11:00
Ronnie Sahlberg	689f76f0b0	Merge branch 'obnox' (This used to be ctdb commit 972036a5d510fb9b399f1ee34a8861dee4221267)	2009-03-24 17:49:55 +11:00
Ronnie Sahlberg	7265c713db	we need to set the port properly in the parse_ip helper (This used to be ctdb commit 43fe18d86995744ba61c7a6405b70edcb265930a)	2009-03-24 13:45:11 +11:00
Michael Adam	a83ed1d743	Merge commit 'ctdb-ronnie/master' (This used to be ctdb commit 39a972b0d6d0d70282c25c54a124b67431467e77)	2009-03-23 10:07:44 +01:00
root	629d5ee1fa	add a new command "ctdb scriptstatus" this command shows which eventscripts were executed during the last monitoring cycle and the status from each eventscript. If an eventscript timed out or returned an error we also show the output from the eventscript. Example : [root@rcn1 ctdb-git]# ./bin/ctdb scriptstatus 6 scripts were executed last monitoring cycle 00.ctdb Status:OK Duration:0.021 Mon Mar 23 19:04:32 2009 10.interface Status:OK Duration:0.048 Mon Mar 23 19:04:32 2009 20.multipathd Status:OK Duration:0.011 Mon Mar 23 19:04:33 2009 40.vsftpd Status:OK Duration:0.011 Mon Mar 23 19:04:33 2009 41.httpd Status:OK Duration:0.011 Mon Mar 23 19:04:33 2009 50.samba Status:ERROR Duration:0.057 Mon Mar 23 19:04:33 2009 OUTPUT:ERROR: Samba tcp port 445 is not responding Add a new helper function "switch_from_server_to_client()" which both the recovery daemon can use as well as in the child process we start for running the actual eventscripts. Create several new controls, both for the eventscript child process to inform the master daemon of the current status of the scripts as well as for the ctdb tool to extract this information from the running daemon. (This used to be ctdb commit c98f90ad61c9b1e679116fbed948ddca4111968d)	2009-03-23 19:07:45 +11:00
root	dc05c1b80c	create a helper function that converts a ctdb instance in daemon mode to become a ctdb client instance. use this from the recovery daemon child process to switch to client mode and connect back to the main daemon (This used to be ctdb commit 16f31786a031255ab5b3099a0a3c745de973347a)	2009-03-23 12:37:30 +11:00
Michael Adam	839dec1b12	move common code of system_linux.c and system_aix.c into new system_common.c Michael (This used to be ctdb commit 124874847e5e03ce2a44bddfe778f01dfb0a7a03)	2009-02-28 03:08:31 +01:00
Michael Adam	1821d5619b	remove include <netinet/in.h> from public ctdb.h This is not portable. The ctdb build includes the necessary headers from includes.h. And users of ctdb should cope with including the necessary prerequisite headers themselves. Michael (This used to be ctdb commit fedc6983f5dee39152e6f400f89a3e07eab57f0c)	2009-01-29 13:22:02 +01:00
Michael Adam	70aa6445d6	Fix the build on AIX: sys/socket.h needs to be included before ctdb.h (for struct sockaddr to be defined) Thanks to William Jojo <w.jojo@hvcc.edu> for reporting. Michael (This used to be ctdb commit 7558bca1e99884c02747adb7cbea799d04ee24d5)	2009-01-29 10:24:58 +01:00
Michael Adam	3cca0f75e4	Fix treatment of link local ipv6 addresses: set the scope id. metze / Michael Signed-off-by: Michael Adam <obnox@samba.org> (This used to be ctdb commit 9d12de1ca6107801dada927729e755c0949d73bf)	2009-01-19 22:50:53 +01:00
root	321866dbba	finish the ipv6 support. allow clients to register either ipv4 or ipv6 client connections to the tickles list (This used to be ctdb commit d9b44d7c3255b0fd7359b9afeb613e6ff4c4eaac)	2009-01-13 16:17:20 +11:00
Ronnie Sahlberg	edb7241c05	redesign how reloadnodes is implemented. modify the transport methods to allow to restart individual connections and set up destructors properly. only tear down/set-up tcp connections to nodes removed from the cluster or nodes added to the cluster. Leave tcp connections to unchanged nodes connected. make "ctdb reloadnodes" explicitly cause a recovery of the cluster once the files have been reloaded (This used to be ctdb commit d1057ed6de7de9f2a64d8fa012c52647e89b515b)	2008-12-02 13:26:30 +11:00
Ronnie Sahlberg	a782bdbacd	inew version 1.0.66 ddwq (This used to be ctdb commit 499a01fece2a5f24f1b2943cf3dc6e9a3a8ca3b5)	2008-11-24 19:06:02 +11:00
Ronnie Sahlberg	94a56ea410	rewrite the handling of flag updates across the cluster to eliminate a race between the ctdb tool and the recovery daemon both at once trying to push flag changes across the cluster. (This used to be ctdb commit a9a1156ea4e10483a4bf4265b8e9203f0af033aa)	2008-11-20 12:43:18 +11:00
Ronnie Sahlberg	e1b0cea427	add control and logging of very high latencies. log the type of operation and the database name for all latencies higher than a threshold (This used to be ctdb commit 1d581dcd507e8e13d7ae085ff4d6a9f3e2aaeba5)	2008-10-30 12:49:53 +11:00
Ronnie Sahlberg	b9bd20ce55	add a context and a timed event so that once we have been in recovery mode for too long we drop all public ip addresses (This used to be ctdb commit 403c68f96e1380dd07217c688de2730464f77ea0)	2008-10-22 11:04:41 +11:00
Ronnie Sahlberg	ce66008e08	specify a "script log level" on the commandline to set under which log level any/all output from eventscripts will be logged as (This used to be ctdb commit cdc79d4f22f1a6aec5c34115969421f93663932a)	2008-10-17 07:56:12 +11:00
Ronnie Sahlberg	260718e017	update the client side of getnodemap and getpublicips controls to fallback to the old-style ipv4-only controls if the new-style ipv4/ipv6 control fails. this allows a 1.0.59+ (ipv4/ipv6) ctdb daemon being recmaster to be compatible with pre-1.0.59 versions of ctdb that are ipv4 only. (This used to be ctdb commit 8e912abc2c68f5fe7b06c600ba6fec1a6900127c)	2008-10-15 00:24:44 +11:00
Ronnie Sahlberg	cb300382b0	update TAKEIP/RELEASEIP/GETPUBLICIP/GETNODEMAP controls so we retain an older ipv4-only version of these controls. We need this so that we are backwardcompatible with old versions of ctdb and so that we can interoperate with a ipv4-only recmaster during a rolling upgrade. (This used to be ctdb commit 6b76c520f97127099bd9fbaa0fa7af1c61947fb7)	2008-10-14 10:40:29 +11:00
Ronnie Sahlberg	a3bbe238c9	The ctdb daemon keeps track of whether the recovery process is running correctly by measuring how long it was since the last successful communication with the recovery daemon was recorded. After a certain timeout the ctdb daemon would deem the recovery daemon as inoperable and shut down. If the system clock is suddenly changed forward by many (60 or more) seconds this could cause the timeout to trigger prematurely/immediately where ctdb would incorrectly think that more than 60 seconds had passed since last successful communications and thus abort. Instead of checking for one timeout occurring, only deem the recovery daemon to be "down" and trigger a shutdown if communications have timed out for three intervals in a row. (This used to be ctdb commit 196968c552e6ebcb57389d769a4b25f42fa8bc5d)	2008-09-17 14:17:41 +10:00
Ronnie Sahlberg	6474f3278d	additional monitoring between the two daemons. we currently only monitor that the daemons are running by kill(pid, 0) and verifying that the domain socket between them is ok. this is not sufficient since we can have a situation where the recovery daemon is hung. this new code monitors that the recovery daemon is operating. if the recovery hangs, we log this and shut down the main daemon (This used to be ctdb commit cd69d292292eaab3aac0e9d9fc57cb621597c63c)	2008-09-09 13:44:46 +10:00
Ronnie Sahlberg	a35fa0aa8f	rename ctdb_tcp_client back to the original name ctdb_control_tcp (This used to be ctdb commit 4d1c0418cfe6170bc081684dbe45908a5d285f0b)	2008-08-27 10:24:35 +10:00
Ronnie Sahlberg	5193caec6d	make the function to canonicalize a sockaddr structure public (This used to be ctdb commit 1157d61a0bc557d8ffc453c518dfc48473492bfd)	2008-08-20 11:58:27 +10:00
Ronnie Sahlberg	ef997d344f	initial ipv6 patch Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com> (This used to be ctdb commit 1f131f21386f428bbbbb29098d56c2f64596583b)	2008-08-19 14:58:29 +10:00
Andrew Tridgell	aa1bc0abba	added a new control CTDB_CONTROL_TRANS2_COMMIT_RETRY so we can tell the difference between a initial commit attempt and a retry, which allows us to get the persistent updates counter right for retries (This used to be ctdb commit 7f29c50ccbc7789bfbc20bcb4b65758af9ebe6c5)	2008-08-08 13:11:28 +10:00
Andrew Tridgell	5a0249d34c	return a more detailed error code from a trans2 commit error (This used to be ctdb commit 6915661a460cd589b441ac7cd8695f35c4e83113)	2008-08-08 09:58:49 +10:00
Ronnie Sahlberg	b9d8bb23af	remove the reclock file we store pnn counts in. This file creates additional locking stress on the backend filesystem and we may not need it anyway. (This used to be ctdb commit 84236e03e40bcf46fa634d106903277c149a734f)	2008-08-06 11:52:26 +10:00
Andrew Tridgell	237e2f5409	new prototypes (This used to be ctdb commit 71d9d24abae62f70acbd7c1ded8af0b817607c2a)	2008-07-30 19:58:27 +10:00
Andrew Tridgell	98502135e7	added new multi-record transaction commit code (This used to be ctdb commit 9ff3380099fe6f4d39de126db0826971a10ee692)	2008-07-30 19:57:00 +10:00
Andrew Tridgell	abe0232818	rename the structure we use for marshalling multiple records (This used to be ctdb commit 4d205476d286570a6e1f52b59af42858ce051106)	2008-07-30 14:24:56 +10:00
Ronnie Sahlberg	1bfcca524d	From Michael Adam, change one element from private to private_data Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com> (This used to be ctdb commit 0de79352c9b36c118e36905f08ebbe38ecbb957e)	2008-07-22 09:07:42 +10:00
Ronnie Sahlberg	6eb4e46fe1	Add two new controls to start and cancel a persistent update. This allows ctdb to automatically start a new full blown recovery if a client has started updating the local tdb for a persistent database but is kill -9ed before it has ensured the update is distributed clusterwide. (This used to be ctdb commit 1ffccb3e0b3b5bd376c5302304029af393709518)	2008-07-17 13:50:55 +10:00
Ronnie Sahlberg	ab8535eaa5	make LVS a capability so that we can see which nodes are configured with LVS and which are not using LVS. "ctdb getcapabilities" (This used to be ctdb commit 172d01fb34f032e098b1c77a7b0f17bf11301640)	2008-07-10 10:37:22 +10:00
Andrew Tridgell	9999f18369	an extraordinarily ugly patch! This is a hack to allow backtraces under valgrind to show what opcode is getting uninitialised bytes (This used to be ctdb commit 67bb12c8f0af5914efb44b76bc6ddbb11fc0fcdf)	2008-07-04 18:00:24 +10:00
Andrew Tridgell	8be67e0e09	CTDB_NO_MEMORY_VOID() needs to return on error (This used to be ctdb commit 6d21fd57bedffce2298ce7fe4c7d889c858ba7fa)	2008-07-04 16:58:29 +10:00
Ronnie Sahlberg	ef769e7237	track both when we last started and ended a recovery. make ctdb uptime print how long the recovery took in the recovery daemon when we check that the public ip address allocation on the local node is correct (we have the ips we should have and we dont have any we shouldnt have) use ctdb uptime and check the recovery start/stop times and make sure we dont check for ip allocation inconsistencies during a recovery where the ip address allocation is in flux. (This used to be ctdb commit f86551580349b7f662f9a07e4eb0c1189e38e429)	2008-07-02 13:55:59 +10:00
Ronnie Sahlberg	05b50ebe0a	print the opcode when an async callback detects an error (This used to be ctdb commit 423934629704683d3a3042570577fb4e04b17a6d)	2008-07-02 12:21:53 +10:00
Ronnie Sahlberg	779468ab3f	if the event scripts hangs EventScriptsBanCount consecutive times in a row the node will ban itself for the default recovery ban period (This used to be ctdb commit 7239d7ecd54037b11eddf47328a3129d281e7d4a)	2008-06-13 13:18:06 +10:00
Ronnie Sahlberg	4b6b094860	add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2)	2008-06-12 16:53:36 +10:00
Ronnie Sahlberg	d8433cacb2	first cut to convert takeover_callback_state{} to use ctdb_sock_addr instead of sockaddr_in (This used to be ctdb commit 5444ebd0815e335a75ef4857546e23f490a22338)	2008-06-04 17:12:57 +10:00
Ronnie Sahlberg	7d39ac131b	convert handling of gratuitous arps and their controls and helpers to use the ctdb_sock_addr structure so they work for both ipv4 and ipv6 (This used to be ctdb commit 86d6f53512d358ff68b58dac737ffa7576c3cce6)	2008-06-04 15:13:00 +10:00
Ronnie Sahlberg	1c88f422d5	add a parameter for the tdb-flags to the client function ctdb_attach() so that we can pass TDB_NOSYNC when we attach to a persistent database and want fast unsafe writes instead of slow but safe tdb_transaction writes. enhance the ctdb_persistent test suite to test both safe and unsafe writes (This used to be ctdb commit 4948574f5a290434f3edd0c052cf13f3645deec4)	2008-06-04 10:46:20 +10:00
Ronnie Sahlberg	ceaf488f05	do persistent writes in a child process (This used to be ctdb commit 2da3d1f876f5d654f849af8a3e588f5a61300c3d)	2008-05-28 13:04:25 +10:00
Ronnie Sahlberg	ed2cf0291d	second try for safe transaction stores into persistent tdb databases. For stores into persistent databases, ALWAYS use a lockwait child to take out the lock for the record and never the daemon itself. (This used to be ctdb commit 7fb6cf549de1b5e9ac5a3e4483c7591850ea2464)	2008-05-22 12:47:33 +10:00
Ronnie Sahlberg	909ff219e0	Start implementing support for ipv6. This enhances the framework for sending tcp tickles to be able to send ipv6 tickles as well. Since we can not use one single RAW socket to send both handcrafted ipv4 and ipv6 packets, instead of always opening TWO sockets, one ipv4 and one ipv6 we get rid of the helper ctdb_sys_open_sending_socket() and just open (and close) a raw socket of the appropriate type inside ctdb_sys_send_tcp(). We know which type of socket v4/v6 to use based on the sin_family of the destination address. Since ctdb_sys_send_tcp() opens its own socket we no longer need to pass a socket descriptor as a parameter. Get rid of this redundant parameter and fixup all callers. (This used to be ctdb commit 406a2a1e364cf71eb15e5aeec3b87c62f825da92)	2008-05-14 15:47:47 +10:00
Ronnie Sahlberg	2bc0e5a69f	add a new container to hold a socketaddr for either ipv4 or ipv6 (This used to be ctdb commit 93b98838824fae5f47e4ed6b95ae9e4e7597bec3)	2008-05-14 15:40:44 +10:00
Ronnie Sahlberg	b8eb5925cf	Try to use tdb transactions when updating a record and record header inside the ctdb daemon. If a transaction could be started, do safe transaction store when updating the record inside the daemon. If the transaction could not be started (maybe another samba process has a lock on the database?) then just do a normal store instead (instead of blocking the ctdb daemon). The client can "signal" ctdb that updates to this database should, if possible, be done using safe transactions by specifying the TDB_NOSYNC flag when attaching to the database. The TDB flags are passed to ctdb in the "srvid" field of the control header when attaching using the CTDB_CONTROL_DB_ATTACH_PERSISTENT. Currently, samba3.2 does not yet tell ctdbd to handle any persistent databases using safe transactions. If samba3.2 wants a particular persistent database to be handled using safe transactions inside the ctdbd daemon, it should pass TDB_NOSYNC as the flags to the call to attach to a persistent database in ctdbd_db_attach() it currently specifies 0 as the srvid (This used to be ctdb commit 8d6ecf47318188448d934ab76e40da7e4cece67d)	2008-05-12 13:37:31 +10:00
Ronnie Sahlberg	92b61cd7d5	Expand the client async framework so that it can take a callback function. This allows us to use the async framework also for controls that return outdata. Add a "capabilities" field to the ctdb_node structure. This field is only initialized and kept valid inside the recovery daemon context and not inside the main ctdb daemon. change the GET_CAPABILITIES control to return the capabilities in outdata instead of in the res return variable. When performing a recovery inside the recovery daemon, read the capabilities from all connected nodes and update the ctdb->nodes list of nodes. when building the new vnnmap after the database rebuild in recovery, do not include any nodes which lack the LMASTER capability in the new vnnmap. Unless there are no available connected node that sports the LMASTER capability in which case we let the local node (recmaster) take on the lmaster role temporarily (i.e. become a member of the vnnmap list) (This used to be ctdb commit 0f1883c69c689b28b0c04148774840b2c4081df6)	2008-05-06 15:42:59 +10:00
Ronnie Sahlberg	a9c45f9513	Add a capabilities field to the ctdb structure Define two capabilities : can be recmaster can be lmaster Default both capabilities to YES Update the ctdb tool to read capabilities off a node (This used to be ctdb commit 50f1255ea9ed15bb8fa11cf838b29afa77e857fd)	2008-05-06 10:02:27 +10:00
Ronnie Sahlberg	0e1a20b603	Revert "Revert "Revert "- accept an optional set of tdb_flags from clients on open a database,""" remove the transaction stuff and push so that the git tree will work This reverts commit 539bbdd9b0d0346b42e66ef2fcfb16f39bbe098b. (This used to be ctdb commit 876d3aca18c27c2239116c8feb6582b3a68c6571)	2008-04-10 15:59:51 +10:00
Ronnie Sahlberg	39f119b42c	Revert "Revert "- accept an optional set of tdb_flags from clients on open a database,"" This reverts commit 171d1d71ef9f2373620bd7da3adaecb405338603. (This used to be ctdb commit 539bbdd9b0d0346b42e66ef2fcfb16f39bbe098b)	2008-04-10 14:57:41 +10:00
Ronnie Sahlberg	9684befa16	Revert "- accept an optional set of tdb_flags from clients on open a database," This reverts commit 49330f97c78ca0669615297ac3d8498651831214. (This used to be ctdb commit 171d1d71ef9f2373620bd7da3adaecb405338603)	2008-04-10 14:45:45 +10:00
Andrew Tridgell	dc15a9c1f6	- accept an optional set of tdb_flags from clients on open a database, thus allowing the client to pass through the TDB_NOSYNC flag - ensure that tdb_store() operations on persistent databases that don't have TDB_NOSYNC set happen inside a transaction wrapper, thus making them crash safe (This used to be ctdb commit 49330f97c78ca0669615297ac3d8498651831214)	2008-04-10 15:25:48 +10:00
Ronnie Sahlberg	e8e67ef576	add a mechanism to force a node to run the eventscripts with arbitrary arguments ctdb eventscript "command argument argument ..." (This used to be ctdb commit 118a16e763d8332c6ce4d8b8e194775fb874c8c8)	2008-04-02 11:13:30 +11:00
Ronnie Sahlberg	27a7f854f5	add improvements to tracking memory usage in ctdbd and the recovery daemon and a ctdb command to pull the talloc memory map from a recovery daemon ctdb rddumpmemory (This used to be ctdb commit d23950be7406cf288f48b660c0f57a9b8d7bdd05)	2008-04-01 15:34:54 +11:00
Ronnie Sahlberg	0d7b34c9e5	Add two new controls to add/delete public ip address from a node at runtime. The controls only modify the runtime setting of which public addresses a node can serve and does not modify /etc/ctdb/public_addresses. To make the change permanent you also need to edit /etc/ctdb/public_addresses manually. After ip addresses have been added/deleted you need to invoke a recovery for the ip addresses to be redistributed. (This used to be ctdb commit f8294d103fdd8a720d0b0c337d3973c7fdf76b5c)	2008-03-27 09:23:27 +11:00
Ronnie Sahlberg	2863d2cfd1	From M Dietz, Add back the controls to enable/disable monitoring we used to have for debugging but removed a while ago (This used to be ctdb commit 8477f6a079e2beb8c09c19702733c4e17f5032fe)	2008-03-25 08:27:38 +11:00
Ronnie Sahlberg	d53424731f	in ctdb_call_local() we can not talloc_steal() the returned data and hang it off ctdb. This can cause a memory leak if the call is terminated before we have managed to respond to the client. (and the call is talloc_free()d but the data is still hanging off ctdb) instead we must talloc_steal() the data and hang it off the call structure to avoid the memory leak. In order to do this we must also change the call structure that is passed into ctdb_call_local() to be allocated through talloc(). This structure was previously either a static variable, or an element of a larger talloc()ed structure (ctdb_call_state or ctdb_client_call_state) so we must change all creations of a ctdb_call into explicitly creating it through talloc() (This used to be ctdb commit 4becf32aea088a25686e8bc330eb47d85ae0ef8f)	2008-03-19 13:54:17 +11:00
Ronnie Sahlberg	74d57f8d51	Redo the vacuuming process to make it scalable. Vacuuming used to delete one record at a time on all nodes, that was m*n behaviour and would require a huge storm of ctdb->ctdb controls and just wouldn't scale at all. The new vacuuming process collects all records to be deleted locally and then only sends 1 control to the other nodes. This control contains a list of all records to be deleted. (This used to be ctdb commit 9e625ece19a91f362c9539fa73b6b2108f0d9c53)	2008-03-13 07:53:29 +11:00
Ronnie Sahlberg	a89ed0fdc2	add a new tunable 'NoIPFailback' when this tunable is set, ip addresses will only be failed over when a node fails. And only those ip addresses held by the failed node will be reallocated in the cluster. When a node becomes active again, this will not lead to any failback of ip addresses. This can reduce the number of "ip address movements" in the cluster since we dont automatically fail an ip address back, but can also lead to an unbalanced cluster since we no longer attempt to spread the ip addresses out evenly across the active nodes. This tuneable can NOT be active at the same time as DeterministicIPs are used. (This used to be ctdb commit d3b8a461b15bc584fa1785eb5922de6d49d8f6c4)	2008-03-03 12:52:16 +11:00
Ronnie Sahlberg	f6f7f54bd6	add a new tunable : reclockpingperiod once every such interval : * the recovery master on each node will update the "connected" count in the reclock count file (ctdb getreclock) * if the node thinks it is a recovery master but it detects another node that is DISCONNECTED but which still holds a lock to the reclock count file this may mean that we have a split cluster. if that other node that is DISCONNECTED but still holds the lock on the reclock pnn count file, is MORE connected than the local node, yield the recmaster role and let the other half of the cluster take over. this adds a second, last chance mechanism to detect split clusters. IF the cluster is split but GPFS is not yet split, this mechanism makes the largest half of the cluster become the active half. (This used to be ctdb commit 07af425f444531942cce8abff112c1524228d287)	2008-03-03 09:19:30 +11:00
Ronnie Sahlberg	e0036942bc	add a new file <reclock>.pnn where each recovery daemon can lock that byte at offset==pnn to offer an alternative way to detect which nodes are active instead of relying on CONNECTED being accurate. (This used to be ctdb commit 21d3319eaf463e2a00637d440ee2d4d15f53bf09)	2008-02-29 12:37:42 +11:00
Ronnie Sahlberg	4adeafef11	add a control to get the name of the reclock file from the daemon (This used to be ctdb commit 9effb22cc1616d684352d7ebabb359e69adb0f52)	2008-02-29 10:03:39 +11:00
Ronnie Sahlberg	7bc8007f93	add a new tunable DisableWhenUnhealthy which when set will cause a node to automatically become DISABLED anytime monitoring fails and the node becomes UNHEALTHY. Use with caution. (This used to be ctdb commit c20293360db67f9876b0c84e5e9e12a5868964cb)	2008-02-22 10:33:09 +11:00
Ronnie Sahlberg	39539f6044	Add a new parameter to /etc/sysconfig/ctdb CTDB_START_AS_DISABLED="yes" and command line argument --start-as-disabled When set, this makes the ctdb node to always start in DISABLED mode and will thus not host any public ip addresses. The administrator must manually "ctdb enable" the node after it has started when the administrator wants the node to start hosting public ip addresses. Using this option it is possible to start ctdb on a node without causing any reallocation of ip addresses when it is starting. The node will still merge with the cluster and there will still be a recovery phase but the ip address allocations will not change in the cluster. (This used to be ctdb commit b93d29f43f5306c244c887b54a77bca8a061daf2)	2008-02-22 09:42:52 +11:00
Ronnie Sahlberg	9f99b44fd1	to make it easier/less disruptive to add nodes to a running cluster add a new control that causes the node to drop the current nodes list and reread it from the nodes file. During this operation, the node will also drop the tcp layer and restart it. When we drop the tcp layer, by talloc_free()ing the ctcp structure add a destructor to ctcp so that we also can clean up and remove the references in the ctdb structure to the transport layer add two new commands for the ctdb tool. one to list all nodes in the nodesfile and the second a command to trigger a node to drop the transport and reinitialize it with the new nodes file (This used to be ctdb commit 4bc20ac73e9fa94ffd43cccb6eeb438eeff9963c)	2008-02-19 14:44:48 +11:00
Ronnie Sahlberg	3f56526037	Specify and print debuglevels by name and not by number (This used to be ctdb commit 79ad830294b8b677fbd0c5ad7ed6fbde71f74f8d)	2008-02-05 10:26:23 +11:00
Andrew Tridgell	9d6ac0cf55	added debug constants to allow for better mapping to syslog levels (This used to be ctdb commit 7ba8f1dde318eab03f4257e5a89fd23e7281e502)	2008-02-04 17:44:24 +11:00
Andrew Tridgell	146d4b0db7	merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2)	2008-01-29 13:59:28 +11:00
Ronnie Sahlberg	9055978b46	add a ctdb uptime command that prints when ctdb was started and when the last recovery occurred (This used to be ctdb commit b86e8ccbdac044bb949c4fc2ebb27635126272a9)	2008-01-17 11:33:23 +11:00
Andrew Tridgell	b62b7fcde8	added syslog support, and use a pipe to catch logging from child processes to the ctdbd logging functions (This used to be ctdb commit 1306b04cd01e996fd1aa1159a9521f2ff7b06165)	2008-01-16 22:03:01 +11:00
Ronnie Sahlberg	5b7838d768	ctdb_control_send() does not need to take an outdata parameter remove the outdata parameter from the function and all callers (This used to be ctdb commit e3951337f8df2ae19cce61c954036590c7a03582)	2008-01-16 10:23:26 +11:00
Ronnie Sahlberg	ba31feaec0	split node health monitoring and checking for connected/disconnected nodes into two separate files. move the monitoring of keepalives for detecting connected/disconnected remote nodes into ctdb_keepalive.c (This used to be ctdb commit 23a57b20c314d5f11a433cf251eb9d9de743849a)	2008-01-15 08:42:12 +11:00
Andrew Tridgell	b866a147d2	get rid of monitor_retry as well (This used to be ctdb commit c957cf9c1d99d5d3f4ca726f7a867c829660a2b7)	2008-01-10 14:49:43 +11:00
Andrew Tridgell	538f519dba	exponential backoff in health monitoring for faster startup (This used to be ctdb commit 1b04a1f675f73b48366ba98803a58c3d8df1b6e1)	2008-01-10 14:40:56 +11:00
Andrew Tridgell	3b3fceacbe	block alarm signals during critical sections of vacuum (This used to be ctdb commit cfb14ae76f00f10d27b56c034b2247ab12d63065)	2008-01-10 09:43:14 +11:00
Andrew Tridgell	1c91398aef	ensure the recovery daemon is not clagged up by vacuum calls (This used to be ctdb commit ff7e80e247bf5a86adda0ef850d901478449675b)	2008-01-08 21:28:42 +11:00
Andrew Tridgell	96100fcae6	added two new ctdb commands: ctdb vacuum : vacuums all the databases, deleting any zero length ctdb records ctdb repack : repacks all the databases, resulting in a perfectly packed database with no freelist entries (This used to be ctdb commit 3532119c84ab3247051ed6ba21ba3243ae2f6bf4)	2008-01-08 17:23:27 +11:00
Andrew Tridgell	37861932ce	merge from ronnie (This used to be ctdb commit 0aa6e04438aa5ec727815689baa19544df042cf7)	2008-01-07 16:17:22 +11:00
Andrew Tridgell	748843a3c6	added paranoid transaction ids (This used to be ctdb commit afc1da53873cdbd31fcc8c6b22fae262e344cf6e)	2008-01-06 13:24:55 +11:00
Andrew Tridgell	c08f2616cd	new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3)	2008-01-06 12:38:01 +11:00
Andrew Tridgell	43aa27c9ee	this is needed with merged tdb (This used to be ctdb commit 3dc07f2bf98ab445ab960ef14173bc6924e3b658)	2008-01-05 17:42:01 +11:00
Andrew Tridgell	e4aefbc66d	a new tunable DatabaseMaxDead that enables the tdb max dead cache logic (This used to be ctdb commit 01c519c3658a8fcb9545b507b597e723658e4c4e)	2008-01-05 09:36:53 +11:00
Ronnie Sahlberg	50573c5391	add ctdb_disable/enable_monitoring() that only modifies the monitoring flag. change calling of the recovered/takeip/releaseip event scripts to use these enable/disable functions instead of stopping/starting monitoring. when we disable monitoring we want all events to still be running in particular the events to monitor for dead nodes and we only want to suppress running the monitor event scripts (This used to be ctdb commit a006dcc4f75aba950dd701ad7d1a84e89df285e8)	2007-11-30 10:09:54 +11:00
Ronnie Sahlberg	0eb6c04dc1	get rid of the control to set the monitoring mode. monitoring should always be enabled (though a node may want to temporarily disable running the "monitor" event scripts but can do so internally without the need for this control) (This used to be ctdb commit e3a33618026823e6af845fd8513cddb08e6b5584)	2007-11-30 10:00:04 +11:00
Ronnie Sahlberg	9e73dc87cc	Add a --node-ip argument so that one can specify which ip address a specific instance of ctdbd should bind to. This helps when running a "virtual" cluster on a single machine where all instances bind to different alias interfaces. If --node-ip is specified, then we will try to bind to this ip address only. Otherwise we fall back to the original method trying the ip addresses in /etc/ctdb/nodes one by one until we find one we can bind to. No variable in /etc/sysconfig/ctdb added since this parameter only makes sense in a virtual test/debug cluster. (This used to be ctdb commit d96cb02c2c24f9eabbc53d3d38e90dea49cff3e0)	2007-11-26 10:52:55 +11:00
Andrew Tridgell	bde886988b	prevent a deadly embrace between smbd and ctdbd by moving the calling of the startup event scripts after the point where recovery has started and the node is in normal operation This makes the 'startup' script just a special type of the 'monitor' script which is called first (This used to be ctdb commit 7424c30a5fd04aea0137c466b4318c3f185280d8)	2007-11-12 10:53:11 +11:00
Ronnie Sahlberg	4a97876fb7	when we are shutting down, we should first shut down the recovery daemon (This used to be ctdb commit 39ade6b329adcd3234124d6a8daaa6181abf739b)	2007-10-22 12:34:08 +10:00
Ronnie Sahlberg	d1ba047b7f	add a new transport method so that when a node is marked as dead, we shut down and restart the transport. otherwise, if we use the tcp transport the tcp connection might try to retransmit the queued data during the time the node is unavailable. this together with the exponential backoff for tcp means that the tcp connection quickly reaches the maximum backoff rto which is often 60 or 120 seconds. this would mean that it could take up to 60/120 seconds before the tcp layer detects that the connection is dead and it has to be reestablished. (This used to be ctdb commit 0256db470879ce556b0f00070f7ebeaf37e529ab)	2007-10-19 08:58:30 +10:00
Ronnie Sahlberg	056aac6e0c	add a new tunable : DeterministicIPs that makes the allocation of public addresses to nodes deterministic. Activate it by adding CTDB_SET_DeterministicIPs=1 in /etc/sysconfig/ctdb When this is set, the first entry in /etc/ctdb/public_addresses will always be hosted by node 0, when that node is available, the second entry by node1 and so on. This tunable allows the allocation of addresses to become very unbalanced and is only for debugging/testing use. Beware, this feature requires that /etc/ctdb/public_addresses are identical on all the nodes in the cluster. (This used to be ctdb commit f0ca221f235731542090d8a6c86f2b7cd2ce2f96)	2007-10-16 12:15:02 +10:00
Andrew Tridgell	0e855c0772	merge from ronnie (This used to be ctdb commit d18712caba11855010be52f90bac656683076676)	2007-10-15 14:17:49 +10:00

454 Commits