samba-mirror

mirror of https://github.com/samba-team/samba.git synced 2025-02-14 01:57:53 +03:00

Author	SHA1	Message	Date
Ronnie Sahlberg	adf40341a7	ctdb->methods becomes NULL when we shutdown the transport. If we shutdown the transport and CTDB later decides to send a command out for queueing, the call to ctdb->methods->allocate_pkt() will SEGV. This could trigger for example when we are in the process of shutting down CTDBD and have already shutdown the transport but we are still waiting for the "shutdown" eventscripts to finish. If the event scripts now take much much longer to execute for some reason, this race condition becomes much more probable. Decorate all dereferencing of ctdb->methods-> with a check that ctdb->methods is non-NULL (This used to be ctdb commit c4c2c53918da6fb566d6e9cbd6b02e61ae2921e7)	2008-05-11 14:28:33 +10:00
Ronnie Sahlberg	f196afd58b	fix a bug where the public ip addresses of the cluster would not be redistributed across the cluster after a recovery was performed. Remove a bogus check inside the recovery daemon that ONLY redistributed public addresses IFF the local node had/served public addresses. This was a valid optimization long ago when we enforced that all nodes must use the same public addresses file but is invalid today where we can have different public addresses configs on all nodes and even have some nodes that do NOT use public addresses at all. (This used to be ctdb commit 5833e6b99d9afaf35dc8354df8676b9115418b23)	2008-05-09 13:41:31 +10:00
Andrew Tridgell	abe6d816bb	fixed realloc bug Should always use type safe talloc functions when possible. In this case we were allocating bytes instead of uint32_t (This used to be ctdb commit cb14ee57dd0a589242da1ac2830bb7939df460a5)	2008-05-08 19:59:24 +10:00
Ronnie Sahlberg	92b61cd7d5	Expand the client async framework so that it can take a callback function. This allows us to use the async framework also for controls that return outdata. Add a "capabilities" field to the ctdb_node structure. This field is only initialized and kept valid inside the recovery daemon context and not inside the main ctdb daemon. change the GET_CAPABILITIES control to return the capabilities in outdata instead of in the res return variable. When performing a recovery inside the recovery daemon, read the capabilities from all connected nodes and update the ctdb->nodes list of nodes. when building the new vnnmap after the database rebuild in recovery, do not include any nodes which lack the LMASTER capability in the new vnnmap. Unless there are no available connected node that sports the LMASTER capability in which case we let the local node (recmaster) take on the lmaster role temporarily (i.e. become a member of the vnnmap list) (This used to be ctdb commit 0f1883c69c689b28b0c04148774840b2c4081df6)	2008-05-06 15:42:59 +10:00
Ronnie Sahlberg	2c23959616	make sure we lose all elections for recmaster role if we do not have the recmaster capability. (unless there are no other node at all available with this capability) (This used to be ctdb commit 8556e9dc897c6b9b9be0b52f391effb1f72fbd80)	2008-05-06 13:56:56 +10:00
Ronnie Sahlberg	6863c8f573	close and reopen the reclock pnn file at regular intervals. handle failure to get/hold the reclock pnn file better and just treat it as a transient backend filesystem error and try again later instead of shutting down the recovery daemon when we have lost the pnn file and if we are recmaster release the recmaster role so that someone else can become recmaster instead (This used to be ctdb commit e513277fb09b951427be8351d04c877e0a15359d)	2008-05-06 13:27:17 +10:00
Ronnie Sahlberg	80f85dc390	Monitor that the recovery daemon is still running from the main ctdb daemon and if it has terminated, then we shut down the main daemon as well (This used to be ctdb commit 7e587acaf8006254e89ff9b4bf48454821c85863)	2008-05-06 11:19:17 +10:00
Ronnie Sahlberg	d86e48d5ff	Add ability to disable recmaster and lmaster roles through sysconfig file and command line arguments (This used to be ctdb commit 34b952e4adc53ee82345275a0e28231fa1b2533e)	2008-05-06 10:41:22 +10:00
Ronnie Sahlberg	a9c45f9513	Add a capabilities field to the ctdb structure Define two capabilities : can be recmaster can be lmaster Default both capabilities to YES Update the ctdb tool to read capabilities off a node (This used to be ctdb commit 50f1255ea9ed15bb8fa11cf838b29afa77e857fd)	2008-05-06 10:02:27 +10:00
Ronnie Sahlberg	073f4a7cb4	when a node disagrees with us re who is recmaster make it mark that node as a culprit so it eventually gets banned (This used to be ctdb commit eff3f326f8ce6070c9f3c430cd14d1b71a8db220)	2008-04-22 00:56:27 +10:00
Ronnie Sahlberg	0e1a20b603	Revert "Revert "Revert "- accept an optional set of tdb_flags from clients on open a database,""" remove the transaction stuff and push so that the git tree will work This reverts commit 539bbdd9b0d0346b42e66ef2fcfb16f39bbe098b. (This used to be ctdb commit 876d3aca18c27c2239116c8feb6582b3a68c6571)	2008-04-10 15:59:51 +10:00
Ronnie Sahlberg	39f119b42c	Revert "Revert "- accept an optional set of tdb_flags from clients on open a database,"" This reverts commit 171d1d71ef9f2373620bd7da3adaecb405338603. (This used to be ctdb commit 539bbdd9b0d0346b42e66ef2fcfb16f39bbe098b)	2008-04-10 14:57:41 +10:00
Ronnie Sahlberg	9684befa16	Revert "- accept an optional set of tdb_flags from clients on open a database," This reverts commit 49330f97c78ca0669615297ac3d8498651831214. (This used to be ctdb commit 171d1d71ef9f2373620bd7da3adaecb405338603)	2008-04-10 14:45:45 +10:00
Andrew Tridgell	dc15a9c1f6	- accept an optional set of tdb_flags from clients on open a database, thus allowing the client to pass through the TDB_NOSYNC flag - ensure that tdb_store() operations on persistent databases that don't have TDB_NOSYNC set happen inside a transaction wrapper, thus making them crash safe (This used to be ctdb commit 49330f97c78ca0669615297ac3d8498651831214)	2008-04-10 15:25:48 +10:00
Ronnie Sahlberg	cd1858d126	fix compiler warning during a fatal error failing to lock down the socket (This used to be ctdb commit 0ad22de1a614dc2d1926546027be5f5eea3381ed)	2008-04-10 09:56:49 +10:00
Ronnie Sahlberg	2da3fe1b17	From Chris Cowan secure the domain socket and set permissions properly (This used to be ctdb commit ac6a362fc2fc4a56b4c310478a96eb12daace176)	2008-04-10 06:51:53 +10:00
Ronnie Sahlberg	6b797f148c	From Chris Cowan Add support in AIX to track the PID of a client that connects to the unix domain socket (This used to be ctdb commit 4c006c675d577d4a45f4db2929af6d50bc28dd9e)	2008-04-03 10:58:51 +11:00
Ronnie Sahlberg	e8e67ef576	add a mechanism to force a node to run the eventscripts with arbitrary arguments ctdb eventscript "command argument argument ..." (This used to be ctdb commit 118a16e763d8332c6ce4d8b8e194775fb874c8c8)	2008-04-02 11:13:30 +11:00
Ronnie Sahlberg	03d30f405d	decorate the memdump output with a nice field for ctdb_client structures to show the pid of the client that attached (This used to be ctdb commit 0d9314302d0b988b6ab5d533deef40c5b343c249)	2008-04-01 17:17:21 +11:00
Ronnie Sahlberg	27a7f854f5	add improvements to tracking memory usage in ctdbd and the recovery daemon and a ctdb command to pull the talloc memory map from a recovery daemon ctdb rddumpmemory (This used to be ctdb commit d23950be7406cf288f48b660c0f57a9b8d7bdd05)	2008-04-01 15:34:54 +11:00
Ronnie Sahlberg	0d7b34c9e5	Add two new controls to add/delete public ip address from a node at runtime. The controls only modify the runtime setting of which public addresses a node can serve and do not modify /etc/ctdb/public_addresses. To make the change permanent you also need to edit /etc/ctdb/public_addresses manually. After ip addresses have been added/deleted you need to invoke a recovery for the ip addresses to be redistributed. (This used to be ctdb commit f8294d103fdd8a720d0b0c337d3973c7fdf76b5c)	2008-03-27 09:23:27 +11:00
Ronnie Sahlberg	26ec64a571	fix a memory leak allocate the memory to the 'call' context and not off the 'ctdb' context (This used to be ctdb commit be89005bd5d13409e377d425db2aad1c0d5b3826)	2008-03-25 11:11:13 +11:00
Ronnie Sahlberg	2863d2cfd1	From M Dietz, Add back the controls to enable/disable monitoring we used to have for debugging but removed a while ago (This used to be ctdb commit 8477f6a079e2beb8c09c19702733c4e17f5032fe)	2008-03-25 08:27:38 +11:00
Ronnie Sahlberg	d53424731f	in ctdb_call_local() we can not talloc_steal() the returned data and hang it off ctdb. This can cause a memory leak if the call is terminated before we have managed to respond to the client. (and the call is talloc_free()d but the data is still hanging off ctdb) instead we must talloc_steal() the data and hang it off the call structure to avoid the memory leak. In order to do this we must also change the call structure that is passed into ctdb_call_local() to be allocated through talloc(). This structure was previously either a static variable, or an element of a larger talloc()ed structure (ctdb_call_state or ctdb_client_call_state) so we must change all creations of a ctdb_call into explicitly creating it through talloc() (This used to be ctdb commit 4becf32aea088a25686e8bc330eb47d85ae0ef8f)	2008-03-19 13:54:17 +11:00
Ronnie Sahlberg	e19264ea26	change the log level for the message when someone connects to a non-public ip (This used to be ctdb commit bc9c4f0d52e9b06aceb08cea99ed3fd20b44616c)	2008-03-13 07:54:55 +11:00
Ronnie Sahlberg	74d57f8d51	Redo the vacuuming process to make it scalable. Vacuuming used to delete one record at a time on all nodes, that was m*n behaviour and would require a huge storm of ctdb->ctdb controls and just wouldn't scale at all. The new vacuuming process collects all records to be deleted locally and then only sends 1 control to the other nodes. This control contains a list of all records to be deleted. (This used to be ctdb commit 9e625ece19a91f362c9539fa73b6b2108f0d9c53)	2008-03-13 07:53:29 +11:00
Ronnie Sahlberg	a89ed0fdc2	add a new tunable 'NoIPFailback' when this tunable is set, ip addresses will only be failed over when a node fails. And only those ip addresses held by the failed node will be reallocated in the cluster. When a node becomes active again, this will not lead to any failback of ip addresses. This can reduce the number of "ip address movements" in the cluster since we don't automatically fail an ip address back, but can also lead to an unbalanced cluster since we no longer attempt to spread the ip addresses out evenly across the active nodes. This tunable can NOT be active at the same time as DeterministicIPs are used. (This used to be ctdb commit d3b8a461b15bc584fa1785eb5922de6d49d8f6c4)	2008-03-03 12:52:16 +11:00
Ronnie Sahlberg	e08519b74d	when we reallocate the ip addresses for nodes, we must make sure that a node that has been allocated to serve an ip actually CAN serve that ip (if we use differing public_addresses files on each node) (This used to be ctdb commit fdaf7cb2d7682507fbf4c6c2b833b327c93fac08)	2008-03-03 10:53:23 +11:00
Ronnie Sahlberg	57d29f1011	add a num_connected field to the rec structure that holds the number of connected nodes num_active only contains the number of active nodes and would thus not count banned nodes (This used to be ctdb commit 06d3ce470766ef0b60d68ccd84de5437146cc147)	2008-03-03 10:24:17 +11:00
Ronnie Sahlberg	f6f7f54bd6	add a new tunable : reclockpingperiod once every such interval : * the recovery master on each node will update the "connected" count in the reclock count file (ctdb getreclock) * if the node thinks it is a recovery master but it detects another node that is DISCONNECTED but which still holds a lock to the reclock count file this may mean that we have a split cluster. if that other node that is DISCONNECTED but still holds the lock on the reclock pnn count file, is MORE connected than the local node, yield the recmaster role and let the other half of the cluster take over this adds a second, last chance mechanism to detect split clusters. IF the cluster is split but GPFS is not yet split, this mechanism makes the largest half of the cluster become the active half. (This used to be ctdb commit 07af425f444531942cce8abff112c1524228d287)	2008-03-03 09:19:30 +11:00
Ronnie Sahlberg	cadd95263f	change recmaster from being a local variable in monitor_cluster() to be a member of the ctdb_recoverd structure (This used to be ctdb commit b7f955338f50c92374b4f559268fb3a1a516aefa)	2008-03-03 07:53:46 +11:00
Ronnie Sahlberg	814570f904	update the reclock pnn count for how many nodes are connected to the current node once every 60 seconds (This used to be ctdb commit bf1863cc9e2539b2c3e53c664b493b459ebfcc8b)	2008-02-29 13:14:47 +11:00
Ronnie Sahlberg	efa29c6c98	store the num_active variable (number of connected/active nodes) inside the rec structure and avoid passing this as an extra parameter to do_recovery() (This used to be ctdb commit 8bb229aa3b4bd41e48d4e4e2e148d8680c8ba436)	2008-02-29 12:55:20 +11:00
Ronnie Sahlberg	e0036942bc	add a new file <reclock>.pnn where each recovery daemon can lock that byte at offset==pnn to offer an alternative way to detect which nodes are active instead of relying on CONNECTED being accurate. (This used to be ctdb commit 21d3319eaf463e2a00637d440ee2d4d15f53bf09)	2008-02-29 12:37:42 +11:00
Ronnie Sahlberg	4adeafef11	add a control to get the name of the reclock file from the daemon (This used to be ctdb commit 9effb22cc1616d684352d7ebabb359e69adb0f52)	2008-02-29 10:03:39 +11:00
Ronnie Sahlberg	7bc8007f93	add a new tunable DisableWhenUnhealthy which when set will cause a node to automatically become DISABLED anytime monitoring fails and the node becomes UNHEALTHY. Use with caution. (This used to be ctdb commit c20293360db67f9876b0c84e5e9e12a5868964cb)	2008-02-22 10:33:09 +11:00
Ronnie Sahlberg	f3b474cffb	Add debug output to indicate why a node starts up in DISABLED state (This used to be ctdb commit 8df75775966ead36e1073896fedeff674a6e0587)	2008-02-22 09:52:57 +11:00
Ronnie Sahlberg	39539f6044	Add a new parameter to /etc/sysconfig/ctdb CTDB_START_AS_DISABLED="yes" and command line argument --start-as-disabled When set, this makes the ctdb node to always start in DISABLED mode and will thus not host any public ip addresses. The administrator must manually "ctdb enable" the node after it has started when the administrator wants the node to start hosting public ip addresses. Using this option it is possible to start ctdb on a node without causing any reallocation of ip addresses when it is starting. The node will still merge with the cluster and there will still be a recovery phase but the ip address allocations will not change in the cluster. (This used to be ctdb commit b93d29f43f5306c244c887b54a77bca8a061daf2)	2008-02-22 09:42:52 +11:00
Ronnie Sahlberg	9f99b44fd1	to make it easier/less disruptive to add nodes to a running cluster add a new control that causes the node to drop the current nodes list and reread it from the nodes file. During this operation, the node will also drop the tcp layer and restart it. When we drop the tcp layer, by talloc_free()ing the ctcp structure add a destructor to ctcp so that we also can clean up and remove the references in the ctdb structure to the transport layer add two new commands for the ctdb tool. one to list all nodes in the nodesfile and the second a command to trigger a node to drop the transport and reinitialize it with the new nodes file (This used to be ctdb commit 4bc20ac73e9fa94ffd43cccb6eeb438eeff9963c)	2008-02-19 14:44:48 +11:00
Ronnie Sahlberg	bef60e8200	read the current debuglevel in each loop in the recovery daemon so that we pick up when they change in the parent daemon (This used to be ctdb commit 792d5471ff0c2947b6e66183925860de27f30eaf)	2008-02-18 19:38:04 +11:00
Ronnie Sahlberg	3f56526037	Specify and print debuglevels by name and not by number (This used to be ctdb commit 79ad830294b8b677fbd0c5ad7ed6fbde71f74f8d)	2008-02-05 10:26:23 +11:00
Andrew Tridgell	f6e53f433b	merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c)	2008-02-04 20:07:15 +11:00
Andrew Tridgell	9d6ac0cf55	added debug constants to allow for better mapping to syslog levels (This used to be ctdb commit 7ba8f1dde318eab03f4257e5a89fd23e7281e502)	2008-02-04 17:44:24 +11:00
Andrew Tridgell	feb7c05734	removed dependence on dprintf (This used to be ctdb commit c156db449218bf9432e3a6cb3ce0f617197c9069)	2008-01-29 14:31:51 +11:00
Andrew Tridgell	146d4b0db7	merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2)	2008-01-29 13:59:28 +11:00
Andrew Tridgell	eb044bb1d6	make ctdb dumpmemory work remotely, and dump the talloc memory tree to stdout. This is much more useful than putting it in the log, and also fixes a bug where the pipe would overflow internally and cause ctdbd to lockup (This used to be ctdb commit e236979e2162d9bd7a495086342168a696cf76c5)	2008-01-22 14:22:41 +11:00
Andrew Tridgell	d945b1af03	merge from ronnie (This used to be ctdb commit 5f6d59b9d18c694d82591238bc7a6bb98726a3ed)	2008-01-17 16:46:56 +11:00
Ronnie Sahlberg	9625483c2d	add ctdb_uptime.c (This used to be ctdb commit 4c7153681ed4d68d601720d043f9ff95ac7647a9)	2008-01-17 16:37:05 +11:00
Ronnie Sahlberg	9055978b46	add a ctdb uptime command that prints when ctdb was started and when the last recovery occurred (This used to be ctdb commit b86e8ccbdac044bb949c4fc2ebb27635126272a9)	2008-01-17 11:33:23 +11:00
Andrew Tridgell	5683a8d1e1	cope better with large debug dumps (This used to be ctdb commit fc3733f8e966376f50799fd1aa7b0a8e1cf66e0e)	2008-01-16 23:06:37 +11:00

270 Commits