samba-mirror

mirror of https://github.com/samba-team/samba.git synced 2024-12-25 23:21:54 +03:00

3426 lines

94 KiB

C

start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00			`/*`
			`ctdb recovery daemon`

			`Copyright (C) Ronnie Sahlberg 2007`

ctdb is GPL not LGPL (This used to be ctdb commit 8624378010d1c2a1438e1e701339dfba7276f960) 2007-05-31 07:50:53 +04:00			`This program is free software; you can redistribute it and/or modify`
			`it under the terms of the GNU General Public License as published by`
update lib/replace from samba4 (This used to be ctdb commit f0555484105668c01c21f56322992e752e831109) 2007-07-10 09:29:31 +04:00			`the Free Software Foundation; either version 3 of the License, or`
ctdb is GPL not LGPL (This used to be ctdb commit 8624378010d1c2a1438e1e701339dfba7276f960) 2007-05-31 07:50:53 +04:00			`(at your option) any later version.`

			`This program is distributed in the hope that it will be useful,`
start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00			`but WITHOUT ANY WARRANTY; without even the implied warranty of`
ctdb is GPL not LGPL (This used to be ctdb commit 8624378010d1c2a1438e1e701339dfba7276f960) 2007-05-31 07:50:53 +04:00			`MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the`
			`GNU General Public License for more details.`

			`You should have received a copy of the GNU General Public License`
update lib/replace from samba4 (This used to be ctdb commit f0555484105668c01c21f56322992e752e831109) 2007-07-10 09:29:31 +04:00			`along with this program; if not, see <http://www.gnu.org/licenses/>.`
start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00			`*/`

			`#include "includes.h"`
			`#include "lib/events/events.h"`
			`#include "system/filesys.h"`
better timeout handling for calls, controls and traverses (This used to be ctdb commit 63346a6c59d4821b4c443939b5d88db8cd20f5fe) 2007-05-10 08:06:48 +04:00			`#include "system/time.h"`
let each node verify that they have a correct assignment of public ip addresses (i.e. htey hold those they should hold and they dont hold any of those they shouldnt hold) if an inconsistency is found, mark the local node as recovery mode active and wait for the recovery master to trigger a full blown recovery (This used to be ctdb commit 55a5bfc8244c5b9cdda3f11992f384f00566b5dc) 2007-09-14 04:16:36 +04:00			`#include "system/network.h"`
when we are shutting down, we should first shut down the recovery daemon (This used to be ctdb commit 39ade6b329adcd3234124d6a8daaa6181abf739b) 2007-10-22 06:34:08 +04:00			`#include "system/wait.h"`
start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00			`#include "popt.h"`
			`#include "cmdline.h"`
			`#include "../include/ctdb.h"`
			`#include "../include/ctdb_private.h"`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`#include "db_wrap.h"`
ensure the recovery daemon is not clagged up by vacuum calls (This used to be ctdb commit ff7e80e247bf5a86adda0ef850d901478449675b) 2008-01-08 13:28:42 +03:00			`#include "dlinklist.h"`
start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00
added admin commands to ban/unban nodes (This used to be ctdb commit 4dad04172e7e4955b5bf6444a85b19901c9683ad) 2007-06-07 10:34:33 +04:00
add a new command "ctdb ipreallocate", this command will force the recovery master to perform a full ip reallocation process. the ctdb command will block until the ip reallocation has comleted (This used to be ctdb commit abad7b97fe0c066b33f6e75d0953bbed892a3216) 2009-07-02 07:00:26 +04:00			`/* list of "ctdb ipreallocate" processes to call back when we have`
			`finished the takeover run.`
			`*/`
			`struct ip_reallocate_list {`
			`struct ip_reallocate_list *next;`
			`struct rd_memdump_reply *rd;`
			`};`

new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`struct ctdb_banning_state {`
			`uint32_t count;`
			`struct timeval last_reported_time;`
			`};`

implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`/*`
			`private state of recovery daemon`
			`*/`
			`struct ctdb_recoverd {`
			`struct ctdb_context *ctdb;`
change recmaster from being a local variable in monitor_cluster() to be a member of the ctdb_recoverd structure (This used to be ctdb commit b7f955338f50c92374b4f559268fb3a1a516aefa) 2008-03-02 23:53:46 +03:00			`uint32_t recmaster;`
store the num_active variable (number of connected/active nodes) inside the rec structure and avoid passing this as an extra parameter to do_recovery() (This used to be ctdb commit 8bb229aa3b4bd41e48d4e4e2e148d8680c8ba436) 2008-02-29 04:55:20 +03:00			`uint32_t num_active;`
add a num_connected field to the rec structure that holds the number of connected nodes num_active only contains the number of active nodes and would thus not count banned nodes (This used to be ctdb commit 06d3ce470766ef0b60d68ccd84de5437146cc147) 2008-03-03 02:24:17 +03:00			`uint32_t num_connected;`
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`uint32_t last_culprit_node;`
add a new tunable : reclockpingperiod once every such interval : * the recovery master on each node will uppdate the "connected" count in the reclock count file (ctdb getreclock) * if the node thinks it is a recovery master but it detects another node that is DISCONNECTED but which still holds a lock to the reclock count file this may mean that we have a split cluster. if that other node that is DISCONNECTED but still holds the lock on hte reclock pnn count file, is MORE connected than the local node, yield the recmaster role and let the other half of the lcuster take over this add a second, last chance mechanism to detect split clusters. IF the cluster is split but GPFS is not yet split, this mechanism makes the largest half of the cluster become the active half. (This used to be ctdb commit 07af425f444531942cce8abff112c1524228d287) 2008-03-03 01:19:30 +03:00			`struct ctdb_node_map *nodemap;`
use a priority time for the election data, not just the vnn (This used to be ctdb commit a691f9c5cd77194005f0d98483da94b07a48d57d) 2007-06-07 12:37:27 +04:00			`struct timeval priority_time;`
prevent recursion in the calling of ctdb_takeover_run (This used to be ctdb commit 0fbdeb7c91b965d9bc5ecc7b24e31070378d8f1d) 2007-09-13 08:08:18 +04:00			`bool need_takeover_run;`
- merge from ronnie - add a flag to check that recovery completed correctly. If not, re-trigger it in monitoring (This used to be ctdb commit d5ed941d9bab4af30d8b5f9b77bdf43d9218d69b) 2007-09-14 03:49:12 +04:00			`bool need_recovery;`
- catch ESTALE in the recovery lock by trying a read() - priortise nodes that are unbanned and healthy in the election (This used to be ctdb commit 929feb475dfdf7283f0e99b50b179e1c91d3a39f) 2007-10-05 07:28:21 +04:00			`uint32_t node_flags;`
make election handling much more scalable (This used to be ctdb commit 05938d462b92bd9ecb8e35f53651bded47c48675) 2007-11-13 02:27:44 +03:00			`struct timed_event *send_election_te;`
			`struct timed_event *election_timeout;`
ensure the recovery daemon is not clagged up by vacuum calls (This used to be ctdb commit ff7e80e247bf5a86adda0ef850d901478449675b) 2008-01-08 13:28:42 +03:00			`struct vacuum_info *vacuum_info;`
add a new command "ctdb ipreallocate", this command will force the recovery master to perform a full ip reallocation process. the ctdb command will block until the ip reallocation has comleted (This used to be ctdb commit abad7b97fe0c066b33f6e75d0953bbed892a3216) 2009-07-02 07:00:26 +04:00			`TALLOC_CTX *ip_reallocate_ctx;`
			`struct ip_reallocate_list *reallocate_callers;`
add a new message to ask the recovery daemon to temporarily disable checking ip address consistency. This is useful when we are moving addresses using moveip in the cluster since otherwise if we collide with the recovery daemons own check we could cause a recovery (This used to be ctdb commit 9c63858c0b22c81eaccb9865a414af0bbb2833d4) 2009-10-06 05:11:32 +04:00			`TALLOC_CTX *ip_check_disable_ctx;`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`};`
start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00
make recovery daemon values tunable (This used to be ctdb commit ec29dbf2f5110428df8b97801443ba7addf61353) 2007-06-04 14:22:44 +04:00			`#define CONTROL_TIMEOUT() timeval_current_ofs(ctdb->tunable.recover_timeout, 0)`
added health monitoring logic to ctdb, so a node loses its public IP address if one of the sybsystem event scripts reports a problem (This used to be ctdb commit c7a089256d86cec21097453bce5acbccee87413f) 2007-06-06 04:25:46 +04:00			`#define MONITOR_TIMEOUT() timeval_current_ofs(ctdb->tunable.recover_interval, 0)`
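
The two macros above turn a tunable number of seconds into a `struct timeval` that is passed to each control as its timeout. A minimal sketch of that idea follows, assuming `timeval_current_ofs()` returns the current time advanced by the given offset. The helper names `timeval_add_sketch` and `deadline_from_now_sketch` are hypothetical and are not part of ctdb or Samba.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical helper: advance tv by secs/usecs and keep tv_usec
   normalized to the range [0, 1000000). */
static struct timeval timeval_add_sketch(struct timeval tv, uint32_t secs, uint32_t usecs)
{
	tv.tv_sec += secs + usecs / 1000000;
	tv.tv_usec += usecs % 1000000;
	if (tv.tv_usec >= 1000000) {
		tv.tv_sec += 1;
		tv.tv_usec -= 1000000;
	}
	return tv;
}

/* Hypothetical helper: absolute deadline "now + offset", the assumed
   behaviour of the timeout macros above. */
static struct timeval deadline_from_now_sketch(uint32_t secs, uint32_t usecs)
{
	struct timeval now;
	int rc = gettimeofday(&now, NULL);

	assert(rc == 0);
	(void)rc;
	return timeval_add_sketch(now, secs, usecs);
}
```

With a 20 second timeout, a call such as `deadline_from_now_sketch(20, 0)` would yield the point in time after which a pending control should be treated as failed.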
raise the control timeout in recovery (This used to be ctdb commit 43424ff66daf28c202c12982f20a9f662b6fb125) 2007-05-24 07:49:27 +04:00
convert much of the recovery logic to be async and parallel across all nodes (This used to be ctdb commit 8b72a02bf1045d8befb342a4111ca1316889262e) 2008-01-05 01:35:43 +03:00
added admin commands to ban/unban nodes (This used to be ctdb commit 4dad04172e7e4955b5bf6444a85b19901c9683ad) 2007-06-07 10:34:33 +04:00			`/*`
			`ban a node for a period of time`
			`*/`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`static void ctdb_ban_node(struct ctdb_recoverd *rec, uint32_t pnn, uint32_t ban_time)`
added admin commands to ban/unban nodes (This used to be ctdb commit 4dad04172e7e4955b5bf6444a85b19901c9683ad) 2007-06-07 10:34:33 +04:00			`{`
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`int ret;`
added admin commands to ban/unban nodes (This used to be ctdb commit 4dad04172e7e4955b5bf6444a85b19901c9683ad) 2007-06-07 10:34:33 +04:00			`struct ctdb_context *ctdb = rec->ctdb;`
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`struct ctdb_ban_time bantime;`

merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_NOTICE,("Banning node %u for %u seconds\n", pnn, ban_time));`
add log output for when ctdb_ban_node() and ctdb_unban_node() are called when these functions are called to ban or unban a node make sure we update the CTDB_NODE_BANNED flag in rec->node_flags since this field and flag are checked during the election process (This used to be ctdb commit 740c632ae96a2d34327d1b575780aaf079d93f4f) 2007-11-23 04:36:14 +03:00
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`if (!ctdb_validate_pnn(ctdb, pnn)) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,("Bad pnn %u in ctdb_ban_node\n", pnn));`
handle CTDB_CURRENT_NODE in ban commands (This used to be ctdb commit fefb53f1d22c5458a1e107f8352818aee87983de) 2007-06-07 10:48:31 +04:00			`return;`
			`}`

new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`bantime.pnn = pnn;`
			`bantime.time = ban_time;`
add log output for when ctdb_ban_node() and ctdb_unban_node() are called when these functions are called to ban or unban a node make sure we update the CTDB_NODE_BANNED flag in rec->node_flags since this field and flag are checked during the election process (This used to be ctdb commit 740c632ae96a2d34327d1b575780aaf079d93f4f) 2007-11-23 04:36:14 +03:00
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`ret = ctdb_ctrl_set_ban(ctdb, CONTROL_TIMEOUT(), pnn, &bantime);`
			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR,(__location__ " Failed to ban node %u\n", pnn));`
rework banning/unbanning nodes ctdb_recoverd.c Always handle banning/unbanning locally on the node that is being banned/unbanned instead of on the recovery master. This means that if a ban request comes in to the recovery master for a remote node, we pass the request on to the remote node instead of setting up the ban and ban timeouts locally. ctdb.c send ban/unban requests to the node being banned/unbanned instead of to the recmaster (This used to be ctdb commit 880dd9f5fd0b91e450da93e195cc5c62cb1dcd6e) 2007-12-03 07:45:53 +03:00			`return;`
use a priority time for the election data, not just the vnn (This used to be ctdb commit a691f9c5cd77194005f0d98483da94b07a48d57d) 2007-06-07 12:37:27 +04:00			`}`

added admin commands to ban/unban nodes (This used to be ctdb commit 4dad04172e7e4955b5bf6444a85b19901c9683ad) 2007-06-07 10:34:33 +04:00			`}`

add async versions of the freeze node control and freeze all nodes in parallell (This used to be ctdb commit f34e89f54d9f4380e76eb1b5b2385a4d8500b505) 2007-08-27 04:31:22 +04:00			`enum monitor_result { MONITOR_OK, MONITOR_RECOVERY_NEEDED, MONITOR_ELECTION_NEEDED, MONITOR_FAILED};`


merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`/*`
			`run the "recovered" eventscript on all nodes`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`*/`
Update some debug statements. Dont say that recovery failed if the failed function was invoked from outside of recovery (This used to be ctdb commit 3038d0b74895b51af4f85f2f304508ed16d245f4) 2008-05-15 06:28:52 +04:00			`static int run_recovered_eventscript(struct ctdb_context *ctdb, struct ctdb_node_map *nodemap, const char *caller)`
add async versions of the freeze node control and freeze all nodes in parallell (This used to be ctdb commit f34e89f54d9f4380e76eb1b5b2385a4d8500b505) 2007-08-27 04:31:22 +04:00			`{`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`TALLOC_CTX *tmp_ctx;`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`uint32_t *nodes;`
add async versions of the freeze node control and freeze all nodes in parallell (This used to be ctdb commit f34e89f54d9f4380e76eb1b5b2385a4d8500b505) 2007-08-27 04:31:22 +04:00
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`tmp_ctx = talloc_new(ctdb);`
			`CTDB_NO_MEMORY(ctdb, tmp_ctx);`
add async versions of the freeze node control and freeze all nodes in parallell (This used to be ctdb commit f34e89f54d9f4380e76eb1b5b2385a4d8500b505) 2007-08-27 04:31:22 +04:00
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`nodes = list_of_active_nodes(ctdb, nodemap, tmp_ctx, true);`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`if (ctdb_client_async_control(ctdb, CTDB_CONTROL_END_RECOVERY,`
initial attempt at freezing databases in priority order (This used to be ctdb commit e8d692590da1070c87a4144031e3306d190ebed2) 2009-10-12 05:08:39 +04:00			`nodes, 0,`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`CONTROL_TIMEOUT(), false, tdb_null,`
			`NULL, NULL,`
			`NULL) != 0) {`
Update some debug statements. Dont say that recovery failed if the failed function was invoked from outside of recovery (This used to be ctdb commit 3038d0b74895b51af4f85f2f304508ed16d245f4) 2008-05-15 06:28:52 +04:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to run the 'recovered' event when called from %s\n", caller));`

merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`talloc_free(tmp_ctx);`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`return -1;`
add async versions of the freeze node control and freeze all nodes in parallell (This used to be ctdb commit f34e89f54d9f4380e76eb1b5b2385a4d8500b505) 2007-08-27 04:31:22 +04:00			`}`

merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`talloc_free(tmp_ctx);`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`return 0;`
add async versions of the freeze node control and freeze all nodes in parallell (This used to be ctdb commit f34e89f54d9f4380e76eb1b5b2385a4d8500b505) 2007-08-27 04:31:22 +04:00			`}`

add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`/*`
			`remember the trouble maker`
			`*/`
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`static void ctdb_set_culprit_count(struct ctdb_recoverd *rec, uint32_t culprit, uint32_t count)`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`{`
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`struct ctdb_context *ctdb = talloc_get_type(rec->ctdb, struct ctdb_context);`
			`struct ctdb_banning_state *ban_state;`

			`if (culprit >= ctdb->num_nodes) {`
			`DEBUG(DEBUG_ERR,("Trying to set culprit %u but num_nodes is %u\n", culprit, ctdb->num_nodes));`
			`return;`
			`}`

			`if (ctdb->nodes[culprit]->ban_state == NULL) {`
			`ctdb->nodes[culprit]->ban_state = talloc_zero(ctdb->nodes[culprit], struct ctdb_banning_state);`
			`CTDB_NO_MEMORY_VOID(ctdb, ctdb->nodes[culprit]->ban_state);`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00
			`}`
			`ban_state = ctdb->nodes[culprit]->ban_state;`
			`if (timeval_elapsed(&ban_state->last_reported_time) > ctdb->tunable.recovery_grace_period) {`
			`/* this was the first time in a long while this node`
			`misbehaved so we will forgive any old transgressions.`
			`*/`
			`ban_state->count = 0;`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`}`
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00
			`ban_state->count += count;`
			`ban_state->last_reported_time = timeval_current();`
			`rec->last_culprit_node = culprit;`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`}`
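
The culprit accounting in `ctdb_set_culprit_count()` can be read in isolation as a counter that forgives old entries once a grace period has passed without a new report. The following is a simplified sketch of that logic only. It uses plain integer seconds instead of `struct timeval`, and `struct ban_state_sketch` and `add_culprit_credits` are hypothetical names introduced for illustration, not ctdb identifiers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct ctdb_banning_state, with the
   timestamp reduced to whole seconds. */
struct ban_state_sketch {
	uint32_t count;
	int64_t last_reported_time;
};

/* Add `count` culprit credits at time `now`. If more than
   `grace_period` seconds have passed since the last report, earlier
   credits are forgiven first. Returns the updated count. */
static uint32_t add_culprit_credits(struct ban_state_sketch *st, uint32_t count,
				    int64_t now, int64_t grace_period)
{
	assert(st != NULL);

	if (now - st->last_reported_time > grace_period) {
		/* first misbehaviour in a long while: start from zero */
		st->count = 0;
	}

	st->count += count;
	st->last_reported_time = now;
	return st->count;
}
```

Two reports arriving within the grace period accumulate, while a report that arrives after a long quiet spell starts again from zero before being counted.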

If we can not pull a database from a node during recovery, mark this node as a "culprit" so that it will eventually become banned. (This used to be ctdb commit 69dc3bf60b86d8df6dc5c7c6ebf303e847fb2ba9) 2009-04-24 07:58:32 +04:00			`/*`
			`remember the trouble maker`
			`*/`
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`static void ctdb_set_culprit(struct ctdb_recoverd *rec, uint32_t culprit)`
If we can not pull a database from a node during recovery, mark this node as a "culprit" so that it will eventually become banned. (This used to be ctdb commit 69dc3bf60b86d8df6dc5c7c6ebf303e847fb2ba9) 2009-04-24 07:58:32 +04:00			`{`
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`ctdb_set_culprit_count(rec, culprit, 1);`
If we can not pull a database from a node during recovery, mark this node as a "culprit" so that it will eventually become banned. (This used to be ctdb commit 69dc3bf60b86d8df6dc5c7c6ebf303e847fb2ba9) 2009-04-24 07:58:32 +04:00			`}`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`/* this callback is called for every node that failed to execute the`
			`start recovery event`
			`*/`
			`static void startrecovery_fail_callback(struct ctdb_context *ctdb, uint32_t node_pnn, int32_t res, TDB_DATA outdata, void *callback_data)`
			`{`
			`struct ctdb_recoverd *rec = talloc_get_type(callback_data, struct ctdb_recoverd);`

			`DEBUG(DEBUG_ERR, (__location__ " Node %u failed the startrecovery event. Setting it as recovery fail culprit\n", node_pnn));`

			`ctdb_set_culprit(rec, node_pnn);`
			`}`

merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`/*`
			`run the "startrecovery" eventscript on all nodes`
			`*/`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`static int run_startrecovery_eventscript(struct ctdb_recoverd *rec, struct ctdb_node_map *nodemap)`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`{`
			`TALLOC_CTX *tmp_ctx;`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`uint32_t *nodes;`
			`struct ctdb_context *ctdb = rec->ctdb;`
added admin commands to ban/unban nodes (This used to be ctdb commit 4dad04172e7e4955b5bf6444a85b19901c9683ad) 2007-06-07 10:34:33 +04:00
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`tmp_ctx = talloc_new(ctdb);`
			`CTDB_NO_MEMORY(ctdb, tmp_ctx);`

add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`nodes = list_of_active_nodes(ctdb, nodemap, tmp_ctx, true);`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`if (ctdb_client_async_control(ctdb, CTDB_CONTROL_START_RECOVERY,`
initial attempt at freezing databases in priority order (This used to be ctdb commit e8d692590da1070c87a4144031e3306d190ebed2) 2009-10-12 05:08:39 +04:00			`nodes, 0,`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`CONTROL_TIMEOUT(), false, tdb_null,`
			`NULL,`
			`startrecovery_fail_callback,`
			`rec) != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to run the 'startrecovery' event. Recovery failed.\n"));`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`talloc_free(tmp_ctx);`
			`return -1;`
			`}`

			`talloc_free(tmp_ctx);`
			`return 0;`
			`}`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`static void async_getcap_callback(struct ctdb_context *ctdb, uint32_t node_pnn, int32_t res, TDB_DATA outdata, void *callback_data)`
Expand the client async framework so that it can take a callback function. This allows us to use the async framework also for controls that return outdata. Add a "capabilities" field to the ctdb_node structure. This field is only initialized and kept valid inside the recovery daemon context and not inside the main ctdb daemon. change the GET_CAPABILITIES control to return the capabilities in outdata instead of in the res return variable. When performing a recovery inside the recovery daemon, read the capabilities from all connected nodes and update the ctdb->nodes list of nodes. when building the new vnnmap after the database rebuild in recovery, do not include any nodes which lack the LMASTER capability in the new vnnmap. Unless there are no available connected node that sports the LMASTER capability in which case we let the local node (recmaster) take on the lmaster role temporarily (i.e. become a member of the vnnmap list) (This used to be ctdb commit 0f1883c69c689b28b0c04148774840b2c4081df6) 2008-05-06 09:42:59 +04:00			`{`
			`if ( (outdata.dsize != sizeof(uint32_t)) || (outdata.dptr == NULL) ) {`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`DEBUG(DEBUG_ERR, (__location__ " Invalid length/pointer for getcap callback : %u %p\n", (unsigned)outdata.dsize, outdata.dptr));`
Expand the client async framework so that it can take a callback function. This allows us to use the async framework also for controls that return outdata. Add a "capabilities" field to the ctdb_node structure. This field is only initialized and kept valid inside the recovery daemon context and not inside the main ctdb daemon. change the GET_CAPABILITIES control to return the capabilities in outdata instead of in the res return variable. When performing a recovery inside the recovery daemon, read the capabilities from all connected nodes and update the ctdb->nodes list of nodes. when building the new vnnmap after the database rebuild in recovery, do not include any nodes which lack the LMASTER capability in the new vnnmap. Unless there are no available connected node that sports the LMASTER capability in which case we let the local node (recmaster) take on the lmaster role temporarily (i.e. become a member of the vnnmap list) (This used to be ctdb commit 0f1883c69c689b28b0c04148774840b2c4081df6) 2008-05-06 09:42:59 +04:00			`return;`
			`}`
when we reload the nodes file, we may need to reload the nodes file inside the recovery daemon as well. (This used to be ctdb commit 82fd2b6b5cd8e988c38fa6b74121a048757bdeef) 2008-10-17 14:18:06 +04:00			`if (node_pnn < ctdb->num_nodes) {`
			`ctdb->nodes[node_pnn]->capabilities = *((uint32_t *)outdata.dptr);`
			`}`
Expand the client async framework so that it can take a callback function. This allows us to use the async framework also for controls that return outdata. Add a "capabilities" field to the ctdb_node structure. This field is only initialized and kept valid inside the recovery daemon context and not inside the main ctdb daemon. change the GET_CAPABILITIES control to return the capabilities in outdata instead of in the res return variable. When performing a recovery inside the recovery daemon, read the capabilities from all connected nodes and update the ctdb->nodes list of nodes. when building the new vnnmap after the database rebuild in recovery, do not include any nodes which lack the LMASTER capability in the new vnnmap. Unless there are no available connected node that sports the LMASTER capability in which case we let the local node (recmaster) take on the lmaster role temporarily (i.e. become a member of the vnnmap list) (This used to be ctdb commit 0f1883c69c689b28b0c04148774840b2c4081df6) 2008-05-06 09:42:59 +04:00			`}`

			`/*`
			`update the node capabilities for all connected nodes`
			`*/`
			`static int update_capabilities(struct ctdb_context *ctdb, struct ctdb_node_map *nodemap)`
			`{`
			`uint32_t *nodes;`
			`TALLOC_CTX *tmp_ctx;`

			`tmp_ctx = talloc_new(ctdb);`
			`CTDB_NO_MEMORY(ctdb, tmp_ctx);`

			`nodes = list_of_active_nodes(ctdb, nodemap, tmp_ctx, true);`
			`if (ctdb_client_async_control(ctdb, CTDB_CONTROL_GET_CAPABILITIES,`
initial attempt at freezing databases in priority order (This used to be ctdb commit e8d692590da1070c87a4144031e3306d190ebed2) 2009-10-12 05:08:39 +04:00			`nodes, 0,`
			`CONTROL_TIMEOUT(),`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`false, tdb_null,`
			`async_getcap_callback, NULL,`
			`NULL) != 0) {`
Expand the client async framework so that it can take a callback function. This allows us to use the async framework also for controls that return outdata. Add a "capabilities" field to the ctdb_node structure. This field is only initialized and kept valid inside the recovery daemon context and not inside the main ctdb daemon. change the GET_CAPABILITIES control to return the capabilities in outdata instead of in the res return variable. When performing a recovery inside the recovery daemon, read the capabilities from all connected nodes and update the ctdb->nodes list of nodes. when building the new vnnmap after the database rebuild in recovery, do not include any nodes which lack the LMASTER capability in the new vnnmap. Unless there are no available connected node that sports the LMASTER capability in which case we let the local node (recmaster) take on the lmaster role temporarily (i.e. become a member of the vnnmap list) (This used to be ctdb commit 0f1883c69c689b28b0c04148774840b2c4081df6) 2008-05-06 09:42:59 +04:00			`DEBUG(DEBUG_ERR, (__location__ " Failed to read node capabilities.\n"));`
			`talloc_free(tmp_ctx);`
			`return -1;`
			`}`

			`talloc_free(tmp_ctx);`
			`return 0;`
			`}`
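
The validate-then-decode step performed by async_getcap_callback can be illustrated in isolation. The following is a minimal sketch, not the ctdb implementation: `struct payload`, `struct node_state` and `store_capabilities` are hypothetical stand-ins for TDB_DATA, the per-node state and the callback. Unlike the callback above, the sketch reports an out-of-range node number as an error instead of ignoring it, and it copies the value with memcpy rather than reading through a cast pointer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for TDB_DATA: a pointer plus a byte count. */
struct payload {
	const unsigned char *dptr;
	size_t dsize;
};

/* Hypothetical stand-in for the per-node state kept by the recovery daemon. */
struct node_state {
	uint32_t capabilities;
};

/*
 * Store the capabilities reported by node `pnn`.
 * Returns 0 on success, -1 if the reply is malformed or `pnn` is out of range.
 */
int store_capabilities(struct node_state *nodes, uint32_t num_nodes,
		       uint32_t pnn, struct payload reply)
{
	uint32_t caps;

	/* the reply must carry exactly one 32-bit value */
	if (reply.dptr == NULL || reply.dsize != sizeof(uint32_t)) {
		return -1;
	}
	/* never index past the end of the node array */
	if (pnn >= num_nodes) {
		return -1;
	}
	/* memcpy avoids an unaligned read through a cast pointer */
	memcpy(&caps, reply.dptr, sizeof(caps));
	nodes[pnn].capabilities = caps;
	return 0;
}
```

A caller would invoke store_capabilities once per reply and treat a -1 result as a reason to log the malformed reply and leave the node's stored capabilities untouched.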

if a node fails to become frozen during recovery, mark it up with as a culprit so it will soon get banned (This used to be ctdb commit f72d33ac73ebb1af802bacdfb30279df3cd8b8f9) 2009-10-08 09:45:25 +04:00			`static void set_recmode_fail_callback(struct ctdb_context *ctdb, uint32_t node_pnn, int32_t res, TDB_DATA outdata, void *callback_data)`
			`{`
			`struct ctdb_recoverd *rec = talloc_get_type(callback_data, struct ctdb_recoverd);`

			`DEBUG(DEBUG_ERR,("Failed to freeze node %u during recovery. Set it as ban culprit for %d credits\n", node_pnn, rec->nodemap->num));`
			`ctdb_set_culprit_count(rec, node_pnn, rec->nodemap->num);`
			`}`

add a new control for explicitely cancelling recovery transactions, i.e. the transactions we start across all tdb databased during the recovery. this allows us to properly clean up and delete these tdb transactions on a recovery failure. (This used to be ctdb commit b2ce8b900a7d00944c84e0574fea5b371064a06d) 2009-10-12 09:48:05 +04:00			`static void transaction_start_fail_callback(struct ctdb_context *ctdb, uint32_t node_pnn, int32_t res, TDB_DATA outdata, void *callback_data)`
			`{`
			`struct ctdb_recoverd *rec = talloc_get_type(callback_data, struct ctdb_recoverd);`

			`DEBUG(DEBUG_ERR,("Failed to start recovery transaction on node %u. Set it as ban culprit for %d credits\n", node_pnn, rec->nodemap->num));`
			`ctdb_set_culprit_count(rec, node_pnn, rec->nodemap->num);`
			`}`

implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`/*`
			`change recovery mode on all nodes`
			`*/`
if a node fails to become frozen during recovery, mark it up with as a culprit so it will soon get banned (This used to be ctdb commit f72d33ac73ebb1af802bacdfb30279df3cd8b8f9) 2009-10-08 09:45:25 +04:00			`static int set_recovery_mode(struct ctdb_context *ctdb, struct ctdb_recoverd *rec, struct ctdb_node_map *nodemap, uint32_t rec_mode)`
break out the setting/clearing of recovery mode into a dedicated helper function (This used to be ctdb commit dba4e4f8aa4f2fde1e9f8d93bdf3a33f7de8ce18) 2007-05-06 03:53:12 +04:00			`{`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`TDB_DATA data;`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`uint32_t *nodes;`
			`TALLOC_CTX *tmp_ctx;`

			`tmp_ctx = talloc_new(ctdb);`
			`CTDB_NO_MEMORY(ctdb, tmp_ctx);`

add async versions of the freeze node control and freeze all nodes in parallell (This used to be ctdb commit f34e89f54d9f4380e76eb1b5b2385a4d8500b505) 2007-08-27 04:31:22 +04:00			`/* freeze all nodes */`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`nodes = list_of_active_nodes(ctdb, nodemap, tmp_ctx, true);`
add async versions of the freeze node control and freeze all nodes in parallell (This used to be ctdb commit f34e89f54d9f4380e76eb1b5b2385a4d8500b505) 2007-08-27 04:31:22 +04:00			`if (rec_mode == CTDB_RECOVERY_ACTIVE) {`
initial attempt at freezing databases in priority order (This used to be ctdb commit e8d692590da1070c87a4144031e3306d190ebed2) 2009-10-12 05:08:39 +04:00			`int i;`

			`for (i=1; i<=NUM_DB_PRIORITIES; i++) {`
			`if (ctdb_client_async_control(ctdb, CTDB_CONTROL_FREEZE,`
			`nodes, i,`
			`CONTROL_TIMEOUT(),`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`false, tdb_null,`
if a node fails to become frozen during recovery, mark it up with as a culprit so it will soon get banned (This used to be ctdb commit f72d33ac73ebb1af802bacdfb30279df3cd8b8f9) 2009-10-08 09:45:25 +04:00			`NULL,`
			`set_recmode_fail_callback,`
			`rec) != 0) {`
initial attempt at freezing databases in priority order (This used to be ctdb commit e8d692590da1070c87a4144031e3306d190ebed2) 2009-10-12 05:08:39 +04:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to freeze nodes. Recovery failed.\n"));`
			`talloc_free(tmp_ctx);`
			`return -1;`
			`}`
add async versions of the freeze node control and freeze all nodes in parallell (This used to be ctdb commit f34e89f54d9f4380e76eb1b5b2385a4d8500b505) 2007-08-27 04:31:22 +04:00			`}`
			`}`

make sure we start the freeze process quickly on all nodes when we are going to do recovery - this prevents serialisation of freeze, which can take a long time (This used to be ctdb commit 52675c19e420d83d9556a3e73d9a4b490078aa9c) 2007-06-11 17:03:23 +04:00
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`data.dsize = sizeof(uint32_t);`
			`data.dptr = (unsigned char *)&rec_mode;`
break out the setting/clearing of recovery mode into a dedicated helper function (This used to be ctdb commit dba4e4f8aa4f2fde1e9f8d93bdf3a33f7de8ce18) 2007-05-06 03:53:12 +04:00
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`if (ctdb_client_async_control(ctdb, CTDB_CONTROL_SET_RECMODE,`
initial attempt at freezing databases in priority order (This used to be ctdb commit e8d692590da1070c87a4144031e3306d190ebed2) 2009-10-12 05:08:39 +04:00			`nodes, 0,`
			`CONTROL_TIMEOUT(),`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`false, data,`
			`NULL, NULL,`
			`NULL) != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to set recovery mode. Recovery failed.\n"));`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`talloc_free(tmp_ctx);`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`return -1;`
			`}`
separate out the freeze/thaw handling from recovery (This used to be ctdb commit 0b0640bd8b8334961f240e0cf276ac112cd6e616) 2007-05-12 09:15:27 +04:00
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`talloc_free(tmp_ctx);`
break out the setting/clearing of recovery mode into a dedicated helper function (This used to be ctdb commit dba4e4f8aa4f2fde1e9f8d93bdf3a33f7de8ce18) 2007-05-06 03:53:12 +04:00			`return 0;`
			`}`
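
The priority-ordered freeze loop in set_recovery_mode can be sketched on its own. This is an illustration under stated assumptions, not the ctdb code: `SKETCH_NUM_PRIORITIES`, `freeze_fn`, `struct freeze_log` and `logging_freeze` are hypothetical names, and the value 3 is assumed for the number of priority levels. The point shown is that levels are frozen in ascending order and the first failure stops the sequence.

```c
#include <stdint.h>

/* Assumption for this sketch; the real NUM_DB_PRIORITIES is defined elsewhere. */
#define SKETCH_NUM_PRIORITIES 3

/* Hypothetical callback: freeze all databases of one priority, 0 on success. */
typedef int (*freeze_fn)(uint32_t priority, void *private_data);

/*
 * Freeze priorities 1..SKETCH_NUM_PRIORITIES in ascending order.
 * Returns 0 if every level froze, otherwise the priority that failed.
 */
int freeze_in_priority_order(freeze_fn freeze, void *private_data)
{
	uint32_t prio;

	for (prio = 1; prio <= SKETCH_NUM_PRIORITIES; prio++) {
		if (freeze(prio, private_data) != 0) {
			/* stop at the first failure; later levels are not attempted */
			return (int)prio;
		}
	}
	return 0;
}

/* Example callback for demonstration: counts calls and fails at one level. */
struct freeze_log {
	uint32_t fail_at;	/* priority that should fail, 0 for none */
	uint32_t calls;		/* number of freeze attempts made */
};

int logging_freeze(uint32_t priority, void *private_data)
{
	struct freeze_log *log = private_data;

	log->calls++;
	return (priority == log->fail_at) ? -1 : 0;
}
```

Stopping at the first failure mirrors the function above, which frees its temporary context and returns -1 as soon as one freeze control fails.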

implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`/*`
			`change recovery master on all nodes`
			`*/`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`static int set_recovery_master(struct ctdb_context *ctdb, struct ctdb_node_map *nodemap, uint32_t pnn)`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`{`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`TDB_DATA data;`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`TALLOC_CTX *tmp_ctx;`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`uint32_t *nodes;`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00
			`tmp_ctx = talloc_new(ctdb);`
			`CTDB_NO_MEMORY(ctdb, tmp_ctx);`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`data.dsize = sizeof(uint32_t);`
			`data.dptr = (unsigned char *)&pnn;`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`nodes = list_of_active_nodes(ctdb, nodemap, tmp_ctx, true);`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`if (ctdb_client_async_control(ctdb, CTDB_CONTROL_SET_RECMASTER,`
initial attempt at freezing databases in priority order (This used to be ctdb commit e8d692590da1070c87a4144031e3306d190ebed2) 2009-10-12 05:08:39 +04:00			`nodes, 0,`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`CONTROL_TIMEOUT(), false, data,`
			`NULL, NULL,`
			`NULL) != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to set recmaster. Recovery failed.\n"));`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`talloc_free(tmp_ctx);`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`return -1;`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`}`

merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`talloc_free(tmp_ctx);`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`return 0;`
			`}`

during recovery, update all remote nodes so they use the same priorities for the databases as this node. (This used to be ctdb commit 465dc95fef0ff6651ff49fa94e4cf2ebd1036ac4) 2009-10-10 09:28:20 +04:00			`/* update all remote nodes to use the same db priority that we have`
			`this can fail if the remote node has not yet been upgraded to`
			`support this function, so we always return success and never fail`
			`a recovery if this call fails.`
			`*/`
			`static int update_db_priority_on_remote_nodes(struct ctdb_context *ctdb,`
			`struct ctdb_node_map *nodemap,`
			`uint32_t pnn, struct ctdb_dbid_map *dbmap, TALLOC_CTX *mem_ctx)`
			`{`
			`int db;`
			`uint32_t *nodes;`

			`nodes = list_of_active_nodes(ctdb, nodemap, mem_ctx, true);`

			`/* step through all local databases */`
			`for (db=0; db<dbmap->num;db++) {`
			`TDB_DATA data;`
			`struct ctdb_db_priority db_prio;`
			`int ret;`

			`db_prio.db_id = dbmap->dbs[db].dbid;`
			`ret = ctdb_ctrl_get_db_priority(ctdb, CONTROL_TIMEOUT(), CTDB_CURRENT_NODE, dbmap->dbs[db].dbid, &db_prio.priority);`
			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR,(__location__ " Failed to read database priority from local node for db 0x%08x\n", dbmap->dbs[db].dbid));`
			`continue;`
			`}`

			`DEBUG(DEBUG_INFO,("Update DB priority for db 0x%08x to %u\n", dbmap->dbs[db].dbid, db_prio.priority));`

			`data.dptr = (uint8_t *)&db_prio;`
			`data.dsize = sizeof(db_prio);`

			`if (ctdb_client_async_control(ctdb,`
			`CTDB_CONTROL_SET_DB_PRIORITY,`
initial attempt at freezing databases in priority order (This used to be ctdb commit e8d692590da1070c87a4144031e3306d190ebed2) 2009-10-12 05:08:39 +04:00			`nodes, 0,`
during recovery, update all remote nodes so they use the same priorities for the databases as this node. (This used to be ctdb commit 465dc95fef0ff6651ff49fa94e4cf2ebd1036ac4) 2009-10-10 09:28:20 +04:00			`CONTROL_TIMEOUT(), false, data,`
			`NULL, NULL,`
			`NULL) != 0) {`
			`DEBUG(DEBUG_ERR,(__location__ " Failed to set DB priority for 0x%08x\n", db_prio.db_id));`
			`}`
			`}`

			`return 0;`
			`}`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00
			`/*`
			`ensure all other nodes have attached to any databases that we have`
			`*/`
			`static int create_missing_remote_databases(struct ctdb_context *ctdb, struct ctdb_node_map *nodemap,`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`uint32_t pnn, struct ctdb_dbid_map *dbmap, TALLOC_CTX *mem_ctx)`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`{`
recovery daemon this program is a client to the local ctdb daemon every second it pulls all vnnmap and nodemaps from all nodes that are available and checks if a recovery is required a recovery is required if : * all nodes do NOT have an identical vnnmap and generation * all nodes do NOT have an identical nodemap * there are active nodes that are NOT in the nodemap * there are nodes in the nodemap that are NOT active During recovery, the recovery tool will also make sure that all nodes know about and have created all databases. (This used to be ctdb commit 2f2650467bac7e8954de7c17cb34f46b0bdbcd26) 2007-05-04 09:21:40 +04:00			`int i, j, db, ret;`
			`struct ctdb_dbid_map *remote_dbmap;`

update to rhe recovery daemon ctdb_ctrl_ calls are timedout due to nodes arriving or leaving the cluster it crashes the recovery daemon afterwards with a SEGV but no useful stack backtrace (This used to be ctdb commit cd3abc7349e86555ccd87cd47a1dcc2adad2f46c) 2007-05-06 00:58:01 +04:00			`/* verify that all other nodes have all our databases */`
			`for (j=0; j<nodemap->num; j++) {`
			`/* we don't need to check ourselves */`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`if (nodemap->nodes[j].pnn == pnn) {`
update to rhe recovery daemon ctdb_ctrl_ calls are timedout due to nodes arriving or leaving the cluster it crashes the recovery daemon afterwards with a SEGV but no useful stack backtrace (This used to be ctdb commit cd3abc7349e86555ccd87cd47a1dcc2adad2f46c) 2007-05-06 00:58:01 +04:00			`continue;`
			`}`
			`/* don't check nodes that are unavailable */`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`if (nodemap->nodes[j].flags & NODE_FLAGS_INACTIVE) {`
update to rhe recovery daemon ctdb_ctrl_ calls are timedout due to nodes arriving or leaving the cluster it crashes the recovery daemon afterwards with a SEGV but no useful stack backtrace (This used to be ctdb commit cd3abc7349e86555ccd87cd47a1dcc2adad2f46c) 2007-05-06 00:58:01 +04:00			`continue;`
			`}`

change how we do public addresses and takeover so that we can have multiple public addresses spread across multiple interfaces on each node. this is a massive patch since we have previously made the assumtion that we only have one public address per node. get rid of the public_interface argument. the public addresses file now explicitely lists which interface the address belongs to (This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8) 2007-09-04 03:50:07 +04:00			`ret = ctdb_ctrl_getdbmap(ctdb, CONTROL_TIMEOUT(), nodemap->nodes[j].pnn,`
formatting fixes (This used to be ctdb commit ed63a2057698aed3931762605b2ea2368681af2b) 2007-06-07 12:39:37 +04:00			`mem_ctx, &remote_dbmap);`
update to rhe recovery daemon ctdb_ctrl_ calls are timedout due to nodes arriving or leaving the cluster it crashes the recovery daemon afterwards with a SEGV but no useful stack backtrace (This used to be ctdb commit cd3abc7349e86555ccd87cd47a1dcc2adad2f46c) 2007-05-06 00:58:01 +04:00			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to get dbids from node %u\n", nodemap->nodes[j].pnn));`
update to rhe recovery daemon ctdb_ctrl_ calls are timedout due to nodes arriving or leaving the cluster it crashes the recovery daemon afterwards with a SEGV but no useful stack backtrace (This used to be ctdb commit cd3abc7349e86555ccd87cd47a1dcc2adad2f46c) 2007-05-06 00:58:01 +04:00			`return -1;`
			`}`

			`/* step through all local databases */`
			`for (db=0; db<dbmap->num;db++) {`
			`const char *name;`


			`for (i=0;i<remote_dbmap->num;i++) {`
added support for persistent databases in ctdbd (This used to be ctdb commit 3115090a0d882beca9d70761130b74bb0821f201) 2007-09-21 06:24:02 +04:00			`if (dbmap->dbs[db].dbid == remote_dbmap->dbs[i].dbid) {`
update to rhe recovery daemon ctdb_ctrl_ calls are timedout due to nodes arriving or leaving the cluster it crashes the recovery daemon afterwards with a SEGV but no useful stack backtrace (This used to be ctdb commit cd3abc7349e86555ccd87cd47a1dcc2adad2f46c) 2007-05-06 00:58:01 +04:00			`break;`
			`}`
			`}`
			`/* the remote node already has this database */`
			`if (i!=remote_dbmap->num) {`
			`continue;`
			`}`
			`/* ok so we need to create this database */`
added support for persistent databases in ctdbd (This used to be ctdb commit 3115090a0d882beca9d70761130b74bb0821f201) 2007-09-21 06:24:02 +04:00			`ret = ctdb_ctrl_getdbname(ctdb, CONTROL_TIMEOUT(), pnn, dbmap->dbs[db].dbid,`
			`mem_ctx, &name);`
update to rhe recovery daemon ctdb_ctrl_ calls are timedout due to nodes arriving or leaving the cluster it crashes the recovery daemon afterwards with a SEGV but no useful stack backtrace (This used to be ctdb commit cd3abc7349e86555ccd87cd47a1dcc2adad2f46c) 2007-05-06 00:58:01 +04:00			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to get dbname from node %u\n", pnn));`
update to rhe recovery daemon ctdb_ctrl_ calls are timedout due to nodes arriving or leaving the cluster it crashes the recovery daemon afterwards with a SEGV but no useful stack backtrace (This used to be ctdb commit cd3abc7349e86555ccd87cd47a1dcc2adad2f46c) 2007-05-06 00:58:01 +04:00			`return -1;`
			`}`
added support for persistent databases in ctdbd (This used to be ctdb commit 3115090a0d882beca9d70761130b74bb0821f201) 2007-09-21 06:24:02 +04:00			`ret = ctdb_ctrl_createdb(ctdb, CONTROL_TIMEOUT(), nodemap->nodes[j].pnn,`
			`mem_ctx, name, dbmap->dbs[db].persistent);`
update to rhe recovery daemon ctdb_ctrl_ calls are timedout due to nodes arriving or leaving the cluster it crashes the recovery daemon afterwards with a SEGV but no useful stack backtrace (This used to be ctdb commit cd3abc7349e86555ccd87cd47a1dcc2adad2f46c) 2007-05-06 00:58:01 +04:00			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to create remote db:%s\n", name));`
update to rhe recovery daemon ctdb_ctrl_ calls are timedout due to nodes arriving or leaving the cluster it crashes the recovery daemon afterwards with a SEGV but no useful stack backtrace (This used to be ctdb commit cd3abc7349e86555ccd87cd47a1dcc2adad2f46c) 2007-05-06 00:58:01 +04:00			`return -1;`
			`}`
			`}`
			`}`
recovery daemon this program is a client to the local ctdb daemon every second it pulls all vnnmap and nodemaps from all nodes that are available and checks if a recovery is required a recovery is required if : * all nodes do NOT have an identical vnnmap and generation * all nodes do NOT have an identical nodemap * there are active nodes that are NOT in the nodemap * there are nodes in the nodemap that are NOT active During recovery, the recovery tool will also make sure that all nodes know about and have created all databases. (This used to be ctdb commit 2f2650467bac7e8954de7c17cb34f46b0bdbcd26) 2007-05-04 09:21:40 +04:00
add a helper function to create all missing remote databases detected during recovery (This used to be ctdb commit 04758c6f7d8f61260be6d2472380cb7904984427) 2007-05-06 04:04:37 +04:00			`return 0;`
			`}`
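
The nested search used by create_missing_remote_databases (scan the remote list for each local database id, and treat a scan that runs to completion as "missing") can be shown as a stand-alone helper. This is a sketch only: `find_missing_dbids` and its plain-array interface are hypothetical and do not exist in ctdb, where the ids live inside struct ctdb_dbid_map.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Copy into `missing` every id from `local` that does not appear in `remote`.
 * `missing` must have room for `num_local` entries.
 * Returns the number of ids written.
 */
size_t find_missing_dbids(const uint32_t *local, size_t num_local,
			  const uint32_t *remote, size_t num_remote,
			  uint32_t *missing)
{
	size_t i, j, count = 0;

	for (i = 0; i < num_local; i++) {
		for (j = 0; j < num_remote; j++) {
			if (local[i] == remote[j]) {
				break;
			}
		}
		/* the inner loop ran to completion, so no match was found */
		if (j == num_remote) {
			missing[count++] = local[i];
		}
	}
	return count;
}
```

The search is quadratic in the number of databases, which is acceptable when the lists are short; a sorted list or hash set would be the usual replacement if they grew large.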


implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`/*`
			`ensure we are attached to any databases that anyone else is attached to`
			`*/`
			`static int create_missing_local_databases(struct ctdb_context *ctdb, struct ctdb_node_map *nodemap,`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`uint32_t pnn, struct ctdb_dbid_map **dbmap, TALLOC_CTX *mem_ctx)`
create a helper function to make sure the local node that does recovery has all the databases that exist on any other remote node (This used to be ctdb commit 0f436e3d40fea6e6a146019b0c664e80e81e88b4) 2007-05-06 04:12:42 +04:00			`{`
			`int i, j, db, ret;`
			`struct ctdb_dbid_map *remote_dbmap;`

			`/* verify that we have every database that any other node has */`
			`for (j=0; j<nodemap->num; j++) {`
			`/* we don't need to check ourselves */`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`if (nodemap->nodes[j].pnn == pnn) {`
create a helper function to make sure the local node that does recovery has all the databases that exist on any other remote node (This used to be ctdb commit 0f436e3d40fea6e6a146019b0c664e80e81e88b4) 2007-05-06 04:12:42 +04:00			`continue;`
			`}`
			`/* don't check nodes that are unavailable */`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`if (nodemap->nodes[j].flags & NODE_FLAGS_INACTIVE) {`
create a helper function to make sure the local node that does recovery has all the databases that exist on any other remote node (This used to be ctdb commit 0f436e3d40fea6e6a146019b0c664e80e81e88b4) 2007-05-06 04:12:42 +04:00			`continue;`
			`}`

change how we do public addresses and takeover so that we can have multiple public addresses spread across multiple interfaces on each node. this is a massive patch since we have previously made the assumtion that we only have one public address per node. get rid of the public_interface argument. the public addresses file now explicitely lists which interface the address belongs to (This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8) 2007-09-04 03:50:07 +04:00			`ret = ctdb_ctrl_getdbmap(ctdb, CONTROL_TIMEOUT(), nodemap->nodes[j].pnn,`
formatting fixes (This used to be ctdb commit ed63a2057698aed3931762605b2ea2368681af2b) 2007-06-07 12:39:37 +04:00			`mem_ctx, &remote_dbmap);`
create a helper function to make sure the local node that does recovery has all the databases that exist on any other remote node (This used to be ctdb commit 0f436e3d40fea6e6a146019b0c664e80e81e88b4) 2007-05-06 04:12:42 +04:00			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to get dbids from node %u\n", pnn));`
create a helper function to make sure the local node that does recovery has all the databases that exist on any other remote node (This used to be ctdb commit 0f436e3d40fea6e6a146019b0c664e80e81e88b4) 2007-05-06 04:12:42 +04:00			`return -1;`
			`}`

			`/* step through all databases on the remote node */`
			`for (db=0; db<remote_dbmap->num;db++) {`
			`const char *name;`

			`for (i=0;i<(*dbmap)->num;i++) {`
added support for persistent databases in ctdbd (This used to be ctdb commit 3115090a0d882beca9d70761130b74bb0821f201) 2007-09-21 06:24:02 +04:00			`if (remote_dbmap->dbs[db].dbid == (*dbmap)->dbs[i].dbid) {`
create a helper function to make sure the local node that does recovery has all the databases that exist on any other remote node (This used to be ctdb commit 0f436e3d40fea6e6a146019b0c664e80e81e88b4) 2007-05-06 04:12:42 +04:00			`break;`
			`}`
			`}`
			`/* we already have this db locally */`
			`if (i!=(*dbmap)->num) {`
			`continue;`
			`}`
			`/* ok so we need to create this database and`
			`rebuild dbmap`
			`*/`
change how we do public addresses and takeover so that we can have multiple public addresses spread across multiple interfaces on each node. this is a massive patch since we have previously made the assumtion that we only have one public address per node. get rid of the public_interface argument. the public addresses file now explicitely lists which interface the address belongs to (This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8) 2007-09-04 03:50:07 +04:00			`ctdb_ctrl_getdbname(ctdb, CONTROL_TIMEOUT(), nodemap->nodes[j].pnn,`
added support for persistent databases in ctdbd (This used to be ctdb commit 3115090a0d882beca9d70761130b74bb0821f201) 2007-09-21 06:24:02 +04:00			`remote_dbmap->dbs[db].dbid, mem_ctx, &name);`
create a helper function to make sure the local node that does recovery has all the databases that exist on any other remote node (This used to be ctdb commit 0f436e3d40fea6e6a146019b0c664e80e81e88b4) 2007-05-06 04:12:42 +04:00			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to get dbname from node %u\n",`
change how we do public addresses and takeover so that we can have multiple public addresses spread across multiple interfaces on each node. this is a massive patch since we have previously made the assumtion that we only have one public address per node. get rid of the public_interface argument. the public addresses file now explicitely lists which interface the address belongs to (This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8) 2007-09-04 03:50:07 +04:00			`nodemap->nodes[j].pnn));`
create a helper function to make sure the local node that does recovery has all the databases that exist on any other remote node (This used to be ctdb commit 0f436e3d40fea6e6a146019b0c664e80e81e88b4) 2007-05-06 04:12:42 +04:00			`return -1;`
			`}`
added support for persistent databases in ctdbd (This used to be ctdb commit 3115090a0d882beca9d70761130b74bb0821f201) 2007-09-21 06:24:02 +04:00			`ctdb_ctrl_createdb(ctdb, CONTROL_TIMEOUT(), pnn, mem_ctx, name,`
			`remote_dbmap->dbs[db].persistent);`
create a helper function to make sure the local node that does recovery has all the databases that exist on any other remote node (This used to be ctdb commit 0f436e3d40fea6e6a146019b0c664e80e81e88b4) 2007-05-06 04:12:42 +04:00			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to create local db:%s\n", name));`
create a helper function to make sure the local node that does recovery has all the databases that exist on any other remote node (This used to be ctdb commit 0f436e3d40fea6e6a146019b0c664e80e81e88b4) 2007-05-06 04:12:42 +04:00			`return -1;`
			`}`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`ret = ctdb_ctrl_getdbmap(ctdb, CONTROL_TIMEOUT(), pnn, mem_ctx, dbmap);`
create a helper function to make sure the local node that does recovery has all the databases that exist on any other remote node (This used to be ctdb commit 0f436e3d40fea6e6a146019b0c664e80e81e88b4) 2007-05-06 04:12:42 +04:00			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to reread dbmap on node %u\n", pnn));`
create a helper function to make sure the local node that does recovery has all the databases that exist on any other remote node (This used to be ctdb commit 0f436e3d40fea6e6a146019b0c664e80e81e88b4) 2007-05-06 04:12:42 +04:00			`return -1;`
			`}`
			`}`
			`}`

			`return 0;`
			`}`
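The inner loops of create_missing_local_databases() amount to a set difference: every database id the remote node reports that is absent from the local map. A minimal sketch of that step on plain arrays, assuming a hypothetical helper name `find_missing_ids` and bare `uint32_t` ids in place of `struct ctdb_dbid_map`; it is an illustration only and is not part of ctdb:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a ctdb function): copy into `missing` every id
 * from `remote` that does not appear in `local`, and return how many ids
 * were copied. `missing` must have room for `nremote` entries.
 */
static size_t find_missing_ids(const uint32_t *local, size_t nlocal,
			       const uint32_t *remote, size_t nremote,
			       uint32_t *missing)
{
	size_t nmissing = 0;

	assert(missing != NULL);

	for (size_t r = 0; r < nremote; r++) {
		size_t i;

		/* linear scan of the local ids, as in the loop above */
		for (i = 0; i < nlocal; i++) {
			if (remote[r] == local[i]) {
				break;
			}
		}
		if (i != nlocal) {
			/* we already have this id locally */
			continue;
		}
		missing[nmissing++] = remote[r];
	}
	return nmissing;
}
```

Unlike this sketch, the function above re-reads the local dbmap after each database it creates, so a database that several remote nodes report is only created once.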

			`/*`
			`pull the remote database contents from one node into the recdb`
			`*/`
			`static int pull_one_remote_database(struct ctdb_context *ctdb, uint32_t srcnode,`
			`struct tdb_wrap *recdb, uint32_t dbid,`
			`bool persistent)`
			`{`
			`int ret;`
			`TDB_DATA outdata;`
			`struct ctdb_marshall_buffer *reply;`
			`struct ctdb_rec_data *rec;`
			`int i;`
			`TALLOC_CTX *tmp_ctx = talloc_new(recdb);`

			`ret = ctdb_ctrl_pulldb(ctdb, srcnode, dbid, CTDB_LMASTER_ANY, tmp_ctx,`
			`CONTROL_TIMEOUT(), &outdata);`
			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR,(__location__ " Unable to copy db from node %u\n", srcnode));`
			`talloc_free(tmp_ctx);`
			`return -1;`
			`}`

			`reply = (struct ctdb_marshall_buffer *)outdata.dptr;`

			`if (outdata.dsize < offsetof(struct ctdb_marshall_buffer, data)) {`
			`DEBUG(DEBUG_ERR,(__location__ " invalid data in pulldb reply\n"));`
			`talloc_free(tmp_ctx);`
			`return -1;`
			`}`

			`rec = (struct ctdb_rec_data *)&reply->data[0];`

			`for (i=0;`
			`i<reply->count;`
			`rec = (struct ctdb_rec_data *)(rec->length + (uint8_t *)rec), i++) {`
			`TDB_DATA key, data;`
			`struct ctdb_ltdb_header *hdr;`
			`TDB_DATA existing;`

			`key.dptr = &rec->data[0];`
			`key.dsize = rec->keylen;`
			`data.dptr = &rec->data[key.dsize];`
			`data.dsize = rec->datalen;`

			`hdr = (struct ctdb_ltdb_header *)data.dptr;`

			`if (data.dsize < sizeof(struct ctdb_ltdb_header)) {`
			`DEBUG(DEBUG_CRIT,(__location__ " bad ltdb record\n"));`
			`talloc_free(tmp_ctx);`
			`return -1;`
			`}`

			`/* fetch the existing record, if any */`
			`existing = tdb_fetch(recdb->tdb, key);`

			`if (existing.dptr != NULL) {`
			`struct ctdb_ltdb_header header;`
			`if (existing.dsize < sizeof(struct ctdb_ltdb_header)) {`
			`DEBUG(DEBUG_CRIT,(__location__ " Bad record size %u from node %u\n",`
			`(unsigned)existing.dsize, srcnode));`
			`free(existing.dptr);`
			`talloc_free(tmp_ctx);`
			`return -1;`
			`}`
			`header = *(struct ctdb_ltdb_header *)existing.dptr;`
			`free(existing.dptr);`
			`if (!(header.rsn < hdr->rsn ||`
			`(header.dmaster != ctdb->recovery_master && header.rsn == hdr->rsn))) {`
			`continue;`
			`}`
			`}`

			`if (tdb_store(recdb->tdb, key, data, TDB_REPLACE) != 0) {`
			`DEBUG(DEBUG_CRIT,(__location__ " Failed to store record\n"));`
			`talloc_free(tmp_ctx);`
			`return -1;`
			`}`
			`}`

			`talloc_free(tmp_ctx);`

			`return 0;`
			`}`
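The record-selection test in pull_one_remote_database() is written as a negated condition, which makes the rule easy to misread. A minimal sketch of the same decision as a standalone predicate, assuming a hypothetical `rec_header` struct that keeps only the `rsn` and `dmaster` fields and a hypothetical function name `pulled_copy_wins`; this is an illustration, not code from ctdb:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for struct ctdb_ltdb_header. */
struct rec_header {
	uint64_t rsn;     /* record sequence number */
	uint32_t dmaster; /* node recorded as data master */
};

/*
 * Return true when the pulled copy should replace the copy already stored
 * in the recovery database:
 *  - the pulled copy has a higher rsn, or
 *  - both have the same rsn and the stored copy's dmaster is not the
 *    recovery master.
 */
static bool pulled_copy_wins(const struct rec_header *stored,
			     const struct rec_header *pulled,
			     uint32_t recovery_master)
{
	assert(stored != NULL && pulled != NULL);

	if (stored->rsn < pulled->rsn) {
		return true;
	}
	return stored->rsn == pulled->rsn &&
	       stored->dmaster != recovery_master;
}
```

A record with no stored copy is always written, so the predicate only matters when tdb_fetch() returned an existing record.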

			`/*`
			`pull all the remote database contents into the recdb`
			`*/`
			`static int pull_remote_database(struct ctdb_context *ctdb,`
			`struct ctdb_recoverd *rec,`
			`struct ctdb_node_map *nodemap,`
			`struct tdb_wrap *recdb, uint32_t dbid,`
			`bool persistent)`
			`{`
			`int j;`

			`/* pull all records from all other nodes across onto this node`
			`(this merges based on rsn)`
			`*/`
			`for (j=0; j<nodemap->num; j++) {`
			`/* don't merge from nodes that are unavailable */`
			`if (nodemap->nodes[j].flags & NODE_FLAGS_INACTIVE) {`
			`continue;`
			`}`
			`if (pull_one_remote_database(ctdb, nodemap->nodes[j].pnn, recdb, dbid, persistent) != 0) {`
			`DEBUG(DEBUG_ERR,(__location__ " Failed to pull remote database from node %u\n",`
			`nodemap->nodes[j].pnn));`
			`ctdb_set_culprit_count(rec, nodemap->nodes[j].pnn, nodemap->num);`
			`return -1;`
			`}`
			`}`

			`return 0;`
			`}`

			`/*`
			`update flags on all active nodes`
			`*/`
			`static int update_flags_on_all_nodes(struct ctdb_context *ctdb, struct ctdb_node_map *nodemap, uint32_t pnn, uint32_t flags)`
			`{`
			`int ret;`

			`ret = ctdb_ctrl_modflags(ctdb, CONTROL_TIMEOUT(), pnn, flags, ~flags);`
			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR, (__location__ " Unable to update nodeflags on remote nodes\n"));`
			`return -1;`
			`}`

			`return 0;`
			`}`

			`/*`
			`ensure all nodes have the same vnnmap we do`
			`*/`
			`static int update_vnnmap_on_all_nodes(struct ctdb_context *ctdb, struct ctdb_node_map *nodemap,`
			`uint32_t pnn, struct ctdb_vnn_map *vnnmap, TALLOC_CTX *mem_ctx)`
			`{`
			`int j, ret;`

			`/* push the new vnn map out to all the nodes */`
			`for (j=0; j<nodemap->num; j++) {`
			`/* don't push to nodes that are unavailable */`
			`if (nodemap->nodes[j].flags & NODE_FLAGS_INACTIVE) {`
			`continue;`
			`}`

			`ret = ctdb_ctrl_setvnnmap(ctdb, CONTROL_TIMEOUT(), nodemap->nodes[j].pnn, mem_ctx, vnnmap);`
			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR, (__location__ " Unable to set vnnmap for node %u\n", nodemap->nodes[j].pnn));`
			`return -1;`
			`}`
			`}`

			`return 0;`
			`}`

			`struct vacuum_info {`
			`struct vacuum_info *next, *prev;`
			`struct ctdb_recoverd *rec;`
			`uint32_t srcnode;`
			`struct ctdb_db_context *ctdb_db;`
			`struct ctdb_marshall_buffer *recs;`
			`struct ctdb_rec_data *r;`
			`};`

			`static void vacuum_fetch_next(struct vacuum_info *v);`

			`/*`
			`called when a vacuum fetch has completed - just free it and do the next one`
			`*/`
			`static void vacuum_fetch_callback(struct ctdb_client_call_state *state)`
			`{`
			`struct vacuum_info *v = talloc_get_type(state->async.private_data, struct vacuum_info);`
			`talloc_free(state);`
			`vacuum_fetch_next(v);`
			`}`


			`/*`
			`process the next element from the vacuum list`
			`*/`
			`static void vacuum_fetch_next(struct vacuum_info *v)`
			`{`
			`struct ctdb_call call;`
			`struct ctdb_rec_data *r;`

			`while (v->recs->count) {`
			`struct ctdb_client_call_state *state;`
			`TDB_DATA data;`
			`struct ctdb_ltdb_header *hdr;`

			`ZERO_STRUCT(call);`
			`call.call_id = CTDB_NULL_FUNC;`
			`call.flags = CTDB_IMMEDIATE_MIGRATION;`

			`r = v->r;`
			`v->r = (struct ctdb_rec_data *)(r->length + (uint8_t *)r);`
			`v->recs->count--;`

			`call.key.dptr = &r->data[0];`
			`call.key.dsize = r->keylen;`

			`/* ensure we don't block this daemon - just skip a record if we can't get`
			`the chainlock */`
			`if (tdb_chainlock_nonblock(v->ctdb_db->ltdb->tdb, call.key) != 0) {`
			`continue;`
			`}`

			`data = tdb_fetch(v->ctdb_db->ltdb->tdb, call.key);`
			`if (data.dptr == NULL) {`
			`tdb_chainunlock(v->ctdb_db->ltdb->tdb, call.key);`
			`continue;`
			`}`

			`if (data.dsize < sizeof(struct ctdb_ltdb_header)) {`
			`free(data.dptr);`
			`tdb_chainunlock(v->ctdb_db->ltdb->tdb, call.key);`
			`continue;`
			`}`

			`hdr = (struct ctdb_ltdb_header *)data.dptr;`
			`if (hdr->dmaster == v->rec->ctdb->pnn) {`
			`/* it's already local */`
			`free(data.dptr);`
			`tdb_chainunlock(v->ctdb_db->ltdb->tdb, call.key);`
			`continue;`
			`}`

			`free(data.dptr);`

			`state = ctdb_call_send(v->ctdb_db, &call);`
			`tdb_chainunlock(v->ctdb_db->ltdb->tdb, call.key);`
			`if (state == NULL) {`
			`DEBUG(DEBUG_ERR,(__location__ " Failed to setup vacuum fetch call\n"));`
			`talloc_free(v);`
			`return;`
			`}`
			`state->async.fn = vacuum_fetch_callback;`
			`state->async.private_data = v;`
			`return;`
			`}`

			`talloc_free(v);`
			`}`


			`/*`
			`destroy a vacuum info structure`
			`*/`
			`static int vacuum_info_destructor(struct vacuum_info *v)`
			`{`
			`DLIST_REMOVE(v->rec->vacuum_info, v);`
			`return 0;`
			`}`


			`/*`
			`handler for vacuum fetch`
			`*/`
			`static void vacuum_fetch_handler(struct ctdb_context *ctdb, uint64_t srvid,`
			`TDB_DATA data, void *private_data)`
			`{`
			`struct ctdb_recoverd *rec = talloc_get_type(private_data, struct ctdb_recoverd);`
			`struct ctdb_marshall_buffer *recs;`
			`int ret, i;`
			`TALLOC_CTX *tmp_ctx = talloc_new(ctdb);`
			`const char *name;`
			`struct ctdb_dbid_map *dbmap=NULL;`
			`bool persistent = false;`
			`struct ctdb_db_context *ctdb_db;`
			`struct ctdb_rec_data *r;`
			`uint32_t srcnode;`
			`struct vacuum_info *v;`

			`recs = (struct ctdb_marshall_buffer *)data.dptr;`
			`r = (struct ctdb_rec_data *)&recs->data[0];`

			`if (recs->count == 0) {`
			`talloc_free(tmp_ctx);`
			`return;`
			`}`

			`srcnode = r->reqid;`

			`for (v=rec->vacuum_info;v;v=v->next) {`
			`if (srcnode == v->srcnode && recs->db_id == v->ctdb_db->db_id) {`
			`/* we're already working on records from this node */`
			`talloc_free(tmp_ctx);`
			`return;`
			`}`
			`}`

			`/* work out if the database is persistent */`
			`ret = ctdb_ctrl_getdbmap(ctdb, CONTROL_TIMEOUT(), CTDB_CURRENT_NODE, tmp_ctx, &dbmap);`
			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR, (__location__ " Unable to get dbids from local node\n"));`
			`talloc_free(tmp_ctx);`
			`return;`
			`}`

			`for (i=0;i<dbmap->num;i++) {`
			`if (dbmap->dbs[i].dbid == recs->db_id) {`
			`persistent = dbmap->dbs[i].persistent;`
			`break;`
			`}`
			`}`
			`if (i == dbmap->num) {`
			`DEBUG(DEBUG_ERR, (__location__ " Unable to find db_id 0x%x on local node\n", recs->db_id));`
			`talloc_free(tmp_ctx);`
			`return;`
			`}`

			`/* find the name of this database */`
			`if (ctdb_ctrl_getdbname(ctdb, CONTROL_TIMEOUT(), CTDB_CURRENT_NODE, recs->db_id, tmp_ctx, &name) != 0) {`
			`DEBUG(DEBUG_ERR,(__location__ " Failed to get name of db 0x%x\n", recs->db_id));`
			`talloc_free(tmp_ctx);`
			`return;`
			`}`

			`/* attach to it */`
			`ctdb_db = ctdb_attach(ctdb, name, persistent, 0);`
			`if (ctdb_db == NULL) {`
			`DEBUG(DEBUG_ERR,(__location__ " Failed to attach to database '%s'\n", name));`
			`talloc_free(tmp_ctx);`
			`return;`
			`}`

			`v = talloc_zero(rec, struct vacuum_info);`
			`if (v == NULL) {`
			`DEBUG(DEBUG_CRIT,(__location__ " Out of memory\n"));`
			`talloc_free(tmp_ctx);`
			`return;`
			`}`

			`v->rec = rec;`
			`v->srcnode = srcnode;`
			`v->ctdb_db = ctdb_db;`
			`v->recs = talloc_memdup(v, recs, data.dsize);`
			`if (v->recs == NULL) {`
			`DEBUG(DEBUG_CRIT,(__location__ " Out of memory\n"));`
			`talloc_free(v);`
			`talloc_free(tmp_ctx);`
			`return;`
			`}`
			`v->r = (struct ctdb_rec_data *)&v->recs->data[0];`

			`DLIST_ADD(rec->vacuum_info, v);`

			`talloc_set_destructor(v, vacuum_info_destructor);`

			`vacuum_fetch_next(v);`
			`talloc_free(tmp_ctx);`
			`}`


			`/*`
			`called when ctdb_wait_timeout should finish`
			`*/`
			`static void ctdb_wait_handler(struct event_context *ev, struct timed_event *te,`
			`struct timeval yt, void *p)`
			`{`
			`uint32_t *timed_out = (uint32_t *)p;`
			`(*timed_out) = 1;`
			`}`

			`/*`
			`wait for a given number of seconds`
			`*/`
			`static void ctdb_wait_timeout(struct ctdb_context *ctdb, uint32_t secs)`
			`{`
			`uint32_t timed_out = 0;`
			`event_add_timed(ctdb->ev, ctdb, timeval_current_ofs(secs, 0), ctdb_wait_handler, &timed_out);`
			`while (!timed_out) {`
			`event_loop_once(ctdb->ev);`
			`}`
			`}`

			`/*`
			`called when an election times out (ends)`
			`*/`
			`static void ctdb_election_timeout(struct event_context *ev, struct timed_event *te,`
			`struct timeval t, void *p)`
			`{`
			`struct ctdb_recoverd *rec = talloc_get_type(p, struct ctdb_recoverd);`
			`rec->election_timeout = NULL;`

			`DEBUG(DEBUG_WARNING,(__location__ " Election timed out\n"));`
			`}`


			`/*`
			`wait for an election to finish. It finishes election_timeout seconds after`
			`the last election packet is received`
			`*/`
			`static void ctdb_wait_election(struct ctdb_recoverd *rec)`
			`{`
			`struct ctdb_context *ctdb = rec->ctdb;`
			`while (rec->election_timeout) {`
			`event_loop_once(ctdb->ev);`
			`}`
			`}`

			`/*`
			`Update our local flags from all remote connected nodes.`
			`This is only run when we are or we believe we are the recovery master`
			`*/`
			`static int update_local_flags(struct ctdb_recoverd *rec, struct ctdb_node_map *nodemap)`
			`{`
			`int j;`
			`struct ctdb_context *ctdb = rec->ctdb;`
			`TALLOC_CTX *mem_ctx = talloc_new(ctdb);`

			`/* get the nodemap for all active remote nodes and verify`
			`they are the same as for this node`
			`*/`
			`for (j=0; j<nodemap->num; j++) {`
			`struct ctdb_node_map *remote_nodemap=NULL;`
			`int ret;`

			`if (nodemap->nodes[j].flags & NODE_FLAGS_DISCONNECTED) {`
			`continue;`
			`}`
			`if (nodemap->nodes[j].pnn == ctdb->pnn) {`
			`continue;`
			`}`

			`ret = ctdb_ctrl_getnodemap(ctdb, CONTROL_TIMEOUT(), nodemap->nodes[j].pnn,`
			`mem_ctx, &remote_nodemap);`
			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR, (__location__ " Unable to get nodemap from remote node %u\n",`
			`nodemap->nodes[j].pnn));`
			`ctdb_set_culprit(rec, nodemap->nodes[j].pnn);`
			`talloc_free(mem_ctx);`
			`return MONITOR_FAILED;`
			`}`
			`if (nodemap->nodes[j].flags != remote_nodemap->nodes[j].flags) {`
			`/* We should tell our daemon about this so it`
			`updates its flags or else we will log the same`
			`message again in the next iteration of recovery.`
			`Since we are the recovery master we can just as`
			`well update the flags on all nodes.`
			`*/`
			`ret = ctdb_ctrl_modflags(ctdb, CONTROL_TIMEOUT(), nodemap->nodes[j].pnn, nodemap->nodes[j].flags, ~nodemap->nodes[j].flags);`
			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR, (__location__ " Unable to update nodeflags on remote nodes\n"));`
			`/* release the temporary context on this error path too */`
			`talloc_free(mem_ctx);`
			`return -1;`
			`}`

			`/* Update our local copy of the flags in the recovery`
			`daemon.`
			`*/`
			`DEBUG(DEBUG_NOTICE,("Remote node %u had flags 0x%x, local had 0x%x - updating local\n",`
			`nodemap->nodes[j].pnn, remote_nodemap->nodes[j].flags,`
			`nodemap->nodes[j].flags));`
			`nodemap->nodes[j].flags = remote_nodemap->nodes[j].flags;`
			`}`
			`talloc_free(remote_nodemap);`
			`}`
			`talloc_free(mem_ctx);`
			`return MONITOR_OK;`
			`}`


			`/* Create a new random generation id.`
			`The generation id can not be the INVALID_GENERATION id`
			`*/`
			`static uint32_t new_generation(void)`
			`{`
			`uint32_t generation;`

			`while (1) {`
			`generation = random();`

			`if (generation != INVALID_GENERATION) {`
			`break;`
			`}`
			`}`

			`return generation;`
			`}`
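The retry loop in `new_generation()` is a small instance of rejection sampling: keep drawing until the value is not the reserved one. The following is a minimal standalone sketch of that idea, not the daemon's code. `EXAMPLE_INVALID_GENERATION`, `example_new_generation()` and `example_fixed_source()` are hypothetical names introduced here, and the reserved value 1 is an assumption, since the real `INVALID_GENERATION` define is not shown in this section. The random source is passed in as a function pointer so the behaviour can be checked deterministically.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for INVALID_GENERATION; the real value is an assumption. */
#define EXAMPLE_INVALID_GENERATION 1u

/* Draw values from the supplied source until one differs from the reserved id. */
static uint32_t example_new_generation(uint32_t (*next)(void))
{
	uint32_t generation;

	assert(next != NULL);
	do {
		generation = next();
	} while (generation == EXAMPLE_INVALID_GENERATION);

	return generation;
}

/* Deterministic source for demonstration: returns the reserved id on its
   first two calls and 7 on every call after that. */
static uint32_t example_fixed_source(void)
{
	static unsigned calls = 0;

	return (calls++ < 2) ? EXAMPLE_INVALID_GENERATION : 7u;
}
```

Calling `example_new_generation(example_fixed_source)` skips the two reserved draws and returns the first acceptable value. With a real random source the loop almost always finishes on the first iteration.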


			`/*`
			`create a temporary working database`
			`*/`
			`static struct tdb_wrap *create_recdb(struct ctdb_context *ctdb, TALLOC_CTX *mem_ctx)`
			`{`
			`char *name;`
			`struct tdb_wrap *recdb;`
			`unsigned tdb_flags;`

			`/* open up the temporary recovery database */`
			`name = talloc_asprintf(mem_ctx, "%s/recdb.tdb.%u",`
			`ctdb->db_directory_state,`
			`ctdb->pnn);`
			`if (name == NULL) {`
			`return NULL;`
			`}`
			`unlink(name);`

			`tdb_flags = TDB_NOLOCK;`
			`if (ctdb->valgrinding) {`
			`tdb_flags |= TDB_NOMMAP;`
			`}`
			`tdb_flags |= TDB_DISALLOW_NESTING;`

			`recdb = tdb_wrap_open(mem_ctx, name, ctdb->tunable.database_hash_size,`
			`tdb_flags, O_RDWR|O_CREAT|O_EXCL, 0600);`
			`if (recdb == NULL) {`
			`DEBUG(DEBUG_CRIT,(__location__ " Failed to create temp recovery database '%s'\n", name));`
			`}`

			`talloc_free(name);`

			`return recdb;`
			`}`


			`/*`
			`a traverse function for pulling all relevant records from recdb`
			`*/`
			`struct recdb_data {`
			`struct ctdb_context *ctdb;`
			`struct ctdb_marshall_buffer *recdata;`
			`uint32_t len;`
			`bool failed;`
			`bool persistent;`
			`};`

			`static int traverse_recdb(struct tdb_context *tdb, TDB_DATA key, TDB_DATA data, void *p)`
			`{`
			`struct recdb_data *params = (struct recdb_data *)p;`
			`struct ctdb_rec_data *rec;`
			`struct ctdb_ltdb_header *hdr;`

			`/* skip empty records */`
			`if (data.dsize <= sizeof(struct ctdb_ltdb_header)) {`
			`return 0;`
			`}`

			`/* update the dmaster field to point to us */`
			`hdr = (struct ctdb_ltdb_header *)data.dptr;`
recovery: for persistent db's don't set the dmaster to the recmaster node number It is important to keep track of the dmaster (i.e. the node that last committed a transaction containing changes to this node). Michael (This used to be ctdb commit fe68972eb9cf3aa1f16ba1aacf57ade5d66e647c) 2009-11-29 13:17:18 +03:00			`if (!params->persistent) {`
			`hdr->dmaster = params->ctdb->pnn;`
			`}`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00
			`/* add the record to the blob ready to send to the nodes */`
			`rec = ctdb_marshall_record(params->recdata, 0, key, NULL, data);`
catch internal traversal errors (This used to be ctdb commit 8caa85ad71be5d20a8d6f0cb3d52aff6905657a4) 2008-01-07 06:08:25 +03:00			`if (rec == NULL) {`
			`params->failed = true;`
			`return -1;`
			`}`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`params->recdata = talloc_realloc_size(NULL, params->recdata, rec->length + params->len);`
			`if (params->recdata == NULL) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_CRIT,(__location__ " Failed to expand recdata to %u bytes\n",`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`rec->length + params->len));`
catch internal traversal errors (This used to be ctdb commit 8caa85ad71be5d20a8d6f0cb3d52aff6905657a4) 2008-01-07 06:08:25 +03:00			`params->failed = true;`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`return -1;`
			`}`
			`params->recdata->count++;`
			`memcpy(params->len+(uint8_t *)params->recdata, rec, rec->length);`
			`params->len += rec->length;`
			`talloc_free(rec);`

			`return 0;`
			`}`
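
The append step in traverse_recdb() can be illustrated outside of ctdb. The following is a minimal sketch, not the ctdb implementation: it uses plain realloc() instead of talloc, and `struct rec_buffer`, `rec_buffer_new()` and `rec_buffer_append()` are hypothetical names introduced only for this example. It keeps the old pointer until the reallocation has succeeded, so a failed expansion leaves the buffer (and its record count) readable by the caller.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a marshall buffer: a fixed header
   followed by packed, variable-length records. */
struct rec_buffer {
	uint32_t db_id;
	uint32_t count;
	uint8_t data[];
};

/* Allocate an empty buffer; *len receives the size of the header. */
static struct rec_buffer *rec_buffer_new(uint32_t db_id, size_t *len)
{
	struct rec_buffer *buf = calloc(1, sizeof(*buf));

	if (buf == NULL) {
		return NULL;
	}
	buf->db_id = db_id;
	*len = offsetof(struct rec_buffer, data);
	return buf;
}

/* Append one record of rec_len bytes. On failure the original buffer
   is left untouched and -1 is returned, so the caller can still read
   (*bufp)->count when reporting the error. */
static int rec_buffer_append(struct rec_buffer **bufp, size_t *len,
			     const void *rec, size_t rec_len)
{
	struct rec_buffer *grown;

	if (rec_len > SIZE_MAX - *len) {
		return -1;	/* total size would overflow */
	}
	grown = realloc(*bufp, *len + rec_len);
	if (grown == NULL) {
		return -1;	/* *bufp is still valid here */
	}
	memcpy((uint8_t *)grown + *len, rec, rec_len);
	grown->count++;
	*len += rec_len;
	*bufp = grown;
	return 0;
}
```

A caller would start from rec_buffer_new(), call rec_buffer_append() once per record, and treat the first -1 as a reason to abort the traversal, much as traverse_recdb() sets `failed` and returns -1.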

			`/*`
			`push the recdb database out to all nodes`
			`*/`
			`static int push_recdb_database(struct ctdb_context *ctdb, uint32_t dbid,`
recovery: pass the persistent flag to recover_database() and further down to pull_remote_database(), pull_one_remote_database(), and push_recdb_database(). This is in preparation of special handling of persistent databases during recoveries. Michael (This used to be ctdb commit 90abc4ac7c16e854cf6e8f96b60a77bc92e35e07) 2009-11-29 13:14:31 +03:00			`bool persistent,`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`struct tdb_wrap *recdb, struct ctdb_node_map *nodemap)`
			`{`
			`struct recdb_data params;`
rename the structure we use for marshalling multiple records (This used to be ctdb commit 4d205476d286570a6e1f52b59af42858ce051106) 2008-07-30 08:24:56 +04:00			`struct ctdb_marshall_buffer *recdata;`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`TDB_DATA outdata;`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`TALLOC_CTX *tmp_ctx;`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`uint32_t *nodes;`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00
			`tmp_ctx = talloc_new(ctdb);`
			`CTDB_NO_MEMORY(ctdb, tmp_ctx);`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00
rename the structure we use for marshalling multiple records (This used to be ctdb commit 4d205476d286570a6e1f52b59af42858ce051106) 2008-07-30 08:24:56 +04:00			`recdata = talloc_zero(recdb, struct ctdb_marshall_buffer);`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`CTDB_NO_MEMORY(ctdb, recdata);`

			`recdata->db_id = dbid;`

			`params.ctdb = ctdb;`
			`params.recdata = recdata;`
rename the structure we use for marshalling multiple records (This used to be ctdb commit 4d205476d286570a6e1f52b59af42858ce051106) 2008-07-30 08:24:56 +04:00			`params.len = offsetof(struct ctdb_marshall_buffer, data);`
catch internal traversal errors (This used to be ctdb commit 8caa85ad71be5d20a8d6f0cb3d52aff6905657a4) 2008-01-07 06:08:25 +03:00			`params.failed = false;`
recovery: pass the persistent flag to recover_database() and further down to pull_remote_database(), pull_one_remote_database(), and push_recdb_database(). This is in preparation of special handling of persistent databases during recoveries. Michael (This used to be ctdb commit 90abc4ac7c16e854cf6e8f96b60a77bc92e35e07) 2009-11-29 13:14:31 +03:00			`params.persistent = persistent;`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00
			`if (tdb_traverse_read(recdb->tdb, traverse_recdb, &params) == -1) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,(__location__ " Failed to traverse recdb database\n"));`
catch internal traversal errors (This used to be ctdb commit 8caa85ad71be5d20a8d6f0cb3d52aff6905657a4) 2008-01-07 06:08:25 +03:00			`talloc_free(params.recdata);`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`talloc_free(tmp_ctx);`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`return -1;`
			`}`

catch internal traversal errors (This used to be ctdb commit 8caa85ad71be5d20a8d6f0cb3d52aff6905657a4) 2008-01-07 06:08:25 +03:00			`if (params.failed) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,(__location__ " Failed to traverse recdb database\n"));`
catch internal traversal errors (This used to be ctdb commit 8caa85ad71be5d20a8d6f0cb3d52aff6905657a4) 2008-01-07 06:08:25 +03:00			`talloc_free(params.recdata);`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`talloc_free(tmp_ctx);`
catch internal traversal errors (This used to be ctdb commit 8caa85ad71be5d20a8d6f0cb3d52aff6905657a4) 2008-01-07 06:08:25 +03:00			`return -1;`
			`}`

new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`recdata = params.recdata;`

			`outdata.dptr = (void *)recdata;`
			`outdata.dsize = params.len;`

add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`nodes = list_of_active_nodes(ctdb, nodemap, tmp_ctx, true);`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`if (ctdb_client_async_control(ctdb, CTDB_CONTROL_PUSH_DB,`
initial attempt at freezing databases in priority order (This used to be ctdb commit e8d692590da1070c87a4144031e3306d190ebed2) 2009-10-12 05:08:39 +04:00			`nodes, 0,`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`CONTROL_TIMEOUT(), false, outdata,`
			`NULL, NULL,`
			`NULL) != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,(__location__ " Failed to push recdb records to nodes for db 0x%x\n", dbid));`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`talloc_free(recdata);`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`talloc_free(tmp_ctx);`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`return -1;`
			`}`

added debug constants to allow for better mapping to syslog levels (This used to be ctdb commit 7ba8f1dde318eab03f4257e5a89fd23e7281e502) 2008-02-04 09:44:24 +03:00			`DEBUG(DEBUG_NOTICE, (__location__ " Recovery - pushed remote database 0x%x of size %u\n",`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`dbid, recdata->count));`

			`talloc_free(recdata);`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`talloc_free(tmp_ctx);`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00
			`return 0;`
			`}`


			`/*`
			`go through a full recovery on one database`
			`*/`
			`static int recover_database(struct ctdb_recoverd *rec,`
			`TALLOC_CTX *mem_ctx,`
			`uint32_t dbid,`
recovery: pass the persistent flag to recover_database() and further down to pull_remote_database(), pull_one_remote_database(), and push_recdb_database(). This is in preparation of special handling of persistent databases during recoveries. Michael (This used to be ctdb commit 90abc4ac7c16e854cf6e8f96b60a77bc92e35e07) 2009-11-29 13:14:31 +03:00			`bool persistent,`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`uint32_t pnn,`
added paranoid transaction ids (This used to be ctdb commit afc1da53873cdbd31fcc8c6b22fae262e344cf6e) 2008-01-06 05:24:55 +03:00			`struct ctdb_node_map *nodemap,`
			`uint32_t transaction_id)`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`{`
			`struct tdb_wrap *recdb;`
			`int ret;`
			`struct ctdb_context *ctdb = rec->ctdb;`
			`TDB_DATA data;`
added paranoid transaction ids (This used to be ctdb commit afc1da53873cdbd31fcc8c6b22fae262e344cf6e) 2008-01-06 05:24:55 +03:00			`struct ctdb_control_wipe_database w;`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`uint32_t *nodes;`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00
			`recdb = create_recdb(ctdb, mem_ctx);`
			`if (recdb == NULL) {`
			`return -1;`
			`}`

			`/* pull all remote databases onto the recdb */`
recovery: pass the persistent flag to recover_database() and further down to pull_remote_database(), pull_one_remote_database(), and push_recdb_database(). This is in preparation of special handling of persistent databases during recoveries. Michael (This used to be ctdb commit 90abc4ac7c16e854cf6e8f96b60a77bc92e35e07) 2009-11-29 13:14:31 +03:00			`ret = pull_remote_database(ctdb, rec, nodemap, recdb, dbid, persistent);`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to pull remote database 0x%x\n", dbid));`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`return -1;`
			`}`

added debug constants to allow for better mapping to syslog levels (This used to be ctdb commit 7ba8f1dde318eab03f4257e5a89fd23e7281e502) 2008-02-04 09:44:24 +03:00			`DEBUG(DEBUG_NOTICE, (__location__ " Recovery - pulled remote database 0x%x\n", dbid));`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00
			`/* wipe all the remote databases. This is safe as we are in a transaction */`
added paranoid transaction ids (This used to be ctdb commit afc1da53873cdbd31fcc8c6b22fae262e344cf6e) 2008-01-06 05:24:55 +03:00			`w.db_id = dbid;`
			`w.transaction_id = transaction_id;`

			`data.dptr = (void *)&w;`
			`data.dsize = sizeof(w);`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`nodes = list_of_active_nodes(ctdb, nodemap, recdb, true);`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`if (ctdb_client_async_control(ctdb, CTDB_CONTROL_WIPE_DATABASE,`
initial attempt at freezing databases in priority order (This used to be ctdb commit e8d692590da1070c87a4144031e3306d190ebed2) 2009-10-12 05:08:39 +04:00			`nodes, 0,`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`CONTROL_TIMEOUT(), false, data,`
			`NULL, NULL,`
			`NULL) != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to wipe database. Recovery failed.\n"));`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`talloc_free(recdb);`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`return -1;`
			`}`

			`/* push out the correct database. This sets the dmaster and skips`
			`the empty records */`
recovery: pass the persistent flag to recover_database() and further down to pull_remote_database(), pull_one_remote_database(), and push_recdb_database(). This is in preparation of special handling of persistent databases during recoveries. Michael (This used to be ctdb commit 90abc4ac7c16e854cf6e8f96b60a77bc92e35e07) 2009-11-29 13:14:31 +03:00			`ret = push_recdb_database(ctdb, dbid, persistent, recdb, nodemap);`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`if (ret != 0) {`
			`talloc_free(recdb);`
			`return -1;`
			`}`

			`/* all done with this database */`
			`talloc_free(recdb);`

			`return 0;`
			`}`
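
recover_database() runs three steps in a fixed order (pull, wipe, push) and gives up at the first failure. A minimal sketch of that control flow is shown here, assuming a plain calloc()'d scratch area in place of the recdb and using hypothetical names (`struct recovery_step`, `run_recovery_steps()`, `step_ok()`, `step_fail()`) that do not exist in ctdb. Unlike the function above, the sketch releases the scratch area at a single point that every exit path reaches.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical step signature: returns 0 on success, non-zero on failure. */
typedef int (*recovery_step_fn)(void *scratch);

struct recovery_step {
	const char *name;
	recovery_step_fn fn;
};

/* Run the steps in order against a scratch allocation. Stops at the
   first failing step; the scratch area is freed on every exit path.
   Returns 0 if all steps succeeded, -1 otherwise. */
static int run_recovery_steps(const struct recovery_step *steps,
			      size_t num_steps, size_t scratch_size)
{
	void *scratch = calloc(1, scratch_size);
	int ret = 0;
	size_t i;

	if (scratch == NULL) {
		return -1;
	}

	for (i = 0; i < num_steps; i++) {
		if (steps[i].fn(scratch) != 0) {
			fprintf(stderr, "step '%s' failed\n", steps[i].name);
			ret = -1;
			break;
		}
	}

	free(scratch);
	return ret;
}

/* Example steps: step_ok counts its invocations in the scratch area,
   step_fail always reports an error. */
static int step_ok(void *scratch)
{
	(*(int *)scratch)++;
	return 0;
}

static int step_fail(void *scratch)
{
	(void)scratch;
	return -1;
}
```

With the steps listed as pull, wipe, push, a failure in the wipe step means the push step never runs, which matches the ordering the function above relies on.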

when we reload the nodes file, we may need to reload the nodes file inside the recovery daemon as well. (This used to be ctdb commit 82fd2b6b5cd8e988c38fa6b74121a048757bdeef) 2008-10-17 14:18:06 +04:00			`/*`
			`reload the nodes file`
			`*/`
			`static void reload_nodes_file(struct ctdb_context *ctdb)`
			`{`
null out the pointer before we reload the nodes file (This used to be ctdb commit 4b0f32047e8bece0a052bdbe2209afe91b7e8ce3) 2008-10-17 14:38:42 +04:00			`ctdb->nodes = NULL;`
when we reload the nodes file, we may need to reload the nodes file inside the recovery daemon as well. (This used to be ctdb commit 82fd2b6b5cd8e988c38fa6b74121a048757bdeef) 2008-10-17 14:18:06 +04:00			`ctdb_load_nodes_file(ctdb);`
			`}`


- moved cmdline options that are only relevant to ctdbd into ctdbd.c - fixed a valgrind error on failing to send a control - don't mark node dead when already disconnected - moved node list lock code into common code (This used to be ctdb commit bcc0432d0fea7ef223f82ccee81cf35c18144b1b) 2007-06-02 04:03:28 +04:00			`/*`
			`we are the recmaster, and recovery is needed - start a recovery run`
			`*/`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`static int do_recovery(struct ctdb_recoverd *rec,`
store the num_active variable (number of connected/active nodes) inside the rec structure and avoid passing this as an extra parameter to do_recovery() (This used to be ctdb commit 8bb229aa3b4bd41e48d4e4e2e148d8680c8ba436) 2008-02-29 04:55:20 +03:00			`TALLOC_CTX *mem_ctx, uint32_t pnn,`
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`struct ctdb_node_map *nodemap, struct ctdb_vnn_map *vnnmap)`
add a helper function to create all missing remote databases detected during recovery (This used to be ctdb commit 04758c6f7d8f61260be6d2472380cb7904984427) 2007-05-06 04:04:37 +04:00			`{`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`struct ctdb_context *ctdb = rec->ctdb;`
create a helper function to make sure the local node that does recovery has all the databases that exist on any other remote node (This used to be ctdb commit 0f436e3d40fea6e6a146019b0c664e80e81e88b4) 2007-05-06 04:12:42 +04:00			`int i, j, ret;`
add a helper function to create all missing remote databases detected during recovery (This used to be ctdb commit 04758c6f7d8f61260be6d2472380cb7904984427) 2007-05-06 04:04:37 +04:00			`uint32_t generation;`
			`struct ctdb_dbid_map *dbmap;`
added paranoid transaction ids (This used to be ctdb commit afc1da53873cdbd31fcc8c6b22fae262e344cf6e) 2008-01-06 05:24:55 +03:00			`TDB_DATA data;`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`uint32_t *nodes;`
Track how long it takes to take out the recovery lock from both the main dameon and also from the recovery daemon. Log this in "ctdb statistics". Also add a varaible "RecLockLatencyMs" that will log an error everytime it takes longer than this to access the reclock file. (This used to be ctdb commit 042377ed803bb8f7ca9d6ea1a387427b7b8ba45a) 2009-05-14 04:33:25 +04:00			`struct timeval start_time;`
- moved cmdline options that are only relevant to ctdbd into ctdbd.c - fixed a valgrind error on failing to send a control - don't mark node dead when already disconnected - moved node list lock code into common code (This used to be ctdb commit bcc0432d0fea7ef223f82ccee81cf35c18144b1b) 2007-06-02 04:03:28 +04:00
added debug constants to allow for better mapping to syslog levels (This used to be ctdb commit 7ba8f1dde318eab03f4257e5a89fd23e7281e502) 2008-02-04 09:44:24 +03:00			`DEBUG(DEBUG_NOTICE, (__location__ " Starting do_recovery\n"));`
added some debug lines to help track down a problem (This used to be ctdb commit 2ca31e9de179f76e392a26cc8305e2473357c760) 2007-10-18 10:27:36 +04:00
- merge from ronnie - add a flag to check that recovery completed correctly. If not, re-trigger it in monitoring (This used to be ctdb commit d5ed941d9bab4af30d8b5f9b77bdf43d9218d69b) 2007-09-14 03:49:12 +04:00			`/* if recovery fails, force it again */`
			`rec->need_recovery = true;`

new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`for (i=0; i<ctdb->num_nodes; i++) {`
			`struct ctdb_banning_state *ban_state;`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`if (ctdb->nodes[i]->ban_state == NULL) {`
			`continue;`
			`}`
			`ban_state = (struct ctdb_banning_state *)ctdb->nodes[i]->ban_state;`
			`if (ban_state->count < 2*ctdb->num_nodes) {`
			`continue;`
			`}`
			`DEBUG(DEBUG_NOTICE,("Node %u has caused %u recoveries recently - banning it for %u seconds\n",`
			`ctdb->nodes[i]->pnn, ban_state->count,`
			`ctdb->tunable.recovery_ban_period));`
			`ctdb_ban_node(rec, ctdb->nodes[i]->pnn, ctdb->tunable.recovery_ban_period);`
			`ban_state->count = 0;`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`}`

Dont access the reclock file at all if VerifyRecoveryLock is zero and also make sure the reclock file is closed if the variable is cleared at runtime (This used to be ctdb commit a25f4888689a0725971606163d87c39a41669292) 2009-06-25 05:41:18 +04:00
			`if (ctdb->tunable.verify_recovery_lock != 0) {`
			`DEBUG(DEBUG_ERR,("Taking out recovery lock from recovery daemon\n"));`
			`start_time = timeval_current();`
			`if (!ctdb_recovery_lock(ctdb, true)) {`
			`ctdb_set_culprit(rec, pnn);`
			`DEBUG(DEBUG_ERR,("Unable to get recovery lock - aborting recovery\n"));`
			`return -1;`
			`}`
			`ctdb_ctrl_report_recd_lock_latency(ctdb, CONTROL_TIMEOUT(), timeval_elapsed(&start_time));`
			`DEBUG(DEBUG_ERR,("Recovery lock taken successfully by recovery daemon\n"));`
- moved cmdline options that are only relevant to ctdbd into ctdbd.c - fixed a valgrind error on failing to send a control - don't mark node dead when already disconnected - moved node list lock code into common code (This used to be ctdb commit bcc0432d0fea7ef223f82ccee81cf35c18144b1b) 2007-06-02 04:03:28 +04:00			`}`
add a helper function to create all missing remote databases detected during recovery (This used to be ctdb commit 04758c6f7d8f61260be6d2472380cb7904984427) 2007-05-06 04:04:37 +04:00
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`DEBUG(DEBUG_NOTICE, (__location__ " Recovery initiated due to problem with node %u\n", rec->last_culprit_node));`
- startup frozen, and do an initial recovery - fixed a bug in traverse - get a lock on the node list file in the recmaster recovery daemon (This used to be ctdb commit 162a5647535ad1cb3e8e5d4042a2784365fb1913) 2007-05-23 08:35:19 +04:00
add a helper function to create all missing remote databases detected during recovery (This used to be ctdb commit 04758c6f7d8f61260be6d2472380cb7904984427) 2007-05-06 04:04:37 +04:00			`/* get a list of all databases */`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`ret = ctdb_ctrl_getdbmap(ctdb, CONTROL_TIMEOUT(), pnn, mem_ctx, &dbmap);`
add a helper function to create all missing remote databases detected during recovery (This used to be ctdb commit 04758c6f7d8f61260be6d2472380cb7904984427) 2007-05-06 04:04:37 +04:00			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to get dbids from node :%u\n", pnn));`
add a helper function to create all missing remote databases detected during recovery (This used to be ctdb commit 04758c6f7d8f61260be6d2472380cb7904984427) 2007-05-06 04:04:37 +04:00			`return -1;`
			`}`

new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`/* we do the db creation before we set the recovery mode, so the freeze happens`
			`on all databases we will be dealing with. */`
add a helper function to create all missing remote databases detected during recovery (This used to be ctdb commit 04758c6f7d8f61260be6d2472380cb7904984427) 2007-05-06 04:04:37 +04:00
create a helper function to make sure the local node that does recovery has all the databases that exist on any other remote node (This used to be ctdb commit 0f436e3d40fea6e6a146019b0c664e80e81e88b4) 2007-05-06 04:12:42 +04:00			`/* verify that we have all the databases any other node has */`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`ret = create_missing_local_databases(ctdb, nodemap, pnn, &dbmap, mem_ctx);`
create a helper function to make sure the local node that does recovery has all the databases that exist on any other remote node (This used to be ctdb commit 0f436e3d40fea6e6a146019b0c664e80e81e88b4) 2007-05-06 04:12:42 +04:00			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to create missing local databases\n"));`
create a helper function to make sure the local node that does recovery has all the databases that exist on any other remote node (This used to be ctdb commit 0f436e3d40fea6e6a146019b0c664e80e81e88b4) 2007-05-06 04:12:42 +04:00			`return -1;`
recovery daemon this program is a client to the local ctdb daemon every second it pulls all vnnmap and nodemaps from all nodes that are available and checks if a recovery is required a recovery is required if : * all nodes do NOT have an identical vnnmap and generation * all nodes do NOT have an identical nodemap * there are active nodes that are NOT in the nodemap * there are nodes in the nodemap that are NOT active During recovery, the recovery tool will also make sure that all nodes know about and have created all databases. (This used to be ctdb commit 2f2650467bac7e8954de7c17cb34f46b0bdbcd26) 2007-05-04 09:21:40 +04:00			`}`

			`/* verify that all other nodes have all our databases */`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`ret = create_missing_remote_databases(ctdb, nodemap, pnn, dbmap, mem_ctx);`
add a helper function to create all missing remote databases detected during recovery (This used to be ctdb commit 04758c6f7d8f61260be6d2472380cb7904984427) 2007-05-06 04:04:37 +04:00			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to create missing remote databases\n"));`
add a helper function to create all missing remote databases detected during recovery (This used to be ctdb commit 04758c6f7d8f61260be6d2472380cb7904984427) 2007-05-06 04:04:37 +04:00			`return -1;`
recovery daemon this program is a client to the local ctdb daemon every second it pulls all vnnmap and nodemaps from all nodes that are available and checks if a recovery is required a recovery is required if : * all nodes do NOT have an identical vnnmap and generation * all nodes do NOT have an identical nodemap * there are active nodes that are NOT in the nodemap * there are nodes in the nodemap that are NOT active During recovery, the recovery tool will also make sure that all nodes know about and have created all databases. (This used to be ctdb commit 2f2650467bac7e8954de7c17cb34f46b0bdbcd26) 2007-05-04 09:21:40 +04:00			`}`
added debug constants to allow for better mapping to syslog levels (This used to be ctdb commit 7ba8f1dde318eab03f4257e5a89fd23e7281e502) 2008-02-04 09:44:24 +03:00			`DEBUG(DEBUG_NOTICE, (__location__ " Recovery - created remote databases\n"));`
- merged ctdb_store test from ronnie - added DatabaseHashSize tunable - added logging of events inside recovery (for timing) (This used to be ctdb commit 3593cdb928b91e217faf1b3c537fa28dc82cdace) 2007-06-17 17:31:44 +04:00
during recovery, update all remote nodes so they use the same priorities for the databases as this node. (This used to be ctdb commit 465dc95fef0ff6651ff49fa94e4cf2ebd1036ac4) 2009-10-10 09:28:20 +04:00			`/* update the database priority for all remote databases */`
			`ret = update_db_priority_on_remote_nodes(ctdb, nodemap, pnn, dbmap, mem_ctx);`
			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR, (__location__ " Unable to set db priority on remote nodes\n"));`
			`}`
			`DEBUG(DEBUG_NOTICE, (__location__ " Recovery - updated db priority for all databases\n"));`

			`/* set recovery mode to active on all nodes */`
			`ret = set_recovery_mode(ctdb, rec, nodemap, CTDB_RECOVERY_ACTIVE);`
			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR, (__location__ " Unable to set recovery mode to active on cluster\n"));`
			`return -1;`
			`}`

			`/* execute the "startrecovery" event script on all nodes */`
			`ret = run_startrecovery_eventscript(rec, nodemap);`
			`if (ret!=0) {`
			`DEBUG(DEBUG_ERR, (__location__ " Unable to run the 'startrecovery' event on cluster\n"));`
			`return -1;`
			`}`

			`/*`
			`update all nodes to have the same flags that we have`
			`*/`
			`for (i=0;i<nodemap->num;i++) {`
			`if (nodemap->nodes[i].flags & NODE_FLAGS_DISCONNECTED) {`
			`continue;`
			`}`

			`ret = update_flags_on_all_nodes(ctdb, nodemap, i, nodemap->nodes[i].flags);`
			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR, (__location__ " Unable to update flags on all nodes for node %d\n", i));`
			`return -1;`
			`}`
			`}`

			`DEBUG(DEBUG_NOTICE, (__location__ " Recovery - updated flags\n"));`

			`/* pick a new generation number */`
			`generation = new_generation();`

			`/* change the vnnmap on this node to use the new generation`
			`number but not on any other nodes.`
			`this guarantees that if we abort the recovery prematurely`
			`for some reason (a node stops responding?)`
			`that we can just return immediately and we will reenter`
			`recovery shortly again.`
			`I.e. we deliberately leave the cluster with an inconsistent`
			`generation id to allow us to abort recovery at any stage and`
			`just restart it from scratch.`
			`*/`
			`vnnmap->generation = generation;`
			`ret = ctdb_ctrl_setvnnmap(ctdb, CONTROL_TIMEOUT(), pnn, mem_ctx, vnnmap);`
			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR, (__location__ " Unable to set vnnmap for node %u\n", pnn));`
			`return -1;`
			`}`

			`data.dptr = (void *)&generation;`
			`data.dsize = sizeof(uint32_t);`

			`nodes = list_of_active_nodes(ctdb, nodemap, mem_ctx, true);`
			`if (ctdb_client_async_control(ctdb, CTDB_CONTROL_TRANSACTION_START,`
			`nodes, 0,`
			`CONTROL_TIMEOUT(), false, data,`
			`NULL,`
			`transaction_start_fail_callback,`
			`rec) != 0) {`
			`DEBUG(DEBUG_ERR, (__location__ " Unable to start transactions. Recovery failed.\n"));`
			`if (ctdb_client_async_control(ctdb, CTDB_CONTROL_TRANSACTION_CANCEL,`
			`nodes, 0,`
			`CONTROL_TIMEOUT(), false, tdb_null,`
			`NULL,`
			`NULL,`
			`NULL) != 0) {`
			`DEBUG(DEBUG_ERR,("Failed to cancel recovery transaction\n"));`
			`}`
			`return -1;`
			`}`

			`DEBUG(DEBUG_NOTICE,(__location__ " started transactions on all nodes\n"));`

			`for (i=0;i<dbmap->num;i++) {`
			`ret = recover_database(rec, mem_ctx,`
			`dbmap->dbs[i].dbid,`
			`dbmap->dbs[i].persistent,`
			`pnn, nodemap, generation);`
			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR, (__location__ " Failed to recover database 0x%x\n", dbmap->dbs[i].dbid));`
			`return -1;`
			`}`
			`}`

			`DEBUG(DEBUG_NOTICE, (__location__ " Recovery - starting database commits\n"));`

			`/* commit all the changes */`
			`if (ctdb_client_async_control(ctdb, CTDB_CONTROL_TRANSACTION_COMMIT,`
			`nodes, 0,`
			`CONTROL_TIMEOUT(), false, data,`
			`NULL, NULL,`
			`NULL) != 0) {`
			`DEBUG(DEBUG_ERR, (__location__ " Unable to commit recovery changes. Recovery failed.\n"));`
			`return -1;`
			`}`

			`DEBUG(DEBUG_NOTICE, (__location__ " Recovery - committed databases\n"));`

			`/* update the capabilities for all nodes */`
			`ret = update_capabilities(ctdb, nodemap);`
			`if (ret!=0) {`
			`DEBUG(DEBUG_ERR, (__location__ " Unable to update node capabilities.\n"));`
			`return -1;`
			`}`

			`/* build a new vnn map with all the currently active and`
			`unbanned nodes */`
			`generation = new_generation();`
			`vnnmap = talloc(mem_ctx, struct ctdb_vnn_map);`
			`CTDB_NO_MEMORY(ctdb, vnnmap);`
			`vnnmap->generation = generation;`
			`vnnmap->size = 0;`
			`vnnmap->map = talloc_zero_array(vnnmap, uint32_t, vnnmap->size);`
			`CTDB_NO_MEMORY(ctdb, vnnmap->map);`
			`for (i=j=0;i<nodemap->num;i++) {`
			`if (nodemap->nodes[i].flags & NODE_FLAGS_INACTIVE) {`
			`continue;`
			`}`
			`if (!(ctdb->nodes[i]->capabilities & CTDB_CAP_LMASTER)) {`
			`/* this node can not be an lmaster */`
			`DEBUG(DEBUG_DEBUG, ("Node %d cant be a LMASTER, skipping it\n", i));`
			`continue;`
			`}`

			`vnnmap->size++;`
			`vnnmap->map = talloc_realloc(vnnmap, vnnmap->map, uint32_t, vnnmap->size);`
			`CTDB_NO_MEMORY(ctdb, vnnmap->map);`
			`vnnmap->map[j++] = nodemap->nodes[i].pnn;`

			`}`
			`if (vnnmap->size == 0) {`
			`DEBUG(DEBUG_NOTICE, ("No suitable lmasters found. Adding local node (recmaster) anyway.\n"));`
			`vnnmap->size++;`
			`vnnmap->map = talloc_realloc(vnnmap, vnnmap->map, uint32_t, vnnmap->size);`
			`CTDB_NO_MEMORY(ctdb, vnnmap->map);`
			`vnnmap->map[0] = pnn;`
			`}`

			`/* update to the new vnnmap on all nodes */`
			`ret = update_vnnmap_on_all_nodes(ctdb, nodemap, pnn, vnnmap, mem_ctx);`
			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR, (__location__ " Unable to update vnnmap on all nodes\n"));`
			`return -1;`
			`}`

			`DEBUG(DEBUG_NOTICE, (__location__ " Recovery - updated vnnmap\n"));`

			`/* update recmaster to point to us for all nodes */`
			`ret = set_recovery_master(ctdb, nodemap, pnn);`
			`if (ret!=0) {`
			`DEBUG(DEBUG_ERR, (__location__ " Unable to set recovery master\n"));`
			`return -1;`
			`}`

			`DEBUG(DEBUG_NOTICE, (__location__ " Recovery - updated recmaster\n"));`

			`/*`
			`update all nodes to have the same flags that we have`
			`*/`
			`for (i=0;i<nodemap->num;i++) {`
			`if (nodemap->nodes[i].flags & NODE_FLAGS_DISCONNECTED) {`
			`continue;`
			`}`

			`ret = update_flags_on_all_nodes(ctdb, nodemap, i, nodemap->nodes[i].flags);`
			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR, (__location__ " Unable to update flags on all nodes for node %d\n", i));`
			`return -1;`
			`}`
			`}`

			`DEBUG(DEBUG_NOTICE, (__location__ " Recovery - updated flags\n"));`

			`/* disable recovery mode */`
			`ret = set_recovery_mode(ctdb, rec, nodemap, CTDB_RECOVERY_NORMAL);`
			`if (ret != 0) {`
Fix the chicken and egg problem with ctdb/samba and a registry smb.conf This attempts to fix the problem of ctdb event scripts blocking due to attempted access to the ctdb databases during recovery. The changes are: - now only the 'shutdown' and 'startrecovery' events can be called with the databases locked in recovery. The event scripts must ensure that for these two events no database access is attempted - the recovered, takeip and releaseip events could previously be called inside a recovery. The code now ensures that this doesn't happen, delaying the events till after recovery has finished - the 50.samba event script now avoids using testparm unless it is really needed This needs extensive testing. (This used to be ctdb commit e3cdb8f2be6a44ec877efcd75c7297edb008a80b) 2008-05-14 14:57:04 +04:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to set recovery mode to normal on cluster\n"));`
			`return -1;`
			`}`

			`DEBUG(DEBUG_NOTICE, (__location__ " Recovery - disabled recovery mode\n"));`

added IP takeover logic for public IPs to ctdb (This used to be ctdb commit 374adb729472670f35cef41269b8719f49c0de0e) 2007-05-25 11:04:13 +04:00			`/*`
fix a bug where the public ip addresses of the cluster would not be redistributed across the cluster after a recovery was performed. Remove a bogus check inside the recovery daemon that ONLY redistributed public addresses IFF the local node had/served public addresses. This was a valid optimization long ago when we enforced that all nodes must use the same public addresses file but is invalid today where we can have different public addresses configs on all nodes and even have some nodes that do NOT use public addresses at all. (This used to be ctdb commit 5833e6b99d9afaf35dc8354df8676b9115418b23) 2008-05-09 07:41:31 +04:00			`tell nodes to takeover their public IPs`
added IP takeover logic for public IPs to ctdb (This used to be ctdb commit 374adb729472670f35cef41269b8719f49c0de0e) 2007-05-25 11:04:13 +04:00			`*/`
fix a bug where the public ip addresses of the cluster would not be redistributed across the cluster after a recovery was performed. Remove a bogus check inside the recovery daemon that ONLY redistributed public addresses IFF the local node had/served public addresses. This was a valid optimization long ago when we enforced that all nodes must use the same public addresses file but is invalid today where we can have different public addresses configs on all nodes and even have some nodes that do NOT use public addresses at all. (This used to be ctdb commit 5833e6b99d9afaf35dc8354df8676b9115418b23) 2008-05-09 07:41:31 +04:00			`rec->need_takeover_run = false;`
			`ret = ctdb_takeover_run(ctdb, nodemap);`
			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR, (__location__ " Unable to setup public takeover addresses\n"));`
			`return -1;`
added IP takeover logic for public IPs to ctdb (This used to be ctdb commit 374adb729472670f35cef41269b8719f49c0de0e) 2007-05-25 11:04:13 +04:00			`}`
read the current debuglevel in each loop in the recovery daemon so that we pick up when they change in the parent daemon (This used to be ctdb commit 792d5471ff0c2947b6e66183925860de27f30eaf) 2008-02-18 11:38:04 +03:00			`DEBUG(DEBUG_NOTICE, (__location__ " Recovery - takeip finished\n"));`

merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`/* execute the "recovered" event script on all nodes */`
Update some debug statements. Dont say that recovery failed if the failed function was invoked from outside of recovery (This used to be ctdb commit 3038d0b74895b51af4f85f2f304508ed16d245f4) 2008-05-15 06:28:52 +04:00			`ret = run_recovered_eventscript(ctdb, nodemap, "do_recovery");`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`if (ret!=0) {`
Update some debug statements. Dont say that recovery failed if the failed function was invoked from outside of recovery (This used to be ctdb commit 3038d0b74895b51af4f85f2f304508ed16d245f4) 2008-05-15 06:28:52 +04:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to run the 'recovered' event on cluster. Recovery process failed.\n"));`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`return -1;`
			`}`

read the current debuglevel in each loop in the recovery daemon so that we pick up when they change in the parent daemon (This used to be ctdb commit 792d5471ff0c2947b6e66183925860de27f30eaf) 2008-02-18 11:38:04 +03:00			`DEBUG(DEBUG_NOTICE, (__location__ " Recovery - finished the recovered event\n"));`

send a message to clients when an IP has been released (This used to be ctdb commit 8b7ab0b00253462593d368052c2cb10a385b4e63) 2007-05-25 18:05:30 +04:00			`/* send a message to all clients telling them that the cluster`
			`has been reconfigured */`
change the structure used for node flag change messages so that we can see both the old flags as well as the new flags (so we can tell which flags changed) send the CTDB_SRVID_RECONFIGURE messages to connected nodes only, not to every node, connected or not, in the cluster. in the handler inside the recovery daemon which is invoked for node flag change messages, only do a takeover_run() and redistribute the ip addresses IF it was the disabled or the unhealthy flags that changed. Also send out the cluster reconfigured message in this case. If any of the other flags changed we dont need to do the takeover_run(0 here since that will be done during recovery. (This used to be ctdb commit 5549b2058e2c148a8ca9d419123acf3247bb8829) 2007-08-21 11:25:15 +04:00			`ctdb_send_message(ctdb, CTDB_BROADCAST_CONNECTED, CTDB_SRVID_RECONFIGURE, tdb_null);`
recovery daemon this program is a client to the local ctdb daemon every second it pulls all vnnmap and nodemaps from all nodes that are available and checks if a recovery is required a recovery is required if : * all nodes do NOT have an identical vnnmap and generation * all nodes do NOT have an identical nodemap * there are active nodes that are NOT in the nodemap * there are nodes in the nodemap that are NOT active During recovery, the recovery tool will also make sure that all nodes know about and have created all databases. (This used to be ctdb commit 2f2650467bac7e8954de7c17cb34f46b0bdbcd26) 2007-05-04 09:21:40 +04:00
added debug constants to allow for better mapping to syslog levels (This used to be ctdb commit 7ba8f1dde318eab03f4257e5a89fd23e7281e502) 2008-02-04 09:44:24 +03:00			`DEBUG(DEBUG_NOTICE, (__location__ " Recovery complete\n"));`
add a tuneable to control how long we wait after a successful recovery before we alow another recovery to be initiated (This used to be ctdb commit f3b43519423b7a73e6a2dd986bdf11203b8653cf) 2007-07-04 02:36:59 +04:00
- merge from ronnie - add a flag to check that recovery completed correctly. If not, re-trigger it in monitoring (This used to be ctdb commit d5ed941d9bab4af30d8b5f9b77bdf43d9218d69b) 2007-09-14 03:49:12 +04:00			`rec->need_recovery = false;`

with the new banning logic with one struct for each node we no longer "forget" the other culprits as often as we used to do, which means that things like "ctdb recover" can now actually lead to a node becomming banned if we perform too many recoveries too frequently. change this to provide absolution to all nodes once they have participated in a recovery session. (This used to be ctdb commit f66d17fb2e81a35d5adb3754e1cc902f76b4590a) 2009-09-25 07:14:53 +04:00			`/* we managed to complete a full recovery, make sure to forgive`
			`any past sins by the nodes that could now participate in the`
			`recovery.`
			`*/`
			`DEBUG(DEBUG_ERR,("Resetting ban count to 0 for all nodes\n"));`
			`for (i=0;i<nodemap->num;i++) {`
			`struct ctdb_banning_state *ban_state;`

			`if (nodemap->nodes[i].flags & NODE_FLAGS_DISCONNECTED) {`
			`continue;`
			`}`

			`ban_state = (struct ctdb_banning_state *)ctdb->nodes[nodemap->nodes[i].pnn]->ban_state;`
			`if (ban_state == NULL) {`
			`continue;`
			`}`

			`ban_state->count = 0;`
			`}`


add a tuneable to control how long we wait after a successful recovery before we alow another recovery to be initiated (This used to be ctdb commit f3b43519423b7a73e6a2dd986bdf11203b8653cf) 2007-07-04 02:36:59 +04:00			`/* We just finished a recovery successfully.`
			`We now wait for rerecovery_timeout before we allow`
			`another recovery to take place.`
			`*/`
added debug constants to allow for better mapping to syslog levels (This used to be ctdb commit 7ba8f1dde318eab03f4257e5a89fd23e7281e502) 2008-02-04 09:44:24 +03:00			`DEBUG(DEBUG_NOTICE, (__location__ " New recoveries suppressed for the rerecovery timeout\n"));`
add a tuneable to control how long we wait after a successful recovery before we alow another recovery to be initiated (This used to be ctdb commit f3b43519423b7a73e6a2dd986bdf11203b8653cf) 2007-07-04 02:36:59 +04:00			`ctdb_wait_timeout(ctdb, ctdb->tunable.rerecovery_timeout);`
added debug constants to allow for better mapping to syslog levels (This used to be ctdb commit 7ba8f1dde318eab03f4257e5a89fd23e7281e502) 2008-02-04 09:44:24 +03:00			`DEBUG(DEBUG_NOTICE, (__location__ " Rerecovery timeout elapsed. Recovery reactivated.\n"));`
add a tuneable to control how long we wait after a successful recovery before we alow another recovery to be initiated (This used to be ctdb commit f3b43519423b7a73e6a2dd986bdf11203b8653cf) 2007-07-04 02:36:59 +04:00
recovery daemon this program is a client to the local ctdb daemon every second it pulls all vnnmap and nodemaps from all nodes that are available and checks if a recovery is required a recovery is required if : * all nodes do NOT have an identical vnnmap and generation * all nodes do NOT have an identical nodemap * there are active nodes that are NOT in the nodemap * there are nodes in the nodemap that are NOT active During recovery, the recovery tool will also make sure that all nodes know about and have created all databases. (This used to be ctdb commit 2f2650467bac7e8954de7c17cb34f46b0bdbcd26) 2007-05-04 09:21:40 +04:00			`return 0;`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`}`
start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00
add a test in the function that checks whether the cluster needs recovery or not that all active nodes are in normal mode. If we discover that some node is still in recoverymode it may indicate that a previous recovery ended prematurely and thus we should start a new recovery (This used to be ctdb commit c15517872e6c98c8c425a8d47d2b348ecb0620b0) 2007-05-06 22:41:12 +04:00
choose the most connected node first (This used to be ctdb commit c7c17a79fa4f28509e34b6f635fa62517dc458c2) 2007-06-07 13:17:27 +04:00			`/*`
			`elections are won by first checking the number of connected nodes, then`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`the priority time, then the pnn`
choose the most connected node first (This used to be ctdb commit c7c17a79fa4f28509e34b6f635fa62517dc458c2) 2007-06-07 13:17:27 +04:00			`*/`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`struct election_message {`
choose the most connected node first (This used to be ctdb commit c7c17a79fa4f28509e34b6f635fa62517dc458c2) 2007-06-07 13:17:27 +04:00			`uint32_t num_connected;`
use a priority time for the election data, not just the vnn (This used to be ctdb commit a691f9c5cd77194005f0d98483da94b07a48d57d) 2007-06-07 12:37:27 +04:00			`struct timeval priority_time;`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`uint32_t pnn;`
- catch ESTALE in the recovery lock by trying a read() - priortise nodes that are unbanned and healthy in the election (This used to be ctdb commit 929feb475dfdf7283f0e99b50b179e1c91d3a39f) 2007-10-05 07:28:21 +04:00			`uint32_t node_flags;`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`};`

choose the most connected node first (This used to be ctdb commit c7c17a79fa4f28509e34b6f635fa62517dc458c2) 2007-06-07 13:17:27 +04:00			`/*`
			`form this nodes election data`
			`*/`
			`static void ctdb_election_data(struct ctdb_recoverd *rec, struct election_message *em)`
			`{`
			`int ret, i;`
			`struct ctdb_node_map *nodemap;`
			`struct ctdb_context *ctdb = rec->ctdb;`

			`ZERO_STRUCTP(em);`

change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`em->pnn = rec->ctdb->pnn;`
choose the most connected node first (This used to be ctdb commit c7c17a79fa4f28509e34b6f635fa62517dc458c2) 2007-06-07 13:17:27 +04:00			`em->priority_time = rec->priority_time;`

			`ret = ctdb_ctrl_getnodemap(ctdb, CONTROL_TIMEOUT(), CTDB_CURRENT_NODE, rec, &nodemap);`
			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,(__location__ " unable to get election data\n"));`
choose the most connected node first (This used to be ctdb commit c7c17a79fa4f28509e34b6f635fa62517dc458c2) 2007-06-07 13:17:27 +04:00			`return;`
			`}`

When we create new election data to send during elections, we must re-read the node flags from the main daemon to catch when the STOPPED flag is changed. (This used to be ctdb commit ca4982c40d81db528fe915d5ecc01fcf7df0b522) 2009-07-17 05:37:03 +04:00			`rec->node_flags = nodemap->nodes[ctdb->pnn].flags;`
			`em->node_flags = rec->node_flags;`

choose the most connected node first (This used to be ctdb commit c7c17a79fa4f28509e34b6f635fa62517dc458c2) 2007-06-07 13:17:27 +04:00			`for (i=0;i<nodemap->num;i++) {`
			`if (!(nodemap->nodes[i].flags & NODE_FLAGS_DISCONNECTED)) {`
			`em->num_connected++;`
			`}`
			`}`
make sure we lose all elections for recmaster role if we do not have the recmaster capability. (unless there are no other node at all available with this capability) (This used to be ctdb commit 8556e9dc897c6b9b9be0b52f391effb1f72fbd80) 2008-05-06 07:56:56 +04:00
			`/* we shouldnt try to win this election if we cant be a recmaster */`
			`if ((ctdb->capabilities & CTDB_CAP_RECMASTER) == 0) {`
			`em->num_connected = 0;`
			`em->priority_time = timeval_current();`
			`}`

choose the most connected node first (This used to be ctdb commit c7c17a79fa4f28509e34b6f635fa62517dc458c2) 2007-06-07 13:17:27 +04:00			`talloc_free(nodemap);`
			`}`

			`/*`
			`see if the given election data wins`
			`*/`
			`static bool ctdb_election_win(struct ctdb_recoverd *rec, struct election_message *em)`
			`{`
			`struct election_message myem;`
- catch ESTALE in the recovery lock by trying a read() - priortise nodes that are unbanned and healthy in the election (This used to be ctdb commit 929feb475dfdf7283f0e99b50b179e1c91d3a39f) 2007-10-05 07:28:21 +04:00			`int cmp = 0;`
choose the most connected node first (This used to be ctdb commit c7c17a79fa4f28509e34b6f635fa62517dc458c2) 2007-06-07 13:17:27 +04:00
			`ctdb_election_data(rec, &myem);`

make sure we lose all elections for recmaster role if we do not have the recmaster capability. (unless there are no other node at all available with this capability) (This used to be ctdb commit 8556e9dc897c6b9b9be0b52f391effb1f72fbd80) 2008-05-06 07:56:56 +04:00			`/* we cant win if we dont have the recmaster capability */`
			`if ((rec->ctdb->capabilities & CTDB_CAP_RECMASTER) == 0) {`
			`return false;`
			`}`

simplify election handling make sure we read and update the flags from all remote nodes before we reach the first codepath that can call do_recovery() since during do_recovery() we need to know what the flags are. (This used to be ctdb commit e85f3806483ea420559d449e0e4d81bec996740f) 2007-10-11 00:16:36 +04:00			`/* we cant win if we are banned */`
			`if (rec->node_flags & NODE_FLAGS_BANNED) {`
merge from ronnie (This used to be ctdb commit d18712caba11855010be52f90bac656683076676) 2007-10-15 08:17:49 +04:00			`return false;`
simplify election handling make sure we read and update the flags from all remote nodes before we reach the first codepath that can call do_recovery() since during do_recovery() we need to know what the flags are. (This used to be ctdb commit e85f3806483ea420559d449e0e4d81bec996740f) 2007-10-11 00:16:36 +04:00			`}`
- catch ESTALE in the recovery lock by trying a read() - priortise nodes that are unbanned and healthy in the election (This used to be ctdb commit 929feb475dfdf7283f0e99b50b179e1c91d3a39f) 2007-10-05 07:28:21 +04:00
stopped nodes can not win a recmaster election stopped nodes must yield the recmaster role (This used to be ctdb commit b75ac1185481060ab71bd743e1e48d333d716eba) 2009-07-09 08:44:03 +04:00			`/* we cant win if we are stopped */`
			`if (rec->node_flags & NODE_FLAGS_STOPPED) {`
			`return false;`
			`}`

simplify election handling make sure we read and update the flags from all remote nodes before we reach the first codepath that can call do_recovery() since during do_recovery() we need to know what the flags are. (This used to be ctdb commit e85f3806483ea420559d449e0e4d81bec996740f) 2007-10-11 00:16:36 +04:00			`/* we will automatically win if the other node is banned */`
			`if (em->node_flags & NODE_FLAGS_BANNED) {`
merge from ronnie (This used to be ctdb commit d18712caba11855010be52f90bac656683076676) 2007-10-15 08:17:49 +04:00			`return true;`
- catch ESTALE in the recovery lock by trying a read() - priortise nodes that are unbanned and healthy in the election (This used to be ctdb commit 929feb475dfdf7283f0e99b50b179e1c91d3a39f) 2007-10-05 07:28:21 +04:00			`}`

stopped nodes can not win a recmaster election stopped nodes must yield the recmaster role (This used to be ctdb commit b75ac1185481060ab71bd743e1e48d333d716eba) 2009-07-09 08:44:03 +04:00			`/* we will automatically win if the other node is stopped */`
			`if (em->node_flags & NODE_FLAGS_STOPPED) {`
			`return true;`
			`}`

choose the most connected node first (This used to be ctdb commit c7c17a79fa4f28509e34b6f635fa62517dc458c2) 2007-06-07 13:17:27 +04:00			`/* try to use the most connected node */`
- catch ESTALE in the recovery lock by trying a read() - priortise nodes that are unbanned and healthy in the election (This used to be ctdb commit 929feb475dfdf7283f0e99b50b179e1c91d3a39f) 2007-10-05 07:28:21 +04:00			`if (cmp == 0) {`
			`cmp = (int)myem.num_connected - (int)em->num_connected;`
			`}`
choose the most connected node first (This used to be ctdb commit c7c17a79fa4f28509e34b6f635fa62517dc458c2) 2007-06-07 13:17:27 +04:00
			`/* then the longest running node */`
			`if (cmp == 0) {`
later times are a lower priority, not a higher priority (This used to be ctdb commit e96424e7d366df29767c4eeaccdcc0cc975cb8ae) 2007-06-07 13:21:55 +04:00			`cmp = timeval_compare(&em->priority_time, &myem.priority_time);`
choose the most connected node first (This used to be ctdb commit c7c17a79fa4f28509e34b6f635fa62517dc458c2) 2007-06-07 13:17:27 +04:00			`}`

			`if (cmp == 0) {`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`cmp = (int)myem.pnn - (int)em->pnn;`
choose the most connected node first (This used to be ctdb commit c7c17a79fa4f28509e34b6f635fa62517dc458c2) 2007-06-07 13:17:27 +04:00			`}`

			`return cmp > 0;`
			`}`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00
			`/*`
			`send out an election request`
			`*/`
if a new node enters the cluster, that node will already be frozen at start but the rest of the nodes are not frozen. at this stage an election is called by the new node. Since in this case the nodes are not froze, we can not modify the recmaster of the nodes so it is expected that this control would fail. Add a boolean to send_election_request() to make it not try to set the recmaster locally for the case where we are in an election phase while not frozen. (This used to be ctdb commit c5035657606283d2e35bea40992505e84ca8e7be) 2008-07-18 06:07:25 +04:00			`static int send_election_request(struct ctdb_recoverd *rec, uint32_t pnn, bool update_recmaster)`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`{`
			`int ret;`
			`TDB_DATA election_data;`
			`struct election_message emsg;`
			`uint64_t srvid;`
use a priority time for the election data, not just the vnn (This used to be ctdb commit a691f9c5cd77194005f0d98483da94b07a48d57d) 2007-06-07 12:37:27 +04:00			`struct ctdb_context *ctdb = rec->ctdb;`
simplify election handling make sure we read and update the flags from all remote nodes before we reach the first codepath that can call do_recovery() since during do_recovery() we need to know what the flags are. (This used to be ctdb commit e85f3806483ea420559d449e0e4d81bec996740f) 2007-10-11 00:16:36 +04:00
added health monitoring logic to ctdb, so a node loses its public IP address if one of the sybsystem event scripts reports a problem (This used to be ctdb commit c7a089256d86cec21097453bce5acbccee87413f) 2007-06-06 04:25:46 +04:00			`srvid = CTDB_SRVID_RECOVERY;`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00
choose the most connected node first (This used to be ctdb commit c7c17a79fa4f28509e34b6f635fa62517dc458c2) 2007-06-07 13:17:27 +04:00			`ctdb_election_data(rec, &emsg);`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00
			`election_data.dsize = sizeof(struct election_message);`
			`election_data.dptr = (unsigned char *)&emsg;`


			`/* send an election message to all active nodes */`
When we create new election data to send during elections, we must re-read the node flags from the main daemon to catch when the STOPPED flag is changed. (This used to be ctdb commit ca4982c40d81db528fe915d5ecc01fcf7df0b522) 2009-07-17 05:37:03 +04:00			`DEBUG(DEBUG_INFO,(__location__ " Send election request to all active nodes\n"));`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`ctdb_send_message(ctdb, CTDB_BROADCAST_ALL, srvid, election_data);`

if a new node enters the cluster, that node will already be frozen at start but the rest of the nodes are not frozen. at this stage an election is called by the new node. Since in this case the nodes are not froze, we can not modify the recmaster of the nodes so it is expected that this control would fail. Add a boolean to send_election_request() to make it not try to set the recmaster locally for the case where we are in an election phase while not frozen. (This used to be ctdb commit c5035657606283d2e35bea40992505e84ca8e7be) 2008-07-18 06:07:25 +04:00
			`/* A new node that is already frozen has entered the cluster.`
			`The existing nodes are not frozen and dont need to be frozen`
			`until the election has ended and we start the actual recovery`
			`*/`
			`if (update_recmaster == true) {`
			`/* first we assume we will win the election and set`
			`recoverymaster to be ourself on the current node`
			`*/`
			`ret = ctdb_ctrl_setrecmaster(ctdb, CONTROL_TIMEOUT(), pnn, pnn);`
			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR, (__location__ " failed to send recmaster election request\n"));`
			`return -1;`
			`}`
			`}`


recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`return 0;`
			`}`

unban all nodes when we release recmaster role or when we win an election (This used to be ctdb commit 48fb7483b3fe391e2d0b78718af29f69a641525e) 2007-06-09 14:11:51 +04:00			`/*`
			`this function will unban all nodes in the cluster`
			`*/`
			`static void unban_all_nodes(struct ctdb_context *ctdb)`
			`{`
			`int ret, i;`
			`struct ctdb_node_map *nodemap;`
			`TALLOC_CTX *tmp_ctx = talloc_new(ctdb);`

			`ret = ctdb_ctrl_getnodemap(ctdb, CONTROL_TIMEOUT(), CTDB_CURRENT_NODE, tmp_ctx, &nodemap);`
			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,(__location__ " failed to get nodemap to unban all nodes\n"));`
unban all nodes when we release recmaster role or when we win an election (This used to be ctdb commit 48fb7483b3fe391e2d0b78718af29f69a641525e) 2007-06-09 14:11:51 +04:00			`return;`
			`}`

			`for (i=0;i<nodemap->num;i++) {`
			`if ( (!(nodemap->nodes[i].flags & NODE_FLAGS_DISCONNECTED))`
			`&& (nodemap->nodes[i].flags & NODE_FLAGS_BANNED) ) {`
change how we do public addresses and takeover so that we can have multiple public addresses spread across multiple interfaces on each node. this is a massive patch since we have previously made the assumtion that we only have one public address per node. get rid of the public_interface argument. the public addresses file now explicitely lists which interface the address belongs to (This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8) 2007-09-04 03:50:07 +04:00			`ctdb_ctrl_modflags(ctdb, CONTROL_TIMEOUT(), nodemap->nodes[i].pnn, 0, NODE_FLAGS_BANNED);`
unban all nodes when we release recmaster role or when we win an election (This used to be ctdb commit 48fb7483b3fe391e2d0b78718af29f69a641525e) 2007-06-09 14:11:51 +04:00			`}`
			`}`

			`talloc_free(tmp_ctx);`
			`}`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00
make election handling much more scalable (This used to be ctdb commit 05938d462b92bd9ecb8e35f53651bded47c48675) 2007-11-13 02:27:44 +03:00
			`/*`
			`we think we are winning the election - send a broadcast election request`
			`*/`
			`static void election_send_request(struct event_context *ev, struct timed_event *te, struct timeval t, void *p)`
			`{`
			`struct ctdb_recoverd *rec = talloc_get_type(p, struct ctdb_recoverd);`
			`int ret;`

if a new node enters the cluster, that node will already be frozen at start but the rest of the nodes are not frozen. at this stage an election is called by the new node. Since in this case the nodes are not froze, we can not modify the recmaster of the nodes so it is expected that this control would fail. Add a boolean to send_election_request() to make it not try to set the recmaster locally for the case where we are in an election phase while not frozen. (This used to be ctdb commit c5035657606283d2e35bea40992505e84ca8e7be) 2008-07-18 06:07:25 +04:00			`ret = send_election_request(rec, ctdb_get_pnn(rec->ctdb), false);`
make election handling much more scalable (This used to be ctdb commit 05938d462b92bd9ecb8e35f53651bded47c48675) 2007-11-13 02:27:44 +03:00			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,("Failed to send election request!\n"));`
make election handling much more scalable (This used to be ctdb commit 05938d462b92bd9ecb8e35f53651bded47c48675) 2007-11-13 02:27:44 +03:00			`}`

			`talloc_free(rec->send_election_te);`
			`rec->send_election_te = NULL;`
			`}`

add improvements to tracking memory usage in ctdbd adn the recovery daemon and a ctdb command to pull the talloc memory map from a recovery daemon ctdb rddumpmemory (This used to be ctdb commit d23950be7406cf288f48b660c0f57a9b8d7bdd05) 2008-04-01 08:34:54 +04:00			`/*`
			`handler for memory dumps`
			`*/`
			`static void mem_dump_handler(struct ctdb_context *ctdb, uint64_t srvid,`
			`TDB_DATA data, void *private_data)`
			`{`
			`TALLOC_CTX *tmp_ctx = talloc_new(ctdb);`
			`TDB_DATA *dump;`
			`int ret;`
			`struct rd_memdump_reply *rd;`

			`if (data.dsize != sizeof(struct rd_memdump_reply)) {`
			`DEBUG(DEBUG_ERR, (__location__ " Wrong size of return address.\n"));`
fix a slow memory leak in the recovery daemon in the error paths for the memdump function (This used to be ctdb commit 5e641ef9d6cca286061138a9680dcf2495736e8b) 2008-09-16 03:00:48 +04:00			`talloc_free(tmp_ctx);`
add improvements to tracking memory usage in ctdbd adn the recovery daemon and a ctdb command to pull the talloc memory map from a recovery daemon ctdb rddumpmemory (This used to be ctdb commit d23950be7406cf288f48b660c0f57a9b8d7bdd05) 2008-04-01 08:34:54 +04:00			`return;`
			`}`
			`rd = (struct rd_memdump_reply *)data.dptr;`

			`dump = talloc_zero(tmp_ctx, TDB_DATA);`
			`if (dump == NULL) {`
			`DEBUG(DEBUG_ERR, (__location__ " Failed to allocate memory for memdump\n"));`
			`talloc_free(tmp_ctx);`
			`return;`
			`}`
			`ret = ctdb_dump_memory(ctdb, dump);`
			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR, (__location__ " ctdb_dump_memory() failed\n"));`
			`talloc_free(tmp_ctx);`
			`return;`
			`}`

			`DEBUG(DEBUG_ERR, ("recovery master memory dump\n"));`

			`ret = ctdb_send_message(ctdb, rd->pnn, rd->srvid, *dump);`
			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR,("Failed to send rd memdump reply message\n"));`
fix a slow memory leak in the recovery daemon in the error paths for the memdump function (This used to be ctdb commit 5e641ef9d6cca286061138a9680dcf2495736e8b) 2008-09-16 03:00:48 +04:00			`talloc_free(tmp_ctx);`
add improvements to tracking memory usage in ctdbd adn the recovery daemon and a ctdb command to pull the talloc memory map from a recovery daemon ctdb rddumpmemory (This used to be ctdb commit d23950be7406cf288f48b660c0f57a9b8d7bdd05) 2008-04-01 08:34:54 +04:00			`return;`
			`}`

			`talloc_free(tmp_ctx);`
			`}`

add a new node state : DELETED. This is used to mark nodes as being DELETED internally in ctdb so that nodes are not renumbered if / when they are removed from the nodes file. This is used to be able to do "ctdb reloadnodes" at runtime without causing nodes to be renumbered. To do this, instead of deleting a node from the nodes file, just comment it out like 1.0.0.1 #1.0.0.2 1.0.0.3 After removing 1.0.0.2 from the cluster, the remaining nodes retain their pnn's from prior to the deletion, namely 0 and 2 Any line in the nodes file that is commented out represents a DELETED pnn (This used to be ctdb commit 6a5e4fd7fa391206b463bb4e976502f3ac5bd343) 2009-06-01 08:18:34 +04:00			`/*`
			`handler for reload_nodes`
			`*/`
			`static void reload_nodes_handler(struct ctdb_context *ctdb, uint64_t srvid,`
			`TDB_DATA data, void *private_data)`
			`{`
			`struct ctdb_recoverd *rec = talloc_get_type(private_data, struct ctdb_recoverd);`

			`DEBUG(DEBUG_ERR, (__location__ " Reload nodes file from recovery daemon\n"));`

			`reload_nodes_file(rec->ctdb);`
			`}`

add a new message to ask the recovery daemon to temporarily disable checking ip address consistency. This is useful when we are moving addresses using moveip in the cluster since otherwise if we collide with the recovery daemons own check we could cause a recovery (This used to be ctdb commit 9c63858c0b22c81eaccb9865a414af0bbb2833d4) 2009-10-06 05:11:32 +04:00
			`static void reenable_ip_check(struct event_context *ev, struct timed_event *te,`
			`struct timeval yt, void *p)`
			`{`
			`struct ctdb_recoverd *rec = talloc_get_type(p, struct ctdb_recoverd);`

			`talloc_free(rec->ip_check_disable_ctx);`
			`rec->ip_check_disable_ctx = NULL;`
			`}`

			`static void disable_ip_check_handler(struct ctdb_context *ctdb, uint64_t srvid,`
			`TDB_DATA data, void *private_data)`
			`{`
			`struct ctdb_recoverd *rec = talloc_get_type(private_data, struct ctdb_recoverd);`
			`uint32_t timeout;`

			`if (rec->ip_check_disable_ctx != NULL) {`
			`talloc_free(rec->ip_check_disable_ctx);`
			`rec->ip_check_disable_ctx = NULL;`
			`}`

			`if (data.dsize != sizeof(uint32_t)) {`
From Volker L Fix some warnings and an incorrect check for a talloc failure (This used to be ctdb commit 27296a47b3d057a6729287acf128b2b67775ecde) 2009-10-22 05:19:40 +04:00			`DEBUG(DEBUG_ERR,(__location__ " Wrong size for data :%lu "`
			`"expecting %lu\n", (long unsigned)data.dsize,`
			`(long unsigned)sizeof(uint32_t)));`
add a new message to ask the recovery daemon to temporarily disable checking ip address consistency. This is useful when we are moving addresses using moveip in the cluster since otherwise if we collide with the recovery daemons own check we could cause a recovery (This used to be ctdb commit 9c63858c0b22c81eaccb9865a414af0bbb2833d4) 2009-10-06 05:11:32 +04:00			`return;`
			`}`
			`if (data.dptr == NULL) {`
			`DEBUG(DEBUG_ERR,(__location__ " No data received\n"));`
			`return;`
			`}`

			`timeout = *((uint32_t *)data.dptr);`
			`DEBUG(DEBUG_NOTICE,("Disabling ip check for %u seconds\n", timeout));`

			`rec->ip_check_disable_ctx = talloc_new(rec);`
			`CTDB_NO_MEMORY_VOID(ctdb, rec->ip_check_disable_ctx);`

			`event_add_timed(ctdb->ev, rec->ip_check_disable_ctx, timeval_current_ofs(timeout, 0), reenable_ip_check, rec);`
			`}`


add a new command "ctdb ipreallocate", this command will force the recovery master to perform a full ip reallocation process. the ctdb command will block until the ip reallocation has comleted (This used to be ctdb commit abad7b97fe0c066b33f6e75d0953bbed892a3216) 2009-07-02 07:00:26 +04:00			`/*`
			`handler for ip reallocate, just add it to the list of callers and`
			`handle this later in the monitor_cluster loop so we do not recurse`
			`with other callers to takeover_run()`
			`*/`
			`static void ip_reallocate_handler(struct ctdb_context *ctdb, uint64_t srvid,`
			`TDB_DATA data, void *private_data)`
			`{`
			`struct ctdb_recoverd *rec = talloc_get_type(private_data, struct ctdb_recoverd);`
			`struct ip_reallocate_list *caller;`

			`if (data.dsize != sizeof(struct rd_memdump_reply)) {`
			`DEBUG(DEBUG_ERR, (__location__ " Wrong size of return address.\n"));`
			`return;`
			`}`

			`if (rec->ip_reallocate_ctx == NULL) {`
			`rec->ip_reallocate_ctx = talloc_new(rec);`
From Volker L Fix some warnings and an incorrect check for a talloc failure (This used to be ctdb commit 27296a47b3d057a6729287acf128b2b67775ecde) 2009-10-22 05:19:40 +04:00			`CTDB_NO_MEMORY_FATAL(ctdb, rec->ip_reallocate_ctx);`
add a new command "ctdb ipreallocate", this command will force the recovery master to perform a full ip reallocation process. the ctdb command will block until the ip reallocation has comleted (This used to be ctdb commit abad7b97fe0c066b33f6e75d0953bbed892a3216) 2009-07-02 07:00:26 +04:00			`}`

			`caller = talloc(rec->ip_reallocate_ctx, struct ip_reallocate_list);`
			`CTDB_NO_MEMORY_FATAL(ctdb, caller);`

			`caller->rd = (struct rd_memdump_reply *)talloc_steal(caller, data.dptr);`
			`caller->next = rec->reallocate_callers;`
			`rec->reallocate_callers = caller;`

			`return;`
			`}`

			`static void process_ipreallocate_requests(struct ctdb_context *ctdb, struct ctdb_recoverd *rec)`
			`{`
			`TALLOC_CTX *tmp_ctx = talloc_new(ctdb);`
			`TDB_DATA result;`
			`int32_t ret;`
			`struct ip_reallocate_list *callers;`

			`DEBUG(DEBUG_INFO, ("recovery master forced ip reallocation\n"));`
			`ret = ctdb_takeover_run(ctdb, rec->nodemap);`
			`result.dsize = sizeof(int32_t);`
			`result.dptr = (uint8_t *)&ret;`

			`for (callers=rec->reallocate_callers; callers; callers=callers->next) {`
when we detect a ip-allocation mismatch, just force a new ip reassignment instead of a full blown recovery (This used to be ctdb commit 4f50aa8bb8be544058523f2f544109a26c2b3b51) 2009-12-01 08:06:59 +03:00
			`/* Someone that sent srvid==0 does not want a reply */`
			`if (callers->rd->srvid == 0) {`
			`continue;`
			`}`
From Volker L Fix some warnings and an incorrect check for a talloc failure (This used to be ctdb commit 27296a47b3d057a6729287acf128b2b67775ecde) 2009-10-22 05:19:40 +04:00			`DEBUG(DEBUG_INFO,("Sending ip reallocate reply message to "`
server: print out the full 64-bit srvid on 32-bit hosts metze (This used to be ctdb commit 440e870d61267054b24404bcb69e599226353949) 2009-10-09 17:50:59 +04:00			`"%u:%llu\n", (unsigned)callers->rd->pnn,`
			`(unsigned long long)callers->rd->srvid));`
add a new command "ctdb ipreallocate", this command will force the recovery master to perform a full ip reallocation process. the ctdb command will block until the ip reallocation has comleted (This used to be ctdb commit abad7b97fe0c066b33f6e75d0953bbed892a3216) 2009-07-02 07:00:26 +04:00			`ret = ctdb_send_message(ctdb, callers->rd->pnn, callers->rd->srvid, result);`
			`if (ret != 0) {`
From Volker L Fix some warnings and an incorrect check for a talloc failure (This used to be ctdb commit 27296a47b3d057a6729287acf128b2b67775ecde) 2009-10-22 05:19:40 +04:00			`DEBUG(DEBUG_ERR,("Failed to send ip reallocate reply "`
server: print out the full 64-bit srvid on 32-bit hosts metze (This used to be ctdb commit 440e870d61267054b24404bcb69e599226353949) 2009-10-09 17:50:59 +04:00			`"message to %u:%llu\n",`
From Volker L Fix some warnings and an incorrect check for a talloc failure (This used to be ctdb commit 27296a47b3d057a6729287acf128b2b67775ecde) 2009-10-22 05:19:40 +04:00			`(unsigned)callers->rd->pnn,`
server: print out the full 64-bit srvid on 32-bit hosts metze (This used to be ctdb commit 440e870d61267054b24404bcb69e599226353949) 2009-10-09 17:50:59 +04:00			`(unsigned long long)callers->rd->srvid));`
add a new command "ctdb ipreallocate", this command will force the recovery master to perform a full ip reallocation process. the ctdb command will block until the ip reallocation has comleted (This used to be ctdb commit abad7b97fe0c066b33f6e75d0953bbed892a3216) 2009-07-02 07:00:26 +04:00			`}`
			`}`

			`talloc_free(tmp_ctx);`
			`talloc_free(rec->ip_reallocate_ctx);`
			`rec->ip_reallocate_ctx = NULL;`
			`rec->reallocate_callers = NULL;`

			`}`
add a new node state : DELETED. This is used to mark nodes as being DELETED internally in ctdb so that nodes are not renumbered if / when they are removed from the nodes file. This is used to be able to do "ctdb reloadnodes" at runtime without causing nodes to be renumbered. To do this, instead of deleting a node from the nodes file, just comment it out like 1.0.0.1 #1.0.0.2 1.0.0.3 After removing 1.0.0.2 from the cluster, the remaining nodes retain their pnn's from prior to the deletion, namely 0 and 2 Any line in the nodes file that is commented out represents a DELETED pnn (This used to be ctdb commit 6a5e4fd7fa391206b463bb4e976502f3ac5bd343) 2009-06-01 08:18:34 +04:00

recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`/*`
			`handler for recovery master elections`
			`*/`
			`static void election_handler(struct ctdb_context *ctdb, uint64_t srvid,`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`TDB_DATA data, void *private_data)`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`{`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`struct ctdb_recoverd *rec = talloc_get_type(private_data, struct ctdb_recoverd);`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`int ret;`
			`struct election_message *em = (struct election_message *)data.dptr;`
			`TALLOC_CTX *mem_ctx;`

make election handling much more scalable (This used to be ctdb commit 05938d462b92bd9ecb8e35f53651bded47c48675) 2007-11-13 02:27:44 +03:00			`/* we got an election packet - update the timeout for the election */`
			`talloc_free(rec->election_timeout);`
			`rec->election_timeout = event_add_timed(ctdb->ev, ctdb,`
			`timeval_current_ofs(ctdb->tunable.election_timeout, 0),`
			`ctdb_election_timeout, rec);`

setup the random number generator a bit better (This used to be ctdb commit 708585eb0ed31b0df6543a1d7a20b82e751877c2) 2007-05-10 07:10:23 +04:00			`mem_ctx = talloc_new(ctdb);`
choose the most connected node first (This used to be ctdb commit c7c17a79fa4f28509e34b6f635fa62517dc458c2) 2007-06-07 13:17:27 +04:00
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`/* someone called an election. check their election data`
			`and if we disagree and we would rather be the elected node,`
			`send a new election message to all other nodes`
			`*/`
choose the most connected node first (This used to be ctdb commit c7c17a79fa4f28509e34b6f635fa62517dc458c2) 2007-06-07 13:17:27 +04:00			`if (ctdb_election_win(rec, em)) {`
make election handling much more scalable (This used to be ctdb commit 05938d462b92bd9ecb8e35f53651bded47c48675) 2007-11-13 02:27:44 +03:00			`if (!rec->send_election_te) {`
			`rec->send_election_te = event_add_timed(ctdb->ev, rec,`
			`timeval_current_ofs(0, 500000),`
			`election_send_request, rec);`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`}`
			`talloc_free(mem_ctx);`
should be sufficient to unban nodes when we unbecome recmaster (This used to be ctdb commit 8a6c4e675b4b877a9d0a7a3701973573ff0b71e8) 2007-06-09 14:13:25 +04:00			`/*unban_all_nodes(ctdb);*/`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`return;`
			`}`
make election handling much more scalable (This used to be ctdb commit 05938d462b92bd9ecb8e35f53651bded47c48675) 2007-11-13 02:27:44 +03:00
			`/* we didn't win */`
			`talloc_free(rec->send_election_te);`
			`rec->send_election_te = NULL;`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00
Dont access the reclock file at all if VerifyRecoveryLock is zero and also make sure the reclock file is closed if the variable is cleared at runtime (This used to be ctdb commit a25f4888689a0725971606163d87c39a41669292) 2009-06-25 05:41:18 +04:00			`if (ctdb->tunable.verify_recovery_lock != 0) {`
			`/* release the recmaster lock */`
			`if (em->pnn != ctdb->pnn &&`
			`ctdb->recovery_lock_fd != -1) {`
			`close(ctdb->recovery_lock_fd);`
			`ctdb->recovery_lock_fd = -1;`
			`unban_all_nodes(ctdb);`
			`}`
- startup frozen, and do an initial recovery - fixed a bug in traverse - get a lock on the node list file in the recmaster recovery daemon (This used to be ctdb commit 162a5647535ad1cb3e8e5d4042a2784365fb1913) 2007-05-23 08:35:19 +04:00			`}`

recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`/* ok, let that guy become recmaster then */`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`ret = ctdb_ctrl_setrecmaster(ctdb, CONTROL_TIMEOUT(), ctdb_get_pnn(ctdb), em->pnn);`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " failed to send recmaster election request\n"));`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`talloc_free(mem_ctx);`
			`return;`
			`}`

			`talloc_free(mem_ctx);`
			`return;`
			`}`


implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`/*`
			`force the start of the election process`
			`*/`
add a new tunable : reclockpingperiod once every such interval : * the recovery master on each node will uppdate the "connected" count in the reclock count file (ctdb getreclock) * if the node thinks it is a recovery master but it detects another node that is DISCONNECTED but which still holds a lock to the reclock count file this may mean that we have a split cluster. if that other node that is DISCONNECTED but still holds the lock on hte reclock pnn count file, is MORE connected than the local node, yield the recmaster role and let the other half of the lcuster take over this add a second, last chance mechanism to detect split clusters. IF the cluster is split but GPFS is not yet split, this mechanism makes the largest half of the cluster become the active half. (This used to be ctdb commit 07af425f444531942cce8abff112c1524228d287) 2008-03-03 01:19:30 +03:00			`static void force_election(struct ctdb_recoverd *rec, uint32_t pnn,`
use a priority time for the election data, not just the vnn (This used to be ctdb commit a691f9c5cd77194005f0d98483da94b07a48d57d) 2007-06-07 12:37:27 +04:00			`struct ctdb_node_map *nodemap)`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`{`
			`int ret;`
use a priority time for the election data, not just the vnn (This used to be ctdb commit a691f9c5cd77194005f0d98483da94b07a48d57d) 2007-06-07 12:37:27 +04:00			`struct ctdb_context *ctdb = rec->ctdb;`
when starting a new election, also force all nodes into recovery mode so there is no internode traffic to interfere with our election (This used to be ctdb commit ccfb67a076c72a0e7f2b6dc5fce9c19f652ba2ad) 2007-05-10 03:48:14 +04:00
When we create new election data to send during elections, we must re-read the node flags from the main daemon to catch when the STOPPED flag is changed. (This used to be ctdb commit ca4982c40d81db528fe915d5ecc01fcf7df0b522) 2009-07-17 05:37:03 +04:00			`DEBUG(DEBUG_INFO,(__location__ " Force an election\n"));`

when starting a new election, also force all nodes into recovery mode so there is no internode traffic to interfere with our election (This used to be ctdb commit ccfb67a076c72a0e7f2b6dc5fce9c19f652ba2ad) 2007-05-10 03:48:14 +04:00			`/* set all nodes to recovery mode to stop all internode traffic */`
if a node fails to become frozen during recovery, mark it up with as a culprit so it will soon get banned (This used to be ctdb commit f72d33ac73ebb1af802bacdfb30279df3cd8b8f9) 2009-10-08 09:45:25 +04:00			`ret = set_recovery_mode(ctdb, rec, nodemap, CTDB_RECOVERY_ACTIVE);`
in the destructor for the lock-wait child, make sure that we cancel any pending transactions. (This used to be ctdb commit 45b6ff64f6ddf037b810c4e5f8b9f04d71067b98) 2008-07-07 02:50:12 +04:00			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to set recovery mode to active on cluster\n"));`
when starting a new election, also force all nodes into recovery mode so there is no internode traffic to interfere with our election (This used to be ctdb commit ccfb67a076c72a0e7f2b6dc5fce9c19f652ba2ad) 2007-05-10 03:48:14 +04:00			`return;`
			`}`
make election handling much more scalable (This used to be ctdb commit 05938d462b92bd9ecb8e35f53651bded47c48675) 2007-11-13 02:27:44 +03:00
			`talloc_free(rec->election_timeout);`
			`rec->election_timeout = event_add_timed(ctdb->ev, ctdb,`
			`timeval_current_ofs(ctdb->tunable.election_timeout, 0),`
			`ctdb_election_timeout, rec);`

if a new node enters the cluster, that node will already be frozen at start but the rest of the nodes are not frozen. at this stage an election is called by the new node. Since in this case the nodes are not froze, we can not modify the recmaster of the nodes so it is expected that this control would fail. Add a boolean to send_election_request() to make it not try to set the recmaster locally for the case where we are in an election phase while not frozen. (This used to be ctdb commit c5035657606283d2e35bea40992505e84ca8e7be) 2008-07-18 06:07:25 +04:00			`ret = send_election_request(rec, pnn, true);`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`if (ret!=0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " failed to initiate recmaster election\n"));`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`return;`
			`}`

moved system specific ip code to system.c (This used to be ctdb commit 9de9e4ccda9665108baac12a8716b189d26340b1) 2007-05-26 08:01:08 +04:00			`/* wait for a few seconds to collect all responses */`
make election handling much more scalable (This used to be ctdb commit 05938d462b92bd9ecb8e35f53651bded47c48675) 2007-11-13 02:27:44 +03:00			`ctdb_wait_election(rec);`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`}`



			`/*`
			`handler for when a node changes its flags`
			`*/`
			`static void monitor_handler(struct ctdb_context *ctdb, uint64_t srvid,`
			`TDB_DATA data, void *private_data)`
			`{`
			`int ret;`
			`struct ctdb_node_flag_change *c = (struct ctdb_node_flag_change *)data.dptr;`
			`struct ctdb_node_map *nodemap=NULL;`
			`TALLOC_CTX *tmp_ctx;`
change the structure used for node flag change messages so that we can see both the old flags as well as the new flags (so we can tell which flags changed) send the CTDB_SRVID_RECONFIGURE messages to connected nodes only, not to every node, connected or not, in the cluster. in the handler inside the recovery daemon which is invoked for node flag change messages, only do a takeover_run() and redistribute the ip addresses IF it was the disabled or the unhealthy flags that changed. Also send out the cluster reconfigured message in this case. If any of the other flags changed we dont need to do the takeover_run(0 here since that will be done during recovery. (This used to be ctdb commit 5549b2058e2c148a8ca9d419123acf3247bb8829) 2007-08-21 11:25:15 +04:00			`uint32_t changed_flags;`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`int i;`
prevent recursion in the calling of ctdb_takeover_run (This used to be ctdb commit 0fbdeb7c91b965d9bc5ecc7b24e31070378d8f1d) 2007-09-13 08:08:18 +04:00			`struct ctdb_recoverd *rec = talloc_get_type(private_data, struct ctdb_recoverd);`
verify the DISABLED flag and compare with the previous flag we have registered for that node and not what the node says is the difference. this prevents a situation where the remove node may cause spurious ip reallocations. (This used to be ctdb commit dd122351efaeef5475cdec111eb900110d83ec35) 2009-10-10 06:55:11 +04:00			`int disabled_flag_changed;`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00
			`if (data.dsize != sizeof(*c)) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,(__location__ " Invalid data in ctdb_node_flag_change\n"));`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`return;`
			`}`

			`tmp_ctx = talloc_new(ctdb);`
			`CTDB_NO_MEMORY_VOID(ctdb, tmp_ctx);`

			`ret = ctdb_ctrl_getnodemap(ctdb, CONTROL_TIMEOUT(), CTDB_CURRENT_NODE, tmp_ctx, &nodemap);`
fixed segv on failed ctdb_ctrl_getnodemap (This used to be ctdb commit 5daf9a72f0e60a9af7cf32ae6d759be7d94857ec) 2007-12-27 02:07:01 +03:00			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,(__location__ " ctdb_ctrl_getnodemap failed in monitor_handler\n"));`
fixed segv on failed ctdb_ctrl_getnodemap (This used to be ctdb commit 5daf9a72f0e60a9af7cf32ae6d759be7d94857ec) 2007-12-27 02:07:01 +03:00			`talloc_free(tmp_ctx);`
			`return;`
			`}`

implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00
			`for (i=0;i<nodemap->num;i++) {`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`if (nodemap->nodes[i].pnn == c->pnn) break;`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`}`

			`if (i == nodemap->num) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_CRIT,(__location__ " Flag change for non-existent node %u\n", c->pnn));`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`talloc_free(tmp_ctx);`
			`return;`
			`}`

change the structure used for node flag change messages so that we can see both the old flags as well as the new flags (so we can tell which flags changed) send the CTDB_SRVID_RECONFIGURE messages to connected nodes only, not to every node, connected or not, in the cluster. in the handler inside the recovery daemon which is invoked for node flag change messages, only do a takeover_run() and redistribute the ip addresses IF it was the disabled or the unhealthy flags that changed. Also send out the cluster reconfigured message in this case. If any of the other flags changed we dont need to do the takeover_run(0 here since that will be done during recovery. (This used to be ctdb commit 5549b2058e2c148a8ca9d419123acf3247bb8829) 2007-08-21 11:25:15 +04:00			`changed_flags = c->old_flags ^ c->new_flags;`

			`if (nodemap->nodes[i].flags != c->new_flags) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_NOTICE,("Node %u has changed flags - now 0x%x was 0x%x\n", c->pnn, c->new_flags, c->old_flags));`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`}`

verify the DISABLED flag and compare with the previous flag we have registered for that node and not what the node says is the difference. this prevents a situation where the remove node may cause spurious ip reallocations. (This used to be ctdb commit dd122351efaeef5475cdec111eb900110d83ec35) 2009-10-10 06:55:11 +04:00			`disabled_flag_changed = (nodemap->nodes[i].flags ^ c->new_flags) & NODE_FLAGS_DISABLED;`

change the structure used for node flag change messages so that we can see both the old flags as well as the new flags (so we can tell which flags changed) send the CTDB_SRVID_RECONFIGURE messages to connected nodes only, not to every node, connected or not, in the cluster. in the handler inside the recovery daemon which is invoked for node flag change messages, only do a takeover_run() and redistribute the ip addresses IF it was the disabled or the unhealthy flags that changed. Also send out the cluster reconfigured message in this case. If any of the other flags changed we dont need to do the takeover_run(0 here since that will be done during recovery. (This used to be ctdb commit 5549b2058e2c148a8ca9d419123acf3247bb8829) 2007-08-21 11:25:15 +04:00			`nodemap->nodes[i].flags = c->new_flags;`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`ret = ctdb_ctrl_getrecmaster(ctdb, tmp_ctx, CONTROL_TIMEOUT(),`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`CTDB_CURRENT_NODE, &ctdb->recovery_master);`

			`if (ret == 0) {`
hang the ctdb_req_control structure off the ctdb_client_control_state struct so that if we timeout a control we can print debug info such as what opcode failed and to which node we dont need the *status parameter to ctdb_client_control_state create async versions of the getrecmaster control pass a memory context to getrecmaster (This used to be ctdb commit 558b680c82f830fba82c283c78c2de8a0b150b75) 2007-08-23 07:00:10 +04:00			`ret = ctdb_ctrl_getrecmode(ctdb, tmp_ctx, CONTROL_TIMEOUT(),`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`CTDB_CURRENT_NODE, &ctdb->recovery_mode);`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`}`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00
			`if (ret == 0 &&`
change ctdb->vnn to ctdb->pnn (This used to be ctdb commit 8c776e5707e503ec6586aae39ac6b3ea5a2fd2bc) 2007-09-04 04:06:36 +04:00			`ctdb->recovery_master == ctdb->pnn &&`
remove some unnessecary tests if ->vnn is null or not (This used to be ctdb commit f0169ac8166a19d65ce254496e21d095aed87c2f) 2008-05-15 07:28:19 +04:00			`ctdb->recovery_mode == CTDB_RECOVERY_NORMAL) {`
change the structure used for node flag change messages so that we can see both the old flags as well as the new flags (so we can tell which flags changed) send the CTDB_SRVID_RECONFIGURE messages to connected nodes only, not to every node, connected or not, in the cluster. in the handler inside the recovery daemon which is invoked for node flag change messages, only do a takeover_run() and redistribute the ip addresses IF it was the disabled or the unhealthy flags that changed. Also send out the cluster reconfigured message in this case. If any of the other flags changed we dont need to do the takeover_run(0 here since that will be done during recovery. (This used to be ctdb commit 5549b2058e2c148a8ca9d419123acf3247bb8829) 2007-08-21 11:25:15 +04:00			`/* Only do the takeover run if the perm disabled or unhealthy`
			`flags changed since these will cause an ip failover but not`
			`a recovery.`
			`If the node became disconnected or banned this will also`
			`lead to an ip address failover but that is handled`
			`during recovery`
			`*/`
verify the DISABLED flag and compare with the previous flag we have registered for that node and not what the node says is the difference. this prevents a situation where the remove node may cause spurious ip reallocations. (This used to be ctdb commit dd122351efaeef5475cdec111eb900110d83ec35) 2009-10-10 06:55:11 +04:00			`if (disabled_flag_changed) {`
prevent recursion in the calling of ctdb_takeover_run (This used to be ctdb commit 0fbdeb7c91b965d9bc5ecc7b24e31070378d8f1d) 2007-09-13 08:08:18 +04:00			`rec->need_takeover_run = true;`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`}`
			`}`

			`talloc_free(tmp_ctx);`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`}`
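The handler above decides whether a takeover run is needed by XOR-ing the flags it has recorded for the node with the new flags from the message, then masking with the "disabled" bits. A minimal standalone sketch of that check follows. The `SKETCH_*` names and bit values are hypothetical and chosen for illustration only; they are not taken from the ctdb headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values for this sketch only; not the ctdb definitions. */
#define SKETCH_FLAG_DISCONNECTED  0x1u
#define SKETCH_FLAG_UNHEALTHY     0x2u
#define SKETCH_FLAG_PERM_DISABLED 0x4u
#define SKETCH_FLAGS_DISABLED (SKETCH_FLAG_UNHEALTHY | SKETCH_FLAG_PERM_DISABLED)

/* Return true when a "disabled" bit differs between the flags recorded
   locally for a node and the new flags reported in the message.
   XOR leaves a 1 in every bit position that changed; the mask keeps
   only the bits that should trigger an ip reallocation. */
static bool sketch_disabled_flag_changed(uint32_t recorded_flags,
					 uint32_t new_flags)
{
	return ((recorded_flags ^ new_flags) & SKETCH_FLAGS_DISABLED) != 0;
}
```

Comparing against the locally recorded flags, rather than the old flags the sender reports, means a message that repeats a state already known locally does not count as a change.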
add a test in the function that checks whether the cluster needs recovery or not that all active nodes are in normal mode. If we discover that some node is still in recoverymode it may indicate that a previous recovery ended prematurely and thus we should start a new recovery (This used to be ctdb commit c15517872e6c98c8c425a8d47d2b348ecb0620b0) 2007-05-06 22:41:12 +04:00
reqrite the handling of flag updates across the cluster to eliminate a race between the ctdb tool and the recovery daemon both at once trying to push flag changes across the cluster. (This used to be ctdb commit a9a1156ea4e10483a4bf4265b8e9203f0af033aa) 2008-11-19 06:43:46 +03:00			`/*`
			`handler for when we need to push out flag changes to all other nodes`
			`*/`
			`static void push_flags_handler(struct ctdb_context *ctdb, uint64_t srvid,`
			`TDB_DATA data, void *private_data)`
			`{`
			`int ret;`
			`struct ctdb_node_flag_change *c = (struct ctdb_node_flag_change *)data.dptr;`
server: if takeover runs when the recovery master becomes unhealthy The problem was this: When the monitor event fails, the node->flags get updated, and an update (containing the old and new flags) is sent to the recovery master. If the recovery master sends the update to itself (the same process), it was compairing the node->flags variable with the received new flags. This check always found both flag values to be equal and never sets the rec->need_takeover_run variable to true. There were two problem, first the push_flags_handler() function didn't pass the received old flags. And the ctdb_control_modflags() function ignored the received old flags. metze (This used to be ctdb commit 8ec633b64a05a2d903c2b9639909f15f6375548f) 2009-10-09 17:47:49 +04:00			`struct ctdb_node_map *nodemap=NULL;`
			`TALLOC_CTX *tmp_ctx = talloc_new(ctdb);`
			`uint32_t recmaster;`
			`uint32_t *nodes;`
reqrite the handling of flag updates across the cluster to eliminate a race between the ctdb tool and the recovery daemon both at once trying to push flag changes across the cluster. (This used to be ctdb commit a9a1156ea4e10483a4bf4265b8e9203f0af033aa) 2008-11-19 06:43:46 +03:00
server: if takeover runs when the recovery master becomes unhealthy The problem was this: When the monitor event fails, the node->flags get updated, and an update (containing the old and new flags) is sent to the recovery master. If the recovery master sends the update to itself (the same process), it was compairing the node->flags variable with the received new flags. This check always found both flag values to be equal and never sets the rec->need_takeover_run variable to true. There were two problem, first the push_flags_handler() function didn't pass the received old flags. And the ctdb_control_modflags() function ignored the received old flags. metze (This used to be ctdb commit 8ec633b64a05a2d903c2b9639909f15f6375548f) 2009-10-09 17:47:49 +04:00			`/* find the recovery master */`
			`ret = ctdb_ctrl_getrecmaster(ctdb, tmp_ctx, CONTROL_TIMEOUT(), CTDB_CURRENT_NODE, &recmaster);`
reqrite the handling of flag updates across the cluster to eliminate a race between the ctdb tool and the recovery daemon both at once trying to push flag changes across the cluster. (This used to be ctdb commit a9a1156ea4e10483a4bf4265b8e9203f0af033aa) 2008-11-19 06:43:46 +03:00			`if (ret != 0) {`
server: if takeover runs when the recovery master becomes unhealthy The problem was this: When the monitor event fails, the node->flags get updated, and an update (containing the old and new flags) is sent to the recovery master. If the recovery master sends the update to itself (the same process), it was compairing the node->flags variable with the received new flags. This check always found both flag values to be equal and never sets the rec->need_takeover_run variable to true. There were two problem, first the push_flags_handler() function didn't pass the received old flags. And the ctdb_control_modflags() function ignored the received old flags. metze (This used to be ctdb commit 8ec633b64a05a2d903c2b9639909f15f6375548f) 2009-10-09 17:47:49 +04:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to get recmaster from local node\n"));`
			`talloc_free(tmp_ctx);`
			`return;`
			`}`

			`/* read the node flags from the recmaster */`
			`ret = ctdb_ctrl_getnodemap(ctdb, CONTROL_TIMEOUT(), recmaster, tmp_ctx, &nodemap);`
			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR, (__location__ " Unable to get nodemap from node %u\n", recmaster));`
			`talloc_free(tmp_ctx);`
			`return;`
reqrite the handling of flag updates across the cluster to eliminate a race between the ctdb tool and the recovery daemon both at once trying to push flag changes across the cluster. (This used to be ctdb commit a9a1156ea4e10483a4bf4265b8e9203f0af033aa) 2008-11-19 06:43:46 +03:00			`}`
server: if takeover runs when the recovery master becomes unhealthy The problem was this: When the monitor event fails, the node->flags get updated, and an update (containing the old and new flags) is sent to the recovery master. If the recovery master sends the update to itself (the same process), it was compairing the node->flags variable with the received new flags. This check always found both flag values to be equal and never sets the rec->need_takeover_run variable to true. There were two problem, first the push_flags_handler() function didn't pass the received old flags. And the ctdb_control_modflags() function ignored the received old flags. metze (This used to be ctdb commit 8ec633b64a05a2d903c2b9639909f15f6375548f) 2009-10-09 17:47:49 +04:00			`if (c->pnn >= nodemap->num) {`
			`DEBUG(DEBUG_ERR,(__location__ " Nodemap from recmaster does not contain node %u\n", c->pnn));`
			`talloc_free(tmp_ctx);`
			`return;`
			`}`

			`/* send the flags update to all connected nodes */`
			`nodes = list_of_connected_nodes(ctdb, nodemap, tmp_ctx, true);`

			`if (ctdb_client_async_control(ctdb, CTDB_CONTROL_MODIFY_FLAGS,`
			`nodes, 0, CONTROL_TIMEOUT(),`
			`false, data,`
			`NULL, NULL,`
			`NULL) != 0) {`
			`DEBUG(DEBUG_ERR, (__location__ " ctdb_control to modify node flags failed\n"));`

			`talloc_free(tmp_ctx);`
			`return;`
			`}`

			`talloc_free(tmp_ctx);`
reqrite the handling of flag updates across the cluster to eliminate a race between the ctdb tool and the recovery daemon both at once trying to push flag changes across the cluster. (This used to be ctdb commit a9a1156ea4e10483a4bf4265b8e9203f0af033aa) 2008-11-19 06:43:46 +03:00			`}`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00
break checking that the recoverymode on all nodes are ok out into its own function (This used to be ctdb commit 813cf9a252af96da24122b80f24aabeed2911939) 2007-08-23 07:48:39 +04:00
change the monitoring of recmode in the recovery daemon to use a fully async eventdriven api for controls (This used to be ctdb commit 8d0e43428c507967a0d96e6a4c5c821ac269c546) 2007-08-27 03:40:10 +04:00			`struct verify_recmode_normal_data {`
			`uint32_t count;`
			`enum monitor_result status;`
			`};`

			`static void verify_recmode_normal_callback(struct ctdb_client_control_state *state)`
			`{`
change async.private to async.private_data since private is a reserved work in c++ (This used to be ctdb commit 79eb28f6cd5dcc30b04966d202a050eaf98a2552) 2007-09-26 08:25:32 +04:00			`struct verify_recmode_normal_data *rmdata = talloc_get_type(state->async.private_data, struct verify_recmode_normal_data);`
change the monitoring of recmode in the recovery daemon to use a fully async eventdriven api for controls (This used to be ctdb commit 8d0e43428c507967a0d96e6a4c5c821ac269c546) 2007-08-27 03:40:10 +04:00

			`/* one more node has responded with recmode data*/`
			`rmdata->count--;`

			`/* if we failed to get the recmode, then return an error and let`
			`the main loop try again.`
			`*/`
			`if (state->state != CTDB_CONTROL_DONE) {`
			`if (rmdata->status == MONITOR_OK) {`
			`rmdata->status = MONITOR_FAILED;`
			`}`
			`return;`
			`}`

			`/* if we got a response, then the recmode will be stored in the`
			`status field`
			`*/`
			`if (state->status != CTDB_RECOVERY_NORMAL) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_NOTICE, (__location__ " Node:%u was in recovery mode. Restart recovery process\n", state->c->hdr.destnode));`
change the monitoring of recmode in the recovery daemon to use a fully async eventdriven api for controls (This used to be ctdb commit 8d0e43428c507967a0d96e6a4c5c821ac269c546) 2007-08-27 03:40:10 +04:00			`rmdata->status = MONITOR_RECOVERY_NEEDED;`
			`}`

			`return;`
			`}`


			`/* verify that all nodes are in normal recovery mode */`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`static enum monitor_result verify_recmode(struct ctdb_context *ctdb, struct ctdb_node_map *nodemap)`
break checking that the recoverymode on all nodes are ok out into its own function (This used to be ctdb commit 813cf9a252af96da24122b80f24aabeed2911939) 2007-08-23 07:48:39 +04:00			`{`
change the monitoring of recmode in the recovery daemon to use a fully async eventdriven api for controls (This used to be ctdb commit 8d0e43428c507967a0d96e6a4c5c821ac269c546) 2007-08-27 03:40:10 +04:00			`struct verify_recmode_normal_data *rmdata;`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`TALLOC_CTX *mem_ctx = talloc_new(ctdb);`
change the monitoring of recmode in the recovery daemon to use a fully async eventdriven api for controls (This used to be ctdb commit 8d0e43428c507967a0d96e6a4c5c821ac269c546) 2007-08-27 03:40:10 +04:00			`struct ctdb_client_control_state *state;`
			`enum monitor_result status;`
			`int j;`
break checking that the recoverymode on all nodes are ok out into its own function (This used to be ctdb commit 813cf9a252af96da24122b80f24aabeed2911939) 2007-08-23 07:48:39 +04:00
change the monitoring of recmode in the recovery daemon to use a fully async eventdriven api for controls (This used to be ctdb commit 8d0e43428c507967a0d96e6a4c5c821ac269c546) 2007-08-27 03:40:10 +04:00			`rmdata = talloc(mem_ctx, struct verify_recmode_normal_data);`
			`CTDB_NO_MEMORY_FATAL(ctdb, rmdata);`
			`rmdata->count = 0;`
			`rmdata->status = MONITOR_OK;`
break checking that the recoverymode on all nodes are ok out into its own function (This used to be ctdb commit 813cf9a252af96da24122b80f24aabeed2911939) 2007-08-23 07:48:39 +04:00
			`/* loop over all active nodes and send an async getrecmode call to`
			`them*/`
			`for (j=0; j<nodemap->num; j++) {`
			`if (nodemap->nodes[j].flags & NODE_FLAGS_INACTIVE) {`
			`continue;`
			`}`
change the monitoring of recmode in the recovery daemon to use a fully async eventdriven api for controls (This used to be ctdb commit 8d0e43428c507967a0d96e6a4c5c821ac269c546) 2007-08-27 03:40:10 +04:00			`state = ctdb_ctrl_getrecmode_send(ctdb, mem_ctx,`
break checking that the recoverymode on all nodes are ok out into its own function (This used to be ctdb commit 813cf9a252af96da24122b80f24aabeed2911939) 2007-08-23 07:48:39 +04:00			`CONTROL_TIMEOUT(),`
change how we do public addresses and takeover so that we can have multiple public addresses spread across multiple interfaces on each node. this is a massive patch since we have previously made the assumtion that we only have one public address per node. get rid of the public_interface argument. the public addresses file now explicitely lists which interface the address belongs to (This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8) 2007-09-04 03:50:07 +04:00			`nodemap->nodes[j].pnn);`
change the monitoring of recmode in the recovery daemon to use a fully async eventdriven api for controls (This used to be ctdb commit 8d0e43428c507967a0d96e6a4c5c821ac269c546) 2007-08-27 03:40:10 +04:00			`if (state == NULL) {`
			`/* we failed to send the control, treat this as`
			`an error and try again next iteration`
			`*/`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,("Failed to call ctdb_ctrl_getrecmode_send during monitoring\n"));`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`talloc_free(mem_ctx);`
break checking that the recoverymode on all nodes are ok out into its own function (This used to be ctdb commit 813cf9a252af96da24122b80f24aabeed2911939) 2007-08-23 07:48:39 +04:00			`return MONITOR_FAILED;`
			`}`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00
change the monitoring of recmode in the recovery daemon to use a fully async eventdriven api for controls (This used to be ctdb commit 8d0e43428c507967a0d96e6a4c5c821ac269c546) 2007-08-27 03:40:10 +04:00			`/* set up the callback functions */`
			`state->async.fn = verify_recmode_normal_callback;`
change async.private to async.private_data since private is a reserved work in c++ (This used to be ctdb commit 79eb28f6cd5dcc30b04966d202a050eaf98a2552) 2007-09-26 08:25:32 +04:00			`state->async.private_data = rmdata;`
change the monitoring of recmode in the recovery daemon to use a fully async eventdriven api for controls (This used to be ctdb commit 8d0e43428c507967a0d96e6a4c5c821ac269c546) 2007-08-27 03:40:10 +04:00
			`/* one more control to wait for to complete */`
			`rmdata->count++;`
break checking that the recoverymode on all nodes are ok out into its own function (This used to be ctdb commit 813cf9a252af96da24122b80f24aabeed2911939) 2007-08-23 07:48:39 +04:00			`}`

change the monitoring of recmode in the recovery daemon to use a fully async eventdriven api for controls (This used to be ctdb commit 8d0e43428c507967a0d96e6a4c5c821ac269c546) 2007-08-27 03:40:10 +04:00
			`/* now wait for up to the maximum number of seconds allowed`
			`or until all nodes we expect a response from have replied`
			`*/`
			`while (rmdata->count > 0) {`
			`event_loop_once(ctdb->ev);`
			`}`

			`status = rmdata->status;`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`talloc_free(mem_ctx);`
change the monitoring of recmode in the recovery daemon to use a fully async eventdriven api for controls (This used to be ctdb commit 8d0e43428c507967a0d96e6a4c5c821ac269c546) 2007-08-27 03:40:10 +04:00			`return status;`
break checking that the recoverymode on all nodes are ok out into its own function (This used to be ctdb commit 813cf9a252af96da24122b80f24aabeed2911939) 2007-08-23 07:48:39 +04:00			`}`
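The function above sends one asynchronous control per active node, counts the outstanding requests, and lets each completion callback decrement the counter and fold its result into a shared status. A minimal standalone sketch of that bookkeeping follows, with no event loop. All `sketch_*` names are hypothetical and are not part of the ctdb API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result codes mirroring the idea of enum monitor_result. */
enum sketch_result { SKETCH_OK, SKETCH_FAILED, SKETCH_RECOVERY_NEEDED };

struct sketch_wait {
	uint32_t count;            /* requests still waiting for a reply */
	enum sketch_result status; /* aggregated result so far */
};

static void sketch_wait_init(struct sketch_wait *w)
{
	w->count = 0;
	w->status = SKETCH_OK;
}

/* Call once for every request that was sent successfully. */
static void sketch_request_sent(struct sketch_wait *w)
{
	w->count++;
}

/* Call once per reply or timeout. A failed request only downgrades an
   OK status; a node that is not in normal mode always forces
   SKETCH_RECOVERY_NEEDED. */
static void sketch_reply(struct sketch_wait *w, bool completed, bool normal_mode)
{
	assert(w->count > 0);
	w->count--;

	if (!completed) {
		if (w->status == SKETCH_OK) {
			w->status = SKETCH_FAILED;
		}
		return;
	}
	if (!normal_mode) {
		w->status = SKETCH_RECOVERY_NEEDED;
	}
}
```

A caller would keep dispatching events until `count` reaches zero and then read `status`, which is what the `while (rmdata->count > 0)` loop does in the function above.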

change the monitoring of recmode in the recovery daemon to use a fully async eventdriven api for controls (This used to be ctdb commit 8d0e43428c507967a0d96e6a4c5c821ac269c546) 2007-08-27 03:40:10 +04:00
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`struct verify_recmaster_data {`
when a node disgrees with us re who is recmaster make it mark that node as a lcuprit so it eventually gets banned (This used to be ctdb commit eff3f326f8ce6070c9f3c430cd14d1b71a8db220) 2008-04-21 18:56:27 +04:00			`struct ctdb_recoverd *rec;`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`uint32_t count;`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`uint32_t pnn;`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`enum monitor_result status;`
			`};`

change the api for managing callbacks to controls so that isntead of passing it as a parameter we set the callback function explicitely from the caller if the ..._send() function returned a valid state pointer. (This used to be ctdb commit aa939570662786455f63299b62c99882cff29d42) 2007-08-24 04:42:06 +04:00			`static void verify_recmaster_callback(struct ctdb_client_control_state *state)`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`{`
change async.private to async.private_data since private is a reserved work in c++ (This used to be ctdb commit 79eb28f6cd5dcc30b04966d202a050eaf98a2552) 2007-09-26 08:25:32 +04:00			`struct verify_recmaster_data *rmdata = talloc_get_type(state->async.private_data, struct verify_recmaster_data);`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00

			`/* one more node has responded with recmaster data*/`
			`rmdata->count--;`

			`/* if we failed to get the recmaster, then return an error and let`
			`the main loop try again.`
			`*/`
change the api for managing callbacks to controls so that isntead of passing it as a parameter we set the callback function explicitely from the caller if the ..._send() function returned a valid state pointer. (This used to be ctdb commit aa939570662786455f63299b62c99882cff29d42) 2007-08-24 04:42:06 +04:00			`if (state->state != CTDB_CONTROL_DONE) {`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`if (rmdata->status == MONITOR_OK) {`
			`rmdata->status = MONITOR_FAILED;`
			`}`
change the api for managing callbacks to controls so that isntead of passing it as a parameter we set the callback function explicitely from the caller if the ..._send() function returned a valid state pointer. (This used to be ctdb commit aa939570662786455f63299b62c99882cff29d42) 2007-08-24 04:42:06 +04:00			`return;`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`}`

			`/* if we got a response, then the recmaster will be stored in the`
			`status field`
			`*/`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`if (state->status != rmdata->pnn) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,("Node %u does not agree we are the recmaster. Need a new recmaster election\n", state->c->hdr.destnode));`
when a node disgrees with us re who is recmaster make it mark that node as a lcuprit so it eventually gets banned (This used to be ctdb commit eff3f326f8ce6070c9f3c430cd14d1b71a8db220) 2008-04-21 18:56:27 +04:00			`ctdb_set_culprit(rmdata->rec, state->c->hdr.destnode);`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`rmdata->status = MONITOR_ELECTION_NEEDED;`
			`}`

change the api for managing callbacks to controls so that isntead of passing it as a parameter we set the callback function explicitely from the caller if the ..._send() function returned a valid state pointer. (This used to be ctdb commit aa939570662786455f63299b62c99882cff29d42) 2007-08-24 04:42:06 +04:00			`return;`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`}`


			`/* verify that all nodes agree that we are the recmaster */`
when a node disgrees with us re who is recmaster make it mark that node as a lcuprit so it eventually gets banned (This used to be ctdb commit eff3f326f8ce6070c9f3c430cd14d1b71a8db220) 2008-04-21 18:56:27 +04:00			`static enum monitor_result verify_recmaster(struct ctdb_recoverd *rec, struct ctdb_node_map *nodemap, uint32_t pnn)`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`{`
when a node disgrees with us re who is recmaster make it mark that node as a lcuprit so it eventually gets banned (This used to be ctdb commit eff3f326f8ce6070c9f3c430cd14d1b71a8db220) 2008-04-21 18:56:27 +04:00			`struct ctdb_context *ctdb = rec->ctdb;`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`struct verify_recmaster_data *rmdata;`
			`TALLOC_CTX *mem_ctx = talloc_new(ctdb);`
			`struct ctdb_client_control_state *state;`
			`enum monitor_result status;`
			`int j;`

			`rmdata = talloc(mem_ctx, struct verify_recmaster_data);`
			`CTDB_NO_MEMORY_FATAL(ctdb, rmdata);`
when a node disgrees with us re who is recmaster make it mark that node as a lcuprit so it eventually gets banned (This used to be ctdb commit eff3f326f8ce6070c9f3c430cd14d1b71a8db220) 2008-04-21 18:56:27 +04:00			`rmdata->rec = rec;`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`rmdata->count = 0;`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`rmdata->pnn = pnn;`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`rmdata->status = MONITOR_OK;`

			`/* loop over all active nodes and send an async getrecmaster call to`
			`them*/`
			`for (j=0; j<nodemap->num; j++) {`
			`if (nodemap->nodes[j].flags & NODE_FLAGS_INACTIVE) {`
			`continue;`
			`}`
			`state = ctdb_ctrl_getrecmaster_send(ctdb, mem_ctx,`
get rid of the explicit global timeout used in the previous example and try this time by relying on the timeouts for the individual controls (This used to be ctdb commit 448a0eb4fd896dc545aa0b4bb2ba4628491578be) 2007-08-23 13:38:54 +04:00			`CONTROL_TIMEOUT(),`
change how we do public addresses and takeover so that we can have multiple public addresses spread across multiple interfaces on each node. this is a massive patch since we have previously made the assumtion that we only have one public address per node. get rid of the public_interface argument. the public addresses file now explicitely lists which interface the address belongs to (This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8) 2007-09-04 03:50:07 +04:00			`nodemap->nodes[j].pnn);`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`if (state == NULL) {`
			`/* we failed to send the control, treat this as`
			`an error and try again next iteration`
			`*/`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,("Failed to call ctdb_ctrl_getrecmaster_send during monitoring\n"));`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`talloc_free(mem_ctx);`
			`return MONITOR_FAILED;`
			`}`

change the api for managing callbacks to controls so that isntead of passing it as a parameter we set the callback function explicitely from the caller if the ..._send() function returned a valid state pointer. (This used to be ctdb commit aa939570662786455f63299b62c99882cff29d42) 2007-08-24 04:42:06 +04:00			`/* set up the callback functions */`
			`state->async.fn = verify_recmaster_callback;`
change async.private to async.private_data since private is a reserved work in c++ (This used to be ctdb commit 79eb28f6cd5dcc30b04966d202a050eaf98a2552) 2007-09-26 08:25:32 +04:00			`state->async.private_data = rmdata;`
change the api for managing callbacks to controls so that isntead of passing it as a parameter we set the callback function explicitely from the caller if the ..._send() function returned a valid state pointer. (This used to be ctdb commit aa939570662786455f63299b62c99882cff29d42) 2007-08-24 04:42:06 +04:00
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`/* one more control to wait for to complete */`
			`rmdata->count++;`
			`}`


			`/* now wait for up to the maximum number of seconds allowed`
			`or until all nodes we expect a response from have replied`
			`*/`
get rid of the explicit global timeout used in the previous example and try this time by relying on the timeouts for the individual controls (This used to be ctdb commit 448a0eb4fd896dc545aa0b4bb2ba4628491578be) 2007-08-23 13:38:54 +04:00			`while (rmdata->count > 0) {`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`event_loop_once(ctdb->ev);`
			`}`

			`status = rmdata->status;`
			`talloc_free(mem_ctx);`
			`return status;`
			`}`

implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00
track both when we last started and ended a recovery. make ctdb uptime print how long the recovery took in the recovery daemon when we check that the public ip address allocation on the local node is correct (we have the ips we should have and we dont have any we shouldnt have) use ctdb uptime and check the recovery start/stop times and make sure we dont check for ip allocation inconsistencies during a recovery where the ip address allocation is in flux. (This used to be ctdb commit f86551580349b7f662f9a07e4eb0c1189e38e429) 2008-07-02 07:55:59 +04:00			`/* called to check that the allocation of public ip addresses is ok.`
			`*/`
when we detect a ip-allocation mismatch, just force a new ip reassignment instead of a full blown recovery (This used to be ctdb commit 4f50aa8bb8be544058523f2f544109a26c2b3b51) 2009-12-01 08:06:59 +03:00			`static int verify_ip_allocation(struct ctdb_context *ctdb, struct ctdb_recoverd *rec, uint32_t pnn)`
track both when we last started and ended a recovery. make ctdb uptime print how long the recovery took in the recovery daemon when we check that the public ip address allocation on the local node is correct (we have the ips we should have and we dont have any we shouldnt have) use ctdb uptime and check the recovery start/stop times and make sure we dont check for ip allocation inconsistencies during a recovery where the ip address allocation is in flux. (This used to be ctdb commit f86551580349b7f662f9a07e4eb0c1189e38e429) 2008-07-02 07:55:59 +04:00			`{`
			`TALLOC_CTX *mem_ctx = talloc_new(NULL);`
			`struct ctdb_all_public_ips *ips = NULL;`
			`struct ctdb_uptime *uptime1 = NULL;`
			`struct ctdb_uptime *uptime2 = NULL;`
			`int ret, j;`

fixed a memory leak in the recovery daemon thanks to vl for spotting this (This used to be ctdb commit 96df98d9f86ecc6bb1a458eb2101e5c1bc0f96e6) 2008-08-11 17:33:05 +04:00			`ret = ctdb_ctrl_uptime(ctdb, mem_ctx, CONTROL_TIMEOUT(),`
track both when we last started and ended a recovery. make ctdb uptime print how long the recovery took in the recovery daemon when we check that the public ip address allocation on the local node is correct (we have the ips we should have and we dont have any we shouldnt have) use ctdb uptime and check the recovery start/stop times and make sure we dont check for ip allocation inconsistencies during a recovery where the ip address allocation is in flux. (This used to be ctdb commit f86551580349b7f662f9a07e4eb0c1189e38e429) 2008-07-02 07:55:59 +04:00			`CTDB_CURRENT_NODE, &uptime1);`
			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR, ("Unable to get uptime from local node %u\n", pnn));`
			`talloc_free(mem_ctx);`
			`return -1;`
			`}`

			`/* read the ip allocation from the local node */`
			`ret = ctdb_ctrl_get_public_ips(ctdb, CONTROL_TIMEOUT(), CTDB_CURRENT_NODE, mem_ctx, &ips);`
			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR, ("Unable to get public ips from local node %u\n", pnn));`
			`talloc_free(mem_ctx);`
			`return -1;`
			`}`

fixed a memory leak in the recovery daemon thanks to vl for spotting this (This used to be ctdb commit 96df98d9f86ecc6bb1a458eb2101e5c1bc0f96e6) 2008-08-11 17:33:05 +04:00			`ret = ctdb_ctrl_uptime(ctdb, mem_ctx, CONTROL_TIMEOUT(),`
track both when we last started and ended a recovery. make ctdb uptime print how long the recovery took in the recovery daemon when we check that the public ip address allocation on the local node is correct (we have the ips we should have and we dont have any we shouldnt have) use ctdb uptime and check the recovery start/stop times and make sure we dont check for ip allocation inconsistencies during a recovery where the ip address allocation is in flux. (This used to be ctdb commit f86551580349b7f662f9a07e4eb0c1189e38e429) 2008-07-02 07:55:59 +04:00			`CTDB_CURRENT_NODE, &uptime2);`
			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR, ("Unable to get uptime from local node %u\n", pnn));`
			`talloc_free(mem_ctx);`
			`return -1;`
			`}`

			`/* skip the check if the startrecovery time has changed */`
			`if (timeval_compare(&uptime1->last_recovery_started,`
			`&uptime2->last_recovery_started) != 0) {`
			`DEBUG(DEBUG_NOTICE, (__location__ " last recovery time changed while we read the public ip list. skipping public ip address check\n"));`
From Volker L Fix a slow memory leak in the recovery daemon if there is a recoery triggered during the public ip reassignment process (This used to be ctdb commit 0aca4daf908b76d6013ff3dfad41beb9114fc1a3) 2008-09-16 00:50:28 +04:00			`talloc_free(mem_ctx);`
track both when we last started and ended a recovery. make ctdb uptime print how long the recovery took in the recovery daemon when we check that the public ip address allocation on the local node is correct (we have the ips we should have and we dont have any we shouldnt have) use ctdb uptime and check the recovery start/stop times and make sure we dont check for ip allocation inconsistencies during a recovery where the ip address allocation is in flux. (This used to be ctdb commit f86551580349b7f662f9a07e4eb0c1189e38e429) 2008-07-02 07:55:59 +04:00			`return 0;`
			`}`

			`/* skip the check if the endrecovery time has changed */`
			`if (timeval_compare(&uptime1->last_recovery_finished,`
			`&uptime2->last_recovery_finished) != 0) {`
			`DEBUG(DEBUG_NOTICE, (__location__ " last recovery time changed while we read the public ip list. skipping public ip address check\n"));`
From Volker L Fix a slow memory leak in the recovery daemon if there is a recoery triggered during the public ip reassignment process (This used to be ctdb commit 0aca4daf908b76d6013ff3dfad41beb9114fc1a3) 2008-09-16 00:50:28 +04:00			`talloc_free(mem_ctx);`
track both when we last started and ended a recovery. make ctdb uptime print how long the recovery took in the recovery daemon when we check that the public ip address allocation on the local node is correct (we have the ips we should have and we dont have any we shouldnt have) use ctdb uptime and check the recovery start/stop times and make sure we dont check for ip allocation inconsistencies during a recovery where the ip address allocation is in flux. (This used to be ctdb commit f86551580349b7f662f9a07e4eb0c1189e38e429) 2008-07-02 07:55:59 +04:00			`return 0;`
			`}`

			`/* skip the check if we have started but not finished recovery */`
			`if (timeval_compare(&uptime1->last_recovery_finished,`
			`&uptime1->last_recovery_started) != 1) {`
mprove the log message when we skip the ip allocation check from the recovery daemon. we also skip this check if we are already in the process of performing an ip reallocation and not only when we are performing a full recovery. (This used to be ctdb commit 1a09b02767f3928d3c5db0e0afc59bb938e4a445) 2009-10-21 04:51:30 +04:00			`DEBUG(DEBUG_NOTICE, (__location__ " in the middle of recovery or ip reallocation. skipping public ip address check\n"));`
From Volker L Fix a slow memory leak in the recovery daemon if there is a recoery triggered during the public ip reassignment process (This used to be ctdb commit 0aca4daf908b76d6013ff3dfad41beb9114fc1a3) 2008-09-16 00:50:28 +04:00			`talloc_free(mem_ctx);`
track both when we last started and ended a recovery. make ctdb uptime print how long the recovery took in the recovery daemon when we check that the public ip address allocation on the local node is correct (we have the ips we should have and we dont have any we shouldnt have) use ctdb uptime and check the recovery start/stop times and make sure we dont check for ip allocation inconsistencies during a recovery where the ip address allocation is in flux. (This used to be ctdb commit f86551580349b7f662f9a07e4eb0c1189e38e429) 2008-07-02 07:55:59 +04:00
			`return 0;`
			`}`

			`/* verify that we have the ip addresses we should have`
			`and we don't have ones we shouldn't have.`
			`if we find an inconsistency we ask the recmaster to`
			`run a new ip reallocation (takeover run) instead of`
			`a full blown recovery`
			`*/`
			`for (j=0; j<ips->num; j++) {`
			`if (ips->ips[j].pnn == pnn) {`
initial ipv6 patch Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com> (This used to be ctdb commit 1f131f21386f428bbbbb29098d56c2f64596583b) 2008-08-19 08:58:29 +04:00			`if (!ctdb_sys_have_ip(&ips->ips[j].addr)) {`
when we detect a ip-allocation mismatch, just force a new ip reassignment instead of a full blown recovery (This used to be ctdb commit 4f50aa8bb8be544058523f2f544109a26c2b3b51) 2009-12-01 08:06:59 +03:00			`struct takeover_run_reply rd;`
			`TDB_DATA data;`

initial ipv6 patch Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com> (This used to be ctdb commit 1f131f21386f428bbbbb29098d56c2f64596583b) 2008-08-19 08:58:29 +04:00			`DEBUG(DEBUG_CRIT,("Public address '%s' is missing and we should serve this ip\n",`
			`ctdb_addr_to_str(&ips->ips[j].addr)));`
track both when we last started and ended a recovery. make ctdb uptime print how long the recovery took in the recovery daemon when we check that the public ip address allocation on the local node is correct (we have the ips we should have and we dont have any we shouldnt have) use ctdb uptime and check the recovery start/stop times and make sure we dont check for ip allocation inconsistencies during a recovery where the ip address allocation is in flux. (This used to be ctdb commit f86551580349b7f662f9a07e4eb0c1189e38e429) 2008-07-02 07:55:59 +04:00
when we detect a ip-allocation mismatch, just force a new ip reassignment instead of a full blown recovery (This used to be ctdb commit 4f50aa8bb8be544058523f2f544109a26c2b3b51) 2009-12-01 08:06:59 +03:00			`rd.pnn = ctdb->pnn;`
			`rd.srvid = 0;`
			`data.dptr = (uint8_t *)&rd;`
			`data.dsize = sizeof(rd);`
track both when we last started and ended a recovery. make ctdb uptime print how long the recovery took in the recovery daemon when we check that the public ip address allocation on the local node is correct (we have the ips we should have and we dont have any we shouldnt have) use ctdb uptime and check the recovery start/stop times and make sure we dont check for ip allocation inconsistencies during a recovery where the ip address allocation is in flux. (This used to be ctdb commit f86551580349b7f662f9a07e4eb0c1189e38e429) 2008-07-02 07:55:59 +04:00
when we detect a ip-allocation mismatch, just force a new ip reassignment instead of a full blown recovery (This used to be ctdb commit 4f50aa8bb8be544058523f2f544109a26c2b3b51) 2009-12-01 08:06:59 +03:00			`ret = ctdb_send_message(ctdb, rec->recmaster, CTDB_SRVID_TAKEOVER_RUN, data);`
			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR,(__location__ " Failed to send ipreallocate to recmaster :%d\n", (int)rec->recmaster));`
track both when we last started and ended a recovery. make ctdb uptime print how long the recovery took in the recovery daemon when we check that the public ip address allocation on the local node is correct (we have the ips we should have and we dont have any we shouldnt have) use ctdb uptime and check the recovery start/stop times and make sure we dont check for ip allocation inconsistencies during a recovery where the ip address allocation is in flux. (This used to be ctdb commit f86551580349b7f662f9a07e4eb0c1189e38e429) 2008-07-02 07:55:59 +04:00			`}`
			`}`
			`} else {`
initial ipv6 patch Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com> (This used to be ctdb commit 1f131f21386f428bbbbb29098d56c2f64596583b) 2008-08-19 08:58:29 +04:00			`if (ctdb_sys_have_ip(&ips->ips[j].addr)) {`
when we detect a ip-allocation mismatch, just force a new ip reassignment instead of a full blown recovery (This used to be ctdb commit 4f50aa8bb8be544058523f2f544109a26c2b3b51) 2009-12-01 08:06:59 +03:00			`struct takeover_run_reply rd;`
			`TDB_DATA data;`

initial ipv6 patch Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com> (This used to be ctdb commit 1f131f21386f428bbbbb29098d56c2f64596583b) 2008-08-19 08:58:29 +04:00			`DEBUG(DEBUG_CRIT,("We are still serving a public address '%s' that we should not be serving.\n",`
			`ctdb_addr_to_str(&ips->ips[j].addr)));`

when we detect a ip-allocation mismatch, just force a new ip reassignment instead of a full blown recovery (This used to be ctdb commit 4f50aa8bb8be544058523f2f544109a26c2b3b51) 2009-12-01 08:06:59 +03:00			`rd.pnn = ctdb->pnn;`
			`rd.srvid = 0;`
			`data.dptr = (uint8_t *)&rd;`
			`data.dsize = sizeof(rd);`
track both when we last started and ended a recovery. make ctdb uptime print how long the recovery took in the recovery daemon when we check that the public ip address allocation on the local node is correct (we have the ips we should have and we dont have any we shouldnt have) use ctdb uptime and check the recovery start/stop times and make sure we dont check for ip allocation inconsistencies during a recovery where the ip address allocation is in flux. (This used to be ctdb commit f86551580349b7f662f9a07e4eb0c1189e38e429) 2008-07-02 07:55:59 +04:00
when we detect a ip-allocation mismatch, just force a new ip reassignment instead of a full blown recovery (This used to be ctdb commit 4f50aa8bb8be544058523f2f544109a26c2b3b51) 2009-12-01 08:06:59 +03:00			`ret = ctdb_send_message(ctdb, rec->recmaster, CTDB_SRVID_TAKEOVER_RUN, data);`
track both when we last started and ended a recovery. make ctdb uptime print how long the recovery took in the recovery daemon when we check that the public ip address allocation on the local node is correct (we have the ips we should have and we dont have any we shouldnt have) use ctdb uptime and check the recovery start/stop times and make sure we dont check for ip allocation inconsistencies during a recovery where the ip address allocation is in flux. (This used to be ctdb commit f86551580349b7f662f9a07e4eb0c1189e38e429) 2008-07-02 07:55:59 +04:00			`if (ret != 0) {`
when we detect a ip-allocation mismatch, just force a new ip reassignment instead of a full blown recovery (This used to be ctdb commit 4f50aa8bb8be544058523f2f544109a26c2b3b51) 2009-12-01 08:06:59 +03:00			`DEBUG(DEBUG_ERR,(__location__ " Failed to send ipreallocate to recmaster :%d\n", (int)rec->recmaster));`
track both when we last started and ended a recovery. make ctdb uptime print how long the recovery took in the recovery daemon when we check that the public ip address allocation on the local node is correct (we have the ips we should have and we dont have any we shouldnt have) use ctdb uptime and check the recovery start/stop times and make sure we dont check for ip allocation inconsistencies during a recovery where the ip address allocation is in flux. (This used to be ctdb commit f86551580349b7f662f9a07e4eb0c1189e38e429) 2008-07-02 07:55:59 +04:00			`}`
			`}`
			`}`
			`}`

			`talloc_free(mem_ctx);`
			`return 0;`
			`}`
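
The three "skip the check" tests in verify_ip_allocation() amount to a before/after snapshot check: sample the recovery timestamps, read the data, sample again, and only trust the data if nothing moved in between. A minimal standalone sketch of that idea follows. The struct recovery_stamps type and the helpers tv_cmp() and snapshot_is_stable() are hypothetical names made up for illustration; they are not part of ctdb, which uses struct ctdb_uptime and timeval_compare().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical stand-in for the two timestamps read from the uptime control. */
struct recovery_stamps {
	struct timeval started;
	struct timeval finished;
};

/* Three-way comparison of two timevals: returns -1, 0 or 1. */
static int tv_cmp(const struct timeval *a, const struct timeval *b)
{
	if (a->tv_sec != b->tv_sec) {
		return (a->tv_sec > b->tv_sec) ? 1 : -1;
	}
	if (a->tv_usec != b->tv_usec) {
		return (a->tv_usec > b->tv_usec) ? 1 : -1;
	}
	return 0;
}

/* Returns true if data read between the two samples can be trusted. */
static bool snapshot_is_stable(const struct recovery_stamps *before,
			       const struct recovery_stamps *after)
{
	assert(before != NULL && after != NULL);

	/* a recovery started while we were reading */
	if (tv_cmp(&before->started, &after->started) != 0) {
		return false;
	}
	/* a recovery finished while we were reading */
	if (tv_cmp(&before->finished, &after->finished) != 0) {
		return false;
	}
	/* a recovery has started but not yet finished */
	if (tv_cmp(&before->finished, &before->started) != 1) {
		return false;
	}
	return true;
}
```

As in the function above, an unstable snapshot is not an error: the caller simply skips this round and checks again on the next monitoring pass.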

redo and update how we synchronize flags across the cluster. this simplifies the code and should close a race condition between the local recovery daemon and a remote node when flags are changing. (This used to be ctdb commit 32d460b8469eb53145f04161a5d01166f9b5f09e) 2008-12-05 08:32:30 +03:00
			`static void async_getnodemap_callback(struct ctdb_context *ctdb, uint32_t node_pnn, int32_t res, TDB_DATA outdata, void *callback_data)`
			`{`
			`struct ctdb_node_map **remote_nodemaps = callback_data;`

			`if (node_pnn >= ctdb->num_nodes) {`
			`DEBUG(DEBUG_ERR,(__location__ " pnn from invalid node\n"));`
			`return;`
			`}`

			`remote_nodemaps[node_pnn] = (struct ctdb_node_map *)talloc_steal(remote_nodemaps, outdata.dptr);`

			`}`

			`static int get_remote_nodemaps(struct ctdb_context *ctdb, TALLOC_CTX *mem_ctx,`
			`struct ctdb_node_map *nodemap,`
update to the flags handling make sure to abort the monitoring and restart if we failed to get the nodemap from a remote node (This used to be ctdb commit 4eac0214e732e6c2f867d66ec71d4406680dbb94) 2008-12-09 02:45:14 +03:00			`struct ctdb_node_map **remote_nodemaps)`
redo and update how we synchronize flags across the cluster. this simplifies the code and should close a race condition between the local recovery daemon and a remote node when flags are changing. (This used to be ctdb commit 32d460b8469eb53145f04161a5d01166f9b5f09e) 2008-12-05 08:32:30 +03:00			`{`
			`uint32_t *nodes;`

			`nodes = list_of_active_nodes(ctdb, nodemap, mem_ctx, true);`
			`if (ctdb_client_async_control(ctdb, CTDB_CONTROL_GET_NODEMAP,`
initial attempt at freezing databases in priority order (This used to be ctdb commit e8d692590da1070c87a4144031e3306d190ebed2) 2009-10-12 05:08:39 +04:00			`nodes, 0,`
redo and update how we synchronize flags across the cluster. this simplifies the code and should close a race condition between the local recovery daemon and a remote node when flags are changing. (This used to be ctdb commit 32d460b8469eb53145f04161a5d01166f9b5f09e) 2008-12-05 08:32:30 +03:00			`CONTROL_TIMEOUT(), false, tdb_null,`
			`async_getnodemap_callback,`
			`NULL,`
update to the flags handling make sure to abort the monitoring and restart if we failed to get the nodemap from a remote node (This used to be ctdb commit 4eac0214e732e6c2f867d66ec71d4406680dbb94) 2008-12-09 02:45:14 +03:00			`remote_nodemaps) != 0) {`
redo and update how we synchronize flags across the cluster. this simplifies the code and should close a race condition between the local recovery daemon and a remote node when flags are changing. (This used to be ctdb commit 32d460b8469eb53145f04161a5d01166f9b5f09e) 2008-12-05 08:32:30 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to pull all remote nodemaps\n"));`

			`return -1;`
			`}`

			`return 0;`
			`}`

in the recovery daemon, check that the recovery master can access the recovery lock file and verify it is not stale from a child process. This allows us to timeout the operation if the underlying filesystem has become temporarily unresponsive without causing a new recovery. (This used to be ctdb commit d177b08f1dc79534491f27726b05405d47e12e20) 2009-06-19 08:44:26 +04:00			`enum reclock_child_status { RECLOCK_CHECKING, RECLOCK_OK, RECLOCK_FAILED, RECLOCK_TIMEOUT};`
			`struct ctdb_check_reclock_state {`
			`struct ctdb_context *ctdb;`
			`struct timeval start_time;`
			`int fd[2];`
			`pid_t child;`
			`struct timed_event *te;`
			`struct fd_event *fde;`
			`enum reclock_child_status status;`
			`};`

			`/* when we free the reclock state we must kill any child process.`
			`*/`
			`static int check_reclock_destructor(struct ctdb_check_reclock_state *state)`
			`{`
			`struct ctdb_context *ctdb = state->ctdb;`

			`ctdb_ctrl_report_recd_lock_latency(ctdb, CONTROL_TIMEOUT(), timeval_elapsed(&state->start_time));`

			`if (state->fd[0] != -1) {`
			`close(state->fd[0]);`
			`state->fd[0] = -1;`
			`}`
			`if (state->fd[1] != -1) {`
			`close(state->fd[1]);`
			`state->fd[1] = -1;`
			`}`
			`kill(state->child, SIGKILL);`
			`return 0;`
			`}`

			`/*`
			`called if our check_reclock child times out. this would happen if`
			`i/o to the reclock file blocks.`
			`*/`
			`static void ctdb_check_reclock_timeout(struct event_context *ev, struct timed_event *te,`
			`struct timeval t, void *private_data)`
			`{`
			`struct ctdb_check_reclock_state *state = talloc_get_type(private_data,`
			`struct ctdb_check_reclock_state);`

			`DEBUG(DEBUG_ERR,(__location__ " check_reclock child process hung/timedout CFS slow to grant locks?\n"));`
			`state->status = RECLOCK_TIMEOUT;`
			`}`

			`/* this is called when the child process has completed checking the reclock`
			`file and has written data back to us through the pipe.`
			`*/`
			`static void reclock_child_handler(struct event_context *ev, struct fd_event *fde,`
			`uint16_t flags, void *private_data)`
			`{`
			`struct ctdb_check_reclock_state *state= talloc_get_type(private_data,`
			`struct ctdb_check_reclock_state);`
			`char c = 0;`
			`int ret;`

			`/* we got a response from our child process so we can abort the`
			`timeout.`
			`*/`
			`talloc_free(state->te);`
			`state->te = NULL;`

			`ret = read(state->fd[0], &c, 1);`
			`if (ret != 1 || c != RECLOCK_OK) {`
			`DEBUG(DEBUG_ERR,(__location__ " reclock child process returned error %d\n", c));`
			`state->status = RECLOCK_FAILED;`

			`return;`
			`}`

			`state->status = RECLOCK_OK;`
			`return;`
			`}`

			`static int check_recovery_lock(struct ctdb_context *ctdb)`
			`{`
			`int ret;`
			`struct ctdb_check_reclock_state *state;`
			`pid_t parent = getpid();`

			`if (ctdb->recovery_lock_fd == -1) {`
			`DEBUG(DEBUG_CRIT,("recovery master doesn't have the recovery lock\n"));`
			`return -1;`
			`}`

			`state = talloc(ctdb, struct ctdb_check_reclock_state);`
			`CTDB_NO_MEMORY(ctdb, state);`

			`state->ctdb = ctdb;`
			`state->start_time = timeval_current();`
			`state->status = RECLOCK_CHECKING;`
			`state->fd[0] = -1;`
			`state->fd[1] = -1;`

			`ret = pipe(state->fd);`
			`if (ret != 0) {`
			`talloc_free(state);`
			`DEBUG(DEBUG_CRIT,(__location__ " Failed to open pipe for check_reclock child\n"));`
			`return -1;`
			`}`

			`state->child = fork();`
			`if (state->child == (pid_t)-1) {`
			`DEBUG(DEBUG_CRIT,(__location__ " fork() failed in check_reclock child\n"));`
dont leak file descriptors (This used to be ctdb commit 268c3e4b269a92741a02280c84384178e73de10e) 2009-06-19 08:54:22 +04:00			`close(state->fd[0]);`
			`state->fd[0] = -1;`
			`close(state->fd[1]);`
			`state->fd[1] = -1;`
in the recovery daemon, check that the recovery master can access the recovery lock file and verify it is not stale from a child process. This allows us to timeout the operation if the underlying filesystem has become temporarily unresponsive without causing a new recovery. (This used to be ctdb commit d177b08f1dc79534491f27726b05405d47e12e20) 2009-06-19 08:44:26 +04:00			`talloc_free(state);`
			`return -1;`
			`}`

			`if (state->child == 0) {`
			`char cc = RECLOCK_OK;`
			`close(state->fd[0]);`
			`state->fd[0] = -1;`

			`if (pread(ctdb->recovery_lock_fd, &cc, 1, 0) == -1) {`
			`DEBUG(DEBUG_CRIT,("failed read from recovery_lock_fd - %s\n", strerror(errno)));`
			`cc = RECLOCK_FAILED;`
			`}`

			`write(state->fd[1], &cc, 1);`
			`/* make sure we die when our parent dies */`
			`while (kill(parent, 0) == 0 || errno != ESRCH) {`
			`sleep(5);`
			`write(state->fd[1], &cc, 1);`
			`}`
			`_exit(0);`
			`}`
			`close(state->fd[1]);`
			`state->fd[1] = -1;`
add logging everytime we create a filedescriptor in the main ctdb daemon so we can spot if there are leaks. plug two leaks for filedescriptors related to when sending ARP fail and one leak when we can not parse the local address during tcp connection establish (This used to be ctdb commit ddd089810a14efe4be6e1ff3eccaa604e4913c9e) 2009-10-15 04:24:54 +04:00			`set_close_on_exec(state->fd[0]);`

lower the debug levels for the "create FD messages" so we dont fill up the logs. (This used to be ctdb commit 87146db2769c2ec494813685bf9cec0d2a6336c3) 2009-10-21 08:26:24 +04:00			`DEBUG(DEBUG_DEBUG, (__location__ " Created PIPE FD:%d for check_recovery_lock\n", state->fd[0]));`
in the recovery daemon, check that the recovery master can access the recovery lock file and verify it is not stale from a child process. This allows us to timeout the operation if the underlying filesystem has become temporarily unresponsive without causing a new recovery. (This used to be ctdb commit d177b08f1dc79534491f27726b05405d47e12e20) 2009-06-19 08:44:26 +04:00
			`talloc_set_destructor(state, check_reclock_destructor);`

			`state->te = event_add_timed(ctdb->ev, state, timeval_current_ofs(15, 0),`
			`ctdb_check_reclock_timeout, state);`
			`if (state->te == NULL) {`
			`DEBUG(DEBUG_CRIT,(__location__ " Failed to create a timed event for reclock child\n"));`
			`talloc_free(state);`
			`return -1;`
			`}`

			`state->fde = event_add_fd(ctdb->ev, state, state->fd[0],`
			`EVENT_FD_READ|EVENT_FD_AUTOCLOSE,`
			`reclock_child_handler,`
			`(void *)state);`

			`if (state->fde == NULL) {`
			`DEBUG(DEBUG_CRIT,(__location__ " Failed to create an fd event for reclock child\n"));`
			`talloc_free(state);`
			`return -1;`
			`}`

			`while (state->status == RECLOCK_CHECKING) {`
			`event_loop_once(ctdb->ev);`
			`}`

			`if (state->status == RECLOCK_FAILED) {`
			`DEBUG(DEBUG_ERR,(__location__ " reclock child failed when checking file\n"));`
			`close(ctdb->recovery_lock_fd);`
			`ctdb->recovery_lock_fd = -1;`
			`talloc_free(state);`
			`return -1;`
			`}`

			`talloc_free(state);`
			`return 0;`
			`}`
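
check_recovery_lock() pushes the potentially blocking read of the reclock file into a child process and waits for a one-byte verdict on a pipe, with a timer as the escape hatch. The sketch below shows the same pattern in plain POSIX C, under stated assumptions: it uses poll() instead of ctdb's event loop, the child exits right after reporting instead of lingering until the parent dies, and probe_fd_with_timeout(), probe_path_with_timeout() and enum probe_result are hypothetical names, not ctdb functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum probe_result { PROBE_OK, PROBE_FAILED, PROBE_TIMEOUT };

/* Read one byte from fd in a child process and wait at most timeout_ms
   for the verdict, so that blocked i/o does not stall the read itself. */
static enum probe_result probe_fd_with_timeout(int fd, int timeout_ms)
{
	int pipefd[2];
	struct pollfd pfd;
	enum probe_result result;
	pid_t child;
	char c = 0;
	int ret;

	assert(timeout_ms >= 0); /* a negative poll() timeout would block forever */

	if (pipe(pipefd) != 0) {
		return PROBE_FAILED;
	}

	child = fork();
	if (child == (pid_t)-1) {
		close(pipefd[0]);
		close(pipefd[1]);
		return PROBE_FAILED;
	}

	if (child == 0) {
		/* child: do the potentially blocking read and report back */
		char cc = PROBE_OK;
		char buf;

		close(pipefd[0]);
		if (pread(fd, &buf, 1, 0) == -1) {
			cc = PROBE_FAILED;
		}
		_exit(write(pipefd[1], &cc, 1) == 1 ? 0 : 1);
	}

	/* parent: wait for the verdict byte or for the timeout */
	close(pipefd[1]);
	pfd.fd = pipefd[0];
	pfd.events = POLLIN;

	ret = poll(&pfd, 1, timeout_ms);
	if (ret == 0) {
		result = PROBE_TIMEOUT;
	} else if (ret > 0 && read(pipefd[0], &c, 1) == 1 && c == PROBE_OK) {
		result = PROBE_OK;
	} else {
		result = PROBE_FAILED;
	}

	/* always clean up: close the pipe, kill and reap the child */
	close(pipefd[0]);
	kill(child, SIGKILL);
	waitpid(child, NULL, 0);
	return result;
}

/* Convenience wrapper: open a path read-only and probe it. */
static enum probe_result probe_path_with_timeout(const char *path, int timeout_ms)
{
	enum probe_result result;
	int fd = open(path, O_RDONLY);

	if (fd == -1) {
		return PROBE_FAILED;
	}
	result = probe_fd_with_timeout(fd, timeout_ms);
	close(fd);
	return result;
}
```

The blocking waitpid() is a simplification: a child stuck in uninterruptible i/o cannot be killed until that i/o returns, so a long-running daemon would more likely reap it asynchronously, for example from a SIGCHLD handler.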

update the recovery daemon to read the recovery lock file off the main daemon and handle when the file is changed/enabled/disabled (This used to be ctdb commit 31acc11a6389d4dd9f7b71b7cfa2f2450076f1f7) 2009-06-25 06:55:43 +04:00			`static int update_recovery_lock_file(struct ctdb_context *ctdb)`
			`{`
			`TALLOC_CTX *tmp_ctx = talloc_new(NULL);`
			`const char *reclockfile;`

			`if (ctdb_ctrl_getreclock(ctdb, CONTROL_TIMEOUT(), CTDB_CURRENT_NODE, tmp_ctx, &reclockfile) != 0) {`
			`DEBUG(DEBUG_ERR,("Failed to read reclock file from daemon\n"));`
			`talloc_free(tmp_ctx);`
			`return -1;`
			`}`

			`if (reclockfile == NULL) {`
			`if (ctdb->recovery_lock_file != NULL) {`
			`DEBUG(DEBUG_ERR,("Reclock file disabled\n"));`
			`talloc_free(ctdb->recovery_lock_file);`
			`ctdb->recovery_lock_file = NULL;`
			`if (ctdb->recovery_lock_fd != -1) {`
			`close(ctdb->recovery_lock_fd);`
			`ctdb->recovery_lock_fd = -1;`
			`}`
			`}`
			`ctdb->tunable.verify_recovery_lock = 0;`
			`talloc_free(tmp_ctx);`
			`return 0;`
			`}`

			`if (ctdb->recovery_lock_file == NULL) {`
			`ctdb->recovery_lock_file = talloc_strdup(ctdb, reclockfile);`
			`if (ctdb->recovery_lock_fd != -1) {`
			`close(ctdb->recovery_lock_fd);`
			`ctdb->recovery_lock_fd = -1;`
			`}`
			`talloc_free(tmp_ctx);`
			`return 0;`
			`}`


			`if (!strcmp(reclockfile, ctdb->recovery_lock_file)) {`
			`talloc_free(tmp_ctx);`
			`return 0;`
			`}`

			`talloc_free(ctdb->recovery_lock_file);`
			`ctdb->recovery_lock_file = talloc_strdup(ctdb, reclockfile);`
			`ctdb->tunable.verify_recovery_lock = 0;`
			`if (ctdb->recovery_lock_fd != -1) {`
			`close(ctdb->recovery_lock_fd);`
			`ctdb->recovery_lock_fd = -1;`
			`}`

			`talloc_free(tmp_ctx);`
			`return 0;`
			`}`
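
The four cases handled by `update_recovery_lock_file()` (reclock disabled, first set, unchanged, changed) can be easier to follow in isolation. The sketch below restates them against a hypothetical `struct reclock_state`, with `malloc`-based strings standing in for talloc and no control call to the main daemon. Unlike the original, it also checks the result of the string copy.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for the reclock fields of struct ctdb_context. */
struct reclock_state {
	char *path;   /* NULL when no reclock file is configured */
	int fd;       /* -1 when the reclock file is not open */
	int verify;   /* mirrors tunable.verify_recovery_lock */
};

/* Close the cached descriptor, if there is one. */
static void reclock_close(struct reclock_state *s)
{
	if (s->fd != -1) {
		close(s->fd);
		s->fd = -1;
	}
}

/*
 * Apply the path reported by the main daemon ('reported' is NULL when the
 * reclock file is disabled). Returns 0 on success, -1 on allocation failure.
 */
static int reclock_update(struct reclock_state *s, const char *reported)
{
	char *copy;

	assert(s != NULL);

	if (reported == NULL) {
		/* Case 1: disabled. Forget the path and stop verifying. */
		if (s->path != NULL) {
			free(s->path);
			s->path = NULL;
			reclock_close(s);
		}
		s->verify = 0;
		return 0;
	}

	if (s->path != NULL && strcmp(reported, s->path) == 0) {
		/* Case 3: unchanged. Nothing to do. */
		return 0;
	}

	copy = strdup(reported);
	if (copy == NULL) {
		return -1;
	}

	if (s->path != NULL) {
		/* Case 4: changed. Verification is switched off here, as in the
		   original, and the old path is released. */
		free(s->path);
		s->verify = 0;
	}
	/* Cases 2 and 4: remember the new path and drop any stale descriptor. */
	s->path = copy;
	reclock_close(s);
	return 0;
}
```

Closing the descriptor on every path change means the next lock check has to reopen the file under its new name rather than reuse a descriptor for the old one.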

make recovery daemon values tunable (This used to be ctdb commit ec29dbf2f5110428df8b97801443ba7addf61353) 2007-06-04 14:22:44 +04:00			`/*`
			`the main monitoring loop`
			`*/`
clean out some more cruft (This used to be ctdb commit ad16c5fe2748b48a6f6c79976359d56d9bed33f4) 2007-06-05 11:57:07 +04:00			`static void monitor_cluster(struct ctdb_context *ctdb)`
start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00			`{`
change recmaster from being a local variable in monitor_cluster() to be a member of the ctdb_recoverd structure (This used to be ctdb commit b7f955338f50c92374b4f559268fb3a1a516aefa) 2008-03-02 23:53:46 +03:00			`uint32_t pnn;`
start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00			`TALLOC_CTX *mem_ctx=NULL;`
change the signature for ctdb_ctrl_getnodemap() so that a timeout parameter is added. change ctdb_get_connected_nodes in the same way (This used to be ctdb commit d85f23bcf4c1230225abb2f4a053c70b68d469aa) 2007-05-04 03:01:01 +04:00			`struct ctdb_node_map *nodemap=NULL;`
redo and update how we synchronize flags across the cluster. this simplifies the code and should close a race condition between the local recovery daemon and a remote node when flags are changing. (This used to be ctdb commit 32d460b8469eb53145f04161a5d01166f9b5f09e) 2008-12-05 08:32:30 +03:00			`struct ctdb_node_map *recmaster_nodemap=NULL;`
			`struct ctdb_node_map **remote_nodemaps=NULL;`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`struct ctdb_vnn_map *vnnmap=NULL;`
			`struct ctdb_vnn_map *remote_vnnmap=NULL;`
read the current debuglevel in each loop in the recovery daemon so that we pick up when they change in the parent daemon (This used to be ctdb commit 792d5471ff0c2947b6e66183925860de27f30eaf) 2008-02-18 11:38:04 +03:00			`int32_t debug_level;`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`int i, j, ret;`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`struct ctdb_recoverd *rec;`

merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_NOTICE,("monitor_cluster starting\n"));`
added some debug lines to help track down a problem (This used to be ctdb commit 2ca31e9de179f76e392a26cc8305e2473357c760) 2007-10-18 10:27:36 +04:00
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`rec = talloc_zero(ctdb, struct ctdb_recoverd);`
			`CTDB_NO_MEMORY_FATAL(ctdb, rec);`

			`rec->ctdb = ctdb;`

use a priority time for the election data, not just the vnn (This used to be ctdb commit a691f9c5cd77194005f0d98483da94b07a48d57d) 2007-06-07 12:37:27 +04:00			`rec->priority_time = timeval_current();`

add improvements to tracking memory usage in ctdbd adn the recovery daemon and a ctdb command to pull the talloc memory map from a recovery daemon ctdb rddumpmemory (This used to be ctdb commit d23950be7406cf288f48b660c0f57a9b8d7bdd05) 2008-04-01 08:34:54 +04:00			`/* register a message port for sending memory dumps */`
			`ctdb_set_message_handler(ctdb, CTDB_SRVID_MEM_DUMP, mem_dump_handler, rec);`

implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`/* register a message port for recovery elections */`
			`ctdb_set_message_handler(ctdb, CTDB_SRVID_RECOVERY, election_handler, rec);`

reqrite the handling of flag updates across the cluster to eliminate a race between the ctdb tool and the recovery daemon both at once trying to push flag changes across the cluster. (This used to be ctdb commit a9a1156ea4e10483a4bf4265b8e9203f0af033aa) 2008-11-19 06:43:46 +03:00			`/* when nodes are disabled/enabled */`
			`ctdb_set_message_handler(ctdb, CTDB_SRVID_SET_NODE_FLAGS, monitor_handler, rec);`

			`/* when we are asked to push out a flag change */`
			`ctdb_set_message_handler(ctdb, CTDB_SRVID_PUSH_NODE_FLAGS, push_flags_handler, rec);`
added admin commands to ban/unban nodes (This used to be ctdb commit 4dad04172e7e4955b5bf6444a85b19901c9683ad) 2007-06-07 10:34:33 +04:00
added two new ctdb commands: ctdb vacuum : vacuums all the databases, deleting any zero length ctdb records ctdb repack : repacks all the databases, resulting in a perfectly packed database with no freelist entries (This used to be ctdb commit 3532119c84ab3247051ed6ba21ba3243ae2f6bf4) 2008-01-08 09:23:27 +03:00			`/* register a message port for vacuum fetch */`
			`ctdb_set_message_handler(ctdb, CTDB_SRVID_VACUUM_FETCH, vacuum_fetch_handler, rec);`
update the reclock pnn count for how many nodes are connected to the current node once every 60 seconds (This used to be ctdb commit bf1863cc9e2539b2c3e53c664b493b459ebfcc8b) 2008-02-29 05:14:47 +03:00
add a new node state : DELETED. This is used to mark nodes as being DELETED internally in ctdb so that nodes are not renumbered if / when they are removed from the nodes file. This is used to be able to do "ctdb reloadnodes" at runtime without causing nodes to be renumbered. To do this, instead of deleting a node from the nodes file, just comment it out like 1.0.0.1 #1.0.0.2 1.0.0.3 After removing 1.0.0.2 from the cluster, the remaining nodes retain their pnn's from prior to the deletion, namely 0 and 2 Any line in the nodes file that is commented out represents a DELETED pnn (This used to be ctdb commit 6a5e4fd7fa391206b463bb4e976502f3ac5bd343) 2009-06-01 08:18:34 +04:00			`/* register a message port for reloadnodes */`
			`ctdb_set_message_handler(ctdb, CTDB_SRVID_RELOAD_NODES, reload_nodes_handler, rec);`

add a new command "ctdb ipreallocate", this command will force the recovery master to perform a full ip reallocation process. the ctdb command will block until the ip reallocation has comleted (This used to be ctdb commit abad7b97fe0c066b33f6e75d0953bbed892a3216) 2009-07-02 07:00:26 +04:00			`/* register a message port for performing a takeover run */`
			`ctdb_set_message_handler(ctdb, CTDB_SRVID_TAKEOVER_RUN, ip_reallocate_handler, rec);`

add a new message to ask the recovery daemon to temporarily disable checking ip address consistency. This is useful when we are moving addresses using moveip in the cluster since otherwise if we collide with the recovery daemons own check we could cause a recovery (This used to be ctdb commit 9c63858c0b22c81eaccb9865a414af0bbb2833d4) 2009-10-06 05:11:32 +04:00			`/* register a message port for disabling the ip check for a short while */`
			`ctdb_set_message_handler(ctdb, CTDB_SRVID_DISABLE_IP_CHECK, disable_ip_check_handler, rec);`

start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00			`again:`
			`if (mem_ctx) {`
			`talloc_free(mem_ctx);`
			`mem_ctx = NULL;`
			`}`
			`mem_ctx = talloc_new(ctdb);`
			`if (!mem_ctx) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_CRIT,(__location__ " Failed to create temporary context\n"));`
start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00			`exit(-1);`
			`}`

			`/* we only check for recovery once every recover_interval seconds */`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`ctdb_wait_timeout(ctdb, ctdb->tunable.recover_interval);`
make recovery daemon values tunable (This used to be ctdb commit ec29dbf2f5110428df8b97801443ba7addf61353) 2007-06-04 14:22:44 +04:00
merge from ronnie (This used to be ctdb commit 0aa6e04438aa5ec727815689baa19544df042cf7) 2008-01-07 08:17:22 +03:00			`/* verify that the main daemon is still running */`
			`if (kill(ctdb->ctdbd_pid, 0) != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_CRIT,("CTDB daemon is no longer available. Shutting down recovery daemon\n"));`
merge from ronnie (This used to be ctdb commit 0aa6e04438aa5ec727815689baa19544df042cf7) 2008-01-07 08:17:22 +03:00			`exit(-1);`
			`}`

additional monitoring between the two daemons. we currently only monitor that the dameons are running by kill(0, pid) and verifying the the domain socket between them is ok. this is not sufficient since we can have a situation where the recovery daemon is hung. this new code monitors that the recovery daemon is operating. if the recovery hangs, we log this and shut down the main daemon (This used to be ctdb commit cd69d292292eaab3aac0e9d9fc57cb621597c63c) 2008-09-09 07:44:46 +04:00			`/* ping the local daemon to tell it we are alive */`
			`ctdb_ctrl_recd_ping(ctdb);`

make election handling much more scalable (This used to be ctdb commit 05938d462b92bd9ecb8e35f53651bded47c48675) 2007-11-13 02:27:44 +03:00			`if (rec->election_timeout) {`
			`/* an election is in progress */`
			`goto again;`
			`}`

read the current debuglevel in each loop in the recovery daemon so that we pick up when they change in the parent daemon (This used to be ctdb commit 792d5471ff0c2947b6e66183925860de27f30eaf) 2008-02-18 11:38:04 +03:00			`/* read the debug level from the parent and update locally */`
			`ret = ctdb_ctrl_get_debuglevel(ctdb, CTDB_CURRENT_NODE, &debug_level);`
			`if (ret !=0) {`
			`DEBUG(DEBUG_ERR, (__location__ " Failed to read debuglevel from parent\n"));`
			`goto again;`
			`}`
			`LogLevel = debug_level;`

move ctdb_set_culprit higher up in the file when we are the recmaster and we update the local flags for all the nodes, if one of the nodes fail to respond and give us his flags, set that node as a "culprit" as one of the first things to do in the monitor_cluster loop, check if the current culprit has caused too many (20) failures and if so ban that node. this is for the situation where a remote node may still be CONNECTED but it fails to respond to the getnodemap control causing the recovery master to loop in monitor_cluster aborting the monitoring when the node fails to respond but before anything will trigger a call to do_recovery(). If one or more of the databases or nodes are frozen at this stage, this would lead to smbd being blocked for potentially a longish time. (This used to be ctdb commit 83b0261f2cb453195b86f547d360400103a8b795) 2007-11-28 07:04:20 +03:00
			`/* We must check whether we need to ban a node here, but we want to do`
			`this as early as possible, so we don't wait until we have pulled the`
			`node map from the local node. That's why we have the hardcoded value 20.`
			`*/`
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`for (i=0; i<ctdb->num_nodes; i++) {`
			`struct ctdb_banning_state *ban_state;`

			`if (ctdb->nodes[i]->ban_state == NULL) {`
			`continue;`
			`}`
			`ban_state = (struct ctdb_banning_state *)ctdb->nodes[i]->ban_state;`
			`if (ban_state->count < 20) {`
			`continue;`
			`}`
			`DEBUG(DEBUG_NOTICE,("Node %u has caused %u recoveries recently - banning it for %u seconds\n",`
			`ctdb->nodes[i]->pnn, ban_state->count,`
			`ctdb->tunable.recovery_ban_period));`
			`ctdb_ban_node(rec, ctdb->nodes[i]->pnn, ctdb->tunable.recovery_ban_period);`
			`ban_state->count = 0;`
move ctdb_set_culprit higher up in the file when we are the recmaster and we update the local flags for all the nodes, if one of the nodes fail to respond and give us his flags, set that node as a "culprit" as one of the first things to do in the monitor_cluster loop, check if the current culprit has caused too many (20) failures and if so ban that node. this is for the situation where a remote node may still be CONNECTED but it fails to respond to the getnodemap control causing the recovery master to loop in monitor_cluster aborting the monitoring when the node fails to respond but before anything will trigger a call to do_recovery(). If one or more of the databases or nodes are frozen at this stage, this would lead to smbd being blocked for potentially a longish time. (This used to be ctdb commit 83b0261f2cb453195b86f547d360400103a8b795) 2007-11-28 07:04:20 +03:00			`}`

make recovery daemon values tunable (This used to be ctdb commit ec29dbf2f5110428df8b97801443ba7addf61353) 2007-06-04 14:22:44 +04:00			`/* get relevant tunables */`
get all the tunables at once in recovery daemon (This used to be ctdb commit 8e60be6c22aab145e68b16ede5f32f4430c2af93) 2007-06-07 12:05:25 +04:00			`ret = ctdb_ctrl_get_all_tunables(ctdb, CONTROL_TIMEOUT(), CTDB_CURRENT_NODE, &ctdb->tunable);`
			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,("Failed to get tunables - retrying\n"));`
get all the tunables at once in recovery daemon (This used to be ctdb commit 8e60be6c22aab145e68b16ede5f32f4430c2af93) 2007-06-07 12:05:25 +04:00			`goto again;`
			`}`
start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00
update the recovery daemon to read the recovery lock file off the main daemon and handle when the file is changed/enabled/disabled (This used to be ctdb commit 31acc11a6389d4dd9f7b71b7cfa2f2450076f1f7) 2009-06-25 06:55:43 +04:00			`/* get the current recovery lock file from the server */`
			`if (update_recovery_lock_file(ctdb) != 0) {`
			`DEBUG(DEBUG_ERR,("Failed to update the recovery lock file\n"));`
			`goto again;`
			`}`

			`/* Make sure that, when recovery lock verification becomes disabled,`
Dont access the reclock file at all if VerifyRecoveryLock is zero and also make sure the reclock file is closed if the variable is cleared at runtime (This used to be ctdb commit a25f4888689a0725971606163d87c39a41669292) 2009-06-25 05:41:18 +04:00			`we close the file`
			`*/`
			`if (ctdb->tunable.verify_recovery_lock == 0) {`
			`if (ctdb->recovery_lock_fd != -1) {`
			`close(ctdb->recovery_lock_fd);`
			`ctdb->recovery_lock_fd = -1;`
			`}`
			`}`

change ctdb_ctrl_getvnn to ctdb_ctrl_getpnn (This used to be ctdb commit ef47cc4cd416065c69382e4d9e76c30a0a34e42f) 2007-09-04 04:38:48 +04:00			`pnn = ctdb_ctrl_getpnn(ctdb, CONTROL_TIMEOUT(), CTDB_CURRENT_NODE);`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`if (pnn == (uint32_t)-1) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,("Failed to get local pnn - retrying\n"));`
- startup frozen, and do an initial recovery - fixed a bug in traverse - get a lock on the node list file in the recmaster recovery daemon (This used to be ctdb commit 162a5647535ad1cb3e8e5d4042a2784365fb1913) 2007-05-23 08:35:19 +04:00			`goto again;`
			`}`
start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00
add a test in the function that checks whether the cluster needs recovery or not that all active nodes are in normal mode. If we discover that some node is still in recoverymode it may indicate that a previous recovery ended prematurely and thus we should start a new recovery (This used to be ctdb commit c15517872e6c98c8c425a8d47d2b348ecb0620b0) 2007-05-06 22:41:12 +04:00			`/* get the vnnmap */`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`ret = ctdb_ctrl_getvnnmap(ctdb, CONTROL_TIMEOUT(), pnn, mem_ctx, &vnnmap);`
add a test in the function that checks whether the cluster needs recovery or not that all active nodes are in normal mode. If we discover that some node is still in recoverymode it may indicate that a previous recovery ended prematurely and thus we should start a new recovery (This used to be ctdb commit c15517872e6c98c8c425a8d47d2b348ecb0620b0) 2007-05-06 22:41:12 +04:00			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to get vnnmap from node %u\n", pnn));`
add a test in the function that checks whether the cluster needs recovery or not that all active nodes are in normal mode. If we discover that some node is still in recoverymode it may indicate that a previous recovery ended prematurely and thus we should start a new recovery (This used to be ctdb commit c15517872e6c98c8c425a8d47d2b348ecb0620b0) 2007-05-06 22:41:12 +04:00			`goto again;`
			`}`


start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00			`/* get number of nodes */`
add a new tunable : reclockpingperiod once every such interval : * the recovery master on each node will uppdate the "connected" count in the reclock count file (ctdb getreclock) * if the node thinks it is a recovery master but it detects another node that is DISCONNECTED but which still holds a lock to the reclock count file this may mean that we have a split cluster. if that other node that is DISCONNECTED but still holds the lock on hte reclock pnn count file, is MORE connected than the local node, yield the recmaster role and let the other half of the lcuster take over this add a second, last chance mechanism to detect split clusters. IF the cluster is split but GPFS is not yet split, this mechanism makes the largest half of the cluster become the active half. (This used to be ctdb commit 07af425f444531942cce8abff112c1524228d287) 2008-03-03 01:19:30 +03:00			`if (rec->nodemap) {`
			`talloc_free(rec->nodemap);`
			`rec->nodemap = NULL;`
			`nodemap=NULL;`
			`}`
			`ret = ctdb_ctrl_getnodemap(ctdb, CONTROL_TIMEOUT(), pnn, rec, &rec->nodemap);`
change the signature for ctdb_ctrl_getnodemap() so that a timeout parameter is added. change ctdb_get_connected_nodes in the same way (This used to be ctdb commit d85f23bcf4c1230225abb2f4a053c70b68d469aa) 2007-05-04 03:01:01 +04:00			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to get nodemap from node %u\n", pnn));`
change the signature for ctdb_ctrl_getnodemap() so that a timeout parameter is added. change ctdb_get_connected_nodes in the same way (This used to be ctdb commit d85f23bcf4c1230225abb2f4a053c70b68d469aa) 2007-05-04 03:01:01 +04:00			`goto again;`
			`}`
add a new tunable : reclockpingperiod once every such interval : * the recovery master on each node will uppdate the "connected" count in the reclock count file (ctdb getreclock) * if the node thinks it is a recovery master but it detects another node that is DISCONNECTED but which still holds a lock to the reclock count file this may mean that we have a split cluster. if that other node that is DISCONNECTED but still holds the lock on hte reclock pnn count file, is MORE connected than the local node, yield the recmaster role and let the other half of the lcuster take over this add a second, last chance mechanism to detect split clusters. IF the cluster is split but GPFS is not yet split, this mechanism makes the largest half of the cluster become the active half. (This used to be ctdb commit 07af425f444531942cce8abff112c1524228d287) 2008-03-03 01:19:30 +03:00			`nodemap = rec->nodemap;`
start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00
prevent a re-ban loop for single node clusters (This used to be ctdb commit b20a3369655bcba274c99091157ba7466994e848) 2008-01-04 04:11:29 +03:00			`/* check which node is the recovery master */`
change recmaster from being a local variable in monitor_cluster() to be a member of the ctdb_recoverd structure (This used to be ctdb commit b7f955338f50c92374b4f559268fb3a1a516aefa) 2008-03-02 23:53:46 +03:00			`ret = ctdb_ctrl_getrecmaster(ctdb, mem_ctx, CONTROL_TIMEOUT(), pnn, &rec->recmaster);`
prevent a re-ban loop for single node clusters (This used to be ctdb commit b20a3369655bcba274c99091157ba7466994e848) 2008-01-04 04:11:29 +03:00			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to get recmaster from node %u\n", pnn));`
prevent a re-ban loop for single node clusters (This used to be ctdb commit b20a3369655bcba274c99091157ba7466994e848) 2008-01-04 04:11:29 +03:00			`goto again;`
			`}`

add a new command "ctdb ipreallocate", this command will force the recovery master to perform a full ip reallocation process. the ctdb command will block until the ip reallocation has comleted (This used to be ctdb commit abad7b97fe0c066b33f6e75d0953bbed892a3216) 2009-07-02 07:00:26 +04:00			`/* if we are not the recmaster we can safely ignore any ip reallocate requests */`
			`if (rec->recmaster != pnn) {`
			`if (rec->ip_reallocate_ctx != NULL) {`
			`talloc_free(rec->ip_reallocate_ctx);`
			`rec->ip_reallocate_ctx = NULL;`
			`rec->reallocate_callers = NULL;`
			`}`
			`}`
			`/* if there are takeovers requested, perform them and notify the waiters */`
			`if (rec->reallocate_callers) {`
			`process_ipreallocate_requests(ctdb, rec);`
			`}`

change recmaster from being a local variable in monitor_cluster() to be a member of the ctdb_recoverd structure (This used to be ctdb commit b7f955338f50c92374b4f559268fb3a1a516aefa) 2008-03-02 23:53:46 +03:00			`if (rec->recmaster == (uint32_t)-1) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_NOTICE,(__location__ " Initial recovery master set - forcing election\n"));`
add a new tunable : reclockpingperiod once every such interval : * the recovery master on each node will uppdate the "connected" count in the reclock count file (ctdb getreclock) * if the node thinks it is a recovery master but it detects another node that is DISCONNECTED but which still holds a lock to the reclock count file this may mean that we have a split cluster. if that other node that is DISCONNECTED but still holds the lock on hte reclock pnn count file, is MORE connected than the local node, yield the recmaster role and let the other half of the lcuster take over this add a second, last chance mechanism to detect split clusters. IF the cluster is split but GPFS is not yet split, this mechanism makes the largest half of the cluster become the active half. (This used to be ctdb commit 07af425f444531942cce8abff112c1524228d287) 2008-03-03 01:19:30 +03:00			`force_election(rec, pnn, nodemap);`
prevent a re-ban loop for single node clusters (This used to be ctdb commit b20a3369655bcba274c99091157ba7466994e848) 2008-01-04 04:11:29 +03:00			`goto again;`
			`}`
recovery daemon needs to monitor when the local ctdb daemon is stopped and ensure that the databases gets frozen and the node enters recovery mode (This used to be ctdb commit 99f239f8b96c8c0a06ac8ca8b8083be96265865a) 2009-07-09 08:19:32 +04:00

			`/* if the local daemon is STOPPED, we verify that the databases are`
			`also frozen and that the recmode is set to active`
			`*/`
			`if (nodemap->nodes[pnn].flags & NODE_FLAGS_STOPPED) {`
			`ret = ctdb_ctrl_getrecmode(ctdb, mem_ctx, CONTROL_TIMEOUT(), CTDB_CURRENT_NODE, &ctdb->recovery_mode);`
			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR,(__location__ " Failed to read recmode from local node\n"));`
			`}`
			`if (ctdb->recovery_mode == CTDB_RECOVERY_NORMAL) {`
			`DEBUG(DEBUG_ERR,("Node is stopped but recovery mode is not active. Activate recovery mode and lock databases\n"));`

allow setting the recmode even when not completely frozen. we sometimes have to do this when we want to trigger a recovery (This used to be ctdb commit 46194e87e189521375b39b4ef33da2b493429fd8) 2009-10-12 06:06:16 +04:00			`ret = ctdb_ctrl_freeze_priority(ctdb, CONTROL_TIMEOUT(), CTDB_CURRENT_NODE, 1);`
recovery daemon needs to monitor when the local ctdb daemon is stopped and ensure that the databases gets frozen and the node enters recovery mode (This used to be ctdb commit 99f239f8b96c8c0a06ac8ca8b8083be96265865a) 2009-07-09 08:19:32 +04:00			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR,(__location__ " Failed to freeze node due to node being STOPPED\n"));`
			`goto again;`
			`}`
			`ret = ctdb_ctrl_setrecmode(ctdb, CONTROL_TIMEOUT(), CTDB_CURRENT_NODE, CTDB_RECOVERY_ACTIVE);`
			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR,(__location__ " Failed to activate recovery mode due to node being stopped\n"));`

			`goto again;`
			`}`
			`goto again;`
			`}`
			`}`
stopped nodes can not win a recmaster election stopped nodes must yield the recmaster role (This used to be ctdb commit b75ac1185481060ab71bd743e1e48d333d716eba) 2009-07-09 08:44:03 +04:00			`/* If the local node is stopped, verify we are not the recmaster`
			`and yield this role if so`
			`*/`
			`if ((nodemap->nodes[pnn].flags & NODE_FLAGS_STOPPED) && (rec->recmaster == pnn)) {`
			`DEBUG(DEBUG_ERR,("Local node is STOPPED. Yielding recmaster role\n"));`
			`force_election(rec, pnn, nodemap);`
			`goto again;`
			`}`
prevent a re-ban loop for single node clusters (This used to be ctdb commit b20a3369655bcba274c99091157ba7466994e848) 2008-01-04 04:11:29 +03:00
when monitoring the node from the recovery daemon, check that the recovery daemon and the ctdb daemon both agree on whether the node is banned or not and if they disagree then reban the node again after logging an error to the debug log (This used to be ctdb commit 6cd6e534493066edd4bb2c6ae5be0e9a9d495aa0) 2007-11-23 04:41:29 +03:00			`/* check that we (recovery daemon) and the local ctdb daemon`
			`agree on whether we are banned or not`
			`*/`
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`//qqq`
when monitoring the node from the recovery daemon, check that the recovery daemon and the ctdb daemon both agree on whether the node is banned or not and if they disagree then reban the node again after logging an error to the debug log (This used to be ctdb commit 6cd6e534493066edd4bb2c6ae5be0e9a9d495aa0) 2007-11-23 04:41:29 +03:00
- catch ESTALE in the recovery lock by trying a read() - priortise nodes that are unbanned and healthy in the election (This used to be ctdb commit 929feb475dfdf7283f0e99b50b179e1c91d3a39f) 2007-10-05 07:28:21 +04:00			`/* remember our own node flags */`
			`rec->node_flags = nodemap->nodes[pnn].flags;`
add a test in the function that checks whether the cluster needs recovery or not that all active nodes are in normal mode. If we discover that some node is still in recoverymode it may indicate that a previous recovery ended prematurely and thus we should start a new recovery (This used to be ctdb commit c15517872e6c98c8c425a8d47d2b348ecb0620b0) 2007-05-06 22:41:12 +04:00
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`/* count how many active nodes there are */`
add a num_connected field to the rec structure that holds the number of connected nodes num_active only contains the number of active nodes and would thus not count banned nodes (This used to be ctdb commit 06d3ce470766ef0b60d68ccd84de5437146cc147) 2008-03-03 02:24:17 +03:00			`rec->num_active = 0;`
			`rec->num_connected = 0;`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`for (i=0; i<nodemap->num; i++) {`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`if (!(nodemap->nodes[i].flags & NODE_FLAGS_INACTIVE)) {`
store the num_active variable (number of connected/active nodes) inside the rec structure and avoid passing this as an extra parameter to do_recovery() (This used to be ctdb commit 8bb229aa3b4bd41e48d4e4e2e148d8680c8ba436) 2008-02-29 04:55:20 +03:00			`rec->num_active++;`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`}`
add a num_connected field to the rec structure that holds the number of connected nodes num_active only contains the number of active nodes and would thus not count banned nodes (This used to be ctdb commit 06d3ce470766ef0b60d68ccd84de5437146cc147) 2008-03-03 02:24:17 +03:00			`if (!(nodemap->nodes[i].flags & NODE_FLAGS_DISCONNECTED)) {`
			`rec->num_connected++;`
			`}`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`}`
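The counting loop above can be sketched in isolation. This is a minimal illustration, not the ctdb implementation: the `EX_FLAG_*` bit values, `struct ex_counts`, and `ex_count_nodes()` are hypothetical names invented here, and the composition of the "inactive" mask is an assumption. The point it shows is that a banned or stopped node still counts as connected but not as active.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag bits. These values are assumptions made for the
 * sketch and are not copied from the ctdb headers. */
#define EX_FLAG_DISCONNECTED 0x01u
#define EX_FLAG_BANNED       0x08u
#define EX_FLAG_DELETED      0x10u
#define EX_FLAG_STOPPED      0x20u

/* Assumed: a node is inactive if any of these bits is set. */
#define EX_FLAG_INACTIVE (EX_FLAG_DISCONNECTED | EX_FLAG_BANNED | \
			  EX_FLAG_DELETED | EX_FLAG_STOPPED)

/* Hypothetical result type holding both counters. */
struct ex_counts {
	unsigned int num_active;
	unsigned int num_connected;
};

/* Walk the per-node flag words once and derive both counters. */
static struct ex_counts ex_count_nodes(const uint32_t *flags, size_t num)
{
	struct ex_counts c = { 0, 0 };
	size_t i;

	for (i = 0; i < num; i++) {
		if (!(flags[i] & EX_FLAG_INACTIVE)) {
			c.num_active++;
		}
		if (!(flags[i] & EX_FLAG_DISCONNECTED)) {
			c.num_connected++;
		}
	}
	return c;
}
```

With four nodes whose flags are healthy, banned, disconnected, and stopped, the sketch reports one active node and three connected nodes, which is why the two counters are tracked separately.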


recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`/* verify that the recmaster node is still active */`
			`for (j=0; j<nodemap->num; j++) {`
change recmaster from being a local variable in monitor_cluster() to be a member of the ctdb_recoverd structure (This used to be ctdb commit b7f955338f50c92374b4f559268fb3a1a516aefa) 2008-03-02 23:53:46 +03:00			`if (nodemap->nodes[j].pnn==rec->recmaster) {`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`break;`
			`}`
setup the random number generator a bit better (This used to be ctdb commit 708585eb0ed31b0df6543a1d7a20b82e751877c2) 2007-05-10 07:10:23 +04:00			`}`
- startup frozen, and do an initial recovery - fixed a bug in traverse - get a lock on the node list file in the recmaster recovery daemon (This used to be ctdb commit 162a5647535ad1cb3e8e5d4042a2784365fb1913) 2007-05-23 08:35:19 +04:00
			`if (j == nodemap->num) {`
change recmaster from being a local variable in monitor_cluster() to be a member of the ctdb_recoverd structure (This used to be ctdb commit b7f955338f50c92374b4f559268fb3a1a516aefa) 2008-03-02 23:53:46 +03:00			`DEBUG(DEBUG_ERR, ("Recmaster node %u not in list. Force reelection\n", rec->recmaster));`
add a new tunable : reclockpingperiod once every such interval : * the recovery master on each node will uppdate the "connected" count in the reclock count file (ctdb getreclock) * if the node thinks it is a recovery master but it detects another node that is DISCONNECTED but which still holds a lock to the reclock count file this may mean that we have a split cluster. if that other node that is DISCONNECTED but still holds the lock on hte reclock pnn count file, is MORE connected than the local node, yield the recmaster role and let the other half of the lcuster take over this add a second, last chance mechanism to detect split clusters. IF the cluster is split but GPFS is not yet split, this mechanism makes the largest half of the cluster become the active half. (This used to be ctdb commit 07af425f444531942cce8abff112c1524228d287) 2008-03-03 01:19:30 +03:00			`force_election(rec, pnn, nodemap);`
- startup frozen, and do an initial recovery - fixed a bug in traverse - get a lock on the node list file in the recmaster recovery daemon (This used to be ctdb commit 162a5647535ad1cb3e8e5d4042a2784365fb1913) 2007-05-23 08:35:19 +04:00			`goto again;`
			`}`

first check that recovery master is connected (we know this from our own flags) then pull the flags off recovery master before checking if it is banned (This used to be ctdb commit 94c1d234e57a40eda2d8b892dd9fbe1ffc4b3433) 2007-10-11 01:10:17 +04:00			`/* if recovery master is disconnected we must elect a new recmaster */`
			`if (nodemap->nodes[j].flags & NODE_FLAGS_DISCONNECTED) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_NOTICE, ("Recmaster node %u is disconnected. Force reelection\n", nodemap->nodes[j].pnn));`
add a new tunable : reclockpingperiod once every such interval : * the recovery master on each node will uppdate the "connected" count in the reclock count file (ctdb getreclock) * if the node thinks it is a recovery master but it detects another node that is DISCONNECTED but which still holds a lock to the reclock count file this may mean that we have a split cluster. if that other node that is DISCONNECTED but still holds the lock on hte reclock pnn count file, is MORE connected than the local node, yield the recmaster role and let the other half of the lcuster take over this add a second, last chance mechanism to detect split clusters. IF the cluster is split but GPFS is not yet split, this mechanism makes the largest half of the cluster become the active half. (This used to be ctdb commit 07af425f444531942cce8abff112c1524228d287) 2008-03-03 01:19:30 +03:00			`force_election(rec, pnn, nodemap);`
first check that recovery master is connected (we know this from our own flags) then pull the flags off recovery master before checking if it is banned (This used to be ctdb commit 94c1d234e57a40eda2d8b892dd9fbe1ffc4b3433) 2007-10-11 01:10:17 +04:00			`goto again;`
			`}`
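The two checks above rely on a common C idiom: after a search loop that exits with `break`, the index equals the element count only when nothing matched. A minimal sketch of that idiom follows, under stated assumptions: `struct ex_node`, `EX_NODE_DISCONNECTED`, and `ex_check_recmaster()` are hypothetical names, and the flag value is illustrative only.

```c
#include <stddef.h>
#include <stdint.h>

#define EX_NODE_DISCONNECTED 0x01u	/* assumed bit value, for illustration */

/* Hypothetical, reduced node entry: only the fields the check needs. */
struct ex_node {
	uint32_t pnn;
	uint32_t flags;
};

enum ex_recmaster_state {
	EX_RECMASTER_OK,
	EX_RECMASTER_MISSING,		/* pnn not present in the node list */
	EX_RECMASTER_DISCONNECTED	/* present, but flagged disconnected */
};

static enum ex_recmaster_state ex_check_recmaster(const struct ex_node *nodes,
						  size_t num,
						  uint32_t recmaster)
{
	size_t j;

	/* Stop at the first entry whose pnn matches the recmaster. */
	for (j = 0; j < num; j++) {
		if (nodes[j].pnn == recmaster) {
			break;
		}
	}

	/* j == num means the loop ran to completion without a match. */
	if (j == num) {
		return EX_RECMASTER_MISSING;
	}

	/* j now indexes the recmaster entry, so its flags can be tested. */
	if (nodes[j].flags & EX_NODE_DISCONNECTED) {
		return EX_RECMASTER_DISCONNECTED;
	}
	return EX_RECMASTER_OK;
}
```

In the monitor loop, both non-OK outcomes lead to `force_election()` followed by `goto again`. The sketch only classifies the state and leaves the reaction to the caller.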

merge from ronnie (This used to be ctdb commit d18712caba11855010be52f90bac656683076676) 2007-10-15 08:17:49 +04:00			`/* grab the nodemap from the recovery master to check if it is banned */`
simplify election handling make sure we read and update the flags from all remote nodes before we reach the first codepath that can call do_recovery() since during do_recovery() we need to know what the flags are. (This used to be ctdb commit e85f3806483ea420559d449e0e4d81bec996740f) 2007-10-11 00:16:36 +04:00			`ret = ctdb_ctrl_getnodemap(ctdb, CONTROL_TIMEOUT(), nodemap->nodes[j].pnn,`
redo and update how we synchronize flags across the cluster. this simplifies the code and should close a race condition between the local recovery daemon and a remote node when flags are changing. (This used to be ctdb commit 32d460b8469eb53145f04161a5d01166f9b5f09e) 2008-12-05 08:32:30 +03:00			`mem_ctx, &recmaster_nodemap);`
simplify election handling make sure we read and update the flags from all remote nodes before we reach the first codepath that can call do_recovery() since during do_recovery() we need to know what the flags are. (This used to be ctdb commit e85f3806483ea420559d449e0e4d81bec996740f) 2007-10-11 00:16:36 +04:00			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to get nodemap from recovery master %u\n",`
simplify election handling make sure we read and update the flags from all remote nodes before we reach the first codepath that can call do_recovery() since during do_recovery() we need to know what the flags are. (This used to be ctdb commit e85f3806483ea420559d449e0e4d81bec996740f) 2007-10-11 00:16:36 +04:00			`nodemap->nodes[j].pnn));`
			`goto again;`
			`}`


redo and update how we synchronize flags across the cluster. this simplifies the code and should close a race condition between the local recovery daemon and a remote node when flags are changing. (This used to be ctdb commit 32d460b8469eb53145f04161a5d01166f9b5f09e) 2008-12-05 08:32:30 +03:00			`if (recmaster_nodemap->nodes[j].flags & NODE_FLAGS_INACTIVE) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_NOTICE, ("Recmaster node %u no longer available. Force reelection\n", nodemap->nodes[j].pnn));`
add a new tunable : reclockpingperiod once every such interval : * the recovery master on each node will uppdate the "connected" count in the reclock count file (ctdb getreclock) * if the node thinks it is a recovery master but it detects another node that is DISCONNECTED but which still holds a lock to the reclock count file this may mean that we have a split cluster. if that other node that is DISCONNECTED but still holds the lock on hte reclock pnn count file, is MORE connected than the local node, yield the recmaster role and let the other half of the lcuster take over this add a second, last chance mechanism to detect split clusters. IF the cluster is split but GPFS is not yet split, this mechanism makes the largest half of the cluster become the active half. (This used to be ctdb commit 07af425f444531942cce8abff112c1524228d287) 2008-03-03 01:19:30 +03:00			`force_election(rec, pnn, nodemap);`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`goto again;`
			`}`
let each node verify that they have a correct assignment of public ip addresses (i.e. htey hold those they should hold and they dont hold any of those they shouldnt hold) if an inconsistency is found, mark the local node as recovery mode active and wait for the recovery master to trigger a full blown recovery (This used to be ctdb commit 55a5bfc8244c5b9cdda3f11992f384f00566b5dc) 2007-09-14 04:16:36 +04:00
verify that the recmaster has the correct flags for us and if not tell the recmaster what the flags should be (This used to be ctdb commit 3387597926ad71e4140cc504b828486d99a3ec8e) 2008-06-26 05:08:09 +04:00
track both when we last started and ended a recovery. make ctdb uptime print how long the recovery took in the recovery daemon when we check that the public ip address allocation on the local node is correct (we have the ips we should have and we dont have any we shouldnt have) use ctdb uptime and check the recovery start/stop times and make sure we dont check for ip allocation inconsistencies during a recovery where the ip address allocation is in flux. (This used to be ctdb commit f86551580349b7f662f9a07e4eb0c1189e38e429) 2008-07-02 07:55:59 +04:00			`/* verify that we have all ip addresses we should have and we don't`
			`* have addresses we shouldn't have.`
			`*/`
inew version 1.0.66 ddwq (This used to be ctdb commit 499a01fece2a5f24f1b2943cf3dc6e9a3a8ca3b5) 2008-11-24 11:06:02 +03:00			`if (ctdb->do_checkpublicip) {`
add a new message to ask the recovery daemon to temporarily disable checking ip address consistency. This is useful when we are moving addresses using moveip in the cluster since otherwise if we collide with the recovery daemons own check we could cause a recovery (This used to be ctdb commit 9c63858c0b22c81eaccb9865a414af0bbb2833d4) 2009-10-06 05:11:32 +04:00			`if (rec->ip_check_disable_ctx == NULL) {`
when we detect a ip-allocation mismatch, just force a new ip reassignment instead of a full blown recovery (This used to be ctdb commit 4f50aa8bb8be544058523f2f544109a26c2b3b51) 2009-12-01 08:06:59 +03:00			`if (verify_ip_allocation(ctdb, rec, pnn) != 0) {`
add a new message to ask the recovery daemon to temporarily disable checking ip address consistency. This is useful when we are moving addresses using moveip in the cluster since otherwise if we collide with the recovery daemons own check we could cause a recovery (This used to be ctdb commit 9c63858c0b22c81eaccb9865a414af0bbb2833d4) 2009-10-06 05:11:32 +04:00			`DEBUG(DEBUG_ERR, (__location__ " Public IPs were inconsistent.\n"));`
			`}`
inew version 1.0.66 ddwq (This used to be ctdb commit 499a01fece2a5f24f1b2943cf3dc6e9a3a8ca3b5) 2008-11-24 11:06:02 +03:00			`}`
remove some unnessecary tests if ->vnn is null or not (This used to be ctdb commit f0169ac8166a19d65ce254496e21d095aed87c2f) 2008-05-15 07:28:19 +04:00			`}`
track both when we last started and ended a recovery. make ctdb uptime print how long the recovery took in the recovery daemon when we check that the public ip address allocation on the local node is correct (we have the ips we should have and we dont have any we shouldnt have) use ctdb uptime and check the recovery start/stop times and make sure we dont check for ip allocation inconsistencies during a recovery where the ip address allocation is in flux. (This used to be ctdb commit f86551580349b7f662f9a07e4eb0c1189e38e429) 2008-07-02 07:55:59 +04:00
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00
			`/* if we are not the recmaster then we do not need to check`
			`if recovery is needed`
			`*/`
change recmaster from being a local variable in monitor_cluster() to be a member of the ctdb_recoverd structure (This used to be ctdb commit b7f955338f50c92374b4f559268fb3a1a516aefa) 2008-03-02 23:53:46 +03:00			`if (pnn != rec->recmaster) {`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`goto again;`
			`}`

simplify election handling make sure we read and update the flags from all remote nodes before we reach the first codepath that can call do_recovery() since during do_recovery() we need to know what the flags are. (This used to be ctdb commit e85f3806483ea420559d449e0e4d81bec996740f) 2007-10-11 00:16:36 +04:00
sync flags between nodes in monitor loop in recmaster (This used to be ctdb commit 6eef86e06388fc53a1212f1e2783ae174c6cd210) 2007-10-15 08:28:51 +04:00			`/* ensure our local copies of flags are right */`
dont manipulate ctdb->monitoring_mode directly from the SET_MON_MODE control, instead call ctdb_start/stop_monitoring() ctdb_stop_monitoring() dont allocate a new monitoring context, leave it NULL. Also set the monitoring_mode in this function so that ctdb_stop/start_monitoring() and ->monitoring_mode are kept in sync. Add a debug message to log that we have stopped monitoring. ctdb_start_monitoring() check whether monitoring is already active and make the function idempotent. Create the monitoring context when monitoring is started. Update ->monitoring_mode once the monitoring has been started. Add a debug message to log that we have started monitoring. When we temporarily stop monitoring while running an event script, restart monitoring after the event script wrapper returns instead of in the event script callback. Let monitoring_mode start out as DISABLED and let it be enabled once we call ctdb_start_monitoring. dont check for MONITORING_DISABLED in check_fore_dead_nodes(). If monitoring is disabled, this event handler will not be called. (This used to be ctdb commit 3a93ae8bdcffb1adbd6243844f3058fc742f76aa) 2007-11-30 00:44:34 +03:00			`ret = update_local_flags(rec, nodemap);`
If update_local_flags() finds that a node has changed its BANNED status so it differs from what the local ctdb daemon on the recovery master thinks it should be we should call for a re-election (This used to be ctdb commit 21ad6039c31ef5cc0e40a35a41220f91943947cb) 2007-11-23 03:53:06 +03:00			`if (ret == MONITOR_ELECTION_NEEDED) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_NOTICE,("update_local_flags() called for a re-election.\n"));`
add a new tunable : reclockpingperiod once every such interval : * the recovery master on each node will uppdate the "connected" count in the reclock count file (ctdb getreclock) * if the node thinks it is a recovery master but it detects another node that is DISCONNECTED but which still holds a lock to the reclock count file this may mean that we have a split cluster. if that other node that is DISCONNECTED but still holds the lock on hte reclock pnn count file, is MORE connected than the local node, yield the recmaster role and let the other half of the lcuster take over this add a second, last chance mechanism to detect split clusters. IF the cluster is split but GPFS is not yet split, this mechanism makes the largest half of the cluster become the active half. (This used to be ctdb commit 07af425f444531942cce8abff112c1524228d287) 2008-03-03 01:19:30 +03:00			`force_election(rec, pnn, nodemap);`
If update_local_flags() finds that a node has changed its BANNED status so it differs from what the local ctdb daemon on the recovery master thinks it should be we should call for a re-election (This used to be ctdb commit 21ad6039c31ef5cc0e40a35a41220f91943947cb) 2007-11-23 03:53:06 +03:00			`goto again;`
			`}`
			`if (ret != MONITOR_OK) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,("Unable to update local flags\n"));`
sync flags between nodes in monitor loop in recmaster (This used to be ctdb commit 6eef86e06388fc53a1212f1e2783ae174c6cd210) 2007-10-15 08:28:51 +04:00			`goto again;`
simplify election handling make sure we read and update the flags from all remote nodes before we reach the first codepath that can call do_recovery() since during do_recovery() we need to know what the flags are. (This used to be ctdb commit e85f3806483ea420559d449e0e4d81bec996740f) 2007-10-11 00:16:36 +04:00			`}`

allow different nodes in the cluster to use different public_addresses files so that we can partition the cluster into different subsets of nodes which each serve a different subset of the public addresses (This used to be ctdb commit 889e0fe69e4c88c6166282b12843b8d9727552d6) 2007-09-04 17:15:23 +04:00			`/* update the list of public ips that a node can handle for`
			`all connected nodes`
			`*/`
when we reload the nodes file, we may need to reload the nodes file inside the recovery daemon as well. (This used to be ctdb commit 82fd2b6b5cd8e988c38fa6b74121a048757bdeef) 2008-10-17 14:18:06 +04:00			`if (ctdb->num_nodes != nodemap->num) {`
			`DEBUG(DEBUG_ERR, (__location__ " ctdb->num_nodes (%d) != nodemap->num (%d) reloading nodes file\n", ctdb->num_nodes, nodemap->num));`
			`reload_nodes_file(ctdb);`
			`goto again;`
			`}`
allow different nodes in the cluster to use different public_addresses files so that we can partition the cluster into different subsets of nodes which each serve a different subset of the public addresses (This used to be ctdb commit 889e0fe69e4c88c6166282b12843b8d9727552d6) 2007-09-04 17:15:23 +04:00			`for (j=0; j<nodemap->num; j++) {`
			`/* release any existing data */`
			`if (ctdb->nodes[j]->public_ips) {`
			`talloc_free(ctdb->nodes[j]->public_ips);`
			`ctdb->nodes[j]->public_ips = NULL;`
			`}`
add a new node state : DELETED. This is used to mark nodes as being DELETED internally in ctdb so that nodes are not renumbered if / when they are removed from the nodes file. This is used to be able to do "ctdb reloadnodes" at runtime without causing nodes to be renumbered. To do this, instead of deleting a node from the nodes file, just comment it out like 1.0.0.1 #1.0.0.2 1.0.0.3 After removing 1.0.0.2 from the cluster, the remaining nodes retain their pnn's from prior to the deletion, namely 0 and 2 Any line in the nodes file that is commented out represents a DELETED pnn (This used to be ctdb commit 6a5e4fd7fa391206b463bb4e976502f3ac5bd343) 2009-06-01 08:18:34 +04:00
			`if (nodemap->nodes[j].flags & NODE_FLAGS_INACTIVE) {`
			`continue;`
			`}`

allow different nodes in the cluster to use different public_addresses files so that we can partition the cluster into different subsets of nodes which each serve a different subset of the public addresses (This used to be ctdb commit 889e0fe69e4c88c6166282b12843b8d9727552d6) 2007-09-04 17:15:23 +04:00			`/* grab a new shiny list of public ips from the node */`
			`if (ctdb_ctrl_get_public_ips(ctdb, CONTROL_TIMEOUT(),`
			`ctdb->nodes[j]->pnn,`
			`ctdb->nodes,`
			`&ctdb->nodes[j]->public_ips)) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,("Failed to read public ips from node : %u\n",`
allow different nodes in the cluster to use different public_addresses files so that we can partition the cluster into different subsets of nodes which each serve a different subset of the public addresses (This used to be ctdb commit 889e0fe69e4c88c6166282b12843b8d9727552d6) 2007-09-04 17:15:23 +04:00			`ctdb->nodes[j]->pnn));`
			`goto again;`
			`}`
			`}`


recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`/* verify that all active nodes agree that we are the recmaster */`
when a node disgrees with us re who is recmaster make it mark that node as a lcuprit so it eventually gets banned (This used to be ctdb commit eff3f326f8ce6070c9f3c430cd14d1b71a8db220) 2008-04-21 18:56:27 +04:00			`switch (verify_recmaster(rec, nodemap, pnn)) {`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`case MONITOR_RECOVERY_NEEDED:`
			`/* can not happen */`
			`goto again;`
			`case MONITOR_ELECTION_NEEDED:`
add a new tunable : reclockpingperiod once every such interval : * the recovery master on each node will uppdate the "connected" count in the reclock count file (ctdb getreclock) * if the node thinks it is a recovery master but it detects another node that is DISCONNECTED but which still holds a lock to the reclock count file this may mean that we have a split cluster. if that other node that is DISCONNECTED but still holds the lock on hte reclock pnn count file, is MORE connected than the local node, yield the recmaster role and let the other half of the lcuster take over this add a second, last chance mechanism to detect split clusters. IF the cluster is split but GPFS is not yet split, this mechanism makes the largest half of the cluster become the active half. (This used to be ctdb commit 07af425f444531942cce8abff112c1524228d287) 2008-03-03 01:19:30 +03:00			`force_election(rec, pnn, nodemap);`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`goto again;`
			`case MONITOR_OK:`
			`break;`
			`case MONITOR_FAILED:`
			`goto again;`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`}`
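The switch above maps each result of the recmaster check to one of three reactions: carry on, restart the loop, or force an election and then restart. A minimal sketch of that mapping as a pure function is shown below. The `ex_`-prefixed enums and `ex_recmaster_action()` are hypothetical, and the enumerator order and values are assumptions rather than the ones ctdb defines.

```c
/* Hypothetical mirror of the monitor result codes; order is assumed. */
enum ex_monitor_result {
	EX_MONITOR_OK,
	EX_MONITOR_RECOVERY_NEEDED,
	EX_MONITOR_ELECTION_NEEDED,
	EX_MONITOR_FAILED
};

/* What the caller should do next. */
enum ex_action {
	EX_CONTINUE,		/* fall out of the switch and keep checking */
	EX_RESTART,		/* equivalent of "goto again" */
	EX_ELECT_THEN_RESTART	/* force an election, then "goto again" */
};

static enum ex_action ex_recmaster_action(enum ex_monitor_result r)
{
	switch (r) {
	case EX_MONITOR_OK:
		return EX_CONTINUE;
	case EX_MONITOR_ELECTION_NEEDED:
		return EX_ELECT_THEN_RESTART;
	case EX_MONITOR_RECOVERY_NEEDED:
		/* not expected from this check; treated like a failure */
	case EX_MONITOR_FAILED:
		return EX_RESTART;
	}
	/* Defensive default for values outside the enum. */
	return EX_RESTART;
}
```

Listing every enumerator explicitly, as the original switch does, lets the compiler warn when a new result code is added without a matching case.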


- merge from ronnie - add a flag to check that recovery completed correctly. If not, re-trigger it in monitoring (This used to be ctdb commit d5ed941d9bab4af30d8b5f9b77bdf43d9218d69b) 2007-09-14 03:49:12 +04:00			`if (rec->need_recovery) {`
			`/* a previous recovery didn't finish */`
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`do_recovery(rec, mem_ctx, pnn, nodemap, vnnmap);`
- merge from ronnie - add a flag to check that recovery completed correctly. If not, re-trigger it in monitoring (This used to be ctdb commit d5ed941d9bab4af30d8b5f9b77bdf43d9218d69b) 2007-09-14 03:49:12 +04:00			`goto again;`
			`}`

add a test in the function that checks whether the cluster needs recovery or not that all active nodes are in normal mode. If we discover that some node is still in recoverymode it may indicate that a previous recovery ended prematurely and thus we should start a new recovery (This used to be ctdb commit c15517872e6c98c8c425a8d47d2b348ecb0620b0) 2007-05-06 22:41:12 +04:00			`/* verify that all active nodes are in normal mode`
			`and not in recovery mode`
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`*/`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`switch (verify_recmode(ctdb, nodemap)) {`
break checking that the recoverymode on all nodes are ok out into its own function (This used to be ctdb commit 813cf9a252af96da24122b80f24aabeed2911939) 2007-08-23 07:48:39 +04:00			`case MONITOR_RECOVERY_NEEDED:`
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`do_recovery(rec, mem_ctx, pnn, nodemap, vnnmap);`
break checking that the recoverymode on all nodes are ok out into its own function (This used to be ctdb commit 813cf9a252af96da24122b80f24aabeed2911939) 2007-08-23 07:48:39 +04:00			`goto again;`
			`case MONITOR_FAILED:`
			`goto again;`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`case MONITOR_ELECTION_NEEDED:`
			`/* can not happen; falls through to MONITOR_OK */`
break checking that the recoverymode on all nodes are ok out into its own function (This used to be ctdb commit 813cf9a252af96da24122b80f24aabeed2911939) 2007-08-23 07:48:39 +04:00			`case MONITOR_OK:`
			`break;`
add a test in the function that checks whether the cluster needs recovery or not that all active nodes are in normal mode. If we discover that some node is still in recoverymode it may indicate that a previous recovery ended prematurely and thus we should start a new recovery (This used to be ctdb commit c15517872e6c98c8c425a8d47d2b348ecb0620b0) 2007-05-06 22:41:12 +04:00			`}`


Dont access the reclock file at all if VerifyRecoveryLock is zero and also make sure the reclock file is closed if the variable is cleared at runtime (This used to be ctdb commit a25f4888689a0725971606163d87c39a41669292) 2009-06-25 05:41:18 +04:00			`if (ctdb->tunable.verify_recovery_lock != 0) {`
			`/* we should have the reclock - check it's not stale */`
			`ret = check_recovery_lock(ctdb);`
			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR,("Failed check_recovery_lock. Force a recovery\n"));`
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`ctdb_set_culprit(rec, ctdb->pnn);`
			`do_recovery(rec, mem_ctx, pnn, nodemap, vnnmap);`
Dont access the reclock file at all if VerifyRecoveryLock is zero and also make sure the reclock file is closed if the variable is cleared at runtime (This used to be ctdb commit a25f4888689a0725971606163d87c39a41669292) 2009-06-25 05:41:18 +04:00			`goto again;`
			`}`
- catch ESTALE in the recovery lock by trying a read() - priortise nodes that are unbanned and healthy in the election (This used to be ctdb commit 929feb475dfdf7283f0e99b50b179e1c91d3a39f) 2007-10-05 07:28:21 +04:00			`}`
break checking that the recoverymode on all nodes are ok out into its own function (This used to be ctdb commit 813cf9a252af96da24122b80f24aabeed2911939) 2007-08-23 07:48:39 +04:00
redo and update how we synchronize flags across the cluster. this simplifies the code and should close a race condition between the local recovery daemon and a remote node when flags are changing. (This used to be ctdb commit 32d460b8469eb53145f04161a5d01166f9b5f09e) 2008-12-05 08:32:30 +03:00			`/* get the nodemap for all active remote nodes`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`*/`
update to the flags handling make sure to abort the monitoring and restart if we failed to get the nodemap from a remote node (This used to be ctdb commit 4eac0214e732e6c2f867d66ec71d4406680dbb94) 2008-12-09 02:45:14 +03:00			`remote_nodemaps = talloc_array(mem_ctx, struct ctdb_node_map *, nodemap->num);`
			`if (remote_nodemaps == NULL) {`
			`DEBUG(DEBUG_ERR, (__location__ " failed to allocate remote nodemap array\n"));`
			`goto again;`
			`}`
			`for(i=0; i<nodemap->num; i++) {`
			`remote_nodemaps[i] = NULL;`
			`}`
			`if (get_remote_nodemaps(ctdb, mem_ctx, nodemap, remote_nodemaps) != 0) {`
redo and update how we synchronize flags across the cluster. this simplifies the code and should close a race condition between the local recovery daemon and a remote node when flags are changing. (This used to be ctdb commit 32d460b8469eb53145f04161a5d01166f9b5f09e) 2008-12-05 08:32:30 +03:00			`DEBUG(DEBUG_ERR,(__location__ " Failed to read remote nodemaps\n"));`
			`goto again;`
			`}`

			`/* verify that all other nodes have the same nodemap as we have`
			`*/`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`for (j=0; j<nodemap->num; j++) {`
We dont need to verify the nodemap on remote nodes that are banned (This used to be ctdb commit 7f8f9385deee6eff2b7303147bc6412bbdc122df) 2009-04-06 06:00:22 +04:00			`if (nodemap->nodes[j].flags & NODE_FLAGS_INACTIVE) {`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`continue;`
			`}`

update to the flags handling make sure to abort the monitoring and restart if we failed to get the nodemap from a remote node (This used to be ctdb commit 4eac0214e732e6c2f867d66ec71d4406680dbb94) 2008-12-09 02:45:14 +03:00			`if (remote_nodemaps[j] == NULL) {`
			`DEBUG(DEBUG_ERR,(__location__ " Did not get a remote nodemap for node %d, restarting monitoring\n", j));`
if we cant pull the remote nodemap off a node we should mark it as a culprit so it eventually becomes banned. (This used to be ctdb commit 0889ae3c237bdb3bd72d45f2f64f5e5d8420870c) 2009-04-02 07:50:43 +04:00			`ctdb_set_culprit(rec, j);`

update to the flags handling make sure to abort the monitoring and restart if we failed to get the nodemap from a remote node (This used to be ctdb commit 4eac0214e732e6c2f867d66ec71d4406680dbb94) 2008-12-09 02:45:14 +03:00			`goto again;`
			`}`

redo and update how we synchronize flags across the cluster. this simplifies the code and should close a race condition between the local recovery daemon and a remote node when flags are changing. (This used to be ctdb commit 32d460b8469eb53145f04161a5d01166f9b5f09e) 2008-12-05 08:32:30 +03:00			`/* if the nodes disagree on how many nodes there are`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`then this is a good reason to try recovery`
			`*/`
redo and update how we synchronize flags across the cluster. this simplifies the code and should close a race condition between the local recovery daemon and a remote node when flags are changing. (This used to be ctdb commit 32d460b8469eb53145f04161a5d01166f9b5f09e) 2008-12-05 08:32:30 +03:00			`if (remote_nodemaps[j]->num != nodemap->num) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Remote node:%u has different node count. %u vs %u of the local node\n",`
redo and update how we synchronize flags across the cluster. this simplifies the code and should close a race condition between the local recovery daemon and a remote node when flags are changing. (This used to be ctdb commit 32d460b8469eb53145f04161a5d01166f9b5f09e) 2008-12-05 08:32:30 +03:00			`nodemap->nodes[j].pnn, remote_nodemaps[j]->num, nodemap->num));`
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`ctdb_set_culprit(rec, nodemap->nodes[j].pnn);`
			`do_recovery(rec, mem_ctx, pnn, nodemap, vnnmap);`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`goto again;`
			`}`

			`/* if the nodes disagree on which nodes exist and are`
			`active, then that is also a good reason to do recovery`
			`*/`
			`for (i=0;i<nodemap->num;i++) {`
redo and update how we synchronize flags across the cluster. this simplifies the code and should close a race condition between the local recovery daemon and a remote node when flags are changing. (This used to be ctdb commit 32d460b8469eb53145f04161a5d01166f9b5f09e) 2008-12-05 08:32:30 +03:00			`if (remote_nodemaps[j]->nodes[i].pnn != nodemap->nodes[i].pnn) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Remote node:%u has different nodemap pnn for %d (%u vs %u).\n",`
change how we do public addresses and takeover so that we can have multiple public addresses spread across multiple interfaces on each node. this is a massive patch since we have previously made the assumtion that we only have one public address per node. get rid of the public_interface argument. the public addresses file now explicitely lists which interface the address belongs to (This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8) 2007-09-04 03:50:07 +04:00			`nodemap->nodes[j].pnn, i,`
redo and update how we synchronize flags across the cluster. this simplifies the code and should close a race condition between the local recovery daemon and a remote node when flags are changing. (This used to be ctdb commit 32d460b8469eb53145f04161a5d01166f9b5f09e) 2008-12-05 08:32:30 +03:00			`remote_nodemaps[j]->nodes[i].pnn, nodemap->nodes[i].pnn));`
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`ctdb_set_culprit(rec, nodemap->nodes[j].pnn);`
store the num_active variable (number of connected/active nodes) inside the rec structure and avoid passing this as an extra parameter to do_recovery() (This used to be ctdb commit 8bb229aa3b4bd41e48d4e4e2e148d8680c8ba436) 2008-02-29 04:55:20 +03:00			`do_recovery(rec, mem_ctx, pnn, nodemap,`
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`vnnmap);`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`goto again;`
			`}`
			`}`

redo and update how we synchronize flags across the cluster. this simplifies the code and should close a race condition between the local recovery daemon and a remote node when flags are changing. (This used to be ctdb commit 32d460b8469eb53145f04161a5d01166f9b5f09e) 2008-12-05 08:32:30 +03:00			`/* verify the flags are consistent`
			`*/`
			`for (i=0; i<nodemap->num; i++) {`
			`if (nodemap->nodes[i].flags & NODE_FLAGS_DISCONNECTED) {`
			`continue;`
			`}`

			`if (nodemap->nodes[i].flags != remote_nodemaps[j]->nodes[i].flags) {`
			`DEBUG(DEBUG_ERR, (__location__ " Remote node:%u has different flags for node %u. It has 0x%02x vs our 0x%02x\n",`
			`nodemap->nodes[j].pnn,`
			`nodemap->nodes[i].pnn,`
			`remote_nodemaps[j]->nodes[i].flags,`
			`nodemap->nodes[i].flags));`
			`if (i == j) {`
			`DEBUG(DEBUG_ERR,("Use flags 0x%02x from remote node %d for cluster update of its own flags\n", remote_nodemaps[j]->nodes[i].flags, j));`
			`update_flags_on_all_nodes(ctdb, nodemap, nodemap->nodes[i].pnn, remote_nodemaps[j]->nodes[i].flags);`
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`ctdb_set_culprit(rec, nodemap->nodes[j].pnn);`
redo and update how we synchronize flags across the cluster. this simplifies the code and should close a race condition between the local recovery daemon and a remote node when flags are changing. (This used to be ctdb commit 32d460b8469eb53145f04161a5d01166f9b5f09e) 2008-12-05 08:32:30 +03:00			`do_recovery(rec, mem_ctx, pnn, nodemap,`
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`vnnmap);`
redo and update how we synchronize flags across the cluster. this simplifies the code and should close a race condition between the local recovery daemon and a remote node when flags are changing. (This used to be ctdb commit 32d460b8469eb53145f04161a5d01166f9b5f09e) 2008-12-05 08:32:30 +03:00			`goto again;`
			`} else {`
			`DEBUG(DEBUG_ERR,("Use flags 0x%02x from local recmaster node for cluster update of node %d flags\n", nodemap->nodes[i].flags, i));`
			`update_flags_on_all_nodes(ctdb, nodemap, nodemap->nodes[i].pnn, nodemap->nodes[i].flags);`
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`ctdb_set_culprit(rec, nodemap->nodes[j].pnn);`
redo and update how we synchronize flags across the cluster. this simplifies the code and should close a race condition between the local recovery daemon and a remote node when flags are changing. (This used to be ctdb commit 32d460b8469eb53145f04161a5d01166f9b5f09e) 2008-12-05 08:32:30 +03:00			`do_recovery(rec, mem_ctx, pnn, nodemap,`
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`vnnmap);`
redo and update how we synchronize flags across the cluster. this simplifies the code and should close a race condition between the local recovery daemon and a remote node when flags are changing. (This used to be ctdb commit 32d460b8469eb53145f04161a5d01166f9b5f09e) 2008-12-05 08:32:30 +03:00			`goto again;`
			`}`
			`}`
			`}`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`}`


			`/* there better be the same number of lmasters in the vnn map`
setup the random number generator a bit better (This used to be ctdb commit 708585eb0ed31b0df6543a1d7a20b82e751877c2) 2007-05-10 07:10:23 +04:00			`as there are active nodes or we will have to do a recovery`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`*/`
store the num_active variable (number of connected/active nodes) inside the rec structure and avoid passing this as an extra parameter to do_recovery() (This used to be ctdb commit 8bb229aa3b4bd41e48d4e4e2e148d8680c8ba436) 2008-02-29 04:55:20 +03:00			`if (vnnmap->size != rec->num_active) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " The vnnmap count is different from the number of active nodes. %u vs %u\n",`
store the num_active variable (number of connected/active nodes) inside the rec structure and avoid passing this as an extra parameter to do_recovery() (This used to be ctdb commit 8bb229aa3b4bd41e48d4e4e2e148d8680c8ba436) 2008-02-29 04:55:20 +03:00			`vnnmap->size, rec->num_active));`
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`ctdb_set_culprit(rec, ctdb->pnn);`
			`do_recovery(rec, mem_ctx, pnn, nodemap, vnnmap);`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`goto again;`
			`}`

			`/* verify that all active nodes in the nodemap also exist in`
			`the vnnmap.`
			`*/`
			`for (j=0; j<nodemap->num; j++) {`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`if (nodemap->nodes[j].flags & NODE_FLAGS_INACTIVE) {`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`continue;`
			`}`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`if (nodemap->nodes[j].pnn == pnn) {`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`continue;`
			`}`

			`for (i=0; i<vnnmap->size; i++) {`
change how we do public addresses and takeover so that we can have multiple public addresses spread across multiple interfaces on each node. this is a massive patch since we have previously made the assumtion that we only have one public address per node. get rid of the public_interface argument. the public addresses file now explicitely lists which interface the address belongs to (This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8) 2007-09-04 03:50:07 +04:00			`if (vnnmap->map[i] == nodemap->nodes[j].pnn) {`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`break;`
			`}`
			`}`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`if (i == vnnmap->size) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Node %u is active in the nodemap but did not exist in the vnnmap\n",`
change how we do public addresses and takeover so that we can have multiple public addresses spread across multiple interfaces on each node. this is a massive patch since we have previously made the assumtion that we only have one public address per node. get rid of the public_interface argument. the public addresses file now explicitely lists which interface the address belongs to (This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8) 2007-09-04 03:50:07 +04:00			`nodemap->nodes[j].pnn));`
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`ctdb_set_culprit(rec, nodemap->nodes[j].pnn);`
			`do_recovery(rec, mem_ctx, pnn, nodemap, vnnmap);`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`goto again;`
			`}`
			`}`


also verify that the generation id is the same on all the nodes and if not, trigger a recovery (This used to be ctdb commit 46b8a66ee70419c153acf45eeec88c1fc8f230ce) 2007-05-04 05:57:45 +04:00			`/* verify that all other nodes have the same vnnmap`
			`and are from the same generation`
			`*/`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`for (j=0; j<nodemap->num; j++) {`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`if (nodemap->nodes[j].flags & NODE_FLAGS_INACTIVE) {`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`continue;`
			`}`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`if (nodemap->nodes[j].pnn == pnn) {`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`continue;`
			`}`

change how we do public addresses and takeover so that we can have multiple public addresses spread across multiple interfaces on each node. this is a massive patch since we have previously made the assumtion that we only have one public address per node. get rid of the public_interface argument. the public addresses file now explicitely lists which interface the address belongs to (This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8) 2007-09-04 03:50:07 +04:00			`ret = ctdb_ctrl_getvnnmap(ctdb, CONTROL_TIMEOUT(), nodemap->nodes[j].pnn,`
formatting fixes (This used to be ctdb commit ed63a2057698aed3931762605b2ea2368681af2b) 2007-06-07 12:39:37 +04:00			`mem_ctx, &remote_vnnmap);`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to get vnnmap from remote node %u\n",`
change how we do public addresses and takeover so that we can have multiple public addresses spread across multiple interfaces on each node. this is a massive patch since we have previously made the assumtion that we only have one public address per node. get rid of the public_interface argument. the public addresses file now explicitely lists which interface the address belongs to (This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8) 2007-09-04 03:50:07 +04:00			`nodemap->nodes[j].pnn));`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`goto again;`
			`}`

also verify that the generation id is the same on all the nodes and if not, trigger a recovery (This used to be ctdb commit 46b8a66ee70419c153acf45eeec88c1fc8f230ce) 2007-05-04 05:57:45 +04:00			`/* verify the vnnmap generation is the same */`
			`if (vnnmap->generation != remote_vnnmap->generation) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Remote node %u has different generation of vnnmap. %u vs %u (ours)\n",`
change how we do public addresses and takeover so that we can have multiple public addresses spread across multiple interfaces on each node. this is a massive patch since we have previously made the assumtion that we only have one public address per node. get rid of the public_interface argument. the public addresses file now explicitely lists which interface the address belongs to (This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8) 2007-09-04 03:50:07 +04:00			`nodemap->nodes[j].pnn, remote_vnnmap->generation, vnnmap->generation));`
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`ctdb_set_culprit(rec, nodemap->nodes[j].pnn);`
			`do_recovery(rec, mem_ctx, pnn, nodemap, vnnmap);`
also verify that the generation id is the same on all the nodes and if not, trigger a recovery (This used to be ctdb commit 46b8a66ee70419c153acf45eeec88c1fc8f230ce) 2007-05-04 05:57:45 +04:00			`goto again;`
			`}`

update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`/* verify the vnnmap size is the same */`
			`if (vnnmap->size != remote_vnnmap->size) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Remote node %u has different size of vnnmap. %u vs %u (ours)\n",`
change how we do public addresses and takeover so that we can have multiple public addresses spread across multiple interfaces on each node. this is a massive patch since we have previously made the assumtion that we only have one public address per node. get rid of the public_interface argument. the public addresses file now explicitely lists which interface the address belongs to (This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8) 2007-09-04 03:50:07 +04:00			`nodemap->nodes[j].pnn, remote_vnnmap->size, vnnmap->size));`
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`ctdb_set_culprit(rec, nodemap->nodes[j].pnn);`
			`do_recovery(rec, mem_ctx, pnn, nodemap, vnnmap);`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`goto again;`
			`}`

			`/* verify the vnnmap is the same */`
			`for (i=0;i<vnnmap->size;i++) {`
			`if (remote_vnnmap->map[i] != vnnmap->map[i]) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Remote node %u has different vnnmap.\n",`
change how we do public addresses and takeover so that we can have multiple public addresses spread across multiple interfaces on each node. this is a massive patch since we have previously made the assumtion that we only have one public address per node. get rid of the public_interface argument. the public addresses file now explicitely lists which interface the address belongs to (This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8) 2007-09-04 03:50:07 +04:00			`nodemap->nodes[j].pnn));`
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`ctdb_set_culprit(rec, nodemap->nodes[j].pnn);`
store the num_active variable (number of connected/active nodes) inside the rec structure and avoid passing this as an extra parameter to do_recovery() (This used to be ctdb commit 8bb229aa3b4bd41e48d4e4e2e148d8680c8ba436) 2008-02-29 04:55:20 +03:00			`do_recovery(rec, mem_ctx, pnn, nodemap,`
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`vnnmap);`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`goto again;`
			`}`
			`}`
			`}`

added health monitoring logic to ctdb, so a node loses its public IP address if one of the sybsystem event scripts reports a problem (This used to be ctdb commit c7a089256d86cec21097453bce5acbccee87413f) 2007-06-06 04:25:46 +04:00			`/* we might need to change who has what IP assigned */`
prevent recursion in the calling of ctdb_takeover_run (This used to be ctdb commit 0fbdeb7c91b965d9bc5ecc7b24e31070378d8f1d) 2007-09-13 08:08:18 +04:00			`if (rec->need_takeover_run) {`
			`rec->need_takeover_run = false;`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00
			`/* execute the "startrecovery" event script on all nodes */`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`ret = run_startrecovery_eventscript(rec, nodemap);`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`if (ret!=0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to run the 'startrecovery' event on cluster\n"));`
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`ctdb_set_culprit(rec, ctdb->pnn);`
			`do_recovery(rec, mem_ctx, pnn, nodemap, vnnmap);`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`}`

added health monitoring logic to ctdb, so a node loses its public IP address if one of the sybsystem event scripts reports a problem (This used to be ctdb commit c7a089256d86cec21097453bce5acbccee87413f) 2007-06-06 04:25:46 +04:00			`ret = ctdb_takeover_run(ctdb, nodemap);`
			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to setup public takeover addresses - starting recovery\n"));`
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`ctdb_set_culprit(rec, ctdb->pnn);`
			`do_recovery(rec, mem_ctx, pnn, nodemap, vnnmap);`
added health monitoring logic to ctdb, so a node loses its public IP address if one of the sybsystem event scripts reports a problem (This used to be ctdb commit c7a089256d86cec21097453bce5acbccee87413f) 2007-06-06 04:25:46 +04:00			`}`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00
			`/* execute the "recovered" event script on all nodes */`
Update some debug statements. Dont say that recovery failed if the failed function was invoked from outside of recovery (This used to be ctdb commit 3038d0b74895b51af4f85f2f304508ed16d245f4) 2008-05-15 06:28:52 +04:00			`ret = run_recovered_eventscript(ctdb, nodemap, "monitor_cluster");`
dont check whether the "recovered" event was successful or not since this event wont run unless the recovery mode is normal but we can not know what the recovery mode will be in the future on a remote node so since we issue these commands that will execute in the future at some other node it is pointless to try to check if it worked or not in particular if "failure to successfully run the eventscript" would then trigger a full new recovery which is disruptive and expensive. (This used to be ctdb commit 2c292039a0139dcf5bb2bd964eb6f8902d094c50) 2008-05-15 09:01:01 +04:00			`#if 0`
			`// we cant check whether the event completed successfully`
			`// since this script WILL fail if the node is in recovery mode`
			`// and if that race happens, the code here would just cause a second`
			`// cascading recovery.`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`if (ret!=0) {`
Update some debug statements. Dont say that recovery failed if the failed function was invoked from outside of recovery (This used to be ctdb commit 3038d0b74895b51af4f85f2f304508ed16d245f4) 2008-05-15 06:28:52 +04:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to run the 'recovered' event on cluster. Update of public ips failed.\n"));`
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`ctdb_set_culprit(rec, ctdb->pnn);`
			`do_recovery(rec, mem_ctx, pnn, nodemap, vnnmap);`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`}`
dont check whether the "recovered" event was successful or not since this event wont run unless the recovery mode is normal but we can not know what the recovery mode will be in the future on a remote node so since we issue these commands that will execute in the future at some other node it is pointless to try to check if it worked or not in particular if "failure to successfully run the eventscript" would then trigger a full new recovery which is disruptive and expensive. (This used to be ctdb commit 2c292039a0139dcf5bb2bd964eb6f8902d094c50) 2008-05-15 09:01:01 +04:00			`#endif`
added health monitoring logic to ctdb, so a node loses its public IP address if one of the sybsystem event scripts reports a problem (This used to be ctdb commit c7a089256d86cec21097453bce5acbccee87413f) 2007-06-06 04:25:46 +04:00			`}`

force an update of the flags from the recmaster after each monitoring run (This used to be ctdb commit 251aeadc8b16a9c27a4bae78c97ad6e93e6cfdf4) 2008-06-26 07:08:37 +04:00
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`goto again;`

start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00			`}`

added health monitoring logic to ctdb, so a node loses its public IP address if one of the sybsystem event scripts reports a problem (This used to be ctdb commit c7a089256d86cec21097453bce5acbccee87413f) 2007-06-06 04:25:46 +04:00			`/*`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`event handler for when the main ctdbd dies`
			`*/`
moved the recovery daemon into the main ctdbd and enable it by default (This used to be ctdb commit 2a7d42124731f43d013cb76a798525eab4cc1ee0) 2007-05-15 09:13:36 +04:00			`static void ctdb_recoverd_parent(struct event_context *ev, struct fd_event *fde,`
			`uint16_t flags, void *private_data)`
			`{`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ALERT,("recovery daemon parent died - exiting\n"));`
moved the recovery daemon into the main ctdbd and enable it by default (This used to be ctdb commit 2a7d42124731f43d013cb76a798525eab4cc1ee0) 2007-05-15 09:13:36 +04:00			`_exit(1);`
			`}`

Monitor that the recovery daemon is still running from the main ctdb daemon and if it has terminated, then we shut down the main daemon as well (This used to be ctdb commit 7e587acaf8006254e89ff9b4bf48454821c85863) 2008-05-06 05:19:17 +04:00			`/*`
			`called regularly to verify that the recovery daemon is still running`
			`*/`
			`static void ctdb_check_recd(struct event_context *ev, struct timed_event *te,`
			`struct timeval yt, void *p)`
			`{`
			`struct ctdb_context *ctdb = talloc_get_type(p, struct ctdb_context);`

			`if (kill(ctdb->recoverd_pid, 0) != 0) {`
			`DEBUG(DEBUG_ERR,("Recovery daemon (pid:%d) is no longer running. Shutting down main daemon\n", (int)ctdb->recoverd_pid));`

			`ctdb_stop_recoverd(ctdb);`
			`ctdb_stop_keepalive(ctdb);`
			`ctdb_stop_monitoring(ctdb);`
			`ctdb_release_all_ips(ctdb);`
ctdb->methods becomes NULL when we shutdown the transport. If we shutdown the transport and CTDB later decides to send a command out for queueing, the call to ctdb->methods->allocate_pkt() will SEGV. This could trigger for example when we are in the process of shuttind down CTDBD and have already shutdown the transport but we are still waiting for the "shutdown" eventscripts to finish. If the event scripts now take much much longer to execute for some reason, this race condition becomes much more probable. Decorate all dereferencing of ctdb->methods-> with a check that ctdb->menthods is non-NULL (This used to be ctdb commit c4c2c53918da6fb566d6e9cbd6b02e61ae2921e7) 2008-05-11 08:28:33 +04:00			`if (ctdb->methods != NULL) {`
			`ctdb->methods->shutdown(ctdb);`
			`}`
eventscript: introduce enum for different event script calls. Rather than doing strcmp everywhere, pass an explicit enum around. This also subtly documents what options are available. The "options" arg is now used for extra arguments only. Unfortunately, gcc complains on empty format strings, so we make ctdb_event_script() take no varargs, and add ctdb_event_script_args(). We leave ctdb_event_script_callback() taking varargs, which means callers have to do "%s", "". For the moment, we have CTDB_EVENT_UNKNOWN for handling forced scripts from the ctdb tool. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 470822b329f9d3ca9bef518b56e9ce28d5fedda2) 2009-11-24 03:46:49 +03:00			`ctdb_event_script(ctdb, CTDB_EVENT_SHUTDOWN);`
Monitor that the recovery daemon is still running from the main ctdb daemon and if it has terminated, then we shut down the main daemon as well (This used to be ctdb commit 7e587acaf8006254e89ff9b4bf48454821c85863) 2008-05-06 05:19:17 +04:00
			`exit(10);`
			`}`

			`event_add_timed(ctdb->ev, ctdb,`
			`timeval_current_ofs(30, 0),`
			`ctdb_check_recd, ctdb);`
			`}`
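The liveness check in `ctdb_check_recd` relies on `kill()` with signal 0, which runs the existence and permission checks without delivering a signal. A minimal standalone sketch of that probe follows; `process_is_alive` is a hypothetical helper name and is not part of ctdb. Unlike the code above, which treats any non-zero return as "not running", this sketch assumes that an `EPERM` failure should still count as alive.

```c
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of ctdb): report whether a process with
 * the given pid currently exists. Signal 0 is never delivered; kill()
 * only performs its existence and permission checks.
 */
static bool process_is_alive(pid_t pid)
{
	if (kill(pid, 0) == 0) {
		return true;
	}
	/* EPERM: the process exists but we are not allowed to signal it. */
	return errno == EPERM;
}
```

Note that a child which has exited but has not yet been reaped is still reported as existing, so a probe like this is normally paired with a SIGCHLD handler that calls `waitpid()`.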

proper waitpid() fix. remove all waitpid() calls and use the event system to trap sigchld (This used to be ctdb commit 77458b2b6b51b2970c12b0e5b097088d3fb9d358) 2008-07-09 08:02:54 +04:00			`static void recd_sig_child_handler(struct event_context *ev,`
			`struct signal_event *se, int signum, int count,`
			`void *dont_care,`
			`void *private_data)`
			`{`
			`// struct ctdb_context *ctdb = talloc_get_type(private_data, struct ctdb_context);`
			`int status;`
			`pid_t pid = -1;`

			`while (pid != 0) {`
			`pid = waitpid(-1, &status, WNOHANG);`
			`if (pid == -1) {`
dont log an error if waitpid returns -1 and errno is ECHILD (This used to be ctdb commit fdf50f3e774e3980af81c0b6f4ff81d085f4f697) 2009-06-19 09:55:13 +04:00			`if (errno != ECHILD) {`
			`DEBUG(DEBUG_ERR, (__location__ " waitpid() returned error. errno:%s(%d)\n", strerror(errno),errno));`
			`}`
proper waitpid() fix. remove all waitpid() calls and use the event system to trap sigchld (This used to be ctdb commit 77458b2b6b51b2970c12b0e5b097088d3fb9d358) 2008-07-09 08:02:54 +04:00			`return;`
			`}`
			`if (pid > 0) {`
			`DEBUG(DEBUG_DEBUG, ("RECD SIGCHLD from %d\n", (int)pid));`
			`}`
			`}`
			`}`
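The handler above drains every exited child with a non-blocking `waitpid()` loop and deliberately stays quiet on `ECHILD`. The same pattern can be sketched outside the event system as follows; `reap_exited_children` is a hypothetical name, and the `EINTR` retry is an addition that the original handler does not have.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>

/*
 * Hypothetical helper (not part of ctdb): reap every child that has
 * already exited, without blocking. Returns the number of children
 * reaped, or -1 on an unexpected waitpid() error.
 */
static int reap_exited_children(void)
{
	int reaped = 0;

	for (;;) {
		int status;
		pid_t pid = waitpid(-1, &status, WNOHANG);

		if (pid > 0) {
			reaped++;	/* one zombie collected, look for more */
			continue;
		}
		if (pid == 0) {
			break;		/* children exist, none has exited yet */
		}
		if (errno == EINTR) {
			continue;	/* interrupted, retry (assumption, see above) */
		}
		if (errno == ECHILD) {
			break;		/* no children at all: not an error */
		}
		return -1;
	}
	return reaped;
}
```

Looping until `waitpid()` returns 0 or fails matters because several children may exit before the handler runs, and pending SIGCHLD signals of the same kind are not queued individually.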

implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`/*`
			`startup the recovery daemon as a child of the main ctdb daemon`
			`*/`
moved the recovery daemon into the main ctdbd and enable it by default (This used to be ctdb commit 2a7d42124731f43d013cb76a798525eab4cc1ee0) 2007-05-15 09:13:36 +04:00			`int ctdb_start_recoverd(struct ctdb_context *ctdb)`
start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00			`{`
moved the recovery daemon into the main ctdbd and enable it by default (This used to be ctdb commit 2a7d42124731f43d013cb76a798525eab4cc1ee0) 2007-05-15 09:13:36 +04:00			`int fd[2];`
proper waitpid() fix. remove all waitpid() calls and use the event system to trap sigchld (This used to be ctdb commit 77458b2b6b51b2970c12b0e5b097088d3fb9d358) 2008-07-09 08:02:54 +04:00			`struct signal_event *se;`
start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00
moved the recovery daemon into the main ctdbd and enable it by default (This used to be ctdb commit 2a7d42124731f43d013cb76a798525eab4cc1ee0) 2007-05-15 09:13:36 +04:00			`if (pipe(fd) != 0) {`
			`return -1;`
start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00			`}`

merge from ronnie (This used to be ctdb commit 0aa6e04438aa5ec727815689baa19544df042cf7) 2008-01-07 08:17:22 +03:00			`ctdb->ctdbd_pid = getpid();`

when we are shutting down, we should first shut down the recovery daemon (This used to be ctdb commit 39ade6b329adcd3234124d6a8daaa6181abf739b) 2007-10-22 06:34:08 +04:00			`ctdb->recoverd_pid = fork();`
			`if (ctdb->recoverd_pid == -1) {`
moved the recovery daemon into the main ctdbd and enable it by default (This used to be ctdb commit 2a7d42124731f43d013cb76a798525eab4cc1ee0) 2007-05-15 09:13:36 +04:00			`return -1;`
start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00			`}`
moved the recovery daemon into the main ctdbd and enable it by default (This used to be ctdb commit 2a7d42124731f43d013cb76a798525eab4cc1ee0) 2007-05-15 09:13:36 +04:00
when we are shutting down, we should first shut down the recovery daemon (This used to be ctdb commit 39ade6b329adcd3234124d6a8daaa6181abf739b) 2007-10-22 06:34:08 +04:00			`if (ctdb->recoverd_pid != 0) {`
moved the recovery daemon into the main ctdbd and enable it by default (This used to be ctdb commit 2a7d42124731f43d013cb76a798525eab4cc1ee0) 2007-05-15 09:13:36 +04:00			`close(fd[0]);`
Monitor that the recovery daemon is still running from the main ctdb daemon and if it has terminated, then we shut down the main daemon as well (This used to be ctdb commit 7e587acaf8006254e89ff9b4bf48454821c85863) 2008-05-06 05:19:17 +04:00			`event_add_timed(ctdb->ev, ctdb,`
			`timeval_current_ofs(30, 0),`
			`ctdb_check_recd, ctdb);`
moved the recovery daemon into the main ctdbd and enable it by default (This used to be ctdb commit 2a7d42124731f43d013cb76a798525eab4cc1ee0) 2007-05-15 09:13:36 +04:00			`return 0;`
start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00			`}`

moved the recovery daemon into the main ctdbd and enable it by default (This used to be ctdb commit 2a7d42124731f43d013cb76a798525eab4cc1ee0) 2007-05-15 09:13:36 +04:00			`close(fd[1]);`

			`srandom(getpid() ^ time(NULL));`
start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00
create a helper function that converts a ctdb instance in daemon mode to become a ctdb client instance. use this from the recovery daemon child process to switch to client mode and connect back to the main daemon (This used to be ctdb commit 16f31786a031255ab5b3099a0a3c745de973347a) 2009-03-23 04:37:30 +03:00			`if (switch_from_server_to_client(ctdb) != 0) {`
			`DEBUG(DEBUG_CRIT, (__location__ " ERROR: failed to switch recovery daemon into client mode. shutting down.\n"));`
start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00			`exit(1);`
			`}`

add logging everytime we create a filedescriptor in the main ctdb daemon so we can spot if there are leaks. plug two leaks for filedescriptors related to when sending ARP fail and one leak when we can not parse the local address during tcp connection establish (This used to be ctdb commit ddd089810a14efe4be6e1ff3eccaa604e4913c9e) 2009-10-15 04:24:54 +04:00			`DEBUG(DEBUG_NOTICE, (__location__ " Created PIPE FD:%d to recovery daemon\n", fd[0]));`

create a helper function that converts a ctdb instance in daemon mode to become a ctdb client instance. use this from the recovery daemon child process to switch to client mode and connect back to the main daemon (This used to be ctdb commit 16f31786a031255ab5b3099a0a3c745de973347a) 2009-03-23 04:37:30 +03:00			`event_add_fd(ctdb->ev, ctdb, fd[0], EVENT_FD_READ|EVENT_FD_AUTOCLOSE,`
			`ctdb_recoverd_parent, &fd[0]);`

proper waitpid() fix. remove all waitpid() calls and use the event system to trap sigchld (This used to be ctdb commit 77458b2b6b51b2970c12b0e5b097088d3fb9d358) 2008-07-09 08:02:54 +04:00			`/* set up a handler to pick up sigchld */`
			`se = event_add_signal(ctdb->ev, ctdb,`
			`SIGCHLD, 0,`
			`recd_sig_child_handler,`
			`ctdb);`
			`if (se == NULL) {`
			`DEBUG(DEBUG_CRIT,("Failed to set up signal handler for SIGCHLD in recovery daemon\n"));`
			`exit(1);`
			`}`

moved the recovery daemon into the main ctdbd and enable it by default (This used to be ctdb commit 2a7d42124731f43d013cb76a798525eab4cc1ee0) 2007-05-15 09:13:36 +04:00			`monitor_cluster(ctdb);`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ALERT,("ERROR: ctdb_recoverd finished!?\n"));`
moved the recovery daemon into the main ctdbd and enable it by default (This used to be ctdb commit 2a7d42124731f43d013cb76a798525eab4cc1ee0) 2007-05-15 09:13:36 +04:00			`return -1;`
start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00			`}`
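`ctdb_start_recoverd` detects the death of the main daemon through a pipe: the parent keeps only the write end, the child keeps only the read end, and the read end becomes readable with end-of-file once every write end is closed, including when the parent exits. A blocking sketch of that idea, without the event loop, might look like the following; `wait_for_peer_exit` is a hypothetical helper, whereas the real code registers `ctdb_recoverd_parent` as a read handler instead of blocking.

```c
#include <errno.h>
#include <stdbool.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of ctdb): block until every write end of
 * the pipe behind read_fd has been closed. Returns true on end-of-file,
 * false on a read error other than EINTR.
 */
static bool wait_for_peer_exit(int read_fd)
{
	char buf[16];

	for (;;) {
		ssize_t n = read(read_fd, buf, sizeof(buf));

		if (n == 0) {
			return true;	/* EOF: no writer is left */
		}
		if (n < 0 && errno != EINTR) {
			return false;	/* unexpected read error */
		}
		/* n > 0 or EINTR: discard any data and keep waiting */
	}
}
```

For this to work the child must close its inherited copy of the write end, as the `close(fd[1])` call above does; otherwise the child itself keeps the pipe open and end-of-file never arrives.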
when we are shutting down, we should first shut down the recovery daemon (This used to be ctdb commit 39ade6b329adcd3234124d6a8daaa6181abf739b) 2007-10-22 06:34:08 +04:00
			`/*`
			`shutdown the recovery daemon`
			`*/`
			`void ctdb_stop_recoverd(struct ctdb_context *ctdb)`
			`{`
			`if (ctdb->recoverd_pid == 0) {`
			`return;`
			`}`

merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_NOTICE,("Shutting down recovery daemon\n"));`
when we are shutting down, we should first shut down the recovery daemon (This used to be ctdb commit 39ade6b329adcd3234124d6a8daaa6181abf739b) 2007-10-22 06:34:08 +04:00			`kill(ctdb->recoverd_pid, SIGTERM);`
			`}`
